Keywords

1 Introduction

Cancer is a disease mainly caused by the accumulation of mutations in two gene classes, proto-oncogenes and tumor suppressor genes (Weinberg 1996). With its incidence growing rapidly, cancer is regarded as an important obstacle to extending human life expectancy (Torre et al. 2016b). In terms of cancer deaths worldwide for both men and women, lung cancer, colorectal cancer, and liver cancer are the top three cancer types (Sung et al. 2021).

Since the twentieth century, lung cancer has become the most common cause of cancer death as well as the second most commonly occurring cancer in both men and women internationally (Alberg and Samet 2003). It also ranks as the most frequently diagnosed cancer and the leading cause of cancer mortality in men (Sung et al. 2021). In the United States, the number of patients who die from lung cancer every year is higher than the combined death toll from colon, breast, and prostate cancer (Spiro and Silvestri 2005). Tobacco smoking is regarded as the leading cause of lung cancer (Salgia and Skarin 1998). Compared with non-smokers, smokers have a 20- to 30-fold increase in lung cancer risk (Minna et al. 2002). Hence, the industrialized countries, where smoking first became prevalent, have the highest lung cancer incidence rates (Alberg et al. 2005). Over the past several decades, because of tobacco control policies and smoking cessation, smoking prevalence has kept decreasing in those countries (De Groot et al. 2018), and the burden of lung cancer has thus shifted to developing countries (Torre et al. 2016a). Knowledge of lung cancer biology has also grown. The majority of lung cancers are divided into four histological types: small-cell lung cancer (SCLC) and three non-small-cell lung cancer (NSCLC) types, namely squamous cell carcinoma, adenocarcinoma, and large cell carcinoma (Wistuba and Gazdar 2006). However, the mortality rates of lung cancer remain high (Barta et al. 2019), which might be explained by the nonspecific symptoms of this disease at early stages (Van Meerbeeck et al. 2011). When they seek medical treatment, most patients present with advanced disease, which is nearly incurable (Patz et al. 2000).

Nowadays, colorectal cancer (CRC) is the second most common cause of cancer-related death worldwide and the third most common malignant disease (Center et al. 2009). Colorectal cancer has generally been thought of as a disease of the elderly, with few people being diagnosed before the age of 50, but it also strikes younger people (O'connell et al. 2004). In addition, colorectal cancer is the only type that strikes both men and women with approximately equal frequency (Potter 1999); it is the second most common cancer in females and the third most common cancer in males (Siegel et al. 2014). Moreover, the incidence rates of colorectal cancer vary greatly around the world (Stintzing 2014). It is well known that most cases of colorectal cancer are detected in Western countries (Mármol et al. 2017), because people in long-standing developed countries often share the risk factors that play important roles in the development of colorectal cancer, which might include obesity, unhealthy diet, smoking, alcohol consumption, and physical inactivity (Fearon 1995; Weinberg and Schoen 2014). However, in recent years, high incidence rates of CRC have been observed in newly developed countries where the risk of colorectal cancer was once quite low (Mármol et al. 2017).

Liver cancer is regarded as an aggressive and heterogeneous tumor that ranks as the third most common cause of cancer-related death overall and the second leading cause of cancer-related death in men around the world (Yamashita and Wang 2013; Gao et al. 2019). Liver cancer can be divided into primary liver cancer and secondary liver cancer (Mckillop and Schrum 2005). Based on histological features, primary liver cancer (PLC) can be categorized into several subtypes, including hepatocellular carcinoma (HCC), intrahepatic cholangiocarcinoma (iCCA), mixed hepatocellular-cholangiocarcinoma (HCC-CCA), fibrolamellar HCC, and the pediatric neoplasm hepatoblastoma (Mcglynn et al. 2001; Srivatanakul et al. 2004). Among these histological types, HCC is the most common primary liver cancer worldwide, accounting for nearly 90% of all cases of primary liver malignancies (Ariff et al. 2009). The second most frequent type of primary liver cancer is iCCA, whose incidence rates are increasing steadily (Sia et al. 2017). Moreover, the incidence rates of liver cancer vary significantly between countries (Bosch et al. 1999). Owing to hepatitis B virus (HBV) infection, East and Southeast Asia as well as Middle and Western Africa have the highest liver cancer rates (Bosch et al. 2004). Thanks to the HBV vaccine, liver cancer incidence rates are decreasing in several of the highest-risk areas (Chen and Zhang 2011). However, in some low-risk Western countries, the rates continue to increase. Risk factors such as obesity, cigarette smoking, hepatitis C virus (HCV) infection, and chronic alcohol abuse are believed to be related to liver cancer in these areas (Bishayee 2014). Gender is another risk factor for liver cancer development: males are more susceptible than females, as the incidence rate of liver cancer among men is over twice that among women (Liu et al. 2015).

Through molecular and genetic studies of cancer, multiple biomarkers of colorectal cancer, liver cancer, and lung cancer have been identified (Zochbauer-Muller and Minna 2000; Bishayee 2014; Dienstmann et al. 2017). However, it is quite difficult to derive diagnostic, prognostic, and therapeutic targets from these findings, and mortality is still high for patients all over the world (Chakraborty et al. 2018). With the advancement of high-throughput omics technologies, researchers are now able to study genomic, transcriptomic, proteomic, and phosphoproteomic data at the same time (Ahmed 2020). Although analyzing a single omics data set reveals the alterations and associations of biological entities at that level, the interactions between multiple molecular layers cannot be fully assessed in this way (Biswas and Chakrabarti 2020). Hence, in lung cancer, liver cancer, and colon cancer research, many multi-omics analyses have been conducted in order to gain a holistic view of the molecular dynamics underlying cancer progression and to make progress in early detection and prognosis (Sun and Hu 2016). Also, because of the heterogeneous nature of cancer, different patients may have different clinical responses to the same treatment (Du and Elemento 2015). To address this problem, multi-omics studies at an individual level have been conducted to develop precision cancer medicine (Ghosh et al. 2018; Mantini et al. 2021).

In this review, we introduce different types of omics data used in the research of colorectal cancer, liver cancer, and lung cancer. In addition, we summarize currently used technologies for high-throughput multi-omics data analysis. We also review integrative analyses using genomic, epigenomic, transcriptomic, proteomic, and metabolomic data that have helped reveal the molecular pathology of colorectal cancer, liver cancer, and lung cancer. Finally, we discuss challenges and envision the future of precision cancer medicine.

2 Various Multi-Omics Data Types and Selected Repositories

With the advent of sequencing technologies, biomolecules in a given biological sample can be identified and quantified at multiple omics levels (Das et al. 2020). Next-generation sequencing (NGS) is now frequently used for whole-genome or whole-exome sequencing (Behjati and Tarpey 2013). ChIP-seq (chromatin immunoprecipitation sequencing) and DNase-seq (DNase I hypersensitive sites sequencing) are used for detection of DNA-protein interactions. RNA-seq can be used to identify and quantify RNA molecules (Kim and Dekker 2018; Lu et al. 2019). For proteomic and metabolomic studies, mass spectrometry-based techniques are widely used (Domon and Aebersold 2006). Omics data generated by these techniques, including but not limited to genomic, epigenomic, transcriptomic, proteomic, and metabolomic data, are collectively called multi-omics data (Liu et al. 2019). Several publicly accessible databases (Huang et al. 2017; Subramanian et al. 2020) accommodate multiple omics data sets and serve as rich resources for understanding the etiology of human cancer.

2.1 DriverDB v3

The DriverDB database (http://ngs.ym.edu.tw/driverdb/) contains a large collection of exome-seq data extracted from The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), the Pediatric Cancer Genome Project (PCGP), the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) program, and published papers (Cheng et al. 2014). More exome-seq data as well as additional RNA-seq data from TCGA, ICGC, and published papers were added in the updated DriverDB v2 (Chung et al. 2016). DriverDB v3, the latest version, incorporates not only new exome-seq and RNA-seq datasets but also copy number variation (CNV), methylation, and smRNA-seq datasets. By applying the various bioinformatic tools it contains, users can identify abnormalities at multiple omics levels and discover driver genes and mutations (Liu et al. 2020a).

2.2 TCGA Portal

The Cancer Genome Atlas (TCGA) was launched by the National Institutes of Health (NIH) in 2006, aiming to reveal genomic and epigenomic alterations associated with 32 types of human cancers (Wang et al. 2016). For each type of human cancer, various kinds of data including gene expression, exon expression, miRNA expression, protein expression, single nucleotide polymorphism (SNP), copy number variation (CNV), loss of heterozygosity (LOH), and DNA methylation have been generated and processed (Tomczak et al. 2015). These data are stored in a free-access database, namely the TCGA Data Portal (https://tcga-data.nci.nih.gov/tcga/). The wealth of TCGA data has led to the discovery of diagnostic biomarkers and the development of new cancer therapies (Colaprico et al. 2016).

2.3 ICGC

The International Cancer Genome Consortium (ICGC; https://icgc.org/) mainly contains mutational genomic data for nearly 50 cancer types. The International Cancer Genome Consortium Data Portal (https://dcc.icgc.org) is a user-friendly platform that helps users visualize, analyze, and interpret the cancer-related genetic, molecular, and clinical data it contains. This may lead to a deeper understanding of tumor biology as well as the development of better diagnostic methods and drugs (Zhang et al. 2019).

2.4 CCLE

In order to promote the translation of genetic and pharmacological data generated by cancer cell line studies into an understanding of cancer progression and the development of novel therapies, the Cancer Cell Line Encyclopedia (CCLE; https://portals.broadinstitute.org/ccle) was built through a collaboration between the Broad Institute and the Novartis Institute (Barretina et al. 2012). The original release of CCLE contains a large-scale genomic data set from 947 human cancer cell lines and pharmacological profiling of 24 anticancer drugs across 479 of those cell lines. Later, whole-genome sequencing, RNA-seq, miRNA profiling, and histone profiling data were added (Nusinow et al. 2020).

2.5 LinkedOmics

The LinkedOmics database (http://www.linkedomics.org) contains mass spectrometry (MS)-based global proteomics data downloaded from the Clinical Proteomic Tumor Analysis Consortium (CPTAC). Multi-omics data, including genomic, epigenomic, and transcriptomic data as well as clinical data for 32 cancer types downloaded from The Cancer Genome Atlas (TCGA) project, were also added to this database. To allow users to analyze these data in detail, LinkedOmics provides three analysis modules, namely LinkFinder, LinkCompare, and LinkInterpreter. For each cancer cohort, the LinkFinder module allows users to find associations between an attribute of interest and all other attributes. These associations can be compared with query attributes through the LinkCompare module and interpreted through the LinkInterpreter module. The results are presented as plots or heatmaps, which may effectively help users gain biological understanding (Vasaikar et al. 2018).
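The kind of query that LinkFinder answers, ranking all attributes by their association with one attribute of interest, can be illustrated with a minimal sketch. The code below is not the LinkedOmics implementation: it assumes Pearson correlation as the association measure, and the attribute names and values are made-up toy data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant attribute: correlation is undefined, report 0
    return cov / (sx * sy)

def rank_associations(query, attributes):
    """Rank attributes by absolute correlation with the query attribute."""
    scores = {name: pearson(query, values) for name, values in attributes.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical toy cohort: five samples, made-up values
query_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
cohort = {
    "protein_A": [2.0, 4.0, 6.0, 8.0, 10.0],
    "methylation_B": [5.0, 4.0, 3.0, 2.0, 1.5],
    "mirna_C": [1.0, 3.0, 2.0, 3.0, 1.0],
}
ranked = rank_associations(query_expr, cohort)
for name, r in ranked:
    print(f"{name}\t{r:+.3f}")
```

In a real analysis the choice of test would depend on the attribute types (continuous, categorical, or survival), and multiple-testing correction would be needed when thousands of attributes are scanned.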

2.6 RHPCG

Consisting of a group of kinases, the Hippo signaling pathway is a highly conserved pathway that plays important roles in controlling cell proliferation, apoptosis, and migration. Dysregulation of the Hippo signaling pathway is involved in the initiation and progression of cancers such as breast cancer and lung cancer. The Regulation of the Hippo Pathway in Cancer Genome database (RHPCG; http://www.medsysbio.org/RHPCG) serves as an open resource for visualizing alterations of Hippo pathway genes and for understanding the roles of the Hippo pathway in cancer. It was designed to allow users to easily search, view, and download alterations of core Hippo-protein-encoding genes in 33 cancer types at the genomic, epigenomic, and transcriptomic levels (Wang et al. 2019).

2.7 MOBCdb

The Multi-Omics Breast Cancer Database (MOBCdb; http://bigd.big.ac.cn/MOBCdb/) was constructed to facilitate the identification of breast cancer subtypes and the discovery of novel biomarkers. MOBCdb contains SNV, gene expression, microRNA expression, DNA methylation, clinical, and drug response data that were downloaded from the TCGA data portal, GENCODE, miRBase, PharmGKB, and NCBI. With more than 10,000 files stored in the database, MOBCdb provides several methods to help users retrieve information effectively. In addition, by using the genome-wide browser in MOBCdb, users can easily visualize different omics data. The survival module was designed to help users find new biomarkers (Xie et al. 2018).

2.8 TARGET

The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) database (https://ocg.cancer.gov/programs/target) was built through the cooperation of extramural and NCI investigators. TARGET originated with two pilot projects and now contains clinical information, gene expression, miRNA expression, copy number, and sequencing data for 24 molecular types of cancer. The effort of TARGET researchers has accelerated discoveries of genomic alterations in cancer and facilitated the rapid translation of those findings into the clinic (Wu et al. 2021a).

Much information can be obtained from the data sets stored in the aforementioned databases. For instance, genomic studies can reveal the associations between tumorigenesis and genetic mutations (Ghosh et al. 2018). Epigenomic data can lead to knowledge regarding how chemical modifications of DNA and protein drive tumorigenesis (Rhee 2018). Similarly, transcriptomic profiling can be used to detect the association between cancer and dysregulated genes (Canzler et al. 2020). Proteomic data can help researchers better understand protein function in human cancer (Matthiesen and Jensen 2008). Because each omics data type provides only a partial view of the complexity of cancer, biological mechanisms can be fully captured only by integrating different omics data types (Hao et al. 2019).

3 Selected Integrative Tools for Multi-Omics Analysis

Cancer is a consequence of malfunction and alteration in multiple molecular layers (Hausman 2019). As the time and cost of generating multiple omics datasets from biological samples decrease, an increased need has emerged for large-scale omics analysis tools that explore relationships between different biological readouts (Altenbuchinger et al. 2020). Usually, the steps of an integrative analysis of these readouts include data normalization, variable selection, cluster analysis, and dimensionality reduction (Meng et al. 2016; Chauvel et al. 2020; Nicora et al. 2020). In this section, we review eight computational integrative tools that are capable of multi-omics data analysis. The first five tools were designed to reveal the biological mechanisms connecting identified key drivers and pathways to diseases. The remaining three tools can be used to discover new therapeutic interventions or support clinical decision making.
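To make the first steps of such a workflow concrete, the following minimal sketch applies normalization and variable selection to each omics layer and then concatenates the layers into a single feature matrix. It is an illustration under stated assumptions rather than the method of any tool reviewed here: the log2(x + 1) transform, the variance-based filter, the z-score scaling, the simple concatenation strategy, and all feature names and values are choices made for this example only. Clustering and dimensionality reduction would then be run on the merged matrix.

```python
from math import log2
from statistics import mean, pstdev, pvariance

def log_normalize(layer):
    """Apply log2(x + 1) to every value of a {feature: [value per sample]} layer."""
    return {f: [log2(v + 1.0) for v in vals] for f, vals in layer.items()}

def select_top_variance(layer, k):
    """Keep the k features with the largest variance across samples."""
    ranked = sorted(layer, key=lambda f: pvariance(layer[f]), reverse=True)
    return {f: layer[f] for f in ranked[:k]}

def zscore(layer):
    """Scale each feature to zero mean and unit standard deviation."""
    out = {}
    for f, vals in layer.items():
        m, s = mean(vals), pstdev(vals)
        out[f] = [(v - m) / s if s > 0 else 0.0 for v in vals]
    return out

def integrate(layers, k):
    """Process each layer, then concatenate the features of all layers."""
    merged = {}
    for name, layer in layers.items():
        processed = zscore(select_top_variance(log_normalize(layer), k))
        for f, vals in processed.items():
            merged[f"{name}:{f}"] = vals
    return merged

# Hypothetical toy data: three samples, made-up counts
layers = {
    "rna": {"GENE1": [0, 3, 15], "GENE2": [7, 7, 7], "GENE3": [1, 3, 7]},
    "mirna": {"MIR1": [3, 1, 0], "MIR2": [1, 1, 1]},
}
matrix = integrate(layers, k=2)
print(sorted(matrix))
```

Scaling each layer separately before concatenation keeps a layer with large raw values from dominating the merged matrix; more elaborate integration strategies weight or factorize the layers instead.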

Integrative Omics Data Analysis (iODA) is a software tool for omics data analysis that is written in Java and runs on Windows or Linux operating systems. iODA can integrate and refine data generated by RNA-seq, miRNA-seq, and ChIP-seq, which helps reveal the complex pathogenesis of human cancer. Six statistical methods are included, namely Least Sum of Ordered Subset Squared, Cancer Outlier Profile Analysis, Maximum Ordered Subset T-statistics, Outlier Robust T-statistics, Outlier Sum, and the t-test, from which users can select to process their input data. Then, differentially expressed genes and miRNAs as well as transcription factor binding sites are extracted for the subsequent pathway enrichment analysis and consistency analysis. The dysfunctional molecules are mapped onto KEGG pathways, and the consistent molecular signatures are identified as key pathogenic factors in cancer. The source code and executable file of iODA can be downloaded for free at http://www.sysbio.org.cn/iODA (Yu et al. 2020).
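The idea behind outlier-based statistics such as Cancer Outlier Profile Analysis can be sketched briefly: each gene is centered on its median and scaled by its median absolute deviation (MAD), and a high percentile of the scaled values is used as the score, so that genes strongly overexpressed in only a subset of samples stand out. The sketch below is a simplified COPA-style score, not the iODA code; the nearest-rank percentile rule, the default percentile, and the toy expression values are assumptions of this example.

```python
from statistics import median

def copa_score(values, percentile=90):
    """Simplified COPA-style outlier score for one gene across samples."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return 0.0  # no spread around the median: treat as no outlier signal
    scaled = sorted((v - med) / mad for v in values)
    # Nearest-rank percentile, computed with integers to avoid rounding issues
    rank = (percentile * len(scaled) + 99) // 100
    return scaled[max(0, rank - 1)]

# Hypothetical expression values for ten samples (made up)
gene_outlier = [1, 1, 2, 2, 2, 2, 3, 3, 10, 12]  # two samples strongly elevated
gene_flat = [1, 2, 2, 3, 3, 3, 4, 4, 5, 2]       # no outlying subgroup

print(copa_score(gene_outlier), copa_score(gene_flat))
```

A t-test compares group means and can miss such a gene, because the elevated subgroup is diluted by the remaining samples; this is the motivation for offering outlier statistics alongside it.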

The interactive tool for statistical analysis of omics and clinical data (IOAT for short) is an R- and Python-based Windows application for analyzing and visualizing multi-omics and clinical data. IOAT is a user-friendly tool designed for non-programmers. It reads a comma-separated value text file imported by the user, preprocesses the multi-omics and clinical data contained in the file, and can then perform feature screening, risk assessment, clustering, and survival analysis. All results are displayed in a report, which enables users to view the outcomes of each step and thus gain a better understanding of their data. Additionally, IOAT guards against data breaches. After downloading an executable file from https://github.com/WlSunshine/IOAT-software, users can use this desktop tool without network connectivity, ensuring the security of their personal data (Wu et al. 2021b).

MEXPRESS is a simple and user-friendly web tool for visualizing and interpreting multiple omics data that does not require clinical researchers to be programmers. Users can view gene expression, DNA methylation, and clinical data extracted from TCGA by entering a gene name and selecting a cancer type. MEXPRESS can also be used to conduct statistical analyses on these datasets and determine their correlation, which is extremely useful for biomarker discovery (Koch et al. 2015). While the core functions of MEXPRESS remain unchanged in the new version released in 2019, new data types, statistical methods, and options are included. All code is available for free download at https://github.com/akoch8/mexpress (Koch et al. 2019).

PROMO is a powerful and integrative Windows software written in Matlab that is designed to analyze large genomic and clinical datasets contained in multi-omics databases effectively. It includes several features such as data preprocessing, exploration and visualization, clustering, enrichment analysis, biomarker discovery, and classification of cancer subtypes. After importing a multi-omics dataset into PROMO, users can discover correlations between features at various multi-omics levels as well as the genes involved in biological differences, resulting in a better understanding of biological mechanisms and the discovery of new biomarkers. PROMO is freely accessible to the public at http://acgt.cs.tau.ac.il/promo/ (Netanely et al. 2019).

Chromatin structures, such as topologically associating domains (TADs) and TAD boundaries, are critical for gene expression regulation. Changes in the structure of chromatin may contribute to the progression of human cancer (Valencia and Kadoch 2019). PredTAD is a machine learning tool that uses the Gradient Boosting Machine (GBM) algorithm to predict 3D chromatin structures. It makes use of genomic and epigenomic data to predict and detect TAD boundary variants in normal and cancer cell genomes. Correlations between TAD boundary alterations and the expression of nearby genes can be identified using RNA-seq data analysis. Because genes located near altered boundaries may be involved in a cascade of oncogenic signaling pathways, PredTAD is an effective tool for transforming genomic and ChIP data into an understanding of the roles of chromatin structures in cancer progression. The source code for PredTAD is available at https://github.com/jchyr-sbmi/PredTAD/ (Chyr et al. 2021).

IOBR is a computational tool for interpreting multi-omics data; its application in immuno-oncology biological research has the potential to shed new light on tumor-immune interactions and accelerate the development of immunotherapies. It is composed of four functional modules: signature and tumor microenvironment (TME) estimation, phenotype estimation, mutation estimation, and module construction. IOBR is capable of identifying signature genes and phenotype-relevant signatures, analyzing signature-associated mutations, and building models using previously identified signatures. These models can be used to forecast therapy response, prognosis for cancer, and tumor resistance. The IOBR R package can be downloaded from https://github.com/IOBR/IOBR (Zeng et al. 2021).

DrugComboExplorer, a computational systems biology tool, predicts drug combinations for specific cancer types by integrating DNA-seq, RNA-seq, methylation, and gene copy number data. It processes multi-omics data from cancer patients, identifies driver signaling networks, and quantifies the efficacy of combinatorial drugs on these networks using multiple algorithms. Combinations of optimal drugs that target driver signaling networks may be a way to cope with resistance progression. The source code for DrugComboExplorer is available at https://github.com/Roosevelt-PKU/drugcombinationprediction (Huang et al. 2019).

OncoPDSS is a system that interprets multi-omics variants detected in cancer samples as supporting evidence for clinical pharmacotherapy decision-making. It contains the OncoPDSS knowledgebase (OncoPDSSkb), which was created to store data on drug-drug interactions, clinical trials for cancer, and drug indications. OncoPDSS imports user-uploaded variants. It uses a classification strategy to determine whether pharmacotherapies are potentially effective or not based on OncoPDSSkb mutation records, cancer records, and drug records that serve as oncology pharmacotherapy evidence. As a result, this tool will significantly aid clinicians and physicians in making clinical decisions, while also providing cancer researchers with novel treatment strategies. OncoPDSS is accessible via the following link: https://oncopdss.capitalbiobigdata.com (Xu et al. 2020a).

Recent cancer projects as well as multi-omics databases provide the research community with a wealth of omics data and clinical information on cancer patients (Cieslik and Chinnaiyan 2020). Integrative analysis of these data is challenging and requires bioinformatics, statistical, and programming skills (Chakraborty et al. 2018; Park et al. 2020). Numerous tools have been built to solve this problem. However, some limitations still exist. For instance, iODA only supports the analysis of mRNA, miRNA, and ChIP-seq data (Yu et al. 2020). Efforts should be devoted to developing new tools that can be applied to all omics data types. In addition, several tools rely on the R language, which is not easy to use for researchers with limited biostatistical or programming knowledge (Eicher et al. 2020; Graw et al. 2021). Web-based interfaces should be created to allow basic researchers to leverage the merits of multi-omics tools.

4 Overview of Cancer Multi-Omics Research

4.1 Lung Cancer

Lung cancer is a highly complex and heterogeneous disease (De Sousa and Carvalho 2018). In recent decades, many studies have focused on the discovery of prognostic indicators and therapeutic targets (Jones and Baldwin 2018). Li et al. proposed a novel method for mining cancer-related gene modules based on multi-omics data. First, genome-wide regulatory networks were constructed using key regulatory factors identified by a feature selection method. Second, dysregulated gene sets were identified by comparing regulatory networks in variant and non-variant samples, and these sets were then used to generate cancer-related gene modules. This mining method has been shown to be applicable to lung cancer research (Li et al. 2019). By analyzing genomic, transcriptomic, and proteomic data, Kong et al. identified abnormally expressed membrane proteins in highly metastatic lung cancer cells. High expression levels of CDH2, EGFR, ITGA3, ITGB1, and ITGA5 and a low expression level of CALR were found to be associated with cancer metastasis (Kong et al. 2020).

Small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) are the two major types of lung cancer (Wu et al. 2020). Patients diagnosed with NSCLC account for nearly 85% of all lung cancer patients, which makes NSCLC the most common histological type of lung cancer (Wang et al. 2018). Chen et al. performed gene expression, prognosis, DNA methylation, and gene mutation analyses of the NUF2 gene. They showed that the higher the expression of NUF2, the poorer the prognosis of the patients. Thus, NUF2 might be considered a prognostic biomarker of NSCLC and a potential target for cancer treatment (Chen et al. 2014). Luan et al. integrated DNA methylation, RNA, miRNA, and DNA copy number data to construct a survival risk model. Based on this model, the chromosome regions 17q24.3 and 11p15.5 were identified as copy number variation regions associated with NSCLC patient survival (Luan et al. 2020).

NSCLC can be further divided into three main subtypes: lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), and large cell carcinoma (LCC) (Herbst et al. 2018).

Numerous potential biomarkers have been identified as a result of advancements in the molecular biology of LUAD. Llabata et al. used proteomic data together with ChIP-seq and RNA-seq assays to demonstrate that the MGA gene, which is mutated and copy number deleted in LUAD, acts as a tumor suppressor by repressing genes activated by the MYC pathway. This discovery may open new therapeutic avenues (Llabata et al. 2020). Zhang et al. estimated different tumor microenvironment infiltration patterns and the correlation between these patterns and genetic or epigenetic alterations by analyzing expression, RNA-seq, WES, and DNA methylation profiles. A prognosis model was constructed using the detected genetic and epigenetic alterations, which may aid in the development of a more accurate prognostic predictor for human LUAD (Zhang et al. 2020b). Asada et al. built an SVM to classify LUAD patients into subgroups according to their survival, using clinical data. By combining RNA expression and miRNA expression data of these subgroups, six genes were identified as associated with LUAD patient survival: ERO1B, DPY19L1, NCAM1, RET, MARCH1, and SLC7A8 (Asada et al. 2020). Lee et al. applied mRNA, miRNA, DNA methylation, and CNV data to develop a deep learning autoencoder approach for survival risk stratification. Using this model, they identified a significant prognostic difference between two groups of LUAD patients (Lee et al. 2020).

LUSC has a worse prognosis than LUAD (Zhang et al. 2020a). Numerous studies have already been conducted to ascertain the molecular characteristics of this subtype. In an integrative analysis of methylation and gene expression data, Zhang et al. found that 113 methylation features and 23 gene expression features are strongly associated with lung cancer. SFTA3 and LPP may serve as molecular markers for subtyping NSCLC (Zhang et al. 2020a). Xu et al. investigated the gene expression changes associated with DNA copy number or DNA methylation in LUSC patients by integrating genomic, transcriptomic, and epigenetic data. Seven genes were expressed at a high level, which could be due to CNV or methylation and could result in a poor prognosis (Xu et al. 2020c). Additionally, Hu et al. examined multi-omics differences between LUSC patients with high and low levels of programmed death 1 (PD1) expression. They found that 178 genes involved in immunity were significantly upregulated in the high-expression group, which may contribute to a better understanding of the relationship between PD1 and the effect of immunotherapy (Hu et al. 2020).

Pulmonary sarcomatoid carcinoma (PSC) is a rare tumor in the NSCLC family (Antoine et al. 2016). Yang et al. conducted a multi-omics analysis of PSC samples and found that PSC may arise from epithelial components and can be divided into five subtypes based on histological morphology (Yang et al. 2020b). They also showed that a large portion of patients had mutations in the p53, RTK/RAS, and PI3K pathways, suggesting that targeted therapy could be an option for patients with PSC (Yang et al. 2020b). Overall, their study shed light on the biological nature of this rare malignancy and provided entry points for its treatment (Yang et al. 2020b).

4.2 Colorectal Cancer

Colorectal cancer (CRC) is a heterogeneous disease (Berg et al. 2017; Almusawi et al. 2021). Various studies performed in recent years have provided insights into the molecular characteristics of CRC. Xu et al. explored genes related to CRC prognosis and incidence (Xu et al. 2020b). Genes annotated with single nucleotide mutation sites, copy number variation sites, and methylation sites along with differentially expressed genes were identified as candidate genes (Xu et al. 2020b). Moreover, a weighted gene co-expression network analysis was performed to search for hub genes (Xu et al. 2020b). Finally, LRRC26 and REP15 were identified as CRC-specific driver genes (Xu et al. 2020b). Yuan et al. attempted to link genetic variants, genes, and risk of CRC (Yuan et al. 2021). They conducted expression quantitative trait loci (eQTL) analysis, meta-analysis, and methylation quantitative trait loci (mQTL) analysis of 131 lead SNPs to explore potential target genes (Yuan et al. 2021). In addition, a colocalization analysis of genes identified in the previous step was performed, which revealed 66 putative susceptibility genes in CRC (Yuan et al. 2021). Ayiomamitis et al. investigated the roles of cyclooxygenase 2 (COX-2), an enzyme that promotes prostaglandin E2 (PGE2) production, and human telomerase reverse transcriptase (hTERT), a component of telomerase, in the onset of CRC (Ayiomamitis et al. 2019). By analyzing the expression levels of COX-2, PGE2, and hTERT along with telomerase activity, they demonstrated that COX-2 plays a key role in the initial stages of CRC development (Ayiomamitis et al. 2019). Also, high COX-2 expression was found to be associated with low hTERT expression and better survival among CRC patients (Ayiomamitis et al. 2019). To gain a better understanding of the clinical relationship between obesity and CRC, Holowatyj et al. performed transcriptomic analysis on visceral adipose and tumor tissues and metabolomic analysis on blood samples of CRC patients (Holowatyj et al. 2020). Combining the results generated by each omics measurement, they showed that glycolytic metabolism, GPVI signaling, and fibrosis participated in the adipose-tumor crosstalk and could promote CRC development (Holowatyj et al. 2020). Ghaffari et al. investigated the underlying mechanisms that drive metastatic progression (Ghaffari et al. 2021). They performed RNA-seq, ChIP-seq, and ATAC-seq on a CRC cell line (Ghaffari et al. 2021). Then, a statistical model was used to comprehensively analyze these multi-omics profiles along with TF-DNA binding information (Ghaffari et al. 2021). The analysis indicated that JunD, a transcription factor (TF), plays a crucial role in CRC migration and invasion (Ghaffari et al. 2021).
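The core of an eQTL test, as mentioned above, is a regression of gene expression on genotype. A minimal sketch follows, assuming an additive model in which the genotype is coded as the number of alternate alleles (0, 1, or 2). It is not the pipeline of any study cited here: real analyses adjust for covariates such as ancestry and batch and assess statistical significance, and the dosages and expression values below are made up.

```python
def eqtl_fit(dosages, expression):
    """Least-squares fit of expression = intercept + slope * dosage."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("all samples share one genotype; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: allele dosage at one SNP and expression of one gene
dosages = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

slope, intercept = eqtl_fit(dosages, expression)
print(f"effect per allele: {slope:.2f}, baseline: {intercept:.2f}")
```

The slope estimates the change in expression per additional copy of the allele; an mQTL analysis has the same form with a methylation level in place of expression.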

It is widely accepted that most colorectal cancers arise as a result of transformation from adenoma to adenocarcinoma (Lam et al. 2016), which is triggered by the stepwise accumulation of genetic and epigenetic alterations (Aarons et al. 2014). Using a deep learning framework, Lv et al. constructed a prognostic model for patients with colon adenocarcinoma (COAD) based on the TCGA and GEO databases (Lv et al. 2020). Applying this model to the TCGA dataset revealed two subgroups with significantly different survival rates. Further analysis of these two subgroups revealed 1217 differentially expressed genes and ten differentially expressed miRNAs, which may aid in deciphering the mechanisms underlying COAD development (Lv et al. 2020). Yin et al. proposed an approach to detect potential prognosis risk biomarkers (PRBs) (Yin et al. 2020). First, based on gene expression, exon expression, DNA methylation, and somatic mutation profiles along with clinical information of COAD patients, the multi-omics-based prognostic analysis (MPA) model was used to select features closely related to the prognosis of COAD patients (Yin et al. 2020). Second, they applied the protein-protein interaction (PPI) network to annotate the functions of these features (Yin et al. 2020). Finally, 13 features were identified as PRBs through further validation, which may serve as drug targets in COAD treatment (Yin et al. 2020).

CRC is also referred to as bowel cancer or colon cancer, although colon cancer (CC) is strictly a subset of it (Jahanafrooz et al. 2020). Tong et al. constructed a prognostic prediction model for CC patients by integrating clinical features, gene expression, miRNA expression, and DNA methylation data extracted from TCGA (Tong et al. 2020). Compared with models based on clinical and gene expression data, this integrative prognostic model was more effective, suggesting that the more types of omics data are integrated, the better the cancer prognostic model performs (Tong et al. 2020). Yang et al. also established a prognostic model for CC (Yang et al. 2020a). They first identified differentially methylated genes, differentially expressed genes, and differentially expressed miRNAs between tumor samples and normal samples (Yang et al. 2020a). Then, using omics features correlated with prognosis, the prognostic model was built, which might be helpful for CC research (Yang et al. 2020a). Yi et al. explored the underlying mechanisms by which Wnt/β-catenin signaling regulates the EMT program (Yi et al. 2020). It was validated that RUNX2 expression activated by the Wnt signaling pathway leads to an increase in the expression of EMT-associated genes (Yi et al. 2020). Because EMT has been shown to be highly correlated with metastasis formation and tumorigenesis (Pastushenko and Blanpain 2019), RUNX2 might serve as a prognostic biomarker for CC. Arora et al. detected the dysregulated expression pattern of seven classical non-homologous end joining (c-NHEJ) pathway genes in CC (Arora et al. 2020). Compared to normal tissues, XRCC5, XRCC6, PRKDC, and PAXX were observed to be overexpressed in tumor tissues, whereas LIG4 and NHEJ1 were downregulated (Arora et al. 2020). In addition, PAXX was identified as a prognostic biomarker (Arora et al. 2020). Thus, their study may help reveal the clinical significance of c-NHEJ pathway genes in CC. Using a novel upstream analysis strategy, Kel et al. deciphered the molecular mechanisms of resistance to methotrexate (MTX) in CC (Kel et al. 2016). This strategy mainly consists of two steps, i.e., the identification of transcription factors (TFs) and the identification of master regulators that activate these TFs (Kel et al. 2016). After applying this approach to transcriptomics, proteomics, and ChIP-seq data, PKC-alpha, TGF-alpha, TGF-beta, and alpha9-integrin were identified as anti-resistance targets (Kel et al. 2016). Their findings may provide new insight into oncology drug resistance research.

Left-sided colon cancer (LCC), which originates from the hindgut, and right-sided colon cancer (RCC), which originates from the midgut, are two subtypes of CC (Song et al. 2020). In addition to tumor location, the two subtypes differ in many respects (Shen et al. 2015). To gain a better understanding of these differences, Huang et al. analyzed transcriptomics, clinical, and somatic mutation data of patients with CC (Huang et al. 2021). A total of 360 differentially expressed genes were observed (Huang et al. 2021). BRAF and KRAS mutations were found to be frequent in RCC, whereas APC mutations were frequent in LCC (Huang et al. 2021). In addition, a 4-mRNA signature and a 6-mRNA signature were identified as prognostic signatures for LCC and RCC, respectively (Huang et al. 2021). Similarly, Hu et al. conducted a study on the differences in molecular features between LCC and RCC (Hu et al. 2018). It was revealed that PARC was hypermethylated in RCC, whereas CDX2 was hypermethylated in LCC (Hu et al. 2018). Also, miR-31, miR-155, and miR-625 were observed to be upregulated in RCC, whereas miR-296 and miR-592 were downregulated in LCC (Hu et al. 2018). In addition, the mutation rates of KRAS and BRAF were higher in RCC than in LCC, which was believed to be associated with a worse prognosis (Hu et al. 2018). Yi et al. performed a systematic analysis of the regulatory mechanisms between gene mutations and the tumor immune microenvironment (TIME) in LCC and RCC cells (Yi et al. 2021). It was revealed that mutations of the top mutated genes were strongly correlated with TIME, DNA methylation levels of some immune checkpoints, and immune-related genes and miRNAs in RCC. However, these associations were less significant in LCC (Yi et al. 2021).

4.3 Liver Cancer

Liver cancer, an extraordinarily heterogeneous disease, is caused by the interplay of various internal and environmental factors (Li and Wang 2016; Marengo et al. 2016). The development of omics strategies has helped us gain a holistic view of tumor biology. Shen et al. distinguished two molecular subtypes by analyzing genomic, epigenomic, and transcriptomic data from patients with liver cancer (Shen et al. 2021b). In addition, two prognostic molecular targets, ANXA2 and CHAF1B, were highly expressed in tumor tissues and found to be strongly related to the prognosis of liver cancer patients (Shen et al. 2021b). Their findings could provide new insight into the exploration of key biomarkers and mechanisms of liver cancer (Shen et al. 2021b).

Primary liver cancer is a serious public health issue, with HCC as the most common pathological subtype (Lin et al. 2016). Significant effort has been made to reveal the biological nature of HCC. Based on multi-omics datasets of HCC samples downloaded from the TCGA and GEO databases, Liu et al. investigated methyltransferase-like 3 (METTL3) and methyltransferase-like 14 (METTL14), both core components of the multicomponent methyltransferase complex (MTC) that catalyzes the formation of N6-methyladenosine (m6A) (Liu et al. 2020b). It was clarified that METTL3 and METTL14 influence distinct signaling pathways and biological processes, and thus may play opposite regulatory roles in HCC (Liu et al. 2020b). Using several databases, Zou et al. investigated the impact of the expression levels of CDK1, CCNB1, and CCNB2 on the survival of HCC patients (Zou et al. 2020). The upregulation of CDK1, CCNB1, and CCNB2, which might be caused by low levels of methylation or genomic alterations, was found to be highly correlated with poor prognosis in HCC patients (Zou et al. 2020). Using a multi-omics analysis combining metabolomics and absolute quantification proteomics, Nakano et al. investigated the effects of canagliflozin (CANA) on the proliferation of HCC cell lines (Nakano et al. 2020). It was shown that CANA, a sodium glucose co-transporter 2 (SGLT2) inhibitor, mainly altered oxidative phosphorylation metabolism, fatty acid metabolism, and DNA synthesis, which may suppress the proliferation of Hep3B and Huh7 cells (Nakano et al. 2020). Shen et al. performed a multi-omics analysis to explore the metabolic impact of estrogen and its receptors in HCC cells (Shen et al. 2021a). It was suggested that estrogen acts on its receptors to suppress HepG2 cell growth by altering glucose and lipid metabolism, which might be part of the reason why women have a lower risk of HCC development than men worldwide (Shen et al. 2021a). Woo et al. integrated CNV, DNA methylation, and mRNA expression data of a cohort of HCC patients to identify DNA copy-number-correlated (CNVcor) and methylation-correlated (METcor) genes (Woo et al. 2017). The frequencies of CNVcor gene aberration were significantly correlated with the frequencies of METcor gene aberration, demonstrating that the concomitant regulation of transcriptomes by alterations in DNA copy numbers and methylation should be taken into consideration in liver cancer research (Woo et al. 2017).
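The idea of labeling genes as copy-number-correlated or methylation-correlated can be illustrated with a minimal sketch. The selection criteria actually used by Woo et al. are not described here, so everything below is an assumption made for illustration: the function names, the Pearson correlation, the threshold of 0.5, the convention that a METcor gene shows a negative methylation-expression correlation, and the toy gene names and values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def classify_genes(copy_number, methylation, expression, threshold=0.5):
    """Return (cnv_correlated, methylation_correlated) gene lists.

    Assumed rule (not taken from the cited study): a gene is CNV-correlated
    if copy number and expression correlate at >= threshold, and
    methylation-correlated if methylation and expression correlate
    at <= -threshold.
    """
    cnv_correlated, met_correlated = [], []
    for gene, expr in expression.items():
        if gene in copy_number and pearson(copy_number[gene], expr) >= threshold:
            cnv_correlated.append(gene)
        if gene in methylation and pearson(methylation[gene], expr) <= -threshold:
            met_correlated.append(gene)
    return cnv_correlated, met_correlated


# Hypothetical toy data: one value per patient for each gene.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [4.0, 3.0, 2.0, 1.0],
    "GENE_C": [1.0, 3.0, 2.0, 2.5],
}
copy_number = {
    "GENE_A": [2, 3, 4, 5],
    "GENE_B": [2, 2, 2, 2],
    "GENE_C": [2, 2, 3, 2],
}
methylation = {
    "GENE_A": [0.5, 0.5, 0.5, 0.5],
    "GENE_B": [0.1, 0.3, 0.6, 0.9],
    "GENE_C": [0.4, 0.5, 0.4, 0.5],
}

cnv_genes, met_genes = classify_genes(copy_number, methylation, expression)
print("CNV-correlated:", cnv_genes)
print("Methylation-correlated:", met_genes)
```

In this toy example, GENE_A tracks its copy number, GENE_B is anti-correlated with its methylation level, and GENE_C meets neither criterion. A real analysis would also need multiple-testing correction and a principled choice of threshold.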

In developing countries where hepatitis B virus (HBV) infection is prevalent, HBV remains the most common etiologic agent of HCC (Chang 2014). Much work has also been done to uncover the direct and indirect mechanisms involved in HCC oncogenesis by HBV (Xie 2017). Through the integration of proteomics and metabolomics assays, Xie et al. explored the mechanisms of HBV-induced HCC (Xie et al. 2017). They demonstrated that HBV core protein might contribute to the progression of HCC by modifying glycolysis and amino acid metabolism (Xie et al. 2017). Consequently, HBV core protein could represent a promising target for antiviral therapy (Xie et al. 2017). Aiming to identify novel biomarkers in HCC, Miao et al. performed multi-omics analyses integrating genomic, transcriptomic, and clinicopathological data of patients with HBV-related multifocal HCC (Miao et al. 2014). Six genes with abnormal expression levels were identified (Miao et al. 2014). Among them, TTK might be an overall prognostic indicator for HCC, because the expression level of TTK was shown to be highly correlated with metastatic potential, postsurgical recurrence, and survival of HCC patients (Miao et al. 2014). Gao et al. conducted a comprehensive proteogenomic characterization of tumor and adjacent liver samples from 159 HCC patients with HBV infection (Gao et al. 2019). Two metabolic enzymes, PYCR2 and ADH1A, were found to participate in HCC metabolic reprogramming (Gao et al. 2019). Because the upregulation of PYCR2 or downregulation of ADH1A may result in HCC progression, they were also validated as potential prognostic biomarkers (Gao et al. 2019).

Since accurate stratification is essential for clinical decision making (Preisser et al. 2020), various stratification methods have been developed for cohorts of HCC patients. Chaudhary et al. proposed a deep learning-based model derived from RNA-seq, miRNA-seq, CpG methylation, and clinical data of HCC samples to identify two subgroups with significantly different survival (Chaudhary et al. 2018). The more aggressive subgroup was shown to be associated with TP53 inactivation mutations and Wnt pathway activation (Chaudhary et al. 2018). Therefore, this risk stratification model may be useful for HCC prognosis prediction as well as therapeutic intervention (Chaudhary et al. 2018). Ouyang et al. developed an integration method and applied it to mRNA expression data, DNA methylation data, somatic mutation data, and clinical information of HCC samples (Ouyang et al. 2020). Thirty-four differentially expressed genes (DEGs) were identified, some of which were verified as diagnostic biomarkers for HCC (Ouyang et al. 2020). According to the gene expression data of the aforementioned DEGs, tumor samples were divided into three subtypes that displayed different biological processes (Ouyang et al. 2020). Hence, their findings might help improve precision medicine for HCC (Ouyang et al. 2020).
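Several of the studies above report subgroups with significantly different survival. One standard way to assess such a difference is the two-group log-rank test, sketched below in plain Python. This is not the method of any cited study, which the text does not specify; it is only an illustration of how two already-defined subgroups might be compared. The function name and the toy follow-up times are hypothetical, and the p-value uses the chi-square distribution with one degree of freedom.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi_square, p_value).

    times_*  : follow-up time per patient
    events_* : 1 if the event (death) was observed, 0 if censored
    """
    samples = [(t, e, 0) for t, e in zip(times_a, events_a)]
    samples += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in samples if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk_a = sum(1 for s, _, g in samples if s >= t and g == 0)
        at_risk_b = sum(1 for s, _, g in samples if s >= t and g == 1)
        deaths_a = sum(1 for s, e, g in samples if s == t and e and g == 0)
        deaths_b = sum(1 for s, e, g in samples if s == t and e and g == 1)
        n = at_risk_a + at_risk_b
        d = deaths_a + deaths_b
        observed_a += deaths_a
        expected_a += d * at_risk_a / n
        if n > 1:
            # Hypergeometric variance of the number of deaths in group A.
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical toy cohort: a high-risk and a low-risk subgroup (times in months).
high_risk_times, high_risk_events = [2, 3, 4, 5, 6], [1, 1, 1, 1, 1]
low_risk_times, low_risk_events = [8, 9, 10, 11, 12], [1, 1, 0, 1, 0]

chi_square, p_value = logrank_test(
    high_risk_times, high_risk_events, low_risk_times, low_risk_events
)
print(f"chi-square = {chi_square:.2f}, p = {p_value:.4f}")
```

In practice a maintained survival analysis library would be preferable to a hand-written test. The sketch only makes explicit what "significantly different survival" means for two subgroups.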

Advanced molecular biological techniques, together with an improving understanding of the complex mechanisms of liver cancer, have driven the development of precision medicine (Yoo et al. 2018). Yildiz analyzed datasets generated by high-throughput drug screening and by genomic and transcriptomic studies on HCC cell lines (Yildiz 2018). The HCC cells were divided into two subtypes that responded differently to the same drug treatments (Yildiz 2018). Six molecular targets were revealed to be associated with drug sensitivity, which could aid the development of effective molecular therapies (Yildiz 2018). Also, the EGFR/PI3K/AKT/mTOR signaling pathway was believed to play a central role in the regulation of sensitivity and resistance to drug treatments in HCC (Yildiz 2018). Dimitrakopoulos et al. used a computational approach to explore novel drug targets in mTOR-driven HCC (Dimitrakopoulos et al. 2021). Seventy-four mediators under the impact of upstream genetic aberrations and changes in miRNA expression were identified, among which YAP1, GRB2, HDAC4, SIRT1, and LIS1 were validated to be dysregulated in human HCC (Dimitrakopoulos et al. 2021). Thus, inhibitors of these mediators may be potentially useful in HCC treatment (Dimitrakopoulos et al. 2021).

5 Conclusion

Multi-omics clearly has advantages when it comes to translating the biological characteristics of cancer into understandable and clinically interpretable data. The advancement of multi-omics research in the context of a specific cancer reveals numerous “invisible” but critical correlations. Multiple biomarkers have a higher specificity than previous single-gene markers, laying the groundwork for future research in this field. The identification of specific markers enables cancer diagnosis and subsequent treatment, better patient stratification, and the development of more effective and personalized treatment methods.

As mentioned previously, multi-omics methods have been successfully applied to colorectal cancer, liver cancer, and lung cancer, yielding a wealth of biological data. As methods and resources for multi-omics analysis mature, multi-omics research will play an increasingly important role in understanding the pathogenesis of cancer and developing effective treatment measures.

However, there is a growing gap between the ability to generate large amounts of omics data and the ability to integrate, process, and interpret them. The majority of data standardization efforts and attempts to develop a central public database of omics data have been abandoned. At the same time, the majority of tools for multi-omics integration are insufficiently robust, prone to errors, and suitable only for advanced users with programming expertise. There is still a long way to go before multi-omics analysis is widely applied and its value is maximized in cancer research.