Abstract
The complexities of tumor genomes are rapidly being uncovered, but how they are regulated into functional proteomes remains poorly understood. Standard proteomics workflows use databases of known proteins, but these databases do not capture the uniqueness of the cancer transcriptome, with its point mutations, unusual splice variants and gene fusions. Onco-proteogenomics integrates mass spectrometry–generated data with genomic information to identify tumor-specific peptides. Linking tumor-derived DNA, RNA and protein measurements into a central-dogma perspective has the potential to improve our understanding of cancer biology.
Change history
05 November 2014
In the version of this article initially published, page numbers were missing from reference number 30. The error has been corrected in the HTML and PDF versions of the article.
Acknowledgements
T.K. is supported through the Canadian Research Chairs Program. This work was supported in part by grants from the Canadian Institute of Health Research (MOP-93772 to T.K. and MOP-114896 to T.K. and P.C.B.), by the Ontario Ministry of Health and Long Term Care, with the support of the Ontario Institute for Cancer Research to P.C.B. through funding provided by the Government of Ontario and by a Movember Discovery grant from Prostate Cancer Canada (D2013-21) to T.K. and P.C.B. This work was supported by Prostate Cancer Canada and is proudly funded by the Movember Foundation (#RS2014-01). P.C.B. was supported by a Terry Fox Research Institute New Investigator Award. J.A.A. is supported by a Natural Sciences and Engineering Research Council of Canada doctoral fellowship (CGS-D). A.S. was supported by a Department of Medical Biophysics Excellence Award and by a Kristi Piia CALLUM Memorial Fellowship. The authors thank R.X. Sun and J.D. Watson for critical reading of the manuscript.
Author information
Contributions
T.K. and P.C.B. conceived of the project. J.A.A., A.S., T.K. and P.C.B. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
About this article
Cite this article
Alfaro, J., Sinha, A., Kislinger, T. et al. Onco-proteogenomics: cancer proteomics joins forces with genomics. Nat Methods 11, 1107–1113 (2014). https://doi.org/10.1038/nmeth.3138
DOI: https://doi.org/10.1038/nmeth.3138
- Springer Nature America, Inc.