Abstract
Continual improvements in mass spectrometry technologies and laboratory workflows have enabled the proteomic investigation of biological samples of growing complexity. Microbiomes are one such class of complex samples, and metaproteomic analyses of them are becoming increasingly popular. Metaproteomics experiments generate large amounts of data from which biologically relevant signal must be efficiently extracted to draw meaningful conclusions. Such data processing requires bioinformatics tools that are either specifically developed for, or capable of handling, metaproteomics data. In this chapter, we outline current and novel tools that can perform the most commonly used steps in the analysis of cutting-edge metaproteomics data, such as peptide and protein identification and quantification, as well as data normalization, imputation, mining, and visualization. We also describe the experimental setups in which these tools should be used.
Acknowledgements
This work was supported by Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery grants to M.L.A. and D.F. D.F. also received funding from the Government of Canada through Genome Canada and the Ontario Genomics Institute (OGI-156), from NSERC (grant no. 210034), and from the Ontario Ministry of Economic Development and Innovation (ORF-DIG-14405). C.M.A.S. was funded by a stipend from the NSERC CREATE in Technologies for Microbiome Science and Engineering (TECHNOMISE) Program.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Simopoulos, C.M.A., Figeys, D., Lavallée-Adam, M. (2022). Novel Bioinformatics Strategies Driving Dynamic Metaproteomic Studies. In: Geddes-McAlister, J. (eds) Proteomics in Systems Biology. Methods in Molecular Biology, vol 2456. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2124-0_22
DOI: https://doi.org/10.1007/978-1-0716-2124-0_22
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2123-3
Online ISBN: 978-1-0716-2124-0
eBook Packages: Springer Protocols