Abstract
Mass spectrometry (MS)-based proteomics is currently the most successful approach for measuring and comparing peptides and proteins in a wide variety of biological samples. Modern mass spectrometers, equipped with high-resolution analyzers, generate large amounts of data. This is particularly true of shotgun/bottom-up proteomics, which consists of the enzymatic digestion of proteins into peptides that are then measured by MS instruments in data-dependent acquisition (DDA) mode. Dedicated bioinformatic tools and platforms have been developed to cope with the increasing size and complexity of the raw MS data that must be processed and interpreted for large-scale protein identification and quantification. This chapter illustrates the most popular bioinformatics solutions for the analysis of shotgun MS-proteomics data. It gives a general description of the data preprocessing options and the different search engines available, including practical suggestions, based on hands-on experience, on how to optimize the parameters for peptide search.
Change history
22 January 2022
In the original version of this book, chapter 16 was published non-open access. It has now been changed to open access under a CC BY 4.0 license, and the copyright holder has been updated to “The Author(s).” This book has been updated with these changes.
Funding
T.B.’s research activity is supported by grants from the Italian Association for Cancer Research (grant# IG-2018-21834) and by EPIC-XS, project number 823839, funded by the Horizon 2020 program of the European Union; F.M. is sponsored by a postdoctoral fellowship from FIEO-CCM.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Yadav, A., Marini, F., Cuomo, A., Bonaldi, T. (2021). Software Options for the Analysis of MS-Proteomic Data. In: Cecconi, D. (eds) Proteomics Data Analysis. Methods in Molecular Biology, vol 2361. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1641-3_3
DOI: https://doi.org/10.1007/978-1-0716-1641-3_3
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1640-6
Online ISBN: 978-1-0716-1641-3
eBook Packages: Springer Protocols