Software Options for the Analysis of MS-Proteomic Data

Yadav, Avinash; Marini, Federica; Cuomo, Alessandro; Bonaldi, Tiziana

doi:10.1007/978-1-0716-1641-3_3

Part of the book series: Methods in Molecular Biology (MIMB, volume 2361)

The original version of this chapter was revised. The correction to this chapter is available at https://doi.org/10.1007/978-1-0716-1641-3_19

Abstract

Mass spectrometry (MS)-based proteomics is currently the most successful approach for measuring and comparing peptides and proteins across a wide variety of biological samples. Modern mass spectrometers, equipped with high-resolution analyzers, generate large amounts of data. This is particularly true of shotgun (bottom-up) proteomics, in which proteins are enzymatically digested into peptides that are then measured by the MS instrument in data-dependent acquisition (DDA) mode. Dedicated bioinformatic tools and platforms have been developed to cope with the increasing size and complexity of raw MS data, which must be processed and interpreted for large-scale protein identification and quantification. This chapter illustrates the most popular bioinformatics solutions for the analysis of shotgun MS-proteomics data. It gives a general description of the data preprocessing options and the different search engines available, and offers practical suggestions, based on hands-on experience, on how to optimize the parameters for peptide search.

Change history

22 January 2022
In the original version of this book, chapter 16 was published non-open access. It has now been changed to open access under a CC BY 4.0 license, and the copyright holder has been updated to “The Author(s).” This book has been updated with these changes.

Funding

T.B.’s research is supported by grants from the Italian Association for Cancer Research (grant no. IG-2018-21834) and by EPIC-XS (project number 823839), funded by the Horizon 2020 programme of the European Union; F.M. is supported by a postdoctoral fellowship from FIEO-CCM.

Author information

Avinash Yadav and Federica Marini contributed equally to this work.

Authors and Affiliations

Department of Experimental Oncology, European Institute of Oncology (IEO), IRCCS, Milan, Italy
Avinash Yadav, Federica Marini, Alessandro Cuomo & Tiziana Bonaldi

Corresponding author

Correspondence to Tiziana Bonaldi.

Editor information

Editors and Affiliations

Department of Biotechnology, University of Verona, Verona, Italy
Daniela Cecconi

Cite this protocol

Yadav, A., Marini, F., Cuomo, A., Bonaldi, T. (2021). Software Options for the Analysis of MS-Proteomic Data. In: Cecconi, D. (eds) Proteomics Data Analysis. Methods in Molecular Biology, vol 2361. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1641-3_3

DOI: https://doi.org/10.1007/978-1-0716-1641-3_3
Published: 09 July 2021
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1640-6
Online ISBN: 978-1-0716-1641-3
eBook Packages: Springer Protocols
