Abstract
We present a framework for the analysis of multiplexed mass spectrometry proteomics data that reduces estimation error when combining multiple isobaric batches. Variations in the number and quality of observations have long complicated the analysis of isobaric proteomics data. Here we show that the power to detect statistical associations is substantially improved by utilizing models that directly account for known sources of variation in the number and quality of observations that occur across batches.
In a multibatch benchmarking experiment, our open-source software (msTrawler) increases the power to detect changes, especially in the range of less than twofold changes, while simultaneously increasing quantitative proteome coverage by utilizing more low-signal observations. Further analyses of previously published multiplexed datasets of 4 and 23 batches highlight both increased power and the ability to navigate complex missing data patterns without relying on unverifiable imputations or discarding reliable measurements.
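As a rough illustration of why accounting for observation quality can help, the sketch below compares the variance of an unweighted mean with that of an inverse-variance weighted mean for a few measurements of the same quantity. It is not the msTrawler model; the function names and the example variances are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from typing import Sequence


def unweighted_mean_variance(variances: Sequence[float]) -> float:
    """Variance of the plain average of independent observations."""
    n = len(variances)
    return sum(variances) / n ** 2


def inverse_variance_mean(values: Sequence[float],
                          variances: Sequence[float]) -> float:
    """Average in which each observation is weighted by 1 / variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


def inverse_variance_mean_variance(variances: Sequence[float]) -> float:
    """Variance of the inverse-variance weighted average."""
    return 1.0 / sum(1.0 / v for v in variances)


if __name__ == "__main__":
    # Hypothetical variances: two high-signal observations and one
    # low-signal (noisy) observation of the same log ratio.
    all_obs = [0.01, 0.04, 1.0]
    high_signal_only = [0.01, 0.04]

    print("unweighted, all observations:   ", unweighted_mean_variance(all_obs))
    print("unweighted, noisy one discarded:", unweighted_mean_variance(high_signal_only))
    print("weighted, all observations:     ", inverse_variance_mean_variance(all_obs))
```

Under these assumed variances, the weighted estimate that keeps the noisy observation has lower variance than either the unweighted average or the average that discards the noisy observation, which is the intuition behind retaining low-signal measurements rather than filtering them out.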
Code availability
The msTrawler R package is available at www.github.com/Calico/msTrawler, and the supplementary data and code used to generate the analyses in this paper are available at https://console.cloud.google.com/storage/browser/mstrawler_paper. A patent has been filed for the msTrawler data analysis framework and workflows.
Acknowledgements
We thank all members of the mass spectrometry and computational teams at Calico Life Sciences LLC for assistance and helpful discussions, in particular E. Melamud, B. Bennett, L. Chan, T. Nguyen, P. Seitzer and J. Xu. We thank the IT teams, in particular A. Chekholko, for supporting the in-house data analysis software. We also thank S. Gygi, D. Schweppe, J. Mintseris, E. Huttlin, M. Wühr and B. Qaqish for helping us to clarify the key messages in the paper. Funding for this work was provided by Calico Life Sciences LLC.
Author information
Contributions
Experiments were conceived and planned by J.J.O., A.R., A.W. and F.E.M. Experiments were carried out by A.W., A.G. and N.O. The new algorithms were created by J.J.O. The software was developed by J.J.O. and W.L. Data analyses and interpretations were performed by J.J.O., W.L., A.R., A.G., D.G.H., N.O. and F.E.M. The paper was written by J.J.O. with input from all authors.
Ethics declarations
Competing interests
All authors were employees of Calico Life Sciences LLC at the time of submission.
Peer review
Peer review information
Nature Methods thanks Samuel Payne and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Appendix 1 (Supplementary Table 1 and Fig. 1), Appendix 2 (Supplementary Figs. 2 and 3), Appendix 3 (Supplementary Fig. 4 and Tables 2 and 3), Appendix 4 (Supplementary Fig. 5), Appendix 5 (Supplementary Fig. 6), Supplementary Figs. 7–10, Appendix 6 (supplementary methods and Table 4) and references.
Supplementary Dataset 1
msTrawler results from the interbatch experiment.
Supplementary Dataset 2
Worksheet containing results from the msTrawler reanalysis of the Hayflick time course.
Supplementary Dataset 3
msTrawler results from the reanalysis of the pediatric brain tumor data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
O’Brien, J.J., Raj, A., Gaun, A. et al. A data analysis framework for combining multiple batches increases the power of isobaric proteomics experiments. Nat Methods 21, 290–300 (2024). https://doi.org/10.1038/s41592-023-02120-6
DOI: https://doi.org/10.1038/s41592-023-02120-6
- Springer Nature America, Inc.