Abstract
Tandem mass spectrometry (MS/MS) is the most important method for peptide and protein identification. One approach to interpreting MS/MS data is de novo sequencing, which is becoming increasingly accurate and important. However, de novo sequencing can usually determine only partial sequences with confidence, while the undetermined parts are represented by “mass gaps”. We call such a partially determined sequence a gapped sequence tag. When a gapped sequence tag is searched in a database for protein identification, the determined parts should match the database sequence exactly, while each mass gap should match a substring of amino acids whose masses sum to the value of the mass gap. In this case, standard string matching algorithms no longer work. In this paper, we present a new efficient algorithm for finding the matches of gapped sequence tags in a protein database.
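To make the matching rule concrete, the following is a minimal brute-force sketch of what it means for a gapped sequence tag to match a database sequence. It is not the automaton-based algorithm the paper presents; it only illustrates the problem definition. The tag representation (a list mixing determined strings and numeric mass gaps), the function names `match_at` and `find_matches`, and the tolerance `tol` are assumptions made for illustration. The residue masses are the standard monoisotopic values in daltons.

```python
from typing import List, Sequence, Union

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

# Hypothetical tag encoding: str = determined part, float = mass gap.
TagPart = Union[str, float]


def match_at(tag: Sequence[TagPart], db: str, pos: int, tol: float = 0.01) -> bool:
    """Return True if `tag` matches `db` starting exactly at index `pos`."""
    if not tag:
        return True
    head, rest = tag[0], tag[1:]
    if isinstance(head, str):
        # Determined parts must match the database sequence exactly.
        return db.startswith(head, pos) and match_at(rest, db, pos + len(head), tol)
    # Mass gap: extend a substring until its residue masses reach the gap value.
    total = 0.0
    end = pos
    while end < len(db) and total < head - tol:
        mass = RESIDUE_MASS.get(db[end])
        if mass is None:  # non-standard residue letter: no match through it
            return False
        total += mass
        end += 1
        if abs(total - head) <= tol:
            # Masses are positive, so for a small tolerance only this
            # substring end can fit the gap.
            return match_at(rest, db, end, tol)
    return False


def find_matches(tag: Sequence[TagPart], db: str, tol: float = 0.01) -> List[int]:
    """Return every start index in `db` at which `tag` matches."""
    return [i for i in range(len(db) + 1) if match_at(tag, db, i, tol)]


# Example: the gap equals mass(T) + mass(F), so it matches "TF" or "FT".
gap = RESIDUE_MASS["T"] + RESIDUE_MASS["F"]
print(find_matches(["WV", gap, "IS"], "MKWVTFISLL"))
```

This sketch rescans the database from every start position, which is the cost an efficient algorithm aims to avoid. It also assumes a tolerance much smaller than the lightest residue mass, so that each mass gap has at most one valid end point from a given start.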
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Han, Y., Ma, B., Zhang, K. (2005). An Automata Approach to Match Gapped Sequence Tags Against Protein Database. In: Domaratzki, M., Okhotin, A., Salomaa, K., Yu, S. (eds) Implementation and Application of Automata. CIAA 2004. Lecture Notes in Computer Science, vol 3317. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30500-2_16
DOI: https://doi.org/10.1007/978-3-540-30500-2_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24318-2
Online ISBN: 978-3-540-30500-2
eBook Packages: Computer Science, Computer Science (R0)