Match Chaining Algorithms for cDNA Mapping

Shibuya, Tetsuo; Kurochkin, Igor

doi:10.1007/978-3-540-39763-2_33


Part of the book series: Lecture Notes in Computer Science (LNBI, volume 2812)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics


Abstract

We propose a new algorithm, MCCM (Match Chaining-based cDNA Mapping), that maps cDNAs to genomes efficiently and accurately using local matches called MUMs (maximal unique matches) or MRMs (maximal rare matches), which are obtained with suffix trees. From the MUMs (or MRMs), the algorithm selects the matches that are relevant to the cDNA mapping; we call this selection the match chaining problem. Several O(k log k)-time algorithms are known, where k is the number of input matches, but they do not permit the matches to overlap. We propose a new O(k log k)-time algorithm for the problem that allows overlaps; previously, only an O(k²)-time algorithm existed for this case. We also incorporate a restriction on the distances between matches for accurate cDNA mapping. We examine the performance of the algorithm through computational experiments using sequences from the FANTOM mouse cDNA database and the mouse genome. In these experiments, the MCCM algorithm is both very fast and very accurate: we achieved >95% specificity and >97% sensitivity simultaneously, measured against the mapping results of the FANTOM annotators.
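
To make the match chaining problem concrete, the following is a minimal sketch of the classical O(k log k) variant that does not permit overlaps, which the abstract mentions as prior work. It is not the MCCM algorithm: it handles neither overlapping matches nor the distance restriction. The function name `chain_matches`, the `(q_start, g_start, length)` tuple layout (cDNA position, genome position, match length), and the scoring by total matched length are all assumptions made for illustration. The sketch sweeps over cDNA positions and keeps a Fenwick tree of best chain scores indexed by genome end position.

```python
from bisect import bisect_right

def chain_matches(matches):
    """Best non-overlapping collinear chain, scored by total match length.

    matches: list of (q_start, g_start, length) with length > 0 (assumed layout).
    Returns (best_score, chain) where chain lists match indices in order.
    """
    k = len(matches)
    if k == 0:
        return 0, []
    # Compress genome end coordinates so they can index a Fenwick tree.
    g_ends = sorted({g + n for _, g, n in matches})
    size = len(g_ends)
    tree = [(0, -1)] * (size + 1)  # prefix maxima of (score, match index)

    def update(pos, val):
        while pos <= size:
            if val > tree[pos]:
                tree[pos] = val
            pos += pos & -pos

    def query(pos):
        best = (0, -1)
        while pos > 0:
            if tree[pos] > best:
                best = tree[pos]
            pos -= pos & -pos
        return best

    # Sweep over cDNA positions; at equal positions, ends (0) sort before starts (1).
    events = []
    for i, (q, g, n) in enumerate(matches):
        events.append((q, 1, i))
        events.append((q + n, 0, i))
    events.sort()

    score = [0] * k
    prev = [-1] * k
    for _, kind, i in events:
        q, g, n = matches[i]
        if kind == 1:
            # Best chain among finished matches whose genome end is <= g.
            best, j = query(bisect_right(g_ends, g))
            score[i] = best + n
            prev[i] = j
        else:
            # The match has ended on the cDNA axis; make it available.
            update(bisect_right(g_ends, g + n), (score[i], i))

    # Trace back from the highest-scoring match.
    end = max(range(k), key=score.__getitem__)
    best_score = score[end]
    chain = []
    while end != -1:
        chain.append(end)
        end = prev[end]
    chain.reverse()
    return best_score, chain
```

For example, with the hypothetical input `[(0, 100, 10), (10, 110, 5), (12, 50, 20), (20, 200, 8)]`, the third match is longest but lies out of order on the genome, so the sketch chains the other three. Sorting the events and each tree operation cost O(log k) per match, which gives the O(k log k) total.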



Author information

Authors and Affiliations

IBM Tokyo Research Laboratory, 1623-14, Shimo-tsuruma, Yamato, Kanagawa, 242-8502, Japan
Tetsuo Shibuya
RIKEN Genomic Sciences Center, 1-7-22, Suehiro-cho, Tsurumi, Yokohama, Kanagawa, 230-0045, Japan
Igor Kurochkin


Editor information

Editors and Affiliations

Department of Biomathematical Sciences, The Mount Sinai School of Medicine, 10029-6574, New York, NY
Gary Benson
Institute of Biomedical and Life Sciences, Division of Environmental and Evolutionary Biology, University of Glasgow, G12 8QQ, Glasgow, Scotland
Roderic D. M. Page


About this paper

Cite this paper

Shibuya, T., Kurochkin, I. (2003). Match Chaining Algorithms for cDNA Mapping. In: Benson, G., Page, R.D.M. (eds) Algorithms in Bioinformatics. WABI 2003. Lecture Notes in Computer Science, vol 2812. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39763-2_33


DOI: https://doi.org/10.1007/978-3-540-39763-2_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20076-5
Online ISBN: 978-3-540-39763-2
eBook Packages: Springer Book Archive
