Abstract
A pair in a string is the occurrence of the same substring twice. A pair is maximal if the two occurrences of the substring cannot be extended to the left and right without making them different. The gap of a pair is the number of characters between the two occurrences of the substring. In this paper we present methods for finding all maximal pairs under various constraints on the gap. In a string of length n we can find all maximal pairs with gap in an upper and lower bounded interval in time O(n log n+z) where z is the number of reported pairs. If the upper bound is removed the time reduces to O(n+z). Since a tandem repeat is a pair where the gap is zero, our methods can be seen as a generalization of finding tandem repeats. The running time of our methods equals the running time of well known methods for finding tandem repeats.
Supported by the ESPRIT Long Term Research Programme of the EU under project number 20244 (ALCOM-IT).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
G.M. Adel’son-Vel’skii and Y.M. Landis. An algorithm for the organization of information. Doklady Akademii Nauk SSSR, 146:263–266, 1962. English translation in Soviet Math. Dokl., 3:1259-1262.
A. Apostolico and F.P. Preparata. Optimal off-line detection of repetitions in a string. Theoretical Computer Science, 22:297–315, 1983.
G.S. Brodal, R.B. Lyngsø, C.N.S. Pedersen, and J. Stoye. Finding maximal pairs with bounded gap. Technical Report RS-99-12, BRICS, April 1999.
M.R. Brown and R.E. Tarjan. A fast merging algorithm. Journal of the ACM, 26(2):211–226, 1979.
M. Crochemore. An optimal algorithm for computing the repetitions in a word. Information Processing Letters, 12(5):244–250, 1981.
M. Crochemore. Tranducers and repetitions. Theoretical Computer Science, 45:63–86, 1986.
M. Farach. Optimal sufix tree construction with large alphabets. In Proceedings of the 38th Annual Symposium on Foundations of Computer Science (FOCS), pages 137–143, 1997.
L.J. Guibas and R. Sedgewick. A dichromatic framework for balanced trees. In Proceedings of the 19th Annual Symposium on Foundations of Computer Science (FOCS), pages 8–21, 1978.
D. Gusfield. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, 1997.
D. Gusfield and J. Stoye. Linear time algorithms for_nding and representing all the tandem repeats in a string. Technical Report CSE-98-4, Department of Computer Science, UC Davis, 1998.
S. Huddleston and K. Mehlhorn. A new data structure for representing sorted lists. Acta Informatica, 17:157–184, 1982.
F.K. Hwang and S. Lin. A simple algorithm for merging two disjoint linearly ordered sets. SIAM Journal on Computing, 1(1):31–39, 1972.
S. Karlin, M. Morris, G. Ghandour, and M.-Y. Leung. Efficient algorithms for molecular sequence analysis. Proceedings of the National Academy of Science, USA, 85:841–845, 1988.
R. Kolpakov and G. Kucherov. Maximal repetitions in words or how to find all squares in linear time. Technical Report 98-R-227, LORIA, 1998.
S.R. Kosaraju. Computation of squares in a string. In Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 807 of Lecture Notes in Computer Science, pages 146–150, 1994.
G.M. Landau and J.P. Schmidt. An algorithm for approximate tandem repeats. In Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 684 of Lecture Notes in Computer Science, pages 120–133, 1993.
M.-Y. Leung, B.E. Blaisdell, C. Burge, and S. Karlin. An efficient algorithm for identifying matches with errors in multiple long molecular sequences. Journal of Molecular Biology, 221:1367–1378, 1991.
M.G. Main and R.J. Lorentz. An O(n log n) algorithm for finding all repetitions in a string. Journal of Algorithms, 5:422–432, 1984.
M.G. Main and R.J. Lorentz. Linear time recognition of squarefree strings. In A. Apostolico and Z. Galil, editors, Combinatorial Algorithms on Words, volume F12 of NATO ASI Series, pages 271–278. Springer, Berlin, 1985.
E.M. McCreight. A space-economical sufix tree construction algorithm. Journal of the ACM, 23(2):262–272, 1976.
K. Mehlhorn. Sorting and Searching, volume 1 of Data Structures and Algorithms. Springer-Verlag, 1994.
K. Mehlhorn and S. Näher. The LEDA Platform of Combinatorial and Geometric Computing. Cambridge University Press, 1999. To appear. See http://www.mpisb.mpg.de/_mehlhorn/LEDAbook.html.
M.-F. Sagot and E.W. Myers. Identifying satellites in nucleic acid sequences. In Proceedings of the 2nd Annual International Conference on Computational Molecular Biology (RECOMB), pages 234–242, 1998.
J. Stoye and D. Gusfield. Simple and flexible detection of contiguous repeats using a sufix tree. In Proceedings of the 9th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 1448 of Lecture Notes in Computer Science, pages 140–152, 1998.
E. Ukkonen. On-line construction of sufix trees. Algorithmica, 14:249–260, 1995.
P. Weiner. Linear pattern matching algorithms. In Proceedings of the 14th Symposium on Switching and Automata Theory, pages 1–11, 1973.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brodal, G.S., Lyngsø, R.B., Pedersen, C.N.S., Stoye, J. (1999). Finding Maximal Pairs with Bounded Gap. In: Crochemore, M., Paterson, M. (eds) Combinatorial Pattern Matching. CPM 1999. Lecture Notes in Computer Science, vol 1645. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48452-3_11
Download citation
DOI: https://doi.org/10.1007/3-540-48452-3_11
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66278-5
Online ISBN: 978-3-540-48452-3
eBook Packages: Springer Book Archive