Abstract
Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of Parikh vector q (a “jumbled string”) in the text s requires to find a substring t of s with p(t) = q. The corresponding decision problem is to verify whether at least one such match exists. So, for example for the alphabet Σ = {a, b, c}, the string s = abaccbabaaa has Parikh vector p(s) = (6,3,2), and the Parikh vector q = (2,1,1) appears once in s in position (1,4). Like its more precise counterpart, the renown Exact String Matching, Jumbled Pattern Matching has ubiquitous applications, e.g., string matching with a dyslectic word processor, table rearrangements, anagram checking, Scrabble playing and, allegedly, also analysis of mass spectrometry data. We consider two simple algorithms for Jumbled Pattern Matching and use very complicated data structures and analytic tools to show that they are not worse than the most obvious algorithm. We also show that we can achieve non-trivial efficient average case behavior, but that’s less fun to describe in this abstract so we defer the details to the main part of the article, to be read at the reader’s risk...well, at the reader’s discretion.
Part of this work was done while F.C. and Zs.L. were visiting the Alfréd Rényi Institute of Mathematics in Budapest, Hungary, within the EU Marie Curie Transfer of Knowledge project “Hungarian Bioinformatics (HUBI).”
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Amir, A., Apostolico, A., Landau, G.M., Satta, G.: Efficient text fingerprinting via Parikh mapping. J. Discrete Algorithms 1(5-6), 409–421 (2003)
Babai, L., Felzenszwalb, P.F.: Computing rank-convolutions with a mask. ACM Trans. Algorithms 6(1), 1–13 (2009)
Bellman, R., Karush, W.: Mathematical programming and the maximum transform. Journal of the Soc. for Industrial and Applied Math. 10(3), 550–567 (1962)
Benson, G.: Composition alignment. In: Benson, G., Page, R.D.M. (eds.) WABI 2003. LNCS (LNBI), vol. 2812, pp. 447–461. Springer, Heidelberg (2003)
Böcker, S.: Simulating multiplexed SNP discovery rates using base-specific cleavage and mass spectrometry. Bioinformatics 23(2), 5–12 (2007)
Böcker, S., Lipták, Z.: A fast and simple algorithm for the Money Changing Problem. Algorithmica 48(4), 413–432 (2007)
Bremner, D., Chan, T.M., Demaine, E.D., Erickson, J., Hurtado, F., Iacono, J., Langerman, S., Taslakian, P.: Necklaces, convolutions, and X + Y. In: Azar, Y., Erlebach, T. (eds.) ESA 2006. LNCS, vol. 4168, pp. 160–171. Springer, Heidelberg (2006)
Butman, A., Eres, R., Landau, G.M.: Scaled and permuted string matching. Inf. Process. Lett. 92(6), 293–297 (2004)
Chan, T.M.: All-pairs shortest paths with real weights in O(n 3/logn) time. Algorithmica 50(2), 236–243 (2008)
Cicalese, F., Fici, G., Lipták, Z.: Searching for jumbled patterns in strings. In: Proc. of the Prague Stringology Conference 2009, pp. 105–117 (2009)
Cieliebak, M., Erlebach, T., Lipták, Z., Stoye, J., Welzl, E.: Algorithmic complexity of protein identification: combinatorics of weighted strings. Discrete Applied Mathematics 137(1), 27–46 (2004)
Clark, D.: Compact pat trees. PhD thesis, University of Waterloo, Canada (1996)
Eppstein, D.A.: Efficient algorithms for sequence analysis with concave and convex gap costs. PhD thesis, New York, NY, USA (1989)
Eres, R., Landau, G.M., Parida, L.: Permutation pattern discovery in biosequences. Journal of Computational Biology 11(6), 1050–1060 (2004)
Felzenszwalb, P.F., Huttenlocher, D.P., Kleinberg, J.M.: Fast algorithms for large-state-space HMMs with applications to web usage analysis. In: Thrun, S., Saul, L.K., Schölkopf, B. (eds.) NIPS. MIT Press, Cambridge (2003)
Goczyła, K.: The generalized Banach match-box problem: Application in disc storage management. Acta Applicandae Mathematicae 5, 27–36 (1986)
Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: SODA, pp. 841–850 (2003)
Jokinen, P., Tarhio, J., Ukkonen, E.: A comparison of approximate string matching algorithms. Software Practice and Experience 26(12), 1439–1458 (1996)
Mendelson, H., Pliskin, J., Yechiali, U.: Optimal storage allocation for serial files. Communications of the ACM 22, 124–130 (1979)
Mendelson, H., Pliskin, J., Yechiali, U.: A stochastic allocation problem. Operations Research 28, 687–693 (1980)
Munro, J.I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)
Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39(1) (2007)
Parida, L.: Gapped permutation patterns for comparative genomics. In: Bücher, P., Moret, B.M.E. (eds.) WABI 2006. LNCS (LNBI), vol. 4175, pp. 376–387. Springer, Heidelberg (2006)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Burcsi, P., Cicalese, F., Fici, G., Lipták, Z. (2010). On Table Arrangements, Scrabble Freaks, and Jumbled Pattern Matching. In: Boldi, P., Gargano, L. (eds) Fun with Algorithms. FUN 2010. Lecture Notes in Computer Science, vol 6099. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13122-6_11
Download citation
DOI: https://doi.org/10.1007/978-3-642-13122-6_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13121-9
Online ISBN: 978-3-642-13122-6
eBook Packages: Computer ScienceComputer Science (R0)