Abstract
Let \({\cal D} =\{d_1,d_2,...,d_D\}\) be a collection of D string documents of n characters in total. The two-pattern matching problems ask to index \({\cal D}\) for answering the following queries efficiently.
-
report/count the unique documents containing P 1 and P 2.
-
report/count the unique documents containing P 1 , but not P 2.
Here P 1 and P 2 represent input patterns of length p 1 and p 2 respectively. Linear space data structures with \(O(p_1+p_2+\sqrt{nk}\log^{O(1)} n)\) query cost are already known for the reporting version, where k represents the output size. For the counting version (i.e., report the value k), a simple linear-space index with \(O(p_1+p_2+ \sqrt{n})\) query cost can be constructed in O(n 3/2) time. However, it is still not known if these are the best possible bounds for these problems. In this paper, we show a strong connection between these string indexing problems and the boolean matrix multiplication problem. Based on this, we argue that these results cannot be improved significantly using purely combinatorial techniques. We also provide an improved upper bound for a related problem known as two-dimensional substring indexing.
Work supported in part by the Danish National Research Foundation grant DNRF84 through Center for Massive Data Algorithmics (MADALGO).
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Bansal, N., Williams, R.: Regularity lemmas and combinatorial algorithms. Theory of Computing 8(1), 69–94 (2012)
Brodal, G.S., Davoodi, P., Rao, S.S.: On space efficient two dimensional range minimum data structures. Algorithmica 63(4), 815–830 (2012)
Chan, T.M., Durocher, S., Larsen, K.G., Morrison, J., Wilkinson, B.T.: Linear-space data structures for range mode query in arrays. In: STACS. LIPIcs, vol. 14, pp. 290–301. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2012)
Chan, T.M., Durocher, S., Skala, M., Wilkinson, B.T.: Linear-space data structures for range minority query in arrays. In: Fomin, F.V., Kaski, P. (eds.) SWAT 2012. LNCS, vol. 7357, pp. 295–306. Springer, Heidelberg (2012)
Chan, T.M., Larsen, K.G., Patrascu, M.: Orthogonal range searching on the ram, revisited. In: Symposium on Computational Geometry, pp. 1–10. ACM (2011)
Cohen, H., Porat, E.: Fast set intersection and two-patterns matching. Theor. Comput. Sci. 411(40-42), 3795–3800 (2010)
Ferragina, P., Koudas, N., Muthukrishnan, S., Srivastava, D.: Two-dimensional substring indexing. J. Comput. Syst. Sci. 66(4), 763–774 (2003)
Fischer, J., Gagie, T., Kopelowitz, T., Lewenstein, M., Mäkinen, V., Salmela, L., Välimäki, N.: Forbidden patterns. In: Fernández-Baca, D. (ed.) LATIN 2012. LNCS, vol. 7256, pp. 327–337. Springer, Heidelberg (2012)
Gall, F.L.: Powers of tensors and fast matrix multiplication. CoRR, abs/1401.7714 (2014)
Golynski, A., Munro, J.I., Rao, S.S.: Rankselect operations on large alphabets: A tool for text indexing. In: SODA, pp. 368–373. ACM Press (2006)
Hon, W.-K., Shah, R., Thankachan, S.V., Vitter, J.S.: String retrieval for multi-pattern queries. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 55–66. Springer, Heidelberg (2010)
Hon, W.-K., Shah, R., Thankachan, S.V., Vitter, J.S.: Document listing for queries with excluded pattern. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 185–195. Springer, Heidelberg (2012)
Hon, W.-K., Shah, R., Thankachan, S.V., Vitter, J.S.: Space-efficient framework for top-k string retrieval. In: JACM (2014)
JáJá, J., Mortensen, C.W., Shi, Q.: Space-efficient and fast algorithms for multidimensional dominance reporting and counting. In: Fleischer, R., Trippen, G. (eds.) ISAAC 2004. LNCS, vol. 3341, pp. 558–568. Springer, Heidelberg (2004)
Matias, Y., Muthukrishnan, S.M., Şahinalp, S.C., Ziv, J.: Augmenting suffix trees, with applications. In: Bilardi, G., Pietracaprina, A., Italiano, G.F., Pucci, G. (eds.) ESA 1998. LNCS, vol. 1461, pp. 67–78. Springer, Heidelberg (1998)
Muthukrishnan, S.: Efficient algorithms for document retrieval problems. In: SODA, pp. 657–666. ACM/SIAM (2002)
Navarro, G.: Spaces, trees and colors: The algorithmic landscape of document retrieval on sequences. CoRR, abs/1304.6023 (2013)
Nekrich, Y., Navarro, G.: Sorted range reporting. In: Fomin, F.V., Kaski, P. (eds.) SWAT 2012. LNCS, vol. 7357, pp. 271–282. Springer, Heidelberg (2012)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Larsen, K.G., Munro, J.I., Nielsen, J.S., Thankachan, S.V. (2014). On Hardness of Several String Indexing Problems. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds) Combinatorial Pattern Matching. CPM 2014. Lecture Notes in Computer Science, vol 8486. Springer, Cham. https://doi.org/10.1007/978-3-319-07566-2_25
Download citation
DOI: https://doi.org/10.1007/978-3-319-07566-2_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07565-5
Online ISBN: 978-3-319-07566-2
eBook Packages: Computer ScienceComputer Science (R0)