Abstract
Text indexing, the problem in which one desires to preprocess a (usually large) text for future (shorter) queries, has been researched ever since the suffix tree was invented in the early 70’s. With textual data continuing to increase and with changes in the way it is accessed, new data structures and new algorithmic methods are continuously required. Therefore, text indexing is of utmost importance and is a very active research domain.
Orthogonal range searching, classically associated with the computational geometry community, is one of the tools that has increasingly become important for various text indexing applications. Initially, in the mid 90’s there were a couple of results recognizing this connection. In the last few years we have seen an increase in use of this method and are reaching a deeper understanding of the range searching uses for text indexing.
In this monograph we survey some of these results.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Agarwal, P.K.: Range searching. In: Handbook of Discrete and Computational Geometry, pp. 575–598. CRC Press, Inc. (1997)
Alstrup, S., Brodal, G.S., Rauhe, T.: New data structures for orthogonal range searching. In: Proc. of Foundations of Computer Science (FOCS), pp. 198–207 (2000)
Amir, A., Apostolico, A., Landau, G.M., Levy, A., Lewenstein, M., Porat, E.: Range LCP. In: Asano, T., Nakano, S.-i., Okamoto, Y., Watanabe, O. (eds.) ISAAC 2011. LNCS, vol. 7074, pp. 683–692. Springer, Heidelberg (2011)
Amir, A., Apostolico, A., Landau, G.M., Satta, G.: Efficient text fingerprinting via Parikh mapping. Journal of Discrete Algorithms 1(5-6), 409–421 (2003)
Amir, A., Aumann, Y., Lewenstein, M., Porat, E.: Function matching. SIAM Journal on Computing 35(5), 1007–1022 (2006)
Amir, A., Benson, G., Farach, M.: Let sleeping files lie: Pattern matching in z-compressed files. Journal of Computer and System Sciences 52(2), 299–307 (1996)
Amir, A., Chencinski, E., Iliopoulos, C.S., Kopelowitz, T., Zhang, H.: Property matching and weighted matching. Theoretical Computer Science 395(2-3), 298–310 (2008)
Amir, A., Fischer, J., Lewenstein, M.: Two-dimensional range minimum queries. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 286–294. Springer, Heidelberg (2007)
Amir, A., Francheschini, G., Grossi, R., Kopelowitz, T., Lewenstein, M., Lewenstein, N.: Managing unbounded-length keys in comparison-driven data structures with applications to on-line indexing. SIAM Journal on Computing (to appear, 2013)
Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Text indexing and dictionary matching with one error. Journal of Algorithms 37(2), 309–325 (2000)
Amir, A., Landau, G.M., Lewenstein, M., Sokol, D.: Dynamic text and static pattern matching. ACM Transactions on Algorithms 3(2) (2007)
Arroyuelo, D., Navarro, G., Sadakane, K.: Stronger Lempel-Ziv based compressed text indexing. Algorithmica 62(1-2), 54–101 (2012)
Atallah, M.J., Yuan, H.: Data structures for range minimum queries in multidimensional arrays. In: Proc. of the Symposium on Discrete Algorithms (SODA), pp. 150–160 (2010)
Badkobeh, G., Fici, G., Kroon, S., Lipták, Z.: Binary jumbled string matching for highly run-length compressible texts. Information Processing Letters 113, 604–608 (2013)
Baker, B.S.: Parameterized pattern matching: Algorithms and applications. Journal of Computer and System Sciences 52(1), 28–42 (1996)
Barbay, J., Claude, F., Navarro, G.: Compact binary relation representations with rich functionality. The Computing Research Repository (arXiv), abs/1201.3602 (2012)
Bialynicka-Birula, I., Grossi, R.: Rank-sensitive data structures. In: Consens, M.P., Navarro, G. (eds.) SPIRE 2005. LNCS, vol. 3772, pp. 79–90. Springer, Heidelberg (2005)
Bille, P., Fischer, J., Gørtz, I.L., Kopelowitz, T., Sach, B., Vildhøj, H.W.: Sparse suffix tree construction in small space. In: Proc. of International Colloquium on Automata, Languages and Complexity, ICALP (2013)
Bille, P., Gørtz, I.L.: Substring range reporting. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 299–308. Springer, Heidelberg (2011)
Bose, P., He, M., Maheshwari, A., Morin, P.: Succinct orthogonal range search structures on a grid with applications to text indexing. In: Dehne, F., Gavrilova, M., Sack, J.-R., Tóth, C.D. (eds.) WADS 2009. LNCS, vol. 5664, pp. 98–109. Springer, Heidelberg (2009)
Brodal, G.S., Gąsieniec, L.: Approximate dictionary queries. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 65–74. Springer, Heidelberg (1996)
Brodal, G.S., Davoodi, P., Lewenstein, M., Raman, R., Rao, S.S.: Two dimensional range minimum queries and Fibonacci lattices. In: Epstein, L., Ferragina, P. (eds.) ESA 2012. LNCS, vol. 7501, pp. 217–228. Springer, Heidelberg (2012)
Brodal, G.S., Davoodi, P., Rao, S.S.: On space efficient two dimensional range minimum data structures. Algorithmica 63(4), 815–830 (2012)
Brodnik, A., Munro, J.I.: Membership in constant time and almost-minimum space. SIAM Journal on Computing 28(5), 1627–1640 (1999)
Butman, A., Eres, R., Landau, G.M.: Scaled and permuted string matching. Information Processing Letters 92(6), 293–297 (2004)
Chan, H.-L., Lam, T.W., Sung, W.-K., Tam, S.-L., Wong, S.-S.: A linear size index for approximate pattern matching. Journal of Discrete Algorithms 9(4), 358–364 (2011)
Chan, T.M.: Persistent predecessor search and orthogonal point location on the word ram. In: Proc. of Symposium on Discrete Algorithms (SODA), pp. 1131–1145 (2011)
Chan, T.M., Larsen, K.G., Pǎtraşcu, M.: Orthogonal range searching on the RAM, revisited. In: Proc. of the Symposium on Computational Geometry, SOCG (2011)
Chazelle, B.: A functional approach to data structures and its use in multidimensional searching. SIAM Journal on Computing 17(3), 427–462 (1988)
Chazelle, B.: Lower bounds for orthogonal range searching: I. the reporting case. Journal of the ACM 37(2), 200–212 (1990)
Chazelle, B., Rosenberg, B.: The complexity of computing partial sums off-line. International Journal of Computational Geometry and Applications 1(1), 33–45 (1991)
Chien, Y.-F., Hon, W.-K., Shah, R., Thankachan, S.V., Vitter, J.S.: Geometric Burrows-Wheeler transform: Compressed text indexing via sparse suffixes and range searching. Algorithmica (to appear, 2013)
Cicalese, F., Fici, G., Lipták, Z.: Searching for jumbled patterns in strings. In: Holub, J., Zdárek, J. (eds.) Proceedings of the Prague Stringology Conference (PSC), pp. 105–117 (2009)
Claude, F., Navarro, G.: Self-indexed grammar-based compression. Fundamenta Informaticae 111(3), 313–337 (2011)
Claude, F., Navarro, G.: Improved grammar-based compressed indexes. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 180–192. Springer, Heidelberg (2012)
Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proc. of Symposium on Theory of Computing (STOC), pp. 91–100 (2004)
Cormode, G., Muthukrishnan, S.: Substring compression problems. In: Proc. of Symposium on Discrete Algorithms (SODA), pp. 321–330 (2005)
Crochemore, M., Iliopoulos, C.S., Kubica, M., Rahman, M.S., Tischler, G., Walen, T.: Improved algorithms for the range next value problem and applications. Theoretical Computer Science 434, 23–34 (2012)
Crochemore, M., Iliopoulos, C.S., Rahman, M.S.: Finding patterns in given intervals. In: Kučera, L., Kučera, A. (eds.) MFCS 2007. LNCS, vol. 4708, pp. 645–656. Springer, Heidelberg (2007)
Crochemore, M., Kubica, M., Walen, T., Iliopoulos, C.S., Rahman, M.S.: Finding patterns in given intervals. Fundamenta Informaticae 101(3), 173–186 (2010)
Davoodi, P., Landau, G., Lewenstein, M.: Multi-dimensional range minimum queries (manuscript, 2013)
Davoodi, P., Raman, R., Satti, S.R.: Succinct representations of binary trees for range minimum queries. In: Gudmundsson, J., Mestre, J., Viglas, T. (eds.) COCOON 2012. LNCS, vol. 7434, pp. 396–407. Springer, Heidelberg (2012)
Demaine, E.D., Landau, G.M., Weimann, O.: On cartesian trees and range minimum queries. In: Albers, S., Marchetti-Spaccamela, A., Matias, Y., Nikoletseas, S., Thomas, W. (eds.) ICALP 2009, Part I. LNCS, vol. 5555, pp. 341–353. Springer, Heidelberg (2009)
Demaine, E.D., López-Ortiz, A.: A linear lower bound on index size for text retrieval. Journal of Algorithms 48(1), 2–15 (2003)
Dietz, P.F., Raman, R.: Persistence, amortization and randomization. In: Proc. of Symposium on Discrete Algorithms (SODA), pp. 78–88 (1991)
Farach, M., Muthukrishnan, S.: Perfect hashing for strings: Formalization and algorithms. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 130–140. Springer, Heidelberg (1996)
Farach, M., Thorup, M.: String matching in Lempel-Ziv compressed strings. Algorithmica 20(4), 388–404 (1998)
Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the sorting-complexity of suffix tree construction. Journal of the ACM 47(6), 987–1011 (2000)
Ferragina, P.: Dynamic text indexing under string updates. Journal of Algorithms 22(2), 296–328 (1997)
Ferragina, P., Manzini, G.: Indexing compressed text. Journal of the ACM 52(4), 552–581 (2005)
Ferragina, P., Muthukrishnan, S., de Berg, M.: Multi-method dispatching: A geometric approach with applications to string matching problems. In: Proc. of Symposium on Theory of Computing (STOC), pp. 483–491 (1999)
Fischer, J., Gagie, T., Kopelowitz, T., Lewenstein, M., Mäkinen, V., Salmela, L., Välimäki, N.: Forbidden patterns. In: Fernández-Baca, D. (ed.) LATIN 2012. LNCS, vol. 7256, pp. 327–337. Springer, Heidelberg (2012)
Fischer, J., Heun, V.: Space-efficient preprocessing schemes for range minimum queries on static arrays. SIAM Journal on Computing 40(2), 465–492 (2011)
Gabow, H.N., Bentley, J.L., Tarjan, R.E.: Scaling and related techniques for geometry problems. In: Proc. of the Symposium on Theory of Computing (STOC), pp. 135–143 (1984)
Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A faster grammar-based self-index. In: Dediu, A.-H., Martín-Vide, C. (eds.) LATA 2012. LNCS, vol. 7183, pp. 240–251. Springer, Heidelberg (2012)
Golin, M., Iacono, J., Krizanc, D., Raman, R., Rao, S.S.: Encoding 2D range maximum queries. In: Asano, T., Nakano, S.-I., Okamoto, Y., Watanabe, O. (eds.) ISAAC 2011. LNCS, vol. 7074, pp. 180–189. Springer, Heidelberg (2011)
Golynski, A., Munro, J.I., Rao, S.S.: Rank/select operations on large alphabets: a tool for text indexing. In: Proc. of Symposium on Discrete Algorithms (SODA), pp. 368–373 (2006)
Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: Proc. of Symposium on Discrete Algorithms (SODA), pp. 841–850 (2003)
Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM Journal on Computing 35(2), 378–407 (2005)
Gusfield, D.: Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge University Press (1997)
Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM Journal on Computing 13(2), 338–355 (1984)
Hon, W.-K., Ku, T.-H., Shah, R., Thankachan, S.V., Vitter, J.S.: Compressed dictionary matching with one error. In: Proc. of the Data Compression Conference (DCC), pp. 113–122 (2011)
Hon, W.-K., Patil, M., Shah, R., Thankachan, S.V.: Compressed property suffix trees. In: Proc. of the Data Compression Conference (DCC), pp. 123–132 (2011)
Hon, W.-K., Shah, R., Thankachan, S.V., Vitter, J.S.: On position restricted substring searching in succinct space. Journal of Discrete Algorithms 17, 109–114 (2012)
Hon, W.-K., Shah, R., Vitter, J.S.: Space-efficient framework for top-k string retrieval problems. In: Proc. of Foundations of Computer Science (FOCS), pp. 713–722 (2009)
Iliopoulos, C.S., Rahman, M.S.: Faster index for property matching. Information Processing Letters 105(6), 218–223 (2008)
Iliopoulos, C.S., Rahman, M.S.: Indexing factors with gaps. Algorithmica 55(1), 60–70 (2009)
Jacobson, G.: Space-efficient static trees and graphs. In: FOCS, pp. 549–554 (1989)
Juan, M.T., Liu, J.J., Wang, Y.L.: Errata for “faster index for property matching“. Information Processing Letters 109(18), 1027–1029 (2009)
Kärkkäinen, J.: Repetition-Based Text Indexes. PhD thesis, University of Helsinki, Finland (1999)
Kärkkäinen, J., Sanders, P., Burkhardt, S.: Linear work suffix array construction. Journal of the ACM 53(6), 918–936 (2006)
Kärkkäinen, J., Ukkonen, E.: Lempel-Ziv parsing and sublinear-size index structures for string matching. In: Proc. 3rd South American Workshop on String Processing (WSP). International Informatics Series 4, pp. 141–155. Carleton University Press (1996)
Kärkkäinen, J., Ukkonen, E.: Sparse suffix trees. In: Cai, J.-Y., Wong, C.K. (eds.) COCOON 1996. LNCS, vol. 1090, pp. 219–230. Springer, Heidelberg (1996)
Karpinski, M., Nekrich, Y.: Top-k color queries for document retrieval. In: Proc. of Symposium on Discrete Algorithms (SODA), pp. 401–411 (2011)
Keller, O., Kopelowitz, T., Landau, S., Lewenstein, M.: Generalized substring compression. In: Kucherov, G., Ukkonen, E. (eds.) CPM 2009 Lille. LNCS, vol. 5577, pp. 26–38. Springer, Heidelberg (2009)
Keller, O., Kopelowitz, T., Lewenstein, M.: Range non-overlapping indexing and successive list indexing. In: Dehne, F., Sack, J.-R., Zeh, N. (eds.) WADS 2007. LNCS, vol. 4619, pp. 625–636. Springer, Heidelberg (2007)
Kopelowitz, T.: The property suffix tree with dynamic properties. In: Amir, A., Parida, L. (eds.) CPM 2010. LNCS, vol. 6129, pp. 63–75. Springer, Heidelberg (2010)
Kopelowitz, T., Kucherov, G., Nekrich, Y., Starikovskaya, T.A.: Cross-document pattern matching. Journal of Discrete Algorithms (to appear, 2013)
Kopelowitz, T., Lewenstein, M.: Dynamic weighted ancestors. In: Proc. of Symposium on Discrete Algorithms (SODA), pp. 565–574 (2007)
Kopelowitz, T., Lewenstein, M., Porat, E.: Persistency in suffix trees with applications to string interval problems. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 67–80. Springer, Heidelberg (2011)
Kreft, S., Navarro, G.: On compressing and indexing repetitive sequences. Theoretical Computer Science 483, 115–133 (2013)
Landau, G.M., Vishkin, U.: Fast string matching with k differences. Journal of Computer and System Sciences 37(1), 63–78 (1988)
Lenhof, H.-P., Smid, M.H.M.: Using persistent data structures for adding range restrictions to searching problems. Theoretical Informatics and Applications (ITA) 28(1), 25–49 (1994)
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10, 707–710 (1966)
Lewenstein, M.: Parameterized matching. In: Encyclopedia of Algorithms (2008)
Mäkinen, V., Navarro, G.: Position-restricted substring searching. In: Correa, J.R., Hevia, A., Kiwi, M. (eds.) LATIN 2006. LNCS, vol. 3887, pp. 703–714. Springer, Heidelberg (2006)
Manber, U., Myers, E.W.: Suffix arrays: A new method for on-line string searches. SIAM Journal on Computing 22(5), 935–948 (1993)
McCreight, E.M.: A space-economical suffix tree construction algorithm. Journal of the ACM 23(2), 262–272 (1976)
Moosa, T.M., Rahman, M.S.: Indexing permutations for binary strings. Information Processing Letters 110(18-19), 795–798 (2010)
Muthukrishnan, S.: Efficient algorithms for document retrieval problems. In: Proc. of the Symposium on Discrete Algorithms (SODA), pp. 657–666 (2002)
Navarro, G.: Wavelet trees for all. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 2–26. Springer, Heidelberg (2012)
Navarro, G.: Spaces, trees and colors: The algorithmic landscape of document retrieval on sequences. The Computing Research Repository (arXiv), abs/1304.6023 (2013)
Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Computing Surveys 39(1), 2 (2007)
Navarro, G., Nekrich, Y.: Top-k document retrieval in optimal time and linear space. In: Proc. of Symposium on Discrete Algorithms (SODA), pp. 1066–1077 (2012)
Nekrich, Y., Navarro, G.: Sorted range reporting. In: Fomin, F.V., Kaski, P. (eds.) SWAT 2012. LNCS, vol. 7357, pp. 271–282. Springer, Heidelberg (2012)
Russo, L.M.S., Oliveira, A.L.: A compressed self-index using a Ziv-Lempel dictionary. Information Retrieval 11(4), 359–388 (2008)
Sadakane, K.: Succinct data structures for flexible text retrieval systems. Journal of Discrete Algorithms 5(1), 12–22 (2007)
Shah, R., Sheng, C., Thankachan, S.V., Vitter, J.S.: On optimal top-k string retrieval. The Computing Research Repository (arXiv), abs/1207.2632 (2012)
Thankachan, S.V.: Compressed indexes for aligned pattern matching. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 410–419. Springer, Heidelberg (2011)
Tsur, D.: Fast index for approximate string matching. Journal of Discrete Algorithms 8(4), 339–345 (2010)
Ukkonen, E.: On-line construction of suffix trees. Algorithmica 14(3), 249–260 (1995)
van Emde Boas, P.: Preserving order in a forest in less than logarithmic time and linear space. Information Processing Letters 6(3), 80–82 (1977)
Vuillemin, J.: A unifying look at data structures. Communications of the ACM 23(4), 229–239 (1980)
Weiner, P.: Linear pattern matching algorithm. In: Proc. of the Symposium on Switching and Automata Theory, pp. 1–11 (1973)
Willard, D.E.: Log-logarithmic worst-case range queries are possible in space θ(n). Information Processing Letters 17(2), 81–84 (1983)
Yu, C.-C., Hon, W.-K., Wang, B.-F.: Improved data structures for the orthogonal range successor problem. Computational Geometry 44(3), 148–159 (2011)
Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Transactions on Information Theory 23(3), 337–343 (1977)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Lewenstein, M. (2013). Orthogonal Range Searching for Text Indexing. In: Brodnik, A., López-Ortiz, A., Raman, V., Viola, A. (eds) Space-Efficient Data Structures, Streams, and Algorithms. Lecture Notes in Computer Science, vol 8066. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40273-9_18
Download citation
DOI: https://doi.org/10.1007/978-3-642-40273-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40272-2
Online ISBN: 978-3-642-40273-9
eBook Packages: Computer ScienceComputer Science (R0)