Abstract
We present a data structure that stores a string s[1..n] over the alphabet [1..σ] in nH 0(s) + o(n)(H 0(s) + 1) bits, where H 0(s) is the zero-order entropy of s. This data structure supports the queries access and rank in time \(({\mathcal O}{{\rm lg lg}\sigma})\), and the select query in constant time. This result improves on previously known data structures using \(nH_0(s)+o(n\lg\sigma)\) bits, where on highly compressible instances the redundancy \(o(n\lg\sigma)\) cease to be negligible compared to the nH 0(s) bits that encode the data. The technique is based on combining previous results through an ingenious partitioning of the alphabet, and practical enough to be implementable. It applies not only to strings, but also to several other compact data structures. For example, we achieve (i) faster search times and lower redundancy for the smallest existing full-text self-index; (ii) compressed permutations π with times for π() and π − 1() improved to log-logarithmic; and (iii) the first compressed representation of dynamic collections of disjoint sets.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Andersson, A.: Sorting and searching revisited. In: Karlsson, R., Lingas, A. (eds.) SWAT 1996. LNCS, vol. 1097, pp. 185–197. Springer, Heidelberg (1996)
Barbay, J., Golynski, A., Munro, J.I., Rao, S.S.: Adaptive searching in succinctly encoded binary relations and tree-structured documents. Theoretical Computer Science 387(3), 284–297 (2007)
Barbay, J., He, M., Munro, J.I., Rao, S.S.: Succinct indexes for strings, binary relations and multi-labeled trees. In: Proc. 18th SODA, pp. 680–689 (2007)
Barbay, J., Navarro, G.: Compressed representations of permutations, and applications. In: Proc. 26th STACS, pp. 111–122 (2009)
Blandford, D., Blelloch, G.: Index compression through document reordering. In: Proc. DCC, pp. 342–351 (2002)
Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)
Claude, F., Navarro, G.: Practical rank/select queries over arbitrary sequences. In: Amir, A., Turpin, A., Moffat, A. (eds.) SPIRE 2008. LNCS, vol. 5280, pp. 176–187. Springer, Heidelberg (2008)
Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representations of sequences and full-text indexes. ACM Transactions on Algorithms 3(2) (2007)
Gagie, T., Navarro, G., Nekrich, Y.: Fast and compact prefix codes. In: van Leeuwen, J., Muscholl, A., Peleg, D., Pokorný, J., Rumpe, B. (eds.) SOFSEM 2010. LNCS, vol. 5901, pp. 419–427. Springer, Heidelberg (2010)
Golynski, A., Munro, J.I., Rao, S.S.: Rank/select operations on large alphabets: a tool for text indexing. In: Proc. 17th SODA, pp. 368–373 (2006)
Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: Proc. 14th SODA, pp. 841–850 (2003)
Hreinsson, J.B., Krøyer, M., Pagh, R.: Storing a compressed function with constant time access. In: Fiat, A., Sanders, P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 730–741. Springer, Heidelberg (2009)
Levcopoulos, C., Petersson, O.: Sorting shuffled monotone sequences. Information and Computation 112(1), 37–50 (1994)
Mäkinen, V., Navarro, G.: Rank and select revisited and extended. Theoretical Computer Science 387(3), 332–347 (2007)
Manzini, G.: An analysis of the Burrows-Wheeler transform. Journal of the ACM 48(3), 407–430 (2001)
Mehlhorn, K.: Sorting presorted files. In: Weihrauch, K. (ed.) GI-TCS 1979. LNCS, vol. 67, pp. 199–212. Springer, Heidelberg (1979)
Munro, J.I., Raman, R., Raman, V., Rao, S.S.: Succinct representations of permutations. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 345–356. Springer, Heidelberg (2003)
Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Computing Surveys 39(1):article 2 (2007)
Rahman, N., Raman, R.: Rank and select operations on binary strings. In: Kao, M.-Y. (ed.) Encyclopedia of Algorithms. Springer, Heidelberg (2008)
Raman, R., Raman, V., Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. 13th SODA, pp. 233–242 (2002)
Sadakane, K.: New text indexing functionalities of the compressed suffix arrays. Journal of Algorithms 48(2), 294–313 (2003)
Tarjan, R.E., van Leeuwen, J.: Worst-case analysis of set union algorithms. Journal of the ACM 31(2), 245–281 (1984)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Barbay, J., Gagie, T., Navarro, G., Nekrich, Y. (2010). Alphabet Partitioning for Compressed Rank/Select and Applications. In: Cheong, O., Chwa, KY., Park, K. (eds) Algorithms and Computation. ISAAC 2010. Lecture Notes in Computer Science, vol 6507. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17514-5_27
Download citation
DOI: https://doi.org/10.1007/978-3-642-17514-5_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17513-8
Online ISBN: 978-3-642-17514-5
eBook Packages: Computer ScienceComputer Science (R0)