Abstract
Given a sequence of n bits with binary zero-order entropy H 0, we present a dynamic data structure that requires nH 0 + o(n) bits of space, which is able of performing rank and select, as well as inserting and deleting bits at arbitrary positions, in O(logn) worst-case time. This extends previous results by Hon et al. [ISAAC 2003] achieving O(logn/loglogn) time for rank and select but \(\Theta({\textrm{polylog}}(n))\) amortized time for inserting and deleting bits, and requiring n + o(n) bits of space; and by Raman et al. [SODA 2002] which have constant query time but a static structure. In particular, our result becomes the first entropy-bound dynamic data structure for rank and select over bit sequences.
We then show how the above result can be used to build a dynamic full-text self-index for a collection of texts over an alphabet of size σ, of overall length n and zero-order entropy H 0. The index requires nH 0 + o(n logσ) bits of space, and can count the number of occurrences of a pattern of length m in time O(m logn logσ). Reporting the occ occurrences can be supported in O(occ log2 n logσ) time, paying O(n) extra space. Insertion of text to the collection takes O(logn logσ) time per symbol, which becomes O(log2 n logσ) for deletions. This improves a previous result by Chan et al. [CPM 2004]. As a consequence, we obtain an O(n logn logσ) time construction algorithm for a compressed self-index requiring nH 0 + o(n logσ) bits working space during construction.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Apostolico, A.: The myriad virtues of subword trees. In: Combinatorial Algorithms on Words. NATO ISI Series, pp. 85–96. Springer, Heidelberg (1985)
Arroyuelo, D., Navarro, G.: Space-efficient construction of LZ-index. In: Deng, X., Du, D.-Z. (eds.) ISAAC 2005. LNCS, vol. 3827, pp. 1143–1152. Springer, Heidelberg (2005)
Burrows, M., Wheeler, D.: A block sorting lossless data compression algorithm. Technical Report 124, Digital Equipment Corporation (1994)
Chan, W.-L., Hon, W.-K., Lam, T.-W.: Compressed index for a dynamic collection of texts. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 445–456. Springer, Heidelberg (2004)
Dietz, P.: Optimal algorithms for list indexing and subset rank. In: Dehne, F., Santoro, N., Sack, J.-R. (eds.) WADS 1989. LNCS, vol. 382, pp. 39–46. Springer, Heidelberg (1989)
Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: Proc. FOCS 2000, pp. 390–398 (2000)
Ferragina, P., Manzini, G., Mäkinen, V., Navarro, G.: Compressed representation of sequences and full-text indexes. ACM Transactions on Algorithms (to appear, 2006); Preliminary versions, In: Proc. SPIRE 2004 and Tech. Rep. TR/DCC-2004-5, Dept. of Computer Science Univ. of Chile, ftp://ftp.dcc.uchile.cl/pub/users/gnavarro/sequences.ps.gz
Grossi, R., Gupta, A., Vitter, J.: High-order entropy-compressed text indexes. In: Proc. SODA 2003, pp. 841–850 (2003)
Hon, W.-K., Sadakane, K., Sung, W.-K.: Succinct data structures for searchable partial sums. In: Ibaraki, T., Katoh, N., Ono, H. (eds.) ISAAC 2003. LNCS, vol. 2906, pp. 505–516. Springer, Heidelberg (2003)
Mäkinen, V., Navarro, G.: Succinct suffix arrays based on run-length encoding. Nordic Journal of Computing 12(1), 40–66 (2005)
Manber, U., Myers, G.: Suffix arrays: a new method for on-line string searches. SIAM Journal on Computing, 935–948 (1993)
Navarro, G.: Indexing text using the Ziv-Lempel trie. Journal of Discrete Algorithms (JDA) 2(1), 87–114 (2004)
Raman, R., Raman, V., Srinivasa Rao, S.: Succinct dynamic data structures. In: Dehne, F., Sack, J.-R., Tamassia, R. (eds.) WADS 2001. LNCS, vol. 2125, pp. 426–437. Springer, Heidelberg (2001)
Raman, R., Raman, V., Srinivasa Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. SODA 2002, pp. 233–242 (2002)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mäkinen, V., Navarro, G. (2006). Dynamic Entropy-Compressed Sequences and Full-Text Indexes. In: Lewenstein, M., Valiente, G. (eds) Combinatorial Pattern Matching. CPM 2006. Lecture Notes in Computer Science, vol 4009. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780441_28
Download citation
DOI: https://doi.org/10.1007/11780441_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35455-0
Online ISBN: 978-3-540-35461-1
eBook Packages: Computer ScienceComputer Science (R0)