Abstract
We introduce a symbol reordering technique that implicitly synchronizes variable-length codes, such that it is possible to directly access the i-th codeword without need of any sampling method. The technique is practical and has many applications to the representation of ordered sets, sparse bitmaps, partial sums, and compressed data structures for suffix trees, arrays, and inverted indexes, to name just a few. We show experimentally that the technique offers a competitive alternative to other data structures that handle this problem.
Funded in part (for the Spanish group) by MEC grant (TIN2006-15071-C03-03); and for the third author by Fondecyt Grant 1-080019, Chile.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Brisaboa, N., Fariña, A., Navarro, G., Paramá, J.: Lightweight natural language text compression. Information Retrieval 10, 1–33 (2007)
Clark, D.: Compact Pat Trees. PhD thesis, University of Waterloo, Canada (1996)
Claude, F., Navarro, G.: Practical rank/select queries over arbitrary sequences. In: Amir, A., Turpin, A., Moffat, A. (eds.) SPIRE 2008. LNCS, vol. 5280, pp. 176–187. Springer, Heidelberg (2008)
Culpepper, J., Moffat, A.: Compact set representation for information retrieval. In: Ziviani, N., Baeza-Yates, R. (eds.) SPIRE 2007. LNCS, vol. 4726, pp. 137–148. Springer, Heidelberg (2007)
Ferragina, P., González, R., Navarro, G., Venturini, R.: Compressed text indexes: From theory to practice. ACM JEA 13, article 12, 30 pages (2009)
Ferragina, P., Venturini, R.: A simple storage scheme for strings achieving entropy bounds. In: Proc. 18th SODA, pp. 690–696 (2007)
Fischer, J., Mäkinen, V., Navarro, G.: An(other) entropy-bounded compressed suffix tree. In: Ferragina, P., Landau, G.M. (eds.) CPM 2008. LNCS, vol. 5029, pp. 152–165. Springer, Heidelberg (2008)
González, R., Grabowski, S., Mäkinen, V., Navarro, G.: Practical implementation of rank and select queries. In: Poster Proc. 4th WEA, pp. 27–38 (2005)
Grossi, R., Vitter, J.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. In: Proc. 32nd STOC, pp. 397–406 (2000)
Gupta, A., Hon, W.-K., Shah, R., Vitter, J.: Compressed dictionaries: Space measures, data sets, and experiments. In: Proc. 5th WEA, pp. 158–169 (2006)
Huffman, D.: A method for the construction of minimum-redundancy codes. Proceedings of the I.R.E. 40(9), 1090–1101 (1952)
Jacobson, G.: Space-efficient static trees and graphs. In: Proc. 30th FOCS, pp. 549–554 (1989)
Mäkinen, V., Navarro, G.: Dynamic entropy-compressed sequences and full-text indexes. ACM Transactions on Algorithms (TALG) 4(3), 1–38 (2008)
Moffat, A.: Word-based text compression. Software Practice and Experience 19(2), 185–198 (1989)
Moura, E., Navarro, G., Ziviani, N., Baeza-Yates, R.: Fast and flexible word searching on compressed text. ACM TOIS 18(2), 113–139 (2000)
Munro, I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)
Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Computing Surveys 39(1), article 2 (2007)
Okanohara, D., Sadakane, K.: Practical entropy-compressed rank/select dictionary. In: Proc. 9th ALENEX (2007)
Raman, R., Raman, V., Rao, S.: Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In: Proc. 13th SODA, pp. 233–242 (2002)
Sadakane, K.: New text indexing functionalities of the compressed suffix arrays. Journal of Algorithms 48(2), 294–313 (2003)
Solomon, D.: Variable-length codes for data compression. Springer, Heidelberg (2007)
Williams, H.E., Zobel, J.: Compressing integers for fast file access. COMPJ: The Computer Journal 42(3), 193–201 (1999)
Witten, I., Moffat, A., Bell, T.: Managing Gigabytes, 2nd edn. Morgan Kaufmann Publishers, New York (1999)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brisaboa, N.R., Ladra, S., Navarro, G. (2009). Directly Addressable Variable-Length Codes. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds) String Processing and Information Retrieval. SPIRE 2009. Lecture Notes in Computer Science, vol 5721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03784-9_12
Download citation
DOI: https://doi.org/10.1007/978-3-642-03784-9_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03783-2
Online ISBN: 978-3-642-03784-9
eBook Packages: Computer ScienceComputer Science (R0)