Alphabet Permutation for Differentially Encoding Text

Landau, Gad M.; Levi, Ofer; Skiena, Steven

doi:10.1007/978-3-540-30213-1_32

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3246)

Included in the following conference series:

International Symposium on String Processing and Information Retrieval

Abstract

One degree of freedom not usually exploited in developing high-performance text-processing algorithms is the encoding of the underlying atomic character set. Here we consider a text compression method where the specific character set collating-sequence employed in encoding the text has a big impact on performance. We demonstrate that permuting the standard character collating-sequences yields a small win on Asian-language texts over gzip. We also show improved compression with our method for English texts, although not by enough to beat standard methods. However, we also design a class of artificial languages on which our method clearly beats gzip, often by an order of magnitude.
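
The abstract does not spell out the encoding itself, so the following is only a minimal sketch of why the collating sequence might matter for a differential code. It assumes that each character is replaced by its rank in a chosen alphabet ordering and that the text is stored as the differences between consecutive ranks; the function names (`differential_encode`, `differential_decode`, `delta_cost`) and the cost measure are hypothetical illustrations, not the authors' method.

```python
def differential_encode(text, order):
    """Encode text as rank differences under the alphabet ordering `order`.

    The first value is the rank of the first character; every later value
    is the signed difference from the previous character's rank.
    (Assumed scheme for illustration only.)
    """
    rank = {ch: i for i, ch in enumerate(order)}
    codes = [rank[ch] for ch in text]
    if not codes:
        return []
    return [codes[0]] + [b - a for a, b in zip(codes, codes[1:])]


def differential_decode(deltas, order):
    """Invert differential_encode using the same alphabet ordering."""
    chars = []
    current = 0
    for i, d in enumerate(deltas):
        current = d if i == 0 else current + d
        chars.append(order[current])
    return "".join(chars)


def delta_cost(text, order):
    """Sum of absolute rank differences: a crude proxy for how small
    (and hence how cheaply codable) the differences are."""
    deltas = differential_encode(text, order)
    return sum(abs(d) for d in deltas[1:])


if __name__ == "__main__":
    text = "azazaz"
    standard = "abcdefghijklmnopqrstuvwxyz"
    # Hypothetical permutation that places 'a' and 'z' next to each other.
    permuted = "az" + "bcdefghijklmnopqrstuvwxy"
    print(delta_cost(text, standard))
    print(delta_cost(text, permuted))
```

In this toy case, characters that frequently follow one another are moved to adjacent positions in the ordering, so the differences shrink; whether and how much this helps on real text is what the paper evaluates.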

Author information

Authors and Affiliations

Dept. of Computer Science, University of Haifa, Mount Carmel, Haifa, 31905, Israel
Gad M. Landau & Ofer Levi
Dept. of Computer Science, SUNY, Stony Brook, NY, 11794-4400, USA
Steven Skiena

Editor information

Editors and Affiliations

Georgia Institute of Technology and Università di Padova
Alberto Apostolico
Department of Information Engineering, University of Padova
Massimo Melucci

About this paper

Cite this paper

Landau, G.M., Levi, O., Skiena, S. (2004). Alphabet Permutation for Differentially Encoding Text. In: Apostolico, A., Melucci, M. (eds) String Processing and Information Retrieval. SPIRE 2004. Lecture Notes in Computer Science, vol 3246. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30213-1_32

DOI: https://doi.org/10.1007/978-3-540-30213-1_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23210-0
Online ISBN: 978-3-540-30213-1
eBook Packages: Springer Book Archive
