On Inverted Index Compression for Search Engine Efficiency

Catena, Matteo; Macdonald, Craig; Ounis, Iadh

doi:10.1007/978-3-319-06028-6_30

Matteo Catena²²,
Craig Macdonald²³ &
Iadh Ounis²³

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8416))

Included in the following conference series:

European Conference on Information Retrieval

3224 Accesses
12 Citations
39 Altmetric

Abstract

Efficient access to the inverted index data structure is a key aspect for a search engine to achieve fast response times to users’ queries. While the performance of an information retrieval (IR) system can be enhanced through the compression of its posting lists, there is little recent work in the literature that thoroughly compares and analyses the performance of modern integer compression schemes across different types of posting information (document ids, frequencies, positions). In this paper, we experiment with different modern integer compression algorithms, integrating these into a modern IR system. Through comprehensive experiments conducted on two large, widely used document corpora and large query sets, our results show the benefit of compression for different types of posting information to the space- and time-efficiency of the search engine. Overall, we find that the simple Frame of Reference compression scheme results in the best query response times for all types of posting information. Moreover, we observe that the frequency and position posting information in Web corpora that have large volumes of anchor text are more challenging to compress, yet compression is beneficial in reducing average query response times.

Access provided by Autonomous University of Puebla. Download to read the full chapter text

Chapter PDF

An Experimental Study of Index Compression and DAAT Query Processing Methods

Pre-indexing Pruning Strategies

Enhanced FastPFOR for Inverted Index Compression

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

Dean, J.: Challenges in building large-scale information retrieval systems: invited talk. In: Proc. WSDM 2009 (2009)
Google Scholar
Anh, V.N., Moffat, A.: Pruned query evaluation using pre-computed impact scores. In: Proc. SIGIR 2006 (2006)
Google Scholar
Moffat, A., Webber, W., Zobel, J., Baeza-Yates, R.: A pipelined architecture for distributed text query evaluation. Inf. Retr. 10 (2007)
Google Scholar
Broccolo, D., Macdonald, C., Orlando, S., Ounis, I., Perego, R., Tonellotto, N.: Load-sensitive selective pruning for distributed search. In: Proc. CIKM 2013 (2013)
Google Scholar
Witten, I.H., Bell, T.C., Moffat, A.: Managing Gigabytes: Compressing and Indexing Documents and Images, 1st edn. (1994)
Google Scholar
Elias, P.: Universal codeword sets and representations of the integers. Trans. Info. Theory 21(2) (1975)
Google Scholar
Yan, H., Ding, S., Suel, T.: Compressing term positions in web indexes. In: Proc. SIGIR 2009 (2009)
Google Scholar
Zukowski, M., Heman, S., Nes, N., Boncz, P.: Super-scalar RAM-CPU cache compression. In: Proc. ICDE 2006 (2006)
Google Scholar
Yan, H., Ding, S., Suel, T.: Inverted index compression and query processing with optimized document ordering. In: Proc. WWW 2009 (2009)
Google Scholar
Lemire, D., Boytsov, L.: Decoding billions of integers per second through vectorization. Software: Practice and Experience (2013)
Google Scholar
Delbru, R., Campinas, S., Samp, K., Tummarello, G.: Adaptive frame of reference for compressing inverted lists. Technical Report 2010-12-16, DERI (2010)
Google Scholar
Ounis, I., Amati, G., Plachouras, V., He, B., Macdonald, C., Lioma, C.: Terrier: A High Performance and Scalable IR Platform. In: Proc. OSIR 2006 (2006)
Google Scholar
Wang, L., Lin, J., Metzler, D.: Learning to efficiently rank. In: Proc. SIGIR 2010 (2010)
Google Scholar
Shurman, E., Brutlag, J.: Performance related changes and their user impacts. In: Velocity: Web Performance and Operations Conference (2009)
Google Scholar
Broder, A.Z., Carmel, D., Herscovici, M., Soffer, A., Zien, J.: Efficient query evaluation using a two-level retrieval process. In: Proc. CIKM 2003 (2003)
Google Scholar
Cambazoglu, B.B., Zaragoza, H., Chapelle, O., Chen, J., Liao, C., Zheng, Z., Degenhardt, J.: Early exit optimizations for additive machine learned ranking systems. In: Proc. WSDM 2010 (2010)
Google Scholar
Wang, L., Lin, J., Metzler, D.: A cascade ranking model for efficient ranked retrieval. In: Proc. SIGIR 2011 (2011)
Google Scholar
Macdonald, C., Santos, R.L., Ounis, I., He, B.: About learning models with multiple query dependent features. Trans. Info. Sys. 13(3) (2013)
Google Scholar
Williams, H.E., Zobel, J.: Compressing integers for fast file access. The Computer Journal 42 (1999)
Google Scholar
Scholer, F., Williams, H.E., Yiannis, J., Zobel, J.: Compression of inverted indexes for fast query evaluation. In: Proc. SIGIR 2002 (2002)
Google Scholar
Golomb, S.: Run-length encodings. Trans. Infor. Theory 12(3) (1966)
Google Scholar
Rice, R., Plaunt, J.: Adaptive variable-length coding for efficient compression of spacecraft television data. Trans. Communication Technology 19(6) (1971)
Google Scholar
Anh, V.N., Moffat, A.: Inverted index compression using word-aligned binary codes. Inf. Retr. 8(1) (2005)
Google Scholar
Goldstein, J., Ramakrishnan, R., Shaft, U.: Compressing relations and indexes. In: Proc. ICDE 1998 (1998)
Google Scholar
Zhang, J., Long, X., Suel, T.: Performance of compressed inverted list caching in search engines. In: Proc. WWW 2008 (2008)
Google Scholar
Zhang, J., Suel, T.: Efficient search in large textual collections with redundancy. In: Proc. WWW 2007 (2007)
Google Scholar

Download references

Author information

Authors and Affiliations

GSSI - Gran Sasso Science Institute, INFN, Viale F. Crispi 7, 67100, L’Aquila, Italy
Matteo Catena
School of Computing Science, University of Glasgow, Glasgow, G12 8QQ, UK
Craig Macdonald & Iadh Ounis

Authors

Matteo Catena
View author publications
You can also search for this author in PubMed Google Scholar
Craig Macdonald
View author publications
You can also search for this author in PubMed Google Scholar
Iadh Ounis
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Maarten de Rijke & Tom Kenter &
Centrum Wiskunde en Informatica, Amsterdam, The Netherlands and Delft University of Technology, Delft, The Netherlands
Arjen P. de Vries
University of Illinois at Urbana-Champaign, Urbana, IL, USA
ChengXiang Zhai
University of Twente, Twente, The Netheralnds and Erasmus University Rotterdam, Rotterdam, The Netherlands
Franciska de Jong
SalesPredict, Haifa, Israel
Kira Radinsky
Microsoft Research, Cambridge, UK
Katja Hofmann

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Catena, M., Macdonald, C., Ounis, I. (2014). On Inverted Index Compression for Search Engine Efficiency. In: de Rijke, M., et al. Advances in Information Retrieval. ECIR 2014. Lecture Notes in Computer Science, vol 8416. Springer, Cham. https://doi.org/10.1007/978-3-319-06028-6_30

Download citation

DOI: https://doi.org/10.1007/978-3-319-06028-6_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06027-9
Online ISBN: 978-3-319-06028-6
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

On Inverted Index Compression for Search Engine Efficiency

Abstract

Chapter PDF

Similar content being viewed by others

An Experimental Study of Index Compression and DAAT Query Processing Methods

Pre-indexing Pruning Strategies

Enhanced FastPFOR for Inverted Index Compression

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

On Inverted Index Compression for Search Engine Efficiency

Abstract

Chapter PDF

Similar content being viewed by others

An Experimental Study of Index Compression and DAAT Query Processing Methods

Pre-indexing Pruning Strategies

Enhanced FastPFOR for Inverted Index Compression

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation