Abstract
In this paper we present the design goals and implementation outline of Carrot2, an open source framework for rapid development of applications dealing with Web Information Retrieval and Web Mining. The framework was written from scratch with flexibility and processing efficiency in mind. We show two software architectures that address these two requirements and provide evidence of their use in clustering search results.
We also discuss the importance and advantages of contributing the results of scientific projects to, and integrating them with, the open source community.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Osiński, S., Weiss, D. (2005). Carrot2: Design of a Flexible and Efficient Web Information Retrieval Framework. In: Szczepaniak, P.S., Kacprzyk, J., Niewiadomski, A. (eds) Advances in Web Intelligence. AWIC 2005. Lecture Notes in Computer Science, vol 3528. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11495772_68
DOI: https://doi.org/10.1007/11495772_68
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26219-0
Online ISBN: 978-3-540-31900-9
eBook Packages: Computer Science, Computer Science (R0)