Abstract
Tag clouds have become popular on the Internet as a way to give a quick overview of the content of a website or a text. We introduce a new visualisation that displays more information: the tree cloud. Like a word cloud, it shows the most frequent words of the text, with each word's size reflecting its frequency, but the words are arranged on a tree to reflect their semantic proximity according to the text. Such tree clouds help identify the main topics of a document and can even be used for text analysis. We also provide methods to evaluate the quality of the resulting tree cloud and of some key steps in its construction. Our algorithms are implemented in the free software TreeCloud, available at http://www.treecloud.org.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gambette, P., Véronis, J. (2010). Visualising a Text with a Tree Cloud. In: Locarek-Junge, H., Weihs, C. (eds) Classification as a Tool for Research. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10745-0_61
DOI: https://doi.org/10.1007/978-3-642-10745-0_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10744-3
Online ISBN: 978-3-642-10745-0
eBook Packages: Mathematics and Statistics (R0)