Abstract
We apply text categorization techniques to astronomy in order to resolve semantic ambiguities between table column names. Astronomers often name table columns freely, so columns that describe the same attribute of sky objects can carry different names, which makes data analysis across tables difficult. To address this problem, a standard vocabulary called “unified content descriptors (UCDs)” has been defined. Once columns are assigned to the predefined UCDs, the reported data about sky objects can be analyzed much more easily. In this paper, the widely used Rocchio categorization algorithm is implemented to assign UCDs. We implement an algorithm that extracts domain-specific semantics for text indexing, and we extend the traditional cosine-based category score model by incorporating domain knowledge. Experiments show that the Rocchio algorithm combined with the proposed category score model performs well.
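To make the approach concrete, the following is a minimal sketch of a standard Rocchio classifier with TF-IDF indexing and a plain cosine category score, applied to column descriptions. It is not the implementation from the paper: the domain-specific indexing step and the extended score model are not reproduced here, the training descriptions are invented, the labels are only illustrative UCD-style names, and the function names and the weights `beta=16`, `gamma=4` are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict

# Invented toy training set: (column description, illustrative UCD-style label).
TRAINING = [
    ("right ascension j2000", "pos.eq.ra"),
    ("ra right ascension in degrees", "pos.eq.ra"),
    ("declination j2000", "pos.eq.dec"),
    ("dec declination in degrees", "pos.eq.dec"),
    ("visual magnitude", "phot.mag"),
    ("apparent magnitude in v band", "phot.mag"),
]

def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for real text indexing)."""
    return text.lower().split()

def build_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(tokens, idf):
    """L2-normalized TF-IDF vector as a sparse dict; unknown terms are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: f * idf[t] for t, f in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def train_rocchio(examples, beta=16.0, gamma=4.0):
    """Build one prototype per category: beta * mean(pos) - gamma * mean(neg)."""
    docs = [tokenize(text) for text, _ in examples]
    idf = build_idf(docs)
    vectors = [tfidf(d, idf) for d in docs]
    labels = [label for _, label in examples]
    prototypes = {}
    for cat in set(labels):
        pos = [v for v, l in zip(vectors, labels) if l == cat]
        neg = [v for v, l in zip(vectors, labels) if l != cat]
        proto = defaultdict(float)
        for v in pos:
            for t, w in v.items():
                proto[t] += beta * w / len(pos)
        for v in neg:
            for t, w in v.items():
                proto[t] -= gamma * w / len(neg)
        # Keep only positive weights, a common Rocchio convention.
        prototypes[cat] = {t: w for t, w in proto.items() if w > 0}
    return idf, prototypes

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(text, idf, prototypes):
    """Return the best-scoring category and the full score table."""
    vec = tfidf(tokenize(text), idf)
    scores = {cat: cosine(vec, p) for cat, p in prototypes.items()}
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    idf, prototypes = train_rocchio(TRAINING)
    label, scores = classify("right ascension of the source", idf, prototypes)
    print(label, scores)
```

In this sketch each category is represented by a single prototype vector, so assigning a label to a new column costs one cosine computation per category. A score model that incorporates domain knowledge, as the abstract describes, would replace or adjust the `cosine` step.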
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Kou, H., Napoli, A., Toussaint, Y. (2005). Application of Text Categorization to Astronomy Field. In: Montoyo, A., Muñoz, R., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2005. Lecture Notes in Computer Science, vol 3513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11428817_4
DOI: https://doi.org/10.1007/11428817_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26031-8
Online ISBN: 978-3-540-32110-1
eBook Packages: Computer Science (R0)