Abstract
Ontologies are an integral component of the Semantic Web. Current methods for learning and enhancing ontologies need further improvement to overcome the knowledge acquisition bottleneck, and identifying concepts and relations with only minimal user interaction is still a challenging objective. Current approaches to extracting semantics often apply association rules or clustering to regular flat text. In this paper we describe an approach to extracting semantics from Web document collections that takes advantage of the semi-structured content of XHTML Web documents (XHTML is an XML dialect that can be obtained from traditional HTML documents).
The XTREEM (Xhtml TREE Mining) method uses structural information, namely the mark-up in Web content, as an indicator of term boundaries and of co-hyponymy relations.
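The abstract does not spell out XTREEM's actual procedure, so the following is only a minimal sketch of the underlying idea under our own assumptions: every tag boundary ends a text run, short multi-word runs are counted as multi-term candidates, and the items of one XHTML list (`ul`/`ol`) are counted as co-hyponym candidates. The names `MAX_TERM_WORDS`, `LIST_TAGS`, `text_segments`, `sibling_groups`, `analyse` and `SAMPLE_DOCS`, as well as the word-count threshold, are hypothetical and not taken from the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

MAX_TERM_WORDS = 4          # hypothetical cut-off: longer runs are not treated as terms
LIST_TAGS = {"ul", "ol"}    # assumption: list items are sibling concepts (co-hyponyms)


def local_name(tag):
    """Strip an XML namespace such as {http://www.w3.org/1999/xhtml}."""
    return tag.rsplit("}", 1)[-1]


def text_segments(elem):
    """Yield whitespace-normalised text runs, split at every tag boundary."""
    if elem.text and elem.text.strip():
        yield " ".join(elem.text.split())
    for child in elem:
        yield from text_segments(child)
        if child.tail and child.tail.strip():
            yield " ".join(child.tail.split())


def sibling_groups(root):
    """Yield the short, lower-cased item texts of each XHTML list."""
    for elem in root.iter():
        if local_name(elem.tag) not in LIST_TAGS:
            continue
        items = []
        for child in elem:
            if local_name(child.tag) == "li":
                text = " ".join("".join(child.itertext()).split())
                if text and len(text.split()) <= MAX_TERM_WORDS:
                    items.append(text.lower())
        if len(items) > 1:
            yield items


def analyse(documents):
    """Count multi-word term candidates and co-hyponym pairs over a collection."""
    term_counts, pair_counts = Counter(), Counter()
    for doc in documents:
        root = ET.fromstring(doc)  # XHTML must be well-formed XML
        for seg in text_segments(root):
            if 2 <= len(seg.split()) <= MAX_TERM_WORDS:
                term_counts[seg.lower()] += 1
        for group in sibling_groups(root):
            # Unordered pairs, sorted so (a, b) and (b, a) share one key.
            for a, b in combinations(sorted(set(group)), 2):
                pair_counts[(a, b)] += 1
    return term_counts, pair_counts


# Hypothetical toy collection of two XHTML documents.
SAMPLE_DOCS = [
    """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1>Machine learning</h1>
<ul><li>decision trees</li><li>neural networks</li><li>support vector machines</li></ul>
</body></html>""",
    """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<p>Popular methods include <b>neural networks</b>.</p>
<ol><li>decision trees</li><li>neural networks</li></ol>
</body></html>""",
]

if __name__ == "__main__":
    terms, pairs = analyse(SAMPLE_DOCS)
    print(terms.most_common(3))
    print(pairs.most_common(3))
```

On the toy collection, "neural networks" is delimited by mark-up three times (two list items and one `<b>` element), and the pair ("decision trees", "neural networks") co-occurs in two different lists, so frequency over a collection separates recurring candidates from one-off runs such as "popular methods include". Treating any element boundary as a term boundary is the simplest possible choice; a fuller treatment would likely weight tags differently.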
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brunzel, M., Spiliopoulou, M. (2006). Discovering Multi Terms and Co-hyponymy from XHTML Documents with XTREEM. In: Nayak, R., Zaki, M.J. (eds) Knowledge Discovery from XML Documents. KDXD 2006. Lecture Notes in Computer Science, vol 3915. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11730262_5
DOI: https://doi.org/10.1007/11730262_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33180-3
Online ISBN: 978-3-540-33181-0
eBook Packages: Computer Science, Computer Science (R0)