Discovering Multi Terms and Co-hyponymy from XHTML Documents with XTREEM

Brunzel, Marko; Spiliopoulou, Myra

doi:10.1007/11730262_5

Marko Brunzel & Myra Spiliopoulou

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3915)

Included in the following conference series:

International Workshop on Knowledge Discovery from XML Documents

Abstract

The Semantic Web needs ontologies as an integral component. Current methods for learning and enhancing ontologies need further improvement to overcome the knowledge acquisition bottleneck. Identifying concepts and relations with only minimal user interaction is still a challenging objective. Current approaches to extracting semantics often apply association rules or clustering to regular flat text. In this paper, we describe an approach to extracting semantics from Web document collections that takes advantage of the semi-structured content of XHTML Web documents (XHTML is an XML dialect that can be obtained from traditional HTML documents).
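
Because XHTML is well-formed XML, an XHTML page can be read with a standard XML parser, and the text it contains splits naturally at the tags. The short Python sketch (standard library only) illustrates this; the function name `text_segments` and the sample page are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET


def text_segments(xhtml: str) -> list[str]:
    """Return the non-empty text fragments of an XHTML document in document order.

    Hypothetical helper: each fragment is text lying between two tags, so the
    mark-up acts as a boundary between fragments.
    """
    root = ET.fromstring(xhtml)  # XHTML is well-formed XML, so a plain XML parser suffices
    # itertext() yields element text and tail text in document order;
    # whitespace-only fragments are dropped and internal whitespace is collapsed.
    return [" ".join(t.split()) for t in root.itertext() if t.strip()]


page = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
<h1>Programming languages</h1>
<ul><li>Java</li><li>Python</li><li>Standard ML</li></ul>
</body></html>"""
print(text_segments(page))
# ['Programming languages', 'Java', 'Python', 'Standard ML']
```

Note that inline formatting such as `<b>` also splits fragments (`the <b>Semantic</b> Web` yields three fragments), which a practical system would likely need to handle differently from block-level or list mark-up.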

The XTREEM (Xhtml TREE Mining) method uses structural information, namely the mark-up in Web content, as an indicator of term boundaries and of co-hyponymy relations.
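
The abstract does not detail the XTREEM algorithm, so the following Python sketch only illustrates the two signals it names, using hypothetical heuristics: multi-word text fragments delimited by mark-up that recur across documents are reported as candidate multi terms, and the texts of sibling list items (`li`, `option`) under one parent are grouped as candidate co-hyponyms. The function names, the choice of item tags and the `min_docs` threshold are assumptions, not the paper's method.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop an XML namespace prefix such as '{http://www.w3.org/1999/xhtml}'."""
    return tag.rsplit("}", 1)[-1]


def normalise(text: str) -> str:
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


def cohyponym_candidates(xhtml: str, item_tags=("li", "option")) -> list[list[str]]:
    """Group the texts of same-tag sibling items that share one parent element.

    Hypothetical heuristic: items of the same list are candidate co-hyponyms.
    """
    root = ET.fromstring(xhtml)
    groups = []
    for parent in root.iter():
        by_tag = {}
        for child in parent:
            name = local_name(child.tag)
            if name not in item_tags:
                continue
            text = normalise("".join(child.itertext()))
            if text:
                by_tag.setdefault(name, []).append(text)
        # A single item has no siblings, so it cannot form a co-hyponym group.
        groups.extend(g for g in by_tag.values() if len(g) >= 2)
    return groups


def multi_term_candidates(documents: list[str], min_docs: int = 2) -> list[str]:
    """Return multi-word, mark-up-delimited fragments found in >= min_docs documents."""
    doc_freq = Counter()
    for xhtml in documents:
        root = ET.fromstring(xhtml)
        # A set, so each fragment counts at most once per document.
        fragments = {normalise(t) for t in root.itertext() if t.strip()}
        doc_freq.update(f for f in fragments if len(f.split()) >= 2)
    return sorted(f for f, n in doc_freq.items() if n >= min_docs)


page_a = ('<html xmlns="http://www.w3.org/1999/xhtml"><body><h2>Languages</h2>'
          '<ul><li>Java</li><li>Python</li><li>Standard ML</li></ul></body></html>')
page_b = ('<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Standard ML</p>'
          '<ul><li>Haskell</li><li>OCaml</li></ul></body></html>')
print(cohyponym_candidates(page_a))             # [['Java', 'Python', 'Standard ML']]
print(multi_term_candidates([page_a, page_b]))  # ['Standard ML']
```

Counting document frequency rather than raw occurrences keeps a fragment that is repeated many times on a single page from being promoted to a multi-term candidate on the strength of that page alone.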


Author information

Authors and Affiliations

Otto-von-Guericke-University Magdeburg, Germany
Marko Brunzel & Myra Spiliopoulou

Editor information

Editors and Affiliations

Information Technology, Queensland University of Technology, Brisbane, Australia
Richi Nayak
Computer Science Department, Rensselaer Polytechnic Institute, USA
Mohammed J. Zaki

About this paper

Cite this paper

Brunzel, M., Spiliopoulou, M. (2006). Discovering Multi Terms and Co-hyponymy from XHTML Documents with XTREEM. In: Nayak, R., Zaki, M.J. (eds) Knowledge Discovery from XML Documents. KDXD 2006. Lecture Notes in Computer Science, vol 3915. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11730262_5

DOI: https://doi.org/10.1007/11730262_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33180-3
Online ISBN: 978-3-540-33181-0
eBook Packages: Computer Science, Computer Science (R0)
