Abstract
This paper discusses automatic ontology construction in a digital library. Traditional automatic ontology construction uses hierarchical clustering to group similar terms, and the resulting hierarchy often does not match how people recognize and organize concepts. Human-provided knowledge networks present strong semantic features, but building them by hand is labor-intensive and becomes inconsistent at large scale. The method proposed in this paper combines statistical correction with latent topic extraction over the textual data in a digital library to produce a semantically oriented, OWL-based ontology. The experimental document collection is the Chinese Recorder, which served as a link between the various missions that were part of the rise and heyday of the Western effort to Christianize the Far East. We describe the ontology construction process and present the final ontology in OWL format.
Keywords
- Digital Library
- Latent Dirichlet Allocation
- Latent Semantic Analysis
- Domain Ontology
- Optical Character Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yeh, Jh., Yang, N. (2008). Ontology Construction Based on Latent Topic Extraction in a Digital Library. In: Buchanan, G., Masoodian, M., Cunningham, S.J. (eds) Digital Libraries: Universal and Ubiquitous Access to Information. ICADL 2008. Lecture Notes in Computer Science, vol 5362. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89533-6_10
DOI: https://doi.org/10.1007/978-3-540-89533-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89532-9
Online ISBN: 978-3-540-89533-6
eBook Packages: Computer Science (R0)