Abstract
The relational database model is widely used in real applications. We propose a way of complementing such a database with an XML data warehouse. The approach is generic and driven by a domain ontology. The XML data warehouse is built from data extracted from the Web, which are semantically tagged using terms belonging to the domain ontology. The semantic tagging is fuzzy: instead of tagging a value of a Web document with a single term of the domain ontology, we use tags expressed as a possibility distribution over a set of possible terms, each term being weighted by a possibility degree. The querying of the XML data warehouse is also fuzzy: end-users can express their preferences by means of fuzzy selection criteria. We present our approach on a first application domain, predictive microbiology.
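As an illustration of the two fuzzy notions named in the abstract, the following minimal sketch represents a fuzzy tag as a possibility distribution over ontology terms and compares it with a fuzzy selection criterion. It uses the standard possibility and necessity measures from possibility theory; whether the article uses exactly these measures is an assumption. The function names, term names, and degrees are hypothetical and chosen only for the example.

```python
from typing import Dict

# A fuzzy set over ontology terms: term -> degree in [0, 1].
# Terms that are absent are treated as having degree 0.
FuzzySet = Dict[str, float]


def possibility_of_match(tag: FuzzySet, preference: FuzzySet) -> float:
    """Possibility that the tagged value satisfies the preference:
    the maximum over terms of min(possibility degree, preference degree)."""
    return max(
        (min(degree, preference.get(term, 0.0)) for term, degree in tag.items()),
        default=0.0,
    )


def necessity_of_match(tag: FuzzySet, preference: FuzzySet) -> float:
    """Necessity that the tagged value satisfies the preference:
    the minimum over terms of max(1 - possibility degree, preference degree).
    Terms outside the tag have possibility 0 and cannot lower the minimum."""
    return min(
        (max(1.0 - degree, preference.get(term, 0.0)) for term, degree in tag.items()),
        default=1.0,
    )


# Hypothetical fuzzy tag attached to one value of a Web document:
# each candidate ontology term is weighted by a possibility degree.
tag: FuzzySet = {
    "Listeria monocytogenes": 1.0,
    "Listeria innocua": 0.6,
    "Salmonella": 0.2,
}

# Hypothetical fuzzy selection criterion expressing end-user preferences.
preference: FuzzySet = {
    "Listeria monocytogenes": 1.0,
    "Listeria innocua": 0.5,
}

print(possibility_of_match(tag, preference))  # 1.0
print(necessity_of_match(tag, preference))    # 0.5
```

In this sketch the match is fully possible, because the most plausible term is also fully preferred, but only partly certain, because less preferred terms remain possible for the tagged value.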
About this article
Cite this article
Buche, P., Dibie-Barthélemy, J., Haemmerlé, O. et al. Fuzzy semantic tagging and flexible querying of XML documents extracted from the Web. J Intell Inf Syst 26, 25–40 (2006). https://doi.org/10.1007/s10844-006-5449-8
DOI: https://doi.org/10.1007/s10844-006-5449-8