Abstract
We present Metamorph, a system and framework for generating vertical deep Web search engines in a knowledge-based way. The approach separates the role of a more highly skilled ontology engineer from that of a less skilled service engineer, who adds new web sources in an intuitive, semi-automatic manner using the proven Lixto suite. One part of the framework is the understanding process for complex web search forms, which generates an ontological representation of each form and its intrinsic run-time dependencies. Based on these representations, a unified meta form is created, together with matchings from the meta form to the individual search forms and vice versa, taking into account different form element types, contents and labels. We discuss several aspects of the Metamorph ontology, which focuses especially on the interaction semantics of web forms, and give a short account of our semi-automatic tagging system.
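To make the idea of matching a meta form to an individual search form more concrete, the following is a minimal sketch only. It is not Metamorph's ontology-based method: it pairs fields purely by label similarity and ignores element types and contents, which the abstract says the real system also considers. The names `FormField`, `label_similarity`, `match_fields` and the threshold of 0.6 are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class FormField:
    """Hypothetical, simplified view of one form element."""
    label: str
    kind: str            # e.g. "text" or "select"
    options: tuple = ()  # allowed values for select-like elements


def label_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_fields(meta_form, source_form, threshold=0.6):
    """Map each meta form label to the most similar source form label.

    Meta fields with no source label reaching the threshold are left
    unmatched. The threshold is an assumed value, not taken from the paper.
    """
    matching = {}
    for meta in meta_form:
        best, best_score = None, threshold
        for field in source_form:
            score = label_similarity(meta.label, field.label)
            if score >= best_score:
                best, best_score = field, score
        if best is not None:
            matching[meta.label] = best.label
    return matching


# Invented example data: a two-field meta form and one source form.
meta_form = [
    FormField("Departure city", "text"),
    FormField("Price", "text"),
]
source_form = [
    FormField("Departure City:", "text"),
    FormField("Max price", "select", ("100", "200", "500")),
]

matching = match_fields(meta_form, source_form)
for meta_label, source_label in matching.items():
    print(f"{meta_label} -> {source_label}")
```

In this invented example the "Price" text field is paired with a "Max price" select element, which hints at why element types and contents matter: a free-text value entered in the meta form would still have to be translated into one of the select options before the source form could be submitted.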
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Holzinger, W., Krüpl, B., Baumgartner, R. (2009). Automated Ontology-Driven Metasearch Generation with Metamorph. In: Vossen, G., Long, D.D.E., Yu, J.X. (eds) Web Information Systems Engineering - WISE 2009. WISE 2009. Lecture Notes in Computer Science, vol 5802. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04409-0_46
DOI: https://doi.org/10.1007/978-3-642-04409-0_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04408-3
Online ISBN: 978-3-642-04409-0
eBook Packages: Computer Science, Computer Science (R0)