Abstract
A wrapper is a program that extracts data from a web site and reorganizes it in a database. Generating wrappers from web sites is a key technique for realizing a metasearch system. We present a new method of automatic wrapper generation for metasearch that uses our efficient learning algorithm for term trees. Term trees are ordered tree-structured patterns with structured variables, which represent structural features common to tree-structured data such as HTML files.
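To make the idea of a tree-structured pattern with variables concrete, the following is a minimal sketch in Python. It is not the learning algorithm of the paper and it simplifies term trees considerably: a variable here is a leaf whose label starts with "$" and which matches exactly one subtree, and each variable is assumed to occur once in the pattern. The names Node, match, text_of and extract are hypothetical and chosen only for illustration. The sketch shows how a pattern, once obtained, could act as a wrapper by binding its variables to the parts of a search-result record.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """An ordered tree node; text leaves carry their text as the label."""
    label: str
    children: List["Node"] = field(default_factory=list)


def match(pattern: Node, tree: Node) -> Optional[Dict[str, Node]]:
    """Match an ordered pattern against a tree.

    Simplifying assumption: a pattern node whose label starts with "$"
    is a variable that matches any single subtree. All other nodes must
    agree in label and in the number and order of their children.
    Returns the variable bindings, or None if the tree does not match.
    """
    bindings: Dict[str, Node] = {}

    def walk(p: Node, t: Node) -> bool:
        if p.label.startswith("$"):
            bindings[p.label] = t
            return True
        if p.label != t.label or len(p.children) != len(t.children):
            return False
        return all(walk(pc, tc) for pc, tc in zip(p.children, t.children))

    return bindings if walk(pattern, tree) else None


def text_of(node: Node) -> str:
    """Concatenate the leaf labels of a subtree, left to right."""
    if not node.children:
        return node.label
    return " ".join(text_of(child) for child in node.children)


def extract(pattern: Node, tree: Node) -> Optional[Dict[str, str]]:
    """Apply the pattern as a wrapper: return the text bound to each variable."""
    bindings = match(pattern, tree)
    if bindings is None:
        return None
    return {name: text_of(subtree) for name, subtree in bindings.items()}


# Hypothetical pattern for one search result: <li><a>$title</a>$snippet</li>
pattern = Node("li", [Node("a", [Node("$title")]), Node("$snippet")])

# Hypothetical record taken from a parsed result page.
record = Node("li", [Node("a", [Node("Term trees")]),
                     Node("Ordered tree patterns")])

fields = extract(pattern, record)
```

With the record above, extract binds $title to the text inside the link and $snippet to the text that follows it. A tree with a different shape, such as a lone div node, yields None, which is how such a wrapper would skip page fragments that are not result records.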
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aikou, K., Suzuki, Y., Shoudai, T., Miyahara, T. (2004). Automatic Wrapper Generation for Metasearch Using Ordered Tree Structured Patterns. In: Webb, G.I., Yu, X. (eds) AI 2004: Advances in Artificial Intelligence. AI 2004. Lecture Notes in Computer Science, vol 3339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30549-1_97
DOI: https://doi.org/10.1007/978-3-540-30549-1_97
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24059-4
Online ISBN: 978-3-540-30549-1
eBook Packages: Computer Science, Computer Science (R0)