Abstract
The literature provides many techniques for inferring rules that can be used to configure web information extractors. Unfortunately, these techniques have been developed independently, which makes their results very difficult to compare: there is not even a common collection of datasets on which they can be assessed. Furthermore, there is no common infrastructure for implementing these techniques, which makes implementing them costly. In this paper, we propose a framework that helps software engineers implement their techniques and compare the results. Such a framework allows techniques to be compared side by side, and our experiments show that it helps reduce development costs.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sleiman, H.A., Corchuelo, R. (2012). Information Extraction Framework. In: Rodríguez, J., Pérez, J., Golinska, P., Giroux, S., Corchuelo, R. (eds) Trends in Practical Applications of Agents and Multiagent Systems. Advances in Intelligent and Soft Computing, vol 157. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28795-4_18
DOI: https://doi.org/10.1007/978-3-642-28795-4_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28794-7
Online ISBN: 978-3-642-28795-4
eBook Packages: Engineering, Engineering (R0)