Automatic Legal Document Analysis: Improving the Results of Information Extraction Processes Using an Ontology

Buey, María G.; Roman, Cristian; Garrido, Angel Luis; Bobed, Carlos; Mena, Eduardo

doi:10.1007/978-3-319-77604-0_24


Part of the book series: Studies in Big Data (SBD, volume 40)


Abstract

Information Extraction (IE) is a pervasive task in industry that makes it possible to automatically obtain structured data from natural-language documents. Current software systems focused on this activity are able to extract a large percentage of the required information, but they do not usually focus on the quality of the extracted data. In this paper we present an approach focused on validating and improving the quality of the results of an IE system. Our proposal is based on the use of ontologies, which store domain knowledge and which we leverage to detect and solve consistency errors in the extracted data. We have implemented our approach to run against the output of the AIS system, an IE system specialized in analyzing legal documents, and we have tested it using a real dataset. Preliminary results confirm the interest of our approach.
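The abstract describes using ontology-stored domain knowledge to detect and fix consistency errors in extracted data. The following is a minimal sketch of that general idea under heavily simplified assumptions. The ontology is reduced to a hypothetical Python dictionary (`ONTOLOGY`) that maps each class to its allowed properties, their expected value types, and whether they are mandatory. The class names, the property names, and the decimal-comma repair rule are invented for illustration. This is not the authors' implementation or an API of the AIS system; a real system would query an actual ontology rather than a dictionary.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PropertySpec:
    """Expected value type of a property and whether it is mandatory."""
    value_type: type
    required: bool = False


# Hypothetical stand-in for the domain ontology: class -> property -> spec.
ONTOLOGY: dict[str, dict[str, PropertySpec]] = {
    "Person": {
        "first_name": PropertySpec(str, required=True),
        "last_names": PropertySpec(tuple, required=True),
        "birth_date": PropertySpec(date),
    },
    "RealEstate": {
        "address": PropertySpec(str, required=True),
        "area_m2": PropertySpec(float),
    },
}


@dataclass
class Issue:
    entity_class: str
    prop: str
    message: str


def validate(entity_class: str, record: dict) -> list[Issue]:
    """Detect consistency errors in one extracted record."""
    schema = ONTOLOGY.get(entity_class)
    if schema is None:
        return [Issue(entity_class, "*", "class not defined in the ontology")]
    issues = []
    for prop, spec in schema.items():
        if prop not in record:
            if spec.required:
                issues.append(Issue(entity_class, prop, "missing mandatory property"))
        elif not isinstance(record[prop], spec.value_type):
            issues.append(Issue(entity_class, prop,
                                f"expected {spec.value_type.__name__}, "
                                f"got {type(record[prop]).__name__}"))
    # Properties the ontology does not define for this class are also flagged.
    for prop in record:
        if prop not in schema:
            issues.append(Issue(entity_class, prop, "property not defined for this class"))
    return issues


def repair(entity_class: str, record: dict) -> dict:
    """Apply simple automatic fixes; anything not fixable is left for review."""
    schema = ONTOLOGY.get(entity_class, {})
    fixed = dict(record)
    for prop, value in record.items():
        spec = schema.get(prop)
        if spec is None or isinstance(value, spec.value_type):
            continue
        # Illustrative rule: numbers extracted as text with a decimal comma ("85,5").
        if spec.value_type is float and isinstance(value, (int, str)):
            try:
                fixed[prop] = float(str(value).replace(",", "."))
            except ValueError:
                pass  # not parseable: keep the original value for manual review
    return fixed


if __name__ == "__main__":
    record = {"address": "Calle Mayor 1, Zaragoza", "area_m2": "85,5"}
    print(validate("RealEstate", record))  # one issue: area_m2 expected float, got str
    print(validate("RealEstate", repair("RealEstate", record)))  # no issues left
```

Separating detection (`validate`) from correction (`repair`) keeps every automatic fix auditable. Errors that no rule can fix stay visible for manual review.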


Notes

1. https://www.isyc.com/es/soluciones/oncustomer.html.
2. https://www.isyc.com.
3. https://www.tessi.fr.
4. http://dblab.cs.toronto.edu/project/xcurator/.
5. https://twitter.com/.
6. A Named Entity (NE) is a unique identifier of an entity in a text, e.g., "Marie Curie" is an NE of a person.
7. R2RML: RDB to RDF Mapping Language, https://www.w3.org/TR/r2rml/.
8. This example is directly taken from the experiments dataset. Proper names and specific data have been altered for reasons of privacy.
9. https://developers.google.com/maps/documentation/geocoding/start.
10. http://openrefine.org/.
11. https://github.com/OpenRefine/OpenRefine/wiki/General-Refine-Expression-Language.
12. In Spain, people have a first name, an optional middle name, and two mandatory last names: the first is the father's family name and the second is the mother's family name, although legally this order can be interchanged.
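Note 12 implies a simple consistency rule for person names extracted from Spanish documents. Two mentions may refer to the same person even when their last names appear in opposite order. The following is a minimal sketch of such a check. It assumes whitespace-separated names in which every part is a single token, so compound surnames such as "de la Fuente" are not handled. The functions `parse_spanish_name` and `same_person` are hypothetical helpers, not part of the described system.

```python
from typing import NamedTuple, Optional


class SpanishName(NamedTuple):
    first: str
    middle: Optional[str]
    surnames: tuple[str, str]


def parse_spanish_name(full_name: str) -> SpanishName:
    """Split 'First [Middle] Surname1 Surname2', assuming single-token parts."""
    tokens = full_name.split()
    if len(tokens) not in (3, 4):
        raise ValueError(f"expected 3 or 4 name tokens, got {len(tokens)}: {full_name!r}")
    middle = tokens[1] if len(tokens) == 4 else None
    return SpanishName(tokens[0], middle, (tokens[-2], tokens[-1]))


def same_person(a: str, b: str) -> bool:
    """True if two mentions match, allowing the two surnames to be swapped."""
    na, nb = parse_spanish_name(a), parse_spanish_name(b)
    if na.first.casefold() != nb.first.casefold():
        return False
    # A middle name missing from one mention is not treated as a conflict.
    if na.middle and nb.middle and na.middle.casefold() != nb.middle.casefold():
        return False
    sa = tuple(s.casefold() for s in na.surnames)
    sb = tuple(s.casefold() for s in nb.surnames)
    return sa == sb or sa == sb[::-1]


if __name__ == "__main__":
    print(same_person("Ana Pérez Ruiz", "ana ruiz pérez"))   # True
    print(same_person("Ana Pérez Ruiz", "Ana Pérez Gómez"))  # False
```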


Acknowledgements

This research work has been supported by projects TIN2013-46238-C4-4-R, TIN2016-78011-C4-3-R (AEI/FEDER, UE), and DGA/FEDER.

Author information

Authors and Affiliations

InSynergy Consulting S.A., Madrid, Spain
María G. Buey & Cristian Roman
Department of Computer Science and System Engineering, University of Zaragoza, Zaragoza, Spain
Angel Luis Garrido, Carlos Bobed & Eduardo Mena


Corresponding author

Correspondence to María G. Buey.

Editor information

Editors and Affiliations

Institute of Computer Science, Warsaw University of Technology, Warsaw, Poland
Robert Bembenik, Łukasz Skonieczny, Grzegorz Protaziuk, Marzena Kryszkiewicz & Henryk Rybinski


About this chapter

Cite this chapter

Buey, M.G., Roman, C., Garrido, A.L., Bobed, C., Mena, E. (2019). Automatic Legal Document Analysis: Improving the Results of Information Extraction Processes Using an Ontology. In: Bembenik, R., Skonieczny, Ł., Protaziuk, G., Kryszkiewicz, M., Rybinski, H. (eds) Intelligent Methods and Big Data in Industrial Applications. Studies in Big Data, vol 40. Springer, Cham. https://doi.org/10.1007/978-3-319-77604-0_24


DOI: https://doi.org/10.1007/978-3-319-77604-0_24
Published: 19 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-77603-3
Online ISBN: 978-3-319-77604-0
eBook Packages: Intelligent Technologies and Robotics (R0)
