Abstract
Several data quality management (DQM) tasks, such as duplicate detection and consistency checking, depend on domain-specific knowledge. Many DQM approaches have the potential to bring together domain knowledge and DQM metadata. We provide an approach that uses this knowledge as modeled in ontologies instead of acquiring it through cost-intensive interviews with domain experts. These ontologies can be annotated directly with DQM-specific metadata. With our approach, a synergy effect can be achieved between modeling a domain ontology (e.g., to define a shared vocabulary for improved interoperability) and performing DQM. We present five DQM applications that directly use knowledge provided by domain ontologies. These applications use the ontology structure itself to provide correction suggestions for invalid data, to identify duplicates, and to store data quality annotations at the schema and instance levels.
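To make the idea of ontology-driven correction suggestions concrete, the following is a minimal sketch and not the chapter's implementation. It assumes a toy "ontology" reduced to a plain dictionary that maps a concept to its valid instance labels; the concept name `TumourSite`, its labels, the function `suggest_corrections`, and the use of string similarity from Python's standard `difflib` module are all illustrative assumptions.

```python
from difflib import get_close_matches

# Hypothetical toy ontology: each concept maps to the labels of its valid
# instances. A real domain ontology (e.g. in OWL) carries far more structure.
ONTOLOGY = {
    "TumourSite": ["lung", "breast", "colon", "prostate"],
}


def suggest_corrections(concept, value, ontology=ONTOLOGY, cutoff=0.6):
    """Return candidate replacements for `value` under `concept`.

    An empty list means the value is already valid, or that no valid
    label is similar enough to be offered as a suggestion.
    """
    valid_labels = ontology[concept]
    if value in valid_labels:
        return []  # value is consistent with the ontology
    # Assumption: plain string similarity stands in for whatever
    # domain-aware matching a production DQM tool would use.
    return get_close_matches(value, valid_labels, n=3, cutoff=cutoff)


if __name__ == "__main__":
    print(suggest_corrections("TumourSite", "lugn"))
    print(suggest_corrections("TumourSite", "lung"))
```

In this sketch the valid values come from the domain model itself rather than from a separately maintained rule list, which is the synergy the abstract points to; the similarity measure and cutoff are arbitrary choices that would need tuning per domain.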
Keywords
- Domain Ontology
- Aggregation Layer
- International Electrotechnical Commission
- Consistency Constraint
- Instance Level
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Brüggemann, S., Grüning, F. (2009). Using Ontologies Providing Domain Knowledge for Data Quality Management. In: Pellegrini, T., Auer, S., Tochtermann, K., Schaffert, S. (eds) Networked Knowledge - Networked Media. Studies in Computational Intelligence, vol 221. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02184-8_13
DOI: https://doi.org/10.1007/978-3-642-02184-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02183-1
Online ISBN: 978-3-642-02184-8
eBook Packages: Engineering, Engineering (R0)