EntityManager: An Entity-Based Dirty Data Management System

Wang, Hongzhi; Liu, Xueli; Li, Jianzhong; Tong, Xing; Yang, Long; Li, Yakun

doi:10.1007/978-3-642-37450-0_38

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7826)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Dirty data exist in many systems, and efficient and effective management of such data is in demand. Because data cleaning may lose useful data and introduce new dirty data, we attempt to manage dirty data without cleaning it and to retrieve query results according to the quality requirements of users. Because the entity is the unit for understanding objects in the world, and much dirty data is caused by different descriptions of the same real-world entity, we propose EntityManager, a dirty data management system that uses the entity as its basic unit and keeps conflicts in the data as uncertain attributes. Although the query language is SQL, queries in our system have different semantics on dirty data. In the demonstration, we will show a new philosophy for managing dirty data around entities. We will present our prototype, which allows users to load dirty data and to query it according to their requirements.
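
The idea of keeping conflicts as uncertain attributes and answering queries against a quality requirement can be illustrated with a small sketch. The abstract does not describe the system's data model or query semantics, so everything below is an assumption made for illustration: the `Entity` class, the `select` function, the `min_confidence` parameter, and the use of simple agreement frequency as a confidence score are all hypothetical and are not taken from EntityManager.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """One real-world entity (hypothetical model, not the paper's)."""
    entity_id: str
    # attribute name -> {candidate value: number of descriptions using it}
    attributes: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def add_description(self, record: Dict[str, str]) -> None:
        """Merge one raw record for this entity, keeping conflicting values."""
        for name, value in record.items():
            counts = self.attributes.setdefault(name, {})
            counts[value] = counts.get(value, 0.0) + 1.0

    def confidence(self, name: str, value: str) -> float:
        """Assumed score: share of descriptions that agree on `value`."""
        counts = self.attributes.get(name, {})
        total = sum(counts.values())
        return counts.get(value, 0.0) / total if total else 0.0


def select(entities: List[Entity], name: str, value: str,
           min_confidence: float) -> List[str]:
    """Return ids of entities matching name = value at the required quality."""
    return [e.entity_id for e in entities
            if e.confidence(name, value) >= min_confidence]


# Three conflicting descriptions of the same person are kept, not cleaned.
e1 = Entity("e1")
e1.add_description({"name": "J. Smith", "city": "Harbin"})
e1.add_description({"name": "John Smith", "city": "Harbin"})
e1.add_description({"name": "John Smith", "city": "Beijing"})

e2 = Entity("e2")
e2.add_description({"name": "A. Jones", "city": "Beijing"})

# A strict quality requirement excludes e1, whose city is only 1/3 "Beijing".
print(select([e1, e2], "city", "Beijing", 0.5))
# A looser requirement admits both entities.
print(select([e1, e2], "city", "Beijing", 0.3))
```

In this sketch the same selection returns different answers depending on the quality requirement, which is one way to read the abstract's statement that SQL queries take on different semantics over dirty data.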

Author information

Authors and Affiliations

Harbin Institute of Technology, China
Hongzhi Wang, Xueli Liu, Jianzhong Li, Xing Tong, Long Yang & Yakun Li

Editor information

Editors and Affiliations

Department of Computer Science, Binghamton University, 13902, Binghamton, NY, USA
Weiyi Meng
Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Ling Feng
Department of Computer Science, National University of Singapore, 117417, Singapore
Stéphane Bressan
Research Group Data Analytics and Computing, University of Vienna, 1090, Vienna, Austria
Werner Winiwarter
School of Computer, Wuhan University, 430072, Wuhan, China
Wei Song

About this paper

Cite this paper

Wang, H., Liu, X., Li, J., Tong, X., Yang, L., Li, Y. (2013). EntityManager: An Entity-Based Dirty Data Management System. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds) Database Systems for Advanced Applications. DASFAA 2013. Lecture Notes in Computer Science, vol 7826. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37450-0_38

DOI: https://doi.org/10.1007/978-3-642-37450-0_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37449-4
Online ISBN: 978-3-642-37450-0
eBook Packages: Computer Science, Computer Science (R0)
