D-Ocean: an unstructured data management system for data ocean environment

Zhuang, Yueting; Wang, Yaoguang; Shao, Jian; Chen, Ling; Lu, Weiming; Sun, Jianling; Wei, Baogang; Wu, Jiangqin

doi:10.1007/s11704-015-5045-6

Research Article
Published: 20 October 2015

Volume 10, pages 353–369 (2016)

Frontiers of Computer Science

Abstract

Along with the big data movement, many organizations collect their own big data and build distinctive applications. To provide smart services on top of big data, massive, varied data should be well linked and organized to form a Data Ocean, which places particular emphasis on deep exploration of the relationships among unstructured data to support smart services. Currently, almost all of these applications must handle unstructured data by integrating various analysis and search techniques on top of massive storage and processing infrastructure at the application level, which greatly increases the difficulty and cost of application development.

This paper presents D-Ocean, an unstructured data management system for the data ocean environment. D-Ocean has an open and scalable architecture consisting of a core platform, pluggable components and auxiliary tools. It exploits a unified storage framework to store data in different kinds of data stores, integrates batch and incremental processing mechanisms to process unstructured data, and provides a combined search engine to conduct compound queries. Furthermore, a process modeling approach called RAISE is proposed to support the whole process of Repository, Analysis, Index, Search and Environment modeling, which can greatly simplify application development. Experiments and use cases in production demonstrate the efficiency and usability of D-Ocean.
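The abstract does not say how the combined search engine merges the results of its component engines when answering a compound query. Purely as an illustration, one standard technique for this kind of top-k rank aggregation is the Threshold Algorithm (TA) of Fagin, Lotem and Naor. The sketch below is not D-Ocean's implementation; it assumes hypothetical sub-engines (for example, a text engine and an image-similarity engine) whose results can be read in descending score order (sorted access) and looked up by object id (random access), that the compound score is the sum of per-engine scores, and that an object missing from an engine scores 0 there.

```python
import heapq
from typing import Dict, List, Tuple


def threshold_algorithm(sources: List[Dict[str, float]], k: int) -> List[Tuple[str, float]]:
    """Top-k objects by summed score across sources (Fagin et al.'s TA).

    Each source is a hypothetical sub-engine result: object id -> score.
    Objects absent from a source are assumed to score 0.0 there.
    """
    if k <= 0:
        return []
    # Sorted-access views: (object_id, score) pairs in descending score order.
    sorted_lists = [sorted(s.items(), key=lambda kv: kv[1], reverse=True) for s in sources]
    depth = max((len(lst) for lst in sorted_lists), default=0)
    seen: Dict[str, float] = {}          # object id -> aggregated score
    top: List[Tuple[float, str]] = []    # min-heap of (score, id), at most k entries

    for row in range(depth):
        frontier = []  # score last read from each list at this depth
        for lst in sorted_lists:
            if row < len(lst):
                obj, score = lst[row]
                frontier.append(score)
                if obj not in seen:
                    # Random access: fetch this object's score from every source.
                    total = sum(src.get(obj, 0.0) for src in sources)
                    seen[obj] = total
                    if len(top) < k:
                        heapq.heappush(top, (total, obj))
                    elif total > top[0][0]:
                        heapq.heapreplace(top, (total, obj))
            else:
                # Exhausted list: any unseen object is absent there, so it scores 0.
                frontier.append(0.0)
        # Threshold: the best aggregate any not-yet-seen object could still reach.
        threshold = sum(frontier)
        if len(top) == k and top[0][0] >= threshold:
            break  # no unseen object can enter the top k

    return sorted(((obj, score) for score, obj in top), key=lambda p: (-p[1], p[0]))


if __name__ == "__main__":
    text_scores = {"doc1": 0.9, "doc2": 0.8, "doc3": 0.1}
    image_scores = {"doc2": 0.7, "doc3": 0.6, "doc1": 0.2}
    ranked = threshold_algorithm([text_scores, image_scores], k=2)
    print([obj for obj, _ in ranked])  # ['doc2', 'doc1']
```

In this toy example TA stops after the third row of sorted access, because the threshold (0.1 + 0.2) has dropped below the smallest retained score; the appeal of the technique is that it can often stop before reading every engine's full result list.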

Author information

Authors and Affiliations

College of Computer Science and Technology, Zhejiang University, Hangzhou, 310027, China
Yueting Zhuang, Yaoguang Wang, Jian Shao, Ling Chen, Weiming Lu, Jianling Sun, Baogang Wei & Jiangqin Wu

Corresponding author

Correspondence to Jian Shao.

Additional information

Yueting Zhuang received his BS, MS and PhD in computer science from Zhejiang University (ZJU), China in 1986, 1989 and 1998, respectively. From 1997 to 1998, he was a visiting scholar in Prof. Thomas Huang’s group at the University of Illinois at Urbana-Champaign, USA. Currently, he is a professor at the College of Computer Science, ZJU. His research interests mainly include artificial intelligence, multimedia retrieval, computer animation, digital library and databases.

Yaoguang Wang received his BS from South China University of Technology, China in 2010. He is currently a PhD student at Zhejiang University, China. His research interests include massive data storage management, parallel data processing, and distributed systems.

Jian Shao received his BS from the Department of Electronic Science and Engineering, Nanjing University, China in 2003, and his PhD from the Institute of Acoustics, Chinese Academy of Sciences, China in 2008. Currently, he is an associate professor at the College of Computer Science, Zhejiang University, China. His research interests include cross-media retrieval, artificial intelligence, and unstructured data management.

Ling Chen received his BS and PhD in computer science from Zhejiang University (ZJU), China in 1999 and 2004, respectively. Currently, he is an associate professor in the College of Computer Science, ZJU. His research interests include ubiquitous computing, HCI, AI, pattern recognition, distributed systems, databases, and data mining.

Weiming Lu received his PhD from Zhejiang University (ZJU), China in 2009. He is currently a lecturer at ZJU. His research interests are multimedia analysis and retrieval, artificial intelligence, digital library and unstructured data management.

Jianling Sun received his PhD in computer science from Zhejiang University (ZJU), China in 1993. Currently, he is a professor in the College of Computer Science, ZJU. His research interests include databases, data mining, distributed systems, and financial information technology.

Baogang Wei received his PhD from Northwestern Polytechnical University, China in 1997. He is currently a professor at Zhejiang University, China. His main research interests include artificial intelligence, pattern recognition, digital library, and information and knowledge management.

Jiangqin Wu received her PhD from Harbin Institute of Technology, China. Currently, she is an associate professor at Zhejiang University, China. Her research interests include multimedia computing, pattern recognition, and digital library.

Electronic supplementary material

Supplementary material, approximately 378 KB.

About this article

Cite this article

Zhuang, Y., Wang, Y., Shao, J. et al. D-Ocean: an unstructured data management system for data ocean environment. Front. Comput. Sci. 10, 353–369 (2016). https://doi.org/10.1007/s11704-015-5045-6

Received: 29 January 2015
Accepted: 12 June 2015
Published: 20 October 2015
Issue Date: April 2016
DOI: https://doi.org/10.1007/s11704-015-5045-6
