Shared-Table for Textual Data Clustering in Distributed Relational Databases

Yafooz, Wael M. S.; Abidin, Siti Z. Z.; Omar, Nasiroh; Halim, Rosenah A.

doi:10.1007/978-981-4585-18-7_6


Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 285)

Abstract

High-performance query processing is a significant requirement for database administrators, and it can be achieved by grouping data into contiguous hard disk pages. Database partitioning techniques support this goal by splitting the physical structure of database tables into smaller partitions. A distributed database management system benefits many businesses because it helps achieve high-performance processing. However, when massive amounts of data are distributed over network nodes, query processing suffers because data must be retrieved from different nodes. This study proposes a novel technique, based on a shared-table in a relational database in a distributed environment, that uses data mining techniques to achieve high-performance query processing. The shared-table serves as a guide that indicates where data should be saved. Query processing therefore becomes more efficient when related data is saved at the same location. The proposed method is suitable for news agencies and other domains that rely on massive amounts of textual data.
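The abstract does not specify the chapter's clustering algorithm or the schema of its shared-table. As a rough, hypothetical illustration of the general idea only, the Python sketch below assumes that the shared-table maps each cluster label to the node storing that cluster, that documents are clustered by a toy rule (their most frequent non-stopword term), and that each new cluster is placed on the least-loaded node. The names `SharedTable`, `cluster_label`, `store`, `query`, `STOPWORDS`, and the node names are illustrative assumptions, not the authors' method.

```python
from collections import Counter

# Toy stopword list (assumption; the chapter's preprocessing is not given).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "on", "for", "is"}


def cluster_label(text):
    """Hypothetical clustering rule: label a document by its most frequent
    non-stopword term, breaking ties alphabetically for determinism."""
    words = (w.strip(".,;:!?").lower() for w in text.split())
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    if not counts:
        return "misc"
    return min(counts, key=lambda w: (-counts[w], w))


class SharedTable:
    """Hypothetical shared-table: records which node stores each cluster."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.cluster_to_node = {}  # cluster label -> node name

    def node_for(self, cluster):
        # A new cluster goes to the node holding the fewest clusters so far
        # (assumed balancing rule); a known cluster keeps its node.
        if cluster not in self.cluster_to_node:
            load = Counter(self.cluster_to_node.values())
            self.cluster_to_node[cluster] = min(self.nodes, key=lambda n: load[n])
        return self.cluster_to_node[cluster]


def store(docs, table):
    """Save each document on the node the shared-table assigns to its cluster."""
    placement = {node: [] for node in table.nodes}
    for doc in docs:
        placement[table.node_for(cluster_label(doc))].append(doc)
    return placement


def query(cluster, table, placement):
    """Consult the shared-table, then read only the one node holding the cluster."""
    node = table.cluster_to_node.get(cluster)
    if node is None:
        return []
    return [doc for doc in placement[node] if cluster_label(doc) == cluster]


# Usage with made-up news headlines (illustrative data only).
docs = [
    "Election results in the election district",
    "Football match ends in football riot",
    "Election campaign for the election begins",
    "Flood hits the city, flood warning issued",
]
table = SharedTable(["node1", "node2"])
placement = store(docs, table)
print(table.cluster_to_node)
print(query("election", table, placement))
```

Because every document in a cluster is saved on the same node, a query for one cluster consults the shared-table once and then reads a single node instead of contacting every node in the network.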

Acknowledgments

The authors wish to thank Universiti Teknologi MARA (UiTM) for the financial support. This work was supported in part by grant number 600-RMI-/DANA 5/3/RIF (498/2012).

Author information

Authors and Affiliations

Faculty of Computer and Mathematical Sciences, UiTM, Shah Alam, Selangor, Malaysia
Wael M. S. Yafooz, Siti Z. Z. Abidin, Nasiroh Omar & Rosenah A. Halim

Corresponding author

Correspondence to Wael M. S. Yafooz.

Editor information

Editors and Affiliations

Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Tutut Herawan
Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Batu Pahat, Malaysia
Mustafa Mat Deris
School of Information Technology, Deakin University, Burwood, Victoria, Australia
Jemal Abawajy

About this paper

Cite this paper

Yafooz, W.M.S., Abidin, S.Z.Z., Omar, N., Halim, R.A. (2014). Shared-Table for Textual Data Clustering in Distributed Relational Databases. In: Herawan, T., Deris, M., Abawajy, J. (eds) Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013). Lecture Notes in Electrical Engineering, vol 285. Springer, Singapore. https://doi.org/10.1007/978-981-4585-18-7_6

DOI: https://doi.org/10.1007/978-981-4585-18-7_6
Published: 15 December 2013
Publisher Name: Springer, Singapore
Print ISBN: 978-981-4585-17-0
Online ISBN: 978-981-4585-18-7
eBook Packages: Engineering (R0)