Abstract
High-performance query processing is a significant requirement for database administrators. It can be achieved by grouping related data into contiguous hard disk pages, and database partitioning techniques support this goal by splitting the physical structure of database tables into smaller partitions. A distributed database management system benefits many businesses because it helps achieve high-performance processing. However, when massive amounts of data are distributed over network nodes, query processing suffers because the data must be retrieved from different nodes. This study proposes a novel technique that uses data mining techniques and a shared-table in a relational database, in a distributed environment, to achieve high-performance query processing. The shared-table serves as a guide that indicates where data should be saved. Because related data is then saved at the same location, query processing becomes more efficient. The proposed method suits news agencies and other domains that rely on massive amounts of textual data.
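The abstract does not specify the clustering algorithm or the layout of the shared-table, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that each document is labeled with its most frequent non-stop-word term (a crude stand-in for a real text-clustering step) and that a previously unseen cluster is placed on the currently least-loaded node. The names `SharedTable`, `cluster_key`, `place`, `route_query`, and the node names are hypothetical and are not taken from the authors' method.

```python
import string
from collections import Counter

# Hypothetical, very small stop-word list used only for this illustration.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for"}


def cluster_key(text: str) -> str:
    """Assumed clustering rule: label a document by its most frequent
    non-stop-word term. This is NOT the paper's data mining step."""
    words = [w.strip(string.punctuation).lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOP_WORDS)
    return counts.most_common(1)[0][0] if counts else "_unclustered"


class SharedTable:
    """Hypothetical shared-table: records which node stores each cluster."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.location = {}                      # cluster label -> node name
        self.load = {n: 0 for n in self.nodes}  # documents stored per node

    def place(self, text: str) -> str:
        """Return the node where this document should be saved."""
        key = cluster_key(text)
        if key not in self.location:
            # Assumption: a new cluster goes to the least-loaded node
            # (ties resolve to the earliest node in the list).
            self.location[key] = min(self.nodes, key=lambda n: self.load[n])
        node = self.location[key]
        self.load[node] += 1
        return node

    def route_query(self, label: str):
        """A query for one cluster is sent to a single node (None if unknown)."""
        return self.location.get(label)


if __name__ == "__main__":
    table = SharedTable(["node1", "node2"])
    print(table.place("Election results: election turnout rises"))
    print(table.place("Football final: football fans celebrate"))
    print(table.place("Election recount requested in election dispute"))
    print(table.route_query("election"))
```

In this sketch, both election stories land on `node1` and the football story on `node2`, so a query about one topic touches only one node. Any real deployment would replace `cluster_key` with a proper text-clustering technique and store the shared-table itself as a relational table visible to all nodes.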
Acknowledgments
The authors wish to thank Universiti Teknologi MARA (UiTM) for its financial support. This work was supported in part by grant number 600-RMI-/DANA 5/3/RIF (498/2012).
Copyright information
© 2014 Springer Science+Business Media Singapore
About this paper
Cite this paper
Yafooz, W.M.S., Abidin, S.Z.Z., Omar, N., Halim, R.A. (2014). Shared-Table for Textual Data Clustering in Distributed Relational Databases. In: Herawan, T., Deris, M., Abawajy, J. (eds) Proceedings of the First International Conference on Advanced Data and Information Engineering (DaEng-2013). Lecture Notes in Electrical Engineering, vol 285. Springer, Singapore. https://doi.org/10.1007/978-981-4585-18-7_6
DOI: https://doi.org/10.1007/978-981-4585-18-7_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-4585-17-0
Online ISBN: 978-981-4585-18-7
eBook Packages: Engineering, Engineering (R0)