Abstract
We live in a world flooded with data, and Big Data has become a key issue. Applications that produce huge amounts of data (petabytes and more) span many fields, including biology, medicine, astronomy, geology, and geography, and the trend is steadily increasing. Data mining is the process of extracting useful information from large datasets. There are different approaches to discovering the properties of datasets, and machine learning is one of them. In machine learning, unsupervised learning deals with unlabeled datasets, and one of its primary approaches is clustering, the process of grouping similar entities together. Improving the performance of such techniques is a challenge, especially when dealing with huge amounts of data. In this work, we present a survey of techniques that increase the efficiency of two well-known clustering algorithms, k-means and DBSCAN.
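To make the clustering idea concrete, the following is a minimal sequential sketch of Lloyd-style k-means in plain Python. It is an illustration only, not an implementation taken from the surveyed work: the function name `kmeans`, its parameters, and the choice to keep an empty cluster's centroid unchanged are all assumptions made for this example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Illustrative Lloyd-style k-means on tuples of numbers (hypothetical helper)."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point is assigned to its nearest centroid.
        # Every point is handled independently of the others, which is why
        # this step lends itself to being split across workers.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

On large datasets the cost is dominated by the distance computations in the assignment step, which gives a sense of where efficiency gains can be sought.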
Acknowledgments
The reported study was funded by RFBR under research projects 19-01-246-a, 19-07-00329-a, 18-01-00402-a, and 18-08-00549-a.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Savvas, I.K., Michos, C., Chernov, A., Butakova, M. (2020). High Performance Clustering Techniques: A Survey. In: Kovalev, S., Tarassov, V., Snasel, V., Sukhanov, A. (eds) Proceedings of the Fourth International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’19). IITI 2019. Advances in Intelligent Systems and Computing, vol 1156. Springer, Cham. https://doi.org/10.1007/978-3-030-50097-9_26
DOI: https://doi.org/10.1007/978-3-030-50097-9_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50096-2
Online ISBN: 978-3-030-50097-9
eBook Packages: Intelligent Technologies and Robotics (R0)