Abstract
Machine learning techniques applied to data mining face demanding time and memory requirements, and should therefore take full advantage of the increased power of recent multi-core processors. With sparse data, it is sometimes also necessary to find an appropriate reformulation of the algorithms, keeping in mind that memory load was and still is an issue. In [1], we presented a mathematical reformulation of the standard and the batch versions of the Self-Organizing Map algorithm for sparse data, proposed a parallel implementation of the batch version, and carried out initial performance evaluation tests. Here we reproduce and extend our experiments on a more powerful hardware architecture and compare the results with our previous ones. A thorough quantitative and qualitative analysis confirms our earlier results.
Notes
1. Using the squared distance here is equivalent to using the Euclidean distance, and avoids the square root computation.
2. The complexity of Eq. (2) does not depend on the vector size; it is only \(\mathcal {O}(T M)\).
3. We used version 1.7.4. We have since improved Somoclu, and later versions use a more efficient sparse computation (see: https://github.com/peterwittek/somoclu/commit/d5ffcf250db77aa103a9de96968ef0e27dc14d15).
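Note 1 can be illustrated with a minimal sketch. It assumes a sparse input stored as a dictionary from index to value and a dense codebook stored as lists; the names `squared_distance_sparse` and `find_bmu` are hypothetical and are not taken from the chapter or from Somoclu. The sketch relies on the standard expansion \(\Vert x - w\Vert^2 = \Vert x\Vert^2 - 2\,x \cdot w + \Vert w\Vert^2\), in which only the nonzero entries of \(x\) enter the dot product.

```python
import math

def squared_distance_sparse(x, w, w_sq_norm):
    """Squared Euclidean distance between sparse x and dense w.

    x is a dict {index: value} (an assumed representation for this sketch),
    w is a list of floats, and w_sq_norm is the precomputed ||w||^2.
    Only the nonzero entries of x are visited.
    """
    x_sq_norm = sum(v * v for v in x.values())
    dot = sum(v * w[i] for i, v in x.items())
    return x_sq_norm - 2.0 * dot + w_sq_norm

def find_bmu(x, codebook, use_sqrt=False):
    """Return the index of the best matching unit for sparse input x."""
    best_index, best_score = -1, math.inf
    for k, w in enumerate(codebook):
        # Recomputed here for brevity; a real implementation would cache it.
        w_sq_norm = sum(c * c for c in w)
        score = squared_distance_sparse(x, w, w_sq_norm)
        if use_sqrt:
            # Clamp tiny negative rounding errors before taking the root.
            score = math.sqrt(max(score, 0.0))
        if score < best_score:
            best_index, best_score = k, score
    return best_index

# Toy data, invented for illustration only.
codebook = [[0.0, 1.0, 0.0, 0.0], [0.5, 0.0, 0.0, 2.0], [1.0, 1.0, 1.0, 1.0]]
x = {0: 0.4, 3: 1.8}
bmu_squared = find_bmu(x, codebook)
bmu_euclid = find_bmu(x, codebook, use_sqrt=True)
```

Because the square root is monotonically increasing on non-negative values, both calls select the same unit, so the root can be skipped in the best matching unit search.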
References
Melka, J., Mariage, J.: Efficient implementation of self-organizing map for sparse input data. In: Proceedings of the 9th International Joint Conference on Computational Intelligence, IJCCI 2017, pp. 54–63, Funchal, Madeira, Portugal (2017)
Ultsch, A.: Data mining and knowledge discovery with emergent self-organizing feature maps for multivariate time series. Kohonen Maps 46, 33–46 (1999)
Kohonen, T.: Self-organized formation of topologically correct feature maps. Biol. Cybern. 43, 59–69 (1982)
Honkela, T., Kaski, S., Lagus, K., Kohonen, T.: Newsgroup exploration with WEBSOM method and browsing interface. Technical Report A32, Helsinki University of Technology (1996)
Kaski, S., Honkela, T., Lagus, K., Kohonen, T.: WEBSOM-self-organizing maps of document collections. Neurocomputing 21, 101–117 (1998)
Ultsch, A., Mörchen, F.: ESOM-Maps: tools for clustering, visualization, and classification with Emergent SOM. Technical Report 46, Department of Mathematics and Computer Science, University of Marburg, Germany (2005)
Polzlbauer, G., Dittenbach, M., Rauber, A.: A visualization technique for self-organizing maps with vector fields to obtain the cluster structure at desired levels of detail. In: Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, 2005, IJCNN’05, vol. 3, pp. 1558–1563. IEEE (2005)
Vesanto, J., Ahola, J.: Hunting for correlations in data using the self-organizing map. In: Proceeding of the International ICSC Congress on Computational Intelligence Methods and Applications (CIMA’99), pp. 279–285. ICSC Academic Press (1999)
Carpenter, G.A., Grossberg, S.: ART 2: self-organization of stable category recognition codes for analog input patterns. Appl. Opt. 26, 4919–4930 (1987)
He, J., Tan, A.-H., Tan, C.-L.: Modified ART 2A growing network capable of generating a fixed number of nodes. IEEE Trans. Neural Netw. 15, 728–737 (2004)
Carpenter, G.A., Grossberg, S., Rosen, D.B.: ART 2-A: an adaptive resonance algorithm for rapid category learning and recognition. Neural Netw. 4, 493–504 (1991)
Wittek, P., Gao, S.C., Lim, I.S., Zhao, L.: Somoclu: an efficient parallel library for self-organizing maps. J. Stat. Softw. 78, 1–21 (2017)
Liao, G., Chen, P., Du, L., Su, L., Liu, Z., Tang, Z., Shi, T.: Using SOM neural network for X-ray inspection of missing-bump defects in three-dimensional integration. Microelectron. Reliab. 55, 2826–2832 (2015)
Kohonen, T.: Self-Organizing Maps. 2nd edn. Springer Series in Information Sciences, vol. 30. Springer, Berlin (1997)
Kohonen, T.: Things you haven’t heard about the self-organizing map. In: 1993 IEEE International Conference on Neural Networks, pp. 1147–1156 (1993)
Mulier, F., Cherkassky, V.: Self-organization as an iterative Kernel smoothing process. Neural Comput. 7, 1165–1177 (1995)
Cheng, Y.: Convergence and ordering of Kohonen’s batch map. Neural Comput. 9, 1667–1676 (1997)
Ienne, P., Thiran, P., Vassilas, N.: Modified self-organizing feature map algorithms for efficient digital hardware implementation. IEEE Trans. Neural Netw. 8, 315–330 (1997)
Lawrence, R.D., Almasi, G.S., Rushmeier, H.E.: A scalable parallel algorithm for self-organizing maps with applications to sparse data mining problems. Data Min. Knowl. Discov. 3, 171–195 (1999)
Maiorana, F.: Performance improvements of a Kohonen self organizing classification algorithm on sparse data sets. In: Proceedings of the 10th WSEAS International Conference on Mathematical Methods, Computational Techniques and Intelligent Systems, MAMECTIS’08, pp. 347–352. World Scientific and Engineering Academy and Society (WSEAS) (2008)
Natarajan, R.: Exploratory data analysis in large, sparse datasets. Technical Report, IBM Thomas J. Watson Research Division (1997)
Roussinov, D.G., Chen, H.: A scalable self-organizing map algorithm for textual classification: a neural network approach to thesaurus generation. Commun. Cogn. Artif. Intell. J. (1998)
Kohonen, T.: Essentials of the self-organizing map. Neural Netw. 37, 52–65 (2013)
Olteanu, M., Villa-Vialaneix, N.: Sparse online self-organizing maps for large relational data. In: Advances in Self-Organizing Maps and Learning Vector Quantization (Proceedings of WSOM 2016). Advances in Intelligent Systems and Computing, vol. 428, pp. 27–37. Springer, Houston, Texas, USA (2016)
Wu, C.H., Hodges, R.E., Wang, C.J.: Parallelizing the self-organizing feature map on multiprocessor systems. Parallel Comput. 17, 821–832 (1991)
Seiffert, U., Michaelis, B.: Multi-dimensional self-organizing maps on massively parallel hardware. In: Advances in Self-Organising Maps, pp. 160–166. Springer, Berlin (2001)
Guan, H., Li, C.K., Cheung, T.Y., Yu, S.: Parallel design and implementation of SOM neural computing model in PVM environment of a distributed system. In: Proceedings of the Advances in Parallel and Distributed Computing, pp. 26–31. IEEE (1997)
Bandeira, N., Lobo, V., Moura-Pires, F.: Training a Self-Organizing Map distributed on a PVM network. In: 1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence, vol. 1, pp. 457–461 (1998)
Tomsich, P., Rauber, A., Merkl, D.: Optimizing the parSOM neural network implementation for data mining with distributed memory systems and cluster computing. In: Proceedings 11th International Workshop on Database and Expert Systems Applications, pp. 661–665. IEEE (2000)
Labonté, G., Quintin, M.: Network parallel computing for SOM neural networks. In: High Performance Computing Systems and Applications, pp. 575–586. Springer, Berlin (2002)
Hämäläinen, T.D.: Parallel implementations of self-organizing maps. In: Seiffert, U., Jain, L.C. (eds.) Self-Organizing Neural Networks, pp. 245–278. Springer, New York (2002)
Campbell, A., Berglund, E., Streit, A.: Graphics hardware implementation of the parameter-less self-organising map. In: International Conference on Intelligent Data Engineering and Automated Learning, pp. 343–350. Springer, Berlin (2005)
Moraes, F.C., Botelho, S.C., Duarte Filho, N., Gaya, J.F.O.: Parallel high dimensional self organizing maps using CUDA. In: Robotics Symposium and Latin American Robotics Symposium (SBR-LARS), pp. 302–306. IEEE, Brazilian (2012)
Richardson, T., Winer, E.: Extending parallelization of the self-organizing map by combining data and network partitioned methods. Adv. Eng. Softw. 88, 1–7 (2015)
Daneshpajouh, H., Delisle, P., Boisson, J.C., Krajecki, M., Zakaria, N.: Parallel batch self-organizing map on graphics processing unit using CUDA. In: Latin American High Performance Computing Conference, pp. 87–100. Springer, Berlin (2017)
Wittek, P., Darányi, S.: Accelerating text mining workloads in a MapReduce-based distributed GPU environment. J. Parallel Distrib. Comput. 73, 198–206 (2013)
Kohonen, T., Kaski, S., Lagus, K., Salojarvi, J., Honkela, J., Paatero, V., Saarela, A.: Self organization of a massive document collection. IEEE Trans. Neural Netw. 11, 574–585 (2000)
Lagus, K., Kaski, S., Kohonen, T.: Mining massive document collections by the WEBSOM method. Inf. Sci. 163, 135–156 (2004)
Takatsuka, M., Bui, M.: Parallel batch training of the self-organizing map using openCL. In: Neural Information Processing: Models and Applications, pp. 470–476. Springer, Berlin (2010)
Nordström, T.: Designing parallel computers for self organizing maps. In: Proceedings of the 4th Swedish Workshop on Computer System Architecture (DSA-92), pp. 13–15 (1992)
Dagum, L., Menon, R.: OpenMP: an industry standard API for shared-memory programming. IEEE Comput. Sci. Eng. 5, 46–55 (1998)
Yang, M.H., Ahuja, N.: A data partition method for parallel self-organizing map. In: International Joint Conference on Neural Networks, IJCNN’99, vol. 3, pp. 1929–1933. IEEE (1999)
Silva, B., Marques, N.: A hybrid parallel SOM algorithm for large maps in data-mining. New Trends in Artificial Intelligence (2007)
Chang, C.C., Lin, C.J.: LIBSVM data: classification (Multi Class). https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html (2006)
Ma, J., Saul, L.K., Savage, S., Voelker, G.M.: Identifying suspicious urls: an application of large-scale online learning. In: Proceedings of the 26th annual international conference on machine learning, pp. 681–688. ACM (2009)
Stamper, J., Niculescu-Mizil, A., Ritter, S., Gordon, G., Koedinger, K.: Bridge to Algebra 2008–2009, Challenge data set from KDD Cup 2010 Educational Data Mining Challenge (2010)
Juan, Y., Zhuang, Y., Chin, W.S., Lin, C.J.: Field-aware factorization machines for CTR prediction. In: Proceedings of the 10th ACM Conference on Recommender Systems, pp. 43–50. ACM (2016)
Lewis, D.D., Yang, Y., Rose, T.G., Li, F.: RCV1: a new benchmark collection for text categorization research. J. Mach. Learn. Res. 5, 361–397 (2004)
Lang, K.: Newsweeder: Learning to filter netnews. In: Proceedings of the 12th International Conference on Machine Learning, pp. 331–339 (1995)
McCallum, A., Nigam, K.: A Comparison of event models for Naive Bayes text classification. In: AAAI/ICML-98 Workshop on Learning for Text Categorization. Technical Report WS-98-05, pp. 41–48 (1998)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998)
Hull, J.J.: A database for handwritten text recognition research. IEEE Trans. Pattern Anal. Mach. Intell. 16, 550–554 (1994)
Wang, J.Y.: Application of support vector machines in bioinformatics. Ph.D. Thesis, National Taiwan University (2002)
Noordewier, M.O., Towell, G.G., Shavlik, J.W.: Training knowledge-based neural networks to recognize genes in DNA sequences. Adv. Neural Inf. Process. Syst. 3, 530–536 (1991)
King, R.D., Feng, C., Sutherland, A.: StatLog: comparison of classification algorithms on large real-world problems. Appl. Artif. Intell. Int. J. 9, 289–333 (1995)
Frey, P.W., Slate, D.J.: Letter recognition using Holland-style adaptive classifiers. Mach. Learn. 6, 161–182 (1991)
Fort, J.C., Letremy, P., Cottrell, M.: Advantages and drawbacks of the Batch Kohonen algorithm. ESANN 2, 223–230 (2002)
Nöcker, M., Mörchen, F., Ultsch, A.: An algorithm for fast and reliable ESOM learning. In: ESANN, 14th European Symposium on Artificial Neural Networks, pp. 131–136 (2006)
Kohonen, T., Hynninen, J., Kangas, J., Laaksonen, J.: SOM_PAK: The self-organizing map program package. Report A31, Helsinki University of Technology, Laboratory of Computer and Information Science (1996)
Kiviluoto, K.: Topology preservation in self-organizing maps. In: IEEE International Conference on Neural Networks, vol. 1, pp. 294–299 (1996)
Appendix
1.1 Perf Analysis of Serial Runs
Note: sparse-bsom-v2 is a variation of the Sparse-BSom algorithm with outer loop on data and inner loop on codebook in BMU search.
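The loop order mentioned in the note can be sketched as follows. This is only an illustration under stated assumptions: the note implies that the base variant iterates over the codebook in the outer loop, which is an inference, and the function names, the dictionary-based sparse representation, and the toy data are all hypothetical rather than taken from the Sparse-BSom code.

```python
import math

def sq_dist(x, w, w_sq_norm):
    """Squared distance between sparse x (dict index -> value) and dense w."""
    x_sq_norm = sum(v * v for v in x.values())
    dot = sum(v * w[i] for i, v in x.items())
    return x_sq_norm - 2.0 * dot + w_sq_norm

def bmus_outer_codebook(data, codebook):
    """BMU search with the outer loop on codebook units (assumed base order)."""
    best = [(math.inf, -1)] * len(data)
    for k, w in enumerate(codebook):
        w_sq_norm = sum(c * c for c in w)
        for t, x in enumerate(data):
            d = sq_dist(x, w, w_sq_norm)
            if d < best[t][0]:
                best[t] = (d, k)
    return [k for _, k in best]

def bmus_outer_data(data, codebook):
    """BMU search with the outer loop on data, as the note describes for v2."""
    w_sq_norms = [sum(c * c for c in w) for w in codebook]
    result = []
    for x in data:
        best_d, best_k = math.inf, -1
        for k, w in enumerate(codebook):
            d = sq_dist(x, w, w_sq_norms[k])
            if d < best_d:
                best_d, best_k = d, k
        result.append(best_k)
    return result

# Toy data, invented for illustration only.
codebook = [[0.0, 1.0, 0.0, 0.0], [0.5, 0.0, 0.0, 2.0], [1.0, 1.0, 1.0, 1.0]]
data = [{0: 0.4, 3: 1.8}, {1: 0.9}, {0: 1.0, 2: 1.0}]
by_codebook = bmus_outer_codebook(data, codebook)
by_data = bmus_outer_data(data, codebook)
```

Both orders return the same best matching units; they differ only in memory access pattern, which is the kind of difference a profiling run would expose.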
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Melka, J., Mariage, JJ. (2019). Adapting Self-Organizing Map Algorithm to Sparse Data. In: Sabourin, C., Merelo, J.J., Madani, K., Warwick, K. (eds) Computational Intelligence. IJCCI 2017. Studies in Computational Intelligence, vol 829. Springer, Cham. https://doi.org/10.1007/978-3-030-16469-0_8
DOI: https://doi.org/10.1007/978-3-030-16469-0_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16468-3
Online ISBN: 978-3-030-16469-0
eBook Packages: Intelligent Technologies and Robotics (R0)