Clustering Professional Baseball Players with SOM and Deciding Team Reinforcement Strategy with AHP

Kohara, Kazuhiro; Enomoto, Shota

doi:10.1007/978-3-319-95786-9_10


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10933)

Included in the following conference series:

Industrial Conference on Data Mining


Abstract

In this paper, we propose an integrated method that uses self-organizing maps (SOM) and the analytic hierarchy process (AHP) to cluster professional baseball players and to make decisions on team reinforcement strategy. We used data on pitchers in Japanese professional baseball teams. First, we collected data on 302 pitchers and clustered them using the following fourteen features: number of games pitched, number of wins, number of losses, number of saves, number of holds, number of innings pitched, strikeout rate, ERA (earned run average), percentage of hits allowed, WHIP (walks plus hits per inning pitched), K/BB (strikeout-to-walk ratio), FIP (fielding independent pitching), LOB% (left-on-base percentage), and RSAA (runs saved above average). Second, we created pitcher maps of all teams and of each team with SOM. Third, we examined the main features of each cluster. Fourth, we considered team reinforcement strategies using the pitcher maps. Finally, we used AHP to determine the team reinforcement strategy.


1 Introduction

Machine learning and data mining techniques have been extensively investigated, and there have been various attempts to apply them to baseball (e.g., [1,2,3,4,5]). Tolbert and Trafalis applied SVM (support vector machines) to predicting MLB (Major League Baseball) championship winners [1]. Ishii applied K-means clustering to identifying undervalued baseball players [2]. Pane applied K-means clustering and a Fisher-wise criterion to identifying clusters of MLB pitchers [3]. Tung applied PCA (principal component analysis) and K-means clustering to analyzing a multivariate data set of career batting performance in MLB [4]. Vazquez applied time-series analysis and clustering algorithms to predicting baseball results [5]. In this paper, we propose an integrated method that uses self-organizing maps (SOM) [6] and the analytic hierarchy process (AHP) [7] to cluster professional baseball players and to make decisions on team reinforcement strategy. We used data on pitchers in Japanese professional baseball teams. First, we collected data on 302 pitchers and clustered them using fourteen features. Second, we created pitcher maps of all teams and of each team with SOM. Third, we examined the main features of each cluster. Fourth, we considered team reinforcement strategies using the pitcher maps. Finally, we used AHP to determine the team reinforcement strategy.

2 Clustering Professional Baseball Players with SOM

The SOM algorithm is based on unsupervised, competitive learning [6]. It provides a topology-preserving mapping from a high-dimensional input space to map units. The map units, or neurons, usually form a two-dimensional lattice, so the mapping projects the high-dimensional space onto a plane.
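
The competitive, topology-preserving learning described above can be illustrated with a minimal online SOM in plain Python. This is a generic textbook-style sketch, not the authors' implementation: the paper does not state its map size, learning-rate schedule, neighborhood function, or software, so the function names (train_som, best_matching_unit), the Gaussian neighborhood, the linear decay schedules, and all default parameters are assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Competitive step: return the (row, col) of the unit whose weight vector is closest to x."""
    return min(weights, key=lambda unit: sum((w - v) ** 2 for w, v in zip(weights[unit], x)))


def train_som(data, rows, cols, iterations=2000, lr0=0.5, seed=0):
    """Online SOM training on a rows x cols lattice (hypothetical helper; parameters are assumptions)."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    # Each map unit (neuron) on the 2-D lattice holds a weight vector in the input space.
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(iterations):
        frac = t / iterations
        lr = lr0 * (1.0 - frac)              # learning rate decays linearly toward 0
        sigma = sigma0 * (1.0 - frac) + 0.5  # neighborhood radius shrinks, never below 0.5
        x = rng.choice(data)
        br, bc = best_matching_unit(weights, x)
        for (r, c), w in weights.items():
            # The neighborhood uses distance on the lattice, not in input space;
            # this is what makes adjacent units learn similar vectors (topology preservation).
            d2 = (r - br) ** 2 + (c - bc) ** 2
            h = math.exp(-d2 / (2.0 * sigma ** 2))
            for i in range(dim):
                w[i] += lr * h * (x[i] - w[i])
    return weights
```

After training, calling best_matching_unit(som, v) on a pitcher's feature vector v places that pitcher on the map; pitchers that land on the same or adjacent units can be read as one cluster.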

Previously, we proposed a purchase decision support method using SOM and AHP. First, we set two class boundaries that divide the range between the maximum and minimum values of each input feature into three equal parts. Second, we created self-organizing product maps using the classified inputs. We applied this method to five kinds of products and confirmed its effectiveness [8]. When we compared SOM with other clustering algorithms (hierarchical clustering and K-means clustering) for product clustering, SOM was superior to both in terms of visibility and clustering ability [9]. Therefore, we used SOM to cluster baseball players.

We used data on pitchers from NPB (Nippon Professional Baseball Organization) [10]. We collected data on 302 pitchers for the 2015 season from Japanese professional baseball databases [10, 11]. We clustered these pitchers using the following fourteen features: number of games pitched, number of wins, number of losses, number of saves, number of holds, number of innings pitched, strikeout rate, ERA (earned run average), percentage of hits allowed, WHIP (walks plus hits per inning pitched), K/BB (strikeout-to-walk ratio), FIP (fielding independent pitching), LOB% (left-on-base percentage), and RSAA (runs saved above average).
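
Several of these features are derived from counting statistics. The sketch below computes four of them with the standard sabermetric definitions, which the paper does not spell out: WHIP = (BB + H) / IP, K/BB = K / BB, FIP = (13·HR + 3·(BB + HBP) − 2·K) / IP + C, and, assuming the strikeout-rate feature means strikeouts per nine innings, K/9 = 9·K / IP. The function name pitching_rates is hypothetical, and the FIP constant C = 3.10 is only a placeholder; in practice C is recomputed for each league and season. Innings are taken as outs / 3 because box-score notation such as "180.2" means 180⅔ innings, not 180.2.

```python
def pitching_rates(outs, hits, walks, hit_by_pitch, home_runs, strikeouts, fip_constant=3.10):
    """Derived pitching statistics from season counting stats (hypothetical helper).

    outs: outs recorded; innings pitched = outs / 3 (e.g. 180.2 IP in box-score notation is 542 outs).
    fip_constant: league/season-specific constant; 3.10 is a placeholder assumption.
    """
    ip = outs / 3.0
    return {
        "WHIP": (walks + hits) / ip,
        # K/BB is undefined for a pitcher with zero walks; report infinity in that case.
        "K/BB": strikeouts / walks if walks else float("inf"),
        "K/9": 9.0 * strikeouts / ip,
        "FIP": (13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts) / ip + fip_constant,
    }
```

For example, pitching_rates(540, 150, 40, 5, 15, 180) describes a hypothetical 180-inning season and yields a WHIP of about 1.06, a K/BB of 4.5, and a K/9 of 9.0.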

For each feature, we set two class boundaries that divide the range between the minimum and maximum values of that feature into three equal parts, giving the following three classes per feature:

- Games pitched: ≤ 27; 28 to 50; ≥ 51
- Wins: ≤ 5; 6 to 10; ≥ 11
- Losses: ≤ 4; 5 to 8; ≥ 9
- Saves: ≤ 13; 14 to 27; ≥ 28
- Holds: ≤ 13; 14 to 26; ≥ 27
- Innings pitched: ≤ 74; 75 to 140; ≥ 141
- Strikeout rate: ≤ 6.09; 6.10 to 10.15; ≥ 10.16
- ERA: ≤ 3.52; 3.53 to 6.64; ≥ 6.65
- Percentage of hits allowed: ≤ 8.35; 8.36 to 13.08; ≥ 13.09
- WHIP: ≤ 1.36; 1.37 to 2.08; ≥ 2.09
- K/BB: ≤ 4.70; 4.71 to 8.85; ≥ 8.86
- FIP: ≤ 3.20; 3.21 to 5.27; ≥ 5.28
- LOB%: ≤ 0.661; 0.662 to 0.814; ≥ 0.815
- RSAA: ≤ −2.1083; −2.1082 to 16.65; ≥ 16.66
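
The equal-width, three-class discretization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, a value exactly on a boundary is assigned to the lower class (an assumption about tie handling), the boundaries are not rounded as the published ones appear to be, and how the class labels were encoded as SOM inputs is not specified here, so the labels 1 to 3 are an assumption.

```python
def equal_width_boundaries(values):
    """Return two boundaries that split [min, max] into three equal-width parts."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / 3.0
    return lo + width, lo + 2.0 * width


def classify(value, boundaries):
    """Map a raw value to class 1 (low), 2 (middle) or 3 (high); ties go to the lower class."""
    b1, b2 = boundaries
    if value <= b1:
        return 1
    if value <= b2:
        return 2
    return 3


def discretize_column(values):
    """Discretize one feature column of the pitcher data."""
    bounds = equal_width_boundaries(values)
    return [classify(v, bounds) for v in values]
```

For a hypothetical games-pitched column [3, 12, 30, 45, 60, 78], the boundaries are 28 and 53, so discretize_column returns [1, 1, 2, 2, 3, 3]. Note that the class labels are ordered by raw value, not by quality: for ERA, WHIP, and FIP, class 3 is the weakest range.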

Table 1 shows part of the feature matrix for pitchers.

Table 1. Part of the feature matrix for pitchers.
