
1.1 An Overview of High-Performance Sport

The term “high performance” can be understood as a process of optimizing techniques and procedures to achieve exceptional results. In sport, teams and organizations prioritize a high-performance culture [1]. A typical sports team or organization comprises individuals with diverse backgrounds, abilities, and responsibilities, and such a group requires diligent leadership, thought, and action to thrive. Essentially, everyone involved needs to work in harmony towards the goal of excellence.

Because of the political and financial significance of elite athletic performance at both individual and team levels, many high-performing nations have placed elite sport on their national policy agenda [2]. These agendas typically emphasize elite sport policy, elite sport funding, and a deliberate approach to athlete development. As a result, researchers have become interested in better understanding elite sport systems, explaining the variables that drive success, and informing policy.

A question often asked is why some sports teams and nations excel in high-performance sport while others fail. Attempts to answer this question have sparked debates that, over the past twenty years, have given rise to an emerging area of study [2]. Examples of such studies include those reported in [3,4,5]. It is important to note, however, that several factors shape performance at a high level, and success in high-performance sport is determined by numerous indicators.

1.2 Responsibilities and Influence of Referees in Top-Flight Soccer Tournaments

Soccer, also known as association football, is considered an explosive team sport in which players perform intermittent, short bouts of high-intensity exercise interspersed with lower-intensity recovery periods [6,7,8]. It has been reported that in top-flight soccer tournaments players cover more than 10 km per match, of which approximately 600 m is covered at speeds above 21 km/h [9, 10]. Professional soccer referees must therefore maintain a high level of fitness to keep up with play throughout the players’ rapid movements, changes of direction, and sprints. Referees must be in the correct position at the right moment for every action, and they must attentively observe players’ behaviour, understand the rules, and make key judgements within seconds [11]. These demands make refereeing highly complex, and referees additionally bear the responsibility of keeping the game attractive through valid, sound judgement.

The referee is responsible for making split-second judgements, enforcing the laws of the game on the participants, and presiding over the match impartially. However, their mistakes can have detrimental economic, psychological, and social effects on clubs, fans, players, and teams [12]. It is not uncommon to hear complaints after a match whose result did not favour the speaker’s team, such as: “They were not better than us, the penalty awarded to them was a mistake, the referee was clearly biased as he ought to have awarded a penalty to us on a foul to Ronaldo. But today, they didn’t even go to VAR when there were clear instances where the VAR should have been consulted” [13]. It was against this background that the video assistant referee (VAR) was introduced to improve the decision-making of on-field referees. However, despite the integration of VAR into the refereeing system, referees’ performances are still called into question.

1.3 Recent Updates in Data Mining and Machine Learning Application in Sports

The use of machine learning in sport has gained significant attention in recent years. Owing to its popularity, a number of review papers on the topic have recently been published [14,15,16,17,18,19,20]. This section does not intend to compete with such literature; rather, it aims to give readers a flavour of how the technology is used in the sporting domain.

Tan et al. [21] investigated the efficacy of different machine learning models, i.e. k-nearest neighbours (kNN), support vector machine (SVM), artificial neural networks (ANN), Naive Bayes (NB), and random forest (RF), in classifying different Sepak Takraw kicks from data captured via inertial measurement units (IMU) placed on the participants’ shanks. Different statistical features were extracted from the IMU data before being fed into the aforesaid models, and a 70:30 hold-out ratio was used for training and testing, respectively. The study demonstrated that, with the extracted features, the ANN model classified the investigated kicks, viz. the serve (a.k.a. “tekong”), the feeder, and the spike, without any misclassification on the test dataset.
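
The 70:30 hold-out protocol described above can be sketched as a shuffled split of the samples. This is a minimal illustration only: the function name `holdout_split`, the fixed seed, and the toy feature tuples are hypothetical, the split is not stratified by class, and the exact splitting procedure used by Tan et al. [21] is not described here.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(samples: Sequence[T], train_ratio: float = 0.7,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the samples and split them into training and test sets.

    train_ratio=0.7 gives a 70:30 hold-out; train_ratio=0.6 gives 60:40.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must lie strictly between 0 and 1")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = round(train_ratio * len(samples))
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test


# Hypothetical feature vectors, one per recorded kick: (label, mean, std)
kicks = [(["serve", "feeder", "spike"][i % 3], 0.1 * i, 1.0) for i in range(10)]
train, test = holdout_split(kicks, train_ratio=0.7)
print(len(train), len(test))  # 7 3
```

In practice, a stratified split would keep the proportion of each kick type similar in both sets.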

Thabtah et al. [22] investigated the efficacy of machine learning, and of the features it relies on, in predicting National Basketball Association (NBA) game outcomes. NB, ANN, and decision tree (DT) models were used. The NBA Finals Team Stats dataset, with match outcomes from 1980 through 2017, was obtained from kaggle.com. Three feature selection techniques were employed, namely multiple regression, correlation feature set, and the RIPPER algorithm. The study indicated that defensive rebounds were a significant feature influencing the result of an NBA game.

Echterhoff et al. [23] investigated the classification of gait and jumps in modern equestrian sport. A smartphone attached to the horse’s saddle was used to acquire accelerometer and gyroscope data. A total of 268 features were extracted from the time and frequency domains, and 22 variations of machine learning models were investigated using ninefold cross-validation. The study showed that a cubic SVM could achieve a classification accuracy of 95.4% in distinguishing the four classes evaluated, namely walk, trot, canter, and jump.
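
The ninefold cross-validation used by Echterhoff et al. [23] can be illustrated by the index bookkeeping below: each fold serves once as the validation set while the remaining folds are used for training. The function `kfold_indices` is hypothetical; in practice the samples would usually be shuffled (or stratified by class) before being assigned to folds, which this sketch omits.

```python
from typing import Iterator, List, Tuple


def kfold_indices(n_samples: int, k: int = 9) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for k-fold cross-validation.

    Samples are assigned to folds in contiguous blocks; fold sizes differ by
    at most one when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)  # first `extra` folds get one more
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


folds = list(kfold_indices(20, k=9))
print([len(v) for _, v in folds])  # [3, 3, 2, 2, 2, 2, 2, 2, 2]
```

The reported accuracy is then typically the mean over the nine validation folds.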

McGrath et al. [24] investigated the detection of fast bowling in cricket by applying machine learning to IMU data. Several models were evaluated, namely linear SVM (LSVM), polynomial SVM, ANN, RF, and gradient boosting (XGB), using a reduced set of 223 features. Different sampling frequencies were also evaluated to assess how well the developed models classified bowling and non-bowling events. The SVM models generally achieved slightly higher classification accuracy than the other models, although all models exceeded 95% accuracy, suggesting that the extracted features were indeed discriminative.

Worsey et al. [25] investigated the classification of boxing punches by applying machine learning to IMU data. The participant was instructed to perform jabs, crosses, hooks, and uppercuts with both the right and left hands. Principal component analysis (PCA) was used to reduce the dimensionality of the extracted features. Six machine learning models were evaluated, namely logistic regression (LR), LSVM, Gaussian SVM (GSVM), ANN, RF, and XGB, and their hyperparameters were tuned by exhaustive grid search with fivefold cross-validation. The study demonstrated that the untuned GSVM and the ANN model could classify the punches well.

Duki et al. [26] carried out a feature selection investigation for the classification of Taekwondo kicks. Nine statistical features were extracted from the acceleration data of an IMU, viz. minimum, maximum, mean, median, standard deviation, variance, skewness, kurtosis, and standard error of the mean. The significance of the features was assessed by ANOVA and by the chi-square (χ2) test before the selected features were fed into different machine learning models, namely SVM, RF, kNN, and NB. A 60:40 hold-out ratio was used. The features identified via ANOVA yielded better classification accuracy, up to 86.7% with the RF model, than either the full feature set or the features identified by the χ2 test.
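
The nine statistical features and the ANOVA-based ranking described above can be sketched with the Python standard library. This is an illustration under stated assumptions: skewness and kurtosis have several competing definitions, and the moment-based estimators and excess kurtosis used here may differ from those used by Duki et al. [26]; the function names are hypothetical. A feature with a larger F statistic separates the classes more strongly relative to its within-class spread.

```python
import math
import statistics
from typing import Dict, List, Sequence


def statistical_features(x: Sequence[float]) -> Dict[str, float]:
    """Nine statistical features of one signal window (>= 2 non-constant values)."""
    n = len(x)
    mean = statistics.fmean(x)
    m2 = sum((v - mean) ** 2 for v in x) / n  # population central moments
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for a constant signal")
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    sd = statistics.stdev(x)  # sample standard deviation (n - 1 denominator)
    return {
        "min": min(x),
        "max": max(x),
        "mean": mean,
        "median": statistics.median(x),
        "std": sd,
        "variance": statistics.variance(x),
        "skewness": m3 / m2 ** 1.5,      # moment-based skewness
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis (0 for a normal distribution)
        "sem": sd / math.sqrt(n),        # standard error of the mean
    }


def anova_f(groups: List[List[float]]) -> float:
    """One-way ANOVA F statistic of one feature whose values are split by class."""
    values = [v for g in groups for v in g]
    grand_mean = statistics.fmean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical values of one feature for two kick classes
print(anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
```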

Abdullah et al. [27] employed six transfer learning models with an optimized SVM classifier to classify skateboarding tricks. Two types of input images, both derived from the signals of an IMU fixed to the skateboard, were investigated: stacked raw signals (RAW) and continuous wavelet transform (CWT) images. The hyperparameters of the SVM model were optimized via grid search, and a training, testing, and validation ratio of 60:20:20 was used. The CWT-MobileNet-optimized SVM pipeline was found to be the best of all the permutations considered, in terms of both computational time and classification accuracy.

Musa et al. [28] used machine learning to identify high-performance volleyball players (HVP) and low-performance volleyball players (LVP) from anthropometric variables and psychological readiness indicators. The Louvain clustering algorithm was used to group the players by performance, while a logistic regression model was used to classify the resulting classes. Owing to the imbalanced nature of the data, the authors employed the Synthetic Minority Oversampling TEchnique (SMOTE) to artificially increase the minority class and reduce the risk of overfitting during classification. The study showed that the developed pipeline could identify the players’ classes with excellent accuracy.

It is apparent from this brief survey that machine learning has gained significant traction in the sporting domain and has yielded reasonably good predictions in a myriad of cases, suggesting its valuable contribution to sport in general.
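
The core idea of SMOTE, as used by Musa et al. [28], is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified, hypothetical implementation for illustration (random choice of base samples, squared Euclidean distance); it is not the authors’ code, and a tested implementation such as the `SMOTE` class in the imbalanced-learn library would normally be preferred.

```python
import random
from typing import List, Sequence

Point = List[float]


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[Point]:
    """Generate n_new synthetic minority samples (simplified SMOTE).

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Point] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` within the minority class
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(base, minority[j])),
        )[:k]
        partner = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, partner)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(len(smote(minority, n_new=10, k=2)))  # 10
```

Oversampling of this kind is usually applied to the training data only, so that synthetic points do not leak into the evaluation set.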

1.4 Mann–Whitney U-Test and Kruskal–Wallis Analysis

The Mann–Whitney U-test is a univariate, rank-based test used to compare two independent groups. Rather than comparing means directly, it assesses whether values in one group tend to be larger than those in the other; when the two distributions have a similar shape, this amounts to a comparison of medians. The analysis assumes that the variables can be classified as independent or dependent [29]. Its logic is that systematic differences in the dependent variable between the groups are attributable to the direct influence of the independent variable. The independent variable is also known as a factor, since it splits the observed samples into groups: two groups for the Mann–Whitney U-test, or two or more for the Kruskal–Wallis test.
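
The U statistic itself is straightforward to compute from pooled ranks, as the sketch below shows. It is a minimal illustration with a hypothetical function name; it returns U for each sample (conventionally the smaller of the two is reported) but does not compute a p-value, for which a library routine such as `scipy.stats.mannwhitneyu` would typically be used.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U1, U2) for two independent samples; U1 + U2 == len(x) * len(y).

    U1 equals the number of (x, y) pairs with x > y, counting ties as one half.
    """
    pooled = list(x) + list(y)
    rank_of: Dict[float, float] = {}
    next_rank = 1
    for value, count in sorted(Counter(pooled).items()):
        rank_of[value] = next_rank + (count - 1) / 2  # tied values share the mean rank
        next_rank += count
    n1, n2 = len(x), len(y)
    r1 = sum(rank_of[v] for v in x)  # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    return u1, n1 * n2 - u1


print(mann_whitney_u([1, 2], [2, 3]))  # (0.5, 3.5)
```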

The Kruskal–Wallis test, developed by Kruskal and Wallis in 1952, is a nonparametric method for determining whether samples originate from the same distribution. It extends the Mann–Whitney U-test to more than two groups [30, 31]. Its null hypothesis is that the mean ranks of the groups are the same. Because it is the nonparametric counterpart of one-way ANOVA, the Kruskal–Wallis test is also known as one-way ANOVA on ranks [32].
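
Similarly, the Kruskal–Wallis H statistic can be computed from the ranks of the pooled observations, followed by the usual correction for ties; under the null hypothesis, H approximately follows a χ2 distribution with (number of groups minus 1) degrees of freedom. The sketch below is illustrative and hypothetically named; `scipy.stats.kruskal` provides a tested implementation that also returns the p-value.

```python
from collections import Counter
from typing import Dict, List, Sequence


def kruskal_wallis_h(groups: List[Sequence[float]]) -> float:
    """Kruskal–Wallis H statistic, corrected for ties.

    H = 12 / (N (N + 1)) * sum_i(R_i**2 / n_i) - 3 (N + 1), where N is the total
    number of observations and R_i, n_i are the rank sum and size of group i.
    """
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    counts = Counter(pooled)
    rank_of: Dict[float, float] = {}
    next_rank = 1
    for value, count in sorted(counts.items()):
        rank_of[value] = next_rank + (count - 1) / 2  # mean rank for tied values
        next_rank += count
    h = 12 / (n * (n + 1)) * sum(
        sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups
    ) - 3 * (n + 1)
    tie_term = sum(t ** 3 - t for t in counts.values())
    if tie_term == n ** 3 - n:
        raise ValueError("H is undefined when all observations are tied")
    return h / (1 - tie_term / (n ** 3 - n))


print(round(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 6))  # 7.2
```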

In contrast to the t-test and F-test, the Mann–Whitney U-test and the Kruskal–Wallis test are nonparametric, meaning that they make no prior assumption about the parameters (such as the mean) or the form of the distribution from which the samples are drawn. In particular, unlike the t-test and one-way ANOVA, they do not assume that the underlying data are normally distributed [33].

These tests have been successfully applied in different sports and have proven effective in detecting differences between two or more performance classes. For instance, in a recent study, the Mann–Whitney U-test was applied to identify the technical and tactical performance indicators that differentiate successful from unsuccessful teams in elite beach soccer competitions [34]. In an earlier study, the Mann–Whitney U-test was successfully employed to study the probability of sustaining sports injuries among British athletes competing in wheelchair racing [35]. The Kruskal–Wallis test, on the other hand, was reported to be effective in comparing substance use across ballet, dance sport, and synchronized swimming [36]. Similarly, the test has proved useful in ascertaining the effect of open- and closed-skill sports on the cognitive functions of amateur table tennis athletes [37].

1.5 Feature Extraction Analysis via Information Gain

In machine learning, information gain (IG) is a typical entropy-based feature evaluation approach. IG is often used to quantify the information that one or more features carry about a particular categorical dependent variable [38]. In the current study, IG is used to estimate the significance of each feature, i.e. each performance parameter, for classification or discrimination tasks, and thereby to explain the underlying associations between these parameters and the performance of referees in this sport.
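
The definition of information gain can be made concrete with a short sketch: IG(Y; X) = H(Y) − Σ_v P(X = v) H(Y | X = v), where H is Shannon entropy. The function names are hypothetical, and the sketch assumes a discrete feature; continuous performance indicators would first need to be discretized (for example, binned), and how this was done in the study is not stated in this section.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for x, y in zip(feature, labels):
        groups[x].append(y)  # class labels observed for each feature value
    n = len(labels)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


# Hypothetical binned indicator vs. a hypothetical high/low referee rating
rating = ["high", "high", "low", "low"]
print(information_gain(["a", "a", "b", "b"], rating))  # 1.0
print(information_gain(["a", "b", "a", "b"], rating))  # 0.0
```

The first indicator fully determines the rating and so carries one full bit of information; the second is unrelated to it and carries none.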

1.6 Cluster Analysis

1.6.1 Hierarchical Agglomerative Cluster Analysis (HACA)

Hierarchical agglomerative cluster analysis is commonly used both as an exploratory tool and as a non-exploratory method. It builds a cluster hierarchy from the bottom up: each observation starts as its own cluster, and the most similar clusters are successively merged so that sets of related observations form distinct groups [39]. The resulting sequence of merges (equivalently, read from the top, splits) is illustrated in a dendrogram that shows which observations are similar [7, 40, 41]. The number of clusters is determined by where the dendrogram is cut, i.e. by a given or predetermined distance threshold. Cosine distance was used in this analysis, and cluster validation was conducted using class centroids [42].
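
The bottom-up merging described above can be sketched as follows. The text specifies cosine distance but not the linkage criterion, so average linkage is an assumption here, and `agglomerative` is a hypothetical name; stopping when `n_clusters` remain corresponds to cutting the dendrogram at that level. For real data, `scipy.cluster.hierarchy.linkage` with `metric="cosine"` is a more efficient option.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; undefined for zero vectors."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - sum(x * y for x, y in zip(a, b)) / norm


def agglomerative(points: List[Sequence[float]], n_clusters: int) -> List[List[int]]:
    """Bottom-up clustering with average linkage on cosine distance.

    Every observation starts as its own cluster, and the two closest clusters
    are merged repeatedly until n_clusters remain (a cut of the dendrogram).
    """
    dist = [[cosine_distance(p, q) for q in points] for p in points]
    clusters = [[i] for i in range(len(points))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[a] += clusters[b]  # merge cluster b into cluster a
        del clusters[b]             # b > a, so index a is unaffected
    return clusters


# Hypothetical indicator profiles pointing in two broad directions
profiles = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(agglomerative(profiles, n_clusters=2))  # [[0, 1], [2, 3]]
```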

1.6.2 Louvain Clustering

The Louvain method is a comparatively recent clustering (community-detection) algorithm capable of partitioning a given collection of data or observations. It works in two phases. In the first, it identifies small communities by optimizing modularity locally, moving each node into the neighbouring community that yields the largest positive modularity gain. In the second, the nodes belonging to each community are aggregated into a single node, producing a new network whose nodes are the communities [43]. The two phases are repeated until modularity can no longer be improved. This iteration yields a hierarchical decomposition of the network, with a partition at each level [44]. The resulting divisions reflect how densely nodes are connected within communities relative to the connections between communities.
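
The quantity that the Louvain method greedily maximizes is modularity. The sketch below computes the Newman–Girvan modularity of a given partition of an undirected, unweighted graph; it does not implement the Louvain node-moving and aggregation phases themselves, and its name and edge-list representation are assumptions for illustration. Recent versions of NetworkX provide a full implementation in `networkx.community.louvain_communities`.

```python
from typing import Dict, Hashable, List, Tuple


def modularity(edges: List[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, int]) -> float:
    """Newman–Girvan modularity of a partition of an undirected, unweighted graph.

    Q = sum over communities c of [L_c / m - (d_c / (2 m))**2], where m is the
    number of edges, L_c the number of edges inside c and d_c the total degree of c.
    """
    m = len(edges)
    inside: Dict[int, int] = {}
    degree: Dict[int, int] = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1  # each edge adds one to the degree
        degree[cv] = degree.get(cv, 0) + 1  # of each endpoint's community
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Two triangles joined by a single edge (hypothetical graph)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(round(modularity(edges, split), 4))  # 0.3571
print(modularity(edges, {node: 0 for node in range(6)}))  # 0.0
```

Separating the two densely connected triangles scores higher than placing every node in one community, which is the kind of improvement the first Louvain phase searches for.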

1.6.3 K-means Cluster Analysis

k-means clustering is a cluster analysis approach that partitions a dataset into k predefined, non-overlapping subgroups called clusters, with each data point assigned to exactly one cluster [40, 45, 46]. The method seeks to make data points within the same cluster as similar as possible while keeping data points in different clusters as dissimilar as possible. In this study, cluster analysis was used to assign observations to groups based on the performance indicators evaluated, and the Euclidean distance was used as the distance metric in forming all clusters.
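
The assignment and update steps of k-means with Euclidean distance can be sketched as Lloyd’s algorithm. This is a minimal illustration with a hypothetical function name and random initialization from the data points; the initialization scheme and number of restarts used in the study are not stated here, and practical implementations (e.g. scikit-learn’s `KMeans`) typically use k-means++ initialization with several restarts.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: Sequence[Point], k: int, n_iter: int = 100,
           seed: int = 0) -> Tuple[List[Tuple[float, ...]], List[int]]:
    """Lloyd's k-means with Euclidean distance; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise the centroids with k data points chosen at random.
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Hypothetical two-indicator profiles forming two well-separated groups
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents, labs = kmeans(pts, k=2)
```

Because the result can depend on the initial centroids, running several seeds and keeping the solution with the smallest within-cluster sum of squares is a common safeguard.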

1.7 Principal Component Analysis as Data Mining Technique

PCA is a mathematical method used primarily to identify the structure of a dataset from a group of observed variables [40, 45]. It provides information about the key variables that characterize a particular dataset by capturing the variability (e.g. the spatial and temporal heterogeneity) of the entire dataset. Information is extracted by discarding the least important components and retaining those that carry the most useful information [41, 47]. Using PCA to extract the most important information from a large dataset is therefore valuable, as it can save effort, cost, and time while largely preserving the information in the original data.
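
As an illustration of how PCA identifies the dominant structure in a dataset, the sketch below estimates only the first principal component, via power iteration on the sample covariance matrix, and reports the share of total variance it explains. It is a hypothetical, minimal example: a full analysis would compute all components (for instance with `numpy.linalg.eigh` or scikit-learn’s `PCA`) and commonly standardizes variables measured in different units first.

```python
import math
from typing import List, Sequence, Tuple


def first_principal_component(data: Sequence[Sequence[float]],
                              n_iter: int = 200) -> Tuple[List[float], float]:
    """Leading principal direction and the share of total variance it explains."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - means[j] for j in range(d)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    # Power iteration converges to the eigenvector of the largest eigenvalue,
    # provided that eigenvalue is strictly dominant and the start vector is not
    # orthogonal to its eigenvector.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("the covariance matrix maps the start vector to zero")
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    total_variance = sum(cov[j][j] for j in range(d))  # trace = sum of all eigenvalues
    return v, eigenvalue / total_variance


# Hypothetical two-indicator data lying exactly on a line
direction, explained = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
print([round(x, 4) for x in direction], round(explained, 4))  # [0.4472, 0.8944] 1.0
```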

1.8 Application of Machine Learning Models in the Study

In this brief, different supervised machine learning models, namely k-nearest neighbours (kNN), support vector machine (SVM), logistic regression (LR), and artificial neural networks (ANN), were used to classify the different classes investigated. The hyperparameters of the models were optimized in some cases, and such instances are explicitly mentioned in the ensuing chapters that employ machine learning models. Readers are referred to our previous works [48,49,50,51] for definitions of these models and of the performance indices often used to evaluate the efficacy of a developed machine learning pipeline.
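
The specific performance indices are defined in the works cited above; as a hedged illustration, the sketch below computes four commonly used ones (accuracy, precision, recall, and F1-score) for a designated positive class. The function name and the high/low labels in the usage example are hypothetical.

```python
from typing import Dict, Hashable, Sequence


def classification_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                           positive: Hashable) -> Dict[str, float]:
    """Accuracy, plus precision, recall and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Undefined ratios are reported as 0.0 here (a convention, not a standard).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical high ("H") vs. low ("L") performance labels
print(classification_metrics(["H", "H", "L", "L"], ["H", "L", "L", "L"], positive="H"))
# {'accuracy': 0.75, 'precision': 1.0, 'recall': 0.5, 'f1': 0.6666666666666666}
```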

1.9 Datasets for the Study

A total of 6232 matches from five consecutive seasons (2017 through 2022), officiated across the English Premier League, Spanish LaLiga, Italian Serie A, French Ligue 1, and German Bundesliga, were retrieved from InStat Scout. InStat Scout is provided by InStat, one of the leading sports performance analysis companies, which was founded in Moscow in 2007 and currently has over 900 offices globally. The data are available to users by subscription. According to InStat, the company examines referee performance in greater depth than any other statistics firm. InStat Scout includes a profile for each referee and generates a unique report after each match.

Because referees’ statistics are important to both national federations and the referees themselves, InStat developed more than 63 indicators for each referee. To ensure the reliability and validity of the indicators, each one is linked to videos that provide the referee’s activity profile, along with comprehensive reports on the referee’s overall activity in the match [52]. The indicators were developed after thorough consultation with top referees from numerous countries, and together they form a complete profile of the referee’s actions during a match. They cover detailed statistics, decision-making, home and away teams’ performance aggregates, injury time, distance metrics, and reasons for awarding cards to players. These statistics are important because they show the extent to which referees influence a match and how well they officiate and keep up with the game. Table 1.1 provides a detailed description of the datasets used in the study.

Table 1.1 Full description of the datasets used in the study