1 Introduction

Handball is a popular sport with growing interest across the world. To date, there is no official ranking tool to compare club performances. The only quantitative metrics available are the coefficient ranks that compare countries (based on championships), provided by the European and International Handball Federations (EHF and IHF respectively), and the player performance indices available in some countries (Liqui Moly HBL 2024). Handball also suffers from a lack of literature (Saavedra 2018), especially in the predictive or analytical fields. In this work, we aim to establish a methodology to estimate the strength of teams.

Estimating the strength of a team has long been discussed in the literature, and is typically seen in sports such as football. Rating methods often assume some probability distributions to represent the distribution of the outcome of a match. Some methods are based on the Thurstone-Mosteller model (Thurstone 1927) or the Bradley–Terry model (Bradley and Terry, 1952) to model the outcome of a match based on a probability distribution where the location parameter corresponds to the strengths of the modeled teams. These popular techniques, however, assume that the underlying probability distribution is continuous. This is by nature contrary to the structure of the majority of scoring-based team sports data (e.g. football, handball, basketball, rugby, etc).

Another topic of controversy is the choice of the underlying probability distribution representing the number of goals scored by a team. Reep et al. (1971) demonstrated that the Negative Binomial distribution is suitable for modeling scores in several ball games. Maher (1982) however argued that goodness-of-fit tests favor the independent Poisson distribution to model football scores. Ley et al. (2019) further investigated the idea of Poisson distributions and, after a broad comparison of models, suggested the bivariate Poisson model (Karlis and Ntzoufras, 2003) to best represent the outcome of football games. They estimate the rate parameter \(\lambda _i\) of team i facing opponent team j via Maximum Likelihood Estimation, assuming the structure

$$\begin{aligned} \log (\lambda _i) = \beta _0 + (r_i - r_j) + h \cdot \mathbbm {1}(\text {team } i \text { playing at home}) \end{aligned}$$
(1)

where \(\beta _0 \in \mathbb {R}\) is a common intercept and \(h > 0\) is the effect of playing at home. The parameters \(r_i \in \mathbb {R}\) and \(r_j \in \mathbb {R}\) represent the abilities of team i and j that are used as estimation of team strengths.
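As a minimal sketch of how Eq. (1) turns abilities into an expected number of goals, the snippet below evaluates \(\lambda _i\) for a home and an away team. The function name `expected_goals` and all numeric values are hypothetical and chosen only for illustration; they are not estimates from Ley et al. (2019).

```python
import math

def expected_goals(beta0, r_i, r_j, h, at_home):
    """Rate lambda_i from Eq. (1): log(lambda_i) = beta0 + (r_i - r_j) + h * 1(home)."""
    home_effect = h if at_home else 0.0
    return math.exp(beta0 + (r_i - r_j) + home_effect)

# Hypothetical parameter values, for illustration only
beta0, h = 0.2, 0.3
r_strong, r_weak = 0.5, 0.1

# The stronger team plays at home, the weaker team plays away
lam_home = expected_goals(beta0, r_strong, r_weak, h, at_home=True)
lam_away = expected_goals(beta0, r_weak, r_strong, h, at_home=False)
print(lam_home, lam_away)
```

Because the abilities enter only through the difference \(r_i - r_j\), adding the same constant to every ability leaves the rates unchanged.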

In the context of handball, Groll et al. (2020) analyzed historical international games to determine the best probability distribution to model the number of goals scored in handball matches. Given the level of under-dispersion observed, they concluded that the standard Poisson distribution cannot be used and a Gaussian distribution with low variance is the most appropriate.

In this article, we propose a method to derive a ranking based on handball team strengths. These strengths are obtained using the estimated parameters of an appropriate discrete probability distribution by means of maximum likelihood. We define formulae to transform such statistical estimates into sports abilities and observe how mathematical expressions can translate into sports facts. To illustrate our results, we apply our method to historical European female matches from the 2022/2023 season and obtain a ranking which is linked to the end-of-season standings.

Our work is organized as follows. In Sect. 2, we compare distributions from the existing literature with the Conway-Maxwell-Poisson distribution. After motivating the use of this flexible discrete probability distribution, we will generate a metric representing the strength of a team. In Sect. 3, we will illustrate the results of the proposed methodology on female club data and propose a ranking of the best performing teams based on statistical findings. Finally, we discuss next steps and future considerations in Sect. 4 and conclude in Sect. 5.

2 Methodology

In this section, we review the methodology used for modeling handball data to represent the strength of a team. First, we explain why the classical Poisson distribution cannot be used as the underlying probability distribution. We then propose the Conway-Maxwell-Poisson distribution as a flexible probability distribution from which we estimate, using its parameters, the strength of a team.

2.1 Non equi-dispersion of handball data

When analyzing historical data from female handball matches, one can observe situations with non equi-dispersion. We define the dispersion index DI as the ratio between the expectation \(\mathbb {E}(X)\) and the variance \(\mathbb {V}(X)\) of a random variable X:

$$\begin{aligned} DI = \dfrac{\mathbb {E}(X)}{\mathbb {V}(X)}. \end{aligned}$$
(2)

When \(DI < 1\), we are in the situation of over-dispersion, since the variance is larger than the expectation. When \(DI > 1\), the variance is lower than the average which indicates under-dispersion. The final situation where \(DI = 1\) leads to equi-dispersion.

To measure this index for handball data, we analyzed games over the 2022/2023 season of the European championships and performed a statistical test to assess whether the data are equi-dispersed or not. De Oliveira (1963) proposed a dispersion test to compare the mean to the variance of a discrete distribution. Böhning (1994) later proposed an update of the test, correcting the computation of the asymptotic standard deviation. Under the null hypothesis \(H_0: X \sim \mathcal {P}(\lambda )\), one can assess whether a variable X follows a Poisson distribution with parameter \(\lambda\) given that the distribution fulfills the property \(\mathbb {E}(X) = \mathbb {V}(X) = \lambda\). The alternative hypothesis is \(H_1: \mathbb {E}(X) \ne \mathbb {V}(X)\), highlighting non equi-dispersion. The test statistic is thus defined as

$$\begin{aligned} T = \dfrac{\dfrac{1}{n-1} \sum _{i=1}^{n}{(X_i - \bar{X})^2} - \bar{X}}{\sqrt{\dfrac{2}{n-1}} \bar{X}} \end{aligned}$$
(3)

where \(X_i \in \mathbb {N}\) records the number of goals scored in past matches (e.g. ongoing season). The location parameter is approximated by the empirical mean \(\bar{X} = \frac{1}{n} \sum _{i=1}^{n}{X_i}\). Under \(H_0\), the test statistic T follows a \(\chi ^2\) distribution with \(n - 1\) degrees of freedom (Hoel, 1943).
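The quantities in Eqs. (2) and (3) can be computed directly from a list of scores. The following is a minimal sketch using only the Python standard library; the function names and the sample `goals` are hypothetical and do not correspond to any club in the data set.

```python
import math
import statistics

def dispersion_index(goals):
    """Empirical version of Eq. (2): sample mean divided by sample variance."""
    return statistics.mean(goals) / statistics.variance(goals)

def dispersion_test_statistic(goals):
    """Test statistic T of Eq. (3), using the unbiased sample variance (n - 1)."""
    n = len(goals)
    mean = statistics.mean(goals)
    var = statistics.variance(goals)
    return (var - mean) / (math.sqrt(2.0 / (n - 1)) * mean)

# Hypothetical goals scored by one team over ten matches
goals = [30, 33, 31, 35, 29, 34, 32, 36, 30, 33]

print(dispersion_index(goals))           # above 1 suggests under-dispersion
print(dispersion_test_statistic(goals))  # negative when variance < mean
```

A negative value of T arises whenever the sample variance is below the sample mean, which is the under-dispersed situation described above.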

Table 1 Count and share of teams by gender per type of dispersion, assessed by dispersion test on the number of goals scored

We present in Table 1 the results of the tests performed over 819 European clubs (323 being female and 496 male). We could not reject \(H_0\) (at level 5%) for only 47 clubs (5.7%); all others show either over- or under-dispersion. As an illustration, the female team of Metz Handball (France) scored on average 32.25 goals over the 2022/2023 season with a variance of 21.23. Performing the test yields a test statistic of \(-1.53\) with a p-value \(< 0.001\), indicating under-dispersion. Therefore, in line with the conclusions of Groll et al. (2020), we conclude that the equi-dispersed Poisson distribution is not suitable to model the goals scored during handball matches.
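As a worked illustration of Eq. (2), plugging the empirical mean and variance reported for Metz Handball into the dispersion index gives

$$\begin{aligned} DI = \dfrac{32.25}{21.23} \approx 1.52 > 1, \end{aligned}$$

which is consistent with the under-dispersion indicated by the negative test statistic.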

2.2 Modelling handball games with Conway-Maxwell-Poisson

As an alternative to the standard Poisson distribution, we consider the Conway-Maxwell-Poisson (CMP) distribution (Sellers, 2022). It is a generalization of the common Poisson distribution, but with the ability to handle under- and over-dispersion. Its probability mass function is defined by

$$\begin{aligned} \mathbb {P}(X = x | \lambda , \nu ) = \dfrac{\lambda ^{x}}{(x!)^{\nu }} \dfrac{1}{\sum _{j=0}^{\infty } \dfrac{\lambda ^{j}}{(j!)^{\nu }}}. \end{aligned}$$
(4)

The parameter \(\nu \ge 0\) represents the level of dispersion. When \(\nu = 1\), this indicates an equi-dispersed Poisson distribution. When \(\nu < 1\), we are in the situation of over-dispersion while \(\nu > 1\) represents under-dispersion. Though it does not have an explicit interpretation, \(\lambda \in \mathbb {R}^{+}_{*}\) can be seen as a location parameter whose value gets closer to the mean as \(\nu \rightarrow 1\). Other special cases of the Conway-Maxwell-Poisson distribution include the Bernoulli distribution with parameter \(\lambda /(1+\lambda )\) as \(\nu \rightarrow \infty\) and the geometric distribution with probability of success \(1-\lambda\) when \(\lambda < 1\) and \(\nu = 0\). The CMP distribution can also be a good alternative to the classical Poisson distribution given its flexibility to handle different levels of dispersion.
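The probability mass function in Eq. (4) has no closed-form normalizing constant, so a numerical evaluation has to truncate the infinite sum. The sketch below is one possible way to do this in pure Python, working in log space to avoid overflow; the truncation at `max_terms` and the function names are assumptions of this illustration, not part of the original methodology.

```python
import math

def cmp_log_weight(x, lam, nu):
    """Log of the unnormalised CMP weight lambda^x / (x!)^nu."""
    return x * math.log(lam) - nu * math.lgamma(x + 1)

def cmp_pmf(x, lam, nu, max_terms=500):
    """P(X = x | lam, nu) from Eq. (4).

    The infinite normalising sum is truncated after max_terms terms
    (assumption: the remaining tail is negligible, which fails when
    nu = 0 and lam >= 1 because the series then diverges).
    """
    log_terms = [cmp_log_weight(j, lam, nu) for j in range(max_terms)]
    shift = max(log_terms)  # log-sum-exp shift for numerical stability
    log_z = shift + math.log(sum(math.exp(t - shift) for t in log_terms))
    return math.exp(cmp_log_weight(x, lam, nu) - log_z)

# Special cases mentioned in the text
poisson_case = cmp_pmf(2, lam=3.0, nu=1.0)    # nu = 1: Poisson(3)
geometric_case = cmp_pmf(3, lam=0.5, nu=0.0)  # nu = 0, lam < 1: geometric
print(poisson_case, geometric_case)
```

With \(\nu = 1\) the function reduces to the Poisson probability, and with \(\nu = 0\) and \(\lambda < 1\) to the geometric probability \((1-\lambda )\lambda ^x\), matching the special cases listed above.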

To evaluate the goodness of fit of the distribution on handball data, we compare the CMP with the Gaussian and Negative Binomial distributions as mentioned in Groll et al. (2020). In Table 2, we report the estimated log-likelihood (\(\hat{L}\)) and the associated Akaike Information Criterion (AIC) for the club of Metz Handball over the 2022/2023 season. We observe from Table 2 that, although the three distributions fit the data similarly, the Conway-Maxwell-Poisson distribution exhibits the maximum log-likelihood. The AIC additionally penalizes distributions with many parameters to estimate. Since the number of estimated parameters is \(k = 2\) for all three distributions, minimizing the AIC or maximizing the log-likelihood leads to the same conclusion.
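The equivalence between the two criteria follows from the standard definition \(AIC = 2k - 2\hat{L}\). A short sketch makes this explicit; the log-likelihood values below are hypothetical placeholders and are not the values of Table 2.

```python
def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 * log-likelihood."""
    return 2 * k - 2 * log_likelihood

# Hypothetical log-likelihoods, for illustration only (not Table 2)
fits = {"CMP": -118.2, "Gaussian": -118.6, "Negative Binomial": -119.9}

best_by_loglik = max(fits, key=fits.get)
best_by_aic = min(fits, key=lambda name: aic(fits[name], k=2))
print(best_by_loglik, best_by_aic)
```

Since the penalty term \(2k\) is identical for the three candidates, the ordering by AIC is simply the reverse of the ordering by log-likelihood.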

Furthermore, when fitting and comparing the three distributions for 819 European male and female clubs (reported in Table 3), the Conway-Maxwell-Poisson distribution is the most suitable in the largest share of cases (382 teams out of 819, 46.6%). Although the Gaussian distribution is also appropriate for 36% of the teams, the log-likelihoods of the three distributions remain very close to each other. Therefore, considering the flexibility of the distribution, which can handle under-, equi- and over-dispersion situations, the Conway-Maxwell-Poisson distribution would be the most appropriate choice for modeling handball data.

Table 2 Comparison of log-likelihood and AIC evaluated on scored goals by Metz Handball over season 2022/2023
Table 3 Comparison of distributions, counting the number of teams for which the distribution is most suited

We represent in Fig. 1 the relation between the empirical mean from a CMP distribution and its associated parameters \(\lambda\) and \(\nu\). We notice a logarithmic relationship between the parameter \(\lambda\) and the empirical mean. This relation will be of particular interest in Sect. 2.3 when defining the team’s strength.

Fig. 1 Relation between CMP parameters \(\lambda\) and \(\nu\) and the empirical mean \(\bar{X}\) (from simulated data)
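The relation shown in Fig. 1 can be explored numerically. The sketch below computes the CMP mean from a truncated sum and compares it with a standard approximation from the CMP literature, \(\mathbb {E}(X) \approx \lambda ^{1/\nu } - (\nu - 1)/(2\nu )\), which implies that \(\log (\lambda )\) grows roughly like \(\nu\) times the logarithm of the mean. The function names, the truncation level and the parameter values are assumptions made for this illustration.

```python
import math

def cmp_mean(lam, nu, max_terms=500):
    """Mean of CMP(lam, nu), with the infinite sums truncated at max_terms."""
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1)
                 for j in range(max_terms)]
    shift = max(log_terms)  # rescale weights to avoid overflow
    weights = [math.exp(t - shift) for t in log_terms]
    return sum(j * w for j, w in enumerate(weights)) / sum(weights)

def cmp_mean_approx(lam, nu):
    """Asymptotic approximation of the CMP mean (reliable for large lam)."""
    return lam ** (1.0 / nu) - (nu - 1.0) / (2.0 * nu)

# Hypothetical under-dispersed setting with a handball-like scoring level
lam, nu = 1000.0, 2.0
print(cmp_mean(lam, nu), cmp_mean_approx(lam, nu))
```

Under this approximation, two teams with the same \(\nu\) and very different \(\lambda\) can have similar average scores, which motivates working with \(\log (\lambda )\) rather than \(\lambda\) itself.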

2.3 Estimation of team strengths

As with most competitive sports, the strength of a team can be expressed by its ability to perform in both attack and defense. We thus introduce different formulae to represent the defense and attack strengths of a team. We then define the overall strength of a team as a combination of its attack and defense abilities. In this section, we will refer to a difficulty parameter \(\omega\) defined as

$$\begin{aligned} \omega = \frac{1}{n} \sum _{i=1}^{n}{\omega _i} \in [0, 1]. \end{aligned}$$
(5)

This parameter corresponds to the average difficulty of the \(n \in \mathbb {N}\) matches played by a team over a fixed period of time. Each match is assigned a level of difficulty \(\omega _i \in [0, 1]\) which is a function of the competition. It is based on the European Handball Federation’s (EHF) place distribution, which reflects the competitiveness of leagues. Some tuning is required to find realistic values representing the competitiveness of the leagues. Our experiments suggested scaling the values of the place distribution, using min-max normalization, between 0.9 for the least competitive countries (i.e. Luxembourg or Belgium for women and Moldova for men) and 1 for the EHF Champions League. These results, validated with sports professionals, indicate that a lower bound of 0.9 for \(\omega\) avoids devaluing competitions while remaining realistic about their competitiveness. Values lower than 0.9 tend to severely penalize some teams and lead to unrealistic rankings. The scaling method preserves the amplitude of the place distribution and the potential gaps between countries. The parameter serves as a penalty to highlight teams playing in more competitive championships.
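A minimal sketch of the scaling step and of Eq. (5) could look as follows. The competition names and raw scores are hypothetical stand-ins for the EHF place distribution, which is not reproduced here, and the helper `scale_difficulty` is introduced only for this illustration.

```python
def scale_difficulty(raw_scores, low=0.9, high=1.0):
    """Min-max normalise raw competitiveness scores into [low, high]."""
    smallest, largest = min(raw_scores.values()), max(raw_scores.values())
    if largest == smallest:
        raise ValueError("at least two distinct raw scores are required")
    span = (high - low) / (largest - smallest)
    return {name: low + (score - smallest) * span
            for name, score in raw_scores.items()}

# Hypothetical raw competitiveness scores (not the EHF place distribution)
raw_scores = {"EHF Champions League": 100.0, "League A": 60.0, "League B": 20.0}
difficulty = scale_difficulty(raw_scores)

# Eq. (5): omega is the average difficulty over the matches a team played
matches = ["EHF Champions League", "League A", "League A", "League A"]
omega = sum(difficulty[m] for m in matches) / len(matches)
print(difficulty, omega)
```

The affine rescaling keeps the relative gaps between competitions while compressing them into the interval \([0.9, 1]\).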

2.3.1 Defense strength

Adopting the selected Conway-Maxwell-Poisson distribution, we use its parameters to represent the defensive strength of a team. The distribution of goals conceded by a team, denoted by \(Y_d\), is assumed to follow a \(CMP(\lambda _d, \nu _d)\), where the parameter \(\lambda _d > 0\) can act as a location parameter and \(\nu _d \ge 0\) as the dispersion parameter. We then define the defense strength as

$$\begin{aligned} s_d = \left( \dfrac{\nu _{d}}{\log (\lambda _d)} \right) ^{\omega }. \end{aligned}$$
(6)

The strength of a team’s defense is inversely related to the goals it concedes. This is reflected in Eq. (6) in the sense that the higher the average number of conceded goals is (i.e. the higher \(\lambda _d\)), the lower the strength \(s_d\) will be. The logarithmic transformation \(\log (\lambda _d)\) accounts for the relation with the empirical mean illustrated in Fig. 1. On the other hand, we want to penalize the inconsistencies of a team; we therefore want the parameter \(\nu _d\) to be as large as possible, corresponding to under-dispersion. The penalty \(\omega \in [0, 1]\) then ensures that highly competitive matches are emphasized in the strength. We can thus interpret formula (6) as follows: a team is a strong defender if it consistently concedes few goals during matches.

2.3.2 Attack strength

We also assume that the distribution of scored goals follows a CMP distribution, \(Y_a \sim CMP(\lambda _a, \nu _a)\). A team is considered strong in attack if the average number of scored goals is large. The logic can therefore be considered as the inverse of that of Eq. (6). We define the attack strength of a team as

$$\begin{aligned} s_a = \left( \dfrac{\log (\lambda _a)}{\nu _a} \right) ^{\omega } \end{aligned}$$
(7)

where the location parameter \(\lambda _a\) is used as the numerator to show that a high number of goals scored on average increases the attack strength. The dispersion parameter \(\nu _a\) is used as the denominator, behaving like a penalty as we expect teams to have consistent performances over the season. We finally include the weight \(\omega \in [0, 1]\), as defined in Eq. (5), to penalize for the difficulty of the matches played.

2.3.3 Global strength

A team is considered strong when it can perform well in attack and defense. We define the overall strength of a team as the combination of attack and defense strengths by

$$\begin{aligned} s = s_a \cdot s_d = \left( \dfrac{\log (\lambda _a) \cdot \nu _{d}}{\nu _{a} \cdot \log (\lambda _d)} \right) ^{\omega }. \end{aligned}$$
(8)

We observe that a high score for overall strength can be driven by two factors. On the one hand, the team should have a high average of scored goals while demonstrating consistent defensive performances over time. On the other hand, a team should be able to adapt its attack strategies to its opponents and take them by surprise by scoring more than expected. It should also avoid conceding too many goals while maintaining its scoring capacity. In other words, the goal difference in the competition’s ranking should be as large as possible. This can usually be verified in different competitions where leading teams tend to have a high difference (+229 goals for Metz Handball in the French female championship at the end of the 2022/2023 season or +257 for Vipers Kristiansand in Norway) while teams at the bottom of the season standings have highly negative goal differences (-107 for Toulon Métropole Var Handball in France for the same season or -170 for Volda in Norway).
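Equations (6) to (8) translate directly into code. The sketch below assumes that \(\lambda _a\) and \(\lambda _d\) are larger than 1, so that their logarithms are positive; the parameter values are hypothetical and serve only to show the direction of each effect.

```python
import math

def defense_strength(lam_d, nu_d, omega):
    """Eq. (6): s_d = (nu_d / log(lam_d)) ** omega, assuming lam_d > 1."""
    return (nu_d / math.log(lam_d)) ** omega

def attack_strength(lam_a, nu_a, omega):
    """Eq. (7): s_a = (log(lam_a) / nu_a) ** omega, assuming lam_a > 1."""
    return (math.log(lam_a) / nu_a) ** omega

def global_strength(lam_a, nu_a, lam_d, nu_d, omega):
    """Eq. (8): product of attack and defense strengths."""
    return attack_strength(lam_a, nu_a, omega) * defense_strength(lam_d, nu_d, omega)

# Hypothetical CMP estimates for one team, for illustration only
team = dict(lam_a=1200.0, nu_a=2.0, lam_d=700.0, nu_d=2.1, omega=0.95)
print(global_strength(**team))

# Conceding more on average (larger lam_d) lowers the defense strength
print(defense_strength(700.0, 2.1, 0.95) > defense_strength(900.0, 2.1, 0.95))
```

Keeping the attack and defense components as separate functions makes it possible to report them individually alongside the global strength.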

We can now note the importance of the nonlinear transformation for \(\lambda _a\) and \(\lambda _d\). Given the logarithmic rate of these parameters, a team may have to record a much higher average of scored goals to distinguish itself from other teams. First, the slope of the strength s with respect to the scored goals \(\lambda _a\) is

$$\begin{aligned} \dfrac{\partial s}{\partial \lambda _a} = \dfrac{\omega \log ^{\omega - 1}(\lambda _a)}{\lambda _a} \left( \dfrac{\nu _d}{\nu _a \log (\lambda _d)} \right) ^{\omega } > 0. \end{aligned}$$
(9)

As the team gets stronger, \(\lambda _a\) increases (everything else being equal) and differentiators with other teams become marginal since \(\lim _{\lambda _a \rightarrow \infty } (\frac{\partial s}{\partial \lambda _a} ) = 0\).

Second, the derivative of the strength s with respect to conceded goals \(\lambda _d\) is

$$\begin{aligned} \dfrac{\partial s}{\partial \lambda _d} = - \dfrac{\omega }{\lambda _d \log ^{\omega + 1}(\lambda _d)} \left( \dfrac{\log (\lambda _a) \nu _d}{\nu _a} \right) ^{\omega } < 0. \end{aligned}$$
(10)

This suggests that any reduction of conceded goals leads to improvements in the overall strength. These statements have an actual echo in sports terms. It is common knowledge for handball players and coaches that the best way to improve a team’s performance is to start by improving their defense.
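The closed-form derivatives in Eqs. (9) and (10) can be checked against central finite differences of Eq. (8). This is a verification sketch only; the parameter values are hypothetical and the step size `eps` is an arbitrary choice.

```python
import math

def strength(lam_a, nu_a, lam_d, nu_d, omega):
    """Global strength s from Eq. (8)."""
    return (math.log(lam_a) * nu_d / (nu_a * math.log(lam_d))) ** omega

def ds_dlam_a(lam_a, nu_a, lam_d, nu_d, omega):
    """Eq. (9): partial derivative of s with respect to lam_a."""
    return (omega * math.log(lam_a) ** (omega - 1) / lam_a
            * (nu_d / (nu_a * math.log(lam_d))) ** omega)

def ds_dlam_d(lam_a, nu_a, lam_d, nu_d, omega):
    """Eq. (10): partial derivative of s with respect to lam_d."""
    return (-omega / (lam_d * math.log(lam_d) ** (omega + 1))
            * (math.log(lam_a) * nu_d / nu_a) ** omega)

def central_difference(f, x, eps=1e-2):
    """Numerical derivative of a one-argument function f at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Hypothetical parameter values, for illustration only
la, na, ld, nd, w = 1200.0, 2.0, 700.0, 2.1, 0.95

num_a = central_difference(lambda x: strength(x, na, ld, nd, w), la)
num_d = central_difference(lambda x: strength(la, na, x, nd, w), ld)
print(num_a, ds_dlam_a(la, na, ld, nd, w))  # both positive
print(num_d, ds_dlam_d(la, na, ld, nd, w))  # both negative
```

The numerical and analytical values should agree closely, with a positive slope in \(\lambda _a\) and a negative slope in \(\lambda _d\), as stated in Eqs. (9) and (10).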

3 Illustrative applications

As illustrated in Sect. 2.2, the CMP distribution seems to be the most appropriate choice to model goals scored during a handball match. We plot in Fig. 2 the histogram of scored goals over the 2022/2023 season for the female club of Metz Handball (France) and compare with the fitted theoretical CMP distribution. The fitted distribution smooths the empirical histogram which highlights two distinct scoring regimes. The situation when \(x \le 32\) mostly corresponds to Champions League matches with aggressive defensive teams (average scored goals = 30.4). On the other hand, when \(x > 32\) we mostly have matches from the French championship with teams being substantially weaker (average scored goals = 33.6). To account for these differences, we will need to use a penalty for the competitiveness as introduced in Sect. 2.3.

Fig. 2 Histogram of goals scored by Metz Handball over season 2022/2023 vs. theoretical CMP distribution estimated via Maximum Likelihood

Furthermore, we estimate the strength parameters for all European female clubs and display the ranking in Table 4. The estimations are derived from all matches over the 2022/2023 season that were played in female first division competitions (ranging from friendly games to the regular championships and Champions League).

Table 4 Top 10 strongest female teams in Europe for the 2022/2023 season

We can observe from the top clubs that Győri Audi ETO KC, Vipers Kristiansand, Team Esbjerg and FTC Rail-Cargo Hungaria were the participants of the EHF final four in June 2023, representing the final stage of the toughest European competition. Vipers Kristiansand ultimately won the competition for the third year in a row. All other clubs are leading the championships in their respective countries and were members of the EHF Champions League in season 2023/2024. As handball does not have any official ranking, asking sports professionals for feedback was the only way to validate our results. They confirmed that our results make sense and are in line with the performance of European teams.

We also notice in Table 4 that, even though the ranking is sorted by the overall estimated strength, the average numbers of scored and conceded goals seem to follow some sort of hierarchy. To be ranked toward the top, teams have to show a high average of scored goals and a relatively lower number of conceded goals. The exceptions (e.g. MKS Zaglebie Lubin) find a justification in the consistency of their performance. Indeed, such teams exhibit a lower value for \(\nu _a\) or a higher value for \(\nu _d\), suggesting consistency in attack or defense and boosting their final strength ranking. This justifies the use of formulae (6) and (7) instead of purely relying on average scored goals.

4 Discussion

Our proposal offers an estimation of attack and defense strengths to rank teams and generate features that can be informative and meaningful in subsequent modelling tasks. Provided that one has access to such data, the presented exercise can be extended to other objectives such as estimation of player abilities or be generalized to other sports.

4.1 From team strengths to player abilities

Using more granular data (not publicly available) on player performances for each game and over several seasons, one can also estimate the attack strength of a player. Considering that the data will most likely also suffer from under- or over-dispersion, the CMP distribution seems to be a good choice to fit the number of goals scored by a player. Using formula (7), we can therefore estimate the attack strength of an individual player. Beyond goals scored, playing ability could also include components such as passing ability, combining scoring and passing abilities into a global attack strength. With access to data such as interceptions and successful blocks (e.g. faults with no penalty such as yellow card, 2 min penalty, etc.), the defense ability can be modeled in a similar fashion so that one can derive a defensive ability at player level.

Therefore, combining attack and defense abilities as defined by Eq. (8), one can estimate the individual abilities and derive a ranking. Such ranking can help subsequent modelling exercises by adding informative variables regarding the strength of the individual players and not only the global strength of a team. Additionally, the individual ranking can be used as a new source of information for team managers to assess the potential abilities of a player when recruiting. Indeed, one can obtain a time dependent ranking and observe the evolution of a player over several seasons. This can further lead to forecasting exercises to identify players with high potential to be added to the squad.

4.2 Generalization to other sports from Conway-Maxwell-Poisson distribution

Modelling sports relies on discrete distributions, although over- or under-dispersion is a recurrent problem (Karlis and Ntzoufras, 2008; Van Bommel et al., 2021). Given constraints similar to those seen in the present work, one can replicate the discussed logic on other sports’ data. The methodology from Ley et al. (2019) can be merged with our proposed methodology to obtain football team abilities based on a distribution that can handle the problem of under-dispersion. One can thus define new rankings and generate new informative features to include in predictive Machine Learning models. Using a methodology similar to that of Groll et al. (2019), one can include such generated features in the feature set to improve the predictive model.

5 Conclusion

Handball is a fast-paced sport whose goal counts cannot be analyzed via standard count distributions due to the problem of under- or over-dispersion. We showed that, using an appropriate probability distribution, one can define meaningful statistical estimates that approximate the strengths of a team. The Conway-Maxwell-Poisson distribution can therefore be a suitable option to model not only handball games but any scoring-based team sport.

The proposed methodology generates informative context about the performance of a team. In future work, this will allow us to generate new covariates that can be included in predictive models in the spirit of Groll et al. (2019). These covariates also open the possibility of data-driven analyses of a team’s performance to later support team managers in their strategic and tactical choices. With access to more granular data, this methodology can be adapted to the estimation of player abilities and offer tools that allow coaches to make data-driven decisions in their recruitment processes.