1 Introduction

Sports competitions are often structured in official championships, where individual athletes or sporting teams compete to win. In such championships, one has a list of matches where the same teams/players play different games under a plethora of rules to be respected. In some cases, the team/player losing a match is eliminated from the competition—like in Grand Slam tennis tournaments, such as Wimbledon, the US Open, the Australian Open, and the French Open, where the winning cup is assigned at the final game to one of the last two surviving players. In other cases, all the teams/players play the same number of matches, and the winner comes out from the outcomes of all the matches—this is the case of football, with the official championships of European countries like Serie A in Italy, Premier League in the UK, Ligue 1 in France and Campeonato Nacional de Liga de Primera División in Spain. In the former case, there is no real need to quantify the performance of the players to identify the winner and the other positions in the final ranking—the winner is the player who wins the final match, while her/his competitor takes a “silver medal”. In the latter case, one has to state some rules that assign a score to the playing teams at the end of each match. The analysis of the existing scoring rules and the proposal of new criteria—more reasonable, from different perspectives—offers room to carry out scientific research at a methodological level but also in the context of applications, see Sziklai et al. (2022) for an overview on tournaments’ efficacy.

In general, sports statistics is a widely acknowledged field of science (see, e.g. Albert and Koning 2007). One of the most famous papers in sports statistics is Reep and Benjamin (1968), where the authors analysed more than 2500 football matches and found that fewer passes are associated with a higher probability of a goal. This paper is the root of the football philosophy of the so-called “long ball system”, for which the ball should be kicked over long distances to avoid a high number of passes. There was (and still is) a long debate on Reep and Benjamin’s results regarding the presence of some biases in the analysis. From our point of view, debating the outcomes of the analysis does not affect the universal validity of Reep and Benjamin’s research question. Intuitively, statistics in sports might be efficiently exploited to advance methods to predict the outcomes of different matches. On this, Baker and Scarf (2006) face the case of 20 annual sporting contexts by including the heterogeneity of the prediction criteria in their investigation. More recently, Mattera (2023) provides a forecasting exercise of football outcomes by employing score-driven models. Still, in the context of football, Heuer et al. (2010) advance a Poisson Process-based model for predicting the outcomes of football matches.

One can deal with the players’ scores and performance from a perspective still related to the outcomes. In this respect, Volf (2009) provides a view of the scores in sports matches as the realizations of a point process based on the plethora of elements surrounding matches and players. Along the same line, Gabel and Redner (2012) deal with the scoring procedure of basketball games and elaborate on a random walk-type stochastic process behind the evolution of such a procedure over time. In line with Volf (2009), Higham et al. (2014) identify the performance indicators for the case of rugby by highlighting their roles in the formation of the scores of the teams. Also, Boys and Philipson (2019) discuss the ranking procedures of sportsmen in the special context of cricket. A relevant contribution is Sandri et al. (2020), where the authors explore game performance variability through Markov switching models. In Ausloos (2024), the author offers a new perspective on the way the final ranking of cyclist rides should be carried out. The interested reader is also referred to, e.g., Strauss and Arnold (1987), Merritt and Clauset (2014), Migliorati et al. (2023) and the references therein.

This paper adds and contributes to the literature on scoring procedures with an application to the relevant case of football championships. Specifically, we propose a novel method for assigning a score to the teams to identify the winner and the final ranking of every season of the considered championship. In doing so, we are close to several studies dealing with the analysis of the performance and the scoring procedures in football matches. We mention Ausloos et al. (2014), where the authors deal with the analysis of the structure of the ranking when considering UEFA and FIFA championships at a country level. Ausloos (2014) provides a view of the football rankings as unified frameworks through a rank-size analysis, with specific attention to the illustrative power of the Lavalette law. Ribeiro et al. (2010) build a model based on random walks for describing the scores of the soccer leagues. More recently and on the same line, Vernon-Carter et al. (2023) present soccer leagues as competitive complex systems where competitiveness can be measured through the scoring of individual soccer teams. We refer the interested reader also to Glickman and Stern (2005), Mendes et al. (2007), Thakkar and Shah (2021), Ficcadenti et al. (2023), Cefis and Carpita (2024). The methodological proposal is tested on the paradigmatic case of all the seasons of the Italian championship, Serie A.

Our starting point is the disappointing evidence that the current rule for Serie A admits special circumstances in which the title is mathematically assigned to a specific team before the season officially ends, regardless of the matches still to be played. That happens because of simple arithmetic consequences of the rule set. Indeed, the rigidity of a score based on the win-tie-lose trichotomy makes recovery impossible when the gap in points is large enough. This often implies a lower quality of play in the matches towards the end of the season, when the team that has mathematically won the championship starts losing matches against low-level teams. An example can be taken from the 2018–2019 championship. Juventus won the Serie A title with five games to spare. They clinched the title on April 20, 2019, after a 2–1 victory over Fiorentina, which put them 20 points clear of their closest challengers at the time, Napoli, with only 15 points left to play for. This significant points gap made it mathematically impossible for any other team to catch up with Juventus in the season’s remaining fixtures.

Following this victory, the matches that Juventus played for the rest of the season lacked the same competitive edge, at least from their perspective, as the title was already secured. They tied against Internazionale, Torino and Atalanta and lost against Roma first and Sampdoria later. This situation illustrates the potential downside of having a team win the league so early: the intensity and competitive nature of their remaining matches can diminish, potentially affecting the overall quality of the league’s competition towards the end of the season. While Juventus continued to compete professionally, the urgency and high stakes associated with their matches were notably reduced, aligning with the concerns expressed about the impact of early championship wins on the quality of the game.

We hypothesize a novel scoring rule for which scored and conceded goals play a relevant role in determining the final ranking of the considered championship season. In so doing, we are not far from Cerqueti et al. (2022), where there is an application on football data to rank teams according to their goals. The approach we follow is computational, in the spirit of data science. We consider the sample of all the Italian Serie A championship seasons, from 1929–1930 to 2022–2023, with specific reference to the official final rankings of the teams. We then implement a four-step procedure based on a combination of Kendall \(\tau\) and radar charts. The procedure leads to what we call New Rankings of the seasons, which are grounded on a different way to assign the scores to the teams that also includes the goals, hence removing or reducing the cases of mathematical certainty of being the winner well before the end of the championship. The approach is quite close to Gorgi et al. (2023), where the authors advance a pair of comparison methods for reconstructing the rankings of the football championships in the presence of forced interruption. This is particularly relevant in that there has been evidence of different cases of interrupted championships, the most recent one being linked to the COVID-19 pandemic. However, the quoted paper uses Kendall \(\tau\) only to test the proposed framework’s validity. Differently, we here base the analysis on such a statistical correlation measure along with a radar chart-based evaluation of the multidimensional performance of specific entities, namely the seasons of the considered championship in our case.

The paper is organized as follows. Section 2 describes the considered dataset. Section 3 presents the methodology used for dealing with the data, outlining the four-step procedure used to obtain the New Rankings. Section 4 collects the main results of the analysis, along with some related discussions. The last section offers some concluding remarks and traces lines of future research.

2 Data

This study utilises a comprehensive dataset encompassing the outcomes of football matches from the Italian Serie A championship, spanning from its inception, 1929–1930, to the present day, amounting to \(M=90\) seasons (see Footnote 1). For brevity, we refer to the seasons by mentioning only the related last year, so that e.g. 1929–1930 becomes 1930 for us.

The summary statistics of the dataset in Table A1 offer a glimpse into the volume and nature of the data analysed. These statistics encompass various metrics to understand football dynamics, including the number of goals scored by home and away teams (“Goals For” identified with GF and “Goals Against” with GA), points accumulated throughout the season, and the number of wins, draws, and losses. The dataset comprises records from the 1930 season of Serie A, totalling 34 matches, to the 2023 season, with 38 matches played. For each match, the dataset records the date, the teams involved, the goals scored by each team, and the final result (win, draw, or loss), allowing for a detailed examination of team performances at the season level and the evolution of the championship over time.

In addition to the standard metrics, we have developed rankings based on GF and GA each season, identified by the variables \(GF_r\) and \(GA_r\), respectively. These rankings provide an alternative perspective on team performance at the end of the season, emphasizing, for example, offensive and defensive capabilities beyond the traditional league standings.

In preparing the data for analysis, several preprocessing steps were undertaken to ensure the dataset was fit for purpose. These steps included verifying and correcting match outcomes, normalising team names to account for historical changes, and identifying and treating any missing or incomplete records. Finally, only official rankings (including penalties applied by authorities), GF and GA were employed to serve as the primary basis for our analysis. Developing rankings based on GF and GA for each season involved taking the total number of goals scored by and against each team, allowing us to derive \(GF_r\) and \(GA_r\). This approach offers a nuanced view of team strategy and performance, fitting the objective of our study and complementing the official league standings with metrics highlighting each team’s offensive and defensive strengths.
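The construction of \(GF_r\) and \(GA_r\) can be illustrated with a minimal sketch. It is not the code used for the study: the team labels and goal totals are hypothetical, the helper name `rank_values` is ours, and the treatment of equal totals (tied teams share the best available rank) is an assumption, since the tie convention is not stated in the text. The ordering follows the methodology: most goals scored gets rank 1 in \(GF_r\), fewest goals conceded gets rank 1 in \(GA_r\).

```python
def rank_values(values, descending=False):
    """Competition ranks: tied values share the best available rank (assumed convention)."""
    ordered = sorted(values, reverse=descending)
    # the position of the first occurrence in the sorted list is the shared rank
    return [ordered.index(v) + 1 for v in values]

# Hypothetical four-team season (illustrative numbers, not taken from the dataset)
goals_for = {"A": 70, "B": 55, "C": 55, "D": 30}
goals_against = {"A": 25, "B": 40, "C": 31, "D": 60}

teams = list(goals_for)
# GF_r: decreasing order of goals scored; GA_r: increasing order of goals conceded
gf_r = dict(zip(teams, rank_values([goals_for[t] for t in teams], descending=True)))
ga_r = dict(zip(teams, rank_values([goals_against[t] for t in teams])))

print(gf_r)
print(ga_r)
```

Teams B and C score the same number of goals and therefore share a rank in \(GF_r\), which is the kind of tie that motivates a tie-adjusted correlation coefficient.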

3 Methodology

This section illustrates the four steps of the procedure for achieving the New Rankings for the Italian Serie A championship seasons. First, we consider the unofficial rankings with teams ordered on the basis of the scored goals or, simply, Goals For (in decreasing order, so that rank=1 is associated with the team with the highest number of scored goals in the championship season) and conceded goals or, simply, Goals Against (in increasing order, so that rank=1 is assigned to the team with the lowest number of conceded goals). Thus, together with the official one, we have three rankings for each season on the same set of teams. Second, we compute the Kendall \(\tau\) of all the possible pairs of rankings, hence obtaining three values of the Kendall \(\tau\) for each season. Third, we build a radar chart for each season, whose axes are associated with the three Kendall values. Therefore, a triangle describes each championship season. We compute the area of the obtained triangles, and then we suitably normalise it so that the areas range from 0 (all Kendall \(\tau\) equal to \(-1\)) to 1 (all Kendall \(\tau\) equal to \(+1\)). The areas of the triangles represent the target (normalised) Kendall \(\tau\). Fourth, we detect rankings whose Kendall \(\tau\) correlation with the official one matches the target. The obtained rankings are the New Rankings. As we will see soon, the New Rankings are often far from the official ones.

We enter the details.

3.1 Kendall \(\tau\) correlation analysis

The association between official team rankings and goal metrics (GF and GA) is achieved through Kendall \(\tau\) correlation analysis. The \(\tau\) correlation coefficient measures the strength and direction of the association between two ranked variables. It is defined as:

$$\begin{aligned} \tau = \frac{2}{n(n-1)} \sum _{i<j} \text {sign}(x_i - x_j) \cdot \text {sign}(y_i - y_j) \end{aligned}$$
(1)

where n is the number of observations, \(x_i\) and \(x_j\) are the ranks of the i-th and j-th observations for the first variable, and \(y_i\) and \(y_j\) are the ranks of i and j for the second variable. The sign function, \(\text {sign}(\cdot )\), returns \(-1\), 0, or 1 depending on the sign of its argument.

In our case, we use the \(\tau _b\) variant (Kendall 1945). Such a coefficient is a measure of association based on the ranks of the data and is adjusted for ties. We point out that ties occur when two or more items have the same rank. It is defined by the formula:

$$\begin{aligned} \tau _b = \frac{P - Q}{\sqrt{(P+Q+T)(P+Q+U)}} \end{aligned}$$
(2)

where P and Q are the numbers of concordant and discordant pairs, respectively, and T and U are the numbers of pairs tied only in x and only in y, respectively. If a pair is tied in both x and y, it is counted in neither T nor U.

The \(\tau _b\) coefficient accounts for ties by adjusting the denominator to reflect the number of tied ranks, which can affect the distribution of concordant and discordant pairs. This adjustment makes \(\tau _b\) a more accurate reflection of the association between two variables when ties are present in the data. In datasets where ties are common, as in our case (see Footnote 2), \(\tau _b\) offers a more reliable correlation estimate than the standard Kendall \(\tau\) coefficient, which does not adjust for ties.
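To make the pair counting in Eq. (2) concrete, the following is a minimal pure-Python sketch. It is meant only as an illustration of the definition and not as the implementation used for the analysis; the function name `kendall_tau_b` and the toy rankings are ours.

```python
from itertools import combinations
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall tau-b as in Eq. (2): count P, Q, T, U over all pairs i < j."""
    P = Q = T = U = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue        # tied in both rankings: counted in neither T nor U
        elif dx == 0:
            T += 1          # tied only in x
        elif dy == 0:
            U += 1          # tied only in y
        elif dx * dy > 0:
            P += 1          # concordant pair
        else:
            Q += 1          # discordant pair
    # undefined (division by zero) if one ranking is constant
    return (P - Q) / sqrt((P + Q + T) * (P + Q + U))

official = [1, 2, 3, 4]     # tie-free ranking
goals_rank = [1, 1, 2, 3]   # toy ranking with one tie
print(kendall_tau_b(official, goals_rank))
```

In the toy example there are five concordant pairs and one pair tied only in the second ranking, so the value is \(5/\sqrt{5 \cdot 6}\); without ties the denominator reduces to the number of pairs and Eq. (2) coincides with Eq. (1).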

This coefficient is computed for each season to analyse the relationships between the official rankings (Rank) and the rankings based on goals for (\(GF_r\)) and goals against (\(GA_r\)), as well as the relationship between \(GF_r\) and \(GA_r\) themselves. It is worth recalling that in our data, the ties can be met only in \(GF_r\) and \(GA_r\), as the official ranking is built on a set of rules that avoid the presence of ties. Table 1 reports a summary of the instances considered, and Fig. 1 shows correlations over time.

The Kendall correlation coefficients are computed using Python, leveraging the scipy.stats.kendalltau function for correlation analysis. As we will appreciate when we introduce the radar charts, a normalization process is needed to use the correlations smoothly to form the areas. Such a normalization procedure adjusts the correlation values to a [0, 1] scale centred at 0.5, allowing for a consistent geometric interpretation across different years. The normalisation formula applied to each tau correlation coefficient, denoted as \(\tau _b\), is defined as follows:

$$\begin{aligned} \tau _{b;N} = \frac{\tau _b + 1}{2} \end{aligned}$$
(3)

Here, \(\tau _{b;N}\) represents the normalised correlation coefficient. By adding 1 to the original correlation coefficient \(\tau _b\), the new range starts from 0 (previously \(-1\)) to 2 (previously 1). Dividing this result by 2 adjusts the scale to range from 0 to 1. If the quantity \(\tau _{b;N}\) in (3) has a value of 0.5, then we do not have correlation; values closer to 1 indicate a strong positive correlation, while values closer to 0 suggest a strong negative correlation.

To facilitate the understanding of the steps, we report in Table 2 a snapshot of the 1939 and 2023 cases, being chosen as explicative instances; in Table 3 one can see the rank correlations and their respective normalisation.

Table 1 Summary of the pairwise rank-correlation analyses
Fig. 1 The different correlation analyses are reported over the seasons. We use the non-normalised version of the Kendall tau in formula (2). Rank is the official ranking, \(GA_r\) is the ranking when “Goals Against” is considered, and \(GF_r\) indicates the ranking when “Goals For” is considered

Table 2 Table of the values of the considered variables and rankings for the championship’s seasons ended in 1939 and 2023
Table 3 Table of the normalised and non-normalised correlations for the analysis of the championship’s seasons ended in 1939 and 2023
Fig. 2 The vertices are suitably labelled, and the area of the triangle is shaded to visually represent the correlations’ magnitude. The radial grid lines are set at intervals of 0.1 to indicate the scale of the normalised correlation (according to Formula 3), ranging from perfect negative correlation (zero in the graph) to perfect positive correlation (one in the graph). The shaded areas, calculated from the triangles formed by these correlations, provide a quantitative measure of the combined correlation strength among attributes for each year, here 1939 and 2023

Table 4 Angles, normalised correlations, coordinates of the vertices of the triangles and resulting areas for the seasons 1939 and 2023
Fig. 3 The areas calculated with Eq. (7). Each point represents a season

3.2 Mapping correlations into radar charts

In the analysis, the normalised correlation coefficients for each year are represented geometrically, forming triangles that encapsulate the relationship among different performance metrics. The triangles are constructed as radar charts, by plotting points on axes that extend from a central point, with each axis representing one of the analysis types. The process is described as follows:

  1. For each year under analysis, the normalised correlation coefficients (\(\tau _{b;N}\)) for the selected metrics are retrieved. These coefficients range from 0 to 1, and are centred at 0.5.

  2. Angles for the vertices of the triangles are calculated to distribute the analysed pairs of metrics evenly around a centre. This is achieved using the formula:

    $$\begin{aligned} \theta _h = 2\pi \frac{h}{n} \end{aligned}$$
    (4)

    where \(\theta _h\) represents the angle for the h-th vertex, n is the total number of analysis types, and h ranges from 0 to \(n-1\). In our context \(n=3\), and the three angles in radians are approximately 0, 2.09 and 4.19.

  3. Each correlation coefficient is then used as a point on its respective axis, determined by the corresponding angle \(\theta _h\).

  4. The points are connected in sequence, forming a closed shape that, in the context of this analysis, is a triangle; one gets three vertices thanks to the three types of analyses considered. See Fig. 2, where the examples of the 1939 and 2023 cases are reported on the basis of the data presented in Table 3.
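The four steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `radar_vertices` is ours, the three normalised correlations are hypothetical values rather than entries of Table 3, and no plotting is attempted.

```python
from math import pi, cos, sin

def radar_vertices(normalised_taus):
    """Place each normalised correlation on its own axis, with angles from Eq. (4)."""
    n = len(normalised_taus)
    angles = [2 * pi * h / n for h in range(n)]   # evenly spaced axes, h = 0..n-1
    # polar (radius = tau_{b;N}, angle = theta_h) to Cartesian coordinates
    return [(r * cos(t), r * sin(t)) for r, t in zip(normalised_taus, angles)]

# Hypothetical normalised correlations for one season (illustrative values only)
vertices = radar_vertices([0.9, 0.8, 0.7])

# With three axes the angles are 0, 2*pi/3 (about 2.0944) and 4*pi/3 (about 4.1888)
angles = [round(2 * pi * h / 3, 2) for h in range(3)]
print(angles)
print(vertices)
```

Connecting the three returned points in order gives the triangle of the radar chart; the first vertex always lies on the horizontal axis because \(\theta _0 = 0\).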

3.3 Area calculation from the resulting triangles and correlation target

The area of each triangle described above is calculated to quantify the combined strength of the correlations. Given the vertices positioned at angles \(\theta _1\), \(\theta _2\), and \(\theta _3\) with their respective normalised correlation coefficients, the Cartesian coordinates for each vertex are determined by:

$$\begin{aligned} x_h&= \tau ^{(h)}_{b;N} \cos (\theta _h) + \text {shift}_x \end{aligned}$$
(5)
$$\begin{aligned} y_h&= \tau ^{(h)}_{b;N} \sin (\theta _h) + \text {shift}_y \end{aligned}$$
(6)

where \(\tau ^{(h)}_{b;N}\) is the normalised correlation coefficient for vertex h, and the offsets \(\text {shift}_x = \text {shift}_y = 10\) are used to ensure that all points are positioned in the positive quadrant to simplify the area calculation.

The area of a triangle formed by the three points representing the considered normalised correlations is calculated, to provide a geometric representation of these correlations. Given the vertices coordinates \((x_1, y_1)\), \((x_2, y_2)\), and \((x_3, y_3)\), the area (A) of the triangle is given by:

$$\begin{aligned} A = \frac{1}{2} \left| x_1(y_2 - y_3) + x_2(y_3 - y_1) + x_3(y_1 - y_2) \right| \end{aligned}$$
(7)

The procedure of computing A is applied to each of the \(M=90\) seasons considered in our dataset. An example of the calculations can be found in Table 4 for the cases 1939 and 2023.
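Equations (5)–(7) can be illustrated with the short sketch below. The names `triangle_area`, `MAX_AREA` and `normalised_area` are ours, and the input values are hypothetical. The final rescaling is an assumption: the text only says that the area is suitably normalised to range from 0 to 1, and dividing by the area obtained when all three normalised correlations equal 1 is one choice consistent with that description.

```python
from math import pi, cos, sin, sqrt

def triangle_area(taus_n, shift=10.0):
    """Eqs. (5)-(7): shifted vertices on three axes 120 degrees apart, then the area."""
    pts = [(r * cos(2 * pi * h / 3) + shift, r * sin(2 * pi * h / 3) + shift)
           for h, r in enumerate(taus_n)]
    (x1, y1), (x2, y2), (x3, y3) = pts
    return 0.5 * abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))

# Area of the triangle when all three normalised correlations equal 1
MAX_AREA = 3 * sqrt(3) / 4

def normalised_area(taus_n):
    # ASSUMPTION: the normalisation is taken here as division by MAX_AREA
    return triangle_area(taus_n) / MAX_AREA

example = [0.9, 0.8, 0.7]   # hypothetical normalised correlations
print(triangle_area(example), normalised_area(example))
```

Since the three axes are 120 degrees apart, the area in Eq. (7) also equals \(\frac{\sqrt{3}}{4}(r_1 r_2 + r_2 r_3 + r_3 r_1)\), with \(r_h = \tau ^{(h)}_{b;N}\). The shift is a pure translation and leaves the area unchanged.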

A time series version of the calculated areas can be found in Fig. 3, where there is a clear view of the time-evolution of the considered areas.

We then hypothesise that the area of the triangles represents the (normalised) Kendall correlation targets of the New Rankings with the official ranks. So, considering goals for and against would give a ranking of the teams whose correlation with the official one is the area of the triangle of the related radar chart.

3.4 Finding the new rankings

This section contains the computational strategy devised to discover alternative ranking systems that may more accurately reflect the dynamics of football championships and give the final rankings a performance-centric nature. The goal is to pinpoint permutations of team rankings yielding Kendall \(\tau _b\) correlations that match the geometric areas previously calculated with Eq. (7), suggesting a ranking that better represents team performance throughout the seasons.

3.4.1 Generating permutations and calculating Kendall’s tau correlations

For a given number of teams, \(n\), in the football championship (indicated in Table A1 as “N. Teams”), we systematically generate permutations to simulate the possible positions of the teams in the final ranking, therefore simulating various possible season outcomes. Owing to computational constraints and the factorial growth (\(n!\)) of the number of permutations with \(n\), our exploration is confined to a select subset of permutations. In this way, we can still show here to what extent the official rankings are affected by the partially missed account of “Goals For” and “Goals Against”. Specifically, we run the first 362,880 permutations for the 2023 case. Such a threshold is based on system capabilities and the aspiration to encompass a broad spectrum of potential rankings.

For each permutation, we compute Kendall \(\tau _b\) with respect to the original ranking sequence using Eq. (2). As already said, Kendall’s tau is a measure used to ascertain the ordinal association between two quantities.

For each season, one compares the original sequence of numbers between 1 and n with the permuted sequence of the same set of numbers. To simplify the process, the permuted sequences are generated in lexicographic order. This computation yields a distribution of \(\tau _b\) values formed by the correlations associated with the explored permutations. The \(\tau _b\) associated with a given permutation measures the degree of correlation of such a permutation with the original team ranking. Such a permutation of the original ranking can be viewed as the outcome of a championship where the teams are ranked according to the permuted ranking. To illustrate this statement, we refer to Fig. 4, where 362,880 permutations are evaluated with different values of n. This exemplifies the idea of having championships of n teams whose rankings are shuffled and compared to the original one, which is assumed to be \((1,\dots , n)\).
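The enumeration described above can be sketched as follows. This is an illustrative reading rather than the code used for the paper: the function names are ours, and a small case (\(n=6\), first 24 permutations) replaces the 362,880 permutations explored in the text. Because neither sequence contains ties, \(\tau _b\) reduces to \(1 - 4Q/(n(n-1))\), where Q is the number of discordant pairs (inversions).

```python
from itertools import islice, permutations

def tau_of_permutation(perm):
    """Kendall tau between (1, ..., n) and a tie-free permutation of it."""
    n = len(perm)
    # a discordant pair is an inversion: i < j but perm[i] > perm[j]
    inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
    return 1 - 4 * inversions / (n * (n - 1))

def first_taus(n, limit):
    """Tau values of the first `limit` permutations of 1..n in lexicographic order."""
    # itertools.permutations yields lexicographic order when its input is sorted
    ordered = permutations(range(1, n + 1))
    return [tau_of_permutation(p) for p in islice(ordered, limit)]

taus = first_taus(n=6, limit=24)   # small hypothetical case
print(taus[:6])
```

The first 24 = 4! permutations of six items only rearrange the last four positions, so the correlation stays close to 1. This is the same effect that limits how far the first 9! permutations can move away from the original ranking when n is 20.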

Fig. 4 Variations in Kendall’s \(\tau ^{(j)}_b\) correlation with permutation index (j) for different sample sizes (n): This figure illustrates how the correlation coefficients change as a function of the permutation index, when permutations are in lexicographic order, across various sample sizes. Each subplot represents a different value of n, with red dashed lines marking factorial milestones to highlight significant permutations. Annotations indicate the factorial values of the first integers (up to \(9!=362{,}880\)), providing insights into correlation trends and permutation complexity as n increases

3.4.2 Identifying the optimal permutations

At the heart of our analysis lies the quest for permutations that realise a Kendall correlation aligned with the target \(\tau _b\) values stemming from the geometric correlation analysis. For each season, we may have different \(n\in \{16,18,20,21\}\), as can be grasped from Table A1. The procedure is as follows:

  1. We extract the target \(\tau _b\) value for the year from the preceding geometric analysis, starting from the area of the triangle/radar chart. Each area A is transformed into a target \(\tau _{b}\).

  2. We calculate the absolute difference between the target \(\tau _b\) and each \(\tau ^{(j)}_b\) obtained by comparing the j-th permutation with the original ranking, where j is the index of the permutation.

  3. We isolate permutations whose \(\tau ^{(j)}_b\) values are nearest to the target \(\tau _b\), implementing a tolerance threshold to facilitate a significant comparison. This tolerance is derived from the difference \(\tau ^{(1)}_b - \tau ^{(2)}_b\) between the values of the first two lexicographically ordered permutations, rounded to the third decimal digit, accommodating the inherent variability in the dataset; in this way, it depends on n and hence on the \(n!\) possible permutations.
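One possible reading of the three steps is sketched below. Several parts are assumptions and not confirmed by the text: the area is mapped to a target by inverting Eq. (3), i.e. \(\tau _b = 2A - 1\); the tolerance is taken as \(4/(n(n-1))\) rounded to three decimals (the gap between the first two lexicographic permutations); and, among the permutations within tolerance, only those closest to the target are kept. All function names and the example area are hypothetical.

```python
from itertools import islice, permutations

def tau_of_permutation(perm):
    """Kendall tau between (1, ..., n) and a tie-free permutation of it."""
    n = len(perm)
    inversions = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
    return 1 - 4 * inversions / (n * (n - 1))

def area_to_target_tau(area):
    # ASSUMPTION: a normalised area in [0, 1] is mapped back with the inverse of Eq. (3)
    return 2 * area - 1

def closest_permutations(n, target_tau, limit):
    """Permutations (among the first `limit`) whose tau is nearest to the target."""
    tol = round(4 / (n * (n - 1)), 3)   # assumed tolerance: rounded tau gap
    best, winners = None, []
    for perm in islice(permutations(range(1, n + 1)), limit):
        gap = abs(tau_of_permutation(perm) - target_tau)
        if gap > tol:
            continue                    # outside the tolerance band
        if best is None or gap < best - 1e-12:
            best, winners = gap, [perm] # strictly closer: restart the list
        elif abs(gap - best) <= 1e-12:
            winners.append(perm)        # equally close: same equivalence class
    return winners

target = area_to_target_tau(0.81)       # hypothetical normalised area
new_rankings = closest_permutations(n=5, target_tau=target, limit=120)
print(len(new_rankings), new_rankings[:3])
```

For this toy case with five teams the nearest attainable value is 0.6, reached by every permutation with exactly two inversions, so several candidate rankings are returned rather than a single one.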

This methodology empowers us to single out some permutations (i.e., hypothetical rankings accounting for GA and GF) that most accurately conform to the theoretical ideals elucidated by our prior analysis. These optimal permutations shed light on alternative ranking methodologies that more faithfully mirror team performances and the competitive dynamics across the football season.

3.4.3 Implementation and challenges

The implementation was conducted using Python, with the assistance of libraries such as itertools for permutation generation, scipy for statistical computations, and pandas for data handling. This computational framework facilitated a thorough exploration of ranking permutations against predefined criteria, offering a fresh perspective on the assessment of football championships.

A significant challenge in this process is the identification of all the permutations that match the target Kendall \(\tau _b\) correlations derived from the triangle areas of the geometric analysis, which makes the problem computationally demanding. In fact, to find the best permutations in the case of \(n=21\), one potentially has to explore 21! possibilities, which is evidently complex and expensive.
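The size of the search space can be made explicit with a few lines of arithmetic. The snippet below is only an illustration; it compares the 362,880 permutations explored for the 2023 case with the full number of rankings for the team counts occurring in the dataset.

```python
from math import factorial

explored = factorial(9)   # 362,880 permutations, as tested for the 2023 case

for n in (16, 18, 20, 21):
    total = factorial(n)  # number of tie-free rankings of n teams
    share = explored / total
    print(f"n={n}: {total:,} possible rankings, explored share = {share:.3e}")
```

For \(n=21\) the total is 51,090,942,171,709,440,000, so exhaustive enumeration is out of reach and any practical search must rely on a subset of permutations or on the structure of the problem.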

4 Results and discussion

This section elaborates on the outcomes derived from our methodological framework and discusses their implications within the realm of football analytics, specifically in the context of Serie A. The analysis provides insightful revelations about team performance dynamics and proposes a novel perspective on ranking methodologies.

Our geometric representation of team performances allows a different view of the Serie A history and is illustrated in Fig. 3. It unveils a trend where the last ten seasons exhibit a distinguishable pattern. This observation suggests a shift towards more balanced team strategies, aiming to optimise both offensive and defensive plays (see Okada and Takagi 2008, on the impact of various strategies on GA and GF). The gap between the calculated areas and the ideal scenario (Area = 1) quantifies the extent to which the official rankings might overlook the intricate balance between goals scored (GF) and goals conceded (GA) when accounting for more granular teams’ performance in forming the final ranking.

The transformation of these areas into target Kendall \(\tau _{b}\) coefficients provides a foundation for empirical analysis. As depicted in Fig. 5, the majority of seasons align positively with the official rankings, indicating a generally robust system but not completely accounting for GA and GF. Anomalies identified in the negative range (1943, 1956, 1957) call for a closer examination of those particular seasons and potentially underline the need for a refined ranking mechanism that better captures team performance nuances.

Our exploration into the New Rankings, facilitated by the examination of permutations and Kendall \(\tau _b\) correlations, highlights the potential for alternative standings that deviate significantly from the official rankings. As shown in Fig. 6 for 2023, incorporating goals scored and conceded into the rankings can result in substantial shifts in team positions. This variability underscores the impact of evaluating team performances beyond mere wins, draws, and losses, advocating for a more granular approach to ranking that acknowledges the multifaceted nature of football competitions. The case presented in Fig. 6 is already meaningful even if the number of permutations tested stops at 362,880; to complete the exercise, one should have gone to 20!, since 20 teams were competing. Another way to observe the impact of targeting a level of Kendall \(\tau _b\) that captures a more comprehensive set of features in the ranking is provided by Fig. 7. Such a figure contains the Kendall \(\tau _b\) in the cases of inversion of the first f elements as well as of the last l elements of the rankings. We take n ranging from 10 to 30 to include the cases of interest for the analysed championships. For example, if \(f=5\) and \(n=10\), we have the “reverted” ranking (5, 4, 3, 2, 1, 6, 7, 8, 9, 10) and when \(l=5\) and \(n=10\), we have (1, 2, 3, 4, 5, 10, 9, 8, 7, 6). One can notice that in the case of \(n=16\) teams playing in a championship, the inversion of the ranking for the last \(l=4\) teams means pursuing a target Kendall \(\tau _b\) of 0.9, indicating that with very little variation in the ranking system, teams may or may not face relegation. This illustrates well the impact of such inversions when GF and GA are taken into full consideration.
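The value 0.9 quoted for \(n=16\) and \(l=4\) can be checked directly. Reversing a block of consecutive positions turns every pair inside the block into a discordant pair and leaves all other pairs concordant, so without ties the correlation is one minus twice the number of discordant pairs divided by the total number of pairs. The function name below is ours and the sketch is only illustrative.

```python
from math import comb

def tau_after_block_reversal(n, block):
    """Tau between (1..n) and the same ranking with `block` adjacent positions reversed."""
    # every pair inside the reversed block becomes discordant, all others stay concordant
    discordant = comb(block, 2)
    return 1 - 2 * discordant / comb(n, 2)

# Reversing the first f or the last l positions gives the same value for f = l
print(tau_after_block_reversal(16, 4))   # the n = 16, l = 4 case discussed in the text
print(tau_after_block_reversal(10, 5))   # the n = 10, f = 5 (or l = 5) example
```

With 16 teams there are 120 pairs and reversing four positions makes 6 of them discordant, which gives \(1 - 12/120 = 0.9\).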

By arranging team permutations in lexicographical order, we systematically explore variations from the initial ranking, incrementally adjusting team positions. In Fig. 4, the permutation index j is plotted along the x-axis, and the corresponding Kendall’s tau correlation coefficient, \(\tau ^{(j)}_b\), is plotted along the y-axis. This arrangement reveals a pattern of regular fluctuations in \(\tau ^{(j)}_b\) values, manifesting as periodic cycles across the permutation index. We argue that the number of solutions depends on the target correlation. The extreme cases of \(\tau _b=-1\) or \(\tau _b=1\) are associated with singular solutions to the problem: for \(\tau _b=-1\) the solution is the complete reversal of the ranking, while for \(\tau _b=1\) it is the original series itself. More specifically, \(\tau _b=1\) is the perfect agreement with the official ranking, representing a scenario where the permutation does not alter the original team order, highlighting the unique case where the equivalence class contains only the official ranking itself. If one slightly modifies these two corner cases, the number of solutions that lead to the target case increases as one approaches a target \(\tau _b=0\), by the definition of Kendall correlation.

The cycles presented in Fig. 4 reflect the comprehensive range of permutations explored, with the length of each cycle corresponding to the total number of teams involved; for instance, with 18 teams, the permutation space, and hence the cycle length, expands to 18!. Due to computational constraints, the analysis presented in the figure samples only a fraction of the total permutation space.
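One consequence of sampling in lexicographic order is worth spelling out: if the sample is the initial segment of the enumeration, the first \(m!\) permutations only rearrange the last m positions and leave the top of the ranking untouched. For instance, 362,880 equals 9!, so an initial segment of that length over 20 teams can only reorder the last nine positions. The following Python sketch illustrates the effect on a small hypothetical case (\(n=7\), \(m=4\)); the helper names are assumptions introduced for illustration only.

```python
from itertools import islice, permutations
from math import factorial


def lexicographic_prefix(n, count):
    """First `count` permutations of 1..n in lexicographic order."""
    return list(islice(permutations(range(1, n + 1)), count))


def moved_positions(perms):
    """1-based positions whose occupant differs from the identity in any permutation."""
    return sorted({i + 1 for p in perms for i, v in enumerate(p) if v != i + 1})


n, m = 7, 4
sample = lexicographic_prefix(n, factorial(m))
print(moved_positions(sample))  # [4, 5, 6, 7]
```

Within such a prefix, at most \(m(m-1)/2\) pairs can become discordant, so the lowest correlation that can be reached is \(1 - m(m-1)/(n(n-1))\), assuming no ties.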

This cyclical pattern underscores that multiple permutations yield identical Kendall \(\tau _b\) correlations, indicating that multiple rankings could feasibly represent the data with equivalent statistical validity. Thus, as noted above, a specific \(\tau _b\) value may correspond to a set of rankings rather than a single, unique order. This set forms an equivalence class, each member of which shares the same Kendall \(\tau _b\) correlation with the official ranking. The diversity within these classes illustrates the potential for alternative interpretations of team performance and rankings.
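An equivalence class of this kind can be built explicitly for a small ranking. The sketch below is a hedged illustration with hypothetical function names: it collects every permutation of \(1,\dots ,n\) with exactly d discordant pairs relative to the official order (hence, assuming no ties, the same \(\tau _b = 1 - 2d/\binom{n}{2}\)) and reports, for each position, the smallest and largest rank found across the class. Brute-force enumeration is used only because n is tiny here.

```python
from itertools import permutations


def inversions(perm):
    """Number of discordant pairs between perm and the identity ranking."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])


def equivalence_class(n, d):
    """All permutations of 1..n with exactly d discordant pairs,
    i.e. sharing tau_b = 1 - 2d / C(n, 2) with the identity (no ties assumed)."""
    return [p for p in permutations(range(1, n + 1)) if inversions(p) == d]


def rank_range_by_position(members):
    """(min, max) rank appearing at each position across the class."""
    return [(min(col), max(col)) for col in zip(*members)]


members = equivalence_class(5, 2)
print(len(members))
for position, (low, high) in enumerate(rank_range_by_position(members), start=1):
    print(f"position {position}: ranks {low}..{high}")
```

The per-position ranges play the same role as the whiskers of a box plot over the class: a position whose range collapses to a single value is one that no member of the class alters.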

Fig. 5. Histogram of the resulting \(\tau _b\) obtained from mapping the areas back to the [−1, 1] correlation range along the Italian Serie A history, 1930–2023

Fig. 6. Box plot of the results for the 2023 season. The target correlation is \(\tau _b = 0.842962\) and the optimal correlation is \(\tau ^j_b = 0.842105\), with j taking suitable values in the set \(\{0,\dots ,362880\}\), where 362,880 is the number of permutations tested in this case, with the generated permutations stored in lexicographic order. The whiskers of each box represent, respectively, the minimum and maximum rank obtained in the considered permutations that meet the target correlation. The vertical line that splits each box in two is the median of the ranks obtained for that position (y-axis), and the left and right sides of the box are the 25th and 75th percentiles of the ranks assigned to that ranking position. A single bar, as for positions 1, \(\dots\), 11, means that no change has been recorded. It is worth recalling that only 362,880 permutations are considered here, out of the 20! that should be explored to complete the plot.

Fig. 7. The labels on the x-axis indicate that the Kendall \(\tau _b\) correlation reported in the cells is calculated by comparing the original series 1,2,...,n (y-axis) with the series in which the order of the first (‘f.’) or the last (‘l’) k elements has been inverted. For example, when \(n = 10\), for “f. 2 inv.”, the value 0.96 is obtained by applying Formula (2) to the series (1,2,3,4,5,6,7,8,9,10) and (2,1,3,4,5,6,7,8,9,10).

5 Concluding remarks

This study embarked on a novel exploration of Serie A football championship rankings by introducing a comprehensive methodology that integrates geometric analysis with Kendall’s \(\tau _b\) correlation coefficients. Through this approach, we scrutinised the alignment between the official team rankings and those derived from goals scored and conceded, unveiling the potential for alternative rankings that might offer a deeper insight into team performance dynamics.

Our findings reveal that while the rankings based on performance metrics (here Goals For and Goals Against, GF and GA) broadly align with the official ranking system, there are distinct seasons where alternative ranking methodologies could provide a more nuanced understanding of team capabilities. Introducing a geometric representation to visualise the relationship between different ranking metrics not only enriches our analytical toolkit but also highlights the multifaceted nature of football competitions, where outcomes are influenced by a complex interplay of offensive and defensive strategies mirrored in the GF- and GA-based rankings, here indicated with \(GF_r\) and \(GA_r\), respectively.

Furthermore, identifying equivalence classes among rankings underscores the notion that multiple valid perspectives can exist regarding team performance, challenging the singular narrative often presented by official standings. This observation invites a broader discussion on the criteria and metrics used to assess and compare teams, suggesting that there is room for innovation in ranking methodologies that more accurately reflect the competitive landscape of Serie A football.

5.1 Implications for stakeholders

For stakeholders in the football community, ranging from team management and coaches to analysts and fans, our study offers a fresh lens through which to evaluate team performance. By considering alternative rankings, stakeholders can better understand a team’s strengths and weaknesses, guiding strategic decisions from player development to game tactics. This is not meant to question the official outcome of a season or its winner, but rather to recognise different teams for their outstanding defensive or attacking play, as reflected in GA or GF, while also addressing some of the challenges described in Sziklai et al. (2022).

5.2 Limitations and future research

While our study contributes valuable insights into football ranking systems, it is not without limitations. The computational complexity of analysing all possible permutations of team rankings poses a challenge, necessitating further methodological innovations to explore the full permutation space efficiently. In particular, applying suitably defined heuristics to reduce the cardinality of the set of permutations as n grows is a possible way to keep the problem tractable. This opens the way to more operational research-oriented studies, which are outside the scope of the present paper.
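The scale of the problem can be gauged without enumerating any permutation. As a sketch only, and not a heuristic proposed in this paper, the following Python snippet counts how many rankings of n teams have exactly d discordant pairs with respect to the official order, using the standard recurrence for permutations by number of inversions; the function name `class_sizes` is hypothetical and ties are assumed absent.

```python
from math import comb, factorial


def class_sizes(n):
    """sizes[d] = number of permutations of n items with exactly d inversions."""
    sizes = [1]  # one (empty) permutation, zero inversions
    for m in range(1, n + 1):
        # Inserting the m-th item can add 0..m-1 new inversions.
        new = [0] * (len(sizes) + m - 1)
        for d, count in enumerate(sizes):
            for extra in range(m):
                new[d + extra] += count
        sizes = new
    return sizes


sizes = class_sizes(20)
total_pairs = comb(20, 2)  # 190 pairs for 20 teams
d = 15  # discordant pairs giving tau_b = 1 - 2 * 15 / 190, about 0.842105
print(f"tau_b = {1 - 2 * d / total_pairs:.6f}: {sizes[d]} rankings")
print(sum(sizes) == factorial(20))  # the classes partition all 20! rankings
```

The counts grow quickly as d moves away from 0, which is one way to see why exhaustive exploration becomes impractical and why a reduction of the search space is needed.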

Additionally, our focus on Serie A limits the generalisability of our findings. Future research could extend this methodology to other leagues and sports, examining the universality of our observations across different competitive contexts and combining it with existing methods such as the Sum of Ranking Differences, presented in Sziklai and Héberger (2020).

Moreover, incorporating other performance metrics, such as aggregated player statistics or situational variables (e.g., weather conditions during matches), could enhance the robustness and relevance of alternative ranking systems; the radar charts can indeed accommodate more than three axes.

5.3 Final thoughts

In conclusion, our study highlights the potential for alternative perspectives in evaluating football team performances, inviting a reconsideration of conventional ranking systems. As we continue to navigate the rich and evolving landscape of sports analytics, the pursuit of more sophisticated and representative methodologies for assessing team success remains a compelling and worthwhile endeavour.