1 Introduction

Football is one of the most watched and followed games in the world. It involves many layers of complexity: tactics, players, playing styles and formations. Football team performance analysis [1] is not a new concept, yet avid watchers are often left wondering about the dream team that would comprise their favorite players in a formation that complements their abilities.

Dream team football is a paradigm of social group functioning and performance analysis. Team sports [2] are an amalgam of individual skill and team cooperation. Individual skill can be crucial in a social group, as it is a central stimulus for the persistence of the group. Human social interactions [3] and group formation take place around people with good individual skills, and people with certain attributes can be beneficial to a dream team. The idea of a dream team can be applied in any domain where the performance of a selected group is of interest. The present paper attempts to model such a team so that its predicted performance is realistic and consistent with the available data.

The first part of the framework concerns individual skill: people with higher individual skills bring higher productivity to a functioning social group. The second part is about compatibility. A sports team works as an integrated system of players, in which each player complements the qualities of the others to achieve a common goal, i.e., to win the match [4]. This notion of compatibility applies to the working of social groups as well. Whatever their individual characteristics, players in team sports need to work in coordination [5, 6] so that the aggregate performance improves. In a team sport like football, where formation transitions, the counter-attacking press and other decisions happen within a fraction of a second, good coordination between players is essential. The coordination between players can be judged on discrete parameters such as club, nationality and passing. The dream team is a framework for estimating how a group of players whose attributes are known will play as a team. This mathematical modeling can be applied to complex dynamic systems [7, 8] to predict their efficiency and feasibility. Social groups, team sports and any other coordinated establishment can be judged on merit using this framework. Compatibility is a vague term, which we quantify by means of graph theory; this ultimately helps in finding the overall strength of the team if the players played together. This paper proposes the idea of a dream team for any field of work: using concepts of graph theory and individual characteristics, we can generate a dream team whose members collaborate more efficiently and more productively.

João Ribeiro et al. [9] proposed a framework that uses social network analysis and graph theory to evaluate team performance. They considered the synergistic interpersonal processes between players in competitive performance environments, rather than discrete events. Using graph theory, they evaluated structural and topological properties of the interpersonal interactions of teammates. The paper highlights the importance of interpersonal relationships for team performance, but it does not address the individual skills and work rate of each player. Pedro Silva et al. [10] proposed that intra-team synchronization is governed by local information, which specifies the shared affordances responsible for synergy formation. Through experimentation, they found that synergies were established and dissolved rapidly as a result of the dynamic creation of informational properties, and that the players became faster at regulating their movements with teammates. However, the paper did not examine asymmetric movements among the players, which can be a deliberate strategy. Filipe Manuel Clemente et al. [11] proposed an approach in which network metrics are used to improve the analysis of the offensive processes of football teams. Using density, heterogeneity and centralization metrics, they showed that it is feasible to identify the connections between players and their strength. Florian Korte et al. [12] studied interplay in football and proposed a playmaker indicator that focuses on real passing sequences rather than averages over a game. It contributes to a more comprehensive understanding of players' contributions, and the framework allows for the integration of other situational variables that are relevant to football performance in addition to play outcome. Filipe Manuel Clemente et al. [13] proposed a pilot study that introduced a set of network methods to quantify the specific properties of football teams. The results reveal that the lateral defenders, central defenders and midfielders are the centroid players of the team, and that the midfielders were consistently the most independent players across all matches analyzed. Thus, midfielders bring dynamism to the game, making them prominent figures on the field. Halil Onal et al. [14] found that individual sports like billiards and archery require higher mathematical thinking, and that, in team sports, football requires the second-highest analytical and problem-solving skills. These outcomes can be extended to support the importance of individual players in team sports in achieving the common goal of winning. However, the experience of the players was not considered in this study. Jason D. Vescovi et al. [15] used correlation analysis to measure the relationship between two variables in team sports. Their study also highlights the importance of agility, speed and fast reactions in sports; these abilities are analyzed using the correlation coefficient to find the degree of their relation. Although correlation analysis finds the degree of relation between two variables, it cannot establish the cause of that relation.

2 Proposed system

Figure 1 shows the block diagram of the proposed system. The system starts with cleaning and preprocessing the dataset for each playing position on the football field. The next step is to select attributes for each position, since different positions on the field require different attributes or characteristics. Every attribute has a relevance that contributes to the overall rating of the player. After the attributes are selected, various vector distances are employed to calculate different player indices for different positions.

Fig. 1 Block diagram of the proposed system

In the next step, the system takes the type of football formation [8]. The scope of this paper is limited to three variations of the 4–3-3 formation (4 defenders, 3 midfielders and 3 attacking forwards) [16]. The system then takes the 11 players together with the positions at which they will play. Once the players' names are given, the system splits into two parts. The first part calculates the MVP (most valuable player) and the team rating, based on the individual ability of every player in their respective position, which ultimately contributes to the overall team rating. The second part calculates team chemistry using concepts of graph theory.

2.1 Dataset description

This section describes the dataset [17] used for the proposed system. The dataset quantifies the properties of every player: each player is given 80+ attributes that measure their football skills, encompassing attacking, defending and goalkeeping skills. There are 18,209 entries in the dataset, i.e., more than 18,000 players with 80+ attributes each, which ultimately help in determining the quality of each player.

2.2 Football terminologies

Football as a game comes with a lot of intricacy, including team formations, player potential and tactics. A few of these terms need to be discussed here. The 4–3-3 football formation uses four defenders (two center-backs and two full-backs) behind a midfield line of three. The most common set-up in midfield is one deeper player, the single pivot, and two slightly more advanced players on either side. It is a high-pressing football formation [18] in which transitions occur as the game advances (Figs. 2 and 3).

Fig. 2 4–3-3 football formation

Fig. 3 (i) 4–3-3 Attacking (ii) 4–3-3 Balanced (iii) 4–3-3 Defensive

The benefit of the 4–3-3 formation is that it creates natural triangles when in possession, which gives the player on the ball several passing options. The team can work the ball forward, with the wingers cutting back to combine with the midfielders and the overlapping full-backs to go for goal. One of the trump cards in this formation is the 'False-Nine', often employed in the ideas of Pep Guardiola [19] and played by Lionel Andres Messi [20]. A False-Nine is a player who links the midfielders and the front attacking line, and who has the freedom to drop from the forward line into midfield to become an extra passing option. A False-Nine is very difficult for defenders to mark, as his position keeps changing. Guardiola used this idea during his early stint as Barcelona manager, where he became the most successful manager in the history of the club [21].

3 Methodology

3.1 Player index

The first step is to give every player a rating, which is eventually used in calculating the overall team rating. The idea behind rating each player is to find out how he would play at different positions. Each position on the field has its own importance, and the requirements of every position are different. Rating a player therefore requires parameters, and both the parameters and their number may differ from position to position. To quantify this, we prescribe

$$ {\text{Rating}}\left( {{\text{player}}} \right) = P\left( {x_{1} , x_{2} , x_{3} , \ldots , x_{n} } \right) $$

where \(x_{1}, x_{2}, \ldots, x_{n}\) are positional parameters derived from the dataset's attributes, such as crossing, passing and tackling. Note that the attributes differ from position to position. For example, tackling is not considered as an attribute for a forward position.

Here, two questions arise: first, how to select the appropriate parameters for every position, and second, how to use the selected parameters to find the player index [22]. The following sections deal with these issues.

3.1.1 Parameter selection

The parameters are selected on the basis of the position at which the player is playing, because different positions require different abilities. Each position has a different role, and to fulfill that role a player should have the qualities suited to that position. To find those attributes, the correlation coefficient (r) [23] is used. The correlation coefficient between two variables, say x and y, is positive if high values of x tend to occur with high values of y, and negative if high values of one variable tend to occur with low values of the other. That is,

$$ r = \frac{{\sum \left( {x_{i} - \overline{x}} \right)\left( {y_{i} - \overline{y}} \right)}}{{\sqrt {\sum (x_{i} - \overline{x})^{2} \sum \left( {y_{i} - \overline{y}} \right)^{2} } }} $$
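As a minimal sketch of the formula above, the correlation coefficient can be computed directly from two attribute columns. The function name `pearson_r` and the attribute values below are illustrative assumptions and are not taken from the dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long attribute columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant column")
    return numerator / denominator

# Hypothetical attribute values for five players (not from the dataset).
finishing = [90, 85, 70, 60, 40]
shot_power = [88, 80, 75, 55, 45]
tackling = [30, 35, 50, 70, 85]

print(pearson_r(finishing, shot_power))  # positive: the attributes rise together
print(pearson_r(finishing, tackling))    # negative: one rises as the other falls
```

In this toy example finishing and shot power move together, while finishing and tackling move in opposite directions, which is the kind of signal the parameter selection relies on.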

Thus, for every position the parameters are selected on the basis of the correlation coefficient: a high positive correlation coefficient between two parameters is taken as a measure of their similarity. The correlation information is then used in the proposed novel approach of the average correlation coefficient (ACC). The dataset provides parameters that can be used to assess a player on every possible level, encompassing goalkeeping, attacking, defending and dribbling skills, among others. Note that each position uses a different set of parameters, depending on the role that position has to play.

Table 1 depicts the number of attributes each position requires to assess a player in that position. The color coding of the correlation coefficient matrix also depicts the similarity between attributes: the greener the cell, the higher the correlation coefficient between the two attributes, whereas a red cell indicates a negative correlation coefficient.

Table 1 Positional parameters and their correlation coefficients

3.1.2 Rating system

After the parameters are selected, a rating system uses them to generate a rating for each player with respect to their position. Each player is represented as a vector in which each component describes a specified attribute of the player. With each attribute converted into a vector component, different methods of finding the magnitude of the vector yield different rating systems.

$$ {\text{Position}}\left( {{\text{player}}} \right) = \left\{ {x_{1} , x_{2} , x_{3} , \ldots , x_{n} } \right\} $$

Here, \({x}_{1}, {x}_{2}, {x}_{3}, \ldots, {x}_{n}\) are the n components of the player vector. Each component in turn is a parameter of that player for a particular position.

This framework uses the following rating systems:

  1. Manhattan distance
  2. Euclidean distance
  3. Mahalanobis distance
  4. Average correlation coefficient (ACC)

3.1.2.1 Manhattan distance

Manhattan distance [24] is calculated as the absolute (non-relative) difference between two vectors; in other words, it is the sum of the absolute differences of their coordinates. For instance, if x = (a, b) and y = (c, d), the Manhattan distance M(x, y) between x and y is |a − c| + |b − d|. The framework uses the Manhattan distance as one of the rating systems.

$$ {\text{Manhattan}}\left( {\left( {x_{1} , x_{2} , x_{3} , \ldots , x_{n} } \right), \left( {y_{1} , y_{2} , y_{3} , \ldots , y_{n} } \right)} \right) = \sum\limits_{i = 1}^{n} \left| {x_{i} - y_{i} } \right| $$

3.1.2.2 Euclidean distance

The Euclidean distance is the distance between two points in Euclidean space, given by the length of the line segment between them. It is the square root of the sum of the squared differences of the coordinates. For example, if x = (a, b) and y = (c, d), the Euclidean distance E(x, y) [25] between x and y is \(\sqrt{(a - c)^{2} + (b - d)^{2}}\).

The framework uses the Euclidean distance as one of the rating systems. The coordinates represent the attributes in n-dimensional space, where every axis represents an attribute and thus a component of the player vector. The Euclidean distance then provides the magnitude of the player vector: the higher the distance, the better the player.

$$ {\text{Euclidean}}\left( {\left( {x_{1} , x_{2} , x_{3} , \ldots , x_{n} } \right), \left( {y_{1} , y_{2} , y_{3} , \ldots , y_{n} } \right)} \right) = \sqrt {\sum\limits_{i = 1}^{n} \left( {x_{i} - y_{i} } \right)^{2} } $$

3.1.2.3 Mahalanobis distance

The Mahalanobis distance [26] measures the distance between two points in multivariate space. In the Euclidean representation, variables are represented by axes drawn at right angles to each other, and the distance between two points in a Euclidean plane can be measured with a ruler. A problem arises when the variables are correlated, because the axes are then no longer perpendicular to each other. Moreover, as the number of dimensions increases, plotting an n-dimensional coordinate system is not possible. The Mahalanobis distance solves this problem: it measures the distance between points for multiple correlated variables. It is also used to find multivariate outliers, i.e., points that are unusual in a combination of two or more variables.

$$ {\text{Mahalanobis}} = \left[ {\left( {x_{B} - x_{A} } \right)^{T} \times C^{ - 1} \times \left( {x_{B} - x_{A} } \right)} \right]^{0.5} $$

Here \(x_{A}\) and \(x_{B}\) are a pair of objects and C is the sample covariance matrix.
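The three distance-based ratings can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the implementation used in the paper: the reference point is taken to be the zero vector (so that each distance acts as a magnitude of the player vector), the covariance matrix uses the n − 1 divisor, and the function names and the two-attribute sample are hypothetical.

```python
import math

def manhattan(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))

def euclidean(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of equal-length rows."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(dim)] for i in range(dim)]

def solve(matrix, vector):
    """Solve matrix * z = vector by Gauss-Jordan elimination with partial pivoting."""
    size = len(vector)
    aug = [list(row) + [v] for row, v in zip(matrix, vector)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(size):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][size] / aug[i][i] for i in range(size)]

def mahalanobis(x_a, x_b, cov):
    """[(x_B - x_A)^T C^{-1} (x_B - x_A)]^0.5, without forming C^{-1} explicitly."""
    diff = [b - a for a, b in zip(x_a, x_b)]
    return math.sqrt(sum(d * z for d, z in zip(diff, solve(cov, diff))))

# Hypothetical two-attribute sample (e.g. passing, ball control); not from the dataset.
sample = [[70, 60], [80, 75], [90, 70], [60, 55]]
cov = covariance_matrix(sample)
origin = [0, 0]      # assumed reference point
player = [85, 72]

print(manhattan(origin, player))
print(euclidean(origin, player))
print(mahalanobis(origin, player, cov))
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance, which is a convenient sanity check for any implementation.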

3.1.2.4 Average correlation coefficient (ACC)

The framework introduces a novel concept, the average correlation coefficient (ACC). The idea behind ACC is simple. The correlation coefficient is a measure of similarity between two attributes: if it is high, the two variables are very similar, and vice versa. The ACC of a parameter is the mean strength of its correlation with the other parameters, so a high ACC signifies that the parameter is highly similar to the other contributing parameters. The ACC can thus be treated as the factor by which each parameter contributes to the overall rating of the player, i.e., the weight of that parameter relative to the other parameters. A high ACC means a highly contributing parameter, and a low ACC means a less contributing parameter.

$$ a_{i} = \frac{{\sum\nolimits_{j = 1}^{i - 1} C_{ji} + \sum\nolimits_{j = i + 1}^{n} C_{ji} }}{n} $$

Here, \(a_{i}\) is the ACC of the ith element in the array of n parameters, and \(C_{ji}\) is the correlation coefficient between the jth and ith elements.

$$ {\text{ACC}}\left( {x_{1} , x_{2} , x_{3} , \ldots , x_{n} } \right) = \sum\limits_{i = 1}^{n} a_{i} x_{i} $$
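The two ACC formulas can be illustrated with a short sketch. It follows the equations as written, including the division by n rather than n − 1; the function names, the 3 × 3 correlation matrix and the attribute values are hypothetical.

```python
def acc_weights(corr):
    """a_i = (sum of C_ji over all j != i) / n, for an n x n correlation matrix."""
    n = len(corr)
    return [sum(corr[j][i] for j in range(n) if j != i) / n for i in range(n)]

def acc_rating(attributes, weights):
    """ACC rating: weighted sum of the player's attributes, sum of a_i * x_i."""
    return sum(a * x for a, x in zip(weights, attributes))

# Hypothetical correlation matrix for three parameters (not from Table 1).
corr = [
    [1.0, 0.8, 0.5],
    [0.8, 1.0, 0.2],
    [0.5, 0.2, 1.0],
]
weights = acc_weights(corr)
player = [90, 80, 70]  # hypothetical attribute values

print(weights)
print(acc_rating(player, weights))
```

The first parameter correlates most strongly with the others, so it receives the largest weight and contributes most to the rating.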

3.2 Relative ranking

The players are rated using multiple rating systems. When it comes to team formation and the overall team rating, however, ratings at different positions cannot be compared directly: mathematically, an m-dimensional quantity cannot be compared with an n-dimensional one. If the raw ratings were used, a position with a larger number of attributes would always contribute more to the overall team rating. To counter this, the concept of ranking is used. Every player has a rating for every possible position, and at each position the player with the higher rating is ranked above the player with the lower one. This rank is ultimately used in calculating the overall team rating.
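The conversion from ratings to ranks within a single position can be sketched as below. The function name, the player labels and the rating values are hypothetical, and ties are broken by input order, which is an assumption since the text does not specify a tie rule.

```python
def rank_players(ratings):
    """Map each player to a rank within one position (1 = highest rating)."""
    ordered = sorted(ratings, key=ratings.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}

# Hypothetical Euclidean ratings of three players at the same position.
striker_ratings = {"Player A": 310.0, "Player B": 295.5, "Player C": 402.1}
print(rank_players(striker_ratings))
```

Because a rank depends only on the ordering within a position, it is unaffected by how many attributes that position uses.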

Table 2 contains 10 sections (the 10 probable positions in a 4–3-3 system), each with 4 distinct types of ranking, namely Manhattan, Euclidean, Mahalanobis and average correlation coefficient (ACC). For every section (position), the top 5 players and their corresponding rankings are shown.

Table 2 Top players in respective positions

3.3 Team formation

As discussed in Sect. 2.2, the framework uses three distinct types of 4–3-3 formation: attacking, defensive and balanced. The attacking 4–3-3 has 2 central midfielders (CM) and a central attacking midfielder (CAM). The defensive 4–3-3 has 2 central midfielders (CM) and one central defensive midfielder (CDM).

On closer inspection, a football team formation is nothing but a graph: each player is a node, and the connection between two players in each other's vicinity is an edge. Once the problem is cast as a graph, each connection between nodes can be given a value by introducing the concept of a weighted graph. In sporting terms, this value can be called compatibility. Compatibility in itself is a vague term, and the question is how to decide whether two players are compatible. Only if the edge value, i.e., the compatibility, can be quantified does the idea of a good understanding between two players become concrete. If the edge value between two nodes is high, the two players have a good understanding between them. Certain criteria are required to assign the edge value between two players; these criteria are discussed in Sect. 3.3.2.

3.3.1 Graph theory induction in 4–3-3 formations

Team sports like football draw on several areas of graph theory. A graph G = (V, E) consists of a non-empty vertex set V(G) and a finite family E(G) of unordered pairs of elements of V(G) called edges, such that an edge {v, w} joins the vertices v and w [25]. Each formation in this framework has a different graph, and the topology depends strongly on the formation. In a defensive 4–3-3 formation [26], the defense remains compact and crowded, which makes it difficult for the opponent to find space between the defensive lines to score a goal. In an attacking 4–3-3 formation [27], the central attacking midfielder plays an important role in carrying out the attack. The CAM is the link between the midfield line and the attacking line, making the team dangerous in counter-attacking play [28] and in build-up play [29] as well. The drawback is that, with the team committed to attack, it leaves spaces at the back that can be easily exploited once the opposition gets the ball. In a balanced 4–3-3 formation, the team can choose to attack or defend depending on the flow of the game. This allows easy transitions and gives more passing options to the player on the ball. There is a clear demarcation between the lines of defense, midfield and attack, with the midfield as the pivot of the formation: it moves the ball from the back of the field to the front, and it becomes congested when possession is lost, making it difficult for the opponent to attack.

Football is a game of coordination and communication. Each position works with the others to get the desired results, and every position has its own importance. Even so, not all edges between nodes (players) carry the same weight, because players in certain positions interact only negligibly with players in certain other positions [30]. For instance, a forward and the goalkeeper rarely interact on the field because of their positions. Similarly, a left-back rarely interacts with a player on the right wing. Since their weights are very small, these edges are insignificant and can be neglected. The scope of this framework therefore covers only the compatibility between two players in each other's vicinity.
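A formation graph restricted to neighbouring players can be represented as an undirected weighted adjacency map. The sketch below is only illustrative: the edge list for the attacking 4–3-3 is a hypothetical reading of the formation (the actual topology is the one drawn in the paper's figure), the position labels LW, ST and RW are assumed names for the front three, and the single edge weight is a placeholder.

```python
# Hypothetical edge list for an attacking 4-3-3; not taken from the paper's figure.
ATTACKING_433_EDGES = [
    ("GK", "LCB"), ("GK", "RCB"),
    ("LB", "LCB"), ("LCB", "RCB"), ("RCB", "RB"),
    ("LB", "LCM"), ("LCB", "LCM"), ("RCB", "RCM"), ("RB", "RCM"),
    ("LCM", "RCM"), ("LCM", "CAM"), ("RCM", "CAM"),
    ("LCM", "LW"), ("RCM", "RW"),
    ("CAM", "LW"), ("CAM", "ST"), ("CAM", "RW"),
    ("LW", "ST"), ("ST", "RW"),
]

def build_graph(edges, weights=None):
    """Return an undirected weighted graph as {node: {neighbour: weight}}."""
    weights = weights or {}
    graph = {}
    for u, v in edges:
        # An edge without a known compatibility value defaults to weight 0.
        w = weights.get((u, v), weights.get((v, u), 0))
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

graph = build_graph(ATTACKING_433_EDGES, {("CAM", "ST"): 8})  # placeholder weight
print(sorted(graph["CAM"]))   # neighbours of the CAM
print("ST" in graph["GK"])    # goalkeeper and striker share no edge
```

Storing each edge in both directions keeps neighbour lookups simple and reflects the two-way nature of on-field interaction.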

An attacking formation should allow the players to keep the ball in the opponent's half as much as possible. Figure 4(i) shows that with a CAM the opportunities to rotate the ball increase in the upper half of the field. The CAM is connected with 5 players in the vicinity, allowing him to pass the ball and keep the attack in progress. This formation clearly offers ample attacking options, but the defense looks stretched. If possession is lost, players have to cover greater distances to defend against any goal-scoring threat posed by the opponent.

Fig. 4 (i) 4–3-3 Attacking (ii) 4–3-3 Balanced (iii) 4–3-3 Defensive

In the balanced formation there is a proper demarcation between the attacking, midfield and defensive lines, with the midfield as a whole acting as the pivot of the formation. The midfielders are the link between the attacking players and the defenders. Depending on the situation, the team can transition between attack and defense, with a wide range of passing possibilities. Figure 4(ii) shows that the midfielders are available for both attack and defense, depending on the needs of the game. This type of formation is very useful in build-up play, as it allows a player to play one-twos with an adjacent player to build the game while the team keeps its shape.

The defensive formation is very compact in its defensive lines. The CDM plays as a pivot, linking the midfield with the defense. The CDM is crucial for stopping counter-attacking opportunities and is often responsible for providing passing options when the opponent presses high. This type of formation works well against teams that like to play a high-pressing game. As seen in Fig. 4(iii), the team is compact at the back with many passing options, but the attack is sparsely populated. This type of formation generally attacks on the counter.

3.3.2 Team chemistry

In team sports, the strength of interaction between teammates can be measured using a weighted graph [30]. This strength of interaction can be called the compatibility between two players. Before compatibility, or chemistry, is defined mathematically, its literal meaning should be clear: compatibility is a state in which two things exist together without conflict. In footballing terms, if two players are compatible with each other, the chances of error and miscommunication during the game are reduced. Compatibility is a very important factor in team sports. Players playing together should know each other well, which ultimately lifts each other's game. For instance, if a player has a lot of skill but cannot communicate with the players in his vicinity, his talents are of no use to the team: he will miss passes and goal-scoring opportunities and lose possession cheaply. It is therefore clear that, beyond individual skills, players should be compatible with each other. The question is which factors shape the compatibility between two players, so that compatibility can be quantified.

Factors affecting compatibility:

  1. Passing
  2. Ball control
  3. Nationality
  4. Club
  5. Experience

The framework has divided compatibility between two players into two parts: A) individual passing and ball receiving quotient and B) communication compatibility.

On-field interaction is a two-way communication, so while calculating compatibility between the nodes (players), it is important to consider attributes of both the nodes (Fig. 5).

Fig. 5 Schematic representation of bidirectional graph representing bidirectional nature of compatibility between players

3.4 Passing and ball control factor (PBC Factor)

Passing is one of the most important parts of the game of football. Passing the ball [31] keeps the game in continuous motion, and it is one of the most frequent ways in which players interact with each other. If the passing percentage of a player is high, most of his passes are completed. Likewise, if the ball control of a player is high, he will handle a received pass well and keep the game moving. There are two types of passing in football: (a) long passing (LP), which is used to pass the ball across the field or to play a lob pass to a player, and (b) short passing (SP), which is used to pass the ball with less power to players in the vicinity, to avoid losing possession and to build up the game. Both are important and play a decisive role in generating the PBC factor. Passing is relative in nature, i.e., it depends on the position at which the player is playing, and the types of passes also vary with position. Defensive players generally play short passes to avoid losing possession, whereas attacking players and midfielders depend on both SP and LP. The framework prescribes the following way to calculate PBC:

$$ {\text{PBC}}\left( {{\text{Defensive}}\;{\text{player}}} \right) = {\text{Ball}}\;{\text{Control}} + {\text{SP}} $$
$$ {\text{PBC}}\left( {{\text{Attacking}}\;{\text{player}}} \right) = 2 \times {\text{Ball}}\;{\text{Control}} + {\text{SP}} + {\text{LP}} $$

Passing and receiving the ball is a two-way process, so to calculate the first half of the compatibility, or edge value, between two nodes (players), the PBC factors of both players should be taken into account. We call this first half of compatibility the C factor. To calculate the C factor between \(P_{1}\) and \(P_{2}\), we prescribe the following formula:

$$ C\left( {P_{1} ,P_{2} } \right) = {\text{PBC}}\left( {P_{1} } \right) + {\text{PBC}}\left( {P_{2} } \right) $$
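The PBC and C factor formulas translate directly into code. In this sketch the function and parameter names are illustrative, the attribute values are hypothetical, and applying the attacking formula to midfielders as well is an assumption based on the statement that attackers and midfielders both rely on SP and LP.

```python
def pbc(ball_control, short_passing, long_passing=0, defensive=False):
    """Passing and ball control factor of one player."""
    if defensive:
        # Defensive players: Ball Control + SP
        return ball_control + short_passing
    # Attacking players (and, by assumption, midfielders): 2 * Ball Control + SP + LP
    return 2 * ball_control + short_passing + long_passing

def c_factor(pbc_1, pbc_2):
    """First half of the edge value: sum of the PBC factors of both players."""
    return pbc_1 + pbc_2

# Hypothetical attribute values (not from the dataset).
defender = pbc(ball_control=70, short_passing=75, defensive=True)
attacker = pbc(ball_control=85, short_passing=80, long_passing=70)

print(defender, attacker, c_factor(defender, attacker))
```

The attacking formula produces larger raw values than the defensive one, which is why the raw C factor is not directly comparable across different position pairs.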

Because passing is relative to position, it is not appropriate to compare the PBC factors of players who play in completely different positions. For instance, a GK-LCB pair will have different PBC factors than an RCM-LCM pair. To counter this problem, a relative PBC factor is used. For each position, the minimum and maximum values of PBC are calculated and used to compute the minimum and maximum C factor for every possible edge in the graph. Cmin and Cmax denote the minimum and maximum C values of two players in their respective positions. The interval [Cmin, Cmax] is partitioned into equally spaced subintervals, and every subinterval is assigned an index computed by an algorithm. Any C factor lying in a subinterval is given the index of that subinterval. Table 3 lists the C factor intervals for every possible pair of positions. Each pair of players can thus be marked from 1 to 5 using these intervals, giving the first part of the compatibility factor.

Table 3 Equally spaced interval between Cmin and Cmax for every probable edge between two nodes
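The mapping from a raw C factor to an index between 1 and 5 can be sketched as equal-width binning. The text does not describe the indexing algorithm in detail, so the linear 1-to-5 assignment, the handling of boundary values and the function name are assumptions; the Cmin and Cmax values in the example are hypothetical and do not come from Table 3.

```python
def c_index(c, c_min, c_max, bins=5):
    """Index (1..bins) of the equal-width subinterval of [c_min, c_max] containing c."""
    if c_max <= c_min:
        raise ValueError("c_max must be greater than c_min")
    if not c_min <= c <= c_max:
        raise ValueError("c must lie within [c_min, c_max]")
    width = (c_max - c_min) / bins
    # A value on an interior boundary falls into the upper subinterval (assumption);
    # c == c_max is clamped into the last subinterval.
    return min(int((c - c_min) / width) + 1, bins)

# Hypothetical bounds for one pair of positions.
print(c_index(150, c_min=100, c_max=200))  # 3
print(c_index(200, c_min=100, c_max=200))  # 5
```

Normalizing by the bounds of each position pair puts every edge on the same 1-to-5 scale, regardless of which PBC formula applies to the two players.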

3.5 Communication quotient (CQ)

In football, maintaining formation, advancing and tracking back are very important, and players need discipline and coordination to do so. Coordination and communication are the keys to a well-drilled game. Based on this idea, a communication quotient (CQ) is generated. CQ depends on two factors: nationality and club. If two players have the same nationality, their CQ is higher. If two players are at the same club, their CQ depends on their experience of playing together: players with more experience of playing together have a higher CQ than players with less. Using the CQ and the C factor, the framework rates each edge out of 10. The rating of every edge represents the synergy between the two players. Good synergy implies fluidity in the game, i.e., easy passing and receiving between the players, a good understanding of game situations and, ultimately, a good understanding of each other's game.
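One possible way to combine the two parts into an edge rating out of 10 is sketched below. The text gives no numeric rules for CQ, so every point value here is a placeholder chosen only so that a C index of 1 to 5 plus a CQ of 0 to 5 stays on a 10-point scale; the function names, the dictionary keys and the example players are likewise hypothetical.

```python
def communication_quotient(p1, p2, years_together=0):
    """Hypothetical 0-5 score; the point values are placeholders, not the paper's."""
    score = 0
    if p1["nationality"] == p2["nationality"]:
        score += 2
    if p1["club"] == p2["club"]:
        # 1 point for sharing a club, plus up to 2 points for years played together.
        score += 1 + min(years_together, 4) / 2
    return score

def edge_rating(c_index, cq):
    """Edge value out of 10: C index (1-5) plus communication quotient (0-5)."""
    return c_index + cq

# Hypothetical players; names of clubs and countries are placeholders.
a = {"nationality": "Country X", "club": "Club 1"}
b = {"nationality": "Country X", "club": "Club 1"}
c = {"nationality": "Country Y", "club": "Club 2"}

print(edge_rating(5, communication_quotient(a, b, years_together=4)))
print(edge_rating(3, communication_quotient(a, c)))
```

Under these placeholder rules, two long-standing club teammates of the same nationality with the best C index reach the maximum edge rating, while strangers rely on the C index alone.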

4 Results and discussion

Table 4 is used to check the credibility of the proposed framework by comparing data from real matches with the results of the framework. The first and second columns give the teams that played the match and the date on which the match was played, respectively. The third column gives the team that won the match. When the matches were simulated using the framework, the outputs were the team compatibility and the team ratings, and the last two columns give the team with the better compatibility and the team with the better team rating. In 11 out of 14 matches, the team with the better compatibility and team rating (as predicted by the model) actually won the game.

Table 4 Real-life match results vs framework results
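Expressed as a proportion, the agreement between the framework and the real results reported above is

$$ \frac{11}{14} \approx 0.786 $$

that is, the framework's preferred team won in roughly 79% of the fourteen matches considered.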

With the credibility of the framework tested, it can now be applied to players from different clubs and countries, to infer how the team dynamics will look once these players play together. In Table 5, ten hypothetical dream teams are simulated using the proposed framework, each with a different formation and different players playing at different positions.

Table 5 Team lineups

Table 5 shows the lineups of ten random teams and the formations in which they play. Table 6 gives the team ratings discussed earlier and the team compatibility of the simulated teams from Table 5. Table 7 shows the most important players for every team: the top 3 most valuable players (MVP) are given for each.

Table 6 Team statistics
Table 7 Most valuable players (MVP)

Table 8 presents the visualization for all teams: it shows the team formation and a graph of player rating versus position for every team. This is an attempt to show how these teams would behave if they played in the future under the given circumstances, with similar properties attached to them.

Table 8 Team formation and team ratings

5 Conclusion

This manuscript advocates the importance of individual players in the process of team formation. Using the prescribed framework, the performance of a hypothetical topological structure can be analyzed, and its key players and their compatibility with each other can be evaluated. The framework gives the freedom to try different structures to find the optimum result with the given entities. The suggested work can also be used to check whether a given player would contribute to the betterment of the team, and helps a team decide whether to select a specific player in a specific position against a specific opponent. Such analysis leads to improved team selection and allows the management to try various formations to increase the efficiency of the team. When the framework was applied to real matches, in eleven out of fourteen the team with the better compatibility and team rating (as predicted by the model) actually won the game. The framework can be used to build a more efficient team and can be deployed by major football clubs to improve the efficiency of their starting 11 against different opponents. The methodology helps in improving team dynamics and lets the management try a few variations. However, the framework also has some limitations. It does not account for the influence of the manager on the field or on the inter-player synergies. It also does not consider the league in which a player plays, which can ultimately affect inter-player synergy. In addition, other formations are overlooked: only the 4–3-3 and its variations have been considered. The model and approach adopted in this paper can be applied to any field where selecting a group from the given data is important, and can be refined further by adding more complexity.