
1 Introduction

The Board of Control for Cricket in India (BCCI) launched the Indian Premier League (IPL), a Twenty20 (T20) cricket tournament, in 2008. It is held annually between April and June. As of 2015, the IPL consists of eight teams representing eight cities of India: Chennai Super Kings (CSK), Delhi Daredevils (DD), Kings XI Punjab (KXIP), Kolkata Knight Riders (KKR), Mumbai Indians (MI), Rajasthan Royals (RR), Royal Challengers Bangalore (RCB) and Sunrisers Hyderabad (SRH). All eight teams are owned and managed by franchises. The IPL is the most popular T20 league in the world, and it was the first sporting event to be broadcast on YouTube. Franchises build their teams by selecting cricketers through the IPL auction. The team winning the tournament is awarded prize money of Rs. 150,000,000, the runner-up is awarded Rs. 100,000,000, and the third- and fourth-placed teams receive Rs. 75,000,000 each. The last four teams get no prize money.

2 Objective

We attempt to identify the Most Valuable Player (MVP) among all the participants in an IPL auction. The MVP value is dynamic: after every player selection, the team’s batting and bowling requirements change, and the MVP value changes with them. First, we use a decision tree to classify the players of a particular team into classes A, B, C and D. We also analyze the requirements of the owner and suggest which players would add more value to the team, given the types of players already selected. Further, we analyze the contribution of individual players to a particular team using lift as a correlation measure. Finally, we measure the similarity among players of a team using symmetric and asymmetric binary variables.

3 Literature Survey

P. Kansal, P. Kumar, H. Arya and A. Methaila [1] suggested a method for estimating the base price of a player from his past performance and for predicting his selection. This can help the decision-making authorities set prices for the players. The authors compared Naive Bayes, Multilayer Perceptron and the J48 algorithm, and concluded that Multilayer Perceptron gives the best results.

S. Singh, S. Gupta and V. Gupta [2] proposed a real-time integer programming model for an optimal strategy in the bidding process. Spreadsheets were used to document and calculate the results. They were the preferred choice because extra weightage for a player’s recent performance can easily be incorporated when evaluating the final outcome.

S. Singh [3] uses Data Envelopment Analysis to measure how efficient teams are in the IPL. The author calculates awarded points, total run rate, profit and returns by determining the total expenses, including the wages of players and staff as well as other expenses. The efficiency score is usually directly related to the performance of the players in the league. On decomposing the inefficiencies into technical and scale inefficiency, it is found that the inefficiency is primarily due to an un-optimized scale of production and an un-optimized transformation of the considered data into results.

P. Kalgotra, R. Sharda and G. Chakraborty [4] develop predictive models that help managers select players for a talented team at the lowest possible price. The price is calculated on the basis of each player’s past performance. The authors use SAS Enterprise Miner 7.1 to build the models. The optimal model is selected on the basis of the misclassification rate on validation data. This model supports player selection by feeding into the authors’ bidding equation. The research also helps managers set the salaries of players.

F. Ahmed, K. Deb and A. Jindal [5] use the NSGA-II algorithm to propose a new representation scheme and a multi-objective approach for selecting players within a limited budget, considering batting and bowling strengths along with the team formation. Factors such as fielding further optimize the results. The dataset used to define performance is taken from the IPL–2011 edition. The authors demonstrate the analysis in real-time auction events, selecting players one by one. They argue that the methodology can be applied to other sports, such as soccer.

S.K. Rastogi and S.Y. Deodhar [6] attempt to find the relevant player attributes and their relative valuations. The authors use the bid and offer curve concept, adapted from hedonic price analysis, to establish an econometric relation between the bid amount and player characteristics for IPL (2008).

Sonali B and Shubhasheesh B [7] focus on how teams strategically decide on the final bid amount based on past player and team performance in IPL and formats similar to IPL. The authors also shed light on how personalities of players can affect team performance. They analyze the possible factors based on which bidders decide and build a predictive model for pricing in the auction. The analysis is done individually for all the teams.

P.K. Dey, D.N. Ghosh and A.C. Mondal [8] propose that the contribution of each cricketer to team performance can be quantified and that performance evaluation of cricketers is a vital issue. The study measures the performance of bowlers and computes rankings based on these performances using the AHP-TOPSIS and AHP-COPRAS methods.

J.M. Douglas and N. Tam [9] relate success to the associated batting, bowling and fielding variables. The authors compare the batting and bowling attributes of winning and losing teams by analyzing the magnitude of the differences using Cohen’s effect size. They suggest that the primary indicators of success are losing fewer wickets during the powerplay, having a high ‘runs per over’ score, scoring more runs in the middle 8 overs and maximizing the number of dot balls bowled. They conclude that teams should focus on maximizing 50+ run partnerships, batsmen who hit boundaries, taking wickets and bowling as many dot balls as possible.

H. Saikia and D. Bhattacharjee [10] classify performances of all-rounders into ‘Performer’, ‘Batting-All Rounder’, ‘Bowling-All Rounder’ and ‘Under-performer’. Further, they suggest and consider independent variables that influence an all-rounder’s performance by using Step-wise Multinomial Logistic Regression (SMLR). The independent variables are used to predict the class of an all-rounder player using Naive Bayes Classification concept.

P.K. Dey and D.N. Ghosh [11] propose an ‘AHP-ANOVA-TOPSIS’ methodology that starts by identifying the attributes for consideration. The authors then assign weights to the attributes in order to create a decision matrix. The overall contribution of all the decision attributes is computed, after which the total weight for each attribute is calculated. Finally, after computing overall assessment measures for all alternatives, the alternatives are ranked.

4 Methodology

4.1 MVP Calculation

In this section, we compute the player’s batting points (PBT), the player’s bowling points (PBW) and the player’s experience (PEX). These three quantities depend on the following parameters: batting average, batting strike rate, number of centuries and half-centuries, bowling average, bowling strike rate, economy, number of 4-wicket and 5-wicket hauls, and number of matches played. We define the ‘Most Valuable Player’ (MVP) value as the single parameter that can be used to compare any type of player in the auction. The MVP value depends on the type of player the owner requires. For this, we need the ‘Requirement Points’ (the minimum required in the team) for batting (BARP), bowling (BORP) and experience (ERP). The ‘Total Requirement’ (TRP) is the sum of all requirement points (batting + bowling + experience), i.e.

$$ \text{TRP} = \text{BARP} + \text{BORP} + \text{ERP} $$
$$ \begin{aligned} \mathbf{PBT} = \big( & (\text{Batting Average} \times 0.3) + (\text{Batting Strike Rate} \times 0.4) \\ & + (\operatorname{floor}(\text{Number of Hundreds}) \times 0.1) + (\text{Number of Fifties} \times 0.2) \big) / 10 \end{aligned} $$
(1)
$$ \begin{aligned} & \text{If the bowler has bowled at least 100 balls in his IPL career, then} \\ & \mathbf{PBW} = \big( (300 / \text{Bowling Average}) + (200 / \text{Bowling Strike Rate}) + (300 / \text{Economy}) \\ & \quad + \operatorname{floor}(\text{Number of 4-wicket hauls}) \times 0.1 + \operatorname{floor}(\text{Number of 5-wicket hauls}) \times 0.1 \big) / 10 \end{aligned} $$
(2)
$$ \mathbf{PEX} = \text{Number of Matches Played} / \text{Total Number of Matches in IPL so far} $$
(3)
$$ \text{If } \mathbf{PBW} = 0, \text{ then } \text{MVP} = \frac{(8 \times \text{PBT} \times \text{BARP}) + (\text{PBW} \times \text{BORP}) + (\text{PEX} \times \text{ERP})}{\text{TRP} \times 10} $$
(4)
$$ \text{If } \frac{\mathbf{PBT}}{\mathbf{PBW}} \ge 2, \text{ then } \text{MVP} = \frac{(7 \times \text{PBT} \times \text{BARP}) + (2 \times \text{PBW} \times \text{BORP}) + (\text{PEX} \times \text{ERP})}{\text{TRP} \times 10} $$
(5)
$$ \text{If } \frac{\mathbf{PBW}}{\mathbf{PBT}} \ge 2, \text{ then } \text{MVP} = \frac{(2 \times \text{PBT} \times \text{BARP}) + (7 \times \text{PBW} \times \text{BORP}) + (\text{PEX} \times \text{ERP})}{\text{TRP} \times 10} $$
(6)
$$ \text{Otherwise, } \text{MVP} = \frac{(9 \times \text{PBT} \times \text{BARP}) + (9 \times \text{PBW} \times \text{BORP}) + (2 \times \text{PEX} \times \text{ERP})}{\text{TRP} \times 20} $$
(7)
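The formulas in Eqs. (1)–(7) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. The function and argument names are our own, and two points are assumptions: PBW is returned as 0 for a bowler below the 100-ball threshold (the text does not state what happens in that case), and the ratio tests are written as `pbt >= 2 * pbw` and `pbw >= 2 * pbt` to avoid dividing by zero.

```python
import math


def batting_points(average, strike_rate, hundreds, fifties):
    """PBT, Eq. (1)."""
    return (average * 0.3 + strike_rate * 0.4
            + math.floor(hundreds) * 0.1 + fifties * 0.2) / 10


def bowling_points(average, strike_rate, economy,
                   four_wicket_hauls, five_wicket_hauls, balls_bowled):
    """PBW, Eq. (2). Returning 0 below 100 balls is an assumption."""
    if balls_bowled < 100:
        return 0.0
    return (300 / average + 200 / strike_rate + 300 / economy
            + math.floor(four_wicket_hauls) * 0.1
            + math.floor(five_wicket_hauls) * 0.1) / 10


def experience_points(matches_played, total_ipl_matches):
    """PEX, Eq. (3)."""
    return matches_played / total_ipl_matches


def mvp(pbt, pbw, pex, barp, borp, erp):
    """MVP value, Eqs. (4)-(7), for given requirement points."""
    trp = barp + borp + erp
    if pbw == 0:                      # Eq. (4)
        return (8 * pbt * barp + pbw * borp + pex * erp) / (trp * 10)
    if pbt >= 2 * pbw:                # Eq. (5): PBT / PBW >= 2
        return (7 * pbt * barp + 2 * pbw * borp + pex * erp) / (trp * 10)
    if pbw >= 2 * pbt:                # Eq. (6): PBW / PBT >= 2
        return (2 * pbt * barp + 7 * pbw * borp + pex * erp) / (trp * 10)
    # Eq. (7): neither skill dominates
    return (9 * pbt * barp + 9 * pbw * borp + 2 * pex * erp) / (trp * 20)
```

For example, `mvp(5.86, 0, 3.03, barp=50, borp=40, erp=10)` evaluates Eq. (4) for a pure batsman; the requirement points 50, 40 and 10 are hypothetical values chosen only for this illustration.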

4.2 Decision Tree

A decision tree is a powerful tool for classification and prediction. Every node is associated with rules that classify the data according to the criteria those rules define. It is commonly used for knowledge discovery in data warehouses.

Following are the features of a Decision Tree:

  • There must be a finite number of distinct attributes for classification.

  • Target values of data used for classification should be discrete.

  • There should not be any missing data which are important for classification.

Following are the components of a Decision Tree:

  • Decision Node: A non-leaf node that makes a decision based on the data relevant to the classification.

  • Leaf Node: The final classification container, holding the data after the operations performed at the decision nodes.

  • Path: Represents the outcome of a decision node that is used to classify the data.

In a decision tree, data is classified top-down, starting from the root node and continuing until a leaf node is reached. We have used a decision tree to classify the players by type and class, as shown in Fig. 1. We follow Algorithm 1 to calculate the type and class.

Fig. 1 Classification of players using decision tree
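The top-down traversal described above can be sketched as a small tree of decision nodes and leaf nodes. The sketch below only illustrates the mechanism and does not reproduce Algorithm 1 or Fig. 1. The split rules borrow the PBT/PBW ratio conditions of Eqs. (4)–(7), and the leaf labels are hypothetical names of our own.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    label: str                        # final classification


@dataclass
class DecisionNode:
    test: Callable[[dict], bool]      # rule attached to this node
    if_true: Union["DecisionNode", Leaf]
    if_false: Union["DecisionNode", Leaf]


def classify(node, player):
    """Walk from the root down a path until a leaf is reached."""
    while isinstance(node, DecisionNode):
        node = node.if_true if node.test(player) else node.if_false
    return node.label


# Hypothetical tree: rules and labels are illustrative assumptions.
ROLE_TREE = DecisionNode(
    test=lambda p: p["PBW"] == 0,
    if_true=Leaf("batting only"),
    if_false=DecisionNode(
        test=lambda p: p["PBT"] >= 2 * p["PBW"],
        if_true=Leaf("batting-heavy"),
        if_false=DecisionNode(
            test=lambda p: p["PBW"] >= 2 * p["PBT"],
            if_true=Leaf("bowling-heavy"),
            if_false=Leaf("balanced"),
        ),
    ),
)
```

Calling `classify(ROLE_TREE, {"PBT": 5.86, "PBW": 0})` follows the first branch and stops at the "batting only" leaf.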

4.3 Correlation Analysis Using Lift

Lift is a correlation measure. The occurrence of A is independent of the occurrence of B if P(A ∪ B) = P(A)P(B); otherwise, A and B are dependent and correlated. Here, P(A ∪ B) follows the itemset convention and denotes the probability that A and B occur together. We define lift as follows:

$$ lift\,(A,\,B) = \frac{P(A\, \cup \,B)}{P(A)\,P(B)} $$
(8)

If lift (A,B) < 1, then the occurrence of A is negatively correlated with the occurrence of B. If lift (A,B) > 1, then the occurrence of A is positively correlated with the occurrence of B.

If lift (A,B) = 1, then the occurrence of A is independent of the occurrence of B and there exists no correlation.
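As a minimal sketch of Eq. (8), lift can be computed from two paired lists of 0/1 observations, one entry per match. The function names and the sample data are illustrative assumptions, not taken from the paper's tables.

```python
def lift(a_flags, b_flags):
    """lift(A, B) = P(A and B) / (P(A) * P(B)) over paired 0/1 observations."""
    if len(a_flags) != len(b_flags) or not a_flags:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a_flags)
    p_a = sum(a_flags) / n
    p_b = sum(b_flags) / n
    # Joint probability: both events occur in the same observation.
    p_ab = sum(1 for a, b in zip(a_flags, b_flags) if a and b) / n
    if p_a == 0 or p_b == 0:
        raise ValueError("lift is undefined when P(A) or P(B) is zero")
    return p_ab / (p_a * p_b)


def interpret(value, tol=1e-9):
    """Map a lift value to the three cases described in the text."""
    if abs(value - 1) <= tol:
        return "independent"
    return "positively correlated" if value > 1 else "negatively correlated"


# Hypothetical data: 1 = event occurred in that match, 0 = it did not.
performed = [1, 1, 1, 1, 1, 1, 1, 0]
team_won = [1, 1, 1, 1, 0, 0, 0, 1]
value = lift(performed, team_won)
```

With this sample, P(A) = 7/8, P(B) = 5/8 and the joint probability is 4/8, so `value` is below 1 and `interpret(value)` reports a negative correlation.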

4.4 Computing Similarity Between Players Using Symmetric and Asymmetric Binary Variables

A symmetric binary variable has two states (positive/negative) that are equally valuable and carry the same weight. No preference is assigned to either outcome. The symmetric binary dissimilarity measure indicates the dissimilarity between objects i and j. For the values based on Table 1, we obtain:

$$ d\,(i,\,j) = \frac{b + c}{a + b + c + d} $$
(9)
Table 1 A contingency table for binary variables

A binary variable is asymmetric if the outcomes of its states are not equally important. Given two asymmetric binary variables, the agreement of two 1s (a positive match) is considered more significant than that of two 0s (a negative match). Therefore, such binary variables are often considered “monary”. The dissimilarity based on such variables is called asymmetric binary dissimilarity, where the number of negative matches, d, is considered unimportant and is thus ignored in the computation.

$$ d\,(i,\,j) = \frac{b + c}{a + b + c} $$
(10)

Complementarily, we can measure the distance between two binary variables based on the notion of similarity instead of dissimilarity. For example, the asymmetric binary similarity between the objects i and j, given below, is called the Jaccard coefficient:

$$ similarity\,(i,j) = \frac{a}{a + b + c} = 1 - d\,(i,\,j) $$
(11)
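Eqs. (9)–(11) can be sketched directly from two binary vectors. In this illustration, a, b, c and d count the (1,1), (1,0), (0,1) and (0,0) pairs. Skipping pairs in which either value is missing (`None`) is an assumption of this sketch, and the function names are our own.

```python
def contingency(x, y):
    """Return (a, b, c, d): counts of (1,1), (1,0), (0,1), (0,0) pairs.

    Pairs with a missing value (None) are skipped; this is an assumption.
    """
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi is None or yi is None:
            continue
        if xi and yi:
            a += 1
        elif xi:
            b += 1
        elif yi:
            c += 1
        else:
            d += 1
    return a, b, c, d


def symmetric_dissimilarity(a, b, c, d):
    """Eq. (9): negative matches count as agreement."""
    return (b + c) / (a + b + c + d)


def asymmetric_dissimilarity(a, b, c, d):
    """Eq. (10): negative matches (d) are ignored."""
    return (b + c) / (a + b + c)


def jaccard_similarity(a, b, c, d):
    """Eq. (11): equals 1 - asymmetric dissimilarity."""
    return a / (a + b + c)


# Hypothetical per-match flags for two players (None = did not play).
x = [1, 1, 0, 0, 1, None]
y = [1, 0, 1, 0, 1, 1]
counts = contingency(x, y)
```

The two dissimilarities differ only in whether d appears in the denominator, so the asymmetric value is never smaller than the symmetric one.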

5 Case Study

In this paper, we compare all types of players on the basis of a single computed parameter, the MVP value. The parameter takes into account several attributes that define a player’s performance. In addition to player performance, the MVP value accounts for the current requirements of the team in the form of batting, bowling and experience requirement points. These requirement points are decided by the owners, who need to purchase players in the auction after retaining players from the previous tournament. In other words, these points reflect what the owners expect from the auction. For Mumbai Indians, the MVP values and player types are given in Table 2. For illustration purposes, we have defined the batting, bowling and experience requirements for the IPL teams in Table 3.

Table 2 Mumbai Indians Player details with classification
Table 3 Player requirement table for IPL teams

In Table 4, the MVP values of Gurinder Sandhu for SRH and KXIP are 1.8611 and 2.4413, respectively. The difference arises because the teams have different requirements, in this case for bowling. The same applies to other players and teams. A higher value indicates higher desirability for selection, as shown in Fig. 2. As the auction proceeds and players are bought, the requirement points of the buying team change, and so the MVP values of the remaining players change for that team. The points of the purchased player are deducted from the corresponding requirement points of the team that bought him. To illustrate this change, we take a dataset and show three cases in which a player is bought and the MVP values change. The variation can be seen in Fig. 3 and Fig. 4.

Table 4 List of players participating in IPL auction
Fig. 2 MVP analysis of different players for different teams

Fig. 3 MVP variation of players: Finch and Wiese before and after auction

Fig. 4 MVP variation of KKR before and after auction

  • Suppose Aaron Finch is bought by Mumbai Indians. Aaron Finch (batsman) has PBT = 5.86, PBW = 0 and PEX = 3.03. When he is bought by the Mumbai Indians franchise, these points are deducted from the team’s requirement points, so the owners have comparatively less need for a batsman. All remaining batsmen and batting all-rounders therefore have a comparatively lower MVP value, and bowlers and bowling all-rounders have a comparatively higher one. The MVP values of batsmen such as Mike Hussey, Murali Vijay and Dinesh Kartik decrease for Mumbai Indians, while those of bowlers such as Zaheer Khan, Rahul Sharma and Gurinder Sandhu increase.

  • Suppose David Wiese is bought by Royal Challengers Bangalore. When David Wiese (batting all-rounder) is bought by RCB, his PBT, PBW and PEX are deducted from RCB’s requirement points, so the MVP values of the remaining players change for RCB. The revised values show that RCB has less need for batsmen and batting all-rounders, as their MVP values have decreased. There is also a slight increase in the MVP values of bowlers.

  • Suppose Gurinder Sandhu is bought by Kolkata Knight Riders. When Gurinder Sandhu (bowler) is bought by KKR, his PBT, PBW and PEX are deducted from KKR’s requirement points, and the MVP values of the remaining players change for KKR. The revised values show that the importance of bowlers and bowling all-rounders decreases, while the MVP values of batsmen increase compared with the values before Sandhu was bought.
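The three cases above follow the same update step: deduct the bought player's points from the team's requirement points, then recompute the MVP values of the remaining players. The sketch below illustrates that step under stated assumptions. Finch's PBT, PBW and PEX are taken from the text, while the initial requirement points and the two remaining players (`batsman_x`, `bowler_y`) are hypothetical.

```python
def mvp(pbt, pbw, pex, req):
    """MVP value following the piecewise rules of Eqs. (4)-(7)."""
    barp, borp, erp = req["BARP"], req["BORP"], req["ERP"]
    trp = barp + borp + erp
    if pbw == 0:
        wb, ww, we, scale = 8, 1, 1, 10
    elif pbt >= 2 * pbw:
        wb, ww, we, scale = 7, 2, 1, 10
    elif pbw >= 2 * pbt:
        wb, ww, we, scale = 2, 7, 1, 10
    else:
        wb, ww, we, scale = 9, 9, 2, 20
    return (wb * pbt * barp + ww * pbw * borp + we * pex * erp) / (trp * scale)


def buy(req, player):
    """Return new requirement points after deducting the player's points."""
    return {
        "BARP": req["BARP"] - player["PBT"],
        "BORP": req["BORP"] - player["PBW"],
        "ERP": req["ERP"] - player["PEX"],
    }


requirements = {"BARP": 40.0, "BORP": 40.0, "ERP": 20.0}  # hypothetical
finch = {"PBT": 5.86, "PBW": 0.0, "PEX": 3.03}            # values from the text
pool = {                                                  # hypothetical players
    "batsman_x": {"PBT": 5.0, "PBW": 0.0, "PEX": 2.0},
    "bowler_y": {"PBT": 1.0, "PBW": 5.0, "PEX": 2.0},
}

before = {n: mvp(p["PBT"], p["PBW"], p["PEX"], requirements)
          for n, p in pool.items()}
after_req = buy(requirements, finch)
after = {n: mvp(p["PBT"], p["PBW"], p["PEX"], after_req)
         for n, p in pool.items()}
```

With these inputs, the MVP value of `batsman_x` falls and that of `bowler_y` rises after the purchase, which matches the direction of change described for Mumbai Indians.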

After analyzing the dynamic behavior of MVP, we classify the players of different teams using the decision tree approach. Players are classified into four classes, A, B, C and D, with the help of PBT, PBW and PEX. The same method has to be followed for all teams participating in the IPL. Table 2 shows the classes of 7 Mumbai Indians players. Table 5 depicts how the absolute MVP values change before and after a player is bought.

Table 5 Change in MVP Values

In Table 6, we have given the detailed results of performance of individual players of Mumbai Indians for the IPL–2014 Edition.

Table 6 Resultant table of individual performances of players for Mumbai Indians (IPL-2014)

Mumbai Indians played 15 matches against other teams in the IPL–2014 edition. We use the following representation:

  • 1 : If a player played well

  • 0 : If a player did not play well

  • - : If a player did not play the match

After this representation, we find the support and confidence for the individual players of a team to calculate the contribution of each player. Table 6 shows the support and confidence only for Mumbai Indians. Similarly, the support and confidence have to be found for all players of all the teams participating in the IPL.

To identify the match-winning players of a team, we apply the concept of correlation using lift. Table 7 shows the example of two players, Simmons and R. Sharma, with their corresponding contributions for Mumbai Indians. We have calculated the team result with and without Simmons, and likewise for R. Sharma.

$$ P(\text{Simmons Performed}) = 7/8 = 0.875 $$
$$ P(\text{MI Won}) = 5/8 = 0.625 $$
$$ P(\text{Simmons Performed} \cup \text{MI Won}) = 4/8 = 0.5 $$
$$ \text{Lift}(\text{MI Won}, \text{Simmons Performed}) = 0.5 / (0.875 \times 0.625) = 0.9143 $$
Table 7 Analysis of match winner player for Mumbai Indians (IPL-2014)

Since the value is less than 1, we conclude that Simmons’ performance and a Mumbai Indians win are negatively correlated.

$$ P(\text{R. Sharma Performed}) = 12/15 = 0.8 $$
$$ P(\text{MI Won}) = 7/15 \approx 0.467 $$
$$ P(\text{R. Sharma Performed} \cup \text{MI Won}) = 7/15 \approx 0.467 $$
$$ \text{Lift}(\text{MI Won}, \text{R. Sharma Performed}) = 0.467 / (0.467 \times 0.8) = 1.25 $$

Since the value is greater than 1, we conclude that R. Sharma’s performance and a Mumbai Indians win are positively correlated.

Similarly, we can find out the list of consistent performers, along with the match winner category for all the teams which are participating in Indian Premier League–2015.

Here a, b, c and d count the matches in which the outcomes of the two players are (1,1), (1,0), (0,1) and (0,0), respectively.
$$ \text{Dissimilarity}(\text{Tare}, \text{Rayudu}) = \frac{3 + 6}{3 + 3 + 6 + 3} = 0.6 $$
$$ \text{Similarity}(\text{Tare}, \text{Rayudu}) = 1 - \text{Dissimilarity}(\text{Tare}, \text{Rayudu}) = 0.4 $$
$$ \text{Dissimilarity}(\text{Tare}, \text{Rohit}) = \frac{1 + 7}{5 + 1 + 7 + 2} = 0.533 $$
$$ \text{Similarity}(\text{Tare}, \text{Rohit}) = 1 - \text{Dissimilarity}(\text{Tare}, \text{Rohit}) = 0.467 $$
$$ \text{Dissimilarity}(\text{Rayudu}, \text{Rohit}) = \frac{3 + 6}{6 + 3 + 6 + 0} = 0.6 $$
$$ \text{Similarity}(\text{Rayudu}, \text{Rohit}) = 1 - \text{Dissimilarity}(\text{Rayudu}, \text{Rohit}) = 0.4 $$

This shows that the batting dissimilarity is comparatively high for Tare and Rayudu as well as for Rayudu and Rohit, while the batting similarity is comparatively high for Tare and Rohit.

After finding the match-winning capability of individual players, we find the similarity among different players using symmetric and asymmetric binary variables. Table 8 gives the similarity among Rayudu, Rohit and Tare, along with the similarity among Anderson, Harbhajan and Malinga. In Table 8, we obtain a higher value for Tare and Rohit in batting performance and a higher value for Anderson and Harbhajan in bowling performance. So, most likely, Tare and Rohit are similar in batting, and Anderson and Harbhajan are similar in bowling, for Mumbai Indians.

Table 8 Similarity Among Mumbai Indian Players

6 Conclusion and Future Work

In this paper, we use the MVP value to compare players and track how it changes during an auction, showing the effect of each transaction on a team. The approach aims at building a balanced team for any franchise. To distinguish players during an auction, we use a decision tree to classify them according to the role they can play in a team. We use correlation analysis with lift to identify match-winning players from their performance in the previous tournament. This analysis helps identify players whose performance is a deciding factor in matches. We have also calculated the similarity and dissimilarity between players on the basis of their performances in the past tournament, using symmetric and asymmetric binary variables. This can help match players to the tasks they are given in a team. Using such results, a team can find patterns and predict the outcomes of its efforts. This may give team management a new dimension for achieving better results.