Constraint Programming and Machine Learning for Interactive Soccer Analysis

Duque, Robinson; Díaz, Juan Francisco; Arbelaez, Alejandro

doi:10.1007/978-3-319-50349-3_18

Robinson Duque¹⁶,
Juan Francisco Díaz¹⁶ &
Alejandro Arbelaez¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10079))

Included in the following conference series:

International Conference on Learning and Intelligent Optimization

785 Accesses
1 Citations

Abstract

A soccer competition consists of n teams playing against each other in a league or tournament system, according to a single or double round-robin schedule. These competitions offer an excellent opportunity to model interesting problems related to questions that soccer fans frequently ask about their favourite teams. For instance, at some stage of the competition, fans might be interested in determining whether a given team still has chances of winning the competition (i.e., finishing first in a league or being within the first k teams in a tournament to qualify to the playoff). This problem relates to the elimination problem, which is NP-complete for the actual FIFA pointing rule system (0, 1, 3), zero point to a loss, one point to a tie, and three points to a win. In this paper, we combine constraint programming with machine learning to model a general soccer scenario in a real-time application.

Access provided by Autonomous University of Puebla. Download conference paper PDF

Towards a Principled Solution to Simulated Robot Soccer

A comparison of penalty shootout designs in soccer

Article Open access 20 April 2020

Disruptive Innovations in RoboCup 2D Soccer Simulation League: From Cyberoos’98 to Gliders2016

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Soccer fans usually have questions related to their favourite teams and most of the time they are subject to media speculations that are sometimes proved wrong. Many domestic leagues use a two-stage tournament structure with a single or double round-robin tournament for the regular season and a final knockout stage (aka playoffs). The first stage of the league typically features between 16 and 30 teams, each team faces each other team once or twice with home and away matches distributed evenly in the regular season. Depending on the results of the matches, every team is awarded some points under the FIFA three-point-rule (three points for a victory, one point for a draw, and zero points for a defeat), and the top k teams (typically eight) qualify for the playoffs.

The elimination problem is well-known in sports competitions, particularly from baseball [1, 2] and consists in determining whether at some stage of the competition a given team still has the opportunity to be within the top teams to qualify for playoffs. The complexity of the problem depends on the score system for the results of the matches. In [3, 4] the authors showed that the elimination problem is NP-complete for the current FiFA score system (0, 1, 3). However, interestingly [4] pointed out that with the old FIFA score system (0, 1, 2) from the 90’s, the elimination problem could be solved in polynomial time using a network flow algorithms as first proposed by [5]. In this paper we attempt to present a general model to simulate scenarios and problems where fans can formulate queries about the positions of the teams at the end of a tournament, e.g., Will R. Madrid be in a better position than 3. To this end, we propose a combination of constraint programming (CP) with machine learning (ML) to answer soccer related queries.

2 CP Model for Soccer Queries

CP is a powerful technique to solve combinatorial problems which combines backtracking with constraint propagation. At each step a value is assigned to some variable. Each assignment is combined with a look-ahead process called constraint propagation which can reduce the domains of the remaining variables. In the following, we describe a CP formulation for soccer competitions, we start by offering a list of variables and notations for a basic soccer model.

n: number of teams in the competition;
T: set of team indexes in the competition;
i, j: team indexes, such that ($i, j \in T$);
$p_{i}$: initial points of team i. If i has not played any games, then $p_i=0$;
F: number of fixtures left to be played in the competition. A fixture consists of one or more games between competitors;
k: represents a fixture number, ($1 \le k\le F$);
G: set that represents the schedule of the remaining games to be played. Every game is represented as a triple $ng_{e}=(i, j, k) \wedge 0 \le e \le |G|$, where k is the fixture when both teams (i and j) meet in a game;
$pt_{ik}$: represents the points that team i gets in fixture k, ($1 \le k \le F \text { and } pt_{ik} \in \{0, 1, 3\})$. If team i is not scheduled to play fixture k, then $pt_{i,k}=0$.
$tp_{i}$: total points of team i at end of the competition;
$geq_{ij}$: boolean variable indicating if team j has greater or equal total points as i: if $tp_j \ge tp_i$ then $geq_{ij}=1$; otherwise $geq_{ij}=0, \quad (\forall i , j \in T) $;
$eq_{ij}$: boolean variable indicating if two different teams i and j tie in points at the end of the competition: if $tp_j = tp_i$ and $i \ne j$ then $eq_{ij}=1$; otherwise $eq_{ij}=0, \quad (\forall i , j \in T)$.
$pos_{i}$: position of team i at the end of the competition;
$worstPos_{i}$: upper bound for $pos_{i}$;
$bestPos_{i}$: lower bound for $pos_{i}$;

Position in Ranking Queries: we use this set of variables to represent queries about positions of the teams at the end of the competition (e.g., R. Madrid will be in position 3).

P: set of possible position in ranking queries, defined as a set of triples $np_{b}=(i, opr_i, ptn_i)$ and $0 \le b \le |P|$;
$opr_i$: logical operator ($opr_i \in \{ <, \le , >, \ge , = \}$) to constrain team i;
$ptn_i$: denoting the expected position for team i; $1 \le ptn_i \le n $;

2.1 CP Model Formulation

Basic Soccer Model: Constraints (1), (2), and (3) represent a valid game point assignment (0,3), (3,0) or (1,1) for each game $ng_{e} \in G$ between two teams i and j in a fixture k:

$$\begin{aligned} (0 \le pt_{ik} \le 3) \wedge (0 \le pt_{jk} \le 3 ) \quad \forall ng_{e} \in G \wedge ng_{e}=(i,j,k) \end{aligned}$$

(1)

$$\begin{aligned} (pt_{ik} \ne 2) \wedge (pt_{jk} \ne 2)\quad \forall ng_{e} \in G \wedge ng_{e}=(i,j,k) \end{aligned}$$

(2)

$$\begin{aligned} 2 \le pt_{ik}+pt_{jk} \le 3 \quad \forall ng_{e} \in G \wedge ng_{e}=(i,j,k) \end{aligned}$$

(3)

Constraint (4) corresponds to the final points $tp_{i}$ of a team i. It is the addition of the initial points $p_i$ and the points $pt_{ik}$ obtained in every fixture k:

$$\begin{aligned} tp_i = p_i + \displaystyle \sum _{k=1}^{F} pt_{ik}\quad \forall i\in T \end{aligned}$$

(4)

Constraints (5) to (8) are used to calculate final positions. All the final positions must be different and every position is bounded by $bestPos_i$ and $worstPos_i$:

$$\begin{aligned} {\begin{matrix} geq_{ij}= {\left\{ \begin{array}{ll} 1,&{} \text {if } tp_j \ge tp_i\\ 0,&{} otherwise\\ \end{array}\right. }\quad \forall i,j \in T \\ worstPos_{i}=\sum _{j=1}^{n} geq_{ij} \quad \forall i,j \in T \end{matrix}} \end{aligned}$$

(5)

$$\begin{aligned} {\begin{matrix} eq_{ij}= {\left\{ \begin{array}{ll} 1,&{} \text {if } tp_j = tp_i \text { and } i \ne j\\ 0,&{} otherwise\\ \end{array}\right. }\quad \forall i,j \in T \\ bestPos_{i}= worstPos_{i} - \sum _{j=1, j\ne i}^{n} eq_{ij}\quad \forall i,j \in T \wedge i\ne j \end{matrix}} \end{aligned}$$

(6)

$$\begin{aligned} bestPos_{i} \le pos_{i}\le worstPos_{i} \quad \forall i \in T \end{aligned}$$

(7)

$$\begin{aligned} alldifferent(pos_{1},\ldots ,pos_{n}) \end{aligned}$$

(8)

Position in Ranking Queries: involves a set of constrained teams and indicates whether a given team can be above, below, or at a given position $ptn_i$, constraint (9) depicts the five possibilities:

$$\begin{aligned} \forall np_{b} \in P \wedge np_{b}=(i, opr_i, ptn_i) {\left\{ \begin{array}{ll} pos_{i} = ptn_i, &{} \text {if } opr_i \text { is } = \\ pos_{i}< ptn_i , &{} \text {if } opr_i \text { is } <\\ pos_{i} \le ptn_i, &{} \text {if } opr_i \text { is } \le \\ pos_{i}> ptn_i, &{} \text {if } opr_i \text { is } >\\ pos_{i} \ge ptn_i, &{} \text {if } opr_i \text { is } \ge \\ \end{array}\right. } \end{aligned}$$

(9)

2.2 Variable/Value Selection

Generic heuristics (e.g., [6, 7]) typically do not perform well for real-life problems as these heuristics do not exploit the structure of the problem. Therefore, in this paper we propose some query based heuristics for variable/value selection. First we introduce a set of required variables in order to describe a priority mechanism to select the team variables constrained in queries P:

$spos_{i}$: starting position of team i before any branching strategy is applied;
$pri_i$: denoting the priority of team i to be selected during branching, If team i does not appear in any query, then $pri_i=0$;
$str_i$: denoting the global branching strategy for the variables $pt_{ik}$ of a particular team i in every fixture k. $str_i$ starts with “tie” as a default value.

Heuristics for Position in Ranking Queries $\mathbf{(}{{\varvec{P}}}{} \mathbf{).}$ Recall from (9) that we use the position $pos_i$ to constrain a team to a wanted position $ptn_i$. Suppose we have the query $(pos_i < 8)$. It’s natural to try $str_i=win$ and assign a priority using the position $ptn_i$ from the query. We depict in (10) some general rules for variable value selection:

(10)

Interestingly the defined heuristics in (10) for queries with the “=” operator seem to fail quite often (see Sect. 3). Suppose a scenario with a query $(pos_i = 7)$ where $spos_i=9$ with $F=8$ fixtures to play. Given that the starting position is 9 and we have to reach position 7, the global branching strategy $str_{i} =win$ causes that $pos_i$ overshoots position 7 and would require many backtracks of the search algorithm in order to reach such position, therefore, it might be useful to perform a bias search and in the following section we tackle this problem by using machine learning.

Machine Learning for Value Selection. For teams constrained with the “=” operator, we decided to assign a high priority ($pri_{i}=|spos_i - ptn_i| \cdot n$) for variable selection and to avoid position overshooting, we trained a classifier that selects among 9 branching strategies: S1 = [1,0,0], S2 = [0,1,0], S3 = [0,0,1], S4 = [0.5,0.5,0], S5 = [0.5,0,0.5], S6 = [0,0.5,0.5], S7 = [0.5,0.25,0.25], S8 = [0.25,0.5,0.25], S9 = [0.25,0.25,0.5]. Each strategy defines probabilities to select among [win, tie, lose] respectively, e.g., S7 means that for a team i, every variable $pt_{ik}$ will be assigned win with a probability of 0.5, tie and lose with a probability of 0.25 each. We use the selected strategy with a restart-based search; therefore we restart the algorithm when some cutoff in the execution time is met. (3 s in this paper). Notice that we excluded a strategy [1/3, 1/3/, 1/3] as preliminarily tests showed a poor performance for this alternative.

Training the Classifier: In order to train this classifier, we created a total of 500 P queries with the equality operator at different stages of a tournament (fixture 7, 9, 11, 14 and 16) with 18 teams, scheduled in a single round robin. We ran every query with each of the 9 branching strategies in order to get the strategy that solved the instance in the shortest time and created a data set with the following features: starting position ($spos_i$), wanted position ($ptn_i$), direction and distance ($spos_i - ptn_i$), fixtures to play (F), range rate ($|spos_i - ptn_i|/n$), best executing strategy $\in \{S1, S2, S3, S4, S5, S6, S7, S8, S9\}$.

In this paper we use J48 (the Weka v3.6.12 implementation of C4.5) to evaluate the performance of the algorithms. The objective is that for each query P with the equality operator “=”, J48 assigns one of the nine branching strategies to the constrained team.

3 Empirical Evaluation

Tests Configuration. We evaluated our models using Mozart-Oz (V 1.4.0) as our CP reference solver. All the experiments were performed in a 4-core machine, featuring an Intel Core i5 processor at 2.3 Ghz and 4 GB of RAM. We focus our attention in the Colombian league (liga Postobón 2014-I) with 18 teams and 18 fixtures to play in a single round-robin schedule (17 fixtures + 1 extra fixture for the derbies). We provided five experimental scenarios (i.e., fixtures 7, 9, 11, 14, and 16). We also created a series of instances for each fixture (100 with 2 suppositions, 100 with 3 suppositions, and the same for 4, 5, 7, and 9 suppositions). Each instance (3000 in total) was executed with a time limit of 30 s. We recall that our models are implemented in SABIO, a Web based application where long answer times are not desirable. We experimented two scenarios: the basic CP implementation using the heuristics for position in ranking queries and the CP-ML implementation configured with 10 restarts (i.e., 3 s per restart), featuring the basic heuristics and the machine learning classifier for equality constraints.

Tests Results. Table 1 shows the number of unsolved instances and the average runtimes of the solved ones in our experiments. We observe 1069 unsolved instances with the CP model and we attribute this to 2 main reasons: first, the position bounds (i.e., $bestPos_i$ and $worstPos_i$) can only be computed after finding the total points ($tp_i$) for all the teams in the competition. As a result, position in ranking constraints standing are validated only when the search algorithm performs a complete game points assignment for all teams. Second, we observed that our variable/value selection heuristics struggle with queries related to the “=” operator and the lack of a biased search causes position overshooting. We also observed that our CP-ML implementation seems to perform better and the classifier improves the effectiveness of the algorithm by reducing the number of unsolved instances from 1069 to 627 while displaying a small trade-off in running time.

Table 1. Unsolved instances and average running times of CP and CP-ML

Full size table

4 Conclusions

In this paper we have combined the use of constraint programming and machine learning to solve general soccer fan queries at different stages of a competition and presented 2 alternative solutions CP and CP-ML. Our computational experiments showed that our CP-ML model improves CP effectiveness since it performs a query biased search. We also plan to extend our models to deal with more queries such as determining the maximum number of games that a team can afford to lose and still qualify to the playoffs and also explore the implementation of a MIP model, based on the work of Ribeiro and Urrutia [8].

We would like to thank Luis Felipe Vargas, María Andrea Cruz and Carlos Martínez for developing early versions of the CP model under the supervision of Juan Francisco Díaz. Robinson Duque is supported by Colciencias under the PhD scholarship program. Alejandro Arbelaez is supported by the DISCUS project (FP7 Grant Agreement 318137), and Science Foundation Ireland (SFI) Grant No. 10/CE/I1853.

References

Schwartz, B.L.: Possible winners in partially completed tournaments. SIAM Rev. 8(3), 302–308 (1966)
Article MATH Google Scholar
Hoffman, A., Rivlin, T.: When is a team mathematically eliminated? In: Princeton Symposium on Mathematical Programming, pp. 391–401. Princeton, NJ (1967)
Google Scholar
Kern, W., Paulusma, D.: The new fifa rules are hard: complexity aspects of sports competitions. Discrete Appl. Math. 108(3), 317–323 (2001)
Article MathSciNet MATH Google Scholar
Bernholt, T., Gülich, A., Hofmeister, T., Schmitt, N.: Football elimination is hard to decide under the 3-point-rule. In: Kutyłowski, M., Pacholski, L., Wierzbicki, T. (eds.) MFCS 1999. LNCS, vol. 1672, pp. 410–418. Springer, Heidelberg (1999). doi:10.1007/3-540-48340-3_37
Chapter Google Scholar
Wayne, K.D.: A new property and a faster algorithm for baseball elimination. SIAM J. Discrete Math. 14(2), 223–229 (2001)
Article MathSciNet MATH Google Scholar
Arbelaez, A., Hamadi, Y.: Exploiting weak dependencies in tree-based search. In: SAC 2009, pp. 1385–1391 (2009)
Google Scholar
Haralick, R.M., Elliott, G.L.: Increasing tree search efficiency for constraint satisfaction problems. In: IJCAI 1979, San Francisco, CA, USA, pp. 356–364 (1979)
Google Scholar
Ribeiro, C.C., Urrutia, S.: An application of integer programming to playoff elimination in football championships. ITOR 12(4), 375–386 (2005)
MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

Universidad del Valle, Cali, Colombia
Robinson Duque & Juan Francisco Díaz
Insight Centre for Data Analytics, University College Cork, Cork, Ireland
Alejandro Arbelaez

Authors

Robinson Duque
View author publications
You can also search for this author in PubMed Google Scholar
Juan Francisco Díaz
View author publications
You can also search for this author in PubMed Google Scholar
Alejandro Arbelaez
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Robinson Duque .

Editor information

Editors and Affiliations

Univ of Naples Federico II, Naples, Italy
Paola Festa
Thomas J. Wastson Research Center, Yorktown Heights, New York, USA
Meinolf Sellmann
Eindhoven University of Technology, Eindhoven, The Netherlands
Joaquin Vanschoren

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Duque, R., Díaz, J.F., Arbelaez, A. (2016). Constraint Programming and Machine Learning for Interactive Soccer Analysis. In: Festa, P., Sellmann, M., Vanschoren, J. (eds) Learning and Intelligent Optimization. LION 2016. Lecture Notes in Computer Science(), vol 10079. Springer, Cham. https://doi.org/10.1007/978-3-319-50349-3_18

Download citation

DOI: https://doi.org/10.1007/978-3-319-50349-3_18
Published: 01 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50348-6
Online ISBN: 978-3-319-50349-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Constraint Programming and Machine Learning for Interactive Soccer Analysis

Abstract

Similar content being viewed by others

Towards a Principled Solution to Simulated Robot Soccer

A comparison of penalty shootout designs in soccer

Disruptive Innovations in RoboCup 2D Soccer Simulation League: From Cyberoos’98 to Gliders2016

Keywords

1 Introduction

2 CP Model for Soccer Queries

2.1 CP Model Formulation

2.2 Variable/Value Selection

3 Empirical Evaluation

4 Conclusions

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Constraint Programming and Machine Learning for Interactive Soccer Analysis

Abstract

Similar content being viewed by others

Towards a Principled Solution to Simulated Robot Soccer

A comparison of penalty shootout designs in soccer

Disruptive Innovations in RoboCup 2D Soccer Simulation League: From Cyberoos’98 to Gliders2016

Keywords

1 Introduction

2 CP Model for Soccer Queries

2.1 CP Model Formulation

2.2 Variable/Value Selection

3 Empirical Evaluation

4 Conclusions

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation