Prologue and Introduction

This paper was written in the context of the author’s doctoral thesis (Kotzbek 2017), which examined the practicability of GIS for football game analyses. So far, a brief overview of the project’s main issues and aims has been given (Kotzbek and Kainz 2014, 2015a), the data provided by ProzoneSports has been reviewed and classified (Kotzbek and Kainz 2015b), and an approach towards the automated GIS-based analysis of scoring attempt patterns has been presented (Kotzbek and Kainz 2016). In general, analysing football games demands the zoning of the pitch if the game’s spatiotemporal components are to be taken into account appropriately. For this purpose, static analysis zones such as the common vertical trisection of the pitch are utilised. However, the pitch is also spatially segmented by default according to FIFA’s Laws of the Game (Fédération Internationale de Football Association 2016). In contrast to fictional analysis zones, these predefined pitch zones are marked and hence visible. Moreover, they can also be applied for a wide range of spatial analyses.

As space is crucial in football, knowledge about where something happened during the game is of great importance, because only then can the question of why it happened be answered. Furthermore, if the causes of certain circumstances are known, one might be able to control or influence comparable situations in the future. To this end, it is appropriate to classify game-relevant actions spatially and to observe the ever-changing areas occupied by the teams. In this context, the pitch’s default segments may suffice for some purposes, but their usefulness is clearly limited. Therefore, analytical zonings are often applied and selected depending on the particular issue.

A review of the literature addressing this topic is unsatisfactory. The vast majority of contributions focus on other aspects and cover the utilised zoning alternatives only marginally. Moreover, almost all of these zonings are static and hence lack flexibility. In this regard, Fig. 1 illustrates six different zoning options that have been applied in previous studies. Option b is of particular interest as it is the only model whose zones are based on tactical position roles (Di Salvo et al. 2007), whereas the others, except for option c (Lucey et al. 2012), are aligned along the predefined pitch lines. Notwithstanding this, the zonings’ fields of application vary. For example, Clayton (2011) applied option f in addition to option d for analysing attacks within the foremost attacking zone. Option a is a variation of option d which divides the central pitch area more consistently. This approach was utilised by Pollard and Reep (1997) to evaluate the effectiveness of the teams’ strategy. Furthermore, option e was employed by Cotta et al. (2013) in order to analyse passing networks. In contrast to these zonings, Lucey et al. (2012) segmented the pitch into equally large rectangles irrespective of its predefined marking lines in order to analyse passing sequences.

Fig. 1

Common zoning alternatives. Source Pollard and Reep (1997) (a), Di Salvo et al. (2007) (b), Lucey et al. (2012) (c), Clayton (2011) (d), Cotta et al. (2013) (e), Clayton (2011) (f)

Whether a zoning alternative is practicable or not depends on the purpose of its application. Hence, an evaluation of different zoning alternatives is not intended in this study. Instead, this article’s main objective is to highlight the benefits of dynamic zoning for football game analyses. Since football is not static, it is worthwhile to consider dynamic alternatives. In this context, evidence is provided by Vilar et al. (2013), who demonstrated an approach towards the dynamic segmentation of the teams’ occupied areas, defined as convex hulls around the outfield players. Based on this, numerical dominance was analysed.

In contrast to this local zoning, a global approach encompassing the entire pitch is presented in this paper. In practical terms, a custom tool for ArcGIS for Desktop 10.x named Dynamic Zoning was developed. It is based on Python and automatically trisects the pitch’s area based upon the players’ changing positions. The tool’s outcome consists of three zones for each team, which together cover the whole pitch and represent the areas of the team’s defenders, midfielders and forwards at a given moment. Although this tool’s functionality covers only the basic conception of dynamic zoning, it can be considered the basis for manifold analysis opportunities. In order to provide an example of the conception’s practical application, another ArcGIS custom tool for the determination of the teams’ offensive qualities is also described. Moreover, the conception’s practicability and the tools’ applicability for game analyses were evaluated by professional game analysts from Austria and Germany. To this end, an expert survey was conducted and assessed by means of a multi-level content analysis.

The rest of this paper is organised as follows: First, the applied data is briefly outlined. Then the conception of dynamic zoning in general as well as the Dynamic Zoning tool in particular are described in the paper’s first main part. In its second part, the Analyse Offensive Qualities tool is presented. Since the tools’ code is too extensive to display, both tools are described in as much detail as possible, including their input parameters and their functionality. Afterwards, the expert survey’s design as well as the experts’ feedback are concisely summarised. Finally, the paper is concluded in its last chapter, in which not only the benefits of dynamic zoning but also suggestions for the tools’ improvement are discussed.

A Brief Overview on the Data and the Systems Applied

Although football-specific geodata consists of event and tracking data, only the latter was required for this study. It represents the players’ and the ball’s movement, continuously recorded at 10 fps as consecutive point data (Kotzbek and Kainz 2015b). Owing to this recording rate, far more than a million single tracking data points are provided for a single game. As each point carries a time stamp and a frame number, the process of dynamic zoning can be conducted for every tenth of a second.
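The order of magnitude stated above can be checked with a few lines of arithmetic. The following is only a rough sketch: the recording rate of 10 fps is taken from the text, whereas the 45 min of regular playing time per half (no stoppage time) and the 23 tracked objects (22 players plus the ball) are assumptions made for illustration.

```python
FPS = 10               # recording rate stated in the text
HALF_MINUTES = 45      # assumption: regular playing time, no stoppage time
TRACKED_OBJECTS = 23   # assumption: 22 players plus the ball


def frames_per_half(fps=FPS, minutes=HALF_MINUTES):
    """Number of frames recorded for one tracked object in one half."""
    return fps * 60 * minutes


def points_per_game(fps=FPS, minutes=HALF_MINUTES, objects=TRACKED_OBJECTS):
    """Number of single tracking points for both halves and all objects."""
    return 2 * frames_per_half(fps, minutes) * objects


print(frames_per_half())   # 27000
print(points_per_game())   # 1242000
```

Under these assumptions a game yields roughly 1.24 million points, which is consistent with the figure of far more than a million given above.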

The data was provided by ProzoneSports, the European market leader (Castellano et al. 2014), which recently joined forces with the US company STATS (STATS Sports Data Company 2016). The tracking data match the data definitions of both Mitchell (2009) and Zeiler and Murphy (2010) and can hence be described as point features whose appearance is time-dependent. In addition, following Dodge et al. (2008), it is also appropriate to classify the data as Moving Point Objects (MPO).

The data’s spatial information is provided as X/Y-coordinates based on a local spatial reference system whose origin is located at the pitch’s centre. Although ProzoneSports’ data spatially refers to the internationally standardised pitch size of 105 m × 68 m, and minor inaccuracies due to scaling are hence possible, several studies indicate that the tracking data’s spatial accuracy is sufficient for analysis purposes (Di Salvo et al. 2006, 2009; O’Donoghue and Robinson 2009).

Since the data is usually provided as XML files, it has to be prepared in a GIS-appropriate manner. For this purpose, the so-called Match Data Preparation tool was developed. It automatically converts the raw data to point feature classes (FC), separated by tracking object and half time, and arranges them within feature datasets (FDS) located in a file geodatabase (GDB). Moreover, a pitch composed of point, line and polygon FCs can optionally be created. As this tool’s functionality has already been demonstrated (Kotzbek and Kainz 2015b), detailed information about it is omitted here. As the process of dynamic zoning requires team-specific FCs whose attributes contain the players’ tactical positions, two additional tools have to be executed in advance in order to update the game’s GDB. Whilst the Add Formation Index tool classifies the players as goalkeepers, defenders, midfielders and forwards based upon their tactical positions stored within the FCs’ names, the Team Merge tool merges the single tracking FCs according to their Team ID (Kotzbek 2017).

ArcGIS for Desktop 10.4 with Background Geoprocessing (64 Bit), developed by Esri, was applied for this study’s purposes not only because it is a robust and well-documented GIS, but also owing to the author’s long-term practical experience with it. Notwithstanding this, it is assumed that other GIS such as QGIS are also suitable for analysing football games in general. As Python is preferred by Esri (Zandbergen 2010), it was reasonable to apply it for the development of game analysis tools. However, the tools’ target group, composed of professional game analysts, might not be familiar with executing Python scripts. Hence, the scripts are prepared as custom tools for ArcGIS, providing the users with a graphical user interface (GUI) in the form of tool dialog boxes (TDB), which are more accessible and intuitive. Furthermore, custom tools can be combined into geoprocessing packages, and their distribution can therefore be considered straightforward.

The computer system utilised in this study was based on a Windows 7 64 Bit OS with 8 GB RAM and an Intel® Core™ i5-3470 CPU with 3.2 GHz. This information has to be taken into account when interpreting the tools’ performance.

Part I: The Conception of Dynamic Zoning and Its Implementation

In general, this first approach towards dynamic zoning is based upon the defenders’ and forwards’ average X-coordinates, with the X-axis running parallel to the pitch’s side lines. These thresholds can be determined for either a period of time or a specific time stamp and divide the pitch vertically into a defending, a central and an attacking zone. Arithmetic averaging was applied as a compromise because it is considered more robust towards the random coalescence of the tactical zones. For instance, employing the foremost position might be critical in corner situations, as tall defenders tend to move up for headers within the opponent’s penalty area. By contrast, utilising the backmost position fails to mirror the course of the game accurately.
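The difference between the three demarcation criteria can be illustrated with a short Python sketch. It is not taken from the tool’s code; the function names, the role labels and the sample positions are hypothetical, and the X-values are assumed to be distances in metres from the team’s own goal line.

```python
from statistics import mean


def line_position(xs, method="mean"):
    """Position of a tactical line along the X-axis for one frame."""
    if not xs:
        raise ValueError("no players with the requested role")
    if method == "mean":
        return mean(xs)
    if method == "foremost":
        return max(xs)
    if method == "backmost":
        return min(xs)
    raise ValueError("unknown method: " + method)


def zone_thresholds(players, method="mean"):
    """Return (defenders' line, forwards' line) from (role, x) tuples."""
    defenders = [x for role, x in players if role == "DEF"]
    forwards = [x for role, x in players if role == "FWD"]
    return line_position(defenders, method), line_position(forwards, method)


# Hypothetical corner situation: one defender has moved far upfield.
players = [("GK", 5.0), ("DEF", 20.0), ("DEF", 22.0), ("DEF", 24.0),
           ("DEF", 90.0), ("MID", 45.0), ("FWD", 60.0), ("FWD", 70.0)]

print(zone_thresholds(players, "mean"))      # (39.0, 65.0)
print(zone_thresholds(players, "foremost"))  # (90.0, 70.0)
print(zone_thresholds(players, "backmost"))  # (20.0, 60.0)
```

In this constructed case the foremost criterion places the defenders’ line ahead of the forwards’ line, whereas the mean is shifted by the outlier but keeps the order of the zones intact.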

For this purpose, the Dynamic Zoning tool was developed, which requires six input parameters as illustrated in Fig. 2. Besides the selection of an input GDB, which corresponds to the Match Data Preparation tool’s output GDB, the point of time for which the dynamic zoning should be conducted has to be entered. As the tool mirrors a first draft of the basic conception, its functionality is limited to the zoning at certain points of time only. Besides these two mandatory parameters, it is up to the user to execute the tool for both teams and both half times or for just one of each. With the default input parameters shown in Fig. 2, the process takes 21 s on average. In the following, the main parts of the tool’s script are described in as much detail as possible, although code fragments cannot be provided as space is limited. However, the code is readily shared on request.

Fig. 2

Tool dialog box of the dynamic zoning tool. Source Kotzbek (2017)

Initially, a number of preliminary measures have to be conducted, such as locating the FDS in which, according to the input parameters, the required merged team-specific tracking data are stored. Since the names of all FCs contain the data’s Half ID and Team ID, their identification and localisation is straightforward. Although direct inputs of the required FCs would be an option, demanding a GDB as an input prevents incorrect selections by the users and is thus preferable.

After the creation of the output GDB, which is composed of half-time-specific FDS, the given point of time is converted into a Frame value. For this purpose, the entered string is split at the colon and the two parts are cast to float numbers, which are then multiplied by 600 and 10 for the minute and second value respectively. This procedure is feasible since the data’s recording rate is constant and known. The sum of both values corresponds to the Frame value.
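The conversion described above can be sketched as follows. The function name is hypothetical, the input is assumed to have the form "mm:ss", and rounding the result to an integer frame number is an assumption that is not stated in the text.

```python
def time_to_frame(time_string, fps=10):
    """Convert a 'mm:ss' string into a frame number at a constant frame rate."""
    minutes, seconds = time_string.split(":")   # raises ValueError if malformed
    # At 10 fps one minute corresponds to 600 frames and one second to 10.
    frame = float(minutes) * 60 * fps + float(seconds) * fps
    return int(round(frame))


print(time_to_frame("12:30"))  # 7500
```

For the entry "12:30" the minute part contributes 12 × 600 = 7200 frames and the second part 30 × 10 = 300 frames, which add up to frame 7500.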

Subsequently the teams’ game directions are determined by detecting the home team’s goalkeeper’s position at the game’s very first frame. As the coordinate system’s origin is located at the pitch’s centre the algebraic sign of that position’s X-coordinate can be applied for the assignment. Based upon this information the game directions of both teams and half times are derived.
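A minimal sketch of this derivation is given below. It is an illustration only: the function name and the team labels are hypothetical, a direction of +1 is taken to mean that a team attacks towards positive X-values, and it is assumed that the teams change ends at half time.

```python
def game_directions(home_keeper_x):
    """Derive the attacking direction (+1 or -1 along X) per team and half
    from the home goalkeeper's X-coordinate at the very first frame."""
    if home_keeper_x == 0:
        raise ValueError("goalkeeper on the halfway line, direction undefined")
    # A keeper at negative X guards the goal at negative X,
    # so the team attacks towards positive X.
    home_first_half = 1 if home_keeper_x < 0 else -1
    return {
        ("home", 1): home_first_half,
        ("away", 1): -home_first_half,
        ("home", 2): -home_first_half,   # assumption: ends change at half time
        ("away", 2): home_first_half,
    }


directions = game_directions(-48.3)
print(directions[("home", 1)], directions[("away", 1)])  # 1 -1
```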

As stated at the beginning of this chapter, the players’ X-coordinates are applied for the zoning process. In contrast to the determination of the game direction, the position values along the abscissa have to be normalised. This measure is necessary to prevent incorrect calculations of the zones’ thresholds, as the defenders’ average X-position can be composed of values left and right of the pitch’s halfway line. Therefore, the outfield players’ X-coordinates are translated to a new origin whose ordinate runs along the team’s own goal line. Then, the two calculated average positions of the defenders and forwards are denormalised so that the team’s three zones can be generated on the pitch in relation to the original coordinate system centred on the pitch. Again, the game direction has to be taken into account.
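One possible reading of this normalisation step is sketched below. The function names are hypothetical, the pitch length of 105 m is the standard size mentioned earlier, and the direction is assumed to be +1 if the team attacks towards positive X-values and -1 otherwise.

```python
HALF_LENGTH = 105.0 / 2   # distance from the pitch centre to a goal line in m


def normalise(x, direction):
    """Distance of an X-coordinate from the team's own goal line."""
    return direction * x + HALF_LENGTH


def denormalise(distance, direction):
    """Inverse of normalise: back to the pitch-centred coordinate system."""
    return direction * (distance - HALF_LENGTH)


# A team attacking towards negative X: its own goal line lies at x = +52.5.
defenders_x = [40.0, 35.5, -2.0, 38.5]
distances = [normalise(x, -1) for x in defenders_x]
mean_distance = sum(distances) / len(distances)

print(distances)                       # [12.5, 17.0, 54.5, 14.0]
print(mean_distance)                   # 24.5
print(denormalise(mean_distance, -1))  # 28.0
```

After normalisation all values are non-negative distances from the own goal line, regardless of the side on which the team defends, and the averaged threshold is mapped back to the original system for drawing the zones.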

Once the separation lines have been placed correctly, the corner points of the zones’ polygons are determined. Whilst the defending zone is spanned between the team’s own goal line and the defenders’ average position line, the attacking zone stretches from the forwards’ average line to the opponent’s goal line. In between, the midfielders’ area or central zone is located (see Fig. 3). Using the ArcGIS system tool CreateFeatureClass and an InsertCursor, polygon FCs are created and finally united utilising the ArcGIS system tool Union. Finally, the zones’ area, width and height are extracted while iterating through the output FC with a SearchCursor.

Fig. 3

Schematic depiction of the zones’ corner points. Source Kotzbek (2017) modified
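The geometry of the three zones can be sketched in plain Python without any GIS library. The sketch below is not the tool’s arcpy-based implementation; the function names are hypothetical, and a pitch of 105 m × 68 m with its origin at the centre is assumed.

```python
HALF_LENGTH = 52.5   # half of the assumed 105 m pitch length
HALF_WIDTH = 34.0    # half of the assumed 68 m pitch width


def rectangle(x_a, x_b):
    """Corner points of a zone spanning the full pitch width between two X-values."""
    x_min, x_max = sorted((x_a, x_b))
    return [(x_min, -HALF_WIDTH), (x_min, HALF_WIDTH),
            (x_max, HALF_WIDTH), (x_max, -HALF_WIDTH)]


def zone_polygons(def_line_x, fwd_line_x, direction):
    """Defending, central and attacking zone for one team at one frame.
    direction is +1 if the team attacks towards positive X, else -1."""
    own_goal_x = -direction * HALF_LENGTH
    opp_goal_x = direction * HALF_LENGTH
    return {
        "defending": rectangle(own_goal_x, def_line_x),
        "central": rectangle(def_line_x, fwd_line_x),
        "attacking": rectangle(fwd_line_x, opp_goal_x),
    }


def rectangle_area(corners):
    """Area of an axis-parallel rectangle given by its corner points."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


zones = zone_polygons(def_line_x=-20.0, fwd_line_x=15.0, direction=1)
for name, corners in zones.items():
    print(name, rectangle_area(corners))
```

For the sample values the three areas add up to 7140 m², the full area of the assumed pitch, which reflects that the zones together cover the whole pitch.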

After the Dynamic Zoning tool has been executed successfully, an output FC containing all three zones at the entered point of time is available for each selected team. Figure 4 illustrates the tool’s graphical outcome immediately before the away team takes a corner. Although this confirms the basic conception’s feasibility, it does not provide information about its practicability for football game analyses. Therefore, a potential case of application is demonstrated in the next chapter.

Fig. 4

Exemplary outcome of the dynamic zoning tool at a certain point of time. Source Kotzbek (2017) modified, data ProzoneSports

Part II: Determination of the Offensive Quality Based on Dynamic Zoning

Based on the conception of dynamic zoning, the teams’ offensive quality can be determined by assessing the vertical length of the defending zone. This is reasonable as it is assumed that the larger the defending zone, the higher up the pitch the team plays and hence the more offensive the team is. As a consequence, this information not only facilitates statements about pressing play but also spatially displays the interdependency of both teams’ use of space. In order to categorise the teams’ offensive quality, a third of the international pitch’s default length of 105 m was applied as a threshold. Depending on the game direction, the vertical separation line is located 35 m in front of the team’s own goal line and divides the pitch into areas of low and strong offensive quality. However, these are further narrowed by widening the linear threshold into an area of middle offensive quality between 30 and 40 m measured from the goal line. As this classification is arbitrarily defined, the outcome’s significance has to be reviewed.
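The classification can be expressed as a simple threshold function. The sketch below is illustrative only: the function name and the class labels are hypothetical, and the treatment of values lying exactly on a break value (counted as middle here) is an assumption.

```python
def classify_offensive_quality(defending_length_m, lower_break=30.0,
                               upper_break=40.0):
    """Classify a team's offensive quality from the length of its defending
    zone, measured in metres from the team's own goal line."""
    if defending_length_m < lower_break:
        return "low"
    if defending_length_m <= upper_break:
        return "middle"
    return "strong"


for length in (24.5, 35.0, 41.2):
    print(length, classify_offensive_quality(length))
```

The two break values are kept as parameters because they are user-defined inputs of the tool described below.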

This conception was implemented as a custom tool for ArcGIS named Analyse Offensive Qualities. Its TDB is illustrated in Fig. 5 and consists of ten input parameters altogether. Besides the mandatory entry of the input GDB and the selection of the teams and half times to be analysed, one of three analysis periods as well as the classification’s two break values can be chosen. On the computer system described above, the tool’s execution time varies depending on the selected analysis period. Whilst the analysis of entire half times takes 4 min on average, analysis periods of 5 and 15 min require 8 and 5 min respectively.

Fig. 5

Tool dialog box of the analyse offensive qualities tool. Source Kotzbek (2017)

Since the tool’s code is almost identical to that of the Dynamic Zoning tool, only the main differences are briefly described hereafter. First of all, the process is limited to the determination of the teams’ defending zones. Furthermore, instead of restricting the assessment to certain points of time, the teams’ mean vertical defending height is calculated for predefined analysis periods. For this purpose, the ArcGIS system tool Dissolve is applied to the defensive line’s position for each frame of the selected analysis period. Afterwards, the determined mean values are read out utilising a SearchCursor and summed up. The result is then divided by the number of frames. Finally, two new fields named Length and Off_Quality are added to the output polygon FC. The new attributes are updated based upon the extracted polygons’ length and according to the input parameters. The tool’s outcome is illustrated in Fig. 6, which shows the teams’ defending areas for both half times.
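The averaging over an analysis period can be sketched in plain Python as follows. This is not the tool’s Dissolve-based implementation; the function name and the record layout are hypothetical, and the X-values are assumed to be the defenders’ distances from their own goal line in metres.

```python
from collections import defaultdict


def mean_defending_length(records, first_frame, last_frame):
    """Mean position of the defensive line over an analysis period.
    records: iterable of (frame, x) tuples for a team's defenders."""
    per_frame = defaultdict(list)
    for frame, x in records:
        if first_frame <= frame <= last_frame:
            per_frame[frame].append(x)
    if not per_frame:
        raise ValueError("no records within the analysis period")
    # Step 1: mean position of the defensive line for each frame.
    frame_means = [sum(xs) / len(xs) for xs in per_frame.values()]
    # Step 2: sum of the per-frame means divided by the number of frames.
    return sum(frame_means) / len(frame_means)


records = [(1, 20.0), (1, 30.0), (2, 30.0), (2, 40.0), (3, 100.0)]
print(mean_defending_length(records, first_frame=1, last_frame=2))  # 30.0
```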

Fig. 6

Comparison between the overall offensive qualities of both teams for both half times. Source Kotzbek (2017) modified, data ProzoneSports

Whilst the away team (Team B) was positioned marginally higher in the first half, the opposite was true in the game’s second period. Although additional information about the teams’ play can be obtained, its significance is limited, at least with regard to the analysis of entire half times. Applying shorter analysis periods, such as 5 min each, leads to a more informative outcome, as illustrated in Fig. 7. It shows that the teams’ offensive qualities correlate, which can be expected as the players’ movements are ball-oriented in general. The defensive lines’ mean heights are illustrated as vertical lines parallel to the goal lines and are colour-coded according to the Team ID (A = red, B = blue) from light to dark following the game’s chronology.

Fig. 7

Comparison between the offensive qualities of both teams for the first half, distinguished in periods of 5 min each. Source Kotzbek (2017), data ProzoneSports
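Splitting a half into analysis periods of equal length amounts to deriving frame ranges from the recording rate. The following sketch uses a hypothetical function name and assumes a half of 45 min without stoppage time as well as frame numbers starting at 0.

```python
def period_frame_ranges(period_minutes, half_minutes=45, fps=10):
    """Inclusive (first_frame, last_frame) ranges of consecutive periods."""
    frames_per_period = period_minutes * 60 * fps
    total_frames = half_minutes * 60 * fps
    ranges = []
    start = 0
    while start < total_frames:
        # The last period is cut off at the end of the half if necessary.
        end = min(start + frames_per_period, total_frames) - 1
        ranges.append((start, end))
        start += frames_per_period
    return ranges


ranges = period_frame_ranges(5)
print(len(ranges))   # 9
print(ranges[0])     # (0, 2999)
print(ranges[-1])    # (24000, 26999)
```

With periods of 5 min, one half is thus divided into nine ranges of 3000 frames each, for which a mean defensive height can be determined separately.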

In this half, the home team (Team A) scored three goals, although the away team played more offensively on average, as illustrated in Fig. 6. The first goal was scored in the 5th minute during a strong offensive period of the visiting team. As a consequence, their attacking play weakened but remained strong for the subsequent quarter of an hour, followed by a short strong offensive by the home team. When the away team regained control of the game between the 25th and the 30th minute, the hosts struck back and scored twice, in the 31st and 36th minute. Figure 7 suggests that after the 3:0 the home team did not risk their comfortable lead and primarily focussed on defence until the half-time break.

Since the tool covers only one specific facet of the game, it is important to relate its outcome to the results of other tools such as the Analyse Scoring Attempts custom tool for ArcGIS, which was already presented by the authors (Kotzbek and Kainz 2016). In this way, further information can be taken into account when interpreting the outcomes. For example, the obtained results indicate that the first goal was scored as a consequence of possession-oriented build-up play, whereas a direct free kick and a corner led to the 2:0 and 3:0 respectively.

The tool’s outcome is not only a quantification of the teams’ collective offensive play, but also mirrors an aspect of the game’s course. Furthermore, it is a source for suggestions about the teams’ game style. However, in this particular context it is necessary to analyse a bigger data set, comprising the games of an entire season for example. With this, it would be conceivable to detect patterns within the teams’ offensive play. For instance, it could be examined whether a team tends to drop deep after scoring a goal. Apart from this particular purpose, it is reasonable to combine different game-relevant information such as ball possession, the current score, the minute of play or the competition in which the match was played.

Evaluation of the Tools by Professional Game Analysts

Both tools were presented separately to eight game analysts employed by Bundesliga clubs in Austria and Germany. Subsequent to each demonstration, an expert survey was conducted in order to evaluate the conception of dynamic zoning as well as both tools’ functionality. According to Bogner et al. (2014), the survey can be described as partly standardised and explorative. To ensure the completeness of the feedback, a guideline was utilised that comprised questions about the tools’ parameters, possibilities for further developments and the tools’ practicability for football game analyses. Since the survey’s design is described in detail in the author’s thesis (Kotzbek 2017), it is only briefly outlined hereafter.

Within the scope of the thesis, the experts had already been questioned about the methods and techniques of contemporary football game analysis in late 2014. Therefore, the research project was not unknown to them when the second survey was conducted from 9 June 2016 to 4 July 2016. Whilst three interviews were conducted face-to-face, the other experts were questioned online via Skype. Irrespective of this, all dialogues were audio-recorded in order to archive them and to produce interview protocols. These written records can be considered content-related summaries and are appropriate for the assessment of collectively shared interpretative knowledge (Bogner et al. 2014). Furthermore, the protocols were analysed applying a multi-level qualitative content analysis according to Mayring (2010). Since this method cannot be standardised, adjustments depending on the particular analysis purpose are inevitable (Mayring 2010). However, in general this approach is intended for the systematic extraction of a text’s key messages (Gläser and Laudel 2004). Based on this, the information was classified in terms of a textual structuring and further assessed employing a frequency analysis in order to weigh the experts’ feedback. Although the design’s last part corresponds to a quantitative method, it is also suitable for multi-level qualitative content analysis as it makes it possible to strengthen or weaken statements (Mayring 2010). As a result, several individual pieces of feedback are textually combined into collectively shared statements.

The evaluation of the conception of dynamic zoning concluded that it is interesting and sound, but that horizontal zones also have to be taken into account besides the pitch’s vertical trisection. Moreover, the majority of experts contradicted the author’s assumption that the players’ mean X-coordinates are suitable as thresholds. Instead, they consider the utilisation of the teams’ backmost and foremost positions more appropriate in this context. Nevertheless, it was recommended to implement both approaches so that the demarcation criterion is freely selectable. According to two experts, dynamic zoning can also be applied for analysing a team’s spatial compactness. Furthermore, the separation lines indicate spatial disparities within the teams’ tactical lines.

Before the second tool was presented, the majority of experts had difficulty naming specific fields of application for dynamic zoning. Subsequently, however, they acknowledged its applicability for analysing a team’s offensive qualities. The tool’s methodology was predominantly described as good, innovative and interesting. In this sense, the approach is suitable not only for the assessment of pressing play, but also for evaluating compliance with the coach’s tactical guidelines. Moreover, drawing conclusions about the opponent’s game style is considered possible. In addition, the tool provides further information about a team’s compactness as well as insights into the game’s course, which is indicative of a team’s dominance.

Besides these benefits, it is controversial whether a causality can be derived from the obvious correlation between the teams’ offensive qualities illustrated in Fig. 7. Furthermore, despite the tool’s functionality, the experts’ opinions about its practicability for football game analyses differ. Whilst at least half of them saw potential cases of application, the other half was undecided and suggested improvements. Among others, these comprised the implementation of freely selectable pressing play parameters, the additional consideration of the highest defending line for the determination of the offensive dominance, individually selectable analysis periods, the horizontal distinction of the attacking play as well as contextual classifications in order to analyse the dynamics of the game’s flow.

Discussion and Conclusions

The evaluation provides evidence that the conception of dynamic zoning as well as both demonstrated tools are worth developing further. For this purpose, the experts’ recommendations about possible adaptations have to be taken into account, as professional game analysts represent the tools’ prime target group. It is particularly reasonable not only to additionally fragment the pitch into horizontal zones dynamically, but also to reconsider the demarcation criteria, as mean values might convey false impressions of the game. Furthermore, the creation of animated and/or interactive tool outcomes is worth considering, as this measure would provide more detailed information about the dynamic flow of the game. In this case, it is reasonable to determine the zones for every frame. Although valuable feedback has already been obtained in this context, it is advisable to refine the concept and the tools in close cooperation with game analysts in order to make the development process more systematic, efficient and purposeful. Moreover, as the presented approach corresponds to a first draft, an extensive study of movement analysis in other sports disciplines would certainly be insightful for the approach’s revision.

According to the experts, dynamic zoning offers new and interesting perspectives on the game, which provide insights into a team’s pressing play, dominance, spatial compactness and game style. Moreover, assessing the zones is considered suitable for evaluating compliance with the coach’s tactical guidelines. Hence, it can be concluded that dynamic zoning facilitates manifold analysis opportunities. However, caution should be exercised when attempting to derive causalities, as verifications of the functionality of both the conception and the tools are still outstanding.

Besides the recommendations quoted in the last chapter, the experts argue that the success of GIS in the scope of football game analyses is essentially connected to the establishment of an appropriate interface to video analysis. “As both approaches have their shortcomings, a combination of both would alleviate these.” (Kotzbek 2017) As it is common practice to time-tag game-relevant events within video footage of football games, an attributive link to the georeferenced data’s time stamp already exists. Hence, the connection between both analysis approaches is feasible. However, details such as the interface’s functionality and access have not been worked out yet. This strengthens the demand for interdisciplinary teamwork.

In order to address a larger part of the target group, it seems advisable to provide the tools for free and open source GIS, since it is assumed that this would also attract financially weaker clubs. As Python scripts can also be applied for process automation in QGIS, for instance, the effort to revise the tools would be negligible. Moreover, according to the experts, the development of an independent graphical user interface, which not only provides access to video footage but also makes it possible to call several analysis tools, is worth considering, as it is expected to be more user-friendly.

To sum up, this paper introduces a novel conception of a football pitch’s dynamic zoning based upon the players’ movements applying GIS. Although several aspects of the conception’s first draft were judged favourably by professional game analysts, it cannot be considered sophisticated so far owing to outstanding issues. Hence, further developments are necessary. In this context, we encourage our colleagues to participate in the field of GIS-based football game analysis in particular as well as in sports analytics in general.