1 Introduction

Strategy development for a team of agents or robots with decentralized decision making is a challenging task in an uncertain and adversarial domain. An agent playing soccer not only has to score goals against the opponent team but also has to defend its own goal. Team strategy thus becomes a collective outcome of individual agents' strategies. To evaluate a team strategy, a number of variables have to be considered. For instance, if a team is on defense, the list of variables may include the alignment of defenders with the goal post, the distance maintained between the ball and the defenders, the alignment of supporting players with the ball and the distance from the opponents. In contrast, when a team is on the attack, the number of players involved in the attack and their attacking pattern of moving ahead or staying behind at a certain distance become important for devising a strategy. In situations like these, it becomes difficult for an expert to design a strategy that works well in every scenario. In addition, manual evaluation of a strategy over all possible scenarios is even more time-consuming. To overcome these challenges, this paper presents a parallel evolutionary computation-based framework for producing an optimized strategy. Parallelization is introduced to reduce the time a single machine would take to evaluate all candidate strategies: offspring in the evolutionary algorithm are produced by a server, while the workload of strategy evaluation is distributed among clients.

It is important to note that, in the context of this paper, a strategy denotes the critical areas that should be occupied by the defending players. The alignment of players with the goal and the ball is evolved to obtain optimized strategic points. These optimized points are then executed in real time for team performance evaluation. The fitness of a solution takes into consideration the goal difference between the home and opponent teams, the average ball possession of the home team and certain regions occupied by home team agents.

For experimental purposes, the RoboCup Soccer Simulation Server 3D has been chosen as a test bed. The environment provides a promising platform that is truly dynamic and partially observable in nature. A RoboCup Soccer Simulation 3D League (Darab and Ebrahimi 2007) team, namely Karachi Koalas (Haider et al. 2014), which was developed as part of a larger project, has been taken as the home team in this research. This team has participated frequently in RoboCup competitions, with its best result being a fifth-place finish in 2013. The base code of this team serves as a benchmark, as matches of both the base-code team and the evolved team are executed against the same opponent. It is also worth mentioning that much work in the literature applies techniques other than evolutionary approaches, such as reinforcement learning (Salustowicz et al. 1998; Riedmiller et al. 2009), handcrafted structuring of code (Kazakov and Kudenko 2001) and layered learning (Stone and Veloso 2000; Cherubini et al. 2007; Urieli et al. 2011). In this paper, we compare such approaches with the evolved approach by executing matches to demonstrate that the evolved strategy works well. The rest of the paper is organized as follows. Section 2 provides an overview of the existing literature along with a subsection presenting the novelty of FEASO. Section 3 briefly describes the technical background of the RoboCup Soccer domain as well as evolutionary algorithms, and Sect. 4 gives a comprehensive overview of FEASO with its implementation details. Experimental setup and results are discussed in Sect. 5. Section 6 compares FEASO with the most relevant evolutionary approaches proposed previously, while Sect. 7 concludes the paper and provides future research directions.

2 Related work

Various techniques have been reported in the literature for incorporating strategies in robot soccer, including reinforcement learning, decision trees, handcrafted rule-based techniques, coordination graphs, layered learning approaches and evolutionary algorithms. The notion of strategy varies across these techniques. This literature review focuses on learning team strategy; thus, techniques that employ strategy optimization are discussed in detail in the following subsections.

2.1 Reinforcement learning techniques

In reinforcement learning, strategy denotes learning a good policy for a team of agents, and the concept of collective reward and punishment was introduced. The earliest approach was proposed by Salustowicz et al. (1998), who compared several reinforcement learning algorithms in multi-agent learning scenarios. Riedmiller et al. (2009) suggested learning strategy at a tactical level, such as moves to intercept the ball, wait at a position, pass the ball to a teammate or shoot at the goal. Many variants of reinforcement learning, such as observational reinforcement learning, team-partitioned opaque-transition reinforcement learning and Q-learning, have been proposed in the domain of RoboCup Soccer Simulation 2D, but in 3D, current reinforcement learning research faces several challenges. Firstly, with the addition of each new agent, it becomes memory expensive. Secondly, similar behaviors recur in robot soccer, yet everything has to be learned all over again. Lastly, due to the limited perception of an agent, it is often impossible to fully determine the current state, and this results in a loss of performance of the algorithm. As this study considers an optimized strategy for a team of agents, reinforcement learning has not been used.

2.2 Rule-based techniques

In contrast to reinforcement learning, rule-based and inductive logic programming (ILP)-based approaches have also been proposed by different researchers. The inductive learning agent (Kazakov and Kudenko 2001) uses first-order formalism and ILP to acquire rules to predict failures. The agent gathers instances of actions and classifies them. Next, prediction rules are formed that assist soccer agents in deciding whether each action taken in the game is good or bad. The limitation of this approach is that its focus lies on verification and validation of a knowledge-based system, with no new knowledge acquisition. Consequently, agents cannot adapt their own behavior using rules or knowledge acquired by ILP. Murray et al. (2000) developed a multi-agent team based on a scripting language, where procedural aspects were specified by state charts and declarative aspects by decision trees. The rules have to be designed by an expert, and it becomes a tedious job to think of all possible situations. To overcome this limitation, Kok et al. (2005) presented a strategy that was devised using a few rules provided by experts, while the rest of the coordination rules were computed online using coordination graphs. Bezek (2005) and Wang et al. (2009) proposed the use of coordination graphs in the assignment of roles and actions to the players. Svatoň et al. (2014) proposed a methodology to improve the description of strategy by creating sub-strategies, thus ensuring a smooth implementation of the actions defined by each strategy. The main focus of coordination graphs is to aid humans in strategy building: a human modeler analyzes the various patterns in the graph and designs hand-coded agents to act under certain circumstances. The focus of this study is not on rule or graph creation but on the generation of an optimized strategy in an automated fashion. Thus, no handcrafting of rules is performed, and the evolution process has the advantage of yielding multiple strategies that are equally good in a specific situation.

2.3 Layered learning technique

Layered learning has also been applied for flexible team formation in the RoboCup Soccer Simulation Leagues. Stone and Veloso (2000) pioneered the architecture, in which a problem is broken into a bottom-up hierarchy of subproblems; each subproblem is solved, and its solution serves as input to the next layer. Their work used Genetic Programming for the selection of an action by a team of agents. Urieli et al. (2011) presented an approach for optimizing interdependent skills in the 3D domain. This layered architecture has the benefit that primitive locomotion skills can be learned precisely and multiple skills can be learned in conjunction with each other. The issue lies in learning strategy, because it is a high-level behavior exhibited by a team of agents and would require numerous learning efforts if performed via a layered learning approach.

2.4 Evolutionary algorithm-based techniques

Evolutionary algorithms have also been applied to evolve strategies in robot soccer. Nakashima et al. (2004, 2006) proposed the use of Genetic Algorithms to learn a team strategy in the domain of the 2D simulation league. They divided the field into 48 subregions, and the 10 players of each team, excluding the goalkeeper, could exist in any of the regions. The chromosome comprised the actions to be executed in certain situations. The fitness of a strategy was computed by running simulated matches and taking the average goal difference. One limitation of the approach was that a single opponent was fixed, so the strategy would not generalize. Cultural Algorithms (CA) were applied (Salhieh et al. 2012) to generate a team strategy for playing robot soccer. A chromosome contained regions and the corresponding player numbers. This was a centralized approach in which a coach agent sent messages to all the agents and only the intended agent executed the action; goal difference was used as the evaluation function. Recent advances in team strategy were contributed by Ali et al. (2014), who investigated the use of a simplified and adaptive version of CA to develop defensive and offensive plays and cooperative strategies in robot soccer. This work is very relevant to the presented approach, and a detailed comparative discussion is provided later in Sect. 6. Lekavy (2011) applied an evolutionary approach for the evolution of pass execution in standard situations, such as the kickoff, in the 2D simulation league. Okada et al. (2011) utilized particle swarm optimization to evolve team formation. The Cartesian coordinates of the ten players and 15 possible positions of the ball were modeled as elements of an individual solution, while the initial population was generated randomly. They reported how well formations for various team characteristics (e.g., offensive, defensive, balanced) could be obtained automatically. Ant intelligence (Ramani et al. 2008) was utilized for generating player strategies. A multi-group ant colony optimization algorithm (Chen et al. 2016) was proposed that used an ant pheromone evaporation mechanism to learn offensive strategies in the 2D soccer simulation league; a comparative analysis of this approach is presented in Sect. 6. Luh et al. (2006) used an artificial immune system to develop cooperative strategies in robot soccer. The system selected a behavior for the player, such as shoot, pass, kick, chase, track and guard, and the approach was validated on the SimuroSot Middle League. It is important to note that all of the above-mentioned evolutionary computation techniques face the challenge of the time needed to evolve a solution, as a large number of fitness evaluations are required before an acceptable solution can be found. In light of this requirement, distributed evolutionary approaches have been proposed (Gong et al. 2015), and this research also contributes such a distributed approach.

2.5 Novelty

The FEASO framework is a distributed approach for the evolution of a defensive strategy, which has not been addressed by other techniques proposed in the literature. The major contribution of this framework is that it facilitates the automatic generation of an optimized strategy against a specific opponent team and then generalizes it across various similar opponents. This paper focuses on the strategic placement of all the agents in the field, while most of the above-mentioned evolutionary techniques consider either a subfield or only two to three players to demonstrate the required behavior. Moreover, the domain of RoboCup Soccer Simulation 2D, in which the above-mentioned approaches were tested, is less complex, and knowledge about the action to be executed is available to the players, which makes it easier to exhibit a cooperative strategy. The domain of RoboCup Soccer Simulation 3D, by contrast, is highly dynamic and unpredictable: only the positions of the agents are known, and actions have to be inferred from observational data. To the best of the authors' knowledge, no technique for strategy evolution has been reported in the RoboCup 3D domain. The proposed framework is both robust and time inexpensive in real-time strategy execution, and it can also be applied to similar competitive domains.

3 Technical background

This section presents an overview of the domain of RoboCup and also discusses some basics of evolutionary algorithms.

3.1 Domain description

RoboCup Soccer (Kitano et al. 1998) is a scientific venture that provides an exciting platform for the advancement of research in artificial intelligence and robotics. The competition has the goal that, by the middle of the twenty-first century, a team of robots will defeat a team of humans in soccer. The competition is held every year, and several leagues have been designed to cater for different problems and challenges in robot coordination, locomotion, etc. The main leagues are the middle-size league, the small-size league, the simulation 2D league, the simulation 3D league, the standard platform league and the humanoid league. In contrast to the other leagues, the simulated leagues do not involve physical robots; instead, simulated robots play soccer on a virtual field. In the simulation 2D league, the ball and the players are represented by circles on the plane of the field, while in the simulation 3D league, the players are represented as articulated rigid bodies having 22 hinges. In the 2D league, commands such as move, dash, turn and kick are available; in 3D, these commands do not exist, and locomotion itself is a major challenge. The 3D game comprises two teams of eleven robots on a 21 m × 30 m field, as shown in Fig. 1.

Fig. 1 Two teams playing RoboCup Soccer Simulation 3D

This research utilizes RoboCup Soccer Simulation 3D as a test environment for the development of opponent-specific strategies. A match is divided into two halves of 5 min each. Continuous time is approximated with discrete cycles, with each half being 300 cycles and 1 min corresponding to 60 cycles. Each robot has its own local view of the field and is controlled by a separate autonomous program. All robots can move and act independently as long as they comply with the league rules. They can freely communicate with each other via message broadcasting, but their visual and hearing perception is distance limited. Due to humanoid locomotion, robots may fall on the field or collide with other robots, in which case the server can penalize them by moving them out of the field. The domain is a complex multi-agent system and is easily understandable by humans due to its soccer-related content.

3.2 Evolutionary algorithms

Many real-world problems are hard to solve, such as the traveling salesman problem, time series prediction, bankruptcy prediction and credit scoring. Traditional algorithms for such problems tend to be either highly specialized or too general to be efficient. To approach such hard problems, evolutionary algorithms (EA) were introduced a few decades ago. Evolutionary algorithms (Bäck 1996) are stochastic search and optimization heuristics derived from classical evolution theory. They follow Darwinian evolution, often summarized as survival of the fittest. EA methods only need the target (fitness) function of a given problem, which is to be optimized.

A typical evolutionary algorithm passes through the following phases, as shown in Fig. 2. At first, the population is initialized randomly. The fitness of each solution is computed, parents are selected via a selection procedure and offspring are created by variation operators. Next, the fitness of the new offspring is computed. Members of the population are removed via survival selection, and the process continues until a termination criterion is met.

Fig. 2 Typical evolutionary algorithm (reproduced with permission from Bäck 1996)
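As a minimal Python sketch (not the authors' implementation), the loop of Fig. 2 can be written as follows; `fitness`, `select_parents`, `vary` and `survive` are placeholder callables standing for the phases described above.

```python
def evolve(init_population, fitness, select_parents, vary, survive,
           max_generations=25):
    """Generic EA loop per Fig. 2: initialize -> evaluate -> select ->
    vary -> evaluate offspring -> survival selection -> repeat."""
    population = [(ind, fitness(ind)) for ind in init_population]
    best_so_far = []
    for _ in range(max_generations):
        parents = select_parents(population)
        offspring = [(child, fitness(child)) for child in vary(parents)]
        population = survive(population, offspring)
        # Track the best-so-far (BSF) indicator discussed in Sect. 3.2.4.
        best_so_far.append(max(f for _, f in population))
    return population, best_so_far
```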

3.2.1 Parent selection mechanism

Parent selection is used to distinguish among individuals based on their quality, allowing better individuals to become parents of the next generation. There are various selection schemas (Bäck 1996), namely fitness proportional selection (FPS), rank-based selection (RBS) and tournament selection (TS). In FPS, a probability distribution proportional to the fitness is computed, and individuals are selected by sampling this distribution. This schema can exert high selection pressure. Rank-based selection attempts to remove the problems of FPS by computing selection probabilities on relative fitness rather than absolute fitness: the RBS mechanism ranks the population according to fitness and assigns selection probabilities on the basis of rank. This schema imposes a sorting overhead on the algorithm, but this is usually negligible compared to the fitness evaluation time. The advantage of this schema is that the selection is independent of the actual fitness values and it preserves a constant selection pressure.
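As an illustration, a linear rank-based selection might look like the sketch below; the linear weighting is an assumption, since the exact ranking function is not specified here.

```python
import random

def rank_based_select(population, k):
    """Select k parents with probability proportional to fitness rank
    (worst gets rank 1, best gets rank n), independent of raw fitness."""
    ranked = sorted(population, key=lambda pair: pair[1])  # ascending fitness
    weights = range(1, len(ranked) + 1)                    # linear rank weights
    return random.choices(ranked, weights=weights, k=k)
```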

3.2.2 Variation operators

The role of variation operators is to create new individuals from old ones. There are two types of variation operators: mutation and crossover. Mutation is a unary variation operator and is always stochastic. Crossover is a binary variation operator that merges information from two parents to generate two offspring. The two fundamental design concepts in evolutionary algorithms are exploration and exploitation. Exploration means discovering promising areas in the entire search space, while exploitation deals with optimizing within a promising area. Crossover is explorative, as it makes a big jump to an area between the parents. Mutation is exploitative, as it creates small random diversions while staying near the parent.
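For a real-valued tuple such as Xi, one-point crossover and Gaussian mutation could be sketched as below; the sigma and the per-gene mutation probability are illustrative assumptions.

```python
import random

def one_point_crossover(p1, p2):
    """Cut both parents at one random point and swap the tails."""
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def gaussian_mutate(ind, sigma=0.5, rate=0.3):
    """Perturb each gene with probability `rate` by small Gaussian noise,
    staying near the parent (exploitation)."""
    return [g + random.gauss(0.0, sigma) if random.random() < rate else g
            for g in ind]
```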

3.2.3 Survival selection mechanism

The role of survival selection is to distinguish among individuals based on their quality and decide which ones remain in the population. A survival selection scheme can be categorized as generational (the current population is replaced by its offspring) or steady state (only a few members are replaced by offspring). Truncation is a survival selection schema in which, out of a 2n pool of parents and offspring, the top n survive to the next generation.
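A sketch of truncation survival as just described: from the combined pool of parents and offspring, only the best n individuals survive.

```python
def truncation_survive(parents, offspring):
    """Keep the top n of the combined pool of (individual, fitness) pairs,
    where n is the original population size."""
    pool = parents + offspring
    pool.sort(key=lambda pair: pair[1], reverse=True)  # best fitness first
    return pool[:len(parents)]
```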

3.2.4 Performance indicators

Following are some of the performance indicators for evolutionary algorithms:

  • Best-so-far (BSF): the best solution found by the algorithm up to and including each generation.

  • Average-so-far (ASF): the average solution found by the algorithm in each generation.

4 A framework based on evolutionary algorithm for strategy optimization (FEASO)

This section provides a comprehensive overview of the entire system architecture of the proposed framework. The framework focuses on devising an optimized strategy in robot soccer by discovering the key points where the agents should place themselves under various scenarios.

4.1 Optimization parameters

To develop an understanding of the parameters being optimized, an example from robot soccer is presented. Figure 3 depicts an instance of a simulated soccer field in which agents of both teams (red and blue) are placed. The red circles with 'O' represent the opposing team, while the blue circles represent the home team. A small white circle denotes the ball. The scenario shown is an instance of an attack being executed by the opponent team. Different roles are associated with the home team players: goalkeeper (G), main attacker (MA), left forward (LF), right forward (RF), left supporter (LS), right supporter (RS), left midfielder (LM), right midfielder (RM) and three defenders, namely left defender (LD), right defender (RD) and center defender (CD). Each of these roles occupies certain positions in the field. These positions are in turn determined by factors such as the distance of a player from the ball, the distance from the goal posts and the distances to other teammates. The red arrows highlight the variables (d1, d2, …, d9) that need to be optimized. Table 1 summarizes the relationship of these variables with the roles.

Fig. 3 Simulated soccer field with arrows denoting optimization variables (color figure online)

Table 1 Solution representation

The variables d1, d2, d3 and d4 represent the distances of LM, RM, the x coordinate of LD and the x coordinate of RD with respect to the goal post. The variables d5 and d6 represent the distances maintained between the CD player and the y coordinates of the left and right defense players. CD also maintains a distance d7 with the ball. The variables d8 and d9 denote the distances between the ball and the left and right supporters. These nine variables form a single tuple Xi. Attackers, goalkeepers and forward players are not considered in this representation.

The initial value for Xi that forms the seed is populated from the code of our home team. The rationale behind the selection of this seed is that it is a viable solution already available. If Xi is initialized randomly, then it is possible that we get distances that are not feasible for a player according to its role.
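Concretely, a candidate solution can be stored as a nine-element list and the population seeded by Gaussian perturbation of the hand-tuned distances; the numeric seed values below are hypothetical placeholders, not the team's actual parameters.

```python
import random

# d1..d9 of Table 1; the values here are illustrative placeholders only.
X_SEED = [3.0, 3.0, 1.5, 1.5, 2.0, 2.0, 1.0, 2.5, 2.5]

def seeded_population(size=10, sigma=0.5):
    """Build the initial population by Gaussian mutation of X_seed,
    so every starting solution stays close to a known-viable strategy."""
    return [[d + random.gauss(0.0, sigma) for d in X_SEED]
            for _ in range(size)]
```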

4.2 System architecture

FEASO comprises three basic modules, namely the Evolutionary Algorithm Execution Module (EAEM), the Parallel Fitness Evaluation Module (PFEM) and the Fitness Computation Module (FCM), as shown in Fig. 4.

Fig. 4 Proposed modules of FEASO

The input to EAEM is a single tuple Xi, and its output is a set of newly generated tuples formed by applying the evolutionary algorithm. These tuples are then passed to the PFEM, which replicates each one, in turn, across multiple machines. Each machine then executes FCM and sends a fitness value back to PFEM. The PFEM applies a more complex function to all the fitness values to produce the final fitness, which is returned to the EAEM, and the cycle continues. Figure 5 gives the high-level system architecture depicting all the modules.

Fig. 5 System architecture of FEASO

4.2.1 Module I: evolutionary algorithm execution module (EAEM)

This module is a vital part of the entire architecture, as it runs on the server machine and is responsible for the generation of offspring. The process starts with the generation of an initial population, represented by P. It contains a set of tuples Xi that are produced by Gaussian random mutation of the Xseed value. The fitness computation is performed by PFEM and FCM, as shown in the gray box in Fig. 6. The newly generated population is then sent to the parent selection phase, which selects a set of parents using the rank-based selection scheme. Crossover and mutation of the parents are performed, and the resultant population along with the new offspring is sent to the survival selection phase. The proposed algorithm introduces a random offspring generation phase to increase diversity in the population. This phase makes the algorithm more exploratory in nature and distinguishes it from a traditional EA.

Fig. 6 The process flow of EAEM

The algorithm for the initial generation of P is exhibited in Table 2, and the algorithm for EAEM is depicted in Table 3.

Table 2 Algorithm for initial population
Table 3 Algorithm for EAEM
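Putting the earlier sketches together, one generation of EAEM might look as follows (reusing `rank_based_select`, `one_point_crossover`, `gaussian_mutate`, `truncation_survive` and `X_SEED` from the sketches above); the 20% injection ratio is the value reported in Sect. 5, while the wide-sigma injection is an assumption.

```python
import random

def eaem_generation(population, evaluate, inject_ratio=0.2):
    """One EAEM cycle: rank-based parent selection, one-point crossover,
    Gaussian mutation, random-offspring injection, truncation survival."""
    parents = rank_based_select(population, k=len(population))
    offspring = []
    for i in range(0, len(parents) - 1, 2):
        c1, c2 = one_point_crossover(parents[i][0], parents[i + 1][0])
        offspring += [gaussian_mutate(c1), gaussian_mutate(c2)]
    # Random offspring injection: the exploratory phase that distinguishes
    # EAEM from a traditional EA.
    n_random = int(inject_ratio * len(population))
    offspring += [[d + random.gauss(0.0, 2.0) for d in X_SEED]
                  for _ in range(n_random)]
    scored = [(child, evaluate(child)) for child in offspring]
    return truncation_survive(population, scored)
```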

4.2.2 Module II: parallel fitness evaluation module (PFEM)

This module is responsible for the generation of multiple instances which are executed in parallel. Each instance Ei receives a single tuple Xi and is responsible for its replication across p workstations. After distribution, this module waits synchronously to receive fitness values from FCM. Since FCM is executed on multiple machines in parallel, different fitness values fi are received for the same Xi. The PFEM is therefore responsible for aggregating all fi values into a single value that accounts for the noise and variation in the values received. The consolidated fitness value fi is then returned to EAEM. The algorithm in Table 4 explains the working of this module.

Table 4 Algorithm for PFEM
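A minimal sketch of PFEM, assuming each of p workers plays one match with the same tuple and the per-match fitness values are averaged; the paper describes "a more complex function" for the aggregation, so the mean here is a stand-in.

```python
from concurrent.futures import ProcessPoolExecutor
from statistics import mean

def pfem_evaluate(x_i, run_match, p=3):
    """Replicate x_i across p workers in parallel, collect the noisy
    per-match fitness values f_i and aggregate them into one value."""
    with ProcessPoolExecutor(max_workers=p) as pool:
        f_values = list(pool.map(run_match, [x_i] * p))
    return mean(f_values)  # stand-in for the paper's aggregation function
```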

4.2.3 Module III: fitness computation module (FCM)

FCM is responsible for the computation of fitness corresponding to each Xi. This module is deployed on every client machine and performs the following sequence of steps, also depicted in Fig. 7:

Fig. 7 Phases of fitness computation module

  1. A code adapter configures the variables of Xi into the developed code of the RoboCup Soccer Simulation 3D League team, namely Karachi Koalas (Haider et al. 2014).

  2. The Simspark (Xu and Vatankhah 2013) server is used to execute a simulation match between two teams, T1 and T2, and to generate a log file.

  3. A parser application extracts the required game statistics from the recorded log file; the computed fitness is returned to the PFEM.

The algorithm for Fitness Computation Module is shown in Table 5.

Table 5 Algorithm for FCM
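A hypothetical sketch of the three FCM steps on a client machine; the helper callables and the `run_match.sh` script name are invented for illustration, since the paper specifies only the sequence of steps. A candidate `compute_fitness` is sketched in Sect. 4.3 below.

```python
import subprocess

def fcm(x_i, write_strategy_config, parse_log, compute_fitness,
        log_path="match.log"):
    """FCM on one client, mirroring Fig. 7:
    1) the code adapter writes X_i into the team's configuration,
    2) a Simspark match between T1 and T2 is executed, producing a log,
    3) the parser extracts game statistics and the fitness is computed."""
    write_strategy_config(x_i)                      # step 1 (hypothetical helper)
    subprocess.run(["./run_match.sh"], check=True)  # step 2 (hypothetical wrapper script)
    return compute_fitness(parse_log(log_path))     # step 3, cf. Table 6 and Eq. 4
```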

4.3 Fitness function

This section describes the factors that contribute to the fitness function of a solution.

4.3.1 Goal scored against defending team

The defending team should be able to reduce the number of goals scored by the opponent. Equation 1 describes the goal difference, where Scoreh denotes the number of goals scored by the home team, while Scoreo denotes the goals scored by the opponent.

$$ \text{GoalScored} = \text{Score}_{h} - \text{Score}_{o} $$
(1)

4.3.2 Ball possession by home team players

The defending team should maintain possession of the ball. For this research, it is assumed that the ball possessor is the player having the minimum distance from the ball. Equation 2 describes ball possession, where Cph denotes the number of cycles the ball is with a home team player and Tc denotes the total number of simulation cycles.

$$ \text{BallPos} = b_{p} = \text{Cp}_{h} / T_{c} $$
(2)

4.3.3 Average time ball out of danger zone

The defending team should be able to keep the ball out of the danger zone most of the time. For this purpose, the field has been divided into four regions, namely LeftRegion1, LeftRegion2, RightRegion1 and RightRegion2, as shown in Fig. 8. The home team in this scenario is initialized on the left-hand side of the field and the opponent on the right-hand side. The danger zone for the home team is LeftRegion1. Equation 3 describes the average time the ball is out of danger, where CLeftRegion2 denotes the number of cycles the ball is in LeftRegion2 and CLeftRegion1 denotes the number of cycles the ball is in LeftRegion1.

$$ \text{Avg}\,C_{\text{outofdanger}} = C_{\text{LeftRegion2}} / \left( C_{\text{LeftRegion1}} + C_{\text{LeftRegion2}} \right) $$
(3)
Fig. 8 Regions in a simulated soccer field

To evolve a good defensive strategy, we focus on instances where the ball is in LeftRegion1 and LeftRegion2. However, at times a situation may arise in which the ball remains mostly in the right regions. Thus, a solution is penalized if the ball spends more than δ cycles in the opponent half. The fitness function that needs to be maximized is shown in Eq. 4.

$$ f(x) = \text{GoalScored} + b_{p} + \text{Avg}\,C_{\text{outofdanger}} - \delta_{\text{cycles}} $$
(4)

The range of the fitness function comes out to be [−2, +2]. To visually verify the fitness values, the logs of a few matches were manually replayed.
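Under Eqs. 1-4, the fitness of one evaluated match can be assembled as below; the exact trigger for the δ penalty is not stated in the paper, so the threshold and penalty magnitude are assumptions.

```python
def compute_fitness(score_h, score_o, cycles_home_possession, total_cycles,
                    c_left1, c_left2, cycles_in_right,
                    delta_threshold=150, penalty=1.0):
    """Fitness per Eq. 4: goal difference + ball possession
    + average time out of danger - delta-cycles penalty."""
    goal_scored = score_h - score_o                                    # Eq. 1
    ball_pos = cycles_home_possession / total_cycles                   # Eq. 2
    left_cycles = c_left1 + c_left2
    avg_out_of_danger = c_left2 / left_cycles if left_cycles else 1.0  # Eq. 3
    # Assumed penalty rule: ball spent too long in the opponent half.
    delta_pen = penalty if cycles_in_right > delta_threshold else 0.0
    return goal_scored + ball_pos + avg_out_of_danger - delta_pen
```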

4.4 Parser application

A parser application that extracts the three factors mentioned above was developed earlier (Larik and Haider 2012); its algorithm is shown in Table 6.

Table 6 Algorithm for parser application

5 Experimental setup and results

To demonstrate the effectiveness of FEASO, a series of experiments were designed and executed to investigate:

  • the parameters of the evolutionary algorithm,

  • the evolved strategy, and

  • the performance of the evolved team against some benchmark scenarios.

The above-mentioned aspects are discussed in subsequent subsections.

5.1 Design and analysis of results for parameters of evolutionary algorithm

Initially, experiments were conducted with population sizes varying from 5 to 15. It was observed that with a small population there was little diversity in the solutions, while with a larger size, the learning process became slow. Next, between one and five fitness evaluations per solution were tried to determine the number that works best. It was found that, due to the noisy environment, a single fitness evaluation produced erratic results, while using more than three evaluations was very time-consuming. Thus, the population size was set to 10, while a solution was evaluated three times (that is, three matches were played with the same solution parameters) to get its final fitness. The hardware and software specifications are shown in Table 7.

Table 7 Specifications for FEASO

The teams that participate in the RoboCup event release their working executables, known as binaries, and after completion of the tournament, these binaries are made public. For the experiments, the binary of team RoboCanes, a leading RoboCup Soccer 3D team from the University of Miami that has been participating since the beginning of the league, was chosen. The reason for its selection was its strong attacking capability. The code of the home team, Karachi Koalas, was deployed on all the workstations, each capable of executing 20 consecutive matches of 10-min duration. Extensive shell scripting was performed for the distributed evolutionary algorithm to minimize the processing time. Karachi Koalas was initialized on the left-hand side and RoboCanes on the right-hand side of the simulated field. During the experiments, the evolutionary algorithm was run for 25 generations. The initial population was generated by applying Gaussian random mutation to the seed solution. For parent selection, the RBS scheme was implemented. The one-point crossover technique was employed because of its simplicity and because no obvious advantage was observed in initial experiments with higher-order n-point crossover. The mutation rate was varied between 0.1 and 0.5, and 0.3 was selected as the final rate. For survival selection, the truncation scheme with the steady-state model was used. For injecting random solutions, a ratio of 20% of the population size was chosen intuitively. The final features of the EA are shown in Table 8.

Table 8 Features of evolutionary algorithms
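For reference, the final configuration described above (and listed in Table 8) can be gathered into a single structure, e.g.:

```python
EA_CONFIG = {
    "population_size": 10,
    "generations": 25,
    "evaluations_per_solution": 3,          # matches played per candidate
    "parent_selection": "rank-based (RBS)",
    "crossover": "one-point",
    "mutation_rate": 0.3,
    "survival_selection": "truncation, steady-state",
    "random_injection_ratio": 0.2,          # 20% of population size
    "initialization": "Gaussian mutation of the seed solution",
}
```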

The input parameters for execution of EAEM, as discussed in Sect. 4, are presented in Table 9.

Table 9 Input parameters for EAEM

To compute the fitness of a solution, a match of 300 cycles, corresponding to 5 min, was simulated on different machines. It took almost 40 min to complete the evolution of a single generation. Initially, the evaluation function considered only the goal difference, but that was not viable, as many matches ended without a goal. Next, the average time the ball was out of the danger region was added to the fitness function, but then higher fitness was assigned to solutions in which the ball mostly moved toward the opponent region. Thus, a penalty was imposed to cut down such solutions. Figure 9 depicts the average-so-far and best-so-far curves, where the blue line represents the best-so-far graph, while the red line signifies the average achieved in each of the 25 generations. As evident from the graph, the best fitness value of 1.2 was achieved during the first five generations, and the same value dominates toward the end. This signifies that most of the improvement was made in the initial generations.

Fig. 9 Average-so-far and best-so-far curves

The average-so-far graph, on the other hand, started with a negative fitness value, but due to the truncation-based survival mechanism, the average fitness of the population had improved by the end of the experiments. Next, the best individuals of the fifth generation were compared with the best individuals of the twenty-fifth generation to see whether there is significant diversity in the population. The results shown in Fig. 10 demonstrate the proof of concept, where the blue line represents the fifth generation and the red line the twenty-fifth generation. Although the tenth individual of both populations achieved the best fitness value of 1.2, there is diversity among the rest of the individuals.

Fig. 10 Best individuals of the fifth generation versus best individuals of the twenty-fifth generation

5.2 Analysis of the evolved strategy

For analyzing the evolved strategy throughout the generations, the factors contributing to the overall fitness of a solution were given due consideration. At first, only the goals scored against the home team were considered as the fitness criterion, under the assumption that if the number of goals conceded is reduced, then the learned strategic positions are better. However, if matches ended without a goal, it was difficult to infer anything about the overall team strategy. To overcome this issue, ball possession by the home team was considered a metric for good defense. Moreover, the region where the ball resides also influences the positioning of the agents; thus, the instances in which the defenders were able to keep the ball away were also incorporated in the fitness computation. Since the log files of the matches were available, the files of the best-fitness solutions were replayed to manually verify whether there was a significant difference between the initial and final generations. The matches started with the opponent team kicking the ball into the defensive region of the home team. In the initial phases of strategy evolution, as shown in Fig. 11, if this kick from the opponent team entered the danger zone and five of the opponent agents rushed toward the defensive half, the home team defenders were able to stop the goal.

Fig. 11 Initial phases of strategy evolution

Later, as the generations evolved, it was observed that the defenders considered their distances to the ball and the goal post and aligned themselves in such a way that they were able to block the path between the ball and the goal, as depicted in Fig. 12. The role of the defender at the center was of great importance because, whether the ball came from the left or the right, it should be able to clear the ball toward the opponent half.

Fig. 12 Final phases of strategy evolution

In the evolved strategy, the defending agents were able to keep the ball out of their danger zone. It was noted that the placements of both defenders and midfielders when the goal was under attack were also appropriate in the evolved generations. Figure 13 shows a bar chart comparing the original and the evolved strategic points. The blue bars represent the original points, while the orange bars represent the evolved points. A yellow line shows the difference between the two values. As evident from the figure, there are two peaks, and there is a substantial difference between the values of the optimized distances d2 and d7.

Fig. 13 Bar chart with a line showing the difference between original and evolved strategic points

The distance d2 corresponds to the right midfielder's distance from the goal post, and d7 to the center defender's distance from the ball. Thus, these two play a vital role in the strategic placement of the team.

5.3 Benchmark scenarios

This section reports a series of matches analyzing the impact of evolution on team performance. A set of benchmark scenarios was created to test the applicability of FEASO. For this purpose, three variants of the Karachi Koalas team were devised as follows:

  • KKRandom: a team that decides strategic points randomly,

  • KKBase: a team that uses the heuristic seed values that formed our viable working strategy in the RoboCup events, and

  • KKEvolved: a team that uses the optimized variables obtained via the FEASO approach.

The selected opponent team is the same RoboCanes team that was used for offline strategy evolution.

5.3.1 Competition among the KKRandom, KKBase and KKEvolved

In this experiment, approximately 120 matches were executed involving the three team variants. Each match lasted 220 cycles, amounting to a 5-min duration, under the same time constraints and against the same opponent used in the learning phase. Table 10 shows the summarized results obtained from 40 matches with each of the teams.

Table 10 Summarized results of 40 matches executed between variants of KK

The results demonstrate that the evolved team is able to defend more goals compared to the KKRandom and KKBase teams. Furthermore, it was noted that the opponent team normally scored a goal against both KKRandom and KKBase within around 120 cycles, but against KKEvolved it was either unable to score at all or, when it did score, the goal came after 180 cycles of the match. The percentage of ball possession against each of the teams was also observed and provides further proof of concept that the evolved team's defenders are able to retain ball possession better than those of the other two teams.

Table 11 examines the statistical significance of the differences between the KKRandom, KKBase and KKEvolved teams on the basis of the goals-scored-against attribute. The p value indicates that the KKEvolved team is significantly better than the other two teams.

Table 11 Statistical test on the significance of results obtained

5.3.2 Competition with other teams participating in RoboCup tournament

This section compares the strategy obtained via evolutionary algorithms with the other approaches discussed in the literature review. Experiments were conducted with three teams, namely A, B and C. Team A uses a reinforcement learning strategy, team B uses a handcrafted strategy, and team C uses a layered learning approach. The entire focus of this paper is on learning a defensive strategy; thus, the objective is not to win against other strategies but to show that the evolved KKEvolved team performs better than the benchmarked KKBase team. The purpose of this comparison is to demonstrate that the evolved strategy can be generalized to other opponent teams.

Ten matches each were executed for KKEvolved versus team A, KKEvolved versus team B and KKEvolved versus team C, each lasting 5 min. Similarly, ten matches each were executed for KKBase versus team A, KKBase versus team B and KKBase versus team C under the same time constraint. Tables 12, 13 and 14 exhibit the summarized results.

Table 12 Summarized results of 20 matches executed between team A, KKBase and KKEvolved
Table 13 Summarized results of 20 matches executed between team B, KKBase, and KKEvolved
Table 14 Summarized results of 20 matches executed between team C, KKBase, and KKEvolved

It should be mentioned that all three teams A, B and C are strong offensive teams, with team C being the champion team of the RoboCup event. The results show a marked performance improvement for the KKEvolved team, indicating that the method is effective: the strategy learned against a single opponent generalized to other, similar opponents.

6 Comparison with most relevant evolutionary approaches

This section discusses the two evolutionary approaches that are similar to the presented FEASO framework.

As highlighted in Sect. 2, the work of Chen et al. (2016) and Ali et al. (2014) is the most recent and relevant in terms of the evolution of strategy in robot soccer. A comparison among these approaches and FEASO is drawn here to signify the novelty of the proposed framework.

6.1 FEASO versus MACO

A multi-group ant colony optimization (MACO) algorithm was proposed (Chen et al. 2016) that used ant intelligence to learn offensive strategies in the 2D soccer simulation league. In that work, three models were learned, namely a shooting model, a passing model and a dribbling model, based on the foraging behavior of ants as an effective cooperative strategy. The field was divided into regions, and the focus was on deciding the best model to pick based on the pheromone evaporation preference value. The attack environment was simulated with two attackers, two defenders and one goalkeeper in the field. The positions of all five players were set randomly, and training was conducted for up to 15,000 iterations in a single match. For evaluation of the proposed strategy, matches were conducted between the baseline team and the evolved team, concluding that the evolved MACO team enjoyed a 100% winning advantage. The approach is similar to FEASO in that we have also conducted the same kind of experiments with varying opponent teams for performance evaluation. It differs from FEASO, however, in that FEASO utilizes a distributed evolutionary algorithm for learning defense, with a focus on minimizing the number of goals conceded. Secondly, ACO falls under swarm intelligence, in which a leader dictates the behavior of the whole swarm; in the described approach, the leader is the attacker that takes the decisions. In the FEASO approach, there is no leader, and the team's strategic positioning is governed by the optimization algorithm. The MACO approach considers only a reduced version of the team for learning strategy, whereas this paper uses the complete team, which makes the optimization process much more complex. A known problem with ACO is the long convergence time required by the algorithm, whereas the FEASO approach reaches a good solution within a few generations. Finally, the MACO approach has been tested in a simple 2D environment with five players in the field; as described earlier, the 3D domain is much more complex due to its humanoid locomotion, and FEASO provides fruitful results while working in the dynamic, uncertain environment of simulation 3D.

6.2 FEASO versus adaptive CA

Ali et al. (2014) contributed the use of a simplified and adaptive version of CA to develop defensive and offensive plays in simulated robot soccer. The agents were able to develop the most suitable team formation by utilizing a set of finite state machines in the field. The number of goals scored was used as the fitness. For each individual in the population, a set of actions and a set of regions were defined. At the start of the CA, all individuals were initialized randomly. The goal was to train the team to choose the best states depending on the scenario faced. The belief space comprised general behaviors associated with the overall plan of the team. Experiments were conducted to test the effectiveness of the approach over 150 generations. After every 15 generations, statistics were computed that indicated the best regions and the best states. For performance evaluation, matches were executed against multiple opponent teams that played either offensively or defensively, and the team that used CA played better in terms of both defense and offense. The CA-based approach is very relevant, as this paper also conducts matches against multiple opponent teams and uses goals scored for fitness evaluation. However, FEASO differs in that, besides goals scored, it also considers other parameters for fitness evaluation. Secondly, the authors of the CA-based study stated that the simulator provides noisy data that affects the performance of the CA algorithm; to overcome this limitation, they developed a simplified version of the simulator in which the actions of all agents are known. In the FEASO approach, the evolutionary algorithm is robust to noise, and the approach learns strategy in an environment that contains the interactions of 22 agents at any given time instance. In addition, CA-based approaches require some knowledge in the form of a belief space, whereas this paper starts with a simple heuristic-based seed value and requires no prior knowledge at the time of evolution. Lastly, FEASO is scalable and less time expensive due to its distributed nature, a property missing in the CA-based approach. Table 15 highlights the comparison between the proposed FEASO and the two evolutionary approaches discussed in this section.

Table 15 Comparison of FEASO with most relevant approaches

7 Conclusion

This paper proposes an evolutionary algorithm-based framework for strategy optimization. RoboCup Soccer Simulation 3D was selected as the test bed for the experiments. The team evolved using the proposed EA showed better defensive capability than the original team. The simulation results show improvements in the solutions over the generations, which is encouraging and provides an initial proof of concept. The performance of the evolved team was compared against other approaches, and it was found that the evolved team defended better in terms of goals conceded. Future work involves testing and validating this optimization process on several other teams. Furthermore, a similar approach will be applied to learn offensive strategies. Finally, a multi-objective evaluation function will be formulated to handle both offensive and defensive scenarios.