1 Introduction

Socio-technical systems are all around us, and increasingly computers are an intrinsic part of these systems [1]. Smart-grids [2], vehicular ad-hoc networks [3, 4], smart buildings [5], e-procurement [6], cloud computing [7], healthcare [8], and transport [9] are some examples of domains where humans interact with computer systems to achieve a particular goal. Due to the speed, complexity, and frequency of decision-making involved in these domains, computers are often required to be autonomic and adaptive to the dynamic environment around them. Indeed, this is a foundational reason why we create self-adaptive systems. Socio-technical systems were first defined by Emery and Trist [10] as a complex interaction between humans, machines and their joint environment. This means that design choices made for the construction/selection of computer programs also significantly affect the humans in the system.

A commonly accepted definition of an agent is given by Wooldridge [11]: “An agent is a computer system that is situated in some environment, and that is capable of autonomous action in this environment in order to meet its design objectives”. Computer systems that function in socio-technical systems are increasingly agents that are, by definition, autonomous. However, in a socio-technical system filled with such agents, each with its own set of constraints, goals and strategies, the resulting collective system may not be predictable. Rather, the multiplicity of agents, their possible actions, and corresponding feedback loops could result in a complex system [12]. This makes the aggregate outcome of individual actions dynamic and difficult to predict. An action taken by an agent at a particular point in time may have an outcome that is vastly different from the same action at a different point in time. In many such systems, agents make (or recommend) decisions that result in allocation of common pool resources [13]. That is, resources in a social context are divided amongst actors/entities in a manner that achieves some objective. In such systems, less-than-ideal outcomes are most clearly visible where some resource is being allocated, and there is no centralized mechanism to ensure that autonomous actions by each agent do not result in a disastrous allocation at the aggregate level. This is especially so in cases where the agents are not, a priori, designed to be cooperative (e.g., traffic jams, where each vehicle is exhibiting individually rational behaviour). In many domains, the natural model is a competitive set of agents where each agent attempts to fulfill its goals, without regard to the effect it has on the system-as-a-whole. Indeed, if the system is large enough, the agent cannot even view the system-as-a-whole. Examples of such domains include traffic, energy markets, financial markets, etc. In such domains, multi-agent system designers need a mechanism that can harness the multiplicity of agents to produce good outcomes, even where the agents are not explicitly designed to cooperate.

In other words, resource allocation mechanisms in multi-agent systems must guard against some agents grabbing more than their fair share of resources, without actually trampling on the autonomy of the agents themselves. Typically, resource allocation in multi-agent systems is analyzed for two types of aggregate outcomes: (a) efficiency, and (b) social welfare. Efficiency is quantified as Pareto efficiency: an allocation is Pareto-efficient if it is not possible to improve the allocation for some agents without making it worse for some other agents. A Pareto-efficient allocation, therefore, represents a situation where the solution cannot be bettered without hurting at least one agent. Social welfare is usually codified as envy-freeness: no agent would rather hold the resource bundle held by another agent [14]. For example, consider a route-choice problem through a city, i.e., pick road A or road B to reach a destination. This is a familiar situation in many cities, where multiple humans, utilizing commonly known (albeit coarse-grained) information about traffic flows, along with their own private heuristics (e.g., road A is generally congested from 7:30-8:30 am), make route choices to reach their destination. These agents may use additional machine intelligence in the form of real-time traffic updates through GPS devices/smartphones, and shortest-path algorithms, while making their choices. The reward for this choice is determined dynamically, based on the aggregate number of agents that make a particular choice without prior coordination, i.e., the agent that chooses the ‘road less travelled’ experiences lower congestion. There is no way to force an agent to make a particular choice, or to choose a particular strategy. However, regardless of the strategy used to pick a route, it must yield a satisfactory outcome for all agents, at least some of the time, else some agents would be envious of others. Envious allocations are not a good outcome in socio-technical systems, in contrast to efficiency-based problem formulations, where the only concern is to optimize the traffic flow through a given route. As pointed out by Chevaleyre et al., in some systems it is impossible to allocate resources (roads, in this case) such that the result is both efficient and envy-free [15]. Furthermore, in certain simultaneous games, even envy-freeness alone is impossible to guarantee [16].

This paper is concerned with ameliorating the negative effect of the impossibility of envy-freeness in such games, by enabling long-term fairness through cumulative allocations. It is in this context that we introduce the idea of fairness. We distinguish fairness from envy-freeness by observing that a series of envious allocations may, over time, turn out to be cumulatively fair. We invoke Rescher’s notion of fairness through distributive justice [17] over the total resources allocated amongst all agents. All agents have a legitimate claim on the road, and distributive justice consists of reconciling these claims in the face of a plurality of allocation choices. Rescher says, “... a claim represents what an individual ought ideally to be given in the light of the principles of justice ...” [18]. In the case of the route-choice problem, this would be the evenness of the rewards (experiencing lower congestion) that accrue to drivers over multiple days. That is, even though it is impossible to ensure that all users in a road network get the same rewards on any given day, it is sufficient to ensure that, over time, all users get approximately the same cumulative reward. The lack of envy-freeness on any given day is compensated by the distribution of envious allocations over time and across agents.

The contribution of this paper is to present a technique that allows us to achieve both efficiency and fairness. We borrow diversity as a metaphor from ecology, and model the multi-agent system as an ecosystem, where each agent is concerned, in a selfish way, only with its own survival. We had previously shown [19] that diversity in an ecosystem of competing agents can drive the collective measures of efficiency and fairness higher. However, the algorithmic diversity demonstrated was achieved via deliberate human design. This is not a scalable mechanism (or indeed, a sustainable one) for long-lived, large multi-agent systems consisting of thousands of agents. In this paper, we present a mechanism (clonal plasticity) that allows agents to self-diversify in an autonomic fashion. We evaluate the diversity achieved via clonal plasticity, and reflect on its scalability.

The rest of this paper is structured as follows: First, we review resource allocation in multi-agent systems and the various ways in which diversity has been used in software systems (Sect. 2). We also review various mechanisms of introducing and sustaining diversity used in other domains of computer science (Sect. 3). Next, we make explicit what we mean by diversity in multi-agent systems, and how the diversity present in a multi-agent system can be quantified (Sect. 4). We consider a multi-agent system, where multiple agents act autonomously to secure resources (Sect. 5) and show that increasing the amount of diversity results in an increased distributive justice amongst the agents (Sect. 6). We then introduce a self-tuning mechanism (clonal plasticity) that measurably increases the diversity amongst the agents (Sect. 7) and formalize the expected fairness from clonal plasticity (Sect. 8). We show experimentally that clonal plasticity does increase diversity, compare its effectiveness against manual diversification, and discuss the scalability of clonal plasticity (Sect. 9). We also contrast it against other phenotypic-diversity generation mechanisms (Sect. 10) as well as non-phenotypic diversity generation mechanisms (Sect. 11). We reflect on the limitations and weaknesses of clonal plasticity (Sect. 12) and finally, we conclude the paper (Sect. 13) with a few thoughts on open questions and potential areas of further research.

2 Related work

We first look at the concept of diversity, as transferred from the ecological sciences, and how it has been implemented in computer systems. Then we review the work on decentralized resource allocation in multi-agent systems.

2.1 Diversity in software systems

In the natural world, diversity has long been considered a contributor to stability of natural ecosystems [20, 21]. While not sufficient to guarantee the stability of an ecosystem, it is nevertheless considered critically important, due to the Insurance Hypothesis [22,23,24]. Briefly, the Insurance Hypothesis relies on the observation that “differential response by species to a perturbation or variation in environment causes a dampening of the effect of the perturbation”. Simply put, an increase in diversity increases the probability that there will exist some species that is able to respond differently to environmental perturbations. The presence of diversity is therefore hypothesized to be an insurance against an ecosystem collapsing due to perturbations.

2.1.1 Security

Researchers have applied the notion of diversity in order to protect systems from attacks and intrusions. The principal use of diversity in these approaches has been in the form of creating a moving target for defense. The idea is to reduce the predictability of the structural layout of programs in memory, thus making it more expensive for attackers to create an attack that can work across a range of systems. Multiple stages in the software development lifecycle, such as creation [25], compilation [26,27,28,29], and deployment [30,31,32,33], have benefited from the introduction of variability.

2.1.2 Search and machine learning

Differential response has also been used to explore different parts of a search space, in order to aid search techniques such as genetic algorithms [34] and particle swarm optimization [35, 36]. The presence of diversity has been used to create ensemble learners that learn faster than other learners [37]. There has been work in the area of support vector machines that incorporates the notion of diversity to create more accurate active-learning [38, 39]. Other computational intelligence techniques, such as Artificial Immune Systems, have also recognized diversity [40, 41] to be integral to their effectiveness. Almost all techniques in the field of search and computational intelligence now explicitly incorporate mechanisms to generate and maintain diversity.

2.1.3 Software engineering

Apart from resilience in the presence of attackers, diversity as a concept has also been investigated for software product lines [42] in the form of design diversity, management of functional diversity using plugins [43], and survivability [44]. Diversity, as a software engineering phenomenon, has also been extensively studied in various surveys [45,46,47].

All of these surveys and papers have considered “differential response by a species” to be the critical element of diversity. However, to the best of our knowledge, there is no prior work that proposes a systematic technique for diversification of multi-agent systems (MAS). This is critically important since agents in a MAS are being used to control and coordinate complex pieces of software, such as smart-grids [2], vehicular ad-hoc networks [3, 4], smart buildings [5], healthcare [8], and transport [9] systems.

2.2 Resource allocation in multi-agent systems

The allocation of resources within a society of agents, such that each individual agent’s preferences are met, forms a multi-disciplinary research area, with inputs from Computer Science and Economics. There are many approaches to realizing an allocation, and each approach has its own properties with regard to three fundamental axes: (a) what kind of resources (divisible/indivisible) are being allocated? (b) what is the allocation procedure to be followed? and (c) what constitutes a good allocation? Each of these axes can be sub-divided further: (i) how are resources and preferences represented? (ii) how do agents negotiate amongst themselves? (iii) what is the computational complexity of the entire allocation process? and so on. Depending on the domain where multi-agent resource allocation (MARA) occurs, different combinations of these properties will take priority over others. The range of domains for which MARA has been used (network routing [48], air traffic control [49], manufacturing [50, 51], transport and logistics [52], e-procurement [53], and even sharing satellite resources [54]) is broad enough that these properties become the critical factors on which the choice of allocation approach in a particular software system hinges.

2.2.1 Types of resources

Different types of resources may be in contention for allocation. Some resources are divisible (e.g., network bandwidth [48]) whereas others are indivisible (e.g., slots for landing aircraft [49]). Some resources are shareable (e.g., images from a satellite [54]), while others are non-shareable (e.g., resources in a factory [50]). Some domains may even have multiple types of resources, but may have business constraints that only allow certain bundles of resources to be allocated (e.g., multi-unit e-procurement [53]).

2.2.2 Allocation approaches

Fundamentally, all allocation procedures can be classified as being centralized or decentralized. Centralized approaches are mostly in the form of combinatorial auctions [55], which may themselves use multiple strategies for trading (e.g., machine-learning [7], game-theory [56]). Decentralized approaches include negotiation protocols such as Contract-Net [57] and its variants (Leveled Commitments [58], Constraint Propagation [59]). Centralized approaches have the advantage of being easy to understand and implement; however, they suffer from their inability to scale. For a large system, consisting of hundreds or thousands of agents, scalability is a critical issue. Decentralized approaches, on the other hand, scale well to large numbers of agents, which are typical of large socio-technical systems such as traffic on city roads. There have also been attempts to extract the best of both worlds by performing resource allocation in a hybrid manner, with a mix of centralized and decentralized decision making. [60,61,62] are all approaches that utilize particle-swarm optimization as a middle ground, with each individual particle behaving in a decentralized manner, along with neighbourhood rules that result in the creation of small hierarchies (partial centralization).

2.2.3 Social welfare

One of the objectives of MARA is the achievement of allocations that reflect the desired preferences of the individual agents. This introduces an axis where the aggregate social distribution of resources is as important as the efficiency and ease of allocation. Concepts from social welfare and social choice theory [63] influence the allocation procedures, and the desired ordering of resource allocations. Bentham’s utilitarianism [64] is concerned with maximizing the total utility in the system. Other theories, such as Rawls’s theory of justice [65], advocate an egalitarian allocation, which is measured by the utility gained by the agent that is worst-off. Other notions of welfare include elitist social welfare (which measures the utility of the agent that is best-off) and maximin ordering (which ranks allocations such that the worst-off agent has the highest possible utility), and these can be used to evaluate the allocations generated by various schemes.

2.3 Distributive justice

The notion of distributive justice has been dealt with in multiple works in the field of social choice theory. However, in the field of computer science, we are aware of only one thread of research [66, 67] that explicitly attempts to ensure distributive justice in multi-agent systems. We view our work as complementary, as we approach the same research theme, but from two different ends of the spectrum. Pitt et al. develop a formal framework for operationalizing distributive justice rules in self-organizing systems [66]. In this work, we present an empirical approach that measures the amount of distributive justice under certain conditions and present a mechanism to autonomically increase this justice in homogeneous systems.

3 Diversity generation mechanisms in literature

As is evident from Sect. 2, diversity has been well-recognized as a mechanism for positively influencing system properties, in many domains. However, there has not been a similar wealth of work on actually generating diversity in a homogeneous system. In the field of security, diversity through randomization [27,28,29,30,31,32,33] has been the most dominant tactic. Most work in automated diversity generation has been in the domain of evolutionary and swarm-based algorithms, where populations (and therefore diversity) are a natural solution concept.

3.1 Evolutionary computing

Search techniques such as Evolutionary Algorithms, Genetic Algorithms, Genetic Programming, etc. all rely on population-based approaches along with fitness functions that aim to mimic the natural process of evolution [68]. In all of these techniques, diversity is a key factor in whether the population is able to converge to a good solution. Note that diversity, per se, is not a requirement; rather, it is a strong observational recommendation. This is echoed by McPhee and Hopper in [69], “Progress in evolution depends fundamentally on the existence of variation of population. Unfortunately, a key problem in many Evolutionary Computation (EC) systems is the loss of diversity through premature convergence”. Diversity, hence, is generated or preserved through various approaches. Cobb and Grefenstette introduced a random immigrants approach [70] where random individuals are inserted into the population in every generation. The introduction of specialized selection operators (such as incest prevention [71] and mating restrictions [72]) also encourages diversity in the population by disallowing genetically similar individuals to mate. Črepinšek et al. [73] classify different strategies for diversity maintenance and diversity control in a population and observe that most methods simply expect an increase in diversity and, hence, a better balance between exploration and exploitation. They, however, caution that not all diversification is useful. Depending on whether more exploration is required or more exploitation is necessary, the levels of useful diversity will vary. That is, increasing diversity by itself does not necessarily mean that the system will perform better.

3.2 Swarm-based optimization algorithms

Swarm-based algorithms have been used for several combinatorial and optimization problems. Algorithms in this category view swarms as any restrained collection of interacting agents or individuals [74]. Particle Swarm Optimization (PSO) was introduced by Eberhart and Kennedy [75] as a mechanism for optimization of non-linear functions. In a similar vein, swarm-based algorithms such as Ant Colony Optimization [76] and Bee Colony algorithms [74, 77] have been proposed as meta-heuristics for solving combinatorial problems in traffic and transportation [78]. The central problem in all of these optimization algorithms is how to avoid local minima in complex search spaces, and achieve global optima quickly. Diversity is usually achieved in these methods using concepts such as ‘critical thresholds’ [79], ‘electro-static charges’ [80], etc. The fundamental idea in these methods is to probabilistically avoid dense neighbourhoods by making particles move away from each other upon triggering a certain event/threshold.

3.3 Machine learning

In the field of Machine Learning, generation of diversity has been explicitly considered in ensemble-based methods. Diversity of neural network architectures has been investigated through the process of speciation [81]. In [81], the problem of combining multiple artificial neural networks is tackled via a Darwinian evolutionary process along with fitness sharing as the key ingredient that promotes speciation, and thus the creation of diverse neighbourhoods. Their mechanism specifies a population-wide calculation of fitness, and subsequent adjustment of individual-fitness based on the sharing radius. Fitness sharing penalizes individuals in dense neighbourhoods, while individuals in sparse neighbourhoods are promoted, thus increasing the probability that diverse individuals are carried forward into the next generation.

Concept Drift is an important topic in online learning mechanisms, where classifiers that learn on a particular dataset are subsequently required to classify using variables that have changed with time. Minku et al. [82] showed that while ensembles with low diversity showed lower test errors, they were unable to cope with a dynamic environment and consequent concept drift. Ensembles with higher diversity were able to cope better, regardless of the type of drift encountered. Subsequently, they proposed a method that used four ensembles with two levels of diversity [37] to successfully handle environments both without drift as well as with concept drift. The levels of diversity are generated using different values for presenting a training example. These different values (drawn from a Poisson distribution) result in different ensembles learning different classifications.

4 Algorithmic diversity

Decision-making and resource allocation mechanisms are all realized by various algorithms, and hence we view algorithmic diversity as an appropriate level of granularity for achieving our goals. We define Algorithmic Diversity as a variability in the output produced by a set of algorithms when faced with the same input/input-sequence. We do not consider differences in the internal data structures used (stacks, queues, trees, etc.) or control flow implemented (iteration, recursion, continuations, labelled jumps, etc.) for two reasons:

  1. Detecting whether two algorithms are the same, even when they are exceedingly simple in their construction, is so hard that for practical purposes it is infeasible [83].

  2. More philosophically, from the perspective of the environment (or other agents), if two systems produce exactly the same output (behaviour), regardless of the input sequence, then they are identical, regardless of their internal construction.

This definition positions us in domains where there are multiple valid answers and no deterministic ways of calculating the optimal response in advance. Autonomous systems fall squarely into this category since the environment in which they operate is dynamic in nature. Traffic flow management systems are a good example of dynamically changing environments. Each vehicle in traffic has a (possibly) different source and destination, and different strategies for how to get there (e.g., fastest route, shortest route, least polluted route, etc.). It is in these scenarios that aggregate outcomes such as efficiency and fairness are important. While autonomous agents may negotiate amongst themselves to reach an efficient allocation of road resources, it is not necessary that this allocation is fair. It is in this context that we hypothesize that diversification of algorithms leads to a fairer distribution of rewards amongst the population, regardless of the actual algorithms in use. This hypothesis was first formulated and tested in [19]. In this work, we extend the results as well as introduce an autonomic mechanism to allow agents to self-diversify. We also compare the effectiveness of the autonomic mechanism against the earlier manual efforts.

We use a self-organizing game called the Minority Game (MG) [84] as our exemplar multi-agent system in a traffic setting, primarily because it has been well-studied and well-understood. The Minority Game, introduced by Challet and Zhang, consists of an odd number (N) of agents, playing a game in an iterated fashion. At each timestep, each agent chooses from one of two actions (A or B), and the group that is in the minority wins. Since N is an odd integer, there is guaranteed to be a minority. Winners receive a reward, while losers receive nothing. After each round, all agents are told which action was in the minority. This feedback loop induces the agents to re-evaluate their action for the next iteration.

5 Minority game

While simple in its conception, MG has been used in many fields, such as econophysics [85,86,87], multi-agent resource allocation [88, 89], emergence of cooperation [90], and heterogeneity [91, 92]. The Minority Game is intuitively applicable to many domains, such as traffic (the driver that chooses the ‘less-travelled’ road experiences less congestion), packet traffic in networks, ecologies of foraging animals, etc. MG has been acknowledged to be an example of a complex adaptive system, where the individual behaviour is simple to understand, yet the aggregate dynamics are complex; it has served as a proxy in multiple domains such as financial markets [93], distributed decision-making in wireless networks [94], and smart-buildings [95].

5.1 Description of the game

There is a set of \(N = \{1, ..., n\}\) agents (\(N\) odd), each implementing an algorithm from the set \(A = \{a_1,...,a_n\}\). At every iteration of the game \(G_i\), each agent chooses from a binary set \(\{A,B\}\). Thus, at each iteration i of the game (G):

$$\begin{aligned} G_i (a_j) \rightarrow A\vert B \end{aligned}$$

refers to agent j making a binary choice from A or B. If the subset of agents that chose A is smaller than the subset of agents that chose B, then the agents that chose A are said to be in the minority (\(minG_i\)) at iteration i. And vice versa. The agents that are in minG for any particular iteration, earn a reward of one unit. The agents that are not in minG receive no reward. If \(R_i\) is the reward earned by an agent for a particular iteration, then the total reward earned by an agent after T iterations is given by:

$$\begin{aligned} \sum _{i=1}^{T} R_i (a) \end{aligned}$$

Clearly, the allocation of rewards is most efficient when the size of \(minG\) is \(N/2\), and least efficient when the size of \(minG\) is 1 (one agent chooses A, and all the rest choose B).
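
To make the reward dynamics concrete, the following minimal Python sketch (our own illustration; the random-choice agents are placeholders for the algorithms discussed in Sect. 6) plays T rounds of MG and allocates a unit reward to whichever side is in the minority:

```python
import random

def play_round(agents):
    """Play one round of the Minority Game.

    `agents` is a list of no-argument callables, each returning 'A' or 'B'.
    Returns per-agent rewards for this round (1 for the minority side, 0 otherwise).
    """
    choices = [agent() for agent in agents]
    count_a = choices.count('A')
    minority = 'A' if count_a < len(choices) - count_a else 'B'  # N is odd, so a strict minority exists
    return [1 if c == minority else 0 for c in choices]

# Example: 501 agents that choose uniformly at random (placeholder strategies).
N, T = 501, 2000
agents = [lambda: random.choice('AB') for _ in range(N)]
totals = [0] * N
for _ in range(T):
    for i, r in enumerate(play_round(agents)):
        totals[i] += r
print("median cumulative reward:", sorted(totals)[N // 2])
```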

5.2 Measuring algorithm diversity

In MG, each agent shares the same functional objective: to continue to be on the side of the minority, and thereby accumulate a reward. The difference between agents is the algorithm used to compute the best action (A or B). Depending on the type of algorithm used, differences in initialization parameters could lead to different actions being chosen (e.g., in evolutionary or machine learning algorithms). Thus, any algorithm that exhibits a different output-sequence is categorized as a different variety. The calculation of diversity in the game depends only on the variety and balance in the population of agents (implementing those algorithms). This calculation is most famously done by the Shannon Index (Eq. 1). The Shannon Index measures the entropy (or uncertainty) in picking an individual of a particular type from a dataset [96]. Thus, for instance, if all the agents in a MAS implement exactly the same algorithm, the entropy (or uncertainty) will be zero. The higher the diversity in the population, the higher the Shannon Index. In Eq. 1, R is the number of types in the population, and \(p_i\) is the proportion of individuals in the population with type i.

$$\begin{aligned} H' = -\sum _{i=1}^R p_i \ln p_i \end{aligned}$$
(1)

The Shannon Index is a measure of both the richness of species (the number of different algorithms) and the evenness of species (the abundance of agents implementing each algorithm). We use the Shannon Index to measure how much algorithms differ in their key parameters. Thus, if a particular initialization parameter (one that has an effect on the output-sequence) can take multiple values, the Shannon Index will measure how many such differentiated types exist and how many individuals implement each type, as a proportion of the total population.
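
As a concrete illustration (our own sketch; the type labels are hypothetical), the Shannon Index of Eq. 1 can be computed directly from a list of agent types, where a type is any description of an agent's algorithm and its diversified parameters:

```python
import math
from collections import Counter

def shannon_index(types):
    """Shannon Index H' = -sum(p_i * ln(p_i)) over the types present in the population."""
    counts = Counter(types)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# A homogeneous population has zero entropy ...
print(shannon_index(['BestPlay-k4'] * 10))
# ... while a mixed population of types has a higher Shannon Index.
print(shannon_index(['BestPlay-k4'] * 4 + ['Evo-m0.05'] * 3 + ['RothErev'] * 3))
```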

6 Effect of algorithmic diversity

In this section, we recap the important aspects of the results from [19]. Diversification of algorithms was performed on two levels: (a) Parameter Diversification, and (b) Strategy Diversification. Three algorithms were chosen as the candidate algorithms:

  1. BestPlay Introduced by the creators of the Minority Game, this is a simple strategy that nevertheless exhibits interesting dynamics at the aggregate level [84]. In this strategy, all players had access to a bounded collective memory (m) of the game, along with a pool of strategies. Collective memory is implemented in the form of a history of which action (A or B) was in the minority at each timestep. Each player drew its strategies randomly from a strategy space; each strategy indicated which action to play next, based on the history of the game. Each player allocated virtual points amongst all its strategies, based on the continually changing history of the game. Strategies that predicted the correct output, regardless of whether they were used or not, received virtual points. Each player, for the next round, used the top-ranking strategy amongst its pool of strategies. Table 1 shows a sample strategy pool used by a player (a code sketch of such a pool appears after Table 1). The size of each strategy depends on m, the memory of the game, while the total number of strategies in the pool depends on the computational power the agent possesses. Table 1 is therefore taken from a game with a memory of size 3, while the agent has a pool size (k) of 4. Typically, each agent in the entire game has the same pool size (k). While it is intuitively obvious that the greater the pool size, the greater the probability of an agent drawing good strategies, it is impossible for an agent to have a pool size that exhaustively covers every possible strategy, since the strategy space grows hyper-exponentially (\(2^{2^{m}}\)) with an increase in m.

  2. Evolutionary Evolutionary algorithms have been used quite successfully in multiple domains [97, 98]. As pointed out in [99], evolutionary algorithms are now being compared to human beings in their ability to solve hard problems. We have implemented an evolutionary algorithm that uses BestPlay’s strategy representation as a genome and performs crossover and mutation on it. The strategy is as follows: at every reproduction cycle, the ten most poorly performing agents get rid of their strategies and adopt a combination of strategies from two parents randomly selected from the ten best performing agents. Effectively, two of the ten best performing agents have reproduced to create an offspring that replaces one of the ten worst agents. The offspring mutates the strategies obtained from its parents, with some small mutation probability. This allows the offspring to get better, as well as explore more of the strategy space. The reproduction cycle refers to the rounds after which evolution occurs. A frequency of 20 cycles means that after every 20 rounds of the game, the worst 10 agents discard their strategies and inherit good strategies from 2 of the top 10 agents. These are then mutated according to the agents’ own mutation probability.

  3. Roth-Erev We implement a Reinforcement Learning algorithm, called Roth-Erev [100]. Reinforcement Learning (RL) is a class of machine-learning algorithms where an agent tries to learn the optimal action to take, in a certain environment, based on a succession of action-reward pairs. Given a set of actions (a) that an agent can take, the Roth-Erev learning algorithm performs the following steps (a minimal code rendering of this update rule appears after this list):

     (a) Initialize initial propensities (q) amongst all actions (N)

     (b) Initialize strength (s) of initial propensities

     (c) Initialize recency (\(\varphi \)) as the ‘forgetting’ parameter

     (d) Initialize epsilon (\(\varepsilon \)) as the exploration parameter

     (e) Choose action (\(a_{k}\)) at time (t), based on q

     (f) Based on reward (\(r_{k}\)) for \(a_{k}\), update q using the following formula:

       $$\begin{aligned} q_{j}(t + 1) = [1 - \varphi ]q_{j}(t) + E_{j}(\varepsilon ,N, k, t) \end{aligned}$$
       $$\begin{aligned} E_{j}(\varepsilon ,N, k, t)&= {\left\{ \begin{array}{ll} r_{k}(t)[1 - \varepsilon ] &{} \text {if } j = k \\ r_{k}(t)\frac{\varepsilon }{N - 1} &{} \text {otherwise} \end{array}\right. } \end{aligned}$$

     (g) Probability of choosing action j at time (t) is given by:

       $$\begin{aligned} P_{j}(t) = \frac{q_{j}(t)}{\sum _{n=1}^{N}[q_{n}(t)]} \end{aligned}$$

     (h) Repeat from step (e)
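
The following Python sketch (ours; the parameter values are illustrative, not the settings used in the experiments) renders the Roth-Erev update rule above for a two-action agent:

```python
import random

class RothErev:
    """Minimal Roth-Erev learner over N discrete actions (a sketch, not the authors' code)."""
    def __init__(self, n_actions, strength=1.0, recency=0.1, epsilon=0.2):
        self.n = n_actions
        self.q = [strength] * n_actions   # initial propensities
        self.recency = recency            # 'forgetting' parameter (phi)
        self.epsilon = epsilon            # exploration parameter

    def choose(self):
        """Choose an action with probability proportional to its propensity."""
        return random.choices(range(self.n), weights=self.q)[0]

    def update(self, chosen, reward):
        """q_j(t+1) = (1 - phi) * q_j(t) + E_j, with E_j as defined above."""
        for j in range(self.n):
            if j == chosen:
                e = reward * (1 - self.epsilon)
            else:
                e = reward * self.epsilon / (self.n - 1)
            self.q[j] = (1 - self.recency) * self.q[j] + e

# One learning step: choose an action, observe the reward, update propensities.
agent = RothErev(n_actions=2)
a = agent.choose()
agent.update(chosen=a, reward=1.0)   # reward of 1 if the agent ended up in the minority
```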

Table 1 Sample strategy pool for a player
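
A strategy pool like the one in Table 1 can be represented as a set of lookup tables from the recent game history to a next action. The sketch below (our own illustration; names such as `virtual_points` are hypothetical) shows how a BestPlay agent scores its strategies and plays the currently top-ranked one:

```python
import random
from itertools import product

def random_strategy(m):
    """A strategy maps every possible history of length m to an action ('A' or 'B')."""
    return {h: random.choice('AB') for h in product('AB', repeat=m)}

class BestPlayAgent:
    def __init__(self, m, k):
        self.m = m                                            # memory size
        self.pool = [random_strategy(m) for _ in range(k)]    # k strategies per agent
        self.virtual_points = [0] * k

    def act(self, history):
        """Play the prediction of the currently top-ranked strategy."""
        best = max(range(len(self.pool)), key=lambda i: self.virtual_points[i])
        return self.pool[best][tuple(history[-self.m:])]

    def observe(self, history, winning_action):
        # Strategies that predicted the minority side earn virtual points,
        # whether or not they were actually played.
        for i, strategy in enumerate(self.pool):
            if strategy[tuple(history[-self.m:])] == winning_action:
                self.virtual_points[i] += 1

agent = BestPlayAgent(m=3, k=4)
history = ['A', 'B', 'A']            # the last three minority outcomes
print(agent.act(history))
```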

6.1 Parameter diversity

Parameter diversity refers to diversity in the initialization parameters of the algorithm being implemented by the agents. That is, for any game, all agents still implement the same code, but some parameter in the algorithm is diversified. For the Evolutionary and Roth-Erev algorithms, even minute differences in the initialization parameters can cause the aggregate average payoff to vary. For each of the three algorithm families, the following parameters were diversified:

  1. BestPlay algorithm: k, the number of strategies in a player’s pool (strategies-per-agent).

  2. Evolutionary algorithm: the mutation probability that allows an agent to explore the strategy space by mutating known good strategies.

  3. Roth-Erev algorithm: the parameters of recency (\(\varphi \)) and exploration (\(\varepsilon \)).

6.2 Strategy diversity

We mix agents implementing different algorithms (strategies) into one population. The diversification here takes place on two levels: parameter diversification (as before), as well as the proportion of the population implementing a particular strategy. The total population of agents in all scenarios is kept constant, but the proportion of agents playing a particular strategy is varied. This leads to different levels of diversification for each combination of strategies. The parameters that were varied were as follows:

  1. Number of algorithms This reflects the effective number of families (or genera) of species that are present in the population.

  2. Proportion of population implementing an algorithm Depending on the relative number of each species, the aggregate reward in the game changes.

  3. Parameters within algorithms that were diversified Variation in critical parameters for an algorithm effectively creates a new species, since the functional pathway (and resultant output) begins to change depending on the environment.

Table 2 shows the constant factors in each experiment. Each minority game was played with a population size of 501 (N) agents, through a simulation time period of 2000 steps (T). For each variation in the experimental setup, the data is reported as an average of 100 simulations.

Table 2 Experimental constants for minority game

6.3 Results

We combine the results from all of the diversifications to illustrate the comparative rewards achieved at different levels of diversity. The graphs show the distribution of the population achieving a particular level of average reward. The ‘fatter’ the graph, the more evenly rewards are distributed, and vice versa. Figure 1 clearly shows that the higher the diversity level, the fairer the average reward. Not only is the reward fairer, it is also higher, which is an important consideration for resource allocation strategies. Since it is the same three algorithms that are mixed in different proportions to achieve different levels of diversity, it is clear that the diversity level is the distinguishing factor in the level of rewards.

Fig. 1 Payoff across all diversification

As was pointed out earlier in this section, MG represents a class of games where envy-freeness is impossible to achieve. After any given round, a majority of the agents are envious of the allocations secured by the minority. Algorithmic Diversification offers an alternate resolution in such games, where multiple iterations of the game result in approximate envy-freeness. That is, over multiple iterations of an envious game, the introduction of algorithmic diversity moves all the agents towards receiving approximately the same amount of reward. As seen in Fig. 1, the increase in diversity results in increased fairness, which can be seen as a proxy for envy-freeness. This occurs due to a ‘rotation’ of agents that receive a reward, through the multiple iterations. The faster this rotation, in allowing agents that have previously not been rewarded to receive rewards, the greater the likelihood of envy-freeness.

7 Automated generation of diversity

In this section, we introduce the main contribution of this paper: a self-adaptation mechanism called clonal plasticity, which can be used by a group of agents to self-diversify. We use Clonal Plasticity to generate diversity automatically amongst the algorithms implemented. We show that this generated diversity is competitive with the manually introduced diversity presented in the previous section.

We had previously used clonal plasticity [101] to enable decentralized adaptation, based on localized environmental changes. Due to its decentralized nature, the adaptations performed by multiple agents in the system result in variations being naturally produced at a system-wide level. The process of clonal plasticity is briefly explained in the following sub-section.

7.1 Clonal plasticity process

Clonal Plasticity is an adaptation strategy, formed through the combination of two processes present in plants: (a) clonal reproduction, and (b) phenotypic plasticity. Clonal Reproduction refers to “... sequential reiteration of a basic structural unit or module consisting of the leaf with its associated auxiliary bud and a segment of stem that connects it to other units” [102]. The fundamental idea in this form of reproduction is that the daughter plant is genetically identical to the parent plant, and hence the term clone. Phenotypic Plasticity refers to the ability of an organism to vary its structure (height of stem, width of leaves, etc.) or behaviour (connect to its clonal sibling or disconnect) according to environmental influences. Importantly, this adaptation does not include variation in an organism due to genetic factors. We combine these two mechanisms to propose a self-adaptation mechanism called clonal plasticity. Clonal Plasticity is distinguished from other evolutionary mechanisms such as genetic algorithms, in that the genome of the organism is not modified at all. From a software system’s perspective, this means that a clonally plastic software will deviate in some aspects of its behaviour, but will functionally remain the same.

The process of plastic reproduction, depicted in Fig. 2, is given by the following steps:

  1. Identify plasticity points At a given moment in time, t, of the system’s lifetime, each individual agent in the system (depicted as a circle in Fig. 2) identifies all its parts that can take multiple representations, values, or behaviours.

  2. Evaluate environmental input Simultaneously, the fitness of each individual agent is self-evaluated in the following way: for each plastic point, environmental feedback is evaluated to check whether it impedes or favours that point. Feedback from the environment may come from different sensors or neighbouring agents, or the agent’s own reproduction memory (m in Fig. 2).

  3. Plasticity memory All agents that have reproduced before keep a record of the previous Plasticity Range chosen during the last reproduction cycle (variable m in Fig. 2). This allows an individual agent to repeat a good adaptation strategy, if it previously had a positive reward, or alternatively abandon a poor adaptation strategy.

  4. Choose plasticity range During the following time-step (t+1), each agent chooses a plastic response, based on the evaluated environmental input. The available responses are: (a) Exact Clone, (b) Low Plasticity, and (c) High Plasticity.

  5. Clone and modify plastic points During the same time-step, each agent makes a clone of itself. The individual agent’s plasticity points may be adapted according to the chosen strategy. The entire process starts again with a new time-step, \(t'\), on all clones (Fig. 3).

Fig. 2 Clonal plasticity process

Fig. 3 Create a plastic clone of an algorithm

7.1.1 Cloning frequency

Every adaptation action, in any system, incurs a cost, either computational or communicative. It must be possible to evaluate the effect of the cloning process before creating the next plastic clone; t + 1 refers to the next time-step at which it is feasible to perform the next cloning process. For an agent that implements plasticity, the granularity of time at which the next iteration of the cloning process is performed is domain dependent. For instance, domains such as traffic and smart-grids differ in how dynamically their respective environments change. To allow each adaptation to take effect, we introduce the notion of cloning frequency. Cloning Frequency refers to the time-period for which an agent is evaluated before the next iteration of the cloning process starts.

7.1.2 Choosing the plasticity range

The process of choosing the plasticity range is the critical part in generating a diverse population. This process of modifying the plastic points considers both positive and negative feedback from the environment. For a given time instant t, the adaptation decision depends on two factors:

  1. Feedback from environment at time t, whether it is positive or negative.

  2. Memory of the modification action, taken at time t-1.

If the feedback from the environment is positive, and the memory of the previous action is also positive, then the Plasticity Range is chosen to be Exact Clone. If the feedback from the environment is positive, but the memory of the previous action is negative, then the Plasticity Range is chosen to be Low Plasticity. If the feedback from the environment is negative, then the Plasticity Range is always High Plasticity.

7.1.3 Exact clone

In this case, the individual simply makes a copy of itself, with no change at all.

7.1.4 Low plasticity

Modification with Low Plasticity implies that the individual chooses the same plasticity point, as chosen during the previous cloning process, and moves in the same direction as before.

7.1.5 High plasticity

Modification with High Plasticity implies that the individual chooses a different plasticity point, than was chosen during the previous cloning process, and moves in a random direction.
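
The range-selection rules and the modifications they trigger can be summarised in a short sketch (ours; the plastic points, step size, and variable names are illustrative assumptions, not the authors' implementation):

```python
import random

def choose_plasticity_range(feedback_positive, memory_positive):
    """Map environmental feedback and plasticity memory to a plasticity range."""
    if not feedback_positive:
        return "high"
    return "exact" if memory_positive else "low"

def plastic_clone(params, last_move, plasticity_range, step=1):
    """Clone a dict of plastic points and modify the clone according to the chosen range.

    `last_move` is the (point, direction) pair remembered from the previous cloning cycle.
    """
    clone = dict(params)
    if plasticity_range == "exact":
        return clone, last_move
    if plasticity_range == "low":
        point, direction = last_move                        # same point, same direction as before
    else:                                                   # "high": different point, random direction
        point = random.choice([p for p in clone if p != last_move[0]])
        direction = random.choice([-1, 1])
    clone[point] += direction * step
    return clone, (point, direction)

# Example with BestPlay's plastic points (memory size m and pool size k).
params, move = {"m": 3, "k": 4}, ("m", +1)
rng = choose_plasticity_range(feedback_positive=True, memory_positive=False)   # -> "low"
params, move = plastic_clone(params, move, rng)
```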

7.2 Clonal plasticity for MG

Here, we modify the agents such that each agent is able to reproduce in a clonally plastic way. Cloning, as opposed to other evolution-based mechanisms, is completely decentralized, as there is no universal fitness function that sorts the population into fit and unfit individuals.

Making an algorithm plastic requires following the process of plastic reproduction:

  1. Identification of plasticity points BestPlay has only one variable that influences its decision-making, i.e., memory. Hence, there is only one plasticity point for agents implementing BestPlay. Similarly, for the Evolutionary strategy, there is one variable that influences how the decision-making varies within the population, i.e., mutation. Roth-Erev has two variables that can be plasticized, i.e., recency and epsilon.

  2. Evaluate environmental input The environment allows each agent to know how well its algorithm is doing. That is, depending on the performance of the agent in successfully choosing the minority group, the agent gets either positive or negative feedback. In the context of MG, cloning frequency refers to the number of times an agent plays the game before its performance is evaluated. Thus, a cloning frequency of 5 means that an agent’s performance is evaluated after every 5 rounds. Clearly, the lower this number, the faster the agent reacts to the environmental feedback. Agents playing BestPlay or Roth-Erev were evaluated with cloning frequencies ranging from 5 to 105 rounds. Agents playing the Evolutionary strategy were evaluated with cloning frequencies ranging from 20 to 100. This difference is due to the fact that preliminary experimentation revealed that the Evolutionary algorithm required 20 rounds to produce a noticeable effect on an agent’s performance. The effect of cloning frequency on diversity generation is discussed in Sect. 9.2. The memory (m) of the previous algorithm parameters allows an agent to return to a previously known good configuration, if the current modification resulted in losses.

  3. Choose plasticity range The plasticity range chosen by a particular agent defines how the agent decides to adapt. Each of the algorithms (BestPlay, Evolutionary, Roth-Erev), depending on its choice of parameters, decides how the agent will play upcoming rounds of MG. Any change to these parameters could cause the agent to change its decision, and thereby affect the rewards that it accumulates.

7.3 Worked out example

Now we perform the clonal plasticity process step-by-step for BestPlay. For details on the clonal plasticity process for the Evolutionary and RothErev algorithms, see Appendices A and B (the process differs only in the plasticity points identified for each algorithm).

  1. Identify plasticity points The BestPlay algorithm critically depends on the size of the memory that it keeps of the game (see Table 1). Two agents that diversify in different directions (one increasing the memory size, and the other decreasing it) could end up with drastically different rewards. Here, m is identified as a plasticity point of the algorithm. Another plasticity point is the number of strategies (k) that the agent possesses.

  2. Evaluate environmental input For any given time instant, the algorithm has a 50% chance of picking the right group (i.e., being in the minority). Hence, for any given period of cloning frequency, if the agent has accumulated more than 50% of the possible rewards during the period (wins-this-cycle), the feedback is said to be positive. If the agent accumulates less than 50% of the rewards, then the feedback is negative.

  3. Plasticity memory The agent keeps track of the rewards that it accumulated during the previous adaptation (wins-last-cycle). Along with the feedback, this helps it decide which plasticity range to choose next (see Table 3, and the sketch that follows it).

  4. Clone and modify plastic points In the event of Exact Clone, the agent does not change any of its parameters. In the event of Low Plasticity, the agent picks one of its plastic points and modifies it in the same direction as the plasticity memory from the previous generation. In the event of High Plasticity, the agent modifies all its plasticity points to return to the values that it had from the previous generation.

Table 3 Criteria for choosing plasticity range
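
Applied to BestPlay, the feedback and memory signals that feed into Table 3 can be computed from the win counts of the current and previous cloning cycles. The sketch below is our own reading (variable names such as `wins_this_cycle` follow the worked example; the exact memory criterion is an assumption, taken here as "this cycle did at least as well as the last one"):

```python
def bestplay_plasticity_signals(wins_this_cycle, wins_last_cycle, cloning_frequency):
    """Derive the (feedback, memory) signals used to pick a plasticity range via Table 3."""
    feedback_positive = wins_this_cycle > 0.5 * cloning_frequency   # won more than 50% of the rounds
    memory_positive = wins_this_cycle >= wins_last_cycle            # assumed criterion for 'positive memory'
    return feedback_positive, memory_positive

# e.g. 2 wins in a 5-round cycle, after 3 wins in the previous cycle:
print(bestplay_plasticity_signals(wins_this_cycle=2, wins_last_cycle=3, cloning_frequency=5))
# -> (False, False): negative feedback, so the next clone uses High Plasticity.
```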

8 Requirements for distributive justice

In Sect. 5.1, we saw that efficiency in allocations would only occur if \(minG = N/2\). However, it is entirely possible that in repeated iterations of the game, the same subset of N agents continues to be a part of minG. This would mean that the allocations, while efficient, are unfair, leading to envy amongst the agents. Ideally, we would like envy-freeness to be a property of clonal plasticity. While envy-freeness is an important concern in many domains, there are some areas, specifically in simultaneous games, where envy-freeness is impossible [16]. Balkanski et al. [16] show that for any resource allocation protocol that involves simultaneous choice by more than 3 agents, it is impossible to construct an envy-free protocol. This impossibility is intuitive in the case of the Minority Game where, by construction, during every round, a majority of the agents are denied a reward and are envious of the minority.

In view of the impossibility of an envy-free protocol ever being created, we focus on an alternate form of fairness, i.e., distributive justice.

Distributive justice, in the context of MG, implies the following condition:

$$\begin{aligned} \sum _{t=1}^{T} R_t (a_j) \approx \sum _{t=1}^{T} R_t (a_k) \forall (a_k \ne a_j) \end{aligned}$$
(2)

Equation 2 simply states that, over a period T, every agent (\(a_j\)) in MG must get approximately the same reward (\(R(a_j)\)) as any other agent (\(a_k\)). In the ideal case, every agent would get exactly the same reward as every other agent. However, this can be trivially shown to be impossible under the following conditions on agent abilities:

  1. Autonomy Each agent is autonomous and cannot be coerced by any other agent. The agent’s choice is dependent solely on its own algorithm.

  2. Simultaneity Each agent makes its choice in a simultaneous fashion, and no agent has access to any other agent’s decision before it makes its own.

Proof

Assume a set of algorithms A that are implemented by the agents playing MG, and suppose this set meets the following two conditions:

  1. The size of the minority is the largest possible, while still being a minority, and

  2. The members of the minority group, in any iteration of MG, have never been in a minority before.

Then, after two iterations, there will be only one agent that has never been in a minority before. Therefore, in iteration 3, this sole agent participates in the minority group, and the game is both efficient and fair.

For any mechanism, meeting condition 1 requires that the size of minG be N/2 for iteration 1 and iteration 2. Meeting condition 2 requires that the list of agents that have already participated in minG be known at all times, and that any agent that has previously been a part of minG be restricted by some mechanism. Meeting condition 2 violates the environmental assumption of Autonomy, while meeting condition 1 violates our environmental assumption of Simultaneity. Thus, there is no way for any mechanism to assure equality of rewards and be efficient without violating our operating conditions.

8.1 Desirable properties of clonal plasticity

It is evident that Eq. 2 would be impossible to achieve with MG, if it were played in a one-shot manner. That is, for a single round of play, regardless of the algorithm chosen, the agents that are in the minority will get a reward, while the agents in the majority will not. This means that Eq. 2 will never hold. However, if the same agents play MG in a repeated manner, it is probable that different agents are in the minority in different rounds, such that their rewards accumulated over time would be approximately equal. That is:

$$\begin{aligned} \lim _{T\rightarrow \infty }\sum _{t=1}^{T} R_t (a_j) \approx \sum _{t=1}^{T} R_t (a_k) \forall (a_j \ne a_k) \end{aligned}$$
(3)

Recall that we are not interested solely in distributive justice; the mechanism should also be efficient. In the case of MG, the system would be efficient if the allocation of rewards is as high as possible in any given round (see Sect. 5.1). At its most efficient, the game provides the highest amount of reward (when the size of minG is N/2):

$$\begin{aligned} R_{t} = \frac{N}{2} \end{aligned}$$
(4)

This implies that after T rounds of highest possible efficiency, the total rewards in the system must be:

$$\begin{aligned} \sum _{t=1}^{T} R_t = \frac{N}{2} * T \end{aligned}$$
(5)

From Eqs. 4 and 5, it is clear that at optimal efficiency, the expected reward accumulated by any single agent is given by:

$$\begin{aligned} \begin{aligned} \sum _{t=1}^{T} R_t (a_j)&= \frac{N/2 * T}{N}\\&= \frac{T}{2} \end{aligned} \end{aligned}$$
(6)

Thus, we would like a mechanism that achieves both Eq. 3 and Eq. 6.
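
As an operational reading of these two targets, the following sketch (our own helper, with an arbitrary tolerance) checks whether the per-round reward records of a run satisfy Eq. 3 (approximately equal cumulative rewards) and Eq. 6 (mean cumulative reward near T/2):

```python
def meets_targets(reward_matrix, tolerance=0.05):
    """reward_matrix[t][j] = reward (0 or 1) of agent j in round t."""
    T, n = len(reward_matrix), len(reward_matrix[0])
    cumulative = [sum(row[j] for row in reward_matrix) for j in range(n)]
    mean = sum(cumulative) / n
    # Eq. 3: every agent's cumulative reward is approximately the population mean.
    fair = all(abs(c - mean) <= tolerance * T / 2 for c in cumulative)
    # Eq. 6: at optimal efficiency, the expected per-agent cumulative reward is T/2.
    efficient = abs(mean - T / 2) <= tolerance * T / 2
    return fair, efficient
```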

9 Experimental setup and results

Table 4 shows the constant factors in each experiment. Each minority game was played with a population size of 501 agents, through a simulation time period of 2000 steps. For each variation in the experimental setup, the data is reported as an average of 100 simulations. We measure the efficiency of the system through the median reward accumulated by the agents at the end of the simulation period. We measure the fairness of the system using the Gini index. The Gini index is used to calculate dispersion in the income distribution of a society [103]. In a society that is perfectly equal, the Gini index is equal to zero. Therefore, the closer the Gini index is to zero, the fairer the distribution of rewards.
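
For reference, a standard way to compute the Gini index from the agents' accumulated rewards is sketched below (our own helper, not the authors' implementation); a value of zero indicates a perfectly equal distribution:

```python
def gini(rewards):
    """Gini index of a list of non-negative cumulative rewards (0 = perfect equality)."""
    xs = sorted(rewards)
    n, total = len(xs), sum(xs)
    if total == 0:
        return 0.0
    # Standard rank-weighted formula over the sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n

print(gini([500] * 10))        # 0.0: every agent earned the same reward
print(gini([0] * 9 + [1000]))  # 0.9: one agent earned nearly everything
```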

Table 4 Experimental constants–Same as manual process of diversification

Figures 4 and 5 show the results of diversification introduced by clonal plasticity. As is immediately obvious from the x-axis (which shows the diversity generated), manual diversification is able to generate a higher level of diversity in the populations of agents. Clonal Plasticity is able to reach a diversity level (also called H-index) of up to 2.7, whereas manual diversification (see Fig. 1) is able to reach an H-index of up to 5.8. This is due to the fact that a human (knowing the functional pathways of a particular algorithm) is able to specify diverse individuals that will generate diverse outputs. However, in terms of efficiency and equity, clonal plasticity confirms the results found by Nallur et al. [19]. Note that as the diversity goes up, the median reward accumulated by the population also increases (Fig. 4). At the lower end of the diversity scale (\(H = 1.7\)), the median reward is 509, while at the higher end of the scale (\(H = 2.7\)), the median reward is 867. Increasing the diversity of the population increases the reward achieved by each agent. Figure 5 shows the effect on the equity of rewards achieved. The y-axis shows the Gini index, which is a quantitative measure of equality of reward. In Fig. 5, the increase in diversity results in a decrease in the Gini index (and therefore, a fairer distribution). Note: Both Figs. 4 and 5 are shown with error bars that indicate the 95% confidence intervals. The medians were established using 100 experimental trials for each variation (7 variations in algorithms [see Table 6]), and the confidence intervals for each diversity value across the 700 trials were established by bootstrapping with 100 iterations.

Fig. 4 Impact of diversity on reward

Fig. 5 Impact of diversity on fairness

Table 5 Relationship between diversity and reward and fairness

9.1 Effectiveness of diversity in fairness and efficiency

Recall from Sect. 8 that the theoretical optimum efficiency that we would like is T/2. That is, the accumulated reward at the end of the simulation should be equal to half the number of rounds of simulation. In all our experiments, T is kept constant at 2000; hence the optimum efficiency is 1000. We see from Fig. 4 that increasing diversity results in an increased median reward. At the highest diversity level (H = 2.7), the accumulated reward is 867, while at the lowest level (H = 1.7), it is 509.

Note also from Table 5 that the lowest level of diversity is also responsible for the highest Gini index. Recall that the Gini index measures the inequality of the distribution of income. Thus, the closer the Gini index value is to zero, the more equitable the distribution is. This table shows that attaining both efficiency of the mechanism (close to the theoretical optimum) and fairness (a theoretical optimum of zero) is a difficult problem. While increasing diversity directly increases the efficiency (867 is closer to 1000 than 509), it does not have the same clear impact on fairness. While the fairness achieved with the highest diversity (an H-index of 2.7 results in a Gini index of 2.1) is certainly better than the fairness achieved with low diversity (an H-index of 1.7 results in a Gini index of 11.2), it is interesting to see that a diversity level of 2.5 actually provides the highest amount of fairness (a Gini index of 0.9). This indicates that the system might have an optimal ‘sweet spot’ for diversity, and going beyond such a limit might not necessarily improve both efficiency and fairness. Automatically deriving the Pareto curve of diversity, for a particular system, would be an interesting avenue of further research.

9.2 Evaluating clonal plasticity in generation of diversity

We now look at Clonal Plasticity itself and evaluate how the cloning frequency influences the amount of diversity achieved within the population. Cloning frequency refers to how many rounds (of time) elapse before the cloning process is run. Intuitively, the fewer the number of rounds between each cloning process, the higher the amount of diversity that will be achieved. This is because each run of the cloning process has the potential to create a new species. This is borne out by the data in Table 6, which shows that an increase in mean diversity corresponds directly with an increase in the mean number of species generated by the cloning process. Figure 6 shows a population-wide distribution of the same analysis.
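To make the role of this parameter explicit, the sketch below shows one plausible way the cloning frequency could be wired into a simulation loop: the cloning process runs once every `cloning_frequency` rounds. The callback names are hypothetical placeholders, not the actual simulation code.

```python
def run_simulation(agents, total_rounds, cloning_frequency, play_round, run_cloning):
    """Illustrative loop: cloning runs every `cloning_frequency` rounds.

    `play_round` stands in for one round of the game dynamics and
    `run_cloning` for the clonal plasticity process; both are hypothetical.
    """
    for t in range(1, total_rounds + 1):
        play_round(agents, t)
        if t % cloning_frequency == 0:
            run_cloning(agents)
```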

Table 6 Diversity and number of species
Fig. 6 Diversity, cloning frequency and number of species together

In Fig. 6, we see the distribution of diversity across the entire population as bubbles (y-axis), with change in cloning frequency (x-axis). The size of the bubbles represents the number of species. As we go up the y-axis, notice that the size of the bubbles increases. This indicates that the greater the diversity, the greater the number of species generated. Also, the lower the number of rounds between each cloning process (i.e., the lower the cloning-frequency value), the higher the mean amount of diversity produced. Table 9 in Appendix C shows the full breakdown of the number of species produced per algorithm, as the cloning frequency varies. It is interesting to note that the cloning frequency does not, by and large, affect the diversity achieved within an algorithmic group.

9.3 Effect of plasticity on different algorithms

Figures 7 and 8 show an algorithm-wise breakup of differences in reward achieved as well as the Gini index. Read in conjunction with Fig. 9, which shows that three algorithms (Evo, RothErev and RothErev-Evo) achieved the highest diversity, these figures also show that high levels of diversity are linked to high levels of reward. Table 9 in Appendix C shows how this can be further broken down based on how rapidly the cloning process operates.

Fig. 7 Algorithm-wise breakup of reward achieved

Fig. 8 Algorithm-wise breakup of Gini Index

Consider Figs. 7 and 8 along with Tables 5 and 6. Both BestPlay-Evo and RothErev-BestPlay achieve a mean Gini index very close to zero (1.0 and 1.3); however, it is the combination Evo-RothErev-BestPlay that achieves the best Gini index (0.9) as well as a very high level of mean reward (780). Evo-RothErev-BestPlay performs very well across both indicators, reward and Gini. However, if one considers only rewards, it is RothErev, with the highest diversity level of 2.7, that has the highest reward level. In contrast, Evo is another algorithm that reached a high level of diversity; almost all its individuals consistently formed separate species (see Fig. 9), and it achieved a mean Gini index very close to zero. However, an examination of Fig. 8 shows that its distribution of Gini values fluctuated quite a bit (see Table 9 in Appendix C for a cloning-frequency-wise breakdown), indicating that it is sensitive to cloning-frequency values. Taken together, the results show that raising diversity levels indiscriminately is not a silver bullet for achieving fairness; reward levels, however, rise in congruence with diversity.

Fig. 9 Algorithm-wise breakup of number of species at the end of cloning

9.4 Scalability of clonal plasticity

Although the clonal plasticity process is effective at generating diversity, its scalability is an important factor in deciding whether it is feasible for implementation in real systems.

Before the cloning process can begin, the plasticity points must be identified. This is a one-time step and needs to be performed only at the initialization of the plasticity process. For any algorithm a, with k interface variables, the first step is to identify whether those variables modify the output of the algorithm. This is clearly O(k) in complexity. The second major step is to identify the plasticity range for each plasticity point. This involves identifying the valid set of values a plasticity point may take, as well as the constraints on its value. Assuming a time complexity of r per variable, the first phase of clonal plasticity is \(O(r*k)\) in complexity.

Recall that the ChoosePlasticityRange and CreateClone functions form the core of the plasticity process (see Algorithm 3). ChoosePlasticityRange runs in constant time (O(c)) since it involves merely decision-making based on the LastReward and Memory variables. CreateClone involves looking at the value of each plasticity point and, assuming a time complexity of p per plasticity point, its complexity can be calculated as \(O(p*k)\). The computational complexity of the clonal plasticity process is therefore \(O(r*k) + O(c) + O(p*k)\). This matters because, if the adaptation process were computationally expensive, its benefits would be limited to systems with few plasticity points. The linear (in k) complexity of clonal plasticity implies that the mechanism is scalable to large-scale multi-agent systems with a high number of plasticity points.
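To make the cost accounting concrete, the following is a minimal sketch of how the two core functions might look when the k plasticity points are held as a dictionary of numeric values. The decision rule inside ChoosePlasticityRange and the perturbation used in CreateClone are placeholders of our own; the actual logic of Algorithm 3 is not reproduced here.

```python
import random

def choose_plasticity_range(last_reward, memory, base_range):
    """O(c): decide how far a clone may move each plasticity point, using only
    LastReward and Memory. The decision rule here is a placeholder."""
    average = sum(memory) / len(memory) if memory else 0.0
    return base_range * (0.5 if last_reward >= average else 1.0)

def create_clone(plasticity_points, step):
    """O(p * k): visit every plasticity point once and perturb it within the
    chosen range (a uniform perturbation is an illustrative choice)."""
    return {name: value + random.uniform(-step, step)
            for name, value in plasticity_points.items()}
```

Each call to create_clone touches every plasticity point exactly once, which is where the \(O(p*k)\) term in the overall cost comes from.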

EvalEnviron is a domain-specific function that evaluates the current performance of the agent, and is therefore left unspecified. Any adaptation mechanism would require an equivalent of EvalEnviron; it is a price that all adaptation mechanisms must pay for sensible self-modification.

Crucially, the entire process works on a single agent and does not depend directly on other agents’ performance. This allows the plasticity mechanism to work in a completely decentralized manner. Of course, the presence of other agents affects the evaluation of the environment (indirectly, via their actions), and thus provides local competitive pressure that moves the agent towards a better configuration.

10 Differences between clonal plasticity and phenotypic diversity-based mechanisms

In the presence of large search spaces for adaptation, a number of intelligent trial-and-error algorithms have been proposed. These differ from the more commonly known EA/GA/GP methods in that there is an explicit absence of an optimizing function. Notable examples include techniques such as Novelty Search [104], Functional Blueprints [105] and, more recently, MAP-Elites [106], SHINE [107], etc. Among these techniques, there is an acknowledgement of the fundamental importance of phenotypic diversity, similar to clonal plasticity. The key difference between these methods and clonal plasticity is that they pre-compute phenotypically diverse behaviours. For instance, Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) [106] creates a number of discrete behaviour bins, each containing the fittest individual mapped to that bin so far. The bins are then sampled at random to produce an offspring, which is mapped to a bin depending on its behaviour characterization. The individual is then saved or culled depending on its fitness relative to other individuals in the bin. In this way, due to uniform sampling, the algorithm passively acquires diversity [108]. SHINE [107] uses the same fundamental idea, but uses a tree structure for maintenance and selection of potential individuals. This, along with a technique of penalising crowded areas, allows the algorithm to explore the phenotypic landscape more rapidly than MAP-Elites.
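For readers unfamiliar with MAP-Elites, the following compressed sketch of its main loop illustrates the pre-computation of a whole archive of behaviourally diverse elites, which is the point of contrast with clonal plasticity. The mutation, evaluation, and behaviour-binning functions are domain-specific and are passed in here as hypothetical callbacks.

```python
import random

def map_elites(init_random, mutate, evaluate, behaviour_bin, iterations):
    """Minimal MAP-Elites sketch: the archive maps a discretised behaviour
    descriptor (a bin) to the fittest individual found for that bin so far."""
    archive = {}  # bin -> (fitness, individual)
    for _ in range(iterations):
        if archive:
            # Sample an existing elite uniformly at random and mutate it.
            _, parent = random.choice(list(archive.values()))
            candidate = mutate(parent)
        else:
            candidate = init_random()
        fitness = evaluate(candidate)
        b = behaviour_bin(candidate)
        # Keep the candidate only if its bin is empty or it beats the incumbent.
        if b not in archive or fitness > archive[b][0]:
            archive[b] = (fitness, candidate)
    return archive
```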

Real-world domains that have been explored by these techniques include bipedal robot walking, hexapod robot walking and damaged robotic arms. A peculiarity of these domains is that the adaptation required cannot wait for trial-and-error to conclude, and hence pre-computation is a necessity. Pre-computation requires simulation and a thorough exploration of the behaviour space, before the final solutions are transferred to the hardware device. Clonal Plasticity does not do any pre-computation and hence is unsuitable for domains that require a rapid response.

11 Differences between clonal plasticity and non-phenotypic diversity-based mechanisms

Clonal Plasticity is a meta-heuristic, in the same vein as the other methods in evolutionary computation. However, it differs in significant ways from the diversity-generation approaches presented earlier, in Sect. 3. The following axes affect how suitable an approach is for a specific domain (Table 7):

  1. Representation (Neutral or Dependent): If the diversification relies on the distance-measure adopted, then the representation of the individual becomes a critical factor in calculating diversity.

  2. Temporal Dependence (Yes or No): Does the diversification depend on when the differences between individuals are calculated? For example, methods such as incest prevention and mating restrictions require that the diversity between individuals be calculated before every new generation. Similarly, the ensemble mechanism also requires that different values for ensemble training are chosen (from a Poisson distribution) before the training starts.

  3. Locus of Control (Yes or No): Does the mechanism calculate the diversity of the entire population in a centralized fashion, or can different subsets diversify at different rates?

11.1 Representation

Evolutionary Algorithms (EAs), Genetic Algorithms (GAs), and Genetic Programming (GP) typically use a distance measure over the genotype of the individual to force diversity into the population. For example, Sareni and Krähenbühl utilize the notion of fitness sharing and creation of niches to ensure that the population retains diversity after multiple generations [109]. However, this mechanism quickly becomes computationally intractable when using genotype representations such as Artificial Neural Networks (ANNs) [110]. Behavioural-diversity-based approaches such as novelty search [104] avoid the problems of genotype-based representations.

The traditional particle swarm [75] represents a particle in the swarm as a tuple of D-dimensional vectors (where D is the dimensionality of the search space), along with the position and fitness of the best point found so far and the index of the best particle. Diversity-generation mechanisms (e.g., [79, 80]) continue to use this tuple as the standard representation of a particle.
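As an illustration of how strongly this representation is baked in, a particle in the standard formulation can be written as the following record; the field names are ours, and the velocity field, while standard in PSO, is an addition not listed in the tuple above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Particle:
    """Standard PSO particle: every field is tied to D-dimensional vectors."""
    position: List[float]       # current point in the D-dimensional search space
    velocity: List[float]       # current velocity, also D-dimensional
    best_position: List[float]  # best point this particle has visited so far
    best_fitness: float         # fitness at that best point

# The swarm additionally tracks the index of its best particle overall.
```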

Machine Learning approaches such as ensembles are independent of any particular representation.

Clonal Plasticity does not have (or require) a particular representation of an agent. The agent must merely be able to reflect on, and tune, its own parameters.

Representation is an important aspect during development, since many application domains where multi-agent systems are used (modelling electricity grids, transportation systems, etc.) do not lend themselves to arbitrary representations. Therefore, for agents that need to model domain-specific behaviour, it would be more convenient to employ either ensemble-based diversity generation or clonal plasticity.

11.2 Temporal dependence

EA/GA/GP have a high dependence on the temporality of the diversity calculation. Due to their inherent selection-pressure based nature, diversity mechanisms must be deployed at every generation, to ensure that the selection pressure does not lead to a homogeneous/convergent population.

Swarm-based methods are also dependent on the temporality of the diversity calculation, since every particle’s next step influences whether the resulting swarm consists of dense neighbourhoods or not.

Machine Learning mechanisms such as ensemble learning are not dependent on generations; rather, diversity is induced during the training phase by using variation in the examples [37]. However, other mechanisms such as evolutionary modification of neural networks [81] require a generation-by-generation diversity calculation.

Clonal Plasticity belongs to a class of methods that do not explicitly force selective pressure towards diversity [111]; rather, diversity emerges from the self-adaptation behaviour.

The need to ensure diversity at every generation makes EA/GA/GP/swarm-based approaches unsuitable for a dynamic environment that requires some action from the system-under-consideration. Unlike optimization problems where the final calculated solution is the only determiner of quality, other domains (such as human-facing systems) will often need software to respond, even if sub-optimally. In such situations, clonal plasticity is the best way to ensure that the system continues to function, while moving towards more diversity.

11.3 Locus of control

EA/GA/GP methods that calculate genotype distance require the entire population to be available for diversity calculations. Even the behaviour-based novelty search creates a distance measure among behaviours that have been observed in the population, and hence requires computation over the entire population.

Swarm-based methods operate on one particle at a time; however, neighbourhood measurements of density require computations over subsets of the population, which makes diversification more of a centralized process than a decentralized one.

Machine Learning techniques all aim explicitly for diversity during their training phase. This requires that, after each training datum, the feedback and subsequent training take the current homogeneity of all available learners into account.

Since clonal plasticity does not explicitly aim for diversity, the diversity seen in the population is an emergent phenomenon arising from the actions of the individual agents. Each agent does, however, perform a neighbourhood-based calculation of its relative fitness. This makes clonal plasticity not a fully decentralized process, but rather a distributed one.

No approach that we know of achieves fully decentralized operation while ensuring diversity. The best achieved so far reduces the computation to small subsets, instead of the entire population. Decentralization is required where the system-under-consideration is a large-scale system and centralized operations are infeasible. Examples of such systems include vehicular ad-hoc networks [3, 4], smart buildings [5], e-procurement [6], cloud computing [7], and healthcare systems [8].

Table 7 Contrast between diversity generation approaches

12 Weaknesses of clonal plasticity

Domain limitations Clonal Plasticity is an intelligent trial-and-error mechanism that is able to dynamically adjust to a changing environment by exhibiting phenotypic diversity. However, in certain domains, such as robotics or autonomous vehicles, the rate of response required from the system is quite fast. Due to such real-time response requirements, other phenotypic-diversity mechanisms such as MAP-Elites [106] use pre-computation to explore the possible behaviour space before deployment. Clonal Plasticity does not use pre-computation and hence is slow to jump from one known behaviour to another. Domains requiring such fast adaptation responses (e.g., medical devices, transportation, and industrial robotics) are therefore not suitable for clonal plasticity.

Algorithmic limitations Clonal plasticity is a meta-algorithm that achieves diversity regardless of the underlying problem-specific algorithm being implemented. However, the diversity-seeking behaviour makes it unlikely that clonal plasticity can be used to achieve optimality in any search space. This diversity, unlike that of Novelty Search [104] or SHINE [107], is an emergent characteristic of the interaction between plasticity and the environment. This implies that if, by some chance, clonal plasticity hits on the optimal behaviour, it is likely to stay at that point; however, it does not actively seek out optimal behaviour. By contrast, EA/GA/GP have a well-established record of optimization, with both theoretical analysis [112] and practical applications [113, 114]. Hence, clonal plasticity is unsuited for situations that require optimization.

13 Conclusion

In ecological sciences, as well as computer science, diversity has been shown to be an important property of an ecosystem. As pointed out in Sect. 2, the concept of diversity has been used in computer security, search, machine-learning, as well as software engineering. However, instead of qualitative statements about diversity, we present a quantitative approach to measuring the effects of diversity, and an adaptation approach that increases the amount of diversity present in the system. This is especially important in socio-technical systems that involve human interactions. The Minority Game has been shown to be a simple, yet good proxy for socio-technical systems such as traffic [115], financial markets [93], distributed decision-making in wireless networks [94], smart-buildings [95], etc. Results shown in the Minority Game setting indicate that diversity can play an important role in affecting system properties such as efficiency and fairness. We do not wish to suggest that diversity is a silver bullet for achieving optimal levels of both, but rather that it is an important mechanism in a software engineer’s toolkit while engineering socio-technical systems.

13.1 Open research questions

Although clonal plasticity has been shown to lead to successful diversification, it would be interesting to know if there are limits to how quickly such diversification can occur. An investigation into the formal properties of the clonal plasticity process would be invaluable in determining its suitability for time-critical systems. Are there properties of the constituent algorithms, or of the number of plasticity points, that can predict the speed or magnitude of adaptive movement? Can we automatically identify the Pareto space of the system, where increasing diversity does not necessarily result in better values of the desired system properties? All of these are subjects of future research that we think will contribute greatly to a more detailed and nuanced understanding of this adaptive strategy.

13.2 Concluding remarks

Multi-agent systems are increasingly being used to model complex socio-technical systems that involve human interactions. In such systems, it is difficult to simultaneously guarantee the properties of efficiency and fairness. This paper presents a measurable system property (diversity) that can be engineered to produce high levels of both. Further, it presents an adaptation process (clonal plasticity) that can drive a system from low levels of diversity to higher levels, depending on the number of plasticity points that individual agents possess. Being a distributed mechanism that requires no centralized, population-wide computation, clonal plasticity is suitable for systems consisting of a large number of agents.