
1 Introduction

Autonomous robots may be identified with cognitive agents. This permits studying, through modeling and simulation, how their learning performance depends on various parameters [1]. We study the performance of homogeneous and heterogeneous (i.e., containing risk takers and risk avoiders) populations of cognitive agents learning to cross a cellular automaton (CA) based highway under various traffic conditions. The agents use a simple observational social learning strategy [2], in which they learn by observing the performance of other agents, mimicking what worked for them and avoiding what did not work in the past. Our work focuses on the simplicity of the learning algorithms and extends the previous research [3,4,5], in which the agents’ decision formula was based only on the assessment of the agents’ crossing decisions. In [6] we introduced a modified decision formula that incorporates the assessment of both the agents’ crossing and waiting decisions. We study how this modification improves the agents’ performance, measured by the rates of the agents’ four decision types: correct and incorrect crossing decisions, and correct and incorrect waiting decisions. We investigate the effects of the presence of risk takers and risk avoiders on these rates for various densities of cars on the highway. We also study how transferring the knowledge base built by agents in one traffic environment to agents learning to cross in a different traffic environment affects the rates of their decisions.

The paper is organized as follows: Sect. 2 describes the model, focusing on the agents’ decision-making algorithms; Sect. 3 describes the setup of the simulation parameters and the resulting data, and introduces the rate functions of the agents’ decisions and the considered agent populations; Sect. 4 presents an analysis of selected simulation results; Sect. 5 reports our conclusions and outlines future work.

2 Model of Agents Learning to Cross a Highway

For a detailed description of the model, the reader is referred to [3,4,5,6]. We assume that: (1) the environment is a single-lane unidirectional highway, modelled by adopting the Nagel-Schreckenberg cellular automaton (CA) model [7]; (2) all agents want to learn how to cross the highway without being hit/killed by the oncoming vehicles, and they witness what happened to the agents that previously crossed the highway at a given crossing point (with the exception of the first one). These assumptions allow each crossing point (CP) to build, during an experiment, one knowledge base (KB) that is available to all agents at that CP. Agents are generated only at the CPs set at the initialization step, and each generated agent is placed in the queue at its CP. Each generated agent falls with equal probability (0.25) into one of four categories: (1) neither Fear nor Desire; (2) only Fear; (3) only Desire; (4) both Fear and Desire. The agents’ Fear and Desire attributes/parameters play a role in their decision-making process of crossing the highway: the values of Fear reflect the agents’ aversion to risk taking, and the values of Desire reflect their propensity to risk taking. Agents attempt to cross the highway with a limited horizon of vision, and they can perceive only fuzzy levels of speed (e.g., slow, medium, fast, very fast) and of distance (e.g., close, medium, far) of cars within this horizon. The distances and speeds that each agent can perceive are set in the configuration file. If at some time step an agent does not cross the highway because it has become afraid, agents build up in the queue until the agent at the top of the queue, called the active agent, decides to cross or moves to a different location from which to attempt crossing. If the simulation setup permits, an agent may move randomly right or left from its CP along the highway [3,4,5,6].
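The agent-generation and perception steps described above can be sketched as follows. This is a minimal Python illustration, not the authors’ implementation: the class `Agent`, the functions `generate_agent`, `fuzzify` and `perceive`, and the level boundaries in `DISTANCE_BINS` and `SPEED_BINS` are hypothetical (in the model, the perceivable distances and speeds are set in the configuration file). Crisp intervals stand in for the fuzzy levels, distances are assumed to be in cells and speeds in cells per time step, and an agent lacking Fear (or Desire) is assumed to have v(Fear) = 0 (or v(Desire) = 0).

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical levels as (upper bound, label); in the model they come from a configuration file.
DISTANCE_BINS: List[Tuple[int, str]] = [(3, "close"), (6, "medium"), (9, "far")]  # cells
SPEED_BINS: List[Tuple[int, str]] = [(1, "slow"), (2, "medium"), (3, "fast"), (5, "very fast")]  # cells/step
HORIZON = DISTANCE_BINS[-1][0]  # assumed horizon of vision = largest distance level


@dataclass
class Agent:
    desire: float  # v(Desire); 0.0 if the agent has no Desire (assumption)
    fear: float    # v(Fear); 0.0 if the agent has no Fear (assumption)


def generate_agent(desire: float, fear: float, rng: random.Random) -> Agent:
    """Draw one of the four categories with probability 0.25 each."""
    category = rng.randrange(4)  # 0: neither, 1: only Fear, 2: only Desire, 3: both
    has_fear = category in (1, 3)
    has_desire = category in (2, 3)
    return Agent(desire=desire if has_desire else 0.0, fear=fear if has_fear else 0.0)


def fuzzify(value: int, bins: List[Tuple[int, str]]) -> str:
    """Map a crisp value to the first level whose upper bound it does not exceed."""
    for upper, label in bins:
        if value <= upper:
            return label
    return bins[-1][1]  # values beyond the last bound fall into the top level


def perceive(distance: Optional[int], speed: Optional[int]) -> Optional[Tuple[str, str]]:
    """Fuzzy (distance, speed) of the closest car, or None for out of range vision."""
    if distance is None or speed is None or distance > HORIZON:
        return None
    return fuzzify(distance, DISTANCE_BINS), fuzzify(speed, SPEED_BINS)


rng = random.Random(42)
print(generate_agent(desire=0.75, fear=0.25, rng=rng))
print(perceive(distance=2, speed=4))  # ('close', 'very fast')
```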

Each active agent must make one of the following two decisions: a Crossing Decision (CD) or a Waiting Decision (WD). A CD is a Correct Crossing Decision (CCD) if the active agent crosses successfully; otherwise it is an Incorrect Crossing Decision (ICD). A WD is: (1) a Correct Waiting Decision (CWD) if the agent would have been hit had it chosen to cross instead of waiting; (2) an Incorrect Waiting Decision (IWD) if the active agent chose to wait but could have crossed the highway successfully. The assessment of each decision of an active agent, i.e. whether the decision was a CCD, ICD, CWD or IWD, is recorded as a count in the KB table shared by all agents waiting at the CP of the active agent. Thus, each CP has its own KB table.
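The four decision types can be expressed as a small classification function. In this sketch, `classify` and its argument `hit_if_crossed` are hypothetical: the flag is assumed to be supplied by the simulator and records, for a crossing agent, whether it was actually hit and, for a waiting agent, whether it would have been hit had it crossed. How the model evaluates that counterfactual is not described here.

```python
from enum import Enum


class Decision(Enum):
    CCD = "correct crossing decision"
    ICD = "incorrect crossing decision"
    CWD = "correct waiting decision"
    IWD = "incorrect waiting decision"


def classify(crossed: bool, hit_if_crossed: bool) -> Decision:
    """Assess an active agent's decision.

    crossed        -- True if the agent chose to cross, False if it waited.
    hit_if_crossed -- hypothetical simulator flag: for a crossing agent, whether it was hit;
                      for a waiting agent, whether it would have been hit had it crossed.
    """
    if crossed:
        return Decision.ICD if hit_if_crossed else Decision.CCD
    return Decision.CWD if hit_if_crossed else Decision.IWD


print(classify(crossed=False, hit_if_crossed=False).name)  # IWD
```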

Each KB table is organized as a matrix with an extra row entry. The column names are slow, medium, fast and very fast; they stand for the car speeds perceived by the active agents. The row names are close, medium and far; they stand for the car distances perceived by the active agents. Since the agents have a limited horizon of vision, the extra row entry corresponds to out of range vision, i.e. the situation in which an active agent cannot perceive whether there is a car beyond its horizon of vision and, if there is, what its velocity is. For this reason, the cells of the extra row corresponding to the fuzzy velocity levels are all merged into a single entry. At each time t, each entry of the KB table (including the extra row entry) contains four numbers: the numbers of CCDs, ICDs, CWDs and IWDs made by the active agents up to time t − 1. The KB table is initialized as a tabula rasa, i.e. a “blank slate” represented by (0, 0, 0, 0) at each table entry; for further details see [3,4,5,6]. After the initialization period, the active agents make their decisions based on the outcomes of the implemented intelligence/decision-making algorithm, which, for a given (distance, velocity) pair or for out of range vision, combines the success ratio of crossing the highway for the observed situation with the agent’s Fear and/or Desire parameter values.
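A KB table of this shape can be represented, for example, as a dictionary keyed by (i, j) index pairs, with one counter per decision type in each entry. The sketch below assumes 1-based indices for the (distance, velocity) entries so that the merged out of range vision entry can use the index pair (0, 0), as in the text; the names `new_kb_table` and `record` are illustrative only.

```python
from typing import Dict, Tuple

DISTANCES = ("close", "medium", "far")               # rows, indexed 1..3
SPEEDS = ("slow", "medium", "fast", "very fast")      # columns, indexed 1..4
DECISIONS = ("CCD", "ICD", "CWD", "IWD")
OUT_OF_RANGE = (0, 0)                                 # merged extra row entry

KBTable = Dict[Tuple[int, int], Dict[str, int]]


def new_kb_table() -> KBTable:
    """Tabula rasa: every entry, including the extra row entry, starts at (0, 0, 0, 0)."""
    entries = [OUT_OF_RANGE] + [(i, j) for i in range(1, len(DISTANCES) + 1)
                                for j in range(1, len(SPEEDS) + 1)]
    return {entry: {d: 0 for d in DECISIONS} for entry in entries}


def record(table: KBTable, entry: Tuple[int, int], decision: str) -> None:
    """Add one assessed decision (CCD, ICD, CWD or IWD) to the observed entry."""
    table[entry][decision] += 1


kb = new_kb_table()
record(kb, (1, 4), "ICD")  # car perceived as close and very fast; the crossing agent was hit
print(len(kb), kb[(1, 4)])
```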

The main simulation loop of the model consists of: (1) randomly generating cars using the Car Prob.; (2) generating agents at each CP with their attributes; (3) updating the car speeds according to the Nagel-Schreckenberg model; (4) moving the agents from their CP queues onto the highway (if the decision algorithm indicates this should occur); (5) updating the locations of the cars on the highway, checking whether any agent has been killed and updating the KB tables; (6) advancing the current time step. After the simulation is completed, the results are written to output files using an output function.
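Steps (3) and (5) of the loop follow the Nagel-Schreckenberg rules [7]: acceleration, braking to the gap in front, random deceleration and movement, applied in parallel to all cars. The sketch below is one common formulation of these rules for a single-lane road. The open boundary (a new car is injected at cell 0 with probability `car_prob`, playing the role of the Car Prob., and cars moving past the last cell are removed) and the values of `v_max` and `p_slow` are assumptions, not the model’s actual settings.

```python
import random
from typing import List, Optional

Road = List[Optional[int]]  # road[x] = speed of the car in cell x, or None if empty


def nasch_step(road: Road, v_max: int, p_slow: float, car_prob: float,
               rng: random.Random) -> Road:
    """One parallel Nagel-Schreckenberg update of a single-lane, open-boundary road."""
    length = len(road)
    new_road: Road = [None] * length
    for x, v in enumerate(road):
        if v is None:
            continue
        v = min(v + 1, v_max)                      # rule 1: acceleration
        gap = 0                                    # empty cells in front of the car
        while x + gap + 1 < length and road[x + gap + 1] is None:
            gap += 1
        if x + gap + 1 < length:                   # rule 2: brake only if a car is ahead
            v = min(v, gap)
        if v > 0 and rng.random() < p_slow:        # rule 3: random deceleration
            v -= 1
        if x + v < length:                         # rule 4: move; cars past the end leave
            new_road[x + v] = v
    if new_road[0] is None and rng.random() < car_prob:
        new_road[0] = 0                            # assumed inflow at the entrance cell
    return new_road


rng = random.Random(1)
road: Road = [None] * 20
for _ in range(10):
    road = nasch_step(road, v_max=5, p_slow=0.2, car_prob=0.3, rng=rng)
print("".join("." if v is None else str(v) for v in road))
```

Because every car brakes to at most the gap in front of its old position while the car ahead never moves backwards, the parallel update cannot place two cars in the same cell.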

The decision formula (DF) of [3,4,5] considers only the outcomes of the agents’ CDs, i.e. the numbers of successful and killed agents for each fuzzy (distance, velocity) observation or for out of range vision at time t. Since the number of successful agents equals the number of CCDs and the number of killed agents equals the number of ICDs, we call this formula the Crossing Based Decision Formula (cDF).

After the initialization phase, at each time step t, each active agent carries out the following tasks: (1) it determines whether there is a car within its horizon of vision and, if there is, it determines the fuzzy (ith distance, jth velocity) values of the closest car; (2) from the KB table associated with its CP, it retrieves the numbers of CCDs and ICDs for the observed (ith distance, jth velocity) pair or, in the out of range vision situation, for the extra row entry, which is indexed by the pair (0, 0); (3) for the observed (i, j) situation, it calculates the value \( cDF_{ij}(t) \) of the cDF corresponding to the (i, j) entry of the KB table (including the extra row entry) as follows:

$$ cDF_{\text{ij}} \left( {\text{t}} \right) = cSR_{\text{ij}} \left( {\text{t}} \right) + v\left( {Desire} \right) - v\left( {Fear} \right), $$
(1)

where v(Desire) and v(Fear) are the values of the active agent’s Desire and Fear attributes/parameters, and \( cSR_{ij}(t) \) is the Crossing Based Success Ratio (cSR) corresponding to the ijth entry of the KB table. It is calculated as follows:

$$ cSR_{\text{ij}} \left( {\text{t}} \right) = \left\{ {CCD_{ij} \left( {{\text{t}} - 1} \right){-}ICD_{ij} \left( {{\text{t}} - 1} \right)} \right\}/CCD_{total} \left( {{\text{t}} - 1} \right). $$
(2)

The terms \( CCD_{ij}(t-1) \) and \( ICD_{ij}(t-1) \) are, respectively, the numbers of CCDs and ICDs recorded in the ijth entry of the KB table up to time t − 1. The term \( CCD_{total}(t-1) \) is the number of all CCDs made by active agents up to time t − 1, i.e. the sum of the CCDs recorded up to time t − 1 over all the entries of the KB table; it equals the total number of successful agents up to time t − 1.

After the initialization period (for details see [6]), if \( cDF_{ij}(t) \ge 0 \), then an active agent decides to cross; if \( cDF_{ij}(t) < 0 \), then it decides to wait and, if the simulation setup permits, it may additionally move to another crossing point.
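The cDF decision of Eqs. (1) and (2) reduces to a few lines of arithmetic on the KB table. The sketch below assumes a dictionary layout with one counter per decision type in each (i, j) entry; the function names are illustrative, and returning 0.0 when no CCD has been recorded yet is an assumption made only to avoid division by zero (in the model, an initialization period precedes the use of the formula [6]). The counts in `example_kb` are invented for illustration.

```python
from typing import Dict, Tuple

KBTable = Dict[Tuple[int, int], Dict[str, int]]


def crossing_success_ratio(kb: KBTable, entry: Tuple[int, int]) -> float:
    """cSR_ij(t) of Eq. (2): (CCD_ij - ICD_ij) / CCD_total, using counts up to t - 1."""
    ccd_total = sum(counts["CCD"] for counts in kb.values())
    if ccd_total == 0:
        return 0.0  # assumption: guard against division by zero before any success
    counts = kb[entry]
    return (counts["CCD"] - counts["ICD"]) / ccd_total


def decide_cdf(kb: KBTable, entry: Tuple[int, int], desire: float, fear: float) -> str:
    """Cross if cDF_ij(t) = cSR_ij(t) + v(Desire) - v(Fear) >= 0, otherwise wait (Eq. (1))."""
    cdf = crossing_success_ratio(kb, entry) + desire - fear
    return "cross" if cdf >= 0 else "wait"


example_kb: KBTable = {
    (0, 0): {"CCD": 2, "ICD": 0, "CWD": 0, "IWD": 0},
    (1, 1): {"CCD": 3, "ICD": 5, "CWD": 0, "IWD": 0},
}
print(decide_cdf(example_kb, (1, 1), desire=0.0, fear=0.0))  # wait: cSR = (3 - 5) / 5 = -0.4
print(decide_cdf(example_kb, (1, 1), desire=0.5, fear=0.0))  # cross: cDF is about 0.1
```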

The modified decision formula, called the Crossing-and-Waiting Based Decision Formula (cwDF) [6], is based on the assessment of both the crossing and the waiting decisions of the active agents. The cwDF is obtained from the cDF (1) by replacing the term \( cSR_{ij}(t) \) with the term \( cwSR_{ij}(t) \). The term \( cwSR_{ij}(t) \), called the Crossing-and-Waiting Based Success Ratio (cwSR), is defined for each ij entry of the KB table at time t as follows:

$$ cwSR_{\text{ij}} \left( {\text{t}} \right) = \left\{ {CCD_{ij} \left( {{\text{t}} - 1} \right){-}ICD_{ij} \left( {{\text{t}} - 1} \right) - CWD_{ij} \left( {{\text{t}} - 1} \right) + IWD_{ij} \left( {{\text{t}} - 1} \right)} \right\}/S\left( {{\text{t}} - 1} \right), $$
(3)

where \( CCD_{ij}(t-1) \), \( ICD_{ij}(t-1) \), \( CWD_{ij}(t-1) \) and \( IWD_{ij}(t-1) \) are, respectively, the numbers of CCDs, ICDs, CWDs and IWDs made by active agents up to time t − 1 and recorded in the ij entry of the KB table. The term S(t − 1) is the total number of decisions made up to time t − 1 over all the entries of the KB table, and it is given by

$$ S\left( {{\text{t}} - 1} \right) \, = \sum\nolimits_{\text{ij}} {\left\{ {CCD_{ij} \left( {{\text{t}} - 1} \right) \, + ICD_{ij} \left( {{\text{t}} - 1} \right) \, + CWD_{ij} \left( {{\text{t}} - 1} \right) \, + IWD_{ij} \left( {{\text{t}} - 1} \right)} \right\}} . $$
(4)

Thus, the cwDF can be written as follows:

$$ cwDF_{\text{ij}} \left( {\text{t}} \right) = cwSR_{\text{ij}} \left( {\text{t}} \right) + v\left( {Desire} \right) - v\left( {Fear} \right), $$
(5)

where the term \( cwSR_{ij}(t) \) is defined in (3). As before, v(Desire) and v(Fear) are the values of the active agent’s Desire and Fear attributes/parameters. For an observed (i, j) situation, an active agent decides to cross the highway only when \( cwDF_{ij}(t) \ge 0 \). Otherwise, the active agent waits and, if the simulation setup allows this, it may additionally move to another crossing point.
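The difference between the two formulas can be illustrated on a hypothetical KB table. In the sketch below (same assumed dictionary layout as before; the counts are invented for illustration and are not simulation data), only IWDs have been recorded for entry (3, 1), so cSR for that entry is 0 while cwSR is 6/10 = 0.6. An agent with v(Fear) = 0.5 and no Desire therefore keeps waiting under cDF but crosses under cwDF, which shows how recorded IWDs enter the cwDF and can offset a high Fear value. Returning 0.0 for an empty table is again an assumption.

```python
from typing import Dict, Tuple

KBTable = Dict[Tuple[int, int], Dict[str, int]]
DECISIONS = ("CCD", "ICD", "CWD", "IWD")


def crossing_success_ratio(kb: KBTable, entry: Tuple[int, int]) -> float:
    """cSR_ij(t), Eq. (2); 0.0 if no CCD has been recorded yet (assumption)."""
    ccd_total = sum(c["CCD"] for c in kb.values())
    c = kb[entry]
    return (c["CCD"] - c["ICD"]) / ccd_total if ccd_total else 0.0


def cw_success_ratio(kb: KBTable, entry: Tuple[int, int]) -> float:
    """cwSR_ij(t), Eq. (3), normalized by S(t - 1) of Eq. (4); 0.0 if S is zero (assumption)."""
    s = sum(c[d] for c in kb.values() for d in DECISIONS)
    c = kb[entry]
    numerator = c["CCD"] - c["ICD"] - c["CWD"] + c["IWD"]
    return numerator / s if s else 0.0


def decide(success_ratio: float, desire: float, fear: float) -> str:
    """Cross iff success ratio + v(Desire) - v(Fear) >= 0, as in Eqs. (1) and (5)."""
    return "cross" if success_ratio + desire - fear >= 0 else "wait"


kb: KBTable = {
    (0, 0): {"CCD": 4, "ICD": 0, "CWD": 0, "IWD": 0},
    (3, 1): {"CCD": 0, "ICD": 0, "CWD": 0, "IWD": 6},  # hypothetical counts
}
entry = (3, 1)
print(decide(crossing_success_ratio(kb, entry), desire=0.0, fear=0.5))  # wait
print(decide(cw_success_ratio(kb, entry), desire=0.0, fear=0.5))        # cross
```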

Depending on the values of the Desire and Fear parameters, the difference v(Desire) − v(Fear) in the DFs (1) and (5) acts as a threshold and determines an agent’s “rationality”, “propensity to risk taking” or “aversion to risk taking”. If the values of Desire and Fear are both 0.0, then all agents base their decisions on cSR or cwSR alone, i.e. the entire population of agents acts “rationally” alike in its decision-making process. However, if both values differ from 0.0, then the agents no longer all act “rationally” alike: at least 25% of the agents (those with only Desire) have a propensity to risk taking and at least 25% (those with only Fear) have an aversion to risk taking.
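The resulting composition of a population can be tabulated directly from the four equiprobable categories. The sketch below assumes that an agent lacking Fear (or Desire) contributes v(Fear) = 0 (or v(Desire) = 0) to the difference v(Desire) − v(Fear); agents with a positive difference are counted as risk takers, those with a negative difference as risk avoiders, and those with zero as acting “rationally”. The function names `population_offsets` and `composition` are illustrative.

```python
from typing import Dict, Tuple


def population_offsets(desire: float, fear: float) -> Dict[float, float]:
    """Share of agents for each value of v(Desire) - v(Fear) over the four categories."""
    categories = [(0.0, 0.0), (0.0, fear), (desire, 0.0), (desire, fear)]  # each w.p. 0.25
    shares: Dict[float, float] = {}
    for d, f in categories:
        offset = d - f
        shares[offset] = shares.get(offset, 0.0) + 0.25
    return shares


def composition(desire: float, fear: float) -> Tuple[float, float, float]:
    """Return (risk takers, risk avoiders, rational) shares of the population."""
    shares = population_offsets(desire, fear)
    takers = sum(s for o, s in shares.items() if o > 0)
    avoiders = sum(s for o, s in shares.items() if o < 0)
    rational = shares.get(0.0, 0.0)
    return takers, avoiders, rational


for pair in [(0.0, 0.0), (0.5, 0.5), (0.25, 0.75), (0.75, 0.25)]:
    takers, avoiders, rational = composition(*pair)
    print(pair, f"takers={takers:.2f} avoiders={avoiders:.2f} rational={rational:.2f}")
```

For (0.5, 0.5), (0.25, 0.75) and (0.75, 0.25) this gives risk-taker/risk-avoider shares of 25%/25%, 25%/50% and 50%/25%, respectively; for (0.25, 0.75) the avoiders split between the offsets −0.75 and −0.5, and for (0.75, 0.25) the takers between +0.75 and +0.5, which is what makes those subpopulations heterogeneous.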

3 Simulation Data and Rate Functions of Agents’ Decisions

To study the effects of the DF on the agents’ performance, data sets were generated for cDF and for cwDF, respectively, with the same setup of the other parameter values.

We consider the model parameters as factors with various levels in the sense of the experimental design paradigm [8]. Some parameters have constant values, while others vary. A detailed description of the parameters and their values is given in [6]; we use the same parameter values as in [6].

There are six parameters/factors whose values vary in the simulation setups of the software. These parameters are: (1) the car creation probability, CCP; (2) the Fear parameter; (3) the Desire parameter; (4) the KB transfer parameter, KBT; (5) random deceleration, RD; and (6) horizontal movement of an active agent, HM.

We measure the agents’ performance by the rate functions of their CCDs, ICDs, CWDs and IWDs, i.e. by the time series RCCD(t), RICD(t), RCWD(t) and RIWD(t), where “R” stands for “rate”. Each value of each of these time series at time t is a mean calculated over n simulation runs. Taking RCCD(t) as an example,

$$ RCCD\left( t \right) = \frac{1}{n}\sum\nolimits_{k = 1}^{n} {\frac{{CCD_{k} \left( t \right)}}{t}} , $$
(6)

where \( CCD_{k}(t) \) is the number of all CCDs up to time t in simulation run k, \( k = 1, \ldots, n \), and n is the number of repeats; in our case n = 30. Thus, \( CCD_{k}(t) \) is the sum of \( CCD_{ij}(t) \) over all the entries of the KB table at time t in simulation run k. The time series RICD(t), RCWD(t) and RIWD(t) are calculated by replacing \( CCD_{k}(t) \) in (6) by \( ICD_{k}(t) \), \( CWD_{k}(t) \) and \( IWD_{k}(t) \), respectively, which are calculated in the same way as \( CCD_{k}(t) \). When HM = 0, i.e. when only one CP is allowed, only one active agent makes a decision per time step, so at most t decisions of any type are made up to time t. Thus, the values of each rate function always lie between 0 and 1.
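Equation (6), together with the one-standard-deviation curves shown in Fig. 1, can be computed from per-run cumulative counts as sketched below. The input layout (one list per simulation run holding \( CCD_{k}(t) \) for t = 1, ..., T), the use of the sample standard deviation for the marker curves, and the name `rate_function` are assumptions; the two runs in the usage example are invented.

```python
import statistics
from typing import List, Sequence, Tuple


def rate_function(cumulative_counts: Sequence[Sequence[int]]) -> List[Tuple[float, float]]:
    """For t = 1..T, return (mean, sd) over runs k of CCD_k(t) / t, as in Eq. (6).

    cumulative_counts[k][t - 1] is the cumulative count CCD_k(t) in run k.
    statistics.stdev needs at least two runs (the paper uses n = 30).
    """
    num_steps = len(cumulative_counts[0])
    result = []
    for t in range(1, num_steps + 1):
        per_run = [run[t - 1] / t for run in cumulative_counts]
        result.append((statistics.fmean(per_run), statistics.stdev(per_run)))
    return result


# Two hypothetical runs of four time steps each (cumulative CCD counts).
runs = [[1, 1, 2, 3], [0, 1, 2, 2]]
for t, (mean, sd) in enumerate(rate_function(runs), start=1):
    print(t, round(mean, 3), round(sd, 3))
```

The same function yields RICD(t), RCWD(t) and RIWD(t) when given the corresponding cumulative counts.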

4 Simulation Results

We compare the decision rate functions of agents using cwDF with those of agents using cDF. We also study how the values of the Fear and Desire parameters and the transfer of the KB affect the agents’ decision rates. Recall that the values of the Fear and Desire parameters determine the threshold each agent uses in its decision-making process. Thus, they determine whether an agent acts “rationally” (i.e., makes its decision based on the success ratio cSR or cwSR only) or is a risk taker or a risk avoider. To illustrate the effects of risk takers and risk avoiders on the performance of agent populations, we discuss the results for the following representative pairs of (Desire, Fear) parameter values: (0.0, 0.0), (0.5, 0.5), (0.25, 0.75) and (0.75, 0.25). For (0.0, 0.0), each population of agents is homogeneous, i.e. all agents act “rationally”. For the other values, the populations are heterogeneous: for (0.5, 0.5) they contain equal numbers of risk takers and risk avoiders, for (0.25, 0.75) fewer risk takers than risk avoiders, and for (0.75, 0.25) more risk takers than risk avoiders. For (0.5, 0.5), the risk takers’ and risk avoiders’ subpopulations are each homogeneous. However, the risk avoiders’ subpopulation is heterogeneous for (0.25, 0.75) and the risk takers’ subpopulation is heterogeneous for (0.75, 0.25), i.e. the agents in these subpopulations use different thresholds in their decisions.

The simulation results are organized as follows. In Fig. 1, the results are displayed for KBT = 0, RD = 0, HM = 0 in the first two columns and for KBT = 1, RD = 0, HM = 0 in the last two columns. The first and third columns display the decision rate functions for cDF, and the second and fourth columns display these functions for cwDF. On each inset, the solid curves display the decision rate functions and the marker curves of the corresponding colour display their one-standard-deviation bands. Each inset contains five graphs of a decision rate function, one for each CCP value, coloured as follows: red for CCP = 0.1, blue for CCP = 0.3, green for CCP = 0.5, black for CCP = 0.7 and yellow for CCP = 0.9. The values of the CWD and ICD rate functions are very small for both DFs, so we do not display them here.

Our simulations show that the values of the rate functions of “rational” (i.e., homogeneous) populations of agents are alike for all CCP values and both DFs, and that the transfer of the KB does not significantly improve the agents’ performance (results not displayed here). This is not the case for heterogeneous populations of agents; see Fig. 1, which displays the CCD and IWD rate functions for (Desire, Fear) parameter values (0.25, 0.75), (0.5, 0.5) and (0.75, 0.25). We notice that for heterogeneous populations of agents: (1) the performance depends on the CCP values and on the DF the agents use; (2) the performance degradation increases with the Fear parameter values, i.e. with the number of risk avoiders and their threshold values. For cwDF, after some transient time the agents’ populations overcome this degradation and their decision rates become like those of the homogeneous population (except RIWD for (0.75, 0.25)); this is not the case for cDF; (3) the variability in performance increases significantly with the Desire parameter values (i.e., with the number of risk takers and their threshold values) for cDF but not for cwDF, and the transfer of the KB reduces this variability for cwDF but not for cDF; (4) the transfer of the KB significantly improves the performance of heterogeneous populations of agents for cwDF but not for cDF: after the KB transfer, the performance for cwDF becomes similar to that of the homogeneous population, whereas for cDF it does not.

Fig. 1.

Mean values (solid curves) of the CCD and IWD rates and their one-standard-deviation bands (marker curves) for various Desire, Fear and CCP parameter values. (Color figure online)

Our simulations show that for heterogeneous populations of agents using cDF, the values of the IWD rate functions are significantly higher than the respective values for the homogeneous populations; as CCP increases and as time progresses, the IWD rates increase monotonically, causing the CCD rates to decrease almost to zero. Thus, for cDF, the values of the CCD rate functions are significantly lower for heterogeneous populations of agents than for homogeneous ones, and also lower than when the agents use cwDF. For cwDF with KBT = 0, after some transient time the values of the CCD rate functions increase monotonically with CCP and with time, asymptotically approaching the values for the homogeneous populations. This monotonic increase results from the monotonic decrease in the values of the IWD rate functions. Thus, when heterogeneous populations of agents use cwDF, the CCD and IWD rate functions behave in the opposite way to when they use cDF. Moreover, transferring the KB improves the agents’ performance when they use cwDF, making it similar to that of the homogeneous population, which is not the case for cDF. Thus, in the simulated scenarios, the use of cwDF yields consistent and predictable agent performance, which is not the case when the agents use cDF.

5 Conclusions and Future Work

The simulation results show that the performance of the homogeneous population of agents is almost the same regardless of which DF the agents use. However, this is not the case for heterogeneous populations of agents, i.e. those including risk takers and risk avoiders: the performance of a heterogeneous population is much better when the agents use cwDF instead of cDF in their decision-making process. Including the assessment of the agents’ WDs in a DF that was previously based only on the assessment of their CDs can mitigate the negative effects caused by the presence of risk takers and risk avoiders in the agents’ population. Transfer of the KB significantly improves the performance of a heterogeneous population of agents when they use cwDF but not when they use cDF. Also, the performance of agents using cwDF is much more consistent across various traffic environments than when they use cDF. In future work, we plan to investigate the agents’ performance in learning to cross the highway with other types of decision-making processes.