Introduction

Autonomous robot is defined as a robot that is able to act adaptively to the unknown or dynamic environments. To acquire the autonomic behaviors, an internal model of robot needs to be designed not only with the ability of information processing in response to external stimuli, but also with the ability of self-control. In another word, adaptive behaviors driven by the internal model depend on the observation of both external environment and internal state. Behaviors or actions of autonomous robots result in the change of its environment; meanwhile, the variable states of the environment keep affecting the internal decision system of robots. So how to design an intelligent and powerful “machine conscious system” to yield constant adaptive behaviors of an autonomous system or robot is considered more and more recently [1].

When multiple autonomous robots (agents) exist in the unknown or dynamic environment, “collective behaviors” or “cooperative behaviors” are expected to realize their common goals. In [2], a survey of multiple-robot system studies is given, and it is advocated that “group architecture”, “resource conflict”, “learning”, and “the origin of cooperation” using biological analogies and fundamental theories of social science are serviceable for the design of autonomous robots. In [3], cooperative behavior acquisition of mobile robots is realized with reinforcement learning which is inspired by behaviorist psychology. We also proposed a neuro-fuzzy reinforcement learning system for swarm formation and adaptive swarm behavior acquisition of the swarm robots in the previous work [47]. Meanwhile, Ide and Nozawa group proposed an internal model in [8, 9], which drives autonomous robots avoiding obstacles and exploring a goal in the unknown environments using a psychological model proposed by Russell in [10, 11]. Russell’s “circumplex model of affect” assigns 8 kinds of major emotions on a two-dimensional map, and by psychological analysis of evidences, affective states such as pleasure, excitement, arousal, distress, misery, depression, sleepiness, and contentment are categorized orderly in a pleasure-arousal space (Fig. 1). The emotion model of robots given by [8] uses the factors of “pleasure” and “arousal”, those are orthogonal axes concerning with all affective states, to decide a series of behavior rules for the autonomous robots. Degrees (values) of “pleasure” and “arousal” control the velocity of robots and obstacles. And these states of other robots are also observed and affect the values of “pleasure” or “sleepiness” of the observer (robot). In [9], the emotion model is applied to cooperative work acquisition of autonomous robots. However, when we executed the simulation of goal-exploration problem using the model, some practical problems were found such as dead-lock, limitation of vision depth, and restriction of speed. So we proposed an improved internal model in [12] with a new factor: “curiosity”, which is not included in Russell’s mental map.

Fig. 1
figure 1

A circumplex model of affect proposed by Russell [10]. Eight affect concepts are arranged in a circular order, pleasure and arousal dimensions provide main directions of the mental map

In this paper, we intend to compare and analyze Ide and Nozawa’s model and our improved model by goal-exploration problem simulation and evaluation more in detail. Two kinds of unknown environments for two affection-driven robots are designed: a simple environment with one goal to explore and a complex environment with three goals. Different delay times of the start of robots are also set to observe the change of behaviors and internal states of robots. The rest of the paper is organized as follows. In section “An Improved Internal Model of Autonomous Robot”, conventional model and improved model are described. Comparison of goal-exploration simulations with the two models in the different environment is presented in section “Simulation Experiments”. Finally, the conclusion is given in section “Conclusion”.

An Improved Internal Model of Autonomous Robot

Russell’s emotion model [10] is shown in Fig. 1. The main difference from traditional psychological analyses of affect is that only pleasure and arousal dimensions are stressed in the model; meanwhile, conventionally a set of dimensions such as displeasure, distress, depression, excitement, and so on were considered independently. According to 28 stimulus words presented to 36 young peoples, [10] described the emotion categories in the circular ordering. In [11], it is pointed that the circumplex model suggested a clear structure and has large heuristic value for the evaluation of emotion. The group of Ide and Nozawa used the concept of the circumplex model to design an emotion model to evoke interactions or cooperative behaviors of multiple autonomous robots [8, 9]. Furthermore, Oudeyer and Kaplan used a curiosity model that considered the influence of time factor to raise the motivation of adaptive activities of robots [13]. In this Section, the emotion model of robots is introduced at first and an improved internal model including the emotional concept and a novel calculation method of curiosity is proposed.

A Conventional Emotion Model for Robots

In a goal-exploration problem, robots move to search the goal and avoiding to obstacles and other robots in the unknown 2-D environments. In the conventional emotion model [8], information of local environment around the robot is obtained by the observation and the information determines the degree of emotional vectors: “pleasure” and “arousal”, which cause the motion of robot.

First of all, the position vector \( \user2{R}_{\user2{i}} (\user2{t}) \) of robot i at time t is given by:

$$ \user2{R}_{\user2{i}} (\user2{t} + 1) = \user2{R}_{\user2{i}} (\user2{t}) + \user2{V}_{\user2{i}} (\user2{t}) $$
(1)

where t: step (time); \( \user2{V}{}_{\user2{i}}(\user2{t}) \): velocity vector of robot i at time t .

The velocity \( \user2{V}{}_{\user2{i}}(\user2{t}) \) of robot i is decided by the degree of “pleasure” of itself \( Pv{}_{ij} \)and other’s \( Pv{}_{ji} \):

$$ \user2{V}_{\user2{i}} (\user2{t} + 1) = \user2{V}_{\user2{i}} (\user2{t}) - l_{1} \cdot \sum\limits_{j} {Pv_{ji} } + l_{2} \cdot \sum\limits_{i} {Pv_{ij} } . $$
(2)

where \( l_{1} ,l_{2} \): emotional influence parameters; \( Pv{}_{ji} \): influence from robot j to robot i; \( Pv{}_{ij} \): influence from robot i to robot j.

The negative coefficient of \( Pv{}_{ji} \) results in avoiding the crowd of robots and supports robot is behavior of exploration.

The influences between the robots are in direct ratios to the distances (Euclidean) between them, calculated by:

$$ Pv_{ji} = \frac{{Pv{}_{j} \cdot r{}_{ji}}}{{\left| {r{}_{ji}} \right|}}, $$
(3)
$$ Pv_{ij} = \frac{{Pv{}_{i} \cdot r{}_{ij}}}{{\left| {r{}_{ij}} \right|}}, $$
(4)

respectively, where \( r_{ij} ,r_{ji} \) are distances (Euclidean) between robot i and j. The degree of “pleasure” of each robot is decided by the depth of the vision:

$$ Pv(t + 1) = Pv(t) + e_{p} \cdot R_{Pv} \cdot Pv(t) , $$
(5)
$$ e_{p} = \left\{ \begin{gathered} - 1\quad ( {\text{where}}\;d_{o} > D, \, d_{r} > D )\hfill \\ 1\quad ( {\text{where}}\;d_{o} \le D, \, d_{r} \le D )\hfill \\ \end{gathered} \right. . $$
(6)

\( R_{pv} \): rate of the change of “pleasure” \( (0 \le R_{pv} \le 1) \); \( d_{o} \): distance (scalar) from robot i to the nearest obstacle; \( d_{r} \): distance (scalar) from robot i to the nearest robot; \( D \): threshold value (scalar) of depth of the vision; \( e_{p} \): positive coefficient.

The threshold value of vision depth decides the change of “pleasure” and is decided by the degree of “arousal” \( Av \):

$$ D = \alpha \cdot Av + K. $$
(7)

where \( Av \): degree of “arousal”; \( K \): bias of the vision depth; \( \alpha \): positive coefficients.

The degree of “arousal” \( Av \) is also influenced by the situation of the existence of other robots beyond its vision:

$$ Av(t + 1) = Av(t) + e_{a} \cdot R_{Av} \cdot Av(t), $$
(8)
$$ e_{a} = \left\{ \begin{gathered} - 1\quad ({\text{where}}\;d_{r} > D) \hfill \\ 1\quad ({\text{where}}\;d_{r} \le D) \hfill \\ \end{gathered} \right., $$
(9)

where \( \, e_{a} : \) coefficients; \( R_{Av} \): rate of the change of “arousal” \( (0 \le R_{Av} \le 1 ) \).

According to definitions described above, robots move in the environment as follows:

  1. (1)

    Local information is obtained within the environment of vision;

  2. (2)

    The degree of “arousal” is influenced by the distance to other robots (Eqs. 8, 9);

  3. (3)

    A robot comes up to other robots that are “pleasure” appearing within the vision and comes off to ones in the opposite case (Eq. 2);

  4. (4)

    A robot comes up to other robots, when it is in the state of “pleasure”, and comes off to the others in the opposite case (Eq. 2);

  5. (5)

    The degree of “pleasure” is reduced, when obstacles or other robots are observed, and increased in the opposite case (Eqs. 5, 6);

  6. (6)

    The degree of “arousal” is increased, when other robots are observed, and reduced in the opposite case (Eqs. 8, 9).

According to these rules, patterns of behaviors of robots appear as follows:

  1. (1)

    Robots in “pleasure” state attract each other and come up to each other closely (Fig. 2a);

    Fig. 2
    figure 2

    Behavior patterns of robots driven by an emotion model

  2. (2)

    A robot in “pleasure” state moves to a direction to toward the goal and causes others to follow it (Fig. 2b);

  3. (3)

    Robots in “displeasure” state go away from each other (Fig. 2c).

An Improved Internal Model for Autonomous Robots

Using the conventional model described in the section “A Conventional Emotion Model for Robots”, we performed simulation experiments and observed kinds of results such as robots successfully attracted each other, avoided to obstacles and achieved at the goal(s) of exploration, or failed to attract each other, failed to achieve on the multiple goal areas in more complicated environment.

The reasons of the failed cases may be considered as follows:

  1. (1)

    Bias of the vision was set inadequately. Too large value of K caused the internal state of robot dropped into “displeasure” easily.

  2. (2)

    The time that robots influent each other was too short because of too high velocity.

  3. (3)

    There was a trend that the degree of pleasure reduced more easily than increased.

  4. (4)

    Low degree of pleasure of all robots caused low degree of arousal of robots and the case resulted in all robots dropped into the state of sleepiness, the behaviors of exploration disappeared.

To overcome these problems and to raise the motivation of exploration, in [12], we proposed to add new rules into the emotion model and adopt a new mental factor “curiosity” into the calculation of the velocity vector as following:

  1. (1)

    Limit bounds of the depth of vision: K min < K < K max;

  2. (2)

    Limit a maximum value of velocity: V i (t) < V max;

  3. (3)

    Make the change of emotion factor “pleasure” to be dynamical, i.e., using Eqs. 10 and 11 instead of Eq. 5.

$$ Pv(t + 1) = N \cdot \frac{1}{{1 + { \exp }( - \beta \cdot x(t))}}, $$
(10)
$$ x(t) = \mu \cdot { \sin }(\pi (Pv(t) + e_{p} \cdot M)), $$
(11)

where \( \mu , \, M , { }N, \, \beta \) are positive parameters.

  1. (4)

    “Curiosity” means two situations concerning with the change of the internal state of robots:

    1. (i)

      Robot i keeps to search the goals k (k = 1, 2,…,k,…,K) before it arrives at them and after it arrives at one goal k, then its “curiosity” to the goal is reduced eventually;

    2. (ii)

      During robot i exploring the environment, when it crushes to obstacles, its “curiosity” is reduced eventually.

Equation 12 defines the “curiosity”, and Eq. 13 builds an improved internal model of autonomous robots:

$$ Cv_{ik} (t + 1) = \left\{ \begin{gathered} I_{k} - \lambda_{1} \cdot Cv_{ik} (t)\quad {\text{goal}} = 1,2,3, \ldots ,K \hfill \\ I_{k} - \lambda_{2} \cdot Cv_{ik} (t)\quad {\text{obstacles}} \hfill \\ 0\quad {\text{otherwise}} \hfill \\ \end{gathered} \right., $$
(12)
$$ \user2{V}_{\user2{i}} (\user2{t} + 1) = V_{i} (t) - l_{1} \cdot \sum\limits_{j} {Pv_{ji} } + l_{2} \cdot \sum\limits_{i} {Pv_{ij} } + l_{3} \cdot \sum\limits_{i} {Cv_{ik} } . $$
(13)

where \( Cv(t) \) is the degree of “curiosity” in the improved internal model, \( I{}_{k} \) is the positive parameter for different goal k and coefficients \( \lambda_{1} , \, \lambda_{2} \), \( l_{3} > 0 \).

Simulation Experiments

To exam the improved internal model proposed in section “An Improved Internal Model for Autonomous Robots”, computer simulation experiments were performed using two kinds of environments for multiple robots goal exploration. Goal position is supposed to be known by the robots; however, the route to the goal (or goals) is unknown. Robots are required to move to goal area(s) avoiding crush to the obstacle or each other. The movements of robots are the behaviors driven by the internal models using affective factors.

In section “Single Goal in Simple Environment”, a simple environment with single goal was used for two autonomous robots; meanwhile, in section “Multiple Goals in Complicated Environment”, a complicated environment (maze-like) with multiple goals was used. The number of robots was increased to 5 in simulations; conveniently, here, we show the results of two robots for analyzing the effectiveness of the theory.

Single Goal in Simple Environment

The size of a two-dimensional exploring space is 500 × 500, an obstacle exists in the center of the square, and two robots start from the corner of left-down to search a goal area located at the corner of right-up: the environment of simulation is shown in Fig. 3. To compare with simulations reported in [8], two cases of timing of start of robots were also executed in our experiments:

Fig. 3
figure 3

A simple environment with an obstacle and a goal

  1. (i)

    Two robots started at the same time;

  2. (ii)

    One started after another one started 200 steps. All parameters were set as shown in Table 1.

    Table 1 Parameters used in the simulation of this section
  3. (i)

    Simulation results of two robots started at the same time are shown in Fig. 4. The total exploration time using the improved model was 134 steps (Fig. 4b), which is less than the time of conventional model 151 steps (Fig. 4a). The change of pleasure degree of the robots showed similar as shown as in Fig. 5.

    Fig. 4
    figure 4

    Tracks of two robots started at the same time exploring a goal in a simple unknown environment

    Fig. 5
    figure 5

    Comparison of the change of the degrees of pleasure of two robots started at the same time during exploring a goal in a simple unknown environment

  4. (ii)

    Simulation results of two robots started at the same time are shown in Fig. 6. The total exploration time using the improved model was 335 steps (Fig. 6b), which is more than the time of conventional model 291 steps (Fig. 6b). The change of pleasure degree of the robots showed similar as shown as in Fig. 7.

    Fig. 6
    figure 6

    Tracks of two robots started at the different time exploring a goal in a simple unknown environment

    Fig. 7
    figure 7

    Comparison of the change of the degrees of pleasure of two robots started at the different time during exploring a goal in a simple unknown environment

Multiple Goals in Complicated Environment

The size of a two-dimensional exploring space is 500 × 500, multiple obstacles exist in the different positions of the square, two robots start from two different positions to search three goal areas located at the different positions: the environment of simulation is shown in Fig. 8. Two cases of timing of start of robots were also executed in this experiment:

Fig. 8
figure 8

A complicated exploration environment with multiple obstacles and goals

  1. (i)

    Two robots started at the same time;

  2. (ii)

    One started after another one started 200 steps.

All parameters were set as shown in Table 2, and the limitation of steps of a trial was set to 2,000 steps.

Table 2 Parameters used in the simulation of this section
  1. (i)

    Simulation results of two robots started at the same time are shown in Fig. 9a, b. Robots with conventional model stopped exploration at step 15, for the kinds of reasons such as obstacles and multiple goals (Fig. 9a); meanwhile, in the occasion of the improved model showed active exploration and reached at all three goals within 2,000 steps limitation of time (Fig. 9b).

    Fig. 9
    figure 9

    Comparison of tracks of two robots driven by the conventional model and the improved model (started at the same time exploring three goals in a complicated unknown environment)

  2. (ii)

    Simulation results of two robots started at the different time are shown in Fig. 10a, b. Robots with conventional model also stopped exploration at step 214, failed to arrive at any goal (Fig. 10a). Robots with the improved model also showed active exploration; however, one failed to reach at Goal 3 in the limitation of 2,000 steps (Fig. 10b).

    Fig. 10
    figure 10

    Comparison of tracks of two robots driven by the conventional model and the improved model (started at the different time exploring three goals in a complicated unknown environment)

The change of the degree of pleasure, as curves depicted in Figs. 11 and 12 respective to Figs. 9 and 10, showed the difference of the internal state between conventional model and improved model. More dynamical activity was observed in the case of our improved model.

Fig. 11
figure 11

Comparison of the change of “pleasure” degree of two robots driven by the conventional model and the improved model (started at the same time exploring three goals in the complicated unknown environment)

Fig. 12
figure 12

Comparison of the change of “pleasure” degree of two robots driven by the conventional model and the improved model (started at the different time exploring three goals in the complicated unknown environment)

Discussions

According to the simulation results reported in sections “Single Goal in Simple Environment” and “Multiple Goals in Complicated Environment”, the improved model showed its effectiveness in different environments and its superiority to the conventional model in the situation of exploring multiple goals in the complicated environment. A comparison of these results is given by Table 3.

Table 3 Comparison of results of simulations between conventional model and improved model

Conclusion

To deal with dynamical complex environment and acquire collaborative behaviors of autonomous robots, “mental” states of robots play important roles during the decision process of actions. An emotion-curiosity-driven behavior model is proposed for the cooperative exploration activity of multiple robots. Machine consciousness including pleasure, arousal, and curiosity motivates robots to control the velocities of exploration movement and keep the activeness of exploration in the unknown environments. Kinds of simulations were executed, and the results showed the effectiveness of the proposed model. Other intelligent factors such as memory function and learning ability which often adopted in systems of biological analogy are expected to be composed with the internal model to raise the exploration efficiency in the future.