1 Introduction

The success of deep learning models in AI (LeCun et al. 2015) is arguably leading to a shift in the analysis of cognitive architectures within cognitive science and AGI research. Traditionally, such architectures (e.g., Minsky 2006, Newell 1990) focused on the multitude of observable or inferred functionality of cognitive agents, and proposed structures and mechanisms for their implementation. The complexity of human cognition is thought to emanate from a large set of implemented functions and structural components, which have to be modeled and implemented by the researcher.

A different perspective is given by the idea of general learning systems (e.g., Hutter 2005), which proposes that the neocortex and hippocampus allow for general hierarchical function approximation to express complex emergent behaviors and perceptual abilities, with other parts of the brain supplying infrastructure for learning, reward generation, differential attention control, routing of information between cortical areas, and interfacing with perceptual and motor systems (see Marcus et al. 2014).

This perspective suggests that cognition and interpersonal differences are largely defined by motivational and attentional responses to environmental stimuli. If we understand intelligent agents primarily as learning systems, we need to understand the structures that shape this learning and give rise to perceptual models, imagination, reasoning and problem solving, decision making, reflection, social interaction, and so on. Unlike most current AI learning systems, human behavior is not driven by a single reward or unified utility function, but by a complex system of physiological, cognitive and social needs. For the implementation of a fully autonomous AI agent in a complex environment, we will need to identify a set of needs that spans the full spectrum of relevant behavioral tendencies. A suggestion for such a set has been introduced by the Psi theory (Dörner 1999), and later extended in the MicroPsi model (Bach 2012b, 2015), which describes a detailed framework of such needs, reward generators and cognitive modulators. The MicroPsi motivation model forms the core of the cognitive architecture MicroPsi (Bach 2009), but has also been used in the cognitive architecture OpenCog as OpenPsi (Cai et al. 2012).

2 From Needs to Behavior

The MicroPsi model sees a cognitive architecture as a system to regulate and control a complex organism (or comparable agent). Perception, learning and imagination are tools in the service of that regulation. The demands of the agent (such as sustenance, rest, social embedding) are represented as needs, and signaled to the cognitive processes as urges, which are proportional to the strength (weight) of the respective need. A need can be understood as a tank that runs dry over time, and which can be filled by satisfaction events (consumptive actions) or emptied by aversive events.

In MicroPsi, memory content is represented in a neuro-symbolic formalism, and representations of situations, objects and actions may be associated with urge signals via learning. The presence of an urge sends activation into the representations, and thus primes content in memory and perception, so attention and processing resources of the agent are directed at them. The strength of urges modulates cognition to the situation at hand, giving rise to the configurations that we call affective states.

The satisfaction or frustration of a need generates reinforcement signals that (together with sparseness and stability criteria) give rise to reinforcement learning and the gradual formation of complex processing structures enabling perception, mental simulation and high-level cognition. Finally, urge signals inform the decision making of the agent, direct its impulses and give rise to goal-directed behavior.

2.1 Needs

Needs reflect the demands of the organism, for instance, the requirement to maintain a suitable body temperature, to obtain rest, or to find sustenance. The physiological needs give rise to foraging, feeding, resting, pain avoidance and so on.

Social behaviors are driven by social needs, such as a need to be recognized and reaffirmed as a member of a social group (affiliation), or to reduce another agent’s suffering (nurturing), to conform to internalized norms (legitimacy), to obtain a position in a social hierarchy (status), to regulate a social environment (justice), to engage in courtship (affection), and to experience intimacy.

Cognitive needs direct skill acquisition (competence), exploration (uncertainty reduction) and the formation of mental representations (aesthetics).

All goals of the agent correspond to the satisfaction of a need, or the avoidance of its frustration. This reflects the direction of behavior by approach goals and avoidance goals (Carver and Scheier, 1998; Elliot and Church 1997; Higgins, 1996).

A need is defined by an extensive parameter set:

Each need is characterized by its current value \( v_{t} \in \left[ {0, 1} \right] \) and a weight \( \omega \, \in \,{\mathbb{R}}^{ + } \), which specifies how strongly the need registers in comparison to others (\( v_{0} \) is the initial value of the need). An urge with an urge strength \( \alpha_{t} \, \in \,\left[ {0, 1} \right] \) represents the difference between the target state (a fully satisfied need) and the current state of the need:

$$ \alpha_{t} = \omega \left( \left[ 1 - v_{t - 1} \right]_{0}^{1} \right)^{2} $$

In addition to the strength of an urge, a need is characterized by the urgency \( \beta_{t} \in \left[ {0, 1} \right] \) to address it. Urge strength and urgency are separate values, because sometimes, even weak needs might have a short time window for satisfying them, which should trigger the activity of the agent. The urgency is determined by the amount of time left to realize satisfaction, either because a crucial resource is about to drop to a critically low level, or because the dynamics in the environment require immediate action.

$$ \beta_{t} = \omega \left( \left[ \frac{k - \text{remaining time}_{t}}{k} \right]_{0}^{1} \right)^{2} $$
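The two formulas above can be sketched in Python as follows. This is a minimal illustration rather than the MicroPsi implementation: the function names and the `clamp` helper are hypothetical, the square is assumed to apply to the clamped term, and `k` is assumed to be a positive time horizon in the same units as the remaining time.

```python
def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Restrict x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def urge_strength(weight: float, previous_value: float) -> float:
    """Urge strength: weight times the squared, clamped deficit (1 - v)."""
    return weight * clamp(1.0 - previous_value) ** 2


def urgency(weight: float, remaining_time: float, k: float) -> float:
    """Urgency: grows as the remaining time shrinks relative to k.

    k is assumed to be a positive time horizon (an illustrative reading).
    """
    return weight * clamp((k - remaining_time) / k) ** 2


if __name__ == "__main__":
    # A half-satisfied need with weight 1 and two time units left of ten.
    print(urge_strength(weight=1.0, previous_value=0.5))
    print(urgency(weight=1.0, remaining_time=2.0, k=10.0))
```

A fully satisfied need (value 1) yields no urge, and a remaining time at or beyond the horizon `k` yields no urgency.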

The value of the need represents the inverse of a systemic demand. Satisfying the need leads to an increase of the value, proportional to the gain \( \text{g}\, \in \,\left[ {0, 1} \right] \) of the need; frustrating it leads to a reduction of the value, proportional to the loss \( \ell \, \in \,\left[ {0, 1} \right] \).

Satisfaction is provided by a consumption event (such as eating for the sustenance need, reaping praise for the affiliation need, or successfully acquiring a skill for the competence need). Demands can also increase due to aversive events (such as suffering an injury for the pain avoidance need, or failing at a task for the competence need), which leads to a frustration of the need, proportional to its loss factor. A strong gain or loss means that a consumption or frustration event has a stronger effect on the value and reward signal of that need.

Some needs may also be satisfied or frustrated (usually to a lesser degree) by an anticipated event. For instance, anticipating future uncertainty may frustrate the need for uncertainty reduction. The strength of the gain and loss due to anticipated events is given by the factors \( {\hat{g}} \in \left[ {0, 1} \right] \) and \( \hat{\ell } \in \left[ {0, 1} \right] \). Most needs also deplete over time on their own, depending on their decay factor, so they need to be regularly replenished. For instance, the demands for sustenance, rest and affiliation will increase over time, and thus require the agent to satisfy them repeatedly. In each moment, the agent may experience a change in its current demand, \( \delta_{t} \). Let \( \delta_{t}^{ + } \) be the positive change at time t, \( \delta_{t}^{ - } \) the negative change, and \( \widehat{{\delta_{t}^{ + } }} \) and \( \widehat{{\delta_{t}^{ - } }} \) anticipated positive and negative changes. The new value of the need can be calculated as:

It seems natural to assume that the value of a need decays logistically, for instance using a sigmoidal function that will decrease slowly at first, then rapidly, before decreasing slowly again, such as

$$ \sigma \left( x \right)\text{ := }1 - \frac{1}{{1 + e^{{ - 12\left( {x - 1/2} \right)}} }} $$

For \( 0 < y < 1, \) we can calculate the inverse of this function as

$$ \sigma^{ - 1} \left( y \right)\text{ := }\frac{1}{12}{ \log }\left( {\frac{1 - y}{y}} \right) + \frac{1}{2} $$

so that we can determine how far the decay of a variable has progressed, based on its current value.
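As a rough sketch, the decay curve and its inverse translate directly into Python. The function names are hypothetical, and `decay_step` is an added illustration of one possible use of the inverse (recover the progress from the current value, advance it, and map it back); it is not taken from the model description.

```python
import math


def decay_curve(x: float) -> float:
    """sigma(x) = 1 - 1 / (1 + exp(-12 (x - 1/2))): near 1 at x = 0, near 0 at x = 1."""
    return 1.0 - 1.0 / (1.0 + math.exp(-12.0 * (x - 0.5)))


def decay_progress(y: float) -> float:
    """Inverse of decay_curve for 0 < y < 1: how far the decay has progressed."""
    if not 0.0 < y < 1.0:
        raise ValueError("y must lie strictly between 0 and 1")
    return math.log((1.0 - y) / y) / 12.0 + 0.5


def decay_step(value: float, step: float) -> float:
    """Advance the decay of a need value by `step` along the curve (hypothetical helper)."""
    return decay_curve(decay_progress(value) + step)


if __name__ == "__main__":
    value = 0.9
    for _ in range(5):
        value = decay_step(value, 0.1)
        print(round(value, 4))
```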

The differences in parameter configurations of the needs (especially weight, gain, loss, and decay factor) can be used to model interpersonal variance in motivation and thereby different personality properties. For instance, high weight and loss for affiliation, combined with a low weight and gain for competence, would lead to high agreeableness. High weights and gains for competence, uncertainty reduction and aesthetics would lead to high openness. Low decay and high loss for affiliation may lead to introversion. High weights and loss on competence and uncertainty reduction will lead to high conscientiousness, and high loss on competence, affiliation, and uncertainty reduction may lead to neuroticism (Bach 2012a).

Pleasure and pain are generated by changes in values of a need. They provide rewards for the agent’s actions, and are used as reinforcement signals for learning. Satisfying a need leads to a pleasure signal that is proportional to how much the value of the need increases, the gain factor \( \text{g} \) of the need, and its pleasure sensitivity. The anticipation of future consumption can also generate a pleasure signal, proportional to the imaginary pleasure sensitivity. Pleasure signals decay over time, according to a sigmoid function with a pleasure decay:

Pain signals are generated as a reaction to the depletion of a need.

In addition, depleted needs (such as an empty stomach) may also cause a continuous pain signal. For instance, using the following function with \( \theta = 0.10 \) will increase the pain value from 0 to 1, starting when the value is depleted to 10%:

2.2 Events and Consumptions

Anticipated events are expected situations in the inner or perceptual world of the agent. They become relevant if they are associated with the expectation of a consumption event \( {\mathcal{C}} \), i.e. with satisfying or frustrating a need, and can be defined as

$$ \mathcal{E} := \langle \mathcal{C}, er, c, s, \varepsilon \rangle $$

Events can just happen, or they can be chosen as goals, which are actively pursued by the agent and are either appetitive (positive reward) or aversive (negative reward). In the latter case, the goal is to avoid the event. The motivational relevance of any event is given by its expected reward \( er \in \left[ { - 1, 1} \right] \), which can be positive (a need is satisfied) or negative (frustration). The certainty \( c \in \left( {0, 1} \right] \) of the event specifies the confidence of the agent that the event will happen. The skill (epistemic competence) \( s \in \left[ {0, 1} \right] \) specifies the chance that the agent can reap the benefits or avoid the dangers of the event. The expiration time \( \varepsilon_{t} \in {\mathbb{R}}^{ + } \) determines the duration until the event is expected to happen. In every time step, or as the result of a belief update, the remaining time is updated:

$$ \varepsilon_{t} = \left[ {\varepsilon_{t - 1} - {\text{duration}}\left( {t, t - 1} \right)} \right]_{0}^{\infty } $$

(Note that this is a simplification; a more accurate model should capture expected reward, epistemic competence and expiration time as distributions).
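The event tuple and the update of its remaining time can be sketched as a small Python data class. The class name, the field names and the use of a plain string to stand in for the consumption \( {\mathcal{C}} \) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class AnticipatedEvent:
    """Anticipated event <C, er, c, s, epsilon>; all names are illustrative."""
    consumption: str        # placeholder identifier for the consumption C
    expected_reward: float  # er in [-1, 1]
    certainty: float        # c in (0, 1]
    skill: float            # s in [0, 1]
    expiration: float       # epsilon >= 0, time until the event is expected

    def advance(self, elapsed: float) -> None:
        """Reduce the remaining time by the elapsed duration, never below zero."""
        self.expiration = max(0.0, self.expiration - elapsed)


if __name__ == "__main__":
    event = AnticipatedEvent("eat", expected_reward=0.5, certainty=0.9,
                             skill=0.8, expiration=3.0)
    event.advance(1.0)
    print(event.expiration)
```

Storing point values for reward, skill and expiration mirrors the simplification noted above; a fuller model would hold distributions instead.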

Consumptions are the parts of an event that affect needs of an agent. Each consumption either satisfies or frustrates a need (with a positive or negative total reward).

$$ \mathcal{C} := \langle \mathcal{N}, r_{t}, r^{total}, r^{max}, rd, discount \rangle $$

When the associated event is triggered, the consumption continuously generates a certain amount \( \left( {r_{t} \in {\mathbb{R}}} \right) \) of satisfaction or frustration for its associated need \( {\mathcal{N}} \), over a certain reward duration \( rd \in {\mathbb{R}}^{ + } \), limited to a certain maximum reward \( r^{max} \in {\mathbb{R}} \) per time step, until the total reward \( r^{total} \in {\mathbb{R}} \) is reached.

In an organism, reward signals are delivered via neurotransmitters, and their release typically follows a right-skewed distribution. We may approximate this, for instance, as a \( \chi \) distribution with two degrees of freedom:

$$ signal\left( t \right): = t e^{{ - \frac{1}{2}t^{2} }} $$

This way, the signal peaks at \( t = 1 \) with a value of 0.6, and at \( t = 3.5 \), we have delivered 99.78% of the signal. In a discrete simulation, the actual reward value delivered at a time step \( t \) for a reward signal triggered at \( t_{0} \) can be approximated using

$$ t_{1} = \left( t - 1 - t_{0} \right) \frac{3.5}{rd}\,\text{duration}\left( t, t - 1 \right);\quad t_{2} = \left( t - t_{0} \right) \frac{3.5}{rd}\,\text{duration}\left( t, t - 1 \right) $$
$$ r_{t} = \left[ \frac{3.5\, r^{total}}{rd} \int_{t_{1}}^{t_{2}} x\, e^{-\frac{1}{2}x^{2}}\, dx \right]_{-r^{max}}^{r^{max}} = \left[ \frac{3.5\, r^{total}}{rd} \left( e^{-\frac{1}{2}t_{1}^{2}} - e^{-\frac{1}{2}t_{2}^{2}} \right) \right]_{-r^{max}}^{r^{max}} $$
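A minimal Python sketch of this discretization is given below. It assumes that the integral runs from the rescaled start of the time step to its rescaled end, that time steps are counted as integers with \( t > t_{0} \), and that `step_duration` stands in for \( \text{duration}(t, t - 1) \); the function and constant names are hypothetical.

```python
import math

# Rescaled time by which about 99.78% of the signal has been delivered.
SIGNAL_SPAN = 3.5


def reward_at_step(t: int, t0: int, r_total: float, rd: float,
                   r_max: float, step_duration: float = 1.0) -> float:
    """Reward delivered during the step ending at t, for a signal triggered at t0."""
    scale = SIGNAL_SPAN / rd * step_duration
    start = (t - 1 - t0) * scale  # rescaled start of the time step
    end = (t - t0) * scale        # rescaled end of the time step
    # Closed-form integral of x * exp(-x^2 / 2) from start to end.
    released = math.exp(-0.5 * start ** 2) - math.exp(-0.5 * end ** 2)
    raw = SIGNAL_SPAN * r_total / rd * released
    # Limit the reward to the maximum reward per time step.
    return max(-r_max, min(r_max, raw))


if __name__ == "__main__":
    for step in range(1, 8):
        print(step, round(reward_at_step(step, 0, r_total=1.0, rd=7.0, r_max=1.0), 4))
```

Because the closed form telescopes, the per-step values over the full reward duration add up to the prefactor times \( 1 - e^{-3.5^{2}/2} \), as long as no step is clipped by \( r^{max} \).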

The reward value of an active consumption will change the value of the corresponding need by

$$ \delta_{t}^{ + } = \left[ {r_{t}^{{\mathcal{C}}} } \right]_{0}^{\infty } ;\,\delta_{t}^{ - } = \left[ {r_{t}^{{\mathcal{C}}} } \right]_{ - \infty }^{0} $$

as well as create the corresponding pleasure and pain signals. Pleasure and pain signals indicate the performance of the agent and can be used to drive reinforcement learning. Learning associates the respective need indicator with the current situation representation or action, to establish an appetitive or aversive goal. It can also be used to associate the current situation or action with preceding elements in protocol memory, to establish procedural memory. The learning signal is derived by multiplying the respective pleasure or pain signal with the weight of the affected need.

2.3 Managing Expectations

Perception and cognition trigger belief updates, which manifest as the establishment, change, execution or deletion of anticipated events. At each point in time, the agent maintains a list of anticipated events. If the agent visits one of these events during an update (such as establishing or changing an event, or reflecting upon it during planning), it generates anticipation rewards, depending on the certainty \( c \) of the event and the epistemic competence (skill) \( s \) of the agent to manage or avoid it:

$$ \widehat{{\delta_{t}^{ + } }} = c s\left[ {\frac{{r_{t}^{{\mathcal{C}}} }}{{1 + discount \varepsilon_{t} }}} \right]_{0}^{\infty } ;\,\widehat{{\delta_{t}^{ - } }} = c\left( {1 - s} \right) \left[ {\frac{{r_{t}^{{\mathcal{C}}} }}{{1 + discount \varepsilon_{t} }}} \right]_{ - \infty }^{0} $$

Here, we are using hyperbolic discounting to ensure that the agent is progressively less concerned about the effects of an event the further it lies in the future. The effects of anticipated events on need satisfaction and pleasure/pain generation only manifest as the result of the agent focusing on them in the respective time step, but the actual signal generation may extend over a longer period.
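The anticipation rewards can be sketched in Python as follows. The function name and argument names are illustrative; `reward` stands for the reward value \( r_{t}^{{\mathcal{C}}} \) of the consumption and `expiration` for the remaining time \( \varepsilon_{t} \).

```python
from typing import Tuple


def anticipation_rewards(reward: float, certainty: float, skill: float,
                         discount: float, expiration: float) -> Tuple[float, float]:
    """Return (anticipated positive change, anticipated negative change)."""
    # Hyperbolic discounting: events further in the future count for less.
    discounted = reward / (1.0 + discount * expiration)
    # Anticipated gains scale with certainty and skill.
    positive = certainty * skill * max(0.0, discounted)
    # Anticipated losses scale with certainty and the lack of skill.
    negative = certainty * (1.0 - skill) * min(0.0, discounted)
    return positive, negative


if __name__ == "__main__":
    print(anticipation_rewards(1.0, certainty=0.8, skill=0.5,
                               discount=1.0, expiration=3.0))
    print(anticipation_rewards(-1.0, certainty=0.5, skill=0.25,
                               discount=1.0, expiration=1.0))
```

With a fixed discount factor, doubling the remaining time less than halves the discounted reward, which is the characteristic shape of hyperbolic discounting.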

Changes in an expected event may concern higher or lower certainty in its occurrence, a difference in remaining time, a different expected reward, or a different ability to deal with it (skill). In such cases, \( \delta \) is computed using the change in the expected reward. If expected events manifest, they generate a reward for the need for certainty. If they do not, they frustrate the need for certainty (which increases the probability for the agent to engage in exploration behavior). If events are goals of the agent, their occurrence or failure will also satisfy or frustrate the need for competence.

3 Modulators

Motivation determines the direction of cognition and the relevance of its content. In contrast, modulation adapts the parameters of cognition to the situation at hand. Examples of such modulators are arousal, valence and attentional focus, which may vary in response to changes in the external and internal environment of the agent. A configuration of modulators amounts to an affective state.

Each modulator \( {\mathcal{M}} \) has a current value and five parameters that account for individual variance between subjects: the baseline is the default value of the modulator; min and max \( \in {\mathbb{R}} \) are the lower and upper bounds of its changes; the volatility defines the reaction to change; and the decay time defines how long it takes until the modulator returns to its baseline.

Modulators do not assume a target value \( \tau \) instantly, but according to their volatility:

We currently use six modulators. Valence, arousal and dominance correspond to the pleasure, arousal and dominance of Mehrabian’s PAD model (Mehrabian 1980), and were first suggested as valence, arousal and tension by Wundt (1910). Dominance is also sometimes called agency; it describes how much the agent is in control of a situation, as opposed to being in a passive state with reduced metacognition. In addition, MicroPsi defines three attentional modulators: resolution level, focus and exteroception. Modulators change depending on aggregates of the current values and changes of the needs. We determine the value of these aggregates using marginal sums (because combining urgency or pain signals is not simply additive, but approaches a limit given by the signaling pathways of the organism):

$$ \text{marginal sum}\left( V, limit \right) := \sum\nolimits_{n = 0}^{\left| V \right|} S_{n} \quad \text{with} \quad S_{n} := \frac{limit - S_{n - 1}}{limit} v_{n}; \quad limit = \max \left( \left\{ \omega \mid \omega \in weights_{\mathfrak{N}} \right\} \right) $$

Valence represents a qualitative evaluation of the current situation. Valence is determined as the aggregate of pleasure and pain:

$$ \tau^{valence} = \frac{{{\mathcal{P}} - {\mathcal{Q}}}}{limit} $$

Arousal reflects the combined strength and urgency of the needs of the agent. Arousal leads to more energy expenditure in actions, action readiness, stronger responses to sensory stimuli, and faster reactions:

$$ urge = \text{marginal sum}\left( \left\{ \omega_{\mathcal{N}} \alpha_{\mathcal{N}} \right\} \right);\quad urgency = \text{marginal sum}\left( \left\{ \omega_{\mathcal{N}} \beta_{\mathcal{N}} \right\} \right) $$
$$ \tau^{arousal} = \frac{urge + urgency}{limit} - 1 $$

Dominance suggests whether to approach or retract from the attended object, based on the competence for dealing with it. High dominance corresponds to a high anticipated reward, a middle value marks indifference, and low dominance tends to lead to retraction from the object.

$$ epistemic \,comp. = s_{current\,goal\,event} \quad \quad general\,comp. = \sqrt {v^{{{\text{comp}}.}} epistemic\,comp.} $$
$$ \tau^{dominance} = general\,comp. + epistemic\,comp. - 1 $$
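The target values of valence, arousal and dominance can be sketched in Python, taking the aggregates as given inputs. The function names are hypothetical, and `pleasure` and `pain` are assumed to be the aggregated pleasure and pain signals written as \( {\mathcal{P}} \) and \( {\mathcal{Q}} \) in the valence formula; computing the marginal sums themselves is left out.

```python
import math


def valence_target(pleasure: float, pain: float, limit: float) -> float:
    """Target valence: aggregated pleasure minus aggregated pain, scaled by the limit."""
    return (pleasure - pain) / limit


def arousal_target(urge: float, urgency: float, limit: float) -> float:
    """Target arousal: combined urge and urgency, scaled by the limit and shifted by -1."""
    return (urge + urgency) / limit - 1.0


def dominance_target(competence_value: float, epistemic_competence: float) -> float:
    """Target dominance from general and epistemic competence."""
    # General competence: geometric mean of the competence need value
    # and the skill for the current goal event.
    general_competence = math.sqrt(competence_value * epistemic_competence)
    return general_competence + epistemic_competence - 1.0


if __name__ == "__main__":
    print(valence_target(pleasure=0.75, pain=0.25, limit=1.0))
    print(arousal_target(urge=0.5, urgency=0.5, limit=1.0))
    print(dominance_target(competence_value=0.25, epistemic_competence=1.0))
```

With competence values in \( [0, 1] \), the dominance target ranges from -1 (no competence at all) to 1 (full general and epistemic competence).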

The resolution level controls the level of detail when performing cognitive and perceptual tasks. A high resolution will consider more details and thus often arrive at more accurate solutions and representations, while a low resolution allows faster responses. (In MicroPsi, the resolution level is interpreted as the width of activation spreading in neuro-symbolic representations.) It increases with the urge strength of the goal, but is reduced by its urgency, which allows for faster responses.

The focus modulator defines a selection threshold, which amounts to a stronger focus on the current task, and a narrower direction of attention. Such suppression of competing motives is a mechanism to avoid oscillations between them. (In our implementation of the model, we also use focus as a factor to proportionally increase pleasure and pain signals of a need that corresponds to a current goal, i.e. is currently in attendance.) Focus is increased by the strength and urgency of the current goal, and is reduced by a low general competence.

Exteroception (sometimes also called securing rate) determines the frequency of obtaining or updating information from the environment, versus attending to interoception (mental events). A dynamic environment requires more cognitive resources for perceptual processing, while a static environment frees resources for deliberation and reflection. The securing rate is decreased by the strength and urgency of the leading motive, but increases with low competence and a high need for exploration (which is equivalent to experienced uncertainty).

4 Feelings

In human beings, the ability to perform motivational appraisals precedes symbolic and conceptual cognition. These appraisals influence decision making and action control below the consciously accessible level, and can often be experienced as distinct sensations (feelings). While the word “feeling” is sometimes colloquially used to mean “emotion”, “romantic affect” or “intuition”, here it refers to the hedonic aspect of an emotion or motivational state, i.e. a perceptual sensation with distinct qualities that may or may not accompany the emotional episode. Feelings sometimes correspond to characteristic changes in modulation of physiological parameters during certain emotions, such as tension of muscles and increased heart rate during anger, or the flushing of the cheeks during shame. However, we suspect that in large part, the mapping of feelings to the body image serves disambiguation (such as love or heartbreak in the chest, anxiety and power in the solar plexus, cognitive events in the head). To further the disambiguation, feelings tend to have a valence (pleasant or painful), as well as additional perceptual features (weight, extension, expansion/contraction). While there seems to be considerable interpersonal variation in how feelings are experienced, and these experiences change during childhood development, most subjects report similarity in the way in which they sense correlates to their emotional states (Nummenmaa et al. 2013). Our artificial agents do not possess a dynamic self model that would allow us to model cognitive access to their feelings and other experiential content, but we can treat feelings as semantic items that influence conversation, and model their influence on the expression of emotional states, especially with respect to posture and movement patterns.

5 Emotions

The MicroPsi model does not assume that emotions are explicitly implemented, as for instance in the classic OCC model (Ortony et al. 1988), but that they are emergent, as perceptual classifications. Modulator dimensions give rise to a space of affective states, with higher level emotions resulting from affects that are bound to an object via appraisals that result from motivational relevance. This makes the MicroPsi model similar to the EMotion and Adaptation Model (EMA) (Gratch and Marsella 2009). Emotion categories are perceptual classifications that we can use to characterize and predict the modulator state and behavior tendencies of an agent.

There is no generally accepted taxonomy of emotional states, and the classification of emotions depends very much on the cultural and individual context (see, for instance, Cowen and Keltner 2017). For the purpose of classification within our model of conversational agents, we can nevertheless give characterizations, such as joy, a state of positive valence and high arousal, and bliss, a state of positive valence and high resolution level (which often corresponds to low arousal).

Social emotions often depend on a difference between how we perceive an agent or action and a normative expectation (how the agent or action should be). For instance, we call the perception of the position of an agent in a social hierarchy status, and the measure of the actual value of that agent esteem (corresponding to the level of the need for legitimacy). A difference between status and esteem in another agent is perceived as an injustice, and in oneself as a source of guilt. An event that causes other agents to lower their esteem of oneself may be a cause of shame. Envy is an emotion that describes a status differential between oneself and another agent that is not reflected by a corresponding difference in esteem. In this way, we have characterized 32 emotions along the following dimensions: valence (positive, negative or neutral), dominance (agency/control), arousal, urge (as aggregate of all present need strengths), urgency, certainty, immediacy vs. expectation vs. memory of past, confirmation of expectation, competence, physiological pleasure, valence for ingestion (appetence/disgust), aesthetic valence, normative valence of agent (esteem), status of agent (level in hierarchy), relational valence of agent (sympathy, potential for affiliation), romantic valence of agent (potential for affection), erotic valence of agent (sexual attraction), normative valence of self (self-esteem, need for legitimacy), status of self, relational valence of self (need for affiliation), romantic valence of self (need for affection). (The detailed description of this characterization is beyond the scope of this paper). In this way, typical emotion categories can be defined and mapped to feelings, facial expression, modulation of utterances, changes in posture etc.

6 Evaluation

The MicroPsi model has been implemented in various agents. The extensions that we present here attempt to approach the time dynamics of human behavior, by modeling the gradual release and change of signals, and dealing with anticipated rewards. A full-blown psychological study was outside the scope of this engineering work. Instead, we annotated video sequences of human actors with their motivational events (expecting events, changing expectations, setting and dropping goals, experiencing events), and displayed the resulting modulator dynamics and emotional states in real-time (Fig. 1). The viewer combines consumptions with corresponding physiological, social and cognitive needs (eat–food, drink/perspire–water, heal/injure–health, recover/exert–rest, acceptance/rejection–affiliation, virtue/transgression–legitimacy, win/loss–status, compassion/pity–nurturing, connection/abandonment–affection, success/failure–competence, confirmation/disconfirmation–uncertainty reduction, enjoyment/disgust–aesthetics). For each need, it displays real-time responses to pleasure, pain, urge and urgency, and the combination of these into aggregates (valence, global urgency/stress level, global urge) and modulators (resolution level, focus, dominance, exteroception and arousal). By triggering consumptions based on events in the video, we are able to reproduce and display the motivational and affective dynamics of the actor. Future work may use the same approach to model affective states of human interaction partners of conversational agents in real time.

Fig. 1. Real-time motivational system, using motivational, cognitive and social needs to drive valence, urge, urgency and modulators.

While such an approach is insufficient to demonstrate psychological adequacy of the model (even though such adequacy is ultimately the goal of our work), it demonstrates that MicroPsi can be used to mimic plausible human behavior over a range of events, which we see as an important step in creating a computational model of complex motivation and emotion.