
1 Introduction

Norms define ideal behaviour for an autonomous agent in an open environment. However, since they have individual goals to pursue, self-interested agents may not always be willing or able to adhere to the norms imposed on them. Depending on how they are given a computational interpretation, norms can be regarded as soft or hard constraints. When modelled as hard constraints, norms are regimented, in which case the agent has no choice but to follow them blindly [12]. Although regimentation guarantees norm compliance, it greatly restricts agent autonomy. Conversely, enforcement approaches, in which norms are modelled as soft constraints, leave the choice of obeying or disobeying the norms to the agent. To encourage norm compliance, however, violating a norm has consequences in terms of punishment [25, 29]. Moreover, in some enforcement approaches [1] the agent is rewarded for complying with a norm. Enforcement approaches can be broadly divided into two categories. In utility-based approaches [1, 2, 26] there is a utility gain or loss associated with respecting a norm or not, whereas in pressured norm compliance approaches [25], whether a norm is violated is determined by the interference of the norm in satisfying or hindering the agent's goals. Gaining utility, or avoiding its loss, is the basis of normative reasoning in the former category, while in the latter it is the potential conflict between norms and agent goals. If there is no such conflict, the agent complies with a norm only if some of its goals would be hindered by the punishment for violation, and violates the norm otherwise. If there is a conflict, the agent does not comply unless the goals hindered by the punishment are more important than the goals facilitated by compliance.

Existing work on normative practical reasoning using enforcement has considered different phases of the practical reasoning process, such as plan generation and plan selection. In [27] norms are taken into account in the agent's plan generation phase, whereas [26] takes norms into consideration when deciding how to execute a pre-generated plan with respect to the norms triggered by that plan. There is also a substantial body of work on the integration of norms into the BDI architecture [30]. The BOID architecture [7] extends BDI with the concept of obligation and uses agent types, such as social and selfish, to handle the conflicts between beliefs, desires, intentions and obligations. Another extended BDI architecture is proposed in [9], which focusses on recognising norms and taking them into account in the agent's decision-making processes. More recently, [2] proposed a novel way of utilising permission norms in a BDI agent when the agent does not have complete information about the environment it operates in.

In this paper we define an approach to practical reasoning that considers norms in both plan generation and plan selection. We extend current work on normative plan generation so that the agent attempts to satisfy a set of potentially conflicting goals in the presence of norms, as opposed to conventional planning problems that generate plans for a single goal [26, 27]. Additionally, since in reality actions are often non-atomic, the model allows planning with durative actions that can be executed concurrently. Durative actions reflect the real time that a machine takes to execute certain actions, also known as the "real-time duration" of actions [6]. More importantly, a further contribution of this paper is an enforcement approach that combines the utility-based and pressure-based compliance methods mentioned earlier. To this end, we first extend the notion of conflict defined in [25] by allowing conflict between norms as well as between norms and goals. We then define a penalty cost for norm violation, regardless of the existence of conflict. Whenever a norm is triggered, the outcomes of both compliance and violation, together with their impact on hindering or facilitating other goals and norms, are generated and compared according to their utility. Moreover, in those cases where there is no conflict and no goals or norms are hindered by the punishment for violation, the loss of utility drives the agent toward compliance. Regarding plan selection, generated plans are compared based on the utility of the goals satisfied and the cost of the norms violated over the entire plan. Both the plan generation and plan selection mechanisms proposed here are implemented using Answer Set Programming (ASP) [15].

ASP is a declarative programming paradigm using logic programs under the answer set semantics. In this paradigm the user provides a description of a problem and ASP works out how to solve it, returning answer sets that correspond to problem solutions. The existence of efficient solvers has increased the application of ASP in various domains of autonomous agents and multi-agent systems, such as planning [24] and normative reasoning [8, 28]. Several action languages (e.g. event calculus [21], \(\mathcal {A}\) [14] and its descendants \(\mathcal {B}\) and \(\mathcal {C}\) [14], and Temporal Action Logics (TAL) [11]) have been implemented in ASP [22, 23], which indicates that ASP is an appropriate tool for reasoning about actions. We therefore propose an implementation of STRIPS [13] as an action language in ASP.

This paper is organised as follows. The formal model and its semantics are proposed in Sect. 2, followed by the computational implementation of the model in Sect. 3. Section 4 provides an example that illustrates the main features of the model in action. Finally, after the discussion of related work in Sect. 5, we conclude in Sect. 6.

2 A Model for Normative Practical Reasoning

This section introduces a formal model and its semantics for normative practical reasoning in the presence of durative actions. The foundation of this model is classical planning, in which an agent is presented with a set of actions and a goal; any sequence of actions that satisfies the goal is a solution to the planning problem. In Sect. 2.1 we extend the classical planning problem by substituting the single goal with a set of potentially inconsistent goals G, and by adding a set of norms N. A solution for such a problem is any sequence of actions that satisfies at least one goal. The agent has the choice of violating or complying with the triggered norms while satisfying its goals.

2.1 Syntax

A normative planning system is a tuple \(P = (F\!L, \varDelta , A, G, N)\) where \(F\!L\) is a set of fluents, \(\varDelta \) is the initial state, A is a set of durative STRIPS-like [13] actions, G denotes the set of agent goals and N denotes a set of norms imposed on the agent that define what the agent is obliged or forbidden to do under certain conditions. We now describe each of these in more detail.

Fluents: \(F\!L\) is a set of domain fluents that accounts for the description of the domain the agent operates in. A literal l is a fluent or its negation i.e. \(l = f\!l\) or \(l = \lnot f\!l\) for some \(f\!l \in F\!L\). For a set of literals L, we define \(L^{+} = \{f\!l | f\!l \in L\}\) and \(L^{-} = \{f\!l | \lnot f\!l \in L\}\) to denote the set of positive and negative fluents in L respectively. L is well-defined if there exists no fluent \(f\!l \in F\!L\) such that \(f\!l \in L\) and \(\lnot f\!l \in L\), i.e. if \(L^{+} \cap L^{-} = \emptyset \).

The semantics of the model are defined over a set of states \(\varSigma \). A state \(s \subseteq F\!L\) is determined by the set of fluents that hold true at a given time; all other fluents (those not present) are considered false. A state \(s \in \varSigma \) satisfies a fluent \(f\!l \in F\!L\), denoted \(s \models f\!l\), if \(f\!l \in s\); it satisfies the negation \(\lnot f\!l\) if \(f\!l \not \in s\). This notation extends to a set of literals: a set X is satisfied in state s, \(s \models X\), when \(\forall x \in X \cdot s \models x\).

Initial State: The set of fluents that hold at the initial state is denoted by \(\varDelta \subseteq F\!L\).

Actions: A is a set of durative STRIPS-like actions, that is, actions with preconditions and postconditions that take a non-zero duration of time to bring about their effects in terms of their postconditions. A durative action \(a = \langle pr, ps, d \rangle \) is composed of well-defined sets of literals pr(a) and ps(a) over \(F\!L\), representing a's preconditions and postconditions, and a positive number \(d(a) \in \mathbb {N}\) for its duration. Postconditions are further divided into a set of add postconditions \(ps(a)^{+}\) and a set of delete postconditions \(ps(a)^{-}\). An action a can be executed in a state s if its preconditions hold in s (i.e. \(s \models pr(a)\)). The postconditions of a durative action are applied in the state s at which the action ends (i.e. \(s \models ps(a)^+\) and \(s \not \models ps(a)^-\)).
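As a small illustration (the concrete fluents and duration here are hypothetical; the paper's own action table appears in Sect. 4), a driving action taking the agent from home to work in two time units could be written as:

$$drive = \langle \{at\_home, has\_car\},\ \{at\_work, \lnot at\_home\},\ 2 \rangle $$

so that \(pr(drive) = \{at\_home, has\_car\}\), \(ps(drive)^{+} = \{at\_work\}\), \(ps(drive)^{-} = \{at\_home\}\) and \(d(drive) = 2\).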

The model does not allow parallel actions, since it is not realistic to assume that a single agent initiates several actions at the exact same point in time. Concurrency, however, is allowed unless there is a concurrency conflict between actions, which prevents them from being executed in an overlapping period of time. The definition of concurrency conflict is adopted from [4] as follows: two actions \(a_{1}\) and \(a_{2}\) are in a concurrency conflict if the preconditions or postconditions of \(a_{1}\) contradict the preconditions or postconditions of \(a_{2}\).

Goals: G denotes a set of (possibly inconsistent) goals. Goals identify the states of affairs that an agent wants to bring about. Each goal \(g = \langle r,v \rangle \) is defined by a well-defined set of literals r, the requirements that should hold in order to satisfy the goal, and a positive integer \(v \in \mathbb {N}\) that gives the value, or utility gain, of satisfying this goal. Goal g's requirements and value are denoted r(g) and v(g), respectively. Goal g is satisfied in state s when \(s \models r(g)\).

Norms: N denotes a set of event-based norms to which the agent is subject. Each norm is a tuple of the form \(n = \langle d\_o, a_{1}, a_{2}, dl, c \rangle \), where

  • \(d\_o \in \{o,f\}\) is the deontic operator determining the type of norm, which can be an obligation or a prohibition. The agent is assumed to be operating in a permissive society, hence what is not prohibited is permitted.

  • \(a_{1} \in A\) is the action that counts as the norm activation condition.

  • \(a_{2} \in A\) is the action that is the subject of the obligation or prohibition.

  • \(dl \in \mathbb {N}\) is the norm deadline relative to the activation condition, which is the completion of execution of \(a_{1}\).

  • \(c \in \mathbb {N}\) is the penalty cost that will be applied if the norm is violated.

An obligation expresses that taking action \(a_{1}\) obliges the agent to take action \(a_{2}\) within dl time units of the end of execution of \(a_{1}\). Such an obligation is complied with if the agent starts executing \(a_{2}\) before the deadline and is violated otherwise. A prohibition expresses that taking action \(a_{1}\) prohibits the agent from taking action \(a_{2}\) within dl time units of the end of execution of \(a_{1}\). Such a prohibition is complied with if the agent does not start executing \(a_{2}\) before the deadline and is violated otherwise.
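For instance, the norm used later in Sect. 4,

$$n = \langle o, comp\_funding, attend\_meeting, 1, 4\rangle $$

reads: once the agent completes action \(comp\_funding\), it is obliged to start \(attend\_meeting\) within 1 time unit of the end of \(comp\_funding\), on pain of a penalty cost of 4.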

2.2 Semantics

Let \(P=(F\!L,\varDelta ,A,G,N)\) be a normative planning problem. A plan is represented by a sequence of actions taken at certain times, denoted \( \pi =\langle (a_{0},t_{0}),\cdots ,(a_{n},t_{n}) \rangle \), where \((a_{i},t_{i})\) means that action \(a_{i}\) is executed at time \(t_{i} \in \mathbb {Z^{+}}\), such that \(\forall i<j,\ t_{i} < t_{j}\). The total duration of a plan, \(Makespan(\pi )\), is given by \( Makespan(\pi ) = max(t_{i}+d(a_{i}))\). The evolution of a sequence of actions from a given starting state \(s_{0}= \varDelta \) is a sequence of states \(\langle s_{0}, \cdots , s_{m}\rangle \), one for each discrete time point from \(t_{0}\) to \(m= Makespan(\pi )\). The transition relation between two states is defined by Eq. 1 below. If an action \(a_{j}\) ends at time \(t_{i}\), state \(s_{i}\) results from removing all delete postconditions of \(a_{j}\) from, and adding all its add postconditions to, state \(s_{i-1}\). If no action ends at time \(t_{i}\), \(s_{i}\) remains the same as \(s_{i-1}\).

$$\begin{aligned} \forall i > 0 : \; s_{i} = \begin{cases} (s_{i-1} \setminus ps(a_{j})^{-}) \cup ps(a_{j})^{+} & i=t_{j}+d(a_{j}) \\ s_{i-1} & \text{otherwise} \end{cases} \end{aligned}$$
(1)
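As a worked instance of Eq. 1 (with a hypothetical action), suppose action a with \(d(a)=2\), \(ps(a)^{+}=\{p\}\) and \(ps(a)^{-}=\{q\}\) starts at time \(t_{j}=1\) in state \(s_{1}=\{q,r\}\), and no other action ends at times 2 or 3. Then \(s_{2}=s_{1}\) by the second case, and at \(i = t_{j}+d(a) = 3\) the first case yields:

$$s_{3} = (s_{2} \setminus \{q\}) \cup \{p\} = \{p, r\}$$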

A sequence of actions \(\pi \) satisfies a goal g, \(\pi \models g\), if there is at least one state \(s_i\) in the sequence of states caused by the sequence of actions such that \(s_i \models r(g)\). An obligation \(n_{1} = \langle o, a_{i}, a_{j}, dl, c \rangle \) is complied with in plan \(\pi \) (i.e. \(\pi \models n_{1}\)) if the action that is the norm activation condition has occurred (\((a_{i},t_{i}) \in \pi \)), and the action that is the subject of the obligation occurs (\((a_{j},t_{j}) \in \pi \)) between when the condition holds and when the deadline expires (\(t_{j} \in [t_{i}+d(a_{i}),\ t_{i}+d(a_{i})+dl)\)). If \(a_{i}\) has occurred but \(a_{j}\) does not occur at all, or occurs outside this period, the obligation is violated (i.e. \(\pi \not \models n_{1}\)). In the case of a prohibition \(n_{2} = \langle f, a_{i}, a_{j}, dl, c \rangle \), compliance occurs if the action that is the norm activation condition has occurred (\((a_{i},t_{i}) \in \pi \)) and the action that is the subject of the prohibition does not occur between when the condition holds and when the deadline expires (\(\not \!\exists (a_{j},t_{j}) \in \pi \text{ s.t. } t_{j} \in [t_{i}+d(a_{i}),\ t_{i}+d(a_{i})+dl)\)). If \(a_{i}\) has occurred and \(a_{j}\) occurs in this period, the prohibition is violated (i.e. \(\pi \not \models n_{2}\)). The sets of goals satisfied, norms complied with and norms violated in plan \(\pi \) are denoted \(G_{\pi }\), \(N_{cmp(\pi )}\) and \(N_{vol(\pi )}\), respectively.

In classical planning, any sequence of actions that satisfies the goal is a solution to the planning problem. Extending a planning problem to cater for conflicting goals and norms requires considering different types of conflicts as follows:

Conflicting Actions. Actions \(a_{i}\) and \(a_{j}\) have a concurrency conflict iff the preconditions or postconditions of \(a_{i}\) contradict the preconditions or postconditions of \(a_{j}\).

$$\begin{aligned} cf _{action}=\{(a_{i},a_{j}) \text{ s.t. } \exists r \in pr(a_{i}) \cup ps(a_{i}), \lnot r \in pr(a_{j}) \cup ps(a_{j})\} \end{aligned}$$
(2)

Conflicting Goals. Goals \(g_{i}\) and \(g_{j}\) are in conflict iff satisfying one requires bringing about a state of affairs that conflicts with the state of affairs required for satisfying the other.

$$\begin{aligned} cf _{goal}=\{ (g_{i},g_{j}) \text{ s.t. } \exists r \in r(g_{i}), \lnot r \in r(g_{j})\} \end{aligned}$$
(3)

Conflicting Norms. Obligations \(n_{1} = \langle o, a_{1}, a_{2}, dl,c \rangle \) and \(n_{2} = \langle o, b_{1}, b_{2}, dl',c' \rangle \) are in conflict in the context of plan \(\pi \) iff: (i) their activation conditions hold, (ii) the obliged actions \(a_{2}\) and \(b_{2}\) have a concurrency conflict and (iii) \(a_{2}\) is in progress during the entire period over which the agent is obliged to take action \(b_{2}\). The set of conflicting obligations is formulated as:

$$\begin{aligned} cf^{\pi }_{oblobl}=\{(n_{1},n_{2}) \text{ s.t. } \;&(a_{1},t_{a_{1}}), (b_{1},t_{b_{1}}) \in \pi ;\ (a_{2}, b_{2}) \in cf _{action}; \nonumber \\&t_{a_{2}} \in [t_{a_{1}}+d(a_{1}),\ t_{a_{1}}+d(a_{1})+dl); \nonumber \\&[t_{b_{1}}+d(b_{1}),\ t_{b_{1}}+d(b_{1})+dl') \subseteq [t_{a_{2}},\ t_{a_{2}}+d(a_{2})) \} \end{aligned}$$
(4)

On the other hand, an obligation \(n_{1} =\langle o, a_{1}, a_{2}, dl,c \rangle \) and a prohibition \(n_{2} = \langle f, b_{1}, a_{2}, dl',c'\rangle \) are in conflict in the context of plan \(\pi \) iff: (i) their activation conditions hold and (ii) \(n_{2}\) forbids the agent from taking action \(a_{2}\) during the entire period over which \(n_{1}\) obliges the agent to take \(a_{2}\). The set \( cf ^{\pi }_{oblpro}\) of conflicting obligations and prohibitions is defined as:

$$\begin{aligned} cf ^{\pi }_{oblpro}=\{(n_{1},n_{2}) \text{ s.t. } \;&(a_{1},t_{a_{1}}), (b_{1},t_{b_{1}}) \in \pi ; \nonumber \\&[t_{a_{1}}+d(a_{1}),\ t_{a_{1}}+d(a_{1})+dl) \subseteq [t_{b_{1}}+d(b_{1}),\ t_{b_{1}}+d(b_{1})+dl') \} \end{aligned}$$
(5)

The entire set of conflicting norms is then defined as:

$$\begin{aligned} cf ^{\pi }_{norm} = cf ^{\pi }_{oblobl} \cup cf ^{\pi }_{oblpro} \end{aligned}$$
(6)

Conflicting Goals and Norms. An obligation \(n = \langle o, a_{1}, a_{2}, dl,c \rangle \) and a goal g are in conflict if taking action \(a_{2}\), the subject of the obligation, brings about postconditions that are in conflict with the requirements of goal g. The set of conflicting goals and obligations is formulated as:

$$\begin{aligned} cf _{goalobl}=\{ (g,n) \text{ s.t. } \exists r \in r(g), \lnot r \in ps(a_{2})\} \end{aligned}$$
(7)

In addition, a prohibition \( n = \langle f, a_{1}, a_{2}, dl,c' \rangle \) and a goal g are in conflict if the postconditions of \(a_{2}\) contribute to satisfying g, but taking action \(a_{2}\) is prohibited by norm n.

$$\begin{aligned} cf _{goalpro}=\{ (g,n) \text{ s.t. } \exists r \in r(g), r \in ps(a_{2})\} \end{aligned}$$
(8)

The entire set of conflicting goals and norms is defined as:

$$\begin{aligned} cf _{goalnorm} = cf _{goalobl} \cup cf _{goalpro} \end{aligned}$$
(9)

A sequence of actions \(\pi \) is a plan for P if all the fluents in \(\varDelta \) hold at time \(t_{0}\); for each i, the preconditions of action \(a_{i}\) hold at time \(t_{i}\) and throughout the execution of \(a_{i}\); and a non-empty subset of goals is satisfied on the path from the initial state \(s_{0}\) to the state holding at time \(t_{m}\), where \(m = Makespan(\pi )\). Furthermore, extending the conventional planning problem with multiple potentially conflicting goals and norms requires defining extra conditions that make a plan a valid plan and a solution for P. Plan \(\pi \) is a valid plan for P iff:

  1. all the fluents and only those fluents in \(\varDelta \) hold in the initial state: \(s_{0} = \varDelta \)

  2. the preconditions of each action \((a_{i},t_{a_{i}}) \in \pi \) hold at time \(t_{a_{i}}\) and throughout the execution of \(a_{i}\):

    $$\begin{aligned} \forall k \in [t_{a_{i}}, t_{a_{i}}+d(a_{i})),\ s_{k} \models pr(a_{i}) \end{aligned}$$
  3. the set of goals satisfied by plan \(\pi \) is a non-empty consistent subset of goals:

    $$\begin{aligned} G_{\pi } \subseteq G,\ G_{\pi } \ne \emptyset \text{ and } \not \!\exists g_{i}, g_{j} \in G_{\pi } \text{ s.t. } (g_{i}, g_{j}) \in cf _{goal} \end{aligned}$$
  4. there is no concurrency conflict between actions that are executed concurrently:

    $$\begin{aligned} \not \!\exists (a_{i},t_{a_{i}}), (a_{j},t_{a_{j}}) \in \pi \text{ s.t. } t_{a_{i}} \le t_{a_{j}} < t_{a_{i}}+d(a_{i}), (a_{i}, a_{j}) \in cf _{action} \end{aligned}$$
  5. there is no conflict between the norms complied with:

    $$\begin{aligned} \not \exists n_{i},n_{j} \in N_{cmp(\pi )} \text{ s.t. } (n_{i},n_{j}) \in cf ^{\pi }_{norm} \end{aligned}$$
  6. there is no conflict between the goals satisfied and the norms complied with:

    $$\begin{aligned} \not \exists g \in G_{\pi } \text{ and } n \in N_{cmp(\pi )} \text{ s.t. } (g,n) \in cf _{goalnorm} \end{aligned}$$

Let \(satisfied(\pi )\) and \(violated(\pi )\) be the sets of satisfied goals and violated norms in plan \(\pi \). The utility of a plan \(\pi \) is defined by Eq. 10, where \( Value \) is a function that returns the value of each goal satisfied and \( Cost \) returns the penalty cost of each norm violated in that plan. The set of optimal plans, Opt, consists of those plans that maximise the utility.

$$\begin{aligned} Utility(\pi ) = \sum _{g_{i} \in satisfied(\pi )} Value(g_{i}) - \sum _{n_{j} \in violated(\pi )} Cost(n_{j}) \end{aligned}$$
(10)

3 An Answer Set Programming Implementation

Encoding a practical reasoning problem as a declarative specification makes it possible to reason computationally about agent actions, goals and norms. This enables an agent to keep track of the actions taken, goals satisfied and norms complied with or violated at each state of its evolution. More importantly, it provides the possibility of querying traces that fulfil certain requirements, such as satisfying some specific goals. Consequently, instead of generating all possible traces and then searching for those that satisfy at least one goal, only the traces that do satisfy at least one goal are generated.

ASP programs consist of a finite set of rules formed from atoms. Atoms are the basic components of the language that can be assigned a truth value (true or false). Literals are atoms or negated atoms. Atoms are negated using classical negation (\(\lnot \)) or negation as failure (not); the former states that something is false, whereas the latter states that something is assumed false because it cannot be proven true. The general rule syntax in ASP is \(l_{0} \leftarrow \; l_{1}, \cdots , l_{m}, not \; l_{m+1}, \cdots , not \; l_{n}.\), in which each \(l_{i}\) is an atom (e.g. a) or its negation (e.g. \(\lnot a\)). \(l_{0}\) is the head of the rule and \(l_{1}, \cdots , l_{m}, not \; l_{m+1}, \cdots , not \; l_{n}\) is its body. The rule is read as: \(l_{0}\) is known/true if \(l_{1}, \cdots , l_{m}\) are known/true and none of \(l_{m+1}, \cdots , l_{n}\) are known. If the body is empty, the rule is called a fact, and if the head is empty, it is called a constraint, indicating that no answer set may satisfy the body.
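To make this syntax concrete, the following is a minimal sketch in clingo-style notation (the atoms a, b, c and d are purely illustrative):

    a.              % a fact: a is unconditionally true
    b :- a, not c.  % a rule: b holds if a holds and c cannot be proven
    :- b, d.        % a constraint: no answer set may contain both b and d

Running a solver such as clingo on these three lines yields a single answer set containing a and b: c cannot be proven, so b is derived, and the constraint is not triggered because d does not hold.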

3.1 Translating the Model into ASP

In this section we demonstrate how a planning problem \(P=(F\!L, \varDelta , A, G, N)\) can be mapped into an answer set program such that there is a one-to-one correspondence between solutions for the planning problem and the answer sets of the program. The mapping uses the following atoms: \(\texttt {state(s)}\) to denote the states; \(\texttt {time(t,s)}\) to indicate the time at state s; \(\texttt {holdsat(x,s)}\) to express that fluent x is true in state s; and \(\texttt {occurred(a,s)}\) to encode that action a occurs at state s. Additional atoms used in Figs. 2, 3, 4, 5 and 6 are discussed in their respective sections. Note that variables begin with capital letters in ASP.

Time and Initial State (Fig. 1). The facts produced by Line 1 provide the program with all available states, while Line 2 defines the order of the states. The maximum number of states, q, results from the sum of the durations of all actions: \( q = \sum _{i=1}^{n} d(a_{i})\). The final state is therefore stated as sq in Line 3. Line 4 gives the initial time, which increases by one unit from each state to the next (Line 5). Finally, Line 6 encodes the fluents that hold in the initial state s0.

Fig. 1. Rules for time component (Lines 1–5) and initial state (Line 6)
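A plausible clingo-style sketch of the rules just described (the integer encoding of states, the value of q and the initial fluents are illustrative assumptions, not the paper's exact listing):

    #const q=10.                              % sum of the durations of all actions (illustrative)
    state(0..q).                              % Line 1: all available states
    next(S,S+1) :- state(S), S < q.           % Line 2: order of states
    final(q).                                 % Line 3: the final state sq
    time(0,0).                                % Line 4: time at the initial state
    time(T+1,S2) :- time(T,S), next(S,S2).    % Line 5: time grows by one unit per state
    holdsat(at_home,0). holdsat(has_car,0).   % Line 6: fluents holding in the initial state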

Actions (Fig. 2). Each durative action is encoded as \(\texttt {action(a,d)}\) (Line 7), where a is the name of the action and d is its duration. Recalling from Sect. 2, the preconditions pr(a) of action a hold in state s if \(s \models pr(a)\). This is expressed in Line 8, where \(pr(a)^{+}\) and \(pr(a)^{-}\) are the positive and negative literals in pr(a). In order to make the code more readable we introduce the shorthand \(\texttt {EX(X,S)}\), where \(\texttt {X}\) is a set of fluents that should hold in state \(\texttt {S}\): for all \(x \in \texttt {X}\), \(\texttt {EX(X,S)}\) is translated into \(\texttt {holdsat(x,S)}\), and for all \(\lnot x \in \texttt {X}\) it is translated into \(\texttt {not holdsat(x,S)}\), using negation as failure. The agent has the choice to take any of its actions in any state (Line 9); however, the preconditions of a durative action must be preserved while it is in progress. A durative action is in progress, \(\texttt {inprog(A,S)}\), from the state in which it begins to the state in which it ends (Lines 10 to 11). Line 12 then rules out the execution of an action when its preconditions do not hold during its execution. In addition, no action may be in progress in the final state (Line 13). Another assumption made in Sect. 2 is the prevention of parallel actions, which stops the agent from starting two actions at the same time (Lines 14 to 15). Once an action starts in one state, the result of its execution is reflected in the state where the action ends. This is expressed through (i) Lines 16 to 17, which make the add postconditions of the action hold when the action ends, and (ii) Lines 18 to 19, which terminate the delete postconditions. The termination happens in the state before the end state of the action: since all fluents that hold in a state hold in the next state unless they are terminated (Lines 20 to 21), the delete postconditions terminated in the state before the action's end state no longer hold in the state in which the action ends (i.e. they are deleted from that state).

Fig. 2. Rules for translating actions
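Again as a hedged reconstruction (the predicates pre, addpost and delpost and the drive facts are assumptions standing in for the paper's exact encoding), the rules described above might look like:

    action(drive,2).                              % Line 7: a durative action and its duration
    pre(drive,S) :- holdsat(has_car,S).           % Line 8: preconditions of drive hold in S
    addpost(drive,at_work).                       % add postcondition (illustrative)
    delpost(drive,at_home).                       % delete postcondition (illustrative)

    { occurred(A,S) } :- action(A,_), state(S).   % Line 9: the agent may start any action
    inprog(A,S2) :- occurred(A,S), action(A,D),   % Lines 10-11: A is in progress from the
                    state(S2), S <= S2, S2 < S+D. %   state it starts in until it ends
    :- inprog(A,S), not pre(A,S).                 % Line 12: preconditions hold throughout
    :- inprog(A,S), final(S).                     % Line 13: nothing in progress at the end
    :- occurred(A1,S), occurred(A2,S), A1 < A2.   % Lines 14-15: no two actions start together

    holdsat(F,S+D) :- occurred(A,S), action(A,D),      % Lines 16-17: add postconditions hold
                      addpost(A,F), state(S+D).        %   in the state where A ends
    terminated(F,S+D-1) :- occurred(A,S), action(A,D), % Lines 18-19: delete postconditions are
                           delpost(A,F), state(S+D-1). %   terminated just before the end state
    holdsat(F,S2) :- holdsat(F,S1), next(S1,S2),       % Lines 20-21: inertia - fluents persist
                     not terminated(F,S1).             %   unless terminated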

Goals (Fig. 3). Line 22 encodes goal g with value v. From Sect. 2, goal g is satisfied in state s if \(s \models r(g)\). This is expressed in Line 23, where \(r(g)^{+}\) and \(r(g)^{-}\) are the positive and negative literals in r(g).

Fig. 3. Rules for translating goals
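A corresponding sketch for goals (the certificate goal and its requirement fluent are illustrative assumptions):

    goal(certificate,10).                       % Line 22: a goal and its value v
    satisfied(certificate,S) :-                 % Line 23: the goal's requirements r(g)
        holdsat(has_certificate,S), state(S).   %   hold in state S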

Norms (Fig. 4). The conditional event-based norms that are the focus of this research were discussed in the previous section. Line 24 encodes norm n with penalty cost c upon violation. Lines 25–39 deal with obligations and prohibitions of the form \(n=\langle d\_o, a_{1}, a_{2}, dl, c \rangle \). In order to implement the concepts of norm compliance and violation described in Sect. 2.2, we introduce the normative fluents \(o(n,a_{2},dl')\) and \(f(n,a_{2},dl')\), which first hold in the state in which action \(a_{1}\)'s execution ends. An obligation fluent \(o(n,a_{2},dl')\) denotes that action \(a_{2}\) should be brought about before deadline \(dl'\) on pain of violation, whereas a prohibition fluent \(f(n,a_{2},dl')\) denotes that action \(a_{2}\) should not be brought about before deadline \(dl'\), again on pain of violation. If \(a_{1}\) with duration d1 occurs at state S, where the time is T1, the agent has dl time units, starting from the end of action \(a_{1}\) (\(\texttt {T2=T1+}d1\)), to comply with the norm imposed on it. Lines 25–26 and 32–33 establish the obligation and prohibition fluents respectively.

In terms of compliance and violation, the occurrence of an obliged action before the deadline expires counts as compliance (Lines 27 to 28), and the absence of such an occurrence before the deadline is regarded as violation (Line 30). Atoms \(\texttt {cmp(o|f(n,a,DL),S)}\) and \(\texttt {vol(o|f(n,a,DL),S)}\) indicate compliance with or violation of norm n in state \(\texttt {S}\). In both cases the norm is terminated (Lines 29 and 31). A prohibition, on the other hand, is complied with if the forbidden action does not happen before the deadline (Lines 34 to 35) and is violated if it does (Lines 37 to 38). As with obligations, after being complied with or violated, prohibitions are terminated (Lines 36 and 39).

Fig. 4. Rules for translating norms
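A hedged sketch of the obligation rules just described, instantiated for the norm n of Sect. 4 (the atom names are assumptions; Lines 32–39 for prohibitions mirror these rules, with the roles of compliance and violation swapped):

    norm(n1,4).                                   % Line 24: norm with violation cost c=4
    % Lines 25-26: the obligation fluent is established in the state where comp_funding
    % ends; its absolute deadline is the end time of comp_funding plus dl=1
    holdsat(o(n1,attend_meeting,T+D+1),S+D) :-
        occurred(comp_funding,S), action(comp_funding,D), time(T,S), state(S+D).
    % Lines 27-28: starting the obliged action before the deadline counts as compliance
    cmp(o(N,A,DL),S) :- holdsat(o(N,A,DL),S), occurred(A,S), time(T,S), T < DL.
    % Lines 29 and 31: a norm complied with or violated is terminated
    terminated(o(N,A,DL),S) :- cmp(o(N,A,DL),S).
    terminated(o(N,A,DL),S) :- vol(o(N,A,DL),S).
    % Line 30: reaching the deadline without compliance counts as violation
    vol(o(N,A,DL),S) :- holdsat(o(N,A,DL),S), time(DL,S), not cmp(o(N,A,DL),S).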

3.2 Mapping of Answer Sets to Plans

In Sect. 2.2 we defined the criteria for a sequence of actions to be a valid plan and solution for \(P=\langle F\!L, \varDelta , A, G, N\rangle \). Figure 5 encodes these criteria. The rule in Line 41 constrains answer sets to those that fulfil at least one goal, by excluding answers that satisfy no goals; the input for this rule is provided in Line 40. Line 42 prevents satisfying two conflicting goals, guaranteeing the consistency of the goals satisfied in a plan. Concurrency of conflicting actions is prevented by Lines 43–44, which express that two such actions cannot be in progress together. Lines 45 and 46 provide the input for Lines 47 and 48, which exclude the possibility of satisfying a goal and complying with a norm that are in conflict. Note that the implementation prevents compliance with conflicting norms automatically: (i) since it is not possible to execute two conflicting actions concurrently, if two obligations require that, one of them has to be violated; and (ii) for a conflicting obligation and prohibition, by definition, taking the obliged action and hence complying with the obligation causes the violation of the norm prohibiting that very same action, and vice versa.

Fig. 5. Solutions for problem P
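The criteria above might be encoded along the following lines (a sketch; the conflict relations cf_goal, cf_action and cf_goalnorm are assumed to be given as facts derived from Eqs. 3, 2 and 9):

    goalsat(G) :- satisfied(G,S), state(S).   % Line 40: G is satisfied in some state
    some_goal :- goalsat(G).
    :- not some_goal.                         % Line 41: at least one goal is satisfied
    :- goalsat(G1), goalsat(G2),              % Line 42: no two conflicting goals are
       cf_goal(G1,G2).                        %   satisfied in the same plan
    :- inprog(A1,S), inprog(A2,S),            % Lines 43-44: conflicting actions are never
       cf_action(A1,A2).                      %   in progress together
    complied(N) :- cmp(o(N,A,DL),S).          % Lines 45-46: projections for the norms
    complied(N) :- cmp(f(N,A,DL),S).          %   complied with
    :- goalsat(G), complied(N),               % Lines 47-48: no conflict between goals
       cf_goalnorm(G,N).                      %   satisfied and norms complied with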

Theorem 1

Let program \(\varPi _{base}\) consist of Lines 7–48. Given a planning problem \(P = (F\!L, \varDelta , A, G, N)\), for every answer set Ans of \(\varPi _{base}\), the set of atoms of the form \(\texttt {occurred(a,s)}\) in Ans encodes a solution to the planning problem P. Conversely, each solution to the problem P corresponds to a single answer set of \(\varPi _{base}\).

Proof

(Sketch). The proof can be obtained by structural induction. Line 9 generates all sequences of actions. Line 6 ensures that all fluents in \(\varDelta \) hold at \(t_{0}\). Line 12 guarantees that the preconditions of an action hold throughout its execution. Line 41 requires that a non-empty subset of goals is satisfied in a plan, while Line 42 ensures the consistency of the goals satisfied. Concurrency conflicts are prevented by Lines 43–44. Finally, Lines 47–48 eliminate the possibility of conflict between goals satisfied and norms complied with. This implies that the sequence of actions that is part of an answer set satisfies the conditions for being a solution to the encoded planning problem. Conversely, each solution satisfies all the program's rules in a minimal fashion.

3.3 Optimised Plans

In order to find optimal plans, Fig. 6 shows how to encode the utility function defined by Eq. 10. The sum of the values of the goals satisfied in a plan is calculated in Line 49. The sum of the costs of the norms violated in a plan is calculated in Line 52, with the input for this line provided in Lines 50 and 51. Having calculated \(\texttt {value(TV)}\) and \(\texttt {cost(TC)}\), the utility of a plan is defined in Line 53, which is subject to the optimisation statement in the final line.

Fig. 6. Optimised solutions for P
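Finally, the utility function of Eq. 10 might be encoded as follows (a sketch using clingo's aggregate and optimisation syntax; the atom names are assumptions):

    value(TV) :- TV = #sum{ V,G : goalsat(G), goal(G,V) }.   % Line 49: total value of goals
    violated(N) :- vol(o(N,A,DL),S).                         % Lines 50-51: projections for
    violated(N) :- vol(f(N,A,DL),S).                         %   the norms violated
    cost(TC) :- TC = #sum{ C,N : violated(N), norm(N,C) }.   % Line 52: total penalty cost
    utility(TV-TC) :- value(TV), cost(TC).                   % Line 53: utility of the plan
    #maximize{ U : utility(U) }.                             % Line 54: maximise the utility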

Theorem 2

Let program \(\varPi = \varPi _{base} \cup \varPi ^{*}\), where \(\varPi ^{*}\) consists of Lines 49–54. Given a planning problem \(P = (F\!L, \varDelta , A, G, N)\), for every answer set Ans of \(\varPi \), the set of atoms of the form \(\texttt {occurred(a,s)}\) in Ans encodes an optimal solution to the planning problem P. Conversely, each optimal solution for the problem P corresponds to a single answer set of \(\varPi \).

Proof

(Sketch). Theorem 1 ensures that all solutions are represented by answer sets and vice versa; it remains to establish optimality. Line 54 selects the solutions that maximise the utility, which is in turn defined in Line 53 as the difference between the total value of the goals satisfied (Line 49) and the total cost of the norms violated (Line 52).

4 Illustrative Example

In this section we provide a brief example that highlights the most important features of the proposed model. Consider an agent with the durative actions presented in Table 1. The agent has three goals, presented with their requirements and two different sets of values in Table 2. The first goal is to obtain a certificate, which requires the agent to take a test; to be able to attend the test, the agent first needs to pay the test fee. The second goal is to submit some marking that needs to be done in the \( office \), and the last goal is to go on strike, for which the agent needs to be a member of the union, not go to the \( office \) and not attend any meeting on behalf of the company. In addition, one of the agent's actions, \(comp\_funding\), has a normative consequence, captured in a norm stating that if company funds are used to pay the test fee, the agent is obliged to attend a meeting on behalf of the company within 1 time unit of the end of action \(comp\_funding\), which pays the test fee. If the agent uses the funding but does not attend the meeting before the deadline, it incurs a penalty cost of 4 units.

$$n = \langle o, comp\_funding, attend\_meeting, 1, 4\rangle $$

Table 3 shows the corresponding ASP code for this example, based on the code in Sect. 3. For space reasons, only those rules that need instantiation are provided. For ease of reference, the rules instantiated in each part of the code are titled with their corresponding figures in Sect. 3. Moreover, only one action, drive, and one goal, certificate, are encoded; the remaining actions and goals are coded in the same way.

Following Theorem 2, we obtain a one-to-one correspondence between the answer sets of the program in Table 3 and optimal plans for the agent, such that the agent's utility is maximised. Table 4 shows the optimal plans (as translations of the answer sets) for the two different sets of values in Table 2. Plan \(\pi _{1}\) satisfies goals certificate and strike; however, due to the conflict between strike and norm n, the norm is inevitably violated. Additionally, the conflict between goals strike and submission makes it impossible for the agent to satisfy submission. Since the combined utility loss of violating n and not satisfying submission is less than the utility gained by satisfying strike, the agent accepts this loss. In plan \(\pi _{2}\), on the other hand, satisfying submission is preferred over satisfying strike, although they have the same utility gain: satisfying strike would have implied violating n, and thus incurring the penalty cost of 4. Therefore, in pursuit of maximum utility, the agent prefers satisfying submission and complying with n to satisfying strike and violating n, as was the case in plan \(\pi _{1}\).

Table 1. Agent Actions
Table 2. Agent Goals
Table 3. Instantiated ASP code
Table 4. Optimal plans

5 Related Work

The interaction between an agent's individual goals and social norms has been discussed in a number of works. Some, such as [26, 27], use utility measurement to enforce norm compliance. In contrast, in [25] norm compliance relies on the explicit interaction between goals and norms; but if compliance with or violation of a norm does not hinder any goals, there is no such interaction and hence no computational mechanism in place that enforces the norm. From a planning perspective, norms are taken into account in plan generation [27] and in plan selection [20, 26]. In [27] the normative state of the agent is checked by a planner after each individual action is taken, which, depending on the number of actions, imposes a high computational cost on the step-by-step generation of plans; it is the utility of individual actions that determines norm compliance there. On the other hand, [20, 26] consider norms as part of plan selection, starting from the assumption that the agent has access to a library of pre-generated plans. In contrast to all of [20, 26, 27], our work deals with both plan generation and plan selection while taking account of norms, and like [26] we focus on the utility of the entire plan, unlike [27], which only considers the constituent actions in sequence.

Some works [19, 32] focus on the interaction between an agent's goals and its commitments, where commitments are made by agents to one another in order to support the realisation of their goals. Our approach differs from these for two main reasons: (i) commitments are deliberately made by the agent, whereas norms are externally imposed on the agent; and (ii) commitments are made to support satisfying goals, while imposed norms might conflict with the agent's goals and consequently hinder some of them.

The Event Calculus (EC) [21] forms the basis of the implementation of some normative reasoning frameworks, such as [2, 3]. Our proposed formal model is independent of language and could be translated to EC and hence to a computational model, but the one-step translation to ASP is preferred because the formulation of the problem is very similar to the computational model, so there are no conceptual gaps to bridge. Furthermore, the usual EC implementation language is Prolog, which, although syntactically similar to ASP, suffers from non-declarative features in the form of the cut operator, which results in a loss of completeness. Moreover, its query-based nature, which focusses on one issue at a time, makes it cumbersome to reason about all plans.

A final point is that the norm representation and implementation proposed here are expressive and realistic with respect to time and duration: specifically, since the formal model and the ASP implementation handle time explicitly, it is straightforward to represent the norm deadline as a future time instant, rather than as a state to be brought about.

6 Conclusions, Discussion and Future Work

An agent performing practical reasoning in an environment regulated by norms needs constantly to weigh up the importance of goals satisfied and norms complied with against goals not satisfied and norms broken. This comparison is possible when the agent has access to all possible plans, so that the decision of which goals to pursue and which norms to respect is made based on their impact on the entire plan. We have shown how this impact can be captured in a utility function that permits the agent to execute a plan that maximises utility.

The focus of plan selection in this paper is on maximising the agent's utility by considering the value of goals and the penalties for norm violation. While these are sensible criteria, there are others that can be taken into account. Given that the actions modelled in this approach are durative, one such criterion is the duration of the entire plan. Since durative actions that do not have concurrency conflicts can be executed concurrently, there may exist plans with exactly the same utility where one takes longer than another. We intend to extend the plan selection mechanism with additional criteria by using the existing multi-criteria optimisation mechanisms in ASP.

Just like norms, in real scenarios goals often have a deadline before which they should be satisfied [18]. Temporally extended goals [17] are discussed in detail in agent programming languages such as GOAL [5]; however, they are not commonly used in practical reasoning frameworks. Substituting achievement goals with temporally extended goals increases the expressiveness of the model. It also allows conflicts within goals, and between goals and norms, to be defined temporally, which enriches the concept of conflict in the model.

Incorporating plan revision is also an avenue for future work. As presented here, a plan, once selected, is acted out to its conclusion, but it is of course necessary to incorporate plan revision in order to handle the inevitably dynamic environment.

Another area of improvement is to extend the normative reasoning capability of the model to state-based norms in addition to event-based norms. Such an extension would allow the expression of obligations and prohibitions to achieve or avoid some state before some deadline. A combination of event-based and state-based norms [10] enriches the norm representation as well as the normative reasoning.

Lastly, we intend to build on the current ASP implementation to provide a justification for why a certain plan maximises the utility, considering the goals and norms it satisfies against those it does not. A potential starting point is [31], which makes it possible to explain why certain literals are part of an answer set of a program and why others are not.