Keywords

1 Introduction

Interaction Protocols are a key ingredient in MASs as they explicitly represent the agents expected/allowed communicative patterns and can be used either to check the compliance of the agent actual behavior w.r.t. expected one [1, 10] or to drive the agent behavior itself [4].

Interaction protocols are also crucial outside the MAS community: what we name an “Agent Interaction Protocol”, AIP, is referenced as a “Choreography” in the Service Oriented Computing (SOC) community [36] and as a “Global Type” in the multiparty session types one [15, 31]. In the MAS community, AIPs describe interaction patterns characterizing the system as a whole. This global viewpoint is supported by many formalisms and notations such as AUML [32], commitment machines and their extensions [11, 12, 22, 40, 42], the Blindingly Simple Protocol Language (BSPL) and its Splee extension [21, 37], the Hierarchical Agent Protocol Notation (HAPN) [41], trace expressions [7]. A systematic comparison of these approaches – apart from AUML which is not formal, and not supported by a textual notation – can be found in [9].

When moving from the specification to the execution stage, the AIP must be enacted by agents in the MAS: besides the global description of the protocol, the “local” description of the AIP portion each agent is in charge of, is required to run the AIP. The AIP enactment is usually left to Computer-Aided Software Engineering tools that move from AIP diagrams directly into agent skeletons in some concrete agent oriented programming language [23, 30], or to algorithms that translate the AIP textual representation to some abstract, intermediate formalism for modeling the local viewpoint [19, 26]. Such intermediate formalisms are not perceived as the main target of the MAS research and no standardization effort has been put on them. In the SOC community, on the contrary, formalisms exist for modeling both the global and the local perspectives. As observed by [35], WS-CDLFootnote 1 follows an interaction-oriented (“global”) approach, whereas in BPEL4ChorFootnote 2 the business process of each partner involved in a choreography is specified using an abstract version of BPELFootnote 3: BPEL4Chor follows a process-oriented (“local”) approach. In the multiparty session types community, the main emphasis is on type-checking aspects: the formalism used to represent global types is relevant, as well as its expressive power, but even more relevant are the properties of the “global to local” projection function w.r.t. type-safety [25]. Finally, in the Distributed Runtime Verification (DRV) community [29], one of the most relevant issues is how to automatically generate a monitor (or a set of monitors) from a given protocol, and to use it (them) to dynamically monitor the existing system’s behaviour.

Our work lies in the intersection of the MAS and DRV research areas. We are interested in exploring under which conditions a good AIP can turn into a bad one, if it is possible to exploit such bad protocols anyway for DRV purposes, and in providing an implemented tool for automatically generating a set of MAS monitors under these conditions. To this aim, we developed RIVERtools [27] which supports the development of AIPs modeled using the trace expressions formalism [3, 7] via a user-friendly GUI. RIVERtools implements static controls to check if a protocol is good, ugly or bad, and supports the user in finding out how to (partially) decentralize the runtime verification process also in the last case [28]. The adoption of RIVERtools for this purpose in exemplified in the companion demo paper [6], which addresses the most practical aspects of our investigation: this paper is mainly centered around the notions of bad, good, and ugly protocol in the literature, and on the observability-driven transformation of AIPs which, under known environmental/network constraints, may turn a good protocol into a bad one which can nevertheless serve for DRV purposes.

2 The Good, the Bad and the Ugly

The Good. Let us consider the following interaction protocol expressed in natural language:

  1. 1.

    Alice sends a whatsApp message to her mother Barbara asking her to buy a book (plus some implausible excuse for not doing it herself...);

  2. 2.

    Barbara sends an email message to her friend Carol, responsible for the Book Shop front end, to reserve that book;

  3. 3.

    Carol receives Barbara email and sends a whatsApp message to Dave, the responsible of the Book Shop warehouse, to check the availability of the book and order it if necessary;

  4. 4.

    Dave checks if the book is available in the warehouse;

    1. (a)

      if it is,

      1. (i.)

        he sends a whatsApp message to Emily who is in charge for physically managing the books and informing the clients if their requests can be satisfied immediately;

      2. (ii.)

        Emily takes the book to the front end and sends a confirmation email to Barbara telling that the book is already there;

    2. (b)

      otherwise,

      1. i.

        Dave sends an email to Frank, the point of contact for the publisher of the required book, and orders it;

      2. ii.

        Frank sends a confirmation to Barbara via whatsApp telling her that the book will be available in two days.

For readability, we express the protocol using a more compact and formal syntax where \( a {\mathop {\Longrightarrow }\limits ^{{mns, cnt}}} b \) stands for “agent \( a \) sends a message with content cnt via communication means mns to agent \( b \)”. The symbol: stands for the prefix operator (\(int {:}P\) is the protocol whose first allowed interaction is int, and the remainder is the protocol P) and has precedence over the other operators, \({\vee }\) stands for mutual exclusiveness, and \(\epsilon \) represents the empty protocol:

figure a

P1 receives the unanimous appreciation whatever the research community. In fact, two very intuitive properties are met: 1. apart from the first one, the message that each agent sends is a reaction to the message it just received, and there is an evident cause-effect link between two sequential messages; 2. in case some mutually exclusive choice must be made, the choice is up to only one agent involved in the protocol, and hence it is feasible. These good properties take different names depending on the research communities and on the authors. The first one is named, for example, “sequentiality” [20], “connectedness for sequence” [35], “explicit causality” [37]; the second “knowledge for choice” [20], “local choice” [34], “unique point of choice” [35]. Meeting these two properties is closely related to the absence of covert channels; they ensure that all communications between different participants are explicitly stated, and rule out those protocols whose implementation or enactment relies on the presence of secret/invisible communications between participants: a protocol description must contain all and only the interactions used to implement it [20].

The Bad. Those protocols that do not respect the connectedness for sequence and unique point of choice properties are bad, and it is not difficult to see why. Let us consider protocol P2:

figure b

The protocol states that \( carol \) can send a checkAvail message to \( dave \) only after \( alice \) has sent a buy message to \( barbara \), but how can \( carol \) know if and when \( alice \) sent that message? Also, the protocol states that either \( dave \) sends a message to \( emily \), or \( frank \) sends a message to \( barbara \): how can \( frank \) know if he is allowed to send a message to \( barbara \), without coordinating with \( dave \) via some covert channel not shown in the protocol?

The Ugly. Protocols which are not syntactically correct are ugly, and are ignored by all the authors. However, some protocols may be ugly even if they are syntactically correct, for example if they are characterized by:

  • “Causality unsafety”: consider the two shuffled sequences \( carol {\mathop {\Longrightarrow }\limits ^{{wa, buy}}} dave {:}\epsilon \) \({|} alice {\mathop {\Longrightarrow }\limits ^{{wa, buy}}} barbara {:}\epsilon \), where \({|}\) models interleaving (a.k.a. shuffling) between two protocol branches; suppose we are only able to observe what \( alice \) sends, and what \( dave \) receives. If \( alice \) sends buy and \( dave \) receives buy, we might think that the protocol above is respected. However, that observation might be due to \( alice \) sending buy to \( dave \), which is not an allowed interaction: the protocol is not causality safe [35].

  • “Non-Determinism”: given an interaction taking place in some protocol state, we might want to deterministically know how to move to the next state. For example, if \( alice \) asks her mother to buy a book, and the protocol is

    figure c

    we could move on either branch. If we opt to move on the first branch, the next expected action is that \( barbara \) asks \( carol \) to reserve a book. If, instead, \( barbara \) tells \( alice \) to buy the book by herself, we have to backtrack to the previous protocol state in order to check that this interaction is allowed as well; this is extremely inefficient and should be avoided [7]. While the notions of good and bad protocols are universally recognized, ugliness also depends on the formalism and its expressive power.

Can We Simply Ignore Bad Protocols? Let us suppose that the protocol is used for monitoring purposes: it does not need to be implemented or enacted. The agents in the MAS are already there, and they are heterogeneous black boxes behaving according to their own policies and goals, in full autonomy. The monitor observes messages that agents exchange, in a completely non obtrusive way, checks their compliance with the protocol ruling the MAS, and reports violations to some other entity or to the human(s) in charge for the security and safety of the monitored system.

Let us also suppose that the MAS protocol is P1 and the human(s) who set up the monitoring process know in advance that email messages cannot be observed by the monitor, for infrastructural, legal, or other reasons. Keeping interactions taking place via email in the protocol that the monitor will check would lead to false positives, as the monitor would look for messages foreseen by the protocol that it cannot see and would hence detect a protocol violation, but removing them from P1 leads to P2: a bad protocol! If the monitor observation ability is not perfect – which is an extremely realistic situation – there is no gain in struggling against bad protocols: unobservable interactions are there and generate the same problems of covert channels. We may state that what is “bad” in this situation is the monitor, since it is not able to observe everything should be observed, and not the AIP. This is another way to analyze the situation, but the problem still remains: a monitor with imperfect observation capability cannot be driven by a “perfect” protocol which includes unobservable messages. Rather, it might need to be driven by what we classified a “bad” protocol, describing all and only the messages that the monitor can actually observe.

Since we cannot ignore bad protocols, we opted for deepening our acquaintance with them. We developed RIVERtools, a tool able to (1) generate bad protocols starting from good ones with known unobservable channels, to correctly drive monitors with limited observation ability, and (2) explore the usage of bad protocols for Distributed Runtime Verification under suitable conditions. The next two sections introduce trace expressions and the algorithm implemented by RIVERtools to address the first point above. The second issue, namely how to partially decentralize the runtime monitoring process also when the AIP is bad, has been addressed in [28].

3 Background: Modeling AIPs with Trace Expressions

The general mechanism we propose for taking unobservable events into account during MAS monitoring, discussed in Sect. 4, can be applied to any formalism. However, to make our proposal more practical, we will discuss the algorithm we implemented for a specific formalism, trace expressions [3, 5, 7], and we briefly introduce it in the sequel. Trace expressions are based on the notions of event and event type. We denote by \(\mathcal {E}\) the fixed universe of events subject to monitoring. An event trace \(\bar{e}\) over \(\mathcal {E}\) is a possibly infinite sequence of events in \(\mathcal {E}\), and a trace expression over \(\mathcal {E}\) denotes a set of event traces over \(\mathcal {E}\). Trace expressions are built on top of event types (chosen from a set \(\mathcal {ET}\)), each specifying a subset of \(\mathcal {E}\).

The semantics of event types is specified by the function \( match \): if \(e\) is an event, and \(\vartheta \) is an event type, then \( match (e,\vartheta )\) holds if and only if event \(e\) matches event type \(\vartheta \); hence, the semantics \(\llbracket {\vartheta }\rrbracket \) of an event type \(\vartheta \) is the set \(\{ e\in \mathcal {E}\mid match (e,\vartheta ) \text { holds} \}\).

A trace expression \(\tau \in \mathcal {T}\) represents a set of possibly infinite event traces, and is defined on top of the following operators:

  • \(\epsilon \) (empty trace), denoting the singleton set \(\{\epsilon \}\) containing the empty event trace \(\epsilon \).

  • \(\vartheta {:}\tau \) (prefix), denoting the set of all traces whose first event \(e\) matches the event type \(\vartheta \), and the remaining part is a trace of \(\tau \).

  • \(\tau _1{\cdot }\tau _2\) (concatenation), denoting the set of all traces obtained by concatenating the traces of \(\tau _1\) with those of \(\tau _2\).

  • \(\tau _1{\wedge }\tau _2\) (intersection), denoting the intersection of the traces of \(\tau _1\) and \(\tau _2\).

  • \(\tau _1{\vee }\tau _2\) (union), denoting the union of the traces of \(\tau _1\) and \(\tau _2\).

  • \(\tau _1{|}\tau _2\) (shuffle), denoting the set obtained by shuffling the traces of \(\tau _1\) with the traces of \(\tau _2\).

Trace expressions support recursion through cyclic terms expressed by finite sets of recursive syntactic equations, as supported by modern Prolog systems. If \( match ( alice {\mathop {\Longrightarrow }\limits ^{{wa, buy}}} barbara , buy)\), \(P4 = buy {:}(\epsilon ~{\vee }~P4)\) denotes the protocol where \( alice \) may send one buy request to \( barbara \) and either terminate (\(\epsilon \)) or start the protocol again (\({\vee }~P4\)). The traces denoted by P4 are

figure d

namely, traces consisting of n instances of event \( alice {\mathop {\Longrightarrow }\limits ^{{wa, buy}}} barbara \), with \(n \ge 1\), plus the infinite trace. Infinite traces allow us to model systems that ought not to terminate, such as those involving a “service provider agent” that must be always be ready to answer the requests of its clients. We represent the infinite trace in an explicit way, to distinguish it from finite traces of any length.

Some Considerations on Events and Event Types. For presentation purposes, the protocols shown in Sect. 2 did not include event types but events. A simplified variant of P1 with event types is P5, where

and, for example,

figure e

The sequence \( barbara {\mathop {\Longrightarrow }\limits ^{{em, reserve}}} carol ~~ hillary {\mathop {\Longrightarrow }\limits ^{{wa, checkAvail}}} dave \) (namely \( barbara {\mathop {\Longrightarrow }\limits ^{{em, reserve}}} carol \) followed by \( hillary {\mathop {\Longrightarrow }\limits ^{{wa, checkAvail}}} dave \)) is correct w.r.t. P5 since the first event matches bookReservationReq, after which an event matching availabilityCheckReq is expected and in fact the second event matches it. Clearly, this sequence of messages does not make sense: we would like to state that the receiver of the first message must be the sender of the second one, but we cannot express such a constraint if the event type and protocol languages do not support variables. Parametric trace expressions [8] overcome this limitation by introducing parameters in the trace expression formalism. For space constraints we do not discuss them, but parameters are the element of the formalism that allows us, for example, to support multiple concurrent conversations and correctly track which message instance belongs to which conversation. The algorithm presented in Sect. 4 works for parametric trace expressions as well.

As far as observability is concerned, an event type bookReservationReq that models reservation requests whatever the communication means used to send them (whatsApp or email) makes sense in most situations, apart those where the observability of messages is different depending on the communication means.

The sequence \( barbara {\mathop {\Longrightarrow }\limits ^{{em, reserve}}} carol ~~~~ carol {\mathop {\Longrightarrow }\limits ^{{wa, checkAvail}}} dave \) is correct w.r.t. P5, and the same holds for \( barbara {\mathop {\Longrightarrow }\limits ^{{wa, reserve}}} carol ~~~~ carol {\mathop {\Longrightarrow }\limits ^{{wa, checkAvail}}} dave \), but they might lead to different monitoring outcomes if the likelihood of \( barbara {\mathop {\Longrightarrow }\limits ^{{em, reserve}}} carol \) to be observed by the monitor is different from that of \( barbara {\mathop {\Longrightarrow }\limits ^{{wa, reserve}}} carol \).

In Sect. 4 we will limit our investigation to non-deterministic and contractive trace expressions (for example, definitions like \(P = P {\vee }P\) are not contractive, as there is no means to “consume” interactions from P while rewriting it) and we will assume that event types model only sets of events whose observability likelihood is equivalent.

4 Partial Observability: When the Good Becomes Bad

In this section we discuss how a good protocol may become a (possibly) bad one, due to unobservability or partial observability of events. The human beings in charge for the system monitoring process may use the algorithm we present if they are aware of limited observability of events or channels, in order to generate a monitor which will check the system’s compliance with the protocol output by the algorithm (possibly different from the original one, and possibly “bad”) and avoid raising false alarms. Given a trace expression modeling an AIP, the monitor that dynamically verifies it can be generated automatically both for Jade [14] and for Jason [16], as discussed in [17] and [5] respectively.

For sake of readability, let us suppose that we are in charge for the monitoring process. We must associate with each event foreseen by the protocol, its “observability likelihood”, namely the likelihood that the event can be observed by the monitor. Of course, we must know this parameter, which depends on the system to be monitored and on its context. If, when the event takes place, the monitor can always observe it, we associate 1 with the event. If the monitor can never observe the event (for example, the monitor can sniff whatsApp messages only, and the event is an email message), we associate 0 with it. If the event is transmitted over an unreliable or leaky channel, we may associate a number between 0 and 1, excluding the extremes, with it.

Let us consider P1 again, and let us suppose that:

  1. (1)

    the observability likelihood of messages exchanged via email is 0;

  2. (2)

    the observability likelihood of whatsApp messages sent by \( frank \) is 0.95;

  3. (3)

    the observability likelihood of the other whatsApp messages is 1.

Condition 1 forces us to remove all messages exchanged via email from the protocol, and condition 3 forces us to keep all the other whatsApp messages but those sent from \( frank \). The first and last conditions would lead to protocol P2. The second condition, however, requires a special treatment. In fact, message \( frank {\mathop {\Longrightarrow }\limits ^{{wa, okOrder}}} dave \) could be either observed or not and both cases would be correct, even if the first one should be much more frequent than the second.

The subprotocol where \( frank {\mathop {\Longrightarrow }\limits ^{{wa, okOrder}}} dave \) can either take place or not can be modeled by \( frank {\mathop {\Longrightarrow }\limits ^{{wa, okOrder}}} dave ~ {:}~ \epsilon ~~{\vee }~~\epsilon \). The transformation from P1 to the protocol which takes observability likelihood into account requires the following steps:

  1. (1)

    since the observability likelihood of messages exchanged via email is 0, remove them by P1;

  2. (2)

    since the observability likelihood of the whatsApp message sent by \( frank \) is 0.95, substitute it with the corresponding subprotocol where the message can take place or not, and concatenate this subprotocol with the remainder;

  3. (3)

    since the observability likelihood of the other whatsApp messages is 1, keep them all.

The result is

figure f

Since dealing with likelihoods in (0, 1) results into a more complex protocol, as the original protocol must be extended with the choice between observing the event or not, we might want to collapse likelihoods greater than a given threshold to 1, to avoid proliferation of choices: we may set a threshold above which events will be considered fully observable. Let Th be such threshold and P be the protocol to transform.

\(P'\) is obtained by P applying the following rules; L is the observability likelihood of interaction int

  1. (1)

    if \(L > Th\), int is kept;

  2. (2)

    if \(0 < L \le Th\), int is transformed into the subprotocol where int can either take place or not, and suitably concatenated with the remainder;

  3. (3)

    if \(L = 0\), int is discarded.

Since different monitors might observe different events and observability might change over time, causing an evolution of the observable protocol, modeling the good global protocol and then transforming it based on contingencies is a better engineering approach than directly modeling the partial, observable protocol. However, even if we start from a good P protocol, \(P'\) might be bad or even ugly.

Observability-Driven Transformation of Trace Expressions. We implemented the algorithm sketched above for protocols modeled as trace expressions. The code has been developed in SWI-Prolog (http://www.swi-prolog.org/) and, for space constraints, is made available in a longer version of this paper available at http://www.disi.unige.it/person/MascardiV/Download/PAAMS-long18.pdf. The code for filter_events implementing the observability-driven transformation is 36 lines long and – provided a basic knowledge of logic programming – is self-explaining. Despite its simplicity, it can operate on very complex parametric and recursive (also non terminating) protocols. The magic behind the “invisible” management of non terminating protocols like P4 is the use of the SWI-Prolog coinduction library (http://www.swi-prolog.org/pldoc/doc/_SWI_/library/coinduction.pl) which allows to cope with infinite terms without entering into loops. RIVERtools [6, 27] supports the development of AIPs modeled as trace expressions via a user-friendly GUI and implements static controls to check if a protocol is ugly, bad or good, suggesting how to decentralize the runtime monitoring process even in presence of bad protocols.

Experiments. We have experimented the filtering algorithm on a parametric trace expressions modeling the English Auction where the auctioneer proposes to sell an item for a given price and the bidders either accept or reject the proposal; as long as more than one bidder accepts, the price – which is a parameter of the protocol – is raised and another negotiation round is made. The protocol is consistent with the existing descriptions of the English Auction that can be found online, even if it slightly differs from the English Auction FIPA specification. The protocol description, as well as its code, can be downloaded from http://parametricTraceExpr.altervista.org/. We initially assumed that the only partially observable event was buy(X) with observability likelihood 0.5. By setting the threshold to 0.7, all occurrences of buy(X) became optional, while with a threshold equal to 0.4 they were all kept in the protocol. By setting the observability likelihood of buy(X) to 0, any occurrence of buy(X) was removed from the protocol.

The filtering algorithm was also run on a variant of the Alternating Bit Protocol [25] with 6 agents. Different observability likelihoods and different thresholds were set with both protocols, to test the algorithm in an exhaustive way.

5 Related Work and Conclusions

To the best of our knowledge, MAS monitoring under partial or imperfect observability has been addressed in the context of normative multiagent organizations only, and by just a few works. Among them, [2] spun off from [18] and shows how to move from the heaven of ideal norms to the earthly condition of approximate norms. The paper focuses on conditional norms with deadlines and sanctions [39]; ideal norms are those that can be perfectly monitored given a monitor, and optimality of a norm approximation means that any other approximation would fail to detect at least as many violations of the ideal norm. Given a set of ideal norms, a set of observable properties, and some relationships between observable properties and norms, the paper presents algorithms to automatically synthesize optimal approximations of the ideal norms defined in terms of the observable properties. Even if the purpose of our work is in principle similar to that of [2, 18], the approaches used to model AIPs are too different – also in expressive power – to compare them. A more recent work in the normative agents area is [24] that proposes information models and algorithms for monitoring norms under partial action observability by reconstructing unobserved actions from observed actions. While we assume to know in advance which events cannot be observed and we transform the ideal protocol into a monitorable one based on this information, the authors of [24] “guess” the actions that the monitor could not observe, but that must have taken place because of their visible effects. Outside the MAS community, partial ability of monitors to observe events is a well studied problem in many contexts including command and control [43] and runtime verification. In [38] the authors address the problem of gaps in the observed program executions. To deal with the effects of sampling on runtime verification, they consider event sequences as observation sequences of a Hidden Markov Model (HMM), and use an HMM of the monitored program to fill in sampling-induced gaps in observation sequences, and extend the classic forward algorithm for HMM state estimation to compute the probability that the property is satisfied by an execution of the program. Similarly to [24], that work complements ours by estimating the likelihood of an event to occur, whereas we assume to know that likelihood, and we transform the protocol – and hence the expected sequence of observed events – based on this knowledge. Other works pursuing the objective of suitably dealing with “lossy traces” in the runtime verification area are [13, 33].

As part of our future work, we will evaluate the possibility to integrate our approach with those that complement it, like [24, 38]. We expect that this might lead to some interesting result: on the observation (by the monitor) that the system is compliant with the bad AIP \(P'\), after many observations we might be able to state that “probably” the system is also compliant with the good AIP P. This research issue is very challenging: our roadmap involves deepening our understanding of the techniques that other researchers exploit to estimate the likelihood of unobserved events, and then merging those techniques into the algorithms supported by RIVERtools. We will also consider if and how our approach could be used to evaluate the protocol robustness to the presence of leaky and unreliable channels.