1 Introduction

Runtime enforcement consists in preventing the violation of a specification by using so-called enforcers, which alter the execution whenever necessary. Conceptually, the execution is typically abstracted as a trace, that is, a sequence of system states. An enforcer takes such a trace as input, modifies it if need be so that it complies with the specification, and produces the result as output. Existing enforcement frameworks are defined in the so-called centralized setting, where a single enforcer acts on a global observation and control point.

As systems become increasingly decentralized (e.g. finance, vehicles, drone swarms), it is desirable to ensure their critical properties while preserving their decentralization. In decentralized enforcement, a system consists of several components, with one enforcer attached to each component. Since every enforcer can only observe what happens locally, the enforcers need to communicate to gather information on the whole system and collaborate to modify the current execution.

In [13], we introduced two algorithms for the decentralized enforcement of properties specified in Linear-time Temporal Logic (LTL) [19]. Both are online algorithms that modify the current event whenever appending it to the trace output so far would violate the property. The algorithms differ in how they compute the corrected event. In the so-called global algorithm, all possible alternatives to the emitted event are explored, so we are guaranteed to find an optimal correction, whereas in the so-called local-incremental algorithm, safe choices are made on each component incrementally, without backtracking. However, these algorithms were neither implemented nor evaluated.

This paper introduces Decent, a simulation environment that implements the two decentralized enforcement algorithms and benchmarks them against randomly generated formulas and patterns, or against a specific formula. In Decent, enforcement is performed offline on randomly generated traces, as the goal is to assess the performance of the algorithms independently of any particular system. In practice, however, our algorithms also support online enforcement by reading the execution as it is produced by the system under scrutiny. The paper is structured as follows. Sect. 2 recalls the main principles of the approaches we presented previously. In Sect. 3, we overview the tool; in Sect. 4, we discuss how we evaluated the algorithms and present the results. In Sect. 5, we discuss related work. Finally, we conclude in Sect. 6.

2 Principles of Decentralized Enforcement

We briefly overview the decentralized enforcement algorithms and refer to [13] for a formal definition. In the decentralized setting, multiple enforcers communicate to gather information. Whenever they observe an event, the enforcers use their internal memory to compute alternatives to the observed event in case of a violation. The following steps are common to the two enforcement algorithms. The enforcers maintain, at all times, a formula to enforce, which is rewritten using progression [1] to separate the present from the future obligations. Then, in turn, each enforcer partially evaluates the formula using every possible assignment of its local atomic propositions, while keeping track of the number of modifications with respect to the originally observed event. If one of the assignments leads to the partially evaluated formula being simplified to \(\bot \), then the corresponding event is a violation and is removed from the memory of the enforcer.
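To make the progression step concrete, the following OCaml sketch shows the standard rewriting rules of [1] over a minimal LTL syntax; the types and names are illustrative and do not reflect Decent's actual modules.

```ocaml
(* A minimal LTL syntax and the progression rules of [1]; illustrative
   names, not Decent's implementation. An event is represented as the
   set of atomic propositions that are true. *)
module Props = Set.Make (String)

type ltl =
  | True | False
  | Prop of string
  | Not of ltl
  | And of ltl * ltl
  | Or of ltl * ltl
  | Next of ltl
  | Until of ltl * ltl
  | Glob of ltl
  | Ev of ltl

let rec progress (phi : ltl) (event : Props.t) : ltl =
  match phi with
  | True -> True
  | False -> False
  | Prop p -> if Props.mem p event then True else False
  | Not f -> Not (progress f event)
  | And (f, g) -> And (progress f event, progress g event)
  | Or (f, g) -> Or (progress f event, progress g event)
  | Next f -> f
  (* prog(f U g) = prog(g) \/ (prog(f) /\ f U g) *)
  | Until (f, g) -> Or (progress g event, And (progress f event, Until (f, g)))
  (* prog(G f) = prog(f) /\ G f;  prog(F f) = prog(f) \/ F f *)
  | Glob f -> And (progress f event, Glob f)
  | Ev f -> Or (progress f event, Ev f)
```

A simplification pass propagating \(\top \)/\(\bot \) through the connectives would normally follow each progression step; it is omitted here for brevity.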

The algorithms differ in their strategy for modifying the local observation. In the global algorithm, once an enforcer is done evaluating the formula, it sends its entire memory to the next enforcer. Once the present obligations of the formula have been entirely evaluated, the last enforcer applies a decision rule to pick which event to emit and communicates its decision to all the others. In the local-incremental algorithm, once an enforcer has evaluated the formula, it applies a local decision rule and sends a single partial event (which may already differ from the observed event) to the next enforcer, instead of the whole memory (which contains multiple events and the associated partially evaluated formulas). When the last enforcer is done, it applies the local decision rule to determine the event to emit. In this case, the last enforcer communicates the next formula to enforce, as the other enforcers have already decided which event to emit locally.
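As an illustration of the shared decision step, here is a hedged OCaml sketch: among the candidates kept in memory (those whose partially evaluated formula did not simplify to \(\bot \)), pick one requiring the fewest modifications. The candidate type and function names are assumptions made for exposition, not Decent's API.

```ocaml
(* Candidates are generic over the formula type; is_bot reports whether
   the partially evaluated formula simplified to false. *)
type 'f candidate = {
  event : string list;   (* a possible (corrected) local event *)
  formula : 'f;          (* the formula partially evaluated under it *)
  nb_mods : int;         (* modifications w.r.t. the observed event *)
}

(* Keep only non-violating candidates; return one with minimal nb_mods. *)
let decide ~(is_bot : 'f -> bool) (memory : 'f candidate list) :
    'f candidate option =
  match List.filter (fun c -> not (is_bot c.formula)) memory with
  | [] -> None
  | c :: rest ->
      Some
        (List.fold_left
           (fun best c -> if c.nb_mods < best.nb_mods then c else best)
           c rest)
```

In the global algorithm, such a rule is applied once by the last enforcer over the fully evaluated memory; in the local-incremental algorithm, each enforcer applies a local variant before forwarding a single candidate.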

We proved that the algorithms guarantee the classical properties of enforcement: they are sound, meaning that the global output sequence of the enforcers does not violate the specification, and transparent, meaning that the global event is modified only if it does not comply with the specification. Additionally, the global algorithm is optimal: the number of modifications to atomic propositions is minimal.

3 Decent Overview

Decent implements the algorithms mentioned above using the functional programming language OCaml (in about 2200 LLOC). We reused some modules implemented in DecentMon [2, 3], mainly the implementation of LTL, events, traces and the associated generators. We implemented a few additional modules for the centralized and decentralized enforcement algorithms. The other functionalities of DecentMon are left as is, so that formulas can still be monitored as well as enforced.

With Decent, we can either parse specific formulas given in a file or randomly generate them. In both cases, formulas are enforced against a randomly generated trace using the decentralized algorithms, as well as a “centralized” orchestration-based enforcement algorithm for comparison. When generating random formulas, it is possible either to specify a (maximum) size or to choose specification patterns (defined in [6]) as templates to generate properties. Moreover, formula generation can be “biased” to place more atomic propositions on a given component. The underlying system is represented by an alphabet expressing how the atomic propositions are spread over the components. The alphabet is given as a command-line argument, but it is also possible to use multiple different alphabets given in a file (to vary the distribution of the atomic propositions over the components, or to add extra components/atomic propositions, for example). When generating traces (i.e. lists of events), each atomic proposition of the alphabet has a fixed probability of being included in each event (coin-flip distribution), and it is possible to choose a different probability distribution such as Bernoulli, exponential or beta. Finally, Decent offers two enforcement modes: optimistic and pessimistic. In the former, possible alternative local events are computed only when the observed event leads to a violation (i.e. after first performing a round of verification), while in the latter, local alternatives are always computed (i.e. at the same time as the verification). By default, the pessimistic mode is used.
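For instance, the coin-flip trace generation described above can be sketched as follows; the shapes and names are assumptions, not Decent's actual generator.

```ocaml
(* Each atomic proposition of the alphabet is included in each event
   with a fixed probability p (coin-flip distribution). *)
let gen_event ~(p : float) (alphabet : string list) : string list =
  List.filter (fun _ -> Random.float 1.0 < p) alphabet

(* A trace is a list of events; e.g.
   gen_trace ~p:0.5 ["a1"; "a2"; "b1"; "b2"; "c1"; "c2"] 100 *)
let gen_trace ~(p : float) (alphabet : string list) (len : int) :
    string list list =
  List.init len (fun _ -> gen_event ~p alphabet)
```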

4 Evaluating the Algorithms

We define the evaluation metrics in Sect. 4.1 and present the experiments to compare the algorithms in Sect. 4.2.

4.1 Metrics

The metrics we consider for this benchmark are mainly related to messages and the internal memory of the enforcers. We explain how measuring these metrics helps to understand the intricate behavior of enforcement algorithms.

Number of Modifications of the Events. As mentioned in Sect. 2, the local algorithm does not guarantee optimality. This metric allows us to observe how far the local algorithm is from optimality. It is worth noting that optimality is defined for each event individually, when compared to the observed event, not for the whole trace. More specifically, it is possible (although quite rare in our experiments) for the total number of modifications in the enforced trace produced by the local algorithm to be lower than it would have been with the global algorithm (on the same trace and formula). Since the local algorithm relies on local information to decide which event to send to the next enforcer, it is impossible to guarantee that the two algorithms emit the same event. Consequently, the algorithms can end up with a different formula to enforce at some point (since the formula depends on the emitted event). The global version may therefore reach a stricter formula, under which the next observed event is a violation (forcing the enforcers to modify it), while the local version does not. The number of modifications should, however, be identical between the global version and the centralized one, because they both explore every alternative and should therefore be able to pick the same verdict every time. To see whether these differences are common, we also measured the number of differing events and the number of events with a different number of modifications between the enforced traces. Result tables show the average number of modifications in column #mod.
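Representing events as sets of true atomic propositions, the per-event modification count can be computed as the size of the symmetric difference between the observed and the emitted global event; a minimal sketch with assumed names:

```ocaml
module Props = Set.Make (String)

(* #mod for one event: atomic propositions whose truth value differs
   between the observed and the emitted global event. *)
let nb_modifications (observed : Props.t) (emitted : Props.t) : int =
  Props.cardinal
    (Props.union (Props.diff observed emitted) (Props.diff emitted observed))
```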

Number and Size of Messages. To measure the intensity of the communication required between the enforcers, we also measure the number of messages (#msg) and their size, as the number of bytes used to encode them (|msg|). We only count the messages sent during the main evaluation process, not the ones sent after the final decision, as their size depends on an implementation choice: whether we send only the enforced event (which means the enforcers have to re-compute the next formula to enforce) or both the event and the next formula. In both cases, the last enforcer has to send a message to all the others, which causes the same number of extra messages as in the centralized version (where the central enforcer also has to notify the local enforcers of the verdict). If we only send the event, the extra messages contain the same information as in the centralized version (so their sizes would be close, if not identical). Otherwise, the final messages are larger in the decentralized version, as they contain the next formula to enforce.
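One plausible way to obtain such a byte count in OCaml is to serialize the message value and take the length of its byte representation; this is an assumption about the measurement, not necessarily how Decent implements it.

```ocaml
(* Byte size of a message value via OCaml's generic serializer
   (illustrative; actual encoding in the tool may differ). *)
let message_size (msg : 'a) : int =
  String.length (Marshal.to_string msg [])
```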

Size of the Temporal Correction Log. We also measure the size of the tcl (in bytes, given in |tcl|), i.e. the main internal structure used by the enforcers to compute the alternatives, in order to compare the usage of the internal memory.

4.2 Experiments

For each experiment, we compare the performance of three algorithms (centralized “orchestration-based” [4], decentralized “global” and decentralized “local”) using the same randomly generated traces and formulas (with some added constraints on the generation, depending on the experiment). In all experiments, we generate 1000 formulas and traces of size 100 (the size of a trace being the number of events composing it). We did not use larger traces because 1) it made the execution time of the experiments much longer while not changing the results much, but mainly because 2) it made the pathological cases much worse; we discuss the latter in more detail with the first experiment. Also, we did not measure the execution time or the delays that the algorithms would cause, because the results would not be realistic: we do not take into account the cost of communication between the enforcers, and interacting with the system in a real application may have a significant impact on the overhead. We detail the two experiments in the following (an additional one can be found in Appendix A).

Varying Formula Size. Here, the generated formulas are of different sizes (from 1 to 6) to assess the scalability of our approach. As in DecentMon, we use the maximum nesting of operators as a measure of size, because it reflects well the difficulty of evaluating a formula. For example, \({\textbf {G}} (a \vee b)\) has a size of 2 and \({\textbf {G}} b \vee {\textbf {F}} \lnot a\) has a size of 3. It is also worth noting that we consider the size before any simplification is applied to the formula (so the simplified formula could be smaller). We used the alphabet \(\{a1,a2|b1,b2|c1,c2\}\), that is, 3 components that can each observe 2 atomic propositions (for instance, the first component can observe a1 and a2).
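The size measure above (maximum operator nesting, with atomic propositions contributing 0) can be sketched as follows over an illustrative LTL type, consistent with the two examples just given:

```ocaml
type ltl =
  | Prop of string
  | Not of ltl
  | And of ltl * ltl
  | Or of ltl * ltl
  | Next of ltl
  | Until of ltl * ltl
  | Glob of ltl
  | Ev of ltl

(* Maximum nesting of operators; atoms contribute 0. *)
let rec size = function
  | Prop _ -> 0
  | Not f | Next f | Glob f | Ev f -> 1 + size f
  | And (f, g) | Or (f, g) | Until (f, g) -> 1 + max (size f) (size g)

(* size (Glob (Or (Prop "a", Prop "b"))) = 2
   size (Or (Glob (Prop "b"), Ev (Not (Prop "a")))) = 3 *)
```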

Table 1. Results of the first experiment: varying formula size.
Table 2. Differences between the enforced traces in the first experiment.

Tables 1 and 2 show the results of the first experiment. Here, the average number of modifications is almost identical between the three versions. Although this is an average over the whole trace, it at least means that, in practice, the local algorithm is very close to the other versions (even though it does not guarantee optimality). It is worth noting that #mod sometimes has a lower value for the local algorithm. This can be caused either by what we mentioned in the previous section or because, in some cases, we reach a state where the next formula to enforce is \(\top \) (at which point we stop the enforcement). As we cannot guarantee that the local algorithm outputs the same trace as the others, it may miss this state and enforce a few more events before reaching it. Because we average #mod over the whole trace, if the trace given by the local version is longer, it skews the average slightly. Aside from this, the cost in terms of local memory usage (|tcl|) is greatly reduced by the local algorithm (up to almost 3 times smaller with formulas of size 6), and the number of messages (#msg) is much smaller with the decentralized algorithms. There are fewer messages because the enforced formulas do not necessarily contain every atomic proposition of the alphabet, so some enforcers may not need to do any work. However, the messages (|msg|) are much larger than in the centralized algorithm, because they contain more information (alternatives and the associated partially evaluated formulas, versus only the local observations in the centralized version). It is worth noting that messages are much smaller with the local version than with the global one (about 2 to 4 times, depending on the size of the formula). Moreover, the traces produced by the centralized and the global algorithms are identical. The traces given by the local version are almost identical, differing from the other two on about one event. This supports our claim that the local version is very close to optimal despite not guaranteeing it.

An issue with this experiment is that some formulas are particularly bad for these methods, because the rewriting that separates their present and future obligations causes an explosion of their size. As we generate random formulas, we may get one of these pathological cases for only one size and not the others, which would majorly impact the results for that size. Typically, pathological cases are formulas with many Until (U) and Globally (G) operators interleaved, so they are quite uncommon for smaller formulas, and we have not seen any arise with specification patterns (i.e. the next experiment). As mentioned earlier, these formulas are one of the reasons why we chose traces of this size (with larger traces, the execution would take far too long to complete, as the formula size tends to increase after each event in these cases).
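As a small worked example of where the growth comes from, consider the standard progression rules [1] on \({\textbf {G}} (a \, {\textbf {U}} \, b)\): \(prog({\textbf {G}} \varphi , \sigma ) = prog(\varphi , \sigma ) \wedge {\textbf {G}} \varphi \) and \(prog(a \, {\textbf {U}} \, b, \sigma ) = prog(b, \sigma ) \vee (prog(a, \sigma ) \wedge a \, {\textbf {U}} \, b)\). For an event \(\sigma \) where \(a\) holds and \(b\) does not, one step yields \((a \, {\textbf {U}} \, b) \wedge {\textbf {G}} (a \, {\textbf {U}} \, b)\): the pending Until obligation is carried alongside the original formula. With deeper interleavings of U and G, each event can spawn new partially progressed subformulas faster than a syntactic simplification function can collapse them, which is the explosion observed here.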

Using Realistic Specifications. In this experiment, we used formulas generated with the LTL specification patterns [6] (we omitted the universality and response chain patterns because of space constraints; the complete tables are available in the tool repository). Here, we also compare the two enforcement modes, optimistic and pessimistic (on the same formulas/traces, by setting a seed). We used the same alphabet as in the previous experiment (Table 4).

Table 3. Differences between the enforced traces in the second experiment.
Table 4. Results of the second experiment (pessimistic algorithm): using specification patterns.
Table 5. Results of the second experiment (optimistic algorithm): using specification patterns.

Tables 3 to 5 give the results of this experiment. There is still an improvement in memory usage (|tcl|) with the local algorithm. However, it is not as large as in the previous experiment (about two times smaller at most, instead of three previously; the same observation applies to the message sizes). A notable difference in the pessimistic mode is that, for some patterns, there is a large difference in the number of messages between the two decentralized algorithms: for instance, with the precedence pattern, almost twice as many messages were sent in the global version on average. Using the optimistic mode, memory usage is improved for some patterns (e.g. bounded existence) but worse for others (e.g. response). This mode seems better for certain types of formulas, although it is hard to tell exactly, because our metrics are not well suited to comparing the two modes. For example, for |tcl|, we only include in the average the runs where enforcement was required. Otherwise, it would skew the result and make the optimistic mode seem significantly better, which is untrue, as its (enforcement) cost is identical to the other mode when enforcement is required (and zero when it is not). However, this metric largely depends on the enforced formula and consequently has a large variance over the different runs. Therefore, if the runs that required enforcement are the ones where |tcl| was large, then the average will be worse than (or at best similar to) what we get with the pessimistic mode, even though we might have gained a lot overall in computation time by not performing useless enforcement steps. The main drawback of this mode is that the overhead is larger when enforcement is required, because of the initial verification phase. Table 3 shows that, for some patterns, the local algorithm produces traces with more differences from the other two versions than with random formulas (mainly constrained chain, where we measured 4 differing events on average). We have not included a table showing the differences between the enforced traces in the optimistic mode because the values are identical (i.e. the exact same traces are produced).

5 Related Work

There are many approaches tackling the problem of decentralized monitoring (see [12] for an overview). The closest approaches to ours are the ones using formula rewriting on LTL [2, 3, 22]. Other methods use different formalisms to express the specifications, such as finite-state automata [9] or Stream Runtime Verification (SRV) [5], or make different assumptions on the system, as in the monitoring of decentralized specifications [7], that is, having an independent specification for each component instead of one for the whole system. The aforementioned approaches have been implemented in various tools, such as DecentMon [2, 3, 9], a tool using Maude [22], dLola [5] or THEMIS [7, 8]. All these approaches only perform verification: they report violations or satisfaction of a property and do not consider enforcement.

On the topic of runtime enforcement, most of the work has been done for centralized systems, with varying assumptions about the underlying system: specifications expressed in discrete-time formalisms [10, 11], timed properties [17], or systems with uncontrollable events [16, 20]. We note that fewer methods have actually been implemented for centralized enforcement; see TiPEX [18] or GREP [21], for instance. Although a few approaches [14, 15] consider enforcement in decentralized systems, they are tailored to specific systems, and our work is the first generic approach able to enforce any property expressed in linear-time temporal logic.

6 Conclusions

Decent allows the evaluation of the two decentralized enforcement algorithms introduced in [13]. Our experiments demonstrate that, although the messages are much larger with the decentralized algorithms, fewer of them are needed. The internal memory usage is significantly reduced with the local-incremental algorithm. Also, even though the latter algorithm does not guarantee optimality, it is very close to it in practice.

We plan to integrate our algorithms into a larger tool (THEMIS, for instance), which would allow us to use them in real systems. We also plan to improve the algorithms, as they suffer from a few limitations, mainly the dependence on a powerful simplification function (so as not to miss any verdict) and the exponential blowup of the formulas in some rare cases. As both of these limitations come from LTL and rewriting, we plan to use different specification formalisms such as finite-state automata, or even more expressive ones like timed properties or streams. Finally, we also plan to study the optimistic and pessimistic modes in greater detail, to try to find settings in which one mode is better than the other, using better-suited metrics.