Keywords

1 Introduction

Within the growing area of data-aware processes, Data Petri nets (DPNs) with arithmetic data have recently gained increasing popularity thanks to their ability to balance simplicity with expressiveness. DPNs can also be mined automatically [5, 17], but automatic mining techniques typically come without any correctness guarantees. However, the complex interplay between the control structure and data makes it hard to check whether DPNs satisfy properties of interest; indeed, all non-trivial verification tasks are undecidable. While linear- and branching-time model checking procedures for DPNs were developed [11,12,13, 18], many analysis tasks relevant in BPM go beyond these techniques. Here we focus on checking equivalence and refinement of processes, which is an important task in many contexts [1, 6]: to match an organization-specific model to a reference model, to relate an automatically mined model to a normative one by domain experts, or to compare a refined version of a process model with the original one.

Example 1

The DPN in Fig. 1 models a management process for road fines by the Italian police [20], where assignments with right-hand side ? indicate a non-deterministic write operation.

Fig. 1.
figure 1

Data Petri net for road fine management process

This normative process model was designed by domain experts, but other versions of this process were discovered by automatic techniques [19]. This raises a number of questions that are important in the context of process discovery and repair: Do the DPNs admit the same set of configurations and data values? Are all process runs of one model also possible in the other? Do the possible data values in a final state coincide?

To answer such questions, we consider in this paper different notions of behavioural refinement and equivalence for DPNs, comparing markings, configurations, or language, and we propose techniques to check them. Since even state reachability is undecidable for DPNs [12, Rem. 2.6], it does not come as a surpise that these verification tasks are in general undecidable as well.

In this paper, we thus impose two restrictions on DPNs to make verification decidable: (1) To tame the control flow perspective, we assume that the Petri nets underlying DPNs are bounded. (2) To tame the data perspective, we assume that the DPNs have a finite constraint graph. Constraint graphs (CGs) are symbolic abstractions of the reachable state space that are used for data-aware soundness and model checking of DPNs [10,11,12,13]. In general, CGs are infinite. However, it was shown that finite constraint graphs can in fact be computed for a wide range of DPNs from the literature. These include DPNs where all constraints are variable-to-variable/constant comparisons, as produced by automatic guard discovery techniques [5, 17], or bounded lookback DPNs whose behavior depends only on a bounded amount of information from the process run [4, 12]. It was shown that these classes comprise almost all DPNs in the literature [13], also e.g. the process of Example 1. Notably, we do not assume that DPNs are acyclic.

The contributions of this paper can be summarized as follows: (1) We show how natural notions of equivalence and refinement of DPNs with respect to markings, configurations, and language can be checked based on constraint graphs. (2) Our technique is a decision procedure for bounded DPNs where finite constraint graphs exist, which proves decidability of process refinement/equivalence for such DPNs. These include DPNs where all guards are variable-to-variable/constant comparisons, and DPNs with bounded lookback. (3) If equivalence or refinement does not hold, counterexamples that distinguish the two processes can be computed by our approach.

Related work. For process models without data, a variety of comparison techniques were developed. An early first approach for process equivalence was presented in [1]. A basic taxonomy of similarity measures was proposed in [6], distinguishing similarity based on either element labels, structure, or behaviour. For the first kind, schema and ontology matching techniques are used [8]. For structural similarity measures, graph matching algorithms were studied [7]. Our approach falls within the class of behavioural similarity, and to the best of our knowledge, no respective approaches exist to compare data-aware processes. However, while most works on process comparison are quantitative (i.e., they quantify process similarity with respect to some measure [16, 21], this paper is purely qualitative, in the sense that our techniques check process equivalence or refinement, but the difference between models is not quantified.

2 Background

In this section we summarize some background on constraints, DPNs and data-aware dynamic systems as process models, as well as constraint graphs.

We assume a set of process variables V, each of which is associated with a sort from the set \(\varSigma = \{\texttt {int}, \texttt {rat}\}\) with associated domains integers \(\mathcal D(\texttt{int}) = \mathbb Z\) and rationals \(\mathcal D(\texttt{rat}) = \mathbb Q\). For instance, in Example 1 the set of process variables is \(V = \{a, d, dj, dp, ds, p, t\}\), where a and t are of sort \(\texttt {rat}\) and the others of sort \(\texttt{int}\). For \(\sigma \,{\in }\, \varSigma \), \(V_{\sigma }\) denotes the subset of variables in V of type \(\sigma \). To manipulate variables, we consider linear arithmetic expressions \(c\), called constraints:

$$\begin{aligned} c &:= n \geqslant n \mid n \ne n \mid n = n \mid r \geqslant r \mid r > r \mid r \ne r \mid r = r \mid c \wedge c \\ n &:= v_i \mid k \mid n + n \mid - n \qquad \qquad r := v_r \mid q \mid r + r \mid - r \end{aligned}$$

where \(v_i \in V_{\texttt{int}}\), \(v_r\in V_{\texttt{rat}}\), \(k \in \mathbb Z\), and \(q\in \mathbb Q\). These expressions will be used to capture conditions on the values of variables that are read and written during the execution of process activities. The set of constraints over a set of variables V is denoted \(\mathcal C(V)\). We will also consider first-order formulas that have constraints as atoms. Given the definition of constraints, such formulas are in the theory of linear arithmetic, which is decidable [2]. Moreover, quantifier elimination can produce a quantifier-free, equivalent for any formula of the form \(\exists x.\varphi \), cf. [2]. We denote logical equivalence by \(\equiv \), and logical entailment by \(\models \).

Data Petri nets. We adopt the standard definition of Data Petri Nets (DPNs) [19, 20]. We consider two disjoint, marked copies of the set of process variables V, denoted \(V^r = \{v^r \mid v\,{\in }\,V\}\) and \(V^w = \{v^w \mid v\,{\in }\,V\}\), called the read and write variables. They will refer to variable values before and after a transition, respectively. We also write \(\overline{V}\) for a vector that orders V in an arbitrary, fixed way, and \(\overline{V}^r\) and \(\overline{V}^w\) for vectors ordering \(V^r\) and \(V^w\) in the same way.

Definition 1

A data Petri net (DPN) is a tuple \(\mathcal {N}= \langle P, T, F, \ell , \mathcal {A}, V, guard \rangle \), where (1) \(\langle P, T, F, \ell \rangle \) is a labelled Petri net with non-empty, disjoint sets of places P and transitions T, a flow relation \(F:(P \times T)\cup (T \times P)\mapsto \mathbb {N}\) and a labelling function \(\ell :T\mapsto \mathcal {A}\cup \{\tau \}\), where \(\mathcal {A}\) is a finite set of activity labels and \(\tau \) is a special symbol for silent transitions; (2) V is a set of process variables with a sort in \(\varSigma \); and (3) \(guard:T \mapsto \mathcal {C}(V^r \cup V^w)\) is a guard mapping.

Example 2

The process in Fig. 2(a) is a DPN modelling a simple auction process. It maintains the set of variables \(V = \{o,t\}\) of sort \(\texttt{rat}\), where o holds the last offer issued by a bidder, and t is a timer. The action init sets the timer t to a positive value and the offer o to 0; while the timer did not expire, it can be decreased (action timer), or bids can be issued, increasing the current offer (bid); the item can be sold if the timer expired and the offer is positive (hammer). We denote this DPN, consisting of all actions drawn in black in Fig. 2, by \(\mathcal {N}\). Moreover, we consider a variant of this DPN with an additional reset action that restarts the process if the offer is 0 (drawn in blue), and call this DPN \(\mathcal {N}_{\textsf{reset}}\).

Fig. 2.
figure 2

A DPN with its DDSA and constraint graph.

Also the process shown in Example 1 is a DPN.

The variables read and written by a transition t are denoted by \( read (t) = \{v \mid v^r \text { occurs in } guard (t) \}\) and \( write (t) = \{v \mid v^w \text { occurs in } guard (t) \}\). For instance, for t the activity \(\textsf{bid}\) in Fig. 2, \( write (t) = \{o\}\) and \( read (t) = \{o,t\}\). An assignment with domain V is called a state variable assignment, to distinguish it from a transition variable assignment \(\beta \) that assigns values to the set of variables \(V^r\cup V^w\). All assignments are supposed to map variables to elements of their domain.

A configuration in a DPN \(\mathcal {N}\) is a pair \((M,\alpha )\) given by a marking \(M:P\mapsto \mathbb {N}\) for the underlying Petri net, together with a state variable assignment \(\alpha \). A configuration thus simultaneously accounts for the control flow progress and for the current values of all variables in V, as specified by \(\alpha \). For instance, is a configuration of the DPNs of Example 2.

Definition 2 (Transition firing)

A transition \(t\in T\) is enabled in \((M, \alpha )\) if a transition variable assignment \(\beta \) exists such that:

  1. (i)

    \(\beta (v^r) = \alpha (v)\) for every \(v\in read (t)\), i.e., \(\beta \) is as \(\alpha \) for read variables;

  2. (ii)

    \(\beta \models guard (t)\), i.e., \(\beta \) satisfies the guard; and

  3. (iii)

    \(M(p) \geqslant F(p,t)\) for every p so that \(F(p,t)\geqslant 0\).

An enabled transition may fire, producing a new configuration \((M',\alpha ')\), s.t. \(M'(p) = M(p) - F(p,t) + F(t,p)\) for every \(p\in P\), and \(\alpha '(v) \,{=}\,\beta (v^w)\) for every \(v\in write (t)\), and \(\alpha '(v) \,{=}\,\alpha (v)\) for every \(v\not \in write (t)\). A pair \((t,\beta )\) as above is called (valid) transition firing, and we denote its firing by .

Thus, a guard simultaneously expresses a condition on read variables, and an update on written ones: e.g., \(\textsf{bid}\) in Fig. 2 requires the current value of t to be positive and non-deterministically sets o to a new value that exceeds the current.

Given \(\mathcal {N}\), we fix one configuration \((M_I,\alpha _0)\) as initial, where \(M_I\) is the initial marking of the underlying Petri net and \(\alpha _0\) is a state variable assignment that specifies the initial values of all variables in V. The final marking is denoted \(M_F\). For instance, \(\mathcal {N}\) in Example 2 admits a transition firing from its initial state; and \(\{\textsf{p}_3\}\) is the final marking.

A state \((M',\alpha ')\) is reachable in a DPN if it is reached by a transition sequence from the initial state . Such a sequence is also written as \((M_I, \alpha _0) \rightarrow ^* (M',\alpha ')\). We denote by \(\mathcal M ark (\mathcal {N})\) the set of all such \(M'\), and by \(\mathcal C onf (\mathcal {N})\) the set of all such \((M',\alpha )\), i.e., the sets of reachable markings and configurations. A transition sequence as above is a process run if \(M'=M_F\). In this paper, we will assume that DPNs are bounded, i.e., that the number of tokens in reachable markings is upper-bounded by some \(k\in \mathbb N\).

Data-aware Dynamic Systems with Arithmetic (DDSAs) are a simpler, equivalent model [9, 10] that we will use for analysis tasks.

Definition 3

A DDSA \(\mathcal {B}=\langle B, b_{ I }, \mathcal {A}, T, B_F, V, \alpha _{ I }, guard \rangle \) is a labeled transition system where (1) B is a finite set of control states, with \(b_{ I }\,{\in }\,B\) the initial one; (2) \(\mathcal {A}\) is a set of actions; (3) \(T \subseteq B{\times }\mathcal {A}{\times }B\) is a transition relation; (4) \(B_F \subseteq B\) are final states; (5) V is the set of process variables; (6) \(\alpha _{ I }\) the initial variable assignment; and (7) \( guard :\mathcal {A}\mapsto \mathcal {C}(V^r \cup V^w)\) specifies executability constraints for actions over variables \(V^r\,{\cup }\,V^w\).

Every bounded DPN \(\mathcal {N}\) can be equivalently expressed as a DDSA \(\mathcal {B}\) over the same set of process variables V, by unfolding all possible markings (see [13] for details). The set of control states of \(\mathcal {B}\) coincides thus with the set of markings of \(\mathcal {N}\). Figure 2(b) shows a DDSA which corresponds to the DPN in Fig. 2(a). The action guards are the same as in the DPN, but have been omitted for readability.

If a control state \(b \in B\) admits a transition to \(b'\) via action a, i.e., \((b, a, b')\in T\), this is denoted by . A configuration of \(\mathcal {B}\) is a pair \((b, \alpha )\) where \(b\,{\in }\,B\) and \(\alpha \) is a state variable assignment, and \((b_{ I }, \alpha _{ I })\) is the initial one. As defined next, an action a transforms a configuration \((b, \alpha )\) into a new configuration \((b', \alpha ')\) by updating the assignment \(\alpha \) according to the action guard, exactly as in DPNs:

Definition 4

A DDSA \(\mathcal {B}\,{=}\,\langle B, b_{ I }, \mathcal {A}, T, B_F, V, \alpha _{ I }, guard \rangle \) admits a step from configuration \((b, \alpha )\) to \((b', \alpha ')\) via action a, denoted , if , \(\alpha '(v) = \alpha (v)\) for all \(v \in V\setminus write(a)\), and the transition assignment \(\beta \) given by \(\beta (v^r) = \alpha (v)\) and \(\beta (v^w) = \alpha '(v)\) for all \(v \in V\), satisfies \(\beta \models guard (a)\).

A run \(\rho \) of a DDSA \(\mathcal {B}\) is a sequence of steps , and it is final if \(b_n\in B_F\). We call the abstraction of a run the respective transition sequence . For instance, for the DDSA in Fig. 2(b), the following is a run ending in a final state (note that each state corresponds to a marking of the DPN):

figure j

The number of runs and configurations of a DPN or DDSA are typically infinite, due to the infinite number of possible valuations. For analysis tasks, we thus resort to the following abstraction:

Constraint graphs (CGs) are an abstraction of the reachable state space that was introduced for soundness checking [10, 13]. The key idea is that formulas are used to represent sets of configurations.

Let \(\mathcal {B}\,{=}\,\langle B, b_{ I }, \mathcal {A}, \varDelta , B_F, V, \alpha _{ I }, guard \rangle \) be a DDSA. The transition formula \(\varDelta _{a}\) of action a is given by \(\varDelta _{a}(\overline{V}^r, \overline{V}^w){=} guard (a) \wedge \bigwedge _{v\not \in write (a)} v^{w}\,{=}\,v^{r}\). This formula simply expresses conditions on variables before and after executing the action: \( guard (a)\) must hold, and the values of all variables that are not written are copied. E.g., for action \(\textsf{bid}\) in Fig. 2(b), \( write (\textsf{bid}) = \{o\}\), so \(\varDelta _{\textsf{bid}} = (t^r\,{>}\,0) \wedge (o^w\,{>}\,o^r) \wedge (t^{w}\,{=}\,t^{r})\). Next, we define an update operation, to express how a set of configurations, captured by formula \(\varphi \), changes when executing an action.

Definition 5

For a formula \(\varphi \) with free variables V and action a, \( update (\varphi , a) = \exists \overline{U}. \varphi [\overline{U}/\overline{V}] \wedge \varDelta _a[\overline{U}/\overline{V}^r, \overline{V}/\overline{V}^w]\), where U is a set of variables that has the same cardinality as V and is disjoint from all variables in \(\varphi \).

Here, \(\varphi [\overline{U}/\overline{V}]\) is the result of replacing variables \(\overline{V}\) in \(\varphi \) by \(\overline{U}\), and similar for \(\varDelta _{a}\). For instance, if \(\overline{V} = (o,t)\) we can take the renamed variables \(\overline{U} = (o',t')\); for \(\varphi = (t\,{>}\,0) \wedge (o \,{=}\,0)\) we then get \( update (\varphi , \textsf{bid}) = \exists o'\,t'. (t'\,{>}\,0) \wedge (o' \,{=}\,0) \wedge (o\,{>}\,o') \wedge (t\,{=}\,t')\), which is equivalent to \((t\,{>}\,0) \wedge (o\,{>}\, 0)\). This reflects that, if in the process of Fig. 2, \(t>0\) and \(o=0\) hold, and \(\textsf{bid}\) is executed, then afterwards we still have \(t>0\) but also o is positive. Below, let \(c_{\alpha _{ I }}:=\bigwedge _{v\in V} v\,{=}\,\alpha _{ I }(v)\) .

Definition 6

The constraint graph \(CG _{\mathcal {B}}\) of \(\mathcal {B}\) is a quadruple \(\langle S, s_0, \gamma , S_F\rangle \) where the set of nodes S consists of tuples \((b, \varphi )\) for \(b \,{\in }\,B\) and a formula \(\varphi \) with free variables V, and \(\gamma \subseteq S \times \mathcal {A}\times S\), inductively defined as follows:

  1. (i)

    \(s_0 := (b_0, c_{\alpha _{ I }}) \in S\) is the initial node; and

  2. (ii)

    if \((b,\varphi ) \in S\) and such that \( update (\varphi , a)\) is satisfiable, there is some \((b', \varphi ')\in S\) with \(\varphi ' \equiv update (\varphi , a)\), and is in \(\gamma \), and

  3. (iii)

    the set of final nodes \(S_F\) consists of all \((b,\varphi )\) such that \(b\in B_F\).

Intuitively, the constraint graph describes all configurations reachable in \(\mathcal {B}\): Every node combines a control state b with a formula \(\varphi \): it represents all configurations \((b,\alpha )\) such that \(\alpha \) satisfies \(\varphi \). Figure 2(c) shows the CG for the DDSA obtained from \(\mathcal {N}\) in Example 2, (final nodes are drawn with a double border). In fact, Fig. 2(c) is also the CG for the DDSA of \(\mathcal {N}_{\textsf{reset}}\) (basically, because the transition \(\textsf{reset}\) is not reachable). The crucial property of CGs is that they faithfully and completely represent the configuration space, in the following sense:

Lemma 1

([13, Lem. 2]).\(CG _{\mathcal {B}}\) has a path \(\pi :(b_{ I }, c_{\alpha _{ I }}) \rightarrow ^* (b,\varphi )\) s.t. \(\varphi \) is satisfied by \(\alpha \) iff \(\mathcal {B}\) has a run \(\smash {(b_{ I }, \alpha _{ I }) \rightarrow ^* (b, \alpha )}\) whose abstraction is \(\sigma (\pi )\).

Here, for a path \(\pi \) in the CG, \(\sigma (\pi )\) is the DDSA transition sequence along this path. Thus a path \(\pi \) in the CG captures all runs \(\rho \) with the same sequence of control states and actions such that the last assignment in \(\rho \) satisfies the formula in the last node of \(\pi \). CGs are infinite in general, but for many classes of DDSAs occurring in practice, finite CGs can be computed [10, 13]. These include DDSAs where all constraints are variable-to-variable/constant comparisons over \(\mathbb Q\) like Example 2, and bounded lookback DDSAs whose behaviour, intuitively, depends only on a bounded number of past steps (this holds e.g. for Example 1).

3 Marking and Configuration Equivalence

Two Petri nets are marking equivalent if their sets of reachable markings coincide. While marking equivalence is in general undecidable for Petri nets [14], it is easy to decide for bounded Petri nets, by enumerating all markings. Here we consider marking equivalence, as well as the related problem of marking inclusion, for bounded DPNs. First, we note that if a DPN \(\mathcal {N}\) was transformed into a DDSA \(\mathcal {B}\) as described in [13], then the set of possible markings \(\mathcal M ark (\mathcal {N})\) coincides with the set of reachable states in \(\mathcal {B}\). Two DPNs \(\mathcal {N}_1\) and \(\mathcal {N}_2\) with respective DDSAs \(\mathcal {B}_1\) and \(\mathcal {B}_2\) are thus marking equivalent iff \(\mathcal {B}_1\) and \(\mathcal {B}_2\) have the same sets of reachable states. Since reachability of a single state in a DDSA is already undecidable (cf.  [12, Rem. 2.6]), also marking equivalence of DPNs is undecidable.

However, we show that for bounded DPNs with finite CGs, marking equivalence can be read off the CGs: Suppose two DPNs were transformed into DDSAs \(\mathcal {B}_1=\langle B, b_{ I }, \mathcal {A}, T, B_F, V, \alpha _{ I }, guard \rangle \) and \(\mathcal {B}_2=\langle B, b_{ I }', \mathcal {A}, T', B_F, V, \alpha _{ I }', guard \rangle \). We assume that all components of the DDSAs coincide, except for initial states and transitions, but this does not restrict generality as control states can be unreachable. For a DDSA \(\mathcal {B}\), let \(\textit{MReach}(\mathcal {B}) = \{ b \mid (b,\varphi ) \in S\}\) for S the set of nodes in \(CG _{\mathcal {B}}\), i.e., \(\textit{MReach}(\mathcal {B})\) is the set of control states of \(\mathcal {B}\) that occur in the CG of \(\mathcal {B}\).

Proposition 1

Two DPNs \(\mathcal {N}_1\) and \(\mathcal {N}_2\) that correspond to DDSAs \(\mathcal {B}_1\) and \(\mathcal {B}_2\) with finite CGs satisfy \(\mathcal M ark (\mathcal {N}_1) \subseteq \mathcal M ark (\mathcal {N}_2)\) iff \(\textit{MReach}(\mathcal {B}_1) \subseteq \textit{MReach}(\mathcal {B}_2)\), and are marking equivalent iff \(\textit{MReach}(\mathcal {B}_1) = \textit{MReach}(\mathcal {B}_2)\).

Proof

First, suppose \(\mathcal M ark (\mathcal {N}_1) \subseteq \mathcal M ark (\mathcal {N}_2)\), and let \(M \in \textit{MReach}(\mathcal {B}_1)\). By Lemma 1, there is a run of \(\mathcal {B}_1\) ending in a configuration \((M,\alpha )\). Thus M is a reachable state of \(\mathcal {B}_1\), i.e., a reachable marking of \(\mathcal {N}_1\), and hence also of \(\mathcal {N}_2\), so there is a process run of \(\mathcal {N}_2\) (and thus a run of \(\mathcal {B}_2\)) ending in a configuration \((M,\alpha ')\). By Lemma 1, the CG for \(\mathcal {B}_2\) has a node \((M,\varphi )\), i.e., \(M \in \textit{MReach}(\mathcal {B}_2)\).

Second, if \(\textit{MReach}(\mathcal {B}_1) \subseteq \textit{MReach}(\mathcal {B}_2)\) and \(M\in \mathcal M ark (\mathcal {N}_1)\), some process run of \(\mathcal {N}_1\) and run of \(\mathcal {B}_1\) end in a configuration \((M,\alpha )\). By Lemma 1, the CG for \(\mathcal {B}_1\) has a node \((M,\varphi )\). Since \(\textit{MReach}(\mathcal {B}_1) \subseteq \textit{MReach}(\mathcal {B}_2)\), the CG for \(\mathcal {B}_2\) has a node \((M,\varphi ')\). By Lemma 1, there is a run of \(\mathcal {B}_2\) ending in a configuration \((M,\alpha ')\). This shows the inclusion statement, so the one for equivalence follows.    \(\square \)

Configuration Equivalence. For DPNs, the perhaps more relevant notion than marking equivalence is equivalence of sets of configurations. Let two DPNs be configuration equivalent if their sets of reachable configurations coincide. First, note that for a DPN \(\mathcal {N}\) with associated DDSA \(\mathcal {B}\), the sets of configurations of \(\mathcal {N}\) and \(\mathcal {B}\) coincide. Thus, we can again check the problem on the level of DDSAs. For a DDSA \(\mathcal {B}\) with constraint graph with node set S, and M a state of \(\mathcal {B}\), consider \(\varphi _{ reach }(\mathcal {B},M) = \bigvee \{ \varphi \mid (M,\varphi ) \in S\}\) as a formula representation of the configurations that can occur together with M.

Proposition 2

Let two DPNs \(\mathcal {N}_1\) and \(\mathcal {N}_2\) correspond to DDSAs \(\mathcal {B}_1\) and \(\mathcal {B}_2\).

  1. (1)

    \(\mathcal C onf (\mathcal {N}_1) \subseteq \mathcal C onf (\mathcal {N}_2)\) iff \(\mathcal M ark (\mathcal {N}_1) \subseteq \mathcal M ark (\mathcal {N}_2)\) and \(\varphi _{ reach }(\mathcal {B}_1,M) \models \varphi _{ reach }(\mathcal {B}_2,M)\) for all \(M \in \mathcal M ark (\mathcal {N}_1)\).

  2. (2)

    \(\mathcal {N}_1\) and \(\mathcal {N}_2\) are configuration equivalent iff they are marking equivalent and \(\varphi _{ reach }(\mathcal {B}_1,M) \equiv \varphi _{ reach }(\mathcal {B}_2,M)\) for all \(M \in \mathcal M ark (\mathcal {N}_1)\).

  3. (3)

    If \(M\in \mathcal M ark (\mathcal {N}_1) \cap \mathcal M ark (\mathcal {N}_2)\) and there is some assignment \(\alpha \) that satisfies \(\varphi _{ reach }(\mathcal {B}_1,M) \wedge \lnot \varphi _{ reach }(\mathcal {B}_2,M)\) then \((M,\alpha ) \in \mathcal C onf (\mathcal {N}_1) \setminus \mathcal C onf (\mathcal {N}_2)\).

Proof

(1) First, assume \(\mathcal C onf (\mathcal {N}_1) \subseteq \mathcal C onf (\mathcal {N}_2)\), so \(\mathcal M ark (\mathcal {N}_1) \subseteq \mathcal M ark (\mathcal {N}_2)\). Let \(\alpha \models \varphi _{ reach }(\mathcal {B}_1,M)\), so \(\alpha \models \varphi \) for some \((M,\varphi )\) in the CG of \(\mathcal {B}_1\). By Lemma 1, there is a run of \(\mathcal {B}_1\) ending in a configuration \((M,\alpha )\). So \((M,\alpha )\) is a configuration of \(\mathcal {N}_1\), and hence of \(\mathcal {N}_2\), so \((M,\alpha )\) is also reachable in \(\mathcal {B}_2\). Again by Lemma 1, the CG for \(\mathcal {B}_2\) has a node \((M,\varphi ')\) such that \(\alpha \models \varphi '\), so \(\alpha \models \varphi _{ reach }(\mathcal {B}_2,M)\).

Second, suppose \(\mathcal M ark (\mathcal {N}_1) \subseteq \mathcal M ark (\mathcal {N}_2)\) and \(\varphi _{ reach }(\mathcal {B}_1,M) \models \varphi _{ reach }(\mathcal {B}_2,M)\) for all \(M \in \mathcal M ark (\mathcal {N}_1)\). Let \((M,\alpha ) \in \mathcal C onf (\mathcal {N}_1)\), so reachable in \(\mathcal {B}_1\). By Lemma 1, the CG for \(\mathcal {B}_1\) has a node \((M,\varphi )\) such that \(\alpha \models \varphi \), so \(\alpha \models \varphi _{ reach }(\mathcal {B}_1,M)\), hence \(\alpha \models \varphi _{ reach }(\mathcal {B}_2,M)\). By Lemma 1, some run of \(\mathcal {B}_2\) (hence of \(\mathcal {N}_2\)) ends in \((M,\alpha )\). This shows (1), which implies (2); for (3) the reasoning is similar.    \(\square \)

Note that marking equivalence can also be decided by finitely many reachability queries, but not configuration equivalence if the state space is infinite.

Fig. 3.
figure 3

Two DPNs with their DDSAs and constraint graphs.

Example 3

Consider the DPNs \(\mathcal {N}_1\) and \(\mathcal {N}_2\) in Fig. 3(a) over variables \(V=\{x,y\}\) with sort \(\texttt{rat}\), with \(\alpha _{ I }(x)=\alpha _{ I }(y)=0\) (the guards of actions in \(\mathcal {N}_2\) coincide with those in \(\mathcal {N}_1\)). The respective DDSAs \(\mathcal {B}_1\) and \(\mathcal {B}_2\) and their constraint graphs are shown in Fig. 3(b) and (c). The markings occurring in the CGs coincide, so the two DPNs are marking equivalent. However, they are not configuration equivalent: For instance, the formulas \(\varphi _{ reach }(\mathcal {B}_1,\{\textsf{p}_3\}) = (x{>}1\wedge y{>}x)\) and \(\varphi _{ reach }(\mathcal {B}_2,\{\textsf{p}_3\}) = (x{>}1\wedge y{>}x) \vee (x{>}y{+}1\wedge y{>}0)\) are not equivalent. This is witnessed by any assignment that satisfies \(\varphi _{ reach }(\mathcal {B}_2,M) \wedge \lnot \varphi _{ reach }(\mathcal {B}_1,M) \equiv (x{>}y{+}1\wedge y{>}0)\), e.g., \(\alpha (x)=2\) and \(\alpha (y)=1\), so \((\{\textsf{p}_3\},\alpha )\) is a configuration of \(\mathcal {N}_2\) but not of \(\mathcal {N}_1\). However, \(\varphi _{ reach }(\mathcal {B}_1,M) \models \varphi _{ reach }(\mathcal {B}_2,M)\) for all M, so \(\mathcal C onf (\mathcal {N}_1) \subseteq \mathcal C onf (\mathcal {N}_2)\).

For another example, as the CGs of \(\mathcal {N}\) and \(\mathcal {N}_{\textsf{reset}}\) from Example 2 coincide, the DPNs are marking and configuration equivalent. One can also use Propositions 1 and 2 to check that the DPN in Example 1, and the version in [19, Fig. 12.7] in which guards were discovered automatically, are marking but not configuration equivalent.

4 Language Equivalence

Language equivalence is undecidable for unbounded Petri nets [14], but decidable for bounded nets, in which case it basically amounts to checking equivalence of two regular languages [15]. Here we consider the respective problem for bounded DPNs. Given a DPN \(\mathcal {N}\), let its language \(\mathcal {L}(\mathcal {N})\) be the set of all \(\ell (t_1'), \dots \ell (t_m')\) such that there is a process run   and \(t'_1, \dots , t'_m\) is the maximal subsequence of \(t_1, \dots , t_n\) such that \(\ell (t_i') \ne \tau \). We assume that silent transitions in DPNs do not write variables, and moreover, that there are no cycles that consist of silent transitions only. Thus, after transforming a DPN into a DDSA, we can eliminate silent transitions by replacing them with non-silent transitions, similar as done in NFAs: if with t silent with guard c, and   are all transitions starting from \(b'\), we remove and add transitions   with \( guard (t_i') = guard (t_i) \wedge c\). This replacement can be repeated until there are no silent transitions left. Thus, we can consider the language equivalence problem for DDSAs without silent transitions. Let the language \(\mathcal {L}(\mathcal {B})\) of a DDSA \(\mathcal {B}\) be the set of all words \(a_1, \dots , a_n\) such that \(\mathcal {B}\) has a run and \(b_n \in B_F\). Then, for every bounded DPN \(\mathcal {N}\) there is a DDSA \(\mathcal {B}\) such that \(\mathcal {L}(\mathcal {N})=\mathcal {L}(\mathcal {B})\).

We now consider two DPNs \(\mathcal {N}_1\) and \(\mathcal {N}_2\) that correspond to DDSAs \(\mathcal {B}_1=\langle B_1, b_{ I }, \mathcal {A}, T, B_F, V, \alpha _{ I }, guard \rangle \) and \(\mathcal {B}_2=\langle B_2, b_{ I }', \mathcal {A}, T_2, B_F', V, \alpha _{ I }', guard \rangle \). Note that the data variables, actions, and guards are supposed to coincide, but control states, transitions and the initial assignment may be different.

Language equivalence can be checked by an adaptation of an algorithm to check language equivalence of NFAs [3, Fig. 4], see the procedure \(\textsc {Equiv}\) in Algorithm 1. We use the following shorthands: for a set of nodes X in a constraint graph \(\mathcal {G}\), \( fin _\mathcal {B}(X)\) is true iff X contains a node that is final in \(\mathcal {G}\). Moreover, \( next _\mathcal {G}(X,a)\) is the set of all CG nodes \(s'\) such that \(\mathcal {G}\) has an edge for some \(s\in X\).

figure t

For a set of nodes X in a CG \(\mathcal {G}\), we now write \(\mathcal {L}_\mathcal {G}(X)\) for the set of words accepted starting from a state in X, i.e., the set of words \(a_1, \dots , a_n\) such that \(\mathcal {G}\) has a path from some \(b\in X\) such that \((b_n, \varphi _n)\) is final in \(\mathcal {G}\). Then, procedure \(\textsc {SetEquiv}\) can be used to check whether two sets of states accept the same words. This is formally stated in the next result, which follows directly from [3, Prop. 2], considering \(\mathcal {G}\) and \(\mathcal {G}'\) as NFAs and X and Y as sets of states therein.

Proposition 3

For sets of CG nodes X in \(\mathcal {G}\) and Y in \(\mathcal {G}'\), \(\textsc {SetEquiv}(X,Y,\mathcal {G}, \mathcal {G}')\) is true iff \(\mathcal {L}_{\mathcal {G}}(X)=\mathcal {L}_{\mathcal {G}'}(Y)\).

Proposition 4

For two DPNs \(\mathcal {N}_1\), \(\mathcal {N}_2\) and DDSAs \(\mathcal {B}_1\), \(\mathcal {B}_2\) without silent transitions such that \(\mathcal {L}(\mathcal {N}_1)=\mathcal {L}(\mathcal {B}_1)\) and \(\mathcal {L}(\mathcal {N}_1)=\mathcal {L}(\mathcal {B}_1)\), \(\textsc {Equiv}(\mathcal {B}_1,\mathcal {B}_2)=true \) iff \(\mathcal {N}_1\) and \(\mathcal {N}_2\) are language equivalent.

Proof (sketch)

Let \(\mathcal {G}_1\) and \(\mathcal {G}_2\) be the CGs of \(\mathcal {B}_1\) and \(\mathcal {B}_2\). We have \(\textsc {Equiv}(\mathcal {B}_1,\mathcal {B}_2)=\textsc {SetEquiv}(\{s_0\}, \{s_0'\}, \mathcal {G}_1, \mathcal {G}_2)\), which is true iff \(\mathcal {L}_{\mathcal {G}_1}(\{s_0\})=\mathcal {L}_{\mathcal {G}_2}(\{s_0'\})\) by Proposition 3. From Lemma 1 it follows that \(\mathcal {L}_{\mathcal {G}_1}(\{s_0\})\) coincides with \(\mathcal {L}(\mathcal {B}_1) = \mathcal {L}(\mathcal {N}_1)\), and similar for \(\mathcal {B}_2\), so the claim follows.    \(\square \)

Language inclusion can be reduced to language equivalence since \(\mathcal {L}(X) \subseteq \mathcal {L}(Y)\) iff \(\mathcal {L}(X) \cup \mathcal {L}(Y) \subseteq \mathcal {L}(Y)\) iff \(\mathcal {L}(X \cup Y) \subseteq \mathcal {L}(Y)\), cf. [3], so Algorithm 1 also serves to decide language inclusion.

For instance, since the constraint graphs of \(\mathcal {N}\) and \(\mathcal {N}_{\textsf{reset}}\) from Example 2 coincide, the DPNs are language equivalent. On the other hand, \(\textsc {Equiv}\) can be used to detect that the languages of the DPNs in Fig. 3 are not equal (e.g., \(\textsf{set}_y, \textsf{set}_x\) is not accepted by the first CG). The next example shows that it does not suffice to execute \(\textsc {Equiv}\) on the DDSAs instead of the constraint graphs.

Example 4

Consider the following two DDSAs:

figure v

If both \(\alpha _{ I }(x)=\alpha _{ I }'(x)=0\), their CGs coincide, so \(\textsc {Equiv}\) concludes language equivalence (the language being \(\textsf{inc}^+\textsf{check} + \textsf{zero}\), in regular expression notation).

figure w

However, note that the procedure \(\textsc {Equiv}\) could not be run on the DDSAs directly, since detection of dead transition requires a reasoning based on reachable configurations, as done in CGs. Moreover, note that with e.g. \(\alpha _{ I }(x)=\alpha _{ I }'(x)=10\), the DDSAs would not be language equivalent because the single-letter word \(\textsf{check}\) would be in the language of the second, but not of the first DDSA.

Algorithm 1 can be modified to return a witness if equivalence does not hold. For reasons of space, we cannot formalize this here, but the main idea is straightforward: every pair in \(todo \) can be associated with a word that led to this pair of node sets, starting with the empty word \(\epsilon \) for the initial nodes. When returning false in line 9, \(\textsc {SetEquiv}\) can then also return the word accumulated up to this point, which witnesses the difference.

It can also be noted that if \(\mathcal {N}_1\) and \(\mathcal {N}_2\) are language equivalent and \(\alpha _{ I }=\alpha _{ I }'\), then the DPNs are also configuration equivalent, because actions have the same guards in both DPNs. However, the converse does not hold: if \(\mathcal {N}_1\) is a copy of \(\mathcal {N}_2\) where actions are renamed but have the same guards, then the two DPNs are configuration equivalent but not language equivalent.

Conclusion. We proposed techniques to check marking, configuration, and language equivalence for bounded DPNs with finite constraint graphs. Our correctness results thus imply that these notions are decidable for bounded DPNs with finite CGs, which captures many DPNs from practice [13]. To the best of our knowledge, these are the first results to compare DPNs based on behaviour. In future work, it would be interesting to study other notions of equivalence, e.g. language equivalence taking data into account. Also the study of quantitative similarity measures would be of interest, to e.g. express how large the intersection/difference of configuration spaces/language is. Our approach could also be implemented on top of the tool ada [11, 13], which already computes CGs.