1 Introduction

This work is about the role of data in conventional business processes [14, 32], and is concerned with verifying the correctness of processes when data and their manipulation over time are fully taken into account. With “conventional” processes, we mean processes whose control-flow is captured by typical process modelling languages such as BPMN, UML activity diagrams, YAWL or EPCs. These are centred around the key notion of case, intended as a (concrete or abstract) business object that is manipulated and evolved by (an instance of) the process so as to achieve the company’s strategic goals, and in turn produce value to one or more customers/consumers [14, 32]. Example of cases are an order within an order-to-cash process, an item within a production chain, or a complaint within an issue-to-resolution process. We refer to this class of processes as case-centric processes.

In recent years, a plethora of foundational results and corresponding techniques and tools has been developed to support the modeling, analysis, execution and improvement of case-centric processes along their entire lifecycle. A major shortcoming of such analysis techniques is that they solely focus on the control-flow dimension of the process, omitting how business objects and their data affect and are manipulated by process instances and their tasks. In particular, the static, formal correctness analysis for the control-flow of case-centric processes, has a long tradition, grounded on workflow nets and on the property of soundness [16, 29, 30]. In this context, case centricity is a fundamental requirement, since soundness is typically checked by relying on the conceptual assumption that multiple process executions are separately driven by cases, and that the corresponding business objects are evolved in isolation.

The importance of considering data alongside the process control-flow has been extensively argued in a growing number of works (see, e.g., [13, 19, 20, 31]). In this enriched setting, many interesting decidability results on the static analysis of data-aware processes have been produced over the last 15 years, see [9] for a survey. Within this extensive literature, robust decidability results on the verification of data-aware dynamic systems against rich first-order temporal properties have been shown lately, under the assumption that the system is state-bounded.Footnote 1 Such results have been studied by considering different modeling variants of data-aware business processes, in particular (i) artifact systems [6], (ii) data-centric dynamic systems (DCDSs) [2], and (iii) description logic-based dynamic systems [10].

Our aim is to merge these two lines of research, and in particular to study the correctness of case-centric processes by jointly taking into account the process control-flow and the manipulation of (case-)data stored in a full-fledged relational database equipped with first-order constraints/dependencies. A major drawback of the approaches based on state-boundedness is that it is a semantic property undecidable to check, but for which only sufficient, syntactic conditions can be defined [2, 3]. To mitigate this issue, modelling guidelines can be introduced in such a way that the system is guaranteed to be state-bounded by design [28]. When applied to case-centric processes, state-boundedness implicitly limits the maximum degree of case parallelism, that is, the number of cases that can simultaneously coexist. To the best of our knowledge, [11] is the only work so far to propose modeling and methodological guidelines to achieve decidability of verification in a setting where no limitation is put on such number.

We adopt this methodological approach here. Using DCDSs as a concrete framework for our investigation, we show that, unsurprisingly, checking data-aware soundness is in general undecidable. In parallel, we provide a fine-grained analysis on the role of data in case-centric business processes. We then substantiate this analysis by applying the methodological guidelines of [11], and introduce a class of case-centric DCDSs that are based on good modeling principles, and for which model checking of a first-order variant of \(\mu \)-calculus (that can express soundness) is decidable. This class is characterized by simultaneously operating on:

  • the database constraints and their mutual relationships, and on

  • the shape of queries that can be formulated in the process component of the DCDS so as to retrieve and modify the data maintained in the DCDS data component.

Notably, we pose minimal assumptions on the process control-flow, arguing that any control-flow pattern can be encoded into a DCDS by introducing a fixed set of dedicated database values to support the runtime instantiation of the pattern. This class of DCDSs does not enjoy state-boundedness, but is such that the unboundedly many data present in a state of the system are organized into isolated chunks, each referring to a single case identifier. In this sense, our key decidability result is incomparable to those in [2, 6, 10]. In addition, we show that the different features of this class are all necessary, in the sense that weakening one of them again leads to undecidability. In particular, our results imply that unboundedly many cases may coexist simultaneously only if they evolve in complete isolation, i.e., the evolution of one case is not affected by the data attached to another case.

Our investigation does not only lift soundness to the data-aware setting, but also indirectly recasts key notions in the theory of business processes, such as that of freedom of choice and case isolation, taking into account the interaction between processes and data.

The modeling restrictions we impose towards decidability, as well as the abstraction techniques we borrow from [2], present some similarities with the different notions of boundedness studied for the formal verification of (infinite-state) distributed systems [1, 18, 22]. In particular, some of the modeling restrictions we impose in this work resemble the notions of simple path- and depth-boundedness respectively studied in [1] and [22]. Instead, [2] and [18] independently obtained decidability results in their respective settings, by exploiting similar ideas related to recycling of data values/process names. On the other hand, there are key differences both in the modeling frameworks and in the verification results shown in these works. As for modeling, [1, 18, 22] focus on process algebras/communicating register automata where the exchanged data are atomic data values representing the pure names of the involved processes. Contrariwise, in [2] and the present work, the focus is on the dynamic manipulation of rich relational structures, without taking into account inter-process communication. As for the verification results, [1, 18, 22] study specific, standard properties such as reachability, coverability and termination. Instead, [2] and the present work show decidability for sophisticated temporal properties expressed in a first-order variant of the \(\mu \)-calculus. It remains to be investigated how such two lines of research can be interconnected: on the one hand by studying whether the communication graphs of [1, 18, 22] can be suitably encoded using the relational structures with constraints of [2], and on the other hand by investigating whether arbitrary n-ary relations can be encoded leveraging pure names and the graph structure of inter-process communication.

2 Data-centric dynamic systems

In this section, we provide an overview of Data-Centric Dynamic Systems (DCDSs). More specifically, we introduce a variant of the original DCDS framework (first introduced in [2]). In this variant, actions are described following the action formalism of [24], which provides STRIPS-like abstractions on top of the original DCDS action formalism. This variant is expressively equivalent to the original one [24].

2.1 The DCDS framework

A DCDS \(\mathcal {S} \) is a pair \(\langle \mathcal {D},\mathcal {P} \rangle \), where \(\mathcal {D} \) is the data component of \(\mathcal {S} \), and \(\mathcal {P} \) is its process component.

Data component The data component is a full-fledged relational database with constraints. Technically, \(\mathcal {D} =\langle \varDelta , \mathcal {R},\mathcal {C}, \mathcal {I}_0 \rangle \), where:

  • \(\varDelta \) is a countably infinite set of constants.

  • \(\mathcal {R} \) is a database schema, i.e., a set of relation schemas. We will equivalently adopt the positional notation or the attribute-based named notation for relations; When the latter notation is used, an n-ary relation R is represented as R(U), where U is a set of n named attributes. Furthermore, given a set \(A \subseteq U\) of attributes, \({ R } [A]\) represents the projection of R over A.

  • \(\mathcal {C}\) is a set of domain-independent FO-constraints over \(\mathcal {R} \), capturing the real-world constraints of the targeted application domain; technically, constraints are closed FO formulae over \(\mathcal {C}\) and the constants in \(\mathcal {I}_0 \).

  • \(\mathcal {I}_0 \) is the initial database instance of \(\mathcal {S} \), i.e., a database instance conforming to \(\mathcal {R} \), satisfying the constraints \(\mathcal {C}\), and made of values in \(\varDelta \).

Among all possible FO-constraints, we consider the following specific constraints, which are widespread in database modelling and standard conceptual modelling languages such as UML class diagrams, E–R diagrams, and the ORM notation:

  • Standard key and primary key constraints. We use notation \(\textsc {KEY}({ R } [A])\) (resp., \(\textsc {PK}({ R } [A])\)) to model that the set A of attributes is a key (resp., primary key) for relation R.

  • Standard foreign key constraints. We use notation \({ R } [A] \longrightarrow { S } [B]\) to model that the set A of attributes in R is a foreign key pointing to the set B of attributes in S, such that \(\textsc {PK}({ S } [B])\) holds.

  • Cardinality constraints, which resemble cardinality/frequency constraints of conceptual modelling languages. Cardinality constraints generalize key constraints by bounding the minimum and maximum number of tuples allowed in a relation when the value of some attributes is maintained unaltered. Given a relation R(U) and a set \(A\subseteq U\) of attributes, notation \(\textsc {CARD}({ R } [A],m..n)\), where m and n are positive integers, denotes the cardinality constraint requiring that the number of R-tuples with the same values for attributes A ranges between m and n.

  • A combination of cardinality and foreign key constraints, where a foreign key has an associated cardinality constraint guaranteeing that the number of tuples pointing to the same target primary key is bounded. We denote by \({ A } [R]\xrightarrow {m..n~~~}{ B } [S]\) the database constraint corresponding to the conjunction of \({ A } [R] \longrightarrow { B } [S]\) and \(\textsc {CARD}({ A } [R],m..n)\), and call such a conjunction a cardinality-bounded foreign key constraint. Notice that \({ A } [R]\xrightarrow {1..1~~~}{ B } [S]\) is equivalent to the combination of \(\textsc {KEY}({ R } [A])\) and \({ A } [R] \longrightarrow { B } [S]\).

All these constraint types can be easily encoded as FO formulae. Some examples will be shown in Sect. 2.3.

(Open) FO formulae can be used to query an instance of the data component. In particular, let \(Q(\mathbf{x })\) be query, i.e., a FO formula with free variables \(\mathbf{x }\), and let \(\mathcal {I}\) be a a database instance over \(\mathcal {R} \) and \(\varDelta \). The answer \({ ans }_{}(Q,\mathcal {I})\) to Q over \(\mathcal {I}\) is the set of assignments \(\theta \) from \(\mathbf{x }\) to the active domain of \(\mathcal {I}\), such that \(\mathcal {I}\models Q\theta \), where \(Q\theta \) is the boolean query obtained from Q by substituting each free variable \(x_i \in \mathbf{x }\) with \(\theta (\mathbf{x })\). Recall that the active domain of a database instance is the set of values explicitly appearing in its tuples. Given a query Q with no free variables (also called a boolean query), we say that Q is true in \(\mathcal {I}\), or equivalently that \(\mathcal {I}\) satisfies Q, if \(\langle \rangle \in { ans }_{}(Q,\mathcal {I})\). In this light, a database instance \(\mathcal {I}\) satisfies \(\mathcal {C}\) if it satisfies every boolean query in \(\mathcal {C}\).

Process component The process component \(\mathcal {P} \) defines the progression mechanism for the DCDS. It is constituted by a process, which queries the current data maintained by \(\mathcal {D} \) and determines which actions are executable, and with which parameters; parameterised actions, in turn, query and update \(\mathcal {D} \), possibly introducing new values from the external environment, by issuing service calls. Technically, \(\mathcal {P} = \langle \mathcal {F},\mathcal {A},\varrho \rangle \), where

  • \(\mathcal {F} \) is a finite set of functions, each representing the interface to a (nondeterministic) external service;

  • \(\mathcal {A} \) is a finite set of actions, whose execution updates the data component, and may involve external service calls;

  • \(\varrho \) is a finite set of condition-action rules that form the specification of the overall process, which tells at any moment which actions can be executed.

Actions. An action of \(\mathcal {A} \) is an expression \({\textsc {act}} ({\textsf {p}} _{{\textsf {1}}},\ldots ,{\textsf {p}} _{{\textsf {n}}}): \{e_1,\ldots ,e_m\}\), where:

  • \({\textsc {act}} ({\textsf {p}} _{{\textsf {1}}},\ldots ,{\textsf {p}} _{{\textsf {n}}})\) is the action signature, constituted by a name \({\textsc {act}} \) and a sequence \({\textsf {p}} _{{\textsf {1}}},\ldots ,{\textsf {p}} _{{\textsf {n}}}\) of parameters, to be substituted with values when the action is invoked;

  • \(\{e_1,\ldots ,e_m\}\), also denoted as \(\textsc {effect}({\textsc {act}})\), is a set of effects, which are assumed to take place simultaneously.

Each effect \(e_i\) has the form

$$\begin{aligned} Q(\mathbf {\textsf {p}} ,\mathbf{x }) \rightsquigarrow \mathbf{add }~A~\mathbf{del }~D \end{aligned}$$

where:

  • Q is a domain-independent FO query over \(\mathcal {R} \) whose terms are variables, action parameters, and constants from \(\mathcal {I}_0 \). Intuitively, Q selects the tuples to instantiate the effect with. During the execution, the effect is applied with a ground substitution \({{\mathbf {\mathtt{{d}}}}}\) for the action parameters, and for every answer \(\theta \) to the query \(Q({{\mathbf {\mathtt{{d}}}}},\mathbf{x })\).

  • A is a set of facts over \(\mathcal {R} \), which include as terms: free variables \(\mathbf{x }\) of Q, action parameters \(\mathbf {\textsf {p}} \) and/or Skolem terms \(f(\mathbf{x }',\mathbf {\textsf {p}} ')\) (with \(\mathbf{x }'\subseteq \mathbf{x }\), and \(\mathbf {\textsf {p}} '\subseteq \mathbf {\textsf {p}} \)). We use \(\textsc {skolem}(A)\) to denote all Skolem terms mentioned in A. At runtime, whenever a ground Skolem term is produced by applying substitution \(\theta \) to A, the corresponding service call is issued, replacing it with the result (from \(\varDelta \)) returned by the invoked service. The ground set of facts so obtained is added by the DCDS to its current database instance.

  • D is also a set of facts over \(\mathcal {R} \), which include as terms free variables \(\mathbf{x }\) of Q and action parameters \(\mathbf {\textsf {p}} \). At runtime, the ground facts obtained by applying substitution \(\theta \) to D are removed from the current database instance.

As in STRIPS, we assume that additions have higher priority than deletions (i.e., if the same fact is asserted to be added and deleted during the same execution step, then the fact is added). The “\(\mathbf{add }~A\)” part (resp., the “\(\mathbf{del }~D\)” part) can be omitted if \(A = \emptyset \) (resp., \(D = \emptyset \)).

Process. The process \(\varrho \) is a finite set of condition-action rules, each of the form \(Q(\mathbf{x }) \mapsto {\textsc {act}} (\mathbf{x })\), where \({\textsc {act}} \) is an action in \(\mathcal {A} \) and Q is again a FO query over \(\mathcal {R} \) whose free variables are exactly the parameters of \({\textsc {act}} \), and whose other terms can be quantified variables or constants mentioned in \(\mathcal {I}_0 \).

Finally, notice that effects and condition-action rules can be rearranged in a modular way, by observing that:

  • A single effect of the form

    $$\begin{aligned} Q(\mathbf {\textsf {p}} ,\mathbf{x }) \rightsquigarrow \mathbf{add }~A~\mathbf{del }~D \end{aligned}$$

    can be equivalently re-expressed as a set of effects

    $$\begin{aligned} Q(\mathbf {\textsf {p}} ,\mathbf{x })&\rightsquigarrow&\mathbf{add }~A_1~\mathbf{del }~D_1\\&\cdots&\,\\ Q(\mathbf {\textsf {p}} ,\mathbf{x })&\rightsquigarrow&\mathbf{add }~A_n~\mathbf{del }~D_n\\ \end{aligned}$$

    where some \(A_i\) or \(D_i\) could possibly be \(\emptyset \), and we have that \(A = \bigcup _{i \in \{1,\ldots ,n\}} A_i\) and \(D = \bigcup _{i \in \{1,\ldots ,n\}} D_i\).

  • Unions in condition-action rules can be implicitly obtained by composing multiple rules, since a single rule of the form

    $$\begin{aligned} \textstyle \bigvee _{i \in \{1,\ldots ,n\}} Q_i(\mathbf{x }) \mapsto {\textsc {act}} (\mathbf{x }) \end{aligned}$$

    can be equivalently re-expressed as a set of rules

    $$\begin{aligned} Q_1(\mathbf{x }) \mapsto {\textsc {act}} (\mathbf{x }) \qquad \cdots \qquad Q_n(\mathbf{x }) \mapsto {\textsc {act}} (\mathbf{x }) \end{aligned}$$

This equivalent rearrangement will be useful for the class of DCDSs introduced in Sect. 4, for which add and delete facts are required to obey to some restrictions.

2.2 Execution semantics

The execution semantics of a DCDS \(\mathcal {S} \) is a possibly infinite transition system \(\varUpsilon _{\mathcal {S}} \) whose states are labeled by database instances. This transition system represents all possible computations that the process component can do on the data component. Specifically, \(\varUpsilon _{\mathcal {S}} = \langle \varDelta ,\mathcal {R},\varSigma ,s_0,{ db },{\Rightarrow }\rangle \), where: (i) \(\varSigma \) is a set of states; (ii) \(s_0 \in \varSigma \) is the initial state; (iii) \({ db }\) is a function that, given a state \(s \in \varSigma \), returns the database instance of s, which is made of values in \(\varDelta \) and conforms to \(\mathcal {R}\) and \(\mathcal {C}\); (iv) \({\Rightarrow } \subseteq \varSigma \times \varSigma \) is a transition relation over states.

Fig. 1
figure 1

The archery training data component

Given a DCDS \(\mathcal {S} =\langle \mathcal {D},\mathcal {P} \rangle \) with \(\mathcal {D} = \langle \varDelta , \mathcal {R},\mathcal {C}, \mathcal {I}_0 \rangle \) and \(\mathcal {P} = \langle \mathcal {F},\mathcal {A},\varrho \rangle \), the transition system \(\varUpsilon _{\mathcal {S}} \) is intuitively constructed as follows. Starting from \(\mathcal {I}_0 \), all condition-action rules in \(\varrho \) are evaluated, determining which actions are executable, and with which ground parameter assignments. Non-deterministically, one such action \({\textsc {act}} \) with parameter assignment \(\rho \) is selected and executed over \(\mathcal {I}_0 \). To do so, every effect e of \({\textsc {act}} \) (partially grounded with the parameter assignment \(\rho \)) is evaluated, by calculating all the answers of its left-hand side, and grounding the right-hand side accordingly. If the right-hand side of e contains service calls, they are issued, receiving back for each of them a value nondeterministically chosen from \(\varDelta \). This value is then used to substitute the service call with the actual result; notice that, within an execution step, multiple occurrences of the same service call are substituted with the same value. The overall set of ground facts obtained by evaluating all effects of \({\textsc {act}} \rho \) in this way finally constitutes the next database instance, provided that no constraint in \(\mathcal {C}\) is violated. In case of violation, the database instance is not changed, and \({\textsc {act}} \rho \) is considered to be inapplicable for the given configuration of service call results. Notice that upon the execution of an action, the content of a relation is lost unless it is explicitly maintained through dedicated effects of the action. The transition system construction then proceeds by constructing all possible successors, each of which is obtained by selecting one of the executable actions with parameters, and one result for each of the involved service calls. The construction then recursively proceeds over such newly generated states.

The formal definition of the execution semantics for DCDSs can be found in [2]. We recall it here, considering the case where services behave nondeterministically (i.e., calling the same service with the same inputs in two different states could result in different values).

Formally, let \(\mathcal {I}\) be a database instance over \(\mathcal {R} \) and \(\varDelta \), such that \(\mathcal {I}\) satisfies all constraints in \(\mathcal {C}\). Consider an action \({\textsc {act}} (\mathbf {\textsf {p}} ): \{e_1,\ldots ,e_m\}\) in \(\mathcal {A} \), where \(e_i\) is of the form \(Q_i(\mathbf {\textsf {p}} ,\mathbf{x }) \rightsquigarrow \mathbf{add }~A_i~\mathbf{del }~D_i\), for \(i \in \{1,\ldots ,m\}\). A parameter substitution \(\rho \) for \({\textsc {act}} \) is an assignment that maps the parameters \(\mathbf {\textsf {p}} \) of \({\textsc {act}} \) to corresponding values in \(\varDelta \). We say that \(\rho \) is legal for \({\textsc {act}} \) in \(\mathcal {I}\) if there exists a condition-action rule \(Q(\mathbf{x }) \mapsto {\textsc {act}} (\mathbf{x })\) in \(\varrho \) such that \(\rho \in { ans }_{}(Q,\mathcal {I})\).

When \({\textsc {act}} \rho \) is applied on \(\mathcal {I}\), it produces a set of facts \(\textsc {do}(\mathcal {I},{\textsc {act}} \rho )\), in accordance to the execution semantics of its effects. Technically, we have

$$\begin{aligned}&\textsc {do}(\mathcal {I},{\textsc {act}} \rho ) = (\mathcal {I}\setminus F^-) \cup F^+,\,\text {where}\\&F^-={\bigcup }_{j \in \{1,\ldots ,m\}} {\bigcup }_{\theta _j \in { ans }_{}(Q_j\rho ,\mathcal {I})} {\bigcup }_{F_k \in D_i} F_k\theta _j\\&F^+={\bigcup }_{j \in \{1,\ldots ,m\}} {\bigcup }_{\theta _j \in { ans }_{}(Q_j\rho ,\mathcal {I})} {\bigcup }_{F_k \in A_i} F_k\theta _j \end{aligned}$$

This set of facts provides the footprint of the new database instance, which is completely determined only once the ground service calls contained in \(\textsc {do}(\mathcal {I},{\textsc {act}} \rho )\) are substituted by corresponding results.

We are now ready to formally define the construction of \(\varUpsilon _{\mathcal {S}} =\langle \varDelta ,\mathcal {R},\varSigma ,s_0,{ db },{\Rightarrow }\rangle \):

  • \(s_0 = \mathcal {I}_0 \);

  • \({ db }\) is the identity function;

  • \(\varSigma \) and \({\Rightarrow }\) are defined by simultaneous induction as the smallest sets satisfying the following properties:

    • \(\mathcal {I}_0 \in \varSigma \)

    • for \(\mathcal {I}\in \varSigma \), then for every action \({\textsc {act}} (\mathbf {\textsf {p}} )\) in \(\mathcal {A} \), every legal parameter substitution \(\theta \) for \({\textsc {act}} \) in \(\mathcal {I}\), and every substitution sigma mapping each ground service call contained in \(\textsc {do}(\mathcal {I},{\textsc {act}} \rho )\) to a corresponding value in \(\varDelta \), if \(\mathcal {I}' = \textsc {do}(\mathcal {I},{\textsc {act}} \rho )\sigma \) satisfies all constraints in \(\mathcal {C}\), then \(\mathcal {I}' \in \varSigma \) and \(\mathcal {I}\Rightarrow \mathcal {I}'\).

2.3 Robin Hood and the Archery training process

We now introduce a simple DCDS that summarizes all the key ingredients that will be discussed in the remainder of the paper.

Robin Hood is a renown archer, and needs an information system to keep track of the archery courses he delivers to his apprentices among the merry men. To this end, he creates a DCDS \(\mathcal {S} _{rh}\) that supports him in maintaining the information of interest, and manipulate it over time.

The schema and constraints of \(\mathcal {S} _{rh}\) are listed in Fig. 1. As for the schema:

  • \({ Group } \) (id) states that there is an archery group identified by id.

  • \({ Meets } \) (weekSlot,group,where) indicates that the weekly time slot weekSlot is dedicated to the training of group in the location specified by the code where.

  • \({ MerryM } \) (id,name,birthdate,combatLevel,group) states that the person identified by id is a merry man named name and born on birthdate, who has currently an archery ability corresponding to combatLevel, and is enrolled in group. We reserve a special constant \({\mathtt {null}} \) to model the case where a person is not enrolled in any group.

  • \({ Trusts } \) (subj,obj) models that merry man subj trusts merry man obj.

  • \({ State } \) is a relation that glues the data component with the process component, in particular to keep track of the current state of each group—an information that is used to “locate” the group inside the process. Specifically, \({ State } (group,s)\) indicates that group is currently in state s, which may be either \({\mathtt {in}} \) (the group is being assembled), \({\mathtt {running}} \) (the group is under training), or \({\mathtt {out}} \) (the group has completed the training).

The schema of Fig. 1 is equipped with the following constraints:

  • In each state, the \({ combatLevel } \) of a merry man is one of three pre-defined levels:

    $$\begin{aligned}&\forall id,n,b,c,g.{ MerryM } (id,n,b,c,g)\\&\quad \rightarrow (c ={\mathtt {basic}} \vee c = {\mathtt {ok}} \vee c = {\mathtt {pro}}) \end{aligned}$$

    Similarly, for group states we have:

    $$\begin{aligned}&\forall id,s. { State } (id,s)\\&\quad \rightarrow (s = {\mathtt {in}} \vee s = {\mathtt {running}} \vee s = {\mathtt {out}}) \end{aligned}$$
  • The first columns of \({ MerryM } \), \({ State } \), and \({ Meets } \) are the (primary) keys of the corresponding relationsFootnote 2.:

    $$\begin{aligned} \begin{array}{l} \forall id,n_1,b_1,c_1,g_1,n_2,b_2,c_2,g_2.\\ { MerryM } (id,n_1,b_1,c_1,g_1)\wedge { MerryM } (id,n_2,b_2,c_2,g_2) \\ \qquad \rightarrow n_1=n_2 \wedge b_1=b_2 \wedge c_1=c_2 \wedge g_1=g_2\\ \forall id, s_1,s_2. { State } (id,s_1)\wedge { State } (id,s_2) \rightarrow s_1 = s_2\\ \forall s,g_1,w_1,g_2,w_2. { Meets } (s,g_1,w_1) \wedge { Meets } (s,g_2,w_2)\\ \qquad \rightarrow g_1 = g_2\wedge w_1=w_2 \end{array} \end{aligned}$$
  • The two attributes of \({ Trusts } \) reference both a merry man. There are therefore two foreign key constraints, formalized as:

    $$\begin{aligned} \begin{array}{l} \forall s,o. { Trusts } (s,o) \rightarrow \exists n,b,c,g. { MerryM } (s,n,b,c,g)\\ \forall s,o. { Trusts } (s,o) \rightarrow \exists n,b,c,g. { MerryM } (o,n,b,c,g) \end{array} \end{aligned}$$

    Similarly for the foreign key starting from the \({ State } \) relation.

  • The foreign key starting from the \({ MerryM } \) relation does not start from an attribute that is part of the primary key for the source relation. Hence, differently from the previous case, the FO formalization needs to consider also the fact that the attribute is nullable:

    $$\begin{aligned}&\forall id,n,b,c,g. { MerryM } (id,n,b,c,g)\\&\quad \rightarrow g = {\mathtt {null}} \vee {} { Group } (g) \end{aligned}$$
  • \({ Meets } \) has a cardinality-bounded foreign key pointing to \({ Group } \), of the form \({ Meets } [group]\xrightarrow {1..2~~~}{ Group } [id]\). This can be formalized in FOL as:

    $$\begin{aligned} \begin{array}{l} \forall s,g,w.{ Meets } (s,g,w) \rightarrow g = {\mathtt {null}} \vee { Group } (g)\\ \forall g,s_1,w_1,s_2,w_2,s_3,w_3.\\ ~{ Meets } (s_1,g,w_1) \wedge { Meets } (s_2,g,w_2) \wedge { Meets } (s_3,g,w_3)\\ \qquad \rightarrow s_1=s_2 \vee s_1=s_3\vee s_2=s_3 \end{array} \end{aligned}$$

    where \({\mathtt {null}} \) is again used to model that the foreign key may be null.

Finally, the data component of \(\mathcal {S} _{rh}\) populates the \({ MerryM } \) relation with all the merry men that live together with Robin Hood in the Sherwood forest, together with their personal information and trust relations. We also assume that, at the beginning, no group exists, and consequently all merry men have \({\mathtt {null}} \) in the corresponding attribute.

Fig. 2
figure 2

The archery training process control-flow. Tokens denote process cases, which in this example correspond to groups of people

Figure 2 shows a Petri net that sketches the archery training process. Intuitively, places represent group states (and in fact they correspond to the possible values that the second column of relation \({ State } \) can take). Transitions correspond to DCDS actions manipulating groups and their related informations, whose executability depends on the group state.

Specifically, the special \({\textsc {cr}}{\hbox {-}}{\textsc {group}}\) action is always executable, and has the effect of creating a new group in the information system, putting it into the \({\mathtt {in}} \) state. The group identifier is injected into the system by calling the \(\mathbf{newId } \) service.

Robin Hood can add an apprentice to a newly created group, provided that the apprentice is not already enrolled in a group. The effect of the action is to update the group attribute of the selected merry man; this is modeled by removing the current tuple of that merry man, and reinserting it with the updated group attribute.

At the same time, a newly created group can be updated by providing a weekly slot in which Robin inputs when and where a certain group meets.

To model the two user inputs, service calls \(\mathbf{inWhen } \) and \(\mathbf{inWhere } \) are used, both taking as parameter the identifier of the group for which the slot is being created. We can imagine that such service calls are actually realised as a user form that asks Robin Hood to provide the time and location of the group given as input.

Interestingly, although no explicit indication is given in the process control-flow of Fig. 2, the combination between such control-flow and the data constraints tells us that at it will be possible to add at most two weekly slots for the same group. This example attests how much involved process analysis becomes once the combination between the data and process component is fully tackled.

A group in the \({\mathtt {in}} \) state can be turned into a \({\mathtt {running}} \) group by executing the \({\textsc {start}}\hbox {-}{\textsc {training}}\) action. This has the effect of making the group not eligible anymore for adding new apprentices.

$$\begin{aligned}&{ Group } (g) \wedge { State } (g,{\mathtt {in}}) \mapsto {\textsc {start}}{\hbox {-}}{\textsc {train}}(g)\\&\begin{array}{l} {\textsc {start}}{\hbox {-}}{\textsc {train}}({\textsf {g}}) : \\ \left\{ \begin{array}{l} { State } ({\textsf {g}},s) \!\rightsquigarrow \! \mathbf{del }\{{ State } ({\textsf {g}},s)\}~ \mathbf{add }\{{ State } ({\textsf {g}},{\mathtt {running}})\} \end{array}\right\} \end{array} \end{aligned}$$

The end of the training for a running group is marked by executing the \({\textsc {end}}\hbox {-}{\textsc {training}}\) action. Besides the state update for the group, this has a twofold effect:

  • the combat level of each group member is updated according to the quality of his performance;

  • the group members are dissociated from the group, becoming again free to be enrolled in another group.

Notice that the combat level assessment is input by Robin Hood for each of the involved merry men. To model such a user input, service call \(\mathbf{assess } \) is used, which takes as parameter the identifier of the merry man to be assessed. We can again imagine this service call to be realised as a user form for Robin. Differently from the weekly slot case, though, the service call result is implicitly subject to the database constraint that enumerates the acceptable values for the combatLevel attribute of \({ MerryM } \). This implies that the provided input needs to correspond to one of the three pre-defined levels.

2.4 Fresh value injection

A technical, but important, aspect related to DCDSs is that issuing a service call does not guarantee that the obtained result is a fresh value, that is, a value that is not present in the current active domain. In the archery training DCDS of Sect. 2.3, this is perfectly fine with the \(\mathbf{assess } \) service call, which in fact is forced to return one of the three pre-defined combat levels, but is not satisfactory with the \(\mathbf{newId } \) service call. In fact, when creating a new group, the implicit requirement is that the identifier assigned to that group is not already assigned to another group. In this respect, the formalization of the \({\textsc {cr}}{\hbox {-}}{\textsc {group}}\) action is not correct, as it could result in a no-op if \(\mathbf{newId } \) returns an identifier that is already assigned to another idle group in the \({\mathtt {in}} \) state.

In this section, we show that DCDSs can easily model the injection of a new value that is guaranteed to be fresh w.r.t. the values present in a given column of the current database instance (this can be easily generalized to multiple columns, or even the entire active domain). This is particularly useful in all those cases where a new primary key has to be generated for a certain relation, such as that of \({\textsc {create}}\hbox {-}{\textsc {group}}\).

Let \(\mathbf{f } \) be a 0-ary service call, and let R be an n-ary relation. We want to ensure that whenever \(\mathbf{f } \) is called, the obtained result is fresh w.r.t. the i-th column of R, i.e., different from all values appearing in the i-th position of R-tuples in the current database instance. This can be guaranteed by modifying the original DCDS specification as follows:

  • The database schema of the original DCDS is augmented with two additional relations: a unary relation \({ Temp_f } \) used to store a copy of the value returned by \(\mathbf{f } \), and an n-ary relation \(R_{prev}\), whose extension corresponds to the extension of R in the previous state.

  • Each action of the original DCDS is augmented with two additional effects, used to populate the next instance of \(R_{prev}\) with the current extension of R. This is done by emptying the content of \(R_{prev}\), and by filling it with all tuples currently stored in R:

  • Every action that employs \(\mathbf{f } \) in (the head of) its effects is augmented with an additional effect, which enforces that a copy of the result returned by \(\mathbf{f } \) is stored into relation \({ Temp_f } \):

    $$\begin{aligned} {\textsf {true}}\rightsquigarrow \mathbf{add }~\{Temp_f(\mathbf{f } ())\} \end{aligned}$$
  • The data component of the original DCDS is augmented with a constraint that enforces the freshness of the results returned by \(\mathbf{f } \) w.r.t. the i-th component of (the previous extension of) \({ R } \):

    $$\begin{aligned}&\forall x_1,\ldots ,x_n. R_{prev}(x_1,\ldots ,x_{i-1},x_i,\ldots ,x_n)\\&\quad \rightarrow \lnot { Temp_f } (x_i) \end{aligned}$$

    The constraint indeed checks that the i-th component of each \(R_{prev}\) tuple does not correspond to the result returned by \(\mathbf{f } \), which is stored into the \({ Temp_f } \) relation.

Thanks to this (linear) transformation, we can introduce the surface syntax \(\mathbf{f } _{{ R } [id]}\) to indicate that \(\mathbf{f } \) is fresh w.r.t. the i-th column of R, remembering that a DCDS employing such a syntactic sugar can always be transformed into a standard DCDS.

In this respect, the \({\textsc {cr}}{\hbox {-}}{\textsc {group}}\) action of the archery training DCDS in Sect. 2.3 can be correctly rephrased as follows:

$$\begin{aligned} \begin{array}{l} {\textsc {cr}}{\hbox {-}}{\textsc {group}}() : \\ \quad \left\{ {\textsf {true}}\rightsquigarrow \mathbf{add }~ \left\{ \begin{array}{l} { Group } (\mathbf{newId } _{{ { Group } } [id]}()),\\ { State } (\mathbf{newId } _{{ { Group } } [id]}(),{\mathtt {in}}) \end{array}\right\} \right\} \end{array} \end{aligned}$$

3 Data-aware, case-centric processes

In this section, we introduce some minimal modeling guidelines on the shape that a DCDS must have to be considered “case-centric”. We then provide a fine-grained classification of the different types of data stored in a relational database taking the (business) process perspective, and considering that our main focus is in the static analysis of the system, i.e., before cases are actually executed. We conclude by introducing a formal notion of process correctness, reformulating the well-known notion of soundness [29] in our data-aware setting. This constitutes the basis for Sect. 4, in which we study the boundaries of (un)decidability when checking soundness over (subclasses of) DCDSs.

3.1 Basic control structures

To identify the class of case-centric DCDSs, we make the following basic assumptions on their shape.

Fig. 3
figure 3

Core relations and constraints in a case-centric data component \(\mathcal {D} \)

Let \(\mathcal {S} = \langle \mathcal {D},\mathcal {P} \rangle \) be a DCDS. For \(\mathcal {S} \) to be case-centric, we first assume that \(\mathcal {D} \) contains two relations (and related constraints) to keep track of case-related information, as shown in Fig. 3:

  • A unary case relation, storing case identifiers. The name of such a relation depends on the specific domain under study. For example, in Sect. 2.3, the case relation is \({ Group } \). We use \(\textsc {case-rel}_{\mathcal {D}}\) to denote the case relation of data component \(\mathcal {D} \).

  • A binary \({ State } \) relation, storing the current state(s) of each case.

Similarly to the special input and output places used in workflow nets [29], we assume that each case has two special states: an \({\mathtt {in}} \) state in which the case is located when it is created, and an \({\mathtt {out}} \) state reached by the case when the execution of the process on it terminates. This data structure generalizes that of \({ Group } \) and \({ State } \) in Sect. 2.3. The main difference is that in this general form, the primary key of relation \({ State } \) is constituted by the entire relation, so as to support the possibility of associating multiple states to the same case. This is particularly useful to model concurrency and sub-processes, that is, multiple threads for the same case, each located in a different state.

Let us now consider two general requirements on the process component \(\mathcal {P} \), in relation to the manipulation of cases independently from the specific domain. First, we assume that \(\mathcal {P} \) contains a special action (without parameters) to create a new case. This is the only action that can actually add values to the \(\textsc {case-rel}_{\mathcal {D}}\) relation, which we consider to be always initially empty. For compactness of notation, from now on we assume that \(\textsc {case-rel}_{\mathcal {D}}\) is relation \({ C } \) with attribute \({ id } \). The action generating a new case is then formalized as:

$$\begin{aligned} \begin{array}{l} {\textsc {new}}\hbox {-}{\textsc {case}}(): \left\{ {\textsf {true}}\rightsquigarrow \mathbf{add }\left\{ \begin{array}{l} { C } (\mathbf{newId } _{{ { C } } [id]}()),\\ { State } (\mathbf{newId } _{{ { C } } [id]}(),{\mathtt {in}}) \end{array}\right\} \right\} \end{array} \end{aligned}$$

The \({\textsc {new}}\hbox {-}{\textsc {case}}\) action can be considered as a data-aware variant of emitter transitions in Petri nets. Its executability depends on the targeted domain, i.e., on the domain-specific conditions that allow for the generation of a new case. In the most general situation, e.g., the one in which the creation of a new case depends on external stakeholders (i.e., cannot be controlled by the company that manages the process execution), the condition-action rule determining the executability of \({\textsc {new}}\hbox {-}{\textsc {case}}()\) is simply: \({\textsf {true}}\mapsto {\textsc {new}}\hbox {-}{\textsc {case}}()\).

The second general requirement concerns the core nature of case-centric processes: each action \({\textsc {act}} \) of \(\mathcal {P} \) (except \({\textsc {new}}\hbox {-}{\textsc {case}}\)) is required to have one parameter matching with a case identifier. For this reason, each condition-action rule is required to have the following form, where \(\varPhi \) is a domain-specific condition for \({\textsc {act}} \):

$$\begin{aligned} { C } (c) \wedge \varPhi (c,\mathbf{x }) \mapsto {\textsc {act}} (c,\mathbf{x }) \end{aligned}$$

In summary, a DCDS \(\langle \mathcal {D},\mathcal {P} \rangle \) is case-centric if:

  • \(\mathcal {D} \) contains the relations and constraints shown in Fig. 3, and is such that \(\textsc {case-rel}_{\mathcal {D}}\) is empty in the initial database of \(\mathcal {D} \);

  • \(\mathcal {P} = \langle \mathcal {F},\mathcal {A},\varrho \rangle \) is such that:

    • \(\mathcal {A} = \mathcal {A} _{norm} \uplus \{{\textsc {new}}\hbox {-}{\textsc {case}}()\}\), where \({\textsc {new}}\hbox {-}{\textsc {case}}()\) is defined as above;

    • \(\varrho = \varrho _{norm} \uplus \{{\textsf {true}}\mapsto {\textsc {new}}\hbox {-}{\textsc {case}}()\}\), where each condition-action rule in \(\varrho _{norm}\) has the form \({ C } (c) \wedge \varPhi (c,\mathbf{x }) \mapsto {\textsc {act}} (c,\mathbf{x })\), in which C corresponds to \(\textsc {case-rel}_{\mathcal {D}}\), and \({\textsc {act}} \in \mathcal {A} _{norm}\).

3.2 The different roles of data

We now overview how the data maintained in the data component of a data-aware, case-centric process can be classified w.r.t. the process itself. Our focus is on static analysis, i.e., on the study of how the process will manipulate the underlying data before process cases are actually executed in the real world. We use the archery training process of Sect. 2.3 to illustrate the different concepts.

Control, distinguished, and pure values Among all the data values \(\varDelta \) maintained in a database, some values have the peculiar characteristic of being explicitly used by business processes to select which actions can be executed and which not, in turn determining the allowed courses of execution. For this reason, we call such values control values. In the archery training example, the control values are:

  • All the values maintained in the second column of the \({ State } \) relation, to keep track of the state of each group and, in turn, determine which actions can be executed.

  • The \({\mathtt {null}} \) value, because it determines whether a certain merry man can be selected as apprentice for a group or not.

It is important to notice that these values are typically pre-defined, and finitely many. They can be identified by analyzing the process control-flow, and extracting all those values that are explicitly mentioned in its specification. In DCDSs, this is done by scanning all queries used in the conditions of condition-action rules, as well as those specifying the body of action effects. Figure 4 shows a simple example of control values in BPMN.

Fig. 4
figure 4

Control values in a simple BPMN diagram: \({\mathtt {OK}}\) and \({\mathtt {ERR}}\) are control values that route the process to different alternative flows

Obviously, the database does not just contain control values. A second important class of data values is that of distinguished values, that is, values that have a special semantics for the domain experts and end users. In particular, those values can be explicitly mentioned by users when posing queries over the database. In this light, control values can be considered as a subset of distinguished values, since they can be mentioned by users to query the state of cases. In the archery training process, distinguished values are all group states, the special value \({\mathtt {null}} \) and all combat levels. It is in fact expected that Robin Hood will be interested, e.g., in knowing which are the merry men that have a \({\mathtt {basic}} \) combat level.

All other values of \(\varDelta \) are not explicitly used to route the process nor to formulate user queries, but are simply data values describing a relevant information about the domain. Examples of such kind of values are tuple identifiers (i.e., internal ids used to fill the primary key of a relation), characteristics of products in a catalogue (such as products names and their bar codes), and features of people (such as social security numbers, credit card numbers, names, and addresses). In the archery training example, these are the values appearing in all columns of the \({ MerryM } \) relation but the combat level (and the special \({\mathtt {null}} \) value). The crucial characteristic of a data value of such kind is that the value does not have an importance per sè, but only in relation with other data. As a consequence, they are only compared for identity with each other. For example, it is important to know that a given person address is associated to the same social security number to which a cloned credit card is registered, but the consistent renaming of the social security number to a different number would not lead to any information loss, provided that the relationship with the address and credit card are maintained: comparing such values for identity would still lead to the same result. This is especially true when focusing on static analysis, because at design time many values are not yet present in the database, but will be actually injected during the execution of process cases.

This notion of “invariance under renaming” is very well known in computer science. On the one hand, it relates to the classical notion of genericity in databases [12], and to invariance under isomorphism in first-order logic. On the other hand, it is reminiscent of name binding in programming languages [17], of name usage in distributed systems [26], process algebras [25], and Petri nets [27], and of nominal setsFootnote 3 [7].

Leveraging on the terminology used in many of these papers, we refer to this kind of data values as pure values. Notably, both genericity and nominal sets have been exploited to provide key decidability results for the static verification of dynamic systems manipulating data, see respectively [2, 4, 6, 10] and [8].

Figure 5 overviews how the overall set of data values \(\varDelta \) is decomposed into the three sets of control, distinguished, and pure values.

Fig. 5
figure 5

Control, distinguished and pure values

Read/write data access When analyzing how a process impacts on data, it is important to understand in a fine-grained way how such data are accessed by the process. This has a twofold impact on verification: on the one hand, it may help towards decidability, and on the other hand it may help in reducing the state space to be considered. Consider again the archery training example. Clearly, the \({ State } \) and \({ MerryM } \) relations play a radically different role in the process: the first one is a core relation in controlling the process execution, whereas the second one keeps domain data that only partially interact with the process, which never uses the columns maintaining names and birthdates.

More in general, a first useful characterization concerns the kind of access that processes can have on certain relations and relation attributes: (i) no-access, (ii) read-only, (iii) write-only, and (iv) read-write.

Interestingly, read-write attributes are typically used to store control data values used to drive the evolution of the different process cases. Also notice that the dividing line between “no access” and “read-only” is not crisp, since some attributes are read just to present useful information to the end users, but they do not really participate to the process. This is, e.g., the case of the merry man name and birthday attributes, which, in the context of the archery training process, do not impact the process execution at all, but could still be employed to make Robin Hood able to make sense out of the corresponding identifiers.

Among all relations, of particular interest in this paper are the two classes of cardinality-immutable and read-only relations.

Cardinality-immutable relations are relations that can be accessed in write-mode by the process only in a controlled way. In particular, the attributes forming the primary key of a cardinality-immutable relation are read-only. In this special case, the process can manipulate the content of the relation, but cannot shrink nor extend its set of tuples. To comply with cardinality-immutable relations, the action specification of a DCDS must satisfy the following condition: for each relation R with \(\textsc {PK}({ R } [K])\), every effect \(Q \rightsquigarrow \mathbf{add }~A~\mathbf{del }~D\) is such that:

  • either the effect does not mention R in A or in D; or

  • Q mentions R in a way that variables \(\mathbf{id }\) are used in the positions corresponding to attributes in K, and both A and D mentions R as well, in a way that the same variables \(\mathbf{id }\) are again used in the positions corresponding to attributes in K.

Intuitively, the first kind of effect does not touch the extension of R, therefore maintaining it implicitly unaltered. The second kind of effect models instead the case of an update for all R-tuples, where the update of each tuple is modelled as a deletion and addition in such a way that the primary key is left untouched.

Read-only relations are cardinality-immutable relations whose attributes are all read-only. To comply with read-only relations, the action specification of a DCDS must never mention them in the right-hand side of effects.

Throughout the paper, we assume that once a relation is declared to be cardinality-immutable or read-only, then every DCDS complies with this requirement.

Example 1

The archery training DCDS is such that relation \({ MerryM } \) is cardinality-immutable. It is easy to see that whenever such a relation is mentioned in the left-hand side of an effect, then it is also mentioned both in the corresponding \(\mathbf{add }\) and \(\mathbf{del }\) sets, in such a way that the primary key is maintained. \(\square \)

Relevant vs irrelevant data, and vertical partitioning The classification of attribute access helps in sharpening the focus on those portions of the database that are really relevant for the static analysis of business processes. Specifically: no-access attributes, write-only attributes, and read-only attributes that are just accessed for presentation purposes, can be abstracted away without impacting on the static analysis, given the fact that they do not influence the execution of process cases. Since this operation corresponds to removing columns (or tables) from the database schema, we call it vertical partitioning. Figure 6 shows the result of vertical partitioning on the archery database schema.

Fig. 6
figure 6

Vertical partitioning of the archery database schema on read-write attributes and primary keys; notice that \({ MerryM }\) is a cardinality-immutable relation

Related vs unrelated data, and horizontal partitioning The last dimension concerning the way data can be manipulated by business processes is about query patterns, which determine how data are related to each other. If no restriction is made on the way queries are posed over the database, then every datum can in principle be related with any other datum (think about the extreme case of a query that executes the cartesian product of every relation present in the database schema).

This chaotic way of accessing the database is not only unreasonable, but also dashes any hope of getting decidability of static analysis (see, e.g., [2, 11, 21]). Contrariwise, relations are usually queried in a well-disciplined way, e.g., by applying joins in a compatible way with foreign keys. When business processes follow this principle, their execution tends to produce data organized into data slices, each containing data that are related to each other, and such that data of different slices are only partially related, or even completely unrelated. We call this emergent property horizontal partitioning. Figure 7 provides an intuition for horizontal partitioning in our running example.

Fig. 7
figure 7

Horizontal partitioning with three slices considering a possible database instance for the archery training process. The merry man with id 76 is part of the slice of group 4 because, even though it is not one of the members of group 4, he can be queried and become one of its apprentices; this is in fact the only tuple that is shared by slices belonging to different groups in the \({\mathtt {in}} \) state

The notion of horizontal partitioning into data slices perfectly fits case-centric business processes, which typically assume that each case evolves independently from the others. This is apparent when considering the control-flow constructs of standard process modelling languages, which do not provide any form of support for inter-case relationships, and translates into the formal notion of freedom of choice [29] by considering their underlying Petri net-based semantics. In fact, free-choice nets guarantee that the route a token can take among multiple choices does not depend on the position of other tokens, enforcing this idea of isolation.

Understanding whether a case-centric business process enforces this separation becomes much more problematic in a data-aware setting: even though two cases may seem independent from the control-flow perspective, they could still indirectly interact through the underlying common database. Studying this aspect is exactly the purpose of Sect. 4 below. Finally, notice that this notion of separation becomes much less definite and even more difficult to be properly understood when adopting an artifact-centric approach to process modelling. In fact, one of the trickiest aspects of business artifacts is the fact that they could establish many-to-many relations with each other [15], mutually affecting their evolution.

3.3 Data-aware soundness

Soundness is a fundamental notion in the control-flow analysis of standard, case-centric process models. Since the original definition in [29], many variants and relaxations have been proposed to characterize the correctness of processes [30]. We lift now the classical notion of soundness to account for case-centric, data-aware processes—relaxations can be then easily defined by reconstructing the literature in such a data-aware setting.

Soundness is defined over processes that have two special states: an input state denoting the starting point of each case, and an output state denoting the ending point. With this minimal assumption, soundness intuitively imposes that, for each case that starts in the input state, we have:

  1. 1.

    eventual termination: the output state is always reachable, i.e., in every state there is a sequence of executable tasks that leads to the output state;

  2. 2.

    proper termination: the output state is always reached in a clean way, i.e., when the case is in the output state, then there is no executable task;

  3. 3.

    task executability: there is no dead task, i.e., for each task there is at least one execution leading to a state in which that task is executable.

We now rephrase soundness in the context of case-centric DCDSs. Let \(\mathcal {S} = \langle \mathcal {D},\mathcal {P} \rangle \) be a case-centric DCDS with \(\mathcal {P} = \langle \mathcal {F},\mathcal {A},\varrho \rangle \). Intuitively, data-aware soundness imposes that, for each case identifier c belonging to the \(\textsc {case-rel}_{\mathcal {D}}\) relation and such that \({ State } (c,{\mathtt {in}})\) holdsFootnote 4, we have:

  1. 1.

    eventual termination: along every possible future execution, it is always the case that a state can be reached where \({ State } (c,{\mathtt {out}})\) holds;

  2. 2.

    proper termination: when \({ State } (c,{\mathtt {out}})\) holds, then there is no condition-action rule in \(\varrho _{norm}\) that can fire on c;

  3. 3.

    task executability: for each action \({\textsc {act}} (\mathbf{p })\) in \(\mathcal {A} _{norm}\), there exists an execution leading to a state in which \({\textsc {act}} ({{\mathbf {\mathtt{{d}}}}})\) can be executed with some concrete values \({{\mathbf {\mathtt{{d}}}}}\) instantiating the parameters \(\mathbf{p }\), according to the condition-action rules in \(\varrho _{norm}\).

We now show how data-aware soundness can be actually formalized in the \(\mu \mathcal {L}_{P} \) verification logic introduced in [2]. \(\mu \mathcal {L}_{P} \) is a variant of first-order \(\mu \)-calculus that allows only for a controlled, “persistent” form of first-order quantification across different states of a DCDS. In particular, quantification only applies as long as data values persist in the active domain of the evolving DCDS database. Given a case-centric DCDS \(\mathcal {S} = \langle \mathcal {D},\mathcal {P} \rangle \) with \(\mathcal {P} = \langle \mathcal {F},\mathcal {A},\varrho \rangle \), we have that \(\mathcal {S} \) is sound if \(\varUpsilon _{\mathcal {S}} \models \varPhi _{sound}\), where \(\varPhi _{sound}\) corresponds to formula

$$\begin{aligned} \nu Z. \big (\forall c. { C } (c) \wedge { State } (c,{\mathtt {in}}) \rightarrow (\xi (c) \wedge \pi (c) \wedge \tau (c))\big ) \wedge Z \end{aligned}$$

in which, under the assumption that \(C = \textsc {case-rel}_{\mathcal {D}}\):

  1. 1.

    \(\xi (c)\) models the eventual termination of c as follows:

    $$\begin{aligned}&\nu W. \Big (\mu Y. \big ({ C } (c) \wedge { State } (c,{\mathtt {out}}) \vee ({ C } (c) \wedge Y)\big )\Big ) \\&\quad \wedge ({ C } (c) \wedge W) \end{aligned}$$
  2. 2.

    \(\pi (c)\) models the proper termination of c as follows:

  3. 3.

    \(\tau (c)\) models task executability on c as follows:

It is important to observe that, while being expressible in the \(\mu \mathcal {L}_{P} \) logic, checking soundness cannot be directly tackled by the technique presented in [2]. In fact, the key property required by [2] on the shape of DCDSs towards decidability of verification against \(\mu \mathcal {L}_{P} \) properties, is that of state-boundedness. State-boundedness requires the existence of a number that provides an overall bound over the size of each database produced along any run of the DCDS under study. This does not hold for case-centric DCDSs, for which there is no bound on the number of cases that may simultaneously coexist in the system.

4 Soundness of case-centric DCDSs

We now study verification of (data-aware) soundness over case-centric DCDSs, showing that its decidability holds for an interesting class of case-centric DCDSs. More specifically, our main goal is to introduce a set of modeling guidelines that on the one hand guarantee decidability of soundness, and on the other hand allow the modeler to capture real-life processes. We achieve the formulation of this class incrementally, by showing that each limitation we introduce is necessary towards decidability (i.e., by relaxing it, soundness becomes undecidable). To do so, we take inspiration from the artifact-centric methodology studied in [11]. However, our results are quite different: those in [11] are focused on verification of termination properties in the context of an UML-based artifact-centric framework, whereas here we are interested in the study of soundness over case-centric process models.

The aspects that we incrementally tackle towards decidability are:

  • Navigationality The process cannot arbitrarily work over all relations of the schema, but must access them in a controlled way. By leveraging the notion of case, we require queries to be case-navigational, i.e., they start from the case relation and navigate foreign keys backward so as to reach other relations.

  • Case-width-boundedness When navigating a foreign key from a target to a source relation, the process must not have the possibility of creating unboundedly many tuples of the source relation.

  • Case-depth-boundedness It must be prevented that the process can create and manipulate unbounded chains of relations.

  • Isolation Each case manipulates its own data, and does not interfere with the data of the other cases. When different cases are not properly isolated, that is, they share read-write relations, then they could interact in such a way that, in spite of case-depth-boundedness, they could form unbounded chains altogether.

All undecidability proofs that we provide in the following rely on reductions of soundness checking for case-centric DCDSs to the halting problem of two-counter machines [23], which are machines simultaneously operating over a propositional control state and the values maintained by two counters through increment and decrement, in a way that depends on the values of the counters. We just focus on the core of the reductions, showing how each considered class of case-centric DCDSs can encode a counter and corresponding operations of increment, decrement, and test for zero. More specifically, we show how the DCDS under study can simulate the following three operations:

  • \(\langle {\mathtt {k}},{\mathtt {c}} {} \texttt {+},{\mathtt {k'}} \rangle \), which indicates that when the machine is in control state \({\mathtt {k}} \), then it moves to state \({\mathtt {k'}} \) by incrementing the value of counter \({\mathtt {c}} \) by one unit.

  • \(\langle {\mathtt {k}},{\mathtt {c}} {} \texttt {-},{\mathtt {k'}} \rangle \), which indicates that when the machine is in control state \({\mathtt {k}} \) and counter \({\mathtt {c}} \) holds a positive value, then the machine moves to state \({\mathtt {k'}} \) by decrementing the value of counter \({\mathtt {c}} \) by one unit.

  • \(\langle {\mathtt {k}},{\mathtt {c}} {} \texttt {==0},{\mathtt {k'}} \rangle \), which indicates that when the machine is in control state \({\mathtt {k}} \) and counter \({\mathtt {c}} \) is zero, then the machine moves to state \({\mathtt {k'}} \).

Once these operations are available, a program operating over two counters can be encoded by a DCDS where each case represents an execution of the program, and control states correspond to case states, such that the first instruction corresponds to state \({\mathtt {in}} \), and the last one to state \({\mathtt {out}} \). Furthermore, once a single counter is encoded into a DCDS, a second counter can be immediately obtained by duplicating the involved relations, or by adding a new column that plays the role of counter identifier. This technique has been recently independently adopted in [11] and [21].

4.1 Undecidability of soundness

Unsurprisingly, given the expressiveness of case-centric DCDSs, the following negative result holds:

Theorem 1

Checking soundness of case-centric DCDSs is undecidable.

Proof

Consider a case-centric DCDS equipped with two unary relations \(C_1(v)\) and \(C_2(v)\). In the general case, no restriction is imposed on how the DCDS can manipulate the extension of these two relations, hence each case can arbitrarily query and manipulate those relations. In particular, the DCDS can simulate a 2-counter machine where the values of the counters correspond to the number of tuples in \(C_1\) and \(C_2\), and, consequently:

  • increment is encoded by adding a fresh value into the corresponding counter relation;

  • decrement is encoded by (nondeterministically) removing one of the tuples in the corresponding counter relation;

  • test for zero is simulated by asking whether the extension of the corresponding counter relation is empty.

This technique resembles the one used in [11] and [21]. More specifically, the DCDS maintains a unary relation \({ Contr } \), which contains a single case identifier responsible for the manipulation of the two counters. An initial action \({\textsc {selContr}} \) is devoted to select the controller as follows:

  • if \({ Contr } \) is empty, then the case on which the action is applied becomes the controller of the counter, and is moved to the first control state \({\mathtt {k}} _0\);

  • otherwise, the case immediately terminates (i.e., it is put in the \({\mathtt {out}} \) state).

By assuming that \({ Case } \) is the case relation, this is formalized as:

An increment operation of the form \(\langle {\mathtt {k}},{\mathtt {c_1}} {} \texttt {+},{\mathtt {k'}} \rangle \) is encoded as follows (notice the fact that only the controller case can apply the action):

In the formalization above, the new control state is passed as an action parameter, and increment is simulated by introducing a fresh value into relation \(C_1\), relying on the nullary service call \(\mathbf{newVal } \).

A decrement operation of the form \(\langle {\mathtt {k}},{\mathtt {c_1}} {} \texttt {-},{\mathtt {k'}} \rangle \) is encoded as follows:

The structure is similar to increment, the key difference being that decrement is simulated by nondeterministically picking an element from \(C_1\), and removing it from the extension of such relation. The element is picked in the condition of the condition-action rule, and passed as a parameter. This implicitly means that the decrement action can be executed only if \(C_1\) contains at least one element.

Finally, an instruction of the form \(\langle {\mathtt {k}},{\mathtt {c_1}} {} \texttt {==0},{\mathtt {k'}} \rangle \) is modelled as follows:

We close the proof by remarking that a DCDS structured in this way is indeed case-centric. In fact, case centricity does by no means limit how the DCDS can query the relations of its schema, including \(C_1\). \(\square \)

4.2 Navigational case-centric DCDSs

In Theorem 1, undecidability arises from the fact that the DCDS can manipulate two unary relations without any restriction. We consequently need to discipline how the extension of relations can be manipulated. As a first fundamental requirement, we impose that the DCDS can access facts of a given relation only by navigating from a case identifier. To do so, we impose some modelling guidelines on both the data and process components.

As for the data component, we require that its constraints are either keys or foreign keys. In this case, the data component is said to be navigation-supporting. Following the discussion of Sect. 3.2, we also explicitly distinguish relations whose extension can be freely manipulated by the process from those that are cardinality-immutable.

As for the process component, we define a class of case-centric DCDSs called case-navigational. Intuitively, a case-centric DCDS is case-navigational if the following three conditions hold:

  • Every query is a navigational query rooted in a case identifier, that is, it “starts” from a case identifier, and then moves through schema relations by applying joins that navigate foreign key constraints backwards.

  • Facts mentioned in the \(\mathbf{add }\) and \(\mathbf{del }\) sets of effects are also subject to navigational patterns, in such a way that tuples can be added, deleted, and modified only if they can be reached from a case identifier by means of a navigational query.

To formally define navigational queries, some preliminary notions are needed. A weakly guarded FO query \(Q(\mathbf{x })\) is inductively defined as follows:

  • (base case) \(Q(\mathbf{x })\) is a conjunction of the form \(\exists \mathbf{y }. R(\mathbf{x },\mathbf{y }) \wedge \varPhi _{neg}(\mathbf{x },\mathbf{y })\), where \(\varPhi _{neg}\) is a query whose atoms are either relational atoms, or (in)equalities, composed through boolean operators. Intuitively, \(\varPhi _{neg}\) is used as a filter.

  • (inductive case) \(Q(\mathbf{x })\) is a conjunction of the form \(\exists y. R(\mathbf{x }_0,\mathbf{y }) \wedge \bigwedge _{i \in \{1,\ldots ,n\}} \varPhi _i(\mathbf{x }_i,\mathbf{v }_i)\) where, for each \(i \in \{1,\ldots ,n\}\), the following three conditions hold: (i) \(\mathbf{x }_0 \cup \mathbf{x }_i = \mathbf{x }\), (ii) \(\mathbf{v }_i \subseteq \mathbf{x }_0 \cup \mathbf{y }\), and (iii) \(\varPhi _i\) is a weakly guarded FO query.

We also say that a weakly guarded query \(\exists \mathbf{y }. R(\mathbf{x },\mathbf{y }) \wedge \varPsi \) is rooted in R. The appellative “weak” is used since, in the inductive case, not all free variables \(\mathbf{x }\) of the original query Q are forced to appear in the (weak) guard R.

Given a data component \(\mathcal {D} \) and a FO query \(\varPhi \), we say that \(\varPhi \) is a case-navigational query over \(\mathcal {D} \) if it obeys to the following conditions:

  • \(\varPhi \) is a weakly guarded query rooted in relation \(\textsc {case-rel}_{\mathcal {D}}\).

  • For each (weakly guarded) subformula \(\exists \mathbf{y }.S(\mathbf{x }_0,\mathbf{y }) \wedge \bigwedge _{i \in \{1,\ldots ,n\}} \varPsi _i(\mathbf{x }_i,\mathbf{v }_i)\), such that for each \(i \in \{1,\ldots ,n\}\), \(\varPsi _i\) is rooted in relation \(R_i\), \(\mathbf{x }_0 \cup \mathbf{x }_i = \mathbf{x }\), and \(\mathbf{v }_i \subseteq \mathbf{x }_0 \cup \mathbf{y }\), we have that:

    • either \(R_i\) is a cardinality-immutable relation, or

    • the following three conditions apply:

      1. 1.

        Variables \(\mathbf{v }_i\) appear exactly in those positions B of S that form primary key of S, i.e., \(\textsc {PK}({ S } [B])\) belongs to the constraints of \(\mathcal {D} \).

      2. 2.

        Variables \(\mathbf{v }_i\) all appear in relation \(R_i\).

      3. 3.

        By denoting with \(A_i\) the set of attributes corresponding to the positions of variables \(\mathbf{v }_i\), we have that \({ R_i } [A_i] \longrightarrow { S } [B]\) belongs to the constraints of \(\mathcal {D} \).

  • The same check is applied to the innermost formula \(\exists \mathbf{y }.R(\mathbf{x },\mathbf{y }) \wedge \varPhi _{neg}(\mathbf{x },\mathbf{y })\), by considering each relational atom appearing in \(\varPhi _{neg}\).

It is easy to see that checking whether a query is case-navigational for a given data component requires time that is linear in the size of the query. The definition is quite involved, but the underlying philosophy is rather intuitive: a case-navigational query compatible with \(\mathcal {D} \) starts from a case identifier, and moves from one relation to another by navigating foreign keys backward, i.e., from the “one” side (the pointed primary key) to the “many” side (all tuples referring to such a key). The answers to the query are extracted for some of the attributes visited along the navigational graph. Finally, some of the so-obtained results can be filtered away by appending, at the end of the navigation, a query of the form \(\varPhi _{neg}(\mathbf{x },\mathbf{y })\) above. An exception is constituted by cardinality-immutable relations, which can be queried arbitrarily (that is, without requiring a navigation driven by foreign-keys).

Fig. 8
figure 8

A navigation-supporting data component where each case corresponds to a pebble, moving and modifying a grid (shared with other pebbles)

Example 2

Consider the navigation-supporting data component of Fig. 8, where \({ Pebble } \) is the case relation. Each case represents a pebble, located over a cell in a (dynamic) planar grid. Each cell, in turn, may contain at most one pebble. We assume that the data component contains also constraints ensuring that its shape is in fact a grid. We do not model a specific case-centric DCDS running on top of such data component, but we instead focus on sample queries that are case-navigational. The first query we consider is:

$$\begin{aligned} { TRP } (p) = { Pebble } (p) \wedge \exists c,d,l.{ Cell } (c,p,{\mathtt {null}},d,l,{\mathtt {null}}) \end{aligned}$$

This query returns the pebble that is located in the top-right corner of the grid (provided that there exists one). The query is case-navigational: it first selects a case p from the \({ Pebble } \) relation, then uses p to navigate to its cell c, checking that such a cell has no other cell on its right and on its top.

Another interesting example of case-navigational query is the following:

The query returns those pebbles (together with their corresponding locations) that are completely surrounded, i.e., have pebbles in each adjacent cell. The query is case-navigational because it navigates from the \({ Pebble } \) relation to the \({ Cell } \) one, and then from the cell in which the pebble is located to all adjacent cells. Each subquery is paired with a negative filter guaranteeing that the considered adjacent cell indeed contains a pebble.

As last navigational query, consider

The query returns those pebbles that can move of two cells on the right, without incurring in any other pebble. The fact that the query is case-navigational follows the intuition of navigating inside the grid starting from the position of the pebble, and moving twice on the right. Each right step is captured by navigating backward the foreign key that connects a cell with its right cell. \(\square \)

Example 3

Consider again the archery training data component. Query

$$ \begin{aligned}&{ Groups \& Members } (g,n)\\&\quad ={ Group } (g) \wedge {} { State } (g,{\mathtt {running}}) \wedge {} \\&\quad \exists m,b. ({ MerryM } (m,n,b,{\mathtt {basic}},g) \end{aligned}$$

is case-navigational over the archery training data component, and returns each known group, together with the names of those merry men that are apprentices in that group and have a basic combat level. Query

$$\begin{aligned} { GSharingS } (g)= & {} { Group } (g) \wedge \exists s. ({ State } (g,s)\wedge \\&\exists g'. ({ Group } (g')\wedge { State } (g',s) \wedge g' \ne g)) \end{aligned}$$

returns all those groups that are in the same state of at least another group. This query is not case-navigational over the archery training data component, because it starts navigating from case identifier g and then suddenly “jumps” to querying another case identifier \(g'\). \(\square \)

Given a data component \(\mathcal {D} \), a case-navigational query \(\varPhi (\mathbf{x })\) over \(\mathcal {D} \), and a weakly guarded query \(\varPsi (\mathbf{x }')\) with \(\mathbf{x' } \subseteq \mathbf{x }\), we say that \(\varPsi (\mathbf{x }')\) is navigationally embeddable into \(\varPhi (\mathbf{x })\) if it is possible to replace a subquery \(\varPhi '(\mathbf{x }')\) of \(\varPhi (\mathbf{x })\) with \(\varPsi (\mathbf{x }')\), such that the resulting query is still case-navigational w.r.t. \(\mathcal {D} \). This notion is useful to construct add and delete effects starting from a case-navigational query.

With all these notions at hand, we now transport the notion of case-navigation to condition-action rules and action effects accordingly. A case-centric DCDS \(\langle \mathcal {D},\mathcal {P} \rangle \) with \(\mathcal {P} = \langle \mathcal {F},\mathcal {A},\varrho \rangle \) is case-navigational if, given \(C = \textsc {case-rel}_{\mathcal {D}}\):

  • \(\mathcal {D} \) is navigation-supporting;

  • \(\varrho _{norm}\) is constituted by condition-action rules of the form \(\varPhi (c,\mathbf{x }) \mapsto {\textsc {act}} (c,\mathbf{x })\), where query \({ C } (c) \wedge \varPhi (c,\mathbf{x })\) is case-navigational w.r.t. \(\mathcal {D} \).

  • For every action \({\textsc {act}} (\mathbf {\textsf {p}} )\) in \(\mathcal {A} _{norm}\) and each effect \(e \in \textsc {effect}({\textsc {act}})\), e has the form

    $$\begin{aligned} \bigvee _{i \in \{1,\ldots ,n\}}\varPhi _i({\textsf {c}},\mathbf {\textsf {p}} ,\mathbf{x }) \rightsquigarrow \mathbf{add }~A~ \mathbf{del }~D \end{aligned}$$

    where, by considering parameters \(\mathbf {\textsf {p}} \) as constants, we have that:

    1. 1.

      for each \(i \in \{1,\ldots ,n\}\), query \({ C } ({\textsf {c}}) \wedge \varPhi _i({\textsf {c}},\mathbf {\textsf {p}} ,\mathbf{x })\) is case-navigational w.r.t. \(\mathcal {D} \).Footnote 5

    2. 2.

      For each \(i \in \{1,\ldots ,n\}\), and for each fact F in A, F is navigationally embeddable into \({ C } ({\textsf {c}}) \wedge \varPhi _i({\textsf {c}},\mathbf {\textsf {p}} ,\mathbf{x })\), by considering the service calls in F as existentially quantified variables.

    3. 3.

      For each \(i \in \{1,\ldots ,n\}\), and for each fact F in D, we have that F is navigationally embeddable into \({ C } ({\textsf {c}}) \wedge \varPhi _i({\textsf {c}},\mathbf {\textsf {p}} ,\mathbf{x })\).

The requirements on action effects ensure that the manipulation of data through add and delete facts does not work over parts of the database that are unrelated (in the sense of Sect. 3.2) to the case-navigational query in the body of the effect.

Example 4

The archery training DCDS can easily be turned into an equivalent form that makes it case-navigational. Consider for example the condition-action rule for the \({\textsc {add}}{\hbox {-}}{\textsc {appr}}\) action:

$$\begin{aligned}&{ Group } (g) {}\wedge { State } (g,{\mathtt {in}})\\&\quad \wedge \exists n,b,c.{ MerryM } (m,n,b,c,{\mathtt {null}}) \mapsto {\textsc {add}}\hbox {-}{\textsc {appr}}(m,g) \end{aligned}$$

This query is already case-navigational: it navigates from the \({ Group } \) to the \({ State } \) relation, while \({ MerryM } \) can be freely queried, being cardinality-immutable. The fact that \({ MerryM } \) is cardinality-immutable also implies that the effect used to specify action \({\textsc {add}}\hbox {-}{\textsc {appr}}\) is also already in a case-navigational form. As for the other actions, they are already in case-navigational form, or can be directly made case-navigational by adding \({ Group } (g)\) in front of the employed queries, which already use group g as entry point. \(\square \)

In general, case-navigationality guarantees that queries access the data in a well-disciplined way, always starting from a case identifier and retrieving tuples that directly or indirectly refer to it. In this light, the case identifier can be seen as a sort of “provenance identifier”. This also tells us that when a process is not case-navigational because it contains relations that are disconnected from the case relation, a possible strategy to turn it into a case-navigational process is to augment such relations with a (non-null) “provenance column”, in turn pointing via a foreing key to the case relation. Given one such relations, say, R, this modification ensures that every tuple of R now refers to one and only one case identifier, and that the provenance column can be used as the join column to access R in case-navigational way.

Fig. 9
figure 9

A case-centric data model for user carts

Unfortunately, being navigational does not suffice for decidability of soundness. In fact, we argue that the possibility of having unboundedly many tuples (indirectly) referring to a single case identifier, is a source of undecidability.

Theorem 2

Checking data-aware soundness of case-navigational DCDSs is undecidable.

Proof

Consider a case-centric DCDS \(\mathcal {S} = \langle \mathcal {D},\mathcal {P} \rangle \) modeling a process that manipulates items in a cart. The schema and constraints of \(\mathcal {D} \) are shown in Fig. 9. Each case represents a user cart (\(\textsc {case-rel}_{\mathcal {D}} = { Cart } \)), and each cart is filled with products equipped with a barcode. The key aspect of \(\mathcal {D} \) is that, for a given cart, unboundedly many products can be added to the cart. This features makes it possible to use \(\mathcal {D} \) to realize a case-centric DCDS that models the behaviour of shoppers who do not have a limit on the number of products they intend to buy. In this light, \(\mathcal {P} \) offers to shoppers the possibility of adding and removing items to/from the cart.

Specifically, a shopper who owns cart \({\mathtt {c}}\) can add a new product of type \({\mathtt {t}}\) to \({\mathtt {c}}\) using the following process fragment:

$$\begin{aligned} \begin{array}{l} { Cart } (c) \mapsto {\textsc {ins}}\hbox {-}{\textsc {prod}}(c) \\ {\textsc {ins}}\hbox {-}{\textsc {prod}}({\mathtt {c}}) : \\ \{ {\textsf {true}}\rightsquigarrow \mathbf{add }~\{{ InCart } (\mathbf{newCode } _{{ { InCart } } [barcode]}(),{\mathtt {c}}) \} \} \end{array} \end{aligned}$$

Notice the usage of the \(\mathbf{newCode } _{{ { InCart } } [barcode]}\) service call, which guarantees that the added product has a fresh barcode.

On the other hand, a shopper can also change her mind about a product that is currently in the cart. The following process fragment supports the shopper in the removal of a product from the cart:

$$\begin{aligned} \begin{array}{l} { Cart } (c)\wedge { InCart } (p,c) \mapsto {\textsc {rem}}\hbox {-}{\textsc {prod}}(p,c)\\ {\textsc {rem}}\hbox {-}{\textsc {prod}}({\mathtt {p}},{\mathtt {c}}) : \{ { InCart } ({\mathtt {p}},{\mathtt {c}}) \rightsquigarrow \mathbf{del }~ \{ { InCart } ({\mathtt {p}},{\mathtt {c}}) \} \} \end{array} \end{aligned}$$

Given these two process fragments, it is quite easy to see that a shopper can straightforwardly simulate a counter machine, where:

  • the value of the counter is represented by the number of items in the cart;

  • the two actions ins-prod and rem-prod add and remove a tuple to/from the \({ InCart } \) relation, and thus they respectively encode the increment and decrement of the counter;

  • testing whether the counter simulated by cart \({\mathtt {c}} \) is zero amounts to querying whether the extension of \({ InCart } \) is empty for \({\mathtt {c}} \): \({ IsEmpty } ({\mathtt {c}}) = \lnot \exists p. { InCart } (p,{\mathtt {c}})\). \(\square \)

4.3 Case-width-bounded DCDSs

The source of undecidability in Theorem 2 relies on the possibility of having unboundedly many tuples referring (directly or indirectly) to a single case identifier. We therefore need to limit this source of unboundedness. To do so, we introduce a class of case-centric DCDSs that are case-width-bounded, by leveraging cardinality constraints in combination with key constraints (i.e., cardinality-bounded foreign key constraints in the sense of Sect. 2.1). Interestingly, when paired with the case-navigational requirement introduced before, this approach guarantees that, starting from a case identifier, each (backward) foreign key navigation has only boundedly many branches to follow.

Technically, a data component \(\mathcal {D} \) is case-width-bounded if:

  • \(\mathcal {D} \) is case-navigational;

  • each foreign key constraint \({ A } [R] \longrightarrow { B } [S]\), where A is not cardinality-immutable, is paired with a corresponding cardinality constraint \(\textsc {CARD}({ A } [R],n..m)\), so that together they form a cardinality-bounded foreign key constraint \({ A } [R]\xrightarrow {n..m~~~}{ B } [S]\).

Observe that for foreign keys pointing to a cardinality-immutable relation, no cardinality constraint need to be enforced, since the number of tuples in the source relation of the foreign key is anyway bounded by the number of tuples in the target relation, which does not change during the execution.

Correspondingly, we say that a case-centric DCDS is case-width-bounded if it is case-navigational and works over a data component that is case-width-bounded.

Example 5

The archery training schema of Fig. 1 is case-width-bounded, since foreign keys are either paired with a corresponding cardinality constraint, or they point to a cardinality immutable relation. Specifically, foreign keys without corresponding cardinality constraints point to relations \({ MerryM }\) and \({ Trust }\), which are both cardinality-immutable. Instead, foreign keys pointing to cardinality-mutable relations all have a numeric upper bound: every group has exactly one state, and at most two weekly slots. \(\square \)

It is important to observe that, to ensure case-width-boundedness, also the standard foreign key from \({ State }\) to the case relation (cf. Fig. 3) needs to be properly associated to a cardinality constraint. This is possible only when the control-flow component of the process gives raise to boundedly many control states for the same case, which is typical in reality. In workflow net terms, this implies that the injection of a single “case token” into the net gives raise to a bounded marked net, which controls the maximum degree of concurrency induced by the case token.

We continue our (un)decidability tour by showing that being case-width-bounded is still not sufficient to ensure decidability, even when the width-bound is 1. The new source of undecidability is related to the possibility of recurring over cyclic foreign keys, in such a way that each navigation step has a bounded (actually, a single) branch, but, as a whole, it can reach unboundedly many values along a chain.

Theorem 3

Checking data-aware soundness over case-width-bounded DCDSs is undecidable.

Proof

To show undecidability, we take inspiration from the well-known Snake\(^{\copyright }\) videogame. In our variant, we consider a snake that, at the beginning, consists of just a snake head. Every time the snake finds a piece of food, it grows of one body part. Every time the snake hits a wall, it looses its tail body part (unless it only has its head—in this case nothing happens). We do not explicitly account for the position of the snake on the screen, nor for the actions used to move the snake, as they are not needed in the proof.

To formalize this behavior, we rely on the data component shown in Fig. 10. In this data component, each case denotes (the head of) a snake: \(\textsc {case-rel}_{\mathcal {D}} = { SnakeH } \). The \({ BodyPart } \) relation keeps track of the body parts of a snake. Each body part keeps a reference to the corresponding snake’s head, and (optionally) a reference to its next part, that is, the part that comes next towards the head. The special control value \({\mathtt {null}} \) is used in the next column of the first body part, i.e., the body part that has the head as next.

It is easy to see that such a data component is case-width-bounded: each cardinality-bounded foreign key models a one-to-one relation.

The growth of snake \({\textsf {c}} \) due to eating some food is captured by the following case-navigational action:

$$\begin{aligned} \begin{array}{l} {\textsc {eat}}\hbox {-}{\textsc {food}}({\textsf {c}}):\\ \left\{ \begin{array}{l} \lnot \exists f,n. { BodyPart } (f,{\textsf {c}},n)\rightsquigarrow \\ \qquad \mathbf{add }\{{ BodyPart } (\mathbf{newBP } _{{ BodyPart } [id]}(),{\textsf {c}},{\mathtt {null}})\}\\ { BodyPart } (f_{o},{\textsf {c}},{\mathtt {null}}) \rightsquigarrow \mathbf{del }\{{ BodyPart } (f_{o},{\textsf {c}},{\mathtt {null}})\}\\ \qquad \mathbf{add }\left\{ \begin{array}{l} { BodyPart } (\mathbf{newBP } _{{ BodyPart } [id]}(),{\textsf {c}},{\mathtt {null}}),\\ { BodyPart } (f_o,{\textsf {c}},\mathbf{newBP } _{{ BodyPart } [id]}()) \end{array}\right\} \end{array}\right\} \end{array} \end{aligned}$$

The first effect deals with the case where the snake is currently without any body part (i.e., it just has a head). In this case, the first body part is created by calling the \(\mathbf{newBP } _{{ BodyPart } [id]}()\) service call so as to generate a fresh identifier for it.

Fig. 10
figure 10

A case-width-bounded data component modeling snakes

The second effect deals instead with the case where snake \({\textsf {c}} \) has already at least one body part. The query selects the first body part of \({\textsf {c}} \), namely the one that has \({\mathtt {null}} \) as next. The identifier of such a body part matches with variable \(f_{o}\). The head of the effect acts then as follows:

  • A new first body part is added, using the \(\mathbf{newBP } _{{ BodyPart } [id]}()\) service call to generate a fresh identifier for it.

  • The old first body part is updated, making it the second one; this has the indirect effect of shifting all the other body parts one position forward.

Symmetrically, the shrinking of snake \({\textsf {c}} \) due to a wall hit is modeled by the following case-navigational action:

$$\begin{aligned} \begin{array}{l} {\textsc {hit}}\hbox {-}{\textsc {wall}}({\textsf {c}}):\\ \left\{ \begin{array}{lcll} { BodyPart } (f_{o},{\textsf {c}},{\mathtt {null}}) &{}\rightsquigarrow &{}\mathbf{del }&{}\{{ BodyPart } (f_{o},{\textsf {c}},{\mathtt {null}})\}\\ { BodyPart } (f_{o},{\textsf {c}},{\mathtt {null}})\\ {}\wedge { BodyPart } (f_{n},{\textsf {c}},f_{o}) &{}\rightsquigarrow &{}\mathbf{del }&{}\{{ BodyPart } (f_{n},{\textsf {c}},f_{o})\}\\ &{}&{}\mathbf{add }&{}\{{ BodyPart } (f_{n},{\textsf {c}},{\mathtt {null}}))\}\\ \end{array} \right\} \end{array} \end{aligned}$$

The action deletes the current first body part, and at the same time updates the current second body part turning it into the first one. This has the indirect effect of shifting all the other body parts one position backward.

Given these two process fragments, it is easy to see that this DCDS can simulate a counter machine, where:

  • The value of the counter is represented by the length (i.e., the number of body parts) of the snake.

  • The two actions eat-food and hit-wall are respectively guaranteed to add and remove a body part to/from the snake (which means incrementing and decrementing the counter).

  • Testing whether the counter encoded by snake \({\mathtt {c}} \) is zero amounts to querying whether there is no body part for \({\mathtt {c}} \):

    $$\begin{aligned} \lnot \exists b,n. { BodyPart } (b,{\mathtt {c}},n) \end{aligned}$$

This concludes the proof. \(\square \)

4.4 Case-depth-bounded DCDSs

The source of undecidability in Theorem 3 relies on the possibility of building unbounded chains of tuples, connected to each other via foreign keys. Therefore, the next necessary restriction is to “bound” the length of such chains. There are several ways of doing this. Following our methodological flavor, we choose a general approach that consists in checking whether the DCDS data component employs foreign keys safely.

Specifically, unbounded chains can be created by complying with the case-navigational property (that is, by navigating foreign keys) only if the foreign keys collectively form cyclic dependencies. This is the case of Fig. 10, where the \({ BodyPart } \) relation cyclically depends on itself. In general, cycles may be formed by composing several foreign keys together.

Fig. 11
figure 11

Examples of FK-graphs; double-line nodes correspond to the case relation, blue nodes with thin line to cardinality-bounded relations, and green nodes with thin, dashed line to read-only relations. a FK-graph of Fig. 1. b FK-graph of Fig. 10. c FK-graph of Fig. 12

We therefore characterize a class of data components guaranteeing the absence of such cycles. Technically, we say that a data component \(\mathcal {D} \) is case-depth-bounded if:

  • \(\mathcal {D} \) is case-navigational.

  • Whenever \(\mathcal {D} \) contains a set of foreign key constraints of the form

    • \({ \textsc {case-rel}_{\mathcal {D}} } [c]\longleftarrow { R_1 } [A_1]\longleftarrow \cdots \longleftarrow { R_n } [A_n]\longleftarrow { S_1 } [B_1]\)

    • \({ S_1 } [B_1]\longleftarrow \cdots \longleftarrow { S_m } [B_m]\longleftarrow { S_1 } [B_1]\)

    then at least one relation \(S_i\) (with \(i \in \{1,\ldots ,m\}\)) is cardinality-immutable.

Intuitively, the mentioned set of foreign key constraints expresses that it is possible to navigate backward from the case relation, in such a way that the same relation is visited twice. If it is the case, then it is required that one of the relations involved in the cycle is cardinality-immutable. This indirectly rules out the possibility of having cycles that involve cardinality-mutable relations only.

In the following, we say that a tuple (indirectly or directly) refers to a case identifier if the tuple can be retrieved using a case-navigational query starting from that case identifier. This means that there is a chain of tuples going from the case identifier to the target tuple, such that every tuple in the chain refers to the previous one via a foreign key.

The key property guaranteed by a case-depth-bounded data component is that it is never possible to write a case-navigational query that visits the same relation twice along a path, unless the query visits a cardinality-immutable relation in between. This, in turn, implies the following interesting intermediate result.

Lemma 1

No case-navigational DCDS working over a case-depth-bounded data component can create a chain of tuples of unbounded length, where the source of the chain is a \({ Case } \) tuple, and each other tuple refers to the previous one via a foreign key.

Proof

Suppose, by absurdum, that there is a DCDS that can create such a chain. Since the chain is unbounded but a data component has a fixed schema consisting of finitely many relations, the chain must contain unboundedly many tuples for some relations. Due to the case-navigational assumption, the DCDS can create such unboundedly many tuples only via navigational effects. This, in turn, can be done only if there is a cyclic composition of foreign key constraints in the DCDS data component. Let \(S_1,\ldots ,S_m\) be the relations involved in such a cycle. Then the chain must cyclically contain a sequence of m fresh tuples for each \(S_i\) (with \(i \in \{1,\ldots ,m\}\)), repeated unboundedly many times. This, in turn, implies that each such relation has unboundedly many tuples along the chain. However, from the definition of case-depth-bounded data component, at least one out of these relations must be cardinality-immutable. The extension of cardinality-immutable relations is fixed and cannot be increased, therefore it is impossible to have unboundedly many tuples for a cardinality-immutable relation along the chain. This contradicts the hypothesis. \(\square \)

An effective way to detect cyclic chains of foreign keys and rule them out is to see the schema and constraints of a data component \(\mathcal {D} \) as a FK-graph \(G_\mathcal {D} \), that is, a tuple \(\langle N,n_0,E\rangle \), where:

  • N is a set of nodes, which contains a node for each relation in the database schema of \(\mathcal {D} \).

  • \(n_0 \in N\) is the node corresponding to the case relation \(\textsc {case-rel}_{\mathcal {D}}\).

  • E is a set of edges, each of which has the form \(\langle R_1,R_2,id\rangle \), where \(R_1,R_2 \in N\) and id is a unique identifier. Formally, E is the minimal set satisfying the following condition: for each foreign key constraint \({ R_i } [A_i] \longrightarrow { R_j } [B_j]\) of \(\mathcal {D} \), E contains a dedicated tuple \(\langle R_i,R_j,id\rangle \), where id is a fresh identifier.

We use the standard notation \(R_i \longrightarrow R_j\) if there exists an edge identifier id such that \(\langle R_i,R_j,id\rangle \in E\), and \(\overset{+}{\longrightarrow }\) to denote that a node can reach another node via a path in the graph.

With this notion at hand, we have that a data component \(\mathcal {D} \) is case-depth-bounded if in \(G_\mathcal {D} = \langle N,n_0,E\rangle \) there is no node \(n \in N\) such that: (i) \(n_0 \overset{+}{\longrightarrow }n\), and (ii) \(n\overset{+}{\longrightarrow }n\) without ever passing through a cardinality-immutable relation. This property can be checked in \(\textsc {NLogSpace}\) in the size of the graph (i.e., in the size of the data component \(\mathcal {D} \)), since it consists of a sequence of two reachability checks.

Figure 11 provides examples of foreign-key graphs. It is easy to see that the data component used in the proof of Theorem 3 is not case-depth-bounded (cf. Fig. 11b).

4.5 Case-bounded DCDSs

We now consider the interaction between width- and depth-boundedness. Recall that both are necessary towards decidability, as soundness turned out to be undecidable for width-bounded but depth-unbounded systems (cf. Theorem 3), as well as for width-unbounded and depth-bounded systems (cf. Theorem 2).

In particular, we introduce the class of case-bounded DCDSs, i.e., DCDSs that are simultaneously width-bounded and depth-bounded. For this class of DCDSs, the following interesting intermediate result holds.

Lemma 2

Let \(\mathcal {S} = \langle \mathcal {D},\mathcal {P} \rangle \) be a case-bounded DCDS with \(\varUpsilon _{\mathcal {S}} = \langle \varDelta ,\mathcal {R},\varSigma ,s_0,{ db },{\Rightarrow }\rangle \). Then there exists an overall bound \(b \in \mathbb {N}\) such that, for every state \(s \in \varSigma \) and every case-tuple \(\textsc {case-rel}_{\mathcal {D}}({\mathtt {c}}) \in { db }(s)\), the number of tuples referring to \({\mathtt {c}} \) is bounded by b.

Proof

Let \(\mathcal {D} =\langle \varDelta , \mathcal {R},\mathcal {C}, \mathcal {I}_0 \rangle \). Being case-bounded, \(\mathcal {S} \) is case-width-bounded and case-depth-bounded. Thanks to case-width-boundedness, a DCDS can create at most \(k \cdot W\) tuples directly referring to a given tuple, where: (i) k is the number of foreign keys in the schema (which bounds the number of foreign keys that point to the same relation), and (ii) W is the maximum upper bound of all the cardinality constraints in \(\mathcal {D} \).

Thanks to case-depth-boundedness, there is also a bound on the length of the chains of tuples that can be created by \(\mathcal {S} \) starting from a case identifier and using navigational effects (cf. Lemma 1. Now let T the maximum number of tuples of a cardinality-immutable relation in \(\mathcal {I}_0 \). Since this number remains constant along any evolution of \(\mathcal {S} \), the number of tuples that can be chained is bounded by \(T \cdot D\), where D is the length of the longest simple path in \(G_\mathcal {D} \). This upper bound is obtained by iterating T times through a loop that consists of D relations - this cannot be exceeded, since the number of tuples present in any cardinality-immutable relation corresponds, by definition, to T.

Putting everything together, we obtain that the number of tuples that can refer to a single case identifier is bounded by \(b=(k\cdot W)^{(D\cdot T)+1}\). \(\square \)

Unfortunately, despite the boundedness result of Lemma 2, unbounded chains can still be obtained by creating subtle interactions among different cases.

Theorem 4

Checking soundness over case-bounded DCDSs is undecidable.

Proof

The proof takes inspiration from the proof of Theorem 4.6 in [11], and is more involved than the other undecidability proofs in this work, since it is not true anymore that each single case can simulate a 2-counter machine, but cases must collectively cooperate to do so.

Fig. 12
figure 12

A case-bounded data model for protesters and protest signs

To illustrate the proof, we use the metaphor of a pacific protest and its demonstrators, and construct a DCDS that manages a protest by simulating a 2-counter machine. As shown in Fig. 12, the data layer for this setting maintains data about protesters (where each protester is a case) and about protest signs that protesters held together. Each protester can hold from zero to two signs (one per hand), and each protest sign can be held by one or two protesters (one with her right hand, the other with her left hand). Each sign tracks the (unique) message contained in the sign, together with the identifiers of the protesters who respectively stand on the left and on the right of the sign (using \({\mathtt {null}} \) if one is missing). Two cardinality-immutable relations are used to keep track of common data for all protesters: the id of the leader, and the current “command” to be collectively executed. The leader table contains a single tuple, initialized to some random value and then overridden as soon as the first protester is created (i.e., the first protester becomes the leader of the protest). It is easy to see that this data component is both width- and depth-bounded.

The manipulation of the two counters corresponds, in our protest metaphor, to the organization of the protest with two chains of protesters on the left and on the right of the leader, each constituted by an alternation of protesters and signs.

The joint command relation contains a code representing the current command to be executed—in the context of the 2-counter machine reduction, this corresponds to the program counter.

When the first protester is created, a dedicated action of the DCDS is used to introduce two signs into the system, making sure that the first sign has the leader on the right and nobody, i.e., \({\mathtt {null}} \), on the left, whereas the second has the leader on the left, and nobody on the right. The action also marks that the first protester is actually a leader, and that it is currently alone, i.e., the last of the queue on the left and on the right. The two initial signs hold by the leader represent the zero elements of the two counters. In the following, we focus on the manipulation of the second counter (related to the right-chain starting from the leader). The manipulation of the first counter is managed symmetrically.

To test whether the second counter is zero, we can in fact use the following case-navigational queryFootnote 6:

$$\begin{aligned} { Leader } (l) \wedge { Protester } (l) \wedge \exists m.{ Sign } (m,l,{\mathtt {null}}) \end{aligned}$$

The increment of the second counter corresponds to the addition of a protester to the chain. The fact that the chain is meant to be extended is signalled by a specific value \({\mathtt {ext}} \) in the \({ Cmd } \) relation.

To be added, a protester must be un-supplied with signs. The addition consists of three steps. To indirectly synch, the different protesters involved in the addition mark the phase column of the \({ TBA } \) (to-be-added) relation with three different constants \({\mathtt {0}} \), \({\mathtt {1}} \), and \({\mathtt {2}} \). The last constant \({\mathtt {2}} \) represents that the addition protocol was executed in the past, and that the id column contains old data.

In the first step, a protester with no signs nondeterministically offers to be added to the chain, provided that nobody else already did (i.e., the phase column of \({ TBA } \) has value \({\mathtt {2}} \)). This is done by suitably querying and updating the cardinality-immutable \({ TBA } \) (to-be-added) relation. In particular, if the second component of \({ TBA } \) is \({\textsf {true}}\), then it means that nobody offered for the current addition.

$$\begin{aligned}&{ Protester } (p) \wedge \lnot (\exists m,p_2. { Sign } (m,p_2,p) \vee { Sign } (m,p,p_2))\\&\quad \wedge { Cmd } ({\mathtt {ext}}) \wedge (\exists p_2. { TBA } (p_2,{\mathtt {2}})) \mapsto {\textsc {offer}} (p)\\&{\textsc {offer}} ({\textsf {p}}) : \left\{ \begin{array}{rl} \exists p_2. { TBA } (p_2,{\mathtt {2}}) \rightsquigarrow &{} \mathbf{del }\{{ TBA } (p_2,{\mathtt {2}})\}\\ &{} \mathbf{add }\{{ TBA } ({\textsf {p}},{\mathtt {0}})\} \end{array}\right\} \end{aligned}$$

The second step is executed by the rightmost protester of the chain, and consists in making the protester in the id column of the \({ TBA } \) relation the new rightmost protester.

By noting that the rightmost protester can be univocally identified by checking that she holds a sign with her right hand, but there is nobody standing on the right of the same sign, the formalization of this process fragment is then as follows:

$$\begin{aligned}&\begin{array}{rl} { Protester } (p) &{} {}\wedge (\exists m. { Sign } (m,p,{\mathtt {null}}))\\ &{}{}\wedge { Cmd } ({\mathtt {ext}}) \wedge { TBA } (n,{\mathtt {0}}) \mapsto {\textsc {ins}} (p,n) \end{array}\\&\begin{array}{l} {\textsc {ins}} ({\textsf {p}},{\textsf {n}}):\\ \left\{ \begin{array}{rl} { Sign } (m,{\textsf {p}},{\mathtt {null}}) \rightsquigarrow &{} \mathbf{del }\{{ Sign } (m,{\textsf {p}},{\mathtt {null}})\}\\ &{} \mathbf{add }\{{ Sign } (m,{\textsf {p}},{\textsf {n}})\}\\ { TBA } ({\textsf {p}},{\mathtt {0}}) \rightsquigarrow &{} \mathbf{del }\{{ TBA } ({\textsf {p}},{\mathtt {0}})\}\\ &{} \mathbf{add }\{{ TBA } ({\textsf {p}},{\mathtt {1}})\} \end{array}\right\} \end{array} \end{aligned}$$

The last step of the addition is then again under the responsibility of the just inserted protester, who now needs to prepare a new sign to be placed on her own right. This has the effect of incrementing by 1 the number of signs on the right of the leader. At the same time, the protester also updates the \({ TBA } \) relation and the \({ Cmd } \) relation (the latter depending on the specific strategy adopted by the protest, i.e., by the specific DCDS at hand).

$$\begin{aligned}&\begin{array}{rl} { Protester } (p)&{}\wedge { Cmd } ({\mathtt {ext}}) \wedge { TBA } (p,{\mathtt {1}}) \mapsto {\textsc {prep}} (p) \end{array}\\&\begin{array}{l} {\textsc {prep}} ({\textsf {p}}):\\ \left\{ \begin{array}{rl} {\textsf {true}}\rightsquigarrow &{} \mathbf{add }\{{ Sign } (\mathbf{newMsg } _{{ Sign } [msg]}(),{\textsf {p}},{\mathtt {null}})\}\\ { TBA } ({\textsf {p}},{\mathtt {1}}) \rightsquigarrow &{} \mathbf{del }\{{ TBA } ({\textsf {p}},{\mathtt {1}})\}~ \mathbf{add }\{{ TBA } ({\textsf {p}},{\mathtt {2}})\}\\ { Cmd } ({\mathtt {ext}})\rightsquigarrow &{} \mathbf{del }\{{ Cmd } ({\mathtt {ext}})\}~ \mathbf{add }\{{ Cmd } (\ldots )\}\\ \end{array}\right\} \end{array} \end{aligned}$$

The decrement of the second counter is much easier. It corresponds to the contraction of the chain on the right of the leader, i.e., to the removal of the rightmost protester from the chain. This is signalled by the presence of the \({\mathtt {contr}} \) constant inside the cardinality-immutable \({ Cmd } \) relation. The removal just consists in affirming that the sign currently standing at the left of the rightmost protester will not have anybody on its right, and that the rightmost protester will loose the sign standing on her own right (this will make her eligible for a future insertion in the chain).

$$\begin{aligned}&\begin{array}{rl} { Protester } (p) &{} {}\wedge \exists m,l.({ Sign } (m,l,p) \wedge l \ne {\mathtt {null}}) \\ &{} {}\wedge \exists m. ({ Sign } (m,p,{\mathtt {null}})\\ &{} {}\wedge { Cmd } ({\mathtt {contr}}) \mapsto {\textsc {rem}} (p) \end{array}\\&\begin{array}{l} {\textsc {rem}} ({\textsf {p}}): \left\{ \begin{array}{rl} { Sign } (m,l,{\textsf {p}}) \rightsquigarrow &{} \mathbf{del }\{{ Sign } (m,l,{\textsf {p}})\}\\ &{} \mathbf{add }\{{ Sign } (m,l,{\mathtt {null}})\}\\ { Sign } (m,{\textsf {p}},{\mathtt {null}}) \rightsquigarrow &{} \mathbf{del }\{{ Sign } (m,{\textsf {p}},{\mathtt {null}})\}\\ { Cmd } ({\mathtt {ext}}) \rightsquigarrow &{} \mathbf{del }\{{ Cmd } ({\mathtt {contr}})\}\\ &{}\mathbf{add }\{{ Cmd } (\ldots )\}\\ \end{array}\right\} \end{array} \end{aligned}$$

It is worth noting that, thanks to the condition-action rule for \({\textsc {rem}} \), it is never possible to apply it on the leader.

We can now see the evolution of the protest as a 2-counter machine, where the value of the first (resp., second) counter is represented by the number of signs present on the left (resp., right) of the leader, minus 1. The zero element of the first (resp., second) counter is actually represented by the sign on the left (resp., right) by the leader. The evolution relies on the fact that unboundedly many protesters (representing cases) are nondeterministically created, and that the addition and removal into/from the chain are made possible by indirect synchronizations working over cardinality-immutable relations. The input 2-counter machine can be modeled by properly deciding how to update the content of the \({ Cmd } \) relation, so as to mimic the program of the 2-counter machine. The halting of such machine is reduced to soundness of the corresponding DCDS by just ensuring that when the halting state is reached, all protesters move into the \({\mathtt {out}}\) state. \(\square \)

4.6 Case-isolated DCDSs

The undecidability result of Theorem 4 is due to the lack of isolation across cases: when a single relation can be achieved navigationally from two distinct cases, such relation can be used to implicitly transfer information among cases, and in turn coordinate their evolution.

We consequently need to limit this interaction. We do so by introducing a suitable notion of isolation, which guarantees that the data associated to some case are not touched by other cases. Technically, we say that a data component \(\mathcal {D} \) is case-isolated if:

  • \(\mathcal {D} \) is case-navigational.

  • Whenever \(\mathcal {D} \) contains a set of foreign key constraints of the form

    • \({ \textsc {case-rel}_{\mathcal {D}} } [c]\longleftarrow { R_1^1 } [A_1^1]\longleftarrow \ldots \longleftarrow { R_n^1 } [A_n^1]\longleftarrow { R } [A]\)

    • \({ \textsc {case-rel}_{\mathcal {D}} } [c]\longleftarrow { R_1^2 } [A_1^2]\longleftarrow \ldots \longleftarrow { R_k^2 } [A_k^2]\longleftarrow { R } [A]\)

    such that \({ R_i^1 } [A_i^1] \ne { R_j^2 } [A_j^2]\) for some \(i \in \{1,\ldots ,n\}\) and \(j \in \{1,\ldots ,k\}\), then R is a read-only relation.

Intuitively, the mentioned set of foreign key constraints expresses that it is possible to find two distinct backward foreign-keys navigation paths that start from the case relation and reach the same relation. If it is the case, then it is required that this latter relation is read-only (and consequently cannot be used to transfer information from one case to the other). This indirectly rules out the possibility of having multiple paths pointing to the same read-write relation.

And for case-depth-boundedness, we can reformulate case-isolation of a data component \(\mathcal {D} \) as a property over its corresponding FK-graph \(G_\mathcal {D} \). In particular, we have that \(\mathcal {D} \) is case-isolated if, whenever in \(G_\mathcal {D} \) a node n can be reached from \(n_0\) along two different paths (that is, paths that differ for at least one node), then n corresponds to a read-only relation. It is easy to see that the data component used in the proof of Theorem 4 is not case-isolated (cf. Fig. 11c).

Case-isolation implies the following interesting intermediate result.

Lemma 3

Let \(\mathcal {S} = \langle \mathcal {D},\mathcal {P} \rangle \) be a case-isolated DCDS with \(\varUpsilon _{\mathcal {S}} =\langle \varDelta ,\mathcal {R},\varSigma ,s_0,{ db },{\Rightarrow }\rangle \). Then, for every state \(s \in \varSigma \) and every relation R in \(\mathcal {D} \) that is not read-only, each R-tuple in \({ db }(s)\) refers to at most one case identifier.

Proof

Suppose, by absurdum, that there is a state \(s \in \varSigma \), case identifiers \({\mathtt {c_1}} \) and \({\mathtt {c_2}} \), and a non-read-only, n-ary relation R such that:

  • \(\{\textsc {case-rel}_{\mathcal {D}}({\mathtt {c_1}}),\textsc {case-rel}_{\mathcal {D}}({\mathtt {c_2}})\} \subseteq { db }(s)\);

  • there exists a tuple \(\mathbf{v }\) of n values in \(\varDelta \), for which \(R(\mathbf{v }) \in { db }(s)\);

  • \(R(\mathbf{v })\) refers to both \({\mathtt {c_1}} \) and \({\mathtt {c_2}} \).

Since, by definition, a case-centric DCDS has no case in its initial database, then s must have been produced after the application of a sequence of actions, during which: (i) \({\mathtt {c_1}} \) and \({\mathtt {c_2}} \) have been created, and (ii) \(R(\mathbf{v })\) has been inserted or updated making it referring to \({\mathtt {c_1}} \) and \({\mathtt {c_2}} \). Since by hypothesis \(\mathcal {S} \) is case-isolated, hence also case-navigational, the latter condition implies the existence of two different chains of tuples, created by \(\mathcal {S} \) using navigational effects, of the form:

$$\begin{aligned} R(\mathbf{v }), R^i_1\left( \mathbf{v }^i_1\right) , \ldots , R^i_n\left( \mathbf{v }^i_n\right) , \textsc {case-rel}_{\mathcal {D}}({\mathtt {c_i}}) \end{aligned}$$

for \(i \in \{1,2\}\), where each tuple points to the next tuple via a foreign key. The existence of two different chains of this form, constructed by the DCDS using navigational effects, implies that \(\mathcal {D} \) contains two different sets of foreign key constraints of the form:

$$\begin{aligned} { R } [A] \longrightarrow { R^i_1 } [A^i_1] \longrightarrow \cdots \longrightarrow { R^i_n } [A^i_n] \longrightarrow { \textsc {case-rel}_{\mathcal {D}} } [id] \end{aligned}$$

for \(i \in \{1,2\}\). However, this in turn implies that in the FK-graph \(G_\mathcal {D} \) there are two paths leading from R to \(\textsc {case-rel}_{\mathcal {D}}\), which contradicts the hypothesis that \(\mathcal {S} \) is case-bounded, hence also case-isolated. \(\square \)

4.7 Decidability, at last

As we have shown in the previous sections, case-isolation and case-boundedness do not suffice alone for achieving decidability of soundness. In fact, all the proofs of the undecidability theorems in this work (with the exception of Theorem 4) employ a DCDS that is case-isolated but not case-bounded, whereas the one of Theorem 4) employs a DCDS that is case-bounded but not case-isolated.

In fact, the key towards decidability of data-aware soundness is the combination of case-isolation and case-boundedness.

Theorem 5

Checking soundness of case-isolated, case-bounded DCDSs is decidable, and reducible to conventional model checking of propositional \(\mu \)-calculus.

Proof

Let \(\mathcal {S} = \langle \mathcal {D},\mathcal {P} \rangle \) be a case-centric DCDS that is both case-isolated and case-bounded. Thanks to case-isolation, \(\mathcal {S} \) evolves each case (and related tuples) in a way that is completely independent from that of the other cases – cf. Lemma 2. As a consequence, the soundness of a single case is not affected by the evolution of the other cases. Since by definition a case-centric DCDS starts with an empty extension for its case relation, every case of \(\mathcal {S} \) is injected into the system through the execution of the \({\textsc {new}}\hbox {-}{\textsc {case}}\) action, which has a \({\textsf {true}}\) precondition and treats every case homogeneously: it simply puts every case into the initial, \({\mathtt {in}} \) state. This, in turn, implies that all cases are sound if and only if one case is sound. We can then replace the usual condition-action rule

$$\begin{aligned} {\textsf {true}}\mapsto {\textsc {new}}\hbox {-}{\textsc {case}}() \end{aligned}$$

with the following rule, which ensures that only one single case is injected into the system:

$$\begin{aligned} \lnot \textsc {case-rel}_{\mathcal {D}}(\_) \mapsto {\textsc {new}}\hbox {-}{\textsc {case}}() \end{aligned}$$

Let \(\widehat{\mathcal {S}}\) be DCDS obtained from \(\mathcal {S} \) with such a replacement. Recalling that the (data-aware) soundness property can be formalized in \(\mu \mathcal {L}_{P} \) as \(\varPhi _{sound}\) (cf. Sect. 3.3), we obtain that \(\varUpsilon _{\mathcal {S}} \models \varPhi _{sound}\) if and only if \(\varUpsilon _{\widehat{\mathcal {S}}} \models \varPhi _{sound}\). We hence consider the model checking problem \(\varUpsilon _{\widehat{\mathcal {S}}} \models \varPhi _{sound}\).

Thanks to the modified version of the \({\textsc {new}}\hbox {-}{\textsc {case}}\) condition-action rule, in \(\varUpsilon _{\widehat{\mathcal {S}}} \) relation \(\textsc {case-rel}_{\mathcal {D}}\) is guaranteed to contain at most one tuple. The claim is then obtained by observing that:

  • since \(\mathcal {S} \) is case-bounded by hypothesis, we can apply Lemma 2, consequently obtaining that \(\widehat{\mathcal {S}}\) is state-bounded in the sense of [2].

  • By [2], we know that verification of \(\mu \mathcal {L}_{P} \) properties over state-bounded DCDSs is decidable and reducible to conventional model checking techniques, hence so is deciding whether \(\varUpsilon _{\widehat{\mathcal {S}}} \models \varPhi _{sound}\). \(\square \)

Notably, the Robin Hood archery training DCDS illustrated in Sect. 2.3 is both case-isolated and case-bounded. Hence, Theorem 5 guarantees that soundness can be checked over it.

By inspecting the proof of Theorem 5, we can make two important observations. The first one is related to the generality and robustness of the result. In our setting, we are considering the specific property of soundness, but since decidability is obtained by a reduction to finite-state model checking with the technique in [2], it actually holds more in general for all temporal properties that:

  1. 1.

    can be expressed in \(\mu \mathcal {L}_{P} \);

  2. 2.

    obey to case-isolation, that is, they can be verified without considering the interplay between different cases.

The second observation concerns the computational complexity of the problem. The abstraction technique in [2] provides us an immediate upper bound for checking soundness of case-isolated, case-bounded DCDSs. This upper bound is ExpTime in the size of the initial DCDSFootnote 7, by considering the arity of the relations bounded by a constant. This complexity result carries over to arbitrary \(\mu \mathcal {L}_{P} \) properties of the aforementioned form. As for soundness, the complexity can actually be tightened to PSpace, by observing that soundness itself can be expressed in the fragment of first-order CTL with persistent quantification (which can be seen as a proper fragment of \(\mu \mathcal {L}_{P} \)). This, in turn, makes it possible to perform verification by constructing the (abstract) transition system on-the-fly, leveraging the well-known techniques for CTL [5].

We close our investigation with three general observations on our decidability result. First of all, it is interesting to notice that, by restricting the attention to read-write relations only, the combination of case-isolation and case-boundedness requires that FK-graphs have the shape of a tree, rooted in the case relation.Footnote 8 This, in turn, implies that case-navigational queries are tree-shaped (with a negative filter). This way of structuring data (and corresponding queries) resembles document-centric processes, where each process case works over a dedicated business object, separately from the other cases. In our framework, such a business object is stored in a nested-tuple relational data structure, a solution that is adopted by concrete data-centric system, such as BizArtifactFootnote 9.

Second, we stress the novelty of the decidability result of Theorem 5: while all the key decidability results for DCDSs and similar frameworks have been provided, so far, under the hypothesis of state boundedness, Theorem 5 holds for a class of DCDSs in which relations are unbounded. Decidability is obtained thanks to the fact that, despite unboundedness, such relations are populated by composing an unbounded number of bounded data slices, each originating from a single case identifier. This reflects the methodological guideline of horizontal partitioning, in the sense discussed in Sect. 3.2.

Finally, the proof of Theorem 5 shows that case-bounded, case-isolated DCDSs can be correctly verified by shrinking their unboundedly many cases into a single, prototypical case. This can be seen as the data-aware counterpart of soundness in workflow nets [29], which is checked by introducing a single, prototypical token into the input state of the process.

5 Conclusion

We have investigated the classical notion of case-centric processes, but in a rich, data-aware setting, where we fully take into account how the process control-flow operates over a full-fledged, relational database with constraints. We have also reformulated the standard correctness notion of soundness accordingly. To ground our study, we have considered Data-Centric Dynamic Systems (DCDSs) as a reference framework.

The main result of the paper is that checking whether a case-centric DCDS is sound can be decided, under the hypothesis that each case evolves independently from the other, and that each single case cannot generate unboundedly many data directly or indirectly referring to it. Notably, decidability is obtained by showing how to find a cutoff on the number of process instances that must be subject to the soundness test, having the guarantee that soundness transfers to the entire system, where not limit is imposed on the number of process instances that can be injected.

It is important to notice that this class of DCDSs is characterized syntactically, and can be naturally paired with a set of methodological guidelines supporting the modeler during the design phase. In this light, the results presented here depart from those in [2, 4, 6]; these rely on the semantic properties of run- and state-boundedness, which are highly undecidable to check, and for which only syntactic, sufficient conditions can be provided [3]. It is also important to observe that, by recasting the results in [11] in our setting, one can show that all the modeling restrictions we require towards decidability are indeed tight, i.e., soundness turns out to be undecidable if we relax any of them.

To the best of our knowledge, the only other work that considers verification of unbounded, data-aware processes is [21]. Differently from this work, however, [21] aims at providing a general tight decidability result by just controlling the number and arity of relations. The result is theoretically interesting but of very limited practical applicability, as decidability is provided for systems working over a single, unary relation only.

We plan to continue this line of research along two directions. First of all, we want to merge this investigation with that of [11], on the one hand to generalize our decidability result to the case of navigational \(\mu \mathcal {L}_{P} \) properties, and on the other hand to go beyond case-centric processes, and consider rich artifact-centric systems in which dynamic entities co-evolve by establishing mutual many-to-many relations in a controlled way. Second, we want to study the possibility of practically implementing the presented techniques, in particular by attacking the exponentiality in the data that comes with data-aware dynamic systems through modularization techniques, taking inspiration from horizontal and vertical partitioning as introduced here.