Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Web-based workflow management systems allow diverse groups of users to collaborate efficiently on complex tasks. For example, conference management systems like EasyChair let authors, reviewers, and program committees collaborate on the organization of a scientific conference; health management systems like HealthVault let family members, doctors, and other health care providers collaborate on the management of a patient’s care; shopping sites like Amazon or Ebay let merchants, customers, as well as various other agents responsible for payment, customer service, and shipping, collaborate on the purchase and delivery of products.

Since the information maintained in such systems is often confidential, the workflows must carefully manage who has access to what information in a particular stage of the workflow. For example, in a conference management system, PC members must declare conflicts of interest, and they should only see reviews of papers where no conflict exists. Authors eventually get access to reviews of their papers, but only when the process has reached the official notification stage, and without identifying information about the reviewers.

It is difficult to characterize the legitimate information flow in such systems with standard notions of secrecy. Classic information flow policies are often too strong. For example, noninterference [12] requires that the PC member cannot observe any difference when classified input, such as the reviews of papers where the PC member has a conflict of interest, is removed. This strong requirement is typically violated, because another PC member might, for example, nondeterministically post a message in a discussion about a paper where they both have no conflict. Weaker information flow policies, on the other hand, often turn out too weak. Nondeducibility [19], for example, only requires that an agent cannot deduce, i.e., conclusively determine, the classified information. The problem is that a piece of information is considered nondeducible already if, in the entire space of potential behaviors, there exists some other explanation. In reality, however, not all agents exhibit the full set of potentially possible behaviors, and an actual agent might be able to deduce far more than expected (cf. [15]).

Temporal logics for the specification of information flow [10] are an important step forward, because they make it possible to customize the secrecy properties. HyperLTL [7] is the linear-time representative of this class of logics. As an extension of linear-time temporal logic (LTL), HyperLTL can describe the precise circumstances under which a particular information flow policy must hold. While standard linear or branching-time logics, like LTL or CTL\(^*\), can only reason about the observations at a single computation trace at a time, and can thus, by themselves, not specify information flow, HyperLTL formulas use trace quantifiers and trace variables to simultaneously refer to multiple traces. For example, HyperLTL can directly express information flow properties like “for any pair of traces \(\pi , \pi '\), if the low-security observer sees the same inputs on \(\pi \) and \(\pi '\), then the low-security observer must also see the same outputs on \(\pi \) and \(\pi '\)”. The key limitation of HyperLTL for the specification of workflows is that it is a propositional logic. It is, hence, impossible to specify the information flow in workflows unless the number of agents is fixed a-priori. In this paper, we overcome this limitation.

We introduce a framework for the specification and verification of secrecy in workflows with arbitrarily many agents. Our framework consists of a workflow description language, a specification language, and a verification method. Our workflow description language gives a precise description of the behavior of workflow management systems with an arbitrary number of agents. Figure 1 shows a simple example workflow of a conference management system. The workflow manipulates several relations over the unbounded domain of agents, that each characterize a particular relationship between the agents: for example, a pair (xp) in \(\textit{Conf}\) indicates that PC member x has declared a conflict with paper p, a triple (xyp) in \(\textit{Comm}\) indicates that PC member x has received from PC member y a message about paper p. As a specification language for the information flow policies in such workflows, we introduce a first-order version of HyperLTL. We extend HyperLTL with first-order quantifiers, allowing the formulas to refer to an arbitrary number of agents. We show that the new logic can be used to specify precise assumptions on the behavior of the agents, such as causality: while a nondeterministic agent can take any action, the actions of a causal agent can only reveal information the agent has actually observed. Restricting the behaviors of the agents to the causal behavior allows us to quantify universally over the actions of the agents, as in classic notions of secrecy like noninterference, and, at the same time, eliminate the false positives of these notions. Finally, we introduce a verification method, which translates the verification problem of workflows with arbitrarily many agents and specifications in first-order HyperLTL to the satisfiability problem of first-order logic. While first-order logic is in general undecidable, we identify conditions under which the satisfiability problem for the particular formulas in the verification of the workflows is guaranteed to be decidable.

Fig. 1.
figure 1

Example workflow from a conference management system.

2 Workflows with Arbitrarily Many Agents

Symbolic Transition Systems. As the formal setting for the specification and verification of our workflows, we chose symbolic transition systems, where the states are defined as the valuations of a set of first-order predicates \(\mathcal P\). The initial states and the transitions between states are described symbolically using an assertion logic over \(\mathcal P\). For the purpose of describing workflows, we use first-order predicate logic (PL) with equality as the assertion language.

A symbolic transition system \(\mathcal S=(\mathcal P, \varTheta , \varDelta )\) consists of a set of predicates \(\mathcal P\), an initial condition \(\varTheta \), and a transition relation \(\varDelta \). The initial condition \(\varTheta \) is given as a formula of the assertion language over the predicates \(\mathcal P\). The transition relation \(\varDelta (P_1,\ldots ,P_k; P_1', \ldots ,P_k')\) is given as a formula over the predicates \(\mathcal P=\{P_1, \ldots , P_k\}\), which indicate the interpretation of the predicates in the present state, and the set of primed predicates \(\mathcal P'=\{P_1',\ldots ,P_k'\}\), which indicate the interpretation of the predicates in the next state.

Let U be some arbitrary universe. In the case of the workflows, U is the set of agents participating in the workflow. Let \(\mathcal P^n\) denote the set of predicates with arity n. A state \(s: \bigcup _{n\ge 0}\mathcal P^n \times U^n \rightarrow \mathbb B\) is then an evaluation of the predicates over U. A trace is an infinite sequence of states \(s_0,s_1,\dots \) such that (1) \(s_0\) satisfies \(\varTheta \) (initiation), and (2) for each \(i\ge 0\), the transition relation \(\varDelta \) is satisfied by the consecutive states \(s_i\) and \(s_{i+1}\), where the predicates in \(\mathcal P\) are evaluated according to \(s_i\) and the predicates in \(\mathcal P'\) are evaluated according to \(s_{i+1}\). We denote the set of all traces of a transition system \(\mathcal S\) as \( Traces (\mathcal S)\).

The Workflow Language. We define a language to specify workflows. A workflow is structured into multiple blocks. Each block specifies the behaviour of a group of agents. A block is made of several statements which add (or remove) specific tuples from a given relation depending on a guard clause.

Here, terms \(t_1,\dots ,t_n\) are either agent variables \(x_0,\ldots ,x_k\) or constant values c, R denotes a predicate symbol, and \(\theta \) is a guard clause that needs to be met before performing the update. If the guard is not met, no update occurs. Guards can be arbitrary formulas from first-order predicate logic (PL). The set of predicate symbols contains a special symbol \( Oracle \) denoting the environment input. In order to specify deterministic/nondeterministic behaviour, we use two different kinds of statements. In a normal block, all agents execute the block, i.e., the listed sequence of guarded updates. In a \(\mathbf{may}\) block, only a subset of tuples of agents may decide to execute the block. Note that guarded remove to a predicate R of the form \( \theta \rightarrow R \mathbin {{-}{=}}(t_0,\ldots ,t_n) \) can be simulated by a guarded addition to a fresh predicate \(R'\). For that, we define: \( R(t_0,\ldots ,t_n) \wedge \lnot \theta \rightarrow R' \mathbin {{+}{=}}(t_0,\ldots ,t_n) \) and subsequently, replace uses of R with uses of \(R'\).

Semantics. In the following, we give a semantics for workflow \(w=b_1\ldots b_T\) as a transition system. The set of variables then consists of the universe \( U \) of agents participating in the workflow, together with a finite set of relations or predicates over \( U \). In order to control the transitions between system states, we require one predicate \(\textit{Choice}_i\) for the i-th \(\mathbf{may}\) statement to control the subset of tuples of agents choosing to execute the statement. Furthermore, let \(\textit{Count}_0,\ldots ,\textit{Count}_T\) denote a sequence of boolean flags indicating the current program point. Iteration of the workflow from 0 to T is expressed by the formula \(\varPhi _{\textit{Count}}\) given by:

$$\textit{Count}_T\rightarrow (\textit{Count}'_{T}\wedge \bigwedge _{l'\ne T}\lnot \textit{Count}'_{l'})\wedge \bigwedge _{l=0}^{T-1} \textit{Count}_l\rightarrow (\textit{Count}'_{l+1}\wedge \bigwedge _{l'\ne l+1}\lnot \textit{Count}'_{l'})$$

Initially, all predicates are \(\textit{false}\), except for the designated relation \( Oracle \) that provides input data to the workflow and the relations \(\textit{Choice}_i\) that provide the agent behaviour. Moreover, all flags \(\textit{Count}_l\), but \(\textit{Count}_0\) are \(\textit{false}\). An execution of the workflow program then is completely determined by the initial value of \( Oracle \) together with the choices of the agents as provided by the relations \(\textit{Choice}_i\). W.l.o.g., we assume that within a statement, every relation R is updated at most once. For every k-ary relation R and program point l, we construct a formula \(\varPhi _{R,l}(y_1,\ldots ,y_k)\) using free variables \(y_1,\ldots ,y_k\), so that \(R(y_1,\ldots ,y_k)\) holds after execution of block \(b_l\) iff \(\varPhi _{R,l}(y_1,\ldots ,y_k)\) holds before the execution of \(b_l\). The transition relation is defined by the conjunction of \(\varPhi _{\textit{Count}}\) together with the conjunction over all formulas

where \(R'\) denotes the value of R after the transition. Thus, we assume that after the last step, the workflow stutters, i.e., the last state is repeated indefinitely. For defining the formulas \(\varPhi _{R,l}(y_1,\ldots ,y_k)\), consider a block \(b_l\) of the form:

Then for \(j=1,\ldots ,r\), \(\varPhi _{R_j,l}(y_1,\ldots ,y_k)\) is \( \varPhi _{R_j,l}(y_1,\ldots ,y_k) \vee \exists x_0,\ldots ,x_m.\;\bar{\theta }_j\wedge (y_1=t_{j1})\wedge \ldots \wedge (y_{k_j}=t_{jk_j}), \) where \(\bar{\theta }_j\) is obtained from \(\theta _j\) by replacing every literal \(R_i(s_1,\ldots ,s_{k_i})\) with the corresponding formula \(\varPhi _{R_i,l}(s_1,\ldots ,s_{k_i})\). For all other predicates R, \(\varPhi _{R,l}(y_1,\ldots ,y_k) \equiv \varPhi _{R,l}(y_1,\ldots ,y_k)\). If \(b_l\) is the n-th \(\mathbf{may}\) block and of the form:

we proceed analogously, but add the choice relation \(\textit{Choice}_n(x_0,\ldots ,x_m)\) as an additional condition to the \(\theta _j\). Thus for \(j=1,\ldots ,r\), \(\varPhi _{R_j,l}(y_1,\ldots ,y_k)\) is given by:

$$\begin{aligned} \begin{array}{l} \varPhi _{R_j,l}(y_1,\ldots ,y_k) \vee \exists x_1,\ldots ,x_k.\;\bar{\theta }_j\wedge \textit{Choice}_n(x_0,\ldots ,x_m)\ \wedge \\ (y_1=t_{j1})\wedge \ldots \wedge (y_{k_j}=t_{jk_j}) \end{array} \end{aligned}$$

where \(\bar{\theta }_j\) is obtained from \(\theta _j\) by replacing every literal \(R_i(s_1,\ldots ,s_{k_i})\) with the corresponding formula \(\varPhi _{R_i,l}(s_1,\ldots ,s_{k_i})\). For all other predicates R, \(\varPhi _{R,l}(y_1,\ldots ,y_k) \equiv \varPhi _{R,l}(y_1,\ldots ,y_k)\).

We remark that, by successive substitution of the formulas \(\varPhi _{R,l}\), we obtain for every prefix of the workflow of length l and for every predicate R, a formula \(\bar{\varPhi }_{R,l}\) which expresses the value of R in terms of the predicates at program start and the predicates \(\textit{Choice}_i\) only.

Example 1

Consider a variation of the conference management workflow given in the introduction, where a set of all PC members that do not have a conflict with any paper is collected.

(\(s_1\)) \(\mathbf{forall}\ x,p \ \mathbf{may}\ \textit{true}\ \rightarrow \ \textit{Conf}\ \mathbin {{+}{=}}\ (x,p)\)

(\(s_2\)) \(\mathbf{forall}\ x,p \ \lnot \textit{Conf}(x,p)\ \rightarrow \ S\ \mathbin {{+}{=}}\ (x)\)

Then for \((s_1)\), \(\bar{\varPhi }_{\textit{Conf},1}(x,p) \equiv \bar{\varPhi }_{\textit{Conf},2}(x,p)\) is given by \( \exists x_1,p_1.\ \textit{Choice}_1(x_1,p_1)\wedge (x_1 = x)\wedge (p_1 = p), \) which is equivalent to \(\textit{Choice}_1(x,p)\). Accordingly for \((s_2)\), \(\bar{\varPhi }_{S,2}\) is given by: \( \exists x_2,p_2.\ \lnot \bar{\varPhi }_{\textit{Conf},1}(x_2,p_2)\wedge (x_2 = x) \) which can be simplified to \( \exists p_2.\ \lnot \bar{\varPhi }_{\textit{Conf},1}(x,p_2) \). Altogether, we obtain:

$$ \begin{array}{lll} \bar{\varPhi }_{\textit{Conf},2}(x,p) &{}\equiv &{} \textit{Choice}_1(x,p) \\ \bar{\varPhi }_{S,2}(x) &{}\equiv &{} \exists p_2.\ \lnot \textit{Choice}_1(x,p_2)\\ \end{array} $$

Example 2

Consider the workflow (WF1), shown in Fig. 1 in the introduction. Within this workflow, every statement updates exactly one predicate, and each predicate \(\textit{Conf}\), \(\textit{A}\), \(\textit{Read}\) and \(\textit{Comm}\) is updated only once. Accordingly, we can drop the extra index t and write \(\bar{\varPhi }_{\textit{Conf}}\), \(\bar{\varPhi }_{\textit{A}}\), \(\bar{\varPhi }_{\textit{Read}}\), \(\bar{\varPhi }_{\textit{Comm}}\) for the corresponding predicates after their respective updates. We have:

   \(\square \)

3 Specifying Secrecy with First-Order HyperLTL

HyperLTL [7] is a recent extension of linear-time temporal logic (LTL) with trace variables and trace quantifiers. HyperLTL can express noninterference and other information flow policies by relating multiple traces, which are each identified by a separate trace variable. Since HyperLTL was introduced as a propositional logic, it cannot express properties about systems with an arbitrary number of agents. We now present first-order HyperLTL, which extends propositional HyperLTL with first-order quantifiers. In the following, we will refer to first-order HyperLTL simply as HyperLTL.

HyperLTL Syntax. Let \(\mathcal P\) be a set of predicates, \(\mathcal V\) be a set of first-order variables, and \(\varPi \) be a set of trace variables. We call the set \(\mathcal P_{\varPi }=\{ P_\pi \mid P \in \mathcal P, \pi \in \varPi \}\) the set of indexed predicates. Our logic builds on the assertion language used in the description of the symbolic transition systems. In the case of the workflows, this is first-order predicate logic (PL) with equality. The atomic formulas of HyperLTL are formulas of the assertion language over the indexed predicates \(\mathcal P_{\varPi }\) and the variables \(\mathcal V\). HyperLTL formulas are then generated by the following grammar (with initial symbol \(\psi \)):

where \(\varPsi \) is an atomic formula, \(\pi \in \varPi \) is a trace variable, and \(x\in \mathcal{V}\) is a first-order variable. HyperLTL formulas thus start with a prefix of trace quantifiers consisting of at least one quantifier and then continue with a subformula that contains only first-order quantifiers, no trace quantifiers. Universal trace quantification is defined as \(\forall \pi .\varphi \equiv \lnot \exists \pi .\lnot \varphi \). \(\mathrel {\mathcal U}\) and are the usual Until and Next modalities from LTL. We also consider the usual derived Boolean operators and the derived temporal operators Eventually , Globally , and Weak Until .

HyperLTL Semantics. The semantics of a HyperLTL formula \(\psi \) is given with respect to a set of traces \(\mathcal T\), an evaluation \(\alpha : \mathcal V \rightarrow U\) of the first-order variables, and an evaluation \(\beta : \varPi \rightarrow \mathcal T\) of the trace variables. Let \(\sigma (n)\) denote the n-th element in a trace \(\sigma \), and let \(\sigma [n,\infty ]=\sigma (n)\sigma (n+1)\ldots \) denote the n-th suffix of \(\sigma \). We lift the suffix operation from traces to trace assignments and define \(\beta [n,\infty ](\sigma ):=\beta (\sigma )[n,\infty ]\). The update of an evaluation of the first-order or trace variables is defined as follows: \(\gamma [x\mapsto a](x)= a\) and \(\gamma [x\mapsto a](y)= \gamma (x)\) for \(x\ne y\). The satisfaction of a HyperLTL formula \(\psi \), denoted by \(\alpha , \beta \models _\mathcal{T} \psi \), is then defined as follows:

where \(\psi , \varphi _1\), and \(\varphi _2\) are HyperLTL formulas, \(\varPsi \) is an atomic formula, and \(\alpha , \delta \models \varPsi \) denotes the satisfaction of the formula \(\varPsi \) of the assertion logic in the valuation \(\alpha \) of the first-order variables and the interpretation \(\delta \) of the indexed predicates. The interpretation \(\delta (P_\pi )\) of an indexed predicate \(P_\pi \) is defined as the interpretation \(\delta (P_\pi ) = \beta (\pi )(0)(P)\) of P provided by the first state of the trace assigned to \(\pi \). A formula without free first-order and trace variables is called closed. A closed formula \(\psi \) is satisfied by a transition system \(\mathcal S\), denoted by \(\mathcal S \models \psi \), iff \(\alpha ,\beta \models _\mathcal{T} \psi \) for the empty assignments \(\alpha \) and \(\beta \) and the set \(\mathcal T= Traces\mathcal (\mathcal S)\) of traces of the transition system. HyperLTL formulas in which all trace quantifiers are universal are called universal formulas. In the remainder of the paper, we will only consider universal formulas. This fragment contains many information flow properties of practical interest.

Noninterference. Secrecy properties like noninterference are based on a classification of the inputs and outputs of a system into either low, meaning not confidential, or high, meaning highly confidential. A system has the noninterference property [12] if in any pair of traces where the low inputs are the same, the low outputs are the same as well, regardless of the high inputs. When we are interested in the noninterference property of a single agent, it is possible to model the low and high inputs and the low and high outputs of the system (as seen by the agent) using separate predicates, for example, as \(I_l, I_h, O_l, O_h\), respectively. Noninterference can then be expressed as the HyperLTL formula

which states that all traces \(\pi \) and \(\pi '\) that have the same low input \(I_l\) at all times, must also have the same low output \(O_l\) at all times.

In a workflow, the inputs or outputs of different agents may be collected in the same predicate. In the conference management example from the introduction, the low outputs observed by a PC member x consist of the pairs (xpr) for some paper p in the \(\textit{Read}\) relation and, additionally, of the tuples (xyp) for some PC member y and a paper p in the \(\textit{Comm}\) relation. The low input provided by agent x is given by the tuples of the \(\textit{Choice}\) predicates that begin with x. Additionally, the system has high input in the form of the \( Oracle \) predicate.

Generalizing from the example, we assume there is one or more predicates of the form \(O_l(x,{\varvec{y}})\), modeling low output observed by the agents from the system, and one or more predicates of the form \(I_l(x,{\varvec{y}})\) modeling low inputs provided by the agents to the system. An output is observable by agent x whenever x occurs in the first position of the tuple. Likewise, an input is controllable by agent x whenever x occurs in the first position of the tuple. The remaining components of the tuple are denoted by the vector \({\varvec{y}}=y_1,y_2,\ldots \). Noninterference is then expressed as the HyperLTL formula

which states that, for all agents x, if the low input provided by agent x on traces \(\pi \) and \(\pi '\) is the same, then the low output read by x on \(\pi \) and \(\pi '\) must be the same as well.

Declassification. Declassification [18] becomes necessary when the functionality of the system makes it unavoidable that some information is leaked. In the conference management example, a PC member x is supposed to read the reviews of the papers assigned to x. This is legitimate as long as x has not declared a conflict of interest with those papers. We assume that, in addition to the input and output predicates, there is a declassification condition \(D(x,{\varvec{y}})\), which indicates that agent x is allowed to learn about the high input \(I_h(x,{\varvec{y}})\). Noninterference with Declassification is then expressed as the HyperLTL formula

figure a

which expresses that on all pairs of traces where the low inputs are the same and, additionally, the high inputs are the same whenever the declassification condition is true, the low outputs must be the same.

Example 3

In the conference management example, we specify the information flow policy that an agent should not receive information regarding papers where a conflict of interest has been declared as a noninterference property:

figure b

   \(\square \)

Causality Assumptions on Agents. In the workflow from Fig. 1, it is easy to see that no PC member can directly read the reviews of papers where a conflict of interest has been declared: the PC member can only read a review if the PC member was assigned to the paper, which, in turn, can only happen if no conflict of interest was declared. It is much more difficult to rule out an indirect flow of information via a message sent by another PC member. So far, neither the description of the workflow, nor the HyperLTL specification would prevent other PC members to add such messages to \(\textit{Comm}\). To rule out messages that would leak information about papers where a PC member has a conflict, we must make assumptions about the possible behaviors of the other agents.

Stubborn Agents. A radical restriction on the behavior of the other agents is to require that they always, stubbornly, produce the same input, independently of their own observations. We assume that the input is represented by one or more predicates of the form \(I(x,{\varvec{y}})\), where an input is controllable by agent x whenever x occurs in the first position of the tuple. The requirement for traces \(\pi ,\pi '\) that all agents are stubborn can be specified by the HyperLTL formula:

Causal Agents. A more natural restriction on the behavior of the other agents is to require that they act causally, i.e., they only provide different inputs if they, themselves, have previously observed different outputs. The causality of agents w.r.t. traces \(\pi ,\pi '\) can be described by the HyperLTL formula:

$$ \begin{array}{l} \forall x.\ (\forall {\varvec{y}} .\ I_{\pi }(x,{\varvec{y}}) \leftrightarrow I_{\pi '}(x,{\varvec{y}}))\ \mathrel {\mathcal W}\ (\exists {\varvec{y}}.\ O_{\pi }(x,{\varvec{y}}) \not \leftrightarrow O_{\pi '}(x,{\varvec{y}})) \end{array} $$

which states that, for all agents x the inputs provided on two traces are the same until a difference in the outputs observed by x occurs.

Example 4

In the conference management example, stubbornness for traces \(\pi ,\pi '\) can be specified as the HyperLTL formula The requirement of causality for \(\pi ,\pi '\) is specified as the HyperLTL formula

figure c

   \(\square \)

Combining the agent assumptions with the specification of noninterference (and possibly declassification), we obtain a formula of the form

$$ \forall \pi _1, \ldots , \pi _n.\ \varphi _\mathsf{causal} \rightarrow \varphi $$

where \(\varphi _\mathsf{causal}\) describes the agent assumption on all pairs of paths in \(\pi _1,\ldots ,\pi _n\).

4 Verifying Secrecy

We now present a bounded model checking method for symbolic transition systems and HyperLTL specifications. The approach reduces the violation of a HyperLTL formula on the prefix of a trace of a given symbolic transition system to the satisfiability of a formula of the assertion language. For workflows, it suffices to consider prefixes of bounded length, because the workflow terminates (and then stutters forever) after a fixed number of steps. Since the assertion language in the description of the workflows is first-order predicate logic, satisfiability of the resulting formula is not necessarily decidable. We return to this issue in Sect. 5, where we identify conditions under which decidability is guaranteed.

Bounded Satisfaction. Bounded model checking is based on a restricted notion of HyperLTL satisfaction where only trace prefixes of length n, for some fixed bound n, are considered. Let \(\mathcal T\) be a set of traces, \(\alpha : \mathcal V \rightarrow U\) an evaluation of the first-order variables, and \(\beta : \varPi \rightarrow \mathcal T\) an evaluation of the trace variables. The n-bounded satisfaction of a HyperLTL formula \(\psi \), denoted by \(\alpha ,\beta \models ^n_\mathcal{T} \psi \), is then defined as follows:

where \(\psi , \varphi ,\varphi _1\), and \(\varphi _2\) are HyperLTL formulas, \(\varPsi \) is an atomic formula, and \(\delta (P_\pi ) = \beta (\pi )(0)(P)\). A closed formula \(\psi \) is n-bounded satisfied by a transition system \(\mathcal S\), denoted by \(\mathcal S \models ^n \psi \), iff \(\alpha ,\beta \models ^n_\mathcal{T} \psi \) for the empty assignments \(\alpha \) and \(\beta \) and the set \(\mathcal{T}= Traces (\mathcal S)\) of traces of the transition system.

For workflows, satisfaction and bounded satisfaction coincide.

Theorem 1

Let \(\mathcal S\) be the transition system representing a workflow with n blocks. For all HyperLTL formulas \(\psi \), it holds that \(\mathcal S \models \psi \text { iff } \mathcal S \models ^n \psi \).

Bounded Model Checking. We now translate a transition system \(\mathcal S\) and a given universal HyperLTL formula for a given bound n into a formula \(\varPsi _{\mathcal S, \lnot \psi }\) of the assertion language such that \(\varPsi _{\mathcal S, \lnot \psi }\) is satisfiable iff \(\mathcal S \not \models ^n \psi \). Since \(\psi \) is universal, its negation is of the form \(\exists \pi _1,\ldots , \pi _k.\ \varphi \), where \(\varphi \) does not contain any more trace quantifiers. Let the set of predicates \(\mathcal P\) be given as \(\mathcal P = \{ P_1, \ldots , P_m\}\). In \(\varPsi ^n_{\mathcal S, \lnot \psi }\), we use for every predicate \(P_i\) several copies \(P_{i,\pi ,l}\), one per trace variable \(\pi \in \{\pi _1,\ldots ,\pi _k\}\) and position l, \(0\le l \le n\). The formula \(\varPsi ^n_{\mathcal S, \lnot \psi } = [\![\mathcal S ]\!]^n \wedge [\![\varphi ]\!]_0^n\) is a conjunction of two formulas of the assertion language, the unfolding \([\![\mathcal S ]\!]^n\) of the transition system \(\mathcal S\) and the unfolding \([\![\varphi ]\!]_0^n\) of the HyperLTL formula \(\varphi \).

For a symbolic transition system \(\mathcal S\) and a bound \(n \ge 0\), the unfolding \([\![\mathcal S ]\!]^n\) is defined as follows:

figure d

For a HyperLTL formula \(\varphi \) without trace quantifiers and a bound \(n \ge 0\), the unfolding \([\![\varphi ]\!]_l^n\) is defined as follows:

where \(\varphi , \varphi _1\), and \(\varphi _2\) are HyperLTL formulas, \(\varPsi \) is a formula of the assertion language over indexed predicates \(P_{i,\pi }\) and \(\varPsi _l\) is the same formula with all occurrences of an indexed predicate \(P_{i,\pi }\) replaced by the predicate \(P_{i,\pi ,l}\).

Theorem 2

For a symbolic transition system S, a universal HyperLTL formulas \(\psi \), and a bound \(n\ge 0\), it holds that \(S \models ^n \psi \) iff \(\varPsi ^n_{\mathcal S, \lnot \psi }\) is unsatisfiable.

Combining Theorems 1 and 2, we obtain the corollary that bounded model checking is a complete verification technique for workflows.

Corollary 1

Let \(\mathcal S\) be the transition system representing a workflow with T blocks. For all HyperLTL formulas \(\psi \), it holds that \(\mathcal S \models \psi \) iff \(\varPsi ^T_{\mathcal S, \lnot \psi }\) is unsatisfiable.    \(\square \)

5 Decidability

We now identify cases where the satisfiability of the predicate logic formulas constructed by the verification method of the previous section are decidable. For background on PL and decidable subclasses, we refer to the textbook [6].

Theorem 3

Consider a workflow consisting of T blocks where all agents are stubborn, and every predicate R encountered by the workflow after l blocks, is characterized by a quantifier-free formula \(\bar{\varPhi }_{R,l}\).

Let \(\forall \pi _1,\ldots ,\pi _r. \varphi _\mathsf{stubborn}\rightarrow \varphi \) denotes a HyperLTL formula where \(\varPsi '\equiv [\![ \lnot \varphi ]\!]_0^T\) is a Bernays-Schönfinkel formula, i.e., the prenex form of \(\varPsi '\) has a quantifier sequence of the form \(\exists ^*\forall ^*\). Then it is decidable whether \(\forall \pi _1,\ldots ,\pi _r.\ \varphi _\mathsf{stubborn}\rightarrow \varphi \) holds.

Proof

For every predicate R, let \(\bar{\varPhi }_{R,\pi _j,l}\) denote the formula which characterizes \(R_{\pi _j,l}\), i.e., the value of R after l blocks along the execution of \(\pi _j\). The formula \(\bar{\varPhi }_{R,\pi _j,l}\) is obtained from \(\bar{\varPhi }_{R,l}\) by replacing the occurrences of \(\textit{Choice}_i, Oracle \) with \(\textit{Choice}_{i,\pi _j}\) and \( Oracle _{\pi _j}\), respectively. Let \(\bar{\varPsi }'\) denote the formula obtained from \(\varPsi '\) by first replacing every occurrence of a literal \(R_{\pi _j,l}(s_1,\ldots ,s_k)\) with \(\bar{\varPhi }_{R,\pi _j,l}(s_1,\ldots ,s_k)\). As all agents are stubborn, the predicates \(\textit{Choice}_{i,\pi _j}\) are equivalent for \(j=1,\ldots ,r\). Accordingly, we may replace all \(\textit{Choice}_{i,\pi _j}(s_1,\ldots ,s_k)\) with \(\textit{Choice}_{i,\pi _1}(s_1,\ldots ,s_k)\). The resulting formula is still a Bernays-Schönfinkel formula. It is unsatisfiable iff \(\forall \pi _1,\ldots ,\pi _r.\ \varphi _\mathsf{stubborn}\rightarrow \varphi \) is universally true. Satisfiability of \(\bar{\varPsi }'\), however, is decidable — which thus implies the the theorem.

   \(\square \)

Theorem 3 can be extended also to more general classes of workflows, given that the predicates \(R_{\pi _j,l}\) occur only positively or only negatively in \(\varPsi '\). Non-interference, however, amounts to stating that (under certain conditions) no distinction is observable between some \(R_{\pi _j,l}\) and \(R_{\pi _{j'},l}\). Logically, indistinguishability is expressed by equivalence, which thus results in both positive and negative occurrences of the predicates in question.

Theorem 4

Consider a workflow consisting of T blocks where all agents are causal, and every predicate R encountered by the workflow after l blocks, is characterized by a quantifierfree formula \(\bar{\varPhi }_{R,l}\). Assume that \(\forall \pi _1,\ldots ,\pi _r.\ \varphi _\mathsf{causal}\rightarrow \varphi \) is a temporal formula where the prenex form of \(\varPsi '\equiv [\![ \lnot \varphi ]\!]_0^T\) is purely existential. Then it is decidable whether \(\forall \pi _1,\ldots ,\pi _r.\ \varphi _\mathsf{causal}\rightarrow \varphi \) holds.

Proof

The argument for causal agents is somewhat more complicated and accordingly leads to decidability only for a smaller fragment of HyperLTL formulas. Removal of the temporal operators and skolemization of the formula \(\varphi _\mathsf{causal}\) describing causality yields a conjunction of clauses in one of the forms \((*)\): \({S(x,f_1(x),\ldots ,f_r(x)) \vee \textit{Choice}_{i,\pi _{j_1}}(x,\mathbf{z})\vee \lnot \textit{Choice}_{i,\pi _j}(x,\mathbf{z})}\), or \({S(x,f_1(x),\ldots ,f_r(x)) \vee \lnot \textit{Choice}_{i,\pi _{j_2}}(x,\mathbf{z})\vee \textit{Choice}_{i,\pi _j}(x,\mathbf{z})}\) for \(j_1,j_2<j\), where the disjunction S refers to predicates which depend on \(\textit{Choice}_{i',\_}\) for \(i'<i\) only. To perform ordered resolution, we order predicates so that \(\textit{Choice}_{i,\pi _j}\) receives a higher priority than \(\textit{Choice}_{i',\pi _{j'}}\) if \(i'<i\) or, if \(i=i'\), \(j'<j\). All predicates in S have lower priorities than the \(\textit{Choice}\) predicates. Accordingly, the highest priority literal in each clause of \(\varphi _\mathsf{causal}\) contains all free variables of the clause.

Let us first consider the case \(r=2\). Then resolution of two clauses with a positive and negative occurrence of the same highest-priority literal will result in a tautology and therefore is useless. As in the proof of Theorem 3, let \(\bar{\varPsi }'\) denote the formula obtained from \(\varPsi '\) by replacing each occurrence of a predicate \(R_{\pi _j,l}(s_1,\ldots ,s_k)\) with the formulas \(\bar{\varPhi }_{R,\pi _j,l}(s_1,\ldots ,s_k)\) (\(j=1,2\)). According to our assumption on \(\varPsi '\), the clauses obtained from \(\bar{\varPsi }'\) are all ground. Resolution of such a clause with a clause of \(\varphi _\mathsf{causal}\) for some \(\textit{Choice}_{i,\pi _2}\) will again return a ground formula. By substituting the semantic formulas \(\bar{\varPhi }_{R,\pi _j,l}\) we obtain a set of new ground clauses, this time, however, with occurrences of predicates \(\textit{Choice}_{i',\pi _{j'}}, i'<i,\) only. As a consequence, for every i, there is a bounded number of new clauses derivable by means of clauses from \(\varphi _\mathsf{causal}\) with highest priority predicate \(\textit{Choice}_{i,\pi _2}\). Altogether, we therefore obtain only a bounded number of ground clauses which are derivable by means of ordered resolution. Hence, it is decidable whether a contradiction is derivable or not. This concludes the proof.

The argument for \(r > 2\) is similar, only that resolution of any two such clauses originating from \(\varphi _\mathsf{causal}\) with \(j_1\ne j_2\) upon the literal \(\textit{Choice}_{i,\pi _{j}}(x,\mathbf{z})\)) will again result in a clause of the given form. In particular, no further literals are introduced. Therefore, saturation of \(\varphi _\mathsf{causal}\) by ordered resolution will eventually terminate. Then the argument for termination proceeds analogously to the case \(r=2\) where \(\varphi _\mathsf{causal}\) is replaced with the saturation of \(\varphi _\mathsf{causal}\).    \(\square \)

Theorem 4 can be extended to formulas \(\varphi \) where \(\bar{\varPsi }'\) obtained from \([\![ \lnot \varphi ]\!]_0^T\) is a Bernays-Schönfinkel formula at least in restricted cases. Consider the clauses of the form \((*)\) as obtained from \(\varphi _\mathsf{causal}\) after skolemization. In case that the disjunction S is empty, we call the corresponding clause simple, otherwise complex. Now assume that complex clauses from the saturation of \(\varphi _\mathsf{causal}\) are always resolved with clauses (originating from the skolemization of \(\bar{\varPsi }'\)) upon a ground literal. Then the same argument as in the proof of Theorem 4 applies to show that saturation by resolution will eventually terminate.

6 Completing the Conference Management Example

We now complete the verification of our running example, that no PC member learns about the reviews of a paper for which he has declared a conflict. As already discussed in Sect. 3, it is easy to see that no PC member can directly read the reviews of papers where a conflict of interest has been declared. To prove the noninterference property in Example 3, it remains to show that the communication received from the other agents is the same on two traces \(\pi \) and \(\pi '\) whenever the Oracle for the papers with a conflict are the same on \(\pi \) and \(\pi '\). For both stubborn and causal agents, the predicates \(\textit{Conf}_\pi (x, p)\) and \(\textit{A}_\pi (y, p)\) coincide with their counterparts in \(\pi '\). Furthermore, for stubborn agents, the \(\textit{Choice}\) predicates do not depend on the execution paths. As the predicates \(\textit{Comm}_\pi \) and \(\textit{Comm}_{\pi '}\) only depend on \(\textit{Choice}\) predicates, the equivalence in the conclusion is trivially true. Hence, the property holds under the assumption that the agents are stubborn. The situation is different for causal agents. The causality assumption \(\varphi _\mathsf{causal}\) (given in Sect. 3) states that the other PC members only send different communications if there was a different observation on the two traces. Since causality already implies that \(\textit{Choice}_1\) and \(\textit{Choice}_2\) are equal on all paths, this can be omitted from the antecendent of the requirement. The negation of the remaining property is then given by the following formula:

Due to the causality assumption, when we unroll \(\mathrel {\mathcal W}\) and replace \(\textit{Read}\) with its semantics formula \(\bar{\varPhi }_{\textit{Read}}\), we obtain that \(\textit{Choice}_1\) and \(\textit{Choice}_2\) are always equal and \(\textit{Choice}_3\) could differ on \(\pi \) and \(\pi '\) if there is a difference in the oracle.

Since in our example, the relation \(\textit{Comm}\) is only assigned once, the operator is unrolled to a large disjunction that is false everywhere before the last step, since \(\textit{Comm}\) is empty on both paths. By unrolling \(\lnot \varPsi \) and subsequently simplifying the formula with the causal equalities for \(\textit{Choice}_1\) and \(\textit{Choice}_2\), we obtain:

Note that all literals \(\textit{Choice}_{3,\pi }(y,x,p'),\textit{Choice}_{3,\pi '}(y,x,p')\) contain existentially quantified variables only. Therefore, the assumptions of (the extension of) Theorem 4 are met. For the given formula, no contradiction can be derived. Instead, a model can be constructed as follows:

Suppose the PC member x who has a conflict with paper p is assigned to a paper q where he does not have a conflict, and another PC member y, who does not have a conflict with either paper, is assigned to both p and q. Then y can communicate with x and therefore leak the review on paper p to x. To repair the problem, we let the PC chair remove the assignment of PC member y to paper q in such situations. Let (WF2) be (WF1) with the new line (2a) added in-between lines (2) and (3):

(2a) \(\mathbf{forall}\ x,y,p,q .\ \textit{Conf}(x,p) \wedge \lnot \textit{Conf}(y,p) \wedge \textit{A}(x,q) \wedge \textit{A}(y,q) \rightarrow \textit{A}\mathbin {{-}{=}}(y,q)\)

% PC chair removes assignments that might cause leaks

For the resulting workflow (WF2), we obtain a new formula \(\bar{\varPhi }_{\textit{A}'}\), which in turn affects the formulas \(\bar{\varPhi }_\textit{Read}\) and \(\bar{\varPhi }_\textit{Comm}\) for \(\textit{Read}\) and \(\textit{Comm}\):

figure e

The resulting formula after substitution of the semantics formulas and simplification is similar to \(\lnot \bar{\varPsi }\), but adds two conjunctions with the \(\forall \)-clause of \(\bar{\varPhi }_{\textit{A}'}\) instantiated for \((x,p')\) and \((y,p')\) on both sides of the inequivalence. The resulting formula is a Bernays-Schönfinkel formula where again the decision procedure of Theorem 4 can be applied. That procedure now derives a contradiction. Intuitively, the reason is that on both paths, x has declared a conflict with \(p_1\). Since y is assigned to \(p_1\), x and y cannot be assigned jointly to the same paper. Thus, both sides of the inequivalence collapse to \(\textit{false}\) — implying that for (WF2) requirement (2) is satisfied and thus (WF2) is indeed noninterferent.    \(\square \)

7 Related Work

There is a vast body of work on information flow policies and associated verification techniques. We mention Goguen and Meseguer’s seminal work on noninterference [12], Zdancewic and Myer’s observational determinism [20], Sutherland’s nondeducability [19], and Halpern and O’Neill’s secrecy maintenance [13] as representative examples. See Kanav et al. [15] for a recent overview with a detailed discussion of the most relevant notions for the verification of workflows. Our approach is based on the temporal logic HyperLTL [7]. HyperLTL has been applied in the verification of hardware systems, such as an Ethernet controller with 20000 latches [11]. Other logical approaches to information flow control include SecLTL [8], the polyadic modal \(\mu \)-calculus [2] and the epistemic temporal logics [9]. While standard linear-time temporal logic has been extended with first-order quantifiers [16], our first-order extension of HyperLTL is the first temporal logic for the specification of information flow in systems with arbitrarily many agents. In terms of practical verification efforts, there has been a lot of recent interest in proving secrecy in web-based workflow management systems. For example, for the ConfiChair conference management system it was proven that the system provider cannot learn the contents of papers [3]. For CoCon, another conference management system, it was proven that various groups of users, such as authors, reviewers, and PC members cannot deduce certain content, such as reviews, unless certain declassification triggers, such as being a PC member without a conflict of interest, are met [15]. For the verification of an eHealth system, Bhardwaj and Prasad [5] assume that all agents are known at analysis time. Based on this information, the authors construct a dedicated security lattice and then apply techniques from universal information flow [1, 14]. Our verification method is based on a reduction to the satisfiability problem of first-order predicate logic. First-order logic has many applications in verification. Most related, perhaps, is recent work on the verification of software defined networks [4, 17]. There, a network controller is translated into a first order formula and either a theorem prover or an SMT-solver is used to determine properties of the topology so that the controller satisfies a given invariant.

8 Conclusion

We have presented a formalization of fine-grained security properties for workflow systems with an unbounded number of agents. HyperLTL is the first approach to specify hyperproperties for systems without a fixed set of agents. For the verification of HyperLTL formulas, we have provided a bounded model checking algorithm that translates the problem of verifying such a property for a given workflow to the satisfiability of first-order predicate logic. We have also provided a non-trivial fragment of properties and workflows so that the corresponding decision problem is decidable. As an example we considered noninterference for a simple workflow of a conference management system. Unexpectedly, our method exhibited a subtle form of indirect information flow. We also indicated how that deficiency can be cured. All corresponding proving took place within our benevolent fragments. Various problems remain for future work. Further decidable fragments are of major concern. Also, our work should be extended to more complex and thus more expressive forms of workflows.