1 Introduction

Information flow control discovers software security leaks by analysing the source or machine code of a program. Information flow control guarantees confidentiality if secret data which are processed in the program cannot influence publicly visible behaviour of that program; it guarantees integrity if critical computations performed by the program cannot be manipulated from public ports. Four types of confidentiality leaks have been identified [40]:

  • explicit leaks arise if secret data are (indirectly) copied to public output;

  • implicit leaks arise if secret data influence control flow (which may lead to publicly visible effects);

  • possibilistic leaks in concurrent programs arise if the set of possible public outputs (under any possible interleaving) depends on secret data;

  • probabilistic leaks in concurrent programs arise if, for some scheduling strategy, the probability of some publicly visible behaviour depends on secret data.

Figure 1 presents an example. The bottom left program has a possibilistic leak: interleaving order 2, 6, 7, 3 causes the secret PIN to be printed on public output; hence, the set of possible outputs depends on secret data. The program to the right has no possibilistic leak (see Sect. 3). But the PIN’s value may alter the probabilities of the outputs “\(0\)” resp. “\(1\)”, because the running time of the loop may influence the interleaving order of the two assignments to \(x\). Thus, a secret value changes the probability of a public output—a probabilistic leak.

Information flow control (IFC) aims at discovering all such security leaks. Information flow control for multi-threaded programs is challenging, as it must prevent possibilistic or probabilistic information leaks. Most IFC approaches check some form of noninterference [40], and to this end classify program variables, input and output as “high” (secret) or “low” (public). Probabilistic Noninterference [29, 4144] is the established security criterion for concurrent programs. It is difficult to guarantee probabilistic noninterference, as an IFC must in principle check all possible interleavings and their impact on execution probabilities. Due to this difficulty, some analysis algorithms for probabilistic noninterference put severe restrictions on program or scheduler behaviour.

Fig. 1
figure 1

Examples for explicit and implicit leaks (top left), for a possibilistic leak (bottom left), and for a probabilistic leak (right)

One specific form of probabilistic noninterference (PN), however, is scheduler independent: Low-Security Observational Determinism demands that for a program which runs on two low-equivalent inputs, all possible traces are low equivalent [20, 39, 53]. Traces log all operations (events) and memory states for a given run and interleaving; low-equivalent inputs coincide on low input values, and low-equivalent traces coincide on operations using low variables. Thus, low-security observational determinism (LSOD) in fact demands that execution order conflicts between low events are disallowed, if they may be influenced by high events. The following criterion is sufficient to guarantee LSOD [53]: (1) program parts contributing to low-observable behaviour are conflict free, that is, the program is low-observable deterministic; (2) implicit or explicit flows do not leak high data to low-observable behaviour. Earlier research [20, 53] has shown that the LSOD criterion guarantees PN. But several attempts to devise program analysis algorithms for LSOD turned out to be unsound, unprecise, or very restrictive. In particular, simplistic LSOD will absolutely prohibit any (even secure) low nondeterminism. Hence, LSOD never gained popularity, and there have been very few realistic implementations.

This article aims to overcome all these previous LSOD obstacles. It demonstrates that LSOD can be checked naturally using Program Dependence Graphs (PDGs) for concurrent programs. PDGs have already been developed as an IFC analysis tool for full sequential Java [15, 16, 46, 51] and demonstrated high precision and scalability. The current work shows how to use PDGs for a precise LSOD checker. It uses a new definition of low-equivalent traces, which—in case of nonterminating traces—avoids certain problems of earlier definitions. Exploiting the structure of PDGs, the algorithm is flow-sensitive, object-sensitive, and context-sensitive. It is sound and does not impose restrictions on the thread or program structure. It also relaxes the classical LSOD definition by allowing secure nondeterminism, while preserving soundness. It turns out that flow sensitivity is the key to eliminating soundness gaps and unrealistic restrictions. The algorithm also exploits advances in the may-happen-in-parallel (MHP) analysis of concurrent programs, which allow even time-sensitive MHP and IFC.

Note that PDG-based IFC for sequential programs has been described in detail in [16]; we assume some familiarity with this earlier work. The current article concentrates on PDG-based IFC for concurrent programs. We present an informal overview (Sect. 2), formally develop the new RLSOD criterion and its soundness proof (Sect. 3), summarize PDGs for concurrent programs (Sect. 4), show how the criterion can be precisely and soundly approximated by a PDG-based static RLSOD check (Sect. 5), explain the algorithm details (Sect. 6), and present data about performance and precision (Sect. 7). Related work is discussed in Sect. 8.

1.1 The JOANA tool

PDG-based IFC, including the new algorithm described in this article, has been fully implemented. The system, called JOANA, is available for public download, or can be used by everybody through a Java webstart GUI.Footnote 1 The engineer must provide Java sources to be analysed, where all input and output statements are annotated “high” or “low” (other statements do not need annotations). JOANA can handle full Java byte code with arbitrary threads, scales to ca. 50kLOC and empirically demonstrates high precision [1416, 46]. JOANA is based on a stack of sophisticated program analysis algorithms (pointer analysis, exception analysis, PDG construction; some details are described in Sect. 4). JOANA minimizes false alarms through flow-, context-, object-, and field-sensitive analysis techniques. JOANA allows declassification along sequential information flows. In concurrent programs, all possibilistic and probabilistic leaks are discovered. JOANA and the underlying program analysis was developed over the last 15 years and was used in realistic case studies such as [24]. The practical application is described in detail in [12].

2 Overview of approach

2.1 Security policy

IFC analysis must discover all possible violations of confidentiality and integrity—including probabilistic ones—for realistic programs. Our IFC analysis thus aims to provide a sound, precise, scalable, and nonrestrictive noninterference criterion for programs in full Java containing arbitrary threads, needing few annotations and admitting possible declassifications.

2.2 Why LSOD?

We chose LSOD as the fundamental mechanism because it has the huge advantage of being scheduler independent. However, attempts to define LSOD in a termination-sensitive way led to severe restrictions (see below). We therefore aim for a definition which is termination insensitive, but flow-, context-, and object-sensitive. From a practical viewpoint, we believe that these features are more important than termination sensitivity; they also overcome previous obstacles to the use of LSOD. In particular, our new RLSOD criterion will not prohibit secure low nondeterminism.

2.3 Flow sensitivity

A flow-sensitive analysis takes statement order into account, and a context-sensitive analysis takes procedure calling context into account. PDG-based IFC was introduced because PDGs are naturally flow- and context-sensitive. Studies have shown that this reduces false alarms considerable (e.g. [15]); Fig. 2 left presents two sequential examples (we assume h to contain a high value and l to contain a low value). For object-oriented programs, object- and field-sensitivity are similarly important [16].

Fig. 2
figure 2

Three secure program fragments. Flow- or context-insensitive analysis will generate false alarms

Now consider the multi-threaded example in Fig. 2 right. A flow-insensitive IFC analysis ignores statement order, thus assumes that lines 8 and 9 are interchangeable and generates a false alarm. We will return to this example in Sects. 3 and 8. Note that typical security type systems are flow- and/or context-insensitive and will reject all programs in Fig. 2. The type system in [21] is flow-sensitive, but not context-sensitive.

2.4 Classification of statements and data

Program data and operations are classified either low (public) or high (secret).Footnote 2 Note that in our flow-sensitive approach, classification of values and variables happens per statement or expression in the source code—there is no global classification of variables. Technically, PDG nodes \(n\) (in particular nodes for (sub)expressions in statements) are statically classified: \(cl(n)=L\) resp. \(cl(n)=H\). It is enough to classify inputs (sources) and outputs (sinks), as the classification of all intermediate nodes can easily be computed by a fixpoint iteration on the PDG [16]. Hence, a variable may at one program point (PDG node) contain a low value, and at another point a high value, as both variable occurrences are represented by different PDG nodes. Still, soundness is guaranteed, while flow sensitivity (which is naturally provided by PDGs) offers precision gains and fewer restrictions on programs.

We further assume that program input and output consist of streams of (perhaps nonprimitive) values, where a complete stream has a security classification. In accordance with flow sensitivity, high input streams are not part of initial memory, but all streams (except stdin and stdout) have to be explicitly opened.Footnote 3 Inputs are low-equivalent if they coincide on low input streams.

2.5 Attacker model

We assume that an attacker knows the source code and can observe execution of all operations (i.e. dynamically executed statements) and their operands that are classified low. However, the attacker cannot observe high operations or high operands. For example, if the source statement print(stream,x) is classified low, then its dynamic execution outputs a value on a low output stream, which can be observed; if read (stream,x) is classified high, the dynamically read value of x comes from a high input stream and cannot be observed.Footnote 4 As explained, variables do not possess a global classification; correspondingly, the attacker cannot see all low values at any time.

We assume that the attacker can distinguish the relative order of reads/writes to different variables (in contrast to [53], see discussion in Sect. 8). We further assume the attacker cannot observe whether a program is in an infinite loop. But the attacker may have knowledge about the probabilities \(P_i(r)\) of input \(i\) causing a certain low-observable behaviour \(r\).

2.6 Low-equivalent traces

The definition of LSOD is based on low-equivalent traces. A trace of a program execution is a (possibly infinite) list of program configurations, where a configuration includes the executed operation, the memory before execution, and the memory after execution. Note that a trace is valid only with respect to a specific interleaving of program threads. Low-observable events are configurations from a trace which read or write low values; only memory cells which are read or written by a low operation are part of low-observable events. The low-observable behaviour of an execution trace is the subtrace which contains only low-observable events. As the attacker knows the source code and PDG, the attacker can reconstruct from the low-observable behaviour which PDG node caused a low-observable event.

Low-equivalent traces have identical low-observable behaviour. LSOD demands that any two executions with low-equivalent inputs have low-equivalent traces. Thus, LSOD is defined similarly to classical (sequential) noninterference, using traces instead of program states. This natural definition, as formalized in the literature, however, allows one trace to terminate and the other to not terminate. Hence, the problem of infinite traces and termination channels needs to be discussed before we formally present the new definition of low-equivalent traces.

2.7 Termination leaks

Consider the top right program in Fig. 3, whose input in line 2 is high data and whose print-statement is low observable. If a run of the program does not terminate, the print-statement is delayed infinitely, which leads to the conclusion that the input was \(<\)0. Worse, Fig. 3 bottom right exploits a termination channel that leaks the PIN completely. The two left programs behave identically to the bottom right program and also contain termination channels, but contain an additional implicit flow (the right programs do not contain implicit flow).Footnote 5 It is characteristic for termination leaks that the attacker must know that a program loops, in order to exploit the observable behaviour. It is known that in interactive programs, termination channels can leak arbitrary amounts of information [2].

Fig. 3
figure 3

Four tough nuts for termination-insensitive definitions of low-equivalent traces. All programs contain termination leaks and gradually leak (part of) the PIN

To prevent termination channels, several variants of LSOD and PN forbid low-observable events behind loops guarded by high data. Such algorithms will disallow the programs in Fig. 3. However, in practice this is an unacceptable restriction. Sometimes, program analysis can deduce that a loop will terminate, and the restriction can be relaxed. But in general, no other means to always avoid termination leaks are known. Therefore, several authors—including ourselves—allow termination channels (see also Sect. 8). Hence, our attacker model assumes that the attacker cannot observe nontermination.

We thus aim at a sound definition of LSOD which, however, may allow termination leaks. This approach has already been tried in earlier research—in particular in [47, 53]. Their LSOD definitions permit termination channels and declare traces to be low equivalent if their low-observable behaviour is equal up to the length of the shorter sequence of low-observable events. But as pointed out in [20], this may lead to unintended leaks. Consider the program in Fig. 3 top left, whose traces always diverge and assume that the input PIN is high data and that the print statement is low observable. The program exposes the input PIN by printing an equal number of x’s to the screen. If low equivalence of traces is confined to the length of the shorter sequence of low-observable events, this behaviour is perfectly legal, because all traces with low-equivalent inputs are equal up to the length of the shorter sequence.

2.8 The new approach

To solve this problem, we suggest the following new definition of low-equivalent traces: For finite traces, we stick to the common definition that they are low equivalent if their low-observable behaviours are equal. If one trace is finite and the other is infinite, then the finite trace must have at least as many low-observable events as the infinite one, the low-observable behaviours must be equal up to the length of the shorter sequence, and the low-observable events missing in the infinite trace must be missing due to nontermination. The latter constraint is new. It means that prior to the missing events, a condition, e.g. in an IF (“branching point”) must have been evaluated which dynamically led into an infinite loop. Thus, the new constraint makes sure that the missing events leak information only via termination channels.

Of course, for two infinite traces with infinite low-observable subtraces, these subtraces must coincide completely. The new definition of low-equivalent traces is soundly approximated through PDGs and slicing, resulting in a precise analysis for probabilistic noninterference. Additional exploitation of the concurrent control flow graph prevents rejection of secure low-nondeterminism.

3 Formalizing low-equivalent traces and LSOD

In the following, we will formally develop the definition of the noninterference criterion, prove that it guarantees LSOD, and use it as basis for an algorithm. To begin with, let us repeat the original PN Definition [42, 44]:

Definition 1

(Probabilistic noninterference) A program satisfies probabilistic noninterference if for all pairs \((t, u)\) of low-equivalent inputs the following holds:

Let \(\varTheta \) be the set of possible program runs (traces) resulting from \(t\) and \(u\). For each \(T \in \varTheta \), the following must hold: Let \(\mathfrak {T}\) be the set of program runs possibly caused by \(t\) that are low equivalent to \(T\). Let \(\mathfrak {U}\) be the analogous set for \(u\). Then \(\sum _{r\in \mathfrak {T}}P_t(r)=\sum _{r\in \mathfrak {U}}P_u(r)\) must hold, where \(P_i(r)\) is the probability of trace \(r\) under input \(i\).

Note that Definition 1 is scheduler specific: for some particular scheduler, the definition might be violated and a leak occurs, for other schedulers not. In particular the \(P_i(r)\) can be scheduler dependent and are very difficult to compute or measure; worse, the number of summands might be infinite. Fortunately we will see later that no values for the \(P_i(r)\) are needed for LSOD, as LSOD guarantees Definition 1 for every scheduler. The definition of possibilistic noninterference is much simpler [40]:

Definition 2

A program or statement \(C\) satisfies possibilistic noninterference if for all initial states \(s_1, s_2\): \(s_1\approx _{low} s_2 \implies O(C, s_1) \approx _{low} O(C, s_2)\); where \(O(C,s)\) is the set of all possible outputs under all possible interleavings in initial state \(s\), and \(\approx _{low}\) denotes low-equivalency of states resp. outputs.

Thus, possibilistic leaks are also probabilistic, but not vice versa. The program in Fig. 1 right is possibilistically secure, because it has no low input (so all initial states are low equivalent), and the sets of possible low outputs are the same: “0” as well as “1” can both be produced by some interleaving, regardless of the initial state or PIN. However, the program contains a probabilistic leak: Definition 1 is violated under uniform (or similar) scheduling.Footnote 6

Formally, Definition 1 is violated as follows: There is only high input, so all inputs are low equivalent. Let \(t=0, u=1\) be two input PINs, and \(\varTheta \) be the set of possible traces resulting from \(t\) or \(u\). Let \(T\in \varTheta \). Case a): “print(0)\(\in T\). Let \(\mathfrak {T}\) be all traces caused by \(t\) low equivalent to \(T\), let \(\mathfrak {U}\) be all traces caused by \(u\) low equivalent to \(T\). Thus, \(\mathfrak {T}\) contains traces which print “\(0\)” under input PIN \(0\), and \(\mathfrak {U}\) contains traces which print “\(0\)” under input PIN \(1\). For \(r\in \mathfrak {T}\), \(P_{0}(r)\) is the probability of trace \(r\) under input PIN \(0\). \(\sum _{r\in \mathfrak {T}}P_{0}(r)\) is thus the overall probability that “\(0\)” is printed under input PIN \(0\). Likewise, \(\sum _{r\in \mathfrak {U}}P_{1}(r)\) is the overall probability that “\(0\)” is printed under input PIN \(1\). These probabilities depend on scheduling. Assume the scheduler picks each thread with the same probability and schedules after every executed statement. Then enumerating all scheduling possibilities shows that \(\sum _{r\in \mathfrak {T}}P_{0}(r)={44\over 64}\), while \(\sum _{r\in \mathfrak {U}}P_{1}(r)={57\over 64}\). Case b) is analogous.

3.1 Traces and dynamic dependences

Definition 3 (Trace)

[Trace] An operation is a dynamic instance of a program statement (e.g. assignment execution, procedure call, thread fork). For operation \(o\), \({stmt}(o)\) is the corresponding source program statement.

A trace is a list of events of the form \((\overline{m}, o, m)\), where \(o\) is an operation, \(\overline{m}\) is the memory before execution of \(o\), and \(m\) is the memory after execution of \(o\).

The low-observable behaviour of a trace

$$\begin{aligned} T=(\overline{m}_1, o_1, m_1), \ldots , (\overline{m}_k, o_k, m_k) \end{aligned}$$

is a list of events

$$\begin{aligned} obs_{low}(T)= evt_{low}(\overline{m}_1, o_1, m_1), \ldots , {evt}_{low}(\overline{m}_k, o_k, m_k) \end{aligned}$$

where

$$\begin{aligned} {evt}_{{low}}(\overline{m}, o, m) = {\left\{ \begin{array}{ll} (\overline{m} \mid _{{use}(o)}, o, m \mid _{{def}(o)}) &{} o \text { reads or }\\ &{} \text {writes low}\\ &{} \text { values;}\\ \lambda &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

\({def}(o)\) are the variables defined (written) in \(o\), while \({use}(o)\) are the variables used (read) in \(o\). \(\overline{m} \mid _{{use}(o)}\) resp. \(m \mid _{{def}(o)}\) are the memory cells in \(m\) resp. \(\overline{m}\) which are used resp. defined by \(o\); \(\lambda \) is the empty event.

Fig. 4
figure 4

A program and two possible traces. The first trace results from input (inputPIN() = 0, input() = 0), the second from (inputPIN() = 1, input() = 0). The shaded nodes represent the low-observable behaviour

We write \(o\in T\) iff \(\exists {\overline{m}}, m: ({\overline{m}},o,m)\in T\), and \(LO(o)\) iff for \(o\in T\): \({evt}_{{low}}(\overline{m}, o, m) \not =\lambda \). For \(o,o'\in T\), we write \(o<o'\) if \(o\) is executed before \(o'\): \(o=o_i, o'=o_j, i<j\).

It is important to note that a trace includes all operations of a specific program execution with specific input; due to interleaving, many traces for the same input may exist. Operations in (different) traces can be uniquely identified by their calling context and control flow history (see below). Operations which read or write low values are low observable, together with the corresponding parts of the memory. Thus, only \(\overline{m} \mid _{{use}(o)}\) resp. \(m \mid _{{def}(o)}\) are low observable.

To better understand the latter fact, remember that due to flow sensitivity, there is no fixed separation into low and high memory. \(\overline{m} \mid _{{use}(o)}\) and \(m \mid _{{def}(o)}\) are exactly all the low cells touched by \(o\); this “operation-wise exact fit” has the effect to weaken the requirements for low equivalence while maintaining soundness (see below), ultimately reducing false alarms. A global low memory \(M_{low}\) would render less traces low equivalent (i.e. more false alarms), as \(\overline{m} \mid _{{use}(o)}, m \mid _{{def}(o)}\subseteq M_{low}\). This subtle insight is another explanation why flow-sensitive IFC is more precise.

Example

Figure 4 presents examples for traces and their low-observable behaviour. Remember that only input and output are explicitly classified as low or high and that memory cells are not globally classified.

Dynamic dependences Our LSOD criterion is very much based on the notion of “dependency”. Dependency between statement \(x\) and statement \(y\) means that \(x\) can somehow influence the behaviour (or results) of \(y\). Dependency exists in various flavours; one typical example is called data dependency and means that in \(x\), a variable \(v\) is assigned which is used in \(y\), where there is possible control flow from \(x\) to \(y\) which does not redefine \(v\). Dependency is a dynamic phenomenon, but can be soundly approximated by static program dependence graphs (PDGs). In the following, we formally define various kinds of dynamic dependences (the description of static dependences and PDGs is postponed to Sect. 4). Traces are enriched with dynamic dependences as in Fig. 4: dynamic data dependences connect dynamic reads and writes of variables (with no intermediate writes to the variables), and dynamic control dependences connect conditions in if, while etc., and the operations “governed” by these conditions.

Definition 4

(Dynamic dependences) Let \(T\) be a trace.

  1. 1.

    Operation \(o \in T\) is dynamically control dependent on operation \(b \in T\), written \(b \mathop {\dashrightarrow }\limits ^{dcd} o\), iff

    • \(o\) is a thread entry and \(b\) is the corresponding fork operation, or

    • \(o\) is a procedure entry and \(b\) is the operation that invoked that procedure, or

    • \(b\) is the director of the innermost control region of \(o\).Footnote 7

  2. 2.

    Operation \(o\) is dynamically data dependent on operation \(a\) in \(T\), written \(a \mathop {\dashrightarrow }\limits ^{v} o\), iff there exists a variable \(v \in {use}(o) \cap {def}(a)\), \(a<o\), and there is no operation \(o'\in T\), \(v \in {def}(o')\) where \(a<o'<o\).

  3. 3.

    \({\textit{DFS}}_T(o)=\{q\in T\mid o\ (\mathop {\dashrightarrow }\limits ^{v}\cup \mathop {\dashrightarrow }\limits ^{dcd})^{*} q \}\) denotes the set of all operations that are (transitively) dynamically control or data dependent on \(o\). \({\textit{DCD}}_T(o)=\langle {start}, q_2,\ldots ,q_{n-1}, o \mid q_i\in T, q_i<q_{i+1},\) \( q_i {\mathop {\dashrightarrow }\limits ^{dcd}}^{*} o \rangle \) is the list of operations on which \(o\) is (transitively) dynamically control (but not data) dependent.

If \(T\) is fixed, we just write \({\textit{DFS}}\) and \({\textit{DCD}}\). Note that dynamic dependencies are cycle free. \({\textit{DFS}}(o)\), the operations potentially influenced by \(o\), can be seen as a dynamic forward slice; \({\textit{DCD}}\) can be seen as a dynamic backward control slice (cmp. Sect. 4). Every operation has exactly one predecessor on which it is dynamically control dependent (except the \(start\) operation, which has no dynamic predecessor). For \(o\not ={start}\) and \(b \mathop {\dashrightarrow }\limits ^{dcd} o\), \(b\) is the unique dynamic control predecessor of \(o\), written \(b=\textit{dcp}(o)\).

Example

In Fig. 4 (lower part), \({\textit{DFS}}(5)=\{6, 7, 8,\) \( 7',9, 10\}\), and \({\textit{DCD}}(4)=\langle start,1,4\rangle \).

3.2 Infinite delay and low equivalency

We are now ready to tackle low equivalency of traces. As explained earlier, we want to define low equivalency of two traces \(T,U\) such that for infinite \(T\), if \(T\) misses a low-observable operation \(o\) executed in \(U\), \(o\) is missing due to an infinite delay in \(T\). Other reasons for nonexecution of \(o\) in \(T\) are not allowed.

This idea requires a formalization of the notions of “infinite delay” and “an operation happens in two different traces”. First we observe that an operation is uniquely identified by its calling context and control flow history. More formally, \(p=q\) holds for operations \(p\in T\) and \(q\in U\) if \({stmt}(p)={stmt}(q)\), and either \(p=q=start\), or \(dcp(p)=dcp(q)\). This recursive definition terminates as backward control dependency chains are finite. Thus, \(p=q\iff {\textit{DCD}}(p)={\textit{DCD}}(q)\). This definition explicitly includes the case that an operation occurs in two different traces \(T\) and \(U\), written \(o\in T\cap U\). Note that \(o\in T\cap U\) still allows that the memories (in particular the high parts) in both traces at \(o\) are not identical.

Next we observe that the execution of a branching point in a trace \(T\) triggers the execution of all operations in the chosen branch which are dynamically control-dependent on the branching point, up to the next branching point. For example, in the code fragment if(b){o1; o2}, both o1 and o2 are (statically and dynamically) control dependent on b (but o2 is not control dependent on o1). If b evaluates to true and the then-branch is executed, both o1 and o2 are executed, unless o1 does not terminate.Footnote 8

In terms of traces, if \(b_1\mathop {\dashrightarrow }\limits ^{dcd} o_1, o_2\ldots o_k \mathop {\dashrightarrow }\limits ^{dcd} b_2\) (where not necessarily \(o_i\mathop {\dashrightarrow }\limits ^{dcd}o_{i+1}\)), and \(o_1\in T\) (that is, \(o_1\) belongs to the branch chosen by \(b_1\) and thus is executed) then \(o_2, \ldots o_k\) are executed as well, unless there is nontermination in some \(o_i\), causing \(o_{i+1}\) to be delayed infinitely. Other possibilities for the nonexecution of \(o_{i+1}\) do not exist, because control dependency by its very definition implies that \(b_1\) (and nobody else) decides about the execution of \(o_1\ldots o_k\). The same argument applies if \(b\) occurs in two traces \(T\) and \(U\). Hence, we define

Definition 5

(Infinite delay) Let \(T,U\) be traces and let both execute branching point \(b\): \(b\in T\cap U\). Let \(o\in T\) be an operation where \(b \mathop {\dashrightarrow }\limits ^{dcd} o\) (thus, \(o\) belongs to the branch \(b\) chooses to execute in \(T\)). If \(o\not \in U\), \(U\) infinitely delays \(o\).

Thus, if both \(T\) and \(U\) execute \(b\) and choose the same branch, then either both \(T\) and \(U\) execute \(o\), or \(U\) does not execute \(o\) due to an infinite loop between \(b\) and \(o\). This definition is used for the formalization of low-equivalent traces:

Definition 6

(Low equivalence of traces, \(\sim _{{low}}\) ) Let \(T\) and \(U\) be two traces. Let \({obs}_{{low}}(T) = ({\overline{m}}_0, o_0, m_0) \cdots \) and \({obs}_{{low}}(U) = ({\overline{n}}_0, q_0, n_0) \cdots \) be their low-observable behaviours. Let \(k_T\) be the number of events in \({obs}_{{low}}(T)\) and \(k_U\) be the number of events in \({obs}_{{low}}(U)\). \(T\) and \(U\) are low equivalent, written \(T \sim _{{low}} U\), if one of the following cases holds:

  1. 1.

    \(T\) and \(U\) are finite, \(k_T = k_U\), and \(\forall 0 \le i < k_T:\) \({\overline{m}}_i = {\overline{n}}_i \wedge o_i = q_i \wedge m_i=n_i\)

  2. 2.

    \(T\) is finite and \(U\) is infinite, and

    • \(k_T \ge k_U\),

    • \(\forall 0 \le i < k_U: {\overline{m}}_i = {\overline{n}}_i \wedge o_i = q_i\wedge m_i = n_i \), and

    • \(\forall k_U < j < k_T\): \(U\) infinitely delays an operation \(o \in {\textit{DCD}}(o_j)\).

  3. 3.

    \(T\) is infinite and \(U\) is finite, and

    • \(k_U \ge k_T\),

    • \(\forall 0 \le i < k_T: {\overline{m}}_i = {\overline{n}}_i \wedge o_i = q_i\wedge m_i = n_i \), and

    • \(\forall k_T < j < k_U\): \(T\) infinitely delays an operation \(o \in {\textit{DCD}}(q_j)\).

  4. 4.

    \(T\) and \(U\) are infinite, and

    • if \(k_T = k_U\), then \(\forall 0 \le i < k_T: {\overline{m}}_i = {\overline{n}}_i \wedge o_i = q_i\wedge m_i = n_i \).

    • if \(k_T > k_U\), then \(\forall 0 \le i < k_U: {\overline{m}}_i = {\overline{n}}_i \wedge o_i = q_i\wedge m_i = n_i \), and \(\forall k_U < j < k_T\): \(U\) infinitely delays an operation \(o \in {\textit{DCD}}(o_j)\).

    • if \(k_T < k_U\), then \(\forall 0 \le i < k_T: {\overline{m}}_i = {\overline{n}}_i \wedge o_i = q_i\wedge m_i = n_i \), and \(\forall k_T < j < k_U\): \(T\) infinitely delays an operation \(o \in {\textit{DCD}}(q_j)\).

    • if \(k_T=k_U=\infty \), then \(\forall i: {\overline{m}}_i = {\overline{n}}_i \wedge o_i = q_i\wedge m_i = n_i \).

In fact the definition can be expressed in a more compact form:

Observation \(T \sim _{{low}} U \iff \)

  • \(\forall 0\le i\le \min (k_T, k_U): {\overline{m}}_i = {\overline{n}}_i\wedge o_i = q_i \wedge m_i=n_i\), and

  • if \(k_T\not = k_U\), and w.l.o.g. \(k_T>k_U\), then \(U\) is infinite and \(\forall k_U\le j<k_T\): \(U\) infinitely delays an operation \(o \in {\textit{DCD}}(q_j)\).

We will, however, refer to cases 1–4 in the following text. If all programs terminate, Definition 5 reduces to case 1, which does not rely on \({\textit{DCD}}\) or infinite delay. In case of nontermination, i.e. infinite traces, all low-observable \(o_j\) missing in the shorter trace must be missing due to infinite delay. “\(U\) infinitely delays \(o\in {\textit{DCD}}(o_j)\)” expresses that the delayed operation \(o\) must not necessarily be \(o_j\) itself, but can be a dynamic control predecessor of \(o_j\). In any case, \(o\) is on the dynamic control path between \(o_{k_U-1}\) and \(o_j\), and the infinite loop in the shorter trace is before \(o\): according to Definition 5, there must be a branching point \(b\in T\cap U\), where \(b\mathop {\dashrightarrow }\limits ^{dcd} o, o\not \in U\).

Interestingly, the essence of Definition 6 can be expressed without infinite delay.Footnote 9 Imagine every program has an additional print( ’done’) as the very last statement. Then the condition “\(U\) infinitely delays an operation \(o \in {\textit{DCD}}(o_j)\)” implies “print (’done’) \(\not \in U\)”. Perhaps it would be possible to re-formulate Definition 6 accordingly (thus replacing “\(U\) infinitely delays ...” by “print (’done’) \(\not \in U\)”) and construct an alternate soundness proof. In any case, \({\textit{DCD}}\) and \({DFS}\) are needed for the proof.

Note that Definition 6 is stronger than the low-equivalency Definitions in [20, 53]: the latter authors demand low equivalence not for \(T\) and \(U\), but only for all subtraces of individual variables. This weaker definition results in more traces being low equivalent; it assumes, however, that the attacker cannot distinguish between relative access order for different variables. It also assumes that variables are globally classified, which in our flow-sensitive setting they are not. Section 8 compares both definitions in detail.

The following definition of LSOD is standard, but uses the new definition of low-equivalent traces:

Definition 7

(Low-security observational determinism) A program is low-security observational deterministic if the following holds for every pair \((t, u)\) of low-equivalent inputs: Let \(\mathfrak {T}\) and \(\mathfrak {U}\) be the sets of possible traces resulting from \(t\) and \(u\). Then \(\forall \ T, U \in \mathfrak {T} \cup \mathfrak {U}: T \sim _{low} U\) must hold.

Note that we will later relax this definition and allow secure low nondeterminism while maintaining soundness. We will now discuss Definitions 6 and 7 using the examples from Figs. 2 and 3.

Example 1

For a simple multi-threaded program without termination leaks, consider Fig. 2 right. print(l) is explicitly classified low, and inputPIN is explicitly classified high. Remember that l and h are not globally classified; only input/output is explicitly classified. For all other statements the classification is computed through a fixpoint iteration on the PDG.Footnote 10 In particular, if \(x\in BS(y)\), \(cl(x)\sqsubseteq cl(y)\) must hold, where \({Low}\sqsubset {High}\) [16]. Thus, h in line 6 is high, l in line 5 is low, as \(\mathtt inputPIN \in BS(\mathtt h _6)\) and \(\mathtt l _5\in BS(\mathtt print(l) )\). l=h is, however, not classified low, because l=h \(\not \in BS(\mathtt print(l) )\) (see the PDG in Fig. 5).

Next we observe that all inputs are low equivalent, as the only input is high. Now let \(T,U\) be two possible traces for different inputPINs (in this example, several traces exist for the same input due to interleaving). Hence, l=0, print(l) are the only low-observable operations in \(T\) and \(U\). The operation order may, however, be different for \(T\) and \(U\) due to interleaving. Hence, \(T\not \sim _{low} U\). The example demonstrates the central weakness of LSOD: even secure low determinism is prohibited. The new RLSOD criterion, however, accepts the example (see below). But RLSOD only works because flow sensitivity of \(BS\) prevents l=h to be low observable, which otherwise would cause a false alarm.

Example 2

To understand the treatment of termination leaks, consider the top left program in Fig. 3. Again all inputs are low equivalent. We show that the program does not fulfil Definition 6. Obviously the program diverges for any inputPIN, but for different inputPINs prints different numbers of "x"s. Let us select trace \(T\) for one inputPIN \(n\) and trace \(U\) for a different inputPIN \(m\). Then we are in the infinite/infinite case of Definition 6. \(T\) contains \(n\) instances of print("x"), as well as \(n\) instances of while(x>0) and \(n\) instances of x—; thus, \(k_T=3n\). \(U\) contains \(m\) instances print("x") etc., W.l.o.g., assume \(k_T>k_U\). The low-observable \(o_j\) are the print("x") in \(T\) which are missing in \(U\). \({\textit{DCD}}(o_j)\) is the list of operations \(q_i\in T\) on which \(o_j\in T\) is dynamically control dependent, namely the list of dynamically executed loop tests: \({\textit{DCD}}(\) print("x") \()=\langle \) while(x>0), while(x>0), ... \(\rangle \).

Fig. 5
figure 5

PDG for example in Fig. 2 right. \(BS(\mathtt {print(l)})\) is shaded

Definition 6 demands that \(U\) infinitely delays an operation \(o\) from this list: there must be an infinite loop before \(o\). But this is not the case! According to Definition 5, it would not only require \(o\not \in U\) (which is the case for the first while(x>0) not in \(U\)), but that there is a branching point \(b\in T\cap U, b\mathop {\dashrightarrow }\limits ^{dcd} o\) which selects the same branch in both traces, but never proceeds to \(o\) in \(U\).

Such a \(b\) does not exist. Indeed, the operation causing the infinite delay, namely while (true), is not between any \(b\) and \(o\), but at the end of the program. Thus, \(T\not \sim _{low} U\). Hence, this program is not LSOD; it violates Definition 7. In this example, Definition 6 discovered the termination leak, but in general it will not discover termination leaks—as in the next example.

Example 3

The bottom left program in Fig. 3 fulfils Definition 6 as follows. Let \(T, U, k_T, k_U, o_j\) as above. Thus, the first low-observable operation \(o_j\in T\) missing in \(U\) is occurence no. \(k_U+1\) of print("x"). Again \(o_j\) is dynamically control dependent on some \(o=\) while(x>0). But this time, there is \(b\in T\cap U\), namely occurence \(k_U-1\) of if(x==0), which dynamically controls the \(k_U\)’st

while(x>0) in \(T\). But the latter is not in \(U\) any more. Thus, \(b\mathop {\dashrightarrow }\limits ^{dcd} o\), and the infinite loop indeed happens between \(b\) and \(o\). Hence, \(T\sim _{low} U\). That is, by Definitions 6 and 7, the program is wrongfully declared secure—which is ok, as we exclude termination leaks.

We see that Definitions 6 and 7 may miss termination leaks—which is a design feature. But sometimes we are lucky, and termination leaks are discovered anyway, as in example 2.

3.3 Soundness of the LSOD criterion

Soundness means that, under the conditions of the following theorem, all probabilistic leaks are discovered by Definition 7. Precision means that the number of false alarms is small. In this section we outline the soundness proof. Precision is investigated in Sect. 7.

Zdancewic and Myers [53] observed that probabilistic leaks can only occur if the program contains concurrency conflicts such as data races. LSOD is guaranteed if there is no implicit or explicit flow, and in addition, program parts influencing low-observable behaviour are conflict free. Their observation served as a starting point for our own work, as we realized that not only explicit and implicit flow can naturally be checked using PDGs, but also conflicts and their impact are naturally modelled in PDGs enriched with conflict edges. We thus provide the following definition:

Definition 8

(Data and order conflicts) Let \(a\) and \(b\) be two operations that may happen in parallel.

  • There is a data conflict from \(a\) to \(b\), written \(a \mathop {\rightsquigarrow }\limits ^{dconf} b\), iff \(a\) defines a variable \(v\) that is used or defined by \(b\).

  • There is an order conflict between \(a\) and \(b\), written , iff both operations are low observable: \(\textit{LO}(a)\wedge \textit{LO}(b)\).

  • An operation \(o\) is potentially influenced by a data conflict if there exist operations \(a,b\) such that \(o\in {\textit{DFS}}(b)\) and \(a \mathop {\rightsquigarrow }\limits ^{dconf} b\).

The following lemma states that in all possible traces which are produced by a set of low-equivalent inputs, the operations which are not influenced by high data or by execution order conflicts are identical “modulo termination”. In particular, for terminating traces these operations are completely identical.

Lemma 1

Let \(p\) be a program. Let \(T\) be a trace of \(p\) and \(\varTheta \) be the set of possible traces whose inputs are low equivalent to the one of \(T\). Let \(o\) be an operation of \(p\) that is not potentially influenced by a data conflict \(a \mathop {\rightsquigarrow }\limits ^{{dconf}} b\) or by an operation \(q\) reading high input: \(o\not \in {\textit{DFS}}(a)\cup {\textit{DFS}}(b)\cup {\textit{DFS}}(q)\). If \(o \in T\), then every \(U \in \varTheta \) either executes \(o\) or infinitely delays an operation in \({\textit{DCD}}(o)\).

The proof is in appendix ; the full proof can be found in [8]. Without the dependency machinery from Definition 6, the proof would not be possible. From the lemma it is a small step to the statement that low-equivalent inputs generate low-equivalent traces. This fundamental soundness theorem can now be stated:

Theorem 1

A program is low-security observational deterministic according to Definition 7 if for all traces \(T\) and all operations \(o,o',o''\in T\):

  1. 1.

    no low-observable operation \(o\) is potentially influenced by an operation reading high input:

    $$\begin{aligned} \textit{LO}(o)\wedge \lnot \textit{LO}(o')\implies o\not \in {\textit{DFS}}_T(o') \end{aligned}$$
  2. 2.

    no low-observable operation \(o\) is potentially influenced by a data conflict:

    $$\begin{aligned} \textit{LO}(o)\wedge o' \mathop {\rightsquigarrow }\limits ^{{dconf}} o'' \implies o\not \in {\textit{DFS}}_T(o')\cup {\textit{DFS}}_T(o'') \end{aligned}$$
  3. 3.

    there is no order conflict between low-observable operations:

Proof

see Appendix ; the full proof is in [8].\(\square \)

In the criterion, the first rule ensures that implicit and explicit flow to \(o\) do not transfer high data. The second rule ensures that high data cannot influence the data flowing to \(o\) via interleaving. The third rule ensures that high data cannot influence the execution order of low-observable operations via interleaving. Note that the theorem is only valid if sequential consistency in the sense of the Java memory model can be assumed. The Java memory model was designed to guarantee sequential consistency for race-free programs, and formal definitions plus machine-checked guarantees of the JMM are available [26, 27].

Theorem 2

If a program is LSOD according to Definition 7, it is probabilistically noninterferent according to Definition 1 for all schedulers.

Proof

see Appendix  . The proof is in a sense trivial, as the sum of probabilities for all traces possibly generated by two low-equivalent inputs must equal \(1\) (which also explains why LSOD does not need explicit probability values for traces). Note that for two low-equivalent inputs more than one trace may result, because our algorithm—as explained below—does not prohibit useful nondeterminism, as long as soundness is maintained.\(\square \)

Corollary 1

LSOD, as defined in Definition 7, is scheduler-independent.

Proof

Using the terminology of Definition 1, scheduler behaviour will influence the probability distributions for certain interleavings and thus the \(P_t(r)\) resp. \(P_u(r)\). The proof of Theorem 2, however, demonstrated that under LSOD, the sum of these probabilities is always 1. Only these sums are needed to guarantee probabilistic noninterference; thus, the latter is invariant under the probability distribution of interleavings.   \(\square \)

4 Dependence graphs and noninterference

In the following, we will present a static analysis which soundly approximates the above LSOD criterion. This analysis is based on program dependence graphs (PDGs). In this section, we present the necessary facts about PDGs for multi-threaded programs, which are then exploited in Sects. 5 and 6.

4.1 PDGs for sequential programs

Program dependence graphs are a standard tool to model information flow through a program. In an (intraprocedural) PDG \(G=(N,\rightarrow )\), \(N\) comprises program statements or expressions. There are two kinds of edges: data dependences and control dependences, thus \(\rightarrow \ =\ \mathop {\rightarrow }\limits ^{D}\cup \!\!\!\cdot \mathop {\rightarrow }\limits ^{C}\). Dependence \(x\mathop {\rightarrow }\limits ^{D}y\) means that statement \(x\) assigns a variable which is used in statement \(y\), without being reassigned on any control flow path from \(x\) to \(y\). Dependence \(x\mathop {\rightarrow }\limits ^{C}y\) means that the mere execution of \(y\) depends directly on the value of the expression \(x\) (which is typically a condition in an if- or while-statement, see e.g. [23] for formal definitions). A path \(x\rightarrow ^*y\) means that information can flow from \(x\) to \(y\); if there is no path, it is guaranteed that there is no information flow [17, 36, 38, 51]. Exploiting this fundamental property, all statements possibly influencing \(y\) (the so-called backward slice) are easily computed as \(BS(y)=\{x\mid x\rightarrow ^* y\} \). \(y\) is called the slicing criterion of the backward slice. Similarly, the forward slice is \(FS(x)=\{y\mid x\rightarrow ^* y\}\).

Fig. 6
figure 6

A small program and its dependence graph

As a small example, consider the program and its dependence graph in Fig. 6 (from [16]). Control dependences are shown as straight edges, data dependences are shown as curved edges. There is a path from statement 1 to statement 9 (i.e. \(9\in FS(1)\)), indicating that input variable a may eventually influence output variable z. Since there is no path \((1)\rightarrow ^*(4)\) (\(1\not \in BS(4)\)), there is definitely no influence from a to x>0.

The Slicing Theorem [17, 38] states that for any terminating execution reaching statement \(x\), the program and \(BS(x)\) compute the same sequence of values for each variable used in \(x\). Thus, a correct PDG may have too many edges, but never too few (“soundness”). But of course, PDGs should have as few edges as possible (“precision”). Note that due to decidability problems, “complete” precision can never be achieved while maintaining soundness. Section 5 contains formal soundness properties of slices, as exploited by our static LSOD check.

PDGs and slices for languages with procedures, exceptions, pointers, objects, arrays, etc., are much more complex. Interprocedural analysis is typically based on the context-sensitive Horwitz-Reps-Binkley (HRB) algorithm [18, 37], which uses summary edges to model flow through procedures.Footnote 11 Full sequential Java requires even more complex algorithms, in particular for pointer analysis. The pointer analysis used in JOANA is object-sensitive, field-sensitive, and optionally flow-sensitive; it is described in [11, 16]. Precise pointer analysis is a prerequisite for precise treatment of dynamic dispatch in object-oriented languages [16]. Another issue are exceptions, which may cause a lot of additional control flow. Precise treatment of exceptions in PDGs is described in [14, 16]. Thus, in general, computing \(BS(x)\) involves more than just backward paths in the PDG.

In-depth descriptions of slicing techniques can be found in [23]. Note that the slicing theorem has been shown for all the extensions mentioned above—and the soundness of the current work only depends on the slicing theorem, not on the specific variant of slicing or pointer analysis.

Of course, slicing precision directly influences IFC precision and false alarms. In the early days of PDGs and slicing, often a backward slice comprised almost the complete program. Today, PDGs have become much more precise since they are flow-sensitive, context-sensitive, and object-sensitive: the order of statements is taken into account, as is the actual calling context for procedures, and the actual reference object for method calls. Thus, the backward slice never indicates influences that are in fact impossible due to the given statement execution order or call stack of the program; only so-called realizable paths are considered (that is, paths dynamically possible with respect to call stack and statement order). As a consequence, slice size has dropped dramatically (see, e.g. [3, 4]). But this precision is not for free: interprocedural PDG construction can have complexity \(O(|N|^3)\), object-sensitive pointer analysis is similarly expensive, and flow-sensitive pointer analysis is so expensive that it is activated on demand only. In practice, PDG use is limited to programs of about 100kLOC [4]. Section 7 presents data about precision and scalability of our PDG construction algorithm.

4.2 Noninterference and PDGs

As mentioned, \(a\not \in BS(b)\) guarantees that there is no information flow from \(a\) to \(b\). This is true for all information flow which is not caused by hidden physical side channels such as timing leaks. It is therefore not surprising that slicing is a natural tool for IFC [1, 45]. Illegal flow becomes visible as a PDG path from a secret value to a public variable. In Fig. 6, assume x is secret and z is public; the PDG path \(3\rightarrow ^* 9\) (i.e. \(3\in BS(9)\)) exposes the illegal (implicit) flow from x to z. In Fig. 7, slicing uncovers an illegal flow in a multi-threaded example (see next section).

More examples for PDG-based IFC can be found in [8, 16]. In 2009, we provided a machine-checked proof that (sequential) noninterference holds if no high variable or input is in the backward slice of a low output. This result was shown for the intraprocedural as well as the interprocedural (HRB) case [49, 51].

Let us finally mention that—due to potentially nonlocal effects of any points-to, alias, or dependency relation—the PDG of a complete system cannot be obtained by just combining PDGs of subsystems. PDGs thus require a whole-program analysis, and all library functions have to be analysed together with the client code (or at least “stubs” must be provided, which simulate the dependencies through library functions). The consequence is that PDG-based IFC is not compositional: security of a program cannot be inferred just from the security of its components. Only recently, compositional PDGs were tackled (see Sect. 9).

4.3 PDGs and slicing for multi-threaded programs

For multi-threaded programs operating on shared memory, PDGs are extended by so-called interference dependencies Footnote 12 which connect an assignment to a variable in one thread with a read of the same variable in another thread [22]. Figure 7 shows a small example with two interference edges. The backward slice from node \(\mathtt {x=y+1}\) consists of the grey nodes.Footnote 13 Now assume x and b are public; a and y are secret. The illegal explicit flow from y to x is captured in the PDG by (among others) the data dependence \(\mathtt {y}~\mathop {\rightarrow }\limits ^{D}~ \mathtt {x=y+1}\); the illegal implicit flow from a to b is captured by the control dependence \(\mathtt {a>0}~\mathop {\rightarrow }\limits ^{C}~ \mathtt {b=0}\).

Fig. 7
figure 7

PDG for two threads with interference dependences

The simplest slicer for multi-threaded programs is the iterated two-phase slicer (I2P) [14, 33]. I2P uses the context-sensitive HRB algorithm,Footnote 14 but does not traverse interference edges directly. Instead, I2P adds the starting point of interference edges to a work list and thus repeatedly applies the intra-thread HRB backward slice algorithm for every interference edge.

I2P can be improved by using May-happen-in-parallel (MHP) information. Often MHP analysis can prove that, due to locks or the fork/join structure, certain statements cannot happen in parallel. Such information can be used to prune interference dependencies, drastically improving scalability and precision. Various MHP algorithms for Java have been published, e.g. [8, 25, 34].

Time-sensitivity. A time-sensitive analysis discards impossible, “time-travelling” flows.Footnote 15 Time-travelling means that the source of a flow would be executed physically later than the sink—which is impossible for any sequentially consistent interleaving. I2P is context-sensitive inside threads, but not time-sensitive. Time-sensitive interprocedural slicing algorithms are very complex and cannot be described in detail here; for many years, no scalable algorithm existed. Today, our new algorithms, which are based on earlier work by Krinke and Nanda [22, 33] allow analysis of at least a few kLOC of full Java [79]. Benchmarks have shown that I2P precision (i.e. slice size) is ca. 25% worse than the best known time-sensitive slicer [8].

Lock-sensitivity. Most concurrent programs contain explicit locks and synchronization. However, many MHP algorithms ignore locks – they only analyse fork-join structure, perhaps in a context-sensitive manner. The reason is that lock-sensitive MHP needs a precise must-alias analysis, which is notoriously difficult and expensive [8]. Section 9 discusses approaches to lock-sensitive PDGs. Note that a lock-sensitive MHP only improves precision, but does not affect correctness of PDGs and IFC.

5 A slicing-based static LSOD check and the RLSOD criterion

5.1 The static check

Let us now come back to our LSOD criterion and its sound, static approximation through PDGs and slicing. First we emphasize that dynamic dependencies and thus DFS and DCD can be soundly approximated by static slices and PDGs. In particular, the following soundness properties for \(BS\) and \(FS\) hold [16, 17]:

Observation Let \(o\) be an operation in trace \(T\).

  1. 1.

    If \({DFS}_T(o)=\{p_1,\ldots ,p_k,\ldots \}\) then

    $$\begin{aligned} \{stmt(p_1),\ldots ,stmt(p_k),\ldots \}\subseteq FS(stmt(o)) \end{aligned}$$
  2. 2.

    If \(stmt(o)\not \in BS(stmt(p))\), then \(p\not \in {DFS}_T(o)\), and it is guaranteed that \(o\) cannot influence \(p\) through explicit or implicit flow.

Due to these properties, the three conditions of theorem 1 can naturally be checked using slicing for concurrent programs. Furthermore, a sound and precise MHP analysis is needed; we write MHP \((s,s')\) if this analysis concludes that PDG nodes \(s, s'\in N\) may be executed in parallel. Remember that \(cl(s)\) is the classification (\(H\) or \(L\)) of PDG node \(s\); we assume a two-element security lattice where \(L\sqsubseteq H\). \({def}(s)\) are the variables defined (written) in \(s\), while \({use}(s)\supseteq {def}(s)\) are the variables used (read or written) in \(s\).

Then the following static conditions are sufficient to guarantee the three dynamic conditions in theorem 1:

Static check For all \(s, s', s''\in N\),

  1. 1.

    \(cl(s)=L\wedge cl(s')=H\implies s'\not \in BS(s)\)

  2. 2.

    \({MHP}(s, s')\wedge \bigl (\exists v: v\in {def}(s)\wedge v\in use(s')\bigr )\) \(\wedge ~ cl(s'')=L\implies s, s'\not \in BS(s'') \)

  3. 3.

    \({MHP}(s,s')\implies \bigl (cl(s)=H\vee cl(s')=H\bigr )\)

The first condition is also the basis for the sequential IFC [16], the second disallows data conflicts between low-observable operations, and the last demands that no two low-observable operations are in an order conflict. Thus, Zdancewic’s original idea, namely to disallow conflicts between publicly observable effects, is naturally implemented through slicing of concurrent programs.

Example 1

For the program in Fig. 1 right, we have \(cl(\mathtt {inputPIN()})=H\), \(cl(\mathtt {print(x)})= cl(\mathtt {print(2)})=L\). There are no explicit or implicit leaks, so rule 1 does not fire. But \({MHP}(\mathtt {print(x)}, \mathtt {print(2)})\), so there is an order conflict: . Indeed, rule 3 rejects the program immediately. Furthermore, \({MHP}(\mathtt {print(x)},\) \(\mathtt {x=1})\); thus, there is a data race concerning \(\mathtt {x}\). Since any PDG node is in its own backward slice, rule 2 rejects the example as well, where \(s'=s''=\mathtt {print(x)}\).

Example 2

Consider again Fig. 3. The programs in the right are accepted, as there are no explicit, implicit or probabilistic flows. Both programs contain termination channels which are, however, not discovered by the static check. The top left program contains an additional implicit flow (\(\mathtt {inputPIN()}\) \(\in BS(\mathtt {print("x")})\)) and thus is rejected by the static check – even though its behaviour is identical to Fig. 3 bottom right. But the check analyses program text, not program behaviour (and behaviour equivalence is undecidable). For the program bottom left, which again behaves identical to top left and bottom right, the static check discovers the same implicit flow.

Now remember that for this program—in contrast to the top left program—Definition 7 is fulfilled: it does not discover the termination leak. The static check is, however, a sound overapproximation of Definition 7: in rare cases, it rejects programs which are accepted by Definition 7. In particular, it rejects more termination leaks than Definition 7, but it never misses nontermination leaks. In Fig. 3, the static check thus discovers 2 of 4 termination leaks, while Definition 7 discovers only 1 of 4 termination leaks. The example indicates that the static check is, after all, “only 50% termination insensitive”; however, this number has not been validated in realistic experiments.

5.2 RLSOD: allowing noncritical conflicts

Often LSOD is criticized as it may prohibit useful nondeterminism (due to rules 2 and 3). For example, if several threads nondeterministically write to the same public file, LSOD as described so far will always prohibit such nondeterministic writes due to rule 3.

But note that a race or conflict is only harmful if there is a possible control flow from a high node to the conflicting nodes—otherwise the conflict cannot cause a leak. In particular, programs without high sources may contain arbitrary low determinism, and arbitrary high nondeterminism is allowed if it cannot influence low events. This important observation allows to prevent spurious rejection of useful nondeterminism.

The result is a criterion which, strictly speaking, cannot be called LSOD any more, as LSOD Definition 7 absolutely disallows any low nondeterminism. We call the new criterion RLSOD (relaxed LSOD). It is easy to implement: potential control flow can be checked in the threaded control flow graph (see Sect. 6). Thus, item 3 in the above static check now reads

  1. 3’.

    \(\bigl ({MHP}(s,s')\wedge \exists a: cl(a)=H\wedge ({flow}(a,s)\vee {flow}(a,s'))\bigr )\) \(\implies \bigl (cl(s)=H\vee cl(s')=H\bigr )\)

Examples

In Fig. 2 right, neither l=0 nor print(l) can be reached from high events, as both are initial in their thread. Thus, the example is RLSOD even though it is not LSOD (cmp. example 1 in Sect. 3.2). In Fig. 8 top, the inputPIN cannot influence the writes to the public debug file; hence, the nondeterminism present in the high-influenced parts is allowed. In Fig. 8 bottom, there is no high input, and hence, all low conflicts are allowed. Thus, both examples are RLSOD. Similar anti-LSOD examples from the literature are RLSOD likewise. Thus, the long-standing statement “LSOD prohibits useful nondeterminism” has lost its foundation.

Fig. 8
figure 8

The RLSOD criterion does not prohibit useful nondeterminism

Note, however, that low nondeterminism can be insecure—even if the RSLOD criterion (or any PN criterion) is satisfied—if the scheduler can be manipulated. A manipulated scheduler could, for example, read a high value before scheduling (e.g. h=0 or h=1), and then schedule low determinism (e.g. in l = 0 || l = 1;) in a way such that the high value is copied to a low variable. This is the reason why some authors disallow schedulers which read high data [42], while others favour scheduler-specific IFC and PN [19, 35] (see also Sect. 8). In the current work, the possibility of manipulated schedulers is ignored, and low nondeterminism is considered secure if in the program code it is not influenced by high events. We consider this approach consistent with the notion of “language-based” security.

6 Implementation

We assume that all PDG nodes \(n\in N\), as well as input/output streams, are annotated (classified) with a security level \(cl(n)\). It is enough to annotate inputs \(I\subseteq N\) and outputs \(O\subseteq N\) (also called sources and sinks), as the security level for intermediate nodes \(n\in N{\setminus }(I\cup O)\) can be determined by a fix-point iteration similar to data flow analysis on the PDG [16]. The analysis can handle arbitrary lattices of security levels, not just the two-element lattice \(L\sqsubseteq H\).

For the implementation, the PDG is enriched with data and order conflict edges. The result is called a CPDG (conflict-enriched program dependency graph).

Definition 9

(Data and order conflict edges) Let \(G=(N,\rightarrow )\) be a PDG. Let \(m,n\in N\) where \({MHP}(m,n)\). There is a data conflict edge \(m \rightarrow _{{dconf}} n\) to \(G\) if \(m\) defines a variable \(v\) that is used or defined by \(n\): \(\exists v: v\in {def}(m)\wedge v\in {use}(n)\). There is an order conflict edge \(m \leftrightarrow _{{oconf}} n\) to \(G\) if both nodes are classified as sources or sinks: \(m,n\in I\cup O\).

Example

Figure 9 shows the CPDG of the program on the right hand side of Fig. 1. The example assumes that y = inputPIN() is classified as a source of high data and print(x) and print(2) are classified as sinks of low data. The CPDG contains two order conflict edges, one between print(x) and print(2) and one between print(x) and y = inputPIN(), and three data conflict edges, from x = 0 to x = 1, from x = 1 to x = 0 and from x = 1 to print(x).

Fig. 9
figure 9

PDG of the program on the right side of Fig. 1, enriched with data and order conflict edges. The grey nodes denote the slice for node print(x). Note that the slice ignores conflict edges

Definition 10

(TCFG) A Threaded Control Flow Graph (TCFG) consists of the interprocedural CFGs for the individual threads, connected by fork and join edges.

A formal definition of TCFGs can be found in [8]. A path in the TCFG is written \(a\rightarrow ^*_{TCFG} b\) (or \({flow}(a,b)\) for short).Footnote 16 Once CPDG and TCFG have been constructed, the IFC checker proceeds as follows:

  1. 1.

    Compute \(BS(s)\) for every sink \(s\in O\).

  2. 2.

    If the backward traversal encounters a source \(i\in I\) where \(cl(i) \not \sqsubseteq cl(s)\), then the program may leak data of level \(cl(i)\) via explicit or implicit flow and is rejected. This criterion is also used in our sequential IFC [16].

  3. 3.

    If the traversal encounters an incoming data conflict \(m \rightarrow _{{dconf}} n\), the program may contain a probabilistic data channel and is rejected.

  4. 4.

    If the traversal encounters an order conflict \(m \leftrightarrow _{{oconf}} n\), check if the order conflict is low observable, i.e. \(cl(m)=cl(n)=L\). If so, the program may contain a probabilistic order channel and is rejected.

Example. Consider Fig. 9. The backward slice for print(2) encounters the order conflict edge between print(2) and print(x), so the program may contain a probabilistic order channel. The slice for print(x), highlighted grey in Fig. 9, encounters all data conflict edges, so the program may contain a probabilistic data channel as well, whereas its implicit and explicit flow is secure.

In order to allow noncritical conflicts (RLSOD criterion, see Sect. 5.2), we change item 4 in the algorithm as follows:

  1. 4’.

    If the traversal encounters an order conflict \(m \leftrightarrow _{{oconf}} n\), check if \(cl(m)=cl(n)=L\). If so, check in the TCFG if any node that can be executed before both conflicting nodes is a high source: \(\exists \ a\in N: cl(a)=H\wedge a\rightarrow ^*_{TCFG} m \wedge a\rightarrow ^*_{TCFG} n\). If so, the program may contain a probabilistic order channel and is rejected. If not, the conflict is not critical.

In the implementation, rules 2–4 are integrated into a backward I2P slicer. A time-sensitive slicer would be more precise, but would make the algorithm much more complex and expensive. In practice, I2P precision is often sufficient; time-sensitive slicing can have exponential runtime and thus should only be applied if the I2P approach produces false alarms.

figure a
figure b
figure c

Algorithms 1, 2 and 3 present detailed pseudocode. Algorithm 1 receives a CPDG in which sources and sinks are already classified and which already contains the order conflict edges, the corresponding TCFG and the security lattice in charge. It then runs a slicing-based check of the implicit and explicit flow (that is, it checks rule 2; in fact, the algorithm from [16] is used). If the program passes that check, it is scanned for probabilistic channels by checking rules 3 and 4. This is done by Algorithm 2.

Algorithm 2 receives the CPDG, the TCFG, the security lattice and a source or sink \(s\) of a certain security level \(l\). The algorithm first checks whether \(s\) is involved in a low-observable order conflict that can be preceded by a source of high data. This task is delegated to the auxiliary procedure benign in Algorithm 3. After that, it executes an extended I2P slicer which additionally checks if \(s\) is potentially influenced by a data conflict whose nodes can be preceded by a source of high data. This check is again delegated to procedure benign. The “phase 1” and “phase 2” in the I2P loop are just the two phases of the HRB slicer, which is inlined into the I2P algorithm.

Procedure benign implements rule 4. First it checks whether the given conflict is an order conflict and whether it is low observable, which it is if both conflicting nodes are visible to the attacker. To allow noncritical conflicts, it additionally checks if the conflicting nodes are preceded by a high source \(n\). This is the case if \(n\) reaches them on realizable paths in the TCFG or if it may happen in parallel to them.

Example 1. The sequential programs in Fig. 3 right are accepted by Algorithm 1, as there are no explicit, implicit or probabilistic flows. Both programs contain termination channels which, however, are not discovered by Algorithm 1. On the other hand, the left programs contain additional implicit flow and thus are rejected by Algorithm 1—even though their behaviour is identical to the bottom right program. This seeming inconsistency is a deliberate consequence of allowing termination leaks, as discussed at length in Sect. 2.

Example 2. Consider Fig. 9. Algorithm 1 passes the flow call successfully (no implicit or explicit flows, as checked by sequential IFC for every thread). It then calls Algorithm 2 for the high source y = inputPIN() and for the two low sinks print(x) and print(2). Tracing the last call, Algorithm 2 sees the order conflict between print(2) and print(x); hence, it calls Algorithm 3. The latter discovers visibility of the conflict in the second main phrase of the first if, which prevents the conflict to be benign—Algorithm 2 immediately returns false. If we nevertheless trace the algorithm a little further, the worklist for the I2P slicer is initialized with print(x). In the first iteration, the for loop discovers \(m=\) x=0; and \(m=\) x=1; to be in immediate data conflict with \(n=\) print(x). Algorithm 3 discovers that a source of high data, namely y=inputPIN(); can reach print(x); hence, the data conflict is harmful. The example shows that the RLSOD check is terminated as soon as a leak is found; it can also be modified to return a list of all leaks, where a PDG path is given for every leak.

7 Evaluation

The above algorithms have been implemented for full Java and integrated into JOANA. To our knowledge, no other evaluations of LSOD or PN precision or scalability have been published; hence, we cannot compare our implementation to other algorithms.

Fig. 10
figure 10

Example from Smith and Volpano [44]

7.1 Precision

To assess precision, we analysed examples from the literature. Our first example is from [44] (Fig. 10). The program contains a possibilistic leak: if the input PIN is less than twice the value of variable mask, then PIN’s value is copied bit-wise into result and printed (provided that scheduling is starvation free). Using JOANA, we classified PIN = Integer.parseInt(args[0]) as a high source and System. out.println(result) as a low sink. No other classifications were necessary. Algorithm 1 detected an order conflict which depends on high data: the assignments to result in threads Alpha and Beta are conflicting, and the outcome of the conflict is influenced by the values of trigger0 and trigger1, which in turn are changed dependent on PIN’s value in thread Gamma. Thus, this program is rejected by RLSOD.

Our second example in Fig. 11 is from [31]. The program is probabilistic noninterferent, which is, however, difficult to discover. The program manages a stock portfolio of Euro Stoxx 50 entries. The portfolio data, pfNames and pfNums, are secret; hence, neither the Euro Stoxx request by EuroStoxx50, nor the final message sent to a commercials provider may contain any information about the portfolio. Indeed, Portfolio and EuroStoxx50 do not interfere; thus, the Euro Stoxx request does not leak information about the portfolio. The message sent to the commercials provider is not influenced by the values of the portfolio, either, because there is no explicit or implicit flow from the secret portfolio values to the sent message. Furthermore, the two outputs have a fixed relative ordering, as EuroStoxx50 is joined before Output is started. Hence, the program should be considered secure.

Fig. 11
figure 11

Example from Mantel et al [31], converted to Java. For brevity, some methods are not shown

We classified the two statements reading the portfolio from storage, pfNames = getPFNames() and pfNums = getPFNums(), as high sources and the output flushes nwOutBuf in EuroStoxx50 and at the end of main as low sinks; other classifications were not necessary. The challenge of this program is to detect that EuroStoxx50 is joined before nwOutBuf is flushed in the main procedure, because otherwise it cannot be determined that the two flushes of nwOutBuf have a fixed execution order. And then the program would have to be rejected because the resulting order conflict is influenced by both sources.

Our MHP analysis, together with context-sensitive points-to analysis, was able to detect that the joins of the threads are must-joins, which enabled the RLSOD algorithm to identify that there is no order conflict between the two flushes of nwOutBuf (cmp. Definitions 8, 9), which in turn avoided false alarms in the “benign” check. Therefore, no probabilistic channel was reported. To our knowledge, algorithm 1 is the first implementation of LSOD or PN, which was precise enough to verify that this example is noninterferent.

More case studies can be found in [8].

7.2 Runtime behaviour

We applied our algorithm to a benchmark of 8 programs between 200 and 3000 LOC (taken from the Bandera benchmark and the JavaGrande benchmark). We used three different security lattices, called A, B, C, with 3, resp. 22, resp. 254 elements. For each program and lattice, we randomly chose 10 sources and 10 sinks, then 33 sources and 33 sinks, and finally 100 sources and 100 sinks of random security levels and analysed the classified programs. We measured the total execution times, and separate execution times of the scan for probabilistic channels and of the scan for explicit or implicit flow. Every test was run ten times.

Table 1 Average execution times of algorithm 1 for different programs, lattices and numbers of sources and sinks (in seconds)

Table 1 shows the average execution times (measured on a rather old standard PC). It contains one row for each combination of program and lattice, i.e. row “LG + A” contains the results for program “LaplaceGrid” and lattice A. The numbers reveal that the most important factor influencing the runtime behaviour is, besides the sheer size of the program, the number of sources and sinks. With an increasing number of sources and sinks the size of lattice C eventually became the dominating cost factor. A more detailed analysis reveals that the sequential IFC check in algorithm 1 needs about the same execution time as the additional probabilistic checks.

Table 2 The percentage share of the probabilistic channel detection among the overall execution times

Table 2 shows the percentage share of the probabilistic channel detection among the overall execution times. The remaining time was consumed by the algorithm of Hammer et al. [16], which is employed for verifying the explicit and implicit flow. The results show that the two checks were similarly fast. However, the probabilistic check seems to decline in performance for large lattices, compared with Hammer et al.’s algorithm, which is consistent with the above-mentioned performance sensitivity for very large lattices.

8 Discussion and related work

In the following, we compare our definition of LSOD with PN and LSOD definitions from the literature.

8.1 Weak probabilistic noninterference

Smith and Volpano’s weak probabilistic noninterference (WPN) property [43, 48] enforces probabilistic noninterference via weak probabilistic bisimulation. A program is WPN if for each pair of low-equivalent inputs, each sequence of low-observable events caused by one input can be caused by the other input with the same probability.

WPN addresses explicit and implicit flow and probabilistic channels. Like in our analysis, timing channels and termination channels are excluded (which permits the probabilistic bisimulation to be weak). This renders their interpretation of low-observable behaviour very similar to ours: It consists of a sequence of low-observable events, but lacks information about the time at which such an event occurs. The major difference concerning low-equivalent behaviour is that their definition disallows low-observable events to be delayed infinitely in one low-observable behaviour and to be executed in the other. Thus, WPN is stricter with respect to termination channels and only permits the sheer termination of the program to differ.

Fig. 12
figure 12

Two examples comparing LSOD and WPN. We assume that Smith and Volpano’s technique classifies variables h and x as high and l as low and that our technique classifies h = inputPIN() as a high input and l = 0 and l = 1 as low output. The left program is accepted by our condition and rejected by theirs, the right program is rejected by ours and accepted by theirs

WPN globally partitions the program variables into high and low, and the attacker is able to see all low variables at any time. In contrast, our attacker can only see low operations once they are executed. In particular, in our flow-sensitive approach the same variable can be low at one program point or high at another, dependent on the context.

Thus, the WPN attacker is generally more powerful than ours, because we assume that only low operations/events and their low operands are visible to the attacker. The price to be paid is flow insensitivity, resulting in strange false alarms, as we saw in, e.g. Fig. 2. It depends on the application context which attacker model is more realistic.

Smith and Volpano’s security-type system lacks a detection of conflicts. Probabilistic channels are prevented by forbidding assignments to low variables sequentially behind conditional structures, which is very restrictive. The program on the left side of Fig. 12 is rejected by WPN, because the running time of the if-structure depends on high data and is followed sequentially by l = 0. However, it does not contain a probabilistic channel because l = 0 is not involved in an order conflict or influenced by a data conflict. It therefore satisfies RLSOD.

WPN assumes that a single statement has a fixed running time. Thus, programs like in Fig. 12 (right) are accepted by WPN, because the branches of the if-structure have equal length, and thus, different values of h do not alter the probabilities of the possible ways of interleaving of l = 0 and l = 1. We explicitly aim to reject such programs, arguing that different running times of h = inputPIN() could already cause a probabilistic channel, and our security constraint rejects the program because of the data conflict between l = 0 and l = 1.

Smith and Volpano’s security-type system is restricted to probabilistic schedulers and breaks, for example, in the presence of a round-robin scheduler [43]. (R)LSOD holds for every scheduler.

8.2 Strong security

Sabelfeld and Sands’ security property strong security [42] addresses implicit and explicit flow, probabilistic channels and termination channels. It enforces probabilistic noninterference for all schedulers whose decisions are not influenced by high data. It makes the following requirements to a program \(p\) and all possible pairs \((t,u)\) of low-equivalent inputs: Let \(\mathfrak {T}\) and \(\mathfrak {U}\) be the set of possible program runs resulting from \(t\) and \(u\). For every \(T \in \mathfrak {T}\), there must exist a low-equivalent program run \(U \in \mathfrak {U}\).

Even though it looks like a possibilistic property, strong security is capable of preventing probabilistic channels, the trick being the definition of low-equivalent program runs: Two program runs are low equivalent if they have the same number of threads and they produce the same low-observable events and create or kill the same number of threads at each step under any scheduler whose decisions are not influenced by high data. This “lockstep execution” requirement allows ignoring the concrete scheduling strategy.

The attacker sees all low variables at all times and is aware of program termination. The values of the low variables, their changes over time and the termination behaviour constitute the low-observable behaviour. Strong security assumes that the attacker is not able to see which statement is responsible for a low-observable event and is designed to identify whether two syntactically different subprograms have equivalent low-observable effects. This makes it possible to identify programs like Fig. 13 (left) as secure. Even though the assignments to the low variable l are influenced by high data via implicit flow, strong security states that the low-observable behaviour is not, because both branches lead to 0 being assigned to l. Our algorithm is not able to recognize this program as secure, the same holds for WPN.

Smith/Volpano and we assume that the attacker cannot exploit termination channels and is able to identify statements responsible for low-observable events, which is seen contrarily by Sabelfeld and Sands.

The requirement of lock-step execution implies that strongly secure programs can be combined sequentially or in parallel to a new strongly secure program (compositionality). Sabelfeld [41] has proven that strong security is the least restrictive security property that provides this degree of compositionality and scheduler-independence. Its compositionality is its outstanding property and an advantage over our LSOD. However, lockstep execution imposes serious restrictions to programs.

Furthermore, the restriction to schedulers which do not touch high data means that any information possibly used by the scheduler, for example the mere number of existing threads, must be classified as low. This in turn means that the classification of a program becomes scheduler dependent, so the scheduler independence of strong security is bought by making the classification scheduler dependent. This allows breaking strong security, by running the program under a scheduler for which the attacker knows the classification of the program to be inappropriate ([42], Sect. 4.1).

Fig. 13
figure 13

Two examples demonstrating the capabilities of strong security. We assume that h and x are classified as high and l as low. The left program is strongly secure, because both branches assign the same value to l. The right program is a transformation of the program on the left of Fig. 12, where the additional skip statement removes the probabilistic data channel

8.3 LSOD by Zdancewic and Myers

Inspired by Roscoe’s earlier work [39], Zdancewic and Myers [53] pointed out that conflicts are a necessary condition of probabilistic channels. They suggested combining a security-type system for implicit and explicit flow with a conflict analysis, arguing that programs without conflicts have no probabilistic channels.

The authors exclude termination channels and probabilistic order channels and justify that by confining the attacker to be a program itself (e.g. a thread). Such an attacker is not able to observe the relative order of low-observable events, because such an observation requires a probabilistic data channel in which the differing relative orders manifest.

The authors apply the approach to languages with message passing and shared memory. The language provides linear communication channels that are used for transmitting exactly one message and thus guarantee conflict-free communication. They present a security-type system for a concurrent calculus \(\lambda ^{PAR}_{SEC}\), that verifies confidentiality of implicit and explicit flow, and verifies that linear channels are used exactly once. The type system guarantees that well-typed programs are LSOD if they are additionally free of data conflicts. Later, Terauchi provided an improved type system for LSOD [47] which does not require confluence of the checked program.

It is interesting to compare their notion of low-equivalent traces with ours. Technically, they use a weaker notion of low-equivalence. They do not demand that two traces are low equivalent as in our Definition 6 (finite case); they only demand that the projections of the traces onto all individual variables (“location traces”) are low equivalent. This implies that low variables undergo the same changes in low-equivalent traces, but not necessarily at the same time. That is, more traces are low equivalent than in our definition—which should also reduce false alarms. Unfortunately, this approach is not flow- sensitive. Consider again Fig. 2 right: l is globally classified low; thus, any location trace for l has the form \(\langle l=0,l=0, l=0, l=h\rangle \). As h depends on inputPIN, these location traces are not low equivalent for different high inputs, causing a false alarm. In general, the “projection trick” and global variable classification may abstract away from statement order, implying flow insensitivity. In our analysis, l=h in Fig. 2 is not low observable (see discussion in Sect. 3.2); hence, the example is RLSOD even though our Definition 6 demands more than Zdancewic’s.

Zdancewic and Myers’ attacker is weaker than ours, as probabilistic order channels are excluded. It is explicitly designed to tackle malicious threads spying out confidential information. Their requirement that programs are completely free of data conflicts (which is much stricter than ours) in practice prevents an application to languages with shared memory, because many programs contain data conflicts, but many such conflicts do not influence low-observable behaviour. We have provided examples of such programs in Fig. 8.

8.4 LSOD by Huisman et al.

Huisman et al. [20] pointed out that Zdancewic and Myers’ method contains a leak, because its definition of low-equivalent program runs is restricted to the length of the shorter run. They close that leak by strengthening the definition of low-equivalent program runs: assignments to low variables sequentially behind loops iterating over high data are forbidden.

Closing termination channels additionally requires that either both program runs terminate or none of them. They also discover probabilistic order channels, by extending location traces to the set \(L\) of low variables: In that case two program runs \(T\) and \(U\) are low equivalent if the set of low variables in \(T\) and \(U\) undergoes the same sequence of changes in both runs up to stuttering.

The authors formalized their different security properties via temporal logics, for which the model-checking problem is decidable if the program in question can be expressed by a finite state machine. This permits a very precise detection of relevant data conflicts such that total freedom of data conflicts is not required. Hence, their approach can be applied to languages with shared memory communication.

Closing termination channels has the effect that Huisman’s approach is more restrictive than ours. It forbids low-observable events sequentially behind loops iterating over high data. Furthermore, the optional treatment of probabilistic order channels imposes severe restrictions on the analysed programs. As in WPN, each assignment to a low variable is regarded as a low-observable event. Huisman thus requires that two low-equivalent program runs must make the same sequence of changes to low variables. Even if two threads work on completely unrelated low variables, the assignments to these variables must have a fixed interleaving order.

In more recent work, Huisman et al. [19, 35] introduced \(\delta \) -specific observational determinism. Since the then-known LSOD criteria all turned out to be either too restrictive or unsound, [19] explicitly reintroduces the scheduling policy \(\delta \) into the definition and demands low equivalency of traces only for traces under the same scheduling policy. Thereby, some restrictions on the original approach can be relaxed, while the model-checking approach can be maintained. \(\delta \)-specific observational determinism is explicitly scheduler dependent; like WPN, it is flow insensitive and assumes that variables are globally classified.

Thus, Huisman et al. depart again from scheduler independence, because they consider it to be too problematic. In contrast, we consider flow insensitivity to be the troublemaker, as our flow-sensitive LSOD is scheduler independent, sound, precise, and has demonstrated practical applicability. It remains to be seen how the discussion about PN and LSOD will eventually evolve.

8.5 Low distinguishability by Mantel, Sands, and Sudbrock

In a recent approach to compositional noninterference for concurrent programs, Mantel, Sands and Sudbrock [28] employ a flow-sensitive security-type system [21] and MHP to establish SIFUM-security. Threads are dynamically annotated with “modes”, i.e. assumptions and guarantees about variable read and write behaviour in program threads. SIFUM then defines low-distinguishability (which is similar to noninterference) via strong low bisimulation modulo modes. A soundness proof is provided.

Mantel, Sands and Sudbrock certainly improve earlier definitions by exploiting mode information for variables, and by being flow-sensitive. Furthermore, compositionality is an asset not yet provided by PDGs. However, [28] is restricted to a fixed number of threads and does not consider context-, object-, or time-sensitivity; evaluations of an implementation have not yet been reported.

Recently, Sudbrock has shown that for intraprocedural IFC, PDGs and flow-sensitive type systems such as [21] have exactly the same precision [30]. But note that all known type systems for interprocedural IFC are—in contrast to PDGs—not context- or object-sensitive.

8.6 LSOD for X10 by Chong and Muller

Recently, Chong and Muller presented an LSOD implementation for the language X10 [32]. X10 is designed for PGAS (Partitioned Global Address Space) hardware architectures, where hardware resources are partitioned into so-called tiles. In X10, tiles ared modelled by so-called places, and using the at command, computations can be started at remote tiles.

Muller and Chong [32] assumes that every tile may have a different security level, but all data on a tile have the same security level. They present an operational semantics for a relevant X10 subset, together with a security-type system which guarantees LSOD. The approach has been implemented, but empirical data on scalability and precision have not been published. It would be interesting to empirically compare their implementation to JOANA.Footnote 17

9 Future work

Let us briefly mention some topics in ongoing and future work.

Time sensitivity. Recently, an optional time-sensitive slicer was integrated. We also added an experimental autotuning facility, which automatically switches between variants of pointer analysis (flow-sensitive, context-sensitive) and slicer (time-sensitive and I2P variants). Autotuning is applied to a benchmark of secure programs, until a setup without false alarms has been found. However, more algorithm engineering is needed to balance precision against scalability.

Lock sensitivity. As described in Sect. 2, MHP analysis is crucial for PDG precision and hence for overall IFC precision. The current MHP analysis, however, does not analyse explicit locks in the program. The latter property is called lock sensitivity and has been explored in [5, 6] in the scope of Dynamic Pushdown Networks. We recently integrated this analysis. Preliminary experiments indicate that MHP indeed becomes more precise, as more interference edges are pruned [13].

Declassification. We currently do not provide a declassification mechanism for probabilistic channels, only for sequential channels (see [16] for details). Instead of declassifying probabilistic channels, we consider Zdancewic and Myers’ idea of using linear channels for deterministic communication between threads [53] more promising. Linear channels can be integrated in form of a library into languages with shared memory. We have recently added a preliminary implementation of such a library.

Compositionality. As explained in Sect. 4.2, the PDG of a complete system cannot be obtained by just combining PDGs of subsystems. We recently investigated mechanisms to overcome this drawback: subsystems can be analysed in isolation and later plugged into larger systems. However, plug-in of local PDGs is a nontrivial operation, which may require secondary fix-point iteration until global dependencies stabilize.

Machine-checked proofs. It is our long-term goal to formalize our RLSOD check in Isabelle and provide a machine-checked proof for Theorem 1; just as we have provided machine-checked soundness proofs for the sequential (interprocedural) PDG-based IFC [50, 51].

Evaluation and comparison. JOANA’s RLSOD implementation needs to be applied in larger case studies and will be compared with other IFC tools.

10 Conclusion

We presented a new method for information flow control in concurrent programs. The method guarantees probabilistic noninterference and is based on a new variant—named RLSOD—of low-security observational determinism. It turns out that RLSOD can be naturally implemented through slicing algorithms for concurrent programs, which are flow-sensitive, object-sensitive, context-sensitive, and time-sensitive. We also demonstrated how RLSOD fixes some weaknesses of earlier LSOD definitions. In particular, secure low nondeterminism is allowed by RLSOD. In essence, we demonstrated that flow sensitivity is the key to maintain soundness and precision in IFC for multi-threaded programs.

Our implementation can handle full Java with an arbitrary number of threads. It was applied to examples from the literature and a small benchmark; preliminary experience indicates high precision and scalability for medium-sized programs. Future work will explore declassification for probabilistic channels; we also aim at a machine-checked version of the soundness proof.

Our current work is part of a long-standing project, which exploits modern program analysis for software security. The current work demonstrates that IFC analysis of concurrent programs can indeed be improved by applying PDGs and MHP analysis, resulting in algorithms with considerably enhanced precision and scalability.