Abstract
Quantitative information flow is a rigorous approach for evaluating the security of a system. It quantifies the amount of secret information leaked to the public outputs. In this paper, we propose an automated approach for quantitative information flow analysis of concurrent programs. Markovian processes are used to model the behavior of these programs. To this end, we assume that the attacker is capable of observing the internal behavior of the program and propose an equivalence relation, back-bisimulation, to capture the attacker’s view of the program behavior. A partition refinement algorithm is developed to construct the back-bisimulation quotient of the program model, and then a quantification method is proposed for computing the information leakage using the quotient. Finally, an anonymous protocol, dining cryptographers, is analyzed as a case study to show the applicability and scalability of the proposed approach.
1 Introduction
Secure information flow is a rigorous technique for evaluating the security of a system. A system satisfies confidentiality requirements if it does not leak any secret information to its public outputs. However, imposing a no-leakage policy is too restrictive, and in practice the security policy of a system tends to permit minor leakages. For example, a password checking program leaks information about what the password is not when it shows a message indicating that the user has entered a wrong password. Quantitative information flow has been a well-established attempt to overcome this deficiency. Given a system with secret (high confidentiality) inputs and public (low confidentiality) outputs, quantitative information flow addresses the problem of measuring the amount of information leakage, i.e., the amount of information that an attacker can deduce about the secret inputs by observing the outputs. Quantitative information flow is widely used in analyzing timing attacks [23, 24], differential privacy [1], anonymity protocols [13, 25, 29, 30], and cryptographic algorithms [18, 19].
Assume a program with a secret input and a public output. Furthermore, assume an attacker that executes the program and observes the public output. A common approach for measuring the amount of leaked information is to use the notion of uncertainty [31]. Before executing the program, the attacker has an initial uncertainty about the secret, which is determined by her prior knowledge of the secret. After executing the program and observing the output, she may infer information about the secret and thus her uncertainty may be reduced. This yields the following intuitive definition of the information leakage [31]: leaked information = initial uncertainty - remaining uncertainty.
In this paper, a practical and automated formal approach is proposed to quantify the information leakage of terminating concurrent programs. The approach considers leakages in intermediate states of the program executions, as well as the effect of the scheduling policy.
We assume the program has a secret input h, a public output l, and zero or more neutral variables. Neutral variables specify temporary and/or auxiliary components of the runtime program configuration that do not belong to a certain confidentiality level by nature, e.g., the stack pointer and loop indexes. h is fixed and does not change during program executions. This is the case in any analysis in the context of confidentiality that assumes data integrity to be out of scope, e.g., [2, 7]. We also assume that the public and neutral variables have single initial values. Furthermore, a probabilistic attacker is supposed, who has full knowledge of the source code of the concurrent program and is able to choose a scheduler and execute the program under the control of that scheduler. She can observe the sequences of values of l during the executions, called execution traces. We also assume that the attacker can execute the program an arbitrary number of times and can then guess the value of h in a single attempt. This is called the one-try guessing model [31].
In order to model the concurrent program, Markov decision processes (MDPs) are used. MDPs provide a powerful state transition system, capable of modeling probabilistic and nondeterministic behaviors [28]. The scheduler is assumed to be probabilistic, resolving the nondeterminism in the MDP and inducing a Markov chain (MC). States of the MC contain values of h, l, and any neutral variables. For computing the leakage, however, the MC should capture the attacker’s view of the program. The attacker, while executing the program and observing the execution traces, does not know the exact value of h in each step. She can only guess a set of possible values based on the executed program statements and the observed traces. She also cannot distinguish those executions of the MC that have the same trace. In this regard, we define an equivalence relation for a given MC, called back-bisimulation, to specify these requirements of the threat model. Back-bisimulation induces a quotient which models the attacker’s view of the program. A partition-refinement algorithm is proposed to compute the back-bisimulation quotient.
Each state of the back-bisimulation quotient contains a secret distribution, which shows possible values of h in that state, and thus is a determiner of the attacker’s uncertainty about h. Each execution trace of the quotient shows a reduction of the attacker’s uncertainty from the initial state to the final state of the trace. Therefore, secret distribution in the initial state of the quotient determines the attacker’s initial uncertainty and secret distributions in the final states determine the remaining uncertainty. In the literature, uncertainty is measured based on the notion of entropy. The entropy of h expresses the difficulty for an attacker to discover its value. Based on the program model and the attacker’s observational power, various definitions of entropy have been proposed. As Smith [31] shows, in the context of one-try guessing model, uncertainty about a random variable should be defined in terms of Renyi’s min-entropy. This yields that the information leakage is computed as the difference of the Renyi’s min-entropy of h in the initial state of the quotient and the expected value of the Renyi’s min-entropy of h in the final states of the quotient.
We also show a subclass of MCs, called Markov chains with pseudo-back-bisimilar states, in which back-bisimulation cannot correctly construct the attacker’s view of the program behavior. Extending back-bisimulation to handle this situation is left as future work. Briefly, our contributions include
-
proposing back-bisimulation equivalence, in order to capture the attacker’s observation of the program,
-
developing an algorithm to compute back-bisimulation quotient of an MC,
-
proposing a method to compute the leakage of a concurrent program from the back-bisimulation quotient, and
-
analyzing the dining cryptographers problem.
1.1 Paper Outline
The paper proceeds as follows. Section 2 provides the core background on basic notions, information theory, Markovian models, and probabilistic schedulers. Section 3 presents the proposed approach. It starts by introducing the program and threat models. It then formally defines back-bisimulation and discusses how to compute the program leakage. Finally, it describes how to construct the attacker’s view of the program model, the back-bisimulation quotient. Section 4 concludes the paper and proposes future work. Finally, the case study and related work are discussed in Appendix A and Appendix B, respectively.
2 Background
In this section, we provide preliminary concepts and notations required for the proposed approach.
2.1 Basics
A probability distribution \( Pr \) over a set \(\mathcal {X}\) is a function \( Pr : \mathcal {X} \rightarrow [0,1]\), such that \(\sum _{x \in \mathcal {X}} Pr (x) = 1\). We denote the set of all probability distributions over \(\mathcal {X}\) by \(\mathcal {D}(\mathcal {X})\).
Let S be a set and \(\mathcal {R}\) an equivalence relation on S. For \(s \in S\), \([s]_\mathcal {R}\) denotes the equivalence class of s under \(\mathcal {R}\), i.e., \([s]_\mathcal {R} = \{ s^\prime \in S \ | \ s \mathcal {R} s^\prime \}\). Note that for \(s^\prime \in [s]_\mathcal {R}\) we have \([s^\prime ]_\mathcal {R} = [s]_\mathcal {R}\). The set \([s]_\mathcal {R}\) is often referred to as the \(\mathcal {R}\)-equivalence class of s. The quotient space of S under \(\mathcal {R}\), denoted by \(S/\mathcal {R} = \{ [s]_\mathcal {R} \ | \ s \in S \}\), is the set consisting of all \(\mathcal {R}\)-equivalence classes. A partition for S is a set \(\varPi = \{B_1, \ldots , B_k \}\) such that \(B_i \ne \varnothing \) (for \(0 < i \le k\)), \(B_i \cap B_j = \varnothing \) (for \(0< i < j \le k\)) and \(S = \cup _{0 < i \le k} B_i\). \(B_i \in \varPi \) is called a block. \(C \subseteq S\) is a superblock of \(\varPi \) if \(C = B_{i_1} \cup \dots \cup B_{i_l}\) for some \(B_{i_1}, \ldots , B_{i_l} \in \varPi \). Note that for equivalence relation \(\mathcal {R}\) on S, the quotient space \(S/\mathcal {R}\) is a partition for S.
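As a small illustration (the helper is ours, not from the paper), a quotient space can be computed by simple grouping when the equivalence relation is given as a key function, so that two elements are related iff their keys agree:

```python
from collections import defaultdict

def quotient(S, key):
    """Partition S into equivalence classes [s] = {s' | key(s') == key(s)}."""
    blocks = defaultdict(set)
    for s in S:
        blocks[key(s)].add(s)
    return list(blocks.values())

# Example: integers 0..7 under "same remainder mod 3" yields the
# partition {0,3,6}, {1,4,7}, {2,5}.
partition = quotient(range(8), key=lambda s: s % 3)
```

Each returned block is nonempty, the blocks are pairwise disjoint, and their union is S, as required of a partition.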
2.2 Information Theory
Let \(\textrm{X}\) denote a random variable with the finite set of values \(\mathcal {X}\). The vulnerability [31] of \(\textrm{X}\) is defined as \( Vul (\textrm{X}) = \underset{x \in \mathcal {X}}{\max } \ Pr(\textrm{X}=x)\), i.e., the highest probability of correctly guessing the value of \(\textrm{X}\) in a single attempt. In order to quantify information leaks, this probability is converted into bits using Renyi’s min-entropy [31].
Definition 1
The Renyi’s min-entropy of a random variable \(\textrm{X}\) is given by \(\mathcal {H}_\infty (\textrm{X}) = - \log _2 \ Vul (\textrm{X})\).
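For illustration, both notions are one-liners over a distribution represented as a dictionary (a representation chosen by us, not the paper):

```python
import math

def vulnerability(dist):
    """Vul(X): highest probability of any single value of X."""
    return max(dist.values())

def min_entropy(dist):
    """Renyi's min-entropy H_inf(X) = -log2 Vul(X), in bits."""
    return -math.log2(vulnerability(dist))

# A uniform 2-bit secret has Vul = 1/4, hence 2 bits of min-entropy.
uniform = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
```

The skewed prior from Sect. 3.1, \(\{0 \mapsto 0.1, 2 \mapsto 0.6, 3 \mapsto 0.3\}\), has vulnerability 0.6 and thus min-entropy \(-\log_2 0.6 \approx 0.74\) bits.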
2.3 Markovian Models
We use Markov decision processes (MDPs) to model operational semantics of concurrent programs. MDPs are state transition systems that permit both probabilistic and nondeterministic choices [28]. In any state of an MDP, a nondeterministic choice between probability distributions exists. Once a probability distribution is chosen nondeterministically, the next state is selected in a probabilistic manner. Nondeterminism is used to model concurrency between threads by means of interleaving, i.e., all possible choices of the threads are considered. Formally,
Definition 2
A Markov decision process (MDP) is defined as a tuple \(\mathcal {M}=(S, Act , {\textbf {P}}, \zeta , AP , V)\) where,
-
S is a set of states,
-
\( Act \) is a set of actions,
-
\({\textbf {P}} : S \rightarrow ( Act \rightarrow (S \rightarrow [0,1]))\) is a transition probability function such that for all states \(s \in S\) and actions \(\alpha \in Act \), \({\sum }_{s^\prime \in S} {\textbf {P}}(s)(\alpha )(s^\prime ) \in \{0,1\}\),
-
\(\zeta :S \rightarrow [0,1]\) is an initial distribution such that \({\sum }_{s \in S} \zeta (s) = 1\),
-
\( AP \) is a set of atomic propositions,
-
\(V:S \rightarrow AP \) is a labeling function.
Atomic propositions represent simple known facts about the states. The function V labels each state with atomic propositions. An MDP \(\mathcal {M}\) is called finite if S, \( Act \), and AP are finite. An action \(\alpha \) is enabled in state s iff \({\sum }_{s^\prime \in S} {\textbf {P}}(s)(\alpha )(s^\prime ) = 1\). Let \( Act (s)\) denote the set of enabled actions in s. Each state \(s^\prime \) for which \({\textbf {P}}(s)(\alpha )(s^\prime ) > 0\) is called an \(\alpha \)-successor of s. The set of \(\alpha \)-successors of s is denoted by \( Succ (s, \alpha )\). The set of successors of s is defined as \( Succ (s) = \underset{\alpha \in Act (s)}{\cup } Succ (s, \alpha )\). The set of successors of a set of states \(\mathcal {S}\) is defined as \( Succ (\mathcal {S}) = \underset{s \in \mathcal {S}}{\cup } Succ (s)\). The set of predecessors of s is defined as \( Pre (s)=\{ s^\prime \in S \ | \ s \in Succ (s^\prime ) \}\). The set of labels that are associated with the predecessors of s is defined as \( PreLabels (s) = \{ V(s^\prime ) \ | \ s^\prime \in Pre (s), s^\prime \ne s \}\).
MDP Semantics. The intuitive operational behavior of an MDP \(\mathcal {M}\) is as follows. At the beginning, an initial state \(s_0\) is probabilistically chosen such that \(\zeta (s_0) > 0\). Assuming that \(\mathcal {M}\) is in state s, first a nondeterministic choice between the enabled actions needs to be resolved. Suppose action \(\alpha \in Act (s)\) is selected. Then, one of the \(\alpha \)-successors of s is selected probabilistically according to the transition function \({\textbf {P}}\). That is, with probability \({\textbf {P}}(s)(\alpha )(s^\prime )\) the next state is \(s^\prime \).
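The two-phase semantics (a nondeterministic action choice, then a probabilistic successor choice) can be sketched as a single step function. The toy MDP, its state and action names, and the nested-dictionary layout are illustrative assumptions of ours:

```python
import random

# Toy MDP fragment: P[s][a][t] gives the transition probability
# P(s)(a)(t). States, actions and numbers are illustrative only.
P = {
    's0': {'alpha': {'s1': 1.0}, 'beta': {'s2': 1.0}},
    's1': {'beta':  {'s3': 1.0}},
    's2': {'alpha': {'s3': 1.0}},
    's3': {'tau':   {'s3': 1.0}},   # self-loop: final state
}

def step(s, choose_action, rng=random):
    """One step of MDP semantics: resolve the nondeterministic choice
    with `choose_action`, then sample the successor probabilistically."""
    a = choose_action(s, list(P[s]))             # nondeterministic choice
    succs, probs = zip(*P[s][a].items())
    return rng.choices(succs, weights=probs)[0]  # probabilistic choice
```

Here `choose_action` plays the role of the scheduler introduced in Sect. 2.4: supplying a fixed policy for it turns the nondeterministic system into a purely probabilistic one.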
Initial and Final States. The states s with \(\zeta (s) > 0\) are considered as the initial states. The set of initial states of \(\mathcal {M}\) is denoted by \( Init (\mathcal {M})\). To ensure \(\mathcal {M}\) is non-blocking, we add a self-loop to each state s that has no successor, i.e., \({\textbf {P}}(s)(\tau )(s) = 1\). The distinguished action label \(\tau \) is used to show that the self-loop’s action is not of further interest. Then, a state s is called final if \( Succ (s) = \{s\}\). In the literature, these states are called absorbing [3]. We call them final, because in our program model they indicate termination of the program. The set of final states of \(\mathcal {M}\) is denoted by \( final (\mathcal {M})\).
Execution Paths. Alternating sequences of states that may arise by resolving both nondeterministic and probabilistic choices in an arbitrary MDP \(\mathcal {M}\) are called (execution) paths. More precisely, a finite path fragment \(\hat{\sigma }\) of \(\mathcal {M}\) is a finite state sequence \(s_0s_1\ldots s_n\) such that \(s_i \in Succ (s_{i-1})\) for all \(0<i\le n\). A path \(\sigma \) is an infinite state sequence \(s_0 s_1 \ldots s_{n-1} s_n^\omega \) such that \(s_0 \in Init (\mathcal {M})\), \(s_i \in Succ (s_{i-1})\) for all \(0<i\le n\), and \(s_n \in final (\mathcal {M})\), where \(s_n^\omega \) denotes the infinite repetition of \(s_n\). The final state of \(\sigma \), i.e. \(s_n\), is given by \( final (\sigma )\). The set of execution paths of \(\mathcal {M}\) is denoted by \( Paths (\mathcal {M})\). The set of finite path fragments starting in s and ending in \(s^\prime \) is denoted by \( PathFrags (s,s^\prime )\).
Traces and Trace Fragments. A trace of an execution path is the sequence of atomic propositions of the states of the path. Formally, the trace of a finite path fragment \(\hat{\sigma } = s_0s_1 \ldots s_n\) is defined as \(\hat{T} = trace (\hat{\sigma }) = V(s_0) V(s_1) \ldots V(s_n)\). For a path \(\sigma = s_0s_1\ldots \), \(trace_{\ll i}(\sigma )\) is defined as the prefix of \( trace (\sigma )\) up to index i, i.e., \(trace_{\ll i}(\sigma ) = V(s_0) V(s_1) \ldots V(s_i)\). Let \( Paths (T)\) be the set of paths that have the trace T, i.e., \( Paths (T) = \{ \sigma \in Paths (\mathcal {M}) \ | \ trace (\sigma ) = T \}\). We define \( final ( Paths (T))\) to denote the set of final states that result from the trace T, i.e., \( final ( Paths (T)) = \{ final (\sigma ) \ | \ \sigma \in Paths (T) \}\).
MDPs are suitable for modeling concurrent programs, but since they contain nondeterministic choices, they are too abstract to implement. We need to resolve these nondeterministic choices into probabilistic ones. The result is a Markov chain, which contains neither actions nor nondeterminism.
Definition 3
A (discrete-time) Markov chain (MC) is a tuple \(\mathcal {M}=(S, {\textbf {P}}, \zeta , AP, V)\) where,
-
S is a set of states,
-
\({\textbf {P}} : S \times S \rightarrow [0,1]\) is a transition probability function such that for all states \(s \in S\), \({\sum }_{s^\prime \in S} {\textbf {P}}(s, s^\prime ) = 1\),
-
\(\zeta :S \rightarrow [0,1]\) is an initial distribution such that \({\sum }_{s \in S} \zeta (s) = 1\),
-
AP is a set of atomic propositions,
-
\(V:S \rightarrow AP\) is a labeling function.
The function \({\textbf {P}}\) determines for each state s the probability \({\textbf {P}}(s,s^\prime )\) of a single transition from s to \(s^\prime \).
Reachability Probabilities. We define the probability of reaching a state s from an initial state in an MC \(\mathcal {M}\) as \( Pr (s) = \underset{ \begin{array}{c} \hat{\sigma } \in PathFrags (s_0 , s) \\ s_0 \in Init (\mathcal {M}) \end{array} }{\sum \ \ Pr (\hat{\sigma })}\), where \( Pr (\hat{\sigma } = s_0 s_1 \ldots s_n) = \zeta (s_0) \cdot \prod _{0 \le i < n} {\textbf {P}}(s_i, s_{i+1})\).
Trace Probabilities. The occurrence probability of a trace T is defined as \( Pr (T)=\underset{ \sigma \in Paths (T) }{\sum } Pr (\sigma )\), where \( Pr (\sigma = s_0 s_1 \ldots s_n^\omega ) = Pr (\hat{\sigma } = s_0 s_1 \ldots s_n)\).
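Pr(T) can be computed by enumerating finite path fragments, which terminates for the program models considered here because their graphs are acyclic apart from self-loops in final states (see the DAG discussion below). In this sketch a trace is given as its finite prefix up to the final state; the function and the toy MC are illustrative, not from the paper:

```python
def trace_prob(P, zeta, V, trace):
    """Pr(T): total probability of paths whose label sequence is T."""
    total = 0.0
    def walk(s, i, p):
        nonlocal total
        if V[s] != trace[i]:
            return
        if i == len(trace) - 1:
            if P[s].get(s) == 1.0:        # absorbing, i.e. final state
                total += p
            return
        for t, q in P[s].items():
            if t != s:                    # skip the final self-loop
                walk(t, i + 1, p * q)
    for s0, p0 in zeta.items():
        walk(s0, 0, p0)
    return total

# Toy MC: l starts at 0, then becomes 1 or stays 0, each w.p. 1/2.
P = {'a': {'b': 0.5, 'c': 0.5}, 'b': {'b': 1.0}, 'c': {'c': 1.0}}
zeta = {'a': 1.0}
V = {'a': 0, 'b': 1, 'c': 0}
```

For this toy MC, the traces \(\langle 0,1^\omega \rangle \) and \(\langle 0,0^\omega \rangle \) each occur with probability 1/2.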
DAG Structure of Program Models. We assumed that programs always terminate and that states indicate the current values of the variables and the program counter. This implies that the Markovian model of every terminating program takes the form of a directed acyclic graph (DAG), modulo self-loops in final states. Therefore, reachability probabilities coincide with long-run probabilities [3]. Initial states of the program are represented as roots of the DAG, and final states as leaves. Each state of a Markovian model is located at a level equal to the least distance of that state from an initial state. The level of a state s is denoted by level(s).
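Under the DAG assumption, levels can be computed by a breadth-first search from the initial states. A minimal sketch with an illustrative adjacency-map layout (ours, not the paper's):

```python
from collections import deque

def levels(succ, init):
    """level(s): least distance of s from an initial state, via BFS."""
    level = {s: 0 for s in init}
    queue = deque(init)
    while queue:
        s = queue.popleft()
        for t in succ.get(s, ()):
            if t not in level:            # first visit = least distance
                level[t] = level[s] + 1
                queue.append(t)
    return level
```

For example, a diamond-shaped model with root `a`, intermediate states `b` and `c`, and final state `d` puts `b` and `c` at level 1 and `d` at level 2.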
2.4 Probabilistic Schedulers
A probabilistic scheduler implements the scheduling policy of a concurrent program. It determines the order and probability of execution of threads. When a probabilistic scheduler is applied to a concurrent program, nondeterministic choices are replaced by probabilistic ones. As we modeled concurrency between threads using nondeterminism in MDP, the scheduler is used to resolve the possible nondeterminism in MDP. For demonstration purposes, it suffices to consider a simple but important subclass of schedulers called memoryless probabilistic schedulers. Given a state s, a memoryless probabilistic scheduler returns a probability for each action \(\alpha \in Act (s)\). This random choice is independent of what has happened in the history, i.e., which path led to the current state. This is why it is called memoryless. Formally,
Definition 4
Let \(\mathcal {M}=(S, Act , {\textbf {P}}, \zeta , AP, V)\) be an MDP. A memoryless probabilistic scheduler for \(\mathcal {M}\) is a function \(\delta :S \rightarrow \mathcal {D}( Act )\), such that \(\delta (s) \in \mathcal {D}( Act (s))\) for all \(s \in S\).
As all nondeterministic choices in an MDP \(\mathcal {M}\) are resolved by a scheduler \(\delta \), a Markov chain \(\mathcal {M}_\delta \) is induced. Formally,
Definition 5
Let \(\mathcal {M}=(S, Act , {\textbf {P}}, \zeta , AP, V)\) be an MDP and \(\delta :S \rightarrow \mathcal {D}( Act )\) be a memoryless probabilistic scheduler on \(\mathcal {M}\). The MC of \(\boldsymbol{\mathcal {M}}\) induced by \(\boldsymbol{\delta }\) is given by \(\mathcal {M}_\delta = (S, \textbf{P}_\delta , \zeta , AP, V)\) where \({\textbf {P}}_\delta (s, s^\prime ) = \underset{\alpha \in Act (s)}{\sum } \delta (s)(\alpha ).{\textbf {P}}(s)(\alpha )(s^\prime )\)
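The construction in Definition 5 is a direct weighted sum over enabled actions. A sketch with an illustrative data layout (`P[s][a][t]` for the MDP transition function, `delta[s][a]` for the scheduler; both layouts are our assumption):

```python
def induce_mc(P, delta):
    """P_delta(s, t) = sum over a in Act(s) of delta(s)(a) * P(s)(a)(t)."""
    P_delta = {}
    for s, by_action in P.items():
        row = {}
        for a, succs in by_action.items():
            for t, q in succs.items():
                row[t] = row.get(t, 0.0) + delta[s].get(a, 0.0) * q
        P_delta[s] = row
    return P_delta

# Uniform scheduler on a state with two enabled actions: each successor
# is then reached with probability 1/2.
P = {'s0': {'alpha': {'s1': 1.0}, 'beta': {'s2': 1.0}},
     's1': {'tau': {'s1': 1.0}}, 's2': {'tau': {'s2': 1.0}}}
delta = {'s0': {'alpha': 0.5, 'beta': 0.5},
         's1': {'tau': 1.0}, 's2': {'tau': 1.0}}
mc = induce_mc(P, delta)
```

This mirrors how the uniform scheduler of Example 2 turns the nondeterministic branching of \(\mathcal {M}^{\texttt {P1}}\) into probability-1/2 transitions.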
3 The Proposed Approach
Suppose a concurrent program P, running under control of a scheduling policy \(\delta \). The proposed approach proceeds in three steps: (1) defining an MDP representing P and applying \(\delta \) to the MDP to resolve the nondeterminism in the MDP (Sect. 3.1), (2) constructing a back-bisimulation quotient (Sect. 3.2), and (3) computing the leakage (Sect. 3.3). Finally, an algorithm for computing the back-bisimulation quotient is presented (Sect. 3.4).
3.1 The Program and Threat Models
It is assumed that P has a secret input variable h and a public output variable l, and that h has a fixed value during the program executions. If the program has several secret variables, they can be encoded (e.g. concatenated) into one secret variable. The same is done for public and neutral variables. Possible values of l and h are denoted by \( Val _l\) and \( Val _h\).
The attacker has prior knowledge of the secret, which is modeled as a prior probability distribution over the possible values of h, i.e. Pr(h). Here, the attacker is assumed to be probabilistic, i.e., she knows the size of the secret, in addition to some accurate constraints on the values of h. For instance, the attacker could know that h is 2 bits long, its value is not 1, the probability that its value is 2 is 0.6, and the probability that its value is 3 is thrice the probability that it is 0. The prior distribution encoding these constraints is \( Pr (h) = \{ 0 \mapsto 0.1, 2 \mapsto 0.6, 3 \mapsto 0.3 \}\). A special case of the probabilistic attacker is the ignorant attacker [6], who has no prior information about the value of h except its size. Thus, the ignorant attacker’s initial knowledge is a uniform prior distribution on h.
Define an MDP Representing P. Operational semantics of the concurrent program P is represented by an MDP \(\mathcal {M}^{\texttt {P}}=(S, Act , {\textbf {P}}, \zeta , Val _l, V)\). Each state \(s \in S\) is a tuple \(\langle \overline{l}, \overline{h}, \overline{n}, pc \rangle \), where \(\overline{l}\), \(\overline{h}\), and \(\overline{n}\) are values of the public, secret, and neutral variables, respectively, and pc is the program counter. Actions \( Act \) are program statements of P. The function \(\textbf{P}\) defines probabilities of transitions between states. Atomic propositions are \( Val _l\) and the function V labels each state with value of l in that state. In fact, a state label is what an attacker observes in a state and traces of \(\mathcal {M}^{\texttt {P}}\) are the sequences of public values that are valid during the execution.
The initial distribution \(\zeta \) is determined by the prior knowledge of the attacker \( Pr (h)\), i.e., \(\zeta (s) = Pr (h=\overline{h})\) for all \(s \in Init (\mathcal {M}^{\texttt {P}})\), where \(s = \langle ., \overline{h}, ., . \rangle \).
Example 1
(Program P1). Consider the following program, where h is a 2-bit random variable and || denotes the concurrency of the executions:
The attacker’s prior knowledge is the size of h, yielding a uniform distribution on h, i.e., \( Pr (h) = \{ 0 \mapsto \frac{1}{4}, 1 \mapsto \frac{1}{4}, 2 \mapsto \frac{1}{4}, 3 \mapsto \frac{1}{4} \}\). The MDP \(\mathcal {M}^{\texttt {P1}}\) of the program is shown in Fig. 1. The initial distribution \(\zeta \) is determined by \( Pr (h)\), i.e., \(\zeta = \{ s_0 \mapsto \frac{1}{4}, s_4 \mapsto \frac{1}{4}, s_7 \mapsto \frac{1}{4}, s_{10} \mapsto \frac{1}{4} \}\). Each state is labeled by the value of l in that state. Each transition is labeled by an action (a program statement) and a probability. For instance, the transition from \(s_0\) to \(s_1\) has the action \(\alpha : \texttt {l:=h/2}\) and the probability \({\textbf {P}}(s_0)(\alpha )(s_1)=1\); Or the transition from \(s_0\) to \(s_2\) has the label \(\beta : \texttt {l:=h}\ \texttt {mod}\ \texttt {2}\) and the probability \({\textbf {P}}(s_0)(\beta )(s_2)=1\).
Resolve the Nondeterminism in the MDP. The scheduling policy is represented by a memoryless probabilistic scheduler \(\delta \). As the MDP \(\mathcal {M}^{\texttt {P}}\) is run under the control of the scheduler \(\delta \), all nondeterministic transitions are resolved and an MC \(\mathcal {M}^\texttt {P}_\delta = (S, {\textbf {P}}_\delta , \zeta , Val _l, V)\) is produced.
Example 2
(MC of P1). We choose the scheduler to be uniform. The uniform scheduler, denoted by the function uni, picks each thread with the same probability. This yields the definition of the scheduler as follows:
The MC \(\mathcal {M}^{\texttt {P1}}_{uni}\) of the program P1 running under control of the uniform scheduler is depicted in Fig. 2. In this figure, transitions are labeled by the transition probability.
3.2 The Attacker’s View of the Program: Back-Bisimulation Quotient
In order to measure the amount of information the attacker can deduce about h, we need to construct the attacker’s view of the program. First, the attacker can distinguish a final state from a non-final one by observing termination of the program. Second, she cannot discriminate between those paths that have the same trace. For instance, in \(\mathcal {M}^{\texttt {P1}}_{uni}\) (Fig. 2) the attacker only observes the traces \(\{ \langle 0,0,0^\omega \rangle , \langle 0,0,1^\omega \rangle , \langle 0,1,0^\omega \rangle , \langle 0,1^\omega \rangle \}\), whereas there are seven different execution paths. The implication is that she cannot distinguish those final states that have the same public values and result from the same traces. Third, she does not know secret values in the final states, but may guess the value of h based on a probability distribution that she can compute according to the possible values of h in each final state. These three requirements are captured by an equivalence relation, called back-bisimulation, denoted by \(\sim _b\).
Definition 6
Let \(\mathcal {M}^\texttt {P}_\delta \) be an MC. A back-bisimulation for \(\mathcal {M}^\texttt {P}_\delta \) is a binary relation \(\mathcal {R}\) on S such that for all \(s_1\ \mathcal {R}\ s_2\), the following three conditions hold:
(1) \(V(s_1) = V(s_2)\), (2) if \(s_1^\prime \in Pre(s_1 )\), then there exists \(s_2^\prime \in Pre(s_2 )\) with \(s_1^\prime \ \mathcal {R}\ s_2^\prime \), (3) if \(s_2^\prime \in Pre(s_2 )\), then there exists \(s_1^\prime \in Pre(s_1 )\) with \(s_1^\prime \ \mathcal {R}\ s_2^\prime \).
States \(s_1\) and \(s_2\) are back-bisimilar, denoted by \(s_1 \sim _b s_2\), if there exists a back-bisimulation \(\mathcal {R}\) for \(\mathcal {M}^\texttt {P}_\delta \) with \(s_1\ \mathcal {R}\ s_2\).
Condition (1) requires that the states \(s_1\) and \(s_2\) have the same public values. According to condition (2), every incoming transition of \(s_1\) must be matched by an incoming transition of \(s_2\); the reverse is assured by condition (3).
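Conditions (1)-(3) suggest a partition-refinement procedure: start from the partition induced by the labeling V and keep splitting blocks whose members see different predecessor blocks, until the partition is stable. The paper's actual algorithm is presented in Sect. 3.4; the following is only a naive fixpoint sketch of that idea, with an illustrative data layout:

```python
def back_bisim_quotient(states, V, pre):
    """states: iterable of states; V: label map; pre: s -> set of predecessors.
    Returns the blocks of a naive back-bisimulation partition refinement."""
    block_of = {s: V[s] for s in states}            # condition (1): same label
    while True:
        # Signature: own block plus the set of predecessor blocks,
        # reflecting the existential matching of conditions (2) and (3).
        sig = {s: (block_of[s], frozenset(block_of[p] for p in pre[s]))
               for s in states}
        new_ids = {v: i for i, v in enumerate(sorted(set(sig.values()), key=repr))}
        new_block_of = {s: new_ids[sig[s]] for s in states}
        if len(set(new_block_of.values())) == len(set(block_of.values())):
            break                                    # partition stable
        block_of = new_block_of
    blocks = {}
    for s in states:
        blocks.setdefault(block_of[s], set()).add(s)
    return list(blocks.values())
```

For instance, two same-labeled states whose predecessors lie in the same block end up merged, while a same-labeled state reached from a different block stays separate.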
Theorem 1
Back-bisimulation is an equivalence relation.
As \(\sim _b\) is an equivalence relation, it induces a set of equivalence classes on the state space of an MC. Given an MC \(\mathcal {M}^\texttt {P}_\delta \), the quotient space \(\mathcal {M}^\texttt {P}_\delta /\sim _b\) captures the attacker’s view of the program P: it aggregates the same-trace paths of \(\mathcal {M}^\texttt {P}_\delta \) into one path.
Definition 7
For an MC \(\mathcal {M}^\texttt {P}_\delta = (S, {\textbf {P}}_\delta , \zeta , Val _l, V)\) and back-bisimulation \(\sim _b\), the back-bisimulation quotient is defined as \(\mathcal {M}^\texttt {P}_\delta /\sim _b = ( S/\sim _b , {\textbf {P}}^\prime _\delta , s^b_{init}, Val _l, V, Pr (h) )\), where
-
\(S/\sim _b\) is the quotient space of S under \(\sim _b\),
-
\({\textbf {P}}^\prime _\delta : (S/\sim _b) \times (S/\sim _b) \rightarrow [0,1]\) is a probability transition function between equivalence classes of \(S/\sim _b\) such that \(\forall \ s^b, t^b \in S/\sim _b . \ {\textbf {P}}^\prime _\delta (s^b, t^b) = \underset{\begin{array}{c} s \in s^b, \ t \in t^b \end{array}}{\sum } Pr (s)*{\textbf {P}}_\delta (s,t)\), where \( Pr (s)\) is the probability of reaching s in the MC \(\mathcal {M}^\texttt {P}_\delta \),
-
\(s^b_{init} = Init(\mathcal {M}^\texttt {P}_\delta )\),
-
\(V ([s]_{\sim _b}) = V(s)\),
-
\( Pr (h)\) is a mapping from each quotient state \(s^b\) to \( Pr (h_{s^b})\), where \( Pr (h_{s^b})\) is the probability distribution of h in the state \(s^b\) and is computed, for all \(\overline{h} \in Val _h\), as \( Pr (h_{s^b})(\overline{h}) = \frac{ \underset{s \in s^b, \ s = \langle ., \overline{h}, ., . \rangle }{\sum } Pr (s) }{ Pr (s^b) } \)
where \( Pr (s^b)\) is the reachability probability of \(s^b\) in \(\mathcal {M}^\texttt {P}_\delta /\sim _b\).
The public variable has a single initial value. Thus, all of the initial states of \(\mathcal {M}^\texttt {P}_\delta \) have the same public value and form a single equivalence class \(s^b_{init}\). Each state \(s^b\) is labeled with a probability distribution \( Pr (h_{s^b})\) which shows the probabilities of possible values of h in that state.
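The secret distribution of a quotient state can be obtained by aggregating the reachability probabilities of its member states by h-value and normalising by \( Pr (s^b)\). A sketch; the helper and its `(h_value, reach_prob)` layout are illustrative assumptions of ours:

```python
def secret_dist(members):
    """members: list of (h_value, reach_prob) pairs for the states in s^b.
    Returns Pr(h_{s^b}) as a dictionary."""
    total = sum(p for _, p in members)    # Pr(s^b)
    dist = {}
    for h, p in members:
        dist[h] = dist.get(h, 0.0) + p / total
    return dist
```

For instance, a final quotient state reached with equal probability by executions carrying h = 1 and h = 2 gets the distribution \(\{1 \mapsto \frac{1}{2}, 2 \mapsto \frac{1}{2}\}\), as in Example 4.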
Example 3
(Back-bisimulation quotient of P1). The back-bisimulation quotient \(\mathcal {M}^{\texttt {P1}}_{uni}/\sim _b\) is depicted in Fig. 3. Each state of \(\mathcal {M}^{\texttt {P1}}_{uni}/\sim _b\) is an equivalence class, containing related states of \(\mathcal {M}^{\texttt {P1}}_{uni}\):
States are labeled with the value of l, together with the distribution of h:
The back-bisimulation quotient can be automatically constructed from the MC. This will be discussed in Sect. 3.4. After constructing the quotient, the next step is to compute the program leakage from the back-bisimulation quotient.
3.3 Measuring the Leakage Using Back-Bisimulation Quotient
Let \(\mathcal {M}^\texttt {P}_\delta /\sim _b = ( S/\sim _b , {\textbf {P}}^\prime _\delta , s^b_{init}, Val _l, V, Pr (h) )\) be the attacker’s view of the program P running with the scheduler \(\delta \). In each state \(s^b\) of \(\mathcal {M}^\texttt {P}_\delta /\sim _b\), the secret distribution \( Pr (h_{s^b})\) determines the attacker’s uncertainty about h. Depending on the program statements that are chosen and executed by the scheduler, and the public values observed by the attacker, the distribution of h changes from state to state along each trace of \(\mathcal {M}^\texttt {P}_\delta /\sim _b\). In fact, \(\mathcal {M}^\texttt {P}_\delta /\sim _b\) transforms the prior distribution of h in the initial state \(s^b_{init}\) into posterior distributions in the final states \( final (\mathcal {M}^\texttt {P}_\delta /\sim _b)\).
The attacker’s uncertainty about h in a state \(s^b\) with the secret distribution \( Pr (h_{s^b})\) is measured by \(\mathcal {H}_\infty (\texttt {h}_{s^b})\). Thus, the initial uncertainty is measured by \(\mathcal {H}_\infty (\texttt {h}_{s^b_{init}})\).
Since there might be more than one final state with different reachability probabilities, and the MC can be seen as a discrete probability distribution over all of its final states, the remaining uncertainty is defined as the expectation of the uncertainties in all final states: \(\underset{s^b_f \in final (\mathcal {M}^\texttt {P}_\delta /\sim _b)}{\sum } Pr (s^b_f) \cdot \mathcal {H}_\infty (\texttt {h}_{s^b_f})\), where \( Pr (s^b_f)\) is the probability of reaching \(s^b_f\) from the initial state \(s^b_{init}\). It now follows that the leakage of the concurrent program P running under control of the scheduler \(\delta \) is computed as \(\mathcal {L}(\texttt {P}_\delta ) = \mathcal {H}_\infty (\texttt {h}_{s^b_{init}}) - \sum _{s^b_f \in final (\mathcal {M}^\texttt {P}_\delta /\sim _b)} Pr (s^b_f) \cdot \mathcal {H}_\infty (\texttt {h}_{s^b_f})\).
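The leakage formula can be evaluated mechanically once the quotient's final distributions are known. A sketch in Python; the per-final-state reachability probabilities of 1/4 below are our reading of Examples 3-5, under which the computation reproduces the reported leakage of 1.5 bits:

```python
import math

def min_entropy(dist):
    """Renyi's min-entropy of a distribution given as a dictionary."""
    return -math.log2(max(dist.values()))

def leakage(prior, finals):
    """L(P_delta) = H_inf(prior) - sum_f Pr(f) * H_inf(dist at f),
    where finals is a list of (reach_prob, secret_dist) pairs."""
    remaining = sum(p * min_entropy(d) for p, d in finals)
    return min_entropy(prior) - remaining

# P1 under the uniform scheduler: uniform 2-bit prior and four final
# states, each assumed to be reached with probability 1/4.
prior = {h: 0.25 for h in range(4)}
finals = [(0.25, {0: 1.0}), (0.25, {3: 1.0}),
          (0.25, {1: 0.5, 2: 0.5}), (0.25, {1: 0.5, 2: 0.5})]
```

With these inputs the initial uncertainty is 2 bits, the remaining uncertainty 0.5 bits, and the leakage 1.5 bits, matching Example 5.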
Notice that for measuring the leakage of P, we computed min-entropy of initial and final states, and did not consider min-entropy of intermediate states. This is not in contrast with our assumption of taking into account the intermediate values of l along the execution paths. This is because in \(\mathcal {M}^\texttt {P}_\delta /\sim _b\) distributions of h in the final states result from values of l and distributions of h in the intermediate states. Thus, when computing the remaining uncertainty from the final distributions, the intermediate values of l are automatically taken into account. The final distributions of h also result from the program statements which are chosen by the scheduler. Therefore, the effect of the scheduler choices is considered, as well.
Moreover, in the literature, the remaining uncertainty is usually measured by the conditional entropy \(\mathcal {H}_\infty ({h}|{l})\), whereas we measure it by the unconditional entropy \(\mathcal {H}_\infty ({h})\). These entropies coincide in our program model, because in \(\mathcal {M}^\texttt {P}_\delta /\sim _b\) the entropy \(\mathcal {H}_\infty ({h})\) is computed from final states that result from the traces observed by the attacker. This is exactly the same as the conditional entropy \(\mathcal {H}_\infty ({h}|{l})\).
Example 4
(Back-bisimulation quotient of P1 is a distribution transformer.). In the initial state \(s^b_0\) of \(\mathcal {M}^{\texttt {P1}}_{uni}/\sim _b\), the distribution of h is \( Pr (h_{s^b_0}) = \{0 \mapsto \frac{1}{4}, 1 \mapsto \frac{1}{4}, 2 \mapsto \frac{1}{4}, 3 \mapsto \frac{1}{4}\}\). This means that before executing the program, the attacker only knows that the value of h belongs to the set \(\{0,1,2,3\}\) and if she guesses the value of h, then the likelihood of her being successful is \(\frac{1}{4}\). Therefore, \( Pr (h_{s^b_0})\) determines the attacker’s initial uncertainty about h. Now consider the final state \(s^b_3\), in which the distribution of h is \( Pr (h_{s^b_3}) = \{1 \mapsto \frac{1}{2}, 2 \mapsto \frac{1}{2}\}\). In this state, the attacker knows that the value of h belongs to \(\{1,2\}\), and thus her uncertainty about h is reduced. This means that after executing the program and observing the trace \(\langle 0,1,0^\omega \rangle \), if the attacker guesses the value of h, then the likelihood of her being successful is \(\frac{1}{2}\). These considerations imply that the back-bisimulation quotient is a distribution transformer.
Example 5
(Information leakage of P1). The initial uncertainty is quantified as the Rényi min-entropy of h in the initial state \(s^b_0\), i.e., \(\mathcal {H}_\infty (\texttt {h}_{s^b_0}) = -\log _2 \frac{1}{4}=2 \ (bits)\).
The remaining uncertainty is quantified as the Rényi min-entropy of h in the final states. There are four final states with different reachability probabilities: \(s^b_2\), \(s^b_5\), \(s^b_3\), \(s^b_4\). Consequently, the remaining uncertainty is quantified as the expectation of the Rényi min-entropy of h in these states: \(\underset{s^b \in \{s^b_2, s^b_3, s^b_4, s^b_5\}}{\sum } Pr (s^b) \, \mathcal {H}_\infty (\texttt {h}_{s^b}) = 0.5 \ (bits)\),
where \( Pr (s^b)\) denotes the probability of reaching \(s^b\) from the initial state \(s^b_0\). Finally, the leakage of the program P1 running with the uniform scheduler is computed as \(\mathcal {L}(\texttt {P1}_{uni}) = 2 - 0.5 = 1.5 \ (bits)\).
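The computation in Examples 4 and 5 fits in a few lines of code. The sketch below assumes the quotient is given as the prior distribution of h plus, for each final state, its reachability probability and posterior distribution of h; the prior matches Example 4, but the final-state data is hypothetical (chosen only so that the expected remaining uncertainty is 0.5 bits, as in Example 5), not P1's actual quotient:

```python
import math

def min_entropy(dist):
    """Renyi min-entropy: H_inf(X) = -log2(max_x Pr[X = x])."""
    return -math.log2(max(dist.values()))

def leakage(prior, finals):
    """Initial uncertainty minus the expected remaining uncertainty.
    finals: list of (reachability probability, posterior distribution of h)."""
    remaining = sum(p * min_entropy(post) for p, post in finals)
    return min_entropy(prior) - remaining

# Prior of Example 4: h uniform on {0, 1, 2, 3}, i.e. 2 bits of uncertainty.
prior = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
# Hypothetical final states whose expected min-entropy is 0.5 bits.
finals = [(0.25, {0: 1.0}), (0.25, {3: 1.0}),
          (0.25, {1: 0.5, 2: 0.5}), (0.25, {1: 0.5, 2: 0.5})]
print(leakage(prior, finals))  # 1.5
```

With these numbers the leakage is \(2 - 0.5 = 1.5\) bits, matching Example 5.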
The following section formally defines the back-bisimulation equivalence and explains how to compute the back-bisimulation quotient.
3.4 Computing Back-Bisimulation Quotient
In this section, we discuss how to compute the back-bisimulation quotient. First, however, we describe a subclass of MCs, Markov chains with pseudoback-bisimilar states, on which back-bisimulation cannot correctly construct the attacker's view of the program behavior.
Pseudoback-Bisimilar States. In order to compute the states of a back-bisimulation quotient, we need to aggregate back-bisimilar states into one equivalence class. To this end, we define the back-bisimulation signature, a kind of fingerprint shared by the states of a back-bisimulation equivalence class.
Definition 8
The back-bisimulation signature of a state s is defined as \(sig_{\sim _b}(s) = \big ( V(s), \ \{ [s^\prime ]_{\sim _b} \ | \ s^\prime \in Pre (s) \} \big )\), i.e., the public value of s together with the set of \(\sim _b\)-equivalence classes of its one-step predecessors.
It asserts that two states have the same signature iff they have the same public value and their predecessors belong to the same equivalence classes.
Definition 9
Let \(\mathcal {M}^\texttt {P}_\delta \) be an MC. Two states \(s_1 , s_2 \in S\) are pseudoback-bisimilar iff (1) \(V(s_1) = V(s_2)\), (2) \(level(s_1) = level(s_2)\), (3) \(sig_{\sim _b}(s_1) \ne sig_{\sim _b}(s_2)\), (4) \(PreLabels(s_1) \cap PreLabels(s_2) \ne \varnothing \). An MC that contains some pseudoback-bisimilar states is denoted by MC\(_\mathfrak {p}\) and an MC with no pseudoback-bisimilar state is denoted by MC\(_\mathfrak {n}\).
Stated in words, two states are pseudoback-bisimilar if they have the same label, are at the same level (distance from an initial state), and have different signatures but intersecting pre-labels. In an MC\(_\mathfrak {n}\), states at the same level and with the same label either have no intersecting pre-labels or have identical pre-labels.
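Definitions 8 and 9 translate directly into code. The sketch below assumes an explicit representation (all names are ours): `labels` maps a state to its public value, `levels` to its level, `preds` to its one-step predecessors, and `block_of` maps each predecessor to its current \(\sim _b\)-equivalence class:

```python
def signature(s, labels, preds, block_of):
    """Back-bisimulation signature of s: its public value together with
    the set of equivalence classes of its one-step predecessors."""
    return (labels[s], frozenset(block_of[p] for p in preds[s]))

def pseudoback_bisimilar(s1, s2, labels, levels, preds, block_of):
    """Definition 9: same public value, same level, different signatures,
    but intersecting pre-labels."""
    pre_labels = lambda s: {labels[p] for p in preds[s]}
    return (labels[s1] == labels[s2]
            and levels[s1] == levels[s2]
            and signature(s1, labels, preds, block_of)
                != signature(s2, labels, preds, block_of)
            and bool(pre_labels(s1) & pre_labels(s2)))
```

With predecessor labels \(\{2,1\}\) for one state and \(\{2\}\) for the other, the signatures differ while the pre-labels intersect, so the pair is flagged, mirroring \(s_8\) and \(s_9\) of Example 6 below.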
Example 6
(An example \(MC_\mathfrak {p}\)). Consider the following program:
where \( Val _h \in \{0,1\}\) and \( Pr (h) = \{ 0 \mapsto \frac{1}{2}, 1 \mapsto \frac{1}{2} \}\). A uniform scheduler is selected for both parallel operators. The MC \(\mathcal {M}^{\texttt {P2}}_{uni}\) is shown in Fig. 4a.
In \(\mathcal {M}^{\texttt {P2}}_{uni}\), states \(s_8\) and \(s_9\) are pseudoback-bisimilar: they both have the label 3, are at level 3, and have different signatures, but intersecting pre-labels: \( preLabels (s_8) = \{2,1\}\) and \( preLabels (s_9) = \{2\}\).
Back-bisimulation captures the attacker’s view for all programs that do not contain pseudoback-bisimilar states, i.e., those final states that have the same public values and result from the same trace are indistinguishable and fall into the same \(\sim _b\)-equivalence class. Formally,
Theorem 2
Let \(\mathcal {M}^\texttt {P}_\delta \) be an MC\(_\mathfrak {n}\). For all paths \(\sigma _1, \sigma _2 \in Paths(\mathcal {M}^\texttt {P}_\delta )\) with
\(\sigma _1 = s_{0,1} s_{1,1} \ldots s_{n-1,1} (s_{n,1})^\omega \), \(\sigma _2 = s_{0,2} s_{1,2} \ldots s_{n-1,2} (s_{n,2})^\omega \), and \(n \ge 0\) it holds that \(s_{n,1} \sim _b s_{n,2}\) iff \(trace(\sigma _1) = trace(\sigma _2)\).
Theorem 2 argues that same-trace final states are back-bisimilar. A similar argument can be made for non-final states.
Theorem 3
Let \(\mathcal {M}^\texttt {P}_\delta \) be an MC\(_\mathfrak {n}\). For all paths \(\sigma _1, \sigma _2 \in Paths(\mathcal {M}^\texttt {P}_\delta )\) with
\(\sigma _1 = s_{0,1} s_{1,1} \ldots s_{n-1,1} (s_{n,1})^\omega \), \(\sigma _2 = s_{0,2} s_{1,2} \ldots s_{m-1,2} (s_{m,2})^\omega \), \(n, m > 0\), and \(0 \le i < min(n,m)\) it holds that \(s_{i,1} \sim _b s_{i,2}\) iff \(trace_{\ll i}(\sigma _1) = trace_{\ll i}(\sigma _2)\).
Therefore, all paths of \(\mathcal {M}^\texttt {P}_\delta \) with the same trace form a single path in \(\mathcal {M}^\texttt {P}_\delta /\sim _b\). Stated formally, let \(\sigma ^\prime \in Paths(\mathcal {M}^\texttt {P}_\delta /\sim _b)\), \(s^b_f = final (\sigma ^\prime ) \in final (\mathcal {M}^\texttt {P}_\delta /\sim _b)\), and \(T = trace(\sigma ^\prime )\). The path \(\sigma ^\prime \) is the aggregation of all paths \(Paths(T) \subseteq Paths(\mathcal {M}^\texttt {P}_\delta )\). All final states of \(\mathcal {M}^\texttt {P}_\delta \) that result from the trace T fall into the same \(\sim _b\)-equivalence class \(s^b_f = \{ s_f \ | \ s_f \in final (Paths(T)) \}\). For \(s^b_f\), the secret distribution \( Pr (h_{s^b_f})\) contains the probabilities of the possible values of h that the attacker might be able to guess by observing T.
Pseudoback-bisimilar states do not fall into the same \(\sim _b\)-equivalence class and thus \(\sim _b\) is not able to aggregate all paths with the same trace. For instance, in \(\mathcal {M}^{\texttt {P2}}_{uni}/\sim _b\) (Fig. 4b) there are two paths \(s^b_3 s^b_7 s^b_{10} s^b_5 s^b_2 (s^b_6)^\omega \) and \(s^b_3 s^b_7 s^b_{10} s^b_4 s^b_9 (s^b_{12})^\omega \) with the same trace \(\langle 0,1,2,3,4,5^\omega \rangle \). The attacker, after observing the trace, cannot tell whether the value of h is 0 or 1. But in the attacker's view of the MC constructed by back-bisimulation (Fig. 4b), the value of h is distinguished in the final states of the two paths. Furthermore, the probability of some traces in the back-bisimulation quotient might differ from their probability in the concrete model. For example, the probability of the trace \(\langle 0,1,2,3,4,5^\omega \rangle \) in \(\mathcal {M}^{\texttt {P2}}_{uni}\) is \(\frac{5}{8}\), while it is \(\frac{9}{16}\) in \(\mathcal {M}^{\texttt {P2}}_{uni}/\sim _b\). The implication is that back-bisimulation cannot correctly construct the attacker's view of an MC\(_\mathfrak {p}\). For MC\(_\mathfrak {p}\)s, we use the trace-exploration-based method, introduced in [25], which computes the program leakage directly from the MC\(_\mathfrak {p}\) \(\mathcal {M}^{\texttt {P}}_{\delta }\).
Algorithm for Computing the Back-Bisimulation Quotient Space. In this section, an algorithm is proposed for obtaining the back-bisimulation quotient space for a finite MC\(_\mathfrak {n}\). This algorithm is similar to Kanellakis and Smolka’s algorithm for computing the bisimulation quotient space [20]. It relies on a partition refinement technique, where the state space is partitioned into blocks. It starts from an initial partition \(\varPi _0\) and computes successive refinements of \(\varPi _0\) until a stable partition is reached. A partition is stable if no further refinements are possible. The obtained partition is \(S/\sim _b\), the largest back-bisimulation over the input finite MC\(_\mathfrak {n}\). The essential steps are outlined in Algorithm 1. The algorithm consists of two main parts: (a) computing the initial partition, and (b) successively refining the partitions.
Computing the Initial Partition. Since back-bisimilar states have the same public value, it is sensible to use this in determining the initial partition \(\varPi _0\).
All initial states have the same public value and have no predecessors. Consequently, they form a single block \(s^b_{init} = Init(\mathcal {M}^\texttt {P}_\delta )\). This block will remain unchanged during the refinements.
For the remaining states \(S {\setminus } s^b_{init}\), each group of states with the same public value forms a block. Same-label blocks can be obtained by the equivalence relation \(\mathcal {R} = \{ (s_1, s_2) \ | \ V(s_1)=V(s_2) \}\), which induces the quotient space \(\big (S {\setminus } Init(\mathcal {M}^\texttt {P}_\delta ) \big ) \ / \ \mathcal {R}\). Thus, the initial partition is obtained as \(\varPi _0 = \{ s^b_{init} \} \cup \Big ( \big (S {\setminus } Init(\mathcal {M}^\texttt {P}_\delta ) \big ) \ / \ \mathcal {R} \Big )\).
Partition Refinement. Since all partitions are a refinement of the initial partition \(\varPi _0\), each block in these partitions contains states with the same public value. However, blocks of \(\varPi _0\), except \(s^b_{init}\), do not consider the one-step predecessors of states. This is taken care of in the successive refinement steps, by the refinement operator.
Definition 10
Let \(\varPi \) be a partition for S and \(\mathcal {C}\) be a superblock of \(\varPi \). Then, \( Refine _b(\varPi , \mathcal {C}) = \underset{\mathcal {B} \in \varPi }{\bigcup } Refine _b(\mathcal {B}, \mathcal {C}),\)
where \( Refine _b(\mathcal {B}, \mathcal {C}) = \{ \mathcal {B} \cap Succ(\mathcal {C}),\ \mathcal {B} {\setminus } Succ (\mathcal {C}) \} {\setminus } \{ \varnothing \}\). Here, \(\mathcal {C}\) is called a splitter for \(\varPi \), refining blocks of \(\varPi \) to subblocks.
Using \(\mathcal {C}\), \( Refine _b(\mathcal {B}, \mathcal {C})\) decomposes the block \(\mathcal {B}\) into two subblocks, provided that the subblocks are nonempty.
A key step of computing the back-bisimulation quotient space is to determine a splitter \(\mathcal {C}\) for a given partition \(\varPi \). Algorithm 1 uses the blocks of the previous partition \(\varPi _{old}\) as splitter candidates for \(\varPi \).
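The two phases of Algorithm 1 can be sketched as a simple fixpoint computation. The following is a simplified illustration under an explicit representation (states, initial states, labeling function, successor map), not the authors' implementation; it repeatedly applies \( Refine _b\) with blocks of the current partition as splitter candidates until the partition is stable:

```python
def back_bisim_quotient(states, init, labels, succ):
    """Partition refinement in the style of Algorithm 1.
    states: iterable of states; init: set of initial states;
    labels: state -> public value; succ: state -> set of successors."""
    # (a) initial partition: the initial states form one block (they have
    # no predecessors, so it stays intact); the remaining states are
    # grouped by public value
    by_label = {}
    for s in states:
        if s not in init:
            by_label.setdefault(labels[s], set()).add(s)
    partition = [frozenset(init)] + [frozenset(b) for b in by_label.values()]

    # (b) refinement: Refine_b(B, C) = {B & Succ(C), B - Succ(C)} \ {empty}
    changed = True
    while changed:
        changed = False
        for splitter in list(partition):
            succ_c = set().union(*(succ[s] for s in splitter))
            refined = []
            for block in partition:
                inside, outside = block & succ_c, block - succ_c
                if inside and outside:   # the splitter splits this block
                    refined += [frozenset(inside), frozenset(outside)]
                    changed = True
                else:
                    refined.append(block)
            partition = refined
    return partition
```

For example, in a chain where two same-label states have predecessors lying in the same block, the two states remain together, while same-label states whose predecessors fall into different blocks are split apart, exactly the behavior Definition 8's signatures prescribe.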
Theorem 4
Algorithm 1 always terminates and correctly computes the back-bisimulation quotient space \(S/\sim _b\).
The following theorem discusses the time complexity of Algorithm 1.
Theorem 5
The time complexity of Algorithm 1 is \(O(|S| \cdot |E|)\), where E denotes the set of transitions of \(\mathcal {M}^\texttt {P}_\delta \).
4 Conclusions and Future Work
In this paper, a quantification approach is proposed for concurrent programs. Back-bisimulation equivalence is defined to model the attacker’s view of the program behavior. Then a partition refinement algorithm is developed to compute the back-bisimulation quotient of the program. The back-bisimulation quotient is automatically constructed and contains secret distributions, which are used to compute the information leakage of the program.
The back-bisimulation quotient contains all execution traces which the attacker can observe during executing the program. Thus, it can be used to compute maximal and minimal leakages that might occur during the program executions. Furthermore, the quotient is an abstract model of the program and the quantification analysis is done on a minimized model, most likely saving time and space.
Back-bisimulation equivalence creates exciting opportunities for future work. It can be used to verify any trace-equivalence-based property, such as observational determinism [21, 26, 33], a widely-studied confidentiality property of secure information flow for concurrent programs. It can also be defined on multi-terminal binary decision diagrams (MTBDDs), in order to greatly improve the scalability of the quantification approach. We aim to lift the program-termination restriction and extend the proposed approach to non-terminating concurrent programs. We also aim to study the bounded leakage problem [25] and channel capacity [29] on the back-bisimulation quotient. Reductions such as on-the-fly techniques may further improve scalability. Furthermore, handling programs with pseudoback-bisimilar states using back-bisimulation is a possible future work. Another avenue is to analyze the running time of the proposed approach, e.g., on the dining cryptographers protocol.
Notes
1. A rather general notion of schedulers is to let them use the full history of execution to make decisions. Here, this general definition is not needed and only makes the program model unnecessarily complex.
2. Only elements with a positive probability are shown.
3. The proofs of the theorems have been omitted to meet the page limit.
References
Alvim, M.S., Andrés, M.E., Chatzikokolakis, K., Palamidessi, C.: Quantitative information flow and applications to differential privacy. In: Aldini, A., Gorrieri, R. (eds.) FOSAD 2011. LNCS, vol. 6858, pp. 211–230. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-23082-0_8
Amir-Mohammadian, S.: A semantic framework for direct information flows in hybrid-dynamic systems. In: Proceedings of the 7th ACM Cyber-Physical System Security Workshop (CPSS 2021), pp. 5–15. Association for Computing Machinery, June 2021
Baier, C., Katoen, J.P.: Principles of Model Checking. MIT Press, Cambridge (2008)
Barthe, G., D’Argenio, P.R., Rezk, T.: Secure information flow by self-composition. In: Proceedings of the 17th IEEE Workshop on Computer Security Foundations, CSFW 2004, pp. 100–114. IEEE Computer Society (2004)
Beckert, B., Hähnle, R., Schmitt, P.H.: Verification of Object-Oriented Software. The KeY Approach. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-69061-0
Biondi, F.: Markovian processes for quantitative information leakage. Ph.D. thesis, IT University of Copenhagen (2014)
Biondi, F., Legay, A., Malacaria, P., Wasowski, A.: Quantifying information leakage of randomized protocols. Theor. Comput. Sci. 597, 62–87 (2015)
Biondi, F., Legay, A., Nielsen, B.F., Wasowski, A.: Maximizing entropy over Markov processes. J. Log. Algebraic Methods Program. 83(5), 384–399 (2014)
Cardelli, L., Tribastone, M., Tschaikowski, M., Vandin, A.: Forward and backward bisimulations for chemical reaction networks. arXiv preprint arXiv:1507.00163 (2015)
Chadha, R., Mathur, U., Schwoon, S.: Computing information flow using symbolic model-checking. In: Raman, V., Suresh, S.P. (eds.) 34th International Conference on Foundation of Software Technology and Theoretical Computer Science (FSTTCS 2014). Leibniz International Proceedings in Informatics (LIPIcs), vol. 29, pp. 505–516. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2014)
Chaum, D.: The dining cryptographers problem: Unconditional sender and recipient untraceability. J. Cryptol. 1(1), 65–75 (1988). https://doi.org/10.1007/BF00206326
Chen, H., Malacaria, P.: The optimum leakage principle for analyzing multi-threaded programs. In: Kurosawa, K. (ed.) ICITS 2009. LNCS, vol. 5973, pp. 177–193. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14496-7_15
Chen, H., Malacaria, P.: Quantifying maximal loss of anonymity in protocols. In: Proceedings of the 4th International Symposium on Information, Computer, and Communications Security, pp. 206–217. ACM (2009)
Chothia, T., Kawamoto, Y., Novakovic, C.: LeakWatch: estimating information leakage from Java programs. In: Kutyłowski, M., Vaidya, J. (eds.) ESORICS 2014. LNCS, vol. 8713, pp. 219–236. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11212-1_13
De Nicola, R., Vaandrager, F.: Three logics for branching bisimulation. J. ACM (JACM) 42(2), 458–487 (1995)
Esparza, J., Kiefer, S., Schwoon, S.: Abstraction refinement with Craig interpolation and symbolic pushdown systems. J. Satisfiability Boolean Model. Comput. 5, 27–56 (2008)
Högberg, J., Maletti, A., May, J.: Backward and forward bisimulation minimization of tree automata. Theor. Comput. Sci. 410(37), 3539–3552 (2009)
Jurado, M., Palamidessi, C., Smith, G.: A formal information-theoretic leakage analysis of order-revealing encryption. In: Proceedings of the 34th IEEE Workshop on Computer Security Foundations, CSFW 2021. IEEE Computer Society (2021)
Jurado, M., Smith, G.: Quantifying information leakage of deterministic encryption. In: Proceedings of the 2019 ACM SIGSAC Conference on Cloud Computing Security Workshop, pp. 129–139 (2019)
Kanellakis, P.C., Smolka, S.A.: CCS expressions, finite state processes, and three problems of equivalence. Inf. Comput. 86(1), 43–68 (1990)
Karimpour, J., Isazadeh, A., Noroozi, A.A.: Verifying observational determinism. In: Federrath, H., Gollmann, D. (eds.) SEC 2015. IAICT, vol. 455, pp. 82–93. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-18467-8_6
Klebanov, V.: Precise quantitative information flow analysis - a symbolic approach. Theor. Comput. Sci. 538, 124–139 (2014)
Köpf, B., Basin, D.: An information-theoretic model for adaptive side-channel attacks. In: Proceedings of the 14th ACM Conference on Computer and Communications Security, pp. 286–296. ACM (2007)
Köpf, B., Smith, G.: Vulnerability bounds and leakage resilience of blinded cryptography under timing attacks. In: Proceedings of 23rd IEEE Computer Security Foundations Symposium (CSF), pp. 44–56. IEEE (2010)
Noroozi, A.A., Karimpour, J., Isazadeh, A.: Information leakage of multi-threaded programs. Comput. Electr. Eng. 78, 400–419 (2019)
Noroozi, A.A., Salehi, K., Karimpour, J., Isazadeh, A.: Secure information flow analysis using the PRISM model checker. In: Garg, D., Kumar, N.V.N., Shyamasundar, R.K. (eds.) ICISS 2019. LNCS, vol. 11952, pp. 154–172. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-36945-3_9
Phan, Q.S., Malacaria, P., Păsăreanu, C.S., d’Amorim, M.: Quantifying information leaks using reliability analysis. In: Proceedings of the 2014 International SPIN Symposium on Model Checking of Software, pp. 105–108. ACM (2014)
Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, Hoboken (1994)
Salehi, K., Karimpour, J., Izadkhah, H., Isazadeh, A.: Channel capacity of concurrent probabilistic programs. Entropy 21(9), 885 (2019)
Salehi, K., Noroozi, A.A., Amir-Mohammadian, S.: Quantifying information leakage of probabilistic programs using the PRISM model checker. In: Emerging Security Information, Systems and Technologies, pp. 47–52. IARIA (2021)
Smith, G.: On the foundations of quantitative information flow. In: de Alfaro, L. (ed.) FoSSaCS 2009. LNCS, vol. 5504, pp. 288–302. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00596-1_21
Sproston, J., Donatelli, S.: Backward bisimulation in Markov chain model checking. IEEE Trans. Softw. Eng. 32(8), 531–546 (2006)
Zdancewic, S., Myers, A.C.: Observational determinism for concurrent program security. In: Proceedings of the 16th IEEE Computer Security Foundations Workshop, CSFW 2003, pp. 29–43. IEEE Computer Society (2003)
Appendices
A Case Study
In this section, we analyze a case study to show applicability and feasibility of the approach.
The Dining Cryptographers Protocol. We consider the dining cryptographers problem [11] to show how an attacker can deduce secret information through execution observations. The dining cryptographers problem was first proposed by David Chaum in 1988 as an example of anonymity and identity hiding [11]. In this problem, N cryptographers are sitting at a round table to have dinner at their favorite restaurant. The waiter informs them that the meal has been arranged to be paid by one of the cryptographers or their master. The cryptographers respect each other’s right to stay anonymous, but would like to know whether the master is paying or not. So, they decide to take part in the following two-stage protocol:
- Stage 1: Each cryptographer tosses a coin and only informs the cryptographer on the right of the outcome.
- Stage 2: Each cryptographer publicly announces whether the two coins that she can see (her own and her left neighbour's) are the same (‘agree’) or different (‘disagree’). However, if she actually paid for the dinner, then she lies, i.e., she announces ‘disagree’ when the coins are the same, and ‘agree’ when they are different.
Let the variable parity be the exclusive-or (XOR) of all the announcements. If N is odd, then an odd number of ‘agree’s (parity = 1) implies that none of the cryptographers paid (the master paid), while an even number (parity = 0) implies that one of the cryptographers paid. For an even N, the implication is reversed.
The payer can be either
- i. one of the cryptographers, i.e., \(Val_{payer} = \{ c_1,\ldots ,c_N \}\), or
- ii. the master (m, for short) or one of the cryptographers, i.e., \(Val_{payer} = \{ m,c_1,\ldots ,c_N \}\).
Assume an attacker who tries to find out the payer’s identity. The attacker is external, i.e., none of the cryptographers. This attacker can observe the parity and also the announcements of the cryptographers. All observable variables are concatenated to form a single public variable. The program model is an MC\(_\mathfrak {n}\) and we employ the proposed algorithms to compute the leakage.
The experimental results for the cases in which the coin probability is 0.5 are shown in Table 1. In this table, N denotes the number of cryptographers. \(\mathcal {M}^{\texttt {DC}_N}_{uni}\) and \(\mathcal {M}^{\texttt {DC}_N}_{uni} /\sim _b\) denote the MC of the program run with a uniform scheduler and the back-bisimulation quotient, respectively. Symbols \(\#st\) and \(\#tr\) denote the number of states and transitions, respectively.
Similar results for the coin probability of 0 or 1 are shown in Table 2. As shown in Tables 1 and 2, back-bisimulation results in impressive reductions of the state space. For example, when the coin probability is 0.5 (Table 1) reductions vary between \( 92\% \) and \( 99.5\% \).
Consider the last three cases of Table 1, where the coin probability is 0.5 and the payer is one of the cryptographers (\(Val_{payer} = \{ c_1,\ldots ,c_N \}\)). In these cases, the program leakage is 0. This shows that the attacker cannot identify the payer. This is why the dining cryptographers protocol is said to be secure in the context of anonymity.
The analysis results in Table 2 show that when the coin probability is 0 or 1, no matter who the payer is, the leakage is \(\log _2 |Val_{payer}|\), showing that the secret is completely leaked and the attacker learns the identity of the payer.
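The leakage values just discussed can be cross-checked on small instances by brute force. The sketch below (our illustration, not the paper's quotient-based algorithm) enumerates coin outcomes and computes the min-entropy leakage about a uniformly distributed payer from the announcements observed by an external attacker:

```python
import math
from itertools import product

def dc_leakage(n, coin_p, payers):
    """Min-entropy leakage to an external observer of the announcements.
    payers: possible payer identities (index 0..n-1 for a cryptographer),
    with a uniform prior; coin_p is the probability of tossing heads."""
    joint = {}  # announcements -> {payer: joint probability}
    for coins in product([0, 1], repeat=n):
        pc = 1.0
        for c in coins:
            pc *= coin_p if c else 1 - coin_p
        if pc == 0.0:
            continue
        for payer in payers:
            # cryptographer i compares her coin with her left neighbour's
            # and lies iff she is the payer
            ann = tuple((coins[i] == coins[i - 1]) ^ (i == payer)
                        for i in range(n))
            post = joint.setdefault(ann, {})
            post[payer] = post.get(payer, 0.0) + pc / len(payers)
    prior_vuln = 1 / len(payers)
    post_vuln = sum(max(d.values()) for d in joint.values())
    return math.log2(post_vuln / prior_vuln)

print(round(dc_leakage(3, 0.5, [0, 1, 2]), 6))  # 0.0: fair coins leak nothing
print(round(dc_leakage(3, 1.0, [0, 1, 2]), 6))  # 1.584963 = log2(3): full leak
```

This matches the tables: with fair coins and a cryptographer payer the leakage is 0, while with a deterministic coin the leakage is \(\log _2 |Val_{payer}|\).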
B Related Work
The notion of back-bisimulation is similar to the notion of backward strong bisimulation considered by De Nicola and Vaandrager [15]. Their notion differs from ours, as they only allow moving back from a state along the path representing the history that brought one into that state. Högberg et al. [17] defined and studied backward bisimulation minimization on tree automata, Sproston and Donatelli [32] considered a probabilistic version of backward bisimulation and studied the logical properties it preserves, and Cardelli et al. [9] considered backward bisimulation in the stochastic setting of chemical reaction networks. None of these works uses backward bisimulation in quantitative information flow.
Chen and Malacaria [12] model multi-threaded programs as state transition systems. They use Bellman’s optimality principle to determine the leakage bounds, i.e., minimal and maximal leakage occurred during possible program executions.
Phan et al. [27] propose to use symbolic execution, a verification technique that bounds the runtime behavior of the program, thus mitigating the state-space explosion problem, in which the state space of the program model becomes too large to store in memory, making the analysis difficult. They run symbolic execution to extract all symbolic paths of the program and label those paths with a direct information flow. Finally, they use a model counting technique to count the number of inputs that follow direct-labeled paths, in order to compute the channel capacity, an upper bound of the leakage over all possible distributions of the secret input.
Biondi et al. [8] use interval Markov chains to compute the channel capacity of deterministic processes. They reduce the channel capacity computation to entropy maximization, a well-known problem in Bayesian statistics.
Chothia et al. [14] have developed LeakWatch to approximate leakage of Java programs. LeakWatch is based on probabilistic point-to-point information leakage, in which the leakage between any given two points in the program from secret to public variables is computed.
Chadha et al. [10] employ symbolic algorithms to quantify the precise leakage from secret to public variables. They use Binary Decision Diagrams (BDDs) to model the relation between the inputs and outputs of the program. To do so, Moped [16], a symbolic model checker, is exploited to construct the BDDs. Chadha et al. have implemented their method in a tool called Moped-QLeak.
Klebanov [22] uses symbolic execution in combination with deductive verification [5] and self-composition [4] to measure residual Shannon entropy and min-entropy of the secret input. Exploitation of deductive verification makes the analysis immune to the state-space explosion problem, but also makes it semi-automatic, as user-supplied invariants are needed for the analysis to proceed.
© 2022 Springer Nature Switzerland AG
Salehi, K., Noroozi, A.A., Amir-Mohammadian, S., Mohagheghi, M. (2022). An Automated Quantitative Information Flow Analysis for Concurrent Programs. In: Ábrahám, E., Paolieri, M. (eds) Quantitative Evaluation of Systems. QEST 2022. Lecture Notes in Computer Science, vol 13479. Springer, Cham. https://doi.org/10.1007/978-3-031-16336-4_3
Print ISBN: 978-3-031-16335-7
Online ISBN: 978-3-031-16336-4