
1 Introduction

The protection of privacy and data security is one of the main concerns of computer science. Security often boils down to the impossibility for an attacker to obtain a given secret value. Such an impossibility can be defined by non-interference [18]. However, this definition rejects any program that publishes a variable whose value depends on the secret. For instance, publishing the results of an election when each individual vote is secret breaks non-interference. Such a yes/no approach does not account for the fact that an attacker may have partial information about a secret.

Information-theoretical techniques have the advantage of considering the secret not as an atomic object but as a known number of secret bits, allowing the definition of measures of the effectiveness of an attack based on the amount of secret bits that the attack compromises. The amount of secret bits compromised by an attack is known as information leakage. Leakage depends on the information about the secret known to the attacker before the attack, known as prior information and usually modeled as a prior probability distribution over the values of the secret. This approach dates back to Denning [16]. Different information leakage measures have been introduced, including Shannon leakage [19], min-entropy leakage [30] and g-leakage [1], encoding different security properties of the system. All the tools we compare in this work can compute both Shannon and min-entropy leakage with no significant difference in computation time. We compare them on the computation of Shannon leakage, but we expect no significant difference if the tools were compared on min-entropy leakage instead.

Among the results in the field, Köpf et al. studied leakage of side-channel attacks [2, 21], while Boreale has defined leakage for process calculi [6] and characterized the best attack strategy of an adaptive attacker [9].

In this work we compare the three tools that compose the state of the art in Shannon leakage computation: QUAIL [5] equipped with a new trace analysis algorithm, LeakWatch [13], and Moped-QLeak [11].

QUAIL is a recent but already well established tool for precise and exact information leakage computation, and later tools by multiple authors have used it for comparison [13, 25]. Nonetheless, QUAIL needs to build a full Markov chain model of the system-attacker scenario to produce a meaningful result.

LeakWatch is the most recent of a family of tools for statistical approximation of information leakage developed by Chothia et al. [12, 14]. LeakWatch analyzes Java code: the programmer annotates the code of the system with secret and observable values, then the tool simulates the system repeatedly using the Java Virtual Machine and estimates the correlation between the secret and observable values. LeakWatch follows a different perspective than QUAIL and Moped-QLeak, since it computes an approximate result, in contrast to QUAIL’s arbitrary precision and Moped-QLeak’s fixed double precision. LeakWatch’s approximation can be improved at the cost of running more simulations, which is time-consuming.

Moped-QLeak [11] uses the Moped tool [17] to compute a symbolic summary of the program under analysis as an Algebraic Decision Diagram (ADD), and then computes the leakage using the ADD representation. The symbolic approach is very efficient when the program can be represented in a compact way using ADDs, and in these cases Moped-QLeak is significantly faster than the other tools.

The first contribution of this paper is a new algorithm for precise information leakage computation, which is able to compute information leakage following the same Markovian semantics we introduced previously [3, Sect. 4] by performing a depth-first search analyzing the execution traces of the system. We implemented this algorithm in QUAIL, allowing it to compute leakage without having to build the full Markov chain model of the system.

As a second contribution, we provide two scalable case studies for the benchmarking of quantitative information leakage tools. Both case studies arise from real-life privacy problems. The case studies are anonymity of user data in Smart Grids and privacy comparison in voting protocols.

Smart Grids belong to the family of interconnected objects and have received growing interest over the last years. Our case study is based on a real system deployed at the fortiss labs [22]. In our case study, we focus on the negotiation between a set of prosumers and an aggregator. The prosumers (PROducer conSUMERS) consume, store and produce energy. To stabilize the grid, the prosumers negotiate with the aggregator how much energy they will exchange with the grid during the next period of time. This exchange might expose the consumption of one of the prosumers and, in turn, allow a potential attacker to deduce that a house is empty or that a factory has increased its production. In this example, the difficulty is to decide not only whether the exact information can be deduced, but also how well an attack can approximate it. Measuring the leakage indicates how much of the secret is unveiled through the negotiation phase. We show that increasing the number of prosumers also increases security.

In the voting protocols comparison case study, we compare two different voting protocols: the Single Preference, where each voter expresses a single vote for his favorite candidate, and the Preference Ranking, where each voter ranks all candidates from his most to his least favorite. In both cases there are multiple voters and candidates, and the secret is the preference of each voter. Both protocols have a large number of possible secrets and outputs, so they become cumbersome to analyze even with a small number of voters and candidates.

We compare the tools on their computation time, precision of the returned answer, scalability and usability. Since no tool works strictly better than the others in all categories, we determine the problem classes best suited to be analyzed by each tool.

2 Background: Information Leakage

The information leakage of a program is a measure quantifying how much information an attacker infers about the program’s secret by observing the program’s output. We assume that the attacker has access to the program’s source code, unlimited computational power, and some prior information about the secret (e.g. the bit size of the secret). Leakage corresponds to the reduction in the attacker’s uncertainty about the secret.

Let h be a random variable with values in a domain D(h) representing the value of the secret and o be a random variable with values in a domain D(o) modeling the value of the output. The information the attacker has on the secret is modeled by a discrete probability distribution, i.e. for a discrete random variable X a function \(\pi :D(X)\rightarrow [0, 1]\) such that \(\sum _{x\in D(X)}\pi (x)=1\). The information that the attacker has on the secret before the attack is modeled by the prior distribution \(\pi (h)\) while the information the attacker has after observing the output is modeled by the posterior distribution \(\pi (h|o)\). We consider the prior distribution as given, since it is part of the model of the attacker. Let U be an uncertainty measure defined on probability distributions, including Shannon entropy, min-entropy, and g-vulnerability. Computing leakage for the measure U reduces to computing the prior and posterior distributions and applying the formula

$$\begin{aligned} {Leakage}_U&=U(\pi (h))-U(\pi (h|o)) \qquad (1)\\ &=U(\pi (h))-\sum _{\bar{o}\in D(o)}\pi (o=\bar{o})\,U(\pi (h|o=\bar{o})) \qquad (2) \end{aligned}$$

In this work we want to compute Shannon leakage, and thus we use Shannon entropy as the measure of uncertainty: \(U(\pi (X))=-\sum _{x\in D(X)}\pi (x)\log _2\pi (x)\).
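For concreteness, the following small Python sketch (our own illustration, not taken from any of the tools) applies formula (2) to a toy deterministic program whose 2-bit secret leaks its parity through the output.

import math
from collections import defaultdict

def shannon_entropy(dist):
    """Shannon entropy in bits of a distribution given as {value: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Prior: a 2-bit secret h, uniformly distributed over {0, 1, 2, 3}.
prior = {h: 0.25 for h in range(4)}

# Toy deterministic program: the attacker observes the parity of the secret.
def program(h):
    return h % 2

# Joint distribution pi(h, o) and marginal pi(o).
joint = defaultdict(float)
for h, p in prior.items():
    joint[(h, program(h))] += p
marginal_o = defaultdict(float)
for (h, o), p in joint.items():
    marginal_o[o] += p

# Posterior uncertainty, following formula (2).
posterior = 0.0
for o_bar, p_o in marginal_o.items():
    conditional = {h: p / p_o for (h, o), p in joint.items() if o == o_bar}
    posterior += p_o * shannon_entropy(conditional)

leakage = shannon_entropy(prior) - posterior
print(leakage)   # 1.0: the parity reveals exactly one of the two secret bits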

3 Quantitative Information Leakage Tools

We introduce the quantitative information leakage computation tools that will be tested on the case studies.

3.1 QUAIL

QUAIL [5] computes Shannon and min-entropy leakage of a program written in an imperative WHILE language. The language lets the user program naturally with constants, arrays, and loops, which are syntactic sugar for QUAIL’s core if-goto Markovian semantics. Given the prior information of the attacker, QUAIL represents the program as a Markov chain and computes the information leakage from the Markov chain with an arbitrary number of digits of precision.

Syntax. We present the syntax of the QUAIL imperative language we use to model programs. We distinguish between public and private variables according to their level of abstraction: public variables have precise values, while private variables have sets of possible values. The observable variable o is public, while the secret variable h is private. Let \(\mathtt {v}\) range over names of variables and x range over reals in [0, 1]. Let L (resp. H) be a set of assignments of values to public variables (resp. assignments of sets of values to private variables).

Let label, \( l_0 \) and \(l_1\) denote any program point and f (g) pure arithmetic (Boolean) expressions. Assume a standard set of expressions and the following statements:

The first statement declares a public variable v of size n bits with a given value k, while the second statement similarly declares a private variable h of size n bits with allowed values ranging from 0 to \(2^n-1\). We assume a standard type system to verify that values of n-bit variables do not exceed \(2^n-1\). The third statement assigns to a public variable the value of an expression f over public variables; assignments to private variables, or assignments depending on the value of private variables, are not allowed. The fourth statement assigns zero with probability x, and one with probability \(1-x\), to a 1-bit public variable. The return statement outputs the values of all public variables and terminates. A conditional branch first evaluates an expression g over private and public variables, and jumps to label \(l_0\) if g is true and to label \(l_1\) otherwise. Since only a single variable scope exists, loops can be added in a standard way as syntactic sugar.

As a contribution, we present a method to compute information leakage of a program by analyzing the execution traces of the program. We introduce the Markovian semantics of our language by means of a function computing the successors of each state. Then we explain how we perform a depth-first exploration of the traces of the system, obtaining a set Q of final states that represent all possible output states of the system. Finally, we show how to compute the posterior entropy from Q.

Fig. 1. Successor function for each state in the Markovian trace semantics.

Semantics. The Markovian nature of the semantics allows us to define states that contain enough information to determine a probability distribution over all traces originating from any state.

Definition 1

A state in a Markovian semantics is a tuple \((pc, L, H, p)\) where \(pc\in \mathbb {N}^0\) is the program counter, L is a set of assignments of values to public variables, H is a set of assignments of sets of values to private variables, and \(0\le p\le 1\) is the probability of the state.

The initial state of the semantics is \((1,\emptyset ,\emptyset ,1)\). The set of successor states of a state \((pc, L, H, p)\) depends on the statement pointed to by the program counter pc. States pointing to a return statement have no successors, states pointing to a rand or if statement have up to 2 successors, and any other state has exactly 1 successor. The successor function defining the semantics of the language is shown in Fig. 1. If a state has zero probability, e.g. the false-branch successor of a conditional whose guard is always true, it is removed from the set of successors.

We call a state final if it has no successors, meaning that the program counter of the state points to a return statement. The trace analysis terminates when a final state is encountered. This means that the analysis terminates if and only if the program under analysis terminates, so non-terminating programs cannot be analyzed with this technique. Non-termination of the program under analysis raises other issues in leakage computation [4], and is not considered here.

Conditional states and random assignment states have two successors. The successors of a conditional state correspond to the guard being true or false. Since the guard can depend on the secret, both successor states may have positive probability depending on the prior distribution \(\pi (h)\) on the secret, which is available at this time. The successors of a random assignment state correspond to the bit being set to 0 or 1. In both cases the probability of each successor state is computed and one of the successor states with non-zero probability is chosen to be the next step in the analysis. Successors with probability zero are dropped, pruning unreachable leaves from the trace tree.

Because of the Markovian semantics, each state contains the information needed to compute the probability distribution over its outgoing transitions. The probability of a trace is computed as the product of the probabilities of the transitions composing the trace. In the successor states of the conditional statement, \(H|g(L,H)\) (resp. \(H|\lnot g(L,H)\)) denotes the assignment function obtained by removing from the sets of values assigned to the private variables those values that contradict (resp. satisfy) the guard \(g(L,H)\). Similarly, \(Pr(g(L,H)\,|\,\pi (h))\) (resp. \(Pr(\lnot g(L,H)\,|\,\pi (h))\)) denotes the probability that the guard \(g(L,H)\) is true (resp. false) under the prior probability distribution \(\pi (h)\) on the private variables.
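As an illustration of how the two successors of a conditional state can be computed, consider the following Python sketch. It follows the description above, but the state layout and helper names are our own assumptions, not QUAIL’s actual implementation; for simplicity a single private variable h is assumed.

def successors_of_if(state, guard, l_true, l_false, prior):
    """Successors of a conditional state (pc, L, H, p).

    guard(L, h) decides the branch for a concrete secret value h, and prior maps
    each secret value to its prior probability pi(h). The set of values still
    allowed for the single private variable is stored in H['h'].
    """
    pc, L, H, p = state
    h_values = H['h']
    true_vals = {h for h in h_values if guard(L, h)}
    false_vals = h_values - true_vals

    # Probability that the guard holds, conditioned on the values still allowed here.
    mass = sum(prior[h] for h in h_values)
    pr_true = sum(prior[h] for h in true_vals) / mass

    successors = []
    if true_vals and pr_true > 0:          # zero-probability successors are dropped
        successors.append((l_true, L, {'h': true_vals}, p * pr_true))
    if false_vals and pr_true < 1:
        successors.append((l_false, L, {'h': false_vals}, p * (1 - pr_true)))
    return successors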

When the analysis of a single trace terminates, the corresponding final state \((\bar{pc},\bar{L},\bar{H},\bar{p})\) is produced, in which \(\bar{pc}\) points to a return statement. The sets of allowed values assigned to the private variables in \(\bar{H}\) have been appropriately reduced to account for the conditional statements visited along the trace.

Depth-First Trace Exploration. We perform an exhaustive depth-first exploration of the execution traces of the system, starting from the initial state \((1,\emptyset ,\emptyset ,1)\). Each trace is explored until it reaches a final state, which is then added to the multiset Q of final states. When all traces have been explored, the full multiset Q of final states of the system has been produced. We then use Q to compute the posterior entropy of the system using Algorithm 1 presented below. The leakage of the system is the difference between the prior and posterior entropy, as explained in Sect. 2.

Note that the exploration also depends on the prior distribution \(\pi (h)\): values of the secret with probability zero in the prior distribution are not explored. This behavior is intended, as it avoids unnecessarily exploring traces that have probability zero.

The depth-first exploration algorithm can be parallelized to take advantage of multicore architectures and is implemented in the current release of QUAIL, available at http://project.inria.fr/quail. Since this new algorithm is hundreds of times faster than the previous QUAIL implementation, we consider it the standard QUAIL algorithm.
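The exploration itself can be sketched as a standard iterative depth-first search. The following simplified Python sketch is only meant to illustrate the strategy; it assumes a successors function implementing the semantics of Fig. 1.

def explore(initial_state, successors):
    """Depth-first exploration of the trace tree.

    Returns the multiset Q of final states as a list of (pc, L, H, p) tuples.
    'successors' must return the non-zero-probability successors of a state,
    following the semantics of Fig. 1.
    """
    Q = []
    stack = [initial_state]
    while stack:
        state = stack.pop()
        succ = successors(state)
        if not succ:              # no successors: pc points to a return statement
            Q.append(state)
        else:
            stack.extend(succ)
    return Q

Each state popped from the stack could equally be handed to a separate worker, which is one natural way to exploit multicore architectures.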

Posterior Uncertainty Computation. We show how to compute the posterior uncertainty \(U(\pi (h|o))\) of a system with a secret h and an observable o, given the uncertainty measure U and a multiset Q of final states of the system. Q encodes the posterior joint probability of all variables in the system and can be produced by the depth-first exploration algorithm presented above.

Let \((pc, L, H, p)\) be a final state in Q, where L represents the assignments of given values to the public variables, H the assignments of sets of values to the private variables, and p the joint probability of such assignments. Since different traces may produce the same final assignments \((L, H)\) to the variables, the joint probability of these assignments is the sum of the probabilities of all such final states. To apply formula (2), \(U(\pi (h|o))=\sum _{\bar{o}\in D(o)}\pi (o=\bar{o})U(\pi (h|o=\bar{o}))\), we need to compute the marginal probability distribution \(\pi (o)\) and, for each observable output \(\bar{o}\in D(o)\) s.t. \(\pi (o=\bar{o})>0\), the corresponding conditional probability distribution on h, i.e. \(\pi (h|o=\bar{o})\).

Algorithm 1.

Algorithm 1 computes \(\pi (o)\) and each \(\pi (h|o=\bar{o})\) by analyzing a multiset of final states. For each state \((pc, L, H, p)\), the value of the observable variable o and the set of values of the secret variable h are analyzed (lines 2–8). The probability of observing the value \(\bar{o}\) of the observable variable in the state is increased by p (line 4), and the probability of observing each of the n values of the secret variable conditioned on \(\bar{o}\) is increased by p / n (line 6). Finally, each subdistribution \(\pi (h,o=\bar{o})\) is normalized by dividing it by \(\pi (o=\bar{o})\) to obtain the conditional probability \(\pi (h|o=\bar{o})\) (line 9), since \(\pi (h|o=\bar{o})=\pi (h,o=\bar{o})/\pi (o=\bar{o})\).
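The following Python sketch mirrors the behavior of Algorithm 1 as described above (the pseudocode listing itself is not reproduced here); the representation of final states is the same assumption used in the sketches above.

from collections import defaultdict

def posterior_uncertainty(Q, U):
    """Posterior uncertainty U(pi(h|o)) from a multiset Q of final states (pc, L, H, p).

    L maps the observable variable 'o' to its concrete value, H maps the secret
    variable 'h' to the set of its still-possible values, and p is the joint
    probability of the state. Within a final state the remaining secret values
    are treated as equally likely, following the description above.
    """
    pi_o = defaultdict(float)                        # marginal pi(o)
    joint = defaultdict(lambda: defaultdict(float))  # pi(h, o = o_bar)

    for pc, L, H, p in Q:
        o_bar = L['o']
        secrets = H['h']
        pi_o[o_bar] += p                             # line 4: accumulate pi(o = o_bar)
        for h in secrets:
            joint[o_bar][h] += p / len(secrets)      # line 6: accumulate pi(h, o = o_bar)

    result = 0.0
    for o_bar, p_o in pi_o.items():
        conditional = {h: q / p_o for h, q in joint[o_bar].items()}   # line 9: normalize
        result += p_o * U(conditional)
    return result

Here U can be, for instance, the Shannon entropy function of Sect. 2; the leakage is then obtained by subtracting the returned value from the prior uncertainty.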

Theorem 1

Algorithm 1 terminates and outputs the posterior uncertainty \(U(\pi (h|o))\) of the posterior distribution represented by Q.

3.2 LeakWatch

LeakWatch [13] estimates the leakage of a Java program with secrets and observations by running it several times for each possible value of the secret and inferring a probability distribution on the observations for each secret. The tool automatically terminates the analysis when the precision of the estimation is deemed sufficient, but different termination conditions can be used.

For small secrets, LeakWatch reliably approximates the leakage of complex Java programs. For larger secrets, i.e. more than 10 bits, LeakWatch takes more time to return a value. However, the user can set an acceptable error level to reduce the computation time needed to obtain an answer. Also, if the tool is terminated prematurely, it can still provide an answer, even though it may be quite imprecise. This makes LeakWatch the only tool of the three considered that can always provide an answer in a time-limited scenario, since QUAIL and Moped-QLeak produce a leakage result only if they complete their execution.

Finally, LeakWatch provides many command-line options for tuning the analysis parameters. In particular, one of the options displays the current estimation of the leakage at regular intervals, which can be very useful when developing.

Syntax and Usage. The syntax is that of the Java language, with the additional commands secret(name, value), which declares a secret with a given name and value, and observe(value), which declares an observation of a given value. The analysis evaluates how much information leaks from the secret to the observable values. In particular, LeakWatch can compute the leakage from one point of a program to another, not necessarily from the start to the termination of the program.

To run LeakWatch, a Java program annotated with secret and observe statements has to be compiled with the LeakWatch library on the classpath:

javac -cp leakwatch-0.5.jar:. MyClass.java

The tool is then run passing the name of the compiled class as a parameter:

java -jar leakwatch-0.5.jar MyClass

The tool returns its leakage estimate for the Java program. Normally LeakWatch determines automatically when it has run enough executions. We have used the -n parameter to fix the number of executions of the program when we experimented with different precisions and computation times.

3.3 Moped-QLeak

Moped-QLeak [11] uses the Moped tool [17] to compute a symbolic Algebraic Decision Diagram (ADD) representation of the summary of a program, which contains the relation between the inputs and outputs of the program. Moped-QLeak then computes Shannon or min-entropy leakage from this ADD representation using two algorithms introduced by the authors. To obtain the ADD representation of the program, Moped basically performs a fix-point iteration.

Moped’s ability to build a symbolic representation of a program depends on the program’s complexity. When such a representation can be computed, Moped-QLeak obtains the information leakage with a small time overhead. On the other hand, some programs are not easy to reduce to a symbolic representation, and in these cases Moped-QLeak’s computation does not terminate within a reasonable time.

The ADD-based representation of probability distributions allows Moped-QLeak to analyze examples with large secret and observation spaces. In particular, the authors test it with 32-bit secrets and observables, whereas QUAIL’s computation time tends to be exponential in the size of the observables and LeakWatch’s in the size of the secret. This suggests that the ADD approach is a key improvement on the state of the art, allowing the analysis tools to analyze off-the-shelf programs using 32- and 64-bit variables.

3.4 Syntax and Usage

The tool analyzes programs written in a variant of Moped’s Remopla language. We provide here a simplified version of the syntax used by Moped-QLeak.

The if and do constructs from Remopla, originally non-deterministic in Moped, have been made deterministic in Moped-QLeak. The language has also been enriched with a probabilistic choice operator, pchoice, which allows the programmer to probabilistically define the next statement (e.g. by giving a probability \( prob \) to each alternative). Remopla supports loops, arrays and integers of arbitrary size. The language is normally used to encode systems for model checking against temporal logics.

The language does not provide constructs to declare secrets and observables, but assumes that all global variables are at the same time secret and observable. More precisely, the initial values are considered as the input and the final values as the output. In practice, a variable is made secret by assigning it the same value in all final states.

Moped-QLeak is executed on a Remopla file MyFile.rem by calling

mql -shannon MyFile.rem

where -shannon specifies that the tool will compute and return the Shannon leakage.

4 Case Studies

We evaluate the three tools described in the previous section on two scalable case studies. The case studies have been chosen because they model real-life systems and the computed results are representative of realistic security concerns. To compare the tools, we consider the following criteria:

  • Speed. Evaluating the time required by the tool to provide a result;

  • Accuracy. Evaluating the precision of the result returned by the tool;

  • Scalability. Evaluating how the tool behaves on larger instances of the case studies;

  • Usability. Evaluating the ease of modeling and the usefulness of the error messages from the compiler.

Fig. 2. Smart grid overview.

4.1 Case Study A: Smart Grids

A smart grid is an energy network where every node may produce, store and consume energy. Nodes are called prosumers (PROducer conSUMERS). The Living Lab demonstrator [22] is an instance of such a prosumer, whose data can be accessed online. The prosumers periodically negotiate with an aggregator in charge of balancing the consumption and production among several prosumers. Figure 2 depicts a grid with 3 prosumers. Each prosumer declares its plan, that is, how much it intends to consume or produce during the next period of time. The aggregator sends to each house a value indicating the excess of energy production or consumption. An excess of 0 indicates that the plans are feasible and terminates the negotiation. Otherwise, the prosumers adapt their plans accordingly and send the updated versions. Smart grids and smart sensors raise several security and privacy concerns. The platform can ensure that information cannot flow directly between prosumers [10]. However, stability requires feedback from the aggregator that potentially conveys information about the other prosumers, so only the software can limit this information leakage. In general, knowing the consumption of a particular household may reveal sensitive information about the house (presence of people in the house, type of electrical devices, ...). Therefore, the consumption of a prosumer should remain secret. The privacy of a prosumer with respect to the aggregator can be ensured in several ways [29]. However, each prosumer receives some information about the consumption of the other prosumers through the excess value sent back by the aggregator.

An attacker might use the information obtained through the grid in order to decide whether a given house is occupied or not. In our scenario, we assume different types of houses with different consumptions. Each house is modeled by a private boolean value, which is true if the house is occupied. An occupied house consumes a fixed amount of energy, according to its type. An empty house does not consume anything. Table 1 presents how much a given house consumes, in two different cases that we consider.

Table 1. Consumption of houses with respect to size

For this experiment, we assume that the attacker observes the global consumption of the quarter. We consider different targets for the attack and thus different secrets. Either the attacker targets a single house of a given type (i.e. S, M or L), and only the bit corresponding to the presence in that house is secret, or the attacker wants to obtain information about all the houses, and the whole array of bits indicating the presence in each house is secret.
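To make the scenario concrete, the following Python sketch is a simplified model of our own (it is not the benchmark code, and the consumption values are purely illustrative): it computes, by exhaustive enumeration, the leakage of either a single presence bit or the whole presence array when the attacker observes the global consumption.

import math
from itertools import product
from collections import defaultdict

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {value: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def leakage(secret_of, consumptions):
    """Shannon leakage of secret_of(presence) through the total consumption.

    Each house is occupied (1) or empty (0) with probability 1/2; an occupied
    house consumes the fixed amount given in 'consumptions', an empty one nothing.
    """
    houses = len(consumptions)
    joint = defaultdict(float)
    for presence in product([0, 1], repeat=houses):
        total = sum(c for c, occupied in zip(consumptions, presence) if occupied)
        joint[(secret_of(presence), total)] += 1 / 2 ** houses

    prior = defaultdict(float)
    observable = defaultdict(float)
    for (s, o), p in joint.items():
        prior[s] += p
        observable[o] += p

    posterior = sum(
        p_o * entropy({s: p / p_o for (s, o), p in joint.items() if o == o_bar})
        for o_bar, p_o in observable.items())
    return entropy(prior) - posterior

consumptions = [1, 2, 3, 1, 2, 3]                           # illustrative S/M/L values, 6 houses
print(leakage(lambda presence: presence[2], consumptions))  # presence bit of one L house
print(leakage(lambda presence: presence, consumptions))     # whole presence array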

Usability. We model the above scenario in the three tools. We consider two versions depending on the target. The program is rather simple to model; the only noticeable difference between the tools’ languages is the declaration of unobservable variables. When targeting all the houses, the secret is an array of booleans. When targeting a single house, the secret is a single boolean. Both targets are supported by all the tools. However, the presence in the other houses is not a secret, but still an unknown and unobservable input of the program. In QUAIL, the private keyword allows the programmer to declare such variables directly. In LeakWatch, we choose these values randomly but do not declare them as secret. In Moped-QLeak, we choose these values randomly, as in LeakWatch.

Table 2 presents the leakage for the Smart Grid case study. The first two columns indicate the case, as presented in Table 1, and the number of houses in the model. For a model with N houses, there are N / 3 houses of each type. The columns S, M and L indicate the leakage of the variable representing the presence in a house of the corresponding type. The column “Global leakage” contains the leakage of the whole array of presence bits and the column “Global leakage/bit” indicates the average leakage per bit of secret.

Table 2. Leakage of presence information through the global consumption
Table 3. Time to compute or approximate the leakage for a large house
Table 4. Average relative error and computation time over 100 runs for computing the leakage of the presence in a large house within 12 houses in Case B.

In Case B with only 3 houses, the presence information can be deduced from the global consumption, which is indicated by a leakage of 1 for each presence bit. Otherwise, the average leakage per bit under a global attack is larger than the information obtained by focusing on a single house. This means that obtaining information about the whole array, for instance the number of occupied houses, is easier than obtaining information about a single bit, i.e. the presence information of a single house. In both cases, the leakage, and thus the loss of anonymity of the prosumers, diminishes when the number of houses increases.

Speed. In Table 3 we show the time needed by QUAIL, LeakWatch and Moped-QLeak for computing the leakage of the presence information in a house of size L. Moped-QLeak takes around 20 ms to compute this value, LeakWatch takes between 300 and 500 ms and QUAIL takes between 100 and 1700 ms, depending on the size of the model. Furthermore, Moped-QLeak and QUAIL compute the exact leakage value, whereas LeakWatch computes an approximation. For a more precise comparison, we need to take precision into account.

Accuracy. We compare QUAIL, LeakWatch and Moped-QLeak on computing the leakage of the presence information of a single large house, in Case B. QUAIL takes 1.7 s to compute the exact leakage. With the default parameters, LeakWatch takes 0.4 s to compute an approximation with a relative error of 14 % (average on 100 runs). It requires 500 to 700 simulations.

To compare execution times with respect to errors, we did an additional experiment, where we requested LeakWatch to run more simulations. For each requested number of simulations we provide in Table 4 the average relative error (over 100 runs) and the time needed for the computation. We see that for an equivalent amount of time, LeakWatch provides a result with a relative error of 4 to 6 %, whereas QUAIL returns the exact result. Moped-QLeak is the fastest and most precise.

Scalability. Finally, we evaluate the scalability of the tools by increasing the number of houses until the analysis time reaches 1 h. For this experiment, we evaluate the leakage of the presence information, in Case B, for a single house of size L (1 bit secret), or for all the houses simultaneously (N bits of secret). The results are shown in Table 5.

Table 5. Maximal size analyzable in one hour

We see that LeakWatch can handle a very large number of houses when computing the leakage of a small secret, but is not much more scalable than QUAIL with a large secret. Recall that LeakWatch provides an approximation of the leakage, whereas QUAIL and Moped-QLeak provide the exact value. Moped-QLeak scales relatively well with both a small and a large secret.

4.2 Case Study B: Voting Protocols

In an election, each voter is called to express his preference among the competing candidates. The voting system defines the way the voters express their preference: either on paper in a traditional election, or electronically in e-voting. After the votes have been cast, the results are published, usually in an aggregated form to protect the anonymity of the voters. Finally, the winning candidate or candidates are chosen according to a given electoral formula.

In this section we present two types of voting protocols, representing two ways in which the voters can express their preference: in the Single Preference protocol the voters declare their preference for exactly one of the candidates, while in the Preference Ranking protocol each voter ranks the candidates from his most favorite to his least favorite.

Single Preference. This protocol type models all electoral formulae in which each of the N voters expresses one vote for one of the C candidates, including plurality and majority voting systems and single non-transferable vote [24]. The votes for each candidate are summed up and only the totals are published, thus hiding information about which voter voted for which candidate. The candidate or candidates to be elected are decided according to the electoral formula used.

The secret is an array of integers with a value for each of the N voters. Each value is a number from 0 to \(C-1\), representing a vote for one of the C candidates. The observable is an array of integers with the votes obtained by each of the C candidates.
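As an illustration, the following minimal Python sketch shows the observable computed by the Single Preference protocol; the encoding of votes as integers from 0 to C-1 follows the description above, while the function name is our own.

from collections import Counter

def votes_to_results(votes, c):
    """Observable of the Single Preference protocol: number of votes per candidate."""
    counts = Counter(votes)
    return [counts.get(candidate, 0) for candidate in range(c)]

print(votes_to_results([0, 2, 2, 1], 3))   # 4 voters, 3 candidates -> [1, 1, 2]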

The protocol is simple, and its information leakage can be computed formally, as shown by the following lemma:

Lemma 1

The information leakage for the Single Preference protocol with n voters and c candidates corresponds to

$$\begin{aligned} - \sum _{k_1+k_2+...+k_c=n} \frac{n!}{c^n k_1! k_2! \ldots k_c!} \log _2 \left( \frac{n!}{c^n k_1! k_2! \ldots k_c!}\right) \end{aligned}$$

While the lemma provides a formula to “manually” compute the leakage, it is very hard to find such a formula for an arbitrary process. Therefore automated tools should be employed.
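The formula can also be evaluated numerically; the following Python sketch (our own sanity check, not part of any tool) enumerates the possible result vectors and sums their contributions.

import math
from math import factorial
from itertools import combinations_with_replacement
from collections import Counter

def single_preference_leakage(n, c):
    """Shannon leakage of the Single Preference protocol (formula of Lemma 1)."""
    leakage = 0.0
    # Each multiset of n votes over the c candidates corresponds to exactly one
    # count vector (k_1, ..., k_c) with k_1 + ... + k_c = n.
    for outcome in combinations_with_replacement(range(c), n):
        counts = Counter(outcome)
        denominator = math.prod(factorial(counts.get(i, 0)) for i in range(c))
        p = factorial(n) / (c ** n * denominator)
        leakage -= p * math.log2(p)
    return leakage

print(single_preference_leakage(3, 2))   # 3 voters, 2 candidates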

Preference Ranking. This protocol type models all electoral formulae in which each of the n voters expresses an order of preference over the c candidates, including the alternative vote and single transferable vote systems [24]. In the Preference Ranking (or preferential voting) protocol the voter does not express a single vote, but rather a ranking of the candidates; thus if the candidates are A, B, C and D, the voter could express the fact that he prefers B, then D, then C and finally A. Each candidate then gets c points for each time he appears as first choice, \(c-1\) points for each time he appears as second choice, and so on. The points of each candidate are summed up and the results are published.

The secret is an array of integers with a value for each of the N voters. Each value is a number from 0 to \(C!-1\), representing one of the possible C! rankings of the C candidates. The observable is an array of integers with the points obtained by each of the C candidates.
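For illustration, the following Python sketch shows one possible way to decode a secret ranking, encoded as an integer in \([0, C!-1]\), and to turn a set of rankings into the observable point totals; the encoding of rankings as permutation indices is our own assumption, and the benchmark programs may decode them differently.

from itertools import permutations

def points_from_rankings(ranking_codes, c):
    """Observable of the Preference Ranking protocol: total points per candidate.

    Each secret vote is an integer in [0, c!-1] selecting one of the c! rankings;
    the candidate ranked first receives c points, the second c-1 points, and so on.
    """
    all_rankings = list(permutations(range(c)))      # index -> ranking, most to least favorite
    points = [0] * c
    for code in ranking_codes:
        for position, candidate in enumerate(all_rankings[code]):
            points[candidate] += c - position
    return points

# Two voters, three candidates: voter 0 ranks 1 > 2 > 0, voter 1 ranks 0 > 1 > 2.
rankings = list(permutations(range(3)))
codes = [rankings.index((1, 2, 0)), rankings.index((0, 1, 2))]
print(points_from_rankings(codes, 3))    # [4, 5, 3]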

Table 6. Voting protocols: percent of secret leaked by Single Preference (on the left) and Preference Ranking (on the right) computed with the QUAIL tool. Timeout is set at 1 h.

Experimental Results

Usability. We model the two voting systems, where the secret is the votes and the observable is the results. In Single Preference voting, the secret is an array of integers representing the individual votes. The range of these integers corresponds to the number of candidates. In QUAIL, it is possible to declare the range of a secret integer. In LeakWatch, each vote is drawn uniformly from the range and then declared secret. In Moped-QLeak, this case requires more work. The range of a secret integer depends on the chosen bit size. A special variable, out_of_domain, is set to true if one of the votes is not in the valid range, and the corresponding input is then not considered. Furthermore, when using this variable, it is not possible to use local variables, which is signaled by the error message “The first computed value is not a constant.”. The impossibility of using local variables and the imprecision of the error message considerably increased the modeling time.

For the Preferential Voting, we were not able to produce a Moped-QLeak program that terminates. We suspect that Moped is unable to compute a symbolic representation of the Preferential Voting protocol due to its inherent complexity. Indeed, this program decodes an integer between 0 and the factorial of the number of candidates into a sorted list of the candidates, to assign the corresponding points to the candidates.

Table 7. Percent error of the leakage obtained by LeakWatch relatively to the exact value for Single Preference (on the left) and Preference Ranking (on the right). Timeout is set to 1 h.

Accuracy. Table 6 shows the percentage of the secret leaked by the Single Preference and Preference Ranking protocols for different numbers of voters and candidates. The results for 2 candidates are identical, since in this case in both protocols the voters can vote in only 2 different ways. The results obtained for Single Preference are correct with respect to the formula stated in Lemma 1. The table shows that the Single Preference protocol leaks a larger part of its secret than the Preference Ranking protocol.

Table 8. Time in seconds needed to compute the leakage for Single Preference with QUAIL (left), LeakWatch (middle) and Moped-QLeak (right). Timeout is set to 1 h.

Table 7 shows the percent error of the leakage value obtained with LeakWatch. Indeed, LeakWatch computes an approximation of the leakage based on simulation, whereas QUAIL and Moped-QLeak compute the exact value. Furthermore, the leakage computed by LeakWatch for a given program may change at each invocation of the tool, because LeakWatch samples random executions. Here, LeakWatch slightly underestimates the leakage, by 2 to 5 %.

Speed. We compare the execution times of the three tools in Table 8 for Single Preference and in Table 9 for Preference Ranking. These execution times have been obtained on a laptop with an i7 quad-core running at 3.3 GHz and 16 GB of RAM. The results show that QUAIL is significantly faster than LeakWatch on these examples. This confirms that QUAIL performs better than LeakWatch with large secrets, in line with previous results [5]. For Single Preference, Moped-QLeak clearly outperforms QUAIL on large examples. The results for Moped-QLeak in the Preference Ranking case study are missing from Table 9 because the tool did not terminate on this case study, even with the smallest instance of 2 voters and 2 candidates.

Table 9. Time in seconds needed to compute the leakage for Preference Ranking with QUAIL (on the left) and LeakWatch (on the right). Timeout is set to one hour.

Scalability. Concerning scalability, we see that QUAIL and Moped-QLeak are more scalable than LeakWatch, since the latter times out in Tables 8 and 9. For Single Preference, QUAIL stops at 7 voters and 6 candidates, due to an error. Moped-QLeak finished with 12 voters and 6 candidates but returned -inf as the leakage value, instead of 11. With 9 voters and 6 candidates, the result has approximately 1 bit of error. Therefore, we conjecture that the -inf value is caused by a precision error. On these examples, no tool seems to be much more scalable than the others, for different reasons.

5 Conclusions

In this paper, we provided two scalable case studies for leakage computation and used them to compare the existing tools able to perform such a computation. We have compared the state of the art in information leakage tools – LeakWatch, QUAIL and Moped-QLeak – on their speed, accuracy, scalability and usability in addressing the case studies. We summarize here our observations and experience with the tools.

Speed. Concerning the execution time, Moped-QLeak is usually the fastest tool at providing an exact result. However, in the Preference Ranking example Moped-QLeak was unable to terminate its analysis in less than one hour even for the smallest instances of the problem. LeakWatch is faster than QUAIL on small secrets (e.g. 1 bit), but QUAIL outperforms LeakWatch on larger secrets. Finally, LeakWatch is very fast on small secrets, but its result and its evaluation of the system (presence or absence of leakage) tend to change between different executions of the tool.

Accuracy. The tool giving the most accurate results is QUAIL, because it supports arbitrary precision. LeakWatch provides an approximate result and is therefore imprecise by definition. Moped-QLeak does not implement arbitrary-precision analysis and consequently suffers from approximation errors. For instance, we found an error in the order of 1 bit on the Single Preference protocol with 9 voters and 6 candidates, for which we have the exact result. Also, for the same protocol with 12 voters and 6 candidates Moped-QLeak reported a leakage of negative infinity bits, which we conjecture is caused by approximation and division-by-zero errors in the computation.

Scalability. For small secrets, LeakWatch scales better than the other tools analyzed. In the Smart Grid case study, we managed to analyze the leakage for an aggregation of 150000 houses in less than one hour. However, the returned result is obtained statistically, and varies from one execution to the other.

For large secrets, the winner is Moped-QLeak, as it scales much better than QUAIL on the Smart Grid case study. However, for the voting protocol, QUAIL manages to analyze only two voters less than Moped-QLeak (6 against 8), before approximation issues make Moped-QLeak’s results incorrect.

Usability. Since all the tools studied here are academic tools that are still in their early years, usability is not necessarily the main concern of their developers. However, we have found some important differences in this area.

The most usable tool is LeakWatch, especially if the program to analyze is already written in Java. In that case, it is sufficient to annotate the program in order to declare the secrets and the observable values. Furthermore, LeakWatch has a command line option to display the current results based on the traces collected so far, which is convenient when the analysis time is very long.

QUAIL has its own language, which is an imperative WHILE language with arrays and constants. QUAIL allows the explicit declaration of variables as observable, public, private or secret, with a specific range of allowed values. Furthermore, QUAIL has a command-line option to change the values of constants declared in a program, which comes in handy when performing batch experiments.

Using Moped-QLeak has been more problematic because of some issues with the Remopla language. In particular, the range of the secrets cannot be declared; instead, the program has to set the out_of_domain variable when the values are not in the expected range. Also, all integer variables have the same length, defined by the DEFAULT_INT_BITS constant. Finally, some error messages are misleading and slow down the modeling process.

To conclude, Moped-QLeak is the fastest tool, because it uses a suitable data structure (Algebraic Decision Diagrams) to represent the executions. However, this data structure may become a problem with complex programs, as shown by the Preference Ranking example, which Moped-QLeak cannot analyze, unlike the other tools. The other tools, QUAIL and LeakWatch, are more usable. QUAIL, which also has its own dedicated language, provides specific constructs for declaring the visibility and range of a variable.

We believe that reimplementing QUAIL with a better data structure for probability distributions, like the ADDs used in Moped-QLeak, would provide a fast and usable tool for leakage analysis. The statistical techniques used in LeakWatch should also be integrated to allow approximate results for large instances.

6 Related Tools

We discuss some security-related automated tools and their relation with the work presented in this paper.

The STA tool developed by Boreale et al. [7] is similar in intent to the algorithms we propose, since it also uses symbolic trace analysis. More recent work by Boreale et al. [8] introduces a semiring-based semantics able to perform compositional quantitative analysis of non-deterministic systems, but no tool is available at the moment.

Efficient tools have been developed by Phan and Malacaria for information-theoretical analysis of systems. The tools squifc [25], QILURA [26], and jpf-qif [27] use SMT solving to perform a symbolic analysis of C or Java code and to compute channel capacity of programs, where the channel capacity is the maximum information leakage achievable for any prior distribution over the secret and randomness of the system. Since the tools compute channel capacity and not Shannon leakage of randomized systems, they have not been included in our comparison.

McCamant et al. have obtained interesting results in detecting information leakage through implicit flows by applying dynamic quantitative taint analysis techniques [20, 23]. Again, their techniques have not been included in this evaluation since they do not compute information-theoretical leakage measures like Shannon and min-entropy leakage.