
1 Introduction

Security compromises that cause leakage of business or consumer data are a particularly challenging problem in the cybersecurity domain. Affected businesses frequently struggle to recover once the consequences of a breach become apparent, such as a competitor outpacing them in a race for the next innovation, or data troves appearing on cybercriminal marketplaces and eventually eroding consumer confidence. For example, data about small and medium-sized businesses suggests that approximately 60% fail within six months after a data breach [3].

Businesses struggle to prevent information leakage for multiple reasons. In particular, the increasing prevalence of well-motivated, technically capable, and well-funded attackers who are able to execute sophisticated multi-stage attacks and advanced persistent threats (APTs) poses significant challenges. Such attacks may take time to execute, but they will eventually succeed with high likelihood. In a recent talk, the Chief of Tailored Access Operations at the National Security Agency characterized the mindset of these attackers as follows: “We are going to be persistent. We are going to keep coming, and coming, and coming” [12].

Further, carefully orchestrated attacks, as employed in corporate, cyber-criminal, or nation-state sponsored cyber-espionage and sabotage (see Stuxnet [4]), challenge the assumption that stealthy attacks can be reliably detected before it is too late. Estimates for how long attacks remain undetected are dire. For example, a recent presentation by the CEO of Microsoft suggested that the time until detection of a successful attack is on average over 200 days [21].

All of these observations emphasize the need to reason about suitable responses to stealthy attacks that cause continued information leakage. We know that perfect security is too costly; and even air-gapped systems are vulnerable to insider risks or creative technical approaches. Another mitigation approach is to limit the impact of attacks by resetting system resources to a presumed safe state to lower the chances of a perpetual undetected leak. However, in most scenarios such actions will be costly. For example, they may impact productivity due to system downtime or the need to reissue cryptographic keys, passwords, or other security credentials. As such, determining the best schedule for resetting defense mechanisms is an economic question which needs to account for monetary and productivity costs, strategic and stealthy attacker behavior, and other important facets of information leakage scenarios such as the effectiveness of the system reset. To address this combination of factors, we propose a new game-theoretic model called FlipLeakage.

In our proposed model, an attacker has to engage in a sustained attack effort to compromise the security of a system. Our approach is consistent with two scenarios. On the one hand, the attacker may conduct surveillance of the system to collect information that will enable a security compromise, e.g., by pilfering traffic for valuable information, or by gathering information about the system setup. On the other hand, the attacker may incrementally take over parts of a system, such as user accounts or parts of a cryptographic key, or collect business secrets to enable further attack steps. In both scenarios, persistent activity and the accumulated information will eventually enable the attacker to reach her objective of compromising the system and acquiring the primary business secret, provided the defender does not interfere by returning the system to a presumed safe state.

In Fig. 1, we provide an initial abstract representation of the studied strategic interaction between an attacker and a defender. The attacker initiates sustained attack efforts at \(t_1\), \(t_2\), and \(t_3\), right after the defender’s moves, where each time she also starts gaining information about the system. After accumulating sufficient information about the system, the attacker will be able to compromise it. The attacker’s benefit until the security compromise is completed is depicted as a triangle, capturing the value of the information leaked during the attack’s execution. After the compromise, the attacker continues to receive benefits from the compromised system, which is depicted as a rectangle.

The defender can take a recovery action (to reset the resource to a presumed safe state) and can thereby stop the attack. In our model, we consider the scenario when the defender only partially eliminates the foothold of the attacker in the system. In Fig. 1 those defensive moves occur at \(t_1\), \(t_2\), and \(t_3\). Further, the defender cannot undo any information leakage that has already taken place during an attack.

In our model, we focus on periodic defensive moves for the defender. That is, the time between any two consecutive moves is assumed to be identical, an assumption motivated by practical observations of security policy updates by major software vendors such as Microsoft and Oracle, which we discuss in detail in Sect. 3. Within this context, we aim to determine the defender’s best periodic defensive strategy when the moves of the attacker are unobservable to the defender, i.e., the attacks succeed in being stealthy. At the same time, we assume that the attacker can observe the defender’s moves. The latter assumption rests on two observations. On the one hand, the attacker will be cut off from access to a partially compromised system when a recovery move takes place. On the other hand, many defensive moves may actually be observable in practice, e.g., when a patch for a software system becomes available which renders a particular attack strategy impractical. The scenario under investigation is a security game of timing, i.e., we study when players should move to act optimally.

Fig. 1.

FlipLeakage is a two-player game between an attacker and a defender competing with each other to control a resource. \(t_1\), \(t_2\), and \(t_3\) represent the defender’s move times. During the time when the attacker launches her attack, she incrementally benefits from information leakage which is shown as red triangles.

In the following, we provide a brief overview of our contributions.

  • We develop a game-theoretic model titled FlipLeakage. In our model, an attacker will incrementally take ownership of a resource (e.g., similar to advanced persistent threats). While her final objective is a complete compromise of the system, she may derive some utility during the preliminary phases of the attack. The defender can take a costly periodic mitigation move and has to decide on its optimal periodic timing.

  • We consider the scenario when the defender only partially eliminates the foothold of the attacker in the system. Further, the defender cannot undo any information leakage that has already taken place during an attack.

  • We derive optimal strategies for the agents in our model and present numerical analyses and graphical visualizations. One of our findings corroborates an intuition: the higher the defensive cost, the slower the defender’s periodic move rhythm. Moreover, our numerical observations imply that the defender moves faster when the attacker’s average time to totally compromise the defender’s system is lower.

In the presence of stealthy attacks and information leakage, defenders have to set a schedule for updating and resetting their defense mechanisms without any feedback about the occurrence of attacks. This poses significant challenges for the design of new methods to mitigate such attacks. The objective of our theoretical model is to provide a systematic approach for the defender’s best schedule to reset his system to a presumed safe state to lower the chances of a perpetually undetected leak. As such, our work provides important steps towards building a rigorous model for an optimal defender’s response to these unknowns.

Roadmap: The rest of our paper is organized as follows. We discuss related work in Sect. 2. In Sect. 3, we develop the FlipLeakage model followed by payoff calculations in Sect. 4. We analyze our proposed model in Sect. 5. In Sect. 6, we present numerical examples. Finally, we conclude our paper in Sect. 7.

2 Related Work

Game theory is widely used in cybersecurity and privacy scenarios to study interdependencies [7, 10, 13, 27], and dynamic interactions between defenders and attackers of varying complexity [5, 17, 19]. One recently emphasized aspect of security games is the consideration of when to act to successfully mitigate attacks. In particular, the issue of optimally timing defensive actions to successfully thwart stealthy attacks has attracted attention in the cybersecurity domain with the introduction of the FlipIt game [2, 29] which broadens the games of timing literature initiated in the cold-war era [1, 28]. In what follows, we provide a brief description of the FlipIt game as well as theoretical follow-up research.

FlipIt is a two-player game between a defender and an attacker competing with each other to control a resource which generates a payoff to the owner of the resource. Moves to take over the resource, i.e., flips, are costly [2, 29]. In [29], the authors studied the FlipIt game with different choices of strategy profiles and aimed to calculate dominant strategies and Nash equilibria of the game in different situations. Pham and Cid [26] extended the FlipIt game by considering that players have the ability to check the state of the resource before their moves.

Feng et al. [6] and Hu et al. [9] modified the FlipIt game by considering insiders in addition to external adversaries. Zhang et al. [31] studied the FlipIt game with resource constraints on both players. Pawlick et al. extended the FlipIt game with characteristics of signaling games [25]. Wellman and Prakash developed a discrete-time model with multiple, ordered states in which attackers may compromise a server through cumulative acquisition of knowledge rather than in a one-shot takeover [30].

The original FlipIt paper assumed that the players compete with each other for one resource. Laszka et al. [14] addressed this limitation by modeling multiple contested resources in a game called FlipThem. Other authors extended this game by considering a threshold for the number of contested resources which need to be compromised to achieve the attacker’s objective [18]. In a similar way, a variation of the game has been proposed with multiple defenders [24]. Laszka et al. [15, 16] studied timing issues when the attacker’s moves are non-instantaneous. Moreover, they considered that the defender’s moves are non-covert and the attacker’s type can be targeting and non-targeting. Johnson et al. [11] investigate the role of time in dynamic environments where an adversary discovers vulnerabilities based on an exogenous vulnerability discovery process and each vulnerability has its corresponding survival time.

Complementing these theoretical analyses, Nochenson and Grossklags [22] as well as Grossklags and Reitter [8] study human defensive players when they interact with a computerized attacker in the FlipIt framework.

Our work differs from the previous FlipIt literature in two key respects. First, we take into account the problem of information leakage and propose a more realistic game-theoretic framework for the defender’s best time to update his defense mechanism. We propose a model in which an attacker incrementally takes ownership of a resource. Note that the attacker’s goal is to compromise the defender’s system completely, but she may already acquire some benefit during the initial steps of her attack. Second, we consider the possibility that the defender’s defense strategy cannot completely eliminate the attacker’s foothold in the system. As a result, our work overcomes several significant simplifications in the previous literature which limited its applicability to realistic defense scenarios.

3 Model Definition

In this section, we provide a description of the FlipLeakage model which is a two-player game between a defender (\(\mathcal {D}\)) and an attacker (\(\mathcal {A}\)). We use the term resource for the defended system, but also for the target of the attack which will leak information during the attack and after the successful compromise. The attack progresses in a stealthy fashion. However, the defender can regain partial control over a compromised resource by taking a defensive recovery move (e.g., a variety of system updates).

In the FlipLeakage model, we emphasize the following aspects which we will discuss below: (1) uncertainty about the time of compromising the defender’s resource entirely, (2) process of information leakage, (3) quality of defensive moves, (4) strategies of both players, and (5) other parameters which are necessary for our model.

Uncertainty About Attack Launch and Success Timings: In FlipLeakage, the defender is the owner of the resource at the beginning of the game. The resource is in a secure state, when it is completely controlled by the defender. However, due to the stealthy nature of many practically deployed attacks, e.g., related to cyber-espionage and advanced persistent threats, it is reasonable to assume that the defender cannot acquire any information about the time when an attack is launched as well as its success [21].

In contrast, we assume that the attacker can observe the time of a defender’s move. One motivating practical example for this consideration is that many software companies publicly announce the arrival of new patches for previously discovered vulnerabilities. Hence, an attacker could infer when a certain system weakness is not available anymore. It follows that we model asymmetry with respect to knowledge between the two players.

Furthermore, we differentiate between the time of launching an attack and the time of an attack’s full effectiveness (i.e., when the resource is completely compromised). Note that the value of this time difference is known to neither the defender nor the attacker. Hence, this time difference is represented by a random variable \(t_{\mathcal {A}}\) with probability density function \(f_{\mathcal {A}}(t_{\mathcal {A}})\). The value of \(t_{\mathcal {A}}\) depends on many factors such as the defender’s defense strategy and the attacker’s ability to compromise the defender’s system.

The gap between these two points in time can be interpreted as the attacker requiring a nontrivial amount of time and effort to control the resource completely, e.g., to gather leaked information from the resource and to conduct subsequent attack steps. Further, the time of launching an attack can be understood as the time at which the attacker starts to gather information from the defender to execute the attack successfully (e.g., by conducting surveillance of the system setup or pilfering traffic to collect information that will enable a security compromise). For simplicity, we assume that the value of \(t_{\mathcal {A}}\) is chosen according to a random variable and remains constant during each round of the attack. For future work, we will consider the case where the values of \(t_{\mathcal {A}}\) differ across rounds of the attack. Note that we assume that the other important parameters of the game are common knowledge between the players. The extension of the framework to uncertainty about game-relevant parameters is a subject of future work.

Process of Information Leakage: After initiation of the attack move, the attacker’s reward until a complete compromise is accomplished is based on the percentage of the resource currently controlled by the attacker. For this purpose, we consider a function \(g_{\mathcal {A}}(t)\) which increases over the range [0, 1]. \(g_{\mathcal {A}}(t)\) can also be interpreted as the normalized amount of leaked information accessible to the attacker over time, which she can use to improve her attack effectiveness. Recall that the time to complete an attack successfully is represented by a random variable \(t_{\mathcal {A}}\). It follows that the function \(g_{\mathcal {A}}(t)\) should depend on \(t_{\mathcal {A}}\). To this end, we define a general function \(g_{\mathcal {A}}(t)\) reaching 1 (i.e., the level at which the attacker controls the whole resource) at one unit of time. As an example, we show a simple version of this function on the left-hand side of Fig. 2. To represent the described dependency, we then use the function \(g_{\mathcal {A}}\left( t/t_{\mathcal {A}}\right) \) for the attacker’s reward calculation during the time of completing the attack successfully, as shown on the right-hand side of Fig. 2.

Fig. 2.

The attacker’s reward function during the time of completing an attack successfully depends on \(t_{\mathcal {A}}\). To show this dependency in our model, we define a function, shown on the left-hand side of this figure, which reaches 1 after one unit of time. The figure on the right-hand side shows \(g_{\mathcal {A}}\left( t/t_{\mathcal {A}}\right) \), representing this dependence.
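To make the rescaling concrete, the following sketch (not part of the paper’s formal model) instantiates \(g_{\mathcal {A}}\) with an illustrative linear curve; the linear shape is an assumption for illustration only, as the model admits any increasing function from 0 to 1.

```python
# Illustrative sketch only: a concrete linear choice of the normalized
# leakage function g_A, and its time-scaled version g_A(t / t_A).

def g_A(x):
    """Fraction of the resource controlled after a fraction x of the
    attack duration has elapsed; increases from 0 to 1 on [0, 1]."""
    return min(max(x, 0.0), 1.0)

def g_A_scaled(t, t_attack):
    """g_A(t / t_A): the same curve stretched so that full control is
    reached at t = t_A instead of at one unit of time."""
    return g_A(t / t_attack)

# With t_A = 4, the attacker controls half the resource at t = 2
# and the full resource at t = 4.
assert g_A(1.0) == 1.0
assert g_A_scaled(2.0, 4.0) == 0.5
assert g_A_scaled(4.0, 4.0) == 1.0
```

This mirrors the stretching from the left-hand to the right-hand side of Fig. 2: the curve’s shape is unchanged, only the time at which it reaches 1 moves from one unit of time to \(t_{\mathcal {A}}\).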

Defense Quality: In FlipLeakage, we consider the quality of the defender’s recovery action (or, alternatively, the ability of the attacker to transfer information from a previous attack to the next attempt). That is, the defender’s recovery action does not guarantee regaining complete control over the resource: the attacker has an initial advantage during the next attack attempt and retains a foothold in the system. In other words, the defender’s defense strategy cannot entirely eliminate the effects of previous attacks. Motivating examples for this imperfect recovery model are advanced persistent threats. These attacks are typically driven by human staff who intelligently make use of any available and gathered information during the next multi-stage attack step, which may include an initial compromise, foothold establishment, reconnaissance, etc. In this scenario, any recovery move by the defender will frequently only partially remove the attacker from the system, or at the very least cannot eliminate any information advantage held by the attacker. In the FlipLeakage game, we introduce a new random variable \(\alpha \) with range [0, 1] to represent the fraction of control over the previously compromised resource that the attacker retains after the defender’s recovery move.

In the worst case, the defender’s recovery move does not impact the level of control the attacker holds over the resource (i.e., \(\alpha = 1\)). In contrast, \(\alpha = 0\) represents the situation when the defender’s recovery is perfect. Then, the attacker has to start with a zero level of knowledge during her next attack. We model \(\alpha \) as a continuous random variable with PDF \(f_{\alpha }(.)\) taking values between zero and one, i.e., \(\alpha \in [0,1]\). Note that in the FlipLeakage model, the attacker never starts with a higher level than the level attained in the most recent compromise attempt, i.e., we assume that defense moves are not counterproductive. For simplicity, we assume that the random variable \(\alpha \) takes its value after the first attack and remains constant during the game. For future work, we will consider the case where the values of \(\alpha \) are drawn independently in each round of the attack.

Players’ Strategies: In FlipLeakage, we assume that the defender moves according to periodic strategies, i.e., the time interval between two consecutive moves is identical and denoted by \(\delta _{\mathcal {D}}\). In what follows, we provide two examples to show that in practice, several major software vendor organizations update their security policies in a periodic manner to underline the practical relevance of this assumption.

The first example we consider is Microsoft’s security policy update process, known as Patch Tuesday; according to [20], “Microsoft security bulletins are released on the second Tuesday of each month.” We visualize the time differences between security updates from March 14, 2015, until March 12, 2016, in Fig. 3(a). In this figure, the vertical axis represents the number of security updates for each update instance. On the horizontal axis, 0 represents the first security update we take into account, which took place on March 14, 2015. Based on this figure, Microsoft security policy updates are almost perfectly periodic. We only observe two dates with out-of-schedule security updates. These two security updates correspond to an update for Internet Explorer and a vulnerability in a Microsoft font driver that allowed remote code execution.

Another example is Oracle’s critical patch update process. These updates occur in January, April, July, and October of each year. To visualize the time differences between updates, shown in Fig. 3(b), we consider Oracle’s critical patch updates from July 13, 2013, to January 19, 2016, based on the information available at [23]. We calculate the time differences between two consecutive patch updates in days and divide this number by 30 to obtain an approximate difference in months. In this figure, 1 along the vertical axis represents the occurrence of a patch update. We observe that Oracle’s policy for critical patch updates is almost periodic.

Fig. 3.

In practice, many organizations update their systems according to periodic strategies. As examples, we consider two organizations: (1) Microsoft and (2) Oracle.
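The periodicity check described above is straightforward to reproduce. The sketch below computes gaps between consecutive update dates; the dates are hypothetical second-Tuesday-style examples, not the actual data behind Fig. 3.

```python
# Sketch of the periodicity check: compute day gaps between consecutive
# update dates and a rough month conversion (dividing by 30, as in the
# text). Dates below are illustrative second Tuesdays, not real data.
from datetime import date

update_dates = [date(2015, 3, 10), date(2015, 4, 14),
                date(2015, 5, 12), date(2015, 6, 9)]

gaps_days = [(b - a).days for a, b in zip(update_dates, update_dates[1:])]
gaps_months = [d / 30 for d in gaps_days]  # approximate months

print(gaps_days)    # near-constant gaps indicate a periodic policy
```

A near-constant gap sequence, such as the roughly 28- to 35-day spacing of a "second Tuesday" schedule, is exactly the pattern visible in Fig. 3(a).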

In the FlipLeakage model, we assume that the attacker moves right after the defender. With this assumption, we follow the results of [16], who showed that when the defender plays a periodic strategy, the best strategy for an attacker who can observe the defender’s defense moves is to move right after the defender.

Other Parameters: The cost of the defender’s recovery moves and the attacker’s attack moves are represented by \(c_{\mathcal {D}}\) and \(c_{\mathcal {A}}\), respectively, and we assume that they do not change over time. Examples of the defender’s moves are changes of passwords, reinstallations of systems, and the application of new patches. Taking steps to incrementally infer cryptographic keys, brute-force passwords, or to inject malware are examples of the attacker’s moves.

Once the attacker controls the resource completely, she receives an immediate reward which is represented by a constant value \(I_{\mathcal {A}}\). The rationale behind the introduction of this parameter is that once the attacker infers the defender’s secret such as a cryptographic key, she can, for example, decrypt secret messages which she has collected.

For the time that the attacker (defender) controls the resource completely, we assume that the defender’s (attacker’s) reward is equal to zero and the attacker (defender) receives \(B_{\mathcal {A}}\) (\(B_{\mathcal {D}}\)) per unit of time controlling the resource. For example, these incremental earnings for the attacker represent newly arriving messages which can be decrypted with the compromised key. Note that the resource is controlled by the attacker completely after a successful attack and before the next recovery move by the defender.

4 Payoff Model

In this section, we develop the payoff functions for the FlipLeakage model based on what we presented in Sect. 3.

The time required to execute an attack successfully is defined by a continuous random variable with PDF \(f_{\mathcal {A}}\); we denote a realization of this random variable by \(t_{\mathcal {A}}\). Moreover, the time between two consecutive defender’s moves is represented by \(\delta _{\mathcal {D}}\). Based on the realization of \(t_{\mathcal {A}}\), we have two possible cases, i.e., \(t_{\mathcal {A}} \ge \delta _{\mathcal {D}}\) and \(t_{\mathcal {A}} < \delta _{\mathcal {D}}\). In what follows, we consider each of these two cases separately and then combine them according to the probability of each case to obtain the payoff functions.

Case 1: \(t_{\mathcal {A}} < \delta _{\mathcal {D}}\)

In this case, the attacker can complete her attack before the defender’s recovery move. Hence, she receives the immediate reward for compromising the resource completely, i.e., \(I_{\mathcal {A}}\), as well as the reward for controlling the resource completely, i.e., \(B_{\mathcal {A}}\).

In our model, we assume that the attacker’s control over the resource does not fall to zero right after the defender’s recovery move. As discussed in Sect. 3, we have introduced a new parameter, \(\alpha \), and described the resulting changes to players’ payoffs. For \(t_{\mathcal {A}} < \delta _{\mathcal {D}}\), the attacker controls the resource completely before the next recovery move by the defender. Then, right after the defender’s move, the attacker controls a fraction \(\alpha \) of the resource. For the remainder of the resource to be taken over, i.e., \(\left( 1 - \alpha \right) \), the attacker can gain control based on \(g_{\mathcal {A}} \left( t/t_{\mathcal {A}}\right) \). Hence, the attacker’s benefit for this period is then based on \(\alpha + \left( 1-\alpha \right) g_{\mathcal {A}}\left( t/t_{\mathcal {A}}\right) \). The attacker’s payoff is as follows:

$$\begin{aligned} u_{\mathcal {A}}^1(t_{\mathcal {A}},\alpha ,\delta _{\mathcal {D}}) = \dfrac{\displaystyle \int _{0}^{t_{\mathcal {A}}} \left( \alpha + \left( 1- \alpha \right) g_{\mathcal {A}}\left( \dfrac{t}{t_{\mathcal {A}}}\right) \right) dt + I_{\mathcal {A}} + B_{\mathcal {A}}(\delta _{\mathcal {D}} - t_{\mathcal {A}}) -c_{\mathcal {A}}}{\delta _{\mathcal {D}}}. \end{aligned}$$
(1)

In the above equation, based on our discussion in Sect. 3, the first term in the numerator represents the attacker’s benefit due to information leakage. Note that the utility function is divided by \(\delta _{\mathcal {D}}\), since it represents the attacker’s average payoff over time.
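As a sanity check of Eq. 1, the following sketch evaluates the attacker’s Case 1 payoff by numerical integration under an illustrative linear \(g_{\mathcal {A}}(x) = x\), for which the leakage integral has the closed form \(t_{\mathcal {A}}(1+\alpha )/2\); all parameter values are hypothetical.

```python
# Sketch: numerical evaluation of the attacker's Case 1 payoff (Eq. 1),
# assuming an illustrative linear g_A(x) = x. Not the paper's own code.

def u_A_case1(t_A, alpha, delta_D, I_A, B_A, c_A, n=100_000):
    # Riemann-sum approximation of the leakage integral over [0, t_A]
    dt = t_A / n
    leak = sum((alpha + (1 - alpha) * (i * dt) / t_A) * dt for i in range(n))
    # immediate reward I_A, control benefit B_A for the remaining time,
    # minus attack cost c_A; averaged over the period delta_D
    return (leak + I_A + B_A * (delta_D - t_A) - c_A) / delta_D

# Cross-check against the closed form for linear g_A:
# integral = t_A * (1 + alpha) / 2.
t_A, alpha, delta_D, I_A, B_A, c_A = 2.0, 0.3, 5.0, 1.0, 0.5, 0.2
closed = (t_A * (1 + alpha) / 2 + I_A + B_A * (delta_D - t_A) - c_A) / delta_D
assert abs(u_A_case1(t_A, alpha, delta_D, I_A, B_A, c_A) - closed) < 1e-3
```

The same numerical pattern applies to the defender’s payoff of Eq. 2, with the integrand replaced by its complement and without the \(I_{\mathcal {A}}\) and \(B_{\mathcal {A}}\) terms.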

Since the defender’s move time is greater than the attacker’s time of completing an attack successfully, the defender only receives a partial benefit during the period when the attacker is in the process of completing her attack. Therefore, the defender’s payoff is as follows:

$$\begin{aligned} u_{\mathcal {D}}^1(t_{\mathcal {A}},\alpha ,\delta _{\mathcal {D}}) = \dfrac{\displaystyle \int _{0}^{t_{\mathcal {A}}} \left( 1-\left( \alpha + \left( 1 - \alpha \right) g_{\mathcal {A}}\left( \dfrac{t}{t_{\mathcal {A}}}\right) \right) \right) dt -c_{\mathcal {D}}}{\delta _{\mathcal {D}}}. \end{aligned}$$
(2)

Both payoff functions, i.e., Eqs. 1 and 2, are functions of \(\delta _{\mathcal {D}}\) and of \(t_{{\mathcal {A}}}\), which is a random variable with PDF \(f_{\mathcal {A}}\). Therefore, we need to calculate the expected value of both payoff functions. Note that these expected payoff functions are conditional, i.e., they are a function of a random variable \(t_{\mathcal {A}}\) given that \(t_{\mathcal {A}} < \delta _{\mathcal {D}}\). The conditional expected payoffs for these two functions are calculated as follows:

$$\begin{aligned} u_{\mathcal {A}}^1 (\alpha ,\delta _{\mathcal {D}}) = \dfrac{\displaystyle \int _{0}^{\delta _{\mathcal {D}}} u_{\mathcal {A}}^1 (t_{\mathcal {A}},\alpha ,\delta _{\mathcal {D}}) f_{\mathcal {A}}(t_\mathcal {A}) dt_{\mathcal {A}}}{\displaystyle \int _{0}^{\delta _{\mathcal {D}}} f_{\mathcal {A}}(t_\mathcal {A}) dt_{\mathcal {A}}}, \end{aligned}$$
(3)
$$\begin{aligned} u_{\mathcal {D}}^1 (\alpha ,\delta _{\mathcal {D}}) = \dfrac{\displaystyle \int _{0}^{\delta _{\mathcal {D}}} u_{\mathcal {D}}^1 (t_{\mathcal {A}},\alpha ,\delta _{\mathcal {D}}) f_{\mathcal {A}}(t_\mathcal {A}) dt_{\mathcal {A}}}{\displaystyle \int _{0}^{\delta _{\mathcal {D}}} f_{\mathcal {A}}(t_\mathcal {A}) dt_{\mathcal {A}}}. \end{aligned}$$
(4)

The defender’s and the attacker’s payoffs are both functions of \(\alpha \) and \(\delta _{\mathcal {D}}\). Finally, the probability of \(t_{\mathcal {A}} < \delta _{\mathcal {D}}\) is calculated as follows:

$$\begin{aligned} P[t_{\mathcal {A}} < \delta _{\mathcal {D}}] = \int _{0}^{\delta _{\mathcal {D}}} f_{\mathcal {A}}(t_\mathcal {A}) dt_{\mathcal {A}}. \end{aligned}$$
(5)
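Under an illustrative distributional assumption, Eq. 5 has a closed form. For example, if \(t_{\mathcal {A}}\) is exponentially distributed with rate \(\lambda \), then \(P[t_{\mathcal {A}} < \delta _{\mathcal {D}}] = 1 - e^{-\lambda \delta _{\mathcal {D}}}\). The sketch below cross-checks this against direct numerical integration of the PDF; the exponential choice and the rate value are assumptions for illustration only.

```python
# Sketch of Eq. (5) under an assumed exponential distribution for t_A:
# numerically integrate the PDF and compare with the closed form.
import math

def prob_attack_completes(delta_D, lam=1/3, n=100_000):
    """Left Riemann sum of the exponential PDF lam * exp(-lam * t)
    over [0, delta_D], approximating P[t_A < delta_D]."""
    dt = delta_D / n
    return sum(lam * math.exp(-lam * i * dt) * dt for i in range(n))

delta_D = 2.0
closed = 1 - math.exp(-delta_D / 3)   # 1 - e^{-lam * delta_D}
assert abs(prob_attack_completes(delta_D) - closed) < 1e-3
```

The complementary probability of Eq. 10 follows as \(e^{-\lambda \delta _{\mathcal {D}}}\) under the same assumption.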

Case 2: \(t_{\mathcal {A}} \ge \delta _{\mathcal {D}}\)

In contrast to the previous case, the attacker obtains neither the immediate reward nor the benefit from controlling the resource completely. In this case, the attacker only reaches the level \(g_{\mathcal {A}}\left( \delta _{\mathcal {D}}/t_{\mathcal {A}}\right) \) of control over the resource upon the defender’s recovery move, and her reward is then equal to \(\alpha g_{\mathcal {A}}\left( \delta _{\mathcal {D}}/t_{\mathcal {A}}\right) \) right after the defender’s move. The attacker gains control over the rest of the resource, i.e., \(\left( 1 - \alpha \right) \), based on \(g_{\mathcal {A}}\left( t/t_{\mathcal {A}}\right) \). Hence, during the time between two consecutive defender’s moves, the attacker’s benefit is equal to \(\alpha g_{\mathcal {A}}\left( \delta _{\mathcal {D}}/t_{\mathcal {A}}\right) + \left( 1 - \alpha \right) g_{\mathcal {A}}\left( t/t_{\mathcal {A}}\right) \). Note that the upper integral bound changes from \(t_{\mathcal {A}}\) to \(\delta _{\mathcal {D}}\) compared to the previous case.

$$\begin{aligned} u_{\mathcal {A}}^2(t_{\mathcal {A}},\alpha ,\delta _{\mathcal {D}}) = \dfrac{\displaystyle \int _{0}^{\delta _{\mathcal {D}}}\left( \alpha g_{\mathcal {A}}\left( \dfrac{\delta _{\mathcal {D}}}{t_{\mathcal {A}}}\right) + \left( 1-\alpha \right) g_{\mathcal {A}}\left( \dfrac{t}{t_{\mathcal {A}}}\right) \right) dt -c_{\mathcal {A}}}{\delta _{\mathcal {D}}}. \end{aligned}$$
(6)

The defender’s payoff function is almost equivalent to Eq. 2, except that the upper bound of the integral changes to \(\delta _{\mathcal {D}}\). Hence, the defender’s payoff is as follows:

$$\begin{aligned} u_{\mathcal {D}}^2(t_{\mathcal {A}},\alpha ,\delta _{\mathcal {D}}) = \dfrac{\displaystyle \int _{0}^{\delta _{\mathcal {D}}} \left( 1 - \left( \alpha g_{\mathcal {A}}\left( \dfrac{\delta _{\mathcal {D}}}{t_{\mathcal {A}}}\right) + \left( 1-\alpha \right) g_{\mathcal {A}}\left( \dfrac{t}{t_{\mathcal {A}}}\right) \right) \right) dt -c_{\mathcal {D}}}{\delta _{\mathcal {D}}}. \end{aligned}$$
(7)
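Analogously to Case 1, the Case 2 payoffs above can be checked numerically. The sketch below does this for the attacker’s payoff of Eq. 6 under an illustrative linear \(g_{\mathcal {A}}\), for which the integral reduces to \(\delta _{\mathcal {D}}^2 (1+\alpha )/(2 t_{\mathcal {A}})\); parameter values are hypothetical.

```python
# Sketch: numerical evaluation of the attacker's Case 2 payoff (Eq. 6),
# assuming an illustrative linear g_A(x) = x. Not the paper's own code.

def u_A_case2(t_A, alpha, delta_D, c_A, n=100_000):
    # Riemann-sum approximation over [0, delta_D]; the first integrand
    # term alpha * g_A(delta_D / t_A) is constant in t.
    dt = delta_D / n
    gain = sum((alpha * (delta_D / t_A) + (1 - alpha) * (i * dt) / t_A) * dt
               for i in range(n))
    return (gain - c_A) / delta_D

# Cross-check against the closed form for linear g_A:
# integral = delta_D^2 * (1 + alpha) / (2 * t_A).
t_A, alpha, delta_D, c_A = 6.0, 0.3, 5.0, 0.2
closed = (delta_D**2 * (1 + alpha) / (2 * t_A) - c_A) / delta_D
assert abs(u_A_case2(t_A, alpha, delta_D, c_A) - closed) < 1e-3
```

The defender’s Case 2 payoff of Eq. 7 is obtained by the same computation with the integrand replaced by its complement.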

Both players’ payoffs are functions of \(t_{\mathcal {A}}\), \(\alpha \), and \(\delta _{\mathcal {D}}\). We take the conditional expectation over parameter \(t_{\mathcal {A}}\) in order to calculate the average payoffs with respect to \(t_{\mathcal {A}}\) for this condition. The resulting equations are:

$$\begin{aligned} u_{\mathcal {A}}^2 (\alpha ,\delta _{\mathcal {D}})= \dfrac{\displaystyle \int _{\delta _{\mathcal {D}}}^{\infty } u_{\mathcal {A}}^2 (t_{\mathcal {A}},\alpha ,\delta _{\mathcal {D}}) f_{\mathcal {A}}(t_\mathcal {A}) dt_{\mathcal {A}}}{\displaystyle \int _{\delta _{\mathcal {D}}}^{\infty } f_{\mathcal {A}}(t_\mathcal {A}) dt_{\mathcal {A}}}, \end{aligned}$$
(8)
$$\begin{aligned} u_{\mathcal {D}}^2 (\alpha ,\delta _{\mathcal {D}})= \dfrac{\displaystyle \int _{\delta _{\mathcal {D}}}^{\infty } u_{\mathcal {D}}^2 (t_{\mathcal {A}},\alpha ,\delta _{\mathcal {D}}) f_{\mathcal {A}}(t_\mathcal {A}) dt_{\mathcal {A}}}{\displaystyle \int _{\delta _{\mathcal {D}}}^{\infty } f_{\mathcal {A}}(t_\mathcal {A}) dt_{\mathcal {A}}}. \end{aligned}$$
(9)

Furthermore, the probability that the required time by the attacker to compromise the resource entirely is greater than the time between two consecutive recovery moves is given by:

$$\begin{aligned} P[t_{\mathcal {A}} \ge \delta _{\mathcal {D}}] = \int _{\delta _{\mathcal {D}}}^{\infty } f_{\mathcal {A}}(t_\mathcal {A}) dt_{\mathcal {A}}. \end{aligned}$$
(10)

Weighting each condition’s payoff by its probability of occurrence, we obtain the defender’s and the attacker’s overall payoff functions, represented by the following equations, respectively:

$$\begin{aligned} u_{\mathcal {D}} (\alpha ,\delta _{\mathcal {D}})= P[t_{\mathcal {A}} \ge \delta _{\mathcal {D}}] u_{\mathcal {D}}^2 (\alpha ,\delta _{\mathcal {D}}) + P[t_{\mathcal {A}} < \delta _{\mathcal {D}}] u_{\mathcal {D}}^1(\alpha ,\delta _{\mathcal {D}}), \end{aligned}$$
(11)
$$\begin{aligned} u_{\mathcal {A}} (\alpha ,\delta _{\mathcal {D}})= P[t_{\mathcal {A}} \ge \delta _{\mathcal {D}}] u_{\mathcal {A}}^2 (\alpha ,\delta _{\mathcal {D}}) + P[t_{\mathcal {A}} < \delta _{\mathcal {D}}] u_{\mathcal {A}}^1(\alpha ,\delta _{\mathcal {D}}). \end{aligned}$$
(12)

In the above equations, each player’s payoff is a function of \(\alpha \) and \(\delta _{\mathcal {D}}\). As mentioned before, \(\alpha \) is a random variable with range [0, 1] and PDF \(f_{\alpha }(.)\). Therefore, we can calculate the expected value of the defender’s and the attacker’s payoff functions with respect to \(\alpha \), represented in the following equations, respectively:

$$\begin{aligned} u_{\mathcal {D}} (\delta _{\mathcal {D}}) = \int _{0}^{1} u_{\mathcal {D}} (\alpha ,\delta _{\mathcal {D}}) f_{\alpha }(\alpha ) d\alpha , \end{aligned}$$
(13)
$$\begin{aligned} u_{\mathcal {A}} (\delta _{\mathcal {D}}) = \int _{0}^{1} u_{\mathcal {A}} (\alpha ,\delta _{\mathcal {D}}) f_{\alpha }(\alpha ) d\alpha . \end{aligned}$$
(14)

5 Analytical Results

In the previous section, we developed the general payoff functions for the FlipLeakage model without quantifying any of the model parameters; the payoff calculations are general and can be applied to many cybersecurity problems. For our analyses in this paper, we instantiate \(g_{\mathcal {A}}(.)\), \(f_{\mathcal {A}}(.)\), and \(f_{\alpha }(.)\), but we believe that the concrete functions we use still allow for meaningful insights into stealthy information leakage scenarios. The instantiation of the remaining parameters, e.g., each player’s costs and benefits, is specific to the concrete scenario under consideration.

To model the time at which the attacker completes her attack successfully, we consider an exponential distribution with rate parameter \(\lambda _{\mathcal {A}}\). The rationale behind choosing an exponential distribution for the random variable \(t_{\mathcal {A}}\) is the memoryless property of this distribution: knowing that the system is not yet entirely compromised at a specific time gives the defender no further information about the time of the next potential compromise. Moreover, the exponential distribution is a widely accepted candidate to model waiting times in event-driven models. The exponential distribution with rate parameter \(\lambda _{\mathcal {A}}\) is as follows:

$$\begin{aligned} f_{\mathcal {A}}(t_{\mathcal {A}})= {\left\{ \begin{array}{ll} \lambda _{\mathcal {A}} e^{-\lambda _{\mathcal {A}}t_{\mathcal {A}}} &{} \text {if } t_{\mathcal {A}} \ge 0 \\ 0 &{} \text {if } t_{\mathcal {A}} < 0. \end{array}\right. } \end{aligned}$$
(15)
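Under this instantiation, the survival probability in Eq. 10 has the closed form \(P[t_{\mathcal {A}} \ge \delta _{\mathcal {D}}] = e^{-\lambda _{\mathcal {A}}\delta _{\mathcal {D}}}\), and the memoryless property can be checked directly. A minimal sketch in Python (the parameter values are illustrative, not taken from the paper):

```python
import math

def survival(lam, delta):
    """P[t_A >= delta] for an exponential attack-time distribution
    with rate lam -- the closed form of the integral in Eq. 10."""
    return math.exp(-lam * delta)

lam = 0.3
p = survival(lam, 2.0)  # probability the attack outlasts a period of 2.0

# Memoryless property: P[t_A >= s + t | t_A >= s] = P[t_A >= t].
s, t = 1.5, 2.0
conditional = survival(lam, s + t) / survival(lam, s)
```

Here `conditional` equals `survival(lam, t)` up to floating-point error, which is exactly why observing an uncompromised system tells the defender nothing new about the next compromise time.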

Moreover, for the random variable \(\alpha \in [0,1]\), we consider the uniform distribution: the defender has no knowledge about the ability of the attacker to use previously leaked information and, accordingly, all values are equally likely. The uniform distribution, \(f_{\alpha }(.)\), is represented in Eq. 16.

$$\begin{aligned} f_{\alpha }(\alpha )= {\left\{ \begin{array}{ll} 1 &{} \text {if } 0 \le \alpha \le 1 \\ 0 &{} \text {Otherwise}. \end{array}\right. } \end{aligned}$$
(16)

The attacker’s reward function during the time she needs to complete her attack can be represented by a linear function:

$$\begin{aligned} g_{\mathcal {A}}\left( \dfrac{t}{t_{\mathcal {A}}}\right) = {\left\{ \begin{array}{ll} \dfrac{t}{t_{\mathcal {A}}} &{} \text {if } 0 \le t \le t_{\mathcal {A}} \\ 0 &{} \text {Otherwise}. \end{array}\right. } \end{aligned}$$
(17)
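With this linear reward, the incremental-leakage term that appears inside the integrals of Eqs. 6 and 7 has a simple closed form: for \(\delta _{\mathcal {D}} \le t_{\mathcal {A}}\), \(\int _{0}^{\delta _{\mathcal {D}}} g_{\mathcal {A}}(t/t_{\mathcal {A}})\, dt = \delta _{\mathcal {D}}^2/(2 t_{\mathcal {A}})\). A quick numerical check (the quadrature scheme and parameter values are illustrative choices, not part of the model):

```python
def g_A(t, t_A):
    """Linear attacker reward of Eq. 17 (zero outside [0, t_A])."""
    return t / t_A if 0 <= t <= t_A else 0.0

def leakage_integral(delta, t_A, n=100000):
    """Midpoint-rule approximation of the integral of g_A over [0, delta]."""
    h = delta / n
    return sum(g_A((i + 0.5) * h, t_A) for i in range(n)) * h

delta, t_A = 1.2, 3.0
numeric = leakage_integral(delta, t_A)
closed_form = delta**2 / (2 * t_A)
```

Since the integrand is linear on \([0, \delta _{\mathcal {D}}]\), the midpoint rule reproduces the closed form up to rounding error.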

In the following, we provide our lemmas and theorem based on the payoff calculations and the instantiations described above. First, the defender’s and the attacker’s best responses are stated in Lemma 1 and Lemma 2, respectively. Then, Theorem 1 states the Nash equilibria of the game.

Lemma 1

The defender’s best response is as follows:

  • The defender plays a periodic strategy with period \(\delta ^{\star }_{\mathcal {D}}\), a solution of Eq. 18, if the corresponding payoff is non-negative, i.e., \(u_{\mathcal {D}} (\delta ^{\star }_{\mathcal {D}}) \ge 0 \), and it yields a higher payoff than the other solutions of Eq. 18.

    (18)
  • The defender drops out of the game (i.e., the player does not move anymore) if Eq. 18 has no solution for \(\delta _{\mathcal {D}}\).

  • The defender drops out of the game if the solutions of Eq. 18 yield a negative payoff, i.e., \(u_{\mathcal {D}} (\delta _{\mathcal {D}}) < 0\).

Note that in Lemma 1, \(\varGamma (0,\lambda _{\mathcal {A}}\delta _{\mathcal {D}})\) denotes the upper incomplete Gamma function, which is defined as follows:

$$\begin{aligned} \varGamma (s,x) = \int _{x}^{\infty } t^{s-1} e^{-t} dt. \end{aligned}$$
(19)
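For \(s = 0\), the upper incomplete Gamma function of Eq. 19 coincides with the exponential integral \(E_1(x)\) and can be evaluated by direct quadrature. A small sketch (the truncation point and step count are illustrative choices; a production implementation would use a library routine such as SciPy’s):

```python
import math

def upper_incomplete_gamma0(x, upper=50.0, n=200000):
    """Midpoint-rule approximation of Gamma(0, x) = int_x^inf t^{-1} e^{-t} dt
    (Eq. 19 with s = 0), truncating the negligible tail beyond `upper`."""
    h = (upper - x) / n
    total = 0.0
    for i in range(n):
        t = x + (i + 0.5) * h
        total += math.exp(-t) / t
    return total * h
```

For example, `upper_incomplete_gamma0(1.0)` matches the tabulated value \(E_1(1) \approx 0.21938\), and the function decreases in \(x\), as the integrand is positive.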

The proof of Lemma 1 is provided in Appendix A.1.

Lemma 1 shows how to calculate the time between two consecutive defender moves. As Eq. 18 shows, the defender’s best response is a function of \(c_{\mathcal {D}}\) and \(\lambda _{\mathcal {A}}\).

Lemma 2 describes the attacker’s best response in the FlipLeakage game.

Lemma 2

In the FlipLeakage game model, the attacker’s best response is:

  • The attacker moves right after the defender if \(c_{\mathcal {A}} < M\left( \delta _{\mathcal {D}}\right) \) where

    (20)
  • The attacker drops out of the game if \(c_{\mathcal {A}} > M\left( \delta _{\mathcal {D}}\right) \).

  • Otherwise, i.e., if \(c_{\mathcal {A}} = M\left( \delta _{\mathcal {D}}\right) \), both dropping out of the game and moving right after the defender are best responses for the attacker.

The proof of Lemma 2 is provided in Appendix A.2. This lemma identifies the conditions under which the attacker should move right after the defender, not move at all, or be indifferent between the two. Note that the attacker’s decision depends on \(c_{\mathcal {A}}\), \(\delta _{\mathcal {D}}\), \(\lambda _{\mathcal {A}}\), \(I_{\mathcal {A}}\), and \(B_{\mathcal {A}}\).

The following theorem describes the Nash equilibria of the FlipLeakage game based on our described lemmas.

Theorem 1

The FlipLeakage game’s pure Nash equilibria can be described as follows.

A. If Eq. 18 has a solution \(\delta ^{\star }_{\mathcal {D}}\) that yields the highest positive payoff for the defender among all solutions (if multiple exist), then one of the following two outcomes applies:

1- If \(c_{\mathcal {A}} \le M(\delta ^{\star }_{\mathcal {D}})\), then there is a unique pure Nash equilibrium in which the defender moves periodically with period \(\delta ^{\star }_{\mathcal {D}}\) and the attacker moves right after the defender.

2- If \(c_{\mathcal {A}} > M(\delta ^{\star }_{\mathcal {D}})\), then there exists no pure Nash equilibrium.

B. If Eq. 18 does not have a solution or the solutions of this equation yield a negative payoff for the defender, i.e., \(u_{\mathcal {D}} \left( \delta _{\mathcal {D}} \right) < 0\), then there exists a unique pure Nash equilibrium in which the defender does not move at all and the attacker moves once at the beginning of the FlipLeakage game.

The proof of Theorem 1 is provided in Appendix A.3.

In the first case of this theorem, the defender’s cost is lower than his benefit when he moves according to the solution of Eq. 18, and the attacker’s cost is lower than the threshold in Eq. 20; hence, the attacker moves right after each of the defender’s periodic moves. In the second case, if the defender moves periodically, it is not beneficial for the attacker to move at all; but then the defender is better off not moving either. If, however, the defender does not move at all, the attacker can move once at the beginning of the game and control the resource for all time, which in turn gives the defender an incentive to move. Because of this cycle of best responses, no pure Nash equilibrium exists in this scenario. The third case represents the situation where the defender’s benefit from defending the resource is lower than his cost; it is then beneficial for him not to move at all, and accordingly the attacker needs to move only once at the beginning of the game.
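The case analysis of Theorem 1 can be written as a small decision procedure. The sketch below only encodes the branching of the theorem; the inputs — the best solution of Eq. 18 (`delta_star`, or `None` if none exists), the defender’s payoff at that period, the attack cost `c_A`, and the threshold value of Eq. 20 — are assumed to be computed elsewhere, since Eqs. 18 and 20 are not reproduced here:

```python
def flipleakage_equilibrium(delta_star, u_D_at_delta_star, c_A, M_at_delta_star):
    """Classify the pure Nash equilibrium of the FlipLeakage game,
    following the case analysis of Theorem 1.

    delta_star: best solution of Eq. 18, or None if no solution exists.
    u_D_at_delta_star: defender's payoff at delta_star (ignored if None).
    M_at_delta_star: attacker's cost threshold of Eq. 20 at delta_star.
    """
    # Case B: Eq. 18 has no solution, or its solutions yield a negative payoff.
    if delta_star is None or u_D_at_delta_star < 0:
        return "defender drops out; attacker moves once at the start"
    # Case A.1: the attacker's cost is at most the threshold of Eq. 20.
    if c_A <= M_at_delta_star:
        return "defender moves with period delta_star; attacker moves right after"
    # Case A.2: periodic defense deters the attacker; no pure equilibrium exists.
    return "no pure Nash equilibrium"
```

This mirrors the three outcomes discussed above: mutual activity, strategic non-existence, and defender drop-out.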

6 Numerical Illustrations

In this section, we provide selected numerical illustrations of our theoretical findings. First, we present the defender’s best response curves, i.e., Eq. 18, as well as the defender’s payoff for different defense cost values \(c_{\mathcal {D}}\), depicted in Fig. 4. Then, we illustrate the defender’s best responses for different values of \(c_{\mathcal {D}}\) and \(\lambda _{\mathcal {A}}\) in Fig. 5.

Fig. 4.

The defender’s best response curves and the corresponding payoff functions for different values of \(c_{\mathcal {D}}\) are represented in (a) and (b), respectively. These two figures depict the situation in which Eq. 18 has a solution, but the corresponding payoff may be negative.

We plot Eq. 18, i.e., the defender’s best response curve, for different values of \(c_{\mathcal {D}}\), i.e., \(c_{\mathcal {D}}\) = {0.2, 0.4, 0.7, 1, 1.5}, and \(\lambda _{\mathcal {A}}=0.3\) in Fig. 4(a), and show the defender’s payoff for these values in Fig. 4(b). For each of these values of \(c_{\mathcal {D}}\), Eq. 18 has a solution. But as we see in Fig. 4(b), the defender’s payoff is negative for \(c_{\mathcal {D}}\) = {0.7, 1, 1.5} for all values of \(\delta _{\mathcal {D}}\); the defender will therefore drop out of the game at these defense costs. For the lower values \(c_{\mathcal {D}}\) = {0.2, 0.4}, the defender’s best responses are to move periodically with periods 0.8711 and 1.4681, respectively. This suggests the intuition that the higher the defender’s cost, the longer the period between the defender’s moves. To examine this intuition, we calculate the defender’s best responses for a range of values of \(c_{\mathcal {D}}\).

Figure 5(a) shows the defender’s best response for different defense costs with \(\lambda _{\mathcal {A}}=0.5\). This figure corroborates our intuition: the higher the defense cost, the longer the defender’s move period. When the cost of defense is too high, the defender’s best response is to drop out of the game, which is represented as \(\delta ^{\star }_{\mathcal {D}} = 0\) in Fig. 5(a).
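This monotonic effect of \(c_{\mathcal {D}}\) can be reproduced for any payoff of the form \(u_{\mathcal {D}}(\delta ) = (B(\delta ) - c_{\mathcal {D}})/\delta \), as in Eq. 7, by a simple grid search over \(\delta \). The sketch below uses a hypothetical benefit term \(B(\delta ) = \delta - \delta ^2/4\) purely for illustration (it is not the paper’s instantiation); the point is only that the maximizing period grows with \(c_{\mathcal {D}}\):

```python
def u_D(delta, c_D):
    """Hypothetical defender payoff of the form (B(delta) - c_D) / delta,
    with illustrative benefit B(delta) = delta - delta**2 / 4."""
    return (delta - delta**2 / 4 - c_D) / delta

def best_period(c_D, grid=None):
    """Grid search for the payoff-maximizing period delta*."""
    if grid is None:
        grid = [i / 1000 for i in range(1, 4001)]  # delta in (0, 4]
    return max(grid, key=lambda d: u_D(d, c_D))
```

For this toy benefit the optimum is \(\delta ^{\star } = 2\sqrt{c_{\mathcal {D}}}\), so the grid search returns roughly 1.0 for \(c_{\mathcal {D}} = 0.25\) and 2.0 for \(c_{\mathcal {D}} = 1\): a higher defense cost lengthens the best-response period, matching Fig. 5(a).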

Fig. 5.

The impact of \(c_{\mathcal {D}}\) and \(\lambda _{\mathcal {A}}\) on the defender’s best response

We are also interested in the relation between \(\lambda _{\mathcal {A}}\) and \(\delta _{\mathcal {D}}\), which we show in Fig. 5(b). Recall that an exponential distribution with parameter \(\lambda _{\mathcal {A}}\) has mean \(1/\lambda _{\mathcal {A}}\); in the FlipLeakage game, a higher value of \(\lambda _{\mathcal {A}}\) therefore means that the attacker compromises the defender’s system faster on average. Figure 5(b) shows the defender’s best response for different values of \(\lambda _{\mathcal {A}}\) at a fixed defense cost, \(c_{\mathcal {D}} = 0.3\). The faster the attacker can completely compromise the defender’s system on average, the shorter the defender’s move period. But if the attacker’s average compromise time is too short, the rational choice for the defender is to drop out of the game.

7 Conclusion

In this paper, we have proposed a novel theoretical model to provide guidance for the defender’s optimal defense strategy when faced with a stealthy information leakage threat. In our model, an attacker will incrementally take ownership of a resource (e.g., as observed during advanced persistent threats). While her final objective is a complete compromise of the system, she may derive some utility during the preliminary phases of the attack. The defender can take a costly mitigation move and has to decide on its optimal timing.

In the FlipLeakage game model, we have considered the scenario in which the defender only partially eliminates the foothold of the attacker in the system. In this scenario, the defender cannot undo any information leakage that has already taken place during an attack. We have derived optimal strategies for the agents in this model and presented numerical analyses and graphical visualizations.

We highlight two observations from our numerical analyses which match intuition well. First, the higher the defender’s cost, the longer the period between the defender’s moves. Second, the faster the attacker’s average time to compromise the defender’s system completely (i.e., the higher \(\lambda _{\mathcal {A}}\)), the shorter the defender’s period. In addition, our model also allows for the determination of the impact of less-than-optimal strategies, and for comparative statements regarding the expected outcomes of different periodic defensive approaches in practice, when information about the attacker and her capabilities is extremely scarce. As this problem area is understudied but of high practical significance, advancements that allow rigorous reasoning about defense moves against stealthy attackers are of potentially high benefit.

In future work, we aim to conduct theoretical and numerical analyses using insights from data about practical information leakage scenarios. However, our current study is an important first step toward reasoning about frequently criticized system reset policies to prevent information leakage in high-value systems. Due to the stealthiness of attacks, reset policies have to provide an expected utility in the absence of concrete evidence of compromise, which can be challenging to articulate. Our work also illustrates the positive deterrence function of system reset policies from a theoretical perspective. Further, we aim to consider a more general case in which the values of \(t_{\mathcal {A}}\) and \(\alpha \) are different in each step of the attack. In future work, we will also consider the case where a defender (e.g., a software vendor) has the ability to provide out-of-schedule security updates besides the periodic one.