
1 Introduction

The term IoT (Internet of Things), which refers to physical objects ("things") connected to the Internet, is now in common use. Cyber-Physical Systems (CPS) [1, 2], which are closely related to the IoT, are also attracting considerable attention. In a CPS, many sensors are attached to objects in the real world, such as people and cars; the collected data are analyzed in cyberspace and fed back to those objects to control them more effectively. These technologies will enable a variety of services that have never been available before.

To realize a CPS/IoT society, secure and safe networked communication is needed. However, new technology brings the risk of new cyber-attacks, for example Advanced Persistent Threats (APTs) [3]. An APT targets a specific individual or organization and attacks it continuously with a suitable combination of attacks. Because they require a large amount of resources, such attacks are often carried out by large organizations and can have a significant impact on society. Since technologies such as IoT and CPS are still young, many vulnerabilities remain undiscovered, and the risk of zero-day attacks that exploit them before a fix or countermeasure patch is released is high. APTs often combine such zero-day attacks and are therefore highly dangerous. Such an attack could allow the attacker to take ownership of the cloud and send signals to the device.

In this paper, we propose Moving Target Defense (MTD) [4] as a strategy for the administrator of a cloud that is vulnerable to APTs and may be controlled by the attacker. Furthermore, we model a situation in which the device decides whether to trust a command from a cloud controlled either by the defender using MTD or by a static attacker, and we find a Gestalt Nash equilibrium (GNE) through game-theoretic analysis. We show that MTD is an effective strategy in this situation. The proposed model is based on the CloudControl game [5, 6], which combines the signaling game and the FlipIt game. The signaling game is a typical dynamic game of incomplete information, developed from the study of two-player language games [7]; many studies have used it to model various security situations [8,9,10,11]. The FlipIt game was created more recently in response to the development of cloud systems [12] and is well suited for studying systems attacked by APTs [13,14,15,16,17,18,19].

Because APTs attack the system persistently, the attacker can infer when the defender last moved, for example by noticing that previously obtained IDs and passwords for the system no longer work, and should use this information to conduct a dynamic attack. However, the models proposed in [5, 6] used only simple, static attacker and defender strategies in the FlipIt game. Van Dijk proposed the LM Attacker (LMA) and the Defender playing with Exponential Distribution (DED) as dynamic strategies for the attacker and the defender in the FlipIt game, respectively [12]. Hyodo proposed a CloudControl game model that uses these dynamic strategies in the FlipIt game and proved that a GNE exists in that model [20].

In this paper, we show that Moving Target Defense (MTD) is an effective strategy in addition to the defender strategy for the FlipIt game proposed in [12], and we propose a CloudControl game model that uses this strategy. We also show that a GNE exists in this model. The equilibrium can guide the optimal actions of defenders and IoT devices against attackers (APTs) who launch advanced attacks. The results of this study will be useful for cyber-insurance, commercial investment, and corporate policies.

The remainder of this paper is organized as follows. Section 2 proposes the CloudControl game with a defender using MTD. Section 3 presents the results of simulations performed to reveal the existence of a GNE in the proposed model. Section 4 concludes the paper.

2 Our Model

We model a cloud-based system in which the cloud is the target of APTs. In this model, an attacker capable of APTs can pay a cost to compromise the cloud, and the defender, i.e. the cloud administrator, can pay a cost to regain control of it. The cloud sends a message to the device, which is denoted by \(r\). The device can follow this message, but it also has an on-board control system that allows it to operate autonomously, so it may ignore the message from the cloud and rely on autonomous operation instead.

In this scenario, we use the CloudControl game, which combines two games: the FlipIt game and the signaling game. The FlipIt game takes place between the attacker and the defender, while the signaling game takes place between the possibly compromised cloud and the device. Specifically, the player who controls the resource in the FlipIt game becomes the sender in the signaling game.

The model proposed in this study is a CloudControl game played by a static attacker and a defender using Moving Target Defense (MTD), described below. We investigated whether MTD is an effective strategy against the static attacker (Fig. 1).

Fig. 1. The CloudControl game (Hyodo, T., Hohjo, H., 2019). The FlipIt game models the interaction between an attacker and a defender (the cloud administrator), who compete for ownership of the cloud. In the signaling game, the player who controls the cloud in the FlipIt game sends a message to a device, and the device decides whether or not to trust the message.

2.1 The Signaling Game in the Proposed Game Model

We describe the symbols used in this study.

  • Players: Sender (Cloud(\(t\))), Receiver (Device(\(r\)))

  • Type of the sender: \(T=\{{t}_{A}, {t}_{D}\}\)

  • Message: \(M=\{{m}_{L}, {m}_{H}\}\)

  • Action: \(A=\{{a}_{Y}, {a}_{N}\}\)

Type \({t}_{A}\) denotes the attacker and \({t}_{D}\) the defender. In the CloudControl game, the type of the sender is determined by the equilibrium of the FlipIt game. Let \({m}_{L}\) and \({m}_{H}\) denote low-risk and high-risk messages, respectively. After receiving the message, the device chooses an action: \({a}_{Y}\) represents trusting the message from the cloud, and \({a}_{N}\) represents not trusting it.

Let \({\sigma }_{{t}_{A}}^{S}\left(m\right)\) and \({\sigma }_{{t}_{D}}^{S}\left(m\right)\) be the strategies with which players \(t_{A}\) and \(t_{D}\) send a message \(m\), and let \({\sigma }_{r}^{S}\left(a|m\right)\) be the strategy with which the device \(r\) takes an action \(a\) when it receives a message \(m\). Also let \({u}_{{t}_{A}}^{S}\left(m,a\right)\) and \({u}_{{t}_{D}}^{S}(m, a)\) be the utilities that players \({t}_{A}\) and \({t}_{D}\) gain. Then the expected utilities \({\overline{u} }_{{t}_{A}}^{S}\left({\sigma }_{{t}_{A}}^{S},{\sigma }_{r}^{S}\right)\) and \({\overline{u} }_{{t}_{D}}^{S}\left({\sigma }_{{t}_{D}}^{S},{\sigma }_{r}^{S}\right)\) of the attacker and the defender in the signaling game are as follows.

$$ \overline{u}_{{t_{A} }}^{S} \left( {\sigma_{{t_{A} }}^{S} , \sigma_{r}^{S} } \right) = \sum\nolimits_{a \in A} {\sum\nolimits_{m \in M} {u_{{t_{A} }}^{S} \left( {m, a} \right)\sigma_{r}^{S} \left( {a{|}m} \right)\sigma_{{t_{A} }}^{S} \left( m \right)} } $$
(1)
$$ \overline{u}_{{t_{D} }}^{S} \left( {\sigma_{{t_{D} }}^{S} , \sigma_{r}^{S} } \right) = \sum\nolimits_{a \in A} {\sum\nolimits_{m \in M} {u_{{t_{D} }}^{S} \left( {m, a} \right)\sigma_{r}^{S} \left( {a{|}m} \right)\sigma_{{t_{D} }}^{S} \left( m \right)} } $$
(2)

Let \(\mu \left(t|m\right)\) be the receiver's belief that the type of the sender is \(t\), and let \({u}_{r}^{S}\left(t,m,a\right)\) be the utility the receiver gains when it receives the message \(m\) and takes the action \(a\). Then its expected utility \({\overline{u} }_{r}^{S}\left({\sigma }_{r}^{S}|m, \mu \right)\) in the signaling game is as follows.

$$ \overline{u}_{r}^{S} \left( {\sigma_{r}^{S} |m, \mu } \right) = \sum\nolimits_{a \in A} {\sum\nolimits_{t \in T} {u_{r}^{S} \left( {t,m, a} \right)\mu (t|m) \sigma_{r}^{S} (a|m)} } $$
(3)

Let \(p\) be the prior probability that the sender is the attacker. The receiver's belief that the sender is of type \(t_{A}\) when it receives the message \(m\) is as follows.

$$\mu \left({t}_{A}|m\right)=\frac{{\sigma }_{{t}_{A}}^{S}\left(m\right)p}{{\sigma }_{{t}_{A}}^{S}\left(m\right)p+{\sigma }_{{t}_{D}}^{S}\left(m\right)(1-p)}$$
(4)
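
To make these quantities concrete, the following Python sketch evaluates Eqs. (1), (2) and (4) for an arbitrary strategy profile. All names and numerical values in the sketch are illustrative assumptions, not the strategies or payoffs used in this study.

```python
# A minimal sketch of Eqs. (1), (2) and (4); all strategy and payoff values
# below are placeholders chosen for illustration.
M = ["m_L", "m_H"]          # messages
A = ["a_Y", "a_N"]          # device actions

sigma_tA = {"m_L": 0.7, "m_H": 0.3}                  # attacker's sending strategy
sigma_tD = {"m_L": 0.2, "m_H": 0.8}                  # defender's sending strategy
sigma_r = {"m_L": {"a_Y": 0.6, "a_N": 0.4},          # device's response strategy
           "m_H": {"a_Y": 0.9, "a_N": 0.1}}
u_tA = {("m_L", "a_Y"): 4, ("m_L", "a_N"): 0,        # attacker's utilities (placeholders)
        ("m_H", "a_Y"): 1, ("m_H", "a_N"): 0}

def expected_sender_utility(u_t, sigma_t):
    """Eq. (1)/(2): expected signaling-game utility of a sender of a given type."""
    return sum(u_t[(m, a)] * sigma_r[m][a] * sigma_t[m] for m in M for a in A)

def belief_attacker(m, p):
    """Eq. (4): posterior belief that the sender is the attacker, given message m
    and the prior probability p that the attacker is the sender."""
    num = sigma_tA[m] * p
    den = sigma_tA[m] * p + sigma_tD[m] * (1 - p)
    return num / den if den > 0 else 0.0

print(expected_sender_utility(u_tA, sigma_tA))   # Eq. (1) for the attacker
print(belief_attacker("m_L", p=0.5))             # Eq. (4) after receiving m_L
```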

Each player updates its strategy in every game to maximize its own expected utility. We used the ARP model proposed by Bereby-Meyer and Erev [21] to update the strategies. This model makes the learning more human-like by taking both current and past reward values into account. The ARP model is described below.

The probability \({Q}_{n}(time)\) of choosing move \(n\) at \(time\) is given by

$${Q}_{n}\left(time\right)=\frac{{q}_{n}(time)}{\sum_{j}{q}_{j}(time)}$$
(5)

\({q}_{n}(time)\) is the value associated with move \(n\) and is updated at each \(time\) step. Let \({g}_{j}\) be the reward for choosing move \(j\) at \(time\); then the update rule is given by

$${q}_{n}\left(time+1\right)=\mathrm{max}\left\{\upsilon , \left(1-\phi \right){q}_{n}\left(time\right)+{E}_{j}\left(n, {L}_{time}\left({g}_{j}\right)\right)\right\},$$
(6)

where \(\phi \) is the forgetting rate and \(\upsilon \) is the guaranteed minimum value. The functions \({E}_{j}\) and \({L}_{time}\) are given by

$$ E_{j} \left( {n,{ }L_{time} \left( {g_{j} } \right)} \right) = \left\{ {\begin{array}{*{20}l} {L_{time} \left( {g_{j} } \right)\left( {1 - \varepsilon } \right)} \hfill & {\left( {j = n} \right)} \hfill \\ {L_{time} \left( {g_{j} } \right)\varepsilon } \hfill & {\left( {otherwise} \right)} \hfill \\ \end{array} } \right. $$
(7)
$${L}_{time}\left({g}_{j}\right)={g}_{j}-\rho \left(time\right),$$
(8)

where the parameter \(\varepsilon \) is the weight of the reward. The function \(\rho (time)\) in Eq. (8) plays an important role in the ARP model: as mentioned above, the model learns from rewards, and \(\rho (time)\) serves as the reference value against which each reward is evaluated. It is given by

$$ \rho \left( {time + 1} \right) = \left\{ {\begin{array}{*{20}l} {\left( {1 - c^{ + } } \right)\rho \left( {time} \right) + \left( {c^{ + } } \right)g_{j} } \hfill & {(g_{j} \ge \rho \left( {time)} \right)} \hfill \\ {\left( {1 - c^{ - } } \right)\rho \left( {time} \right) + \left( {c^{ - } } \right)g_{j} } \hfill & {(g_{j} < \rho \left( {time)} \right)} \hfill \\ \end{array} } \right. $$
(9)

\({c}^{+}\) and \({c}^{-}\) are parameters representing how strongly the reward \({g}_{j}\) influences the next reference value when it is better or worse than \(\rho (time)\), respectively.
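
The following Python sketch implements the ARP update of Eqs. (5)-(9) for a player with a finite set of moves, for example the device choosing between \(a_Y\) and \(a_N\). The class name and the reward values in the usage example are assumptions made for illustration; the parameter values are those listed in Sect. 3.

```python
import random

# ARP parameters (phi, upsilon, epsilon, c+, c-, q_n(0)) as listed in Sect. 3
PHI, UPSILON, EPS, C_PLUS, C_MINUS, Q_INIT = 0.001, 0.0001, 0.2, 0.01, 0.02, 1000.0

class ARPPlayer:
    """ARP learner over a finite set of moves (Eqs. (5)-(9))."""

    def __init__(self, moves):
        self.q = {n: Q_INIT for n in moves}   # q_n(time)
        self.rho = 0.0                        # reference value rho(time)

    def choose(self):
        # Eq. (5): choose a move with probability proportional to q_n
        total = sum(self.q.values())
        r, acc = random.uniform(0, total), 0.0
        for n, qn in self.q.items():
            acc += qn
            if r <= acc:
                return n
        return n

    def update(self, chosen, reward):
        # Eq. (8): reward evaluated relative to the reference value
        L = reward - self.rho
        # Eqs. (6)-(7): forget, then reinforce the chosen move (with spillover)
        for n in self.q:
            E = L * (1 - EPS) if n == chosen else L * EPS
            self.q[n] = max(UPSILON, (1 - PHI) * self.q[n] + E)
        # Eq. (9): update the reference value
        c = C_PLUS if reward >= self.rho else C_MINUS
        self.rho = (1 - c) * self.rho + c * reward

# Example: the device learns whether to trust (a_Y) or not (a_N) a message;
# the rewards here are placeholders, not the payoffs of Table 1.
device = ARPPlayer(["a_Y", "a_N"])
for _ in range(100):
    a = device.choose()
    device.update(a, reward=2 if a == "a_Y" else 1)
```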

2.2 The FlipIt Game in the Proposed Game Model

The FlipIt game in the original CloudControl game is a two-player game in which the attacker and the defender compete for one shared resource along a timeline. In this paper, we envision a system in which a defender can prevent an attack by moving the resource through the network. In the next subsection, we describe the defender’s strategy in the proposed game model.

Moving Target Defense (MTD).

MTD is a defender strategy that migrates the resource to node \(i\), selected with probability \(p(i)\), through a fully connected network of \(n\,(n\ge 2)\) nodes. The defender can use this strategy to prevent attackers from discovering vulnerabilities and critical resources in the system; in other words, the model assumes a situation in which the target resource is not visible to the attacker. For simplicity, we assume that the MTD in this study migrates the resource to any node uniformly at random (Fig. 2).

$$ p\left( i \right) = \frac{1}{n}, i = 1, \ldots ,n $$
(10)
Fig. 2. Moving Target Defense (MTD) when the number of nodes is 3. The defender can migrate the resource to another node through a fully connected network; \(p\left(0\right)=p\left(1\right)=p\left(2\right)=p\left(3\right)=1/4\).
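
As a minimal sketch of Eq. (10), the migration step can be written as a single uniform draw; the helper name and the 0-based node indexing are choices made here purely for illustration.

```python
import random

def mtd_migrate(n):
    """Eq. (10): choose the destination node uniformly, p(i) = 1/n.
    Nodes are indexed 0, ..., n-1 here for convenience."""
    return random.randrange(n)
```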

The FlipIt Game with MTD.

Let the number of nodes be \(n\). The rules of the FlipIt game in this case are as follows.

  • The game begins with the defender in control of the resource on \(node\,0\) (\(time=1\)).

  • At each \(time\), both players follow their own strategies and decide whether to pay the moving cost.

  • When the defender moves, he takes ownership of the resource and may or may not migrate it to another node.

  • When the attacker moves, he selects one node at random to attack. The attacker takes ownership of the resource only if he attacks the node where the resource actually exists.

  • When both players move at the same time, the defender takes ownership of the resource.

In each FlipIt game, the player who controls the resource becomes the sender and plays multiple signaling games with the device (the receiver).
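
To make the rules above concrete, the following sketch simulates one run of the FlipIt game with MTD. Interpreting the FlipIt strategies \(\alpha\) as per-step move probabilities, as well as the helper name and the values in the example call, are assumptions for illustration; the ownership rules themselves follow the list above.

```python
import random

def flipit_with_mtd(alpha_d, alpha_a, n, steps=4000, seed=0):
    """Simulate the FlipIt game with MTD and return the fraction of steps in
    which the attacker owns the resource (a sketch under the stated assumptions)."""
    rng = random.Random(seed)
    resource_node = 0          # the resource starts at node 0
    defender_owns = True       # the defender owns the resource at time = 1
    attacker_steps = 0
    for _ in range(steps):
        defender_moves = rng.random() < alpha_d
        attacker_moves = rng.random() < alpha_a
        if defender_moves:
            # The defender takes ownership and may migrate the resource (Eq. (10));
            # simultaneous moves are resolved in the defender's favour.
            defender_owns = True
            resource_node = rng.randrange(n)
        elif attacker_moves:
            # The attacker attacks one node at random and succeeds only if
            # the resource is actually there.
            if rng.randrange(n) == resource_node:
                defender_owns = False
        if not defender_owns:
            attacker_steps += 1
    return attacker_steps / steps

# Example call with placeholder strategies and 3 nodes
print(flipit_with_mtd(alpha_d=0.15, alpha_a=0.12, n=3))
```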

The expected utilities \({\overline{u} }_{{t}_{A}}^{F}({\alpha }_{{t}_{A}},{\alpha }_{{t}_{D}})\) and \({\overline{u} }_{{t}_{D}}^{F}({\alpha }_{{t}_{A}},{\alpha }_{{t}_{D}})\) of the FlipIt game for the attacker and the defender in the proposed model are as follows.

$${\overline{u} }_{{t}_{A}}^{F}\left({\alpha }_{{t}_{A}},{\alpha }_{{t}_{D}}\right)={\overline{u} }_{{t}_{A}}^{S}\frac{p}{n}-{k}_{{t}_{A}}{\alpha }_{{t}_{A}}$$
(11)
$${\overline{u} }_{{t}_{D}}^{F}\left({\alpha }_{{t}_{A}},{\alpha }_{{t}_{D}}\right)={\overline{u} }_{{t}_{D}}^{S}\left(1-\frac{p}{n}\right)-{k}_{{t}_{D}}{\alpha }_{{t}_{D}}$$
(12)

where \({\alpha }_{{t}_{A}}\) and \({\alpha }_{{t}_{D}}\) are the attacker's and the defender's strategies in the FlipIt game, \({\overline{u} }_{{t}_{A}}^{S}\) and \({\overline{u} }_{{t}_{D}}^{S}\) are their expected utilities in the signaling game, \(p\) is the probability that the attacker controls the resource, and \({k}_{{t}_{A} }\) and \({k}_{{t}_{D}}\) are the attacker's and the defender's moving costs.
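
For reference, Eqs. (11) and (12) can be evaluated directly as below; the function name and all numerical inputs in the example call are placeholders, not results from the experiments.

```python
def flipit_expected_utilities(u_sig_a, u_sig_d, p, n, alpha_a, alpha_d, k_a, k_d):
    """Eqs. (11)-(12): expected FlipIt utilities of the attacker and the defender,
    given their signaling-game utilities, the ownership probability p, the number
    of nodes n, the move strategies alpha, and the moving costs k."""
    u_flip_a = u_sig_a * p / n - k_a * alpha_a
    u_flip_d = u_sig_d * (1 - p / n) - k_d * alpha_d
    return u_flip_a, u_flip_d

# Example call with placeholder values
print(flipit_expected_utilities(10.0, 14.0, p=0.4, n=3, alpha_a=0.1, alpha_d=0.1,
                                k_a=20, k_d=15))
```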

3 Numerical Experiments

In this study, numerical experiments were conducted to identify the existence of a GNE. The parameters of the ARP model used to update the signaling-game strategies were set to (\(\phi , \upsilon , \varepsilon , {c}^{+}, {c}^{-}, {q}_{n}(0)\)) = (\(0.001, 0.0001, 0.2, 0.01, 0.02, 1000\)), following [21].

The procedure of the game is as follows.

  • Step 1. The players are a static attacker, a defender that uses MTD, and a device. At the start of the game, all players use random strategies.

  • Step 2. The attacker and the defender play the FlipIt game (\(time < 4000\)). At each time step, the player controlling the resource plays the signaling game with the device and updates its signaling-game strategy.

  • Step 3. From the expected utility of the signaling game, the attacker and the defender find the FlipIt-game strategies that maximize their expected FlipIt-game utilities.

  • Step 4. The attacker and the defender reset their signaling-game strategies and return them to a random state.

We repeated Steps 2 to 4 above 100 times to examine the variability of the attacker's and the defender's expected utilities in the signaling game and of both players' strategies in the FlipIt game. Within each repetition, the signaling game was played enough times to reach equilibrium.
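
A condensed sketch of this experimental loop (Steps 1-4) is shown below. It reuses the hypothetical helpers sketched earlier (flipit_with_mtd, flipit_expected_utilities), replaces the full signaling phase of Step 2 with a placeholder stub, and simplifies Step 3 to a coarse grid search in which each player best-responds to the opponent's previous FlipIt strategy; this is a sketch under these assumptions, not the exact procedure used in the experiments.

```python
GRID = [i / 20 for i in range(0, 7)]   # candidate move probabilities 0.00 ... 0.30

def play_signaling_phase(n):
    """Placeholder for Step 2: in the experiments, ARP learners play the signaling
    game until it converges; fixed placeholder utilities stand in for that here."""
    return 12.0, 16.0

def run_one_set(n, k_a, k_d, prev_a, prev_d):
    """One 'set' (Steps 2-4) of the experimental procedure, simplified."""
    u_sig_a, u_sig_d = play_signaling_phase(n)                        # Step 2
    # Step 3: each player maximizes its own FlipIt utility (Eqs. (11)-(12)),
    # holding the opponent's previous strategy fixed.
    best_a = max(GRID, key=lambda a: flipit_expected_utilities(
        u_sig_a, u_sig_d, flipit_with_mtd(prev_d, a, n), n, a, prev_d, k_a, k_d)[0])
    best_d = max(GRID, key=lambda d: flipit_expected_utilities(
        u_sig_a, u_sig_d, flipit_with_mtd(d, prev_a, n), n, prev_a, d, k_a, k_d)[1])
    # Step 4: the signaling-game strategies are reset before the next set.
    return u_sig_a, u_sig_d, best_a, best_d

alpha_a, alpha_d = 0.1, 0.1                                           # Step 1
for s in range(100):                                                  # 100 sets
    u_a, u_d, alpha_a, alpha_d = run_one_set(3, 20, 15, alpha_a, alpha_d)
```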

The gains in the signaling game were set as shown in Table 1. In each cell, the number on the left is the sender's (attacker's or defender's) gain and the number on the right is the receiver's (device's) gain.

Table 1. The gain of the attacker, the defender and the device in the signaling game.

We first experimented with \(n=3\) fixed and \({k}_{{t}_{A}}, {k}_{{t}_{D}}\) varied. Figure 3 shows the result of the experiment for \({k}_{{t}_{A}}=20\) and \({k}_{{t}_{D}}=15\). In the top graph, the red dots represent the attacker's expected utility \({\overline{u} }_{{t}_{A}}^{S}\) in the signaling game and the blue dots represent the defender's expected utility \({\overline{u} }_{{t}_{D}}^{S}\) in the signaling game; the vertical axis is the expected utility and the horizontal axis is the number of sets. In the bottom graph, the red dots represent the attacker's strategy \({\alpha }_{{t}_{A}}\) and the blue dots represent the defender's strategy \({\alpha }_{{t}_{D}}\) in the FlipIt game; the vertical axis is the strategy and the horizontal axis is the number of sets. In this situation, the expected utilities of the attacker and the defender in the signaling game and their strategies in the FlipIt game each converged to a certain value, indicating convergence to a GNE. The converged values were \({\overline{u} }_{{t}_{A}}^{S}=12\), \({\overline{u} }_{{t}_{D}}^{S}=16\), \({\alpha }_{{t}_{A}}=0.12\), and \({\alpha }_{{t}_{D}}=0.15\).

Fig. 3. The changes in the expected utilities \({\overline{u} }_{{t}_{A}}^{S}, {\overline{u} }_{{t}_{D}}^{S}\) and strategies \({\alpha }_{{t}_{A}}, {\alpha }_{{t}_{D}}\) with \(n=3, {k}_{{t}_{A}}=20, {k}_{{t}_{D}}=15\).

Figure 4 shows the result of the experiment for \({k}_{{t}_{A}}=40\) and \({k}_{{t}_{D}}=30\). In this situation, \({\alpha }_{{t}_{A}}={\alpha }_{{t}_{D}}=0.00\); that is, the attacker and the defender adopt the strategy of not moving in the FlipIt game even when the signaling-game utility conditions of Table 1 are met. From Eqs. (11) and (12), the expected utility of the FlipIt game decreases as the moving cost increases: if the players do not benefit from moving, they will not bother to do so.

Fig. 4. The changes in the expected utilities \({\overline{u} }_{{t}_{A}}^{S}, {\overline{u} }_{{t}_{D}}^{S}\) and strategies \({\alpha }_{{t}_{A}}, {\alpha }_{{t}_{D}}\) with \(n=3, {k}_{{t}_{A}}=40, {k}_{{t}_{D}}=30\).

Next, we experimented with \({k}_{{t}_{A}}=20, {k}_{{t}_{D}}=15\) fixed and \(n\) varied. Figure 5 shows the result of the experiment for \(n=5\). In this situation, the expected utilities of the attacker and the defender in the signaling game and their strategies in the FlipIt game each converged to a certain value, indicating convergence to a GNE. The converged values were \({\overline{u} }_{{t}_{A}}^{S}=12\), \({\overline{u} }_{{t}_{D}}^{S}=16\), \({\alpha }_{{t}_{A}}=0.09\), and \({\alpha }_{{t}_{D}}=0.07\).

Figure 6 shows the result of the experiment for \(n=10\). In this situation, \({\alpha }_{{t}_{A}}={\alpha }_{{t}_{D}}=0.00\); that is, the attacker and the defender adopt the strategy of not moving in the FlipIt game. From these results, we found that when the number of nodes \(n\) is large, the attacker chooses not to move: the attacker cannot find the actual location of the resource among the many nodes and gives up attacking it. However, we did not take into account the costs of building a fully connected network with many nodes or of migrating the resource. Therefore, in the real world, a large number of nodes \(n\) is likely to impose additional costs on the defender.

Fig. 5. The changes in the expected utilities \({\overline{u} }_{{t}_{A}}^{S}, {\overline{u} }_{{t}_{D}}^{S}\) and strategies \({\alpha }_{{t}_{A}}, {\alpha }_{{t}_{D}}\) with \(n=5, {k}_{{t}_{A}}=20, {k}_{{t}_{D}}=15\).

Fig. 6. The changes in the expected utilities \({\overline{u} }_{{t}_{A}}^{S}, {\overline{u} }_{{t}_{D}}^{S}\) and strategies \({\alpha }_{{t}_{A}}, {\alpha }_{{t}_{D}}\) with \(n=10, {k}_{{t}_{A}}=20, {k}_{{t}_{D}}=15\).

4 Conclusion and Future Work

In this paper, we proposed a CloudControl game with a static attacker and a defender using MTD and showed that a GNE exists in the proposed model. This equilibrium will help protect cloud-connected CPSs by revealing the frequency of attacks launched by APT attackers in the future IoT/CPS society, as well as the optimal strategies of MTD defenders and IoT devices against these attackers.

However, this study only established the existence of a GNE in the proposed model; its equilibrium equation has not been derived.

In future work, it is important to derive the equilibrium equation of the proposed model. We would also like to extend the model to take into account the costs of building the network and of migrating the resource. Furthermore, APTs in the real world are likely to launch more sophisticated attacks, so we want to clarify whether MTD remains an effective defender strategy against advanced, dynamic attackers and whether a GNE exists in such a model.