1 Introduction

Researchers have recently addressed the challenges of protecting the smart grid from cyber security threats, which can significantly impact the power system as well as human life [1]. Several authors have pointed out the threats to smart grids, especially from the viewpoint of information security [14]. The need to assess such threats for the planning of security resources and mitigation plans is increasingly recognized. Existing approaches include quantitative, qualitative, and hybrid assessments. The quantitative approach uses probability theory and statistics to assign numerical probability values to threat likelihood [5]. While these methods can provide clear guidance about the threats, they are very difficult to implement and evaluate [6]. Qualitative techniques, on the other hand, rely on systematic expert analysis and produce qualitative rather than quantitative output [7]. Their main advantage is that they involve reliable expert reasoning; in many cases, however, the output is not detailed enough to support clear decisions [8]. Recently, several hybrid models have been proposed that combine the quantitative and qualitative methods to eliminate their weaknesses. Among the hybrid approaches, the Factor Analysis of Information Risk (FAIR) framework [9] is well known and applicable to many risk and threat assessment situations, owing to its effective yet simple practical guidelines. Instead of a purely qualitative analysis, FAIR assesses threats by means of the Loss Event Frequency (LEF) concept using a five-point scale. However, if many threats need to be assessed, this scale must be extended to differentiate threats in the same group. This is particularly relevant for smart grids, which are highly complex systems exposed to a wide variety of threats from diverse threat actors.

In this paper, we propose a method to construct a Bayesian network model based on FAIR's LEF concept and a look-up table, to support the analysis in the context of the IRENE project (i.e. the resilience of the energy grid) [5]. The proposed Bayesian model is consistent with FAIR's look-up tables; the difference is that our model provides a numerical output instead of a categorical one. Moreover, the design of the method gives it several advantages, as it:

  • Supports ranking threats in the same group. By providing systems managers with a numerical output, a threat’s LEF can be contrasted against other threats in the same group. The managers can therefore make better decisions regarding security countermeasures and mitigation plans;

  • Generates an output even from fuzzy inputs, for instance, when experts do not fully agree on specific threat parameters;

  • Illustrates how changes in the input data propagate through the network and contribute to the output;

  • Points out the most influential factor that, if lowered, could decrease the overall LEF by a greater margin than the others.

The remainder of this paper is organized as follows. Section 2 introduces our model for transforming the FAIR framework into Bayesian network reasoning. Section 3 presents experimental results and then discusses how to consider several threats in a smart grid configuration. Finally, Sect. 4 presents the conclusions.

2 Proposed Model

2.1 The FAIR Framework

This paper addresses how threats can be assessed with FAIR’s LEF concept, which applies a reasoning structure to a number of threat factors. These factors include (i) Contact (C): the frequency, within a defined timeframe, with which attackers will come into contact with the asset, (ii) Action (A): the probability that an attacker will act against an asset once contact occurs, (iii) Threat Capability (Tcap): the probable level of force that an attacker is capable of applying against an asset, and (iv) Control (i.e. Resistance) Strength (CS): the strength of a control compared to a baseline measure of force. The reasoning structure between these factors is presented in Fig. 1. These FAIR constructs can be mapped to the risk assessment constructs of the NIST 800-30 Guide for Conducting Risk Assessments, as shown in [10].

Fig. 1. The FAIR model’s LEF analysis structure (left) and the look-up table for deriving the LEF state (right)

FAIR encodes each threat factor by means of a five-point scale (i.e. Very Low, Low, Moderate, High, and Very High). It also provides a reference for estimating the state of the input factors. A more detailed explanation on how to derive FAIR’s input states can be found in [11]. After the cause states (Fig. 1, left) are established, FAIR provides reasoning tables to look up the effect state. The look-up table in Fig. 1 shows how the “LEF” factor can be derived from the “Threat Event Frequency” (TEF) and the “Vulnerability” (V) factors. If values for Contact, Action, Control Strength, and Threat Capability are provided, the TEF and Vulnerability states can be derived, leading to the LEF state.

2.2 Bayesian Network Approach to Transform a Structural Analysis

This paper applies the method proposed in [12] and develops a method to construct the Bayesian Conditional Probability Table (CPT) of an effect based on the fuzzy relations of the causes that lead to it. The transformed model is given a Bayesian reasoning structure with \(n\) causes that lead to an effect. The causes and the effect can each have \(m\) states, represented as states \(1, 2, \ldots, m\). For example, in the FAIR analysis, the TEF can be considered as the effect, while Contact and Action are the causes. Each of these factors has 5 states, [VL, L, M, H, VH]; VH is the highest state (level 5), while VL is the lowest (level 1). One assumes that each cause \(i\) affects the effect through the individual effect vector \([r_{i1}, r_{i2}, \ldots, r_{im}]\), meaning that, if the state of cause \(i\) is \(j\), it contributes a share \(r_{ij}\) to the event that the effect is in the highest state (state \(m\)). One further assumes that the relationships between the causes and the effect are represented through the weights \(a_1, a_2, \ldots, a_n\), meaning that the state of cause \(i\) contributes a share \(a_i\) to the state of the effect. The weight vector \([a_1, a_2, \ldots, a_n]\) and the individual effect vector of each cause are standardized so that the members of each vector sum to 1, i.e. \( \sum_{i=1}^{n} a_i = 1 \) and \( \sum_{j=1}^{m} r_{ij} = 1, \ \forall i = 1, \ldots, n \).

With this model, the method in [12] allows one to generate the effect’s CPT from the individual effect vectors and the weights through the following formula:

$$ P(E = j \mid C_1 = j_1, C_2 = j_2, \ldots, C_n = j_n) = \sum_{i=1}^{n} a_i \, r_{i\,\sigma(j_i - j,\, j)} $$
(1)

in which \(P(E = j \mid C_1 = j_1, C_2 = j_2, \ldots, C_n = j_n)\) is the conditional probability that the effect \(E\) has state \(j\) while its causes \(C_1, C_2, \ldots, C_n\) have states \(j_1, j_2, \ldots, j_n\) respectively, and \(\sigma(j_i - j, j)\) is calculated as:

$$ \sigma(j_i - j, j) = \begin{cases} j_i - j, & j_i - j \ge 0 \\ j, & j_i - j < 0 \end{cases} $$

\(I(C_i)\), the influence of \(C_i\) on the effect, can also be calculated by formula (2), which is obtained from [12]. In the formula, \(P(E = m \mid C_i = k)\) is the conditional probability that effect \(E\) has state \(m\) (the highest) given that cause \(C_i\) has state \(k\). By comparing the \(I\) value of each factor, one can identify which factor is the most influential.

$$ I(C_i) = \frac{\left| \sum_{k=1}^{m-1} \frac{P(C_i = k)}{\sum_{j=1}^{m-1} P(C_i = j)} \, P(E = m \mid C_i = k) - P(E = m \mid C_i = m) \right|}{P(E = m)} $$
(2)
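Formula (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in [12]; the function name `influence`, its argument layout, and the numbers in the usage line are assumptions made for the example. P(E = m) is obtained here from the law of total probability over the states of the cause.

```python
def influence(prior, p_high_given):
    """Influence I(C_i) of one cause on the effect, following formula (2).

    prior[k]        : P(C_i = k+1) for k = 0..m-1 (last entry is the highest state)
    p_high_given[k] : P(E = m | C_i = k+1)
    """
    m = len(prior)
    if len(p_high_given) != m:
        raise ValueError("both vectors need one entry per state")
    low_mass = sum(prior[:-1])  # sum over j = 1..m-1 of P(C_i = j)
    if low_mass == 0:
        raise ValueError("formula (2) needs some probability below the top state")
    # Average of P(E = m | C_i = k) over the non-highest states of the cause,
    # weighted by the renormalized prior of those states.
    avg_low = sum(prior[k] / low_mass * p_high_given[k] for k in range(m - 1))
    # P(E = m) by the law of total probability over the states of C_i.
    p_high = sum(prior[k] * p_high_given[k] for k in range(m))
    if p_high == 0:
        raise ValueError("P(E = m) is zero, so the influence is undefined")
    return abs(avg_low - p_high_given[-1]) / p_high


# Hypothetical numbers, for illustration only.
example = influence([0.2] * 5, [0.0, 0.1, 0.2, 0.3, 0.9])
```

Computing this value once per cause and comparing the results gives the ranking of factors described above.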

2.3 Bayesian Network Approach to Transform the FAIR Framework

We consider the FAIR structure in Fig. 1 as a Bayesian network consisting of three cause-effect relations: [cause: C, A; effect: TEF], [cause: Tcap, CS; effect: V], and [cause: TEF, V; effect: LEF]. Such a cause-effect reasoning structure already forms a Bayesian network model. If one obtains the individual effect vectors and the cause weights at the three nodes TEF, V, and LEF as in Sect. 2.2, one can generate their CPTs. With the generated CPTs, the Bayesian model produces a probability distribution over the LEF states for a query, which can later be transformed into a numerical output. Next, a method to identify the CPT at each node is proposed, e.g. between effect \(E\) and causes \(C_1, C_2\), given their corresponding FAIR look-up table \( [e_{ij} \in \{VL, L, M, H, VH\},\ i = 1, \ldots, 5,\ j = 1, \ldots, 5] \) (refer to Fig. 2), by means of the following steps:

Fig. 2. Illustration of the cause-effect relation and the transformation parameters

Step 1. Calculating the weight of the factors: The FAIR tables are formed on the assumption that the states of the two causes directly impact the state of the effect. Therefore, if one transforms the state data into numerical data, there should be a strong correlation between the cause and effect data in most cases. In the simplest form, one can assume the relation to be linear and translate the node states into numbers by defining VL = 1, L = 2, M = 3, H = 4, VH = 5. One then has numerical data for the causes and the effect, which can be used to run a regression that fits the linear model \(E = \alpha C_1 + \beta C_2 + \varsigma\) (\(\alpha\) and \(\beta\) are the coefficients and \(\varsigma\) is the error). The coefficients are then standardized as \(\alpha' = |\alpha| / (|\alpha| + |\beta|)\) and \(\beta' = |\beta| / (|\alpha| + |\beta|)\). We choose \(\alpha'\) and \(\beta'\) as the weights of the causes toward the effect (see Fig. 2).
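Step 1 can be sketched in plain Python as an ordinary least-squares fit over the 25 table cells. The sketch makes several assumptions that are not in the text: the regression includes an intercept, the names `cause_weights` and `ILLUSTRATIVE_TABLE` are hypothetical, and the table itself is a made-up symmetric example, not one of FAIR's actual look-up tables.

```python
LEVEL = {"VL": 1, "L": 2, "M": 3, "H": 4, "VH": 5}

# Hypothetical 5x5 look-up table (rows: states of C1, columns: states of C2).
# It is NOT a FAIR table; it only illustrates the procedure.
ILLUSTRATIVE_TABLE = [
    ["VL", "VL", "L", "L", "M"],
    ["VL", "L", "L", "M", "M"],
    ["L", "L", "M", "M", "H"],
    ["L", "M", "M", "H", "H"],
    ["M", "M", "H", "H", "VH"],
]


def cause_weights(table):
    """Fit E = alpha*C1 + beta*C2 + intercept and return (alpha', beta')."""
    rows = [(i + 1, j + 1, LEVEL[state])
            for i, row in enumerate(table)
            for j, state in enumerate(row)]
    n = len(rows)
    mean1 = sum(c1 for c1, _, _ in rows) / n
    mean2 = sum(c2 for _, c2, _ in rows) / n
    mean_e = sum(e for _, _, e in rows) / n
    # Centered sums of squares and cross-products for the normal equations.
    s11 = sum((c1 - mean1) ** 2 for c1, _, _ in rows)
    s22 = sum((c2 - mean2) ** 2 for _, c2, _ in rows)
    s12 = sum((c1 - mean1) * (c2 - mean2) for c1, c2, _ in rows)
    s1e = sum((c1 - mean1) * (e - mean_e) for c1, _, e in rows)
    s2e = sum((c2 - mean2) * (e - mean_e) for _, c2, e in rows)
    det = s11 * s22 - s12 ** 2
    alpha = (s1e * s22 - s2e * s12) / det
    beta = (s2e * s11 - s1e * s12) / det
    total = abs(alpha) + abs(beta)
    # Standardize the absolute coefficients so that the two weights sum to 1.
    return abs(alpha) / total, abs(beta) / total


weights = cause_weights(ILLUSTRATIVE_TABLE)
```

Because the illustrative table is symmetric in its two causes, both weights come out equal; an asymmetric table shifts weight toward the cause that tracks the effect more closely.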

Step 2. Calculating the individual effect vector: In order to sharpen the difference between the state levels, one further converts state \(e_{ij}\) to \(n_{ij} = k^{e_{ij}}\), \(k > 0\). Therefore one has \(n(VL) = k\), \(n(L) = k^2\), \(n(M) = k^3\), \(n(H) = k^4\), and \(n(VH) = k^5\). The weights \((\gamma_1, \gamma_2, \gamma_3, \gamma_4, \gamma_5)\) are also set for the cause states (VL, L, M, H, VH), to further differentiate the effect of the state from the other causes (see Fig. 2). The choice of \(k\) and the state weights does not affect the correctness of the ranking; however, the larger the values, the greater the numerical difference between the evaluation outputs of the threats. For each cause, one can derive its individual effect vector \(r = [r(VL), r(L), r(M), r(H), r(VH)]\) by calculating the individual effect value of each state \(s_i\) as:

$$ r(s_i) = \frac{\sum_{j=1}^{5} \gamma_i n_{ij}}{\sum_{l=1}^{5} \sum_{j=1}^{5} \gamma_l n_{lj}}, \quad i = 1, \ldots, 5 $$
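The formula for the individual effect vector translates directly into code. The following Python sketch is illustrative only: the function names are hypothetical, the defaults k = 2 and γ = (1, 2, 3, 4, 5) are the values chosen in Sect. 3.2, and transposing the table to obtain the vector of the second cause is an assumption about how the row index is meant to be used.

```python
def individual_effect_vector(levels, k=2.0, gammas=(1, 2, 3, 4, 5)):
    """Return [r(s_1), ..., r(s_5)] for the cause whose states index the rows.

    levels[i][j] is the numeric effect level e_ij (VL = 1 ... VH = 5) when this
    cause is in state i+1 and the other cause is in state j+1.
    """
    # Numerator of the formula for each state i: gamma_i * sum_j k ** e_ij.
    row_scores = [g * sum(k ** e for e in row) for g, row in zip(gammas, levels)]
    total = sum(row_scores)  # denominator: the same quantity summed over all states
    return [score / total for score in row_scores]


def transpose(levels):
    """Swap the roles of the two causes (assumed way to handle the second cause)."""
    return [list(col) for col in zip(*levels)]


# Hypothetical table in which the effect is always "M" (level 3).
flat = [[3] * 5 for _ in range(5)]
r_first = individual_effect_vector(flat)
r_second = individual_effect_vector(transpose(flat))
```

With a constant table the vector reduces to the normalized state weights γ, which makes the role of γ easy to see in isolation.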

For each of the relations, and after obtaining the weight of the factors and the relevant individual effect vectors, one can generate the Bayesian CPT in each of the effect nodes following the formula in Sect. 2.2. Having the 3 CPTs from the 3 FAIR look-up tables is enough to form the overall Bayesian network for calculating the LEF output, given the input states of the causes.

Step 3. Generating numerical output: The output of the Bayesian model is a vector of probabilities over the LEF states, for example \([p_1, p_2, p_3, p_4, p_5]\), in which \(p_1\) is the probability that LEF has state VL, \(p_2\) is the probability that LEF has state L, and so on. One uses the grade vector [1, 2, 4, 8, 16] to derive the final numerical result (in detail, the assessment for LEF is equal to \(p_1 + 2p_2 + 4p_3 + 8p_4 + 16p_5\)). This grade is later used to compare and rank the threats according to their LEF.
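The grading in Step 3 is a weighted sum, sketched below in Python. The function name `lef_grade` and the input check are assumptions added for the example. The sketch applies the grade vector exactly as stated and does not attempt to reproduce the scale of the grades reported in Table 1.

```python
GRADE_VECTOR = (1, 2, 4, 8, 16)  # grades for VL, L, M, H, VH as given in Step 3


def lef_grade(probs, grades=GRADE_VECTOR):
    """Collapse a distribution over LEF states into a single numerical grade."""
    if len(probs) != len(grades):
        raise ValueError("one probability per LEF state is required")
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")
    return sum(p * g for p, g in zip(probs, grades))


certain_vh = lef_grade([0, 0, 0, 0, 1])   # all mass on VH
uniform = lef_grade([0.2] * 5)            # no information about the state
```

Doubling the grade at each level means that probability mass on the higher states dominates the result, which helps keep threats in higher categories above those in lower ones.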

Step 4. Adjusting the Bayesian model for FAIR consistency: We also provide a fix for inconsistencies between FAIR and the Bayesian model that arise from weak correlation between values in the FAIR table. The fix adjusts the corresponding CPT entries of the Bayesian model according to upper and lower bounds derived from the FAIR states. The 25 FAIR LEF outputs are grouped into 5 categories [VL, L, M, H, VH]. In each category, one replaces the FAIR output with the corresponding Bayesian grade (for the same input). The value range of each category is then obtained. If the value ranges do not intersect, the Bayesian model is fully consistent with the FAIR assessment. If there are intersections, one decreases the upper bound (for instance, to the second highest value in the same category) or increases the lower bound of the relevant categories to eliminate all the intersections. One then updates all the CPT entries that relate to the adjustments. After this stage, the assessments for all 25 inputs that FAIR provides are consistent.
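The detection part of Step 4 can be sketched as follows. This Python example only finds the categories whose grade ranges collide; it does not perform the CPT adjustment, which depends on details not given here. The function names and the sample grades are hypothetical.

```python
ORDER = ("VL", "L", "M", "H", "VH")


def category_ranges(entries):
    """Map each FAIR category to the (min, max) of its Bayesian grades.

    entries: iterable of (fair_category, bayesian_grade) pairs, one per table cell.
    """
    ranges = {}
    for category, grade in entries:
        low, high = ranges.get(category, (grade, grade))
        ranges[category] = (min(low, grade), max(high, grade))
    return ranges


def overlapping_categories(entries):
    """Return neighboring category pairs whose grade ranges are not separated.

    A pair is reported when the top grade of the lower category reaches or
    exceeds the bottom grade of the next higher category. If no neighboring
    pair is reported, all ranges are disjoint and correctly ordered.
    """
    ranges = category_ranges(entries)
    present = [c for c in ORDER if c in ranges]
    clashes = []
    for lower, upper in zip(present, present[1:]):
        if ranges[lower][1] >= ranges[upper][0]:
            clashes.append((lower, upper))
    return clashes


# Hypothetical grades: one "L" cell scores above the lowest "M" cell.
sample = [("L", 2.0), ("L", 3.5), ("M", 3.0), ("M", 5.0), ("H", 6.0)]
clashes = overlapping_categories(sample)
```

An empty result corresponds to the fully consistent case described in Step 4; each reported pair marks a boundary where the bounds need to be tightened.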

Once formed, our Bayesian model is able to provide numerical output for the fuzzy inputs that FAIR cannot evaluate, reflecting the assessment trend obtained from the FAIR table, and to point out the most influential element. To illustrate, the next section applies the method to a list of plausible threats to smart grids.

3 Experimental Results and Discussion

3.1 Experimental Context and Input Data

In this section, we apply our LEF assessment to 14 threats (refer to the second column in Table 1), which are extracted from the 38 threats considered in our IRENE project [2]. The method to obtain the factor states is given in [2, 3]. Let us assume that, after the evaluation, the inputs for the 14 threats are as given in the third column of Table 1. Among the inputs, threats 9 and 13 have fuzzy values. For threat 9, security experts were not able to agree on whether to assign state “M” or state “H” to the “Tcap” factor; the chosen value indicates that a 40% value judgement was assigned to state “M” and 60% to state “H”. For threat 13, the experts were not able to evaluate the “A” factor at all, so an equal probability was assigned to each state. Although FAIR does not support assessments in these two cases, our method handles them.
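The two fuzzy inputs can be encoded as probability vectors and pushed through a node's CPT by summing over the parent states. The Python sketch below is an illustration under assumptions: the function names are hypothetical, the two parents are treated as independent, and the CPT in the usage lines is a toy table in which the effect simply copies its first cause, not a CPT derived from FAIR.

```python
STATES = ("VL", "L", "M", "H", "VH")


def as_distribution(judgement):
    """Turn a judgement such as {'M': 0.4, 'H': 0.6} into a vector over STATES."""
    vec = [float(judgement.get(state, 0.0)) for state in STATES]
    if abs(sum(vec) - 1.0) > 1e-9:
        raise ValueError("judgement shares must sum to 1")
    return vec


def propagate(cpt, p_c1, p_c2):
    """P(E = e) = sum over a, b of P(E = e | C1 = a, C2 = b) * P(C1 = a) * P(C2 = b).

    cpt[a][b] is the list of probabilities over the effect states.
    """
    m = len(STATES)
    out = [0.0] * m
    for a in range(m):
        for b in range(m):
            weight = p_c1[a] * p_c2[b]  # joint probability of the parent states
            for e in range(m):
                out[e] += weight * cpt[a][b][e]
    return out


# Inputs described above: threat 9 (Tcap) and threat 13 (A).
tcap_threat9 = as_distribution({"M": 0.4, "H": 0.6})
action_threat13 = as_distribution({state: 0.2 for state in STATES})

# Toy CPT for illustration only: the effect takes the state of its first cause.
toy_cpt = [[[1.0 if e == a else 0.0 for e in range(5)] for b in range(5)]
           for a in range(5)]
effect = propagate(toy_cpt, tcap_threat9, action_threat13)
```

A crisp input is the special case in which a single state carries probability 1, so the same routine covers both the ordinary and the fuzzy threats.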

Table 1. Numerical results of Bayesian-FAIR compared to FAIR.

3.2 Results

Following Sect. 2.3 and choosing \(k = 2\) and \((\gamma_1, \gamma_2, \gamma_3, \gamma_4, \gamma_5) = (1, 2, 3, 4, 5)\), the weights were obtained as follows: [C A] = [0.39 0.61]; [Tcap CS] = [0.5 0.5]; [TEF V] = [0.7 0.3]. The resulting individual effect vectors are [C A] = [0.42, 0.34, 0.18, 0.05, 0.01; 0.49, 0.34, 0.13, 0.03, 0.01]; [Tcap CS] = [0.49, 0.3, 0.15, 0.05, 0.01; 0.01, 0.05, 0.15, 0.3, 0.49]; [TEF V] = [0.62, 0.25, 0.09, 0.03, 0.01; 0.37, 0.3, 0.23, 0.08, 0.02]. The Bayesian model is constructed using the formula in Sect. 2.2. LEF results for the threats calculated by this model are given in Table 1. To see how a change in the value judgement of a fuzzy input can change the overall assessment of a threat, one varies the input of the “A” factor for threat 13, while the other three factors [C, Tcap, CS] are fixed to [VL, L, VL]. The changes are represented in Fig. 3, in which different evaluation grades are calculated for an “A” input changing from [100%VL] to [20%VL 20%L 20%M 20%H 20%VH], [40%VL 15%L 15%M 15%H 15%VH], [60%VL 10%L 10%M 10%H 10%VH], …, and [100%VH]. The lower bound, which is the lowest value of the calculated set, is 264.49 and results from “A” being at 100% “VL”, while the upper bound is 531.3, reached when “A” is 100% “VH”. The granularity of the evaluation can be observed in Fig. 3.
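As a quick sanity check, the reported weights and individual effect vectors can be tested against the standardization requirement of Sect. 2.2 (each vector sums to 1). The Python sketch below uses the values quoted above; the dictionary layout, the helper name `is_normalized`, the rounding tolerance, and the reading that the first vector of each pair belongs to the first-named cause are assumptions.

```python
REPORTED_WEIGHTS = {
    "C, A": [0.39, 0.61],
    "Tcap, CS": [0.5, 0.5],
    "TEF, V": [0.7, 0.3],
}

# Assumed reading: in each reported pair, the first vector belongs to the
# first-named cause and the second vector to the second-named cause.
REPORTED_EFFECT_VECTORS = {
    "C": [0.42, 0.34, 0.18, 0.05, 0.01],
    "A": [0.49, 0.34, 0.13, 0.03, 0.01],
    "Tcap": [0.49, 0.3, 0.15, 0.05, 0.01],
    "CS": [0.01, 0.05, 0.15, 0.3, 0.49],
    "TEF": [0.62, 0.25, 0.09, 0.03, 0.01],
    "V": [0.37, 0.3, 0.23, 0.08, 0.02],
}


def is_normalized(vec, tol=0.005):
    """True if the vector sums to 1 within a tolerance for two-decimal rounding."""
    return abs(sum(vec) - 1.0) <= tol


all_vectors = {**REPORTED_WEIGHTS, **REPORTED_EFFECT_VECTORS}
not_normalized = [name for name, vec in all_vectors.items()
                  if not is_normalized(vec)]
```

For the values quoted in this section the list `not_normalized` is empty, i.e. every reported vector satisfies the sum-to-one condition up to rounding.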

Fig. 3. Bayesian-FAIR evaluation of LEF with fuzzy state input in the “Action” factor

3.3 Discussion

Table 1 shows that a Bayesian network constructed on the basis of our method generates assessments consistent with the FAIR framework. This is because the CPTs are derived from the FAIR look-up tables and can be adjusted to ensure consistency. Moreover, our approach can further differentiate threats in the same category. For example, threats 6 and 7 are in the same “High” category according to FAIR, but have grades of 923.1 and 939.9 respectively according to our approach. The table shows that threats 6 and 7 have the same assessments for the three inputs [C, A, and Tcap], the only difference being the evaluation of factor “CS”. Threat 7 has state “VL” compared to “M” for threat 6, so the LEF of threat 7 should be higher than that of threat 6. This difference cannot be shown by FAIR, as both threats are in the “H” category, but it is clearly visible in our Bayesian model.

In addition to providing a repeatable and traceable way to reach conclusions, even under uncertainty, we provide a clear mechanism for integrating a threat threshold. Having the threat grades allows one to simply define a cut-off point to reduce the list of threats to consider. For example, a cut-off point of 900 means that threats are only considered when their grade is greater than or equal to 900, reducing the list of threats to {2, 4, 5, 7, 6, 11}.
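Applying a cut-off point amounts to a filter followed by a sort. The Python sketch below uses only the four grades quoted in this section (threats 6, 7, 9, and 13); the remaining grades from Table 1 are not reproduced here, and the function name `shortlist` is hypothetical.

```python
# Grades quoted in Sect. 3.3; the other threats of Table 1 are not listed here.
GRADES = {6: 923.1, 7: 939.9, 9: 290.33, 13: 343.0}


def shortlist(grades, cutoff):
    """Return the threat ids whose grade is >= cutoff, highest grade first."""
    kept = [(threat, grade) for threat, grade in grades.items() if grade >= cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [threat for threat, _ in kept]


above_900 = shortlist(GRADES, 900)
```

With the four quoted grades, a cut-off of 900 keeps threats 7 and 6, in that order, and drops threats 9 and 13.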

Our model is also able to handle fuzzy input. For example, threats 9 and 13 receive assessment grades of 290.33 and 343 respectively, while the FAIR model cannot provide an exact state for them. This capability is helpful when expert opinions for assessing the threats are lacking, or when experts have conflicting assessments of the threats.

Another advantage is that our approach can point out the most influential factor for each of the threats. These outputs can then be combined to show which factor should be improved to lower the threat impact. For example, out of the 14 threats in Table 1, factor “A” is the one affecting the most threats (i.e. 8 out of 14). This suggests that system managers should implement countermeasures to lower the “Action” factor, for example, by creating policies that impose heavier punishment on attackers who initiate such threats, so as to lower attacker motivation. Such countermeasures will significantly lower the impact of 8 out of the 14 threats in the list, hence effectively improving the security of the system with the least effort for a particular smart grid configuration.

4 Conclusion

The ability to assess cyber threats is becoming more and more important for stakeholders, given the rise of smart grids. Due to the complexity of risk assessment for complex systems such as urban smart grids, it is necessary to look for ways of considering large numbers of threats in a consistent manner and relating them to each other. In this paper, we proposed a method to transform the FAIR look-up tables into a Bayesian network model that provides a numerical threat LEF assessment, combining elements of quantitative and qualitative methods. By applying the method to threats to a smart grid configuration, as shown in Sect. 3, we show that our method gives an assessment consistent with FAIR while providing several further advantages, such as differentiating threats in the same FAIR category, giving more granular output, allowing flexible fuzzy inputs, and highlighting the most influential cause for particular threats, so that security countermeasures can be planned effectively to lower the impact of smart grid threats. The interested reader can consult [10] for a detailed example of how the described approach can be used to consider countermeasures to several threats at once. In the future, we will extend this method to smart grid risk assessment.