
1 Introduction

Cyber-physical systems (CPSs) combine cyber systems and the physical world into an integrated structure for vital tasks, a development that originated from advancements in digital electronics [1]. In these systems, physical components and computational resources are integrated through communication links for remote monitoring and control [2, 3].

The smart grid, as a cyber-physical system, emerged from the restructuring of traditional power networks [4]. These systems require smart tools not only for managing electrical flow but also for improving performance, which has led to capabilities such as self-healing, adaptive protection, control, and customer involvement, to name a few [5,6,7].

Even though cyber-physical systems improve the interaction of the system operator with consumers and other parties, they have also created many challenges, including security, reliability, stability, maintainability, safety, and predictability [8, 9]. Security is one of the most important of these challenges, because the integration of many components has made cyber-physical systems vulnerable on both the physical and the cyber side. Malicious attacks, which can be directed at the cyberinfrastructure or at physical components, have interrupted system operation or resulted in the theft of sensitive data [8, 10,11,12]. Cyber-physical systems also face a tsunami of data generated by different components, which is too large and complex for real-time processing. Cloud computing techniques, along with analytic methods such as machine learning (ML), can help keep the generated information secure while it is processed, analyzed, and stored [13, 14]. In the context of this chapter, ML refers to a system making predictions after learning from available data. Figure 10.1 shows the application of ML in smart grid security.

Fig. 10.1 Machine learning subsections in CPS security

Many approaches to intrusion detection, ML among them, build the requisite model from training data. ML approaches are classified into supervised, unsupervised, and reinforcement learning [15]. In supervised ML, both normal and abnormal behaviors are provided to the model as labeled training data. Labeled attack data are very difficult to obtain for cyber-physical systems [16, 17]. Unsupervised methods do not need abnormal data in the training phase, which is a great advantage [2, 18, 19]. In reinforcement learning, there is no training data, so the agent learns from its own experience: it gathers training examples by trial and error while attempting its task.

This chapter surveys ML methods for an anomaly attack detection framework for cyber-physical systems. Anomaly detection is defined as detecting patterns that do not fit expected behavior [20, 21]. Because bibliometric analysis reveals the characteristics, structure, quantities, and patterns of research activities, it is used here to identify the state of the art of anomaly attack detection in cyber-physical systems.

Web of Science was used as the search engine for this analysis. First, the related keywords were entered to extract publications. Then, the search was limited to the last 10 years. Finally, non-relevant and non-English publications were removed. The query used to collect the data for bibliometric analysis was as follows: (TS = ((anomaly detection OR outlier detection) AND (cyber-physical system OR cyber-physical system OR smart grid OR CPS cyber-physical systems))). The initial search found 389 publications, which were reduced to 379 after the filters mentioned above.

The results show that most of the publications fall under computer science and engineering, and that the largest shares originate from the United States and China (154 and 72 publications, respectively). Iowa State University and the United States Department of Energy, both located in the United States, are the most productive institutions in this field of study.

Figure 10.2 shows that 87 documents were published in 2018, whereas there was only one publication in 2010. Because the study was conducted in August 2019, the count for 2019 is incomplete, and the final number of publications for 2019 is expected to be higher than that for 2018.

Fig. 10.2 The number of publications

The rest of this chapter is organized as follows. Section 2 presents an overall view of cyber-physical systems. Attack detection methods in CPSs, with a focus on anomaly-based detection, are studied in Sect. 3. Section 4 provides a case study and Sect. 5 concludes the chapter.

2 Cyber-Physical Systems

Depending on their application, CPSs can be defined in different ways. One such definition describes them as deeply intertwined computation, communication, networking, advanced tools, and physical processes that interact with each other and rely on IT systems to monitor and control the physical world [22, 23]. Figure 10.3 shows a holistic view of CPSs.

Fig. 10.3 Holistic view of CPSs

Different characterizations of CPSs have been presented, each focusing on different aspects of these systems, including cyber capability, automation, dependability, networking, integration, complexity, and reconfiguration [24]. These are briefly mentioned here [25].

Cyber-physical systems integrate cyber capability with physical components. They include distributed networks (e.g. Local Area Network, Bluetooth, Global System for Mobile Communications) and are severely constrained by spatiality and real-time computation. Because of the reliability and security requirements of CPSs, they need adaptive capabilities supported by advanced feedback control technologies.

Since cyber-physical systems use distributed communication, smart tools, and sensors, they face various challenges from different points of view, which are presented in Fig. 10.4 [1]. The rest of this section focuses on security issues, because CPSs are particularly vulnerable to malicious cyber-physical attacks [26,27,28].

Fig. 10.4 CPS challenges

Security solutions for cyber-physical systems are required and can be enhanced with Information Technology (IT) systems and techniques such as cryptography, access control, and attack detection. A lack of security in CPSs (e.g. nuclear power stations or medical devices) could cause a worldwide threat or disaster.

Security is also one of the most important challenges in the smart grid, because the high dependency of these systems on cyber information yields new security vulnerabilities [12, 26, 27, 29]. With their extensive communication capabilities, smart grids are good examples of CPSs and provide the infrastructure required for handling new challenges. Rising electrical energy demand and several technological developments have motivated the advancement of the smart grid. A comprehensive approach is therefore needed to quantify attack impacts and to assess the effectiveness of countermeasures. The view of the smart grid as a cyber-physical system is shown in Fig. 10.5.

Fig. 10.5 Smart grid as a CPS

3 Attack Detection in CPSs

The three main security properties of a cyber-physical system are confidentiality, integrity, and availability [30]. Attacks are therefore classified according to these security properties, as shown in Fig. 10.6.

Fig. 10.6 Attack taxonomy for CPS

Network Intrusion Detection Systems (NIDS) are the most efficient way of defending against network-based attacks, and they are used in almost all cyber-physical systems. The two main kinds of NIDS are anomaly-based and signature-based [31]. Signature-based systems use pattern recognition methods, while anomaly-based systems build a statistical model of standard network traffic and flag any behavior that diverges from the model [32]. A signature-based system maintains a database of previous attack signatures and compares it with the analyzed information. An anomaly-based system has no such database of attacks. Instead, it requires a training phase, which is a complex process because a detection threshold must be set. Because anomaly-based systems can detect novel attacks as soon as they take place, they can detect zero-day attacks, which is a major advantage over signature-based systems [33]. The rest of this section focuses on anomaly-based detection.

3.1 Anomaly Based Detection

The behavior of the network is the key factor: if it does not follow the expected behavior, which is defined by the specifications of the network managers, an anomaly is flagged. Because the rule definition process has to cover various protocols, the ruleset can be regarded as the main drawback of anomaly detection, and defining rules becomes especially difficult for custom protocols. Network managers must be thoroughly familiar with the accepted network behavior, because a malicious action goes unnoticed if it falls within that behavior. Anomaly detection systems work properly only when the rules are well defined [34]. Finally, anomaly-based methods can detect novel attacks that have no signature, provided that the attack falls outside the usual traffic patterns [10]. This is the major difference between anomaly-based and signature-based detection methods.

Anomaly detection can build on a variety of general methods borrowed from various scientific fields, including ML, statistics, artificial intelligence, clustering, pattern recognition, classification, system theory, and signal processing. Figure 10.7 shows a taxonomy for anomaly detection.

Fig. 10.7 Anomaly detection taxonomy

3.1.1 Statistical Methods

Anomaly detection methods have been developed using statistical theories that characterize and quantify the behavior of every component of the system. In these methods, the collected data are assumed to follow a probability distribution. The difference between current behavior and normal behavior is detected using statistical properties such as the mean and variance [35]. Statistical methods use two datasets during anomaly detection: one corresponding to the currently observed profile and one corresponding to the previously trained profile. These methods have many advantages, the most important of which is a lower false detection rate, because they can detect malicious actions more accurately over a long duration. Because statistical methods can learn from observation, detailed prior knowledge of the standard activity of the system is not necessary [36, 37]. These methods also have some drawbacks. For example, an attacker can generate network traffic that looks similar to normal behavior and thereby evade detection. Another disadvantage is that if the system cannot be modeled statistically, these detection methods are of no use [38].
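As an illustration of comparing current behavior with a trained profile through the mean and variance, the following minimal sketch flags readings by their z-score. The function names, the threshold of three standard deviations, and the sample voltage readings are assumptions made for this example and do not come from the cited works.

```python
from statistics import mean, stdev

def fit_profile(normal_samples):
    """Summarize previously observed normal behavior by its mean and standard deviation."""
    return mean(normal_samples), stdev(normal_samples)

def is_anomalous(value, profile, z_threshold=3.0):
    """Flag a value that lies more than z_threshold standard deviations from the trained mean."""
    mu, sigma = profile
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold

# Hypothetical bus-voltage readings (per unit) recorded during normal operation.
trained = [1.00, 1.01, 0.99, 1.02, 0.98, 1.00, 1.01, 0.99]
profile = fit_profile(trained)

# Currently observed readings are compared against the trained profile.
observed = [1.00, 1.03, 1.25]
flags = [is_anomalous(v, profile) for v in observed]
print(flags)  # [False, False, True]
```

The sketch also shows the first drawback mentioned above: an injected reading that stays within three standard deviations of the trained mean is not flagged.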

3.1.2 Classification Based Methods

An attack with a recognized outline and plan can be detected as soon as it is launched, provided that the network administrator has supplied the details of its features to the detection system. Classification methods therefore depend on the administrators' substantial knowledge of the specifications of attacks [39]. If an attack signature has previously been provided by a network manager, the system is capable of detecting that attack. However, because the system can detect only what it knows, it remains vulnerable to new attacks. Even if a signature for the new attack is later created and added to the system, the damage already inflicted cannot be undone, losses have been incurred, and the repair process is very expensive [40]. Finally, these methods depend on a standard traffic outline that forms the knowledge base, and they consider activities that stray from this baseline to be anomalous [41, 42].

3.1.3 Clustering Based Methods

One of the main subclasses of unsupervised ML is clustering. In this method, rules are found for grouping similar data examples without the need for labeled data [43]. There are many types of clustering methods, but the two most important and widely used are regular clustering and co-clustering [44, 45]. The two differ in what is clustered: regular clustering groups the rows of the dataset, whereas co-clustering forms clusters based on both the rows and the columns of the dataset simultaneously [46]. K-means is an example of regular clustering.
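To make regular clustering concrete, the sketch below runs a plain K-means over the rows of a small dataset and then scores a new row by its distance to the nearest centroid. The seeding rule (the first k rows), the sample rows, and the distance threshold of 1.0 are assumptions chosen for illustration. Using centroid distance as an anomaly score is one common option, not a method prescribed by the cited works.

```python
import math

def kmeans(rows, k, iterations=20):
    """Plain K-means over the rows of a dataset; the first k rows seed the centroids."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: attach every row to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(r, centroids[c])) for r in rows]
        # Update step: move each centroid to the mean of its assigned rows.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def distance_to_nearest(row, centroids):
    """Anomaly score: distance from a row to the closest cluster center."""
    return min(math.dist(row, c) for c in centroids)

# Hypothetical unlabeled measurements forming two groups of similar rows.
rows = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.9), (0.9, 1.1), (5.1, 4.9), (4.9, 5.2)]
centroids, labels = kmeans(rows, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]

# A row far from every cluster is treated as anomalous (threshold is an assumption).
print(distance_to_nearest((3.0, 9.0), centroids) > 1.0)  # True
print(distance_to_nearest((1.1, 1.0), centroids) > 1.0)  # False
```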

3.1.4 Signal Processing Approaches

Signal processing methods rely on time-series and spatial-temporal data [47, 48] and include three sub-methods: Min-Max-Threshold, Filtering, and Modeling. The simplest form of anomaly detection is Min-Max-Threshold, in which minimal, maximal, or threshold values are defined from a series considered normal [49]. The filtering method compares a signal with its low-pass filtered version, and a large difference indicates an outlier value. Finally, the modeling method generates a model based on system identification techniques, which is used to predict the next values.
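The first two sub-methods can be sketched in a few lines. In this example the low-pass filter is a simple causal moving average, which is only one possible filter choice, and the window length, tolerance, and signal values are assumptions made for illustration. The modeling sub-method is not covered here.

```python
def min_max_flags(series, low, high):
    """Min-Max-Threshold: flag samples outside the range seen in a normal series."""
    return [not (low <= x <= high) for x in series]

def moving_average(series, window=3):
    """Simple causal low-pass filter: mean of the current and previous samples."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def filter_flags(series, window=3, tolerance=1.0):
    """Filtering: flag samples that differ strongly from their low-pass filtered version."""
    smoothed = moving_average(series, window)
    return [abs(x - s) > tolerance for x, s in zip(series, smoothed)]

# Hypothetical series considered normal; its extremes define the thresholds.
normal = [10.0, 10.2, 9.9, 10.1, 10.0]
low, high = min(normal), max(normal)

# Hypothetical monitored signal with one spike at index 3.
signal = [10.0, 10.1, 9.9, 12.5, 10.0, 10.1]
print(min_max_flags(signal, low, high))  # [False, False, False, True, False, False]
print(filter_flags(signal))              # [False, False, False, True, False, False]
```

Because the moving average lags behind the spike, the samples that follow it also show a raised residual. The tolerance therefore has to be chosen with the filter in mind.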

3.1.5 Pattern Recognition

In this method, a normal state is distinguished from an abnormal state by the sequence of samples, that is, by the shape of the signal, whereas an individual data point alone is not important. Support vector machines, neural networks, and Markov chains are trained to detect the difference between normal and abnormal shapes [50, 51].

3.1.6 Machine Learning

Machine learning aims to find patterns, make predictions, and make decisions based on historical information in order to perform a task [12]. The four types of ML are supervised, unsupervised, reinforcement, and semi-supervised learning. In supervised techniques, the rules are learned from positive and negative examples, and labeled data are used to find a model that explains the dataset. In unsupervised learning, no anomalies are specified in advance, and the main objective is to find a pattern in unlabeled data. Finally, in semi-supervised learning, only the normal behavior is learned from positive examples, so only a portion of the data is labeled [16].

Machine learning approaches usually separate data into two categories: training and testing. Training data, which is commonly the larger set, is used for learning and for building a model of the system. Testing data, which is completely independent of the training data, is used to assess the performance of the algorithm. In anomaly-based detection, the normal behavioral pattern is described and modeled using a training set. The model is then applied to the testing dataset in order to classify each sample as either normal or anomalous. In addition, some ML methods separate datasets into three categories instead of two by adding a validation dataset. The validation dataset is used to tune the settings of the ML method before the final assessment on the testing dataset. For illustration, the number of layers and nodes in an Artificial Neural Network (ANN) can be varied, and the configuration with the lowest estimated error and the lowest cost to build is chosen according to its performance on the validation dataset [12].
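The role of the three datasets can be sketched as follows. The split fractions, the toy threshold rule, and the sample data are assumptions made for this example. The toy rule has nothing to fit, so the training set is produced but not used, whereas a real ML method would fit its model on it.

```python
import random

def three_way_split(samples, train=0.6, validation=0.2, seed=0):
    """Shuffle once, then cut into disjoint training, validation, and testing sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def accuracy(threshold, data):
    """Toy model: a measurement is called anomalous (1) when it reaches the threshold."""
    return sum(int(value >= threshold) == label for value, label in data) / len(data)

# Hypothetical labeled measurements: values of 70 and above are anomalous.
samples = [(float(i), 1 if i >= 70 else 0) for i in range(100)]
train_set, validation_set, test_set = three_way_split(samples)

# The candidate setting is chosen on the validation set only ...
candidates = [50.0, 70.0, 90.0]
best = max(candidates, key=lambda t: accuracy(t, validation_set))

# ... and the untouched testing set gives the final performance estimate.
print(best, accuracy(best, test_set))
```

Keeping the testing set out of the selection step is the point of the design: the reported accuracy is then not biased by the choice of setting.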

An important part of any ML-based anomaly detection method is evaluating the performance of the ML algorithm. Classification accuracy is the most intuitive metric for this evaluation. It measures the performance of the model as the ratio of the number of correct predictions to the total number of observations. The main drawback of this metric is that it is reliable only for symmetric datasets, in which the classes are roughly balanced and false positives and false negatives are of similar magnitude [16, 18, 19].

The F1-score is another metric, suited to uneven class distributions, which balances Precision and Recall. Precision is the ratio of correctly predicted positive observations to the total number of predicted positive observations (true positives plus false positives), while Recall is the ratio of correctly predicted positive observations to the total number of observations that actually belong to the positive class (true positives plus false negatives). As a result, the F1-score measures performance by taking both false positives and false negatives into account. In multi-label ML algorithms, the F1-score is usually used to evaluate classification performance. Therefore, by maximizing the F1-score in multi-label classification, the performance of the algorithm can be considerably improved. Finally, ML is used in a wide range of cyber-physical systems because prediction and detection are the two most vital factors in the operation of these systems. Anomaly detectors can be built on ML algorithms, which could lead to secured cyber-physical systems [18, 19, 52].
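The difference between accuracy and the F1-score on an uneven class distribution can be checked with a short computation. The label vectors below are hypothetical (10 attacks among 100 samples), and returning 0.0 when a denominator is zero is a convention adopted for this sketch.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn

def scores(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 (0.0 where a ratio is undefined)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0   # TP / predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0      # TP / actual positives
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Hypothetical imbalanced test set: 10 attacks (1) followed by 90 normal samples (0).
y_true = [1] * 10 + [0] * 90

# A useless detector that never raises an alarm still reaches 90% accuracy.
never_alarm = [0] * 100
print(scores(y_true, never_alarm))  # (0.9, 0.0, 0.0, 0.0)

# A detector that finds 6 attacks, misses 4, and raises 2 false alarms.
detector = [1] * 6 + [0] * 4 + [1] * 2 + [0] * 88
acc, prec, rec, f1 = scores(y_true, detector)
print(acc, prec, rec, round(f1, 4))  # 0.94 0.75 0.6 0.6667
```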

4 Case Study

The use of ML techniques for the detection of anomalies is illustrated through the following case study. Heuristic optimization algorithms are proposed as feature selection techniques to reduce the training time of the algorithms. Since computational efficiency is one of the main concerns in the use of ML, this case study implements automated methods that reduce the dimensions of the data prior to training. This reduces the training and operating time of the ML algorithms and thereby increases computational efficiency.

In this case study, ML classifiers are used to categorize smart grid measurements as normal or malicious. A Support Vector Machine (SVM), a K-Nearest Neighbor (KNN) classifier, and a Naïve Bayesian (NB) classifier are implemented and compared in terms of classification accuracy. Each of the three classifiers is tested with three heuristic feature selection techniques: Binary Cuckoo Search (BCS), Binary Particle Swarm Optimization (BPSO), and Genetic Algorithm (GA). These feature selection methods are optimization algorithms that search for the subset of features that produces the best accuracy. The classifiers are tested with each of the resultant subsets of features and evaluated based on their accuracy and F1-score.

Two different IEEE standard power systems are used in this experiment: the IEEE 14-bus system and the IEEE 118-bus system. The measurement data consist of the power flows of branches and buses. For each power system, three sets of data were generated: a set of 1000 samples used for feature selection, a set of 40,000 samples used for training the classification algorithms, and a set of 10,000 samples used for testing and evaluation. Each set of data is divided in half into good and malicious data. The malicious data consist of measurements infected with a false data injection (FDI) attack.

Each of the classifiers, as well as each of the feature selection algorithms, has modifiable parameters that can affect the solution. Appropriate parameters must therefore be chosen to obtain good solutions. For each classifier, the parameters were chosen based on an accuracy test in which the accuracy of the classifier was evaluated at varying parameter values. Figure 10.8 shows the accuracy of the SVM with varying kernel coefficient (γ) and penalty parameter (C). Similarly, Fig. 10.9 shows the accuracy of the KNN with a varying number of neighbors. Because these tests are time-consuming, they are performed on the smallest system, the IEEE 14-bus system. Based on these results, the parameters of the classifiers are chosen. The penalty parameter and kernel coefficient of the SVM are chosen as 1000 and 0.0001, respectively, and the number of neighbors for the KNN algorithm is chosen to be 12. The Naïve Bayesian classifier, however, was trained with the default smoothing rate of 1 × 10⁻⁹.

Fig. 10.8 Accuracy of SVM

Fig. 10.9 Accuracy of KNN
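An accuracy test of this kind, in which a classifier is evaluated at varying parameter values, can be sketched for the number of neighbors of a KNN classifier. The pure-Python KNN, the synthetic two-class data, and the candidate values below are assumptions made for illustration. They are not the implementation or the data used in the case study.

```python
import math
import random
from collections import Counter

def knn_predict(train, x, k):
    """Classify x by majority vote among its k nearest training samples."""
    neighbors = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def accuracy(train, held_out, k):
    """Fraction of held-out samples classified correctly with k neighbors."""
    return sum(knn_predict(train, x, k) == y for x, y in held_out) / len(held_out)

def sweep_neighbors(train, held_out, candidates):
    """Evaluate the classifier once per candidate number of neighbors."""
    return {k: accuracy(train, held_out, k) for k in candidates}

# Synthetic stand-in data: class 0 centered at (0, 0), class 1 centered at (2, 2).
rng = random.Random(0)

def sample(label):
    center = 0.0 if label == 0 else 2.0
    return ((center + rng.gauss(0, 1.0), center + rng.gauss(0, 1.0)), label)

data = [sample(i % 2) for i in range(120)]
train, held_out = data[:90], data[90:]

results = sweep_neighbors(train, held_out, candidates=[1, 3, 5, 9, 12])
best_k = max(results, key=results.get)
print(results, best_k)
```

Each candidate requires a full pass over the held-out data, which is why such sweeps become costly on larger systems.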

Each machine learning classifier is tested with the subset of features produced by each of the feature selection algorithms. For each pair of classifier and feature selection algorithm, the performance of the classifier on a candidate subset is used as the fitness of that solution in the heuristic feature selection technique. The accuracy and F1-score of the classifiers are recorded for each of the resultant feature sets as well as without any feature selection. Furthermore, the runtime of each algorithm is recorded for analysis of computational efficiency.
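The idea of using a classifier as the fitness function of a heuristic feature selection technique can be sketched with a small genetic algorithm over binary feature masks. Everything specific in this example is an assumption: a nearest-centroid classifier stands in for the SVM, KNN, and NB classifiers, the data are synthetic, and the population size, crossover, and mutation scheme are illustrative choices rather than the settings of the case study.

```python
import random

def masked_accuracy(train, held_out, mask):
    """Fitness: accuracy of a nearest-centroid classifier using only the masked-in features."""
    cols = [i for i, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    centroids = {}
    for label in {y for _, y in train}:
        rows = [x for x, y in train if y == label]
        centroids[label] = [sum(r[c] for r in rows) / len(rows) for c in cols]

    def predict(x):
        return min(centroids,
                   key=lambda lab: sum((x[c] - m) ** 2 for c, m in zip(cols, centroids[lab])))

    return sum(predict(x) == y for x, y in held_out) / len(held_out)

def genetic_feature_selection(train, held_out, n_features,
                              pop_size=12, generations=15, seed=0):
    """Evolve binary masks; the fitter half always survives, so the best mask is never lost."""
    rng = random.Random(seed)
    fitness = lambda mask: masked_accuracy(train, held_out, mask)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    population[0] = [1] * n_features            # include the no-selection baseline
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # single-point crossover
            child = a[:cut] + b[cut:]
            flip = rng.randrange(n_features)    # mutate one bit
            child[flip] = 1 - child[flip]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Synthetic stand-in data: feature 0 depends on the label, features 1 to 3 are noise.
data_rng = random.Random(1)

def make_sample(label):
    features = [label * 2.0 + data_rng.gauss(0, 0.5)]
    features += [data_rng.gauss(0, 1.0) for _ in range(3)]
    return features, label

data = [make_sample(i % 2) for i in range(200)]
fs_train, fs_eval = data[:150], data[150:]

best_mask = genetic_feature_selection(fs_train, fs_eval, n_features=4)
print(best_mask, masked_accuracy(fs_train, fs_eval, best_mask))
```

Every fitness evaluation trains and scores the classifier once, so the cost of feature selection grows with the cost of the classifier.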

The classification accuracy, F1-score, training time, and feature selection time are recorded for each combination of algorithms in Tables 10.1 and 10.2 for the IEEE 14-bus and IEEE 118-bus systems, respectively. The results clearly demonstrate the trade-off between classification accuracy and runtime. The simpler classification algorithms, KNN and NB, resulted in a much lower runtime: their feature selection time and training time are significantly lower than those of the SVM. The complex nature of the SVM algorithm results in a significantly longer feature selection time and training time. However, the resultant accuracy and F1-score of the SVM algorithm are significantly higher. Furthermore, appropriate feature selection can significantly lower the overall runtime of the SVM, as can be seen by comparing the SVM with no feature selection to the SVM with BCS or BPSO for both power systems.

Table 10.1 Results for the IEEE 14-bus system
Table 10.2 Results for the IEEE 118-bus system

This case study demonstrates the effectiveness of ML techniques at classifying FDI attacks, which typically bypass standard bad data detection systems. Additionally, the study reveals the trade-off between computational time and performance. Furthermore, it was shown that heuristic feature selection can successfully reduce the number of features and, as a result, the training time of the classification algorithms. When combined with a computationally expensive classifier, heuristic feature selection can significantly reduce the overall runtime, thus improving the computational efficiency of that classifier. This benefit, however, was not observed for the simpler classifiers: their training time is already short, and the reduction in training time is smaller than the runtime of the feature selection algorithms. In realistic applications, with larger systems and larger datasets, the training time is expected to be significantly longer. The reduction in runtime is therefore expected to be much larger.

5 Conclusion

The main idea of cyber-physical systems is to design one integrated system instead of separate cyber and physical systems. These systems could be a promising paradigm for current and future engineered systems and could have a considerable impact on our interactions with physical components.

Security is one of the most important factors in CPSs because of the frequency of reported cyber-attacks. Although many detection methods have been proposed, new solutions are still needed against new threats and vulnerabilities. This chapter briefly introduced cyber-physical systems and their security concerns. It then presented approaches for attack detection in CPSs, in particular anomaly detection using ML, including supervised, unsupervised, reinforcement, and semi-supervised methods. Finally, a case study was given that showed the effectiveness of different ML algorithms in classifying attacks on cyber-physical systems. The results demonstrated that reducing the number of features can reduce the overall runtime of the program.