1 Introduction

Cyber-physical systems (CPS) can be defined as systems built by integrating sensors, computers, networks, communication, and other digital monitoring components into physical infrastructure to control or monitor that infrastructure remotely and autonomously [1,2,3]. Real-world examples of CPS include smart grids, medical monitoring systems, robotics, autonomous vehicles, soil treatment plants, and water treatment plants [4,5,6,7,8]. Because the operation of cyber-physical infrastructure has both cyber and physical aspects, these systems are vulnerable to both cyber and physical security threats. An attack on a CPS can have a huge impact because of the diversity and scope of the operations these systems support [6, 9,10,11,12,13]. The cyber aspect of CPS has therefore been studied in many works that report findings on detecting cyber-attacks on CPS using machine learning [12, 14,15,16,17]. Advances in machine learning and deep learning models have motivated the cybersecurity community to leverage these models to enhance the privacy and security of CPS [18,19,20,21,22,23,24]. During the past decade, several models have been proposed for a diverse range of cybersecurity tasks, including malware detection [25,26,27,28], threat hunting [29,30,31,32], and privacy protection [33].

In this paper, we use the SWaT dataset, which was collected from a Secure Water Treatment plant [34]. The data was collected on both normal operational days and a few days with attacks on the water treatment process. The dataset is processed and used to perform cyber-attack detection on CPS using different supervised machine learning algorithms. We perform a comparative analysis of four models based on the major evaluation metrics: accuracy, True Positive Rate (TPR), False Positive Rate (FPR), the Receiver Operating Characteristic (ROC) curve, and the Area Under the ROC Curve (AUC).

2 Literature Review

In earlier studies, many computer scientists have proposed approaches to cyber threat hunting problems using different machine learning techniques [35,36,37,38,39]. Cyber-attack detection is usually accomplished by grouping, using power device data or measurements [40,41,42,43,44]. The involvement of risks or attacks is measured at various security and contact levels of the network. Cyber-attacks are observed from measurements by improved state-estimation techniques using a mode-based technique [45]. Numerous studies have presented network traffic-based intrusion detection; Ghaeini et al. [46] employ this approach on the SWaT dataset used in our study. Similarly, [47] proposed an enhanced SVM approach that combines features from two machine learning techniques and demonstrated a low false-positive rate. Another paper [48] uses the Random Forest algorithm and achieves an accuracy of 94.0187% for cyber-attack detection. A behavior-based machine learning (ML) approach has been proposed for detecting any abnormal behavior or attack that may attempt to modify the behavior of the CPS [15]. This method not only recognizes that a cyber-attack occurred on a layer of the physical process but also identifies the specific attack type. The study in [49] examines how to combine different machine learning methods with an intrusion detection system (IDS) to improve the accuracy of threat identification. A prototype IDS is proposed in that study, equipped to improve accuracy in the identification of several attacks through a combination of machine learning methods. In [50], the proposed cyber-attack detection system achieves high detection accuracy and wide attack coverage, detecting unrecognized attacks using network and host system information.

3 Methodology

This section describes the process followed to build our supervised machine learning models, which detect cyber-attack samples in the SWaT dataset.

3.1 Dataset Processing

The SWaT dataset consists of 77 features and a total of 14,995 data points: 9521 normal and 5474 attack data points. A few features, such as the timestamp and other less critical features, were removed when processing the dataset. The label feature (Target) was set to 1 for attack data points and 0 for normal-activity data points.
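The labeling step described above could be sketched as follows. This is only a minimal illustration, assuming the data is held in a pandas DataFrame; the column names `Timestamp` and `Normal/Attack`, the sensor columns, and the three-row sample are hypothetical placeholders and are not taken from the paper.

```python
import pandas as pd


def preprocess(df, label_col="Normal/Attack", drop_cols=("Timestamp",)):
    """Drop non-informative columns and encode the label as 1 (attack) / 0 (normal)."""
    # Remove columns such as the timestamp (names here are assumptions).
    df = df.drop(columns=[c for c in drop_cols if c in df.columns])
    # Normalise the label strings before mapping them to integers.
    labels = df[label_col].astype(str).str.strip().str.lower()
    df = df.drop(columns=[label_col])
    df["Target"] = (labels == "attack").astype(int)
    return df


# Hypothetical three-row sample standing in for the real dataset.
sample = pd.DataFrame({
    "Timestamp": ["t0", "t1", "t2"],
    "FIT101": [2.4, 2.5, 0.0],
    "LIT101": [522.8, 523.1, 810.0],
    "Normal/Attack": ["Normal", "Normal", "Attack"],
})
processed = preprocess(sample)
```

Which "less critical" features were dropped is not stated in the text, so only the timestamp is removed in this sketch.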

3.1.1 Feature Selection

To reduce the overall dimensionality of the dataset, we performed feature selection. The features that contribute most to the target variable were identified by combining the results of the ExtraTreeClassifier and SelectKBest algorithms of the Scikit-learn library, which are shown in Fig. 1a, b, respectively.

Fig. 1 (a) ExtraTreeClassifier result (b) SelectKBest result

The most common and highest-ranked features were extracted and used for all four classification models. As shown in Table 1, the major operations of the water treatment plant were used as the major feature categories, and the same categories of features were used to identify the functionality of the water plant at different process levels.

Table 1 Feature category
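One possible way to combine the two rankings in this subsection is sketched below. The paper does not say how the results were combined, so several choices here are assumptions: taking the intersection of the two top-k sets, using the `f_classif` score function, using the ensemble `ExtraTreesClassifier` from `sklearn.ensemble`, the value of `k`, and the synthetic data from `make_classification`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, f_classif


def common_top_features(X, y, k=12, random_state=0):
    """Return sorted indices of features ranked in the top k by both methods."""
    # Tree-based ranking: impurity-based feature importances.
    forest = ExtraTreesClassifier(n_estimators=100, random_state=random_state)
    forest.fit(X, y)
    top_forest = set(np.argsort(forest.feature_importances_)[-k:])
    # Univariate ranking: ANOVA F-score (the score function is an assumption).
    selector = SelectKBest(score_func=f_classif, k=k).fit(X, y)
    top_kbest = set(np.flatnonzero(selector.get_support()))
    # Keep only the features that both methods rank highly.
    return sorted(int(i) for i in top_forest & top_kbest)


# Synthetic stand-in for the processed dataset (not real plant data).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)
selected = common_top_features(X, y, k=12)
```

Taking the intersection keeps only features that both rankings agree on; a union or a rank average would be equally plausible readings of "combining".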

3.2 Machine Learning Classifiers

For the detection of cyber-attack samples, KNN, SVM, Decision Tree (DT), and Random Forest (RF) classifiers were trained and tested on the transformed dataset, and the results were recorded for comparative analysis.

3.2.1 KNN Model

For the KNN model, we used the processed dataset described in Sect. 3.1. KNN was implemented with the Scikit-learn library. We initially set K = 4, but after trial and error K was finally set to 1, and the model was trained with K = 1 on the processed dataset.
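A trial-and-error search over K could be organized as in the sketch below. The paper does not describe its search procedure, so the candidate values, the 5-fold cross-validation, and the synthetic data are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def score_k_values(X, y, k_values=(1, 2, 3, 4, 5), folds=5):
    """Map each candidate K to its mean cross-validated accuracy."""
    return {
        k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=folds).mean()
        for k in k_values
    }


# Synthetic stand-in for the processed dataset (not real plant data).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
scores = score_k_values(X, y)
best_k = max(scores, key=scores.get)  # K with the highest mean accuracy
```

On real data, the value of `best_k` depends on the dataset and the validation scheme, so this sketch should not be expected to reproduce K = 1.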

3.2.2 SVM Model

The SVM model was trained and tested on the processed dataset. For the SVM, the kernel function was set to linear, and the probability option was set to True.

3.2.3 DT Model

Our DT model was trained on the processed dataset. The DT model builds an inverted tree structure from the training data and then classifies a sample by tracing down the built tree.

3.2.4 RF Model

The RF model is similar to the DT model, but it builds multiple decision trees instead of a single one. In our RF model, the maximum depth was set to 2.
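The four classifier configurations described in this section could be instantiated with Scikit-learn roughly as follows. Only K = 1, the linear kernel with `probability=True`, and the maximum depth of 2 come from the text; the 70/30 stratified split, the `random_state` values, all remaining defaults, and the synthetic data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(random_state=0):
    """Create the four classifiers with the hyperparameters stated in the text."""
    return {
        "KNN": KNeighborsClassifier(n_neighbors=1),
        "SVM": SVC(kernel="linear", probability=True, random_state=random_state),
        "DT": DecisionTreeClassifier(random_state=random_state),
        "RF": RandomForestClassifier(max_depth=2, random_state=random_state),
    }


# Synthetic stand-in for the processed dataset (not real plant data).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Fit each model and record its accuracy on the held-out split.
accuracies = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in build_models().items()
}
```

Setting `probability=True` lets the SVM expose `predict_proba`, which is convenient when scores are needed for a ROC curve.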

4 Results and Discussion

This section presents the results achieved with the different supervised machine learning techniques in detecting cyber-attacks on a CPS and describes the comparative analysis.

4.1 Evaluation Measures

To evaluate and compare the performance of the models, we used common evaluation metrics. Table 2 describes the evaluation metrics used for the comparative analysis.

Table 2 Description of evaluation metrics used for comparative analysis
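For reference, accuracy, TPR, and FPR can be computed from confusion-matrix counts using their standard definitions: TPR = TP / (TP + FN), FPR = FP / (FP + TN), and accuracy = (TP + TN) / N. The plain-Python sketch below illustrates this; the function names and the example labels are hypothetical and are not results from the paper.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating 1 as the attack class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def evaluate(y_true, y_pred):
    """Return accuracy, TPR and FPR as fractions in [0, 1]."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "tpr": tp / (tp + fn) if (tp + fn) else 0.0,
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Hypothetical labels: 4 attack and 4 normal samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = evaluate(y_true, y_pred)
```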

4.2 Experiment and Results

The processed dataset, with a total of 14,994 samples and the selected features, was used to test all the models. All the models were trained on the processed dataset, and the results observed on the basis of the evaluation metrics (Table 2) are shown in Table 3.

Table 3 Observed accuracy, TPR and FPR values

4.3 Comparison of Models

In our experiment, the KNN model achieved an accuracy of 99%, with a TPR of 99.9% and an FPR of approximately 0%. The SVM model achieved an accuracy of 98.7%, with average TPR and FPR values of 99% and 0.01%, respectively. Our DT model achieved 99% accuracy on the processed dataset, with approximately 99% TPR and 0% FPR. The RF model reached an accuracy of 96%, with 98% TPR and 0.01% FPR. According to the three evaluation metric values reported in Table 3, the DT model performed more effectively than the other supervised machine learning models in classifying the cyber-attack samples in the SWaT dataset.

4.4 ROC Curve

A ROC curve is a common graphical tool for evaluating the performance of machine learning classifiers. It allows us to analyze a binary classifier’s capability of distinguishing between classes [51]. It is a plot of TPR against FPR at different threshold settings.
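To make the threshold sweep concrete, the sketch below builds ROC points by thresholding classifier scores at each distinct value and integrates them with the trapezoidal rule to obtain the AUC. It is a plain-Python illustration, not the code used in the paper; the function names and the four-sample example are hypothetical, and both classes are assumed to be present.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by thresholding at each distinct score."""
    pos = sum(y_true)            # number of attack samples (label 1)
    neg = len(y_true) - pos      # number of normal samples (label 0)
    points = [(0.0, 0.0)]
    # Lowering the threshold step by step predicts more samples as attacks.
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Hypothetical scores for two normal and two attack samples.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
curve = roc_points(y_true, scores)
area = auc(curve)
```

An AUC of 1 corresponds to a classifier that ranks every attack sample above every normal sample, while 0.5 corresponds to random ranking.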

The ROC curves for the classification of cyber-attack samples in the processed dataset, for all four supervised classification models, are shown in Fig. 2. As shown in the legend, the AUC value was 0.99 for KNN, 0.99 for DT, 0.84 for SVM, and 0.99 for RF. The AUC values for KNN, DT, and RF were extremely close to 1, which indicates that these models perform well for binary classification of cyber-attacks in the processed dataset, while the SVM value of 0.84 was lower. The AUC values for KNN and DT were almost equal to 1, at 0.999 for both.

Fig. 2 ROC curve for all four supervised machine learning models

5 Conclusion

We successfully designed four different machine learning models to classify the cyber-attack samples in the SWaT dataset accurately. The results obtained with the evaluation metrics allowed us to perform an effective comparative analysis and propose the most suitable algorithm. Using these four supervised machine learning algorithms, we achieved an overall accuracy of 99% with KNN, 98% with SVM, 99% with DT, and 96% with RF. Based on all the evaluation metrics, DT outperforms the other classifier models, with a reasonably high accuracy of 99.9% and otherwise almost ideal evaluation metrics.

Future work will evaluate other supervised machine learning algorithms and experiment with different cyber-physical system datasets.