1 Introduction

Critical infrastructure, such as water treatment systems and power grids, includes an Industrial Control System (ICS) that controls the underlying physical process using sensors and actuators [6, 25]. A Supervisory Control and Data Acquisition (SCADA) system is an integral part of an ICS. Moreover, such critical infrastructure is also a Cyber Physical System (CPS) that includes cyber and physical components. Increased connectivity through communications networks among the ICS components, and possibly through the Internet, exposes such a CPS to a range of cyber threats [3, 10, 23, 24, 35].

Table 1. Related work.

A cyber or physical attack on an ICS will likely result in anomalous process behavior. In general, approaches for anomaly-based intrusion detection can be categorized as based on rules, statistics, or computational intelligence. Among these, computational intelligence based approaches have gained the attention of researchers because the other approaches require a detailed understanding of the process flow, physical laws, and configuration of components in the CPS [2, 14, 18]. Moreover, applying machine learning algorithms to anomaly detection is found to be fast and relatively easy, since the behaviour and process flow of the entire CPS can be learned with reasonable accuracy from multivariate historical data [27]. A summary of research on computational intelligence based anomaly detection approaches is given in Table 1.

This work describes a study wherein the Probabilistic Neural Network (PNN) framework is selected as a modeling approach for the design of an anomaly detector. Competing approaches include Convolutional Neural Network (CNN), Deep Neural Network (DNN), Naive Bayes (NB), One-Class Support Vector Machine (O-SVM), Random Forest (RF), Recurrent Neural Network (RNN), Long Short Term Memory (LSTM), Deep Autoencoders, and others. PNNs are unique in mapping the input variables to class labels using a Bayesian strategy [12, 17, 21, 31]. Unlike other variants of neural networks, PNN is robust, faster, mostly independent of parameters, and able to handle imbalanced datasets, which is a key reason for exploring it in this work. PNN has been used effectively for the design of anomaly detectors in various applications [9, 13, 32, 33]. However, to the best of our knowledge, this is the first work to employ PNN for anomaly detection in an ICS, especially in the operational SWaT plant.

Novelty and Contributions: (a) A PNN-based anomaly detector for critical infrastructure, and (b) Validation of the performance of the PNN-based anomaly detector using live data from an operational CPS, namely, SWaT [22].

Organization: This paper is structured as follows. An introduction to PNN is in Sect. 2. An experimental assessment of the effectiveness of a PNN-based anomaly detector in detecting anomalies resulting from cyber attacks is in Sect. 3. That section contains a description of the architecture of the testbed and the dataset used in the evaluation, the impact of the smoothening parameter on the performance of PNN, and a detailed comparison with seven other neural network based methods. Conclusions from this work are in Sect. 4.

2 PNN-Based Anomaly Detector

In this section, we provide a detailed insight into the application of PNN for the design of an anomaly detector for CPS. In general, any data driven anomaly detector designed for CPS should be fast, reliable, scalable, and sensitive to noisy data generated by the heterogeneous physical and control components, because the CPS environment is dynamic and operates in real time, and the sensor data are often generated at high frequency [27]. Further, the ability to predict anomalies in unknown samples based on a similar set of samples in the training dataset is an important criterion for assessing the performance of a data driven anomaly detector [9, 30]. These requirements of an anomaly detector for critical infrastructure led to the choice of PNN in this work.

Fig. 1. Probabilistic neural network.

As shown in Fig. 1, a PNN is comprised of artificial neurons arranged in four layers as detailed below.

  1. Input layer: Passes the unknown sample \(X_s\) to the pattern layer without any computation.

  2. Pattern layer: The number of neurons in this layer equals the number of training samples. Each neuron corresponds to one training sample, and its output is defined in Eq. 1.

    $$\begin{aligned} y_k^i=\exp \left[ \frac{-|X_s-x_k^i|^2}{2\sigma ^2}\right] \end{aligned}$$
    (1)

    where \(x_k^i\) is the \(k^{th}\) training sample of the \(i^{th}\) class and \(\sigma \) is the smoothening parameter.

  3. Summation layer: The average of the pattern layer outputs that belong to the same class is computed using Eq. 2.

    $$\begin{aligned} S_i =\frac{1}{n} \sum _{k=1}^n \exp \left[ \frac{-|X_s-x_k^i|^2}{2\sigma ^2}\right] \end{aligned}$$
    (2)

    where \(n\) is the number of training samples that belong to the \(i^{th}\) class.

  4. Output layer: The output layer consists of one neuron that decides the class of the unknown sample using Eq. 3.

    $$\begin{aligned} C=\mathop {\mathrm {arg\,max}}\limits _{i}\, S_i, \quad i=1, 2, \ldots , C_n \end{aligned}$$
    (3)

    where \(C_n\) is the number of classes.

Given the conditional attribute (x), decisional attributes (Y), classes in the training set (C), and smoothening factor (\(\sigma \)), PNN computes the class of the unknown sample [26, 30].
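The four layers described by Eqs. 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the experiments: the function name `pnn_classify`, the dictionary layout of the training set, and the toy two-feature samples are all assumptions introduced here.

```python
import math


def pnn_classify(x_s, train_by_class, sigma):
    """Return (predicted_label, scores) for one unknown sample x_s.

    train_by_class is a hypothetical layout: it maps each class label to
    the list of training vectors that belong to that class.
    """
    scores = {}
    for label, samples in train_by_class.items():
        # Pattern layer (Eq. 1): one Gaussian kernel output per training sample.
        outputs = [
            math.exp(-sum((a - b) ** 2 for a, b in zip(x_s, x)) / (2.0 * sigma ** 2))
            for x in samples
        ]
        # Summation layer (Eq. 2): average of the outputs within the class.
        scores[label] = sum(outputs) / len(outputs)
    # Output layer (Eq. 3): the class with the largest summation-layer score.
    return max(scores, key=scores.get), scores


# Toy data, made up for illustration (not SWaT records).
train = {
    "normal": [(0.10, 0.20), (0.15, 0.25), (0.20, 0.15)],
    "attack": [(0.80, 0.90), (0.85, 0.80)],
}
label, scores = pnn_classify((0.82, 0.88), train, sigma=0.2)
print(label, scores)
```

Because the pattern layer holds one neuron per training sample, there is no iterative training step in this sketch; the cost of classifying one sample grows with the size of the training set.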

3 Experimental Evaluation

The PNN-based approach proposed in this work was evaluated using the dataset obtained from the SWaT testbed. The architecture of SWaT, a summary of the dataset, and the data preprocessing techniques can be found in [18]. To demonstrate the effectiveness of the proposed anomaly detector, the PNN-based anomaly detector was compared with existing machine learning models in terms of classification accuracy, precision, detection rate, F-Score, and false alarm rate. The models used for the comparison include Naive Bayes (NB), Support Vector Machine (SVM), Random Forest (RF), and Multi Layer Perceptron (MLP).
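The five metrics named above are not defined in this section. The sketch below assumes the usual confusion-matrix definitions, with "attack" as the positive class, detection rate taken as recall, and false alarm rate taken as the fraction of normal records flagged as attacks. The function name `detection_metrics` and the example labels are hypothetical.

```python
def detection_metrics(y_true, y_pred, positive="attack"):
    """Compute the five metrics from true and predicted labels.

    Assumes the standard definitions; the paper may use variants.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Guard against empty denominators (e.g. no predicted attacks).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    detection_rate = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "detection_rate": detection_rate,
        "f_score": ratio(2 * precision * detection_rate, precision + detection_rate),
        "false_alarm_rate": ratio(fp, fp + tn),
    }


# Made-up labels: 4 attack records (one missed), 6 normal records (one false alarm).
y_true = ["attack"] * 4 + ["normal"] * 6
y_pred = ["attack"] * 3 + ["normal"] + ["normal"] * 5 + ["attack"]
metrics = detection_metrics(y_true, y_pred)
print(metrics)
```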

Fig. 2. Stages P1 through P6 in SWaT. AITxxx: chemical property meters, FITxxx: flow rate meters, LITxxx: level sensors; Pxxx: pumps.

3.1 SWaT Architecture

SWaT is a fully operational small footprint water treatment plant at the Singapore University of Technology and Design (SUTD). Details of SWaT are available in [22].

SWaT consists of six stages (P1-P6) as shown in Fig. 2. Each stage comprises a combination of physical and control components for processing raw water. Each stage is equipped with sensors to measure flow rate, water level in tanks, chemical properties of water, etc., and actuators such as pumps and valves. The cyber part of SWaT consists of a two layered communications network with Programmable Logic Controllers (PLCs), a SCADA workstation, Human Machine Interfaces (HMIs), and a historian. The Level 0 network in the testbed consists of a ring for each stage, through which all sensors and actuators send measurements to, and receive commands from, the corresponding PLCs via wired and wireless links. Similarly, the Level 1 network consists of a star architecture that enables communications between the SCADA workstation and the PLCs.

Table 2. Attacks considered in experiments.

3.2 SWaT Dataset

For data collection, the entire plant was operated for 11 days. For the first 7 days, the plant was operated in normal mode. For the remaining 4 days, attacks were launched by spoofing sensor values, issuing fake commands, etc. Attack timings, targets, expected outcomes, and effects are available in [11].

During the 11 days of data collection, a total of 946,723 labelled records were collected from the historian. Each record consists of 51 attributes corresponding to the individual sensor values. Note that selecting all 946,723 instances for the experiment would bias the PNN towards the ‘normal’ class, since normal instances dominate the instances related to the attacks. Considering only the 449,921 instances recorded under the attack scenario reduces this dominance and hence mitigates the imbalance in the dataset. Therefore, a total of 449,921 records collected from 28th Dec 2015 to 2nd Jan 2016 were used for experimentation.

During the last four days of data collection, a total of ten attacks, referred to as A1-A10 [11], were launched by injecting fake sensor values into the PLCs (Table 2). For each attack, two different subsets of the entire dataset were created using ‘random sampling without replacement’ to train and validate the learning model. The attacks can be categorized as: (i) Single Stage Single Point attack (SSSP), (ii) Single Stage Multi Point attack (SSMP), (iii) Multi Stage Single Point attack (MSSP), and (iv) Multi Stage Multi Point attack (MSMP). Attack duration varies with the nature of the attack. For example, the durations of attack A1, which targets MV101, and attack A9, which targets chemical sensors AIT402 and AIT502, are 539 and 251 s, respectively.
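Creating training and validation subsets by random sampling without replacement can be sketched as follows. The split fraction, the fixed seed, and the choice to make the two subsets disjoint are assumptions made for this illustration; the text does not state them. The function name `split_without_replacement` is hypothetical.

```python
import random


def split_without_replacement(records, train_fraction, seed):
    """Draw a training subset without replacement; the rest is for validation.

    train_fraction and seed are illustrative parameters, not values
    reported in the paper.
    """
    rng = random.Random(seed)
    n_train = round(train_fraction * len(records))
    # random.sample never picks the same index twice.
    train_idx = set(rng.sample(range(len(records)), n_train))
    train = [r for i, r in enumerate(records) if i in train_idx]
    valid = [r for i, r in enumerate(records) if i not in train_idx]
    return train, valid


# Made-up records standing in for labelled historian rows.
records = [("record", i) for i in range(20)]
train_set, valid_set = split_without_replacement(records, train_fraction=0.75, seed=7)
print(len(train_set), len(valid_set))
```

Fixing the seed makes the split reproducible, which helps when the same subsets must be reused across several classifiers.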

3.3 Results and Discussion

Data was collected from the experiments and analyzed. Results from the analysis are presented next.

Impact of Smoothening Parameter: Note from Eq. 1 that \(\sigma \) is the single tunable parameter of the PNN. It determines the width of the kernel in the pattern layer, which in turn has a significant impact on the performance of the PNN. Since a suitable value of the smoothening parameter depends on the characteristics of the input data, it is important to analyze its impact on the performance of the detector. Therefore, experiments were conducted by varying \(\sigma \) in the range [0.1, 0.9] in steps of 0.1. For each experiment, the average values of the considered performance metrics were computed. The corresponding plots are given in Figs. 3, 4, 5 and 6. From the plots, it is evident that the best values of the considered performance metrics are obtained for \(\sigma \) in [0.1, 0.3].
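A sweep of this kind can be sketched as below. This is an illustration under stated assumptions, not the experimental procedure: it uses made-up toy data, scores each value of \(\sigma \) by classification accuracy only, and the helper names `pnn_predict` and `accuracy` are hypothetical. The classifier follows Eqs. 1 to 3.

```python
import math


def pnn_predict(x_s, train_by_class, sigma):
    """Class with the largest average Gaussian kernel response (Eqs. 1 to 3)."""
    def score(samples):
        return sum(
            math.exp(-sum((a - b) ** 2 for a, b in zip(x_s, x)) / (2.0 * sigma ** 2))
            for x in samples
        ) / len(samples)
    return max(train_by_class, key=lambda label: score(train_by_class[label]))


def accuracy(train_by_class, validation, sigma):
    """Fraction of validation samples whose predicted class is correct."""
    hits = sum(pnn_predict(x, train_by_class, sigma) == y for x, y in validation)
    return hits / len(validation)


# Toy data, made up for illustration (not SWaT records).
train = {
    "normal": [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15), (0.25, 0.20)],
    "attack": [(0.80, 0.90), (0.90, 0.80)],
}
validation = [
    ((0.12, 0.18), "normal"),
    ((0.30, 0.30), "normal"),
    ((0.85, 0.85), "attack"),
    ((0.60, 0.65), "attack"),
]

# Sigma from 0.1 to 0.9 in steps of 0.1, as in the range described above.
sigmas = [round(0.1 * k, 1) for k in range(1, 10)]
sweep = {s: accuracy(train, validation, s) for s in sigmas}
best_sigma = max(sweep, key=sweep.get)
for s in sigmas:
    print(f"sigma={s:.1f} accuracy={sweep[s]:.2f}")
```

The same loop can record the other metrics (F-score, detection rate, false alarm rate) for each \(\sigma \) to produce curves comparable to those in Figs. 3 to 6.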

Fig. 3. Smoothening parameter vs. Classification accuracy

Fig. 4. Smoothening parameter vs. F-score

Fig. 5. Smoothening parameter vs. Detection rate

Fig. 6. Smoothening parameter vs. FAR

Table 3. Performance analysis of all classifiers for Attacks 1-10

Analysis of data from the experiments indicates that identifying multiple optimal values of \(\sigma \) for detecting the various anomalies in the process flow of SWaT might further enhance the performance of the PNN-based anomaly detector. Accordingly, designing the PNN-based anomaly detector with multiple \(\sigma \) values, together with accurate modeling of the physical process of SWaT, resulted in a high detection rate and a minimal false alarm rate.

Performance Analysis: The performance of PNN was compared with the machine learning techniques mentioned earlier. The results of the comparison are summarized in Table 3. The best value of each metric is highlighted in bold. From the table, it can be noted that PNN outperforms the existing machine learning techniques in terms of all quality metrics except in a few cases. For example, for attack 1 and attack 3, the Naive Bayes and SVM classifiers attain a false alarm rate of 0%, which is lower than that of PNN.

From the above set of experimental results, two observations about data driven anomaly detectors emerge: (i) PNN exhibits ideal classifier behaviour for attacks 2, 4, 7, 8, and 10, and (ii) the performance of classifiers varies with the nature of the attack, e.g., MLP performs better for attacks 1, 2, 4, 5, 7, 8, 9, and 10 than for the rest.

Lastly, the performance of the PNN-based anomaly detector relative to the existing machine learning techniques was analyzed in terms of their respective fault detection ability. In general, attacks 6, 7, 8, and 9 were found to be more difficult to detect as they target multiple sensors across multiple stages. However, PNN achieves a 100% detection rate and a 0% false alarm rate for attacks 7 and 9. A near optimal outcome was achieved for attacks 6 and 8. This ability of the PNN-based anomaly detector is attributed to the proper tuning of the smoothening parameter (\(\sigma \)).

To summarize, PNN and the other machine learning techniques considered either detect an attack during the initial stage of its occurrence or leave it undetected. This behaviour of data driven models is preferred over that of existing anomaly detection models, because data driven models do not wait for the behaviour of the CPS to exceed a pre-specified threshold before identifying an attack, and therefore achieve a high detection rate and a low false alarm rate [18]. However, they perform worst on attacks of short duration, since such attacks are left unidentified.

4 Conclusions

A SCADA specific PNN-based anomaly detector is presented. The detector uses a supervised approach to detect anomalies possibly resulting from attacks targeted at a CPS. The novelty of the proposed detector lies in its ability to identify anomalies resulting from single- and multi-stage attacks. Experimental validation on the dataset obtained from SWaT demonstrates the advantage of the PNN-based anomaly detector over the existing machine learning techniques in terms of various quality metrics. Also analysed in this study was the impact of the smoothening parameter on the performance of the PNN-based anomaly detector.

The proposed PNN-based anomaly detector uses a supervised approach and therefore needs training with both attack and normal signatures. However, in an operational plant, especially when appropriate attack patterns are unavailable, one may employ the supervised learning model in [1] for efficient anomaly detection. In the case of an imbalanced dataset, the training samples, along with the smoothening parameter, play a vital role in determining the performance of PNN. Unlike traditional RNN models, PNN does not rely on temporal dependencies among the samples. Hence, applying properties such as hypergraph coarsening, dual hypergraph, etc., to identify informative samples aids in improving the performance of PNN in detecting short term attacks [5]. Further, the analysis and implementation of PNN variants such as heteroscedastic PNN, weighted PNN, arithmetic residue PNN, etc., for efficient anomaly detection in a CPS is a potential challenge that merits further attention.