
1 Introduction

In recent years, with the continuous expansion of the power grid and the integration of diverse energy sources, the traditional centrally controlled grid can no longer meet the needs of today's society. To address the challenges faced by the existing power grid, the concept of the smart grid has emerged in the power industry [1]. The smart grid is regarded as the infrastructure of the modern power grid: through automatic control, high-power converters, modern communication, sensor and metering technology, modern energy management, and other technologies, it optimizes the energy network according to demand, thereby improving efficiency and reliability [2].

As one of the cores of the smart grid, the measurement automation system undertakes the important functions of data acquisition and analysis [3]. As the application of measurement automation systems deepens, measurement automation terminals increasingly rely on wireless communication to connect to the smart grid [4]. The wireless channel carries important business information such as remote control, telemetry, and remote signaling. Wireless attacks can tamper with or leak this information, causing grid control to lose accuracy and credibility, triggering cascading grid failures or equipment damage, and leading to economic losses and risks to personal and social safety [5]. The complex network environment and ever-evolving attack methods also pose many challenges to the measurement automation system, and the related security defense issues have gradually become a research hotspot.

An Intrusion Detection System (IDS) is a security management system used to detect network intrusions. It monitors network traffic in real time and detects intrusive actions without affecting the internal network [6], and it can take measures such as monitoring, analysis, and early warning to improve the network's ability to respond to external threats. Anomaly-based IDS is a behavior-based detection technology that detects unknown attacks by checking whether actual network behavior deviates from normal behavior. Much machine learning research has produced intelligent intrusion detection techniques with good results, such as support vector machines, artificial neural networks, and genetic algorithms [7,8,9]. However, because the measurement automation system must consider the complexity of wireless communication combined with power-service detection, traditional intrusion detection models often suffer from the following problems: (1) The collected network traffic data is high-dimensional; manual feature selection is neither effective nor well-founded, so important features may be lost while redundant features are retained. (2) Poor adaptability: as the network operating environment and structure change, the model must be continuously updated to detect new and unknown attacks. (3) Limited fitting ability: traditional machine learning models have simple structures and limited feature extraction and learning capabilities, so when faced with large-scale data sets they cannot form an effective nonlinear mapping of the data distribution.

Therefore, in response to the problems of traditional methods, this paper designs and implements a learning method that can automatically extract and analyze intrusion features, and proposes an intrusion detection model based on a stacked denoising convolutional autoencoder. The traffic data is processed into two-dimensional grayscale images as input. The denoising autoencoder is combined with the Convolutional Neural Network (CNN): the convolutional structure of the CNN is used to fully learn the network features, the model is trained by minimizing the reconstruction error, and the adaptive Adam algorithm is used to optimize the network, so that it learns the intrusion characteristics as thoroughly as possible and achieves better recognition performance.

2 Related Work

Many machine learning algorithms have been applied to intrusion detection, using different techniques to reduce the false alarm rate and detect abnormal network behavior. Shon et al. [10] proposed an intrusion detection classifier that combines a genetic algorithm with a support vector machine (SVM), which adapts well to real-environment data sets; Zhao [11] proposed a Least Squares Support Vector Machine (LSSVM) model for network intrusion detection; Hussain et al. [12] proposed a two-stage hybrid classification method in which SVM is used for anomaly detection in the first stage and an artificial neural network is used for misuse detection in the second stage. Traditional machine learning methods are effective for intrusion detection and achieve reasonable accuracy, but they cannot adjust the preprocessing and feature extraction parameters autonomously. They require expert involvement to complete the learning and classification goals, and model performance depends heavily on the quality of parameter tuning and feature selection.

To solve the above problems, researchers have introduced deep learning. In recent years, deep learning has been widely used in speech recognition, image recognition, and natural language processing. Since deep learning can automatically extract features from raw data and can effectively process large, complex, and high-dimensional network data, many scholars have begun to apply deep learning methods to intrusion detection, with good results. Erfani et al. [13] proposed a hybrid model that combines a deep belief network (DBN) with a one-class SVM: a restricted Boltzmann machine (RBM) is first used to eliminate the negative effects of noise and abnormal data, and the one-class SVM then performs the classification task. Staudemeyer [14] first pointed out that an LSTM recurrent neural network can be used for intrusion detection; LSTM can look back in time and discover temporal associations. Javaid et al. [15] combined the encoding layer of a sparse autoencoder (for feature extraction) with the softmax function (for class probability estimation) and designed a "self-taught learning" classification mechanism for NSL-KDD. Khan et al. [16] used the CNN-based residual network (ResNet) and GoogleNet models for malware detection. Although deep learning based methods improve sample recognition ability and performance, they are prone to overfitting during training, have many parameters and long training times, and their detection accuracy and efficiency still need further improvement.

3 Intrusion Detection Model of Measurement Automation System

The model proposed in this paper is an intrusion detection model based on a stacked denoising convolutional autoencoder neural network. The overall framework of the model is shown in Fig. 1.

Fig. 1. Intrusion detection model of measurement automation system

3.1 The Overall Architecture of the Intrusion Detection Model

As shown in Fig. 1, the model performs intrusion detection for the measurement automation system through the following main steps:

Data Acquisition. A measurement automation system environment is built, and real-time network traffic data is obtained by monitoring and recording network traffic, including the source address, destination address, connection attributes, and other related information.

Data Preprocessing. The data is processed into a structured, machine-processable format. First, character attributes are mapped to numerical attributes through one-hot encoding; then the data is normalized to the [0,1] interval to eliminate the influence of the differing scales of the network-connection features on the training of the intrusion detection model; finally, each record is mapped to a two-dimensional grayscale image.

Model Building and Training. A stacked denoising convolutional autoencoder model is built to extract and analyze features; it is pre-trained and its parameters are tuned on the standard data set to extract the features of the standard data as well as possible. The input layer of the model takes the two-dimensional grayscale image format; the hidden layers consist of the encoding and decoding convolutional layers, pooling layers, and fully connected layers. The convolutional layers use the ReLU activation function to learn feature information autonomously, the fully connected layers use Dropout to prevent overfitting, and the output layer uses a softmax classifier to output the classification decision.

3.2 Stacked Denoising Convolutional Autoencoding Network

The convolutional denoising autoencoder combines the convolution and pooling operations of the convolutional neural network with the autoencoder, so as to extract features, better handle the redundancy and distortion of the various data in the measurement automation system, and effectively improve the detection rate. The stacked denoising convolutional autoencoder network designed in this paper first constructs an improved convolutional denoising autoencoder in which the data passes through two convolution operations before each pooling operation, strengthening the feature learning ability of the network. Two such convolutional denoising autoencoders are then stacked: the first retains its complete structure, while the second retains only the encoding part. The output of the second convolutional denoising autoencoder is fed to two fully connected layers, and the recognition result is finally obtained through the softmax output layer. The overall network structure is shown in Fig. 2.

Fig. 2. Overall architecture of stacked denoising convolutional autoencoding network
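For concreteness, the following Keras sketch mirrors the encoder path just described: two convolutions followed by one pooling in each autoencoder block, two stacked blocks, two fully connected layers with Dropout, and a softmax output. The filter counts, kernel sizes, 12 × 12 input size, and Dropout rate are illustrative assumptions rather than the exact settings of this paper (those are listed in Table 2); in the full method, the convolutional blocks would first be pre-trained as denoising autoencoders as described below.

```python
# Illustrative encoder-side architecture of the stacked denoising convolutional
# autoencoder classifier. Filter counts, kernel sizes, the 12x12 input size and
# the Dropout rate are assumptions (the paper's values are given in Table 2).
import tensorflow as tf
from tensorflow.keras import layers, models

def build_classifier(input_shape=(12, 12, 1), num_classes=5, drop_rate=0.5):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Encoder of the first convolutional denoising autoencoder:
        # two convolutions followed by one max-pooling
        layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
        layers.Conv2D(32, (3, 3), padding='same', activation='relu'),
        layers.MaxPooling2D((2, 2)),
        # Encoder of the second (stacked) convolutional denoising autoencoder
        layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
        layers.Conv2D(64, (3, 3), padding='same', activation='relu'),
        layers.MaxPooling2D((2, 2)),
        # Two fully connected layers with Dropout, then the softmax decision layer
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        layers.Dropout(drop_rate),
        layers.Dense(64, activation='relu'),
        layers.Dropout(drop_rate),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```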

Convolutional Autoencoding Network. The detailed encoding and decoding process is derived as follows:

Encoding Process:

The output of the convolutional layer can be expressed as:

$$ h_{1} = f\left( {x \otimes W^{\prime}_{11} + b^{\prime}_{11} } \right) $$
(1)

Where \( x \) represents the input feature vector, \( \otimes \) denotes the convolution operation, \( W^{\prime}_{11} \) represents the first-layer weight, \( b^{\prime}_{11} \) represents the first-layer bias, and \( f \) is a nonlinear activation function such as Sigmoid, Tanh, or ReLU. Compared with other activation functions, ReLU makes the network converge faster and reduces training time; therefore, this paper adopts the ReLU activation function:

$$ f(x)_{\mathrm{ReLU}} = \begin{cases} 0, & x \le 0 \\ x, & x > 0 \end{cases} $$
(2)

The output of the pooling layer can be expressed as:

$$ h_{2} = pool\left( {h_{1} } \right) = f\left( {down\left( {h_{1} } \right) + b^{\prime}_{11} } \right) $$
(3)

Where \( pool \) represents the pooling operation and \( down\left( \bullet \right) \) represents downsampling; this paper uses max pooling to reduce the information redundancy caused by the convolution operation.

Decoding Process:

$$ h^{\prime}_{2} = f\left( {h_{2} \otimes W^{\prime}_{22} + b^{\prime}_{22} } \right) $$
(4)
$$ h^{\prime}_{1} = upsample\left( {h^{\prime}_{2} } \right) = f\left( {up\left( {h^{\prime}_{2} } \right) + b^{\prime}_{22} } \right) $$
(5)
$$ x^{\prime} = f\left( {h^{\prime}_{1} \otimes W^{\prime}_{21} + b^{\prime}_{21} } \right) $$
(6)

Where \( x^{\prime} \) is the reconstruction of \( x \), \( W^{\prime}_{22} \) and \( b^{\prime}_{22} \) are the weight and bias of the first convolutional layer in decoding, \( h^{\prime}_{2} \) is the decoded convolution output, \( upsample \) is the upsampling (unpooling) layer with \( up\left( \bullet \right) \) the raw upsampling operation, \( h^{\prime}_{1} \) is the decoded upsampled output, and \( f \) is the decoding activation function.
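A minimal sketch of a single convolutional denoising autoencoder corresponding to Eqs. (1)–(6) is given below: noise is added to the input, the encoder applies convolution and max-pooling, and the decoder applies convolution and upsampling to produce the reconstruction \( x^{\prime} \), which is trained against the clean input. The Gaussian noise level, filter counts, and kernel sizes are assumptions for illustration only.

```python
# Sketch of one convolutional denoising autoencoder (cf. Eqs. 1-6).
# The noise level and layer sizes are illustrative assumptions.
from tensorflow.keras import layers, models

def build_denoising_cae(input_shape=(12, 12, 1), noise_std=0.1):
    x = layers.Input(shape=input_shape)
    noisy = layers.GaussianNoise(noise_std)(x)                                  # corrupt the input
    h1 = layers.Conv2D(32, (3, 3), padding='same', activation='relu')(noisy)   # Eq. (1)
    h2 = layers.MaxPooling2D((2, 2))(h1)                                        # Eq. (3)
    d2 = layers.Conv2D(32, (3, 3), padding='same', activation='relu')(h2)       # Eq. (4)
    d1 = layers.UpSampling2D((2, 2))(d2)                                         # Eq. (5)
    x_rec = layers.Conv2D(1, (3, 3), padding='same', activation='sigmoid')(d1)  # Eq. (6)
    cae = models.Model(x, x_rec)
    # Pre-training minimizes the reconstruction error between x_rec and the clean x
    cae.compile(optimizer='adam', loss='mse')
    return cae
```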

Training of Convolutional Autoencoding Network. The training process of the stacked denoising convolutional autoencoding network is as follows.

Forward Propagation:

  1. Randomly select a batch of inputs from the standard data set; the data dimensions are (batch_size, h, w, c).

  2. Feed the batch into the convolutional denoising autoencoding neural network shown in Fig. 2; the convolution and pooling operations are given by Eqs. (7) and (8), respectively.

    $$ h_{j}^{l} = f\left( {\mathop \sum \limits_{{i \in M_{j} }} h_{i}^{l - 1} \otimes W^{\prime}_{ij} + b^{\prime}_{j} } \right) $$
    (7)
    $$ Z_{j}^{l} = \beta \left( {W_{j}^{l} down\left( {Z_{j}^{l - 1} } \right) + b_{j}^{l} } \right) $$
    (8)
  3. Use the conv2d_transpose and upsample functions in TensorFlow to perform the deconvolution and unpooling decoding, and feed the result to the fully connected layer to produce the output. The fully connected layer (FC) is calculated as follows:

    $$ y_{j}^{l} = f\left( {\sum\nolimits_{{i \in M_{j} }} {y_{i}^{l - 1} \otimes W_{ij}^{l} + b_{j}^{l} } } \right) $$
    (9)
  4. Compute the reconstruction error and use softmax for data classification.

Back Propagation:

  1. Calculate the overall loss function \( J\left( {\omega ,b} \right) \) according to the classification results of the training set samples.

  2. Backpropagate to update the weights and biases of the network until convergence. During model training, in order to speed up convergence and improve accuracy, this paper uses Adam [17] to update the network model parameters. This method alleviates slow convergence and the tendency to fall into local optima, and saves computational resources.

The loss function of this model is:

$$ J\left( {\omega ,b} \right) = J\left( {\omega ,b;x^{i} ,y^{i} } \right) + \frac{\lambda }{2}\sum\nolimits_{l = 1}^{{n_{l} - 1}} {\sum\nolimits_{i = 1}^{{s_{l} }} {\sum\nolimits_{j = 1}^{{s_{l + 1} }} {(\omega_{ji}^{l} )^{2} } } } $$
(10)
$$ J\left( {\omega ,b;x^{i} ,y^{i} } \right) = - \frac{1}{N}\sum\nolimits_{i = 1}^{N} {\left[ {y\ln a + \left( {1 - y} \right)\ln (1 - a)} \right]} $$
(11)

Where \( J\left( {\omega ,b;x^{i} ,y^{i} } \right) \) is the cross-entropy loss function, which mitigates the training difficulty caused by class imbalance in the data. \( \frac{\lambda }{2}\sum\nolimits_{l = 1}^{{n_{l} - 1}} {\sum\nolimits_{i = 1}^{{s_{l} }} {\sum\nolimits_{j = 1}^{{s_{l + 1} }} {(\omega_{ji}^{l} )^{2} } } } \) is a regularization (weight decay) term, whose purpose is to constrain the weight magnitudes and prevent overfitting during training. Here \( a = \sigma \left( h \right) \) and \( h = \omega x + b \). The activation function Sigmoid and its derivative are given in Eqs. (12) and (13):

$$ \sigma \left( h \right) = \frac{1}{{1 + e^{ - h} }} $$
(12)
$$ \sigma^{\prime}\left( h \right) = \frac{{e^{ - h} }}{{(1 + e^{ - h} )^{2} }} = \sigma \left( h \right)\left( {1 - \sigma \left( h \right)} \right) $$
(13)

The cross-entropy derivation of weights and biases is as follows:

$$ \frac{{\partial J\left( {\omega ,b;x^{i} ,y^{i} } \right)}}{{\partial \omega_{j} }} = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {\frac{{\sigma^{\prime}\left( h \right)x_{j} }}{{\sigma \left( h \right)\left( {1 - \sigma \left( h \right)} \right)}}\left( {\sigma \left( h \right) - y} \right)} $$
(14)
$$ \frac{{\partial J\left( {\omega ,b;x^{i} ,y^{i} } \right)}}{{\partial b_{j} }} = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {\frac{{\sigma^{\prime}\left( h \right)}}{{\sigma \left( h \right)\left( {1 - \sigma \left( h \right)} \right)}}\left( {\sigma \left( h \right) - y} \right)} $$
(15)

Substituting Eq. (13) into Eqs. (14) and (15) gives:

$$ \frac{{\partial J\left( {\omega ,b;x^{i} ,y^{i} } \right)}}{{\partial \omega_{j} }} = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {x_{j} \left( {\sigma \left( h \right) - y} \right)} $$
(16)
$$ \frac{{\partial J\left( {\omega ,b;x^{i} ,y^{i} } \right)}}{{\partial b_{j} }} = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {\left( {\sigma \left( h \right) - y} \right)} $$
(17)
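For a single logistic output unit, the derivation above can be checked numerically; the sketch below computes the cross-entropy loss of Eq. (11) and the simplified gradients of Eqs. (16) and (17). It illustrates the derivation only and is not the full network training code.

```python
# Numerical illustration of Eqs. (11), (16) and (17) for one logistic unit.
# X has shape (N, d), y has shape (N,), w has shape (d,), b is a scalar.
import numpy as np

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

def cross_entropy_and_grads(X, y, w, b):
    a = sigmoid(X @ w + b)                                    # a = sigma(h)
    loss = -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))  # Eq. (11)
    grad_w = X.T @ (a - y) / len(y)                           # Eq. (16)
    grad_b = np.mean(a - y)                                   # Eq. (17)
    return loss, grad_w, grad_b
```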

In this paper, the Adam optimization algorithm is used to update the weights and biases; it is shown in Algorithm 1.

Algorithm 1. Adam parameter update

In general, \( \alpha = 0.001 \), \( \beta_{1} = 0.9 \), \( \beta_{2} = 0.999 \), \( \varepsilon = 10^{ - 8} \).
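Since Algorithm 1 appears only as a figure, the standard Adam update of Kingma and Ba [17], using the default values above, is sketched here for reference.

```python
# Standard Adam parameter update [17], written out with the default
# hyperparameters listed above; t is the (1-based) iteration counter.
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1.0 - beta1) * grad           # first-moment (mean) estimate
    v = beta2 * v + (1.0 - beta2) * grad ** 2      # second-moment (variance) estimate
    m_hat = m / (1.0 - beta1 ** t)                 # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```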

4 Experimental Data and Evaluation Index

4.1 Experimental Data Set

In this paper, the NSL-KDD [18] data set is selected as the experimental benchmark. It is an optimized version of KDDCup99 that removes the redundant records of the KDDCup99 data set. Its original training set KDDTrain+ contains 125,973 records, and the original test set KDDTest+ contains 22,544 records. This paper uses the 25,192 records of the KDDTrain+ 20% subset as experimental data. Each record in the data set has 41 feature attributes and 1 label attribute, and the attacks fall into 4 main categories: DoS (denial of service), Probe (port and vulnerability scanning), R2L (remote-to-local illegal access), and U2R (user-to-root unauthorized access).
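A minimal loading sketch is shown below for reference; the file name KDDTrain+_20Percent.txt and the column layout (41 features, a label, and in some releases an additional difficulty score) follow the commonly distributed NSL-KDD files and should be checked against the actual data release.

```python
# Minimal sketch for loading the NSL-KDD 20% training subset with pandas.
# File name and column layout are assumptions to verify against the data release.
import pandas as pd

df = pd.read_csv('KDDTrain+_20Percent.txt', header=None)
features = df.iloc[:, :41]    # the 41 feature attributes
labels = df.iloc[:, 41]       # the attack label column
print(features.shape, labels.value_counts().head())
```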

4.2 Data Preprocessing

The NSL-KDD data set contains 41 feature attributes, including symbolic features (tcp, udp, icmp, …) and numerical features. The data needs to be standardized before it can be applied to the detection algorithm.

Mapping Character Data to Numeric Data

For example, “0, udp, ftp_data, SF, 491, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 1, 0, 0, 150, 25, 0.17, 0.03, 0.17, 0, 0, 0, 0.05, 0, Normal” is one record in the data set. The 2nd, 3rd, and 4th dimensions are character-valued and need to be converted to numeric types: the 2nd dimension has 3 values (tcp, udp, icmp), the 3rd dimension has 70 values (‘auth’, ‘bgp’, ‘courier’, etc.), and the 4th dimension has 11 values (‘OTH’, ‘REJ’, ‘RSTO’, etc.). These are processed with one-hot encoding, so the 41-dimensional record is converted into a 122-dimensional attribute vector. One-hot encoding is illustrated in Fig. 3.

Fig. 3. One-hot encoding
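A sketch of this mapping with pandas is given below; the column indices 1, 2, and 3 (0-based) correspond to the 2nd, 3rd, and 4th dimensions described above.

```python
# One-hot encode the three symbolic columns (protocol_type, service, flag).
# With 3 + 70 + 11 categories this expands a 41-dimensional record to
# 122 dimensions (38 numeric + 84 one-hot columns).
import pandas as pd

def one_hot_encode(features):
    return pd.get_dummies(features, columns=[1, 2, 3])
```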

Numerical Normalization. In the feature vectors, different features generally have different dimensions and magnitudes. This causes large differences between the feature values of each sample, which affects model performance: features with large magnitudes dominate the classification results. Therefore, Min-Max normalization is adopted to eliminate the influence of differing orders of magnitude on the experimental results.

$$ X_{normal} = \frac{{x - x_{min} }}{{x_{max} - x_{min} }} $$
(18)

Where \( x \) represents the original value of the sample feature, \( x_{min} \) and \( x_{max} \) represent the minimum and maximum values of that feature, and \( X_{normal} \) represents the new feature value after normalization.
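The sketch below applies Eq. (18) column-wise and then maps each 122-dimensional record to a two-dimensional grayscale image as described in Sect. 3.1; zero-padding the 122 values to 144 and reshaping to 12 × 12 is an assumed layout, not necessarily the exact mapping used in this paper.

```python
# Min-max normalization (Eq. 18) followed by mapping each record to a
# 2-D grayscale image; the 12x12 zero-padded layout is an assumption.
import numpy as np

def to_gray_images(X, side=12):
    X = np.asarray(X, dtype=np.float32)
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    X_norm = (X - x_min) / (x_max - x_min + 1e-12)       # Eq. (18), avoiding division by zero
    padded = np.zeros((X_norm.shape[0], side * side), dtype=np.float32)
    padded[:, :X_norm.shape[1]] = X_norm                  # zero-pad 122 -> 144 values
    return padded.reshape(-1, side, side, 1)              # (N, 12, 12, 1) grayscale images
```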

4.3 Evaluation Index

This paper uses a confusion matrix to measure the experimental results, as shown in Table 1.

Table 1. Intrusion detection confusion matrix

The evaluation indexes are as follows:

Accuracy (ACC): The ratio of the number of correctly classified samples to the total number of samples.

$$ ACC = \frac{TP + TN}{TP + TN + FP + FN} $$
(19)

Precision (P): The proportion of samples correctly judged as intrusion among all samples predicted as intrusion.

$$ P = \frac{TP}{TP + FP} $$
(20)

Detection Rate/Recall (R): The proportion of intrusion samples correctly detected among all actual intrusion samples.

$$ R = \frac{TP}{TP + FN} $$
(21)

False Alarm Rate (FAR): The proportion of normal samples incorrectly predicted as intrusion among all normal samples.

$$ FAR = \frac{FP}{FP + TN} $$
(22)

F1-Score: This indicator is the harmonic average of Precision and Recall.

$$ F1 - Score = \frac{2 \times P \times R}{P + R} = \frac{2 \times TP}{2 \times TP + FP + FN} $$
(23)
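For completeness, the sketch below computes the indices of Eqs. (19)–(23) directly from the confusion-matrix counts.

```python
# Evaluation indices of Eqs. (19)-(23) computed from confusion-matrix counts.
def evaluation_indices(tp, tn, fp, fn):
    acc = (tp + tn) / (tp + tn + fp + fn)    # Accuracy, Eq. (19)
    precision = tp / (tp + fp)               # Precision, Eq. (20)
    recall = tp / (tp + fn)                  # Detection rate / Recall, Eq. (21)
    far = fp / (fp + tn)                     # False alarm rate, Eq. (22)
    f1 = 2 * tp / (2 * tp + fp + fn)         # F1-Score, Eq. (23)
    return acc, precision, recall, far, f1
```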

5 Experiment and Result Analysis

In order to verify the advantages of the proposed intrusion detection method in the measurement automation system setting, the intrusion detection model is simulated and the evaluation indices defined above are used to test its performance.

5.1 Experimental Environment and Parameter Selection

This experiment uses TensorFlow for the simulation and the Python programming language. The computer hardware configuration is an Intel(R) Core(TM) i7-6700 CPU @ 2.60 GHz processor with 16 GB memory, and the operating system is 64-bit Windows 10. The main parameter variables of the model include the convolutional autoencoder network structure parameters, the learning rate, the connection probability, and the number of training iterations. The specific values of the parameters are shown in Table 2.

Table 2. Experimental variable parameters

5.2 Result Analysis

In order to evaluate the performance of the proposed model, it is compared with the classic NN [19] and SVM [19] models and with the improved convolutional neural network model of [20]. The results are shown in Table 3, and Fig. 4 compares the per-class accuracy of the different models.

Table 3. Results of comparison with other models
Fig. 4. Comparison of accuracy of various attacks

It can be seen from Table 3 that the accuracy of the proposed model is 11.59% higher than the NN model, 9.63% higher than the SVM model, and 4.07% higher than the improved convolutional neural network model of [20], while the detection rate is improved by 12.59%, 10.44%, and 3.88%, respectively; it also outperforms the other models in false alarm rate and F1-Score. Figure 4 shows that the proposed model is significantly better than the NN model, the SVM model, and the model of [20] in recognizing Normal, Probe, and DoS traffic, and it also achieves a slight improvement in identifying U2R and R2L attacks. In summary, the proposed model, which combines the characteristics of the convolutional neural network and the autoencoder network, can be applied effectively to the intrusion detection system of the measurement automation system and provides good classification and detection performance.

6 Conclusion

At present, the intrusion detection technology of measurement automation systems suffers from low recognition efficiency, severe feature loss, and poor adaptability; to address these problems, this paper proposes an intrusion detection model based on a stacked denoising convolutional autoencoder network. Compared with traditional models, this model learns the internal characteristics of the data more fully. The model uses Dropout and regularization to avoid overfitting, and uses Adam to optimize the reconstruction error, which accelerates convergence and avoids local optima. Compared with the other models, the accuracy and detection rate of the proposed model are significantly improved, reaching 97.25% and 96.77% respectively, and the false alarm rate is also slightly reduced.

Although the method in this paper improves intrusion detection for the measurement automation system, several problems remain to be solved, mainly in the following three aspects: (1) how to save node storage space in the measurement automation system while ensuring the efficiency of intrusion detection; (2) how to further optimize the algorithm against the possible problems of vanishing gradients and local optima during model training; and (3) how to further strengthen the generalization ability of the model, for example by validating it on multiple intrusion detection data sets.