1 Introduction

The Digital Twin (DT) [1,2,3] has become one of the most significant technologies for timely monitoring of the dynamic state of a physical object by building a simulation model of it. A DT is composed of a physical object, its virtual digital twin, and a mapping relationship that enables the co-evolution of the physical and virtual sides. The virtual digital twin continually adapts to operational changes based on data collected online and predicts the state of the physical object. With the emergence of the Internet of Things, the development of communication modes, and the diversity of service types carried by the network, the network requires high flexibility and, as infrastructure, higher reliability. Therefore, digital twin technology is applied to the network to create a virtual image of the physical network facilities, that is, to build a digital network platform consistent with the physical network elements, topology, and data. The digital network platform can improve network security and is an effective means of network anomaly detection, such as detection of the DDoS attacks discussed below, which is the main focus of this paper.

With the rapid development of computer and communications technology, the network has become the most critical media facility of global information systems. Thus, maintaining network security has become a prerequisite for ensuring the sustainable development of information work in various fields. The Distributed Denial of Service (DDoS) attack [4, 5] has been one of the most common sources of instability in the network and information environment. Attackers use many zombie machines to simultaneously send large numbers of normal or abnormal packets to the target. Ultimately, the target cannot provide service because its system resources or network bandwidth are exhausted, or it even collapses [6,7,8].

DDoS detection faces the problem of classifying variable, small-sample data. Traditional methods, such as traffic cleaning based on monitoring and filtering of network traffic [4] and signature-based and anomaly-based intrusion detection systems [9], handle the homogeneous, small datasets of current data collection environments poorly, which causes a significant degree of detection error. Current artificial intelligence technology, such as deep learning, is highly dependent on the quantity and diversity of training data. However, due to the limited number of users, a sizeable sample of data is unavailable to a single device. Moreover, many internet privacy regulations stipulate that raw user data must not be shared with others. Federated learning, a relatively novel machine learning paradigm, is well suited to the setting in which data is stored on separate devices and cannot be shared, and it protects privacy. Therefore, this paper uses it to distinguish abnormal traffic through distributed training.

In our study, integrating digital twin technology establishes a complex yet efficient mapping relationship between the physical network and the digital network platform. We then aim to detect DDoS attacks on the dataset by building an intrusion detection system with federated learning. Through real-time interaction with the detection system, the digital twin network platform can quickly find which client in the physical network is under attack. Our study provides the following contributions:

  • We apply federated learning to detect DDoS attacks under the Digital Twin Network. It not only increases efficiency but also enables real-time monitoring of clients’ status.

  • We leverage the federated learning scheme to construct the Digital Twin Networks models, solving the data island problem and protecting data privacy.

  • We propose an optimization framework based on FedProx [10] that deals with system and statistical heterogeneity inherent in federated networks and improves model accuracy. Moreover, we choose the LSTM model to detect DDoS attacks due to the correlation between features.

The paper is arranged as follows. Sect. 2 reviews studies on detecting DDoS attacks. Sect. 3 presents the theoretical background and structure of the proposed model. Sect. 4 describes the test environment, assessment criteria and test results. Finally, Sect. 5 gives a general evaluation of the study and discusses future work.

2 Related Work

In the context of the digital twin, federated learning has been proposed for DDoS attack detection relatively less often than other machine learning approaches. Many studies use other algorithms and models to detect DDoS attacks. This section summarizes the characteristics and shortcomings of these studies.

The paper [11] adopts the CAT (change-aggregation tree) mechanism to carry out a collaborative analysis of the router traffic flowing through the same ISP network and analyzes the flow distribution of each interface to find abnormalities. A similar flow distribution analysis solution is proposed in [12], which detects constant-rate or increasing-rate flows on the backbone network by cross-correlation and weight vector analysis. Neither considers scenarios in which DDoS attacks and heavy legitimate traffic cannot be distinguished. The paper [13] proposes a DDoS detection method based on a random forest classification model that uses the information entropy of data flows as the classification criterion. After extracting features from common modes of DDoS attacks, the authors use their detection model to distinguish normal from abnormal flows. However, the method is only suitable for a single group; its detection accuracy is very low for multiple groups. Traditional DDoS detection mechanisms based on SDN controllers either lack network-wide monitoring information or incur serious communication overhead. To address this, the paper [14] proposes a new cross-platform collaborative DDoS detection model called OverWatch. Within OverWatch, the authors put forward a lightweight flow detection algorithm that captures the fundamental features of DDoS attack traffic by polling the counter values of the OpenFlow switches.

As attack methods become more and more complicated, the DDoS detection approaches above suffer from low detection efficiency, high time consumption, false reports, and similar shortcomings. Anomaly detection algorithms based on machine learning can learn standard features from existing intrusion behaviours and are used in many studies to distinguish abnormal traffic from regular traffic. Among these studies, deep learning seems to be more successful than shallow machine learning [15, 16]. Detecting and separating network traffic is the most crucial factor in the fight against DDoS. The study [17] summarised the examination of several deep learning models [18,19,20] and showed that deep learning achieves a high level of accuracy in the detection of DDoS attacks. It also suggested that a deep neural network (DNN) can work quickly and with high accuracy because it includes feature extraction and classification in its structure and has layers that update themselves during training. The authors of [21] propose a DDoS attack detection method based on a convolutional neural network (CNN), which includes feature processing and a detection model. Although these deep learning methods show better performance in DDoS detection, most of them ignore how data is distributed in the real world, which is very unfavorable for their training. Our method addresses this problem while ensuring accuracy.

3 Background and Proposed Model

3.1 The Framework of Federated Learning

Fig. 1. The structure of federated learning

A DDoS attack detection model based on machine learning needs to extract features from, and learn on, a massive number of valid network packets, but for many organizations in many fields the available packets are few and of a single data type. Federated learning, which leaves the training data on a massive number of nodes (devices) and trains a shared model by aggregating locally computed updates, is adopted to solve this data island problem. In federated learning, participants train their local models and send their local weights to a central unit, such as an orchestrating central server. The central unit then sends the global weights back to the participants to update their local models. The training process of federated learning is shown in Fig. 1.

Considering the heterogeneity of data and the communication differences between devices in the network environment, optimization algorithms have been proposed to reduce these inherent effects on federated learning. A traditional setting of federated learning is the Federated Averaging (FedAvg) algorithm proposed by [22]. At each round, a subset of \(K \ll N\) of the total devices is selected, and each selected device runs stochastic gradient descent (SGD) locally for E epochs with the same learning rate. The resulting model updates are then averaged. The choice of the number of local epochs plays a vital role in convergence. On the one hand, more local epochs reduce communication, which can dramatically improve overall convergence speed in communication-constrained networks. On the other hand, a larger number of local epochs may lead each client to deviate from the optimum of the global objective. Moreover, when training performance and system resources differ among clients, a fixed number of local epochs increases the risk that some clients do not complete training in time and therefore drop out of the procedure [23], seriously hurting convergence. In traditional federated learning methods (e.g., [24]; [22]), the global learning objective is to minimize:

$$\begin{aligned} \begin{array}{l} \min _{w} f(w)=\sum _{k=1}^{N} p_{k} F_{k}(w)=\mathbb {E}_{k}\left[ F_{k}(w)\right] \\ \text{ s.t. } \quad p_{k} \ge 0, \sum _{k} p_{k}=1, p_{k}=\frac{n_{k}}{n} \end{array} \end{aligned}$$
(1)
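
The weighting in Eq. 1 can be illustrated with a minimal sketch. It assumes that each client reports a flat list of weights together with its sample count; the function names `aggregate` and `global_objective` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def aggregate(client_weights: Sequence[Sequence[float]],
              client_sizes: Sequence[int]) -> List[float]:
    """Average client weight vectors with p_k = n_k / n, as in Eq. 1."""
    n = sum(client_sizes)
    if n == 0:
        raise ValueError("at least one client must hold data")
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w_k, n_k in zip(client_weights, client_sizes):
        p_k = n_k / n  # share of the total data held by client k
        for i in range(dim):
            global_w[i] += p_k * w_k[i]
    return global_w


def global_objective(local_losses: Sequence[float],
                     client_sizes: Sequence[int]) -> float:
    """Evaluate f(w) = sum_k p_k * F_k(w) from per-client loss values."""
    n = sum(client_sizes)
    return sum((n_k / n) * f_k for f_k, n_k in zip(local_losses, client_sizes))
```

For example, a client holding three times as much data as another contributes three times the weight to the averaged model.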

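In Eq. 1, N is the number of devices, \(F_{k}\) is the local objective of device k, \(n_{k}\) is the number of samples held by device k, and \(n=\sum _{k} n_{k}\). To limit the impact of variable local updates on non-IID (not independently and identically distributed) data and to steer each client towards the optimum of the global objective rather than of its local objective, client k does not minimize the local function \(F_{k}(\cdot )\) directly. Instead, it uses its local solver of choice to minimize the following objective \(h_{k}\), where \(w^{t}\) is the global model received at round t and \(\mu \ge 0\) weights the proximal term: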

$$\begin{aligned} \min _{w} h_{k}(w ; w^{t})=F_{k}(w)+\frac{\mu }{2}\left\| w-w^{t}\right\| ^{2} \end{aligned}$$
(2)
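
A minimal sketch of a local solver for Eq. 2 follows. It assumes plain gradient descent and a toy quadratic local loss; `toy_grad`, `CENTER`, the learning rate, and the step count are illustrative choices, not values from the paper.

```python
from typing import Callable, List

Vector = List[float]


def prox_gradient(grad_f: Callable[[Vector], Vector],
                  w: Vector, w_t: Vector, mu: float) -> Vector:
    """Gradient of h_k(w; w^t) = F_k(w) + (mu / 2) * ||w - w^t||^2."""
    g = grad_f(w)
    return [g_i + mu * (w_i - wt_i) for g_i, w_i, wt_i in zip(g, w, w_t)]


def local_update(grad_f: Callable[[Vector], Vector], w_t: Vector,
                 mu: float, lr: float = 0.1, steps: int = 200) -> Vector:
    """Run gradient descent on h_k, starting from the global model w^t."""
    w = list(w_t)
    for _ in range(steps):
        g = prox_gradient(grad_f, w, w_t, mu)
        w = [w_i - lr * g_i for w_i, g_i in zip(w, g)]
    return w


# Hypothetical local loss F_k(w) = 0.5 * ||w - CENTER||^2, minimized at CENTER.
CENTER = [2.0, -1.0]


def toy_grad(w: Vector) -> Vector:
    return [w_i - c_i for w_i, c_i in zip(w, CENTER)]
```

With \(\mu = 0\) the local solution moves all the way to the local optimum, while a larger \(\mu\) keeps it closer to the global model \(w^{t}\), which is the intended effect of the proximal term.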

Further, the notion of a \(\gamma _{k}^t\)-inexact solution is introduced to dynamically adjust the number of local epochs by solving the local function only approximately, which provides tolerance for heterogeneous systems. If \(w^*\) satisfies Eq. 3, it is called a \(\gamma _{k}^t\)-inexact solution of \(\min _{w} h_{k}\left( w ; w^{t}\right) \).

$$\begin{aligned} \begin{array}{c} \left\| \nabla h_{k}\left( w^{*} ; w^{t}\right) \right\| \le \gamma _{k}^{t}\Vert \nabla h_{k}\left( w^{t}; w^{t}\right) \Vert \\ \nabla h_{k}\left( w ; w^{t}\right) =\nabla F_{k}(w)+\mu \left( w-w^{t}\right) \\ \gamma _{k}^{t} \in [0,1] \end{array} \end{aligned}$$
(3)
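
The stopping rule in Eq. 3 can be sketched as a check inside the local solver: iterate until the gradient norm of \(h_{k}\) has shrunk to a fraction \(\gamma _{k}^{t}\) of its value at \(w^{t}\). The solver (plain gradient descent) and the toy problem below are assumptions made for illustration only.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
GradFn = Callable[[Vector, Vector], Vector]  # (w, w_t) -> gradient of h_k at w


def norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def is_gamma_inexact(grad_h: GradFn, w: Vector, w_t: Vector, gamma: float) -> bool:
    """Check ||grad h_k(w; w^t)|| <= gamma * ||grad h_k(w^t; w^t)|| (Eq. 3)."""
    return norm(grad_h(w, w_t)) <= gamma * norm(grad_h(w_t, w_t))


def solve_inexactly(grad_h: GradFn, w_t: Vector, gamma: float,
                    lr: float = 0.1, max_steps: int = 1000) -> Tuple[Vector, int]:
    """Gradient descent that stops as soon as the gamma-inexactness test passes."""
    w = list(w_t)
    for step in range(max_steps):
        if is_gamma_inexact(grad_h, w, w_t, gamma):
            return w, step
        g = grad_h(w, w_t)
        w = [w_i - lr * g_i for w_i, g_i in zip(w, g)]
    return w, max_steps


# Hypothetical toy problem: F_k(w) = 0.5 * ||w - CENTER||^2 with mu = 1.
CENTER = [2.0, -1.0]
MU = 1.0


def toy_grad_h(w: Vector, w_t: Vector) -> Vector:
    return [(w_i - c_i) + MU * (w_i - wt_i) for w_i, c_i, wt_i in zip(w, CENTER, w_t)]
```

A value of \(\gamma _{k}^{t}\) close to 1 lets a slow device stop after very little local work, whereas a value close to 0 demands a nearly exact solution and therefore more local steps.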

We improve the client processing strategy of the FedProx framework proposed by [10], as shown in Algorithm 1. Devices are divided into groups. The training process is divided into two stages, intra-group training and inter-group training, and a different optimization strategy is adopted in each stage. Moreover, instead of selecting clients at random, we select the top Z devices in descending order of gradient, which accelerates convergence and improves accuracy under the same experimental settings.
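
The device selection step might look like the following sketch. The text does not state how gradients are ranked, so the sketch assumes that devices are ordered by the Euclidean norm of their latest gradient; `select_top_z` and the client identifiers are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def gradient_norm(grad: Sequence[float]) -> float:
    return math.sqrt(sum(g * g for g in grad))


def select_top_z(client_grads: Dict[str, Sequence[float]], z: int) -> List[str]:
    """Return the ids of the z clients with the largest gradient norm.

    Assumption: "gradient descending order" is read as ranking by L2 norm.
    """
    ranked = sorted(client_grads,
                    key=lambda cid: gradient_norm(client_grads[cid]),
                    reverse=True)
    return ranked[:z]
```

Under this reading, clients whose local models are furthest from a stationary point are preferred in each round, in place of the uniform random sampling used by FedAvg and FedProx.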

3.2 Federated Learning Integrated into Digital Twin Network

A digital twin (DT) can accurately substitute for a real-world object across multiple granularity levels; this real-world object could be a robot, device, machine, complex physical system or industrial process. Extending the definition of DT to the DTN (digital twin network) shown in Fig. 2(a), a DTN is defined as a many-to-many mapping network constructed from multiple one-to-one DTs. A DTN uses advanced communication technologies to realize real-time interaction between a physical object and its virtual twin, between physical objects, and between virtual twins. Meanwhile, physical objects and virtual twins can collaborate, share information, and complete tasks. The DTN proposed in this paper, whose simple structure is shown in Fig. 2(b), is applied to detect DDoS attacks by using federated learning.

Fig. 2. The DT and DTN

The DTN provides a corresponding virtual network for the internet. The digital platform can control clients through base stations and servers and uses a federated learning model to detect abnormal traffic in advance. The actual physical nodes use the information fed back from the virtualized network to improve network security.

3.3 The DDoS Attack Detection Model

Fig. 3. The structure of DDoS attack detection model

The detection process mainly includes two steps: data preprocessing and classification prediction. The structure of the DDoS attack detection model is shown in Fig. 3.

In data preprocessing, the DDoS dataset contains many non-numeric or unnecessary attributes, so these attributes need to be converted to numeric values, and all data should be normalized and reshaped to obtain a standard 2D dataset. Through distributed learning, the classification model identifies whether a network record is normal or an attack. Moreover, in the DTN architecture, the federated model is built on the digital twin platform, and the virtual digital platform can provide feedback so that the actual physical nodes can prepare defences in advance. With the detection model and with computing and communications technologies, the DTN can readily collect traffic information, return detection feedback, and analyse the actual physical internet nodes in real time.

This paper divides each traffic sample into several parts. The features in the attack dataset are correlated with one another to varying degrees, so each part depends on the others, and each part is regarded as a time step. The LSTM (Long Short-Term Memory) model [25] introduces a memory cell to replace each ordinary node in the hidden layers and ensures that gradients can pass through many time steps without vanishing or exploding. Therefore, the LSTM model is chosen to detect attacks.
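
To make the role of the memory cell concrete, the sketch below runs the forward pass of a standard LSTM cell over one sample, treating each row of a 7 * 7 matrix as one time step. It is an untrained illustration in plain Python: the hidden size, the random initialization, and all function names are assumptions, and a real experiment would use a deep learning library instead.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]
Gate = Tuple[Matrix, Vector]  # (weight matrix, bias vector)


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def affine(gate: Gate, z: Vector) -> Vector:
    weights, bias = gate
    return [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(weights, bias)]


def init_params(inputs: int, hidden: int, seed: int = 0) -> Dict[str, Gate]:
    """Small random weights for the four gates (illustrative initialization)."""
    rng = random.Random(seed)

    def gate() -> Gate:
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(inputs + hidden)]
                   for _ in range(hidden)]
        return weights, [0.0] * hidden

    return {name: gate() for name in ("forget", "input", "output", "candidate")}


def lstm_step(x: Vector, h: Vector, c: Vector,
              params: Dict[str, Gate]) -> Tuple[Vector, Vector]:
    z = list(x) + list(h)  # concatenate current input and previous hidden state
    f = [sigmoid(v) for v in affine(params["forget"], z)]
    i = [sigmoid(v) for v in affine(params["input"], z)]
    o = [sigmoid(v) for v in affine(params["output"], z)]
    g = [math.tanh(v) for v in affine(params["candidate"], z)]
    # The memory cell keeps part of its old content and adds new information.
    c_new = [f_j * c_j + i_j * g_j for f_j, c_j, i_j, g_j in zip(f, c, i, g)]
    h_new = [o_j * math.tanh(c_j) for o_j, c_j in zip(o, c_new)]
    return h_new, c_new


def encode_sample(sample: Matrix, params: Dict[str, Gate], hidden: int) -> Vector:
    """Feed the rows of one sample through the cell and return the last hidden state."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in sample:
        h, c = lstm_step(x, h, c, params)
    return h
```

The final hidden state would then be passed to a classification layer that outputs the traffic class; that layer is omitted here.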

4 Experiments and Results

4.1 Experiments Environment

The federated learning model training experiment is based on a hardware environment of Windows 10, an Intel(R) Core(TM) i7-10700 CPU at 2.90 GHz, 16 GB RAM, a Netac 256 GB SSD, and an NVIDIA K80 GPU, and a software environment of the Python 3.7 programming language and its libraries.

4.2 Dataset and Preprocessing

The well-known Knowledge Discovery and Data Mining (KDD) CUP 1999 dataset [26,27,28,29] is used for detecting DDoS attacks and classifying attack types. Although it has existed for a long time, it retains a certain credibility in academic circles and has been widely used in network intrusion research. Given the uneven distribution of the dataset, we randomly assigned its records to the nodes. The attacks in the dataset were divided into four categories, DoS, R2L, U2R, and Probing, as shown in Table 1.

Table 1. The classification of attack types in the dataset

For data preprocessing, the specific steps were as follows:

  1. The 41 features and one label of each network traffic record in the dataset required numerical conversion and numerical standardization. Numerical conversion applied to three features (protocol_type, service, flag) and the string label. For example, the service feature contained 70 kinds of network service types, so they were coded one by one from ‘1’ to ‘70’. “normal” was labelled ‘0’ and the attacks were labelled ‘1–4’ to detect attacks in the network traffic.

  2. The data processed by the LSTM in this paper was two-dimensional, so the original data had to be converted into a two-dimensional matrix. This paper used a Gaussian distribution to randomly expand the length of each sample to 49 and then standardized the dataset. Lastly, each sample was turned into a 7 * 7 matrix.
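
The two preprocessing steps can be sketched as follows. Several details are assumptions, since the text does not fix them: categorical codes are assigned in order of first appearance, the 8 padding values are drawn from a standard normal distribution, standardization is a per-column z-score, and the order of the attack labels in `CATEGORY_TO_LABEL` is hypothetical.

```python
import random
import statistics
from typing import Dict, Iterable, List, Sequence

# Hypothetical label order; the text only states "normal" = 0 and attacks = 1-4.
CATEGORY_TO_LABEL = {"normal": 0, "dos": 1, "r2l": 2, "u2r": 3, "probing": 4}


def build_codebook(values: Iterable[str]) -> Dict[str, int]:
    """Assign integer codes 1..K to distinct strings in order of first appearance."""
    codebook: Dict[str, int] = {}
    for value in values:
        codebook.setdefault(value, len(codebook) + 1)
    return codebook


def pad_with_gaussian(row: Sequence[float], target_len: int,
                      rng: random.Random) -> List[float]:
    """Extend a feature vector to target_len with N(0, 1) samples."""
    if len(row) > target_len:
        raise ValueError("row is longer than the target length")
    return list(row) + [rng.gauss(0.0, 1.0) for _ in range(target_len - len(row))]


def standardize(rows: List[List[float]]) -> List[List[float]]:
    """Z-score every column; constant columns are left at zero."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def to_matrix(row: Sequence[float], side: int = 7) -> List[List[float]]:
    if len(row) != side * side:
        raise ValueError("row length must equal side * side")
    return [list(row[i * side:(i + 1) * side]) for i in range(side)]


def preprocess(rows: List[List[float]], seed: int = 0) -> List[List[List[float]]]:
    """Pad 41 numeric features to 49, standardize, and reshape to 7 x 7."""
    rng = random.Random(seed)
    padded = [pad_with_gaussian(row, 49, rng) for row in rows]
    return [to_matrix(row) for row in standardize(padded)]
```

Here `build_codebook` would be applied to the protocol_type, service, and flag columns before `preprocess` is called on the resulting numeric rows.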

4.3 Performance Metric

A confusion matrix [30] is used to evaluate the model. Its elements are True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN). TP is the number of attack records correctly predicted as attacks; TN is the number of normal records correctly predicted as normal; FP is the number of normal records mistakenly predicted as attacks; FN is the number of attack records mistakenly predicted as normal. The metrics obtained from these elements are described below, as in [31]:

The accuracy obtained by Eq. 4 shows the model’s correct prediction rate.

$$\begin{aligned} Accuracy = \frac{TP + TN}{TP + FP + TN + FN} \end{aligned}$$
(4)

Precision, obtained by Eq. 5, shows what fraction of the positive predictions are correct.

$$\begin{aligned} Precision = \frac{TP}{TP + FP} \end{aligned}$$
(5)

Recall, obtained by Eq. 6, shows what fraction of the actual positives are predicted correctly.

$$\begin{aligned} Recall = \frac{TP}{TP + FN} \end{aligned}$$
(6)

The F1 score, obtained by Eq. 7, is the harmonic mean of precision and recall and shows the balance between them.

$$\begin{aligned} F1 = \frac{2\times Precision \times Recall}{Precision + Recall} \end{aligned}$$
(7)
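
Eqs. 4 to 7 can be computed directly from the four confusion-matrix counts, as in the short sketch below. The function name and the choice to return 0.0 when a denominator is zero are illustrative conventions, not part of the paper.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from confusion-matrix counts (Eqs. 4-7)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For instance, with hypothetical counts of 90 true positives, 10 false positives, 95 true negatives and 5 false negatives, the accuracy is 185/200 = 0.925 and the precision is 90/100 = 0.9.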

4.4 Results

In training, the number of global rounds and the number of local epochs were set to 200 and 20, respectively. On the one hand, to verify the accuracy of the detection method, we ran a set of controlled trials comparing the proposed model, the FedProx framework, and the FedAvg framework (detailed in Sect. 3.1). On the other hand, the proposed model was also compared with federated learning using the original optimizer to verify its stability on non-IID data.

Fig. 4. The proposed approach yields significant convergence improvements relative to FedProx and FedAvg. We simulate different levels of systems heterogeneity by forcing 0%, 50%, and 90% of devices to be stragglers (dropped by FedAvg). (1) Comparing the three optimization algorithms under the same experimental setting, we see that our algorithm helps convergence in the presence of systems heterogeneity. (2) With the same level of systems heterogeneity, we show that setting \(\mu >1\) can lead to more stable convergence. (3) Note that FedProx with \(\mu = 0\) and without systems heterogeneity (no stragglers) corresponds to FedAvg.

Table 2. The results of the three methods in performance metrics. With the experimental environment mentioned above, the accuracy of our approach is higher than that of FedProx and FedAvg.

The effect of data heterogeneity on convergence can be seen in Fig. 4, where, over several experiments, the proposed model showed better convergence than federated learning based on FedAvg and FedProx. We show the training loss on datasets that were unevenly distributed across the nodes, modelling the data distribution in the real world. Increasing heterogeneity can lead to worse convergence, but setting an optimal value of \(\mu \) can help combat this.

The results of the three methods in performance metrics can be seen in Table 2, where the proposed model detected DDoS attacks with an accuracy of up to 99.17%, higher than FedProx and FedAvg. This shows that accuracy improved with our proposed model. The results we obtained on the sample from the KDD Cup99 dataset show that federated learning based on LSTM and network traffic analysis is highly successful on small datasets. Furthermore, through DTN technology, a physical node can defend itself quickly after receiving feedback from the virtual platform.

5 Conclusion and Future Work

This paper proposes a method for detecting DDoS attacks based on federated learning and the LSTM model under the digital twin network. Federated learning, an innovative modelling mechanism, allows multi-party collaborative participation, which increases the number of samples and protects the security of each participant’s local data. Meanwhile, the digital twin network ensures the reliability of distributed training and cooperation among physical objects, which can respond quickly based on feedback from the virtual model. An improved optimization algorithm is introduced to address the heterogeneity inherent in federated networks. Our empirical evaluation on the KDD CUP 1999 dataset achieved the expected effect and demonstrated that the optimization framework can significantly improve model accuracy and the convergence behaviour of federated learning in realistic heterogeneous networks.