Keywords

1 Introduction

With the development of cloud technology, Distributed denial of service (DDoS) attacks gather a large number of botnets and send a large number of continuous attack requests to the target system to increase the attack power by taking advantage of cloud computing bandwidth network access and rapid rebound performance [1,2,3], and make the victims face huge amounts of economic losses.

The Arbor Networks report shows that [4] in recent years, as many as 48% of DDoS attacks in all cyber threats, it is clear that DDoS attacks have become the main cybercrime in today’s society. Although we minimize the number of DDoS attacks, they have rapidly expanded the frequency and scale of target networks and computers, and are evolving to exploit flash population agents, low-rate attacks, and exploit vulnerabilities in DNS servers to scale up attacks. DDoS attacks continue to threaten today’s network environments, and these threats have grown significantly in terms of the size and impact of Internet service providers and governments. DDoS attack features such as diversity, distribution, suddenness and concealment, as well as network flow scale, mass and complexity in the new network environment make current DDoS attack detection methods have such problems as high false alarm rate, high false alarm rate and poor timeliness. Accurate and efficient detection of DDoS attacks, reducing economic losses and negative social impacts is imminent. Therefore, this paper proposes a DDoS attack detection method based on V-SVM. Compared with similar methods, this method not only improves the accuracy, reduces the false negative rate, but also ensures the stability and timeliness of the classification model.

2 Related Work

In recent years, researchers have proposed a large number of DDoS attack detection methods. The existing DDoS attack detection methods are classified into two types: statistics-based attack detection and machine learning-based attack detection.

2.1 Attack Detection Method Based on Statistics

The statistically based detection methods mainly include entropy method, principal component analysis method, correlation and covariance. Christiane et al. [5] proposed an attributional selection method based on Renyi and Tsallis entropy, and evaluated the advantages and disadvantages of Renyi and Tsallis entropy by comparing with shannon entropy, so as to obtain the optimal attribute subset to distinguish normal flow or attack flow. Qi et al. [6] proposed a dynamic model based on entropy to detect DDoS attacks. This model used the conversational relationship of network flow from different perspectives to compare the change rates of dynamic and static entropy values in anomaly detection, and found that the dynamic entropy method is more sensitive and more suitable for anomaly detection. Mohiuddin et al. [7] developed the problem of detecting DDoS attacks as a collective anomaly, and proposed a framework for collective anomaly detection. The statistical analysis of the characteristics of the attack flow and the use of classification clustering techniques to achieve the detection purpose; Cheng et al. [8] ignore the redundancy filtered from the dynamic feature set according to the sparse distribution pattern of network anomalies and its low-dimensional feature attributes, and design a new detection scheme by using the sparsity of network anomaly distribution; Park et al. [9] proposed a method for detecting traffic flood attacks by using probabilistic model to analyze abnormal data detection.

2.2 Attack Detection Method Based on Machine Learning

Machine learning-based detection methods include Support Vector Machine (SVM), Naive Bayes algorithm (NB, Naive Bayes), and decision trees. Karnwal et al. [10] converted the one-dimensional time series into multi-dimensional AR model parameter timings, and used the support vector machine to learn and classify the data stream. Tama et al. [11] used the method of anomaly detection to model the network data stream according to the header attribute, and used the naive Bayesian algorithm to score each arriving data stream to evaluate the rationality of the message. Gao et al. [12] focuses on constructing a privacy-preserving NB classifier that is resistant to an easy-to-perform, but difficult-to-detect attack, which we call the substitution-then-comparison (STC) attack. Latif et al. [13] proposed an enhanced decision tree algorithm for the impact of noise on the accuracy of sensor generation data, which can effectively detect the occurrence of DDoS attacks in Wireless Body Area Network (WBAN). Ma et al. [14] present the generic secure outsourcing schemes enabling users to securely outsource the computations of exponentiations to the untrusted cloud servers. Li et al. [15] introduce Significant Permission IDentification (SigPID), a malware detection system based on permission usage analysis to cope with the rapid increase in the number of Android malware. Because the cloud servers are usually untrusted, Li et al. [16] propose a framework for privacy-preserving outsourced classification in cloud computing (POCC). Li et al. [17] propose a new secure provenance scheme based on group signature and attribute-based signature techniques.

3 DDoS Attack Characteristics

3.1 Analysis of DDoS Attack Characteristics

In the actual network environment, there are external factors such as noise, delay and congestion. To effectively detect a DDoS attack, selecting a group of features that can comprehensively reflect the attack is the core factor to ensure the classifier to speed up learning, reduce computational complexity, improve accuracy and stability [18, 19]. The DDoS attack has a strong correlation with time. Lee et al. [20] use 2 s as a time interval to count the current number of target host connections compared with the number of previous connections or the number of incorrect connections before and after the service as a percentage of the total number of connections. These metrics increase dramatically when the victim detects that the host or server has received a large number of connection requests within a certain period of time. Experiments and related research [21,22,23] have shown that statistics on network traffic based on fixed time intervals can effectively reflect the characteristics of DDoS attacks. Therefore, a nine-node Network service association feature (NSAF) is defined to describe the changing state of the network flow.

$$ NSAF = \left( {T,S,L,N,J,R,P,Y,C} \right) $$
(1)

Where the number of hosts with the same connection and the same target is represented by T; The number of the same service with the same connection is denoted by S; L = {The number of SYN error connections during sampling time/T}; N = {The number of SYN error connections during sampling time/S}; J = {The number of REJ error connections during sampling time/T}; R = {The number of REJ error connections during sampling time/S}; P = {The number of connections with the same service as the current connection/T}; Y = {Number of connections with different services from the current connection/T}; C = {The number of connections with different target hosts from the current connection/S}.

3.2 Data Normalization

The classification accuracy of the model will be affected due to the possible interference of default values, singularities or noises in the samples. Therefore, it is necessary to normalize the data to solve the influence of the dimension between the data indicators to speed up the gradient and find the optimal solution speed. This paper analyzes the following two common data normalization methods: Z-score Standardization and Min-Max Scaling.

Z-Score standardization is a numerical unification of the mean and standard deviation of the original data. The processed data basically conforms to the standard normal distribution, i.e. “Z-distribution”. The sample average is 0, the variance is 1, and the conversion function is:

$$ {\text{Z}} = \frac{{{\text{x}} - \mu }}{\delta } $$
(2)

Where \( \mu \) is mean of all sample data and \( \delta \) is the standard deviation of all sample data.

Min-Max normalization, also known as dispersion normalization, is a linear transformation of the original data, mapping the resulting values between [0, 1] or [–1, 1]. The conversion function is as follows:

$$ {\text{x}}\text{ * } = \frac{x - \hbox{min} }{\hbox{max} - \hbox{min} } $$
(3)

Where max and min are the maximum and minimum values of the training set or test set respectively. Since z-score method has the defect of changing the distribution of original data, sample points are distributed within the interval [0, 1] after the normalization of min-max, and all indexes are in the same order of magnitude, without changing the distribution between the data. For attributes with very small variances, the method can also enhance its stability and maintain an entry of 0 in the sparse matrix. The index values of the data sets used in this experiment are distributed in a fixed interval, and min and max are stable and do not involve cluster analysis. Therefore, this paper adopts the min-max method to normalize data, which simplifies the complexity of comparability among various indicators and is suitable for comprehensive evaluation of data.

3.3 Feature Extraction Based on Principal Component Analysis

Principal Component Analysis (PCA) is widely used in the field of network security. It recombines a group of previously correlated indexes into a new set of unrelated comprehensive indexes. The fewer the number of Principal components, the better the dimensionality reduction effect [24].

This paper proposes a feature extraction algorithm based on PCA, as shown in Table 1. Through the PCA dimension reduction, the nine characteristic indicators in the training set are converted into three comprehensive indicators. The sum of the cumulative variances of the three eigenvalues accounts for about 97% of the total variance, and the residual contribution rate is very low. Each of the principal components can reflect most of the information of the original features, and the information contained therein is not repeated.

Table 1. PCA-based feature extraction algorithm

PCA algorithm simplifies the process of data analysis, achieves the purpose of reconstructing the original high-dimensional vector, and obtains more scientific and effective data.

4 Detection Method of DDoS Attack Based on V-SVM

The support vector machine method is based on the VC dimension theory of statistical learning and the principle of structural risk minimization. Based on the information of finite samples, it seeks the best compromise between the complexity and learning ability of the model to obtain the optimal promotion ability. Compared with other machine learning algorithms such as neural network, decision tree and Adaboost, SVM classifier has simpler structure design, moderate computational complexity and better generalization performance. It has many outstanding advantages in solving small sample, nonlinear and high dimensional pattern recognition.

Although the data preprocessing technology successfully solves the problems of irregular data distribution, singular values or noise points, the unstructured attributes of the network data itself often affect the accuracy of the classification model. Shortly after the advent of the SVM, Vapnik et al. [25] proposed a nonlinear soft-spaced support vector machine, C-SVM, which introduced the slack variable ζ and the penalty parameter C to obtain a more accurate classification hyperplane, thus eliminating the interference of external factors.

Since C is a weight value, it only depends on the selection of artificial experience and has no practical theoretical basis. Based on C-SVM, Schölkopf et al. [26] proposed a V-SVM to cancel parameter C, adding parameters V and variables p that can control the number of support vectors and error vectors.

In order to fully reflect the value of parameter V, improve the detection efficiency of DDoS attacks and reduce the failure rate, this paper proposes a detection method of DDoS attacks based on V-SVM.

4.1 Detection Model of DDoS Attack Based on V-SVM

After the network flow is collected, the NSAF feature vector is calculated at a time interval of 2 s to indicate the change of the state before and after the network flow.

\( {\text{T}} = \{ ({\text{x}}_{1} , {\text{y}}_{1} ), \ldots ,({\text{x}}_{\text{l}} , {\text{y}}_{\text{l}} )\} \in ({\text{X}} \times {\text{Y}})^{\text{l}} \), is as the training set, among them \( {\text{x}}_{\text{n}} \in {\text{X}} = {\text{R}}^{\text{D}} \), indicates that the nth sample is a normal flow or an attack flow; \( {\text{y}}_{\text{n}} \in {\text{Y}} = \{ 1, - 1\} ,{\text{n}} = 1, \ldots ,{\text{l}} \), indicates the true label of the nth sample, and l indicates the size of the training set. The initial classification model that substitutes the training set T into the V-SVM can be expressed as:

$$ \begin{array}{*{20}c} {\mathop {\hbox{min} }\limits_{w,b,\xi ,\rho } \left( {\frac{1}{2}\left\| \omega \right\|^{2} - \nu \rho + \frac{1}{l}\sum\limits_{n = 1}^{l} {\xi_{n} } } \right)} \\ {s.t.\quad \; \, y_{n} ((w \cdot x_{n} ) + b) \ge \rho - \xi_{n} ,\xi_{n} \ge 0} \\ {{\text{n}} = 1 , 2 ,\ldots ,l\quad \;\rho \ge 0} \\ \end{array} $$
(4)

Where \( \frac{1}{2}||\omega ||^{2} \) is the idealized maximum classification interval, \( \frac{1}{l} \) is the upper limit of the outliers, \( \xi_{n} \) is the classification interval error, and b is the decision function constant term.

Under the premise of satisfying the Karush-Kuhn-Tucker condition, the initial problem is equivalent to its dual problem. The dual model of Eq. (4) is:

$$ \begin{array}{*{20}c} {\mathop {\hbox{min} }\limits_{\alpha } \frac{1}{2}\sum\limits_{n = 1}^{l} {\sum\limits_{m = 1}^{l} {y_{n} y_{m} \lambda_{n} \lambda_{m} K\left( {x_{n} ,x_{m} } \right)} } } \\ {s.t. \, \quad \, \sum\limits_{n = 1}^{l} {y_{n} } \lambda_{n} = 0, \, 0 \le \lambda_{0} \le \frac{1}{l} \, } \\ {{\text{ n}} = 1 , 2 ,\ldots ,l \, \quad \; \, \sum\limits_{n = 1}^{l} {\lambda_{0} } \ge \nu } \\ \end{array} $$
(5)

Among them, \( \lambda_{n} \) is a lagrange multiplier. In order to ensure the validity of the classification model, the appropriate parameter V and kernel function \( K\left( {x_{n} ,x_{m} } \right) \) should be selected to obtain the optimal lagrange multiplier \( \lambda^{ * } \):

$$ \lambda^{ * } { = }\left( {\lambda_{1}^{ * } , \ldots ,\lambda_{l}^{*} } \right)^{T} $$
(6)

Further solve:

$$ \begin{array}{*{20}c} {w^{*} = \sum\limits_{n = 1}^{l} {\lambda_{n}^{*} y_{n} x_{n} } } \\ {b^{*} = - \frac{1}{2}\sum\limits_{n = 1}^{l} {\lambda_{n}^{*} y_{n} \left( {K\left( {x_{n} ,x_{m} } \right) + K\left( {x_{n} ,x_{k} } \right)} \right)} } \\ \end{array} $$
(7)

Bring (7) into (4) to get the optimal hyperplane in the high dimensional feature space:

$$ f\left( z \right) = {\text{sgn}}\left( {\sum\limits_{n = 1}^{l} {\lambda_{n}^{*} y_{n} K(x_{n} ,x_{m} ) + b^{*} } } \right) $$
(8)

Among them, A is the classification result of the network flow.

4.2 Determination of the Kernel Function of the Classification Model

Most of the classification problems existing in real life are nonlinear. The support vector machine method is to choose a kernel function for nonlinear expansion. According to the actual problem and the research object, choosing the appropriate kernel function often plays a decisive role in the classification effect of the model. Under the conditions of Linear, Polynomial, Radial Basis Function and Sigmoid four different types of kernel functions, the first two dimensions of the partial training set are selected for visualization. The red and green marks represent normal sample points and attack sample points respectively. Black marks indicate support vectors. As shown in Fig. 1, the effect of the kernel function on the decision hyperplane position is visually reflected.

Fig. 1.
figure 1

Schematic diagram of V-SVM classification under different types of kernel functions (Color figure online)

The experimental results based on different types of kernel functions are shown in Sect. 4, and the results are verified by accuracy, detection rate, false negative rate, and run time. As shown in the formula (9), it is a calculation method of the DDoS attack detection false negative rate.

$$ MR = \frac{{N_{FP} }}{{N_{TN} + N_{FP} }} $$
(9)

Among them, NFP is the number of negative samples that are misjudged as positive by the classification model, NTN is the number of negative samples that are judged to be negative by the classification model, and MR is the false negative rate.

4.3 Determination of V-Values of Classification Model Parameters

Although the parameter V controls the number of support vectors and error vectors very well, the specific detection effects may not be achieved due to the differences in the specific objects of the training and the different influences of the features on the classification results. At present, there is no uniform set of best methods for the selection of V values at home and abroad. This paper analyzes the influence of artificial empirical value V on DDoS attack detection based on specific data, so as to select an appropriate V value. The experimental results are shown in Sect. 4.

4.4 Detection Process of Attack Detection Method

As shown in Fig. 2, the data set is first analyzed, the appropriate features are selected, and the training set is normalized and PCA dimensionality reduction processing; The second step is to construct various SVM classification models, and study the process of classification model parameters and kernel function parameters optimization; The third step initializes the training model, introduces the training set to start the training of the model, and obtains the coefficients w and b of the decision function through the Sequential Mini Optimization; Finally, the constructed classification model is tested on a test set containing unknown tags, and the experimental results are analyzed.

Fig. 2.
figure 2

DDoS attack detection process

5 Experimental Results and Analysis

5.1 Experimental Environment

The experimental data set selected in this paper uses the 9-week network traffic [27] collected by the Massachusetts Institute of Technology (MIT) in 1999, based on the MATLAB R2014a platform, combined with the LIBSVM toolbox and the MATLAB language, at 2.60 GHz, Intel Core i5-3230M. The processor and 4G memory are running under the computer to implement the following three classification methods.

5.2 Analysis of Experimental Results of Kernel Function and Parameter V Value

In order to achieve good detection results, this experiment analyzes the choice of nuclear function types. 5,000 and 1000 sample records were randomly sampled from the training set and test set containing multiple types of DDoS attacks (the test set contained unknown attacks). Under the condition that the empirical parameter V is 0.3, experiments are carried out based on different types of kernel functions, and the accuracy, detection rate, false negative rate and running time are obtained. The experimental results are shown in Table 2. Compared with other kernel functions, the RBF kernel function has the highest classification accuracy and the lowest false negative rate.

Table 2. V-SVM classification results for 5000 training samples with different kernel functions

As shown in Table 3, in each set of experiments, the parameter V represents the upper bound of the ratio of the number of error vectors to the total number of samples, and is also the lower bound of the ratio of the number of support vectors to the total number of samples [28, 29]. As the V value increases, the support vector increases by the same proportion. It controls the number of support vectors well. The accuracy and detection rate remain stable, the false negative rate decreases and tends to be stable, but the running time will increase. When the V value is 0.2, the overall performance of the classifier is optimal. The experiment also shows that the selection of the V value is too small or too large will seriously affect the classification effect of the model, resulting in low classification accuracy and even classification error reporting. By analyzing the experimental results, the V value is 0.2 to study the performance of the D-S attack detection model based on V-SVM.

Table 3. V-SVM classification results for 1000 test samples with different V values

5.3 Comparison and Analysis of Three Models

In this experiment, 5 training samples were randomly selected from the training concentration containing normal flow and attack flow in the first 7 weeks, with the sizes of 1000, 2500, 5000, 10000 and 20000 records successively. Samples with sizes of 200, 500, 1000, 2000 and 4000 were randomly selected from the mixed traffic containing normal, known and unknown attacks in the following 2 weeks as the test set.

Through five sets of experiments with different sample sizes, the classification results of C-SVM parameters before and after optimization (C, g) were compared. In the optimization process, as the sample size increases, the grid search method and cross validation will have an impact on the timeliness of the C-SVM algorithm. The detection time of the grid search based C-SVM method is much longer than that of the traditional C-SVM algorithm. However, compared with the empirical parameters, the classification accuracy of the latter is significantly improved, and the false negative rate after parameter optimization is reduced, which proves the effectiveness of the C-SVM method based on grid search. As shown in Tables 4 and 5.

Table 4. Classification results based on RBF kernel function C-SVM
Table 5. Results of C-SVM classification based on grid search

Although the C-SVM classification method based on grid search is improved compared with the traditional C-SVM classification method, the number of error vectors and the number of support vectors are too large or too small, which will result in poor classification of support vector machines and poor anti-interference performance of SVM. In view of the shortcomings of the above two methods, this paper proposes a DDoS attack detection method based on V-SVM. The experimental results are shown in Table 6.

Table 6. Results of V-SVM classification based on RBF kernel function

As shown in Fig. 3, it is the visualization result of the increase of the detection rate with the sample capacity in Tables 4, 5 and 6. The traditional C-SVM detection method is represented by the blue dotted line. It has the lowest classification accuracy and the highest false negative rate. As the data increases, the accuracy of the algorithm tends to be stable. The green dotted line indicates the C-SVM detection method based on grid search. The classification effect is the worst at the beginning. With the increase of traffic, the classification accuracy is improved by 2% to 10%. It solves the problem of difficult parameter selection of traditional C-SVM well, but occupies a large amount of CPU memory and increases the system overhead.

Fig. 3.
figure 3

Comparison of three methods for classification test results (Color figure online)

In DDoS attack detection [30], timeliness is also one of the important indicators to measure the effectiveness of attack detection algorithms. As shown in Fig. 3, the solid red line indicates the V-SVM-based DDoS attack detection method, which successfully avoids the problem of too long running time, saves a lot of computing resources and system overhead, and makes the training time acceptable. Within the scheme, the stability of the algorithm is good, which not only improves the classification accuracy and reduces the false negative rate, but also ensures the stability and timeliness of DDoS attack detection.

Finally, this paper summarizes the performance of the three algorithms, as shown in Table 7.

Table 7. Comparison of the three methods

In summary, the V-SVM-based DDoS attack detection method proposed in this paper not only improves the classification accuracy, reduces the false negative rate, but also ensures the stability and timeliness of the classification model.

6 Conclusion and Future Work

6.1 Conclusion

In view of the diversity, distribution, suddenness and concealment of DDoS attacks and the large scale, mass and complexity of network flows under the new network environment, the current DDoS attack detection methods have some problems, such as high false alarm rate, high false negative rate and poor timeliness, etc., this paper proposes a detection method of DDoS attacks based on V-SVM. The method defines a nine-tuple NSAF to extract features of the network flow to describe the state characteristics of the network flow, and perform Min-Max normalization and PCA dimensionality reduction on the data. The kernel function is used to map the preprocessed samples to the high-dimensional feature space, and then the number of the parameter V control support vector and error vector is introduced, and the classification model based on V-SVM is established to detect attacks. The experimental results show that compared with similar methods, the detection method proposed in this paper not only improves the classification accuracy and reduces the missing report, but also ensures the stability and timeliness of the classification model.

6.2 Future Work

In this paper, theoretical and experimental analysis of DDoS attack detection based on V-SVM is carried out, which proves a lot of excellent performance of SVM in theory and experiment. However, SVM still has shortcomings in terms of data volume, degree of fit, and robustness. There is still room for further improvement.

  1. (1)

    The mode in which the SVM processes data: With the advancement of the Internet of Things era, servers often have to deal with a large amount of data, and cloud technology can solve problems such as insufficient memory and excessive system overhead. Due to the limitations of the SVM itself, the data cannot be processed in parallel like the network flow collection method based on the cloud service model. Therefore, a hybrid model of V-SVM and distributed detection architecture remains to be studied.

  2. (2)

    Regularization of the V-SVM parameter V: In the experiment, the problem of over-learning usually occurs, which results in the unsatisfactory results of the obtained model test and the weakening of the generalization ability. Due to the operability of the V value, it has a different degree of influence on different features. Therefore, it is worth thinking about how to regularize the parameter V to avoid overfitting when DDoS attacks tend to be more professional.

  3. (3)

    The robustness of V-SVM: In the actual network environment, it is often accompanied by external factors such as noise points, network delays or network congestion. In the future research work, how to improve the robustness of the method needs to be continuously explored.