1 Introduction

Intelligent transportation systems (ITS) designate the application of new information and communication technologies to the transport and logistics fields. ITS integrate sensor, control, information and communication technologies to improve road traffic efficiency and safety. Vehicular Ad hoc Networks (VANETs) are a key part of the ITS framework and are sometimes referred to as Intelligent Transportation Networks. Communication between vehicles can be ensured by three modes. The first mode is Vehicle-to-Vehicle (V2V) communication, which consists of a direct exchange between the communicating entities [1]. The second mode is Vehicle-to-Infrastructure (V2I) communication, which requires that all information exchanged between two nodes pass through a previously installed infrastructure. The third mode combines the two previous modes, V2V and V2I [2]. The main objective of this communication is to provide a range of applications that can be divided into two categories: safety-related applications, such as cooperative collision warning (V2V), intersection collision warning, approaching emergency vehicle warning, and work zone warning (R2V) [3]; and comfort-related applications, such as electronic toll collection, data transfer, parking lot payment, and traffic information [4]. These applications help to increase transportation safety and efficiency and to improve driving conditions for drivers and passengers, making our roads safer [5]. However, they also raise privacy and security concerns, which significantly threaten the network operation and user data.

Intrusion detection systems based on machine learning (IDS-ML) are among the approaches known to be effective in protecting networks against cyberattacks [6]. Based on a classification model, the IDS analyses a network flow of data packets to check whether there is suspicious activity in the network, and eventually a cyberattack. Since the effectiveness of the IDS depends on the classification model, it is worthwhile to train the classifier on a realistic and representative dataset of network traffic traces that covers a variety of normal and attack samples. However, to the best of our knowledge, such a dataset has not been produced for vehicular networks so far. The existing datasets [6,7,8,9,10] contain network traffic traces generated and captured by simulators, which may not be realistic and representative of real-environment traffic. Therefore, intrusion detection systems trained on these datasets may not perform effectively in a real environment. In this paper, we generate a new vehicular network traffic traces dataset, VDoS-LRS, which contains real network traffic and diverse types of denial of service (DoS) attacks. We create a realistic testbed environment that considers different types of environment (urban, highway and rural). In addition, we explore a wide range of traffic features for detecting and classifying vehicular traffic. We evaluate the reliability of the VDoS-LRS dataset using different machine learning algorithms for forensic purposes. To be specific, the main contributions of this paper are as follows:

  • A new realistic DoS dataset for vehicular networks, available upon request, with a detailed description of the testbed design and configuration.

  • We evaluate the performance of network forensic methods based on machine learning algorithms using the VDoS-LRS dataset.

The rest of the paper is structured as follows. The literature review is discussed in Sect. 2. In Sect. 3, we present in detail the testbed used to create the VDoS-LRS dataset. Section 4 presents the machine learning algorithms used for the classification process. In Sect. 5, we present and discuss the experimental results. Section 6 concludes the paper and draws some lines for future work.

2 Background and Related Work

This section describes DoS attack scenarios in VANETs. Additionally, it outlines the limitations of the testbeds and datasets proposed in the literature.

2.1 Denial of Service in Vehicular Networks

In the field of computer security, several descriptions of the Denial of Service (DoS) attack can be found. It has been defined by Hasrouny et al. [11] as the attack where “the attacker target the communication medium to cause a channel jam”. DoS is easy to perform; it targets the system’s availability to prevent users from accessing the network. Broadly speaking, DoS attacks can be divided into three types: volume-based attacks (which saturate the bandwidth of the attacked site, like UDP-Flood), protocol-based attacks (which consume actual server resources, like SYN-Flood), and application-layer attacks (composed of seemingly legitimate and innocent requests, like the Slowloris attack). Quyoom et al. [12] investigated different DoS scenarios in the context of vehicular networks. A DoS attack in vehicular networks can target vehicle resources, roadside units (RSUs), and communication channels.

For example, if a normal vehicle attempts to start a TCP connection with an RSU:

  • The vehicle requests the establishment of the connection by sending a SYN (synchronize) message.

  • The RSU responds to the vehicle by sending back a SYN-ACK message.

  • The vehicle sends back an ACK, and the data exchange can start.

In the case of a SYN-Flood (Fig. 1), the malicious vehicle does not respond to the RSU with an ACK, which forces the RSU to wait for the ACK for some time to avoid network congestion. Depending on the number of requests sent by the attacker, the RSU can become temporarily unavailable to communicate with legitimate vehicles. The attacker can also jam the channel, in such a way that vehicles are not able to access the channel and communicate.
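A SYN-Flood leaves many half-open connections, i.e., SYN packets that are never followed by the final ACK of the handshake. As a minimal, illustrative sketch (not part of the VDoS-LRS tooling), the following snippet counts half-open TCP connections in a capture file using Scapy; the capture file name is a hypothetical placeholder, and an unusually large count relative to completed handshakes is symptomatic of a SYN-Flood.

```python
# Minimal sketch: count half-open TCP connections (SYN seen, no follow-up ACK)
# in a packet capture. Assumes Scapy is installed; "capture.pcap" is hypothetical.
from scapy.all import rdpcap, IP, TCP

SYN, ACK = 0x02, 0x10
pending = set()  # connections that sent a SYN but no follow-up ACK yet

for pkt in rdpcap("capture.pcap"):
    if IP not in pkt or TCP not in pkt:
        continue
    flags = int(pkt[TCP].flags)
    conn = (pkt[IP].src, pkt[TCP].sport, pkt[IP].dst, pkt[TCP].dport)
    if flags & SYN and not flags & ACK:       # initial SYN from the client
        pending.add(conn)
    elif flags & ACK and not flags & SYN:     # a later ACK completes the handshake
        pending.discard(conn)

print(f"half-open connections: {len(pending)}")
```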

Fig. 1 SYN-Flood in VANET scenario

2.2 Existing Network Datasets

There has been much research in the literature [6,7,8,9,10, 13,14,15,16,17,18,19,20] on protecting vehicular networks against cyberattacks. Most of it is based on machine learning techniques, which need datasets for analysing network flows and modelling normal and malicious network traffic. Several datasets containing network traffic exist, e.g., KDD99 [21], the CAIDA datasets [22], NSL-KDD [23], ISCX [24], CICIDS2017 [25], and UNSW-NB15 [26], and they have been used by researchers for intrusion detection and forensic analytics. However, all the aforementioned datasets lack vehicular traffic data, which makes them unusable in the vehicular context.

Although some recent studies [6, 7, 9, 10, 18] have produced synthetic vehicular network datasets for intrusion detection purposes, the development of a realistic vehicular network traffic dataset that includes vehicular scenarios is still an unexplored topic. The most recent testbeds and datasets are briefly reviewed below, and a comparison between them and VDoS-LRS is given in Table 1.

Table 1 Synthesis of VANET datasets

Singh et al. [9, 10] proposed a machine learning based approach to detect the Wormhole attack in VANETs. For the production of the dataset, they used the NS3 simulator [27], which uses the mobility traces generated by the SUMO traffic simulator. They used forty vehicles in the testbed with multi-hop communication using the AODV routing protocol. However, in the simulation scenario, all vehicles are assumed to move at the same speed (15 m/s) during the entire simulation time, which is unrealistic in the vehicular context. Moreover, the duration of the simulation (400 s) is too short to reflect real vehicular activity.

Alheeti et al. [7] proposed an ML-based IDS to detect grey hole and rushing attacks in vehicular networks. For the classification, they used SVM and Feed Forward Neural Networks (FFNN). The dataset used to train the IDS was extracted from a trace file generated through simulation. The authors did not provide details about the testbed configuration, which makes their simulation difficult to reproduce. Grover et al. [8] proposed an ML-based approach to classify node behaviour, i.e. whether a communicating node in the vehicular network is honest or malicious. They tested different classification algorithms: Naive Bayes, IBK, J-48, Random Forest and AdaBoost. However, the proposed IDS was trained on a network traces dataset collected from a simulation using the NCTUns-5.0 simulator [28]. In addition, the simulation parameters (a 6 km area and a 2000 s simulation duration) do not reflect a real environment. Lyamin et al. [18] proposed a data-mining-based approach for real-time detection of radio jamming denial-of-service attacks in IEEE 802.11p vehicle-to-vehicle (V2V) communications. However, the proposed system was trained on short training sequences of 5 and 100 s, which do not seem sufficient to train an IDS and cover multiple scenarios of the radio jamming attack.

Belenko et al. [6] tried to develop a method and a tool to generate close-to-reality datasets for intrusion detection in vehicular networks. The network traces dataset was generated using the NS-3 simulator and includes different types of attacks.

It is worth noting that none of the aforementioned works in the literature (Belenko et al. [6], Singh et al. [9, 10], Grover et al. [8], and Alheeti et al. [7]) used real network traces to train the ML-based IDS, knowing that training an IDS with data extracted from a simulator may not be reliable, realistic, or representative of the properties of vehicular networks in real environments. In light of the foregoing, this research aims to design a real vehicular network traffic dataset in order to give a dimension of reliability to DoS attack detection in vehicular networks.

3 Dataset Generation

In this section, we present in detail the steps followed to generate our dataset: testbed configuration, feature extraction, data pre-processing, and feature selection, as shown in Fig. 2.

Fig. 2 Workflow of DoS detection process

3.1 The Proposed Testbed

The testbed was deployed in Annaba, Algeria. We used two vehicles, three physical machines, four virtual machines, two access points, a 4G modem, and two Cisco antennas, as shown in Fig. 3. Due to resource constraints, we used two vehicles hosting multiple virtual machines. The two vehicles (Fig. 4a) were connected in V2V mode using an IEEE 802.11g network provided by two AIR-AP1231G-E-K9 access points (Fig. 4b). The access points were connected to two antennas for coverage extension (a Sharkee GPSB antenna, Fig. 4d, and a Cisco AIR-ANT1949 antenna, Fig. 4c). This network configuration provides the connection between the two physical machines (a Dell computer with an Intel® Pentium® 3558U @ 1.70 GHz processor in the first vehicle and a Dell computer with an Intel® Core i5-4200U CPU @ 1.60–2.30 GHz in the second vehicle).

Fig. 3 Testbed environment of the new VDoS-LRS dataset

Fig. 4 Equipment used in the testbed

The first machine runs the Windows 8 operating system and has two network interfaces (an Ethernet interface with IP address 192.168.0.3 and a Wi-Fi interface with IP address 192.168.8.101). The second machine runs the Windows 7 operating system with IP address 192.168.0.5. Using the Oracle VM VirtualBox tool, the first machine executes an additional Linux system with IP address 192.168.0.4, and the second physical machine (in the second vehicle) executes three other virtual machines, with IP addresses 192.168.0.6, 192.168.0.7 and 192.168.0.8, which run the Kali Linux distribution as their operating system. Each of the three Kali machines represents an attacker that executes a different type of DoS attack; among the other three machines, two act as victims.

As shown in Fig. 3, for the malicious traffic, the three Kali Linux machines (in the second vehicle) run DoS attacks taking as targets the two victim machines (in the first vehicle), with UDP Flood, SYN Flood and Slowloris packets alternately. For the benign traffic, we emulated different VANET services. For example, for the exchange of safety information and collaboration between nodes in VANETs, we used Packet Sender [29], a traffic generator that sends UDP, TCP and SSL packets to multiple clients simultaneously. For file sharing between benign nodes, we set up a file share using the Server Message Block (SMB) protocol, which operates as an application-layer or presentation-layer network protocol. We also considered common user activities such as access to Google Maps, YouTube and social networks, and real-time applications such as video and audio calls. For data collection, we used another Windows 7 physical machine, which runs the Wireshark network traffic analyser [30]. The experimentation was conducted within three different environments: urban, rural, and highway. The generated traffic was stored as pcap files.
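Packet Sender is a standalone tool; purely for illustration, the sketch below shows the kind of periodic benign UDP traffic it generates between nodes. The destination port, sending rate and payload are hypothetical and not taken from the testbed configuration.

```python
# Illustrative sketch of periodic benign UDP traffic between two nodes,
# similar in spirit to what Packet Sender produces. Port, rate and payload
# are hypothetical placeholders.
import socket
import time

DEST = ("192.168.0.3", 5005)   # victim-side listener (port is hypothetical)
payload = b"cooperative-awareness-message"

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for _ in range(100):
    sock.sendto(payload, DEST)  # send one small datagram
    time.sleep(0.1)             # roughly 10 messages per second
sock.close()
```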

We named the produced dataset “VDoS-LRS”, which stands for Vehicular Denial of Service-Networks and Systems Laboratory. The dataset contains normal traffic and malicious DoS traffic. VDoS-LRS takes into account three types of DoS attack. SYN Flood exploits vulnerabilities in the TCP protocol; it consists of massively sending SYN requests to the target vehicle. The goal of this attack is to make the vehicle unavailable to legitimate vehicles by exhausting its resources. UDP Flood, which represents the volume-based category, is an attack in which the attacker overwhelms random ports on the targeted host with IP packets containing UDP datagrams. The receiving host sends back a “Destination Unreachable” packet; as more and more UDP packets are received and answered, the system becomes overwhelmed and unresponsive to other clients. Slowloris, which represents DoS attacks on the application layer, can take down another machine with minimal bandwidth and few side effects on unrelated services and ports, by opening many connections to the target and holding them open as long as possible (using incomplete HTTP requests).

In our testbed, we take into consideration the intrinsic characteristics of VANETs. Three diverse environments have been considered: urban, highway, and rural. Each environment has its own characteristics; for example, the availability of network coverage in the rural environment is not as good as in the urban environment. Regarding vehicle speed, we consider the recommended speed in each environment (an average speed of 40 km/h in the urban environment, 90 km/h on the highway, and 30 km/h in the rural environment).

3.2 Feature Extraction

After collecting the network traffic (as PCAP files) corresponding to the three environments (urban, rural, highway), we extract a set of 79 network features (see Table 2). To extract the features, we use CICFlowMeter [31], a network traffic flow generator distributed by the Canadian Institute for Cybersecurity (CIC). It generates bidirectional flows, where the first packet determines the forward (source to destination) and backward (destination to source) directions. Note that TCP flows are usually terminated upon connection teardown (by a FIN packet), while UDP flows are terminated by a flow timeout. The flow timeout value can be assigned arbitrarily by the individual scheme, e.g., 600 s for both TCP and UDP.

Table 2 List of network features

We consider the same five categories of network features as Lashkari et al. [32], which are based on behaviour, bytes, packets, time, and flow. The behaviour-based features (Table 2 [1]) are used to evaluate an object based on its intended actions before it can execute that behaviour; for example, a connection that lasts for a very long time may be the signature of a DoS attack. The byte- and packet-based features (Table 2 [2–13]) count the number of bytes/packets exchanged; they allow the detection of large and abnormal traffic increases, which are symptomatic of DoS attacks. In addition, the time between the transmission of packets in a DoS attack is very short (especially for TCP Flood and UDP Flood); for this reason, the time-based features (Table 2 [14–17]) can also be revealing of DoS attacks. Moreover, the flow features (Table 2 [18–24]) are like packet-based features but with reduced storage space: instead of working on individual packets, we work on flows of packets. We do not consider source and destination IP addresses and ports, because attackers can change them easily; furthermore, they may mislead the classification model and prevent it from accurately analysing the rest of the features.
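Feature extraction in this work is performed by CICFlowMeter; as an illustration only, the sketch below shows how a few byte-, packet- and time-based features of the kind listed in Table 2 (flow duration, packet count, total bytes, mean inter-arrival time) could be computed with pandas from a per-packet table. The file and column names are hypothetical.

```python
# Illustrative sketch: derive a few flow-level features from a per-packet table.
# Column names (timestamp, src, dst, sport, dport, length) are hypothetical;
# the actual features of VDoS-LRS are produced by CICFlowMeter.
import pandas as pd

packets = pd.read_csv("packets.csv")          # hypothetical per-packet export
flow_key = ["src", "sport", "dst", "dport"]   # one direction of a flow

def flow_features(group: pd.DataFrame) -> pd.Series:
    times = group["timestamp"].sort_values()
    return pd.Series({
        "flow_duration": times.iloc[-1] - times.iloc[0],   # time-based
        "total_packets": len(group),                        # packet-based
        "total_bytes": group["length"].sum(),               # byte-based
        "mean_iat": times.diff().mean(),                     # mean inter-arrival time
    })

features = packets.groupby(flow_key).apply(flow_features).reset_index()
print(features.head())
```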

3.3 Data Pre-processing and Feature Selection

For the data pre-processing step, we performed data cleaning and normalization. To check for missing values and deal with them, we used the dropna() function from the pandas library, which removes rows or columns of a dataframe that contain NaN or missing values. Moreover, to deal with the huge differences in magnitude, units, and range in the generated dataset, we applied feature scaling, which brings all features to a comparable range in order to make them more consistent with each other and to make the training step less sensitive to these differences. To apply this technique to our dataset, we used the StandardScaler function from the sklearn library, which standardizes each feature to zero mean and unit variance.
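The sketch below summarizes this pre-processing step, assuming the extracted features have been exported to a CSV file with a label column; the file and column names are hypothetical.

```python
# Minimal pre-processing sketch: drop rows with missing values, then scale the
# numeric features. File and column names are hypothetical placeholders.
import pandas as pd
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("vdos_lrs_urban.csv")   # hypothetical exported feature file
df = df.dropna()                          # remove rows containing NaN values

X = df.drop(columns=["Label"]).values     # the 79 network features
y = df["Label"].values                    # benign / attack labels

scaler = StandardScaler()                 # zero mean, unit variance per feature
X_scaled = scaler.fit_transform(X)
```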

We used two types of feature selection algorithms in this research. The first one is forward selection [33], which belongs to the wrapper class. Forward selection is an iterative algorithm that starts with an empty set of features; in each iteration, it adds the feature that best improves the model, until adding a new feature no longer improves the performance of the model. The forward feature selection algorithm reduced the number of features from 79 to 10, as shown in Table 3.
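The paper does not fix a particular implementation of forward selection; one possible realization with scikit-learn's SequentialFeatureSelector is sketched below (X_scaled and y come from the pre-processing sketch above, and the choice of estimator is an assumption).

```python
# Possible realization of wrapper-style forward feature selection.
# The estimator and its settings are assumptions, not the paper's exact setup.
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.tree import DecisionTreeClassifier

selector = SequentialFeatureSelector(
    DecisionTreeClassifier(),      # model used to score candidate feature subsets
    n_features_to_select=10,       # forward selection retained 10 features
    direction="forward",
)
selector.fit(X_scaled, y)
X_forward = selector.transform(X_scaled)   # reduced from 79 to 10 features
```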

Table 3 The selected features

The second feature selection algorithm is the Linear Support Vector Classifier (LinearSVC) [34], which belongs to the embedded feature selection algorithms. The fitted LinearSVC assigns a weight to each feature through its coef_ (or feature_importances_) attribute; features whose weights fall below a threshold are considered unimportant and are discarded. The threshold can be set using built-in heuristics specified as a string argument (e.g., “mean”). LinearSVC reduced the number of features from 79 to 25, as shown in Table 3.
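In scikit-learn this embedded selection is typically done by fitting a LinearSVC and keeping the features whose coefficients pass a threshold via SelectFromModel; the sketch below assumes that combination, and the hyper-parameters are assumptions rather than the paper's exact settings.

```python
# Embedded feature selection sketch: fit a linear SVM and keep the features
# whose coefficients pass the threshold. Hyper-parameters are assumptions.
from sklearn.svm import LinearSVC
from sklearn.feature_selection import SelectFromModel

lsvc = LinearSVC(C=0.01, penalty="l1", dual=False, max_iter=5000).fit(X_scaled, y)
selector = SelectFromModel(lsvc, prefit=True, threshold="mean")
X_embedded = selector.transform(X_scaled)   # in the paper, 79 -> 25 features
print(X_embedded.shape)
```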

The best results were achieved when using the subset of features given by the LinearSVC (as shown in the results section).

4 Classification Algorithms

In this section, we briefly present the machine learning algorithms used in our experimentation. We used the following algorithms for their known efficiency and classification performance: Naive Bayes, support vector machine (SVM), k-nearest neighbours (KNN), random forest, and decision trees.

  • Naive Bayes Algorithm

Naive Bayes is a simple probabilistic classifier. It assumes that, given the class, the attributes of an example are independent of each other, an assumption usually known as the “naive Bayes assumption”. It is based on Bayes’ theorem: the classifier calculates the posterior probability of each class using estimates obtained from a training set of labelled data. When a new data point is presented for classification, the posterior probability is calculated for each class, and the example is assigned to the class with the largest posterior probability [35].

$$\begin{aligned} P(c \mid x) & = \frac{P(x \mid c)\,P(c)}{P(x)} \\ P(c \mid x) & = P(x_1 \mid c) \cdot P(x_2 \mid c) \cdots P(x_n \mid c) \cdot P(c) \end{aligned}$$

P(c|x) is the posterior probability of class (target) given predictor (attribute).

P(c) is the prior probability of class.

P(x|c) is the likelihood, which is the probability of predictor given class.

P(x) is the prior probability of predictor.

  • Support Vector Machine (SVM)

SVM algorithms are widely used for supervised classification problems; the key idea of this classification technique is to find a hyperplane that distinctly separates the data points [36]. Algorithmically, the algorithm builds boundaries between the instances of the dataset by maximizing the margin between classes. When the data cannot be separated by a linear function in the original feature space, a kernel transforms the feature space so that a linear separation becomes possible in the new space; SVM then uses the concept of distance to find the best margins and the separating hyperplane in that new feature space.

  • K-Nearest Neighbour

The k-nearest neighbours algorithm (k-NN) is a non-parametric method used for classification and regression [37]. The decision rule in KNN classification is very simple: it uses the minimum distance from the query instance to the training samples to determine the K nearest neighbours. After gathering the K nearest neighbours, it takes a simple majority vote among them as the prediction of the query instance. Formally:

$$score(D, C_i) = \sum_{D_j \in KNN(D)} Sim(D, D_j)\,\gamma(D_j, C_i)$$

Above, KNN(D) denotes the set of K nearest neighbours of the query instance D, \(Sim(D, D_j)\) is the similarity between D and the neighbour \(D_j\), and \(\gamma(D_j, C_i)\) is the membership indicator of \(D_j\) with respect to class \(C_i\), which is:

$$\gamma(D_j, C_i) = \begin{cases} 1, & D_j \in C_i \\ 0, & D_j \notin C_i \end{cases}$$

The test instance D is assigned the class with the highest resulting weighted score.

  • Decision Trees

The concept of this algorithm is to split the dataset according to a criterion that maximizes the separation of the data, which results in a tree-like structure [38]. A more intuitive view of the decision tree algorithm is that it breaks the dataset down into smaller and smaller subsets while an associated decision tree is incrementally developed; each decision node has two or more branches, and each leaf node represents a decision (class label). There are several algorithms to build a decision tree: CART (Classification and Regression Trees), which uses the Gini index as its splitting metric, and the one used in this paper, ID3 (Iterative Dichotomiser 3), which uses the entropy function and information gain as metrics. To create a tree, we need a root node, which in ID3 is the node with the highest information gain. In order to define information gain precisely, we first define a measure commonly used in information theory, called entropy, which characterizes the (im)purity of an arbitrary collection of examples. The entropy function H(S) can be represented formally as follows (a short computation sketch is given after the definitions below):

$$H(S) = \sum_{c \in C} -P(c)\,\log_2 P(c)$$

S—The current data set for which entropy is being calculated.

C—Set of classes in S.

P(c)—the proportion of the number of elements in class c to the number of elements in set S.
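For illustration, the short sketch below computes the entropy H(S) of a label vector and the information gain of a candidate split, following the definitions above; the tiny example data are hypothetical.

```python
# Entropy and information gain as used by ID3, following the definitions above.
import numpy as np

def entropy(labels):
    """H(S) = sum over classes c of -P(c) * log2 P(c)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, left, right):
    """Entropy reduction obtained by splitting `labels` into `left` and `right`."""
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted

# Tiny worked example: a perfectly pure split of 4 attack / 4 normal samples
labels = ["attack"] * 4 + ["normal"] * 4
print(entropy(labels))                                    # 1.0 bit
print(information_gain(labels, labels[:4], labels[4:]))   # 1.0 bit gained
```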

  • Random Forest

Random forest, as its name implies, consists of a large number of individual decision trees that operate as an ensemble. Each individual tree in the random forest outputs a class prediction, and the class with the most votes becomes the model’s prediction.

In this paper, we used the algorithms mentioned above with their default parameters, except for the SVM, for which we used a non-linear kernel, the Radial Basis Function (RBF), because it performs well in practice and is relatively easy to calibrate compared to other kernels.
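As a sketch under the assumptions above (default scikit-learn parameters, RBF kernel for the SVM, and the 60/40 train/test split described in Sect. 5), the five classifiers could be trained and compared as follows; X_embedded and y come from the earlier sketches.

```python
# Sketch of the classification stage: five classifiers with default parameters,
# except the SVM, which uses the RBF kernel, on a 60/40 train/test split.
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X_train, X_test, y_train, y_test = train_test_split(
    X_embedded, y, train_size=0.6, test_size=0.4, random_state=0)

classifiers = {
    "Naive Bayes": GaussianNB(),
    "SVM (RBF)": SVC(kernel="rbf"),
    "KNN": KNeighborsClassifier(),
    "Decision tree": DecisionTreeClassifier(),
    "Random forest": RandomForestClassifier(),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, accuracy_score(y_test, clf.predict(X_test)))
```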

5 Results and Discussion

In this section, we present and discuss the results of DoS attack detection through two experiments. In the first experiment, we perform binary classification (attack/normal). The second experiment aims to identify the type of the DoS attack (SYN-Flood, Slowloris, or UDP-Flood). Tables 4 and 5 present the sample distributions corresponding respectively to the binary and multiclass classification. For both experiments, we tested several configurations with different sets of network features: (1) the initial features set; (2) the forward selection features set; (3) the LinearSVC features set; and (4) the set of features common to forward selection and LinearSVC. We used 60% of the dataset for training and 40% for testing.

Table 4 Dataset samples distribution for binary classification
Table 5 Dataset samples distribution for multiclass classification

To evaluate the performance of the proposed DoS detection approach, we used the following performance metrics (a short computation sketch follows the list):

  • Accuracy: the ratio of number of correct predictions to the total number of input samples:

    $$Accuracy = \frac{TP + TN}{TP + FP + TN + FN}$$

    TP: true positive; TN: True negative; FP: false positive; FN: false negative.

  • Precision: represents the proportion of the relevant items among all the items proposed:

    $$Precision = \frac{TP}{TP + FP}$$
  • Recall: is the fraction of relevant instances that have been retrieved over the total amount of relevant instances:

    $$Recall = \frac{TP}{TP + FN}$$
  • F1 score: is defined as the harmonic mean between precision and recall. It is used as a statistical measure to rate performance:

    $$f1 - score = \frac{{2\left( {precision*recall} \right)}}{precision + recall}$$
  • False positive rate (FPR) is the proportion of all negatives that still yield positive test outcomes:

    $$FPR = \frac{FP}{TN + FP}$$
  • False negative rate (FNR): is the proportion of positives that yield negative test outcomes with the test:

    $$FNR = \frac{FN}{TP + FN}$$
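For the binary case, these metrics can be computed directly from the confusion matrix, as in the sketch below; y_test and y_pred stand for the ground-truth and predicted labels of any of the classifiers above.

```python
# Binary-case computation of the metrics above from a confusion matrix.
from sklearn.metrics import confusion_matrix

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()  # y_pred from any classifier

accuracy  = (tp + tn) / (tp + fp + tn + fn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1_score  = 2 * precision * recall / (precision + recall)
fpr       = fp / (tn + fp)   # false positive rate
fnr       = fn / (tp + fn)   # false negative rate
```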

5.1 Experiment 1: Binary Classification

In this experiment, we evaluate the performance of the classifier on classifying a network connection as legitimate or a DoS attack. The classification performances corresponding to the three types of environment (highway, rural, and urban) are presented respectively in Tables 6, 7 and 8. For each environment, we present the classification performances corresponding to the initial features set, and the best set of selected features.

Table 6 Binary classification results in highway environment
Table 7 Binary classification results in rural environment
Table 8 Binary classification results in urban environment

Broadly speaking, the experimental results show that the tree-based algorithms yield the best classification performance in the three types of environment. Decision trees and random forests are known for their high accuracy and stability. On the other hand, the Naive Bayes algorithm performs worse than the other considered algorithms, showing the highest false positive rates. We can see that feature selection slightly degraded the performance of the decision tree and the SVM in the highway environment by increasing their false positive and false negative rates, which is not the case in the rural and urban environments, nor for the other classification algorithms.

The decision tree classifier yields the best accuracy with the lowest false positive and negative rates in the highway environment.

The random forest algorithm, in contrast, outperforms all the other classifiers in the rural and urban environments. The random forest algorithm with the best set of selected features performs better than with the initial features set. This is mainly due to the fact that some features can be noisy, redundant or irrelevant, which may mislead the algorithm.

From the obtained results, we can see that the detection precision of malicious traffic is slightly better than that of normal traffic. This can be explained by the huge volume of superfluous requests sent by the attacker: the packet sizes and the inter-packet times are completely different from those of a normal connection. The classification accuracy dropped slightly in the highway environment; this may be due to the high speed of the vehicles.

From these experiments, we notice that the environment slightly affects the feasibility of denial of service attacks in the vehicular context and their detection performance.

Table 9 Multiclass classification results in highway environment
Table 10 Multiclass classification results in rural environment
Table 11 Multiclass classification results in urban environment

5.2 Experiment 2: Multiclass Classification

As in the binary classification, the tree-based algorithms show the best classification performance, with similar accuracy scores. However, the decision tree outperforms the random forest in terms of false positive and false negative rates in the highway and urban environments. Random forest achieved its best performance with the LinearSVC selected features set. Naive Bayes gives the worst results, with the lowest accuracy score and the highest false positive and negative rates in the three environments and with both feature sets. As we can notice, the environment slightly influences the feasibility of denial of service attacks and their detection performance (Tables 9, 10, 11).

Regarding the detection precision of each attack type, we can notice that the SYN-Flood and UDP-Flood attacks are relatively easier to detect than the Slowloris attack by all the classifiers. This can be explained by the fact that the Slowloris modus operandi is quite different from the other DoS attacks: Slowloris requires minimal bandwidth and does not flood the server; the attacker periodically sends subsequent headers for each request but never completes it, in contrast to SYN-Flood and UDP-Flood attacks, which require maximum bandwidth. These differences make the Slowloris DoS harder to detect compared to SYN-Flood and UDP-Flood.

6 Conclusion

In this paper, we proposed a new dataset, VDoS-LRS, which includes normal vehicular ad hoc network traffic and traffic from various types of denial of service attacks. This dataset was generated and labelled based on a realistic testbed, which takes into consideration three types of environment (urban, rural and highway). To characterize the network flows, we extracted from the raw network traffic traces a set of 79 features related to session behaviour, exchanged bytes/packets, and time intervals. We then carried out feature selection to obtain the best set of features before the training and testing phases. Afterwards, we evaluated the performance of the initial and selected feature sets with five common machine learning algorithms. The experimental results showed that the decision tree classifier yields the highest accuracy with the lowest false positive and negative rates in the three environments. A detection system based on this classifier would be able to detect a DoS attack in a VANET, as well as its type, with an accuracy of 99%. We observed that the environment slightly affects the feasibility and detection performance of DoS attacks; in the highway environment, the classifier needs all the features to give the same performance as in the other environments. In future work, we will consider more denial of service scenarios along with other attacks in VANETs. In addition, we plan to extend our testbed to include more vehicles.