1 Introduction

The Constrained Application Protocol (CoAP) is a web-like transfer protocol specifically designed to facilitate communication at the application layer for energy-constrained IoT devices [25]. CoAP operates over the User Datagram Protocol (UDP) and adheres to the Representational State Transfer (REST) architectural framework. CoAP’s architecture comprises two distinct layers: (1) the message layer and (2) the request/response layer. The message layer is responsible for managing communication over the UDP protocol, while the request/response layer transmits the corresponding messages, using specific codes to mitigate and circumvent functional issues, such as message loss [17, 21].

One of the notable advantages of CoAP is its ability to seamlessly integrate with HTTP, thus enabling integration with existing web infrastructure while satisfying the specific demands of constrained environments. This integration is achieved by addressing specialized requirements such as support for multicast communication, minimization of overhead, and simplicity in constrained settings. The primary purpose of CoAP is to facilitate machine-to-machine (M2M) communication, particularly in domains like smart energy and building automation. In addition to the request/response interaction model, the protocol encompasses built-in discovery mechanisms for services and resources, aligning itself with the fundamental concepts of the Web [2, 20].

The security of CoAP primarily relies on the implementation of the Datagram Transport Layer Security (DTLS) protocol at the transport layer. DTLS ensures confidentiality, integrity, and non-repudiation of information and services [23, 30]. However, not all IoT devices and environments can make use of DTLS, due to the computationally expensive cryptographic operations it requires or the additional bytes needed for message encryption and integrity checks. These requirements produce higher energy consumption, reduced network throughput, and increased latency, which can negatively impact the overall performance of the IoT network [4, 22]. A number of research efforts are being conducted to find lighter DTLS implementations or new techniques that can be used jointly with these cryptographic approaches. One such technique is the development of model-based intrusion detection systems, which can help secure IoT environments while relieving devices from the burden of the task [9, 11]. These models can be based on simple known rules; however, such rules cannot solve a categorization problem. As a result, the classifier implementation must go through a process of learning from a set of training items.

This work is devoted to developing a convenient model-based IDS for detecting DoS attacks in IoT scenarios, trying to find the best techniques to achieve this objective. Amplification-based DoS attacks are among the most common and dangerous ones, as they can be performed even in secured, encrypted scenarios where DTLS is used [6, 18]. The objective of this work is to obtain a good detector for this specific kind of attack. This approach introduces a scalability issue, as the model training should be performed for each different IoT ecosystem where it is needed. Also, the approach is only useful for amplification-based DoS attacks, so different models should be trained for detecting other anomalies, if needed. Currently, a number of efforts are being carried out to achieve more generic and zero-day attack detectors [8, 13, 24], but these solutions are more prone to both false positives and false negatives. They may also require more computational resources and processing time.

One-class classification refers to a specific situation in which the classifier must distinguish between a known class (target class) and an unknown class (non-target class) [10, 28]. One-class classifiers can be implemented through three approaches: using density estimation functions to approximate the system behavior, delimiting the boundaries of the target set, or applying reconstruction methods. The latter builds a model from the training data that minimizes the reconstruction error; objects from the non-target class then lead to a high reconstruction error, thus facilitating outlier detection [15].

The paper is divided into a number of sections. The case study, which details the particular IoT CoAP ecosystem and dataset being used, is described in the second section. The third section discusses the techniques that will be used, including information on the autoencoder, K-Means, and Principal Component Analysis (PCA). The experiments that were carried out and their outcomes are covered in section four. Section five addresses the results of the set of experiments, while the findings and recommendations for further research are presented in the concluding section.

2 Case Study

In the preceding section, the general features of CoAP were discussed, along with the cybersecurity challenges it faces. This section delves deeper into the workings of CoAP and elaborates on the implementation of a DoS attack.

CoAP follows a client/server model, much like HTTP, and makes use of the REST architectural style for communication: resources are identified by unique URLs and can be manipulated through methods such as GET, POST, PUT, and DELETE. Additionally, CoAP includes the “Observe” functionality [5], which allows a client to keep track of a server resource. The server delivers periodic updates of the resource to registered clients, enabling bidirectional communication among devices.

To generate a dataset, a testing environment is deployed to create genuine traffic within the CoAP framework, wherein a DoS attack is performed to assess the protocol’s weaknesses, as delineated in RFC 7252 [25]. This environment comprises a “Node.js” server furnished with the “node-coap” library to support the CoAP protocol. A “DHT11” sensor is interfaced with a “NodeMCU” board, which is programmed using the “ESP-CoAP” library [19] to deliver temperature and humidity services. A JavaScript client presents the sensor data on the terminal, while a pair of “Copper4Cr” clients facilitate the dispatching of requests and the reception of responses [16].

A DoS attack will be executed on the CoAP protocol within the devised environment, with the objective of generating a valuable dataset to aid in the identification of anomalies in the protocol and the mitigation of such threats.

When a request is received, CoAP servers produce a response packet. The size of this response can be considerably larger than the request, due to CoAP’s capability to transmit the payload in multiple blocks of various sizes, including remarkably small ones during an attack. This characteristic makes CoAP clients susceptible to Denial of Service (DoS) attacks [29].

Fig. 1. CoAP environment with DoS attack

An attacker can launch an amplification attack by provoking a denial of service and falsifying the victim’s IP address as the source address in a request packet. This action prompts the server to dispatch a larger packet directed at the victim. To perform this attack within the environment, the attacker poses as a Copper4Cr client, spoofing the client’s IP address, leading the server to reply to their requests instead of the legitimate client’s. To impede the client’s service, the attacker adjusts the response packet to employ exceptionally small block sizes. As a consequence, the server is compelled to deliver an increased volume of response packets to the client. Figure 1 shows how the DoS attack is carried out in the CoAP environment.

All traffic is meticulously captured with the intent of procuring a pcap file, which is subsequently employed to analyze the frames of the generated traffic and extract universally shared fields among them. These fields encompass system times, relative capture times, and all fields pertinent to the CoAP protocol. The frames are labeled in accordance with their timestamp at the time of capture, signifying whether they correspond to a DoS attack or typical traffic.

The dataset collected for this study contains three types of fields: frame level fields, CoAP protocol fields, and a particular “type” field used to identify frames under DoS attack. The frame level fields allow for easy pattern recognition in the generated data. The CoAP protocol fields provide information specific to the frames using this protocol and can be found in the CoAP section of the Wireshark Display Filter Reference. The “type” field is used to indicate the type of attack and frames under DoS attack are labeled with the “DoS” tag. The dataset is stored in a CSV file and contains a total of 30,319 frames, with 21,269 frames representing normal traffic and 9,050 frames representing traffic under attack.

3 One-Class Reconstruction Methods

This section describes the different reconstruction methods applied to the training set to develop anomaly detection. It is important to emphasize that only information about normal operations is registered.

3.1 Autoencoders

An autoencoder is a type of unsupervised neural network based on the dimensional reduction or compression of information, which is later decompressed to recreate the original input data so that the final representation is as close as possible to the original one. Figure 2 shows the architecture of an autoencoder network, which presents two stages:

  • Coding stage: it is made up of the input layer, in which the data is entered; one or more hidden layers of dimensional reduction; and a final bottleneck layer, which holds a compressed representation of the original data.

  • Decoding stage: from the bottleneck layer, the information is decompressed by passing it through one or more hidden layers until it reaches the output layer, which has the same dimension as the network’s input.

The hidden bottleneck layer contains a number \(h_{auto}\) of hidden neurons [28, 31]. Once the network is trained, it is assumed that test instances that do not belong to the target set will present a large reconstruction error. It is calculated through Eq. 1, where \(f_{auto}(p;w)\) represents the reconstructed output of the network for input p and weights w.

$$\begin{aligned} e(p) = \parallel f_{auto}(p;w)-p\parallel ^2 \end{aligned}$$
(1)
Fig. 2. Autoencoder topology
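The reconstruction error of Eq. 1 can be illustrated with a minimal sketch. The paper does not specify the framework used, so the following numpy-only linear autoencoder (the helper names `train_autoencoder` and `autoencoder_error`, the batch gradient descent training, and the absence of activation functions are all illustrative assumptions, not the actual implementation):

```python
import numpy as np

def train_autoencoder(X, h, epochs=1000, lr=0.05, seed=0):
    """Tiny linear autoencoder X -> h -> X trained by batch gradient descent.
    W1 is the coding stage, W2 the decoding stage (no activations, for brevity)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (d, h))   # encoder weights
    W2 = rng.normal(0.0, 0.1, (h, d))   # decoder weights
    for _ in range(epochs):
        Z = X @ W1                      # bottleneck representation
        E = Z @ W2 - X                  # reconstruction residual
        gW2 = Z.T @ E / len(X)
        gW1 = X.T @ (E @ W2.T) / len(X)
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1, W2

def autoencoder_error(p, W1, W2):
    """Eq. 1: squared norm of the difference between the input and its reconstruction."""
    return float(np.sum((p @ W1 @ W2 - p) ** 2))
```

A test instance far from the subspace learned from the target set yields a noticeably larger error than an in-distribution one, which is the quantity the classifier thresholds.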

3.2 K-Means

K-Means is an unsupervised algorithm commonly used in machine learning problems [27, 28]. This clustering algorithm is based on the distances between objects to determine their memberships. It assumes that the data is grouped into a number of clusters, which must be selected by the user, and that the clusters can be characterized by a series of prototype objects \(\theta _k\) located by minimizing the function in Eq. 2. These prototypes create a partition of the entire feature space.

$$\begin{aligned} \varepsilon _{K-means}=\sum _i\left( \min _k\parallel x_i-\theta _k\parallel ^2\right) \end{aligned}$$
(2)

The use of K-Means for one-class purposes lies in the calculation of the reconstruction error. Once the different centroids are determined, the reconstruction error of a test instance p is the minimum squared Euclidean distance from the object to its closest prototype, as shown in Eq. 3.

$$\begin{aligned} e(p)=\min _k \parallel p-\theta _k\parallel ^2 \end{aligned}$$
(3)
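A minimal sketch of Eq. 3, assuming plain Lloyd's algorithm for locating the prototypes (the helper names `kmeans_fit` and `kmeans_error` are illustrative, not from the paper):

```python
import numpy as np

def kmeans_fit(X, k, iters=100, seed=0):
    """Plain Lloyd's algorithm: returns the k prototype objects (centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # assign each object to its nearest prototype
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            if np.any(labels == j):     # keep a centroid if its cluster empties
                centroids[j] = X[labels == j].mean(0)
    return centroids

def kmeans_error(p, centroids):
    """Eq. 3: squared distance from p to its closest prototype."""
    return float(((centroids - p) ** 2).sum(1).min())
```

Instances far from every centroid receive a large reconstruction error and are flagged as non-target once a threshold is applied.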

An example of K-Means application to a two-dimensional dataset is shown in Fig. 3. The target set is separated into two clusters. The test point \(p_1\) is considered anomalous, since the distance to its nearest centroid (blue cross) is greater than the maximum distance of the blue points to that centroid. On the contrary, \(p_2\) belongs to the target set because it is closer to the green cross than many green points.

Fig. 3. K-Means vs K-Centers classifier (Color figure online)

3.3 Principal Component Analysis

Principal Component Analysis (PCA) is a statistical method commonly applied to analyze multivariate data. Its use extends to many branches, from dimensional reduction, clustering, and classification problems to anomaly detection. PCA focuses on finding the relationship between data by obtaining the orthonormal subspace that reflects the greatest variation among the different variables [1, 26, 28]. To do so, the eigenvectors of the covariance matrix of the training data with the largest eigenvalues are calculated, and a base \(\mathcal {W}\) of the subspace they generate is built, onto which the data will be projected. For a test object p, the reconstruction error is calculated to check whether it belongs to the target data set. To do this, its projection onto the subspace is calculated first (see Eq. 4). The reconstruction error is then the difference between the original and projected points, as shown in Eq. 5.

$$\begin{aligned} p_p=\mathcal {W}(\mathcal {W}^T\mathcal {W})^{-1}\mathcal {W}^Tp \end{aligned}$$
(4)
$$\begin{aligned} e(p)=\parallel p - p_p\parallel ^2 \end{aligned}$$
(5)
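Eqs. 4 and 5 can be sketched as follows, assuming the data has already been mean-centered (the helper names `pca_fit` and `pca_error` are illustrative):

```python
import numpy as np

def pca_fit(X, k):
    """Return an orthonormal base W of the k principal directions."""
    Xc = X - X.mean(0)                       # center the training data
    cov = Xc.T @ Xc / len(X)
    vals, vecs = np.linalg.eigh(cov)         # eigh returns ascending eigenvalues
    return vecs[:, np.argsort(vals)[::-1][:k]]

def pca_error(p, W):
    """Eqs. 4 and 5: project p onto the subspace and measure the residual."""
    p_p = W @ np.linalg.inv(W.T @ W) @ W.T @ p   # Eq. 4 (identity when W is orthonormal)
    return float(np.sum((p - p_p) ** 2))
```

Since the base returned here is orthonormal, the \((\mathcal {W}^T\mathcal {W})^{-1}\) factor reduces to the identity; it is kept to mirror Eq. 4 literally.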

By default, the k eigenvectors with the largest eigenvalues are commonly used, although it is also possible to use the eigenvectors with the smallest eigenvalues. Figure 4(a) shows an example of how two principal components are represented in two dimensions. In this case, PC1 corresponds to the greater eigenvalue. To check whether a test point p (red square) belongs to the target class, the square of the distance d from that point to its projection \(p_p\) on the principal component (purple square) is calculated, as shown in Fig. 4(b). The point belongs to the target set if this value is below the limit established during the training stage. Otherwise, it is considered as non-target [12, 14].

Fig. 4. Example of PCA for anomaly detection

4 Experiments

The dataset is divided into two general groups: normal operation and Denial of Service situations. Since the training stage is based only on data from normal operations, a k-fold iteration is followed to validate the classifiers. Figure 5 shows an example with three folds, where the green sets are used to learn the target class patterns, and the gray and yellow sets represent normal and DoS operation, respectively. These last two sets determine the performance of each technique and configuration during the classifier test.

Fig. 5. Example of k-fold with \(k=3\)
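The splitting scheme above can be sketched as follows. The fold construction (hypothetical `one_class_folds` helper) is an assumption about the procedure: only normal traffic enters the training folds, while every test set mixes a held-out normal fold with all the attack frames.

```python
import numpy as np

def one_class_folds(n_normal, n_attack, k=3, seed=0):
    """Yield (train, test_normal, attack) index triples for one-class k-fold
    validation: the classifier trains only on normal-traffic folds; each test
    combines the held-out normal fold with the DoS frames."""
    rng = np.random.default_rng(seed)
    normal = rng.permutation(n_normal)                   # shuffled normal indices
    attack = np.arange(n_normal, n_normal + n_attack)    # attack frames come last
    folds = np.array_split(normal, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i], attack
```

Each iteration thus evaluates both the false-positive behavior (held-out normal fold) and the detection capability (attack frames) of the trained model.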

The well-known Area Under the Receiver Operating Characteristic Curve (AUC) measurement is used to assess each classifier configuration’s performance. The true positive and false positive rates are combined to obtain the AUC, which gives a useful idea of the classifier performance. From a statistical perspective, this number represents the probability that a random positive event will be classified as positive [7]. Furthermore, AUC has the advantage of not being sensitive to class distribution, particularly in one-class tasks, when compared to other measures like sensitivity, precision, or recall [3]. The times needed to train the classifier and to label a new test sample are also registered as a measure of the classifier’s goodness.
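The AUC can be estimated directly from the reconstruction errors of both test groups through the rank-based (Mann-Whitney) formulation. This sketch is illustrative and, for brevity, does not average the ranks of tied scores:

```python
import numpy as np

def auc(errors_normal, errors_attack):
    """Probability that a randomly chosen attack sample receives a higher
    reconstruction error than a randomly chosen normal sample."""
    scores = np.concatenate([errors_normal, errors_attack])
    labels = np.concatenate([np.zeros(len(errors_normal)),
                             np.ones(len(errors_attack))])
    order = np.argsort(scores)                 # note: ties broken arbitrarily
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

A perfect separation of attack errors above normal errors yields an AUC of 1, while a fully inverted ordering yields 0.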

Furthermore, to seek the best classifier performance, the algorithms were trained with raw data, with a 0 to 1 normalization and with a Z-Score normalization. Besides these three different configurations, several hyperparameters were swept for each technique.

  • Autoencoders: the number of neurons in the hidden layer was tested from 1 to 11, so that the coding stage always performs a dimensional reduction. Furthermore, the threshold is tested considering different anomaly percentages in the target set: 0%, 5%, 10%, 15%.

  • K-Means: the number of clusters in which the data is grouped was tested from 1 to 15, with the same anomaly percentages for the threshold: 0%, 5%, 10%, 15%.

  • PCA: the number of components was tested from 1 to 11, that is, up to the number of features minus one, again with threshold anomaly percentages of 0%, 5%, 10%, 15%.
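The threshold sweep over the assumed anomaly percentages (0%, 5%, 10%, 15% of the target set) amounts to placing the decision limit at a percentile of the training reconstruction errors; a sketch under that assumption (the helper names are illustrative):

```python
import numpy as np

def fit_threshold(train_errors, outlier_pct):
    """Place the decision threshold so that `outlier_pct` percent of the
    training reconstruction errors fall above it."""
    return float(np.percentile(train_errors, 100.0 - outlier_pct))

def is_anomaly(error, threshold):
    """A test instance is flagged when its error exceeds the threshold."""
    return error > threshold
```

With a 0% fraction, the threshold sits at the maximum training error, so no training sample is flagged; larger fractions trade false positives on normal traffic for higher sensitivity to attacks.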

5 Results

Table 1 summarizes the results achieved by the Autoencoder classifiers. It is important to remark that the training times are greater than those obtained with K-Means and PCA. This fact is especially significant when the dataset is not preprocessed. However, the raw data leads to the best Autoencoder results, with 8 neurons in the hidden layer and considering 15% of the training data as outliers.

Table 1. Results achieved by Autoencoders classifiers

The greatest AUC values correspond to a K-Means classifier with 13 clusters, which achieves an AUC of 82.72% (Table 2). Significant differences are also noticeable in the training times between classifiers. They derive from the number of clusters: a greater number of clusters results in a slower training process.

Table 2. Results achieved by K-Means classifiers

Finally, the PCA results are shown in Table 3, with Z-score normalization, seven principal components, and a 10% outlier fraction as the best configuration, although it does not surpass the K-Means results.

Table 3. Results achieved by PCA classifiers

6 Conclusions and Future Works

The significance of security in IoT devices has grown in recent times, making it increasingly important to employ systems that model network behavior in order to help identify and categorize emerging attack methods, particularly in industrial processes and communication protocols such as CoAP. Early detection of these attacks is essential to ensure the resilience of these processes.

In the current study, a DoS CoAP dataset was employed to develop a model based on the training data with the aim of minimizing the reconstruction error. As a result, instances belonging to the non-target class yield high reconstruction errors, making it easier to detect outliers. The best performing approach was the implementation of a K-Means classifier with 13 clusters.

Future research will explore the application of the set of methods studied in this paper to other CoAP datasets, which consist of Man-in-the-Middle and Cross-Protocol family attacks.