
1 Introduction

Nowadays, the Internet-of-Things (IoT) has matured from its origin as a research concept to commercial products and real deployments, such as current smart cities [1]. In particular, large-scale IoT pilots are the most interesting and recent topic in IoT innovation [2]. These pilots involve very complex systems and architectures comprising a great number of components (devices, services, execution engines, etc.). This complexity facilitates the appearance (deliberate or not) of malicious components: those which provide uncertain data, services or information. In general, IoT architectures try to merge very different devices and other components, which may be (and usually are) geographically sparse and conceptually very distant [3]. Thus, low-level information must be collected, transformed, aggregated and translated several times (using, for example, semantic technologies) before being sent to the high-level final applications. However, no meta-information about the underlying hardware platform (such as sensor sensitivity) or other low-level components is provided to the high-level layers.

Therefore, final applications have very limited knowledge about the system and almost no control over the infrastructure which provides them with operation data [4]. In this context, extremely important quantities, such as the uncertainty level associated with the received data or the real quality of the offered services, cannot be estimated using traditional configuration algorithms, as these require special control information to compute their results [5]. Instead, new solutions based only on the information available at high level are necessary.

The objective of this paper is to describe a statistical framework for knowledge discovery in order to estimate the uncertainty level associated with the operation data received by a certain application. Additionally, these results are used as input to a reputation model focused on locating the malicious components. Thus, if possible, final applications may discard information from these components.

The rest of the paper is organized as follows: Sect. 2 describes the state of the art on uncertainty management and reputation models; Sect. 3 includes the mathematical formalization of the proposed framework and reputation model; Sect. 4 presents an experimental validation based on a simulated scenario in order to test the performance of the proposed solution; Sect. 5 contains the experimental results; and Sect. 6 concludes the paper.

2 Related Works

Uncertainty management is one of the key problems in IoT scenarios; however, little work has addressed this topic. Most papers, moreover, focus on how uncertainties affect control loops and algorithms. In this case, they tend to study Cyber-Physical Systems [6] (sometimes understood as a specific IoT scenario), which may be described using finite difference equations influenced by unknown discrete functions representing the uncertainties [7]. Additionally, some works [8] propose a mathematical framework to calculate the optimal reaction for cancelling the effect of uncertainties in control loops.

On the other hand, typical works on uncertainty management in IoT scenarios try to measure the influence of factors that designers know they do not know (noise, packet losses, etc.). Thus, uncertainty taxonomies [9], modeling [10] and processing [11] are quite common. Nevertheless, factors that designers do not know they do not know are never addressed, and this kind of factor is the most important in large-scale IoT systems. Thus, a more general framework is required.

Papers on uncertainty level estimation in IoT scenarios are scarce, and research works on this topic usually talk about trust levels. In any case, there is also little work on trust management for IoT environments. Furthermore, most of these works are based on the concept of reputation.

Some works try to establish the definition of reputation [12] in IoT systems, and apply the model in a horizontal way (inside the underlying sensor network, in order to calculate the reputation of nodes) [13]. These models, however, only consider Quality-of-Service (QoS) trust metrics such as packet forwarding/delivery ratio and energy consumption. More general models have also been proposed [14], including philosophical concepts such as honesty, cooperativeness, or community-interest.

If vertical solutions are considered, works on service reputation can also be found [15]. In particular, there are models based on the evaluation of users’ trust in a service and service classification [16], and models considering authentication history and penalty [17]. However, none of these models considers the theoretical formalization described previously, and concepts such as service honesty are not defined.

In comparison with all the cited works, our proposal formalizes the concept of uncertainty level in IoT scenarios using statistical tools. Additionally, definitions initially proposed for sensor networks are extended to services and other IoT components. As a result, malicious components are efficiently located, and policies to remove or avoid them are enabled.

3 Formalization of the Proposed Solution

Malicious components are those which provide uncertain data, services or information. This behavior may be due to a cyber-attack, bad programming or a hardware malfunction (among other reasons). In order to avoid severe damage to the entire system, these components must be isolated. With this objective, the concept of “reputation of an IoT entity” is defined.

The reputation of an IoT component \( \Sigma \), \( {\mathcal{R}}_{\Sigma } \), is defined as the global perception of its behavior in the system; in particular, whether transactions involving this component generally present positive outcomes. As can be seen, reputation is a global concept, so all components in the system should be involved in its estimation. However, a global definition of reputation may be impractical, thus we also define the concept of local reputation.

The \( \Lambda - \)local reputation of an IoT component \( \Sigma \), \( {\mathcal{R}}_{\Sigma } |_{\Lambda } \), is the local perception of the behavior of the component \( \Sigma \) held by a certain system component \( \Lambda \); in particular, whether transactions involving both components generally present positive outcomes.

Then, it is trivial to deduce that the relation between \( {\mathcal{R}}_{\Sigma } \) and \( {\mathcal{R}}_{\Sigma } |_{\Lambda } \) is the one indicated in (1), where \( {\mathfrak{C}} \) is the set of all components in the system and \( \lambda_{\Lambda } \) the relative weight of \( {\mathcal{R}}_{\Sigma } |_{\Lambda } \).

$$ {\mathcal{R}}_{\Sigma } = \mathop \sum \limits_{{\Lambda \in {\mathfrak{C}}}} \lambda_{\Lambda } \cdot {\mathcal{R}}_{\Sigma } |_{\Lambda } $$
(1)

The \( \Lambda - \)local reputation of an IoT component \( \Sigma \) is calculated as the weighted sum of three values: the nobleness \( {\mathcal{N}} \), the solidarity \( {\mathcal{S}} \), and the relevance \( {\mathcal{R}e} \) of the component \( \Sigma \) as perceived by the component \( \Lambda \) (2).

$$ {\mathcal{R}}_{\Sigma } |_{\Lambda } = \left[ {\begin{array}{*{20}c} \alpha & \beta & \gamma \\ \end{array} } \right] \cdot \left[ {\begin{array}{*{20}c} {\mathcal{N}} \\ {\mathcal{S}} \\ {{\mathcal{R}e}} \\ \end{array} } \right] $$
(2)

Thus, a component \( \Sigma \) is categorized as malicious by a certain component \( \Lambda \) if its \( \Lambda - \)local reputation falls below a certain threshold \( \mu_{\Lambda } \) (a solution which, in practical applications, should be complemented with, for example, a token-based danger detection technology; see Sects. 5 and 6). Otherwise, it is classified as a regular component. Malicious components should be avoided and, in that way, they would tend to be isolated as time passes.
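As a minimal Python sketch of this machinery (the function names, weight values and threshold are our own illustrative assumptions, not taken from the paper), Eqs. (1) and (2) and the threshold test can be written as follows:

```python
# Sketch of Eqs. (1)-(2) and the threshold test. All numeric values
# below are illustrative assumptions.

def local_reputation(nobleness: float, solidarity: float, relevance: float,
                     alpha: float = 1.0, beta: float = 0.0,
                     gamma: float = 0.0) -> float:
    """Eq. (2): weighted combination of nobleness, solidarity and relevance."""
    return alpha * nobleness + beta * solidarity + gamma * relevance

def global_reputation(local_reps: dict[str, float],
                      weights: dict[str, float]) -> float:
    """Eq. (1): weighted sum of the local perceptions of all components."""
    return sum(weights[c] * r for c, r in local_reps.items())

def is_malicious(local_rep: float, mu: float = 0.5) -> bool:
    """A component is flagged as malicious by an observer when its local
    reputation falls below the observer's threshold mu_Lambda."""
    return local_rep < mu
```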

Although these concepts may be applied to any pair of IoT components in a system or architecture, in this work we focus on the reputation that final applications perceive about their primary data sources (i.e., mainly, services and hardware devices).

In order to evaluate the local reputation, the three parameters which compose it must be defined. For that, four different scenarios may be distinguished (see Fig. 1), depending on two independent criteria.

Fig. 1. Possible scenarios for the local reputation calculation

  • Transmission mode: Bidirectional communications generate a more complex model, as two data flows have to be considered. In the simplest case, data only flow from the information source to the final application (unidirectional communications).

  • Data aggregation: On some occasions, the data received by a certain application are obtained by aggregating the flows from various components (many-to-one scheme). In that case, measurements reflect a global vision, and estimating the individual reputation is more complicated than in one-to-one schemes (where data are generated by only one component).

For this first work, a reduced model is considered: only unidirectional communications based on a one-to-one scheme are addressed. Additionally, in this first study, it is assumed that \( \beta = \gamma = 0 \) and \( \alpha = 1 \), so the value of the local reputation matches the nobleness value. The next subsection describes the calculation of this parameter.

3.1 Uncertainty Level Estimation: Nobleness Calculation

The nobleness \( {\mathcal{N}} \) of an IoT component \( \Sigma \), according to a second component \( \Lambda \), is defined as the expectation of \( \Lambda \) of obtaining correct information from \( \Sigma \). This information may be production data or meta-information (such as the QoS offered by a certain service).

Nobleness calculation is based on previous experiences, which are weighted in order to limit the effect of events very distant in time. Moreover, it is difficult to establish the nobleness of a component based on only a few interactions. Thus, a threshold \( N_{th} \) must be defined in order to accumulate the required number of measurements before starting to estimate the nobleness value.

Equation (3) represents the mathematical model for the nobleness calculation, where \( n \) is the number of accumulated nobleness measurements and \( h \) is the weighted ratio of the number of times the component behaved nobly (i.e. it sent correct information).

$$ {\mathcal{N}} = \left\{ {\begin{array}{*{20}c} {1 \,;\, n < N_{th} } \\ {\sqrt 2 \frac{h}{{\sqrt {1 + h^{2} } }} \,;\, n \ge N_{th} } \\ \end{array} } \right. $$
(3)

As can be seen, nobleness follows an algebraic function belonging to the sigmoid class. Thus, \( {\mathcal{N}} \in \left[ {0, 1} \right] \forall h \in \left[ {0, 1} \right] \) (note that \( h = 0 \) yields \( {\mathcal{N}} = 0 \), while \( h = 1 \) yields \( \sqrt 2 /\sqrt 2 = 1 \)). Moreover, the model includes the presumption of nobleness, as every component is considered sincere (\( {\mathcal{N}} = 1 \)) until enough measurements are collected. In order to calculate the weighted ratio \( h \), we consider a geometric sum (4), where the common ratio \( r \) can be freely fixed (the influence of past behavior vanishes as \( \left| r \right| \to 0 \)).

$$ h = \mathop \sum \limits_{j = 0}^{n} u[ - j] \cdot r^{j + 1} $$
(4)

The sequence \( u[ \cdot ] \) represents the natural ratio of the number of times the component behaved honestly in every time slot. \( u[j] \) is defined (5) as the quotient between the number of times the component provided correct information in the j-th time slot, \( p_{j} \), and the total number of transactions in that slot, \( t_{j} \).

$$ u\left[ j \right] = \frac{{p_{j} }}{{t_{j} }} $$
(5)

In order to determine whether an IoT component has provided correct information in a certain transaction, we evaluate the uncertainty level \( \theta \) associated with the provided information. If this level remains below a certain threshold \( \mu_{h} \), the component is considered to have been honest. In some cases, the detected uncertainty may not be caused by the analyzed component. However, from the final application’s point of view, the provided information is uncertain and, in an aggregated view, the component is not honest.
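The following minimal Python sketch puts Eqs. (3)-(5) together (the values of \( N_{th} \) and \( r \), and the slot bookkeeping, are our own assumptions); note that choosing \( r \le 0.5 \) keeps \( h \), and hence \( {\mathcal{N}} \), within \( [0, 1] \):

```python
import math

# Sketch of Eqs. (3)-(5). `history` holds the per-slot honesty ratios
# u[j] = p_j / t_j, most recent slot first; N_th and r are free
# parameters (values here are illustrative assumptions).

def honesty_ratio(p_j: int, t_j: int) -> float:
    """Eq. (5): correct transactions over total transactions in one slot."""
    return p_j / t_j

def weighted_ratio(history: list[float], r: float = 0.5) -> float:
    """Eq. (4): geometric weighting; old slots fade as |r| -> 0."""
    return sum(u * r ** (j + 1) for j, u in enumerate(history))

def nobleness(history: list[float], N_th: int = 10, r: float = 0.5) -> float:
    """Eq. (3): presumption of nobleness until N_th measurements are
    accumulated, then a sigmoid-class algebraic function of h."""
    if len(history) < N_th:
        return 1.0
    h = weighted_ratio(history, r)
    return math.sqrt(2) * h / math.sqrt(1 + h ** 2)
```

For instance, a component that was honest in roughly 9 of every 10 transactions in each recent slot yields \( h \approx 0.9 \) and \( {\mathcal{N}} \approx 0.95 \), while a per-slot ratio of 0.4 yields \( {\mathcal{N}} \approx 0.53 \).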

In order to evaluate the uncertainty level \( \theta \) we use the following statistical model. Figure 2 represents the scenario under study. A final application receives a certain piece of information \( \overline{\overline{x}} \) from an information source (an IoT component). In large-scale IoT systems, the uncertainty level associated with \( \overline{\overline{x}} \) is the sum of two contributions.

Fig. 2. Scenario under study

First, the uncertainties \( {\mathfrak{T}}_{IT} \) about the equivalence between the received information \( \overline{\overline{x}} \) and the information generated by the information source \( \overline{x} \). In this first case, the relation between both data may be described as a surjective stochastic application \( T[ \cdot ] \), as every piece of information \( \overline{\overline{x}} \) must be the image of some information \( \overline{x} \). These uncertainties are caused by the IoT infrastructure, so they are IT uncertainties.

Second, the uncertainties \( {\mathfrak{T}}_{PHY} \) about the equivalence between the information generated by the information source \( \overline{x} \) and the real information existing in the physical world \( x \). These uncertainties are caused by physical limitations in the information capture. Therefore, they are physical uncertainties.

Thus, associated with a received piece of information \( \overline{\overline{x}} \) there exists an enumerable set of uncertainty sources \( {\mathfrak{T}} = \left\{ {{\mathfrak{T}}_{PHY} , {\mathfrak{T}}_{IT} } \right\} = \{ i_{k} , k = 1, \ldots ,K\} \), whose cardinality \( K \) may reach the cardinality of the natural numbers \( \aleph_{0} \). These uncertainty sources turn the process of acquiring a certain piece of information \( \overline{\overline{x}} \) into a random experiment \( \varepsilon \), which takes values from the discrete sample space \( \Omega \).

Each uncertainty source is described as a bivariate stochastic process (6), where \( \Psi \) is the sample space of all possible values of the uncertain event.

$$ i_{k} \,{ \asymp }\, {\rm X}_{k} [m ; \omega , \psi ) / \omega \in\Omega , \psi \in\Psi $$
(6)

Stochastic processes are discrete in time, as final applications are cyber components; but the sample space \( \Psi \) may be continuous or discrete, depending on the nature of the uncertainty source. For example, the measurement error has a continuous nature, while the possibility of suffering a cyber-attack is described by a discrete variable. Furthermore, in general, uncertainty values change rather slowly, so these stochastic processes may be considered stationary during a time slot.

As they represent unknown effects, the stochastic processes are expressed in a parametric way, each depending on a certain parameter \( \vartheta_{k} \). Three basic probability density functions or probability distributions may be used to describe uncertainty sources: uniform distributions, triangular distributions and Gaussian distributions (see Fig. 3).

Fig. 3. Basic probability density functions

Uniform distributions are employed to describe unknown effects whose influence is limited to the range \( [ - \vartheta_{k} , \vartheta_{k} ] \). In data acquisition processes this is the most general distribution, as the sample space is bounded. Triangular distributions are employed when, besides the variation range, it is known that the probability of an error decreases as its magnitude increases. Finally, if more information is available (for example, if noise is considered), a Gaussian distribution with standard deviation \( \vartheta_{k} \) can be employed.
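As a brief sketch (the function names and explicit normalizations are ours; each density is governed by a single parameter \( \vartheta_{k} \), as described above), the three basic error models can be written as:

```python
import math

# The three basic error models of Fig. 3, each parameterized by a
# single value theta (standard densities; function names are ours).

def uniform_pdf(x: float, theta: float) -> float:
    """Uniform on [-theta, theta]: only the variation range is known."""
    return 1.0 / (2.0 * theta) if -theta <= x <= theta else 0.0

def triangular_pdf(x: float, theta: float) -> float:
    """Symmetric triangle on [-theta, theta]: larger errors are less likely."""
    return max(0.0, (theta - abs(x)) / theta ** 2)

def gaussian_pdf(x: float, theta: float) -> float:
    """Gaussian with standard deviation theta, e.g. for measurement noise."""
    return math.exp(-(x ** 2) / (2.0 * theta ** 2)) / (theta * math.sqrt(2.0 * math.pi))
```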

Considering \( {\mathcal{F}} \) the power set of \( \Omega \), and a function \( P\left( A \right), A \in {\mathcal{F}} \), which fulfills Kolmogorov’s three probability axioms, the experiment \( \varepsilon \) is completely characterized by the event algebra \( E = \left\langle {\Omega , {\mathcal{F}}, P( \cdot )} \right\rangle \), which (additionally) is a \( \sigma - \)algebra.

In this context, it is possible to create a partition \( \Pi = \{ \pi_{1} , \ldots ,\pi_{p} \} \) of \( \Omega \), and select a main value \( \delta_{i} \in \pi_{i} \) representing each cluster. Then, the process to estimate the uncertainty level \( \theta [j] \) in the j-th time slot is as follows.

When a certain piece of information \( \overline{\overline{x}} \) is received, it is included in the observation vector \( v_{j} \) of the current time slot (7).

$$ v_{j} = \left( \overline{\overline{x}}_{1} , \ldots ,\overline{\overline{x}}_{m} \right) $$
(7)

As these observations are independent, it is possible to calculate the value of the \( \vartheta_{k} \) parameter for each uncertainty source using the maximum likelihood estimation (MLE) algorithm [18] and the vector \( v_{j} \). This method is the most adequate, as prior probability distributions are unknown. Then, the event \( \pi_{i} \) to which the received information belongs is located. For each uncertainty source \( i_{k} \), the probability \( \rho_{k}^{j} \) that the received information really belongs to the event \( \pi_{i} \) in that time slot is calculated (8).

$$ \rho_{k}^{j} = \int_{\pi_{i} } {\rm X}_{k} [j \,; \,\overline{\overline{x}} , \psi )\, d\psi \quad{\text{or}}\quad \rho_{k}^{j} = \sum\nolimits_{{\psi \in \pi_{i} }} {{\rm X}_{k} [j \,;\, \overline{\overline{x}} , \psi ]} $$
(8)

In that way, as the uncertainty sources are also independent, the global probability \( \rho^{j} \) that \( \overline{\overline{x}} \) belongs to \( \pi_{i} \) is calculated as a product of probabilities (9).

$$ \rho^{j} = \mathop \prod \limits_{k = 1}^{K} \rho_{k}^{j} $$
(9)

Finally, the information \( \delta_{i} \) is considered to be received with an uncertainty level \( \theta [j] \) calculated as indicated in (10).

$$ \theta \left[ j \right] = 1 - \rho^{j} $$
(10)

In this transaction, the information source is considered to have been honest if it meets the condition explained above (\( \theta \left[ j \right] < \mu_{h} \)).
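The following Python sketch strings Eqs. (7)-(10) together for a single time slot, assuming uniform error models for all sources; the partition cell, the MLE estimator and every numeric value are illustrative assumptions, not the paper's configuration:

```python
# Uncertainty-level estimation, Eqs. (7)-(10), sketched for uniform
# error models. All values are illustrative assumptions.

def mle_uniform(observations: list[float], x_received: float) -> float:
    """MLE of theta for a uniform error on [x - theta, x + theta]:
    the largest absolute deviation observed in the slot's vector v_j."""
    return max(abs(x - x_received) for x in observations)

def uniform_cell_mass(x_received: float, theta: float,
                      cell: tuple[float, float]) -> float:
    """Eq. (8), continuous case: mass the uniform model centered on the
    received value assigns to the partition cell pi_i containing it."""
    lo, hi = cell
    a, b = x_received - theta, x_received + theta
    if b <= a:                        # degenerate: no spread observed
        return 1.0
    return max(0.0, min(hi, b) - max(lo, a)) / (b - a)

def uncertainty_level(x_received: float, cell: tuple[float, float],
                      thetas: list[float]) -> float:
    """Eqs. (9)-(10): independent sources multiply; theta[j] = 1 - rho^j."""
    rho = 1.0
    for theta in thetas:
        rho *= uniform_cell_mass(x_received, theta, cell)
    return 1.0 - rho

# Example: received value 21.0 in cell [20.0, 22.0]; one source fitted
# by MLE on the slot's observations, a second with a wider fixed spread.
v_j = [20.9, 21.1, 21.2, 20.8]
theta_hat = mle_uniform(v_j, 21.0)                    # 0.2
theta_j = uncertainty_level(21.0, (20.0, 22.0), [theta_hat, 1.5])  # ~0.33
honest = theta_j < 0.5               # mu_h = 0.5, an assumed threshold
```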

4 Experimental Validation

In order to evaluate the proposed solution, an experimental validation was designed. The proposed experiment consisted of a simulation scenario, where a large-scale IoT system was deployed (Fig. 4).

Fig. 4. Simulation scenario

The simulated scenario (based on a real European deployment) included four different, geographically sparse networks of information sources (public sensors, LoRa devices, etc.). Each network was composed of one hundred and fifty (150) components. The configuration of ten (10) randomly selected components was altered to force them to provide uncertain information (and thus to be considered malicious). Basically, the precision of the instruments was reduced, electromagnetic interference and packet losses were intensified and, in one case, an intruder was assumed to control the component (causing it to provide erroneous information). One final application was hosted in the FI-WARE ecosystem.

The simulated final application included the proposed uncertainty level calculation algorithm and the described reputation model (expressions from (1) to (10)). The other entities could be measured, but they were not provided with the reputation calculation module.

In order to perform the proposed simulation, the NS3 simulator was employed. NS3 is a network simulator whose scenarios and behavior are controlled and described by means of C++ programs, and which is extensively used in research due to its flexibility. Data about the number of transactions performed by the malicious nodes were collected. Moreover, the evolution of the local reputation of the malicious nodes was monitored.
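As a rough, self-contained illustration of the experiment's logic (deliberately not the NS3/C++ setup; all probabilities, thresholds and slot counts are our own assumptions), a toy Python simulation reproduces the qualitative behavior reported in Sect. 5:

```python
import math
import random

# Toy re-implementation of the experiment's logic (NOT the NS3 setup):
# honest sources answer correctly ~95% of the time, malicious ones ~40%;
# the application stops querying a source once its nobleness (Sect. 3.1)
# falls below the threshold. All numeric values are assumptions.

def nobleness(history, N_th=5, r=0.5):
    if len(history) < N_th:          # presumption of nobleness, Eq. (3)
        return 1.0
    h = sum(u * r ** (j + 1) for j, u in enumerate(history))
    return math.sqrt(2) * h / math.sqrt(1 + h ** 2)

def simulate(honest_prob, slots=50, queries_per_slot=20, mu=0.6):
    history, transactions = [], []
    for _ in range(slots):
        if nobleness(history) < mu:
            transactions.append(0)   # isolated: measurements stop updating
            continue
        ok = sum(random.random() < honest_prob for _ in range(queries_per_slot))
        history.insert(0, ok / queries_per_slot)     # newest slot first
        transactions.append(queries_per_slot)
    return transactions

random.seed(1)
print(simulate(0.95)[-5:])   # honest source: still transacting at the end
print(simulate(0.40)[-5:])   # malicious source: isolated (all zeros)
```

Once the nobleness falls below the threshold, the source stops being queried, its history freezes and it remains isolated, which is the same end state observed for the ten malicious components in the simulation.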

5 Results

Figure 5(a) presents the evolution of the number of transactions performed by the malicious components as time passes. As can be seen, at first the number of transactions remains constant but, from some instant between \( t = 80 \) and \( t = 200 \) min, the number of transactions in every malicious node descends slowly but continuously, following an exponential-like law. This exponential tendency is explained by the fact that, once the reputation goes below the established threshold, final applications immediately look for other information sources to obtain their data. Depending on the uncertainty of the received data (see Expression (9) and Fig. 3), the descent phase can start later or be very abrupt from the beginning. Moreover, components performing a low number of transactions require more time to be isolated, as more time is needed to acquire the necessary information about their behavior. In any case, all ten malicious components were eventually isolated, so the objective of the proposed solution is met.

Fig. 5. Simulation results

On the other hand, the correspondence between the number of performed transactions and the calculated reputation can be seen in Fig. 5(b). In this figure a group of sigmoid functions is shown, as the nobleness calculation model is described using these functions. In this case, the reputation level descends steadily from the initial instant, as malicious components behave in a disingenuous way from the beginning. In comparison, the number of performed transactions remains constant for every component until its reputation goes below the fixed threshold. This fact must be taken into account, as there is a convergence time during which malicious components may affect the entire system. The descent stops when no transactions are performed by the component (see Fig. 5(a)) and the measurements are no longer updated.

6 Conclusions and Future Works

In this paper we have described a mathematical framework for data uncertainty level calculation, which allows final applications to estimate the nobleness and the reputation of the information sources and other IoT components in large-scale systems and architectures. Results showed that malicious components get isolated as time passes when the proposed technology is deployed.

On the other hand, the proposed reputation model may be enriched with additional measurements (such as the component solidarity) in order to accelerate the isolation process. Future works should consider these ideas (also exposed in Sect. 3), as well as the possibility of including a token-based rapid hazard detection system (in order to avoid severe damage during the convergence time). Finally, both the formalization of the solution and the simulation scenario describe the basic problem, so future works should extend the proposed framework to scenarios where bidirectional communications and data aggregation are present.