1 Introduction

Recent years have seen a proliferation of Internet of Things (IoT) devices intended for consumers’ homes [7]. Owners are transforming their homes into Smart Home Systems (SHSs) with various Internet-connected sensors, lights, and appliances that can sense and actuate in the physical environment. Several SHS automation platforms (e.g., SmartThings) are now available on the market. These platforms provide a new level of convenience by enabling consumers to automate the control of their SHS devices by installing and delegating authorization to third-party applications (called IoT apps) [9]. To do so, SmartApps use simple Trigger-Action rules, where the control action on a given device is performed only when the triggering event has occurred [3]. For instance, a ‘Welcome Home’ SmartApp sets the mode to Home when the light in the living room is turned on.
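The Trigger-Action model can be sketched in a few lines of code. The sketch below is purely illustrative: the names `Rule`, `handle_event`, and `welcome_home` are our own, not part of the SmartThings API.

```python
# Minimal sketch of the Trigger-Action automation model behind a SmartApp
# such as 'Welcome Home'. All names here are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rule:
    trigger: str               # event that fires the rule, e.g. "light.on"
    action: Callable[[], str]  # control command executed on the trigger

def handle_event(event: str, rules: List[Rule]) -> List[str]:
    """Execute the action of every rule whose trigger matches the event."""
    return [r.action() for r in rules if r.trigger == event]

# 'Welcome Home': when the living-room light turns on, set the mode to Home.
welcome_home = Rule(trigger="livingRoomLight.on", action=lambda: "mode.home")
print(handle_event("livingRoomLight.on", [welcome_home]))  # ['mode.home']
```

A non-matching event (e.g., a door opening) simply produces no action, which is the "only when the triggering event has occurred" semantics described above.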

While SHS automation is supposed to execute regularly, intentional or unintentional issues can make SmartApps deviate from their regular behavior, putting the SHS owner’s security at risk and creating unsafe or damaging conditions. First, poor configuration by novice SHS users (e.g., parents and kids) at the installation stage of SmartApps can transition the SHS to unsafe physical states due to the conflicting logic of common SmartApps [11]. For example, suppose the SHS owner installs a new SmartApp while an already installed SmartApp is listening to the action executed on the device controlled by the new SmartApp. As a consequence, this device will be unexpectedly controlled once the new SmartApp is triggered [3]. Second, SmartApps provided by different third-party developers may contain programming faults leading to faulty, hence unexpected, functioning of the SmartApps. Lastly, the Trigger-Action model of SHS platforms gives an attacker the flexibility to embed malicious logic into SmartApps using available triggering events (e.g., a home mode change) [4]. The activation of malicious logic makes the SmartApp deviate from its past regular behavior, since it starts to perform unexpected automation actions.

Recent works have proposed enforcing policies that describe the security and safety properties of regular SHS operation [2, 3, 8, 14]. In particular, the adherence of the SHS automation control to the properties defined by the policy is continuously checked, and the control commands causing policy violations are blocked. Unfortunately, the pre-definition of the policy is the major problem facing this type of system. First, general-purpose policies (e.g., defined by security experts) are not personalized and may not suit all SHS automation configurations. Moreover, SHS users often do not know exactly what to expect from the system when acquiring it; thus, if the policy definition is left to the users themselves, security and safety properties may not be well defined. Consequently, there is a growing need for a new security model that is personalized and supports self-learning.

Given that SmartApps leverage the Trigger-Action automation model, they follow frequent patterns when triggered by occurring events to control the SHS devices. On one hand, the occurring events result from the daily living activities of the home’s inhabitant (e.g., opening a door). Because the inhabitant tends to follow frequent patterns when living and performing various activities inside the home (e.g., returning home from work every evening), triggering events also occur in the same particular patterns. On the other hand, since SHS devices are operated by these events (e.g., opening the door triggers the light to turn on), SmartApps also control the devices in a recurrent pattern. This set of regular patterns can be described by several behavioral features, such as the occurrence probability of an event or the probability of a SmartApp controlling a given device when triggered by an occurred event. Thus, any deviation of the SHS automation control from such regular behavior can be detected by analyzing the SmartApps’ behavior.
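The two behavioral features just named can be estimated with simple counting. The toy log below is fabricated for illustration and does not reproduce the paper's models:

```python
# Toy estimation of two behavioral features from a control log:
# P(event) and P(device | event). The log entries are illustrative.
from collections import Counter

log = [  # (triggering_event, controlled_device) pairs
    ("door.open", "hall_light"),
    ("door.open", "hall_light"),
    ("door.open", "thermostat"),
    ("motion.kitchen", "kitchen_light"),
]

events = Counter(e for e, _ in log)
p_event = {e: n / len(log) for e, n in events.items()}  # occurrence probability
pairs = Counter(log)
p_dev_given_evt = {(e, d): n / events[e]                # P(device | event)
                   for (e, d), n in pairs.items()}

print(p_event["door.open"])                           # 0.75
print(p_dev_given_evt[("door.open", "hall_light")])   # ≈ 0.667
```

A command that scores low under both probabilities (a rare event, or a device this event rarely controls) is exactly the kind of deviation the framework is designed to flag.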

Tracing and then assessing the activities of users and entities in a cyber system is better known as User Entity Behavior Analysis (UEBA) [13]. UEBA is a self-learning approach that leverages Anomaly Detection (AD) algorithms. The basic idea is to first build a baseline model of the regular conduct of users and entities. Deviations from this baseline can then be further analyzed accordingly. In this paper, we leverage UEBA and AD to devise a framework for securing SHS automation control based on the behavior analysis of SmartApps. Such an approach was already leveraged in our prior work to authenticate SHS users and prevent devices from unauthorized control [1]. In this work, however, the SHS user is not included in the behavior analysis and the only monitored entity is the SmartApp. Our proposed framework ensures the three following properties:

  • Personalization and self-learning: the framework automatically builds a One-Class Support Vector Machine (OCSVM) model for each installed SmartApp in the SHS without any intervention from its owner.

  • Continuity: based on a set of behavioral scores that can be calculated and assessed, the behavior of an IoT app is evaluated during its entire lifecycle.

  • Trust-based verification: a confidence score for each installed SmartApp is calculated and evaluated to ban those showing steadily anomalous behavior.

This paper makes the following contributions:

  • We are the first to apply behavior-based anomaly detection to secure SHS automation control.

  • Extraction of new Behavioral Scores to monitor and evaluate the SmartApp-based automation control of SHS devices.

  • We are the first to adapt history data of the manual control of appliances and objects by inhabitants inside a home environment to serve as a history of automation app-based control of SHS IoT devices, remedying the lack of such data in public repositories.

  • Experimental results validate that such a behavior-based approach is a promising security scheme to integrate into existing commercial SHS platforms.

The rest of the paper is organized as follows. In Sect. 2, we discuss related work. The design of our proposed framework is explained in Sect. 3. Section 4 presents the experimental evaluation of the Anomaly Analyzer, one of the core modules of the framework. Finally, Sect. 5 concludes this paper and outlines some future directions.

2 Related Work

As summarized in Table 1, several works have recently been proposed to secure SHS automation control from malicious control through the enforcement of policies that describe the security and safety preferences of the SHS owner [2, 3, 8, 14].

Table 1. Summary of related work

Tian et al. proposed SmartAuth, an authorization policy-based system that learns about the SmartApps’ actual functionality by analyzing their source code and the descriptions provided by developers [14]. The discrepancies between the SmartApps’ descriptions and their programmed logic are then pointed out and displayed to the user through an automatically generated interface. After that, SmartAuth retrieves the user’s explanation and approval for the extracted discrepancies using natural-language-generation techniques. Once a user sets his/her policy settings through the user interface, SmartAuth enforces the policy by blocking unauthorized commands. Celik et al. proposed Soteria, a model-checking-based system to verify whether installed SmartApps adhere to security and safety properties. The enforced properties are a set of systematically developed policies that represent the physical behavioral specifications of users’ expectations about the safe and secure behavior of an SHS [2]. IoTGuard, another policy-based authorization system, retrieves SmartApp information (e.g., events and actions) at runtime and stores it in a dynamic model consisting of transitions and states [3]. The dynamic model represents the runtime execution behavior of the SmartApp. Using reachability analysis, this model is then evaluated against the same policies used by Soteria [2]. Recently, Ibrahim et al. proposed an automated technique to derive actionable security rules from security standard recommendations (e.g., the OWASP IoT Security Guidance) [8]. The extracted policy is then translated into a formal language to detect policy violations using formal techniques such as Boolean satisfiability (SAT) solving.

Although the proposed systems add design and security features beyond the authorization models existing in current SHS automation platforms (e.g., the SmartThings permission model), they suffer from a major problem related to the pre-definition of the security policy. Indeed, general-purpose policies as proposed in [2, 3], and [8] are not personalized and may not suit all SHS automation configurations. Moreover, as observed with SmartAuth [14], users may not be able to accurately explain their specific security preferences.

To overcome these issues, we propose to build self-learning Machine Learning (ML) models that summarize the automation behavior of SmartApps by learning their patterns of triggering events and controlled devices. The historical regular behavior of the SmartApps is then used to analyze future automation commands and discriminate between their legitimacy and maliciousness. Such an approach is personalized for each SHS and offloads the manual definition of a security policy from the user, since only minor intervention is needed.

3 Proposed Framework

In this section, we first provide an overview of the operation of our proposed framework and present the SHS platform which we consider as a use case. Then, we explain the operation of the framework in detail.

3.1 Overview

Figure 1 shows the operational architecture of our proposed framework, which includes two stages. Before the framework starts securing SHS devices from unexpected SmartApp automation control, an initial stage is first performed on the historical control of the SmartApps, comprising two processes: SmartApps Control Log Collection and SmartApps Regular Behavior Enrollment. The result of the initial stage is the set of AD models summarizing the regular behavioral patterns of the SmartApps seen in the SmartApps Control Log. Once the AD models are built, the framework starts analyzing the control commands issued by the SmartApps. This stage includes two modules: the Anomaly Analyzer and the Action Manager.

Fig. 1. Operation stages of the proposed framework

To show the concrete operation of our proposed framework, we consider the SmartThings platform as a use case in this paper, since it has the largest number of supported SmartApps among all SHS platforms. SmartThings is a cloud-backed SHS platform that allows third-party developers to publish their automation apps (called SmartApps) [4]. An SHS owner can install these apps and delegate authorization to them to autonomously monitor and control his/her home devices.

As depicted in Fig. 2, SmartThings uses a cloud backend to abstract physical SHS devices into device handler instances. These software wrappers handle the real underlying communication between the cloud and the physical devices. SmartApps can subscribe to the events fired by a set of device handler instances and issue commands to control the device handlers.

SmartThings also provides a smartphone companion app so that users can install SmartApps published in the store, and configure and delegate authorization to the SmartApps that support the capabilities provided by their devices. The permission model is the security architecture that governs a SmartApp’s access to the commands and attributes provided by the device handlers. Commands represent the ways in which a device can be controlled or actuated (e.g., turn on/turn off). Attributes represent the state information of a device (e.g., on/off) [5]. When a user installs a SmartApp, an enumeration process is triggered that scans all the physical devices currently paired with the user’s hub that support the commands and attributes claimed by the SmartApp. Once the user chooses one of the suggested devices, the SmartApp is authorized to control the selected device.

Fig. 2. SmartThings architecture

Figure 3 shows the distribution of the framework modules on the architecture of SmartThings. In the following, we discuss the operation of each module for both stages.

Fig. 3. Distribution of our framework modules on the SmartThings platform

3.2 Initial Stage

The initial stage is the first process the framework performs after deployment, preparing it for the analysis stage.

SmartApps Log Collection. The first step toward building the regular SmartApps behavioral patterns is the collection of their historical automation traces. To accomplish this task we add the Logger, a module that intercepts the automation control commands issued by the SmartApps toward the device handlers and saves them in the SmartApps Control Log file. The information extracted from a control command includes: SmartApp ID, triggering event, controlled device, control action, and timestamp.
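A log record with the five fields above could look as follows. The field names, the JSON-lines file format, and the `append_record` helper are our own illustrative choices, not part of the framework's specification:

```python
# Sketch of one intercepted control command persisted by the Logger.
# Field names mirror the five items listed above; layout is an assumption.
import json
from dataclasses import asdict, dataclass

@dataclass
class ControlRecord:
    smartapp_id: str   # SmartApp that issued the command
    event: str         # triggering event
    device: str        # controlled device
    action: str        # control action
    timestamp: float   # epoch seconds

def append_record(record: ControlRecord, log_path: str) -> None:
    """Append one intercepted command to the SmartApps Control Log (JSON lines)."""
    with open(log_path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")

rec = ControlRecord("welcome_home", "livingRoomLight.on", "mode", "home", 1.7e9)
print(asdict(rec)["smartapp_id"])  # welcome_home
```

One record per line keeps the log appendable in real time and trivially parseable during Enrollment.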

Fig. 4. Process of SmartApps regular behavior enrollment

SmartApps Regular Behavior Enrollment. Once the SmartApps Control Log is collected, the baseline models summarizing the behavioral patterns of each installed SmartApp are built. This process is called Enrollment, and its output is a set of AD models that are saved to be retrieved in the analysis stage. As described in Fig. 4, the Enrollment process includes four sub-processes, each explained in detail below.

  (a)

    SmartApps Control Log Segmentation: before being used in the construction of the Probabilistic Models and the training of the AD models, the collected SmartApps Control Log first needs to be prepared. This task allows the extraction of more information about the SmartApps’ patterns. The behavior of an SHS inhabitant over the 24 h of the day is generally segmented into a set of frequent periods in which the user has specific behavioral routines (e.g., waking up and going to work in the morning). Because the SmartApps’ behavior is related to the SHS owner’s behavior (i.e., triggering events occur due to the inhabitant’s physical activities), log segmentation consists of adding the corresponding time interval of the day (i.e., the period) to each record in the SmartApps Control Log.

  (b)

    Probabilistic Models Construction: a Probabilistic Model consists of a set of vectors and matrices containing the different probabilities that describe the SmartApps’ behavioral patterns seen in the prepared SmartApps Control Log. In particular, we distinguish two types of models. The SmartApps Dependent Behavioral Model (SADBM) contains the behavioral probabilities related to the SmartApp itself, whereas the SmartApps Independent Behavioral Model (SAIBM) contains the behavioral probabilities related to the set of occurred events and installed SmartApps. Table 2 describes the parameters of each model.

  (c)

    Behavioral Scores Extraction: we call Behavioral Scores the data on which the AD models are trained. Extracting these scores consists of calculating a tuple of numeric values for each command seen in the SmartApps Control Log using the constructed Probabilistic Models (i.e., SADBM and SAIBM) (cf. Fig. 4). Hence, a Trigger-Action command issued by a SmartApp is described by the six following behavioral scores:

    • Event Occurrence: probability of the SmartApp being triggered by the occurred event.

    • Device Control Given Event: probability of the SmartApp controlling the given device while being triggered by the occurred event.

    • SmartApp Transition: probability of the given SmartApp being triggered after the previous one has been triggered.

    • SmartApp Transition Latency: time interval between the triggering of the given SmartApp and the triggering of the previous one.

    • Events Transition: probability of the SmartApp being triggered by the occurred event after being triggered by the previous one.

    • Events Transition Latency: time interval between the occurrence of the given event and that of the previous one.

  (d)

    AD Models Training: training AD models on the set of extracted Behavioral Scores is the fruit of all the previous Enrollment sub-processes. Since our objective is to discriminate legitimate control commands from anomalous ones, we are dealing with a binary classification problem in Machine Learning terms. However, as only regular Behavioral Scores are available during the Enrollment stage, One-Class Classification (OCC) must be used in such a situation. In this work, we use One-Class Support Vector Machines (OCSVM) [12], as they have shown high performance in detecting anomalies in many other application domains compared to other AD algorithms [6].
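Sub-process (d) can be sketched as follows. The six-dimensional score tuples below are synthetic stand-ins for the scores computed from the SADBM/SAIBM, and scikit-learn's `OneClassSVM` is one possible implementation choice, not mandated by the paper:

```python
# Sketch of training one OCSVM per SmartApp on its six Behavioral Scores.
# The score values are synthetic; hyperparameters are illustrative.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Each row: the six Behavioral Scores of one regular control command.
regular_scores = rng.normal(loc=0.5, scale=0.05, size=(200, 6))

ocsvm = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(regular_scores)

# +1 = conforms to this SmartApp's baseline, -1 = anomalous.
print(ocsvm.predict(np.full((1, 6), 0.5)))  # [1]  (center of the baseline)
print(ocsvm.predict(np.full((1, 6), 5.0)))  # [-1] (far from the baseline)
```

The `nu` parameter bounds the fraction of training commands the model may treat as outliers; a small value keeps the baseline tight around the observed regular behavior.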

Table 2. Probabilistic models description

3.3 Analysis Stage

To prevent unwanted device operation resulting from SmartApps behaving against their expected behavior, the Action Manager (AM) intercepts the commands issued by SmartApps to analyze their legitimacy/anomaly via the Anomaly Analyzer (AA). It then takes security actions accordingly.

Anomaly Analyzer. As shown in Fig. 5, upon receiving a control command from the AM, the anomaly analysis sub-process is triggered. In particular, the AA first calculates the Behavioral Scores (BSs) from the different parameters of the command (e.g., event, controlled device, etc.). Then, it retrieves the trained OCSVM model and applies it to the calculated scores. The application of the OCSVM outputs an Anomaly Score (AS) that varies in the range [–1, +1]. Moreover, the AA updates the SmartApp’s trust (T) using the resulting AS (explained below). Finally, it sends the three obtained parameters, i.e., AS, BSs, and T, back to the AM.
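The raw OCSVM decision value is unbounded, while the text states the AS lies in [–1, +1]. The paper does not specify the mapping, so the tanh squashing below is purely an assumption of this sketch:

```python
# One plausible (assumed, not specified by the paper) mapping from the raw
# OCSVM decision value to an Anomaly Score in [-1, +1].
import math

def anomaly_score(decision_value: float) -> float:
    """Positive decision values (inside the baseline) give AS near +1;
    negative ones (outside the baseline) give AS near -1."""
    return math.tanh(decision_value)

print(anomaly_score(2.0) > 0)    # True: conforming command
print(anomaly_score(-2.0) < 0)   # True: anomalous command
```

Any monotone squashing with the same range would serve; tanh is merely a convenient choice.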

Fig. 5. Anomaly Analyzer process

Fig. 6. Action Manager process

Action Manager. Figure 6 depicts the process of the Action Manager (AM). Once a command is issued by a SmartApp, the AM sends it to the AA, which sends back the three parameters AS, BSs, and T, as explained before. The AM then tests the legitimacy of the issued command based on the obtained AS. In particular, if the AS is above a predefined Anomaly Threshold (AT), the AM sends the command to the device handler to be executed. However, if the AS is below the AT, the AM prompts the user to confirm whether the command (i.e., the occurred event and the action on the device) is suspicious or not. If the user confirms that the command is suspicious, the AM suggests the removal of the SmartApp to the user.

However, if the user confirms that the command is not suspicious, or if he/she has already been prompted for the given SmartApp, a trust-based verification is employed. In particular, the AM updates a confidence score for each requested command to evaluate the trust towards the SmartApp. To do so, a trust (T) value is calculated from the AS output by the AA. As long as this value remains above the allowed level of trust, called the Lockout Threshold (LT), the SmartApp is allowed to operate and its commands are sent to be executed. However, once the T value drops below the LT, the SmartApp is stopped and its removal is suggested to the user. The formula we adopt to calculate the change in SmartApp trust is the one described in [10] (cf. Eq. 1), where the parameter AT represents the predefined Anomaly Threshold, B is the value of AS at which the maximum penalty/reward is given, and C and D are the upper bounds of the reward and the penalty, respectively.

$$\begin{aligned} \varDelta _{Trust}(AS_{i}) = \min \left( \frac{D\left( 1+\frac{1}{C}\right) }{\frac{1}{C} + \exp \left( -\frac{AS_{i}-AT}{B}\right) } - D,\; C\right) \end{aligned}$$
(1)
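Equation 1 transcribes directly into code. The parameter values below (AT, B, C, D) are illustrative defaults, not taken from the paper or from [10]:

```python
# Direct transcription of Eq. 1. Parameter values are illustrative.
import math

def delta_trust(anomaly_score: float, AT: float = -0.6,
                B: float = 0.25, C: float = 2.0, D: float = 1.0) -> float:
    """Trust change for one command: reward (capped at C) when AS is well
    above the Anomaly Threshold AT, penalty (bounded by -D) when below it."""
    sigmoid = D * (1 + 1 / C) / (1 / C + math.exp(-(anomaly_score - AT) / B))
    return min(sigmoid - D, C)

# A clearly regular command raises trust; a clearly anomalous one lowers it.
print(delta_trust(0.9) > 0)    # True
print(delta_trust(-0.95) < 0)  # True
```

Note how the bounds fall out of the formula: as AS grows, the sigmoid saturates and the `min` caps the reward at C; as AS drops, the sigmoid tends to zero and the change tends to –D, matching the roles of C and D stated above.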

4 Experimental Evaluation

In this section, we evaluate the ability of the Anomaly Analyzer to detect anomalous automation commands issued by misbehaving SmartApps, on datasets involving different SHSs.

4.1 Evaluation Dataset

To remedy the lack of SmartApps-based automation control history in public repositories, we propose to use history data of the manual control of appliances and objects by inhabitants in real-world home environments. We assume that the devices controlled by the inhabitants are instead controlled by automation SmartApps. The data we use for this purpose was collected by the MIT House Consortium [15]. For two weeks, sensors were installed in everyday objects such as drawers and refrigerators to record opening-closing events in two single-person apartments while the inhabitants carried out everyday activities. The recorded inhabitant activities (e.g., eating) are grouped into a set of categories (e.g., personal needs). Since SHS device automation control is based on the Trigger-Action model, we assume that the SmartApp IDs are the activity categories and the triggering events are the activities themselves, whereas the controlled devices with their particular control actions and timestamps are extracted as they are. Thus, the obtained SmartApps Control Log includes the necessary information: SmartApp ID, event, device, action, and timestamp. A description of the obtained Control Logs for both SHSs is given in Table 3.
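The mapping just described can be sketched as a simple record transformation. The raw records and category names below are fabricated for illustration and do not come from the MIT dataset itself:

```python
# Sketch of the dataset mapping: activity categories play the role of
# SmartApp IDs and activities the role of triggering events. The raw
# records below are illustrative, not actual MIT House Consortium data.
raw = [  # (activity_category, activity, device, action, timestamp)
    ("personal_needs", "eating", "refrigerator", "open", 100.0),
    ("personal_needs", "eating", "refrigerator", "close", 130.0),
    ("housework", "doing_laundry", "washing_machine", "open", 500.0),
]

control_log = [
    {"smartapp_id": cat, "event": act, "device": dev,
     "action": action, "timestamp": ts}
    for cat, act, dev, action, ts in raw
]

print(control_log[0]["smartapp_id"])  # personal_needs
```

The devices, actions, and timestamps pass through unchanged, so the resulting log has exactly the five fields the Logger would have collected on a real SHS.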

Table 3. Evaluation dataset description

4.2 Evaluation Methodology

Our evaluation dataset contains the automation control history of two SHSs, with six SmartApps installed in each. Consequently, six SmartApp baselines (i.e., Probabilistic Models, control command Behavioral Scores, and OCSVM models) are built for each SHS over the extracted control logs of each SmartApp. To evaluate the ability of the Anomaly Analyzer to discriminate misbehaving SmartApps from regular ones, we follow a primary-vs-adversary strategy. In particular, we first search for SmartApps installed in common between the two SHSs. Then, we evaluate each SmartApp OCSVM model of the chosen primary SHS against the behavioral scores of the corresponding SmartApp from the adversary SHS, and vice versa. To make sure the results are not biased or coincidental, 5-fold cross-validation is employed. In particular, the behavioral scores of each SmartApp are split into two parts (80% and 20%). The 80% part is used to train the OCSVM of each SmartApp. Then, the remaining 20% part is combined with the Behavioral Scores of the corresponding adversary SmartApp to construct the final testing set.

4.3 Results and Discussion

Since the goal of the Anomaly Analyzer is to identify malicious control commands without incorrectly rejecting legitimate ones, we calculate the fraction of testing control commands that are incorrectly accepted, better known as the False Acceptance Rate (FAR). To measure the level of user convenience, we calculate the fraction of benign control commands that are incorrectly rejected, better known as the False Rejection Rate (FRR).
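Both rates follow directly from per-command labels. In the sketch below, +1 marks a benign command and –1 a malicious one, for both the ground truth and the Anomaly Analyzer's verdict; the label encoding is our own convention:

```python
# FAR and FRR as defined above, from per-command labels:
# y_true/y_pred use +1 for benign/accepted and -1 for malicious/rejected.
def far_frr(y_true, y_pred):
    mal = [p for t, p in zip(y_true, y_pred) if t == -1]
    ben = [p for t, p in zip(y_true, y_pred) if t == 1]
    far = sum(p == 1 for p in mal) / len(mal)   # malicious wrongly accepted
    frr = sum(p == -1 for p in ben) / len(ben)  # benign wrongly rejected
    return far, frr

y_true = [1, 1, 1, 1, -1, -1, -1, -1]
y_pred = [1, 1, 1, -1, -1, -1, -1, 1]
print(far_frr(y_true, y_pred))  # (0.25, 0.25)
```

Raising the Anomaly Threshold rejects more commands, trading a lower FAR for a higher FRR, which is exactly the tension discussed next.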

Since the two rates cannot be reduced simultaneously, we must prioritize the reduction of one of them at the cost of increasing the other. Recall that the Action Manager (AM) uses an Anomaly Threshold (AT) to discriminate the legitimacy/anomaly of a control command from the output Anomaly Score (AS), which varies in the range [–1, +1]. If a SmartApp issues a malicious control command for the first time, the user is prompted by the AM to confirm the legitimacy/anomaly of this command. However, prompting the user too often is incompatible with real-time SHS automation (e.g., users need to be awake to respond). This implies that the frequency of prompts must be kept as low as possible and that our proposed framework should favor user-friendliness over strictness. To guarantee such a feature, the FRR should be reduced in priority over the FAR; hence, the AT value must be chosen low, near the negative end of the AS range.

With such an AT value, Table 4 gives the obtained FAR and FRR values for one of the testing scenarios, i.e., SHS 1 as the primary vs. SHS 2 as the adversary (4 common SmartApps were found between the two SHSs). We can see that the FRR reaches, in the worst case, a value of 2.5%. This low rate ensures that benign control commands are rarely rejected; hence, a better experience is provided for the user, since he/she is rarely falsely prompted. On the other hand, we can see that the FAR reaches 7.22% in the worst case. Such an acceptable rate ensures that malicious control commands are rarely accepted; hence, the removal of malicious SmartApps is suggested to the user as quickly as possible, and these SmartApps are not able to issue many control commands.

Table 4. Obtained results of FAR and FRR for Subject1 vs Subject2

5 Conclusion

In this paper, we investigated the feasibility of building Anomaly Detection (AD) models of the regular behavior of Smart Home System (SHS) automation SmartApps when controlling the SHS devices. The AD models are the basis of our proposed security framework, which continuously confirms or rejects automation control commands issued by the SmartApps according to their deviation from this AD baseline. In particular, a One-Class Support Vector Machine (OCSVM) was trained on regular behavioral scores extracted from the control logs of the SmartApps. In the future, we plan to investigate how the trained OCSVM models should be updated to cope with changes in the SmartApps’ regular behavior, using the incremental version of OCSVM.