1 Introduction

As Information Systems (IS) evolve rapidly in all domains, data has become a powerful asset: it can be used to gain insights, make decisions, increase revenues, etc. Process mining (PM) [1] is one of the techniques that help process the available data to gain better knowledge. The main objectives of PM are to discover the process model, to check its conformance with the current process, to enhance the process, and finally to recommend the next activity to the user, relying on activity logs recorded from IS [1]. However, PM does not take into consideration the contextual background behind the user activity: it discovers the process model from the user activity logs only. In practice, users are not purely activity-oriented, and many external factors such as time, location, profile, etc. can affect the activity selection. With Artificial Intelligence tools and different types of sensors, it becomes easier to access the contextual environment behind an activity. Since the smart home is a context-rich environment, it is an interesting setting in which to explore how contextual information can affect user activity directly or indirectly.

The enrichment of process models with additional data has already been addressed through semantic process models [3], whose authors discussed the benefits and capabilities of such models. Several works study how process mining could be enhanced using semantic data organized into ontologies [4,5,6]. Our goal is to improve process models with semantics derived from sensor data in order to build a contextual process model.

In this paper, we explore Smart Home datasets to check how likely contextual information is to affect user activities and to construct a contextual process model. The datasets used in this paper were provided by the BP-Meets-IoT Challenge [7] and contain data about everyday life and home activities. They comprise two main simulated datasets: the first (DS1) consists of the activity log of a single person living in a home, coupled with the corresponding sensor logs, and the second (DS2) consists of the activity log of two persons, also coupled with the sensor logs. All logs were provided in XES format.

The next section will present the background. Section 3 will focus on data exploration and Sect. 4 will present the research questions and the method used to construct a contextual process model. In Sect. 5, we describe the results and the discovered process model. Related works are detailed in Sect. 6. We conclude in Sect. 7.

2 Background

Hereafter, we describe the background of the research fields.

Event logs are considered the most important source of information and the major input for mining techniques. Usually, there is a difference between the prescribed business process model provided by the organization and what users really do to complete their tasks in the actual process. Event logs are the basis of process mining, which permits the discovery of the actual business processes, the verification of their conformance with the prescribed processes, and the enhancement of the process model [19].

PM is an evolving Business Process Management technology whose main objective is to discover process models, check their conformance, and enhance them based on event logs [1]. PM focuses on the activities generated by business processes, so it can be used as a recommendation technique to direct the user to the next activity to follow according to their current activity [20, 21]. PM focuses on activity-oriented models and has shown that the actual processes extracted from event logs can differ from the prescribed business process. However, PM models are created as a sequence of steps that do not support variability [22]. According to [23], to properly understand research processes, it is essential to trace them. The collected traces depend on the established process model, which must be as accurate as possible to record the traces comprehensively. Still, the major drawback in tracing processes is finding an adequate modeling language that covers all the aspects needed when analyzing these traces. The authors of [23] present five types of process models from the information systems engineering domain that can be used to represent processes: Activity-oriented process models, Product-oriented process models, Decision-oriented process models, Context-oriented process models, and Strategy-oriented process models. In addition, [1] describes different process model notations that are used to represent the process after the execution of a process discovery algorithm: Transition Systems, Petri Nets, Workflow Nets, YAWL, BPMN, EPCs, Causal Nets, and Process Trees.

Typically, PM techniques do not take into consideration the context behind the user activity. The authors of [24, 25] proposed a process-based contextualization methodology that makes it possible to construct models on the fly while taking into consideration the situation behind them. In addition, the authors of [3] discussed the benefits and capabilities of a semantic process model. Hence, it would be interesting to take the next step and build a contextual process model.

3 Dataset Exploration

In this work, we rely on two main datasets, DS1 and DS2, that describe the daily habits of two individuals living in a Smart Home (DS1 for individual 1 and DS2 for individual 2). Both datasets were recorded for 21 consecutive days between 16 March 2020 and 6 April 2020, from 00:00 to 23:59 each day. Each dataset contains the person's activity logs and the sensor logs. Initially, the activity logs in DS1 contain 4068 event records and the sensors log contains 34571 event records, while the activity logs in DS2 contain 28238 event records and the sensors log contains 39304 event records.

The following subsections present the activity logs and the sensor logs (Subsects. 3.1 and 3.2, respectively) and the data classification used to identify groups of elements in both logs (Subsect. 3.3).

3.1 Activity Logs

The activity logs are composed of a set of traces. Each trace contains a set of events grouped under a case name. Each record in the activity logs is characterized by different attributes, as illustrated by the extract in Fig. 1. These attributes are: Case Name: categorizes a set of activities under a specific goal; Trace Id: groups a set of events; Event Id: indicates the unique Id that distinguishes every record inside the dataset; Activity Name: describes the activity that is taking place; Resource Id: identifies the person who performs the activity; Timestamp: indicates the date and time when the activity occurred; Transition: indicates whether the event record marks the start or the completion of an activity.

Fig. 1. Event Logs Samples.
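The trace and event structure described above can be illustrated with a minimal parsing sketch. It assumes that the logs use the standard XES extension keys (concept:name, org:resource, lifecycle:transition, time:timestamp); the text does not list the exact keys used by the challenge files, and the sample fragment and the helper names (local, attributes, read_traces) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written XES fragment; all attribute values are illustrative only.
SAMPLE_XES = """<log>
  <trace>
    <string key="concept:name" value="case_1"/>
    <event>
      <string key="concept:name" value="wash_dishes"/>
      <string key="org:resource" value="1"/>
      <string key="lifecycle:transition" value="start"/>
      <date key="time:timestamp" value="2020-03-16T08:00:00+00:00"/>
    </event>
    <event>
      <string key="concept:name" value="wash_dishes"/>
      <string key="org:resource" value="1"/>
      <string key="lifecycle:transition" value="complete"/>
      <date key="time:timestamp" value="2020-03-16T08:10:00+00:00"/>
    </event>
  </trace>
</log>"""


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}event'."""
    return tag.rsplit("}", 1)[-1]


def attributes(element):
    """Collect the key/value attribute children of a trace or event element."""
    return {child.get("key"): child.get("value")
            for child in element
            if child.get("key") is not None}


def read_traces(xes_text):
    """Return a flat list of event dicts, each tagged with its case name."""
    root = ET.fromstring(xes_text)
    rows = []
    for trace in (el for el in root.iter() if local(el.tag) == "trace"):
        case_name = attributes(trace).get("concept:name")
        for event in (el for el in trace if local(el.tag) == "event"):
            row = attributes(event)
            row["case_name"] = case_name
            rows.append(row)
    return rows


events = read_traces(SAMPLE_XES)
```

Each resulting row is a flat dictionary, which is close to the tabular form in which such records are usually inspected.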

3.2 Sensors Log

The sensor logs correspond to a set of events. Each record in the sensor logs is characterized by some fields as depicted in Fig. 2.

Fig. 2. Sensor Log Samples.

The sensor logs are characterized by the following attributes: Event Id: indicates the unique Id that distinguishes every record inside the dataset; Activity Name: describes the sensor type that was triggered; Resource Id: describes the resource that triggers the sensor, which is either the person living inside the home or the system itself when the trigger is automatic; Timestamp: indicates the date and time when the sensor event occurred; Value: indicates the sensor value. Note that each sensor has a different set of values according to the sensor type.
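To make the two record layouts concrete, the attributes listed in Subsects. 3.1 and 3.2 can be written as typed records. This is only a sketch: the class and field names are our own, the Python types are assumptions, and the example reading is invented.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Union


@dataclass(frozen=True)
class ActivityEvent:
    case_name: str       # goal under which the activities are grouped
    trace_id: str
    event_id: str
    activity_name: str
    resource_id: str     # person performing the activity
    timestamp: datetime
    transition: str      # "start" or "complete"


@dataclass(frozen=True)
class SensorEvent:
    event_id: str
    sensor_name: str     # stored in the "Activity Name" field of the sensor log
    resource_id: str     # the resident, or the system for automatic triggers
    timestamp: datetime
    value: Union[int, float, str]  # value range depends on the sensor type


# Illustrative reading; the values are made up.
reading = SensorEvent(
    event_id="e1",
    sensor_name="unwashed_dishes",
    resource_id="1",
    timestamp=datetime.fromisoformat("2020-03-16T08:00:00+00:00"),
    value=2,
)
```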

3.3 Data Classification

We classified the main elements of the datasets (activities and sensors) into different groups. As a result, we were able to identify 53 activities and 14 sensors that were provided in [7] and that are listed below in Table 1 and Table 2.

In addition, we found that the activities can be grouped into a set of activity types. The provided dataset already contains predefined categories of activities. However, these categories were defined by the dataset authors with regard to goals. We aim instead at grouping activities by their nature, which avoids having the same activity classified into multiple categories, as happens in the initial dataset description [7]. We defined 13 activity types, such as activities specific to the bathroom or the kitchen, as shown in Fig. 3. We also classified the sensors into 7 categories, as shown in Fig. 4. Note that each sensor behaves differently from the others and has its own range of values.

Table 1. Home Activities List in the Dataset.
Table 2. Sensors List in the Dataset.

Subsequently, we identified the different locations or positions to which a sensor can be linked or where an activity can take place. The different positions are listed in Fig. 5.

While categorizing the activities, the sensors, and the positions, we relied on the provided dataset. However, we tried to define the groups in a generic way so that they can be extended when used in different contexts, countries, or cultures.

4 Research Questions and Proposed Approach

The research methodology used in this proposal has been detailed in [2]. We explain below the research questions specific to the work presented in this paper and the main proposed approach.

4.1 Research Questions

The natural behavior of a person tends to be variable: people do not stick to a fixed schedule or routine when performing their daily living activities. We believe that a person adjusts daily activities and tasks according to the contextual environment, which can affect them directly or indirectly. For instance, a person performs different activities during the weekend than on weekdays. In addition, on a rainy day the person may exercise indoors, whereas on a sunny day they can go outside for a walk. We believe that it is possible to identify the links between context data and user activities.

Therefore, in this paper, our research questions are the following: Question 1: Does the contextual environment affect user activities? Question 2: Can links between sensors’ data and activity logs be automatically identified?

Fig. 3. Activity Types Grouping.

Fig. 4. Sensor Types Grouping.

Fig. 5. Positions.

4.2 Proposed Approach

As mentioned above, we relied on DS1 and DS2 datasets. For the first question, we used both of them while for the second question, we used only DS1. The method used in this paper is illustrated in Fig. 6.

Fig. 6. Approach Overview.

It includes data transformation as a first step. Then, the method contains the next three steps (Activity and Sensor Mapping, application of the Apriori Algorithm [8], and the Process Model Discovery using Disco) which could be applied separately or in parallel. As a final step, we create a contextual process model using the outcome of the previous steps.

Data Transformation.

Data transformation consists of data cleansing and data manipulation according to our needs. The activity logs and the sensors log contained some noise, in addition to some duplicated records. We performed data cleansing on those logs to remove noise and duplicated data. As a result, we obtained 3154 records in the activity logs and 4332 records in the sensors log in DS1, and 23166 event records in DS2. For data manipulation, we transformed both logs from XES format to CSV format. Then, we added two attributes to both logs: the Position attribute and the Day attribute. For the Position attribute, we annotated each activity record and each sensor event record with a position value that indicates the location where the activity took place or where the sensor should be positioned, relying on Fig. 5. For instance, the Position attribute for a record related to wash_dishes or unwashed_dishes should be kitchen_sink. As for the Day attribute, we annotated the records from Day1 to Day21 to group all the events that are linked to a specific day. For example, all the activity records and sensor event records that occurred between 2020-03-16 00:00:00+00:00 and 2020-03-16 23:59:00+00:00 are annotated with Day1.
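The cleansing and annotation steps can be sketched as follows. This is a minimal illustration under stated assumptions: only exact duplicates are removed (the criteria used to identify noise are not detailed here), the row keys name and timestamp are hypothetical, and the position lookup contains only the kitchen_sink entries mentioned in the text.

```python
from datetime import date, datetime

FIRST_DAY = date(2020, 3, 16)  # first recorded day, per the text

# Hypothetical excerpt of the activity/sensor -> position lookup (Fig. 5);
# only the kitchen_sink entries are stated in the text.
POSITION_OF = {
    "wash_dishes": "kitchen_sink",
    "unwashed_dishes": "kitchen_sink",
}


def deduplicate(rows):
    """Drop exact duplicate records while keeping the original order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def annotate(row):
    """Return a copy of the row with Day and Position attributes added."""
    stamp = datetime.fromisoformat(row["timestamp"])
    day_index = (stamp.date() - FIRST_DAY).days + 1
    return {**row,
            "day": f"Day{day_index}",
            "position": POSITION_OF.get(row["name"], "unknown")}


# Illustrative rows; the second one duplicates the first.
raw = [
    {"name": "wash_dishes", "timestamp": "2020-03-16T23:59:00+00:00"},
    {"name": "wash_dishes", "timestamp": "2020-03-16T23:59:00+00:00"},
    {"name": "unwashed_dishes", "timestamp": "2020-03-17T07:30:00+00:00"},
]
clean = [annotate(row) for row in deduplicate(raw)]
```

Deriving the Day label from the calendar date, rather than from a running counter, keeps the annotation stable even if some days contain no events.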

Activity Mapping with Sensors.

To answer the first question, we mapped the sensors to the activities according to the Position and Timestamp attributes. We obtained 112 correspondences between the sensors and the activities, which are described in Subsect. 5.1 and shown in Fig. 8.
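One possible way to derive candidate correspondences from the Position and Timestamp attributes is sketched below. The matching criterion (same position, and sensor timestamp inside the activity's start/complete interval) is our assumption for illustration; the mapping reported in this paper also involved human reasoning. The bedroom position and all timestamps are hypothetical.

```python
from datetime import datetime


def ts(text):
    """Parse an ISO 8601 timestamp."""
    return datetime.fromisoformat(text)


# Illustrative activity intervals (start/complete records already paired).
activities = [
    {"name": "wash_dishes", "position": "kitchen_sink",
     "start": ts("2020-03-16T08:00:00+00:00"), "end": ts("2020-03-16T08:10:00+00:00")},
    {"name": "raise_blinds", "position": "bedroom",
     "start": ts("2020-03-16T07:00:00+00:00"), "end": ts("2020-03-16T07:01:00+00:00")},
]
sensors = [
    {"name": "water_use", "position": "kitchen_sink", "time": ts("2020-03-16T08:03:00+00:00")},
    {"name": "blind", "position": "bedroom", "time": ts("2020-03-16T07:00:30+00:00")},
    {"name": "water_use", "position": "kitchen_sink", "time": ts("2020-03-16T12:00:00+00:00")},
]


def map_sensors_to_activities(activities, sensors):
    """Collect (sensor, activity) pairs that share a position and overlap in time."""
    pairs = set()
    for sensor in sensors:
        for activity in activities:
            same_place = sensor["position"] == activity["position"]
            in_window = activity["start"] <= sensor["time"] <= activity["end"]
            if same_place and in_window:
                pairs.add((sensor["name"], activity["name"]))
    return pairs


correspondences = map_sensors_to_activities(activities, sensors)
```

The third sensor event is at the right position but outside any activity interval, so it produces no pair.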

Apriori Algorithm Application.

To answer the second question, we applied Apriori [8], which is an association rule mining technique. The main goal of association rule mining is to find hidden relationships between different items. It is commonly used for marketing purposes, such as in market basket analysis to identify the items that are frequently bought together. Association rule mining allows us to find frequent patterns, causal structures, and associations [8, 9]. We used it because it yields rules that show how the appearance of a specific item is associated with the occurrence of other items. In our case, an item represents either an activity or a sensor.
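For reference, the two standard measures used to select association rules can be written as follows, where T denotes the set of transactions and X and Y are disjoint itemsets (here, sets of activities and sensors). These are the usual textbook definitions, not notation taken from the cited works.

```latex
\[
\mathrm{supp}(X) = \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert},
\qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}
\]
```

A rule X ⇒ Y is retained when supp(X ∪ Y) reaches the minimum support and conf(X ⇒ Y) reaches the minimum confidence.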

We therefore use association rule mining, a rule-based technique, to find the causal structures between the activities and the sensors, as we believe that there is a hidden relationship between the contextual environment and the user behavior. Association rules are composed of an antecedent and a consequent and are represented by if–then statements. The Apriori algorithm is one of the top algorithms in rule mining [10], and it allows us to find the relationships between the sensors and the activities. In order to apply it to our dataset, we first combined both logs into a single log file sorted by the Timestamp attribute. Then, we transformed the activity and sensor records into a list of transactional records, which is the input format supported by Apriori. We obtained 410 transaction records containing the activities and sensors as transaction items. In addition to the transactional list, Apriori needs additional parameters such as the minimum support and the minimum confidence. Since we want to obtain strong rules with good confidence, we set the confidence value to 80% in all the experiments. We ran 3 experiments, as shown in Table 3, and the value of the minimum support was set through trial and error.

Table 3. Apriori Experiments.

After manually analyzing the generated rules, we found the result of the second experiment to be the most realistic, given the number of generated rules and a minimum support that is not very low. In Subsect. 5.2, we present a set of the rules generated by the second experiment.
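The rule mining step can be illustrated with a compact, self-contained sketch of Apriori. It is not the implementation used in the experiments: the function name apriori, the example transactions, and the minimum support of 0.4 are assumptions chosen for readability, and the way the 410 transactions were delimited is not reproduced. Only the 80% confidence threshold comes from the text.

```python
from itertools import combinations


def apriori(transactions, min_support, min_confidence):
    """Return rules as (antecedent, consequent, support, confidence) tuples."""
    transactions = [frozenset(t) for t in transactions]
    total = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / total

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    size = 1
    while level:
        frequent.update({s: support(s) for s in level})
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent.
        level = {c for c in candidates
                 if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
                 and support(c) >= min_support}

    rules = []
    for itemset, supp in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, k)):
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((set(antecedent), set(itemset - antecedent),
                                  supp, confidence))
    return rules


# Illustrative transactions mixing activities and sensors as items.
transactions = [
    {"get_glass", "drink_water", "water_use"},
    {"get_glass", "drink_water", "water_use"},
    {"wash_dishes", "water_use", "unwashed_dishes"},
    {"get_glass", "drink_water", "water_use"},
    {"raise_blinds", "blind"},
]
rules = apriori(transactions, min_support=0.4, min_confidence=0.8)
```

On this toy input, the rule with antecedent {get_glass, drink_water} and consequent {water_use} is retained, whereas rules with water_use alone as antecedent are rejected because water_use also occurs with wash_dishes.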

Process Model Discovery Using Disco.

This part completes the answer to the first question. We used the process mining tool Disco [11] to obtain process models in order to identify the difference between weekend and weekday habits and to see how a resource profile affects the process model. We mapped the Day attribute in the activity logs to the Case attribute in Disco and applied process discovery to the activity logs. The results are presented in Subsect. 5.3.
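The effect of using the Day attribute as the case identifier can be illustrated with a directly-follows count, which is a deliberately simplified stand-in for the discovery performed by Disco and not a description of that tool's algorithm. The wake_up activity, the timestamps, and the helper names are hypothetical; go_to_work and do_exercise are taken from the dataset.

```python
from collections import Counter, defaultdict
from datetime import datetime


def directly_follows(events):
    """Count activity pairs (a, b) where b directly follows a within one case."""
    by_case = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        by_case[event["day"]].append(event["activity"])  # Day acts as the case id
    counts = Counter()
    for activities in by_case.values():
        counts.update(zip(activities, activities[1:]))
    return counts


def is_weekend(event):
    return event["timestamp"].weekday() >= 5  # 5 = Saturday, 6 = Sunday


def ev(day, activity, stamp):
    return {"day": day, "activity": activity,
            "timestamp": datetime.fromisoformat(stamp)}


# Illustrative log: 2020-03-16 is a Monday, 2020-03-21 a Saturday.
log = [
    ev("Day1", "wake_up", "2020-03-16T07:00:00+00:00"),
    ev("Day1", "go_to_work", "2020-03-16T08:00:00+00:00"),
    ev("Day6", "wake_up", "2020-03-21T09:00:00+00:00"),
    ev("Day6", "do_exercise", "2020-03-21T10:00:00+00:00"),
]
weekday_model = directly_follows([e for e in log if not is_weekend(e)])
weekend_model = directly_follows([e for e in log if is_weekend(e)])
```

Splitting the log before counting yields one graph per context, which mirrors the separate weekday and weekend models discussed in Subsect. 5.3.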

Contextual Process Model Creation.

Based on the results of the previous steps, we generated a contextual process model (described in Subsect. 5.4).

5 Results Analysis and Contextual Process Model

In this section, we analyze the results of our experiment.

5.1 Activity Mapping with Sensors Result

We believe that context elements can affect user activities directly or indirectly. Since the sensor data represent the contextual environment, we mapped the sensors to the activities using human reasoning. Figure 8 represents the established mapping and shows that each activity can have one or more triggered sensors and, conversely, that a sensor can affect one or more activities. For instance, when the unwashed_dishes sensor value is greater than 0, the wash_dishes activity might take place. In addition, the water_use and position sensor values will be modified accordingly: the position sensor value will be set to kitchen_sink, while the water_use sensor will indicate the water usage. Moreover, the put_plate_to_sink activity will change the unwashed_dishes sensor value, which will cause the occurrence of other activities. For the mapping, we used the colored sensors from Fig. 7 to illustrate the different sensor categories.

Fig. 7. Sensors Grouping.

Fig. 8. Activity Mapping with Sensors.

In Fig. 8, the activities that are mapped to the same sensors are grouped together but the sensor value would be different for each triggered activity. As an illustration, when the activity is raise_blinds then the value of the sensor blind will be 1; when the activity is lower_blinds then the value of the sensor blind will be 0.
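The blind example can be expressed as a small lookup from activities to the sensor value they produce. This is a sketch: the table name SENSOR_EFFECTS and the function apply_activity are hypothetical, and only the two blind entries come from the text.

```python
# Effect of each activity on a sensor value; only the two blind entries
# are stated in the text.
SENSOR_EFFECTS = {
    "raise_blinds": ("blind", 1),
    "lower_blinds": ("blind", 0),
}


def apply_activity(state, activity):
    """Return a new sensor state after the activity's effect, if one is known."""
    effect = SENSOR_EFFECTS.get(activity)
    if effect is None:
        return dict(state)
    sensor, value = effect
    return {**state, sensor: value}


state = apply_activity({"blind": 0}, "raise_blinds")
```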

5.2 Application of the Apriori Algorithm

From the Apriori results, we were able to find interesting rules that confirm the links between activities and sensors. Table 4 shows a sample of the generated rules. For instance, when the get_glass and drink_water activities take place, the water_use sensor is activated, indicating water consumption. These rules will be used to construct the contextual process model.

Table 4. Sample of the obtained rules.

5.3 Process Model Discovery Using Disco

In this subsection, we focus on the discovered process models using Disco [11].

As explained above, we used Disco to obtain different process models. We mapped the Day attribute in the activity logs to the Case attribute in Disco and applied process discovery to the activity logs three times: (1) for the entire log, (2) for the weekdays, and (3) for the weekend days. Hence, we obtained 3 different process models. Finally, we also used Disco to identify the difference between the habits of the two individuals, obtaining 2 further process models. Figure 9 shows two extracts of the process models discovered from the weekday and weekend activity logs.

Fig. 9. Extractions of the Obtained Process Models for Weekdays (left) and Weekend (right).

Based on the obtained models, we can directly deduce that the process model discovered from the weekday event records differs from the one discovered from the weekend event records. People tend to perform different types of activities on weekdays and on weekends due to their work schedule, the time, the country, the weather, etc.

As expected, the activity go_to_work is missing from the weekend process model, and we find replacement activities such as go_exercise_place or do_exercise, which do not occur during the rest of the week, since the individuals do not have enough time to exercise after spending hours at the workplace. Moreover, the highest number of events in the week occurs on the weekend days (Saturdays and Sundays), because the person tends to spend more time at home. This difference confirms that the date affects the user activities.

We also discovered two process models using the event logs of each person separately. They show how the activity process model can differ between two persons living in the same home. We clearly noticed that resource 2 (the second individual) does not go to any workplace. In addition, it seems that resource 2 is responsible for the wash_dishes activity, which does not appear at all in the process model of resource 1 (the first person). This difference between the two process models suggests that the person's profile, such as age, gender, character, and hobbies, can also affect the user's behavior when performing daily activities. Thus, these relations between profile, sensor data, and activities would provide more precise information for constructing contextual process models.

5.4 Contextual Process Model

We showed that contextual elements such as time, location, profile, etc. directly or indirectly affect the user activities. It is therefore interesting to annotate the process model with contextual information. This enriches the process model by providing more accurate information about the current situation of an activity, which helps in making better decisions and gaining insights. Figure 10 shows a process sample extracted from the weekend process model of DS1. The process model was annotated with the sensor information found previously through the mapping between the sensors and the activities and through the application of Apriori. This contextual information helps to get a clearer view of the current situation. For instance, when the person's position is the kitchen sink and the unwashed_dishes sensor value is greater than 1, the person should be recommended to wash_dishes. In addition, the actor (user) profile should be taken into account because, like the contextual environment, it also affects the user behavior.

Fig. 10. Contextual Process Model.
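The kind of context-aware recommendation discussed in this subsection can be sketched as a list of guards over a sensor snapshot. The guard structure and the names GUARDS and recommend are assumptions made for illustration; only the wash_dishes condition (position at the kitchen sink and unwashed_dishes greater than 1) comes from the text.

```python
# Each guard pairs a context predicate with the activity it suggests.
# Only the wash_dishes guard is taken from the text.
GUARDS = [
    (lambda ctx: ctx.get("position") == "kitchen_sink"
                 and ctx.get("unwashed_dishes", 0) > 1,
     "wash_dishes"),
]


def recommend(context):
    """Return the activities whose context guard holds for the sensor snapshot."""
    return [activity for guard, activity in GUARDS if guard(context)]


suggestions = recommend({"position": "kitchen_sink", "unwashed_dishes": 3})
```

A user profile could be handled in the same way by adding profile fields to the context dictionary and referring to them in the guards.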

6 Related Work

Multiple research works have already identified the importance of putting context into process models.

The authors of [4] described scenarios that illustrate how process mining could be enhanced by using semantically annotated event logs. The authors of [5] described a semantic process mining approach that enriches event logs using semantic data organized in an ontology. In [6], the industrial benefits and challenges of semantic process mining are analyzed. In [3], the authors presented the benefits of semantic annotation for process modeling.

In [12], the authors have considered 4 main types of contexts: the context that is directly linked to the process instance, the context that is related to the process overall, the social context that is linked to how people interact with others, and the external context that is affected by external factors such as weather, economic climate, etc. They conclude that contextual information should be used in the construction of process models.

The authors of [12, 14] presented the core building elements necessary to enable semantic process mining, which relies on ontologies to link the generated events to the real concepts they represent.

In [15], the authors presented an approach to filtering and abstracting event logs using ontologies and cluster analysis in the healthcare domain. Hence, their approach consists of incorporating data mining with process mining techniques to create contextual process mining.

The authors of [16] present a framework for knowledge-based abstraction of event logs; its output, the abstracted traces, is given as input to the semantic process mining technique.

The researchers in [17] introduced an ontology-based framework that supports the development of semantic process mining techniques.

The authors in [18] applied process mining methods to event logs of the activities of daily living of the elderly inside a Smart Home but they didn’t take into consideration the contextual data.

All the previously mentioned works acknowledge that the contextual information related to each specific event enhances process mining techniques. Thus, we studied the scenario of daily living inside a Smart Home in order to build a process model using the sensor data, because we believe that a person's daily activities can be affected by different factors and can be simulated to build a contextual process model.

7 Conclusion

In this work, we propose an approach to enrich the process models mined from the activity logs with contextual data. We applied this approach to a case study by exploring a dataset related to Smart Home activities.

In the explored case study, we were able to find, both automatically and manually, the links between the sensors and the activities. We highlighted the differences between the mined process models when the contextual information (weekdays versus weekends) or the user profile (resource 1 versus resource 2) changes. The multiplicity of the mined process models, each related to a specific context, suggests the importance of constructing contextual process models. Much other contextual information can provide better knowledge, since a person's activities differ with circumstances such as time, location, country, culture, weather, etc. Using contextual process models would allow us to offer better recommendations to users by contextually recommending the best-suited activity at a specific time and place.

However, the case study and the provided dataset were quite simple as they didn’t offer any detailed information about the user profile or other context characteristics. The dataset is of small size and lacks data; the only provided characteristics were timestamp and resource Id.

In future work, we aim to be able to discover the contextual process model automatically and we plan to work on a larger dataset to extract more links between the sensors and activities to be able to guide the user on the fly.