1 Introduction

With the development of the Internet of Things, sensor technology is widely used in daily life (Suryadevara and Mukhopadhyay 2014; Pansiot et al. 2007; Gao et al. 2014). Data mining, information inference and knowledge learning have emerged as the conditions of the smart world have matured. More and more studies deploy non-vision ambient sensors in home settings to protect resident privacy. However, recognizing complex activities faces many limitations, such as noisy interference and similar activities that are hard to distinguish. An activity consists of a series of sensor elements. Two activities are similar when more than half of the sensor elements in their sensor series are the same, so that only a small number of sensor elements differ. A same (shared) sensor element is a sensor that belongs to both activities. For example, the touch sensor on a cup is a shared sensor element of activities in the same category (drink coffee and drink tea) and of activities in different categories (brush teeth and drink coffee).
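The similarity definition above can be checked mechanically. The sketch below is a minimal illustration, assuming that "more than half" is measured against the distinct sensors of each activity; the sensor names are hypothetical and do not come from the paper's dataset.

```python
def shared_elements(seq_a, seq_b):
    """Return the sensor elements that occur in both sensor series."""
    return set(seq_a) & set(seq_b)


def are_similar(seq_a, seq_b):
    """Assumed reading of the definition: the shared sensors make up more
    than half of the distinct sensors of each activity."""
    shared = shared_elements(seq_a, seq_b)
    return all(len(shared) > len(set(seq)) / 2 for seq in (seq_a, seq_b))


# Hypothetical sensor series (sensor names are illustrative only).
drink_tea = ["cabinet", "cup_touch", "kettle", "tea_box", "coaster"]
drink_coffee = ["cabinet", "cup_touch", "kettle", "coffee_jar", "coaster"]
brush_teeth = ["bathroom_light", "cup_touch", "tap_water", "toothbrush"]

print(sorted(shared_elements(drink_tea, drink_coffee)))
print(are_similar(drink_tea, drink_coffee))    # True: 4 of 5 sensors shared
print(are_similar(brush_teeth, drink_coffee))  # False: only the cup touch sensor
```

The cup touch sensor is shared in both pairs, but only the tea/coffee pair passes the threshold, matching the distinction between same-category and cross-category sharing in the text.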

Most previous studies focus on typical activities with weak correlations in single-resident, sequential scenarios. They usually adopt the CASAS dataset, which includes ten independent, dissimilar activities such as making a meal and eating a meal (Cook et al. 2013). Some other datasets that contain similar activities have also been used. For example, Liming Chen et al. designed their own kitchen dataset with eight activities, three of which (make tea, make chocolate and make coffee) are similar. However, they did not design experiments specifically for these similar activities (Okeyo et al. 2014a). Two similar activities occurring in one time window is a common situation that requires a dedicated dataset. Although similar activity recognition is at an early stage, its essence is the same as that of traditional activity recognition: mining data features that correlate strongly with the correct activity.

In addition to sensor, location (Tahir et al. 2019) and other intuitive features of sensor data, time sequence (Moutacalli et al. 2015) is an effective feature that has been studied for concurrent and interleaved activities by Saguna (2013), Yongmian Zhang (2013), Li Liu (2016) and others. However, once a sensor is triggered in a window, the similar activities related to it receive the same probability (computed as the maximum conditional probability from the labeled dataset). Similar activities often share many time sequence orders, sensors and locations, which makes distinguishing between them difficult.
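The tie described above can be reproduced with a count-based estimate of P(activity | sensor). This is a minimal sketch over hypothetical labeled windows, not the paper's dataset: a shared sensor such as the cup touch sensor gives both similar activities exactly the same conditional probability, and only the distinguishing sensors break the tie.

```python
from collections import Counter, defaultdict


def conditional_probs(labeled_windows):
    """Estimate P(activity | sensor) by counting how often each triggered
    sensor co-occurs with each activity label across labeled windows."""
    counts = defaultdict(Counter)
    for sensors, activity in labeled_windows:
        for s in set(sensors):
            counts[s][activity] += 1
    return {
        s: {a: n / sum(c.values()) for a, n in c.items()}
        for s, c in counts.items()
    }


# Hypothetical labeled windows: tea and coffee each occur twice.
windows = [
    (["cup_touch", "kettle", "tea_box"], "DrinkTea"),
    (["cup_touch", "kettle", "tea_box"], "DrinkTea"),
    (["cup_touch", "kettle", "coffee_jar"], "DrinkCoffee"),
    (["cup_touch", "kettle", "coffee_jar"], "DrinkCoffee"),
]
probs = conditional_probs(windows)
print(probs["cup_touch"])  # {'DrinkTea': 0.5, 'DrinkCoffee': 0.5}
print(probs["tea_box"])    # {'DrinkTea': 1.0}
```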

Time duration has been proposed as a new feature. Fadi Al Machot et al. adopted Information Gain (IG) evaluation to find the set of “best fitting sensors” (Machot et al. 2016). Li Junhuai et al. divided activities into basic and transitional activities; after shortest segmentation of the raw sensor data, they used K-Means clustering to gather related segments into the time blocks of basic activities (Junhuai et al. 2019). Both methods evaluate all possible results to select the best one, which is complex and computationally expensive. Surong Yan et al. combined Latent Dirichlet Allocation and Bayes' theorem to represent and extract the activity duration feature (Yan et al. 2016). However, these duration features are fixed values that cannot handle dynamic situations. Ehsan Nazerfard proposed a normal-distribution model relating temporal features, such as time sequences, begin times and durations, to activities based on their sensor data; however, this method generalizes poorly because it requires a knowledge model (Nazerfard 2018). Defining alternative duration-range models can reduce reliance on data and increase flexibility. Among high-dimensional time features, introducing the time block to express the duration range is an innovation of our model.
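To make the contrast between the three kinds of duration model concrete, the sketch below compares a fixed-value match, a normal-distribution likelihood (in the spirit of the normal-distribution approach cited above) and a loose range check. The durations, tolerance and range bounds are hypothetical values chosen for illustration.

```python
import math


def fixed_duration_match(duration, expected, tol=0.0):
    """Fixed-value model: the observed duration must equal one learned value."""
    return abs(duration - expected) <= tol


def gaussian_duration_likelihood(duration, mean, std):
    """Normal-distribution model: density of the observed duration."""
    z = (duration - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def duration_in_range(duration, low, high):
    """Range model: accept any duration inside a loose [low, high] window."""
    return low <= duration <= high


# Hypothetical durations in minutes for one activity.
observed = 23
print(fixed_duration_match(observed, expected=20))    # False
print(gaussian_duration_likelihood(observed, 20, 5))  # a density, not a yes/no
print(duration_in_range(observed, low=10, high=40))   # True
```

The fixed-value model rejects a slightly longer run of the same activity, the Gaussian model needs a mean and variance estimated from data, and the range model only needs two bounds, which is why a range is less dependent on the data.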

After feature selection, algorithm choice is also key to accurate activity recognition. There are two categories of algorithms: data-driven methods and knowledge-based methods. A semantic model with temporal-spatial and time sequence traits is a typical knowledge-based method, which designs the activity rules in advance and does not rely on user data (Liu et al. 2015). Hooda et al. proposed an ontology model to express heterogeneous sensor data, which offers reusability (Hooda and Rani 2020). Probability statistics is the basic idea of data-driven methods, which perform well in dynamic and unknown cases (Chamroukhi et al. 2013). Combining semantic and probability-statistics algorithms is a promising inference approach, especially for complex representations of activities and their relationships (Okeyo et al. 2014b; Riboni and Bettini 2011; Ordóñez et al. 2013; Meditskos et al. 2013). Markov Logic Network (MLN) is one such combination and has been widely adopted (Gayathri et al. 2017; Helaoui et al. 2011). These studies mainly handle activity recognition in interleaved and concurrent scenes. To show how MLN can be applied to similar activity recognition, we elaborate on it in Sect. 2.

As research deepens, models include increasingly fine-grained activities, and their scale grows greatly because similar activities are represented redundantly (Chen and Nugent 2009). The rules related to an activity can be split into special habit rules and complete homologous rules shared by all activities of the same category. This split reduces resource consumption and rule complexity and provides formal management of these activities (Ye et al. 2015). Generalizing similar activities therefore yields homologous rules that represent dynamic, unknown activities better than semantic rules, which depend more on expert knowledge than on data.

In this paper, we improve Markov Logic Network model as described in the following steps.

  1. Adding temporal characteristics, such as the duration and time block of an activity, to activity models. These features strengthen the correlation between sensors and activities, which makes similar activities easier to distinguish.

  2. Proposing a novel hierarchical structure that improves model robustness and generalization.

Section 2 presents the basic concepts and theory of the MLN algorithm. Section 3 presents the semantic activity representation, including the time duration and time block. Section 4 explains the hierarchical structure based on same-category rules and special derivative rules. Section 5 shows the experimental results for similar activity recognition with the Markov Logic Network model, which performs well. Section 6 discusses the solution and proposes directions for future work.

2 Markov Logic Network

In this section, we describe the basic concepts of MLN, including the knowledge representation method and probabilistic reasoning logic. Knowledge representation is the foundation for the characteristics and the hierarchical structure, and probabilistic reasoning logic is the key to accurate inference.

Markov Logic Network is a kind of Markov Network (MN) whose rules are expressed in First order Logic (FoL) (Tran and Davis 2008; Chahuara et al. 2012; Gayathri et al. 2015).

First order Logic is a knowledge representation model built recursively from connectives (e.g., \(\wedge\), \(\vee\), \(\lnot\), \(\rightarrow\), \(\leftrightarrow\)) and quantifiers (e.g., \(\forall\), \(\exists\)). Its terms include constants, variables and functions. A variable ranges over constants that share the same correlations or attributes. A function represents a mapping from tuples of objects to objects (Domingos and Lowd 2009). A predicate expresses the relations and attributes of terms (Domingos et al. 2008). A ground term is a term that contains no variables, and a ground atom is a predicate applied to ground terms. An MLN is an undirected graph: each ground atom is a node, and two nodes are linked by an edge when the atoms appear together in at least one grounding of an FoL rule. Each grounded FoL rule therefore forms a fully connected subgraph called a “clique”.
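Grounding can be illustrated by substituting constants for the variables of one formula. The sketch below is a minimal, hypothetical example (the formula, predicate names and constants are made up for illustration): each substitution produces one ground formula, whose ground atoms form one clique of the ground Markov network.

```python
from itertools import product


def ground_formula(variables, domains, atoms):
    """Enumerate all groundings of one first-order formula.

    variables: variable names, e.g. ["s", "c"]
    domains:   {variable: list of constants of its type}
    atoms:     (predicate, [argument variables]) pairs in the formula
    Returns one tuple of ground atoms per substitution; each tuple is a
    clique over its atoms in the ground Markov network.
    """
    groundings = []
    for values in product(*(domains[v] for v in variables)):
        theta = dict(zip(variables, values))  # variable -> constant
        groundings.append(tuple(
            f"{pred}({','.join(theta[a] for a in args)})" for pred, args in atoms
        ))
    return groundings


# Hypothetical formula: Touch(s,c) ∧ ¬OnCoaster(c) → Using(c)
cliques = ground_formula(
    ["s", "c"],
    {"s": ["S1", "S2"], "c": ["Cup1"]},
    [("Touch", ["s", "c"]), ("OnCoaster", ["c"]), ("Using", ["c"])],
)
for clique in cliques:
    print(clique)
nodes = {atom for clique in cliques for atom in clique}
print(len(cliques), len(nodes))  # 2 groundings over 4 distinct ground atoms
```

Both groundings share the nodes `OnCoaster(Cup1)` and `Using(Cup1)`, which is how cliques from one formula overlap in the ground network.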

We construct the MN from the FoL formulas and then assign each formula a weight (related to the potential function), learned from the labeled data, that reflects how strongly the formula holds. Weight \(\omega\) is related to the potential function \(\varPhi _k(x_{\{k\}})\) by (1); therefore, an MLN can also be defined as the combination of FoL and a set of potential functions. A potential function is a non-negative real-valued function of the state that represents the degree of association among the linked nodes; each potential function is applied to the nodes of one ground FoL formula.

$$\begin{aligned} \omega =log \varPhi _k(x_{\{k\}}) \end{aligned}$$
(1)
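The link between weights and potentials can be checked on a tiny model. The sketch below assumes the standard MLN distribution \(P(x)=\frac{1}{Z}\exp\left(\sum_i \omega_i n_i(x)\right)\), where \(n_i(x)\) counts true groundings of formula i; with one hypothetical formula Touch → Using and \(\varPhi = 2\), each satisfied world gets potential 2 and the violating world gets 1, so \(\omega = \log 2\) as in (1).

```python
import math
from itertools import product


def world_probabilities(weights, formulas, n_atoms):
    """Exact MLN distribution P(x) = exp(sum_i w_i * n_i(x)) / Z by enumeration.

    formulas: functions mapping a world (tuple of 0/1 atom values) to the
    number of true groundings n_i(x); here each formula has one grounding.
    """
    worlds = list(product([0, 1], repeat=n_atoms))
    scores = [math.exp(sum(w * f(x) for w, f in zip(weights, formulas))) for x in worlds]
    z = sum(scores)  # partition function
    return {x: s / z for x, s in zip(worlds, scores)}


def touch_implies_using(x):
    """n(x) for the single grounding of Touch -> Using; x = (Touch, Using)."""
    return int((not x[0]) or x[1])


phi = 2.0  # hypothetical potential of a satisfied grounding
probs = world_probabilities([math.log(phi)], [touch_implies_using], n_atoms=2)
for world, p in probs.items():
    print(world, round(p, 4))
# Z = 2 + 2 + 1 + 2 = 7: satisfied worlds get 2/7 each, world (1, 0) gets 1/7.
```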

There are two ways to obtain MLN weights: setting them manually or learning them automatically with learning algorithms. We adopt the second, which yields much better models with less work (Domingos and Lowd 2009). Specifically, we use discriminative weight learning, in which some atoms are evidence and the others are queried, so that the latter are predicted from the former. The MN is usually represented as a log-linear probability model, and the weights are learned by maximizing the conditional log-likelihood, i.e., minimizing its negative. Each weight \(\omega\) is updated with learning rate \(\eta\) and gradient g as follows (Singla and Domingos 2005)

$$\begin{aligned} \omega _{t+1}=\omega _t-\eta g \end{aligned}$$
(2)

The gradient g is the derivative of the negative conditional log-likelihood of the unknown atoms y given the known evidence x. It equals the difference between the expected number of true groundings of the corresponding clause, \(\sum _{y'} P_\omega (Y=y'\mid X=x)n_i(x,y')\), and the actual number \(n_i(x,y)\), where \(n_i(x,y)\) is the number of true groundings of the ith formula in the data and \(E_{\omega ,y}\) is the expectation over the non-evidence atoms Y.

$$\begin{aligned} \begin{aligned} g&=\frac{\partial }{\partial \omega _i} (-log P_\omega (Y=y\mid X=x)) \\&=-n_i(x,y)+\sum _{y'} P_\omega (Y=y'\mid X=x)n_i(x,y') \\&=E_{\omega ,y}[n_i(x,y)]-n_i(x,y) \end{aligned} \end{aligned}$$
(3)
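Equations (2) and (3) can be run on a one-formula model. The sketch below is a minimal illustration, assuming a hypothetical clause Touch → Using with Touch observed true, so the clause holds exactly when Using is true and \(E_\omega[n] = P(\text{Using}=1\mid\text{Touch}=1)=e^{\omega}/(e^{\omega}+1)\). With Using true in 3 of 4 training examples, the gradient \(4\sigma(\omega)-3\) vanishes at \(\omega=\log 3\).

```python
import math


def expected_true_groundings(w):
    """E_w[n(x, y')] given Touch = 1: the clause Touch -> Using is true
    exactly when Using = 1, so this is P(Using = 1 | Touch = 1)."""
    return math.exp(w) / (math.exp(w) + math.exp(0.0))


def learn_weight(observed_n, eta=0.5, steps=2000):
    """Gradient descent on the negative conditional log-likelihood.

    observed_n: n_i(x, y) for each training example (1 if Using was true).
    Update (2): w <- w - eta * g, with g from (3) summed over examples.
    """
    w = 0.0
    for _ in range(steps):
        g = sum(expected_true_groundings(w) - n for n in observed_n)
        w = w - eta * g
    return w


# Hypothetical data: with Touch observed true, Using was true in 3 of 4 cases.
w = learn_weight([1, 1, 1, 0])
print(round(w, 4), round(math.log(3), 4))  # both 1.0986
```

The learned weight reproduces the empirical conditional frequency 3/4, which is what maximizing the conditional log-likelihood should do for a single clause.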

Inference in MLN is a non-deterministic polynomial hard (NP-hard) problem, so approximate methods such as sampling are required. We adopt Gibbs sampling, a typical Markov chain Monte Carlo method. The Gibbs sampler keeps the conditioning (evidence) variables fixed at their given values and repeatedly resamples each non-evidence variable from its conditional distribution given the current values of the others; the resulting sample sequence approximates the joint distribution. The details of the algorithm are shown below.

[Algorithm: Gibbs sampling]
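A minimal Gibbs sampler over a hypothetical two-formula MLN makes the procedure concrete. It assumes formulas are given as (weight, function of a world dictionary) pairs, which is an illustrative encoding rather than Alchemy's format. Evidence stays fixed, each query atom is resampled from its conditional given the rest, and the estimates are checked against exact marginals obtained by enumeration.

```python
import math
import random
from itertools import product


def score(world, formulas):
    """Weighted count of satisfied ground formulas: sum_i w_i * f_i(world)."""
    return sum(w * f(world) for w, f in formulas)


def gibbs_marginals(formulas, evidence, query, n_samples=20000, burn_in=1000, seed=0):
    """Estimate P(q = 1 | evidence) for each query atom by Gibbs sampling.

    Evidence atoms stay fixed at their observed values; each query atom is
    resampled in turn from its conditional given all other current values.
    """
    rng = random.Random(seed)
    world = dict(evidence)
    for q in query:
        world[q] = rng.randint(0, 1)  # random initial state
    counts = dict.fromkeys(query, 0)
    for step in range(burn_in + n_samples):
        for q in query:
            s = []
            for v in (0, 1):
                world[q] = v
                s.append(score(world, formulas))
            p1 = 1.0 / (1.0 + math.exp(s[0] - s[1]))  # P(q = 1 | rest)
            world[q] = 1 if rng.random() < p1 else 0
        if step >= burn_in:  # keep only samples after burn-in
            for q in query:
                counts[q] += world[q]
    return {q: c / n_samples for q, c in counts.items()}


def exact_marginals(formulas, evidence, query):
    """Reference answer by enumerating every assignment of the query atoms."""
    z, totals = 0.0, dict.fromkeys(query, 0.0)
    for values in product((0, 1), repeat=len(query)):
        world = {**evidence, **dict(zip(query, values))}
        p = math.exp(score(world, formulas))
        z += p
        for q in query:
            totals[q] += p * world[q]
    return {q: t / z for q, t in totals.items()}


# Hypothetical toy MLN: Touch -> Using (w = 1.5), Using -> Drink (w = 1.0).
formulas = [
    (1.5, lambda x: int(not x["Touch"] or x["Using"])),
    (1.0, lambda x: int(not x["Using"] or x["Drink"])),
]
evidence = {"Touch": 1}
query = ["Using", "Drink"]
estimate = gibbs_marginals(formulas, evidence, query)
print(estimate, exact_marginals(formulas, evidence, query))
```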

To reduce the computational scale, sampling within the Markov blanket is an efficient inference method. The Markov blanket of a node is the minimal set of nodes that renders it independent of the remaining network. The probability of a ground predicate (query node) \(X_l\) when its Markov blanket \(B_l\) (the related nodes, far fewer than all the evidence nodes of the MLN) is in state \(b_l\) is given in (4). \(F_l\) is the set of ground formulas in which \(X_l\) appears, \(\omega _i\) is the weight of the clique of the ith formula, and \(f_i \in \{0,1\}\) is a binary function that represents the state of the clique. \(f_i(X_l=x_l,B_l=b_l)\) is the value of the ith ground formula when \(X_l=x_l\) and \(B_l=b_l\); \(f_i(X_l=0,B_l=b_l)\) and \(f_i(X_l=1,B_l=b_l)\) are its values when \(X_l=0\) and \(X_l=1\), respectively, with \(B_l=b_l\). The denominator sums over both values of \(X_l\), so the two conditional probabilities add up to 1

$$P(X_{l} = x_{l} \mid B_{l} = b_{l}) = \frac{\exp\left(\sum_{f_{i} \in F_{l}} \omega_{i} f_{i}(X_{l} = x_{l}, B_{l} = b_{l})\right)}{\exp\left(\sum_{f_{i} \in F_{l}} \omega_{i} f_{i}(X_{l} = 0, B_{l} = b_{l})\right) + \exp\left(\sum_{f_{i} \in F_{l}} \omega_{i} f_{i}(X_{l} = 1, B_{l} = b_{l})\right)}$$
(4)
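Equation (4) only needs the ground formulas \(F_l\) that contain \(X_l\); formulas that do not mention \(X_l\) contribute the same factor to numerator and denominator and cancel. The sketch below evaluates (4) for a hypothetical query atom Using(Cup1) whose blanket holds Touch and Drink, with illustrative weights.

```python
import math


def blanket_conditional(x_l, blanket_formulas, blanket_state):
    """Eq. (4): P(X_l = x_l | B_l = b_l) using only the ground formulas F_l
    that contain X_l. Each formula is (w_i, f_i) with f_i(x, b) in {0, 1}."""
    def energy(v):
        return sum(w * f(v, blanket_state) for w, f in blanket_formulas)
    return math.exp(energy(x_l)) / (math.exp(energy(0)) + math.exp(energy(1)))


# Hypothetical: X_l = Using(Cup1); its Markov blanket holds Touch and Drink.
F_l = [
    (1.5, lambda x, b: int(not b["Touch"] or x)),  # Touch -> Using
    (1.0, lambda x, b: int(not x or b["Drink"])),  # Using -> Drink
]
b_l = {"Touch": 1, "Drink": 0}
p1 = blanket_conditional(1, F_l, b_l)
p0 = blanket_conditional(0, F_l, b_l)
print(round(p1, 4), round(p0 + p1, 4))  # 0.6225 1.0
```

Here setting Using to 1 satisfies the first formula (weight 1.5) but violates the second, while setting it to 0 satisfies only the second (weight 1.0), so \(P = 1/(1+e^{-0.5})\).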

3 Semantic model with duration and time block

Semantic representations of all activities use the FoL format of MLN (Ryoo and Aggarwal 2009; Gayathri et al. 2017). In the sensor event layer, the sensor attributes (time point, location, time block, ID and attached object) are defined as parts of a term. The time sequence is defined as a new term that can be linked by predicates to a sensor term or an activity term. We record only the jump values of these terms, i.e., their changes of state, and discard readings that do not change, which saves storage and computation resources. Most sensors have two states, and we redefine the states of these sensor terms: 1 means a transition from untriggered to triggered, and 0 means a transition from triggered to untriggered. Pressure, temperature and similar sensors report values instead of states; we transform these values into term states, where 1 means the value has increased and 0 means it has decreased. The two state terms are expressed in (5) and (6), which are negations of each other, as stated in (7). There are more than ten sensor categories, such as motion, touch, light, magnetic, gas, water, pressure, tilt, temperature, humidity and vibration. Time information is written in the format DDPPHHMMSS, which extends the traditional temporal data DDHHMMSS (day, hour, minute and second) with the new concept of the time block PP. The day is divided into twelve 2-h time blocks with the values 1 to 12, as shown in Table 1

Table 1 Time block definition
$$\begin{aligned}&Sensor(ID)Place(LABEL)(DD,PP,HHMMSS) \end{aligned}$$
(5)
$$\begin{aligned}&\lnot Sensor(ID)Place(LABEL)(DD,PP,HHMMSS) \end{aligned}$$
(6)
$$\begin{aligned}&\begin{aligned} Sensor(ID)Place(LABEL)(DD,PP,HHMMSS) \\ \leftrightarrow \lnot (\lnot Sensor(ID)Place(LABEL)(DD,PP,HHMMSS)) \end{aligned} \end{aligned}$$
(7)
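The jump-value recording described above can be sketched as a pass over one sensor's raw readings. This is a minimal illustration, assuming that the first reading is only a baseline and that timestamps are HHMMSS strings; the readings themselves are hypothetical. Binary and valued sensors are handled the same way: a rise yields state 1, a fall yields state 0, and unchanged readings are dropped.

```python
def to_change_events(readings):
    """Convert raw readings of one sensor into state-change terms.

    readings: time-ordered (timestamp, value) pairs. Binary sensors report
    0/1; valued sensors (pressure, temperature, ...) report numbers.
    Returns (timestamp, state) pairs: 1 when the value rises (untriggered ->
    triggered, or value increased), 0 when it falls. Unchanged readings are
    dropped, so only the jump values are stored.
    """
    events = []
    for (_, prev), (t, cur) in zip(readings, readings[1:]):
        if cur > prev:
            events.append((t, 1))
        elif cur < prev:
            events.append((t, 0))
    return events


# Hypothetical readings for a touch sensor and a pressure sensor.
touch = [("091510", 0), ("091512", 1), ("091513", 1), ("091540", 0)]
pressure = [("120000", 2.0), ("120005", 2.0), ("120010", 3.5), ("120020", 1.0)]
print(to_change_events(touch))     # [('091512', 1), ('091540', 0)]
print(to_change_events(pressure))  # [('120010', 1), ('120020', 0)]
```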

An atom whose terms are all constants is called a ground atom. For example, when touch sensor 1 on the cup detects a touch, its value changes from “0” to “1” and it sends one record. If the date is 20190212, the time block is 2 (Morning), the hour is 09, the minute is 15 and the second is 12, the record is the ground atom in (8)

$$\begin{aligned} Touch(1)Cup(1)(20190212,2,091512) \end{aligned}$$
(8)

In the entity/action event layer, atoms take the same form as in the sensor event layer, as shown in (9). For example, the action “UsingCup” is inferred with the rule in (10). We define the action time as the time of the last triggered sensor

$$\begin{aligned}&Action(ID)Place(LABEL)(DD,PP,HHMMSS) \end{aligned}$$
(9)
$$\begin{aligned}&\begin{aligned} Touch(1)Cup(1)(x,y,z_1) \wedge \lnot Magnetic(1)Coaster(1)(x,y,z_2)\\ \wedge z_1\le z_2 \rightarrow Using(1)Cup(1)(x,y,z_2) \end{aligned} \end{aligned}$$
(10)
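Rule (10) can be applied to recorded change events as a simple join. The sketch below assumes events are (date, time block, HHMMSS, state) tuples, with state 0 on the coaster's magnetic sensor read as the negated atom ¬Magnetic(1)Coaster(1), i.e., the cup being lifted. Because HHMMSS strings are zero-padded, comparing them as strings matches the ordering \(z_1 \le z_2\).

```python
def infer_using_cup(touch_events, coaster_events):
    """Apply rule (10): Touch(1)Cup(1)(x,y,z1) ∧ ¬Magnetic(1)Coaster(1)(x,y,z2)
    ∧ z1 <= z2 -> Using(1)Cup(1)(x,y,z2).

    Events are (date, block, hhmmss, state); state 1 is a rising jump and 0 a
    falling jump. The action time is the later (last triggered) time z2.
    """
    actions = []
    for d1, p1, z1, s1 in touch_events:
        if s1 != 1:
            continue  # the rule needs a positive touch atom
        for d2, p2, z2, s2 in coaster_events:
            # same date x and time block y, negated magnetic atom, z1 <= z2
            if s2 == 0 and (d1, p1) == (d2, p2) and z1 <= z2:
                actions.append(("Using(1)Cup(1)", d2, p2, z2))
    return actions


# Hypothetical events: the coaster release at 09:15:10 precedes the touch.
touch = [("20190212", 2, "091512", 1)]
coaster = [("20190212", 2, "091510", 0), ("20190212", 2, "091514", 0)]
print(infer_using_cup(touch, coaster))
# [('Using(1)Cup(1)', '20190212', 2, '091514')]
```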

Additional notes. An action occurs at only one time point, but it can belong to three time blocks.

For example, 06:32:45 (HH:MM:SS) is the time point of one action; this action belongs to three time blocks, 4, 3.5 and 4.5 (blocks 3.5 and 4.5 are defined below). Therefore, preprocessing expands each raw record into three instances. This solves the representation problem for activities that cross time blocks, at the cost of some extra storage. Computation efficiency improves sharply because the rules are unified.
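The three-instance expansion can be sketched as follows. This is a minimal sketch under stated assumptions: block k in 1..12 is assumed to cover hours [2(k−1), 2k), which is consistent with 06:32:45 falling in block 4 (Table 1 gives the authoritative boundaries); a crossing block k + 0.5 is assumed to span blocks k and k + 1, with 12.5 wrapping around midnight so that the crossing values are 1.5 to 12.5.

```python
def base_time_block(hhmmss):
    """Assumed layout: block k (1..12) covers hours [2(k-1), 2k)."""
    hour = int(hhmmss[:2])
    return hour // 2 + 1


def expand_time_blocks(hhmmss):
    """Return the base block k and the two crossing blocks k - 0.5, k + 0.5.

    Crossing block k + 0.5 is assumed to span blocks k and k + 1 (4 h), and
    block 12.5 wraps around midnight, so the lower neighbour of block 1 is 12.5.
    """
    k = base_time_block(hhmmss)
    lower = k - 0.5 if k > 1 else 12.5
    upper = k + 0.5
    return [k, lower, upper]


print(expand_time_blocks("063245"))  # [4, 3.5, 4.5]
```

Each raw record is then duplicated once per returned block, which is the storage cost mentioned above.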

The activity event layer is the same as the entity and sensor layers, except that the duration concept is added to the atom (Duong et al. 2006, 2009; Zhang et al. 2010). The activity is defined in (11), where \(Begin\_HHMMSS\) and \(End\_HHMMSS\) are derived from a series of action events. The knowledge rule of an activity consists of several action events, and the minimum and maximum times among them are taken as the activity's begin and end times. Because an activity usually consists of more than one action event, the minimum and maximum times exist; in the exceptional case of a single action event, the begin and end times are both set to that event's time. To reduce the number of symbols and relax the block boundaries, we combine the duration and the time block, which adds the 12 new values shown in Table 2. The duration is a loose time frame lasting less than 4 h; when an activity crosses a time-block boundary, it is assigned to the corresponding new time block. The definition of these crossing time blocks is given in Table 2.

Table 2 New time block definition
$$\begin{aligned} \begin{aligned} Activity(ID)Place(LABEL)\\ (DD,PP,Begin\_HHMMSS,End\_HHMMSS) \end{aligned} \end{aligned}$$
(11)
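Building an activity atom in the form of (11) from its action events amounts to taking the earliest and latest action times. The sketch below assumes all action times fall on the same day and uses the loose 4-h bound as a sanity check; the activity, place and times are hypothetical.

```python
from datetime import datetime


def activity_atom(activity, place, date, block, action_times):
    """Build an activity ground atom in the form of (11).

    action_times: HHMMSS strings of the action events that make up the
    activity. Begin and end are the earliest and latest times; a single
    action gives begin == end. All times are assumed to fall on one day.
    """
    begin, end = min(action_times), max(action_times)
    duration = datetime.strptime(end, "%H%M%S") - datetime.strptime(begin, "%H%M%S")
    if duration.total_seconds() >= 4 * 3600:  # loose duration bound of < 4 h
        raise ValueError("activity duration must be shorter than 4 h")
    return f"{activity}{place}({date},{block},{begin},{end})", duration


atom, duration = activity_atom("DrinkTea(1)", "Kitchen(1)", "20190212", 2,
                               ["091512", "092030", "091514"])
print(atom)      # DrinkTea(1)Kitchen(1)(20190212,2,091512,092030)
print(duration)  # 0:05:18
```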

The typical rules of the 12 activities used in the experiments (DrinkTea, DrinkCoffee, WashFace, WashCloth, HaveMeal, DoDishes, DrinkMilk, DrinkJuice, FriedDishes, BoiledDishes, Sweep and Wipe) are shown in Table 3. To make the rules clearer, we keep only the entity name, time sequence and activity name.

Table 3 Typical rules of 12 activities

4 Hierarchical activity modelling

The activity model is structured to establish abstract, generalized rules by classifying similar Activities of Daily Living (ADL) into categories, as shown in Fig. 1 (Brostow et al. 2008). For example, we define a category activity “DrinkHot” from similar activities, including Activity (drink water), Activity (drink tea) and Activity (drink coffee); its generalized rule is shown in Table 4. The “DrinkTea” rule is then redefined in terms of “DrinkHot”, as shown in Table 5, and the special rule becomes simpler than before. In this paper we list only some typical categories, which may be incomplete; the same processing method extends to all ADL.

Fig. 1 Activity categories

Table 4 Typical DrinkHot category activity rule
Table 5 Typical DrinkTea activity rule based on DrinkHot category

Each type of ADL consists of many detailed, specific activities. Within one type, the sub activities trigger different sensors and must satisfy both the generalized rules and their own special rules. Based on living habits and common sense about ADL, the semantic knowledge of ADL can easily be established and enriched.

For father nodes, the generalized rules are extracted from the sub nodes and described in FoL. For leaf nodes, the complex description is replaced by the father node combined with personal characteristics through connectives, as shown in (12). Every activity node has four typical features: Time Block, Duration, Location and Time Series.

The hierarchical structure model has the following advantages.

  • Ease of maintenance: sub activities inherit rule models from their father node without affecting their own special features. When activity habits change, the model is convenient and cheap to maintain because of its readability and inheritance.

  • High expandability: once a father category is defined, it can be extended to various sub nodes. The model flexibly and cheaply realizes a highly cohesive relationship between father and sub nodes.

  • High reusability: a father node is independent and can be reused by new activities that share its features.

  • High efficiency: because of the high expandability and reusability, the overall operation time is reduced.

  • Multiple inheritance: a specific activity can inherit not only from one father node but from several, obtaining all of their attributes. This enhances the robustness of the model.

$$\begin{aligned} \begin{aligned} Father\_Activity \wedge Special\_Character \rightarrow Sub\_Activity \end{aligned} \end{aligned}$$
(12)
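The inheritance described by (12), including multiple inheritance, can be sketched with a small node type. The category names, literals and the extra father `MorningDrink` below are hypothetical illustrations, not the contents of Tables 4 and 5; `rule()` produces the unflattened form of (12), and `inherited()` lists every literal reachable through the father chain.

```python
from dataclasses import dataclass, field


@dataclass
class ActivityNode:
    """One node of the activity hierarchy; literals are FoL strings."""
    name: str
    special: list                              # special characters of this node
    fathers: list = field(default_factory=list)

    def rule(self):
        """Rule in the form of (12): Father ∧ Special_Character -> Sub.
        With several fathers (multiple inheritance) all are conjoined."""
        body = [f.name for f in self.fathers] + self.special
        return " ∧ ".join(body) + " -> " + self.name

    def inherited(self):
        """All literals inherited through every father chain, without duplicates."""
        out = []
        for f in self.fathers:
            for lit in f.inherited() + f.special:
                if lit not in out:
                    out.append(lit)
        return out


# Hypothetical hierarchy: Drink -> DrinkHot -> DrinkTea, plus a second father.
drink = ActivityNode("Drink", ["Using(Cup)"])
drink_hot = ActivityNode("DrinkHot", ["Using(Kettle)"], [drink])
morning = ActivityNode("MorningDrink", ["TimeBlock(4)"])
drink_tea = ActivityNode("DrinkTea", ["Using(TeaBox)"], [drink_hot, morning])
print(drink_tea.rule())       # DrinkHot ∧ MorningDrink ∧ Using(TeaBox) -> DrinkTea
print(drink_tea.inherited())  # ['Using(Cup)', 'Using(Kettle)', 'TimeBlock(4)']
```

Changing a literal in `drink_hot` changes what every sub activity inherits without touching their own special characters, which is the maintenance benefit listed above.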

5 Inference and experiment

In this section, we develop an inference method that combines data-driven and knowledge-based reasoning. FoL serves as a typical semantic model. We establish an expert knowledge base from the essential characteristics of these activities, which is completely independent of the sensor data. To reduce space usage and computational complexity, the hierarchical activity model is used to modify the rules of the knowledge base. An activity consists of a series of sensor data, and its most critical feature is the time series. Therefore, we adopt the MN, a statistical probabilistic graphical model, which can mine complex and personal features from the sensor data. By combining FoL and MN, we adopt the MLN, which performs well in recognizing complex activities.

Our experiment addresses two research questions: recognizing similar activities that happen within one day, and improving performance with the hierarchical structure model. We deploy 27 sensors in our room, including touch (TTP223B), tilt, magnetic (MKA14103), water (FC-37) and pressure (HX711) sensors. The simulated deployment of the room is shown in Fig. 2. These sensors are grouped into module boxes and deployed by family. The similar activity groups are Activity (DrinkTea) and Activity (DrinkCoffee), Activity (WashFace) and Activity (WashCloth), Activity (HaveMeal) and Activity (DoDishes), Activity (DrinkMilk) and Activity (DrinkJuice), Activity (FriedDishes) and Activity (BoiledDishes), and Activity (Sweep) and Activity (Wipe); within each pair, the shared actions outnumber the differing ones. We extract the shared parts as the father features and use the remaining parts as the particular characters of the sub activities.

Unlike other activity recognition work, this work focuses on distinguishing similar activities; for this purpose we design a new dataset and add the duration and time block features. Compared with existing research, we adopt MLN with time, location and time sequence features, which are not easy to express with other algorithms. Therefore, our comparison baseline is the MLN without the duration and time block features, which performs well in interleaved and concurrent complex situations. The comparison results are detailed in the following subsections.

Alchemy 2.0 is an MLN inference engine. We use Alchemy 2.0 to learn the rule weights and to infer the likelihood probabilities. The FoL rules are stored in a “.mln” file, the labeled training data in a “.db” file, and the test data in a separate “.db” file.

Fig. 2 Simulation deployment diagram for the kitchen room

5.1 Similar activity recognition

Duration reflects the residents' unique habits: because similar activities such as BoiledDishes and FriedDishes differ in essence, their durations differ noticeably. The time block is another typical habit that differs between similar activities, since residents prefer different activities in different time blocks; for example, DrinkCoffee and DrinkTea are usually chosen in different time blocks of the day. We test these ideas with the following experiments. For the similar activity pair FriedDishes and BoiledDishes, the average recognition accuracy improves from 90.3% (Fig. 3) to 92.5% (Fig. 4). The horizontal axis represents all possible activity results when some sensors are triggered by the two similar activities (FriedDishes and BoiledDishes), and the vertical axis represents the probability of each possible result. A result is correct when both its begin time and end time are correct. With the duration and time block features, the total probability of the two similar activities increases; the two figures show that the accuracy increases and the erroneous result disappears. The results for the six similar activity groups are shown in Table 6, where the second column “Probab.(before)” is the probability without the duration and time block features and the third column “Probab.(modify)” is the probability with them, which shows improved performance.
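Why an extra duration formula can separate two activities that share sensor evidence can be seen in a one-variable toy model. The sketch below is a hypothetical illustration, not the learned weights or data behind Fig. 3, Fig. 4 or Table 6: it treats the activity as a single query with mutually exclusive values and normalizes exp(sum of weights of satisfied formulas) over the two candidates.

```python
import math

# Hypothetical duration ranges in minutes (illustrative, not measured).
DURATION_RANGE = {"FriedDishes": (5, 15), "BoiledDishes": (20, 60)}


def posterior(candidates, obs, formulas):
    """P(activity | obs) proportional to exp(sum of weights of the formulas
    that hold for that activity), normalized over the candidates."""
    scores = {a: sum(w for w, f in formulas if f(a, obs)) for a in candidates}
    z = sum(math.exp(s) for s in scores.values())
    return {a: math.exp(s) / z for a, s in scores.items()}


def uses_stove(activity, obs):
    """Shared evidence: both activities trigger the stove sensor."""
    return "stove" in obs["sensors"]


def duration_fits(activity, obs):
    """Duration-range formula: the observed duration lies in the activity's range."""
    low, high = DURATION_RANGE[activity]
    return low <= obs["duration"] <= high


candidates = ["FriedDishes", "BoiledDishes"]
obs = {"sensors": {"stove", "pan"}, "duration": 12}
base = posterior(candidates, obs, [(2.0, uses_stove)])
with_duration = posterior(candidates, obs, [(2.0, uses_stove), (1.2, duration_fits)])
print(base)           # 0.5 each: the shared sensor cannot separate them
print(with_duration)  # FriedDishes now has the higher probability
```

With only the shared formula, both candidates score the same; the duration formula adds its weight to the matching candidate only, so the probability shifts to \(1/(1+e^{-1.2})\) for FriedDishes.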

Fig. 3 FriedDishes and BoiledDishes recognition results using time series and location only

Fig. 4 FriedDishes and BoiledDishes recognition results with the additional duration and time block features

Table 6 Two similar activities probability

5.2 Hierarchical structure model

For the five typical activity categories, Activity (Drink), Activity (Wash), Activity (Meal), Activity (Cook) and Activity (Clean), which are sets of objects, the sub activities belong to the father nodes and inherit their related rules. During inference, the instance graph built for these related rules is several times larger than the network without father nodes, because MLN constructs the maximal fully connected subgraph for each rule. When a father node is extracted from its sub activities, one subgraph is split into two subgraphs, and a new concept is added to express the father node.

To avoid this extra computation cost, we resolve the inheritance in a preprocessing step, which retains the advantages of the structured model and reduces the inference complexity. The preprocessing pseudo-code that constructs the rules without father nodes is as follows:

[Algorithm: Delete Father Nodes preprocessing]
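One plausible way to implement this preprocessing is sketched below, under the assumption that each rule is a list of body literals plus a head and that every father node has one generalized defining rule. The function name and data layout are hypothetical. Each father literal in a sub rule is replaced, recursively, by the father's body, so no father atom remains to be grounded during inference.

```python
def delete_father_nodes(father_rules, sub_rules):
    """Hypothetical sketch of the 'Delete Father Nodes' preprocessing.

    father_rules: {father_name: [body literals]}, the generalized rules
    sub_rules:    list of (body literals, head); bodies may mention fathers
    Returns the sub rules with every father literal replaced by the father's
    body, so the rule base no longer contains father atoms.
    """
    def expand(literal, seen=()):
        if literal not in father_rules:
            return [literal]  # ordinary literal, keep it
        if literal in seen:
            raise ValueError(f"cyclic father definition at {literal}")
        out = []
        for lit in father_rules[literal]:
            out.extend(expand(lit, seen + (literal,)))
        return out

    flattened = []
    for body, head in sub_rules:
        new_body = []
        for literal in body:
            for lit in expand(literal):
                if lit not in new_body:  # drop duplicated inherited literals
                    new_body.append(lit)
        flattened.append((new_body, head))
    return flattened


# Hypothetical rule base: DrinkHot is defined via the Drink category.
fathers = {"Drink": ["Using(Cup)"], "DrinkHot": ["Drink", "Using(Kettle)"]}
subs = [(["DrinkHot", "Using(TeaBox)"], "DrinkTea")]
print(delete_father_nodes(fathers, subs))
# [(['Using(Cup)', 'Using(Kettle)', 'Using(TeaBox)'], 'DrinkTea')]
```

The readable hierarchical rules stay the source of truth, while the flattened rules are what the inference engine grounds.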

We compare the operating time of the traditional Markov Logic Network with that of the network preprocessed by the “Delete Father Nodes” algorithm and find that the operating time decreases. The experimental results are shown in Table 7.

Table 7 Father nodes model operation time comparison

6 Conclusion

This paper focuses on improving the performance of similar activity recognition. We present two new characteristics that restrict the activity inference rules and improve reasoning efficiency: we add the duration and time block of an activity to sensor data with time series and location to expand the inference rules, which makes it easy to recognize similar activities that happen on the same day. The results show that generalizing hierarchical activity models from similar activities enhances expandability and readability. To decrease the computation cost of adding father nodes to the MLN, we propose a preprocessing method that reduces the complexity.

The solution in this paper can be generalized to other fields, especially timely and accurate personalized services. In future work, other high-dimensional characteristics should be considered for activity modelling to obtain more accurate representation and inference.