1 Introduction

Activity monitoring in ambient smart homes has attracted a great deal of attention recently due to its promising applications in aged care, where privacy and obtrusiveness are major concerns [3]. Unlike visual approaches that employ video data for learning and reasoning about the activities of multiple residents, ambient approaches rely on low-cost, power-saving sensors placed at different locations in a smart home. Although cameras can provide rich information for accurately modelling human behaviour, many users feel uncomfortable in their presence and fear that their identities and sensitive activities may be unnecessarily revealed. By contrast, ambient sensors are only triggered when a resident performs an activity and are therefore less intrusive. Wearable sensors are perhaps the most popular and successful tool for personal activity tracking at present. However, in many cases, especially in aged care for multiple occupants, this approach is expensive and is not always welcomed by senior citizens who are uncomfortable with having an electronic device attached. Such limitations have motivated research on future smart homes where human activities and interactions can be measured through sparse, indicative sensor data from ambient devices.

Research on activity monitoring in ambient environments has been active for the past fifteen years, with the focus shifting from static approaches such as k-Nearest Neighbours, Decision Trees, Multi-layer Perceptrons [12, 29], and incremental Decision Trees (IDT) [23] to temporal approaches such as Hidden Markov Models (HMMs) [1, 5, 8, 11, 25], Conditional Random Fields (CRFs) [3, 10, 18, 30], and Recurrent Neural Networks [28]. The central idea of these approaches is to employ a data-driven model to learn a classifier for activity recognition with sensor values as inputs. Data-driven approaches have also been used to predict the time at which an activity will occur, which is useful for early prevention and anomaly detection [20, 21]. With the recent success of deep learning, several sequence modelling approaches have been adopted for activity recognition with ambient sensor data, most of them inspired by natural language processing. For example, in [28] the recognition module of the Smarter Safer Home platform is built upon the Gated Recurrent Unit [7] and Long Short-Term Memory [17]. In [31] the authors take advantage of stacked auto-encoders to learn high-level, predictive features for activity recognition with binary sensor data. In [16], a sequence-to-sequence model [26] is used to learn temporal features. Besides investigating new activity modelling approaches, a heterogeneous strategy has also been used to transfer activity information from one sensor platform to another to boost performance [14].

As we can see, most current research focuses on improving prediction performance, while in a multi-resident environment there is an unanswered question of whether the activities of one resident can influence those of the others. In [2], the activities are segmented to detect activity transitions. However, that work does not show how the activities of residents are related. We argue that the activity of a resident at a time step can be seen as a frequent pattern, like a habit (having breakfast after getting up in the morning), or as part of a collaborative event (one resident is cooking while the other is cleaning the dining table). This suggests that the activity of a resident depends both on previous activities and on the activities of other residents. In this paper, instead of developing a new model for performance improvement, we are interested in studying the relationships between activities in multi-resident smart home environments.

In the first part of this paper we present an investigation into the behaviour dependencies in smart home environments. We classify the dependencies of activities as: parallel (individual) dependency, where the activities of a resident depend only on his or her own previous activity; cross dependency, where the activities of a resident depend on the previous activities of all residents; and group dependency, where the combined activities of all residents depend on their previous combined activities. We also consider two forms of interaction between activities and the environment's states. Here the states are triggered either by each individual independently or by the activities of all residents together. We model the dependencies as six variants of Hidden Markov Models (HMMs), some of which have not previously been used for multi-resident activity recognition in ambient environments.

The second part of this paper proposes a novel approach for multi-resident modelling. We argue that a complete model should embody more than one type of dependency to be able to represent the complexity of collaborative, interactive activities. We then propose an ensemble of HMMs, which we call the mixed-dependency HMM (md-HMM), to implement this idea. An md-HMM is a combination of several HMMs with different transition probability tables that share the same emission probability table. Clearly, the role of each type of dependency varies across environments, depending on the living styles of the occupants. Therefore, we further generalize the ensemble to a mixed-dependency model (MDM). Unlike the md-HMM, whose log-probability is the sum of the log-probabilities of its component HMMs, the MDM is represented by a weighted log-probability. In a special case, an MDM reduces to a model very similar to the md-HMM.

We carry out experiments on three smart home environments from the CASAS [9, 18] and ARAS [1] datasets. As far as we know, this is the first work that performs an intensive empirical evaluation using multiple smart home environments with different types of feature representation. For reproducibility we share our code at https://github.com/sFunzi/mdm. Among the six variants of HMMs we find that the HMMs with group dependency are more accurate than the other variants. We also find that representing the environment's state as the result of all residents' activities is better than replicating it for each individual. More importantly, the empirical results confirm our hypothesis that mixed dependencies indeed capture the complexity of multi-resident activities. In particular, the md-HMM performs better than the other models, and the MDM achieves further improvement.

The organization of this paper is as follows. In the next section, we discuss the dependencies in ambient smart home environments with multiple occupants. Section 3 shows how to combine such dependencies to construct different HMMs for activity modelling. In Section 4 an ensemble of HMMs, called md-HMM, and its generalisation MDM are proposed. Related work is presented in Section 5. We present the empirical study in Section 6, with intensive experiments on three smart home environments from two datasets. The last section concludes the work.

2 Smart home environments

In a seamless smart home we can utilise ambient devices, attached at various locations, to monitor the behaviour of residents. Devices such as motion and force sensors are affordable and consume little energy, which makes them well suited for mass production in the future.

2.1 Notations

Let us denote by \(a^{m,t}\) and \(o^{t}\) the activity of resident \(m\) and the sensors' state at time \(t\) respectively. For ease of presentation we denote by \(\mathbf{a}^{t} = \{a^{1,t}, a^{2,t}, \dots, a^{M,t}\}\) the activities of all \(M\) residents at time \(t\). We use \(t_{1}:t_{2}\) to denote a sequence of events/states from time \(t_{1}\) to \(t_{2}\). For example, \(\mathbf {a}^{t_{1}:t_{2}}=\{\mathbf {a}^{t_{1}}, .. , \mathbf {a}^{t_{2}}\}\) is a sequence of activities performed by all residents from time \(t_{1}\) to \(t_{2}\).

2.2 Activity dependencies

For multiple residents there are three types of dependency, as illustrated in the first row of Table 1. Let us take two residents as an example. First, each resident's activities can be seen as an independent Markov chain with no interactive link between them. We call this parallel dependency, as shown in the left figure. Second, we can assume that the activity of a resident at time \(t\) depends not only on his or her previous activity but also on the activities of the others. This makes sense since occupants of a smart home tend to interact with each other. We call this cross dependency, as shown in the middle figure. Finally, we can treat the activities of all residents as a single random variable whose current combined value depends on the previous one, as shown in the right figure. We call this group dependency.

Table 1 First row (left to right): parallel dependency, cross dependency, group dependency; first column (top to bottom): individual interaction, group interaction; pHMM: parallel HMM, cHMM: coupled HMM, gd-cHMM: coupled HMM with group dependency, fHMM: factorial HMM, cd-fHMM: factorial HMM with cross dependency, gd-HMM: HMM with group dependency

2.3 Environment interaction

We now consider two types of interaction between activities and the environment's states, as shown in the first column of Table 1. The states of a smart home are normally represented by the values of sensors, denoted \(o^{t}\) at time \(t\). A state is recorded when one or more residents perform activities and thereby trigger the sensors. In the first case, shown in the top figure, each resident has his or her own interaction with the environment. This is based on the fact that a sensor can only be triggered by a single person. Note that in non-intrusive ambient environments it is difficult to associate activated sensors with the person who activated them; instead, the state of the environment is replicated for each person. In the second case, one may argue that since the environment is treated as a single object, its states should reflect the overall dynamics within it. Therefore, in this case the environment's state is modelled as dependent on the activities of all residents, as shown in the bottom figure.

3 Multi-resident activities modelling

In this section we show how to model the activities of multiple residents in smart homes by combining the dependencies discussed in the previous section.

3.1 Hidden Markov models

An HMM [24] consists of a single hidden variable \(y\) and an observation variable \(x\), where the hidden states follow a Markov process. It represents a state sequence by the joint distribution:

$$ p(y^{1:T},x^{1:T}) = p(x^{1}|y^{1})p(y^{1})\prod\limits_{t=2}^{T}p(x^{t}|y^{t})p(y^{t}|y^{t-1}) $$
(1)

This is parameterised by the probability tables \(p(x^{t}=i|y^{t}=j)\), \(p(y^{t}=j|y^{t-1}=j^{\prime })\), and \(p(y^{1}=j)\), which are called the emission, transition, and prior probabilities respectively. In order to learn the model's parameters, one maximizes the log-likelihood \( \ell = {\sum }_{y^{1:T},x^{1:T} \in \mathcal {D}}\log p(y^{1:T},x^{1:T})\), where \(y^{1:T},x^{1:T} \in \mathcal {D}\) is a sequence of inputs and outputs, e.g. sensors' states and activities, in the training data \(\mathcal {D}\). Given a new sequence of observations, prediction can be performed by finding the most probable hidden states using the Viterbi algorithm.
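To make the decoding step concrete, the following sketch implements log-space Viterbi decoding for a single HMM. It is a minimal illustration under our own assumptions about array names and shapes; it is not taken from the paper's code release.

```python
import numpy as np

def viterbi(obs, log_prior, log_trans, log_emit):
    """obs: observation indices of length T; log_prior: (S,) log p(y^1);
    log_trans: (S, S) log p(y^t | y^{t-1}); log_emit: (S, O) log p(x^t | y^t).
    Returns the most probable hidden state sequence y^{1:T}."""
    T, S = len(obs), len(log_prior)
    mu = np.empty((T, S))               # mu[t, j] = max log p(y^t=j, y^{1:t-1}, x^{1:t})
    back = np.zeros((T, S), dtype=int)  # argmax pointers for the trace-back
    mu[0] = log_prior + log_emit[:, obs[0]]
    for t in range(1, T):
        scores = mu[t - 1][:, None] + log_trans   # rows index the previous state j'
        back[t] = scores.argmax(axis=0)
        mu[t] = scores.max(axis=0) + log_emit[:, obs[t]]
    path = [int(mu[-1].argmax())]                 # most probable final state
    for t in range(T - 1, 0, -1):                 # trace back through the pointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```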

3.2 Multi-resident activity models

HMMs are well suited to modelling the dependencies discussed in Section 2. The activities of the residents can be seen as multiple hidden variables, while the environment's states can be represented either as a single variable or as multiple replicas of a variable. By considering each type of dependency and interaction we arrive at six variants of HMM, as shown in Table 1. In what follows we discuss each of them in detail.

3.2.1 pHMM

We can model each resident's activities by a separate HMM, similar to the model proposed in [6]. However, it should be noted that in that work the data association is provided, so that the input of each HMM is tied only to the states of the sensors triggered by a specific person. In the general case studied in this paper, such information is not available. Therefore the input of each HMM is replicated for all residents. The joint distribution of this parallel HMM is:

$$ \begin{aligned} p = \prod\limits_{m} \left[p(o^{1}|a^{m,1})p(a^{m,1}) \prod\limits_{t=2}^{T}p(o^{t}|a^{m,t})p(a^{m,t}|a^{m,t-1})\right] \end{aligned} $$
(2)

The parameters of this model differ from those of the single HMM, where we only need three probability tables. Here, the model has M transition probability tables, M emission probability tables, and M priors. Each HMM is learned independently using the same algorithm. For prediction, each HMM predicts the activities of one resident and the results of all HMMs are combined for evaluation.
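Learning each component HMM reduces to counting. As a sketch (with Laplace smoothing added as our own assumption), one resident's tables in the parallel model could be estimated as follows, reusing the array conventions of the viterbi() sketch above:

```python
import numpy as np

def fit_hmm(activity_seqs, obs_seqs, n_states, n_obs, eps=1.0):
    """Maximum-likelihood estimation by counting, for one resident's HMM.
    activity_seqs / obs_seqs: parallel lists of label / observation sequences."""
    prior = np.full(n_states, eps)                 # eps > 0 gives Laplace smoothing
    trans = np.full((n_states, n_states), eps)
    emit = np.full((n_states, n_obs), eps)
    for acts, obs in zip(activity_seqs, obs_seqs):
        prior[acts[0]] += 1
        for t in range(len(acts)):
            emit[acts[t], obs[t]] += 1
            if t > 0:
                trans[acts[t - 1], acts[t]] += 1
    # normalise counts into conditional probability tables, in log-space
    return (np.log(prior / prior.sum()),
            np.log(trans / trans.sum(axis=1, keepdims=True)),
            np.log(emit / emit.sum(axis=1, keepdims=True)))
```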

3.2.2 cHMM

The parallel HMM has the drawback that it does not take into account the relations between residents' activities: each HMM assumes that the current activity of a resident depends only on his or her previous activity. As mentioned earlier, this may not reflect the real situation in smart homes, where the activities of a resident sometimes depend on those of other residents. Therefore, by coupling the hidden variables of the separate HMMs while keeping the replicated observation variables, we obtain a new model that captures such cross dependency, similar to [6]. In this case the joint distribution is:

$$ p = \prod\limits_{m} \left[p(o^{1}|a^{m,1})p(a^{m,1}) \prod\limits_{t=2}^{T}p(o^{t}|a^{m,t})p(a^{m,t}|\mathbf{a}^{t-1})\right] $$
(3)

Here, the emission probabilities are the same as those of the parallel HMM but the transition probabilities are different. We again use the Viterbi algorithm to infer the most probable activities given a sequence of sensors' states. Due to the coupling, we cannot perform parallel inference as in the previous model. Instead, we apply the Viterbi algorithm after replacing \(p(x^{t}|y^{t})\) with \({\prod }_{m} p(o^{t}|a^{m,t})\), \(p(y^{t}|y^{t-1})\) with \({\prod }_{m} p(a^{m,t}|\mathbf {a}^{t-1})\), and \(p(y^{1})\) with \({\prod }_{m} p(a^{m,1})\).
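These substitutions amount to running ordinary Viterbi decoding over the joint state space. The sketch below (for two residents, with the joint state encoded as j = a1 * A + a2; all names and shapes are our assumptions) builds the joint tables from the per-resident ones so that the earlier viterbi() sketch can be reused unchanged:

```python
import numpy as np

def chmm_joint_tables(log_emit, log_trans, log_prior, A):
    """log_emit[m]: (A, O) log p(o | a^{m,t}); log_prior[m]: (A,);
    log_trans[m]: (A*A, A) log p(a^{m,t} | a^{t-1}) with a joint previous state.
    Returns tables over the joint state j = a1 * A + a2 for two residents."""
    S, O = A * A, log_emit[0].shape[1]
    j_prior, j_trans, j_emit = np.zeros(S), np.zeros((S, S)), np.zeros((S, O))
    for a1 in range(A):
        for a2 in range(A):
            j = a1 * A + a2
            j_emit[j] = log_emit[0][a1] + log_emit[1][a2]    # prod_m p(o | a^{m,t})
            j_prior[j] = log_prior[0][a1] + log_prior[1][a2]
            for b1 in range(A):
                for b2 in range(A):
                    # prod_m p(a^{m,t} | a^{t-1}): both residents condition on j
                    j_trans[j, b1 * A + b2] = log_trans[0][j, b1] + log_trans[1][j, b2]
    return j_prior, j_trans, j_emit

# usage: path = viterbi(obs, *chmm_joint_tables(log_emit, log_trans, log_prior, A))
# and each joint label j decodes back to the pair (j // A, j % A)
```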

3.2.3 gd-cHMM

A gd-cHMM has a similar structure to the coupled HMM; the only difference is that the hidden variables are coupled by group dependency instead of cross dependency.

$$ p = p(\mathbf{a}^{1})\prod\limits_{m} p(o^{1}|a^{m,1}) \prod\limits_{t=2}^{T}\left[p(\mathbf{a}^{t}|\mathbf{a}^{t-1})\prod\limits_{m} p(o^{t}|a^{m,t})\right] $$
(4)

Since the same environment dependency is used, this model has the same emission probabilities as pHMM and cHMM. The transition table, however, has a higher storage complexity than in the two previous cases. For prediction, we replace \(p(x^{t}|y^{t})\) with \({\prod }_{m} p(o^{t}|a^{m,t})\), \(p(y^{t}|y^{t-1})\) with \(p(\mathbf{a}^{t}|\mathbf{a}^{t-1})\), and \(p(y^{1})\) with \(p(\mathbf{a}^{1})\) before applying the Viterbi algorithm.

3.2.4 fHMM

The factorial HMM was proposed by Ghahramani and Jordan [15]. It can be seen as a generalization of the HMM in which the single hidden variable is factored into multiple hidden variables. To apply the model to multi-resident activity recognition, we assign each hidden variable to represent one resident's activities. One can view it as similar to the parallel HMM except that there is only a single observation variable. Recall that in pHMM and cHMM the sensors depend on each individual's activities separately, which is only valid when the data association is available. Without it, separating the observation for each resident may lead to a drop in performance, as we will show in the experiments. The factorial HMM has a single observation variable, which should alleviate this problem. The joint probability of the fHMM is:

$$ p = p(o^{1}|\mathbf{a}^{1})\prod\limits_{m} p(a^{m,1}) \prod\limits_{t=2}^{T}\left[p(o^{t}|\mathbf{a}^{t})\prod\limits_{m}p(a^{m,t}|a^{m,t-1})\right] $$
(5)

Similar to the other HMM-based models, this factorial HMM is learned by maximizing the log-likelihood. Once the parameters are learned, we can use the model to perform the prediction task through the Viterbi algorithm, as in Section 3.1. In this case we just need to replace \(p(x^{t}|y^{t})\) by \(p(o^{t}|\mathbf{a}^{t})\), \(p(y^{t}|y^{t-1})\) by \({\prod }_{m} p(a^{m,t}|a^{m,t-1})\), and \(p(y^{1})\) by \({\prod }_{m} p(a^{m,1})\).

3.2.5 cd-fHMM

In order to represent the relations between the activities of different residents, as discussed in Section 2, we add cross connections from all hidden variables at time \(t-1\) to each hidden variable at time \(t\). This results in an fHMM with cross dependency. In this model, the joint probability of the sensors' states and the activities of all residents is:

$$ p = p(o^{1}|\mathbf{a}^{1})\prod\limits_{m} p(a^{m,1}) \prod\limits_{t=2}^{T}\left[p(o^{t}|\mathbf{a}^{t})\prod\limits_{m}p(a^{m,t}|\mathbf{a}^{t-1})\right] $$
(6)

It can be seen that only the transition probabilities change in comparison to the fHMM above. For inference, we can again apply the Viterbi algorithm, substituting \({\prod }_{m} p(a^{m,t}|\mathbf {a}^{t-1})\) for \(p(y^{t}|y^{t-1})\) and \({\prod }_{m} p(a^{m,1})\) for \(p(y^{1})\).

3.2.6 gd-HMM

The last variant we study in this paper is the HMM with group dependency, which can be seen as a single HMM with one hidden variable representing the combined activities of all residents. The joint distribution of this HMM for multi-resident activity modelling is simply:

$$ p = p(o^{1}|\mathbf{a}^{1})p(\mathbf{a}^{1}) \prod\limits_{t=2}^{T}p(o^{t}|\mathbf{a}^{t})p(\mathbf{a}^{t}|\mathbf{a}^{t-1}) $$
(7)

Compared to the other variants, this model requires larger storage for its transition probability table, similar to gd-cHMM. However, it may be advantageous for inference since it does not need to combine M smaller transition probability tables as the other HMMs do (except pHMM).
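In implementation terms, gd-HMM only requires mapping each tuple of resident activities to a single joint label and then training and decoding an ordinary HMM. A small sketch of this encoding (a hypothetical helper of ours, not from the paper's code):

```python
def to_joint(acts_per_resident, n_acts):
    """acts_per_resident: M parallel activity sequences of equal length.
    Returns one sequence of joint labels, e.g. (a1, a2) -> a1 * n_acts + a2."""
    joint = []
    for step in zip(*acts_per_resident):   # one tuple of activities per time step
        j = 0
        for a in step:
            j = j * n_acts + a
        joint.append(j)
    return joint
```

With the joint label sequences, the fit_hmm() and viterbi() sketches above apply unchanged.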

4 Mixed-dependency models

The previous section studied six variants of HMMs, each representing one type of activity dependency and interaction in smart home environments. We argue that the complexity of multi-resident activities requires more than one type of dependency for better reasoning. In this section, we first propose an ensemble of HMMs that combines different types of activity dependency. We then generalize the idea to a novel model that mixes the dependencies and subsumes the ensemble.

4.1 Ensemble model

Let us consider an ensemble of fHMM, cd-fHMM and gd-HMM, in which parallel dependency, cross dependency and group dependency are combined. Note that for simplicity we only use the HMMs that share the same representation of the environment. The idea is to constrain the HMMs together such that the most likely sequence of activities must maximise the combined probabilities of all HMMs. For example, we could represent the ensemble as the sum of the probabilities: \(p_{\text{gd-hmm}} + p_{\text{cd-fhmm}} + p_{\text{fhmm}}\). With this, learning is efficient because maximum likelihood estimation can be applied to each model separately. However, it is not clear whether the dynamic programming algorithm of HMMs, i.e. the Viterbi algorithm, can be applied to a sum of probabilities. Therefore, to ease inference we propose an ensemble model formulated in closed form as the combination of the probabilities in log-space:

$$ \begin{aligned} &\phi_{\text{md-HMM}} = \log p_{\text{gd-hmm}} + \log p_{\text{cd-fhmm}} + \log p_{\text{fhmm}} \end{aligned} $$
(8)

We call this ensemble the mixed-dependency HMM (md-HMM). After training the md-HMM by maximising the log-likelihood of each HMM in the ensemble, we combine them for prediction as:

$$ {\mathbf{a}^{*}}^{1:T} = \arg\max\limits_{\mathbf{a}^{1:T}} (\phi_{\text{md-HMM}}(o^{1:T},\mathbf{a}^{1:T})) $$
(9)

This can be done through dynamic programming, similar to standard HMMs. Denoting \(\mu _{t}(j) = \max \limits _{\mathbf {a}^{1:t-1}}\phi_{\text{md-HMM}}(\mathbf {a}^{t}=j,\mathbf {a}^{1:t-1},o^{1:t})\), we have:

$$ \begin{array}{@{}rcl@{}} \mu_{t}(j) &=& \log(p(o^{t}|\mathbf{a}^{t}=j)) + \max\limits_{j^{\prime}}[\log(p_{\text{gd-hmm}}(\mathbf{a}^{t}=j|\mathbf{a}^{t-1}=j^{\prime}))\\ &&+ \log(p_{\text{cd-fhmm}}(\mathbf{a}^{t}=j|\mathbf{a}^{t-1}=j^{\prime})) + \log(p_{\text{fhmm}}(\mathbf{a}^{t}=j|\mathbf{a}^{t-1}=j^{\prime})) \\&&+ \mu_{t-1}(j^{\prime})] \end{array} $$
(10)

In order to find the most probable activities, we first find \({\mathbf {a}^{*}}^{T} = j^{*} = \arg \max \limits _{j} \mu _{T}(j)\) and then trace back to get \({\mathbf {a}^{*}}^{T-1}=\arg \max \limits _{j^{\prime }}[\log (p_{\text {gd-hmm}}(\mathbf {a}^{T}=j^{*}|\mathbf {a}^{T-1}=j^{\prime })) + \log (p_{\text {cd-fhmm}}(\mathbf {a}^{T}=j^{*}|\mathbf {a}^{T-1}=j^{\prime })) + \log (p_{\text {fhmm}}(\mathbf {a}^{T}=j^{*}|\mathbf {a}^{T-1}=j^{\prime }))+ \mu _{T-1}(j^{\prime })]\) from (10). Repeating this process yields the whole sequence of activities \(\mathbf{a}^{T}, \mathbf{a}^{T-1}, \dots, \mathbf{a}^{1}\), which can be computed efficiently by dynamic programming.
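Because the three component HMMs share the emission table, the recursion in (10) is ordinary Viterbi decoding with the log-transition matrices summed over the joint state space. A minimal sketch, assuming the cd-fHMM and fHMM tables have already been expanded to joint-state (S, S) form (e.g. as in the cHMM sketch above):

```python
def md_hmm_decode(obs, log_prior, log_emit, log_trans_gd, log_trans_cdf, log_trans_f):
    """All transition matrices are (S, S) over joint states; the emission is shared."""
    combined = log_trans_gd + log_trans_cdf + log_trans_f   # the bracketed sum in (10)
    return viterbi(obs, log_prior, combined, log_emit)      # viterbi() from the Section 3.1 sketch
```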

4.2 Mixture of dependencies

We observe that the emission probability table does not play as important a role as the transition probabilities in activity modelling. We also find that the influence of each type of dependency varies between environments, depending on the complexity of the occupants' activities. Therefore we generalize the log-probability of the ensemble so that each type of dependency is assigned a different weight. This can be seen as a mixture of weighted log-probabilities, which we call the mixed-dependency model (MDM). The MDM is a single model which captures all the dependencies, rather than an ensemble of different HMMs. The combined log-probability of this model is:

$$ \begin{array}{@{}rcl@{}} \phi_{\text{MDM}}&=& \alpha \log p(\mathbf{a}^{1}) + (\upbeta+\gamma)\sum\limits_{m} \log p(a^{m,1}) + \sum\limits_{t} \log p(o^{t}|\mathbf{a}^{t}) \\ &&+\sum\limits_{t} \left[ \alpha \log p(\mathbf{a}^{t}|\mathbf{a}^{t-1}) + \sum\limits_{m}\left(\upbeta\log p(a^{m,t}|\mathbf{a}^{t-1}) + \gamma\log p(a^{m,t}|a^{m,t-1})\right) \right]\\ \end{array} $$
(11)

where α, β, γ are non-negative weights. We can show that this MDM subsumes fHMM, cd-fHMM, gd-HMM and also the ensemble md-HMM. Indeed, the combined log-probability of the MDM is equivalent to the log-probability of fHMM, cd-fHMM or gd-HMM under the assignments (α = 0, β = 0, γ = 1), (α = 0, β = 1, γ = 0) and (α = 1, β = 0, γ = 0) respectively. Similarly, if we set α = β = γ = 1/3 we have \(\phi_{\text{MDM}} \propto \phi_{\text{md-HMM}}\). Interestingly, if we maximise the combined log-likelihood of ϕMDM under the constraint α + β + γ = 1, we end up finding the best model among fHMM, cd-fHMM and gd-HMM. This is similar to applying linear programming to maximise the log-likelihood of the MDM with α, β, γ as variables. In practice, these values are selected empirically, as shown in the experiments.

Similar to (10) for the md-HMM, inference in the MDM is efficient, with \(\mu_{t}(j)\) given by:

$$ \begin{array}{@{}rcl@{}} \mu_{t}(j) &=& \log(p(o^{t}|\mathbf{a}^{t}=j)) + \max\limits_{j^{\prime}}\left[ \alpha \log p(\mathbf{a}^{t}=j|\mathbf{a}^{t-1}=j^{\prime})\right.\\ &&\left.+ \sum\limits_{m}\left(\upbeta\log p(a^{m,t}|\mathbf{a}^{t-1}=j^{\prime})+ \gamma\log p(a^{m,t}|a^{m,t-1})\right)+ \mu_{t-1}(j^{\prime})\right] \end{array} $$
(12)
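Inference in the MDM therefore differs from the md-HMM sketch only in that the joint-state log-transition matrices are weighted. A hypothetical sketch (the weighting of the priors in (11) is folded into log_prior for brevity):

```python
def mdm_decode(obs, log_prior, log_emit,
               log_trans_gd, log_trans_cdf, log_trans_f, alpha, beta, gamma):
    """Weighted combination of the three (S, S) joint-state transition matrices."""
    combined = alpha * log_trans_gd + beta * log_trans_cdf + gamma * log_trans_f
    return viterbi(obs, log_prior, combined, log_emit)
```

Setting alpha = beta = gamma = 1/3 recovers, up to scale, the md-HMM decoder above.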

5 Related work

In multi-resident smart homes, HMMs [24] have been studied intensively, as shown in previous studies [1, 5, 8, 25]. The first model that could be employed is a single HMM. However, due to the complexity of multiple activities, it may need some modification. For example, the activities can be combined into joint labels so that they can be represented by a single hidden variable [5]. Another method to model the activities of multiple residents is to create multiple HMMs, one for each resident [6]. Such a model, known as the parallel HMM, has been evaluated in the case where the data association is provided. Another approach couples the HMMs by assuming that the activity of a resident depends not only on his or her previous activity but also on the previous activities of the other residents. Coupled HMMs and factorial HMMs were proposed in the computer vision domain [4], but only the cHMM had been employed for sensor data [6]. Combining HMMs of the same type was studied in [13]. In contrast, in this paper we ensemble HMMs of various types for activity recognition. Besides HMMs, Conditional Random Fields (CRFs) [10, 18] and incremental decision trees (IDT) [23] have also been used for multi-resident activity recognition.

6 Experiments

6.1 Datasets

The CASAS data was collected in the Washington State University Smart Apartment Testbed with multiple residents, where each resident performs 15 unique activities [9]. The data was collected over 26 days in a smart home equipped with 37 ambient sensors.

The ARAS data [1] was collected in two different houses, denoted House A and House B, over 30 days. In these environments, there are 20 sensors for the two residents of each house.

6.2 Feature representations

There are several ways to represent the sensors' state. The simplest is to treat each joint state of all sensors as a "word" in a vocabulary of size \({\prod }_{i} |s_{i}|\), where \(|s_{i}|\) is the number of states of sensor \(i\). Another method is to store the values of all sensors in a vector, as in [25]. However, one may argue that only a subset of sensors is triggered by human activities at any time; we can therefore still use a vector to represent the observation, but set element \(i\) to 1 if and only if the \(i\)th sensor changes its state, similar to [5]. Finally, we can represent the sensors' state as a one-hot vector where all elements are set to 0 except the element whose corresponding sensor has its value recorded along with the activities (no matter whether the value is "ON" or "OFF"); this element is set to 1. The last representation can only be obtained from the CASAS dataset. In Table 2 the notations "dis.", "vec1", "vec2" and "vec3" indicate these four representations of the sensors' states respectively.
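The four representations can be sketched as follows; the event format and helper names are our own assumptions for illustration:

```python
import numpy as np

def discrete_word(state, vocab):
    """"dis.": the joint sensor state as a single symbol from a growing vocabulary."""
    return vocab.setdefault(tuple(state), len(vocab))

def raw_vector(state):
    """"vec1": the raw values of all sensors."""
    return np.asarray(state, dtype=float)

def change_vector(state, prev_state):
    """"vec2": element i is 1 iff sensor i changed its state."""
    return (np.asarray(state) != np.asarray(prev_state)).astype(float)

def one_hot(recorded_sensor, n_sensors):
    """"vec3": only the sensor recorded along with the activity is set to 1."""
    v = np.zeros(n_sensors)
    v[recorded_sensor] = 1.0
    return v
```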

Table 2 Evaluation results of six variants of HMMs using leave-one-out validation

In the experiments we use leave-one-out cross-validation for all datasets. In particular, the data of one day (one file) is used for evaluation and the data of the remaining days is used for training the models. We repeat the evaluation for every day and report the average accuracy.
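As a sketch, the protocol is a simple loop over days; fit(), predict() and accuracy() are placeholders for the model-specific routines, not the paper's actual pipeline:

```python
def leave_one_day_out(days, fit, predict, accuracy):
    """days: list of per-day datasets. Returns the average held-out accuracy."""
    scores = []
    for held_out in range(len(days)):
        train = [d for i, d in enumerate(days) if i != held_out]
        model = fit(train)                          # train on all other days
        scores.append(accuracy(predict(model, days[held_out]), days[held_out]))
    return sum(scores) / len(scores)
```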

6.3 Experimental results

6.3.1 Single dependency

On the CASAS dataset, gd-HMM achieves much higher performance than the other variants, with 69.127% accuracy. In comparison with other works using the same evaluation method, the iterative CRF in [18] achieves 64.16% and the pHMM in [6] achieves 61.78% accuracy. In [25] and [5], the authors report accuracies of 60.60% and 75.77% respectively, although, unlike us, they use threefold cross-validation. Also note that all these methods rely on prior knowledge of the data association while our HMMs do not. On ARAS House A and ARAS House B, cd-fHMM and gd-HMM achieve similar results, which suggests that the current activities depend on the combination of previous activities. The results on ARAS House A are low due to the complexity of its collected data, i.e. the number of available sensor states is \(\sim 9\) times larger than in CASAS and \(\sim 3\) times larger than in ARAS House B. For completeness, we also report the results in [1]: 61.5% and 76.2% for ARAS House A and ARAS House B respectively. Unlike us, that work groups the activities of each resident into 6 categories, while we use all 27 activities. Overall, without data association the performance of the parallel HMM drops dramatically. When coupling the activities using cross dependencies, as in cHMM and cd-fHMM, we observe an improvement in performance. This means that the activities of a resident indeed depend on those of the others.

Regarding feature representation, despite its simplicity, the discrete representation of the sensors' states works very effectively in all three environments. In principle the number of states grows exponentially with the number of sensors. Fortunately, not all states actually occur; for example, in CASAS, ARAS House A, and ARAS House B we observe only 73, 655, and 200 sensor states respectively. However, for a larger dataset with more residents, it would be preferable to use a vector representation of the sensors' states.

6.3.2 Mixed dependency

From the results of the six HMMs we find that modelling the interaction between each resident and the environment separately is not effective. Therefore we construct an ensemble from fHMM, cd-fHMM and gd-HMM, the variants that consider the environment's state as the result of all residents' activities, as described in Section 4. Here α, β and γ are selected empirically. The last two rows of Table 2 show the results of the ensemble model (md-HMM) and the mixed-dependency model (MDM) in comparison to the best accuracy of the six HMMs from the same table. The results indicate that the ensemble model is not very useful on the CASAS data. This is because many activities are misclassified by the parallel part, which performs poorly in this case. On the ARAS data, md-HMM performs very well and achieves better results than all single variants. Combined with the performance of the MDM, this confirms our hypothesis that the complexity of activities must be captured by multiple dependencies. Our MDM achieves impressive results on all three datasets, notably on the ARAS data where it outperforms the other models by large margins. In particular, compared to the best of the six HMM variants, MDM improves accuracy by 0.738% on the CASAS dataset, 28.858% on ARAS House A, and 8.392% on ARAS House B.

6.3.3 Model selection

A concern over the selection of α, β and γ may be raised when applying the MDM in practice. In order to show the effectiveness of our model we now evaluate the MDM on a held-out test set. We apply a grid-like search to select α, β and γ using a separate validation set. On the CASAS data we use 24 days for training, 1 day for validation and 1 day for testing. On ARAS House A and ARAS House B we partition the data into 10 days for training, 10 days for validation, and 10 days for testing. The results are shown in Fig. 1. As we can see, MDM performs better than the other models. On CASAS, MDM is slightly better than the best result of the other models (by 0.69%). On ARAS House B, MDM is 2.14% better than the best HMM and 1.59% better than the ensemble. On ARAS House A, MDM achieves at least 22.80% higher accuracy than the other models. However, we observe that MDM may overfit on the CASAS data if the training set is small. This makes sense because the activities in CASAS are much less complex than the activities in ARAS.
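A minimal sketch of the grid-like search, assuming for illustration that the weights are searched on the simplex α + β + γ = 1 and that evaluate() returns the validation accuracy of a weight triple (both are our assumptions, not the paper's exact procedure):

```python
import itertools

def select_weights(evaluate, step=0.1):
    """Grid search over (alpha, beta, gamma) with alpha + beta + gamma = 1."""
    grid = [round(step * i, 2) for i in range(int(round(1 / step)) + 1)]
    best_acc, best_w = -1.0, None
    for alpha, beta in itertools.product(grid, grid):
        gamma = round(1.0 - alpha - beta, 2)
        if gamma < 0:
            continue                        # stay on the simplex
        acc = evaluate(alpha, beta, gamma)  # e.g. validation accuracy of mdm_decode(...)
        if acc > best_acc:
            best_acc, best_w = acc, (alpha, beta, gamma)
    return best_w, best_acc
```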

Fig. 1 Performance of all models on the CASAS, ARAS House A and ARAS House B environments using model selection

For completeness we also compare the MDM with incremental decision trees [23]. Following that work, we train on the first 1, 7, 14 and 21 days and test the models on days 22 to 28. We use days 29 and 30 for model selection. The results in Fig. 2 show that MDM performs better than IDT. With these different numbers of training days, MDM achieves 0.86%, 2.52%, 4.18%, and 3.81% higher accuracy than IDT in House A, and 0.43%, 1.41%, 2.32% and 4.13% higher in House B.

Fig. 2 MDM vs. IDT in ARAS House A and B

6.3.4 Comparison with deep learning methods

Recently, deep learning has been employed for multi-resident activity recognition [22, 27]. In this experiment, we compare our MDM with two notable deep learning models for sequence classification, namely Long Short-Term Memory (LSTM) [17, 19] and the Gated Recurrent Unit (GRU) [7]. In order to apply LSTM and GRU to the task in this paper we use multiple outputs to represent the activities of the residents. As mentioned earlier, this work focuses on investigating the influence of multiple residents' activities on their future actions, not on performance improvement. We use HMMs because they are simple and the dependencies between activities are easy to explain. As can be seen in Table 3, this simplicity is also a disadvantage in comparison to complex deep learning models. This suggests a promising direction for future work: incorporating mixed dependencies into deep recurrent neural networks.

Table 3 MDM versus LSTM and GRU

7 Conclusions

This paper studies smart home environments with ambient settings, aiming to gain insight into the behaviours of multiple residents. First, we break down the dynamics of such environments into activity dependencies and human-environment interactions. From these we construct six variants of HMMs for multi-resident activity modelling. We show that good results can be achieved by simple HMMs that capture the combined activities of all residents. Second, the key contribution of the paper is the proposal of a mixed-dependency model to deal with the complexity of multiple residents' activities. The experimental results show that our model outperforms other dynamic Bayesian counterparts such as HMMs and CRFs, which have been employed for activity recognition in ambient environments. Although the proposed MDM is less effective than deep learning approaches, it suggests the promising idea of incorporating mixed dependencies into recurrent neural networks, which we leave as future work.