1 Introduction

As the number of elderly people in our society increases, more households will include someone who needs help performing basic activities of daily living (ADLs) such as cooking, dressing, toileting and bathing [1, 2]. For their comfort, and because the healthcare infrastructure will not be able to handle this growth, it has been suggested to assist sick or elderly people at home. Sensor-based technologies in the home are key to this problem. The collected sensor data often need to be analysed using data mining and machine learning techniques [3] to determine which activities took place. State-of-the-art methods for recognizing activities can be divided into two main categories: generative models and discriminative models [3].

However, activity recognition datasets are generally imbalanced, meaning that certain activities occur much more frequently than others. As a consequence, the learning system may have difficulty learning the concept of the minority class. Many popular machine learning algorithms have been evaluated on how well they cope with imbalanced data [4], e.g. the Weighted Support Vector Machine (WSVM) [5], k-Nearest Neighbors (k-NN) [5], random forests [6] and CS-SVM [7].

The main contribution of our work is twofold. Firstly, we demonstrate the efficiency of the standard discriminative method, Support Vector Machines (SVM) [3], combined with the Synthetic Minority Over-sampling Technique (SMOTE) [8], in avoiding the overfitting caused by imbalanced activity samples in smart homes. Secondly, this method is compared with the standard SVM, Linear Discriminant Analysis (LDA) [9] and the Hidden Markov Model (HMM) [2].

2 Discriminative Models for Activity Recognition

2.1 Linear Discriminant Analysis (LDA)

Given a set of observations in n-dimensional space, \(D_{i} = \left\{ x_{1}^{i}, \ldots, x_{m_{i}}^{i} \right\}\) with \(x_{j}^{i} \in R^{n}\), from class \(C_{i}\) \((i = 1, \ldots, N\), where \(N\) is the number of classes), we assume that each class probability density function can be modeled as a normal distribution. Define the prior probabilities \(p(C_{i})\), means \(\bar{m}_{i}\) and covariance matrices \(\Sigma_{i}\) of each class:

$$\Sigma_{i} = \frac{1}{m_{i}}\sum\limits_{j = 1}^{m_{i}} {(x_{j}^{i} - \bar{m}_{i})} (x_{j}^{i} - \bar{m}_{i})^{{\mathbf{T}}}$$
(1)

where \(m_{i}\) is the number of patterns in class \(C_{i}\). In LDA, all classes are assumed to share the same covariance matrix, \(\Sigma_{1} = \cdots = \Sigma_{N}\), in (1). A new feature vector \(x\) to be classified is assigned to \(C_{i}\) by the linear discriminant function \(d_{i}\), obtained by simplifying the quadratic discriminant rule [9]

$$d_{i} (x) = \log (p(C_{i} )) - \frac{1}{2}\bar{m}_{i}^{T} S_{W}^{ - 1} \bar{m}_{i} + x^{T} S_{W}^{ - 1} \bar{m}_{i}$$
(2)

in which \(S_{W}\) is the pooled (common) covariance matrix, with \(m\) the total number of training patterns:

$$S_{W} = \sum\limits_{i = 1}^{N} {\frac{{m_{i} }}{m - N}}\Sigma _{i}$$
(3)

The classification rule is given in Eq. 4.

$$f(x) = i^{*} : \Leftrightarrow i^{*} = \arg \mathop {\hbox{max} }\limits_{i} d_{i} (x)$$
(4)
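
For concreteness, here is a minimal NumPy sketch of this classifier (Eqs. 1–4); the function names and interface are our own illustrative choices, not part of the original method description:

```python
import numpy as np

def fit_lda(X, y):
    """Estimate priors p(C_i), class means, and the pooled covariance S_W (Eqs. 1, 3)."""
    classes = np.unique(y)
    m, n = X.shape
    N = len(classes)
    priors = {c: np.mean(y == c) for c in classes}
    means = {c: X[y == c].mean(axis=0) for c in classes}
    S_W = np.zeros((n, n))
    for c in classes:
        diff = X[y == c] - means[c]
        S_W += diff.T @ diff / (m - N)  # equals (m_i / (m - N)) * Sigma_i, Eq. 3
    return classes, priors, means, np.linalg.inv(S_W)

def predict_lda(x, classes, priors, means, S_inv):
    """Assign x to the class maximizing the linear discriminant d_i (Eqs. 2, 4)."""
    d = [np.log(priors[c]) - 0.5 * means[c] @ S_inv @ means[c] + x @ S_inv @ means[c]
         for c in classes]
    return classes[int(np.argmax(d))]
```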

2.2 Proposed Approach for Activity Recognition (Smote-SVM)

The Smote-SVM approach is shown in Fig. 1. In the training phase, we perform the necessary pre-processing on the activity data represented in a feature space; before classification, we only need to correct the class imbalance using the Smote strategy. The balanced data are then used to learn the SVM classifier, which is used in the testing phase to predict the ADL class associated with each new observation.

Fig. 1. Diagram of the Smote-SVM approach

(a) The Synthetic Minority Over-sampling Technique (SMOTE)

The SMOTE algorithm generates artificial data based on the feature-space similarities between existing minority examples in the training set. Synthetic examples are introduced along the line segment between each minority-class example and one of its k minority-class nearest neighbors. The k-nearest neighbors (k-NN) of an example \(x_{i} \in S_{{\mathbf{min}}}\) are defined as the k elements of the minority subset \(S_{{\mathbf{min}}} \subset S\) whose Euclidean distance to \(x_{i}\) is smallest in the n-dimensional feature space X. To create a synthetic sample, one of the k nearest neighbors is chosen at random, the corresponding feature-vector difference is multiplied by a random number \(\delta \in [0,1]\), and the result is added to \(x_{i}\):

$$x_{new} = x_{i} + (\hat{x}_{i} - x_{i} ) \times \delta ,$$
(5)

where \(x_{i} \in S_{{\mathbf{min}}}\) is the minority instance under consideration and \(\hat{x}_{i} \in S_{{\mathbf{min}}}\) is one of the k-NN of \(x_{i}\).
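
A minimal NumPy sketch of this sampling step, under the same notation (the function name and the seeding are illustrative assumptions):

```python
import numpy as np

def smote(S_min, n_synthetic, k=4, seed=0):
    """Create n_synthetic minority samples along segments to k-NN (Eq. 5)."""
    rng = np.random.default_rng(seed)
    S_min = np.asarray(S_min)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(S_min))
        x_i = S_min[i]
        dist = np.linalg.norm(S_min - x_i, axis=1)  # Euclidean distances
        knn = np.argsort(dist)[1:k + 1]             # k nearest, excluding x_i itself
        x_hat = S_min[rng.choice(knn)]              # one neighbor chosen at random
        delta = rng.random()                        # delta in [0, 1]
        synthetic.append(x_i + (x_hat - x_i) * delta)
    return np.array(synthetic)
```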

(b) Support Vector Machines (SVM)

We assume a training set \(\left\{ \left( x_{i}, y_{i} \right) \right\}_{i = 1}^{m}\), where \(x_{i} \in R^{n}\) are the observations and the class labels \(y_{i}\) are either 1 or −1. The SVM training problem can be solved in its dual form as the following Lagrangian optimization problem [3]:

$$\begin{aligned} \mathop {\hbox{max} }\limits_{{\alpha_{i} }} \quad & \sum\limits_{i = 1}^{m} {\alpha_{i} } - \frac{1}{2}\sum\limits_{i = 1}^{m} {\sum\limits_{j = 1}^{m} {\alpha_{i} \alpha_{j} y_{i} y_{j} } } \,K(x_{i} ,x_{j} ) \\ {\text{subject to}}\quad & \sum\limits_{i = 1}^{m} {\alpha_{i} } y_{i} = 0\quad {\text{and}}\quad 0 \le \alpha_{i} \le C \\ \end{aligned}$$
(6)

where \(K(x_{i}, x_{j})\) is the kernel; the radial basis function (RBF) kernel is used in this study: \(K(x_{i}, x_{j}) = \exp \left( \tfrac{-1}{2\sigma^{2}} \left\| x_{i} - x_{j} \right\|^{2} \right)\). The \(\alpha_{i} > 0\) are Lagrange multipliers. The regularization parameter C controls the trade-off between maximizing the margin width and minimizing the number of training errors.

Solving (6) for \(\alpha\) gives a decision function in the original space for classifying a test point \(x\, \in \,R^{n}\) [3]

$$f(x) = \text{sgn} \left( {\sum\limits_{i = 1}^{{m_{sv} }} {\alpha_{i} y_{i} K(x,x_{i} ) + b} } \right)$$
(7)

where \(m_{sv}\) is the number of support vectors \(x_{i} \in R^{n}\).

In this study, the software package LIBSVM [10] was used to implement the multiclass classifier; it uses the one-versus-one method [3].
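
Putting the two components together, here is a sketch of the training phase of Fig. 1, assuming scikit-learn (whose SVC is LIBSVM-based and handles multiclass one-versus-one) and the SMOTE implementation from the imbalanced-learn package; the wrapper function and parameter defaults are illustrative, not the paper's exact configuration:

```python
from imblearn.over_sampling import SMOTE  # assumption: imbalanced-learn is installed
from sklearn.svm import SVC               # LIBSVM-based, one-versus-one multiclass

def train_smote_svm(X_train, y_train, sigma=1.0, C=1.0, k=4):
    # Balance the classes before learning, as in the training phase of Fig. 1
    # (requires each minority class to have more than k samples)
    X_bal, y_bal = SMOTE(k_neighbors=k, random_state=0).fit_resample(X_train, y_train)
    # RBF kernel exp(-||x_i - x_j||^2 / (2 sigma^2)), i.e. gamma = 1 / (2 sigma^2)
    clf = SVC(kernel="rbf", C=C, gamma=1.0 / (2.0 * sigma ** 2))
    return clf.fit(X_bal, y_bal)
```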

3 Experimental Results

We use openly available datasets [11] gathered from three houses, KasterenA, KasterenB and KasterenC, with different layouts and different numbers of sensors, thus providing a diverse testbed. The activities were performed by a single male occupant and recorded with a wireless sensor network. Data were collected using binary sensors, such as reed switches and float sensors, and the sensor data were annotated using either a Bluetooth headset or a handwritten diary. We separate the data into training and test sets using a leave-one-day-out cross-validation approach [2].
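
A sketch of this protocol, assuming each sample carries a day identifier (`day_labels` is a hypothetical per-sample array; `fit` and `score` are caller-supplied callables):

```python
import numpy as np

def leave_one_day_out(X, y, day_labels, fit, score):
    """Each recorded day serves once as the test set; train on all remaining days."""
    results = []
    for d in np.unique(day_labels):
        test = day_labels == d
        model = fit(X[~test], y[~test])  # e.g. train_smote_svm above
        results.append(score(model, X[test], y[test]))
    return float(np.mean(results))
```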

As the activity instances are imbalanced between classes, we evaluate the performance of our models by two measures: the accuracy and the class accuracy. The accuracy is the percentage of correctly classified instances; the class accuracy is the average percentage of correctly classified instances per class. They are defined as follows:

$$Accuracy = \frac{\sum\limits_{i = 1}^{m} [inferred(i) = true(i)]}{m}$$
(8)
$$Class\;accuracy = \frac{1}{N}\sum\limits_{c = 1}^{N} \frac{\sum\nolimits_{i = 1}^{m_{c}} \left[ inferred_{c}(i) = true_{c}(i) \right]}{m_{c}}$$
(9)

in which [a = b] is a binary indicator equal to 1 when true and 0 when false, m is the total number of samples, N is the number of classes, and \(m_{c}\) is the total number of samples for class c. A problem with the accuracy measure is that it does not take differences in the frequency of activities into account; therefore, the class accuracy should be the primary way to evaluate an activity classifier's performance.
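
Both measures are straightforward to compute; a NumPy sketch following Eqs. 8 and 9:

```python
import numpy as np

def accuracy(inferred, true):
    """Eq. 8: fraction of all samples classified correctly."""
    return np.mean(np.asarray(inferred) == np.asarray(true))

def class_accuracy(inferred, true):
    """Eq. 9: per-class accuracy averaged over the N classes."""
    inferred, true = np.asarray(inferred), np.asarray(true)
    return np.mean([np.mean(inferred[true == c] == c) for c in np.unique(true)])
```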

In our experiments, for the Smote-SVM method, the minority-class examples were over-sampled using k = 4 nearest neighbors for Smote. We used the leave-one-day-out cross-validation technique to select the width parameter of the SVM classifier, finding \(\sigma_{opt} = 1\), \(\sigma_{opt} = 1\) and \(\sigma_{opt} = 2\) for the three datasets, respectively. Table 1 summarizes the accuracy and class accuracy obtained with the HMM, LDA, SVM and Smote-SVM methods on the various real-world datasets; it shows that Smote-SVM performs best in terms of class accuracy.

Table 1 Class accuracy and Accuracy for HMM, LDA, SVM and Smote-SVM

Our results give early experimental evidence that Smote-SVM works better for this classification task; it consistently outperforms the other methods in terms of class accuracy on all datasets. In the rest of this section, we explain the performance difference between HMM and our method. HMM is generative: the training data are split by class and a separate model \(P(x|y)\) is learned for each class, so its parameters are estimated per class. This is why HMM performs comparatively well on the minority activities. Our method shows that, once the data are balanced with Smote, SVM also becomes more robust for classifying the minority class.

4 Conclusion and Perspectives

Our experiments on real-world datasets show that the Smote-SVM approach can significantly increase recognition performance when classifying multiclass sensory data, and is less prone to overfitting caused by imbalanced datasets; it significantly outperforms HMM, LDA and SVM. Developing classifiers that are robust and skew-insensitive, or hybrid algorithms, is a point of interest for future research in activity recognition. It would also be interesting to compare Smote-SVM with Smote-CS-SVM [7] to decide which gives the best results.