Abstract
Accurately recognizing rare activities in sensor-network-based smart homes that monitor elderly people is a challenging task. Activity recognition datasets are generally imbalanced, meaning certain activities occur more frequently than others. Failing to account for this class imbalance can produce a misleading evaluation, which may have disastrous consequences for elderly persons. To deal with this problem, we evaluate a new model, OS-WSVM, that combines Over-Sampling (OS) with the Weighted SVM (WSVM). Our experiments on real-world datasets show that OS-WSVM surpasses SVM, OS-SVM and WSVM in Human Activity Recognition tasks.
1 Introduction
In recent years, the classification problem with imbalanced data has received considerable attention in areas such as Machine Learning and Pattern Recognition. A two-class dataset is said to be imbalanced when one of the classes (the minority class) is heavily under-represented in the training data in comparison to the other class (the majority class). In such situations, misclassifying activities from the minority class is costly, yet the learning system may have difficulty learning the concepts related to those activities, which leads to suboptimal classifier performance.
This paper deals with the problem of imbalanced data in systems that assist sick or elderly people in performing daily life activities [1] such as cooking, brushing, dressing, and so on. Activity recognition datasets are generally imbalanced, meaning certain activities occur more frequently than others. These differences may correspond to how often an activity is performed, e.g. leaving is generally done once a day, while toileting is done several times a day, or to the number of time slices an activity takes up, e.g. a leaving activity generally takes up considerably more time slices than a toileting activity.
In recent years, there have been several attempts to deal with the class imbalance problem [2, 3]. Research on this topic has mainly focused on solutions at the data level and at the algorithmic level. At the data level [4], solutions include various forms of re-sampling, such as Over-Sampling (OS) and Under-Sampling (US). At the algorithmic level, solutions include adjusting the costs associated with misclassification so as to improve performance [5]. In [6], we proposed a new version of Weighted Support Vector Machines (WSVM) that sets a different cost parameter for each activity in order to handle imbalanced human activity datasets. In this paper, we propose a new classification model named OS-WSVM that combines the oversampling method with the WSVM method to deal with the class imbalance problem. The experiments were carried out on multiple annotated real-world datasets of sensor readings from different houses [7, 8].
2 Proposed Approach
2.1 System Overview
The main idea of this paper rests on the fact that the decision boundary is determined entirely by the support vectors. Over-Sampling (OS) is therefore applied only to the support vectors obtained by WSVM learning. This process can enhance the performance of the Weighted SVM on imbalanced datasets. Moreover, the approach can reduce processing time, because the number of vectors to resample is bounded and small. The resulting algorithm proceeds as follows (Fig. 1):
- Step 1: Use the Weighted SVM to deal with the imbalanced training datasets, and record the support vectors.
- Step 2: Resample the support vectors with the Over-Sampling technique to improve the balance between the majority class and the minority class.
- Step 3: Use the SVM to deal with the balanced datasets, and obtain the final classifier.
During the testing phase, the trained SVM is then applied to each new observation to predict the associated activity-of-daily-living class.
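The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn's `SVC` (which is built on LIBSVM), an RBF kernel, class weights proportional to the imbalance ratio, and plain random oversampling of the support vectors. The function name `os_wsvm_fit` and its parameters are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC


def os_wsvm_fit(X, y, C=1.0, gamma="scale", seed=0):
    """Sketch of OS-WSVM: weighted SVM -> oversample support vectors -> SVM."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    # Assumed weighting: the rarer the class, the higher its misclassification cost.
    weights = {c: counts.max() / n for c, n in zip(classes, counts)}

    # Step 1: weighted SVM on the imbalanced data; keep only its support vectors.
    wsvm = SVC(kernel="rbf", C=C, gamma=gamma, class_weight=weights).fit(X, y)
    X_sv, y_sv = X[wsvm.support_], y[wsvm.support_]

    # Step 2: random oversampling so every class has as many support vectors
    # as the best-represented class among them.
    sv_classes, sv_counts = np.unique(y_sv, return_counts=True)
    target = sv_counts.max()
    parts = []
    for c, n in zip(sv_classes, sv_counts):
        idx = np.flatnonzero(y_sv == c)
        extra = rng.choice(idx, size=target - n, replace=True)
        parts.append(np.concatenate([idx, extra]))
    keep = np.concatenate(parts)
    X_bal, y_bal = X_sv[keep], y_sv[keep]

    # Step 3: plain (unweighted) SVM on the balanced support-vector set.
    final = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_bal, y_bal)
    return final, X_bal, y_bal
```

Calling `os_wsvm_fit(X, y)` on a NumPy feature matrix and label vector returns the final classifier together with the balanced support-vector set, so the class counts after Step 2 can be inspected directly.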
2.2 Over-Sampling (OS)
This approach increases the number of minority class samples. The simplest variant is random oversampling, in which examples from the minority class are chosen at random, duplicated, and added to the training data. Because the original set is kept intact, no information is lost.
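As an illustration, random oversampling can be written in a few lines of standard-library Python. The helper name `random_oversample` and the choice to bring every class up to the size of the largest one are assumptions made for this sketch.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    # Start from the untouched original data, so nothing is discarded.
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))  # duplicate an existing sample
            out_labels.append(cls)
    return out_samples, out_labels
```

The original samples always come first in the output, followed by the duplicates, which makes it easy to separate the two if needed.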
2.3 Support Vector Machines (SVM) [9]
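For a two-class problem, we assume a training set \( \left\{ {\left( {{\text{x}}_{\rm{i}} ,{\text{y}}_{\rm{i}} } \right)} \right\}_{{{\rm{i}} = 1}}^{\rm{m}} \), where \( {\text{x}}_{\rm{i}} \in {\text{R}}^{\rm{n}} \) and the class labels \( {\text{y}}_{\rm{i}} \) are either 1 or –1. The primal formulation of SVM maximizes the margin \( 2/\lVert {\text{w}} \rVert \) and minimizes the training errors \( \xi_{\rm{i}} \) simultaneously by solving
\[ \min_{{\text{w}},\,{\text{b}},\,\xi} \; \frac{1}{2}\lVert {\text{w}} \rVert^{2} + {\text{C}}\sum_{{\rm{i}} = 1}^{\rm{m}} \xi_{\rm{i}} \quad \text{subject to} \quad {\text{y}}_{\rm{i}} \left( {\text{w}}^{\text{T}} \varphi ({\text{x}}_{\rm{i}} ) + {\text{b}} \right) \ge 1 - \xi_{\rm{i}} , \quad \xi_{\rm{i}} \ge 0, \quad {\rm{i}} = 1, \ldots ,{\rm{m}} \qquad (1) \]
where w is normal to the hyperplane, b is the translation factor of the hyperplane, C is the misclassification cost, and \( \varphi (.) \) is a non-linear function which maps the input space into a feature space defined by \( {\text{K}}({\text{x}}_{\text{i}} ,{\text{x}}_{\rm{j}} ) = \varphi ({\text{x}}_{\rm{i}} )^{\text{T}} \varphi ({\text{x}}_{\rm{j}} ) \). Solving the dual formulation of Eq. (1) for the Lagrange multipliers \( \alpha_{\rm{i}} \) gives a decision function for classifying a test point \( {\text{x}} \in {\text{R}}^{\rm{n}} \):
\[ {\text{f}}({\text{x}}) = {\text{sgn}}\left( \sum_{{\rm{i}} = 1}^{{\text{m}}_{\rm{sv}}} \alpha_{\rm{i}} \, {\text{y}}_{\rm{i}} \, {\text{K}}({\text{x}},{\text{x}}_{\rm{i}} ) + {\text{b}} \right) \qquad (2) \]
where \( {\text{m}}_{\rm{sv}} \) is the number of support vectors \( {\text{x}}_{\rm{i}} \in {\text{R}}^{\rm{n}} \).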
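To make the decision rule concrete, the sketch below classifies a test point by taking the sign of the kernel-weighted sum over the support vectors plus the bias b. The Gaussian (RBF) kernel with width σ is an assumption here, suggested by the σ hyper-parameter tuned in the experiments; the function names and the toy values in the usage note are hypothetical.

```python
import math


def rbf_kernel(u, v, sigma=1.0):
    """Gaussian kernel K(u, v) = exp(-||u - v||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svm_decide(x, support_vectors, alphas, labels, b, sigma=1.0):
    """Return +1 or -1 from the sign of sum_i alpha_i * y_i * K(x, x_i) + b."""
    score = b + sum(
        alpha * y * rbf_kernel(x, sv, sigma)
        for alpha, y, sv in zip(alphas, labels, support_vectors)
    )
    return 1 if score >= 0 else -1
```

For example, with two support vectors `(0, 0)` (label +1) and `(2, 2)` (label –1), equal multipliers and `b = 0`, a point near the first support vector is assigned +1 and a point near the second is assigned –1.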
2.4 Weighted Support Vector Machines (WSVM) [10]
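WSVM was introduced to deal with the imbalance problem by using two different cost parameters \( {\text{C}}_{ + } \) and \( {\text{C}}_{ - } \) in the SVM primal optimization problem [9] for the majority class (\( {\text{y}}_{\rm{i}} = +1 \)) and the minority ones (\( {\text{y}}_{\rm{i}} = -1 \)), as given in Eq. (3):
\[ \min_{{\text{w}},\,{\text{b}},\,\xi} \; \frac{1}{2}\lVert {\text{w}} \rVert^{2} + {\text{C}}_{ + } \sum_{{\rm{i}}:\,{\text{y}}_{\rm{i}} = +1} \xi_{\rm{i}} + {\text{C}}_{ - } \sum_{{\rm{i}}:\,{\text{y}}_{\rm{i}} = -1} \xi_{\rm{i}} \quad \text{subject to} \quad {\text{y}}_{\rm{i}} \left( {\text{w}}^{\text{T}} \varphi ({\text{x}}_{\rm{i}} ) + {\text{b}} \right) \ge 1 - \xi_{\rm{i}} , \quad \xi_{\rm{i}} \ge 0 \qquad (3) \]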
\( {\text{C}}_{ + } \) and \( {\text{C}}_{ - } \) are cost parameters for positive and negative classes, respectively.
Some authors [10, 11] have proposed using different cost parameters to address the imbalance problem. Veropoulos et al. [11] proposed increasing the cost of the minority class (i.e. \( {\text{C}}_{ - } > {\rm{C}}_{ + } \)) to obtain a larger margin on the side of the smaller class. In [6], we proposed a new criterion for choosing the cost parameters of the WSVM algorithm. The coefficients are adapted for each class of activity and typically chosen as:
where \( {\text{m}}_{ + } \) is the number of samples of the majority class and \( {\text{m}}_{\rm{i}} \) is the number of samples of the other class i. C is the common misclassification cost factor of the WSVM. This parameter is determined by cross-validation.
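One form consistent with these definitions scales the common factor C by the imbalance ratio, so that class i receives the cost C · m₊ / mᵢ. This exact expression is an assumption made for illustration and may differ from the criterion in [6]. A minimal sketch with a hypothetical helper `class_costs`:

```python
from collections import Counter


def class_costs(labels, C=1.0):
    """Assumed per-class cost C_i = C * m_plus / m_i (m_plus = majority class size)."""
    counts = Counter(labels)
    m_plus = max(counts.values())
    return {cls: C * m_plus / m_i for cls, m_i in counts.items()}
```

Under this assumption the majority class keeps the cost C, and a class that is four times rarer is penalized four times more heavily when misclassified.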
3 Simulation Results and Assessment
3.1 Datasets
We used fully labeled datasets [7, 8], each gathered from a single occupant, in three houses having different layouts and different non-intrusive sensor networks. Each network is composed of a different number of state-change sensor nodes, such as reed switches to determine the open-close states of doors and cupboards, and pressure mats to identify sitting on a couch or lying in bed. The data were annotated using different methods. Time slices for which no annotation is available are collected in a separate activity labelled ‘Idle’. Table 1 shows the number of samples per activity in each dataset.
3.2 Results
In this study, the LIBSVM software package [12] was used to implement the multiclass SVM classifier. First, we optimized the hyper-parameters (σ, C) for all training sets, over the range 0.1–2 for σ and the values {0.1, 1, 10, 100} for C, to minimize the leave-one-day-out cross-validation error rate. Then, locally, we optimized the cost parameter Ci for each activity class using the WSVM classifier [6] with the common cost parameter fixed at C = 1. The overall performance of our approach is compared with SVM, OS-SVM and WSVM and is summarized in Table 2. The results demonstrate that our approach outperforms the other methods.
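The hyper-parameter search described above can be sketched as a grid search wrapped around leave-one-day-out splits. The sketch assumes each sample carries a day identifier; `error_rate` is a hypothetical callback that trains on the training indices and returns the error on the held-out day, and the σ grid must be supplied by the caller because its step size is not stated.

```python
from itertools import product


def leave_one_day_out(days):
    """Yield (held_out_day, train_indices, test_indices), one split per distinct day."""
    for held_out in sorted(set(days)):
        train = [i for i, d in enumerate(days) if d != held_out]
        test = [i for i, d in enumerate(days) if d == held_out]
        yield held_out, train, test


def grid_search(days, error_rate, sigmas, costs):
    """Return (mean_error, sigma, C) minimizing the mean held-out-day error."""
    best = None
    for sigma, C in product(sigmas, costs):
        errors = [
            error_rate(sigma, C, train, test)
            for _, train, test in leave_one_day_out(days)
        ]
        mean_error = sum(errors) / len(errors)
        if best is None or mean_error < best[0]:
            best = (mean_error, sigma, C)
    return best
```

Splitting by whole days keeps all time slices of a given day on the same side of the split, so the held-out day is never partially seen during training.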
Fig. 2 shows that, for the TK26M dataset, OS-WSVM outperforms the other approaches for the ‘Toileting’, ‘Showering’, ‘Breakfast’, ‘Dinner’ and ‘Drink’ activities and gives results similar to the other methods for the ‘Leaving’ and ‘Sleeping’ activities. The majority activities ‘Leaving’ and ‘Sleeping’ are recognized better by all methods, while the ‘Idle’ activity is recognized less accurately by the proposed method than by the other methods. Additionally, kitchen-related activities such as ‘Breakfast’, ‘Dinner’ and ‘Drink’ are in general harder to recognize than other activities.
To quantify the extent to which one class is harder to recognize than another, we analyzed the confusion matrix of OS-WSVM for the TK26M dataset, shown in Table 3. We noticed that the activities ‘Leaving’, ‘Toileting’, ‘Showering’, ‘Sleeping’, ‘Dinner’ and ‘Drink’ are better recognized than ‘Idle’ and ‘Breakfast’.
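Reading per-class recognition rates off a confusion matrix amounts to dividing each diagonal entry by its row total. The sketch below assumes rows are true classes and columns are predicted classes; the helper name is hypothetical and any numbers passed to it in an example are toy values, not entries of Table 3.

```python
def per_class_recall(confusion, class_names):
    """Recall of each class: correctly classified samples divided by the row total."""
    recalls = {}
    for i, name in enumerate(class_names):
        row_total = sum(confusion[i])
        recalls[name] = confusion[i][i] / row_total if row_total else 0.0
    return recalls
```

Comparing these per-class values, rather than a single overall accuracy, is what reveals whether a minority activity is being sacrificed to the majority ones.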
The kitchen activities appear to be recognized better with the proposed method than with the other methods. In the TK26M house, there is a separate room for almost every activity. The kitchen activities, which are food-related tasks, nevertheless remain the worst recognized, because most instances of these activities were performed in the same location (the kitchen) using the same set of sensors. The location of the sensors therefore strongly influences recognition performance.
4 Conclusion
Our experiments on real-world datasets from smart home environments showed that the OS-WSVM strategy, which deals with class imbalance at both the data and algorithmic levels, can significantly increase recognition performance when classifying multiclass sensor data and can improve the prediction of minority activities.
In the future, it will be interesting to use temporal features describing when an activity is performed to improve classification performance. The scalability of our approach will also be tested further on datasets containing more classes and varying numbers of sensors.
References
1. Abidine MB, Fergani L, Fergani B, Fleury A (2015) Improving human activity recognition in smart homes. Int J E-Health Med Commun (IJEHMC) 6(3):19–37
2. Chawla N (2010) Data mining for imbalanced datasets: an overview. In: Data mining and knowledge discovery handbook, pp 875–886
3. Abidine MB, Yala N, Fergani B, Clavier L (2014) Soft margin SVM modeling for handling imbalanced human activity datasets in multiple homes. In: 4th international conference on multimedia computing and systems (ICMCS 2014). IEEE, Marrakesh, pp 421–426
4. Krawczyk B (2016) Learning from imbalanced data: open challenges and future directions. Progress Artif Intell 5(4):221–232
5. Zhang Y, Wang D (2013) A cost-sensitive ensemble method for class-imbalanced datasets. In: Abstract and applied analysis, vol 2013. Hindawi Publishing Corporation
6. Abidine MB, Fergani L, Fergani B, Oussalah M (2018) The joint use of sequence features combination and modified weighted SVM for improving daily activity recognition. Pattern Anal Appl 21(1):119–138
7. http://sites.google.com/site/tim0306/. Accessed Mar 2017
8. Ordonez FJ, de Toledo P, Sanchis A (2013) Activity recognition using hybrid generative/discriminative models on home environments using binary sensors. Sensors 13:5460–5477
9. Fradkin D, Muchnik I (2006) Support vector machines for classification. DIMACS Ser Discrete Math Theor Comput Sci 70:13–20
10. Huang YM, Du SX (2005) Weighted support vector machine for classification with uneven training class sizes. In: Proceedings of the IEEE international conference on machine learning and cybernetics, vol 7, pp 4365–4369
11. Veropoulos K, Campbell C, Cristianini N (1999) Controlling the sensitivity of support vector machines. In: Proceedings of the international joint conference on AI, pp 55–60
12. Chang CC, Lin CJ (2017) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2:1–27. http://www.csie.ntu.edu.tw/~cjlin/libsvm/
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Abidine, M.B., Fergani, B., Seth, S. (2019). Human Activity Recognition in Smart Home Environment Using OS-WSVM Model. In: Hajji, B., Tina, G.M., Ghoumid, K., Rabhi, A., Mellit, A. (eds) Proceedings of the 1st International Conference on Electronic Engineering and Renewable Energy. ICEERE 2018. Lecture Notes in Electrical Engineering, vol 519. Springer, Singapore. https://doi.org/10.1007/978-981-13-1405-6_15
DOI: https://doi.org/10.1007/978-981-13-1405-6_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1404-9
Online ISBN: 978-981-13-1405-6
eBook Packages: Engineering, Engineering (R0)