
1 Introduction

Bedridden is a term describing patients who are confined to bed and cannot leave it. When bedridden patients do not change their posture regularly, medical complications such as bedsores and urinary tract infections can develop [1]. These complications can be prevented by reminding caregivers to change the patient’s posture frequently.

A sleep posture monitoring system is proposed to prevent such medical complications. Many kinds of sensors can detect patient movement on a bed; in this paper, Kinect is chosen to capture the patient’s movement. The input data represent the body joints of the patient on the bed. The data are then sent to a web server that classifies the posture, and the system informs the user about the posture via mobile notification. This work focuses mainly on the algorithmic aspect of the system and on the process of selecting the classifiers to be implemented in it.

This work contains five sections: Sect. 1 introduces the problem; Sect. 2 reviews other works that focus on sleeping posture; Sect. 3 describes the structure of the proposed system and how the different machine learning algorithms are implemented and assessed; Sect. 4 reports the results of the model assessment; and Sect. 5 concludes the work.

2 Literature Review

Many works attempt to differentiate the sleeping postures of a person. The goal is mainly to prevent medical complications caused by staying in the same position for too long: for example, pressure ulcers in postoperative patients; Gastroesophageal Reflux Disease (GERD), where some works state that lying on the side can reduce its effect [2]; or obstructive sleep apnea, where relaxed throat muscles narrow the airway. The works below are grouped by the sources of data used in their algorithms.

The following works proposed recognition systems with a single data source, in which the recognition features come from only one type of sensor, either an array of force sensors or a single depth camera.

Zachary et al. [3] implemented a system using four load cells placed beneath the bed. The Center of Pressure (CoP) in both the x-axis and y-axis is calculated from the four load-cell signals, and the CoPs are then used to determine the angle of displacement (ANG) of each sample. Finally, K-means clustering is used to classify the data into four groups, yielding generalized accuracies of 0.68, 0.57, 0.69, and 0.33 for Back, Right, Left, and Stomach, respectively. When the number of groups is reduced to three, the generalized accuracies are 0.92, 0.75, and 0.86 for Back/Stomach, Right, and Left.
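To make the CoP and ANG features concrete, the following is a minimal sketch of the computation, assuming the four load cells sit at the corners of the bed; the corner coordinates and the baseline CoP are our illustrative assumptions, not values from [3].

```python
import math

# Hypothetical corner coordinates (metres) of the four load cells,
# ordered head-left, head-right, foot-left, foot-right.
CORNERS = [(0.0, 2.0), (0.9, 2.0), (0.0, 0.0), (0.9, 0.0)]

def center_of_pressure(weights):
    """CoP in x and y: the load-weighted average of the cell positions."""
    total = sum(weights)
    cop_x = sum(w * x for w, (x, _) in zip(weights, CORNERS)) / total
    cop_y = sum(w * y for w, (_, y) in zip(weights, CORNERS)) / total
    return cop_x, cop_y

def displacement_angle(cop, baseline):
    """Angle of displacement (ANG) of a CoP sample relative to a
    baseline CoP (e.g. a supine reference), in degrees."""
    dx, dy = cop[0] - baseline[0], cop[1] - baseline[1]
    return math.degrees(math.atan2(dy, dx))
```

The ANG values computed this way would then be the one-dimensional samples fed to K-means clustering.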

Similarly, Yousefi et al. [4] used a Force Sensor Array (FSA) comprising 32 × 64 force sensors, each of which can measure pressure up to 100 mmHg. The data from the FSA can be treated as image data. After normalization, the data are projected into an Eigenspace, and two methods, Principal Component Analysis (PCA) and Independent Component Analysis (ICA), are applied separately. A K-Nearest Neighbor (KNN) classifier then classifies each sleep posture. Both methods achieve satisfactory accuracies: 97.7% for PCA and 94.3% for ICA. Moreover, the recall rate is in the range of 90–99% for each posture under both PCA and ICA.
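The Eigenspace-plus-KNN pipeline described in [4] can be sketched as below. This is a generic PCA/KNN illustration under our own simplifications (flattened pressure images as rows, Euclidean distance, majority vote), not the authors' exact implementation.

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA on normalised pressure images (one flattened image per
    row) and return the mean and the top-k principal axes."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    axes = vecs[:, np.argsort(vals)[::-1][:k]]
    return mean, axes

def project(X, mean, axes):
    """Project samples into the Eigenspace."""
    return (X - mean) @ axes

def knn_predict(train_feats, train_labels, query, k=3):
    """Classify by majority vote among the k nearest Eigenspace
    neighbours of the query sample."""
    order = np.argsort(np.linalg.norm(train_feats - query, axis=1))[:k]
    votes = [train_labels[i] for i in order]
    return max(set(votes), key=votes.count)
```

ICA would replace `pca_fit` with an independent-component estimator while keeping the same KNN step.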

On the other hand, Timo et al. [5] proposed a sleep posture recognition system using a depth camera hung from the ceiling. A 3D depth grid called a Bed Aligned Map (BAM) is extracted from each depth image. A three-layer Convolutional Neural Network (CNN) is applied to the BAMs, and its outputs are fed into a Multilayer Perceptron (MLP) that classifies four sleep positions: Empty, Right, Supine, and Left. They also tried two other methods: MLP without CNN, and Histograms of Oriented Gradients (HoG) with a Support Vector Machine (SVM). Their results show that the proposed method (CNN + MLP) is the best, with per-class accuracies ranging from 94.00% to 98.40%. The second-best method is HoG + SVM, with accuracies from 85.20% to 96.00%, while the worst is MLP alone, with accuracies from 78.80% to 84.80%.

Some works rely on a system with more than one sensor (or source of raw data) in an attempt to offset the disadvantages of each individual sensor.

Weimin et al. [6] proposed a multimodal system that combines video data and force sensors, which requires spatial-temporal registration to consider both data sources at once. A color map and an edge map are extracted from each video frame, while the data from Force Sensing Resistors (FSR) are treated as 2D data. A joint feature, the leg count, is also extracted. PCA is applied to the features before training a multi-class SVM. Their results show that each sensor alone yields lower accuracy than the multimodal approach, and that accuracy improves further in the presence of the joint feature. They achieve best accuracies of 99.01% and 94.05% in the person-dependent and person-independent cases, respectively.

Similarly, Torres et al. [7] proposed a multimodal sleep posture recognition system comprising three sensors: a pressure sensor, a depth sensor, and an RGB sensor. There are 10 sleep postures (Soldier U, Soldier D, Faller R, Faller L, Log R, Log L, Yearner R, Yearner L, Fetal R, and Fetal L) plus one background case; three light conditions (bright, medium, and dark); and four occlusions (clear, blanket, pillow, and blanket and pillow). The raw data are background-subtracted, converted to gray scale, and normalized. HoG is used to extract features from the RGB data, while Image Geometric Moments (gMOM) extract features from both the depth and pressure data. Data from each sensor are then trained with a Support Vector Classifier (SVC) and Linear Discriminant Analysis (LDA). The outputs are compared with the ground-truth labels to estimate the trust of each sensor, and each trust becomes the weight of the corresponding classifier. Finally, the weighted outputs are combined and the maximum is taken as the output of the multimodal classifier. In the best situation (bright light and clear occlusion), both SVC and LDA achieve 100% accuracy, while in the worst situation (dark light with blanket and pillow occlusion), the accuracies drop to 17.7% and 18.6% for SVC and LDA, respectively.
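The trust-weighted fusion step in [7] amounts to a weighted arg-max over the per-sensor class scores. A minimal sketch, with sensor names, scores, and trust values that are purely illustrative:

```python
def fuse_multimodal(scores_per_sensor, trusts):
    """Weight each sensor's class scores by that sensor's estimated
    trust and return the posture with the maximum combined score.
    `scores_per_sensor` maps sensor name -> {posture: score}."""
    combined = {}
    for sensor, scores in scores_per_sensor.items():
        weight = trusts[sensor]
        for posture, score in scores.items():
            combined[posture] = combined.get(posture, 0.0) + weight * score
    return max(combined, key=combined.get)
```

With low trust on a sensor (e.g. RGB in the dark condition), that sensor's votes contribute little to the final decision, which matches the intuition behind the trust estimate.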

The next section explains the system we implemented to recognize sleep posture.

3 Sleep Posture Recognition System

This section first explains the proposed system design, covering the overview of the system. The second part describes how the data are acquired, processed, and used to train the classification models.

3.1 System Design

Our proposed system consists of three main parts: the sleep data collection part, the sleep posture analysis part, and the sleep notification part. The first part comprises a computer connected to a Kinect that continuously monitors a bedridden patient; its output is sent over the Internet to the cloud service in the sleep posture analysis part. The classification algorithm, hosted on Microsoft Azure Machine Learning Studio, is then applied to the data to identify the sleep posture. The last part, the sleep notification part, communicates with a mobile application to notify the user about the patient’s sleeping posture. Figure 1 shows an overview of the proposed system.

Fig. 1 An overview of the system architecture

Data from the Kinect are received at 30 frames per second. Every 5 s, the system determines whether the patient is in the frame of the depth sensor. If the patient is not present, the system notifies the user that the patient is missing from the frame. If the patient is present, the system considers 10 sequential frames of skeleton joint data from the Kinect. These data contain a confidence value for each joint, ranging from 0 to 1; the data are more reliable when the value is closer to 1. Only the single frame with the highest skeleton joint confidence is used to determine the posture; the classification process is discussed in the next section. The posture of the current frame is then compared with the previous one. If the patient has changed posture, the system informs the user of the new posture. If instead the patient stays in the same posture for more than a certain period of time (for example, 60 s), the system informs the user that the patient has stayed in the same posture for too long. This feature can help prevent bedsores.
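The per-check logic just described can be sketched as follows. The function and field names (`monitor_step`, `classify`, `notify`, the frame dictionaries) are our illustrative assumptions; in the deployed system, `classify` would be the call to the Azure web service and `notify` the mobile notification.

```python
WINDOW = 10               # sequential frames considered per 5 s check
MAX_SAME_POSTURE_S = 60   # threshold for the "too long" notification

def best_frame(frames):
    """Pick the frame with the highest mean joint confidence (0-1)."""
    return max(frames,
               key=lambda f: sum(j["confidence"] for j in f["joints"]) / len(f["joints"]))

def monitor_step(frames, state, now, classify, notify):
    """One 5-second check. `state` holds the last posture and when it
    was first seen; `classify` stands in for the web-service call and
    `notify` for the mobile notification."""
    if not frames:                        # patient absent from the frame
        notify("patient missing from frame")
        return state
    posture = classify(best_frame(frames[:WINDOW]))
    if posture != state.get("posture"):
        notify("new posture: " + posture)
        return {"posture": posture, "since": now}
    if now - state["since"] > MAX_SAME_POSTURE_S:
        notify("same posture for too long")
    return state
```

Each call covers one 5 s window; the returned state is passed into the next call so posture changes and dwell time carry across checks.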

3.2 Kinect Data Analysis

In this section, the details about data collection and data analysis will be given.

Experimental Data Acquisition. A mattress is placed in a fixed position, and the Kinect sensor is mounted on a camera tripod just below the mattress, oriented so that it points downward at the mattress. Data are collected from six healthy subjects; the data from three of them are used to train and test the models, and the data from the other three are used only to test the models. Three sleep postures are considered: Normal (facing upward), LeftFlip (facing leftward), and RightFlip (facing rightward). Each subject repeats each posture 30 times. During data collection, the mattress and the tripod remain in the same position and orientation.

Data Processing. To analyze the data from the Kinect, Microsoft Azure Machine Learning Studio (MAMLS) is chosen as the tool because it can expose the result as a web service within the system architecture [8], and it can analyze the data and return results almost instantly. The data collected in the previous step are uploaded into a MAMLS dataset, and the data from the three training subjects are merged into a single dataset. The dataset is then normalized with the Normalize Module and split into a training dataset and a testing dataset at a ratio of 7:3.
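Outside MAMLS, the same normalize-then-split preprocessing could be sketched as below. This is a generic stand-in for the Normalize and Split modules under our own assumptions (min-max scaling, random shuffling with a fixed seed), not a description of their internals.

```python
import numpy as np

def min_max_normalize(X):
    """Per-feature min-max scaling, mirroring a Normalize step."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)

def train_test_split(X, y, ratio=0.7, seed=0):
    """Shuffle and split the dataset at the given ratio (7:3 here)."""
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = int(len(X) * ratio)
    return X[idx[:cut]], y[idx[:cut]], X[idx[cut:]], y[idx[cut:]]
```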

The data of interest are the skeleton joint positions derived from the Kinect sensor. After observation, we found that the Z-axis positions of the left and right shoulders are good candidates for classification, since these values vary the most when the patient changes posture. Figure 2 plots the Z-values of both shoulders, with LZ being the left Z-value and RZ the right Z-value, while the status indicates the class each sample belongs to: 0.5 means Normal, 0 means LeftFlip, and 1 means RightFlip.

Fig. 2 The plot of the Z-values from the left and right shoulders and their status

The model is separated into two sub-models: one classifies LeftFlip versus Normal, and the other classifies RightFlip versus Normal. The two are then merged into a single model that predicts all three classes. Three classification techniques are used and compared: Decision Tree (DT), Neural Network (NN), and SVM. After the web service is deployed, an API key is used to communicate with it.
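The merge of the two binary models into a three-class predictor can be sketched as follows. The precedence used when the two models disagree, and the threshold classifiers in the usage example, are our assumptions for illustration only.

```python
def merged_predict(left_model, right_model, sample):
    """Combine the LeftFlip-vs-Normal and RightFlip-vs-Normal binary
    models into one three-class prediction. The precedence when the two
    models disagree is an assumption, not specified in the text."""
    if left_model(sample) == "LeftFlip":
        return "LeftFlip"
    if right_model(sample) == "RightFlip":
        return "RightFlip"
    return "Normal"
```

As a usage example, two hypothetical threshold classifiers on the shoulder Z-values (`LZ`, `RZ`) could stand in for the trained DT/NN/SVM sub-models: `left = lambda s: "LeftFlip" if s["LZ"] - s["RZ"] > 0.05 else "Normal"` and symmetrically for `right`.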

Model Performance Assessment. After the models are trained, they are evaluated to find the most suitable technique to implement in the system. The results of the evaluation are elaborated in Sect. 4.

4 Experimental Results

The experiment is divided into three sections: validation with the testing dataset, validation with a new dataset absent from the model, and validation with both the testing dataset and the new dataset without error exclusion. Every section excludes erroneous data except the last, which does not (an error in this work means data with a low joint confidence level from the Kinect camera).

4.1 Validation with Testing Dataset

We test each algorithm with the testing dataset, excluding erroneous data from the testing process. The results for the same subjects as in the training dataset are discussed below.

Table 1 shows the accuracies for each subject, averaged over every class, and the average accuracy of each classifier over every subject. It shows that NN and SVM, which both achieve 100% accuracy, outperform DT, which achieves 93.33%. The data for this test come from the same subjects used to train the model.

Table 1 The results from the first three subjects for each classifier

4.2 Validation with New Dataset Absent in the Model

The validation in this section uses a dataset that is not present in the training dataset. With data from three subjects that are new to the model, DT achieves 90% accuracy, while both NN and SVM achieve 100% (Table 2).

Table 2 The results from the new subjects

4.3 Validation with Both Testing and New Datasets Without Error Exclusion

The current results cannot separate NN from SVM, since both achieve 100%. To compare them, we therefore set up another experiment in which, unlike the first, every data point is included whether or not it contains errors.

From Table 3, NN shows better performance at 63.33% than SVM at 57.78%. Based on these results, NN is the best classifier among the three chosen algorithms, and it is therefore the classifier implemented in the proposed system.

Table 3 The results from the data without any exclusion

5 Conclusion

According to the results, NN is the best model to use when errors are present in the data. SVM also performs well when errors are absent from the dataset. DT performs the worst of the three models, although it has the advantage of interpretability, since it can be visualized easily as a simple tree diagram. The proposed system with NN can notify users when the patient is absent from the frame or when the patient stays too long in the same position.