1 Introduction

Automatically recognizing motion activities, such as standing still, walking, and running, enables many applications in areas such as healthcare, elderly care and energy expenditure estimation. Because they directly measure the movement characteristics of the human body, accelerometers are the sensors most often employed in human activity recognition. With the development of micro-electro-mechanical systems (MEMS), accelerometers have been miniaturized so that they can be embedded into small mobile devices.

Compared with traditional wearable activity recognition (Roggen et al. 2011), which fixes accelerometers at specific locations on the human body, activity recognition based on a mobile device faces the new problem of varying device locations and orientations, because users can put the device wherever they prefer, such as in a hand, a pocket or a bag. When the device is carried at different locations but with the same orientation, the embedded accelerometer experiences different forces, because the movement patterns of different body parts are distinct even when the user is performing the same activity. When the device is carried with different orientations but at the same location, the accelerometer experiences the same force; however, the readings of the three axes differ, because the decomposition of the force along the device coordinates depends on the angles between the synthesized force and the three axes.

In this paper, a fast, robust and device-displacement-free activity recognition model is proposed to deal with the problem of varying device locations and orientations. First, the readings along the three axes are synthesized, and the magnitude of the synthesized acceleration is used to extract features. This eliminates the orientation dependence of the mobile device, at the cost of losing direction information. Second, starting from a large set of candidate features, principal component analysis (PCA) is used to suppress noisy features and extract robust features for recognition. Third, an extreme learning machine (ELM) is used to adapt the recognition model to new locations in the online phase: recognition results with high confidence are selected and added to the training dataset, and the recognition model is then retrained, taking advantage of the fast learning speed of ELM. Experimental results show the validity and high performance of the proposed model.

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 introduces the proposed model, and Sect. 4 describes the ELM-based recognition model construction and location adaptation. Experiments and results are presented in Sect. 5. Finally, we conclude the paper in Sect. 6.

2 Related work

As to varying device orientations, Reddy et al. (2008) propose to recognize transportation modes using features extracted from the series of acceleration magnitudes, i.e., the magnitude of the acceleration synthesized from the three axes. Mizell (2003) shows that the average of the accelerometer signal over a reasonable time period gives a good estimate of the gravity-related component, which in turn can be used to estimate the vertical and horizontal components of the dynamic acceleration. Yang (2009) uses these two methods to recognize user activities including sitting, standing, walking, running, driving and bicycling. Experimental results show that the acceleration decomposition-based method performs slightly better than the acceleration synthesization-based method. However, Wang et al. (2010) show that the acceleration synthesization-based method outperforms the acceleration decomposition-based method for recognizing six typical transportation modes, including biking, busing, driving, staying still, taking the subway and walking. In addition, they demonstrate that the gravity estimation error degrades the performance of the acceleration decomposition-based method.

As to varying device locations, Kunze et al. (2005) propose a location recognition method which first identifies time periods where the user is walking and then leverages the specific characteristics of walking motion to determine the location of the body-worn sensor, including wrist, head, trousers pocket and breast pocket. However, for mobile device-based activity recognition, this method may not be feasible because it requires well-defined fixed positions. Kunze and Lukowicz (2008) demonstrate that the acceleration caused by rotational motion is location sensitive and that combining a gyroscope with the accelerometer helps reduce this sensitivity.

To address the varying location and orientation issues simultaneously, Sun et al. (2010) attempt to extract features that are independent of, or insensitive to, orientation changes. They build a support vector machine (SVM) classifier for all physical activities in all pocket positions. However, they assume that most people put their mobile phones in one of six pockets around the waist, which simplifies the problem to a predefined scenario and may not be feasible in real daily life.

As to recognition model adaptation, Lai et al. (2010) use a body posture analysis flowchart to judge which bodily behavior is occurring. In order to adapt the detection algorithm to individual habits, the authors apply the subtractive clustering method (SCM) to calculate the center position of the habitual inclination angles for each pose, which is used as the threshold for pose judgement. Zhao et al. (2011) propose a transfer learning-based algorithm which integrates a decision tree and k-means clustering for personalized activity recognition model adaptation. However, the above papers only consider model adaptation across different users. To our knowledge, there is no existing work on recognition model adaptation across different locations.

ELM is an efficient and practical learning mechanism for single-hidden-layer feed-forward neural networks. Recent studies show that ELM can be successfully applied to fuzzy integral determination (Wang et al. 2011), fuzzy rule learning (Jun et al. 2011), image recognition (Chacko et al. 2011), and other real-world applications (Huang et al. 2011). The generalization performance of learning systems has attracted wide attention in recent years. One novel idea (Wang and Dong 2009; Wang et al. 2011) is to improve generalization ability by maximizing the uncertainty inherent in the learning system; this insight is helpful for the discussion of ELM's generalization. An ELM-based activity recognition algorithm is presented in this paper. The latest explorations also provide new human-centric pervasive applications with machine learning technologies (Xiao et al. 2011; Zhang et al. 2011). In this paper, based on the characteristics of ELM, we build a fast and robust model to classify daily activities.

3 The ELM-based device-displacement-free activity recognition model

The proposed model contains two steps:

Step 1 Offline classification model construction and online activity recognition, as shown in Fig. 1. For offline classification model construction, the readings of the three axes are first synthesized into a magnitude series to get rid of orientation dependence. Statistical and frequency-domain features are extracted from the magnitude series of the synthesized acceleration. Then, PCA is used to reduce the dimension of the feature space and retain useful, robust features. Thanks to its fast learning speed and high generalization capability, ELM is used to build the classification model. For online activity recognition, an unlabeled testing sample is generated in the same way as in the offline phase; the sample is then classified by the ELM classifier to obtain the recognition result.

Fig. 1
figure 1

The overview of ELM-based activity recognition. The dotted line represents the offline classification model construction and the solid line represents the online activity recognition

Step 2 Activity recognition model retraining and updating, as illustrated in Fig. 2. Based on the classification results, the confidence that a sample is correctly classified is estimated. The samples whose confidences are greater than a threshold η are selected to build a new training dataset, together with the training samples in Step 1. Then, the ELM classification model is retrained and updated. As the new training samples may be collected from a new device location, the updated model gradually adapts to unknown locations.

Fig. 2
figure 2

Process of activity recognition model retraining and updating

3.1 Acceleration synthesization

An accelerometer detects and transforms changes in capacitance into an analog output voltage that is proportional to acceleration. For a triaxial accelerometer, the output voltages can be mapped into the accelerations along three axes, \(a_{x}\), \(a_{y}\) and \(a_{z}\). As \(a_{x}\), \(a_{y}\) and \(a_{z}\) are the orthogonal decomposition of the real acceleration, the magnitude of the synthesized acceleration can be expressed as:

$$ a=\sqrt{a_{x}^{2}+a_{y}^{2}+a_{z}^{2}} $$
(1)

\(a\) is the magnitude of the real acceleration but carries no direction information. Therefore, the acceleration-magnitude-based activity recognition model is orientation independent.
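As an illustration of Eq. (1), the following sketch computes the magnitude series from raw three-axis readings; the array names and shapes are illustrative assumptions, not part of the data-collection software.

```python
import numpy as np

def synthesize_magnitude(acc_xyz: np.ndarray) -> np.ndarray:
    """Compute the orientation-independent acceleration magnitude (Eq. 1).

    acc_xyz: array of shape (n_samples, 3) holding the a_x, a_y, a_z readings.
    Returns an array of shape (n_samples,) with the synthesized magnitudes.
    """
    return np.sqrt(np.sum(acc_xyz ** 2, axis=1))

# A rotated device changes the per-axis readings but not the magnitude.
upright = np.array([[0.0, 0.0, 9.81]])   # gravity measured along z
tilted = np.array([[9.81, 0.0, 0.0]])    # gravity measured along x after rotation
assert np.allclose(synthesize_magnitude(upright), synthesize_magnitude(tilted))
```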

3.2 Acceleration feature extraction and normalization

Based on the acceleration magnitude series, 17 statistical features (Figo et al. 2010) are extracted from a sliding window of 256 samples with 50% overlap between consecutive windows. These features are the mean, standard deviation, energy, mean-crossing rate, maximum value, minimum value, first quartile, second quartile, third quartile, and four amplitude statistics and four shape statistics of the power spectral density (PSD) (Wang 2004). In addition, based on the FFT of the acceleration magnitude series, the frequency components of bins 1 to 128 are extracted and added to the feature vector, for 145 features in total.
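The windowing and feature extraction described above can be sketched as follows. The 256-sample window, the 50% overlap and the 128 frequency components follow the text; the function names are illustrative and the statistical features shown cover only part of the 17 listed above.

```python
import numpy as np

WINDOW = 256          # samples per sliding window
STEP = WINDOW // 2    # 50% overlap between consecutive windows

def sliding_windows(magnitude: np.ndarray):
    """Yield successive 256-sample windows of the magnitude series with 50% overlap."""
    for start in range(0, len(magnitude) - WINDOW + 1, STEP):
        yield magnitude[start:start + WINDOW]

def statistical_features(w: np.ndarray) -> list:
    """A subset of the 17 statistical features (mean, std, energy, mean-crossing rate, ...)."""
    mean, std = w.mean(), w.std()
    energy = np.sum(w ** 2) / len(w)
    mean_crossing_rate = np.sum(np.diff(np.sign(w - mean)) != 0) / len(w)
    q1, q2, q3 = np.percentile(w, [25, 50, 75])
    return [mean, std, energy, mean_crossing_rate, w.max(), w.min(), q1, q2, q3]

def frequency_components(w: np.ndarray) -> np.ndarray:
    """The 128 FFT magnitude components of one window (the DC bin is dropped)."""
    return np.abs(np.fft.rfft(w))[1:]   # rfft of 256 samples -> 129 bins, keep bins 1..128
```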

The PSD is defined as the Fourier transform of the autocorrelation of the time-series signal and describes the energy distribution of the signal in the frequency domain. The amplitude statistics are defined as:

$$ {\rm Amplitude}\!\!: \mu_{\rm amp}=\frac{1}{N}\sum_{i=1}^{N}C(i) $$
(2)
$$ {\rm Std}\!\!: \sigma_{\rm amp}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}[C(i)-\mu_{\rm amp}]^2} $$
(3)
$$ {\rm Skewness}\!\!: \gamma_{\rm amp}=\frac{1}{N}\sum_{i=1}^{N}\left[\frac{C(i)-\mu_{\rm amp}}{\sigma_{\rm amp}}\right]^3 $$
(4)
$$ {\rm Kurtosis}\!\!: \beta_{\rm amp}=\frac{1}{N}\sum_{i=1}^{N}\left[\frac{C(i)-\mu_{\rm amp}}{\sigma_{\rm amp}}\right]^4-3 $$
(5)

where C(i) is the PSD magnitude of the ith frequency bin, and N is the number of frequency bins. Similarly, the shape statistics are defined as:

$$ {\rm Mean}{:} \, \mu_{\rm shape}=\frac{1}{S}\sum_{i=1}^{N}iC(i) $$
(6)
$$ {\rm Std}\!\!: \sigma_{\rm shape}=\sqrt{\frac{1}{S}\sum_{i=1}^{N}(i-\mu_{\rm shape})^2C(i)} $$
(7)
$$ {\rm Skewness}\!\!: \gamma_{\rm shape}=\frac{1}{S}\sum_{i=1}^{N}\left(\frac{i-\mu_{\rm shape}}{\sigma_{\rm shape}}\right)^3C(i) $$
(8)
$$ {\rm Kurtosis}{:}\, \beta_{\rm shape}=\frac{1}{S}\sum\limits_{i=1}^{N}\left(\frac{i-\mu_{\rm shape}}{\sigma_{\rm shape}}\right)^4C(i)-3 $$
(9)

where \(S=\sum\nolimits_{i=1}^{N}C(i). \)

To eliminate the scaling effects among different features, all the features are normalized using the z-score normalization algorithm (Han and Kamber 2000).
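A sketch of the PSD statistics in Eqs. (2)–(9) and of the z-score normalization is given below. The paper does not specify the PSD estimator, so the simple periodogram used here is an assumption, as are the function names.

```python
import numpy as np

def psd_statistics(w: np.ndarray) -> list:
    """Amplitude (Eqs. 2-5) and shape (Eqs. 6-9) statistics of a window's PSD."""
    C = np.abs(np.fft.rfft(w)) ** 2 / len(w)   # periodogram estimate of the PSD (assumed)
    N = len(C)
    i = np.arange(1, N + 1)                    # frequency-bin indices

    # Amplitude statistics: plain moments of the PSD magnitudes C(i)
    mu_a, sd_a = C.mean(), C.std()
    skew_a = np.mean(((C - mu_a) / sd_a) ** 3)
    kurt_a = np.mean(((C - mu_a) / sd_a) ** 4) - 3

    # Shape statistics: moments of the bin index i weighted by C(i)
    S = C.sum()
    mu_s = np.sum(i * C) / S
    sd_s = np.sqrt(np.sum((i - mu_s) ** 2 * C) / S)
    skew_s = np.sum(((i - mu_s) / sd_s) ** 3 * C) / S
    kurt_s = np.sum(((i - mu_s) / sd_s) ** 4 * C) / S - 3

    return [mu_a, sd_a, skew_a, kurt_a, mu_s, sd_s, skew_s, kurt_s]

def z_score_normalize(X: np.ndarray) -> np.ndarray:
    """Z-score normalization of a feature matrix (samples x features)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```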

3.3 PCA-based dimension reduction

Suppose the normalized feature matrix X is of dimension M × N, where M is the number of samples and N is the number of extracted features. The PCA transformation can be represented as \(Y_{M \times K}=X_{M \times N} \cdot B_{N \times K}, \) where K < N. \(B=[b_{1},b_{2},\ldots,b_{K}]\) is a set of linearly independent, orthogonal basis vectors, which can be calculated as follows (Wang 2004):

Step 1 Calculate the covariance matrix of the original feature matrix as

$$ S=\frac{1}{M}(X-\mu)^{T}(X-\mu) $$
(10)

where μ is the mean vector of the feature set. Because the normalized feature vectors have zero mean, the calculation of the covariance matrix can be simplified as

$$ S=\frac{1}{M}X^{T}X $$
(11)

Step 2 Calculate the eigenvalues and the corresponding eigenvectors of the covariance matrix S.

Step 3 Sort the eigenvectors by their eigenvalues in descending order and choose those corresponding to the K largest eigenvalues, \(\lambda_{1} \geq \lambda_{2} \geq \cdots \geq \lambda_{K}, \) to construct the transformation matrix \(B=[b_{1},b_{2},\ldots,b_{K}]. \) The choice of K can be determined as

$$ \frac{\sum_{i=1}^{K}\lambda_{i}}{\sum_{i=1}^{N}\lambda_{i}} \geq 1-\eta $$
(12)

where η denotes the loss of energy.

In this paper, PCA is used (1) to denoise the samples, (2) to reduce their dimensionality and (3) to extract location-insensitive features for later model construction and online recognition.
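A compact sketch of the PCA procedure in Eqs. (10)–(12) is shown below, assuming the input matrix has already been z-score normalized; the function and variable names are illustrative.

```python
import numpy as np

def pca_reduce(X: np.ndarray, eta: float = 0.05):
    """PCA dimension reduction following Sect. 3.3.

    X: z-score-normalized feature matrix of shape (M, N); eta: allowed loss of energy.
    Returns the projected features Y (M x K) and the basis B (N x K).
    """
    M, N = X.shape
    S = X.T @ X / M                            # covariance matrix (Eq. 11)
    eigvals, eigvecs = np.linalg.eigh(S)       # symmetric matrix -> real eigenpairs
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    retained = np.cumsum(eigvals) / np.sum(eigvals)
    K = int(np.searchsorted(retained, 1.0 - eta)) + 1   # smallest K satisfying Eq. 12

    B = eigvecs[:, :K]                         # transformation matrix
    return X @ B, B
```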

4 ELM-based recognition model construction and location adaptation

4.1 ELM-based classifier

ELM is a recent neural network algorithm, which is known to achieve good performance on complex problems while reducing computation time compared with other machine learning algorithms (Huang et al. 2004a, b, 2006, 2011). The ELM algorithm does not train the input weights or the biases of the hidden neurons; instead, it obtains the output weights via the minimum-norm least-squares solution based on the Moore–Penrose generalized inverse of a linear system (Feng et al. 2009; Huang and Chen 2007, 2008; Huang et al. 2010). The final class is decided by finding the output node with the maximum output value.

Figure 3 shows the network structure of the single-hidden-layer ELM used in our experiments. We use 50 hidden neurons and the sigmoid activation function.

Fig. 3
figure 3

The network structure of ELM

The learning phase of the single-hidden-layer ELM can be summarized as in Fig. 4.

Fig. 4
figure 4

The ELM algorithm
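The training procedure of Fig. 4 can be sketched as follows: the input weights and hidden biases are random, and the output weights are obtained through the Moore–Penrose pseudo-inverse. The 50 sigmoid hidden nodes follow the text; the one-hot target encoding and the function names are assumptions of this sketch.

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X: np.ndarray, y: np.ndarray, n_hidden: int = 50, seed: int = 0):
    """Train a single-hidden-layer ELM classifier.

    X: training features (M x d); y: integer class labels in {0, ..., m-1}.
    The input weights W and biases b are random and never trained; the output
    weights beta are the minimum-norm least-squares solution H^+ T.
    """
    rng = np.random.default_rng(seed)
    d, m = X.shape[1], int(y.max()) + 1
    W = rng.uniform(-1.0, 1.0, size=(d, n_hidden))   # random input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)        # random hidden biases
    H = sigmoid(X @ W + b)                           # hidden-layer output matrix (M x n_hidden)
    T = np.eye(m)[y]                                 # one-hot targets (M x m), assumed encoding
    beta = np.linalg.pinv(H) @ T                     # Moore-Penrose solution for output weights
    return W, b, beta
```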

In the testing phase, for a testing sample x, the outputs can be calculated as follows:

$$ TY_{1\times m}=[g(w_{1} \cdot x+b_{1}),\ldots,g(w_{\tilde{N}} \cdot x+b_{\tilde{N}})]_{1\times \tilde{N}} \cdot \beta_{\tilde{N}\times m} $$
(13)

\(TY=[\alpha_{1},\alpha_{2},\ldots,\alpha_{m}]\), where m is the number of output nodes, which equals the number of classes in the classification problem. The ELM then selects the maximum value in TY and assigns its corresponding index, j, as the class label of the test sample. The sample's confidence in the assigned class can be calculated in the following steps:

$$ TY_{i}=TY_{i}-\min_{j}(TY_{j}),\quad i=1,2,\ldots,m $$
(14)
$$ {\rm confidence}=\frac{\max_{i}(TY_{i})}{\sum_{i=1}^{m}TY_{i}} $$
(15)
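The testing-phase computation in Eqs. (13)–(15) can be sketched as below, reusing the weights W, b and beta produced by the training sketch above; the function name is illustrative.

```python
import numpy as np

def elm_predict(x: np.ndarray, W: np.ndarray, b: np.ndarray, beta: np.ndarray):
    """Classify one sample and estimate the confidence of the assigned class."""
    g = lambda z: 1.0 / (1.0 + np.exp(-z))   # sigmoid activation
    TY = g(x @ W + b) @ beta                 # output vector of length m (Eq. 13)
    label = int(np.argmax(TY))               # index j of the maximum output
    TY = TY - TY.min()                       # shift so the smallest output is zero (Eq. 14)
    confidence = TY.max() / TY.sum()         # Eq. 15
    return label, confidence
```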

4.1.1 Recognition model adaptation to new locations

In the online phase, the user may place the mobile device at new locations. In order to adapt to new locations, the recognition model is retrained based on highly confident recognition results, taking advantage of the fast learning speed of ELM. The model adaptation process contains the following three steps (a minimal code sketch follows the list):

  1. During recognition, test samples whose classification confidence is larger than the threshold η are reserved and added into the training dataset.

  2. When the number of new training samples exceeds a predefined threshold, ELM is used to retrain the recognition model on the whole training dataset.

  3. The old recognition model is replaced with the new model.
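The sketch below illustrates this adaptation loop, reusing elm_train and elm_predict from the earlier sketches. The confidence threshold eta = 0.5 follows the experiments in Sect. 5.2.4, whereas the retraining batch size is not specified in the paper and is an assumption.

```python
import numpy as np

def adapt_to_new_locations(X_train, y_train, X_stream, eta=0.5, batch_size=200):
    """Confidence-gated model adaptation (Sect. 4.1.1).

    X_stream yields unlabeled feature vectors observed in the online phase.
    batch_size (how many accepted samples trigger retraining) is assumed.
    """
    W, b, beta = elm_train(X_train, y_train)
    new_X, new_y = [], []
    for x in X_stream:
        label, conf = elm_predict(x, W, b, beta)
        if conf > eta:                               # step 1: keep highly confident samples
            new_X.append(x)
            new_y.append(label)
        if len(new_X) >= batch_size:                 # step 2: retrain on the whole dataset
            X_train = np.vstack([X_train, np.array(new_X)])
            y_train = np.concatenate([y_train, np.array(new_y)])
            W, b, beta = elm_train(X_train, y_train) # step 3: replace the old model
            new_X, new_y = [], []
    return W, b, beta
```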

5 Experimental evaluations

5.1 Data collection

This paper aims to propose a general method for location-adaptive activity recognition based on the triaxial accelerometer embedded in a mobile device. The device used in this paper is a white box built by our hardware engineer. An XSens MTx sensor, which contains a triaxial accelerometer, a triaxial gyroscope and a compass, is embedded in the box. This device is only used for data collection; all data are transmitted to a PC, where all preprocessing and analysis are performed. The sampling rate of the accelerometer is set to 100 Hz. Four participants of varying age and gender are recruited to perform five daily activities: staying still, walking, running, going upstairs and going downstairs. During data collection, the sensor box is placed in the subject's hands (left and right), chest pockets (left and right) and trousers pockets (left and right), as shown in Fig. 5. For each location, every participant performs each activity for 8–10 min. To ensure the quality of the collected samples, the participants are organized into pairs: while one performs the activities, the other records the corresponding information, such as activity type, start time, end time and so on. Table 1 shows the number of samples obtained for each activity.

Fig. 5
figure 5

Location information: a in the right trousers pocket; b in the right hand; c in the right chest pocket; d in the left trousers pocket; e in the left hand; f in the left chest pocket

Table 1 Activity sample information

5.2 Experimental results

5.2.1 Classifier performance comparison

The ELM algorithm is compared with two popular classifiers, SVM and nearest neighbor (NN), to evaluate its performance. Based on the dataset obtained in Sect. 5.1, a 10-fold cross-validation test is performed for each classifier. The number of ELM hidden neuron nodes is set to 50 and the activation function is the sigmoid. The training time (s), testing time (s), training accuracy and testing accuracy are listed in Table 2.

Table 2 Performance of three classifiers

As can be seen from Table 2, ELM obtains the highest accuracy of the three classifiers, about 5 and 15% higher than SVM and NN, respectively. In addition, the training and testing times of ELM are much shorter than those of SVM and NN. For the 10-fold cross-validation test on 8,473 samples, the total training time of ELM is 14.98 s, whereas SVM consumes 197 s, indicating that ELM trains much faster than SVM. The total testing time of ELM is 0.91 s, which is also clearly shorter than that of SVM and NN.

5.2.2 Cross-location recognition without dimension reduction and model adaptation

In order to evaluate the performance degradation of the recognition model when the device location changes, a cross-location recognition experiment is performed: for each location, a classifier is trained and then tested on all locations.

Table 3 shows the confusion matrix of cross-location recognition using ELM, where the first column gives the training location and the first row the testing location. From Table 3 we can see that ELM performs well at the same location but poorly at the others. The maximum reduction is about 30%, when the model learned from the chest pocket is applied to the hand; the minimum is about 6%, when the model learned from the hand is applied to the chest pocket.

Table 3 Confusion matrix of cross-location recognition by ELM

Table 4 lists the confusion matrix of cross-location recognition using SVM. Compared with Table 3, ELM and SVM obtain similar accuracies at known locations, but the accuracies of SVM at new locations drop considerably and are much lower than those of ELM.

Table 4 Confusion matrix of cross-location recognition by SVM

As can be seen from Table 5, NN always achieves 100% accuracy at the same location because it is an instance-based lazy learning algorithm and the training and testing datasets are identical. Compared with Table 3, NN also performs worse than ELM at new locations.

Table 5 Confusion matrix of cross-location recognition by NN

5.2.3 Dimension reduction by PCA

In Sect. 3.2, we extracted 145 features. Some of these features are useful and some may be noise. In order to eliminate the noisy features and extract robust ones, PCA is used in our experiments. Figure 6 shows the relation between the preserved energy and the number of principal components in the transformed space. From Fig. 6 we can see that, if the loss of energy is set to 0.05, 30 principal components are enough. Thus, for simplicity, the number of principal components is set to 30 in the following experiments.

Fig. 6
figure 6

The effect of PCA

Comparing the confusion matrices of cross-location recognition using ELM in Tables 3 and 6, we can easily see that almost all cross-location recognition accuracies increase by about 2–8%. This shows that PCA indeed eliminates some noisy features and that the extracted features are more robust than the original ones.

Table 6 Confusion matrix of cross-location recognition after dimension reduction by ELM

Table 7 shows that the accuracies of applying one location's model to other locations using SVM increase by about 14–25%, although the accuracies at the same location decrease by about 3–12%. Nevertheless, Tables 6 and 7 show that ELM outperforms SVM not only at known locations but also at new locations. A similar situation can be observed in Tables 5 and 8, where NN is used as the classifier. These experimental results demonstrate that ELM performs best and has the strongest generalization capability among the three classification algorithms.

Table 7 Confusion matrix of cross-location recognition after dimension reduction by SVM

Comparing Table 3 with Table 6, and Table 4 with Table 7, we can see that, after dimension reduction by PCA, the recognition accuracies on most unknown locations increase, which shows that the features extracted by PCA generalize better across device deployment locations. However, this improvement is not obvious in Tables 5 and 8, which is somewhat puzzling. A possible reason is that, compared with ELM and SVM, NN has the worst generalization capability and thus confuses the samples of different locations. The recognition accuracy of NN at the same location is meaningless because a sample is always its own nearest neighbor.

Table 8 Confusion matrix of cross-location recognition after dimension reduction by NN

From the experiments in Sects. 5.2.1–5.2.3, we can see that, compared with SVM and NN, ELM has the fastest training and testing speed and the strongest generalization capability. Therefore, we use ELM as the classifier in the following experiments.

5.2.4 Cross-location model adaptation

In this section, the experiments aim to test the ELM model's adaptability to new locations. Three locations, hand, chest pocket and trousers pocket, are denoted A, B and C. The datasets of these locations are represented as \(Data_{A}\), \(Data_{B}\) and \(Data_{C}\), respectively. Each dataset is randomly divided into two parts, represented as \(Data_{A1}\) and \(Data_{A2}\), \(Data_{B1}\) and \(Data_{B2}\), and \(Data_{C1}\) and \(Data_{C2}\).

Without loss of generality, we first assume that A and B are known locations and C is a new one. \(Train_{AB}\), which equals \(Data_{A1} \bigcup Data_{B1}, \) is used to train an ELM model referred to as the initial model. \(Data_{C1}\) is used to adapt the initial model to a new one. \(Test_{AB}\), which equals \(Data_{A2}\bigcup Data_{B2}, \) is used to test both models' classification capability on the known locations, and \(Data_{C2}\) is used to test their classification capability on the new location. For each test sample in \(Data_{C1}\), if its classification confidence under the initial model is larger than the threshold η = 0.5, it is added into a new dataset, \(HConf_{C1}\). Then, a new recognition model is retrained using all samples in \(Train_{AB}\) and \(HConf_{C1}\).

The performances of the initial model and the new model on the known locations are shown in Table 9. We can see that, after model adaptation, the new model has almost the same classification capability as the initial model. The performances of the two models on the new location are shown in Table 10: after adaptation, the accuracy is improved by about 6%.

Table 9 Recognition results on the known locations before and after adaptation
Table 10 Recognition results on the new location before and after adaptation

When B and C are taken as known locations and A as the new location, the experimental results are shown in Tables 11 and 12. After adaptation, the accuracy is improved by about 7%.

Table 11 Recognition results on the known locations before and after adaptation
Table 12 Recognition results on the new location before and after adaptation

When A and C are taken as known locations and B as the new location, the experimental results are shown in Tables 13 and 14. After adaptation, the performance is improved by more than 12%.

Table 13 Recognition results on the known locations before and after adaptation
Table 14 Recognition results on the new location before and after adaptation

6 Conclusions and future work

In this paper, we study physical activity recognition on a mobile device with an embedded accelerometer. In contrast to previous work that assumes the phone is placed in a fixed position, this paper aims to recognize physical activities in daily life when the device locations and orientations vary.

We propose a fast and robust activity recognition model to deal with the problem of varying device locations and orientations. ELM, a fast-learning classification method, is used to retrain the recognition model online and adapt it to new locations. Experimental results demonstrate that model adaptation clearly improves the recognition accuracy without any prior knowledge of the new locations.

In the future, we will recruit 30 persons to collect data on more daily activities and will consider more locations where the device may be placed. To build a robust model, the participants should differ in physical conditions such as gender, age and height, and we will collect realistic data as the participants go about their normal activities. As the number of activities increases, the model will become more and more complex. We will then compare the performance of linear PCA and nonlinear PCA, such as kernel PCA; if nonlinear PCA outperforms linear PCA, we will select it as our dimensionality reduction method.