
1 Introduction

With the popularity of portable smart devices such as mobile phones, the security of these devices and the private data they hold has attracted users' close attention, and establishing a more comprehensive identification mechanism has become an urgent need. For example, when a mobile phone is lost, the owner hopes that the phone can automatically lock itself and protect its data.

Medical and psychological research has shown that everyone's gait has unique features. Gait recognition refers to identifying a person based on gait features. It is difficult to forge, can be performed without contact, and works implicitly, so gait has become a new biometric for person identification. At present, gait-based identification technology falls into two main categories: one is based on video data, and the other is based on wearable sensors.

This paper studies gait identification based on acceleration sensors. Specifically, we collected the signal of a mobile phone's built-in acceleration sensor and built a gait identification application on top of it, as a supplement to other person identification technologies for strengthening privacy protection on the phone. However, achieving this goal faces a difficult problem: a recognition model trained on specific data cannot adapt to changing real scenes. For example, the environment in which a phone is used changes constantly, including the wearer's movement, road conditions and the placement of the phone. Data collected in different environments differ greatly, so a model trained on one kind of data does not work effectively in another environment. Moreover, it is difficult to train a single high-precision recognition model that suits all environments from such heterogeneous data.

Therefore, the idea of this paper is to make the recognition model adapt quickly to a changing environment. For example, a user carrying a mobile phone is sometimes walking, sometimes going upstairs or downstairs, and sometimes running; the recognition model needs to quickly derive a model for the new environment from a general one. As another example, the phone may be placed at different parts of the body, sometimes in the hand, sometimes at the waist, and sometimes in a trouser pocket. Our method is to find the relationship between the model and the environmental variables; when the environmental variables change, the general model uses transfer learning to adapt rapidly.

These environmental variables should be easy to monitor. This paper mainly studies two scene elements, namely the wearer's movement pattern and the placement position of the sensors. We investigate which parts of the recognition model play the key role in person identification across changing scenes. The corresponding parts of the model are then quickly adjusted according to the changed environmental characteristics, so that it adapts to different scenarios even with only a small number of samples.

The remainder of this paper is organized as follows. Related work is introduced in the second section. A gait recognition model based on CNN is designed in the third section. In the fourth section, we explore which parts of the model extract and recognize the gait features affected by placement and activity patterns; based on these findings, optimized transfer strategies can be designed, and choosing the transfer direction based on the composition of motion can also improve the convergence speed of the model. The last section concludes the paper.

2 Related Works

Since the beginning of the 21st century, gait identification based on wearable sensors has been studied by many scholars. Gait recognition systems using acceleration sensors are mainly based on three classes of techniques: (1) similarity matching; (2) classification based on machine learning; (3) deep neural networks.

The similarity matching method is usually based on distance or correlation measurement. The collected gait data are compared with a stored gait template, and the user's identity is recognized according to the similarity score.

The similarity measures used for gait data include: histogram similarity, Manhattan distance [1], Euclidean distance [6], correlation coefficient [4], Tanimoto distance [3], Hamming distance [28] and cosine distance [2]. Some literature introduces more advanced measures, including: dynamic time warping (DTW), a derivative algorithm based on DTW [9], the Cyclic Rotation Metric (CRM) [26], cycle matching using the fusion of multiple similarity scores with the Min rule [32], a statistical Measure of Similarity (MOS) [37], fuzzy finite state machine rules [14], Jaccard distance [11], sparse-code collection (CSCC) [34], a weighted voting scheme [13], Geometric Template Matching [38], box approximation geometry [36], weighted Pearson correlation coefficients [33], curve aligning [16], the computational theory of perceptions [14], statistical significance analysis [37], Gaussian membership functions [14], etc.

Recognition algorithms based on machine learning treat gait identification as a classification process. They usually use gait features to train recognition models, and the trained models are then used for classification prediction. Most of the machine learning methods applied so far are supervised, and some advanced unsupervised algorithms have also been used for gait recognition, although they still need experimental verification. Machine learning methods used in gait recognition include: KNN (nearest neighbour) [30], support vector machines (SVM) [29], decision trees (DT) [35], random forests [25, 31], neural networks [18], hidden Markov model (HMM) classifiers [22, 27], Gaussian mixture model (GMM) classifiers [10], logistic regression [17] and Bayesian network classifiers [23]. Other advanced methods are also mentioned, such as Linear Discriminant Analysis (LDA) [15], the I-vector approach [7], principal component analysis (PCA) [12, 21], Fisher discriminant analysis [20], Gradient-Boosted Trees [39], sparse-code collection [34] and the Gaussian Mixture Model-Universal Background Model (GMM-UBM) [19].

Compared with the former two, research on gait identification based on deep neural networks is relatively scarce. The main studies are as follows. Dehzangi et al. designed a deep convolutional neural network (DCNN) that learned discriminative features from 2D expanded gait cycles and jointly optimized the identification model [40]. Gadaleta et al. presented a system named IDNet, the first system to exploit a deep learning approach as a universal feature extractor for gait recognition and to combine classification results from subsequent walking cycles in a multi-stage decision-making framework [41].

3 Proposed Approach

3.1 Structure of Network HIR-Net

In our convolutional network model (HIR-Net), a five-layer convolutional structure was adopted. Each layer contained a one-dimensional convolution followed by a ReLU activation. The first three layers were followed by max pooling, and the last layer was followed by batch normalization and a dropout operation. Finally, recognition was carried out through a fully connected layer. The convolution layers were used to extract deep spatial and temporal features. The structure of the model is shown in Fig. 1 and the model parameters are listed in Table 1.

Table 1. HIR-Net model hyper-parameters

Fig. 1. Structure of the HIR-Net architecture
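
To make the description concrete, the following is a minimal PyTorch sketch of a five-layer 1-D CNN matching the structure above. The channel widths, kernel sizes, pooling sizes and dropout rate are illustrative assumptions; the actual hyper-parameters are those listed in Table 1.

```python
# A minimal sketch of HIR-Net (assumed hyper-parameters; see Table 1 for the real ones).
import torch
import torch.nn as nn

class HIRNet(nn.Module):
    def __init__(self, in_channels=2, num_classes=34, window=128):
        super().__init__()
        self.features = nn.Sequential(
            # layers 1-3: 1-D convolution + ReLU + max pooling
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(2),
            # layer 4: 1-D convolution + ReLU
            nn.Conv1d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            # layer 5: 1-D convolution + ReLU + batch normalization + dropout
            nn.Conv1d(128, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.BatchNorm1d(128), nn.Dropout(0.5),
        )
        # final fully connected layer for identification
        self.classifier = nn.Linear(128 * (window // 8), num_classes)

    def forward(self, x):                      # x: (batch, 2, 128)
        return self.classifier(self.features(x).flatten(1))

model = HIRNet()
print(model(torch.randn(4, 2, 128)).shape)     # torch.Size([4, 34])
```

Each input sample is one sliding window of the two direction-independent acceleration channels described in Sect. 4.1, and the output is one score per subject.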

3.2 Transfer Strategies of HIR-Net Model

Generally, the first convolutional layers of a CNN model extract small-scale, general features, and the representations of successive layers build on one another. The features extracted by the deeper convolution layers therefore correspond to larger-scale patterns and are more closely related to the specific task.

The wearer's movement mode and the placement of the cellphone are the most common factors affecting gait characteristics. In this paper, we tried to find out which layers capture the features related to person identification under these varying conditions. The detailed procedures are explained in the experiment section.

4 Experiments

4.1 Data Acquisition and Preprocessing

(1) Testers and Environment

    The 34 testers were between 22 and 27 years old; there were 18 males and 16 females. The heights and weights of the males were 163 cm–181 cm and 52 kg–82 kg respectively, and those of the females were 150 cm–169 cm and 43 kg–58 kg respectively. The testers wore sports pants or jeans. The test environment was indoor. Four Android mobile phones (Huawei Honor V8, OPPO Find 5, Redmi Note 1, Meizu Pro 5) were used as the data-collecting devices. The data acquisition time was 1 min.

(2) Normalization and Filtering

    The original data obtained from the accelerometer were the x-y-z triaxial accelerations. The original data were normalized in units of g (one standard gravity). Formula 1 takes the acceleration of the X axis as an example: \(OAccx_i\) was the original data and \(Accx_i\) was the normalized data.

    $$\begin{aligned} Acc{x_i} = \frac{{OAcc{x_i}}}{{9.8\,m/{s^2}}}g \end{aligned}$$
    (1)
(3) Organization of Data

    In order to avoid differences in the X-Y-Z axis data caused by the orientation of the phones, two kinds of direction-independent features were adopted in this paper. They are defined in formula 2 and formula 3. \(Acc_i = \left( {Accx_i,Accy_i,Accz_i} \right) \) was the acceleration data of the i-th sample, including the X-Y-Z axis data. \(Accg_i = \left( {Accgx_i,Accgy_i,Accgz_i} \right) \) was the gravitational acceleration of the i-th sample, and \(Accu_i = \left( {Accux_i,Accuy_i,Accuz_i} \right) \) was the wearer's acceleration of the i-th sample, whose components \( Accux_i,Accuy_i,Accuz_i\) are defined in formula 4. The data input to the model contained two components per sample, namely \(< Acct_i,Accp_i > \).

    $$\begin{aligned} Acct_i = \sqrt{Accx_i^2 + Accy_i^2 + Accz_i^2} \end{aligned}$$
    (2)
    $$\begin{aligned} Accp_i = Accux_i \times Accgx_i + Accuy_i \times Accgy_i + Accuz_i \times Accgz_i \end{aligned}$$
    (3)
    $$\begin{aligned} Accux_i=Accx_i-Accgx_i,Accuy_i=Accy_i-Accgy_i,Accuz_i=Accz_i-Accgz_i \end{aligned}$$
    (4)
(4) Data Segmentation

    The collected data were long time series. For sample preparation, we used a sliding window to segment the time series. The window size was determined experimentally and set to 128 sample points (a preprocessing sketch covering formulas 1–4 and this segmentation is given below).
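
The following is a minimal NumPy sketch of the preprocessing steps in (2)–(4): normalization by g (formula 1), the two direction-independent features (formulas 2–4), and sliding-window segmentation. The source of the gravity component and the window stride are assumptions; the paper only fixes the window length at 128 samples.

```python
import numpy as np

G = 9.8  # m/s^2, one standard gravity

def preprocess(raw_acc, raw_grav, window=128, stride=64):
    """raw_acc, raw_grav: (N, 3) arrays of total and gravitational acceleration in m/s^2
    (the gravity component is assumed to come from the device's gravity sensor or a
    low-pass filter). Returns windows of shape (num_windows, 2, window)."""
    acc = np.asarray(raw_acc, float) / G           # formula 1: normalize to units of g
    grav = np.asarray(raw_grav, float) / G
    acc_u = acc - grav                             # formula 4: wearer's acceleration
    acc_t = np.linalg.norm(acc, axis=1)            # formula 2: magnitude of total acceleration
    acc_p = np.einsum('ij,ij->i', acc_u, grav)     # formula 3: projection onto gravity
    series = np.stack([acc_t, acc_p])              # (2, N): the <Acct, Accp> channels
    starts = range(0, series.shape[1] - window + 1, stride)   # sliding-window segmentation
    return np.stack([series[:, s:s + window] for s in starts])
```

Each resulting (2, 128) window is one input sample for the HIR-Net model.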

4.2 Test Method

Firstly, the data were divided into a source domain and a target domain. In each test, we first trained the model on the source domain and then transferred it to the target domain. In order to find the layers most strongly correlated with specific environmental factors in the network model, the parameters of some layers were fixed while the other layers could be fine-tuned. Then, according to the convergence rate and accuracy of the model, the layers with the strongest correlation to the specific environmental factors were identified.
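
As a concrete illustration, the following sketch (building on the HIRNet class in Sect. 3.1) shows one way to implement this procedure in PyTorch: the source-domain weights are loaded, the convolution layers before a chosen layer are frozen, and only the remaining layers are fine-tuned on the target domain. The weight file name and the layer indexing are assumptions for illustration.

```python
import torch

def prepare_finetune(model, source_weights, first_tuned_layer):
    """Load source-domain weights and freeze the conv layers before `first_tuned_layer`
    (1-based), so only that layer, the later layers and the classifier are fine-tuned."""
    model.load_state_dict(torch.load(source_weights))
    conv_layers = [m for m in model.features if isinstance(m, torch.nn.Conv1d)]
    for idx, conv in enumerate(conv_layers, start=1):
        for p in conv.parameters():
            p.requires_grad = idx >= first_tuned_layer
    return model

# e.g. fixing layers 1-3 (the "finetune_4+" setting used in the next subsection)
model = prepare_finetune(HIRNet(), "hirnet_source.pt", first_tuned_layer=4)
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
```

Training on the target domain then proceeds with this optimizer, and convergence speed and accuracy are compared across choices of the first tuned layer.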

4.3 Transferring on Movement Pattern

A person's movement pattern (such as walking or running) is random and individual, and in daily life movement patterns change all the time. The dataset in this section contains four movements, i.e. walking, running, going upstairs and going downstairs. We took the gait data collected during walking as the source domain dataset and the data collected during the other three movement patterns as the target domain dataset. The source domain dataset contained data collected from the 34 subjects while they walked, and the target domain dataset contained data collected from the 34 subjects while they ran, went upstairs and went downstairs. The collection environment was indoor and the sensors were placed at the waist. The comparison of experimental elements between the source domain and the target domain is listed in Table 2. The goal of the experiment was to find the layers with the strongest correlation to person identification under changing movement patterns.

Firstly, the model was trained on the source domain dataset. Then the model was transferred to the target domain dataset. In Fig. 2, Fig. 3 and Fig. 4, the notation \(finetune_-X+\) means that the parameters of all layers were imported from the source domain model and the parameters of the layers before layer X were fixed during target training, i.e. only the parameters of layer X and the layers after it were tuned on the target domain. For example, \(finetune_-4+\) indicates that the parameters of layer 1, layer 2 and layer 3 were imported from the source domain model and fixed during training on the target domain, while the parameters of the later layers were tuned. The notation \(base_-model\) means retraining the model on the target domain dataset from randomly initialized parameters.

The following can be seen from Fig. 2, Fig. 3 and Fig. 4. Firstly, when the source domain model was directly used for prediction on the target dataset, the prediction performance was very poor, as shown by the curve labelled \(finetune_-6+\).

Secondly, the method based on transfer learning adapted better to different movement patterns and converged faster. As can be seen from Fig. 2, Fig. 3 and Fig. 4, the performance of the curve labelled \(base_-model\) was poor.

Thirdly, \(finetune_-1+\), \(finetune_-2+\) and \(finetune_-3+\) had very similar performance. Therefore, we believe that the small-scale features extracted by the first two layers have little relationship with identity recognition under different motion modes; the features related to identity recognition are mainly further combinations of these earlier features. The third layer and the layers after it capture the identity features under different motion modes.

Therefore, when the model needs to switch between different motion modes, the transfer strategy of fixing the first two layers can be adopted. This fine-tuning strategy does not reduce the recognition rate and shortens the convergence time.

Table 2. Data set of transferring on movement patterns

Fig. 2. Fine-tuning on running data

4.4 Transferring on Placement of Mobile Phone

A mobile phone is often placed at different positions on the body. For example, it is sometimes held in the hand, sometimes put on the waist, and sometimes put in a trouser pocket. Therefore, it is very meaningful to study how the model can quickly adapt to data collected from different placements. In this paper, we used four placement positions (waist, upper arm, trouser pocket and hand). The source domain dataset contains data collected from the phone at the waist, and the target domain dataset contains data collected from phones placed at the other three locations (upper arm, trouser pocket and hand). Both datasets contain data collected from the 34 subjects while walking. The comparison of experimental elements between the source domain and the target domain is listed in Table 3.

Table 3. Data set of transferring on placement position

From Fig. 5, Fig. 6 and Fig. 7, we obtained similar results. Firstly, when the source domain model was directly used for prediction on the target dataset, the prediction performance was very poor, as shown by the curve labelled \(finetune_-6+\).

Secondly, the method based on transfer learning adapted better to different placement positions. As can be seen from Fig. 5, Fig. 6 and Fig. 7, the performance of the curve labelled \(base_-model\) was not the best.

Thirdly, \(finetune_-1+\), \(finetune_-2+\) and \(finetune_-3+\) had very similar performance. Therefore, we believe that the small-scale features extracted by the first two convolution layers have little relationship with identity recognition: when the parameters of the first two layers were fixed, the transferred model achieved similar performance. The features related to identity recognition are mainly further combinations of these earlier features, and the third layer and the layers after it capture the identity features. As can be seen from the figures, fixing the first two layers gave good performance in most cases.

Fig. 3. Fine-tuning on upstairs data

Therefore, when the model needs to adapt to data collected from different positions, the transfer strategy of fixing the first two layers can be adopted. This fine-tuning strategy does not reduce the recognition rate and shortens the convergence time.

4.5 Transferring Based on Motion Composition

Firstly, if acceleration data were directly input into a model trained with data from a different position, the recognition rate was very low, as can be seen from Table 4.

Secondly, from the perspective of human physiological structure and behaviour, the movements of different parts of the human body are related to each other. For example, hand motion can be regarded as a combination of waist motion and arm motion relative to the waist. Therefore, we believed that the data collected from the hand contained the characteristics of the waist data. We tested this hypothesis in two ways.

First, we input the waist data into the hand model and tested its accuracy; at the same time, we input the hand data into the waist model and tested its accuracy, and then compared the two accuracy rates (a cross-evaluation sketch is given below). Because hand motion includes waist motion, the recognition rate obtained by inputting the waist data into the hand model was higher than that obtained by inputting the hand data into the waist model.
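
The following is a minimal sketch of this cross-placement check in PyTorch. The `hand_model`, `waist_model` and the two DataLoader objects are assumed to be the trained models and datasets described above.

```python
import torch

@torch.no_grad()
def accuracy(model, loader, device="cpu"):
    """Fraction of correctly identified windows when `model` is evaluated on `loader`."""
    model.eval().to(device)
    correct = total = 0
    for x, y in loader:
        pred = model(x.to(device)).argmax(dim=1)
        correct += (pred == y.to(device)).sum().item()
        total += y.numel()
    return correct / total

# Compare the two directions, e.g.:
# acc_wh = accuracy(hand_model, waist_loader)   # waist data into the hand model
# acc_hw = accuracy(waist_model, hand_loader)   # hand data into the waist model
```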

Fig. 4. Fine-tuning on downstairs data

Table 4. Data of different body parts directly input into the trained model

Fig. 5. Fine-tuning on hand data

Second, we used the waist data to fine-tune the hand model (i.e. the hand model was transferred to the waist) and tested the accuracy; at the same time, we used the hand data to fine-tune the waist model (i.e. the waist model was transferred to the hand) and tested the accuracy. We then compared the two accuracy rates. The results are shown in Fig. 8 and Fig. 9. As can be seen from the figures, transferring from the hand model to the waist model worked better than transferring from the waist model to the hand model. Because the hand data contain the characteristic information of the waist data, convergence was faster and accuracy was higher; conversely, the waist data do not contain some details of the hand data and lack the corresponding features, so the reverse transfer performed worse.

Fig. 6. Fine-tuning on upper-arm data

Fig. 7. Fine-tuning on pocket data

Fig. 8. Transferring from hand to waist

Fig. 9. Transferring from waist to hand

5 Conclusion

The automatic feature extraction ability of convolutional neural networks makes them well suited to extracting hidden personal features from gait data. In order to make the recognition model adapt quickly to a variety of datasets, this paper explored rapid adaptation based on fine-tuning. We explored which parts of the model extract and identify the gait features related to placement and activity patterns, and based on these findings, optimized fine-tuning strategies can be designed. At the same time, based on the composition of motion, choosing the transfer direction appropriately can also improve the convergence speed and accuracy of the model.