1 Introduction

Surface EMG (sEMG) reflects the activation state of the neuromuscular system, so neural information can be inferred from its analysis. Because the signal can be acquired noninvasively and lends itself to bionic applications, it is widely used in prosthetic control, clinical diagnosis, motion detection, and neurological rehabilitation [1, 2]. Gesture recognition based on surface EMG signals is an important research topic in the practical application of sEMG, and reliable, effective gesture recognition underpins the development of good human–machine interfaces.

There is a close relationship between the sEMG signal and limb movement: muscle contraction or relaxation produces different limb movements and correspondingly different bioelectrical signals [3, 4]. By extracting features from the sEMG signal and analyzing them, the activation state of the muscle can be inferred, and the action pattern produced by the muscle contraction can thus be identified [5,6,7,8].

Assistive devices for neurological rehabilitation (for example, active prostheses) are operated through a human–machine interface. To capture neuromuscular information, such interfaces can connect to the brain, peripheral nerves, or muscle regions [9,10,11,12,13]; among these options, the muscle interface is currently the only viable method for controlling external devices in commercial and clinical systems [14,15,16]. Because acquisition is noninvasive, relatively simple to apply, and rich in neural information, surface EMG signals are widely used in human–machine interfaces that control prostheses in clinical and commercial settings [17, 18].

Studies have shown that human limb movement is a joint action of muscles and bones controlled by the nervous system, so different people have different habitual movement patterns [19, 20]; even the same person moves differently under different external conditions and different physical and psychological states. This places high demands on EMG signal processing and feature extraction.

In addition, because the EMG signal is complex, the demands placed on the recognition system are high, and a slow response of the pattern recognition system adversely affects real-time prosthetic control and other applications [21,22,23].

In view of the above problems, and building on previous research, this paper extracts key information about human body motion to identify specific action patterns, taking specific gestures as samples and extracting features from the acquired surface EMG. PCA is used to reduce the feature dimension and eliminate redundant information, and a GRNN neural network classifier is constructed to achieve accurate pattern recognition, which is of great significance for prosthetic control, clinical medicine, brain health, human–computer interaction, and other fields.

2 Feature extraction

After the EMG signal is segmented into windows, features can be extracted from the signal intercepted in each window. The properties of the extracted features affect the performance of the gesture recognition system; for example, the number and type of features affect its real-time accuracy. In signal analysis, features fall mainly into four classes: time-domain features, frequency-domain features, time–frequency features, and nonlinear measures. According to previous studies, two time-domain features alone, root mean square (RMS) and waveform length (WL), can already yield good classification results [24,25,26]. In addition, the median amplitude spectrum (MAS) and sample entropy (SampEn) are introduced [27,28,29,30,31]. Therefore, this study uses these four parameters as the extracted features.
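As a rough illustration of the windowing step (the window length, step, and sampling rate below are illustrative assumptions, not values reported in this paper), the segmentation could be sketched in Python as follows:

```python
import numpy as np

def sliding_windows(signal, win_len=256, step=64):
    """Split a 1-D EMG channel into overlapping analysis windows.

    win_len and step are illustrative values; this section does not fix them.
    """
    n = (len(signal) - win_len) // step + 1
    return np.stack([signal[i * step: i * step + win_len] for i in range(n)])

# Example: 2 s of a 16-channel recording at an assumed 1 kHz sampling rate.
emg = np.random.randn(16, 2000)
windows_ch0 = sliding_windows(emg[0])   # shape: (n_windows, 256)
```

Feature extraction is then applied window by window to each channel.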

2.1 Acquisition of signal sets

This paper uses an electromyograph with advanced surface EMG amplification and 16-channel, high spatio-temporal-resolution sampling. As shown in Fig. 1, an electrode sleeve is used to collect the arm muscle signals quickly and easily. The sleeve holds 18 dry electrode sheets: electrodes 1–16 are sampling electrodes, electrode 17 is the reference electrode, and electrode 18 is the bias electrode.

Fig. 1 Electrode sleeve

As shown in Fig. 2 and following the research results of Xiong Caihua [26], Fang Yinfeng [32], and others [33,34,35,36], nine hand movements were planned for sEMG data collection. The nine gestures cover whole-hand motion, including palm closing (SH) and palm opening (SK); wrist motion, including wrist flexion (NQ) and extension (WQ); and finger motion, including thumb pressing against the index finger (MS), middle finger (MZ), ring finger (MW), and little finger (MX); in addition, a rest action (RE) is included. In each experiment, 10 trials were performed for each action: rest for 5 s, hold the action for 5 s, repeated 10 times, with data collected on 3 consecutive days using the same protocol each day. This procedure captures the temporal and spatial differences in the myoelectric signals of the same individual.

Fig. 2 Static gesture

2.2 Time-domain characteristics

  1.

    Root mean square (RMS) is a measure of the amplitude of the EMG signal and can be expressed as:

    $${\text{RMS}} = \sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {x_{i}^{2} } }$$
    (1)

    where N is the length of the window and \(x_{i}\) is the ith sample point. Similar to RMS, the integrated absolute value and the mean absolute value (MAV) have been shown to give equivalent performance in hand-motion identification; therefore, this paper chooses RMS as the representative amplitude feature.

  2.

    Waveform length (WL) is the cumulative length of the EMG waveform within the analysis window. It is defined as:

    $${\text{WL}} = \sum\limits_{i = 1}^{N - 1} {\left| {x_{i + 1} - x_{i} } \right|}$$
    (2)

    where N is the length of the window and \(x_{i}\) is the ith sample point; a short code sketch of both time-domain features is given below.
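A direct NumPy translation of Eqs. (1) and (2), applied to a single analysis window x, might look like this (a minimal sketch, not the authors' implementation):

```python
import numpy as np

def rms(x):
    """Root mean square of one analysis window, Eq. (1)."""
    return np.sqrt(np.mean(np.square(x)))

def waveform_length(x):
    """Waveform length of one analysis window, Eq. (2)."""
    return np.sum(np.abs(np.diff(x)))
```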

2.3 Frequency-domain characteristics

The median amplitude spectrum (MAS) reflects the relationship between the amplitude and the frequency of a wave or wave train. The MAS feature is insensitive to extreme values, so it reflects the average frequency-domain characteristics of the signal. The median amplitude spectrum is defined as follows:

$${\text{MAS}} = \frac{1}{2} \times \sum\limits_{i = 1}^{N} {\left| {\frac{{{\text{FFT}}(S)}}{L}} \right|}$$
(3)

where S is a segment of the original signal, N is the length of the window, L is the length of the signal, and FFT is the fast Fourier transform.
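A literal implementation of Eq. (3) as printed, assuming the window length N and the segment length L coincide, could be sketched as:

```python
import numpy as np

def median_amplitude_spectrum(s):
    """Literal reading of Eq. (3): half the sum of the normalized
    FFT magnitude of the segment s (window and segment lengths are
    taken to coincide in this sketch)."""
    L = len(s)
    return 0.5 * np.sum(np.abs(np.fft.fft(s)) / L)
```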

2.4 Sample entropy

Sample entropy (SampEn) is a measure of the complexity of a time series. It is written as SampEn(m, r, N), where N is the length of the series, r is the similarity tolerance, and the embedding dimensions are m and \(m + 1\). Sample entropy reduces the bias of approximate entropy and better reflects the random component of the signal [4, 37, 38].

When N is finite, the equation is expressed as:

$${\text{SampEn }}(m,r,N) = - \ln \tfrac{{B^{m + 1} (r)}}{{B^{m} (r)}}$$
(4)

Among them, \(B_{i}^{m} (r) = L/(N - m - 1)\), with the distance between two templates defined as

$$d_{x(i)x(j)} = \max_{k} \left( {\left| {x_{(i + k)} - x_{(j + k)} } \right|} \right)\quad \, k = 0, \ldots ,m - 1$$

For a given threshold r and for each i, the number of occasions on which \(d_{x(i)x(j)}\) is less than r is counted as the number of template matches, denoted L, and the ratio of L to the total number of distances \(d_{x(i)x(j)}\) is \(B_{i}^{m} (r)\); \(B^{m} (r)\) in Eq. (4) is obtained by averaging \(B_{i}^{m} (r)\) over all i.
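For reference, a compact (non-optimized) sample-entropy sketch following the definitions above is given below; the default tolerance r = 0.2·std(x) is a common choice and an assumption of this sketch, not a value stated in the text.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy SampEn(m, r, N) of a 1-D window x."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    if r is None:
        r = 0.2 * np.std(x)   # common default (assumption)

    def count_matches(dim):
        # Build all templates of length `dim` and count template pairs
        # whose Chebyshev distance is below the tolerance r.
        templates = np.array([x[i:i + dim] for i in range(N - dim)])
        count = 0
        for i in range(len(templates)):
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += np.sum(d < r)
        return count

    B = count_matches(m)        # matches at dimension m
    A = count_matches(m + 1)    # matches at dimension m + 1
    return -np.log(A / B) if A > 0 and B > 0 else np.inf

print(sample_entropy(np.random.randn(500)))
```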

3 Feature dimensionality reduction based on PCA

Training a pattern classifier for EMG signals depends on a large training database, but such voluminous data slow down model training, so highly efficient features or channel data must be selected from it. This selection, however, is highly uncertain across different gestures and acquisition methods: different combinations are required, and if a large portion of the information is lost when the samples are arranged, coupling occurs and the classification ability of the classifier drops sharply [39, 40]. It is therefore necessary to improve the efficiency of data utilization through dimensionality reduction and to avoid overloading the classifier [41, 42]. In addition, dimensionality reduction can eliminate redundant information and prevent non-essential information from interfering with the classifier's judgment. Therefore, feature dimensionality reduction is needed for large-scale pattern recognition [43, 44].

For reducing the dimension of high-dimensional signals, one strategy assumes that the data can be represented linearly in a low-dimensional space; the main representative algorithm is principal component analysis (PCA) [45]. The algorithm has a complete theoretical foundation and has shown good performance in practical applications.

3.1 PCA principle

The basic idea of principal component analysis (PCA) is to replace the original variables with a small number of new variables that are linear combinations of the original ones [46,47,48,49]. The new variables should reflect the information of the original variables to the maximum extent while being mutually orthogonal, which eliminates the overlapping information in the original variables.

The standardized input variable matrix of the sample is as follows:

$$X = \left[ \begin{array}{cccc} x_{11} & x_{12} & \cdots & x_{1k} \\ x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nk} \end{array} \right]$$
(5)

It is required to construct a variable \(P_{1}\) to meet the following conditions:

$$P_{1} = Xt_{1} , \, \left\| {t_{1} } \right\| = 1$$
(6)

At the same time, \(P_{1}\) should carry as much of the information contained in the standardized input variable matrix \(X_{n \times k}\) as possible.

From the viewpoint of probability and statistics, the greater the variance of a variable, the more information it carries. Therefore, the problem above can be transformed into maximizing the variance of the constructed variable. The variance of \(P_{1}\) is

$${\text{Var}}(P_{1} ) = \frac{1}{n}\left\| {P_{1} } \right\|^{2} = \frac{1}{n}t_{1}^{\prime } X^{\prime } Xt_{1} = t_{1}^{\prime } Vt_{1} ,\quad V = \frac{1}{n}X^{\prime } X$$
(7)

Constructing a Lagrangian function

$$L = t_{1}^{\prime } Vt_{1} - \lambda_{1} (t_{1}^{\prime } t_{1} - 1)$$
(8)

where \(\lambda_{1}\) is the Lagrange multiplier. Setting the partial derivatives of \(L\) with respect to \(t_{1}\) and \(\lambda_{1}\) to zero gives:

$$\left\{ \begin{aligned} \frac{\partial L}{{\partial t_{1} }} &= 2Vt_{1} - 2\lambda_{1} t_{1} = 0 \\ \frac{\partial L}{{\partial \lambda_{1} }} &= - (t_{1}^{\prime } t_{1} - 1) = 0 \\ \end{aligned} \right.\quad \Rightarrow \quad Vt_{1} = \lambda_{1} t_{1}$$
(9)

It can be seen that \(t_{1}\) is a normalized eigenvector of \(V\) and \(\lambda_{1}\) is its corresponding eigenvalue:

$${\text{Var}}(P_{1} ) = t_{1} 'Vt_{1} = t_{1} '\lambda_{1} t_{1} = t_{1} 't_{1} \lambda_{1} = \lambda_{1}$$
(10)

The required \(t_{1}\) is therefore the normalized eigenvector corresponding to the maximum eigenvalue \(\lambda_{1}\) of the matrix \(V\), and the corresponding constructed variable \(P_{1} = Xt_{1}\) is called the first principal component.

By analogy, the \(m\) th principal component of \(V\) can be found as \(P_{m} = Xt_{m}\).

The total information carried by the first \(m\) principal components is:

$$\sum\limits_{i = 1}^{m} {{\text{Var}}(P_{i} )} = \sum\limits_{i = 1}^{m} {\lambda_{i} }$$
(11)

The dimension of the data transformed into the new coordinate space is the same as that of the original space. However, the variance of the data in the new coordinates is concentrated in the first few dimensions, so dimensionality reduction can be achieved by retaining only the first few principal components.
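Following the derivation above, the principal directions are the eigenvectors of \(V = X^{\prime}X/n\); a minimal NumPy sketch of this step (illustrative helper functions, not the authors' code) is:

```python
import numpy as np

def pca_fit(X):
    """PCA on a standardized sample matrix X (n samples x k features).

    Returns the eigenvalues of V = X'X / n (the component variances,
    Eq. (10)) and the corresponding unit eigenvectors t_i, sorted by
    decreasing eigenvalue.
    """
    n = X.shape[0]
    V = (X.T @ X) / n
    eigvals, eigvecs = np.linalg.eigh(V)    # eigh: V is symmetric
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

def pca_transform(X, eigvecs, m):
    """Project X onto the first m principal components, P_i = X t_i."""
    return X @ eigvecs[:, :m]
```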

3.2 Pretreatment

The feature data need to be preprocessed before dimensionality reduction in order to unify the orders of magnitude of the feature parameters, which can differ greatly. As shown in Fig. 3a, the WL feature has the largest magnitude, so the other features are masked during gesture recognition. Because the classifier is sensitive to features with large values and tends to ignore the others, the features must be normalized.

Fig. 3 Comparison of single-channel feature values before and after normalization

The feature values are mapped into the (0, 1) interval by min–max normalization, as shown in Fig. 3b. The features are then of the same order of magnitude and can be combined without masking one another, which also simplifies the computation.
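A per-feature min–max scaling of a feature matrix F (rows are windows, columns are features) could be written as the following sketch; the small eps guarding against constant columns is an implementation detail not discussed in the text:

```python
import numpy as np

def min_max_normalize(F, eps=1e-12):
    """Scale each column (feature) of F into the [0, 1] range."""
    fmin = F.min(axis=0)
    fmax = F.max(axis=0)
    return (F - fmin) / (fmax - fmin + eps)
```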

3.3 Way of dimensionality reduction

This paper integrates the four features into one feature set containing root mean square (RMS), waveform length (WL), sample entropy (SampEn), and median amplitude spectrum (MAS), for a total of 64 dimensions (16 channels × 4 features).

In the experiment, the 64-dimensional feature set is analyzed by principal component analysis, yielding a new 64-dimensional feature set whose components are ordered by variance. The first k dimensions of the new feature set are then taken as the reduced feature set, where k is determined by requiring that the cumulative principal component contribution rate exceed 95%.

As shown in Fig. 4, the cumulative contribution rate of the first three principal components is 98.3%. Therefore, \(k\) is set to 3 in this experiment, and the first three principal components are taken as the reduced feature vector.

Fig. 4 Principal component contribution rate
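Given the eigenvalues returned by the PCA sketch above, the selection of k from the cumulative contribution rate can be expressed as follows (the 95% threshold is the criterion stated in the text):

```python
import numpy as np

def choose_k(eigvals, threshold=0.95):
    """Smallest k whose cumulative explained-variance ratio exceeds
    the threshold."""
    cumulative = np.cumsum(eigvals / eigvals.sum())
    return int(np.searchsorted(cumulative, threshold) + 1)
```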

Before classification, the first two principal components of the training set and the test set are compared, as shown in Fig. 5.

Fig. 5 Principal component analysis of training set and test set

It can be seen that the first and second principal components of the test set and the training set substantially coincide, and the classification effect is best when the principal components of the training set are included in the test set.

Figure 6 shows the feature separation before and after PCA processing; in accordance with the value of k, the first three features are also plotted before dimension reduction. As shown in Fig. 6a, the gestures clearly cannot be distinguished well. In Fig. 6b, the first three principal components are used to examine the feature separation: the scatter clusters of the different gestures are well separated, and it is clear that they can be effectively distinguished.

Fig. 6 Comparison of feature separation before and after PCA dimension reduction

4 GRNN neural network classifier

4.1 GRNN network principle

The generalized regression neural network (GRNN) proposed by Specht [50] is a variant of the radial basis neural network and is commonly used for function approximation. As a generalization of the radial basis function (RBF) and probabilistic neural network (PNN), the GRNN does not require an iterative training process, and its structure exhibits a high degree of parallelism. The network can be used for predictive modeling, mapping, and interpolation, or as a controller [51,52,53,54].

The architecture of the GRNN is shown in Fig. 7. The input layer receives a vector X containing the M input variables of the network. The number of neurons in the first layer corresponds to the number of training patterns stored in the weight matrix \(w_{1}\).

Fig. 7 Schematic diagram of the GRNN network

When a new vector is presented to the network, the distance between the input vector and each stored weight vector is calculated in the \({\text{Dist}}\) block, typically using the Euclidean distance. The output of the \({\text{Dist}}\) block is multiplied element-wise by the polarization factor \(b\), and the result is passed through the radial basis function to produce the output \(a_{1}\).

The second layer sums the outputs \(a_{1}\) according to the number of required outputs. The weight matrix \(\omega_{1}\) of this layer stores the target vectors containing the desired outputs. The output vector \(m_{2}\) is obtained by multiplying each element of \(a_{1}\) by the corresponding vector stored in \(\omega_{1}\), summing, and normalizing by the sum of the elements of \(a_{1}\). Thus, when an input vector \(X\) is close to a training vector \(x_{i}\) stored in the first layer, \(X\) produces a first-layer output \(a_{1i}\) close to 1, and the output of the second layer is correspondingly close to the target vector \(m_{2i}\) stored for \(x_{i}\).

The GRNN has the advantages of a simple structure, fast training, and few parameters to tune. In addition, compared with feedforward networks, its computation converges globally. Therefore, in this paper the GRNN is used as the classifier for supervised gesture recognition.
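A minimal NumPy sketch of a GRNN used as a classifier, following the two-layer description above, is shown below; the spread parameter sigma (which plays the role of the polarization factor) is a free parameter whose value is assumed here, not reported in this section.

```python
import numpy as np

class GRNNClassifier:
    """Minimal GRNN: stores all training patterns, then forms a
    kernel-weighted, normalized sum of one-hot target vectors."""

    def __init__(self, sigma=0.1):
        self.sigma = sigma  # spread of the radial basis function (assumed value)

    def fit(self, X, y):
        self.X = np.asarray(X, dtype=float)
        self.classes, idx = np.unique(y, return_inverse=True)
        self.T = np.eye(len(self.classes))[idx]   # one-hot target matrix
        return self

    def predict(self, Xq):
        Xq = np.asarray(Xq, dtype=float)
        # Euclidean distances between queries and stored patterns (Dist block);
        # the polarization factor b from the text is folded into sigma here.
        d2 = ((Xq[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=2)
        a1 = np.exp(-d2 / (2.0 * self.sigma ** 2))              # radial basis layer
        scores = a1 @ self.T / a1.sum(axis=1, keepdims=True)    # normalized second layer
        return self.classes[np.argmax(scores, axis=1)]
```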

4.2 Gesture recognition experiment results and analysis

The data are divided into a training set and a test set; the training set is used to train the classifier parameters, and the test set is used to evaluate the trained classifier.

  1.

    Identification results before dimensionality reduction

The feature values of each gesture are randomly divided into two groups: a training set and a test set. The training set contains 250 sets of data per gesture, and the test set contains 60 sets of data per gesture.

The sleeve collects a total of 16 channels, which are not all used simultaneously as input for pattern recognition. Instead, the 16 channels are arranged in a graded manner to construct multiple different input combinations, and 136 classifiers are obtained from this arrangement, as shown in Fig. 8.
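The exact composition of the 136 input combinations is given by Fig. 8; one enumeration consistent with that count (16 + 15 + ⋯ + 1 = 136) is the set of contiguous channel ranges, which is the assumption made in the short sketch below.

```python
# Enumerate 136 channel combinations as contiguous ranges of channels 1-16.
# This is an assumption consistent with the count, not a description read
# from the original figure.
channel_sets = [list(range(start, stop + 1))
                for start in range(1, 17)
                for stop in range(start, 17)]
assert len(channel_sets) == 136
```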

Fig. 8 Input channel arrangement

Figure 9 compares the accuracy of the 136 classifiers for each of the four features. The abscissa represents the label of the 136 models, and the ordinate represents the accuracy of each model.

Fig. 9 Accuracy of recognition results for four features

From the accuracy distributions of the four graphs in Fig. 9, it can be seen that, in general, the more input channels a classifier uses, the better the pattern recognition, and the fewer the channels, the worse the result. However, in some cases the channels interfere with one another, and too many channels cause the recognition rate to decrease.

The accuracy and calculation time of the classifiers are evaluated separately. The classification results for the four features, together with their identification accuracy and average operation time, are recorded in Table 1.

Table 1 Comparison table of feature classification results
  2.

    Experimental results after dimensionality reduction

After dimensionality reduction, the first three principal components are used for pattern recognition with the GRNN classifier; the accuracy is 95.1% and the average operation time is 0.19 s, as shown in Figs. 10 and 11. The gesture recognition accuracy is higher than the average success rate obtained with the individual features above.

Fig. 10 Identification results after dimensionality reduction

Fig. 11 Average operation time after dimensionality reduction
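Putting the pieces above together, the accuracy and average prediction time on the reduced feature set could be measured as in the following sketch, which reuses the hypothetical helpers defined earlier (min_max scaling, pca_fit, choose_k, pca_transform, GRNNClassifier) and uses placeholder data arrays sized according to the training/test split described above.

```python
import time
import numpy as np

# Placeholder feature matrices: 9 gestures x 250 training windows and
# 9 x 60 test windows, each 64-dimensional (16 channels x 4 features).
F_train = np.random.rand(2250, 64); y_train = np.random.randint(0, 9, 2250)
F_test = np.random.rand(540, 64);   y_test = np.random.randint(0, 9, 540)

# Min-max normalization, reusing the training-set minima/maxima for the test set.
fmin, fmax = F_train.min(axis=0), F_train.max(axis=0)
F_train_n = (F_train - fmin) / (fmax - fmin + 1e-12)
F_test_n = (F_test - fmin) / (fmax - fmin + 1e-12)

# PCA dimensionality reduction with the 95% contribution-rate criterion.
eigvals, eigvecs = pca_fit(F_train_n)
k = choose_k(eigvals, threshold=0.95)          # k = 3 in this study
Z_train = pca_transform(F_train_n, eigvecs, k)
Z_test = pca_transform(F_test_n, eigvecs, k)

# GRNN classification, accuracy, and mean prediction time per test sample.
clf = GRNNClassifier(sigma=0.1).fit(Z_train, y_train)
start = time.perf_counter()
pred = clf.predict(Z_test)
elapsed = time.perf_counter() - start
print("accuracy:", np.mean(pred == y_test))
print("mean prediction time per sample:", elapsed / len(Z_test))
```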

Principal component analysis is a simple and efficient unsupervised method for feature dimensionality reduction. After dimensionality reduction, the feature size and the redundant information are reduced, and both the accuracy of the pattern recognition and the stability of the classifier are improved. At the same time, dimensionality reduction simplifies the structure of the classifier and enhances its real-time performance. Compared with other recognition systems, such as the BP neural network [2, 22] and D–S evidence theory [3], the method in this paper greatly improves recognition efficiency and accuracy.

5 Conclusion

This paper studies the recognition of static gestures based on EMG signals. The main characteristics of surface EMG signals, which are non-stationary, nonlinear, and non-deterministic, make effective feature extraction and pattern recognition difficult. In addition, choosing a fast, simple, and effective classification scheme is a problem that must be addressed before pattern recognition. After extracting RMS, WL, MAS, and SampEn, this paper reduces the feature dimension and eliminates redundant information with the PCA algorithm, and constructs a generalized regression neural network (GRNN) classifier to achieve efficient and accurate static gesture recognition. The resulting gesture recognition accuracy reaches 95.1%, and the calculation time is only about 0.19 s, which makes real-time processing feasible.

This paper has studied the feature extraction of static gestures based on EMG signals. Although some theoretical and experimental results have been obtained, many problems remain open. The selection of the original features and the specific choice and combination of features need further study; in addition to transformation-based dimensionality reduction, feature selection methods could also be used to reduce the dimension, but the specific algorithm remains to be determined.