1 Introduction

With the introduction of smart medical technology, applications in the medical and health field have developed rapidly [1, 2]. Hand function damage is mainly caused by high-intensity hand movements, accidents, or strokes, and requires medical staff to regularly carry out rehabilitation training on patients' hands, which consumes manpower, material, and financial resources. As a rehabilitation technology that enables patients to train by themselves, the hand medical monitoring system is gradually receiving social attention, and safe, efficient real-time smart medical monitoring systems are gradually being realized [2]. Medical monitoring technology based on electromyography (EMG) signals relies on harmonious interaction between people and computers. The process of communication between people and computers through various kinds of information is called human-computer interaction (HCI). HCI requires the computer to understand the way humans communicate, that is, to understand the information expressed by human body language through channels such as sight and hearing [3,4,5]. The hand, as the most flexible limb of the human body, can express rich information through gestures. Therefore, gesture recognition is one of the important ways to realize HCI. Gesture recognition is essentially a pattern recognition problem, and high-accuracy, real-time gesture recognition is one of the goals pursued by pattern recognition technology [6,7,8].

For EMG-based gesture recognition, feature selection and classifier design are the two main areas of research. At present, the well-known feature selection methods are mainly divided into three types: Filter [9], which scores each feature according to divergence or correlation and selects features by setting a threshold or the number of features to be selected; Wrapper [10], which selects or excludes several features at a time according to an objective function; and Embedded [11], which first trains some machine learning algorithm or model to obtain the weight coefficient of each feature and then selects features in descending order of these coefficients. Feature selection mainly aims to reduce the number of features, strengthen the generalization ability of the model, and improve the understanding of the relationship between features and feature values [12]. The feature selection method proposed in this paper belongs to the first type, Filter: redundant features are evaluated according to the average recognition rate of the feature subset, and the EMG feature set is refined accordingly. Classifier design requires training a classifier model through a machine learning algorithm to perform gesture classification. At present, the mainstream classification methods are deep learning (DL)–based EMG gesture recognition and machine learning (ML)–based EMG gesture recognition [12,13,14,15].

The core idea of the DL-based EMG gesture recognition method is to treat gesture recognition based on surface EMG as an image classification problem. After data preprocessing and sliding window segmentation, the features extracted from the raw EMG signal are converted into images, and the feature images are input into a deep neural network for gesture recognition [16, 17]. Although deep learning networks achieve a certain recognition accuracy, they need to learn features at different levels of abstraction from a large number of input samples [16, 18, 19]. In other words, the network requires a large number of samples and a long training time, which makes it difficult to meet the real-time requirements of pattern recognition. Therefore, ML-based gesture recognition still has great research value. ML-based EMG gesture recognition methods usually preprocess the raw EMG signal by amplitude amplification, filtering, and noise reduction, then segment and sample the EMG signal through a Hamming window, and extract one or more EMG features [20, 21]. After the necessary dimensionality reduction, the feature set composed of multiple EMG features is divided into a training set and a test set and input into a classifier model trained by an ML method for hand motion recognition. Research on this method mainly focuses on three aspects: preprocessing of the raw signal, feature extraction, and classifier design [22, 23]. Raw signal preprocessing usually filters the collected signal or augments the data, for example through Butterworth filter design, wavelet denoising, or adding Gaussian noise [20]. Feature extraction operates on the preprocessed EMG signal, extracting useful handcrafted feature information from the time domain (TD), frequency domain (FD), or time-frequency domain (TFD) of the signal to form a single- or multi-feature vector set for classifier training and testing [22]. Finally, the well-known pattern recognition methods based on machine learning include K-nearest neighbor (KNN) [24], artificial neural network (ANN) [25], support vector machine (SVM) [24, 25], and linear discriminant analysis (LDA) [24].

But for the research in the above three fields, how to choose the optimal signal preprocessing method, the optimal EMG feature vector set, and the most suitable machine learning network to build a set of optimal hand motion prediction system based on machine learning is still a problem. Therefore, this paper uses the first subset of NinaPro, a large public data set constructed by Atzori team [26, 27], as the data source, and preprocesses the raw data with a 1-Hz Butterworth low-pass filter according to its evaluation protocol. And then, this paper extracts 11 kinds of TD and 5 kinds of FD EMG features from the processed data, evaluates the optimal number, and selects a combination of features in EMG feature set combining the arrangement and combination of features with gesture recognition rate obtained by support vector machine. Finally, this paper compares four different machine learning recognition models with the optimal EMG feature set: KNN, ANN, SVM, and LDA to construct the optimal hand motion prediction framework based on surface EMG signals and improve the hand medical monitoring system.

The rest of the paper is organized as follows. Section 2 introduces the research status of this paper. Then, Section 3 describes the structure of NinaPro data set and data preprocessing methods. And Section 4 elaborates the related information of 16 EMG features mentioned in this paper. In Section 5, this study first describes the basic working principle of four gesture classifiers, then classifies and recognizes 136 combinations of EMG features based on NinaPro database, evaluates the effectiveness of the best EMG set, and finally compares four gesture classifiers based on machine learning, selects the best gesture classifier, and builds the best gesture motion prediction system. In the last part, Section 6 briefly summarizes the main research results of this paper and prospects future research directions.

2 Related work

The larger the dimension of EMG feature set is, the higher the computational complexity of the classifier will be. Therefore, the dimensionality reduction of EMG feature set is usually the basis of improving classification performance, and in the process of reducing feature dimension, it is necessary to retain as much relevant information as possible. Dimension reduction strategies are mainly divided into two categories: feature selection and feature projection. This paper focuses on feature selection.

Oskoei et al. [10] proposed a cascaded genetic algorithm (CGA) as a search strategy for feature subset selection and used the Davies-Bouldin index (DBI) and Fisher's linear discriminant index (FLDI) as evaluation functions for subset classification performance. The best subset was selected from 8 common TD features and 4 common FD features, and an error rate of 6 ± 1.3% was obtained in combination with an ANN. However, this GA-based feature subset search strategy only works well for EMG feature sets of medium dimension; as the feature dimension increases, its search performance gradually decreases.

Yan et al. [9] proposed mutual information (MI)–based EMG feature subset search algorithm, which uses cluster separation index as the feature subset divisibility evaluation index and removes redundant features without compromising the classification recognition rate, and then obtained reduced EMG feature set. Finally, they found that the reduced feature subset with 8 features has the best class separability. Then, as a comparison, the recognition performance of the resulting feature subset is evaluated using other inputs having the same number of reduced feature sets as SVM classifiers. And through experimental comparison and analysis, it is proved that the combination of MI-based feature subset selection and SVM technology gesture recognition is better than other common combinations, such as the combination of PCA and NN. However, the MI of two random variables is a measure of the two interdependent variables, which is related to the conditional entropy of two variables. Since the information provided by one variable is related to another variable, the uncertainty of the subset combination is reduced, and some feature subset combinations are bound to be missed.

Xing et al. [11] proposed a feature selection method based on the deep recursive search algorithm and used the standard measurement function Fisher to determine the recognition ability of different categories between feature subsets. Then, different categories were sorted according to the recognition result, and the feature subset with higher recognition rate was retained. Finally, they obtain the best feature subsets and form the final feature set. The recognition rate obtained by SVM is 98%. However, in this method, it is necessary to extract time and frequency information by using wavelet transform and select the node energy value of the WPT coefficient as the characteristic of the electromyography signal.

In this paper, each extracted feature first needs to be sorted according to relevance. Then, from the original EMG features set, the features with low recognition rate are removed one by one to form a new EMG features set. Subsequently, NUM subsets of EMG features with NUM-1 types of EMG features are selected from each new feature set (at this time, the number of features = NUM). Finally, based on the average classification recognition rate, the number of features of the optimal EMG feature combination is determined, and the specific combination of the optimal EMG feature set is determined based on the highest classification recognition rate. The design of the gesture classifier is the last and most important part of gesture recognition. In this paper, four well-known gesture classifiers are combined with the best feature subsets generated above for classification and recognition, and the best gesture classifier is selected by comparison analysis. At present, researchers have conducted some comparisons of classifiers:

Kuzborskij et al. [24] combined the EMG signals of 52 gestures in 27 subjects in the NinaPro DB1 dataset and extracted 5 well-known TD electromyography features and 2 TF electromyography features. Then, they reference four machine learning classifiers: LDA, KNN, multilayer perceptron (MLP), and SVM to classify and identify the combination of the above features. The simulation analysis shows that the classification performance of SVM is the best, MLP requires complex tuning to achieve higher recognition rate, and the recognition rate of KNN and LDA is not very satisfactory.

Omari et al. [28] extracted 10 types of EMG features from 8 gestures. They then used four machine learning classifiers, LDA, KNN, SVM, and general regression neural network (GRNN), to classify the feature vectors composed of the 10 features and evaluated which classifier achieved the highest classification rate. The simulation analysis shows that the highest classification rate, 95%, is obtained by GRNN using wavelet coefficients, and the second-highest, 94%, is obtained by the LDA classifier using the WAVE-WAMP-RMS feature set.

At the same time, Omari et al. [29] also extracted six types of EMG features from eight gestures. And they cite three kinds of gesture classifiers: LDA, quadratic discriminant analysis (QDA), and KNN, to identify the above six types of EMG features and their combinations. Simulation experiments show that the combination of four kinds of EMG features and LDA gesture classifier achieves a recognition rate of 98.56%, and the KNN algorithm has the best classification performance when the input parameter K = 5.

Anama et al. [30] proposed a fast classifier based on the extreme learning machine (ELM) for classifying individual and combined finger movements of amputees and non-amputees, and compared it with 4 other common pattern classifiers: LDA, KNN, SVM, and least squares SVM (LS-SVM). EMG signals were collected from 14 subjects (9 healthy, 5 non-healthy), 9 kinds of EMG features were extracted, and the above classifiers were used to classify and recognize the combination of these 9 features. The simulation analysis shows that although the radial basis function (RBF)-based ELM classifier has the best recognition performance, the classification accuracy of the SVM classifier is as high as 98.55% (amputee) and 99.5% (non-amputee), and both outperform the LDA and KNN classifiers.

Dhindsa et al. [31] extracted 15 types of EMG features from 10 groups of knee joint movements and used four pattern classifiers: LDA, KNN, Naive Bayes (NB), and SVM to recognize the above 15 kinds of EMG features. The simulation analysis shows that the SVM classifier with quadratic kernel performs best, with classification accuracy of 92.2 ± 2.2% and sensitivity of 90.

3 Data set and data preprocessing

3.1 NinaPro data set

Most researchers working on hand motion recognition based on surface EMG signals use various advanced techniques to preprocess the myoelectric data, extract features from the preprocessed EMG data, and then use intelligent algorithms for classification prediction [10, 32,33,34]. Although the specific methods differ, they follow the same hand movement recognition process and perform simulation experiments on a proprietary data set, which typically covers 5–10 subjects and more than 10 static or dynamic gestures [35]. However, because proprietary data sets differ in quality, the various methods for studying surface EMG signals can only be compared to a limited extent, which reduces the reliability of research results. Therefore, to demonstrate the reliability of the hand motion recognition framework proposed in this paper, the NinaPro large public EMG database proposed by Atzori et al. [26] is used as the surface EMG data set.

The NinaPro DB1 sparse multichannel myoelectric dataset was developed by Atzori et al. [26] and is mainly used for the development of active prostheses. It contains sparse multichannel EMG signals collected by 10 conventional EMG electrodes sparsely distributed on the forearm of the subject, at a sampling frequency of 100 Hz. At present, the database has been extended to 7 subsets. The multimodal data include surface EMG signals, hand kinematics, hand dynamics, and other information, and cover the movements required by most amputees in daily life. The database is used by most research institutes working on EMG signals. This paper uses its first subset, DB1, which contains 52 different gestures (without rest) from 27 healthy subjects (20 male/7 female, 25/2 right-/left-handed, age 28.0 ± 3.4 years), with each gesture repeated 10 times. To avoid subject fatigue, each action lasts 5 s, followed by 3 s of rest. The 52 hand movements are completed in three exercises: (1) 12 basic movements of the fingers; (2) 8 isometric, isotonic hand configurations and 9 basic wrist movements; (3) 23 grasping and functional movements [35]. The exercises are separated by 5 min to prevent muscle fatigue. The details of the hand movements are shown in Fig. 1 [24, 35]:

Fig. 1 Fifty-two hand movement details

3.2 Data preprocessing

Since the EMG signals of the database have already been preliminarily processed, including signal synchronization and re-labeling [26], this paper only needs to consider the high-frequency noise of the acquisition equipment.

Referring to the EMG gesture recognition study on this dataset [12], this paper uses a second-order Butterworth filter with a cutoff frequency of 1 Hz (as shown in Fig. 2) to perform low-pass filtering preprocessing on each channel’s EMG signal to remove high-frequency noise. Figure 3 shows the comparison of waveforms before and after filtering of a section of the surface EMG signal.
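As an illustration of this preprocessing step, the following minimal Python sketch builds a second-order Butterworth low-pass filter with a 1 Hz cutoff for a 100 Hz signal using the bilinear transform. The function names `butter2_lowpass_coeffs` and `apply_iir` are hypothetical, and the source does not state whether filtering is causal or zero-phase; causal filtering with zero initial conditions is assumed here.

```python
import math


def butter2_lowpass_coeffs(cutoff_hz, fs_hz):
    """Second-order Butterworth low-pass coefficients via the bilinear transform.

    Returns (b, a) with a[0] == 1. Hypothetical helper, not the authors' code.
    """
    k = math.tan(math.pi * cutoff_hz / fs_hz)  # pre-warped cutoff
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    b0 = k * k * norm
    b = (b0, 2.0 * b0, b0)
    a = (1.0,
         2.0 * (k * k - 1.0) * norm,
         (1.0 - math.sqrt(2.0) * k + k * k) * norm)
    return b, a


def apply_iir(b, a, x):
    """Causal direct-form I filtering of one channel (assumed zero initial state)."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


# Example: 1 Hz cutoff at the 100 Hz sampling rate of NinaPro DB1.
b, a = butter2_lowpass_coeffs(1.0, 100.0)
```

Each of the 10 channels would be filtered independently with the same coefficients.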

Fig. 2 Second-order Butterworth filter

Fig. 3 Comparison of waveforms before and after EMG signal filtering

4 Feature extraction of surface EMG signals

EMG feature extraction is a strategy for extracting the useful information hidden in surface EMG signals and removing redundant information. The features of myoelectric signals are mainly divided into TD, FD, and TFD features. In this study, only the TD and FD features are considered. At present, many researchers have focused on feature extraction, and a variety of methods for extracting EMG features have appeared [36,37,38]. However, the more feature types are used, the larger the dimension of the EMG feature set, the higher its complexity, and the longer the classifier runs, which may even degrade the classification result. Therefore, it is necessary to pay attention to the selection of feature types and to remove redundant features from the feature set.

Reference [39] proposes a time-domain feature set consisting of 4 TD features, namely mean absolute value (MAV), waveform length (WL), slope sign change (SSC), and zero crossing (ZC), which has been applied in several related works on electromyographic prosthetic control and gesture recognition [39, 40]. Reference [41] proposes a time-domain feature set consisting of 6 time-domain features, namely integrated EMG (IEMG), WL, VAR, SSC, WAMP, and ZC; combined with a proprietary GRA classifier, the average gesture recognition rate can reach 95%. Reference [42] extracted an electromyography feature set consisting of 7 features (RMS, mDWT, HEMG, MAV, WL, SSC, and ZC) and obtained a higher gesture recognition rate than other EMG features on the NinaPro DB1 dataset. However, because the benchmark datasets used in the above methods come from different sources, it is difficult to judge the pros and cons of each EMG feature set. Therefore, for comparison with reference [42], this paper uses the first subset, DB1, of the publicly available NinaPro dataset built by Atzori et al. The subset consists of three exercises containing 12/17/23 (without rest) hand movements performed by 27 healthy subjects. To improve the efficiency of the experiment and the reliability of the experimental results, this paper extracts 11 common TD features and 5 common FD features for the 12 motions in the first exercise of DB1 and constructs an EMG feature set with the number of features NUM = 16. Then, in combination with the well-known SVM classifier, traversal (that is, taking NUM-1 features from NUM features, a total of \( {C}_{NUM}^{NUM-1} \) combinations), training, and testing of the subsets (NUM = 15, 14, ..., 1) are performed under the total set. 
By comparing and analyzing the average recognition rate and the highest recognition rate of the feature sets for each number of features, the optimal number of features and the best feature combination of the optimal EMG feature set are determined. Finally, on DB1, this paper uses four well-known classifiers to verify the performance of the EMG feature combination and to determine the best combination of EMG features. It is worth noting that the feature set extracted in this paper does not include the ZC feature: the EMG signal in the NinaPro DB1 data set [42] underwent full-wave rectification during acquisition, so the signal contains no negative values and the ZC feature cannot be extracted. At the same time, to remain consistent with the evaluation protocol on the NinaPro dataset, this paper uses a sliding sampling window to segment and sample the signal, with the window width set to 200 milliseconds and the window displacement set to 10 milliseconds. According to the data set structure, each repeated action produces nearly 50 samples. To avoid the impact of sample label offset on computational reliability, this paper takes only the middle 30 samples of each repeated action as the final experimental data. Finally, 80% of the samples in each data set are randomly selected as the training set, and the rest are used as the test set for performance evaluation of the classifier.
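The segmentation and subset traversal described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the helper names `sliding_windows` and `leave_one_out_subsets` are hypothetical, and converting milliseconds to samples with the 100 Hz sampling rate from Section 3.1 is an assumption about how the window is counted.

```python
from itertools import combinations


def sliding_windows(signal, fs_hz=100, width_ms=200, step_ms=10):
    """Split one channel into overlapping windows (assumed ms-to-sample conversion)."""
    width = int(round(width_ms * fs_hz / 1000))
    step = max(1, int(round(step_ms * fs_hz / 1000)))
    return [signal[s:s + width]
            for s in range(0, len(signal) - width + 1, step)]


def leave_one_out_subsets(features):
    """All subsets with NUM-1 of the NUM features, i.e. C(NUM, NUM-1) = NUM subsets."""
    return list(combinations(features, len(features) - 1))


# Example with hypothetical feature labels 1..16.
subsets = leave_one_out_subsets(range(1, 17))
```

Each subset would then be scored by training and testing the SVM, and the feature whose removal hurts recognition least becomes the candidate for elimination in the next round.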

4.1 Time-domain electromyography feature extraction

4.1.1 Waveform length

In the process of signal change, the sum of the amplitude changes between adjacent data indicates the degree of change in the amplitude of the signal [39, 43]. This feature is defined as:

$$ WL=\sum \limits_{i=1}^{K-1}\mid {x}_{i+1}-{x}_i\mid $$
(1)

where \( x_i \) represents the amplitude of the EMG signal at the ith sample point, and K is the number of sample points extracted from the EMG signal in one signal window; here K = 200.
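Equation (1) translates directly into code. The sketch below assumes a window given as a plain Python sequence of amplitudes; the function name `waveform_length` is illustrative.

```python
def waveform_length(x):
    """Sum of absolute differences between adjacent samples, as in Eq. (1)."""
    return sum(abs(x[i + 1] - x[i]) for i in range(len(x) - 1))
```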

4.1.2 Slope sign change

The SSC counts the number of times the slope of the signal changes sign, which is another way to represent the frequency information of the electromyography signal [43]. To avoid the effects of background noise in the electromyography signal, a threshold function is applied to the slope changes among three consecutive samples. The threshold is usually between 50 μV and 100 mV, depending on the instrument gain settings and the level of background noise [44]; in this paper, threshold = 0.001. This feature is defined as:

$$ {\displaystyle \begin{array}{l} SSC=\sum \limits_{i=2}^{K-1}\left\{f\left[\left({x}_i-{x}_{i-1}\right)\times \left({x}_i-{x}_{i+1}\right)\right]\right\}\\ {}f(x)=\Big\{\begin{array}{l}1,\kern1em if\;x\ge threshold\\ {}0,\kern1em otherwise\end{array}\end{array}} $$
(2)
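A minimal Python sketch of Eq. (2), using the threshold of 0.001 stated above as the default. The function name `slope_sign_change` is illustrative.

```python
def slope_sign_change(x, threshold=0.001):
    """Count interior samples where the slope changes sign by at least `threshold`."""
    count = 0
    for i in range(1, len(x) - 1):
        # Product is positive only when x[i] is a local peak or valley.
        if (x[i] - x[i - 1]) * (x[i] - x[i + 1]) >= threshold:
            count += 1
    return count
```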

4.1.3 Integrated EMG

The integrated EMG (IEMG) is typically used as a starting point detection index associated with the electromyography signal sequence emission point, represented as the sum of the absolute values of the EMG signal amplitude [36, 43]. This feature is defined as:

$$ IEMG=\sum \limits_{i=1}^K\mid {x}_i\mid $$
(3)

4.1.4 Mean absolute value

The mean absolute value (MAV) is one of the most common features in surface EMG signal analysis and, like the IEMG feature, is used as a detection index [36, 43]. The MAV feature is the mean of the absolute value of the EMG signal amplitude in the data window, which is defined as:

$$ \mathrm{MAV}=\frac{1}{\mathrm{K}}\sum \limits_{i=1}^{\mathrm{K}}\mid {x}_i\mid $$
(4)

4.1.5 Root mean square

The root mean square (RMS) can be used to measure the power of the EMG signal, while RMS can also represent the energy of the signal, with a clear physical meaning [36, 43]. Therefore, RMS can be used to determine the amount of muscle production, and to measure the duration time of muscle activity, and to determine when to start activities and when to stop activities [45]. This feature is defined as:

$$ \mathrm{RMS}=\sqrt{\frac{1}{K}\sum \limits_{i=1}^K{x}_i^2} $$
(5)
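Equations (3) to (5) differ only in how the window amplitudes are aggregated, which the following Python sketch makes explicit. The function names are illustrative.

```python
import math


def iemg(x):
    """Integrated EMG, Eq. (3): sum of absolute amplitudes."""
    return sum(abs(v) for v in x)


def mav(x):
    """Mean absolute value, Eq. (4): IEMG divided by the window length K."""
    return iemg(x) / len(x)


def rms(x):
    """Root mean square, Eq. (5)."""
    return math.sqrt(sum(v * v for v in x) / len(x))
```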

4.1.6 Auto-regression coefficient

This feature enables a single EMG signal to be modeled as a linear autoregressive time series and can provide information about muscle contraction status [36, 44]. This feature is defined as:

$$ {x}_k=\sum \limits_{i=1}^P{a}_i{x}_{k-i}+{e}_k $$
(6)

where \( a_i \) represents the autoregressive coefficients, which are often used as features in EMG-based hand motion recognition, P represents the order of the AR model (P = 1 in this paper), and \( e_k \) represents the residual white noise.
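For order P = 1, the single coefficient can be estimated in closed form. The sketch below uses an ordinary least-squares fit of Eq. (6); the estimation method is not stated in the source, so least squares is an assumption here, and the function name `ar1_coefficient` is illustrative.

```python
def ar1_coefficient(x):
    """Least-squares estimate of a_1 in x_k = a_1 * x_{k-1} + e_k (assumed estimator)."""
    num = sum(x[k] * x[k - 1] for k in range(1, len(x)))
    den = sum(x[k - 1] ** 2 for k in range(1, len(x)))
    return num / den if den else 0.0
```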

4.1.7 Sample entropy (SampEn)

Since the Sample Entropy does not contain a comparison of its own data segments, its calculation does not depend on the length of the data, and the Sample Entropy is more accurate and has better consistency than the Approximate Entropy [46]. In order to calculate the Sample Entropy, it is first necessary to embed the scalar time series {x1, x2, …, xi, …, xK} into the delayed m-dimensional space, where the vector is constructed as:

$$ x(p)=\left[x(p),x\left(p+1\right),...,x\left(p+m-1\right)\right],p=1,2,...,K-m+1 $$
(7)

Given the threshold r, the probability \( B^m(r) \) that two vectors match for m points is obtained by counting the average number of vector pairs whose distance is less than r. Similarly, increasing the dimension by 1, that is, for vectors of m+1 points, yields \( B^{m+1}(r) \). The mathematical expression of this feature is therefore:

$$ SampEn\left(m,r\right)=\underset{K\to \infty }{\lim}\left\{-\ln \left[{B}^{m+1}(r)/{B}^m(r)\right]\right\} $$
(8)

where the value of SampEn is related to the value of the embedding dimension m and the threshold r. Generally, when m = 1 or 2 and r = (0.1~0.25) SD, the calculated sample entropy has more suitable statistical properties [22]. SD represents the standard deviation of the original data x(i), i = 1, 2, ..., K.
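For a finite window, the limit in Eq. (8) is replaced by the observed match counts. The following Python sketch is one common finite-sample estimate, not necessarily the variant used in this paper: it assumes the Chebyshev (maximum) distance between template vectors, excludes self-matches, and defaults to m = 2 and r = 0.2 SD, which lie within the ranges quoted above. The function name `sample_entropy` is illustrative.

```python
import math
from statistics import pstdev


def sample_entropy(x, m=2, r_factor=0.2):
    """Finite-sample SampEn estimate: -ln(A / B), with A, B the match counts."""
    n = len(x)
    r = r_factor * pstdev(x)  # threshold as a fraction of the standard deviation

    def count_matches(length):
        # Use n - m templates for both lengths so the two counts are comparable.
        templates = [x[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # j > i excludes self-matches
                dist = max(abs(u - v) for u, v in zip(templates[i], templates[j]))
                if dist < r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined for this window; no matches found
    return -math.log(a / b)
```

A perfectly periodic signal gives a value of zero, while irregular signals give larger values.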

4.1.8 Simple square integral

The simple square integral (SSI) uses the energy of the electromyography signal as the EMG feature, which represents the sum of the squares of the electromyography signal amplitudes [36, 47]. Typically, it is defined as:

$$ SSI={\sum}_{i=1}^K{x}_i^2 $$
(9)

4.1.9 Willison amplitude

The Willison amplitude (WAMP) is a measure of the frequency information in the electromyography signal, indicating the number of times the difference in signal amplitude between two adjacent segments exceeds a predetermined threshold, related to motor unit action potential and muscle contractility [36, 44]. This paper threshold = 0.01. This feature is defined as:

$$ {\displaystyle \begin{array}{l} WAMP=\sum \limits_{i=1}^{K-1}\left[f\left(|{x}_i-{x}_{i+1}|\right)\right]\\ {}f(x)=\Big\{\begin{array}{l}1,\kern1em if\;x\ge threshold\\ {}0,\kern1em otherwise\end{array}\end{array}} $$
(10)
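A short Python sketch of Eq. (10), with the threshold of 0.01 stated above as the default; the SSI of Eq. (9) is included for completeness. The function names are illustrative.

```python
def willison_amplitude(x, threshold=0.01):
    """Count adjacent-sample differences whose magnitude reaches `threshold`."""
    return sum(1 for i in range(len(x) - 1)
               if abs(x[i] - x[i + 1]) >= threshold)


def simple_square_integral(x):
    """Sum of squared amplitudes, Eq. (9)."""
    return sum(v * v for v in x)
```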

4.1.10 Modified mean absolute value 1

The modified mean absolute value 1 (MAV1) is an extension of the MAV feature; the weighted window function \( w_i \) added to the equation improves the robustness of the MAV feature [36, 43]. This feature is defined as:

$$ {\displaystyle \begin{array}{l} MAV1=\frac{1}{K}\sum \limits_{i=1}^K{w}_i\mid {x}_i\mid \\ {}{w}_i=\Big\{\begin{array}{l}1,\kern1em if\;0.25K\le i\le 0.75K\\ {}0.5, otherwise\end{array}\end{array}} $$
(11)

4.1.11 Modified mean absolute value 2

The modified mean absolute value 2 (MAV2) is also an extension of the MAV feature. Here, the weighted window function \( w_i \) added to the equation is a continuous function, which effectively improves the smoothness of the weighting function [36, 43]. This feature is defined as:

$$ {\displaystyle \begin{array}{l} MAV2=\frac{1}{K}\sum \limits_{i=1}^K{w}_i\mid {x}_i\mid \\ {}{w}_i=\Big\{\begin{array}{l}1,\kern3.24em if\;0.25K\le i\le 0.75K\\ {}\frac{4i}{K},\kern3em if\;i<0.25K\\ {}\frac{4\left(K-i\right)}{K},\kern1em otherwise\end{array}\end{array}} $$
(12)
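The two weighted variants can be sketched in one Python function. The name `modified_mav` and its `variant` argument are hypothetical. For the trailing segment of MAV2 the sketch uses the weight 4(K − i)/K, so that the weights stay non-negative and the window is continuous; this is an assumption wherever it differs from a literal reading of the equation.

```python
def modified_mav(x, variant=1):
    """MAV1 (variant=1) or MAV2 (variant=2) with 1-based sample index i."""
    k = len(x)
    total = 0.0
    for i, value in enumerate(x, start=1):
        if 0.25 * k <= i <= 0.75 * k:
            w = 1.0                  # central half of the window
        elif variant == 1:
            w = 0.5                  # MAV1: constant weight at both edges
        elif i < 0.25 * k:
            w = 4.0 * i / k          # MAV2: rising ramp at the leading edge
        else:
            w = 4.0 * (k - i) / k    # MAV2: falling ramp (assumed form)
        total += w * abs(value)
    return total / k
```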

4.2 Frequency domain electromyography feature extraction

4.2.1 Frequency ratio

This feature can distinguish muscle contraction and relaxation from the ratio of the low-frequency portion to the high-frequency portion of the myoelectric signal [36]. The mathematical expression for this feature is as follows:

$$ FR=\sum \limits_{j= LLC}^{ULC}{P}_j/\sum \limits_{j= LHC}^{UHC}{P}_j $$
(13)

where \( P_j \) represents the EMG signal power spectrum at frequency bin j, ULC and LLC are the upper and lower cutoff frequencies of the low band, and UHC and LHC are the upper and lower cutoff frequencies of the high band. The thresholds for dividing the low and high bands can be determined in two ways [48, 49]: (1) by experiment, in which case the low-frequency band is 30–250 Hz and the high-frequency band is 250–1000 Hz; (2) by using the value of the mean frequency (MNF) feature.

4.2.2 Mean frequency

The MNF is defined as the ratio of the sum of the product of the electromyography signal power spectrum and the frequency to the sum of the spectral intensities [36, 50]. The mathematical expression for this feature is as follows:

$$ MNF=\sum \limits_{j=1}^M{f}_j{P}_j/\sum \limits_{j=1}^M{P}_j $$
(14)

where \( f_j \) represents the spectral frequency at frequency bin j and M represents the length of the frequency band.

4.2.3 Median frequency

The median frequency (MDF) is the frequency that divides the power spectrum into two regions of equal power, each equal to half of the total power (TTP) [36, 50]. The mathematical expression for this feature is as follows:

$$ \sum \limits_{j=1}^{MDF}{P}_j=\sum \limits_{j= MDF}^M{P}_j=\frac{1}{2}\sum \limits_{j=1}^M{P}_j $$
(15)

4.2.4 Mean power

The mean power (MNP) represents the average power of the electromyography signal power spectrum [43, 48]. The mathematical expression for this feature is as follows:

$$ MNP=\sum \limits_{j=1}^M{P}_j/M $$
(16)

4.2.5 Peak frequency

The peak frequency (PKF) represents the frequency corresponding to the maximum power in the spectrum [36]. The mathematical expression for this feature is as follows:

$$ PKF=\max \left({P}_j\right),\kern0.5em j=1,\cdots, M $$
(17)
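The five frequency-domain features can be computed from a one-sided power spectrum, as in the following Python sketch. Several choices here are assumptions not specified in the source: the spectrum is a plain periodogram over the positive-frequency bins (DC excluded), the MDF is taken as the first bin where the cumulative power reaches half the total, the PKF is returned as the frequency of the strongest bin, and the band edges of the frequency ratio are passed in as parameters. All function names are illustrative.

```python
import cmath
import math


def power_spectrum(x, fs_hz):
    """Naive DFT periodogram over bins 1..K/2; returns (frequencies, power)."""
    n = len(x)
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs_hz / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def mean_frequency(freqs, power):
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)


def median_frequency(freqs, power):
    half, running = 0.5 * sum(power), 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= half:
            return f
    return freqs[-1]


def mean_power(power):
    return sum(power) / len(power)


def peak_frequency(freqs, power):
    return freqs[max(range(len(power)), key=power.__getitem__)]


def frequency_ratio(freqs, power, low_band, high_band):
    """Low-band power over high-band power; bands are (lower, upper) in Hz."""
    low = sum(p for f, p in zip(freqs, power) if low_band[0] <= f <= low_band[1])
    high = sum(p for f, p in zip(freqs, power) if high_band[0] <= f <= high_band[1])
    return low / high
```

Because the usable spectrum ends at half the sampling rate, the band edges passed to `frequency_ratio` have to be chosen to fit the sampling frequency of the data at hand.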

Finally, after extracting the above features from the electromyography signal, each EMG feature will be combined to produce a 10 channel × 16 features = 160-dimensional EMG feature set. In order to make the performance of each feature have certain comparability and to improve the algorithm convergence speed and the classification accuracy, this paper uses the zero-mean normalization to process the feature values extracted by each channel [50, 51]. The mathematical expression of the normalization method is as follows:

$$ \overset{\wedge }{{\mathrm{c}}_i}=\frac{c_i-E\left({c}_i\right)}{\sqrt{Var\left({c}_i\right)}} $$
(18)

where \( c_i \) represents the value of the ith channel of the electrode data before normalization, \( \overset{\wedge }{{\mathrm{c}}_i} \) represents the value of the ith channel after normalization, \( E(c_i) \) represents the mean of the ith channel, and \( \sqrt{Var\left({c}_i\right)} \) represents the standard deviation of the ith channel. To find the feature subset with the optimal combination of EMG features through experiments, the features in the set [WL, SSC, IEMG, MAV, RMS, AR, SampEn, MAV1, MAV2, FR, MNF, MDF, MNP, PKF] are labeled in order from 1 to NUM, where NUM is the number of features in the corresponding feature set.
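The zero-mean normalization of Eq. (18) can be sketched in Python as follows. The function name `zscore` is illustrative, the population standard deviation is assumed, and returning zeros for a constant channel is an added safeguard rather than part of the source.

```python
from statistics import mean, pstdev


def zscore(values):
    """Normalize one channel's feature values to zero mean and unit variance."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]  # constant channel: nothing to scale
    return [(v - mu) / sd for v in values]
```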

5 Pattern recognition and experimental simulation analysis

5.1 Gesture classifier based on machine learning

In constructing the pattern recognition model, the classifier model refers to a decision rule based on the generated feature space, which aims to achieve discriminant classification of unknown objects within the pattern recognition system [25, 52, 53]. Compared with the large number of feature extraction methods, only a small number of general classifiers are used for myoelectric motion classification. This paper considers classifiers ranging from simple statistical methods to more advanced machine learning techniques, namely the four well-known classifiers [24, 28,29,30,31, 54] used in the relevant literature: LDA, KNN, SVM, and back propagation neural network (BPNN). These classifiers are then used to recognize the EMG feature sets labeled with feature numbers in the previous section. Finally, through cross-comparison of the test results, this paper selects the best feature combination and the EMG gesture classifier suited to accurate and rapid recognition of various hand movement modes.

5.1.1 Linear discriminant analysis

LDA is a classical supervised method for linear classification and dimensionality reduction. The basic idea of the algorithm is to project high-dimensional samples onto the optimal discriminant vector space, thereby extracting the classification information and compressing the dimension of the feature space [24, 30, 31]. After projection, LDA seeks a low degree of coupling between classes and a high degree of aggregation within each class. That is, the inter-class divergence (scatter) matrix should be as large as possible and the intra-class divergence matrix as small as possible, which yields a good classification effect.
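This objective is commonly written as the Fisher criterion. The notation below is assumed here for illustration and does not appear in the original text: \( W \) is the projection matrix, \( S_b \) and \( S_w \) are the inter-class and intra-class divergence matrices, \( K \) is the number of gesture classes, \( C_k \) is the set of samples in class \( k \), \( N_k \) is its size, \( \mu_k \) is its mean, and \( \mu \) is the mean of all samples.

$$ J(W)=\frac{\left|{W}^T{S}_bW\right|}{\left|{W}^T{S}_wW\right|},\kern1em {S}_b=\sum \limits_{k=1}^K{N}_k\left({\mu}_k-\mu \right){\left({\mu}_k-\mu \right)}^T,\kern1em {S}_w=\sum \limits_{k=1}^K\sum \limits_{x\in {C}_k}\left(x-{\mu}_k\right){\left(x-{\mu}_k\right)}^T $$

Maximizing \( J(W) \) makes the projected class means spread apart relative to the spread of the samples inside each class.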

5.1.2 K-nearest neighbor

KNN is a mature myoelectric gesture classification method that classifies a sample by measuring the distance between feature values. Although its concept and computation are simple, it can deliver excellent classification performance when enough training samples are available. The main idea of the algorithm is that if the majority of the k nearest samples in the feature space belong to a certain category [30], then the sample also belongs to that category. In making the classification decision, the method determines the category of the sample to be classified only from the categories of its one or more nearest samples. In addition, the performance of the algorithm depends on the choice of a suitable distance measurement function; the Euclidean distance or the Manhattan distance is generally used [28, 30, 31]. Based on the simulation analysis, the KNN described in this paper uses the Euclidean distance and takes K = 5 [29].
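The decision rule can be sketched in a few lines. This is a minimal illustration with the Euclidean distance and K = 5 as stated above; the function name `knn_predict`, the two-dimensional toy samples, and the labels `gesture_A` and `gesture_B` are hypothetical and are not taken from the NinaPro data.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=5):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Euclidean distance from the query to every training sample.
    distances = [(math.dist(x, query), label) for x, label in zip(train_x, train_y)]
    distances.sort(key=lambda pair: pair[0])
    # Count the labels of the k closest samples and return the most frequent one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy feature vectors forming two well-separated clusters (illustrative only).
train_x = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
           (1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2)]
train_y = ["gesture_A"] * 4 + ["gesture_B"] * 4

near_a = knn_predict(train_x, train_y, (0.15, 0.15))
near_b = knn_predict(train_x, train_y, (1.0, 1.0))
```

An odd K such as 5 avoids tied votes in two-class problems; with more classes ties remain possible, and this sketch then returns the tied label that was encountered first among the nearest neighbors.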

5.1.3 Support vector machine

SVM is a pattern recognition and classification technique with a multi-input, single-output learning structure organized as a three-layer network. Its working principle is shown in Fig. 4. In the figure, the original data is preprocessed to obtain a sample set, and the input space is then mapped from a low-dimensional space to a high-dimensional space by a suitable kernel function (the RBF kernel function is selected in this paper). This implicit transformation constructs the high-dimensional space through inner products rather than through the mapping function itself, which avoids the difficulty of writing the mapping function explicitly; hence, a nonlinear problem can become linearly separable [25, 28, 31]. Finally, the support vectors (support vector 1 to support vector n) are searched for and the optimal separating hyperplane is constructed in the high-dimensional space, which completes the training of the learning machine. The support vector machine also has the following properties [28]: (1) because it is grounded in statistical learning theory and structural risk minimization, it can learn from a small number of samples and classify patterns according to the regular features of the training samples, so it generalizes well; (2) it can mitigate over-learning on large, high-dimensional input samples; (3) for complex problems, training can be formulated as a quadratic programming problem whose solution is globally optimal; (4) it places no special requirements on the dimension of the input data.

Fig. 4

SVM algorithm working principle diagram
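To make the role of the kernel concrete, the sketch below evaluates the RBF kernel \( K(x,y)=\exp \left(-\gamma {\left\Vert x-y\right\Vert}^2\right) \) directly on pairs of input vectors, without ever forming the high-dimensional mapping. The function names and the value `gamma=0.5` are assumptions for illustration; the kernel parameter used in the paper is not given in this section.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(samples, gamma=0.5):
    """Matrix of pairwise kernel values, the only quantity SVM training needs."""
    return [[rbf_kernel(a, b, gamma) for b in samples] for a in samples]


# Illustrative two-dimensional samples (not data from the paper).
samples = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0)]
gram = gram_matrix(samples)
```

Each entry lies in (0, 1] and decays with distance, so nearby samples look similar to the classifier and distant ones look nearly orthogonal.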

5.1.4 BP neural network

BPNN is a commonly used supervised neural network learning algorithm. The purpose of learning is to use the error between the actual output of the network and the expected output to modify the weights so that the actual output is as close as possible to the expected output. That is, to minimize the squared error of the output layer, the network repeatedly computes changes to its weights and biases along the direction of steepest descent of the error function and thereby gradually approaches the target [25]. The BP algorithm is divided into two stages. In the first stage, the input information is propagated from the input layer through the hidden layer, and the output value of each unit is computed layer by layer. In the second stage, the error of each hidden-layer unit is computed step by step backward from the output error, and this error is used to correct the weights of the preceding layer [55, 56]. The structure of the BPNN is shown in Fig. 5.

Fig. 5

The structural principle of BP neural network

The number N of input neurons equals the dimension of the EMG feature set, the number M of output neurons equals the number of hand movements to be recognized, and the number H of hidden layer neurons is determined by the empirical formula in reference [57]. The specific parameter settings are shown in Table 1. The training process of the BPNN can be summarized as "forward computation of the output, backward propagation of the error." This process is repeated until the error falls within an acceptable range, at which point learning ends.

Table 1 Number of nodes in each layer of the neural network under different input characteristics
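The two stages can be sketched for a network with a single hidden layer as follows. This is a minimal illustration under stated assumptions, not the configuration used in the paper: the class name `TinyBPNN`, the sigmoid activation, the learning rate, the layer sizes (N = 2, H = 3, M = 2), and the single training sample are all hypothetical, and the empirical formula of reference [57] is not reproduced.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyBPNN:
    """One-hidden-layer network trained by back propagation on the squared error."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One row of weights per neuron; the last entry of each row is the bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                      for _ in range(n_out)]

    def forward(self, x):
        # Stage 1: compute the outputs layer by layer, input -> hidden -> output.
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in self.w_hidden]
        o = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in self.w_out]
        return h, o

    def train_step(self, x, target, lr=0.5):
        h, o = self.forward(x)
        # Stage 2: output-layer error terms, then hidden-layer error terms
        # obtained by propagating the output error backward.
        delta_o = [(ok - tk) * ok * (1.0 - ok) for ok, tk in zip(o, target)]
        delta_h = [hj * (1.0 - hj) * sum(delta_o[k] * self.w_out[k][j]
                                         for k in range(len(o)))
                   for j, hj in enumerate(h)]
        # Gradient-descent updates of the weights and biases.
        for k, row in enumerate(self.w_out):
            for j, v in enumerate(h + [1.0]):
                row[j] -= lr * delta_o[k] * v
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x + [1.0]):
                row[i] -= lr * delta_h[j] * v
        # Squared error measured before this update.
        return 0.5 * sum((ok - tk) ** 2 for ok, tk in zip(o, target))


net = TinyBPNN(n_in=2, n_hidden=3, n_out=2)
sample, target = [0.2, 0.8], [1.0, 0.0]
errors = [net.train_step(sample, target) for _ in range(200)]
```

Repeating `train_step` drives the squared error down, which is the "repeat until the error is acceptable" loop described above; a practical implementation would iterate over the whole training set and stop on a validation criterion.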

5.2 Gesture recognition framework construction and simulation analysis

5.2.1 Construction of optimal EMG feature set

The NinaPro DB1 data set contains a large number of subjects and gestures, so the amount of data is large. This leads to high classification complexity, which may increase the load on the analysis hardware and cost a lot of time. Because the experimental setups of the three exercises in DB1 are basically the same, the experimental conclusions drawn from them are expected to be similar. To improve the efficiency of the experimental simulation, this paper therefore uses the exercise 1 data of the NinaPro DB1 dataset together with the well-known SVM gesture classifier, takes the gesture recognition accuracy as the performance criterion for a feature set, and constructs the best EMG feature subset. First, this paper analyzed the recognition rate of each of the above 16 single EMG features in combination with SVM and observed the separability of each feature. Then, this paper established a feature set containing NUM = 16 EMG features and took NUM-1 of them at a time to construct feature subsets, for a total of \( {C}_{16}^{15}=16 \) cases. Finally, this paper obtained the recognition rate of each feature subset by SVM simulation, as shown in Fig. 6.

Fig. 6

Comparison of recognition rates of 16 single features and 16 feature subsets
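The \( {C}_{16}^{15}=16 \) feature subsets can be enumerated as in the sketch below. The 16 labels are the feature abbreviations mentioned in this section; their order in the list and the helper name `leave_one_out_subsets` are assumptions for illustration, and the SVM evaluation of each subset is not shown.

```python
from itertools import combinations

# Feature abbreviations named in this section; the ordering is an assumption.
FEATURES = ["WL", "SSC", "IEMG", "MAV", "RMS", "AR", "SampEn", "SSI",
            "WAMP", "MAV1", "MAV2", "FR", "MNF", "MDF", "MNP", "PKF"]


def leave_one_out_subsets(features):
    """Return every subset obtained by dropping exactly one feature."""
    size = len(features) - 1
    return [list(subset) for subset in combinations(features, size)]


subsets = leave_one_out_subsets(FEATURES)
# For each subset, record which single feature was left out.
removed = [next(f for f in FEATURES if f not in s) for s in subsets]
```

Pairing each subset with its removed feature is what allows a drop in recognition rate to be attributed to one specific feature.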

As shown in Fig. 6, the yellow bars indicate the recognition rate of each single feature, and the blue bars indicate the recognition rate of the feature subset that contains every feature in the feature set except that one. For the feature subsets, the EMG gesture recognition rate is significantly lower than that of the other subsets only when the SSC feature is removed, indicating that the SSC feature enhances the recognition performance of the feature set and that its distinguishability is particularly prominent. For the single features, the recognition rates differ markedly: some features recognize gestures poorly on their own, whereas others achieve significantly higher rates. Combined with the corresponding feature separation (as shown in Fig. 7), it is easy to see that features such as WL, IEMG, MAV, RMS, and MDF separate the gestures clearly. SampEn, SSI, WAMP, MAV1, and MAV2 rank second in separation and still show good separability when used alone for gesture recognition. The separation of the AR, FR, MNF, and MNP features is rather confused; they perform poorly when used alone for gesture recognition and are relatively inert.

Fig. 7

Sixteen kinds of EMG feature separation: IF and IE represent index flexion and extension, MF and ME represent middle flexion and extension, RF and RE represent ring flexion and extension, LFF and LFE represent little finger flexion and extension, TAD and TAB represent thumb adduction and abduction, and TF and TE represent thumb flexion and extension

Subsequently, by comparing the gesture recognition rates of the single features, the author removes the inert feature with the poorest gesture recognition rate from the feature set with NUM = 16 EMG features, and then takes NUM-1 features at a time from the new feature set with NUM = 15 EMG features to construct new feature subsets, for a total of \( {C}_{NUM}^{NUM-1} \) combinations. The procedure is repeated for each smaller value of NUM. Finally, the author again uses the SVM to evaluate the EMG gesture recognition performance of each feature subset for every feature number NUM, and obtains the gesture recognition rates shown in Table 2 through simulation analysis.

Table 2 Gesture recognition rate of all feature subsets under the feature set of different feature numbers
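One round of this elimination procedure can be sketched as follows. The function name `elimination_round`, the generic feature names `f1` to `f4`, and the scores are placeholders chosen for illustration only; they are not results from the paper, and in the actual procedure the scores would come from the SVM simulation.

```python
def elimination_round(single_feature_rate):
    """Drop the feature with the lowest single-feature recognition rate, then
    list all NUM-1 subsets of the reduced feature set."""
    worst = min(single_feature_rate, key=single_feature_rate.get)
    kept = [f for f in single_feature_rate if f != worst]
    # Each subset leaves out exactly one of the remaining features.
    subsets = [[f for f in kept if f != dropped] for dropped in kept]
    return worst, kept, subsets


# Placeholder scores, NOT measured values: higher means better single-feature recognition.
toy_rates = {"f1": 0.90, "f2": 0.60, "f3": 0.35, "f4": 0.85}
worst, kept, subsets = elimination_round(toy_rates)
```

Applying the function repeatedly to the shrinking feature set yields the sequence of feature numbers whose subset recognition rates Table 2 reports.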

It is not difficult to see from Table 2 that, regardless of the number of features in the feature set, the gesture recognition rates of the feature subsets differ little from one another, except that removing the SSC feature leads to a sharp decline in the overall recognition rate. Consequently, when the average recognition rate of the EMG feature subsets is at its highest and tends to be stable, the number of EMG features in the corresponding full feature set can be taken as a preliminary estimate of the number of features in the best feature set. After the gesture recognition rates of all subsets have been traversed for each feature number, Table 3 and Fig. 8 show, for the 15 feature numbers, the average recognition rate and the maximum recognition rate (including the specific feature combination) of all subsets, together with the corresponding curves.

Table 3 Average recognition rate and maximum recognition rate of all subsets under the feature set of different feature numbers

As shown in Fig. 8, when the number of features in the feature set is NUM = 2~8 and 12, the average recognition rate of all subsets under the feature set increases significantly with each added feature. When NUM = 8~11 and 12~16, the average recognition rate stays at a relatively high and basically stable level, indicating that the added features contribute little to the overall recognition rate and are redundant. Therefore, the number of features of the optimal feature set can be preliminarily set to NUM = 9, that is, the first two features plus the number of additions for which the average recognition rate rises significantly. Combining the detailed data in Table 3 with the requirements of reducing the number of features and improving the gesture recognition rate, the author takes the best feature set to be [WL, SSC, IEMG, MAV, RMS, SampEn, WAMP, MNF, MDF]. In the SVM simulation, this feature set achieved a recognition rate of 97.1356%, which is higher than that of all feature combinations under the corresponding feature numbers and is very close to the highest gesture recognition rate of 98.0631%, obtained with 13 types of EMG features.

Fig. 8

Comparison of the highest recognition rate and average recognition rate of 15 feature subsets

5.2.2 Optimal gesture classifier selection

Building on the above analysis of the recognition rate of each feature subset under the different feature numbers, the author takes the best feature combination for each feature number (see Table 3) and tests it with the four well-known machine learning gesture classifiers on all the data in the NinaPro DB1 dataset (27 subjects and 52 gestures). The recognition rates are shown in Fig. 9. When the number of features in the feature set is NUM ≤ 10 (the subset contains NUM-1 features), the gesture recognition rates of SVM and BPNN increase with NUM. When NUM > 10, the dimension and complexity of the feature set grow as features are added. The growth of BPNN's gesture recognition rate gradually slows until it reaches its highest value of 8.131231% (the subset contains NUM-1 = 13 features). The SVM is basically stable except for another increase at NUM-1 = 12, and its highest recognition rate is 94.7290% (the subset contains NUM-1 = 15 features). When NUM > 2, KNN's gesture recognition rate is basically above 90%; compared with the other three classifiers, it changes little as NUM varies, and its highest recognition rate is 98.1988% (the subset contains NUM-1 = 9 features). Finally, the recognition rate of LDA remains at a relatively low level, indicating that its recognition performance on this data set is weak. KNN therefore shows the best EMG gesture recognition performance here.

Finally, the best feature combination proposed in the previous section, [WL, SSC, IEMG, MAV, RMS, SampEn, WAMP, MNF, MDF], was verified by applying the four gesture classifiers in turn to the NinaPro DB1 dataset. The SVM gesture recognition rate is 96.6605% (> 94.7290%), KNN's gesture recognition rate is 99.2314% (> 98.1988%), and both LDA and BPNN have gesture recognition rates below 90%. For SVM and KNN, the recognition rate obtained with the best feature combination is thus higher than the highest rate that the same classifier obtained above. Therefore, the best machine learning gesture recognition system based on the NinaPro DB1 dataset can be composed of the optimal EMG feature set [WL, SSC, IEMG, MAV, RMS, SampEn, WAMP, MNF, MDF] and the KNN gesture classifier.

Fig. 9

Comparison of recognition rates of four classifiers

6 Conclusion

Using the large, publicly available NinaPro DB1 dataset together with the SVM gesture classifier, the author selects the features with the best separability from 16 TD and FD EMG features by enumerating feature combinations, and constructs the optimal EMG feature set. The author then uses four well-known machine learning gesture classifiers (LDA, KNN, SVM, and BPNN) to verify this optimal EMG feature set and selects the best EMG gesture classifier to build the best hand motion prediction system. The simulation test shows that the gesture recognition rate obtained on NinaPro with the best EMG feature set and the optimal EMG gesture classifier reaches 99.2314%. Although the proposed hand motion prediction system achieves a high gesture recognition rate, the DB1 dataset covers many subjects and hand movements, individual differences exist, and the amount of data is large, all of which lengthen the classification time. Because real-time performance is also an important technical indicator of pattern recognition, future research should examine whether recognition time can be shortened, without sacrificing gesture recognition accuracy, by reducing the number of features, reducing the feature set dimension, optimizing the EMG gesture classifier, or incorporating deep learning methods.