1 Introduction

Vision-based gesture recognition is an active topic in human-computer interaction: it matches humans' natural communication habits and supports long-distance, non-contact interaction. It involves technologies such as gesture detection and segmentation, feature extraction, and classification. These techniques have advanced in recent years, but shortcomings remain in recognition rate and real-time performance [1, 2]. Recently, sparse representation theory has been gradually introduced into pattern recognition; it identifies a test sample through its sparse linear representation over a dictionary composed of training samples, and offers good robustness and recognition performance [3].

Since the sparse representation classification (SRC) method was proposed, sparse representation theory has attracted increasing attention from researchers and shows great development potential and broad application prospects in pattern recognition [4,5,6,7,8,9]. For constructing redundant dictionaries, various methods have been put forward. Aharon et al. [10] proposed the K-Singular Value Decomposition (K-SVD) algorithm based on the K-means algorithm: a pursuit method performs the sparse coding, and singular value decomposition updates the dictionary and the corresponding sparse coefficients, which greatly accelerates the convergence of dictionary training. To improve the computation speed of SRC, Li et al. [11] proposed a local SRC algorithm, KNN-SRC, which uses KNN to select the K training samples most similar to the test sample to form a local dictionary; however, its computational complexity is still large and its computation time long. Qi et al. [12] proposed a classification method based on sparse neighbor representation, suitable for non-linearly distributed sample sets and low-dimensional data classification; it improves efficiency while maintaining the recognition rate, but the choice of the local dictionary needs further study. For solving the sparse coefficients, greedy algorithms address the minimized \(l_{0}\) norm problem and convex optimization algorithms the minimized \(l_{1}\) norm problem. Yang et al. [13, 14] analyzed five existing fast minimized \(l_{1}\) norm algorithms, namely Gradient Projection, Homotopy, Iterative Shrinkage-Thresholding, Proximal Gradient, and Alternating Direction; in face recognition experiments, Homotopy performed best in recognition rate and computation time. Zhai et al. [15] also used the Homotopy algorithm to solve the sparse coefficients, combined with color information, which effectively improves the robustness of face recognition under occlusion.

Convex optimization algorithms are accurate but computationally expensive. To reduce the computational burden of \(l_{1}\) norm solvers, we introduce the idea of local sparse representation: following the structure of the KNN-SRC algorithm, we select the local dictionary by minimizing the \(l_{2}\) norm, yielding the \(l_{2}\) norm local sparse representation classification (\(l_{2}\)-LSRC) algorithm. To verify the validity of the algorithm, the influence of the parameter K on the recognition rate is examined on two gesture libraries and the optimal K is selected. Finally, the recognition rate and average running time of each algorithm on the two gesture libraries are compared as the feature dimension varies. The results show that the algorithm maintains the recognition rate while improving efficiency.

2 Sparse representation classification algorithm

According to the relevant references, a test sample can be represented by a sparse linear combination of dictionary atoms, and a sparse reconstruction algorithm is then used to solve for the sparse coefficients. Sparse representation classification mainly involves two aspects: the construction of the redundant dictionary and the solution of the sparse coefficients [16, 17]. The framework of the sparse representation classification algorithm is introduced below.

(1) The test sample is a linear representation of the training samples

Suppose the test sample \(y\in \mathfrak {R}^{m}\) belongs to class j, which contains \(n_j \) training samples. Then y can be represented linearly by \(A_j \):

$$\begin{aligned} y=A_j v_j =a_{j,1} v_{j,1} +a_{j,2} v_{j,2} +\cdots +a_{j,n_j } v_{j,n_j } \end{aligned}$$
(1)

where \(A_j =[a_{j,1} ,a_{j,2} ,\ldots ,a_{j,n_j } ]\in \mathfrak {R}^{m\times n_j }\) collects the class-j training samples as columns, and \(v_j =[v_{j,1} ,v_{j,2} ,\ldots ,v_{j,n_j } ]^{T}\in \mathfrak {R}^{n_j \times 1}\) is the vector of linear representation coefficients.

The class of the test sample is not known in advance, so the gesture samples of all C classes are concatenated into a redundant dictionary matrix:

$$\begin{aligned} A= & {} [A_1 ,\ldots ,A_j ,\ldots ,A_C ]\\= & {} [a_{1,1} ,\ldots ,a_{j,n_j } ,\ldots ,a_{C,n_C } ]\in \mathfrak {R}^{m\times n} \end{aligned}$$

where \(n=n_1 +\cdots +n_C \) is the total number of training samples. Then y can be expressed in terms of A as:

$$\begin{aligned} y=Ax \end{aligned}$$
(2)

where \(x=[0,\cdots 0,v_{j,1} ,v_{j,2} ,\cdots ,v_{j,n_j } ,0,\cdots ,0]^{T}\) is the sparse coefficient vector of the test sample, whose non-zero elements correspond to the j-th class. If \(S=\left\| x \right\| _0 \) and \(S\ll n\), then x is sparse, and S is its sparsity level.
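
As a concrete illustration, the following minimal numpy sketch builds the redundant dictionary A of Eq. (2) and a coefficient vector that is sparse in the above sense; the sizes m, C, and \(n_j\) are arbitrary placeholders, not the experimental settings of this paper.

```python
import numpy as np

# Placeholder sizes: m-dimensional features, C classes, n_j samples per class.
m, C, n_j = 64, 5, 20
rng = np.random.default_rng(0)

# Stack the class sub-dictionaries A_1, ..., A_C column-wise into A.
A = np.hstack([rng.standard_normal((m, n_j)) for _ in range(C)])  # m x n, n = C*n_j

# A class-j test sample combines class-j columns only, so its coefficient
# vector is nonzero only in the j-th block: S = ||x||_0 << n.
j = 2
x = np.zeros(A.shape[1])
x[j * n_j:(j + 1) * n_j] = rng.standard_normal(n_j)
y = A @ x                       # test sample synthesized per Eq. (2)
S = np.count_nonzero(x)         # sparsity level, here S = n_j
```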

(2) Sparse coefficient solution by norm minimization

Since the feature dimension of the dictionary A is smaller than the number of samples, i.e., \(m<n\), Eq. (2) is an underdetermined system and its solution is not unique [18]. Equation (2) can be solved by minimizing the \(l_{0}\) norm:

$$\begin{aligned} \hat{{x}}=\mathop {\arg \min }\limits _x \left\| x \right\| _0 \quad s.t.\,\, y=Ax \end{aligned}$$
(3)

According to compressed sensing theory, when x is sufficiently sparse, Eq. (3) is equivalent to the minimized \(l_{1}\) norm problem:

$$\begin{aligned} \hat{{x}}=\mathop {\arg \min }\limits _x \left\| x \right\| _1 \quad s.t.\,\, y=Ax \end{aligned}$$
(4)

In practice, sample acquisition is affected by noise, illumination, and other environmental factors, so the test sample cannot be represented exactly by a linear combination of the training samples, and the recognition rate is easily affected. To improve robustness, a noise tolerance can be added:

$$\begin{aligned} \hat{{x}}= & {} \mathop {\arg \min }\limits _x \left\| x \right\| _0 \quad s.t.\,\, \left\| {y-Ax} \right\| _2 \le \varepsilon \end{aligned}$$
(5)
$$\begin{aligned} \hat{{x}}= & {} \mathop {\arg \min }\limits _x \left\| x \right\| _1 \quad s.t. \,\, \left\| {y-Ax} \right\| _2 \le \varepsilon . \end{aligned}$$
(6)
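
For intuition, a greedy solver for the \(l_{0}\) problem (5) can be sketched with scikit-learn's orthogonal matching pursuit. This is only an illustrative choice, not the solver used in this paper (which adopts Homotopy for the \(l_{1}\) problem, see Sect. 3.1); A and y are reused from the sketch above.

```python
from sklearn.linear_model import OrthogonalMatchingPursuit

# Greedy l0 solver: atoms are added one at a time until the sparsity budget
# n_nonzero_coefs is reached; a residual bound as in Eq. (5) could be used instead.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10)
x_hat = omp.fit(A, y).coef_     # at most 10 nonzero coefficients
```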

(3) Classification by minimum residual

Fig. 1 Gesture recognition framework

Fig. 2 Sparse coefficients and residuals of an invalid test sample

Once the sparse coefficients of the linear combination are obtained, the test sample can be reconstructed class by class using the coefficients and the redundant dictionary; the reconstruction of each class is then compared with the test sample [19, 20], and the class of the test sample is judged by the minimum residual.

$$\begin{aligned} r_j (y)= & {} \left\| {y-A\delta _j (\hat{{x}})} \right\| _2 \quad j=1,2,\ldots ,C \end{aligned}$$
(7)
$$\begin{aligned} I(y)= & {} {\arg \min }_j r_j (y) \end{aligned}$$
(8)

where \(r_j (y)\) is the residual of the test sample with respect to the j-th class, \(\delta _j (\hat{{x}})\) keeps the coefficients at the positions corresponding to the j-th class and sets all other positions to 0, and I(y) is the predicted category of the test sample.
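
A minimal implementation of the minimum-residual rule of Eqs. (7) and (8) might look as follows; the bookkeeping array `class_index`, which maps each column of A to its class label, is an assumption of this sketch rather than something defined in the paper.

```python
import numpy as np

def classify_min_residual(A, y, x_hat, class_index):
    """Eqs. (7) and (8): zero out all coefficients except those of class j,
    reconstruct, and return the class with the smallest residual."""
    classes = np.unique(class_index)
    residuals = np.array([
        np.linalg.norm(y - A @ np.where(class_index == j, x_hat, 0.0))  # r_j(y)
        for j in classes
    ])
    return classes[int(np.argmin(residuals))]    # I(y), Eq. (8)
```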

The SRC-based gesture recognition framework is shown in Fig. 1.

During recognition, a test sample may not belong to any gesture in the gesture sample library, or may not be a gesture image at all. The classifier therefore needs to judge invalid gestures, and adding this validity judgment improves the robustness of the algorithm [21]. When gesture recognition is performed by the sparse representation classification algorithm, the sparse coefficients are computed over all training samples [22]. In theory, a valid test gesture belongs to the library and can be expressed linearly by gestures of the same class, so its sparse coefficients are nonzero only within that class, whereas the coefficients of an invalid test gesture may be spread over several classes [23]. The sparse coefficients and residuals of an invalid test sample from the grabbing gesture library are shown in Fig. 2. As the figure shows, the larger nonzero coefficients are distributed over multiple training classes and the residuals of the classes differ little, so a classification based on the minimum residual alone would be erroneous and meaningless.

Therefore, a Sparsity Concentration Index (SCI) can be defined according to the distribution of the sparse coefficients:

$$\begin{aligned} \hbox {SCI}=\frac{C\cdot \max _j \Vert {\hat{{x}}_j } \Vert _1 /\Vert {\hat{{x}}} \Vert _1 -1}{C-1}\in [0,1] \end{aligned}$$
(9)

where C is the number of classes and \(\hat{{x}}_j =\delta _j (\hat{{x}})\) is the sub-vector of coefficients associated with class j.

The range of SCI is [0, 1]; in theory a valid test sample yields SCI = 1 and an invalid one yields SCI = 0. In practice the SCI value is not strictly 0 or 1, so a threshold \(\tau \in (0,1)\) can be set: when \(\hbox {SCI}\ge \tau \), the test sample is considered valid and passed on to the classification step; otherwise it is judged an invalid gesture. The SCI value is therefore checked before the minimum-residual classification step of SRC, which detects invalid gestures better than using the minimum residual alone, and the resulting decisions are easier to interpret and accept.
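
The SCI of Eq. (9) and the accept/reject rule can be written compactly, reusing the hypothetical `class_index` bookkeeping from the sketch above; the threshold value below is illustrative, since the paper leaves \(\tau\) to be tuned.

```python
import numpy as np

def sci(x_hat, class_index, C):
    """Eq. (9): dominant class's l1 mass over the total l1 mass, rescaled to
    [0, 1]. SCI near 1: coefficients concentrate on one class; near 0: they
    are spread over many classes (likely an invalid gesture)."""
    total = np.sum(np.abs(x_hat))
    per_class = [np.sum(np.abs(x_hat[class_index == j]))
                 for j in np.unique(class_index)]
    return (C * max(per_class) / total - 1.0) / (C - 1.0)

tau = 0.6   # hypothetical threshold in (0, 1)
# if sci(x_hat, class_index, C) >= tau: classify by minimum residual
# else: reject the sample as an invalid gesture
```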

3 Local sparse representation classification

3.1 Idea of local sparse representation

Consider a typical gesture recognition scenario: given a training data set containing C classes of samples, identify a test image \(y\in \mathfrak {R}^{m}\). Suppose class \(j\in [1,\ldots ,C]\) contains \(n_j \) gesture samples arranged as the columns of the matrix \(A_j =[a_{j,1}, \ldots , a_{j,n_j} ]\), and let \(A=[A_1, \ldots , A_C]\in \mathfrak {R}^{m\times n}\) collect the gesture samples of all classes, where m is the dimension of the sample feature and \(n=n_1 +\cdots +n_C \) is the total number of training samples. Assuming the test image y can be represented by the training sample set, the problem can be expressed as \(y=Ax+z\), where x is the coefficient vector relating y to some of the columns of A and z is a noise term.

According to the principle of compressed sensing, when solving the sparse coefficients, the minimization of the \(l_{0}\) norm can be transformed into the convex minimization of the \(l_{1}\) norm. In this paper, the Homotopy method is used to solve the minimized \(l_{1}\) norm problem; it treats the task as an unconstrained optimization problem and iterates toward the nearest convergence center while searching for the optimal solution [24, 25]. However, compared with greedy algorithms, its computational complexity is high, so a method is needed that approximates the minimized \(l_{1}\) norm problem and effectively reduces the overall running time without lowering the recognition rate. Regularization is one of the effective techniques for handling constrained optimization. The Lagrangian objective function v(x) of the minimized \(l_{1}\) norm problem is therefore defined, and the optimization problem becomes:

$$\begin{aligned} v(x)=\left\| {y-\sum _{i=1}^n {a_i x_i } } \right\| _2^2 +\lambda \sum _{i=1}^n {\left| {x_i } \right| } \end{aligned}$$
(10)

where \(a_i \in \mathfrak {R}^{m}\) is the i-th column of A, \(x_i \) is the i-th element of the coefficient vector, and \(\lambda \) is the sparsity control parameter. Suppose at most K elements of the coefficient vector are nonzero; then for any remaining element \(x_j =0\) we have \(\left\| {a_j x_j } \right\| _2 =0\), so \(a_j \) has no effect on v(x). That is, the zero coefficients make the corresponding training samples irrelevant to the solution, while the remaining training samples are the ones related to the test sample; this is why the test sample can be linearly represented by samples of the same class. The objective function can thus be redefined as:

$$\begin{aligned} v(\alpha )=\left\| {y-\sum _{i=1}^K {\omega _i \alpha _i } } \right\| _2^2 +\lambda \sum _{i=1}^K {\left| {\alpha _i } \right| } \end{aligned}$$
(11)

The samples corresponding to these K elements constitute a new dictionary \(\Omega \), with \(\omega _i \) the i-th column of \(\Omega \). Since the error estimate does not depend on the zero elements of x, the new objective function \(v(\alpha )\) is theoretically equivalent to v(x), and because \(K\ll n\) it is faster to compute. With the new dictionary \(\Omega \) and coefficient vector \(\alpha \), the minimized \(l_{1}\) norm problem can be redefined as:

$$\begin{aligned} \hat{{\alpha }}=\mathop {\arg \min }\limits _\alpha \left\| {y-\Omega \alpha } \right\| _2^2 +\lambda \left\| \alpha \right\| _1 . \end{aligned}$$
(12)
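
Equation (12) is an ordinary \(l_{1}\)-regularized least squares (Lasso) problem on the reduced dictionary, so any such routine applies. A hedged sketch with scikit-learn follows; note that its Lasso rescales the data-fit term by 1/(2m), so its `alpha` corresponds to \(\lambda\) only up to a constant factor, and this is a stand-in for, not a reimplementation of, the Homotopy solver used in the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

def local_l1_coding(Omega, y, lam=1e-3):
    """Solve Eq. (12) on the local dictionary Omega (m x K, K << n).
    sklearn's Lasso minimizes (1/(2m))*||y - Omega@a||_2^2 + alpha*||a||_1."""
    lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=10_000)
    return lasso.fit(Omega, y).coef_   # the sparse vector alpha-hat
```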

3.2 KNN-SRC classification algorithm

The KNN classifier is widely used. The conventional KNN classifier simply computes the K nearest neighbors of the test sample and assigns it to the category containing the most neighbors, treating every neighbor equally in the decision regardless of how near or far it is. Although KNN's classification criterion is loose, it can screen out training samples that are dissimilar to, or far from, the test sample. Therefore, a local SRC method, KNN-SRC, was proposed [26], combining the KNN algorithm with the sparse representation algorithm. The method has two steps. The first step computes the distance between each training sample and the test sample and selects the K training samples nearest to the test sample:

$$\begin{aligned} d_i (y)=\left\| {y-a_i } \right\| ^{2} \end{aligned}$$
(13)

The second step ignores the influence of the remaining training samples on the final classification decision: the K selected neighbors form the sub-dictionary \(\Omega =[\omega _1 ,\ldots , \omega _K ]\), the sparse coefficients are solved with Eq. (12), and classification follows the minimum residual.

The steps of the KNN-SRC algorithm are summarized as follows (a brief numpy sketch of Steps 3 and 4 follows the list):

Step 1. Input: training sample matrix \(A\in \mathfrak {R}^{m\times n}\), test sample \(y\in \mathfrak {R}^{m\times 1}\);

Step 2. Normalize each column of A and y by the \(l_{2}\) norm;

Step 3. Calculate the distance between each training sample and the test sample:

$$\begin{aligned} d_i (y)=\left\| {y-a_i } \right\| ^{2} \end{aligned}$$

Step 4. Sort the distances in ascending order and take the K training samples nearest to the test sample to form the subset \(\Omega \in \mathfrak {R}^{m\times K}\);

Step 5. Solve the minimized \(l_{1}\) norm problem on the subset \(\Omega \):

$$\begin{aligned} \hat{{\alpha }}=\mathop {\arg \min }\limits _\alpha \left\| {y-\Omega \alpha } \right\| _2^2 +\lambda \left\| \alpha \right\| _1 \end{aligned}$$

Step 6. Calculate the residual for each class \(j\in [1,C]\):

$$\begin{aligned} r_j (y)=\left\| {y-\Omega \delta _j (\hat{{\alpha }})} \right\| _2 \end{aligned}$$

Step 7. Calculate the SCI:

$$\begin{aligned} \hbox {SCI}=\frac{C\cdot \max _j \Vert {\hat{{\alpha }}_j } \Vert _1 /\Vert {\hat{{\alpha }}} \Vert _1 -1}{C-1} \end{aligned}$$

Step 8. Output: if \(\hbox {SCI}\ge \tau \), category \(I(y)=\arg \min _j r_j (y)\); otherwise the gesture is invalid.
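
A minimal numpy sketch of the neighbor selection in Steps 3 and 4; the returned column indices are needed later to map the atoms of \(\Omega\) back to class labels for the residual step.

```python
import numpy as np

def knn_local_dictionary(A, y, K):
    """Steps 3 and 4 of KNN-SRC: squared Euclidean distance from y to every
    training column, then the K nearest columns form the sub-dictionary."""
    d = np.sum((A - y[:, None]) ** 2, axis=0)   # d_i(y) = ||y - a_i||^2, Eq. (13)
    nearest = np.argsort(d)[:K]                 # ascending sort: closest first
    return A[:, nearest], nearest               # Omega and the kept column indices
```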

3.3 \(l_{2}\) Norm local sparse representation classification algorithm

Besides the \(l_{0}\) and \(l_{1}\) norms, the minimized \(l_{2}\) norm can also be used to estimate x when solving the sparse coefficients in SRC:

$$\begin{aligned} \hat{{x}}_{l_2 } =\mathop {\arg \min }\limits _x \left\| {y-Ax} \right\| _2^2 \end{aligned}$$
(14)

where x is the coefficient vector, y is the test sample, and A is the training sample matrix.

The above equation has a closed-form least squares solution computed via the pseudo-inverse matrix:

$$\begin{aligned} \hat{{x}}_{l_2 } =(A^{T}A)^{-1}A^{T}y \end{aligned}$$
(15)

Since the pseudo-inverse matrix \((A^{T}A)^{-1}A^{T}\) is the same for all y, it only needs to be computed and stored once, after which the coefficients for each y follow from a single matrix-vector product, making the method simple and fast to calculate. However, the coefficient vector \(\hat{{x}}_{l_2} \) is relatively dense: as shown in Fig. 3, it is not as sparse as the \(l_{1}\) norm solution. Nevertheless, the larger coefficients solved with the \(l_{2}\) norm largely coincide with those calculated with the \(l_{1}\) norm.
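
A brief sketch of this caching scheme; `np.linalg.pinv`, an SVD-based pseudo-inverse, is used here as a numerically safer stand-in for the explicit \((A^{T}A)^{-1}A^{T}\) formula.

```python
import numpy as np

# The projection depends only on A: compute it once, offline, and store it.
P = np.linalg.pinv(A)        # equals (A^T A)^{-1} A^T when A^T A is invertible

# Each test sample then costs a single matrix-vector product (Eq. (15)).
x_l2 = P @ y
```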

Fig. 3 Sparse coefficients of test samples

Therefore, the K training samples corresponding to the largest entries of \(|\hat{{x}}_{l_2 } |\) can be combined into a subset \(\Omega =[\omega _1, \ldots , \omega _K ]\). This not only selects the dictionary atoms most likely to be related to the test sample, but also reduces the number of atoms in the new dictionary and hence the subsequent computation; the test sample is finally classified on this dictionary with the minimized \(l_{1}\) norm sparse representation algorithm.

Minimizing the \(l_{2}\) norm does not pinpoint exactly the useful samples, but it approximates them well [27] and, as mentioned earlier, it is fast to compute. Although the coefficients produced by the \(l_{2}\) method are dense, some of their peaks coincide with the nonzero coefficients of the \(l_{1}\) method, that is, with the training images matching the test image's category.

\(l_{2}\) norm local sparse representation classification (\(l_{2}\)-LSRC) uses the least squares method to approximate the minimized \(l_{1}\) norm, obtaining the computational speed of least squares together with the robustness of SRC [28, 29]; in the recognition phase, linear regression approximates SRC. First, the pseudo-inverse \((A^{T}A)^{-1}A^{T}\) is used to quickly compute the coefficient vector \(\hat{{x}}_{l_2 } \); the elements of \(|\hat{{x}}_{l_2}|\) are sorted in descending order, the K samples corresponding to the largest coefficients in \(|\hat{{x}}_{l_2} |\) are selected, and the approximate dictionary matrix \(\Omega \) is built. This smaller dictionary \(\Omega \) is then used as the input of the minimized \(l_{1}\) norm solver, and a new sparse vector \(\alpha \) is calculated. Finally, the sparsity concentration index SCI is computed for the test sample image, and the minimum residual determines the most likely category.

Fig. 4 Five types of grabbing hand samples: a five fingers grabbing, b three fingers grabbing, c two fingers pinching, d single finger hooking, e opening

The steps of the \(l_{2}\) norm local sparse representation classification (\(l_{2}\)-LSRC) method are summarized as follows (an end-to-end sketch follows the list):

Step 1. Input: training sample matrix \(A\in \mathfrak {R}^{m\times n}\), test sample \(y\in \mathfrak {R}^{m\times 1}\);

Step 2. Normalize each column of A and y by the \(l_{2}\) norm;

Step 3. Use the pseudo-inverse \((A^{T}A)^{-1}A^{T}\) to solve the minimized \(l_{2}\) norm problem:

$$\begin{aligned} \hat{{x}}_{l_2 } =(A^{T}A)^{-1}A^{T}y \end{aligned}$$

Step 4. Select the K samples of A corresponding to the largest entries of \(|\hat{{x}}_{l_2 } |\) to form the subset \(\Omega \in \mathfrak {R}^{m\times K}\);

Step 5. Solve the minimized \(l_{1}\) norm problem on the approximate sub-dictionary \(\Omega \):

$$\begin{aligned} \hat{{\alpha }}=\mathop {\arg \min }\limits _\alpha \left\| {y-\Omega \alpha } \right\| _2^2 +\lambda \left\| \alpha \right\| _1 \end{aligned}$$

Step 6. Calculate the residual for each class \(j\in [1,C]\):

$$\begin{aligned} r_j (y)=\left\| {y-\Omega \delta _j (\hat{{\alpha }})} \right\| _2 \end{aligned}$$

Step 7. Calculate the SCI of \(\hat{{\alpha }}\) as in Eq. (9);

Step 8. Output: if \(\hbox {SCI}\ge \tau \), category \(I(y)=\arg \min _j r_j (y)\); otherwise the gesture is invalid.
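
Under the assumptions already noted (a hypothetical `class_index` array mapping columns of A to labels, Lasso as the \(l_{1}\) solver in place of Homotopy, and an illustrative threshold `tau`), the whole pipeline can be sketched end to end:

```python
import numpy as np
from sklearn.linear_model import Lasso

def l2_lsrc(A, y, class_index, K, lam=1e-3, tau=0.6):
    """Sketch of Steps 1-8 of l2-LSRC. Returns the predicted class label,
    or None if the gesture is rejected as invalid."""
    A = A / np.linalg.norm(A, axis=0)                   # Steps 1-2: l2-normalize
    y = y / np.linalg.norm(y)
    x_l2 = np.linalg.pinv(A) @ y                        # Step 3: Eq. (15)
    keep = np.argsort(-np.abs(x_l2))[:K]                # Step 4: K largest |coeffs|
    Omega, labels = A[:, keep], class_index[keep]
    alpha = Lasso(alpha=lam, fit_intercept=False,       # Step 5: Eq. (12)
                  max_iter=10_000).fit(Omega, y).coef_
    classes = np.unique(class_index)
    res = [np.linalg.norm(y - Omega @ np.where(labels == j, alpha, 0.0))
           for j in classes]                            # Step 6: residuals r_j(y)
    l1 = [np.sum(np.abs(alpha[labels == j])) for j in classes]
    C = classes.size                                    # Steps 7-8: SCI, Eq. (9)
    sci = (C * max(l1) / np.sum(np.abs(alpha)) - 1.0) / (C - 1.0)
    return classes[int(np.argmin(res))] if sci >= tau else None
```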

In this algorithm, the coefficients are computed quickly by the least squares method, and the training samples corresponding to the K largest coefficients are selected, which reduces the number of training samples entering the minimized \(l_{1}\) norm solver and removes interfering samples. Although only the inexpensive selection step (Steps 3 and 4) is added, extracting K samples by least squares greatly reduces the computational complexity; especially when K is small, recognition can be effectively accelerated.

4 Simulation experiments on gesture recognition

4.1 Establishment of gesture sample library

In order to verify the recognition effect of the gesture recognition algorithms, gesture images are collected to build the gesture sample libraries, and the influence of each factor on gesture recognition is analyzed. In the experiments, the specified gesture samples are selected, an ellipse model is built in the YCbCr color space to segment the gestures, and Hu invariant moments and HOG features are extracted [30, 31].

(1) Grabbing gesture sample library

A camera is used to collect five types of typical grabbing gestures from five people, namely single finger hooking, two fingers pinching, three fingers grabbing, five fingers grabbing, and opening, as shown in Fig. 4. Twenty pictures of each gesture type are collected per person, i.e., 100 pictures per gesture type and 500 pictures in total for the experiments [32]. Gesture rotation, scale, illumination, and background changes are also considered during collection.

(2) ASL sample library

The ASL sample library contains the 26 letter gestures (the letters j and z are dynamic gestures and are not considered here), as shown in Fig. 5. They were collected from five individuals with a Kinect under the same illumination and scene. Each operator provides a color picture and a depth picture per letter; 20 color pictures are collected for each of the 24 letters, giving 2400 color pictures for the experiments [33]. Part of the color pictures are shown in Table 1.

Fig. 5 ASL finger spelling alphabet

Table 1 Part of the ASL gesture sample

4.2 Effect of parameter K on algorithm recognition rate

The \(l_{2}\)-LSRC and KNN-SRC algorithms use local sparse representation to select the K training samples most relevant to the test sample for classification, where \(K<n\); excluding unrelated training samples reduces the computational complexity while maintaining the robustness of the SRC algorithm. In this paper, we experimentally evaluate the influence of different K values on the recognition rate to select the best K. In the experiments, 50 training samples and 50 test samples are selected per class, principal component analysis (PCA) reduces the features to 100 dimensions, and the Homotopy algorithm is used to solve the minimized \(l_{1}\) norm problem. The experimental results are shown in Figs. 6 and 7.

The change of recognition rate with K on the grabbing gesture library is shown in Fig. 6. The recognition rate of the SRC algorithm does not depend on K and is stable at 91.4%, serving as the reference for the other two algorithms. As K increases, more relevant samples are included and the recognition rates of the \(l_{2}\)-LSRC and KNN-SRC algorithms gradually rise. The recognition rate of the \(l_{2}\)-LSRC algorithm is almost unchanged for \(K>125\), at about 90%, close to that of SRC; for \(K>150\), the KNN-SRC recognition rate is almost unchanged at about 88%, also close to SRC. Therefore, on the grabbing gesture sample library the best K is 125 for \(l_{2}\)-LSRC and 150 for KNN-SRC. For small K, the recognition rate of the \(l_{2}\)-LSRC algorithm is higher than that of KNN-SRC, because the samples selected by minimizing the \(l_{2}\) norm are more relevant than those selected by KNN.

Fig. 6 Change of recognition rate with K on the grabbing gesture library

Fig. 7 Change of recognition rate with K on the ASL sample library

The change of recognition rate with K on the ASL sample library is shown in Fig. 7. The recognition rate of the SRC algorithm is not affected by K and is stable at 94.2%, while the recognition rates of the \(l_{2}\)-LSRC and KNN-SRC algorithms increase with K. When K varies within [500, 1000], the recognition rate of the \(l_{2}\)-LSRC algorithm is relatively stable at about 93%; when K varies within [600, 1000], the recognition rate of the KNN-SRC algorithm is relatively stable at about 92%; both are close to SRC. Therefore, on the ASL sample library the best K is 500 for \(l_{2}\)-LSRC and 600 for KNN-SRC. Again, for small K the recognition rate of the \(l_{2}\)-LSRC algorithm is higher than that of KNN-SRC.

4.3 Comparison of the performance of algorithms

The SRC, KNN-SRC, and \(l_{2}\)-LSRC algorithms are compared in terms of recognition rate and average running time. In the experiment, 50 training samples and 50 test samples were randomly selected from each class of the grabbing library, and PCA dimensionality reduction was applied after feature extraction. The parameters of each algorithm were set to \(K=125\) for \(l_{2}\)-LSRC and \(K=150\) for KNN-SRC. The experimental results are shown in Figs. 8 and 9.

It can be seen from Fig. 8 that the recognition rate of each algorithm increases with the dimension: when the dimension is low the recognition rate changes sharply and the algorithms differ markedly, while at high dimension the curves flatten and the differences shrink. At the same dimension, SRC has a higher recognition rate than the \(l_{2}\)-LSRC and KNN-SRC algorithms. At low dimension, the recognition rate of \(l_{2}\)-LSRC is higher than that of KNN-SRC, with both below SRC; when the dimension exceeds 100, the recognition rates of the \(l_{2}\)-LSRC and KNN-SRC algorithms approach that of SRC, at about 91%. It can be seen from Fig. 9 that the average running time of each algorithm increases with the dimension. SRC has the highest average running time, ranging from 0.105 s to 0.195 s, followed by the \(l_{2}\)-LSRC and KNN-SRC algorithms, whose average running times lie in [0.051 s, 0.119 s]; the average running time of the \(l_{2}\)-LSRC algorithm is lower than that of KNN-SRC.

Fig. 8 Change of recognition rate with dimension on the grabbing gesture library

Fig. 9 Change of average running time with dimension on the grabbing gesture library

To further compare algorithm performance, 50 training samples and 50 test samples were randomly selected from each class of the ASL sample library, and PCA was applied after feature extraction to reduce the dimension. PCA expresses high-dimensional data in a low-dimensional subspace, mainly by projecting the original sample data into a low-dimensional space through a linear transformation. With \(K=500\) for the \(l_{2}\)-LSRC algorithm and \(K=600\) for the KNN-SRC algorithm, the experimental results are shown in Figs. 10 and 11.
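
A short sketch of this reduction step with scikit-learn; `train_feats` and `test_feats` are hypothetical arrays of shape (n_samples, n_features) holding the extracted Hu moment and HOG features.

```python
from sklearn.decomposition import PCA

# Fit the projection on training features only, then reuse it for test features.
pca = PCA(n_components=100)          # 100 dimensions, as in the experiments
train_low = pca.fit_transform(train_feats)
test_low = pca.transform(test_feats)
```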

Fig. 10 Change of recognition rate with dimension on the ASL sample library

Fig. 11 Change of average running time with dimension on the ASL sample library

Table 2 Comparison of the performance of algorithms on the grabbing gesture library
Table 3 Comparison of the performance of algorithms on the ASL sample library

It can be seen from Figs. 10 and 11 that the recognition rate and the average computation time again increase with the dimension. Because more samples were selected from the ASL sample library, both the recognition rate and the average running time are higher than on the grabbing gesture library. Figure 10 shows that the SRC algorithm has the highest recognition rate; at low dimension, the recognition rate of the \(l_{2}\)-LSRC algorithm is higher than that of KNN-SRC, with both below SRC, and when the dimension exceeds 100 the recognition rates of the \(l_{2}\)-LSRC and KNN-SRC algorithms approach that of SRC, at about 94%. Figure 11 shows that the average running time of the SRC algorithm ranges from 0.282 s to 0.355 s, while the average running times of the \(l_{2}\)-LSRC and KNN-SRC algorithms lie in [0.121 s, 0.214 s]; the \(l_{2}\)-LSRC algorithm is better than KNN-SRC in computation time.

The detailed results and the performance comparison of the algorithms on the two gesture sample libraries are given in Tables 2 and 3. According to the data in the tables, considering both the recognition rate and the average running time, the \(l_{2}\)-LSRC algorithm outperforms the KNN-SRC algorithm. Both \(l_{2}\)-LSRC and KNN-SRC select the K most relevant training samples rather than all training samples, so their recognition rates differ little from SRC, yet because \(K<n\) their average running times are much lower than SRC's; they thus improve efficiency while preserving accuracy. Moreover, the two algorithms choose the K samples differently: KNN-SRC must compute the distances to all training samples for every test sample, whereas \(l_{2}\)-LSRC only needs to compute the pseudo-inverse matrix once, and the samples it selects via the \(l_{2}\) norm are more relevant. Hence the \(l_{2}\)-LSRC algorithm is better than the KNN-SRC algorithm in both recognition rate and computation time.

5 Conclusions

Aiming at the high computational complexity of \(l_{1}\) norm solving algorithms, this paper proposed the \(l_{2}\) norm local sparse representation classification algorithm by introducing the idea of local sparse representation: the local dictionary is chosen by minimizing the \(l_{2}\) norm, and the sparse coefficients are then solved over this dictionary by minimizing the \(l_{1}\) norm. To verify the validity of the algorithm, the influence of the parameter K on the recognition rate was examined on two gesture libraries, the optimal K was selected, and the recognition rate and average running time of each algorithm were compared. The experimental results show that, using the idea of local sparse representation, the \(l_{2}\)-LSRC algorithm effectively reduces the computation time while maintaining the recognition rate, and its performance is slightly better than that of the KNN-SRC algorithm.