1 Introduction

Vision-based hand gesture recognition is attracting growing interest because it offers one of the most natural forms of human-machine interaction. Vision-based methods are widely adopted in research because of their low computational complexity. This paper presents research on the recognition of Indian Sign Language (ISL). ISL is more complex than American Sign Language because most of its signs are made with two hands. The typical stages of sign language recognition are preprocessing, feature extraction, and classification. Preprocessing consists of converting the color image to grayscale. Feature extraction derives features such as shape, geometric, statistical, and texture features from an image. The shape of the hand can be used to identify the gesture, which is termed shape identification. The contour of the hand defines its shape, so extracting the contour provides more information for shape detection. Classification techniques such as linear classifiers, KNN, SVM, and neural networks can then be applied for recognition.

This paper focuses on creating a dedicated database with a simple web camera, extracting features by combining the histogram of oriented gradients (HOG) with Gabor features, and classifying them with a support vector machine (SVM) and K-nearest neighbor (KNN). Gabor filters are used in image processing because of their mathematical and biological properties (Guptaa et al. 2012). The dimension of the generated features depends on the parameters selected for the Gabor filter, which is designed by choosing an orientation, bandwidth, and frequency. HOG features give the spatial distribution of local intensity gradients. Because they capture edge information, they describe hand gestures well, and the shape of a hand gesture can therefore be extracted using HOG.

2 Related Work

Hand gesture recognition is one form of interaction between human and computer used to realize a given application. A real-time hand gesture recognition system was implemented (Kishore and Rajesh Kumar 2012) with an accuracy of 96%, using a combination of color and texture features with fuzzy logic for classification (Nandy et al. 2010). Indian sign language recognition was implemented by evaluating the mean feature of the gradient histogram, with Euclidean distance used for recognition, and was applied to controlling a humanoid robot. The Gabor filter is a linear filter that gives the best localization characteristics when its bandwidth, frequency, and orientation are tuned (Huang et al. 2010). In (Zhao et al. 2010), the extracted HOG features were projected into a low-dimensional subspace using PCA-LDA and classified with the nearest neighbor classifier, achieving a recognition accuracy of 91% in real time. In (Teoh and Branunl 2015), the authors used both HOG and Gabor features for vehicle detection with three different classifiers: SVM, a multilayer perceptron neural network, and a distance classifier. They obtained the best performance, with less processing time, using HOG and SVM. The authors of (Sheenu et al. 2015) used the HOG method followed by sequential minimal optimization and reported a recognition rate of 93.12%.

3 Methodology

The proposed methodology is a novel method for ISL recognition in which Gabor features and HOG features are combined to form a feature vector. Because the resulting feature vector has a high dimension, PCA is used to reduce its dimension before it is passed to the classifier for recognition. The classifiers used are SVM and KNN. Figure 1 shows the proposed methodology for ISL recognition.

Fig. 1 Proposed methodology for ISL recognition

3.1 Feature Extraction by HOG

HOG features are widely used for object detection. The image is divided into small square cells, a histogram of oriented gradients is computed for each cell, the result is normalized block-wise, and a descriptor is returned for each cell. The HOG descriptor assumes that local object appearance and shape within an image can be described by the distribution of intensity gradients or edge directions. The descriptor is implemented by dividing the image into small connected regions called cells and, for each cell, computing a histogram of gradient directions (i.e., edge orientations) over the pixels within the cell. The combination of these histograms then represents the descriptor (Savaris and von Wagenheim 2010). The shape features are evaluated in five steps: color normalization of the input image, computation of the horizontal and vertical gradients, formation of spatial blocks, bin-wise accumulation of orientations, and formation of the feature vector. The gradient magnitude and orientation are calculated as in Eqs. (1) and (2), respectively, where gx and gy are the horizontal and vertical gradients.

$$ G = \sqrt {gx^{2} + gy^{2} } $$
(1)
$$ \theta = \tan^{ - 1} \frac{gy}{{gx}} $$
(2)
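The per-pixel computation in Eqs. (1) and (2), followed by the per-cell histogram, can be sketched as below. This is a minimal illustration in Python (the authors report using Matlab), not the implementation used in the paper. The central-difference gradient, the clamped borders, the folding of orientations into [0, 180) degrees, and the function names are all assumptions made for the example.

```python
import math


def gradient_magnitude_orientation(image):
    """Return per-pixel gradient magnitude (Eq. 1) and orientation in degrees (Eq. 2)."""
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    orientation = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Assumption: central differences, with border pixels clamped.
            gx = image[r][min(c + 1, cols - 1)] - image[r][max(c - 1, 0)]
            gy = image[min(r + 1, rows - 1)][c] - image[max(r - 1, 0)][c]
            magnitude[r][c] = math.hypot(gx, gy)
            # atan2 avoids division by zero when gx == 0; folding the angle
            # into [0, 180) (unsigned orientation) is an assumed convention.
            orientation[r][c] = math.degrees(math.atan2(gy, gx)) % 180.0
    return magnitude, orientation


def cell_histogram(magnitude, orientation, n_bins=9):
    """Accumulate a magnitude-weighted orientation histogram for one cell."""
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for mag_row, ang_row in zip(magnitude, orientation):
        for m, a in zip(mag_row, ang_row):
            # Each pixel votes for a single bin (no interpolation between bins).
            hist[min(int(a // bin_width), n_bins - 1)] += m
    return hist


# A small patch with a vertical edge: all gradient energy is horizontal.
patch = [[0, 0, 10, 10]] * 3
mag, ang = gradient_magnitude_orientation(patch)
print(cell_histogram(mag, ang))
```

In this sketch each pixel votes for one bin only; practical HOG implementations usually interpolate votes between neighboring bins and normalize over blocks.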

3.2 Feature Extraction by Gabor Filter

The Gabor filter is a linear filter used for object edge detection. The Gabor transform has strong frequency and orientation selectivity, so edge features can be extracted with it. The Gabor filter gives the best joint resolution in the spatial and frequency domains and is therefore recognized as a very useful tool in computer vision and image processing (Huang et al. 2010). Its bandwidth, frequency, and orientation are varied to obtain the best local features. The features are extracted by convolving the Gabor kernel with the input image. A 2-D Gabor filter kernel over the image coordinates (x, y) is defined as in Eq. (3).

$$ G\left( {x,y,\theta ,\lambda ,\varphi ,\sigma ,\gamma } \right) = \exp \left\{ { - \frac{1}{2}\left( {\frac{{x^{\prime 2} }}{{\sigma_{{x^{\prime}}}^{2} }} + \frac{{y^{\prime 2} }}{{\sigma_{{y^{\prime}}}^{2} }}} \right)} \right\}\cos \left( {\frac{{2\pi x^{\prime}}}{\lambda } + \varphi } \right) $$
(3)

where \(x^{\prime} = x\sin \theta + y\cos \theta\) and \(y^{\prime} = x\cos \theta - y\sin \theta\). The kernel \(G(x, y, \theta, \lambda, \varphi, \sigma, \gamma)\) is a function of the wavelet parameters \(\theta\), \(\lambda\), \(\varphi\), \(\sigma\), and \(\gamma\). \(\theta\) is the orientation of the Gabor function, varied between 0° and 360°. \(\lambda\) is the wavelength of the cosine factor of the Gabor kernel, referred to as the wavelength of the filter. \(\varphi\) is the phase shift of the Gabor function in degrees. \(\sigma\) sets the width of the Gaussian envelope, and \(\gamma\) is the spatial aspect ratio, which specifies the ellipticity of the Gabor function. The features are extracted by convolving the image with the Gabor kernel, as in Eq. (4).

$$ G\left( {x,y,\theta ,\lambda ,\varphi ,\sigma ,\gamma } \right)\left( {x,y} \right) = I\left( {x,y} \right)*G\left( {x,y,\theta ,\lambda ,\varphi ,\sigma ,\gamma } \right) $$
(4)

where I(x, y) is the image.
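As an illustration of Eqs. (3) and (4), the sketch below builds a Gabor kernel and convolves it with an image. It is a hedged example in Python, not the authors' Matlab code. It assumes the usual reading of Eq. (3), that is, a Gaussian envelope with separate standard deviations along x′ and y′ multiplied by the carrier cos(2πx′/λ + φ), and it assumes that y′ = x cos θ − y sin θ. The names `gabor_kernel`, `convolve_same`, `half_width`, `sigma_x`, and `sigma_y`, as well as the zero padding, are choices made for the example.

```python
import math


def gabor_kernel(half_width, theta_deg, lam, phi_deg, sigma_x, sigma_y):
    """Sample the Gabor function on a (2*half_width + 1)^2 grid centered at 0."""
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    kernel = []
    for y in range(-half_width, half_width + 1):
        row = []
        for x in range(-half_width, half_width + 1):
            # Rotated coordinates, following the definitions given in the text.
            xp = x * math.sin(theta) + y * math.cos(theta)
            yp = x * math.cos(theta) - y * math.sin(theta)
            envelope = math.exp(-0.5 * (xp ** 2 / sigma_x ** 2 + yp ** 2 / sigma_y ** 2))
            row.append(envelope * math.cos(2 * math.pi * xp / lam + phi))
        kernel.append(row)
    return kernel


def convolve_same(image, kernel):
    """2-D convolution (Eq. 4) with zero padding; output has the image's size."""
    rows, cols = len(image), len(image[0])
    size = len(kernel)
    half = size // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(size):
                for j in range(size):
                    # Index flip makes this a true convolution, not a correlation.
                    rr, cc = r + half - i, c + half - j
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += image[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


# Hypothetical parameter values, chosen only to exercise the functions.
kernel = gabor_kernel(half_width=2, theta_deg=45, lam=4.0, phi_deg=0,
                      sigma_x=2.0, sigma_y=2.0)
image = [[(r + c) % 7 for c in range(8)] for r in range(8)]
response = convolve_same(image, kernel)
features = [value for row in response for value in row]  # flatten to a vector
print(len(features))
```

Because the response has the same size as the input, flattening it gives one coefficient per pixel, so a 130 × 130 image would yield a vector of 16,900 coefficients under this assumption.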

The HOG features and Gabor features are finally combined to form the feature vector. The two sets of features are concatenated, so the length of the feature vector is the sum of their lengths.
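The combination step, together with the PCA reduction mentioned at the start of this section, can be sketched as follows. This is a toy illustration in pure Python under stated assumptions, not the authors' procedure: the principal components are found by power iteration with deflation, and the function names, the starting vector, and the iteration count are hypothetical choices. For feature vectors as long as those used in practice, an optimized linear algebra routine would be needed instead.

```python
import math


def concatenate(hog_features, gabor_features):
    """Join the two descriptors into one feature vector."""
    return list(hog_features) + list(gabor_features)


def pca_project(samples, n_components, iterations=200):
    """Project samples onto their leading principal components."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    # Sample covariance matrix (d x d).
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    components = []
    for _ in range(n_components):
        # Assumed starting vector; it must not be orthogonal to the component.
        v = [1.0 + 0.1 * j for j in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                         for a in range(d))
        components.append(v)
        # Deflation: remove the found component before searching for the next.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return [[sum(row[j] * comp[j] for j in range(d)) for comp in components]
            for row in centered]


# Hypothetical HOG and Gabor parts for four samples.
samples = [concatenate([t * 1.0], [t * 2.0, t * 2.0]) for t in range(4)]
reduced = pca_project(samples, n_components=1)
print(reduced)
```

The reduced vectors, rather than the full concatenated ones, would then be passed to the classifier.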

4 Results

4.1 Results of HOG and Gabor

Matlab is used for the implementation. For a single block, the HOG vector obtained is of size 2 × 2 × 9, i.e. 36 × 1. With a block size of 2, there are 9 matrices of size 2 × 2: the 8 × 8 cells are grouped into overlapping 2 × 2 blocks, and the 9 bins hold the gradients of each cell. The final feature vector is the combination of the HOG and Gabor features. This combination provides a feature matrix that represents the signs of Indian sign language. The HOG extraction method applied to a grayscale image resized to 130 × 130 pixels results in a feature vector of size 2700 × 1. For HOG extraction, the cell size is 8, the block size is 2, and the bin size is 3. The Gabor coefficients obtained after convolving the Gabor kernel with the sign image are of size 16,900 × 1. Figures 2 and 3 show the results for the signs “0” and “A”.
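The reported vector sizes can be checked with a short calculation. The sketch below is an assumption-laden reconstruction rather than the authors' code: it assumes square images, blocks that advance by one cell at a time, and a Gabor response of the same size as the image. The function names are hypothetical. Under these assumptions the figures of 36, 2700, and 16,900 are reproduced.

```python
def hog_length(image_size, cell_size, block_size, n_bins, block_stride=1):
    """Length of a HOG vector for a square image.

    Assumption: blocks of block_size x block_size cells slide by
    block_stride cells, and partial cells at the border are discarded.
    """
    cells_per_side = image_size // cell_size
    blocks_per_side = (cells_per_side - block_size) // block_stride + 1
    return blocks_per_side ** 2 * block_size ** 2 * n_bins


def gabor_length(image_size):
    """Length of a flattened Gabor response of the same size as the image."""
    return image_size * image_size


per_block = hog_length(16, 8, 2, 9)   # one 2 x 2 block of 8 x 8 cells, 9 bins
hog = hog_length(130, 8, 2, 3)        # 15 x 15 blocks, 4 cells each, 3 bins
gabor = gabor_length(130)
print(per_block, hog, gabor, hog + gabor)
```

With these settings the concatenated vector would have 2700 + 16,900 = 19,600 entries before any dimension reduction.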

Fig. 2 Results of combined HOG and Gabor features for sign “0”

Fig. 3 Results of combined HOG and Gabor features for sign “A”

The performance of each classifier is assessed by evaluating the accuracy from its confusion matrix. Table 1 shows the confusion matrix for SVM and Table 2 shows it for KNN. The average accuracy obtained is 83.92% with SVM and 84.92% with KNN for K = 3. Because very little research has been carried out on ISL and no existing method is based on combined features, a comparison with existing work could not be made.

Table 1 Confusion matrix by combined HOG and Gabor features with SVM
Table 2 Confusion matrix by combined HOG and Gabor features with KNN
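The evaluation described above can be sketched as a K-nearest-neighbor classifier with K = 3, followed by a confusion matrix and the accuracy derived from it. This is a minimal Python illustration, not the authors' Matlab code. The two-dimensional feature vectors and labels below are hypothetical and do not come from the paper's database, and the Euclidean distance and the tie-breaking rule are assumptions.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query by majority vote among its k nearest training samples."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    # On a tied vote, the label of the closest tied neighbor wins.
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def overall_accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


def mean_class_accuracy(matrix):
    """Average of the per-class fractions of correctly recognized samples."""
    rates = [row[i] / sum(row) for i, row in enumerate(matrix) if sum(row) > 0]
    return sum(rates) / len(rates)


# Hypothetical reduced feature vectors for two signs.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["0", "0", "0", "A", "A", "A"]
test_x = [[0.2, 0.1], [5.5, 5.2], [1, 1]]
test_y = ["0", "A", "A"]

predicted = [knn_predict(train_x, train_y, q, k=3) for q in test_x]
matrix = confusion_matrix(test_y, predicted, classes=["0", "A"])
print(matrix, overall_accuracy(matrix), mean_class_accuracy(matrix))
```

The two accuracy functions differ only when the classes have unequal numbers of test samples; the text does not state which of the two the reported average accuracy refers to.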

5 Conclusion

Feature extraction combining HOG and Gabor was adopted to increase the accuracy and reduce the complexity of the system, although the average accuracy obtained is only 83.92% with SVM and 84.92% with KNN. Although Gabor is a robust technique, its accuracy decreases because the filter output depends on many parameters. Recognition of ISL is a challenging task because the signs in ISL are complex. In future work, the technique can be extended to recognize sentences and to generate audio output for the recognized gestures.