1 Introduction

Gender classification is a crucial visual task in human-computer interaction (HCI). In the field of HCI, computers or robots recognize the gender of a human in order to improve system performance based on that gender. Other applications of gender classification include security and surveillance systems, content-based image and video retrieval, access control, demographic data collection, forensics, consumer behavior analysis and marketing, audience measurement and reporting, mobile applications and video games. One of the fascinating applications is gender medicine, which aims to recognize and analyze the biological, physiological, anatomical and functional responses of each gender with respect to disease, pharmaceuticals and treatment. The diverse biological and physiological nature of men and women has an impact on their health and disease.

Fig. 1

Face—a cue for gender classification with more challenges (make-up (a), expression (b, g), ethnicity (a), occlusion (c, f), illumination (e), pose (h), race (d), and age (c, d and f))

The gender of a person can be determined from the biology or physiology of the human body [1]. Biological features include biometric information such as fingerprint, iris and palm print, and bio-signal features such as EEG, ECG and DNA. Physiological features include face, hand shape, body shape, gait and gesture, as well as apparel information such as clothing, hair style and footwear. Gender classification by the biological approach may not be suitable for HCI, since the extraction of biological features requires human co-operation. Humans can easily determine the gender of a person by looking at their face, body shape, speech and gait, but this is a challenging task in computer vision due to various intrinsic and extrinsic factors associated with these cues. A large number of psychophysical studies have examined gender classification from face information. Facial images are probably the most common biometric characteristic used for gender classification, because the face is non-intrusive as a visual cue. Though the face is an important cue for gender classification, it poses many challenges due to variations in pose, expression, aging, race, make-up, occlusion and illumination, as shown in Fig. 1. The objective of the proposed work is to classify a person as male or female from the facial image irrespective of these constraints.

Recently, research has focused on gender classification from faces. Gender classification is considered a two-class pattern recognition problem, since it classifies a person as either male or female. It involves three modules, namely face detection, feature extraction and classification, with training and testing phases. The classification accuracy mainly depends on the feature extraction technique, since it provides the representation of the face region. Based on feature extraction, the literature can be categorized into geometric and appearance based approaches.

Geometric based approach In this approach, the fiducial points of the face (i.e. eyes, nose and mouth) are localized and their relative positions, widths, distances and ratios are measured. In [2], the distance between the two eyes, the distance from the eyes to the nose tip, the nose tip width and their ratios are calculated for gender classification. However, this approach requires accurate and reliable tracking and detection techniques, which incur extra computation to localize the various facial regions, and it does not consider other useful information such as shape and texture. Hence, it is not suitable for all circumstances, such as different poses.

Appearance based approach Appearance-based approaches rely on transformations performed on the pixels of an image to provide a representative description of facial regions or the whole face. In the holistic approach, the color, shape and texture of the whole face image are considered for gender classification. If feature extraction considers the whole face instead of smaller blocks of the face, the features are said to be global features; otherwise they are called local features. The feature vector obtained from these approaches should be a discriminative representation of the face image.

In [3], the combination of local and global features has also been utilized for gender classification. The size of the resulting feature vector becomes very high, and to overcome the curse of dimensionality, principal component analysis (PCA) and linear discriminant analysis (LDA) were employed to reduce the feature size. The feature vector extracted through global methods possesses geometric details but is sensitive to variations in faces, including view and expression [4]. It is observed that local features outperform global features. The literature reveals that the combination of more than one feature can increase classification performance, as the features complement one another's advantages; this leads to hybrid features.

Recent efforts have been made by researchers to develop gender classification algorithms [2, 5,6,7,8,9,10] that perform well on unconstrained face images with variations of pose, occlusion, illumination, etc. In this perspective, the use of local descriptors such as Gabor jets [8], SURF [11], SIFT [12], PHOG [13], HOG [14] and histograms of local binary patterns (HLBP) [15] is prevalent. Most of these local feature descriptors are invariant to image transforms, distinctive for recognition and robust against occlusion, expression variation, pose variation, noise and illumination when compared to traditional holistic algorithms. Therefore, local descriptors are widely adopted for gender recognition to improve the recognition rate.

Lowe [16] proposed the widely accepted SIFT algorithm, which includes both feature detection and description stages. It is basically a rotation and scale invariant feature. The SIFT descriptor is built by describing the local gradient information around a detected interest point. Dense SIFT (D-SIFT) is computed at every pixel, or at every kth pixel. Wang et al. [17] extracted dense SIFT descriptors at regular image grid points and combined them with global shape contexts of the face, adopting AdaBoost for gender classification. In [18], the combination of local features such as dense SIFT descriptors and Gabor features is used for gender classification.

Ojala et al. [19] introduced local binary patterns (LBP) for grayscale and rotation invariant texture classification. Lian and Lu [20] used LBP of equally sized blocks with an SVM classifier for multi-view gender classification. Yang and Ai [21] applied LBP for classifying age, gender and ethnicity. Alexandre [22] combined LBP with intensity and shape features in a multiscale fusion approach; classifiers were trained for each feature and at different image sizes, and their decisions were fused by majority voting. The local binary pattern finds its significance in gender classification due to its ability to describe facial textures. Simple LBP is a global feature and therefore misses the local information of faces. Ahonen et al. [23] proposed to split face images into several local regions, obtain a local LBP histogram for each region, and concatenate them all to form a spatially enhanced LBP feature histogram. The resultant histogram exposes both the local texture and the global shape of face images.

Dalal and Triggs [24] proposed the histogram of oriented gradients (HOG) for human detection, which divides the object into many fixed-size blocks, computes the HOG of each block, and represents the object by a concatenation of the blocks' HOG vectors. It captures the edge or gradient structure of the given image, representing the local shape information with an easily controllable degree of invariance to local geometric and photometric transformations. The HOG feature is widely used in many applications, including human detection [25, 26], face recognition [27], object detection [28] and emotion recognition [29].

The literature emphasizes that hybrid features can impart high performance and robustness to a gender classification technique. It is highly important to choose appropriate features to describe faces: the features should be universal, distinctive and permanent for gender classification. In the case of geometric features, extra computation is required to locate the facial components, and performance relies heavily on the accuracy of facial component detection. Hence, it is better to adopt an appearance based approach. For gender classification, the visual cues of the human face play vital roles and are represented by color, shape and texture. Color features are not suitable under different races and make-up conditions; hence, it is preferable to rely on shape and texture features for classifying gender. Among the existing shape and texture features, HOG and LBP are found to provide good results in many pattern recognition scenarios. SLBP is an enhanced form of LBP in which the local information of facial regions is retained and which is robust to monotonic illumination variations, while HOG is robust to local pose variations. In this research work, a hybrid appearance based approach fuses two complementary features, namely the spatially enhanced local binary pattern (SLBP) and the histogram of oriented gradients (HOG), to represent the face region more efficiently, and it is implemented with a support vector machine (SVM) classifier. This work is limited to male and female gender classification and does not include transgender.

The major contributions of this paper are

  • Performance analysis of gender classification with individual features and hybrid features.

  • Selection of hybrid features, namely SLBP and HOG, for gender classification, where these features are complementary to one another.

  • Comparative assessment of the proposed hybrid technique with state-of-the-art methods on benchmark databases.

Fig. 2

Flow diagram of the proposed hybrid gender classification technique

The rest of the paper is organized as follows. Section 2 discusses the methodology of the proposed work. In Sect. 3, the results of the proposed work are elaborated and a comparison against state-of-the-art methods is presented. Section 4 concludes and provides future directions.

2 Proposed methodology

The proposed method is developed to distinguish the gender of a person as either male or female from facial cues in a captured image. The overall flow diagram of the proposed method is shown in Fig. 2. The method comprises four major steps, namely face detection, preprocessing, feature extraction and classification. Face detection is the first step; it selects the region of interest so as to remove unwanted parts such as the neck, hands and surroundings. This is accomplished through the Viola-Jones [30] algorithm using Haar-like features and AdaBoost. After successful detection of the face region, preprocessing such as color conversion and image resizing is performed. Spatially enhanced local binary pattern (SLBP) and HOG features are extracted from the preprocessed image. These features are fused [31] to obtain a hybrid feature vector for classification. In the training phase, the features of the training images are extracted and the classification model is built with the help of their true class labels. When a query image is fed as input to the gender classification system, the SLBP and HOG features are extracted and the image is identified as either male or female. The advantage of using the hybrid feature over the individual features is analyzed through their performance on gender classification. Also, three different classifiers are implemented and their performance is compared to select the most suitable classifier.
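
As a concrete illustration of these first two steps, the following is a minimal sketch of face detection and preprocessing, assuming OpenCV's bundled Haar-cascade frontal-face model as a stand-in for the Viola-Jones detector and a \(64 \times 64\) grayscale ROI; the function name and parameter values are illustrative and not necessarily the exact configuration used in the experiments.

```python
# Minimal sketch: Viola-Jones-style face detection + preprocessing (assumed setup).
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_and_preprocess(image_bgr, roi_size=(64, 64)):
    """Detect the largest face, crop it, convert to grayscale and resize."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                      # no face found in this image
    # keep the largest detection as the region of interest
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    roi = gray[y:y + h, x:x + w]
    return cv2.resize(roi, roi_size)     # uniform size -> uniform feature length
```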

2.1 Feature extraction

2.1.1 Spatial local binary pattern

The local binary pattern has been employed in numerous applications, such as computer animation, face image analysis and surveillance. LBP is a very important descriptor because of its robustness to illumination, low computational complexity and capability to capture fine details. The simple LBP descriptor provides global details of the image and lacks local details. In order to consider the local details, the input image is divided into several blocks and the local binary pattern descriptor is obtained for each block to develop an enhanced LBP feature, namely SLBP. It uses the local binary pattern descriptor to build several local descriptions of the face and stacks these local details into a global description. The resulting local feature performs better than holistic techniques irrespective of variations in pose or lighting. Thus, the facial image is divided into various local regions and LBP texture descriptors are obtained from each local region separately. The descriptors are then concatenated to form a global representation of the face. SLBP encodes both the appearance and the spatial relations of facial regions. The computation of the spatially enhanced local binary pattern is shown in Fig. 3, where an image is split into 4 \(\times \) 4 blocks and the histograms of the LBP feature (\(\uppsi _{\mathrm{ij}}\)) with 58 bins of each block are concatenated to form the SLBP feature.

In a block, the center pixel at (\({\hbox {x}}_{\mathrm{c}}\), \({\hbox {y}}_{\mathrm{c}}\)) is compared against each of its 8 neighbors; if the center value is greater than or equal to the neighbor value, '1' is assigned, else '0'. The resulting 8 bits represent the local binary pattern of that pixel.

$$\begin{aligned} \varsigma ({x_c, y_c}) = \sum _{n=0}^7 {\phi ({i_c -i_n})} {*}2^{n} \end{aligned}$$
(1)

where 'n' indexes the 8 neighborhood pixels, '\({\hbox {i}}_{\mathrm{c}}\)' and '\({\hbox {i}}_{\mathrm{n}}\)' are the gray values of the center and neighbor pixels, and the function \(\upphi \)(x) is given as

$$\begin{aligned} \phi (x)=\left\{ {{\begin{array}{ll} 1&{} \,{x\ge 0} \\ 0&{} \,{x<0} \\ \end{array} }} \right. \end{aligned}$$
(2)
Fig. 3

Spatial local binary pattern

Likewise, every pixel in a block is assigned a pattern, and constructing their histogram yields the histogram of LBP. The histogram of LBP of a block is represented as (\(\uppsi _{\mathrm{ij}}\)).

$$\begin{aligned} \psi _{{\textit{ij}}} = p (\varsigma ); \quad 0\le \varsigma \le 255 \end{aligned}$$
(3)

where p(.) denotes the probability of occurrence and (i,j) denotes the position of the block in the image. The concatenation of the histograms of LBP obtained from each block constitutes the spatially enhanced local binary pattern of the face image, which is represented as in Eq. (4),

$$\begin{aligned} \Psi = [{\psi _{11}} | {\psi _{12}} | \ldots | {\psi _{44}}] \end{aligned}$$
(4)
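
The following is a minimal sketch of the SLBP computation in Eqs. (1)–(4), using plain 256-bin LBP histograms per block over a 4 \(\times \) 4 grid as illustrated in Fig. 3; the 58-bin variant mentioned above corresponds to keeping only the uniform patterns and would change only the histogram binning. The grid size, bin count and helper names are assumptions for illustration.

```python
# Minimal sketch of SLBP (Eqs. 1-4), assuming a grayscale face as a NumPy array.
import numpy as np

def lbp_codes(block):
    """LBP code per interior pixel: compare the centre with its 8 neighbours (Eqs. 1-2)."""
    b = block.astype(int)
    h, w = b.shape
    centre = b[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(centre)
    for n, (dy, dx) in enumerate(offsets):
        neigh = b[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += ((centre - neigh) >= 0) * (2 ** n)   # phi(i_c - i_n) * 2^n, Eq. (1)
    return codes

def slbp(face, grid=(4, 4), n_bins=256):
    """Concatenate per-block LBP histograms (psi_ij) into the SLBP vector Psi (Eqs. 3-4)."""
    bh, bw = face.shape[0] // grid[0], face.shape[1] // grid[1]
    hists = []
    for i in range(grid[0]):
        for j in range(grid[1]):
            block = face[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            hist, _ = np.histogram(lbp_codes(block), bins=n_bins, range=(0, n_bins))
            hists.append(hist / max(hist.sum(), 1))   # normalised histogram -> p(.)
    return np.concatenate(hists)
```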

The SLBP feature vector describes the face at three different levels, for the following reasons:

  • LBP retrieves information about the patterns at pixel level.

  • LBP combined over a small local region (within a block) provides information at regional level.

  • Regional level LBP histograms concatenated to form SLBP give a global representation of the face.

The SLBP feature captures the texture details of the face but fails to retrieve the structure of facial regions. Therefore, the facial structural details are extracted using the HOG descriptor.

2.1.2 Histogram of oriented gradients (HOG) descriptor

The histogram of oriented gradients (HOG) is a shape descriptor that captures the local appearance and shape of an object. HOG represents the distribution of intensity gradients over a region and is used to detect objects in image processing and computer vision. The HOG descriptor counts the occurrences of gradient orientations in localized portions of an image, such as a detection window or region of interest (ROI).

As shown in Fig. 4, the input image is divided into small connected regions called blocks, each block is further divided into cells, and a histogram of the gradient directions (with 31 bins) or edge orientations of the pixels within each cell is computed. The gradient values are computed with the kernel functions along the horizontal (\({\hbox {k}}_{\mathrm{x}}\)) and vertical (\({\hbox {k}}_{\mathrm{y}}\)) directions,

$$\begin{aligned} k_x= & {} \left[ {{\begin{array}{lll} {-1}&{} 0&{} 1 \\ \end{array} }} \right] \quad k_y =\left[ {{\begin{array}{l} {-1} \\ 0 \\ 1 \\ \end{array}}}\right] \end{aligned}$$
(5)
$$\begin{aligned} F_x= & {} F{*}k_x ; \quad F_y =F{*}k_y \end{aligned}$$
(6)

The horizontal and vertical derivatives of the face image (F) are obtained by convolving it with the kernel functions as given in Eq. (6). The magnitude of the gradient (G) and the orientation of the gradient (\(\uptheta \)) are computed by,

$$\begin{aligned} |G| = \sqrt{F_x^2 +F_y^2} \quad \hbox {and} \quad \theta =\tan ^{-1} \left( {\frac{F_y}{F_x}}\right) \end{aligned}$$
(7)
Fig. 4

Histogram of oriented gradients

The range of orientation is either (0–180\(^{\mathrm{o}}\)) for unsigned gradients or (0–360\(^{\mathrm{o}}\)) for signed gradients. The magnitude and orientation of the gradient are calculated for every pixel in the cells. A normalized histogram is computed for each cell, and the cell histograms are concatenated to represent a block. The set of block histograms represents the HOG descriptor of the face image. The preservation of spatial information is an advantage of HOG.
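
The following is a minimal sketch of the HOG computation in Eqs. (5)–(7): the [-1 0 1] kernels, unsigned orientations, and per-cell magnitude-weighted orientation histograms. A 9-bin histogram is used here as a common default; the 31-bin variant mentioned above follows the same procedure with a different binning, and the cell size and normalization are illustrative assumptions.

```python
# Minimal sketch of HOG (Eqs. 5-7), assuming a grayscale face as a NumPy array.
import numpy as np
from scipy.ndimage import convolve

def hog(face, cell=8, n_bins=9):
    """Per-cell orientation histograms of unsigned, magnitude-weighted gradients."""
    face = face.astype(float)
    fx = convolve(face, np.array([[-1.0, 0.0, 1.0]]))       # F * k_x, Eq. (6)
    fy = convolve(face, np.array([[-1.0], [0.0], [1.0]]))   # F * k_y, Eq. (6)
    mag = np.hypot(fx, fy)                                   # |G|, Eq. (7)
    ang = np.rad2deg(np.arctan2(fy, fx)) % 180.0             # unsigned orientation
    h, w = face.shape
    hists = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            a = ang[i:i + cell, j:j + cell]
            m = mag[i:i + cell, j:j + cell]
            hist, _ = np.histogram(a, bins=n_bins, range=(0, 180), weights=m)
            hists.append(hist / max(hist.sum(), 1e-6))       # normalised cell histogram
    return np.concatenate(hists)
```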

The texture feature obtained from SLBP (\(\Psi \)) and the shape feature obtained from HOG (\(\upeta \)) are combined into a hybrid feature vector (\(\updelta \)) for gender classification.

$$\begin{aligned} \delta = [{\Psi |\eta }]. \end{aligned}$$
(8)

2.2 Classifier

The hybrid features are fed to the classifier for discriminating male from female images. The support vector machine (SVM) is chosen for experimentation because it is an efficient discriminative method that handles non-linearly separable data and attains a low error rate on query samples. For linearly non-separable data, the SVM can nonlinearly map the input to a high-dimensional feature space in which a separating hyperplane can be found. The SVM [32] is a supervised machine learning algorithm that can be employed for both classification and regression problems. Consider a set of 'N' training hybrid features '\(\updelta _{\mathrm{i}}\)' and their true labels '\(\uptau _{\mathrm{i}}\)' belonging to the two gender classes, where \(\updelta _{\mathrm{i}} \in {\hbox {R}}^{\mathrm{M}}\) and \(\uptau _{\mathrm{i}} \in \{-1,1\}\). The hyperplane of the SVM classifier is constructed by optimizing Eq. (9).

$$\begin{aligned} \min \frac{1}{2} \Vert \omega \Vert ^{2} \, \hbox { subject to } \, \tau _i ({\omega {\bullet } \delta _i + \beta })\ge 1 \quad \forall \, \delta _i \end{aligned}$$
(9)
Fig. 5

SVM classifier

SVMs are based on finding a hyperplane that best divides the dataset into male and female using the training hybrid features, as shown in Fig. 5. The hybrid features lying on the hyperplanes near the class boundary are called support vectors, and the margin is the distance between the support vectors and the class boundary hyperplane. These support vectors are then used for classifying query hybrid features as either male or female.
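
A minimal sketch of the fusion in Eq. (8) and the SVM stage is given below, using scikit-learn's SVC and assuming the SLBP and HOG vectors have already been extracted (for example with the sketches in Sect. 2.1); the cubic polynomial kernel mirrors the best-performing configuration reported in Sect. 3, while all other names and hyper-parameters are illustrative assumptions.

```python
# Minimal sketch: hybrid-feature fusion (Eq. 8) and SVM training/prediction.
import numpy as np
from sklearn.svm import SVC

def fuse(slbp_vec, hog_vec):
    """delta = [Psi | eta], Eq. (8): simple concatenation of the two descriptors."""
    return np.concatenate([slbp_vec, hog_vec])

def train_gender_svm(fused_train, labels):
    """fused_train: (N, M) array of hybrid features; labels: +1 male, -1 female."""
    clf = SVC(kernel="poly", degree=3)          # cubic SVM
    clf.fit(np.asarray(fused_train), labels)
    return clf

def predict_gender(clf, fused_query):
    """Classify a single fused feature vector as +1 (male) or -1 (female)."""
    return clf.predict(np.asarray(fused_query).reshape(1, -1))[0]
```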

3 Experimental results

The proposed work aims to classify the gender of a human into male and female classes from face images. It is evaluated on two benchmark datasets, namely FERET and Labeled Faces in the Wild (LFW).

Fig. 6

Sample images of FERET database

FERET database FERET [33] is widely used in various face processing applications. It contains human faces of different ages, ethnicities, poses, expressions, genders and lighting conditions. The FERET database of facial imagery was collected between 1993 and 1996. It contains 14,126 images of 1199 individuals under different constraints. Sample images from the FERET database are shown in Fig. 6.

Labeled faces in the wild (LFW) LFW is an unconstrained face database [34] containing more than 13,000 face images of 5749 individuals collected from the web. These face images are very dynamic, with variations in pose, viewpoint, expression, age and occlusion. Of these, 10,256 images are male faces and 2977 are female faces. The database consists mostly of public figures such as celebrities and politicians. A few sample images of the LFW database acquired under uncontrolled conditions are shown in Fig. 7.

Fig. 7

Sample images of LFW database

Fig. 8

Selecting the ROI using the Viola-Jones algorithm: (a) LFW image, (b) face detected output, (c) FERET image, (d) face detected output

Fig. 9

Result of SLBP and HOG Features. (a) Input image, (b) SLBP and (c) HOG

Table 1 Performance evaluation of tenfold cross validation on FERET and LFW database
Table 2 Performance evaluation of tenfold cross validation on FERET and LFW database

Initially, the proposed algorithm is tested on small subsets of images from the FERET and LFW datasets: 920 images (460 per class) from FERET and 600 images (300 per class) from LFW are chosen for experimentation. The performance of SLBP, HOG and the fusion of SLBP and HOG is evaluated on these subsets with three different classifiers. The experiments are conducted using k-fold cross validation, which is suitable for both larger and smaller datasets. Any value of k can be chosen to validate the effectiveness of the proposed approach; from the literature [18], it is inferred that tenfold cross validation is advisable for classification problems, and hence k is chosen as 10 to evaluate the performance of the proposed method. The total images considered for experimentation from each dataset were randomly divided into 10 partitions for the tenfold cross validation technique. For the performance analysis, FERET and LFW samples of 46 and 30 images per class, respectively, are randomly chosen for each fold.
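
A minimal sketch of this tenfold protocol is given below, assuming the hybrid feature matrix X and the label vector y have already been built; StratifiedKFold is used here as one way to preserve the per-class balance in each fold, and the cubic-kernel SVM stands in for one of the three classifiers compared.

```python
# Minimal sketch: tenfold cross-validation over precomputed hybrid features.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def tenfold_accuracy(X, y, k=10):
    """Average classification rate over k stratified folds (X: (N, M), y: (N,))."""
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        clf = SVC(kernel="poly", degree=3).fit(X[train_idx], y[train_idx])
        scores.append(clf.score(X[test_idx], y[test_idx]))
    return np.mean(scores)
```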

Table 3 Performance evaluation of tenfold cross validation on FERET and LFW database
Table 4 Performance evaluation of tenfold cross validation on FERET and LFW database
Fig. 10

Error analysis on FERET and LFW database

Fig. 11

Classification accuracy of proposed method. (a) FERET and (b) LFW

From each image, the face region is detected using the Viola-Jones algorithm, because the proposed algorithm aims to classify gender using only the facial region of the human body. The results of face detection on sample input images from the LFW and FERET databases are shown in Fig. 8. The region of interest (ROI) is resized and converted to a grayscale image for feature extraction. Resizing the detected face region is essential to obtain a uniform feature vector length. Two different image resolutions, \(32 \times 32\) and \(64 \times 64\), are set for experimentation. If the region of interest is of size \(64 \times 64\) and the block size is 8, then the SLBP feature extracted from 64 blocks has a feature vector of size \(1 \times 3712\); if the ROI is \(32 \times 32\) and the block size is 8, then the length of the extracted feature vector is \(1 \times 928\). Also, the HOG feature is extracted from the ROI with a cell size of either \(8 \times 8\) or \(4 \times 4\). The size of the HOG feature is \(1 \times 1984\) for a \(64 \times 64\) ROI and \(1 \times 496\) for a \(32 \times 32\) ROI with a cell size of 8. The extracted SLBP and HOG features are shown in Fig. 9.

The concatenation of these features leads to a feature vector of size \(1 \times 5696\) for an image size of \(64 \times 64\) with a cell size of \(8 \times 8\). The length of the fused SLBP and HOG feature for a \(32 \times 32\) ROI and a block size of 8 is \(1 \times 1424\). These features are fed to the SVM. For comparison, these features are also classified with two traditional classifiers, namely the k-nearest neighbor (k-NN) and the sparse representation classifier (SRC) [35].
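
The quoted feature lengths can be checked with a short calculation, assuming 58-bin block histograms for SLBP and 31-bin cell histograms for HOG as stated above, with 8-pixel blocks and cells:

```python
# Sanity check of the feature-vector lengths quoted in the text (assumed bin counts).
slbp_64 = (64 // 8) ** 2 * 58        # 64 blocks x 58 bins = 3712
slbp_32 = (32 // 8) ** 2 * 58        # 16 blocks x 58 bins =  928
hog_64  = (64 // 8) ** 2 * 31        # 64 cells  x 31 bins = 1984
hog_32  = (32 // 8) ** 2 * 31        # 16 cells  x 31 bins =  496
print(slbp_64 + hog_64, slbp_32 + hog_32)   # 5696 1424 -> matches the fused sizes
```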

Table 5 Comparison of proposed gender classification with state of the art techniques

To study the effect of variation in the size of the ROI, block and cell, comparisons of the proposed method are presented in Tables 1, 2, 3 and 4. Tenfold cross validation produces gender classification rates of 96.20% using SVM, 94.67% using SRC and 90.11% using the k-NN classifier with cell size 4 on the FERET database, as given in Table 1. On the LFW database with cell size 4, the gender classification rates achieved are 92.50% using SVM, 90.67% using SRC and 80.17% using the k-NN classifier. The method appears to produce good results even at low resolution. Using cell size 8, the gender classification rates achieved on the FERET database are 93.26% using SVM, 92.72% using SRC and 89.24% using the k-NN classifier, as given in Table 2.

Using cell size 8, the gender classification rates achieved on the LFW database are 89.67% using SVM, 88.50% using SRC and 79.50% using the k-NN classifier. From the performance evaluation, it is observed that the concatenated SLBP and HOG features outperform the individual features irrespective of the classifier used. This hybrid feature along with the SVM classifier provides higher accuracies of 97.61 and 95.67% on the FERET and LFW databases respectively, as inferred from Table 4.

The texture information is extracted by SLBP and the shape information is provided by the HOG descriptor; thereby HOG and SLBP are complementary in nature. These two local descriptors are robust against occlusion, expression variation, pose variation and illumination, and they complement one another. Thus, the proposed combination of SLBP and HOG features is powerful in classifying gender with SVM classification at an input size of \(64 \times 64\) and a cell size of \(8 \times 8\). Experimentation of the proposed hybrid feature with 450 LFW images shows that the average time taken for training and for classifying an image is 0.422 and 0.0142 s, respectively, over the ten folds. Hence, a dimensionality reduction technique is not required. In order to reduce the processing time further, the whole process can be performed in a cluster computing environment by partitioning the training image dataset into a few small subsets/folds.

Error analysis of the proposed gender classification is performed for each feature with the different classifiers. The mean error rate analysis is shown in Fig. 10 to validate the proposed method. On the FERET database with the hybrid features and the SVM classifier, the minimum error rate is 2.17% for male images and 2.6% for female images; the overall error rate achieved is 2.39% using the SVM classifier. A minimum error rate is observed in the performance analysis of the proposed method on the FERET and LFW databases. The mean error rate of SLBP with the SVM classifier is lower than with the other two classifiers, SRC and k-NN. This is attained by the discriminative feature set chosen for gender classification.

The experimentation of the proposed method is carried out on the large dataset of 4500 images from FERET and 12,236 images from LFW. The classification rates with three different SVM kernels are shown in Fig. 11. Linear, quadratic and cubic kernels of the SVM are used for classification on the SLBP, HOG and hybrid features. The cubic SVM on the hybrid feature attains the highest classification rate. Thus, the proposed method attains maximum classification rates of 99.1 and 95.7% on FERET and LFW respectively. Also, the proposed hybrid gender classification technique is compared against state-of-the-art methods, as shown in Table 5, and is found to demonstrate its effectiveness.

4 Conclusion

A gender classification technique is developed with the combination of Spatial Local Binary Pattern (SLBP) and Histogram of Oriented Gradients (HOG) descriptors using a support vector machine. The proposed method is tested on the standard controlled FERET and uncontrolled LFW databases. The performance is analyzed with each descriptor and their hybrid combination using three different classifiers. The efficacy of the proposed method is demonstrated by classification rates of 99.1% for FERET and 95.7% for LFW through extensive cross validation. Future work will concentrate on including a third gender (namely transgender) in gender classification and on improving gender recognition accuracy by automatically estimating demographic information from face images.