Abstract
In this paper, a novel multikernel deterministic extreme learning machine (ELM) and its variants are developed for the classification of non-linear problems. Over the past decade ELM has proved to be an efficacious learning algorithm, but because of its non-deterministic nature and its dependence on a single kernel for feature mapping, it cannot be applied efficiently to real-time classification problems that require an invariant output solution. We address this problem by analytically calculating the input and hidden layer parameters to achieve a deterministic solution and by exploiting the data fusion capability of multiple kernel learning. This investigation yields a novel deterministic ELM with a single-layer architecture in which the kernel function is a linear combination of disparate base kernels. The kernel weights depend on the nature of the problem and are calculated empirically. To further enhance performance, we use fuzzy sets to find the pixel-wise association of face images with different classes. This handles the uncertainty involved in face recognition under varying environmental conditions. The pixel-wise membership values extract otherwise unseen information from the images to a significant extent. The validity of the proposed approach is tested extensively on a diverse set of face databases, with and without illumination variations, and with different types of kernels. The proposed algorithms achieve a 100% recognition rate on the Yale database when seven or eight images per identity are used for training. Superior recognition rates are also achieved on the AT&T, Georgia Tech and AR databases in comparison with contemporary methods, which demonstrates the efficacy of the proposed approaches in uncontrolled conditions.
1 Introduction
Face recognition is a non-intrusive and natural technique compared with other biometric modalities such as iris and fingerprint recognition. It is a visual pattern classification problem. The automation of face recognition in numerous applications, including surveillance, biometric authentication and human-computer interaction, has motivated many researchers to design proficient and accurate classifiers. The performance of a face recognition system deteriorates when the database contains a huge number of face images collected under uncontrolled environmental conditions [10]. A face recognition system performs one-to-many matching of a query face image against the set of facial images in the associated database. Its recognition rate depends on factors such as pose, age span, facial expression, facial wear and illumination conditions [61]. The task is very challenging in non-cooperative user scenarios, specifically in surveillance video. The key challenges for a face recognition system are i) immense variation in face images of the same subject due to viewing direction, ii) the non-convexity of faces, which requires a non-linear algorithm that can map a high-dimensional feature space to a low-dimensional one, and iii) training on a small number of face images [39, 61]. To handle these challenges, a face recognition system consists mainly of two modules: feature extraction and classification. The feature extraction module selects discriminative features from the normalized facial image. The objective of the classification module is to provide a powerful and robust classifier, because the non-linearity and non-convexity of face images cannot be completely resolved by image normalization and efficient feature extraction.
The performance of a face recognition system degrades under varying illumination conditions, because the change in the appearance of a face image due to illumination is larger than the change due to identity. Illumination normalization is therefore a central problem in face recognition, and various well-known techniques have been developed to tackle it. Standard techniques include gamma correction, logarithmic transform, histogram equalization (HE) and histogram matching. However, these algorithms do not give satisfactory results under widely varying illumination conditions. Hafed et al. used the discrete cosine transform (DCT) to obtain the feature vector of a face image. Chen employed the DCT in the logarithmic domain and discarded low-frequency DCT coefficients to overcome the effect of illumination variations [13]. Xie et al. normalized both large-scale (low-frequency) and small-scale features for improved facial image recognition and restoration [56]. Vishwakarma proposed a fuzzy filter based illumination normalization algorithm in which low-frequency DCT coefficients are modified using a fuzzy polynomial membership function [53].
After pre-processing of the facial image, features are extracted. A variety of feature descriptors have been developed and analyzed in the literature [2, 4, 5, 8, 12, 14, 47]. Deniz et al. introduced the fusion of histograms of oriented gradients (HOG) at different scales to acquire the important features for face recognition [17]. Zhou et al. proposed an approach in which features are extracted by integrating principal components and linear discriminant analysis (LDA) [62]. Fernando et al. introduced the multi-scale gray level co-occurrence matrix as a feature descriptor [14]. Furthermore, fuzzy approximation theory can be applied to pixel-wise feature extraction from images. Real pattern classification problems always involve uncertainty. Fuzzy logic can be employed to analyze such problems because it deals with approximate rather than precise reasoning [23]. Zadeh introduced fuzzy logic as an imprecise mode of reasoning that helps in making decisions under uncertain conditions [59, 60]. Kim et al. designed an automatic facial expression recognition classifier using a neuro-fuzzy approach [37]. Kyoung et al. implemented fuzzified feature vectors for face recognition. Hai-jun et al. designed an online sequential ELM with fuzzy activation functions for classification and function approximation [49].
The extracted features are recognized using a classifier [1, 19, 21, 36, 41, 42]. Traditional gradient-based learning algorithms are not efficient for challenging real-time recognition applications. Their structural parameters are calculated iteratively, which leads to slow learning. In addition, traditional algorithms suffer from overfitting, underfitting, memory exhaustion and convergence to local minima [30, 33]. To address these issues of classical learning approaches, non-iterative learning algorithms were introduced [20, 24, 25, 29]. Huang et al. proposed ELM, a fast and non-iterative single hidden layer framework for classification (binary and multiclass) and regression problems [29, 31]. Based on Bartlett's concept of strong generalization performance, the objective of ELM is to minimize the norm of the output weights along with the training error [7]. Benefitting from its non-linear mapping capability, ELM has been applied in numerous areas such as object recognition, imbalance learning, signal processing and traffic sign recognition [16, 34, 35, 67].
Although ELM has been successful, there is scope for enhancement. In the past few years researchers have done a great deal of work to optimize the structure of ELM. Zhu et al. used a differential evolutionary algorithm for the selection of input weights [63]. Fei et al. proposed an improved ELM that encodes a priori information for function approximation [26]. Huang et al. [32] devised the incremental ELM, in which hidden nodes are added randomly with complex activation functions. One of its variants, the error minimized ELM (EM-ELM), was introduced by Feng et al. [20]. In this design, hidden nodes can be added one by one or chunk by chunk. Deng et al. introduced the regularized ELM, which is based on a weighted least squares approach for databases containing outliers [15]. To automate the architecture of ELM, Yoan et al. proposed the optimally pruned ELM, in which statistical techniques are used to measure the significance of hidden nodes [46]. Wang et al. introduced the effective ELM, in which the biases and input weights are properly selected to achieve high performance [54]. Zong et al. successfully applied ELM to face recognition and compared it with the support vector machine (SVM) [65]. Zong et al. also introduced the kernel version of ELM and applied it to face recognition [66]. Jose et al. introduced a regularized ELM in which different regularization techniques are used to prune the architecture of ELM [45]. Huang et al. showed that ELM is better than SVM and least squares SVM in terms of optimization constraints and efficiency [33]. Zong et al. devised the weighted ELM to handle input data with imbalanced class distribution by assigning different weights according to the user's needs [67]. Hong et al. introduced the hierarchical ELM, based on a hierarchical structure, for sequential learning [27]. Tang et al. introduced the multi-layer ELM (ML-ELM) by adding more than one hidden layer to the original ELM design [52]. Zhiyong et al. used ELM for traffic sign recognition with HOG features [34]. To avoid manual tuning of network parameters in ML-ELM, Wong et al. introduced its kernel version, ML-KELM [55]. Chengbo et al. introduced an improved weighted ELM, based on majority voting, for the classification of imbalanced data [44]. Deng et al. used an ELM autoencoder together with the online sequential ELM for feature extraction and classification, respectively, in a visual tracking system [16].
For computer vision applications a single-kernel classifier is not an appropriate choice. This motivates researchers to combine multiple kernel learning with classification techniques. Multiple kernel learning (MKL) is a method to select and integrate data derived from different sources, such as strings, graphs and trees, for a specific recognition problem [18, 57]. It strengthens the learning ability by allowing the kernels to be chosen on the basis of the data. Gonen et al. analyzed the combination schemes and computational time of different MKL methodologies on real data [22]. Zhuang et al. explored the integration of different kernels in a multi-layer architecture [64]. Yanfeng et al. used MKL for the classification of hyperspectral images, with kernel weights calculated by statistical significance. Bucak et al. implemented different MKL approaches for object recognition and found MKL to be more efficient than a single kernel function [11]. Fabio et al. proposed a new MKL approach in which the kernel combination parameters are calculated by a min-max algorithm [6]. Xinwang et al. proposed sparse, non-sparse and radius-incorporated methods to find the optimized kernel combination coefficients [43]. Xiao dong et al. implemented a convex combination approach to integrate different base kernels and applied it to classification problems [40].
The objective of the present investigation is to design a novel classification engine that can extract invariant features for efficient classification of non-linear face images captured under uncontrolled conditions. The main contributions of this work are:
- An extension of the original ELM with a deterministic, multiple kernel learning approach is proposed. In the original (non-deterministic) ELM the input parameters are randomly assigned and the number of hidden layer neurons is determined experimentally. In the deterministic ELM the input and hidden layer parameters are evaluated analytically, which results in a steady and invariant output.
- To make the classifier powerful and robust for real-world heterogeneous databases and to discriminate between intra-class and inter-class variance, fuzzification is used for invariant feature extraction. The input space is then mapped to an optimized kernel feature space that integrates multiple base kernels.
- To handle the real-time challenges (lighting variations) of a face recognition system, pre-processing is required. A face image is composed of low-frequency and high-frequency components, and illumination variations affect the low-frequency components. Therefore, to suppress the effect of lighting conditions, the low-frequency components need to be modified. In the proposed method we use the fuzzy filter based normalization algorithm, which modifies the low-frequency components using a polynomial membership function.
An extensive set of experiments has been carried out on face databases to compare the performance of the proposed approaches with contemporary techniques including OMKELM, KELM and ELM. The experimental results confirm the superiority of the proposed approaches over these techniques.
This paper is organized as follows. Fuzzy logic, the extreme learning machine and multiple kernel learning are briefly reviewed in Section 2. The proposed work, followed by its analysis, is presented in Section 3. In Section 4 an empirical evaluation is performed on a comprehensive set of databases, and Section 5 summarizes our conclusions.
2 Preliminaries
2.1 Fuzzy logic
In a crisp set an individual is either a member or a non-member of a given set Q. The characteristic function of the crisp set Q takes the value 1 for every member of Q and 0 for every non-member.
Fuzzy logic is based on an imprecise and approximate mode of reasoning that helps in making decisions in an environment of imprecision and uncertainty [59]. Each element of a fuzzy set is assigned a value that denotes its membership grade in the set. This grade corresponds to the degree to which an individual is compatible with, or similar to, the concept represented by the fuzzy set [38]. An individual with a larger membership grade belongs to the fuzzy set to a greater degree. The grades are real numbers in [0, 1]: a membership grade of 1 indicates full membership and a grade of 0 indicates full non-membership. A crisp set is a restricted case of a fuzzy set with only two membership grades, 0 and 1. The membership grade of an individual in a fuzzy set is determined by the membership function (MF), which maps the individuals of a given universal set E to real values in [0, 1]. The membership grade μA for a fuzzy set A is therefore represented by the mapping \( {\mu}_A:E\to \left[0,1\right] \).
2.2 Pre-processing technique: Illumination normalization
The objective of a normalization algorithm is to reduce or nullify the effect of illumination variation on images without changing the image details. In this investigation a fuzzy filter [53] is used for the normalization of images. The key points behind this approach are i) illumination variations mainly affect the low-frequency (LF) elements and ii) the facial features that can distinguish two facial images are arranged in ascending order of effectiveness in the frequency domain. Consider a face image of size U × V, where U is the number of rows and V is the number of columns. Contrast stretching is performed by adaptive histogram equalization (AHE) followed by a logarithmic transform (LOG), written as AHE + LOG. After contrast enhancement, the image is converted from the spatial to the frequency domain by applying the DCT. The DCT of an image of dimension U × V is given as follows:
For m = 0, 1, 2,... U − 1 and n = 0, 1, 2,…, V − 1.
The inverse DCT transform is given by
For xc = 0, 1, 2,…, U −1 and yc = 0, 1, 2,…, V −1.
where
Obtain the AC coefficients of the image (representing the change in the intensity values of pixels) in ascending order of frequency. Determine the DCT coefficients for the complete face image using Eq. (3).
The initial coefficients are referred to as low-frequency DCT (LF-DCT) coefficients. Consider a universal set c containing the indices of the LF-DCT coefficients. To reduce the effect of illumination, the LF-DCT coefficients are modified by a fuzzy polynomial MF whose degree of membership increases with the LF-DCT index. The polynomial MF is written as follows:
where C is the index of the largest LF-DCT and ω is a constant that is obtained experimentally. Let LFN be the number of LF-DCT coefficients. The fuzzy membership grades for the LFN coefficients are obtained using Eq. (7). These grades can be expressed in vector form as
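For reference, the index ranges stated for m, n, xc and yc match the standard orthonormal 2-D DCT pair, which can be sketched as follows. The symbols f (spatial image), D (coefficient matrix) and the normalization factors a_U, a_V are names assumed here for illustration; the notation of Eqs. (3) to (6) may differ.

```latex
\[
D(m,n)=a_U(m)\,a_V(n)\sum_{x_c=0}^{U-1}\sum_{y_c=0}^{V-1} f(x_c,y_c)\,
\cos\!\left[\frac{\pi\,(2x_c+1)\,m}{2U}\right]
\cos\!\left[\frac{\pi\,(2y_c+1)\,n}{2V}\right]
\]
\[
f(x_c,y_c)=\sum_{m=0}^{U-1}\sum_{n=0}^{V-1} a_U(m)\,a_V(n)\,D(m,n)\,
\cos\!\left[\frac{\pi\,(2x_c+1)\,m}{2U}\right]
\cos\!\left[\frac{\pi\,(2y_c+1)\,n}{2V}\right]
\]
\[
a_U(m)=\begin{cases}\sqrt{1/U}, & m=0\\[2pt] \sqrt{2/U}, & m=1,\dots,U-1\end{cases}
\qquad
a_V(n)=\begin{cases}\sqrt{1/V}, & n=0\\[2pt] \sqrt{2/V}, & n=1,\dots,V-1\end{cases}
\]
```

With this normalization D(0, 0) is the DC coefficient (proportional to the mean intensity) and every other D(m, n) is an AC coefficient.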
The modified DCT coefficients of image can be obtained using:
The fuzzy filter algorithm for illumination normalization is outlined as follows:
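The steps described in this subsection can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the filter of [53]: the grade function `polynomial_grade` is a hypothetical stand-in for Eq. (7), the ordering of AC coefficients by m + n is an assumed notion of "ascending frequency", the modification is assumed to be a multiplication of each LF-DCT coefficient by its grade, and the AHE + LOG contrast step is omitted.

```python
import math

def _scale(k, size):
    # Orthonormal DCT normalization factor.
    return math.sqrt((1.0 if k == 0 else 2.0) / size)

def _basis(pos, k, size):
    return math.cos(math.pi * (2 * pos + 1) * k / (2 * size))

def dct2(img):
    """Orthonormal 2-D DCT of a list-of-lists image with U rows and V columns."""
    U, V = len(img), len(img[0])
    return [[_scale(m, U) * _scale(n, V) * sum(
                 img[x][y] * _basis(x, m, U) * _basis(y, n, V)
                 for x in range(U) for y in range(V))
             for n in range(V)] for m in range(U)]

def idct2(coef):
    """Inverse of dct2."""
    U, V = len(coef), len(coef[0])
    return [[sum(_scale(m, U) * _scale(n, V) * coef[m][n]
                 * _basis(x, m, U) * _basis(y, n, V)
                 for m in range(U) for n in range(V))
             for y in range(V)] for x in range(U)]

def polynomial_grade(c, C, omega):
    # HYPOTHETICAL stand-in for Eq. (7): the grade rises with the index c (1..C).
    return (c / C) ** omega

def fuzzy_filter(img, lfn, omega):
    """Attenuate the first `lfn` AC coefficients, then return to the spatial domain."""
    coef = dct2(img)
    U, V = len(coef), len(coef[0])
    # ASSUMPTION: "ascending frequency" is approximated by sorting on m + n.
    order = sorted(((m, n) for m in range(U) for n in range(V) if (m, n) != (0, 0)),
                   key=lambda mn: (mn[0] + mn[1], mn[0]))
    for c, (m, n) in enumerate(order[:lfn], start=1):
        coef[m][n] *= polynomial_grade(c, lfn, omega)  # assumed form of the update
    return idct2(coef)

image = [[10.0, 20.0, 30.0], [40.0, 50.0, 60.0]]
unchanged = fuzzy_filter(image, lfn=3, omega=0.0)  # all grades equal 1
filtered = fuzzy_filter(image, lfn=3, omega=2.0)   # lowest frequencies damped most
```

With ω = 0 every grade equals 1 and the image is reproduced, which is a convenient sanity check; the DC coefficient is never touched, so the total intensity is preserved for any ω.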
2.3 Extreme learning machine
ELM is a non-deterministic learning approach with a feedforward architecture, initially designed for a single hidden layer [31, 33]. It is founded on the non-iterative calculation of the input and hidden layer parameters. To achieve a high accuracy rate, the hidden layer parameters are tuned experimentally and the input parameters are assigned randomly. The output function of ELM is given as:
where iwj is the input weight vector connecting the ith input neuron to the jth hidden node, aj is the learning parameter of the hidden layer known as the bias, θj is the weight vector of the output layer connecting the jth hidden neuron to the output nodes, and fo is the output of ELM. η and N denote the numbers of hidden nodes and training samples, respectively. Equation (10) can be written compactly as:
B is the output matrix of hidden layer that can be defined in the matrix form as
The qth column of B is the output vector of qth hidden node and qth row of B is output of hidden layer analogous to input xq. x = [x1, …xN] is the input data vector and Y = [y1, …yN]T is target output vector for N count of input instances. θ = [θ1, θ2, …, θη]T denotes the output vector for η count of hidden neurons. Each element of θ is a vector of dimension equal to output nodes (number of classes).
The input weight matrix is defined as:
The size of randomly assigned input weight vector for d dimensional input data is η × d. IW is dynamic in size, as the count of hidden nodes is not static (experimentally calculated).
Equation (11) is a linear system and its least-squares solution is
where B† is the Moore-Penrose generalized inverse [48] of the hidden layer output matrix B. The orthogonal projection approach is used to determine the Moore-Penrose generalized inverse of B: B† = BT(BBT)−1.
To obtain strong generalization performance and an invariant output solution, a positive number W is added to the diagonal of BBT [28].
W is the regularization coefficient and b(x) is the ELM feature mapping, generally known to the user.
Unlike iterative learning algorithms, the objective of ELM is to minimize the output weights along with the training error. The objective of ELM for classification problems is:
where δk = [δk, 1, …, δk, h]T is the training error vector of the h output neurons corresponding to the input xk.
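The randomness that motivates the deterministic variants can be illustrated with a small pure-Python sketch of a regularized ELM. The function names, the uniform range of the random weights and the toy data are assumptions made for illustration, and the regularizer is assumed to enter as I/W on the diagonal of BBT, as in common regularized ELM formulations; the exact form used in the source equations is not reproduced here.

```python
import math
import random

def solve(A, Y):
    """Solve A X = Y (A is n x n, Y is n x h) by Gauss-Jordan elimination."""
    n, h = len(A), len(Y[0])
    M = [list(A[i]) + list(Y[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [[M[i][n + j] / M[i][i] for j in range(h)] for i in range(n)]

def hidden_output(x, IW, bias):
    """Sigmoid response of every hidden node to one input vector x."""
    return [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(iw, x)) + a)))
            for iw, a in zip(IW, bias)]

def train_elm(X, Y, eta, W, seed):
    """Random input weights and biases, then theta = B^T (I/W + B B^T)^-1 Y."""
    rng = random.Random(seed)
    d, N, h = len(X[0]), len(X), len(Y[0])
    IW = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(eta)]  # eta x d
    bias = [rng.uniform(-1, 1) for _ in range(eta)]
    B = [hidden_output(x, IW, bias) for x in X]                        # N x eta
    A = [[sum(p * q for p, q in zip(B[i], B[j])) + (1.0 / W if i == j else 0.0)
          for j in range(N)] for i in range(N)]
    alpha = solve(A, Y)                                                # N x h
    theta = [[sum(B[i][k] * alpha[i][c] for i in range(N)) for c in range(h)]
             for k in range(eta)]                                      # eta x h
    return IW, bias, theta

def elm_output(x, model):
    IW, bias, theta = model
    b = hidden_output(x, IW, bias)
    return [sum(bk * row[c] for bk, row in zip(b, theta)) for c in range(len(theta[0]))]

# Toy two-class data with one-hot targets (illustrative only).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
theta_a = train_elm(X, Y, eta=10, W=10.0, seed=0)[2]
theta_b = train_elm(X, Y, eta=10, W=10.0, seed=1)[2]
```

Two different seeds give two different output weight matrices for the same data, which is the run-to-run variability discussed in the text.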
2.4 Multiple kernel learning
In kernel-based classifiers, efficiency relies on finding an appropriate kernel to represent the data. Single-kernel classifiers do not give good empirical performance on heterogeneous data, where each instance is represented by multiple sources of data [3, 11]. To address this problem, multiple kernel learning is used to determine an optimal kernel from a set of base kernels. The objective of MKL is to combine different sets of features by determining the optimal combination of different kernels [22, 58]. MKL thus combines features from multiple sources and feeds them to a single learning algorithm. The optimal combination coefficients of the kernels are data dependent.
The MKL formulation using a linear combination of a set of u pre-defined kernels is given by:
with γv > 0 and \( \sum \limits_v{\gamma}_v=1 \). K(xi, xk) is the resultant optimal kernel, Kv(xi, xk) is the vth sub-kernel and γv is the weight (combination coefficient) of the vth kernel. Each sub-kernel uses a different set of feature vectors for every instance. The weights are assigned to the kernels according to the significance of the features in order to achieve accurate classification.
Eq. (18) can be equivalently defined as
The choice of the kernel K(⋅, ⋅) and its combination coefficients depends on the nature of the classification problem. In terms of feature mapping, MKL can be formulated as
where θ(⋅; γ) and θv(⋅) are the feature mappings corresponding to K(xi, xk) and Kv(xi, xk), respectively.
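The linear combination of base kernels can be sketched as follows. The RBF form exp(−‖x − y‖²/σ), the polynomial form (x·y + ζ)^degree and all function names are assumptions introduced for illustration; the kernel definitions used in the experiments may be parameterized differently.

```python
import math

def rbf_kernel(x, y, sigma):
    # ASSUMED form: exp(-||x - y||^2 / sigma).
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / sigma)

def poly_kernel(x, y, zeta, degree):
    # ASSUMED form: (x . y + zeta) ** degree.
    return (sum(a * b for a, b in zip(x, y)) + zeta) ** degree

def combined_kernel(x, y, kernels, gammas):
    """K(x, y) = sum_v gamma_v * K_v(x, y), with gamma_v > 0 and sum_v gamma_v = 1."""
    if any(g <= 0 for g in gammas) or abs(sum(gammas) - 1.0) > 1e-12:
        raise ValueError("kernel weights must be positive and sum to 1")
    return sum(g * k(x, y) for g, k in zip(gammas, kernels))

def gram_matrix(X, kernels, gammas):
    """Combined kernel evaluated on every pair of samples."""
    return [[combined_kernel(xi, xk, kernels, gammas) for xk in X] for xi in X]

# Two base kernels with illustrative parameter values.
kernels = [lambda x, y: rbf_kernel(x, y, sigma=100.0),
           lambda x, y: poly_kernel(x, y, zeta=1.0, degree=2)]
gammas = (0.5, 0.5)
X = [(1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
G = gram_matrix(X, kernels, gammas)
```

For x = (1, 0) compared with itself, the RBF term is 1 and the polynomial term is (1 + 1)² = 4, so the combined value is 0.5 · 1 + 0.5 · 4 = 2.5.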
3 Proposed work
In this section we propose novel ELM-based algorithms for multi-class pattern classification. Their objective is to overcome the randomness of the original ELM and to enhance classification performance by using multikernel and fuzzy logic concepts. The proposed algorithms, both with a single hidden layer feedforward neural network structure, are a) a deterministic ELM in which feature vectors are obtained from pixel intensity values and feature mapping is performed with a multikernel approach, and b) a deterministic ELM that uses a fuzzy MF to determine the feature vectors and a multikernel technique for feature mapping. The proposed approaches are evaluated on face recognition. The block diagram of the system architecture is shown in Fig. 1. Because face images captured in real-time environments contain illumination variations, the training and test images are pre-processed using an illumination normalization technique, for which a fuzzy filter is applied in the DCT domain. The normalized face images are then classified using the proposed deterministic ELM based classifiers.
3.1 Deterministic multikernel extreme learning machine (DMK-ELM)
Real-world pattern classification applications demand a fast and deterministic learning algorithm to classify complex data. The random assignment of structural parameters (input layer weights and hidden neuron biases) in ELM results in a varying, non-deterministic output solution [4, 65]. Different classification results are also obtained for the same database when the number of hidden nodes is changed. To obtain an invariant, deterministic output solution, the input and hidden layer parameters should be calculated from invariant quantities. Moreover, for complex data in which each instance must be represented by multiple features, single-kernel feature mapping is not an optimal choice. In this paper we design a deterministic ELM integrated with the MKL approach. The proposed approach is capable of classifying simple and complex data through appropriate selection of kernels and optimal combination coefficients. In DMK-ELM the input and hidden layer parameters are obtained from the input data samples, so the output solution is stable because the structural parameters are calculated analytically. The feature vector used to map input data to output classes is the concatenation of the feature vectors obtained from the pre-defined sub-kernels.
The mathematical formulation of DMK-ELM is:
The input weight IW = [iw1, …, iwη] can be determined with the help of input data xi of dimension d as follows:
Where Z is analytically determined by evaluating the norm of input data as follows:
And the value of constant m is determined by calculating norm of all Zi:
As the number of hidden neurons equals the number of input samples, Eqs. (21) and (23) can be reformulated as:
For d-dimensional input data, the bias of the kth hidden node is determined by the mean of the input weight vector IW of size N × d.
The objective function of DMK-ELM for multi-class classification problem is
B(.; γ) = [γ1B1(.), …, γuBu(.)] is the feature vector with respect to all the kernels from 1,2,…u.
The mathematical reformulation of Eq. (27) is
Equation (28) is a jointly convex optimization problem and its Lagrangian function is given by:
where τ and α ∈ ℜN × h are Lagrange multipliers. The optimality conditions of Eq. (29) are defined using the KKT theorem [9] as follows:
Equation (32) can be written as:
The output function of DMK-ELM is
The Deterministic multikernel ELM algorithm is outlined as follows:
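The deterministic behaviour can be illustrated with a kernel-form sketch in Python. It is a hedged stand-in, not the DMK-ELM algorithm itself: the analytic input weight and bias formulas of Eqs. (21) to (26) are not reproduced, the classifier is written directly in terms of a combined kernel matrix, the regularizer is assumed to enter as I/W, and the kernel forms, parameter values and function names are assumptions.

```python
import math

def kernel(x, y, gammas=(0.5, 0.5), sigma=2.0, zeta=1.0, degree=2):
    """Convex combination of an RBF and a polynomial kernel (assumed forms)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    dot = sum(a * b for a, b in zip(x, y))
    return gammas[0] * math.exp(-sq / sigma) + gammas[1] * (dot + zeta) ** degree

def solve(A, Y):
    """Solve A X = Y by Gauss-Jordan elimination with partial pivoting."""
    n, h = len(A), len(Y[0])
    M = [list(A[i]) + list(Y[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [[M[i][n + j] / M[i][i] for j in range(h)] for i in range(n)]

def fit(X, labels, n_classes, W):
    """alpha = (I/W + K)^-1 Y with one-hot targets; nothing is drawn at random."""
    N = len(X)
    Y = [[1.0 if labels[i] == c else 0.0 for c in range(n_classes)] for i in range(N)]
    A = [[kernel(X[i], X[j]) + (1.0 / W if i == j else 0.0) for j in range(N)]
         for i in range(N)]
    return solve(A, Y)

def predict(x, X, alpha):
    """Class with the largest kernel-weighted score."""
    k = [kernel(x, xi) for xi in X]
    scores = [sum(k[i] * alpha[i][c] for i in range(len(X)))
              for c in range(len(alpha[0]))]
    return scores.index(max(scores))

# Toy data: two well-separated classes (illustrative only).
X = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
labels = [0, 0, 1, 1]
alpha = fit(X, labels, n_classes=2, W=1000.0)
train_predictions = [predict(x, X, alpha) for x in X]
```

Because every quantity is computed from the data, repeating `fit` on the same inputs returns exactly the same coefficients, in contrast to an ELM with randomly drawn input weights.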
3.2 Deterministic extreme learning machine with fuzzy feature extraction (DMK-ELM-FFE)
In this section DMK-ELM is extended to DMK-ELM-FFE by incorporating fuzzy logic for feature extraction. DMK-ELM is a pixel-wise algorithm that works on the pixel intensities of the face image. In DMK-ELM-FFE, fuzzy theory is used to determine the pixel-wise association of a face image with the distinct classes. This handles the uncertainty involved in face recognition due to varying environmental conditions. The pixel-wise membership values extract otherwise unobserved information from face images to a considerable extent. The method takes a facial image as input and uses the π MF to fuzzify the pixel intensities, yielding the degree of membership of the pixels in the different classes. Consider a face image of dimension U × V. It can be represented as a vector T = [t1, t2, …, td], which is the universal set in this context. The size d of T is the number of pixels in the face image, i.e. UV. The MF treats each image as a vector and fuzzifies it. The π MF used for the fuzzification of a face image is given by:
where α, β and χ represent the min, max and mean of a specific pixel element over all training images. The π MF gives the highest membership grade, 1, at the mean χ and a grade of 0.5 at the two cross-over points q1 and q2. q1 lies between the min and the mean, whereas q2 lies between the mean and the max. The membership grade is less than 0.5 beyond q1 and q2. Pixels with a membership grade below 0.5 are used to enhance the capability of the face recognition system to assimilate the variation exhibited by the training data. The parameters of the π MF are calculated from the min, max and mean of a pixel over all training images using:
where TD is a vector containing the intensity values of a particular pixel in all training images. For example, if TX is the pixel matrix of all training images, α(1) is the min of the first row of this matrix. Similarly, β and χ are the max and mean values of the vector TD, respectively. After applying the π MF, the membership grades of a face image are represented in vector form as M = [m1, m2, …, md]. M is the feature vector obtained by fuzzifying the input image vector. The pixel-wise vector T and the fuzzy feature vector M are then concatenated. The resultant feature vector is
The objective function of DMK-ELM-FFE is:
The objective function of DMK-ELM-FFE in Eq. (38) is formulated in the same way as for DMK-ELM (Eq. (34)). The primary difference is that DMK-ELM is a pixel-wise algorithm, whereas DMK-ELM-FFE is multi-modal (pixel-wise and fuzzified). The output function of DMK-ELM-FFE is:
The Deterministic multikernel ELM with Fuzzy feature extraction is outlined in algorithm 3.
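The fuzzy feature extraction step can be sketched as below. The piecewise quadratic shape in `pi_membership` is a common S-function-based π curve chosen as an assumption; it satisfies the properties stated in the text (grade 1 at the mean, 0.5 at cross-over points midway between min and mean and between mean and max, 0 at min and max), but it is not claimed to be the exact Eq. (35). Function names and the toy data are illustrative.

```python
def pi_membership(t, lo, hi, mean):
    """Assumed pi-shaped MF: 1 at `mean`, 0.5 at the midpoints, 0 at `lo` and `hi`."""
    if t <= lo or t >= hi:
        # Degenerate case: a pixel whose min or max coincides with its mean.
        return 1.0 if t == mean else 0.0
    if t <= mean:
        r = (t - lo) / (mean - lo)   # 0 at min, 1 at mean
    else:
        r = (hi - t) / (hi - mean)   # 1 at mean, 0 at max
    return 2 * r * r if r <= 0.5 else 1 - 2 * (1 - r) ** 2

def pixel_stats(train_images):
    """Per-pixel (min, max, mean) over all training image vectors."""
    columns = list(zip(*train_images))
    return [(min(c), max(c), sum(c) / len(c)) for c in columns]

def fuzzy_features(image, stats):
    """Concatenate the pixel vector T with its membership vector M."""
    M = [pi_membership(t, lo, hi, mean) for t, (lo, hi, mean) in zip(image, stats)]
    return list(image) + M

# Three toy "images" of two pixels each (illustrative only).
train = [[0, 10], [4, 30], [8, 20]]
stats = pixel_stats(train)            # pixel 1: (0, 8, 4.0), pixel 2: (10, 30, 20.0)
features = fuzzy_features([4, 15], stats)
```

For the query image [4, 15], the first pixel sits at its mean (grade 1) and the second at the lower cross-over point (grade 0.5), so the concatenated feature vector has length 2d = 4.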
3.3 Analysis of proposed approaches
We conclude this section by comparing the proposed algorithms with ELM, KELM (with RBF and polynomial kernel functions) and OMKELM. The objective of DMK-ELM is to achieve a deterministic and accurate solution for real-time classification problems. The empirical results obtained with ELM are variable. The parameters affecting ELM performance are the input weights, the hidden layer biases and the number of hidden neurons. In ELM the input weights and biases are assigned arbitrarily and the number of hidden layer nodes is adjusted with respect to classification accuracy. In DMK-ELM these parameters are calculated analytically from the input data. DMK-ELM is computationally efficient because the input weights and biases are obtained by norm and mean operations. DMK-ELM is more accurate than KELM, which uses a single kernel for mapping the feature vectors and is therefore less able to discriminate inter-class variance.
To make DMK-ELM more accurate, we use the approximation theory of fuzzy logic to design DMK-ELM-FFE, in which the feature vectors are obtained by a fuzzification operation (π MF). It is computationally fast because the parameters of the π MF are calculated analytically by min, max and mean operations on the input data. It performs better than OMKELM because its feature vectors are able to extract hidden information from the training data that reveals the inter-class variance.
4 Experimental evaluation
This section demonstrates the utility of the proposed classification approaches. Experiments are performed on the prominent face databases AT&T [50], Yale [8], Georgia Tech [12] and AR [68]. A brief description of the face databases is given in Section 4.1. The results of the proposed techniques are compared with those of OMKELM, KELM and ELM.
4.1 Databases
1) AT&T database: consists of 400 gray-scale images of 40 subjects (male and female). The images are captured with distinct expressions and details, e.g. surprise, happiness, neutral, open/closed eyes. The dimension of each image is 112 × 92. Sample face images from the database are shown in Fig. 2.
2) Yale database: comprises facial images of 15 persons. The facial images of each person span variations in facial details and illumination conditions, such as surprised, winking, center-light and left-light. The variations are captured in 11 facial images. The size of each image is 320 × 243. Sample images from the database are shown in Fig. 3.
3) Georgia Tech database: comprises 750 color facial images of 50 individuals. The images vary in lighting conditions, appearance and facial expressions. Each image is of dimension 120 × 90. Sample images from the database are shown in Fig. 4.
4) AR database: consists of 4000 images of 70 male and 56 female subjects. Each individual is depicted in 26 facial images with varying occlusions, face configurations and lighting conditions. We have employed a subset of this database depicting 100 individuals. For each subject, 14 images of dimension 165 × 120 with changing expressions and environmental conditions are considered. Sample images from the database are shown in Fig. 5.
4.2 Experimental setup
In our simulations we have implemented the ELM variants on the AT&T, Yale, Georgia Tech (GT) and AR databases. To calculate the performance results, the size of the training set is increased from one image up to eight images per subject. For example, for the AT&T database the minimum cardinality of the training set is 40 images and the maximum is 320 images. Correspondingly, the testing set has a maximum cardinality of 360 images and a minimum of 80 images. For the ELM techniques that use the kernel formulation, we have employed the RBF kernel:
and polynomial kernel:
Here σ, ζ and φ are kernel parameters that are adjusted experimentally with respect to classification accuracy.
In the case of ELM with random input weights iw and biases b, we have used the sigmoidal activation function:
x is the image vector, whose dimension for the AT&T database is 10,304. We evaluate the performance of ELM with random weights for hidden node counts η = (100, 200, …, 1000) and regularization coefficients W = (1, 5, 10, …, 100, 120). After trying a range of values for W and the kernel parameters, the best classification results are obtained for W = 10, σ = 100, ψ = 2, ζ = 1 and η = 1000. In OMKELM the kernel function used for the performance calculation is a convex combination of the RBF and polynomial kernels, given as:
The combination coefficients of the kernels are database dependent and take different optimal values for distinct databases.
4.3 Performance measures for classification
In our experiments, we utilize several metrics to characterize the performance of proposed algorithms on face databases. The main metrics employed in our research are:
-
Testing accuracy: refers to the ratio of correctly classified instances to the total number of testing instances (testing dataset).
\( Testing\ accuracy=\frac{Total-{m}_{class}}{Total} \)
where m_class represents the number of instances that are misclassified as another class and Total represents the cardinality of the testing dataset.
-
Precision: refers to the proportion of predicted positive examples that are actually true positives. It is used to measure the correctly predicted labels. Precision is calculated as follows [51]:
\( precision=\frac{\sum \limits_{i=1}^h{TP}_i}{\sum \limits_{i=1}^h\left({TP}_i+{FP}_i\right)} \) where h is number of classes, TP is True Positives, FP is False Positives
-
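Recall: refers to the proportion of actual positive examples that are predicted to be positive. It is used to measure the number of correct labels predicted by a classifier. Recall is calculated as follows [51]:
\( recall=\frac{\sum \limits_{i=1}^h{TP}_i}{\sum \limits_{i=1}^h\left({TP}_i+{FN}_i\right)} \) where FN is False Negatives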
-
F-measure: is an accuracy measure that combines precision and recall as a weighted average. It relates the positively labeled data to the predictions actually made by a classifier, based on a per-class average [51].
4.4 Empirical results
Empirical results obtained by implementing the competing approaches and the proposed techniques on the AT&T, Yale, GT and AR databases, with testing accuracy as the performance metric, are reported in Tables 1, 2, 3 and 4 respectively. The results for the same databases with precision, recall and F-measure as performance metrics are reported in Tables 5, 6, 7 and 8. The results are analyzed for different cardinalities of the training dataset.
4.4.1 Classification results on AT&T database
The AT&T database contains very little illumination variation, so no pre-processing is applied to it. The kernel combination coefficients γ1 and γ2 are both set to 0.5. For all classifiers, the accuracy rate increases with the size of the training set. The recognition results show that the kernel formulation of ELM gives better results than ELM with random input parameters. For example, when the cardinality of the training set is 240 images, the recognition rates achieved with the ELM, KELM (RBF) and KELM (polynomial) classifiers are 93.75%, 96.25% and 96.80% respectively. Further, the results obtained using multiple kernels (OMKELM) are better than those of single-kernel ELM, although the results of KELM with the polynomial kernel are comparable with OMKELM. Compared with OMKELM, DMK-ELM and DMK-ELM-FFE give a significant improvement in accuracy rate. For example, when two images per subject are used for training, the percentage increase in recognition rate is 9% with DMK-ELM and 12.5% with DMK-ELM-FFE. Figure 6(a) compares the performance of the different techniques on this database. To establish the efficacy of the proposed algorithms, all the classifiers are also evaluated on the other metrics for the AT&T database, and the values are listed in Table 5. Comparing the precision, recall and F-measure values makes it evident that the proposed classifiers are significantly superior to the other existing ELM-based classifiers.
4.4.2 Classification results on Yale database
The face images in the Yale database are normalized using a fuzzy filter, as the database contains large variations in illumination and expression. To reduce the effect of lighting variations, 21 LF-DCT coefficients are considered, with a fuzzy constant of 0.5. The kernel combination coefficients for the RBF and polynomial kernels are 0.4 and 0.6 respectively. The recognition rate on this database is higher than on the AT&T database in most cases. For example, when the training set size is eight images per subject, we achieve a 100% recognition rate with KELM (polynomial), OMKELM, DMK-ELM and DMK-ELM-FFE. The accuracy results reveal that the multikernel technique is more accurate than the single-kernel one. For example, comparing OMKELM with KELM (polynomial), the percentage increase in accuracy rate is 2% when four images per subject are used for training. The results also show that the proposed algorithms outperform OMKELM. For example, with seven training images per face identity, the proposed algorithms give 100% accuracy, while OMKELM achieves 98.67%. On average, the proposed approaches increase the accuracy rate by more than 2%. Figure 6(b) compares the recognition rates of the different approaches on the Yale database. The precision, recall and F-measure of the proposed approaches are better than those of the comparative ELMs for all sizes of training and test datasets. In Table 6, when seven and eight images per identity are considered for training, the values of precision, recall and F-measure are equal to 1, which means that all the test images are assigned to their correct classes.
4.4.3 Classification results on GT database
The GT database contains color images, so before performance evaluation these images are converted to gray scale; no other pre-processing is performed. The recognition rate on this database is lower than on the other databases, irrespective of training size. The maximum accuracy is achieved when the cardinality of the training dataset is 400, i.e. eight images per subject, as there are 50 subjects with 15 images per subject. The results confirm that OMKELM gives better performance than ELM and KELM (with a single kernel). The recognition results of OMKELM are comparable with those of DMK-ELM, but DMK-ELM-FFE gives a 10% improvement in accuracy rate irrespective of the size of the training dataset. Comparing the proposed approaches, DMK-ELM-FFE is more accurate than DMK-ELM irrespective of the cardinality of the training set. For example, when two images per subject are considered for training, the accuracy rate increases by 22% with the DMK-ELM-FFE classifier. Figure 6(c) compares the accuracy rates of the proposed approaches with those of existing techniques on the GT database. The precision, recall and F-measure for the GT database are lower than for the other databases, although these metrics are higher for the proposed algorithms than for the existing ELM variants. From the performance results based on the different metrics in Tables 3 and 7 for the GT database, it can be concluded that, among the proposed algorithms, DMK-ELM-FFE is superior to DMK-ELM.
4.4.4 Classification results on AR database
As the AR database contains color images with non-uniform lighting variations, the images in this database are normalized and converted to gray scale. The number of LF-DCT coefficients modified by the fuzzy filter is 91, and the value of the fuzzy constant is 0.5. The kernel coefficients γ1 and γ2 are 0.4 and 0.6 respectively. The accuracy results show that OMKELM performs better than the other existing classification techniques: the percentage increase in accuracy rate is 11%, 20% and 4% over ELM, KELM (RBF) and KELM (polynomial) respectively, when the cardinality of the training set is 200 images. The recognition rates achieved with the proposed schemes are higher than those of OMKELM. For example, when four images per identity are considered, the increase in accuracy rate is more than 4.5%. Among the proposed approaches, DMK-ELM-FFE performs better than DMK-ELM. For example, when the size of the training dataset is 500 images, the increase in accuracy rate is 2%. Figure 6(d) compares the recognition rates of the different classifiers on the AR database. Table 8 shows the performance results of the different classifiers based on precision, recall and F-measure for the AR database. The empirical results on all performance measures show that the proposed non-iterative DMK classifiers are more efficient than the other state-of-the-art classifiers.
5 Conclusions
This paper addresses face recognition efficiently and deterministically by developing a deterministic ELM based on multiple kernel learning, combined with fuzzy feature extraction. The foundation of the proposed approach is the ELM classifier, whose structural parameters are analyzed in detail. Although ELM is an efficient classifier, it cannot be employed in classification applications that demand a deterministic and invariant output solution. To make it deterministic, the structural parameters of ELM are determined analytically. Furthermore, to make it more efficient, we employ a data-dependent multikernel approach in which the kernels and their combination coefficients are determined experimentally for the problem at hand. To make it more accurate, we use the fuzzy π membership function (MF), which operates on pixel intensities to find their association with the different classes. The results show that the kernel model of ELM is more accurate than ELM in most cases. Further, when a single-kernel approach is adopted, the polynomial kernel gives more promising results than the RBF kernel. For all the databases, OMKELM is more accurate than the single-kernel approach, although the size of the improvement is database dependent. The empirical results show that the presented approaches outperform ELM, KELM and OMKELM, and that DMK-ELM-FFE performs better than DMK-ELM on all the databases. The differences between the performance of the proposed approaches are larger when the databases are not pre-processed.
The objective of the proposed methods is to overcome the randomness of ELM and to make it more efficient and accurate through an optimized kernel and the fuzzification of pixels. Although the proposed methods outperform other competitive algorithms, there is considerable scope for future work, such as developing multilayer variants of the proposed algorithms and developing an algorithm capable of handling the very large training sets that accompany large-scale kernel functions.
References
Ahlawat S, Choudhary A, Nayyar A, Singh S, Yoon B (2020) Improved handwritten digit recognition using convolutional neural networks (CNN). Sensors 20:3344
Ahonen T, Hadid A, Pietikäinen M (2004) Face recognition with local binary patterns. In: Eur Conf Comput Vis. pp. 469–481
Ahuja B, Vishwakarma VP (2018) Optimised multikernels based extreme learning machine for face recognition. Int J Appl Pattern Recognit 5:330–340
Ahuja B, Vishwakarma VP (2019) Local feature extraction based KELM for face recognition. In: 2019 twelfth Int Conf Contemp Comput. pp. 1–5
Ahuja B, Vishwakarma VP (2020) Local binary pattern based feature extraction with KELM for face identification. In: 2020 6th Int. Conf. Signal Process. Commun. pp. 91–95
Aiolli F, Donini M (2015) EasyMKL: a scalable multiple kernel learning algorithm. Neurocomputing 169:215–224
Bartlett PL (1998) The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans Inf Theory 44:525–536
Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19:711–720
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge university press
Brunelli R, Poggio T (1993) Face recognition: features versus templates. IEEE Trans Pattern Anal Mach Intell 15:1042–1052
Bucak SS, Jin R, Jain AK (2013) Multiple kernel learning for visual object recognition: a review. IEEE Trans Pattern Anal Mach Intell 36:1354–1369
Chen L, Man H, Nefian AV (2005) Face recognition based on multi-class mapping of fisher scores. Pattern Recogn 38:799–811
Chen W, Er MJ, Wu S (2006) Illumination compensation and normalization for robust face recognition using discrete cosine transform in logarithm domain. IEEE Trans Syst Man, Cybern Part B 36:458–466
De Siqueira FR, Schwartz WR, Pedrini H (2013) Multi-scale gray level co-occurrence matrices for texture description. Neurocomputing 120:336–345
Deng W, Zheng Q, Chen L (2009) Regularized extreme learning machine. In: 2009 IEEE Symp Comput Intell data Min pp. 389–395
Deng C, Han Y, Zhao B (2019) High-performance visual tracking with extreme learning machine framework. IEEE Trans Cybern.
Déniz O, Bueno G, Salido J, la Torre F (2011) Face recognition using histograms of oriented gradients. Pattern Recogn Lett 32:1598–1603
Evgeniou T, Micchelli CA, Pontil M (2005) Learning multiple tasks with kernel methods. J Mach Learn Res 6:615–637
Fan X, Xiang C, Chen C, et al (2020) BuildSenSys: Reusing building sensing data for traffic prediction with cross-domain learning. IEEE Trans Mob Comput
Feng G, Huang G-B, Lin Q, Gay R (2009) Error minimized extreme learning machine with growth of hidden nodes and incremental learning. IEEE Trans Neural Netw 20:1352–1357
Gadekallu TR, Rajput DS, Reddy MPK, et al (2020) A novel PCA–whale optimization-based deep neural network model for classification of tomato plant diseases using GPU. J Real-Time Image Process 1–14
Gönen M, Alpaydin E (2011) Multiple kernel learning algorithms. J Mach Learn Res 12:2211–2268
Gonzalez RC, Woods RE et al (2002) Digital image processing. Publ House Electron Ind 141
Guo P (2018) A vest of the pseudoinverse learning algorithm. arXiv preprint arXiv:1805.07828
Guo P, Lyu MR, Mastorakis NE (2001) Pseudoinverse learning algorithm for feedforward neural networks. Adv Neural Networks Appl.
Han F, Huang D-S (2006) Improved extreme learning machine for function approximation by encoding a priori information. Neurocomputing 69:2369–2373
Han H-G, Wang L-D, Qiao J-F (2014) Hierarchical extreme learning machine for feedforward neural network. Neurocomputing 128:128–135
Hoerl AE, Kennard RW (1970) Ridge regression: biased estimation for nonorthogonal problems. Technometrics 12:55–67
Huang G-B, Zhu Q-Y, Siew C-K (2004) Extreme learning machine: a new learning scheme of feedforward neural networks. In: Neural Networks, 2004. Proceedings. 2004 IEEE Int. Jt Conf pp 985–990
Huang G-B, Zhu Q-Y, Mao KZ et al (2006) Can threshold networks be trained directly? IEEE Trans Circuits Syst II Express Briefs 53:187–191
Huang G-B, Zhu Q-Y, Siew C-K (2006) Extreme learning machine: theory and applications. Neurocomputing 70:489–501
Huang G-B, Li M-B, Chen L, Siew C-K (2008) Incremental extreme learning machine with fully complex hidden nodes. Neurocomputing 71:576–583
Huang G-B, Zhou H, Ding X, Zhang R (2012) Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man, Cybern Part B 42:513–529
Huang Z, Yu Y, Gu J, Liu H (2017) An efficient method for traffic sign recognition based on extreme learning machine. IEEE Trans Cybern 47:920–933
Jian Y, Huang D, Yan J, Lu K, Huang Y, Wen T, Zeng T, Zhong S, Xie Q (2017) A novel extreme learning machine classification model for e-nose application based on the multiple kernel approach. Sensors 17:1434
Khare N, Devan P, Chowdhary CL, Bhattacharya S, Singh G, Singh S, Yoon B (2020) SMO-DNN: spider monkey optimization and deep neural network hybrid classifier model for intrusion detection. Electronics 9:692
Kim D-J, Bien Z (2008) Design of “personalized” classifier using soft computing techniques for “personalized” facial expression recognition. IEEE Trans Fuzzy Syst 16:874–885
Klir G, Yuan B (1995) Fuzzy sets and fuzzy logic. Prentice hall New Jersey
Li SZ, Jain AK (2005) Handbook of face recognition
Li X, Mao W, Jiang W (2016) Multiple-kernel-learning-based extreme learning machine for classification design. Neural Comput Appl 27:175–184
Li Y, Hu H, Zhu Z, Zhou G (2020) SCANet: sensor-based continuous authentication with two-stream convolutional neural networks. ACM Trans Sens Networks 16:1–27
Li Y, Zou B, Deng S, Zhou G (2020) Using feature fusion strategies in continuous authentication on smartphones. IEEE Internet Comput 24:49–56
Liu X, Wang L, Huang G-B, Zhang J, Yin J (2015) Multiple kernel extreme learning machine. Neurocomputing 149:253–264
Lu C, Ke H, Zhang G, Mei Y, Xu H (2019) An improved weighted extreme learning machine for imbalanced data classification. Memetic Comput 11:27–34
Martínez JM, Escandell-Montero P, Soria-Olivas E et al (2011) Regularized extreme learning machine for regression problems. Neurocomputing 74:3716–3721
Miche Y, Sorjamaa A, Bas P et al (2009) OP-ELM: optimally pruned extreme learning machine. IEEE Trans Neural Netw 21:158–162
Ojala T, Pietikäinen M, Mäenpää T (2000) Gray scale and rotation invariant texture classification with local binary patterns. In: Eur Conf Comput Vis. pp 404–420
Rao CR, Mitra SK (1971) Generalized inverse of matrices and its applications.
Rong H-J, Huang G-B, Sundararajan N, Saratchandran P (2009) Online sequential fuzzy extreme learning machine for function approximation and classification problems. IEEE Trans Syst Man, Cybern Part B 39:1067–1072
Samaria FS, Harter AC (1994) Parameterisation of a stochastic model for human face identification. In: Appl. Comput. Vision, 1994., Proc. Second IEEE Work. pp 138–142
Sokolova M, Lapalme G (2009) A systematic analysis of performance measures for classification tasks. Inf Process Manag 45:427–437
Tang J, Deng C, Huang G-B (2016) Extreme learning machine for multilayer perceptron. IEEE Trans neural networks Learn Syst 27:809–821
Vishwakarma VP (2015) Illumination normalization using fuzzy filter in DCT domain for face recognition. Int J Mach Learn Cybern 6:17–34
Wang Y, Cao F, Yuan Y (2011) A study on effectiveness of extreme learning machine. Neurocomputing 74:2483–2490
Wong CM, Vong CM, Wong PK, Cao J (2016) Kernel-based multilayer extreme learning machines for representation learning. IEEE Trans neural networks Learn Syst 29:757–762
Xie X, Zheng W-S, Lai J et al (2010) Normalization of face illumination based on large-and small-scale features. IEEE Trans Image Process 20:1807–1821
Xu Z, Jin R, Yang H, et al (2010) Simple and efficient multiple kernel learning by group lasso. In: Proc. 27th Int. Conf. Mach. Learn. pp 1175–1182
Yang H, Xu Z, Ye J et al (2011) Efficient sparse generalized multiple kernel learning. IEEE Trans Neural Netw 22:433–446
Zadeh LA (1988) Fuzzy logic. Computer (Long Beach Calif) 21:83–93
Zadeh LA (1999) Fuzzy logic = computing with words. In: Comput. with words information/intelligent Syst. 1. Springer, pp 3–23
Zhao W, Chellappa R, Phillips PJ, Rosenfeld A (2003) Face recognition: a literature survey. ACM Comput Surv 35:399–458
Zhou C, Wang L, Zhang Q, Wei X (2013) Face recognition based on PCA image reconstruction and LDA. Optik (Stuttg) 124:5599–5603
Zhu Q-Y, Qin AK, Suganthan PN, Huang G-B (2005) Evolutionary extreme learning machine. Pattern Recogn 38:1759–1763
Zhuang J, Tsang IW, Hoi SCH (2011) Two-layer multiple kernel learning. In: Proc Fourteenth Int Conf Artif Intell Stat. pp 909–917
Zong W, Huang G-B (2011) Face recognition based on extreme learning machine. Neurocomputing 74:2541–2551
Zong W, Zhou H, Huang G-B, Lin Z (2011) Face recognition based on kernelized extreme learning machine. In: Int Conf Auton Intell Syst. pp. 263–272
Zong W, Huang G-B, Chen Y (2013) Weighted extreme learning machine for imbalance learning. Neurocomputing 101:229–242
Zou C, Kou KI, Wang Y (2016) Quaternion collaborative and sparse representation with application to color face recognition. IEEE Trans Image Process 25:3287–3302
Cite this article
Ahuja, B., Vishwakarma, V.P. Deterministic multikernel extreme learning machine with fuzzy feature extraction for pattern classification. Multimed Tools Appl 80, 32423–32447 (2021). https://doi.org/10.1007/s11042-021-11097-3
DOI: https://doi.org/10.1007/s11042-021-11097-3