
1 Introduction

Facial age estimation is a relatively new research topic in the area of facial image analysis. Compared with other facial attributes, age is challenging to estimate because the appearance of aging depends on many factors, such as health, lifestyle, weather conditions, and genetics. Another difficulty is that large aging databases are hard to collect. Moreover, the aging process can be accelerated or slowed down by physical condition or lifestyle [15].

Age estimation can be treated as either a classification or a regression problem [6]. In the classification-based formulation the age group is estimated, while in the regression-based formulation the exact age is estimated. Earlier work has addressed different aspects of age progression and estimation. Some studies treat age estimation as a classification problem. Kwon and Lobo proposed an age classification method that used both the shape and the wrinkles of a human face to classify input images into one of three age groups: babies, young adults, and senior adults [7]. Lanitis et al. proposed a quadratic aging function that maps the Active Appearance Model (AAM) features of a face image to an age [8]. In their later work, they compared different classifiers for age estimation based on AAM features [9]. With AAM-based face encoding, Geng et al. handled the age estimation problem by introducing an aging pattern subspace (AGES), which is a subspace representation of a sequence of an individual's aging face images [10]. Feng Gao and Haizhou Ai used Gabor features as a face representation and Linear Discriminant Analysis (LDA) to construct an age classifier that classifies human faces as baby, child, adult, or elderly. In their model, the images in the training set were labeled without the age information [11]. On the other hand, many studies treat age estimation as a regression problem. Guodong Guo et al. introduced an age manifold learning scheme for extracting face aging features and designed a locally adjusted robust regressor for learning and predicting human ages [12]. Ni et al. presented a multi-instance regression method to handle face images with noisy labels that were collected from Web image resources [13].

In this paper, classification and regression models are proposed to estimate the exact age from two-dimensional face images. Theoretically, classification is more affected by missing data than regression. Hence, in this paper, we compare the classification and regression models when one class is neglected, i.e. missing data. In addition, the proposed models are evaluated when the gender is known and when it is unknown. The rest of the paper is organized as follows: Sect. 2 presents the preliminaries. Section 3 presents the proposed age estimation system. Experimental results and discussion are presented in Sect. 4. Finally, concluding remarks are given in Sect. 5.

2 Preliminaries

2.1 Local Binary Pattern (LBP) Features

LBP is one of the feature extraction methods that are used to extract local features from greyscale images [14]. In LBP, the LBP code is calculated for each pixel by comparing each pixel with its neighbors [14, 15]. The LBP code is represented by a binary number and it is calculated as follows:

$$\begin{aligned} LBP_{P,R}=\sum _{i=0}^{P-1} s(g_{i}-g_{c})2^i,\, \text {where}\,\, s(x)=\left\{ \begin{array}{l l} 1, &{} \quad {x \ge 0 }\\ 0, &{} \quad {x < 0} \end{array}\right. ,\; i=0,1,\dots , P-1 \end{aligned}$$
(1)

where \(g_c\) is the gray level of the center pixel, \(g_i\) represents the gray levels of the neighbours of \(g_c\), \(LBP_{P,R}\) is the LBP code when the number of neighboring pixels is P, \(R\) \((R>0)\) is the distance from the center to the neighboring pixels, i.e. the radius, and s is the threshold function of x [14]. To make LBP robust against rotation, the LBP code is rotated until it reaches its minimum value as follows, \(LBP_{P,R}^{ri}=\text {min}\{\text {ROR}(LBP_{P,R},i)\}, \; i=0,1,\dots , P-1\), where \(\text {ROR}(f,i)\) represents the circular bit-wise right shift of the value f by i positions. More details about the LBP algorithm are reported in [14].
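As an illustration of Eq. (1) and of the rotation step, the following minimal sketch computes the LBP code of a single pixel from a list of neighbor gray levels. The function names (`lbp_code`, `ror`, `rotation_invariant`) and the sample gray levels are assumptions made for this example; they are not taken from the paper or from any library.

```python
def lbp_code(center, neighbors):
    """Eq. (1): sum over i of s(g_i - g_c) * 2^i, with s(x) = 1 if x >= 0 else 0."""
    return sum((1 if g >= center else 0) << i for i, g in enumerate(neighbors))


def ror(value, shift, bits):
    """Circular bit-wise right shift of a `bits`-wide value by `shift` positions."""
    shift %= bits
    mask = (1 << bits) - 1
    return ((value >> shift) | (value << (bits - shift))) & mask


def rotation_invariant(code, bits):
    """Smallest value obtained over all circular right shifts of the code."""
    return min(ror(code, i, bits) for i in range(bits))


# Hypothetical 8-neighbor example (P = 8): gray levels g_0 .. g_7 around g_c = 5.
neighbors = [6, 5, 2, 1, 7, 3, 9, 4]
code = lbp_code(5, neighbors)
print(code, rotation_invariant(code, 8))
```

The sketch works on one pixel only; sampling the P neighbors at radius R from an image, and building a histogram of the codes, are left out.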

2.2 Facial Landmarks

The Regions Of Interest (ROI) in face images, such as the eyes, nose, and mouth, are called facial landmarks. The landmarks are first located approximately, and then template matching is used to find the precise position of the facial features. Cristinacce et al. have shown that precise landmarks are essential for good face-recognition performance [16].

Fig. 1 Some images of the FG-NET database with landmarks.

In our model, the 68 landmarks provided with each face image in the database are used to extract features from the specific regions that are most affected by age variation. Sample images with landmark annotations are shown in Fig. 1.

2.3 k-NN Classifier

The k-Nearest Neighbor (k-NN) classifier is one of the most widely used classifiers. In the k-NN classifier, an unknown pattern, \(x_{test}\), is classified based on its similarity to the labeled/training samples: the distances from the unknown sample to all labeled samples are computed and the k nearest samples are selected as the basis for classification. The unknown sample is assigned to the class that has the most samples among the k nearest samples. Hence, the k-NN algorithm depends on: (1) the integer k (number of neighbors), and changing the value of k may change the classification result, (2) a set of labeled data, thus adding or removing any labeled samples will affect the final decision of the k-NN classifier, and (3) a distance metric [17]. In k-NN, the Euclidean distance is often used as the distance metric to measure the similarity between two samples as follows, \(d(x_i,x_j)=\sqrt{\sum _{l=1}^{d} (x_{il}-x_{jl})^2}\), where \(x_i,x_j \in \mathcal {R}^d\) and \(x_i=\{x_{i1},x_{i2},\dots ,x_{id}\}\).
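The decision rule described above can be sketched in a few lines. This is a minimal illustration with a majority vote and the Euclidean distance; the function names and the toy training set are assumptions made for the example, and ties are broken by whichever label is encountered first.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def knn_classify(train, x_test, k):
    """Assign x_test to the most frequent label among its k nearest samples.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda s: euclidean(s[0], x_test))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-class toy data.
train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
print(knn_classify(train, (0.5, 0.5), k=3))
```

Changing k or adding and removing labeled samples changes the neighbor set, which is why the result depends on all three ingredients listed above.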

2.4 Regression

Regression is used to build a relationship between a dependent variable, Y, and one or more independent variables, \(X=\{x_1,x_2,\dots ,x_p\}\) as follows, \(h_\beta =\beta _0+\beta _1 x_1+\beta _2 x_2+\dots +\beta _p x_p\), where p is the number of independent variables and \(\beta _i\)’s are the parameters/weights or regression coefficients.

Given n patterns, \(Y \in \mathcal {R}^ {n \times 1}\) and \(X \in \mathcal {R}^ {n\times p}\). The intercept is denoted by \(\beta _0\), i.e. where the line cuts the y axis, while \(\beta _1,\dots ,\beta _p\) represent the slopes of the regressor. In the gradient descent algorithm, the values of \(\beta \) are calculated to minimize the cost function (\(J(\beta )\)) as follows, \(J(\beta )=\frac{1}{2n}\sum _{i=1}^{n}( h_\beta (x^{(i)})-y^{(i)})^2\), where \(h_\beta \) is the hypothesis or regressor that is used to estimate the output value of \(x^{(i)}\). The values of \(\beta \) are initialized randomly and updated iteratively to minimize \(J(\beta )\) as follows, \(\beta _{j}=\beta _j-\alpha \frac{\partial J(\beta )}{\partial \beta _j} \; \forall j=0,1,\dots ,p\), where \(\frac{\partial J(\beta )}{\partial \beta _j}=\frac{1}{n} \sum _{i=1}^{n}(h_\beta (x^{(i)})-y^{(i)}) x_j^{(i)}\), \(x_0^{(i)}=1\), and \(0\le \alpha \le 1\) is the learning rate [18]. Hence, the general form is \(\beta _j=\beta _j -\frac{\alpha }{n} \sum _{i=1}^{n} (h_\beta (x^{(i)})-y^{(i)})x_j^{(i)} \; \forall \; j=0,1,\dots ,p\).

To avoid the problem of overfitting in regression, a regularization parameter, \(\lambda \), is used as follows, \(\frac{\partial J(\beta )}{\partial \beta _j}=\frac{1}{n}(\sum _{i=1}^{n}(h_\beta (x^{(i)})-y^{(i)}) x_j^{(i)}+\lambda \beta _j)\), where j indexes the regression coefficients and \(x_j^{(i)}\) is the j-th feature of the i-th pattern. Increasing \(\lambda \) reduces the variance of the estimated regression parameters, at the cost of increasing the bias [18].
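The update rule and the regularized gradient can be illustrated with a short batch gradient descent sketch in plain Python. Several choices here are assumptions made for the example rather than details from the paper: the coefficients start at zero instead of random values so that the run is reproducible, the intercept \(\beta _0\) is not regularized (a common convention), and the names `predict` and `fit_linear`, the toy data, and the default hyperparameters are hypothetical.

```python
def predict(beta, x):
    """h_beta(x) = beta_0 + beta_1 * x_1 + ... + beta_p * x_p."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def fit_linear(X, y, alpha=0.1, lam=0.0, iters=5000):
    """Batch gradient descent on J(beta) with an optional penalty lam.

    X is a list of n feature vectors of length p, y a list of n targets.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)  # assumption: zero start instead of random start
    for _ in range(iters):
        errors = [predict(beta, xi) - yi for xi, yi in zip(X, y)]
        # Gradient for the intercept (x_0 = 1, not regularized here).
        grad0 = sum(errors) / n
        # Gradient for each slope, including the lam * beta_j term.
        grads = [
            (sum(e * xi[j] for e, xi in zip(errors, X)) + lam * beta[j + 1]) / n
            for j in range(p)
        ]
        beta[0] -= alpha * grad0
        for j in range(p):
            beta[j + 1] -= alpha * grads[j]
    return beta


# Toy data generated from y = 1 + 2x.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [1.0, 3.0, 5.0, 7.0]
beta_plain = fit_linear(X, y)
beta_reg = fit_linear(X, y, lam=1.0)
print(beta_plain, beta_reg)
```

With `lam=0.0` the fit recovers the generating line; a positive `lam` shrinks the slope toward zero, which is the bias introduced in exchange for lower variance.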

In non-linear regression, the objective or cost function is the same as in linear regression, but the independent variables appear with higher-order terms and hence the regressor is non-linear.

3 Proposed Models

This section describes the two proposed types of models, i.e. classification and regression, in detail. In the proposed models, the LBP algorithm and facial landmarks were used to extract features. The extracted features were then fused by concatenating the feature vectors of the local features, i.e. LBP, and the global features, i.e. facial landmarks, as shown in Fig. 2.

Fig. 2 Framework of the proposed algorithm for exact facial age estimation.

Table 1 Age group distribution of the facial images in FG-NET Database.

3.1 Classification Models

In the classification models, three different methods were used to estimate the exact age. In all three methods, the k-NN classifier was used to determine the class or age group of the unknown sample. The exact age is still unknown at this point, and it is calculated using one of the following methods.

k-NN-Dist Method: In this method, a minimum distance classifier is used to match the unknown sample against all the training samples that belong to the class with the highest number of nearest neighbors, and the age of the nearest of these samples is assigned to the unknown sample.

Mean k-NN (M-k-NN) Method: Calculating the exact age using only the class with the maximum number of nearest neighbors may ignore other correct or semi-correct classes. Hence, in this method, i.e. the M-k-NN method, the exact age of the unknown sample was estimated as the weighted sum of the means of the classes returned by k-NN as follows, \(\sum _{i=1}^{n} w_i\times \mu _i\), where n is the number of classes among the nearest neighbors, \(\mu _i\) is the mean of the i-th class, \(C_i\), and it is calculated as follows, \(\mu _i=\frac{1}{m_i}\sum _{x_j \in C_i} x_j\), where \(m_i\) is the number of samples in \(C_i\), and \(w_i\) is the weight of each class and it represents the ratio between the number of the nearest neighbors from that class, \(k_i\), to the total number of the nearest neighbors, k, as follows, \(w_i= \frac{k_i}{k}\).

k-NN-R Method: In this method, the classification and regression techniques were combined to calculate the exact age of the unknown sample. Simply, the samples of the class that has the maximum number of nearest neighbors are used to build a regression model, and this model is then used to estimate the exact age.
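The three methods above can be compared on a toy problem with the following sketch. It is an illustration under stated assumptions, not the authors' implementation: each training sample is a hypothetical `(features, age, group)` tuple, the class mean in M-k-NN is taken as the mean age over all training samples of that class, and k-NN-R uses a one-feature least-squares line in place of the paper's regressors to keep the example short.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nearest_neighbors(train, x, k):
    """The k training samples (features, age, group) closest to x."""
    return sorted(train, key=lambda s: euclidean(s[0], x))[:k]


def winning_class(train, x, k):
    """Age group with the most samples among the k nearest neighbors."""
    counts = Counter(g for _, _, g in nearest_neighbors(train, x, k))
    return counts.most_common(1)[0][0]


def knn_dist_age(train, x, k):
    """k-NN-Dist: age of the closest sample inside the winning class."""
    winner = winning_class(train, x, k)
    members = [s for s in train if s[2] == winner]
    return min(members, key=lambda s: euclidean(s[0], x))[1]


def m_knn_age(train, x, k):
    """M-k-NN: sum over classes of (k_i / k) * mean age of class i."""
    counts = Counter(g for _, _, g in nearest_neighbors(train, x, k))
    estimate = 0.0
    for group, k_i in counts.items():
        ages = [a for _, a, g in train if g == group]  # assumption: whole class
        estimate += (k_i / k) * (sum(ages) / len(ages))
    return estimate


def knn_r_age(train, x, k):
    """k-NN-R: regression fitted on the winning class (1-D least squares here)."""
    winner = winning_class(train, x, k)
    xs = [s[0][0] for s in train if s[2] == winner]
    ys = [s[1] for s in train if s[2] == winner]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((v - mx) ** 2 for v in xs)
    slope = sum((v - mx) * (w - my) for v, w in zip(xs, ys)) / sxx if sxx else 0.0
    return my + slope * (x[0] - mx)


# Hypothetical data: one feature, two age groups (0 and 1).
train = [
    ((1.0,), 2, 0), ((2.0,), 4, 0), ((3.0,), 6, 0),
    ((10.0,), 12, 1), ((11.0,), 14, 1), ((12.0,), 16, 1),
]
query = (2.4,)
print(knn_dist_age(train, query, 3), m_knn_age(train, query, 3), knn_r_age(train, query, 3))
```

On this data k-NN-Dist copies the age of a single sample, while M-k-NN and k-NN-R draw on several samples, which is the distinction the discussion section relies on.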

3.2 Regression Models

In the regression model, two different methods were used to estimate the exact age, namely, linear and non-linear regression, as shown in Fig. 2. The features of the training samples were used to build both regressors, which were then used to estimate the exact age.

4 Experimental Results

In this section, different experiments were conducted to estimate the exact human age from two-dimensional face images. All experiments were carried out on the FG-NET Database [19]. The FG-NET Aging Database (Face and Gesture Recognition Research Network) has 1002 color and grey scale face images from 82 subjects. Each subject has between 6 and 18 face images taken at different ages. Each face image was manually annotated with 68 landmark points. The ages are distributed over a wide range from 0 to 69 as shown in Table 1.

The original face images in the database have many different backgrounds, clothes, hair, colors, illumination conditions, and orientations. Thus, each face image was cropped and converted to grey scale. Moreover, all images were resized to (\(64\times 64\)). In addition, due to the small number of images of subjects older than 40 in the database, only the first four age groups were used in our experiments.

Fig. 3 Examples of M-k-NN, k-NN-Dist, and k-NN-R proposed age estimation system models.

4.1 Simulated Example

Figure 3 shows three examples of how the exact age is calculated in three of the proposed models, i.e. k-NN-Dist, M-k-NN, and k-NN-R. As shown, the unknown sample is classified using the k-NN classifier with \(k=19\). As a result, the unknown sample has nine, five, four, and one nearest neighbors belonging to the fifth, third, fourth, and second classes, respectively, and hence the unknown sample belongs to the fifth class, which has the highest number of nearest neighbors. However, the exact age is still unknown, and it is calculated using one of the following methods.

k-NN-Dist Method: Figure 3 shows an example of how the exact age is estimated using this method (k-NN-Dist). As shown, the unknown sample is matched with all training samples that belong to the fifth class, which has the highest number of nearest neighbors, and the age of the nearest sample is assigned to the unknown sample.

M-k-NN Method: In this method, the weight of each class is first calculated. The weight of each class is the ratio between the number of the nearest neighbors in that class and the total number of nearest neighbors. With nine, five, four, and one of the \(k=19\) nearest neighbors in the fifth, third, fourth, and second classes, the weight of the fifth class is \(\frac{9}{19}\). Similarly, the weights of the third, fourth, and second classes are \(\frac{5}{19}\), \(\frac{4}{19}\), and \(\frac{1}{19}\), respectively. The exact age can be estimated as shown in Fig. 3.

k-NN-R Method: In this method, the samples of the fifth class, which has the maximum number of nearest neighbors, are used as the training samples of a regression model that estimates the exact age of the unknown sample, as shown in Fig. 3.

Table 2 A comparison between classification and regression models in terms of MAE using FG-NET database.

4.2 Real Data Experiments

In our experiments, the Leave-One-Person-Out (LOPO) evaluation scheme is used. In each fold, all samples of a single person were used as the testing set and the remaining samples were used as the training set. To evaluate our experiments, the Mean Absolute Error (MAE) was used. MAE is one of the most commonly used metrics for age estimation and it is calculated as follows, \(MAE=\frac{\sum _{k=1}^{N} |l_k-l_k^*|}{N}\), where \(l_k^*\) is the estimated age for the k-th sample, \(l_k\) is the ground truth age of that sample and N is the total number of testing images [9].
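The evaluation protocol can be sketched as follows. The helper names `mae` and `lopo_splits`, and the subject identifiers in the usage lines, are hypothetical and only serve to show the mechanics of the split and of the metric.

```python
def mae(true_ages, estimated_ages):
    """Mean Absolute Error between ground-truth and estimated ages."""
    n = len(true_ages)
    return sum(abs(t - e) for t, e in zip(true_ages, estimated_ages)) / n


def lopo_splits(subject_ids):
    """Yield (subject, train_indices, test_indices), one fold per person.

    `subject_ids[i]` is the identity of the person shown in image i.
    """
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


# Hypothetical identities for five images of three people.
subject_ids = ["a", "a", "b", "c", "b"]
folds = list(lopo_splits(subject_ids))
print(len(folds), folds[0])
print(mae([10, 20, 30], [12, 18, 33]))
```

Because every image of the held-out person is removed from training, a model is never tested on a face it has already seen at another age.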

In the first experiment, the classification and regression models were used to estimate the exact age for unknown patterns. This experiment is divided into two sub-experiments. In the first one, the gender of the subject is ignored, i.e. gender unknown, while in the second sub-experiment, the subjects are divided into male and female groups and the unknown sample is matched with its group. The results of this experiment are summarized in Table 2.

In the second experiment, we evaluated the influence of losing one age group or class on the two proposed models. The third class, i.e. age group (20–29), which consists of 144 images, was excluded from the training stage. In this experiment, both classification and regression models were used. The results of this experiment are summarized in Table 3.

Table 3 A comparison between classification and regression models in terms of MAE, MAE change rate (%), and total MAE change using FG-NET database in case of missing one class.

4.3 Discussion

Several observations can be made from Table 2. First, the k-NN-R method achieved the lowest MAE among all methods in the classification model, both when the gender was known and when it was unknown. Moreover, the k-NN-Dist method achieved the worst results. The reason is that k-NN-Dist depends mainly on one sample, while the other two methods, i.e. M-k-NN and k-NN-R, depend on several of the samples nearest to the unknown sample. Second, the non-linear regression method achieved a lower MAE than the linear regression method. The reason for this result is that the relation between the features of the images and the ages of those images is non-linear, and hence non-linear regression is suitable for this problem. However, there is no clear conclusion about the regularization parameter and this point needs more experiments. Generally, the regression model achieved better results than the classification model, and the best MAE was 4.8, obtained using the non-linear method when the gender was known. Another positive finding is that the results with known gender were better than the results with unknown gender.

From Table 3 we can note that the regression model achieved better results than the classification model. Moreover, the k-NN-R method and the non-linear method achieved the lowest MAE in the classification and regression models, respectively. In addition, the results with known gender are better than those with unknown gender, and the MAE of male subjects was lower than that of female subjects. These findings are consistent with the findings from the first experiment. It is not surprising that the MAE was decreased when one class was removed. However, the difference between the MAE of the first and second experiments reflects the robustness of our two proposed models. In other words, the two proposed models achieved good results despite the removal of one class. Moreover, the average change in MAE of the regression model was lower than that of the classification model, and hence the regression model deals with missing data better than the classification model.

5 Conclusions and Future Work

In this paper, we have implemented a framework for facial age estimation and proposed two different models to estimate the exact age. In the first model, the classification model, the k-NN classifier was used to determine the class of the unknown sample, and three methods (M-k-NN, k-NN-Dist, and k-NN-R) were used to estimate the exact age. In the regression model, linear and non-linear regression methods were used to estimate the exact age of the unknown pattern. The results showed that the regression model outperforms the classification model and that it deals with missing data better than the classification model. Moreover, non-linear regression achieved better results than linear regression. In addition, the k-NN-R method achieved the best results among all methods in the classification model. Finally, the results with known gender were better than those with unknown gender.

In the future, an optimization technique will be used to search for the optimal values of the regularization parameter of the linear and non-linear methods.