Abstract
In this paper, a joint model is devised to simultaneously tackle age invariant face recognition and age estimation. A face image is separated into two parts: an identity specific part and an age related part. Two dictionaries are introduced to encode the two parts onto two different subspaces, and label constraint terms are added to improve the discriminative capability of the two learned subspaces. The identity subspace can be used for face recognition and the age subspace for age estimation. Extensive experiments are conducted on the FGNET and MORPH datasets, illustrating a clear improvement over the state of the art.
1 Introduction
Human facial appearance changes substantially over time, posing great challenges for the face recognition task. The age estimation task, on the other hand, aims to capture exactly this age related information. To some extent, the two tasks can therefore be considered to compete with each other. Most existing research investigates the two tasks independently, while in this paper we explore the coordination between them.
Existing research on age invariant face recognition (AIFR) falls into two categories: generative methods and discriminative methods. The generative methods [1,2,3] focus on synthesizing an aging pattern; they rely on accurate age estimation and are computationally expensive. The discriminative methods focus on extracting age invariant features for face recognition [4,5,6,7], but they ignore the aging effects on facial appearance.
Age estimation is usually performed in two steps: feature extraction and classification (or regression). Most existing methods extract geometry and texture features and combine them with an AAM to model the shape and texture of face images simultaneously [1, 8]. Nevertheless, it is difficult to design effective features for age estimation because the mechanism of the aging effect on face images remains unclear [9].
The method proposed in [10] shows that coordinating two tasks with conflicting goals can help suppress the features irrelevant to each task and hence improve performance. Since age invariant face recognition seeks age insensitive features while age estimation seeks age sensitive features, it is natural to improve one of them by incorporating information from the other. Gong et al. [11, 12] propose a probabilistic model with two latent factors and learn an identity space and an age space at the same time, but the model assumes that the latent factors and the random noise follow Gaussian distributions. The method proposed in [13] decomposes a face image into a class specific part and a common part, splitting a dictionary into two corresponding dictionaries; however, it ignores the discriminative ability of the coding coefficients.
In this paper, a novel unified framework is proposed for face recognition and age estimation. The two tasks are modeled together jointly: an identity dictionary and an age dictionary are introduced to encode the identity and age parts of a face image onto two separate subspaces. The learned identity subspace captures only the identity information of the face images and can be used for face recognition; the age subspace captures the age sensitive features and can be used for age estimation.
2 The Joint Model for Face Recognition and Age Estimation
2.1 Face Images Decomposition
Suppose the training set is composed of \( c \) classes, and the training samples of the \( i \)-th class are \( F_{i} = [f_{1} , \ldots ,f_{{n_{i} }} ] \in R^{{d \times n_{i} }} \), where \( f_{j} \in R^{d} , j = 1, \ldots ,n_{i} \). Let \( F = [F_{1} , \ldots , F_{c} ] \in R^{d \times n} \), \( n = n_{1} + \cdots + n_{c} \), be the training sample matrix. Each training sample \( f_{j} \in R^{d \times 1} ,j = 1, \ldots ,n \) can be decomposed into four parts: the person specific part \( I_{j} \in R^{d \times 1} \), the age related part \( A_{j} \in R^{d \times 1} \), the mean face feature \( m_{j} \in R^{d \times 1} \) and the random noise part \( \varepsilon_{j} \in R^{d \times 1} \). The face image decomposition can be formulated as:

\( f_{j} = m_{j} + I_{j} + A_{j} + \varepsilon_{j} \)    (1)
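As a concrete illustration of this bookkeeping, the following NumPy sketch (with arbitrary toy dimensions; `I_part` and `A_part` are placeholders for the components learned later by the dictionaries) shows how the mean face is removed before the identity and age parts are estimated:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 900, 40                      # feature dimension and training-set size (arbitrary)
F = rng.standard_normal((d, n))     # columns f_j are face feature vectors

m = F.mean(axis=1, keepdims=True)   # mean face feature m
M = np.tile(m, (1, n))              # mean face matrix M = [m, ..., m]

# Placeholders for the parts produced by dictionary learning; each column
# then decomposes as f_j = m + I_j + A_j + eps_j.
I_part = np.zeros((d, n))           # person-specific part
A_part = np.zeros((d, n))           # age-related part
eps = F - M - I_part - A_part       # residual noise
```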
2.2 Collaboration Coding with Identity and Age Dictionaries
In this paper, two dictionaries are introduced to encode the identity part and the age part of the face features: the identity dictionary \( U = [U_{1} ,U_{2} \ldots U_{i} , \ldots U_{c} ] \in R^{d \times p} \), \( U_{i} \in R^{{d \times p_{i} }} \) and the age dictionary \( V = [V_{1} , \ldots V_{s} , \ldots V_{t} ] \in R^{d \times q} \), \( V_{s} \in R^{{d \times q_{s} }} \). The coding coefficients \( X_{i} = \left[ {x_{i}^{1} , \ldots ,x_{i}^{j} , \ldots x_{i}^{{n_{i} }} } \right] \in R^{{p \times n_{i} }} \) denote the identity component of the \( i \)-th class training samples \( I_{i} \in R^{{d \times n_{i} }} \) coded over the identity dictionary \( U \in R^{d \times p} \), where \( x_{i}^{j} \in R^{p \times 1} \) denotes the identity coding coefficient of \( f_{i}^{j} \). Similarly, \( Y_{i} = \left[ {y_{i}^{1} , \ldots ,y_{i}^{j} , \ldots y_{i}^{{n_{i} }} } \right] \in R^{{q \times n_{i} }} \) denotes the age component of the \( i \)-th class training samples \( A_{i} \in R^{{d \times n_{i} }} \) coded over the age dictionary \( V \in R^{d \times q} \), where \( y_{i}^{j} \in R^{q \times 1} \) denotes the age coding coefficient of \( f_{i}^{j} \).
In our model, an over-complete dictionary is difficult to acquire since the number of training images per class is limited. Here, we introduce collaborative coding [15] and combine it with the reconstruction error to form a unified objective function:

\( \mathop {\min }\limits_{U,V,X,Y} \left\| {F - M - UX - VY} \right\|_{F}^{2} + \lambda_{1} \left\| X \right\|_{F}^{2} + \lambda_{2} \left\| Y \right\|_{F}^{2} \)    (2)
where \( UX \) and \( VY \) denote the person specific component \( I \) and the age related component \( A \). The first term \( \left\| {F - M - UX - VY} \right\|_{F}^{2} \) is based on the idea of data reconstruction. The mean face matrix is \( M = [m,m, \ldots ,m] \in R^{d \times n} \), \( m = \frac{1}{n}\sum\limits_{j = 1}^{n} {f_{j} } \in R^{d \times 1} \), where \( n \) is the size of the whole training set. \( \lambda_{1} ,\lambda_{2} \) are introduced to balance the reconstruction error and the collaboration constraints.
2.3 Discriminative Coefficients
The collaborative coding coefficient of a face image can be regarded as a feature vector representing the data in a new space, and it should be discriminative for each class. Here, two label supervised constraints are introduced [14]. The identity label supervised term \( \left\| {H_{1} - AX} \right\|_{F}^{2} \) encourages samples from the same identity to obtain similar identity coding coefficients \( x \), and \( \left\| {H_{2} - BY} \right\|_{F}^{2} \) encourages samples from the same age group to obtain similar age coding coefficients \( y \), where \( A \) and \( B \) are linear transformation matrices. The unified objective function is given as follows:

\( \mathop {\min }\limits_{U,V,X,Y,A,B} \left\| {F - M - UX - VY} \right\|_{F}^{2} + \alpha \left\| {H_{1} - AX} \right\|_{F}^{2} + \beta \left\| {H_{2} - BY} \right\|_{F}^{2} + \lambda_{1} \left\| X \right\|_{F}^{2} + \lambda_{2} \left\| Y \right\|_{F}^{2} \)    (3)
where \( H_{1} \in R^{c \times n} \) and \( H_{2} \in R^{a \times n} \) denote the identity and age label matrices, with \( a \) the number of age groups. If the \( l \)-th training sample belongs to the \( k \)-th identity class, then \( H_{1} (k,l) = 1 \); otherwise \( H_{1} (k,l) = 0 \). \( H_{2} \) is defined analogously over the age groups.
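Building these one-hot label matrices is straightforward; the sketch below (the `label_matrix` helper is a hypothetical name) works for both \( H_{1} \) (identity labels) and \( H_{2} \) (age-group labels):

```python
import numpy as np

def label_matrix(labels, num_classes):
    """H[k, l] = 1 iff the l-th sample carries label k, else 0."""
    labels = np.asarray(labels)
    H = np.zeros((num_classes, labels.size))
    H[labels, np.arange(labels.size)] = 1.0
    return H

H1 = label_matrix([0, 0, 1, 2], num_classes=3)  # 4 samples, 3 identities
```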
2.4 Optimization of the Joint Model
The objective function in formula (3) is not jointly convex in \( U,V,X,Y,A,B \), but it is convex with respect to each of them when the others are fixed. It can therefore be optimized by alternating minimization over four blocks:
- Firstly, fix \( V,B,X,Y \) and update \( U,A \). Formula (3) can be rewritten as:

  \( \mathop {\min }\limits_{U,A} \left\| {F - M - UX - VY} \right\|_{F}^{2} + \alpha \left\| {H_{1} - AX} \right\|_{F}^{2} \)    (4)

  which is equivalent to:

  \( \mathop {\min }\limits_{D} \left\| {\left[ {\begin{array}{*{20}c} {F - M - VY} \\ {\sqrt \alpha H_{1} } \\ \end{array} } \right] - DX} \right\|_{F}^{2} \)    (5)

  Let \( D = \left[ {\begin{array}{*{20}c} U \\ {\sqrt \alpha A} \\ \end{array} } \right] \); \( D \) can be updated class by class according to [16].
- Secondly, fix \( U,A,X,Y \) and update \( V,B \). The update process of \( V,B \) is similar to the optimization of \( U,A \).
- Thirdly, fix \( U,V,A,B,Y \) and update \( X \). Formula (3) reduces to a ridge regression problem in \( X \), whose closed-form solution is:

  \( X = (U^{T} U + \alpha A^{T} A + \lambda_{1} I)^{ - 1} (U^{T} (F - M - VY) + \alpha A^{T} H_{1} ) \)    (6)

  so \( X \) can be calculated directly by formula (6).
- Fourthly, fix \( U,V,A,B,X \) and update \( Y \). The update process of \( Y \) is similar to the optimization of \( X \).
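The full alternating scheme can be prototyped in a few dozen lines. The sketch below is a simplification, not the paper's exact algorithm: it replaces the class-by-class metaface update of [16] with plain block least squares (with a tiny stabilizing ridge), and uses toy random data. `A_t` and `B_t` stand for the label transforms \( A,B \), to avoid clashing with the age component \( A \).

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, p, q, c, a = 20, 30, 8, 5, 3, 4          # toy sizes
alpha = beta = 1.0
lam1 = lam2 = 0.01

F = rng.standard_normal((d, n))
m = F.mean(axis=1, keepdims=True)
M = np.tile(m, (1, n))
H1 = np.zeros((c, n)); H1[rng.integers(0, c, n), np.arange(n)] = 1.0
H2 = np.zeros((a, n)); H2[rng.integers(0, a, n), np.arange(n)] = 1.0

U = rng.standard_normal((d, p)); V = rng.standard_normal((d, q))
A_t = rng.standard_normal((c, p)); B_t = rng.standard_normal((a, q))
X = rng.standard_normal((p, n)); Y = rng.standard_normal((q, n))

def obj():
    return (np.linalg.norm(F - M - U @ X - V @ Y, "fro") ** 2
            + alpha * np.linalg.norm(H1 - A_t @ X, "fro") ** 2
            + beta * np.linalg.norm(H2 - B_t @ Y, "fro") ** 2
            + lam1 * np.linalg.norm(X, "fro") ** 2
            + lam2 * np.linalg.norm(Y, "fro") ** 2)

vals = [obj()]
for _ in range(10):
    R = F - M - V @ Y
    Gx = X @ X.T + 1e-8 * np.eye(p)            # small ridge for stability
    U = R @ X.T @ np.linalg.inv(Gx)            # block least squares for U
    A_t = H1 @ X.T @ np.linalg.inv(Gx)         # and for the label transform
    R2 = F - M - U @ X
    Gy = Y @ Y.T + 1e-8 * np.eye(q)
    V = R2 @ Y.T @ np.linalg.inv(Gy)
    B_t = H2 @ Y.T @ np.linalg.inv(Gy)
    # closed-form coefficient updates, as in the X-step above
    X = np.linalg.solve(U.T @ U + alpha * A_t.T @ A_t + lam1 * np.eye(p),
                        U.T @ (F - M - V @ Y) + alpha * A_t.T @ H1)
    Y = np.linalg.solve(V.T @ V + beta * B_t.T @ B_t + lam2 * np.eye(q),
                        V.T @ (F - M - U @ X) + beta * B_t.T @ H2)
    vals.append(obj())
```

Since each block update (approximately) minimizes the objective over that block with the others fixed, the objective value decreases across iterations, which is a useful sanity check for any implementation.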
2.5 Face Matching
In testing, with the two learned dictionaries \( U,V \), any unseen face \( f \in R^{d} \) can be encoded as follows (assuming \( \lambda_{1} = \lambda_{2} = \lambda \)):

\( \mathop {\min }\limits_{x,y} \left\| {f - m - Ux - Vy} \right\|_{2}^{2} + \lambda \left( {\left\| x \right\|_{2}^{2} + \left\| y \right\|_{2}^{2} } \right) \)    (7)
The coding coefficient \( x \) can be directly used as an age invariant feature descriptor for face recognition. The coding coefficient \( y \) can be used for age estimation. Cosine distance and the Nearest Neighbor (NN) algorithm are used for both tasks.
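Under the assumption \( \lambda_{1} = \lambda_{2} = \lambda \), test-time coding has a closed form: stack \( [U\;V] \), solve one ridge system, and split the result into \( x \) and \( y \); matching then uses cosine similarity with nearest neighbor. A sketch (function names hypothetical):

```python
import numpy as np

def encode(f, m, U, V, lam):
    """Jointly solve for the identity code x and age code y of one face."""
    W = np.hstack([U, V])                               # d x (p+q)
    z = np.linalg.solve(W.T @ W + lam * np.eye(W.shape[1]),
                        W.T @ (f - m))
    return z[:U.shape[1]], z[U.shape[1]:]               # x, y

def nn_cosine(query, gallery):
    """Index of the gallery column most cosine-similar to the query code."""
    G = gallery / np.linalg.norm(gallery, axis=0, keepdims=True)
    q = query / np.linalg.norm(query)
    return int(np.argmax(G.T @ q))
```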
3 Experiment
3.1 Implementation Details
Following the settings in [12], the HOG feature is used as the feature descriptor, and feature slicing and PCA are used for dimension reduction. The PCA feature dimension \( d \), the number of slices \( p \), the label parameters \( \alpha ,\beta \) and the regularization parameters \( \lambda_{1} ,\lambda_{2} \) are set as follows: \( d = 900 \), \( p = 6 \); \( (\alpha ,\beta ,\lambda_{1} ,\lambda_{2} ) = (2.3, 1.2, 0.005, 0.005) \) for the MORPH database and \( (\alpha ,\beta ,\lambda_{1} ,\lambda_{2} ) = (2.3, 0.01, 0.002, 0.002) \) for the FGNET database.
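The PCA step can be sketched with a plain SVD on the centered features (the `pca_reduce` helper is illustrative; the paper's \( d = 900 \) would be the `d_out` argument):

```python
import numpy as np

def pca_reduce(F, d_out):
    """Center the columns of F and project onto the top d_out
    principal directions. Returns codes, basis, and the mean."""
    m = F.mean(axis=1, keepdims=True)
    B, _, _ = np.linalg.svd(F - m, full_matrices=False)
    W = B[:, :d_out]
    return W.T @ (F - m), W, m
```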
3.2 Experiment on the FGNET Database
The FGNET database contains 1002 face images of 82 individuals; each person has 13 images on average, collected at ages ranging from 0 to 69. To train the joint model, the face images in the FGNET database are divided into 9 age groups with a gap of 5 years each.
Following the training and testing split scheme used in [5, 11], the leave-one-person-out (LOPO) scheme is used for validation. Table 1 reports the rank-1 recognition rates of these methods; our method achieves an accuracy higher than the best published result (76.2%). Moreover, the proposed model outperforms the HOG-feature baseline by a clear margin.
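The LOPO protocol can be expressed in a few lines: each subject is held out as the test set exactly once, with all other subjects used for training (the `lopo_splits` helper is a hypothetical name):

```python
def lopo_splits(person_ids):
    """Yield (train_idx, test_idx) pairs, holding out one subject at a time."""
    for pid in sorted(set(person_ids)):
        test = [i for i, p in enumerate(person_ids) if p == pid]
        train = [i for i, p in enumerate(person_ids) if p != pid]
        yield train, test
```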
3.3 Experiment on the MORPH Database
The MORPH Album 2 database contains 55,134 facial images of 13,618 subjects. Following the experimental settings in [17], we randomly select 10,000 subjects from MORPH Album 2; for each subject, the image taken at the youngest age is placed in the gallery set and the image taken at the oldest age is used as the probe set. Another 1000 subjects are then randomly selected from the dataset as the training set.
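This youngest-to-gallery, oldest-to-probe protocol can be sketched as follows (a hypothetical helper over `(subject_id, age, image_id)` records, not the paper's code):

```python
def gallery_probe_split(records):
    """records: iterable of (subject_id, age, image_id). The youngest image
    of each subject goes to the gallery, the oldest to the probe set."""
    youngest, oldest = {}, {}
    for sid, age, img in records:
        if sid not in youngest or age < youngest[sid][0]:
            youngest[sid] = (age, img)
        if sid not in oldest or age > oldest[sid][0]:
            oldest[sid] = (age, img)
    gallery = {sid: img for sid, (age, img) in youngest.items()}
    probe = {sid: img for sid, (age, img) in oldest.items()}
    return gallery, probe
```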
The experimental results are shown in Table 2. Note that the age span per subject in the MORPH dataset is relatively small, while the face images contain more expression and pose variations. Nevertheless, our approach still achieves good results on both tasks. Some failure cases of age estimation are shown in Fig. 1, and some failure cases of face matching are shown in Fig. 2.
4 Conclusions
In this paper, we develop a joint model that tackles age invariant face recognition and age estimation at the same time. The results of the experiments on the MORPH and FGNET databases demonstrate the effectiveness of the proposed method.
References
Geng X, Zhou ZH, Smith-Miles K (2007) Automatic age estimation based on facial aging patterns. IEEE Trans Pattern Anal Mach Intell 29(12):2234–2240
Park U, Tong Y, Jain AK (2010) Age-invariant face recognition. IEEE Trans Pattern Anal Mach Intell 32(5):947–954
Du JX, Zhai CM, Ye YQ (2013) Face aging simulation and recognition based on NMF algorithm with sparseness constraints. Neurocomputing 116:250–259
Klare B, Jain AK (2011) Face recognition across time lapse: on learning feature subspaces. In: International joint conference on biometrics (IJCB). IEEE, pp 1–8
Li Z, Park U, Jain AK (2011) A discriminative model for age invariant face recognition. IEEE Trans Inf Forensics Secur 6(3):1028–1037
Ling H, Soatto S, Ramanathan N, Jacobs DW (2010) Face verification across age progression using discriminative methods. IEEE Trans Inf Forensics Secur 5(1):82–91
Otto C, Han H, Jain A (2012) How does aging affect facial components? In: Computer vision – ECCV 2012. Workshops and demonstrations. Springer, Berlin, pp 189–198
Lanitis A, Draganova C, Christodoulou C (2004) Comparing different classifiers for automatic age estimation. IEEE Trans Syst Man Cybern 34(1):621–628
Niu Z, Zhou M, Wang L, Gao X, Hua G (2016) Ordinal regression with multiple output CNN for age estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4920–4928
Du L, Ling H (2015) Cross-age face verification by coordinating with cross-face age verification. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2329–2338
Gong D, Li Z, Lin D, Liu J, Tang X (2013) Hidden factor analysis for age invariant face recognition. In: Proceedings of the IEEE international conference on computer vision, pp 2872–2879
Gong D, Li Z, Tao D, Liu J, Li X (2015) A maximum entropy feature descriptor for age invariant face recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 5289–5297
Sun Y, Liu Q, Tang J, Tao D (2014) Learning discriminative dictionary for group sparse representation. IEEE Trans Image Process 23(9):3816–3828
Jiang Z, Lin Z, Davis LS (2013) Label consistent K-SVD: Learning a discriminative dictionary for recognition. IEEE Trans Pattern Anal Mach Intell 35(11):2651–2664
Zhang L, Yang M, Feng X (2011) Sparse representation or collaborative representation: which helps face recognition? In: IEEE international conference on computer vision (ICCV). IEEE, pp 471–478
Yang M, Zhang L, Yang J, Zhang D (2010) Metaface learning for sparse representation based face recognition. In: 17th IEEE international conference on image processing (ICIP). IEEE, pp 1601–1604
Chen BC, Chen CS, Hsu WH (2014) Cross-age reference coding for age-invariant face recognition and retrieval. In: European conference on computer vision. Springer International Publishing, Berlin, pp 768–783
Li Z, Gong D, Li X, Tao D (2015) Learning compact feature descriptor and adaptive matching framework for face recognition. IEEE Trans Image Process 24(9):2736–2745
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China (NSFC) under grants 61533012 and 61521063.
© 2018 Springer Nature Singapore Pte Ltd.
Wu, C., Su, J. (2018). A Unified Framework for Age Invariant Face Recognition and Age Estimation. In: Deng, Z. (eds) Proceedings of 2017 Chinese Intelligent Automation Conference. CIAC 2017. Lecture Notes in Electrical Engineering, vol 458. Springer, Singapore. https://doi.org/10.1007/978-981-10-6445-6_68