1 Introduction

Facial expression recognition has attracted increasing attention in recent years due to its wide application in human-computer interaction [9, 20, 24]. Although much progress has been achieved in computational facial expression recognition, almost all studies focus on discriminative geometric and appearance features to characterize facial images, and on effective classifiers to model the spatial and temporal patterns embedded in facial expressions, ignoring the effects of facial attributes, such as age, on expression recognition, even though research indicates that face structures develop with age and expression manifestation varies with age. Furthermore, most benchmark facial expression databases, such as the MMI database and the CK+ database [18], only cover a small age range. The lack of databases with larger age ranges limits the generality and performance of current expression recognition studies, and further hinders the development of age-related facial expression recognition.

Recently, researchers in psychology have realized that faces spanning the adult lifespan carry crucial information for a complete understanding of many areas of psychological study, including perception, attention, memory, social reasoning, emotion, infant and adult development, and neuropsychology [8, 19]. Therefore, two benchmark databases have been constructed: Lifespan [19], consisting of 575 faces from ages 18 to 93, and FACES [8], containing 2,052 images from ages 19 to 80. Very recently, Ebner and Johnson's work [7] investigated the interference of face-related tasks by irrelevant faces of different ages and with different facial expressions; it demonstrates age-group differences in interference from emotional faces of different ages. By reviewing theoretical frameworks and empirical findings on age effects in facial expression decoding, Fölster et al. [10] concluded that the age of the face plays an important role in facial expression decoding. Their review suggests that decoding accuracy for older faces may be reduced by many factors, such as lower expressivity, age-related facial changes, and less elaborated emotion schemas. Hess et al. [13] investigated how emotions expressed by the elderly are perceived by others. Their findings suggest that emotions shown on older faces have reduced signal clarity due to wrinkles and folds, and may consequently affect the behavioral inferences that others draw from the emotion expression. Houstis and Kiliaridis [14] quantitatively evaluated the facial expressions of children and adults in order to assess their dependence on age. Their study of 80 subjects found a trend from childhood to adulthood showing an increase in the percentage of change in most vertical movements, possibly due to the development of the mimic musculature from childhood to adulthood.

To the best of our knowledge, only three studies in computer vision have investigated the age effect on facial expression recognition. Guo et al. [11] were the first to study the age effect on facial expression recognition computationally. They proposed two methods, i.e. age-group-constrained facial expression recognition and age-removing facial expression recognition. The former trains a multi-class classifier by considering each expression in each age group as one independent class. The latter removes facial wrinkles and other aging details using an edge-preserving image smoothing technique before expression recognition. Experiments were conducted on the Lifespan and FACES databases, demonstrating the significant influence of human aging on computational facial expression recognition. Rather than focusing on age-invariant expression recognition, Alnajar et al. [1] considered expression-invariant age estimation. They proposed a graphical model with a latent layer between the age/expression labels and the features to jointly learn age and expression. Experimental results on the Lifespan and FACES databases show improved age estimation when age is learnt jointly with expression, compared with expression-independent age estimation. In addition, joint learning improves expression recognition performance on the FACES data set and achieves comparable performance on the Lifespan data set. These two studies adopt appearance features, i.e. Gabor features [11] and LBP features [1]. Unlike them, Dibeklioglu et al. [6] analyzed the effect of age on distinguishing posed from spontaneous smiles by using age as one feature alongside their defined dynamic features. Their experiments on the BBC, MMI, SPOS, and UvA-NEMO databases demonstrate that the performance of posed versus spontaneous smile differentiation is improved by using aging information as a feature.

Among these three studies, the first two recognize expression and age jointly, or remove aging details before expression recognition, which means that age information is not required during testing. In other words, age information is used as privileged information, which is only available during training [23], and is exploited during training to construct a better classifier. In Dibeklioglu et al.'s study, by contrast, age estimation must be performed before expression recognition during testing. Such a sequential approach may propagate age estimation errors into the subsequent expression recognition. Therefore, in this paper we prefer to incorporate age information as privileged information, which is only required during training. Furthermore, the first two studies adopted appearance features, which are useful for describing wrinkles, and the third used dynamic features, which are crucial for distinguishing posed from spontaneous smiles. In this paper, we exploit spatial patterns, which carry crucial information for facial expressions and have not been thoroughly exploited in age-invariant expression recognition. Specifically, we propose two methods. One is a three-node Bayesian Network (BN) [21] that recognizes expressions from geometric features with the help of age. During training, we construct a full probabilistic model \(P(x, x^{\star}, y)\) using the training set \(\{(x_{i}, {x}_{i}^{\star}, y_{i}), i=1,\ldots,l\}\), where \(x_{i}\) denotes the geometric features, \({x}_{i}^{\star}\) the age information, and \(y_{i}\) the expression label. During testing, we obtain \(P(y|x)\) by marginalizing over \(x^{\star}\). The other method constructs multiple Bayesian networks to explicitly capture the spatial facial expression patterns of each age group. During testing, only facial geometric features are provided, and the samples are classified into expressions according to the BN with the largest likelihood. Experiments on the Lifespan and FACES databases demonstrate the effectiveness of our proposed approaches.

The rest of this paper is organized as follows. Section 2 introduces the two benchmark databases and the extracted geometric features. Section 3 analyzes the age effect on the spatial patterns of expressions and on expression recognition. Section 4 introduces our two proposed methods. Section 5 presents the experimental results and analyses validating our proposed methods. Section 6 compares our methods with related work. Section 7 summarizes our work.

2 Two databases

Currently, only two databases, i.e. the FACES [8] and Lifespan [19] databases, contain a large range of age variation, as mentioned in Section 1; therefore, we adopt both in our work.

The FACES database consists of 2,052 images, which are divided into two sets. Since the images of the two sets are almost the same, we adopt one set in this work. The FACES database includes six expressions, i.e. anger, disgust, fear, happiness, neutral, and sadness, as shown in Fig. 1. The Lifespan database consists of images with eight expressions, as shown in Table 1. Since the numbers of samples with surprise, sadness, anger, annoyed, disgust, and grumpy expressions are much smaller than those of neutral and happy facial images, only the neutral and happy samples of the Lifespan database are adopted in our work, giving 835 usable samples. Figure 2 lists sample faces of happy and neutral expressions from the Lifespan database. Both databases are posed facial expression databases. In our work, we divide the samples of both databases into three age groups: 18–31, 32–59, and 60–93, as shown in Table 1.

Fig. 1
figure 1

Expression samples in the FACES database a young-anger; b young-disgust; c young-fear; d young-neutral; e young-happy; f young-sad; g middle age-anger; h middle age-disgust; i middle age-fear; j middle age-neutral; k middle age-happy; l middle age-sad; m old-anger; n old-disgust; o old-fear; p old-neutral; q old-happy; r old-sad

Fig. 2
figure 2

Expression samples in the Lifespan database a young-happy; b young-neutral; c middle age-happy; d middle age-neutral; e old-happy; f old-neutral

Table 1 Facial Expressions with Age Group Divisions on two databases

In addition, Guo et al. [11] manually labeled fiducial points for each face image in both databases. (For the FACES database, they labeled only 2,004 images, so the database we use in our experiments contains 2,004 images.) We choose 26 fiducial points on the FACES database and 31 fiducial points (including the two eye pupils) on the Lifespan database, as shown in Fig. 3 (for the Lifespan database, the 22nd point is only used to extract person-independent features). Since only apex images are provided in the two databases and neutral faces are not available, we extract person-independent geometric features [2], i.e. ratios of distances, areas, and angles, to represent the spatial patterns embedded in expressions, instead of using distances directly or normalizing distances against neutral faces. The person-independent features are listed in Table 2, where the second column gives the formula for computing feature \(f_j\). Each facial landmark is denoted as \(p_i = (x_i, y_i) \in \mathbb{R}^2\), and the index \(i\) in this table is consistent with the point indices in Fig. 3a, b. The features listed in the table represent the spatial relationships of the fiducial points on faces and exhibit discriminative person-independent properties. For example, the first feature \(f_1\) is the ratio of the distance \(d_1\) (between the left eye outer corner and the left mouth corner) to the distance \(d_2\) (between the right eye outer corner and the right mouth corner). This ratio remains almost the same for every person showing the same expression, so it is a person-independent feature. Similarly, the other features listed in Table 2 also exhibit discriminative person-independent properties. More details can be found in [2].

Fig. 3
figure 3

The face fiducial points on two databases. a FACES; b Lifespan

Table 2 Person Independent Geometric Features [2]

Before feature extraction, we normalize the images according to the coordinates of two pupils.
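To make the feature extraction concrete, the sketch below normalizes landmarks by the two pupil coordinates and computes a distance-ratio feature in the spirit of \(f_1\). It is a minimal illustration: the function names are ours, and the landmark indices passed to it are placeholders rather than the exact indices of Fig. 3 and Table 2.

```python
import numpy as np

def normalize_by_pupils(points, left_pupil, right_pupil):
    """Translate, rotate, and scale the landmarks so that the two pupils
    lie on a horizontal line at unit distance apart."""
    d = right_pupil - left_pupil
    angle = np.arctan2(d[1], d[0])
    c, s = np.cos(-angle), np.sin(-angle)
    R = np.array([[c, -s], [s, c]])          # rotation undoing the eye tilt
    centered = (points - left_pupil) @ R.T
    return centered / np.linalg.norm(d)      # scale by inter-pupil distance

def distance_ratio(points, i1, i2, i3, i4):
    """Ratio feature f = d(p_i1, p_i2) / d(p_i3, p_i4), e.g. left eye outer
    corner to left mouth corner over the mirrored distance on the right."""
    d1 = np.linalg.norm(points[i1] - points[i2])
    d2 = np.linalg.norm(points[i3] - points[i4])
    return d1 / d2
```

Because the ratio divides two distances on the same face, it is invariant to the face's overall scale, which is what makes such features person-independent.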

3 Statistical analyses of age effect on expressions

Two kinds of statistical analyses are conducted to investigate the age effect on expressions. The first examines whether there are age differences in the spatial patterns embedded in expressions. The second analyzes the age effect on expression recognition. Both analyses use person-independent features.

For the first study, a one-way ANOVA [3, 15] with age as the independent variable and the geometric features as dependent variables is adopted. The null hypothesis (H0) is that, for each expression, the mean values of the geometric features are equal across the three age groups. The alternative hypothesis (H1) is that, for each expression, the mean values are not all the same across age groups. The significance level is set at 0.05.
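As a minimal sketch of this test for a single feature of a single expression (the data below are synthetic stand-ins, since the real feature values come from the two databases):

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
# stand-ins for one geometric feature of one expression, split by age group
young  = rng.normal(1.00, 0.05, size=60)   # ages 18-31
middle = rng.normal(1.02, 0.05, size=60)   # ages 32-59
old    = rng.normal(1.05, 0.05, size=60)   # ages 60-93

f_stat, p_value = f_oneway(young, middle, old)
# reject H0 (equal group means) at the 0.05 level when p_value < 0.05
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```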

The statistical analysis results are listed in Table 3. From Table 3, we find that for most expressions, more than half of the features are age-related, since their p-values are less than 0.05. This confirms the age effect on the spatial patterns embedded in expressions. For both databases, the happy and neutral expressions have the largest numbers of features with significant differences. The age effect on the neutral expression may indicate that face structures develop with age, since the neutral expression mainly reflects face structure rather than expression. Compared with the other non-neutral expressions, the happy expression shows more variation across age groups, which may indicate that the manifestation of the happy expression changes with age much more markedly than that of other expressions. The reason may be that the happy expression, a kind of smile, is the most frequently displayed expression in daily life, and such frequent display may amplify the change in its manifestation with age. In addition, the features with significant differences among age groups for the same expression are not exactly the same on the two databases. For example, for the happy and neutral expressions, the p-values of \(f_3\) are lower than 0.05 on the FACES database, but larger than 0.05 on the Lifespan database. This may be caused by database bias.

Table 3 Results of statistic hypothesis test on two databases

For the second study, we compare the performance of within-age-group expression recognition with that of cross-age-group recognition using the person-independent features and an SVM. Ten-fold cross validation is adopted. Experimental results on the FACES and Lifespan databases are listed in Table 4, from which we make the following observations. First, the within-age-group recognition accuracies are much higher than the cross-age-group ones in most cases, which clearly demonstrates the age effect on expression recognition. Second, in most cases, the cross-age-group accuracy decreases as the age difference between groups increases. For example, on the Lifespan database, when training on age group 18–31, the within-age-group accuracy is 93.3 %, while the accuracy drops to 84.57 % and 83 % when testing on age groups 32–59 and 60–93 respectively. Third, the within-age-group accuracy decreases with age on both databases, suggesting that expression recognition is more challenging for the old. This may be caused by wrinkles and by the reduction of facial muscle elasticity that develop with aging. Another possible reason is the different expression manifestations of different age groups: old people tend to express their emotions subtly, while the young are inclined to show expressions exaggeratedly. This difficulty of expression recognition for the old may explain why, on both databases, the within-age-group accuracy for the 60–93 group is lower than some cross-age-group accuracies.
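A sketch of this within/cross age-group protocol is given below, assuming the features and expression labels have already been split by age group; the classifier settings are illustrative defaults, not the paper's exact configuration.

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def within_cross_accuracy(X_by_group, y_by_group):
    """Train an SVM per age group and test on every age group.
    Within-group accuracy uses ten-fold cross validation; cross-group
    accuracy trains on one group and tests on another."""
    acc = {}
    for g_train in X_by_group:
        X_tr, y_tr = X_by_group[g_train], y_by_group[g_train]
        for g_test in X_by_group:
            if g_test == g_train:
                acc[(g_train, g_test)] = cross_val_score(
                    SVC(), X_tr, y_tr, cv=10).mean()
            else:
                clf = SVC().fit(X_tr, y_tr)
                acc[(g_train, g_test)] = clf.score(
                    X_by_group[g_test], y_by_group[g_test])
    return acc
```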

Table 4 Facial expression recognition of within age group and cross age group on two databases

Last, comparing the performance on the two databases, the accuracies on the Lifespan database are higher than those on the FACES database for both within-age-group and cross-age-group settings. Since the FACES database has six expression categories while the Lifespan database has two, it is naturally easier to classify two expressions than six.

To further analyze the age effect on expression recognition, we repeat the above within-age-group and cross-age-group facial expression recognition experiments twenty times, and employ the Wilcoxon test [22] to investigate whether there are significant differences between within-age-group and cross-age-group performance. The Wilcoxon signed-rank test is a nonparametric method for assessing whether the median of the paired samples' differences is zero. The null hypothesis (H0) is that the differences between two age groups come from a distribution whose median is zero. The alternative hypothesis (H1) is that the differences come from a distribution whose median is not zero. The significance level is set at 0.05 in our work. All resulting p-values are 1.9E-6, much lower than the significance level of 0.05, which means the influence of age on facial expression recognition is statistically significant.
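A minimal sketch of this paired test (with synthetic stand-in accuracies for the twenty runs):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
# stand-ins for twenty paired accuracies of within- vs cross-age-group runs
within_acc = rng.normal(0.93, 0.01, size=20)
cross_acc  = rng.normal(0.84, 0.02, size=20)

stat, p_value = wilcoxon(within_acc, cross_acc)
# H0: the paired differences have zero median; reject when p_value < 0.05
print(f"W = {stat:.1f}, p = {p_value:.2e}")
```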

4 Expression recognition enhanced by ages

We propose two methods to recognize expressions by modeling age-related spatial expression patterns. One is a three-node Bayesian Network that classifies expressions from person-independent geometric features with the help of age. During training, we construct a full probabilistic model of the features, age groups, and expression labels. During testing, we infer the posterior probability of the expression labels given the geometric features by marginalizing over age. In this method, the age-related spatial patterns are represented in the geometric features. The other method constructs multiple Bayesian networks to explicitly capture, for each age group, the spatial patterns embedded in expressions from the feature points. During training, the age-related spatial patterns are modeled through structure and parameter learning of multiple Bayesian networks. During testing, only the feature points are provided, and each sample is classified into the expression whose BN yields the largest likelihood. In this method, the spatial expression patterns are represented in the structure and parameters of the learned BNs. The framework of our proposed methods is shown in Fig. 4.

Fig. 4
figure 4

The flow diagram of our methods

4.1 3-node Bayesian network for age-augmented expression recognition

The proposed 3-node Bayesian network for expression recognition enhanced by age is shown in Fig. 5b.

Fig. 5
figure 5

Two kinds of Bayesian Network. a two-node BN; b three-node BN

During training, we construct a full probabilistic model \(P(x, x^{\star}, y)\) using the training set \(\{(x_{i}, {x}_{i}^{\star}, y_{i}), i=1,\ldots,l\}\), where \(x_{i}\) denotes the geometric features, \({x}_{i}^{\star}\) the age information, and \(y_{i}\) the expression label. The label prior probability \(P(y=k), k = 1, 2, \cdots, m\), and the Conditional Probability Distributions (CPDs) \(P(x^{\star}|y=k)\) and \(P(x|y=k, x^{\star})\) are estimated through the Maximum Likelihood Estimation (MLE) [16] method from the training data, where \(m\) is the number of expressions and \(l\) is the number of training samples. During testing, the posterior probability \(P(y=k|x)\) is computed for each class, and the sample is assigned the class with the highest posterior probability, according to (1):

$$\begin{array}{@{}rcl@{}} y^{\star}&=&\underset{k}{argmax} P(y=k | x)\\ &=&\underset{k}{argmax}{\frac{{\sum}_{x^{\star}}P(y=k,x,x^{\star})}{P(x)}}\\ &=&\underset{k}{argmax} {\frac{P(y=k){\sum}_{x^{\star}}P(x^{\star} | y=k)P(x|x^{\star}, y=k)}{P(x)}} \end{array} $$
(1)

where \(P(x^{\star}|y=k)\) is a tabular probability, and the CPD \(P(x|x^{\star}, y=k)\) is represented as a Gaussian distribution, \(P(x|x^{\star}, y=k) \sim \mathcal{N}\left(x|{\mu}_{i}^{(k)}, {\Sigma}_{i}^{(k)}\right), i=1,2,\cdots,n\), for each given value of \(x^{\star}\), supposing \(x^{\star}\) has \(n\) states. In our work, \(n\) is the number of age groups.

It is clear from (1) that \(x^{\star}\) is encoded into \(P(y|x)\). Furthermore, by the definition of a Gaussian mixture, \(P(x|y=k) = {\sum}_{x^{\star}}P(x^{\star}|y=k)P(x|x^{\star}, y=k)\) follows a Gaussian mixture distribution, whereas \(P(x|y=k)\) in the naive two-node BN structure (shown in Fig. 5a) is a single Gaussian distribution. Since a Gaussian mixture can fit the data better than a single Gaussian, the three-node BN structure with discrete \(x^{\star}\) can better model the class-conditional distribution \(P(x|y)\).
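A sketch of the test-time inference of (1), assuming the priors and the per-age-group Gaussian CPDs have already been estimated by MLE; the parameter layout is ours, chosen for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

def predict_expression(x, prior_y, prior_age_given_y, means, covs):
    """Posterior inference in the three-node BN of (1).
    prior_y[k]              : P(y = k)
    prior_age_given_y[k][a] : P(x* = a | y = k), the tabular CPD
    means[k][a], covs[k][a] : Gaussian CPD P(x | x* = a, y = k)
    Age x* is marginalized out, so it is not needed at test time."""
    scores = []
    for k in range(len(prior_y)):
        mixture = sum(prior_age_given_y[k][a] *
                      multivariate_normal.pdf(x, means[k][a], covs[k][a])
                      for a in range(len(prior_age_given_y[k])))
        scores.append(prior_y[k] * mixture)
    return int(np.argmax(scores))  # P(x) is constant over k, so it is dropped
```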

4.2 Expression recognition by modeling age-related spatial patterns using multiple BNs

The proposed multiple BNs for expression recognition by modeling age-related spatial patterns are shown in Fig. 3. As a directed acyclic graph, a BN represents a joint probability distribution over a set of variables. In this figure, each node of a BN represents the coordinates of a feature point, and the links between nodes, together with their conditional probabilities, capture the probabilistic dependencies among the feature points. The BN hence captures the spatial relationships among the facial landmark points. We further assume that these spatial relationships vary with facial expression and age, so different BNs are constructed to capture the spatial facial patterns under each combination of age and expression.

In our work, the age group information is regarded as privileged information, so \(m \times n\) BN models \(G_c, c = 1, \ldots, m \times n\), are established during training, where \(m\) is the number of expressions and \(n\) is the number of age groups. For every BN model \(G_c\), the learning procedure includes structure learning and parameter learning from the training data set \(x_c = (x_{ci})_{i=1}^{l_c}\), where \(x_{ci} = \left({f}_{ci}^{1}, {f}_{ci}^{2}, \ldots, {f}_{ci}^{p}\right)\) and \(p\) is the dimension of the features. Structure learning seeks the network with the highest score, so that the learned network best represents the training data \(x_c\). In our work, the Bayesian Dirichlet equivalence (BDe) score function is adopted [5], as defined in (2). Supposing the prior probabilities of \(G_c, c = 1, \ldots, m \times n\), are uniform, we get \(P(G_c|x) \propto P(x|G_c)\) when testing on the test set \(x\). For the continuous nodes, the local probability distributions are linear Gaussian in the continuous parents. The parameters for each node are defined as \(f_{j} \sim N\left(b_{j} + {{W}_{j}^{T}} Pa(f_{j}), {{\delta}_{j}^{2}}\right), j = 1, \ldots, p\), where \(Pa(f_j)\) denotes the states of node \(f_j\)'s parents, \(W_j\) is the vector of regression coefficients, \(b_j\) is the regression intercept, and \({{\delta}_{j}^{2}}\) is the variance. We use \(\theta_c\) to denote the parameters given \(G_c\).

$$\begin{array}{@{}rcl@{}} Score(G_{c}) &=& \log P(x|G_{c})\\ &=& \underset{\theta_{c}}{\max} \log P(x|G_{c},\theta_{c}) \end{array} $$
(2)

Given the score function, greedy search with random restarts [12] is employed as the search strategy to learn \(G_c\). After the BN structure is constructed, the parameters can be learned from the training data. Parameter learning determines the conditional probability of each node given the structure of the Bayesian network. We use the Maximum Likelihood Estimation (MLE) method to estimate the parameters:

$$ \theta_{c}^{*} = arg \underset{\theta_{c}}{\max} \log P(x|\theta_{c}), $$
(3)

where \(\theta_c\) denotes the parameter set of the \(c\)-th BN model. The algorithms for BN structure and parameter learning with continuous variables are implemented in the DEAL package [4], which we employ directly in our experiments. After training, the learned BNs capture the spatial patterns embedded in each expression for each age group.
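The structure search itself is handled by DEAL, but the parameter step of (3) has a closed form: for a linear Gaussian CPD, the MLE reduces to per-node least squares. The sketch below (ours, not DEAL's interface) illustrates this for a single node.

```python
import numpy as np

def fit_linear_gaussian_node(child, parents):
    """MLE for f_j ~ N(b_j + W_j^T Pa(f_j), delta_j^2).
    child:   (n_samples,) values of node f_j
    parents: (n_samples, n_parents) values of Pa(f_j)
    Ordinary least squares gives b_j and W_j; the residual variance
    is the MLE of delta_j^2."""
    X = np.hstack([np.ones((len(child), 1)), parents])
    coef, *_ = np.linalg.lstsq(X, child, rcond=None)
    b, W = coef[0], coef[1:]
    residuals = child - X @ coef
    return b, W, residuals.var()
```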

During testing, the posterior probability of each testing sample reflects its fit to each BN model, and the sample is assigned the label of the BN that fits it best. Thus we classify a testing sample into the expression with the maximum complexity-penalized log-likelihood according to (4):

$$\begin{array}{@{}rcl@{}} c^{\star} &=& \arg \underset{c \in [1,m\times n]}{\max} \frac {P(E_{T}|G_{c})}{Complexity(G_{c})}\\ &=& \arg\underset{c \in [1,m\times n]}{\max}{\frac {{\prod}_{j=1}^{p} P_{c}(F_{j}|pa(F_{j}))}{Complexity(G_{c})}} \\ &\propto& \arg \underset{c \in [1,m\times n]}{\max} \sum\limits_{j=1}^{p} \log(P_{c}(F_{j}|pa(F_{j}))) - \log(Complexity(G_{c})), \end{array} $$
(4)

where \(E_T\) represents the features of a sample, \(G_c\) stands for the \(c\)-th model with \(c\) ranging from 1 to \(m \times n\), \(P(E_T|G_c)\) denotes the likelihood of the sample given the \(c\)-th model, \(F_j\) is the \(j\)-th node in the BN, \(pa(F_j)\) denotes the parent nodes of \(F_j\), and \(Complexity(G_c)\) represents the complexity of \(G_c\). Because the learned spatial structures differ in complexity, the model likelihood \(P(E_T|G_c)\) is divided by the model complexity to balance the comparison. In our work, the total number of links in a BN is used as its model complexity.
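The decision rule of (4) can be sketched as follows, assuming each learned BN is stored as its per-node parent lists and linear Gaussian parameters; the data layout is ours, chosen for illustration.

```python
import numpy as np
from scipy.stats import norm

def classify_by_bns(sample, models):
    """Pick the BN that best fits a test sample, per (4).
    sample: (p,) numpy feature vector.
    models: list of dicts, one per BN G_c, with keys
      'parents': list mapping node j -> indices of its parent nodes
      'params' : list of (b_j, W_j, var_j) linear Gaussian CPDs
      'n_links': total number of links, used as Complexity(G_c)."""
    best_c, best_score = None, -np.inf
    for c, g in enumerate(models):
        loglik = sum(
            norm.logpdf(sample[j], b + np.dot(W, sample[g['parents'][j]]),
                        np.sqrt(var))
            for j, (b, W, var) in enumerate(g['params']))
        score = loglik - np.log(g['n_links'])   # complexity penalty of (4)
        if score > best_score:
            best_c, best_score = c, score
    return best_c
```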

5 Experiments and analyses

To validate our proposed methods, expression recognition experiments are conducted on the FACES and Lifespan databases with ten-fold subject-independent cross validation. For both methods, two experiments are conducted: one recognizes expressions without considering age information, denoted as the Exp model, and the other recognizes expressions using age information as privileged information, denoted as the Exp_age model. For the first method, the two-node BN is used as the Exp model, and our proposed three-node BN is the Exp_age model. For the second method, the Exp model constructs \(m\) BNs, one from the samples of each expression category, while the Exp_age model constructs \(m \times 3\) BN models, one from the samples of each expression in each age group. Thus, we obtain 6 Exp models and 18 Exp_age models on the FACES database, and 2 Exp models and 6 Exp_age models on the Lifespan database. Figure 3a, b show an example of the learned BN models on the FACES and Lifespan databases respectively.

Experimental results on the FACES and Lifespan databases are shown in Tables 5 and 6 respectively. From Tables 5 and 6, we can observe the following:

Table 5 Experimental results on FACES
Table 6 Experimental results on Lifespan

First, for both methods, the experimental results demonstrate a clear performance improvement with the help of age information, since the accuracy and F1-score of the Exp_age model are higher than those of the Exp model in most cases. Specifically, for the first method, using age information as privileged information increases the average accuracy by 0.3 percent on the FACES database and by 1.0 percent on the Lifespan database, and increases the F1-score by 1.0 percent on both databases. For the second method, the average accuracy is improved by 0.7 percent on the FACES database and 0.5 percent on the Lifespan database, and the F1-score increases by 2.5 percent and 0.6 percent on the FACES and Lifespan databases respectively. This indicates that, by modeling the age-related spatial patterns embedded in expressions, our proposed methods not only improve the recognition accuracy, but also make the recognition results more balanced.

Second, the multiple-BN method outperforms the three-node BN method on both databases, with higher accuracy and F1-score. This may indicate that the age-related spatial patterns represented by the links and parameters of the BNs are more effective at capturing spatial expression patterns than those represented in the geometric features.

Third, comparing the results on the two databases, we find that the performance on the Lifespan database is better than that on the FACES database. This further shows that multi-class recognition is more challenging than binary classification.

Finally, we find that for both methods, the improvement margins of the disgust and sad expressions are the largest. We believe this is because the baseline performance of these two expressions is lower than that of the other expressions, so improvement is easier to achieve.

6 Comparison with related work

We compare our methods with the most closely related work, Guo et al.'s [11]. Since Guo et al. use Gabor features rather than geometric features, we cannot compare our experimental results with theirs directly. We therefore perform a comparison experiment using their recognition method [11] with our features. Guo et al. proposed performing age group classification and facial expression recognition jointly. Specifically, each expression in each age group is considered as one independent class, so the number of classes equals the product of the number of expressions and the number of age groups, and multi-class classification is performed. In our work, we conduct this experiment with two classifiers: an SVM (Support Vector Machine) [17] and a two-node Bayesian network. The experimental results are shown in Tables 5 and 6, denoted as Guo's.
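A sketch of this joint labeling scheme with an SVM is given below; the label encoding and function names are ours, not from [11].

```python
import numpy as np
from sklearn.svm import SVC

def train_joint(X, expr_labels, age_labels, n_ages=3):
    """Guo et al.'s scheme: each (expression, age group) pair becomes one
    class, giving m expressions x n age groups classes in total.
    Assumes expr_labels in 0..m-1 and age_labels in 0..n_ages-1."""
    joint = np.asarray(expr_labels) * n_ages + np.asarray(age_labels)
    return SVC().fit(X, joint)

def predict_expr(clf, X, n_ages=3):
    """Map the joint prediction back to the expression label alone."""
    return clf.predict(X) // n_ages
```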

From the tables, we find that on both databases, the proposed multiple-BN method outperforms Guo's in terms of both accuracy and F1-score, regardless of whether SVM or BN is used. Specifically, the average accuracy and F1-score of our method are 4 percent and 5 percent higher than Guo's when using BN on the Lifespan database, and 5 percent and 17 percent higher respectively on the FACES database. Likewise, Table 6 shows that modeling the spatial pattern of each expression in each age group generally improves both the accuracy and F1-score by 2 percent when applying SVM in Guo's method. What is more, compared with Guo's using SVM, the average F1-score of our method is 2.0 percent and 6.0 percent higher on the FACES database. This further demonstrates that our proposed multiple BN models systematically capture the age-related spatial patterns embedded in expressions, and empirically shows that spatial expression patterns are more discriminative than appearance patterns.

The performance of the proposed three-node BN method is better than Guo's, not only on the FACES database but also on the Lifespan database, when using the Bayesian network classifier. This indicates that our method genuinely outperforms Guo's when the same classifier is applied. When using SVM, however, our method is merely comparable to Guo's: it is superior on the Lifespan database, but not on the FACES database. This may be because SVM, as a discriminative classifier, is stronger than the generative Bayesian network classifier.

The above comparison demonstrates the advantages of our approaches over the state of the art. Our approaches successfully capture the age-related spatial patterns embedded in expressions through the parameters and structure of Bayesian networks, and the age information available during training further enhances the expression classifiers.

As discussed in Section 1, both Guo et al. [11] and Alnajar et al. [1] conducted experiments on the FACES and Lifespan databases. Although the former focused on age-invariant expression recognition and the latter considered expression-invariant age estimation, both adopted appearance features, i.e. Gabor features [11] and LBP features [1] respectively, and provided expression recognition results, as shown in Table 7. From this table, we find that the expression recognition performance of our method using geometric features is comparable with those using texture features. This further demonstrates the importance of spatial patterns for expression recognition.

Table 7 The accuracy of facial expression recognition in [11] and [1]

7 Conclusion and future work

Current studies of facial expression recognition pay little attention to the effect of age on recognition performance. In this paper, we propose to enhance expression recognition by modeling age-related spatial expression patterns. First, we conduct two statistical analyses to investigate the age effect on the spatial patterns of expressions and on facial expression recognition respectively. The analysis results demonstrate that the spatial expression patterns differ significantly among age groups, and that age information has a significant effect on facial expression recognition. Second, we propose two methods to recognize expressions with the help of age. One is a three-node Bayesian Network that classifies expressions from person-independent geometric features; here the age-related spatial patterns are represented in the geometric features. The other constructs multiple Bayesian networks to explicitly capture, for each age group, the spatial patterns embedded in expressions from the feature points; here the spatial expression patterns are represented in the structure and parameters of the learned BNs. For both methods, age information is used as privileged information and is exploited during training to construct a better classifier. Experimental results on the two databases demonstrate the power of the proposed models in capturing age-related spatial patterns embedded in expressions, as well as their advantage over existing approaches to expression recognition.

In addition to age-related spatial patterns, age-related temporal patterns are crucial for expression recognition. This work only exploits age-related spatial patterns embedded in expressions; we will therefore investigate age-related temporal patterns for expression recognition in the future. Furthermore, we will consider combining spatial and appearance expression patterns for expression recognition. Although age recognition and facial expression recognition are typically performed separately and independently, they may help each other: as demonstrated in this paper, age information can help expression recognition, and expression information may likewise help age recognition. Another possible direction is therefore to use expression as privileged information to improve age recognition. Currently, only two benchmark facial expression databases, i.e. the Lifespan and the FACES, contain a large range of age variation. A large-scale facial expression database with multi-ethnic, multi-age, multi-personality, and multi-occupation subjects should be constructed, since the size and diversity of a database are crucial for expression recognition research.