
1 Introduction

Beauty is an abstract concept that has attracted increasing attention in recent years. It is generally easy for people to pick the good portraits of a subject, but hard to be sure the choice is right. Which portrait is more attractive? Will other people agree with me? Many factors, such as facial expression, makeup, lighting, and resolution, contribute to why a face looks beautiful, and individual tastes differ, which means our perception of ourselves is often quite different from that of others [1]. Our method offers users feedback on how their range of portraits is perceived by others, and our model can be used to select the most attractive pictures of a person or to remove the worst ones from a photo collection.

Many researchers have used machine learning on facial features to predict facial beauty [2, 15]. Bottino and Laurentini [3] surveyed studies of facial beauty in the pattern analysis literature; this line of work suggests that one person's appearance spans multiple attractiveness levels. Zhu [4] predicts the attractiveness of a given person based on expression alone. However, lighting, makeup, and resolution are also important when judging whether a portrait is flattering. In this paper, we therefore consider a more varied dataset in order to obtain a more broadly applicable model.

Ranking and relative ordering have been thoroughly investigated in the machine learning literature, especially in applications pertaining to information and document retrieval [5, 6]; similar techniques have also been applied to image search and retrieval [7, 8]. Our work differs significantly from these in that we learn a facial ranking function [9, 14] over image pairs of an individual. Most comparative work on facial beauty [4] allows only yes/no answers. However, it is very hard to judge which picture is more flattering when the compared images are at the same attractiveness level. Instead, we collect an additional relative attribute, which we call "similar", for training; in our comparison data, more than a third of the pairs are labeled similar. Our experimental results show that this annotation scheme provides more information for training and yields a more accurate facial ranking model.

This paper is organized as follows. Section 2 introduces our dataset, the pairwise comparisons, and our learning-to-rank training approach. Section 3 describes our evaluation method and compares our experimental results with other recent methods. Section 4 discusses limitations, concludes, and outlines future work.

2 Rank Beauty

In this section we describe how we collected the portrait dataset, pre-processed it, and gathered pairwise comparisons (Sect. 2.1). We then present the main approach used to build the facial beauty models (Sect. 2.2).

2.1 Data and Crowdsourcing

Our first goal is to collect a set of portraits of each subject covering a variety of expressions, makeup, lighting, and resolutions, and then to rate them along attractiveness attributes. In this section, we first show how we collect the portraits. We then pre-process the images to normalize face positions and extract the features for our model. Finally, we collect pairwise comparisons of portraits along the attractiveness attribute.

Collecting Data and Pre-processing. We start by collecting a large number of candidate portrait photos for each subject from the Bing Search API. All pictures are of celebrities, which ensures that enough portraits are available per subject. In total, we collected data for 108 subjects, with 500 to 600 images per subject, including both male and female subjects ranging in age from 20 to 65. We perform several pre-processing steps (Fig. 1) on each image collection to align the facial data, compute facial features, and reduce data redundancy.
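As an illustration, a minimal sketch of this collection step is given below. It assumes the Bing Image Search v7 REST endpoint and the Python `requests` library; the endpoint, parameters, and result fields are our assumptions, since the paper does not specify API details.

```python
# Hedged sketch: fetch candidate portrait URLs via Bing Image Search
# (v7 REST endpoint assumed; not specified in the paper).
import requests

BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/images/search"  # assumed

def fetch_portrait_urls(subject_name, api_key, count=150):
    """Return image URLs that may contain portraits of `subject_name`."""
    resp = requests.get(
        BING_ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": api_key},
        params={"q": subject_name + " portrait", "count": count},
        timeout=30,
    )
    resp.raise_for_status()
    return [item["contentUrl"] for item in resp.json().get("value", [])]
```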

Fig. 1. Key steps in pre-processing.

Fig. 2. Accuracy for similar images by RankSVM models.

Fig. 3. Accuracy for dissimilar images by RankSVM models.

Fig. 4. Accuracy for similar images by RankSVM and SVR [4].

Fig. 5. Accuracy for dissimilar images by RankSVM and SVR [4].

We first perform face detection and cropping to normalize each face in a common reference frame. We crop faces using bounding boxes generated by the Viola-Jones face detector. We then use a face tracker [10] that accurately estimates 9 facial feature points and localizes facial parts such as the eyes, mouth, nose, and facial contour (Fig. 1). We apply a median filter with a window size of 5 frames to smooth the estimated points and suppress temporal tracking jitter. We also filter out small faces, poorly aligned faces, and non-frontal faces. Next, we warp each face into a frontal view using the 3D template model [11] and the 3D-to-2D transformation matrix [4]. We exclude portraits in which little of the face is visible or for which the tracker reports failure. This leaves 108 subjects with 100 to 200 portraits per collection; Fig. 1 shows several examples of the remaining images. Hani Altwaijry and Serge Belongie [2] found that HOG is the most effective feature type for predicting beauty preferences, so we use a simple and straightforward 3720-dimensional HOG (Histogram of Oriented Gradients) [12] descriptor, computed over five parts of each image, to capture its visual properties.
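The sketch below illustrates this feature pipeline, assuming OpenCV's Haar-cascade (Viola-Jones) detector and scikit-image's HOG implementation. The region layout, patch size, and HOG parameters are our assumptions; the paper specifies only the final 3720-dimensional descriptor over five facial parts.

```python
# Minimal sketch of the detection + HOG feature step (parameters assumed).
import cv2
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def crop_face(image_bgr):
    """Viola-Jones detection; returns the largest detected face in grayscale."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(boxes) == 0:
        return None                      # excluded: no visible face
    x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])
    return gray[y:y + h, x:x + w]

def portrait_descriptor(face, regions):
    """Concatenate HOG over five facial regions (hypothetical layout);
    the paper states only that the result is 3720-dimensional."""
    parts = []
    for top, bottom, left, right in regions:
        patch = resize(face[top:bottom, left:right], (48, 48))
        parts.append(hog(patch, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2), block_norm="L2-Hys"))
    return np.concatenate(parts)         # feature vector x_i
```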

Pairwise Comparisons. We next collect the human response data that allows us to rank a subject's portraits by attractiveness. We developed an annotation system that collects a human response for each pairwise comparison (e.g., "Is image A more beautiful than image B?"). We suggest that volunteers consider facial expression, posture, makeup, lighting, and resolution when making their choice. We sample portraits at random to form pairwise comparisons and provide three labels: "yes", "no", and "similar". Choosing the relative attribute "similar" does not only mean that the two photos in a comparison look alike; importantly, our system also lets annotators choose "similar" when they hesitate between yes and no. Twenty volunteers supported our work; each pair was annotated once, and a single worker completed about 3 or 4 groups. In total, we received comparison collections for 78 subjects, 12434 pairs overall.
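To make the use of these labels concrete, here is a minimal sketch (our own illustration) of how the three answers map to the training sets O and S defined in Sect. 2.2:

```python
def split_pairs(annotations):
    """annotations: list of (i, j, label), label in {'yes', 'no', 'similar'},
    answering "Is image i more beautiful than image j?".
    Returns the ordered set O and the similar set S of Sect. 2.2."""
    O, S = [], []
    for i, j, label in annotations:
        if label == "yes":
            O.append((i, j))       # A_ij = 1: I_i preferred over I_j
        elif label == "no":
            O.append((j, i))       # A_ij = 0: I_j preferred over I_i
        else:
            S.append((i, j))       # A_ij = 0.5: similar attractiveness
    return O, S
```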

2.2 Learning to Rank

We use ranking functions trained with comparative labels. This method, originally introduced by Parikh and Grauman [9], compares images in terms of how strongly they exhibit a nameable visual property. We use a large-margin approach to model our facial relative attributes, and we make use of the "similar" relative attribute collected above. To learn our ranking function g, we use the set of portraits \( I = \{I_1, I_2, \ldots, I_n\} \) in the dataset, each described by image features \( x_i \in R^d \). The annotated portrait list for a subject is given as a tuple \( A = \{A_{12}, A_{23}, \ldots, A_{n-1,n}\} \) with \( A_{ij} \in \{0, 0.5, 1\} \), where \( A_{ij} = 1 \) denotes that \( I_i \) is more attractive than \( I_j \), \( A_{ij} = 0 \) means the inverse, and \( A_{ij} = 0.5 \) denotes that \( I_i \) and \( I_j \) are equally attractive. We then sort the comparisons of all subjects into two sets according to A. The first set \( O = \{(i, j)\} \) consists of ordered pairs for which the first image \( I_i \) has the attribute more than the second image \( I_j \), i.e., \( A_{ij} = 1 \) or \( A_{ji} = 0 \). The second set \( S = \{(i, j)\} \) consists of unordered pairs for which both images have the attribute to a similar extent, i.e., \( A_{ij} = 0.5 \). Our goal is to learn the function:

$$ g\left( {x_{i} } \right) = w^{T} x_{i} $$
(1)

subject to the constraints:

$$ \forall \left( {i,j} \right) \in O \to g\left( {x_{i} } \right) > g\left( {x_{j} } \right) $$
(2)
$$ \forall \left( {i,j} \right) \in S \to g\left( {x_{i} } \right) = g\left( {x_{j} } \right) $$
(3)

While this is an NP-hard problem, it can be approximated, as described in [9], by the following optimization problem:

$$ {\text{minimize}}\,\left( {\frac{1}{2}\left\| w \right\|_{2}^{2} + C\left( {\sum \xi_{ij}^{2} + \sum \gamma_{ij}^{2} } \right)} \right) $$
(4)
$$ s.t.\,\,w^{T} \left( {x_{i} - x_{j} } \right) \ge 1 - \xi_{ij} ;\forall \left( {i,j} \right) \in O $$
(5)
$$ \left| {w^{T} \left( {x_{i} - x_{j} } \right)} \right| \le \gamma_{ij} ;\forall \left( {i,j} \right) \in S $$
(6)
$$ \xi_{ij} \ge 0;\gamma_{ij} \ge 0. $$
(7)

where \( \xi_{ij} \) and \( \gamma_{ij} \) are slack variables, and the constant C balances the regularizer against the constraints, controlling how strictly the relative order must be satisfied. The original RankSVM is defined without Eq. (6); we instead impose both restrictions, Eqs. (5) and (6), to match our sorting of the annotations into the tuple A. RankSVM extended with Eq. (6) can learn additional useful information from the similar pairs, which enables a more accurate image ranking.
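A minimal sketch of this training step follows. It is our own unconstrained reformulation, not the paper's solver: because the slacks in Eq. (4) are squared, at the optimum \( \xi_{ij} = \max(0, 1 - w^{T}(x_i - x_j)) \) and \( \gamma_{ij} = |w^{T}(x_i - x_j)| \), so the problem reduces to a smooth objective that we minimize with L-BFGS.

```python
# Hedged sketch of the extended RankSVM of Eqs. (4)-(7), solved as an
# unconstrained problem (our reformulation; solver choice is an assumption).
import numpy as np
from scipy.optimize import minimize

def train_extended_ranksvm(X, ordered, similar, C=0.1):
    """X: (n, d) feature matrix. ordered: pairs (i, j) with I_i preferred
    over I_j (set O). similar: equally attractive pairs (set S)."""
    Do = np.array([X[i] - X[j] for i, j in ordered])   # differences over O
    Ds = np.array([X[i] - X[j] for i, j in similar])   # differences over S

    def objective(w):
        mo, ms = Do @ w, Ds @ w
        hinge = np.maximum(0.0, 1.0 - mo)              # optimal xi_ij
        loss = 0.5 * w @ w + C * (hinge @ hinge + ms @ ms)
        grad = w - 2 * C * (Do.T @ hinge) + 2 * C * (Ds.T @ ms)
        return loss, grad

    res = minimize(objective, np.zeros(X.shape[1]), jac=True,
                   method="L-BFGS-B")
    return res.x                                       # g(x) = w.x
```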

3 Results

In this section we present the approach we use to measure the accuracy of the ranking model (Sect. 3.1) and then report our experimental results (Sect. 3.2).

3.1 Measuring Accuracy

To test our method we collected comparison collections for an additional 30 subjects. To measure accuracy we use a standard tool for comparing ranked orders of pairs: the Kendall Tau [13].

In our implementation, we first focus on the accuracy of learning "more attractive than" relationships. For each subject, we obtain our model's predicted ranking over pairs, defined as \( E = \{(i, j)\} \), the set of ordered pairs for which \( I_i \) is predicted to be more beautiful than \( I_j \). We measure the pairwise agreement between O and E as \( \tau(E, O) = \sum_{\forall (i,j) \in E} I((i,j) \in O) \), where \( I(\cdot) \) is an indicator function.

Based on the Kendall Tau, we construct our accuracy measure as the number of correct pairs divided by the total number of pairs. If \( N_O \) is the total number of pairs in set O for a subject, then our accuracy measure for E matching O is \( \alpha(E, O) = \tau(E, O)/N_O \).
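As a concrete illustration (our own sketch, with `w` and `X` as in Sect. 2.2), this measure amounts to:

```python
# Sketch of alpha(E, O) = tau(E, O) / N_O.
import numpy as np

def pairwise_accuracy(w, X, ordered_pairs):
    """ordered_pairs is the annotated set O; a pair (i, j) counts as correct
    when the learned scores agree, i.e. g(x_i) = w.x_i exceeds g(x_j)."""
    scores = X @ w
    correct = sum(1 for i, j in ordered_pairs if scores[i] > scores[j])
    return correct / len(ordered_pairs)
```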

We also want to make full use of the set S to assess the ability of our model to rank similar portraits during testing. We define a set \( D = \{D_{ij}\} \) with \( (i, j) \in S \), where \( D_{ij} = |g(x_i) - g(x_j)|/d \) and d is a normalization parameter; \( D_{ij} \) is thus the normalized margin between the mapped images \( I_i \) and \( I_j \). In our implementation, we compare our model with the model presented in [4]; let \( D_s \) denote ours and \( D_{us} \) the other. We count the pairs on which our normalized margin is smaller (i.e., the similar images are mapped closer together): \( \tau(D_s, D_{us}) = \sum I\{D_s(i,j) < D_{us}(i,j)\} \). If \( N_S \) is the total number of pairs in set S for a subject, then our accuracy measure on similar pairs, for \( D_s \) relative to \( D_{us} \), is \( \alpha(D_s, D_{us}) = \tau(D_s, D_{us})/N_S \); correspondingly, the other model's accuracy on similar pairs is \( 1 - \alpha(D_s, D_{us}) \).
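A minimal sketch of this similar-pair measure is below (our own illustration; the normalization constants are assumptions, as the paper does not define d precisely):

```python
# Sketch of alpha(D_s, D_us): the share of similar pairs on which our
# model's normalized score gap is the smaller one.
import numpy as np

def similar_pair_accuracy(w_ours, w_other, X, similar_pairs,
                          d_ours=1.0, d_other=1.0):
    s_ours, s_other = X @ w_ours, X @ w_other
    wins = sum(1 for i, j in similar_pairs
               if abs(s_ours[i] - s_ours[j]) / d_ours
                  < abs(s_other[i] - s_other[j]) / d_other)
    return wins / len(similar_pairs)
```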

Below, we show that our model performs well on both dissimilar and similar image pairs.

3.2 Facial Beauty Ranking

We designed our experiments using the method introduced in Sect. 2.2. We trained our model on the 12434 training examples from 78 subjects, using the relative attributes supplied by our volunteers. To test the accuracy of the ranking system, we additionally collected pairwise comparisons for 30 test subjects. Figure 6 shows ranking results for four of the test subjects: for each subject's personal image collection, we show two rows, the five most attractive portraits and the five least attractive.
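Producing this kind of output reduces to scoring and sorting; a minimal sketch (our own illustration):

```python
# Sketch of the Fig. 6 style selection: score a subject's portraits with
# g(x) = w.x and keep the five best and five worst.
import numpy as np

def top_and_bottom(w, X, k=5):
    order = np.argsort(X @ w)[::-1]      # indices, most attractive first
    return order[:k], order[-k:]
```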

Fig. 6. Ranking results by Extended-RankSVM.

To observe how the similar attribute behaves, we also trained the original RankSVM model, which omits the restriction imposed by Eq. (6); note that the similar attribute contributes nothing to that model. We ran the experiment on the pairwise comparisons of each subject; the resulting ranking accuracies are plotted in Figs. 2 and 3. On dissimilar portraits the two models reach essentially the same high ranking accuracy (Fig. 3), but the original RankSVM model performs poorly on similar images (Fig. 2). We attribute this to the similar attribute: the results show that it contributes to a more accurate ranking model.

We then compared our facial beauty ranking system with the existing SVR model proposed in [4]; Figs. 4 and 5 show how the two models perform. On dissimilar pairs (Fig. 5), our ranking system achieves an average accuracy of 73 %, versus 59 % for the SVR model. Our system also performs better when ranking similar portraits, as seen in Fig. 4. Considering Figs. 2, 3, 4 and 5 together, our facial beauty ranking system is quite accurate.

Finally, we observe that portraits with good makeup, nice lighting, a flattering expression, and reasonable resolution receive higher rankings; in particular, our model favors personal photos with a professional photographic quality. The output of our facial ranking model therefore seems reasonable.

4 Conclusion

Our method has some limitations. Our model is a cross-subject ranking model that predicts attractiveness automatically, but different people suit different poses, expressions, retouching styles, and so on. Moreover, although annotations from 20 workers may approximate public opinion, individual preferences still differ in many details.

In this paper, we described a ranking system for facial beauty that ranks portraits of a subject according to public preference. More importantly, we made full use of relative attributes, especially the similar attribute, and achieved an average accuracy of 73 %. Our personal facial beauty ranking system thus considers not only expression but also lighting, clarity, makeup, and so on, and the similar relative attribute proves to be important. In future work, we plan to improve our dataset and its collection methodology; adapting the model to individual subjects is also a promising avenue.