
1 Introduction

Facial landmark detection, also known as face alignment [29, 43], is a fundamental task in various facial image and video analysis applications [31, 32, 41, 44, 45, 46]. Over the past decades, facial landmark detection has made significant progress. Nevertheless, many existing approaches struggle with in-the-wild faces exhibiting extreme appearance variations in pose, expression, illumination, blur and occlusion.

Existing facial landmark detection algorithms can be roughly divided into three categories: global appearance based approaches, constrained local models and regression-based methods. Global appearance based methods detect the key points using the whole facial texture and global shape information [3, 4, 5, 13, 25, 30]. Constrained local models [17] combine a global face shape with independent local texture information around each key point, which is more robust to illumination and occlusion variations. Regression-based methods can be divided into direct regression, cascaded regression and regression with deep neural networks. At present, the most widely used and most accurate methods are all based on deep Convolutional Neural Networks (CNNs) [12, 16]. The facial landmark detection method proposed in this paper is based on CNNs as well.

The key innovations of the proposed method include:

  • For data augmentation, we adopt the online Pose-based Data Balancing (PDB) [8] method to balance the original training dataset. More specifically, we duplicate the under-represented samples identified by PDB and randomly modify them (flip, rotate, blur, etc.), including rendering the copies in different styles, since the intrinsic variance of image styles can also affect the performance of a trained network [6].

  • The baseline of this paper is CPM [34], which generates a heatmap as the final output of the network. In order to apply the Wing loss, which is specially designed for coordinate regression models, we introduce the soft-argmax function [21]. This function converts heatmaps to coordinates so that the whole network remains differentiable.

  • The original Wing loss function [8] focuses on small and medium errors, but pays less attention to samples with large errors. To address this issue, we design a new loss function, namely the mixed loss, that considers samples with errors of various magnitudes.

2 Related Work

2.1 Pose Variation

The aim of data augmentation is to reduce the bias in network training caused by the imbalance of a training dataset. STN [22] applies a spatial transformer network to learn transformation parameters and thus automatically initialise training samples. SAN [6] translates each image into four different styles using a generative adversarial module. Both try to inject diversity into a training dataset and balance the training samples.

2.2 Regression Model

The regression methods used for facial landmark detection can be divided into two categories: coordinate regression and heatmap regression. A coordinate regression network performs well on datasets with sparse landmarks, but not as well as heatmap regression on dense landmarks. However, it has been shown that heatmap-matching regression can produce worse predictions even while the MSE improves [26]. Luvizon et al. [21] propose the soft-argmax function to convert heatmaps to coordinates and make the network differentiable. Nibali et al. [26] use a new regularisation strategy to improve the prediction accuracy of a network.

2.3 Loss Function

For a CNN-based facial landmark detector, a loss function has to be defined to supervise the network training process. Most existing facial landmark detection approaches are based on the L2 loss, which is sensitive to outliers. Feng et al. [8] propose a new loss function, i.e. the Wing loss, to balance the sensitivity to small and large errors when training a deep CNN model. Guo et al. [11] introduce a loss that adjusts the weights of different samples during training according to a tag describing the pose of each sample. Merget et al. [23] propose a loss function that first checks whether each landmark is labelled and lies within the image boundary, and assigns each landmark a specific weight according to this judgement.

3 Methodology

3.1 Data Augmentation

Data imbalance is a common issue in deep learning, which limits the accuracy and robustness of a trained network [11]. From Table 1 and Fig. 1, we can see that most datasets contain a large number of frontal faces, but lack samples with large poses, expressions, illuminations and occlusions [42]. The pose imbalance of a dataset is particularly significant. If we train a network on an imbalanced dataset, the network may not be able to generalise well to practical applications. Besides, distribution differences between the training and test sets can significantly influence the performance of a trained network.

Table 1. Pose distribution of the 300-W dataset [29].
Fig. 1. Pose distribution of the ICME2019 GC facial landmark datasets [19]. The X-axis stands for the pitch angle in the left figure and the yaw angle in the right figure. The Y-axis denotes the number of samples in the training set.

To address the data imbalance problem, various algorithms have been proposed, including both geometric and textural transformations [9]. The main methods used for geometric transformation are flipping, scaling, translation and rotation. For textural transformation, Gaussian noise and brightness transformation are widely used. Nevertheless, if we randomly apply the above methods to the training samples of a dataset, we do not know how many times each training sample should be augmented or copied.

To improve the balance of a dataset, we introduce the Pose-based Data Balancing (PDB) [8] strategy (Algorithm 1) in our work. PDB is a statistical method that analyses the distribution of a face dataset in shape and pose. To adapt PDB to our network, we first use Procrustes Analysis [10] to align all the faces in the training dataset to the mean face. Procrustes Analysis learns an affine transformation from one shape to another with minimum mean squared error. By applying PCA to the training set and analysing the distribution of the first principal component, we can balance the training set by copying each sample a fixed number of times chosen to flatten the distribution.

Algorithm 1. Pose-based Data Balancing (PDB).
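Algorithm 1 appears as a figure in the original paper and is not reproduced here; the NumPy sketch below illustrates the balancing step described in the paragraph above. The number of bins, the use of the first principal component as a pose score, and the rounding rule are our assumptions.

```python
import numpy as np

def pdb_duplication_factors(aligned_shapes, n_bins=9):
    """Minimal sketch of Pose-based Data Balancing (PDB).

    aligned_shapes: (N, 2*L) array of training shapes, already aligned
    to the mean face by Procrustes Analysis. Returns how many times each
    sample should be duplicated to flatten the pose distribution.
    """
    # Project every shape onto the first principal component, which
    # mainly captures the dominant pose variation.
    centred = aligned_shapes - aligned_shapes.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    pose_scores = centred @ vt[0]

    # Histogram the scores and duplicate under-represented bins so that
    # each bin approaches the size of the largest one.
    counts, edges = np.histogram(pose_scores, bins=n_bins)
    bin_idx = np.clip(np.digitize(pose_scores, edges[1:-1]), 0, n_bins - 1)
    factors = np.rint(counts.max() / np.maximum(counts, 1)).astype(int)
    return factors[bin_idx]  # one duplication factor per sample
```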

In order to minimise the impact of dataset imbalance on facial landmark detection accuracy, the PDB process is applied at the beginning of each epoch. Since the modification of each sample is random, this online PDB process substantially enhances the variety of samples across different attributes. In each epoch, a sample is copied the same number of times, but in different epochs the copies are transformed randomly and independently. In this way, the dataset is effectively expanded many times over, and each epoch can be regarded as sampling from a much larger dataset. According to our experiments, offline data augmentation already yields a clear improvement in detector performance, and converting offline augmentation to online augmentation improves the trained facial landmark detector further. However, it is worth noting that offline data augmentation does not require many CPU resources, whereas online PDB multiplies the original training time several-fold unless the data augmentation is multi-threaded.

Fig. 2. Backbone network of the proposed facial landmark detector. All inputs are resized to 256 \(\times \) 256. Concat means splicing the feature maps by channel and adjusting the number of channels with a 1 \(\times \) 1 convolution; the 69 output channels correspond to 68 landmarks plus 1 mask denoting visibility.

3.2 Network and Mixed Loss

The backbone network of our facial landmark detector is shown in Figs. 2 and 3. The network is based on VGG16 [33] + CPM [7, 34]: the first four convolutional blocks of VGG16 extract coarse feature maps, followed by three stages of the CPM structure. The detailed architecture of Conv in Fig. 2 is shown in Fig. 3. We use convolutional pose machines (CPM) as the main architecture; CPMs combine and concatenate the outputs of each stage in the network in order to preserve the geometric constraints and semantic information in the feature maps. The ground truth is transformed into heatmap form by applying a Gaussian kernel centred on each landmark point. After down-sampling the annotated image to the same size and number of channels as the CPM output, the error between the predicted and ground-truth values is back-propagated in each stage of the CPM, since each stage is intermediately supervised by the L1/L2 loss function.
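As an illustration of the heatmap-style ground truth described above, here is a minimal PyTorch sketch that places one Gaussian per landmark; the kernel width sigma is an assumption, not a value taken from the paper.

```python
import torch

def landmarks_to_heatmaps(landmarks, height, width, sigma=1.5):
    """Render ground-truth heatmaps with a Gaussian centred on each landmark.

    landmarks: (L, 2) tensor of (x, y) coordinates at heatmap resolution.
    Returns an (L, height, width) tensor, one channel per landmark.
    """
    ys = torch.arange(height, dtype=torch.float32).view(-1, 1)
    xs = torch.arange(width, dtype=torch.float32).view(1, -1)
    maps = []
    for x, y in landmarks:
        d2 = (xs - x) ** 2 + (ys - y) ** 2          # squared distance grid
        maps.append(torch.exp(-d2 / (2.0 * sigma ** 2)))
    return torch.stack(maps)
```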

Fig. 3. Detailed kernel sizes and channels of the convolution layers in Conv in Fig. 2. The first row is Conv (stage 1) and the second row is Conv (stage > 1). The L in the last layers denotes the number of facial landmarks plus the mask.

Models based on heatmap regression achieve higher accuracy. However, heatmap regression methods are almost always supervised by the L2 loss, which makes it difficult to improve the form of the loss function. In coordinate regression, by contrast, it is practical to optimise the loss function, because the errors between points are computed directly. We therefore try to refine the detector, helping the model learn better parameters by combining coordinate and heatmap regression.

In order to obtain refined landmark detection with a cascaded model, the multi-stage CPM network is trained in an L2 heatmap regression style. To calculate the loss of the whole network, the multi-stage L2 heatmap loss and the improved Wing loss are combined:

$$\begin{aligned} l_{mix} = \alpha _1 l_{point} + \alpha _2 l_{stage}, \end{aligned}$$
(1)
$$\begin{aligned} l_{stage} = \sum \limits _{i = 1}^3 \beta _i l_{stage}^{(i)}. \end{aligned}$$
(2)

The form of the mixed loss function is shown in (1) and (2): the network is updated by both point information and heatmap information. The ratio between these two losses is controlled by \(\alpha _1\) and \(\alpha _2\). In the heatmap loss, the output of each stage also contributes to the total loss, weighted by the hyper-parameters \(\beta _1\), \(\beta _2\) and \(\beta _3\).

As aforementioned, it is well known that the proportion of difficult samples in a training dataset is relatively small, causing a data imbalance issue, and that simple samples usually dominate the network training. In this case, the widely used L2 loss is not necessarily the best loss function: it amplifies the effect of samples with large errors and neglects small errors. In contrast, the Wing loss function focuses on small and medium errors, but pays less attention to samples with large errors. In order to design a new loss function that considers samples with errors of various magnitudes, we adopt the Wing loss [8] of Eq. (3) as the point term:

$$\begin{aligned} l_{point} = wing(x) = {\left\{ \begin{array}{ll} w\ln \left( 1 + \frac{|x|}{\varepsilon } \right) &{} \text {if } |x| < w\\ |x| - C &{} \text {otherwise} \end{array}\right. }, \end{aligned}$$
(3)
where \(C = w - w\ln (1 + w/\varepsilon )\) ensures that the two pieces of the function join continuously.
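A minimal PyTorch sketch of the mixed loss follows. The coefficients are those reported in Sect. 4.2, while the Wing loss parameters w = 10 and \(\varepsilon \) = 2 are the defaults from [8] and should be treated as assumptions for this model.

```python
import math
import torch

def wing_loss(pred, target, w=10.0, eps=2.0):
    """Wing loss on coordinate errors, Eq. (3). The defaults w = 10 and
    eps = 2 follow [8] and are assumptions here."""
    x = (pred - target).abs()
    c = w - w * math.log(1.0 + w / eps)   # makes the two pieces continuous
    return torch.where(x < w, w * torch.log(1.0 + x / eps), x - c).mean()

def mixed_loss(pred_points, gt_points, stage_heatmaps, gt_heatmaps,
               alpha1=0.7, alpha2=0.3, betas=(0.5, 0.5, 1.0)):
    """Mixed loss of Eqs. (1)-(2): Wing loss on coordinates plus a
    beta-weighted sum of per-stage L2 heatmap losses. The coefficients
    are those reported in Sect. 4.2."""
    l_point = wing_loss(pred_points, gt_points)
    l_stage = sum(beta * torch.nn.functional.mse_loss(hm, gt_heatmaps)
                  for beta, hm in zip(betas, stage_heatmaps))
    return alpha1 * l_point + alpha2 * l_stage
```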

3.3 Heatmap to Point Regression

It is straightforward to convert a heatmap to key point coordinates by finding the peak location in the heatmap with the argmax function. However, the process is not trivial in training, because gradients cannot be back-propagated through argmax. To address this issue, this paper adopts soft-argmax, which guarantees differentiability during training while still searching for the maximum value. The idea is to represent the argmax function in closed form as an expectation over the heatmap.

Assume that the heatmap can be represented as \(I(x, y)\) with size \(W \times H \times C\), where W and H are the width and height of the heatmap and C denotes the number of channels. The maximum point of each channel can be calculated by [26]:

$$\begin{aligned} softargmax(I) = \left( \sum \limits _{i,j} W_x\left( i,j \right) I\left( i,j \right) , \sum \limits _{i,j} W_y\left( i,j \right) I\left( i,j \right) \right) , \end{aligned}$$
(4)
$$\begin{aligned} W_x\left( i,j \right) = \frac{i}{W}, \end{aligned}$$
(5)
$$\begin{aligned} W_y\left( i,j \right) = \frac{j}{H}. \end{aligned}$$
(6)

In fact, considering that in our model the heatmap values are on the order of \({10^{ - 5}}\), we use the adapted Algorithm 2 to avoid truncation errors and insufficient precision.

Algorithm 2. Adapted soft-argmax.
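Since Algorithm 2 is likewise shown only as a figure in the original paper, the following PyTorch sketch gives one common way to stabilise the computation: each channel is renormalised to sum to 1 before taking the expectation of Eqs. (4)-(6). This renormalisation is our reading of the adaptation, not a verbatim reproduction of Algorithm 2.

```python
import torch

def soft_argmax(heatmaps, eps=1e-12):
    """Differentiable peak localisation following Eqs. (4)-(6).

    heatmaps: (B, C, H, W) tensor of non-negative responses. Because the
    raw values are tiny (~1e-5), each channel is renormalised to sum to 1
    before taking the expectation.
    """
    b, c, h, w = heatmaps.shape
    flat = heatmaps.view(b, c, -1)
    probs = (flat / (flat.sum(dim=-1, keepdim=True) + eps)).view(b, c, h, w)
    xs = torch.arange(w, dtype=probs.dtype).view(1, 1, 1, -1) / w  # W_x(i,j) = i/W
    ys = torch.arange(h, dtype=probs.dtype).view(1, 1, -1, 1) / h  # W_y(i,j) = j/H
    x = (probs * xs).sum(dim=(2, 3))
    y = (probs * ys).sum(dim=(2, 3))
    return torch.stack((x, y), dim=-1)  # (B, C, 2) normalised coordinates
```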

4 Experimental Results

4.1 Datasets

In this paper, we conduct experiments on two datasets: the 300-W [29] and AFLW facial landmark datasets [15].

300-W is an open facial landmark dataset, composed of the LFPW [1], AFW [40], HELEN [18], XM2VTS [24] and IBUG [20] datasets. The whole 300-W dataset contains 3148 training images and 687 test images. Each image in 300-W is labelled with 68 facial landmarks (Fig. 4).

AFLW is another classic dataset for face alignment. AFLW consists of more than 25000 images with 21 landmarks each. In our experiments, we follow the AFLW-Full protocol [15], which contains 24386 images in total: 20000 images for training and the rest for testing. The images are annotated with 19 landmarks, since the landmarks of the two ears are ignored in this protocol.

Fig. 4. Partial visualization of the results of our model on 300-W.

Table 2. Results on 300-W and AFLW datasets. For 300-W, we use inter-pupil distance to compute NME. For AFLW, we use the face size for NME.

4.2 Experimental Settings

We conduct all the experiments on an Intel E5-2650 v4 CPU with two Tesla V100 GPUs. The proposed method was implemented with PyTorch 1.1 [27, 28] and Python 3.7. All input images are resized to \(256\times 256\times 3\) and the output is \(N\times 2\) landmark coordinates. The heatmaps are Gaussian. Our model is updated by Stochastic Gradient Descent (SGD) with a momentum of 0.9 and a weight decay of 0.0005. For the 300-W dataset the learning rate is 0.00005, while for AFLW we set the learning rate to 0.00001. From epoch 30 to 40, the learning rate decays by a factor of 0.2; after 40 epochs, it decays by a further factor of 0.1. We train the model for more than 60 epochs with a batch size of 64. In the mixed loss function, we combine different coefficients via grid search and obtain the best result with \(\alpha _1 = 0.7\) and \(\alpha _2 = 0.3\), while \(\beta _i\) are set to \( \left\{ {0.5,0.5,1} \right\} \). Training takes about half a day on 300-W without PDB and about one day with offline PDB; with online PDB, it takes 8 days on the same CPU and GPU. For the AFLW dataset we do not apply online PDB due to time limitations.
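For concreteness, a sketch of this optimiser configuration in PyTorch is given below. The decay rule admits more than one reading; this version scales the learning rate by 0.2 from epoch 30 and by a further 0.1 from epoch 40, and the `model` variable is a placeholder for the landmark network defined elsewhere.

```python
import torch

model = torch.nn.Linear(1, 1)  # placeholder for the landmark network

# SGD with the momentum and weight decay listed above (300-W learning rate).
optimizer = torch.optim.SGD(model.parameters(), lr=5e-5,
                            momentum=0.9, weight_decay=5e-4)

# One reading of the schedule: multiply the base learning rate by 0.2
# from epoch 30, and by a further 0.1 from epoch 40.
def lr_lambda(epoch):
    if epoch < 30:
        return 1.0
    if epoch < 40:
        return 0.2
    return 0.2 * 0.1

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```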

4.3 Results

We use the backbone network with L2 point regression as the baseline method. We then examine the effect of each proposed component on 300-W, using the NME as the evaluation metric, which is defined as:

$$\begin{aligned} NME = \frac{1}{N}\sum \limits _{k = 1}^N \frac{\left\| x_k - y_k \right\| _2}{d}, \end{aligned}$$
(7)

where \(x_k\) denotes the ground-truth landmarks of a given face, \(y_k\) denotes the corresponding prediction, and d is the normalisation term, computed as the face size, the inter-ocular distance or the inter-pupil distance.
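A small NumPy helper matching one common reading of Eq. (7), in which the error is averaged over the landmarks of each face before normalisation and then averaged over faces; the array shapes are assumptions.

```python
import numpy as np

def nme(preds, gts, d):
    """Normalised Mean Error, one reading of Eq. (7).

    preds, gts: (N, L, 2) arrays of predicted / ground-truth landmarks
    for N faces with L landmarks each.
    d: (N,) array of per-face normalisation terms (face size,
    inter-ocular or inter-pupil distance, depending on the protocol).
    """
    per_point = np.linalg.norm(preds - gts, axis=-1)   # (N, L)
    per_face = per_point.mean(axis=-1) / d             # (N,)
    return per_face.mean()
```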

4.3.1 Results on 300-W

We apply the different innovations in our experiments on 300-W. The performance of different state-of-the-art methods as well as the proposed method in terms of NME is reported in Table 2. We can see that, in spite of the accuracy loss introduced by point regression, our method achieves competitive results. With a test batch size of 16, the proposed method achieves 85 FPS on a Tesla V100 GPU.

4.3.2 Results on AFLW

As shown in Table 2, we conduct similar experiments on the AFLW dataset. The proposed method also achieves more than 80 FPS in the same environment. We also summarise the important parameters, e.g. model size and FLOPs. The model is identical to the one used for 300-W except for the last output layer. It has 15.94 M parameters, a model size of 127 MB, and requires 2.57 GFLOPs.

5 Conclusion

In this paper, we presented a robust facial landmark detector that combines coordinate and heatmap information, thereby improving the accuracy of a trained CNN. Besides, we used soft-argmax instead of argmax, as well as online PDB for training data augmentation, whose main purpose is to mitigate the dataset imbalance problem. In addition, we designed a mixed loss function that provides richer information for network training. The experimental results on 300-W and AFLW demonstrate the effectiveness of the proposed method compared with state-of-the-art approaches.