
1 Introduction

Face recognition has been an active research topic for decades and has attracted considerable attention. In recent years, with the development of deep learning techniques and the emergence of large-scale face datasets, deep network-based methods have significantly advanced face recognition [1,2,3]. Applications of face recognition are emerging in video surveillance, social security, company attendance, and identity authentication.

Although deep models have proved effective in improving face recognition performance, most existing face recognition systems are trained with large-scale datasets. In some applications, each person has only 1–2 samples, and the training data may be inadequate to cover variations in illumination, pose, and image quality. In particular, the performance of existing deep networks drops dramatically under extreme lighting conditions. Therefore, learning robust deep representations from insufficient training samples is a worthwhile topic for face recognition.

A natural idea is to augment the training data by generating additional training samples. Some recent works have focused on synthesizing novel face images with varying poses, attributes, and identities using GANs (generative adversarial networks) [4,5,6]. Learning deep networks with the synthesized face images improves performance on specific problems. However, the generalizability of GAN-based approaches to other datasets remains an open question. Besides, controlling the facial details and identity of the generated image with a GAN is extremely difficult. Furthermore, training GAN models is time-consuming, and labeling face image attributes is costly.

In this paper, we focus on improving the performance of face recognition systems through data augmentation. We propose illumination augmentation to increase the diversity of the training data. First, we generate different reference illumination templates from other datasets. For each training sample, our approach simulates different illumination conditions using the pre-defined illumination templates. Finally, we utilize the singular value decomposition (SVD) algorithm to transform the output image into a color subspace consistent with the input image. Furthermore, we construct a new dataset by collecting images stored in second-generation ID cards and images captured in a realistic surveillance environment. We also build a testing set comprising images captured in a railway station. Experiments demonstrate that the proposed illumination augmentation approach effectively improves the performance of deep network-based face recognition models.

2 Related Works

Deep networks have achieved remarkable success in face recognition and dramatically improved the performance of the state-of-the-art methods [1,2,3]. Taigman et al. [1] proposed a pioneering CNN model named DeepFace which outperformed traditional face recognition methods and closely approached human-level performance. Sun et al. [2] proposed the DeepID network, which employed identification and verification supervisory signals to improve recognition performance. Schroff et al. [3] proposed a network named FaceNet which adopted the triplet loss to enforce a margin between the distances of intra-class samples and those of inter-class samples.

3 Proposed Method

3.1 Overall Framework

Figure 1 illustrates the overall framework of the proposed illumination augmentation approach. We first perform Gaussian filtering on reference images from an external benchmark to extract the reference illumination masks. Then, we extract the facial details of the input image by subtracting its illumination mask. After that, we combine the facial details of the input image with a reference illumination mask to generate face images under different illumination conditions. Finally, we perform color correction to ensure that the color components of the augmented image are consistent with those of the input image.

Fig. 1. Proposed framework. We first simulate different illumination situations using the reference images from an external dataset and then perform color correction to obtain the illumination augmentation output.

3.2 Illumination Variation Simulation

We perform Gaussian filtering with a large blur kernel on the reference image and the input image to extract the corresponding illumination masks. Denoting the input image and the reference image as \(\varvec{X}_i\) and \(\varvec{X}_r\), we compute the illumination masks \(\varvec{X}^m_i\) and \(\varvec{X}^m_r\) as follows:

$$\begin{aligned} \begin{aligned} \varvec{X}_i^m&=\varvec{X}_i*\mathcal {G},\\ \varvec{X}_r^m&=\varvec{X}_r*\mathcal {G},\\ \end{aligned} \end{aligned}$$
(1)

where \(*\) denotes convolution and \(\mathcal {G}\) is the Gaussian kernel, defined as follows:

$$\begin{aligned} \begin{aligned} \mathcal {G}(x,y)=\frac{1}{2\pi \sigma ^2}e^{-\frac{x^2+y^2}{2\sigma ^2}}. \end{aligned} \end{aligned}$$
(2)

We obtain the facial details of the input image by subtracting its illumination mask. Denoting \(\varvec{X}^d_i\) as the facial details, the computation is as follows:

$$\begin{aligned} \begin{aligned} \varvec{X}_i^d&=\varvec{X}_i-\varvec{X}_i^m. \end{aligned} \end{aligned}$$
(3)

Then we combine the facial details with the reference illumination mask to simulate different illumination conditions \(\varvec{X}^v_i\), which is formulated as follows:

$$\begin{aligned} \begin{aligned} \varvec{X}_i^v&=\varvec{X}_i^d+\varvec{X}_r^m. \end{aligned} \end{aligned}$$
(4)
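The simulation step can be summarized in a few lines of code. The following is a minimal sketch of Eqs. (1)–(4) in Python, assuming float RGB images in the range [0, 1]; the blur strength `sigma` is an illustrative choice, not a value reported in the paper.

```python
import numpy as np
import cv2


def illumination_mask(img, sigma=30.0):
    """Eq. (1): a large Gaussian blur (kernel of Eq. (2)) approximates the illumination mask."""
    # ksize=(0, 0) lets OpenCV derive the kernel size from sigma.
    return cv2.GaussianBlur(img, (0, 0), sigma)


def simulate_illumination(x_input, x_reference, sigma=30.0):
    """Eqs. (3)-(4): transfer the reference illumination onto the input's facial details."""
    mask_i = illumination_mask(x_input, sigma)       # X_i^m
    mask_r = illumination_mask(x_reference, sigma)   # X_r^m
    details = x_input - mask_i                       # X_i^d, Eq. (3)
    simulated = details + mask_r                     # X_i^v, Eq. (4)
    return np.clip(simulated, 0.0, 1.0)
```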

3.3 Color Correction

The color components of the simulated image may differ from those of the input image. We propose to conduct color correction to ensure that the color components of the final output are consistent with those of the input image. We first apply the SVD [11] algorithm to each channel of both the input image and the simulated output to extract their color components. Denoting \(\varvec{X}_{iA}\) and \(\varvec{X}_{iA}^v\), \(A=\{R,G,B\}\), as the channels of the input and simulated images, we have:

$$\begin{aligned} \begin{aligned} \varvec{X}_{iA}&=U_{iA}\varSigma _{iA}V_{iA}^T,A=\{R,G,B\},\\ \varvec{X}_{iA}^v&=U_{iA}^v\varSigma _{iA}^v(V_{iA}^v)^T,A=\{R,G,B\}. \end{aligned} \end{aligned}$$
(5)

Note that \(\varSigma _{iA}\) and \(\varSigma _{iA}^v\) contain the color components of the input and simulated images; we can therefore correct the color of the simulated image to match the input image by replacing \(\varSigma _{iA}^v\) with \(\varSigma _{iA}\). Denoting \(\varvec{X}_{oA}\) as the augmented output, we have:

$$\begin{aligned} \begin{aligned} \varvec{X}_{oA}&=U_{iA}^v\varSigma _{iA}(V_{iA}^v)^T,A=\{R,G,B\}. \end{aligned} \end{aligned}$$
(6)
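For completeness, the color correction in Eqs. (5)–(6) admits an equally short sketch, again assuming float RGB images in [0, 1]; numpy.linalg.svd stands in here for whichever SVD routine [11] is used in practice.

```python
import numpy as np


def color_correct(x_input, x_simulated):
    """Eqs. (5)-(6): give each simulated channel the singular values of the input channel."""
    corrected = np.empty_like(x_simulated)
    for c in range(x_simulated.shape[2]):            # A in {R, G, B}
        # Eq. (5): per-channel SVD of the input and the simulated image.
        _, s_in, _ = np.linalg.svd(x_input[..., c], full_matrices=False)
        u_v, _, vt_v = np.linalg.svd(x_simulated[..., c], full_matrices=False)
        # Eq. (6): keep the simulated U and V but replace the spectrum with the input's.
        corrected[..., c] = (u_v * s_in) @ vt_v
    return np.clip(corrected, 0.0, 1.0)
```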

4 Experiment

4.1 Experimental Settings

Training Set. We utilize CASIA-WebFace [7], a popular public face dataset, to train the baseline model. CASIA-WebFace contains 494,414 samples of 10,575 subjects collected from the Internet.

We also construct a domestic dataset to train a stronger model for domestic face recognition. The domestic training dataset contains 864,652 samples of 386,847 subjects. Most subjects have only 2–3 images, of which one is from the second-generation ID card and the others are from surveillance videos. Training a robust model on the domestic dataset is challenging because of the lack of training samples for each person.

Testing Set. We evaluate the performance of the proposed illumination augmentation approach on the LFW dataset [8]. The LFW dataset contains 13,233 images of 5,749 subjects captured in unconstrained environments and is currently the most popular benchmark for face recognition. We adopt the standard verification protocol to conduct a fair comparison with other methods.

We also construct a domestic testing set to evaluate the performance of face recognition models in a realistic surveillance environment. The domestic testing set contains 3,722 probe images of 39 subjects captured in a railway station. The challenges include illumination, pose, and occlusion. For testing, we match the domestic testing set against a gallery set of 10,039 images from second-generation ID cards.
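As a concrete illustration of this protocol, the following hedged sketch performs rank-1 identification by cosine similarity between face embeddings; the embedding extraction itself (the network in Table 1) is outside the snippet, and `probe_feats`, `gallery_feats`, and `gallery_ids` are hypothetical arrays.

```python
import numpy as np


def rank1_identify(probe_feats, gallery_feats, gallery_ids):
    """Return the gallery identity with the highest cosine similarity for each probe."""
    p = probe_feats / np.linalg.norm(probe_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = p @ g.T                      # (num_probe, num_gallery) cosine scores
    return gallery_ids[np.argmax(sims, axis=1)]
```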

Table 1. The DenseNet structure
Fig. 2. Illumination augmentation results on CASIA-WebFace. (b) and (c) are the augmented images of (a), while (e) and (f) are the augmented images of (d).

Fig. 3. Illumination augmentation results on the domestic training set. (b) and (c) are the augmented images of (a), while (e) and (f) are the augmented images of (d).

Implementation Details. We select 20 images from the CMU-PIE [9] dataset to generate reference illumination templates. For each training sample, we randomly select 2 reference templates and obtain 2 illumination augmentation results. Our training process has two steps. First, we train a baseline model on CASIA-WebFace using the DenseNet [10] structure; Table 1 gives the details of the network. Then we use the triplet loss [3] to fine-tune the baseline model on the domestic training set. For the first step, we set the batch size to 128, the initial learning rate to 0.1 (halved every 40,000 iterations), and the weight decay to \(5\times 10^{-4}\). For the second step, we set the batch size to 120, the initial learning rate to 0.01 (halved every 40,000 iterations), and the weight decay to \(5\times 10^{-4}\).
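As a rough illustration of this schedule, the sketch below sets up the two optimization steps in PyTorch; torchvision's densenet121 is only a stand-in for the network in Table 1, and the training loops and triplet sampling are omitted.

```python
import torch
from torchvision.models import densenet121

model = densenet121(num_classes=10575)  # stand-in for the DenseNet structure in Table 1

# Step 1: baseline training on CASIA-WebFace (batch size 128, initial lr 0.1).
opt1 = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=5e-4)
sched1 = torch.optim.lr_scheduler.StepLR(opt1, step_size=40000, gamma=0.5)

# Step 2: triplet-loss fine-tuning on the domestic set (batch size 120, initial lr 0.01).
opt2 = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=5e-4)
sched2 = torch.optim.lr_scheduler.StepLR(opt2, step_size=40000, gamma=0.5)

# Each scheduler's step() is called once per training iteration, so the learning
# rate is halved every 40,000 iterations, matching the schedule described above.
```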

4.2 Qualitative Evaluation of Illumination Augmentation

Figures 2 and 3 show the illumination augmentation results on CASIA-WebFace and the domestic training set. The illumination augmentation approach adds illumination variations to the input image without changing the facial details on both CASIA-WebFace and the domestic training set. The approach thus adapts to training samples with varying illumination, pose, and image quality.

4.3 Quantitative Evaluation of Illumination Augmentation

Evaluation on LFW. Table 2 reports the quantitative evaluation of the proposed illumination augmentation (IA) approach on the LFW dataset. We compare our method with DeepFace [1], DeepID2+ [2], and FaceNet [3]. The experimental results verify that the proposed IA approach effectively improves the performance of existing deep models: training the proposed network with the augmented samples improves verification accuracy by 0.27%. We also note that the verification accuracy of the proposed approach surpasses DeepFace and DeepID2+, and that with the same training samples, the accuracy of our method is higher than that of FaceNet. Consequently, our method is competitive with the state-of-the-art methods.

Table 2. Evaluation on LFW
Table 3. Evaluation on the domestic testing set

Evaluation on the Domestic Testing Set. Table 3 reports the evaluation results on the domestic testing set. Training deep models on the domestic training set improves recognition accuracy on the test set captured in a realistic surveillance environment. Compared with the deep models trained on CASIA-WebFace, an improvement of 24.6% is obtained for FaceNet trained on the domestic training set; similarly, an improvement of 19.77% is observed for the proposed network. With more training data, the performance of deep networks continues to improve: as the amount of training data increases from 0.12M to 0.86M images, we see a performance gain of 7% for FaceNet and 8.73% for our network. Furthermore, the proposed IA approach effectively improves the performance of deep networks on the domestic dataset; with IA, an improvement of 4.02% is observed for the proposed network. Note that the proposed network trained on CASIA-WebFace outperforms FaceNet by a margin of 8.56%. Consequently, our method achieves better generalizability than FaceNet.

5 Conclusion

In this paper, we study data augmentation for face recognition and propose an illumination augmentation (IA) method. We first simulate different illumination conditions using an external benchmark and then perform color correction to obtain augmented training samples with additional illumination variations while preserving the facial details. The IA approach is applicable to face images with varying illumination, pose, and image quality. To further improve the performance of deep networks for robust face recognition in realistic environments, we construct a domestic training set together with a domestic testing set. Experimental results on LFW and the domestic testing set verify the effectiveness of the proposed approach.