Abstract
Facial attribute manipulation has found increasing application, but restricting edits to a target attribute while preserving a face's unique details remains challenging. In this paper, we introduce a mask-adversarial autoencoder (M-AAE), which combines a variational autoencoder (VAE) and a generative adversarial network (GAN) for photorealistic image generation. We use partial dilated layers to modify only a few pixels in the encoder's feature maps, changing attribute strength continuously without disturbing global information. The VAE and GAN training objectives are augmented with a face-recognition loss and a cycle-consistency loss to faithfully preserve facial details. Moreover, we generate facial masks to enforce background consistency, which lets training focus on the foreground face rather than the background. Experimental results demonstrate that our method generates high-quality images with varying attributes and outperforms existing methods in detail preservation.
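The mask-based background-consistency and cycle-consistency objectives described above can be sketched as simple pixel-wise losses. The following is a minimal NumPy illustration under our own naming, not the paper's implementation; the binary face-mask convention (1 = face, 0 = background) is an assumption:

```python
import numpy as np

def background_consistency_loss(original, edited, face_mask):
    """Mean absolute difference restricted to the background.

    face_mask is binary: 1 marks face pixels, 0 marks background
    (an assumed convention, not necessarily the paper's).
    """
    background = 1.0 - face_mask
    diff = np.abs(original - edited) * background
    return diff.sum() / max(background.sum(), 1e-8)

def cycle_consistency_loss(original, reconstructed):
    """Mean absolute difference between an image and its round-trip
    reconstruction (edit an attribute, then edit it back)."""
    return np.abs(original - reconstructed).mean()

# Toy example: a 4x4 single-channel "image" whose central 2x2 block is the face.
rng = np.random.default_rng(0)
img = rng.random((4, 4))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                       # face region
edited = img.copy()
edited[1:3, 1:3] += 0.5                    # an edit confined to the face
print(background_consistency_loss(img, edited, mask))  # 0.0: background unchanged
```

A loss of zero here reflects an edit confined entirely to the masked face region; any spill into the background raises it, which is how the mask steers training toward the foreground face.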
Acknowledgements
This work was partially funded by the National Natural Science Foundation of China (No. 61972157), the National Social Science Foundation of China (No. 18ZD22), the Science and Technology Commission of Shanghai Municipality Program (No. 18D1205903), the Science and Technology Commission of Pudong Municipality Program (No. PKJ2018-Y46), and the Multidisciplinary Project of Shanghai Jiao Tong University (No. ZH2018ZDA25); it was also partially supported by a joint project of SenseTime and Shanghai Jiao Tong University.
Author information
Ruoqi Sun was born in Weihai, Shandong Province, China, in 1993. She received her B.S. degree in digital media technology from Shandong University in 2015. She is currently pursuing a Ph.D. degree in the Department of Computer Science and Engineering at Shanghai Jiao Tong University. Her current research interests include facial attribute manipulation, semantic segmentation, and image classification.
Chen Huang received his Ph.D. degree in electronic engineering from Tsinghua University, Beijing, China, in 2014. He was a postdoctoral fellow in the Robotics Institute of Carnegie Mellon University, and also in the Department of Information Engineering, the Chinese University of Hong Kong. He is currently a Research Scientist at Apple Inc. His research interests include machine learning and computer vision, with a focus on deep learning and efficient optimization. He has published more than 20 papers in top-tier conferences such as CVPR, ICCV, ECCV, NeurIPS, and ICML.
Hengliang Zhu received his M.S. degree from Fujian Normal University, China, in 2010. He is now a Ph.D. candidate in the Department of Computer Science and Engineering, Shanghai Jiao Tong University. His current research interests include saliency detection and face alignment.
Lizhuang Ma received his B.S. and Ph.D. degrees from Zhejiang University, China, in 1985 and 1991, respectively. He is now a Distinguished Professor and Head of the Digital Media Technology and Data Reconstruction Lab at the Department of Computer Science and Engineering, Shanghai Jiao Tong University. He has published more than 200 academic research papers. His research interests include computer aided geometric design, computer graphics, scientific data visualization, computer animation, digital media technology, and theory and applications of computer graphics and CAD/CAM.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Sun, R., Huang, C., Zhu, H. et al. Mask-aware photorealistic facial attribute manipulation. Comp. Visual Media 7, 363–374 (2021). https://doi.org/10.1007/s41095-021-0219-7