On the Use of Generative Adversarial Networks to Generate Face Images from Voice Neural Embeddings

Salamea-Palacios, Christian; Zumba-Narváez, Edison; Zumba-Narváez, Fernando

doi:10.1007/978-3-031-37717-4_18

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 711)

Included in the following conference series:

Science and Information Conference

Abstract

Nowadays, the automatic detection of objects in images or videos is a topic of high interest in the development of deep learning technologies, and one of its application areas is forensic acoustics. Forensic acoustics follows a security-oriented approach in which technology is used to help resolve police-related problems. Since Generative Adversarial Networks (GANs) have been used successfully to reconstruct or generate images, this work uses GANs to study the recognition of a person's face from their audio or video recordings. Because this line of research is novel, few databases are available to evaluate the quality of the models, so an image-audio database was created containing both images and audio of people who appear in videos on the YouTube platform. These images and the corresponding audio embeddings were used to train the proposed GAN-based models. The objective of this work is to generate an image of a person's face using only their voice signal as the input feature, that is, to generate a face resembling that of the voice's owner. The efficiency of the proposed technique was evaluated with the Peak Signal-to-Noise Ratio (PSNR), used here as an indicator of whether a generated image can be considered a human face. A PSNR of up to 28.39 dB was obtained for images generated from voice embeddings, a relative improvement of up to 30% over the same technique using noise as the input feature.
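PSNR compares a generated image against a reference image through their mean squared error (MSE): PSNR = 10 · log10(MAX² / MSE), where MAX is the largest possible pixel value. The abstract does not state the reference image, the image size, or the pixel range used, so the sketch below is only an illustration that assumes 8-bit images (MAX = 255) and a same-sized reference face; the helper names `psnr` and `relative_improvement` are hypothetical and are not taken from the paper. The second helper shows how a relative improvement such as "up to 30%" is computed from two PSNR values.

```python
import numpy as np


def psnr(reference: np.ndarray, generated: np.ndarray, max_value: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two images of identical shape.

    Assumes pixel values lie in [0, max_value] (255 for 8-bit images).
    """
    if reference.shape != generated.shape:
        raise ValueError("reference and generated images must have the same shape")
    # Cast to float64 so that uint8 subtraction cannot wrap around.
    diff = reference.astype(np.float64) - generated.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images: no noise at all
    return float(10.0 * np.log10(max_value ** 2 / mse))


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline` as a fraction (0.30 means 30%)."""
    return (new - baseline) / baseline


if __name__ == "__main__":
    # Toy example: every pixel of the "generated" image is off by 10 levels.
    ref = np.full((64, 64, 3), 100, dtype=np.uint8)
    gen = np.full((64, 64, 3), 110, dtype=np.uint8)
    print(f"PSNR: {psnr(ref, gen):.2f} dB")
    print(f"relative improvement: {relative_improvement(26.0, 20.0):.0%}")
```

Because PSNR only measures pixel-level fidelity to a chosen reference, its value depends heavily on which real face image is used for comparison and on how both images are aligned and resized.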

Author information

Authors and Affiliations

Interaction, Robotics and Automation Research Group, Universidad Politécnica Salesiana, Cuenca, Ecuador
Christian Salamea-Palacios, Edison Zumba-Narváez & Fernando Zumba-Narváez

Corresponding author

Correspondence to Christian Salamea-Palacios.

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai

About this paper

Cite this paper

Salamea-Palacios, C., Zumba-Narváez, E., Zumba-Narváez, F. (2023). On the Use of Generative Adversarial Networks to Generate Face Images from Voice Neural Embeddings. In: Arai, K. (eds) Intelligent Computing. SAI 2023. Lecture Notes in Networks and Systems, vol 711. Springer, Cham. https://doi.org/10.1007/978-3-031-37717-4_18

DOI: https://doi.org/10.1007/978-3-031-37717-4_18
Published: 01 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37716-7
Online ISBN: 978-3-031-37717-4
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)