Improving the Quality of Standard GMM-Based Voice Conversion Systems by Considering Physically Motivated Linear Transformations

Zorilă, Tudor-Cătălin; Erro, Daniel; Hernaez, Inma

doi:10.1007/978-3-642-35292-8_4

Part of the book series: Communications in Computer and Information Science (CCIS, volume 328)

Abstract

This paper presents a new method to train traditional voice conversion functions based on Gaussian mixture models, linear transforms and cepstral parameterization. Instead of using statistical criteria, this method calculates a set of linear transforms that represent physically meaningful spectral modifications such as frequency warping and amplitude scaling. Our experiments indicate that the proposed training method leads to significant improvements in the average quality of the converted speech with respect to traditional statistical methods. This is achieved without modifying the input/output parameters or the shape of the conversion function.
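
As an illustration of the kind of conversion function the abstract refers to, the following is a minimal sketch of a GMM-weighted linear transform applied to one cepstral vector. It is not the authors' implementation: the function names, the diagonal-covariance assumption, and the toy parameters are all hypothetical, and the matrices `A` and offsets `b` are placeholders for whatever transforms a training method produces (for example, a frequency-warping matrix and an amplitude-scaling offset per mixture component).

```python
import numpy as np

def gmm_posteriors(x, weights, means, variances):
    """Posterior probability of each Gaussian component given x.

    Assumes diagonal covariances (an illustrative simplification).
    Shapes: x (D,), weights (K,), means (K, D), variances (K, D).
    """
    diff = x - means
    log_pdf = -0.5 * np.sum(diff**2 / variances
                            + np.log(2.0 * np.pi * variances), axis=1)
    log_post = np.log(weights) + log_pdf
    log_post -= np.max(log_post)          # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

def convert(x, weights, means, variances, A, b):
    """Apply F(x) = sum_k p_k(x) * (A_k x + b_k).

    A has shape (K, D, D) and b has shape (K, D); both are hypothetical
    per-component linear transforms in the cepstral domain.
    """
    p = gmm_posteriors(x, weights, means, variances)   # (K,)
    per_component = np.einsum("kij,j->ki", A, x) + b   # (K, D)
    return p @ per_component                           # (D,)

# Toy example: identity transforms leave the input unchanged,
# because the posteriors sum to one.
rng = np.random.default_rng(0)
K, D = 2, 4
weights = np.array([0.5, 0.5])
means = rng.normal(size=(K, D))
variances = np.ones((K, D))
A = np.stack([np.eye(D)] * K)
b = np.zeros((K, D))
x = rng.normal(size=D)
y = convert(x, weights, means, variances, A, b)
```

In this form the input and output parameters and the shape of the function stay fixed; only the way `A` and `b` are obtained would differ between a statistical training criterion and a physically motivated one.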

Author information

Authors and Affiliations

POLITEHNICA University of Bucharest (UPB), Bucharest, Romania
Tudor-Cătălin Zorilă
AHOLAB, University of the Basque Country (UPV/EHU), Bilbao, Spain
Tudor-Cătălin Zorilă, Daniel Erro & Inma Hernaez

Editor information

Editors and Affiliations

Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Doroteo Torre Toledano
Centro Politécnico Superior, Edificio Ada Byron, C/ María de Luna nº 1, 50018, Zaragoza, Spain
Alfonso Ortega Giménez
Universidade de Aveiro, Campus Universitário Aveiro, 3810-193, Aveiro, Portugal
António Teixeira
Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Joaquín González Rodríguez
E.T.S.I.Telecomunicacion, Universidad Politécnica de Madrid, Ciudad Universitaria s/n, 28040, Madrid, Spain
Luis Hernández Gómez & Rubén San Segundo Hernández
Escuela Politecnica Superior, Universidad Autonoma de Madrid, C/ Francisco, Tomas y Valiente 11, 28049, Madrid, Spain
Daniel Ramos Castro

About this paper

Cite this paper

Zorilă, TC., Erro, D., Hernaez, I. (2012). Improving the Quality of Standard GMM-Based Voice Conversion Systems by Considering Physically Motivated Linear Transformations. In: Torre Toledano, D., et al. Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science, vol 328. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35292-8_4

DOI: https://doi.org/10.1007/978-3-642-35292-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35291-1
Online ISBN: 978-3-642-35292-8
eBook Packages: Computer Science, Computer Science (R0)
