A Voice Morphing Model Based on the Gaussian Mixture Model and Generative Topographic Mapping

  • Conference paper
Emerging Trends in Intelligent Computing and Informatics (IRICT 2019)

Abstract

In this paper, a new model for voice morphing is proposed, in which the spectral characteristics of a source speaker’s speech are transformed so that the speech sounds as if it were spoken by a designated target speaker. The proposed model performs a phoneme segmentation of the voice signal and then transforms the spectral characteristics of each segment using a linear prediction model. The spectral features, extracted using Linear Predictive Coding (LPC), are aligned using Dynamic Time Warping (DTW). The Generative Topographic Mapping (GTM) method is used to model the LPC features, and the transformation is then achieved using a Gaussian Mixture Model (GMM). The transformed codebooks are finally converted to prediction coefficients, and the excitation signal is filtered to synthesize the speech. A correlation test performed between the source and target signals showed a high correlation. The results suggest that the proposed model is promising for recognizing full sentences as well as individual words.
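The alignment step mentioned in the abstract can be illustrated with a minimal sketch of classic DTW. This is not the authors' implementation: the function name `dtw_align`, the Euclidean frame distance, the unconstrained step pattern, and the toy one-dimensional frames standing in for LPC feature vectors are all assumptions made for illustration.

```python
import math


def dtw_align(source, target):
    """Align two sequences of feature vectors with classic DTW.

    Returns (total_cost, path), where path is a list of (i, j) pairs
    matching source[i] to target[j]. Hypothetical helper for illustration.
    """
    n, m = len(source), len(target)
    inf = float("inf")
    # cost[i][j]: minimal accumulated distance aligning source[:i] with target[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between frames (an assumed local distance)
            d = math.dist(source[i - 1], target[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # source frame i-1 also matches an earlier target frame
                cost[i][j - 1],      # target frame j-1 also matches an earlier source frame
                cost[i - 1][j - 1],  # both sequences advance together
            )

    # Backtrack from the end of both sequences to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


# Toy frames: the target holds its first frame one step longer than the source.
source_frames = [[0.0], [1.0], [2.0]]
target_frames = [[0.0], [0.0], [1.0], [2.0]]
total, path = dtw_align(source_frames, target_frames)
print(total)  # 0.0
print(path)   # [(0, 0), (0, 1), (1, 2), (2, 3)]
```

In a pipeline like the one described, each frame would be a vector of LPC-derived features rather than a single number, and the resulting path would pair source and target frames before a mapping between them is trained.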

Corresponding author

Correspondence to Murad A. Rassam.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this paper

Rassam, M.A. et al. (2020). A Voice Morphing Model Based on the Gaussian Mixture Model and Generative Topographic Mapping. In: Saeed, F., Mohammed, F., Gazem, N. (eds) Emerging Trends in Intelligent Computing and Informatics. IRICT 2019. Advances in Intelligent Systems and Computing, vol 1073. Springer, Cham. https://doi.org/10.1007/978-3-030-33582-3_38
