Abstract
One key aspect of creating high-quality synthetic speech is the validation process, yet establishing validation processes that are both reliable and scalable is challenging. Today, the maturity of crowdsourcing infrastructure, together with better techniques for validating crowdsourced data, makes it possible to perform reliable speech synthesis validation at a larger scale. In this paper, we present a study of voice quality evaluation using a crowdsourcing platform. We investigate voice gender preference across eight locales for three typical TTS scenarios. We also examine to what degree speaker adaptation can carry over certain voice qualities, such as mood, from the target speaker to the adapted TTS voice. Starting from an existing full TTS font, adaptation is carried out on a smaller amount of speech data from the target speaker. Finally, we show how crowdsourcing contributes to objective assessment of voice preference in voice talent selection.
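The preference evaluations described above aggregate pairwise votes from many crowd workers. As an illustrative sketch only (the paper does not specify its statistical analysis), an exact two-sided sign test shows how such votes can be turned into an objective preference decision; the function name and vote counts below are hypothetical:

```python
from math import comb

def preference_test(votes_a: int, votes_b: int, alpha: float = 0.05):
    """Exact two-sided sign test on paired A/B preference votes.

    Under the null hypothesis that listeners have no preference
    (each vote is a fair coin flip), compute the probability of a
    split at least as extreme as the one observed.
    """
    n = votes_a + votes_b
    k = max(votes_a, votes_b)
    # P(X >= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    if p_value >= alpha or votes_a == votes_b:
        return ("no significant preference", p_value)
    return ("A" if votes_a > votes_b else "B", p_value)

# Hypothetical example: 70 of 100 raters prefer voice A for a given scenario.
print(preference_test(70, 30))   # clear preference for A
print(preference_test(52, 48))   # too close to call at alpha = 0.05
```

A near-even split (52 vs. 48) is not distinguishable from chance, which is why per-locale and per-scenario breakdowns need enough raters before a gender preference can be claimed.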
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this paper
Parson, J., Braga, D., Tjalve, M., Oh, J. (2013). Evaluating Voice Quality and Speech Synthesis Using Crowdsourcing. In: Habernal, I., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2013. Lecture Notes in Computer Science(), vol 8082. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40585-3_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40584-6
Online ISBN: 978-3-642-40585-3