A novel visualization tool for manual annotation when building large speech corpora

Kun, She; Shuzhen, Chen; Shen, Yang; Lian, Zou

doi:10.1007/BF02832127

Published: February 2006

Volume 11, pages 381–384 (2006)

Wuhan University Journal of Natural Sciences

Abstract

A novel visualized sound description, called the sound dendrogram, is proposed to make manual annotation easier when building large speech corpora. It is a lattice structure built from a group of “seed regions” through an iterative merging procedure. A simple but reliable method for extracting the “seed regions” and an advanced distance metric are adopted to construct the sound dendrogram, so that it presents the structure of speech visually, from coarse to fine. Tests show that all phonemic boundaries are contained in the lattice structure of the sound dendrogram and are very easy to identify. The sound dendrogram can be a powerful aid in the manual annotation of speech corpora.
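
The iterative merging described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how seed regions are extracted or which distance metric is used, so the sketch assumes the seed regions are already given as frame ranges with a mean feature vector, and it substitutes plain Euclidean distance between region means. The names `Region`, `distance`, `merge`, and `build_dendrogram` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Region:
    start: int         # first frame index (inclusive)
    end: int           # last frame index (exclusive)
    mean: List[float]  # mean feature vector over the region


def distance(a: Region, b: Region) -> float:
    """Euclidean distance between region means (assumed stand-in metric)."""
    return sum((x - y) ** 2 for x, y in zip(a.mean, b.mean)) ** 0.5


def merge(a: Region, b: Region) -> Region:
    """Merge two adjacent regions, weighting each mean by its frame count."""
    na, nb = a.end - a.start, b.end - b.start
    mean = [(na * x + nb * y) / (na + nb) for x, y in zip(a.mean, b.mean)]
    return Region(a.start, b.end, mean)


def build_dendrogram(seeds: Sequence[Region]) -> List[Tuple[int, float]]:
    """Repeatedly merge the closest pair of adjacent regions.

    Returns the removed boundaries as (frame_index, merge_distance) in merge
    order. Boundaries removed late separate the coarsest units.
    """
    regions = list(seeds)
    merges = []
    while len(regions) > 1:
        # Index of the adjacent pair with the smallest distance.
        i = min(range(len(regions) - 1),
                key=lambda k: distance(regions[k], regions[k + 1]))
        merges.append((regions[i].end, distance(regions[i], regions[i + 1])))
        regions[i:i + 2] = [merge(regions[i], regions[i + 1])]
    return merges
```

Plotting each boundary at a height equal to its merge distance gives a dendrogram-like picture in which fine boundaries sit low and coarse boundaries sit high, which is the coarse-to-fine view the abstract refers to.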

Author information

Authors and Affiliations

School of Electronic Information, Wuhan University, 430072, Wuhan, Hubei, China
She Kun, Chen Shuzhen, Yang Shen & Zou Lian

Corresponding author

Correspondence to Chen Shuzhen.

Additional information

Foundation item: Supported by the National Natural Science Foundation of China (50099620) and the National High-Technology Development Program of China (2001AA132050).

Biography: SHE Kun (1979-), male. Ph.D. candidate, research direction: multimedia signal processing.

About this article

Cite this article

Kun, S., Shuzhen, C., Shen, Y. et al. A novel visualization tool for manual annotation when building large speech corpora. Wuhan Univ. J. Nat. Sci. 11, 381–384 (2006). https://doi.org/10.1007/BF02832127

Received: 20 March 2005
Issue Date: February 2006
DOI: https://doi.org/10.1007/BF02832127

CLC number

TP 37
