Abstract
A novel visual description of sound, called the sound dendrogram, is proposed to ease manual annotation when building large speech corpora. It is a lattice structure built from a set of “seed regions” through an iterative merging procedure. A simple but reliable method for extracting the seed regions and an advanced distance metric are used to construct the sound dendrogram, so that it presents the structure of speech from coarse to fine in visual form. Tests show that all phonemic boundaries are contained in the lattice structure of the sound dendrogram and are easy to identify. The sound dendrogram can therefore serve as a powerful aid in the manual annotation of speech corpora.
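The merging idea can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify how seed regions are extracted or which distance metric is used, so the sketch assumes the seed boundaries are already given and substitutes the Euclidean distance between the mean feature vectors of neighbouring regions. The function name `build_dendrogram` and its arguments are hypothetical.

```python
import numpy as np


def build_dendrogram(frames, seed_bounds):
    """Merge adjacent regions bottom-up.

    frames      : 2-D array, one feature vector per frame (assumed input).
    seed_bounds : sorted frame indices delimiting the seed regions (assumed given).
    Returns a list of (start, end, level) nodes, end exclusive; level 0 = seeds.
    """
    # Seed regions are the intervals between consecutive boundaries.
    regions = [(seed_bounds[i], seed_bounds[i + 1])
               for i in range(len(seed_bounds) - 1)]
    nodes = [(s, e, 0) for s, e in regions]
    level = 0
    while len(regions) > 1:
        # Assumed stand-in metric: Euclidean distance between region means.
        means = [frames[s:e].mean(axis=0) for s, e in regions]
        dists = [np.linalg.norm(means[i] - means[i + 1])
                 for i in range(len(regions) - 1)]
        # Merge the most similar pair of neighbouring regions.
        i = int(np.argmin(dists))
        merged = (regions[i][0], regions[i + 1][1])
        regions[i:i + 2] = [merged]
        level += 1
        nodes.append((merged[0], merged[1], level))
    return nodes


# Toy input: three seed regions of two frames each, one feature per frame.
frames = np.array([[0.0], [0.0], [0.1], [0.1], [5.0], [5.0]])
nodes = build_dendrogram(frames, [0, 2, 4, 6])
```

In this toy run the two acoustically similar seeds merge first and the dissimilar one joins last, so coarse levels keep only the strongest boundaries while every seed boundary remains available at the finest level.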
Additional information
Foundation item: Supported by the National Natural Science Foundation of China (50099620) and the National High-Technology Development Program of China (2001 AA132050)
Biography: SHE Kun (1979-), male. Ph.D. candidate, research direction: multimedia signal processing.
About this article
Cite this article
Kun, S., Shuzhen, C., Shen, Y. et al. A novel visualization tool for manual annotation when building large speech corpora. Wuhan Univ. J. Nat. Sci. 11, 381–384 (2006). https://doi.org/10.1007/BF02832127
DOI: https://doi.org/10.1007/BF02832127