Abstract
This paper presents a next-generation web application that enables users to contribute corrections to automatically acquired transcriptions of long speech recordings. We describe how our setting differs from similar ones, compare our solution with others, and reflect on how it has developed from the six-year-old work we build upon, in light of the progress made, the lessons learned and the new technologies now available in the browser.
Notes
- 5. Transcriber explicitly aligns the text with the speech, while the other two merely support inserting timestamps into the transcription.
- 6. Transcribe supports team co-operation.
- 7. In our data, other speakers represent a negligible fraction, but we may add support for speaker annotation later.
- 8. The current word is on the top line in the screenshot because it lies at the beginning of the recording.
- 9. The median number of chunks is 1 (most recordings have no manually corrected segments) and the maximum is 1109. Counting only recordings that have been edited, the median is 8.
- 10. We could even stop the recalculation as soon as we find a word whose new horizontal coordinate is unchanged, and simply add the difference in its vertical coordinate to all subsequent words: once a word keeps its horizontal position, every following line is laid out as before, only shifted vertically.
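Note 10 describes a shortcut for re-laying out the transcript after an edit. The following is a minimal sketch of that idea under assumptions of our own, not the application's actual code: words are placed by simple greedy line wrapping with a fixed line width, line height and inter-word gap; each word's width is already measured in integer pixels (so exact equality of coordinates is meaningful); and an edit changes only the words in the index range from `start` up to `end`. The names `WordBox`, `LayoutParams` and `relayout` are hypothetical.

```typescript
// Hypothetical layout record for one transcript word; not the application's data model.
interface WordBox {
  width: number; // measured width of the word, assumed to be integer pixels
  x: number; // horizontal coordinate of the word's left edge
  y: number; // vertical coordinate (top) of the word's line
}

interface LayoutParams {
  lineWidth: number; // available width of a text line
  lineHeight: number; // vertical distance between consecutive lines
  gap: number; // horizontal space between adjacent words
}

// Greedy line wrapping, recomputed from index `start`.
// Assumptions: boxes[start - 1] (if any) is still correctly placed, and every word with
// index >= `end` has the same width as before the edit and still carries its old x and y.
// Returns the number of words placed before the early exit
// (or all words from `start` if no early exit happened).
function relayout(boxes: WordBox[], start: number, end: number, p: LayoutParams): number {
  // Continue right after the last unchanged word, or at the origin for a full layout.
  let x = 0;
  let y = 0;
  if (start > 0) {
    const prev = boxes[start - 1];
    x = prev.x + prev.width + p.gap;
    y = prev.y;
  }
  for (let i = start; i < boxes.length; i++) {
    const box = boxes[i];
    // Wrap to a new line unless the word is already at the start of a line.
    if (x > 0 && x + box.width > p.lineWidth) {
      x = 0;
      y += p.lineHeight;
    }
    // Early exit: an unchanged word keeps its horizontal coordinate, so all words from
    // here on wrap exactly as before and only need a constant vertical shift.
    if (i >= end && box.x === x) {
      const dy = y - box.y;
      if (dy !== 0) {
        for (let j = i; j < boxes.length; j++) {
          boxes[j].y += dy;
        }
      }
      return i - start;
    }
    box.x = x;
    box.y = y;
    x += box.width + p.gap;
  }
  return boxes.length - start;
}
```

For instance, with a line width of 20, a gap of 1 and eight words of width 5 (three words per line), splicing a full-width word in front of the first word of the second line and calling `relayout(boxes, 3, 4, p)` places only the inserted word; the next word still starts at x = 0, so it and everything after it are just moved down by one line height. A full layout is the special case `relayout(boxes, 0, boxes.length, p)`, where no early exit is possible.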
Acknowledgments
The research was supported by SVV project number 260 453. This work has used language resources stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2015071).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Krůza, O., Kuboň, V. (2019). Second-Generation Web Interface to Correcting ASR Output. In: Arai, K., Bhatia, R., Kapoor, S. (eds) Proceedings of the Future Technologies Conference (FTC) 2018. FTC 2018. Advances in Intelligent Systems and Computing, vol 880. Springer, Cham. https://doi.org/10.1007/978-3-030-02686-8_56
DOI: https://doi.org/10.1007/978-3-030-02686-8_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02685-1
Online ISBN: 978-3-030-02686-8
eBook Packages: Intelligent Technologies and Robotics (R0)