Abstract
In this work, we explore a Connectionist Temporal Classification (CTC) based end-to-end Automatic Speech Recognition (ASR) model for the Myanmar language. A series of experiments on the model topology is presented: convolutional layers are added and removed, bidirectional long short-term memory (BLSTM) stacks of different depths are compared, and different label encoding methods are investigated. The experiments are carried out in low-resource scenarios using our recorded Myanmar speech corpus of nearly 26 hours. The best model achieves a character error rate (CER) of 4.72% and a syllable error rate (SER) of 12.38% on the test set.
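Both reported metrics are edit-distance based: CER treats each character as a unit, while SER treats each syllable (e.g. as produced by a syllable segmenter such as sylbreak) as a unit. As a minimal illustrative sketch, not the authors' evaluation code, either rate can be computed as the Levenshtein distance between reference and hypothesis divided by the reference length:

```python
def edit_distance(ref, hyp):
    # Dynamic-programming Levenshtein distance between two sequences.
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of ref[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all of hyp[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[m][n]

def error_rate(ref_units, hyp_units):
    # CER if units are characters, SER if units are syllables.
    return edit_distance(ref_units, hyp_units) / len(ref_units)
```

For CER the inputs would be character lists of the reference and hypothesis transcripts; for SER, lists of syllables from a segmenter.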
Acknowledgments
The authors are grateful to the advisors from the University of Information Technology who gave us helpful comments and suggestions throughout this project. The authors also thank Ye Yint Htoon and May Sabal Myo for helping us with the dataset preparation and for technical assistance.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chit, K.M.M., Lin, L.L. (2021). Exploring CTC Based End-To-End Techniques for Myanmar Speech Recognition. In: Vasant, P., Zelinka, I., Weber, GW. (eds) Intelligent Computing and Optimization. ICO 2020. Advances in Intelligent Systems and Computing, vol 1324. Springer, Cham. https://doi.org/10.1007/978-3-030-68154-8_87
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-68153-1
Online ISBN: 978-3-030-68154-8
eBook Packages: Intelligent Technologies and Robotics (R0)