
Exploring CTC Based End-To-End Techniques for Myanmar Speech Recognition

  • Conference paper
  • In: Intelligent Computing and Optimization (ICO 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1324)

Abstract

In this work, we explore a Connectionist Temporal Classification (CTC) based end-to-end Automatic Speech Recognition (ASR) model for the Myanmar language. We present a series of experiments on model topology: convolutional layers are added and removed, bidirectional long short-term memory (BLSTM) stacks of different depths are used, and different label encoding methods are investigated. The experiments are carried out in a low-resource scenario using our recorded Myanmar speech corpus of nearly 26 hours. The best model achieves a character error rate (CER) of 4.72% and a syllable error rate (SER) of 12.38% on the test set.
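The reported error rates are standard edit-distance metrics: CER counts character-level edits against the reference transcript, while SER applies the same computation to syllable sequences (for Myanmar text, a segmenter such as sylbreak can produce these). As a minimal sketch of how such metrics are computed, with illustrative function names not taken from the paper:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (strings or lists)."""
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i tokens
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j tokens
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost, # substitution (or match)
            )
    return d[len(ref)][len(hyp)]


def error_rate(refs, hyps):
    """Total edits over total reference length, as a percentage.

    Passing character strings gives a CER-style rate; passing lists of
    syllables gives an SER-style rate.
    """
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * edits / total
```

For example, `error_rate(["hello"], ["hallo"])` gives 20.0, since one substitution over five reference characters is a 20% error rate. Whether the actual evaluation in the paper normalizes per utterance or over the whole corpus, as done here, is an assumption.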



Acknowledgments

The authors are grateful to the advisors from the University of Information Technology who gave us helpful comments and suggestions throughout this project. The authors also thank Ye Yint Htoon and May Sabal Myo for helping us with the dataset preparation and for technical assistance.

Author information

Correspondence to Khin Me Me Chit.


Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Chit, K.M.M., Lin, L.L. (2021). Exploring CTC Based End-To-End Techniques for Myanmar Speech Recognition. In: Vasant, P., Zelinka, I., Weber, GW. (eds) Intelligent Computing and Optimization. ICO 2020. Advances in Intelligent Systems and Computing, vol 1324. Springer, Cham. https://doi.org/10.1007/978-3-030-68154-8_87
