A Comprehensive Review of Recent Automatic Speech Summarization and Keyword Identification Techniques

Kumar, Tapesh; Mahrishi, Mehul; Meena, Gaurav

doi:10.1007/978-3-030-85383-9_8


Part of the book series: Learning and Analytics in Intelligent Systems ((LAIS,volume 25))


Abstract

Speech has long been the most popular form of human communication, whereas a keyboard or a mouse remains the most common way of entering data into a computer. It would therefore be valuable if computers could understand and carry out spoken human commands. Automatic speech recognition (ASR) is the task of obtaining the transcription (word sequence) of an utterance from its speech waveform. Over the last few decades, speech technology and its use in human-computer interaction have advanced steadily and significantly. This chapter presents a comprehensive review of ASR systems and their most recent developments. It aims to outline and explain popular approaches used at the various stages of speech recognition systems and to highlight the unique and innovative characteristics of selected systems.



Author information

Authors and Affiliations

Swami Keshvanand Institute of Technology, Jaipur, India
Tapesh Kumar & Mehul Mahrishi
Central University of Rajasthan, Ajmer, India
Gaurav Meena

Corresponding author

Correspondence to Mehul Mahrishi.

Editor information

Editors and Affiliations

Department of Computer Science, Creighton University, Omaha, NE, USA
Steven Lawrence Fernandes
Shobhit University, Saharanpur, India
Tarun K. Sharma

About this chapter

Cite this chapter

Kumar, T., Mahrishi, M., Meena, G. (2022). A Comprehensive Review of Recent Automatic Speech Summarization and Keyword Identification Techniques. In: Fernandes, S.L., Sharma, T.K. (eds) Artificial Intelligence in Industrial Applications. Learning and Analytics in Intelligent Systems, vol 25. Springer, Cham. https://doi.org/10.1007/978-3-030-85383-9_8

DOI: https://doi.org/10.1007/978-3-030-85383-9_8
Published: 08 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85382-2
Online ISBN: 978-3-030-85383-9
eBook Packages: Intelligent Technologies and Robotics (R0)
