Abstract
With the development of modern information and communication systems, voice control interfaces and speech recognition systems are finding applications in many fields. One such application is assistance for people with special needs who have speech impairments and therefore find conventional, speech-dependent voice interfaces difficult to use. Our research team is developing a speaker-dependent computer system, «Deep Interactive Voice Assistant» (DIVA), which can recognize an arbitrary set of commands for controlling a computer system. This paper presents the results of testing several artificial neural network architectures for recognizing voice input: associative memory, the multilayer perceptron, and the convolutional neural network. The results support using a multilayer perceptron in DIVA, because it performed well when trained on a small sample. DIVA is intended for use in the voice user interfaces of systems such as «Smart House», mobile applications, and IT-based assistive systems.
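To illustrate what the associative-memory approach involves, here is a minimal sketch of a classical Hopfield network: Hebbian storage of bipolar patterns and synchronous sign updates that pull a corrupted probe back to the nearest stored pattern. The two 8-element ±1 vectors are toy stand-ins for, hypothetically, binarized feature vectors of two voice commands; the function names `train_hopfield` and `recall` are illustrative. This is a generic textbook construction, not the DIVA implementation, whose feature extraction and network parameters are not described here.

```python
from typing import List

Vector = List[int]
Matrix = List[List[float]]


def train_hopfield(patterns: List[Vector]) -> Matrix:
    """Build the Hebbian weight matrix W = (1/n) * sum_p x_p x_p^T with a zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w: Matrix, probe: Vector, max_steps: int = 10) -> Vector:
    """Synchronously apply s_i = sign(sum_j w_ij * s_j) until a fixed point or max_steps."""
    state = list(probe)
    for _ in range(max_steps):
        new_state = []
        for i, row in enumerate(w):
            h = sum(wij * sj for wij, sj in zip(row, state))  # local field of neuron i
            # Keep the current value when the local field is exactly zero.
            new_state.append(1 if h > 0 else -1 if h < 0 else state[i])
        if new_state == state:  # converged to a stored (or spurious) attractor
            break
        state = new_state
    return state


if __name__ == "__main__":
    # Two orthogonal toy patterns standing in for hypothetical command templates.
    command_a = [1, 1, 1, 1, -1, -1, -1, -1]
    command_b = [1, -1, 1, -1, 1, -1, 1, -1]
    weights = train_hopfield([command_a, command_b])
    noisy_a = [1, 1, 1, -1, -1, -1, -1, -1]  # command_a with element 3 flipped
    print(recall(weights, noisy_a) == command_a)  # prints True
```

With only two orthogonal patterns in eight neurons, a single flipped element is corrected in one update step. Storage capacity in such a network is small relative to its size, which is one reason associative memories can struggle as the number of commands grows.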
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Khorosheva, T., Novoseltseva, M., Geidarov, N., Krivosheev, N., Chernenko, S. (2019). Neural Network Control Interface of the Speaker Dependent Computer System «Deep Interactive Voice Assistant DIVA» to Help People with Speech Impairments. In: Abraham, A., Kovalev, S., Tarassov, V., Snasel, V., Sukhanov, A. (eds) Proceedings of the Third International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’18). IITI'18 2018. Advances in Intelligent Systems and Computing, vol 874. Springer, Cham. https://doi.org/10.1007/978-3-030-01818-4_44
DOI: https://doi.org/10.1007/978-3-030-01818-4_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01817-7
Online ISBN: 978-3-030-01818-4
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)