Abstract
Using speech in computer interaction is advantageous in many situations and more natural for the user. However, developing speech-enabled applications generally poses a considerable design challenge, regarding both how speech modalities are implemented and what the speech recognizer will understand.
In this paper we present the context of our work, describe the major challenges involved in using speech modalities, summarize our approach to speech interaction design and share experiences regarding our applications, their architecture and gathered insights.
In our approach we use a multimodal framework, responsible for the communication between modalities, and a generic speech modality that allows developers to quickly implement new speech-enabled applications.
As part of our methodology, in order to inform development, we consider two different applications, one targeting smartphones and the other tablets and home computers. Both adopt a multimodal architecture and provide different scenarios for testing the proposed speech modality.
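The framework described above builds on the W3C MMI architecture the authors cite, in which an interaction manager routes life-cycle events between modality components. The sketch below illustrates that routing pattern in Python; the class names, event names, and the synchronous dispatch are illustrative assumptions, not the authors' actual API.

```python
# Minimal sketch of MMI-style event routing between an interaction manager
# and a generic speech modality. All names here are hypothetical.
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class LifeCycleEvent:
    """Simplified stand-in for an MMI life-cycle event (e.g. StartRequest)."""
    name: str
    source: str
    target: str
    data: dict = field(default_factory=dict)


class InteractionManager:
    """Routes events between registered modality components."""

    def __init__(self) -> None:
        self._components: Dict[str, Callable[[LifeCycleEvent], None]] = {}

    def register(self, name: str, handler: Callable[[LifeCycleEvent], None]) -> None:
        self._components[name] = handler

    def dispatch(self, event: LifeCycleEvent) -> None:
        handler = self._components.get(event.target)
        if handler is not None:
            handler(event)


im = InteractionManager()
results = []


def speech_modality(event: LifeCycleEvent) -> None:
    # A real modality would invoke the recognizer asynchronously; here we
    # immediately report a canned recognition result back through the IM.
    if event.name == "StartRequest":
        im.dispatch(LifeCycleEvent("DoneNotification", "speech", "app",
                                   {"utterance": "call home"}))


def app(event: LifeCycleEvent) -> None:
    if event.name == "DoneNotification":
        results.append(event.data["utterance"])


im.register("speech", speech_modality)
im.register("app", app)
im.dispatch(LifeCycleEvent("StartRequest", "app", "speech"))
print(results[0])  # → call home
```

Because modalities only exchange events through the manager, a new application can reuse the generic speech modality unchanged, which is the decoupling the abstract emphasizes.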
© 2014 Springer International Publishing Switzerland
Cite this paper
Almeida, N., Silva, S., Teixeira, A. (2014). Design and Development of Speech Interaction: A Methodology. In: Kurosu, M. (eds) Human-Computer Interaction. Advanced Interaction Modalities and Techniques. HCI 2014. Lecture Notes in Computer Science, vol 8511. Springer, Cham. https://doi.org/10.1007/978-3-319-07230-2_36
Print ISBN: 978-3-319-07229-6
Online ISBN: 978-3-319-07230-2