Abstract
This paper presents an approach to achieving a fundamental understanding of unimodal and multimodal output and input representations, with the ultimate purpose of supporting the design of usable unimodal and multimodal human-human-system interaction (HHSI). The phrase ‘human-human-system interaction’ is preferred to the more common ‘human-computer interaction’ (HCI) because the former appears to provide a better model of our interaction with systems in the future, involving (i) more than one user, (ii) a complex networked system rather than a (desktop) ‘computer’, which in most applications may soon be a thing of the past, and (iii) a system which increasingly behaves as an equal to the human users (Bernsen, 2000). Whereas the enabling technologies for multimodal representation and exchange of information are growing rapidly, there is a lack of theoretical understanding of how to get from the requirements specification of some application of innovative interactive technology to a selection of input/output modalities which will optimise the usability and naturalness of interaction. Modality Theory is being developed to address this complex and, as it turns out, thorny problem, starting from what appears to be a simple and intuitively evident assumption: as long as we are in the dark about the nature of the elementary, or unimodal, modalities of which multimodal presentations must be composed, we do not really understand what multimodality is. To achieve at least part of the understanding needed, the following objectives should be pursued, defining the research agenda of Modality Theory (Bernsen, 1993):
(1) To establish an exhaustive taxonomy and systematic analysis of the unimodal modalities which go into the creation of multimodal output representations of information for HHSI.
(2) To establish an exhaustive taxonomy and systematic analysis of the unimodal modalities which go into the creation of multimodal input representations of information for HHSI. Together with Step (1) above, this will provide sound foundations for describing and analysing any particular system for interactive representation and exchange of information.
(3) To establish principles for how to legitimately combine different unimodal output modalities, input modalities, and input/output modalities for usable representation and exchange of information in HHSI.
(4) To develop a methodology for applying the results of Steps (1)–(3) above to the early design analysis of how to map from the requirements specification of some application to a usable selection of input/output modalities.
(5) To use the results in building, possibly automated, practical interaction design support tools.
References
Baber, C. & J. Noyes (Eds.). Interactive Speech Technology. London: Taylor & Francis, 1993.
Benoit, C., J.C. Martin, C. Pelachaud, L. Schomaker & B. Suhm. “Audio-Visual and Multimodal Speech Systems.” In: D. Gibbon (Ed.), Handbook of Standards and Resources for Spoken Language Systems – Supplement Volume. Kluwer, 2000.
Bernsen, N.O. “A research agenda for modality theory.” In: Cox, R., Petre, M., Brna, P., and Lee, J. (Eds.), Proceedings of the Workshop on Graphical Representations, Reasoning and Communication. World Conference on Artificial Intelligence in Education. Edinburgh, 1993: 43–46.
Bernsen, N.O. “Foundations of multimodal representations. A taxonomy of representational modalities.” Interacting with Computers 6, 4: 347–371, 1994.
Bernsen, N.O. “Why are analogue graphics and natural language both needed in HCI?” In: Paterno, F. (Ed.), Design, Specification and Verification of Interactive Systems. Proceedings of the Eurographics Workshop, Carrara, Italy. Focus on Computer Graphics. Springer Verlag, 1995: 235–251.
Bernsen, N.O. “Towards a tool for predicting speech functionality.” Speech Communication 23: 181–210, 1997.
Bernsen, N.O. “Natural human-human-system interaction.” In: Earnshaw, R., R. Guedj, A. van Dam & J. Vince (Eds.). Frontiers of Human-Centred Computing, On-Line Communities and Virtual Environments. Berlin: Springer Verlag, 2000.
Bernsen, N.O. & L. Dybkjær. “Working Paper on Speech Functionality.” Esprit Long-Term Research Project DISC Year 2 Deliverable D2.10. University of Southern Denmark. See www.disc2.dk, 1999a.
Bernsen, N.O. & L. Dybkjær. “A theory of speech in multimodal systems.” In: Dalsgaard, P., C.-H. Lee, P. Heisterkamp & R. Cole (Eds.). Proceedings of the ESCA Workshop on Interactive Dialogue in Multi-Modal Systems, Irsee, Germany. Bonn: European Speech Communication Association: 105–108, 1999b.
Bernsen, N.O., H. Dybkjær & L. Dybkjær. Designing Interactive Speech Systems. From First Ideas to User Testing. Springer Verlag, 1998.
Bernsen, N.O. & S. Lu. “A software demonstrator of modality theory.” In: Bastide, R. & P. Palanque (Eds.). Proceedings of DSV-IS’95: Second Eurographics Workshop on Design, Specification and Verification of Interactive Systems. Springer Verlag, 242–61, 1995.
Bernsen, N.O. & S. Verjans. “From task domain to human-computer interface. Exploring an information mapping methodology.” In: John Lee (Ed.). Intelligence and Multimodality in Multimedia Interfaces. Menlo Park, CA: AAAI Press. URL: http://www.aaai.org/Press/Books/Lee/lee.html, 1997.
Bertin, J. Semiology of Graphics: Diagrams, Networks, Maps. Trans. by J. Berg. Madison: The University of Wisconsin Press, 1983.
Bodart, F., A.M. Hennebert, J.-M. Leheureux, I. Provot, G. Zucchinetti & J. Vanderdonckt. “Key Activities for a Development Methodology of Interactive Applications.” In: Benyon, D. & P. Palanque (Eds.). Critical Issues in User Interface Systems Engineering. Springer Verlag, 1995.
Buxton, W. “Lexical and pragmatic considerations of input structures.” Computer Graphics 17, 1: 31–37, 1983.
Foley, J.D., V.L. Wallace & P. Chan. “The Human Factors of Graphic Interaction Techniques.” IEEE Computer Graphics and Applications 4, 11: 13–48, 1984.
Greenstein, J.S. & L.Y. Arnaut. “Input devices.” In: M. Helander (Ed.). Handbook of Human-Computer Interaction. Amsterdam: North-Holland, 495–519, 1988.
Holmes, N. Designer’s Guide to Creating Charts and Diagrams. New York: Watson-Guptill Publications, 1984.
Hovy, E. & Y. Arens. “When is a picture worth a thousand words? Allocation of modalities in multimedia communication.” Paper presented at the AAAI Symposium on Human-Computer Interfaces, Stanford, 1990.
Joslyn, C., C. Lewis & B. Domik. “Designing glyphs to exploit patterns in multidimensional data sets.” CHI’95 Conference Companion, 198–199, 1995.
Lenorovitz, D.R., M.D. Phillips, R.S. Ardrey & G.V. Kloster. “A taxonomic approach to characterizing human-computer interaction.” In: G. Salvendy (Ed.). Human-Computer Interaction. Amsterdam: Elsevier Science Publishers, 111–116, 1984.
Lockwood, A. Diagram. A visual survey of graphs, maps, charts and diagrams for the graphic designer. London: Studio Vista, 1969.
Lohse, G., N. Walker, K. Biolsi & H. Rueter. “Classifying graphical information.” Behaviour and Information Technology 10, 5: 419–436, 1991.
Luz, S. & Bernsen, N.O. “Interactive advice on the use of speech in multimodal systems design with SMALTO.” In: Ostermann, J., K.J. Ray Liu, J.Aa. Sorensen, E. Deprettere & W.B. Kleijn (Eds.). Proceedings of the Third IEEE Workshop on Multimedia Signal Processing, Elsinore, Denmark. Piscataway, NJ: IEEE, 489–494, 1999.
Mackinlay, J., S.K. Card & G.G. Robertson. “A semantic analysis of the design space of input devices.” Human-Computer Interaction 5: 145–90, 1990.
Mullet, K. & D.J. Schiano. “3D or not 3D: `More is better’ or `Less is more’?” CHI’95 Conference Companion, 174–175, 1995.
Rosch, E. “Principles of categorization.” In: Rosch, E. & B.B. Lloyd (Eds.). Cognition and Categorization. Hillsdale, NJ: Erlbaum, 1978.
SMALTO: http://disc.nis.sdu.dk/smalto/
Stenning, K. & J. Oberlander. “Reasoning with words, pictures and calculi: Computation versus justification.” In: Barwise, J., J.M. Gawron, G. Plotkin & S. Tutiya (Eds.). Situation Theory and Its Applications. Stanford, CA: CSLI, Vol. 2: 607–62, 1991.
Tufte, E.R. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press, 1983.
Tufte, E.R. Envisioning Information. Cheshire, CT: Graphics Press, 1990.
Twyman, M. “A schema for the study of graphic language.” In: Kolers, P., M. Wrolstad & H. Bouma (Eds.). Processing of Visual Language Vol. 1. New York: Plenum Press, 1979.
© 2002 Springer Science+Business Media Dordrecht
Bernsen, N.O. (2002). Multimodality in Language and Speech Systems — From Theory to Design Support Tool. In: Granström, B., House, D., Karlsson, I. (eds) Multimodality in Language and Speech Systems. Text, Speech and Language Technology, vol 19. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2367-1_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-6024-2
Online ISBN: 978-94-017-2367-1