
Multimodality in Language and Speech Systems — From Theory to Design Support Tool


Part of the book series: Text, Speech and Language Technology (TLTB, volume 19)

Abstract

This paper presents an approach towards achieving a fundamental understanding of unimodal and multimodal output and input representations, with the ultimate purpose of supporting the design of usable unimodal and multimodal human-human-system interaction (HHSI). The phrase ‘human-human-system interaction’ is preferred to the more common ‘human-computer interaction’ (HCI) because the former appears to provide a better model of our interaction with systems in the future, involving (i) more than one user, (ii) a complex networked system rather than a (desktop) ‘computer’, which in most applications may soon be a thing of the past, and (iii) a system which increasingly behaves as an equal to the human users (Bernsen, 2000). Whereas the enabling technologies for multimodal representation and exchange of information are growing rapidly, there is a lack of theoretical understanding of how to get from the requirements specification of some application of innovative interactive technology to a selection of input/output modalities which will optimise the usability and naturalness of interaction. Modality Theory is being developed to address this complex and, as it turns out, thorny problem, starting from a simple and intuitively evident assumption: as long as we are in the dark about the nature of the elementary, or unimodal, modalities of which multimodal presentations must be composed, we do not really understand what multimodality is. To achieve at least part of the understanding needed, the following objectives should be pursued, defining the research agenda of Modality Theory (Bernsen, 1993):

  1. To establish an exhaustive taxonomy and systematic analysis of the unimodal modalities which go into the creation of multimodal output representations of information for HHSI.

  2. To establish an exhaustive taxonomy and systematic analysis of the unimodal modalities which go into the creation of multimodal input representations of information for HHSI. Together with Step (1) above, this will provide sound foundations for describing and analysing any particular system for interactive representation and exchange of information.

  3. To establish principles for how to legitimately combine different unimodal output modalities, input modalities, and input/output modalities for usable representation and exchange of information in HHSI.

  4. To develop a methodology for applying the results of Steps (1)–(3) above to the early design analysis of how to map from the requirements specification of some application to a usable selection of input/output modalities.

  5. To use results in building, possibly automated, practical interaction design support tools.
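The taxonomy-and-combination agenda above can be sketched in code. The following is a minimal, hypothetical Python sketch: the four property dimensions (linguistic, analogue, arbitrary, static) and the perceptual media are taken from Bernsen (1994), but the class layout, the example modalities, and the `channel_conflicts` rule are illustrative inventions for this sketch, not the theory's actual combination principles.

```python
from dataclasses import dataclass

# Toy descriptors for unimodal modalities. The property dimensions follow
# Bernsen (1994); everything else here is an illustrative simplification.

@dataclass(frozen=True)
class Modality:
    name: str
    medium: str        # 'graphics' | 'acoustics' | 'haptics'
    linguistic: bool
    analogue: bool
    arbitrary: bool
    static: bool

SPOKEN_LANGUAGE = Modality("spoken language", "acoustics", True, False, False, False)
WRITTEN_TEXT = Modality("written text", "graphics", True, False, False, True)
MAP = Modality("static analogue map", "graphics", False, True, False, True)

def channel_conflicts(presentation):
    """Toy combination check (not a rule from Modality Theory): flag pairs
    of dynamic (non-static) modalities competing for the same medium."""
    dynamic = [m for m in presentation if not m.static]
    conflicts = []
    for i, a in enumerate(dynamic):
        for b in dynamic[i + 1:]:
            if a.medium == b.medium:
                conflicts.append((a.name, b.name))
    return conflicts

# A speech-plus-text-plus-map presentation raises no conflict under this rule:
print(channel_conflicts([SPOKEN_LANGUAGE, WRITTEN_TEXT, MAP]))  # -> []
```

A design support tool in the spirit of Step (5) would replace the single toy rule with the full set of combination principles from Step (3), evaluated against the application's requirements specification.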


References

  • Baber, C. & J. Noyes (Eds.). Interactive Speech Technology. London: Taylor & Francis, 1993.

  • Benoit, C., J.C. Martin, C. Pelachaud, L. Schomaker & B. Suhm. “Audio-Visual and Multimodal Speech Systems.” In: D. Gibbon (Ed.), Handbook of Standards and Resources for Spoken Language Systems - Supplement Volume. Kluwer, 2000.

  • Bernsen, N.O. “A research agenda for modality theory.” In: Cox, R., Petre, M., Brna, P. & Lee, J. (Eds.), Proceedings of the Workshop on Graphical Representations, Reasoning and Communication, World Conference on Artificial Intelligence in Education. Edinburgh, 1993: 43–46.

  • Bernsen, N.O. “Foundations of multimodal representations. A taxonomy of representational modalities.” Interacting with Computers 6, 4: 347–71, 1994.

  • Bernsen, N.O. “Why are analogue graphics and natural language both needed in HCI?” In: Paterno, F. (Ed.), Design, Specification and Verification of Interactive Systems. Proceedings of the Eurographics Workshop, Carrara, Italy. Focus on Computer Graphics. Springer Verlag: 235–51, 1995.

  • Bernsen, N.O. “Towards a tool for predicting speech functionality.” Speech Communication 23: 181–210, 1997.

  • Bernsen, N.O. “Natural human-human-system interaction.” In: Earnshaw, R., R. Guedj, A. van Dam & J. Vince (Eds.). Frontiers of Human-Centred Computing, On-Line Communities and Virtual Environments. Berlin: Springer Verlag, 2000.

  • Bernsen, N.O. & L. Dybkjær. “Working Paper on Speech Functionality.” Esprit Long-Term Research Project DISC Year 2 Deliverable D2.10. University of Southern Denmark. See www.disc2.dk, 1999a.

  • Bernsen, N.O. & L. Dybkjær. “A theory of speech in multimodal systems.” In: Dalsgaard, P., C.-H. Lee, P. Heisterkamp & R. Cole (Eds.). Proceedings of the ESCA Workshop on Interactive Dialogue in Multi-Modal Systems, Irsee, Germany. Bonn: European Speech Communication Association: 105–108, 1999b.

  • Bernsen, N.O., H. Dybkjær & L. Dybkjær. Designing Interactive Speech Systems. From First Ideas to User Testing. Springer Verlag, 1998.

  • Bernsen, N.O. & S. Lu. “A software demonstrator of modality theory.” In: Bastide, R. & P. Palanque (Eds.). Proceedings of DSV-IS’95: Second Eurographics Workshop on Design, Specification and Verification of Interactive Systems. Springer Verlag: 242–61, 1995.

  • Bernsen, N.O. & S. Verjans. “From task domain to human-computer interface. Exploring an information mapping methodology.” In: John Lee (Ed.), Intelligence and Multimodality in Multimedia Interfaces. Menlo Park, CA: AAAI Press. URL: http://www.aaai.org/Press/Books/Lee/lee.html, 1997.

  • Bertin, J. Semiology of Graphics: Diagrams, Networks, Maps. Trans. by J. Berg. Madison: The University of Wisconsin Press, 1983.

  • Bodart, F., A.M. Hennebert, J.-M. Leheureux, I. Provot, G. Zucchinetti & J. Vanderdonckt. “Key Activities for a Development Methodology of Interactive Applications.” In: Benyon, D. & P. Palanque (Eds.). Critical Issues in User Interface Systems Engineering. Springer Verlag, 1995.

  • Buxton, W. “Lexical and pragmatic considerations of input structures.” Computer Graphics 17, 1: 31–37, 1983.

  • Foley, J.D., V.L. Wallace & P. Chan. “The Human Factors of Graphic Interaction Techniques.” IEEE Computer Graphics and Applications 4, 11: 13–48, 1984.

  • Greenstein, J.S. & L.Y. Arnaut. “Input devices.” In: M. Helander (Ed.). Handbook of Human-Computer Interaction. Amsterdam: North-Holland: 495–519, 1988.

  • Holmes, N. Designer’s Guide to Creating Charts and Diagrams. New York: Watson-Guptill Publications, 1984.

  • Hovy, E. & Y. Arens. “When is a picture worth a thousand words? Allocation of modalities in multimedia communication.” Paper presented at the AAAI Symposium on Human-Computer Interfaces, Stanford, 1990.

  • Joslyn, C., C. Lewis & B. Domik. “Designing glyphs to exploit patterns in multidimensional data sets.” CHI’95 Conference Companion: 198–199, 1995.

  • Lenorovitz, D.R., M.D. Phillips, R.S. Ardrey & G.V. Kloster. “A taxonomic approach to characterizing human-computer interaction.” In: G. Salvendy (Ed.). Human-Computer Interaction. Amsterdam: Elsevier Science Publishers: 111–116, 1984.

  • Lockwood, A. Diagram: A visual survey of graphs, maps, charts and diagrams for the graphic designer. London: Studio Vista, 1969.

  • Lohse, G., N. Walker, K. Biolsi & H. Rueter. “Classifying graphical information.” Behaviour and Information Technology 10, 5: 419–36, 1991.

  • Luz, S. & N.O. Bernsen. “Interactive advice on the use of speech in multimodal systems design with SMALTO.” In: Ostermann, J., K.J. Ray Liu, J.Aa. Sorensen, E. Deprettere & W.B. Kleijn (Eds.). Proceedings of the Third IEEE Workshop on Multimedia Signal Processing, Elsinore, Denmark. Piscataway, NJ: IEEE: 489–494, 1999.

  • Mackinlay, J., S.K. Card & G.G. Robertson. “A semantic analysis of the design space of input devices.” Human-Computer Interaction 5: 145–90, 1990.

  • Mullet, K. & D.J. Schiano. “3D or not 3D: ‘More is better’ or ‘Less is more’?” CHI’95 Conference Companion: 174–175, 1995.

  • Rosch, E. “Principles of categorization.” In: Rosch, E. & B.B. Lloyd (Eds.). Cognition and Categorization. Hillsdale, NJ: Erlbaum, 1978.

  • SMALTO: http://disc.nis.sdu.dk/smalto/

  • Stenning, K. & J. Oberlander. “Reasoning with words, pictures and calculi: Computation versus justification.” In: Barwise, J., J.M. Gawron, G. Plotkin & S. Tutiya (Eds.). Situation Theory and Its Applications, Vol. 2. Stanford, CA: CSLI: 607–62, 1991.

  • Tufte, E.R. The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press, 1983.

  • Tufte, E.R. Envisioning Information. Cheshire, CT: Graphics Press, 1990.

  • Twyman, M. “A schema for the study of graphic language.” In: Kolers, P., M. Wrolstad & H. Bouma (Eds.). Processing of Visual Language, Vol. 1. New York: Plenum Press, 1979.


Copyright information

© 2002 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Bernsen, N.O. (2002). Multimodality in Language and Speech Systems — From Theory to Design Support Tool. In: Granström, B., House, D., Karlsson, I. (eds) Multimodality in Language and Speech Systems. Text, Speech and Language Technology, vol 19. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2367-1_6

  • DOI: https://doi.org/10.1007/978-94-017-2367-1_6

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-6024-2

  • Online ISBN: 978-94-017-2367-1

  • eBook Packages: Springer Book Archive
