
1 Introduction

“Perhaps the single quality most central to humanness is the ability to exchange thoughts, ideas, and feelings with others” [1, p. 235]. In this context, speech is seen as a person’s most important instrument for being in contact with their human surroundings [2]. However, individuals with severe physical disabilities or brain injury may not be able to control their oral-respiratory musculature sufficiently for speech [3, 4]. This restriction of verbal communication, and hence the “separation from the mainstream of society” [1, p. 235], also holds true for people with damage to the vocal tract or other impairments affecting speech [4]. In Germany alone, almost 7,000 of the 317,748 people registered as severely disabled in 2017 had a documented speech impediment [5]. In this respect, augmentative and alternative communication (AAC) systems have been introduced as a means to assist individuals with communication [1, 6, 7, 8]. An AAC system is defined as a technology that consists of a “group of components, including the symbols, aids, strategies, and techniques used by individuals to enhance communication” [9, p. 10]. In the past, “non-vocal” communication (unaided AAC), e.g., through the use of sign language or picture modes, was at the center of AAC research [6]. In recent decades, however, voice output communication aids (VOCA) have gained tremendous popularity due to rapid technological progress [1, 6, 8]. It has been shown that VOCAs have a positive impact on disabled people’s communication skills and development [7, 10, 11]. Against this background, the use of tablet-PCs for supporting AAC is considered particularly promising [8, 12]. First, AAC systems for tablet-PCs can be developed in a highly cost-effective and user-friendly way [8, 13]. Second, because the technology is widespread among the population, tablet-based AAC systems enjoy high social acceptance [8, 12, 13]. Third, tablet-PCs are more portable, more affordable and easier to use than many traditional AAC devices [12, 14, 15].

However, the development of AAC systems for tablet-PCs (e.g., [12]) is challenging [16]. For instance, it is often unclear to what extent current development frameworks can exploit the full capabilities of modern tablet devices [16]. Moreover, the availability of mobile technologies often leads to a premature and overly hasty introduction of AAC systems [17]. At the same time, it is widely acknowledged that simply providing an AAC system developed ad hoc will rarely result in effective communication for people with more complex communication needs [13]. Accordingly, a clear focus on the needs and wants of the user group is central to the successful implementation of a mobile or wearable AAC system [4, 17]. Typical users of such AAC systems are, on the one hand, persons with a speech impediment and, on the other hand, caregivers or therapists who configure the system. In this context, two major skill profiles of speech impaired persons need to be distinguished [18]. Persons with the profile “non-speaking” have sufficient speech comprehension, but their speech production is impaired [18]. Individuals with the profile “non-verbal” have both limited speech comprehension and limited speech production, often caused by aphasia and/or multiple disabilities [18]. AAC systems for these groups differ in their vocabulary (cf. [19]); e.g., composing a sentence from individual fragments requires distinct speech comprehension abilities. Hence, this approach is only appropriate for the profile “non-speaking” [20]. In contrast, systems that produce a whole sentence as output when a certain key is pushed are recommended for persons with the profile “non-verbal” [18].

In the work at hand, we focus on the profile “non-verbal” and pose the following research question (RQ): What can a tablet-PC-based, open source AAC system for “non-verbal” speech impaired persons look like that makes use of the current capabilities of the mobile device? Generally, this paper describes a first step in our long-term effort to identify how AAC applications on portable devices may impact the communication abilities of patients with speech impediments. Many existing applications in this field are not available on an open source basis, and they do not keep up with the progress of the software and hardware capabilities of tablet-PCs [16]. Accordingly, studies that focus on the functionalities of current app development environments (e.g., the iOS SDK) for creating innovative tablet-PC-based AAC systems are largely missing. The same holds true for works focusing on the applicability of open source components (e.g., freely available symbol libraries, translation services, etc.) for building AAC systems. We contribute to the generation of corresponding insights by implementing a running, open source “symbols to speech” prototype (cf. [12]) that aims at exploiting the current capabilities of an iPad device. In a first step, the prototype addresses “non-verbal” persons, because requirements can be specified more precisely for a homogeneous user group (cf. [12, 17]). Since the degree of disability varies within this group, we further narrow the focus to individuals who have (1) sufficient motor skills to select symbols on the tablet screen, (2) the visual abilities to recognize symbols and (3) the intellectual skills to handle the AAC device and to understand the semantics of symbols. The paper is structured as follows: in the next section, we provide theoretical foundations. Subsequently, the research methodology and the requirements for our prototype are presented. After the description of the development, the study concludes with a discussion and an outlook.

2 Foundations and Related Work

AAC has been a topic of lively discussion in research (e.g., [1, 8, 21]). In particular, the design and development of new AAC systems and instructional strategies rely on the use of commonly acknowledged theoretical constructs to best serve people with complex communication needs [21]. Important theoretical constructs in AAC research emerge from various scientific fields such as natural language processing, machine learning (e.g., [22]), language acquisition (e.g., [23]), communication or social interaction (e.g., [24]). Another essential feature of AAC research is the active participation of people who rely on AAC, since they provide evidence about the effectiveness of AAC systems and strategies [21]. In general, research in the area of AAC has a broad range and seeks to improve the lives of people with complex communication needs [21]. From a general perspective, the positive impact of VOCAs on the interaction between support personnel and disabled persons is shown by Schepis et al. [6] and Brady [25], amongst others. In this context, Ganz et al. [26] conduct a meta-study of 24 case studies dealing with aided AAC for persons with autism spectrum disorders and find indications of a strong effect of AAC on the targeted behavioral outcomes. A suggestion for a corresponding AAC system for children with autism is introduced by Sampath et al. [27], for example. Regarding the development of proprietary AAC systems that go along with a proprietary device (e.g., a handheld device), contributions are made by Allen [4], Pollak and Gallagher [2], Hornero et al. [28] and Francioli [29], for instance. Gonzales et al. [30] focus on the advantages and disadvantages of such proprietary AAC devices, whereas Baxter et al. [31] identify factors that impact the provision and usage of these systems. Additionally, a set of symbols to support people with complex communication needs – which may be referenced for the development of AAC systems – is introduced by Krüger and Berberian [32]. Apart from that, Apple devices are also actively discussed in the AAC literature (e.g., [8, 12]). For instance, Desai et al. [33] show that an iPad-based AAC system may increase the communication skills of individuals with cerebral palsy and autism. In contrast, Flores et al. [34] find that the communication skills of young people with autism do not necessarily increase when an iPad-based AAC system is used in comparison to alternative solutions. A summarizing overview of developments in the AAC discipline over the last decades is provided by Hourcade et al. [1].

In general, tablet-PC-based AAC applications can be classified as “symbols/pictures only”, “symbols/text to speech” and “text to speech only” solutions [12]. The “symbols/text to speech” solutions use a combination of symbols and/or keyboard capabilities to express sentences [12]. A subgroup of this category are dynamic “symbols to speech” apps, which allow the configuration of multiple patterns of pictures or characters that represent phrases or words [35]. Existing apps of this type differ regarding their “vocabulary strategy” [19]. In the case of a “1:1-correspondence” strategy, a certain key represents exactly one word (single word strategy) or a spoken phrase (phrase strategy) [19, 36]. “Semantic coding” strategies classify words according to their semantics [19]. Depending on a sequential selection of certain keys, a key may thus have various semantic meanings, which reduces the number of required keys on a device’s display [19, 36].
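To make the difference between these vocabulary strategies more tangible, the following Swift sketch contrasts a “1:1-correspondence” mapping with a simple “semantic coding” lookup. The key names and phrases are purely illustrative and are not taken from any of the systems cited above.

```swift
// Illustrative vocabulary strategies (hypothetical keys and phrases).

// 1:1 correspondence: every key maps to exactly one word or spoken phrase.
let phraseVocabulary: [String: String] = [
    "greeting": "Hello, nice to meet you.",
    "thirsty":  "I would like something to drink."
]

// Semantic coding: the spoken output depends on the *sequence* of selected keys,
// so a single key ("drink") can take on different meanings in context.
let semanticVocabulary: [[String]: String] = [
    ["I", "drink"]:        "I would like something to drink.",
    ["question", "drink"]: "Would you like something to drink?"
]

func spokenOutput(for keys: [String]) -> String? {
    if keys.count == 1 { return phraseVocabulary[keys[0]] }  // 1:1 correspondence
    return semanticVocabulary[keys]                          // semantic coding
}
```

In the semantic coding case, the “drink” key yields a different output depending on the key selected before it, which is how a single key can carry several meanings while keeping the number of keys on the raster screen small.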
While several apps available on the market are distributed with a predefined vocabulary, others need to be configured individually first. Examples of “symbols to speech” AAC systems are “MetaTalk”, “GoTalkNow”, “Quasselkiste”, “TouchSpeak” and “MultiFoXX 24”, amongst others. As a common denominator, commercial apps usually comprise features such as the visualization of a raster screen for speech symbols, the production of voice output (based on pre-defined texts), the availability of symbol libraries and the management of different raster screens. However, many of the software and hardware capabilities of modern tablet-PCs are not used by existing apps (cf. [16]). For instance, the software development kit for Apple’s iPad includes several frameworks, e.g., for voice output, home and internal process automation or machine learning (ML), that would be potentially useful for AAC system design. Besides these software aspects, most available apps do not fully exploit the hardware capabilities of current tablet-PCs either [16]; examples include precise localization via GPS data, the use of internet services or authorization via the fingerprint sensor. Moreover, an analysis of existing apps showed that persons with restricted visual and motor skills may easily produce operating errors when working with tablet-PCs, which they cannot reverse without further assistance (cf. [16]).

Against this background, the development of an easy-to-use, open source, iPad-based application for persons with speech impediments with the abovementioned characteristics (e.g., profile “non-verbal”; see introduction), which makes use of the current capabilities of the iPad and can be adapted to individual needs, is considered promising. Accordingly, we aim at complementing the abovementioned AAC research streams by offering insights on how current development environments and frameworks can be used to create an open source “symbols to speech” prototype that makes use of the contemporary software and hardware capabilities of an iPad (e.g., the use of context information). Thereby, beneficial results are created for iOS developers in the field of AAC systems (e.g., regarding the applicability of pre-defined algorithms).

3 Methodology and Research Design

To arrive at a prototype of our app called “BeMyVoice”, we conduct a Design Science Research (DSR) project [37,38,39,40] and follow the procedure of Peffers et al. [41].

Fig. 1. Procedure by Peffers et al. [41] adapted for this research.

The problem statement was formulated in the introduction (Step 1). In the second step (“Objectives of a Solution”), we derive requirements (e.g., [42]) with the help of literature, user stories, an analysis of existing AAC apps that match our research scope (“non-verbal” profile, “symbol to speech”, etc.) and interviews with the management of a German software company that focuses, amongst others, on the development of healthcare apps. Based on that, the requirements are prioritized to arrive at a manageable set for an initial prototype. The prototype is then designed and developed in Step 3. In this step, design-related decisions, e.g., regarding the layout of the GUI, have to be made; moreover, frameworks for technically realizing the prototype need to be selected. Subsequently, a demonstration of the prototype is performed (Step 4 – “Demonstration”). In the next step (Step 5 – “Evaluation”), the prototype will be subjected to a larger field study with therapists and speech impaired persons to assess its usefulness, applicability and usability (cf. [37]). Finally, the app will be revised and promoted as an “open source” app that can be used straight away and further developed to meet individual requirements (Step 6). The purpose of this research (cf. [40]) is to contribute to the knowledge base (cf. [37, 38]) on how iPad-based “symbols to speech” AAC systems can be used to support persons with a speech impediment.

4 Objectives of a Solution

As mentioned above, we strive for the development of an iPad-based “symbol to speech” AAC application, “BeMyVoice”, that targets speech impaired persons who have sufficient motor skills, intellectual aptitudes and visual abilities to recognize and select symbols on the tablet screen and to understand the semantics of these symbols. To arrive at requirements for a corresponding prototype (Fig. 1 – Step 2), we followed the suggestions of Schilling [42] for mobile app requirements engineering and emphasized the perspective of people who rely on AAC to communicate. Accordingly, a review of the market (e.g., “MetaTalk”, “GoTalkNow”, etc.) was performed first (cf. [42]) to derive common functionalities of mobile AAC apps (e.g., visualization via raster screens – see Sect. 2). Second, user stories (cf. [42]) of potential app users, called “Felix”, “Anita” and “Petra” hereinafter, were set up. Felix is characterized by a severe mental and physical disability caused by a lack of oxygen at birth. He was diagnosed with aphasia and a spasticity of both arms. He is unable to speak and can only produce sounds. The sheltered workshop he works at is located near his care facility. He uses a wheelchair, and the iPad with “BeMyVoice” is attached to it with the help of a mount.

Table 1. Requirements for the prototype.

Anita lives and works at a home for disabled people. She has a mental handicap and aphasia, both caused by a craniocerebral injury. Anita does not depend on a wheelchair and can hold the iPad in her hands. Petra is Anita’s and Felix’s therapist and configures the app for them. With the help of the user stories, the objects processed by the app could be itemized and the underlying logic specified more precisely (cf. [42]). Third, interviews (cf. [42]) with the management of a software company were performed. The firm has extensive experience in creating apps for education, fitness and healthcare, amongst others. Moreover, creating open source apps is a major principle of the company culture, and this expertise helped to identify additional requirements. Fourth, the insights acquired so far were complemented by suggestions derived from the literature on iPad-based AAC systems (e.g., [8, 33]). Finally, user journeys were specified for the aforementioned user groups to gain an understanding of potential usability problems when using the app (cf. [42]).
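As an illustration of how the objects processed by the app can be itemized, the following Swift sketch outlines a possible domain model derived from the user stories. The type and property names are hypothetical and do not necessarily correspond to the actual implementation of “BeMyVoice”.

```swift
// Hypothetical domain model sketched from the user stories.
struct SymbolButton {
    let symbolName: String     // identifier of a symbol from a symbol library
    let spokenPhrase: String   // pre-defined text used for voice output
}

struct RasterScreen {
    let title: String          // e.g., "emotions/moods"
    let rows: Int
    let columns: Int
    var buttons: [SymbolButton]
}

struct Folder {
    let name: String           // e.g., "professional activities"
    var screens: [RasterScreen]
}

struct UserProfile {
    let name: String           // e.g., "Felix" or "Anita"
    var folders: [Folder]
    var dockSuggestions: [SymbolButton]  // filled by the "intelligent suggestions"
}
```

A therapist such as Petra would then configure a UserProfile for Felix or Anita by assembling folders, raster screens and buttons that match their everyday situations.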

In summary, the “functional” and “non-functional” design requirements (DR) shown in Table 1 were defined. After a prioritization in cooperation with the management of the mentioned software company and an initial effort estimation, DR 1 to DR 7 as well as DR 12 to DR 15 were selected for realization in a first iteration of the DSR process (see Fig. 1).

5 Development and Demonstration

For realizing our app “BeMyVoice” (Fig. 1 – Step 3), the operating system iPadOS and the iOS SDK environment were chosen (https://developer.apple.com/). The programming language Swift (cf. [44]) was used for the implementation, SQLite as the database management system and Xcode 11 as the integrated development environment (IDE). The operating system functionalities and frameworks of the iPadOS Software Development Kit have the advantage of reducing implementation efforts for programmers, of being constantly updated and of having been broadly tested. Further, the user interface (UI) components are also used by the operating system for other applications, which contributes considerably to high usability. A first design of the UI matching the abovementioned requirements was drafted with the help of wireframes and was discussed and revised in interaction with the management of the software company. Figure 2 shows the wireframe resulting from these discussions as well as a simplified use case diagram highlighting the users of “BeMyVoice”, namely the speech impaired person and the therapist.
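To indicate how the chosen components fit together, the following sketch shows one possible way to persist raster screen data locally with SQLite via the SQLite3 C API that ships with iOS. The schema and file name are illustrative assumptions rather than the app’s actual data model.

```swift
import Foundation
import SQLite3

// Open (or create) a local database file in the app's documents directory.
let dbPath = FileManager.default
    .urls(for: .documentDirectory, in: .userDomainMask)[0]
    .appendingPathComponent("BeMyVoice.sqlite").path

var db: OpaquePointer?
if sqlite3_open(dbPath, &db) == SQLITE_OK {
    // Illustrative schema: raster screens belong to folders, buttons belong to screens.
    let schema = """
    CREATE TABLE IF NOT EXISTS raster_screen (
        id     INTEGER PRIMARY KEY AUTOINCREMENT,
        folder TEXT NOT NULL,
        title  TEXT NOT NULL
    );
    CREATE TABLE IF NOT EXISTS symbol_button (
        id            INTEGER PRIMARY KEY AUTOINCREMENT,
        screen_id     INTEGER REFERENCES raster_screen(id),
        symbol_name   TEXT NOT NULL,
        spoken_phrase TEXT NOT NULL
    );
    """
    sqlite3_exec(db, schema, nil, nil, nil)
}
sqlite3_close(db)
```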

For the symbols, the library of the Aragonese Centre for Augmentative & Alternative Communication (http://www.arasaac.org/) was used; it is freely available and well established in practice. The voice output functionality was realized with the help of the AVFoundation framework of the iOS SDK and the corresponding class AVSpeechSynthesizer. With the “guided access” functionality of the operating system, operating errors can be largely eliminated during the app’s use. For the implementation of the “intelligent suggestions” requirement, the Naïve Bayes (NB) algorithm was used (e.g., [45]). We did not apply the machine learning (ML) framework CoreML of the iOS SDK, because it primarily relies on pre-built machine learning models rather than on mechanisms that support model training during device usage.

The Google MLKit Translate API (https://developers.google.com/ml-kit/language/translation) was used to enable voice output in multiple languages. In this respect, considerations about data privacy played a major role during the implementation: with online translation APIs, providers could potentially gain profound insights into a user’s private life by analyzing “translation requests”. MLKit Translate, however, performs the translation offline, which means that all phrases and words are translated on the device rather than on a server. Besides this privacy-friendly approach, the API offers high-quality translations, which is why it was chosen for our prototype.
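For readers interested in the technical realization, a minimal sketch of the voice output with AVSpeechSynthesizer could look as follows; the example phrase, speech rate and language code are placeholders.

```swift
import AVFoundation

// Speak a pre-defined phrase associated with a "push to talk" button.
let synthesizer = AVSpeechSynthesizer()

func speak(_ phrase: String, language: String = "de-DE") {
    let utterance = AVSpeechUtterance(string: phrase)
    utterance.voice = AVSpeechSynthesisVoice(language: language)
    utterance.rate = AVSpeechUtteranceDefaultSpeechRate
    synthesizer.speak(utterance)
}

speak("Ich möchte etwas trinken.")
```

The on-device translation can be sketched in a similar way, assuming the GoogleMLKit/Translate pod has been added to the project; the language pair, download conditions and error handling are kept deliberately simple and may differ from the prototype’s actual implementation.

```swift
import MLKitTranslate

// Translate a stored German phrase to English on the device, then speak it.
let options = TranslatorOptions(sourceLanguage: .german, targetLanguage: .english)
let translator = Translator.translator(options: options)
let conditions = ModelDownloadConditions(allowsCellularAccess: false,
                                         allowsBackgroundDownloading: true)

translator.downloadModelIfNeeded(with: conditions) { error in
    guard error == nil else { return }
    translator.translate("Ich möchte etwas trinken.") { translatedText, error in
        if let translatedText = translatedText, error == nil {
            speak(translatedText, language: "en-US")   // reuses the function above
        }
    }
}
```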

Fig. 2. Partial results of the design stage.

On the left-hand side of Fig. 3, a screenshot of a sample raster screen with selected “push to talk” buttons is shown. The lower edge of the figure shows a “dock” with frequently used buttons as well as suggestions of buttons that might be used next, which are “intelligently” anticipated by “BeMyVoice”. The demonstration of the prototype (see Fig. 1 – Step 4) was carried out in cooperation with the management of the software company in the form of a workshop. To this end, raster screens were created for the abovementioned persons Felix and Anita by taking the perspective of the therapist Petra.
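The idea behind these “intelligent suggestions” can be illustrated with the following hedged sketch: with the previously selected button as the only feature, a Naïve Bayes classifier reduces to ranking candidate buttons by their (Laplace-smoothed) conditional frequency. The type and method names are hypothetical, and the actual feature set and training procedure in “BeMyVoice” may differ.

```swift
// Hypothetical next-button suggestion model.
struct SuggestionModel {
    // counts[previous][next] = how often `next` was selected right after `previous`
    private var counts: [String: [String: Int]] = [:]

    mutating func record(previous: String, next: String) {
        counts[previous, default: [:]][next, default: 0] += 1
    }

    func suggestions(after previous: String,
                     from candidates: [String],
                     top k: Int = 3) -> [String] {
        let observed = counts[previous] ?? [:]
        let total = observed.values.reduce(0, +)
        let scored = candidates.map { candidate -> (String, Double) in
            // Laplace smoothing so that unseen buttons still receive a small score.
            let p = Double(observed[candidate, default: 0] + 1) / Double(total + candidates.count)
            return (candidate, p)
        }
        return scored.sorted { $0.1 > $1.1 }.prefix(k).map { $0.0 }
    }
}

// Usage: record selections during app use, then fill the dock with suggestions.
var model = SuggestionModel()
model.record(previous: "I am hungry", next: "I would like a sandwich")
let dockSuggestions = model.suggestions(after: "I am hungry",
                                        from: ["I would like a sandwich",
                                               "I am tired",
                                               "Thank you"])
```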

Fig. 3. Screenshots of “BeMyVoice”.

Then, typical everyday scenarios that might be supported by the app (e.g., a visit with a family member, a talk with colleagues at work, leisure activities) were talked through. To cope with these situations, the following “folders” of raster screens were configured for Felix and Anita based on the results of aphasiological research (cf. [18]): “professional activities” (buttons required for the working environment), “emotions/moods” (e.g., “I am happy”), “friends/acquaintances/family” (information about important persons, e.g., “name”), “social etiquette” (greetings, polite expressions, etc.), “profile” (buttons for introducing oneself, etc.), “information related to one’s speech impediment” (e.g., “I can only speak with the help of this AAC device”, “please be patient”) and “leisure time” (buttons regarding leisure activities, e.g., “I’d like to paint”). In the workshop, there was agreement that the highlighted design requirements in Table 1 were fulfilled and that the general applicability of “BeMyVoice” for the targeted user group could be assumed. Moreover, several ideas for the further development of “BeMyVoice” came up (e.g., activation of the device’s flashlight in case of adverse lighting conditions), encouraging us to start a comprehensive evaluation of “BeMyVoice” in everyday scenarios.

6 Discussion and Implications

Most high-tech AAC technologies used to require tremendous learning efforts [2]. Thus, the use of widely accepted technical devices, such as the iPad, has recently been receiving increasing attention. In this project, we develop an iPad-based “symbol to speech” AAC prototype, “BeMyVoice”, which helps people with a “non-verbal” speech disability to improve their communication with their environment. Our research and the ongoing development of the app entail several benefits. First, we provide an open source prototype that can be further developed and enhanced by the community to meet individual demands. Accordingly, we provide a cost-effective solution that can be used straight away, complementing the market of commercial applications. Second, we contribute to the academic discussion of how technological devices that are used in people’s everyday lives (e.g., the iPad) can support AAC. More concretely, the demonstration provided us with first important findings regarding the app’s suitability for dealing with typical everyday scenarios. In addition, we obtained insights into the usefulness of the defined symbols and raster screens and validated the theoretical constructs used. These findings and their contribution to the underlying theories (see Sect. 2) are to be investigated further in upcoming steps. Additionally, the easy handling, portability and social acceptance of such technologies may foster the wide distribution of AAC solutions and increase the social contact of persons with a speech impairment. In this respect, many existing solutions for the iPad do not exploit its full software and hardware capabilities. Considering this, the study proposes a set of frameworks and functionalities of the iPadOS Software Development Kit that can be purposefully used for building modern tablet-based AAC software, along with propositions for the design of the GUI. Furthermore, restrictions also became evident, e.g., regarding the suitability of the CoreML package to support machine learning in AAC. Additionally, existing services – e.g., the Google MLKit Translate API – turned out to be suitable for building an iPad-based AAC solution. These insights constitute the novelty of our solution, considering that a corresponding gap in the literature can be observed. Third, our prototype may serve as a starting point for creating apps for speech impaired persons with other characteristics and “profiles” (e.g., “non-speaking”). In this way, the requirements for different types of speech disabled persons will become clearer.

Nevertheless, our research is also subject to restrictions. Although we have obtained promising results so far, a comprehensive evaluation is still an open issue. Moreover, we focus on a specific type of speech impaired person as outlined above. This is necessary in order to precisely define requirements matching the particular needs of that group and to avoid creating an “ad hoc” solution that may not meet expectations (e.g., [17]). Accordingly, our app targets “non-verbal” persons with the characteristics described in Sect. 1, and our insights primarily refer to this group. We used the iPad and its corresponding developer frameworks as the technological base, which limits the freedom of design due to platform-dependent restrictions. Further, only selected design requirements have been considered so far.

7 Conclusion and Outlook

In the work at hand, we introduce “BeMyVoice”, an iPad-based “symbol to speech” system targeting speech impaired persons who have sufficient motor, intellectual and visual skills to handle the app, to recognize symbols and to comprehend their semantics. The research provides beneficial insights for developers regarding the current capabilities of iPadOS and the iOS SDK as well as freely available services and components (e.g., symbol libraries) for realizing an iPad-based “symbols to speech” AAC system that draws upon the device’s present technological maturity. “BeMyVoice” is intended to be an open source app (i.e., the source code is freely available) that can be adapted on demand.

In the future, “BeMyVoice” will be subjected to a larger evaluation to obtain feedback on its applicability. Moreover, the app will be made accessible via GitHub to foster collaborative further development. Recently, the app passed Apple’s quality review, an obligatory step for publication, and has been made available for free via Apple’s App Store.