1 Introduction

The development of Web 2.0 technologies has made the WWW a place where people constantly interact with each other, by posting information (e.g. on blogs or discussion forums), by modifying and enhancing other people’s contributions (e.g. in Wikipedia), and by sharing information (e.g., on Facebook and social news websites).

Unfortunately, these technologies are not easily accessible to sign language (SL) users, because they require the use of written language. Sign language videos, the obvious alternative, have two major problems: first, they are not anonymous, and second, they cannot be easily edited and reused in the way written texts can.

As recently verified by the proof-of-concept demonstrator of the Dicta-Sign project [1, 2], an advanced scenario that makes Web 2.0 interactions in sign language possible may incorporate real-time recognition and dynamic production of sign language, with sign utterances presented via a signing avatar (Fig. 1).

Fig. 1. Kinect input to sign representation

Within the Dicta-Sign sign-Wiki demonstrator environment, the end user had the option, among other functionalities, of both creating and viewing signed content by exploiting pre-existing lexical resources (Fig. 2) [3]. One of the options demonstrated was the possibility of slightly modifying the stored resources and previously created utterances in order to convey a new message. In all cases, stored and new content was presented to the user by means of a signing avatar [4].

Fig. 2. Sign building and representation environment in the sign-Wiki Dicta-Sign demonstrator

However much freedom a given system gives the user to create and view synthetic signing, signing avatar performance remains a challenging task within the domain of sign language technologies. In the rest of the paper, we discuss the creation and maintenance of sign language resources for synthetic signing, as well as the development of sign synthesis interfaces for non-expert users, in view of the implementation of synthetic signing in education and communication.

2 Language Resources for Sign Synthesis

For decades, video has been widely acknowledged among deaf individuals as the only option for transferring signed linguistic utterances. However, although video remains the only means of SL message transmission that preserves naturalness of expression, it imposes a number of serious restrictions on the on-the-fly composition of new “text” and the modification of previously created text. These actions are so common in everyday human communication and educational practice that it is considered trivial for written text to be copied, modified and reused, regardless of whether it was originally located on a web page or a local computer.

The limitations in composing, editing and reusing sign language utterances, as well as their consequences for Deaf education and communication, have been systematically reported in the SL studies literature since the second half of the 20th century. Researchers such as Stokoe [5] and, more recently, the HamNoSys team [6, 7] and Neidle [8] have proposed different systems for sign transcription in an attempt to provide a writing system for SLs in line with the systems available for oral languages [9]. However, the three-dimensional properties of SLs have prevented wide acceptance of such systems for incorporation by Deaf individuals into everyday practice.

In the lab environment, however, the possibility of phonetic decomposition of signs and the association of each component of sign articulation with a set of possible features enabled experimentation on synthetic signing performed by signing avatars. It is clear, therefore, that synthetic signing technology depends heavily on two equally significant parameters: (i) an effective sign synthesis engine and (ii) the availability of language resources adequate for sign synthesis.

2.1 Resource Creation and Maintenance

Availability of adequately annotated resources is a necessary condition for synthetic signing. However, creating resources for sign synthesis is not a trivial task, since the annotator must be aware of the formation of the reference sign and also able to verify his/her annotations by constantly viewing the synthetic performance of the stored annotations. To facilitate the creation and maintenance of lexical resources for synthetic signing, the Internet-based tool SiS-Builder was developed [10] on the basis of the architecture schematically depicted in Fig. 3.

Fig. 3. The SiS-Builder architecture

In the SiS-Builder environment, the author manually creates SL lemmas by providing phonetic transcriptions of signs. The tool accepts as input any sequence of HamNoSys notations and automatically transforms it into a corresponding SiGML script. The SiGML scripts generated in this way can be executed by the avatar, which presents the SL content.
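
As an illustration of this conversion step, the following minimal Python sketch wraps a HamNoSys transcription into a SiGML script of the kind an avatar can execute. The <sigml>, <hns_sign> and <hamnosys_manual> elements follow the publicly documented H-SiGML structure; the symbol names and the exact markup produced by SiS-Builder are assumptions for illustration only.

```python
# Minimal sketch: wrap a HamNoSys transcription into a SiGML script.
# The element names follow the documented H-SiGML structure; the exact
# markup emitted by SiS-Builder is an assumption here.
from xml.sax.saxutils import escape, quoteattr

def hamnosys_to_sigml(gloss, symbols):
    """Wrap a sequence of HamNoSys symbol names in a SiGML <hns_sign> block."""
    manual = "\n".join(f"      <{escape(s)}/>" for s in symbols)
    return (
        f"<sigml>\n"
        f"  <hns_sign gloss={quoteattr(gloss)}>\n"
        f"    <hamnosys_manual>\n"
        f"{manual}\n"
        f"    </hamnosys_manual>\n"
        f"  </hns_sign>\n"
        f"</sigml>"
    )

# Hypothetical transcription of a one-handed sign.
print(hamnosys_to_sigml("HELLO", ["hamflathand", "hamextfingeru", "hampalmu"]))
```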

In the course of its implementation, SiS-Builder was enriched with a number of functionalities that provide a complete environment for creating, editing, maintaining and testing lexical resources appropriately annotated for sign synthesis and animation. The environment currently allows for the assignment of both manual and non-manual features to signs, utilizing the HamNoSys notation system and a drop-down menu for non-manual features (Fig. 4).

Fig. 4. The SiS-Builder drop-down menu for the assignment of non-manual features to sign lemmas.

SiS-Builder enables multiple users to create and test their own data sets. It can accommodate video files of SL lemmas and associate them with a complex data structure that includes lemma coding for manual and non-manual articulation elements, visualization of the coded items via avatar performance, and conversion of HamNoSys to SiGML files and vice versa, allowing for easy coding of corrections and modifications where necessary. As regards Greek Sign Language (GSL) resources, SiS-Builder currently contains approximately 5,500 entries appropriately annotated for synthetic signing.
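
Purely for illustration (the actual SiS-Builder schema is not given here), a lexical entry of this kind can be pictured as a record along the following lines:

```python
# Illustrative sketch of the data associated with one lexical entry;
# the field names are assumptions, not the actual SiS-Builder schema.
from dataclasses import dataclass, field

@dataclass
class LemmaEntry:
    gloss: str                    # written-language label of the sign
    video_file: str = ""          # reference video of natural signing
    hamnosys: str = ""            # coding of manual articulation elements
    nonmanual: dict = field(default_factory=dict)   # e.g. {"eyebrows": "raised"}
    sigml: str = ""               # generated script executed by the avatar

entry = LemmaEntry(gloss="HOUSE", hamnosys="...")   # "..." stands for the real coding
```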

Furthermore, SiS-Builder has been subject to both classification and initial evaluation with respect to its accessibility features and use in the educational environment at European level [11], in the framework of the ENABLE network (http://www.i-enable.eu/).

2.2 On Signing Avatar Technology and Acceptance

Sign synthesis has developed by utilizing sign formation features in order to provide 3D synthetic representations of natural signing.

As far as the Deaf user communities are concerned, avatars were initially received with scepticism, since early synthetic signing could not match the naturalness of video. During the last decade, signing avatar technology has developed to the point of allowing the representation of more complex motion, including simultaneous articulation of the hands, the upper body and the head, as well as a number of facial expressions [12, 13]. These technological enhancements led to higher acceptance of synthetic signing engines among end users, which paved the way towards the development of complete editing environments for sign language.

The avatar currently used to perform GSL signs in the SiS-Builder environment is the signing avatar developed by the University of East Anglia (UEA). Some of the recent enhancements of this specific avatar were verified by the results of extensive testing and evaluation by end-user groups as part of the user evaluation processes that took place in the framework of the Dicta-Sign project [14].

Synthesis, in this context, is based on SiGML (Signing Gesture Markup Language), an XML language that draws heavily on HamNoSys transcriptions, which can be mapped directly to SiGML.

The SiGML animation system is primarily implemented through the JASigning software, which is supported on both Windows and Macintosh platforms. This is achieved by the use of Java with OpenGL for rendering, and compiled C++ native code for the Animgen component, which converts SiGML to conventional low-level data for 3D character animation. JASigning includes both Java applications and web applets for enabling virtual signing on web pages. Both are deployed from an Internet server using JNLP technology, which installs components automatically, but securely, on client systems.

3 Towards End User Sign Synthesis Interfaces

The accessibility of electronic content for Deaf WWW users is directly connected with the possibility of acquiring information that can be presented comprehensibly in their SL, and also with the ability to create new electronic content and to comment on, modify and reuse existing “text”. SL authoring tools, in general, belong to emerging technologies that are still subject to basic research and are thus not directly available to end users.

However, a few research efforts focus on enabling end users to view retrieved content in their SL by means of a signing virtual agent (i.e. an avatar), and also to compose messages in SL using simple interfaces that exploit adequately annotated lexical resources from an associated repository.

The basic architecture (Fig. 5) of this kind of tool associates written text, located for example on the Internet, with a database of appropriately annotated signs, so that the latter can be displayed via synthetic signing performed by an avatar.

Fig. 5. Association of written text on the Internet with a database of appropriately annotated SL lexicon presented by means of a signing avatar applet.

Unknown lexical content may be checked against the lexicon of signs by means of a simple input search box (Fig. 6). End users usually follow a “copy-paste” procedure to enter their queries. The task to be executed, however, is by no means trivial, since in many cases the morphological form of a word in a text differs considerably from the form associated with a headword in a common dictionary. Thus, the successful execution of a query demands an initial processing step invisible to the user. This step is the morphological decomposition of the selected word prior to its correct association with the corresponding lemma entry.

Fig. 6. SL lexicon search box for association of “unknown” words in a text with their equivalent signs.
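
A minimal sketch of this invisible pre-processing step follows, assuming a toy suffix list in place of a real morphological analyser (a full analyser would be required for a language such as Greek):

```python
# Toy sketch of morphological decomposition before lexicon lookup.
# A real system would use a proper morphological analyser for the written
# language in question; the suffix list below is purely illustrative.
SUFFIXES = ["ing", "ed", "es", "s"]

def to_citation_form(word, lexicon):
    w = word.lower()
    if w in lexicon:                       # already a headword
        return w
    for suffix in SUFFIXES:                # try stripping inflectional endings
        stem = w[: -len(suffix)]
        if w.endswith(suffix) and stem in lexicon:
            return stem
    return None                            # no matching lemma entry

print(to_citation_form("Walking", {"walk", "house", "sign"}))   # -> "walk"
```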

Nevertheless, such an application needs considerable infrastructure from the domain of Language Technology, which may overload a system that needs to provide real-time responses. This kind of Language Technology may not even be available for a specific language.

In order to overcome this problem, a number of approaches have been tested. Queries in electronic dictionaries may allow a degree of fuzziness in matching the string which the user enters in the search box against the available entries in the lexicon database. This is the case in the NOEMA multimedia dictionary for the language pair GSL-Greek, in which a search option in written Greek is incorporated in the environment [15]. The search result provides a list of candidate lemmas from which the user chooses the one he/she is interested in. A similar approach to search by keywords was implemented in the Dicta-Sign sign-Wiki environment (Fig. 7). Partial matching is also foreseen here, allowing for more than one search result and thus facilitating the interaction of Deaf users with the Wiki.

Fig. 7. Search via keywords and display of results in the Dicta-Sign sign-Wiki; the avatar performs the user’s choice from the list of results.
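
The actual matching algorithms of NOEMA and the sign-Wiki are not specified here, but a simple approximation of such a partial-match search can be sketched as follows:

```python
# Sketch of a partial-match keyword search over lexicon glosses, returning
# a ranked candidate list for the user to choose from. Glosses are assumed
# to be stored in lower case.
import difflib

def candidate_lemmas(query, glosses, n=5):
    q = query.lower()
    prefix = [g for g in glosses if g.startswith(q)]                 # partial matches
    fuzzy = difflib.get_close_matches(q, glosses, n=n, cutoff=0.6)   # near misses
    ranked = []
    for g in prefix + fuzzy:               # keep order, drop duplicates
        if g not in ranked:
            ranked.append(g)
    return ranked[:n]

print(candidate_lemmas("hous", ["house", "household", "horse", "sign"]))
```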

User interfaces popular especially with deaf end users suggested the use of a search mechanism based on the handshape (primary and/or secondary) in the lemma formation. This mechanism is present both in the NOEMA multimedia dictionary and in the DIOLKOS trilingual terminology environment GR/ENG/GSL [16] (Fig. 8).

Fig. 8. DIOLKOS: handshape-based retrieval mechanism
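
In essence, such a mechanism reduces to filtering the lexicon by the coded handshape features of each entry; a sketch with illustrative (not actual) handshape labels:

```python
# Sketch of handshape-based retrieval: filter lexicon entries by the
# primary (and optionally secondary) handshape of the lemma formation.
# The handshape labels below are illustrative, not the actual coding.
def by_handshape(entries, primary, secondary=None):
    return [
        e for e in entries
        if e["primary"] == primary
        and (secondary is None or e.get("secondary") == secondary)
    ]

lexicon = [
    {"gloss": "HOUSE", "primary": "flat", "secondary": "flat"},
    {"gloss": "WALK",  "primary": "index"},
]
print(by_handshape(lexicon, "flat"))   # -> the HOUSE entry
```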

Another approach to the search mechanism, more appropriate for video databases of signs, is often found in educational or e-government Internet-based applications. Here the user browses alphabetically ordered lists of written concepts and then chooses the appropriate lemma from those lists (Fig. 9). This method avoids the delay in system response caused by searching within the entire content of a video database [17].

Fig. 9. Bilingual terminology lists
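
The gain comes from replacing a scan of the video content with a jump into a pre-sorted list of written concepts, as in the following sketch:

```python
# Sketch of the alphabetical-list approach: the concept list is sorted once,
# offline, so a query jumps straight to the right position via binary search
# instead of scanning the entire video database.
import bisect

GLOSSES = sorted(["house", "school", "sign", "walk"])   # built once

def entries_from(prefix, n=10):
    i = bisect.bisect_left(GLOSSES, prefix)
    return GLOSSES[i : i + n]        # the user picks a lemma from this slice

print(entries_from("s"))             # -> ['school', 'sign', 'walk']
```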

As far as sign synthesis interfaces are concerned, the composition of synthetic signing phrases may facilitate communication over the Internet, and it can also be crucial for education and group work, since it allows direct participation and dynamic linguistic message composition, which is similar to what hearing individuals do when writing.

The composition of new synthetic phrases results from selecting the desired phrase components from a list of available, appropriately annotated lexical items.

End users interact with the system via a simple search-and-match interface to compose their desired phrases. Verifying user choices is important at every stage of this process, so that users can be certain about the content they are creating. When the structuring of the newly built signed phrase is completed (Fig. 10), the phrase is performed by a signing avatar for final verification. The user may choose to save, modify or delete each phrase she/he has built. Users may reuse the saved phrases, in whole or in part, depending on their communication needs.

Fig. 10. Synthetic phrase building components by selection from a list
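
Assuming each stored lexical item carries its own <hns_sign> block (as in the sketch in Sect. 2.1), phrase building reduces, at the script level, to concatenating the selected blocks into a single SiGML document:

```python
# Sketch of synthetic phrase building: concatenate the <hns_sign> blocks of
# the selected lexical items into one SiGML script that the avatar performs
# in sequence. The per-entry storage format is an assumption.
def build_phrase(selected_blocks):
    """Each element of selected_blocks is an <hns_sign>...</hns_sign> string."""
    return "<sigml>\n" + "\n".join(selected_blocks) + "\n</sigml>"
```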

4 Conclusion

The emerging technology of sign synthesis opens new perspectives with respect to the participation of the Deaf in Internet-based everyday activities, including access and retrieval of information and anonymous communication. Moreover, it is also radically changing user participation in educational practice, since it provides both the learner and the teacher who uses sign language with the option of creating, saving, modifying and reusing signed “text” as she/he wishes. Finally, synthetic signing may be incorporated in applications such as machine translation (MT) targeting sign languages, thus opening new perspectives as regards their potential and usability [18, 19].