1 Introduction

A theater play is a cooperative task in which actors express themselves through text, gestures, sound and movements based on stage directions and a story, usually in the form of a script. They are assisted by prompters and technical staff responsible for actors’ performances, set design, lighting, sound and music during rehearsals and performances. Some of these tasks have been taken over by technical installations and systems in research [13]. Takatsu et al. [15] proposed a system that detected speech and movement to support an actor’s performance within an interacting group. The authors described the rehearsal cuing system architecture and developed a study to show that the system helped actors by cuing the flow of events to each actor individually. An evaluation indicated the system might enhance actors’ theatrical performance practice.

Ronfard [12] developed a complete workflow to publish recorded video data and the corresponding metadata from performances and related rehearsals online as Open Data. Recent improvements in ultrasound imaging enabling new opportunities for hand pose detection using wearable devices were reported in McIntosh [11]. Jessop [7] created and evaluated the Gestural Media Framework for gestural control of media in rehearsal and performance recognizing continuous gesture and translating Laban Effort Notation into the realm of technological gesture analysis. In [14], the authors propose “Digital-Script”, a kind of stage directions that “focus on actor’s performance factors, standing position, head direction and timing of actions.” An implemented “theatrical practice support system detects user’s standing position and head direction. In the virtual space, a virtual actor is shown as he performs along with Digital-Script information”.

A crucial prerequisite for this, in addition to suitable recognition hardware and software, is the development of schemas for the standardized encoding of plays, orchestral scores, stage directions and scripts, so that the participants’ performance can be assessed. For the encoding of drama and records of parliamentary debates, the element <stage> (for stage directions) was adopted in the P5 Guidelines for Electronic Text Encoding and Interchange (TEI) [9, 16], and a formal transcription system for conversational gestures is described in [17].

According to Eco [2], isotopy, a concept first introduced by Greimas [5], has become “an umbrella term, a rather general notion that can allow for various more specific ones defining different textual phenomena”—even a gesture with a textual metaphorical interpretation. The recurrent occurrence of certain (clas)semes and subordinated lexemes in the text can be recognized and summarized as discursive isotopies together with their accompanying gestures described in the stage directions. Combining related discursive isotopies into a narrative strategy represents a reliable computer-based method for describing and quantifying literary constructs of interest.

To describe gestures, we adopt an approach followed by the authors in [8] providing a formal semantic analysis of coverbal iconic and deictic gestures with three important features based on the observation that speech and gesture together form a ‘single thought’. First, the content of language and gesture are represented jointly in the same logical reasoning. Second, rhetorical relations connect the content of coverbal gesture and synchronous speech. Third, language and gesture are interpreted jointly within an integrated architecture for linking the form and meaning of utterances. Linguistic models can be applied to nonlinguistic objects in a metaphorical interpretation, but they supplement the nonlinguistic objects rather than replacing them, as the authors argue. Trippel [17] proposes the formal transcription system CoGesT, a machine processable computational model for annotating a subset of conversational gestures. Source and target location; trajectory described in terms of direction, type of movement, shape of the body part, and change in body shape during the movement; and modifiers for speed and size are composed to create a gesture feature vector; a BNF-grammar for the CoGesT system is offered in [4].

As examples of gesture types and their semantic content, the authors mention iconic, metaphoric, deictic and emblematic gestures. Iconic, word-like gestures clarify the spatial and dynamic features of a concrete concept/object; the notion metaphoric refers to gestures that depict the content of an abstract idea through a physical action; deictic refers to pointing to a physically present or absent referent; emblematic gestures conform culturally and are universally understood by a community.

Section 2 deals with stage directions and proposes a classification of intended gestures and movements with emphasis on communicative gestures. Section 3 is devoted to gesture and body movement modeling based on relevant triangles, their trajectories and constraints combined with scene context in order to construct a feature state vector for motion description. In Sect. 4, M.-J. Hourantier’s version of a ritual theater play Orphée d’Afrique [6] is presented as a use case. Section 5 describes a cuing system for an automated structural approach to its implementation and function. Finally, Sect. 6 offers conclusions and discusses further work.

2 Intentional Gestures and Movements Classification

The scope and frequency of stage or production directions have varied over the history of the theatre, from near nonexistence to an important place in contemporary theatre [18]. Whereas in the dialogue the characters are speaking, in the didascalia the author himself directs, assigns positions to actors, has them speak or participate in a conversation and describes their gestures and actions. As Gallèpe [3] states, “Almost all of the didascalic indications concern bodily presence and its manifestations”; some describe the various locations. The didascalic mode is the most easily identifiable. First of all, it includes the stage directions formally inserted in multiple parts of the text and provides details about the body and its role in communication. The naming of the body in remarks made by the actors themselves is a second internal instruction mode [18]. The third mode is an implicit one: the body and its actions must be inferred from what the characters say or even from the tension existing between an intention and the content of the directions for obvious reasons of place and feasibility. Many didascalia are devoted to the indications concerning the personal façade of interacting characters. It seems preferable to focus exclusively on the didascalia dealing with notations of the gestures and positions of the bodies in the interactions: textual gestures concern the phonation of the oral text of the interaction; protextual gestures partially overlap pantomimic figurative or quasi-linguistic gestures; coverbal gestures constitute the nonverbal context of the spoken lines and correspond to the accompanying register; and contextual gestures denote activities that are not directly communicative but center on another action in relation to a concrete situation.

The method used by Gallèpe is as follows: he undertakes a methodical analysis of the didascalia of a number of German and French plays of various genres and periods constituting a basic corpus. For each of these 13 plays, the first hundred spoken lines were considered. In 171 stage directions, the body is explicitly mentioned; in 77 it is only involved. The other didascalia relate to the various locations. They are classified according to four criteria: unintentional gestures; gestures without intention of communication; intentional gestures of communication; undecidable. The assignment of the gestures to a role is clearly described in the vast majority of cases. The corporal signifiers allowing translation within the gestures vary and are left open to interpretation. The different registers with occurrences found in the didascalia are as follows: mimic (26), paraverbal utterance (23), gaze (19), static (7) and dynamic (29) posture, kinesics (20) (undifferentiated, hands, head, arm, torso, lower limbs), and proxemics (88). In this study, we decided to focus on the last four categories: static and dynamic posture, kinesics, and proxemics.

Greimas [5] differentiates between practical, mythical and communicative gestures. He mentions two lists covering behaviors such as (a) walking, running, sleeping, standing upright, etc. and (b) taking, giving, holding, pulling, pushing, etc. These two lists suggest the possibility of a very small inventory of simple and sufficiently general physical activities, comparable to the very limited number of phonemes that account for the totality of known articulations of natural languages. This sheds light on the parallelism between the gestural phenomena and the sememes covered by the verbs. In semantics, a sememe is a bundle of minimal semantic features (called semes) whose formal correspondent is the lexeme.

Given an inventory of simple natural behaviors, it seems possible to reduce the gestural substance to figures in the visual plane and thus to divide this gestural text into minimal units whose combinatorial composition produces gestural statements and gestural speech itself. Unlike Greimas, Cosnier [1] emphasizes the absence of linear segmentation and the combinatorics of elementary units: gestures, given their three-dimensional deployment, can associate, combine, and condense. For this reason, we restrict ourselves to a small sample of gestures examined in Sect. 5.

3 Gesture and Body Movement Modeling

Recognizing the gestures and movements of a group of actors requires a few preliminary considerations and preparations, including a formal description of the gestures and suitable software. The process is as follows. The actors performing on stage are identified and their positions are recorded. This makes it possible to judge whether each actor stays in the correct position in each scene. Actors’ body parts are also recorded so that their movements can be recognized by a machine-readable computational model that measures geometrical figures. The model translates the stage directions into a machine-processable form (classifying the gestures as accompanying, replacing, or explaining text), resulting in a script. With respect to a theater rehearsal, the script tags elements such as typical actors’ constellations (i.e., how the scenes have been blocked) and their speech, gestures and movements based on the TEI P5. A formal description of body gestures is essential to recognizing actors’ movements. The procedure is illustrated below.

  • Place a depth camera in front of a stage to detect the vertices of the actors’ body.

  • A spatiotemporal description of an actor’s movement as a sequence of body gestures is produced using the triangles Tk involved. In addition, a trajectory t can be constructed from the positions and directions of an individual p, the start/source position s and endpoint/target position e on a surface, and the timespan. Each position is described in an xyz coordinate system, and each actor’s endpoint position is checked to ensure that he or she is in the correct position at the end of each scene.

  • Spatial relations and constraints C rely on topological, directional, proximal or distance features and connectivity containment using contiguity predicates.
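The trajectory description in the procedure above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `Trajectory`, the function `ends_on_mark` and the tolerance of 0.3 (in the same unit as the stage coordinates) are hypothetical and do not come from the system described here.

```python
from dataclasses import dataclass
from math import dist

Point = tuple[float, float, float]  # (x, y, z) in stage coordinates


@dataclass
class Trajectory:
    """Hypothetical record of one actor's path during a scene."""
    actor: str
    samples: list[tuple[float, Point]]  # (time in seconds, position), time-ordered

    @property
    def start(self) -> Point:
        # Start/source position s
        return self.samples[0][1]

    @property
    def end(self) -> Point:
        # Endpoint/target position e
        return self.samples[-1][1]

    @property
    def timespan(self) -> float:
        return self.samples[-1][0] - self.samples[0][0]


def ends_on_mark(t: Trajectory, target: Point, tolerance: float = 0.3) -> bool:
    """True if the final position lies within `tolerance` of the scripted endpoint."""
    return dist(t.end, target) <= tolerance
```

A check at the end of a scene would then compare `t.end` with the scripted endpoint for each actor; the tolerance would have to be tuned to the sensor accuracy.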

A depth camera is used to detect the vertices of each actor’s head, neck, shoulders, elbows, hands, torso, femur balls, knees, and feet. Also, triangles constructed with vertices on the head, arms and legs, upper body, lower body, whole body, and three edges are measured to determine the normal vector pointing away from the body (cf. Table 1). Body gestures are modeled by moving the limbs and recording trajectories (cf. Table 2). For example, encounters between persons and communication can be described as distances of the corresponding body parts and directions of the triangles T1, T4, T5, T6, T7, e.g., extending arms, a handshake or a hug. Using the triangles and their normal vectors, trajectories tk(va), tk(vb), tk(vc), distances d(vk1, vk2) and gesture vectors f can be computed.
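As an illustration of how a normal vector and a vertex distance can be derived from three tracked vertices, the following minimal sketch uses the cross product of two triangle edges. The function names are hypothetical, and the orientation of the result depends on the order in which the vertices are passed, so a real system would need a fixed vertex order per triangle to make the vector point away from the body.

```python
import math


def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(u, v):
    """Cross product u x v of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def triangle_normal(va, vb, vc):
    """Unit normal of the triangle (va, vb, vc); sign follows the vertex order."""
    n = cross(sub(vb, va), sub(vc, va))
    length = math.sqrt(sum(c * c for c in n))
    if length == 0:
        raise ValueError("degenerate triangle: vertices are collinear")
    return tuple(c / length for c in n)


def vertex_distance(v1, v2):
    """Euclidean distance d(v1, v2) between two vertices."""
    return math.dist(v1, v2)
```

For a triangle lying flat on the floor with counter-clockwise vertices, the normal points straight up along the z-axis.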

Table 1. Examples of body movements and their descriptions using 24 vertices vk, 16 triangles Tk, and normal vectors nvk pointing outside (os). In the current version, T2, T3, T13 and T14 cannot be acquired with the system, but they can be relevant to individual movements, depending on the camera positions.
Table 2. Examples of body movements and their descriptions: a non-exhaustive list.

The advantage of using gesture modeling with triangles, vertices, and their normal vectors is that triangles always lie in a plane described by Hesse normal form \( nv_x (x - va_x) + nv_y (y - va_y) + nv_z (z - va_z) = 0 \). If one assumes the standard case of a plane stage, then it can be equipped with an orthogonal three-dimensional coordinate system (x, y, z). The actors move on the floor z = 0 following flat curves, and many constraints can be checked concerning distances or the position of a point with respect to a plane, circle or polygon by inserting the point coordinates into curve or surface equations.
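The constraint checks mentioned above can be illustrated with three small predicates. This is a sketch only: the function names are hypothetical, the circle and polygon tests ignore the z-coordinate (actors on the floor z = 0), and the polygon test uses the standard ray-casting technique, which is an assumption about how such a check could be done rather than a description of the implemented system.

```python
import math


def signed_plane_distance(p, va, nv):
    """Insert point p into the plane equation through va with normal nv.

    For a unit normal the result is the signed distance; it is positive on the
    side the normal points to and zero on the plane itself.
    """
    return sum(n * (pc - vc) for n, pc, vc in zip(nv, p, va))


def inside_circle(p, center, radius):
    """True if the floor projection of p lies within the circle."""
    return math.hypot(p[0] - center[0], p[1] - center[1]) <= radius


def inside_polygon(p, polygon):
    """Ray-casting test: is the floor projection of p inside the polygon?

    `polygon` is a list of (x, y) corners in order around the boundary.
    """
    x, y = p[0], p[1]
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the edge cross the horizontal line through p?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside
```

Points exactly on a polygon edge are not treated consistently by ray casting, so a tolerance band would be advisable for noisy sensor data.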

4 Gestures in Ritual Theater Play: A Use Case

The purpose of the second author’s dissertation [10], Werewere Liking: Ritual and Writing: The Postcolonial Subject and Its Discursive and Narrative Strategies, is to examine Liking’s five chant novels: A la rencontre de… (1980), Orphée Dafric (1981), Elle sera de Jaspe et de Corail (1983), L’Amour-cent-vies (1988) and La mémoire amputée (2004) from the angle of common research topics. In the novel Orphée Dafric, Orpheus goes through a series of exams and meditations (an individual initiation enriched with positive achievements), including abstract images that he tries to render concrete and effective in accomplishing an archetypal rebirth.

The object of the research is to find coherence and differences between the five novels and to establish the paradigm of human development from a person who is alienated, ill, searching, and decadent to one who is responsible, engaged, and harmonious, a member of a new mankind, in five different variations: the binary, archetypal, prophesized, liminal and transgenerational accomplishment of human character development. Semantic isotopies and narrative strategies are made visible in the process of a character’s metamorphosis during the phases of denunciation, perspective development and synthesis. Systematic analyses of 10 preselected isotopy groups based on the semantic breakdown of the ambiguities in the corresponding 34 lexemes enable the various developments of the main characters to be described and located at a higher level and distinguishable narrative strategies to be identified. (Clas)semes and subordinated lexemes are found by means of an encyclopedic definition; related citations are searched automatically and listed in a table, then contextualized, actualized or virtualized in relation to both main and minor characters. A similarity measure was developed which determines differences in how the main characters evolve [10, p. 435]. An initial exemplary analysis of the play Orphée d’Afrique [6] shows that, in addition to the genre-related differences from the novels, content and structural peculiarities also occur that require an adaptation of the chosen method and selected isotopies. While, as in the novel, the characters are in an initial state of unconsciousness, impotence, deprivation and isolation, other elements of initiation and development become decisive in the play. Formulated definitions, such as that of the term initiation, need to be adapted, and figure constellations and their dynamics must be rewritten.

As an extension, this approach analyzes the speech and movements of the actors in parallel using a recognition system to synchronize text isotopies and typical coverbal or mythical gestures: Stretch out, lie down, sit down, get up, take a few steps, climb stairs, stay motionless, leave or enter the scene, kneel down, bring something to the audience, take/hold something in hands, make several forms with the hands, perform ritual actions like baptism or passing within a circle or polygon (cf. Table 3).

Table 3. Actors, isotopies, accompanying movements, and gesture type.

The system localizes people, one near or above the other, alone or in pairs, at the top or in the middle of a group, and detects which of these patterns are in sync with text isotopies. Clear scripts and stage directions illustrating dialogues or concerning the gestures or movements of the actors, formally described and tagged, support the premise that this computerized approach could be successfully implemented.

5 System Design and Implementation

This section briefly explains how the cuing system has been designed, implemented and enhanced with gesture recognition.

5.1 Stage Directions

First, a standardized digital version of the play’s stage directions is made with the actors, with their accurately dimensioned body triangles in a basic position with respect to a standard rectangular flat stage. Then, actors’ gestures, movements, and actions are listed in a common timeline with accompanying text and instructions in the detailed standardized XML format.

5.2 Recognition of Actors’ Gestures

Actors’ gestures can be recognized as described in the following examples taken from Table 3.

For stair climbing, “Climb the seven steps to reach the attic, look at the sky”, the start and end points of the trajectory and the midpoint between the foot markers of triangle T1 lie along a straight line. The z-coordinate is checked against uniform increments from 0.5 to 6.5 times the height of a step. Finally, the normal vector of the head triangle T2 points diagonally upward.
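The stair-climbing check can be sketched roughly as follows. The function names, the straight-line tolerance `line_tol` and the reading of the z-check as "the levels (k + 0.5) times the step height for k = 0..6 have all been reached" are assumptions made for this illustration; the input is a time-ordered list of midpoints between the two foot vertices.

```python
import math


def foot_midpoint(left, right):
    """Midpoint between the left and right foot vertices."""
    return tuple((l + r) / 2 for l, r in zip(left, right))


def max_deviation_from_line(points):
    """Largest horizontal distance of any sample from the start-end line."""
    (x0, y0, _), (x1, y1, _) = points[0], points[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        # Start and end coincide: fall back to the distance from that point.
        return max(math.hypot(x - x0, y - y0) for x, y, _ in points)
    return max(abs((x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)) / length
               for x, y, _ in points)


def steps_climbed(points, step_height, n_steps=7):
    """Count how many of the levels (k + 0.5) * step_height have been reached."""
    reached = 0
    for _, _, z in points:
        while reached < n_steps and z >= (reached + 0.5) * step_height:
            reached += 1
    return reached


def looks_up(nv_head, min_z=0.2):
    """True if the head normal has a clearly positive z-component."""
    return nv_head[2] > min_z


def is_stair_climb(points, step_height, n_steps=7, line_tol=0.2):
    """Straight horizontal path and all step levels reached."""
    return (max_deviation_from_line(points) <= line_tol
            and steps_climbed(points, step_height, n_steps) == n_steps)
```

This sketch does not verify that the height increases monotonically; a stricter check could reject trajectories in which the actor steps back down.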

For “Spinning in a circle”, the trajectory of the feet vertices of T1 stays within the circle, and \( nv_x / \sqrt{nv_x^2 + nv_y^2} \), the cosine of the angle between the x-axis and the projection of the normal vector nv onto the xy-plane, varies from 1 over −1 back to 1. For “Lies down and turns on the bed”, the upper body triangle T5 is used together with the corresponding cosine computed from the y- and z-coordinates of the normal vector nv in the yz-plane. Normal vectors are calculated by the system.
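The cosine criterion for spinning can be illustrated with a short sketch. The function names and the tolerance `tol` are hypothetical, and the check that the feet stay inside the circle is omitted here. Note also that the cosine alone cannot distinguish a full turn from a half turn followed by turning back, so this is a necessary rather than a sufficient test.

```python
import math


def planar_cosine(nv):
    """Cosine of the angle between the x-axis and the xy-projection of nv."""
    norm = math.hypot(nv[0], nv[1])
    if norm == 0:
        raise ValueError("normal vector is vertical; heading is undefined")
    return nv[0] / norm


def completes_turn(normals, tol=0.1):
    """True if the cosine starts near 1, reaches about -1, and returns to about 1."""
    cosines = [planar_cosine(nv) for nv in normals]
    if not cosines or cosines[0] < 1 - tol:
        return False
    # Index of the first sample facing (approximately) the opposite direction.
    half = next((i for i, c in enumerate(cosines) if c <= -1 + tol), None)
    if half is None:
        return False
    return any(c >= 1 - tol for c in cosines[half + 1:])
```

The same pattern would apply to the lying-down example by replacing the x- and y-components with the y- and z-components of the normal vector.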

5.3 Implementation

The system has been implemented according to the design. It uses an Xtion PRO as a depth sensor and OpenNI to obtain the three-dimensional coordinates of each vertex. It calculates the normal vectors as described in Sect. 3 to recognize actors’ gestures. Each actor wears a smartwatch, and the server notifies the individual actor when he/she should start speaking or acting (Fig. 1).

Fig. 1. The cuing system.

An XML script file is stored in advance that includes the encoded text and the position coordinates indicating where actors should stay in each scene (Fig. 2). During a play, the system measures any positional gap between the coordinates in the script and the actors’ actual positions. It also compares the observed gestures with those in the script.

Fig. 2. An example XML script.
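Measuring the positional gap against a stored script can be sketched as follows. The XML layout used here (elements `play`, `scene`, `actor`, `position` and their attributes) is purely hypothetical and is not the schema shown in Fig. 2; only the idea of comparing scripted and observed coordinates is taken from the text.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical script fragment; the real schema may differ.
SCRIPT = """
<play>
  <scene n="1">
    <actor id="orphee"><position x="1.0" y="2.0" z="0.0"/></actor>
    <actor id="nyango"><position x="3.0" y="2.0" z="0.0"/></actor>
  </scene>
</play>
"""


def scripted_positions(xml_text, scene):
    """Map actor id -> scripted (x, y, z) for the given scene number."""
    root = ET.fromstring(xml_text)
    positions = {}
    for sc in root.iter("scene"):
        if sc.get("n") != scene:
            continue
        for actor in sc.iter("actor"):
            pos = actor.find("position")
            positions[actor.get("id")] = tuple(float(pos.get(k)) for k in "xyz")
    return positions


def positional_gaps(scripted, observed):
    """Distance between scripted and observed position for each tracked actor."""
    return {actor: math.dist(p, observed[actor])
            for actor, p in scripted.items() if actor in observed}
```

Actors that appear in the script but are not currently tracked are simply skipped; a cuing system might instead want to report them as missing.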

  (a) The system detects that actors’ positions are converging by investigating the two normal vectors nv directed against each other (Table 2).

  (b) The system detects that the distance between the actors’ right hands is decreasing (Fig. 3).

Fig. 3. In each set of images, the upper images show the actors captured by an RGB camera and the lower images show the body vertices, triangles T5, and normal vectors nv. The time axis runs from left to right.
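The two detection steps (a) and (b) can be illustrated with a minimal sketch. The function names and the threshold of −0.8 for the cosine between the two normal vectors are assumptions chosen for this example; the inputs are the upper-body normals of two actors and time-ordered positions of their right hands.

```python
import math


def facing_each_other(nv1, nv2, threshold=-0.8):
    """True if the two normals point in roughly opposite directions."""
    norm = math.hypot(*nv1) * math.hypot(*nv2)
    if norm == 0:
        return False  # undefined direction, treat as not facing
    dot = sum(a * b for a, b in zip(nv1, nv2))
    return dot / norm <= threshold


def hands_approaching(hands_a, hands_b, min_drop=0.0):
    """True if the hand-to-hand distance shrinks from each frame to the next."""
    d = [math.dist(a, b) for a, b in zip(hands_a, hands_b)]
    return len(d) >= 2 and all(later < earlier - min_drop
                               for earlier, later in zip(d, d[1:]))
```

Opposite normals alone do not guarantee that the actors face each other rather than stand back to back, so in practice this test would be combined with the relative positions of the two actors.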

The system also detects whether an actor is speaking or not to track the script automatically. Only the volume of their voices is used to avoid any influence from inaccuracy of speech recognition.
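A volume-only speech detector of the kind described can be sketched with a root-mean-square threshold per audio frame. The function names, the threshold of 0.05 and the assumption that samples are normalized to the range −1 to 1 are all hypothetical choices for this illustration.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one audio frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_speaking(frame, threshold=0.05):
    """True if the frame is loud enough to count as speech."""
    return rms(frame) >= threshold


def speech_onsets(frames, threshold=0.05):
    """Indices of frames where the actor switches from silent to speaking."""
    onsets = []
    previous = False
    for i, frame in enumerate(frames):
        current = is_speaking(frame, threshold)
        if current and not previous:
            onsets.append(i)
        previous = current
    return onsets
```

Each detected onset could be matched against the next expected utterance in the script to advance the cue position; short pauses within a line would need smoothing so that they are not counted as new onsets.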

6 Conclusions and Further Work

With the help of a speech- and gesture-recognition system for cuing actors’ utterances and gestures, we have shown that body gestures detailed in stage directions can be recognized and brought into a machine processable form. By combining semantic isotopies from an actor’s speech and his body gestures, an individual story is generated that can largely be diversified by actors’ spontaneous interactions.

However, a lot of practical work remains to be done to enhance the system. It should be extended to include full speech recognition, to detect and classify improvised actions and to deal with larger scale plays. With such a system, a database of collected descriptors of body gestures from a number of plays already recorded in the literature can be built, which is a prerequisite for investing in the implementation of a scene editor and player. Also, several open questions remain: How can the simultaneous actions of audiences be modeled? How should instructions concerning rhythm and facial expressions or behavioral gestures be described? What is the best way to describe complex actions like creating harmony or fighting effectively?

In addition, a research team of literary, media and computer science experts could undertake an analysis of multimodal interactions in theater plays, including musical performances and dance, based on patterns from primitives of a multimodal gesture and sound language. There is a reasonable presumption that semantic isotopy analysis can be profitably extended to various kinds of theater with a mixture of language, intonation, gesture and dance for a better description and understanding of multimodal utterances.