Introduction

The following reflections are written from a psycholinguistic point of view, with the general aim of finding a way out of the prevailing monological formulation of questions concerning language, cognition, and their relationship. This is done in concert with approaches in my own and related disciplines, in which a general critique of monological epistemology is becoming increasingly visible. Thus, O’Connell and Kowal (2003) convincingly demonstrate the shortcomings of monologism in psycholinguistics, while scholars such as Linell (1998), Weigand (2003), Marková (2006), and Grossen and Salazar Orvig (2006) are developing dialogical linguistics. In psychology, it is the theory of the dialogical self that provides theoretical support and fruitful empirical material for a dialogical view of human processes; in this field, that theory is concerned foremost with issues of self and identity (Hermans 2001a; Valsiner 2002; Hermans and Dimaggio 2004).

This short list is far from complete, but my aim here is not to give an overall picture of what could be termed dialogical science, with all its branches and variations and its historical rootsFootnote 1; after all, dialogism is not an invention of the twenty-first century. What is worth mentioning, however, is that disciplines dealing with human processes and activities are becoming increasingly aware of the limits of a research paradigm confined to a self-contained system that is assumed to be computable (Bertau 2004a). The need to open that system does not result in a mere addition of environment, context, or sociality. Rather, it amounts to a complete shift in which the system itself is no longer the center of attention; its relation to other entities or persons is. One could even say that it is the in-between that comes to the fore, always as a process of common activity. In this view, as Valsiner (2007) writes, singularity is seen in its underlying structure of plurality, which is found at different levels of organization of living beings (biological, psychological, social).

The shift mentioned is a real challenge in theoretical and methodological respects, ultimately affecting the way we conceive of ourselves and of the others we are investigating. This shift is by no means fully accomplished; so far, it has not led to a coherent, single dialogical science. Rather, its inner diversity, its difficulties and problems, even its aporias and pitfalls are vividly perceptible. One pitfall worth mentioning is taking dialogicality as a moral category, that is, assuming that everything dealing with dialogue is per se morally good, and that people in dialogue are related in a positive and good way.

Clot (2004), for instance, following Ponzio (1998), criticizes a moral reading of Bakhtin such as that of Clark and Holquist (1984) and concludes that dialogue is not an ideal to be achieved; moreover, it is not something one chooses to engage in. Rather, dialogue is an act one endures, as a result of an alterity that affects us and from which we cannot keep aloof. For Clot (2004), development consists in a movement from a primary, completely endured dialogue to a reversal of the relation to alter. This results in an appropriation whereby alter, and the dialogues with him/her, become a resource for the self. This understanding of alterity posits the relation to alter as given and imperative, a point of departure for developing dialogues. These dialogues are then seen in terms of the differences they generate rather than in terms of a communion to be attained.Footnote 2

Also perceivable in dialogical science is the obvious need to rethink core notions of psychological and linguistic processes, for instance the notion of internalization (Susswein et al. 2007; and the comment by Valsiner 2007). More generally, there is a need to coin or rework notions able to grasp the dialogical dynamics in question. Thus, the very term “voice” seems to be a central notion for both psychological and linguistic approaches. It crystallizes important assumptions of dialogicity, for instance the perspectivity, vividness, and dynamicity of processes in human beings, as well as the relatedness of socio-culturally situated persons.

Situated at the intersection of these diverse ideas and efforts, the present article explores the notion and the phenomenon of voice. This formulation already reveals the oscillation of the term between its metaphorical–conceptual uses and its literal use, in which “voice” appears as a concrete, perceivable phenomenon. This oscillation, as a genuine part of the term, will be developed first. Taking voice as a concrete, perceivable event (the point of view adopted here), I then describe the phenomenon in five key concepts. Special attention goes to interiorization as the passage from the interpersonal to the personal realm of activity and experience. This will be the moment to address the question of consciousness explicitly; until then, it runs through the topic more like a faint musical motif. Consciousness from the perspective of voice is a serious and challenging subject that requires careful consideration. At present, however, it can only be roughly sketched. In my opinion, this attempt is nonetheless worth the trouble, and the risk.

Variations of a Notion, or: The Readings of a Metaphor

Many current writers in the dialogical sciences refer to the work of Bakhtin. Indeed, the terms “polyphony” and the ensuing “voice” originate as root metaphors in Bakhtin’s work, namely in his analysis of Dostoevsky’s literary technique (Bakhtin 1984). It is interesting to observe how Bakhtin himself arrives at the term polyphony, accompanied by the notion of voice. According to Bakhtin (1984, p. 21), it was Komarovič who introduced the analogy to polyphony and to the counterpoint of the fugue. Komarovič used it to explain Dostoevsky’s technique of “harmonization of voices” as different voices of a fugue developing contrapuntally. In Bakhtin’s view, however, Komarovič did not properly grasp the essence of polyphony. It lies, first, in the autonomy of voices and, second, in the fact that this autonomy is unified into a “unity of a higher order than in homophony” (Bakhtin 1984, p. 21). In conclusion, Bakhtin stresses that the comparison with polyphony is only an illustrative analogy and that one has to bear in mind the metaphorical source of his term “polyphonous novel”. Thus, Bakhtin (1984, p. 22) uses “polyphony” and “voice” in an explicitly metaphorical way, stressing the difference between music and the novel.

Regarding this pronounced metaphorical understanding, I would like to point out that Western polyphonic music developed in the Middle Ages and the Renaissance within the vocal domain (e.g., from Gregorian chant to the Notre Dame school). Hence, speaking about polyphony and voice with regard to instrumental music is itself a metaphor. Bakhtin’s focus was literature. Bearing this in mind, his use of the musical metaphor ultimately reveals more affinity than difference between written texts and instrumental music, the affinity lying in a certain distance from bodily experience. Thus, the Bakhtinian metaphor speaks of an abstraction. It may well be this very use that allowed Bakhtin to think of voice not only in terms of acting and speaking characters but also in terms of consciousness. His metaphor applies to a quality of language as well as to the nature of consciousness.

It is from the notions of word, utterance, and answer that Bakhtin arrives at a conception of voice. In doing so, he describes the foundations of language as a dynamic structure of acts of answering: every utterance is an answer to preceding utterances, every act of comprehension is related to an attitude toward answering, and every utterance is produced in anticipation of an answer (Bakhtin 1986, pp. 69, 91). A voice seems to function as a carrier. Voice carries the speaking subject out of himself, decentering and orienting him toward the other(s) (both face to face and, in general, toward social others), supporting and leading the contact. What a voice carries and expresses at the same time is that the utterance is as much “mine” as “the other’s” (Bakhtin 1986, p. 89). The speaking subject matters only as a decentered and therefore “twofold” subject, endowed with a voice that carries at least two “tones and echoes” belonging to the uttered word. Voice carries the individual expression of contact with the other, which is always mingled with some alien components. It supports the necessary multiplicity that belongs to the living language in the word. Bakhtin’s point concerns the “individual manyfoldness of voices” (Bachtin 1979, pp. 157–159). In this multiplicity there is movement and life, both ideas serving as the counterpoint to monologism. The individual manyfoldness of voices is grounded in the social manyfoldness of speech (Bachtin 1979, p. 157). One can therefore assume that an individual voice is always manifold: it has a multiplicity of expressions, corresponding to the social language needed in the actual situation. This amounts to saying that, for Bakhtin, voice is not a completely individual phenomenon. On the contrary, it always transmits the typifying character of spoken national and social languages, and of genres.

For Bakhtin, a living language is not conceivable outside the dialogical movement. The same stance links consciousness with voice. In Dostoevsky’s characters, Bakhtin observes consciousness in continuous dialogues of voices, internal as well as external, representing “the whole person” and having a special “density” and “resilience” (Bakhtin 1984).

…for Bakhtin, dialogue is an expression of the essential characteristics of consciousness, which unite it with external, also dialogical existence […] it is the concrete psychological embodiment and measure of the social quality of consciousness. (Radzikhovskii 1986–1987, p. 18).

Voice carries consciousness; it manifests it. One voice, one consciousness “says” nothing; what is really needed is at least “two”: “A single voice ends nothing and resolves nothing. Two voices is the minimum for life, the minimum for existence” (Bakhtin 1984, p. 252). Thus, Bakhtin does not view consciousness in itself but seeks its existential characteristics, which he finds in the dialogue of voices. The social aspect of consciousness “consists in two minds addressing one another internally” (Radzikhovskii 1986–1987, p. 21). Because, for Bakhtin, all reality is interpersonal communication between voices, consciousness is a voiced internal dialogue.

In dialogical self theory, foremost in Hermans’ work, there is explicit reference to Bakhtin’s metaphor. It leads from the polyphonic novel to the polyphonic self, made up of I-positions that can be endowed with a voice. In therapeutic work and in research on identity construction, the notion of voice is closely related to processes of change, that is, to the development of new and different positions in the self. The spatialization of the self (Hermans 1996) allows for different positions simultaneously, and for movement between these positions. The I moves in this space, having the capacity to “endow each position with a voice” (Hermans 1996), thus establishing dialogical relationships between positions. Hence, what matters in “voice” is the process of giving voice and, through this, entering a process of change. Voice and position are the basic notions constructing the space of the self, its perspectivity, its stories, its coherence (see e.g., Raggatt 2006). Movements in the self are conceived either as centrifugal (multiplicity of positions, discontinuity and innovation, risk of fragmentation) or as centripetal, with emergent meta-positions (continuity and stability, risk of rigidity). These movements are in constant tension and complement each other (Hermans and Kempen 1993). The processes of voicing or silencing can be seen as carrying these movements. Therefore, in Hermans’ work, I would underline the generative character of voice as the basic feature of the concept. The extension from the texture of a novel to the texture of the self is a metaphorical one, serving first to construct a model of the nature and dynamicity of the self. Bakhtin shifts toward voice as a manifestation of consciousness, voice thus tending toward a concrete event. In the same way, the extension in Hermans serves, second, as a manifestation of the assumed I-positions in the self. This can clearly be seen in the Personal Position Repertoire (PPR), a method allowing access to a person’s repertoire of potential voices (Hermans 2001b).Footnote 3

Generally speaking, dialogical self theory takes the generative character of the notion of voice as a fruitful theoretical heuristic for understanding diverse psychological processes and entities as “voice”. The dialogical self is described as an entity made up of a multiplicity of parts, named “voices”, “positions”, or “characters”. Each refers to an agentic starting point for a message addressed to any person, or to another part of the self (Hermans and Dimaggio 2004).

Hence, the notion of voice relates not only to the possibility of movement and simultaneous multiplicity but also to the aspects of independence and agency. It is with these very notions that Stiles (1999; Stiles et al. 2004) is able to conceive of experience as embodied rather than as a merely cognitive representation. In the present context, it is interesting that Stiles’ use of the term “voice” shows a development from a metaphorical to a literal understanding. Departing from the heuristic metaphor of voice, Stiles (1999; Stiles et al. 2004) arrives at a literal understanding, in which the internal multiplicity can be externally heard and empirically analysed. In this research, voices are first seen as internal, developing as traces of lived experiences and incorporating expressive, experiential, and interpersonal elements (Osatuke et al. 2005). They then manifest externally. The authors start from the hypothesis that each of these manifesting voices sounds different from the others. They convincingly demonstrate that different voices can be identified within a person, each characterizable by a distinctive name and by a qualitative description of vocal and personality features (Osatuke et al. 2004, 2005).Footnote 4 Consequently, “voice” comes to be a clearly embodied entity, underscoring “the physicality of psychological self” (Osatuke et al. 2004, p. 252).

With their deliberately literal understanding of voice as a perceivable auditory event, Stiles and collaborators take a different path from Hermans. Hermans focuses on a person’s meaning-making process, that is, on the personal meanings a person associates with a certain I-position (Hermans 2001b). Nevertheless, the notion of voice is also of central importance to the PPR method, insofar as what matters is what participants utter. Here, voice is taken in its verbal dimension: “the words, concepts and interpretations from the participants are reported in their original formulations so that their voices can be heard as they want to be heard” (Hermans 2001b, p. 324). Osatuke et al. (2004) started from the sounding voice and also attended to the verbal content uttered. This allowed them to analyse discrepancies between the auditory and the verbal form of a voice, which in turn led to a highly differentiated description of the intrapersonal dynamics of voices. These authors identified shifts, blends, and mixtures between the voices of a speaking person.

Finally, stepping outside dialogical science, I will briefly sketch the quite different approach of Steels (2003). Situated in the field of robotics, Steels’ account of voice links theoretical reflections to empirical work in computer simulations and, not least, relates voice to consciousness. This work was done outside any Bakhtinian thinking or dialogical self model. Precisely for that reason, it could be a fruitful supplement to the understandings of “voice” discussed so far. In any case, Steels comes to conceive of voice as part of a putative intelligent machine. That very fact illustrates quite clearly what I alluded to above regarding the limits of self-contained systems: complexity can only be approximated by including some form of relatedness, or even alterity.

Steels starts from the observation that normal persons, including deaf people, report almost constantly hearing a silent voice when not overtly speaking. From there, Steels (2003, p. 174) introduces the notion of inner voice and links it closely to the self. He then asks about the conditions for producing an inner voice. He also turns to its functions by asking why it arose in evolution and what benefits it brings once established. His answers focus on two aspects, language and self, the latter leading Steels to reflect on consciousness.

Regarding language, the inner voice helps to explain the emergence of grammatical complexity, for which so-called re-entrance is the central mechanism. In short, a re-entrant system applied to language builds on the idea that speakers apply their own language processing system to their own utterances. According to Steels (2003, p. 182), this re-entrant mapping is “necessary for bootstrapping a lexical language into a grammatical one”. Once this re-entrant system is established, it can be used for purposes outside communication. First, it can be used to verbalize thoughts without communicating them, merely to test “how they sound”. This makes it possible to reflect on one’s own thoughts, which for Steels (2003, p. 183) is the foundation for the creation of a sense of self. Second, by using different bodies of grammatical conventions, one becomes able to simulate the way somebody else would speak: “As a consequence, we are able to simulate the voice of somebody else, and thus set up internal dialogs which explores different points of view” (Steels 2003, p. 183). Hence, processing language through a re-entrant system leads to a kind of detour, in which the means are used not for others but for oneself. If one follows Steels, it is precisely this detour from others that makes it possible to conceive both a self and an imagination of others, as two sides of the same coin.

Finally, two points should be mentioned. First, internal language, and therefore inner voice, is for Steels a side effect of being able to learn and use external language: what matters is social communication and its internalization. Second, the “inner voice mechanism” can be used to understand one’s own experiences. It enables a simulation of the thought processes of others and thus provides the foundation for complex social behavior (Steels 2003, p. 175). It should be noted here that simulation within external communication is found in children’s pretend role play, which is quite important for theory of mind (Lillard 2001, p. 20). Children’s role play is exactly that: enacting different roles and positions, often performed with different voices (Cook-Gumperz 1995; Miller and Garvey 1984).

Surveying the different usages of “voice”, one notices a mainly metaphorical use, accompanied, however, by a tendency to concretize the notion into a phenomenon. This phenomenon is then taken to be a manifestation either of consciousness (Bakhtin), of self-positions (Hermans), or of traces of experiences (Stiles). Thus, the metaphor shifts into literality, that is, in this case, into reality. The direction of this shift is worth noting: the idea seems to be that voice is a kind of envelope, giving a perceivable form to interior processes or entities.

My proposal, by contrast, is to reverse this conception and to turn to the literal source of the Bakhtinian musical metaphor, i.e., to foreground the human voice in polyphony. The point of departure for conceptualizing “voice” is then a concrete event taking place between people sharing a communicative practice. Questions concerning the transformation of this perceivable phenomenon into an imperceptible, personal, or interior experience follow from this. Here, I follow the same path as Steels (2003), who gives priority to external language use in different contexts of communication: communicating with others is what forms internal language and gives rise to an inner voice.

Description of Voice in Five Concepts

The following description integrates different approaches and is first organized around the concepts of indexicality, intonation, and body. These concepts pertain to voice as a product of development. They are supplemented by two further concepts, imitation and internalization, which account for voice as it develops ontogenetically. With regard to the above-mentioned transformation of the perceivable, social phenomenon into an imperceptible, interior process, the issue of internalization will be of special interest.

Indexicality

The term “indexicality” refers to the dependency of natural language utterances on context. This can involve various phenomena, e.g., regional accent, indicators of verbal etiquette, referential use of pronouns, demonstratives, deictics, and tense. Linguists investigate the verbal phenomena, implicitly treating these forms, mainly pronouns and deictics, as the paradigm of indexicality. But, as Laver (1975) writes:

just the fact of speaking and allowing the other participant to hear the sound of one’s voice, regardless of the actual content of the utterance, provides the listener with some information he needs to reach some initial conclusions about the psychosocial structuring of the interaction. (p. 221)

What Laver (1975) terms “phonetic behavior” is important to the participants when they construct a working consensus at the beginning of an interaction: the features of the voice serve as an orientation.

When a person speaks, he reveals often very detailed indexical information about his personal characteristics of regional origin, social status, personality, age, sex, state of health, mood, and a good deal more. […] As listeners, we infer these information from phonetic features such as voice quality, voice-dynamic features such as control of pitch, loudness and tempo, and from accent, as well as to some extent from features of linguistic choices made by the speaker. (Laver 1975, p. 221).Footnote 5

A distinctive mark of indexical expressions is their co-presence with what they stand for. Related to this is the fact that they give little or no description of their referents; rather, they function as links to their context, not as designators of objects and properties. Therefore, indexicals are closely associated with gestures such as pointing, showing, or handing over (Hanks 2001). Co-presence and the relation to gesture underline the fact that indexicals are anchored in a bodily dimension of language. Finally, the function of indexicals is to direct the addressee “to look, to listen, to take an object in hand” (Hanks 2001, p. 119). Precisely this embodied directing of the other is found in Karl Bühler’s (1990) theory of indexicals.

Bühler (1990) associates indexicals, as deictic words, with the so-called deictic field related to perception. Bühler differentiates four forms of deixis in the deictic field. Of these, the I-here deixis is singled out here for what it suggests about voice. Bühler (1990, p. 110) starts from two questions: “Who is there?”, posed behind a closed door, and “Where are you?”, posed in the dark. He then analyses the answers “I” and “here”. Bühler terms “I” an “individual signal” and “here” a “positional signal”. Identifying “the place or the person involved” is done on the basis of the sound, which reveals the individual character as well as the origin of what is expressed. So, for Bühler, the core function of the primal “here-word” is to direct the gaze to the position of the speaker. The primal “I-word” does more. It not only demands that the listener seek the speaker with the eyes but also urges the listener to aim at the speaker with what Bühler calls “a physiognomic gaze”. These ideas can be followed up in Bühler’s analysis of the pronouns “I” and “thou”. These pronouns index persons in a speech drama and therefore do not designate anything; in doing so, they individuate speakers and assign them a position.

The phonologically imprinted, formed structure ich (I) […]. resounds with the same phonological form from millions of mouths. It is only the vocal material, the auditory shape that individuates it, and that is the meaning of the answer I given by my visitor at the door: the phonematic impress, the linguistic formal factor in his I points out the vocal character to me, the questioner. (Bühler 1990, p. 129).

For Bühler, the indexicality of voice means turning toward the heard speaker with a “physiognomic gaze”: recognizing him/her as him/her at this specific time and place, in this particular position. Voice directs the other to an individual who is to be cognized and recognized. In Bühler, the very possibility of understanding uttered words is related to the positioning of the person.

With the concept of indexicality, one is immediately thrown into a context of time and space. It surrounds speakers/listeners who move toward one another and toward the indexed actions and objects in that time and space. But first of all, it is the person who shows him-/herself to the other as a particular one: in this, the indexical process of voicing is revealing. “Indexical claims” (Laver 1975) shape and constrain the detailed relationship of speakers/listeners. That is, we first have to show ourselves to and see each other before we can exchange any verbal content. The suggestion made here is that the uttered voice is an important index. Leaving aside social languages and genres, it first of all points to the immediate context: the time, the space, and the actual participants.Footnote 6

Intonation

Vološinov, one of the members of the so-called Bakhtin Circle, takes an explicit and radical stance on the sociality and addressivity of every utterance and every word. From this, he deduces the social essence of intonation. The concept of intonation developed below is based on his work.

Vološinov’s notion of language is grounded in the idea that the utterance is generated by an experienced extra-linguistic situation (Voloshinov 1981a, pp. 188–191). Thus, Vološinov (1981a, p. 190f.) approaches language from the outside, so to speak. He consistently stresses this approach, maintaining the links between verbal and extra-verbal parts. In this view, the word does not mirror the extra-verbal situation, nor is this situation to be thought of as an external cause of the utterance. Rather, the word accomplishes the situation and makes an evaluation of it. To the relation of utterance to situation, Vološinov adds the relation of utterance to listener. The utterance is therefore always directed to another, and this leads Vološinov (1981b, p. 298) to take the social and hierarchical relations between the interlocutors into consideration.

The notion of intonation is developed in the investigation of the form of the utterance. Vološinov distinguishes three fundamental elements organizing it: first, intonation, which is described as the “expressive timbre of a word”; second, the choice of word; third, the disposition of the word within the utterance. Intonation is emphasized in that it “first of all” relates the utterance to the situation and to the audience. Moreover, intonation plays the leading role in the construction of the utterance, i.e., the second and third aspects of form are built as a consequence of intonation (Voloshinov 1981b, p. 305). Intonation itself is determined by the situation and the audience. Vološinov (1981b, p. 305) explains intonation as the phonic expression of social evaluation. Thus, as speakers/listeners, we take an evaluative attitude toward the situation and toward one another, giving value accents that are ideologically shaped. Communicating is first of all the expression of a certain attitude that gives every utterance a certain accent. And attitudes form intonation, which is first an evaluation of the situation and of the audience (Voloshinov 1981b, p. 307). Intonation in turn calls for the adequate word and assigns this word a certain position in the utterance.

Vološinov sees the function of vocal intonation as similar to the carrying function developed in my reading of Bakhtin. Its flexibility and sensitivity facilitate its use and make it pervasive.

L’intonation joue le rôle d’un conducteur particulièrement souple et sensible au sein des rapports sociaux [...] (Voloshinov 1981b, p. 305) [intonation plays the role of a particularly flexible and sensitive conductor within social relationships].

Intonation is only understood when one is familiar with the implicit evaluations of the social group in question, be it a family, a social class, a nation, or an epoch. In accounting for the production of intonation, Vološinov draws on the addressivity of every utterance. If the speaker can presume a “chorus of support” in his/her audience, his/her intonation will be vivid, creative, rich in nuances, and self-confident. If, on the contrary, such support is lacking, “the voice will break”, and its richness in intonation will be reduced (Voloshinov 1981a). What emerges here is clearly the deeply social character of intonation, more precisely its dialogical character. Its features depend on the other as addressee, who can be actually or virtually present, as Vološinov notes.

Body

Stressing the anchoring of every language practice, however abstract it may be, in a common, lived, and shared experience, I arrive at “voice” from bodily experience. Voice is a central notion for dialogical psycholinguistics because it connects speech with the body and emanates from this body. The uttered voice shows, indexes, the uttering body, both as an individual (gender, social status, age, etc.) and as a position (sitting there, coming in front of someone). It also leaves the body as a medium of generalized, inter-individual signs that do not belong to any one person (see Bakhtin 1986; Vološinov 1986). Voice refers to a physical event that is never mere physics but always includes assigned meaning. This is meaning as related to verbal signs, and meaning as related to all embodied expressions of humans, which are themselves socio-culturally determined. So, “voice” is a vocal-auditory event, and it is a concept belonging to a certain socioculturally constructed way of expression. The uttered voice is absolutely individual, coming from a unique body. But this body is located in specific sociocultural contexts and has a history of actions, movements, labels, etc. The same holds for the voice. As with every human expression, the voice is individual and societal, both aspects being facets of one whole and standing in contrast to the “natural” (Vološinov 1986, p. 34).

In saying that the voice is a vocal-auditory event, I refer to the double-sidedness of voice perception. This may be one of the reasons why humans privilege voice as the medium of verbal communication. Voice is perceivable both by the speakers themselves, as proprioception, and by the listener; in that respect it differs from gaze. It is a concrete, sensuous event, a means to touch the other across space, and as such it encourages transposition and abstraction. This is my proposal for what happens in ontogenesis: the child moves from the voice of its mother as a bodily experience (analogous to her touching and handling) to her voice as a medium of signs. Meaning is always there, always socioculturally shaped, and, first of all, addressed. So, voice offers a meaningful structure insofar as it is directed toward somebody. Body and voice are inseparable: voice refers to the body it comes from, and the kind of body shapes the quality of the voice. Both are social and individual phenomena, manifesting the relationship and tensions between these two interdependent sides.

Marcel Mauss (1999) was the first to clearly acknowledge that nothing in our bodily expressions is natural. Rather, these expressions are specific to cultures and societies, and even to generations within societies. With the term techniques du corps (techniques of the body), Mauss refers to the ways humans use their bodies: how they hold themselves, how they move, lie down, sit, stand, walk, swim, etc., even how they breathe. Among the body techniques he enumerates, Mauss lists the techniques of giving birth, where, for instance, differences in handling the newborn are worth noting. All forms of touching and handling the infant are saturated with sociocultural meaning and are a means of transmitting these meanings. Voice plays an important role in raising children. In my opinion, it is not a technique in Mauss’ sense. It is, however, a necessary part of the techniques used with babies and children, in that it accompanies, structures, and gives rhythm to all the handling and touching of nursing. The voice stresses a certain quality of the caregiver’s action: slow, smooth, rapid, impatient, etc. It is therefore not surprising that almost all cultures have developed some form of so-called baby talk. Besides semantic and syntactic features that reduce complexity, what matters in it is the voice quality of the caregiver.

Body means orientation in space, wherein “space” is to be understood as socioculturally constructed and organized. In turn, orientation in space means position, and position means perspective: first of all an attitude toward the other and the world, developing from the techniques du corps, i.e., from the socioculturally meaningful ways one is held toward others and the world. An emotional-cognitive perspective is acquired together with a body position, from which things are seen in a certain way, certain stories can be told and certain feelings felt. So the position and its perspective, uttered in a voice, are closely related to early bodily experience shaped and formed by others. If one assumes that any perspective and its position uttered in a voice develop out of the relationship others express toward the self, one must include a pervasive affective attitude. Josephs’ (2002) claim for an emotional ground in voice is consistent with this reasoning.

In Bertau (2004a, b) I hypothesized that the voice of the caregiver supports and leads the development of the infant and child from diffuse social acts to clear mutual exchanges. Given the importance of this primary, embodied auditory–vocal event, coming from a certain person and addressed to another certain person, the development of voice in ontogenesis was captured in the model of phonicity. Building on early dialogical structures beyond verbal language between mother and infant (Bruner 1983), and equipped with important steps toward intersubjectivity (Rochat et al. 1999; Akhtar and Tomasello 1998), the child enters speech acquisition as an already dialogical being, oriented toward mutual and addressed exchanges (Lyra 2007). That is, the child’s developing verbal voice, his/her social speech, will manifest dialogical positions that were “offered” to him/her by the caregiver(s). According to the proposed model, development culminates in the stage of polyphonic dialogicity, in which the child takes up the mother’s multi-voicedness from the first, monophonic stage and realizes it both in the ability to imagine other perspectives and in the ability to enact them with voices.

Imitation

An important device in developing a voice is imitation. Imitation can be seen as a means of slipping into the other and his/her perspective. This ‘slip into’ is particularly interesting because it leads to an inside, made possible by the (apparently) specifically human form of intersubjectivity. Tomasello (1993) stresses the aspect of perspective taking that I have loosely termed ‘slip into’: “Joint attention is not just shared visual gaze but a true perspective taking.” (p. 176).

In another study, Call and Tomasello (1995) show that this form of learning is related to imitation, as opposed to what the authors call emulative learning and learning through mimicking, which they observed in different apes. Unlike emulative and mimicking learning, imitation is based on an understanding of the goals, i.e., the intentions, of others: the ability to understand others’ actions as goal-directed. Recently, Tomasello et al. (2005) have deepened this aspect on the basis of new empirical findings. Going beyond the assumption that humans understand the intentional actions and perceptions of others, the authors suggest “shared intentionality” as a key prerequisite of human cultural cognition. Thus, the ontogenetic pathway goes from dyadic engagement with shared emotions and behavior, through triadic engagement with shared goals and perceptions, to collaborative engagement with joint intention and attention. Notably, the authors assume “a special kind of shared motivation in truly collaborative activities” (p. 690); this motivation can be described as a desire directed toward the other, a strong drive that is ultimately responsible for uniquely human cognition:

Our proposal is that the uniquely human aspects of social cognition emerge only as uniquely human social motivation to interact with an emerging, primate-general understanding of animate and goal-directed action—which then transforms the general ape line of understanding action into the modern human line of shared intentionality. (Tomasello et al. 2005, p. 688; emphasis added).

Some further aspects of imitation can be added to conceive of the ‘slip into’ someone’s perspective. First, it should be stressed that imitation is not only quite frequent in adult–child talk; the frequency with which the adult imitates the child is also worth noting (see Blount 1972). That is, even in a single instance of child imitation, one can see a dialogue that proceeds as follows:

1. Child utters/vocalizes
2. Adult imitates child’s utterance/vocalization
3. Child imitates adult’s imitation of his/her own original utterance
4. Adult confirms child’s imitation as genuine utterance

Step (3) is especially interesting. The child, in imitating the adult model of his/her own first utterance, imitates or repeats him/herself, but at the same time both voices are present in step (3). Of course, the utterance (or vocalization) changes in quality from (1) to (4): it is shaped according to the criteria of relevance valid for the specific utterance situation. Hence, the notion of imitation is a strongly dialogical one: imitating is done in an incessant movement from one to the other, each one giving and taking parts of what is expressed and transforming it in the course of the movement. In speech acquisition, forms are established for the sake of inter-individual comprehension. It is not only this instructive function that matters, but also its bonding and carrying function, which is, of course, highly affect-laden.

In closing these remarks on the concept of imitation, it is proposed that the most powerful scaffold for alignment, for the child and for the adult alike, lies in repeating and imitating the voice quality of the other. Thus, the structure of dialogical turn-taking and the intonation of the mother’s voice function as supports by virtue of a concrete perceptibility (rhythm, prosody) that the infant can imitate. Children at a preverbal stage do indeed seem to make use of intonation in order to enter into speech and into specific speech acts such as questions and demands; it is in this connection that Bruner (1975) speaks of a “prosodic envelope”.

Recent research offers strong support for the position stressing the importance of concrete voice perception in psychological development. Castarède and Konopczynski (2005) take into account the speaking subject who has disappeared in “pure” linguistics and, with this subject, highlight the vocal relation between two voices. The research reported is mostly undertaken in clinical contexts from a psychoanalytic perspective, centered on the auditory–vocal exchanges between mother and very young infant.Footnote 7 Also noteworthy is the relation to intersubjectivity theory and to theory and research in music, both approaches being united in Trevarthen’s recent research (see Trevarthen and Gratier 2005; see also the special issue of Musicae Scientiae 1999–2000). This supports the aspect of voice stressed here as a real auditory–vocal event right at the beginning of development, and leads to the question of how this event is transformed into an internal process. The concept of internalization will provide a starting point in the attempt to answer this complex question.

Internalization

It was Vygotsky (e.g., 1978) who pointed to the social dimension of internalization, viewing internalization as founded in social processes. Vygotsky deduced that any so-called higher (culturally determined) mental function (such as remembering, attention, thinking) develops by internalization processes out of social interactions and is thus itself fundamentally social. The interactions with others are semiotically mediated, especially by language. What is internalized is the social relationship, a dynamic structure of otherness of a certain quality, mediated and at the same time shaped by language. Vygotsky does not assume that external and internal processes are copies of one another; rather, internalization transforms the social, inter-individual process itself and changes its structure and functions (Wertsch and Stone 1985, p. 167). In stressing the social, inter-individual origin of individual psychological processes, Vygotsky’s approach is quite close to dialogical theory: both employ a notion of alterity. For this reason, this approach will serve to develop the concept of internalization.

Keiler (2002) asserts that there are two versions of the notion of sociocultural development in Vygotsky: the first dating from 1928 to 1930, the second, a revision of the first, from 1931. In both versions internalization is a key concept, while the role of the other changes slightly. In the first version of Vygotsky’s theory of sociocultural development, the genesis of higher mental functions is accomplished in four stages. The first is the stage of natural psychology, followed by the stage of naive psychology, which is in turn followed by the stage of outer cultural method with signs that are only short-lived, leading to the fourth stage of inner activity. Here internalization takes place: the outer means (signs) are transformed into inner ones; they become “ingrown”: “The external means, so to speak, become ingrown or internal” (Vygotsky 1929, p. 426). This process corresponds to a qualitative transformation of “natural psychisms” into culturally determined higher mental functions. What makes this transformation possible is the fact that the child takes a “psychological attitude” toward him/herself and seeks to control his/her own behavior, including mental processes (e.g., attentional and remembering processes). However, what is not mentioned here is the role of the other in forming the child’s “psychological attitude”, and the other’s control over the child’s behavior.

Vygotsky’s notion so far is that of an organic process in which something grows in a certain way; in my view, this backgrounds the social aspect of internalization that Vygotsky comes to underline later on. Generalizing the four stages to any higher mental function, Vygotsky derives two main age levels in which the role of the other is hinted at, the focus nevertheless remaining on the child. First, there is a process from adult to child in which the child, by an act of synthesis, appropriates the originally distributed process. This unified process, however, remains psychologically “distributed”, in that it is carried out as if it were done by two persons. In doing this, the child is then able to “grow in”, that is, to move from the outer use of means to an inner one.Footnote 8 Here there is a structural similarity to the process of imitation as sketched above. In imitation, too, the child takes a behavior from the adult and performs it as his/her own. This is possible because the imitated behavior was originally the child’s own, imitated by an adult. So imitation is double-voiced, and so is internalization, for it brings together other and self in one person.

Keiler (2002) dates the revised version to autumn 1930, with Vygotsky’s lecture on psychological systems (Vygotsky 1997a). The development of higher mental functions is now understood not as a purely intrafunctional change, in which a function is transformed as such, but as a deep interfunctional one, in which the original relations between functions are transformed. From this, Vygotsky deduces his well-known claim that any higher mental function appears twice in development (Vygotsky 1978, p. 57). This leads to the general thesis that any higher mental function is originally a reciprocal, mutual process. Having described the child as taking a “psychological attitude” toward him/herself in the first version of his account of sociocultural development, Vygotsky speaks in late 1930 of the child taking the role of the mother toward him/herself and coming, through this role reversal, to control his/her own behavior (Keiler 2002, p. 197). So the turning around, so to speak, is now given a clear social origin.

In his work on paedology published in 1931 (Vygotsky 1998), Vygotsky heightens the social aspect in his second law of the cultural development of behavior, stating that the relations between higher mental functions are transferred social relations. So the higher functions are inner social relations, transferred into the personality but remaining deeply social. Language is of critical importance in this transfer; Vygotsky understands it as a means of influence, of acting on another and, by internalization, acting on oneself (Keiler 2002, p. 201). This is what permits Vygotsky (1998, p. 170) to write: “Through others, we become ourselves”; the development of the child involves the transformation of social relationships into mental functions (Vygotsky 1997b).

Even though Vygotsky depicts the aspects of internalization quite precisely, through organic metaphors and through the processes of “as if” and role change between mother and child, I would like to try to come a bit closer to the mechanism of internalization, which Wertsch and Stone (1985) characterize as “semiotic”. I propose to narrow “semiotic” to verbal signs, thus following Vygotsky’s acknowledgement of language as a privileged means of internalization, Vygotsky himself standing in the tradition of Hegel’s view of language as the tool of tools (see Keiler 2002, p. 188). So the point would be to come closer to the language process.

It is already clear that imitation and internalization are closely related. Both operate through an other with whom the self acts. This other, besides being a significant other to whom the self wishes to be related, is physically present: a mirror, or echo, and a shaper of the self’s actions and expressions. The presence of the other is thus active, not merely there but directed, addressed to the self, literally in touch with the self: by means of actual touch, of voice, or of both. And this is done in physical, reiterated patterns: forms giving themselves form in the growing mutuality of an adult and an infant (Lyra 2007). Imitation allows an exchange of forms of behavior and forms of expression, corresponding to a close give and take. Role change distinguishes more clearly between what the one and the other are doing, and therefore allows one to be the other for a moment, to integrate this other into the self. Again, voiced forms play an important role, giving the child indications about roles and their timing. Imitation, role change and as-if acting, which are all found in children’s play and in their imaginative dialogues, are devices in the process of moving from the outside to the inside, and between self and other.

Lillard (2001) points out that in pretend role play and in imaginary companion pretence “a child practices at being other people” and comes to experience, and thereby know, the other’s thoughts; thus, pretence is important for theory of mind. Further, in pretend play children construct a “decoupled world”, an operation by which representations are temporarily removed from their usual referents, also described as a “conceptual move”. I suggest that this move is akin to the one I posit, which leads to a reconstructed and transformed other in the self, still speaking, a resounding trace in memory and imagination. The basis of this move is a sensory experience of the other, and this is the reason for giving the actual voice, in which language is expressed and given to another, a specific status: it is a live form that acts as a carrier leading from outside to inside. This form is form and meaning, or: formed meaning, that is, meaning which manifests itself in form and is not detachable from it. Notably, for the process of moving inside, a precise (or adult) meaning need not be established for the form to function as a carrier. Indeed, Vygotsky (1987) underlines the contrast between adults’ and children’s concepts, which differ in meaning but seem the same because of their linguistic form; the child’s concept, the inner side of meaning, is still developing.

The suggestion is to see the experienced voice of a significant other as a mechanism of internalization. The specific intonations and the expressive, idiosyncratic style of the person, as manifested in his/her voice, give a specific flavor to what is internalized: this is individual as well as inter-individual, corresponding to the genres of speaking and intonating of the speech community. So what permits the movement from outside to inside is a meaningful, perceivable social form tied to a person. I understand the voice as this form, carrying the other into the self and the self into the other: a scaffold, a graspable, embodied and thus living materiality. This form offers a meaningful structure insofar as it is always turned toward somebody, and because it belongs to the inter-individual, interactional world in which it is rooted.

Both ways of having and giving meaning, in personally addressing and in being inter-individually rooted, are indissociable, on the assumption that the individual alone is non-existent and conceivable solely as a social being whose psyche and consciousness are socio-ideological facts (Vološinov 1986, pp. 12, 34). As Vološinov (1986, p. 22) writes, for the animal cry “the social atmosphere is irrelevant”; such a cry is bereft of any value accent. A voice, on the contrary, does depend on such an atmosphere and sets an ideological accent, thus belonging to the inter-individual realm.

In the course of development the voice as perceivable form is interiorized, and with it the attitude and perspective of the (social) person the voice belongs to. In the dialogical self, several voices exist on the basis of primary voice experience. Some may retain their relation to a specific person, some may be altered by such processes as condensation and displacement, by imagination and generalization (see Mead’s generalized other). A completely abstracted voice is then conceivable as a subject’s perspective and conceptual horizon—but the primary experience bound to the perception of a speaking person as present body is the necessary ground.

Pathway to Consciousness

Approaching language from the viewpoint of speaking and listening subjects, as opposed to “pure linguistics” with its focus on a detached system of structures and elements, leads to an understanding of language as perceived and experienced forms that take place in time and space. The voice of an uttering person is then conceivable as a meaningful, socioculturally shaped form functioning as a semiotic device. The five concepts developed above picture this notion. It is especially in trying to understand how a person gets from the perceivable phenomenon to an unperceivable, interior experience of a voice (with the possibility of its transformation into a position or a perspective) that I draw on its embodied aspect, its living materiality. As such, voice carries the other into the self and the self into the other; it functions as a carrier.

A closer look at this movement has highlighted the role of the other and of external communication in the development of interior experience and of the self. Communicating with others forms internal language and gives rise to what Steels (2003) calls an inner voice, including the possibility of simulating the other’s speaking in one’s imagination. Turning briefly to pretence play, it should be added that the simulation of others and of their speaking is observable in children’s socio-dramatic play (from around the age of 3 years onwards). Pretence role play may be the precursor of inner simulation, and is surely a field of practice both for language learning and for acting as if one were another. Lillard (2001) speaks in this context of a “decoupled world”. The term “decoupled other” is proposed here in order to grasp the kind of deviation from other-oriented communication that Steels (2003) described through the re-entrant system, namely a deviation leading to a use of verbal communication for oneself, which makes a conception of self and of other possible. To put it pointedly, I would say that abusing communication and the other (as role) leads to a sense of self.

As noted, Steels (2003) links the possibility of creating a sense of self to the establishment of a re-entrant system. For Steels (2003, pp. 183–184), the emerging self-model is not to be identified with consciousness, but it is “surely part of the conscious experience and therefore the development of complex language communication may have played a crucial role in the origins of consciousness”.

A kindred idea, anchoring self and consciousness firmly in social and communicative practices, and even more explicitly in alterity, is already found in Vygotsky’s (1999) elaboration of the link between consciousness and behavior. The notion of “social irritant” is critical here, referring to “irritants coming from people”; for Vygotsky (1999, p. 277), irritants of this kind stand apart from all others because

I, myself, can reproduce the same irritants and […] they become reversible for me very early, and hence determine my behavior in a different way from all others. They make me comparable to another and make my actions identical with one another. Indeed, in the broad sense, we can say that the source of social behavior and consciousness lays in speech.Footnote 9

Consequently, Vygotsky (1999, p. 278) derives self-awareness (self-consciousness) from other-awareness, which means above all taking one’s own reflexes as new irritants, as if they were not one’s own.

We are conscious of ourselves because we are aware of others […]. I am aware of myself only to the extent that I am as another for myself, i.e., only to the extent that I can perceive anew my own reflexes as new irritants.

From this sequence of inferences where social irritants are the source of a reversal leading to self-other formation and thus to self-consciousness, Vygotsky (1999, p. 279) deduces the idea that

consciousness is, as it were, social contact with oneself […]

In concert with Vygotsky’s understanding of speech as a social irritant, the significance of communicative practices, and especially of verbal communication, is to be highlighted as critical for the development of self-awareness, of a sense of self, and of consciousness. It is this aspect of language, as a semiotically mediated process taking place in perceivable forms between mutually related, meaning-negotiating persons, that matters. Nelson’s (2005) account of the development of a sense of self, and hence of a complex level of consciousness, points to the same idea.

First, drawing on Damasio (1999), Nelson (2005) proposes to conceive of consciousness not as unitary but as built up of discernible levels ranging from “simple awareness of feelings and perceptions to complex extended states of consciousness”, the last corresponding to a full self-knowing consciousness (Nelson 2005, p. 116). Nelson describes in detail the different levels developing in infants and children (physical, social, cognitive, representation/reflective, narrative, and cultural); of these, only the narrative level will be briefly singled out here, narrative consciousness being “the center of the action”.

Narratives are not simply to be understood as presentations of events. Rather, they involve personal perspectives and motivations, temporal and spatial locations, as well as evaluations; these aspects are invoked not least in children’s play as they begin to collaborate on imaginative scenarios (Nelson 2005, p. 134).Footnote 10 The new development is supported by (and manifested in), for instance, the children’s emerging capacity to listen with understanding, to follow storylines, and to contribute to the making of narratives about their own lives. Among the plethora of transformations in the child’s consciousness, Nelson names the following: an explicit awareness of the contrast between other people’s stories and one’s own experiential story; contrasts of past, present, and future with an explicit concept of temporal location, linked to temporal adverbs; a sense of a “continuing me”; and a theory of mind (Nelson 2005, p. 134). For Nelson, narrative consciousness realizes the potentials inherent in the previous cognitive and reflective level and even goes beyond them.

Most notable is the newly emerging sense of self that is situated in time. These two constructs—self and time—go together and are undoubtedly specifically human and made possible through symbolic capacities of human language used communicatively and cognitively. […] In this theory narrative, autobiographical memory (stories of the self) and self-concept are interdependent in development. (Nelson 2005, p. 134).

A sense of self emerging from communicative practices with social others leads to a quality or level of consciousness where possible worlds open up and where awareness of the cultural roles that people play emerges, finally leading, according to Nelson (2005), to cultural consciousness.

The significant step in the development into a self-aware, self-knowing and social self, that is, a self able to imagine the other’s perspective (which does not imply that this imagination is accurate; it is the fact of this imagination that matters), is related to the shift into an as-if. This corresponds to a temporary decoupling, seemingly non-instrumental and detached from direct practical needs: a decoupling with regard to usual referents as well as to external others. The suggestion is to locate the critical as-if in the decoupling concerning another person and her communication, experienced by the self in a certain way and through certain forms. Thus, in this sketch, it is the voice of a communicating person that is thought to be a powerful means for the emergence of an interior experience of self and other. And this experience is, in turn, linked to consciousness in its complex states. From this follows not only the complete “sociologising of all consciousness” (Vygotsky 1999, p. 278), but also its grounding in a vivid, material experience. As Vološinov (1986, p. 90) put it:

Outside objectification, outside embodiment in some particular material (the material of gesture, inner word, outcry), consciousness is a fiction.

Conclusion

Starting from a dialogical view of human communicative and cognitive processes, in which language is understood as the semiotic activity of mutually addressing persons (i.e., in its dynamic and material dimensions), the course of reflection finally led to a consideration of consciousness from the perspective of voice.

Significant moments of this consideration are external communicative practices between persons, and a movement of reversal performed with respect to these very practices and persons. The reversal is first found in Vygotsky’s (1929, 1997a, b, 1998) account of internalization, a movement through which the child becomes able to take the role of the other toward herself in order to control her own behavior. The social and interactional source of psychological functions is thus deduced, and with it the prominent role of speech: a semiotic means employed by another and then by myself, a “social irritant”, reversible early in ontogeny (Vygotsky 1999). Second, the reversal is found in the notion of a re-entrant system as introduced by Steels (2003), in which one’s own verbal productions are fed back into one’s own processing, that is, one takes one’s own productions as if they came from another. This movement has also been labeled a deviation and detour from the other, an abuse of communicative social practices within another realm of activity and experience, the interior one. Looking briefly at children’s pretence play, one could finally observe that the movement of reversal takes place (first?) when children “practice at being other people” (Lillard 2001). Lillard points to a decoupling process with regard to usual referents; in addition, a decoupling of the other was proposed, allowing for deliberate imaginative uses and abuses of the other as a social person by the self. Notably, pretence play is important in developing a theory of mind, a sense of the other’s interior world. Simulating the other leads to an understanding of his/her thoughts (Lillard 2001); similarly, at the interior level, the inner voice generated by the re-entrant system allows for the simulation of somebody else’s speech in order to explore different points of view (Steels 2003).

Hence, the reversal reveals a kind of paradox: it is deeply oriented toward another person while simultaneously disregarding this person. It means leaving communicative, goal-oriented, jointly negotiated social exchange while at the same time taking this shared world as a resource for a personal, interior, cognitive world. It is within this movement that an awareness of other and self develops, an interior experience related to a certain quality of consciousness: a sense of self, or self-consciousness (Vygotsky 1999; Steels 2003; Nelson 2005). Put this way, consciousness is a specific interior experience of self and other, and it is the voice of a communicating person which is thought to be a (privileged) means for the emergence of this experience. In this sense, consciousness is “social contact with oneself” (Vygotsky 1999). The self cannot bypass a speaking alter, not even (indeed, least of all) in developing and experiencing consciousness. And this experience remains a social one, even though it belongs to the personal realm.