Multimedia learning environments have become ubiquitous means for supporting knowledge acquisition. Cognitive theories of multimedia learning like the Cognitive Theory of Multimedia Learning (CTML, Mayer 2001) provide empirically validated guidelines for the design of multimedia messages that have been mainly formulated against the background of cognitive information-processing models. Only recently have there been attempts to integrate also socio-motivational aspects into CTML (Mayer et al. 2003). These augmentations will be used in the following to analyze the impact of varying specific aspects concerning the presentation format of verbal explanations.

Designing verbal explanations: cognitive and social-motivational considerations

Research on multimedia learning is mainly concerned with the questions of how to best design verbal and pictorial representations and how to combine them most effectively. For example, according to the modality principle, dynamic visualizations that need verbal explanations are best presented as animations with auditory text (narrated animations) instead of written text (Moreno and Mayer 1999). This principle can be explained by purely cognitive assumptions and without taking into account social-motivational considerations, for instance, by assuming that auditory text offloads the visual processing channel and allows for attending to pictorial and verbal information in parallel. However, when using narration instead of written text, the question arises which kind of voice should be used. Lately, questions on how to design verbal representations beyond their presentation modality have been addressed in design principles for multimedia messages that rely on a socially enhanced view, for instance, the voice principle (Mayer et al. 2003).

According to the voice principle, people will learn better if dynamic visualizations are accompanied by narrations that are presented either with a standard accent rather than a foreign accent voice or with a human rather than a machine-synthesized voice (Atkinson et al. 2004; Mayer et al. 2003). Mayer and colleagues (Moreno and Mayer 2000; Mayer et al. 2003) explain their findings by introducing a theory that goes beyond CTML, namely the so-called Social Agency Theory (SAT). SAT is linked to CTML and suggests that people apply social rules to media, which, in turn, influences learning. The theory postulates five successive steps when learning from multimedia messages that contain social features.

First, it is assumed that stimuli like voices or pictures of a speaker in a multimedia message can act as social cues. According to this assumption, a human voice provides a stronger social cue than a machine-synthesized voice.

Second, due to social cues, learners interpret the multimedia message as a kind of social communication rather than a pure information delivery.

Third, the interpretation of the multimedia learning scenario as a social communication situation leads to the activation of social conversation schemas. That is, social rules of human-to-human communication are applied to the human–computer interaction. SAT adopts this assumption from the media equation theory (Reeves and Nass 1996), which contends that persons tend to behave towards media as they behave towards humans (“media equal real life”, Reeves and Nass 1996, p. 5). The research based on media equation theory has mainly investigated how human–computer interactions influence affective and social-motivational variables. In SAT (Mayer et al. 2003), the assumptions of media equation theory are linked to cognitive aspects of learning as expressed in steps four and five.

Fourth, following a core social rule of conversation, namely Grice’s cooperative principle (Grice 1975), learners assume that the speaker is trying to convey a meaningful message and therefore, in turn, try to make sense of the spoken words. Social cues thus result in learners being more motivated and investing more effort to understand the spoken words. Accordingly, they engage in deeper cognitive processing of the instructional materials.

Fifth, as postulated by the CTML, the result of such deeper cognitive processing is a more meaningful mental representation, which is reflected in better transfer test scores.

While this explanation of the voice principle offered by the social agency theory seems plausible at first glance, it is subject to a couple of open questions, which are discussed in the next section.

The voice principle revisited: methodological drawbacks and consequences for future research

There are at least three methodological drawbacks with regard to the existing research on SAT. First, most studies that rely on SAT as a theoretical background have so far failed to provide support for the complex causal chain that is assumed by the theory. In particular, there is hardly any evidence for the assumption that motivation/effort mediates the relation between the presence of a social cue and learning outcomes.

Second, the experimental variation of the social cue is ambiguous. In investigating the voice principle, Mayer et al. (2003) compared only two voices to each other. Thus, the voices may have differed on more than just one dimension. For instance, when comparing only one standard accent voice with one foreign accent voice, it is virtually impossible to keep other voice features such as pitch or intonation constant.

Third, based on the current empirical data, it is still unclear whether SAT is really needed to explain the superiority of standard accent human voices over foreign accent human voices and machine-synthesized voices. Mayer et al. (2003) themselves admit that this pattern of results can, in principle, also be explained by Cognitive Load Theory (CLT, Sweller et al. 1998). According to the CLT explanation, processing a human standard accent voice might impose fewer cognitive demands on learners than processing other voices, thereby leaving more cognitive resources for deeper processing of the instructional materials. However, Mayer et al. (2003) did not measure cognitive load, so this alternative explanation could not be ruled out.

From these three methodological drawbacks, different consequences for future research can be drawn. First, an assessment of additional variables should be included, particularly with regard to potential mediators like cognitive load, effort, and motivation. Second, multiple voices should be presented within each experimental condition to ensure that potential differences can be unambiguously traced back to the experimental variation and have not been caused by other speaker characteristics. Third, in order to investigate whether it is necessary to augment a purely cognitive perspective with social-motivational considerations, it would be helpful to compare voices that lead to different predictions, depending on whether one takes a social-motivational or a purely cognitive perspective. As mentioned above, the comparison of voices by Mayer et al. (2003) leads to the same predictions from both perspectives. Therefore, we suggest comparing voices that do not differ with regard to their cognitive processing demands, but that nevertheless activate different social schemas. These constraints are satisfied by varying the gender of the speaker. When contrasting male and female voices, one would prima facie not expect any differences from a purely cognitive perspective, whereas a socially enhanced view allows for several predictions with regard to social-motivational variables, which, in turn, may affect learning. Therefore, speakers of different gender can serve as an appropriate experimental variation to clarify the relevance of social-motivational factors and to give further insights into the relationship between cognitive and social variables in multimedia learning.

The role of the speaker’s gender

Research inspired by the media equation theory has provided evidence that social principles relevant to human–human interaction also have an impact on human–computer interaction (for an overview, see Reeves and Nass 1996). For example, users ascribe a personality to a computer based on the verbal cues it delivers and prefer computers that resemble their own personality (Nass and Lee 2001). However, there is evidence not only for similarity-attraction, but also for complementary-attraction towards interactive computer characters (Isbister and Nass 2000). This inconsistency of results matches the inconsistent findings regarding similarity- and complementary-attraction in human–human interaction (e.g., Franzoi 1996). Research on media equation has also provided support for gender stereotyping towards media. For instance, computer programs using female voices were rated as being more competent regarding “female topics” like love and relationships, whereas computer programs using male voices were rated as being more competent regarding “male topics” like mathematics or computers (Lee 2008; Nass et al. 1997).

In accordance with these findings from media equation research, it can be assumed that the speaker’s gender may trigger gender stereotypes and other social principles, which, in turn, influence the perception and evaluation of the speaker. As a result, either a male or a female speaker might be seen as a more knowledgeable or likeable person depending on the topic and on learner characteristics. A positive evaluation of the speaker should increase the learner’s inclination to follow Grice’s cooperative principle during the interaction with the learning environment, foster subsequent sense-making processes, and finally result in better learning outcomes.

Moreover, gender stereotyping or principles of interpersonal attraction with respect to the speaker’s gender will occur more likely, the more the learner is aware of the speaker’s gender. Here, it can be assumed that individual differences moderate whether a learner pays special attention to a speaker’s gender. It has been proposed that it is the cognitive availability of the gender schema that determines the awareness for the gender of others (Bem 1981; Markus et al. 1985). A gender schema is a cognitive structure that organizes perception and information processing with respect to gender aspects (Bem 1981). Gender schema theory (Bem 1981) assumes that the cognitive availability of the gender schema is associated with the gender-related self schema. The gender-related self schema is reflected in the sex-role orientation, which is defined by the masculinity and femininity of a person in combination with the person’s biological sex.

Research on gender schema theory has provided evidence that so-called “sex-typed” individuals (i.e., men with a high masculine sex-role orientation and women with a high feminine sex-role orientation) show a greater readiness to process information in terms of gender (Bem 1981). For instance, only sex-typed persons tend to arrange information in terms of gender when being instructed to recall previously learned materials, while neglecting other possible ways of clustering. Additionally, sex-typed individuals show shorter latencies for schema-consistent judgements compared to schema-inconsistent judgements. Bem (1981) explains these latency differences based on schematic processing of gender-related information. In the case of schema-consistent information, sex-typed persons can make their judgements rather fast, because they only have to compare the incoming information to an already available schema. However, for schema-inconsistent information, it is not sufficient to compare the information to the schema; rather, subjects have to recall additional information from memory to come to a decision. This implies a time-consuming process, which is reflected in longer latencies. Although it is not yet clear which specific sex-role orientation (i.e., the masculinity and femininity in combination with the individual’s gender) indicates a high cognitive availability of a gender schema and thus can be categorized as sex-typed (Crane and Markus 1982; Hoffmann and Borders 2001; Markus et al. 1982), there is evidence that both high masculinity and high femininity can influence the processing of gender-relevant information (Bem 1981; Markus et al. 1982; Payne et al. 1987). Therefore, it can be assumed that sex-typed persons (i.e., individuals with a high cognitive availability of the gender schema) show a higher awareness of the speaker’s gender, which, in turn, might influence their information processing.

Hypotheses

From a social-motivational perspective, different hypotheses can be derived on how the speaker’s gender can affect learning. On the one hand, regarding gender stereotyping, men are typically perceived as more knowledgeable, stronger, and more competent with regard to the domain used in the current study, namely mathematics (Forgasz et al. 2004). Hence, learners should be more motivated to listen to a male speaker and put more effort into sense-making activities, which, in turn, should improve learning outcomes. On the other hand, women are commonly seen as nicer and warmer compared to men (Franzoi 1996). With respect to this latter gender stereotype, learners might be more motivated to listen to a female speaker and put more effort into trying to understand the verbal explanations, thereby improving learning. Thus, learning outcomes might be moderated by the particular gender stereotype that is being applied towards the speaker.

Moreover, when taking into account principles of interpersonal attraction, there may be interactions between the learner’s gender and the speaker’s gender. According to the principle of similarity-attraction, learners should prefer listening to a speaker of the same gender and might learn better from this type of speaker. However, according to the concept of complementary-attraction, learners might just as well prefer listening to a dissimilar person and thus might benefit from listening to a speaker of the opposite gender.

To conclude, there are reasonable explanations from a social agency perspective to expect better learning outcomes for male or female speakers and for a speaker of the same or opposite gender, respectively. Moreover, it can be assumed that the hypothesized effects of gender stereotyping will be moderated by the cognitive availability of the gender schema. In particular, learners with a high cognitive availability of the gender schema should react more sensitively to variations of the speaker’s gender than those with a low cognitive availability of the gender schema.

In contrast, from a cognitive perspective there is no obvious reason why using male versus female voices for multimedia learning should result in different learning outcomes. Both types of voices can be seen as equally common and understandable and should require an equal amount of cognitive resources. Thus, from a cognitive point of view, no effects of the speaker’s gender on learning outcomes are expected.

In two experiments, the impact of the speaker’s gender in combination with the learner’s gender on learning was investigated. In Experiment 1, learners were randomly assigned to speakers of different gender. In Experiment 2, learners could choose among different speakers.

Experiment 1

Method

Participants and design

Participants were 84 students (42 female and 42 male) of the University of Tuebingen, Germany, who participated voluntarily for either course credit or payment. The average age of the students was 25.58 years (SD = 4.54). The learner’s gender (male/female) and the speaker’s gender (male/female) were used as between-subjects variables, resulting in a 2 × 2 design. The male and female participants were equally, but randomly distributed across the experimental conditions. In each of the resulting four experimental groups, 21 students served as participants.
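The balanced 2 × 2 allocation described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the procedure actually used in the study: the function name, the seed, and the numbering of the three voices per speaker gender are hypothetical, and the voice is drawn purely at random within each condition.

```python
import random
from collections import Counter


def assign_conditions(n_per_learner_gender=42, seed=0):
    """Return (learner_gender, speaker_gender, voice_id) tuples.

    Hypothetical helper: within each learner gender, half of the
    participants are allocated to a female speaker and half to a male
    speaker, in random order, which yields equal cell sizes.
    """
    rng = random.Random(seed)
    half = n_per_learner_gender // 2
    assignments = []
    for learner_gender in ("female", "male"):
        speaker_slots = ["female"] * half + ["male"] * half
        rng.shuffle(speaker_slots)  # random order, fixed cell sizes
        for speaker_gender in speaker_slots:
            # Assumption: one of three voices per speaker gender, drawn at random.
            voice_id = rng.randint(1, 3)
            assignments.append((learner_gender, speaker_gender, voice_id))
    return assignments


cell_sizes = Counter((learner, speaker) for learner, speaker, _ in assign_conditions())
print(cell_sizes)  # four cells with 21 participants each
```

Shuffling a fixed list of condition slots, rather than drawing each condition independently, is what keeps the four cells at exactly 21 participants.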

Materials and procedure

A hypermedia learning environment on probability theory was used for experimentation, which consisted of four parts: a short technical introduction to the system and to the experiment, a short introduction to the domain of probability theory, a learning phase with eight worked-out examples, whose presentation was subject to experimental manipulation, and a subsequent test phase. In the learning phase, participants had to acquire knowledge on four different problem categories, whereby each category was explained by means of two worked-out examples. The worked-out examples were presented auditorily as narrations, which were accompanied by dynamic visualizations (i.e., animations, cf. Fig. 1). The animations concretely depicted the objects and relations described in the problem statement in an iconic way (cf. Scheiter et al. 2006). The narrated animations were learner-controlled in that learners could start, stop, and replay them. Learners could navigate through the eight different example animations to compare them or to skip them. Pacing was left to the learners. Depending on the experimental condition, participants received a narration that was spoken by either a male or a female speaker. To counterbalance the effects of speaker-specific characteristics, three different male and three different female voices were used and randomly assigned to learners within the respective experimental conditions. All speakers were standard-accent native speakers without speech disorders or noticeable problems. Their voices were matched in loudness. The speakers were selected so that their voices varied in pitch. The speakers were briefly trained regarding pronunciation and emphasis of the key words of the narrations. The speed of the narrations differed only slightly among the six speakers and was synchronized with the animations.

Fig. 1 Hypermedia learning environment with worked-out examples

Subsequent to the example-based learning phase, the learners had to fill out paper-based questionnaires that assessed several subjective dependent variables. Afterwards, the participants had to work on a short exam with 11 test problems. The instructional materials were no longer available during problem solving. Finally, the participants had to fill out a battery of questionnaires that assessed several control variables. Altogether, the experiment took about 2 h.

Dependent variables

Social-motivational variables including evaluation of and motivation towards the speaker, as well as cognitive variables, including learning outcomes, were assessed as dependent measures.

For the evaluation of the different speakers, a German translation of the Speech Evaluation Instrument (SEI) by Zahn and Hopper (1985) in the short version used by Mayer et al. (2003) was administered to the students. Participants were asked to rate the speaker with regard to 15 pairs of opposite adjectives on an 8-point Likert scale (e.g., “The speaker sounds literate—illiterate”). Each of these adjective pairs belongs to one of three subscales: Superiority, attractiveness, and dynamism. The overall scores for the three subscales were calculated by averaging the scores of the corresponding five items. Higher values reflected a more positive speaker rating (i.e., more superior/attractive/dynamic).
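The subscale scoring described above amounts to averaging five item ratings per subscale. The following Python sketch illustrates this under stated assumptions: the assignment of item positions to subscales is hypothetical (the actual item order of the SEI short version is not given here), and all ratings are assumed to be already oriented so that higher values are more positive.

```python
from statistics import mean

# Hypothetical item layout: positions 0-4, 5-9, and 10-14 are assumed
# to belong to the three subscales in this order.
SUBSCALE_ITEMS = {
    "superiority": range(0, 5),
    "attractiveness": range(5, 10),
    "dynamism": range(10, 15),
}


def score_sei(ratings):
    """Average the five item ratings (1-8) belonging to each subscale."""
    if len(ratings) != 15:
        raise ValueError("expected 15 item ratings")
    if any(not 1 <= r <= 8 for r in ratings):
        raise ValueError("ratings must lie on the 8-point scale")
    return {
        name: mean(ratings[i] for i in items)
        for name, items in SUBSCALE_ITEMS.items()
    }


example = [8, 8, 8, 8, 8, 4, 4, 4, 4, 4, 1, 2, 3, 4, 5]
print(score_sei(example))
```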

Additionally, the motivation with respect to the speaker was assessed. For this purpose, a questionnaire called the Speaker Impression Questionnaire was designed in reference to the Subject Impression Questionnaire by Deci and Ryan (2004b) that had originally been developed to assess the motivation towards other participants in collaborative experimental settings. Although this questionnaire includes six subscales in total, we will concentrate on the subscale effort. The items of the Speaker Impression Questionnaire (SIQ) were formulated analogously to the items of the Subject Impression Questionnaire, that is, the SIQ-effort scale referred to the effort invested in “listening to the speaker” instead of “interacting with another person”. The SIQ-effort scale consisted of three statements for which participants had to indicate their level of agreement on a 7-point rating scale (from 1 “not at all true” to 7 “very true”). High ratings indicated a high effort invested in listening to the speaker.

Additionally, cognitive load was measured with a modified version of the NASA-TLX (Hart and Staveland 1988) that had been used as an instrument for assessing cognitive load in former studies (Scheiter et al. 2006). The scale contained separate subscales for intrinsic, germane, and extraneous load. Additionally, a question that asked about difficulties in comprehending the voice was included. The rating scale ranged from 0 to 10, whereby high values reflected a high level of cognitive load or a high difficulty in comprehending the voice, respectively. All questionnaires were presented as paper-based materials directly after the learning phase.

Learning outcomes were measured by means of 11 test problems of varying transfer distance, which were presented in the test phase of the hypermedia environment. For each of the test problems, one point was assigned for a correct answer; no partial credits were given. The problem-solving performance was expressed as the percentage of correct answers. Additionally, learning time (in seconds) spent on studying the narrated animations was measured.

Additional control variables

Several control variables were registered, including socio-demographical data, prior knowledge (measured by a multiple choice questionnaire addressing the participant’s knowledge of important concepts and definitions from the field of probability theory), and intrinsic motivation measured with a shortened version of the Intrinsic Motivation Inventory (Deci and Ryan 2004a; Ryan 1982). Furthermore, the sex-role orientation of the participant was assessed by means of the Bem Sex-Role Inventory (Bem 1974) in the German version by Schneider-Düker and Kohler (1988).

Results

The data were analyzed by means of a 2 × 2 ANOVA with the learner’s gender and the speaker’s gender as between-factors. For the analysis of problem-solving performance, two additional control variables were included, namely the “Abiturnote” (i.e., final high school grade point average) and intrinsic motivation, resulting in a 2 × 2 ANCOVA. Both covariates showed a significant correlation with problem-solving performance (Abiturnote: r = −.36, P = .001, whereby better school grades were associated with better learning outcomes; intrinsic motivation: r = .26, P = .02, whereby higher intrinsic motivation was associated with better learning outcomes), but were independent from each other (r = −.01, P = .94). The results of the experiment are shown in Table 1.

Table 1 Means and standard deviations as a function of the learner’s gender and the speaker’s gender

Effects of the learner’s gender

Analyses of cognitive load measures indicated that male learners estimated the task difficulty (i.e., intrinsic cognitive load) as being lower compared to female learners (F(1, 80) = 6.17, MSE = 4.63, P = .02, ηp² = .07). Moreover, male learners achieved better learning outcomes compared to female learners (F(1, 78) = 5.86, MSE = 379.84, P = .02, ηp² = .07). Finally, male participants spent less time on learning with narrated animations than female learners (F(1, 80) = 4.70, MSE = 164223.44, P = .03, ηp² = .06). No further significant effects of the learner’s gender could be observed (all Fs < 1, except germane cognitive load: F(1, 80) = 1.54, MSE = 3.35, P = .22, ηp² = .02; extraneous cognitive load: F(1, 80) = 1.382, MSE = 1.35, P = .24, ηp² = .02).
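For readers who wish to relate the reported effect sizes to the test statistics, partial eta squared can be recovered from an F value and its degrees of freedom via the standard identity ηp² = (F · df_effect) / (F · df_effect + df_error). The Python sketch below uses a hypothetical function name; because it starts from rounded F values, results may deviate slightly from the reported figures in the second decimal.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared computed from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Effect of the learner's gender on intrinsic cognitive load: F(1, 80) = 6.17
print(round(partial_eta_squared(6.17, 1, 80), 2))  # 0.07
```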

Effects of the speaker’s gender

The data for the speaker rating (assessed with the SEI) showed a significant speaker effect for the subscale attractiveness (F(1, 80) = 7.42, MSE = 1.41, P = .01, ηp² = .09). Female speakers were perceived as being more attractive compared to male speakers. Moreover, analyses of the SIQ-effort scale showed that learners reported having invested more effort in listening to a female speaker (F(1, 80) = 3.99, MSE = 2.28, P = .05, ηp² = .05). Most importantly, analyses of the learning outcomes revealed that learners who listened to a female speaker showed better problem-solving performance than learners who listened to a male speaker (F(1, 78) = 4.52, MSE = 379.84, P = .04, ηp² = .06).

No further significant effects of the speaker’s gender could be detected (all Fs < 1, except SEI-superiority: F(1, 80) = 1.85, MSE = 1.12, P = .18, ηp² = .02; SEI-dynamism: F(1, 80) = 3.50, MSE = .83, P = .07, ηp² = .04).

Interactions between the speaker’s gender and the learner’s gender

The data showed no significant interactions between the speaker’s gender and the learner’s gender (all Fs < 1, except SEI-superiority: F(1, 80) = 1.96, MSE = 1.12, P = .17, ηp² = .02; SIQ-effort: F(1, 80) = 2.76, MSE = 2.28, P = .10, ηp² = .03; learning time: F(1, 80) = 2.45, MSE = 164223.44, P = .12, ηp² = .03).

Extended analyses of underlying processes

According to SAT, a causal chain can be postulated by assuming that social cues embedded in different voices influence the perception of the speaker, as well as the motivation and effort with regard to listening to the speaker, which, in turn, influences learning outcomes. At first glance, our findings seem to be well in line with these assumptions, as learners listening to female speakers not only showed better problem-solving performance, but also reported having put more effort into listening to the speaker. However, the partial correlation (adjusted for Abiturnote and intrinsic motivation) between SIQ-effort and learning outcomes failed to reach statistical significance (r = .23, P = .80). Thus, correlative analyses provided no evidence for the suggested causal chain between these variables.
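A partial correlation adjusted for two covariates can be obtained by applying the first-order partial correlation formula twice. The Python sketch below illustrates this general technique with hypothetical function names and toy data; it is not the analysis script of the study, and the variable roles (effort, performance, two covariates) are only named for orientation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def first_order_partial(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear influence of z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_corr_two_covariates(x, y, z, w):
    """Correlation of x and y adjusted for z and w (recursive formula)."""
    r_xy_z = first_order_partial(pearson(x, y), pearson(x, z), pearson(y, z))
    r_xw_z = first_order_partial(pearson(x, w), pearson(x, z), pearson(w, z))
    r_yw_z = first_order_partial(pearson(y, w), pearson(y, z), pearson(w, z))
    return first_order_partial(r_xy_z, r_xw_z, r_yw_z)


# Toy data: x and y share only the covariate z, so they correlate in the
# raw data but not after adjustment.
z = [1, 1, 1, 1, -1, -1, -1, -1]   # e.g., a first covariate
w = [1, 1, -1, -1, 1, 1, -1, -1]   # e.g., a second covariate
x = [2, 0, 2, 0, 0, -2, 0, -2]     # e.g., reported effort
y = [2, 0, 0, 2, 0, -2, -2, 0]     # e.g., test performance
print(pearson(x, y), partial_corr_two_covariates(x, y, z, w))
```

In the toy data, the raw correlation of .5 is driven entirely by the shared covariate, so the adjusted correlation is approximately zero.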

Based on gender schema theory, it was suggested that the effect of the speaker’s gender might be moderated by individual-difference variables, particularly the sex-role orientation of the learners. To test the influence of sex-role orientation as a potential moderator, an additional analysis with respect to the masculinity and the femininity of the participants was conducted.

For the analysis of masculinity, participants were divided into low masculine and high masculine learners by means of a median split (Mdn = 4.58) on the respective scale of the sex-role inventory. The resulting groups showed a comparable proportion of male and female learners (low masculine: 25 female and 17 male; high masculine: 17 female and 25 male). For problem-solving performance, a 2 × 2 ANCOVA was conducted with the masculinity of the learner (low vs. high) and the speaker’s gender as between-subjects factors and with Abiturnote and intrinsic motivation as covariates (Table 2). For the masculinity of the learner, there was no significant main effect (F < 1). However, the data revealed a significant speaker effect (F(1, 78) = 4.40, MSE = 389.00, P = .04, ηp² = .05), as well as a significant interaction between the masculinity of the learner and the speaker’s gender (F(1, 78) = 4.53, MSE = 389.00, P = .04, ηp² = .06). Only high masculine learners benefited from listening to a female rather than a male speaker (F(1, 78) = 8.52, MSE = 389.00, P < .01, ηp² = .10), whereas for low masculine learners, the speaker’s gender did not have a significant effect (F < 1). Figure 2 illustrates these findings.
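A median split of this kind can be sketched in a few lines of Python. The function name and group labels are hypothetical, and the handling of scores that fall exactly on the median (assigned to the low group here) is an assumption, since the text does not specify how ties were treated.

```python
from statistics import median


def median_split(scores):
    """Label each score as 'low' or 'high' relative to the sample median.

    Assumption: scores equal to the median are placed in the 'low' group.
    """
    cut = median(scores)
    return ["high" if score > cut else "low" for score in scores]


print(median_split([3.9, 4.2, 4.58, 4.58, 5.1, 5.6]))
```

How ties are handled matters when many participants share the median score, because it can produce groups of unequal size.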

Table 2 Means and standard errors for problem-solving performance (in % correct) as a function of the learner’s masculinity/femininity and the speaker’s gender (Experiment 1)
Fig. 2 Problem-solving performance: interaction between the masculinity of the learner and the speaker’s gender

An analogous 2 × 2 ANCOVA was conducted for the femininity of the participants (Mdn = 4.50). The resulting groups showed a comparable proportion of male and female learners (low feminine: 29 female and 28 male learners; high feminine: 13 female and 14 male learners). There was no significant main effect for the femininity of the learner (F(1, 78) = 1.03, MSE = 397.78, P = .31, ηp² = .01). Again, the data showed a significant effect for the speaker’s gender (F(1, 78) = 6.46, MSE = 397.78, P = .01, ηp² = .08); however, the interaction failed to reach statistical significance (F(1, 78) = 2.34, MSE = 397.78, P = .13, ηp² = .03).

To conclude, the masculinity of the learner moderated the effect of the speaker’s gender on problem-solving performance, but there were no equivalent effects for the femininity of the learner.

Summary and discussion

The results of Experiment 1 demonstrated—irrespective of the speaker’s gender—that male learners were superior in that they needed less learning time, reported less cognitive load, and performed better in the subsequent problem-solving test. These findings confirm prior research on gender differences in quantitative (mathematical) abilities (Halpern 1992).

With respect to the speaker’s gender, the data revealed a bias in favor of female speakers, who were rated as being more attractive than male speakers. Additionally, learners reported a higher motivation/effort invested in listening to female speakers compared to male speakers. Finally and most important, learners listening to female speakers showed better problem-solving performance. This latter finding on learning outcomes will be referred to as the speaker/gender effect.

The obtained speaker/gender effect cannot be explained by purely cognitive mechanisms, because both female and male voices are equally common and were rated as being similarly understandable. Furthermore, as indicated by the cognitive load data, the processing of a male versus a female voice did not differ with regard to the amount of cognitive resources required. In the following section, alternative explanations for the speaker/gender effect that go beyond purely cognitive considerations will be discussed based on different theoretical accounts, namely from the perspective of SAT and of gender schema theory, as well as from an individual-preferences perspective.

Social agency theory

The fact that female speakers were evaluated more positively than male speakers indicates that for participants in Experiment 1, the gender stereotype “women are nicer than men” might have been active. The findings cannot be explained with purely cognitive assumptions, and the pattern of results with regard to speaker evaluation, motivation/effort, and performance seemed to be well in line with the assumptions of SAT at first glance, in that female speakers were evaluated more positively, led to an increase in learner effort, and improved problem-solving performance. However, there were no significant correlations between the motivation/effort invested in listening to the speaker on the one hand and learning outcomes on the other hand, which contradicts the underlying causal chain proposed by SAT. Thus, we need to look for alternative explanations for the obtained speaker/gender effect.

Gender schema theory

The interaction between the masculinity of the learner and the speaker’s gender indicates that the availability of the gender schema may interact with gender-related social cues. This implies that individual differences may act as a moderator of the impact of social cues on learning outcomes, as the speaker/gender effect only holds for high masculine learners.

Based on this finding, we propose an alternative explanation for the impact of speaker gender as a social cue: As discussed in the theoretical section of the paper, the sex-role orientation of a person is associated with the availability of a gender schema and can influence information processing, for example, when making schema-consistent versus schema-inconsistent judgments (Bem 1981). Thus, listening to a female speaker who gives instructional explanations in a mathematical domain may have resulted in schema incongruence, as mathematics is generally believed to be a male area of interest. This schema incongruence may have effects that are similar to situations in which a person has to make schema-inconsistent judgments. In both cases, one is confronted with something that does not fit with one’s own gender-related information processing, which may in turn alter information processing. This altered information processing may then be reflected in learning outcomes. According to gender schema theory, only sex-typed persons with a higher cognitive availability of the gender schema, such as the high masculine learners in Experiment 1, should notice this schema incongruence. Thus, information processing should be affected by the speaker’s gender only for sex-typed learners, which may be reflected in the obtained interaction between the masculinity of the learner and the speaker’s gender with regard to problem-solving performance. However, there are at least three open questions with regard to this explanation.

First, it remains unclear how the schema incongruity of multimedia messages may affect information processing and, in turn, improve learning outcomes since there was no significant correlation with the effort to listen to the speaker.

Second, it remains an open question as to why the pattern of results was obtained for high masculine learners, but not for high feminine learners, who can also be considered as being sex-typed (Bem 1981; Markus et al. 1982). A possible explanation may be that the masculine learning domain predominantly addressed the masculinity rather than the femininity of the learners. Additionally, it has often been criticized that Bem’s Sex Role Inventory relates to instrumentality and expressiveness rather than to masculinity and femininity (Hoffmann and Borders 2001; Spence 1993). Thus, it may well have been that a mathematical learning domain mainly addressed the instrumentality (i.e., masculinity) of learners, but was quite neutral regarding their expressiveness (i.e., femininity).

Third, if the effects of the gender schema depended on the cognitive availability of this schema, the question would arise as to whether the individual gender schema could be primed also for low masculine learners. If the cognitive availability of the gender schema was the critical mediator, the speaker/gender effect should also appear for low masculine learners under conditions where their gender schema is sufficiently activated.

Individual preferences

Another alternative interpretation is based on individual preferences. It might have been the case that most learners in our study preferred a female speaker for individual reasons and that this preference was somehow more pronounced for high masculine persons due to their higher cognitive availability of the gender schema. Therefore, individual preferences for a female speaker might be a critical variable for the obtained speaker/gender effect and thus, the speaker/gender effect should disappear if learners could choose the speaker for themselves, because in this case each learner’s individual preferences would be satisfied.

To address some of the open questions, in Experiment 2 learners were allowed to select a speaker rather than being assigned to a specific one. We used this procedure for two reasons. First, we assumed that the instruction to choose among speakers of different gender would activate the learners’ existing gender schemata by having them compare and contrast male and female speakers. Second, this procedure allowed us to investigate the potential mediating role of individual preferences in learning from narrated animations.

Experiment 2

If individual preferences mediated the speaker/gender effect, this effect should disappear when learners can choose a speaker for themselves, because, in this case, each learner would be allowed to listen to his or her preferred speaker.

Contrary to this hypothesis, according to the proposed explanation based on schema incongruity, the speaker/gender effect should appear for all learners irrespective of their masculinity when the gender schema is activated by means of appropriate priming. This priming might result from the opportunity to compare and select between different male and female speakers for presenting the narration, because the learner’s attention is directed towards the gender difference among the speakers.

Method

Participants and design

Eighty-four students (42 male, 42 female) of the University of Tuebingen, Germany, participated in the second study for either course credit or payment. Average age was 24.40 years (SD = 4.81). Independent variables were the learner’s gender and the speaker’s gender, resulting in a 2 × 2 design. The assignment of the participants to the experimental groups depended on the learner’s gender and on the individual selection of the preferred speaker (see below).

Materials and procedure

The procedure of the experiment was similar to the procedure of the first experiment. However, before starting their interaction with the learning environment, the participants were given the opportunity to listen to the six different speakers from Experiment 1 in order to choose the one that they preferred for presenting the multimedia messages. The participants could click links to retrieve a short voice sample for each speaker. Each voice sample contained the same words spoken by the specific speaker: “Hello, I’m speaker A [B/C/D/E/F]. If you want me to guide you through the following study, then select speaker A [B/C/D/E/F] after this voice presentation.” The links to the voice samples were listed in such a way that a female and a male speaker were always presented alternately. This arrangement was supposed to highlight the different gender of the speakers and thereby prime the gender schema of the participant. The order of the speakers was counterbalanced to avoid order effects.

After participants had chosen a speaker, they began studying the instructional materials, where the worked examples were displayed as animations accompanied by narration presented by the speaker selected earlier. Subsequently, they had to fill out questionnaires for the assessment of the dependent variables. Additionally, they were asked (paper-based) if they would choose this speaker again when given a second choice. This question served as a manipulation check to control whether the learners were still content with their choice. Afterwards, learners had to solve the same eleven test problems as in Experiment 1. Finally, they were asked to fill out a battery of questionnaires that assessed several control variables, including the assessment of participants’ sex-role orientation. Altogether, the study took about two and a half hours.

Dependent variables

In order to get more information about the gender-related perception of the speaker, the speaker evaluation was extended by adding an item that assessed the feminine–masculine dimension of the speakers. High values of this item reflected a high feminine rating, whereas low values reflected a high masculine rating. Besides this modification of the SEI, the same instruments as in the first study were used, including the SIQ-effort, the modified version of the NASA-TLX, the eleven test problems to assess learning outcomes, and learning time, as well as the control variables regarding intrinsic motivation and sex-role orientation.

Results

Because the aim of this study was to test the impact of individual preferences for a specific speaker, only the data of participants who had been content with their speaker were analyzed, that is, of learners who would have chosen the same speaker again when given a second choice. For this reason, the data of only 66 learners were analyzed (35 females and 31 males); 18 participants had to be excluded (12 of them had chosen a female voice; 6 of them had chosen a male voice). For means and standard deviations, see Table 3. As in the first experiment, the data were analyzed by means of a 2 × 2 ANOVA with the learner’s gender and the speaker’s gender as between-subjects factors. For the analysis of problem-solving performance, “Abiturnote” and intrinsic motivation were again used as control variables.

Table 3 Means and standard deviations as a function of the learner’s gender and the speaker’s gender

Selection behavior

Participants showed a significant preference for female speakers (χ²(1, N = 66) = 10.24, P = .001). A female speaker was chosen by 69.70% of all participants (n = 46; 24 male and 22 female learners), whereas only 30.30% (n = 20; 7 male and 13 female learners) selected a male speaker. The preferences of male and female learners did not differ significantly (χ²(1, N = 66) = 1.65, P = .20).
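Both test statistics can be recomputed from the reported cell counts alone. The following is a minimal sketch, not the authors' analysis code: it assumes an equal-split null hypothesis for the overall preference and a Pearson test without continuity correction for the 2 × 2 table (assumptions that happen to reproduce the reported values), and all function names are illustrative.

```python
import math


def chi2_p_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


def chi2_goodness_of_fit(observed):
    """Pearson chi-square against equal expected frequencies (assumed null)."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def chi2_independence_2x2(table):
    """Pearson chi-square for a 2 x 2 table, without continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts from the text: 46 learners chose a female speaker, 20 a male speaker.
preference = chi2_goodness_of_fit([46, 20])

# Rows: male learners, female learners; columns: female speaker, male speaker.
by_learner_gender = chi2_independence_2x2([[24, 7], [22, 13]])

print(f"preference: chi2 = {preference:.2f}, p = {chi2_p_df1(preference):.3f}")
print(f"by learner gender: chi2 = {by_learner_gender:.2f}, "
      f"p = {chi2_p_df1(by_learner_gender):.2f}")
```

With one degree of freedom, the p-value follows directly from the complementary error function, so no statistics library is needed for this check.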

Effects for the learner’s gender

The ANCOVA for learning outcomes revealed a significant main effect for the learner’s gender (F(1, 60) = 3.92, MSE = 456.98, P = .05, ηp² = .06). Male learners achieved higher problem-solving performance. With regard to the other dependent variables, there were no further significant effects for the learner’s gender (all other Fs < 1, except SEI-superiority: F(1, 62) = 1.36, MSE = 0.85, P = .25, ηp² = .02, SEI-attractiveness: F(1, 62) = 3.17, MSE = 0.74, P = .08, ηp² = .05, SEI-dynamism: F(1, 62) = 2.82, MSE = 0.53, P = .10, ηp² = .04, intrinsic cognitive load: F(1, 62) = 1.30, MSE = 5.82, P = .26, ηp² = .02, germane cognitive load: F(1, 62) = 3.64, MSE = 3.77, P = .06, ηp² = .06).
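The partial eta squared values reported with each F ratio can be recovered from the F value and its degrees of freedom, since partial eta squared equals SS_effect / (SS_effect + SS_error). The sketch below illustrates this identity on three of the reported effects; the function name and the dictionary layout are illustrative choices, not part of the original analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios and degrees of freedom as reported in the text.
reported = {
    "learning outcomes": (3.92, 1, 60),
    "SEI-attractiveness": (3.17, 1, 62),
    "germane cognitive load": (3.64, 1, 62),
}

effect_sizes = {
    label: partial_eta_squared(f_value, df_effect, df_error)
    for label, (f_value, df_effect, df_error) in reported.items()
}

for label, value in effect_sizes.items():
    print(f"{label}: partial eta squared = {value:.2f}")
```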

Effects for the speaker’s gender

The 2 × 2 ANOVA revealed a significant main effect of the speaker’s gender on the SEI subscale superiority (F(1, 62) = 8.67, MSE = 0.85, P = .01, ηp² = .12). Male speakers were rated higher on superiority than female speakers. Furthermore, there was a significant difference for the additional item feminine–masculine. Female speakers were seen as being more feminine than male speakers (F(1, 62) = 347.33, MSE = 0.93, P < .001, ηp² = .85). The 2 × 2 ANCOVA for the learning outcomes showed a marginally significant main effect for the speaker’s gender (F(1, 60) = 2.95, MSE = 456.98, P = .09, ηp² = .05). Learners listening to female speakers tended to solve more problems correctly than those listening to male speakers. No further significant effects for the speaker’s gender could be observed (all other Fs < 1, except comprehension of voice: F(1, 62) = 1.02, MSE = 1.02, P = .32, ηp² = .02, learning time: F(1, 62) = 1.62, MSE = 199466.61, P = .21, ηp² = .03).

Interactions between the speaker’s gender and the learner’s gender

No significant interactions between the speaker’s gender and the learner’s gender could be observed (all Fs < 1, except SEI-superiority: F(1, 62) = 3.17, MSE = 0.848, P = .08, ηp² = .05, learning time: F(1, 62) = 1.97, MSE = 199466.61, P = .17, ηp² = .03).

Masculinity of the learner

Analogous to the first experiment, the participants were divided into low and high masculine learners by means of a median split for the masculinity subscale (Mdn = 4.55). The resulting groups showed a comparable proportion of male and female learners (low masculine: 21 female and 17 male learners; high masculine: 14 female and 14 male learners). The 2 × 2 ANCOVA (masculinity × speaker’s gender, with the covariates Abiturnote and intrinsic motivation) showed a significant main effect of the speaker’s gender on learning outcomes (F(1, 60) = 4.97, MSE = 476.72, P = .03, ηp² = .08). Learners listening to a female speaker achieved better learning results than those listening to a male speaker (cf. Table 4). The interaction between the masculinity of the learner and the speaker’s gender failed to reach statistical significance (F(1, 60) = 0.67, MSE = 476.72, P = .42, ηp² = .01). Thus, the speaker/gender effect could be observed irrespective of the masculinity of learners. There was no significant main effect for the masculinity of the learner (F < 1).
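The median split used to form the two masculinity groups can be sketched as follows. The scores are hypothetical and are not data from the study; assigning scores equal to the median to the low group is also an assumption, since the text does not state how ties were handled.

```python
from statistics import median


def median_split(scores):
    """Return the median and a 'low'/'high' label for each score.

    Assumption: scores equal to the median are assigned to the 'low' group.
    """
    cut = median(scores)
    return cut, ["high" if score > cut else "low" for score in scores]


# Hypothetical masculinity subscale scores (illustration only).
scores = [3.9, 4.2, 4.55, 4.55, 4.8, 5.1, 5.6]
cut, groups = median_split(scores)

for score, group in zip(scores, groups):
    print(f"{score:.2f} -> {group} masculine")
```

Because tied scores all fall on one side of the cut, a median split does not necessarily yield groups of equal size.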

Table 4 Means and standard errors for problem-solving performance (in % correct) as a function of the learner’s masculinity and the speaker’s gender (Experiment 2)

Partial correlations

There were no significant partial correlations between speaker ratings and learning outcomes, except for the additionally assessed single item feminine–masculine. The higher the perceived femininity of the speaker, the better the learning outcome (cf. Table 5).

Table 5 Partial correlations of the extended SEI with learning outcomes (adjusted for Abiturnote and intrinsic motivation) in Experiment 2

Discussion

Social agency theory

The findings of the second study were again not in line with the assumed mechanisms of the social agency theory in two ways. First, learning outcomes were not significantly correlated with the evaluation of the speaker except for the item “femininity of the speaker”. Second, while there were (albeit small) differences in learning outcomes depending on the speaker’s gender, these effects on performance were not accompanied by respective differences in motivation/effort invested in listening to the speaker. Moreover, there was no significant correlation between the SIQ-effort and problem-solving performance. Thus, as in Experiment 1, there was no evidence in favor of the causal chain between social cues, motivation/effort, and learning outcomes that is assumed by the SAT.

Individual preferences

The participants’ choice of speaker indicated a preference for female speakers. However, the speaker/gender effect occurred irrespective of whether learners were given the opportunity to select a speaker (as in Experiment 2) or not (as in Experiment 1), even though it was less pronounced in the second study. Thus, individual preferences cannot explain the speaker/gender effect.

Schema incongruity of information

With respect to the priming of the gender schema, there were several findings of interest. First, the interaction between the masculinity of the learner and the speaker’s gender disappeared in the second study. This finding can be explained by the assumption that having learners choose among different speakers was a successful manipulation for activating the gender schema so that even low masculine learners now noticed the schema incongruity in the multimedia message and thus benefited from listening to a female speaker.

Second, the correlational analyses regarding the speaker evaluation and learning outcomes revealed that only the femininity of the speaker was positively associated with learning outcomes. This might be interpreted as evidence that the more feminine the speaker was perceived to be, the higher the schema incongruity was and, thus, the more positively information processing was affected, leading to better learning outcomes. These findings give further support for the assumption that the schema incongruity of information may be a critical mediator for the occurrence of the speaker/gender effect.

Summary and general discussion

Based on media equation and social agency theory, two studies on the impact of social cues in narrated animations on learning outcomes were conducted. A salient social cue, namely the speaker’s gender, was manipulated to find evidence for the relevance of social cues in multimedia learning, as well as to investigate the underlying mechanisms for the influence of social factors on learning. In both studies, we found evidence for the impact of social factors on multimedia learning: Learners listening to female speakers achieved better learning outcomes compared to learners listening to male speakers (speaker/gender effect).

As argued in the theoretical section, one would not expect to find any differences for learning from narrated animations presented by either a male or a female speaker from a purely cognitive view. This is in line with the finding that neither intrinsic, nor germane, nor extraneous cognitive load differed with respect to the speaker’s gender. Additionally, learners reported a comparable amount of difficulty in comprehending a male or a female voice. Thus, the speaker/gender effect clearly indicated that a social-motivational perspective needs to be taken into account when designing instructional multimedia messages.

At first sight, the speaker/gender effect detected in Experiment 1 seems to be well in line with the social agency theory, as the study yielded corresponding results for the speaker rating and for motivation/effort with respect to the speaker. However, there was no significant correlation between the effort invested in listening to the speaker and the achieved learning outcomes. Thus, there was no evidence for the causal chain assumed by the SAT.

One alternative explanation for the speaker/gender effect was based on individual preferences for a specific speaker. However, the findings of Experiment 2 with individually chosen speakers provided no support for this assumption.

Another interpretation was proposed based on the gender schema theory. According to this interpretation, it can be assumed that receiving instructional explanations in mathematics presented by a female speaker is incongruent with the gender schema, because mathematics is mostly seen as a male domain. However, according to Bem (1981), only persons with a high availability of the gender schema should notice the schema incongruity of information, and thus information processing should be affected by it only for those persons. The data of both experiments fit this interpretation in three respects. First, in Experiment 1, where the speaker was assigned to the learner, the speaker/gender effect held for high masculine learners only. Second, in Experiment 2, where the participants’ gender schema was activated prior to learning by having them choose among speakers of different gender, the results showed a speaker/gender effect for low masculine learners as well. Third, a positive relationship between the perceived femininity of the speaker and learning outcomes was found in Experiment 2, indicating that problem-solving performance improved as the schema incongruity became more pronounced.

Although the speaker/gender effect provides evidence for the importance of social factors in learning, the relationship between cognitive, social-motivational, and performance variables has not yet been clarified. In particular, the connection between motivation and learning outcomes remains unclear and should be addressed in further studies (for more detailed strategy analyses, see, for example, Scheiter et al. 2009; Vollmeyer et al. 1997).

To sum up, the reported studies demonstrate the significance of voice features for the design of narrated animations. The obtained speaker/gender effect provides strong support for the impact of social factors in learning. Thus, the prevailing purely cognitive approaches should be augmented by social factors. For practical design considerations, the speaker/gender effect suggests using female voices for narrations accompanying dynamic visualizations irrespective of the learner’s gender, at least for a mathematical domain.