1 Introduction

The “Eliza Effect,” named after a computer program that mimicked a psychotherapist, refers to the tendency to anthropomorphize artificial intelligence (AI) (Ekbia 2008). The term alludes to Bernard Shaw’s play Pygmalion and the musical My Fair Lady, in which a Professor Henry Higgins takes it upon himself to teach Eliza Doolittle, a cockney-English speaking flower girl, to act more lady-like. Similar to Professor Higgins, who teaches Eliza how to dress, talk, and behave like a lady, developers of AI use design features to make machines more human-like in order to induce positive impressions and responses.

Soon, artificially intelligent machines with physical features and behaviors that resemble those of humans are likely to supplement and replace humans in retail and service situations and in our homes, serving as maintenance and sales staff in stores, receptionists and housekeeping personnel in the hospitality industry, nurses and caretakers in hospitals, and companions at home (van Doorn et al. 2017). Lowe’s LoweBot already helps customers find products and answers simple questions; Hilton’s robot concierge Connie answers hotel guests’ questions; and some robots (like Pepper) are being sold for private use. It is therefore important to examine the Eliza Effect for these “consumer robots,” that is, how anthropomorphism might affect consumer judgments and attitudes.

In four experiments, we demonstrate that the degree of anthropomorphism of a consumer robot, conveyed through physical appearance and behavior, affects judgments and attitudes. We focus on the two fundamental judgment dimensions of social life: psychological warmth and competence (Fiske et al. 2007). We show that making a consumer robot more human-like increases perceptions of warmth while leaving perceptions of competence largely unchanged. Yet, whereas warmth increases liking for human beings, high perceived warmth makes attitudes toward consumer robots less positive. Based on our findings, we propose that the well-known “uncanny valley” phenomenon (a feeling of uncanniness when robots become too human-like) is due to perceptions of a robot’s enhanced warmth rather than competence.

2 Theoretical background

2.1 Anthropomorphism

Epley (2018) defines anthropomorphism as “perceiving human-like traits in nonhuman agents” (p. 591). Anthropomorphism occurs naturally and automatically in response to a wide variety of stimuli (Epley et al. 2007; Epley and Waytz 2010). In marketing, consumers anthropomorphize brand characters, mascots or avatars (Kim et al. 2016; Nowak and Rauh 2008; Touré-Tillery and McGill 2015), as well as entire brands by attributing a personality and relationship to them (Aaker 1997; Langner et al. 2016; MacInnis and Folkes 2017).

People anthropomorphize non-human entities when they appear similar to humans in physical appearance or behavior. Human-like appearance evokes a human schema, and human-like behaviors lead to attributions of a “mind” (Aggarwal and McGill 2007; Delbaere et al. 2011; MacInnis and Folkes 2017). The role of appearance in anthropomorphizing objects is well known in design and the arts (DiSalvo and Gemperle 2003). A physical appearance that suggests a face or a body has also been shown to make products appear human-like (Aggarwal and McGill 2007). Regarding behavior, emotionality (the expressive display of emotions in an interaction through body movement, voice, and verbal content) seems critical for creating believable and life-like virtual characters and robots (Bates 1994). Yet, while anthropomorphized robots look as if they are more alive, they can also be perceived as a threat (Duffy 2003; Kiesler and Goetz 2002).

While prior research has demonstrated that robots can be anthropomorphized using human-like appearances and behaviors, one of the key questions for marketers has not been addressed: how do consumers perceive and judge consumer robots when they encounter them in retail and service settings, and in their homes? For example, does the degree of anthropomorphism affect the two social judgment dimensions of competence and warmth? Moreover, are robots judged more positively when they appear psychologically warmer and more competent? Answers to these questions are key for retail and service managers in deciding whether to employ robots instead of human service personnel and for robotics manufacturers in selecting robots to market to businesses and consumers.

2.2 Warmth and competence

According to Fiske et al. (2007), “warmth and competence form basic dimensions that, together, account almost entirely for how people characterize others” (p. 77). The warmth dimension, associated with traits such as caring, nice, and sociable, relates to a person’s sociability, friendliness, and trustworthiness (Fiske et al. 2007). Competence, associated with traits such as capable, competent, and skilled, relates to a person’s ability, intelligence, and skillfulness. While human warmth is a social desirability dimension, competence is a functional and utilitarian dimension (Yzerbyt et al. 2008).

How relevant are competence and warmth for judging consumer robots? Robots are programmed and designed machines that function in a certain way to achieve instrumental and utilitarian goals (Yogeeswaran et al. 2016). Therefore, competence should be a more relevant dimension than warmth. Moreover, because competence is a basic, inherent, and expected feature of robots, competence should be equally relevant, irrespective of the robot’s degree of anthropomorphism. In contrast, on the face of it, ascribing psychological warmth to a machine seems odd. However, once robots are anthropomorphized through human-like physical features or behavior, as is the case with consumer robots, consumers may find warmth increasingly relevant. Consequently, anthropomorphism should differentially affect warmth judgments for robots, depending on their degree of anthropomorphism such that more anthropomorphized robots should be perceived as warmer.

After assessing consumer robots along the two trait dimensions of competence and warmth, consumers are likely to form attitudes toward these robots. In a human context, the more competent and warmer a person is, the more positive the attitudes toward that person will be (Wojciszke et al. 2009; Wortman and Wood 2011). Similar findings have been reported for anthropomorphized brands (Kervyn et al. 2012). However, for consumer robots, we expect a more differentiated effect. For competence, we predict that attitudes will be more positive the more competent a robot is because robots are expected to be functionally capable agents. However, our prediction is different for warmth. We base our prediction on the so-called uncanny valley phenomenon: as robots become more human-like, individuals first show positive affinity toward them; however, when robots become too human-like, people experience a feeling of eeriness or uncanniness (Mori 1970; Wang et al. 2015), especially when people ascribe to them experience rather than agency (Gray and Wegner 2012). We thus expect that when consumers anthropomorphize robots and experience them as increasingly warm, in appearance or behavior, consumers view this increase in warmth as positive at first. However, once a robot becomes even warmer and thus too human-like, an uncomfortable feeling of uncanniness should set in and result in less positive attitudes.

Figure 1 summarizes our conceptual framework, illustrates the sequence of studies, and shows the visual stimuli of studies 1 and 3. The first part of the current research focuses on investigating the effect of anthropomorphism (via appearance and behavior) on warmth and competence judgments, specifically suggesting that anthropomorphism leads to a significant increase in warmth judgments (indicated by the solid line), but not competence (indicated by the dotted line). The second part of the research focuses on consumer attitudes, demonstrating that anthropomorphized robots (via appearance and behavior) are liked less when they are perceived as too warm, whereas they are liked more as their perceived competence increases.

Fig. 1 Conceptual framework and sequence of studies

3 Study 1

Study 1 was conducted to show that anthropomorphism of robots in appearance affects warmth, but not competence. We recruited 106 participants (35% female; 37% 21–30, 43% 31–40, 11% 41–50, 9% 50 and above) from MTurk to judge one of three real robots (Ethon 2, Pepper, and Erica; from left to right in Fig. 1). Per a pretest (N = 202), the robots differed in human likeness in their appearance (Ms = 1.56; 3.81; 7.13, from left to right; 0 = “not at all human-like”; 10 = “very human-like”; all ps < .001). Participants rated the robot in terms of how relevant ten human traits seemed to be (1 = “not at all relevant”; 7 = “extremely relevant”), presented in random order: five warmth-related traits (sociable, friendly, kind, likeable, and warm) and five competence related traits (competent, intelligent, skillful, efficient, and capable) (Cronbach’s alphas = 0.95 and 0.93, respectively).
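For readers less familiar with the reliability coefficients reported above, the computation of Cronbach's alpha for a multi-item scale can be sketched as follows. This is an illustrative sketch only: the function name `cronbach_alpha` and the toy ratings are assumptions made for illustration, not the study data or the authors' analysis code.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(ratings[0])
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([resp[i] for resp in ratings]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(resp) for resp in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings (not the study data): three respondents, two items.
toy_ratings = [[1.0, 2.0], [2.0, 2.0], [3.0, 4.0]]
toy_alpha = cronbach_alpha(toy_ratings)
```

Alpha approaches 1 when the items covary strongly relative to their individual variances, which is the pattern the reported values of 0.95 and 0.93 indicate for the warmth and competence items.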

A 3 (robot human-likeness) × 2 (trait dimension) ANOVA revealed two main effects and the predicted interaction. Regarding the trait dimension main effect (F(1,103) = 15.43, p < .001), competence means were higher than warmth means (Mcomp = 5.25 vs. Mwarmth = 4.62), suggesting that, as expected, competence judgments are more relevant than warmth judgments for robots. Turning to the robot human-likeness main effect (F(2,103) = 3.83, p = .025), the robot means in the low and medium groups (Mlow = 4.48 vs. Mmedium = 5.19, p = .04) and in the low and high groups (Mlow = 4.48 vs. Mhigh = 5.13, p = .07) differed considerably, but those in the medium and high groups hardly differed at all (Mmedium = 5.19 vs. Mhigh = 5.13, p = 0.8).

Most importantly, these main effects were qualified by the significant interaction, F(2,103) = 9.84, p < .001. Whereas there was no significant difference in competence ratings among the three robots (F(2,103) = 0.68, p = .51, Mlow = 5.27 vs. Mmedium = 5.44 vs. Mhigh = 5.05), warmth ratings differed significantly (F(2,103) = 10.50, p < .001, Mlow = 3.69 vs. Mmedium = 4.95 vs. Mhigh = 5.22). Specifically, the high vs. low group and the medium vs. low group differed significantly (both ps < .001); the high and medium groups did not (p = 0.4). In sum, human-likeness in the appearance of robots affected warmth but not competence judgments.
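The simple-effect tests above compare mean ratings across the three robots. A minimal sketch of such a one-way between-subjects F test follows, assuming a plain one-way ANOVA with its own error term; the text does not state which software or error term was used, and the function name `one_way_anova` and the toy scores are hypothetical. With 106 participants in three groups, the within-groups degrees of freedom are 106 − 3 = 103, which matches the reported F(2,103).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way between-subjects ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-groups sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: scores around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ratings (not the study data) for low, medium, and high human-likeness.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
toy_f, toy_df_between, toy_df_within = one_way_anova(toy_groups)
```

A large F indicates that the group means differ by more than the spread of ratings within each group would suggest, as reported for warmth but not for competence.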

4 Study 2

Holding appearance constant, we investigated how anthropomorphized behavior affects warmth and competence judgments. One hundred and twelve participants (41% female; 39% 21–30, 29% 31–40, 14% 41–50, 18% 50 and above), recruited from MTurk, participated in the experiment taking the form of a 2 (anthropomorphized behavior: high vs. low) × 2 (trait dimension: warmth vs. competence) design. Participants were randomly assigned to one of the two conditions of anthropomorphism and watched a short video of a casual conversation between a person and Nadine (a humanoid developed by a robotics team at Nanyang Technological University). The human interacting with Nadine asked the same questions and provided the same answers. We manipulated Nadine to show emotionality in the high condition and to express little or no emotion in the low condition, by using a programmed, plug-in emotionality module on top of a natural language processor (see links for videos: https://youtu.be/IqXtITebLWc, https://youtu.be/1fhWUXfzs30). In the high anthropomorphism condition (but not the low condition), Nadine moved her hands, nodded, and changed body position, used a higher pitch, and expressed her emotions more strongly in speech. After watching the video, participants responded to attention-check questions. Six participants (5%) who did not provide correct answers were removed. Then, participants indicated Nadine’s human-likeness and rated the degree to which Nadine possessed competence and warmth, using the same scales in a random order (Cronbach’s alphas = 0.92 and 0.93, respectively).

The manipulation of behavioral anthropomorphism affected human-likeness (F(1,104) = 4.65, p = .03; Mhigh = 6.05 vs. Mlow = 5.52). The 2 × 2 ANOVA revealed a marginally significant main effect of anthropomorphism (F(1,104) = 3.28, p = .07); both warmth and competence judgments were generally higher when anthropomorphism was high than when it was low (Mhigh = 5.21 vs. Mlow = 4.77). The main effect of trait was not significant (F(1,104) = 0.25, p = .62), likely due to the very human-like face of the android Nadine.

Most importantly, there was a significant interaction, F(1,104) = 11.65, p < .001: as predicted, the anthropomorphism of the robot significantly affected warmth judgments (t(104) = −2.84, p = .006), but not competence judgments (t(104) = −.48, p = .63). Whereas competence means were barely affected (Mhigh = 4.95 vs. Mlow = 5.07), participants rated the anthropomorphized Nadine as warmer than the less anthropomorphized Nadine (Mhigh = 5.34 vs. Mlow = 4.58), thus indicating the behavioral effect of anthropomorphism on warmth judgments.

5 Study 3

In study 3, we moved from perceptions of warmth and competence toward consumer attitudes. We show that, unlike humans, anthropomorphized consumer robots are liked less when they appear to be high in warmth, thus supporting the pattern of the uncanny valley hypothesis (Fig. 2).

Fig. 2 Attitudes toward consumer robots

We randomly assigned 205 MTurkers (44% female; 43% 21–30, 35% 31–40, 12% 41–50, 10% 50 and above) as part of a 3 (anthropomorphism: more human-like robot vs. less human-like robot; plus a human control condition) × 2 (trait dimensions: competence vs. warmth) between-subject design. Participants imagined that they would employ a person (or a robot) as a personal helper in their home. To become familiar with the situation described, participants saw a short video with three examples of humans or robots as helpers. Then, each participant was presented with the target picture of a human or a more (less) human-like robot (see Fig. 1). A designer rendered all the stimuli based on the same human face. Participants imagined that they would “spend approximately 3–4 hours a day with the helper.” In the competence condition, the helper was described as “capable and competent of completing specific tasks, such as cleaning and doing the laundry and ironing.” In the warmth condition, the helper was described as “warm and loves being around people. The helper loves to talk to you and your family.”

A pretest with 55 participants validated the stimuli. Participants rated the competence description as higher in competence than warmth (p < .001); conversely, the description for warmth was rated higher in warmth than competence (p < .001). Most importantly, uncanniness was significantly higher for the more human-like robot (M = 4.35, SD = 1.67) than the less human-like robot (M = 3.78, SD = 1.90) (t(54) = −2.06, p < .05) and humans (M = 2.37, SD = 1.93; t(54) = 5.09, p < .001).

In the study proper, participants rated the helper’s competence and warmth on two-item scales (intelligent and smart; kind and friendly) as a manipulation check. Participants then indicated their overall attitude toward the target on a six-item scale including the three items used in study 2 and three additional items focused on a longer-term relationship (Cronbach’s alpha = 0.92).

The manipulation check confirmed the pretest results: the manipulation of competence vs. warmth affected the respective competence (F(1,204) = 2.87, p < .05) and warmth measurements (F(1,204) = 3.90, p < .05). The 3 × 2 ANOVA on the attitude measures revealed the predicted two-way interaction (F(2,199) = 3.62, p < .05). For more human-like robots, consumer attitude was higher (F(1,64) = 6.38, p = .01) when competence traits were given (Mcomp = 5.33, SD = 1.41) compared to when warmth traits were given (Mwarmth = 4.44, SD = 1.56). However, for less human-like robots and for humans, the means were essentially the same for competence and warmth (Mcomp = 5.31, SD = 1.13 vs. Mwarmth = 5.56, SD = 1.35; F(1,81) = 0.01, p = .91 for less human-like robots; Mcomp = 4.89, SD = 1.31 vs. Mwarmth = 4.93, SD = 1.24; F(1,51) = 0.09, p = .77 for humans). In sum, as predicted, while the effect of warmth and competence replicated in the more human-like robot condition, the effect was attenuated when uncanniness was not present, suggesting that uncanniness affected the relationship between warmth and consumer attitude.

6 Study 4

Focusing on behavior rather than appearance, we conceptually replicated the results of study 3. In addition, we used uncanniness as a mediator to provide further evidence for the uncanny valley hypothesis as an explanation of the negative effects of warmth on attitudes.

We closely followed the design and procedures of the well-known Judd et al. (2005) study, adapted to our context, using a 2 (target: human vs. consumer robot) × 2 (dimensions: competence vs. warmth) × 3 (levels: high vs. med vs. low) between-subject design. That is, 484 MTurkers (41% female; 44% 21–30, 34% 31–40, 13% 41–50, 9% 50 and above) were randomly assigned to judge a human or a robot on either competence or warmth behavior at one of three levels. To select behaviors that are diagnostic of competence and warmth, we modeled our descriptions after those presented in the appendix of Judd et al. (2005). We made sure that the descriptions seemed plausible for both humans and robots. For example, to manipulate low vs. medium vs. high behavioral competence, the robot or human was described as “barely able to” vs. “able” vs. “perfectly able” to complete a task, do calculations, or communicate in language. To manipulate low vs. medium vs. high behavioral warmth, they were described as having a “basic,” “above average,” or “very high” level of sociability regarding friendliness, working with others, and consideration of the emotional needs of others. Finally, participants provided attitude ratings toward the human vs. robot targets (bad-good, negative-positive, dislike-like) and completed a measure of uncanniness (uneasy, unnerved, creeped out) (Gray and Wegner 2012). A pretest (N = 48) indicated that the stimuli descriptions were successful: consumers viewed the descriptions as indicative of low vs. medium vs. high-level manifestations of competence and warmth. On 7-point scales, all high-level descriptions for both competence and warmth had higher means than all medium-level descriptions (all p < .05), and all medium-level descriptions had higher means than all low-level descriptions (all p < .05).

A 2 × 3 × 2 ANOVA revealed a significant main effect of target (Mhuman = 5.10 vs. Mrobot = 4.63; F(1,472) = 14.25, p < .001), of the levels (Mlow = 3.71 vs. Mmed = 5.40 vs. Mhigh = 5.47; F(2,472) = 89.11, p < .001), and of trait dimensions (Mcomp = 4.71 vs. Mwarmth = 5.00; F(1,472) = 5.50, p = .05), as well as a significant two-way interaction of trait dimension by levels (F(2,472) = 6.02, p < .01); the two remaining two-way interactions were not significant (p > .18). However, all of these effects were qualified by the marginally significant three-way interaction (F(2,472) = 2.50, p = .08).
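As a quick consistency check, the denominator degrees of freedom reported across the four studies can be reproduced from the sample sizes and the number of between-subject cells. The sketch below assumes that the trait dimension was measured within participants in studies 1 and 2 (as the procedure descriptions suggest) and that no participants were excluded beyond those mentioned in the text; the helper name `error_df` is hypothetical.

```python
from math import prod


def error_df(n_participants, *between_levels):
    """Denominator df of a between-subjects F test: N minus the number of cells."""
    return n_participants - prod(between_levels)


study1_df = error_df(106, 3)        # three robots; trait dimension assumed within-subject
study2_df = error_df(112 - 6, 2)    # six participants removed after the attention check
study3_df = error_df(205, 3, 2)     # 3 x 2 between-subject design
study4_df = error_df(484, 2, 2, 3)  # 2 x 2 x 3 between-subject design
```

Under these assumptions the values are 103, 104, 199, and 472, in line with the denominators of the F statistics reported for studies 1 through 4.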

As shown in Fig. 3, for humans, for both competence and warmth, attitudes were understandably less positive at low levels than at medium and high levels (Mlow = 3.94 vs. Mmed = 5.11 vs. Mhigh = 5.55 for competence; Mlow = 3.69 vs. Mmed = 4.95 vs. Mhigh = 5.22 for warmth). The means for low levels differed significantly from those for medium and high levels (t(83) = 6.60, p < .001, t(83) = 6.75, p < .001, respectively, for competence; t(78) = 3.97, p < .001, t(78) = 5.22, p < .001, respectively, for warmth) but did not differ for medium vs. high levels (t(81) = 1.41, p = .16 and t(83) = .30, p = .97, respectively). Consumers thus prefer humans to be competent and warm up to a certain level, after which there is only a slight, non-significant increase in attitudes toward those humans. In addition, attitudes were more positive for warmth than competence at the medium level (t(77) = 3.08, p < .01), but not at the low and high levels (t(75) = .28, p = .78 and t(87) = 1.44, p = .16, respectively).

Fig. 3 Mediation effect of uncanniness

In contrast, for consumer robots, the higher the level of competence, the more positive the attitudes (Mlow = 3.36 vs. Mmed = 4.83 vs. Mhigh = 5.18), with each mean significantly different from each other mean (t(116) = 5.05, p < .01 for low vs. medium; t(116) = 7.30, p < .001 for low vs. high; and t(116) = 2.30, p < .05 for medium vs. high levels). Thus, there was no plateau level: more competence was generally more valued for robots. However, for warmth, as we had predicted based on the uncanny valley, attitudes increased from low to medium levels but then decreased from medium to high levels (Mlow = 3.52 vs. Mmed = 5.72 vs. Mhigh = 4.84). All comparisons were significantly different (t(117) = 6.73, p < .001 for low to medium; t(117) = 2.73, p < .01 for medium to high; and t(117) = 4.38, p < .001 for low to high). In addition, overall consumer attitudes were more positive for warmth than competence at the medium level (t(71) = 2.90, p < .01) but less positive at the high level (t(81) = −2.37, p < .05); attitudes did not differ at the low level (t(81) = .49, p = .62).

Importantly, uncanniness mediated the relationship between warmth and attitude toward robots. To test this mediation effect, we employed a bootstrapping method (Hayes 2012). The indirect effect of warmth on attitude through uncanniness was significant: the 95% confidence interval [−0.27, −0.01] did not include 0.
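To illustrate the logic of a bootstrapped mediation test, the following sketch estimates the indirect effect (the path from warmth to uncanniness multiplied by the path from uncanniness to attitude, controlling for warmth) and builds a percentile confidence interval by resampling participants. It is a plain percentile bootstrap written for illustration, not the Hayes (2012) procedure itself; all function names, the number of resamples, and the synthetic data are assumptions and do not reproduce the study data.

```python
import random


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def indirect_effect(x, m, y):
    """a*b: a is the slope of m on x; b is the slope of y on m controlling for x."""
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx
    b = (smy * sxx - sxy * sxm) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_boot=2000, seed=0):
    """Percentile bootstrap interval (roughly 95%) for the indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            estimates.append(indirect_effect([x[i] for i in idx],
                                             [m[i] for i in idx],
                                             [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample without variance; draw again
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]


# Synthetic data (not the study data): warmth raises uncanniness, which lowers attitude.
data_rng = random.Random(1)
warmth = [data_rng.gauss(0, 1) for _ in range(100)]
uncanny = [w + data_rng.gauss(0, 0.5) for w in warmth]
attitude = [-u + data_rng.gauss(0, 0.5) for u in uncanny]
ci_low, ci_high = bootstrap_ci(warmth, uncanny, attitude)
```

Mediation is supported when the interval excludes zero; a negative interval, as in the reported result, indicates that greater warmth lowers attitudes by way of increased uncanniness.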

7 General discussion

7.1 Theoretical contributions

Given the rapid emergence of consumer robots in the marketplace, this project investigated how consumers judge and evaluate robots in commercial situations. Overall, results showed that consumers are subject to the Eliza Effect: they systematically attribute human psychological characteristics to consumer robots based on their appearance and behavior, an attribution that does not seem rationally justifiable for a machine. Interestingly, anthropomorphism leads to differential effects on the two fundamental judgment dimensions of social life: anthropomorphizing robots affects warmth but not competence judgments. Moreover, high perceived warmth makes attitudes toward social robots less positive, due to a feeling of uncanniness, a finding that is opposite to the effects of warmth on attitudes toward human beings.

The current research contributes to three research streams: artificial intelligence (AI), anthropomorphism, and the uncanny valley. AI research focuses mostly on non-embodied programs, studying phenomena like consumer trust and algorithm aversion (Dietvorst et al. 2018). The present project is the first that focuses on the effects of the physical embodiment of AI, namely robots, on judgments and attitudes. While prior research showed that robots can be anthropomorphized by manipulating appearance and behavior (Bates 1994; Kiesler and Goetz 2002), we demonstrate that anthropomorphism has differential effects: it changes perceptions of warmth, but not competence, and heightened warmth in turn leads to less positive attitudes. Moreover, we extend prior research on anthropomorphism in marketing, showing that different types of consumer reactions (specifically, a feeling of uncanniness) seem to be at play for robots than for anthropomorphized products and brands (e.g., Aggarwal and McGill 2007; Kim et al. 2016). Finally, the present research contributes to research on the uncanny valley phenomenon that occurs when robots seem too human-like. Various explanations for the uncanny valley phenomenon have been proposed (Wang et al. 2015; MacDorman and Ishiguro 2006), albeit rarely tested, including mortality salience (“human replicas may remind humans of death”) and evolutionary esthetics (“selection pressures have shaped human preferences for certain physical attributes”). Our results are most consistent with the mind perception hypothesis, which states that feelings of uncanniness result from attributing a mind to robots (Gray and Wegner 2012). We add specificity to this explanation by showing that it is not mind attribution in general but the emotional trait of warmth (rather than the cognitive trait of competence) that evokes uncanniness.

7.2 Future research

We have concentrated on examining reactions to robots along the two most fundamental social judgment dimensions and along general consumer attitudes. Future research should determine how anthropomorphized robots might also affect other pertinent judgments and attitudes. We would expect that anthropomorphism has positive effects on a wide variety of emotional judgments (e.g., empathy, responsiveness, and connection) but that it would affect cognitive, utilitarian judgments (e.g., credibility, usefulness, and impact) much less because the former are less expected from robots and thus more malleable. Similarly, we would expect the so-called “affective” component of attitudes to be more impacted than the “cognitive” component. It may be instructive to study what functions attitudes toward consumer robots serve. Following Katz’s (1960) classic functional theory of attitudes, future research should study to what degree attitudes toward consumer robots serve utilitarian, socially adaptive, value-expressive, and ego-defensive functions. Based on our finding that robots that act too warm are viewed as eerie and uncanny, we expect that attitudes toward consumer robots may be largely ego-defensive, perhaps even “species-defensive.”

In addition, future research should address the underlying features of human-likeness and how they may determine impressions of competence and warmth. In terms of appearance, such features may include the human-likeness of different body parts (such as legs and arms) and the head and face as well as proportion and symmetry; in terms of behaviors, they may include certain gestures, movements, and voice qualities. Indeed, research has shown that certain features lead to higher human-likeness than others (Phillips et al. 2018); yet, more research is needed to understand what exactly makes a robot human-like. Moreover, in service contexts, future research should examine the possible impact of warmth perceptions on competence perceptions rather than treating warmth and competence as two independent dimensions. Finally, research should delve further into the similarities and differences between anthropomorphizing robots and anthropomorphizing products or brands. For example, do the brand-personality dimensions (Aaker 1997) or the big five personality traits (Norman 1963) apply to consumer robots, and what would trigger such personality perceptions?

Finally, from a broader perspective, it will be worthwhile examining whether an opposite process to humanness and anthropomorphism—say, a process of “machine-ness” and “technologism”—may equally be at play in robot perceptions and, for example, be relevant for the marketing of industrial robots that increasingly work together with humans on assembly lines. Such an opposite process may affect competence while keeping warmth constant. That is, an industrial robot may be judged as more competent (but not as more or less warm) when it emphasizes machine-like appearance factors, such as exposed wiring and blinking screens, or machine-like behaviors, such as rotations and detailed handcraft skills, just as anthropomorphism, induced by human-like features and behaviors, affected warmth (but not competence) in our studies.

7.3 Practical implications

As consumer robots are set to take on the roles of humans, and even replace them, a challenge for firms is which robots to employ in stores and service situations and to market to consumers for use in homes. Prior work has suggested that human-like robots are valuable because the overall service experience will be more positive and increase customer satisfaction (van Doorn et al. 2017). That is, just as Eliza, the psychotherapist computer program, made people feel better 50 years ago, consumers of the future may feel better being serviced by Erica or Nadine, Eliza’s modern-day robot equivalents. However, our research has added an important caveat to this suggestion: managers need to make sure that consumer robots are not made too human-like because consumers may not enjoy being serviced by such a robot. We suggest that managers and researchers collaborate to further illuminate the puzzle of Eliza in the uncanny valley by determining the optimal level of anthropomorphism for consumer robots.