1 Introduction

Android robots are gradually entering our social lives. In recent years, the public has widely accepted social robots. Their applications range from social services to children’s education [1], medicine, and many other fields [2]. Humans can interact effectively with toys that bear little resemblance to themselves. However, the more human-like humanoid robots become in shape and behavior, the worse the harmony between humans and robots becomes. Robots with high levels of human-likeness may cause fear, alienation, and discomfort in humans [3]. This phenomenon is known as the “Uncanny Valley”. It affects people's understanding and acceptance of robots as social partners and impacts people's conscious assessment of their reactions in social interactions between humans and robots [4, 5].

Non-verbal behaviors are vital channels for robots to express emotions [6], transmit information, and communicate meaningfully with humans [7]. In particular, the eye system plays a critical role in human–robot interactions involving the active vision system of robots [8]. Kozima [9] suggested that the key to building an empathic robot is enabling it to perform two functions: maintaining eye contact and engaging in joint attention. Eye contact or gazing has significant effects on interpersonal communication. It is a significant psychological factor in non-verbal interactions [10, 11]. Eye language (e.g., gaze and glance) has been widely used in human–robot interactions [12, 13]. Pupillary change (PC) has become an important means of achieving desired expressions and communication effects [14, 15].

Currently, PC applications have broadened from human-to-human communication to human-to-computer interactions. First, PCs can be used as input signals for control and commands. For example, humans can “write with the mind” by tagging pupils with attended luminance-flickering objects [16, 17], and PC is an effective means of communication for patients with locked-in syndrome [18]. Second, PCs can also be used as output signals to enhance emotional communication. For example, a pupil response system using hemispherical displays has confirmed that a dilated pupil improves emotional conveyance compared with a non-dilated one [19].

As social robots and humans begin to coexist and work cooperatively, natural human–computer interaction, with implicit communication channels and a degree of emotional intelligence, becomes increasingly important [8]. Furthermore, social robots are expected to behave emotionally [20] and in a human-like manner, bringing pleasant and comfortable experiences to users [21]. Therefore, it is important to investigate whether PC use helps improve emotional conveyance between humans and robots.

2 Literature Review

Robots have been gradually anthropomorphized in physical shape, behavior, and interaction with humans [22, 23] due to the application of human psychology and technologies such as artificial neural networks. People sympathize with and even trust robots that can recognize their emotional states and respond fully [24]. A certain degree of human familiarity with robots must be maintained to avoid unnatural feelings when a robot is designed to represent a living human [25, 26]. Notably, the Uncanny Valley affects people's recognition of robots as social partners [5]. A cluster analysis of 40 different humanoid robots found that a high-level humanoid (or low-mechanized) robot could even be threatening to humans [26]. People's judgments about the affinity and friendliness of robots are influenced by the similarity between robots and humans [27]. In particular, the Uncanny Valley effect occurs when the resemblance of a robot to humans approaches, but does not quite reach, an exact likeness [5]. This phenomenon mainly originates in humans’ instinct to protect themselves from proximal dangers (such as corpses and other species and accessible entities), fear of inanimate objects, and aversion to unhealthy bodies and even disease [4, 28], as well as humans’ ability to empathize [29]. Mathur et al. [27] estimated the curve of the Uncanny Valley using a novel, validated corpus of 182 images of real robot and human faces. In the interactions between users and embodied agents, affect-based trust toward artificial agents strengthened Uncanny Valley impressions [30]. In this case, slight imperfections of robots could be deeply disturbing, even distasteful [31].

The Uncanny Valley also happens in nonverbal behavioral interaction between humans and robots, such as face tracking, gaze shifting, nodding, and gestures [32]. Moving away or approaching an individual who has expressed a certain emotion can increase robot acceptance [33]. Social robots could express emotions through various modalities, such as facial expression, body posture, movement, and voice [34]. People might understand a robot's expression and emotion without a fully expressive human-like body, movable torso, legs, or arms [35, 36].

Research on PC applications with robots has attracted researchers’ attention in recent years [37, 38]. PC can be used as a cue in competitive interactions [38], and it is a potential source of information in emotional and decision-making contexts during interpersonal interactions [38]. Human pupils respond to light and the environment, reflect a mood, and are affected by strong arousal stimuli [39]. Pupil sizes can reflect emotions [15]. However, pupil manipulations may result in trust risk [40]. In addition, humans process only socially relevant cues by monitoring others’ pupil size automatically and unconsciously [15, 38, 41, 42]. Significantly, diminishing pupil size enhances emotional intensity ratings and valences for sad but not happy or neutral expressions [38, 41]. Pupil size ranges approximately from 2 to 4 mm in daylight and from 4 to 8 mm in darkness [43]. Otherwise, dramatically changed pupil size frequently indicates abnormal conditions such as disease [44, 45] and drug use [46]. For robots, findings on PC are more favorable than for humans. For example, the movement of a robot’s eyelid and eyeball system can effectively express emotions [8]. Some studies demonstrated that dilated pupils and a laughing response effectively enhanced empathy [37]. A hemispherical robot PC could convey emotion without other facial expressions or body movements [19]. A pet-robot with a PC that dilates 2.0 times in diameter along with body contact could enhance cuteness and familiarity and cause little sense of oddness or uncanniness [47].

However, it is not clear whether PCs applied to the eyes of robots affect humans’ feelings of familiarity and emotions, or whether the human-likeness level influences the emotional effect. Therefore, in this study, we evaluated how agents with different human-likeness levels, with and without PC, affect humans’ feelings of familiarity and the conveyance of emotional states.

3 Hypotheses

The emotional conveyance of PC has proved effective on specific robots [48]. Meanwhile, conveying human emotion with a robot is related to its human-likeness level according to the theory of the Uncanny Valley [5, 27, 35]. Therefore, we assume that the PC of agents, including robots with different human-likeness levels and humans, as a visually and socially salient event, results in varying effects on familiarity and emotional conveyance.

H1

The feeling of familiarity is better for PC of an agent with a lower human-likeness level.

H2

Emotional conveyance is better for PC of an agent with a lower human-likeness level.

The independent variables were PC of agents and levels of human-likeness, and the dependent variables were the feeling of familiarity and emotional valence. Familiarity was based on existing social models, which have been proven useful in understanding robots’ behavior to make complex behaviors familiar and understandable when observing them [49, 50]. Emotional conveyance was measured by valence, which reflects the mental state and signals to the observer in a specific situation in face-to-face communication [51]. Positive and negative emotions in interpersonal relationships are often regarded as indicators of the quality of social interaction [52].

4 Methodology

Images of five agents, one human and four typical robots with different human-likeness levels, were used in the experiment. First, participants communicated with the eye area of the five agents with and without changing pupil sizes through a video. The emotional effect was then evaluated by measuring the indices of dependent variables.

4.1 Participants

A total of 62 college students, 31 males and 31 females, with ages ranging from 18 to 30 years (mean = 23.75, SD = 1.8), participated in the study. Thirty-one of them were in the experimental group (agents with PC), and 31 were in the control group (agents without PC). According to their self-report, they had normal cognition, communication, vision, and hearing abilities. Each had vision or corrected vision above 0.8 and had sufficient rest before the experiment to avoid fatigue and anxiety. The experiment could be terminated immediately if a participant felt uncomfortable. All participants signed a written informed consent form prior to the experiment.

4.2 Preparation

The experiment was performed in a laboratory. The experimental program was built on Wenjuanxing, a professional online platform for questionnaire surveys, evaluation, and voting. The participants were asked to use a 10.2-inch tablet with a screen refresh rate of 60 Hz and a pixel resolution of 2160 × 1620 and to ensure that they were in a quiet room with an environmental illumination equivalent to indoor lighting on sunny days. The monitor needed to be free from glare. The elevation angle of the screen was 100°, and the angle from the participant’s eyes to the screen was 30°. The participants sat approximately 0.7 m away from the display, and they were encouraged to maintain a fixed distance and stay comfortable throughout the experiment. If any participants required eyeglasses, they wore them throughout the experiment. We checked the audio and video devices before the experiment to ensure they worked.

4.3 Stimuli

According to Mathur’s mechano-human spectrum [27], four existing typical humanoid robots, namely Smart, NAO, FLOBI, and SAYA, and one actual human face image were selected (Fig. 1). The natural iris colors of NAO, FLOBI, and the human are blue, with the pupils in black, while the colors of the iris and pupil of Smart are the opposite: black and blue. The iris and pupil of SAYA were originally brown and black but were changed to blue and black to avoid the potential impact caused by the color difference among the images. In addition, the color blue has no cultural bias among the participants. Finally, the faces and pupil sizes of the robots and the human were modified to represent the PCs.

Fig. 1 Four faces of humanoid robots with different human-likeness levels and a human face

Pupil size generally ranges from 2 to 8 mm [53]. According to a previous experiment using hemispherical displays, exaggerating the pupils to 1.5 times their normal size effectively enhances emotional conveyance in the pupil response system [19]. Therefore, in the experiment, the pupils dilated to 1.5 times their original diameter and constricted back to their initial size in four seconds, animated by Adobe Photoshop. For example, the original size of the human pupil was 22 px × 22 px, the dilated size was 33 px × 33 px, and the image size was 474 px × 270 px. Before the video test, whole-body pictures of the robots were shown to the participants to familiarize them with the appearance of humanoid robots. Figure 2 shows the eye areas presented to the participant during the video test.

Fig. 2 The pupils used in the experiment

Video stimuli were used to simulate the interaction with practical agents and to avoid the potential influence caused by other parts of the robot or human body. A human–computer interaction environment was built by playing five 27 s videos with vocal sounds to the participants, because interactions between a robot and a human are usually multi-channel, including visual and auditory communications [7, 54]. The videos consisted of pictures of the eye areas of all agents and the voice of an introduction to a robot dubbed by SIRI (Speech Interpretation & Recognition Interface) instead of a natural voice, because the intonation of a natural voice generated by humans may cause a fluctuation of emotions [55]. The introduction to a robot was delivered in a neutral voice for each agent; its content, identical for each video, was quoted from Wikipedia in English and contained no emotion-arousing words. The purpose of the dubbed voice was to help cover potential environmental noise that might distract the participants’ attention. The videos were produced by the authors using Adobe Premiere software. Two different five-video series were played to the two participant groups, respectively. The experimental group watched five videos of agents with PC. Five videos without PC were shown to the control group. In each video for the experimental group, the pupil diameter of the agent’s eyes expanded linearly to 1.5 times the normal size within five seconds, then stayed dilated for 17 s, and finally constricted linearly back to the initial size within five seconds. The background luminance was 50 cd/m².
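The dilation profile described above (5 s linear increase, 17 s plateau, 5 s linear decrease) can be sketched as a piecewise-linear function of time. This is only an illustrative sketch: the function name `pupil_scale` and its parameter names are hypothetical, and the 22 px base diameter is the human-pupil example quoted earlier in this section, not a value used for every agent.

```python
def pupil_scale(t, ramp=5.0, hold=17.0, peak=1.5):
    """Pupil diameter at time t (seconds), as a multiple of the initial diameter.

    Assumed profile: linear dilation over `ramp` s, plateau for `hold` s,
    then linear constriction over `ramp` s (27 s in total by default).
    """
    total = 2 * ramp + hold
    if t < 0 or t > total:
        raise ValueError("t lies outside the video duration")
    if t < ramp:                        # linear dilation phase
        return 1.0 + (peak - 1.0) * t / ramp
    if t <= ramp + hold:                # plateau at the dilated size
        return peak
    # linear constriction back to the initial size
    return peak - (peak - 1.0) * (t - ramp - hold) / ramp


BASE_DIAMETER_PX = 22  # human pupil example from the text; other agents differ

if __name__ == "__main__":
    for t in (0, 2.5, 5, 22, 24.5, 27):
        print(f"t = {t:>4} s  diameter = {BASE_DIAMETER_PX * pupil_scale(t):.1f} px")
```

Because the diameter scales by 1.5, the visible pupil area at the plateau is 1.5² = 2.25 times the initial area.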

4.4 Experimental Procedure

In the experimental group, thirty-one participants were asked to communicate with five videos with PC individually. First, the participants were shown the full-body images of the four robots for about one minute to get a complete picture of all the robots. Second, to simulate a stimulus applied to an agent, each participant was required to tap the agent’s face on the screen to start the video. The agent began to speak the introduction when the screen was tapped. The five videos were played to the participants in random order. After each video, the participants were asked four questions about the feeling of familiarity and one about the emotional reaction to the video. Each question provided five options for the participants to choose from to indicate a specific rating by typing “1/2/3/4/5” (Fig. 3). Their answers were automatically recorded by Wenjuanxing. Afterward, the participants could press any key to start the next video. Figure 3 shows a portion of the experimental process. The process was completed in approximately five minutes by each participant.

Fig. 3 A portion of the experimental process

In the control group, the other 31 participants were asked to communicate with five videos without PC attentively. They were then asked to score semantic scale questions after viewing each video. The questions were the same as in the experimental group.

4.5 Measurement

4.5.1 Feeling of Familiarity

The familiarity measurement model of humans’ reaction to PC of agents was established by applying the social model of the Uncanny Valley [4, 56]. The model structure is shown in Fig. 4 according to the connotation of the Uncanny Valley theory. This model analyzed the feeling of familiarity when observing an agent with changing pupils with four psychological measurements (Table 1):

  • \({x}_{1}\): indicating feelings about the animation level of the objects, with grades ranging from “inanimate (1)” to “animate (5)”.

  • \({x}_{2}\): indicating feelings about the normality level, with grades ranging from “eerie (1)” to “normal (5)”.

  • \({x}_{3}\): indicating the level of understanding of social cues, with grades ranging from “incomprehensible (1)” to “comprehensible (5)”.

  • \({x}_{4}\): indicating the emotional empathetic resonation level, with grades ranging from “not evocative (1)” to “evocative (5)”.

Fig. 4 Measurement model of feeling of familiarity

Table 1 Five-level semantic differential scale for psychological measurements of the feeling of familiarity

In the experimental group and the control group, the questions, in Chinese, to be rated from “1” to “5”, were translated as follows.

  • Do you think the agent is animate or inanimate?

  • Do you think the agent is weird or normal?

  • Do you think the video is comprehensible or incomprehensible?

  • Do you think the agent is talking with you or not?

Table 1 lists the semantics of each scale.

Each psychological factor weighs differently on the feeling of familiarity with agents. Thus, the fuzzy analytic hierarchy process method [57] was adapted to calculate the scale weights. Weights were assigned according to the strength of the correlation and importance between different mental quantities and perception. The fuzzy judgment matrix was obtained by comparing the internal factors in pairs:

$$ A = {(a_{ij})}_{n\times n}, $$
$$ a_{ii} = 0.5,\quad i = 1, 2, 3, 4; $$
$$ a_{ij} + a_{ji} = 1,\quad i, j = 1, 2, 3, 4, $$
(1)

where \({a}_{ij}\) expresses the importance of factor \({x}_{i}\) relative to factor \({x}_{j}\) among the four factors \({x}_{1}\), \({x}_{2}\), \({x}_{3}\), and \({x}_{4}\). For example, \({a}_{12}\) compares \({x}_{1}\) (“animate or inanimate”) with \({x}_{2}\) (“normal or eerie”). The values of \({a}_{ij}\) are assigned according to differing degrees of importance [57]: 0.5 means \({x}_{i}\) is as important as \({x}_{j}\), 0.6 means \({x}_{i}\) is slightly more important than \({x}_{j}\), 0.7 means \({x}_{i}\) is relatively more important than \({x}_{j}\), 0.8 means \({x}_{i}\) is obviously more important than \({x}_{j}\), and 0.9 means \({x}_{i}\) is extremely more important than \({x}_{j}\), while 0.1–0.4 indicate decreasing importance in a parallel fashion.

Participants were asked to rate the psychological factors in pairs based on their degrees of importance and relevance. A more important psychological factor should be given a higher score than the factor it is compared with. The sum of the two paired values, \({a}_{ij} + {a}_{ji}\), should equal one.

Therefore, the weight (\({W}_{i}\)) and feeling of familiarity T were calculated as follows:

$$ W_{i} = \frac{\sum_{j=1}^{n} a_{ij} + \frac{n}{2} - 1}{n(n-1)},\quad i = 1, 2, 3, 4 $$
(2)
$$ T_{pc} = \sum_{i=1}^{n} X_{i} W_{i},\quad n = 4 $$
(3)

where \({W}_{i}\) represents the weight of each factor, \({X}_{i}\) represents the score of each factor, and \({T}_{pc}\) represents the final score of the feeling of familiarity, with pc = 1 if there was a PC and pc = 0 if there was no PC.

The calculation results revealed that \({W}_{1} = 0.23\), \({W}_{2} = 0.28\), \({W}_{3} = 0.21\), and \({W}_{4} = 0.28\).
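As an illustration of Eq. (2), the following minimal sketch computes the weights from a fuzzy complementary judgment matrix. The matrix `EXAMPLE_MATRIX` is hypothetical and chosen only so that it satisfies the constraints in Eq. (1); it is not the matrix obtained from the participants, so its weights merely resemble, and do not reproduce, the reported values. The function name `fahp_weights` is likewise an assumption.

```python
def fahp_weights(a):
    """Weights from a fuzzy complementary judgment matrix, following Eq. (2).

    `a` is an n x n list of lists with a[i][i] = 0.5 and a[i][j] + a[j][i] = 1.
    """
    n = len(a)
    for i in range(n):
        if len(a[i]) != n:
            raise ValueError("matrix must be square")
        for j in range(n):
            # complementarity also enforces a[i][i] == 0.5 on the diagonal
            if abs(a[i][j] + a[j][i] - 1.0) > 1e-9:
                raise ValueError(f"a[{i}][{j}] + a[{j}][{i}] must equal 1")
    return [(sum(row) + n / 2 - 1) / (n * (n - 1)) for row in a]


# Hypothetical pairwise judgments for x1..x4 (illustration only).
EXAMPLE_MATRIX = [
    [0.5, 0.4, 0.6, 0.4],
    [0.6, 0.5, 0.7, 0.5],
    [0.4, 0.3, 0.5, 0.3],
    [0.6, 0.5, 0.7, 0.5],
]

if __name__ == "__main__":
    weights = fahp_weights(EXAMPLE_MATRIX)
    print([round(w, 3) for w in weights])
    print("sum of weights:", round(sum(weights), 6))
```

For any matrix that satisfies Eq. (1), the entries sum to n²/2, so the weights from Eq. (2) always sum to one; the reported weights (0.23 + 0.28 + 0.21 + 0.28 = 1.00) are consistent with this property.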

The calculation of \({X}_{i}\) in Eq. (3) was based on participants’ ratings of the perceptual indexes/factors shown in Table 1. The result \({T}_{pc}\) was the sum of the scores of each factor multiplied by the corresponding weights (Eq. 3). Therefore, the \({T}_{0}\) value indicates the feeling of familiarity ascribed to agents without PC in the video, and the \({T}_{1}\) value indicates the feeling of familiarity ascribed to agents with PC.
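The weighted sum in Eq. (3) can be sketched as follows, using the four weights reported above. The ratings in the example are hypothetical, and the function name `familiarity_score` is an assumption made for illustration.

```python
# W1..W4 as reported in the text
WEIGHTS = (0.23, 0.28, 0.21, 0.28)


def familiarity_score(ratings, weights=WEIGHTS):
    """Weighted familiarity score T (Eq. 3) from four ratings on the 1-5 scale."""
    if len(ratings) != len(weights):
        raise ValueError("one rating per factor is required")
    if any(not 1 <= x <= 5 for x in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return sum(x * w for x, w in zip(ratings, weights))


if __name__ == "__main__":
    # Hypothetical ratings for x1 (animate), x2 (normal),
    # x3 (comprehensible), x4 (evocative).
    example = (4, 3, 5, 2)
    print(round(familiarity_score(example), 2))
```

Because the weights sum to one, the score stays on the same 1 to 5 scale as the individual ratings.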

4.5.2 Emotional Conveyance

The emotional conveyance of PC for agents was measured by the indexes of positive and negative responses in the experiment [52]. We used a semantic differential scale to rate the degrees in each index. The scores from negative to positive were 1, 2, 3, 4, and 5, where “1” meant the most negative and “5” meant the most positive.

In the experiment, the question was in Chinese, and the English translation was as follows:

  • Does the agent in the video make you feel negative or positive?

5 Results

Thirty-one valid responses were obtained and analyzed for each of the experimental and control groups. We used SPSS software to analyze the results.

All data were checked using the PauTa criterion test [58] to ensure that the values of each group were within the range of the mean ± 3σ, and the data passed the homogeneity of variance test. No outliers were found in the datasets. Two-way ANOVA (analysis of variance) was then used to evaluate whether there were statistical differences among agents.
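The PauTa (3σ) check can be sketched in a few lines. This is an illustrative sketch, not the SPSS procedure used in the study: the function name `pauta_outliers` and the example ratings are hypothetical, and the sample standard deviation is assumed as the estimate of σ.

```python
from statistics import mean, stdev


def pauta_outliers(values):
    """Return the values lying more than three standard deviations from the mean.

    Uses the sample standard deviation, so at least two values are required.
    """
    m = mean(values)
    s = stdev(values)
    return [x for x in values if abs(x - m) > 3 * s]


if __name__ == "__main__":
    # Hypothetical ratings on the 1-5 scale for one agent in one group.
    ratings = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]
    print(pauta_outliers(ratings))
```

On a bounded 1 to 5 scale, a rating can only be flagged when the remaining ratings are tightly clustered, which may explain why such checks often find no outliers in questionnaire data.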

5.1 Feeling of Familiarity

The scoring data for each psychological factor from the two groups were analyzed using a two-way ANOVA. The evaluation values \({T}_{0}\) and \({T}_{1}\) (see Eq. 3) were compared after each factor was assigned a weight (Table 2 and Fig. 5). Both the human-likeness level (p < 0.001, partial \({\eta }_{p}^{2}\) = 0.161) and PC (p = 0.001, partial \({\eta }_{p}^{2}\) = 0.039) had significant effects on familiarity, while the interaction effect was not significant (p = 0.797).
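For readers unfamiliar with the reported statistics, the sketch below shows how F ratios and partial η² are obtained from the sums of squares of a two-way ANOVA. It is a simplified illustration, not the SPSS analysis: it assumes a balanced design in which every observation is independent, the function name `two_way_anova` and the toy data are hypothetical, and p-values are omitted because they require the F distribution, which the Python standard library does not provide.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way ANOVA.

    cells[i][j] is the list of observations for level i of factor A
    and level j of factor B; every cell must hold the same number (>= 2)
    of observations. Returns SS, df, F and partial eta squared per effect.
    """
    a, b, n = len(cells), len(cells[0]), len(cells[0][0])
    if n < 2 or any(len(row) != b or any(len(c) != n for c in row) for row in cells):
        raise ValueError("a balanced design with >= 2 observations per cell is required")

    grand = mean([x for row in cells for c in row for x in c])
    cell_mean = [[mean(c) for c in row] for row in cells]
    a_mean = [mean([x for c in row for x in c]) for row in cells]
    b_mean = [mean([x for row in cells for x in row[j]]) for j in range(b)]

    ss_a = b * n * sum((m - grand) ** 2 for m in a_mean)
    ss_b = a * n * sum((m - grand) ** 2 for m in b_mean)
    ss_cells = n * sum((cell_mean[i][j] - grand) ** 2
                       for i in range(a) for j in range(b))
    ss_ab = ss_cells - ss_a - ss_b          # interaction sum of squares
    ss_err = sum((x - cell_mean[i][j]) ** 2
                 for i in range(a) for j in range(b) for x in cells[i][j])

    df_err = a * b * (n - 1)
    ms_err = ss_err / df_err
    effects = {"A": (ss_a, a - 1), "B": (ss_b, b - 1), "AxB": (ss_ab, (a - 1) * (b - 1))}
    return {
        name: {
            "SS": ss,
            "df": df,
            "F": (ss / df) / ms_err,
            # partial eta squared = SS_effect / (SS_effect + SS_error)
            "partial_eta_sq": ss / (ss + ss_err),
        }
        for name, (ss, df) in effects.items()
    }


if __name__ == "__main__":
    # Toy data: factor A = PC (without, with), factor B = two agents.
    toy = [[[1, 3], [3, 5]],
           [[2, 4], [6, 8]]]
    for effect, stats in two_way_anova(toy).items():
        print(effect, stats)
```

In the study itself each participant rated all five agents, so a full analysis might treat the agent as a repeated factor; the sketch ignores this and is meant only to show where the reported quantities come from.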

Table 2 Two-way ANOVA of feeling of familiarity
Fig. 5 Means of feeling of familiarity

PC affected the feeling of familiarity only for the human image, where it reduced familiarity (\({p}_{\mathrm{Human}}\) = 0.008, partial \({\eta }_{p}^{2}\) = 0.139); the effect size indicates that the measured result was strong for the current sample size [59, 60]. Familiarity with PC followed a trend similar to the Uncanny Valley pattern observed without PC (Fig. 5).

5.2 Emotion Conveyance

Two-way ANOVA was used to compare the emotional valence between two groups of agents with and without PC.

The ANOVA test in Table 3 (p < 0.001) indicates that the emotional conveyance of agents with varying human-likeness levels differed significantly. The effect size was large (partial \({\eta }_{p}^{2}\) = 0.124), indicating the measured results were strong for the current sample size [59, 60]. The interaction effect was also significant (p < 0.001). However, PC of agents alone did not affect emotional conveyance to users. Comparing the emotional valence of different agents with and without PC showed that emotions towards Smart, NAO, and SAYA were not affected by PCs. In contrast, PC on FLOBI generated more positive emotion than others (\({p}_{\mathrm{FLOBI}}\) < 0.01, partial \({\eta }_{p}^{2}\) = 0.051). PC on a human image (\({p}_{\mathrm{Human}}\) < 0.01, partial \({\eta }_{p}^{2}\) = 0.054) enhanced negative emotions (Fig. 6).

Table 3 Two-way ANOVA test of emotional valence
Fig. 6 Mean emotional valence of agents with and without PC

6 Discussion and Conclusion

This study compared the emotional effects of PC of agents, including four robots with different human-likeness levels and one human image. An experiment with two groups (with and without PC) was performed to test users’ emotional responses when observing PC subconsciously. The hypotheses H1 and H2 were supported: better feelings of familiarity and more positive emotions were observed when viewing PCs of agents with lower human-likeness levels, which aligns with the Uncanny Valley theory [4, 5], while PC of agents with different human-likeness levels did not alter the Uncanny Valley pattern. However, PC on the human image significantly reduced the feeling of familiarity and positive emotion.

First, PC applied to agents had no significant independent effect on observers’ emotions. This result differs from other research findings stating that changes in the pupil area can enhance emotional interaction [8, 19, 37, 47]. Here, a significant enhancement of emotional conveyance occurred only with FLOBI. This finding may be because the eye parts of FLOBI revealed that the robot represented a child. The uncanny feeling comes from both humans’ ability to empathize [29] and the instinct to protect oneself from dangers, fear, and disease [4, 28]. The image of a child hardly elicits threatening feelings and most likely evokes people’s empathy subconsciously. Additionally, an object’s size is a critical factor affecting users’ emotional judgment [60]. The sizes of the eye parts in this study were comparatively close to those of natural humans, while in previous studies, the diameters of the pupil, iris, and eyeball were 90 mm, 170 mm, and 250 mm, respectively [19, 37, 47]. This could be the reason why our results differed from theirs. Therefore, PC applied to robots that pose as little threat as a child could significantly enhance emotional conveyance. In contrast, PC applied to robots in other categories should be combined with other eye movements in the eye system if a significant enhancement in emotional conveyance is the aim [8].

Second, for the human image, the application of PC increased the uncanny feeling, i.e., it decreased the feeling of familiarity and positive emotion, in contrast to its effect on the robots. This aligned with previous studies indicating that pupil manipulations probably affect participants’ trust [40], and that diminishing pupil size increases negative emotion ratings [38, 41]. In addition, in real life, people usually do not consciously recognize PC in human-to-human interactions [15]. The range of human PC is quite small in daylight [43], while salient PC is often associated with abnormal conditions like disease [44, 45] and drug use [46]. Thus, PC applied to human images increased the uncanny feelings.

However, there are some limitations to this study. For example, only two aspects of the emotional effect, the feeling of familiarity and emotional conveyance, were evaluated and analyzed. Practical human–robot interaction processes are more diverse and complicated, and more indices of the emotional effect should be considered in future experiments. Moreover, this study conducted experimental research on only four typical humanoid robots and one human image. A more in-depth examination of PC in robots that are more lifelike or hardly distinguishable from humans could be conducted in future research. In addition, the PC was presented to participants through a 2D screen, and some factors may have differed from human-to-human interactions in real life, such as the viewing distance and the function that the pupil dilation/constriction process follows. In future research, the experiment could be extended to face-to-face interaction between real robots and humans. Only one human image and one robot representing a child (FLOBI) were used for comparison in this research. Thus, the conclusions that might be drawn about PC applied to human images and FLOBI may be limited and require more evidence. Future research on human faces of different races, genders, iris colors, age groups, etc., should be performed.

In conclusion, this study compared the emotional effects of agents with five different human-likeness levels, with and without PC. This contributes to the study of the emotional influence of PC of agents on humans. It showed the different effects of PC applied to humans and robots. Our work broadens the study of PC in the research field of nonverbal behaviors. The effect sizes were statistically large. Thus, the results of this study provide practical knowledge for the application of PCs in designing humanoid robot eyes, making them more expressive in human–robot interaction. PC of agents does not affect emotional conveyance independently. However, PC can significantly improve emotional expression when combined with non-threatening agents that may evoke empathy subconsciously, while more care should be taken to avoid uncanny feelings when applying PC to human images.