1 Introduction

Robot tutors are increasingly being explored as a means of delivering education to children in both dyadic [1–3] and larger group scenarios [4, 5]. However, it remains unclear how a robot should behave socially in order to maximise learning outcomes. In the education literature, the social behaviour of a teacher is often assumed. For example, Kyriakides et al. [6] consider what makes teaching effective and list how lessons are structured, how learning is assessed, how time is managed, and so on. The role of social behaviour is not mentioned; we believe that this is because it is so fundamental that it is assumed to be present. A base level of sociality can reasonably be expected when interactions occur between humans, but when the tutor is a robot, this element becomes unknown. That social behaviour is so fundamentally assumed in teaching highlights it as an important element to resolve.

Various researchers have begun to address certain aspects of social behaviour for educational contexts in human–robot interaction (HRI). Gordon et al. consider the impact that the curiosity of a robot may have on reciprocal curiosity of a child and their subsequent learning of words. The human–human interaction (HHI) literature predicts an increase in learning as curiosity increases; however, this finding was not replicated with robots [1]. Saerbeck et al. also consider language learning with a socially supportive robot, where the socially supportive robot leads to more retention than a robot without this social behaviour [7].

Personalisation of interactions has been explored in health education for children with diabetes. In a dyadic interaction with a robot, the robot would ask the child for various items of personal information (name, favourite sports and favourite colours) and use them during the interaction [8]. There were indications that the personalised robot enhanced children’s perceived enjoyment of learning, although too few subjects took part to draw conclusions about learning effects. Other authors have personalised human–robot interaction in learning contexts through manipulating the timing of lessons [9], or through setting personalised goals [10]. However, this becomes more about teaching strategy and does not help to generate lower-level social behaviour.

Personalisation has also been incorporated into larger scale social behaviour changes in interactions where children learn about prime numbers [2]. A surprising result was found where a robot designed to be ‘more social’ did not lead to learning gains, whereas children interacting with a ‘less social’ robot did experience significant learning gains. Such labelling raises questions about how HRI should characterise sociality: what constitutes being more or less social, and how can this be measured and expressed in experimental reports? This is an important issue to resolve to ease the understanding and interpretation of results, and for comparisons to be made between studies, often in differing contexts.

This paper seeks to explore one way in which sociality might be characterised for HRI: nonverbal immediacy. The elements of nonverbal immediacy are broken down into individual cues (such as gaze, gesture, and so on) and considered for use in an educational context, before being brought back together into an implemented behaviour to evaluate whether the concepts hold true in practice with robots. The rest of this paper is structured as follows. First, the social context of learning and the concept of nonverbal immediacy are introduced (Sect. 2). Nonverbal immediacy will then be considered in terms of the component social cues by which it is measured; the effect of each social cue on learning will be explored from both an HRI and an HHI perspective (Sect. 3). This will culminate in a set of guidelines for robot social behaviour during educational interactions (Sect. 5). These guidelines are used as a basis for an evaluation in which nonverbal immediacy is measured and compared to recall. The study uses a 2 × 3 design, comparing nonverbal immediacy scores and recall between children and adults, depending on whether they have seen a high immediacy robot, a low immediacy robot or a human reading a short story (Sect. 6). A discussion of the potential benefits and limitations of this approach will be carried out (Sect. 7), with the suggestion that nonverbal immediacy is a useful means of characterising and devising social behaviour for robot tutors.

2 Sociality, Immediacy and Learning

It has long been posited that the role of society and social signals are of great importance in teaching and learning, most notably in Bandura’s Social Learning Theory [11] and Vygotsky’s Social Development Theory [12]. The importance of social signals is apparent from a young age, with social cues playing a role in guiding attention and learning [13]. However, we still have relatively little understanding of what impact combinations of multimodal social cues have on learning in complex settings [14]. Correspondingly, we don’t seem to be able to correctly identify highly effective teaching when we see it, raising questions about how to define what effective teaching consists of [15].

Fig. 1 A depiction of the role of social interaction for an individual, with two possible outcomes: social performance and learning performance (adapted from Kreijns et al. [16])

Social interaction can be considered as the bond between cognitive processes and socio-emotional processes [16]. The outcome of such interaction can be measured through social performance or learning performance, either of which can in turn reinforce the cognitive or socio-emotional processes taking place in an individual (Fig. 1). This concept is supported through definitions of learning, which can be broken down into ‘affective’ and ‘cognitive’ learning [17]. Social interaction has the ability to influence both of these learning elements, and indeed HRI researchers have sought to do just this. Some researchers have focussed on the social behaviour of the robot with the aim of influencing cognitive processes [18], whereas others have sought to influence the socio-emotional processes to a greater extent [19].

Many studies considering the impact of social behaviour use a human expert or model in order to inform the behavioural design for a largely autonomous robot, for example [2, 20]. Additionally, many studies only vary a limited set of social cues, often to tightly control the experimental conditions [21–23]. Whilst these approaches allow us to learn about the impact of some social behaviour on learning, there are many difficulties in comparing between studies as there is no common metric for the overall social behaviour of the robot. It is also unclear what would happen when multiple social cues are modified together; it seems plausible that the effects found from single cue manipulation would be additive, but there is evidence to suggest that humans do not process social cues in this manner [24]. A means of characterising social behaviour across multiple contexts would therefore provide a great advantage to the field for making cross-study comparisons.

One possible concept for making such social characterisations is nonverbal immediacy. Immediacy can be defined as “the extent to which communication behaviours enhance closeness to and nonverbal interaction with another” [25], with closeness referring to both proximity and psychological distancing. Nonverbal immediacy is a measure of nonverbal behaviour which indicates a “communicator’s attitude toward, status relative to, and responsiveness to” an addressee [25]. Richmond et al. [26] developed a highly reliable questionnaire to measure nonverbal immediacy in communication contexts. The resulting ‘Nonverbal Immediacy Scale-Observer Report’ is freely available online and incorporates the following social cues into a single measure: gestures, gaze, vocal prosody, facial expressions, body orientation, proximity, and touch.

Nonverbal immediacy emphasises the multimodal nature of interaction and the consideration of all social cues taken in context with respect to each other. The measure provides a characterisation of ‘sociality’ which can then be correlated against an outcome, such as learning, and compared against another set of behaviour characterised in the same manner. It has found extensive application in educational research, most often in university lecture scenarios [27].
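
As a minimal sketch of how such a characterisation could be related to an outcome, the following Python function computes a Pearson correlation coefficient between per-participant immediacy scores and an outcome such as recall. The function name and the example values are assumptions made for illustration; the values are synthetic and are not data from any study discussed here.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


# Synthetic placeholder values only (NOT study data): hypothetical immediacy
# scores and recall scores for four observers.
immediacy_scores = [1.0, 2.0, 3.0, 4.0]
recall_scores = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(immediacy_scores, recall_scores)
```

Because both behaviours being compared would be scored on the same scale, coefficients obtained in this way could in principle be compared across studies, which is the advantage argued for above.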

When reviewing the literature surrounding nonverbal immediacy it is important to make the distinction between ‘affective learning’, ‘cognitive learning’ and ‘perceived cognitive learning’. Affective learning considers constructs such as attitudes, values and motivation towards learning [28]. Cognitive learning typically focusses on topic specific knowledge and skills [29]. Perceived learning is a measure of how much students believe they have learnt, or how confident they are in what they have learnt, such as in [30]. Whilst the correlation between teacher immediacy and measured cognitive learning gains is only moderate, relatively few studies have used experimental measures; most have used perceived learning, which has a particularly strong correlation with teacher immediacy [27]. It has been experimentally found that perceived learning and actual recall are moderately correlated in such contexts [31], so whilst perceived learning is not as strong a measure as actual learning, it can at least be used as an indication of the nature of relationships.

A positive correlation between nonverbal immediacy and perceived cognitive learning has been validated across several cultures, including the United States, Puerto Rico, Finland and Australia [32]. From this McCroskey et al. postulate that expectation of immediacy plays a key role in how cues are interpreted, presenting opportunities for high immediacy teaching to have a strong positive impact in generally low immediacy cultures, but a negative impact for low immediacy teaching in high immediacy cultures [32]. A similar suggestion relating to the use of robot social cues in teaching contexts has also been raised in HRI [33].

Both verbal and nonverbal immediacy behaviours have been shown to lead to an increase in motivation, and, in turn, student learning [34, 35]. In some cases, such as in a task to recall contents of a lecture [36], cognitive learning gains are not found, but affect for the instructor and material increases when the instructor is more nonverbally immediate. However, there are other examples demonstrating a link between greater nonverbal immediacy and increased recall [37, 38]. A more extensive review of the potential benefits of immediacy (both verbal and nonverbal) can be seen in [39].

Nonverbal immediacy has been studied only briefly in HRI contexts before. Szafir and Mutlu [23] use it as a means of motivating and evaluating robot behaviour during a recall task with adults. In line with literature studying nonverbal immediacy with humans, they find that as immediacy increases, so does recall. The adults were also able to notice when the nonverbal immediacy of the robot had increased, confirming that people are sensitive to such cues in robots. Nonverbal immediacy concepts have also been used by the same lab to motivate behavioural manipulations for persuasive robots [40]. However, it should be noted that it doesn’t appear that a complete nonverbal immediacy questionnaire was used in either of the studies. This is important as it is argued in this paper that a key motivator for using nonverbal immediacy measures is the consideration of all cues taken in context; this idea will be returned to and expanded upon in Sects. 4 and 7. Finally, nonverbal immediacy has recently been proposed for use in HRI studies to motivate exploring the perception of a robot when posture and nodding behaviour is varied [41].

3 Social Cues of Nonverbal Immediacy

Based on the method used to calculate nonverbal immediacy, if there is a linear relationship between learning and immediacy (as suggested by [34]) then learning would be maximised if the social cues used in nonverbal immediacy are maximised. However, there are also suggestions that the relationship may not be wholly linear in nature [42, 43]. As such, it remains slightly unclear how immediacy should be utilised for social robots. The following subsections will consider each of the component cues which form the nonverbal immediacy measure in turn to provide further insight into how they can be applied in practice, with a particular focus on findings from HHI and HRI learning scenarios. The aim is to generate guidelines for social behaviour in robot tutoring scenarios that are informed by the concepts of the nonverbal immediacy measure and supported by previous work in both HHI and HRI (Sect. 5).
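
The additive reading of the measure assumed above can be sketched as follows. This is a simplified illustration, not the published scoring key of [26]: it assumes Likert-style item ratings in which some items are worded so that a higher rating indicates more immediacy and others are worded in the opposite direction. The function name, the item identifiers, the `offset` parameter and the example ratings are all hypothetical.

```python
from typing import Iterable, Mapping


def immediacy_score(
    ratings: Mapping[int, int],
    positive_items: Iterable[int],
    negative_items: Iterable[int],
    offset: int = 0,
) -> int:
    """Combine item ratings into one score (hypothetical additive scheme).

    Ratings of positively worded items are added, ratings of negatively
    worded items are subtracted, and a constant offset is applied so that
    the total can be kept positive.
    """
    positive_total = sum(ratings[item] for item in positive_items)
    negative_total = sum(ratings[item] for item in negative_items)
    return offset + positive_total - negative_total


# Hypothetical four-item questionnaire rated from 1 to 5 (illustrative only).
example_ratings = {1: 5, 2: 1, 3: 4, 4: 2}
score = immediacy_score(example_ratings, positive_items=[1, 3], negative_items=[2, 4])
```

Under this reading, raising the rating of any positively worded cue, or lowering that of any negatively worded one, increases the total. This is why a linear relationship between immediacy and learning would imply that every cue should be maximised.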

3.1 Gestures

Gestures play an important role in teaching and learning [44, 45]. Children are more likely to repeat the speech of a teacher if a matching gesture accompanies the speech when compared to the same speech without a gesture, but less likely with a mis-matched gesture compared to no gesture [46, 47]. This basic recall is a first step towards learning. Furthermore, these studies show that children can use gestures in understanding problem-solving strategies, giving them the potential to learn both through problem solving and how to approach solving problems.

For young children, it has been suggested that gesture use (specifically symbolic gestures) can facilitate cognition [48]; possibly because gestures can lighten cognitive load, lending more resources to memory tasks [49]. Indeed when children are slightly older (aged 8–10) gestures can help learning to ‘last’ for longer, with correct answers in an algebra follow-up test four weeks after a learning session staying higher in a gesture and speech condition than in a speech only condition [50]. Equally, gestures made by children can be used to assess their learning [51], with adults able to be more certain of their judgements of children’s learning when their gestures matched their verbal explanation.

Such findings are reinforced in studies concerning instructional communication for learning, with children’s performance improving more when given instructions with gestures as opposed to without in a symmetry recognition test [52]. These findings seem to have been partially replicated in HRI, with a robot utilising contingent gesturing leading to increased recall of material from a presentation [23]. However, precisely how to use gestures to influence learning in HHI is an open field with many questions still necessitating further exploration [53]; this is even more true for HRI where less work examining the use of gesture and learning has been conducted.

The use of hands seems to be particularly important. It is not just the orienting of attention, such as with a laser pointer, but the fact that the gesture is done with a hand that leads to an improvement in learning [54]. It has been shown that humans can accurately interpret pointing by a humanoid robot (an Aldebaran NAO), but that for best results, the arm on the same side as the object being pointed at should be used [55]. However, whether the hand of a robot has the same attentional and learning impact as that of a human is not known. It has also been established that being present (as opposed to on video) does not affect how much attention gestures draw between humans [56], but no such study comparing humans and robots could be found.

3.2 Gaze

From an early age, children use social cues such as eye gaze to help direct their learning. Despite social cues distracting briefly from the material to be learnt, infants learn more with gaze cues present than when their learning is not directed by such cues [57]. These positive effects have also been successfully implemented in computational models [58]. Even at 15 months old, children have a tendency to use the gaze of a social interaction partner, instead of distracting and erroneous saliency cues for word learning associations [59]. The power of gaze, or even just the eyes, in influencing behaviour is still observed in adults, with surprisingly strong results. For example, just an image of eyes near a donation point can increase charitable donations by almost 50 % [60].

Selective processing of social cues for learning has far-reaching implications for human–robot interaction. Head movement alongside eye gaze can assist humans in responding to robot cues [61]; use of this social cue could have advantages in learning. However, this has not been found in infants learning from robots, where they follow the gaze direction of both a robot and a human, but only the human gaze facilitated the learning of an object [62]. It was suggested that this could signify a disposition of infants to consider humans a superior source for learning. It remains to be seen whether this holds true for slightly older children, or with children more familiar with the concept of robots. Equally, this result could be a demonstration that humans process robot gaze in a cognitively different manner, as argued in [63].

College students who receive gaze at the start of each sentence when receiving verbal information can recall significantly more than those who receive no gaze [64]. This holds true for both simple and difficult material, for both genders. It is hypothesised that this is because the interaction feels more ‘intimate’ and prevents mind-wandering whilst receiving the information. These findings have also been shown to occur with younger children, aged between 6 and 7 [65]. Greater gaze from a storyteller led to increased recall from children when subsequently asked questions, compared to those in a lesser (but still some) gaze condition. This study reveals a trend towards possible interaction effects between the information content, gender and gaze, speculating that females are less affected by gaze than males when the material is more difficult.

Logically, it follows that using appropriate robot gaze towards a child might be beneficial for recall and learning. Work done in virtual environments demonstrates that caution must be used, as simply staring at a human interactant actually reduces their willingness to engage in mutual gaze, despite increased opportunity [66]. It should be noted that this difference in mutual gaze did not actually translate to a difference in task performance, but this was hypothesised as being due to the relative simplicity of the task. A lack of effect due to gaze has been observed in human–robot interaction studies as well. In both [67] and [68], a tutoring robot received more gaze from children, which could theoretically be beneficial for child learning (as the robot is delivering learning content), but no learning differences were found.

Nevertheless, the outcome here is a message of balance: gaze can clearly have positive effects on learning [58, 62, 64, 65], but if it is not meaningful, or is too abundant then it can discourage mutual gaze, thereby limiting potentially positive effects [66]. This remains a challenge, as it is not trivial to decide how much gaze is ‘just right’, or precisely when a gaze should be made by a robot.

3.3 Vocal Intonation/Prosody

The voice that an agent uses can dictate how much they are liked and how hard humans try to understand the material they are presented with [69]. Those who interacted with an agent who had a human voice preferred the agent and also did better in learning transfer tests when compared to those who interacted with the same agent with a machine-synthesised voice. The sound of a voice can have a significant impact on retention and transfer of a novel subject when presented through narration [70]. Retention is better when a voice has a ‘standard’ (as opposed to foreign) accent and is human rather than machine-like, as well as being more likeable in both cases.

However, this result was found with college students and virtual agents. It has not been established whether this effect is also observed outside of this restricted demographic, nor whether specific embodiments of robots create expectations that violate these rules. For example, it may be less appropriate to have a deep male human voice when using a robot such as the Aldebaran NAO than a RoboThespian. It is suggested that a possible uncanny valley effect [71] may occur, where participant expectations are violated when a human voice is played alongside a not-convincing-enough animated agent. An indication in this direction has been found with virtual agents, where participants preferred an animated agent with a machine-like voice and a non-animated agent with a human voice [72].

Vocal intensity can also be used to influence learning. Compliance, a factor in learning, can be increased through raising vocal intensity, as in [73]. This HHI study was conducted in a public space where compliance was greatest when using a medium level of vocal intensity; around 70 dB. It is likely that this level would need adjusting depending on the ambient noise in the space a robot tutor would be acting in, and how far from a student it would be. Vocal intensity has successfully been combined with gestures in a model which is based on nonverbal immediacy to improve attention and recall of a human in an HRI presentation scenario [23]. Whilst not confirming all of the results discussed in this section relating to vocal prosody, it certainly demonstrates that there is great potential for many of the same principles from HHI being applied to HRI with positive results.

Interestingly, speech rate appears to have a significant impact upon perceptions of nonverbal immediacy, but not on recall [74]. As speech rate increases, perceived immediacy of a speaker goes up, but there is no significant difference in recall as a change of immediacy might predict. This could potentially be explained by the capacity of humans for speech. The average human speech rate is 125–150 words per minute, but learners have twice as much cognitive capacity, being able to process speech at 250–300 words per minute [75]. This gives great scope for increasing speech rate, and therefore immediacy, but without any great change in terms of the listener’s cognitive processing.

3.4 Facial Expression

In an HHI study examining the relationship between the social cue elements of nonverbal immediacy and cognitive learning across a number of different cultures, it was found that alongside gaze and vocal prosody, smiling from the teacher was one of the more strongly correlated cues to student learning [32]. This result has also been replicated more recently [76], additionally showing the positive relationship between nonverbal immediacy and motivation (with facial expressions having a large effect size).

Experimental data from human–computer interaction (HCI) with an embodied conversational agent revealed no significant difference in recall of subjects when interacting with an agent which was either neutral, or able to express joy and anger [77]. Several reasons are put forward as to why this may have been the case, including a ceiling effect within the task, the amount each emotion was displayed, or that the facial expressions were simply ignored in favour of focussing on the task. As such, it is unclear whether the benefits of facial expression seen in HHI will translate to HCI and HRI.

Despite the suggested impact of facial expressions on learning or motivation in HHI, no data could be found regarding the impact of learning and facial expressions of robots. A possible explanation is that much of the research to-date regarding learning in HRI is performed with robots such as the Aldebaran NAO, Keepon, and Wakamaru which have largely non-manipulable faces. Due to the movement required in expressing facial emotion, the uncanny valley [71] could also be a current limitation for robots.

3.5 Proximity and Body Orientation

The proximity between interactants is correlated to compliance effects [78]. It is suggested that a distance of 1–2 feet (30–60 cm) is optimally conducive to compliance between humans (from studies conducted in Western cultures) [79]; however, whether this is the same for HRI has not been established. This is possibly because judging the physical proximity at which a robot should be from a student would not necessarily be as simple as a strict 1–2 feet (30–60 cm) rule. In human interactions, verbal feedback can modulate (positively and negatively) the proxemic impact on compliance [80]. In HRI, comfortable distances are dictated through the complex interplay of factors such as the size of the robot [81], how much the robot gazes towards a human and how likeable they previously perceive the robot to be [82].

Only about 60 % of people conform to the same proxemic social norms with robots as they do with people [83]. That being said, compliance effects have been seen in educational interactions between children and robots at a distance of about 2 feet (60 cm), although this hasn’t been compared against a control with closer or further distances [84]. Additionally, it would appear that younger children have a smaller personal space, presumably due to their smaller size, so further work would need to be done for people of different sizes [85].

Research conducted with a robot in a variety of task contexts shows that humans generally prefer the robot to be 0.46–1.22 m away [86]. However, it is warned that the dynamic nature of interaction with a robot should not necessarily be reduced to a simplistic rule. Indeed, the previous paragraph suggested the impact of variable robot appearance and behaviour, but there are also environmental and task factors to consider. For instance, if it is important to hear speech in a noisy environment, then a closer distance between interaction partners might be more comfortable than it would usually be outside of these conditions.

Several design guidelines for robotic proximity are presented in [87]. It is suggested that people who are familiar to the robot can be approached more closely, to direct gaze away from the face of a human as an approach is made, and to factor in the human’s attitude towards robots when maintaining distance. The impact of human attitude towards robots is further supported experimentally in [88] where the necessity of building rapport before increasing closeness is emphasised. This could be an important factor in teaching in order to gain compliance.

Studies directly examining the impact of body orientation on learning could not be found; this is possibly due to the entanglement of body orientation with many other social cues. If not orientated to an interaction partner only limited eye gaze will be possible, gestures may be occluded and it may be more difficult to hear any speech. Nor could any studies be found studying the specific impact of co-located physical proximity on learning; most work considers co-located learning against distance learning (not co-located), but this then becomes about social presence rather than proxemics. Logically, it would seem reasonable that a middle-ground should be sought. The robot should not be too far away as then the student may struggle to perceive verbal instructions and nonverbal signals. If more compliance is required, then a closer distance should be sought. Further research is required to decide what is to be considered ‘too close’ in specific scenarios, with humans of certain ages and certain robot sizes/designs; work such as [83, 89] provides a strong starting point in this direction.

Table 1 Behavioural guidelines for robots in educational contexts derived from the nonverbal immediacy and social cue literature

3.6 Touch

Touch has been shown to lead to a positive affective state in HHI, even with very short touches and when subjects were unaware of the touch [90]. This positive response to touch has also been shown in HRI. When a robot offered an ‘unfair proposal’ to participants with touch, their EEG response showed less negativity towards the robot than when the robot did not touch them as it made the proposal [91]. Of course, liking does not necessarily result in better learning, but there are indications that if students like an instructor more they will achieve more highly [92].

Touch has also been linked with compliance [93], a useful tool for teachers when they need to influence students in order to get them to engage with lessons. The potential for utilising touch in HRI and educational contexts has previously been highlighted [94] but, as yet, remains underexplored.

4 Synchrony and Multimodal Behaviour

Of course, social cues do not occur in isolation, neither from other cues, nor from the environment and the interaction they are being used in. Behaviour is multimodal, and the cues must be contingent with respect to the interaction and congruent with other social cues being utilised in order to be interpreted correctly and efficiently. Social cues could be perceived as a single percept, which requires that cues be considered as an integrated whole [24]. Nonverbal immediacy is measured by taking many social cues into consideration with respect to one another, and thus supports the principles behind interpreting social behaviour in this manner.

These concepts are exemplified experimentally by Byrd et al. [95] who further explored the conclusions drawn from studies such as those done by Cook et al. [50] regarding gestures and learning (discussed previously in Sect. 3.1). They found that when children did not copy the eye movements accompanying gestures, the lasting learning effect disappeared.

Support for the role of synchrony in social cues can be seen in [96, 97]. Head gaze, gestures and spoken words were all used to direct attention. When any of the cues were incongruent (e.g. responses had to be made to head-gazes, whilst a pointing gesture was made in a different direction), interference effects were found, slowing down responses. If social cues are not synchronous and congruent then interactions will likely be impeded by this additional processing time.

Not just the cues being used, but also their contingency can influence interactions. A robot which displays more contingent social cues, such as appropriate gaze and pointing gestures, can elicit greater participation in an interaction [98]. When applied to an educational context, it is reasonable to suggest that greater participation will lead to an increase in learning [99].

Fig. 2 Updated version of Fig. 1 depicting the influence of nonverbal immediacy on social interaction, and the educational dimension of social interaction which this paper is concerned with. Section references are provided in the diagram for each of the social cues that nonverbal immediacy consists of

5 Guidelines

Based on the analysis of the individual cues that comprise nonverbal immediacy (Sect. 3) we seek to derive a set of design guidelines that can be applied to HRI in tutoring contexts. Nonverbal immediacy and learning have been positively correlated in human–human studies, and there have been indications that this may be supported in HRI as well [23]. The social cues which make up nonverbal immediacy have been explored through the HHI and HRI literature, often revealing a connection with learning gains on an individual basis, providing some insights into the practical application of such cues for HRI. From this, guidelines for robot social behaviour in educational interactions have been devised (Table 1).

6 Evaluation

If an effect seen in HHI studies concerning nonverbal immediacy can be replicated with robots, then this strengthens the case for phenomena correlated with immediacy in HHI studies transferring to HRI as well. This could provide useful links to a body of literature from which insights into design of robot behaviour could be derived.

The guidelines in the previous section use nonverbal immediacy as a basis for behaviour generation, which is commonly measured through observational reports, such as those seen in [26]. This measure has seen limited application in HRI evaluations before, though where it has, the immediacy scores have not been explicitly stated [23, 40]. As such, it would be beneficial to validate that behaviour intentionally created as more or less immediate is judged as such when applied to robots, as it is with humans. Additional validation with children (due to the educational context of this work) to check whether they interpret the behaviour in the same manner as adults would allow the guidelines to be applied to a larger range of HRI scenarios. A human condition is therefore used to provide a reference point for the child ratings with respect to the adult ratings. This will enable an assessment of the reliability of child ratings of immediacy (which does not readily appear in the literature), as a basis for the subsequent examination of child ratings of robot immediacy. The comparison between child and adult interpretation of human nonverbal immediacy serves as a useful intermediary step between the existing literature and applications of nonverbal immediacy with robots and children. The evaluation here focuses on the outcome of the educational dimension of social interaction (as opposed to the social dimension) as influenced by nonverbal immediacy (Fig. 2).

6.1 Methodology

A 2 \(\times\) 3 condition study was devised to explore how nonverbal immediacy would impact recall, two factors which have been shown to be positively correlated (Sect. 2). In order to evaluate whether children and adults interpret the behaviour of a robot and a human in the same way, a scenario which could be understood by both groups was required. As such, the study design started from the perspective of the children (who are presumed to have a shorter attention span and more limited knowledge in some areas such as vocabulary) and was then applied to adults. Recall of a short story was judged to be an appropriate task for this purpose, as it matched the methodologies of immediacy studies.

Participants A total of 117 participants took part in the study, but one child had to be excluded due to an incomplete questionnaire and two adults were excluded due to inconsistent online video timestamps; this will be expanded on later in this section. 83 children (age M = 7.8 years, SD = 0.7; 47 F, 36 M) and 31 adults (age M = 23.5 years, SD = 3.9; 7 F, 24 M) remained for data analysis. All participants consented to participation in the study and all children had parental permission to take part. The children were recruited from one school year group of a primary school in the UK; the children were split across conditions based on their usual school classes, which ensures an appropriate balance for gender and academic ability. Adults were recruited through regular lectures for the robot conditions, and through online advertising for the human condition.

Short Story A short story was created for the purpose of the recall test. The story was largely based on one freely available from a website containing many short stories for children (footnote 4). This was done to make sure that the language and content were appropriate for children. Some elements were added or modified in order to create opportunities for recall questions, and some of the phrasing was modified so that the robot text-to-speech sounded more accurate. The final version of the story can be seen in Appendix 1 and lasts for just under 4 min when read in the experimental conditions. None of the participants reported having heard or read the story before.

Measures Two measures were used: a nonverbal immediacy observer report questionnaire and a recall test. The Robot Nonverbal Immediacy Questionnaire (RNIQ; Appendix 2) was based on the short form of the Nonverbal Immediacy Scale, sourced from [100] and freely available online (footnote 5). Exactly the same questionnaire was given to both children and adults. The questionnaire was modified from the original to make it easier to understand and complete for children. This was done in four ways:

  1. “He/she” was changed to “The robot”, or “The man” depending on the condition.

  2. “while talking to people” was changed to “while talking to you”.

  3. The response of ‘occasionally’ was changed to ‘sometimes’.

  4. Instead of filling in a number at the start of each line, boxes labelled with the scale were presented for each question. This prevents children from having to keep referring back to the top of the page and potentially losing their thought process, and also prevents mistakes in interpreting their handwriting during analysis.

Fig. 3 Still images from the conditions used in the evaluation; left to right: (1) low nonverbal immediacy robot, (2) high nonverbal immediacy robot, (3) human. Red backgrounds for the robot were not used in practice and are just used to ease visibility here; the video was shown in widescreen format, with a black background covering the unused space, as in the figure

The recall test was devised based on information provided in the short story and consisted of 10 multiple choice questions, with a final free text answer about the moral of the story. The full list of questions and answer options can be seen in Appendix 3. The questions were designed to vary in difficulty based on how many times the piece of information had been stated, how central it was to the plot, and how many answer options were similar to the correct one. An additional question was added to the adult human condition regarding the colour of the background in the video; this was part of a series of checks to ensure that the video had actually been watched.

Table 2 Operationalization of behavioural manipulations between robot immediacy conditions

Hypotheses and Conditions Based on the literature explored in Sect. 2 and the guidelines in Sect. 5, four hypotheses for the study were considered:

  • H1: Robot behaviour designed to be more or less immediate will be perceived as such, as measured through the nonverbal immediacy scale.

  • H2: Children and adults will perceive nonverbal immediacy in the same manner for both robots and humans (i.e. children's and adults' rankings of immediacy will agree).

  • H3: Recall of the story will be greater when read by a character with higher nonverbal immediacy.

  • H4: As nonverbal immediacy of the character reading the story is perceived to increase by an individual, their recall of the story will also increase.

In order to address these hypotheses, three conditions were devised which were shown to both children and adults:

  1. High nonverbal immediacy robot (Fig. 3 centre): using the guidelines in Sect. 5, the robot behaviour was maximised for immediacy where possible; full details of the robot behaviour can be seen in the following paragraph. Child n = 27; adult n = 9.

  2. Low nonverbal immediacy robot (Fig. 3 left): using the guidelines in Sect. 5, the robot behaviour was minimised for immediacy where possible; full details of the robot behaviour can be seen in the following paragraph. Child n = 28; adult n = 9.

  3. Human (Fig. 3 right): a human was recorded on video reading the story. This was to ensure identical behaviour between child and adult conditions and to time the story to be at the same pace as the robot conditions in order to have equivalent exposure time and reading speeds (which can impact recall [74, 101]). This condition enables the immediacy ratings of children to be validated with respect to adults. The human was not given explicit instructions about nonverbal behaviour: their immediacy level itself is not under consideration, only whether children and adults perceive it in the same way. Therefore, the behaviour itself is not of concern, provided that it is identical between conditions (the video recording ensures that this is the case). Child n = 28; adult n = 13.

Robot Behaviour The high and low nonverbal immediacy robot conditions were developed based on the guidelines from Sect. 5. The conditions sought to maximise the differences between the behavioural dimensions which the guidelines address (and therefore also the dimensions measured by the nonverbal immediacy scale). Some dimensions were not varied due to limitations in the experimental set-up. Facial expressions were not varied as the robot being used for the study, an Aldebaran NAO, is not capable of producing facial expressions such as frowning or smiling. Proximity was not varied due to the group setting in which the study was being conducted. When the robot is telling the story to a classroom of children it is not feasible, or safe, to incorporate touch or to approach the children. The operationalization of behavioural manipulations that were carried out can be seen in Table 2.

Procedure For the robot conditions, the robot was placed at the front of the classroom on a table to be roughly at the head height of observers (either children or adults). The experimenter would then explain that the robot would read a story and that afterwards they would be required to fill in a questionnaire about what they thought of the robot. The recall test was explicitly not mentioned to prevent participants from actively trying to memorise the story. The experimenter then pressed a button on the robot’s head to start the story. Once the story was complete, the nonverbal immediacy questionnaires were provided to all participants. When the whole group had completed this questionnaire, the recall test was introduced and given to participants. For the children, this was followed by a short demonstration of the robot. The human video condition procedure was the same for the children. The video was resized to match the size of the robot as closely as possible, and the volume was adjusted to be approximately the same as well.

Table 3 Mean nonverbal immediacy scores by condition

As the children did not know this person, the adults should not either so that the reported immediacy score is based purely on the behaviour seen in the video and not prior interaction. The subjects for the video condition were recruited online and completed a custom web form which prevented the video from being paused or played more than once, and recorded timestamps for the start of the video, the end of the video, and the completion of the questions. An additional question was also added to the recall test to verify that the participants had actually watched the video (as opposed to the rest of the recall questions which can be answered through listening alone). One participant was excluded from analysis as the timestamps for the start and end of the video indicated too little time for the full video to have been viewed and another participant was excluded as the time between watching the video and completing the questions was in the order of hours (all other participants completed all questions in under 10 min), indicating that the intended protocol had been violated.
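The timestamp checks described above can be sketched as a simple filter. This is only an illustration: the function name and both thresholds are assumptions, since the text reports a story of just under 4 min and completion times under 10 min for retained participants, but does not state the exact cut-offs that were applied.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen for illustration only. The source does not
# give the exact exclusion cut-offs.
MIN_VIEWING_TIME = timedelta(minutes=3, seconds=30)
MAX_ANSWER_TIME = timedelta(minutes=10)


def follows_protocol(video_start, video_end, questions_done,
                     min_viewing=MIN_VIEWING_TIME,
                     max_answer=MAX_ANSWER_TIME):
    """Return True if the timestamps are consistent with one full viewing
    of the video followed promptly by the questionnaires."""
    viewing = video_end - video_start
    answering = questions_done - video_end
    # Too little viewing time suggests the video was not watched in full;
    # a very long answering time suggests the session was interrupted.
    return viewing >= min_viewing and timedelta(0) <= answering <= max_answer
```

Under these assumed thresholds, a participant whose start and end timestamps are one minute apart, or who finishes the questions hours after the video ends, would be flagged for exclusion.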

6.2 Nonverbal Immediacy Results

Nonverbal immediacy scores were calculated from the questionnaires and produce a value between 16 and 80. Immediacy scores and confidence intervals can be seen for each condition in Table 3. Whilst these scores might initially appear to be relatively low given the possibility of scores as high as 80, the scores do fall in the range expected. Due to the exclusion of certain aspects of the immediacy inventory in the robot conditions in terms of moving towards and touching observers, as well as producing facial expressions, it is unlikely that the score would rise above 56. It is, however, possible for the robot to be perceived differently and score more highly (for example, the robot could have been perceived to have produced a smile, even though the mouth cannot move).
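One scoring scheme consistent with the stated 16 to 80 range is to sum the 16 items on a 1 to 5 scale after reverse-scoring the negatively worded ones. The sketch below assumes that scheme; the set of negative item numbers is a placeholder for illustration and is not the published RNIQ key (see Appendix 2 for the actual items).

```python
# Placeholder item numbers (1-indexed). Which items are negatively worded is
# an assumption made for this sketch, not the published questionnaire key.
NEGATIVE_ITEMS = frozenset({2, 5, 7, 8, 11, 12, 14, 16})


def immediacy_score(responses, negative_items=NEGATIVE_ITEMS):
    """Sum 16 responses given on a 1-5 scale, reverse-scoring negative items."""
    if len(responses) != 16:
        raise ValueError("expected 16 item responses")
    total = 0
    for item, value in enumerate(responses, start=1):
        if not 1 <= value <= 5:
            raise ValueError(f"item {item}: response must be between 1 and 5")
        # A negative item answered 1 ('never') contributes 5, and vice versa.
        total += (6 - value) if item in negative_items else value
    return total
```

With an equal split of positive and negative items, answering every question identically yields the scale midpoint of 48 regardless of the response chosen.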

Fig. 4 Robot nonverbal immediacy scores as rated by children and adults, relating to hypothesis H1. Significance is indicated by *p < .05, **p < .01, and ***p < .001. Error bars show the 95 % confidence interval

A two-tailed t test on the adult data reveals a significant difference between the nonverbal immediacy score for the high immediacy robot (M = 50.2, 95 % CI [47.0,53.5]) and the low immediacy robot (M = 36.3, 95 % CI [33.5,39.1]); t(16) = 7.460, p < .001. The same test on the child data also reveals a significant difference between the nonverbal immediacy score for the high immediacy robot (M = 50.8, 95 % CI [48.6,53.0]) and the low immediacy robot (M = 46.5, 95 % CI [44.2,48.8]); t(53) = 2.793, p  =  .007 (Fig. 4). These results confirm hypothesis H1, that robot behaviour designed to be more or less immediate will be perceived as such when measured using the nonverbal immediacy scale. This provides a useful check that the behaviour of the robot has been interpreted as intended by both children and adults.
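The reported degrees of freedom, t(16) and t(53), match an independent-samples t test with pooled variance, where df is the total number of participants in the two groups minus two. A minimal sketch of that statistic follows; whether the original analysis used the pooled or another form of the test is an assumption here, and the function name is illustrative.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Student's t statistic and degrees of freedom for two independent
    samples, assuming equal variances in the two groups."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df
```

Two groups of 9 adults give df = 16, and groups of 27 and 28 children give df = 53, in line with the values reported above.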

Support can be seen for hypothesis H2, that children and adults will perceive nonverbal immediacy in the same manner for both robots and humans (Table 3). The results show that both children and adults score the high immediacy robot very similarly, with almost identical means. The relative ranking of immediacy between conditions is also the same, with the high immediacy robot being perceived as most immediate, then the human, followed by the low immediacy robot condition.

However, there are also some differences as the child scores are more tightly bunched together; this could reflect their different (yet consistent) interpretation of negatively formulated questions [102], or more limited language understanding impeding the data quality [103]. A two-way ANOVA was conducted to examine the effect of age group (child/adult) and condition (high/low robot, human) on the immediacy rating. A significant interaction effect was found between these two factors: F(2,108) = 5.29, p = .006. Significant main effects were found for condition (F(2,108) = 16.96, p < .001) and age (F(1,108) = 26.51, p < .001). However, due to the interaction effect, exploration of simple main effects splitting the conditions is also required to correctly interpret the results. Significant simple main effects are found for condition within each level of age group (child/adult): adults, Wilks’ Lambda = .796, F(4,214) = 6.46, p < .001; children, Wilks’ Lambda = .798, F(4,214) = 6.38, p < .001. Significant simple main effects are also found for age group (child/adult) within each condition: low immediacy robot, Wilks’ Lambda = .664, F(2,107) = 27.11, p < .001; high immediacy robot, Wilks’ Lambda = .862, F(2,107) = 8.54, p < .001; human, Wilks’ Lambda = .811, F(2,107) = 12.49, p < .001.

Table 4 Mean recall scores by condition

These findings suggest that some differences are present in the way that children perceive (or at least report) the immediacy of the characters when compared to adults. This is not surprising given the tighter bunching of child nonverbal immediacy scores. Nevertheless, there is a strong positive correlation between the child scores and the adult scores, r(1) = 0.91, although this is not significant (p = .272) due to the low number of comparisons (3 conditions). Overall, due to the strong positive correlation and the same ranking of the conditions, it would seem that children perceive nonverbal immediacy in a similar manner as adults, but there are clearly some differences at least in terms of reporting. We would argue that there is a strong enough link to deem nonverbal immediacy an appropriate measure to use with children (and to tie the findings here to the adult human immediacy literature), but this is an area that would benefit from further research.
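The non-significant p value for such a large correlation follows directly from having only three condition means, which leaves a single degree of freedom. As a hedged check, the sketch below converts r into a t statistic and uses the fact that the t distribution with one degree of freedom is the standard Cauchy distribution; the function name is illustrative, and the input is the rounded r reported above.

```python
from math import atan, pi, sqrt


def p_value_three_points(r):
    """Two-tailed p for a Pearson r computed from exactly three pairs (df = 1)."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    df = 1
    t = r * sqrt(df / (1 - r * r))
    # With 1 degree of freedom the t distribution is the standard Cauchy
    # distribution, whose CDF is 0.5 + atan(t) / pi.
    return 1 - 2 * atan(abs(t)) / pi


print(round(p_value_three_points(0.91), 3))  # 0.272
```

Even r = 0.99 would give p of roughly .09 under this calculation, so agreement across three conditions cannot reach conventional significance on its own.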

Cronbach’s alpha values were calculated for the nonverbal immediacy questionnaire for adults and children, splitting the human condition and the robot conditions. All alpha values are based on the 16 item scale. The reliability rating for the adults with the robot is high (\(\alpha =.79\)), whereas in the human condition it is considerably lower (\(\alpha =.45\)). This difference may be an effect of embodiment, and is explored further in the discussion (Sect. 7.4). Reliability scores for children are relatively low in both cases (human \(\alpha =.55\); robot \(\alpha =.30\)). In spite of the variation in child responses, the questionnaire was sensitive enough to detect differences as shown in this section. The implications of this are also discussed in Sect. 7.4.
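For reference, Cronbach's alpha for k items is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of respondents' total scores. The sketch below is a minimal implementation of that standard formula; the data layout (one row of already reverse-scored item values per respondent) and the function name are assumptions for illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a sequence of
    item scores (negatively worded items already reverse-scored)."""
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("need at least two items, same count per respondent")
    items = list(zip(*rows))                 # one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in rows])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

Items that move together across respondents push alpha towards 1, while items that respondents answer in opposing directions shrink the variance of the totals and pull alpha down.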

6.3 Recall Results

Recall results are based on the 10 recall questions presented to all participants; scores are given as the correct proportion of answers, i.e. 8 correct answers = 0.8. Recall scores and confidence intervals can be seen for each condition in Table 4 and are represented graphically in Fig. 5.

To explore hypothesis H3, a two-tailed t test was conducted on the adult data to compare recall between observing the high and low immediacy robot conditions. No significant differences at the p < .05 level were found; t(16) = −0.577, p = .572. However, significant differences are found for the child data. A two-tailed independent samples t test reveals that recall is higher in the high immediacy robot condition (M = 0.58, 95 % CI [0.52,0.64]) than in the low immediacy robot condition (M = 0.49, 95 % CI [0.46,0.53]); t(53) = 2.006, p = .011.

These results provide partial support for hypothesis H3: recall will be greater when the character reading the story is more nonverbally immediate. It can be seen that this holds true for the children, where recall is greater in the high immediacy robot condition than in the low immediacy robot condition, in accordance with this condition being perceived as more immediate. However, there are no significant differences in recall between the conditions for adults. This is likely due to a ceiling effect with adults because the recall questions were designed so that they were suitable for children. This may have made them too easy for adults overall, leaving limited space to show differences between conditions. If the questions were more difficult and exclusively targeted towards adults then it is possible that differences would be found. The partial support for H3 and replication of findings from previous studies of nonverbal immediacy—using robots—provides a proof-of-concept for the approach proposed in this paper.

Fig. 5 Recall scores for high and low nonverbal immediacy robot conditions relating to hypothesis H3. Significance is indicated by *p < .05, **p < .01, and ***p < .001. Error bars show the 95 % confidence interval

No support is found for hypothesis H4: that higher individual perception of nonverbal immediacy will lead to greater recall for that individual. Correlations between nonverbal immediacy ratings and recall scores are not significant for children (r(81) = −0.047; p = .673) or adults (r(29) = −0.188; p = .311). Indeed, the correlations are in the opposite direction to that expected, although only with a small magnitude. This would suggest that in this study, an individual's rating of immediacy has less of a bearing on recall than the average rating given by the group, but there is not enough evidence here to explain why this occurred.

7 Discussion

This paper started from the established research field of nonverbal immediacy which links behaviour to learning gains in a measurable and comparable manner (Sect. 2). This was broken down into its component social cues to explore their effect on learning individually. The evaluation in this paper applied a series of guidelines that were devised based on nonverbal immediacy cues and informed by HHI and HRI literature. It was found that both children and adults perceive the immediacy of a robot designed to have low and high nonverbal immediacy behaviours as intended, which confirms and extends prior work in HRI [23]. Additionally, both children and adults ranked the nonverbal immediacy of robots and humans in the same order, although children’s raw scores were more tightly grouped. This gives rise to the possibility that much of the nonverbal immediacy literature, which has mostly been conducted with adults, would also apply to children.

Recall of a short story improved significantly for children when the robot reading the story was more immediate in behaviour, which does indeed confirm the hypothesis derived from nonverbal immediacy literature, based on human–human studies showing the same effect [37, 38]. No significant difference in recall was observed in the adult data, but this may be due to the relative lack of difficulty of the recall test, which had been designed specifically for children.

The following subsections will discuss the findings here in the wider context of research conducted in HRI and HHI. First the impact of individual characteristics will be discussed in relation to hypothesis H4, which was not supported. Secondly, the possible impact of novelty on the perception of behaviour and recall will be explored. Thirdly, potential shortcomings of nonverbal immediacy as a measure for characterising interactions are raised. Finally, we share the lessons learnt from this study in applying nonverbal immediacy measures to HRI and consider the influence of the study design on the findings.

7.1 Students as Individuals

Out of necessity, most experiments observe the learning of large samples of students, meaning that the effect is seen on average, but does not necessarily apply to all students. All children are individuals, with their own characteristics, preferences for subjects and learning styles. There may be some educational scenarios, topics, or children for which technology is better suited to assisting [104]. Some children may be impacted to a degree related to their personality (and their ‘need to belong’) [105], or their learning style [106], which can affect their sensitivity to social cues.

All studies here have been considering typically developing children/students, so many of the outcomes may not apply to individuals with, for example, attention-deficit hyperactivity disorder (ADHD) or autism spectrum disorder (ASD) who might have difficulties in interpreting some social cues [107–109].

Gender could also have an impact on learning and the use of social cues. It has been found in both virtual environments [110–112] and physical environments [113] that males do not utilise gaze cues in the same way as females; or if they do, it does not manifest in behaviour change or learning. The gender of the teacher, at least in virtual environments, does not however seem to impact on the learning which takes place [114].

In the evaluation presented in Sect. 6, support was not found for hypothesis H4, which sought to link individual perceptions of the robot behaviour (as measured through nonverbal immediacy) to recall scores. It is suggested that this may be because the nonverbal immediacy scale does not cater for the many other variables between individuals that may influence their learning. However, this does not reduce the utility of nonverbal immediacy as a characterisation of robot social behaviour, with differences in robot behaviours clearly demonstrated as part of hypothesis H1. Instead, we highlight here the need to further develop means of including perceptions of robot behaviour into broader models of learner characteristics.

7.2 The Novelty Aspect

It is necessary to acknowledge that the use of social cues is only partially responsible for positive learning outcomes. The approach, content and assessment of teaching contribute significantly to the learning process [115], as does the knowledge of the teacher [116] and their beliefs towards learning [117]. Of course, the students play an equal part in learning too, with aspects such as their emotion playing a role in the process [118]. Teachers and students often have long-standing relationships; these relationships allow for familiarisation with teaching and learning styles, which is beneficial for learning: when teacher turnover increases, attainment scores have been shown to drop, evidencing the importance of consistent relationships [119]. This highlights the need for long-term interaction if using social robots to assist in education, alongside thorough development of learning materials.

The majority of the studies considered as part of the analysis conducted here only look at single interactions, rather than interactions over time. There is evidence for changing preferences (and thus possibly changes in subsequent learning outcomes) over time, as seen in [120]. Of course, a relative lack of long-term data in HRI is understandable because of the immense challenge in enforcing methodological rigour over extended periods of time and the ethical implications of using atypical conditions (such as the low immediacy robot condition from the evaluation in this paper) in real-world learning.

7.3 Nonverbal Immediacy and Interaction

Due to the potentially great benefits of using robots as tutors in one-on-one interactions [121, 122], and the possibility of personalisation in such contexts, this seems to be an apt means of applying robots in education. Whilst nonverbal immediacy addresses how competent a speaker is at communicating towards others, i.e. how well a teacher can convey information to students, in one-to-one tutoring it is important to be competent at two-way communication as well. As such, it may be that the approach taken in this paper would need adapting for one-to-one tutoring, incorporating more principles from dyadic interaction work.

Fig. 6 Representation of the role of social cues in dyadic HRI. Social cues are used as modulation behaviour within the interaction

Social behaviour plays a key role in dyadic interaction and on the outcome of communication within a dyad. The role of communication, or the social interaction within the dyad, in such a scenario is posited to be “the mutual modification of two ongoing streams of behaviour of two persons” [123]. The behaviour of one party affects the behaviour of the other. In this view, social cues are used as part of the modulating behaviour in this process (Fig. 6) and can therefore be utilised in many processes influencing education.

The joint modification of behaviour within the dyad gives rise to the need for regulation and alignment of behaviour in order to simultaneously transmit and receive information [124]. All parties engaging in a social interaction must continually adapt the social cues they are using in order to effectively construct the interaction [125]; for example, verbal turn-taking must be regulated through the use of various social cues [123]. Such regulation is important in learning interactions, indicating when it is appropriate for learners to ask questions, and when it is time for them to receive information; learning is more challenging without social cues or conventions to manage this turn-taking [126]. This simple coordination in interaction is vital and has been shown to influence cognition from infancy [124]. Even in unstructured interactions with robots, children appear to actively seek such turn-taking in interactions [127].

These kinds of interaction phenomena are not catered for in nonverbal immediacy measures. The evaluation in this paper saw positive results, but the interaction between the robot and the humans was largely in one direction (the robot instructing the humans); the robot was not responsive to human social cues or behaviour. This is an area which needs further exploration in HRI: when the interaction becomes more interactive than the presentational behaviours considered in the present study, do immediacy principles still hold, or are additional behaviours (such as turn-taking policies) required? We propose that in the absence of further evidence in such contexts, the application of the nonverbal immediacy metric provides a suitable basis for initial investigation.

7.4 Using Nonverbal Immediacy in HRI

Whilst the evaluation in this paper had positive results and confirmed (or partially confirmed) three of the four hypotheses, it should be made clear that there are limitations imposed by the study design which could inhibit how well these findings translate to other scenarios. The human condition was shown through a video, whereas the robots were physically present. This means that a comparison between the recall and nonverbal immediacy scores from the human and the robot conditions could be influenced by embodiment, or social facilitation effects [128]. It should be noted that in this study, we do not directly compare between these conditions: comparisons are made within robot conditions, or from children and adults, but not between the human and robot conditions.

The reliability metrics across the conditions demonstrate the effectiveness of the nonverbal immediacy characterisation of social behaviour. Generally, the adult raters have high reliability levels, which reflects the behaviour seen in the literature. That this applies to ratings of robot behaviours indicates the applicability of the metric. Although the alpha statistic is lower for children, there are two points of note. Firstly, there remains a reasonable consistency for the ratings of the human condition: this extends the literature by showing the ability of children (in addition to adults) to use the nonverbal immediacy metric. Secondly, for both children and adults, there was agreement in the ordering of relative immediacy levels between the conditions: this indicates that the nonverbal immediacy scale is sensitive enough for the present study, for both adults and children.

A number of caveats apply however that require further investigation. A high reliability score is found for the adults who saw a robot condition, but this is not so high for those who saw the human condition. This may be due to relatively low subject numbers when considering only the human condition (13 subjects), where inconsistency from one or two individual subjects could have a large impact on the alpha value. The reliability for the human is higher for children than for adults, suggesting the difference in subject numbers could be a factor. Alternatively, it could be a result of embodiment: the robot conditions were seen in person, whereas the human was shown on screen, which may have influenced the reporting of social behaviour on the questionnaire.

The Cronbach’s alpha statistic for the children who saw a robot condition is considerably lower than that of the adults. This is not so surprising, given the complications highlighted in the literature of using questionnaires with children [103]. However, it may also be a product of limitations in robot social behaviour. Cronbach’s alpha measures the internal consistency of questionnaire items. Whilst some inconsistency is likely due in part to child interpretations of negatively worded items [102], there are some items within the questionnaire that the robot behaviour itself is probably not consistent in. For example, the questions related to smiling and frowning are opposites of each other in terms of calculating a value for the scale, but could both be answered as ‘never’ performed, as the robot does not have moveable facial features. Such a response would provide maximum inconsistency between these items. This would not necessarily reflect the reliability of the questionnaire, but a limitation in the ability of the robot to implement all of the questionnaire items. The same argument could be made for the items concerning touch—it could be considered that the robot never touches the observer, whilst also not ‘avoiding’ touch, as the question is worded. Inclusion of these two behavioural elements (that were not possible in the evaluation here) in subsequent work exploring the use of nonverbal immediacy for characterising robot social behaviour would likely yield higher reliability scores.

The interaction was also very short (approximately 4–5 min), and learning was measured through recall. Although recall is a fundamental element of learning, it is very different from understanding or applying knowledge, or from the higher dimensions of learning as defined in the revised version of Bloom’s taxonomy [29]. Early results suggest that nonverbal immediacy can also be applied in slightly longer interactions, and in dyadic contexts, with learning improving as nonverbal immediacy increases [18]. However, longer-term studies with a variety of robots and learning materials would add more weight to the evidence of how well nonverbal immediacy can be applied to HRI.

8 Conclusion

This paper reviewed literature from the well-established research area of nonverbal immediacy. Nonverbal immediacy can be used to characterise social behaviour through observer reports on the use of social cues, such as gaze and gesture. We explored HHI and HRI literature relating to these cues and brought the findings together into a set of guidelines for robot social behaviour. These guidelines were implemented in an evaluation that compared a robot intended to show high nonverbal immediacy with one intended to show low nonverbal immediacy. A human condition was also included to link the work here to existing nonverbal immediacy literature and to provide validation for the use of nonverbal immediacy with children. Several hypotheses derived from the nonverbal immediacy literature were confirmed. Both children and adults judge the immediacy of humans and robots in a similar manner. The children’s responses were more varied than the adults’, but it was still possible to identify a significant difference in their perception of the social behaviour between the two robot conditions. Children also recalled more of the story when the robot used more nonverbal immediacy behaviours, which demonstrates an effect predicted by the literature. While there are some limitations in the measure, it is proposed that nonverbal immediacy could be used as an effective means of characterising robot social behaviour for human–robot interaction, for both adult and child subjects.