1 Introduction

Educational robots are often used in the educational system as tools to teach Science, Technology, Engineering and Mathematics (STEM) skills [23, 53]. They are built and programmed for the sake of learning computer science and engineering curriculum. On the other hand, social robots’ goals lie in the social domain, wherein their interaction with humans is the focus of their programming and function [6, 12].

Fig. 1 Children playing a Hebrew root game with the (left) Patricc and (right) Nao robot platforms

The potential contribution of social robots to education may be especially large: financial constraints in education, the goal of teaching in a more personal manner through small groups, and the desire for new, innovative learning methods all accelerate the development of educational robots.

Social robot tutors have been incorporated into the educational system in different settings [24, 30, 58], but the use of robotics in extracurricular settings has focused primarily on STEM education [8, 32, 35, 53, 59]. Summer day camps that incorporate robotics are becoming more popular but generally do not use social robots for teaching non-STEM-related material [35, 53]. More often, social robots have been used as companions in camps, such as for children with ASD to aid in the child’s development of social and vocational skills [22].

Furthermore, while many studies in Human-Robot Interaction (HRI) target kindergarten and early school-aged children [4, 25], summer camp robotic activities are usually designed for older children [8].

Since summer day camps offer important opportunities for learning and social robots are emerging as promising educational and engaging platforms, we explored how to integrate social robots into an educational, non-STEM-related activity in a summer day camp. We aimed to test the feasibility of such a large-scale integration using a novel, low-cost robotic platform in a learning-oriented summer camp activity for young children. The study was conducted throughout a three-week-long session of summer camp, a challenging “in-the-wild” context in which children are more accustomed to water-based activities than to “dry,” language-oriented learning activities.

Specifically, we used two different social robot platforms, Patricc and Nao (Fig. 1), to teach children aged 5–9 how to identify Hebrew roots, i.e., consonants that carry the basic meaning of words, via short, engaging activities. The grammatical teaching method we developed was not designed for social robots only; in principle, human teachers could use it with no less success. An informal pilot study (not reported here) showed surprisingly good results for teaching the root system to kindergarten, first-grade, and second-grade children. With these results, we approached senior officials in the Israeli ministry of education. Although they were highly impressed by the results, they ruled out using this method in schools. Their main argument was that even though identifying roots is part of the Hebrew language curriculum, in practice the teachers lack in-depth grammatical knowledge, so such a program is not feasible. Adjusting the method for social robots was our answer to this built-in difficulty.

In the study presented here, we did not directly compare human and robot tutors. Instead, we treated the prior knowledge of the second-grade children as a reflection of human teaching in practice, since identifying roots as the basis for recognizing “word families” is supposed to be taught in school by the end of second grade, and compared it to the knowledge gained through interaction with the robot platforms.

Furthermore, we examined the capabilities of the educational robots to engage with larger groups. We compared how the children’s learning progress and experience were affected by the different morphologies of the robot platforms conducting the same activity.

To assess their progress, we administered quantitative pre-post assessments of Hebrew root identification. After the interactions, we conducted a qualitative interview regarding the children’s attitudes towards the robots. We also presented the camp counselors with questions about the activity to qualitatively assess the setup and their perceptions of the children’s attitudes.

We present the integration process, including the initial setup of the study, the pre- and post-tests, the activities, and the counselors’ survey. We report on novel insights into the most relevant factors for successful, large-scale integration of social robot tutors into a summer camp context.

Our quantitative analysis shows that the children made significant progress in their ability to extract Hebrew roots regardless of their age, gender, number of sessions attended, or group size. We also found that they made this progress regardless of which robot they interacted with most or in what order. Our qualitative analysis reveals children’s preference to learn roots again with a robot rather than a human teacher in the future. Moreover, our qualitative analysis of the counselors’ responses presents more insights into how to improve such an activity. Taken as a whole, these results support our approach for integrating low-cost, scalable and effective social robots into a summer camp setting.

The contributions of this study are threefold: (1) a direct comparison between two robot morphologies, revealing similar learning gains but significant differences in children’s preference for the more robotic-like platform; (2) effective educational activities for young children in summer camps using social robots; and (3) establishing the hypothesis that even young children can learn a more advanced curriculum of Hebrew grammar in the right setting.

2 Related Work

2.1 Social Robots in Education

One of the most promising venues of socially assistive robots is in the field of children’s education [6, 14, 34]. In recent years, these robots have moved from the lab into more naturalistic scenarios, such as pre-schools [25], schools [30], and even children’s homes [44]. They have been used to promote learning in various disciplines, such as language [5, 20, 25], reading [16, 20, 31], and science [47]. Social robots can take on a variety of roles. The robot can be used as a passive learning aid, such as when students are building robots [53]; as a peer or companion in learning with the students [36]; or as a tutor, where the robot teaches students [7]. Moreover, social robots have been used as teachers using frontal lecture mode [49], one-on-one interaction [17, 48] and recently also in small groups [30, 41, 50]. The top three applications for robotics in education have been identified as robotics, language education and robot teaching assistant, whereas preschool and primary school groups have been identified as having the greatest potential to implement education robots [14].

In this research, we use social robots as teachers of small groups of kindergarten and first- and second-grade children for morphological language activities in summer camps.

2.2 Robot Morphologies

A variety of robotic platforms ranging from low-cost kits, e.g., LEGO Mindstorms, to high-cost humanoid robots, e.g. Nao, have been utilized in education [34]. LEGO Mindstorms kits are mainly used in STEM-related activities [1, 23]. Relatively low-cost non-commercial social robots [45, 57] are used extensively to study various child–robot interactions. Recently, a medium-cost commercial robot, Jibo [10], was introduced in several HRI studies [2, 44]. The Nao platform is probably the most used in child–robot interaction, despite its high cost [5, 56].

The morphology of the robot has a direct effect on how humans perceive it. Robots with anthropomorphic features are generally expected to have more human-like capabilities, whereas caricatured or zoomorphic robots are expected to have less human-like or familiar capabilities and may be expected to have limited functionalities [19]. Studies have found that humans perceive robots with high human-likeness as more suitable for serving as teachers and caregivers, whereas robots with greater animal-likeness are more suitable as toys, entertainment, and companions [19]. Furthermore, minor changes in humanoid morphologies have been found to induce no significant effects [29].

However, relatively few studies have directly compared different robotic platforms in educational settings, and there is no direct evidence of the effects of high-cost robots and the enhanced expressiveness, functionality and interactivity that they provide on educational outcomes compared to low-cost robots with limited interactivity [8]. In our study, we compare the effectiveness of a non-commercial, low-cost, puppet-like robot and a commercial, high-cost humanoid robot to showcase the ways in which low-cost robotics can be integrated as educational tools with no deleterious educational effects.

2.3 Robots in Summer Camps

Robot summer camps are gaining popularity every year, but their main intentions are to teach children how to build and code robots and to encourage children to pursue STEM fields, similar to intercurricular robotics [8, 52, 53]. Social robots have been utilized as companions, for example, as friends for children with ASD [22]. Yet even in this study, a criterion for inclusion in the study was an interest in robotics.

In our study, we bring the classroom social robot experience to a new setting and with younger participants to observe how social robots can be integrated into standard summer camps and whether they can “compete” with other activities.

2.4 Hebrew Language Morphological Background

The activity we chose for the summer camp with social robots is learning the morphological properties of the Hebrew language. The two main morphological properties of Hebrew are the root and pattern, whose combination and close relationship are the essence of Hebrew words [9, 39]. Hebrew roots are not pronounceable words themselves but are abstract entities that carry the core meaning and “consonantal skeleton” of nouns, verbs, and adjectives, while the surrounding components help portray grammatical and categorical meaning [39].

For example, different patterns used with the root z-r-k derive words with different meanings that all share a semantic connection, as shown in Table 1.

Table 1 Examples of Hebrew roots, patterns and words

The ability to explicitly and consciously manipulate and inflect morphemes to change their meanings is called morphological awareness [13, 54]. The bound nature of Hebrew “forces” children to attend to word-internal structures [40]. Consequently, children who acquire Semitic languages start developing morphological awareness as early as preschool [51, 54]. However, children are unable to explicitly recognize roots of Hebrew words until later in childhood [39, 54]. For this reason, it appears that root- and pattern-based grammar is not taught as thoroughly to students in the younger grades. Given that Hebrew is a morphologically dependent language and that even preschoolers are able to unconsciously exercise morphological awareness, developing methods for teaching Hebrew roots and patterns to children in their early elementary years should become a higher priority.

3 Methods

3.1 Robots

For this study, we used the two robotic platforms shown in Fig. 1. The first is the popular yet expensive Nao robotic platform [46]. Nao is a humanoid robot that is 0.57 meters tall, weighs about 4.5 kg and has the appealing appearance of a human toddler. It has 25 degrees of freedom (DOF), which enable very expressive motion. It has two cameras, four microphones and force sensors on its legs, arms and head. It can be programmed using Choregraphe [37], a visual programming tool, which has a very large and extensive behavioral database.

Fig. 2 Patricc robotic platform. a Mechanical sketch. b–d Patricc’s puppet costumes

The second is Patricc (Fig. 2), a novel 3D-printed robotic platform [18]. The robot was designed for large-scale deployment in educational facilities and child–robot interaction research. The robot is built from 3D-printed parts and widely available off-the-shelf products. The servo motors that facilitate the robot’s motions also serve as structural parts, thereby minimizing the number of parts that need to be 3D printed.

To increase children’s engagement with the robot, it was designed as a human-like torso. The robot has 8 DOFs that enable it to perform child-like actions, such as gazing toward the child [11], participating in joint attention situations with the child [60] and performing a wide range of gestures [15]. One of the DOFs controls the mouth for synchronized speech and mouth movements. The shoulder and neck joints are designed such that the axes of the links of the robot’s mechanism and the axes of the joints are non-orthogonally connected. This configuration is intended to enhance the expressiveness of the robot and create the illusion of a larger number of degrees of freedom [21].

The robot has three puppet-like, exchangeable costumes that can be used to easily “dress” the robot. These costumes are intended to give the children the ability to interact with multiple characters using only one robot and to create a richer interaction and delay the novelty effect. Once on the robot, the costume remains connected to the robot by utilizing a set of strong magnets. Another advantage is that letting the child choose the character they prefer to interact with and letting them “dress up” the robot may increase the child’s engagement in the educational activities.

The two robots differ in several aspects:

Appearance. While both Nao and Patricc have anthropomorphic qualities, Patricc has more zoomorphic characteristics, including bright colors and fur. Patricc also has multiple characters, whereas Nao has only one.

Autonomy and Interactivity. Both Patricc and Nao perform a simple task that does not require full reciprocal interaction and response to the group. However, Nao has more autonomy both in movement and in the demonstrations performed in front of the groups during the sessions. Nao also includes a larger repertoire of interactive behaviors, which was also showcased in the demonstrations.

Cost. The Nao robotic platform is in the high-cost range of several thousands of USD. The prototype Patricc platform’s material costs, including three puppet-like costumes, are less than 800 USD.

3.2 Software

The experimental setup was programmed in Python using ROS (the Robot Operating System). We deliberately used the same code for both robots, which differed only in their simple, generic, and expressive behaviors. Both robots had the same verbal component composed of pre-recorded Hebrew speech. Patricc’s movements were simple behaviors that were pre-recorded using our authoring tool, which included a Kinect sensor and a simple human-robot skeleton-angle transformation matrix. We used Nao’s generic “Explain” behaviors, which are part of the precoded behavior repertoire of the robot.

To enable scalable, easy-to-use software, we developed a simple text-based protocol of the sequence of behaviors and sounds. Recordings were uploaded to the controlling computer for Patricc and to the robot itself for Nao, and the text file listed the sequence of audio files and behaviors to play. This enabled us to easily change and add content to the general system, something that proved vital in the integration of the setup into the summer camp.
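The exact file format of this protocol is not given here. As a minimal sketch only, assuming a hypothetical format with one step per line (`audio <file>` or `behavior <name>`, with `#` comments), a sequence file could be parsed and played back as follows. The `Step` class, the keywords, the file names and the two callbacks are illustrative assumptions, not the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    kind: str  # "audio" or "behavior" (hypothetical keywords)
    name: str  # audio file name or behavior label


def parse_script(text):
    """Turn a text script into an ordered list of steps."""
    steps = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2 or parts[0] not in ("audio", "behavior"):
            raise ValueError(
                f"line {lineno}: expected 'audio <file>' or 'behavior <name>'"
            )
        steps.append(Step(parts[0], parts[1].strip()))
    return steps


def run_script(steps, play_audio, run_behavior):
    """Play the steps in order using robot-specific callbacks."""
    for step in steps:
        if step.kind == "audio":
            play_audio(step.name)
        else:
            run_behavior(step.name)


# Hypothetical activity fragment; file and behavior names are made up.
EXAMPLE = """\
# activity fragment
behavior explain
audio example_01.wav
audio question_01.wav
"""

log = []
run_script(
    parse_script(EXAMPLE),
    play_audio=lambda name: log.append(("audio", name)),
    run_behavior=lambda name: log.append(("behavior", name)),
)
```

Keeping the robot-specific parts behind two callbacks mirrors the idea of sharing one code base across both platforms: only the functions that actually play a sound or trigger a movement would differ between Patricc and Nao.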

3.3 Teaching Hebrew Morphology

To cultivate active morphological awareness in early childhood, we developed a unique method. Our method is not based on a theoretical understanding of the abstract term “root” but rather on attributing morphological awareness development to the identification of the root through exposure to words and their roots at a young age. Participants were presented with a rhythmic series of words and their roots, which enabled them to ascertain the different verb patterns related to the roots. With this knowledge, we hypothesized that they would develop the ability to extract roots from verbs and nouns of different complexities, some of which are only presented in more advanced school curricula. This method aims to improve children’s understanding of morphological structure in a direct way, without the mediation of writing.

In contrast to teaching in schools, where there is a relatively regular teaching order beginning with the first pattern, or binyan, CaCaC, and continuing according to the grammatical complexity of verbs, our lessons’ teaching order was based on the difficulty level of root extraction. For example, the CiCeC pattern and the CaCaC pattern were initially taught together. Since their third-person, singular past-tense form does not include an affix, the root is easier to identify. Moreover, the question of the semantics of the verb, including the question of an active or passive relationship, has no significance for the degree of difficulty in identifying the root.

High linguistic competence is expressed through identification not only of roots of words that a person recognizes and understands but also of roots of words that the child or adult is not familiar with. By identifying the root of unfamiliar words, the speaker is able to decipher some semantic components of the word. This skill is especially essential for children, who are in the process of acquiring their first language and face unfamiliar words on a regular basis. The importance of improving the children’s ability to handle unknown words was revealed in the initial stage of our research. In the initial assessment of our study, the children were asked to identify the root of the verb lehantsiax (meaning perpetuate, immortalize, preserve, or eternalize), which was unknown to most of the participants. Many children explicitly expressed their lack of familiarity with the given word, which prevented them from even trying to guess the root, as they did before with other complicated verbs. Only 1 of 46 correctly decoded the n-ts-x root. Therefore, as part of the lessons, the children also practiced pseudo-words that do not exist in Hebrew, but have a correct morphological structure. Two such entries were added to the post-test: hitralesh and zilatnu.
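To make the notion of root extraction concrete, the following sketch treats a pattern as a template in which each `C` marks a root-consonant slot and matches transliterated words against a small set of templates. It is an illustration only and not the teaching method itself, which is oral and based on exposure. The template list, the ad hoc transliteration (with `ts` and `sh` treated as single consonants) and the function names are assumptions made for this example.

```python
import re

# One root consonant in this ad hoc transliteration; digraphs are tried first.
CONSONANT = r"(ts|sh|[bdfghklmnprstvxyz])"

# Illustrative templates only; "C" marks a root-consonant slot.
TEMPLATES = ["CaCaC", "CiCeC", "hitCaCeC", "CiCaCnu"]


def template_to_regex(template):
    """Compile a template into a regex with one capture group per 'C' slot."""
    parts = [CONSONANT if ch == "C" else re.escape(ch) for ch in template]
    return re.compile("^" + "".join(parts) + "$")


def extract_root(word, templates=TEMPLATES):
    """Return (root, template) for the first matching template, else None."""
    for template in templates:
        match = template_to_regex(template).match(word)
        if match:
            return "-".join(match.groups()), template
    return None


# The two pseudo-words from the post-test and a CaCaC form of the root z-r-k.
for word in ("hitralesh", "zilatnu", "zarak"):
    print(word, extract_root(word))
```

Because the matching depends only on form, it works equally well for pseudo-words, which is the property the pseudo-word items are meant to probe in children.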

3.4 Participants

Participants were recruited from our institution’s summer camp. While all children participated in the activities with the robots, we only collected and analyzed the data of children whose parents signed a consent form, which was distributed as part of the other documentation associated with the summer camp. A total of 46 children participated in the study. Children who did not complete the full pre-post tests (\(n=4\)) were excluded from the analysis. The analyzed participants (\(n=42\)) were \(6.7\pm 0.9\) years old and included 24 females and 18 males. The study was approved by the institutional IRB.

4 Research Questions and Hypotheses

The study presented in this contribution takes a holistic approach to the question of whether social robots can be introduced into summer camps with non-STEM-related activities. We thus investigate several unique characteristics of summer camps that social robots should address for successful integration: (i) the large scale and limited personnel in summer camps; (ii) the cost-effectiveness of the solution; and (iii) the learning outcomes of the activities.

4.1 Research Questions

4.1.1 Scalability (RQ1)

Is the robot activity compatible with a large group of children? How many people are needed to operate the social robot setup for large groups of children?

4.1.2 Robot Type (RQ2)

Does the morphology of a robot impact the progress made by the children being taught? Can a less-expensive robot entertain and teach children in a satisfactory and effective way?

4.1.3 Learning Outcomes (RQ3)

Can the children extract Hebrew roots after a few short interactions with the robot? Do learning outcomes depend on the number of activities or on the specific activities they undergo?

4.2 Hypotheses

Our testable hypotheses mainly relate to the learning outcomes and their dependence on variable factors. We hypothesize that children will learn to extract morphological roots after the activities with the robots (H1). Moreover, we hypothesize that the more activities children experience, the more they will improve (H2). We hypothesize that older children will improve significantly more than younger children (H3), but that there will be no gender difference in learning outcomes (H4). We also hypothesize that the group size will have no effect on the children’s learning outcomes (H5).

In relation to the robots, we hypothesize that, due to the simplicity of the activity, there will be no significant difference in learning outcomes between activities with the two robotic platforms (H6). However, we hypothesize that the children will prefer the Nao platform significantly more than the simpler, less-interactive Patricc platform (H7).

The study aimed to address these research questions and hypotheses: a mixed qualitative-quantitative analysis was applied to RQ1 (H5) and RQ2 (H6-7), and a quantitative analysis was applied to RQ3 (H1-4).

4.3 Conditions

There were four conditions for robot session interactions. The children played the learning session either with only Patricc (\(n=10\)), with only Nao (\(n=8\)), with Patricc followed by Nao (\(n=10\)), or with Nao followed by Patricc (\(n=14\)). Children’s assignment to each condition was randomized.

The only Patricc and only Nao conditions were directly aimed to compare the robots’ effect on learning outcomes (first part of RQ2), whereas the mixed conditions also addressed the robots’ influence on preference, namely, whether first learning with one robot influenced preference over the other robot (second part of RQ2). Data from all conditions were used to address the scalability (RQ1) and learning outcomes (RQ3) research questions.

Furthermore, while each child was assigned to a specific condition, which determined which robot platform they learned with, all children were exposed to Nao during the demonstrations, and children who were assigned to learn with Patricc expressed their desire to also learn with Nao. Consequently, only one of the four conditions included children who did not learn with Nao, and even these children were exposed to Nao in non-experimental contexts, which maximized the enjoyment of as many children as possible.

5 The Study

We conducted a 3-week study in a summer camp in which participants played a Hebrew word game led by one of the robots (Fig. 3). The game’s design was similar to a teacher’s presentation of questions followed by answers. Each activity was pre-recorded with the word-root combinations. The robot acted as the teacher in this interaction and interacted with a group of 4–9 children at a time.

The first session was dedicated to performing individual pre-tests with the children, presenting parts of the first activity, and engaging the children in an interactive, extracurricular demonstration with Nao. The next sessions (either two or three) were dedicated to more Hebrew root activities accompanied by more fun demonstrations with Nao. The last session was dedicated to reviewing the learned material, completing a post-test with all participants, and engaging in an interactive demonstration with Patricc. In other words, each session was composed of learning activities. The activities were presented in a fixed order, but the robot with which each activity group engaged in root extraction activities was randomized, which yielded the different conditions. The learning activities were followed by an engaging, unrelated demonstration of the robot platforms, initially with Nao (sessions 1–3) and then with Patricc (last session) (see Fig. 3).

Fig. 3 A schematic diagram of the study flow. a Each participant engaged in 3–4 sessions and up to 5 root learning activities. The sessions were composed of assessments (light gray), learning activities (pattern) and demonstrations (dark blue). b, c Activities varied in length and composition, and additional changes to individual activities are described in parentheses: b structure of activities 1–3 and c activities 4–5

Below, we describe the general setup and the quantitative and qualitative data collection. This is followed by a detailed account of each week’s session, challenges encountered, and lessons learned.

5.1 General Setup

The study took place in three rooms during the morning of a regular summer camp day. Two summer-camp groups rotated through our activity every morning over the course of three weeks. During these rotations, each summer-camp group was split up into smaller groups of 4–9 children who would interact with one robot at a time. The interactions between the children and the robot took place on the floor, where the children sat in a semi-circle around the robot, as shown in Fig. 1.

The setup included bringing each robot, computer, and accompanying equipment such as speakers, charging cables, video cameras and external TP-Link routers before the trials every morning.

On the first day, we introduced four robots, i.e. two Patricc platforms and two Nao robots, to four smaller groups all in one room. Upon the first interaction between the groups of campers and robots, it became clear that the initial setup was not feasible. Not only was it extremely challenging to hear the robot given the amount of auditory stimuli in the room, but the children became distracted by the other robots and did not want to listen to the robot in front of them. To overcome this issue, we reduced the number of robots in the room to two. Even though sound was no longer an issue, the children were still curious about the other group’s robot, regardless of the fact that both robots were identical and were engaging in the same activity. Therefore, for the remainder of the study, we separated each group into an individual room and reduced the total number of robots to three.

5.2 Quantitative Assessments

5.2.1 Pre-test

In the first meeting with any summer camp group, we assessed each participant’s initial ability to consciously extract the roots of Hebrew words. We introduced the assessment as a word game that would continue with the robots. The format of this assessment was very similar to the structure of the root activities to follow. While the participants were not told explicitly what a root was, they were given a series of examples including 5 word-root pairs. Then, they were presented with 10 questions and expected to give any answer that came to their mind. The interviewer assessing the individual was instructed not to give feedback on the correctness of the answer but rather to give encouragement after every answer. At the end of the assessment, the participants were given a sticker.

5.2.2 Post-test

The post-test was longer than the pre-test. In addition to presenting the same 10 questions introduced in the pre-test, which were not taught during the teaching activities, the post-test included 8 additional words that were taught in the lessons and 2 new pseudo-words in Hebrew. As in the pre-test, the interviewers were asked to respond positively to the children’s answers regardless of their accuracy.

5.3 Qualitative Data Collection

5.3.1 Children

During the post-test, the children were asked the following questions about their perception of the robots:

1. If you were to play this Hebrew root game another time, which robot would you prefer to play with you? Patricc or Nao? Why? Here, the children were presented with a photo of each robot and were asked to point to the photo of the robot of their choice.

2. Would you like to play with these robots again?

3. Would you rather learn roots with a teacher in your class next year or again with a robot? Why?

5.3.2 Counselors

During the last week, the summer camp counselors filled out a survey about the robot activity within the context of the summer camp. Only 6 of the 16 counselors completed the following questionnaire:

1. What rating would you give our activity? Why?

2. If you could change something about the lesson, what would you change?

3. Which robot do you think the children preferred playing with?

4. With a short explanation, it is possible to teach the process of setting up and running the robots. Would you be interested in learning how to activate the robots?

5. Answer based on your agreement with the statements below (5-point Likert scale):

    • The children continued to talk about robots after the session.

    • The kids enjoyed coming to the robot activity.

    • It is important that future summer camps include Hebrew root/robot activities.

    • The campers learned more in the robot activity than in other summer camp activities.

5.4 Interactive Sessions

5.4.1 First Session

Each summer camp group spent approximately 30 min in any one session, and within that time frame we needed to individually assess each child and present the first activity to them. During the initial assessments, the children who were not being assessed engaged in another, unrelated activity. Then, they were split into their activity groups and presented with the first robot activity, which was structured similarly to the pre-test. After giving a series of example word and root pairs in a recitation fashion, the robot asked the group a question about the root of a word. After a delay of three seconds, in which the children were given time to answer out loud, the robot presented the correct answer followed by words of affirmation such as “Good Job!” or “You are awesome!”
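The question round described above can be sketched as a short sequence with a fixed answer window. This is an illustration under stated assumptions: `say` stands in for playing a pre-recorded clip, `wait` for the pause, and the function name and the example strings are hypothetical. Only the three-second delay and the two affirmations come from the description above.

```python
import time

# Affirmations quoted in the text; in the study the speech was pre-recorded.
AFFIRMATIONS = ["Good Job!", "You are awesome!"]


def question_round(question, answer, affirmation, say,
                   wait=time.sleep, delay_s=3.0):
    """Ask, leave a fixed window for answers, then reveal and affirm."""
    say(question)
    wait(delay_s)  # children answer out loud; the robot does not listen
    say(answer)
    say(affirmation)


# Dry run with stand-in callbacks so nothing actually sleeps or plays audio.
spoken = []
pauses = []
question_round(
    "What is the root of zilatnu?",  # hypothetical prompt wording
    "z-l-t",
    AFFIRMATIONS[0],
    say=spoken.append,
    wait=pauses.append,
)
```

Because the affirmation follows regardless of what the children say, the round needs no speech recognition, which is consistent with the simple, non-reciprocal task both robots performed.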

After the first few minutes of the six-minute activity, the children were not as captivated by the novelty of the robot and had a harder time participating and engaging with the robot’s requests. Several children complained of boredom given the repetition and slow nature of the first activity. With some parts of the activity, they could not hear as well unless they oriented themselves in a particular way and from a particular distance from the robot, which made focusing on the activity more challenging for many children.

We acknowledged that the nature of a summer camp required us to change the presentation of the activity from a more traditional classroom lesson to a game. We believed that the best way to make the children more receptive to the activities was to introduce more interactive requests from the robot, such as to jump up and down, scratch one’s head, and laugh out loud. Afterwards, the kids were asked to analyze the verb of the activity they were asked to perform. These changes were implemented in the activities presented by the robot in the following sessions.

5.4.2 Second Session

We tested the scalability of the robot setup by splitting the children into groups of up to 9 children at a time rather than the original 4 that was planned. In these sessions, we also introduced Patricc’s different costumes. Since we did not need to run a pre-test, we decided to break up the activities into shorter segments, each 6 min long, and entertain the children with a Nao demonstration in between.

The two new activities, namely, activities 2 and 3, included more interactive words that encouraged more participant engagement. The recitation examples from the robot were also supplemented with additional voices to echo the answers of the children. This was used to promote the idea that the children were expected to answer the robot out loud.

5.4.3 Third Session

Some of the groups were able to participate in a third session in which they engaged in a fourth activity with the robots. In this activity, we tried to minimize the number of transitions made and experimented with a longer, 19-min activity that included a fun game with the robot rather than bringing all of the kids in between sections to play with Nao. Not only did we want to present a side of Patricc that they had not seen yet, but we also wanted to see if such a lesson could function as a more hands-off interaction between the children and the robot for an extended period of time. This activity included made-up words that offered an additional challenge to the children and presented the words as riddles rather than just questions or examples. Although these two challenges more successfully engaged and interested the children in all age groups initially, the longer length of the hands-off session made it difficult for many children to participate and stay engaged.

5.4.4 Final Session

In the last week, we presented a shortened, 6.5-min version of the fourth activity described previously. It included the sections from the long activity that the kids felt were most challenging, as well as words with structures that had not been introduced before. For example, it included riddles and questions about pseudo-words in Hebrew. Unlike the previous activities, this activity did not include the interactive requests. This was followed by the post-test. Children who finished the post-test were invited to play a game with Patricc where they controlled its limbs via a Kinect sensor.

6 Results

6.1 Children Learned to Extract Roots

The pre- and post-tests included the same questions, yet the children received no feedback or correct answers for either test. The number of correct answers was significantly lower in the pre-test than in the post-test (Pre-test \(M=1.98, SD=1.87\); Post-test \(M=7.07, SD=2.13\); Post-Pre: \(M=5.1, SD=2.1, t(41)=15, p<0.001, d=2.4\), two-tailed t-test), Fig. 4a (RQ3). On average, the children answered five more words correctly in the post-test than in the pre-test. This difference was highly significant, supporting H1.
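As a rough consistency check on the paired comparison above, the effect size and t statistic can be recomputed from the rounded summary values alone. The sketch below assumes the standard paired-samples definitions (d is the mean difference divided by the SD of the differences, and t equals d times the square root of n) and n = 42 children, as implied by the 41 degrees of freedom. The function name is illustrative and is not taken from the study's analysis code.

```python
import math

def paired_effect(mean_diff, sd_diff, n):
    """Return (Cohen's d, t statistic, degrees of freedom) for paired differences."""
    d = mean_diff / sd_diff      # standardized mean of the paired differences
    t = d * math.sqrt(n)         # identical to mean_diff / (sd_diff / sqrt(n))
    return d, t, n - 1

# Rounded Post-Pre summary values reported in the text; n = 42 is inferred from t(41).
d, t, df = paired_effect(mean_diff=5.1, sd_diff=2.1, n=42)
print(f"d = {d:.2f}, t({df}) = {t:.1f}")
```

Because the inputs are rounded to one decimal place, the output is only expected to land near the reported \(d=2.4\) and \(t(41)=15\), not to match them exactly.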

Furthermore, the children were presented with 10 extra words and had impressive success in extracting the roots of these words (Extra \(M=7.5, SD=2.1\) correct root extraction). We compared the pre-test and these extra words. Children extracted significantly more correct roots from the extra words after the intervention, compared to the pre-test (Extra-Pre: \(M=5.6, SD=2.1, t(41)=17, p<0.001, d=2.6\), two-tailed t-test).

Henceforth, we analyze how different factors correlate with the improvement in correct root extraction between the pre-test and the post-test on the same words. In our absence-of-effect analysis [28, 38, 42], we consider only a difference of more than one correct root extraction to be meaningful, for the following reasons: (i) according to experts in the field whom we consulted, a single-word difference is not meaningful enough; (ii) \(\pm 20\%\) is a common measure [42], which in the current study amounts to a single-word difference.

Fig. 4

a Pre-test and post-test number of correct words. b–e Change in the number of correct root identifications between post-test and pre-test as a function of b gender, c age, d group size and e robotic platform condition. *** \(p<0.001\)

There was variability in the number of activities each child participated in due to logistic concerns and attendance. Hence, we were able to test whether the number of activities correlated with the children’s improvement. We found that there was no significant correlation between the number of activities and learning (\(F(1,40)=0.515, R=0.113, p=0.48\) linear regression) and thus failed to find support for H2.
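For a linear regression with a single predictor, the F statistic and the correlation coefficient carry the same information, so the reported pair can be cross-checked. The sketch below uses the textbook identity \(F = R^2 \cdot df / (1 - R^2)\) with the residual degrees of freedom \(df = 40\); the helper name is hypothetical.

```python
def f_from_r(r, df_resid):
    """F statistic of a one-predictor linear regression, given the correlation r."""
    r2 = r * r
    return r2 * df_resid / (1.0 - r2)

# Reported values for number of activities vs. learning: R = 0.113, F(1,40) = 0.515.
f_value = f_from_r(0.113, 40)
print(round(f_value, 3))
```

With the rounded \(R=0.113\), the identity gives a value within rounding distance of the reported \(F(1,40)=0.515\).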

Since there were five different activities, we also tested whether participation in any one activity was a significant predictor of learning. We found no such significance (\(F(5,36)=.288, p=0.92\), multi-linear regression).

6.2 No Age, Gender or Group Size Difference in Learning

We tested whether age and gender were correlated with learning outcomes. We first tested hypothesis H3 and found no significant correlation between age and learning outcome (\(F(1,40)=.012, p=.91\), linear regression), thus not supporting H3.

To test hypothesis H4, we used the confidence interval approach to test for equivalence, or “absence of effect” [38, 42], and found that the confidence interval of the gender difference in learning outcomes extended slightly beyond 1.0 word (\(F(1,40)=0.028, p=0.869, \eta _p^2=0.001\) for gender, one-way ANOVA, \(Male-Female: M=0.11, SD=0.67, CI=[-1.2, 1.4]\)), Fig. 4b. Consequently, we cannot conclude equivalence between genders, and H4 remains indeterminate.
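The confidence interval approach can be illustrated with a minimal sketch. It assumes that the reported "SD" of the gender difference acts as its standard error, that the interval is the estimate plus or minus a Student's t critical value with 40 degrees of freedom, and that the equivalence margin is one word. The function name and the three verdict labels are illustrative choices, not the authors' terminology.

```python
T_CRIT_DF40 = 2.021  # two-tailed 95% critical value of Student's t, 40 degrees of freedom

def ci_verdict(estimate, std_err, margin, t_crit=T_CRIT_DF40):
    """Classify an effect by comparing its confidence interval with +/- margin."""
    low = estimate - t_crit * std_err
    high = estimate + t_crit * std_err
    if -margin <= low and high <= margin:
        verdict = "equivalent"             # whole interval lies inside the margin
    elif high < -margin or low > margin:
        verdict = "meaningful difference"  # whole interval lies outside the margin
    else:
        verdict = "indeterminate"          # interval crosses a margin boundary
    return (low, high), verdict

# Male-Female difference in learning outcome, with a one-word margin.
(low, high), verdict = ci_verdict(estimate=0.11, std_err=0.67, margin=1.0)
print(f"CI = [{low:.2f}, {high:.2f}] -> {verdict}")
```

Under these assumptions the interval agrees with the reported \([-1.2, 1.4]\) up to rounding, and because it crosses the one-word margin on both sides, the verdict is indeterminate rather than equivalent.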

Children interacted with the robots in different-sized groups. Given our scalability research question (RQ1), we tested whether interactions within larger groups correlated with learning. Since each group size had only a few participants, we performed a linear regression analysis with group size as the independent variable and found no significant correlation between it and learning outcomes (\(F(1,40)=0.008, R=0.014, p=0.93\), linear regression), as shown in Fig. 4d. To test hypothesis H5, we again employed the confidence interval approach and found that, with 95% confidence, a difference of 1–2 children in group size has no meaningful effect (\(\beta : M=0.016, SD=0.182, CI=[-0.35, 0.38]\)). This lends some support to hypothesis H5, although larger group sizes may still result in significant differences in learning outcomes.

6.3 Learning Outcome does not Depend on Robot Morphology

We tested whether interacting with the different robotic platforms had any influence on learning outcomes (RQ2). Children encountered Nao and Patricc a different number of times (Nao: \(M=2.1, SD=1.5\), range 0–5; Patricc: \(M=1.8, SD=1.2\), range 0–3).

We first defined an independent variable as the difference between the number of sessions with Nao and with Patricc (Nao-Patricc: \(M=0.33, SD=2.4\), Range \([-3, 5]\)). We performed a linear regression between this variable and learning outcomes (\(F(1,40)=0.3, R=0.086, p=0.587\), intercept: \(M=5.144, SD=0.33, CI=[4.470, 5.817]\), \(\beta : M=-0.074, SD=0.134, CI=[-0.345, 0.198]\), linear regression). Employing the confidence interval approach, these results show that with 95% confidence, a difference of one session in either direction did not cause a meaningful effect. Moreover, the maximal effect of robotic platform on learning outcomes was, with 95% confidence, 2.76 root extractions, which, given the intercept confidence intervals, means that learning with both robots induced significant positive learning outcomes.
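The text does not spell out how the figure of 2.76 root extractions was obtained. One reading that reproduces it, sketched below as an assumption, multiplies the steepest slope allowed by the 95% confidence interval by the full observed range of the Nao-Patricc variable. The second helper combines the lower intercept bound with the corners of the slope interval, which is a conservative simplification and not a joint confidence region. Both function names are hypothetical.

```python
def max_platform_effect(slope_ci, x_range):
    """Largest change in predicted outcome across x_range allowed by the slope CI."""
    steepest = max(abs(slope_ci[0]), abs(slope_ci[1]))
    return steepest * (x_range[1] - x_range[0])

def worst_case_gain(intercept_low, slope_ci, x_range):
    """Smallest predicted gain over the corners of the slope CI and of x_range."""
    return min(intercept_low + b * x for b in slope_ci for x in x_range)

slope_ci = (-0.345, 0.198)   # 95% CI of the regression slope (from the text)
x_range = (-3, 5)            # observed range of the Nao-Patricc session difference

print(max_platform_effect(slope_ci, x_range))     # close to the reported 2.76
print(worst_case_gain(4.470, slope_ci, x_range))  # remains well above zero
```

Under this reading, even the least favorable combination of intercept and slope still predicts a gain of more than two root extractions, which is consistent with the statement that both robots induced positive learning outcomes.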

Furthermore, due to our inability to test for significance using a 4-condition ANOVA because of the small number of participants in each group, we compared the means of the four conditions and found that all were within each other’s 95% confidence intervals, as shown in Fig. 4e.

Taken together, these results show that the choice of robotic platform had a very small effect on learning outcomes and that both platforms are effective. However, the two platforms cannot be claimed to be statistically equivalent, which leaves H6 indeterminate.

6.4 Children Preferred Nao Over Patricc

We also asked the children about their robot preference. An overwhelming majority (Nao: 33, Patricc: 5, Other: 4) preferred playing with Nao. The counselors reported the same preference (\(100\%\)). These results support our final hypothesis H7.

Despite the fact that the counselors reported low scores for the children’s enjoyment during the activity (2.5 on a 5-point Likert scale), most children said they wanted to play with the robots again (Play: 34, Not play: 2, Other: 6). Moreover, when asked specifically about the root activity, most children said they would prefer learning with a robot rather than their own teacher in the future (Robot: 32, Teacher: 1, Other: 9).

6.5 Camp Counselors’ Impressions

To better understand how to integrate social robots into summer camps, we asked the counselors for their opinion.

In general, the counselors reported that the activity was good (rating = \(7.4\pm 2.7\) on a 10-point Likert scale) but that the children grew bored and frustrated with the Hebrew root activity throughout the three weeks. They recommended that for future presentations, the activities should be short and less repetitive in style and content. Furthermore, the counselors shared that the Hebrew teaching activity decreased the overall enjoyment of the interaction with the robots (\(2.5\pm 1.29\) on a 5-point Likert scale).

Regardless, the counselors admitted to being surprised and impressed by the children’s improvement in root identification (\(3.6\pm 0.55\) on a 5-point Likert scale). However, they reported that the children were not necessarily learning more in the robot activity than in other activities offered by the camp (\(2.6\pm 1.52\) on a 5-point Likert scale) and that the children were not likely to talk about robots after the activity (\(2\pm 1.22\) on a 5-point Likert scale). They also did not have strong feelings about incorporating a Hebrew root and robot activity into all summer camps (\(2.8\pm 0.83\) on a 5-point Likert scale).

Finally, while several research assistants conducted the individualized pre-post assessments, the entire setup required only one person to operate the robots at the beginning of the activity. When asked whether they were willing to learn how to operate the setup for future activities, the counselors expressed mixed opinions. Most of the counselors (\(67\%\)) said they would be interested, and some explained how it might add to the campers’ interest and experience. The others (\(33\%\)) mentioned that it might make it harder for them to attend to other issues with the campers.

7 Discussion

The study reported here has taken a holistic approach and addresses several complementary aspects of integrating social robots into summer camps. We first discuss general aspects related specifically to the summer camp setting, then address the scalability of the setup. Insights related to robot morphology are followed by a discussion of learning outcomes within the context of social robot tutors.

7.1 Summer Camp Settings

We have shown that social robots can be successfully introduced into summer day camps [8, 52, 53], despite the fact that the summer camp setting introduces several challenging conditions compared to school-based interactions, such as unfamiliarity with peers, with the summer camp schedule, with the counselors, with the rooms and with the robots. Moreover, children coming to a summer camp expect novel, entertaining and exciting activities all day. Given all the new things and the pre-established expectations of a summer camp, the focus and regularity of a lesson were less easily received by the children. Furthermore, the time frame for each activity was relatively short and varied based on when a group arrived at the activity. Attendance was not always regular, and some children did not interact with the robots as much as their peers. For this reason, the composition of the groups also changed.

Hence, due to the summer camp context, several adjustments of the standard child–robot interaction setup in schools are required. The activities and stimuli presented should be shortened to a maximum of 7 min, they should be more interactive by encouraging more movement and appeal to different senses, and transitions between activities within a single session should be minimized.

7.2 Scalability due to Robot-Group Dynamics

The setup introduced in this contribution can be scaled up (RQ1) because a single robot can engage a large group of children, in our case, up to 9 children at a time [41, 49, 50]. However, it is preferable to separate the groups into individual rooms and maintain a 1:1 group:room ratio. Even in this arrangement, only a single non-professional person was needed to operate the robots at the beginning of the activities, thus representing another scaling-up feature of the setup.

Moreover, the group dynamic and peer pressures within the groups affected participation and engagement of the children with the robot [30, 50]. For example, a child who vocalized their dissatisfaction led to a loss of interest among the other children. However, other groups in which a child vocalized satisfaction showed greater overall participation. These behavior changes are common throughout childhood and into adolescence [3].

Our results indicate that despite these factors, group size had no significant effect on the impressive learning outcomes.

7.3 Robot Morphologies

We compared two vastly different robot morphologies within the same user study (RQ2). Some qualitative differences may have contributed to the children’s preference for Nao. Patricc, despite appearing more puppet-like, has more “robot-like” movements, and the sounds of the machinery are audible [33]. These sounds were distinct as the activity progressed, and many children commented on how distracting and uncomfortable the sounds were. Compared with Nao’s much smoother movements, Patricc’s movements appeared even more annoying.

Furthermore, Nao’s internal speakers have a maximum volume that serves as a limiting factor in presenting the activity in an auditorily challenging environment. In contrast, Patricc’s external speaker offers more flexibility and control over audio stimuli.

Patricc’s costumes also served as a deterrent for some children. Once we started to take off the costume to reveal the robot underneath, they began to perceive the robot platform as a “robot” rather than a “puppet.” However, they continued to comment on the childlike nature of the costumes and yearned for a “cooler” robot. Nao fits this “cooler,” more stereotypical robot aesthetic, and the children were more captivated by the robot even when the robot was stationary and was not utilizing any of its interactive features. The interactivity of the Nao robot during the breaks in the activities may have biased the children towards it. Children often compared the two platforms, and their opinions about the robots noticeably affected their experience within the activities.

Moreover, part of the research question (RQ2) was whether a low-cost robot can produce the same learning outcomes as a more common, yet high-cost platform. The children were not aware of the robots’ costs and reacted to the general appearance, performances, and aesthetic, or “coolness,” of each robot platform. Hence, while there are other morphologies of varying costs [10, 23, 45, 57], one future direction could be to improve Patricc’s perception of “coolness” without increasing its cost.

Despite all the aforementioned differences, the learning outcomes were very similar between the two robot platforms.

7.4 Social Interaction

In this study, there was a lack of full reciprocal interaction between the children and the robot. This was deliberate due to the simplicity of the activities and the size of the groups [30], which limited the amount of interaction each child could experience with the robot. The robot did not respond to the answers given by the children but instead recited a script designed to fit the children’s responses and teach them at the same time. Some children picked up on this and lost interest in the activity because they did not feel like the interaction depended on their participation. They also recognized that the robot was playing a recording and that the voice(s) did not actually belong to the robot.

We opted for this type of interaction as a compromise between full autonomy and wizard-of-Oz operation. The current state of Hebrew automatic speech recognition [27] and natural language processing [43] does not enable speech comprehension. The context in our setup is even more challenging, with large groups of variable-age children. Hence, a fully autonomous and verbally communicating robot was not an option. The scalability criteria for our setup also excluded wizard-of-Oz methodologies. We therefore chose a fully autonomous, nonreactive robot platform.

7.5 Learning Outcomes

Previous studies have shown mixed learning gains of robot tutors for second language learning [55]. First language learning has shown more promise [26].

In the current study, despite the limited interaction and the challenging summer camp scenario, children significantly improved in their Hebrew root extraction over a very short period of time (RQ3). The effect was very large and was apparent regardless of age and gender. This suggests that the robotic platforms were age- and gender-appropriate and were effective teaching tools in summer-camp settings.

7.6 Study Limitations

The reported study was conducted in-the-wild, with all the limitations that this entails, such as a complex interaction between the counselors, the physical environment of the summer day camp and the robotic platforms. Furthermore, the relatively small number of participants across the four conditions did not allow us to conduct a proper significance test. Thus, while our analysis shows that the difference between conditions is small, it does not statistically confirm our condition-dependent hypothesis (H6).

8 Conclusions and Future Work

We have shown that social robots can be successfully integrated into summer day camps using an easy-to-use, scalable and low-cost setup. The participating children showed large and significant learning gains after a few short interactions with either robot platform.

In future work, we aim to drastically improve the interactivity of the Patricc platform and introduce visual and auditory perceptual capabilities that will enable the robot to directly react to children’s expressions and answers. Furthermore, we intend to introduce these activities on much larger scales in kindergartens, schools, and additional summer camps to better ascertain the generalizability of our setup. We also plan to lengthen the study to learn about the longitudinal effects of learning roots orally from robots compared with typical methods of learning roots in schools as well as to learn about the retention rate of the information through our teaching methods. Finally, enabling counselors and kindergarten teachers to run the setup by themselves will also be investigated.