
1 Introduction and Background

The advancements experienced by eXtended Reality (XR) technologies over the last decade are unprecedented for this family of media. The availability of cost-effective hardware solutions is promoting their diffusion at the consumer level. Thus, both industry and academia are dedicating significant effort to helping XR media attain maturity in a variety of contexts and in fields as diverse as engineering, arts, design, architecture, medicine, education, and many more [5]. Since the early days of both Virtual Reality (VR) and Mixed Reality (MR), one application field that immediately attracted a great amount of interest was training. This is even more true nowadays, when XR Training Systems (XRTSs) are moving from laboratories to industry, being ever more frequently integrated into companies' training programs [8], especially for practical and manual tasks that can benefit from a learning-by-doing setting.

Despite the growing body of literature in the field, and the potential of the medium, the vast majority of studies and applications stick to the traditional learning (TL) approach. In traditional learning, a lecturer teaches something to one or more students, possibly using additional materials such as books, blackboards, or slides. In a common XRTS, the teacher is replaced by the software itself (not necessarily by a teacher avatar), which guides the trainee through the experience, for instance by providing step-by-step instructions [16]. Even though the intrinsically engaging nature of VR and MR already boosts training effectiveness through embodiment, there is much more that can be done.

Since the 1950s, pedagogists have devoted significant effort to developing didactic models that help students climb the learning pyramid [15] effectively. At the opposite end of the didactic model spectrum with respect to traditional learning lies the so-called learning by teaching (LBT). It is grounded in the practice of peer tutoring, in which students tutor other students by teaching each other domain knowledge self-learned from traditional (or other) sources. Although in normal conditions (humans teaching humans) LBT has proven to be much more effective than TL [6, 14], especially for long-term retention of the acquired knowledge, it also suffers from some drawbacks. Besides being less efficient (more time-consuming) than TL, its training effectiveness depends on the role taken by the student at a given moment (teacher or tutee), and the two roles require different kinds of feedback and stimuli [10, 18].

The need to replace the tutee peer has led to the rise of so-called teachable agents (TAs). These are (computer) agents that learners can teach about a subject domain, gaining a deeper understanding of the subject matter in the process [3]. In other words, the ultimate goal is not to actually program the agent, but to exploit it to stimulate the mental processes involved in the LBT approach, letting the learner gain a better understanding of the topic through the process of teaching someone else. Considering that empathy and several other social factors [6] are crucial in LBT, one of the most promising implementations of teachable agents takes advantage of service robots [20]. Robotic Teachable Agents (RTAs) have been investigated by several studies and proved to be equally or even more effective than TL (still employing robots) [17], and capable of activating the mental processes needed for an effective LBT experience [17].
Nevertheless, an intelligent training system (ITS) using just a robotic teachable agent is usually limited in terms of modality, with voice explanations from the learner being the main (and often only) form of Human-Robot Interaction (HRI) involved in the experience [9]. To extend the potential of RTA-based learning by teaching intelligent training systems, some researchers have begun to combine an MR environment with the RTA. To date, there is only a handful of studies on the topic. In a first study [11], a mobile robot along with a spatial MR setup was employed to teach a geometry-related topic. The study found that learners reacted differently based on the social attribution feedback (positive or negative connotation, and different subject) from the robot, suggesting that the MR environment did not significantly affect the social interaction. However, no direct comparison with a TL version was investigated. A second study [19], employing the same MR robotic training platform, investigated whether the physical RTA constitutes a real advantage in terms of learning effectiveness compared to a digital replica of it (MR only) and to a desktop-like application (no MR, no robot). This reflects a key challenge in designing MR robotic experiences, in which the augmented content could take over to the point that having a physical robot may become useless [13]. No significant differences were reported among the three versions of the experience in terms of learning gains, thus indicating that the aforementioned problem could have affected the MR intelligent training system.

With the aim of better clarifying whether the addition of MR could be detrimental to the robot's features that enable the LBT approach, and seizing the call to action from the research community [2, 20], this paper presents a preliminary study evaluating the training effectiveness of a Mixed Reality Robotic Training System (MRRTS) implementing the learning by teaching paradigm, compared to a traditional learning version.

2 Materials and Methods

The MRRTS was implemented by adopting a table-top projected spatial MR setup together with a commercial off-the-shelf programmable toy robot.

Fig. 1. Anki Cozmo

2.1 Technologies

More specifically, the Anki Cozmo robot was selected among others due to its popularity and because it has several anthropomorphic features that strengthen its emotional connotation and support social behaviours (Fig. 1). The manufacturer provides an official SDK for programming it in Python. Cozmo is a non-holonomic robot with a minimum size of \(6\times 7\times 11\) cm and includes two moving parts (in addition to the wheels). The first moving part, which can be considered the “head” of Cozmo, has one rotational degree of freedom (DOF) and can rotate by \({20^\circ }\) downward and \({45^\circ }\) upward. The head is completed by a “face” implemented through a \(2\times 2\) cm LED matrix display, which shows a simplified anthropomorphic facial expression using eye-like animations (selectable from a pre-defined list through the SDK). Beneath the display there is a \({60^\circ }\) wide field-of-view, \(640\times 480\) pixel RGB camera (although the image accessible through the SDK is limited to a \(320\times 240\) grayscale image). A feature of the SDK allows this camera to be used to make Cozmo automatically follow the user's face (by orienting both the robot and its “head”), simulating a look-at behavior. The second moving part is a front lifter (one positional DOF, likewise controllable through the SDK), which is primarily designed to interact with the bundled tangible objects (interactive cubes, not used in this project) but can also be used in custom ways if programmed, for instance to simulate a robot interaction with the projected environment [13] (tap-like animation). Cozmo is also equipped with WiFi capabilities and a built-in speaker that can leverage the Text-To-Speech (TTS) functionality included in the SDK. The SDK is designed with an event-driven approach and is rich in features (for the sake of brevity, only the subset of features actually used for the implementation is mentioned here).
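For illustration, the following minimal Python sketch (with hypothetical phrases and angles, not the actual MireLab code) shows how the SDK features mentioned above are typically accessed.

```python
# Minimal sketch of the Cozmo SDK features used by the MRRTS
# (hypothetical values; the actual lecture logic is more elaborate).
import cozmo
from cozmo.util import degrees

def cozmo_program(robot: cozmo.robot.Robot):
    # Voice feedback through the built-in TTS engine.
    robot.say_text("Let's review the Thevenin theorem.").wait_for_completed()

    # Orient the 1-DOF "head" (per the paper: 20 deg down to 45 deg up).
    robot.set_head_angle(degrees(30)).wait_for_completed()

    # Move the front lifter, e.g. as part of a tap-like animation
    # towards the projected surface (0.0 = bottom, 1.0 = top).
    robot.set_lift_height(1.0).wait_for_completed()
    robot.set_lift_height(0.0).wait_for_completed()

    # One way to obtain a look-at effect: a built-in behavior that
    # finds and tracks the user's face with the onboard camera.
    look_at = robot.start_behavior(cozmo.behavior.BehaviorTypes.FindFaces)
    # ... lecture logic runs here ...
    look_at.stop()

cozmo.run_program(cozmo_program)
```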

Fig. 2. Architecture of the MRRTS.

Fig. 3. Setup of the MRRTS.

The high-level architecture of the MRRTS exploited in this work is illustrated in Fig. 2 and includes Cozmo, an RGB-D camera, a projector, an Android smartphone, and a PC. As said, the selected MR configuration is a spatial MR setup, and the augmented digital contents are projected onto the table top. Since this setup, depicted in Fig. 3, is one of the most used in the literature [13] and is also exploited by the previously mentioned works on LBT with RTAs [11, 19], only a brief description of our implementation is given in the following. The projector was mounted near the ceiling in order to project the image onto the table from above. To improve the quality of the projected image, the table was covered with a black cardboard of size \(85\times 65\) cm, which is also the size of the projected surface. Since it was decided to provide the user with the ability to interact with the MR environment using natural gestures [13], a Microsoft Kinect v2 was mounted in the immediate vicinity of the projector. For this specific setup, both the \(1920\times 1080\) pixel, 30 fps RGB camera and the \(512\times 424\) pixel depth camera were used. The former is used by the Wizard-of-Oz (WOZ) interface that will be described in Sect. 2.2, while the latter is used for hand gesture recognition, enabling touch-based interaction with the projected surface. The depth image is processed using well-known computer vision techniques (background subtraction, depth-level thresholding, morphological opening, contour detection). The module is implemented using the OpenCV (v3.2) library; it can detect the position of the hand as well as its configuration (i.e., open or closed), and is able to distinguish three touch gestures, i.e., tap, slide, and drag.
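A minimal sketch of such a pipeline is reported below; thresholds, kernel size, and the touch criterion are assumptions for illustration, as the actual module also classifies the hand configuration and the three gestures.

```python
# Sketch of the depth-based touch detection pipeline (OpenCV v3.x).
import cv2
import numpy as np

TOUCH_MM = 15    # assumed max hand-to-table distance to count as touch
MIN_AREA = 400   # assumed minimum blob area, in pixels
kernel = np.ones((5, 5), np.uint8)

def detect_touch(depth_frame, background):
    """depth_frame/background: uint16 depth images in mm (512x424)."""
    # 1. Background subtraction against an empty-table reference frame.
    diff = cv2.absdiff(background, depth_frame)
    # 2. Depth-level thresholding: keep only pixels close to the surface.
    mask = ((diff > 0) & (diff < TOUCH_MM)).astype(np.uint8) * 255
    # 3. Morphological opening to remove sensor noise.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    # 4. Contour detection; the largest blob is taken as the hand.
    #    ([-2] selects the contour list across OpenCV 3.x/4.x returns.)
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                cv2.CHAIN_APPROX_SIMPLE)[-2]
    contours = [c for c in contours if cv2.contourArea(c) > MIN_AREA]
    if not contours:
        return None
    hand = max(contours, key=cv2.contourArea)
    (x, y), _ = cv2.minEnclosingCircle(hand)
    return int(x), int(y)   # touch position in depth-image coordinates
```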

Since the accuracy of Cozmo's built-in odometry is not sufficient for the devised application scenario, mostly because it suffers from drift-related issues, a depth image processing similar to the one used for the hand gestures is performed to endow the system with the capability of tracking the robot position in an outside-in fashion. On average, the tracking error of this algorithm is \(\overline{Err}=0.81\pm 0.62\) cm. A calibration phase (performed before the experience starts) is required to synchronize Cozmo's internal coordinate system with the coordinate system used by the external tracking and by the projection, computing the required transformation matrices. Voice feedback is provided (when requested) using the TTS capabilities of the SDK, in English. The lecture logic and graphics were implemented using the well-known Unity game engine (v2018.3) and deployed as a Windows application running on the PC. The gesture detection module and the robot control logic were instead implemented in a separate Python application, accessing the functionalities provided by Cozmo's SDK. The WOZ interface was developed as a web page served by Flask and written in HTML5 and JavaScript. Inter-process communication (IPC) among the modules was implemented through ZeroMQ sockets. The Android phone is required for the SDK to work, since it hosts its runtime. The smartphone, which has to be connected via USB cable to the PC running the applications, communicates with Cozmo through a WiFi network hosted by the robot itself.
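By way of example, the following sketch shows how the required transformation could be estimated by least squares from a few corresponding points (the point values are hypothetical; the actual calibration procedure may differ).

```python
# Sketch of the calibration step: estimating the affine transform that maps
# points from the external (depth-camera) tracking frame to Cozmo's internal
# frame, given corresponding points collected before the experience.
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares 2D affine transform, returned as a 3x3 matrix."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Build the [x, y, 1] design matrix and solve for the 2x3 parameters.
    A = np.hstack([src, np.ones((len(src), 1))])
    params, *_ = np.linalg.lstsq(A, dst, rcond=None)
    T = np.eye(3)
    T[:2, :] = params.T
    return T

# Hypothetical correspondences: robot driven to known poses, seen by tracker.
tracker_pts = [(102, 88), (410, 90), (405, 300), (110, 295)]   # pixels
cozmo_pts = [(0, 0), (300, 0), (300, 200), (0, 200)]           # mm

T = fit_affine(tracker_pts, cozmo_pts)
p = T @ np.array([250.0, 180.0, 1.0])   # map a tracked point to Cozmo's frame
print(p[:2])
```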

2.2 Experience Design and Implemented Variants

As said, the aim of this work is to compare the learning effectiveness of the TL and LBT didactic models in an MRRTS. To this end, a new training experience named MireLab was designed and implemented in two variants.

Topic. The training topic chosen for MireLab is the Thévenin Theorem, from the electronic engineering domain. Since the selected target audience was undergraduate students in electronic engineering, it was necessary to select a topic not too basic, in order to keep participants engaged, but also not excessively complicated, so that learners with the right level of previous domain knowledge would not be overwhelmed. In particular, at least Ohm's law and Kirchhoff's circuit laws are taken for granted as background knowledge. MireLab was designed taking inspiration from a possible lecture in an electronics laboratory; in that case, additional lecture material would have been slides, paper sheets for notes/calculations, and of course a test bench with components to assemble a circuit and test the acquired knowledge.

Common Foundation. In order to minimize the differences, both variants share a common foundation in terms of projected environment (interface) and robot features. MireLab was designed taking into account state-of-the-art guidelines for MR-based robotic experiences [13]. The main interface (Fig. 4a) is made of three areas. The first (top-left), occupying most of the screen, is a whiteboard space where circuits are created and other information can be introduced (equations, pictures, etc.). On the right there is a components area, where both the robot and the user can select the desired components for the circuit. The selection is performed using a coherent gesture: a finger tap for the user and a tap animation (using the lifter) for the robot. When a component is selected, it appears in a buffer space (bottom-right), where its value can be set using a dropdown list showing coherent values. A few additional options are available in the bottom-left button panel, such as the possibility to erase the whiteboard or orient the component in the buffer. The component can then be placed in the whiteboard space with a drag-and-drop gesture (as before, coherently for both the user and the robot). Finally, when all desired elements are on the whiteboard, they can be connected (wired) by clicking the cable button and then selecting the appropriate terminals of the components. There is also a pop-up input tool that can be used as a calculator or, in the LBT variant, as an input tool. As already mentioned, the robot can move all over the projected environment, and its interactions are meant to emulate the counterparts performed by the user. Also, the robot is constantly fed with micro-choreography inputs, thus fostering the sense of a living being. The robot can communicate with the user through the TTS features or by showing elements on the shared projection.

Traditional Learning. In this variant, the user assumes the role of the tutee while the robot acts as the teacher. Well-established practices are implemented in this case. The robot is controlled by software based on finite-state machine (FSM) logic. The delegation pattern I-do, We-check, You-practice is adopted by the robot as its teaching style, managing the lecture pace through milestone advancement and feedback to the tutee. Hence, Cozmo explains the concepts, shows and solves examples while speaking, in order to carefully make them clear for the tutee. Moreover, it also asks the tutee for collaboration at some points, such as choosing the values of the components or removing certain elements suggested by the robot. These little interactions are introduced to keep the tutee's attention high during the explanation, resulting in a more engaging experience and active learning.
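A minimal sketch of such an FSM-based pacing logic is given below; state names and transitions are hypothetical, as the actual controller also drives the robot's motion, animations, and the projected interface.

```python
# Sketch of the FSM-style lecture pacing used in the TL variant.
from enum import Enum, auto

class LectureState(Enum):
    I_DO = auto()          # robot explains and solves an example
    WE_CHECK = auto()      # robot asks the tutee for small collaborations
    YOU_PRACTICE = auto()  # tutee practices, robot gives feedback
    DONE = auto()

class LectureFSM:
    def __init__(self, milestones):
        self.milestones = milestones     # ordered lecture milestones
        self.state = LectureState.I_DO

    def on_milestone_reached(self):
        """Advance the delegation pattern once the current step is done."""
        transitions = {
            LectureState.I_DO: LectureState.WE_CHECK,
            LectureState.WE_CHECK: LectureState.YOU_PRACTICE,
            LectureState.YOU_PRACTICE: LectureState.DONE,
        }
        self.state = transitions.get(self.state, LectureState.DONE)
        return self.state

fsm = LectureFSM(milestones=["equivalent resistance", "open-circuit voltage"])
print(fsm.on_milestone_reached())   # LectureState.WE_CHECK
```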

Learning by Teaching. As said, in LBT the learner (user) acts as a teacher lecturing the RTA. To design this variant, we kept in mind that, according to the literature, there are four key steps that the learner must undergo, which have proven effective in maximizing learning gains [7]:

  1. Preparing to teach (expectation to teach)
  2. Explaining to others/RTA (teaching)
  3. Interacting with others/RTA (Q&A, feedback to RTA)
  4. Observing the RTA spending the acquired knowledge (recursive feedback) [12]

In MireLab, for the first step, a one-sheet-long paper is provided to the learner [11, 19], who has to study it on their own and prepare the lecture. The cheat sheet, available for download, contains a synthetic explanation of the topic that matches the contents provided by the robot in the TL variant. Its structure has been designed to suggest to the learner a specific order to be used later when teaching; however, some points leave a certain degree of freedom, so that learners can lead the lecture in their own way. All the equations, circuits, and images contained therein are referenced by a numerical code. This code can be used by the learner to rapidly add these snippet elements to the whiteboard during the lecture, using the input tool feature.
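As an illustration, the snippet lookup could be as simple as the following sketch (codes and contents are hypothetical examples, not those of the actual cheat sheet).

```python
# Sketch of the numeric-code lookup behind the snippet input tool.
SNIPPETS = {
    "1": ("equation", r"V_{th} = V_{oc}"),           # open-circuit voltage
    "2": ("equation", r"R_{th} = V_{oc} / I_{sc}"),  # equivalent resistance
    "3": ("image", "thevenin_equivalent.png"),       # predefined figure
}

def resolve_snippet(code):
    """Map the code typed in the input tool to a whiteboard element."""
    return SNIPPETS.get(code)  # None if the code is not on the cheat sheet

whiteboard = []  # stand-in for the projected whiteboard contents
element = resolve_snippet("2")
if element is not None:
    whiteboard.append(element)
print(whiteboard)
```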

Afterwards, the learner uses the MireLab interface to give the lecture and, while doing so, interacts with the RTA (steps 2 and 3). The learner uses the whiteboard to clearly explain the topic and interacts with the robot through voice and the MR environment. On the other hand, the robot follows the lesson, asking questions and performing the tasks that the learner commands, in order to increase its degree of inclusion in the experience and not go unnoticed [13].

Finally, a prerecorded video of the robot solving an exercise on the lecture topic, while interacting on its own with the MireLab interface, is shown to the learner (step 4). It was decided to use the same prerecorded video for all the participants of the study in order to minimize bias.

In this LBT variant, the robot no longer acts autonomously; instead, its behavior is controlled using a WOZ approach. This choice was made because of the complex interactions the RTA is asked to perform, considering that little to no AI is readily available for this specific purpose, and building such an AI is out of the scope of the presented study.

Wizard of Oz: As can be seen in Fig. 4b, the WOZ interface provides the wizard with the ability to perform exactly the same actions the robot was capable of when relying on the AI (in the TL variant) and, therefore, to act in a comparable manner. Moreover, the control of the robot is not entirely manual: some assisted features are provided to the wizard both to facilitate the task and to minimize interaction discrepancies w.r.t. the TL robot behavior. By remotely observing the MR environment through the Kinect RGB camera feed, the wizard can teleoperate the robot with keyboard and mouse input, either controlling it directly or by clicking on a point of the camera feed (in which case the robot automatically reaches that point by the shortest path). Particular effort was devoted to standardizing possible frequent questions and answers that the RTA could be required to speak to the learner. The list is included in the interface; once an item is selected, its text can be edited, or multiple items can be combined, before sending the final phrase to the robot's TTS engine. Furthermore, several predefined animations encoding different emotions and reactions can be triggered by the wizard. Finally, buttons have been included to trigger specific events in the application. This capability is essential to simulate the feeling that the robot is actually interacting with the MR environment. For example, if the robot is asked to remove a component from a circuit, it has to touch it by performing the required gesture (double tap); the wizard then triggers the corresponding event to let the system act accordingly (remove or short-circuit that component).
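As an example, the click-to-go assisted feature could be implemented along the lines of the following sketch, which maps a click on the camera feed into Cozmo's frame through the calibration transform and issues a go-to-pose command (the endpoint name and the transform values are assumptions for illustration).

```python
# Sketch of the click-to-go assisted teleoperation in the WOZ interface.
import numpy as np
from cozmo.util import Pose, degrees
from flask import Flask, request

app = Flask(__name__)
robot = None    # assigned when the Cozmo connection is established
T = np.eye(3)   # camera-pixels -> Cozmo-frame transform (from calibration)

@app.route("/goto", methods=["POST"])
def goto():
    # Pixel coordinates of the wizard's click on the RGB feed.
    u, v = float(request.form["u"]), float(request.form["v"])
    x, y, _ = T @ np.array([u, v, 1.0])
    # The SDK drives the robot to the requested pose on its own.
    robot.go_to_pose(Pose(x, y, 0, angle_z=degrees(0)),
                     relative_to_robot=False)
    return "ok"
```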

Fig. 4. a) The devised projected interface of MireLab and b) the WOZ interface as seen from the wizard's point of view

3 Experimental Results

This section presents and discusses the results of a preliminary user study that was carried out using the devised system to compare the training effectiveness of the TL and LBT approaches in an MRRTS scenario.

3.1 Experiment Design

The target population of the study was electronic engineering students meeting the requirement of having sufficient background knowledge but scant knowledge of the topic (Thévenin Theorem). Therefore, volunteers were given a multiple-choice screening test, including both theoretical questions and practical exercises about the two aforementioned knowledge areas (10 questions on background knowledge and 5 on the topic). Only the volunteers who scored coherently, i.e., greater than or equal to 6/10 on background knowledge and less than 6/10 on the topic, were accepted as participants of the study. The resulting sample included 6 participants (all males) aged between 22 and 25 (\(\mu = 23.83, \sigma = 1.07\)). Due to the (desirable) learning effects, a between-subjects design was adopted for the experiments, randomly assigning participants to two equal-sized groups (TL and LBT).

Prior to being exposed to the training, all participants were asked to fill in a before-training questionnaire (BTQ) designed to investigate: their previous knowledge and expertise with technologies related to those used in MireLab; their study habits; their behavior while learning in a class; and how familiar they are with teaching other people.

After that, participants received a tutorial, given by a confederate, about the interface and features of the system, with small differences between the two groups (mostly pertaining to the use of the snippet input tool for the LBT).

Afterwards, participants underwent the training. In the LBT group, participants were allowed to take notes on the cheat sheet while studying it and preparing the lecture. They were given the possibility to consult their notes while lecturing the robot; however, they were recommended to leave the sheet on the table (beside the projected area) and to take just a few quick looks at it: otherwise, they could have used the sheet (by holding it in one hand) as a communication barrier between them and the robot, which could have had a negative impact on HRI. In addition, they were allowed to check the notes a maximum of 5 times, thus preventing a superficial preparation of the lecture/study. Finally, the video showing the robot spending the taught knowledge was viewed in another room, away from the MRRTS and the robot.

For the TL group, instead, no particular expedients were adopted. The time required to complete each step was recorded for each participant of both groups.

After the training, a post-experience questionnaire (PEQ) was administered, containing: all the items of the System Usability Scale (SUS) tool [4]; the Godspeed questionnaire [1], to analyze the learner's perception of the robot, complemented by additional custom statements pertaining to the specifics of the experiment; and a few self-efficacy items to investigate the perceived learning gains. Objective learning gains were evaluated later by administering a post-training test (PTT), an extended version of the screening test (13 questions, including the 5 of the screening test). After that, a final questionnaire (FQ) was administered to investigate the perceived quality of the training and the satisfaction with it.

Since one of the key advantages attributed by the literature to LBT w.r.t. TL is the enhanced long-term retention of the acquired information, a retention test (RT) was included in the study. Participants were asked to answer the same quiz as the PTT after one week, during which they were not exposed to any information related to the topic. All the devised tests and questionnaires are available for download.

3.2 Results and Discussion

Collected data were analyzed using MS Excel with the Real Statistics add-on (v7.1). Comparative analyses were performed on the two groups using the two-tailed Mann-Whitney U test and, considering the limited sample size, the significance threshold was set at \(p \le 0.10\). Regarding the BTQ, no significant differences were found between the two groups for the analyzed aspects. More in depth, on average, participants used to play videogames occasionally and were very accustomed to touch screen interfaces. On the contrary, they had little to no familiarity with either service or toy robots. Also, 5 participants reported teaching other people at least once a month, while 1 (belonging to the LBT group) never or rarely.
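For reference, the same two-tailed Mann-Whitney U test can be reproduced in Python as follows (the scores below are hypothetical, not the study's data; the study itself used the Real Statistics Excel add-on).

```python
# Sketch of the comparative analysis with SciPy.
from scipy.stats import mannwhitneyu

ALPHA = 0.10  # significance threshold chosen for the limited sample size

tl_scores = [6.9, 7.7, 6.2]   # hypothetical per-participant scores (TL)
lbt_scores = [8.5, 7.7, 9.2]  # hypothetical per-participant scores (LBT)

stat, p = mannwhitneyu(tl_scores, lbt_scores, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}, significant: {p <= ALPHA}")
```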

Regarding the PEQ, no significant differences were spotted for the five dimensions of the Godspeed questionnaire (anthropomorphism \(p=0.70\), animacy \(p=1.00\), likeability \(p=1.00\), perceived intelligence \(p=0.70\), perceived safety \(p=0.40\)), suggesting that the robot behavior was perceived similarly in both groups. This fact also seems to support the claim that the implementation adopted for the WOZ in the LBT did not bias the comparison. Regarding overall usability, according to the SUS results both variants were rated as barely acceptable, TL (\(M=68.3,\,SD=14.6\)) and LBT (\(M=61.7,\,SD=7.2\)); however, no significant difference was reported (\(p=0.70\)). According to the open feedback collected, these relatively low scores were mainly due to the sluggish feel of the touch surface compared to what participants were accustomed to (tablet and smartphone devices).

Regarding self-efficacy, it was significantly higher in the LBT group (\(M=4.0,\,SD=0.00\)) than in the TL group (\(M=3.11,\,SD=0.38\)), as was the participants' confidence in being able to “successfully pass a test on the Thévenin Theorem without further training”: LBT (\(M=4.0,\,SD=0.00\)) vs. TL (\(M=2.33,\,SD=0.58\)). Conversely, no significant differences were reported for the FQ items, suggesting comparable satisfaction levels and perceived quality of the training.

Fig. 5. Objective results of the study. All scores are normalized to 10 and significant comparisons (p-values \(\le 0.1\)) are marked with *

Objective Learning Gains: Fig. 5 illustrates the objective results about learning gains (scores normalized to 10). All participants were able to successfully pass the test after being trained by the MRRTS, independently of the group. In particular, by comparing the common items of the screening test and the PTT (Fig. 5a), significant and marked learning gains (pre- vs. post-training scores) were found for both groups, meaning that both variants were effective. Also, even if the score on the full PTT is higher for LBT, the difference w.r.t. TL was not significant (Fig. 5b). This fact seems to suggest that the intrinsically interactive nature of the MRRTS, together with a good implementation of best practices, is able to minimize the differences between the two approaches w.r.t. what happens with other mediums. However, this result is probably influenced by the limited sample size. Nevertheless, a significant difference is observable in the retention test (PTT scores immediately after training and one week after the exposure). In that case (Fig. 5b), the loss of information was lower in the LBT group; moreover, all participants from the LBT group were able to successfully pass the test after the retention period with a minimum score of 7.7, whereas for TL this was the maximum score obtained, and one of the participants did not reach a sufficient mark (5.4). This confirms that LBT is a superior approach in terms of granting long-term retention. Since these results agree with previous studies on LBT, they also suggest that the MRRTS was able to stimulate the required mental processes, and that the addition of MR was not excessively detrimental in that respect. The Scheirer-Ray-Hare test applied to the retention test results highlighted that this difference is attributable mainly to the training approach. In fact, no significant interaction effect was reported between the approach and the exposure time (\(p = 0.80\)), and it is improbable that the difference is affected only by the exposure time (\(p = 0.46\)); instead, a striking significance was found for the kind of training (\(p = 0.003\)). Lastly, considering the efficiency of the training, our results are similar to those obtained in previous works (Fig. 5c), with LBT being significantly more time-consuming (almost 4 times) than TL. This is certainly due to the time invested in studying the cheat sheet, but it is also largely ascribable to the greater time spent interacting with the robot (teaching).

4 Conclusions and Future Work

This paper presented a study aimed at evaluating the effectiveness of the LBT pedagogical model when applied to an MRRTS, comparing it with a consolidated model (TL). The selected training topic was the Thévenin Theorem, from the electronics domain, and the study population consisted of electronic engineering students.

The obtained results showed that both approaches were able to provide sufficient knowledge transfer to the learners. In spite of the limited sample size of the presented preliminary study, it was observed that, at the cost of more time spent in the process, students who underwent the LBT training were able to retain the acquired information better than those trained with TL. This poses LBT as a promising model also in MRRTS scenarios, worthy of the community's attention. That considered, future work should focus on validating the preliminary findings with a larger sample size, also encompassing different target populations (K-12, high school, etc.), on developing tools and AI to autonomously control the RTA with believable and empathic behavior, and on investing in the direction of natural HRI, which is key to improving the efficiency and effectiveness of this particular kind of MRRTS.