1 Introduction

The use of Virtual Reality (VR) as a tool for students with Autism Spectrum Disorder (ASD) has been developing over recent years. Several studies (Bellani et al. 2011; Maskey et al. 2014; Hedges et al. 2018; Ip et al. 2016, 2018; Anderson et al. 2019) support the use of this technology for students with ASD, since VR makes it possible to create controlled, safe three-dimensional representations of real environments (Charitos et al. 2000; Blascovich et al. 2002; Baileson et al. 2008). Moreover, VR allows a social situation to be repeated to help students understand how it works (Saiano et al. 2015; Lorenzo et al. 2016). According to Lorenzo et al. (2019a, b), VR allows the designer to obtain information about how ASD students respond to, interpret, and interact with the real world. Parsons (2016) stresses how, through VR, it is possible to control and manipulate different features of social situations, helping students with ASD to adapt to them better and to resolve them.

There are important differences between the two main types of VR, namely Immersive Virtual Reality (IVR) and Desktop Virtual Reality (DVR). Although they share similar characteristics, each presents advantages and disadvantages that may affect how well it helps children with ASD. In immersive virtual reality systems (IVRS), the user is immersed in a computer-generated world (Wallace et al. 2010; Lorenzo et al. 2016), so that the point of view updates according to the position of the user. There is also greater realism and interaction. In contrast, in desktop virtual reality environments (DVRE), interaction is more limited, as visual experiences occur through a computer screen. The greater level of immersion, realism, and interaction offered by IVRS, and not present in DVRE, allows for better attunement to the sensory needs of ASD students in real time, as Baileson et al. (2008) and Kandalaft et al. (2013) state.

Both Parsons and Cobb (2011) and Blascovich et al. (2002) consider that the design of immersive virtual reality environments applied to ASD students should fulfill two main goals: representational and behavioral realism, on the one hand, and social presence, on the other. Blascovich et al. (2002) refer to the first concept as the degree to which virtual humanoids and objects behave as they would in the real world (p. 111). It is also key that both verbal and non-verbal communication are credible. Social presence, on the other hand, is defined as the degree to which a participant believes he or she is in the presence of, and interacting with, another real human being (pp. 111-112). The scientific literature suggests that social presence is a crucial factor for the efficacy of an IVR-based intervention for ASD students (Blascovich et al. 2002; Wallace et al. 2010; Parsons and Cobb 2011; Wallace et al. 2016).

To date, researchers have used different immersive virtual reality educational environments to work on the social competences (Strickland 1996, 1997; Saiano et al. 2015; Greffou et al. 2012; Maskey et al. 2014; Cheng and Huang 2012; Lorenzo et al. 2013; Wang and Reid 2013) and emotional skills (Kim et al. 2014; Ip et al. 2016; Lorenzo et al. 2016) of students with ASD. However, the types of virtual interaction proposed so far show limitations that affect the intensity of the experience and the sensation of presence.

The main objective of this intervention is to address communication problems in the ASD population, in particular verbal and social communication, including ToM, empathy, and emotional regulation. The present work advances the recreation of a virtual learning environment (VLE) by introducing levels of realism higher than those achieved in previous research, not only in the spatial simulation but also in the avatar-human communication system, which is more flexible and natural. These advances raise the following research questions:

  • RQ1. Can IVR, within this improved format, be a tool that produces improvement in the learning process of social and emotional skills of children with ASD?

  • RQ2. If the answer to the previous question is affirmative, in which learning areas is an intervention based upon this format of IVR more effective?

The paper is organized into seven sections. Section 2 reviews previous related work. Section 3 describes the features of the IVR tool developed. Section 4 is devoted to the method and protocols followed throughout the study. Section 5 presents the main results, Section 6 discusses our findings, and Section 7 concludes.

2 Related works

2.1 Social and emotional competences in ASD students

We start by summarizing the social and emotional competences that students with ASD are likely to present. It is necessary to remember that, although the present research is limited to students with ASD at a low level of severity, the breadth of the spectrum implies a wide variety of cases, which will always require additional study and the adaptation of educational strategies to their particular needs and capabilities. Social and emotional competences are intimately related (Sancassiani et al. 2015), since the former generally draw on the latter.

If we understand social competence as the capability of an individual for successful social development, it covers many possible situations involving social interactions of widely varying levels of complexity. Gresham (1986) classified different definitions of this concept, distinguishing between those focused on peer acceptance, those based on behaviors that facilitate social acceptance, and those of social validity, or the principles of predicting the social response to given behaviors. However, acceptance or appropriate social behavior always implies the need to interpret the emotional states of others and to offer emotional responses relevant to the context (Merrel and Gimpel 1998).

According to the Theory of Mind (ToM) hypothesis in ASD, individuals with autism are unable to attribute mental states to themselves and others in different situations, which is the source of their difficulty in interpreting the mental states of other people within social contexts (Premack and Woodruff 1978). The theory of weak central coherence (WCC), on the other hand, proposes a distinctive cognitive style characteristic of this population, according to which individuals with ASD show a capacity for detail even superior to that of the neurotypical population, but have trouble integrating the sum of details into a global gestalt (Frith 1989).

Additionally, the theory of executive dysfunction (ED) holds that the population with ASD shows difficulties in fluency and the ability to create new ideas and responses; in inhibition (Verté et al. 2006); in planning (Geurts et al. 2004), that is, the complex and dynamic process by which a sequence of planned actions is carried out, monitored, re-evaluated, and updated; and in flexibility of thought and action (Semrud-Clikeman et al. 2014), which relates to obsessive fixation and stereotyped behavior.

A different approach is offered by the theory of systematization, according to which individuals with ASD learn by searching for predictable correlations that follow established rules (Baron-Cohen 2002). Consequently, individuals with ASD would show a preference, tendency, or high capacity for reasoning based on mechanisms of cognitive systematization, showing a special talent for systematized tasks (Baron-Cohen 2006). Systematization would explain why they can deal with highly regulated systems, while they encounter difficulties with highly variable ones, such as relating socially or understanding the minds of others. This theory would also explain resistance to change as one of the defining characteristics of ASD (Wheelwright and Baron-Cohen 2011).

Throughout this intervention we address, on the one hand, social competences linked to communication problems, in particular verbal and social communication (Watkins et al. 2017), and, on the other hand, emotional competences including ToM (Fletcher-Watson et al. 2014), empathy (Montgomery et al. 2016), and emotional regulation (Samson et al. 2015).

2.2 ASD and virtual reality

There are a number of examples of the design and application of educational Virtual Reality Environments (VRE) for children with ASD (Brown et al. 1998; Parsons et al. 2000, 2004; Mitchell et al. 2007; Self et al. 2007; Kandalaft et al. 2013). Nevertheless, the most recent reviews reveal that, even though some studies suggest promising results, we are still far from having evidence of the general efficacy of VRE in this context (Parsons and Cobb 2011; Boucenna et al. 2014; Grynszpan et al. 2013).

We briefly review here some of the interventions, by distinguishing between those aimed at improving social and emotional competences, as well as the type of VR used (immersive vs non-immersive), and the sort of interaction (either individual or collaborative), depending upon the type of avatars the user interacts with, that is, avatar-human, whenever there is a real human being controlling the avatars, or avatar-agent, when the software controls the avatars' behavior.

2.2.1 VR and social competences of ASD students

According to the type of social competences intended to be improved, we have:

Appropriate social behaviour and problem resolution: These studies cover contexts such as riding a bus or behaving properly in a cafeteria or school environment, for example by taking turns and respecting personal space and social conventions (Rutten et al. 2003; Parsons et al. 2004, 2005, 2006; Mitchell et al. 2007; Matsentidou and Poullis 2014; Cheng et al. 2015; Lorenzo et al. 2013). Overall, preliminary studies showed potential for practising and learning social interaction, although participants with ASD required support from educators. Other studies observed a strong link between the levels of social attention of ASD students and their learning (Jarrold et al. 2013), improvements in social behaviour under social barriers when incorporating more realistic virtual environments (Trepagnier et al. 2005), and improvements in conversational abilities in a job interview (Smith et al. 2014). All these studies are based on individual interaction. Among them, the use of immersive VR is limited to the studies of Lorenzo et al. (2013) and Matsentidou and Poullis (2014), which used a cave automatic virtual environment (CAVE) (Cruz-Neira et al. 1992), in which three to six walls of a room-sized cube are used to project images representing a particular pre-designed VR learning environment, and which reported significant improvements in both social tasks and executive functions. The studies by Beach and Wendt (2014) and Cheng et al. (2015) used head-mounted displays (HMDs), reporting successful adaptation to the technology and satisfactory interaction with the avatars and the environment experienced.

Social communication: Kandalaft et al. (2013) and Ke and Im (2013) reported improvements in taking turns, starting or taking the initiative in an interaction, and greeting or ending a conversation, while Stendal and Balandin (2015) suggested an increased ability to avoid communication barriers and to stimulate self-esteem. Other studies suggest improvements in social and communication flexibility, identity growth, and respect for norms (Didehbani et al. 2016; Ke and Lee 2016), as well as better visual contact, manners, and listening capability (Cheng and Ye 2010). All the studies mentioned here used non-immersive VR, but with an avatar-human type of interaction.

Similarly, two areas of emotional competences may be distinguished:

  • Emotion recognition: These studies deal with the recognition of emotions in others (ToM), as well as empathy, emotional regulation, and suitable emotional reciprocity in contexts of socialization (Moore et al. 2005; Cheng et al. 2010). Some of them use collaborative environments (Kandalaft et al. 2013; Wallace et al. 2016), suggesting a positive influence of this type of strategy in this field. Others use immersive VRE (Kim et al. 2014; Ip et al. 2016; Lorenzo et al. 2016), reporting acceptance of the technology by the participants and improvements in ToM.

  • Emotional influence of non-verbal language: According to Schwartz et al. (2010), a significant proportion of people with ASD do not show interest in social engagement. They are also less influenced by aspects such as gaze direction or facial cues in the social experience. Kuriakose and Lahiri (2015) suggest a larger physiological alteration associated with anxiety when participants are confronted with avatars’ emotions or when the situations are difficult to interpret. A series of papers deal with the relationship between learning social communication and visual contact for ASD students (Mineo et al. 2009; Alcorn et al. 2011; Grynszpan et al. 2009, 2012; Lahiri et al. 2011; Bekele et al. 2013; Georgescu et al. 2013), showing mixed results. Whilst some observed improvements in visual cues, visual contact, and attention during conversation (Mineo et al. 2009; Lahiri et al. 2011), as well as positive reactions to the avatar's body language (Alcorn et al. 2011), Grynszpan et al. (2009, 2012) noticed that those improvements were only maintained when external manipulations were introduced. Moreover, Georgescu et al. (2013) found that ASD students did not change their opinion of an avatar's personality depending on the time of interaction, whereas neurotypical students did. All these studies proposed an avatar-agent type of individual interaction.

3 Immersive virtual reality system

We next describe our immersive VR system, distinguishing between the hardware and software, as well as the virtual environment developed to deal with social and emotional competences for ASD students. The immersive VR system we developed for this study adopts a realistic format delivered through an HMD, with collaborative avatar-human interaction. It allows for a high level of immersion and for flexible and complex communication, favoring the feeling of presence as well as the intensity of the virtual experience, in order to facilitate adaptation, the acquisition of the educational objectives, and their transfer to the real world.

To create a familiar environment that would facilitate participants’ adaptation, we chose a school and its play garden. The virtual reality application is designed so that alternative virtual environments can be recreated, allowing different situations adapted to the needs and abilities of students with ASD, and future upgrades are intended to implement several of them. The choice of this particular type of environment responds, on the one hand, to the fact that it is a familiar environment for students, where socialization situations generally occur with peers of similar ages. On the other hand, one of the objectives of the study is to work on emotional regulation, empathy, and ToM in a bullying-related situation, which is more likely to occur in this type of context. Such a virtual context required the creation of a series of avatars with a variety of personalities and appearances, in order to set up a situation for working on the concept of respecting those who are different.

3.1 Equipment

The hardware used in the design was an Oculus Rift © HMD and a sensor able to track constellations of infrared LEDs in order to transfer the user’s movement to the VR (Oculus VR, 2014), all connected to a laptop computer. The software used in the development of the immersive VR application includes Unity 3D © in its 2017.3 version (Wang et al. 2015) as the game engine, allowing for the interaction of the virtual elements with the user; iClone © 7 for the modelling and animation of the humanoid avatars in the virtual learning environments (Ryu and Jang 2016); and the free software Blender in its 2.79 version for the general modelling of the architectural environment and furniture (Morelli et al. 2015). The sessions were video recorded using a smartphone running Android OS (Gandhewar and Sheikh 2010) attached to a tripod.

3.2 Learning VR environments

In the design of the VREs we recreate a generic virtual school where the participants can interact socially with several avatars, in particular a female teacher and six children, three of them male and three female, of ages similar to those of the children participating in the study, and with different physical features, racial characteristics, and personalities. We set two different situations. The first one takes place inside the classroom, where the teacher introduces the user to the setting and invites him/her to meet the virtual children. The user’s position allows him/her to observe the different characters occupying different locations in the classroom. Only when the user fixes attention on one of the avatars does that avatar react and approach the user to interact with him/her. The second situation takes place in the garden of the same virtual complex, with identical characters, but here there are conflict situations in which one of the avatars attacks another. The teacher gathers everyone around the user and starts a debate with him/her around concepts like respect, inclusion, and equity (Fig. 1).

Fig. 1 Aspect of setting 1 (in the classroom), and of setting 2 (in the garden)

To avoid conflict between the dimensions and obstacles of the real world and those of the virtual one, we opted to limit mobility to a fixed point, placing the user in the scene in a sitting position with instructions to avoid the temptation to move physically through the virtual world. Immersive VR offers alternative techniques such as teleportation or virtual displacement using commands and buttons, in the manner of classical desktop VR, but we tried to reduce the learning curve and the complexity of the scenes in order to focus interest on the social interaction with the virtual characters. Additionally, the spatial design is neutral, with diffuse illumination and without visual or loud auditory stimuli that could alter the level of attention and emotional state of the participant (Parsons and Carlew 2016).

In the same way, and to achieve a better level of realism in the interaction with the virtual characters, we opted for an interaction of the avatar-human type (Kandalaft et al. 2013; Ke and Im 2013; Ke and Lee 2016; Schmidt et al. 2008; Stichter et al. 2014; Cheng and Ye 2010; Strickland et al. 2013). Thus, the researcher controls the avatars’ answers throughout the virtual interaction, choosing from a limited set of pre-defined answers in a menu displayed when the corresponding avatar is activated. These pre-defined answers are classified into five communication categories: introduction, agreement, neutral response, disagreement, and farewell. We developed a communication system adapted to the context recreated in each setting, so that a coherent conversation can be constructed with an optimized number of pre-defined answers. For each of the answers, we implemented an animation associated with the three-dimensional model of the activated avatar, including gestures, facial expression, and lip synchronization with the corresponding audio, recorded by a real human being.

The two settings involve identical characters: the female teacher and six children, three of them male and three female. There is also a secondary character, a male teacher, with whom the female teacher talks when the context calls for her absence at some specific point of the experience (Fig. 2).

Fig. 2 Avatars: the teachers and the classroom mates

Each of the characters represents a differentiated personality and appearance, both in physical build and in racial features. Luna, a black girl aged 9, has an open and friendly personality, but with a certain shyness possibly due to a poor background or a mild inferiority complex; Celia is a Caucasian girl of 12, good-looking, extraverted, and with a slightly arrogant attitude; Carmen is a Caucasian girl of 10, plump and short, friendly and direct, self-confident and without complexes; Miguel is an 8-year-old Caucasian boy, a bit shy, self-conscious but honest; Po is a plump oriental boy of 11, affable and impressionable; Christian is a Caucasian boy of 12, small, thin, hot-headed, and with a strong and contradictory nature, sometimes prone to conflict but with a desire to be noble.

In short, we tried to design a VRE in a school context, with a realistic representation of both the spatial architectural environment and the avatar-human communication system, and with a careful design that avoids sensory stimuli in order to favour communication and the educational goals: practice and improvement of verbal and non-verbal social communication, including ToM, empathy, and emotional regulation.

3.3 Algorithms

Here we summarize the programming code used to construct the interactive communication system in the VR application built for this research. Version 2017.3 of Unity 3D © (Wang et al. 2015), used as the game engine, allows for different programming languages, but we used C# due to its compatibility with the program (Murray 2014). Thus, we briefly describe the basic structure of the code written to activate the communication system of the avatars in the setting through the user’s gaze, the code with the options displayed by the communication system and the actions of the aforementioned avatars in their interaction with the user, and the code controlling the time of joint attention and the number of communications between avatars and user during the session.

In order to emulate the joint attention of the user’s gaze, we attach the script component ‘VR Eye Raycaster’ to the virtual camera used by Unity. This component casts a virtual ray that can intersect other elements in the virtual space. We also attached a ‘Sphere Collider’ component to the head of each avatar. This component is invisible to the player but can act as an obstacle to the VR Eye Raycaster. The so-called ‘MiguelTrigger’ script is attached to each avatar, and it defines how long the ray coming from the user’s eyes must intersect the sphere on the avatar’s head in order to activate its communication system.

Here, we establish a minimum time of 2 seconds; that is, when the user looks at the avatar’s head for more than two seconds, the communication system of that avatar changes to green, an activation cue that signals to the researcher that it is appropriate to bring the avatar closer and start communicating with the participant (Fig. 3).

Fig. 3 Fragment of code corresponding to the MiguelTrigger script, and collision spheres located on the avatars’ heads
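The gaze-activation logic described above can be illustrated with a minimal, engine-agnostic sketch. It is not the authors' MiguelTrigger script, which was written in C# for Unity; the names `ray_hits_sphere` and `GazeTrigger` are hypothetical, and two behaviours are assumptions not stated in the text: looking away resets the dwell timer, and an avatar stays activated once triggered.

```python
import math
from dataclasses import dataclass


def ray_hits_sphere(origin, direction, center, radius):
    """Return True if a ray from `origin` along `direction` intersects the sphere."""
    # Vector from the ray origin to the sphere centre.
    to_center = [c - o for c, o in zip(center, origin)]
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    # Distance along the ray to the point closest to the sphere centre.
    t = sum(a * b for a, b in zip(to_center, unit))
    dist_sq = sum(a * a for a in to_center)
    if t < 0:
        # The centre lies behind the viewer: a hit only if the origin is inside.
        return dist_sq <= radius * radius
    # Squared distance from the sphere centre to the ray (Pythagoras).
    return dist_sq - t * t <= radius * radius


@dataclass
class GazeTrigger:
    """Hypothetical stand-in for a per-avatar gaze trigger."""
    center: tuple                # position of the head sphere
    radius: float                # radius of the head sphere
    dwell_seconds: float = 2.0   # threshold stated in the text
    gaze_time: float = 0.0       # continuous time on target so far
    activated: bool = False

    def update(self, origin, direction, dt):
        """Advance by one frame of `dt` seconds and return the activation state."""
        if ray_hits_sphere(origin, direction, self.center, self.radius):
            self.gaze_time += dt
            if self.gaze_time > self.dwell_seconds:
                self.activated = True   # assumption: activation persists
        else:
            self.gaze_time = 0.0        # assumption: looking away resets the timer
        return self.activated
```

In an engine, `update` would be called once per frame with the camera position, its forward vector, and the frame time; here those values are passed in explicitly so that the sketch runs on its own.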

Furthermore, we created a panel integrating a menu of tags with the communication system at the disposal of the researcher. This menu shows, in the upper part of the screen, the names of the participating avatars. These tags change to green whenever the visual contact of the user activates the corresponding avatar. The communication menu of the corresponding avatar, displayed by pressing any of those tags, is organized in five groups of actions (introduction, agreement, disagreement, neutral, and farewell). These groups of actions appear as alternatives at the bottom of the screen. Pressing any of them displays the answers or actions associated with the avatar (Fig. 4).

Fig. 4 Fragment of code to activate the audio associated to an animation, and set of animations assigned to one of the avatars
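One way to picture the researcher's menu is as a mapping from each of the five categories to a list of pre-defined responses, each pairing an animation with its audio. The following is a minimal sketch under stated assumptions, not the authors' Unity code: the class names, the response label, and the clip and file names (`miguel_wave`, `miguel_hello.wav`) are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The five groups of actions named in the text."""
    INTRODUCTION = "introduction"
    AGREEMENT = "agreement"
    DISAGREEMENT = "disagreement"
    NEUTRAL = "neutral"
    FAREWELL = "farewell"


@dataclass(frozen=True)
class Response:
    label: str       # text shown to the researcher in the menu
    animation: str   # hypothetical animation clip name
    audio: str       # hypothetical audio file name


class AvatarMenu:
    """Hypothetical per-avatar menu of pre-defined responses."""

    def __init__(self, name):
        self.name = name
        self.active = False  # True once the user's gaze has activated the avatar
        self.responses = {category: [] for category in Category}

    def add(self, category, response):
        self.responses[category].append(response)

    def options(self, category):
        """Labels offered to the researcher for one group of actions."""
        return [response.label for response in self.responses[category]]

    def select(self, category, index):
        """Return the (animation, audio) pair to play for the chosen response."""
        chosen = self.responses[category][index]
        return chosen.animation, chosen.audio


miguel = AvatarMenu("Miguel")
miguel.add(Category.INTRODUCTION,
           Response("Say hello", "miguel_wave", "miguel_hello.wav"))
```

Keeping the animation and audio names together in one record reflects the description that each answer is an animation coordinated with a specific audio clip.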

Each of those answers or actions means that the avatar performs an animation coordinated with a specific audio clip. The avatars’ animations were created with iClone © 7, including gestures and lip synchronization with the audio. These animations and audio clips, once exported to Unity 3D ©, were assigned to the corresponding avatar via script and audio components. Each component is linked via code to a tag that the researcher can activate during the interaction.

Finally, we developed a button that collects the numerical data of interest related to the user’s performance during the session. Once the researcher presses this button, a text file is generated and a unique identifier is assigned to the session. The file incorporates data on the number of avatars the user has activated, as well as the total time the user established visual contact with any of the avatars, namely, the time the Raycaster intersects the spheres located on any of the avatars. The system also registers the number of actions the researcher activated during the session, which relates to the number of effective communications produced.
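The session record described above could be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the class name `SessionLog`, the file naming scheme, the line format, and the use of a random UUID as the session identifier are all hypothetical, and the sketch assumes that the number of avatars activated means the number of distinct avatars.

```python
import uuid
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SessionLog:
    """Hypothetical accumulator for the three quantities named in the text."""
    activated_avatars: set = field(default_factory=set)
    gaze_seconds: float = 0.0
    researcher_actions: int = 0

    def record_activation(self, avatar_name):
        # Assumption: each avatar is counted once, however often it is activated.
        self.activated_avatars.add(avatar_name)

    def record_gaze(self, dt):
        # Called for every frame in which the gaze ray intersects a head sphere.
        self.gaze_seconds += dt

    def record_action(self):
        # Called each time the researcher triggers an avatar response.
        self.researcher_actions += 1

    def export(self, directory):
        """Write the session data to a text file and return its path."""
        session_id = uuid.uuid4().hex  # assumption: random unique identifier
        path = Path(directory) / f"session_{session_id}.txt"
        lines = [
            f"session_id: {session_id}",
            f"avatars_activated: {len(self.activated_avatars)}",
            f"visual_contact_seconds: {self.gaze_seconds:.2f}",
            f"researcher_actions: {self.researcher_actions}",
        ]
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        return path
```

Calling `export` plays the role of pressing the button: it freezes the counters accumulated so far into one file per session.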

In summary, we use the MiguelTrigger script, together with the VR Eye Raycaster component applied to the VR camera and the collision spheres located on the avatars’ heads, to detect and measure the visual contact of the user with the avatars. The animation and audio scripts, in turn, are linked via code to the tag menu at the researcher’s disposal on the screen. An extra button records numerical information about the time of visual contact, the commands issued by the researcher, and the number of avatars activated.

4 Method

In this research, we use a VR technology that allows us to simulate social situations in a personalized way, creating a safe and controlled environment in which the ASD student feels comfortable. In this way, we can address those aspects of his/her executive functions and social and emotional abilities that we consider of interest. We performed an educational intervention with a group of students with ASD, between 8 and 15 years old, over ten sessions, using a qualitative, quantitative, and experimental methodology. Following this methodology, we implemented the intervention and collected information that was later analyzed in order to evaluate both the acquisition of and progress in the social and emotional competences at stake. Our methodology follows a process divided into five phases:

  (A) Starting phase: We start by stating our research questions:

  • RQ1. Can IVR, within this improved format, be a tool that produces improvement in the learning process of social and emotional skills of children with ASD?

  • RQ2. If the answer to the previous question is affirmative, in which learning areas is an intervention based upon IVR more effective?

  (B) Design and planning phase: Design of the immersive virtual reality environments, the selection of participants, the instruments for data collection, and the intervention period.

To be effective for ASD students, immersive virtual learning environments should have certain characteristics: (1) Realism and feeling of presence. Immersion is not enough; the environment has to be realistic, both in the shape of the spatial environment and in the appearance and behavior of the avatars with which the student will interact. (2) Avatars should be flexible and customizable, and it should be possible to adapt characteristics of the environment (light, colors) that could be intimidating, excessive, or threatening. (3) It has to be possible to choose different social contexts, to select the characteristics of the main avatars, and to control the type of social interaction, its development, and possible outcomes.

As participants, we wanted to select children with level 1 and 2 ASD according to the DSM-V so that they were able to engage with the technology properly and to interact with the avatars.

As instruments for data collection, we designed a series of questionnaires covering the different areas established in the DSM-V, namely, communication and social interaction. The specific way of treating the data was also established. The intervention period was carefully chosen so that participants were not under the influence of any other intervention. We achieved this by holding the sessions during the school holidays.

A pre-intervention period was also designed to guarantee enough contact between the three parties involved in the evaluation via the questionnaires: parents, therapists, and researchers. Researchers attended the therapy sessions, and shared with parents and children different daily activities, observing children’s behavior during a month before the intervention. In that pre-intervention period, the questionnaire (and the meaning of the categories in each item) was widely discussed among parents, therapists, and researchers, before assigning children to the study or the control group.

The questionnaire and its categories were reviewed again after the intervention as well, to minimize biases in the responses. This also served the purpose of controlling for the Hawthorne effect (i.e., the fact that people’s behavior may change due to their awareness of being in an experiment; Holmes 2011). Additionally, before starting the sessions, participants had a test session in which they became familiar, first with the HMD, and then with the virtual environment.

  (C) Implementation phase: 14 children with ASD were recruited and randomly assigned either to the intervention group (7) or to the control group (7). Whether or not to have a control group is a decision not immune to criticism. The decision to have a control group was taken as a way of controlling for confounding variables and possible bias. As the control group was not subject to any intervention, it was expected to experience no changes in behavior between the periods before and after the intervention. Whereas in biology and medicine the use of a control group, generally receiving no treatment, is undisputed, transferring this design to the field of educational research is open to debate (Kember 2003; Deslauriers and Wieman 2011), although some authors consider it valid and advisable (Torgerson and Torgerson 2008).

Once the use of a control group is accepted, the next logical question is whether the control group should receive some conventional intervention (quasi-experimental layout) or not (experimental layout). According to Schwichow et al. (2016), many variables cast doubt on this question, although quasi-experimental designs have lower internal validity. Additionally, the literature on the use of VR as a learning tool for ASD students supports the use of a control group (CG) within an experimental layout (Wallace et al. 2010; Lorenzo et al. 2013, 2016, 2019a, b; Matsentidou and Poullis 2014; Horace et al. 2016).

  (D) Data collection: (1) All sessions were video recorded, and the researchers filled in a questionnaire about the student’s performance in the assigned tasks. (2) The researchers interviewed relatives and therapists, and filled in a questionnaire, before and after the intervention. These questionnaires were added to the experimental data collected by the researchers.

  (E) Analysis phase: Evaluation and analysis of the study group performance compared to the control group. Validation of the VR tool considering the observed improvement of the study group.

That is, the general aim of this research is the design and validation of an innovative learning intervention, using VR, that can be effective and flexible in helping high-functioning ASD students acquire social and emotional competences, taking advantage of the potential benefits this technology has for this population.

4.1 Participants and context

The study group consists of seven youngsters with ASD, aged between 8 and 15. One of them is female, and the rest are male. In general, all of them have good verbal competences and have been diagnosed with low or medium severity ASD, with deficits in non-verbal communication, socialization, and the ability to show empathy, as well as restricted interests. Additionally, the majority of them have an average or above-average IQ, even though some present borderline cognitive capability. We chose the participants according to the severity levels established in the DSM-V. Levels 1 and 2 were chosen because participants at these levels presented homogeneous and good communication skills, although this meant widening the age range. In Table 1, level 1 corresponds to low severity, while level 2 corresponds to medium severity. In order to preserve anonymity, we use the code “RV” followed by a number to name each participant in the intervention. Similarly, for the participants in the control group, we use the code “CG” followed by a number.

Table 1 Experimental group

RV1 is a 15-year-old male with low severity ASD and a mild cognitive delay who shows restricted interests, resistance to joint attention, and difficulties with empathy and emotional regulation that, on some occasions, provoke conflict with his peers. Additionally, he has problems retaining social information and following instructions to interact in a group, and he sometimes adopts an indifferent attitude and does not participate in group activities, most likely due to lack of attention because of his restricted interests. His language is limited, and he does not normally imitate other children in social interaction, showing boredom together with simple stereotypes.

RV2 is an eight-year-old male with good verbal capabilities, diagnosed with medium severity ASD. He has a good predisposition to socialization but has difficulties in processing and interpreting social information, and thus has problems with socio-emotional reciprocity in social interaction contexts, showing astonishment, or anger and rejection. He also shows inflexibility to changes in unfamiliar group situations, and some sensorial reactivity when confronted with multiple sensorial stimuli, avoiding sharing things with other children. He can also show boredom together with simple stereotypes in these contexts.

RV3 is an eleven-year-old male with good verbal capabilities and an average IQ, diagnosed with low severity ASD. He can retain social information, but he mixes ideas and concepts when developing his arguments, with occasional difficulties in social communication and interaction. He seldom uses gestures to communicate with other children and shows difficulties in participating in group situations, frequently showing boredom and simple stereotypes in these types of situations.

RV4 is a twelve-year-old male, diagnosed with low severity ASD, with excellent verbal capabilities and a high IQ. Nonetheless, he has difficulties empathizing, and in social interaction he is governed by his interests. He shows resistance to maintaining joint attention and to unfamiliar group situations, with occasional difficulties in emotional regulation. Moreover, if the group situation includes sensorial over-stimulation, he is likely to reject participation. His interests include anything related to technology and computing, and as such, he showed, from the beginning, a high degree of acceptance and assimilation of the dynamics of this particular intervention.

RV5 is a ten-year-old male with good verbal skills and a medium IQ, diagnosed with low severity ASD. He is governed by his interests in social interaction and shows difficulties in emotional regulation when confronted with unfamiliar situations or when he interprets them as unpleasant or negative. He seldom uses gestures to communicate with other children; he is literal when interpreting information in social contexts; and he shows some resistance to joint attention and difficulties in social memory for information other than his preferred interests. He has trouble participating in group situations and shows boredom together with simple stereotypes.

RV6 is a fifteen-year-old female with very good verbal skills and a high IQ, diagnosed with low severity ASD. She has difficulties in spatio-temporal organization and low self-esteem, together with mild difficulty in retaining social information. As a consequence, she shows resistance to participating in group situations with other children when she is unfamiliar with the context, often giving rise to disagreement or annoyance.

RV7 is a fifteen-year-old male with limited verbal and non-verbal communication skills, diagnosed with medium severity ASD. He can understand simple information and answer clear-cut questions, but he has difficulties with following a complex conversation, empathy, and ToM. He has trouble processing and retaining social information as well as behaving appropriately in situations of socio-emotional reciprocity. Nevertheless, he is always eager to participate in social situations with other kids, in particular when sports are involved. He has restricted interests. In particular, he enjoys and has a special ability with digital games, as well as learning and playing the piano.

4.1.1 Participants in the VR intervention

All the participants in the control group were chosen with Level 1 ASD severity based on the DSM-V. Four out of seven participants had good or very good social reciprocity. All had moderate inflexibility to changes. Except for CG7, they barely had problems with stereotypes and sensory reluctance. Table 1b summarizes this group’s characteristics.

4.1.2 Participants in the control group

4.2 Instruments

In addition to the hardware and software used, we designed a series of questionnaires covering the areas established in the DSM-V, namely communication and social interaction, and aimed at establishing the general level of capabilities of the students participating in the study. Those capabilities are related to the main deficits associated with ASD. The specific questionnaires are in line with those in Lorenzo et al. (2013, 2016, 2019a, b). A pilot trial was not conducted because the available population with Autism Spectrum Disorder is very small, one of the major problems this research field has to face, as indicated by Wallace et al. (2010) and Parsons (2016).

As shown in appendix 1, a first basic questionnaire of 35 items was structured into four categories, consistent with the Cronbach alpha test (Cronbach 1951) explained in section 5.1: (1) Social and emotional reciprocity; (2) non-verbal communication; (3) inflexibility to changes, and (4) stereotypes and sensorial reactivity. The specific questions are detailed in Tables 2, 3, 4 and 5 below. This questionnaire was administered twice, before and after the intervention, to parents/tutors and therapists/educators. It was also completed by the researcher in charge, at the beginning and at the end of the intervention. Each question offers five predefined answers: Never; occasionally; sometimes; frequently, and always. This way, we can easily quantify the responses in order to compare the results before and after the intervention objectively. The specific meaning of the categories was discussed with parents and therapists, explaining the differences between them. A weighted Kappa test (Cohen 1960; Landis and Koch 1977) on the level of agreement between the differences in the answers revealed a moderate-to-substantial agreement among the three parties filling out the questionnaires, with an average value of 0.529.

Table 2 General Questionnaire: Social and emotional reciprocity
Table 3 General Questionnaire: Non-verbal communication
Table 4 General Questionnaire: Inflexibility to Changes
Table 5 General Questionnaire: Stereotypes and sensorial reactivity
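The agreement check described above can be illustrated with a weighted Cohen's kappa for two raters on the five-point scale. This is only a minimal sketch under stated assumptions: the weights are linear (the weighting scheme is not specified in the text), the function name `weighted_kappa` is hypothetical, and the ratings in the usage note are invented for illustration.

```python
from collections import Counter

# Five-point frequency scale of the general questionnaire.
SCALE = ["Never", "Occasionally", "Sometimes", "Frequently", "Always"]

def weighted_kappa(rater_a, rater_b, categories=SCALE):
    """Linearly weighted Cohen's kappa for two raters on an ordinal scale."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    a = [index[x] for x in rater_a]
    b = [index[x] for x in rater_b]
    n = len(a)

    def weight(i, j):
        # Assumption: disagreement grows linearly with category distance.
        return abs(i - j) / (k - 1)

    # Mean observed disagreement over the paired ratings.
    observed = sum(weight(i, j) for i, j in zip(a, b)) / n
    # Disagreement expected by chance from the two marginal distributions.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(
        weight(i, j) * count_a[i] * count_b[j]
        for i in range(k) for j in range(k)
    ) / (n * n)
    if expected == 0:
        return 1.0  # both raters always use the same single category
    return 1.0 - observed / expected
```

For example, `weighted_kappa(["Never", "Sometimes"], ["Never", "Sometimes"])` gives 1.0 (perfect agreement). With three parties, one option would be to average the pairwise values, which is again an assumption rather than the procedure reported here.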

At the same time, we developed session questionnaires to evaluate the students’ performance in different areas. As in the general questionnaire, each question has predefined answers, in this case three: Good-Fair-Bad; All-Some-No one; or High-Medium-Low, depending upon the type of question. The first questionnaire evaluates concepts related to the VR experience developed in the virtual classroom, considering the level of adaptation to the technology, the students’ satisfaction with the experience, and some aspects of social memory and ToM. A second questionnaire deals with the experience that takes place in the garden of the virtual education environment. This second questionnaire is devoted to assessing empathy and emotional regulation related to concepts like inclusion, respect, equity and rejection of violence as a solution to conflict in a socialization context (Tables 6 and 7).

Table 6 Type 1 Session Questionnaire: In the Classroom
Table 7 Type 2 Session Questionnaire: In the Backyard Garden

Finally, the VR application records information on the performance of each of the subjects participating in this research. As we previously commented, this information includes the number of avatars each subject activated by visual contact, the total time the user established visual contact with any of the avatars present on the stage, and the number of commands or actions the researcher activated during the session, which can be interpreted as the number of effective communication interactions produced. From these data, we built a performance table, taking into account the duration of each session and the average time the users maintained visual contact during the virtual interaction (Table 8).

Table 8 Performance during the VR sessions: Data
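The derivation of a performance table from the logged data can be sketched as follows. The record layout and all field names (`SessionLog`, `duration_s`, `visual_contact_s`, and so on) are hypothetical; the text only states which quantities the application records, not how they are stored.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class SessionLog:
    """One VR session as logged by the application (hypothetical layout)."""
    duration_s: float         # total session length, in seconds
    visual_contact_s: float   # total time looking at any avatar, in seconds
    avatars_activated: int    # avatars activated by visual contact
    researcher_commands: int  # proxy for effective communication interactions

def visual_contact_pct(log: SessionLog) -> float:
    """Share of the session spent in visual contact, as a percentage."""
    return 100.0 * log.visual_contact_s / log.duration_s

def average_visual_contact_pct(logs: list[SessionLog]) -> float:
    """Mean visual-contact percentage across several sessions."""
    return mean(visual_contact_pct(log) for log in logs)
```

Normalizing by session duration makes sessions of different lengths comparable, which is presumably why the duration is taken into account when building the table.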

Next, we describe the procedure to collect the data set using the materials and instruments described before. The exploitation of the data allowed us to obtain and interpret the research results.

4.3 Design and procedure

Prior to the start of the study, the parents were informed of its objectives at an informational meeting. They were also told that participation in the study was not compulsory. Consent was obtained from those parents who permitted their children to participate. To protect data privacy, the consent forms are not included in the article.

Before starting the sessions, the general questionnaire, with the dimensions and questions described in 4.2, is administered to parents/tutors and to therapists, whenever the participants are under therapy during the intervention period. The researcher responsible for the sessions also completes the same questionnaire based on his impressions and the results of the first phase of the sessions. After finishing all the sessions, all involved parties again fill in the questionnaire to establish the evolution produced in the different areas throughout the educational intervention. As we previously stated, the VR application consists of two different scenes in the same educational context, involving identical characters. The first scene is introductory and is mainly devoted to presenting the context and facilitating the user’s relationship with the avatars he/she will interact with later on. The main educational goals of these initial sessions involve practice and improvement of verbal and non-verbal communication skills, joint attention, and ToM.

The first setting is introduced by the female teacher, who leaves the classroom to talk with the male teacher, and invites the user to introduce himself/herself to his/her peers. The system collects data about the number of interactions between the avatars and the user, taking into account the actions taken by the researcher during the conversation. The total time of visual contact between the user and the virtual characters during their interaction is recorded as well.

These numerical data can be interpreted as a good indicator of the capacities the user shows both in social conversation and joint attention. Additionally, once the VR experience ends, the researcher asks the participant a series of questions to get information about his/her level of satisfaction with the experience and to see whether the student recalls some information on the avatars, such as how old they are or their preferred hobbies. In particular, we stressed whether the student could describe the avatars, their physical appearance, and their personality, aspects related to the student’s level in ToM (Table 9).

Table 9 Type 1 VR Session

The second VR setting takes place in the garden of the same virtual environment. In this setting, in addition to the objectives we set for the first scene, the main goals include working on the students’ empathy and emotional regulation related to ideas such as inclusion, respect, equity, and rejection of violence as a way of solving social conflicts (Fig. 5 and Table 10).

Fig. 5 Example of activation of Luna’s communication system (in green) due to joint attention

Table 10 Type 2 VR Session

As in the initial sessions, once the VR experience concludes, the researcher asks a series of questions to the user in order to get information about his/her level of satisfaction with the experience, and whether he/she is aware of the concepts introduced in the debate. In particular, we stressed whether the user was able to understand the avatars’ emotional state, his/her emotional response, level of empathy, and his/her understanding of ideas of respect, equity, inclusion, and rejection of violence as a solution to social conflict. Video recording, together with the observational data collected by the researcher throughout the sessions, allowed us to estimate the students’ level of verbal and non-verbal communication and physiological responses during the VR experience.

4.4 Data analysis

There are three types of data collected across the experiment: (a) Data collected from the general questionnaires; (b) Data about the students’ performance during the sessions; and (c) Data automatically registered by the system. Only data of type (c) is numerical, while the other two sets of data are of a categorical type. We now explain the details.

To assess the performance during the sessions, we defined three levels according to the nature of the question asked. They could be either Good/Fair/Bad, or All/Someone/No one, or else High/Medium/Inexistent. As for the general questionnaire, the answers refer to the frequency of specific behaviors associated with ASD defining characteristics. The questionnaire has 35 questions, each of them allowing five possible answers: Never, Occasionally, Sometimes, Frequently, and Always. On the one hand, we aimed to analyze whether the VR intervention improved the performance of the participants in the perception of tutors, therapists, and researchers. On the other hand, we wanted to see the evolution of the students concerning their relationship with the technology and the VR experience.

There are two main approaches to evaluating progress with categorical data. The traditional one assigns a score to each of the options, with the convention that the better the performance, the higher the score (Agresti 2010). In the absence of any additional information on the relative importance of the different categories, the usual strategy is to assign consecutive integers to them. In this vein, we assign the scores 1, 2, and 3 to the answers on the user’s performance during the sessions, and 1, 2, 3, 4, and 5 to the answers of the general questionnaire. The difference between the total average scores before and after the intervention provides a measure of the improvement.
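The scoring approach can be sketched in a few lines. The example assumes the answers have already been coded as integers from 1 (worst) to 5 (best), and it expresses the improvement as a percentage of the maximum score; that normalization is an assumption, since the text does not state how the reported percentages are computed. The function name and the sample answers are illustrative only.

```python
from statistics import mean

def improvement_pct(before, after, max_score=5):
    """Difference of mean scores (after minus before).

    Assumption: the difference is reported as a percentage of the
    maximum score; other normalizations are equally possible.
    """
    return 100.0 * (mean(after) - mean(before)) / max_score

# Invented answers to four questions of one dimension, coded 1..5.
before = [2, 3, 3, 2]
after = [3, 3, 4, 3]
change = improvement_pct(before, after)  # positive means improvement
```

A negative value would correspond to the slight decreases reported for some participants in individual dimensions.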

Recently, an alternative way to evaluate progress with categorical data, without assigning scores to the answers, but rather, with a probabilistic approach has been proposed (Herrero and Villar 2013, 2017). The probabilistic approach compares the relative probability of getting better outcomes when comparing the answers’ distributions before and after the intervention. As our data perfectly fit this format, we also evaluate progress by using this alternative approach, in order to check the robustness of the results.
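To make the idea concrete, the following is a simplified two-group sketch of a probabilistic comparison. It is not the algorithm of Herrero and Villar: it only computes the probability that a randomly drawn "after" answer falls in a better category than a randomly drawn "before" answer, the reverse probability, their ratio, and two valuations normalized so that their mean equals 1. Function names, the handling of ties (ignored here), and the counts are assumptions for illustration.

```python
def dominance_probabilities(before, after):
    """before/after: answer counts per category, ordered worst to best.

    Returns (P(after answer better), P(before answer better)) for one
    random draw from each distribution; ties are left out.
    """
    n_before, n_after = sum(before), sum(after)
    p_after = p_before = 0.0
    for i, count_b in enumerate(before):
        for j, count_a in enumerate(after):
            joint = (count_b / n_before) * (count_a / n_after)
            if j > i:
                p_after += joint
            elif j < i:
                p_before += joint
    return p_after, p_before

def relative_valuation(before, after):
    """Ratio of dominance probabilities and valuations with mean 1."""
    p_after, p_before = dominance_probabilities(before, after)
    ratio = p_after / p_before if p_before > 0 else float("inf")
    total = p_after + p_before
    v_before, v_after = 2 * p_before / total, 2 * p_after / total
    return ratio, v_before, v_after
```

With the invented counts `before = [2, 1, 1]` and `after = [1, 1, 2]`, the "after" distribution is about 2.67 times as likely to yield the better answer, and no scores need to be assigned to the categories.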

The questions in the general questionnaire can be treated individually or else grouped in the four dimensions linked to the main types of behavior associated with ASD: Social and emotional reciprocity; Non-verbal communication; Inflexibility to changes, and Stereotypes and sensorial reactivity. We have up to three questionnaires per participant before the intervention and another three after the intervention, corresponding to the tutors, therapists, and researchers.

On the other hand, we classified the sessions questionnaire questions into four groups. The first one deals with how the user accepts and feels comfortable with the technology and the dynamic of the sessions, evaluating the users’ adaptation to the HMD and his/her satisfaction with the virtual experience. The second group of questions evaluates the social recall of the user. The third group deals with the user’s capacity to show ToM or empathy. Finally, the last group refers to questions on social inclusion and emotional regulation experienced in the sessions. The analysis of the data allows discriminating the user’s performance in each of those areas.

Furthermore, we analyzed the data automatically recorded during the sessions, namely the percentage of visual contact, the number of avatars activated, and the number of effective interactions during each session.

5 Results

Here we summarize the main results obtained. First, we present some statistical tests to check (1) whether it is statistically sound to group the items in the questionnaire in the four areas previously mentioned, and (2) whether the study group and the control group are statistically homogeneous. We do so, in both cases, by considering the different answers to the questionnaires before the intervention. Then, we consider the baseline, and finally, we analyze the effectiveness of the intervention.

5.1 Statistical tests

The Cronbach alpha test (Cronbach 1951) is used to measure the reliability or internal consistency of compound scores, that is, whether we may assume that several items are measuring the same underlying construct and, if so, whether it is legitimate to group the answers. Here, we grouped the items in the questionnaire in four areas: Social and emotional reciprocity; Non-verbal communication; Inflexibility to changes, and Stereotypes and sensorial reactivity. A value close to 1 indicates high internal consistency (Table 11).

Table 11 Cronbach alpha test
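For reference, Cronbach's alpha for one group of items can be computed from the item variances and the variance of the total score. The sketch below uses the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of totals); the function name and the sample scores are illustrative and are not the study's data.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """items: one list of scores per item, aligned by respondent."""
    k = len(items)
    # Total score of each respondent across the k items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance_sum / pvariance(totals))

# Three invented items answered by four respondents (scores 1..5).
dimension = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 2, 3, 4],
]
alpha = cronbach_alpha(dimension)  # identical items give the maximum value
```

Population variances are used for both terms; using sample variances consistently would give the same ratio and therefore the same alpha.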

As we see in Table 11, all alpha values are very close to 1. These figures mean that the different items considered in the four areas are measuring the same underlying construct. We now test whether the study group and the control group are statistically homogeneous. To do so, we apply Fisher’s exact test (Fisher 1922, 1954), designed to check whether two different samples are drawn from the same population. This test is adequate for qualitative variables (Mehta et al. 1984), small samples (Larntz 1978), and whenever the contingency tables are known. If the p-value is small enough, we can accept the hypothesis that both the study group and the control group come from the same population. The results agree with this hypothesis (Table 12).

Table 12 Fisher’s exact test
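As an illustration of the mechanics of Fisher's exact test, the sketch below handles the simplest case, a 2×2 contingency table, using only the standard library. It assumes the answers have been dichotomized (for example, low versus high frequency) purely for the example; the study's tables may have more categories, and the counts shown are invented. The function returns the two-sided p-value under the null hypothesis that group membership and answer category are independent.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of all tables at least as unlikely as the
    # observed one (small tolerance for floating-point ties).
    return sum(
        prob(x) for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-7)
    )

# Invented counts: rows = study / control group, columns = low / high.
p_value = fisher_exact_2x2(3, 4, 4, 3)
```

For tables with more than two categories, a generalization of the test (such as the Freeman-Halton extension) would be needed, which is beyond this sketch.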

5.2 Control group

We have data on the control group before and after the intervention period. As its members did not participate in the VR sessions, we expect them not to present significant changes over the experiment period. Indeed, this is the case. We test this fact using two instruments: (1) by comparing the score differences, and (2) by using the probabilistic comparison, i.e., by computing the relative probabilities of getting better results before and after the intervention period.

(1) The differences between the dimension scores before and after the treatment period are, on average, between -1.3% and 1.4%, i.e., they are nonsignificant (see Figure 6). (2) As for the probabilistic comparison, explained in detail in section 5.4, we obtain the distribution of answers before and after treatment. The algorithm provides the distances to the mean (normalized to be equal to 1) of both distributions. We understand that there is significant progress whenever (a) the valuation after treatment is above the mean (and therefore the valuation before treatment is below the mean), and (b) the distances to the mean are above 5%. Then, in the control group, we see nonsignificant progress (in dimensions 1 and 3), as well as nonsignificant regress (dimensions 2 and 4). See Figure 7.

Fig. 6 Score percentage changes of VR participants

Fig. 7 Average changes of the study group

5.3 Intervention group. Score differences

As we previously mentioned, we have up to three questionnaires for each participant, both before and after the intervention, made out of 35 questions, each with five possible answers, ordered from worst to best. In a first step, we apply a scoring system, where each question can get a score from 1 to 5. We evaluate the effectiveness of the intervention by considering the score differences of the participants after and before the intervention, where the questions are grouped in the four areas considered.

All participants in the study group improve after the intervention in the majority of areas. Only RV2 and RV7 show a slight decrease: some 2% in non-verbal communication for RV2, and 2.5% in stereotypes and sensorial reactivity for RV7. The highest improvement (17.5%) is obtained by RV5 in “inflexibility to changes”, while RV4 progresses some 15% in non-verbal communication. RV5 and RV4 improve 13.3% and 12.3% in social and emotional reciprocity, respectively. On average, RV5 leads with an improvement of 13.5%, followed by RV4 and RV6, with a mean improvement of 12% and 11%, respectively.

On average, the study group presents improvement in the four dimensions considered, going from some 6.5% in stereotypes and sensorial reactivity up to some 9.2% in inflexibility to changes. The mean improvement is 8.2%.

5.4 Effectiveness of the intervention: Probabilistic evaluation

As previously commented, we now evaluate progress in the treatment group by using a probabilistic approach (Herrero and Villar 2013, 2017). This probabilistic approach is based upon comparing the relative probability of getting better outcomes when comparing the answers’ distributions before and after the intervention using a free-access algorithm (Herrero and Villar 2017).

Here, we start by comparing, for the study group and for each of the 35 questions in the questionnaire, the distribution of answers before and after treatment. The algorithm provides the distances to the mean (normalized to be equal to 1) of both distributions. We understand that there is significant progress whenever (1) the valuation after treatment is above the mean (and therefore the valuation before treatment is below the mean), and (2) the distances to the mean are above 5%. The results for the treated group appear in Figure 8. As we see, this group presents significant progress in 34 out of 35 questions. Only question 32 shows a slight (non-significant) regress. The largest progress appears in Question 30, where the probability of getting a better result after treatment is 3.63 times that before treatment. In Question 2, this relative probability is 3.08. The best results in the different dimensions are obtained in Questions 2, 3, and 8 in Social and emotional reciprocity, indicating improvements in closeness, the disposition to participate, and emotional responses. As for Non-verbal communication, Questions 16 and 19 present the best results, indicating progress in responses, interest in others, and expressions. In Inflexibility to changes, Questions 30 and 26 are the best, meaning that there is a significant reduction of anxiety and tantrums. Finally, in Stereotypes and sensorial reactivity, we obtain very good improvements in Questions 33 and 34, showing less boredom and a better disposition to share things (Fig. 8).
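The decision rule stated above (valuation after treatment above the mean, and distance to the mean above 5%) can be written as a small classifier. The inputs are the two valuations normalized so that their mean is 1. The function name, the labels, and the symmetric treatment of regress are assumptions made for this sketch; the text only defines significant progress explicitly.

```python
def classify_change(v_before, v_after, threshold=0.05):
    """Label a before/after comparison from valuations with mean 1."""
    distance = abs(v_after - 1.0)          # distance to the mean
    direction = "progress" if v_after > v_before else "regress"
    if distance > threshold:
        return f"significant {direction}"
    return f"nonsignificant {direction}"

# Invented valuations, each pair averaging 1.
examples = {
    "question A": classify_change(0.90, 1.10),  # clearly above the mean
    "question B": classify_change(0.98, 1.02),  # above, but within 5%
    "question C": classify_change(1.01, 0.99),  # slightly below the mean
}
```

Under this reading, a question such as the one showing a slight regress would fall in the "nonsignificant regress" category.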

Fig. 8 Relative dominances. Study group

If we apply the probabilistic approach to the four (aggregated) areas, we obtain significant progress in all of them, over 40%. The largest differences appear in Social and emotional reciprocity and inflexibility to changes, with a 60% improvement (Fig. 9).

Fig. 9 Aggregate progress. Study group

5.5 Data collected within sessions

For the study group, we also have data about their adaptation to technology, social memory, and ToM, as well as whether they comprehend ideas of inclusion and respect. We also have data about visual contact and joint attention during the sessions. These data show a positive trend for all members of the treated group in all the items. The item with the best evaluations is Adaptation to the technology, which reaches the largest values. Social Memory also presents a very good trend, reaching the highest ratings. For the other two items considered, empathy and ToM, and emotional regulation and inclusion, the trend is positive but smoother. The results are shown in Figure 10. Finally, we consider the data on visual contact and avatar activation, comparing the sessions developed in the classroom with those in the garden. The results for both items are better in the second scene. See Figure 11.

Fig. 10 Average evolution within sessions

Fig. 11 Average visual contact and avatars’ activation within VR sessions

6 Discussion

The present study explores the improvement in different behavior areas in a group of students with ASD. To this end, we designed an immersive virtual environment (a school and a playground), where different interactions between the participants and the avatars arise naturally. Fisher's exact test (Fisher 1922, 1954) was applied to deal with the reliability of the results, resulting in a value of p below 0.05, showing that both groups (VR and CG) are statistically homogeneous. In addition, Cronbach's alpha test (Cronbach 1951) was conducted to check the grouping of the questions in the four areas of behavior considered. The alpha values, close to 1, support the internal consistency of the compound scores.

Although the size of the sample might raise some discussion about the generalizability of the results (Parsons and Cobb 2011; Boucenna et al. 2014; Grynszpan et al. 2013), the homogeneity between groups, as well as the fact that the CG did not present significant changes after the intervention, supports the relevance of the results obtained.

The largest improvements for the experimental group were in Inflexibility to changes and Social and emotional reciprocity, with an overall improvement of 9.2% and 9.1%, respectively. Moreover, all the participants presented individual improvements in these two areas, between 5.8% and 17.5% in the first case, and between 3.1% and 13.3% in the second. These results are in line with the findings in Didehbani et al. (2016), Manju et al. (2017), Ke and Lee (2016), Cheng et al. (2015) or Stichter et al. (2014), suggesting that the use of IVR learning environments provides a secure and controlled context for ASD students, favoring predisposition to social interaction and increasing flexibility to variations.

The domains of non-verbal communication and of stereotypes and sensory reactivity showed an average improvement of 7% and 6.5%, respectively, but participants' performance was not as homogeneous this time. RV4 and RV5 improved some 15% and 13.3%, respectively, in non-verbal communication, in contrast with RV2, who showed a decrease of 2% through the intervention. These results are similar to those in Bernardini et al. (2014) or Cai et al. (2013). At the same time, RV2, RV4, and RV6 achieved improvements between 10% and 11.7% in stereotypes and sensory reactivity, whereas RV7's performance decreased by 2.5% in this area. These differences between the participants’ performance may be due to a combination of different factors, such as the diversity of the children’s capabilities and context, the difficulties some ASD students may have in interpreting specific stimuli (Rogers 2000), or the particular protocol chosen, although there is no universal agreement on the duration and number of sessions for this type of intervention (Kandalaft et al. 2013).

To add robustness to the previous results, we also applied a probabilistic test (Herrero and Villar 2013, 2017). With it, we find that in Social and emotional reciprocity the largest improvements (i.e., the probability of better results after the intervention is about 3-4 times that before) appear in Questions 2, 3, and 8, linked to closeness, the disposition to participate, and emotional responses. These results are similar to those in Hopkins et al. (2011), Parsons (2016), and Ip et al. (2016, 2018), and suggest that IVR is in line with the visuospatial preferences of ASD students and is effective in capturing their interest in order to develop their emotional and social competences. Another area presenting significant improvement is inflexibility to changes (here, the probability of better results after the intervention triples). These results are in line with Baron-Cohen (2002, 2006), who indicates that children with ASD show great capacities in systematized and predictable environments with great flexibility such as IVR (Wallace et al. 2016; Newbutt et al. 2016; Lorenzo et al. 2019a, b), something that could be associated with a decrease in anxiety and temper tantrums in children with ASD.

Finally, with respect to the data collected during the sessions, both social memory and adaptation achieved values between 2.7 and 3 out of 3, in line with recent findings showing that children with ASD have a good acceptance of virtual reality technology (Hadad and Ziv 2015; Trembath et al. 2015; Murdaugh et al. 2016), enhancing their interest in social interaction (Wallace et al. 2010). Additionally, eye contact registered by the system throughout the sessions was between 2.8% and 15% for the scene inside the classroom, and between 8.4% and 33.7% for the garden scene. This contrast was expected, as the activation of avatars is less easy in the classroom due to their relative position within the scene. In the garden scene, after they face a conflict situation, the characters gather around the subject, increasing the options to activate them. In any case, both in the classroom and in the garden, there is a decrease in activation in the second session that is recovered later on, in the last two sessions. Even though the level of interest is high during the full intervention, there could be some boredom once the participants are familiar with the scene. It seems that after the second session, they are more interested in discovering the scene than in the avatars. Nonetheless, this repetitive behavior favors the improvement of social memory, which presents clear progress across the sessions, whereas the domains of empathy and ToM, and those related to the concepts of inclusion and equity, show improvement that is less pronounced.

7 Conclusions

Based on the results of our intervention, we may say that our research reveals a positive impact on the competences of the experimental group. We now turn to the research questions.

Regarding the first research question, we may say that IVR, in the present format, is an educational tool that may serve to improve the social skills of students with ASD. We recreate a virtual socialization world that is familiar from the participants’ daily life, with an avatar-human interaction that sacrifices complexity and flexibility to achieve greater realism. This game-like format, functioning by repetition, has been effective in the view of families, therapists, and the researcher in charge of the experimental process, who observed improvements in the general competences of the children participating in this study.

Additionally, the RV group shows optimal levels of adaptation that are maintained throughout the entire intervention. However, the evolution of eye contact and avatar activation suggests a gradual loss of interest in interacting socially with the virtual characters, which may be due to the repetitive nature of the dynamics. This repetition, however, translates into positive development in social memory. Progress is less pronounced for empathy and theory of mind (ToM), and for emotional regulation and the assimilation of inclusion concepts, although all reach intermediate values.

As for the second research question, we can conclude that social and emotional reciprocity, as well as inflexibility to changes, were the areas with the largest improvements. Nonetheless, non-verbal communication and stereotypes and sensorial reactivity also presented significant improvements.

7.1 Limitations and future work

There are, nonetheless, some limitations of this study that call for caution when interpreting the results. Even though we tried to make the two groups (RV and control) as homogeneous as possible, each individual is unique and, as such, differs in abilities, sensory preferences, interests, and emotional situation. These differences may have conditioned the participants' adaptation and responses during the intervention and, thus, influenced the results. Furthermore, we did not test to what extent the intervention may have affected other areas of behavior, or whether its effects are maintained over time.

Moreover, as we previously mentioned, there are no standard criteria for establishing the duration, number of sessions, and number of participants in an intervention of this type that would guarantee the reliability of the results. Instead, those decisions seem to depend upon the design and particular goals of each study. Nonetheless, larger study groups and more sessions over a longer intervention period are always desirable.

To sum up, the lack of analysis of the long-term persistence of the improvements, the duration of the intervention, and the heterogeneity of the participants are the main limitations of our analysis, and they prevent us from generalizing the results.

Future lines of work include: (a) improvement of the technology and the environment, aimed at obtaining a more flexible interaction between children and avatars; (b) design of new environments for carrying out alternative interventions in which the effect of the technology on other competences could be checked; (c) design of similar interventions with a larger group of students, in different locations, so that differences in adaptability and improvement could be better tracked. In all of these future lines of research, we will try to overcome the limitations of the present study to obtain better and more generalizable results.