
1 Introduction

Virtual Reality (VR) devices and other Mixed Reality (MR) technologies are not a recent invention; they have been a subject of scholarly research for over 50 years. The possibilities of these tools give them tremendous potential as a learning platform, and a variety of studies describe some of the many ways to integrate these technologies into the classroom [1]. The mining engineering sector and its education system in particular have had to face massive changes over the past few years. In many countries, mining operations became unprofitable, were closed, or were privatized [2]. This development does not make it easier for students to experience mining operations in practice and on site. To face these challenges, the project MiReBooks focuses on the development of a Mixed Reality framework supporting professors and students in mining education. This framework offers tools, methods, examples, and technologies that bring Mixed Reality into mining education [2]. Research shows that, despite already being used in today’s classrooms, MR has not yet fully found its way into the tertiary education sector [1]. In addition, formal evaluation of MR applications has only become a research topic in recent years [5].

Beginning with a short description of MiReBooks itself, we give some insight into the key data of the project. In the main chapter of this paper, we examine the state of evaluation research conducted in recent years within the field of MR. The authors give an overview of these studies, which are also classified according to their evaluation type. The goal of this step is to get a first impression of how the field has developed in recent years. The results are then discussed in view of our own research process and how the gathered data can help improve it. We also give an outlook on upcoming evaluation research within the MiReBooks project. Finally, the authors present the didactic concepts behind the project and apply the results of our study to them.

2 The MiReBooks Development and Research Project

2.1 Mixed Reality in Education

Mixed Reality (MR) describes a continuum between reality and virtuality. It includes Virtual Reality (VR), Augmented Reality (AR), and the stages in between [10]. In recent years, these techniques have increasingly found their way into the educational sector [11]. However, MR is still not widely acknowledged by teachers in the tertiary educational sector [1].

As already mentioned in the introduction, MR is a potent technology for enhancing learning processes in different ways. In particular, the interactive and immersive nature of virtual environments offers potential beyond serious games and three-dimensional worlds: Granic [7] notes that the primary purpose of MR technologies is not entertainment, but increasing learner motivation and involvement in learning activities. In addition, virtual learning environments (VLEs) should also be beneficial in terms of learning outcomes [7]. Dawley and Dede [12] state that MR experiences enable situated learning, a concept widely acknowledged as a powerful didactic approach. Schiffeler et al. [13] also mention that collaborative forms of MR can promote communicative skills and problem-solving through interaction with other students.

Overall, lecturers have confirmed positive effects of MR in education [14]. However, there is still little empirical evidence in the field to confirm such expectations in general [8,9]. Using MR in mining courses is a particularly challenging task, and there is still little knowledge about its efficient usage in mining engineering education [15].

2.2 The MiReBooks Project

We have already briefly introduced the difficult situation of the mining industry and its educational sector. This situation has led to a decline in social acceptance and damages the public image of the raw materials industry [2]. Consequently, the sector is becoming less and less attractive to students, while demand in the sand, gravel, and quarry industry is rising [18].

To counteract this, the European Institute of Innovation & Technology (EIT) Raw Materials launched the MiReBooks (Mixed Reality Books) project in 2018 [15]. Fourteen pan-European partners work on different methods, technologies, and tools to address the current problems in the field of mining education. The purpose of the project is to increase the attractiveness of mining engineering for students [3]. The researchers work on ways to transfer theoretical knowledge into practical work [14]. This is one of the major challenges within the mining sector [19], since blasting, loading rubble onto trucks, or visiting a mine in general entail safety risks, logistical challenges, and further problems [14].

Kazanin and Drebenstedt [19] compared the educational sectors of leading mining countries such as the USA, Russia, and Germany. This research showed that education programs must meet “changing demands of national and global mining industry” and should incorporate “active involvement of the professional community in the process of training” [19]. Given these constraints, we can state that highly practice-oriented teaching in the field of mining engineering education is difficult to implement [14].

One way to enable a more practical style of teaching is the use of MR. These technologies can help to overcome such constraints. Lee [20] and Winn [21] state that AR and VR enable more natural processes for interacting with virtual objects. According to Radu [22], such interaction increases the quality of learning outcomes. In their meta-analysis, Santos et al. [23] also measured a positive effect on the performance of students using AR compared to traditional methods.

The MiReBooks project builds on these findings and creates a framework for assisting teachers and students in mining engineering education. It is “a new digital learning experience that explores the way mining is taught, applied and changed in the future” [14]. We use AR and VR technologies to enhance traditional learning material. These learning experiences allow lecturers to provide situations similar to hard-to-get real-life experiences. The didactic concept behind the project is discussed after the literature study and its results have been presented.

3 Evaluation of MR Tools in Education

As described in the previous chapters, Mixed Reality is still a comparatively under-researched topic when used as a teaching method. Especially in the domain of mining engineering education, the use of MR in the classroom is a relatively new approach [15]. To improve these new tools, it is essential to evaluate both the learning outcomes and the technologies themselves. However, as Swan et al. [16] showed, evaluation processes are not conducted as often as expected. In 2004, they published a study that reviewed over 1,100 articles from multiple sources connected to Augmented Reality. Of these publications, only 21 described some form of formal user evaluation [16]. Santos et al. [23] conducted a meta-analysis in 2014 in which they analyzed 87 research articles on augmented reality learning experiences. 43 of these papers included formal user studies measuring factors like ease of use, satisfaction, immersion, student motivation, or performance [23]. This shows that there has been an increase in evaluation activity in MR over the years. However, there is still limited knowledge about MR as an educational tool, and there is even less information about suitable evaluation methods for these technologies.

Duenser et al. [5] hypothesize that the main reasons for this lack of user evaluations in AR could be “a lack of education on how to evaluate AR experiences, how to properly design experiments, choose the appropriate methods, apply empirical methods, and analyze the results.” These aspects can be found within the six stages of evaluation design presented by Oliver [17]. According to Oliver, evaluation design consists of “identification of stakeholders, selection and refinement of evaluation question(s), based on the stakeholder analysis, selection of an evaluation methodology, selection of data capture techniques, selection of data analysis techniques, and choice of presentation format”. It is also important to understand that evaluation processes should be designed in an iterative or cyclic way to maximize their benefits.

3.1 Research Design

After outlining the difficulties and challenges connected with the evaluation of MR tools in education, we now explain our research method. We started by defining the search queries. These consisted of the terms “Virtual Reality”, “Augmented Reality”, and “Mixed Reality” combined with strings like “evaluation”, “study”, “education”, or “classroom”. The search was conducted in the academic databases ERIC, ResearchGate, IEEE Xplore, and LearnTechLib. We also performed a search on Google Scholar. Additional constraints were the date of publication and the availability of each paper: we focused on works published between 2015 and 2021 that are freely available online.
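As a minimal illustrative sketch (the exact query syntax and tooling used in our search are not reproduced here, and the AND connective is an assumption), such query strings can be generated as combinations of technology and topic terms:

```python
from itertools import product

# Technology terms and topic terms from the search design above.
technologies = ['"Virtual Reality"', '"Augmented Reality"', '"Mixed Reality"']
topics = ["evaluation", "study", "education", "classroom"]

# Combine every technology term with every topic term into one query string.
queries = [f"{tech} AND {topic}" for tech, topic in product(technologies, topics)]

for query in queries:
    print(query)  # e.g. "Virtual Reality" AND evaluation
```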

The authors then selected 94 papers for a first examination, in which they were checked for relevance. After this selection process, 59 publications remained, from which a further 15 were dropped. These papers were excluded because they either only suggested an evaluation, explained the actual evaluation in a different paper, or did not evaluate an MR technique or tool itself. We then began the formal analysis of the remaining 45 papers by classifying their aspects according to a predefined grid. Our main interest was the evaluation techniques used in each study. For this purpose, we adapted the approach by Duenser et al. [5] and created a grid to classify each paper into five types:

1. Objective measurements
2. Subjective measurements
3. Qualitative analysis
4. Usability evaluation techniques
5. Informal evaluations

As our research goal was to gain insight into the current state of the art in MR evaluation, we decided to allow the assignment of one paper to multiple categories. We wanted to generate an overview of all techniques used and put them in perspective. This differs from the original approach by Duenser et al. [5], who classified each paper only according to its main evaluation approach. However, the analysis of multiple papers showed that a clear assignment to only one method would have been problematic: several researchers conducted multilayered studies with multiple measurement goals, which complicates the assignment to a single category. Another deviation from the original concept was made when the authors decided to check each paper for descriptions of an iterative development process. Besides these formal aspects, which were recorded in an Excel sheet while reading each paper, the authors also took notes on factors like evaluation types, research outcomes, or study participants. The results of this analysis are discussed in the following chapter, where the authors also discuss some examples in more detail.
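To illustrate the multi-category grid, the following sketch shows one possible data structure for such a classification (the names and example assignments are ours, drawn from the papers discussed below; this is not the actual Excel-based grid):

```python
from enum import Enum, auto

class EvaluationType(Enum):
    OBJECTIVE = auto()    # time, error rate, accuracy, scores, actions
    SUBJECTIVE = auto()   # questionnaires, ratings, judgements
    QUALITATIVE = auto()  # observations, interviews, behavior classification
    USABILITY = auto()    # heuristic/expert evaluation, think-aloud
    INFORMAL = auto()     # informal observations, informal user feedback

# Unlike Duenser et al. [5], one paper may carry several evaluation types.
papers: dict[str, set[EvaluationType]] = {
    "Caputo et al. [36]": {EvaluationType.OBJECTIVE, EvaluationType.SUBJECTIVE},
    "Summers et al. [32]": {EvaluationType.QUALITATIVE, EvaluationType.SUBJECTIVE},
}

# Tally how often each evaluation type occurs across all classified papers.
counts = {t: sum(t in types for types in papers.values()) for t in EvaluationType}
```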

3.2 Results

Examination of the 94 initial papers led to 59 publications that met our general selection criteria. From these, 15 were dropped for different reasons. Neither Lee et al. [28] nor Merengo et al. [30] conducted a study within their papers; instead, they used feedback from studies explained in other publications. Takala et al. [29] developed and evaluated a course on creating VR experiences, which did not match our research question. Thanyadit et al. [31] created a promising AR tool that allows a lecturer to supervise a group of students using VR, but did not conduct an evaluation. Despite their scientific value, these publications were discarded because they did not contain any information on practical evaluation techniques.

The remaining papers presented different studies, which were categorized according to the evaluation techniques described in the previous chapter. The most common participants in these evaluations were students: 31 publications presented a study in which this group of test subjects was represented, and 22 of these were carried out in the area of higher education. Aside from educational settings, other studies were carried out in the context of professional work or medicine; accordingly, other common groups of participants were patients or representatives of the specific domain. One example of a study with patients is given by Summers et al. [32], where the researchers used a variety of methods, such as observations and a questionnaire, to evaluate their application. Only two papers used experts as test subjects, and four publications did not describe the test subjects in detail beyond sex or age. The number of participants varied between five and 829. The latter number stems from Scullion et al. [9], where the researchers first had 720 participants answer a questionnaire about subjective experiences and later conducted another survey with 102 students at three different universities.

As mentioned in the research design chapter, the authors also took notes on descriptions of iterative evaluation processes. It turned out that many studies were conducted within such a procedure. However, only nine publications described such iterative evaluation processes in more detail. Examples of this can be found in Pombo et al. [33] and Shahriari-Rad et al. [34]. A detailed description of a multilayered evaluation design is presented in Lozada-Yánez et al. [35]. The researchers of this paper explained five stages of testing, which started with a first review of their test environment in the construction phase. The original items were then validated according to their relevance and clarity. After this stage, the test was further adjusted, and a pilot study was conducted. As a final step, the researchers performed a reliability analysis of the obtained data.

Fig. 1. Types of evaluation techniques by year and publication date.

Overall, we found 21 studies using objective measurements and 29 using subjective measurements. 16 publications contained a qualitative analysis, five used usability evaluation techniques, and another five described informal evaluation approaches. The distribution of all analyzed papers and the types of evaluation techniques they presented is shown in Fig. 1.
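As a small sketch, the overview in Fig. 2 can be reproduced directly from these counts (the use of matplotlib and the styling are our own choice, not the original figure pipeline):

```python
import matplotlib.pyplot as plt

# Counts of evaluation techniques across the 45 analyzed papers (see text).
techniques = {
    "Objective": 21,
    "Subjective": 29,
    "Qualitative": 16,
    "Usability": 5,
    "Informal": 5,
}

plt.bar(techniques.keys(), techniques.values())
plt.ylabel("Number of publications")
plt.title("Types of evaluation techniques found in all publications")
plt.tight_layout()
plt.show()
```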

Fig. 2. Overview of types of evaluation techniques found in all publications.

Objective Measurements

This category includes studies that conducted objective measurements. Aspects measured in this category are consumed time, error rate, accuracy, scores, number of actions, or other objective factors. As seen in Fig. 2, we found 21 papers that used such objective methods in their evaluation, making this the second most common category in our research. An example is Caputo et al. [36], where participants had to solve tasks in MR with different types of object manipulation. The researchers measured aspects like execution time or actions per task and additionally captured subjective factors with a post-test questionnaire.
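A minimal sketch of how such objective measures could be logged during a task follows; the class and its interface are hypothetical and not taken from Caputo et al. [36]:

```python
import time

class TaskLogger:
    """Records objective measures for one task: execution time,
    number of actions, and error count."""

    def __init__(self) -> None:
        self.actions = 0
        self.errors = 0
        self.elapsed = 0.0
        self._start = 0.0

    def start(self) -> None:
        self._start = time.perf_counter()

    def record_action(self, correct: bool = True) -> None:
        self.actions += 1
        if not correct:
            self.errors += 1

    def stop(self) -> None:
        self.elapsed = time.perf_counter() - self._start

# Usage: call start() when the task begins, record_action() for each
# manipulation, and stop() when the task ends; then read the attributes.
```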

Subjective Measurements

This category covers papers that measured the subjective experiences of participants. Common techniques are questionnaires, subjective ratings, or judgements. As depicted in Fig. 2, subjective measurements were the most common type of evaluation technique among all publications, with 29 studies. Papers in this category often measured aspects like immersion, authenticity, preferences, motivation, mental effort, or attitude towards the application. An example of such techniques can be found in Lemheney et al. [37].

Qualitative Analysis

In this category, the authors collected studies with formal user observations, formal or semi-structured interviews, or classification of behavior. 16 papers were classified in this category. Summers et al. [32] is one example of such methods, as the researchers observed the behavior of patients during VR sessions and compared it with a control group.

Usability Evaluation Techniques

This category compiles studies with evaluation techniques that measure interface usability, such as heuristic evaluation, expert-based evaluation, or the think-aloud method. While factors of system usability can also be measured with other techniques, this category strictly consists of papers that used the aforementioned methods. With five studies attributed to it, this was, together with informal evaluations, the least common category. Examples of papers with this evaluation type are Chujitarom et al. [38] and Nuanmeesri et al. [39], as both conducted expert-based evaluations.

Informal Evaluations

These are papers that included informal user evaluations, such as informal observations or informal collection of user feedback. It must be stated that attribution to this category was somewhat problematic, since it was not always possible to clearly detect this kind of evaluation. Therefore, only papers that unambiguously described an informal evaluation were selected. This led to five papers collected under the term informal evaluations. As shown in Fig. 2, this category is, together with usability evaluation techniques, the least common one.

4 Discussion

After gathering data about evaluation techniques used in MR in recent years, we can now compare these findings with each other, with older data, and with our own previous research within the MiReBooks project.

As depicted in Fig. 2, subjective methods were the most common type of evaluation method used. These findings affirm the assumption that researchers are still actively trying to maximize user experience in terms of immersion or motivation, and they also align with our own motivations within the MiReBooks project. One could hypothesize that objective factors like error rate or consumed time are not considered as important as these subjective aspects. However, objective factors are still very important to a large number of researchers, as the difference between the numbers of objective and subjective evaluation methods is rather small.

There is also another explanation for this observation. In most studies, subjective measurements act as an addition to the main research topic: many authors focused on other aspects and only administered a subjective questionnaire after the main evaluation process. This could also explain the difference between our study and the study conducted by Duenser et al. [5] from 2008. While we allowed the attribution of one paper to multiple categories, Duenser et al. only categorized papers according to their main focus. In their publication, the authors presented objective measurements as the most common type of evaluation between 1995 and 2007, while subjective methods were only second or third in most years. However, both studies show that the overall number of studies on evaluation in MR is steadily rising.

In 2004, Swan et al. [16] identified only 21 of 266 AR-related papers that contained information about formal user evaluation processes, which is about 8%. Four years later, in 2008, Duenser et al. [5] conducted a similar study and identified 161 (~29%) of 557 papers that included an evaluation. Santos et al. [23] also found 43 (~49%) formal evaluation approaches within the 87 research articles on Augmented Reality in their 2014 meta-analysis. In our own research, we selected 94 papers and identified 59 (~63%) publications connected with evaluation processes, 45 (~48%) of which describe an actual evaluation of an MR technique or tool. These numbers clearly show a positive trend in the amount of evaluation within Mixed Reality. However, this conclusion is limited, as our own study differs in some aspects from the other three and shares similar limitations.
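For transparency, the quoted percentages follow directly from the reported counts:

```python
# Reported counts: (papers with evaluations, papers surveyed).
surveys = {
    "Swan et al. [16], 2004": (21, 266),
    "Duenser et al. [5], 2008": (161, 557),
    "Santos et al. [23], 2014": (43, 87),
    "This study, selected papers": (59, 94),
    "This study, evaluations of MR": (45, 94),
}

for name, (evaluated, total) in surveys.items():
    print(f"{name}: {evaluated}/{total} = {evaluated / total:.0%}")
# 21/266 = 8%, 161/557 = 29%, 43/87 = 49%, 59/94 = 63%, 45/94 = 48%
```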

Another observation that coincides with the work by Duenser et al. [5] is the low number of usability tests. In their findings, this category was only identified in studies between 2003 and 2007, and it was the least common type in every year, just as in our own study. This suggests that usability is not considered as important for MR applications as factors like immersion, motivation, or performance data.

We can also compare the findings of our research with our own previous evaluation approaches within the MiReBooks project. The most common evaluation methods belong to the category of subjective measurements. The researchers within the MiReBooks project described their findings from subjective measurement methods in three publications. Multiple methods, such as questionnaires and interviews, were used to measure subjective aspects [3,14]. In addition, another study evaluated the usefulness of 360° videos in VR [4]; feedback forms were used to capture the individual perception of each participant. Another important area was the usability of different tools [14]. These findings can now be used within the didactic framework of MiReBooks.

5 Applying the Results on the Didactical Framework of MiReBooks

Based on the current findings and basic didactic principles, the didactic concept of the MiReBooks project is presented below. The concept incorporates current research findings and aims to take into account the interdisciplinary expertise of the project consortium on technical and didactic requirements in the best possible way. Overall, the didactic concept addresses four phases of integrating MR technologies into teaching: the planning phase (I), content production (II), the implementation phase (III), and evaluation and reflection (IV). The whole procedure is visualized in Fig. 3.

Fig. 3. Four phases of integrating MR technologies into teaching.

The planning phase (I) mainly includes reflection on the learning objectives to be achieved. Based on WHAT is to be taught, suitable media and technologies are selected, taking into account the organizational framework conditions as well as individual skills and prior knowledge [14]. Teachers are supported in this process by a decision matrix and a planning table [14]. Here, Bloom’s taxonomy of educational objectives is taken into account [24]. Applying this taxonomy supports teachers in better structuring their individual teaching units or even entire curricula.

The content production phase (II) differs for each medium. For example, Khodaei et al. [25] discuss the specific requirements and the different steps in the production of 360° videos. The individual skills and prerequisites of the teacher must of course be taken into account here as well. The guideline developed in the project points out that the targeted development of one’s own MR content requires the support of technically experienced staff, which was confirmed by various teachers using MR technologies [3]. The MiReBooks project also aims to provide teachers with an authoring tool to share content, adapt it, and make it usable for themselves.

The implementation phase (III) refers to the actual use of MR technologies in teaching. Within the MiReBooks project, four different test lectures (open pit bench blasting, hard rock underground drift development, hauling in mining, and continuous surface mining) were developed and conducted at several European universities. In the project, the integration of MR technologies is preceded by an examination of the learning objectives, organizational prerequisites (such as the size of the classroom or the number of participants), and the individual skills and competencies of the teachers. This provides opportunities to use the technologies in large groups (teacher-centered learning) as well as in small groups (student-centered learning). Especially in student-centered settings, the integration of MR technologies follows Kolb’s concept of experiential learning, which is based on actively experiencing the learning content [26].

Within the aforementioned lectures, different sets of MR hardware components were used. During the test lectures, classical teaching materials such as PowerPoint slides, whiteboards, and blackboards were combined with small breakout sessions providing MR-based experiences. In total, there were 12 test lectures (four on open pit bench blasting, three on hard rock underground drift development, two on hauling in mining, and another three on continuous surface mining). Beforehand, all lecturers were asked to fill in a storybook for their lectures containing the aim and use of the respective media for a certain learning objective.

In the last phase of evaluation and reflection (IV), a mixed-method approach is applied to evaluate the use of MR technologies. Questionnaires with validated scales (e.g., the System Usability Scale [27]) are used to assess the usability of the technologies. In addition, questionnaires supplemented with qualitative open questions are used to assess the experience of MR technologies in teaching. Subsequently, interviews are conducted to explore the possibilities and limitations of MR in mining engineering education.
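As a brief sketch of the standard SUS scoring rule [27] (the function and its names are ours): each odd-numbered item contributes its response minus one, each even-numbered item contributes five minus its response, and the sum is multiplied by 2.5 to yield a 0-100 score:

```python
def sus_score(responses: list[int]) -> float:
    """Compute a System Usability Scale score from ten 1-5 Likert responses.

    Odd items (1, 3, ...) are positively worded: contribution = response - 1.
    Even items (2, 4, ...) are negatively worded: contribution = 5 - response.
    The summed contributions (0-40) are scaled by 2.5 to a 0-100 score.
    """
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based: even i = odd item
        for i, r in enumerate(responses)
    )
    return total * 2.5

# Example: sus_score([4, 2, 4, 2, 5, 1, 4, 2, 4, 2]) returns 80.0,
# which is above the commonly cited average score of 68.
```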

6 Conclusion and Outlook

We began our paper with the statement that evaluation processes in MR are still a sparsely researched scientific field. To support this statement, we presented several works that have already analyzed this topic, together with practical examples. After explaining the key data of the MiReBooks project, we dealt with evaluation methods and techniques. In the main chapter of this publication, the researchers presented a literature review of 45 MR-related papers that conducted formal evaluations. We categorized these works by adapting an approach by Duenser et al. [5] and compared the findings to their data and to further research that has been done within the MiReBooks project.

This research showed that the amount of evaluation in MR is steadily increasing. Whereas Swan et al. [16] identified only about 8% of all selected papers as focused on evaluation in 2004, we measured about 48%. This is another increase over the 29% Duenser et al. [5] found in 2008. However, there are some clear limitations to our approach. First, we included all MR-related papers, while Duenser et al. and Swan et al. only examined AR-related publications; the comparability is therefore limited. Second, our literature research was far smaller, as we only selected 94 papers, while Duenser et al. started from over 6,000 initial papers, which were then reduced to 557 AR-related ones. Third, the main sources for our research were the ERIC Database and ResearchGate, supplemented by IEEE Xplore, LearnTechLib, and Google Scholar. To overcome these limitations, future research could extend the time frame to ten or 15 years. In addition, the number of analyzed papers should be increased. These measures could help confirm our findings concerning the development of evaluation in MR, and a comparison with older works would then not be as necessary as in this paper.

Overall, the research approaches to date show that MiReBooks mainly focused on subjective measurements. This is consistent with the results of our literature research and shows again that the perception and attitude of participants are very important to the developers of VR tools. Researchers also conducted face-to-face interviews, which belong to the category of qualitative analysis. Methods attributed to objective measurement have not been used within these papers and therefore mark a possible gap for future research.

Concerning the MiReBooks project, we can confirm that the evaluation approaches to date were consistent with current standards. By mainly focusing on subjective measurements, supplemented by qualitative methods, the researchers adhered to common practice in Mixed Reality. However, objective evaluation methods and usability evaluation techniques were absent from previous publications. Future research could therefore focus on testing the developed tools in terms of error rate, time consumption, or accuracy, or on utilizing usability testing. This could lead to new insights not only into the development of the MiReBooks tools, but also into the evaluation of Mixed Reality in general. As the MiReBooks project is still ongoing, there will be further evaluation approaches in the future. One topic currently in the planning phase concerns remote evaluation concepts.