1 Introduction

Wooden Light–Frame (WLF) structures offer flexible design possibilities for civil engineering applications in North America. These flexible assembly procedures rely on a blend of standardized technical knowledge and well-developed best practices to guarantee high quality and low operation times. However, despite the fact that the construction industry will be one of the largest growing sectors by 2026 (U.S. Department of Labor, Bureau of Labor Statistics 2018), within the next few years the construction sector will encounter a significant reduction of the expert workforce. The pending retirement of the “baby-boomer” generation that currently anchors the construction field poses an especially pressing problem (McGraw-Hill Construction 2012). To address this imminent shortage of expert workers, new teaching techniques must be implemented to create and foster the next generation of construction workers. The purpose of the current research is to design, develop, and evaluate a virtual reality (VR) learning environment that can measurably improve novices’ understanding of the installation and construction of WLF structures. With the help of experts in the fields of construction, training, and human cognition, we have outlined an effective and suitable VR training system that can guarantee a seamless replenishment of the expert workforce within the construction field. Further, by developing this enhanced training we also aim to establish the fitness of similar off-the-shelf technologies for commercial applications and implementation.

Our developed prototype allows users to engage with a virtual representation of a small set of WLF assembly components (e.g., studs, nails) and tools (e.g., tape measure, hammer, etc.). Participants learn to use these artifacts to assemble and disassemble a predefined WLF structure and move through the discrete phases of its installation from start to finish. To evaluate the effectiveness of this immersive video-based training tool, the VR training system is directly compared with existing 2D video-based training that is currently being used by a partner construction company. This study aims to provide a set of foundational data on the design process of a prototype VR learning tool, which can hopefully be adapted for subsequent design efforts in other domains or applications. The long-term objective of this line of research is to inform VR and AR training design for the myriad of different personnel involved in the construction industry; not only builders/installers, but also for architects and engineers. In doing so, we also aim to increase the applicability of our team’s established research in industrial augmentative systems and architectures (Tarallo et al. 2018; De Amicis et al. 2018; Simões et al. 2018; Simões and De Amicis 2016; Gune et al. 2018).

2 State of the art

Previous attempts to study applications of VR and AR systems within industrial design and manufacturing have been largely centered around a few specific training applications. These include guiding the worker through assembly steps (Fiorentino et al. 2014; Henderson and Feiner 2011; Optronique et al. 2001; Toro et al. 2007), worker training in an overall assembly process (Hořejší 2015; Peniche et al. 2012; Schwald and De Laval, 2003), and supporting assembly design, simulation, and planning (Doil et al. 2003; Friedrich et al. 2002; Posada et al. 2015; Sääski et al. 2008). The core incentives for the introduction of VR and AR within manufacturing include (but are not limited to): the reduction of errors associated with highly repetitive tasks, high noise, or poor ergonomics (i.e., assembly system factors); the reduction of errors related to the increasing number of products that share a high degree of similarity in components and configurations, which can cause confusion (i.e., product factors); and the reduction of errors due to workers’ memory, mental and physical abilities, skills or experience (i.e., worker factors). Although the literature on formal assessment and validation of VR and AR and multimodal systems is still sparse, several experiments have shown some encouraging trends. For example, users do seem to prefer multimodal interfaces over unimodal alternatives, as they offer the potential for faster processing of information, more flexibility, and better reliability in a range of usage patterns and preferences (VanWassenhove et al. 2005).

Despite these examples, there are a few practical considerations that the current literature still fails to address (Neugebauer et al. 2016). These include high customization costs (Paoletti 2017), lack of interoperability, technological acceptance by workers (Merhar et al. 2018), and physical constraints of mixed reality (MR) hardware. Most importantly, there is no reliable and valid methodology for measuring cognitive demands during the learning of a procedural task in virtual environments, as reflected by the lack of a standard adaption and certification. The failure to situate exploratory VR efforts within such larger contexts has produced a noticeable barrier to the seamless incorporation of these technologies into the manufacturing domain. While this may have limited the impact of previous work, it also suggests a fruitful opportunity for appropriate research to make a broad impact on the field by addressing this salient need.

There currently exist several typical training approaches in the construction field, often delivered via in-person or Computer-Aided Training (CAT). These approaches include text-based user manuals, lectures or round table talks, video demonstrations, and even hands-on training sessions. Amongst these various methods, workers often tend to prefer more experiential learning approaches, as it is easy to lose engagement during lectures or when required to memorize technical procedures from user manuals (Harfield et al. 2007). However, physical hands-on experiential training has its own drawbacks, including cost-related factors (i.e., dispatching trainers and special equipment on site), and safety-related challenges in hazardous environments (e.g., mining plants) (Grabowski and Jandowski, 2015). Initial CAT training was developed to address these concerns, and often took the form of desktop simulations and gamification experiences. Most recently, however, these efforts have also begun to incorporate VR and AR experiences. Recent research has reliably demonstrated the effectiveness of VR and AR for assembly training, reducing error counts and task completion time, while also providing a usable experience for the learner (Borsci et al. 2016). However, despite such reductions of error rates, users often do express problems with the interaction or interface (Langley et al. 2016). This latter finding is consistent with other research that has identified some of the drawbacks of game-like VR training (Hermawati et al. 2015). For these reasons, companies are hesitant to fully replace traditional training with VR training, even though they tend to use VR training to augment more traditional in-person training sessions.

In the construction field, most previous VR research has been concerned with workers’ safety-related behaviors (Sacks et al. 2013), and training with cranes and other heavy equipment (Kayhani et al. 2018). To the best of our knowledge, no research has specifically investigated VR training within the domain of WLF manual assembly training. The use of VR as a training solution for educating workers in the WLF building construction field would be a significant breakthrough due to its purported efficiency, safety, and effectiveness. However, before such benefits can be realized, a detailed assessment is crucial to determine if this technology is indeed ready for industrial implementation, or whether it requires further development. Such explicit evaluations are crucial to company management due to the financial investment often required by new training initiatives. Thus, it is imperative to perform high quality and informative assessments of VR training technologies that justify any initial startup costs.

As defined by Kirkpatrick (2006), good assessments of training comprise four subcomponents: reaction, learning, behavior, and results. Reaction assessments identify trainees’ satisfaction with training courses: positive reactions support learning, while negative reactions nullify or otherwise inhibit learning. Learning assessments evaluate the achievement of predefined knowledge gains. Both reaction and learning assessments are considered short-term evaluations. Behavior evaluations, however, are long-term, as they focus on the application of the training and its ability to improve work over many years or applications. Results assessments are also considered long-term evaluations and they are often indicators of overall company success. While the current literature demonstrates a clear interest by the scientific community in evaluating behavior in practical occupational learning contexts, it has not yet fully explored the potential of advanced modern training systems, nor has it determined the role emergent technologies might play in its evolution. This paper serves to clarify the role VR may serve within WLF construction training and, to evaluate its initial effectiveness, focuses specifically on the first two levels of Kirkpatrick’s (2006) hierarchy (i.e., reaction and learning).

3 Methodology

In this paper, we seek to address a simple research question: Do immersive VR training simulations of construction processes teach more effectively than more traditional methods of training (i.e., how-to video)? We hypothesize that VR training will not only result in better performance metrics during the recollection and application of the learned skills, but will also be perceived more positively than the more traditional alternatives. To answer this question, we first devised a training and evaluation regime with the intent of fairly representing and investigating a real training need. This included an in-depth task analysis and the formation of explicit and well-defined learning objectives (Sect. 3.2), which naturally informed the actual design of the training tool. Further, the outcome of these development efforts was then empirically evaluated against the alternative (video) training, as detailed in Sect. 4.

3.1 Task analysis procedure

In order to produce a robust understanding of the WLF task, we followed a three-stage procedure described below:

  1. Identify the skill the user needs to acquire. The target skill was chosen in order to have the proper difficulty level with respect to the application field. In our case study, the target skill has been identified as the construction of a WLF wall with a door. Nailing two studs together would have been too simple and not representative of the cognition process involved; on the other hand, building an entire house would have been too difficult since it would have had overwhelmingly excessive variables and outcomes.

  2. Identify prerequisite skills. Skills already mastered by the user should not be included as part of the task analysis, while skills not mastered must be included.

  3. Subdivide the target skill into discrete tasks, in a manner enlightening to users about the correct procedure.

The results of the complete Task Analysis are presented in Appendix 1 (Task Analysis for Wall Construction).
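The three-stage procedure above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, the prerequisite skills, and the step descriptions are hypothetical placeholders and do not reproduce the content of Appendix 1; only the target skill (a WLF wall with a door) and the two task names (Wall Layout, Wall Framing) are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Task:
    """One discrete task obtained by subdividing the target skill (stage 3)."""
    name: str
    steps: List[str] = field(default_factory=list)


@dataclass
class TaskAnalysis:
    target_skill: str              # stage 1: the skill the user needs to acquire
    prerequisite_skills: Set[str]  # stage 2: skills the target skill relies on
    tasks: List[Task]              # stage 3: discrete tasks in procedural order

    def skills_to_train(self, mastered: Set[str]) -> Set[str]:
        """Prerequisites not yet mastered must remain part of the analysis."""
        return self.prerequisite_skills - mastered


analysis = TaskAnalysis(
    target_skill="Construct a WLF wall with a door",
    # Hypothetical prerequisites, for illustration only.
    prerequisite_skills={"read a tape measure", "drive a nail"},
    tasks=[
        Task("Wall Layout", ["mark stud positions on the plates"]),  # hypothetical step
        Task("Wall Framing", ["nail studs to the plates"]),          # hypothetical step
    ],
)
```

Calling `analysis.skills_to_train({"drive a nail"})` returns the prerequisites that would still need to be covered for a trainee who has already mastered nailing.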

3.2 Learning outcomes

Learning Outcomes (LO) of any training course must be clear, as they explicitly define what trainees must know and be capable of after completing the course. In order to describe LO’s for the proposed VR training course in WLF, we applied the revised Bloom’s taxonomy of learning to the results of our task analysis (Anderson et al. 2001). According to this taxonomy, LO’s must be described with specific and measurable terms. Qualitative terminology must be avoided because verbs like “to know”, “to understand”, “to appreciate” are too vague. Action-type verbs like “to remember”, “to list”, “to describe” are preferable because they are connected to the level of learning involved. The adopted taxonomy lists six Levels of Learning (LoL) (Fig. 1) and it suggests appropriate “action” verbs for every level (Mualem et al. 2018).

Fig. 1 Revised Bloom's taxonomy (Armstrong 2010)

Thus, for each item in the task analysis, and for each LoL (if applicable), LO’s were defined including the skill developed and knowledge to-be-acquired by the trainee (see Table 1). These LO’s also served as the basis for the assessment of both initial knowledge levels and the subsequent performance of each trainee after training.
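The rule that LO’s should start with a measurable action verb can be illustrated with a short check. The sketch below is an assumption-laden example rather than the procedure actually used in the study: the six level names are those of the revised taxonomy, the vague and action verbs are the ones quoted in Sect. 3.2, but the assignment of each verb to a level and the sample LO statements are hypothetical.

```python
from typing import Optional, Tuple

# The six Levels of Learning of the revised taxonomy, lowest to highest.
LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]

# Verbs named in the text as too vague to be measurable.
VAGUE_VERBS = {"know", "understand", "appreciate"}

# Illustrative verb-to-level mapping (an assumption, not the paper's table).
ACTION_VERBS = {
    "remember": "Remember",
    "list": "Remember",
    "describe": "Understand",
}


def check_outcome(statement: str) -> Tuple[bool, Optional[str]]:
    """Return (is_measurable, level) for an LO phrased as 'verb + object'."""
    words = statement.split()
    if not words:
        return (False, None)
    verb = words[0].lower()
    if verb in VAGUE_VERBS:
        return (False, None)
    return (verb in ACTION_VERBS, ACTION_VERBS.get(verb))
```

For a hypothetical LO such as "List the types of studs used in a WLF wall", the check reports a measurable outcome at the Remember level, whereas "Understand the layout rule" is rejected as too vague.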

Table 1 Learning Outcomes of the task analysis

3.3 Experimental validation of the VR training tool

3.3.1 Participants

Twenty students from degrees in Mechanical Engineering and Industrial Product Design of the University of Bologna were asked to participate in the experiment. Participation was voluntary and was not compensated. Participants were divided into two equal groups; each group was composed of 10 participants with an almost equal percentage of male (60%) and female (40%) participants in each group.

Some of the participants had already experienced immersive technologies, either VR or AR. To provide a common level of experience with immersive technology among all the participants, a short familiarization session on the HTC Vive was conducted for all participants. In this familiarization session, participants could walk around in the virtual environment and learn how to use the Vive controller to interact with virtual objects. The VR familiarization session and the equal percentage of females in each group help control for the effects of prior experience and any gender effects, as it has been demonstrated that both of these factors can influence the use of immersive technologies (Sagnier et al. 2019). Further, the timber construction sector is mainly male-dominated, with less than 2% of female carpenters and practitioners (U.S Department of Labor, Women’s Bureau 2017). Representing both genders in our study, at near equivalence, provides evidence that any effects would be robust, even given the transition towards a more inclusive future workforce. The average age of the participants was 24 years. Fig. 2 shows one of the participants performing the VR training.

Fig. 2 Student performing the VR training

3.3.2 Procedure

The evaluation of the effectiveness of the VR tool comprised three phases (Fig. 3). In the first phase, as recommended by Kirkpatrick (2006), all participants completed a preliminary knowledge test (pre-test) that captured technical knowledge about WLF components and installation techniques. The inclusion of this preliminary test is critical as it establishes what each trainee already knows, which then allows for a more accurate assessment of knowledge gained specifically from the training session. This preliminary test consisted of ten questions: two questions for each of the five tasks covered in the Training Session described below. Every question of the test was a closed-ended question, with yes or no as possible answers. However, if the user answered yes to a particular question, they were required to verify this knowledge with an open-ended answer. This requirement discourages false responses of ‘yes’ when such knowledge does not exist, and thus limits the likelihood that users would respond ‘yes’ in order to avoid appearing to lack such knowledge. All the possible questions we devised for the pre-test and post-test are listed in Appendix 2. Ten questions were randomly selected in order to compose the questionnaire for each participant.
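The composition of the questionnaire described above can be sketched as follows. This is a minimal illustration, assuming that the random selection is stratified (two questions drawn per task) and that a ‘yes’ earns credit only when the open-ended follow-up is correct; the text does not spell out either detail. The function names, the task labels, and the question pool are hypothetical placeholders, not the items of Appendix 2.

```python
import random
from typing import Dict, List, Optional


def build_questionnaire(pool: Dict[str, List[str]],
                        per_task: int = 2,
                        seed: Optional[int] = None) -> List[str]:
    """Draw `per_task` questions at random from each task's question pool."""
    rng = random.Random(seed)
    selected: List[str] = []
    for questions in pool.values():
        # sample() draws without replacement, so no question repeats.
        selected.extend(rng.sample(questions, per_task))
    return selected


def score_answer(said_yes: bool, explanation_correct: bool) -> int:
    """Assumed rule: a 'yes' counts only if the open-ended answer verifies it."""
    return 1 if (said_yes and explanation_correct) else 0


# Hypothetical pool: five tasks with four candidate questions each.
pool = {f"Task {i}": [f"T{i}-Q{j}" for j in range(1, 5)] for i in range(1, 6)}
questionnaire = build_questionnaire(pool, seed=0)
```

With five tasks and two questions per task, every participant receives ten questions, and a different seed yields a different selection.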

Fig. 3 Methodology workflow for the experimental comparison of VR training systems with traditional training methodologies

In the second phase, participants were trained on WLF construction processes. Due to the complexity of the overall construction sequence, we conducted the experiment only for two tasks: Wall Layout and Wall Framing. The second phase contained three components. The first component [I] was the training sequence: one group (A) watched a VR immersive video, whereas the control group (B) watched a 2D training video that has been used previously to train employees. An example of a WLF wall is available in Fig. 4. Importantly, the construction of the VR immersive video was based directly on the original training video; the original video was analyzed using the task analysis procedure described above and then edited according to the suggestions of the construction company. In other words, every part of the VR training demonstrated a specific sequence of activities that replicated the sequence performed in the source videos (e.g., Fig. 5). In this way, we ensured that our understanding of the task and construction sequence portrayed in the VR training reflected the actual construction sequence performed by the workers of the construction company. The VR immersive video consists of a series of animations in a virtual workshop, in which the individual steps of the WLF assembly process are detailed. First, users can freely navigate the workshop and change their point of view; second, it is also possible to render the appearance of the building site, which should increase immersiveness. This enabled the participants to observe, for example, how timber hardware is handled, or where reference marks are drawn. Moreover, given the flexibility of the environment, it also integrates a few best practices identified by experienced workers as intermittent steps within the 3D assembly sequence. The learning content includes 3D models of tools and typical building site equipment. The main functionalities and instructions for tools are described to the user via dedicated pop-up menus that the user can activate. For example, these instructions might describe how to change drill bits while using a power drill or how to recharge a nail gun. Moreover, safety information and risk prevention behaviors in the use of this equipment are described to the user. Each step of the assembly was represented as a discrete scene with its own written caption superimposed within the environment, and the viewer could freely cycle between scenes. Each scene also included pause and rewind functions, so the trainees could review the more complex passages that may be more challenging to learn.

Fig. 4 Framed wall components

Fig. 5 Animation of the toe nailing step

After completing the training materials, participants were then asked to read the design plans for a WLF wall [component II]. The VR training group performed this construction plan in the VR environment following the steps of the VR immersive videos, whereas the control group studied the technical drawings and plans for the same WLF wall. In the last component of the training session [III], both groups were asked to execute the plans from component II, and were not given any additional instructions or plans. Both groups performed this construction in real life with actual building materials. User performance was evaluated using the performance metrics described in Sect. 3.5.

Finally, after completing the Training Session, participants completed a re-ordered version of the pre-test, thus permitting an evaluation of knowledge gains within an individual participant. They then completed a System Usability Scale (SUS, Sect. 3.4) questionnaire to evaluate their perceptions of the system.

3.4 System usability scale

Participants’ appreciation of the technology was principally described in terms of the SUS, a broad metric used to compare the perceived usability of a variety of systems (Brooke 1996). It is a highly cited and widely used metric across many industries, and it has been integral to a rich body of usability literature (Brooke 1996). This subjective usability test quantifies trainees’ perception of the instructional materials and is important to better understand how Information Technologies (IT) such as VR are identified as useful and accepted by users in an industrial context. This is especially pertinent as perceived usefulness and ease of use can impact the learning process (Davis 1989). Users were also asked to rate their own level of learning on a five-point Likert scale. In other words, trainees were asked to rate how much they felt they had learned via the provided training. This subjective self-assessment of learning provides a convergent data point that can be used in concert with more objective levels of actual performance (see Sect. 3.5) to better gauge how users perceive the ultimate utility of a given training technique for enhancing their job skills.
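For readers unfamiliar with the SUS, the sketch below shows the standard scoring procedure from Brooke (1996): ten items rated from 1 to 5, where odd-numbered items contribute their rating minus 1, even-numbered items contribute 5 minus their rating, and the sum is multiplied by 2.5 to give a score between 0 and 100. We assume here that the questionnaire is scored in this unmodified way; the function name is our own.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Compute a 0-100 SUS score from ten ratings on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("the SUS has exactly ten items")
    if any(r not in range(1, 6) for r in responses):
        raise ValueError("each rating must be an integer from 1 to 5")
    total = 0
    for item, rating in enumerate(responses, start=1):
        if item % 2 == 1:
            total += rating - 1   # odd items are positively worded
        else:
            total += 5 - rating   # even items are negatively worded
    return total * 2.5
```

A participant who answers 3 to every item obtains a score of 50, while the most favorable pattern (5 on odd items, 1 on even items) yields 100.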

3.5 Performance metrics

Participant performance was quantified in terms of both time to completion and number of errors made in the wall-building portion of the experiment. For the purpose of this research, an “error” is defined as an incorrectly performed step in the assembly sequence of a severity sufficient that future steps cannot be completed correctly, or that the overall integrity of the finished product is compromised. Specifically, we classified the following activities as errors: (1) a step is performed differently from the suggested procedure and produces a different result, (2) a step is missing, (3) the carpenters’ tools are used incorrectly. Research in VR assembly training advocates that time and error count are important means of capturing user performance, regardless of domain [e.g., for industrial assembly tasks (Roldán et al. 2019) and for the assembly of medical devices (Ho et al. 2018)].
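The two metrics and the three error classes above can be recorded per participant as in the following sketch. It is an illustration only: the class and field names are our own, and the sample trials are invented values, not data from the experiment.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class ErrorType(Enum):
    DEVIATING_STEP = 1  # step performed differently, with a different result
    MISSING_STEP = 2    # step skipped
    TOOL_MISUSE = 3     # incorrect use of the carpenters' tools


@dataclass
class Trial:
    participant: str
    group: str                 # "A" (VR) or "B" (video)
    execution_seconds: float   # time to completion of the wall-building task
    errors: List[ErrorType] = field(default_factory=list)


def summarize(trials: List[Trial], group: str) -> Tuple[float, int, Counter]:
    """Return mean execution time, total error count, and errors by type."""
    selected = [t for t in trials if t.group == group]
    if not selected:
        raise ValueError(f"no trials recorded for group {group!r}")
    mean_time = sum(t.execution_seconds for t in selected) / len(selected)
    by_type = Counter(e for t in selected for e in t.errors)
    return mean_time, sum(by_type.values()), by_type


# Invented example values, for illustration only.
trials = [
    Trial("P01", "A", 1700.0, [ErrorType.TOOL_MISUSE]),
    Trial("P02", "A", 1800.0, [ErrorType.MISSING_STEP, ErrorType.TOOL_MISUSE]),
    Trial("P03", "B", 1750.0, []),
]
mean_time, total_errors, by_type = summarize(trials, "A")
```

Keeping the error type alongside the count makes it possible to report both the total number of errors per group and which kinds of error were most frequent.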

4 Results

For each task, training time and execution time are reported in Table 2, while a graphical representation is provided in Fig. 6.

Table 2 Descriptive statistics for all measures by training group

Fig. 6 Time to complete tasks versus participant group and task (error bars: 95% confidence interval)

Independent samples Welch’s t-tests were conducted on learning and execution time for the wall layout and framing tasks. As is visible in Table 2, the only reliable difference between the two training conditions was in the learning time for Task 2: Wall Layout (t(1, 18) = −19.06, p < .001, Cohen’s d = −8.53). In the VR training condition, the average training time was 22 min and 34 s, while in the traditional set-up, the average training time was 36 min and 56 s. For both training methods, the average execution time was approximately 29 min. Thus, training took approximately 40% less time in VR, and this did not impact the efficiency of actual performance. With respect to our research question, this confirms one dimension by which VR training is more effective than traditional video training. There are multiple possible elements of the VR training environment which could have contributed to this improvement. For example, the VR training could be completed more quickly than the video by confident users, as the net playing time of the animations was shorter than that of the video, allowing the users to progress more rapidly. Furthermore, because the steps in VR were explained through written text rather than by a narrative voiceover, the speed of information communication may have been improved in VR. Future research is necessary to determine exactly which aspects of the VR training were most conducive to the shorter learning times, and whether VR is strictly necessary to produce similar advantages. It is unknown why the difference in learning times between tasks 1 and 2 was so pronounced.
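The figures reported above can be cross-checked with a few lines of arithmetic. The sketch below assumes two groups of 10 participants (Sect. 3.3.1) and uses the usual conversion between an independent-samples t statistic and Cohen’s d, d = t·sqrt(1/n1 + 1/n2); the function names are our own.

```python
import math


def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'min and s' duration into seconds."""
    return minutes * 60 + seconds


def cohens_d_from_t(t: float, n1: int, n2: int) -> float:
    """Recover Cohen's d from an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)


vr_time = to_seconds(22, 34)      # average VR training time
video_time = to_seconds(36, 56)   # average video training time

# Relative reduction in training time when moving from video to VR.
reduction = (video_time - vr_time) / video_time

# Effect size implied by the reported t statistic with 10 participants per group.
d = cohens_d_from_t(-19.06, 10, 10)

print(f"time reduction: {reduction:.1%}")
print(f"Cohen's d: {d:.2f}")
```

The reduction comes out just under 39%, in line with the "approximately 40%" stated in the text, and the implied effect size agrees with the reported Cohen’s d to within rounding.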

In terms of errors, VR-trained users made 20 total errors during real-life construction, while traditional training users made 26 total errors; there was no reliable difference in the number of errors committed across the training conditions (t(1,18) = 1.19, p > .05). This suggests that VR training not only permits faster training, but also does not trade this speed for a corresponding decrease in accuracy. It is worth noting that three errors did occur more frequently in the VR training: the wrong application of the “16 on center” rule; the incorrect use of the speed square; and the incorrect crowning procedure. It is likely that these errors occurred more frequently in the VR set-up because a VR trainer was not present; hence, the trainees were not shown explicitly how to perform a specific operation or how to use a tool. Regarding the “16 on center” rule, in the video the worker showed how to hook the tape to the bottom plate edge and measured from the edge in a continuous way. However, in the VR environment the tape moves on its own from one of the bottom plate edges along the studs (Fig. 7).

Fig. 7 Comparison of one of the steps of the Layout task in the video training mode (top) and in the VR training mode (bottom)

An example of this error lies in the fact that some of the participants did not hook the tape to the bottom plate edge, instead measuring 16 inches every time from the mark they previously drew (Fig. 8).

Fig. 8 Incorrect use of the tape after VR training

Regarding the handling of the speed square, all the instructions about how the user should handle tools and equipment were provided as written text. However, as evidenced in the comments section of the SUS questionnaire (see below), seeing a human being handle the carpenters’ tools could facilitate learning how to use them. Previous research has shown examples of how various degrees of engagement with virtual avatars can impact the learning process, which may also be applicable in this scenario (Kim et al. 2017). Nevertheless, most of the users of the VR training set-up (6 participants) used the speed square in the correct way (Fig. 9).

Fig. 9 Correct use of the speed square

As with the previous two errors, the crowning step proved to be one of the most difficult in the construction sequence. Four members of group A and three members of group B made errors in this step. The different instructions on how to perform the crowning step are depicted in Fig. 10.

Fig. 10 Visual instruction by the trainer in the video training mode (left), and the corresponding instructions on canvas in VR training mode (right)

Fig. 11 shows a participant who followed the traditional training performing the crowning operation as suggested by the worker in the video.

Fig. 11 Correct execution of the crowning operation

Meaningful data can also be extracted from the pre- and post-assessments. All participants demonstrated very low initial levels of knowledge on the pre-test. In both groups, participants were unable to name any of the 6 types of studs used in WLF construction before training. However, after training there was a significant overall increase in this knowledge (F(1, 18) = 10.02, MSe = 3.06, p < .01). This knowledge gain was consistent across both groups, as evidenced by the lack of a main effect of training type (F(1, 18) = 2.09, MSe = 3.06, p = .16). Similarly, prior to training no participant was able to identify any of the 3 symbols used to mark the different studs during construction; after training, there was again a significant improvement in this knowledge (F(1, 18) = 1892.25, MSe = .04, p < .01). Once again, this improvement did not vary by training condition (F(1, 18) = 2.25, MSe = .04, p = .15). Further, participants reported that they were unable to define the 8 other concepts on the pre-test prior to training. After training, this was likewise significantly improved, with participants acknowledging they could explain significantly more of these 8 concepts (χ2 = 79.65, p < .01); once again, this improvement did not differ across training conditions (ps > .05). Hence, it appears that learning gains in the VR training condition were consistent with the results of more traditional training. However, since training in the VR environment was approximately 40% faster than traditional training, it seems fair to acknowledge that VR training is more efficient in producing these learning gains than its traditional counterpart.

There was no reliable difference in the overall SUS score (t(1, 17) = .48, p > .05). This suggests that both training solutions were functionally equivalent in overall usability. Examining each of the 10 questions (Table 3), it is worth noting that for all questions except n. 4 there was likewise no difference between the VR and video training groups. Users did acknowledge that there might be a higher need for support in using the VR system versus the video training (t(1, 18) = 2.22, p = .04, d = .99), but this is likely due to the somewhat novel nature of the HMD and other VR hardware. Importantly, this heightened need for support does not appear to be a barrier to users’ achievement given (1) the equivalence of scores on the other questions of the SUS (specifically n. 2, 3, 7–10), and (2) the error and pre-post performance measures above. In the comments section from the VR group, 4 users suggested that voice instructions be included in the future, and 3 users asked for a VR trainer who shows how to perform the most difficult operations. This is consistent with the recommendations from the error analysis above.

Table 3 SUS questionnaire results

5 Conclusions

In this paper, we describe and develop a virtual reality learning environment for the wood-based construction sector. The simulated learning environment was designed to develop manual skills in young carpenters, hopefully providing more effective training for new workers. This application was empirically tested through an experiment, which compared learning time and performance in the simulated VR learning environment versus a traditional video training system currently used in industry. Such research on VR training systems for manual skills development is not often performed (Hoedt et al. 2017), especially in the construction sector, and thus this work represents one of the first examples of such research in the area of wood-based engineering and construction. Results showed that the VR training is approximately 40% faster than traditional training yet provides similar (or better) levels of performance. In other words, learning was much faster in the VR case, and produced knowledge gains consistent with the traditional video method, suggesting that VR training is more efficient than video training. This increase in training efficiency is a potentially transformative windfall for the business bottom-lines of construction companies. By providing training in VR, not only may workers be trained effectively, but training costs can be cut considerably depending on development and deployment practices. Such training benefits are likewise enticing for other industries, and the research team strongly encourages the exploration of implementing such technologies in other contexts.

Future studies should further explore and extend the current findings based on the qualitative recommendations from the participants and highlighted theoretical potentials presented by other more abstract training research publications. For example, it would be interesting to see whether the implementation of a virtual trainer/instructor within the VR training system, or the implementation of voice instructions, might positively impact VR training efforts. Furthermore, future domain-centric research should also focus on more complex construction tasks (e.g., framing of a wall with a window or framing operations for the roof of the house) to see whether varying task requirements impact the effectiveness of VR training. Lastly, training systems which implement different modes of interaction should likewise be pursued on account of a general lack of knowledge on the appropriacy and advantages of interaction schemas, both within and outside of the training sector. While our research was restricted by an abstract interaction metaphor, more specific manual dexterity skills could be imparted through a training system focused on precise hand-tracking and haptic feedback. Regardless of the specific hypothesis, the tremendous efficiency and potential of VR in training presents a promising direction for all manner of future research projects, and these preliminary results indicate a substantial need for more science and innovation both at the precise level of construction training and the broader level of immersive learning.