1 Introduction

Almost all educational systems incorporate technological innovations in the curriculum in an effort to provide high-quality and wide-ranging knowledge, to connect the learning subjects with daily practices, to free education from its spatial and temporal limitations, and to foster a number of skills, abilities, and learning dispositions called the “21st century skills” (Trilling and Fadel 2009). 3D virtual environments (VEs) are a technology that supports all of the above in formal, non-formal, and informal settings alike (Duncan et al. 2012).

Researchers have tried to evaluate the learning outcomes when using VEs in all levels of education, focusing on the teaching methods or on certain technical aspects, while the inclusion of psychological or subjective factors that may have an impact on the users’ learning experience is rather limited (e.g., Merchant et al. 2012; Tüzün and Özdinç 2016). Moreover, there is no common consensus on which factors to include, how they interplay with each other, or even how they are defined. On the other hand, by knowing how specific factors positively or negatively affect users, we can, with appropriate interventions, maximize the effectiveness of VEs.

This study contributes in this direction by developing and testing a model that examines a number of factors theorized to shape the learning experience when using VEs in informal learning settings. How the included factors interact with each other was also examined. Informal learning was chosen because it is the least studied setting in relation to VEs. To that end, a virtual museum was developed presenting the work of Nausica Pastra, a Greek sculptor/artist with an interesting international career who was nonetheless not widely known in her home country. Following a literature review, the factors included in the model were perceived usefulness, perceived ease of use, perceived realism of the application, presence, motivation, and enjoyment when using VEs. The learning outcomes, that is, what the users were able to learn when using the VE, were also included as one of the model’s constructs. In the sections to follow, a brief review of the literature on informal learning and virtual museums is presented, followed by a review of the factors affecting the learning outcomes when using VEs. Next, the research rationale, methodology, and results are presented. Subsequently, the results are discussed, and the conclusion completes the work.

2 Informal learning, museums, and (3D) virtual museums

Informal learning is spontaneous, meaning that it is a non-systematic and often a non-organized process that can be evaluated on the basis of one’s overall lifelong learning (Coombs et al. 1973). It can also be defined as the sum of the learning activities not taking place in a formal classroom (Gerber et al. 2001). Informal learning is also experience-based, non-routine and often tacit (Marsick and Watkins 1990). Taking into account the above, it can be argued that informal learning originates accidentally and sporadically, does not follow a specified curriculum, and the learner is highly motivated to learn. Also, since it is not planned pedagogically or systematically organized, it is difficult to quantify and test it. The most significant part of the educational research is related to formal education and planned teaching; informal learning is often overlooked and not regarded as a particularly valid form of learning. On the other hand, there are several studies highlighting the importance of informal learning in an individual’s life (e.g., Fonseca et al. 2016; Harrop and Turpin 2013). Examples of where informal learning can take place include after-school activities, community-based programs, libraries, at home, and museums.

Informal learning and museums are closely related; a substantial number of studies have examined the former in the context of the latter. Museums are institutions whose purpose is to collect, preserve, and display artifacts (permanently or temporarily) of artistic, cultural, or scientific importance with the objective of educating the public (Alexander et al. 2017). Archaeological sites, monuments, traditional settlements, urban and natural environments can also act as museums. From an educational perspective, museums try to engage visitors in learning experiences by stimulating their curiosity and interest (Hooper-Greenhill 2013). Consequently, learning in museums is mostly experiential, depending on the individual’s interests and preferences. As already mentioned, several studies have been conducted regarding learning in museums, in all age groups and in all kinds of settings. These demonstrated that individuals experienced enjoyment and positive emotions, being free to browse through the museum and interact with its exhibits at their own pace (e.g., Dilli and Dümenci 2015; Miglietta et al. 2008). At the same time, the visitors acquired a varying degree of useful information and knowledge and developed cognitive, affective, as well as linguistic skills (Ampartzaki et al. 2013; Hackett 2014; Synodi 2014).

In recent years, a new form of museum has emerged, that of digital/virtual museums. These environments can be stand-alone applications, websites, mobile applications, or a combination of the above. They draw on the characteristics of physical museums, complementing or enhancing the experience one has, through personalization and interactivity. Virtual museums can act as the digital footprints of real museums, as exhibitions created from diverse resources, or they can be fictional environments with equally fictional exhibits. Regardless of whether these digital museums represent real or imaginary environments, their visitors can communicate, explore, or even modify the digital space and objects (Pujol and Lorente 2013). Virtual museums are quite popular and have attracted research interest in the experiences (learning or otherwise) that users have when “visiting” them (e.g., Li and Zhou 2016; Naya and Ibáñez 2015; Pescarin et al. 2012).

There are several ways to develop such interactive environments; however, VEs have notable advantages compared to other types of applications (i.e., web pages and 2D multimedia applications) because they (a) can achieve the ambience of a physical museum, (b) allow free navigation inside the simulated space, (c) allow increased interactions with the 3D objects/exhibits as well as with other users present in the VE, thus offering a more realistic experience (Pescarin et al. 2012), and (d) let the 3D digital artworks be viewed from multiple perspectives, giving the user an idea of their actual dimensions, placement, and use (Skamantzari and Georgopoulos 2016). Indeed, several studies examined the learning outcomes when using 3D virtual museums in a variety of settings, such as (a) formal or non-formal education (e.g., Seok and Kim 2014), (b) combined visits to a real museum and virtual visits to its digital counterpart (e.g., Jones and Alba 2016), and (c) serious games (e.g., Paliokas and Sylaiou 2016). On the other hand, research on the impact of 3D virtual museums, and of VEs in general, in strictly informal learning settings is rather sparse (e.g., Voss et al. 2013).

3 Factors affecting the learning experience when using 3D virtual environments

On the basis of what was presented in the preceding section, it can be argued that 3D virtual museums provide a good basis for examining VEs in relation to informal learning. VEs offer unique and realistic experiences to users, opportunities for personalized learning (Rapanotti et al. 2012), and, in general, their educational value is widely accepted (Allison et al. 2012; Merchant et al. 2014). Although most researchers reported -more or less- satisfactory learning outcomes when using them, there are many different, and sometimes contradictory, explanations why these outcomes were achieved. In addition, the focus was on formal education (Merchant et al. 2014). Far fewer studies examined psychological and subjective factors as well as their interactions. The study of the above in informal learning settings proved to be even more sparse.

In order to have a better understanding on which factors affect the user’s learning experience when using VEs, a quite extensive literature review was carried out spanning from 2007 to 2017, revealing a substantial number of well-organized and interesting studies that provided useful information and ideas. For instance, deNoyelles et al. (2014) examined the role of self-efficacy. In another case, the effects of presence on the learning outcomes were examined (Hassell et al. 2012). Once again presence together with enjoyment (Sylaiou et al. 2010) or with perceived usefulness (Tüzün and Özdinç 2016) were the factors of interest in other studies. Jia et al. (2014) found that perceived system efficacy and self-efficacy had an impact on the effectiveness of VEs in informal education. Also, Naya and Ibáñez (2015) found that factors such as enjoyment and perceived ease of use affect the learning outcomes, while Yaman et al. (2008) considered the instructional support and the learner’s personal interests. Presence and usability were used by Merchant et al. (2012) for explaining the observed learning outcomes in VEs. Finally, Lee et al. (2010) developed a model that tried to explain the learning outcomes in VEs by using a number of VE’s features together with cognitive beliefs, presence, control, motivation, and reflective thinking.

The vast majority of the studies that were found had little in common; different factors were used as key determinants of the learning experience, different types of VEs were examined, and the learning subjects/learning settings were also diverse. Also, much of the research did not deal with the learning experience per se but with factors that, beyond any doubt, can shape it (e.g., usability). What is more, there is no common consensus on the definition of some factors. For example, the terms presence, immersion, and flow were used interchangeably and were even examined using the same questions. Nevertheless, by summarizing the literature review, six factors were identified, commonly used for explaining the learning outcomes, that can be grouped into the following categories:

3.1 VEs technical features

  • Perceived realism. Though a VE’s realism depends on certain technical features (e.g., how detailed the 3D models are and to what extent a user can interact with them), individuals perceive it differently. Studies indicated that realism acts as a facilitator of learning (Dalgarno and Lee 2010; Harrington 2012; Lee et al. 2010). Also, in this study, realism was viewed as the sum of other factors such as perceived audio and visual adequacy and perceived interactions’ adequacy.

  • Perceived ease of use. Perceived ease of use (PEU) has been found to play a key role in one’s experience when using VEs (e.g., Lee et al. 2010). As with perceived realism, PEU was also viewed as the sum of other factors such as perceived usability and perceived feedback’s adequacy.

3.2 State of mind

  • Enjoyment. Studies have shown that enjoyment and, in general, the positive feelings towards a VE contribute to knowledge acquisition (Faiola et al. 2013).

  • Presence. Presence is one of the most troubling factors, suffering from definitional issues because there is a tendency to use this term as a synonym for engagement, flow, or immersion. It refers to the phenomenon of one acting and feeling as being in the digital world (Sanchez-Vives and Slater 2005). Though presence has an impact on the learning outcomes (e.g., Bulu 2012; Lee et al. 2010), there is a lot of controversy regarding its exact role. There are those who supported that it depends on the use of specialized equipment (e.g., North and North 2016) and those who supported that it depends on the user’s personality (e.g., Nunez 2004). In order to examine the above, it was decided to include presence as a construct, view it as a synonym of immersion, and examine its role.

3.3 Learning facilitating features

  • Perceived usefulness. Perceived usefulness is the degree to which one believes that his/her performance would be enhanced by using an ICT tool (Davis et al. 1989). It is commonly used as a construct in studies regarding VEs’ impact on learning (e.g., Lee et al. 2010; Sharda et al. 2004). In terms of its impact on knowledge acquisition, it can be viewed as a perceived learning facilitator, that is, how much one thinks that the use of a specific ICT tool can help him/her acquire some knowledge.

  • Motivation. Motivation is not a static attribute, but it is sensitive to how the content is presented and inherently volatile (Linnenbrink and Pintrich 2002). Research has indicated that, in VEs, motivation and the learning outcomes are closely connected (McLellan 2004). It has to be noted that in informal learning intrinsic motivation (one’s need to learn about the application’s subject) is already present and quite strong. Thus, it was interesting to examine whether an application can have an impact on one’s already strong motivation to learn.

4 Research model and hypotheses

Figure 1 presents the proposed research model, which includes the factors presented in the previous section. Since the literature suggests diverse relationships among the factors, it was decided to add and test all possible relationships. The study’s hypotheses were based on these relationships:

  • H1a-e: Perceived realism influences (a) presence, (b) enjoyment, (c) motivation, (d) perceived usefulness, and (e) the learning outcomes.

  • H2a-e: Perceived ease of use influences (a) presence, (b) enjoyment, (c) motivation, (d) perceived usefulness, and (e) the learning outcomes.

  • H3a-d: Presence influences (a) enjoyment, (b) motivation, (c) perceived usefulness, and (d) the learning outcomes.

  • H4a-c: Enjoyment influences (a) motivation, (b) perceived usefulness, and (c) the learning outcomes.

  • H5a-b: Motivation influences (a) perceived usefulness and (b) the learning outcomes.

  • H6: Perceived usefulness influences the learning outcomes.

Fig. 1 The research model

In this study, the dependent variable was the learning outcomes, while all the others acted as independent variables. Because structural equation modeling (SEM) was to be used for model testing, perceived ease of use and realism were the exogenous variables, while presence, motivation, enjoyment, and perceived usefulness were the endogenous variables.

5 Method

To test the research model, a study was designed and implemented utilizing a 3D virtual museum developed specifically for its purposes. The target group was adults, data were collected using a questionnaire and tests, and the project lasted for six months. The above are further elaborated in the coming sections.

5.1 Participants and duration

An invitation was issued to anyone interested in participating in the project. This invitation was posted on social networks as well as in large communities of virtual worlds (i.e., Second Life and OS Grid). Participants were informed that they were going to “visit” an art museum and complete a short questionnaire, that the study was conducted on a voluntary basis, and that entering the virtual museum was deemed to constitute consent to participate. Detailed instructions were provided on how to log in to the virtual world and on other technical issues (how to connect, what software to use, and the necessary settings). Participants were also informed that the time period they remained connected and their IP addresses were recorded. The virtual museum remained active for a period of six months (June to December 2017) in order to collect a sufficient sample size.

5.2 Materials

As already mentioned, a 3D virtual museum was developed for the project’s needs, using OpenSimulator (http://opensimulator.org/), an open source platform which allows the development, management, and hosting of VEs. The museum presented the work of the sculptress Nausica Pastra, who pursued an international career but remained relatively unknown in Greece. This was exactly the reason for selecting this artist. Since the participants were Greeks, there was little chance that the museum’s visitors would be familiar with her life and work. Consequently, it would be easier (a) to check whether the participants acquired any knowledge and (b) to attribute this to their “visit” to the museum rather than to any other factor.

Most of Pastra’s creations were geometric forms based on mathematical functions. Also, the team had at its disposal high-resolution photographs of her sculptures. Therefore, creating their exact digital replicas was a relatively easy task. All were placed in a 512 × 512 meter imaginary natural environment (Fig. 2). While there was no specific order in their placement, the exhibits were grouped into small and big ones. The former were placed inside two small buildings, while the latter were placed in open spaces. Screens placed next to each exhibit provided information about it. Also, other screens provided information (in the form of websites, texts, and other audio-visual material) about Pastra’s life and milestones in her career. The users could only walk; flying was disabled. This was done because the museum was divided into three areas (two main areas and an exit area) separated by walls and gates. To gain access to the next area, users had to answer some questions in order to lift the gates (see section “Instruments” for further details). The development of the application took about two months.

Fig. 2 Screenshots from the 3D virtual museum

5.3 Instruments

For measuring the model’s factors, a questionnaire was used. For developing it, a number of questionnaires (with high reliability and validity) used in other studies related to VEs were considered. Specifically, presence was measured using four questions from the presence questionnaire used by Novak et al. (2000). For measuring perceived usefulness, four questions from the Computer Attitude Scale (Selwyn 1997) were adapted. This scale is predominantly used in studies examining users’ intentions to use various technologies in an educational context (e.g., Fokides 2017; Teo and Noyes 2011). Four questions coming from the same scale were used for measuring perceived ease of use. From Witmer and Singer’s questionnaire (1998), four items were used for measuring the perceived realism of the application. The Intrinsic Motivation Inventory (Tamborini et al. 2010) provided four items for measuring enjoyment. Finally, from Martens et al.’s questionnaire (2007), four questions were adapted for measuring motivation.

The questions were translated into Greek by two pairs of experts (a computer science professional and a psychologist with expertise in VEs), all being proficient in the English language. The two versions were then back-translated into English and viewed by another pair of experts. The final version was obtained through a consensus meeting of all experts, aiming to assess the semantic adaptation. Thus, the final version of the questionnaire was formed, having a total of twenty-four five-point Likert-type questions (worded “Strongly Agree”, “Agree”, “Neutral”, “Disagree” and “Strongly Disagree”). The questionnaire is presented in the Appendix.

Effectively communicating substantive knowledge in informal learning conditions is a formidable task, and quantifying the acquired knowledge is probably even more difficult. Nevertheless, this task was accomplished by using three short tests, with fourteen questions each (right-wrong, multiple choice, and fill-in-the-blanks), all relevant to what was presented in the museum (i.e., Pastra’s life and its important milestones, her sculptures, and artworks). The first two tests were embedded in the gates blocking access to the different areas of the museum, while the third was presented to the users together with the questionnaire. It has to be noted that taking the tests was not mandatory; the gates could be lifted either by answering the questions or by simply skipping the tests altogether. Also, answering the questions correctly was not required for gaining access to another area; even if all answers were wrong, the gates were lifted. In the exit area, there was a large screen presenting the final set of questions and the questionnaire. As with the other two sets of questions, participants were given the choice to take the test and the questionnaire or simply skip this step and log out. This was done because the study examined informal learning; under no circumstances should users be left with the impression that they were being tested on what they had learned.

5.4 Procedure and data processing

After logging-in, participants were free to explore the virtual environment in any way they wanted, examine the exhibits in any order they liked, and study the information screens for as long as they considered necessary. The only task they were given was to reach the exit area of the museum. Also, they were asked to remain logged-in for at least three-quarters of an hour, which was considered the minimum amount of time for a user to visit all the areas.

Scores were obtained by allocating numerical values to the responses in the questionnaire: “Strongly Agree” scored 5, “Agree” scored 4, “Neutral” scored 3, “Disagree” scored 2, and “Strongly Disagree” scored 1. Also, three scores were computed on the basis of the correct answers in the tests and were added as variables, forming the learning outcomes factor. The following cases were excluded from the data analyses: (a) questionnaires and tests from users who remained logged in for less than forty-five minutes, (b) multiple visits to the museum from the same IP address and, subsequently, multiple completions of the questionnaire and tests (only the first visit was considered valid), (c) questionnaires without three complete sets of tests and vice versa, and (d) questionnaires without variation in their responses. As a result, the final sample size was 612 participants (out of the initial 875).
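The scoring and exclusion rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`ip`, `minutes`, `responses`, `tests`) and the function names are hypothetical and do not come from the study, and records are assumed to arrive in chronological order so that the first visit per IP address is seen first.

```python
# Hypothetical record layout; field names are illustrative, not the study's.
LIKERT = {
    "Strongly Agree": 5, "Agree": 4, "Neutral": 3,
    "Disagree": 2, "Strongly Disagree": 1,
}
MIN_MINUTES = 45  # minimum logged-in time stated in the text


def score_responses(responses):
    """Map Likert labels to the numerical values 5..1."""
    return [LIKERT[label] for label in responses]


def filter_cases(records):
    """Apply exclusion rules (a)-(d); records must be in chronological order."""
    seen_ips = set()
    kept = []
    for rec in records:
        repeat_visit = rec["ip"] in seen_ips
        seen_ips.add(rec["ip"])
        if repeat_visit:                      # (b) only the first visit counts
            continue
        if rec["minutes"] < MIN_MINUTES:      # (a) logged in for too short a time
            continue
        tests = rec["tests"]
        if (not rec["responses"] or len(tests) != 3
                or any(t is None for t in tests)):   # (c) incomplete data
            continue
        scores = score_responses(rec["responses"])
        if len(set(scores)) == 1:             # (d) no variation in responses
            continue
        kept.append({**rec, "scores": scores})
    return kept
```

One design choice here is an assumption: a later visit from an already-seen IP address is dropped even if the first visit was itself excluded, which is one reading of "only the first visit was considered valid".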

6 Results

The distribution of males and females was fairly even (49% and 51% respectively). Of the participants, 61% were aged thirty or younger, while only 10% were older than forty. Scores from each of the questionnaire’s items were summed to provide total scores on each construct (including the tests’ scores) (Table 1). As a normative guide to interpretation, a score below the 25th percentile (19) can be interpreted as a relatively low score in the tests, whereas a score above the 75th percentile (32) can be interpreted as a relatively high one.

Table 1 Factors’ mean scores

6.1 Exploratory and confirmatory factor analysis

Although all the questionnaire’s items came from already validated scales, an Exploratory Factor Analysis (EFA) and a Confirmatory Factor Analysis (CFA), were conducted in order to (re)establish its overall validity and reliability. The study’s 612 cases were considered very good (Comrey and Lee 2013) and even satisfied Everitt’s (1975) strict rule for at least 20:1 sample to variable ratio. The data were suited for factorial analysis because (a) the kurtosis and skewness indices were small and below the recommended level of |10| and |3| respectively, as suggested by Kline (2005), (b) the Kaiser-Meyer-Olkin Measure of Sampling Adequacy index was .912, (c) the Bartlett’s Test of Sphericity was significant (p < .001), and (d) the extraction communalities were above .5, as shown in Table 2 (Hair et al. 2006).

Table 2 Exploratory factor analysis

For assessing the underlying structure of the twenty-seven items (the twenty-four questionnaire’s items, plus the three tests), principal axis factor analysis (PAF) with oblique rotation was selected, because (a) PAF accounts for the covariation among variables (Kline 2005), and (b) oblique rotation produces more accurate results when the research involves human behavior (Costello and Osborne 2005). As hypothesized, seven factors were extracted (the questionnaire’s six factors, plus the learning outcomes) using (a) the scree test (Costello and Osborne 2005) (Fig. 3) and (b) Kaiser’s (1960) criterion (eigenvalue > 1). No variables were dropped, all items loaded high on their respective factors (> .6), each factor averaged above the .7 level, there were no significant cross-loadings between items, and there were no correlations between the factors greater than .7 (Hair et al. 2006) (Table 2). The seven factors explained 68.76% of the total variance, which was considered very good. The reliability of all constructs was assessed using Cronbach’s alpha and it was found to be between .84 and .97, well above DeVellis’s (2003) recommendations (> .70). Due to the above, the internal consistency of the constructs and of the overall scale was considered more than satisfactory.
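For readers unfamiliar with the reliability index used above, the following is a minimal sketch of how Cronbach's alpha is computed from item scores. The function name and the input layout (one list of scores per item) are illustrative assumptions, not the authors' procedure, which was presumably run in a statistics package.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one construct.

    items: list of k lists; each inner list holds one item's scores
    for the same respondents, in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent (sum across the k items).
    totals = [sum(person) for person in zip(*items)]
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))
```

With four-item constructs such as those in this questionnaire, `items` would hold four lists of 612 scores each; values above .70 would meet the threshold cited in the text.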

Fig. 3 Scree plot of the eigenvalues

CFA was performed by entering the factor structure that emerged during the EFA into AMOS 25. The convergent validity was considered satisfactory because (a) the standardized estimates were acceptable (ranging from .71 to .99), (b) the items explained at least half of the variance of the latent variable to which they belonged (ranging from .52 to .98), (c) the average variance extracted (AVE) for each factor was above the .50 level, and (d) the measurement items loaded with significant t-values on their constructs (Table 3) (Hair et al. 2010). For evaluating the questionnaire’s discriminant validity, the square root of the AVE for any given factor, as well as the correlations between this factor and all other factors, were calculated. The discriminant validity was considered satisfactory because, in all cases, the variance a construct shared with its measures was more than the variance this factor shared with any other factor (Fornell et al. 1982) (Table 4).
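The two validity checks described above can be illustrated with a short sketch. It assumes standardized loadings as input; the function names and the dictionary layout for the factor correlations are hypothetical and serve only to show the arithmetic, not to reproduce the AMOS output.

```python
from math import sqrt


def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings of a factor's items."""
    return sum(l * l for l in loadings) / len(loadings)


def discriminant_validity_ok(ave_by_factor, correlations):
    """Check that sqrt(AVE) of each factor exceeds its correlations with others.

    ave_by_factor: {factor_name: AVE}
    correlations:  {(factor_a, factor_b): correlation between the two factors}
    """
    return all(
        sqrt(ave_by_factor[a]) > abs(r) and sqrt(ave_by_factor[b]) > abs(r)
        for (a, b), r in correlations.items()
    )
```

For example, a factor whose four items all load at .80 has an AVE of about .64, which clears the .50 level, and a square root of .80, which would have to exceed every correlation that factor has with the other six.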

Table 3 Results for the measurement model
Table 4 Convergent and discriminant validity

In conclusion, the results of the EFA and CFA confirmed the questionnaire’s factorial structure and indicated that it had more than satisfactory validity and reliability.

6.2 Initial model testing

Since the study was based on perceptual processes that were measured at one point in time, Common Method Variance (CMV), the variance which is caused by the measurement method, was a concern (Podsakoff et al. 2003). For ruling out its presence, two tests were conducted: (a) Harman’s single-factor analysis (Podsakoff and Organ 1986), and (b) the comparison of the standardized regression weights before and after the addition of a common latent factor (CMV) (Gaskin 2013). It was found that CMV was not an issue because (a) in Harman’s single-factor analysis, the single factor explained less than 50% of the variance (36.51%), and (b) the standardized regression weights did not differ substantially (difference < .1).

All the fit indices of the initial model appeared to be within the recommended range of values, with the exception of χ2 (Table 5). On the other hand, χ2 usually indicates significant differences when the sample size exceeds 200 cases (Hair et al. 2006), as in this study (N = 612). Therefore, this outcome was considered acceptable.

Table 5 The initial fit indices of the research model

6.3 Structural equation modeling

For testing the initial research model (Fig. 1), SEM was performed using AMOS 25. Before conducting the analysis, it was checked whether the multivariate assumptions were met. It was found that (a) there were no influential cases and outliers, (b) abnormal Cook’s distance was not an issue (all cases < .025), (c) a curve estimation for all the relationships in the model revealed that, in some cases, linearity was slightly lower than the strongest relationship between variables but it was still significantly linear, and (d) multicollinearity was not an issue because the highest VIF value that was observed was 2.28, well below the recommended maximum of 3, while Tolerance was far greater than the recommended minimum of .1 (O’Brien 2007).
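As a brief worked note on the multicollinearity figures, the variance inflation factor of predictor j and its tolerance are both functions of R_j², the variance in that predictor explained by the remaining predictors. Applying the standard definitions to the highest VIF reported above (2.28) gives the values below; the rounding is ours.

```latex
\[
\mathrm{VIF}_j = \frac{1}{1 - R_j^2},
\qquad
\mathrm{Tolerance}_j = \frac{1}{\mathrm{VIF}_j} = 1 - R_j^2
\]
\[
\mathrm{VIF}_j = 2.28
\;\Rightarrow\;
\mathrm{Tolerance}_j = \frac{1}{2.28} \approx .44,
\qquad
R_j^2 \approx 1 - .44 = .56
\]
```

In other words, even for the most collinear predictor, roughly 44% of its variance was not shared with the other predictors, which is well above the .1 minimum.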

When conducting SEM, it is suggested to use control variables. Gender and age were used as such, making the assumption that they had an impact on all the other factors. The results of the SEM analysis are presented in Table 6.

Table 6 Direct effects in the proposed model

Since there were some non-statistically significant effects, and others were rather weak, the removal of these paths might result in a more robust yet simpler model. For examining this, a hierarchy of 2^24 = 16,777,216 models was formed by making all the direct effects optional. These models were analyzed using the Specification Search Facility available in AMOS 25. The final model was the one with the smallest BCC0 value (BCC0 = 6.65) (Burnham and Anderson 1998). The fit statistics for the final model remained satisfactory [χ2 (353, N = 612) = 810.27, p < .001, χ2/df = 2.30, SRMR = .035, TLI = .96, NFI = .94, RMSEA = .046, CFI = .96]. Indeed, in this model, some paths were removed, leading to the rejection of five out of the initial twenty hypotheses, as shown in Table 7 and Fig. 4. The final model also included two more paths, that of age to enjoyment and perceived usefulness, because age proved to have a significant impact on these factors.
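The size of the model hierarchy and the normed chi-square can be checked with a few lines of Python. This sketch only illustrates the arithmetic: the helper names are hypothetical, and the enumeration is a toy stand-in for what a specification search does, not a reimplementation of the AMOS facility.

```python
from itertools import product


def count_candidate_models(n_optional_paths):
    """Each optional path is either kept or dropped, giving 2**n models."""
    return 2 ** n_optional_paths


def enumerate_models(paths):
    """Yield every subset of the optional paths (feasible only for small n)."""
    for mask in product((False, True), repeat=len(paths)):
        yield [p for p, keep in zip(paths, mask) if keep]


def normed_chi_square(chi_square, df):
    """Chi-square divided by its degrees of freedom."""
    return chi_square / df


n_models = count_candidate_models(24)       # 24 optional paths, as in the text
ratio = normed_chi_square(810.27, 353)      # values reported for the final model
```

Here `n_models` equals 16,777,216 and `ratio` rounds to 2.30, matching the figures reported in the text.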

Table 7 Hypothesis testing results
Fig. 4 The final model (non-significant paths were omitted for clarity of presentation)

6.4 Mediation

Mediation models provide a more accurate picture of the effects the independent variables have on the dependent variable. Hayes’s (2013) bootstrapping technique was used and it was found that (a) although realism did not directly affect the learning outcomes, it had indirect effects through perceived usefulness, motivation, and enjoyment, (b) perceived ease of use had both a direct and an indirect impact on the learning outcomes (through enjoyment and motivation), (c) presence influenced the learning outcomes only indirectly through perceived usefulness, (d) enjoyment, besides its direct impact on the learning outcomes, had indirect ones through motivation and perceived usefulness, and (e) motivation had both direct and indirect (through perceived usefulness) impact on the learning outcomes (Table 8).

Table 8 Mediation results

6.5 Post-hoc statistical power

The method described by Soper (2016) was used for conducting the post-hoc power analysis for unsupported direct effects. For the six predictors of the dependent variable, an observed R2 of .53 for this factor, a probability level of .05, and a sample size of 612, the observed statistical power was 1.00. Thus, the proposed model demonstrates an excellent power to detect the significant effects, while the non-significant effects that were observed were indeed not significant.
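As a rough check on why the observed power is so high, the reported R² can be converted to Cohen's f² effect size. The noncentrality expression below assumes the conventional formulation for multiple regression with u predictors and v = N − u − 1 error degrees of freedom; whether the calculator used by the authors follows exactly this formulation is an assumption.

```latex
\[
f^2 = \frac{R^2}{1 - R^2} = \frac{.53}{.47} \approx 1.13
\]
\[
\lambda = f^2\,(u + v + 1) = f^2 \cdot N \approx 1.13 \times 612 \approx 690,
\qquad u = 6,\; v = 612 - 6 - 1 = 605
\]
```

An f² of this size is far above the value of .35 conventionally labeled a large effect, so with 612 cases a power that rounds to 1.00 is unsurprising.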

7 Discussion

The first thing worth noting is the validity and reliability of the questionnaire used in this study, which both the EFA and CFA confirmed. At the same time, the number of participants to whom it was administered satisfied the strictest rule of thumb regarding the sample size (Everitt 1975). Consequently, it can be argued that, despite being short in terms of the number of items, the questionnaire is a quite robust tool for measuring the factors included in the model.

Presence and perceived application’s realism had the lowest mean scores (M = 12.45 and M = 12.56 respectively). This finding was expected as it is almost certain that participants did not use any specialized equipment that could enhance their sense of presence when “visiting” the virtual museum. On the other hand, the mean scores of perceived usefulness and perceived ease of use were well above the mid-point (M = 15.82 and M = 15.03 respectively). It seems that the users regarded the application as easy to use, as well as a useful one. The role of these factors will be further elaborated in the paragraphs to follow. As for the tests’ scores, which acted as an indicator of the learning outcomes, these can be considered as satisfactory since (a) they were slightly above the mid-point (M = 25.59, SD = 7.72, max = 42) and (b) they were achieved in informal settings and not in an organized educational intervention. These results confirm previous research which recognized the educational value of 3D virtual museums and indicated that indeed users were able to acquire some knowledge (e.g., Li and Zhou 2016; Naya and Ibáñez 2015; Pescarin et al. 2012).

Coming to the final model, for its predictive power to be meaningful, it has to demonstrate high R2s as well as significant and substantial structural paths that are close to .20 (ideally above .30) (Chin 1988). Then again, statistically significant interaction terms below this threshold are also important (Chin et al. 2003). In the light of the above, out of the initial twenty hypotheses, fifteen were supported by the data. The learning outcomes (the dependent variable) were adequately explained (R2 = .53), with perceived usefulness, enjoyment, perceived ease of use, and motivation being its significant direct determinants (β = .30, β = .24, β = .21, and β = .13 respectively). Taking into account that the model relates informal learning with VEs, it is quite plausible that many factors not included in the model might have influenced the results (e.g., the participants’ personality and temperament). Therefore, the fact that just four factors managed to explain more than half of the variance of the learning outcomes is noteworthy. Thus, it can be concluded that the model possesses the power to adequately explain the learning outcomes when individuals use VEs in informal settings. Moreover, perceived usefulness was also adequately explained (R2 = .56) by motivation (β = .37), perceived realism (β = .18), enjoyment (β = .18), age (β = .14), presence (β = .12), and perceived ease of use (β = .11). The best explained factor was motivation (R2 = .63), which was significantly influenced by perceived ease of use (β = .51), enjoyment (β = .31), and realism (β = .16). In contrast, presence (R2 = .02) and enjoyment (R2 = .24) were not adequately explained. Perceived realism was the only factor affecting presence, and only weakly (β = .15), while perceived ease of use (β = .31), perceived realism (β = .27), and age (β = −.15) had an impact on enjoyment.

Associating the findings of previous research with the findings of the present study is quite a difficult task, given that the proposed model examined multiple factors, in contrast to other studies where only one or just a few factors were examined (e.g., Tüzün and Özdinç 2016; Merchant et al. 2012). Even so, the most influential factor in shaping the learning outcomes was perceived usefulness, followed by enjoyment and perceived ease of use. Motivation was also influential, but its impact was weaker. In addition, these factors acted as mediators of the effects of all the other constructs on the learning outcomes (see Table 8). The significant impact of perceived ease of use on the learning outcomes has been noted by other researchers (e.g., Lee et al. 2010) and this also holds true for enjoyment (Faiola et al. 2013; Wrzesien and Raya 2010). On the other hand, the direct effect of perceived usefulness on the learning outcomes is in contrast to the findings of other studies where no such effect was noted (e.g., Lee et al. 2010). Additionally, motivation was expected to have a strong direct impact on the learning outcomes, as suggested by other researchers (e.g., McLellan 2004). However, out of all the factors affecting the learning outcomes, it had the lowest impact. Then again, it had the strongest effect on perceived usefulness (β = .37). Plausible explanations for these inconsistencies between the findings of the present study and previous research are the use of different models, target group differences, and the learning settings.
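The mediation described above can be illustrated by chaining the standardized path coefficients reported in the preceding paragraphs. The sketch below is only an approximation under stated assumptions: it uses the product-of-coefficients rule, includes only the significant paths quoted in the text (rounded to two decimals), and uses hypothetical construct labels and function names. The values in Table 8 remain the authoritative ones and may differ because of rounding or the estimation procedure.

```python
# Illustrative sketch only: indirect effects as sums of products of the
# standardized path coefficients quoted in the text (rounded values).
# Construct labels and function names are hypothetical.

BETAS = {
    "ease_of_use": {"enjoyment": 0.31, "motivation": 0.51,
                    "usefulness": 0.11, "learning": 0.21},
    "realism": {"presence": 0.15, "enjoyment": 0.27,
                "motivation": 0.16, "usefulness": 0.18},
    "presence": {"usefulness": 0.12},
    "enjoyment": {"motivation": 0.31, "usefulness": 0.18, "learning": 0.24},
    "motivation": {"usefulness": 0.37, "learning": 0.13},
    "usefulness": {"learning": 0.30},
    "age": {"usefulness": 0.14, "enjoyment": -0.15},
}


def all_paths(graph, source, target):
    """Return (path, product of coefficients) for every directed path.

    Assumes the graph is acyclic, as is the case for the paths above.
    """
    found = []
    stack = [([source], 1.0)]
    while stack:
        path, weight = stack.pop()
        for nxt, beta in graph.get(path[-1], {}).items():
            new_path, new_weight = path + [nxt], weight * beta
            if nxt == target:
                found.append((new_path, new_weight))
            else:
                stack.append((new_path, new_weight))
    return found


def indirect_effect(graph, source, target):
    """Sum the products over paths with at least one mediator."""
    return sum(w for p, w in all_paths(graph, source, target) if len(p) > 2)


for path, weight in sorted(all_paths(BETAS, "realism", "learning"),
                           key=lambda item: -item[1]):
    print(" -> ".join(path), f"{weight:.3f}")
print("indirect:", round(indirect_effect(BETAS, "realism", "learning"), 3))
```

Every path listed for realism passes through at least one mediator, which mirrors the reported absence of a direct effect of perceived realism on the learning outcomes.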

As already mentioned in a previous section, it was considered interesting to check whether a VE can affect motivation, since intrinsic motivation is already quite strong in informal learning (Marsick and Watkins 1990). Indeed, it was found that perceived ease of use had a strong impact on motivation (β = .51), confirming the findings of previous research (e.g., Lee et al. 2010). Thus, it can be concluded that it is possible to enhance one’s motivation to learn when the VE is not considered difficult to use.

Enjoyment, although not adequately explained (R2 = .24), seems to play a substantial role (see Fig. 3), as noted by others (Faiola et al. 2013; Harrington 2012; Wrzesien and Raya 2010). Then again, other factors that shape enjoyment need to be identified. Besides being the least explained construct (R2 = .02), presence played a minimal role, having a rather weak effect only on perceived usefulness. While previous research has indicated that it has significant effects on the learning outcomes (e.g., Bulu 2012; Hassell et al. 2012; Lee et al. 2010), this is not supported by the findings of the present study. It is worth noting that, even if presence were removed, the final model would not be substantially affected. Thus, it is not an exaggeration to state that presence is not a factor in VEs when specialized equipment is not used, as North and North (2016) suggested. As for the perceived application’s realism, it proved to be an influential factor, having direct effects on all constructs except the learning outcomes. It did, however, affect the learning outcomes indirectly through perceived usefulness, enjoyment, and, to a lesser degree, motivation (see Table 8). While these findings confirm previous research noting the importance of realism (e.g., Dalgarno and Lee 2010; Harrington 2012; Lee et al. 2010), the absence of a direct impact on the learning outcomes is puzzling.

The model was found to be invariant across genders; males and females are expected to have similar learning outcomes when using VEs, as reported by others (Wrzesien and Raya 2010). Finally, age seems to play an interesting, though not particularly strong, role: it negatively affects enjoyment and positively affects perceived usefulness. This means that the older the participants were, the more they recognized the usefulness of the VE, but the less they enjoyed using it. Given that the sample had an unbalanced age distribution, this finding has to be viewed with some caution.

The study’s findings have implications for experts involved in the development of virtual museums. On the basis of the results, the impact of enjoyment needs to be enhanced even further. Thus, more attention should be paid to features that can increase the sense of enjoyment when using a VE. This can be achieved by adding game-like features, as these facilitate both enjoyment and motivation (Kozlov and Johansen 2010). Also, the sense of the VE’s realism can be further developed, since it was found to have a moderate effect on almost all of the model’s constructs. Then again, caution is advised, because increased realism means very detailed 3D models and equally detailed VEs. Such complex applications require powerful computers in order to run smoothly, and not all users have access to them. Finally, the design of the virtual museum per se (e.g., how the exhibits are placed, how the associated information is presented, lighting, and the inclusion of activities) requires attention, as these details can have an impact on the learning outcomes. That is because the design influences whether the users’ attention in a museum (virtual or otherwise) is focused and active or not (Screven 1993).

8 Conclusion

Within the theoretical framework laid by previous research, a model was developed for explaining the learning outcomes when adults use VEs in informal learning settings. While the results were thought-provoking, the study has limitations that require some attention. First, the sample, though its size was more than adequate for statistical analysis, consisted only of Greek adults. Therefore, the results cannot easily be generalized to other samples. Moreover, the participants’ socio-economic profile was not recorded in detail, since there was no focus on specific groups, and their age range was rather narrow. As a result, the impact of factors such as the educational level is unknown, while a more balanced age distribution might have resulted in different outcomes. Finally, the VE presented the work of an artist; totally different results might have been obtained if another theme/subject had been chosen (e.g., science concepts). Future validations will establish (or reject) the model’s applicability. The role of presence certainly requires an in-depth examination, since it was the least explained factor. Additional factors that might explain the learning outcomes in VEs more thoroughly can be considered for inclusion. Future studies can also examine whether the model is invariant across different participant profiles or whether it can be used in any type of VE, thus increasing its usefulness to the scientific community.

Nevertheless, experts involved in informal learning can use this model in order to devise suitable strategies and policies for content delivery. The model can also act as a tool for measuring the learning outcomes and/or the effectiveness of VEs in various learning settings. In conclusion, the study contributes to the growing body of research on the impact of VEs in learning and it is hoped that the model will be of use to researchers and educational experts.