1 Introduction

Virtual reality (VR) technology has been widely adopted in many fields (Macredie et al. 1996; McConville and Virk 2012; Gervasi and Ranon 2010; Riva 1998), including education and training, medical teaching, virtual experiments, and game development. With the rapid growth of the VR industry, research on virtual experience (VE) has proliferated. VE refers to user interaction with a 3D computer-simulated interactive environment that requires an engaging and emotional psychological state (Li et al. 2001).

Because users’ mood, loyalty, and brand attitude are strong predictors of purchase behavior or continued use (Lutz 1980; Beatty and Elizabeth Ferrell 1998), a growing number of publications and empirical studies have focused on the positive influence of virtual experience on these outcomes. Lee et al. (2012) proposed a framework showing that a 3D product interface tends to foster positive brand attitudes, and Lee (2012) validated the links from 3D interface and imagery vividness to brand attitudes. A number of studies have also demonstrated that virtual environments can enhance users’ mood and loyalty (Schuemie et al. 2001; Reid 2002). Jung (2011) noted that the experience of presence affects users’ continued use of social virtual worlds, and Plante et al. (2003) investigated how virtual reality technology may improve users’ mood during exercise.

To date, however, relatively little research has examined the mechanism by which virtual experience affects the strength of users’ mood and loyalty. Perugini and Bagozzi (2001) proposed that theoretical mechanisms can be better understood by introducing a new construct that mediates the influence of existing variables, a process that may be referred to as theory deepening. Thus, the purpose of this article is to introduce and test a flow model that explains how existing antecedents influence user mood and loyalty in 3D VR environments.

Our study differs from previous studies in three respects. First, whereas the literature on virtual experience has focused on identifying the predictors of the virtual environment, we examine how such predictors affect users in 3D virtual environments. Second, previous studies of telepresence have focused primarily on interactivity (Choi et al. 2007; Skadberg and Kimmel 2004; Novak et al. 2000) and have not considered the relationship between interactivity and vividness (Coyle and Thorson 2001); by contrast, we consider the causal relationship between the two. Third, whereas the previous literature on virtual experience has focused primarily on Western countries, our study examines users in an Asian context.

2 Theoretical background and hypotheses

2.1 Flow

Csikszentmihalyi (1975) proposed flow theory to describe the psychological state of total absorption that people experience when they are completely immersed in particular behaviors or activities. Flow is an enjoyable experience in which a participant feels a high level of behavioral control, happiness, and enjoyment (Csikszentmihalyi 1990). Csikszentmihalyi (1993) identified nine elements of flow, which were theorized to fall into three categories or stages: a balance between skills and challenges, focused attention that enables optimal control, and loss of self-consciousness accompanied by time distortion. Hoffman and Novak (1996) proposed a flow model for computer-mediated environments (CME), arguing that the content, control, and process characteristics of a mediated environment can all affect flow. Content characteristics include interactivity and vividness, control characteristics pertain to skills and challenges, and process characteristics are associated with goal-directed and experiential behavior.

Some previous studies have examined the relationship between flow elements and attitudes toward Web sites (Novak et al. 2000; Huang 2003; Skadberg and Kimmel 2004; O’Cass and Carlson 2010; Sánchez-Franco 2006; Stavropoulos et al. 2013). Others have focused on the influence of flow on attitudes toward online purchasing (Guo and Poole 2009; Korzaan 2003) or have validated the link between flow and learners’ attitudes toward e-learning (Choi et al. 2007) and online communication (Zaman et al. 2010; Huang et al. 2011). In recent years, many researchers have claimed that flow can adequately explain human–machine interaction in computer-mediated virtual environments (Animesh et al. 2011; Hoffman and Novak 2009). Hoffman and Novak (2009) suggested that flow in a virtual reality environment is qualitatively different from that in Web browser-based environments and that flow would be a typical aspect of the user experience in virtual environments. Figure 1 depicts the proposed research framework.

Fig. 1 Conceptual model

2.2 Relation to interactivity and vividness

Steuer (1992) suggested that interactivity depends on the level and type of content that a user encounters in his or her real-time adjustments to the virtual environment. According to McMillan and Jang-Sun (2002), studies have identified three primary ways of conceptualizing interactivity: the telepresence view, the process view, and the perceptual view. We adopted the telepresence view because it is better suited than the other perspectives to our aim of examining person–computer interactivity in virtual reality (Steuer 1992). Many studies have also adopted the telepresence view in analyzing virtual experience; for instance, Animesh et al. (2011) studied the relationship between person–computer interactivity and the purchase of virtual goods in online games, and Suntornpithug and Khamalah (2010) studied the relationship between person–computer interactivity and online purchasing.

Steuer (1992) defined vividness in terms of the breadth and depth of the message. Breadth is the number of senses engaged simultaneously: for example, video has greater breadth than audio. Depth refers to the quantity and quality of the information received: for example, high-definition movies exhibit greater depth than ordinary movies.

Ghun (1999) stated that interactivity requires a certain threshold of vividness. Moreover, Shih (1998) asserted that as the vividness of a medium increases, interaction accelerates, because vividness stimulates users’ sensory perceptions and thereby improves interactivity. We therefore derive the following hypothesis:

H1

Greater vividness corresponds to greater interactivity.

2.3 Relation to telepresence

Nowak and Biocca (2003) classified presence into three types: telepresence, copresence, and social presence. Telepresence has frequently been described as the sense of being in the virtual or mediated environment (Klein 2003; Kim and Biocca 1997; Schubert et al. 1996; Steuer 1992; Sheridan 1992; Minsky 1980; Nowak and Biocca 2003), and it has often been used to explain the attitudes or behavior of users in virtual environments (Jung 2011; Animesh et al. 2011). Copresence is often defined as a sense of psychological connection to and with another person (Nowak and Biocca 2003; Nowak 2001). Social presence refers to the perceived degree of interaction with others in the virtual environment (Nowak and Biocca 2003; Jung 2011). Telepresence is more relevant to our study than the other two types of presence because this article is primarily concerned with users’ experience of the virtual environment rather than with interactions between users.

Researchers agree that two key variables affecting telepresence are interactivity and vividness (Klein 2003; Coyle and Thorson 2001; Draper et al. 1998; Kim and Biocca 1997; Steuer 1992; Sheridan 1992; Rheingold 1991; Huang et al. 2011). Some researchers have examined the level of vividness; for instance, Welch et al. (1996) conducted experiments with driving simulators and concluded that more vivid virtual environments are associated with higher levels of telepresence. Subsequent work indicated that the media richness of a medium (that is, its vividness) affects the level of telepresence (Li et al. 2001; Fortin and Dholakia 2005). Other researchers have approached telepresence from the interactivity perspective; for instance, Shih (1998) proposed that a user’s perception of telepresence depends on the user’s interaction with the environment and the feedback derived from it. Hoffman and Novak (1996) asserted that interactivity is closely related to telepresence and that a user enters a flow state through constant feedback and response. Amant (2002) and Animesh et al. (2011) noted that a person’s online presence is created by the ability to respond quickly. Furthermore, an increasing number of studies have indicated that the interactivity of games and virtual worlds can enhance the sense of telepresence (von der Pütten et al. 2012; Shahid et al. 2012; Chanel et al. 2012; Nelson et al. 2006; Siriaraya and Siang Ang 2012; Haans and IJsselsteijn 2012). Hence, we propose the following hypotheses:

H2

Greater vividness corresponds to greater telepresence.

H3

Greater interactivity corresponds to greater telepresence.

Rich media tools, such as video, audio, and animation, can increase vividness by enhancing the richness of the experience, and animation, when used effectively, can attract attention (Zeff and Aronson 1999; Rothschild 1987). Hence, we propose the following hypothesis:

H4

Greater vividness corresponds to greater focused attention.

Csikszentmihalyi and Csikszentmihalyi (1988) proposed that a user’s perception of skills and challenges is relative to the other activities in which the user engages rather than absolute. Novak et al. (2000) treated skill and control as a combined construct in their examination. As noted by Li et al. (2001), interactive media can improve users’ perceptions of their control skills, and Kettanurak et al. (2001) also suggested that interactivity positively influences users’ control skill and performance. We therefore propose that greater perceived interactivity is associated with greater perceived skill:

H5

Greater interactivity corresponds to greater skills.

The flow model proposed by Novak et al. (2000) asserted that higher rates of interaction increase the challenge that users perceive. Furthermore, Friedl (2002) suggested that high interactivity in game design can increase the perceived challenge even for the most experienced game players. Thus, we propose the following hypothesis:

H6

Greater interactivity corresponds to greater challenge.

Novak et al. (2000) found that mediated environments with higher interaction rates are associated with higher levels of user concentration. Huang et al. (2011) suggested that rapid interactivity, resulting from seamless sequences of action, increases users’ concentration on the current activity. Thus, we propose the following hypothesis:

H7

Greater interactivity corresponds to greater focused attention.

Novak et al. (2000) found that a higher level of importance (involvement) yields higher levels of user skill: if an activity becomes more important to users, they will spend more time in the game and develop stronger skills. Novak et al. (2000) also showed that a higher level of importance (involvement) creates greater perceived challenge. Hence, we propose the following hypotheses:

H8

Greater involvement corresponds to greater skill.

H9

Greater involvement corresponds to greater challenge.

Webster et al. (1993) found that intrinsic interest and focused attention are positively correlated, and Celsi and Olson (1988) showed that enduring involvement affects the level of focused attention. The flow framework of Hoffman and Novak (1996) also proposed that higher levels of importance (involvement) increase the likelihood of focused attention. Faber and Lee (2007) likewise agreed that involvement increases the demand for primary attention, and Koufaris (2002) and Huang et al. (2011) noted that user involvement is positively related to focused attention. Hence, we propose that higher levels of involvement increase users’ focused attention:

H10

Greater involvement corresponds to greater focused attention.

Nelson et al. (2006) claimed that there is a positive relationship between involvement and telepresence. Held and Durlach (1992) concluded that user familiarity intensifies telepresence. Experts and novices differ in their capacities and requirements, and this difference leads to different levels of telepresence (Dix et al. 1993); the expert/novice distinction could therefore serve as an important indicator of involvement. Witmer and Singer (1998) suggested that higher levels of user involvement increase the likelihood of experiencing telepresence, which leads to the following hypothesis:

H11

Greater involvement corresponds to greater telepresence.

Novak et al. (2000) and Koufaris (2002) showed that a more challenging mediated environment is more likely to draw a user’s focused attention. Thus, we propose the following hypothesis:

H12

Greater challenge corresponds to greater focused attention.

As a user interacts with a mediated environment, the perception that the entire environment is under his or her control leads the user into a flow state (Animesh et al. 2011; Choi et al. 2007; Skadberg and Kimmel 2004; Huang 2003; Webster et al. 1993; Trevino and Webster 1992). These findings lead to the following hypothesis:

H13

Greater interactivity corresponds to greater flow.

Hoffman and Novak (1996) suggested that both skill and challenge affect flow. When skill falls short of the challenge, the user is likely to feel anxious, whereas when the challenge falls below the user’s skill, the user is likely to feel bored. Hence, the user can enter flow when skill and challenge are in balance, and only when both skill and challenge are high is the user motivated to acquire new skills. Subsequent research has also indicated that a user’s skill and challenge must be perceived as congruent for a flow state to form (Ellis et al. 1994; Guo and Poole 2009; Skadberg and Kimmel 2004), leading to the following hypotheses:
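The skill–challenge logic above can be illustrated with a small classifier. This is a hypothetical sketch, not part of Hoffman and Novak’s (1996) model or of this study’s analysis: the function name, the 7-point score range, the high/low cut-off of 4, and the balance tolerance of 1 point are our own illustrative assumptions.

```python
def flow_channel_state(skill: float, challenge: float,
                       midpoint: float = 4.0, tolerance: float = 1.0) -> str:
    """Classify a perceived (skill, challenge) pair on a hypothetical 7-point scale.

    midpoint:  assumed cut-off separating "low" from "high" levels.
    tolerance: assumed maximum gap at which skill and challenge still count
               as balanced. Neither value comes from the study.
    """
    gap = challenge - skill
    if gap > tolerance:
        return "anxiety"        # challenge clearly exceeds skill
    if gap < -tolerance:
        return "boredom"        # skill clearly exceeds challenge
    if skill >= midpoint and challenge >= midpoint:
        return "flow"           # balanced, and both are high
    return "balanced_low"       # balanced, but both are low (not flow)


if __name__ == "__main__":
    for s, c in [(6, 6), (3, 6.5), (6.5, 3), (2, 2)]:
        print(s, c, flow_channel_state(s, c))
```

Changing `midpoint` or `tolerance` only moves the boundaries; the theory itself fixes the ordering (challenge above skill leads to anxiety, skill above challenge to boredom, and high, balanced levels to flow).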

H14

Greater skill corresponds to greater flow.

H15

Greater challenge corresponds to greater flow.

Hoffman and Novak (1996) proposed that two primary antecedents are necessary for the flow state to be experienced: (1) skill and challenge and (2) focused attention. Focused attention, especially on a limited stimulus field, together with the filtering out of unrelated information, leads to a flow state. Huang (2006) noted that consumers who do not focus on their current activities become either bored or anxious and are likely to abandon the task. Thus, we propose the following hypothesis:

H16

Greater focused attention corresponds to greater flow.

When information from the physical environment is technically manipulated or suppressed and the medium shifts focused attention from the physical environment toward the virtual environment, a user is likely to experience telepresence (Kim and Biocca 1997; Gerrig 1993). Witmer and Singer (1998) concluded that a user who focuses on stimuli in a virtual environment and becomes absorbed by it is more inclined to perceive a high level of telepresence. Draper et al. (1998) asserted that focused attention plays an important role in determining telepresence (Schubert et al. 2001; Sheridan 1992), and Mollen and Wilson (2010) also claimed that telepresence is augmented by focused attention. Based on these findings, we propose the following hypothesis:

H17

Greater focused attention corresponds to greater telepresence.

The flow framework proposed by Hoffman and Novak (1996) treats telepresence as an antecedent of flow, but not a necessary one. Recent studies of online learning found a significant correlation between telepresence and flow (Guo et al. 2012; Faiola et al. 2012). Additionally, several studies have suggested that telepresence is an antecedent of the flow state (Nijs et al. 2012; Animesh et al. 2011; Zaman et al. 2010), which supports the following hypothesis:

H18

Greater telepresence corresponds to greater flow.

Webster et al. (1993) asserted that flow is essentially a form of human–machine interactive experience and that through the process of interaction, an individual can perceive joy and involvement and thus derive emotions and satisfaction. Welch (1999) found that a strong flow is conducive to creating an emotional state in users. Furthermore, four studies (Hoffman and Novak 1996; Hsu and Lu 2004; Sánchez-Franco 2006; Choi et al. 2007) asserted that flow positively affects the attitudes of users, thus leading to the following hypothesis:

H19

Greater flow corresponds to greater positive affect.

Hoffman and Novak (1996) suggested that flow determines a user’s positive attitude and desire for future use. Moreover, findings by Chou and Ting (2003), Korzaan (2003), and O’Cass and Carlson (2010) showed that flow is positively related to loyalty. Therefore, we propose the following hypothesis:

H20

Greater flow corresponds to greater loyalty (to a 3D VR game).

3 Methodology

3.1 Stimuli

Although conventional 3D VR motion theaters can accommodate hundreds of people and offer a high level of immersion, they provide only a passive experience: the server and the screen run a fixed scenario, so the user simply watches the screen and feels the motion without being able to change its trajectory. By contrast, interactive simulators allow users to manipulate the setting and interact with the simulator. The most technically sophisticated 3D VR motion simulators are equipped with a six-axis motion platform, and this interactive simulation technology remains rare worldwide. Representative applications include aircraft piloting and race car games by Maxflight USA and Injoy Motion Corp Taiwan. The 3D VR simulators developed by Injoy Motion Corp support online competition among multiple players, a feature that meets the younger consumer market’s desire for challenge, stimulation, and joy. The appeal of this feature has made 3D VR games mainstream, which is why they are the focus of this study.

In this study, we used a six-axis simulator to create a 3D VR interactive environment and measured participants’ virtual experience. The main characteristics of a six-axis simulator are a larger range of motion and a larger angle of rotation, which improve the experience of the virtual environment (Fig. 2). Participants sat in the simulator booth during game play, which blocked external interference and maximized their concentration.

Fig. 2 Configuration of the six-axis simulator

This research used the game Panzer Elite Action, produced by Injoy Motion Corp. The game is set in World War II: participants drive their tank across a battlefield, destroying the rival tanks and troops that block their path to the end point (Fig. 3). At the beginning of the game, participants were informed that they had to shoot and kill the enemy to pass through each barrier, and each session took approximately 10 min to complete. The game features many powerful and renowned rival tanks, including the Soviet T34-76 and the Allied Sherman. Participants use a joystick to control the tank’s movement and direction and to operate its artillery, and they can press a special button to call in air support to bomb the enemy. Because of the 3D VR interactive design of the six-axis simulator, participants felt the booth vibrate when their tank was attacked or drove over bumpy roads.

Fig. 3 Tank moving through the battlefield

3.2 Measurement

In this study, all items were measured on a seven-point Likert scale, which was chosen to reduce the burden on respondents and the response error that could otherwise result from the large number of items. To enhance face validity, a group of expert judges qualitatively reviewed an initial pool of items.

We measured skill, challenge, positive affect, and flow using the scales of Novak et al. (2000); the flow instrument included a paragraph explaining the flow state. Interactivity was measured with the three-item scale of Novak et al. (2000), which is grounded in the work of Steuer (1992). The three-item vividness scale was developed by Witmer and Singer (1998). Involvement was measured with the three-item scale of Laurent and Kapferer (1985). The four-item focused attention scale was developed by Ghani and Deshpande (1994). Telepresence was measured with the six-item scale of Novak et al. (2000), which is grounded in the work of Kim and Biocca (1997). Loyalty was measured with the three-item scale used by Franzen (1999) and Zeithaml et al. (1996).

3.3 Data collection

The survey was conducted on three consecutive days during the Taiwan International Amusement Machine Exposition Show. Participants were recruited at the exhibition and received a gift for their participation. Each participant first read a printed page of instructions on how to play the assigned 3D VR game and practiced moving and using the weapons. After practicing, participants played the game for 10 min and then completed a questionnaire. Over the three days, 485 questionnaires were distributed. After incomplete and aberrant responses were eliminated, 368 valid, fully completed questionnaires (75.9 % of those distributed) were retained for analysis. Table 1 presents the demographic information of the final sample.

Table 1 Sample Demographics

4 Findings

4.1 Measurement model evaluation

The measurement and structural models were estimated with maximum likelihood using the SAS 9.3 statistical software package. The measurement model was evaluated in three stages to select and assess the final items used to test the hypotheses. (1) To check reliability, we computed Cronbach’s α for each construct, following Nunnally and Bernstein (1994); a value of 0.7 or higher indicates acceptable reliability. We also computed an item-to-total correlation for each item and, for parsimony, deleted any item with an item-to-total correlation below 0.5. This procedure yielded 34 items measuring ten constructs (Table 2).
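The reliability screen in stage (1) can be sketched as follows. The sketch assumes each item is a list of numeric responses from the same respondents and uses the corrected item-to-total correlation (each item against the sum of the remaining items); the text does not say whether the corrected or uncorrected form was used, so that choice and the function names are assumptions.

```python
from statistics import mean, variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of item columns (needs >= 2 items)."""
    k = len(items)
    totals = [sum(row) for row in zip(*items)]          # scale score per respondent
    item_var = sum(variance(col) for col in items)      # sum of item variances
    return k / (k - 1) * (1 - item_var / variance(totals))


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(items):
    """Correlate each item with the sum of the *other* items."""
    rows = list(zip(*items))
    return [pearson(col, [sum(row) - row[i] for row in rows])
            for i, col in enumerate(items)]


def screen_items(items, min_item_total=0.5):
    """Drop items below the item-to-total cutoff; return kept indices and alpha.

    Alpha is undefined if fewer than two items survive the screen.
    """
    r = corrected_item_total(items)
    kept = [i for i, ri in enumerate(r) if ri >= min_item_total]
    return kept, cronbach_alpha([items[i] for i in kept])
```

For example, with two identical items `[1, 2, 3, 4, 5]` and a third item `[3, 1, 4, 1, 5]` that tracks them only weakly, `screen_items` drops the third item (its corrected item-to-total correlation is about 0.35) and returns α = 1.0 for the remaining pair.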

Table 2 Reliability checks of constructs and items

(2) An exploratory factor analysis (EFA) was used to check the unidimensionality of all constructs in our model and to ensure that the item loadings are consistent with the latent constructs used in past research. The EFA used principal component extraction with equamax rotation and a factor-loading threshold of 0.45, as recommended by Nunnally and Bernstein (1994). Table 3 shows that all factor loadings are well above this threshold.

Table 3 Factor analysis results

(3) A confirmatory factor analysis (CFA) was then conducted; the CFA measurement model specifies the pattern by which each observed measure loads on a particular factor. Following Garver and Mentzer (1999), we selected overall fit indices that (a) are not influenced by sample size, (b) assess various models accurately and consistently, and (c) are easy to interpret, namely the Relative Noncentrality Index (RNI), Tucker-Lewis Index (TLI), Comparative Fit Index (CFI), and root mean square error of approximation (RMSEA) (Marsh et al. 1988; Marsh et al. 1996). The final measurement model yields a CFI of 0.93, an RNI of 0.92, and a TLI of 0.92; values exceeding 0.9 indicate acceptable fit (Gerbing and Anderson 1992). The RMSEA of 0.059 (90 % confidence interval: 0.054–0.063) reflects a reasonable fit (Browne and Cudeck 1993), and the ratio of chi-squared to degrees of freedom, 2.32, falls within the acceptable range of 2 to 5 (Marsh and Hocevar 1985). Taken together, these indices indicate that the model offers an acceptable fit.
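The four indices are computed from the hypothesized model’s chi-squared and that of a baseline (independence) model. The sketch below uses the standard noncentrality formulas; the baseline chi-squared is not reported here, so any inputs are hypothetical, and RMSEA uses N − 1 in its denominator (some programs use N). The CFI is the RNI bounded to the interval [0, 1].

```python
from math import sqrt


def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """Noncentrality-based fit indices.

    chi2_m, df_m: chi-squared and degrees of freedom of the hypothesized model.
    chi2_b, df_b: the same for the baseline (independence) model.
    n: sample size (RMSEA here uses n - 1).
    """
    d_m = chi2_m - df_m            # estimated noncentrality of the model
    d_b = chi2_b - df_b            # estimated noncentrality of the baseline
    rni = 1 - d_m / d_b
    cfi = 1 - max(d_m, 0) / max(d_m, d_b, 0)    # RNI bounded to [0, 1]
    tli = (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1)
    rmsea = sqrt(max(d_m, 0) / (df_m * (n - 1)))
    return {"RNI": rni, "CFI": cfi, "TLI": tli,
            "RMSEA": rmsea, "chi2/df": chi2_m / df_m}
```

For instance, a hypothetical model with χ² = 100 on 50 df, a baseline χ² = 1000 on 60 df, and N = 368 gives RNI = CFI ≈ 0.947, TLI ≈ 0.936, and RMSEA ≈ 0.052.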

In addition, we assessed measurement validity within the measurement model. Convergent validity was examined using both the average variance extracted (AVE) and the composite reliability of the indicators associated with each construct. Table 4 shows that AVE values ranged from 0.48 to 0.66; all exceed the threshold of 0.5 (Bagozzi and Yi 1988; Fornell and Larcker 1981) except that of the skill construct, which at 0.48 falls just below it. The composite reliabilities ranged from 0.65 to 0.91, all above the threshold of 0.6 (Fornell and Larcker 1981), thus supporting convergent validity.
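Both statistics are computed from standardized loadings, taking each indicator’s error variance as 1 − λ² (Fornell and Larcker 1981). The loadings used in the example below are hypothetical; they show how a construct can clear the 0.6 composite-reliability threshold while its AVE falls just under 0.5, the pattern reported for skill. The function names are illustrative.

```python
def average_variance_extracted(loadings):
    """Mean squared standardized loading."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    s = sum(loadings)
    error = sum(1 - l * l for l in loadings)   # error variance of each standardized indicator
    return s * s / (s * s + error)


def convergent_validity(loadings, ave_min=0.5, cr_min=0.6):
    """Compare AVE and CR with the thresholds cited in the text."""
    ave = average_variance_extracted(loadings)
    cr = composite_reliability(loadings)
    return {"AVE": ave, "CR": cr, "AVE_ok": ave >= ave_min, "CR_ok": cr >= cr_min}
```

Three hypothetical loadings of 0.7 give AVE = 0.49 (below 0.5) and CR ≈ 0.74 (above 0.6).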

Table 4 Reliability evaluation of measurement model

Discriminant validity was evaluated using two methods (Anderson and Gerbing 1988). First, we applied a chi-squared difference test to the pair of constructs with the highest correlation, constraining their covariance to 1; the resulting chi-squared difference of 66.6 exceeds the critical value of 3.84 for one degree of freedom at the 0.05 level, indicating acceptable discriminant validity (Bagozzi and Phillips 1982). Second, we examined the confidence intervals of the pairwise correlations among the latent constructs; if a confidence interval includes 1.0, discriminant validity is not supported (Anderson and Gerbing 1988). The intervals fall between 0.67 and 0.82 and thus exclude 1.0, indicating adequate discriminant validity.
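Both checks reduce to short calculations. Constraining one covariance adds one degree of freedom, and the p-value of a chi-squared statistic on one degree of freedom equals erfc(√(Δχ²/2)). For the interval check, the sketch follows Anderson and Gerbing’s (1988) rule of ± two standard errors around the correlation estimate. The function names are hypothetical, and the correlation and standard error in the example are invented because they are not reported in the text.

```python
from math import erfc, sqrt


def chi2_diff_p_1df(delta_chi2):
    """p-value of a chi-squared difference on one degree of freedom."""
    return erfc(sqrt(delta_chi2 / 2))


def ci_excludes_one(r, se, k=2.0):
    """Check whether r +/- k*se excludes 1.0 (k = 2 per Anderson and Gerbing 1988)."""
    lower, upper = r - k * se, r + k * se
    return not (lower <= 1.0 <= upper), (lower, upper)
```

With a hypothetical correlation of 0.745 and standard error of 0.04, the interval is about (0.665, 0.825), which excludes 1.0; a Δχ² of 66.6 on one degree of freedom gives a p-value far below 0.001.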

4.2 Evaluation of the structural model

Figure 4 shows that in the structural model, only involvement and vividness are exogenous constructs; interactivity, skill, challenge, focused attention, telepresence, flow, loyalty, and positive affect are endogenous. The final model comprises ten constructs represented by 27 items. As with the measurement model, the variance–covariance matrix was used as input. Regarding model fit, Table 5 shows that CFI, TLI, and RNI are all above 0.90, indicating good fit, and that the RMSEA of 0.068 (90 % confidence interval: 0.062–0.074) indicates a reasonable fit (Browne and Cudeck 1993); the ratio of chi-squared to degrees of freedom, 2.69, also falls within the acceptable range (Marsh and Hocevar 1985). These indicators show that the fit between the data and the model is acceptable. The Adjusted Goodness of Fit Index (AGFI) of the structural model is 0.81; according to Segars and Grover (1993), AGFI values greater than 0.8 indicate an acceptable fit. The model’s paths were tested with one-tailed t tests at the 0.05 significance level. Finally, the cases-to-parameter ratio is 4.91.
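A one-tailed test of a path asks whether the estimate divided by its standard error exceeds the critical value, about 1.645 at the 0.05 level under the large-sample normal approximation assumed in this sketch. The function names and any estimates or standard errors passed to them are hypothetical. The cases-to-parameter ratio is the sample size divided by the number of free parameters; with 368 cases, the reported ratio of 4.91 implies roughly 75 free parameters.

```python
from math import erfc, sqrt


def one_tailed_p(estimate, se):
    """Upper-tail p-value for H1: path coefficient > 0 (normal approximation)."""
    z = estimate / se
    return 0.5 * erfc(z / sqrt(2))


def path_supported(estimate, se, alpha=0.05):
    """True if the path is positive and significant at the one-tailed alpha level."""
    return one_tailed_p(estimate, se) < alpha


def cases_to_parameter_ratio(n_cases, n_free_parameters):
    """Sample size per freely estimated parameter."""
    return n_cases / n_free_parameters
```

For example, a hypothetical path estimate of 0.30 with a standard error of 0.10 gives z = 3.0 and a one-tailed p of about 0.0013.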

Fig. 4 Standardized path coefficients of the structural model

Table 5 Evaluation of structural model

The final modeling results in Table 5 indicate that 16 of the 20 hypotheses are supported: the path from vividness to interactivity (H1); the paths from vividness (H2), involvement (H11), and focused attention (H17) to telepresence; the paths from interactivity (H13), skill (H14), focused attention (H16), and telepresence (H18) to flow; the paths from interactivity (H5) and involvement (H8) to skill; the paths from interactivity (H6) and involvement (H9) to challenge; the paths from interactivity (H7) and challenge (H12) to focused attention; and the paths from flow to loyalty (H20) and to positive affect (H19). All sixteen of these paths are positive and significant. However, four paths did not reach significance: the path from interactivity to telepresence (H3), the paths from vividness (H4) and involvement (H10) to focused attention, and the path from challenge to flow (H15).

4.3 Competing models

Researchers generally agree that studies should compare rival models rather than merely test the performance of a proposed model (Bagozzi and Yi 1988). Because our intention is to understand the direct and indirect effects within the complex construct of virtual experience, it is important to show that alternative paths are not significant. One possibility is that the antecedents of flow directly influence positive affect or loyalty. This issue matters because we model flow as a mediator of the effects of these antecedents on positive affect and loyalty.

Drawing on previous research, we added three direct links. First, because Jung (2011) considered telepresence to be a critical factor in users’ motivation for continued use of a product, we hypothesized a direct effect of telepresence on loyalty. Second, Park (1996) proposed that involvement has a direct effect on loyalty toward a product, so we added a direct link between involvement and loyalty. Third, Sukoco and Wu (2011) noted that interactivity significantly enhances a consumer’s affective responses, so we hypothesized a direct effect of interactivity on positive affect.

The chi-squared difference between the hypothesized model and the competing model was not significant (chi-squared difference = 6.2479, df = 3, p > 0.05); according to Hair et al. (2010), this result indicates that the rival model is not better than the hypothesized model. In addition, the CFI of the rival model was the same as that of the hypothesized model, and its RMSEA was slightly higher (0.0681 versus 0.0680). The rival model is also less parsimonious than the hypothesized model. Therefore, telepresence influences loyalty only through flow, and the effects of involvement on loyalty and of interactivity on positive affect are indirect. On the basis of these findings, we believe that fitting a rival model has strengthened the support for the significance and robustness of our hypothesized model.
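The nested-model comparison can be checked without statistical tables. The sketch below computes the upper-tail probability of a chi-squared distribution with an integer number of degrees of freedom using the closed-form series for even and odd df; the function names are hypothetical, and in practice a library routine such as SciPy’s `scipy.stats.chi2.sf` would normally be used instead.

```python
from math import erfc, exp, factorial, pi, sqrt


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-squared with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        h = x / 2
        return exp(-h) * sum(h ** i / factorial(i) for i in range(df // 2))
    # Odd df: erfc(sqrt(x/2)) plus a finite series (empty when df == 1).
    total = erfc(sqrt(x / 2))
    lead = sqrt(2 * x / pi) * exp(-x / 2)
    odd_product = 1.0
    for i in range(1, (df - 1) // 2 + 1):
        odd_product *= 2 * i - 1            # 1, 1*3, 1*3*5, ...
        total += lead * x ** (i - 1) / odd_product
    return total


def nested_model_test(delta_chi2, delta_df, alpha=0.05):
    """Return the p-value and whether the less constrained model fits significantly better."""
    p = chi2_sf(delta_chi2, delta_df)
    return p, p < alpha
```

Applied to the reported difference (6.2479 on 3 df), it gives p ≈ 0.10, above 0.05, consistent with the conclusion that the rival model does not fit significantly better.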

5 Discussion

The overall flow formation model can be divided into four stages, each contingent on the successful completion of the previous one. Stage one concerns the content/form characteristics of the mediated environment and users’ involvement; stage two concerns the preconditions and perceptions that allow a user to enter a flow state; stage three concerns the user’s inner experience of being in a flow state; and stage four concerns the consequences of entering a flow state.

Firstly, the study considers internal relationships in stage one, that is, the relationships among content variables in the mediated environment. Interactivity and vividness significantly affect virtual experience in a mediated environment. Most scholars treat interactivity and vividness as independent variables without examining how they affect each other. Building on previous research, this study sought to establish that interactivity within a mediated environment is affected by vividness: a mediated environment’s interactivity relies on the presence of vividness as a threshold. Vividness and interactivity therefore cannot be treated as independent variables. This conclusion differs from the model proposed by Novak et al. (2000).

Secondly, the study considers the relationship between stage one and stage two, that is, how the mediated environment’s content/form variables and user involvement relate to the premise and perception of entering a flow state. The findings revealed that both interactivity and involvement affect the mediated control characteristic variables (skill and challenge) and that interactivity also affects focused attention, whereas involvement does not significantly affect focused attention. This conclusion does not conform to those of Novak et al. (2000). The divergence may be explained by users’ need to pay attention when driving an interactive motion simulator to avoid losing control in a high-speed environment, so that both high- and low-involvement participants achieved a high level of focused attention. In addition, Novak et al. (2000) primarily examined Web-browsing behavior, which differs from the high-speed interactive environment of a motion simulator; in that setting, varying levels of involvement could affect the level of focused attention. The study also revealed that vividness does not affect focused attention, primarily because vividness, or richness of content, does not necessarily draw a user’s focused attention in a mediated environment. Hoffman and Novak (1996) stated that vividness would affect focused attention because their study examined Web-browsing behavior, in which vivid, content-rich product information online draws the user’s browsing attention. Moreover, this study showed that challenge positively and significantly affects focused attention, a conclusion that conforms to Novak et al. (2000). Furthermore, a noteworthy finding is that although vividness does not affect focused attention, it positively and significantly affects telepresence.

Thirdly, the study considers the direct relationship between stage one and stage three, that is, how the mediated environment’s content characteristics and involvement relate to the formation of a flow state. The study showed that interactivity positively and significantly affected flow, a conclusion that coincides with Hoffman and Novak (1996) and Novak et al. (2000). Nevertheless, although vividness positively and significantly affected telepresence, interactivity did not directly affect telepresence. This finding conforms to Novak et al. (2000) while only partially echoing Steuer (1992) and Sheridan (1992), who regard interactivity and vividness as the two key elements that affect telepresence. The study also suggests that interactivity affects telepresence indirectly, through challenge and focused attention. In addition, the level of involvement directly affects telepresence, a finding that differs from Novak et al. (2000), who indicated that in Web-browsing behavior a user’s level of involvement affects focused attention and, through focused attention, telepresence.
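Under standard path-tracing rules, an indirect effect is the product of the standardized coefficients along a route, summed over all routes. Assuming, for illustration, that interactivity (INT) reaches telepresence (TP) only via challenge (CH) and focused attention (FA), as the paragraph describes, its indirect effect decomposes as follows; the β symbols are generic path coefficients, not estimates reported in this study.

```latex
\begin{aligned}
\mathrm{IE}_{\mathrm{INT}\to\mathrm{TP}}
  &= \beta_{\mathrm{INT}\to\mathrm{CH}}\,\beta_{\mathrm{CH}\to\mathrm{FA}}\,\beta_{\mathrm{FA}\to\mathrm{TP}}
   + \beta_{\mathrm{INT}\to\mathrm{FA}}\,\beta_{\mathrm{FA}\to\mathrm{TP}} \\
  &= \beta_{\mathrm{FA}\to\mathrm{TP}}\left(\beta_{\mathrm{INT}\to\mathrm{CH}}\,\beta_{\mathrm{CH}\to\mathrm{FA}} + \beta_{\mathrm{INT}\to\mathrm{FA}}\right)
\end{aligned}
```

Because the direct path from interactivity to telepresence was not significant, the total effect of interactivity on telepresence reduces, under this assumption, to this indirect term.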

Fourthly, the study considers the relationship between stage two and stage three, that is, how the premise and perception of entering a flow state relate to the formation of a flow state. The study revealed that skill and focused attention affect the state of flow, whereas challenge does not significantly affect flow. This finding does not coincide with Novak et al. (2000). When users engage with an interactive motion simulator, challenge leads to a flow state through the user’s focused attention, whereas in Web-browsing behavior, a high level of challenge provided by the Internet medium would suffice to create a flow state. As to the formation of telepresence, focused attention did affect telepresence, a finding that coincides with Novak et al. (2000).

Fifthly, the study considers internal relationships in stage three. The study revealed that telepresence positively and significantly affects flow, a finding that echoes Hoffman and Novak (1996). Telepresence thus serves as an antecedent of the flow state but is not a necessary condition for it; the other requisites are skill, challenge, focused attention, interactivity, and involvement.

Lastly, the study considers the relationship between stage three and stage four, that is, between flow formation and its consequences. The study showed that flow significantly affected a user’s positive affect and inclination to repeat the consumption behavior in the future (loyalty). This finding not only echoes Hoffman and Novak (1996) but also indicates that flow is a crucial construct in measuring virtual experience. Figure 5 depicts the four-stage flow formation model.

Fig. 5 Four-stage flow formation model

6 Managerial implications

The study findings and discussion yield the following crucial managerial implications.

First, interactivity and vividness remain the crucial factors that affect telepresence and flow, and vividness also affects interactivity. Bone and Ellen (1992) described vividness as a continuous concept, ranging from poor presentation to a rich, entertaining, and near-realistic presentation of content. The vividness of a motion simulator rests on how captivating the VR game’s content is, the quality of the images and sound effects, and the level of visual cognition, which requires the screen to be as large as possible. This factor concerns the screen’s field of view, with the field of view for a single eye ideally set to cover 70 degrees. Many virtual reality games set the field of view at 100 degrees or larger to give the user an all-encompassing, immersive effect (Stanney et al. 1998). In design, though, it is noteworthy that a larger screen will inadvertently reduce image resolution (Biocca 1992), so the ideal screen configuration may not be practical to implement. Hoffman and Novak (1996) proposed that two types of interactivity exist in a mediated environment: interaction between the human and the computer interface, described as machine interactivity, and interaction between one player and others through a computer medium, described as person interactivity. The study’s focal product is an online interactive motion simulator that covers not only the machine–user interface but also an online interpersonal interface; the interactive elements behind the interface include response time and continuity (Alba et al. 1997). In an interactive motion simulator environment, the technology that determines interactivity rests on the structure of the motion platform and the real-time computing capability for computer-generated images.
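The trade-off between screen size and resolution can be made concrete with a simple angular-resolution calculation: if a fixed number of pixels is spread across a wider field of view, each degree receives fewer pixels. The sketch below assumes a hypothetical 1920-pixel-wide image per eye and ignores lens distortion and uneven pixel distribution, so it is an approximation for illustration, not a display specification.

```python
def pixels_per_degree(horizontal_pixels: int, fov_degrees: float) -> float:
    """Average angular resolution when a fixed pixel count spans a given field of view.

    Assumes pixels are spread uniformly across the field of view (a simplification).
    """
    if fov_degrees <= 0:
        raise ValueError("field of view must be positive")
    return horizontal_pixels / fov_degrees


# Hypothetical display width per eye (an assumption, not a value from the study).
PANEL_WIDTH_PX = 1920

for fov in (70, 100):
    print(f"{fov:>3} degrees -> {pixels_per_degree(PANEL_WIDTH_PX, fov):.1f} pixels per degree")
```

Under that assumption, widening the field of view from 70 to 100 degrees lowers the average density from about 27 to about 19 pixels per degree, which is the kind of loss in image resolution the design trade-off refers to.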
The motion platform of an interactive motion simulator can have from one to six axes, drawn from pitch, roll, yaw, surge, sway, and heave. More axes create greater technological difficulty. Most vehicle and navigation simulators require three axes, whereas aircraft simulators may require up to six axes because of maneuverability. More axes of motion produce more realistic motion-based interaction. In addition to the motion platform technology, the feedback function of the joystick remains a crucial factor that affects the user’s interaction and holds the player’s attention. Regarding visual computation, conventional motion theaters offer a passive, browsing-style viewing experience in which the machine and screen operate at a fixed setting that the user cannot adjust. Hence, the user cannot voluntarily change the motion. The human–machine interface thus remains in a user-passive mode, with no interpersonal interaction. In contrast, this study’s product offers the user a voluntarily maneuverable mediated environment that also allows interaction with others through a computer-generated environment. Because game screens are generated unpredictably, the product requires a large amount of real-time computation, so producers need to focus on enhancing motion platform technology and building software know-how for real-time screen computation to improve the user’s perception of interactivity.
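The six axes named above split into three rotations and three translations. The enumeration below is a small illustrative sketch of that taxonomy; `MotionAxis`, `describe_platform`, and the three-axis configuration are hypothetical and do not describe the study’s simulator.

```python
from enum import Enum
from typing import Iterable


class MotionAxis(Enum):
    """Six motion axes (degrees of freedom); each value is (kind, description)."""
    PITCH = ("rotation", "tilt nose up/down about the lateral axis")
    ROLL = ("rotation", "tilt side to side about the longitudinal axis")
    YAW = ("rotation", "turn left/right about the vertical axis")
    SURGE = ("translation", "move forward/backward")
    SWAY = ("translation", "move left/right")
    HEAVE = ("translation", "move up/down")

    @property
    def kind(self) -> str:
        return self.value[0]


def describe_platform(axes: Iterable[MotionAxis]) -> str:
    """Summarize a platform configuration by its rotational and translational axes."""
    axes = set(axes)
    rotations = sorted(a.name.lower() for a in axes if a.kind == "rotation")
    translations = sorted(a.name.lower() for a in axes if a.kind == "translation")
    return f"{len(axes)}-axis platform: rotations={rotations}, translations={translations}"


# Hypothetical three-axis configuration, for illustration only.
print(describe_platform([MotionAxis.PITCH, MotionAxis.ROLL, MotionAxis.HEAVE]))
print(describe_platform(MotionAxis))  # full six-axis platform
```

Each added axis is one more independently actuated degree of freedom the platform must drive and synchronize with the rendered scene, which is why more axes bring both greater realism and greater technical difficulty.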

The second managerial implication concerns skill and challenge. Hoffman and Novak (1996) proposed that the key requisites of flow formation are skill, challenge, and focused attention. As to skill and challenge, Novak et al. (2000) stated that when skill falls short of a challenge, a user feels anxious, whereas when the challenge is low, a user may feel bored; hence, skill and challenge must be congruent. Because the system manager is unlikely to be able to manipulate participants’ degree of involvement, the producer could instead focus on improving interactivity, which may influence a user’s skill and challenge. In addition, Hoffman and Novak (1996) emphasized that vividness and interactivity could increase a user’s focused attention. Our results indicate that interactivity directly affects flow and also affects it through focused attention backed by skill and challenge, which confirms interactivity’s key role in flow formation. The marketer thus needs to focus on improving interactivity: summing the standardized direct and indirect path effects on flow shows that interactivity is about 2.5 times as influential as vividness (0.406/0.16).
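The skill-challenge balance described by Novak et al. (2000) can be read as a simple three-state rule. The function below is a minimal sketch under assumed conditions: `flow_channel` and its `tolerance` band are hypothetical, skill and challenge are assumed to be scored on a common scale (for example, 1 to 7), and it is not the measurement procedure used in this study.

```python
def flow_channel(skill: float, challenge: float, tolerance: float = 1.0) -> str:
    """Classify an experience by its skill-challenge balance (hypothetical sketch).

    `tolerance` is an assumed band within which skill and challenge count as
    congruent; it is not a value taken from the study.
    """
    gap = challenge - skill
    if gap > tolerance:
        return "anxiety"  # challenge exceeds what the user's skill can handle
    if gap < -tolerance:
        return "boredom"  # challenge is low relative to the user's skill
    return "flow"  # skill and challenge are roughly congruent


# Hypothetical scores on a 1-7 scale.
for skill, challenge in [(5, 5), (3, 6), (6, 3)]:
    print(skill, challenge, flow_channel(skill, challenge))
```

A producer who raises interactivity is, in these terms, trying to keep users inside the congruent band as their skill grows.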

The third managerial implication concerns marketer attention to telepresence and interactivity. Steuer (1992) used the term telepresence and asserted that the two crucial factors that determine telepresence are interactivity and vividness; as noted previously, however, interactivity does not directly affect telepresence but does affect it indirectly through challenge and focused attention. Comparing the aggregated standardized path effects (including indirect effects), the study found that the influence of vividness on telepresence is about three times that of interactivity (0.4406/0.1489). Vividness therefore remains the primary factor that affects telepresence. Thus far, most marketers have emphasized only visual and auditory stimuli, leaving room for improvement in the tactile and olfactory senses. The level of involvement also directly influences telepresence and thus deserves marketers’ attention.
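The two effect-size comparisons in the second and third implications are ratios of total standardized effects, where a total effect (TE) is the direct path coefficient plus the sum of the indirect (product-of-paths) effects. Using the values reported in those two implications:

```latex
\begin{aligned}
\mathrm{TE} &= \beta_{\text{direct}} + \textstyle\sum \mathrm{IE},\\
\frac{\mathrm{TE}_{\mathrm{INT}\to\mathrm{FLOW}}}{\mathrm{TE}_{\mathrm{VIV}\to\mathrm{FLOW}}} &= \frac{0.406}{0.16} \approx 2.54,
\qquad
\frac{\mathrm{TE}_{\mathrm{VIV}\to\mathrm{TP}}}{\mathrm{TE}_{\mathrm{INT}\to\mathrm{TP}}} = \frac{0.4406}{0.1489} \approx 2.96.
\end{aligned}
```

The first ratio underlies the statement that interactivity is about 2.5 times as influential on flow, and the second the statement that vividness is about three times as influential on telepresence.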

The fourth implication concerns the connection between telepresence and flow formation. The study revealed that telepresence is a vital antecedent of the flow construct but is not necessary for it. Vividness, in turn, significantly influences interactivity. While this conclusion subordinates the role of interactivity, it also suggests that vividness plays a crucial role in creating telepresence. In an interactive motion simulator, vividness is essential in forming positive interactivity. To attain the ultimate goal of improving telepresence and the flow state, producers need to secure a vivid design before coordinating interactive elements that help to induce focused user attention through skill and challenge.

Finally, the study concludes that vividness and interactivity play important roles in forming virtual experience in 3D VR simulators, and proposes that game content and the motion-based platform are key to delivering vividness and interactivity, respectively. Thus, in practice, we should pay more attention to scenario design, visual art, animation, and background music to improve vividness for players, and we should focus on control engineering, 3D VR technology, and communication engineering to enhance interactivity with consumers.

7 Research limitations and future research directions

Beyond this discussion of managerial implications, the study has certain limitations. Firstly, although the study attempts to define the interaction between vividness and interactivity, it has only compared the significance of the vividness and interactivity effects, without considering that a mediated environment often involves a high level of interaction, in which the 3D operating environment must interact with the consumer to create the desired effects. This consideration is not covered by prior studies, and incorporating such a reciprocal causal effect would produce a non-recursive model. Whereas most SEM models specify one-way paths, a two-way model is more difficult to analyze and interpret. Secondly, the study adopted the flow concept proposed by Hoffman and Novak (1996) and Novak et al. (2000), even though some research has proposed incorporating the affordance concept to describe the formation of virtual experience (Durlach and Mavor 1995; Li et al. 2001). The study does not include affordance in its analysis. Even without the affordance construct, however, the study’s flow formation stages closely resemble those proposed by Csikszentmihalyi (1993), which constitutes a valuable contribution.
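To illustrate what a non-recursive specification would involve, a reciprocal link between vividness (VIV) and interactivity (INT) could be written as two structural equations. This is a sketch only, not a model estimated in this study: x₁ and x₂ are hypothetical exogenous variables (instruments), and ζ₁ and ζ₂ are disturbances.

```latex
\begin{aligned}
\mathrm{VIV} &= \beta_{12}\,\mathrm{INT} + \gamma_{1}\,x_{1} + \zeta_{1},\\
\mathrm{INT} &= \beta_{21}\,\mathrm{VIV} + \gamma_{2}\,x_{2} + \zeta_{2}.
\end{aligned}
```

Because each construct appears on the right-hand side of the other’s equation, the model is non-recursive. Identifying β₁₂ and β₂₁ requires, at a minimum, that each equation exclude at least one exogenous variable that appears in the other (the order condition), which is part of why such two-way models are harder to estimate and interpret.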

In addition to addressing the aforementioned limitations, we recommend the following for future studies. First, because vividness and interactivity are not independent variables, future studies may incorporate varying levels of telepresence and observe how they affect consumption behavior. Second, past studies have categorized telepresence into three layers: content telepresence, social telepresence, and individual telepresence (Ghun 1999). Because the study concludes that telepresence is conducive to forming a flow state, future studies should examine how each of these three layers affects the flow formation process.

Third, given the recent growth of neuroscience approaches to the study of virtual experience (Sjölie 2012; Riva and Mantovani 2012; Bouchard et al. 2012), future research might usefully extend these approaches to virtual experience in emerging areas such as augmented reality (Villani et al. 2012; Benyon 2012), locative media (Karapanos et al. 2012), and 3D virtual worlds (Bae et al. 2012; Nah et al. 2011). Fourth, most flow research has focused on the balance between skill and challenge without considering the interaction between them. Future research could examine the interaction between these two constructs, which may yield more detailed results and distinguish such work from past research.
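One common way to test the skill-challenge interaction suggested above is a moderated regression with a product term. The specification below is a sketch, not a model estimated in this study; SK, CH, and FLOW denote skill, challenge, and flow scores, and b₀ to b₃ are placeholder coefficients.

```latex
\mathrm{FLOW} = b_0 + b_1\,\mathrm{SK} + b_2\,\mathrm{CH} + b_3\,(\mathrm{SK} \times \mathrm{CH}) + \varepsilon
```

A significant b₃ would indicate that the effect of skill on flow depends on the level of challenge (and vice versa), which a balance-only account cannot capture. Mean-centering SK and CH before forming the product term is a common way to keep b₁ and b₂ interpretable as effects at average levels of the other variable.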