1 Introduction

Many users experience adverse effects such as cybersickness when wearing a head-mounted display (HMD). The reasons for these symptoms are still the subject of research. One prominent theory attributes them to a mismatch between the senses that provide information about the body’s orientation and motion, known as the sensory conflict theory [13]. Accordingly, increased cybersickness is often reported when the delay between user input and visual feedback is too high [7, 17] or when movements are forced upon the users, e.g., in flight simulations [16]. Shafer et al. [15] argue that with increasing sensory conflict, users will experience a higher level of cybersickness.

High end-to-end latency can induce cybersickness [17]. End-to-end latency in immersive virtual reality (IVR) refers to the time delay between a user’s action and the moment this action becomes visible on the HMD. High delays also impair user performance and experience. Attig et al. [2] state that the lower limit of 100 ms is too high and suggest updating the guidelines.

In this paper, we want to recommend appropriate end-to-end latency so that the researchers and developers can prevent or minimize cybersickness, maximize user performance, and ensure immersive user experience. Our main contributions in this paper are:

  • We aim to identify the lower limit for the end-to-end latency in IVR at which the users report increased symptoms of cybersickness.

  • We analyze the impact of end-to-end latency on the sense of body ownership, agency, and presence. In particular, we investigate the maximum delay that users tolerate without losing the joy of the IVR application.

The paper is structured as follows. In Sect. 2, we present selected related work. In Sect. 3, we describe the IVR tasks to measure the effects of increased end-to-end latency on user experience as well as performance and furthermore propose our hypotheses. We describe the evaluation method in Sect. 4. We show the results in Sect. 5 and discuss them in Sect. 6. Lastly, we conclude in Sect. 7.

2 Related Work

The work of Albert et al. [1] reveals that participants, even if instructed to search for artifacts, could not find them if the latency was not higher than 50 to 70 ms. Waltemate et al. [18] stated that motor performance and simultaneity perception are worsened significantly at end-to-end latency above 75 ms, whereas the sense of agency and body ownership only decline at latency higher than 125 ms. However, in their study, the participants performed motor tasks inside a CAVE virtual environment and did not use IVR with an HMD. Similarly, Kasahara et al. [6] observe that with an end-to-end latency of 104 ms, the participants lose the sense of body ownership.

Kawamura and Kijima [7] studied the effect of latency on the ability to stand on one foot in IVR. They used a Wii Balance Board to measure the stability of the participants and found that balance already decreased with as little as 1 ms of increased latency. Attig et al. [2] expressed a similar view. The researchers report that users are indeed able to perceive latencies down to the single millisecond and furthermore state that performance is already impaired by latencies between 16 and 60 ms.

Increased latency of 450 ms also negatively affects the ability to memorize objects in IVR [12]. Furthermore, Samaraweera et al. [14] investigated the impact of additional latency applied either to the left or right side of the avatar. The researchers found significant differences between the control group (with base latency of 45 ms) and the experimental group (end-to-end latency of 200 ms). However, only three out of ten participants noticed the added delay. Users in the experimental group walked slower towards the mirror and tried to compensate for the latency by increasing the step and swing time on the impaired side. Likewise, Meehan et al. [10] argue that a lower latency of 50 ms compared to 90 ms corresponds to a higher sense of presence.

Stauffert et al. [17] investigated the impact of latency jitter on cybersickness in IVR. In their study, almost half of the participants did not even notice latency jitter, and those who did found it annoying. The researchers furthermore found a significant correlation between heart rate and cybersickness. Similarly, Waltemate et al. [18] conclude that the users did not show cybersickness symptoms, even at end-to-end latency above 350 ms. Other works suggest that delays above 40 ms already evoke cybersickness and postural instability [3].

3 Experimental Design

3.1 Tasks

We want to measure and compare the impact of different latency conditions on user experience and performance. In general, we apply latency to the virtual body and the HMD, controllers, as well as trackers. In other words, player movements are delayed by a certain amount of time. We analyze the impact of tracking delay in IVR to determine the upper latency limit that is still tolerated by users. Therefore, we developed three tasks:

  • Searching task is designed to measure cybersickness through the simulator sickness questionnaire (SSQ) [8]. As proposed by Stauffert et al. [17], the user has to find circular targets (platforms), spawned on the ground in one of three corridors. As soon as the user steps on one platform, a new one randomly spawns in another corridor. Thus, the user has to walk to find the next platform. Only one platform is present in the scene at any time. The user is exposed to different artificially added delays, between 0 and 100 ms. In each round, the user has to find three platforms, one after another. After that, the user has to state how affected he/she is by each symptom on a four-point Likert scale: “None”, “Slight”, “Moderate”, and “Severe”. The scale is converted into integer numbers from 0 to 3. Finally, we calculate the final SSQ score.

  • Reaching task is designed to measure user performance, in particular, how much time a user needs to touch a cube appearing in a virtual scene and how many errors he/she makes during the execution. Similar to the searching task, we add delays between 0 and 100 ms. As proposed by Ware et al. [19], targets are spawned around the user in the field of view, at a distance so that he/she can reach them. The user has to sit on a stool and should use his/her dominant hand to reach the virtual target. As soon as the hand reaches the target, the virtual object disappears and a new one spawns randomly in the virtual environment. The users should reach the targets as fast as possible; however, they must avoid making mistakes. The user may attempt to touch each object only once. If he/she fails, it is counted as an error.

  • Embodiment task is designed to measure user experience while the users can observe themselves in a virtual mirror. As proposed by Samaraweera et al. [14], we apply latency to only one side of the body (either the left leg and left arm or the right leg and right arm). By using a seven-point Likert scale, we measure the sense of body ownership, sense of agency, sense of presence, and latency perception. The users are exposed to different artificially added delays, between 0 and 200 ms. The questionnaire consists of five questions, as listed in Table 1.

Table 1. Embodiment questionnaire. Each question was assessed on a seven-point Likert scale ranging from “Strongly Disagree” (\(-3\)) to “Strongly Agree” (3).
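The artificially added delay used in all three tasks can be illustrated with a small delay buffer that holds back tracked poses before they are applied to the avatar. The following is only a minimal sketch and not the Unity implementation used in the study: the class name `DelayedPoseBuffer`, its methods, and the tuple used as a pose are hypothetical and chosen purely for illustration.

```python
from collections import deque


class DelayedPoseBuffer:
    """Replays tracked poses after a fixed artificial delay (hypothetical helper)."""

    def __init__(self, delay_ms):
        self.delay_ms = delay_ms
        self._samples = deque()  # (timestamp_ms, pose) in arrival order
        self._current = None     # last pose released to the renderer

    def push(self, timestamp_ms, pose):
        """Store a freshly tracked pose together with its capture time."""
        self._samples.append((timestamp_ms, pose))

    def sample(self, now_ms):
        """Return the newest pose that is at least delay_ms old, or None."""
        while self._samples and now_ms - self._samples[0][0] >= self.delay_ms:
            self._current = self._samples.popleft()[1]
        return self._current


buffer = DelayedPoseBuffer(delay_ms=100)
for t in range(0, 200, 10):          # one tracker sample every 10 ms
    buffer.push(t, ("hand", t))      # the pose payload is a placeholder
    delayed = buffer.sample(now_ms=t)

print(delayed)  # ('hand', 90): the pose shown at t = 190 ms was captured 100 ms earlier
```

Applying such a buffer to only the left or only the right limbs would correspond to the one-sided delay of the embodiment task.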

3.2 Hypotheses

Studies on latency in IVR show that user performance may drop and that users may notice latency below 75 ms (see Sect. 2). However, studies evaluating the sense of agency and body ownership show that users in IVR tolerate higher end-to-end latency, between 90 and 200 ms.

For our experiment, we pose our hypotheses as follows:

  • H1: Higher end-to-end latency evokes cybersickness.

  • H2: Higher end-to-end latency decreases user performance.

  • H3: Higher end-to-end latency decreases user experience, in particular, sense of embodiment.

The evaluation of H1 is based on the SSQ, as proposed by Kennedy et al. [8]. Another way to detect cybersickness symptoms could be by using an ECG to measure the electrical activity of the heartbeat as proposed in our previous work [5]. To test H2, we measure the time it takes for a user to reach a cube and the errors that the user makes. Furthermore, to test H3, we analyze an embodiment questionnaire on a seven-point Likert scale.

4 Method

We recruited 21 participants, aged between 19 and 53 years (\(M = 28.6\) years, \(SD = 8.79\), six females). At the beginning of the experiment, all participants filled out a demographics questionnaire and were informed about the risks.

The experiment was conducted on a laptop with an Intel i7-7700HQ processor and an Nvidia GeForce GTX 1080. This setup fulfills the minimal hardware requirements of the HTC Vive Pro. The tracking area measured approximately 3.1 \(\times \) 2.9 m. The application was built with Unity 3D on a Windows 10 system. We used an HTC Vive Pro HMD with two controllers and three trackers to enable full-body motion reconstruction. The users had to attach two trackers to the left and right ankle and one to the hip. Additionally, they had to hold both controllers in their hands to track arm movements. One additional tracker was attached to a stool so that it could be shown in the IVR application.

Fig. 1. The virtual environment used for our experiment. (a) The searching task consists of three corridors, where the users have to find platforms spawned on the ground in one of the corridors. (b) In the reaching task, the users sit on a stool and have to touch virtual targets spawned around them. (c) In the embodiment task, the users observe themselves in a virtual mirror to find the impaired side. Note that the player can see the virtual body from the first-person view.

Each participant could choose between two virtual bodies (a female and a male avatar). After the avatar calibration, the participants viewed a simple scene with a mirror in front of them and a stool, as shown in Fig. 1. Thus, they could view their avatar from the first-person view, i.e., by looking down towards their body, or by observing themselves in the virtual mirror.

The study had a fully within-subjects design. Thus, all participants tested all tasks and all conditions. To ensure that the participants did not adapt to the latency during execution, we randomized the task order as well as the latency conditions. The experiment started with an SSQ to provide a baseline measurement. After that, a total of 165 trials (\(11 \times 3\) different platform positions in the searching task, \(11 \times 6\) different cube positions in the reaching task, and \(11 \times 6\) tries in the embodiment task) were presented in randomized conditions. In the searching and reaching tasks, we added a delay between 0 and 100 ms (0 ms, 4 ms, 8 ms, 13 ms, 19 ms, 25 ms, 33 ms, 42 ms, 54 ms, 71 ms, 100 ms). In the embodiment task, we added higher delays between 0 and 200 ms (0 ms, 8 ms, 17 ms, 27 ms, 38 ms, 51 ms, 66 ms, 84 ms, 108 ms, 142 ms, 200 ms).
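The randomized, within-subjects schedule can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the study software: the function `build_schedule`, the per-participant seed, and the choice to group trials into blocks per task are hypothetical; only the delay lists and the number of trials per condition are taken from the text.

```python
import random

SEARCH_REACH_DELAYS_MS = [0, 4, 8, 13, 19, 25, 33, 42, 54, 71, 100]
EMBODIMENT_DELAYS_MS = [0, 8, 17, 27, 38, 51, 66, 84, 108, 142, 200]

# Trials per latency condition, as stated in the text.
TRIALS_PER_CONDITION = {"searching": 3, "reaching": 6, "embodiment": 6}


def build_schedule(participant_seed):
    """Return a randomized list of (task, added_delay_ms, n_trials) blocks."""
    rng = random.Random(participant_seed)  # seeding scheme is an assumption
    tasks = list(TRIALS_PER_CONDITION)
    rng.shuffle(tasks)                     # random task order
    schedule = []
    for task in tasks:
        delays = list(EMBODIMENT_DELAYS_MS if task == "embodiment"
                      else SEARCH_REACH_DELAYS_MS)
        rng.shuffle(delays)                # random latency order within a task
        for delay in delays:
            schedule.append((task, delay, TRIALS_PER_CONDITION[task]))
    return schedule


schedule = build_schedule(participant_seed=1)
total_trials = sum(n for _, _, n in schedule)
print(len(schedule), total_trials)  # 33 165
```

The count confirms the total given above: 11 conditions with 3, 6, and 6 trials yield 33 + 66 + 66 = 165 trials.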

At the end of each latency condition in the searching task, the participants had to complete the SSQ. The surveyor read the questions and noted the participant’s responses. In the reaching task, the participant did not need to complete any questionnaire. Instead, we measured how long each participant needed to grab the cube and how many errors he/she made. In the embodiment task, the participants again had to complete a questionnaire (see Table 1). Note that the order of the tasks and different latency conditions were randomly selected for each participant. Exposure time was approximately 55 min.

5 Results

5.1 Base End-to-End Latency

We first measured the system latency of our setup by using a frame counting approach, as proposed by Friston and Steed [4]. The end-to-end latency is the delay between the player’s movement and the moment the corresponding avatar movement is displayed on the HMD. Due to the calculation of inverse kinematics for full-body motion reconstruction, our base latency is slightly higher than those found in related work, which are often between 22 and 36 ms [1, 11, 17]. Other similar studies using full-body motion capturing reported higher end-to-end latencies, between 45 and 80 ms [6, 9, 14, 18]. We measured an end-to-end system latency of approximately 50 ms.
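The end-to-end latency conditions reported in the remainder of this section are consistent with adding the artificial delays from Sect. 4 to the measured base latency. The short sketch below makes this mapping explicit; treating the base latency as exactly 50 ms and as constant across conditions is an assumption, and the function name is hypothetical.

```python
BASE_LATENCY_MS = 50  # measured base end-to-end latency (approximate)

SEARCH_REACH_DELAYS_MS = [0, 4, 8, 13, 19, 25, 33, 42, 54, 71, 100]
EMBODIMENT_DELAYS_MS = [0, 8, 17, 27, 38, 51, 66, 84, 108, 142, 200]


def end_to_end_latencies(added_delays_ms, base_ms=BASE_LATENCY_MS):
    """Add the base system latency to each artificially added delay."""
    return [base_ms + delay for delay in added_delays_ms]


search_reach = end_to_end_latencies(SEARCH_REACH_DELAYS_MS)
embodiment = end_to_end_latencies(EMBODIMENT_DELAYS_MS)
print(search_reach)  # [50, 54, 58, 63, 69, 75, 83, 92, 104, 121, 150]
print(embodiment)    # [50, 58, 67, 77, 88, 101, 116, 134, 158, 192, 250]
```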

5.2 Cybersickness

To evaluate the level of cybersickness, we calculate the final SSQ scores at the end of each sub-task. Figure 2 shows the final SSQ scores for different end-to-end latency conditions. Error bars represent 95% confidence intervals of the mean.
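For readers unfamiliar with SSQ scoring, the sketch below computes the subscale and total scores from the four-point ratings (0 to 3). It assumes the commonly cited scoring scheme of Kennedy et al. [8] (item-to-subscale assignment and the weights 9.54, 7.58, 13.92, and 3.74) and assumes that the "final SSQ score" refers to the total score; the identifiers are hypothetical.

```python
# Item -> (nausea, oculomotor, disorientation) membership flags,
# following the commonly cited scheme of Kennedy et al. (assumed here).
SSQ_ITEMS = {
    "general_discomfort":       (1, 1, 0),
    "fatigue":                  (0, 1, 0),
    "headache":                 (0, 1, 0),
    "eyestrain":                (0, 1, 0),
    "difficulty_focusing":      (0, 1, 1),
    "increased_salivation":     (1, 0, 0),
    "sweating":                 (1, 0, 0),
    "nausea":                   (1, 0, 1),
    "difficulty_concentrating": (1, 1, 0),
    "fullness_of_head":         (0, 0, 1),
    "blurred_vision":           (0, 1, 1),
    "dizzy_eyes_open":          (0, 0, 1),
    "dizzy_eyes_closed":        (0, 0, 1),
    "vertigo":                  (0, 0, 1),
    "stomach_awareness":        (1, 0, 0),
    "burping":                  (1, 0, 0),
}
WEIGHTS = {"nausea": 9.54, "oculomotor": 7.58,
           "disorientation": 13.92, "total": 3.74}
LIKERT = {"None": 0, "Slight": 1, "Moderate": 2, "Severe": 3}


def ssq_scores(answers):
    """answers maps item name -> Likert label; missing items count as 'None'."""
    raw = [0, 0, 0]  # raw sums for nausea, oculomotor, disorientation
    for item, flags in SSQ_ITEMS.items():
        rating = LIKERT[answers.get(item, "None")]
        for i, flag in enumerate(flags):
            raw[i] += flag * rating
    return {
        "nausea": raw[0] * WEIGHTS["nausea"],
        "oculomotor": raw[1] * WEIGHTS["oculomotor"],
        "disorientation": raw[2] * WEIGHTS["disorientation"],
        "total": sum(raw) * WEIGHTS["total"],
    }


example = {"general_discomfort": "Slight", "nausea": "Moderate"}
scores = ssq_scores(example)
print({name: round(value, 2) for name, value in scores.items()})
```

In this made-up example, two mild to moderate symptoms already give a total score of 22.44, which shows how quickly the thresholds of 15 and 20 discussed below are exceeded.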

Fig. 2. Final SSQ scores at different end-to-end latency conditions. Error bars show the 95% confidence interval of the mean. Values above the Friedman threshold correspond to significant cybersickness symptoms. Values above the Stanney et al. threshold already indicate SSQ scores that are too high.

Results of the Friedman test show that higher end-to-end latency has a significant effect on final SSQ scores (\(\chi ^2(2) = 100.6\), \(p \le .001\), Kendall’s \(W = .57\)). The internal reliability for cybersickness was very high (Cronbach’s \(\alpha = .922\)). Post-hoc analysis with Conover’s test and \(\alpha = .05\) reveals significant differences between 50 ms and \(\ge \)75 ms, 54 ms and \(\ge \)75 ms, 58 ms and \(\ge \)75 ms, 63 ms and \(\ge \)83 ms, 69 ms and \(\ge \)75 ms, 75 ms and \(\ge \)121 ms, 83 ms and 150 ms, 92 ms and 150 ms, as well as 104 ms and 150 ms. Thus, the results show that higher latency values tend to correspond to higher final SSQ scores.
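To make the reported statistics easier to follow, the sketch below computes the Friedman statistic and Kendall's \(W\) for a small table of repeated measurements (one row per participant, one column per condition). It is a plain-Python illustration without tie correction and is not the analysis script of the study; the example values are made up and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..k, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """Friedman chi-square for n rows (participants) and k columns (conditions)."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for j, rank in enumerate(average_ranks(row)):
            rank_sums[j] += rank
    return 12 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3 * n * (k + 1)


def kendalls_w(chi2, n, k):
    """Effect size derived from the Friedman statistic."""
    return chi2 / (n * (k - 1))


# Made-up scores: 4 participants x 3 latency conditions.
table = [[0, 4, 8], [2, 3, 9], [1, 5, 6], [0, 7, 4]]
chi2 = friedman_statistic(table)
w = kendalls_w(chi2, n=len(table), k=len(table[0]))
print(chi2, w)  # 6.5 0.8125
```

Kendall's \(W\) ranges from 0 (no agreement between participants' rankings of the conditions) to 1 (identical rankings).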

The results furthermore show a significantly lower SSQ score before the users put on the HMD (0 ms) than after they experienced an end-to-end latency of 75 ms (\(p < .001\)). These findings indicate that users will experience considerable discomfort at end-to-end latency above 75 ms. As can be seen in Table 2, the reported final SSQ score at 75 ms is already \({>}22\). Stanney et al. [16] complement these findings, stating that final SSQ scores of \({>}20\) indicate that such a system is troublesome.

Table 2. Final SSQ scores before the users put on the HMD (0 ms) and after each end-to-end latency condition (50–150 ms).

Moreover, Stanney et al. [16] report that users whose symptoms score higher than 15 on the SSQ are experiencing considerable cybersickness. Comparing this value with the measurement results, the users reported increased symptoms at end-to-end latency above 63 ms (see Table 2). Thus, to minimize cybersickness symptoms, end-to-end latency of below 58 ms (\({<}15\) on SSQ) is required.

5.3 User Performance Results

To evaluate user performance, we measure the time the users need to complete the task successfully. Figure 3 shows the average time needed to reach a cube while users were exposed to different end-to-end latencies. Error bars represent 95% confidence intervals of the mean.

Fig. 3. Mean time needed for different end-to-end latency conditions. Error bars show the 95% confidence interval of the mean. At values above the Friedman threshold, users need significantly longer to complete the reaching task.

Results of the Friedman test show that higher end-to-end latency has a significant effect on the time needed to complete the task (\(\chi ^2(2) = 68.71\), \(p \le .001\), Kendall’s \(W = .658\)). The internal reliability of the scale was high (Cronbach’s \(\alpha = .872\)). Post-hoc analysis with Conover’s test and \(\alpha = .05\) results in significant differences between 50 ms and \(\ge \)69 ms, 54 ms and \(\ge \)92 ms, 58 ms and \(\ge \)92 ms, 63 ms and \(\ge \)104 ms, 69 ms and \(\ge \)121 ms, 75 ms and \(\ge \)121 ms, 83 ms and \(\ge \)104 ms, 92 ms and \(\ge \)121 ms, and 104 ms and 150 ms. In other words, with the minimum end-to-end latency (i.e., 50 ms), users could complete the task faster than with added delay. Thus, the results show that users tend to need more time to complete the task at higher latency values.

Table 3. Mean needed time for each end-to-end latency condition.

Figure 3 shows that higher delays worsen user performance. The findings from the post-hoc test indicate that user performance drops at 69 ms. The data in Table 3 show that the mean time needed at end-to-end latencies of 69 ms and 104 ms is nearly the same. Thus, user performance gets worse with increased delay, and most users need significantly longer to fulfill the task at end-to-end latency above 69 ms.

Even though the users were allowed to touch the virtual cubes only once, we could not identify any mistakes. We believe that the reaching task is very simple to complete and that more complex tasks would lead to higher chances of performance errors.

5.4 Embodiment Results

We conducted a user study to furthermore measure the sense of presence, agency, and body ownership at different end-to-end latency conditions. We calculated the median (MED) and inter-quartile range (IQR) for each question, as can be seen in Table 4 and Fig. 4. Each question was assessed on a seven-point Likert scale: \(-3\) for “Strongly Disagree” and 3 for “Strongly Agree.”
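The two descriptive statistics used for the Likert responses can be computed as in the following sketch. The ratings are made up, the function name is hypothetical, and the quartile convention (here Python's "inclusive" method) is an assumption, since different statistics packages define quartiles slightly differently.

```python
from statistics import median, quantiles


def median_and_iqr(responses):
    """Return (median, inter-quartile range) of ordinal ratings."""
    q1, _, q3 = quantiles(responses, n=4, method="inclusive")
    return median(responses), q3 - q1


# Made-up ratings of seven users on the scale from -3 to 3.
ratings = [-3, -1, 0, 1, 2, 2, 3]
med, iqr = median_and_iqr(ratings)
print(med, iqr)  # 1 2.5
```

The median and IQR are preferred over the mean and standard deviation here because Likert responses are ordinal.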

The internal reliabilities for the embodiment measurements are high. Cronbach’s \(\alpha \) ranges from .706 (impaired side) through .869 (simultaneity), .953 (sense of agency), and .988 (sense of presence) to .992 (sense of body ownership). For the sense of agency (\(\text {MED} = 3\)) and for the sense of presence (\(\text {MED} = 2\)), we could not find any significant differences. These findings indicate that end-to-end latency of up to 250 ms does not cause a significantly lower sense of agency or presence.
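Cronbach's \(\alpha \) relates the variances of the individual items to the variance of the summed score. The sketch below shows the standard formula on made-up data; which measurements were treated as items in the study is not stated in the text, so the layout (one row per respondent, one column per item) is an assumption and the function name is hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (equal lengths, k >= 2)."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up responses: 4 respondents x 3 items.
responses = [[2, 3, 3], [0, 1, 1], [3, 3, 2], [1, 1, 0]]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))  # 0.917
```

Values close to 1 indicate that the items vary together, i.e., that they measure the same underlying construct consistently.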

However, the results also show that end-to-end latency above 101 ms decreases the sense of body ownership (see Fig. 4a). Indeed, body ownership decreases significantly for end-to-end latency above 192 ms. Thus, the sense of body ownership decreases with increasing end-to-end latency. Furthermore, with end-to-end latency above 67 ms, one user already noticed different latencies between the left and right side and also correctly specified the impaired side. As expected, with higher end-to-end latency the users not only perceived increased latency but also correctly specified the side (either the left or the right body side) which was influenced by additional delay (see Fig. 4b). At end-to-end latency of 250 ms, almost half of the users (\(n = 10\)) perceived the delay and could correctly specify the impaired side. Similarly, with an end-to-end latency of 116 ms and \(\ge \)158 ms, the users perceived the movements of the avatar as not simultaneous with their own movements.

Table 4. The table depicts the median (MED) answers for embodiment questionnaire for each end-to-end latency condition. Inter-Quartile Range (IQR) values furthermore indicate the variability of the questionnaire responses.

Fig. 4. Median responses for the embodiment questionnaire at different end-to-end latency conditions.

6 Discussion

With the significant results of the SSQ comparing pre- and post-conditions, we accept hypothesis H1. Thus, increased end-to-end latency will evoke cybersickness symptoms. Stauffert et al. [17] expressed a similar view. The researchers evaluated latency jitter and found that the disorientation sub-scale was higher in the affected group, although nearly half of the users did not notice the repeated lag in the tracking of the HMD. In contrast to Waltemate et al. [18], we found significant results at the end-to-end latency of 75 ms. However, as suggested by Stanney et al. [16], we aim to keep the value below 15 on the SSQ. Therefore, we recommend that researchers and developers keep end-to-end latency well below 58 ms (see Sect. 5.2). Our guidelines also confirm those proposed by Attig et al. [2], according to which the lower limit of 100 ms seems to be outdated. Moreover, we recommend keeping the end-to-end latency below 20 ms, as suggested by DiZio and Lackner [3].

Furthermore, with the significant results for user performance, we also accept hypothesis H2. Thus, with higher end-to-end latency, the user performance will decrease. In particular, users will need more time to complete the task. Compared to Waltemate et al. [18], stating that motor performance is affected by latency above 75 ms, we obtain similar results. The results show that the higher delays worsen user performance (see Sect. 5.3). Although the users typically need more time to complete the reaching task with increased end-to-end latency, we could not identify significant differences until 69 ms.

Hypothesis H3 cannot be accepted. We found no significant results for the sense of agency and presence. However, with increased end-to-end latency, body ownership and simultaneity perception will decrease. As already noted by Waltemate et al. [18], the delay affects body ownership significantly later, between 125 ms and 210 ms. Our results show that the sense of body ownership decreases earlier, already at 101 ms; however, we could not find significant differences until 192 ms. Similarly, Waltemate et al. [18] stated that simultaneity perception was affected by latency above 75 ms. Our results show that simultaneity perception dropped at latency values higher than 116 ms.

In contrast to the work of Kondo et al. [9], not all our participants were able to identify the impaired side, not even in the 250 ms end-to-end latency condition. Nevertheless, with increasing delay, more and more users identified the impaired side correctly. With an end-to-end latency of 67 ms, only one user was able to specify the impaired side correctly, whereas with an end-to-end latency of 250 ms, a total of ten users were able to do so.

7 Conclusion

This study aimed to gain insight into the effect of latency on user experience and performance in IVR by artificially increasing the end-to-end latency. We designed three different tasks to evaluate what impact increased latency has in IVR. For quantitative results, we used the Friedman test with Conover’s test of multiple comparisons as a post-hoc test. We found significant differences when comparing final SSQ scores between several end-to-end latency conditions. Increased latency negatively affects user experience, e.g., users reported significant cybersickness symptoms with end-to-end latency above 63 ms. Thus, to ensure final SSQ scores of \({<}15\) and minimize cybersickness symptoms, researchers and developers should ensure end-to-end latency far below 58 ms.

Furthermore, user performance also decreases with higher end-to-end latency. The results show that latency above 69 ms causes significantly longer execution times. The results suggest that the sense of body ownership tolerates higher latency values than user performance and cybersickness before significant results can be found. The sense of body ownership decreases with end-to-end latency above 101 ms; however, significant results were not found until 192 ms. We could not find any significant results for the sense of presence and agency. In general, with increasing end-to-end latency, user experience decreases and user performance worsens. We conclude that latency between perceived and physical head movements not only contributes strongly to cybersickness but also decreases user performance and the sense of embodiment.

In future work, we will focus on physiological data, such as ECG, to measure the electrical activity of the heart. We believe that this will detect cybersickness symptoms more easily than the subjective measurements of an SSQ. Furthermore, the task complexity and duration should be increased, and breaks should be included between latency conditions to avoid carry-over effects.