1 Introduction

While time itself is a concept, it is also something humans can perceive. This perception is subjective, and it is an integral part of the overall experience of any environment, with virtual environments being no exception. Acknowledging this in their conception and actively devising virtual environments to modulate the time experience of users would therefore be an exciting design instrument. Time perception is an interdisciplinary topic explored in numerous scientific studies in disciplines as diverse as psychology and neuroscience. In a previous study combining cognitive science and computer science, we examined the relationships between time perception and rhythmic stimuli with a sorting game in a two-dimensional setting, which revealed varying time experience and performance effects depending on whether single or combined stimuli were used in relation to their tempo [19]. However, that initial experiment was limited to a crowd-sourced desktop setting. We therefore adapted the experiment for Virtual Reality (VR), which allows for extended control of the test environment and lets us extend the initial set of questions to include stimuli and time experience aspects in fully immersive environments. While our main goal with this study is to identify and interpret any significant effects through a correlation study, we carry over assumptions from our initial study: different effects on time perception depending on the type of stimuli present (audio, visual, or both), tempo-related time estimation errors for combined stimuli, and decreased task performance in the presence of visual stimuli. Before detailing the experiment in Sect. 3, analyzing the data in Sect. 4, and discussing the results in Sect. 5, we first provide the necessary background and review related work in Sect. 2. Section 6 then concludes the paper with an outlook and the potential impact of our findings on general virtual environment design.

2 Background and Related Work

The most common time perception model in the literature is the clock model, which assumes an “internal clock” as a body system dedicated to time perception [9]. This system is usually tied to a model producing “ticks” or “units”, such as an oscillator or a pacemaker model, where the body keeps track of time by counting these ticks. In these models, it is assumed that time perception can be changed either through the speed at which these ticks are produced or by skipping some, with these effects potentially resulting from external stimuli unrelated to time. Counting the ticks can be delegated to attention, making attentional resources a key element of time perception [5, 6]. Attention is believed to act as a gate or switch on the counting of time units, where paying less attention to time results in a compressed time experience, as time units are likely to be skipped. Another source of subjective temporal distortions is arousal, whose level is believed to affect the clock speed, resulting in an altered time experience [2, 3, 13]. However, an internal clock model is not necessary to predict time perception accurately [22]: in the context of watching videos in different scenarios, time perception was accurately predicted by a classification network using changes in perceptual content and visual spatial attention (more specifically, gaze position). Nevertheless, with its focus on attention and arousal, the clock model is a natural way to interpret time experience and gives initial directions to time perception studies.

In the literature, we can observe different types of timing tasks, which involve different processing mechanics. The most common aspects are time estimation (i.e., asking for a duration estimate of an event with units, such as seconds) and the feeling of time passage (i.e., judging if time passes by quickly or slowly), which is also referred to as time judgment.

This difference can be observed in depressed subjects underestimating time but judging it as passing slowly [3], with boredom-prone people judging boring tasks as passing slowly but not necessarily overestimating those [27], or with players of the game “Thumper” reporting faster time passing without time estimation errors [23].

Having defined time perception, we can discuss the state of “flow”, one of the applications of time perception alterations. Flow is a psychological state of full attention on a task defined by Csikszentmihalyi, represented through nine dimensions: challenge-skill balance, action-awareness merging, clear goals, unambiguous feedback, concentration on task, sense of control, loss of self-consciousness, time transformation, and autotelic experience. The psychological state of flow is a research subject in itself, centered around one’s relation to a task as it primarily relies on the challenge-skill ratio aspect [11]; it is often considered a desirable state, and time perception alterations are one of its manifestations. In social media, the manifestation of flow seems to be influenced by the positive effect of telepresence on enjoyment, concentration, challenge, and curiosity; flow would then influence the presence of time distortions [17]. Delving further into social context, it was found that the concentration and time distortion components of flow, but not enjoyment, were affected by working in a group of two compared to working individually in virtual worlds (within the social game platform Second Life) [16]. More in line with our work, several studies on flow and VR have been conducted. The previously mentioned study on Thumper compared flow states between VR and non-VR setups, finding that despite VR’s technical immersion, both scenarios could lead to a flow state [23]. Within VR activities, a model associates flow with playfulness, defined by a combination of intrinsic motivation, control, and freedom to suspend reality; this association then influences competence in the activity [21].

VR studies on time perception, however, are not limited to flow. VR itself affects our senses through what is conveyed via sensory channels, but also through the devices used and the physical discomfort they may cause. Simply comparing the time perception of the same game in a VR versus a desktop setup leads to an underestimation bias for VR [15]. It also seems that time perception changes when bored or waiting in VR compared to real life [10]. In another study comparing time perception between VR and desktop during a task ranging from 30 s to 5 min, it was observed that while both situations yielded time overestimation, the VR scenario was overestimated more [14]. However, technological immersion alone might not be a sufficient explanation, as walking in VR does not seem to affect time perception significantly [1]. In another experiment on the effect of zeitgebers on time perception during a task, conducted both in VR and on desktop, no significant difference was observed regarding time perception, but there was a difference in task performance (with the VR participants performing better) [24]. The Thumper study also observed better performance in a VR setup compared to a non-VR one [23]. When it comes to the effect of emotional content, VR itself appears not to yield any difference to real life in time distortions when the emotional content is the same [7]. Employing VR entails possibly unique content, such as having movements represented by an avatar. Differences were observed between avatar and no-avatar conditions: in a retrospective paradigm, avatar presence leads to a significantly faster passage of time without an effect on time estimations [25].

When considering VR and time perception, we can thus regard both the technological immersion, i.e., the effect of being in VR through its dedicated hardware, as well as the virtual environment stimuli and transformations that can be induced through VR. A specific stimulus type we want to employ in VR is rhythmic stimuli, which we already identified as having notable effects in a desktop scenario [19]. Rhythmic stimuli and music generally have a high potential to modulate one’s time experience. With music, it was observed that higher tempos induce longer subjective time, while emotional valence reduces (but does not suppress) the effect of tempo and itself affects time perception. These effects on time perception might be due to their effect on arousal; interestingly, using a different orchestration (piano only or full orchestra) does not affect time judgment and pleasantness while affecting arousal [4]. On timing evaluations of instrumental excerpts of Disco songs (including estimation and judgment of time passage) over different tempos, it was observed that faster tempos were correlated with longer reproduction durations; however, no effect on estimations was observed, and a tempo difference of at least 20 BPM was required for differences in timing measurements to appear [8]. By varying cognitive load through tasks and arousal levels through music choice while keeping music tempo constant, it was found that (1) time is judged as passing faster under higher cognitive load (presence of a math task), (2) the presence of a concurrent motor task (tapping the music’s tempo) yields shorter subjective durations, and (3) for the motor task and the same music, tapping to half notes instead of eighth notes resulted in smaller time estimations and time passage rated as faster [28]. Regarding timing and spatial movement, rhythmic auditory stimuli (RAS) have been observed to improve motor performance when vision is unavailable [18].

Rhythm, however, is not only tied to music and audio. Concerning temporal judgments, audio was believed to be dominant over visuals [12]. However, a later study found visual stimuli dominance using Point-Light-Display (PLD) dance motions compared to simple audio tempos [26]. Participants were presented with both the dance motion and the audio tempo and had to give a globally fitting tempo. The result of this study suggests that under the right conditions, visual stimuli can dominate audio in terms of timing, which may be due to the quantity of temporal information on a sensory channel rather than the channel itself.

3 Experiment

We recruited 30 participants from a public, science-related event in Luxembourg City and from students and staff at the University of Luxembourg. The female-to-male ratio was 46.66%–53.33%, with ages ranging from 19 to 45 (mean age 25.63/median 24). Participants were received at the VR/AR Lab’s test space at the University of Luxembourg and briefed about the experiment before the session, both verbally and via an informed consent form, in which we also collected demographic data. After the setup and familiarization phase, the participants performed the tasks, with each participant able to take breaks between trials if desired. Each session lasted a total of approximately one hour.

3.1 Trial Task and Design

Participants had to complete trials in which they had to sort three-dimensional objects according to their shape (spheres, capsules, or cubes). As shown in the screenshot sequence in Fig. 1, the objects must be grabbed with a VR controller and dragged into one of the two larger sinks, which only accept specific shapes displayed on a scoreboard above them (cf. Fig. 1a–c). Once sorted, an object disappears with a small animation, and a new object to be sorted appears in the center of the virtual environment (cf. Fig. 1d).

Fig. 1. Sorting example from trial task, in sequential order from left to right.

Each sorting trial has a predefined duration unknown to participants. After this duration, the trial ends with a questionnaire in which the participants are asked to estimate the elapsed time in seconds and to rate on two Likert scales how fast the trial felt and how tired they were (cf. Fig. 2).

Fig. 2. Post-trial questionnaire on time estimation, time perception, and fatigue.

Trials were subjected to conditions that were a combination of these parameters:

  • trial length: how long the trial lasted in seconds (either 40, 50, or 60)

  • tempo: the rhythm of the audio/visual stimuli if present in beats per minute (either 100, 140, or 180; forced to 0 without stimuli)

  • visual stimuli: whether or not there are flashing visual pulses around the object

  • audio stimuli: whether or not metronome click sounds are produced

Each participant went through all 30 possible combinations in random order.
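The count of 30 follows from the parameters above: for each of the three trial lengths, there is one trial without stimuli (tempo forced to 0) and nine trials combining three stimulus types with three tempi. The following minimal sketch enumerates and shuffles these conditions; the function names, the tuple layout, and the use of a seeded shuffle are illustrative assumptions and not taken from the experimental application.

```python
from itertools import product
import random

TRIAL_LENGTHS = (40, 50, 60)                     # seconds
TEMPOS = (100, 140, 180)                         # BPM, only used when a stimulus is present
STIMULI = ("AudioOnly", "VisualsOnly", "Both")   # stimulus types with at least one stimulus


def all_conditions():
    """Return every (trial_length, stimulus, tempo) combination (hypothetical layout)."""
    conditions = []
    for length in TRIAL_LENGTHS:
        # Without stimuli, the tempo is forced to 0.
        conditions.append((length, "None", 0))
        for stimulus, tempo in product(STIMULI, TEMPOS):
            conditions.append((length, stimulus, tempo))
    return conditions


def shuffled_conditions(seed=None):
    """Return the conditions in random order (assumed: one shuffle per participant)."""
    rng = random.Random(seed)
    order = all_conditions()
    rng.shuffle(order)
    return order


print(len(all_conditions()))  # 3 * (1 + 3 * 3) = 30
```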

3.2 Technical Specifications

Participants used the VIVE Pro Eye head-mounted display (HMD) with one of its controllers, allowing them to enter VR and control virtual objects with six degrees of freedom (DoF). This particular HMD also allows for collecting precise eye-tracking data during the trials, specifically on gaze and pupil dilation. During the experiments, participants were also instructed to wear an Empatica E4 wristband to record further physiological data; our experiment included heart rate variability and skin temperature. Our custom application reads the data by receiving messages from Empatica’s E4 Streaming Server software. However, the physiological data is out of the scope of this paper and will be discussed in a separate publication. The experimental application itself was developed in Unity and used SteamVR. In addition to the application, OVR advanced settings were employed to adjust the participant’s height position.

4 Analysis and Results

4.1 General Methods and Data

As discussed previously, various data were collected from each trial; in this subsection, we will describe what data we effectively used for our analysis and how.

Disclaimed Trials. Some trials were disclaimed from the data set depending on our notes during trials. The reasons for removing trials were:

  • Misunderstanding of task controls by the participant.

  • Misunderstanding of trial questionnaire by the participant (verbally checked when giving incoherent values such as negative time estimation).

  • Disturbance or interruption during the trial, either from the participants themselves (i.e., asking a question or talking during a trial) or external sources (i.e., technical issues, noise from a nearby room).

Variables/Functions. Variables and functions extracted from trial data are:

  • \(t \in T\): trial identifier t from all available trials T.

  • \(p \in P\): participant identifier p from all available participants P.

  • trials(p): all trial identifiers of a participant p.

  • participant(t): participant identifier of a trial t.

  • correct(t): number of correct sorts at the end of trial t.

  • trialLength(t): length of trial t in seconds.

  • reportedLength(t): reported length of trial t in seconds.

  • reportedFatigue(t): self-reported fatigue of a participant after performing trial t in an ordinal scale from 1 to 5.

  • reportedSpeedPerception(t): subjective participant rating of trial t’s speed in an ordinal scale from 1 (slow) to 5 (fast).

  • trialIndex(t): the index of trial t, indicates how many trials were performed before t and, therefore, global repetition from the experiment session.

Since reportedSpeedPerception(t) and reportedFatigue(t) are purely subjective ratings given by the participant, we can use these values directly. However, correct(t), the direct task performance variable, depends on the participants’ individual performance, while reportedLength(t), the direct time estimation variable, depends on both the trial’s duration and the participants’ individual representation of a second. Therefore, these two variables need to undergo a normalization process. Normalizing the performance variable (correct(t)) for our analysis is a three-step process involving the following extracted variables:

  • correctPerSecondTrial(t): average number of correct answers per second during trial t.

    $$\begin{aligned} \frac{correct(t)}{trialLength(t)} \end{aligned}$$

  • correctPerSecondParticipant(p): average number of correct answers per second of a participant p across all of their trials.

    $$\begin{aligned} \frac{\sum \nolimits _{t' \in trials(p)}correct(t')}{\sum \nolimits _{t' \in trials(p)}trialLength(t')} \end{aligned}$$

  • correctNormalized(t): number of correct answers per second of trial t, normalized so that a value of 1 corresponds to the participant’s average number of correct answers per second across all performed trials.

    $$\begin{aligned} \frac{correctPerSecondTrial(t)}{correctPerSecondParticipant(participant(t))} \end{aligned}$$
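As an illustration of the three normalization steps, the following minimal sketch computes correctNormalized(t) for a list of trials. The dictionary keys and the function name are hypothetical, and the sketch assumes that every participant has at least one correct sort (otherwise the participant average is zero and the ratio is undefined).

```python
from collections import defaultdict


def correct_normalized(trials):
    """Return correctNormalized for each trial, in input order.

    `trials` is a list of dicts with the (assumed) keys
    'participant', 'correct' and 'trial_length' (seconds).
    """
    total_correct = defaultdict(float)
    total_seconds = defaultdict(float)
    for t in trials:
        total_correct[t["participant"]] += t["correct"]
        total_seconds[t["participant"]] += t["trial_length"]

    normalized = []
    for t in trials:
        p = t["participant"]
        per_second_trial = t["correct"] / t["trial_length"]
        per_second_participant = total_correct[p] / total_seconds[p]
        normalized.append(per_second_trial / per_second_participant)
    return normalized


example = [
    {"participant": "A", "correct": 20, "trial_length": 40},
    {"participant": "A", "correct": 40, "trial_length": 60},
]
# Participant A averages 60 / 100 = 0.6 correct sorts per second, so the
# first trial (0.5 per second) is below 1 and the second one is above 1.
print(correct_normalized(example))
```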

For time estimation (reportedLength(t)), we employed the following process:

  • secondBias(p): ratio between the total length in seconds of a participant p’s trials and the total time reported by p, capturing what the participant considers a second.

    $$\begin{aligned} \frac{\sum \nolimits _{t \in trials(p)}trialLength(t)}{\sum \nolimits _{t \in trials(p)}reportedLength(t)} \end{aligned}$$

  • deltaTimePerception(t): averaged delta per second between reported time (accounting for participant bias) and trial length of a trial t.

    $$\begin{aligned} \frac{secondBias(participant(t))*reportedLength(t)}{trialLength(t)}-1 \end{aligned}$$

In addition to deltaTimePerception(t), we also use its absolute value abs(deltaTimePerception(t)), as it represents the magnitude of the time perception delta of a trial.
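The time estimation normalization can be sketched in the same way for the trials of a single participant. The dictionary keys and function names below are hypothetical, and the sketch assumes that the reported lengths are positive so that the bias ratio is defined.

```python
def second_bias(trials):
    """Ratio of total trial time to total reported time for one participant."""
    total_length = sum(t["trial_length"] for t in trials)
    total_reported = sum(t["reported_length"] for t in trials)
    return total_length / total_reported


def delta_time_perception(trials):
    """Return (signed delta, magnitude) per trial for one participant."""
    bias = second_bias(trials)
    deltas = [bias * t["reported_length"] / t["trial_length"] - 1 for t in trials]
    return [(d, abs(d)) for d in deltas]


# A participant who consistently reports half of the real duration has a
# bias of 2 and no variation between trials, so every delta is 0.
consistent = [
    {"trial_length": 40, "reported_length": 20},
    {"trial_length": 50, "reported_length": 25},
    {"trial_length": 60, "reported_length": 30},
]
print(second_bias(consistent))
print(delta_time_perception(consistent))
```

A negative signed delta marks a trial whose seconds were reported as shorter than in the participant's other trials, and a positive one marks seconds reported as longer.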

Outcome Variables. The specific variables relevant to the analysis performed in this study are:

  • deltaTimePerception(t): as the raw difference between reported time and trial time shows too much individual bias, this variable instead represents a participant’s variation in perception. A negative value indicates that the seconds of the trial t were reported as shorter than in the other trials performed by this participant. A positive value means that the seconds were reported as longer.

  • abs(deltaTimePerception(t)): instead of denoting how much longer or shorter a second is interpreted for a trial t compared to other trials performed by a participant, the absolute value represents the magnitude of the eventual time distortion.

  • correctNormalized(t): to simplify the analysis, we do not take into account incorrect answers to evaluate performance but only the number of correct answers. A smaller number of correct answers would indirectly reflect the number of incorrect answers due to the time lost. Similar to a participant’s variation in perception, this variable represents the variation in performance instead of the pure performance, with values \({<}1\) indicating worse and \({>}1\) better performances.

  • reportedSpeedPerception(t): the subjective interpretation of whether time drags or flies after performing trial t in an ordinal scale from 1 to 5.

  • reportedFatigue(t): with relatively low trial numbers, reported fatigue should mostly depend on the participants, independent of trial parameters.

  • trialIndex(t): the index of trial t, indicates how many trials were performed before t and, thus, is an indicator of global repetition from the experiment session.

Parameters. The parameters used in this experiment are:

  • stimulusTrial(t): the type of stimulus used in trial t, possible values are: None, VisualsOnly, AudioOnly, Both.

  • hasAudioTrial(t): whether or not the trial t contains an audio stimulus.

  • hasVisualTrial(t): whether or not the trial t contains a visual stimulus.

  • hasStimulusTrial(t): whether or not the trial t contains any type of stimulus.

  • tempo(t): the tempo in beats per minute (BPM) of a trial t, possible values are: 0, 100, 140, 180. A tempo of 0 means that the trial had no stimuli.

Trial Filters. When performing analyses, we may want to include only subsets of the trials to investigate specific effects. The filters used are:

  • filterAUDIOONLY: considers only trials that have only an audio stimulus, i.e., trials t where stimulusTrial(t) == AudioOnly.

  • filterVISUALSONLY: considers only trials that have only a visual stimulus, i.e., trials t where stimulusTrial(t) == VisualsOnly.

  • filterBOTH: considers only trials that have both audio and visual stimuli, i.e., trials t where stimulusTrial(t) == Both.
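The boolean parameters and the filters can all be derived from stimulusTrial(t). The following minimal sketch shows one way to express this; the function names and the 'stimulus' dictionary key are hypothetical.

```python
STIMULUS_TYPES = ("None", "VisualsOnly", "AudioOnly", "Both")


def has_audio(stimulus):
    """hasAudioTrial: the trial contains an audio stimulus."""
    return stimulus in ("AudioOnly", "Both")


def has_visual(stimulus):
    """hasVisualTrial: the trial contains a visual stimulus."""
    return stimulus in ("VisualsOnly", "Both")


def has_stimulus(stimulus):
    """hasStimulusTrial: the trial contains any type of stimulus."""
    return stimulus != "None"


def filter_trials(trials, stimulus):
    """Keep only trials whose stimulus type equals `stimulus`.

    filter_trials(trials, "AudioOnly") corresponds to filterAUDIOONLY, etc.
    """
    if stimulus not in STIMULUS_TYPES:
        raise ValueError(f"unknown stimulus type: {stimulus!r}")
    return [t for t in trials if t["stimulus"] == stimulus]
```

Note that a trial with both stimuli satisfies has_audio and has_visual but is excluded by the AudioOnly and VisualsOnly filters.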

Performed Tests. Using the above variables, parameters, and filters, we performed statistical tests on various data subsets to examine the effect of a stimuli presence on performance and time estimation with the following parameters:

  • stimulusTrial(t): ANOVAs on deltaTimePerception(t), abs(deltaTimePerception(t)) and correctNormalized(t) to see if any significant difference appears between possible stimuli situations.

  • hasAudioTrial(t), hasVisualTrial(t), hasStimulusTrial(t): t-tests on deltaTimePerception(t), abs(deltaTimePerception(t)) and correctNormalized(t) to see if there is an effect on the presence or absence of a specific stimulus (since our t-tests are not pairwise, no t-test corrections have been performed).

As the time judgment variable is ordinal, we used the following non-parametric tests to examine this effect on time judgment:

  • stimulusTrial(t): Kruskal-Wallis test on reportedSpeedPerception(t) to see if any significant difference appears between possible stimuli situations.

  • hasAudioTrial(t), hasVisualTrial(t), hasStimulusTrial(t): Wilcoxon test on reportedSpeedPerception(t) to see if there is an effect on the presence or absence of a specific stimulus.

In order to observe the effect of tempo on performance and time estimation, we performed ANOVAs between tempo(t) and the variables deltaTimePerception(t), abs(deltaTimePerception(t)) and correctNormalized(t). The ANOVAs were also repeated under the filters filterAUDIOONLY, filterVISUALSONLY and filterBOTH to see if differences in tempo appear only within specific stimuli conditions. For the effect of tempo on time judgment (reportedSpeedPerception(t)), we again replaced the ANOVAs with Kruskal-Wallis tests, including the repeated ones under filters.

Correlations between time estimation variables (deltaTimePerception(t), abs(deltaTimePerception(t))) and performance (correctNormalized(t)) were investigated with Pearson tests. As the time judgment variable (reportedSpeedPerception(t)) is ordinal, its correlations with the time estimation variables (deltaTimePerception(t), abs(deltaTimePerception(t))) as well as performance (correctNormalized(t)) were investigated with Spearman tests. The confounding effect of fatigue was investigated by considering reportedFatigue(t) both as a nominal and as an ordinal variable: the former allows us to observe differences between specific ratings and was examined through ANOVAs with time estimation variables (deltaTimePerception(t), abs(deltaTimePerception(t))) and performance (correctNormalized(t)); the latter was examined through Spearman tests on time estimation variables, performance and time judgment (reportedSpeedPerception(t)). Regarding the confounding effect of trial index, trialIndex(t) is ordinal data, so it was investigated with Spearman tests on time estimation variables (deltaTimePerception(t), abs(deltaTimePerception(t))), performance (correctNormalized(t)) as well as time judgment (reportedSpeedPerception(t)).

Each ANOVA with a p-value below 0.1 would lead to a subsequent Tukey HSD test, and Kruskal-Wallis tests would lead to subsequent paired Wilcoxon tests. Table 1 provides an overview of the different effects per subset with a significant p-value or tendency, while each is discussed in detail in the following sections. The complete data from our tests, including confidence intervals and average values, are available online [20].
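The distinction between a significant p-value and a tendency can be made explicit with a small helper. This is a minimal sketch: the follow-up threshold of 0.1 is the one stated for the ANOVAs, whereas the significance threshold of 0.05 is an assumption based on common convention, and the function names are hypothetical.

```python
def classify_p_value(p, alpha=0.05, tendency=0.1):
    """Label a p-value as 'significant', 'tendency' or 'none'.

    alpha=0.05 is an assumed conventional threshold; tendency=0.1 matches
    the threshold used to trigger follow-up tests.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie in [0, 1]")
    if p < alpha:
        return "significant"
    if p < tendency:
        return "tendency"
    return "none"


def needs_follow_up(p, tendency=0.1):
    """True if an omnibus test warrants a post-hoc test (e.g., Tukey HSD)."""
    return p < tendency


# Example p-values of the kind reported in the results.
for p in (0.027, 0.088, 0.167):
    print(p, classify_p_value(p), needs_follow_up(p))
```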

Table 1. Subsets for which a significant p-value (\(\bullet \)) or tendency (\(\circ \)) is observed for a combination of stimulus dimension and outcome variable group.

4.2 Across All Trials

Effects of Stimuli on Performance. One of the aims of this study is to investigate the effects of stimuli on task performance. Looking at performance across all trials, the ANOVA between task performance (correctNormalized(t)) and stimulus type (stimulusTrial(t)) revealed no significant difference. However, a t-test between task performance (correctNormalized(t)) and the presence of visual stimuli (hasVisualTrial(t)) shows a significant difference (p = 0.027), with decreased performance when a visual stimulus is involved, as the mean with the stimulus is lower than without. No effect is observed for the t-tests on the presence of an audio stimulus (hasAudioTrial(t), p = 0.167) or of any stimulus (hasStimulusTrial(t), p = 0.431). Therefore, we can only observe a decrease in performance due to the presence of a visual stimulus, but no effect on performance from the sole presence of any stimulus or of an audio stimulus. Besides their type, stimuli also have a tempo dimension. The ANOVA between participant performance (correctNormalized(t)) and stimuli tempo (tempo(t)) finds no general effect of tempo. However, tempo might have an effect under a specific stimulus type. Therefore, we performed the same ANOVA on subsets of data where trials contained either audio stimuli only (filterAUDIOONLY), visual stimuli only (filterVISUALSONLY), or both simultaneously (filterBOTH). We then observe a tendency when trials have audio stimuli only (filterAUDIOONLY, p = 0.088, F = 2.461). The Tukey HSD of this ANOVA reveals that this tendency lies between the tempi of 180 and 100 (p = 0.070, diff = 0.04), with the diff value indicating that the faster tempo leads to better trial performance with audio stimuli only. In the absence of interference from visual stimuli, the faster tempo of the audio stimuli may have implicitly stimulated the participants to sort objects faster.

Effects of Stimuli on Time Estimation. Similarly to performance, we evaluated the effect of stimuli type and tempo on time estimation variables. ANOVAs were performed regarding the type of stimuli (stimulusTrial(t)) and tempo (tempo(t)) on both the normalized time estimation error (deltaTimePerception(t)) and its magnitude (abs(deltaTimePerception(t))). The only notable result is a tendency between time estimation error (deltaTimePerception(t)) and tempo (tempo(t)) (p = 0.073, F = 2.328). Tukey’s HSD of this ANOVA reveals that the effect is a tendency only between tempi of 180 and 140 (p = 0.65, diff = 0.067), with trials under a tempo of 180 being rated with a longer time per second than trials under a tempo of 140. We performed similar ANOVAs involving tempo on subsets of data where the trials had either only audio stimuli (filterAUDIOONLY), only visual stimuli (filterVISUALSONLY), or both at the same time (filterBOTH). The only significant result comes from the ANOVA between time estimation error (deltaTimePerception(t)) and tempo (tempo(t)) across trials within the AudioOnly condition (filterAUDIOONLY) (p = 0.039, F = 3.288), where the Tukey HSD follow-up reveals a near-significant difference between tempi of 140 and 100 (p = 0.051, diff = 0.106) and a near tendency between 180 and 140 (p = 0.107, diff = 0.091). This means that in the case of trials with only an audio stimulus, trials with a BPM of 140 were evaluated as faster than others, which contradicts the analysis under all types of stimuli. This contradiction may indicate that the effect of tempo on time perception is confounded by the stimuli types.

Finally, t-tests between our time estimation variables (deltaTimePerception(t), abs(deltaTimePerception(t))) and the presence of audio stimuli (hasAudioTrial(t)), visual stimuli (hasVisualTrial(t)), or any stimuli (hasStimulusTrial(t)) yielded no significant result, meaning that no effect of the presence of any type of stimulus on time estimation can be observed here.

Effects of Stimuli on Time Judgment. As the time judgment variable (reportedSpeedPerception(t)) is ordinal, we performed Kruskal-Wallis tests between it and the type of stimuli (stimulusTrial(t)) and tempo (tempo(t)). The test between time judgment (reportedSpeedPerception(t)) and the type of stimuli (stimulusTrial(t)) shows a significant effect (p = 0.016, chi2 = 10.267); however, a follow-up paired Wilcoxon test reveals statistical differences only between the stimulus “None” and each of the other stimuli types (p = 0.048 for None-AudioOnly, p = 0.015 for None-Both, p = 0.015 for None-VisualsOnly). As for the test on tempo (tempo(t)), we observe another significant effect (p = 0.001, chi2 = 15.432) that, after a paired Wilcoxon test, shows significant differences between 0–140 (p = 0.006), 0–180 (p = 0.005), 100–180 (p = 0.031) as well as a tendency between 100–140 (p = 0.064) and a near-tendency between 0–100 (p = 0.12). These two tests highlight a significant difference in time judgment depending on the presence of any stimuli (both by the differences from the “None” stimulus in the first test and the “0” BPM tempo in the second, which correspond to trials without stimuli). This is also confirmed by the Wilcoxon test between time judgment (reportedSpeedPerception(t)) and the presence of any stimulus (hasStimulusTrial(t)) (p = 0.003, mean(TRUE)>mean(FALSE)). Considering the mean values, we can say that the presence of a stimulus has a significant impact in making a trial judged as passing faster than one without any. The same test was performed on the presence of audio (hasAudioTrial(t)) (p = 0.323) and visuals (hasVisualTrial(t)) (p = 0.023, mean(TRUE)>mean(FALSE)), meaning that no significant difference is observed for the presence of an audio stimulus, whereas the presence of a visual stimulus makes trials judged as passing faster.

In the case of tempo, the results of the paired Wilcoxon test discussed earlier also indicate differences between 100 BPM and the other (non-0) tempi across all types of stimuli. However, running the same Kruskal-Wallis test on the subsets of “AudioOnly” trials (filterAUDIOONLY), “VisualsOnly” trials (filterVISUALSONLY) and trials with both (filterBOTH) highlights a significant difference only across trials with both stimuli (filterBOTH) (p = 0.027, chi2 = 7.2343), meaning that while tempo may have an effect across all stimuli, that effect might only be due to the combined stimuli scenario. Follow-up paired Wilcoxon tests indicate a significant difference between 100–180 (p = 0.032) and a tendency between 140–180 (p = 0.091), leading to the same conclusion as the tests without subsets.

Correlations Between Outcome Variables. In order to investigate the correlation between outcome variables, Spearman tests were used when time judgment (reportedSpeedPerception(t)) was involved as the data is ordinal; otherwise, Pearson tests were used. When comparing time estimation error (deltaTimePerception(t)) and performance (correctNormalized(t)), we see no correlation (p = 0.218, cor = 0.043), but we see a significant negative correlation with the magnitude of time estimation error (abs(deltaTimePerception(t))) (p = 0.016, cor = 0.083). This means that performance is correlated with the magnitude of time estimation errors but not with their direction; in other words, how error-prone participants are in their estimations is related to their performance. When it comes to time judgment (reportedSpeedPerception(t)), it is negatively correlated with time estimation errors (deltaTimePerception(t)) (p = 1.724e-13, rho = 0.251) and positively correlated with the magnitude of said error (abs(deltaTimePerception(t))) (p = 0.002, rho = 0.107). This means that the bigger the error, the faster time is perceived, and that underestimated trials are rated as passing faster. As for time judgment (reportedSpeedPerception(t)) and performance (correctNormalized(t)), better performance is associated with faster passing trials (p = 1.265e-05, rho = 0.150).
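To illustrate why the choice between the two coefficients matters for ordinal ratings, the following minimal sketch computes Spearman's rho as the Pearson correlation of average ranks. It is a plain-Python illustration with hypothetical function names, not the statistical software used for the analysis; it returns only the coefficient (no p-value) and is undefined when one of the inputs is constant.

```python
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

For a monotonic but non-linear relation such as x = [1, 2, 3, 4] and y = [1, 4, 9, 16], spearman(x, y) equals 1 while pearson(x, y) is below 1, which is why rank-based tests suit Likert-scale ratings whose spacing between levels is not guaranteed to be uniform.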

Confounding Effect of Fatigue. While running the experiment, we noticed that participants were often exhausted at the end of the session. As exhaustion affects time perception and performance, we checked whether it affected our outcome variables. For its effect on performance, a Spearman test between performance (correctNormalized(t)) and fatigue (reportedFatigue(t)) reveals a significant correlation (p = 2.924e-08, rho = 0.190). By considering the fatigue variable (reportedFatigue(t)) as nominal and performing an ANOVA with performance (correctNormalized(t)), we retrieve this effect (p = 1.68e-11, F = 14.56). The subsequent Tukey HSD reveals that fatigue values of “3,4,5” are statistically different from values of “1,2”, as the p-value is below 0.001 in all these comparisons, whereas the other comparisons (i.e., “3-4”, “1-2”, ...) have a p-value above 0.48. As for time estimations, the signed error (deltaTimePerception(t)) is not correlated with fatigue according to the Spearman test (p = 0.191, rho = 0.045), but we retrieve statistical differences with the ANOVA (p = 0.004, F = 3.86). The subsequent Tukey HSD indicates statistical differences between “3-2” (p = 0.001) and “5-2” (p = 0.045) and a tendency between “4-2” (p = 0.06). No effect is observed for the absolute error (abs(deltaTimePerception(t))) with either the Spearman test (p = 0.842, rho = 0.107) or the ANOVA (p = 0.893, F = 0.277); however, a correlation is observed for the Spearman test with time judgment (reportedSpeedPerception(t)) (p = 2.88e-11, rho = 0.227). From the results of the ANOVAs involving performance (correctNormalized(t)) and time estimation (deltaTimePerception(t)), we can identify two groups of reported fatigue values, which are “1-2” and “3-4-5”. We thus decided to perform the same tests on subsets of our data according to these two groups in Sect. 4.4.

Confounding Effect of Trial Index. Similar to fatigue, repeated trials can affect both performance and time perception due to learning effects and repetition. We thus evaluated correlations through Pearson tests between the number of a trial across the session (trialIndex(t)), treated as a continuous variable, and the time estimation variables (deltaTimePerception(t), abs(deltaTimePerception(t))) as well as performance (correctNormalized(t)). For performance (correctNormalized(t)), the test reveals a correlation (p = 2.626e-14, cor = 0.259), which indicates a learning effect. Trial repetition also seems to affect time estimation, as we find a significant correlation with the signed time estimation error (deltaTimePerception(t)) (p = 1.407e-04, cor = 0.131) and a tendency with its absolute value (abs(deltaTimePerception(t))) (p = 0.089, cor = 0.059). We therefore decided to investigate different phases of the experiment (beginning, middle, end), defined by three subsets of the data based on the trial index, as shown in Fig. 3 and discussed in detail in the following Sect. 4.3.

Fig. 3. Trial subset allocation for a participant by index.

4.3 Trial Index Subsets

Given our results on the confounding effects of the trial index described in Sect. 4.2, we decided to investigate three subsets of the data based on the trial index in steps of ten (1–10, 11–20, 21–30). For each subset, we performed all the tests run on the full set of trials, which are available for download [20] and described in Sect. 4.1. However, the normalization process only considered the targeted subset when using the sum of data on trials. We go through each subset in the following subsections, focusing on the significant results.
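The subset allocation can be illustrated with a short sketch. The 1–10, 11–20, and 21–30 boundaries come from the text; everything else is an assumption made for illustration. In particular, trials are represented as plain dictionaries with a hypothetical `trialIndex` key, and `normalize_within` divides by the subset sum as a stand-in for the normalization of Sect. 4.1, whose exact form is not restated here.

```python
PHASES = ("1-10", "11-20", "21-30")


def phase_of(trial_index):
    """Map a 1-based trial index to its phase label."""
    if 1 <= trial_index <= 10:
        return "1-10"
    if 11 <= trial_index <= 20:
        return "11-20"
    if 21 <= trial_index <= 30:
        return "21-30"
    raise ValueError(f"trial index out of range: {trial_index}")


def split_by_phase(trials):
    """Partition one participant's trials into the three index-based subsets."""
    subsets = {phase: [] for phase in PHASES}
    for trial in trials:
        subsets[phase_of(trial["trialIndex"])].append(trial)
    return subsets


def normalize_within(trials, key):
    """Assumed normalization: each value divided by the sum over the subset only."""
    total = sum(trial[key] for trial in trials)
    return [trial[key] / total for trial in trials]
```

The point of the last helper is only that the denominator is recomputed per subset, so a trial's normalized value can differ between the full-set analysis and the subset analysis.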

Trials 1–10. This subset corresponds to each participant’s first ten trials of the experiment, constituting a discovery phase. Regarding stimuli effects on performance, the results indicate a positive effect of audio stimulus presence (t-test performance ~ audio presence, p = 0.016; Tukey HSD performance ~ stimuli, p = 0.099 on worse performance between visuals and both). This can be linked to the results across the entire experiment, where we saw a negative impact of visual stimulus presence and a tendency for trials with just audio to have their performance led by the tempo (see Sect. 4.2). The difference might be due to a learning effect: participants are not yet proficient enough for visuals to cost them noticeable performance, but may be helped by the presence of any leading audio rhythmic stimulus for this repetitive task. As for stimuli and time estimation, here we only observe a potential novelty effect on trials without stimuli, as they are rarer than trials with any stimuli (t-test time estimation error ~ stimuli presence, p = 0.093). The most notable difference from the analysis on all trials in this respect is the absence of an effect of tempo on the time estimations. Surprisingly, no effect of stimuli on time judgment is observed in any of our tests. When it comes to the correlation between performance and time estimations through Pearson correlation tests, contrary to the full set of trials, we observe a (negative) correlation with the signed time estimation error (p = 0.025, cor = 0.142) but not with the absolute error. Regarding time judgment in relation to both time estimations and performance, we lose the correlation with the absolute time error; however, the Spearman tests recover the positive correlation with performance (p = 0.095, rho = 0.106) and the correlation with the signed time estimation error (p = 0.183, rho = 0.184). Finally, regarding the confounding effects of trial index and fatigue, we find correlations of fatigue with performance and time experience and a correlation between performance and trial index, but none between trial index and time experience. This means that with this subset, we should have isolated an experiment phase based on trials for time perception but not for performance, which is expected as the participants were likely learning how to perform better during the first few trials.

Trials 11–20. This subset corresponds to each participant’s ten trials in the middle of the experiment, representing a neutral phase in which participants are no longer learning the task but have not been in the experiment long enough to be bored. Regarding performance and stimuli, in this subset we observe a performance increase from higher tempo within trials using combined stimuli (Tukey HSD 180-100, p = 0.044, diff = 0.055; 180-140, p = 0.075, diff = 0.053). As for the effect of stimuli on time estimation, we only observe a tendency between tempos 140 and 100 across trials solely using a visual stimulus (Tukey HSD 140-100, p = 0.100, diff = 0.083). When it comes to time judgment, there is strong evidence that in this subset the presence of any stimulus heavily alters it (paired Wilcoxon on time judgment and type of stimuli, p < 0.002 for all pairs with “None”; paired Wilcoxon on time judgment and tempo, p < 0.02 for all pairs with “0”; Wilcoxon on time judgment and stimuli presence, p = 8.939e-05, higher mean with stimulus). We also observe an effect of the visual stimulus specifically, with the same Wilcoxon test on visual stimulus presence (p = 0.041, higher mean with stimulus), but not of audio presence. Therefore, the effect of stimuli on time judgment in this subset is consistent with the full set regarding the type of stimuli present, but we lose the effect of the tempo. This time, no correlation is observed between time estimation and performance. However, we recover the time judgment correlations from the full set with Spearman tests on performance (p = 0.004, rho = 0.168), the signed time estimation error (p = 1.988e-04, rho = 0.215), and its absolute value (p = 4.20e-05, rho = 0.236). Finally, on confounding effects, we find a correlation between fatigue and time judgment, which is expected, yet we also see a correlation tendency between trial index and signed time estimation error through a Pearson test (p = 0.071, cor = 0.105). Still, the absence of correlation with performance indicates a proper subset division on the trial index.

Trials 21–30. This subset corresponds to each participant’s last ten trials, representing the end of the experiment and, thus, a phase where the participant is possibly tired or bored. Here, the investigation of performance suggests that when there are stimuli, the presence of a visual stimulus leads to worse task performance (Tukey HSD performance between stimuli modes “Both” and “AudioOnly”, p = 0.010, diff = 0.045; “VisualsOnly” and “AudioOnly”, p = 0.026, diff = 0.041; t-test on performance and visual presence, p = 1.449e-04, lower mean when stimulus is present). In this subset, nothing significant is observed in the relation between stimuli (type or tempo) and time perception (time estimation and judgment). As in the previous subset (Trials 11–20), no correlation between time estimation and performance is observed. The Spearman tests relating time judgment to both performance and time estimation give results similar to the subset at the beginning of the experiment (Trials 1–10): performance is positively correlated (p = 0.002, rho = 0.176) and time estimation error is negatively correlated (p = 4.795e-04, rho = 0.202), but the absolute error is not. Regarding confounding effects, while we find effects of fatigue on time judgment as expected, we unfortunately also see tendencies for effects of the trial index on both time estimation error (p = 0.067, cor = 0.107) and task performance (p = 0.073, cor = 0.104) from Pearson tests. This may indicate a transition between phases of boredom and tiredness relative to the time spent in the experiment.

Fig. 4. Example trial subset allocation for a participant by fatigue.

4.4 Fatigue Subsets

Having obtained the results on the confounding effects of fatigue described in Sect. 4.2, we decided to investigate two subsets depending on the participants’ answers on fatigue, one for fatigue at 1 or 2, and one for fatigue at 3, 4, or 5 (see Fig. 4). As with the previous subsets, for each subset we performed all the tests run on the complete trial set, which can be found online [20], with the same modification of the variable normalization process: only the targeted subset is considered when using the sum of data on trials. In the following sections, we will again focus exclusively on significant test results and will not re-elaborate the methodology.

Fatigue Levels 1-2. This subset corresponds to the participant experiencing “low” fatigue. First, concerning performance and stimuli tempo, we observe lower performance for stimuli with a tempo of 100 bpm (Tukey HSD on performance and tempo between 140 and 100 bpm, p = 0.056, diff = 0.065; 180 and 100 bpm, p = 0.020, diff = 0.081). This finding aligns with results from the full set (performance dependent on tempo for audio stimuli) and trials 11–20 (180 bpm leading to better performance under combined stimuli). No general effect of stimuli type on performance is observed, either between the specific stimuli conditions or from the presence of a modality. This subset yielded no significant insights regarding stimuli dimensions (type and tempo) and time estimation. Regarding time judgment and tempo, however, we observe significant differences between 180 and 100 bpm across all stimuli (paired Wilcoxon on time judgment and tempo between 100 and 180 bpm, p = 0.023) as well as an effect of the presence of 180 bpm (Wilcoxon between 0 and 180, p = 0.025). When considering only trials with combined stimuli, the paired Wilcoxon shows a significant difference between 180–100 (p = 0.017) and a tendency between 100–140 (p = 0.094), which is consistent with the time judgment results on the complete set of trials. As for time judgment and stimuli type, we observe another result consistent with the full set, as trials tend to be judged as passing faster when there is any stimulus (Wilcoxon on the presence of any stimulus, p = 0.073) or at least a visual one (p = 0.020). Looking for correlations between performance, time estimation, and time judgment yielded results similar to the subset of trials 1–10: a tendency toward a negative correlation (p = 0.059, cor = 0.141) from the Pearson test between time estimation error and performance, a significant negative correlation (p = 0.022, rho = 0.171) from the Spearman test between time estimation error and time judgment, a positive one (p = 0.033, rho = 0.157) between performance and time judgment, and no correlations involving the absolute time estimation error. Confounding effects of trial index on performance (Pearson test, p = 0.086, cor = 0.129) are similar to trials 1–10, which is not surprising as early trials are probably low-fatigue trials. A confounding effect of fatigue is not observed for either time estimation or judgment; however, we do observe it for performance (Spearman test, p = 0.037, rho = 0.157; Tukey HSD (more of a t-test considering we have two values in this subset), p = 0.048, F = 3.977, diff2-1 = 0.046). We can assume that the higher-fatigue trials in this subset occur after the learning phase, when the participant is more proficient.

Fatigue Levels 3-4-5. This subset corresponds to the participant having a higher fatigue level. Concerning performance and stimuli type, as for the full set and trials 21–30, we observe a negative impact from the presence of visual stimuli (Tukey HSD on stimuli type and performance between “VisualsOnly” and “AudioOnly”, p = 0.080, diff = 0.024; t-test between performance and presence of visuals, p = 0.020). Some observations converge towards a contextual effect of tempo depending on the type of stimulus of the trial (Tukey HSD on absolute time estimation error between tempos 140 and 180 for audio trials, p = 0.086, diff = 0.1; Tukey HSD on signed time error between tempos 140 and 180 across all, p = 0.080, diff = 0.087). As for time judgment and stimuli, we only found evidence indicating an effect of general stimulus presence (paired Wilcoxon on stimuli type and time judgment, p < 0.07 for pairs involving “None”; Wilcoxon on time judgment and stimulus presence, p < 0.011; paired Wilcoxon on tempos, p = 0.032 for 0–140 and p = 0.040 for 0–180). Looking for correlations between performance, time estimation, and time judgment yielded results similar to the full set of trials. From Spearman tests with time judgment, we recover the negative correlation with the time estimation error (p = 2.217e-11, rho = 0.256), the positive correlation with the absolute error (p = 4.366e-04, rho = 0.136), and the positive correlation with performance (p = 3.915e-05, rho = 0.159). We do not recover the significant p-value on the Pearson test between performance and absolute error, only a near-tendency (p = 0.110, cor = 0.062). While we do not observe a confounding effect from the ANOVAs between fatigue and time estimation variables, we see a tendency (p = 0.089, rho = 0.066) from the Spearman test on the absolute time estimation error. The effect of fatigue is more pronounced on performance (Spearman test, p = 0.029, rho = 0.085), and from the subsequent Tukey HSD of the ANOVA (p = 0.064, F = 2.768) the difference appears to be between 3–5 (p = 0.050, diff = 0.023). Fatigue also seems to have a significant effect on time judgment (p = 2.185e-08, rho = 0.215). In this subset, fatigue levels 3 and 5 may differ significantly on both performance and time judgment; however, this is apparently due to the normalization on the subset and was not observable across all trials. Confounding effects of the trial index observed from Pearson tests are similar to those of the full set, which is not too surprising as the subset is rather large and was not made to minimize the effect of the index.

5 Discussion

We conducted a VR experiment where participants repeatedly performed a simple sorting task under different stimuli conditions, allowing us to gather a large amount of data, including information related to task proficiency, subjective data from questionnaires, and physiological data. The overarching goal was to explore relationships between time experience, task performance, environment/stimuli conditions, and physiological cues. However, we found that for some of the results on the complete data set, it was necessary to investigate multiple subsets more closely, which we will discuss together with their implications for VR application design. Looking at the entire data set, we can observe specific effects of stimuli type and tempo on different aspects of time perception and performance, as well as some interesting correlations between those variables, which also appeared to be heavily impacted by the trial index and fatigue throughout the experiment. Therefore, we defined subsets of data based on the trial index and on the group differences revealed by the ANOVAs for fatigue. As indicated in Table 1, we can observe effects of stimuli presence, type, and tempo on performance and time experience depending on the subset. Some of the data and correlations align between subsets while others do not, which may indicate contextual effects of stimuli on performance and time experience depending on task repetition and fatigue.

5.1 Observations on Task Performance and Stimuli

A central result from the analysis of the complete set of trials is that the presence of visual stimuli negatively impacts task performance. This is coherent with our previous study and is to be anticipated, as the task requires visual attention and those stimuli may be disturbing. However, within subsets, this result is observed only for later trials (index 21–30) and high fatigue (fatigue 3-4-5). Surprisingly, we see a positive effect on performance from the presence of audio, but only in the early trials (index 1–10), and no effect of stimuli type presence in between (index 11–20). This could be interpreted as the disturbance from the visual stimulus not being strong enough to matter when one is learning the task or not physically tired. We can also interpret the presence of audio stimuli as beneficial for this task only when the participant is in a learning phase. Effects of tempo are observed within trials with only the audio stimulus when considering all trials, and within trials with combined stimuli in the subset of trials 11–20. In both cases, the faster tempo led to better performance, which suggests that faster stimuli act as an invitation to go faster in the task; however, whether participants interpret them this way depends on the context.

5.2 Observations on Time Estimation and Stimuli

Time estimation variables are defined from the difference between participants’ (normalized) estimation of the time taken for a trial and the actual time of the trial; we thus talk about the time estimation error and its absolute value, which represents the magnitude of the error regardless of whether the participant under- or overestimated the length of a trial. A global effect of tempo can only be observed with the complete set of trials, between 180 and 140 bpm (with 180 being overestimated). As for differentiation within stimuli situations, we see a time estimation error difference on the audio stimuli for the entire trial set and for high-fatigue trials, and an absolute error difference in visuals for trials 11–20 as well as on combined stimuli for high fatigue. These results show a tendency of the 140 bpm tempo to lead to smaller (absolute) estimation errors and to be underestimated compared to 100 and 180 bpm. Another overestimating effect from stimulus presence is observed for trials 1–10. Overall, we also observe context-dependent effects of stimuli, as the type of stimuli will affect one’s time perception differently depending on the index or fatigue.
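The two time estimation variables can be illustrated with a small sketch. It assumes that the estimate and the actual duration are already expressed on the same (normalized) scale and that the signed error is computed as estimate minus actual, so that a negative value means underestimation; this sign convention and the function names are assumptions made for illustration, not definitions taken from Sect. 4.1.

```python
def signed_error(estimated, actual):
    """Signed time estimation error (assumed convention: estimate minus actual)."""
    return estimated - actual


def error_magnitude(estimated, actual):
    """Absolute error: how far off the estimate is, regardless of direction."""
    return abs(signed_error(estimated, actual))


def direction(estimated, actual):
    """Label a trial as under-, over-, or exactly estimated."""
    error = signed_error(estimated, actual)
    if error < 0:
        return "under"
    if error > 0:
        return "over"
    return "exact"
```

Under this convention, two trials with errors of equal size but opposite sign share the same magnitude and differ only in direction, which is why the signed and absolute variables can correlate differently with performance or time judgment.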

5.3 Observations on Time Judgment and Stimuli

Time judgment, or time passage, refers to a participant’s subjective evaluation of whether a trial is going by fast or slow. It differs from time estimation in that the participant gives their subjective feeling about the time spent, whereas time estimation is the participant’s attempt to be objective about time. Time judgment shows a fairly consistent result: the presence of any stimulus induces a perception of time passing faster. This is observed across all trials, for both fatigue subsets, and for the subset of trials 11–20. We can also observe a specific accelerating effect of visual stimuli in all of these sets except the high-fatigue one. The absence of these observations in the subsets at the beginning and at the end of the experiment might indicate that participants first need to get used to the stimuli and, over time, become too used to them for their presence to be noticeable, regardless of fatigue levels. Another effect, observed only on the complete set and for low fatigue, is a difference between tempos, both in general and within trials with combined stimuli.

5.4 Time Judgment, Time Estimation, and Performance Balance

Two correlations were consistent across all sets: a negative correlation between time estimation error and time judgment, and a positive correlation between performance and time judgment. The first means that underestimation of time is reflected in a subjectively faster trial, and the second means that when participants rated a trial as faster than usual, they performed better. This could be directly tied to the notion of flow, as two elements of flow states are the challenge-skill balance and time transformation. The agreement between time estimation error and time judgment indicates that our time transformation was a general shift in time experience and not a side effect of disorientation (i.e., a participant judging a trial as fast because they thought more time had passed than actually had). Among low-fatigue and early trials, we also find a negative correlation between time estimation error and performance, reinforcing the flow approach. As for the time estimation error magnitude and performance, they are negatively correlated under all trials and high fatigue, which means that, possibly in a specific context, higher time transformation was generally detrimental to performance. However, this goes against the flow definition, and combined with previous observations, it may imply that we are approaching flow states only with time transformations that are an underestimation. We also observed positive correlations between this magnitude and time judgment with all trials, the 11–20 subset, and the high-fatigue subset, which could be interpreted as the presence of any time transformation potentially leading to faster time passage in general.

5.5 Limitations

It is important to remember that the effect of the rhythmic stimuli in our experiment is contextualized in the particular scenario of our sorting task. The confounding effects of task familiarity and fatigue impose further limits: even with the subsets, which unfortunately imply using fewer data and thus having lower statistical power (especially in the case of low fatigue), we can isolate at most one confounding effect at a time, not both. Individual per-participant differences are also to be considered: through casual talks with the participants, we know of varying degrees of VR experience among them; however, this information was not recorded and is thus not included in our analysis.

6 Conclusion

In this paper, we used a simple sorting task to explore how rhythmic stimuli affect time experience and task performance in VR. We found that the context concerning the trial index (repetition of the action) and fatigue affected these aspects of the user experience. Depending on the familiarity with the task, the presence of a particular type of rhythmic stimulus under possible tempos will affect either performance or time experience. Both aspects can contribute significantly to a flow experience or even well-being in general, and the results of this study can thus inform the design of future interactive VR applications.

While the familiarity or repetition of a task or action can be easily assessed in any interactive application, using fatigue as a modulator could be a growing opportunity for VR developers as newer HMDs incorporate advanced sensors, e.g., for eye tracking. We observed effects of rhythmic stimuli under some fatigue and task familiarity, yet the more important finding is the presence of effect variation rather than the specific effect itself, highlighting the need for studies of time perception concerning context- and subject-dependent time modulations.