
1 Introduction

The genuine power of virtual reality (VR) lies not necessarily in producing a faithful reproduction of “reality” but in offering the possibility to step outside the normal bounds of reality and realize goals in totally new and unexpected ways [1]. One such use of VR is the third-person perspective experience. In the third-person perspective, the body is presented away from the actual body, enabling people to look at their own full body. Previous studies have reported that the third-person perspective is better than the first-person perspective with regard to motion (e.g., walking, catching a ball) and spatial awareness [2,3,4]. Indeed, several technologies for capturing an external image of one’s own body have been developed using a head-mounted display (HMD), a drone, or augmented reality glasses [5,6,7]. On the other hand, VR offers benefits in safety, time, space, equipment, cost efficiency, and ease of documentation for rehabilitation or motor learning [8]. An effective design of the third-person perspective in VR may therefore open up a new path for rehabilitation or training that reflects the full body while moving or holding a posture.

In realizing the third-person perspective in VR, one important factor might be self-cognition of the virtual body presented in VR. The sense of ownership (often called body ownership) and the sense of agency are fundamental factors in self-cognition [9,10,11]. Ownership refers to one’s self-attribution of a body [9, 12]. Agency refers to global motor control, including the subjective experience of action, control, intention, and motor selection and the conscious experience of will [13]. Research has shown the importance of ownership and agency for self-avatars in VR. For example, ownership is associated with physiological responses and cognitive and behavioral changes when using a virtual body [14,15,16,17]. Agency, in turn, is related to motor control [18]. In addition, ownership and agency may be the elements that make an avatar feel like one’s own biological body, i.e., embodiment [19]. Indeed, ownership and agency are associated with the effects of VR training on physiological, cognitive, and neural changes, similar to those of actual exercise [20].

Previous VR studies have reported that people can feel as if a virtual body is their own body, i.e., experience ownership and agency for the virtual body, in the first-person perspective [3, 15, 16]. The full-body illusion (FBI) is a well-known method for eliciting ownership of a full body. In the FBI, ownership is elicited by simultaneously and synchronously stroking or touching a mannequin or virtual avatar and the actual body with a rod or brush [15, 22]. Agency, in addition to ownership, is elicited by synchronous movement of the virtual body and the actual body [3, 14, 21]. Thus, visuo-tactile or visuo-motor multimodal feedback is key to eliciting ownership of a mannequin or virtual avatar presented in the first-person perspective. In contrast to these consistent first-person perspective results, how to elicit ownership and agency for an avatar presented in the third-person perspective remains unclear.

Ownership of a virtual body presented in the third-person perspective was first studied by Lenggenhager et al. [23], who reported that participants experienced ownership of a virtual body placed 2 m away from their own body, outside peripersonal space. However, follow-up studies of Lenggenhager et al. [23] reported different results: ownership could not be elicited for a mannequin or virtual body presented in the third-person perspective and could be elicited only in peripersonal space [24,25,26,27]. Conversely, other studies argue that ownership can be elicited in the third-person perspective [3, 14, 29]. These contradictory results may be due to differences in the type of multisensory synchrony (visuo-tactile or visuo-motor) between the mannequin or virtual body and the actual body. Most visuo-tactile studies in the third-person perspective, except for Lenggenhager et al. [23], have reported that ownership is not elicited by visuo-tactile synchrony [24,25,26,27], whereas most visuo-motor studies have reported that ownership is elicited by visuo-motor synchrony [3, 14, 21, 28]. Thus, whether ownership is elicited for a virtual body or mannequin might depend on the type of multisensory synchrony. In the study by Lenggenhager et al. [23] and in some studies under very similar conditions [29, 30], participants watched, through a head-mounted display, a real-time video image of their own body presented in the third-person perspective and saw this body being stroked in synchrony with their actual body. In this paradigm, subtle movements of the actual body were reflected in the video image. This visuo-motor synchrony between the video image and the actual body, in addition to the visuo-tactile synchrony induced by synchronous stroking, could have affected ownership of the body presented in the third-person perspective.
On the other hand, the visuo-motor synchrony between the actual and artificial body used in previous studies has involved either visuo-motor synchronization alone (e.g., moving a hand) or visuo-motor synchronization plus tactile synchronization (e.g., walking). Participants receive synchronized visuo-motor information while moving their hand, and synchronized visuo-motor information plus tactile information when their feet hit the ground while walking. However, previous studies did not directly compare these types of multisensory synchrony. Therefore, it remains unclear whether synchronized visuo-motor information alone or visuo-motor plus tactile information is critical for eliciting ownership through visuo-motor synchrony between the actual and artificial body.

Unlike ownership, agency is experienced for a virtual body presented in the third-person perspective under visuo-motor synchrony [3, 21]. However, it remains unclear whether “visuo-tactile” or “visuo-motor plus visuo-tactile” information enhances agency for an avatar presented in the third-person perspective compared with “visuo-motor” synchrony without tactile feedback.

To answer these questions, our study investigated the effect of multimodal presentation on ownership and agency for a male avatar presented in the third-person perspective in VR. We modified and applied the FBI paradigm and compared the effect of the modality condition (visuo-tactile, visuo-motor, and visuo-motor-tactile) on ownership and agency. Understanding how to elicit ownership and agency for a third-person perspective avatar in VR could contribute to the development of new rehabilitation and training methods, as well as novel VR content.

Several studies have examined the effect of synchrony on ownership and agency. First-person perspective studies of ownership that compared synchronous and asynchronous visual and motor stimuli found that ownership was not elicited under asynchrony [3, 14, 16, 21]. Other studies examined ownership of an avatar presented in the first-person perspective with synchronous and asynchronous visual and tactile stimuli and likewise reported that ownership was not elicited under asynchrony [15, 22]. Because ownership was not elicited under asynchrony even in the first-person perspective, which is more likely to elicit ownership, synchrony is likely a fundamental condition in the third-person perspective as well. Therefore, in this study, we aimed to determine which synchronous presentation condition (visuo-motor, visuo-tactile, or visuo-motor-tactile) would elicit ownership when visual, motor, and tactile stimuli are presented synchronously. Agency, for its part, is elicited by synchronous but not by asynchronous visuo-motor presentation [3, 14, 15, 21, 22]. Therefore, our study did not compare synchronous and asynchronous multimodal presentation conditions for agency. This study examined the difference between the visuo-motor and visuo-motor-tactile synchrony conditions for ownership and agency.

2 Methods

2.1 Participants

A total of 23 men (mean age = 22.39, SD = 1.81) and 20 women (mean age = 22.4, SD = 1.29) participated in this study. Gender groups were defined according to the gender assigned at birth. All participants had normal or corrected-to-normal vision. Data from three male participants were excluded from the analysis: two had a poor understanding of the experimental procedure, and one received no vibration from the haptic device. Thus, 20 men (mean age = 22.2, SD = 1.62) and 20 women (mean age = 22.4, SD = 1.29) were included in the subsequent analysis. All participants were recruited through an advertisement published by the National Institute of Advanced Industrial Science and Technology and were from Tsukuba City, Japan. Written informed consent was obtained from all participants prior to the experiment, and they were paid for their participation (3300 JPY). The experiment was approved by the Ethics Board of the National Institute of Advanced Industrial Science and Technology.

A post-hoc power analysis was conducted to confirm that the number of participants was appropriate for this experimental design. The analysis was performed using G*Power and yielded a power of 0.94, with the following parameters: statistical test = ANOVA: repeated measures, within-between interaction; effect size f = .25; α = .05; total sample size = 40; number of groups = 2; number of measurements = 3; correlation among repeated measures = .5; nonsphericity correction ε = 1.
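The reported power can be re-derived from the noncentral F distribution. The sketch below is a minimal, standard-library-only Python illustration. It assumes that G*Power parameterizes the within-between interaction test with noncentrality λ = f²·N·m·ε/(1 − ρ), numerator df (k − 1)(m − 1)ε, and denominator df (N − k)(m − 1)ε; this parameterization is our assumption rather than something stated above. The helper names (`betainc`, `ncf_cdf`, `f_critical`, `mixed_interaction_power`) are hypothetical, and the incomplete beta function is evaluated with a standard continued-fraction expansion instead of a library routine.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (
            m * (b - m) * x / ((qam + m2) * (a + m2)),           # even term
            -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2)),  # odd term
        ):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def ncf_cdf(x, dfn, dfd, nc):
    """CDF of the noncentral F distribution as a Poisson mixture of beta CDFs."""
    y = dfn * x / (dfn * x + dfd)
    half = nc / 2.0
    weight = math.exp(-half)  # Poisson probability of j = 0 with mean nc / 2
    total = 0.0
    for j in range(1000):
        total += weight * betainc(dfn / 2.0 + j, dfd / 2.0, y)
        weight *= half / (j + 1)
        if j > half and weight < 1e-15:
            break
    return total


def f_critical(alpha, dfn, dfd):
    """Upper-alpha critical value of the central F distribution, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if ncf_cdf(mid, dfn, dfd, 0.0) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def mixed_interaction_power(f, n_total, k_groups, m_measures, rho, eps=1.0, alpha=0.05):
    """Power of the within-between interaction F test (assumed G*Power-style parameters)."""
    lam = f ** 2 * n_total * m_measures * eps / (1.0 - rho)
    dfn = (k_groups - 1) * (m_measures - 1) * eps
    dfd = (n_total - k_groups) * (m_measures - 1) * eps
    crit = f_critical(alpha, dfn, dfd)
    return 1.0 - ncf_cdf(crit, dfn, dfd, lam), crit, lam


power, crit, lam = mixed_interaction_power(f=0.25, n_total=40, k_groups=2, m_measures=3, rho=0.5)
print(f"lambda = {lam:.1f}, F_crit = {crit:.3f}, power = {power:.3f}")
```

With the parameters listed above, this gives λ = 15, a critical value of about 3.12 for F(2, 76), and a power close to the reported 0.94; any small discrepancy would reflect the numerical method rather than the design.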

2.2 Apparatus

An HMD, trackers, and a haptic device were used in this experiment, controlled by Unity 3D on a VR-ready PC (Alienware 15, Dell, Texas, USA). The HMD (HTC VIVE PRO, HTC, New Taipei, Taiwan; resolution 1440 × 1600 pixels per eye) was used to display three-dimensional images and for head tracking. Five trackers (VIVE Tracker, HTC, New Taipei, Taiwan) were used to track the participant’s body movements and reflect them in the virtual body’s movements in real time. The three-dimensional images were generated and presented by Xperigrapher [31], a platform created with Unity 3D (Unity Technologies, San Francisco, USA). The avatar was a brown-skinned Latin male.

A haptic device (310-113, Precision Microdrives, Brixton Road, England) was used to present vibration to the participant’s hand. An analog output device (NI-9264, National Instruments, Texas, USA) connected to the haptic device converted digital signals from the computer into analog signals, which were transmitted to the haptic device. When the white ball came into contact with the avatar’s right hand in the virtual space, participants received vibration on their right hand from the haptic device (except in the visuo-motor condition, as described below).
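The contact rule above can be summarized as a per-frame decision. The following Python sketch is hypothetical: it treats the avatar’s right hand as a single point, uses the 60 cm ball diameter from the Procedure as a 0.30 m contact radius, and returns a boolean instead of driving the NI-9264 output, whose actual control code is not described here. The names `Condition`, `hand_touches_ball`, and `vibration_command` are illustrative only.

```python
import math
from enum import Enum


class Condition(Enum):
    VISUO_TACTILE = "visuo-tactile"
    VISUO_MOTOR = "visuo-motor"
    VISUO_MOTOR_TACTILE = "visuo-motor-tactile"


# Assumption: the 60 cm ball is a sphere, so contact means distance <= 0.30 m.
BALL_RADIUS_M = 0.30


def hand_touches_ball(hand_pos, ball_pos, radius=BALL_RADIUS_M):
    """True if the avatar's right-hand point lies on or inside the ball."""
    return math.dist(hand_pos, ball_pos) <= radius


def vibration_command(condition, hand_pos, ball_pos):
    """Decide whether the haptic device should vibrate on this frame.

    Contact triggers vibration in the visuo-tactile and visuo-motor-tactile
    conditions; the visuo-motor condition never delivers vibration.
    """
    if condition is Condition.VISUO_MOTOR:
        return False
    return hand_touches_ball(hand_pos, ball_pos)


hand = (0.40, 1.30, 2.00)  # hypothetical positions in metres
ball = (0.40, 1.45, 2.00)
print(vibration_command(Condition.VISUO_MOTOR_TACTILE, hand, ball))  # True
print(vibration_command(Condition.VISUO_MOTOR, hand, ball))          # False
```

A real implementation would likely trigger one vibration pulse on contact onset rather than on every frame of contact; that refinement is omitted here.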

The experimental setup is shown in Fig. 1.

2.3 Procedure

Participants wore the HMD and had five trackers attached, one each to both wrists, both feet, and the abdomen; a haptic device was attached to the right hand. Before starting the experiment, the virtual body was adjusted to each participant’s body while the participant wore the HMD, trackers, and haptic device. The length and thickness of the arms, feet, and torso and the posture of the virtual body were adjusted automatically with Final IK (a Unity asset), and the orientation of the trackers was adjusted manually. Participants then looked at the virtual body from the first-person perspective in a virtual mirror and checked whether it matched their own body in terms of the length and thickness of body parts and posture. If it did not match, the virtual body was adjusted again.

At the beginning of each trial, the virtual body was presented 2.0 m in front of the participant and moved in synchrony with the participant’s body. Two parallel horizontal black lines were presented diagonally in front of and to the right of the virtual body: the upper line at a height of 1.3 m and the bottom line at a height of 1.0 m (Fig. 2). Participants were instructed to reach above the line with their right hand while watching the virtual body do the same.

There were three conditions. In the visuo-tactile condition, participants were required to look at the virtual body without moving. A white ball (60 cm in diameter) appeared on the bottom line and rose to the upper line. When it reached the upper line, the white ball disappeared and vibration was presented to the participant’s right hand by the haptic device (Fig. 3). In the visuo-motor condition, participants were required to move the virtual body’s right hand once back and forth between the bottom and upper lines using their own right hand every time the white ball was presented (Fig. 4). In the visuo-motor-tactile condition, participants were required to move the virtual body’s right hand once between the upper line and the bottom line using their own hand every time the white ball was presented. When the virtual body’s hand reached the bottom line, the white ball disappeared and vibration was presented to the participant’s right hand by the haptic device (Fig. 5). The timing and sequence of the visual, tactile, and motor (participant movement) events in each condition are shown in Figs. 3, 4, and 5.

Each participant completed each of the three multimodal feedback conditions (visuo-tactile, visuo-motor, and visuo-motor-tactile) once in a block design, with 10 trials per condition block. The order of the condition blocks was counterbalanced across participants. At the end of each condition block, participants took off the HMD and responded verbally to the questionnaire, which included items in four categories (ownership, ownership control, agency, and agency control), each consisting of three questions. For the statistical analysis, the mean and variance across the 40 participants were calculated for 12 cells (3 feedback conditions × 4 categories). The total duration of the experiment was approximately 60 min.
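The block structure can be sketched as a schedule generator. The source states only that the order of the three condition blocks was counterbalanced, not how; the sketch below assumes a hypothetical full counterbalancing scheme that cycles participants through all six possible block orders. The names `BLOCK_ORDERS`, `block_order`, and `session_schedule` are illustrative.

```python
from collections import Counter
from itertools import permutations

CONDITIONS = ("visuo-tactile", "visuo-motor", "visuo-motor-tactile")
TRIALS_PER_BLOCK = 10

# All 3! = 6 possible block orders (hypothetical full counterbalancing).
BLOCK_ORDERS = list(permutations(CONDITIONS))


def block_order(participant_index):
    """Assign participants to the six block orders in rotation."""
    return BLOCK_ORDERS[participant_index % len(BLOCK_ORDERS)]


def session_schedule(participant_index):
    """List of (block_number, condition, trial_number); a questionnaire follows each block."""
    schedule = []
    for block_number, condition in enumerate(block_order(participant_index), start=1):
        for trial in range(1, TRIALS_PER_BLOCK + 1):
            schedule.append((block_number, condition, trial))
    return schedule


# How often each order is used across 40 participants.
counts = Counter(block_order(i) for i in range(40))
print(sorted(counts.values()))  # [6, 6, 7, 7, 7, 7]
```

Because 40 is not a multiple of 6, a rotation like this uses four orders seven times and two orders six times; a Latin-square design with three orders would be one alternative.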

Fig. 1.

Illustration of the system architecture

3D images were generated by the PC and presented on the head-mounted display. The participant’s movements were tracked using five trackers. Vibration was presented by a haptic device.

Fig. 2.

The participant in real world and the presented avatar in virtual reality

The avatar is presented 2 m away from the viewing position. The participant can view the avatar from the viewing position and move it. When the avatar touches an object (the white ball), vibration is presented to the participant’s right hand through a haptic device in the visuo-tactile and visuo-motor-tactile conditions, but not in the visuo-motor condition.

Fig. 3.

Visuo-tactile condition. The phases show the timing and sequence of the visual, tactile, and motor (participant movement) events. Visual: view through the head-mounted display; Motor: participant’s posture; Tactile: vibration presented through the haptic device attached to the participant’s right hand. Participants were asked to remain still and look at the virtual body. The white ball appears on the bottom black line, rises to the upper black line, and disappears. Participants received vibration on their right hand through the attached haptic device when the virtual body’s right hand touched the ball.

Fig. 4.

Visuo-motor condition. Participants manipulate the virtual body’s right hand with their own right hand while looking at the virtual body. The white ball appears on the bottom black line and disappears soon after the virtual body’s hand returns to its original position. In this condition, participants did not receive vibration on their right hand through the attached haptic device.

Fig. 5.

Visuo-motor-tactile condition. Participants manipulate the virtual body’s right hand with their own right hand while looking at the virtual body. Participants received vibration on their right hand through the attached haptic device when the virtual body’s right hand touched the ball (the ball disappeared at this moment).

2.4 Questionnaire

We used a 12-item questionnaire adapted from the statements of Kalckert and Ehrsson (2014) [32] (Table 1).

The 12 items spanned four categories: ownership, agency, ownership control, and agency control, with three items each. Items in the ownership category concerned the feeling of ownership (e.g., I felt as if I looked at my body); items in the agency category concerned agency (e.g., I felt as if I could control the movement of the virtual body); items in the ownership control category did not concern the feeling of ownership (e.g., It seems as if I had more than one body); and items in the agency control category did not concern the feeling of agency (e.g., I felt as if the virtual body was controlling my will). The ownership and agency categories were used to measure ownership and agency, whereas the ownership control and agency control categories were used to control for task compliance and suggestibility. Participants responded to each statement by choosing a number on a 7-point Likert scale ranging from −3 (“strongly disagree”) to 3 (“strongly agree”), with 0 indicating “uncertain.” The order of the statements was randomized across trials and participants. We calculated the mean score of each category to compare conditions and genders in the FBI.

Following previous studies [33, 35,36,37], we defined participants as experiencing ownership or agency when the group-level mean score of the corresponding category was equal to or greater than 1, indicating that participants at the group level affirmed the experience of ownership or agency.

Table 1. Ownership and agency questionnaire applied to each condition.

2.5 Statistical Analysis

Questionnaire scores were calculated for each category (ownership, ownership control, agency, and agency control) by averaging its three item scores, and these category scores were used for further statistical analysis. Thus, each participant provided 3 feedback conditions × 4 categories × 3 items = 36 ratings, which were reduced to 12 category scores (3 feedback conditions × 4 categories); the mean and variance across the 40 participants were calculated for each of these 12 cells.
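The scoring and affirmation rule described in the Questionnaire and Statistical Analysis subsections can be sketched as follows. This is a minimal Python illustration; the function names are hypothetical, and the ratings in the usage example are invented for demonstration and are not data from this study.

```python
from statistics import mean

CATEGORIES = ("ownership", "ownership_control", "agency", "agency_control")
AFFIRM_THRESHOLD = 1.0  # a group mean >= 1 counts as "affirmed"


def category_scores(ratings):
    """Average the three 7-point ratings (-3..3) per category for one participant and condition."""
    scores = {}
    for category in CATEGORIES:
        items = ratings[category]
        if len(items) != 3 or any(not -3 <= r <= 3 for r in items):
            raise ValueError(f"{category}: expected three ratings in [-3, 3], got {items}")
        scores[category] = mean(items)
    return scores


def group_summary(participant_scores, category):
    """Group mean for one category and whether it meets the affirmation threshold."""
    group_mean = mean(s[category] for s in participant_scores)
    return group_mean, group_mean >= AFFIRM_THRESHOLD


# Invented ratings for two participants in one condition.
p1 = category_scores({"ownership": [2, 1, 1], "ownership_control": [-2, -1, -3],
                      "agency": [2, 2, 1], "agency_control": [-2, -2, -1]})
p2 = category_scores({"ownership": [1, 0, 2], "ownership_control": [-1, -1, -1],
                      "agency": [3, 2, 1], "agency_control": [-3, -1, -2]})
print(group_summary([p1, p2], "ownership"))          # mean of 4/3 and 1, about 1.17: affirmed
print(group_summary([p1, p2], "ownership_control"))  # mean -1.5: not affirmed
```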

To assess the effects of modality condition and gender, these were entered as factors in a two-way mixed-design ANOVA, with condition as the within-subject factor and gender as the between-subject factor. All p-values in multiple comparisons were Holm-corrected.
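Holm correction is a step-down procedure: the k-th smallest of m p-values is multiplied by (m − k + 1), capped at 1, and the adjusted values are then forced to be non-decreasing. The function below is a minimal pure-Python sketch of that procedure, not the software actually used for the analysis (which is not specified); libraries such as statsmodels provide the same adjustment through `multipletests(..., method="holm")`. The example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest p-value is multiplied by m - rank, capped at 1 ...
        candidate = min(1.0, (m - rank) * p_values[idx])
        # ... and adjusted values may never decrease along the sorted order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three pairwise condition comparisons.
print(holm_adjust([0.01, 0.04, 0.03]))  # approximately [0.03, 0.06, 0.06]
```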

To check for order effects, we conducted an analysis of variance with block order as a factor. The order effect was not significant for any of the categories (ownership: F (2, 78) = 1.90, p = .156, η2 = .018; ownership control: F (2, 78) = .421, p = .668, η2 = .003; agency: F (2, 78) = 1.06, p = .350, η2 = .002; agency control: F (2, 78) = .34, p = .672, η2 = .004).

3 Results

3.1 Ownership

Male participants affirmed experiencing ownership of the virtual body in the visuo-motor-tactile condition (mean category score = 1.23, i.e., ≥ 1), but male participants in the visuo-tactile and visuo-motor conditions and female participants in all conditions did not (visuo-tactile condition in male participants: mean = 0.58; visuo-motor condition in male participants: mean = 0.75; visuo-tactile condition in female participants: mean = −0.33; visuo-motor condition in female participants: mean = 0.58; visuo-motor-tactile condition in female participants: mean = 0.15) (Fig. 6).

Repeated-measures ANOVA revealed significant main effects of condition and gender (condition: F (2, 76) = 7.31, p = .008, η2 = .06; gender: F (1, 38) = 7.82, p = .001, η2 = .11). Therefore, multiple comparisons were conducted between conditions. The ownership scores in the visuo-motor and visuo-motor-tactile conditions were significantly larger than that in the visuo-tactile condition (visuo-motor condition vs visuo-tactile condition: t (19) = 2.67, p = .002; visuo-motor-tactile condition vs visuo-tactile condition: t (19) = 3.30, p = .012), and there was no significant difference between the visuo-motor and visuo-motor-tactile conditions (t (38) = .22, p = .83). A significant condition × gender interaction was also revealed (F (2, 76) = 4.25, p = .02, η2 = .03). Post-hoc multiple comparisons in male participants revealed that the visuo-motor-tactile condition score was larger than the visuo-tactile and visuo-motor condition scores (visuo-motor-tactile condition vs visuo-tactile condition: t (19) = 3.82, p = .03; visuo-motor-tactile condition vs visuo-motor condition: t (19) = 3.00, p = .014).

Post-hoc multiple comparisons on female participants revealed that the visuo-motor condition score was larger than the visuo-tactile and visuo-motor-tactile condition scores (visuo-motor condition vs visuo-tactile condition; t (19) = 2.85, p = .031, visuo-motor condition vs visuo-motor-tactile condition; t (19) = 2.67, p = .031).

3.2 Ownership Control

Participants did not affirm ownership control (the average score of ownership control < 1) across the three conditions, irrespective of gender (visuo-tactile condition in male participants: mean = −1.45, visuo-motor condition in male participants: mean = −1.40, visuo-motor-tactile condition in male participants: mean = −1.10, visuo-tactile condition in female participants: mean = −1.61, visuo-motor condition in female participants: mean = −1.42, visuo-motor-tactile condition in female participants: mean = −1.48) (Fig. 6).

Repeated-measures ANOVA revealed no significant main effect of condition or gender and no significant interaction (condition: F (2, 76) = 1.75, p = .18, η2 = .01; gender: F (1, 38) = 0.52, p = .48, η2 = .01; condition × gender: F (2, 76) = 1.02, p = .37, η2 = .01).

3.3 Agency

Participants in the visuo-motor and visuo-motor-tactile conditions experienced agency for the virtual body (the average agency score ≥ 1), irrespective of gender (visuo-motor condition in male participants: mean = 1.5; visuo-motor-tactile condition in male participants: mean = 1.78; visuo-motor condition in female participants: mean = 1.42; visuo-motor-tactile condition in female participants: mean = 1.22), but those in the visuo-tactile condition did not, irrespective of gender (visuo-tactile condition in male participants: mean = 0.22; visuo-tactile condition in female participants: mean = −0.20) (Fig. 7).

Fig. 6.

Questionnaire results for the main effect of condition on ownership and ownership control. Error bars show standard errors. (Left: ownership; Right: ownership control)

Repeated-measures ANOVA revealed a significant main effect of condition (F (2, 76) = 26.99, p < .0001, η2 = .29). Therefore, a multiple comparison analysis was conducted. The visuo-motor and visuo-motor-tactile condition scores were significantly larger than the visuo-tactile condition scores (visuo-motor condition vs visuo-tactile condition; t (38) = 5.61, p < .0001; visuo-motor-tactile condition vs visuo-tactile condition; t (38) = 5.27, p < .0001). There was no significant difference between the visuo-motor and visuo-motor-tactile conditions (t (38) = .35, p = .72).

No main effect of gender or interaction was revealed (gender: F (1, 38) = 2.48, p = .12, η2 = .02; condition × gender: F (2, 76) = .57, p = .56, η2 = .01).

3.4 Agency Control

Participants did not affirm agency control (the average agency control score < 1) in any of the three conditions, irrespective of gender (visuo-tactile condition in male participants: mean = −1.53, visuo-motor condition in male participants: mean = −1.73, visuo-motor-tactile condition in male participants: mean = −1.45, visuo-tactile condition in female participants: mean = −1.35, visuo-motor condition in female participants: mean = −1.48, visuo-motor-tactile condition in female participants: mean = −1.58) (Fig. 7).

Repeated-measures ANOVA revealed no significant main effect of condition or gender and no significant interaction (condition: F (2, 76) = .46, p = .63, η2 = .004; gender: F (1, 38) = .15, p = .70, η2 = .002; condition × gender: F (2, 76) = .75, p = .45, η2 = .01).

Fig. 7.

Questionnaire results for the main effect of condition on agency and agency control. Error bars show standard errors. (Left: agency; Right: agency control)

4 Discussion

This study assessed the effect of multimodal presentations (visuo-tactile, visuo-motor, and visuo-motor-tactile) on ownership and agency for a male avatar presented in the third-person perspective in VR. Results showed that ownership was elicited in the visuo-motor-tactile condition only for the male group and agency was elicited in the visuo-motor and visuo-motor-tactile conditions for both groups.

This study used only synchronous multimodal presentation because previous studies had reported that synchrony is a fundamental condition for eliciting both ownership and agency. Ownership is elicited by synchronous, but not by asynchronous, multimodal presentation of visuo-tactile or visuo-motor stimuli [3, 14, 15, 21, 22]. Similarly, agency is elicited by synchronous, but not by asynchronous, multimodal presentation of visuo-motor stimuli [3, 14, 15, 33]. In the following sections, we discuss ownership, agency, and the relationship between them.

4.1 Ownership

We found that participants affirmed ownership in the visuo-motor-tactile condition in the male group. This finding is inconsistent with reports that ownership was elicited only in the first-person perspective [24,25,26,27] and supports the notion that a lack of information can be compensated for by other information, as suggested by Ma and Hommel (2015) [38]. In the FBI, ownership can be elicited merely by looking at the virtual body or mannequin in the first-person perspective, i.e., under visuo-proprioceptive feedback [25]. In contrast, a virtual body presented in the third-person perspective does not provide congruent visual and proprioceptive feedback about the body. Therefore, for participants to judge whether the body is their own, additional information, such as multimodal visual, motor, and tactile feedback, might be needed to compensate for the missing visuo-proprioceptive information about the virtual body.

This result is inconsistent with two previous studies showing that the availability of synchronous information does not increase ownership [39, 40]. One reason could be a difference in the manner of tactile presentation (self-touch vs goal-directed touch). In this study, tactile feedback was presented as goal-directed touch, which could increase ownership [41, 42]. A ceiling effect could also explain this difference: in those studies [39, 40], actual images of the participants themselves were presented from a third-person perspective, and personalized avatars can enhance ownership over a virtual body [43].

The fact that ownership was elicited only in the male group might indicate gender differences in ownership of the male virtual body. In the female group, ownership of the virtual body was not elicited in any condition (the average ownership score was < 1), and the score was even negative in the visuo-tactile condition, unlike in the male group. The results suggest an effect of gender match with the virtual body. This finding seemingly contradicts reports that gender differences do not affect ownership in the first-person perspective [17, 22]. Tacikowski et al. [44] showed that ownership could be elicited for an opposite-gender body in the first-person perspective and that doing so could change one’s sense of one’s own gender. Their study differed from ours in that it used a first-person perspective and did not use VR. In addition, their study addressed the opposite causal direction: how the emergence of body ownership for a differently gendered body transforms gender identity (i.e., whether ownership affects gender identity). However, they also showed that the median ownership rating for the opposite-gender body was lower than that for the gender-matched body (supporting online material [44]). This result is consistent with the gender-match effect observed in this study. The effects seen in this study could be attributed to the following two reasons. First, the resemblance between the artificial and the real body is a top-down factor affecting ownership [43, 45]. The similarity with the male virtual body was less pronounced in the female group than in the male group, which might have affected ownership in the female group. 
The more realistic avatars used in this study, compared with those in previous studies [17, 22], might have made female participants more aware of the differences between the avatar and their actual bodies and made the tactile feedback feel unnatural. Second, there may also be factors beyond a simple lack of similarity. Schwind et al. [46] showed that female participants disliked male hands and felt less presence than male participants when using male hands in VR. The male virtual body used in this study may appear more masculine than the mannequins or avatars used in previous studies [17, 22], which might also have affected ownership in the female group.

Another possible influence on the sense of body ownership in this study is the avatar’s skin color. There are no consistent findings on how skin color affects ownership. Some studies [17, 47, 48] showed no difference in ownership between light-skinned and black-skinned avatars for white people. However, Farmer et al. [49] showed that ownership of a white hand was stronger for white people. Lire et al. [50] reproduced these results and found that ownership was elicited faster with a black hand in the synchronous condition. Beyond these conflicting results, previous studies have mainly focused on white people, and the impact of skin color on Asians has not been fully explored. The avatar used in this study has brown skin, which differs from that of most Japanese people. This might have weakened ownership irrespective of gender.

4.2 Agency

We found that participants affirmed agency (the average agency score was > 1) in the visuo-motor and visuo-motor-tactile conditions in both the male and female groups. These results are consistent with previous first-person perspective studies of agency using a rubber hand. Kalckert and Ehrsson [32, 33] showed that agency was elicited for a rubber hand that moved according to the actual hidden hand movement but not in the visuo-tactile condition. This bears on the validity of discriminating between the visuo-tactile and visuo-motor-tactile conditions. The results suggest that synchrony between visual stimuli and movement is a sufficient condition for agency.

The comparator model suggests that agency occurs when the sensory prediction and the actual sensory feedback match [51]. In this study, participants were explicitly asked to control the virtual body in the visuo-motor and visuo-motor-tactile conditions. Therefore, the predicted movement of the virtual body and the feedback of its actual movement could be closely matched, and agency was elicited. In the visuo-tactile condition, both the sensory prediction of moving the virtual body and the actual sensory feedback of its movement were absent; thus, agency was not elicited. The results show that agency was elicited equally, regardless of whether the participant’s gender matched that of the avatar.

However, agency is affected not only by this comparison but also by many other factors, such as goal achievement [52] or emotion [53]. Why, then, did tactile feedback and gender difference not affect agency? The two-step account of agency, which offers a perspective different from that of the comparator model, proposes that agency can be divided into a feeling of agency (non-conceptual, sensorimotor level) and a judgment of agency (conceptual level) [54]. The extent to which the feeling and the judgment of agency each contribute to overall agency depends on the context and task requirements. At the first stage, the sensory prediction is compared with afferent information such as proprioception and visual feedback; if no particular discrepancy is detected, agency arises without further processing. If a discrepancy is detected, further processing takes place: at the second stage, intentions, beliefs, and contextual cues are used to judge who the agent is. In the visuo-motor and visuo-motor-tactile conditions, it was obvious that the participants were the agents. Therefore, neither tactile feedback nor gender difference was likely to enter the process that gives rise to agency.

4.3 Ownership and Agency

The relationship between ownership and agency is a matter of debate. As mentioned in the Introduction and earlier in this Discussion, ownership is elicited mainly by multisensory synchrony, and agency by the match between intention and outcome. The two rely on largely independent mechanisms [33, 35, 55] but can affect each other under some circumstances [38, 56]. In this study, we did not directly investigate the relationship between ownership and agency, because the aim was to elucidate the independent effect of the manner of inducing the FBI on each of them. This relationship therefore remains unknown in this study. However, agency may have contributed to the higher ownership score in the visuo-motor-tactile condition compared with the other conditions in the male group.

We speculate that our findings may also have been affected by the peripersonal space. The peripersonal space is the space immediately surrounding one’s body; it plays a special role in interaction, because it is where one can act on objects, for example by grasping them [57]. This space may be closely associated with bodily ownership illusions such as the rubber hand illusion and the FBI [58, 59]. Some studies suggested that ownership in the rubber hand illusion or the FBI was elicited only in the peripersonal space [24,25,26,27]. Lloyd [61] showed that the rubber hand illusion was elicited only within the limits of the peripersonal space. Similarly, Guterstam et al. [62] showed that the magnetic touch illusion resembled the rubber hand illusion in that visuo-tactile integration occurred in the peripersonal space. Likewise, for the FBI induced by visuo-tactile synchrony, most studies showed that ownership could be elicited only in the peripersonal space [24,25,26,27]. On the other hand, the peripersonal space is not fixed. Studies have shown that motor behavior can widen the peripersonal space [37, 63, 64], that this process is modulated by agency [36], and that it can also occur for body parts spatially separated from the actual body [37, 64]. In our study, a peripersonal space enlarged by agency might have affected the full-body illusion. Thus, visuo-tactile synchrony within the enlarged peripersonal space extending toward the virtual body presented in the third-person perspective might have affected ownership in the visuo-motor-tactile condition. Conversely, in the visuo-tactile condition, the peripersonal space may have remained limited to the area surrounding the physical body, so ownership was not elicited despite visuo-tactile synchrony.

5 Limitations and Future Work

There are a few limitations to this study. First, it is unclear whether the influence of gender difference on avatar ownership stems solely from the biological gender of the user. In addition to appearance characteristics such as body shape, face, or skin color, self-concept might be related to ownership [65]. Therefore, the gender identity of the participant, rather than mere identification with the avatar’s appearance, could be related to ownership. For example, those who perceive themselves as more masculine might feel stronger ownership toward a male avatar, and vice versa. Second, we did not investigate whether and how ownership and agency influence training and rehabilitation in embodied VR using the third-person perspective. In tasks such as motor learning with an avatar presented in the third-person perspective, it should be examined whether increasing ownership or agency, that is, making the avatar’s body feel like one’s own, in turn enhances learning. Hülsmann et al. [66] demonstrated improved full-body motor learning using an avatar presented in the third-person perspective while measuring ownership of and agency over the avatar. However, they did not analyze the effects of ownership and agency on learning. Third, the visuo-motor-tactile condition in this study involved a goal-directed behavior: touching the ball. A visuo-motor-tactile condition could instead provide visuo-tactile and visuo-motor synchrony separately; for example, the virtual body could move with the participant while being touched at the same time as the actual body, as in the setup of Lenggenhager et al. [23]. The effect of this difference in the visuo-motor-tactile condition on ownership should be investigated. It may also be necessary to compare goal-directed behavior with and without touching the object in order to isolate the effect of goal-directed behavior.
Fourth, the avatar and the task had low fidelity for the participants, who were all Japanese. The task used in this study was very simple (touching a ball 10 times). Higher fidelity could make the task more ecologically valid and meaningful, and more complex movements and interactions with the environment could enhance ownership and agency. Further research is needed to determine whether similar results can be obtained with a high-fidelity avatar in more ecologically valid environments. Fifth, although 29 participants reported in the pre-experiment questionnaire that they had used VR before, none were particularly familiar with it. However, all participants were young adults familiar with digital communication technologies, and age could influence embodiment [67, 68]. Thus, future research should explore whether similar results can be obtained in different age groups. These points should be investigated in future research using a third-person perspective in VR.

Moreover, the questionnaire used in this study has limited accuracy. Peck and Gonzalez-Franco (2021) [68] proposed a standardized avatar embodiment questionnaire comprising the categories Appearance, Response, Ownership, and Multi-Sensory, with a subscale for agency. Using this questionnaire would be more appropriate in future studies.

6 Conclusion

The main purpose of this study was to investigate how to elicit ownership and agency for a virtual body presented in the third-person perspective. Our study revealed that, in the male group, ownership was elicited for the male virtual body presented in the third-person perspective when visual, motor, and tactile feedback were all synchronized. Moreover, agency was elicited for the male virtual body presented in the third-person perspective whenever visual feedback and movement were synchronized, and tactile feedback was not necessary to elicit agency.

Nevertheless, additional studies are required to explore the effect of gender match, the relationship between agency and ownership, and the use of the virtual body in VR.