Introduction

Seeing and feeling a self-initiated movement is a strong indicator that the body part we see moving belongs to us (Gallagher 2000; Tsakiris et al. 2005; Tsakiris 2010; Walsh et al. 2011). This sense of agency and the associated feeling of ownership derive from a correlation between the motor and proprioceptive signals and the visual feedback confirming the expected outcome (Tsakiris et al. 2005; Tsakiris 2010). Artificially created synchrony between seeing and feeling an event can trick us into feeling ownership over rubber hands (Botvinick and Cohen 1998; Tsakiris and Haggard 2005; Tsakiris et al. 2010) and even inanimate objects (Armel and Ramachandran 2003). Temporal synchrony alone, however, is not sufficient for knowing what is self. We usually experience our body and its movements from a first-person (egocentric) perspective, and from this we build up an expected view of ourselves, constrained by anatomical limits and by how much of our body we can ever see. Here, we investigate how the visual perspective from which self-generated movement is seen affects body ownership for views of the body that cannot normally be seen and for which we therefore have no chance to build up an expected view.

Matching the consequences of our actions with what we expect is an important part of motor control. Information comes from proprioception and from an efference copy of the motor command, both of which can be matched to visual feedback. The successful completion of this loop provides a basis for constructing and updating the perception of self. When a person views their hand from an anatomically plausible perspective, they are better at making laterality judgements (Parsons 1994; Fiorio et al. 2007; Dyde et al. 2011), self/other judgements (Conson et al. 2010), and finger movements to targets (Sutter and Müsseler 2010), and better at detecting multisensory asynchrony (Hoover and Harris 2012), than when the hand is seen from an anatomically implausible perspective. Furthermore, the effectiveness of body ownership illusions, such as the rubber hand illusion, is lessened if the rubber hand is not at least approximately aligned with the actual hand (Costantini and Haggard 2007; Holmes and Spence 2007). This suggests that a uniquely egocentric visual perspective is required to generate the feeling of ownership and self-identity. Is there an “anatomically plausible” perspective for body parts that cannot be seen directly? We consider the front and back of the head. Neither view can be seen directly, but we are familiar with the view of the front of the head through our daily use of mirrors. Is this view sufficient to provide a quantifiable sense of identifying the face in the mirror as our self?

The link between the mirror and the self has been made since ancient times (Bartsch 2006), and the ability to identify the person in the mirror as oneself has been used as evidence that humans (and some non-human primates) demonstrate self-awareness (Gallup 1970; Bertenthal and Fischer 1978; Nielsen et al. 2006). Equating an image in the mirror with oneself involves prior experience in which the visual information seen in the mirror is correlated with other sensorimotor information about movement, or with tactile information experienced, for example, while shaving, combing one’s hair, or putting on makeup. This is, in essence, similar to how we use multisensory cues in creating the sense of ownership over other body parts, but with one integral difference: we must correlate these sensorimotor experiences with visual information seen from an allocentric perspective. Interpreting the image in a mirror is therefore a special case that combines both egocentric and allocentric perspectives.

The tolerance for temporal mismatch between visual feedback and the efference copy and proprioceptive information concerning self-generated movement can be used as a measure of body ownership (Daprati et al. 1997; Franck et al. 2001; Hoover and Harris 2012). When sensory information matches the expected self perspective, it provides a signature self-advantage in which the asynchrony can be detected about 40 ms sooner than when the movement is viewed from some other perspective. Using the self-advantage as a probe, we asked whether, when you see yourself in a mirror or from an unfamiliar viewpoint, you truly attribute what you see to yourself. Using a live video feed to which we could add delays, we had participants view movements of their hand and head as if seen from three different viewpoints: the direct view (hand movements viewed as if looking down at the hand), the mirror view (hand and head movements viewed as if looking in a mirror), and the behind view (the same hand and head movements as in the mirror view but viewed as if from behind). Each of the resulting five live videos was presented either in the “natural” or expected perspective described above or with the video flipped around the horizontal, vertical, or both axes to simulate looking at the head or hand of another person; that is, to switch from an egocentric (self) to an allocentric (other) perspective. If the natural, unflipped view was regarded as “self,” then, following the logic of Hoover and Harris (2012), there should be a self-advantage in which smaller asynchronies can be detected than in the flipped views.

Materials and methods

Participants

Ten right-handed adults (seven females, three males), with a mean age of 29.8 (±5 SD) years, participated in this study. All participants took part in all five blocks of the experiment and gave their informed consent. The experiment was approved by the York University office of research ethics and followed the guidelines of the Declaration of Helsinki. Handedness was determined by an adapted version of the Edinburgh Handedness Inventory (Oldfield 1971).

Apparatus and camera viewpoints

For hand movements seen in the direct view, participants sat on an adjustable chair at a table with their head on a chin rest 50 cm away from an LCD display (HP Fv583AA 20″ widescreen monitor; 1600 × 900 pixels; 5 ms response time) centered at eye level. They placed their hand on the table, shielded by a black cloth. A PlayStation Eye camera (SCEI; resolution 640 × 480 @ 30 Hz) was mounted on the front of the chin rest pointing down at their hand (Camera A in Fig. 1). The camera was angled to capture approximately the view seen by participants looking down at their own hand.

Fig. 1

Apparatus: participants sat on an adjustable chair at a table 50 cm from an LCD display centered at eye level. PlayStation Eye cameras (see text) were used. Camera A was mounted on the front of the chin rest and pointed down at the participant’s hand; Camera B was mounted on the LCD display and pointed at the participant’s face; Camera C was mounted on a post directly behind the participant and pointed at the participant’s back. Insets show the view on the monitor for each camera

For the hand and head movements seen in the mirror view, participants sat 50 cm away from the display and were not restrained by a chin rest. The camera was mounted to the LCD display and angled to capture the view as if they were looking in a mirror (Camera B in Fig. 1).

For the hand and head movements seen in the behind view, participants sat 50 cm away from the display and were not restrained by the chin rest. The camera was mounted on a post positioned 40 cm directly behind the participants’ head (Camera C in Fig. 1).

Introducing a delay in the display

The video signal from the USB camera was fed into a computer (iMac 11.2, mid-2010), read by MATLAB (version R2009b), and played on the LCD screen either at the minimal system delay or with an added delay of between 33 and 264 ms. To calibrate the system, we had the camera view a flashing LED and compared the voltage across the LED with the timing of its appearance on the screen, measured by a light-sensitive diode. This revealed a system delay of 85 ms ± one-half of the camera refresh duration and confirmed the delay values introduced by the software.
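
Since the camera delivers frames at 30 Hz, an added delay that is a whole number of frames can be implemented by holding recent frames in a ring buffer and displaying the frame captured n frames earlier. The following is a minimal MATLAB sketch of this idea, using frame indices in place of camera images; the variable names and the printed timeline are our illustration, not the code used in the experiment.

    % Simulate a fixed-lag video delay loop with a ring buffer.
    % Frames are represented by their capture index; in the real system
    % each buffer entry would be a camera image.
    frameRate   = 30;                        % Hz (PlayStation Eye capture rate)
    framePeriod = 1000 / frameRate;          % ~33 ms per frame
    delayMs     = 99;                        % requested added delay
    delayFrames = round(delayMs / framePeriod);  % delay in whole frames

    buffer = NaN(1, delayFrames + 1);        % ring buffer of recent frames
    for k = 1:20                             % simulate 20 captured frames
        slot = mod(k - 1, numel(buffer)) + 1;
        buffer(slot) = k;                    % "capture" frame k
        showSlot = mod(k - delayFrames - 1, numel(buffer)) + 1;
        if ~isnan(buffer(showSlot))          % display frame k - delayFrames
            fprintf('t = %3.0f ms: captured %2d, displayed %2d\n', ...
                    (k - 1) * framePeriod, k, buffer(showSlot));
        end
    end

With delayMs = 0, the loop displays each frame as soon as it is captured, corresponding to the minimal system delay condition.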

Movements

For all hand movements, participants performed a single flexion of the right index finger through approximately 2 cm, both when their hand was on the table and when it was held up beside their head. They made the movement as soon as they saw their hand on the screen in a given trial. Participants avoided touching the table, other fingers, or their face with their index finger during the movement so as not to introduce additional tactile cues. For the head movements, participants performed a single small roll of the head of approximately 5° to either the left or the right while looking straight ahead. To reduce between-subject differences in the speed and type of movement, all participants went through a 15-trial practice phase for each of the movements, during which the experimenter observed and corrected their movements.

Manipulating the visual perspective

In order to display the movements in the four perspectives, video images were flipped and delayed using the PsychVideoDelayLoop subroutine of the Psychophysics Toolbox extension of MATLAB (Brainard 1997; Pelli 1997). This program implemented a real-time video feedback loop in which the video images could be flipped about the horizontal axis, the vertical axis, both axes, or neither. Delays were introduced in 33 ms increments to match the rate of image capture of the camera. These manipulations are illustrated as insets in Figs. 2 and 3.
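
The four perspective conditions amount to simple flips of each video frame about its axes, which core MATLAB expresses directly. A sketch of the mapping, using a small matrix as a stand-in for a 640 × 480 camera frame (the stand-in image and variable names are ours):

    % The four perspective conditions as flips of a single video frame.
    frame = magic(4);                    % stand-in for a camera image

    unflipped = frame;                   % natural perspective
    vFlip     = fliplr(frame);           % flipped about the vertical axis
    hFlip     = flipud(frame);           % flipped about the horizontal axis
    bothFlip  = flipud(fliplr(frame));   % flipped about both axes

    % Flipping about both axes is the same as a 180-degree rotation:
    disp(isequal(bothFlip, rot90(frame, 2)));   % prints 1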

Fig. 2

Detecting an added delay to the visual feedback for hand movements viewed in the direct view (a), in the mirror view (b), and in the behind view (c). Mean proportion correct is plotted as a function of the imposed visual delay. The sigmoidal curves plotted through the data are for the natural (unflipped) perspective (solid dark lines and filled inverted triangles), with the video flipped around the vertical axis (solid light lines and filled circles), flipped around the horizontal axis (dashed dark lines and filled squares), and flipped around both axes (dashed light lines and filled triangles). Vertical lines represent the 75 % thresholds, and the horizontal dashed line represents the 75 % criterion. Error bars represent SEM

Fig. 3

Detecting an added delay to the visual feedback for head movements viewed in the mirror view (a) and in the behind view (b). Mean proportion correct is plotted as a function of the imposed visual delay. The sigmoidal curves plotted through the data are for the natural (unflipped) perspective (solid dark lines and filled inverted triangles), with the video flipped around the vertical axis (solid light lines and filled circles), flipped around the horizontal axis (dashed dark lines and filled squares), and flipped around both axes (dashed light lines and filled triangles). Vertical lines represent the 75 % thresholds, and the horizontal dashed line represents the 75 % criterion. Error bars represent SEM

Procedure

To explore temporal synchrony detection, a two-interval forced choice paradigm was used. Each trial consisted of two 1 s periods separated by an inter-stimulus interval of 100 ms. One interval contained a minimal-delay presentation of the movement, while the other contained a delayed presentation. Which presentation was displayed first was chosen randomly. There were nine possible differences in visual delay between the two periods: 33, 66, 99, 132, 165, 198, 231, 264, and 297 ms (corresponding to an integral number of camera frames). Participants responded by means of foot pedals (Yamaha FC5). They kept their feet on the pedals for the entirety of the block and raised their left foot to indicate that the delay was in the first period or their right foot to indicate that it was in the second.

Each of the five movement/viewpoint combinations was run in a separate block. For each block, the nine visual delays were presented eight times for each of the four perspectives (flip conditions) in a random order, resulting in a total of 9 × 8 × 4 = 288 trials. Blocks were broken down into two 144-trial sessions, each lasting approximately 20 min. The ten sessions of 144 trials were run in a counterbalanced order separated by at least an hour.
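
To make the block structure concrete, the full factorial trial list (9 delays × 4 flip conditions × 8 repeats) and the randomization of which interval carries the delay can be generated in a few lines of MATLAB; this is an illustrative reconstruction, not the authors' actual code:

    % Build and shuffle one block of 2IFC trials: 9 delays x 4 flips x 8 reps.
    delays = 33:33:297;                   % added delays in ms (whole camera frames)
    flips  = 1:4;                         % 1 unflipped, 2 vertical, 3 horizontal, 4 both
    reps   = 8;

    [d, f] = meshgrid(delays, flips);     % all 36 delay/flip combinations
    trials = repmat([d(:) f(:)], reps, 1);             % 36 x 8 = 288 rows
    trials = trials(randperm(size(trials, 1)), :);     % random trial order
    delayedFirst = rand(size(trials, 1), 1) < 0.5;     % delayed interval 1st or 2nd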

Data analysis

To compare performance across conditions, we fitted a sigmoidal (logistic) function to the proportion of times participants correctly chose the delayed period as a function of the delay, using:

$$y = 0.5 + \frac{0.5}{1 + e^{-(x - x_{0})/b}} \tag{1}$$

where x is the delay, x0 is the 75 % threshold value, and b determines the spread (referred to below as the standard deviation of the psychometric function).
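
Equation 1 can be fitted to proportion-correct data by least squares using core MATLAB's fminsearch; a sketch with made-up example data (the data values and starting guesses are illustrative only):

    % Fit Eq. 1 to proportion-correct data by least squares.
    delays = 33:33:297;                   % imposed delays in ms
    pc = [0.52 0.55 0.63 0.71 0.80 0.88 0.93 0.97 0.98];   % example data

    sigmoid = @(p, x) 0.5 + 0.5 ./ (1 + exp(-(x - p(1)) ./ p(2)));
    sse     = @(p) sum((pc - sigmoid(p, delays)).^2);  % sum of squared error

    pFit = fminsearch(sse, [150 40]);     % initial guess for [x0, b] in ms
    fprintf('75%% threshold x0 = %.0f ms, spread b = %.0f ms\n', ...
            pFit(1), pFit(2));

Because y = 0.75 exactly when x = x0, the fitted x0 can be read off directly as the 75 % threshold.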

The statistical analysis comprised repeated measures analyses of variance (ANOVAs) and paired samples t tests. For ANOVA tests, alpha was set at 0.05. All a priori multiple comparisons were performed using one-tailed Student’s t tests, with P values corrected using the false discovery rate procedure (Benjamini and Hochberg 1995).
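
The Benjamini–Hochberg adjustment scales the ordered P values by m/rank (where m is the number of comparisons) and then enforces monotonicity from the largest P value downwards. A minimal MATLAB sketch, with hypothetical P values for illustration:

    % Benjamini-Hochberg false discovery rate adjustment of P values.
    p = [0.02 0.01 0.03];                 % hypothetical uncorrected P values
    m = numel(p);

    [pSorted, order] = sort(p);           % ascending order
    pAdj = pSorted .* m ./ (1:m);         % scale by m/rank
    pAdj = fliplr(cummin(fliplr(pAdj)));  % enforce monotonicity from the top
    pAdj = min(pAdj, 1);                  % P values cannot exceed 1

    pCorrected(order) = pAdj;             % restore the original order
    disp(pCorrected);                     % compare each to alpha = 0.05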

Results

Figures 2 and 3 show the proportion of times participants correctly identified the interval containing the delay, plotted as a function of the total delay (system delay plus added delay) and averaged across the ten participants for each condition. Psychometric functions are plotted through these average data. Mean thresholds are shown in Table 1.

Table 1 Mean thresholds and standard errors for the hand and head movements tested

Discriminating visual delays for hand movements

A one-way repeated measures ANOVA revealed a significant main effect of perspective when participants viewed their hand movements in the direct view (F(3,27) = 9.45, P = 0.01, ηp² = 0.51; see Fig. 2a). The threshold for detecting the delay was significantly lower for the natural (unflipped) perspective (M ± SE = 126 ± 7 ms) compared with the other perspectives (vertical axis flip: M ± SE = 138 ± 8 ms, difference = 12 ± 5 ms, P = 0.02; horizontal axis flip: M ± SE = 148 ± 9 ms, difference = 22 ± 7 ms, P = 0.01; both axes flip: M ± SE = 144 ± 8 ms, difference = 18 ± 6 ms, P = 0.01), thus confirming the self-advantage (Hoover and Harris 2012).

Interestingly, the same effect of perspective was found when participants made hand movements while looking at their hand raised up beside their head in the mirror view (F(3,27) = 4.12, P = 0.02, ηp² = 0.31; see Fig. 2b). Again, the threshold was lower for the natural (unflipped) perspective (M ± SE = 141 ± 6 ms) compared with the other perspectives (vertical axis flip: M ± SE = 173 ± 13 ms, difference = 32 ± 11 ms, P = 0.03; horizontal axis flip: M ± SE = 167 ± 14 ms, difference = 26 ± 13 ms, P = 0.03; both axes flip: M ± SE = 184 ± 18 ms, difference = 43 ± 18 ms, P = 0.03). That is, the view of the hand in the mirror was still associated with a perspective-dependent self-advantage. There was, however, no such self-advantage when participants viewed their hand movements in the behind view, where thresholds for detecting asynchrony were similar across all perspectives (F(3,27) = 1.32, P = 0.29, ηp² = 0.13; see Fig. 2c).

Analysis of the standard deviations of the psychometric functions showed no significant differences between the perspectives for the direct and mirror views (F(3,28) = 1.11, P = 0.36 and F(3,28) = 0.88, P = 0.46, respectively). There was, however, a just-significant effect of perspective for the behind view (F(3,28) = 2.64, P = 0.05), where the standard deviation for the vertical axis flip was considerably smaller (M ± SE = 17 ± 4 ms; solid light curve in Fig. 2c) than those for the three other perspectives (unflipped: 36 ± 5 ms; horizontal axis flip: 42 ± 8 ms; both axes flip: 41 ± 7 ms).

Discriminating visual delays for head movements

A one-way repeated measures ANOVA revealed a significant main effect of perspective when participants saw their head movements in the mirror view (F(3,27) = 3.26, P = 0.04, ηp² = 0.27; see Fig. 3a). The threshold for the natural (unflipped) perspective (that is, the view expected when looking in a mirror) was lower (M ± SE = 152 ± 7 ms) than for the other perspectives (vertical axis flip: M ± SE = 182 ± 12 ms, difference = 30 ± 9 ms, P = 0.03; horizontal axis flip: M ± SE = 177 ± 15 ms, difference = 25 ± 12 ms, P = 0.03; both axes flip: M ± SE = 190 ± 16 ms, difference = 38 ± 16 ms, P = 0.03): the self-advantage prevailed. There was, however, no such self-advantage when participants saw their head movements in the behind view (F(3,27) = 0.50, P = 0.68, ηp² = 0.05; see Fig. 3b).

Analysis of the standard deviations of the psychometric functions showed no significant difference between perspectives for the mirror and behind views (F(3,28) = 1.39, P = 0.27 and F(3,28) = 0.81, P = 0.50, respectively).

Is temporal delay better detected for views of the body experienced most often?

We compared the performance at detecting delays in visual feedback for the natural (unflipped) perspectives (solid dark lines in Figs. 2a–c, 3a, b) for each movement and viewpoint. For hand movements, a one-way repeated measures ANOVA revealed a significant main effect of view (F(2,18) = 13.22, P < 0.001, ηp² = 0.60), with a linear trend across the direct, mirror, and behind views of the body (F(1,9) = 35.38, P < 0.001, ηp² = 0.80). The mean thresholds for the natural perspectives differed from one another (the direct view was 15 ± 5 ms lower than the mirror view, P = 0.02; the direct view was 34 ± 6 ms lower than the behind view, P < 0.001; and the mirror view was 19 ± 8 ms lower than the behind view, P = 0.02). Thresholds for detecting asynchrony between hand movement and visual feedback were thus lowest when the body was seen in the direct view. Analysis of the standard deviations of the psychometric functions showed no significant difference between the views (F(2,18) = 0.51, P = 0.61).

A similar pattern was found for head movements, where a paired t test showed that the threshold for the natural (unflipped) perspective was 20 ± 10 ms lower when the head movement was seen in the mirror view than when it was seen in the behind view (t(9) = −1.92, P = 0.04). Analysis of the standard deviations of the psychometric functions showed no significant difference between the views (t(9) = 0.36, P = 0.72).

Differences between hand and head movements

Repeated measures ANOVAs revealed no significant difference between detecting delays for head and for hand movements seen in the mirror view (F(1,9) = 1.93, P = 0.20, ηp² = 0.18) or in the behind view (F(1,9) = 3.49, P = 0.10, ηp² = 0.279), although the thresholds for the head movements ranged from 6 to 26 ms longer than the thresholds for the hand movements.

Discussion

This study demonstrates significant variation in the ability to detect a temporal asynchrony between a movement and its visual feedback, depending on the perspective from which the movement was viewed (manipulated by the various flips of the video image) and on previous experience of the view. A self-advantage in detecting delays was evident only when the movement was viewed from the natural perspective, either directly (looking down at the hand) or indirectly (looking at the hand or head in the mirror), and not for views of the body that are never seen (looking at the hand or head from behind). We interpret this variation of performance across viewing perspectives as reflecting whether the visual feedback matches the internal representation of the body and the view is thus recognized as being of the “self”.

There was no clear systematic variation in the slopes (standard deviations) of the psychometric functions with perspective for any of the viewpoints. In particular, the natural perspective was not associated with lower standard deviations than the other perspectives, nor was there a systematic variation in the standard deviations between the views. This suggests that the difficulty of the task was equivalent in all conditions and that there was no variation in the reliability of the sensory information involved. Rather, the self-advantage results from the information needed to perform the task being available sooner.

The self-advantage in detecting a delay in the visual feedback about a movement when it is viewed in the natural perspective is an objective measure of body ownership through agency (Hoover and Harris 2012). Variation in performance with perspective has also been found in self/other recognition of hands and feet (Saxe et al. 2006; Conson et al. 2010), in judging which hand is portrayed in a static image (Parsons 1994; Dyde et al. 2011), and in the rubber hand illusion (Costantini and Haggard 2007; Holmes and Spence 2007). Interpreting the self-advantage as indicating ownership is therefore straightforward when looking directly at one’s own body. But why might a similar advantage be given to views of the face seen in a mirror?

Mirror viewing

Identifying the face in the mirror as being one’s own has long been regarded as an ultimate test of self-recognition. Thus, countless hours have been spent trying to get various species to indicate that they can recognize themselves in mirrors by, for example, seeing if they could remove tags that could only be seen in a mirror (Gallup 1970; Bertenthal and Fischer 1978; Nielsen et al. 2006). Of course, humans can perform this task with ease, but the present study is the first objective demonstration that the face in the mirror is given preferential treatment.

Mirrors allow us to match personal sensorimotor events with simultaneous visual information seen from an allocentric perspective. Since mirrors are used on a daily basis, it is likely that we create an internal representation of our face that combines egocentric and allocentric perspectives. Interestingly, performance at detecting delays in the natural perspective of the mirror view (hand 141 ms; head 152 ms) fell between performance in the natural perspective of the direct view (hand 126 ms) and of the behind view (hand 160 ms; head 172 ms). This suggests that there is possibly less of a sense of ownership for the face in the mirror than for the directly viewed hand. Faces are ecologically important and provide very strong cues to self-identification that rely heavily on featural configuration; when a face is inverted, changes in these configurations are harder to identify (Yin 1969; Thompson 1980; Leder and Bruce 2000). Our data suggest that the reduction in performance when the face in the mirror is viewed upside down provides a quantifiable estimate of this reduction in identifiability and ownership.

Viewing invisible views

There was no self-advantage in detecting visual feedback asynchrony for head or hand movements viewed in the behind view. This suggests the absence of an internal visual representation of this viewpoint of our body: all such presentations may best be considered as “other.” Of course, we cannot see ourselves from behind, so it is unlikely that we would recognize the image as corresponding to ourselves. But what does this mean for our sense that our back is part of our self? Recent studies have suggested a special connection between the front and back of the body (D’Amour and Harris 2014). Perhaps non-visible parts of the body are pinned to surfaces that are visible, but this is not enough to provide a self-advantage.

Conclusion

We have examined three views of the body: the direct view of the hand, the mirror view of the face and hand, and the view from behind. Across these three views, varying the visual perspective had a progressively weaker effect on the ability to detect temporal asynchrony between self-initiated movements and the visual feedback concerning them. We interpret this as indicating that body parts that can be seen directly are treated as more a part of the self than other body parts. We conclude that the sense of self is linked to the sight of self.