Introduction

In the second half of the 19th century, soon after the invention of photography, photographers and scientists realized that it was possible to capture the movements of humans or animals in a sequence of still-frame pictures. The American photographer Eadweard Muybridge pioneered this approach. In 1878 he became world-famous for his series of photographs of a galloping horse. These images showed for the first time that there is a moment in the horse's gait in which all four hoofs are in the air at once (Fig. 1). In the following years, the French physiologist Etienne-Jules Marey adapted this technique to study human movement. In order to capture purely the motions of the limbs, he clothed actors in black suits with white bands attached to the arms and legs and photographed them against a black background (Fig. 1, lower left panel). The sequence of still frames then gave him enough information to trace and analyze the movements of the body. About 100 years later, Gunnar Johansson [13] in Sweden discovered that such abstracted information, when synthesized into a movie, is actually sufficient as a visual stimulus for perceiving the movement of the actor.

Fig. 1

Top A sequence of static images of a galloping horse shot by Eadweard Muybridge in the late 19th century on a racetrack with a series of cameras taking pictures one after another. The images show in detail the stages of movement. Bottom left Image taken by the French physiologist Etienne-Jules Marey of an actor wearing a dark suit with reflecting stripes and dots. In this superimposed sequential exposure on a single image plate the flow of movements of the body can be visualized and analyzed. Bottom right A single frame from a typical point light biological motion stimulus. An actress walks towards the camera. Only the point lights at the head, shoulders, elbows, hands, hips, knees and ankles are visible

The stimulus that Johansson used subsequently came to be called the point-light display. It consisted of the motion of a small number of light points attached to the major joints of the body of an actor (Fig. 1, lower right panel). Johansson demonstrated that human observers can perceive highly complex features of human movement and action from this very impoverished visual stimulus. He called the ability to perceive the actor and the actor's actions the perception of biological motion. Later studies showed that the movement of animals can also be perceived from point-light displays [19] and that observers are even able to recognize individuals or the gender of a person [7, 14].

This ability appeared astounding to many since the stimulus seemed so impoverished and devoid of almost all visual information about the actor. In Johansson's studies, immediate perception of biological motion occurred only when the point-light stimulus was set in motion. A single image of a point-light figure was insufficient to elicit the perception of a human figure. Thus, Johansson concluded that the information in a point-light display is carried mainly by the motion of the points over time. Since then, biological motion perception has often been regarded as a highly specialized form of motion analysis, i.e. a perception of form-from-motion. However, research on biological motion perception over the last 10 years or so has provided evidence for a rather different view, namely that biological motion perception derives from the analysis of sequences of body postures. In this view, biological motion is motion-from-form processing. Psychophysical, physiological and computational studies support this view.

Form and motion in point-light displays

In point-light displays, a small number of light points are shown in a movie or computer animation (Fig. 2a). These light points represent the position and movement of the major joints of the human body. Johansson’s original displays were constructed by filming actors who had small light bulbs attached to their bodies. Later studies have sometimes used a computer program that simulates the joint movements of a walking human figure instead [6].
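To make the construction of such displays concrete, the following Python sketch generates a schematic point-light walker from simulated joint trajectories. It is only an illustration: the joint coordinates, swing amplitudes and purely sinusoidal trajectories are invented for this example and are far simpler than the gait models used in the literature [6].

```python
import numpy as np

def pointlight_walker(n_frames=60, gait_freq=1.0, frame_rate=30.0):
    """Return an array of shape (n_frames, 12, 2) with the (x, y) coordinates
    of 12 schematic joints (shoulders, elbows, wrists, hips, knees, ankles).

    Toy sinusoidal trajectories for illustration only.
    """
    t = np.arange(n_frames) / frame_rate              # time of each frame in seconds
    phase = 2 * np.pi * gait_freq * t                 # phase within the gait cycle
    # resting (x, y) positions of the 12 joints, left/right in alternation
    rest = np.array([[-.15, 1.40], [.15, 1.40],       # shoulders
                     [-.25, 1.10], [.25, 1.10],       # elbows
                     [-.25, 0.85], [.25, 0.85],       # wrists
                     [-.10, 0.90], [.10, 0.90],       # hips
                     [-.12, 0.50], [.12, 0.50],       # knees
                     [-.12, 0.05], [.12, 0.05]])      # ankles
    # horizontal swing amplitude of each joint; arms and legs swing in counterphase
    amp  = np.array([.02, .02, .08, .08, .15, .15, .02, .02, .20, .20, .30, .30])
    sign = np.array([ 1,  -1,   1,  -1,   1,  -1,  -1,   1,  -1,   1,  -1,   1])
    return np.stack([rest + np.stack([sign * amp * np.sin(p), np.zeros(12)], axis=1)
                     for p in phase])
```

Each frame of the resulting array can be rendered as a set of dots to produce an animation like the one sketched in Fig. 2a.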

Fig. 2

a Two frames of a point-light walker stimulus. The stimulus consists of 12 points attached to the major joints of the body. The grey lines are not part of the stimulus but serve here to illustrate the body structure. From one frame to the next the body posture changes in accordance with a forward step. b Two types of information in this stimulus: position (left) and motion (right) of the points. The position information is available in each single frame. The motion vectors result from the apparent motion of a point between subsequent frames. c Two frames of a point-light walker devoid of the motion signals depicted in b [2]. The points are distributed over the limbs in each frame but the distribution is randomized from one frame to the next. Thus, there is no basis for apparent motion signals in the direction of the limb movement

Point-light displays contain information about the position and the motion vectors of the joints (Fig. 2b). The motion information is directly specified by the change of position (apparent motion) of the light points over frames. Information about the form of the body, on the other hand, is largely removed because the outline of the body is not visible. Some limited form information is retained, however, in the positioning of the light points on the joints. In principle, a static image of a single frame from a point-light animation could provide enough information to estimate the body posture, if one knows how to connect the correct points with lines.
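In computational terms the distinction between the two cues is simple: position information is the set of point coordinates within a single frame, whereas motion information is the frame-to-frame displacement of each point. A minimal sketch, assuming a frame array like the one generated above:

```python
import numpy as np

def position_samples(frames, k):
    """Position information available in a single frame k (Fig. 2b, left).
    frames has shape (n_frames, n_points, 2)."""
    return frames[k]                       # (n_points, 2)

def apparent_motion_vectors(frames):
    """Apparent motion of each point between subsequent frames (Fig. 2b, right)."""
    return np.diff(frames, axis=0)         # (n_frames - 1, n_points, 2)
```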

The percept generated by point-light biological motion encompasses both the form of a human figure and the motion of its limbs. It therefore involves both form and motion recognition. However, there are two routes by which the visual system might arrive at this percept. The first starts by computing motion vectors from the light points (right panel of Fig. 2b). The pattern of motion vectors is then analyzed and interpreted, perhaps in conjunction with knowledge or expectations about the form and movement of a human body. This motion-based approach considers biological motion perception a variant of form-from-motion perception.

The alternative route to biological motion perception assumes that the static form information provided by each single frame image is integrated over successive views as the posture evolves. This approach starts by computing human form information from the positions of the light points in a static image (left panel of Fig. 2b). It then calculates the motion of body parts from changes in the form. In this form-based approach, biological motion perception is the recognition of dynamic form. I will call this motion-from-form perception.

In the motion-from-form approach, the visual input is first used to estimate the form and posture of the body. Motion is then derived from the changing body form. This approach does not require local motion vectors; it recovers the positions of points on the body rather than the motion vectors of these points. This raises the question of whether point-light displays contain enough information to support human form recognition without using motion vectors, and how this process might be implemented in the brain.

Point-light displays without motion

Is there enough information in the position of the points of point-light displays to support form analysis? If so, why did observers not recognize a figure in a static Johansson display? The first answer to these questions is that later research has shown that whether or not a static point-light display is recognized as a human figure depends on the posture that is displayed [8, 26]. Postures in which the extremities are extended are more easily recognized than those in which the extremities are close to the body. The second answer is that there are at least two ways in which setting the display in motion provides more information than a single image. First, in a sequence of images the form information provided by each single image can be accumulated over time. A number of body postures are displayed over time and each provides the system with further constraints on the interpretation of the image series. Because each single image carries very little form information, such temporal accumulation might be an essential requirement to see the walking figure. Second, even when position information is the primary cue that is used to recognize the figure, a sequence of images allows the observer to also estimate the action of the figure. Recognizing the action may be a fundamental part of the spontaneous recognition of biological motion in point-light displays.

The above argument illustrates a problem in investigating the contributions of motion and form to biological motion recognition. Because the point-light display contains both form and motion it is difficult to estimate the respective role of each. Beintema and Lappe [2] used a limited lifetime technique to create point-light stimuli in which the use of image motion information is reduced (Fig. 2c). These stimuli directly pitted motion and form information against each other. A small number of light points were placed on the outline of the body rather than on the joints. Each light point remained at its position on the body for only a limited time. Thereafter, the point was extinguished and a new one was created at a different position. In these stimuli, the form of the body is sampled over time more completely than in classic point-light stimuli. Each individual image, however, gives only very limited form information. The amount of form information can be adjusted by varying the number of dots displayed simultaneously. The amount of image motion information, on the other hand, can be adjusted by varying the lifetime of each dot. If a dot stays at the same position on the limb for two or more frames, it generates an apparent motion signal. If the lifetime is restricted to a single frame, no individual point creates apparent motion in the direction of the limb movement.
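The following sketch illustrates the logic of such a limited lifetime stimulus. It is a re-implementation for illustration, not the original code of [2]; the limb segments and joint indexing are hypothetical and assume the schematic walker sketched above. Points are re-sampled at random positions on the limbs whenever their lifetime expires; with a lifetime of one frame no point follows the limb from one frame to the next, so no apparent motion in the direction of the limb movement is available.

```python
import numpy as np

# limb segments as pairs of joint indices (upper arms, forearms, thighs, shanks),
# assuming the 12-joint indexing of the schematic walker above
LIMBS = [(0, 2), (1, 3), (2, 4), (3, 5), (6, 8), (7, 9), (8, 10), (9, 11)]

def limited_lifetime_stimulus(frames, n_points=4, lifetime=1, rng=None):
    """Place n_points dots on the limbs of the walker given by frames
    (shape (n_frames, 12, 2)) and re-sample their positions every
    lifetime frames."""
    rng = rng if rng is not None else np.random.default_rng()
    n_frames = len(frames)
    stim = np.empty((n_frames, n_points, 2))
    for f in range(n_frames):
        if f % lifetime == 0:                               # lifetime expired: re-sample the dots
            limb = rng.integers(len(LIMBS), size=n_points)  # which limb each dot sits on
            frac = rng.random(n_points)                     # relative position along that limb
        for p in range(n_points):
            a, b = LIMBS[limb[p]]
            stim[f, p] = (1 - frac[p]) * frames[f, a] + frac[p] * frames[f, b]
    return stim
```

With a lifetime of one frame the dots are redistributed over the limbs in every frame, as in Fig. 2c; with a longer lifetime each dot travels with its limb for several frames and thus generates an apparent motion signal.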

With these stimuli, biological motion recognition was possible even with a lifetime of one frame, i.e. in the absence of apparent motion signals of the limb movement [2]. This suggests that form information, although limited in any single image, can be exploited in a sequence of images. When point lifetime was increased, image motion signals were added at the cost of a slower sampling of body positions. In this case, performance in direction or coherence discrimination dropped. This suggests that position and form cues are more important in these tasks than image motion cues. Finally, these stimuli allow us to look at the role of the temporal integration of position information. We have argued above that a richer sample of position signals can be obtained from a sequence of images than from any single image alone. This temporal integration might be associated with the greater ability to spontaneously recognize a point-light display in motion than from a still frame. However, in that case there are also motion signals that may contribute to the percept. In the stimuli of Beintema and Lappe [2], a richer sample of position signals may be generated in an image sequence without setting the figure in motion, by displaying a single static posture of a human body with limited lifetime temporal sampling. In this condition, naive subjects were able to recognize the figure purely from position signals presented over time. This suggests that the temporal integration of position signals may be a viable mechanism for figure recognition and for a subsequent recognition of biological motion.

The observation that the human form can be recognized from point-light displays even when the figure remains static underlines the importance of the form information. However, this finding raises the question of how the motion of the body is recognized, and how the recognition of body motion can be investigated. Many psychophysical studies of biological motion have used direction discrimination: subjects had to discriminate whether a walker was facing to the left or to the right. However, since this discrimination can be performed on a single static posture it does not truly test motion processing. Beintema and Lappe [3] suggested a discrimination between forward and backward walking instead, and compared performance in both discrimination tasks (and a third based on the structural coherence of the body) for a variety of presentation durations, point numbers and point lifetimes. They found that discrimination performance was mainly determined by the total number of points seen during stimulus presentation, and less by how many points were in any given frame, how long each frame lasted, or whether the points moved consistently from frame to frame with lifetimes greater than one frame (Fig. 3). Moreover, the discrimination between forward and backward walking (body motion discrimination) required about twice as many points as the discriminations based on body form (facing and coherence discrimination). These findings are consistent with the idea that body motion is derived from body form processing and needs at least two sequential postures.

Fig. 3

The figure combines data from several experiments with the limited lifetime stimulus (Fig. 2c) of Beintema et al. [2, 3]. In these experiments observers had to recognize either the structural coherence of the body, the facing direction (left or right), or the motion direction (forward or backward walking) for various combinations of the stimulus parameters (number of points per frame, lifetime of each point and presentation duration of each frame). Each of the four panels shows the same data, but split into different curves depending on number of points, frame duration, lifetime and task. The x-axis in each panel gives the total number of points in each trial, i.e. the number of points per frame times the number of frames in the trial. The figure shows that performance is mostly determined by the total number of points per trial. For example, performance saturates at around 256 points per trial, irrespective of whether this number is reached by one point per frame and 256 frames at 10-ms frame duration, or by eight points per frame for 32 frames at 50-ms frame duration. This is consistent with the idea that facing direction and coherence rely on body posture, to which each point contributes the same amount of information. Only the split by task (lower right panel) shows a clear difference. The discrimination between forward and backward walking (motion) needs about twice as many points as the discrimination of facing and coherence. This is consistent with the idea that motion is determined from the analysis of body posture in (at least two) successive frames

Biological motion from sequential posture analysis

Beintema and Lappe [2] proposed that biological motion perception may be performed by an analysis of sequential posture information, obtained from position signals of points on the body. This could be done via dynamic form templates that accumulate the evidence for human form over time, while allowing for a dynamic change in the form of the body. Lange and Lappe [16] transformed this idea into a biologically plausible neurocomputational model (Fig. 4) that captures many of the psychophysical and physiological properties of biological motion perception [15, 16, 17, 18]. This model starts with a set of template cells that each represent a particular posture of the human body. Their activities are determined by the match of each single frame of the stimulus to the preferred posture of each neuron. As the stimulus moves and the body posture changes, a sequence of body posture cells is activated one after another. The estimation of the walking movement is then performed by neurons that respond specifically to one (forward) or the other (backward) sequence of activities. The discrimination of the facing direction of the stimulus (walking leftward or rightward), on the other hand, is performed directly on the body posture templates, by finding the templates that are most active for a given stimulus. Hence, the model proposes a two-stage recognition scheme, in which first the posture of the body and then the motion of the body are analyzed. This scheme predicts a neural representation of body posture in the brain and a neural representation of body motion.
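The following sketch captures the logic of these two stages. It is a strongly simplified illustration, not the published implementation of [16]: the Gaussian template tuning, the read-out of facing direction from the most active template, and the simple ordering test for forward versus backward walking are assumptions made here for brevity (the published model uses neurons selective for the temporal order of template activations at the second stage).

```python
import numpy as np

def template_responses(frame, templates, sigma=0.1):
    """Stage 1: activity of posture template cells for one stimulus frame.
    templates has shape (n_templates, n_points, 2); each template stores one
    body posture. Activity falls off with the mean distance between stimulus
    points and template points (Gaussian tuning assumed for illustration)."""
    dist = np.linalg.norm(templates - frame, axis=-1).mean(axis=-1)
    return np.exp(-dist**2 / (2 * sigma**2))

def facing_decision(frame, templates_left, templates_right):
    """Facing direction (left/right) is read out directly from the posture stage
    by comparing the maximum template response for each facing direction."""
    if template_responses(frame, templates_left).max() > \
       template_responses(frame, templates_right).max():
        return 'left'
    return 'right'

def body_motion_decision(frames, templates):
    """Stage 2: forward vs. backward walking from the temporal order in which
    the posture templates are activated. Assumes the templates are indexed in
    gait-cycle order and ignores the wrap-around of the cycle."""
    activity = np.array([template_responses(f, templates) for f in frames])
    best = activity.argmax(axis=1)        # most active posture template per frame
    return 'forward' if np.diff(best).sum() > 0 else 'backward'
```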

Fig. 4

The model of biological motion perception from sequential posture analysis [16] consists of two processing stages. The first stage contains neurons selective for the posture of the body seen from a particular viewpoint. Each cell can be considered a template for the body view in this posture. The cell responds according to the match of the stimulus dots with its template on a frame-by-frame basis. The discrimination of the facing direction (left or right) of the stimulus is computed by comparing the maximum template responses for each direction. The second stage computes the temporal sequence of activation of the template cells to generate selectivity for body motion. The discrimination between forward and backward walking is computed by comparing the activations for the two possible motion sequences. The selectivity to posture in the template cells relates to body form processing areas of the brain (the extrastriate and fusiform body areas), and the selectivity to body motion relates to biological motion selectivity in the superior temporal sulcus

Body posture representations in the brain

In the human brain, two areas have been identified that respond selectively to visual images of the human body: the extrastriate body area (EBA) [9] and the fusiform body area (FBA) [24]. Selectivity for body form and posture has also recently been found in monkey temporal cortex [34]. These posture representations may form the basis for the recognition of biological motion. Indeed, in human fMRI studies with point-light walkers, the body-selective areas are also activated by point-light stimuli, which convey only very limited information about body structure [4, 11, 33]. Moreover, using the limited lifetime technique described above, Michels et al. [21] found activations in body form-selective areas even for static postures of point-light stimuli. In the monkey, Vangeneugden et al. [34] showed that most temporal cortex neurons that responded to sequences of stick figures of a human body in walking motion actually responded to particular static postures within this sequence.

If biological motion perception is based on such posture-selective neurons in a two-stage process, then the posture representation should contain information about the facing direction of the body. Indeed, Vangeneugden et al. [34] found cells specific for particular facing directions and showed that a support vector machine classification analysis based on the temporal cortical population responses was very effective in discriminating facing direction. In humans, Michels et al. [22] provided evidence that different facing directions of point-light stimuli are represented in distinct patches in the fusiform gyrus.

Another way to show neural specificity to particular stimuli is via the aftereffect method. In an aftereffect experiment, a stimulus is shown for a long duration during which cells selective for the properties of this stimulus are fatigued. When a neutral stimulus is shown immediately afterwards, the percept is often the opposite of the previously presented stimulus. This is taken as evidence that the original stimulus is encoded in a dedicated population of neurons. Theusner et al. [29] performed such an aftereffect experiment using a walker facing in one direction as the adaptor and a superposition of two walkers facing in the two directions as the neutral stimulus. The results showed that facing direction can be selectively adapted, confirming that the neural representation of walking contains facing-specific populations. Other aftereffect studies have shown that further properties of point-light walkers are coded in specific representations, including the gender [31] and heading direction [12] of the walker. Walking direction (forward vs. backward walking) also shows aftereffects such that, for example, prolonged viewing of a forward-walking figure induces the percept of backward walking in subsequently shown ambiguous [29] or static [1] stimuli. This is particularly important for the mechanisms of biological motion perception from posture sequence analysis because the difference between forward and backward walking lies in the temporal order of the posture sequence, and thus allows us to investigate the second stage of the above model, the body motion level.

Body motion representations in the brain

Neuroimaging studies in humans have shown selectivity to biological motion in the superior temporal sulcus (STS) [4, 11, 33]. This is consistent with early studies in monkeys that showed selectivity to body motion and point-light walkers in the superior temporal polysensory area [23]. Besides the STS, activation by biological motion stimuli has also been reported in the above-mentioned body areas, in premotor cortex, in the motion areas hMT+ and KO, and in the cerebellum [25, 28, 33]. The STS has reciprocal connections to areas of the form pathway in ventral cortex and of the motion pathway in dorsal cortex. Input from the ventral body posture representations could, therefore, be used in the STS to analyze the temporal order of the posture sequence and determine body motion. Indeed, activation of human STS was found not only for classical point-light stimuli but also for point-light walkers devoid of local motion signals, for which body motion is available only from posture sequence analysis [21, 22]. Conversely, in a study that manipulated body form information by separating the limbs from one another while keeping their motion intact, activation in the STS was reduced, confirming that body form information is important for driving STS activation [30].

Different representations for body form and body motion were also identified in the monkey [34]. A subset of temporal cortex neurons responded more strongly to the sequence of postures (i.e. to body motion) than to individual postures. These neurons were found predominantly in the upper bank of the STS, whereas the posture-selective neurons were more frequent in the lower bank.

Motion-from-form

The model presented here assumes a representation of body posture formed by neurons selective for body form. Body motion then induces a temporal variation of activity in this "posture space". Biological motion perception can then be performed by applying motion detection mechanisms to this "posturo-temporal" signal. The result is a biological motion detector that is based upon body form transformation, i.e. a motion-from-form pathway. This model is supported by many experimental findings from psychophysical studies, observations of aftereffects, neuroimaging studies and electrophysiological experiments in monkeys. In addition, there are reports of patients with deficits in general motion perception who can nonetheless recognize biological motion [20, 32]. All of this suggests that biological motion perception can proceed via a first analysis of the form of the body and a subsequent analysis of the motion of the body from the change of the body form or posture over time. This constitutes a route to body motion perception that does not involve the regular motion pathway of the brain but rather a motion mechanism acting on top of form analysis. Whereas regular motion perception is based on the variation of the spatial distribution of luminance over time, the motion-from-form pathway to biological motion is based on the variation of posture over time.

This does not preclude, however, that the regular motion pathway also contributes to biological motion perception. For example, it may be that motion signals from the individual points of a point-light walker are combined into a complex motion pattern that signals biological motion (e.g. [10, 13]). Also, the motion trajectories of individual point lights, such as the feet, can convey particular aspects of biological motion perception, for example the facing direction [27], and support a general percept of animacy [5]. Biological motion may be too complex and multi-faceted a percept to be explained by any single simple mechanism. However, for our current understanding of the motion-processing pathways in the brain, biological motion perception demonstrates a route to motion that bypasses regular luminance-based motion detection and instead works through the analysis of form changes, signaled via body posture representations in the brain's form pathway.