For decades, researchers have developed techniques for the causal analysis of behavior, in which results are usually obtained from experiments by employing some artificial renditions of social partners. These experimental techniques have ranged from the use of dummies (Noble 1939; Tinbergen and Perdeck 1951), pictures and mirrors (Gallup and Capper 1970), animated robotic animals (Fernández-Juricic et al. 2006), audio (McGregor et al. 1992) and video playback (D’Eath 1998), and more recently to video playback of digitally created computer animations (Rosenthal 2000; Peters and Evans 2003a,b). The sophistication for the application of novel playback techniques to study animal interactions has progressed significantly over recent years with the common objective of precisely controlling the stimuli that are presented to focal subjects. The desire to deliver standardized visual stimuli and to limit the variable behavior of demonstrators across trials has strongly contributed to the development of novel playback methods to stage animal interactions. The democratized access to high-performance video editing and broadcasting software now allows researchers to produce realistic computer-generated animations instead of using video sequences of live-acting demonstrators. Computer-generated animations can be edited to produce a large range of behaviors with high degree of realism based on morphological appearance and behavioral patterns of real animals (Campbell et al. 2009). The increasing employment of computer-generated animations to simulate demonstrators in animal behavior studies reveals a call for an enhanced control of what an observer gets to see or experience. However, with technological limitations and constraints during the processes of elaboration and presentation of the artificial visual signals (e.g., limitations of the video equipment), care must be taken before ascertaining that the computer-generated animation approach is efficient. 
The success of this technique in eliciting natural behavioral responses from observers can only be validated by showing that focal individuals respond in qualitatively consistent ways in the presence of either live or animated demonstrators.

The aim of this review is to appraise the efficiency of computer-generated animations for staging animal interactions by providing an exhaustive account of experimental studies in which video playbacks of computer-animated stimuli have been employed to simulate social patterns. We focus on the representation of visual signals and how they are used to explore animal interactions. We also present alternative methods that are used for designing animation models. We evaluate the merits of computer-generated animations and consider how this technique may be more appropriate for certain types of staged interactions than other methods classically employed to simulate conspecific or heterospecific companions (i.e., still images, dummies, mirrors, video playback, and video-altered sequences). We also try to identify what computer-generated animations can achieve that other techniques cannot. Finally, we argue that computer-generated animations may be one of the most effective methods to examine social interactions among species that rely on dynamic visual signals. In particular, computer-generated animations can present features outside their normal range of properties and allow researchers to isolate and examine the aspects of signal design that are critical for communication.

Early non-intrusive behavioral observations have given way to experiments that manipulate the interactions between live subjects. Historically, observational studies have provided researchers with an understanding of the natural history of many species, from insects to vertebrates. Of particular interest have been specific social interactions between conspecifics that highlight important visual signals used in communication (Tinbergen 1960; Simpson 1968). To further our understanding of social interactions, however, we may need to experimentally manipulate the structure of signals. Such manipulation may prove challenging and often involves the use of simulated companions.

A large body of the published literature on social behavior has been based on experimental studies that involved direct manipulations of visual cues. For instance, Bischoff et al. (1985) examined the hypothesis that fantail length in the male guppy (Poecilia reticulata) was responsible for female mate preference. To test this, the tails of selected males were surgically shortened. Their results revealed that females preferred males with longer tails. Similarly, Basolo and Delaney (2001) suggested that swordtails (Poeciliidae) preferred members of the opposite sex that had longer ornamental tails. To test this assumption, they altered tail length either by surgically removing the tail to decrease its length or by implanting a longer tail to simulate a lengthier ornament. Both males and females of the guayacón olmeca (Priapella olmecae) found the opposite sex more attractive with the addition of a tail. Surgical manipulation of morphological features is not limited to fish. In a study of dewlap function in a territorial context, Tokarz et al. (2003) surgically prevented male brown anoles (Anolis sagrei) from extending their dewlaps during staged contests.

The shift away from invasive techniques has encouraged the use of non-intrusive methods. As alternatives to staging live interactions or physically manipulating morphology, researchers have used artificial stimuli that retain the key visual characteristics of real conspecifics. Stimuli that elicit behavioral patterns are commonly referred to as sign stimuli, but have also been termed releasers (Lorenz 1937), social releasers (Tinbergen 1948), perceptual signs (Russell 1943), and releasing stimuli (Tinbergen and Perdeck 1951). In a review of social releasers, Tinbergen (1948) suggested that the mere silhouette of a looming cardboard predator was sufficient to induce aversive behavior from other bird species. Despite the lack of color and the modification in shape, he demonstrated that the presence of the key visual characteristics of the releaser is what makes the signal effective. Therefore, as long as the key visual features are present, they have been sufficient to elicit social responses.

Staging animal interactions: limitations to all employed techniques

Computer-generated animations share a series of constraints with the other approaches that have been used to simulate companions (e.g., still images and photographs, mirrors, dummies, robots, video playbacks, and video-altered sequences). Whatever the method, using it to stage social encounters involves a number of potential limitations. When constructing visual stimuli for staged interactions, researchers need to consider all the possible limitations in the design characteristics of their stimulus. The following parameters must be considered when designing any model for playback: apparent size, texture, morphology and shape, color, brightness, flicker frequency, depth, movement, perception of background, contingency, and range of properties (presented in Table 1).

Table 1 Potential limitations of the different techniques for testing animal interactions. Symbols denote how well each technique meets each criterion: +, able to fully replicate the visual feature; (+), mostly replicates the visual feature, with some constraints imposed by the technique or model; +/−, replication depends on circumstances and model characteristics; (−), distorts the visual feature but still provides some comparative cue for discrimination; −, unable to replicate the visual feature

Apparent size is commonly defined as a stimulus property by which an individual gauges the overall size of an object in comparison to its own physical dimensions and the relative size of other objects in its visual field (O’Brien et al. 1976; Peters 1983). Individuals discriminate texture based on their ability to recognize and differentiate changes in the contours of objects (Marr and Nishihara 1978). An organism’s ability to distinguish morphological features is governed by viewer-centered axis specifications and by its capacity to gauge changes in shape by differentiating between points on an innate coordinate scale (Marr and Nishihara 1978; Holling 1992). It is well known that animals do not perceive color in the same manner as humans. Color perception is ultimately determined by retinal physiology and by the reception of photons of particular wavelengths from artificial and natural light sources (Endler 1990). Brightness is determined by the natural illumination of the model by sunlight or, if the light comes from an artificial source, by the spectral features of that source (Endler 1990). Flicker frequency refers to the rate at which a flashing light or image comes to be perceived as constant rather than flashing. When the light flashes at a fast rate, the visual system may be unable to detect the intervals between flashes. In contrast, if the light flashes too slowly, the flashing might also go undetected (Lythgoe 1979). Depth cues provide information about the distance of an object in relation to other environmental features (Zeil 2000). To perceive movement, the visual system must process the change in an object’s position from one point to another over a period of time (Desimone and Duncan 1995). To resolve focal images, an individual must be able to perceive the differences between the spectral properties of local objects and those of the background against which they are presented (Sekular and Blake 1994). Contingency cues are information that can be acquired from previous exposure to a stimulus and used to predict the next pattern of occurrence (Gallup et al. 2002). Finally, the range of properties refers to stimulus design features that extend beyond their current limits (Rosenthal 2000). Each potential limitation constrains the use and overall appropriateness of each method for social interaction experiments.

From video-altered video to computer-generated animations

The development of computer-generated animations progressed from basic video sequences to the manipulation of computer-altered stimuli. One of the early studies to experiment with this technique was by Clark and Uetz (1992), who digitized male morphs of jumping spiders (Maevia inclemens) to match size, morphology, and movement. They were able to rearrange the placement and speed of the video sequences and synchronize the stimulus to match the movements of real males. To selectively show Anolis lizard displays from conspecifics to subjects, Macedonia and Stamps (1994) overlaid a mask on each frame, so that only certain parts of the image were visible to the test subject. Using sticklebacks (Gasterosteus aculeatus), Rowland (1995) manipulated courtship tempo by increasing or decreasing the speed of the video sequences in order to simulate male courtship vigor, and found that females preferred a male whose display was more vigorous. Color manipulations with sticklebacks also revealed selective preferences for members of the opposite sex: males preferred females with a brighter red spot on the abdominal region (Bolyard and Rowland 1996), and females preferred more colorful males (Rowland et al. 1995a). In a study of sexual selection, Rosenthal and Evans (1998) manipulated both the apparent size and the sword length of green swordtails (Xiphophorus helleri). They found that females preferred males that were larger in apparent size (25% larger) and that the preference for the sword, when isolated, was abolished in comparisons with larger males. Using a video-altered technique, Kodric-Brown and Nicoletto (2001a,b) found that female guppies (P. reticulata) were more attracted to males that expressed carotenoid (orange) pigmentation and had high display rates, suggesting that both static and dynamic traits are used to assess male quality. In an experiment designed to unravel the mechanisms underlying the classical group size effect in foraging groups of ground-feeding nutmeg mannikins (Lonchura punctulata), Rieucau and Giraldeau (2009a) edited video playbacks of groups composed either of companion birds that were always vigilant or, conversely, of companion birds that always fed without exhibiting any vigilance. To create vigilant non-foraging companions, they edited out all images showing a head-down posture; to create non-vigilant foragers, they removed all images in which the birds’ heads pointed up, so that the birds in the video sequences never appeared to raise their heads. Thus, even though computer-altered techniques seem to be an effective means of examining communicative interactions, they are still limited in their overall range and require prior video capture of the crucial visual information.
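The tempo manipulation and the frame removal described above both amount to choosing which recorded frames are played back. The following Python sketch illustrates the idea under stated assumptions: the function names, the nearest-earlier-frame resampling rule, and the posture labels are hypothetical and are not the procedures reported in the cited studies.

```python
def retime(frames, speed):
    """Resample a frame sequence so playback appears `speed` times faster.

    speed > 1 drops frames (faster display); speed < 1 repeats frames
    (slower display). Nearest-earlier-frame sampling is an assumption.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not frames:
        return []
    n_out = max(1, round(len(frames) / speed))
    last = len(frames) - 1
    return [frames[min(int(i * speed), last)] for i in range(n_out)]


def drop_posture(frames, labels, unwanted):
    """Keep only the frames whose posture label differs from `unwanted`."""
    return [f for f, label in zip(frames, labels) if label != unwanted]


frames = list(range(10))        # stand-ins for decoded video frames
fast = retime(frames, 2.0)      # every second frame is kept
slow = retime(frames, 0.5)      # each frame is shown twice
labels = ["up", "down"] * 5     # hypothetical per-frame posture labels
vigilant_only = drop_posture(frames, labels, "down")
```

Either operation can only reorder, repeat, or discard footage that was actually recorded, which is the limitation noted at the end of the paragraph above.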

Methods of designing animation models

The design of video animated stimuli encompasses a number of approaches in which to build the model. Early design of animal models was based on the human perceptual assessment of size and shape; however, these models did not provide any morphological support for the basis of the animation. Most animation designs begin with filming displaying individuals and then extracting the morphology, texture, and movement from recorded characteristics. Although outdated, some early software and hardware combinations allowed for sophisticatedly staged interactions, thus paving the way for continued improvement on stimulus design characteristics. Clark and Uetz (1992) first used a method to capture each video frame of a displaying jumping spider and then digitized each consecutive frame. Consequently, they also employed a basic manipulation of the animation. In further refinement, Clark and Uetz (1993) were able to generate a crude jumping spider animation that was fully functional for performing courtship displays and eliciting responses from females. Allen and Nicoletto (1997) used a similar approach in designing an animated fighting fish (Betta splendens). In their design, Allen and Nicoletto (1997) also used the frame-grabbing technique and also modified the color. However, unlike recording displays onto video and using those frames for rendering, Evans and Marler (1992) recreated a raptor-like shape (black silhouette on a white background) that flew over chickens in a cage. In this case, the raptor shape was designed using video-effects software.

Künzler and Bakker (1998) provided an outline of stimulus design and applications for mate choice experiments in which a male three-spined stickleback (G. aculeatus) was fixed, sectioned into 23 coronal cross-sections 1 mm thick, and then cast in epoxy resin. All sections were scanned, photographed, and joined in a 3D animation program. In a similar approach, Rosenthal (2000) created a stimulus model by preparing segmented coronal sections of zebrafish (genus Danio) and tracing these cross-sections to outline the basic morphological features. The measurements acquired with this method were used in later mate choice experiments (Turnell et al. 2003; Rosenthal and Ryan 2005). Künzler and Bakker (1998) also suggested that magnetic resonance imaging (MRI) could be employed to scan several sections of the animal and recombine them into a single model. As an alternative to MRI, models may be scanned using digital photography and corresponding software to collate dimensional data into a single polygon mesh. In this manner, a 3D scan of a taxidermic specimen takes advantage of sophisticated digital technology without requiring the animal to be dissected. Woo (2007) had a taxidermic Jacky dragon (Amphibolurus muricatus) digitally scanned with a Konica Minolta VI-9i (Konica Minolta Holdings, Inc., Japan) by New Dawn® (Bexley North, NSW, Australia) (Fig. 1). The Konica Minolta VI-9i uses 3D algorithm software to link photographed segments together. The segments were then merged into a single polygon mesh using Raindrop Geomagic® software (Raindrop Geomagic, Inc., Research Triangle Park, NC, USA).

Fig. 1

The Jacky dragon model: a three-dimensional scan of a taxidermic lizard was acquired and used for our animation (a). The model was scanned using a Konica Minolta VI-9i (Konica Minolta Holdings, Inc., Japan) by New Dawn® (Bexley North, NSW, Australia), and the shape consisted of approximately 50,000 polygons. The data were collected with Raindrop Geomagic software, which collated the scanned surfaces of the object into a single polygon mesh. Using an animation program (Lightwave® 3D v8, NewTek Inc., San Antonio, TX, USA), we created a realistic model and then incorporated bones inside it using the Lightwave® 3D Modeler program. These virtual bones acted as the basic skeletal structure (b). We closely matched the number of bones to the actual bones of a real Jacky lizard, allowing changes in the posture and position of our model. We then added a weight shade to regional body parts to balance the overall movement of the model (c). We acquired digital photographs of a live lizard using a 12.8 megapixel Canon EOS 5D camera (Canon Inc., Tokyo, Japan) in three positions (dorsal, orthogonal, and ventral) and from three angles (anterior, central, and posterior) to capture realistic texture (d). The texture was extracted from the photographs while maintaining the silhouette of the animal using Adobe® Photoshop® Elements v3.0 (Adobe Systems Incorporated). In Lightwave® 3D Modeler, an atlas UV map was built; this procedure separates polygons into UV coordinates that are used to match areas of texture onto the model. A UV map acts as a textural blueprint of the UV coordinates that correspond to the model. We then extracted the corresponding texture regions onto a layer that matched the UV coordinates and placed them onto the UV map (e)

To generate credible reproductions of animal motion, researchers need to be able to objectively capture realistic movement patterns (Woo and Rieucau 2008). Like videos, animations must recreate biologically relevant motion patterns (as with human motion; Dekeyerser et al. 2002). Watanabe and Troje (2006) created an animated pigeon using biological motion. This involved a series of high-speed cameras that tracked reflective markers placed on the pigeon and computer software that automatically recorded the movement patterns. The points were then matched to corresponding sites on the animation to produce realistic movement. Gatesy et al. (1999) used another technique, called “rotoscoping”, in which they modeled the track impressions of theropod dinosaurs (Coelophysis bauri) from movement features of ground-dwelling birds, such as turkeys (Meleagris gallopavo) and helmeted guineafowl (Numida meleagris). Adult turkeys and guineafowl were videotaped from lateral and anterior/posterior angles as they walked across clay-rich and sandy soil. Theropod movements were then reconstructed by modeling the positions and angles of bird movement matched to the video footage. Similarly, this method has been used to recreate, with a high degree of resemblance, the behavioral motion patterns of the Jacky dragon (A. muricatus) in studies of territoriality (Van Dyk and Evans 2008) and of signal detection and motion sensitivity (Woo 2008; Woo et al. 2009; Woo and Rieucau, in review) (Fig. 2), as well as the caudal luring movement of a sympatric predator, the death adder (Acanthophis antarcticus) (Nelson et al. 2010). This technique has also been employed by Smith et al. (2008) to investigate the function of wattle movements in male junglefowl (Gallus gallus). Thus, rotoscoping appears to be an effective means of developing accurate animations that match a large range of movements with a high degree of realism.

Fig. 2

The rotoscoping technique: the lizard model was first imported in Lightwave® 3D layout to create scenes that matched stimuli acquired from a video of a real acting lizard. Individual images from the image sequence, starting with the first JPEG, were imported into the background of the sequence. We then manipulated bones within the animation to match the object to the postural position of the digital photograph in the background. The animation was manipulated to match the background image; the frame was keyframed to maintain the positional displacement of the object. The background was then removed and replaced with the next image in the sequence. The animation was then matched to the next position and keyframed. The procedure was repeated in a frame-by-frame process until the display was complete
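The frame-by-frame procedure described in Fig. 2 can be summarized as a loop that stores one keyframe per background image. The sketch below is a schematic in Python, not Lightwave scripting: `estimate_pose` is a hypothetical stand-in for the manual step of posing the bones against the background image, and the joint names and angles are invented for illustration.

```python
def rotoscope(image_sequence, estimate_pose):
    """Build one keyframe per background image.

    `estimate_pose` is a hypothetical callable standing in for the manual
    step of rotating the model's bones until the model overlays the image.
    Returns a dict mapping frame index -> {joint name: angle}.
    """
    keyframes = {}
    for frame_index, image in enumerate(image_sequence):
        pose = estimate_pose(image)            # match the model to the background
        keyframes[frame_index] = dict(pose)    # keyframe the matched posture
        # the loop then moves on to the next image in the sequence
    return keyframes


# Toy usage: each "image" is already annotated with made-up joint angles (degrees)
images = [
    {"head_pitch": 0.0, "foreleg_lift": 0.0},
    {"head_pitch": 12.0, "foreleg_lift": 30.0},
    {"head_pitch": 25.0, "foreleg_lift": 55.0},
]
keyframes = rotoscope(images, estimate_pose=lambda image: image)
```

Because every frame receives its own keyframe, the resulting animation follows the filmed display exactly; the cost is that the matching step must be repeated for each image.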

One additional challenge that researchers may face when designing animations is to ensure that the animation models preserve species-specific coloration patterns in a form that is adequately perceived by the intended receivers. Texture and color patterns can be incorporated into animations using digital photographs of animal models taken from various positions. High-resolution digital photographs have recently been used to explore animal coloration, and researchers now have the possibility to map the output of a digital camera to the receiver’s visual sensitivity (Endler and Mielke 2005; Stevens et al. 2007; Pike 2011). However, this process requires prior knowledge of the spectral sensitivity of the device used, as well as of the characteristics of the visual system of the species studied (e.g., type, abundance, density, and sensitivity peaks of photoreceptors). We strongly encourage future animation developers to refer to the recommendations of Stevens et al. (2007) and Pike (2011) to tackle this issue. Care must also be taken to calibrate the monitor used to present the colored stimuli. In a recent mate choice experiment that explored the role of ventral coloration in female Pelvicachromis taeniatus, Baldauf et al. (2011) developed computer animations from digital photographs of females that differed in the characteristics of their colored ventral ornamentation. Using a digital colorimeter, Baldauf et al. (2011) calibrated the monitor used to present the animations so that it matched the average spectral reflectance of the colored ventral area measured beforehand on several females. This method therefore makes it possible to test whether animal receivers perceive the color information in computer-generated animations as similar to the natural coloration patterns of conspecifics. 
The implementation of approaches designed to measure animal coloration precisely will be a major improvement to the computer-generated animation technique.
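To illustrate what relating a stimulus to a receiver's visual sensitivity involves, the sketch below computes a relative photoreceptor quantum catch as the wavelength-by-wavelength product of stimulus reflectance, illuminant, and receptor sensitivity, integrated over the spectrum. This is the generic quantum-catch calculation used in visual modeling, offered as an assumption about the kind of computation involved rather than as the procedure of the studies cited above; all spectra in the example are made-up numbers, and the two receptor classes are hypothetical.

```python
def quantum_catch(wavelengths, reflectance, illuminant, sensitivity):
    """Trapezoidal integral of R(w) * I(w) * S(w) over wavelength w (nm)."""
    n = len(wavelengths)
    if not (len(reflectance) == len(illuminant) == len(sensitivity) == n):
        raise ValueError("all spectra must be sampled at the same wavelengths")
    product = [r * i * s for r, i, s in zip(reflectance, illuminant, sensitivity)]
    total = 0.0
    for k in range(1, n):
        step = wavelengths[k] - wavelengths[k - 1]
        total += 0.5 * (product[k] + product[k - 1]) * step
    return total


# Made-up spectra sampled every 100 nm, for illustration only
wavelengths = [400, 500, 600, 700]
reflectance = [0.1, 0.2, 0.6, 0.8]      # a reddish patch
illuminant = [1.0, 1.0, 1.0, 1.0]       # flat "white" light
short_cone = [1.0, 0.5, 0.0, 0.0]       # hypothetical short-wavelength receptor
long_cone = [0.0, 0.2, 1.0, 0.3]        # hypothetical long-wavelength receptor

q_short = quantum_catch(wavelengths, reflectance, illuminant, short_cone)
q_long = quantum_catch(wavelengths, reflectance, illuminant, long_cone)
```

Comparing the catches produced by a natural color patch with those produced by its on-screen rendition, receptor by receptor, is one way to express how closely the two would be expected to match for a given species.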

Animation addresses potential limitations

Animation playback shares visual design features with other approaches used to simulate social partners: photographs, dummies, mirrors, robotized dummies, and video playback. Each technique can maintain apparent size, texture, and morphology. For example, Morris et al. (2005) varied the apparent size and the symmetry of vertical bars in swordtail fish (Xiphophorus cortezi and Xiphophorus malinche) and found that females preferred larger and more asymmetrical males. In altering textural features, Rosenthal and Ryan (2005) modified the number of stripes on semi-striped (Danio rerio), striped (Danio nigrofasciatus), and stripeless (Danio albolineatus) zebrafish and found that members of each group congregated with synthetic conspecifics that expressed similar stripe features. In a similar study of shoaling preference in zebrafish, Saverino and Gerlai (2008) reported that zebrafish preferred to shoal with computer-animated wild-type-like conspecifics rather than with computer-animated conspecifics altered in color, stripe pattern, or body shape. Turning to morphological characteristics, Wong and Rosenthal (2006) investigated the evolution of sexually selected traits in the swordtail fish X. birchmanni. Using synthetic animated swordtails, they showed that females preferred males without the sword ornament, and this preference for unsworded males may have been a selective force favoring the loss of the sword in this species. In an opponent assessment study, Allen and Nicoletto (1997) produced computer-generated animations of male Siamese fighting fish (B. splendens) that had elaborated dorsal, caudal, anal, and pectoral fins. Subject males oriented more often toward the animations when the simulated opponents had elaborated fins than when they were short finned. 
Allen and Nicoletto (1997) gave a functional interpretation of their results by suggesting that the length of male Siamese fighting fish fins may act as a competitive ability cue. The elaboration of ornaments has also been used to examine fluctuating asymmetry and mate selection (Mazzi et al. 2003) and assessment of fitness (Mazzi et al. 2004) of potential partners in sticklebacks. Smith et al. (2008) found that wattles of male junglefowls (G. gallus) play an important role in signal efficacy. To reach this conclusion, they conducted a playback experiment using computer-generated animations of a male G. gallus with different wattle motion characteristics varying from the absence of wattle, normal movement identical to real males, static wattle, and exaggeration of the wattle motion amplitude. Harland and Jackson (2002) examined prey recognition (Jacksonoides queenslandicus) by the jumping spider (Portia fimbriata). By manipulating the size and the location of the anterior medial eyes in J. queenslandicus, P. fimbriata was found to lack the ability to recognize this particular prey item, suggesting that eye features are critical for species recognition. Textural cues have also been identified as a signal for discrimination. Cook et al. (1997) found that pigeons were capable of discriminating between dynamic changes in color texture stimuli using a computer monitor. As a consequence, it can be argued that the ability to discriminate computerized textural stimuli has ecological implications (Cook 2000), such as the recognition of texture on fast moving conspecifics, mates, predators, or prey.

However, the same apprehensions regarding the inability for classical video playback to accurately represent spectral color, background contrast, flicker frequency, and depth also apply for animation design. In particular, color has been difficult to address, despite approaches to recreate natural color and texture in models.

Initially, concerns were raised when researchers began to use video playback as a technique to explore animal visual communication (Endler 1990; D’Eath 1998; Fleishman et al. 1998; Fleishman and Endler 2000; Zeil 2000). The central argument was that video monitors were designed to stimulate the human visual system, and critics of the technique have foremost cited the differences between animal and human visual systems. First, both Endler (1990) and D’Eath (1998) pointed out that color perception in animals differs from that of humans mainly because of the physiology of the retina and the number and distribution of rods and cones in vertebrates, or of retinular cells in invertebrates. Furthermore, the spectral output of video monitors is produced by tiny pixels, each consisting of three types of phosphors: red, green, and blue (Fleishman and Endler 2000). The spectral output of these phosphors has been shown to differ in its relative wavelength composition from that of natural objects, despite appearing similar to the human visual system (Bennett and Cuthill 1994; Fleishman et al. 1998). In particular, the sensitivity of avian species to color and UV differs considerably from that of humans, whereas the color rendition of video is calibrated for human perception (Cuthill et al. 2000). Secondly, motion resolution (i.e., critical flicker fusion relative to the refresh rates of video monitors) varies across visual systems (D’Eath 1998). Video playbacks have used two different standards based upon the trichromatic color system: the American National Television System Committee standard (NTSC, 29.97 frames per second) and the European phase alternating line standard (PAL, 25 frames per second). However, animals quite clearly differ in their flicker fusion frequencies. For instance, starlings (Sturnus vulgaris) have a high upper flicker fusion threshold of >100 Hz (Maddocks et al. 2001), whereas the cane toad (Bufo marinus) has a rather low upper flicker fusion threshold of ~6.7 Hz (Nowak and Green 1983). By comparison, the human upper flicker fusion threshold is 55–60 Hz, depending on contrast modulation and mean luminance (Levinson 1968). Thirdly, Zeil (2000) suggested that video playback provides inaccurate depth cues, which animals use to gauge the distance of proximal objects in relation to environmental features. Zeil (2000) further suggested that depth cues derived from binocular stereopsis, accommodation, and motion parallax are lost because the animal is free to move about the 3D environment while the video images are constrained by the stationary monitor and its size. Fleishman and Endler (2000) also suggested that video playback fails to incorporate aspects of natural illumination, such as shadows, reflections, and polarization, that individuals use to receive information; the lack of these cues may thus influence visual perception. In addition, the inability to provide natural illumination and color may limit the ability to discriminate texture cues (Fleishman and Endler 2000). Animals may also be unable to accurately distinguish signals from the background in the absence of natural illumination (D’Eath 1998; Fleishman et al. 1998; Fleishman and Endler 2000). Nevertheless, video playback experiments can match visual characteristics and test the recognition of communicative signals based on apparent size (Clark and Uetz 1990), morphology (Van Dyk and Evans 2007), and movement (Rosenthal et al. 1996).
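The flicker-fusion concern reduces to comparing a display's refresh rate with the viewer's upper flicker fusion threshold. The sketch below uses the thresholds quoted above. The display values are an added assumption: they are the interlaced field rates of PAL (50 Hz) and NTSC (59.94 Hz), which are twice the frame rates given in the text, and the starling value of 100 Hz is treated as a lower bound because the reported threshold is ">100 Hz".

```python
# Upper flicker fusion thresholds (Hz) quoted in the text
CFF_HZ = {
    "starling": 100.0,    # reported as >100 Hz, so this is a lower bound
    "human": 60.0,        # upper end of the reported 55-60 Hz range
    "cane toad": 6.7,
}

# Assumed display refresh rates (Hz): interlaced field rates, not frame rates
DISPLAY_HZ = {"PAL": 50.0, "NTSC": 59.94}


def may_perceive_flicker(cff_hz, refresh_hz):
    """True if the display refreshes more slowly than the viewer's threshold."""
    return refresh_hz < cff_hz


report = {
    (species, standard): may_perceive_flicker(cff, refresh)
    for species, cff in CFF_HZ.items()
    for standard, refresh in DISPLAY_HZ.items()
}
```

Under these assumptions a starling could perceive flicker on either standard, whereas a cane toad would not; real thresholds also depend on luminance and contrast, so such a comparison is only a first screening step.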

Critics of video playback have identified four main issues with the technique. First, there is the differentiation between the flicker-fusion rates of various animals and their ability to perceive motion (D’Eath 1998). However, the video playback method has been successfully demonstrated using both NTSC (Evans and Marler 1992) and PAL (Ord et al. 2002; Ord and Evans 2003). Secondly, it has been suggested that televised images do not accurately represent spectral sensitivity in all species (Fleishman and Endler 2000). In the studies by Rowland et al. (1995b) and Bolyard and Rowland (1996), they both manipulated the coloration on the female stickleback (G. aculeatus) and found that males will court females that had a brighter red spot. Although color perception can be altered and masked using filters (McDonald et al. 1995), it can be recognized that visual systems of animals are physiologically different (Oliveira et al. 2000), but some species can still show a discrimination of color cues on televised media (Bolyard and Rowland 1996). Thirdly, it has been argued that video playback distorts visual cues because it lacks natural luminescence and relies on artificial output from monitors (D’Eath 1998; Fleishman and Endler 2000). However, video playback has successfully been conducted in the field (Clark et al. 1997; Burford et al. 2000), which demonstrates that video playback is not only restricted to laboratory settings, and that animals are able to respond, despite differences between natural and artificial illumination. Fourthly, both depth cues and the perception of background contrast have been suggested as being misrepresented in video media (Zeil 2000). Fleishman et al. (1998) demonstrated, with the aid of a rotating drum, the ability to determine the sensitivity of A. carolinensis to visual motion in background noise. 
Peters and Evans (2003a) demonstrated that the social signals of the Jacky lizard are designed to be conspicuous across native habitat and against the complexity of windblown vegetation. With regard to depth and background representation, it can be acknowledged that there is a physical constraint on these features that is imposed by the dimensions of the monitor (Oliveira et al. 2000). When video playbacks are meant to simulate several individuals, depth cues normally associated with a group cannot be directly transmitted because video images are two-dimensional. As a result, individuals at different spatial positions on screen could be perceived as animals of different absolute sizes. Rieucau and Giraldeau (2009a,b) addressed this issue by creating video sequences of companion birds forced to stand aligned on the same plane of a linear apparatus facing the video camera; consequently, all the birds were videotaped at the same distance from the lens to ensure that they appeared life-sized in the resulting video sequences.

All video playback studies have demonstrated that the movement of individuals and their displays is the key feature for eliciting social responses. Ord and Evans (2002) developed an interactive approach to examine opponent assessment in Jacky lizards. In their study, they selected the type of response (i.e., aggressive or submissive) that a video male lizard displayed in response to the live male. This technique provided a method of using potential contingency cues without interruption in the video sequence. However, it also highlighted a major deficit in the standard video playback technique: the video sequence may not only be computer altered, but the alteration may also break up the continuity of the scene. A major constraint of standard video playback is that stimuli cannot be manipulated outside the current range of properties, and this has led to the development of a further technique that builds upon pre-existing methods (Oliveira et al. 2000).

In most video playback studies to date, stimuli have used biologically arbitrary colors (Rosenthal 1999). McDonald et al. (1995) modified the perception of video images by placing filters in front of the screen. Kingston et al. (2003) also applied filters to monochromatic animations in pygmy swordtails (X. pygmaeus). When blue and gold color filters were used, females showed a strong aversion to gold males but a strong preference for blue males, which suggested a preference for inconspicuous individuals. To investigate signal design, Peters and Evans (2003b) used an animation program (Lightwave® 3D) to create a single tail that was superimposed onto natural vegetation movement. They examined the conspicuousness of an assertive tail-flick display in the Jacky lizard in native environmental motion and found that this signal was highly conspicuous against natural windblown vegetation and that duration was a critical design characteristic. Like video playback, animation must also be rendered for NTSC and PAL dimensions (see detailed review of technical limitations in Baldauf et al. 2008). The ability to respond to animation stimuli has also been successfully demonstrated for NTSC (Turnell et al. 2003) and PAL (Peters and Evans 2003b; Zbinden et al. 2003). Although televised media is constrained by the overall size of the monitor, the examination of depth perception is not restricted to live interactions. Peters and Evans (2007) created a completely synthetic computer-generated lizard and rotoscoped three social displays (tail-flick, push-up body rock, and slow arm wave) of the Jacky lizard from archival video footage (Ord and Evans 2002; Ord et al. 2002; Van Dyk and Evans 2007) to be represented at different distances (1, 3, 9, and 21 m) while embedded against simulated natural windblown vegetation. 
In this study of active space in visual noise, Peters and Evans (2007) found that these social displays were still conspicuous up to a scaled 21 m in distance, despite a massive reduction in orientation to the onset of a display between 9 and 21 m. These results suggested that animations could nevertheless simulate changes in depth. Furthermore, pigeons (Columba livia) have demonstrated the capacity to discriminate objects using depth cues (Cook and Katz 1999). Cook et al. (2001) showed that pigeons were able to discriminate between the depth and movement features of 3D objects. Katz and Cook (2000) also suggested that the ability to use the visual features, such as motion and depth, could aid in the formation of generalized categories that simulate realistic situations.
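The scaling of a stimulus to a simulated distance can be illustrated with simple viewing geometry. The following is a minimal sketch and not the procedure used by Peters and Evans (2007): it assumes a flat stimulus centered on the subject's line of sight, and the function names, the 0.10 m body length, and the 0.5 m viewing distance are hypothetical values chosen only for illustration. The simulated distances are those quoted above.

```python
import math


def visual_angle_deg(size_m, distance_m):
    """Visual angle (degrees) subtended by an object of a given size at a given distance."""
    return math.degrees(2.0 * math.atan(size_m / (2.0 * distance_m)))


def on_screen_size(real_size_m, simulated_distance_m, viewing_distance_m):
    """On-screen size that subtends the same visual angle as the real object
    would at the simulated distance (similar triangles)."""
    return real_size_m * viewing_distance_m / simulated_distance_m


BODY_LENGTH_M = 0.10       # hypothetical stimulus length
VIEWING_DISTANCE_M = 0.5   # hypothetical distance from subject to monitor

for simulated_m in (1, 3, 9, 21):
    size = on_screen_size(BODY_LENGTH_M, simulated_m, VIEWING_DISTANCE_M)
    angle = visual_angle_deg(size, VIEWING_DISTANCE_M)
    print(f"{simulated_m:>2} m: {size * 1000:.1f} mm on screen, {angle:.2f} deg")
```

Because angular size falls roughly in inverse proportion to distance, the most distant stimulus occupies only a small part of the screen, and monitor resolution then limits how much detail can be displayed.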

The development of suitable experimental designs involving playback techniques, with the aim of accepting or rejecting a hypothesis on the effect of a given stimulus or combination of stimuli, commonly intersects with the important and recurrent statistical issue of pseudoreplication. Hurlbert (1984) first pointed out the widespread nature of this issue in designs of field experiments in ecology. As a result, pseudoreplication became the major focus of the debate surrounding the use of playbacks in animal behavior studies. In this context, pseudoreplication generally arises when a single sequence, whatever the modality, is presented repeatedly to focal subjects and these presentations are treated as replicates, instead of using proper independent replicates (randomly selected exemplars from a bank of sequences). As a consequence, the experimental design and the subsequent statistical analysis become inappropriate for the hypothesis that is tested. Video-altered playbacks may fall into the same statistical problem if alterations are made to only a single video sequence that then serves throughout the repeated presentations of the modified stimulus. Again, one may recommend the use of different independent exemplars in which only the specific stimulus under consideration (morphology, motion pattern, or number of tutors) has been manipulated beforehand. Conversely, the computer-generated animation technique involves the de novo development of artificial stimuli synthesized from the mean values of a trait or series of traits sampled from a population (Rosenthal 1999; McGregor et al. 1992). The creation and use of average artificial stimuli derived from population samples then allow researchers to evaluate appropriately the specific effect of a particular set of stimuli on the behavioral responses of observers. 
The computer-animated stimuli technique offers an effective alternative to resolve the critical issue of pseudoreplication in playback experiments.
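The recommendation to draw independent exemplars from a bank of sequences can be sketched as a simple randomization step. This is an illustrative sketch only: the function name, the subject labels, and the file names are hypothetical, and it assumes that the bank holds at least as many exemplars as there are focal subjects.

```python
import random


def assign_exemplars(subjects, exemplar_bank, seed=None):
    """Give each focal subject its own randomly drawn exemplar, without reuse."""
    if len(exemplar_bank) < len(subjects):
        raise ValueError("the bank must hold at least one exemplar per subject")
    rng = random.Random(seed)
    # sample() draws without replacement, so no exemplar is presented twice
    chosen = rng.sample(exemplar_bank, k=len(subjects))
    return dict(zip(subjects, chosen))


# Hypothetical subjects and playback sequences
subjects = ["subject_1", "subject_2", "subject_3", "subject_4"]
bank = [f"sequence_{i:02d}.avi" for i in range(1, 11)]

for subject, exemplar in assign_exemplars(subjects, bank, seed=1).items():
    print(subject, "->", exemplar)
```

Fixing the seed makes the assignment reproducible, which may be convenient when the design has to be reported or audited.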

Advantages of using animations

What can computer-generated animation achieve that other techniques cannot? First, computer animation, like video and robot models, recruits prolonged attention from the individual toward the stimulus, as opposed to stimuli that are static (i.e., still images or models) or that lack dynamic motion (i.e., mirrors). Shimizu (1998) argued that video images would recruit more social behavior because of the continuous and uninterrupted nature of the stimulus. Secondly, prolonged exposure to video and animation media allows subjects to assess the simulated conspecific. Static models and images are once again constrained by their limited ability to retain attention, whereas dynamic visual stimuli may have motion properties that provide more information than stationary images. For example, Van Dyk and Evans (2008) developed an interactive protocol similar to that of Ord and Evans (2002), with the exception that the use of animation prevented Jacky lizards from acquiring cues from visual components other than motion. Van Dyk and Evans (2008) also developed a synthetic lizard animation able to perform either an aggressive or a submissive display in response to subject behavior. In studies where mate choice (Künzler and Bakker 2001), opponent assessment (Van Dyk and Evans 2008), or social recognition (Shashar et al. 2005) is investigated, animation could therefore provide an efficient tool to examine the exchange of potential signals or preferences for traits, such as size (Morris et al. 2005), morphology (Wong and Rosenthal 2006), or symmetry (Mazzi et al. 2003; Mazzi et al. 2004).

The use of this technique thus accomplishes four critical functions: (1) to prevent habituation to the stimulus due to motion properties, (2) to retain subject attention, (3) to continue elicitation of social responses, and (4) to maintain continuity in signal design. Whereas static images simply do not have any motion cues, animation stimuli provide a continuous stream of different motion patterns.

The critical feature that separates animation from video playback and all other techniques is the range of properties that can be applied to the actual stimulus. The major benefit of employing this technique is total control over the manipulation of movement properties. The animation technique places no constraint on movement features, as computer-animation programs can design animations with any range of movement. In particular, there are five properties that animation can control: (1) stimulus consistency, (2) supernormality, (3) control of visual characteristics, (4) isolation of movement features, and (5) unusual motion control. Van Dyk and Evans (2008) demonstrated stimulus consistency in that the transition between aggressive and submissive displays does not require another computer alteration of the sequence, since they programmed the model to return to its original position. Although supernormality can also be achieved by computer alteration, as shown by Rowland (1989) with sticklebacks, animation allows for the control of apparent size effects. In a mate choice study examining the preference for the sword ornament in pygmy swordtails (X. nigrensis) using synthetic animations, Rosenthal et al. (2002) found that females failed to show a preference for males with normal swords, swords that were naturally larger, or swords that were abnormally large. Animation programs also allow for the control of surface features. In addition to texture (Rosenthal and Ryan 2005) and basic morphology (Morris et al. 2005), animation can be used to alter shape. Evans and Marler (1992) developed a computer-generated simulated hawk to fly over domestic chickens in an experiment examining anti-predatory behavioral responses. Similarly, Carlile et al. 
(2006) manipulated the shape and flight direction of a simulated aerial predator and showed that Jacky lizards found the most realistic silhouette of a hawk highly aversive, whereas abnormal shape and flight direction were not recognized. Gerlai et al. (2009) also found that, in the presence of a moving computer-generated animation of their sympatric predator, the Indian leaf fish (Nandus nandus), zebrafish exhibited specific antipredatory behavioral responses. However, control of basic visual features is not limited to predator recognition but has also been demonstrated in prey recognition. Roster et al. (1995) showed that green frogs (Rana clamitans), American toads (Bufo americanus), and Southern toads (Bufo terrestris) were able to discriminate between a cricket (Acheta domesticus) and an abstract object. Animation thus allows for the isolation of particular motion features, which is not possible with video sequences. For instance, Peters and Evans (2003b) restricted the motion of an animated Jacky lizard such that only the tail performed any movement. Isolation of the tail flick was used to measure signal efficacy, thus drawing attention only to that region of the animation and to no other body part. Finally, computer animation software also provides an effective way to combine surface and motion features that resemble either natural or unnatural characteristics while keeping each attribute under independent control. For example, Woo (unpublished data) designed a computer-animated Jacky lizard with two levels of signal syntax (normal and reversed display action pattern) compared across three levels of morphology (life-like animation with texture, life-like animation without texture, and animation that lacked texture and exact structure but maintained basic morphological shape). In this study, Woo (unpublished data) was able to examine signal design by selectively reducing the level of motion and visual features thought to be necessary for display recognition. 
Syntax was found to be more effective in eliciting social responses than morphology, and it was the motion pattern that was critical for recognition. Thus, computer-generated animations can not only display features outside normal property ranges but also allow researchers to examine and isolate aspects of signal design that are critical for communication. An elegant use of computer-generated animations was recently described by Campbell et al. (2009), who explored yawning contagion in chimpanzees (Pan troglodytes). Three different animations of conspecific faces displaying distinct control mouth movements and yawning were presented to 24 focal chimpanzees. Campbell et al. (2009) found that focal subjects yawned significantly more when viewing the yawning animation than when viewing the control mouth movements, suggesting that yawning contagion occurs in non-humans.

Like classical video playbacks (Rieucau and Giraldeau 2009a,b), computer-generated animations have been successfully used to simulate groups of individuals. For instance, Saverino and Gerlai (2008) created a computer-animated shoal of zebrafish in which the shoal size as well as the morphology and the swimming pattern (speed and direction) of each shoal mate could be controlled by the experimenters. These computer-generated animations were first used to explore shoaling preference (Saverino and Gerlai 2008), and more recently, they have been employed to investigate the acquisition of a learning task in zebrafish (Pather and Gerlai 2009). Thus, the ability to precisely control intrinsic characteristics of demonstrator groups, which computer-generated animations offer, now provides researchers with a powerful tool for the exploration of group decision-making processes and ultimately sociality.

Effectiveness of animations: the need to compare stimulus types

In order to objectively identify which method is both successful and more appropriate to the type of study undertaken, it is necessary that all types of techniques be compared directly to each other. Studies with pigeons that compare still images to motion models have already identified that movement is critical for individuals to engage with conspecifics, and this key feature is necessary for social recognition (Shimizu 1998). In comparing the display behavior directed at live conspecifics and at video playback, Ord et al. (2002) found that Jacky dragons exhibit the same intensity of social responses to both treatment types. To understand the potential implications for animation playback, Clark and Stephenson (1999) compared the schooling behavior of tiger barbs (Puntius tetrazona) in response to live conspecifics, a video recording of that exact pattern, and a rotoscoped scene of the same video sequence. In this experiment, Clark and Stephenson (1999) found that conspecifics schooled at the same intensity with all three treatments, thus showing no difference in perception between live, video, and animated stimuli.

However, to ensure that the animation stimulus is accurate in its movement, it is critical that a method be developed to monitor differences in motion. Peters et al. (2002) developed a motion analysis program called analysis of image motion (AIM). AIM uses an optic flow algorithm to calculate the displacement of motion in the visual field from the perspective of an individual. Peters and Evans (2003a) demonstrated that assertive (tail-flick), aggressive (push-up body rocks), and submissive (slow arm wave) signals of the Jacky dragon all have different velocity, amplitude, sweep area, and speed characteristics, and that these features are salient when superimposed on complex habitat noise. Researchers can now use AIM to measure whether animation stimuli greatly deviate from the original video sequence (Woo and Rieucau 2008).
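The general idea behind gradient-based optic flow can be illustrated with a short sketch. This is not the AIM implementation, whose details are not given here; it is a simplified least-squares estimate of a single global displacement between two grayscale frames, and the function names and the synthetic test pattern are hypothetical. It assumes small, sub-pixel motion and constant image brightness.

```python
import math


def estimate_translation(frame_a, frame_b):
    """Least-squares estimate of one global shift (u, v), in pixels per frame.

    Frames are lists of rows of grayscale values. The estimate relies on the
    brightness-constancy equation Ix*u + Iy*v = -It, summed over interior pixels.
    """
    rows, cols = len(frame_a), len(frame_a[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            ix = (frame_a[y][x + 1] - frame_a[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (frame_a[y + 1][x] - frame_a[y - 1][x]) / 2.0  # spatial gradient in y
            it = frame_b[y][x] - frame_a[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("image gradients are too uniform to estimate motion")
    # Solve the 2x2 normal equations by Cramer's rule
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return u, v


def synthetic_frame(shift_x, shift_y, size=32):
    """Smooth hypothetical test pattern displaced by (shift_x, shift_y) pixels."""
    return [[math.sin(0.3 * (x - shift_x)) + math.cos(0.2 * (y - shift_y))
             for x in range(size)] for y in range(size)]


u, v = estimate_translation(synthetic_frame(0.0, 0.0), synthetic_frame(0.5, 0.25))
print(f"estimated shift: u = {u:.2f}, v = {v:.2f} pixels per frame")
```

Applying the same estimate to matching frame pairs from an original video and from its animated counterpart would give two displacement series that can be compared directly, which is the kind of check described above.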

Researchers must first be able to design stimuli that require the animation to operate outside the current range of properties. If the task requires simple discrimination of surface cues, then traditional static stimuli may be sufficient. However, studies in sexual selection or signal design often require that the stimulus be designed to be both naturalistic and exaggerated. In designing this type of study, it would be beneficial to compare the responses to robotized animations and computer-generated animation models, as the appropriateness of either technique essentially hinges on its ability to move outside the current range. If robots are highly constrained in their ability to replicate all critical aspects of motion that are required for signaling, computer-generated animation may thus be the only technique that can effectively alter the model without distorting other visual features.

The multimodal approach

By combining precise control with the freedom of boundless manipulation of the visual stimuli presented, the computer-generated animation technique appears to be a promising approach to address novel questions in animal communication. The use of video media has allowed researchers to combine multiple visual aspects of a stimulus with other communication modalities. The multimodal approach examines the signal as a complex functional component, the extent to which content- and efficacy-driven selection pressures act on the signal, and how individual signals may contribute and interact to complete the entire sequence (Rowe 1999; Candolin 2003; Hebets and Papaj 2005). This approach does not exclude any modality that may either be important for communication or be a component of the entire signal (Partan and Marler 1999). Using still pigeon images, video, stuffed models, and live conspecifics, Ryan and Lea (1994) suggested that visual stimuli must have a combination of visual characteristics in order for signals to be salient. Video playback incorporates most of the visual features to some extent, with the exception of color. However, the freedom of manipulation that animation offers allows researchers to be more flexible than with video.

Evans and Marler (1991) initially used video images of chickens (G. domesticus) and bobwhite quails (Colinus virginianus) and corresponding alarm calls to investigate audience effects. They suggested that both visual and auditory modalities were responsible for the release of the social responses associated with the audience effect, and that neither sensory modality is wholly effective on its own. Partan et al. (2005) used a similar approach and examined the response of female pigeons to multisensory (audio and video) playback that was representative of male courtship behavior. Female pigeons were found to select males where both sensory signals were available. The combination of auditory and visual cues further enhanced male courtship behavior and thus serves to complete the signal. Galoch and Bischof (2007) also reported that, although silent video sequences of females were sufficient to induce courtship behavior in male zebra finches (Taeniopygia guttata), the addition of acoustic cues enhanced the intensity of male responses. However, acoustic cues presented alone did not trigger any courtship response from males. In comparison, Hebets (2005) demonstrated that the wolf spider (Schizocosa uetzi) uses a bimodal courtship signal that is composed of both a visual and a vibratory component. In a somewhat different approach, Narins et al. (2002) used an electromechanical dart-poison frog (Epipedobates femoralis) that produced a synthetic advertisement call and inflation of the vocal sac. Narins et al. (2002) found that a combination of both auditory and visual signals elicited aggressive responses from conspecifics, but stationary models failed to engage receivers. Using an integrated approach that paired video and animation with an auditory signal, Rosenthal et al. (2004) investigated the male advertisement signal as a dual-component process in the túngara frog (Physalaemus pustulosus). 
They paired the male advertisement call with a video stimulus of an inflated vocal sac. In addition, they used a computer-animated rectangle with a similar amplitude and frequency of apparent motion to simulate the inflation of the vocal sac as a visual cue. Results from their experiments showed that females significantly responded to video males with both the call and inflated sac, but failed to respond to the movement of the rectangle, suggesting that the vocal sac does act as a visual cue and that the apparent visual characteristics of morphology are required for recognition. Mehlis et al. (2008) investigated the mechanisms of kin recognition in three-spined sticklebacks by carrying out a mate choice experiment in which females were provided with visual cues (two similar computer-generated animations of courting males) and olfactory cues (odor of either an unfamiliar or a familiar fish). The results of this study revealed that female sticklebacks were able to discriminate between unfamiliar and familiar mates based on olfactory cues. Recently, Thünken et al. (2011) conducted a multimodal experiment to investigate mating preferences of male P. taeniatus, in which male cichlids were presented with a series of combinations of a computer animation of a laterally viewed female swimming on a horizontal path (an identical animation was used by Baldauf et al. 2009) and olfactory cues from either familiar or unfamiliar females. Interestingly, mate preferences based on olfactory cues appeared to depend on male body size; only the large males based their female choice on odor cues, showing a strong preference for the odor of familiar females, whereas the smaller males were found to be less choosy. The findings of Thünken et al. 
(2011) certainly call for the consideration of multisensory systems in cichlid mate choice and give strong experimental support for the effectiveness of computer-generated animations in minimizing the effects of confounding variables and in delivering consistent visual stimuli that can be combined with the olfactory modality.

Although Uetz and Roberts (2002) suggested that communication is most likely multimodal, individual components should not be overlooked, and there is a need to separate each component in a multimodal system to understand how the signal operates as a whole. The experiment by Rosenthal et al. (2004) demonstrated the ability to isolate visual features, such as frequency and amplitude of motion, which may be necessary for signal design. Unlike the use of robots or computer-altered video, the use of animations allowed them to understand if these inherited properties, in the form of a synthetic animation, were enough to elicit communicative responses. No other playback tool has the capacity to isolate specific visual components or express a combination of features simultaneously.

Conclusion

Classical methods employed to stage animal interactions have provided effective sign stimuli. However, not every method is appropriate for every type of experiment. The visual features that elicit social responses must be carefully considered before electing to use a particular technique. We suggest here that computer-generated animation is the most flexible technique that can be used for investigating signal communication, as it offers researchers the liberty of isolating or combining several dynamic visual features while ensuring precise experimental control. In the future, we hope to encourage other researchers to continue to improve upon past and current techniques. Although most studies have been able to elicit clear behavioral responses, it is necessary to ensure that we complement our observations with quantifiable results. To date, researchers have primarily considered species-specific interactions, but we would like to see this method expand to studies that examine group interactions, multimodal signals, and more refined manipulations of visual signals.