Introduction

Male representatives of the Betta splendens species are territorial animals of pugnacious nature (Monvises et al. 2009). When approached by other males of the same species, they initially engage in contactless antagonistic displays that are meant to convey fighting prowess to the opponent. If neither animal retreats, this interaction escalates into contact fighting behaviour with potentially lethal consequences. During the initial phase, fighting fish primarily engage in three stereotyped behaviours (Simpson 1968): (1) gill flaring (also termed opercular display), whereby their gills are repositioned to face the front part of the body (Gorlick 1990; Ma 1995); (2) fin extension, in which their fins are rearranged to presumably increase apparent body size; (3) increased oxygen uptake directly from the air. This last behaviour occurs as a consequence of the energy demands imposed by the other two behaviours (Arnott et al. 2016), emphasizing the cost associated with producing aggressive displays.

Fig. 1
Aggressive behaviour of fighting fish can be triggered by synthetic opponents. The test animal (outlined by blue region in a) is placed within a square arena flanked by two displays. One display (left in the example) shows video of an opponent. Tracking software identifies head position/direction (indicated by black cross) and gill location (indicated by red regions adjacent to blue region in a). b Plots head position/direction for stimuli presented to the left (red) and to the right (black). Light-coloured dots show head position when the animal did not produce flaring (gill extension), while full-colour elongated symbols indicate head position and direction (larger part of symbol pointing to front) during flaring events. c Plots corresponding distributions for all head positions along tank. Polar plot in d shows corresponding distributions for head directions. Black open histogram shows distribution during the baseline phase (no stimulation). e plots same as d but restricted to flaring events (color figure online)

Fig. 2
Individual variation in aggressive drive. a Plots (on y axis) the normalized difference between black and red distributions in Fig. 1c (labelled ‘right-versus-left drive’ here); when this characteristic is tilted away from the horizontal axis (dashed line) as shown here, the animal demonstrates measurable drive. Dark gray shading shows range (95% confidence interval) spanned by linear fit constrained to pass through origin (only 1 free parameter); light gray shading shows same for 99% confidence interval. b Plots flaring events (each black vertical segment corresponds to one event) over time as the stimulus switches between test phases (indicated by red shading) and baseline phases (gaps between test phases). During the test phase, one display showed the aggressive stimulus while the other display was blank (gray background); during the baseline phase, both displays were blank. a, b Show data for the crowntail animal detailed in Fig. 1; c, d show data for a veiltail animal, and e, f for a moontail individual (see icons showing photographs of actual individuals). Error bars (not visible when smaller than data points) show ± 1 SEM (color figure online)

Fig. 3
Visual discrimination of selective stimulus manipulations. Middle column (b, e, h, k) plots behavioural drive following the conventions of Fig. 2a for the entire population (weighted across animals by their flaring activity; see Supplementary Methods). The number of animals contributing to each plot is indicated by n. Right column (c, f, i, l) shows intact stimulus; left column shows competing stimulus for detection (blank screen in a), discrimination of warping manipulation (d), inversion (upside-down stimulus in g) and reverse playback (j). Animated versions of these stimuli can be viewed in Supp Video 2

Informal observations and targeted experimentation have demonstrated that antagonistic displays can be elicited by visual cues alone (Thompson 1963; Elcoro et al. 2008), without requiring that the two opponents come into physical or chemical contact. A simple observation that is easily available to recreational pet owners involves placing a mirror in front of the animal: within seconds, fighting fish will engage in aggressive behaviour towards the reflected image (Eisenreich et al. 2017). Similar results can be obtained by replacing the mirror with artificially created images of male opponents; in general, however, attempts of this kind have not proven as effective as mirror images (Meliska et al. 1980; Arnott et al. 2016). In an effort to retain control over the stimulus but yield better behavioural drive, some investigators have, therefore, resorted to robotic devices (Thompson 1963; Simpson 1968) that sometimes share physical space with the test animal (Romano et al. 2017).

Our focus here is on visual stimulation delivered by means of computerized displays (Turnbough and Lloyd 1973; Allen and Nicoletto 1997), because our primary goal is to isolate image-based cues and study the manner in which they are processed by the visual system of the animal (Rowland 1999). In some studies, visual stimulation is achieved by presenting two actual opponents within abutting tanks that do not exchange water (Forsatkar et al. 2017). An important limitation of this class of protocols is that the experimenter has limited control over the visual cues delivered by real animals. A similar limitation arises with mirror stimulation, because the test animal modifies its own visual input. We were able to design a class of computerized stimuli that elicited substantial aggressive response on the part of our test animals, allowing controlled dissection of the different visual cues that contribute to the behavioural response.

Our results demonstrate that the visual system of fighting fish selectively represents complex aspects of aggressive stimuli, such as spatiotemporal dynamics and whole-versus-part configurative information. We also found a substantial degree of individual variation (Simpson 1968; Matessi et al. 2010): some fish engage with the artificial stimulus very effectively, others not at all (or at least not insofar as afforded by the resolution of our behavioural measurements). For individuals that do engage with the stimulus, their level of discrimination implies an ability to represent visual information that is not encompassed by what is known about neuronal selectivity in this creature (McDonald et al. 2004), and that is generally ascribed to cortical structures in primates (Zeki and Shipp 1988; Shafritz et al. 2002; Robertson 2003; Botly and De Rosa 2009). It appears that teleosts have developed equivalent circuitry using completely different substrates (Ito and Yamamoto 2009; O’Connell and Hofmann 2011; Neri 2012; Rosa Salva et al. 2014), prompting a more general theory of how socially relevant signals are represented by vertebrate brains.

Methods

Eight B. splendens males were tested, one at a time, by placing them inside a plexiglass arena flanked by two monitors on opposite sides. They varied in tail pattern and were tested at different times over a period of 2 years. The behaviour of the animal was recorded by a camera located above the arena (Supp Video 1). Each testing session conformed to an AB blocked design (20 \(\times\) A and 20 \(\times\) B) where each block lasted 30 s. During block A (test phase), the two monitors displayed different visual stimuli, which we refer to as ‘target’ and ‘non-target’; during block B (baseline phase), both monitors were blank. Stimuli were variants of an ‘intact’ movie sequence depicting a male aggressor (Supp Videos 2 and 3). We tested eight different pairs of stimuli. In the ‘detection’, ‘discrimination’, ‘inversion’ and ‘reverse-playback’ configurations, the target stimulus displayed the intact sequence while the non-target stimulus displayed, respectively, a blank screen, a warped sequence, an upside-down sequence or a time-reversed sequence (see Supp Video 2). In the ‘head-only’ configuration, the target stimulus displayed only the head of the intact sequence while the non-target stimulus was a time-reversed version of the target stimulus; the same protocol applied to the ‘intact head with warped body’, ‘body-only’ and ‘intact body with warped head’ configurations (see Supp Video 4). Behavioural drive is estimated by first computing the distribution of head position during test phases when the target stimulus was displayed on the right side of the arena, and the distribution when it was displayed on the left side. The difference between the two distributions at each of the ten sample points is divided by their sum. The resulting data points are subjected to a linear fit with no intercept (constrained to 0); the best-fit slope value defines drive. 
The statistical significance of this metric is assessed in two ways: via confidence intervals (95% and 99%) around the measurements, which indicate an effect when the range does not include 0, and via the probability (p) of obtaining the observed measurements under the null hypothesis of zero slope (significant when p < 0.05). When combining different drive values across individuals, we first weighted each individual value by the amount of flaring synchronization displayed by each animal. Flaring synchronization is defined as the absolute value of the log ratio between the total number of flaring events recorded during the test phase and that recorded during the baseline phase. Please refer to Supplementary Methods for additional details and clarification.
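The drive and synchronization metrics defined above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function names, the ten equally spaced sample points centred on the middle of the tank, the normalized position units and the handling of empty bins or zero counts are all assumptions, and the example histograms are invented for illustration.

```python
import math


def normalized_difference(right_hist, left_hist):
    """Return (R - L) / (R + L) at each sample point.

    right_hist / left_hist: head-position distributions for target-on-right
    and target-on-left test phases. Bins with no data yield None (assumption).
    """
    trace = []
    for r, l in zip(right_hist, left_hist):
        total = r + l
        trace.append((r - l) / total if total > 0 else None)
    return trace


def drive_slope(positions, trace):
    """Least-squares slope of a line constrained to pass through the origin."""
    pairs = [(x, y) for x, y in zip(positions, trace) if y is not None]
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    return sxy / sxx


def flaring_synchronization(n_test, n_baseline):
    """Absolute log ratio of flaring counts in test versus baseline phases."""
    if n_test <= 0 or n_baseline <= 0:
        # The source does not say how zero counts are handled (assumption).
        raise ValueError("log ratio is undefined for zero counts")
    return abs(math.log(n_test / n_baseline))


# Hypothetical example: ten sample points, tank centre at 0.
positions = [-0.9 + 0.2 * i for i in range(10)]
right_hist = [2, 3, 4, 6, 8, 11, 14, 17, 19, 16]   # invented counts
left_hist = [17, 18, 16, 13, 11, 9, 6, 5, 3, 2]    # invented counts
trace = normalized_difference(right_hist, left_hist)
print("drive:", drive_slope(positions, trace))
print("synchronization:", flaring_synchronization(120, 15))
```

A positive slope indicates that the animal spent more time on whichever side displayed the target. Because the synchronization measure takes an absolute value, it is the same whether flaring is concentrated in the test phase or in the baseline phase.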

Fig. 4
Generalization to an opponent with different characteristics. Same as Fig. 3 but for a stimulus depicting a red moontail opponent (see Supp Video 3)

Fig. 5
Reverse-playback discrimination is not supported by local regions of the synthetic opponent. Plotted to the same conventions as Figs. 3 and 4. Top row shows data for head-only stimuli (a–c); second row for intact head but warped body (d–f); third row for body-only (g–i) and bottom row for intact body but warped head (j–l). Animated versions of these stimuli can be viewed in Supp Video 4

Fig. 6
Global versus local discrimination of spatiotemporal dynamics (reverse-playback manipulation). b plots behavioural drive for the reverse-playback discrimination in the presence of an intact stimulus (red) or a part-only stimulus (black). To maximize the resolution of our measurements, data for the intact configuration is pooled from both crowntail and moontail stimuli (essentially combining data from Figs. 3k and 4k, see icons in a, c here), while data for the part-only configuration is pooled from all four variants of the part-only stimulus (essentially combining data from Fig. 5b, e, h, k, see icons in d, e here). f–h are plotted to the conventions of b but show data for the three individuals already detailed in Fig. 2. Additional resolving power is harnessed by symmetrizing traces across the left-right divide (with sign inversion) (color figure online)

Results

General characteristics of behavioural steering in the presence of an aggressive stimulus

Individual males were tested inside a tank placed between two monitors displaying different visual stimuli (Fig. 1a). Each session was divided into multiple test protocols. Each protocol consists of repeated alternations between a ‘baseline’ phase, during which both monitors are blank, and a ‘test’ phase, during which the two monitors display competing stimuli. In the ‘detection’ protocol, the test phase involves presentation of an aggressive stimulus on one of the two monitors, while the other monitor remains blank (in Fig. 1, the aggressive stimulus appears to the left; a full movie of this stimulus can be seen in Supp Video 2). We automatically tracked a number of behavioural markers from the animal, notably head position, head direction and flaring events (gill extension; see Supplementary Methods and Supp Video 1).

The individual in Fig. 1 could be successfully steered towards the aggressive stimulus, as demonstrated by the differential distribution of head position when the stimulus appeared to the left as opposed to the right of the tank (Fig. 1b, c). Head direction is also asymmetrically distributed during the test phase (but not the baseline phase), as indicated by the polar plot in Fig. 1d (compare red versus gray bars for left versus right presentation of the aggressive stimulus; also compare with open bars from baseline phase). More specifically, the distribution is biased towards two orthogonal directions: one oriented so that the animal would be facing the stimulus (horizontal spokes in Fig. 1d), the other oriented so that the animal would be displaying its right flank towards the stimulus (vertical spokes in Fig. 1d; see Bisazza and de Santi (2003)). When this directional analysis is restricted to time points during which a flaring event was recorded (Fig. 1e), incidence of the flanking configuration is greatly reduced (compare size of vertical spokes in Fig. 1e as opposed to Fig. 1d), consistent with the notion that flaring displays are most effective when the animal is directly facing its opponent (Simpson 1968).

Inter-individual differences in behavioural drive and flaring synchronization

We summarize the above-detailed differential effect on head position by taking the normalized difference between the distribution associated with right-hand side stimulus presentation (black histogram in Fig. 1c) and the distribution associated with left-hand side stimulus presentation (red histogram; see “Methods”). The resulting data points are expected to hover around 0 in the absence of behavioural drive; in the presence of drive, as we observe in Fig. 1, they are tilted with positive slope (Fig. 2a). Behavioural drive is, therefore, summarized by a linear fit through the origin of the plot (light/dark gray-shading in Fig. 2a), where the origin marks the centre of the tank (vertical green line in Fig. 2a, c, e). To provide a graphical rendition of whether drive is statistically different from 0, 95% and 99% confidence intervals around the linear fit are indicated by dark- and light-shaded regions, respectively. Using this plotting convention, an effect is present when the shaded region stands clear of the horizontal dashed line (the latter corresponding to a slope of 0, i.e. no behavioural drive). We also include p values, with the understanding that they should be interpreted with caution (Cumming 2014; Wasserstein and Lazar 2016). The two approaches (confidence intervals and p values) lead to equivalent conclusions when applied to our dataset.

Not all individuals were as responsive to the aggressive stimulus as the example documented in Fig. 1, also further detailed in Fig. 2a, b. This individual not only presented clear behavioural drive as assessed by head position (Fig. 2a, drive different from 0 at \(p<10^{-6}\)) and head direction (Fig. 1d, e), but also produced flaring events that were highly synchronized with the occurrence of the aggressive stimulus (black segments in Fig. 2b, indicating individual flaring events, mostly occur during test phases, indicated by red shading). Other individuals showed substantially less drive (Fig. 2c, drive is different from 0 at \(p<10^{-6}\) but tilt is smaller than in Fig. 2a); in some cases, drive was barely measurable (Fig. 2e, \(p=0.1\)) and flaring behaviour was poorly synchronized with stimulus occurrence (Fig. 2f).

There was no obvious pattern to these inter-individual differences (Meliska and Brown 1982). For example, they did not seem related to fish size (we also carried out some pilot measurements using smaller stimuli to address this issue and found no evidence that size was a critical parameter within the range that could be rendered by our display). They also did not seem to depend on whether the fish belonged to the crowntail variant or to one of the other two main variants (veiltail, moontail). We show examples from all three categories in Fig. 2 for completeness, but this morphological distinction was not a good predictor of drive.

It is possible (though purely speculative) that the observed differences may reflect perceived dominance with relation to the computerized stimulus: to some individuals (Fig. 2a), the aggressor in the synthetic stimulus may appear as a manageable opponent; to others (Fig. 2e), the stimulus may appear overpowering and not worth/wise engaging with. We have made an indirect attempt at addressing this issue by collecting data with a different synthetic stimulus (see below); however, we have no conclusive evidence to support this class of speculations. For example, previous research has demonstrated that aggression is more pronounced in response to an opponent that differs in colour as opposed to one of similar coloration (Thompson and Sturm 1965). Our results, however, go in the opposite direction: the most-responsive individual (crowntail in Fig. 2a, b) presents similar coloration to the synthetic sequence, while less-responsive individuals (e.g. Fig. 2c, d) carry different coloration (e.g. black, white).

A related issue we consider here is the potential extent to which breeding may have rendered our results unrepresentative of the natural population. Fin pattern and coloration are subject to intense selection by breeders for commercial purposes, so that mainstream fighting fish available in Western shops (like those used in this study) differ substantially from natural (pla-kat) strains (Monvises et al. 2009). Although we cannot exclude the possibility that different results may be observed if the experiments reported here were repeated on other strains, previous research indicates that this scenario is unlikely because highly bred strains demonstrate characteristics that are similar to pla-kat strains with relation to their aggressive behaviour (Allen and Nicoletto 1997). This matter can only be settled conclusively by future research.

We further draw attention to the fact that, because animals were sourced from mainstream commercial outlets as mentioned above, we are unable to provide adequate information on genetic profiling for our sample, and in particular on the potential variation across the sample, which may have contributed to the inter-individual differences detailed above. Although the practice of purchasing individuals from local shops is widely adopted for research on B. splendens (e.g. Forsatkar et al. 2017; Arnott et al. 2016; Eisenreich and Szalda-Petree 2015; Romano et al. 2017), we hope to rectify its limitations in further studies by relying exclusively on institutional breeding facilities. We further hope that the latter practice will come to dominate the field in the near future.

Visual discrimination of warped, inverted and reversed stimuli across the population

Due to the inter-individual heterogeneity documented above, we combine drive measurements across individuals after weighting them separately by the corresponding degree of flaring engagement in each individual. For example, when combining data from Fig. 2a, c, e into a composite population descriptor, we apply a weighting factor to each trace that is derived from the corresponding flaring patterns in Fig. 2b, d, f (see “Methods”). This procedure effectively over-represents individuals that engage with the stimulus, and under-represents those that show poor engagement. It is designed to reduce the impact of individuals that did not engage with the stimulus because, if they do not engage at all, it is difficult to draw sensible interpretations about their discriminative abilities and they can be regarded as merely contributing noise to the population estimate (see Supplementary Methods for more extended consideration of this issue).
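The weighting step can be sketched as a weighted average of individual traces. This is only an illustration: the function name is hypothetical, and normalizing by the sum of the weights is an assumption (the exact procedure is described in Supplementary Methods, which are not reproduced here). The weights stand for the per-animal flaring synchronization values.

```python
def pooled_trace(traces, weights):
    """Weighted average of per-animal drive traces, one value per sample point.

    traces:  list of equal-length lists (normalized differences per animal).
    weights: one non-negative weight per animal (e.g. flaring synchronization).
    """
    if len(traces) != len(weights):
        raise ValueError("need exactly one weight per trace")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one animal must carry non-zero weight")
    n_points = len(traces[0])
    return [
        sum(w * trace[i] for trace, w in zip(traces, weights)) / total
        for i in range(n_points)
    ]


# Hypothetical example: an engaged animal (weight 2.0) dominates the pooled
# trace, while a non-engaged animal (weight 0.1) contributes very little.
engaged = [-0.6, -0.2, 0.2, 0.6]
not_engaged = [0.1, -0.1, 0.0, 0.1]
print(pooled_trace([engaged, not_engaged], [2.0, 0.1]))
```

An animal with a weight of zero drops out of the estimate entirely, which matches the stated aim of limiting the noise contributed by individuals that did not engage with the stimulus.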

The population estimate for drive in the detection protocol is shown in Fig. 3b. Notice that this plot now presents data aggregated across multiple individuals (see above), as opposed to Fig. 2 where each plot refers to a different individual. As expected, drive is sizeable under this protocol (\(p<10^{-5}\)); however, this is not a particularly informative configuration due to its lack of visual specificity: there are many reasons why fighting fish may prefer inspection of the stimulus (Fig. 3c) when pitted against absence of a stimulus (Fig. 3a). For example, stimulus discrimination may be supported by the mere presence of motion signals (Thompson 1963), without representing the specific motion pattern associated with synthetic aggressors. To exclude this possibility, we designed a warped variant of the stimulus that retains comparable amounts of movement, shape and colour information (Fig. 3d; a full movie of this stimulus can be seen in Supp Video 2). Albeit reduced in amplitude, behavioural drive was measurable for this comparison (Fig. 3e, \(p<0.002\)), demonstrating that the visual system of fighting fish can represent relatively specific information about the configuration of the synthetic opponent.

Despite affording greater specificity than the featureless background, the warping manipulation is nevertheless relatively aspecific, in that it involves disruption of several low-level characteristics. For example, the spatial frequency content of the stimulus is not entirely preserved, and the degree of local motion coherence is also slightly altered. We, therefore, tested two additional stimulus configurations with greater specificity: upside-down inversion (Fig. 3g) and reverse playback (Fig. 3j; full movies of these stimuli can be seen in Supp Video 2). Stimulus inversion is an established manipulation in visual psychophysics (Yin 1969; Thompson and Thatcher 1980), typically employed to selectively impair higher-level perceptual representations without concomitant changes in low-level stimulus content (Valentine 1988; Neri 2011, 2014). Reverse playback achieves a similar goal (Neri 2014) and has been successfully exploited in previous studies to demonstrate feature binding in a teleost species (Neri 2012).

Although upside-down inversion was poorly discriminated by fighting fish (Fig. 3h, \(p=0.06\)), we report a measurable degree of discrimination for reverse playback (Fig. 3k, \(p<0.002\)), comparable in amplitude to the ability demonstrated for the warping manipulation (compare Fig. 3k with Fig. 3e). The latter result is particularly interesting because it exposes the role played by dynamic features of the stimulus: a visual representation that lacks dynamics, for example one based on individual snapshots of the stimulus that treats it like a collection of static images, cannot discriminate between forward and reverse playback. On the other hand, dynamics alone appears insufficient to carry out the discrimination, because the two stimuli contain nearly identical motion signals. The only potential differences are produced by accelerating elements being converted into decelerating elements (and vice versa) during reverse playback; however, the two motion characteristics are represented in roughly equal amounts within the original sequence, so that the forward and reverse playback configurations present virtually no difference in overall acceleration/deceleration content.

Generalization to a different synthetic opponent

During stimulus development, we identified an issue of both conceptual and practical relevance: the role played by the specific visual sequence associated with the synthetic aggressor used in the experiments of Figs. 1, 2 and 3 (see Supp Video 2). The design of a stimulus that produced robust behavioural drive was not trivial and it was only following a number of failed or mildly successful attempts that we fine-tuned the visual sequence into its final configuration. It remains unclear whether the results obtained using this specific sequence will generalize to a different sequence. We addressed this issue by designing an additional sequence that retained some of the characteristics associated with the primary configuration, while at the same time presenting several differences in colour, shape and motion (see Supp Video 3). The most obvious differences relate to colour and shape (crowntail versus moontail).

Compared with the primary sequence, this stimulus variant produced similar drive in the detection protocol (Fig. 4b, \(p<10^{-5}\)) but reduced amplitude for the discrimination protocols (Fig. 4e, h, k). The overall pattern remains consistent, in that warping (Fig. 4e, \(p<0.003\)) and reverse playback (Fig. 4k, \(p<0.04\)) are discriminated more effectively than upside-down inversion (which produces no measurable drive in Fig. 4h, \(p=0.7\)); however, some of these effects are poorly resolved (Fig. 4k). Clearly, this variant of the stimulus is not as effective in driving a response from our sample of fighting fish.

Local versus global analysis of body sub-parts

A central question in contemporary research on action perception is whether the motion patterns generated by other agents are analysed locally, i.e. by representing separate elements of their body as distinct stimuli, or whether they are processed globally, i.e. by nonlinearly integrating sub-elements into a coherent whole (Neri et al. 1998; Troje and Westhoff 2006). In the context of human perception of conspecifics, the distinction between whole and subparts may be cast in terms of body versus limbs, for example (Neri 2009; Cusack et al. 2015). Based on prior research (Neri 2012), we define two sub-regions within our stimulus: the ‘head’ region and the ‘body’ region (see Fig. 5a, g). It is relevant in the specific context of fighting fish that these two regions correspond to distinct aggressive displays, namely gill flaring and fin extension (Simpson 1968). Will either region be sufficient to support the class of discrimination abilities documented for intact stimuli? To answer this question in a stringent and highly specific fashion, we focus on the most informative discriminative protocol tested earlier: reverse playback (Fig. 3j–l). The ability of fighting fish to discriminate between forward and reverse playback is not supported by either head or body configurations (Fig. 5b, h; \(p=0.5\) and \(p=0.3\), respectively).

One potential explanation for the above outcome is that, in the part-only configurations, the stimulated region only extends over a limited portion of the synthetic opponent. It may be argued that the stimulated region must be sufficiently large for the animal to engage with the stimulus in an aspecific sense (e.g. allocation of attention) before further engaging in more detailed discrimination of its dynamics. The failure to discriminate dynamics may then be attributable to the lack of attentional engagement (aspecific) and may not imply inability to carry out the discrimination (see Supplementary Methods for further consideration of associated issues). This potential explanation appears reasonable in relation to the head-only stimulus, because this stimulus only covers a small portion of the original sequence; it appears less applicable to the body-only stimulus, which is only slightly smaller than the intact one (see Supp Video 4). However, even if we accept that the animal should engage with the body-only stimulus to an extent that is comparable with the full intact opponent, there is a related issue that complicates interpretation: it may be argued that a missing region does not support perceptual completion of the stimulus. According to this interpretation, it is not necessary for the missing region to be intact, but it must provide some stimulation in order to cohere with the remaining intact region.

To address the above scenarios, we designed chimeric stimuli in which one of the two sub-regions remains intact, while the other one is warped (see “Methods”). For the head-only variant, this means that the head-region is kept intact, while the body region is distorted (Fig. 5d, f); for the body-only variant, the opposite configuration applies (Fig. 5j, l). We did not observe any measurable difference in the behavioural response to these chimeric configurations as opposed to the part-only stimuli tested earlier (Fig. 5e, k, \(p=0.5\) and \(p=0.6\), respectively), indicating that the lack of forward/backward discrimination for part-only stimuli is not attributable to the mere absence of one part, but to the fact that the disrupted part does not conform to the natural configuration of the synthetic opponent.

A different but relevant concern involves data mass and the resolution of our measurements. Because our conclusions are based on the absence of a measurable effect, it may be argued that our methodology simply failed to measure discrimination for part-only stimuli due to lack of resolving power. This possibility cannot be fully excluded, but we can analyse our dataset in more depth to investigate the matter and perhaps draw additional conclusions. To maximize the resolution of our measurements, we combine multiple equivalent configurations so as to increase data mass (we also symmetrize the traces between left and right sides to reduce the impact of measurement noise). For the intact stimulus configuration, we combine data from the two different synthetic opponents (Figs. 3k, 4k); the resulting characteristic (red data symbols in Fig. 6b) produces clearly measurable drive (\(p<10^{-4}\)). For the part-only configuration, we combine data from the four variants in Fig. 5b, e, h, k. The resulting characteristic (black data symbols in Fig. 6b) does not produce measurable drive (p barely significant at 0.04, 95% confidence interval overlaps with horizontal dashed line), confirming our earlier conclusions (however, see below for further analysis).
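One way to read the symmetrization step is as averaging each sample point with the sign-inverted value at its mirror position across the tank centre. The sketch below illustrates that reading. The function name and the exact formula are assumptions, since the text only states that traces were symmetrized across the left-right divide with sign inversion.

```python
def symmetrize(trace):
    """Average each point with the sign-inverted point mirrored across the centre.

    Assumes the sample points are arranged symmetrically about the tank
    centre, so that index i mirrors index n - 1 - i.
    """
    n = len(trace)
    return [(trace[i] - trace[n - 1 - i]) / 2 for i in range(n)]


# Hypothetical noisy trace with an underlying positive tilt.
noisy = [-0.4, -0.1, 0.3, 0.2]
print(symmetrize(noisy))
```

The result is antisymmetric about the centre by construction. Any component of the trace that is mirror-symmetric about the centre, such as a uniform offset, cancels, while the tilt that defines drive is preserved.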

As we have highlighted earlier (Fig. 2), we found considerable individual variability in how fighting fish react to synthetic stimuli. This variability is particularly relevant in the context of the whole-versus-part analysis we are currently discussing, because we found one individual (partially characterized in Fig. 2a, b) for which drive was measurable in response to part-only stimuli (Fig. 6f, \(p<10^{-5}\)), albeit reduced in amplitude (compare black versus red data points; p value for red data is < 10\(^{-7}\)). Other individuals only discriminate forward/backward stimuli when intact (see example in Fig. 6g, corresponding to individual in Fig. 2c, d, for which p values are > 0.05 and < 10\(^{-3}\) for black and red data, respectively) or do not discriminate them at all (see example in Fig. 6h, corresponding to individual in Fig. 2e, f, for which p values are > 0.2). Based on these observations, we must further qualify our conclusion that forward/backward discrimination relies on global analysis by adding that this is a generalization that may not apply to all individuals. At the population level, it is certainly the case that dynamic information from our synthetic stimuli is processed more effectively when the stimulus delivers a full representation of the natural visual signal and that this process is greatly impaired when only local information is made available to the animal (Fig. 6b). The associated reduction in discriminability, however, may show substantial individual variation (Fig. 6f–h).

Discussion

Significance of understanding visually guided behaviour in teleosts

The study of visual perception in teleosts has a long and productive history (Schuster et al. 2011; Rosa Salva et al. 2014), often resulting in important insights relating to function and dysfunction in other creatures like ourselves (Maximino et al. 2015). The Siamese fighting fish is a particularly attractive model for behavioural quantification thanks to its explicit antagonistic display (Thompson 1963; Simpson 1968). Furthermore, because atypical aggression is associated with a range of pathological conditions and drugs are easily delivered to fish via their natural medium, fighting fish are of interest to pharmacological studies (Lynn et al. 2007; Eisenreich et al. 2017; Dzieweczynski et al. 2016; HedayatiRad et al. 2017). To support targeted drug delivery and manipulation, however, it is imperative that aggressive behaviour is characterized and understood to a satisfactory degree of specificity.

Little is known about the specificity of visually guided behaviour in fighting fish. It is known that visual stimulation is sufficient to elicit aggressive behaviour, but it remains unclear what aspects of the stimulus are represented by the animal for controlling that behaviour. Previous studies have attempted to identify some of the critical features by manipulating, e.g. fin size (Allen and Nicoletto 1997), colour (Thompson and Sturm 1965; Grabowski and Thompson 1968), movement (Thompson 1963) and other characteristics (Romano et al. 2017), but interpretation of the results is complicated primarily by two factors. First, animals are typically tested in response to one stimulus at a time, rather than two simultaneous stimuli for direct comparison. As we explain in Supplementary Methods, the former protocol has poorer resolving power, due primarily to saturation effects. Second, several manipulations applied in previous studies were meant to target individual visual cues, but did not always achieve this goal because they lacked sufficient specificity. We discuss the latter issue in the next section.

Discrimination of spatiotemporal dynamics as a selective tool for probing the specificity of perceptual representations

Prior studies have made important contributions to our understanding of how teleosts analyse visual stimuli; however, experiments with fighting fish have often involved stimulus manipulations that are relatively non-specific, thus complicating interpretation of the results. For example, when fin size is varied (Allen and Nicoletto 1997), the overall size of the stimulus is also varied; any associated difference in behavioural response may be attributable to the overall change in stimulus size, not necessarily to the fins in particular. When coloration is varied (Thompson and Sturm 1965; Grabowski and Thompson 1968) and the animal shows preference for one colour over another, it remains unclear whether the critical feature is the colour of the synthetic aggressor or of any other visual object, potentially even meaningless ones. Addressing these issues requires the design of visual stimuli that only differ with respect to very specific visual cues, while leaving all other cues intact or nearly unaffected.

When the above considerations are taken into account, it becomes apparent that the most stringent test adopted in this study is represented by the forward/backward discrimination where one stimulus follows its natural temporal trajectory, while the competing stimulus is reversed in time. The two stimuli contain nearly identical amounts of overall colour, shape and movement; what sets them apart is the relationship between shape and movement as it evolves over time according to either natural or unnatural correspondence (Neri 2014). Prior work has demonstrated that the zebrafish visual system can represent the relationship between these two attributes in a specific manner indicative of the ability to carry out feature binding (Neri 2012) (see Nakayasu and Watanabe 2014 for subsequent confirmation of this result in a different teleost species). In this study, we similarly demonstrate that fighting fish possess neural structures that can support this computation [see Supplementary Methods for a more detailed discussion of how the stimuli used here relate to those used in previous experiments (Neri 2012)]; however, we also report that the manner in which this computation is carried out by the brain of fighting fish presents qualitative differences that make it more similar to the manner in which vision operates in primates (see below).
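The logic of the forward/backward comparison can be illustrated with a minimal sketch. It does not reproduce the stimuli or software used in this study: the toy clip, the helper names (`make_clip`, `reverse_clip`, `mean_intensity`, `motion_energy`) and the two summary measures are assumptions introduced purely for illustration. The sketch shows that reversing a clip in time leaves frame content and total frame-to-frame change untouched, so that only temporal order distinguishes the two stimuli.

```python
import random


def make_clip(n_frames=20, n_pixels=16, seed=0):
    """Hypothetical toy clip: a list of frames, each a list of pixel intensities."""
    rng = random.Random(seed)
    return [[rng.randint(0, 255) for _ in range(n_pixels)]
            for _ in range(n_frames)]


def reverse_clip(frames):
    """Play the same frames in reverse temporal order."""
    return frames[::-1]


def mean_intensity(frames):
    """Average pixel value over the whole clip (a crude stand-in for colour/shape content)."""
    total = sum(sum(frame) for frame in frames)
    return total / (len(frames) * len(frames[0]))


def motion_energy(frames):
    """Summed absolute change between consecutive frames (a crude stand-in for movement)."""
    return sum(abs(a - b)
               for prev, nxt in zip(frames, frames[1:])
               for a, b in zip(prev, nxt))


forward = make_clip()
backward = reverse_clip(forward)

# Both summary measures are identical for the two clips...
same_content = mean_intensity(forward) == mean_intensity(backward)
same_motion = motion_energy(forward) == motion_energy(backward)
# ...while the temporal sequence itself differs.
different_order = forward != backward
```

In this toy case the match is exact because the same frames are simply reordered. For the actual stimuli, the text above claims only that overall colour, shape and movement are nearly identical.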

Global versus local processing of visual stimuli in fighting fish

When compared with relevant experiments in zebrafish (Danio rerio), fighting fish (B. splendens) differ primarily in two important respects. First, discrimination in zebrafish appears to be relatively local: forward/backward discrimination is supported by a restricted portion of the synthetic stimulus, regardless of whether other regions are presented incongruously or excluded altogether (Neri 2012). In this sense, the stimulus is processed by zebrafish in a local fashion, which is not the typical mode of operation in primates (Maurer et al. 2002; Neri 2009) [although it has been proposed that it may apply to some atypical profiles, like autistic traits in humans (Happe and Frith 2006; Cusack et al. 2015)]. In fighting fish, on the other hand, the integrity of the full stimulus appears to play a greater role: when one region is disrupted or omitted, discrimination is poorly supported if at all (Fig. 6).

The difference detailed above carries potentially important implications for understanding visual processing in teleosts, because it indicates that different classes of social behaviour (e.g. shoaling versus fighting) may involve different perceptual computations, or at least different modes of operation for carrying out those computations. At the same time, the issue remains open because it is unclear what exact perceptual mechanisms are probed by stimuli that contain partial information; clearly, further research using a wider range of stimulus manipulations will be necessary to answer this question satisfactorily. We have made a first attempt in this direction by repeating our experiments using a different synthetic aggressor. The overall pattern of results returned by these additional experiments confirms the earlier conclusions (Fig. 4). However, the size of the effects is smaller (possibly due to the second stimulus supporting poorer definition of the gills as a consequence of its more homogeneous coloration), demonstrating the importance of testing a wide range of stimulus configurations.

A second difference exposed by our measurements is that, in general, zebrafish present less individual variation than fighting fish in their visually guided behavioural responses. In zebrafish, the presentation of synthetic conspecifics produces robust shoaling behaviour when pitted against a blank background (Neri 2012; Spilioti et al. 2016). In fighting fish, some individuals show little response to the aggressive display (Fig. 2e, f). We can only speculate as to why this pattern was observed [while acknowledging that substantial inter-individual differences are well documented in B. splendens (Simpson 1968; Matessi et al. 2010); see also Lucon-Xiccato and Bisazza (2017) and “Results” for further related considerations]. An interesting possibility, directly relevant to the observations made in the previous paragraph, is the following: visual analysis of potential opponents in fighting fish appears to retain certain characteristics of higher-level vision, whereas zebrafish appear to represent conspecifics using low-level rules for the purpose of social aggregation. The former class of computations may therefore constitute a larger repertoire than the latter, both in how information is represented and in the degree of associated behavioural variability.

Who needs cortex?

Our results demonstrate that Siamese fighting fish possess sophisticated visual capabilities that well exceed our knowledge and expectations about the potentially underlying brain structures (e.g. McDonald et al. 2004), and that the manner in which these capabilities operate on the visual stimulus shares important similarities with primate vision (Neri 2012; Treisman 1996; Zeki and Shipp 1988; Shafritz et al. 2002; Botly and De Rosa 2009). Clearly, complex circuits can be built without cortical substrates and to a remarkable degree of miniaturization. Our behavioural measurements also highlight the holistic nature of visual analysis in this species, a characteristic that (like binding) is regarded as an important signature of cortical processing (Maurer et al. 2002). More generally, our results prompt current theoretical research into social cognition to widen its scope and expand beyond frameworks that are centred around mammalian brains, dovetailing with related proposals by other authors (Ito and Yamamoto 2009; O’Connell and Hofmann 2011). A comprehensive understanding of different behavioural traits across phylogenetically distant species may teach us novel facts about visual processing in biological systems and may provide important insights for the development of a more general, and ultimately more powerful, theoretical account of how socially relevant signals are perceptually represented by animal brains (Rosa Salva et al. 2014; Lucon-Xiccato and Bisazza 2017).