Introduction

The ability to generalize rules and apply them to novel situations is a fundamental aspect underlying flexible cognitive behavior (Emery and Clayton 2004). This allows individuals to adjust to environmental changes through the generalization of previous learning, without having to learn specific contingencies ‘from scratch’ (de Mendonça-Furtado and Ottoni 2008). In cognitive learning experiments, animals that develop such ‘learning sets’ (Harlow 1949) decrease error rates in subsequent contextually similar problems (Wynne 2001). However, they only do so if they are capable of mastering the original problem, by extracting and applying the key principle that all of the problems have in common (Wynne 2001). In contrast, rote learners often solve tasks at a slower rate by learning each problem individually (e.g., pigeons: Wilson et al. 1985; Emery and Clayton 2004). It has been argued that the social environment, in particular, is variable, which leads to selection for increased brain sizes in social species to enable individuals to behave more flexibly (Deaner et al. 2007). This view thus links behavioral flexibility to the social brain and Machiavellian intelligence hypotheses (Byrne and Whiten 1988; Dunbar 1998). The ability to extract and apply generalized rules is also considered an attribute of human higher cognitive functioning (e.g., language: Pinker 1991), yet simpler rules have been documented in non-human animal taxa (e.g., rats: Murphy et al. 2008).

Despite the conceptual focus on the importance of generalized rule learning in a social environment, experiments have generally been conducted in an abstract context (e.g., ‘match to sample’ experiments using symbols, sounds or colors as the stimulus sample and with food as the reward). Experiments on generalized rule learning typically involve the following rule: of two stimuli presented, one consistently yields a food reward, while the other does not. Animals that are able to apply this rule to new tasks cannot know the correct choice during the first presentation of two novel stimuli, but can deduce the correct choice for the second presentation (i.e., if the animal received food during the first trial, then it should repeat the choice, and if it was not rewarded, then it should shift). A comparison of the performance of various mammals in the crucial ‘trial 2’ of novel tasks shows that the number of stimulus pairs learned through operant conditioning before evidence for generalized rule learning emerged increased considerably from rhesus monkeys to squirrel monkeys, marmosets and cats, while rats and squirrels exhibited no evidence for generalization, even over 1800 problems (Passingham 1981; Wynne 2001; Shettleworth 2010). Although these results correlate with relative brain size, this is not necessarily the driving force (Wynne 2001). An independent study on dunnarts, using visual stimuli, demonstrated that the performance of these marsupials exceeded even that of rhesus monkeys, despite their small brain size (cephalization index 0.07, smaller than that of low-performing rats and squirrels) (Darlington et al. 1999; Wynne 2001). It is suggested that the foraging habitat of dunnarts drives this exceptional performance, as catching fast-moving prey in an open, high predation risk environment may select for quick learning (Wynne 2001). Thus, while evidence for generalized rule learning exists in a variety of animal taxa, including marine mammals (e.g., Herman et al. 1994), rodents (e.g., Murphy et al. 2008) and birds (e.g., de Mendonça-Furtado and Ottoni 2008), the lack of variation in ecological validity may explain why the variation in species’ performance is large and not necessarily linked to brain size (Wynne 2001). As there is clear evidence that many cognitive abilities are tightly linked to a species’ ecology (e.g., ecological approach to cognition: Kamil 1998; Shettleworth 2010), generalized rule learning capabilities may span a more diverse range of animal taxa and contexts than is currently documented, and be more readily demonstrated as long as the ability to generalize is ecologically relevant to the species.
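The repeat-or-shift rule for ‘trial 2’ described in this paragraph can be written out as a short sketch. The function name `trial2_choice` and the stimulus labels are hypothetical and serve only to illustrate the rule; they do not correspond to any protocol in the cited studies.

```python
def trial2_choice(first_choice, rewarded, options):
    """Return the trial-2 choice under a win-stay, lose-shift rule.

    first_choice: the stimulus chosen in trial 1
    rewarded:     True if that choice yielded food
    options:      the two stimuli presented (hypothetical labels)
    """
    if len(options) != 2 or first_choice not in options:
        raise ValueError("first_choice must be one of exactly two options")
    if rewarded:
        # Win-stay: the rewarded stimulus is the correct one, so repeat it.
        return first_choice
    # Lose-shift: the unrewarded stimulus is wrong, so pick the other one.
    first, second = options
    return second if first_choice == first else first


stay = trial2_choice("circle", True, ("circle", "square"))
shift = trial2_choice("circle", False, ("circle", "square"))
```

An animal applying this rule can be at chance in trial 1 of every new stimulus pair and still be correct in trial 2, which is why trial-2 performance is the standard measure.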

Testing such generalization abilities in a purely social context provides novel experimental opportunities. For example, generalization learning, a behavior documented repeatedly in primates (i.e., Byrne and Whiten 1988), is yet to be explicitly tested in a social tool use scenario. Social tools differ from physical tools in that an individual or social agent is utilized or manipulated to achieve personal goals (Bard 1990), i.e., an ‘agent’ uses a ‘social tool’ to affect a ‘target’ for the agent’s benefit, as defined by Whiten and Byrne (1988a). Efficient social tool use requires that the tool is socially dominant over the target. This is indeed the case in early descriptions involving baboons (i.e., protected threat: Kummer et al. 1990), and Slocombe and Zuberbühler (2007) found that chimpanzees that have been attacked exaggerate their distress calls systematically in the presence of third parties that are dominant to the aggressor. However, such data do not allow for distinguishing between generalized rule application and the possibility that subjects learned each combination from scratch through operant conditioning.

Here, we test experimentally whether bluestreak cleaner wrasse (Labroides dimidiatus), a species which utilizes social tools under natural conditions, as discussed below, uses generalized rule application to identify potential tools. Cleaner wrasse (hereafter simply ‘cleaners’) are a reef-associated fish species that maintain territories termed ‘cleaning stations,’ where they remove ectoparasites from visiting reef fish clients (Côté 2000). Although a mutualistic relationship (Ros et al. 2011; Waldie et al. 2011), conflict arises as cleaners prefer feeding directly on protective client mucus, which is considered an act of cheating (Grutter and Bshary 2003). In order to promote cooperative cleaning interactions, clients counter such cheating behavior by employing various control mechanisms (Bshary and Grutter 2005; Pinto et al. 2011), including punishment in the form of aggressive chasing following a cheating event (Bshary and Grutter 2005). Cleaners sometimes reduce the amount of punishment by exploiting the presence of predatory clients as a third party (Bshary et al. 2002). Predators are approached and given tactile stimulation (socio-positive behavior: Soares et al. 2011), while the punisher is deterred (Bshary et al. 2002). Hence, cleaners (agent) use predator species (social tool) to minimize the degree of punishment they receive from the cheated client fish (target). Preliminary field observations suggest that social tool use in cleaners is a relatively frequently occurring phenomenon (approximately once per eight hours, Ras Mohammed National Park, Egypt; Bshary et al. 2002). The observed tool use interactions invariably involved locally common grouper species (Cephalopholis miniata and C. hemistiktos) serving as the social tool (R.B. unpublished data; Bshary et al. 2002). 
Thus, the question arises how readily cleaners would be able to use other, less common predator species as social tools, i.e., are cleaners capable of generalizing between predator species, and hence, able to recognize potential social tools? Also, would this ability be linked to the safe haven being a predator or could cleaners also readily learn and generalize that a harmless species may provide a safe haven?

To address these questions, we experimentally simulated the described cleaner social tool scenario in the laboratory, in order to test whether cleaners are able to apply generalized rule learning to minimize punishment (chasing), or whether they have to learn the usefulness of each predator species independently. Our first study, conducted in 2011, tested whether cleaners are able to generalize a series of predator species, both local and exotic, when presented with a series of predator-harmless fish model combinations. Additionally, we explored whether cleaners could generalize when offered two harmless client models. These results were preliminary for the latter, in that tasks were presented in a fixed sequence. Therefore, we subsequently tested specific predictions derived from our initial results. First, we tested whether cleaners are able to distinguish ‘safe havens’ more quickly when represented by a predator model, in comparison with a harmless fish model. Second, using a counterbalanced design, we tested whether cleaners can generalize the concept of ‘predator fish represent a safe haven’ to other predator species, which vary in morphology, i.e., grouper versus moray eel, and location, i.e., from locally abundant species to species exotic to our study site, and hence, unknown to the subjects. And finally, we explored whether generalization abilities of cleaners are linked to always being presented with two clients belonging to different categories (i.e., predator–harmless) or whether cleaners can also generalize when facing a same category task (predator–predator or harmless–harmless). If cleaners were able to solve any of these problems, we predicted that the number of trials needed to reach individual learning criterion would be less in generalization tasks than during the learning of the initial task.

Study site

Our study was conducted between July and August 2011 and July and September 2014 at Lizard Island Research Station (LIRS), in the northern Great Barrier Reef, Queensland, Australia (14°40′S, 145°28′E).

Materials and methods

Subjects and housing

Adult female cleaners (n = 12 in 2011; n = 16 in 2014), ranging in size from 6.6 to 8.7 cm, were caught using barrier nets (2 m × 1 m, 5 mm mesh) from continuous fringing reefs surrounding Lizard Island. Post-capture, all fish were placed in sealed plastic bags containing ample oxygen supply for transport to LIRS (duration approximately 30 min). Cleaners were individually housed in glass aquaria (62 cm × 26 cm × 37 cm) with a continuous flow of fresh sea water and were supplied with two polyvinylchloride tubes (2 cm × 15 cm) for shelter. Fish were fed mashed prawn once daily on gray Plexiglas plates (5 × 10 cm) from day one in captivity and were allowed to adjust to their captive environment for one week prior to commencing experiments.

Laboratory experiments

Prior to the commencement of experiments, all cleaners, both in 2011 and in 2014, were taught that cheating behavior would lead to punishment. We simulated this frequently observed natural behavior by offering each cleaner a Plexiglas plate (7 × 12 cm) that contained both a preferred (mashed prawn; hereafter ‘prawn’) and a less preferred food item (fish flake mixed with mashed prawn; hereafter ‘flake’), as equivalents of client mucus and ectoparasites, respectively. The Plexiglas plate mirrored the behavior of a client reef fish: there was no consequence if the cleaner fed on the less preferred flake item (i.e., cooperate), but it was immediately chased with the plate in a straight line for a distance of 20 cm if a prawn item was consumed (i.e., cheat). Individuals were subjected to eight learning trials, all of which included punishment in the form of chasing. As cleaners significantly prefer feeding on prawn items over flake items, a prawn item was consumed in every trial. Cleaners that refused to feed from the plate (e.g., scared to approach the plate) did not participate in experiments, as cheating and punishment could not be simulated in subsequent trials. Detailed description of feeding against preference methods used in the cleaner system is provided in: Wismer et al. (2014).

Experiment I: Generalization of predatory species

The goal of the first experiment was to simulate the described social tool scenario and to determine whether cleaners are capable of generalizing among predatory species in this context. For experimental trials, all aquaria were subsequently separated lengthwise (partially) using a transparent Plexiglas partition (42 cm), creating three compartments (Fig. 1). The smaller compartment contained the client Plexiglas feeding plate, while the two elongated sections contained a model of a predator (peacock cod, Cephalopholis argus) or a harmless (two-lined monocle bream, Scolopsis bilineatus) fish (made from laminated colored pictures). The models’ positions were counterbalanced in sessions of ten trials. Cleaners were now once again offered the client feeding plate. When a prawn item was consumed, which occurred in each trial, the plate was swiftly moved in a 20-cm straight line toward the partition (Fig. 1). If the cleaner subsequently swam into the compartment containing the harmless species, the cleaner was chased for 40 cm toward the harmless model (Fig. 1). In contrast, if the cleaner swam into the compartment toward the predator model, all chasing ceased (Fig. 1). The goal was for cleaners to learn that the predator model (i.e., C. argus) represents a ‘safe’ area, which would reduce the energetic cost of cheating.

Fig. 1

Experimental setup used to test cleaner generalization abilities in both 2011 and 2014. Aquaria divisions, model placements and chasing directions used in Experiments. Cleaners were systematically chased upon consuming a prawn item when in compartments 1 (feeding plate) and 3 (harmless model; right). Chasing ceased at a distance of 20 cm if they entered compartment 2 (predator model; left). Model positions (right, left) were counterbalanced across experimental sessions

All cleaners were subjected to two experimental sessions per day, each consisting of 10 consecutive trials, which commenced at 9:00 and 14:30 h. The total number of sessions conducted per cleaner for each model combination varied according to performance, i.e., the time it took to reach our criterion for learning. The learning criterion was based on cleaners developing a significant preference for the predator model in the ‘Initial Treatment’ model combination, as described above, either by performing correctly in at least 9/10 trials in a single session or in 8/10 trials in two consecutive sessions. Once the criterion was reached, a cleaner was presented with a novel model combination, for a total of 5 consecutive treatments and 1 control (Table 1). However, individuals that did not satisfy the learning criterion within a total of 200 trials for the Initial Treatment were not presented with a subsequent novel model combination (n = 3). All cleaners were subjected to an identical sequence of treatments.
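The learning criterion can be expressed as a small scoring routine. The sketch below is illustrative only: it assumes that the criterion is evaluated per session of ten trials, that ‘two times 8/10 in a row’ means at least 8 correct choices in each of two consecutive sessions, and that the trial cap is applied to whole sessions. The function name `trials_to_criterion` and the example scores are hypothetical.

```python
from typing import Optional, Sequence


def trials_to_criterion(session_scores: Sequence[int],
                        trials_per_session: int = 10,
                        max_trials: int = 200) -> Optional[int]:
    """Return the number of trials needed to reach criterion, or None.

    session_scores holds the number of correct choices per session.
    Criterion (assumed reading): >= 9 correct in one session, or
    >= 8 correct in each of two consecutive sessions.
    """
    previous_met_eight = False
    for session_number, correct in enumerate(session_scores, start=1):
        trials_so_far = session_number * trials_per_session
        if trials_so_far > max_trials:
            break  # cap reached without meeting the criterion
        if not 0 <= correct <= trials_per_session:
            raise ValueError("score outside the possible range")
        if correct >= 9 or (correct >= 8 and previous_met_eight):
            return trials_so_far
        previous_met_eight = correct >= 8
    return None


# Hypothetical cleaner: criterion met in session 6 (two 8/10 in a row).
example = trials_to_criterion([5, 6, 8, 7, 8, 8])
```

For the treatment and control combinations, the same routine would be called with `max_trials=120`, the maximum reported for those combinations.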

Table 1 Predator and harmless fish model combinations utilized in Experiment I: ‘Generalization of predatory species’

‘Treatment 1’ to ‘Treatment 4’ consisted of locally occurring species, whereas ‘Treatment 5’ consisted of two Caribbean endemics, the Nassau grouper (Epinephelus striatus) and the queen angelfish (Holacanthus ciliaris). The purpose of the latter model combination was to take into account that cleaners may (although unlikely) have had direct natural social tool use experience with every predator species we used as models (cleaners may classify the Caribbean species as the local species they resemble). The ‘Control Treatment’ model combination consisted of two harmless species (coral rabbitfish, Siganus corallinus, acting as the harmless species and the blackeye thicklip, Hemigymnus melapterus, acting as the predator), in order to determine whether cleaners generalize between predators or apply new rule learning in every model combination separately.

Experiment II: Generalization in an abstract context

In order to explicitly test which rules cleaners are capable of generalizing and to avoid potential sequence effects, we repeated a modified version of Experiment I in 2014. The setup of aquaria, the learning phase and the performance of experimental trials remained the same. However, in this field season, we independently tested whether cleaners were able to generalize a rule that is not ecologically relevant. If cleaners are capable of generalizing the ‘predators-are-safe-havens’ rule, are they also able to generalize the ‘one-of-two-stimuli-is-a-safe-haven’ rule (i.e., by using two harmless fish models)? We tested this question using a counterbalanced design. In total, 16 cleaners were used, 8 of which were first exposed to an initial combination consisting of one predator and one harmless model (PH) (ID No. 1–8), whereas the other 8 (ID No. 9–16) were first exposed to a model combination consisting of two harmless fish models (HH) (Tables 1–2 in E.S.M.). Cleaners ID No. 1–8 that learned to prefer the predator model in the Initial Treatment in 200 or fewer trials were subsequently exposed to 5 other treatment combinations, consisting of one predator and one harmless fish model (including Caribbean endemics or species restricted to the outer Great Barrier Reef), as well as same-status models (i.e., two predator or two harmless models, PP, HH) (Table 1 in E.S.M.). All cleaners were exposed to unique model combinations comprising at least one moray eel, one grouper and one exotic fish model, and were never shown a given model more than once.

Cleaners ID No. 9–16 were used to test the latter rule, ‘one-of-two-stimuli-is-a-safe-haven’: their first model combination consisted of two harmless fish species representing the two stimuli (i.e., one of the harmless fish models represented a safe haven) (Table 2 in E.S.M.). Had they succeeded in preferring the correct harmless model in this Initial Treatment, they would have been tested on additional HH model combinations. However, given that this was not the case, cleaners ID No. 9–16 were subsequently tested on a sequence similar to that of cleaners ID No. 1–8 (Table 3 in E.S.M.). This allowed us to determine whether fish that could not learn the ecologically irrelevant rule of ‘one-of-two-stimuli-is-a-safe-haven’ could still generalize the simpler rule of ‘predators-are-safe-havens.’ Images of fish species used to construct models are provided in Figure 2 in E.S.M. Note that we used only one picture per species, i.e., the one we could find that was of the highest quality (appropriate profile angle, representative colors).

Data analysis

Variation in cleaner performance among model combinations was investigated using two general linear mixed-effect models (LMM), one for each experiment (Experiments I and II). Data were log transformed, and model assumptions were checked with plots of residuals vs. fitted values and QQ plots of residuals. Furthermore, a Fisher exact probability test was used to analyze variation in performance between the two cleaner groups of Experiment II, while a Sign test was used to determine whether a significant proportion of the total number of cleaners tested was capable of generalizing. As a further exploratory measure, one-sample Wilcoxon signed-rank tests were used to determine after how many trials cleaners, as a group across both experiments, performed above random probability. Data were analyzed in Statistica and R 3.1.2 (R Development Core Team 2014). Individuals that failed to solve the task were not included in the analyses and are shown as outliers on the figures. This included 12 individuals, of which 3 failed to feed from the plate altogether and 9 failed to significantly prefer the predator model in the Initial Treatment. Hence, these fish did not participate in the subsequent trials since generalization abilities could not be adequately tested.

Results

Laboratory experiments

Experiment I: Generalization of predatory species

Cleaners were able to generalize between predator fish models. A comparison of the speed of learning (i.e., number of trials to prefer the predator model) during the Initial Treatment combination and the five treatment combinations (Table 1) yielded overall significant differences (F(5,30) = 12.66, P < 0.001); the performance of cleaners in each of the five treatments was significantly different from the Initial Treatment (all P’s < 0.001), and the treatments did not differ from each other (Figure 2; Figure 1a in E.S.M.). Cleaners completed the Initial Treatment combination in 85 trials (median) (Initial, Fig. 2), while all subsequent treatment combinations were completed much faster, ranging from 20 to 30 trials (median) (T1 to T5, Fig. 2). However, during the maximal 120 trials of the Control Treatment (HH), none of the cleaners were able to develop a significant preference for the harmless fish model that resulted in a refuge from chasing (HH, Fig. 2). These results are based on 6 individuals from a total sample size of 12 cleaners (3 individuals were unable to learn the initial combination in 200 trials and 3 individuals failed to participate in the experiment by remaining in shelter tubes during experimental trials).

Fig. 2

Performance of cleaners in generalization Experiment I. Boxplots of the number of trials required for cleaners to develop a significant preference for the predator model in initial (Initial PH) and subsequent treatment (T1-5) and control model combinations (HH = C). Boxplots show median (bar), mean (open circle), interquartile range (rectangle) and maximum and minimum values (error bars). Small filled circles represent the 3 individuals which did not successfully complete the Initial Treatment in 200 trials. Asterisks: P < 0.001. The proportion of variance explained by our model (R²) was 0.57

Note, while the results above show that cleaners are able to generalize between predator and harmless client combinations, the conclusion remains unclear, in regard to their failure to learn that a harmless client could also represent a safe haven. This may be driven by a sequence effect and/or that cleaners were previously exposed to the harmless client models in a scenario where they did not provide a safe haven. Additionally, sample size for the generalization trials was small, since half of the twelve cleaners dropped out of the experiment. Therefore, we conducted a second series of experiments with the aim to address sequence effects and to increase sample size to address more specific questions.

Experiment II: Generalization in an abstract context

First, a comparison between cleaner groups (i.e., ID No. 1–8 vs. 9–16) on the speed of learning during the Initial Treatment, consisting of either PH or HH combinations, respectively, yielded significant differences (Fisher exact test: P = 0.026) (Fig. 3). Five out of 8 cleaners (ID No. 1–8), which were first exposed to a PH model combination in the Initial Treatment, learned to prefer the correct predator model in fewer than 200 trials, ranging from 70 to 190 trials, with a median of 130 trials (Fig. 3). In contrast, all 8 cleaners that were given the Initial Treatment combination consisting of two harmless fish models (i.e., HH) failed to prefer the non-chasing model within the maximum of 200 trials. However, when subsequently tested on an ‘Initial’ model combination consisting of one predator and one harmless fish model (i.e., PH), 5 out of 8 of these cleaners were also able to prefer the predator model within 200 trials, ranging from 50 to 100 trials, with a median of 80 trials (Fig. 3).
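As a check on the arithmetic, the reported P value can be recovered under one reading of the group comparison. The sketch below assumes that the Fisher exact test was applied two-sided to a 2 × 2 table of cleaners that did or did not reach criterion within 200 trials (5 of 8 in the PH-first group, 0 of 8 in the HH-first group). The table layout and the function name `fisher_exact_two_sided` are assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= observed * (1 + 1e-9))


# Rows: PH-first and HH-first groups; columns: learned, did not learn.
p_value = fisher_exact_two_sided(5, 3, 0, 8)  # about 0.0256
```

Rounded to three decimals, this gives 0.026, which matches the value reported above under the stated assumptions.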

Fig. 3

Performance of cleaners in the initial model combination of Experiment II. The gray boxplot symbolizes cleaners (1–8), which started Experiment II with one predator and one harmless species fish model (PH), while the white boxplot and the bar represent cleaners (ID No. 9–16) that started with two harmless models (HH), and were subsequently exposed to one predator and one harmless species model (PH). Boxplots show median (bar), mean (open circle), interquartile (rectangle), and maximum and minimum values (error bars). Filled circles represent individuals which did not learn the Initial Treatment in 200 trials

In subsequent treatment combinations (T1-3), consisting of one predator and one harmless fish model, cleaners consistently preferred the predator model in fewer trials than in the Initial Treatment (Fig. 4), thus producing results comparable to those obtained in Experiment I. A comparison of the speed of learning during the Initial PH and the three treatment combinations yielded overall significant differences (F(3,32) = 16.35, P < 0.001); the performance of cleaners in each of the three treatments was significantly different from the Initial PH (all P’s < 0.001), and the treatments did not differ from each other (Figure 4; Figure 1b in E.S.M.). Collectively, cleaners solved T1-3 in 40, 40 and 35 (median) trials, respectively (Fig. 4). Both groups of cleaners contained individuals that were unable to generalize and prefer the predator model in T1-3 in fewer than 120 trials, shown as outliers in Fig. 4. In contrast to T1-3, all cleaners (ID No. 1–16) failed to significantly prefer the correct, non-chasing model in Treatments 4 and 5, which consisted of two same-status fish models (i.e., HH, PP), during the maximal 120 trials (Fig. 4).

Fig. 4

Performance of cleaners in initial and treatment model combinations of Experiment II. Boxplot shows median (bar), mean (open circle), interquartile range (rectangle) and maximum and minimum values (error bars), using pooled data between the two cleaner groups, ID No. 1–8 and 9–16. Filled circles represent outliers or individuals which did not generalize the treatment in 120 trials. Treatments 4 and 5 (T4, T5) are grouped here as HH and PP (see Tables 1 and 3 in E.S.M.). Asterisks: P < 0.001. The proportion of variance explained by our model (R²) was 0.58

Collectively, out of the 25 cleaners tested across Experiments I (n = 9 in 2011) and II (n = 16 in 2014), 16 cleaners were able to learn the Initial Treatment (PH) in fewer than 200 trials. Fourteen of these 16 cleaners learned all subsequent combinations faster than the original combination, while only 2 cleaners did not provide evidence for generalization (Sign test: n = 16, x = 2, P = 0.004).
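The Sign test value can be checked with an exact binomial calculation. The sketch below assumes a two-sided test against a success probability of 0.5; the function name `sign_test_two_sided` is hypothetical.

```python
from math import comb


def sign_test_two_sided(n, x):
    """Two-sided exact sign test P value.

    n: number of informative cases
    x: number of cases in the less expected direction
    Under the null hypothesis each case falls either way with p = 0.5.
    """
    k = min(x, n - x)
    # Probability of k or fewer cases in the rarer direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# 16 cleaners learned the initial task; 2 showed no evidence of generalization.
p_value = sign_test_two_sided(16, 2)  # about 0.0042
```

Under these assumptions the result rounds to 0.004, in agreement with the reported value.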

Group performance

In order to test the collective performance of cleaners, we first investigated whether ‘predator species’ (i.e., model morphology; grouper versus moray eel) and ‘abundance’ (local versus exotic) had a significant effect on the generalization abilities of cleaners. Both in 2011 (Friedman two-way analysis of variance: χ² = 4.0, df = 2, n = 6, P = 0.135) and 2014 (Friedman two-way analysis of variance: χ² = 0.7, df = 2, n = 5, P = 0.691), these categories had no significant effect on cleaner performance. We hence combined all data to calculate the percentage of correct choices in the first session (i.e., first ten trials) of each treatment (i.e., T1-5 in Experiment I and T1-3 in Experiment II) (Fig. 5). As a group, cleaners obtained a mean value of 60.1 % of correct model choices for the first ten trials, ranging from 40 to 76 %, which is significantly above chance (one-sample Wilcoxon signed-rank test: n = 14, T = 7.0, P = 0.004). A trial-by-trial analysis revealed that individuals performed significantly above random probability (50 %) in their predator choices by the sixth, ninth and tenth trials (one-sample Wilcoxon signed-rank test: n = 14, T = 21, 18, 5, P = 0.049, 0.030, 0.002) (Fig. 5), where the tenth trial remains significant even after a Bonferroni correction.
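The statement about the Bonferroni correction can be illustrated with the three reported P values. The sketch below assumes that the correction was applied across the ten trials of the first session (ten comparisons), which the text does not state explicitly; the function name `bonferroni_significant` is hypothetical.

```python
def bonferroni_significant(p_values, n_tests, alpha=0.05):
    """Flag which P values stay significant after a Bonferroni correction.

    p_values: mapping from a label (here, trial number) to its P value
    n_tests:  number of comparisons assumed for the correction
    """
    threshold = alpha / n_tests
    return {label: p <= threshold for label, p in p_values.items()}


# Reported P values for trials 6, 9 and 10.
reported = {6: 0.049, 9: 0.030, 10: 0.002}

# Assumption: one comparison per trial, so ten comparisons in total.
result = bonferroni_significant(reported, n_tests=10)
```

With ten comparisons the threshold is 0.005, so only trial 10 remains significant. The same outcome follows if only the three reported comparisons are counted (threshold of about 0.017).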

Fig. 5

Cleaner group performance. The percentage of correct choices by cleaners in the first experimental session (trials 1–10) of treatments 1–5 (Experiment I; 2011) and treatments 1–3 (Experiment II; 2014). Symbols represent median (open circles) and interquartile values (filled circles and triangles). Asterisks indicate when cleaners as a group performed above chance, i.e., 50 %

Discussion

Our aim was to investigate whether cleaners can learn and subsequently generalize that a client model provides a safe haven from punishment and to explore to what degree ecological relevance affects their performance. More than half of the cleaners in our study learned to use predator species as social tools to minimize the amount of punishment they received after a cheating event. The few individuals that failed to learn the task may have either experienced a lack of exposure to social tool use situations under natural conditions or alternatively lacked the cognitive ability to exploit the situation. Such results are not surprising, given that there is often great variation in individual performance in cognitive experiments (reviewed in: Thornton and Lukas 2012). Cleaners that mastered the initial learning task generalized to novel models in subsequent tasks, independently of familiarity with the species and species body color or shape. For example, although the profiles of grouper species are similar to one another (but differ considerably in color), the morphology of moray eels differs significantly from groupers, precluding generalization based purely on predator shape. This differs from a study on spontaneous predator recognition in minnows, where generalization appears to be restricted to similar shaped species (Ferrari et al. 2010). Furthermore, the exotic fish combinations (Caribbean fish and outer barrier fish models) demonstrate that cleaners can generalize between predators, even when exposed to species which they could not have encountered before. Our results demonstrate that cleaners clearly discriminate between predator and harmless fish models as they were able to apply a generalized rule associated with a natural reef environment.

Cleaners, however, failed to choose the model which provided a refuge from punishment when presented with two harmless species during the Initial Treatment. As non-predatory clients do not provide safe havens against chasing client in nature, it appears that this lack of ecological validity impaired the cleaners’ learning ability in our experiment. Their failure to learn the initial discrimination task prevented us from the ideal test whether cleaners can generalize in a new species pair combination. Nevertheless, the data from both experiments show that cleaners fail to generalize ‘a predator is a safe haven’ to a situation in which both models belong to the same category, i.e., two harmless fish species but also two predatory species. This latter result is also important for methodological reasons, i.e., the same image of a predator induced generalized rule learning when paired with an image of a non-predatory client, but not when paired with the image of another predatory client. Hence, we can conclude that there were no hidden cues in the few predator images that cleaners may have used instead of the information we wanted to convey. Overall, the results hence strongly suggest that cleaners indeed have the categories ‘predator’ and ‘harmless’ in their mind and that these units allow the generalization between tasks. The distinction between predators and harmless clients is of fundamental ecological importance for cleaners as the former may potentially try to eat them (Côté 2000). As a counterstrategy, cleaners provide predators with the best service quality, i.e., low cheating rates and high rates of tactile stimulation (Bshary 2011), a behavior that reduces stress in clients (Soares et al. 2011). Having client categories for service quality may then help cleaners to learn to use predators as social tools. As it stands, the combination of punishment by clients and the presence of a predator is generally very low, except for small resident grouper species. 
These conditions seem to largely preclude the option to learn the usefulness of each species as social tool through trial and error, while a generalized rule allows the efficient exploitation of rare events. Given the somewhat surprising result that cleaners cannot generalize to the condition with two predator models, an interesting future study would be to test their ability to discriminate between other important client categories: Various studies provide evidence that cleaners discriminate between resident clients with access to the local cleaner only and visitor clients with access to several cleaning stations (Bshary 2011). Maybe species combinations from these two categories would facilitate the initial learning as well as the generalization even if both species are either predatory or harmless. Such a study would allow distinguishing between the importance of preexisting categories versus the ecological relevance of the task for cleaner performance.

Generalization abilities and the quick application of learning sets are considered an attribute of higher cognitive functioning (Murphy et al. 2008) and were once thought to be a correlate of brain size (Wynne 2001). However, our results support the view that ecological validity is of key importance for a species’ performance, as has been proposed to explain why a desert marsupial performs so well in the standard generalization paradigm (Darlington et al. 1999). Our results are not directly comparable with previous studies that focused on subjects’ performance in trial 2. As it stands, within the 3–5 test combinations our cleaners did not perform above chance in trial 2 (Fig. 5). However, the fact that they performed above chance in the first 10 trials with so few test combinations shows very fast learning that was clearly absent during the initial test. The results fit recent evidence that cleaners are an excellent example of a species capable of solving complex cognitive tasks if placed in the context of their own ecology (Salwiczek et al. 2012; Gingins et al. 2013). For example, cleaners have been shown to outperform three primate species (capuchin monkeys, chimpanzees and orangutans) in a laboratory-based cognitive foraging experiment relevant to client selection under natural conditions but not to primate foraging strategies. In that experiment, individuals were given a choice between two actions, both of which yielded identical immediate rewards, yet only one of which yielded an additional delayed reward (Salwiczek et al. 2012).

The ability to use and manipulate social tools for a personal benefit is also considered a unique cognitive ability and has, to date, been documented primarily through anecdotes in primates (Whiten and Byrne 1988a). In olive baboons (Papio anubis), for example, a female has been observed, after failing to obtain meat from an antelope carcass guarded by a male, to attack another female (social tool), until the male (target) came to the attacked female’s rescue, leaving his carcass behind, which was subsequently stolen by the original female (agent) (Observer: Strum; Byrne and Whiten 1990). Such observations have led to the Machiavellian intelligence or social brain hypotheses, which propose that the emergence of higher cognitive functions and the expansion of neocortical regions in primates are the consequence of intense social complexity acting as a selective factor during evolution (Whiten and Byrne 1988b; Dunbar 1998). A socially complex environment likely plays a key role in cleaner cognition as well. For example, on a given day, cleaners are involved in over 2000 cleaning interactions (Grutter 1996) and have to continuously engage in fine-tuned social strategies (that counter client strategies) to maximize their daily food intake. It is therefore not surprising that they have evolved the ability to use sophisticated decision rules in a social context, including the ability to use social tools for their personal benefit.

In conclusion, we have demonstrated generalized rule learning in a fish species in the context of social tool use. Minnows have been shown previously to generalize between predatory fish odor cues (Ferrari et al. 2007). Together with recent evidence for transitive inference in a cichlid (Grosenick et al. 2007) and the documentation of referential gestures and sophisticated decisions about when to collaborate with whom in a grouper (Vail et al. 2013, 2014), our study adds to the growing evidence that the cognitive abilities of fishes are much more advanced than previously appreciated (further examples of fish intelligence in Brown et al. 2011; Bshary et al. 2014). Such evidence, together with the cognitive performance of some invertebrate species (Chittka and Niven 2009), shows that we are still far from understanding why mammals and birds evolved larger brains than other taxa.