Introduction

Learning can occur through an individual’s own asocial experience with the surrounding environment, on the basis of trial and error, allowing behaviour to adapt to changes in the environment (Shettleworth 2001). A more complex form of adaptation is learning through the experience of others, a process termed social learning (Galef and Laland 2005; Heyes 2012). This vicarious process reduces the need for trial and error and optimizes the individual’s behaviour by maximizing successful actions and/or minimizing costly actions and energy expenditure. Social learning by observation can occur through several mechanisms of increasing cognitive complexity, namely local/stimulus enhancement, social facilitation, emulation, mimicry, and imitation (see Olmstead and Kuhlmeier 2015 for review). These mechanisms entail distinct information-processing dynamics, from sensitization/habituation to a stimulus, to understanding object properties, end state, and/or actions undertaken, each with specific cognitive requirements (Galef and Laland 2005; Olmstead and Kuhlmeier 2015).

Cognitive comparisons between invertebrates (especially cephalopods) and vertebrates with similar/superior relative brain size (fishes, reptiles, and mammals) are still under intense debate (Amodio et al. 2018; Schnell and Clayton 2019). Arguably due to the lack of physical defenses and intense predatory pressure, cephalopods evolved the most sophisticated nervous systems among invertebrates (Albertin et al. 2015; Liscovitch-Brauer et al. 2017), exhibiting cognitive and behavioural capabilities that rival those of vertebrates (Amodio et al. 2018), such as episodic-like memory (Jozet-Alves et al. 2013). Despite showing social tolerance in early stages during laboratory rearing (Boal 1996; Boal et al. 2000), and some species reporting occasional schooling in the field (Yasumuro et al. 2015), cuttlefish are mostly solitary (or semi-solitary) animals throughout their life (Boal et al. 1999; Boal 2006). Even though learning through others has been shown in other non-social animals—e.g. sharks (Vila Pouca et al. 2020)—perhaps due to the predominantly solitary lifestyle of cephalopods (excluding squids), literature on sociality and social learning is scarce in this group. Notably, Octopus vulgaris individuals increase successful choices in a discriminative learning task, after observing a conspecific performing said task (Fiorito and Scotto 1992). However, cuttlefish (Sepia officinalis) were not able to improve predation techniques after watching conspecifics prey on crabs (Boal et al. 2000), and a subsequent study testing social learning of danger avoidance provided unclear results (Huang and Chiao 2013).

After hatching, cuttlefish externally already resemble adults (Hanlon and Messenger 1988; Nixon and Mangold 1998); however, neural organization is not fully matured until 10 days post-hatching, or reaching the juvenile stage at 30 days post-hatching for specific brain areas (Dickel et al. 1997; Liu et al. 2017). Among these are the optic lobes responsible for visual integration (Liu et al. 2017), and the vertical lobe (Dickel et al. 1997) that is highly associated with both short and long term memory potentiation (Agin et al. 2001; Shomrat et al. 2008), and proven to hamper learning when still under development (Dickel et al. 1997). An extensively used test for the assessment of learning and memory in cuttlefish is the “prawn-in-the-tube” procedure, where prey within a tube are presented to an individual, that must then learn to inhibit predatory behaviour (Messenger 1973). In this test, cuttlefish associatively learn to inhibit predation towards prey by detecting the glass tube through multimodal sensorial integration (both visual and tactile exploration were shown to be important), rather than negative ‘pain’ reinforcement (i.e. provoked by tentacles striking against the glass tube) (Agin et al. 2006; Purdy et al. 2006; Cartron et al. 2013).

Here we modified both the nature of the tasks and resulting end states expected in previous studies, using the “prawn-in-the-tube” procedure to gauge if S. officinalis can associatively learn to inhibit predatory behaviour by observing conspecifics. Moreover, we added a series of experimental conditions, in an effort to control for potential confounding variables and disentangle the mechanisms underpinning social learning. In gauging social learning potential before neural maturation (newly hatched cuttlefish), we aimed to further explore the neural plasticity of invertebrates that evolved complex neural systems, on which they heavily rely to navigate the world.

Materials and methods

Collection, husbandry, and maintenance conditions

Cuttlefish (S. officinalis) eggs from different clutches were collected directly from the wild (n ~ 300), in the Sado River (38° 28′ 40.1ʺ N, 8° 47′ 35.2ʺ W), and transported to Laboratório Marítimo da Guia, Cascais, Portugal. Eggs were randomly assigned to six nurseries (sand bottom area, 49 × 24 × 20 cm; volume = 2.2 L), within a larger 400 L tank with seawater flowing through the nurseries. The tank was a closed system with a sump below, equipped with mechanical (net and cotton mesh) and biological (bio-balls and protein skimmers ReefSkimPro 850, TMC) filtering, as well as UV sterilization (300 L/h UV Vecton, TMC). Water was maintained at 18 ºC and pH 8.0, and nitrate, nitrite, and ammonia were kept at minimum levels throughout the 40 experimental days. Once cuttlefish started hatching, live prey (the amphipod Gammarus locusta) was introduced into the nurseries. Within 1–3 days after hatching, most cuttlefish would attack and eat the live prey. Cuttlefish used in the experiment had fed for 2 consecutive days (i.e. subjects were 3–5-day-old hatchlings at the time of testing) and were fasted for 8 h prior to the experiment, to prevent hunger level-related biases. From the initial multi-clutch pool of eggs, we successfully reared ~ 150 S. officinalis hatchlings.

General experimental setup and procedure

Our experimental setup was based on the prawn-in-the-tube procedure (Messenger 1973; Agin et al. 2006; Purdy et al. 2006; Cartron et al. 2013), with one additional arena in parallel, so that one test subject (the observer) could watch the other (the demonstrator) performing the task (see setup photo in Fig. S1). For each session (e.g. in the social learning condition), cuttlefish subjects (usually 1 demonstrator and 1 observer) were taken from their nurseries and placed separately in the two adjacent arenas (20 × 5 × 15 cm), both filled up to ~ 7 cm with water from the original 400 L tank. After 1 h of acclimation, the stimulus, a cylindrical glass tube (2 × 10 cm) containing two prey items (amphipod Gammarus locusta), was introduced in the middle of one of the arenas, starting a 10 min trial (timeframe based on our own preliminary tests of cuttlefish activity). The demonstrator was randomly chosen between the two subjects, to control for arena side and any relative size difference between subjects. Subsequent trials were separated by 10 min intervals, up to a maximum of ten trials. The criterion for successful learning was defined as three consecutive trials in which the cuttlefish displayed no predatory behaviour (i.e. performed no attacks). Conversely, an animal that did not reach the learning criterion within ten trials was considered unable to learn the task. Thirty minutes after the demonstrator had been tested, the tube with prey was introduced in the observer arena, and the same trial protocol was followed. All sessions were video recorded, and at the end of the session, the pair was placed in a nursery separated from the untested subjects.
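The learning criterion described above can be expressed as a short scoring routine. The following is a minimal sketch, not the authors' scoring script: the function name, the input format (one attack count per trial), and the choice to report the trial on which the third consecutive attack-free trial is completed are all assumptions made for illustration.

```python
from typing import Optional, Sequence

MAX_TRIALS = 10     # maximum number of trials per subject (from the procedure)
CRITERION_RUN = 3   # consecutive attack-free trials required to "learn"


def trials_to_criterion(attacks_per_trial: Sequence[int]) -> Optional[int]:
    """Return the 1-based trial on which the criterion is completed.

    Hypothetical helper: returns None when the subject does not complete
    three consecutive attack-free trials within the first ten trials.
    """
    run = 0
    for trial, attacks in enumerate(attacks_per_trial[:MAX_TRIALS], start=1):
        # Extend the attack-free run, or reset it after any attack.
        run = run + 1 if attacks == 0 else 0
        if run == CRITERION_RUN:
            return trial
    return None


# Illustrative (made-up) attack counts, not data from the study.
print(trials_to_criterion([0, 0, 0]))        # criterion met on trial 3
print(trials_to_criterion([2, 1, 0, 0, 0]))  # criterion met on trial 5
print(trials_to_criterion([1] * 10))         # never met: None
```

A subject scored as None here corresponds to an animal considered unable to learn the task, which a time-to-event analysis would treat as censored at trial ten.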

Study design

To disentangle the mechanisms underpinning potential social learning through observation, and simultaneously control for the existence of several confounding factors, we designed a series of different experimental controls.

Treatment 1

Classical social learning test: an observer (T1-O) was paired with a demonstrator (T1-D) that had already learned the task, i.e. did not display predatory behaviour (classical demonstrator). Given that our task involved inhibition learning based on social cues, this procedure ultimately served as a confirmation of end-state acquisition by T1-O.

Treatment 2

Inhibition by social learning test: a naïve demonstrator (T2-D, also termed sham demonstrator) learned the task while another subject observed (T2-O). Given the nature of our task, in this test we recorded learning rates until behaviour and end-state acquisition.

Treatment 3

Stimulus pre-exposure test: subjects (T3-P) were exposed to the stimulus in the adjacent arena, with no demonstrator present, five times prior to their own test, to control for learning by pre-exposure to the stimulus alone. We chose five pre-exposure trials because this was the average number of trials over which T2-O observed T2-D (see Fig. 1a).

Treatment 4

Tube control test: no prey was placed in the tube, to control for unwarranted elicitation of predatory behaviour by the glass tube used for stimulus presentation (e.g. due to reflection), in either demonstrators (T4-D) or observers (T4-O).

Treatment 5

Positive control (prey reward) test: demonstrators (T5-D) and observers (T5-O) were rewarded immediately the first time they attacked the stimulus, to control for fear-related and potential inactivity biases in the other treatments.

Overall, we performed: 9 sessions for T1 (n = 9 for T1-O and T1-D), 13 sessions for T2 (n = 13 for T2-O and T2-D, total n = 26), 14 sessions for T3 (n = 14 subjects, i.e. T3-P), 7 sessions for T4 (n = 7 for T4-O and T4-D, total n = 14), and 16 sessions for T5 (i.e. positive control, n = 16 for T5-O and T5-D, total n = 32), in a total of 59 sessions and 95 cuttlefish individuals.
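The totals above can be cross-checked with a short tally. This is only an arithmetic sketch; it assumes that T1 contributes 9 individuals to the overall count (the figure that makes the per-treatment numbers sum to the stated 95), which the text does not spell out explicitly.

```python
# Sessions and individuals per treatment, as listed in the text.
# Assumption: T1 contributes 9 individuals to the overall total.
design = {
    "T1": {"sessions": 9, "individuals": 9},
    "T2": {"sessions": 13, "individuals": 26},
    "T3": {"sessions": 14, "individuals": 14},
    "T4": {"sessions": 7, "individuals": 14},
    "T5": {"sessions": 16, "individuals": 32},
}

total_sessions = sum(t["sessions"] for t in design.values())
total_individuals = sum(t["individuals"] for t in design.values())

print(total_sessions, total_individuals)  # 59 95
```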

Statistics

Analyzing presence/absence of successful learning and learning rates

First, we wanted to know if the number of naïve subjects reaching the learning criterion (the role played by T2-D) differed from the number of their respective observers that successfully completed the task (T2-O). To that end, we used a Chi-squared test on a 2 × 2 matrix with the number of subjects that learned and did not learn the task, in T2-D and T2-O. We were also interested in a potential correlation in the performance of T2-D/T2-O pairs, and used a Pearson correlation test to that end. Next, we compared learning rates among all treatments (excluding T5). For that, we used time-to-event analyses, i.e. Cox proportional hazards models, to gauge differences in the number of trials required for learning between subject groups, i.e. T1-D, T1-O, T2-D, T2-O, T3-P, T4-D, and T4-O. The proportional hazards assumption was confirmed for all cases.
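To illustrate the time-to-event framing, the sketch below computes a Kaplan-Meier estimate of the probability that a subject has not yet reached the learning criterion after each trial. It is not the Cox proportional hazards model used in the analysis; it is a simpler, related estimator shown only to make the idea of "trials until learning, censored at ten trials" concrete. The function name and the example values are hypothetical, and the code assumes that censoring only happens at the final trial.

```python
from collections import Counter
from typing import Dict, List, Optional

MAX_TRIALS = 10  # subjects that never reach criterion are censored here


def kaplan_meier(trials_to_learn: List[Optional[int]]) -> Dict[int, float]:
    """Probability of not yet having reached criterion after each trial.

    Each entry is the trial on which a subject reached criterion, or None
    if it never did (right-censored at MAX_TRIALS).
    """
    events = Counter(t for t in trials_to_learn if t is not None)
    at_risk = len(trials_to_learn)
    survival = 1.0
    curve: Dict[int, float] = {}
    for trial in range(1, MAX_TRIALS + 1):
        learned_now = events.get(trial, 0)
        if at_risk > 0:
            # Product-limit step: fraction of at-risk subjects still not learned.
            survival *= 1.0 - learned_now / at_risk
        at_risk -= learned_now
        curve[trial] = survival
    return curve


# Made-up example: four subjects learn on trials 3, 3, 4 and 5; one never learns.
curve = kaplan_meier([3, 3, 4, 5, None])
for trial in (2, 3, 4, 5, 10):
    print(trial, round(curve[trial], 3))
```

Comparing such curves between groups is what a Cox model formalizes, by estimating how much faster one group reaches the event than a reference group.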

Number of attacks during trials

Here, we were interested in verifying if the learning rate patterns emerging from different treatments were also consistent with the number of attacks over trials. Since we registered three different response variables (number of arm strikes, number of tentacle strikes, and total number of strikes), we first checked for collinearity among them using Pearson correlation tests. We found that total number of attacks was highly correlated with both arm (Pearson correlation = 0.8615, t = 19.933, df = 138, p < 0.0001) and tentacle attacks (Pearson correlation = 0.8788, t = 21.633, df = 138, p < 0.0001). Being the most inclusive parameter, total number of attacks was used as the representative response variable. To detect significant differences in total number of attacks over trials among T2-D, T2-O, and T3-P, we fitted a negative binomial generalized linear mixed model (GLMM) with subject and trial number as fixed effects. We also included trial and session number (which served as cuttlefish ID) as random slope and intercept, respectively, to account for the dependency of observations from the same cuttlefish across trials. Details on model choice and validation are presented at the end of the section.
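The t statistics reported with the Pearson correlations follow from the standard relation t = r · sqrt(df / (1 − r²)). The short sketch below recomputes them from the rounded coefficients given in the text; the function name is hypothetical, and small differences in the last digit are expected because r is rounded to four decimals.

```python
import math


def pearson_t(r: float, df: int) -> float:
    """t statistic for testing a Pearson correlation r with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


# Coefficients and degrees of freedom as reported in the text.
print(round(pearson_t(0.8615, 138), 2))  # arm vs total attacks
print(round(pearson_t(0.8788, 138), 2))  # tentacle vs total attacks
```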

Latency attack time in T2 and T5

With the inhibition task (T2) and the positive control test (T5), we were interested in checking if observers would be slower than demonstrators in the former (T2-O vs T2-D), but quicker than demonstrators in the latter (T5-O vs T5-D), i.e. showing opposite trends. To measure time differences in T2 and T5 between observers and demonstrators, we fitted two Gaussian GLMMs, one per treatment, with the same aforementioned fixed (for T5, subject only) and random effects, and with time latency as the response variable.

For all GLMMs, model structure was chosen by starting from an initial full model and reducing it using the Akaike information criterion. Models were subsequently validated, depending on model family, by checking for overdispersion, normality, predicted and fitted structure, homogeneity of variances, and absence of influential values (see Script). When pertinent, in both multi-level Cox proportional hazards models and GLMMs, subject/condition factors were releveled to obtain pairwise comparisons between all levels. All statistical analyses were performed in R 3.5.2 (R Core Team 2018).
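For reference, the Akaike information criterion is conventionally defined as below, where k is the number of estimated parameters and \hat{L} is the maximized likelihood of the model; among candidate models fitted to the same data, lower values indicate a better trade-off between fit and complexity. This is the standard textbook definition and is given here as an assumption, since the text does not state whether a small-sample correction was applied.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
```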

Results

Analyzing presence/absence of successful learning and learning rates

In T2, all observers learned the task (T2-O, 13/13), whereas only 69% of demonstrators (T2-D, 9/13) were able to reach learning criterion after 10 trials (χ2 = 4.7273, p = 0.02969, Fig. 1a). Individual T2-O learning rates were directly correlated with the learning rates of their respective T2-D pair (Pearson correlation = 0.8526, t = 6.7272, p < 0.0001). Accordingly, time-to-event analysis also reported faster learning rates of predatory behaviour inhibition in T2-O than in T2-D (Table 1, Fig. 1a). Compared to any individual T2-D, all T2-O required fewer trials to learn (CI 95%, Fig. 1a), with 77% of individuals (10/13) starting to reach learning criterion from the first trial, and the remaining individuals requiring a maximum of 4 trials to do so (Fig. 1a).
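The Chi-squared statistic above can be recomputed from the reported counts (9 of 13 demonstrators and 13 of 13 observers reaching criterion). The sketch below uses only the standard library and applies no continuity correction, which is an assumption; the reported value appears to match the uncorrected test. The function name is hypothetical.

```python
import math
from typing import List, Tuple


def chi_squared_2x2(table: List[List[int]]) -> Tuple[float, float]:
    """Pearson Chi-squared test (1 df, no continuity correction) on a 2 x 2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a Chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Rows: T2-D, T2-O; columns: learned, did not learn (counts from the text).
stat, p_value = chi_squared_2x2([[9, 4], [13, 0]])
print(round(stat, 4), round(p_value, 4))  # 4.7273 0.0297
```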

Fig. 1

Time-to-event analyses on the probability of attacking the stimulus (i.e. not reaching learning criterion) across trials, for (a) the inhibition by social learning test (T2), and (b) all treatments (excluding T5). Shown are T2 demonstrators (T2-D) and observers (T2-O), T3 subjects pre-exposed to the stimulus (T3-P), and both subject conditions (observers and demonstrators) of T1 and T4, which are pooled per treatment since the probability of attacking was the same (i.e. zero). The table of numbers depicts sample progression across trials

Table 1 Cox proportional hazard models and pairwise comparisons between all experimental treatments

Time-to-event pairwise analysis considering all treatments (except T5) (Table 1, Fig. 1b), disclosed significant differences across treatments. First, it is worth noting that observers and demonstrators in T1 (T1-O and T1-D) and T4 (T4-O and T4-D), reported no attacks, and were pooled together by treatment for formal analyses (Table 1). T2-D showed slower learning rates when compared to: T2-O, T1, and T4 (Reference T2-D, Table 1, Fig. 1b). T2-D learning rates did not differ from T3-P, i.e. subjects which had been pre-exposed to the stimulus (Table 1, Fig. 1b). Conversely, T2-O learned significantly faster than both T2-D and T3-P, whereas no significant differences were found when compared to T1 and T4 control treatments (Reference T2-O, Table 1, Fig. 1b). Additionally, T3-P subjects also reported significantly lower learning rates than both T1 and T4 (Reference T3-P, Table 1, Fig. 1b).

Number of attacks during trials

As mentioned above, predatory behaviour (i.e. attacking the stimulus) was not registered in either T1 or T4, for either demonstrators or observers (Fig. 1b). As such, only demonstrators and observers from the inhibition by social learning test (T2-D and T2-O, respectively) and subjects pre-exposed to the stimulus (T3-P) were analyzed statistically for differences in the number of attacks per trial (Negative Binomial GLMM, Table 2, Fig. 2). In agreement with the learning rates, T2-D and T3-P did not differ in the number of attacks performed per trial (Reference T2-D, Table 2). Furthermore, T2-O performed significantly fewer attacks per trial than both T2-D and T3-P (Reference T2-O, Table 2), confirming the pattern registered for learning rates. As expected, a general decreasing trend in the number of attacks over trials was found for all three treatment conditions (Trial, Table 2, Fig. 2).

Table 2 Negative binomial mixed model and pairwise comparisons between all experimental conditions which reported attacking behaviours
Fig. 2

Mean number of attacks per trial (model and data points) by different treatment subjects: T2 demonstrators (T2-D) and observers (T2-O), and T3 pre-exposed to stimulus (T3-P)

Latency attack time in T2 and T5

Lastly, mean latency time to elicit predatory behaviour (i.e. attack stimulus) was not significantly different between trials in T2, but was significantly higher for T2-O than for T2-D (GLMM, T2, Table 3, Fig. 3a). Conversely, in T5 (i.e. prey reward as positive control), the trend registered in T2 was reversed, with observers (T5-O) now exhibiting significantly lower latency time responding to stimulus presentation than demonstrators (T5-D) (GLMM, T5, Table 3, Fig. 3b).

Table 3 General linear mixed models for measuring differences in T2 (T2-O vs T2-D) and T5 (T5-D vs T5-O) of mean latency response time
Fig. 3

Mean latency time (seconds) to attack the stimulus by demonstrators (D) and observers (O), in (a) the inhibition by social learning test (T2-D and T2-O), and (b) the positive control test (T5-D and T5-O)

Discussion

We show that cuttlefish hatchlings (up to 5 days post-hatching) use social information and can learn to inhibit (or modulate) predatory behaviour as a result of observing other individuals attempting to capture prey inside a glass tube. In addition to the significant reduction of predatory behaviour in observers of naïve individuals, four results corroborate our reasoning: (i) the correlation between the number of trials taken to learn by paired demonstrators and observers (which also shows that potential inactivity did not bias results); (ii) the lack of difference between pre-exposed and naïve individuals (i.e. non-experienced demonstrators), indicating that knowing the stimulus beforehand did not improve learning rates; (iii) the absence of predatory behaviour in observers paired with an experienced demonstrator; and (iv) the reversed latency pattern in the positive control (again controlling for inactivity biases).

Previous studies with adult/juvenile cuttlefish reported no clear evidence of social transmission of correct predatory behaviour (Boal et al. 2000), and different individual responses in the improvement of evading behaviour (Huang and Chiao 2013). Since S. officinalis hatch within close vicinity to others in nature, hatchlings may be more gregarious or socially tolerant—further hinted by laboratory rearing (Boal 1996)—than the later life stages used in the previous studies, which may have facilitated learning of the task. Moreover, different methodologies (e.g. in Boal et al. 2000, multiple cuttlefish simultaneously observed one cuttlefish performing the test, which may have led to unwarranted third-party audience effects), as well as distinct end-state and learning mechanisms underlying different tasks, are likely to be another of the causes for discrepant results among studies. By adopting a test measuring higher level parameters (attack/not attack, number of attacks), we simplified previous experimental designs that entailed higher intraspecific variation (i.e. assessing choices of attack/defense strategies), which provided more consistent results.

Social learning theoretical frameworks are continuously being updated, and are not always unanimous on specific applicable nomenclature and underlying mechanisms (see Biederman et al. 1993; Heyes 1994; Galef and Laland 2005; Olmstead and Kuhlmeier 2015). Taking that into account, and moving from more heuristic to theoretically more cognitively complex mechanisms, stimulus enhancement is generally defined as observers being drawn more quickly to a stimulus, and individually learning how to perform the task through trial-and-error (Galef and Laland 2005; Olmstead and Kuhlmeier 2015). Vicariously learning the “prawn-in-the-tube” procedure appears to go beyond stimulus enhancement, since only a small portion of observers (3/13, ~ 23%) required trial-and-error learning. Social facilitation (or social enhancement) predicts that the mere presence of conspecifics, regardless of their performance, will make observers perform better in the task (Klein and Zentall 2003). However, our results show a marked correlation between the number of trials taken for a naïve demonstrator to learn, and the number required by its respective observer. Moreover, observers of trained demonstrators did not produce a single attack on the stimulus, indicating that learning is mediated by the performance/behaviour of the demonstrator, and not only by its presence. Observational Pavlovian conditioning (through stimulus-stimulus learning) predicts that the same unconditioned response should be transferred from demonstrator to observer, which does not seem to be the case here, as seen by observers that learned the task despite being paired with demonstrators that did not learn. Thus, emulation appears to be the most likely social learning process, based more on a “learning what” process than on “learning how” (i.e. affordance learning, see Galef and Laland 2005; Olmstead and Kuhlmeier 2015).
Rather than concentrating on the demonstrator’s actions per se, emulation through affordance learning dictates a focus on recognizing the properties of the object (i.e. there is a glass tube) through the actions of the demonstrator. Accordingly, the presence of the conspecific performing the task is key for learning, which explains why individuals pre-exposed to the stimulus did not perform better than naïve demonstrators, when observers did (individuals that observed the stimulus and a conspecific performing the task). Moreover, affordance learning as the underlying social learning mechanism aligns with how cuttlefish self-learn this procedure (Heyes 2012), i.e. by recognizing the existence of the glass tube through multimodal sensorial integration (Cartron et al. 2013). However, end-state or goal emulation, i.e. emulation/mimicry of the final demonstrator behaviour after trying to get the prey, are other possible mechanisms that could not be disentangled through the used experimental conditions.

Cuttlefish eggs do not receive any parental care (Nixon and Mangold 1998), but individuals can gather information about existing predators and prey through vision and olfaction from within the egg (Darmaillacq et al. 2008; Guibé et al. 2012), showing that cuttlefish hatchlings are capable of individual learning, although learning rates improve over ontogeny (Dickel et al. 1997). However, despite the still immature state of key brain areas (Dickel et al. 1997; Liu et al. 2017), we found that cuttlefish newborns are additionally able to learn through emulation and perform much more cognitively demanding tasks (i.e. incorporating information from observing conspecifics, and inhibiting natural behaviour), effectively improving the efficiency of their actions towards new stimuli. In the wild, the potential advantages of inhibiting predatory behaviour following the actions of conspecifics range from preventing needless energy expenditure to avoiding predators, given that camouflage is disengaged when attacking (Hanlon and Messenger 1988). Moreover, this occurs at a critical life stage, where the ability to circumvent trial-and-error when acquiring knowledge from the environment can mean avoiding extremely costly actions, conferring an evolutionary advantage on hatchlings that incorporate information provided by conspecifics.

Recent studies have highlighted the comparatively slow pace of cephalopod genome evolution, linked to higher RNA editing capabilities (Liscovitch-Brauer et al. 2017), together with the presence of serotonergic neurotransmission and its conserved role underpinning social behaviour in a solitary octopus species (Edsinger and Dölen 2018). Considering the ~ 600-million-year gap in the evolutionary pathways of vertebrate and invertebrate neural systems (Hochner 2008; Amodio et al. 2018), further investigation of cephalopod social behaviour and learning, the resulting behavioural plasticity, and the respective neural correlates can shed light on universal mechanisms shared between the two distinct branches, and deepen our knowledge of the evolution of complex learning and cognition. As a model case study of convergent evolution (Darmaillacq et al. 2014; Amodio et al. 2018), cephalopods are known for their vertebrate-like cognitive abilities, and our findings further reveal the potential for neuroplasticity and behavioural flexibility of these invertebrate brains (Schnell and Clayton 2019). These highly responsive phenotypic features are instrumental for quickly adapting to changes in the environment, by minimizing costs and maximizing individual fitness in nature.