Introduction

Fifty years ago, the well-known paper by William T. McKinney, Jr. and William E. Bunney, Jr., Animal model of depression. I. Review of evidence: implications for research (1969), was published. Since then, much has been written about animal models of depression, and the number of both original empirical reports and review articles far exceeds the capacity of any depression researcher to keep up with the literature. A Medline search reveals that the first single items matching depression models appeared in the 1950s, attention to the theme started to grow in the mid-1960s and the turn of the millennium witnessed rapid growth in animal research on depression, so that an annual output of close to 800 papers was reached and the cumulative figure is above 10,000 (Fig. 1). For fewer than ten original articles, there is one review-type publication out there trying to entice the reader. Is it owing to such statistics that the editors of this volume have invited a paper with an intriguing title that is bound to raise controversy amongst all who have vested interests in animal research? Or is it related to some concerns recently raised amongst those who have not?

Fig. 1 Annual output of depression research in animals. The number of items in PubMed database from the first item in 1954 to 2017 (as of August 14, 2018), by using search terms “depression animal model” or “depression animal models”

Why question the use of animal models? Why still use them?

For us in neuropsychopharmacology who do animal research, the need for animal models of psychiatric disorders is obvious. Amongst the psychiatric disorders and indeed amongst all illnesses, depression has been one of the most significant public health issues (Wittchen et al. 2011). Furthermore, depression is projected to become the leading global cause of disease burden by 2030 (WHO Report 2011). We believe that animal models offer the possibility of neurobiological analysis at a higher resolution, allowing experiments on selected components in the brain circuits that may underlie psychopathology as well as the possibility of screening novel drugs with clinical potential. We also believe that analysis at this higher level of neurobiological resolution and preclinical drug screening are necessary in order to understand depression and enhance our capabilities to treat the disorder, and that the resultant increased understanding of brain function can potentially guide the enhancement of human mental health as a preventive measure. For the rest of the professionals who are responsible for drug development and health care in general, not to speak of other audiences, the necessity of animal models is less obvious. Within the last 60 years, major progress has taken place in the drug treatment of depression but in the more recent decades, the advances have slowed down and this has produced discontent (Hyman 2014), with the animal models often falling into the role of a scapegoat. The psychopathology of depression is thought to be highly complex (Drysdale et al. 2017; Duman et al. 2016), with much recent focus on cognitive aspects. While the animal model builders have strategies to meet these expectations, including the cognitive shift (Keeler and Robbins 2011; Darcet et al. 2016), those who essentially do not believe in the representation of human mental faculties in other animals can presently point out that the proof of the pudding is still missing.
Firsthand experience of discussions at the regulatory level of medicines informs us that what is perceived as the absence of novel advances can, at board meetings, become mainly attributed to animal models, as if these had misguided a process that would otherwise be easy to streamline. This is unfair and counterproductive as there is a large variety of issues in the drug development process that fall short of the ideal (Williams 2011). However, recommendations based on animal experiments may sometimes have been erroneous and even if the clinical studies might be made to use the preclinical guidance in a more sophisticated way (Belzung 2014), the skeptical clinician or regulator cannot be convinced without a novel success story. There also appears to be some hidden doubt about the models within the research community. The statistics that show the growth of the use of animal models of depression include a proliferation of more or less extensive modifications of established models and entirely novel techniques that may remain in use at a single laboratory. The practice of using depression models has also become highly divergent: while some authors continue to refine the method and argue about the significance of every single detail, others include in big data analyses all findings obtained with a wide variety of techniques labeled as depression models.

To take and reinforce the affirmative tone, we do need animal models of depression because we can benefit a lot from access to the brain at molecular and cellular levels, from the experimental manipulations on little grey (and white) cells and, eventually, from testing the drugs with a promise to treat depression even better. Animal models will become even more indispensable as the search for more personalized treatments in psychiatry unfolds. These treatments need to be hypothesis-driven and there will be a lot of trial and error that must not happen on patients. But, for the eventual success, we will need valid and reliable readouts. Open discussion of controversies can only make this research stronger. To make predictions means making some errors and animal modelling has simply, by nature, been a research field acting in a relatively bold manner (Slattery et al. 2004). Table 1 presents a list of several points in this discussion.

Table 1 Counterarguments to animal models of depression and rebuttal

The easy targets of blame for the insufficiency of the animal models include bad experimental design, subjectively biased interpretation and inappropriate statistical analysis. But are these the most salient culprits? Quite obviously, experiments should be designed and data analysed according to best practice and the peer review process should assure that the conventional standards are maintained. So, what could be said of the conventional standards, bearing in mind the space for further improvement?

Animal models of depression

Models, the diversity and attempts to classify

The number of psychiatry-related animal models in use is hard to count owing to modifications but is sufficient to have inspired the production of classifications. For animal models of depression, a well-known division is between animal assay models (or tests) and homologous models (Weiss and Kilts 1998). The former measure an informative behavioural or physiological response in animals but refrain from claiming to create a human disorder in another species; the latter rely on construct and face validity. Alternatively (or additionally), classification into drug screening procedures and proper models of the disorder has been applied. Animal assays or drug screening tests, some rather historical, include muricide, potentiation of yohimbine lethality or amphetamine-induced hyperactivity, antagonism of apomorphine-induced hyperthermia, preferential reduction of kindled seizure activity initiated from the amygdala compared to the neocortex, facilitation of circadian rhythm readjustment after switching the light-dark period, reduction of isolation-induced hyperactivity, forced swimming and tail suspension tests (Weiss and Kilts 1998; Harro 2004; Cryan and Holmes 2005; Slattery and Cryan 2014; Wang et al. 2017b). Another classification separated acute stress- and chronic stress-based models, models of secondary depression and immutable models (Yin et al. 2016) and yet another listed stress-based models (acute vs. chronic), genetic predisposition strategies, models producing dysfunction of limbic circuitry and “other” (O’Leary and Cryan 2013).
Animal models of depression indeed often involve a type of chronic stress (restraint, variable, “unpredictable mild,” social defeat, early life or mechanistic imitation of stress by corticosterone administration), isolation or separation; other models including olfactory bulbectomy produce changes that may make unexpected environmental stimuli more stressful (Weiss and Kilts 1998; Harro 2004; Cryan and Holmes 2005; Slattery and Cryan 2014; Hammels et al. 2015; Czéh et al. 2016; Slattery and Cryan 2017; Wang et al. 2017b). Further models rely on selective breeding for “depressive” phenotype and a seemingly endless number of models can be brought about by genetic modification of expression of a protein believed to be involved in the pathogenesis of depression, given the modern targeted mutagenesis tools that include the possibility to insert “risk loci” from the human genome (Deussing 2013).

Confusion of tests and models

In depression research, tests and models are sometimes referred to as if the terms were synonymous. As an aid in making the necessary distinction, it has been proposed that a test provides only a readout measure, while the model additionally comprises an inducing manipulation, categorized as an independent variable (Geyer and Markou 1995; Slattery and Cryan 2014). Whether these classification attempts have made the desired separation of animal tests and animal models of depression any clearer may be worth an analysis but this should lie in the subject area of the history of science. For the present, there is enough confusion around. In a recent editorial, Stanford (2017) commented on the poor record of translational relevance in behavioural neuroscience that feeds into accumulating disappointment in the entire research area, emphasizing that if predictive screening tests for psychotropic drugs are incorrectly used as animal “models” of psychiatric disorders, this contributes to the poor translation of neuroscience as conducted in humans vs. animals. Another proliferating trend is the use of the term “disorder-like behaviour” that leaves so much room for interpretation. These tests often measure behaviour that appears to have a certain face validity for the disorder but the biased attention to the face should be acknowledged. Indeed, as Stanford (2017) noted on the mouse tail suspension test, “it is often claimed that a deficit in struggling in the Tail Suspension Test reveals depression-like behaviour, but if two humans were suspended upside-down, against their will, I doubt that a difference in their struggling would be attributed to one being more depressed than the other”.
This sarcasm is similar in spirit to another writing: “In the forced swimming test, the lengthening of duration of immobility was originally labeled ‘behavioral despair.’ This interpretation justifies the model being sometimes considered a homologous model of depression, even though evidence for depressed humans displaying increased immobility when thrown into deep water may be lacking” (Harro 2004). Indeed, repeated forced swimming leads to increasing immobility but this is not accompanied by any increase in other behavioural or homeostatic symptoms that could be associated with depression (Mul et al. 2016). The forced swimming test (and the related tail suspension test for mice), nevertheless, remains quite broadly recognized as the “animal model of depression,” probably because there are no others that were obviously better in terms of all aspects of validation.

Validation and validity

Validity of the animal models is a subject extensively dealt with. Fifty years ago, when the study of the neurobiology of depression was beginning in earnest, McKinney and Bunney (1969) set out the first criteria for an animal model of depression. These criteria suggested that a model should resemble the human disorder in respect to aetiology, symptomatology, biochemistry and response to treatment and have subsequently been reiterated by others (Abramson and Seligman 1977; Willner 1984; Geyer and Markou 1995) with a somewhat different emphasis on each criterion. Recent depression modelling has largely drawn on Willner’s (1984) dictum of the requirement of construct, face and predictive validity. Face validity aims at similarity of the symptoms observed in the model and in the disorder; preferentially, the symptoms should be specific to the subtype of depression and any symptoms not fitting with clinical symptoms should not be present. Predictive validity should allow the model to discriminate between true and false and between efficient and inefficient treatments. Construct validity is met if both theoretical and empirical unambiguous connections between the model and the disorder can be established. The latter, especially, is a tough criterion. It is important to note that McKinney and Bunney (1969) primarily wrote on primate models and on bonding and separation experiments, and did so before the proliferation of psychiatric research on Rodentia (not to speak of Danio, Drosophila and Caenorhabditis). While the depression researchers almost invariably claim that their empirical work is performed with validated models, more critical opinions have doubted that any animal model has achieved all three of these validation criteria (O’Neil and Moore 2003; Robinson 2018) and consideration of the significance of language and abstract knowledge in the development of human behaviour and its disorders (de Wit et al. 
2018) may, at first glance, render any attempt to create an animal model of depression utterly impossible. Validity criteria for animal models can of course be stretched to such a limit that the conclusion that a perfect animal model for depression is unattainable in principle would become inevitable (Slattery and Cryan 2017). Furthermore, it has been argued that depression itself has an adaptive function in evolutionary terms but that this function of depression is not shared between humans and rodents owing to species-specific differences in social behaviour, hence severely limiting the ethological relevance of current animal models (Hendrie and Pickles 2009). Meanwhile, the number of novel animal models and their modifications is rapidly increasing and this is likely to reflect the dissatisfaction with the progress made by the earlier paradigms. Introduction of modifications can be viewed as a progressive, iterative improvement but if each laboratory requires its own modification, one may wonder whether the construct is still the same and whether efforts should rather be directed toward finding a replacement technique.

While the neurobiology of depression is the ultimate target of animal models, currently the most practical use of animal tests and models comprises the prediction of antidepressant effect. In this regard, animal models were classified by Cryan and co-authors (Cryan et al. 2002) in terms of reliability, specificity and ease of use as high, medium or low. It is appreciated that an animal model should be robust, should not fail to detect antidepressants, should also preclude false positives and should be applicable in screening many chemical compounds. Only the classic and modified versions of the forced swimming test and the related tail suspension test received the rating “high” in all three categories, while the olfactory bulbectomy model was rated as highly reliable and specific but “medium” in ease of use; the learned helplessness model was given “high” for specificity but “medium” in the two remaining categories. Another highly popular model, the chronic mild stress model, was rated “low” for ease of use and reliability, these less favourable estimates apparently compensated for by its “high” specificity.

In practice, validity in the case of animal models is not a categorical matter, with the test or model being either valid or invalid. How many clinically useful antidepressants should be reported as true positives before the model can be accepted as valid? Which classes of drugs, and how many of each, need to be shown not to be false positives? How many behavioural symptoms or endophenotypes of depression should be observed? Is a model with three better than another with two? How has it been demonstrated that a behavioural sign of depressiveness is not an unspecific index? At least, we must admit there are limits to the extent to which validity can be claimed. In the literature, the efforts made to validate a model have differed greatly between the models but this is masked by lumping them all together into one table in review papers. It could, however, be argued that the validity derives from the integrated effort of many research groups as it is unrealistic to expect such an ideal dataset from a single laboratory. Then again, a model validated in one laboratory need not necessarily be immediately relied upon in another, as what the behavioural readout reflects is in part dependent on local conditions (Harro 2018). Animal tests of depression are fairly complex in terms of design parameters that provide immense variability (Yin et al. 2016). Minor modifications have been reported to change the outcome (Bogdanova et al. 2013; Yin et al. 2016) but few immediate incentives exist to promote research in this direction. Behaviours recorded in depression models appear likely to be less sensitive to the multitude of environmental factors than, e.g., those in anxiety tests but they cannot be immune to such impacts that change the meta-stability of the CNS activities and, hence potentially, the meaning of the behavioural readouts (see below).
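The questions about true and false positives can be made concrete with a minimal sketch that treats a test as a screening procedure and tallies its hits and misses. The function name and the compound counts below are hypothetical and chosen purely for illustration; they are not taken from any of the cited studies.

```python
from typing import Dict, List, Tuple


def screening_summary(results: List[Tuple[bool, bool]]) -> Dict[str, float]:
    """Summarize screening outcomes.

    Each entry is (is_clinical_antidepressant, test_positive), where
    test_positive means the compound changed the behavioural readout
    in the direction expected of an antidepressant.
    """
    tp = fn = fp = tn = 0
    for is_antidepressant, test_positive in results:
        if is_antidepressant and test_positive:
            tp += 1  # clinically effective drug detected by the test
        elif is_antidepressant:
            fn += 1  # clinically effective drug missed by the test
        elif test_positive:
            fp += 1  # non-antidepressant wrongly flagged
        else:
            tn += 1  # non-antidepressant correctly rejected
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "tp": tp, "fn": fn, "fp": fp, "tn": tn,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Hypothetical example: 8 antidepressants (7 detected, 1 missed) and
# 5 non-antidepressants (1 false positive, 4 correctly rejected).
hypothetical_results = (
    [(True, True)] * 7 + [(True, False)]
    + [(False, True)] + [(False, False)] * 4
)
summary = screening_summary(hypothetical_results)
print(summary)
```

Such a tally yields graded estimates, not a yes-or-no verdict, and the estimates depend entirely on which compounds happened to be tested, which is the point the questions above are making.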

For a pharmacologist, predictive validity in terms of antidepressant drug effect remains the most desirable goal but we usually focus on the qualitative side. The quantitative comparison of dose and effect has been suggested to be feasible only within single experiments (Kara et al. 2018). There may also be a trade-off between different aspects of validity and it should be acknowledged that the numbers needed to treat in the clinic suggest much less efficacy than animal testing, where drugs can be shown efficacious with small treatment groups. The latter may suggest that animal tests are highly optimized for predictive validity and given the sensitivity of behavioural readouts to environmental conditions, this optimization may lead to discoveries related to depression but not at its core.

Tests and models: a few lessons learned

Forced swimming and tail suspension: despair or else and does it matter

In the 1970s, Roger Porsolt and colleagues introduced the forced swimming test for mice (Porsolt et al. 1977) and rats (Porsolt et al. 1978) and this work became a landmark for depression modelling in many respects. The forced swimming test was presented under the concept of behavioural despair, while there seems to be no demonstration that rats more immobile during the second immersion in water are more “desperate” according to any other independent assessment. As the predictive value of the test for drug development was remarkable, the concept of a state of “despair” or “depression-like” went on living its own life, with an ever-increasing number of reports anthropomorphically interpreting immobility as depression (Molendijk and de Kloet 2015). Meanwhile, a number of alternative interpretations of the immobility measure have been proposed. For example, it is obvious that an animal, having learned that escape from the deep water is impossible, should maximize its chances of survival by conserving energy and hence staying immobile. Use of the early versions of the forced swimming test that had a lower depth of water soon led to the observation that rats are good at using their tails for support. The immobility measure also tends to correlate with body weight, possibly because it is easier for the fatter rat to stay on the surface. Hence, an alternative explanation for the increase in immobility was proposed, namely that it reflects learning (De Pablo et al. 1989; Enginar et al. 2016), but this would lead to an uneasy stream of thought that antidepressants act by impairment of learning. A concept easier to embrace is that the forced swim test measures a stress-coping strategy (Commons et al. 2017). Obviously, the onset of the increase in immobility reflects the switch from predominantly active to predominantly passive behaviour in a stressful situation (Molendijk and de Kloet 2015).
Antidepressants could well be promoting active coping strategies and a passive coping strategy would not be alien to the depressed state, while not being synonymous with it. Yet another interpretation for a decrease in immobility, observable as a more active coping style but essentially not a reasonable survival strategy, is that the response is immature. Stressful stimuli can decrease immobility (Platt and Stone 1982; van Dijken et al. 1992) and this is mimicked by selective neurochemical lesions of serotonergic (Häidkind et al. 2004) and noradrenergic (Harro et al. 1999) systems. These behavioural activations are reminiscent of impulsive behaviour that may significantly interfere with the interpretation of behaviour if developmental or applied factors have changed the meta-stable CNS activity pattern, e.g., in genetically modified animals (Harro 2002) and may be counter-adaptive. Indeed, rats that were less able to stay afloat after 2 h of swimming had been less immobile within the first 15 min of forced swim (Nishimura et al. 1988).

One detail that indirectly supports the multifactorial regulation of behaviour in a test as simple as forced swimming is that, while conceptually the ideal measure to take is the increase in immobility as calculated for each animal, this has very rarely been implemented. This may suggest that the standard deviation of behavioural change is greater than that of retest immobility and that statistically significant differences between treatment groups will, this way, be harder to reach. This, in turn, suggests that antidepressants can reliably reduce immobility because they diminish the inter-individual differences in multiple behavioural domains simultaneously.
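The statistical side of this argument follows from the standard identity for the variance of a difference, Var(retest − first) = Var(first) + Var(retest) − 2·r·SD(first)·SD(retest), where r is the correlation between the two sessions. A minimal sketch is given below; the function name and the immobility values (in seconds) are invented for illustration and do not come from any cited experiment.

```python
import math


def sd_of_change(sd_first: float, sd_retest: float, r: float) -> float:
    """Standard deviation of the per-animal change score (retest - first).

    Uses Var(X2 - X1) = Var(X1) + Var(X2) - 2 * r * SD(X1) * SD(X2),
    where r is the correlation between first-session and retest values.
    """
    variance = sd_first ** 2 + sd_retest ** 2 - 2.0 * r * sd_first * sd_retest
    return math.sqrt(variance)


# Hypothetical standard deviations of immobility time, in seconds.
SD_FIRST = 40.0
SD_RETEST = 50.0

# With a weak correlation between sessions, the change score is noisier
# than the retest score itself; with a strong correlation, it is less noisy.
weak = sd_of_change(SD_FIRST, SD_RETEST, r=0.3)
strong = sd_of_change(SD_FIRST, SD_RETEST, r=0.8)
print(f"SD of change at r=0.3: {weak:.1f} s (retest SD {SD_RETEST:.1f} s)")
print(f"SD of change at r=0.8: {strong:.1f} s (retest SD {SD_RETEST:.1f} s)")
```

Under this identity the change score is more variable than the retest score whenever r is below SD(first) / (2·SD(retest)), so a weak correlation between the two sessions is enough to make the conceptually ideal measure the statistically less convenient one.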

Heterogeneity in depression: phenotypes and endophenotypes, with focus on anhedonia

Theoretically, 681 combinations of symptoms can exist that meet the DSM-5 diagnosis of depression (Akil et al. 2018) and symptoms of depression occur in other disorders. Should this heterogeneity bring about consequences to animal models? Acknowledging the controversies undermining different types of validation to the disorder, it has been suggested that the endophenotype level represents a feasible target for animal models (Cryan et al. 2002; Slattery and Cryan 2014). Similar is the suggestion to classify animal tests and models according to the NIMH research domain criteria (Söderlund and Lindskog 2018). While the “endophenotype” concept (Gottesman and Gould 2003) has become popular, in animal models, it mostly refers to the focus on one clear-cut behavioural output, essentially reflecting the simplification of the principles of McKinney and Bunney (1969) by Geyer and Markou (1995) that supports adaptation to research on rodents.

McKinney and Bunney (1969) were clear in that they would see the core of depression in a despairing emotional state and the depressive mood. The empirical data behind their vision largely derived from conditions of social loss. Similar is the conclusion from the affective neuroscience model of Jaak Panksepp where depression develops from the overactivity of the separation-distress PANIC emotive system that reflects severed social bonds and produces loneliness (Panksepp 1998). This process is suggested to cascade into despair that includes low activity of what is alternatively called the brain reward networks or the SEEKING system (Panksepp 2017). While the optimal behavioural equivalent of SEEKING system activity may yet need to be defined for depression modelling, an apparently related construct, anhedonia, has found a rather unequivocal recognition.

Anhedonia is a core symptom of depression and studies on animals have observed a reduction of reward-related behaviours after application of chronic stress. Experience of adverse life events is a key aetiopathogenetic factor of human depression (Fava and Kendler 2000) and hence, the study of distress in animals has huge translational potential. When Katz and colleagues introduced the chronic variable stress model of depression (Katz et al. 1981; Katz 1982), one of the observed symptoms after a few weeks of exposure to a randomized sequence of a variety of stressful stimuli was a decrease in the consumption of sweet solutions. Willner and coworkers modified the paradigm by replacing the most severe stressors and made preference for a weak sucrose solution over water the primary endpoint of the procedure (Muscat and Willner 1992).
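For readers less familiar with the readout, sucrose preference in a two-bottle choice is conventionally expressed as the share of sucrose solution in total fluid intake. The sketch below assumes that convention; the function name and the intake volumes are hypothetical, and the exact computation used in the studies cited above is not specified here.

```python
def sucrose_preference(sucrose_intake: float, water_intake: float) -> float:
    """Sucrose preference as a percentage of total fluid intake.

    Both intakes must be in the same unit (e.g., ml or g) and are
    assumed to come from the same two-bottle session.
    """
    total = sucrose_intake + water_intake
    if total <= 0:
        raise ValueError("total fluid intake must be positive")
    return 100.0 * sucrose_intake / total


# Hypothetical intakes (ml) for one control and one stressed animal.
control_pref = sucrose_preference(sucrose_intake=12.0, water_intake=4.0)
stressed_pref = sucrose_preference(sucrose_intake=9.0, water_intake=6.0)
print(f"control: {control_pref:.1f}%, stressed: {stressed_pref:.1f}%")
```

Note that preference and absolute sucrose intake are different measures: an animal that drinks less of everything can show unchanged preference, which is one reason why the two are usually reported side by side.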

The popular chronic mild/variable/unpredictable stress model has received variable assessments in terms of reliability/reproducibility (Cryan et al. 2002; Willner 2017). It has often been applied with the assumption that the exact nature and presentation sequence of stressors is not critically important and these models may be labeled different but represent the same paradigm (Willner 2017). This of course is a prerequisite for the model if unpredictability is considered a key element. As to the other aspects in chronic mild stress (CMS), the mildness of stress, it is a relative matter whether a stressor is mild to the animal (Cabib 1997) and the aetiology of clinical depression as well as much of animal stress research (Mason 1975) suggest that the degree of stress as perceived by the observer is much less important to the outcome than its novelty, uncontrollability and unpredictability. As to the perception of the intensity of stress, it appears that the length of application of a stressor has received less attention than its qualitative aspects. The stressors that are used in the CMS paradigm vary to a significant degree in length, from 5 min to 12 h. One investigation that monitored body weight in contingency with stressor quality found that daily body weight gain was significantly lower after periods when longer stressors had been applied (Tõnissaar et al. 2008). This study also found that in terms of daily body weight gain, adaptation developed to some stressors but not to others. While specific stressors are applied in a (semi)randomized order, they are unpredictable but one may wonder whether it is that important to the animal subjected to this sequence of stressors whether the next nuisance arrives in the form of wet bedding or crowded housing, as long as it can expect some sort of distress anyway. Specific stressors are unpredictable but stress as such is not and hence, adaptive responses to chronic variable stress are expected (Matrov et al. 2011). 
Only a few researchers have explicitly addressed that concern by comparing groups of animals that develop vs. do not develop behavioural changes to stress (Wiborg 2013). This is in sharp contrast to practice of another stress paradigm, the social defeat stress (Berton et al. 2006), where attention to the resilient animals has become widespread. What has also been occasionally reported but not systematically pursued is the neurobiological variation that may underlie resilience to chronic mild/variable stress. In one study, sucrose consumption in response to chronic mild stress was very different across vendors of Wistar rats within one laboratory, paralleled with the presence of a large difference in epigenetic regulation of p11 expression (Theilmann et al. 2016). Experimentally, reduction of serotonergic projections by 20–30% with administration of low-dose para-chloroamphetamine can change the response to chronic variable stress so that sucrose intake and preference is increased (Harro et al. 2001), while this behavioural change, reminiscent of symptoms of atypical depression (Wurtman and Wurtman 1995), is antidepressant sensitive (Tõnissaar et al. 2008). Other neurobiological correlates of changes in sucrose intake after chronic variable stress have been reported (Wiborg 2013) and with some more effort, we could be able to predict the stress response of anhedonia from in vivo neurochemistry. One important question, however, is whether or not we are always dealing with true anhedonia. Because not all studies specify the order of presentation of stressors, informative comparison is not feasible but it appears rather customary to measure sucrose preference after 24 h of food and water deprivation. Our own unpublished experience suggests that it helps to reveal the reduction of sucrose intake by stress. 
This deprivation stressor must, however, induce hunger response and again, as argued above for the case of forced swimming, we may be measuring the effect of antidepressants against something related to depression but possibly not at its core.

A commonly occurring theme in medical research is whether one can develop efficient treatment for the disorder by looking at efficacy at the level of a single symptom. Depression models are often satisfied with one simple behavioural readout, which is often affected by the effort factor. Studies on operant conditioning have come to recognize that it is not only the reinforcement schedule that shapes behaviour but also the perceived effort and that the latter is strongly dependent on mesolimbic dopamine (Salamone et al. 2018). A low energy state is strongly correlated with anhedonia but clearly distinct from negative emotionality, which is the major target for cure.

Vulnerability to depression and treatment resistance

Even if people are genetically very similar or if they live under equally adverse conditions, this does not predict equal probability of developing major depression (Anisman et al. 2008). Classic methods of genetics and epidemiology have clearly shown that depression is the product of gene–environment interactions, occurring in biologically predisposed subjects under impact of adverse life events. Predisposition to depression or vulnerability is likely not a permanent condition; as such, it would bring about strong evolutionary pressure against it. Rather, vulnerability reflects the dynamic state into which the harmful factors can push the organism so that further adversity triggers the pathogenetic chain leading to full-scale depression (Harro 2013).

Willner and Mitchell (2002) refreshed the focus in animal modelling by introducing the discussion on the concept of models of predisposition to depression, defined as models that “increase the ease with which an analogue of major depression may be evoked, or a presentation analogous to dysthymia (chronic mild depression).” Such a vulnerability state was described for 11 genetic, genomic, developmental and lesion models, including several well-known “depression models” such as congenital learned helplessness or olfactory bulbectomy.

For a model of depression to be readily accepted, the efficacy of antidepressants should have been demonstrated as part of its validation. An obvious unmet need is the limited efficacy of the available antidepressants and their use in the validation process may reinforce the circulus vitiosus in drug development. But could we have animal models for resistance to the antidepressant treatment of human depression? Willner and Belzung (2015) recently merged, on theoretical grounds, this clinically relevant need with the concept of animal models of depression vulnerability. This is based on the observation that antidepressants appear more effective in models that apply stress whereas many conditions in which “depression-like behaviour” is observed are not reliably reversed by conventional treatment. The criteria for a model of treatment-resistant depression would be evidence of increased stress responsivity and resistance to chronic administration of classic antidepressants. Willner and Belzung (2015) identified 18 potential models for studying treatment-resistant depression but concluded that all require further validation.

Returning to the valuable concept of vulnerability models brings to the table the issue of separation of pre-existing and emerging components in the aetiopathogenetic chain. The earliest contribution to depression vulnerability appears at the genetic/epigenetic level, manifested, e.g., in sex/gender and certain personality traits, but the pathogenetic process needs to build on that by proceeding stepwise, as indicated by the major contribution of early and current life events (Fava and Kendler 2000). As to animal models, vulnerability can be produced by genetic selection or manipulations, neurochemical lesions or environmental impacts. But what defines vulnerability and what separates it from the “depression-like state”?

The vulnerability concept, left on its own, appears almost borderless but becomes meaningful in the context of the diathesis-stress concept of psychopathology. Stress brought about by adverse life experiences, here by definition, would elicit psychopathology only in vulnerable individuals. For animal modelling, the vulnerability state needs to be kept as persistent as possible to allow the stress component to reliably elicit the depression-like state. Vulnerability should be derived from what we know of human depression and stress should change the characteristic coping style of vulnerable animals.

Sex as a vulnerability factor will be discussed below in a separate section; of the personality traits, neuroticism is the major culprit, with some additional independent contribution of extraversion (Kendler et al. 2006). Behavioural traits analogous to human personality can be measured in animals and several models of depression (or perhaps rather depression vulnerability) have been included in the above-cited reviews. Breeding approaches are a powerful tool that enables the researcher to examine the neurobiology of vulnerability and related behaviour in a robust model that does not require pretesting of animals and hence excludes one obvious confound. The side effect of breeding is dependence on a specific genetic/neurobiological constellation responsible for “depressiveness” in the given pedigrees of animals. As the structural genetic and expression studies suggest, there are, however, many roads to depression. Of course, the latter caveat also applies to the genetic modification-based models.

Vulnerability measures need to be assessed for their distribution and can potentially have a non-linear relationship with outcome behaviour. Clear bimodal differentiation of the animals can lead to very different results as compared to dichotomization of a normally distributed variable. In extreme cases, animals representing the opposite ends of a behavioural measure share vulnerability (Matrov et al. 2016). Both linear and non-linear associations of behavioural traits and the expression of their variability can be identified between CNS activity levels and major adaptive strategies (Kanarik and Harro 2018). Vulnerability may be causally related to a reduced capacity for complexity in the response of the brain. In a whole-brain mapping study of neuronal activity by c-fos expression in GFP transgenic mice, helpless and resilient animals in the learned helplessness paradigm were distinct (Kim et al. 2016) and interestingly, in these mice and in a separate positron emission tomography experiment in rats, a higher similarity between individual responses was observed in helpless animals.
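The contrast between a clearly bimodal trait and a dichotomized normally distributed one can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions: the trait scores are simulated in arbitrary units (they are not data from any of the cited studies), and the function names median_split and fraction_near_cut are hypothetical helpers introduced for illustration.

```python
import random
import statistics


def median_split(scores):
    """Dichotomize scores at the median and return (low, high) groups."""
    cut = statistics.median(scores)
    low = [s for s in scores if s <= cut]
    high = [s for s in scores if s > cut]
    return low, high


def fraction_near_cut(scores, width=0.5):
    """Fraction of animals scoring within `width` SDs of the median cut-off.

    A large fraction means that many animals labelled "low" and "high"
    are in fact almost indistinguishable on the measured trait.
    """
    cut = statistics.median(scores)
    sd = statistics.pstdev(scores)
    near = [s for s in scores if abs(s - cut) <= width * sd]
    return len(near) / len(scores)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical trait scores: one normally distributed population ...
unimodal = [rng.gauss(50, 10) for _ in range(2000)]
# ... and one made of two clearly separated subpopulations.
bimodal = ([rng.gauss(30, 4) for _ in range(1000)]
           + [rng.gauss(70, 4) for _ in range(1000)])

for name, scores in (("unimodal", unimodal), ("bimodal", bimodal)):
    low, high = median_split(scores)
    print(name, len(low), len(high), round(fraction_near_cut(scores), 3))
```

In the simulated normal case a sizeable share of animals sits close to the cut-off, so group membership is partly arbitrary, whereas in the bimodal case very few do. Whether a real behavioural measure resembles either case has to be checked on its actual distribution.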

The potential of neurobiological measures in animal modelling

Behaviour or neurobiology

Psychiatric diagnosis strongly relies on uniquely human features but from the viewpoint of affective neuroscience, these should be thought of as just another expression of species-specific aspects of behaviour. The evolutionary approach inevitably suggests the continuity of traits across species as an outcome of shared ancestry (Sapolsky 2016) and it is only necessary to figure out how the shared features are expressed. Owing to the fact that we do not diagnose depression by means of assessment of physiological measures, animal models have relied on behavioural expressions but this need not remain so (Harro 2004). What is less species-specific and context-dependent than behaviour are the neurobiological underpinnings of both depression vulnerability and the depressed state and hence, modelling should rely less on face validity and more on the known neurobiology of depression.

Anisman and Zacharko (1990) introduced neurobiological vulnerability to animal models, suggesting that it lies in factors that “favor the provocation of amine depletions.” These and further mechanisms need to be identified and can serve as behaviour-independent indicators of vulnerability, depression and treatment effect. Depression is clearly related to functional alterations in the monoaminergic systems but despite decades of research, no pathognomonic alteration has been identified. Instead, findings of genetic, morphological and physiological analyses all converge to suggest that the alterations that potentially underlie depression can be many and occur at distinct sites but that the dysfunction they produce can spread from one system to another as stressful events precipitate the pathology in vulnerable individuals (Harro and Oreland 2001; Nutt 2008; Harro and Kiive 2011).

Neurobiological vulnerability models already exist, as a closer look reveals. For example, one of the most valid animal models of depression (or vulnerability model, depending on interpretation) is the bilateral removal of the olfactory bulbs that can elicit a number of behavioural, neuroendocrine and neuroimmune changes reminiscent of depression (Kelly et al. 1997). Bulbectomy is a straightforward procedure with a robust impact and this could make it less vulnerable to the large number of factors potentially affecting the model readouts. What is curious about this lesion model is that its cardinal symptom is hyperactivity in a novel arena, certainly not a symptom of depression but sensitive to antidepressant medications. Indeed, bilateral removal of olfactory bulbs in the rat was originally not meant to be a depression model nor was it used for antidepressant screening. Bulbectomy was introduced to study the neurobiology of learning and was collaterally found to have a more complex impact reminiscent of lesions to the limbic system (Marks et al. 1971). This manipulation was subsequently found to impair a variety of behaviours (see Hendriksen et al. 2015) and was then proposed as a model for antidepressant screening (van Riezen et al. 1977). By now, we know that bulbectomy leads to morphological changes in a variety of brain regions including the cortex and hippocampus (Morales-Medina et al. 2017) and a number of morphological changes in the frontal cortex are shared with those in patients with major depression (Rajkumar and Dawe 2018). Still, one may wonder whether treating hyperactivity as a sign of depression (vulnerability) stretches the excuse of species specificity of behaviours too far. Interestingly, hyperactivity is not immediate but grows week by week, appearing as a general behavioural disinhibition (Jaako-Movits and Zharkovsky 2005).
Thus, hyperactivity is an obvious change from typical coping behaviour, in some way similar to what is observed in forced swimming at a different timescale, and may be the sign of a response to chronic stress in animals rendered vulnerable by surgery. It is conceivable that for animals strongly relying on the sense of smell but deprived of it, their ordinary living quarters appear as a strange environmental change and even stimuli normally considered non-noxious may be interpreted as sources of distress, hence making the post-lesion condition a diathesis-stress model. By similar developmental dynamics, this might also become true for several of the genetic mouse models (Harro 2013; Matrov et al. 2018).

The search for neurobiological markers

What could be the minimal neurobiological unit informative for depression (vulnerability) models? A recent surge in the study of depression models has made use of the power of hypothesis-free large-scale molecular assessments such as gene expression profiling (Molteni et al. 2013) and the hypothesis-based candidate protein research is now being complemented by large-scale proteomic analyses (Carboni et al. 2016) to reveal neurobiological readouts. The recent highlighting of the role of micro-RNAs in gene expression has led to many attempts to identify these as biomarkers in clinical depression and in animal models but consistent findings are yet to emerge (Yuan et al. 2018). The diathesis-stress approach obviously suggests close attention to epigenetic signatures, and comparative studies of histone modifications and methylation (Sun et al. 2013; Deussing and Jakovcevski 2017) in depressed humans and in animal models could aid in separating the vulnerability and pathogenetic aspects of the neurobiology of depression.

The largest attempt so far to use genome-wide expression profiling to identify common neurobiology for depression models at this level was made by the NewMood consortium (Deakin et al. 2011). This endeavour pursued the rationale that each animal model may have its specificity but models similar regarding depression endophenotypes should allow the identification of common gene expression targets. Mouse and rat models were examined for expression of anhedonia, helplessness and anxiety and harmonized methods were applied to examine genome-wide expression in the raphe, hippocampus and frontal cortex. While each model produced a large number of differentially expressed genes, including several that are implicated in the neurobiology of depression, there was little in common between the models (Hoyle et al. 2011). This “negative” finding has broad implications. It could of course be argued that the models or brain regions selected were suboptimal or that advances in neurobiology and bioinformatics could uncover hidden common structure in the gene expression patterns. While the latter criticism may be useful, the inevitable conclusion from these experiments is that involvement of no single gene is necessary for the expression of anhedonia, anxiety or helplessness, which supports the notion that in depression, a clinically similar picture can be painted by distinct pathophysiologies (Harro and Oreland 2001; Czéh et al. 2016). Other studies that have compared depression models have arrived at findings that lead to basically a similar conclusion. In an experiment that compared gene expression in the hippocampus and amygdala of more and less immobile substrains of Wistar-Kyoto rats with changes elicited by chronic restraint stress, the authors found very few overlaps between the models, leading to the suggestion that “endogenous depression” and stress-induced depression are molecularly distinct (Andrus et al. 2012).
Blood transcriptomes of these models also appeared very different (Pajer et al. 2012). A mouse study (Malki et al. 2014) that compared gene expression after the maternal separation and the unpredictable chronic mild stress protocols, contrasted as early and adult age adversity models, respectively, revealed an overlap of less than 10%, considered small by the authors. Furthermore, gene expression profiles of response to different antidepressants are also very different (Malki et al. 2017). Transcriptional profiles may differ between brain regions as shown for the social defeat stress model, and in animals resilient to stress, alternative changes develop (Kanarik et al. 2011; Bagot et al. 2016). It is important to notice that transcriptomic and proteomic analyses are conducted brain region-wise and hence the interpretation of the findings may change as more is learned about the roles of brain networks. A nice example of how experimental precision medicine can work in depression research was recently presented by Rummel and co-authors (Rummel et al. 2016). They compared the effect of deep brain stimulation of three brain regions in two rat models of depression and monitored the efficacy of the intervention by in vivo and ex vivo neurochemical measurements. Amongst the findings, the efficacy of deep brain stimulation in a specific brain region depended on the model.
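To make the notion of overlap between lists of differentially expressed genes concrete, the following minimal sketch computes two common overlap measures. It is an illustration only: the gene identifiers are placeholders, the function names are hypothetical, and the cited studies may have defined and tested overlap differently.

```python
def overlap_fraction(genes_a, genes_b):
    """Shared genes as a fraction of the smaller of the two gene lists."""
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def jaccard_index(genes_a, genes_b):
    """Shared genes as a fraction of all genes found in either list."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Placeholder identifiers standing in for differentially expressed genes
# in two hypothetical models; these are not results from any study.
model_a = [f"gene_{i}" for i in range(0, 100)]
model_b = [f"gene_{i}" for i in range(95, 195)]

print(overlap_fraction(model_a, model_b))
print(jaccard_index(model_a, model_b))
```

The two measures answer slightly different questions: the first asks how much of the shorter list is recovered in the other model, the second penalizes lists that are large and mostly disjoint. Reporting which definition was used matters when overlaps across studies are compared.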

Differences in the structure of DNA and the indicators of gene expression eventually become part of the pathogenesis if they lead to salient alterations at the protein level in critically important brain circuits. As one promising lead, Cox et al. (2016) compared proteomic data from the anterior prefrontal cortex in human depression with findings in three stress models and found similar alterations in a number of protein-protein interaction networks, most notably carbohydrate metabolism and cellular respiration. While research at the protein level in vivo remains indirect, microstructural alterations involved in stress response (and not present in resilient animals) can be established using in vivo imaging and further detail can be specified at the tissue level (Khan et al. 2018). While synaptic remodelling in depression-like states has received attention for quite some time (Castrén et al. 2007; Kas et al. 2011), alterations in glial cells are increasingly recognized in clinical depression and also in animal models (Sild et al. 2017; Wang et al. 2017a), thus broadening the traditional set of candidate mechanisms for depression.

It appears likely that the minimal neurobiological unit informative for depression models should be searched for endophenotype-wise. As an example from addiction research, drugs that can produce psychological dependence also increase extracellular dopamine levels in the nucleus accumbens but obviously, these increased dopamine levels do not correspond to the whole of the syndrome of drug addiction. Reward circuitry has indeed received increasing attention as functioning inadequately in depression and the details of these alterations are being singled out (Knowland and Lim 2018). The brain reward system is well characterized and measurement of its function could serve as a better indicator for symptoms of depression than behaviour. While it is recognized that anhedonia is not unique to depression but also present, e.g., in schizophrenia (Gass and Wotjak 2013), this non-specificity does not matter for building modular testing systems. Optogenetic tools have already been successfully used to modify the mesolimbic dopaminergic circuit to control depression-related behaviours (Tye et al. 2013).

The advances of preclinical neuroimaging conducted with animal models of depression (McIntosh et al. 2017) further increase the hope that neurobiological indicators can provide a strong impetus to depression modelling. For example, building on the neurobiology of human depression and the hyperconnectivity of the default-mode network (Kaiser et al. 2015; Mulders et al. 2015), promising experimental work has been conducted by applying chronic psychosocial stress to mice (Grandjean et al. 2016) and in the rat congenital–learned helplessness model (Vollmayr and Gass 2013), recently relabeled as the negative cognitive state model of depression (Gass et al. 2016; Clemm von Hohenberg et al. 2018). At a different spatio-temporal level, hippocampal circuit-level activity measurements have been found useful to explain changes in immobility as measured in the forced swimming paradigm (Airan et al. 2007). These tools need not entirely replace behavioural assays but can guide the assessment of behaviour.

Diathesis-stress concept: the hidden meaning to the models

Separation of the vulnerability and adversity factors is a vital step for animal models of depression but not the final one. Recognition that the study of the neurobiology of depression must include an understanding of the mechanisms of adaptation to adverse stimuli, and of how these succeed or fail, will bring animal modelling to a new level closer to the goals of precision medicine. Not all behavioural or neurobiological changes occurring in conditions of applied stress should be immediately interpreted as depression-related, even if they may resemble signs of human depression. Diathesis-stress models may aid in separating the neurobiology of adaptive and less adaptive responses to adverse life events. Attempts to build resilience toward lifetime adversities will occur throughout life, as the individual learns about its vulnerabilities (Harro 2010). This resilience reconstruction starts very early as genetic vulnerability is associated with major alterations in gene expression profiles and oxidative metabolism during the very first postnatal weeks (McCoy et al. 2016). The capability of resilience building is maintained in adults. Stressful events elicit persistent changes in monoamine systems that appear to be protective against further stress (Oosterhof et al. 2016). Nevertheless, only a selection of stressful events promotes resilience building, while others are barely tolerated or impair homeostasis, hence leading to a depression-like state (Fig. 2).

Fig. 2

Highly simplified neurobiological formal models for animal modelling of depression. a Most depression research considers the potential impact of a single gene (G) out of a large variety that is moderated by the environment and somehow is channeled through monoamine (NA, noradrenaline; 5-HT, serotonin; DA, dopamine) systems to symptoms (S) and symptom clusters of depression. Subtypes of depression with distinct pathogenesis and clinical picture are likely. b Variations of the previous model in terms of (a) early integration of the impact of vulnerability genes and (b) postulating the spreading adjustment disorder in the monoaminergic systems that integrate glutamatergic (Glu) excitation and GABA-ergic inhibition, hence making the exact location of the initial hit less salient. Exactly how environmental factors (En) contribute to the building of vulnerability vs. resilience is not well known. c The long evolutionary history of serotonin, dopamine and noradrenaline has led to the development of a huge resource of adaptive capabilities in the interconnected monoaminergic system that buffers most primal gene-environment impacts into distinct but successful coping strategies; it is at the level of these neurobiological coping resources, possibly involving a large variety of neuromodulatory substances, where gene-environment interactions can become pathogenetic

Shifts in coping style from active to passive and vice versa may reflect a stress axis-mediated change in relative activities of the dopaminergic (de Kloet and Molendijk 2016), noradrenergic (Atzori et al. 2016) and serotonergic (Carhart-Harris and Nutt 2017) systems. These shifts are likely to depend on pre-existent inter-individual variability and lead to distinct symptomatology upon provocation. At the level of monoaminergic neurotransmission, the pre-existing deviations may remain behaviourally silent until a major impact triggers decompensation. For example, a fast-scan cyclic voltammetry study that examined serotonin release in the raphe-prefrontal axis and dopamine release in the ventral tegmental area projections to the nucleus accumbens before and after pilocarpine-induced seizure activity found that those rats that had lower prefrontal serotonin release in response to dorsal raphe stimulation developed immobility in the forced swimming test, whereas those that had deviant dopaminergic activity developed anhedonia in the taste preference test (Medel-Matus et al. 2017). It should of course be borne in mind that the monoaminergic systems operate to harmonize glutamatergic excitatory neurotransmission together with GABA-ergic inhibition, plus further neuromodulators that probably act in a more brain region- and stimulus-specific manner.

Proteomic approaches to animal models of depression lend support to the presence of alterations in energy metabolism in depression (Carboni et al. 2016). Long-term cerebral activity levels are well reflected in the activity of cytochrome oxidase, which is rate limiting to oxidative phosphorylation (Sakata et al. 2005). Differences in the activity of cytochrome oxidase have been reported for a variety of depression-related brain regions but again, these appear as specific to the model (Harro et al. 2011). An analysis across models by the factors of vulnerability and chronic stress, however, revealed a remarkably coherent picture. Vulnerability is, in general, associated with increased oxidative metabolism, with overall higher inter-correlations between regional activities (Harro et al. 2014). This may underlie the reduced behavioural variability of vulnerable strains. On the other hand, stress can produce distinct responses throughout the brain, either an increase, a decrease or an interactive effect with the pre-existent vulnerability. Multidimensional scaling reveals that the inter-relationships of regional activities are globally different in vulnerable and stress states and in the diathesis-stress condition (Harro et al. 2014). This concerns many depression-related brain areas like the extended amygdala, cingulate, the VTA axis and hippocampal regions. These patterns appear to reflect the extensive but specific synaptic remodelling of excitatory synapses and the resulting alterations in balance with inhibitory synapses (Csabai et al. 2017).
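The idea of "overall higher inter-correlations between regional activities" can be made concrete with a small simulation. This is a hedged sketch, not the analysis used in the cited work: the regional values are simulated, the region labels are arbitrary, and simulate_group and mean_inter_correlation are hypothetical helper names. A shared factor across regions stands in for tighter coupling in one group.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_inter_correlation(activity):
    """Mean pairwise correlation across regions.

    `activity` maps a region label to a list of per-animal values.
    """
    pairs = itertools.combinations(sorted(activity), 2)
    rs = [pearson(activity[r1], activity[r2]) for r1, r2 in pairs]
    return sum(rs) / len(rs)


def simulate_group(rng, n_animals, regions, shared_weight):
    """Simulate regional activity with an animal-wide shared component."""
    data = {region: [] for region in regions}
    for _ in range(n_animals):
        shared = rng.gauss(0, 1)  # component common to all regions
        for region in regions:
            data[region].append(shared_weight * shared + rng.gauss(0, 1))
    return data


rng = random.Random(1)
regions = ["region_A", "region_B", "region_C", "region_D"]  # arbitrary labels
tightly_coupled = simulate_group(rng, 200, regions, shared_weight=2.0)
loosely_coupled = simulate_group(rng, 200, regions, shared_weight=0.0)

print(round(mean_inter_correlation(tightly_coupled), 2))
print(round(mean_inter_correlation(loosely_coupled), 2))
```

A group with a strong shared component yields a high mean inter-regional correlation, while independent regions yield a value near zero. With real data, the matrix of pairwise correlations (or distances derived from it) would also be the natural input to multidimensional scaling.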

Sex: a vulnerability factor or an independent variable

Evidence is increasing that measures of the male and female brain largely overlap in distribution but that significant average differences exist for many variables (Ritchie et al. 2018). These differences can be regionally specific, exist in both grey and white matter and include the functional connectome. Sex differences can be observed emerging in the developing brain (Wierenga et al. 2018) and in boys, the brain structure is more variable. It is well known that affective and many other stress-related disorders are much more prevalent in females. Prevalence alone is not a decisive argument for paying much attention to the sex/gender factor at the pathogenetic level but if the diathesis-stress mechanisms run differently owing to distinct coping strategies, male and female affective disorders may form neurobiologically non-overlapping clusters. Indeed, sex differences in depression are very significant at the molecular level. Transcriptional profiles were largely different in male and female depression across several cortical areas (Labonté et al. 2017). Another analysis of gene expression data in three brain regions, the dorsolateral prefrontal cortex, subgenual anterior cingulate cortex and basolateral amygdala, recently found that more genes were differentially expressed in male and female depressed subjects in opposite directions than in the same direction (Seney et al. 2018). While male depression was generally associated with decreases in synapse-related genes and increases in glial cell-related genes, in females, it was the other way around. This sex difference is not surprising given that monoaminergic gene-environment interactions can be qualitatively different in males and females (Harro and Oreland 2016).

What implications does this have for animal models of depression? Behavioural strategies in paradigms assessing learning and decision-making have important differences between males and females (Shansky 2018). Differences between males and females have been found in antidepressant screening tests such as forced swimming (Kokras et al. 2015). But the differences found have been unsystematic. In depression vulnerability models like the Wistar-Kyoto rat and the Flinders Sensitive rat, the depressive-like phenotype may also be significantly more pronounced either in males (WKY; Burke et al. 2016) or in females (FSR; Sanchez et al. 2018). Similarly, chronic stressors have been found to affect female rodents less than males (Dadomo et al. 2018) and vice versa (Hodes et al. 2015). This may depend on which neural adaptations are elicited under particular experimental conditions (Borrow et al. 2018) as males and females may be differentially sensitive to specific stressors (Dadomo et al. 2018). Repeated presentation of an ethologically relevant stressor elicits coping approaches that can be fundamentally different in males and females (Steinman and Trainor 2017).

Indeed, differences between males and females are observed in neurobiological measurements. The effect of chronic variable stress on cortical transcriptomics in male and female mice has been found non-overlapping (Labonté et al. 2017) as is its effect on cerebral oxidative metabolism (Mällo et al. 2009). This is not surprising given that the neurobiology of CRF, a pivotal factor in the stress response, is also very different in males and females (Bangasser and Wiesielis 2018). This spreads to the monoamine systems. Serotonergic neurotransmission has been found to react differently in a variety of animal models (Dalla et al. 2010) and both acute and chronic stress as applied in depression research can affect the activity of dopamine neurons in the ventral tegmental area to a greater degree in female rats (Rincón-Cortés and Grace 2017). Sex differences in response to environments further interact with inter-individual differences that occur in both sexes (Liu et al. 2018).

In conclusion, the neurobiology of depression in males and females has serious dissimilarities that merit further investigation but it should still be borne in mind that male-female differences cannot be mechanistically translated from animal to human. Sex differences are highly species-specific overall and certainly, human sex- or gender-related disparities have a strong sociocultural component that is specific to our species (Eliot and Richardson 2016). This does not, however, exclude similarities in the respective neurobiologies between species that share salient basic coping strategies.

Future prospects for reconciliation

While it may need a major breakthrough in antidepressant drug development, clearly aided by animal research, to convince the majority of skeptics, stepwise developments could immediately help to enhance support for animal modelling (see debate in Table 1). Technological advances aiding the inter-species translation of depression neurobiology on evolutionarily common ground are already heralding a bright future where behaviour and neurobiology go hand in hand both in making the diagnosis and in the search for novel treatments. Other, more prosaic aspects also need focus. The classic paper of McKinney and Bunney (1969) included not four (as occasionally quoted) but five validity criteria (originally “minimum requirements”) for a model system. The fifth (“the system should be reproducible by other investigators”) was an early warning of the later developments that rely on a nominal statistical significance of a single experiment, the frequent modification of original techniques, or ignore the experimenter effect (Chesler et al. 2002). In recent years, the reproducibility issue has become a major argument and the grasp for handy aids to remedy the situation has most often focused on statistics. This could be a rather unproductive path because reproducibility failures are most probably about the hidden variables in the animal or its environment. Failures to replicate are often failures to generalize across experimental conditions (Redish et al. 2018). The fundamental variables that determine a particular outcome after manipulation with a causal agent need to be identified and this can only happen by replication attempts that include failures if important conditions are different.

Together with modern neurobiological technologies and more effort to reproduce essential findings, new conceptual approaches to signs and symptoms, both behavioural and neurobiological, are required. While admittedly these approaches may vary between schools of thought, there is a universal need to reconceptualize the understanding of the pathogenesis of depression and the structure of the depressed brain, relying on systems physiology. The simple one-core symptom or one-neurotransmitter concept, still latently guiding much of research, can, on occasion, be fruitfully applied to depression but cannot explain the whole universe of this pathological meta-stable state of the CNS. Much of this reconceptualization will need animal experiments for guidance and confirmation.