Introduction

Just as conspecifics represent an essential part of the environment of group-living individuals, humans can be considered a major element in domestic animals’ immediate environment. However, the significance that animals attribute to humans remains unclear.

von Uexküll (1965), in his famous theory of meaning, argued that an organism constructs its own subjective world by choosing among the “objects” surrounding it, on the basis of functional circles (i.e. relations between perceptual and effector cues of organism/object). Objects, inanimate or animate, that allow organisms to construct their own developmental plan can consequently be considered as “significant” for an organism. We argue that data related to inter-specific interactions support von Uexküll’s theoretical framework perfectly: animals regularly encounter, in their environment, animals of other species (including humans) that can potentially become significant. For instance, vervet monkeys (Chlorocebus aethiops) in Amboseli inhabit the same areas as superb starlings (Spreo superbus). Both vervets and superb starlings emit acoustic alarm calls after perceiving predators of both species. Interestingly, when adult vervet monkeys hear the starlings’ predator alarm calls, they produce the typical response adapted to that predator (e.g. looking up after hearing a starling’s eagle alarm call); this suggests that starlings have progressively become significant objects for these monkeys. On the other hand, playbacks of calls of species living in the same area, but with little biological importance to the monkeys (e.g. hippopotamus Hippopotamus amphibius, which neither attack nor compete with monkeys and are not threatened by the same predators) do not elicit monkeys’ responses (Cheney and Seyfarth 1990). Hippopotamuses can consequently be considered as non-significant objects from the monkeys’ point of view.

Following this theory of meaning, humans among other species can progressively become significant for animals with experience (e.g. the owner becomes a “feeding object” for a puppy, von Uexküll 1965). Data from social cognitive studies also suggest that humans can become significant for animals. First, studies showed that chimpanzees (Pan troglodytes) are able to use cues of attention in humans correctly (e.g. presence of eyes, Hostetter et al. 2007, body and head orientation, Kaminski et al. 2004) to ensure that their visual begging gestures would be effective. Furthermore, recent studies showed that some pets, especially dogs (Canis familiaris), are also able to use human gestures (e.g. pointing) or body orientation as cues to find hidden food (dogs: Miklósi et al. 1998, 2005; Udell et al. 2008; cats Felis catus, Miklósi et al. 2005) or when begging (dogs: Gácsi et al. 2004). These results strongly suggest that animals are able to associate humans with particular relevant situations (e.g. obtaining food), i.e. are able to form a “memory” of humans leading them to develop efficient communication strategies.

In the present study, we investigated the significance that domestic horses (Equus caballus) attribute to humans, and hypothesized that this significance relied on experience. Like domestic dogs (Miklósi et al. 2004), horses can be informative models for investigating the perception of humans by domestic animals. First, through domestication, the human environment now represents the ecological life conditions for this species (Maros et al. 2008); domesticated horses allow researchers to evaluate animals’ perception of humans in “natural” interactions. A recent study showed that horses can react, for instance, to pointing by humans to find hidden food, suggesting that they are able to associate humans with relevant situations, developing communication strategies based on humans’ cues (Maros et al. 2008). Second, horses are one of the species that share a direct work-related relationship with humans, allowing researchers to assess horses’ perception of humans in various situations, which may induce different memories of humans, depending on interaction context (context associated with work or not…).

Comparisons of the impact at later stages of early interactions between humans and foals revealed that early experience with humans can induce either positive (revealed by approach, contact) or negative (avoidance) memories (Henry et al. 2005, 2006). Moreover, several studies showed that this memory established with familiar humans is generalized to unfamiliar persons (Hausberger and Muller 2002; Henry et al. 2005; Krueger 2007). Although the temperament of the horse may be involved (Lansade and Bouissou 2008), these results also suggest that horses are able to create a memory of humans based on their previous experience and transfer it to an extended interaction context or to unfamiliar humans. Recent findings suggest that horses are able to associate objects with a memory of human-related situations. Thus, breeding mares presented with a bucket (food-associated object), a cone (unknown object) or a white shirt (used by vets for their daily examination), were more reluctant to approach the shirt than the other objects. Moreover, they presented lateralized visual responses comparable to those observed for fear-inducing objects (De Boyer des Roches et al. 2008; Larose et al. 2006).

The aim of the present study was to determine whether horses have a kind of memory of humans (based on previous interactions), leading to a general significance of humans, revealed by their reactions to humans in subsequent interactions. To test this hypothesis, we performed three types of behavioural tests on adult riding-school horses (which were used to being handled by many different humans and may already have distinct expectations of humans in general). Our tests involved an unknown experimenter and corresponded to three potential memories of human–horse relationships (not work-related, work-related, unfamiliar “working task”). Growing evidence shows that training conditions have an impact on all of a horse’s daily life (e.g. Caanitz et al. 1991) and a study of more than 700 horses showed that the type of work influenced emotionality and tendencies to present abnormal behaviour outside work (Hausberger et al. 2004; Hausberger et al., unpublished data). The working situation may be associated with emotional and physical demands that make this context very particular compared to home-living pets and may therefore create quite different memories of the relationship (McGreevy and McLean 2005). We investigated this aspect using work-related objects such as halter and saddle, which are, for these horses, generally only used prior to riding, as the above-mentioned study (De Boyer des Roches et al. 2008) suggested possible associations between objects and memories of human-related situations. We also performed standardized observations of routine interactions of each horse with its familiar handler (caretaker) to record data concerning its daily relationship with its familiar human surroundings. To get a broad overview of the horses’ reactions to humans, we recorded both investigative and aggressive behaviours during the tests, as they represent respectively, a “positive” and a “negative” memory of the relationship. 
We hypothesized that if horses attributed a general significance to humans, they would react similarly to humans whatever the interaction context. Consequently, frequencies of positive (or conversely aggressive) behaviour towards the experimenter should correlate between tests, and horses’ reactions in a given type of test should predict the animals’ reactions in other types of tests. Furthermore, frequencies of positive (or conversely aggressive) behaviour towards the unfamiliar experimenter should correlate with the horses’ reactions to its familiar caretaker.

Materials and methods

Subjects

Fifty-nine horses, from three riding centres, were tested. Activities and housing conditions in the centres were similar. In all cases, the horses were kept individually in 3 m × 3 m straw-bedded boxes. Each box was cleaned once a day (in the morning). Animals were fed industrial pellets three times a day and hay was provided ad libitum once a day. Each box was equipped with an automatic drinker. Horses worked in riding lessons for 4–12 h a week, with at least 1 free day each week (closing day). Riding lessons involved children and teenagers and were mainly related to indoor (instruction) and outdoor activities, including a few competition activities. Both geldings (44 animals) and mares (15 animals) were tested. Sixty-eight percent of the horses were French Saddlebreds, equally distributed among the centres. Other horses belonged to a variety of breeds or were unregistered animals. Their ages ranged from 5 to 20 years old.

Methods

Experimental tests (unfamiliar experimenter)

Procedure

As mentioned in the introduction, we performed a large battery of experimental tests, all involving an experimenter (unknown to the horses before the first test). Some of these tests have been developed and are commonly used to evaluate human/animal relationships, although mostly independently (Waiblinger et al. 2006; Hausberger et al. 2008). Three types of tests, corresponding to three different potential memories of the relationship were chosen:

  1. Human presence in a non-work-related context. Three tests were chosen:

    • The motionless person test (MP) (e.g. Henry et al. 2005; Seaman et al. 2002; Visser et al. 2001), where the experimenter entered the box and stood with her back against the closed door, facing inwards and looking at the ground. Each test lasted 5 min.

    • The approach contact test (ACT) (based on, e.g. Henry et al. 2005; Pritchard et al. 2005; Søndergaard and Halekon 2003), where the experimenter entered the box and stood motionless at 1.5 m from the animal until the horse started feeding again (hay, straw), then she came closer to the animal and tried to touch its neck. She approached it from the side, walking slowly and regularly at approximately one step per second, hands hanging by her sides, looking towards the horse’s shoulder. The horse was free to withdraw. If the horse threatened the experimenter during her approach, or withdrew from her, she retreated to 1.5 m from it and renewed a trial. The test was stopped when the experimenter could stroke the horse’s neck continuously for 2 s or after three unsuccessful trials. Both sides of the horse were tested in a random order, i.e. the test was performed twice (2-h interval).

    • The sudden approach test (SudAp) (Hausberger and Muller 2002), where the experimenter, walking slowly along the corridor appeared suddenly at the closed door of the box while the horse was feeding (hay, straw), head down. She recorded the horse’s first reaction (see ‘Measures’ below).

  2. Human presence using work-related objects. Two tests were developed here:

    • The saddle test (Saddle) was developed because horses were always fitted with their saddle in their box before the beginning of a riding lesson. The test followed the same procedure as the SudAp test, except that the experimenter carried a saddle on her right arm and opened the box door. Again, the horse’s first reactions when the saddle became visible were recorded.

    • The halter fitting test (Halter) (e.g. Lansade and Bouissou 2008) was developed because horses were always fitted with their halter and tied up in their box before a riding lesson. The experimenter entered the box, holding a halter with her left hand. She approached the animal, walking slowly and regularly towards the horse’s left shoulder, at approximately one step per second. When she was near the horse, she stopped walking, put her right arm over the horse’s neck and fitted the halter.

  3. An unfamiliar “working task”, the Bridge test, where the experimenter led the horse and tried to make it cross an unknown potentially fear-inducing obstacle. This test is generally used to assess emotionality (Wolff et al. 1997; Visser et al. 2001), but its human-related context suggests that it may also reveal the perception of humans by horses as either positively reassuring or not (e.g. sheep: Tallet et al. 2006; cattle: Waiblinger et al. 2004). Here, the test took place in a familiar indoor arena. A mattress, 200 × 100 × 10 cm and covered with a blue and white check-patterned oilcloth (unusual colour and substrate), represented the bridge. A starting line was drawn in the sand 2 m in front of the bridge. A single assistant, unfamiliar to the horses, fitted the horse’s halter in its box and led it to the test arena door. Then the experimenter led the horse and tried to make it cross the bridge. She was not allowed to touch or talk to the animal. Her actions were limited to pulling the rope slightly if necessary. Many animals avoided walking on the bridge and walked by its side. In this case, they were led back to the starting line and a new trial began. A trial was considered successful when the horse crossed the bridge, walking on it with all four feet. The stopwatch was stopped when the fourth foot was back on normal ground. The test was stopped after 10 min if the animal refused to cross the bridge.

All tests were performed during an 8-day period by the same unfamiliar experimenter (female, dark hair), in the following order: MP, ACT, Saddle, Bridge, SudAp and Halter, during quiet (no riding lessons) periods during the day (08.00–22.00). Horses were randomly assigned to time periods.

Measures (Table 1)

All test observations were tape-recorded and transcribed subsequently. Two types of measures were made according to the tests:

Table 1 Human-directed behavioural variables recorded in the experimental tests (unfamiliar experimenter). Adapted from Henry et al. (2005)

  • We used Hausberger and Muller’s (2002) scoring method for the SudAp test and our adapted Saddle test. Five scores were given according to a gradient ranging from very “friendly” to very aggressive behaviour: the horse looks at the experimenter with upright ears and approaches: score A; the horse looks at the experimenter with upright ears and remains where it is: score B; the horse shows no evidence of directed attention towards the experimenter (no change in behaviour, no gaze towards the person): score C; the horse looks at the experimenter with ears laid back and remains where it is: score D; the horse looks at the experimenter with ears laid back and approaches with a threatening posture (neck lowered, head extended or even exposed incisors): score E. Each horse was tested five times, at different times of the day, yielding five scores for each horse for each test. For instance, a given horse in a given test (e.g. sudden approach) could score “A” in the first, the second and the third sudden approach tests, “B” in the fourth test and “A” again in the last test. This would yield five data points: AAABA for this horse. In a previous study (Hausberger and Muller 2002), scores A and E were the most rarely observed reactions and scores A and B were most often related (present in the same horses) as were D and E. Therefore, horses were considered as having positive (A + B), indifferent (C) or negative (D + E) reactions.

  • In other tests (MP, ACT, Halter and Bridge), behavioural frequencies, latencies, numbers of trials and success/failure (e.g. to touch, to fit, to cross) were recorded (Table 1). All human-directed behaviours were recorded ad libitum, and in addition, for the MP test, times spent in different distance categories (physical contact = 0, [≤0.5 m], [0.5–1 m], >1 m) were recorded using a scan sampling method with an interval of 30 s.
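The scoring scheme described above for the SudAp and Saddle tests can be sketched in a few lines of Python. This is only an illustration of the grouping rule (A + B positive, C indifferent, D + E negative); the function names `summarize_scores` and `threatened_at_least_once` are hypothetical and do not come from the original study.

```python
from collections import Counter

# Grouping of the five behavioural scores into three reaction classes,
# as stated in the text: positive (A + B), indifferent (C), negative (D + E).
REACTION_CLASS = {
    "A": "positive",
    "B": "positive",
    "C": "indifferent",
    "D": "negative",
    "E": "negative",
}


def summarize_scores(scores):
    """Count reaction classes over the repetitions of one test.

    `scores` is a string such as "AAABA", one letter per repetition.
    """
    counts = Counter(REACTION_CLASS[s] for s in scores)
    return {cls: counts.get(cls, 0)
            for cls in ("positive", "indifferent", "negative")}


def threatened_at_least_once(scores):
    """Return True if any repetition was scored D or E (hypothetical helper)."""
    return any(REACTION_CLASS[s] == "negative" for s in scores)


# The worked example from the text: AAABA contains only A and B scores.
example = summarize_scores("AAABA")
```

For the horse scored AAABA, all five repetitions fall in the positive class and the "threatened at least once" flag is false.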

Observations (familiar person)

Standardized observations of a routine interaction with a familiar caretaker were performed once for each horse. In all schools, the caretaker was in charge of the daily routine: cleaning the box, bringing hay, all handling other than riding (taking it to the vet, the farrier, clipping hair, etc.). Caretakers never rode the horses (no work association). We chose the cleaning of the box because it was an easily repeatable situation involving the caretakers in the three riding schools. Furthermore, it took place in all riding schools during quiet periods (no riding lessons), between 07.00 and 09.00 (according to that riding school’s schedule). Horses and their caretaker were video-recorded by the experimenter (who could thus remain silent) standing outside the box from the beginning (when the caretaker opened the box door) until the end of the box cleaning (when the caretaker closed the box door). All caretakers were asked permission to be observed beforehand, told that the horses’ behaviour was being video-recorded for further analysis and asked to behave in their usual way.

Duration of box cleaning varied from 11 to 149 s. In order to obtain meaningful and comparable data, only sessions lasting at least 60 s were analysed, which excluded 29 horses. For the remaining horses (N = 30), only the first 60 s were analysed. Data recorded ad libitum were investigative behaviour (i.e. approach and sniffing) and aggressive behaviour (i.e. threats) towards the caretaker (Table 1).
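The selection rule for the caretaker observations (keep sessions of at least 60 s, then analyse only the first 60 s) can be illustrated as follows. This is a minimal sketch: the data layout, the horse identifiers and the function name `select_sessions` are assumptions made for the example, not part of the original protocol.

```python
ANALYSIS_WINDOW_S = 60  # window length stated in the text


def select_sessions(sessions):
    """Keep sessions lasting at least 60 s and truncate them to the first 60 s.

    `sessions` maps a horse identifier to (duration_s, events), where events
    is a list of (time_s, behaviour) pairs. This layout is hypothetical.
    """
    kept = {}
    for horse, (duration_s, events) in sessions.items():
        if duration_s < ANALYSIS_WINDOW_S:
            continue  # session too short: horse excluded from the analysis
        kept[horse] = [(t, b) for t, b in events if t < ANALYSIS_WINDOW_S]
    return kept


# Invented example data, for illustration only.
sessions = {
    "horse_1": (149, [(5, "sniffing"), (42, "threat"), (90, "approach")]),
    "horse_2": (11, [(3, "approach")]),
    "horse_3": (60, [(10, "approach")]),
}
kept = select_sessions(sessions)
```

In this invented example, horse_2 is excluded (11 s session) and the event recorded at 90 s for horse_1 is discarded.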

Statistical analyses

Although the proportions of aggressive behaviours differed between riding schools (Fureix et al., unpublished data), correspondences between tests in the horses’ reactions were similar in all three schools (e.g. number of threats in the MP test/number of threats in the Halter test, Spearman correlations, school 1: ρ = 0.73, P < 0.01; school 2: ρ = 0.60, P < 0.001; school 3: ρ = 0.40, P < 0.10). In school 1, 100% of the horses that threatened the experimenter at least once in the MP test also threatened her in the Halter test, whereas only 30% of the non-threatening horses in the MP test did so in the Halter test. The same trends could be observed in other schools: 100% (against 11%) in school 2 and 40% (against 23%) in school 3. As the trends in the different riding schools were similar, data could be pooled for further analysis. Data analysis was conducted using Statistica© 7.1 software, and the accepted P level was 0.05. Data collected were binary variables (e.g. success/failure), time (e.g. latency to be touched), number of trials (e.g. to cross the bridge), occurrence of behaviours and, in particular for the MP test, percentage of time spent in different distance categories. Investigative behaviour data (approach, sniffing, licking) (Table 1) were pooled for further analysis. No significant differences were related to approach side (failure to touch according to approach side: Fisher test, N = 59, P > 0.05; latency to touch: Wilcoxon test, \( \overline{X}_{\text{Right}} \) = 3.09 ± 1.80, \( \overline{X}_{\text{Left}} \) = 2.95 ± 2.08, Z = 0.95, P > 0.05); therefore, data for both sides of approaches in the ACT test were pooled for further analysis (the minimum score was thus two trials to be touched).
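The pooling of the two approach sides in the ACT test can be made explicit with a short sketch. It assumes that pooling means summing the number of trials needed on each side, which is consistent with the stated minimum of two trials; the function names are hypothetical.

```python
def pooled_act_trials(trials_left, trials_right):
    """Sum the trials needed on each side (assumed pooling rule).

    A horse touched on the first trial on both sides gets the minimum
    pooled score of 2.
    """
    if trials_left < 1 or trials_right < 1:
        raise ValueError("each side requires at least one trial")
    return trials_left + trials_right


def difficult_to_touch(trials_left, trials_right):
    """Return True if the pooled score exceeds two trials."""
    return pooled_act_trials(trials_left, trials_right) > 2
```

Under this assumption, a horse needing one trial per side scores 2, and any horse needing a second trial on either side falls in the "more than two trials" group used in the comparisons between tests.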

As our data were not normally distributed, we used non-parametric statistical tests (Siegel and Castellan 1988). Friedman tests compared the proportions of time spent at the different distance categories (preferential distances) in the MP test. Chi-square tests compared the proportions of animals displaying investigative or aggressive (or neutral for instance in the Saddle test) behaviours between tests. Fisher tests determined whether two independent groups, defined by the classification of one variable (e.g. threatening horses/not threatening horses in the MP test), differed in proportions in relation to classification for a second variable (e.g. failure to cross the bridge/bridge crossed). Mann–Whitney U tests compared latencies, numbers of trials, of threats and percentages of time in one test between two independent groups. Spearman correlations evaluated correlations of latencies, numbers of trials, of threats, of investigative behaviours and percentages of time spent at the different distance categories (MP test) between test categories. We present our main results below.
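As an illustration of the rank-based approach, the sketch below computes a Spearman correlation coefficient with the Python standard library only. It is not the Statistica procedure used in the study: it simply ranks both variables (tied values share their mean rank) and computes the Pearson correlation of the ranks. The data at the end are invented for the example.

```python
import math


def average_ranks(values):
    """Return ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


# Invented data: number of threats in one test, latency (s) in another.
threats = [0, 0, 1, 3, 0, 2]
latency = [15, 18, 30, 60, 14, 25]
rho = spearman_rho(threats, latency)
```

Many horses share the same count (often zero threats), so handling ties with mean ranks matters for this kind of data.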

Results

General reactions to the tests (Table 2)

Tests related to human presence in a non-work-related context

In our first category of tests, 58% of the horses approached and made physical contact at least once with the motionless experimenter, but horses tended to spend most of the time at a moderate distance (\( \overline{X} \) = 35% between 0.5 and 1 m, Friedman test (N = 59, df = 3) = 7.17, P = 0.06) from the motionless person. All the horses could be touched during the ACT test (but 14% of them required more than one trial per side, mean latency to touch: 6.12 ± 3.28 s, range 4–22). Interestingly, the proportions of horses showing investigation/positive behaviour or aggressiveness varied according to test (Fig. 1). Fifty-eight percent of the horses displayed investigative behaviour towards the experimenter at least once (\( \overline{X} \) = 6 ± 6.24, 1–32) in the MP test, 20% in the ACT test and only 12% of the horses showed a positive approach (score A) in the SudAp test. A converse gradient could be observed for threats: 51% of the horses threatened the experimenter in the SudAp test (score D + E), 32% in the ACT test and only 15% in the MP test (\( \overline{X} \) = 4 ± 3, 1–9) (Chi-square test, χ² = 30.25, df = 2, P = 0.001). Comparing these results, the MP test appeared to be the test that induced the most positive reactions.

Table 2 Percentage of horses (followed, according to test, by the mean number \( \overline{X} \) ± standard error, range) displaying investigative and aggressive behaviour at least once (1) towards the experimenter in the six experimental tests and (2) towards their familiar caretaker
Fig. 1 Investigative (a) and aggressive (b) behaviour (in percent of horses) displayed towards the experimenter in the motionless person test (MP), the approach-contact test (ACT) and the sudden approach test (SudAp). A converse gradient between the three tests appeared for investigative and aggressive behaviour

Tests related to human presence using work-related objects

In the work-related tests, most horses reacted to the visible saddle with positive reactions (54% A + B) and with more indifference (34% C) than aggressiveness (12% D + E) (Chi-square test, χ² = 78.19, df = 2, P = 0.001). All horses accepted halter fitting (\( \overline{X} \) to fit = 22.71 ± 20.93 s, 14–178). Several horses presented investigative behaviours at least once in these contexts: 54% in the Saddle test and 29% in the Halter test (\( \overline{X} \) = 3.4 ± 4.73, 1–20). Twenty-four percent of the horses threatened the experimenter at least once in the Saddle test, and 25% in the Halter test (\( \overline{X} \) = 1.29 ± 0.59, 1–3) (Table 2).

The unfamiliar “working task”

In the Bridge test, 75% of the horses succeeded in crossing the bridge within the allocated time (time to cross: \( \overline{X} \) = 251.39 ± 243.17 s, 11–600 s, which was the time limit; number of trials to cross: \( \overline{X} \) = 2.98 ± 2.92, 1–12). Fifty-eight percent of the horses performed investigative behaviour towards the experimenter at least once (\( \overline{X} \) = 3.79 ± 3.49, 1–14), as in the MP test. No aggressive behaviour was recorded.

Comparisons between experimental tests (Table 3)

We hypothesized that if horses were able to attribute a general significance to humans, they should react similarly to humans whatever the context.

Table 3 Significant correlations of horses’ positive and negative (in bold) reactions to humans between all assessed contexts. Read from line to column, e.g. first line: the horses that investigated the experimenter at least once in the MP test investigated their caretaker more

Tests related to human presence in a non-work-related context as predictors of horses’ reactions in other tests

Motionless person test

The horses that threatened the experimenter at least once in the MP test were also those that were difficult to touch in the ACT test (more than two trials before being touched in the ACT test, Fisher test, N = 59, P = 0.01), that threatened her at least once in the SudAp test, the Saddle test and the Halter test (Fisher tests, N = 59, respectively, P = 0.03, P = 0.0002 and P = 0.006), and threatened her more often (numbers of threats) in the Halter test (Mann–Whitney test, n Threat.MT = 9; n Non-Threat.MT = 50; \( \overline{X}_{{{\text{Threat}} . {\text{MT}}}} \) = 4 ± 6.22; \( \overline{X}_{{{\text{Non-Threat}} . {\text{MT}}}} \) = 0.30 ± 0.76; U = 94; P = 0.005). The greater the number of threats in the MP test, the longer it took to fit the halter (Spearman correlation, n = 59, ρ = 0.34; P = 0.01). Moreover, the less time horses spent in contact with the experimenter in the MP test, the longer it took to touch them in the ACT test (Spearman correlation; ρ = −0.41; P = 0.002).
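The Fisher tests reported here compare two groups of horses (e.g. threatening/non-threatening in the MP test) on a second binary outcome. A minimal sketch of a two-sided Fisher exact test for a 2 × 2 table is given below, using only the Python standard library. The text does not state whether one- or two-sided tests were used, so the two-sided form is an assumption, and the cell counts in the example are invented: only the group sizes (9 and 50) come from the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2 x 2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. The P value sums the probabilities of every table that is
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of a table whose top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance so that equally likely tables are included.
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Invented cross-table: rows = threatened / did not threaten in the MP test,
# columns = threatened / did not threaten in a second test.
p_value = fisher_exact_two_sided(7, 2, 8, 42)
```

With such small group sizes (here 9 horses in one group), an exact test is the natural choice because no large-sample approximation is involved.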

However, horses’ positive reactions in the MP test did not predict their reactions to work-related objects (Saddle, Halter, Fisher tests, N = 59, P > 0.05 in both cases). On the contrary the horses that spent more time in contact with the experimenter were also those that investigated her at least once in the ACT test (Mann–Whitney test; n Investigative.ACT = 12; n Non-Investigative.ACT = 47; \( \overline{X}_{{{\text{Investigative}}.{\text{ACT}}}} \) = 34.55 ± 37.25; \( \overline{X}_{\text{Non-InvestigativeACT}} \) = 10.21 ± 18.71; U = 178; P = 0.05). Time spent near ([≤0.5 m]) the experimenter was positively correlated with frequencies of investigation in the Bridge test (Spearman correlation, n = 59, ρ = 0.34; P = 0.01). MP tests were therefore predictive only for non-work-related situations when only positive reactions were being considered.

Approach contact test

The horses that could be touched only after more than two trials in the ACT test were also those that threatened the experimenter at least once in the Saddle test and the Halter test (Fisher tests; N = 59, respectively, P = 0.01 and P = 0.002), that took longer to be fitted in the Halter test and to cross the bridge (Mann-Whitney tests; n >2trialsACT = 8; n =2trials = 51, respectively; \( \overline{X}_{{ > 2{\text{trialsACT}}}} \) = 43.13 ± 54.86, \( \overline{X}_{{ = 2{\text{trials}}}} \) = 19.51 ± 3.14, U = 105.5, P = 0.03 in the Halter test; \( \overline{X}_{{ > 2{\text{trialsACT}}}} \) = 530.88 ± 195.52, \( \overline{X}_{{ = 2{\text{trials}}}} \) = 215.10 ± 225.36; U = 74; P = 0.002 in the Bridge test). Most of these horses failed to cross (Fisher test; N = 59; P = 0.0002). Moreover, the longer it took to touch a horse in the ACT test, the more threats the horse performed in the Halter test (Spearman correlation; ρ = 0.37; P = 0.005).

Considering horses’ positive reactions, the horses that could be touched on both sides on the first trial in the ACT test also investigated the experimenter (score A) or displayed a positive reaction (score B) at least once in the Saddle test (Fisher test, N = 59, P = 0.001). Furthermore, horses that had shown investigative behaviour at least once in the ACT test crossed the bridge more successfully (Fisher test, N = 59, P = 0.03).

Sudden approach test

The horses that threatened the experimenter at least once in the SudAp were also those that threatened her in the Saddle and the Halter tests (Fisher tests, N = 59, respectively, P = 0.03 and P = 0.02), and they threatened her significantly more considering the number of threats in the Halter test (Mann–Whitney test, n Threat.SudAp = 30, n Non-Threat.SudAp = 29; \( \overline{X}_{{{\text{Threat}} . {\text{SudAp}}}} \) = 1.57 ± 3.75, \( \overline{X}_{{{\text{Non-Threat}}.{\text{SudAp}}}} \) = 0.14 ± 0.44; U = 295.5, P = 0.03).

Conversely, the horses that investigated the experimenter (score A) or displayed a positive reaction (score B) at least once in the SudAp test also did so in the Saddle test (Fisher test, N = 59, P = 0.01).

Altogether, these results show that horses’ behaviour, especially their aggressive reactions and handling difficulties, in tests related to human presence in a non-work-related context (MP, ACT and SudAp tests) could predict their reactions in the other non-work-related tests, but also in tests related to human presence with work-related objects. Interestingly, the analysis of positive reactions suggests that horses seemed to separate “non-invasive” (MP test) from “invasive” (all the other tests) situations.

Relations between tests using work-related objects

Aggressiveness level in one test predicted aggressiveness level in the other test: horses that threatened the experimenter at least once in the Saddle test also threatened her at least once in the Halter test (Fisher test, N = 59, P = 0.0004), and they threatened her more in the Halter test (Mann–Whitney test, n ThreatSaddle = 14, n Non-ThreatSaddle = 45; \( \overline{X}_{\text{ThreatSaddle}} \) = 3.07 ± 5.14, \( \overline{X}_{\text{Non-ThreatSaddle}} \) = 0.18 ± 0.49; U = 133.5; P = 0.0008).
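The Mann–Whitney comparisons reported in this section rest on the U statistic, which can be computed by direct pairwise comparison of the two groups. The sketch below shows that computation with the standard library only. It returns the statistic and not a P value, it assumes that the smaller of the two U values is the one reported, and the threat counts are invented for the example.

```python
def mann_whitney_u(group1, group2):
    """Return (U1, U2, U) for two independent samples, with U = min(U1, U2).

    U1 counts the pairs (x, y) with x from group1 and y from group2
    where x > y; tied pairs count for 0.5.
    """
    u1 = 0.0
    for x in group1:
        for y in group2:
            if x > y:
                u1 += 1.0
            elif x == y:
                u1 += 0.5
    u2 = len(group1) * len(group2) - u1
    return u1, u2, min(u1, u2)


# Invented numbers of threats in a second test for horses that did / did not
# threaten the experimenter in a first test.
threat_group = [3, 1, 0, 5]
non_threat_group = [0, 0, 1, 0, 0, 2]
u1, u2, u = mann_whitney_u(threat_group, non_threat_group)
```

U1 and U2 always sum to the product of the two group sizes, which gives a quick sanity check on any reported value (for groups of 14 and 45 horses, U cannot exceed 630).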

On the contrary, positive reactions could not be predicted between tests: the horses that investigated the experimenter once in the Saddle test were not necessarily those that investigated her in the Halter test (Fisher test, N = 59, P > 0.05).

Predictability of reactions to an unfamiliar human-led task from human-horse evaluations in other contexts

Success in the Bridge test could be predicted from the ACT test, as mentioned before, but even better from the reactions of the horses to the work-related tests. Thus, the horses that threatened the experimenter at least once in the Saddle or in the Halter tests were more likely to fail the Bridge test (Fisher tests; N = 59; P = 0.04 in both cases), and the longer it took to fit the halter, the more trials they required to cross the bridge (Spearman correlation, ρ = 0.34; P = 0.01). Conversely, the horses that showed the most positive reactions (score A at least once) in the Saddle test crossed the bridge more successfully (Fisher test, N = 59, P = 0.02). Moreover, the horses that showed overall positive reactions (scores A and B) at least once in the Saddle test also investigated the experimenter at least once while being led across the bridge (Fisher test, N = 59, P = 0.004). Aggressive reactions and handling difficulties in tests related to human presence associated with work (Saddle, Halter tests) seemed good predictors of the horses’ behaviour in the unfamiliar working task (Bridge test) (Table 3).

Reactions to the unfamiliar experimenter/the familiar caretaker (cleaning the box)

Most of the horses (80%) investigated their familiar caretaker at least once (\( \overline{X} \)  = 2.5 ± 1.06, 1–5). Twenty-seven percent of them threatened him/her at least once (\( \overline{X} \)  = 2 ± 1.41, 1–4); this proportion was similar to the proportions of aggressive horses in other tests, especially in the ACT, Saddle and Halter tests (Table 2). Interestingly, proportions of horses presenting aggressive reactions varied less between situations (including the caretaker) than those presenting investigative reactions (Table 2).

Considering the relations between situations, horses that threatened the experimenter at least once in the MP, SudAp and Saddle tests were also those that threatened their caretaker at least once (Fisher tests, N = 30; respectively, P = 0.006, P = 0.04 and P = 0.01), and they threatened him/her more often (number of threats) (Mann–Whitney tests, MP, n ThreatMP = 7, n Non-ThreatMP = 23, \( \overline{X}_{\text{ThreatMP}} \) = 1.86 ± 1.77, \( \overline{X}_{\text{Non-ThreatMP}} \) = 0.13 ± 0.34, U = 30, P = 0.009; SudAp n D+ESudAp = 15, n A+B+CSudAp = 15, \( \overline{X}_{{{\text{D}} + {\text{ESudAp}}}} \) = 1 ± 1.46, \( \overline{X}_{{{\text{A}} + {\text{B}} + {\text{CSudAp}}}} \) = 0.07 ± 0.26, U = 66, P = 0.05; Saddle test n D+ESaddle = 11, n A+B+CSaddle = 19, \( \overline{X}_{{{\text{D}} + {\text{ESaddle}}}} \) = 1.27 ± 1.62, \( \overline{X}_{{{\text{A}} + {\text{B}} + {\text{CSaddle}}}} \) = 0.10 ± 0.31, U = 58, P = 0.03). On the other hand, the more investigation the horses showed towards the experimenter in the MP test, the more they investigated their caretaker (Spearman correlation, ρ = 0.38, P = 0.005), and the horses that investigated the experimenter at least once in the Halter test also investigated their caretaker more (Mann–Whitney test; n ExploHalter = 7, n Non-ExploHalter = 23; \( \overline{X}_{\text{ExploHalter}} \) = 3.29 ± 0.76, \( \overline{X}_{\text{Non-ExploHalter}} \) = 1.47 ± 1.27; U = 19; P = 0.002).

To sum up, especially regarding aggressive reactions and handling difficulties, our results showed that horses reacted in similar ways to humans in different contexts, including an unfamiliar experimenter and their familiar caretaker. Negative reactions appeared to be more predictable than positive reactions.

Discussion

In the present study, we compared horses’ positive and negative reactions to humans in a variety of situations, with or without work-related objects. Correlations between tests revealed a “general perception” of humans as “positive” or “negative”, whereas unusual situations, which the horses rarely encounter, elicited more positive reactions. These results support our hypothesis that horses’ perception of humans may be based on experience, i.e. repeated interactions. More interesting still was the finding that aggressive reactions were more reliable indicators of this relationship than positive reactions, both from one test to another and from a familiar to an unfamiliar human. Our results also show generalization of the horse’s perception of humans. Interestingly, positive reactions to the mere presence of a standing person did not predict similar reactions in work-related tests. Horses therefore seem to distinguish passive humans from humans acting towards them, reinforcing the idea of an experience-dependent significance.

Many animal species are able to form a memory of humans (e.g. chimpanzees: Hostetter et al. 2007; Kaminski et al. 2004; dogs: Miklósi et al. 1998, 2005; Udell et al. 2008; cats: Miklósi et al. 2005), and our results confirm those of previous studies on horses suggesting the use of a memory of humans in subsequent interactions (Henry et al. 2005, 2006). Interestingly, the motionless presence of the experimenter and the unfamiliar working task elicited more positive reactions than did the other situations (including interactions with the caretaker). We hypothesize that, because these situations were unusual, the reactions they elicited were less related to memories of previous interactions. The other tests and the interactions with the familiar caretaker represent more familiar situations, as horses are commonly approached in their box by unfamiliar riding students (to be cleaned, fitted) or by their caretaker. Altogether, these results support the formation of a memory of humans by horses that influences their reactions in subsequent interactions.

Direct comparisons between situations showed that horses’ reactions to humans in a given context could predict their reactions in other contexts involving an unfamiliar experimenter or the familiar caretaker. Our results confirm Lansade and Bouissou’s (2008) earlier findings of cross-situation stability and generalization of reactions to humans in young untrained horses, but they also go further and show that these reactions in adult animals, in a non-working situation, predict their reactions when performing an unfamiliar task with a human. However, some adult working horses appeared to be able to distinguish a passive person from an active person likely to approach and saddle them. Contrary to Lansade and Bouissou’s (2008) young untrained horses, the positive reactions of our subjects to a motionless person could not predict their positive reactions in the other tests, especially in tests using work-related objects. This difference between untrained and trained horses strongly reinforces the hypothesis of an experience-based memory of humans.

It could be argued that the reactions observed in the Saddle and Halter tests were merely reactions to the objects rather than to humans. However, the significant correlations between the reactions to clearly human-related tests (e.g. approach…) and the reactions in these tests indicate that more than just object memory is involved. Of course, a badly fitted saddle or halter may induce bad memories and lead to such reactions, but badly fitted saddles are even more painful during riding, and it has been argued that bad memories of work situations can be induced by badly fitted gear (McGreevy and McLean 2005; Hausberger et al. 2008). Our results suggest that horses’ work-related memory may also influence their general perception of humans, confirming previous studies that underlined the major impact of working conditions on the whole daily life of an individual (Caanitz et al. 1991; Hausberger et al. 2004). Our results strongly suggest that the different memories of humans formed by horses during previous interactions lead them to attribute a general significance to humans (positive or negative memory of the relation), who become “significant objects” for horses, as defined in von Uexküll’s theory of meaning.

More interesting was the finding that aggressive reactions are even better predictors of horses’ reactions between tests than are positive reactions: a negative memory of humans seems to have a greater impact on the attribution of significance to humans by horses than does a positive memory. This study is the first to focus precisely on the valence of memory of humans in horses (i.e. positive or negative emotional value of humans for the animals). De Passillé et al. (1996) found that previous aversive handling seemed to create a negative memory of humans in calves (Bos taurus), whereas positive handling did not. Moreover, Hemsworth et al. (1987) compared the impact of different handling treatments on pigs’ (Sus scrofa domesticus) subsequent reactions to humans: a pleasant treatment (stroking a pig whenever it approached the experimenter), an unpleasant treatment (chasing the pig away whenever it approached) and an inconsistent treatment (combining unpleasant and pleasant treatments at a fixed ratio of 1:5). Their results showed that just one negative handling session (associated with several positive treatments) induced the same negative reaction to humans (revealed by avoidance of a motionless person) as the reaction induced by unpleasant treatments alone. Thus, a negative action seemed able to erase all positive ones, underlining the major importance of negative experience for pigs’ memory of humans. The same could be true for our horses: the negative interactions that some horses may have experienced previously could have led them to develop a negative memory of humans more efficiently than a positive memory based on positive interactions.

In line with the theory of meaning, our data support the hypothesis that other species, especially humans, can be considered as significant objects for a given organism (in our case, horses) according to its human-related experience. The mechanisms involved in these horses’ abilities to attribute a general significance to humans remain to be investigated. Call (2001) argued in favour of a knowledge-based approach in chimpanzees, in which animals not only learn to associate some stimuli with certain responses, but also extract the relationships between stimuli and therefore form general rules that disregard the specific stimuli involved. According to Call (2001), solving social problems involves the formation of knowledge that allows individuals to predict the behaviour of others in novel situations. Considering Hinde’s (1979) definition of a relationship between two protagonists (i.e. the bond emerging from a series of interactions, where partners have, on the basis of past experience, expectations about the other individual’s responses), one may argue that data from inter-specific relationships could be linked to such a knowledge-based mechanism. Moreover, studying dogs’ ability to recognize human attention across different situations, Gácsi et al. (2004) found that dogs are not only able to learn to associate some stimuli with certain responses, but are also able to extract relationships between stimuli and, based on this, form new rules that they can use in novel situations involving humans. Our results suggest that horses possibly possess a similar type of mechanism: handling and training could affect horses’ memories of human actions either positively or negatively (Henry et al. 2006; Sankey et al., unpublished data).