Information-seeking across auditory scenes by an echolocating dolphin

The world can be a “blooming, buzzing confusion” offering myriad sources of detectable information to the sensory systems of an organism trying to negotiate its world (James 1890). Determining how organisms manage the challenge of organizing this input has been a central focus in psychology since its inception—and before. In his Principles of Psychology, James began his chapter on discrimination by citing the seventeenth-century philosopher Locke: “It is not enough to have a confused perception of something in general: unless the mind had a distinct perception of different objects and their qualities, it would be capable of very little knowledge”. Indeed, object perception and recognition are central to creating an animal’s representation of its world, its Umwelt. Under the sea, parsing the blooming, buzzing confusion into objects might be especially difficult due in part to the efficacy of the transmission of sound in water (sound travels more than four times faster in water than in air) and the water’s frequent murkiness, limiting vision. However, bottlenose dolphins (Tursiops truncatus, hereafter referred to as dolphins) evolved to take advantage of sound’s underwater speed through the development of echolocation, a system by which they send short, focused, high intensity, broadband clicks into the water and interpret the returning echoes to detect, discriminate, and identify objects, the surface, and other aspects of the auditory scene (e.g., Au 1993; Harley et al. 2003; Houser et al. 2005; Nachtigall and Moore 1988; Pack and Herman 1995; Xitco and Roitblat 1996). What is still unclear is how dolphins manage their echoic object recognition. Here, we try to gain a better understanding of the dolphin’s world by studying the effects of top–down processing (knowledge, memory, expectations, and selective attention) on echoic information-seeking of unfamiliar-to-familiar targets ultimately presented in auditory scenes that vary in their difficulty.

Most studies of dolphin echolocation have focused on bottom–up processes. A great deal of important psychophysical research has uncovered some of the exquisite discrimination abilities dolphins have when it comes to detecting acoustic differences in frequency (Thompson and Herman 1975), amplitude (Au 1993; Evans 1973), and time (Au et al. 1988; Moore et al. 1984). Nevertheless, these animals must surely also rely heavily on top–down processes to organize and interpret the echoes they receive, which are an inconsistent information source for multiple reasons. First, sound is affected by many physical marine attributes—air bubbles, temperature, salinity, density, surface characteristics, and more—likely requiring the dolphin to use its knowledge of these forms of interference as well as its knowledge of familiar objects to manage object recognition. Second, objects are complex, with the effect that echoes from different aspects of the same object can vary more than those from different objects (DeLong et al. 2006), and similar object features can produce different sounds depending on many factors including physical conditions and angle of reflection (Helweg et al. 1996a, b). Third, the larger auditory scene is formidable, with echoes returning from multiple objects including fish, rocks, conspecifics, and air (e.g., the water’s surface), and sounds issuing from animals of all kinds against the backdrop of the many varieties of the ocean’s roar. With that in mind, one wonders if the ability to hear from below 1 kHz to above 150 kHz (Houser and Finneran 2006; Johnson 1967) is a blessing or a curse!

For object recognition, the dolphin’s broad frequency hearing range does in fact help, as do its other perceptual strengths. Interpretations of dolphins’ object/echo discrimination accuracy and confusions, coupled with analyses of objects’ echoes and perceptual models, suggest that dolphins may use multiple echoic attributes to recognize objects: time separation pitch (a perceived frequency created by the return of correlated sound pulses received closely in time), target strength (the intensity of echo returns from target objects), the pattern of changes of echoes from multiple object orientations, the distribution of energy across frequencies, peak frequency, center frequency, duration, and integration of echoes, among others. They also likely use a combination of attributes rather than single attributes (e.g., Altes et al. 2003; Au and Martin 1989; Branstetter et al. 2020; DeLong et al. 2006; Helweg et al. 1996a, b). However, sounds alone do not a perceptual object make. The dolphin’s perceptual and cognitive systems do that work: how?

Auditory scene analysis

A general framework to help us think about this problem is Bregman’s (1990) auditory scene analysis, an approach focusing on how humans and other animals organize apparent cacophony, i.e., how they group sounds from the same source together into auditory streams and disambiguate/separate those streams from sounds related to other sources (Bregman 1990, 2005, 2015). Bregman uses the Gestalt approach to visual processing as an analogy: in the same way that people use the principle of similarity to group dots of the same color into columns when they appear on a page full of lines of equidistant dots in which every other dot in each row alternates from yellow to blue to yellow to blue, etc., people also group a series of alternating high and low tones into two auditory streams: a high one and a low one. An everyday example of the organization of auditory objects is the well-known phonemic restoration effect, in which listeners fill in speech sounds that are missing because coughs or other noises replaced the expected speech sounds; this phenomenon occurs through a combination of bottom–up (accessing the speech sounds themselves) and top–down (filling in information based on knowledge, expectations, and goals) processes applied when the sounds that listeners hear are ambiguous and they are trying to organize an auditory scene (e.g., Shinn-Cunningham and Wang 2008). Top–down processes not only fill in missing information, but also guide information-seeking behavior. For example, listeners can identify auditory objects based on timing or frequency attributes, a bottom–up process, but providing instructions to them can sway them to attend to one attribute versus another, a top–down process, and these changes in attention affect the auditory object they perceive (Bregman 1990; Miller and Bee 2012; Shinn-Cunningham 2008; Shinn-Cunningham et al. 2007).

The use of top–down processes typically makes navigating the world faster and easier. Although learning requires time and resources, having knowledge reduces the need for resources later and makes problem-solving (e.g., the problem of identifying words, objects, and patterns) easier in a complex and noisy world. For example, memory of previous familiar problems and patterns is central to problem-solving for experts like chess masters and makes them significantly faster and more capable at coming to successful solutions than novices (see Bilalic et al. 2009, for an overview). This reduction in the use of cognitive resources after learning is also reflected in neural processing in multiple ways, including via long-term potentiation, a strengthening of synaptic connections that occurs after repeated activation of a neural pathway (see Hayashi 2021, for an overview). Trying to avoid a predator or capture prey? Faster identification is better! For an echolocating dolphin moving through the world, decoding auditory scenes in part through object recognition should require fewer resources for familiar objects and scenes, thereby making life more manageable.

The framework of auditory scene analysis applied to other animals, including macaques, starlings, treefrogs, ferrets, budgerigars, bats, porpoises, and dolphins, indicates that they also organize their acoustic environment into informative auditory streams (Bee 2015; Branstetter and Finneran 2008; Branstetter et al. 2013; Finneran and Branstetter 2013; Fishman et al. 2012; Fishman et al. 2001; Hulse et al. 1997; Itatani and Klump 2009, 2020; Ladegaard and Madsen 2019; Ma et al. 2010; Moss and Surlykke 2001; Neilans and Dent 2015). Work with bats is of special interest here, because they, like dolphins, are also echolocators and can directly contribute to the auditory scenes they are decoding through their biosonar systems, i.e., how they manage their echoic investigations in part determines the echoes they receive. Bat echolocation systems have evolved for the different needs and habitats experienced across the hundreds of different echolocating bat species on the planet, and individual bats can adaptively control their out-going echolocation signals and movements to gain better information about the world and likely reduce the complexity of an auditory scene (Moss and Surlykke 2010). For example, some species shift the frequency of their calls when in groups, presumably to help them disambiguate their own echoic returns versus those of a conspecific, and other species appear to avoid simultaneously echolocating with nearby bats (as do rough-toothed dolphins, Gotz et al. 2006) (Chiu et al. 2008, 2009; Jarvis et al. 2010).

Moss and her colleagues (e.g., Moss and Surlykke 2010; Stidsholt et al. 2018; Wohlgemuth et al. 2016) make compelling arguments that bats’ information-seeking behaviors, i.e., their active-sensing movements and behavioral responses to a variety of acoustic scenes, provide useful windows into bats’ echolocation systems that enhance our understanding of the information bats are trying to control within the acoustic streams they are receiving. For example, when capturing prey in open rooms, a fairly easy auditory scene, versus near vegetation, a more difficult auditory scene, big brown bats change a host of their behaviors: when the prey are near vegetation, the bats take longer to try to intercept the prey target, spend more time “strobing” (producing packets of pulses with a stable pulse interval), increase the pulse intervals in the strobes potentially to increase processing time for this difficult task, change their flight paths to avoid the backward masking of echoes returning from the vegetation and overwhelming the target’s echo, shorten the length of the terminal buzz (the final burst of pulses) produced just before capture, and are significantly less likely to try to capture the target in the first place than they do in an open room (Moss et al. 2006). Similarly, barbastelle bats shift their echolocation behaviors based on the difficulty of different tasks including increasing their call rates (i.e., effort) for more difficult tasks (Lewanzik and Goerlitz 2021). Bats clearly adapt their echolocation behaviors literally on the fly to enhance perception and reach their goals, and analyzing those behaviors gives us insights into the information the bats need and how they work to get it across contexts.

Echoic information-seeking by dolphins: strategies, top–down processing, and effort

Compared to research on bats, the general lack of availability of dolphins for research and the challenges of working in a marine environment have led to many fewer studies of free-swimming dolphins specifically focused on their echoic investigations of objects and auditory scenes (Moore and Finneran 2011; Moss et al. 2014). Nevertheless, many experiments with stationary dolphins and a few free-swimming animals indicate that they can control their echoic investigations of objects, presumably to get better information: they produce more intense clicks with higher peak frequencies in noise, emit more clicks, change their inter-click intervals (including producing “packets” of clicks), and get closer to objects when they are free-swimming (e.g., Au et al. 1974; Au et al. 1982; Houser et al. 2005; Ladegaard and Madsen 2019; Roitblat et al. 1990). We also know that free-swimming echolocating harbor porpoises (Phocoena phocoena) change their investigatory behaviors when auditory scenes change: when a target object and an alternative were closer together, potentially making an auditory scene more difficult, the porpoises increased the number of their echoic scans, clicked faster, began their buzzes farther away and increased their duration, and made their decisions when they were closer to the targets (Malinka et al. 2021a, b). In addition, these animals generally reduce their inter-click intervals (except for terminal buzzes) and produce quieter pulses in pools versus net pens, potentially due to different reverberation levels in the two contexts (Ladegaard and Madsen 2019).

Some data shed light on the top–down processing mechanisms bottlenose dolphins use for echolocation tasks. First, dolphins have remarkable echoic attention capacities. Because dolphins are unihemispheric sleepers, they can monitor their environment for echoic targets for at least 15 continuous days, i.e., 360 h in a row, with high performance accuracy (> 95% with an average of 78.4 trials/day) (Branstetter et al. 2012). Second, expectations can affect click production; dolphins use their expectations of a target’s distance to space their clicks (Au et al. 1974, 1982). For example, in a task in which an echolocating dolphin had to report the presence or absence of an object at 5 different distances, the dolphin performed significantly better when the object appeared at the same distance throughout a session than when it appeared at different distances, suggesting that focusing on a single distance helped (Penner 1988). In addition, the dolphin’s inter-click interval was appropriate to a specific single distance (based on two-way travel time of the out-going click and the returning echo) on both presence and absence trials when distance stayed the same throughout a session, indicating that the dolphin’s knowledge dictated the nature of his click trains. Third, object familiarity makes a difference. Dolphins in matching tasks often get better at matching as objects become more familiar, i.e., as the dolphin gains more experience with the objects (e.g., Herman et al. 1998; Xitco and Roitblat 1996).

The object-familiarity advantage suggests that dolphins remember an object’s echoes, learn how to inspect an object more capably, and/or shift their representations of objects as they gain experience with them. This may mean that they need less information to recognize an object, thereby allowing them to recognize objects more proficiently in contexts in which echoes are degraded or otherwise less accessible. They may also learn to investigate objects more efficiently to discover the telling characteristic of that object compared to alternatives. An unusual study on eavesdropping in dolphins adds support to these possibilities (Xitco and Roitblat 1996). In this study, a non-echolocating dolphin (the “listener”) listened in on the echoes returning to its echolocating partner (the “inspector”). Both dolphins were originally trained to engage in an active echoic matching task and improved with object familiarity, indicating that they remembered the objects. The listener tended to be better at these active echolocation (non-eavesdropping) tasks; the inspector had more biases. When just eavesdropping, the listener’s performance accuracy was above chance levels and numerous analyses indicated that his choices were not contingent on the inspector’s choices; that is, echoic returns of the sample target object were the basis of his choices. However, the listener was affected by the inspector’s investigations: The listener was more likely to be correct when the inspector was correct, the inspector’s biases were reflected in the listener’s choices (i.e., the inspector had a bias towards wooden and styrofoam objects, and when eavesdropping—and only when eavesdropping—so did the listener), and the inspector—and therefore the listener—was much more likely to be correct when he was inspecting familiar objects versus objects that were only familiar to the listener.

Effort, defined as number of clicks, also varies across echoic investigations. In one study of a pair of free-swimming dolphins’ echoic behaviors when engaging in a presence/absence target detection task in a cluttered environment, one “methodological and thorough” dolphin (whose high-frequency hearing was likely significantly worse than the second dolphin’s) averaged over 300 clicks and 25 s in a target-present search and over 500 clicks and 38 s in a target-absent search, whereas the second, minimalist dolphin averaged around 31 clicks and 6.5 s for target-present and 109 clicks and 18 s for target-absent searches (Houser et al. 2005). Both dolphins increased their number of clicks, and therefore on our account “effort”, when conditions required a more thorough search during target-absent trials. Both dolphins also increased the strength of echo returns at target acquisition: the first dolphin by getting closer to the target and the second dolphin by increasing the intensity of his clicks. In an early study of effort, two stationary echolocating dolphins in a presence/absence target detection task behaved more similarly to each other than the free-swimming pair did, but their behavior was also revealing because, in this task, the noise level was increased across trials until the two animals eventually gave up (Au et al. 1982). The mean number of clicks per trial began between 20 and 30 clicks/trial and, as the noise level was increased, number of clicks steadily increased two-to-threefold until a noise threshold was reached, at which point number of clicks went steadily down; in fact, one dolphin avoided clicking altogether for some of the high-noise trials. Apparently, when the problem became insoluble, the dolphins stopped trying to solve it.

In an analysis of problem-solving involving decision-making and object recognition in a stationary echolocating dolphin performing a matching-to-sample task, Roitblat et al. (1990) (like others) framed the matching paradigm as two discrimination problems: a successive discrimination problem to identify the sample target object and a second, simultaneous discrimination problem to choose the matching choice from the group of alternative objects, a simple auditory scene. Roitblat et al. define both problems as requiring sequential-decision sampling: identifying the sample requires repeated effortful investigations that build up information over time; determining the object that matches the sample from an array of objects is less effortful, because it only requires gaining enough information to determine similarity to the sample. The highly skilled dolphin Rake, almost error-free with his very familiar four experimental objects, varied his number of clicks, i.e., effort, in investigations. As expected, he invested more effort in identifying the sample target objects (averaging 37.2 clicks) than in his first scan of the alternative objects (averaging 17.3 clicks). He also traded effort for accuracy: working harder to identify the most difficult objects (those with weak echoic returns, which he missed most often), averaging more clicks to the matching stimuli (versus the non-matching stimuli) in his first scan of the alternative array, and extending his efforts to match alternatives on the right, which took more scans, since he investigated the alternatives stereotypically from left to right. Roitblat et al. also used their data to create a decision-making model enlisting signal detection theory and Bayesian decision rules, which indicated that the dolphin was integrating information from successive investigations to inform his echoic information-seeking behaviors. In this model, dolphins gain more knowledge with more clicks, and recognition of the sample target requires the most effort, because it requires full recognition rather than merely evaluating for similarities between the target and the choice.

The current study

Auditory scene analysis embraces the whole blooming, buzzing confusion of an animal’s acoustic world, and because echolocators orchestrate aspects of the composition of that scene through their clicks/pulses and movements to decode the returning echoes, their information-seeking behaviors also give researchers information about what the animals need and use to create auditory objects. When processing a scene, animals also likely assess the scene for its decipherability and calibrate their efforts based on the possibilities, as the dolphins in the object detection task outlined above (Au et al. 1982) did when noise overwhelmed the object’s returning echoes. Consider the famous everyday-life example of tracking a speaker’s words at a cocktail party: the acoustic scene may allow the listener to hear easily, lead the listener to put in extra effort to manage information exchange, or require the listener to expend so much effort that she gives up or invites the speaker to an easier venue (a different scene), all depending upon multiple characteristics related to the scene itself, the sound source, and the listener.

When dolphins echolocate, they vary their investigations to get better information, but specifics of how and why are still emerging from the noise. Target discriminability and familiarity appear to have an influence on dolphins’ echoic investigations of objects, but to date, no studies have systematically examined the influence of these variables in an echoic object recognition task. Our general goal here is to learn more about how a dolphin’s top–down cognitive processes affect his responses to different auditory scenes as they become more familiar. We begin simply, by measuring the effort a free-swimming dolphin invests as he echolocates target samples that then appear in fairly straightforward auditory scenes (i.e., the three alternatives that appear in choice arrays in a matching-to-sample paradigm) in which the discriminability of the objects themselves within those auditory scenes makes recognition harder or easier, to see if the dolphin takes these variations in decipherability of the scenes into account. This approach allows us to explicitly look at the top–down processes of attention, memory, knowledge, and expectation as the targets and scenes methodically shift from being completely unfamiliar to familiar. Our specific question for this study is: do a dolphin’s echoic investigatory practices, in this case the number of clicks he produces to target sample objects, change as the dolphin learns about auditory scenes that range in their decipherability? Given the dolphin’s prowess at echoic object recognition and the variability of echoes across auditory scenes, we predict that the dolphin’s investigations of objects will change as he learns more about objects within specific contexts, i.e., he will bring top–down processes (knowledge, memory, expectations, and selective attention) to bear to drive his information-seeking behavior (number of clicks) as his representations of objects and their roles within auditory scenes change.

Methods

Subject

The subject was an adult male bottlenose dolphin (Tursiops truncatus), Calvin, who had previous visual and echoic matching-to-sample experience (Harley et al. 2010). He was born in 1994 at a facility in the Florida Keys and moved to his current facility in 2003. Audiograms conducted in 2013 and 2019 via auditory evoked potentials confirmed that he had good hearing across the normal range. He lived with 3 other adult males in one-quarter of a 5.8 million-gallon, mixed-species exhibit and two ancillary pools at The Seas, Epcot®, Walt Disney World® Resort, Lake Buena Vista, FL, USA. The current study was conducted in one of the ancillary pools (B Pool), which measured 8.2 m long by 7 m wide by 2.1 m deep. See Fig. 1 for the study location, B Pool. Calvin consumed a diet of herring, capelin, and squid that was customized by nutritionists, veterinarians, and trainers on the Animal Health and Animal Care teams. These teams were responsible for all care and management decisions, which were independent of Calvin’s participation or accuracy in research sessions. Disney’s Animal Care and Welfare committee reviewed and approved the project (IR1005), and the dolphin was cared for in accordance with the U.S. Animal Welfare Act (1966) and the Association of Zoos and Aquariums (2014) accreditation guidelines at all times. The Seas was authorized to house the animals by permit # 58-C-0076 issued by the U.S. Department of Agriculture Animal and Plant Health Inspection Service.

Fig. 1

The experimental set-up. The echolocating dolphin begins with the trainer at “start”, swims to the sample object (hydrophone #1 positioned behind object), and proceeds to the alternative array to make a choice

Because dolphins are difficult to access, and tasks like this one take extensive periods to train, one dolphin participated in the study. Single-subject designs do not allow for group comparisons, but our question focused on the capacity of any dolphin to use top–down processes to calibrate its information-seeking behaviors based on auditory scenes. Capacity indicates what a species can do, although it does not indicate how common the ability is (Triana and Pasnak 1981). Working with a single subject allowed us to answer our question and benefit from a fine-tuned analysis of the dolphin’s responses to the independent variables of object familiarity and scene decipherability. In some circumstances, the individual level, using within-subject repeated sampling, is more effective at understanding psychological phenomena, because large group results do not overwhelm effects that occur on the individual level (Smith and Little 2018). Given our access limitations and the nature of our question, we designed our study to take advantage of this strength.

Materials

Stimuli

The main focus of the study was to examine object familiarity across different levels of discriminability, and so many objects served as stimuli. At the beginning of the study, all stimuli were unfamiliar to the subject and were only used in test sessions, as described throughout the manuscript. (The dolphin learned the task in a previous study with other objects.) For the bulk of the study, we used 20 three-object sets (Sets A-T, presented chronologically in alphabetical order), although a few other objects were introduced in the last condition of the study. Most objects were made of PVC, but some were hardware “junk” objects made of a variety of materials. Across the study, the PVC objects ranged in size from 2.7″ H by 4.3″ W (the smallest) to 30.8″ H by 18.1″ W (the largest). The hardware objects ranged in size from 2.6″ H by 1.9″ W (the smallest) to 14.6″ H by 12.8″ W (the largest). The smallest hardware object was the Stapler Remover, and the largest hardware object was the Letter Tray. Figure 2 shows examples of object stimulus sets from the original 20 sets. Figure 3 shows examples of the objects used in the three trial types in the final condition of the study, “scene shifts”, in which familiar objects were inserted into new scenes with unfamiliar objects. Figure 4 shows examples of the unfamiliar easy “junk” hardware objects used in the “scene shifts” condition.

Fig. 2

Examples of stimulus sets: easy (first block, sets by row), hard (second block, sets by row), and challenging-but-doable (third block, sets by row)

Fig. 3

Examples of the three trial types in the Scene Shifts condition featuring an original hard set (top row), a hard-set object presented within an easy scene (middle row), and an easy scene (bottom row)

Fig. 4

Examples of unfamiliar easy “junk” hardware objects used in the Scene Shifts condition

The sample and alternative objects were attached to monofilament line covered in small clear soft-plastic tubes and hung off PVC poles via metal hooks clipped into above-water loops of the line. All objects were suspended 0.7 m from the walls of the pool, with 1.2 m between each alternative. Objects were suspended, such that the center of each object was 40.6 cm under the surface of the water.

Recording and acoustical analysis devices

We made acoustic recordings during the sessions with High Tech, Inc. HTI-96MIN hydrophones with a flat frequency response from 2 Hz to 30 kHz (although the actual recording range was 0 Hz to 50 kHz), and clicks were recorded at a sampling rate of 100 kHz. The clicks were recorded onto a Lenovo T410 laptop computer using Avisoft Recorder-USG version 4.2.8 (http://www.avisoft.com). During all sessions, a hydrophone was mounted behind the sample object, in line with Calvin’s approach to the target sample, to record Calvin’s clicks directed toward the sample object. Aside from the sample hydrophone, there were two other hydrophones in the pool (as well as other hydrophones in other parts of the environment) that allowed us to evaluate more easily when Calvin turned his head away from the sample as well as to monitor other dolphins’ vocalizations in other parts of the habitat. The audio recordings using the four channels were analyzed with Avisoft SASLab Pro Sound Analysis and Synthesis Laboratory version 5.2.01. See Fig. 1 for the locations of the hydrophones.

Video was recorded using a PC Osprey 4-channel video card with H.264 Webcam software that simultaneously recorded above the water in all the dolphin areas. The camera over B Pool, the study site, was mounted, such that it could capture the entire pool.

Procedure

At the start of the study, Calvin already performed capably in a three-alternative matching-to-sample task. He wore soft, silicone eyecups during trials to preclude visual cues. He could pop the eyecups off at will, but he was trained to wear them. If an eyecup came off during a trial, stimuli were immediately pulled from the pool.

Before the start of a trial, the sample was positioned below the surface of the water; it was introduced along with a masking object (one of the alternative objects) to disguise any splash cues associated with putting the sample into the water, and the masking object was then pulled out of the water leaving only the sample. The alternative choice objects were also positioned underwater before the start of a trial. See Fig. 1 for the trial set-up.

A trial began when the dolphin positioned himself in front of his trainer, who then signaled him tactilely to swim 6.1 m to the target sample object to his left. The dolphin could swim at his own pace. After investigating the sample object ad libitum, he swam to the choice array 6.1 m to his left where three alternative objects, one identical to the sample, were positioned. After inspecting the alternatives, Calvin positioned himself in front of his choice object and chirped. A research assistant who did not know the identity of the sample identified the object Calvin had chosen. If Calvin’s choice matched the sample, the trainer (across the pool from the alternative array) blew a secondary reinforcer “bridge” whistle and gave Calvin 2–3 capelin. If his choice was incorrect, the trainer interacted briefly with Calvin, and we moved to the next trial. Occasionally, the trainer interacted informally with Calvin between trials. Intertrial intervals were a minimum of 30 s.

We recorded clicks to the sample by mounting the hydrophone behind the sample object based on the dolphin’s swim path towards the object. (The dolphin always swam from the same origin point towards the sample object, as indicated with “start” in Fig. 1.) A researcher recording the session inserted comments into Avisoft Recorder-USG at the moment the dolphin: (1) was released to approach the sample, (2) turned away from the sample toward the alternatives, and (3) chirped in front of one of the alternatives to indicate his choice. All clicks between the time the dolphin began approaching the sample and began turning away from the sample were counted in the click counts reported as information-seeking clicks, i.e., the clicks the dolphin emitted during the sample investigation period. The researcher’s comments on the start and end of Calvin’s sample investigations were confirmed by a second researcher who synchronized the video and audio using a visible and audible synchronizing tap and then used Avisoft SASLab’s “pulse train analysis” module to count the number of clicks directed at the sample hydrophone. If Calvin’s rostrum was pointed away from the sample object and the clicks were louder on channels 2 or 3 than on channel 1 (the sample hydrophone), then those clicks were not counted in Calvin’s investigation of the sample. Calvin’s clicks toward the sample stopped, paused, or faded in volume as he turned to the alternatives.
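
The screening logic can be summarized in a short sketch (a minimal illustration: the data layout, channel labels, and example values are hypothetical, and the actual counts came from Avisoft SASLab’s pulse train analysis module):

```python
from dataclasses import dataclass

@dataclass
class Click:
    time_s: float     # time of the click within the trial recording (s)
    peak_amp: dict    # channel label ("ch1"..."ch3") -> peak amplitude

def count_sample_clicks(clicks, t_release, t_turn_away):
    """Count information-seeking clicks directed at the sample hydrophone.

    A click counts only if (1) it falls between the moment the dolphin was
    released toward the sample and the moment he began turning toward the
    alternatives, and (2) it is loudest on channel 1 (the sample hydrophone)
    rather than on channels 2 or 3.
    """
    n = 0
    for c in clicks:
        in_window = t_release <= c.time_s < t_turn_away
        at_sample = c.peak_amp["ch1"] >= max(c.peak_amp["ch2"], c.peak_amp["ch3"])
        if in_window and at_sample:
            n += 1
    return n

# Three hypothetical clicks; the third is louder on channel 2 (head turned away).
clicks = [Click(1.2, {"ch1": 0.9, "ch2": 0.3, "ch3": 0.2}),
          Click(2.8, {"ch1": 0.8, "ch2": 0.4, "ch3": 0.3}),
          Click(5.4, {"ch1": 0.2, "ch2": 0.7, "ch3": 0.3})]
print(count_sample_clicks(clicks, t_release=1.0, t_turn_away=6.0))  # -> 2
```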

We also hand-counted the number of clicks in 3.3% of the trials in a selection of 12 sets (the 4 most discriminable sets, the 4 least discriminable sets, and the 4 challenging-but-doable sets) using a semi-randomized procedure in which we counted trials that included each sample within a set, always in different sessions with that set. We determined that out of 12,441 hand-counted clicks, the automated system missed few clicks: the system missed 153 clicks (1.2%) total in the hand-counted trials, including 50 clicks (0.4%) in 4 terminal buzzes (click sequences with short inter-click intervals occurring at the end of the investigations of the target sample objects), mostly due to noise. However, we found more falsely detected clicks (1193 clicks, 9.6%), mostly echoes at the beginning or end of a click train when the inter-click intervals were long (the click counting function includes a specified minimum interval during which echoes will not be counted), although in one case, we miscounted 548 clicks due to user error (in one trial, we did not use the appropriate noise-reducing filter). Overall, our error rate was 7.2%. However, the frequency of making an error in click counts was evenly distributed across auditory scene types—i.e., across all levels of discriminability in object sets—thereby having little effect on relative comparisons across scene types. In addition, because we were interested in relative numbers of clicks and we always used the same procedure to count clicks, any sample-hydrophone-directed clicks Calvin may have produced to navigate the pool as he swam to the sample were anticipated to be relatively consistent or consistently variable across the more than 1800 trials we recorded.

Across the experiment, the dolphin was tested with 20 object sets, each of which included three different objects that were unfamiliar to the dolphin before the first session with those objects. Each set was presented for five 18-trial sessions. Trials were presented in semi-random order, i.e., trial order was randomized with these exceptions: each sample appeared once in the first three trials of the first session with a set, and no sample could appear more than three times in a row in any session. Having 18 trials allowed us to present each sample an equal number of times (6) and to place its match and the non-matching alternatives an equal number of times in each position in each session. Having five sessions with each set allowed us to explore any changes in the dolphin’s information-seeking behaviors as the objects changed from being unfamiliar in the first session to familiar across the five sessions.
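
One way to generate an order satisfying these constraints is simple rejection sampling, sketched below (the study’s actual randomization procedure beyond the stated constraints is not specified, and the sample labels are placeholders):

```python
import random

def make_session_order(samples=("S1", "S2", "S3"), reps=6,
                       first_session=False, max_run=3):
    """Shuffle 18 trials (each sample 6 times) until the constraints hold:
    no sample more than 3 times in a row and, in a set's first session,
    each sample appearing once within the first three trials."""
    trials = list(samples) * reps
    while True:
        random.shuffle(trials)
        # Any window of max_run + 1 consecutive trials must contain > 1 sample.
        runs_ok = all(len(set(trials[i:i + max_run + 1])) > 1
                      for i in range(len(trials) - max_run))
        opening_ok = (not first_session) or len(set(trials[:3])) == len(samples)
        if runs_ok and opening_ok:
            return trials

print(make_session_order(first_session=True))
```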

Not only did the 20 object sets vary on familiarity over time, they also varied in their discriminability. We let the dolphin himself index object discriminability within sets: his performance accuracy at identifying the object that matched the sample within the 3-alternative choice arrays, i.e., the auditory scenes, defined this variable. In our tests, chance performance accuracy (33% based on 3 alternatives in the choice array) would be interpreted to mean the objects in that set were indiscriminable to the dolphin, whereas high performance accuracy, e.g., > 90%, would suggest the objects in that set were highly discriminable to the dolphin. In this way, we could measure information-seeking across hard (indiscriminable object arrays) and easy (easily discriminable object arrays) auditory scenes, i.e., scenes that varied in their decipherability. These scenes are “auditory”, because the dolphin is echolocating the objects and processing the returning echoes.

Control “interleaved” sessions to evaluate the effect of general motivation versus auditory scene on performance accuracy and effort

We hypothesized that changes in performance accuracy (discriminability) and number of clicks (attentive effort) might vary across the auditory scenes/object-set alternative arrays based on (1) the dolphin’s experience with those specific scenes/sets OR (2) the dolphin’s general motivational state across the time period during which the 20 scenes/object sets were tested. To distinguish between these two explanations, we ran 15 total control sessions in which we interleaved trials from three difficult object sets (mean performance accuracy < 45%) and three easy sets (mean performance accuracy > 88%) within five 36-trial sessions. That is, we organized an easy object set into 18 trials as previously described for test sessions, and we organized a difficult object set in the same way. Then, we interleaved those trials to create a single 36-trial session in which trials from both sets were randomly ordered resulting in the interleaving of hard and easy trials. We created three interleaved sets for five 36-trial sessions each. We reasoned that if the dolphin’s general motivational state was responsible for the differences across sets, then presenting the sets within a single session should even out the differences; that is, performance accuracy and number of clicks should be similar across all of the trials. On the other hand, if the dolphin’s performance accuracy and information-seeking (number of clicks) were due to his learning about the object sets themselves, then his performance accuracy and number of clicks in these interleaved sets should be similar to his original performance accuracy and number of clicks when the sets were presented the first time within the series.
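
The interleaving itself can be expressed in a few lines (a sketch with hypothetical trial labels; for simplicity, it ignores the within-set ordering constraints described earlier):

```python
import random

def interleave_sessions(easy_trials, hard_trials, seed=None):
    """Merge an 18-trial easy-set session and an 18-trial hard-set session
    into a single, randomly ordered 36-trial control session."""
    rng = random.Random(seed)
    combined = ([("easy", t) for t in easy_trials] +
                [("hard", t) for t in hard_trials])
    rng.shuffle(combined)
    return combined

easy = ["G1", "G2", "G3"] * 6   # placeholder labels for an easy set's 18 trials
hard = ["D1", "D2", "D3"] * 6   # placeholder labels for a hard set's 18 trials
print(interleave_sessions(easy, hard, seed=1))
```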

“Scene shift” sessions to evaluate the effects of a change in auditory scene

Discriminability is based on auditory scene. That is, objects may be difficult to recognize in some contexts and easy in others. To make a small inroad into understanding the nature of Calvin’s memory of objects related to scene, we conducted nine total sessions that included three trial types based on the objects that were presented in the alternative arrays. Trial type 1 included samples and alternative arrays composed of objects that continued to be presented in sets from the original 20 sets but were sets on which Calvin had poor performance accuracy, i.e., “hard-set sample/hard scene” trials. Trial type 2 included samples from the “hard” sets and alternative arrays in which those samples were presented with new unfamiliar objects anticipated to be easy for Calvin to discriminate, i.e., “hard-set sample/easy scene” trials. Trial type 3 included samples that were new PVC or “junk” hardware objects and alternative choice arrays in which those samples were presented with an unfamiliar “easy” object and a familiar “hard-set” object, creating a set that was anticipated to be easy for Calvin to discriminate, i.e., “easy scene” trials. Each trial type occurred six times in each session, leading to 18 total trials. Three “hard” sets were tested across three 18-trial sessions each, along with unfamiliar “easy” objects unique to each set. Therefore, there were only 18 trials total of each trial type for a given set of objects, based on the three sessions with each combination set. See Figs. 3 and 4 for examples of stimuli in these trial types.

Results and analyses

Variability of performance accuracy and number of clicks across all object sets

As intended, difficulty of interpreting the auditory scene, defined empirically using the dolphin’s mean performance accuracy with each set across five 18-trial sessions with that set, varied widely across the 20 object sets (A-T; the alphabetical listing reflects chronological order of presentation). Chance performance accuracy was 33%, because there were 3 alternatives in each choice array. Mean performance accuracy on the 20 sets across the 5 sessions that each set was initially presented to the dolphin ranged from 33.00 to 93.33% with an overall mean of 66.83% (SD = 18.45%) for all sets. Performance accuracy on individual 18-trial sessions ranged from 22.22 to 100%.

Information-seeking effort, defined as the number of clicks directed to the sample object, also varied widely across the object sets. Mean number of clicks/trial towards the sample objects presented within the 20 sets across the five sessions that each set was initially presented to the dolphin ranged from 229.67 clicks/trial to 473.83 clicks/trial with an overall mean of 353.91 clicks/trial (SD = 73.51 clicks) for all sets. The mean number of clicks/trial on individual 18-trial sessions ranged from a low of 154.56 clicks/trial to a high of 694.50 clicks/trial. (One of the 100 sessions had no click counts due to equipment malfunction.)

Figure 5A and B provides a sense of the variability across object sets in terms of performance accuracy and mean number of clicks directed towards the sample objects, respectively, across all of the sessions for each of the 20 object sets (100 total sessions).

Fig. 5

A Accuracy by session for all 20 object sets, A-T (one line/set). B Average number of clicks/trial for each session for all 20 object sets, A-T (one line/set)

Information-seeking across object discriminability and familiarity

To learn more about the dolphin’s information-seeking effort across the dimensions of object familiarity and discriminability, we focused several analyses on eight object sets categorized as easy and hard, as defined by the dolphin’s mean performance accuracy in the matching task. The four sets with the highest mean performance accuracy across the five sessions in which they were presented (range 90.0–93.33%) were designated as “easy” sets. Two of the easy sets had two PVC object shapes and one junk hardware-store object, and the other two had three PVC object shapes. The four sets with the lowest mean performance accuracy across the five sessions in which they were presented (range 33.33–44.44%) were designated as “hard” sets and were composed of shapes built from PVC. See Fig. 2 for examples of the stimulus objects.
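
This categorization amounts to ranking the sets by mean accuracy and taking the extremes; a minimal sketch (the set labels and accuracy values are illustrative placeholders, not the study’s data):

```python
def categorize_sets(mean_accuracy, k=4):
    """Return the k least (hard) and k most (easy) discriminable sets."""
    ranked = sorted(mean_accuracy, key=mean_accuracy.get)   # ascending accuracy
    return {"hard": ranked[:k], "easy": ranked[-k:]}

acc = {"A": 0.93, "B": 0.41, "C": 0.47, "D": 0.36, "E": 0.33, "G": 0.90}
print(categorize_sets(acc, k=2))   # {'hard': ['E', 'D'], 'easy': ['G', 'A']}
```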

Table 1 provides performance accuracy and mean number of clicks per trial in a session for the eight object sets. As planned, the dolphin’s performance accuracy was significantly better on the easy sets (4 sets of 5 sessions each: mean = 91.39%, SD = 6.85%) than on the hard sets (4 sets of 5 sessions each: mean = 40.56%, SD = 9.38%), paired t(19) = 21.86, p < 0.00001. The dolphin also investigated the sample object with significantly more clicks in easy sets (mean = 393.49 clicks, SD = 85.16) than hard sets (mean = 310.74 clicks, SD = 143.84), paired t(19) = 2.99, p = 0.007. (The Bonferroni correction to protect experiment-wise error at p = 0.05 with 7 tests, the number of t tests we performed, is p = 0.007.) See Fig. 6 for a graphic representation of these data.
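
The comparisons above are standard paired t-tests over matched session values, evaluated against a Bonferroni-adjusted alpha; a sketch using SciPy (the session values below are randomly generated stand-ins, not the study’s data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Illustrative stand-ins for the 20 paired session values (4 sets x 5 sessions):
easy_clicks = rng.normal(393, 85, size=20)    # mean clicks/trial, easy sessions
hard_clicks = rng.normal(311, 144, size=20)   # mean clicks/trial, hard sessions

t, p = stats.ttest_rel(easy_clicks, hard_clicks)   # paired t-test, df = 19

alpha = 0.05 / 7   # Bonferroni-adjusted alpha for 7 planned t-tests (~0.007)
print(f"t(19) = {t:.2f}, p = {p:.4f}, significant at adjusted alpha: {p < alpha}")
```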

Table 1 Easy and hard sets: performance accuracy and mean number-of-clicks/trial by session
Fig. 6

Mean performance accuracy (bars) and mean number of clicks/trial (lines) for sessions 1–5 for the 4 easy (first bar/pair; top line) and 4 hard sets (second bar/pair; bottom line)

All objects were initially unfamiliar to the dolphin at the beginning of the first session and thus became more familiar across the first and, likely, following sessions. Therefore, we compared the first and final sessions of the easy and hard object sets in terms of effort (number of clicks), reasoning that in the first sessions with each set, the dolphin would be learning about these initially unfamiliar objects and their scenes, and that by the fifth sessions, the objects and their scenes would be familiar. Figure 7A and B shows the mean number of clicks directed to the sample in each trial across the easy and hard object sets, as well as performance accuracy on these trials, in the first and last sessions with the object sets, respectively. With the easy-set objects, the dolphin reduced the number of clicks he directed towards the sample from the first session with each set (four 18-trial sessions: mean = 451.89 clicks, SD = 178.45) to the fifth session with each set (four 18-trial sessions: mean = 375.58 clicks, SD = 120.05), paired t(71) = 2.80, p = 0.007. The reduction was similar, though more pronounced, with the hard-set objects: the dolphin reduced the number of clicks he directed towards the sample from the first session with each set (four 18-trial sessions: mean = 363.38 clicks, SD = 244.56) to the fifth session with each set (four 18-trial sessions: mean = 237.85 clicks, SD = 105.40), paired t(71) = 4.46, p < 0.0003. Overall, he dropped his effort to 83% of his original effort with easy sets and to 65% of his original effort with hard sets.

Fig. 7

A Mean performance accuracy (bars) and mean number of clicks/trial (lines) for the first sessions, trial by trial, of the 4 easy (first bar/pair; top line) and 4 hard sets (second bar/pair; bottom line). B Mean performance accuracy (bars) and mean number of clicks/trial (lines) for the fifth sessions, trial by trial, of the 4 easy (first bar/pair; top line) and 4 hard sets (second bar/pair; bottom line)

To get a more fine-tuned indication of how the dolphin’s investigations of the objects changed as they became more familiar, we focused on the first trials and sessions with the objects. The dolphin had experienced each of the objects as a sample by the end of the first three trials of each first session. For easy sets, summed across the four easy sets, he produced 5949 clicks in those first three trials of the first sessions and 5176 clicks in the first three trials of the fifth sessions. For hard sets, he produced 5127 clicks in the first three trials of the first sessions and 3292 clicks in the first three trials of the fifth sessions. Hence, there was a 12.99% drop from first to last session in easy trials and a 35.79% drop in hard trials; that is, the drop in effort for hard scenes was nearly three times that for easy scenes in the first three trials of those sessions. Across the eight first 18-trial sessions for easy and hard sets, the dolphin was incorrect on 10/72 easy trials and on 49/72 hard trials, making his error rate 13.89% in the first easy sessions and 68.06% in the first hard sessions. Given these error rates, the discriminability of the objects within their auditory scenes was likely detected early in the first sessions.

Investigatory effort continued to shift across the five sessions in the easy and hard sets. As noted earlier, by the end of the first session, the dolphin’s investigatory clicks were clearly reduced for hard sets (mean = 363.38 clicks per trial across the first session with each object set) versus easy sets (mean = 451.89 clicks); however, the number of clicks converged more closely in the second (easy mean = 389.72 clicks; hard mean = 378.22 clicks) and third (easy mean = 353.44 clicks; hard mean = 335.59 clicks) sessions and then diverged in the fourth (easy mean = 396.79 clicks; hard mean = 238.67 clicks) and fifth (easy mean = 375.48 clicks; hard mean = 237.85 clicks) sessions. Figure 6 illustrates the changes in investigatory effort and performance accuracy across the five sessions by easy and hard sets.

In contrast to easy and hard sets, investigatory effort did not significantly change for challenging-but-doable object sets across the five sessions. After discovering that number of clicks went down differentially on easy and hard sets across the five sessions with each set, we analyzed changes in effort for four challenging-but-doable sets (C, F, J, and O) as a way to tease apart the effects of familiarity and scene decipherability on effort. We chose the four sets in which the dolphin’s performance accuracy was low in the first session (mean = 47.22%, ranging from 44.44% to 55.56%) but had improved at least 15% by the fifth session (mean = 68.06%, ranging from 61.11% to 77.78%). We discovered that mean number of clicks/trial with these sets, in contrast to the easy and hard sets, did not change significantly from the first (mean = 302.56 clicks, SD = 121.53) to the fifth (mean = 311.42 clicks, SD = 149.13) sessions, paired t(71) = −0.35, p = 0.727. See Table 2 for performance accuracy and mean number of clicks/trial for each session for these rather variable sets.

Table 2 Challenging-but-doable sets: performance accuracy and mean number-of-clicks/trial by session

Reinforcement and patterns of choices among stimuli across acoustic scenes

Given the differences in performance accuracy and number of clicks across the different acoustic scenes (easy, hard, challenging-but-doable), we wanted to determine how reinforcement history might affect number of clicks per object and how patterns of choices might vary across acoustic scenes. To investigate reinforcement history across the 12 3-object sets (4 easy, 4 challenging-but-doable, 4 hard sets), we looked for reinforcement-value “outlier” objects, i.e., objects that the dolphin had chosen correctly or avoided incorrectly at least 25% more often compared to the other objects in a set, thereby leading to a substantially different reinforcement history with those objects. For example, in easy set A, there were no outlier objects: the dolphin chose each of the 3 objects correctly 90.00%, 93.33%, and 96.67% of the time; hence, the dolphin received nearly the same amount of reinforcement after choosing each of the objects in that set (a 3.33% difference between the high-to-middle performance accuracy values and a 3.33% difference between the low-to-middle performance accuracy values). In addition, to get a feel for any differences in strategy the dolphin may have used to manage the different scenes, we looked at patterns of choices in two ways: (1) We considered pairs of objects to be regularly confused with one another if both objects were chosen for each other 33.33% or more of the time, i.e., after experiencing Sample X, the dolphin chose X at least 10 times and Y at least 10 times (a third or more of his alternative choices), and after experiencing Sample Y, the dolphin chose Y at least 10 times and X at least 10 times. (2) We considered an object to be the “default” choice in a set if it was chosen more than 50% of the time overall, i.e., if the dolphin preferentially chose that object at least 46/90 times in this 3-alternative task.
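
These three criteria can be stated compactly in code; a sketch operating on a per-set table of choice counts (the object names and counts are hypothetical illustrations, not the study’s data):

```python
from itertools import combinations

def outlier_objects(correct_rate, margin=0.25):
    """Objects chosen correctly (or missed) at least 25 percentage points
    more often than every other object in the set."""
    out = []
    for obj, rate in correct_rate.items():
        others = [r for o, r in correct_rate.items() if o != obj]
        if (all(rate >= r + margin for r in others) or
                all(rate <= r - margin for r in others)):
            out.append(obj)
    return out

def confused_pairs(counts, n_per_sample=30):
    """Pairs chosen for each other on a third or more of trials: after
    sample X the dolphin chose X >= 10 and Y >= 10 times, and vice versa."""
    cutoff = n_per_sample / 3
    return [(x, y) for x, y in combinations(counts, 2)
            if counts[x][x] >= cutoff and counts[x][y] >= cutoff
            and counts[y][y] >= cutoff and counts[y][x] >= cutoff]

def default_object(counts, n_total=90):
    """The object chosen on more than half of all trials (>= 46/90), if any."""
    for obj in counts:
        if sum(row[obj] for row in counts.values()) > n_total / 2:
            return obj
    return None

# Hypothetical counts[sample][choice] for a hard set with default object "F":
counts = {"F": {"F": 23, "K": 4, "B": 3},
          "K": {"F": 20, "K": 6, "B": 4},
          "B": {"F": 19, "K": 5, "B": 6}}
print(outlier_objects({"F": 23/30, "K": 6/30, "B": 6/30}))  # -> ['F']
print(confused_pairs(counts))                               # -> []
print(default_object(counts))                               # -> 'F' (62/90)
```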

The number of outlier objects, and thus reinforcement history with specific objects, varied differentially across categories of sets/acoustic scenes. We found 6 object sets (out of 12 total sets) in which there were outlier objects (1 object per set) overall. In easy sets, there were no outlier objects. The dolphin was reinforced for most choices: the mode difference between high-to-middle and low-to-middle performance accuracy values within easy sets was 3.33%, with a range from 3.33% to 20.00% (mean = 7.08%). For challenging-but-doable sets, there were two outlier objects. The mode differences between high-to-middle and low-to-middle performance accuracy values were 0.00% and 36.67%, as was the range (mean = 12.76%). Set J contained the outlier object Purse, which was chosen correctly 27/30 times (90.00%), compared to the other two objects, which were chosen correctly 40.00% and 53.33% of the time. In Set O, the object Doctor was chosen correctly only 11/30 times (36.67%) compared to the other two objects, which were chosen correctly 73.33% and 76.67% of the time. For hard sets, every set included an outlier object. The mode difference between high-to-middle and low-to-middle performance accuracy values was 6.67%, and the range was 0.00% to 56.67% (mean = 24.58%). Set B contained the outlier object Flower, which was chosen correctly 23/30 times (76.67%), compared to the other two objects, which were chosen correctly 26.67% and 20.00% of the time. In Set D, the object KittyCat was chosen correctly 25/30 times (83.33%) compared to the other two objects, which were chosen correctly 20.00% and 26.67% of the time. In Set E, the object BugEyes was chosen correctly 16/30 times (53.33%) compared to the other two objects, which were chosen correctly 26.67% and 20.00% of the time. In Set R, the object B was chosen correctly 22/30 times (73.33%) compared to the other two objects, which were each chosen correctly 30.00% of the time.

Patterns of choices also varied across categories of sets/acoustic scenes. Easy sets included no clear confusions between objects and no default choices; performance accuracy was high, and choices tended to be distributed. Challenging-but-doable sets included no default choices but two sets included clear confusions. In Set C, Bookend was chosen after experiencing MagHolder 11 times, and MagHolder was chosen after experiencing Bookend 11 times. In Set J, Groot was chosen as Z 10 times and Z was chosen as Groot 10 times. In hard sets, there were no clear confusions, but every set had a default object: Flower (chosen 69% of the time), KittyCat (chosen 72% of the time), BugEyes (chosen 60% of the time), and B (chosen 58% of the time). Hence, in hard sets, all of the highly reinforced objects were outlier objects and default choices.

A more rewarding reinforcement history with specific objects did not predict increased number of clicks; rather, the opposite. Purse received the most reinforcement of its set (and within the entire challenging-but-doable category) and received the fewest total clicks (7806 clicks, in its set and in the challenging-but-doable category) compared to the confusable Groot (9841 clicks) and Z (9907 clicks) in its set. The difficult Doctor received the least reinforcement of its group and the most total clicks (9285 clicks) compared to the easier Maleficent (8257 clicks) and Sword (9034 clicks). In three of the hard sets, the default object received the most reinforcement and the fewest clicks: Flower, 12,016 versus 15,027 and 14,799 clicks; KittyCat, 6122 versus 7298 and 7021 clicks; B, 7156 versus 8570 and 8954 clicks. In the 4th hard set, BugEyes received the most reinforcement and the middle number of clicks, 8278 versus 7546 and 8506 clicks.

Evaluating the effects of general motivation versus auditory scenes on performance accuracy and effort through “interleaved” sessions

We confirmed that the differences in performance accuracy across easy and hard sets were not due to variability in general motivation for echoic object recognition tasks via an analysis of the interleaved control sessions, in which trials from an 18-trial easy set and an 18-trial hard set occurred in random order within a single 36-trial session; each of the three combined object sets was tested for five sessions (Interleave Set 1 combined original easy Set G and hard Set D; Interleave Set 2: easy I and hard E; Interleave Set 3: easy Q and hard R). Performance accuracy (original easy = 90.37% and interleaved easy = 94.07%; original hard = 40.37% and interleaved hard = 43.33%) rose a little with these familiar objects but was not significantly different between the original and interleaved sets, paired t(29) = −1.46, p = 0.155. However, number of clicks (original easy = 380.2 clicks/trial and interleaved easy = 404.2 clicks/trial; original hard = 259.1 clicks/trial and interleaved hard = 323.8 clicks/trial) rose to some extent in the interleaved sets compared to the original sets, paired t(29) = −2.33, p = 0.027. (Note that the Bonferroni correction to protect experiment-wise error at p = 0.05 with 7 tests, the number of t tests we conducted, is p = 0.007.) Table 3 presents mean performance accuracy and number of clicks/trial for the object sets used in this analysis when originally presented and later interleaved.

Table 3 Interleaved sets: original and interleaved mean performance accuracy and number of clicks/trial on each easy and hard set

Evaluating the effects of a change in auditory scene through “scene shift” sessions

We next attempted to learn more about how the dolphin’s information-seeking behavior would change when familiar objects occurred in new auditory scenes. (To aid in conveying the relevant histories of sample objects and scenes, we refer to objects that came from “hard” sets as “hard-set samples” or “hard objects”, not because the objects themselves are “hard” but because their discriminability was affected by being presented with similar objects in “hard” scenes.) We tested the dolphin with familiar objects from hard sets (hard Sets D, E, R) embedded within sessions in which there were 3 trial types: (1) hard-set sample/hard scene: 6 trials with the original hard-set objects, presented as usual, (2) hard-set sample/easy scene: 6 trials in which the samples were original hard-set objects and the alternative choice arrays included the hard-set sample and 2 unfamiliar easy/high-discriminability objects (always 1 PVC object and 1 junk object), and (3) easy scene: 6 trials in which the samples were novel/unfamiliar objects and the alternative choice arrays included a familiar hard-set object, an unfamiliar easily discriminable PVC object, and an unfamiliar easily discriminable junk object. The dolphin experienced three sessions of each of these interleaved trials for each of the hard sets D, E, and R, thus completing nine sessions total in this condition and resulting in only 18 trials of each trial type per set.

Table 4 presents mean performance accuracy and mean number of clicks/trial for each of the trial types with each of these sets. As expected, the dolphin’s performance accuracy, though better than the original accuracy (M = 40.37%) with these now familiar objects, was worst on the familiar Hard-set Sample/Hard Scene trials (M = 51.85%), compared to his strong performance on the new objects in the Easy Scene trials (M = 87.04%). His performance accuracy on the Hard-set Sample/Easy Scene trials was also strong (M = 92.59%), indicating the power of the auditory scene in object recognition. Mean number of clicks/trial to the hard-set samples remained lower than to the easy samples, as previously found, even when the scenes became easy: original, M = 259.1 clicks; hard-set sample/hard scene, M = 286.07 clicks; hard-set sample/easy scene, M = 262.2 clicks; easy scene, M = 420.4 clicks. Of course, objects in Easy Scene trials were both highly discriminable and relatively unfamiliar, attributes that resulted in more clicking on previous sets. See Fig. 8 for a graphic representation of these results.

Table 4 Scene shifts: performance accuracy and number of clicks/trial for original hard scenes, hard-set samples in hard scenes and unfamiliar easy scenes, and unfamiliar easy scenes
Fig. 8

Performance accuracy (bars) and mean number-of-clicks/trial (line) for scene-shift trial types

Discussion

Our goal in this study was to acknowledge and begin characterizing some of the top–down cognitive processes dolphins might bring to bear on echoic object recognition as targets and auditory scenes that varied in their decipherability shifted methodically from completely unfamiliar to familiar. We predicted that the dolphin’s information-seeking behavior, defined as number of clicks, would change as he learned more about the objects and adjusted to the auditory scenes within which they appeared. The dolphin proved us right.

Our central conclusion is that Calvin exhibited self-governed information-seeking strategies to promote his recognition of objects. Understanding cognition requires inferring cognitive processes from changing behavior, and we contend that Calvin’s behavioral changes across scenes support this conclusion. We highlight our findings in more detail below, but, in short, we found that in scenes in which objects were easy to recognize, Calvin slightly reduced his effort (number of clicks) across sessions as objects became more familiar but continued to click at fairly high levels, likely because he needed enough information to identify successfully (as he did) the sample object among the alternatives in the set during the choice period. In contrast, Calvin’s behavior in scenes in which objects were difficult to recognize was quite different. In these scenes, he ultimately reduced his effort (number of clicks) by the final session, likely because he often defaulted to a single object during the choice period. That is, he shifted the task from being primarily a matching task to being a “find one object” task, which requires fewer clicks towards the sample because the choice is mostly foregone. Finally, in scenes with objects that were difficult but possible to recognize (sets with more confusions between pairs of objects), Calvin did not reduce his number of clicks by the final session. Rather, his effort remained the same, likely to manage this challenging object recognition task.

To reach our conclusions, we first confirmed that the auditory scenes were differentially decipherable based on the dolphin’s performance accuracy with the scenes. Like other dolphins (e.g., DeLong et al. 2007; Xitco and Roitblat 1996), Calvin could use echolocation to discriminate objects easily in some scenes and barely discriminate objects in others. His highest performance accuracy for a session (100%) was 4.5 times his lowest (22%). We also confirmed that his information-seeking behaviors differed across sessions: the number of clicks he produced to the sample objects varied widely. Again, the highest number of clicks he directed to a sample in a trial (695) was 4.5 times the lowest (155). Of great interest is why.
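
For concreteness, the two 4.5-fold contrasts are simple ratios of the values reported above:

$$
\frac{100\%}{22\%} \approx 4.5, \qquad \frac{695\ \text{clicks}}{155\ \text{clicks}} \approx 4.5 .
$$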

A comparison of Calvin’s investigations of samples related to the four hardest (worst performance accuracy) and four easiest (best performance accuracy) scenes suggested that Calvin calibrated his responses to the decipherability of the scene. For both easy and hard object scenes, Calvin reduced his number of clicks as he gained experience with the objects and scenes. However, this reduction was significantly greater for hard scenes than for easy scenes. Investigatory effort toward samples changed quickly in the first sessions with the object sets. By the end of the first three trials, Calvin was producing about 64 fewer clicks towards samples in hard object sets than in easy object sets; by the end of the first session, this difference was about 89 clicks. However, in the second and third sessions, he rallied: his effort for easy and hard objects was within 12 and 18 clicks, respectively, across the different scenes. The sharp decline occurred in sessions 4 and 5, with 158 and 138 fewer clicks/trial, respectively, for samples related to hard scenes versus easy scenes. These changes suggest that Calvin was sensitive to the discriminability of the objects in the auditory scenes almost immediately after exposure to them, and he responded by expending less effort to analyze the most difficult scenarios. Then he rebounded, engaging in similar effort no matter the difficulty of the scene, continued to do poorly, and ultimately tailored his strategies to his experiences with the discriminability of the objects. He continued to put effort into the echoically accessible scenes, though he reduced that effort somewhat as the objects became familiar, but he spent significantly less effort on the impenetrable scenes, similar to Au et al.’s (1982) dolphins in an echoic object detection task when noise was high. Both dolphins in Au et al.’s study reduced effort, one to the extent that he did not echolocate at all on some trials.

Familiarity does not by itself necessarily lead to reduced effort in object recognition. Because analyzing the extremes of a complex data set (the hardest and easiest scenes) is a common but selective practice, we also analyzed four challenging-but-doable sets to assess more clearly the effect of auditory scene decipherability versus familiarity on effort. In these sets, the dolphin’s performance accuracy was low at the start but improved by at least 15% from the first to the final session. In addition, the dolphin’s number of clicks was not significantly lower in the fifth session than in the first across the sets as a whole, in contrast to the easy and hard scenes. Although few other data relate unfamiliar targets to effort in dolphins, we do know that the dolphin Rake continued to work harder (i.e., to click more) to recognize more difficult, versus easier, familiar objects, and that he worked harder to recognize matches when they occurred in different positions (Roitblat et al. 1990).

Calvin’s general motivational state does not explain his differential performance accuracy and investigatory strategies with the hard and easy sets. The facts that Calvin’s number of clicks with the hard sets began similarly to the easy sets, that his clicking rebounded in sessions 2 and 3 with the hard sets, and that his number of clicks held steady in the challenging-but-doable sets all suggest that the differences in performance accuracy and effort were not based on vagaries in overall motivation. Moreover, we checked this explanation explicitly by interleaving difficult and easy sets within the same session. In the interleaved sessions, performance accuracy and number of clicks rose to some extent compared to the original sets, perhaps because the objects were more familiar and interleaving allowed a higher reinforcement level within the session (given the easy trials), thereby raising motivation and/or affect; or perhaps because the scenes changed throughout the session, and auditory scene changes may invite more investigation: the dolphin could not predict which set would be presented, which may have affected his efficiency at using echolocation to identify expected salient features, or some other mechanism may have been at play. In any case, the object sets remained more (easy sets) and less (hard sets) discriminable in these sessions relative to each other, and the dolphin continued to produce fewer clicks for hard sets than for easy sets. We conclude that some auditory scenes are harder to decipher than others, and the dolphin acts on his ability to recognize (or not) the objects across different scenes by changing his investigatory strategies.

The effect of reinforcement is always a question in studies with trained animals. Several factors indicate that Calvin did not learn to click less with difficult sets than with easy sets simply because he received fewer fish for the difficult sets. First, there was a behavioral/time lag between his investigation of the sample (the point at which we measured number of clicks) and his experience of the secondary reinforcer (or its absence) after making a correct (or incorrect) choice. After inspecting the sample, he still had to swim to the alternative array, inspect the alternatives, and make a choice before the secondary bridge whistle was emitted. That is, he performed many behaviors before receiving fish; clicking at the sample was just one of them, and fairly distant from the reinforcers. Second, although number of clicks decreased more for hard object sets than for easy object sets, it decreased for both kinds of sets, even though he received a good deal of reinforcement for the easy sets: he received fish on 329 of 360 easy-set trials (i.e., 91% of the trials), and yet he decreased the number of clicks he directed toward the samples. Third, patterns of choices and reinforcement history with specific objects varied across acoustic scenes, providing a window into the role of reinforcement history with specific objects as well as the dolphin’s approach to managing discriminability differences among objects. In easy sets/interpretable scenes, the dolphin could identify the objects and therefore distributed his choices across the objects in these sets, receiving fairly even reinforcement across all objects. Two of the challenging-but-doable sets included an outlier object: one outlier was easy for the dolphin to identify, so he received an unusually high number of fish for it (fish on 27/30 trials); the other was difficult to identify, so he received substantially fewer fish for it (fish on 11/30 trials). The dolphin produced the fewest total clicks to the easily identifiable outlier, both within its set and across all the challenging-but-doable sets (this outlier was also the most easily identifiable object in the whole category). In contrast, the dolphin produced the most clicks within its set to the difficult outlier. Finally, in every hard set, there was a default object that the dolphin chose 58 to 72% of the time and for which he received the highest reinforcement in the set; nevertheless, in three of these four cases, the dolphin produced the fewest clicks to the default object. Hence, reinforcement was likely not the primary driver of Calvin’s clicking behavior; rather, it was his growing knowledge of the scenes and of the objects’ discriminability within them.

Finally, we ran a small number of scene-shift sessions in which objects from difficult scenes were inserted into easy scenes, and though few, these sessions were suggestive. Within this limited set, the dolphin’s clicks to the hard-set objects remained fairly steady no matter the scene, even though his performance accuracy in the new easy scenes was double his original accuracy when these objects were less discriminable. Of course, in this experiment we gave Calvin very little time to adapt to the new contexts. Within the scene-shift sessions, the hard-set objects appeared as samples within easy contexts on only a third of the trials in each session, and there were only three sessions with each original set, for a total of nine sessions. Nevertheless, his stable effort for hard-set objects shows that, in easily discriminable scenes, more clicks are not required for good performance. That is, Calvin continued to produce relatively low numbers of clicks for the hard-set objects, and he still did well when those objects appeared in easy, discriminable scenes. However, the number of clicks directed towards the unfamiliar, easily discriminable objects was 60% higher than the number directed to the familiar hard-set objects. This increased clicking is consistent with Calvin’s investigatory behavior throughout the experiment: he often clicked more to unfamiliar objects and to more easily discriminable objects.
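
As a quick check of the 60% figure, using the mean clicks/trial reported earlier for the scene-shift sessions (420.4 clicks to unfamiliar easy-scene samples versus 262.2 clicks to familiar hard-set samples in easy scenes):

$$
\frac{420.4 - 262.2}{262.2} \approx 0.60 .
$$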

Altogether, the data suggest that Calvin’s clicking effort was governed by information-seeking to recognize objects in different auditory scenes. In sets/scenes in which he was very good at recognizing objects, he maintained fairly high clicking levels, likely the levels required to manage recognition of all the objects in a set. In sets/scenes in which he was poor at recognizing objects, the chances were high that he would choose a default object, making the identity of the sample potentially irrelevant, a likely reason that the number of clicks to the sample was significantly reduced. On the other hand, the number of clicks he directed to objects from those hard sets was good enough for object recognition when those objects appeared in easier scenes: his number of clicks to those objects did not change when objects from those sets appeared in “scene-shifted” sessions with easier alternatives, and he succeeded in identifying them. In sets/scenes in which objects were more easily confused (harder but possible to recognize), Calvin maintained clicking effort at similar levels from the first to the fifth session. That Calvin used different strategies in different scenes, identifying the objects correctly when he could, choosing a default object when he could not, and producing more clicks when there were confusions, is a strong indicator that number of clicks is related to information-seeking. This conclusion is also consistent with the finding that the number of clicks was not reduced when Calvin received lower reinforcement for a particular object. In fact, number of clicks tended to be inversely related to reinforcement history for “outlier” objects, objects whose reinforcement history differed substantially from the others in their sets. In easy scenes, there were no outliers, because Calvin could identify all of the objects reasonably well. However, in scenes in which choices of one object were reinforced substantially more often than others in a set, Calvin clicked least to those objects five out of six times. The simplest explanation is that Calvin’s clicking behaviors were governed by information-seeking.

Of course, there are many reasons to emit echolocation clicks beyond object recognition. In this study, we followed Roitblat et al.’s (1990) lead and considered the emission of echolocation clicks a form of information-seeking for object recognition, but not the only use of Calvin’s clicks. For one, he was wearing eye cups and surely used his clicks to help him swim from one place to the next. Because the path was the same across scenes, we expected his navigational clicks to be mostly similar across sets, with occasional variability likely distributed across the many trials (90 trials/set). Similarly, we expected that if he employed a stereotypical investigation technique no matter the difficulty of the echolocation matching task, he would produce similar numbers of clicks across all stimuli. Because we were interested in how he would approach the investigation of different auditory scenes, we allowed the dolphin to examine the objects ad libitum while he was swimming. While our method helped us find differences across scenes, we could not determine the minimum number of clicks the dolphin might need to identify an object, nor how many clicks he used just to move around. Determining how much information each click provides for object recognition is work left for another day, as is learning more about other factors that may drive click quantity (e.g., emotional state, style of investigation such as number of fixations and fixations to salient features, or overly familiar objects potentially leading to boredom).

Analyzing investigatory movements across scenes

During the scene-shift sessions, Calvin engaged in some interesting information-seeking behaviors that we did not systematically analyze but believe may be important. He would occasionally “peek” at the alternative array, directing his rostrum towards it while echolocating on his way to the target sample object. This behavior is of interest because it could have allowed him to assess the auditory scene to come. However, because Calvin did not change his effort when hard-set objects appeared in new easy scenes, his assessment, if completed, did not appear to have a significant effect on his inspection of the sample targets. On the other hand, he did well in these new easy scenes without investing more effort, so perhaps he assessed the scene but needed to change nothing in his investigatory effort.

Across the study, object sets were clearly more and less discriminable, and the dolphin responded to that general characteristic. Of great interest is what makes some object sets confusing to dolphins and others not. Some of the object sets varied in material, but many did not: all the hard sets were purely PVC, but so were two of the easy sets and three of the challenging-but-doable sets. All of the objects varied in shape, some varied in size, and some were more complex than others. Next steps should include recording investigatory clicks and returning echoes from both samples and alternative objects, analyzing them in relation to the dolphin’s confusions, and then using those data to design the next objects with clear hypotheses in mind. Analyzing how a dolphin moves and changes as she becomes more adept at discrimination would also allow us to determine how the investigation may become more specific, and, hopefully, why.

One investigatory behavior that we did not systematically analyze was Calvin’s behavior with target objects at close range. With fair regularity, Calvin investigated the sample target object from only centimeters away, sometimes even touching the object while buzzing it. (He did not touch alternative choice objects.) These investigations sometimes took the form of outlining the object. Both the behavior and the tight click train could allow the dolphin to gain detailed information given the high repetition rate (Moss and Surlykke 2001), but what information? And how, at such close range? Recent work on the tactile sensitivity of the rostrum, melon, and blowhole (Strahan et al. 2020) may be relevant. In any case, this behavior and its potential functions and outcomes deserve future study.

Conservation

Although this study began very simply in terms of auditory scene analysis, the usefulness of the framework comes in part from considering future work in labs and in the wild. With dolphins, we clearly need future lab work that captures and analyzes the echoic returns the dolphin receives; investigations and returns of the auditory scenes/alternative choice arrays themselves; more complex environments; more ecologically relevant targets; eavesdropping scenarios; and more. Ideally, multiple approaches, like the behavioral ones illustrated here, would be used in concert with complementary investigations using other technologies, e.g., AEP/EEG studies that focus on determining which sounds dolphins classify together, a central aspect of understanding how they organize auditory scenes (e.g., Schalles et al. 2021). In the wild, tracking information-seeking behaviors more completely with a specific animal would also improve our knowledge (Madsen and Surlykke 2013). Moss and Surlykke’s (2010) focus on information-seeking behaviors, as windows into what kinds of information echolocating animals need and how they get it, provides a powerful framework for thinking about cognition in echolocating wild cetaceans.

A better understanding of top–down processes, attention, expectations, and motivation in decoding echoic scenes may help us in the conservation of echolocators. For example, allocation of attention is ecologically relevant: if animals expend less effort in particularly complex auditory scenes, they may miss predators, nets, or other important environmental threats. As noted by Malinka et al. (2021a, b), harbor porpoises appear to have the technical capacity to avoid nets, yet they drown in them as bycatch. Could this be an attention problem? Can learning about cues help? Expectations also have a clear effect on the quality of life of dolphins in the wild. For example, Nachtigall and colleagues (2014, 2016) confirmed that warning sounds before loud noises allow animals to prepare by dampening their hearing, reducing their sensitivity in expectation of the loud sound. Again, how can we teach them cues to help them prepare for an oncoming din? The investigation of auditory scenes by marine mammals is increasing via work with recording tags and hydrophone arrays (e.g., de Freitas et al. 2015; Ladegaard et al. 2015; Malinka et al. 2021a, b). Finding a way to incorporate the study of top–down processes in these ventures could strengthen our ability to understand and help wild cetaceans. The world they live in is the perceptual world they build, and top–down processes are central to its creation.

Conclusions

Here, we found that dolphins engage in top–down processing when working to recognize target objects that appear in auditory scenes that are easier or harder to decipher. They learn about and remember the objects they echolocate, and that learning affects their echoic inspections of the objects. When objects are unfamiliar, Calvin invests high effort in investigating them, and then he responds fairly quickly to his ability to decipher the scenes. With challenging-but-doable scenes, Calvin maintains his effort to decipher the scenes, and his object recognition abilities improve with experience. With familiar objects in easily decoded scenes, Calvin produces fewer clicks than in his original investigations when the objects were unfamiliar, perhaps because he becomes more efficient at recognizing them, but he maintains enough clicking for object recognition. Finally, familiar, difficult scenes receive the lowest relative effort, likely because the dolphin learns that the auditory scene is essentially indecipherable in terms of object discriminability and simplifies the task to finding a single object. Calvin also remembers the objects from difficult scenes and continues to produce a low number of clicks to those difficult-scene target objects even when the scenes become more amenable to object recognition, at least in the short term. Overall, memory for particular objects in specific auditory scenes results in a calibration of attention in information-seeking efforts to recognize those objects.

We took advantage of the public accessibility of information-seeking in a free-swimming echolocating dolphin, counting the clicks he used to investigate target objects, ultimately presented in simple auditory scenes, as the objects moved from unfamiliar to familiar across 20 different object sets; in doing so, we learned more about how top–down processing affects echoic investigation of objects in relation to the auditory scene. Studies of dolphin echolocation rarely, if ever, include so many unfamiliar objects. Often, objects are machined, aspect-independent, and few, sometimes a single “standard”, to make it easier for acousticians to predict and decode what is happening with clicks and echoes. For some questions, this approach is powerful; for others, not so much. Dolphins live in complex environments, and somehow they manage to recognize objects echoically in the richness of the world. If we want to understand how they do it, we need to jump into the deep end and figure out what cognitive powers they bring to bear to manage this remarkable feat. That goal means we need to determine what they learn about objects and how that learning affects their investigations of objects within varying scenes. To do that, we need to let them swim, so they can tell us what they are looking for through their movements, clicks, returning echoes, and dynamic responses to the information they are getting. And to study learning, they have to learn, which means we need to give them a shifting array of unfamiliar objects so we can study how they adapt as they learn. Our current approach provides a good option for this work if it is tackled by an interdisciplinary team with expertise in cognitive processes, behavior, acoustics, communication, engineering, neuroscience, and modeling, made up of open and flexible collaborators who can design, record, analyze, interpret, and model the dolphin’s information-seeking methods. In 1974, Nagel asked, “What is it like to be a bat?” Scientists studying bats and their responses to auditory scenes have made great strides in working to answer that question (though perhaps Nagel would not agree that more perceptual data will help with the thorny question of subjective experience), a worthy goal as bat populations around the world face growing environmental threats. Although the barriers to enlisting dolphins in research and the difficulties of working in salt water are formidable, dolphins too face substantive environmental threats as ocean temperatures rise, pH levels drop, and coasts undergo major change due to sea-level rise and land subsidence. What will our own effort be in addressing the question, “What is it like to be a dolphin?” Click to answer.