Introduction

All newborn animals face the challenge of developing the physical and cognitive skills required to survive and cope in the novel environment they enter at birth. Fast physiological development often characterizes precocial species such as antelopes (Grillner 2011), chickens (Muir et al. 1996), and rabbits (Carrier 1995), which acquire adult walking, running, and jumping capabilities soon after birth. Conversely, altricial species such as humans have a prolonged ontogeny of locomotor skills (Carrier 1996; Grillner 2011). Intensive parental care enables neonates of altricial species to retard the development of locomotor skills in favor of increased investment in brain development leading to larger brains (Iwaniuk and Nelson 2003; Shultz and Dunbar 2010; West 2014). Social groups with pronounced inter-individual relationships and extended parental care are often characteristic of altricial, big-brained species (Dunbar and Shultz 2007; Emery et al. 2007; Shultz and Dunbar 2007; Dunbar 2009; West 2014). Hence, it has been hypothesized that social animals have evolved large brains to solve complex ecological tasks in a social context, as well as to remember social interaction histories with their conspecifics in order to solve the dilemma of with whom to cooperate (Joffe 1997; Shultz and Dunbar 2007, 2010; Dunbar 2009). However, the advantages of being able to manage such social interactions come at the cost of a prolonged developmental period of learning (Zeveloff and Boyce 1982; Joffe 1997; Shultz and Dunbar 2010). Additionally, the time and energy invested in acquiring complex social skills may affect the development of other faculties and, hence, likely explain the late maturation of advanced locomotor skills and thereby independent foraging of humans and other large apes.

Despite living in a markedly different environment, many cetaceans possess many of the key features expected of altricial species: highly complex social lives (Connor et al. 1998; Rendell and Whitehead 2001), large brains allowing for complex cognition (Marino 2002; Marino et al. 2004, 2007), functionally diverse communication systems (Payne and McVay 1971; Rendell and Whitehead 2003; Filatova et al. 2012; King and Janik 2013), prolonged parental care (Whitehead and Rendell 2015), and extensive learning capacities (Janik and Slater 2000). However, due to their aquatic environment, neonate cetaceans require basic locomotor skills to get to the surface to breathe, to suckle, and to keep up with highly mobile mothers in their typically vast three-dimensional oceanic habitat. This raises the conundrum of how cetaceans handle the timing of the ontogeny of their locomotor skills and social behavior. This dilemma is particularly relevant to sperm whales (Physeter macrocephalus) who possess the largest brain in the animal kingdom (Marino et al. 2004), employ a complex, long-range biosonar system (Madsen et al. 2002), live in a complex, multileveled social structure with long-lasting, stable social units at its base (Whitehead et al. 1991; Christal et al. 1998; Whitehead 1999; Gero et al. 2015), and whose social interactions appear to be mediated through a diverse click-based communication system (Rendell and Whitehead 2003; Gero et al. 2016). Yet, this species lives in a pelagic habitat where they range widely and hunt for squid during deep (ca. 750 m) and long (ca. 45 min) foraging dives (Watwood et al. 2006; Whitehead et al. 2008).

Observations of a sperm whale birth suggest that newborn sperm whales acquire basic locomotor skills within hours after birth (Weilgart and Whitehead 1986), and that young calves appear able to track the movements of their natal social unit from the surface (Gordon 1987; Whitehead 1996). The increased predation risk for calves staying near the surface appears to be mitigated through alloparental care in the form of babysitting (Whitehead 1996; Gero et al. 2009). As such, the current understanding of neonate calf behavior is that they spend most of their time at the surface and so do not need to rapidly develop the locomotor, sensory, and diving skills to either avoid predation or to perform deep foraging dives (Best et al. 1984), thereby potentially allowing for an early energy allocation toward social development. However, current evidence of sperm whale calf diving abilities is ambivalent. As is the case for all mammals, sperm whale calves initially gain all their energy by suckling (Best et al. 1984). However, the stomach contents of 1-year-old calves have been found to include some solid food items which may reflect independent foraging or food provisioning by adults, while milk in turn has been found in the stomach of a 13-year-old juvenile male (Best et al. 1984), providing some uncertainty for when the onset of independent, deep, foraging dives might occur. Gordon (1987) demonstrated that two sperm whale calves dove to approximately 300 m suggesting significant diving capabilities although this is only half of the depth routinely reached by foraging adult females (Watwood et al. 2006).

Studies of captive sperm whale calves undergoing rehabilitation have shown that neonate and young calves (up to 7 m, corresponding to an approximate age of 2 years (Lockyer 1981)) are able to produce low frequency clicks which may be precursors for echolocation clicks (Watkins et al. 1988; Madsen et al. 2003). Recent studies have further shown that wild sperm whale calves at the age of 3 months can emit clicks in the form of codas (Gero et al. 2016). Among adults, these vocalizations serve a communicative function (Watkins and Schevill 1977). It may be speculated that such communication is important for facilitating reunion of calves with mothers or babysitters ascending from foraging dives. Sperm whale calves have been proposed to emit a click-based contact call similar to those of northern fur seal pups (Callorhinus ursinus) and chacma baboons (Papio cynocephalus ursinus) (Rendall et al. 2000; Insley 2001). This notion is based on the omni-directional, low frequency, and long duration clicks emitted by neonate sperm whales in captivity (Madsen et al. 2003). Alternatively, calves near the surface may track foraging adults (Gordon 1987) by eavesdropping on their echolocation. Thus, the current evidence leaves open the question of whether resources are invested more in developing diving or social capabilities in sperm whale calves.

To shed light on this question, we deployed multi-sensor sound and movement tags on three first-year calves to obtain new detailed information on locomotor and vocal capabilities for a wild toothed whale calf. First, we sought to assess whether calves participate actively in social communication by emitting codas, and whether calves emit codas in the context of reuniting with their mothers. Second, we wanted to evaluate the locomotor development of sperm whale calves by examining the extent of their diving behavior. Given their presumably smaller oxygen stores, we hypothesized that calves are not able to dive as deep and for as long as adult sperm whales. Third, we attempted to examine whether first-year sperm whale calves gain energy exclusively from suckling or supplement it by foraging. We hypothesized that when calves start foraging, they would echolocate for shorter periods than adults as a consequence of potentially shorter dives and hence less time within the prey field. We show that sperm whale calves less than 1 year old have unexpectedly well-developed diving capacities and that they emit echolocation clicks and buzzes which may be consistent with foraging. In comparison, social sounds were rarely produced, suggesting that vocal communication may develop later or that calves did not need to vocalize during these relatively short periods of tagging, during which they seemed to be diving for much of the time. Hence, delayed locomotor ontogeny may not be a strict prerequisite for developing complex social skills and may be circumvented in instances where the environment necessitates an early and rapid development of physiological capabilities.

Material and methods

Field site, animals, and tagging

Field research was conducted as a part of a longitudinal study of well-known sperm whale social units off the coast of the island of Dominica (15.30° N 61.40° W) (Gero et al. 2014). Tagging was performed from an 11-m rigid-hulled inflatable boat that also served as the primary observation platform during three consecutive field seasons in 2014, 2015, and 2016.

Sperm whales were located acoustically using an HTI-96 hydrophone in a baffle to provide directionality. Clusters containing calves were approached from behind and priority was given to tagging calves before any adult whale. Dtag version 3 sound and movement tags (Johnson and Tyack 2003) were attached to the whale by four silicone suction cups and deployed using a 9-m hand-held, carbon fiber pole. We continuously evaluated the response to the tagging attempt of the calves to ensure that it was safe to proceed. No invasive sampling was conducted during or after the tagging and a minimum distance of 100 m was kept after tagging to minimize the potential for disturbance. All whales, including tagged whales, were identified by photo-identification of distinct markings of the trailing edge on their flukes for adults, or the dorsal fins for calves (Arnbom 1987; Gero et al. 2009). Surface observations of cluster composition (sensu Gero et al. 2014) were performed throughout the day to determine the calves’ association with adult whales. Finally, far field recordings as well as sloughed skin and fecal samples were collected in the flukeprint after whales made deep fluke-up dives. Data was not collected blindly because our study involved focal animals in the field.

The Dtags sampled audio on two channels (to get a bearing to the sound source) at 120 or 125 kHz with a resolution of 16 bits, providing a flat (± 2 dB) frequency response between 0.4 and 50 kHz, and a clipping level of 184 dB re 1 μPa. Pressure and acceleration were sampled at a rate of 100 Hz and 500 Hz, respectively, both with 16-bit resolution, decimated to 25 Hz for analysis. All analyses were performed in Matlab (ver. 9.0 R2016a, Mathworks, Inc.) using custom written scripts (http://www.soundtags.org).

Dive behavior

Calf dives were divided into two categories, deep dives (> 50 m) and potential suckling dives (< 50 m and longer than 30 s). The 50-m separation value is based on the division between the multiple shallow dives and the fewer deeper dives performed by the three calves using histograms of the data following Watwood et al. (2006). Adult dives deeper than 50 m were similarly termed deep dives and used for comparisons between adult and calves. Calf and adult deep dives were divided into three phases based on the body pitch angle: the descent phase, the bottom phase, and the ascent phase (sensu Watwood et al. 2006). To avoid transient pitch oscillations falsely shortening the descent and the ascent phases, the end of the descents and the start of ascents were constrained to occur at a depth greater than 50% of the maximum depth of the dive (Fig. 1).
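The two dive categories can be illustrated with a minimal sketch. It assumes a depth record in meters sampled at a fixed rate and uses a hypothetical 1-m surface threshold to delimit submergences; that threshold and the function names are illustrative assumptions, not part of the analysis described here.

```python
import numpy as np

def find_dives(depth, surface_m=1.0):
    """Return (start, end) sample indices of each submergence below surface_m.

    surface_m is an assumed threshold chosen for illustration only.
    """
    below = np.asarray(depth) > surface_m
    edges = np.diff(below.astype(int))
    starts = np.flatnonzero(edges == 1) + 1
    ends = np.flatnonzero(edges == -1) + 1
    # Handle records that begin or end while the animal is submerged.
    if below[0]:
        starts = np.r_[0, starts]
    if below[-1]:
        ends = np.r_[ends, below.size]
    return list(zip(starts, ends))

def classify_dive(depth_segment, fs, deep_m=50.0, min_suckling_s=30.0):
    """Label one dive as 'deep', 'potential_suckling', or 'other'."""
    max_depth = float(np.max(depth_segment))
    duration_s = len(depth_segment) / fs
    if max_depth > deep_m:
        return "deep"
    if duration_s > min_suckling_s:
        return "potential_suckling"
    return "other"
```

Applied to a depth trace, `find_dives` yields the submergences and `classify_dive` applies the 50-m and 30-s criteria to each of them.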

Fig. 1

Dive profiles over the entire tag deployments showing changes in depth over time for Calf J (a), Calf R (b), Calf A (c, with specific dives I–IV, see Fig. 3) and an adult from Unit J (whale ID = 5987, year 2016) (d). Periods of calf clicking (red), loud adult echolocation (blue), coda bouts (magenta), calf buzzes (black circles in c), adult buzzes (black circles in d), and physical contact with another whale (yellow) are superimposed on the dive profiles. Insert (e) shows the percentage of the recording time containing codas (black), echolocation clicks (dark gray), or no apparent sounds emitted by all audible sperm whales (light gray) and insert (f) shows the tag placement on Calf A

For deep dives, the percentage of time gliding (periods of no fluking) was determined for the calves and the adults by obtaining the root mean square (RMS) of the differentiated pitch in 10-s blocks. This block size was chosen to cover two fluke strokes assuming a stroke rate of 0.2 strokes per second (Sato et al. 2007). Each 10-s block of the dive with an RMS lower than 20% of the mean RMS of the dive was regarded as a period of gliding. The 20% threshold was set from visual examination of the acceleration signals for all dives (Fig. 2), and this relative threshold was chosen to accommodate the whales being tagged at different parts of their bodies, resulting in different amounts of acceleration for the same fluking effort. The dominant stroke frequency (sensu Sato et al. 2007) was calculated for periods with continuous fluking in descent phases for adults and ascent phases for calves. These different dive phases were chosen to take the different buoyancy of the calves and the adults (see Results) into account and thereby compare the phases of the dives where the whales were working against their buoyancy. Swim speed during descent and ascent was estimated from vertical velocity and body pitch angle (in turn, calculated from the triaxial acceleration low-pass filtered at one half of the dominant stroke frequency) using a two-state Kalman filter and a Rauch smoother (Zimmer et al. 2005).
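The gliding criterion can be sketched as follows. This is an illustration under stated assumptions rather than the scripts used in the study: the "mean RMS of the dive" is taken to be the mean of the per-block RMS values, any incomplete trailing block is dropped, and the function name is hypothetical.

```python
import numpy as np

def glide_fraction(pitch, fs, block_s=10.0, rel_threshold=0.2):
    """Fraction of 10-s blocks in a dive classified as gliding.

    pitch: body pitch angle (radians) over one dive, sampled at fs Hz.
    A block counts as gliding when the RMS of the differentiated pitch
    falls below rel_threshold times the mean block RMS (assumed reading
    of 'mean RMS of the dive').
    """
    d_pitch = np.diff(np.asarray(pitch, dtype=float)) * fs  # rad/s
    n = int(round(block_s * fs))
    n_blocks = d_pitch.size // n  # incomplete trailing block is dropped
    blocks = d_pitch[: n_blocks * n].reshape(n_blocks, n)
    block_rms = np.sqrt(np.mean(blocks ** 2, axis=1))
    threshold = rel_threshold * block_rms.mean()
    return float(np.mean(block_rms < threshold))
```

Because the threshold is relative to each dive's own mean, the same function can be applied to tags placed at different positions on the body, which is the motivation given for the 20% criterion.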

Fig. 2

Dive profile (black) and change in pitch (blue) for Calf A (a) and one of the adults (whale ID = 6052) (b). Vertical lines indicate the end of the descent phase and the start of the ascent phase. RMS values for each 10-s block of the dive for Calf A (c) and the same adult (d), the red line indicates upper threshold for gliding periods set to 20% of the mean RMS value of the dive

Randomization tests were conducted to compare the dive parameters of the calves and the adults. For each calf, the median of dive durations, maximum dive depths, stroke frequencies, and percentages of time spent gliding was extracted across all deep dives. Similar medians were extracted from six dives for each of the ten adults and the juvenile male (Table 1). This number of dives was chosen based on the lowest number of dives recorded from the individual calves. Six random dives were chosen for each of the ten adults and the juvenile whale. In cases where a whale had been tagged on several occasions, these six dives were chosen randomly from the different tag deployments, with the criterion that one of the six dives was the first dive on a tag recording. This was done to ensure that the sampling method of adults matched that of calves, where all dives, including the first dive with potential, but not apparent, tagging effects, were included. This resulted in a pool of 66 adult dives for bootstrap analysis. When comparing calves and adults, an adult median value was extracted for each dive parameter from a randomly chosen subset of these dives. The number of deep dives in the subset for comparison against a calf was equal to the number of deep dives made by the given calf (i.e., 7, 15, and 6 for Calf J, Calf R, and Calf A). This was done to ensure similar sample sizes of calf and adult dives. Such an adult median was drawn 1000 times, and the proportion of times that the median value for the calf was lower or higher than the median value for the adults was calculated.
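The resampling scheme can be sketched as below. It is an illustration only: the text does not state whether the adult subsets were drawn with or without replacement, so sampling without replacement is an assumption, and the function name and the values in the usage note are hypothetical.

```python
import numpy as np

def randomization_test(calf_values, adult_pool, n_iter=1000, seed=0):
    """Compare a calf median against medians of random adult subsets.

    Each iteration draws as many adult dives as the calf made deep dives
    (without replacement, an assumption) and takes their median. Returns
    the proportions of iterations in which the calf median was lower and
    higher than the adult median.
    """
    rng = np.random.default_rng(seed)
    calf_median = np.median(calf_values)
    k = len(calf_values)
    adult_medians = np.array([
        np.median(rng.choice(adult_pool, size=k, replace=False))
        for _ in range(n_iter)
    ])
    p_lower = float(np.mean(calf_median < adult_medians))
    p_higher = float(np.mean(calf_median > adult_medians))
    return p_lower, p_higher
```

For example, calling `randomization_test(calf_depths, adult_depths)` with seven calf dive depths and a pool of 66 adult dive depths returns the two proportions for that dive parameter.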

Table 1 Tag deployment and data summary including date of tagging, the name, ID number, age-class and sex of the tagged whale, the composition of the tagged whale’s social unit at the time of the tagging, tagging response, duration of tag deployment, and whether the data was used for comparison of dive parameters and echolocation

Acoustic activity

All calf recordings were examined using a custom Matlab tool that allowed listening and visual examination of successive 15-s long windows of the tag recording using a spectrogram display (Hamming window, NFFT = 512 and 50% overlap). Potential echolocation signals from the tagged calf (having a high and stable amplitude above 125 dB re 1 μPa peak) were marked for further assessments along with any associated buzzes and codas. Individual echolocation and coda clicks for both calves and adults were identified using an automated click detection algorithm. The detected clicks were subsequently visually evaluated by a single observer (PT) to make sure that no clicks were missed and that obvious false detections, such as noise or distant whales, were eliminated.

Echolocation

The inter-pulse interval (IPI), the angle of arrival (AOA) on the stereo hydrophones, and the apparent output level (AOL) (Fais et al. 2015, 2016) were examined for all detected echolocation clicks recorded on the calf tags to determine whether they were produced by the tagged calf or a nearby adult. Detected clicks were low-pass filtered using a second-order Butterworth filter with a cut-off frequency of 5 kHz. Clicks with a signal to noise ratio (SNR) below 20 dB or a received level at the tag of less than 154 dB re 1 μPa peak were removed to exclude weak signals that were either misdetections or clicks from distant animals. The SNR was calculated from the RMS of a 1-ms window centered on each detected click (signal) and a 3-ms window starting 15 ms before the click (noise). Clicks with a peak amplitude within 10% of full-scale were excluded to avoid clipping. The IPIs of the accepted clicks were then obtained from inspections of the envelope of the click waveform, computed as the absolute value of the Hilbert transform. The two highest peaks of the envelope were identified, corresponding to the first and second pulse of the click, and the time difference between these was taken as the IPI (based on Bøttcher et al. 2018). We set an upper limit of 5 ms for the IPI corresponding to a maximum body length of roughly 12 m (Gordon 1991), as the average length of mature female sperm whales off Dominica is 9.2 m (Bøttcher et al. 2018).
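The SNR and IPI steps can be sketched as follows, assuming the click has already been detected and filtered. The 0.5-ms guard window used to separate the two envelope peaks is a hypothetical parameter introduced for illustration; the text only states that the two highest peaks were identified. The envelope is computed with an FFT-based analytic signal so that the sketch depends on NumPy alone.

```python
import numpy as np

def envelope(x):
    """Magnitude of the analytic signal (FFT-based Hilbert transform)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * h))

def click_snr_db(x, click_idx, fs):
    """SNR (dB): 1-ms signal window centered on the click versus a
    3-ms noise window starting 15 ms before the click."""
    half = int(round(0.0005 * fs))
    signal = x[click_idx - half: click_idx + half]
    n0 = click_idx - int(round(0.015 * fs))
    noise = x[n0: n0 + int(round(0.003 * fs))]
    rms_signal = np.sqrt(np.mean(np.square(signal)))
    rms_noise = np.sqrt(np.mean(np.square(noise)))
    return 20.0 * np.log10(rms_signal / rms_noise)

def estimate_ipi(x, fs, guard_s=0.5e-3, max_ipi_s=5e-3):
    """Time between the two highest envelope peaks, or None if > max_ipi_s.

    guard_s (assumed) masks the neighborhood of the first peak so that
    the second peak is a separate pulse, not the shoulder of the first.
    """
    env = envelope(x)
    first = int(np.argmax(env))
    guard = int(round(guard_s * fs))
    masked = env.copy()
    masked[max(0, first - guard): first + guard + 1] = 0.0
    second = int(np.argmax(masked))
    ipi = abs(second - first) / fs
    return ipi if ipi <= max_ipi_s else None
```

Clicks would then be kept only if `click_snr_db` is at least 20 dB and `estimate_ipi` returns a value, mirroring the acceptance criteria described in the text.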

AOA was estimated using the time delay between the recordings of the same click from the two hydrophones of the Dtag. For clicks emitted by the tagged whale, the AOAs are expected to be stable as the position of the sound producing organ changes very little relative to the tag. Abrupt changes in AOA might however occur due to sliding of the tag on the whale. If another nearby whale, on the other hand, produces the clicks, the AOAs are likely to vary continuously due to changes in the relative position and orientation of the tagged whale and an echolocating conspecific. The AOA was calculated using the following expression:

$$ \mathrm{AOA}={\sin}^{-1}\left(\frac{\Delta \mathrm{time}\cdot c}{\mathrm{dist}}\right) $$

where Δtime is the time difference (s) between when a click was recorded by the two hydrophones, c is the speed of sound in water (1500 m/s), and dist is the distance between the two hydrophones (50 mm). The time difference was estimated from the cross-correlation of the click recorded on the two hydrophones. To help resolve the peak time in the cross-correlation, the click signals were interpolated by a factor of eight.
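A minimal sketch of this calculation is given below. The interpolation method is not specified in the text, so Fourier (zero-padding) interpolation is used here as an assumption; the sign convention (positive when the click reaches the first channel later) and the function names are likewise illustrative.

```python
import numpy as np

def upsample_fft(x, factor):
    """Band-limited interpolation by zero-padding the spectrum (assumed method)."""
    n = len(x)
    return np.fft.irfft(np.fft.rfft(x), n=n * factor) * factor

def angle_of_arrival(ch1, ch2, fs, c=1500.0, dist=0.05, factor=8):
    """Angle of arrival in degrees from the inter-hydrophone time delay.

    c is the sound speed (m/s) and dist the hydrophone separation (m).
    """
    a = upsample_fft(np.asarray(ch1, dtype=float), factor)
    b = upsample_fft(np.asarray(ch2, dtype=float), factor)
    xcorr = np.correlate(a, b, mode="full")
    lag = int(np.argmax(xcorr)) - (len(b) - 1)  # samples by which ch1 lags ch2
    delta_time = lag / (fs * factor)
    # Clip to the valid arcsine domain in case noise pushes the ratio past 1.
    ratio = np.clip(delta_time * c / dist, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))
```

With a 50-mm separation, the largest possible delay is about 33 μs, or roughly four samples at 120 kHz, which is why interpolating before locating the cross-correlation peak matters for angular resolution.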

Lastly, the peak to peak apparent output level (AOLpp) was used to aid the evaluation of whether the calf or a nearby adult was echolocating. Assuming that the calf emits clicks of a near constant amplitude, which seems to be the case for adult sperm whales (Madsen et al. 2002), there should be little variation in the recorded AOLpp, whereas AOLpp of a nearby echolocating whale will fluctuate according to the distance to, and the heading of, the echolocating whale. All clicks with an IPI lower than 2 ms were assigned to the calves and clicks with an IPI higher than 2 ms were assigned to a nearby adult; this approach is supported by the stability of AOA and AOLpp (see Fig. 3). The SNR and clipping criterion excluded none or very few clicks across all whales. Clicks that did not get assigned an IPI due to a low amplitude were ascribed to the animal producing the preceding and subsequent IPI-confirmed clicks. To summarize, candidate clicks were presumed to come from the tagged calf if (i) the click SNR was > 20 dB and AOL > 154 dB re 1 μPa peak; (ii) the AOAs were fairly constant except for occasional steep changes due to tag sliding; and (iii) IPI < 2 ms.

Fig. 3

Dive profiles for Calf A’s four deepest dives (a–d), inter-pulse interval IPI (e–h), angle of arrival AOA (i–l), and apparent output level AOLpp (m–p) of high-level clicks recorded by the tag on Calf A during dives (a), (b), (c), and (d). Blue indicates nearby adult echolocation (a–d) and adult buzzes (i, k–m, and o–p) and red indicates Calf A’s echolocation (b–c) and calf buzzes (j–k and n–o). Red vertical lines indicate the shifts from calf to adult click production

All adult echolocation data used for comparison stems from six dives from each of the 10 adult females and the juvenile male, which performed six or more dives during a recording (Table 1). For adults, clicks with consistently high amplitudes were classified as produced by the tagged adult. Buzzes performed in the second and third dive of each of these 11 individuals were used to compare against the calf buzzes. Adult and calf buzzes were defined as a succession of clicks with an ICI lower than 0.2785 s based on the distribution of all adult echolocation clicks (Fig. 4, method sensu Teloni et al. 2008). Due to the high decay rate of the pulses within buzz clicks and their low signal to noise ratio, it was not possible to obtain IPIs of buzz clicks. However, buzzes recorded on the calf tags were assigned to the calf or an adult based on the IPI of clicks before and after the buzzes. Echograms (Johnson et al. 2004) were made to test if any echoes from ensonified objects such as prey could be detected during the calf buzzes. This was done by plotting low-pass filtered (fourth-order Butterworth filter with a cut-off frequency of 5 kHz) envelopes of sound segments of subsequent, outgoing clicks. A Hanning window was additionally applied to the sound segments to emphasize potential echoes and deemphasize the outgoing clicks. The distance to potential echoic objects was calculated from the arrival time of the echo assuming a sound speed of 1500 m/s.
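The buzz definition and the echo-range calculation can be sketched as follows. The minimum number of clicks required for a run to count as a buzz is a hypothetical parameter added for illustration, and the range is computed from the two-way travel time of the echo.

```python
import numpy as np

def find_buzzes(click_times, ici_threshold_s=0.2785, min_clicks=3):
    """Return (start_time, end_time) of runs of clicks whose ICIs are all
    below ici_threshold_s. min_clicks is an assumed lower bound."""
    t = np.asarray(click_times, dtype=float)
    fast = np.diff(t) < ici_threshold_s
    buzzes = []
    start = None
    for i, is_fast in enumerate(fast):
        if is_fast and start is None:
            start = i
        elif not is_fast and start is not None:
            # The run spans clicks start..i inclusive.
            if i - start + 1 >= min_clicks:
                buzzes.append((float(t[start]), float(t[i])))
            start = None
    if start is not None and len(fast) - start + 1 >= min_clicks:
        buzzes.append((float(t[start]), float(t[-1])))
    return buzzes

def echo_range_m(echo_delay_s, c=1500.0):
    """Target range from the two-way travel time of an echo."""
    return c * echo_delay_s / 2.0
```

For instance, an echo arriving about 5.3 ms after the outgoing click corresponds to a target roughly 4 m away at a sound speed of 1500 m/s.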

Fig. 4

Histogram of inter-click intervals (ICIs) of echolocation clicks of Calf J (green), Calf R (yellow), Calf A (light blue), and adults (dark blue). The bi-modal distribution demonstrates the change in ICI between normal echolocation and buzzes

Codas

The IPIs of clicks within all detectable codas in the calf recordings were examined to determine if the calves emitted any of the codas. All clicks with an IPI less than 2 ms were visually inspected to eliminate clicks with no apparent pulse structure. This cut-off value was chosen, as echolocation clicks judged to be from the tagged calves had IPIs shorter than 2 ms for all three calves. An IPI of 2 ms corresponds to a body length of 7.7 m (Gordon 1991), slightly greater than the average body length of 6 m reported for first-year calves (Lockyer 1981) while the IPI of echolocation clicks from adults off Dominica range between 2.73 and 3.34 ms (Bøttcher et al. 2018).
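The conversion from IPI to body length can be illustrated with a short sketch. The coefficients are not given in this text; the function below assumes the regression commonly attributed to Gordon (1991), which reproduces the lengths quoted here (7.7 m for an IPI of 2 ms and roughly 12 m for 5 ms).

```python
def body_length_from_ipi(ipi_ms):
    """Estimated body length (m) from the inter-pulse interval (ms).

    Coefficients assumed from the regression attributed to Gordon (1991);
    they are not stated in the text itself.
    """
    return 4.833 + 1.453 * ipi_ms - 0.001 * ipi_ms ** 2
```

Under this relationship, the 2-ms cut-off separates animals shorter than about 7.7 m from larger whales, which leaves a margin above the 6-m average reported for first-year calves.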

IPIs of coda clicks were estimated following the same procedure as for echolocation clicks, except that the SNR criterion was omitted to avoid missing any calf codas. Since the calves only emitted codas sporadically, the stability of AOA and AOL could not be used to support the determination of whether the tagged calves or the adults emitted the codas. A coda was therefore assigned to a calf if the IPI of three or more clicks within the coda could be reliably determined and if these IPIs were all lower than 2 ms.

Results

Dtags were deployed on three first-year calves (one in 2015 and two in 2016), ten adult females and one juvenile male (Table 1). All adults, the juvenile, and Calf R and Calf A showed no or only minor reactions, such as small flinches and/or a short shallow dive, in response to tagging (Table 1). Calf J performed several shallow dives for the first 2 min after tagging. The three calves came from three different social units: Calf J from Unit J, which consisted of three adult females; Calf R from Unit R, which consisted of five adult females and two other calves; and Calf A from Unit A, which consisted of three adult females and a juvenile male. For future reference, Calf J, Calf R, and Calf A were named Jonah, Riot, and Aurora within The Dominica Sperm Whale Project. Units J and A were engaged in foraging throughout the tag deployment, whereas Unit R socialized during 3 of 7 h of the tag deployment based on hourly sampling of group-level behavioral state determined as per observed behavioral events (Whitehead and Weilgart 1991). As none of the calves were observed with their social unit during the previous year’s field season, they were assumed to be less than a year old. It appeared from field observations that Calf A was slightly bigger than the two other calves. Calf J and Calf A were genetically sexed using sloughed skin as females (Konrad 2017), whereas the sex of Calf R is unknown. Calf J, Calf R, and Calf A were tagged for 3.9, 6.4, and 4.7 h and performed 7, 15, and 6 deep dives, respectively, during these recording periods (Fig. 1). Most of these dives were V-shaped, but Calf R and Calf A each made four dives with a bottom phase (Fig. 1). The calves often initiated and surfaced from deep dives in the immediate company (within 40 m and within < 1 min) of one or more adults (diving: 3 of 3 and 0 of 1 observations, surfacing: 3 of 3 and 2 of 2 observations for Calf J and Calf A, respectively, data not available for Calf R). During dives the calves seemed to be in physical contact with other whales (detected as acoustic cues of rubbing against the tag, see Fig. 1).

Additionally, 19 tags (5 in 2014, 12 in 2015, 2 in 2016) were deployed on ten different adult whales and one juvenile male across six social units (A, F, J, S, U, and one unknown unit) and used for comparison with the calves (Table 1). The juvenile male was analyzed as described for the adults. Four of these whales were tagged twice and two were tagged three times, within one or two field seasons. A maximum of four adults were tagged during the same day and the calves were either the only one tagged that day (Calf R) or one adult was additionally tagged on the same day (Calf J and Calf A) (Table 1).

Suckling

Calf J, Calf R, and Calf A made 43, 52, and 28 potential suckling dives with median durations of 2.1 min (IQR 0.8–4.3 min), 0.9 min (IQR 0.6–2.2 min), and 1.2 min (IQR 0.6–2.8 min). These potential suckling dives occurred at a median depth of 2.7 m (IQR 2.3–5.3 m), 2.2 m (IQR 1.7–3.6 m), and 3.3 m (IQR 2.8–5.5 m) for Calf J, Calf R, and Calf A respectively. The amount of time spent potentially suckling differed between the calves. Calf J spent 47% of the time potentially suckling whereas Calf R and Calf A only spent 20% and 22% of their tag deployments potentially suckling. No acoustical cues or sounds of physical contact (rubbing) were audible during the potential suckling dives.

Deep-dive behavior

All three calves performed several deep dives. In total, they made 28 deep dives deeper than 50 m. The maximum depth of Calf J and Calf R’s dives was approximately 300 m, whereas Calf A made four dives to around 600 m (Fig. 1). The duration of the dives varied between individuals, with Calf J diving for a maximum of 11 min, Calf R staying submerged for up to 31 min and Calf A’s longest dive lasting 44 min. The median dive depth and duration of the calf dives were significantly shallower (randomization tests: numbers of iterations = 1000, p ≤ 0.001) and shorter (randomization tests: numbers of iterations = 1000, p < 0.001) for all three calves than the median adult dive depth and dive duration (median depth 833 m, IQR 734–909 m, median duration 48 min, IQR 44–50 min, Fig. 5). During the recording periods, Calf J, Calf R, and Calf A spent 25, 47, and 56% of the time performing dives beyond 50 m, whereas adults spent a median of 76% (IQR 63–81%) of their time diving (> 50 m, pooling data from different tag deployments for the same individual).

Fig. 5

Distribution of dive duration and maximum dive depth for Calf J (green), Calf R (yellow), Calf A (light blue), and adults (dark blue). The lines indicate the minimum (dashed) and maximum (solid) estimated aerobic dive limit for calves

All three calves glided significantly more during descents of deep dives (Calf J median 44% (IQR 2–65%), Calf R median 18% (IQR 0–28%), and Calf A median 31% (IQR 0–67%), randomization tests: numbers of iterations = 1000, p < 0.001 for all calves) than did adults (median 2% (IQR 0–5%)). In contrast, the calves barely glided during ascents (median 0% (IQR 0–0%) for all calves), whereas adults spent a median of 20% of ascents gliding (IQR 8–41%, randomization tests: numbers of iterations = 1000, p < 0.001 for differences between each calf and the adults). During the bottom phases, adults fluked constantly (median 100% (IQR 100–100%)) presumably to approach and catch prey. Calf R and Calf A each made 4 dives with bottom phases during which they similarly fluked almost continuously (Calf R median 95% (IQR 86–99%) and Calf A median 95% (IQR 82–99%)). Calf J, Calf R, and Calf A ascended with a median dominant stroke frequency of 0.37, 0.41, and 0.41 strokes per second (IQR: 0.33–0.39, 0.34–0.46, and 0.39–0.46 strokes per second), which for all calves was significantly higher (randomization tests: numbers of iterations = 1000, p < 0.001 for all three calves) than the dominant stroke frequency of descending adults (median 0.21 strokes per second, IQR 0.20–0.22 strokes per second). Calf R and Calf A both ascended significantly faster (median both 1.6 m/s, IQR: 1.2–1.7 m/s and 1.4–1.7 m/s, randomization tests: numbers of iterations = 1000, p < 0.001 for both calves) than the adults descended (median 1.4 m/s, IQR 1.3–1.5 m/s). Calf J on the other hand ascended significantly slower (median 1.2 m/s, IQR 1.2–1.4 m/s, randomization tests: numbers of iterations = 1000, p = 0.038) than the adults descended. We compare stroke frequencies and swim speeds during ascents for calves with descents for adults to use epochs where the buoyancy works against the relatively heavy calves (less body fat) and relatively light adults (more body fat) (Miller et al. 2004).

Echolocation

Given that all three calves performed dives to 300 m depth and Calf A further reached adult foraging depth, we examined the IPI, AOA, and AOL of clicks recorded by the tags on the calves to determine if the tagged calves were clicking. As an example, Fig. 3 shows these three parameters for calf clicks recorded by the Dtag on Calf A during dives I to IV (Fig. 1). The IPI estimates of clicks differ between and within dives, suggesting that the recorded clicks came from different individuals. In dives I and IV, the median IPI was 3.06 ms (IQR 2.92–3.18 ms) and 2.88 ms (IQR 2.19–3.02 ms), which is similar to the IPI estimates of adult sperm whales in Dominica (Bøttcher et al. 2018). These IPIs indicate a body length of 9.3 and 9.0 m (Gordon 1991), which is the typical length of sexually mature female sperm whales (Lockyer 1981). During dives II and III, the initial median IPIs of 1.45 ms (IQR 1.38–1.51 ms) and 1.48 ms (IQR 1.43–1.53 ms) shifted to a median of 2.82 ms (IQR 1.99–2.99 ms) and 3.00 ms (IQR 2.60–3.08 ms) toward the end of the clicking. This change in the IPIs indicates a shift from a smaller whale to an adult whale clicking. The low IPI clicks had little variation in AOA (IQR: 4.2 and 4.8° for dives II and III), whereas the AOA of the high IPI clicks varied more (IQR: 4.8, 11.3, 14.7, and 10.7° for dives I, II, III, and IV). Moreover, the AOLpp of the low IPI clicks varied less (IQR: 2.6 and 3.2 dB for dives II and III) than the AOLpp of the high IPI clicks (IQR: 12.8, 12.3, 9.6, and 10.5 dB for dives I, II, III, and IV). In combination, these observations of IPI, AOA, and AOLpp suggest that Calf A emitted the low IPI clicks and nearby adults emitted the high IPI clicks occurring right before the calf started its ascents (Fig. 1). Additionally, Calf A produced two of the eight buzzes recorded by its tag (Figs. 1 and 3).

Following the same method, it was found that Calf J and Calf R each produced one bout of regular clicks (Fig. 1). However, no buzzes were recorded from these calves.

Over the full recordings, Calf J and Calf R emitted clicks for 80.5 and 89.5 s, respectively, whereas Calf A echolocated for 18.4 and 28.5 min in two of its approximately 600-m dives (dives II and III in Fig. 1). In comparison, the median adult search phase duration (i.e., from first to last regular click in a dive as defined in Watwood et al. 2006) was 37.8 min (IQR 35.5–41.1 min). Even accounting for the shorter duration of the calf dives compared to adult dives, the percentage of time spent in the search phase per dive was lower for the calves (13.9 and 16.2% for Calf J and Calf R, and 55.4 and 65.2% for Calf A versus adults: median 78.1%, IQR 76.3–80.5%). Calf J and Calf R both emitted their clicks at approximately 200 m depth, with Calf J clicking during the last part of its descents while Calf R emitted clicks during the initial part of its ascent (Fig. 1). Calf A started echolocating at 426 and 340 m during the last part of its descent, similar to adults (median depth 339 m, IQR 235–371 m), but stopped clicking during the last part of the bottom phases (Fig. 1), which is earlier than adults (Watwood et al. 2006). The ICI of Calf R and Calf A, median 0.46 and 0.41 s (IQR: 0.46–0.56 s and 0.40–0.46 s), respectively, was close to the median ICI of 0.49 s (IQR 0.44–0.54 s) for adults. Calf J, on the other hand, had a higher median ICI of 0.81 s (IQR 0.72–0.86 s) (Fig. 4).

The two buzzes (one per dive) made by Calf A lasted 27.7 and 12.4 s, substantially longer than the median duration of 4.3 s (IQR 4.2–4.8 s) for adult buzzes. Adults produced a median of 17 (IQR 14–19) buzzes per dive. Even when accounting for the different search phase durations, Calf A produced an order of magnitude fewer buzzes per minute than adults (median of 0.045 for Calf A compared to a median of 0.430 buzzes per minute of the search phase for the adults). Calf A produced its buzzes at 556 and 470 m, considerably shallower than the median depth of 771 m (IQR 722–789 m) for adult buzzes. The median ICI of Calf A’s buzzes (0.025 and 0.019 s) was similar to the median ICI of adult buzzes (0.019 s, IQR 0.017–0.020 s) (Fig. 4). The echogram (Johnson et al. 2004) of the second buzz revealed the presence of an object 4.0 m in front of the calf 5.5 s into the buzz (Fig. 6). The distance to this object decreased to 2.7 m over a 2-s period, suggesting that the calf closed in on this object at a net speed of 0.7 m/s.
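The buzz rate and closing speed quoted above follow from simple arithmetic, sketched here. The sketch takes Calf A's two search phases as 18.4 and 28.5 min, the values that reproduce the reported median of 0.045 buzzes per minute; variable names are hypothetical.

```python
from statistics import median

# Calf A: one buzz in each of two dives, with the search phase duration of each dive (min).
search_phase_min = [18.4, 28.5]
buzzes_per_dive = [1, 1]
calf_rate = median(b / t for b, t in zip(buzzes_per_dive, search_phase_min))

# Echogram of the second buzz: range to the object fell from 4.0 m to 2.7 m over 2 s.
closing_speed = (4.0 - 2.7) / 2.0

print(f"Calf A buzz rate: {calf_rate:.3f} per min of search phase")
print(f"Net closing speed: {closing_speed:.2f} m/s")
```

The closing speed comes out at 0.65 m/s, which the text rounds to 0.7 m/s.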

Fig. 6 Echogram of Calf A’s second buzz during dive III

Codas

No codas with an IPI corresponding to a calf were recorded by the tags on Calf J and Calf A, whereas 26 codas produced by a calf were recorded by the tag on Calf R. Codas and echolocation from other sperm whales were audible in 97.1, 95.5, and 94% of the recording time of Calf J, Calf R, and Calf A, respectively (Fig. 1e). Codas were audible in 4, 29, and 5% of the recording time of Calf J, Calf R, and Calf A, respectively (Fig. 1e).

Discussion

In this first detailed study of the fine-scale behavior of large toothed whale calves in the wild, we used multi-sensor Dtags to show that calves less than a year old can dive to depths at which adults forage and can emit echolocation clicks and buzzes. However, contrary to adults, the calves employed gliding during descents instead of ascents. Thus, despite a difference in locomotor requirements, the tagged calves appeared to be developing what would be the capacity for independent foraging. In comparison, the calves rarely produced codas, perhaps suggesting that investment in locomotor, diving, and echolocation skills may be favored over the development of social communication skills.

Suckling

The calves made several potential suckling dives, a behavior that made up an estimated 47, 20, and 22% of the recording time for Calf J, Calf R, and Calf A, respectively. The variability between these three values may stem from different suckling efficiencies between the calves or may relate to the calves’ stage of transition from exclusively suckling to increasingly supplementing their diet with prey. The three calves additionally performed shorter, shallow dives similar to the peduncle dives observed by Gero and Whitehead (2007). Underwater observations have shown that the calves press their blowhole against the escorting adults’ genital area during such dives (Gero and Whitehead 2007), probably to induce milk let down as observed in other cetaceans (Asper et al. 1988; Peddemors et al. 1992; Xian et al. 2012) and in terrestrial mammals (Lent 1974).

Successful transfer of milk requires behavioral coordination between the adult female and the calf. A recent study shows that humpback whale calves opt for mechanical cues rather than vocal cues to indicate their readiness to suckle (Videsen et al. 2017). Similarly, no vocal cues were associated with the potential suckling behavior of the recorded sperm whale calves. However, contrary to the humpback whale calves, no acoustic signs of physical contact were apparent from the recordings in this study, which may be due to posterior tag placement on the calves. The lack of acoustic cues may be an adaptation to avoid the risk of eavesdropping predators such as killer whales at or near the surface as suggested for humpback whale calves (Videsen et al. 2017).

Communication

It has previously been documented that calves less than a year old are able to produce codas (Schulz et al. 2011), but that they produce codas far less frequently than adults (Marcoux et al. 2006). Despite the sparsity of codas, calves are reported to produce a higher diversity of coda types compared to adults (Schulz et al. 2011; Gero et al. 2016) and it appears to take several years before calves converge on their natal unit’s dialect (Gero et al. 2016). In concert, these results suggest that social communication is a complex skill to acquire. We recorded codas on only one of three tags in this study. Unit R socialized during 3 of 7 h of the tag deployment while Units J and A were foraging the entire time and produced only a few codas (see Fig. 1d, e). Given that coda production rate is correlated with group behavioral state (Whitehead and Weilgart 1991), we were presumably more likely to record codas on Calf R’s tag. However, it would appear that young calves may not have a high need to participate acoustically in the social bonding during these periods of socializing at the surface. Nonetheless, based on the absence of codas produced by Calf A and Calf J, our results indicate that the calves do not need to emit dedicated social cues to maintain and re-establish contact with deep-diving adult whales; the ample presence of acoustic cues from adult sperm whales (Fig. 1e) seems sufficient for the calves to track, as suggested by Gordon (1987).

Dive behavior

Contrary to our expectation based on surface observations (Gordon 1987; Whitehead 1996) and the assumed smaller oxygen stores of calves, we show that first-year calves have well-developed diving abilities. Calf J and Calf R both dove to around 300 m, but Calf J’s longest dive lasted only 11 min, whereas Calf R performed three dives lasting between 22 and 31 min (Fig. 1). Calf A on the other hand performed four even longer dives (ranging from 31 to 44 min) during which it reached 600 m (Fig. 1), which more closely resembles adult dive behavior in this geographical area. These differences may imply that the calves are at different stages of developing their diving ability. It further seems that all three calves stayed within close proximity of one or several adult whales during descent and ascent, as they most often initiated and surfaced from deep dives simultaneously with one or more adults; cues of physical contact were recorded during their descents and ascents, and high-level adult echolocation clicks were recorded right before Calf A ascended from its four deep dives (Fig. 1). Thus, sperm whale calves can tolerate the increased pressure at depth and have sufficient oxygen stores for deep diving within their first year of life.

Noren et al. (2001) suggested that the age at which dolphins’ and pinnipeds’ oxygen stores are fully developed relates to the species’ life history traits and especially how early the calf or pup transitions to independent foraging. Based on their purported life history traits (Gero et al. 2009), sperm whales may be hypothesized to have a protracted development of oxygen stores. However, their relatively large size compared to delphinid calves and phocid pups, for example, gives sperm whale calves a built-in advantage as oxygen stores scale proportionally with body mass (M) whereas metabolic rate scales with M^0.75 (Kleiber 1975). This advantage may allow sperm whale calves to supplement milk with independent foraging earlier than expected from their otherwise characteristically slow life history traits. Additionally, deep-diving toothed whales such as sperm whales and presumably Blainville’s beaked whales (Mesoplodon densirostris, Dunn et al. 2017) may have a more rapid development of muscle and blood oxygen stores compared to deep-diving phocids, such as northern elephant seals (Mirounga angustirostris, Noren et al. 2001) that spend their first months on land (Reiter et al. 1978).
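The size advantage described above can be written as a short scaling argument. This is a sketch, assuming that aerobic dive duration (here called t) is simply usable oxygen stores divided by metabolic rate, and that the Kleiber exponent is 0.75, the value implied by the −0.25 exponent of the mass-specific scaling used later in the text; proportionality constants are left unspecified.

$$ t_{\mathrm{aerobic}}\propto \frac{{\mathrm{O}}_2\ \mathrm{stores}}{\mathrm{metabolic}\ \mathrm{rate}}\propto \frac{M}{M^{0.75}}={M}^{0.25} $$

Under these assumptions, an animal ten times heavier could dive aerobically for about 10^0.25 ≈ 1.8 times longer, all else being equal.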

The long duration of several of the calf dives and the energetic ascents may have resulted in these calves exceeding their aerobic dive limit (ADL). Assuming that their mass-specific oxygen stores are fully developed, the ADL of the three calves may be estimated by scaling the diving metabolic rates from adults (sensu Watwood et al. 2006):

$$ {\mathrm{ADL}}_{\mathrm{calf}}={\mathrm{ADL}}_{\mathrm{adult}}\cdot {\left(\frac{M_{\mathrm{b},\mathrm{adult}}}{M_{\mathrm{b},\mathrm{calf}}}\right)}^{-0.25} $$

where Mb is body mass. To perform that estimation, a minimum and maximum Mb,calf of 1 and 2 tons were chosen for neonate and first-year sperm whale calves (Lockyer 1981). The average Mb,adult was set to 7.2 tons (Lockyer 1981) corresponding to the median body length of 9.2 m for adult females in the area (based on Bøttcher et al. 2018). Hence, assuming the median adult dive duration of 48 min approximates their ADL (Watwood et al. 2006), the estimated ADL of the calves ranges from 28 to 34 min. This approach takes the difference in mass-specific metabolic rate into account, but assumes equal oxygen stores per unit of body mass, which may cause an overestimation of the calf ADL. The duration of dives made by Calf J (maximum 11 min) was well within this estimated ADL, whereas the longest dives of Calf R and Calf A reached and exceeded the estimated ADLs (Fig. 5). Hence, these calves may have faster maturation of oxygen stores than seal pups on land and shallow water odontocetes (Dolar et al. 1999; Noren et al. 2001), but may still need long surface intervals to process the accumulated lactate from possible anaerobic metabolism (Kooyman et al. 1980). Indeed, all calves spent less time deep diving than the adults. This may, however, also be the consequence of the calves engaging in specific behaviors confined to the surface or near-surface zone, such as nursing.
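The ADL scaling can be reproduced with a minimal sketch using the inputs quoted in the text (a 48-min adult dive duration, a 7.2-ton adult, and 1 to 2-ton calves). The function and constant names are hypothetical, and equal mass-specific oxygen stores are assumed, as in the text.

```python
def scaled_adl(adl_adult_min, mass_adult_t, mass_calf_t, exponent=-0.25):
    """Scale an adult aerobic dive limit to a calf via the body mass ratio."""
    return adl_adult_min * (mass_adult_t / mass_calf_t) ** exponent

ADL_ADULT_MIN = 48.0  # median adult dive duration quoted in the text (min)
MASS_ADULT_T = 7.2    # adult female body mass quoted in the text (tons)

# Lower and upper calf body mass bounds quoted in the text (tons).
for calf_mass in (1.0, 2.0):
    adl = scaled_adl(ADL_ADULT_MIN, MASS_ADULT_T, calf_mass)
    print(f"{calf_mass:.0f} t calf: ADL of about {adl:.1f} min")
```

With these rounded inputs the sketch gives roughly 29 and 35 min, within about a minute of the reported range of 28 to 34 min; small differences would be expected if unrounded inputs were used for the published estimate.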

Unlike adults, all three calves spent more time gliding during descents and less time gliding during ascents. This suggests that the calves are negatively buoyant due to a lower percentage of blubber, spermaceti oil, and/or junk (Miller et al. 2004). Getting a negatively buoyant body back to the surface requires well-developed locomotor skills and careful timing of dives to ensure sufficient oxygen resources for an energetic ascent. The three calves had stroke frequencies higher than adult whales, when comparing the phase of the dives in which each whale worked against its buoyancy, i.e., ascents for calves and descents for adults. This difference is likely due to scaling of body size (Sato et al. 2007). However, Calf R and Calf A attained speeds that in some cases exceeded those of adults during powered swimming, highlighting that they can indeed follow and keep up with adults during some deep dives and/or that they had not yet learned to manage their oxygen stores to maximize dive duration. The third calf swam at lower speeds than the two other calves and the adults in general, possibly because the adults of its social unit were swimming at slow speeds. Alternatively, this difference may imply that the three calves were at different stages of locomotor development.

Echolocation

One explanation for the pronounced diving behavior of these three calves may be that they, despite their young age, are catching food to supplement their milk-based diet. Our study shows that these free-ranging calves produced click trains, which in accordance with earlier suggestions (Ridgway and Carder 2001; Madsen et al. 2003) indicates that young sperm whale calves may echolocate. It was previously assumed that echolocation was a very complex sensory process that took cetacean neonates a long time to master (Bowles et al. 1988; Lindhard 1988). However, our findings are in line with recent studies on smaller toothed whales; harbor porpoises (Phocoena phocoena) emit clicks within minutes after birth and adjust their echolocation to match that of adults within a few days (Delgado 2016). Similarly, wild bottlenose dolphins (Tursiops aduncus) emitted clicks a few days after birth and after 17–21 days, these clicks were similar to adult echolocation clicks (Delgado 2016). This early development of echolocation fits the rapid life history traits of these species, but sperm whales, in contrast, are known for their slow maturation and hence perhaps would be expected to start echolocating much later. We show that this is not the case, and taken in combination with similar observations for Blainville’s beaked whales (Dunn et al. 2017), this suggests that deep diving toothed whales also have an early development of echolocation skills and that such a feature is not restricted to small cetaceans with rapid life history traits.

Calf J and Calf R both emitted one short click train at a depth of approximately 200 m. The absence of buzzes and the fact that these two calves did not reach depths where the adults were foraging suggest that these calves were not engaged in biosonar-based prey interception. However, the clicking may represent an early stage of their echolocation development, which could explain the longer ICIs of Calf J, which may be due to the calf needing longer processing time of the echoic scene of each click. In contrast, Calf A emitted two long bouts of clicks with ICIs similar to adult ICIs, and additionally, produced two buzzes at a depth where adults were also buzzing (Fig. 1). The presence of buzzes within echolocation bouts is a possible indication that Calf A was echolocating to catch prey. This interpretation is further supported by the presence of an echoic object that the calf approached during the second buzz (Fig. 6). Calf A made only two long buzzes, but these had ICIs similar to those of adults. This is consistent with the calf engaging in the approach and buzz phases of biosonar-based foraging, but perhaps not managing to catch the prey as quickly or at all, leading to a protracted capture attempt. Alternatively, the calf may have been echolocating and buzzing on an adult sperm whale. This however seems less likely as no loud (> 125 dB re 1 µPa peak) adult echolocation clicks were recorded before, during, or after the buzz (see Fig. 1) and because the big body of an adult whale would create a much more smeared echo compared to a point target such as a prey object.

The great variability in echolocation effort between the three calves of this study may reflect that these calves were at different stages in the transition from suckling to early independent foraging, or that we simply sampled them too little to capture the full range of vocal behaviors. Field observations suggest that the calf which performed the buzzes was the largest and therefore may be the oldest of the three calves, implying that the difference in diving behavior and echolocation effort observed here may be an effect of age. However, more data with more exact knowledge of the age and/or length of the calves are needed to verify this interpretation.

Conclusion

Here, we have used miniaturized bio-logging devices to obtain a unique first snapshot of the early development of social and foraging behavior in the largest tooth-bearing predator on the planet, the sperm whale. Due to practical difficulties in tagging calves, this study is based on a sample size of three calves and a total recording time of 15 h. With this reservation in mind, the data have enabled us to shed some light on the gradual and complex ontogeny of sperm whale calves in unprecedented detail, as well as allowing for greater insight into what sperm whales are capable of in their first year of life and the pace at which they go on to become sound-mediated, highly social apex predators in a deep oceanic environment. Contrary to large-brained and highly social terrestrial mammals, our data potentially suggest that sperm whale calves do not postpone their locomotor development to favor the maturation of complex social skills. Instead, the first-year calves performed deep and long-lasting dives where they seemed to employ echolocation as part of their sensory scene acquisition, and one calf may have engaged in biosonar-mediated prey capture attempts. Furthermore, the calves seemed to primarily rely on passive acoustic cues from the adults rather than emit codas themselves to maintain and restore contact with adults. Hence, it is implied that the sperm whale is an example of a large-brained, highly social mammal that perhaps prioritizes locomotor and diving development, potentially at the cost of slower development of social and communicative skills, which in turn may explain their prolonged dependence on their social unit compared to delphinids.