Introduction

Bioacoustic studies have played an important role in the comprehension of avian diversification and speciation (Payne 1986; Catchpole and Slater 2003). Songs are critically important for an array of ecological functions in birds (e.g., mating, territorial and resource defense, identification of social rank) such that song repertoires are under strong selective pressures. Consequently, modifications occurring in song characteristics may cause tangible and measurable evolutionary effects in this taxonomic group (Price 1998; Catchpole and Slater 2003; Price and Lanyon 2002). In bird song, intraspecific geographic variation can have profound effects on gene flow among distinct song types, eventually leading to speciation (West-Eberhard 1983; Parker et al. 2012). Thus, knowledge about geographic song variation provides valuable insights into the processes that have shaped bird species’ diversification (Slabbekoorn and Smith 2002a, b; Podos and Warren 2007; Roach and Phillmore 2017).

In oscine species, the ability to learn songs, and thus song plasticity, plays an important role in promoting song variation (i.e., dialects) across geographic regions (Lachlan and Servedio 2004; Nelson and Soha et al. 2004; Sosa-López et al. 2013; Ortiz-Ramírez et al. 2016). In fact, because learning may result both in song repertoire expansion and in preferences for local song dialects, which could lead to reproductive isolation (Baker et al. 1981; Price 1998; Catchpole and Slater 2003), this ability is assumed to be associated with faster song evolution (Mason et al. 2017). On the other hand, song learning is very limited (or absent) in suboscine species (Kroodsma 2004; Touchton et al. 2014). This characteristic is expected to provide little opportunity for intraspecific song variation among isolated suboscine populations (Isler et al. 2005; Soha et al. 2004; Yeh and Servedio 2015; Mason et al. 2017) when compared with oscine birds.

In addition to social learning, acoustic adaptation to ecological factors may also explain geographic differences in bird song. The acoustic adaptation hypothesis (AAH) proposes that songs are selected mainly for transmission in the local habitat (e.g., Morton 1975; Wiley and Richards 1978). Consequently, to fully understand geographic song variation, one should take into account the environmental conditions that could shape vocal variation across populations.

Over the past years many studies have attempted to understand the role of geographic variation in the development of bird song. However, research has focused disproportionately on oscine passerines (i.e., song learners) compared with suboscine passerines (i.e., non-learners; Footea et al. 2013; Lovell and Lein 2013). Thus, the putative absence of (or reduction in) song variation in suboscines is based on few studies, mostly conducted through visual inspection of sonograms (Kellogg and Stein 1953; Lanyon 1978; Kroodsma 1984), and on even fewer studies based on modern and systematic acoustic analyses (Lindell 1998; Footea et al. 2013). Recent studies have pointed to the existence of song variation in several suboscine species (Leger and Mountjoy 2003; Seddon and Tobias 2007; Tobias and Seddon 2009; Lovell and Lein 2013), raising the possibility that what is lacking is not song variation but extensive studies of suboscines. Theoretical models that predict increased speciation with learning (i.e., in oscines) have led to an implied consensus that suboscines show no variation, even in the absence of substantial empirical research. For example, Freeman et al. (2017) conclude that suboscines may be more apt than oscines at discriminating song among distinct populations. Considering the diversity of suboscines, there is an important deficit of empirical studies of geographic song variation in this group, particularly studies that use recent and robust acoustic analysis tools.

Here we studied geographic song variation in a suboscine passerine, the Silvery-cheeked Antshrike (Sakesphorus cristatus). This species inhabits the medium and low strata of vegetation and can often be found on the ground. It is sexually dimorphic (in feather color and in the presence of a crest in males); both sexes sing, sometimes in duets, and they are easily attracted by playback (personal observation). There are no previous studies of the structure or function of its vocal repertoire, and we describe the vocalizations of Sakesphorus cristatus for the first time, focusing on the song. Databases and information acquired in the field show three distinct vocalizations: the song and two calls. Although there is still no information on the functions and exact contexts of the calls, observations and recordings indicate that the song has the functions and contexts traditionally attributed to the songs of Passeriformes, namely the attraction of sexual partners and the defense of territory against conspecifics. This member of the family Thamnophilidae is endemic to the Brazilian Caatinga dry tropical forest vegetation, which extends continuously from the southern state of Minas Gerais to the northern states of Ceará and Rio Grande do Norte (Ridgely and Tudor 1994). Not only is S. cristatus abundant and vocally conspicuous throughout its range, but it also occurs across the entire extent of the Caatinga biome.

The Caatinga is one of the few semiarid regions in the Neotropical zone; it is surrounded by humid areas (Ab’Saber 1974) and occurs exclusively in northeastern Brazil (Rizzini 1997; Prado 2003). Its fauna was previously considered impoverished, with few endemic species and low diversity (Vanzolini 1988; Mares et al. 1985; Willig and Mares 1989), a consensus based on scarce knowledge about the biodiversity, patterns, and processes of diversification in the biome, as compared to other South American biomes, such as the Atlantic Forest and the Amazon (Silva et al. 2003; Turchetto‐Zolet et al. 2013).

Although the Caatinga remains a poorly studied biome in the Neotropical region (Silva et al. 2003; Tabarelli and Vicente 2004; Leal et al. 2005; Araújo 2009; Turchetto‐Zolet et al. 2013), recent surveys portray a more accurate and realistic picture of it (Silva et al. 2018). Furthermore, recent studies have shown the role of some processes in shaping biological diversification across the Caatinga, such as the Pleistocene climatic oscillations (e.g., Gehara et al. 2017) and paleo-changes in the course of the São Francisco River (e.g., Werneck et al. 2015). Thus, studies of geographic variation within the Caatinga may further refine our understanding of the biogeographic history of this biome.

By investigating geographic song variation in a seldom-studied ecoregion and species group, we aim to expand knowledge of vocal variation in suboscines and of the diversification processes and geographic variation within the Caatinga biome. Specifically, we describe the vocal variation of the Silvery-cheeked Antshrike and investigate putative historical and environmental drivers of song variation within this biome.

Material and methods

Data acquisition and collection

We gathered 102 song recordings of S. cristatus from 14 localities in the Caatinga, covering most of the species' distribution (Supplementary Appendix I, Fig. 1). We obtained 41 recordings from field sampling under natural conditions, using a high-fidelity digital recorder (MPC-50, Sony) coupled to a directional microphone (ME-67, Sennheiser) positioned approximately 1 m from the singing bird. To minimize the possibility of repeatedly recording the same individual, we used different trails, following the protocols described by Price and Lanyon (2002) and Sosa-López et al. (2013). Since the territory of an individual (or pair) of S. cristatus is relatively small, we avoided pseudoreplication by spacing sampling points at least 300 m apart if recordings were made within a single day, or at least 500 m apart if sampling occurred on a subsequent day. A further 49 recordings were obtained from online databases (35 from Xeno-Canto, https://www.xeno-canto.org, and 14 from Wikiaves, https://www.wikiaves.com.br), and 12 recordings were provided by collaborators. We used only recordings of sufficient quality for acoustic analysis. The recordings will be available at https://www.animalsoundarchive.org.

Fig. 1 Map with the distribution of the Silvery-cheeked Antshrike recordings that were analyzed in the study

Acoustic analyses

Song spectrograms and measurements of vocal attributes were obtained with Raven Pro 1.5 (Bioacoustics Research Program 2014, The Cornell Lab of Ornithology). Before analysis, all recordings were filtered and normalized (following Zollinger et al. 2012) in Audacity (version 2.1.2), both to improve the accuracy of the analysis and to standardize songs obtained from different sources and/or recording methods. We used a Hanning spectrogram window, a time grid with an overlap of 50% and a hop size of 256 samples, an FFT size of 512, and a grid spacing of 86.1 Hz. For song measurements, we defined a note as a continuous signal in the spectrogram that is not interrupted for at least 3 ms. Furthermore, we defined a phrase as a sequence of similar repeated notes within the song. Preliminary inspections revealed that songs are typically composed of two phrases, each composed of one specific note type (Fig. 2). Based on previous studies (Price and Lanyon 2002; Dingle et al. 2008; Ortiz-Ramírez et al. 2016), we measured a total of 13 acoustic parameters: maximum and minimum song frequency (Hz), the interval between phrases (s) and, for each of the two phrases, the peak frequency (Hz), total duration (s), 90% bandwidth (Hz), number of notes (n), and emission rate (n/s).
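As a consistency check on the spectrogram settings reported above, the grid spacing follows from the sampling rate divided by the FFT size, and the hop size follows from the window length and the overlap. The sketch below assumes a 44.1 kHz sampling rate and a window length equal to the FFT size; neither value is stated in the text, and the function names are illustrative only.

```python
# Illustrative sketch, not the authors' workflow.
# ASSUMPTIONS (not stated in the text): 44.1 kHz sampling rate,
# and a window length equal to the FFT size (512 samples).

def grid_spacing_hz(sample_rate_hz: float, fft_size: int) -> float:
    """Frequency spacing between adjacent spectrogram bins."""
    return sample_rate_hz / fft_size


def hop_size_samples(window_size: int, overlap_fraction: float) -> int:
    """Number of samples the window advances between successive frames."""
    return int(round(window_size * (1.0 - overlap_fraction)))


def emission_rate(n_notes: int, duration_s: float) -> float:
    """Notes per second within a phrase (the 'emission rate', n/s)."""
    return n_notes / duration_s


SAMPLE_RATE_HZ = 44_100  # assumed
FFT_SIZE = 512           # from the text

print(round(grid_spacing_hz(SAMPLE_RATE_HZ, FFT_SIZE), 1))  # 86.1
print(hop_size_samples(FFT_SIZE, 0.5))                      # 256
```

Under these assumptions the reported values (86.1 Hz grid spacing, 256-sample hop at 50% overlap) are mutually consistent.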

Fig. 2 Spectrogram of a Silvery-cheeked Antshrike song (XC320806) showing two types of notes (i.e., two phrases, a and b) and their differences on the spectral and temporal scales

Statistical and geographic analyses

We used a Principal Component Analysis (PCA) to reduce the number of song variables and thereby eliminate potential problems associated with collinearity between variables. We performed the PCA on the correlation matrix, with all songs of all populations analyzed in the same acoustic space, and extracted the PCs with high explanatory value (eigenvalues > 1) to describe song variation in subsequent analyses. We used a discriminant function analysis (DFA) to test whether sampled locations differed in multivariate vocal space (the synthetic PCA variables). This was done to understand which localities (or groups of localities) differ in acoustic profile. Through a multivariate analysis of variance (MANOVA), as an integral part of the DFA, we tested whether geographical differences existed in song profile.
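The PCA step described above (correlation matrix, retention of components with eigenvalues > 1) can be sketched as follows. This is a minimal illustration with NumPy, not the Statistica procedure used in the study; the function name and the random placeholder data are hypothetical.

```python
import numpy as np


def pca_correlation(X):
    """PCA on the correlation matrix, keeping components with eigenvalue > 1."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable so the analysis operates on correlations.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]         # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0                      # eigenvalue > 1 criterion
    scores = Z @ eigvecs[:, keep]             # song scores on retained PCs
    explained = eigvals[keep] / eigvals.sum() # proportion of variance per PC
    return eigvals[keep], scores, explained


# Placeholder data only: 102 songs x 13 acoustic variables of random noise.
rng = np.random.default_rng(0)
synthetic_songs = rng.normal(size=(102, 13))
kept_eigvals, pc_scores, explained = pca_correlation(synthetic_songs)
print(pc_scores.shape, explained.sum())
```

The retained scores would then serve as the synthetic variables passed to the discriminant function analysis.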

As bird songs may show latitudinal variation (Isler et al. 2005; Weir and Wheatcroft 2011), we performed a linear regression of vocal attributes on latitude. Because variation in songs may be the result of environmental acoustic adaptation (Slabbekoorn and Smith 2002a, b), we tested for a correlation between vocal and environmental variation using the Mantel test. Environmental data were obtained from the 19 bioclimatic variables available in the WorldClim database v.1.4 (https://www.worldclim.org/) at a resolution of 2.5 arc-minutes. Bioclimatic data for each locality were extracted using DIVA-GIS 7.5 (https://www.diva-gis.org/). We generated an environmental dissimilarity matrix among localities based on Euclidean distance in multivariate climate space, and the same procedure was applied to the vocal variables. We used Statistica 10.0 to perform all statistical analyses.
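The Mantel procedure described above can be sketched as a permutation test on two Euclidean distance matrices. The example below is an illustration under stated assumptions, not the Statistica implementation: it uses a one-sided test with 999 permutations, does not standardize the variables (the text does not say whether this was done), and runs on random placeholder data for 14 localities. All function names are hypothetical.

```python
import math
import random


def euclidean_matrix(points):
    """Pairwise Euclidean distances between rows of `points`."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test: permute localities of dist_b jointly by row and column."""
    rng = random.Random(seed)
    n = len(dist_a)
    flat_a = upper_triangle(dist_a)
    observed = pearson(flat_a, upper_triangle(dist_b))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(flat_a, upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Placeholder data only: 14 localities, 3 vocal PCs and 19 bioclimatic variables.
data_rng = random.Random(42)
vocal = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(14)]
climate = [[data_rng.gauss(0, 1) for _ in range(19)] for _ in range(14)]
r, p = mantel(euclidean_matrix(vocal), euclidean_matrix(climate))
print(r, p)
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations; shuffling individual cells would not preserve the locality structure.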

Results

Song description and variation

Our analysis of the 102 recordings of S. cristatus revealed that songs were typically composed of repeated notes. The fundamental frequency of songs descends from a higher frequency (3.21 ± 2.62 kHz, cv = 0.11) to a progressively lower one (1.04 ± 1.65 kHz, cv = 0.26). Songs are short in duration (2.6 ± 0.53 s, cv = 0.2), varying from quick bursts of notes (min = 1.5 s) to relatively longer emissions (max = 3.77 s). In terms of pace, songs start slowly and the rate at which notes are emitted gradually increases. All songs presented two distinct phrases (a and b; Fig. 2), each characterized by one distinctive note. Phrase A starts the song and is longer (mean = 2.1 ± 0.3 s, cv = 0.1) and composed of a higher number of notes (mean = 9.2 ± 1.6, cv = 0.1) than phrase B, which is shorter (mean = 0.5 ± 0.2 s, cv = 0.5) and contains fewer notes (mean = 2.5 ± 1.1, cv = 0.4). Phrase A is higher pitched (peak frequency mean = 1.88 kHz) and exhibits wider variation (0.72–1.95 kHz) than phrase B, which is much lower pitched (peak frequency mean = 1.01 kHz) and shows narrower variation (0.53–1.06 kHz).

The variation of acoustic parameters between the sampled localities is summarized in Table 1. Songs varied mainly along three axes, which together explained 77.29% of the variation between songs (Table 2). The largest factor loadings of PC1 are related to the duration and number of notes in phrase B (lower PC1 scores imply longer B-phrases, with a higher number of notes); accordingly, we refer to this axis as B-phrase. The largest factor loadings of PC2 are related to the duration and number of notes in phrase A (lower PC2 scores imply longer A-phrases, with a higher number of notes); accordingly, we refer to this axis as A-phrase. The largest factor loadings of PC3 are related to the pause between phrases A and B (lower PC3 scores imply a shorter silence between phrases); accordingly, we refer to this axis as AB-pause. A-phrase, B-phrase, and AB-pause vary significantly across localities (one-way ANOVAs, B-Phrase: F 5.51 = 9.764, A-Phrase: F 5.51 = 25.17, AB-Pause: F 5.51 = 2.968, all P < 0.05).

Table 1 Mean ± standard deviation for each of the 13 acoustic variables within each geographic group
Table 2 Factor loadings of acoustic variables for the first three principal components

Populational vocal divergence

Songs of S. cristatus differed acoustically among localities (DFA; MANOVA, F(14.41) = 3.59, P = 0.00068; α = 0.05), which allowed populations to be differentiated. We observed a clinal trend in song variation (regression B-Phrase ~ latitude: R² = 0.42, P < 0.001), with song parameters changing gradually northward (Fig. 3). Northern songs are characterized by longer B-phrases, which are also composed of a higher number of notes (Fig. 4). This pattern illustrates how song types differ across the localities sampled within the Caatinga along a south-north gradient.

Fig. 3 Regression between PC1 and latitude, showing a clinal trend in song variation from south to north. Note the break associated with the São Francisco River (SFR) between localities south and north of the river, with the exception of the Raso da Catarina locality, which suggests that a combination of two factors explains the variation and structure of S. cristatus songs along its north-south distribution

Fig. 4 Spectrograms of Sakesphorus cristatus songs from each population. All songs decrease in frequency over the duration of the vocalization, from 2.33 ± 0.41 kHz to 3.00 ± 0.72. Populations: North of Minas Gerais (ab), Maracás/BA (cd), Boa Nova/BA (ef), Caetité/BA (gh), Chapada Diamantina south/BA (ij), Chapada Diamantina north/BA (kl), São Francisco/PE (mn), Chapada Araripe/CE (op), Middle of Ceará/CE (qr), Serra de Ibiapaba/CE (st)

Vocal and climatic divergence among populations were not significantly correlated (Mantel test: r = 0.2688, P = 0.07). However, our clustering analysis revealed a split between localities on opposite banks of the São Francisco River. Only one locality (Raso da Catarina) did not follow this pattern: its songs are more similar to those of localities in the northern region, even though it is located south of the São Francisco River (on the right bank). The DFA contrasting the regions north and south of the São Francisco River (Fig. 5) showed striking differences in song parameters of the B-phrase (F2.41 = 3.59, P = 0.001). We also observed higher intergroup song variation on the right bank than on the left bank of the São Francisco River (Fig. 6).

Fig. 5 Plot of discriminant function 1 against discriminant function 2 from the discriminant function analysis, with all localities grouped by their position relative to the São Francisco River (SFR). In general, songs from south of the SFR are easily distinguished from those from the north, with some overlap, caused mainly by the Raso da Catarina locality (southern region), whose songs are more similar to those of the northern region. Symbols represent locality affiliation

Fig. 6 Heatmap of the distance matrix of the vectors of song variation between localities. Grey bars represent geographical clustering given by vocal distance. The heatmap was produced with the online service Clustergrammer-Web Visualization (clustergrammer.readthedocs.io/clustergrammer_web.html) from the Mahalanobis distance matrix generated in the discriminant function analysis of the songs. P values that support the correlations are in the supplementary data associated with this article

To complement the explanation of the predictor variables identified in the previous analyses, we used ANCOVA to contrast two models of song variability: a more complex one that includes both clinal factors and the SFR barrier, and a simpler one that includes only clinal factors. The more complex model was preferred (see SM 3).
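One common way to contrast nested models of this kind is an F-test on the reduction in residual sum of squares when the extra term is added. The sketch below illustrates that idea only; it is an assumption about how such a comparison can be made, not a reproduction of the ANCOVA reported here. The data are synthetic and the variable names (`latitude`, `north_of_river`, `pc1`) are hypothetical.

```python
import numpy as np


def residual_ss(design, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def nested_f_statistic(y, latitude, north_of_river):
    """F statistic comparing 'latitude only' against 'latitude + river bank'."""
    n = len(y)
    ones = np.ones(n)
    reduced = np.column_stack([ones, latitude])               # clinal model
    full = np.column_stack([ones, latitude, north_of_river])  # clinal + barrier
    rss_reduced = residual_ss(reduced, y)
    rss_full = residual_ss(full, y)
    df_num = full.shape[1] - reduced.shape[1]  # extra parameters in full model
    df_den = n - full.shape[1]                 # residual degrees of freedom
    f_stat = ((rss_reduced - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, rss_reduced, rss_full


# Synthetic placeholder data: a latitudinal cline plus a step at the river.
rng = np.random.default_rng(1)
latitude = rng.uniform(-16.0, -3.0, size=60)
north_of_river = (latitude > -9.0).astype(float)  # hypothetical river position
pc1 = -0.2 * latitude - 1.0 * north_of_river + rng.normal(scale=0.5, size=60)

f_stat, rss_reduced, rss_full = nested_f_statistic(pc1, latitude, north_of_river)
print(f_stat, rss_reduced, rss_full)
```

A large F statistic relative to the F distribution with (`df_num`, `df_den`) degrees of freedom would favor the model that includes the barrier term.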

Discussion

This study described the geographic variation of song characteristics in a suboscine bird species (Silvery-cheeked Antshrike) in the Caatinga biome. To our knowledge, this is the first study of acoustic parameters in a bird endemic to the Brazilian Caatinga dry tropical forest vegetation. Various acoustic traits of the song vary in the Silvery-cheeked Antshrike, mainly in terms of song duration and complexity (number of notes in the two phrases), but also in the time lapse between these phrases. However, only attributes associated with the B-phrase showed geographically patterned variation among the populations studied.

Geographic variation

We observed a northward pattern of song variation in S. cristatus, which is consistent with models of both clinal variation and division by the São Francisco River. Two distinct vocal clusters appear to exist, separated by the São Francisco River (Fig. 5). Within the cluster from the right bank of the river, we observed larger intragroup variation. Our results agree with previous studies of other taxa that found potential diversification associated with the São Francisco River (Mares et al. 1985; Rodrigues 1996, 2003; Siedchlag et al. 2010; Werneck et al. 2012; Faria et al. 2013; Nascimento et al. 2013; Werneck et al. 2015). In addition to a potential effect of separation by the São Francisco River, S. cristatus song appears to vary gradually northward (Fig. 3), which supports the hypothesis of geographic clinal variation of the song, as shown in other studies (Isler et al. 2005; Weir and Wheatcroft 2011). Clinal variation underscores the need to sample intermediate localities when analyzing vocalizations of geographically distant populations. In fact, when song parameters vary clinally, excluding intermediate localities from the analysis may lead to the false conclusion that populations are geographically separated into distinct vocal groups. Here, both the SF River and clinal factors seem to affect the structure of S. cristatus song (see model selection results in Supplementary appendix II). Geographic differences in song are characterized by longer B-phrases in northern regions, with a higher number of notes than in their southern counterparts.

The only case that does not fit the SFR hypothesis is the song of the Raso da Catarina population. This right-margin population has songs more similar to those of its neighboring left-margin population than to those of right-margin populations located further south. This unexpected similarity could result from the geographic position of this locality, which is close to the northern region and may thus retain traits more similar to those of northern populations as long as gene flow is not completely impeded. Alternatively, the similarity to northern songs could be the result of paleo-changes in the course of the São Francisco River: Raso da Catarina formerly lay on the left margin of the river (Mabesoone 1994; Potter 1997). Further studies using historical analyses will allow these hypotheses to be distinguished.

Song variation in suboscines

Suboscines occasionally show significant song variation, as shown here and for the families Tyrannidae (Empidonax alnorum, Sedgwick 2001; Lovell and Lein 2013; and Attila spadiceus, Leger and Mountjoy 2003), Dendrocolaptidae (Xiphorhynchus fuscus, García et al. 2018), Furnariidae (Synallaxis albescens, Lindell 1998), and Thamnophilidae (Thamnophilus caerulescens, Isler et al. 2005). Vocal variation in suboscines is rarely correlated with geographic variables such as latitude (Sedgwick 2001; Isler et al. 2005) and altitude (Sedgwick 2001). For our study species, we found a significant relationship between latitude and song (Fig. 3). This clinal pattern is not easy to interpret because we found no correlation between climatic and vocal distances. Exceptions aside (e.g., Villegas et al. 2018), studies have failed to find correlations between vocal and ecological traits. Despite these congruences with the majority of suboscine studies, we emphasize that all the previous studies examined more extensive species distributions, whereas the variation we found in S. cristatus song is evident at a smaller geographical scale (Fig. 4).

Whereas the capacity for song learning is assumed to be low in suboscines, current knowledge of the role of learning holds that this process enhances variation and diversification (Lachlan and Servedio 2004; Mason et al. 2017). Although some suboscines have demonstrated the ability to learn songs (Cotingidae; Kroodsma 2004; Kroodsma et al. 2013), studies of antbirds (Thamnophilidae) have shown close concordance between vocal and genetic geographical variation, such that song structure is more probably an inherited trait than a learned one (Brumfield 2005; Isler et al. 1998, 1999, 2001, 2007; Remsen 2005). If learning is minimal (or even nonexistent), as expected in a species of the family Thamnophilidae, the finding that S. cristatus exhibits two song clusters across its distribution, superimposed on latitudinal clinal variation, suggests significant genetic variation throughout its distribution within the Caatinga biome and some degree of genetic divergence between populations south and north of the SF River. Further studies using DNA could test this hypothesis.

Considering that song variability did not correlate with environmental variables (i.e., climatic factors), the acoustic adaptation hypothesis (Morton 1975; Wiley and Richards 1978; Slabbekoorn and Smith 2002b) is of little help in explaining S. cristatus song variability, although caution is needed here because the correlation with climate was only marginally non-significant. Likewise, the social adaptation hypothesis (Payne 1978; Rothstein and Fleischer 1987; Nordby et al. 2007) could hardly explain S. cristatus song variability under the assumption that there is no social learning. One possible explanation for this unexpected geographical variation lies not in the generation of new song variants (potentially through learning), but in a stricter discrimination between song types. Suboscines have been shown to discriminate more intensely, even when population variability is relatively small (Freeman et al. 2017). If this proves to be the case in S. cristatus, regional song clusters could easily emerge from a putatively stricter song discrimination system. This possibility would stress the role of female discrimination and reduce the emphasis on the role of male song learning in speciation processes. Factors other than the development of acoustic signals, such as morphological aspects connected to song production (Pearse et al. 2018) or social interactions with closely related species (Tobias and Seddon 2009), should also be considered if passerine macroecological and macroevolutionary trends are to be fully understood.