Introduction

Cetaceans are large mammals that are widely distributed throughout the world. All species are impacted by human activities (bycatch, overexploitation of prey resources, ship-strikes, noise and chemical pollution; [1]). Commercial whaling dramatically reduced the population of the fin whale, Balaenoptera physalus (Linnaeus 1758) by over 70% during the twentieth century [2]. In 2018, global abundance was roughly estimated at 100 000 individuals [3] and B. physalus remains vulnerable on the IUCN Red List of Threatened Species. However, increases in local population sizes have been recorded for several populations since an international moratorium on commercial whaling was implemented in 1982 [3].

At the beginning of the twentieth century, the Mediterranean population was depleted and migrations between the Atlantic and the Mediterranean were reduced, due to whaling [4]. Within the Mediterranean, local estimates of abundance have been made: 3 583 individuals in the western basin [5] and 1 800 individuals in the northern part of this basin [6]. Abundance trends of B. physalus in the Mediterranean Sea are still unknown. Previous genetic analyses have suggested that the Mediterranean B. physalus population is genetically isolated, with limited gene flow with the adjacent Atlantic population [7]. To better understand and protect the B. physalus population, different parameters (e.g. abundance, effective size, gene flow) are required and some should be updated. To do this, genetic markers, such as microsatellite markers, are useful as they allow changes in genetic diversity to be identified and the structure of natural populations to be studied [8]. Furthermore, a large number of markers (> 20) allows individuals to be identified, by producing a unique and immutable identification tag for each organism [9]. Identifying individuals makes it possible to study both connectivity and parentage links between groups of individuals, which in turn may enhance the protection of the population.

Microsatellite markers have been widely used since the 1990s [8]. Microsatellites consist of short nucleotide tandem repeats with DNA sequences that are highly variable in length [8] and commonly dispersed throughout the eukaryotic nuclear genome. Microsatellite length determination generally relies on the design of a PCR-primer pair flanking the variable repeat, and is therefore often species-specific. This method is costly and time-consuming, and for these reasons, testing the transferability to related species is important and is of great interest for further study. The cross-species transferability of microsatellites is unevenly distributed across taxa. For example, microsatellites are more conserved between humans and primates [10] than between oyster species [11]. Mutations in the sequence of microsatellite markers are an important component of the variation between individuals and species [12]. Consequently, some markers may be polymorphic in one species but monomorphic in another [10, 12]. In B. physalus, some microsatellite markers were previously published [7, 13, 14, 15, 16], but most were initially developed for the humpback whale, Megaptera novaeangliae. Fewer than ten markers have been developed specifically for B. physalus, which is insufficient for accurate population structure and parentage analyses.

The aim of this study was to develop de novo microsatellites for our target species, B. physalus, in order to extend the set of available markers for the study of the Mediterranean population in future biological and conservation studies. To make the most of the time and cost invested in creating a new set of markers, we also tested the newly developed and previously published markers for cross-amplification in four other cetacean species.

Materials and methods

Biological samples

Fifty samples of our target species B. physalus were used in this study. For cross-species amplification, four cetacean species were selected to test the transferability of microsatellite markers: four minke whales, Balaenoptera acutorostrata, three M. novaeangliae, ten sperm whales, Physeter macrocephalus, and ten long-finned pilot whales, Globicephala melas. Samples from B. physalus, P. macrocephalus and G. melas come from biopsies collected by the NGO WWF, using darts fired from a crossbow, in the north-western Mediterranean Sea between 2006 and 2018. In the field, samples were stored in 70% ethanol. For B. acutorostrata, two samples come from specimens stranded along the Mauritanian coast in 2013 (CITES export permit no. 2014-001 and CITES import permit no. FR1402900008-1 provided by the designated authorities in Mauritania and France, respectively) and two others from whale meat (a steak and a sausage) bought in the fish market of Bergen (Norway) in July 2018. The samples of M. novaeangliae are biopsies taken by the NGO Cetamada near Sainte Marie Island, off the north-eastern coast of Madagascar, in 2013 (n = 1) and 2016 (n = 2).

DNA extraction and design of new microsatellite markers

Total genomic DNA was isolated from 12 B. physalus individuals using the Gentra PureGene tissue kit (Qiagen, Hilden, Germany) and sent to GenoScreen molecular marker services (Lille, France) for microsatellite library preparation and sequencing. A total of 1 µg from an equimolar DNA pool of the 12 specimens was used to prepare an Illumina TruSeq Nano library, which was sequenced on a MiSeq platform with a Nano v2 500 cycles chip. After de-multiplexing, the quality of the resulting paired-end reads was checked with FastQC v0.11.15 software. Then, the remaining data were subjected to assembly/merging with the Velvet v1.2.10 program. Finally, the assembly with the best remapping rate was analysed with the QDD v3.1 bioinformatics program [17]. QDD is designed to deal with all steps of treatment of raw sequences until PCR primers are obtained: removal of adapters/vectors, detection of microsatellites, detection of redundancy/possible mobile element association, selection of sequences with target microsatellites and primer design by using BLAST, ClustalW and Primer3 programs. The default settings used for QDD were a flanking region length of 200 bp, removal of sequences shorter than 80 bp, a minimum of 95% pairwise identity between sequences of a contig, and a minimum of 66% of sequences that must have the same base at a site to accept it as a consensus. Primers were designed with the following settings: a minimum PCR product size of 90 bp and a maximum of 300 bp, a PCR product size interval for iterative primer design of 50 bp, and an optimum primer length of 20 bp with a minimum of 18 bp and a maximum of 27 bp. Among 3 609 736 raw sequences, 82 283 merged sequences were obtained and 5 044 primer sets were designed. Based on this dataset, 33 primer pairs were selected, according to the repeat number (≥ 9), motif and PCR product size (≥ 100 bp), and tested. Additionally, 14 primer pairs previously published in [7, 13, 14, 15, 16] were also tested. Most of them were initially developed for M. novaeangliae. To identify redundant sequences, we ran the BLASTN algorithm with default values for the parameters (megablast with: word size: 28; match/mismatch scores: 1/−2; gap cost: linear; filter: low complexity regions; http://blast.ncbi.nlm.nih.gov/Blast.cgi; [18]) against the B. physalus genome assembly (GenBank accession number: GCA_008795845.1) for each of the 33 new sequences for which suitable primers were designed and for the 14 previously published sequences.
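The selection step described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the record layout, the locus labels and the example values are hypothetical and are not QDD output, and the motif criterion is left out because the text does not specify it. The 300 bp upper bound is the maximum product size of the primer design window.

```python
from dataclasses import dataclass

# Thresholds taken from the text.
MIN_REPEATS = 9        # repeat number >= 9
MIN_PRODUCT_BP = 100   # PCR product size >= 100 bp
MAX_PRODUCT_BP = 300   # upper bound of the primer design window


@dataclass(frozen=True)
class PrimerCandidate:
    name: str        # hypothetical locus label
    motif: str       # repeat unit, e.g. "AC"
    repeats: int     # number of tandem repeats
    product_bp: int  # expected PCR product size (bp)


def passes_selection(c: PrimerCandidate) -> bool:
    """Return True if a candidate meets the repeat and size criteria."""
    return (c.repeats >= MIN_REPEATS
            and MIN_PRODUCT_BP <= c.product_bp <= MAX_PRODUCT_BP)


# Hypothetical candidates, for illustration only.
candidates = [
    PrimerCandidate("locus_A", "AC", 12, 145),
    PrimerCandidate("locus_B", "GATA", 8, 210),  # too few repeats
    PrimerCandidate("locus_C", "TG", 15, 95),    # product too short
    PrimerCandidate("locus_D", "AAT", 9, 300),
]

selected = [c.name for c in candidates if passes_selection(c)]
print(selected)
```

In practice, such a filter would only shortlist candidates; each retained primer pair still has to be validated by PCR, as done here for the 33 selected pairs.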

Molecular analysis

The newly developed loci together with those which were previously published [7, 13, 14, 15, 16] were genotyped in our target species and cross-tested in four other cetacean species to verify their transferability. For the characterisation of newly developed microsatellites, DNA was extracted from small tissue fragments (< 0.5 cm³) using the Gentra PureGene tissue kit (Qiagen, Hilden, Germany).

PCR amplifications were performed using Type-it Microsatellite PCR kit (Qiagen, Hilden, Germany) in 12.5 µL total volume reactions containing 4 µL Type-it Multiplex PCR Master Mix 2X (contains HotStarTaq® Plus DNA Polymerase, Type-it Microsatellite PCR Buffer with 6 mM MgCl2 and dNTPs), 6 µL RNase-free water, 1 µL of primers (2 µM forward and reverse primers diluted in TE pH 8 buffer) and 1.5 µL of DNA template at 50 ng/µL. Forward primers were labelled with a fluorescent dye (PET, NED, VIC or 6-FAM, Applied Biosystems). Amplifications were carried out as follows: 5 min at 95 °C, followed by 40 cycles of 30 s at 95 °C, 1 min 30 s at optimal annealing temperature (53–63 °C, depending on the locus), 30 s at 72 °C and a final extension for 30 min at 60 °C. All loci were combined in six multiplexes according to their size range and primer annealing temperature to perform PCR (Table 1). Each PCR product was run through 1.5% agarose gel and visualized using ethidium bromide to verify amplifications. PCR products were sent to GenoScreen (Lille, France) and allele sizes were assessed using an Applied Biosystems 3730 Sequencer. For accurate sizing, an internal size ladder (GeneScan 500 LIZ, Applied Biosystems) was used.
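The reaction composition and cycling programme above lend themselves to a quick arithmetic check. The sketch below uses only the volumes, stock concentrations and step durations given in the text; the pooled-mix helper and its 10% pipetting overage are assumptions added for illustration and are not part of the published protocol.

```python
# Volumes per reaction in µL, as listed in the text.
VOLUMES_UL = {
    "master_mix_2x": 4.0,
    "rnase_free_water": 6.0,
    "primer_mix": 1.0,
    "dna_template": 1.5,
}
PRIMER_STOCK_UM = 2.0        # µM, forward and reverse primers in the mix
DNA_STOCK_NG_PER_UL = 50.0   # ng/µL

total_ul = sum(VOLUMES_UL.values())
final_primer_um = PRIMER_STOCK_UM * VOLUMES_UL["primer_mix"] / total_ul
dna_mass_ng = DNA_STOCK_NG_PER_UL * VOLUMES_UL["dna_template"]

# Programmed cycling time in minutes (ramping between steps is ignored).
INITIAL_MIN = 5.0
CYCLES = 40
CYCLE_MIN = 0.5 + 1.5 + 0.5  # denaturation + annealing + extension
FINAL_EXTENSION_MIN = 30.0
run_min = INITIAL_MIN + CYCLES * CYCLE_MIN + FINAL_EXTENSION_MIN


def pooled_mix(n_reactions, overage=0.10):
    """Scale the non-template components for n reactions.

    The overage fraction is an assumption, not a value from the text.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in VOLUMES_UL.items() if name != "dna_template"}


print(total_ul, round(final_primer_um, 2), dna_mass_ng, run_min)
print(pooled_mix(10))
```

The components sum to the stated 12.5 µL, which corresponds to a final concentration of 0.16 µM for each primer and 75 ng of template DNA per reaction.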

Table 1 Characterization of the 32 microsatellite markers used for this study on Balaenoptera physalus together with the seven loci also tested but that did not amplify

Data analysis

Allele sizes were scored and checked manually using GENEMAPPER software v.5 (Applied Biosystems). All ambiguous peak profiles were considered as missing data. Controls for the presence of null alleles, scoring errors and large allele dropout were performed with MICRO-CHECKER v.2.2.3 [19]. Allele frequencies, total number of alleles (Na) and private allele numbers (PA) were estimated in GenAlEx v.6.503 [20]. Observed (Ho) and expected (He) heterozygosities were estimated for each locus using GENETIX v.4.05.2 [21]. GENETIX was also used to estimate the inbreeding coefficient (FIS) and linkage disequilibrium (LD). Finally, a Principal Coordinates Analysis (PCoA) was computed in GenAlEx from the pairwise population matrix of unbiased genetic distances in order to explore genetic variance among individuals and species.
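To make the per-locus statistics concrete, the following minimal sketch computes Na, Ho, He and FIS from diploid genotypes. It is not the GENETIX or GenAlEx implementation: He is taken as the simple 1 − Σp², FIS as 1 − Ho/He, no small-sample correction is applied, and the allele sizes are hypothetical. Missing profiles are encoded as None, mirroring the treatment of ambiguous peaks as missing data.

```python
from collections import Counter


def locus_stats(genotypes):
    """Return (Na, Ho, He, Fis) for one locus.

    genotypes: list of (allele1, allele2) tuples; None marks missing data.
    He is the simple expected heterozygosity 1 - sum(p_i^2) and
    Fis = 1 - Ho/He; neither includes a small-sample correction.
    """
    scored = [g for g in genotypes if g is not None]
    if not scored:
        raise ValueError("no scored genotypes at this locus")
    n = len(scored)
    allele_counts = Counter(a for g in scored for a in g)
    freqs = [count / (2 * n) for count in allele_counts.values()]
    ho = sum(1 for a, b in scored if a != b) / n
    he = 1.0 - sum(p * p for p in freqs)
    fis = 1.0 - ho / he if he > 0 else float("nan")
    return len(allele_counts), ho, he, fis


# Hypothetical allele sizes (bp) for six individuals plus one missing profile.
genotypes = [(150, 152), (150, 150), (152, 154), None,
             (150, 154), (152, 152), (150, 152)]
na, ho, he, fis = locus_stats(genotypes)
print(na, round(ho, 3), round(he, 3), round(fis, 3))
```

A negative FIS, as in this toy example, indicates an excess of heterozygotes relative to expectation, whereas a positive value indicates a deficiency.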

Results

Marker amplification on B. physalus samples

Of the 47 primer pairs (33 newly developed and 14 previously published) tested on B. physalus in this study, 32 successfully amplified (68%). Of the fifteen markers that did not amplify, eight were newly developed and seven previously published (Table 1). Thus, half of the previously published markers amplified, among which six were initially developed for M. novaeangliae and one for B. physalus (Table 1). Of the seven previously published markers that did not amplify (Table 1), five were developed for M. novaeangliae, one for P. macrocephalus and one for B. physalus.

Genetic diversity of B. physalus

The thirty-two amplified loci were polymorphic (from three to fifteen alleles) and were kept for cross-species transferability tests. Twenty-three loci contained dinucleotide repeats, with the number of alleles ranging from three to fifteen (for a total of 202 alleles), four were trinucleotide repeats with number of alleles ranging from three to six (16 alleles in total), and five contained tetranucleotide repeats with five to fifteen alleles (42 alleles in total).

After analysis with MICRO-CHECKER, null alleles were detected in three loci (from our new set: Bp1430262, Bp747752, and Bp1003953). There was no evidence of scoring errors or large allele dropout in any other loci. Tests for linkage disequilibrium revealed that 14% of the pairwise locus combinations were linked (p < 0.05), and that these were distributed among all loci in B. physalus. The three loci with null alleles (Bp1430262, Bp747752 and Bp1003953) showed a significant probability of linkage disequilibrium (29%, 35% and 19% respectively). Due to null alleles and linkage disequilibrium, these loci were removed from further analyses in B. physalus.
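For orientation, null allele frequency is commonly estimated from the heterozygote deficit at a locus. The sketch below shows two widely used estimators, Brookfield's first estimator and the Chakraborty estimator; the text does not state which estimator was relied on here, so this is an illustration rather than a reproduction of the analysis, and the He and Ho values are hypothetical.

```python
def brookfield1(he, ho):
    """Brookfield (1996) estimator 1: r = (He - Ho) / (1 + He)."""
    return (he - ho) / (1.0 + he)


def chakraborty(he, ho):
    """Chakraborty et al. (1992) estimator: r = (He - Ho) / (He + Ho)."""
    return (he - ho) / (he + ho)


# Hypothetical heterozygosities for a locus with a heterozygote deficit.
HE_EXAMPLE = 0.80
HO_EXAMPLE = 0.60

r_brookfield = brookfield1(HE_EXAMPLE, HO_EXAMPLE)
r_chakraborty = chakraborty(HE_EXAMPLE, HO_EXAMPLE)
print(round(r_brookfield, 3), round(r_chakraborty, 3))
```

Both estimators return zero when Ho equals He and increase with the size of the heterozygote deficit.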

The 29 polymorphic loci that showed no evidence of null alleles displayed from three (Bp1311471) to fifteen alleles (Bp740180, GATA417, and GT122; Table 1) per locus, with an average of 8.3 alleles per locus. In total, for all loci, 240 alleles were detected in B. physalus. Expected heterozygosity (He) varied between 0.34 and 0.91 (mean ± standard deviation: 0.75 ± 0.13) whereas observed heterozygosity (Ho) ranged from 0.34 to 0.98 (mean ± standard deviation: 0.75 ± 0.13; Table 1). The inbreeding coefficient varied from −0.146 to 0.231, with three loci deviating significantly from Hardy-Weinberg equilibrium (HWE; loci TAA023, Bp1357166 and Bp779455) and also showing a significant probability of linkage disequilibrium (13%, 13% and 23% respectively).

Cross-species amplification

All 32 loci were also tested for amplification in four other cetacean species (M. novaeangliae, B. acutorostrata, P. macrocephalus and G. melas; Table 2). The success of the amplification varied among species, and 27 loci were amplified with success in all four species. Cross-amplification revealed a variation in polymorphism depending on the locus and the species, from one to nine alleles per locus. The lowest percentages of polymorphic loci were for B. acutorostrata and G. melas with 72% and 75%, respectively. Furthermore, the five species presented private alleles, with numbers ranging from one to thirteen. Balaenoptera physalus presented a large number of private alleles, in total 105, with an average of 3.28 per locus. Physeter macrocephalus and G. melas revealed 39 and 31 private alleles, respectively, with an average of 1.22 and 0.97 private alleles per locus, while M. novaeangliae and B. acutorostrata showed 28 and 19 private alleles, respectively, with an average of 0.88 and 0.59 per locus. The number of private alleles was dependent on the locus. Loci Bp451306 and Bp1430262 allowed the species to be distinguished, due to a large number of private alleles per species. Regarding the PCoA, the second axis separated the species into four groups (Fig. 1a). Balaenoptera physalus, P. macrocephalus and G. melas were isolated whereas M. novaeangliae and B. acutorostrata were grouped together. The third axis clearly divided the specimens into five groups (Fig. 1b), corresponding to the five species. In contrast, the PCoA with the seven previously published microsatellite markers allowed odontocetes (G. melas and P. macrocephalus) to be differentiated from mysticetes (M. novaeangliae, B. acutorostrata and B. physalus), but did not allow differentiation between the three mysticete species (Fig. 1c, d). The addition of the new set of microsatellite markers thus provides a finer resolution and allows all species to be differentiated.
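The private allele counts reported above can be illustrated with a short sketch: an allele is private when it is observed in exactly one species. The function below is a minimal illustration, not the GenAlEx routine, and the allele sizes at the example locus are hypothetical. The per-locus averages are recomputed from the totals given in the text (32 loci).

```python
from collections import Counter


def private_alleles(alleles_by_species):
    """Map each species to the sorted alleles found in that species only."""
    seen_in = Counter(a for alleles in alleles_by_species.values()
                      for a in set(alleles))
    return {species: sorted(a for a in set(alleles) if seen_in[a] == 1)
            for species, alleles in alleles_by_species.items()}


# Hypothetical allele sizes (bp) observed at one locus in each species.
locus = {
    "B. physalus": [150, 152, 154, 160],
    "M. novaeangliae": [152, 156],
    "B. acutorostrata": [152, 154],
    "P. macrocephalus": [170],
    "G. melas": [170, 172],
}
private = private_alleles(locus)
print(private)

# Totals reported in the text, averaged over the 32 loci.
N_LOCI = 32
reported_totals = {
    "B. physalus": 105,
    "P. macrocephalus": 39,
    "G. melas": 31,
    "M. novaeangliae": 28,
    "B. acutorostrata": 19,
}
averages = {sp: total / N_LOCI for sp, total in reported_totals.items()}
print({sp: round(avg, 2) for sp, avg in averages.items()})
```

Because an allele stops being private as soon as it is sampled in a second species, these counts are sensitive to the number of individuals genotyped per species.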

Table 2 Results of the cross-species amplifications using the fin-whale loci

Fig. 1 Principal coordinates analysis on 5 cetacean species genotyped with newly developed and previously published microsatellite markers (32 microsatellite loci; a and b) or only with previously published (7 microsatellite loci; c and d). a Axis 1 (16%) and 2 (11%). b Axis 1 (16%) and 3 (6%). c Axis 1 (14%) and 2 (9%). d Axis 1 (14%) and 3 (5%). B.p., Balaenoptera physalus; M.n., Megaptera novaeangliae; B.a., Balaenoptera acutorostrata; P.m., Physeter macrocephalus; G.m., Globicephala melas

Discussion

In this study, 32 microsatellite markers were tested on five cetacean species. Twenty-five were newly developed for B. physalus. The 32 loci were amplified in five species, providing a valuable tool for future studies of cetacean populations.

Of the 14 previously published markers, only seven could be amplified in this study, although all had been successfully verified on B. physalus by their authors. Locus GT142 was tested, like all microsatellite markers, at five annealing temperatures (53, 55, 57, 60 and 63 °C), but none yielded amplification. The non-amplification of this locus may be due to mutations in the primer sites in our population of B. physalus [22], and repositioning the primers could resolve it. For loci GATA053, GT271 and GT575, the non-amplification appeared when we added the fluorescent dye; for each of these loci, we tested only the temperature of its multiplex. A solution for these loci would have been to test all of the temperatures again, but at this stage the multiplexes were already established and, because we had enough loci to study the population, no further tests were performed. The last three loci that did not amplify (GATA028, EV01 and EV94) were lost during the last step, corresponding to the PCR multiplex with fluorescent dye on the primer. One reason for this may be the interaction between primers of two loci in multiplex PCR, which can lead to poor PCR sensitivity and specificity and/or the preferential amplification of non-target loci [23, 24].

The new set of microsatellite markers displayed numerous dinucleotide repeats (76%), compared to tri- (12%) and tetranucleotide repeats (12%), as also seen in other studies [14, 25]. In general, tri- and tetranucleotides are preferred to dinucleotides, since variations in the number of repeat units are easier to detect between individuals due to the larger number of base pair differences between alleles [26]. These differences remove potential biases such as stuttering and ambiguous allele sizing [27]. However, dinucleotides are also very informative since they usually exhibit a higher number of alleles and higher polymorphism. In cetaceans, dinucleotide repeats were found to be significantly more polymorphic than tetranucleotide repeats [25]. In our case, dinucleotides and tetranucleotides have the same level of polymorphism, approximately 8 alleles per locus. Twenty-nine loci were used, resulting in a total of 240 alleles in the dataset, which seems sufficiently robust to study the structure, parentage and history of the B. physalus population [28].

The heterozygosity levels were consistent with results published for B. physalus populations from other regions [29, 30]. Some of the previously published loci (EV37 [13], GATA417 [14] and GT211 [15]) displayed higher heterozygosity levels in our population (EV37: 0.9000; GATA417: 0.9800; GT211: 0.8200) than in other studies (e.g. EV37: 0.75; GATA417: 0.70; GT211: 0.64 in [31]). Only four loci presented significant heterozygote deficiency. The percentage of loci showing heterozygote deficiency in our population (14%) is lower than in other cetacean species from the same region (Stenella coeruleoalba: 33% [32]; P. macrocephalus: 25% [33]). A high level of heterozygote deficiency could be explained by significant inbreeding within the population or by a ‘Wahlund effect’ due to sampling distinct populations [34]. The level of heterozygosity is linked to the level of linkage disequilibrium (LD), and markers with low heterozygosity have been demonstrated to show less LD than those with high heterozygosity [12]. Locus TAA023 deviated significantly from HWE in our population in the Mediterranean Sea, but not in other regions (Gulf of California: [30]), contrary to loci GATA098, EV37 and GT211, which did not deviate in our population but did in other regions.

The level of LD (14%) was similar to that estimated for B. physalus in other regions (13% in [7]). This level depends on species and populations: some cetacean species show a level of LD below 2% [35], while many species show no LD [36]. Globally, cetaceans have limited LD in comparison to other species: 78% for Hula painted frog [37], 35% for chimpanzee [12].

The new set of microsatellites included three loci with null alleles. One of them presented a departure from HWE that could be explained by null alleles [38]. Bp747752 showed null alleles, a departure from HWE and linkage disequilibrium, and should be removed from all further analyses. Regarding loci Bp1430262 and Bp1003953, analyses should be performed with and without them to detect any biases that they may introduce in the results. However, the presence of null alleles is dependent on the B. physalus population: some loci were detected as likely having null alleles in the Gulf of California [30], but this was not the case in our region. The design of new primers could reveal new alleles for many of the homozygous individuals at these three loci.

The 32 microsatellites used in the present study showed a high probability of being polymorphic in different cetacean species (≥ 72%). Cetaceans have already demonstrated a success greater than 70% for cross-amplification between the sub-orders Odontoceti and Mysticeti [13]. Cetaceans, birds, and frogs are known to have microsatellites with a high degree of success with respect to cross-species amplification [39]. The successful cross-species amplification provides evidence of conserved DNA sequences bordering microsatellite regions for these taxa, confirming previous studies [40, 41]. The success or failure of cross-amplification does not depend on the taxonomic classification system [42]. However, some taxa show a low probability of having polymorphic loci through cross-amplification, such as fishes [43] and bivalves [11]. With regard to our results, the new set of loci could be useful for further studies on other cetacean species.

Furthermore, all loci revealed at least one private allele for one or several species. Some species have numerous private alleles for a single locus. Balaenoptera physalus presented the highest number of private alleles, due to the development of microsatellite markers specifically for this species. Next, P. macrocephalus and G. melas, the two species most phylogenetically distant, displayed the next highest numbers of private alleles. Identification of private alleles showed that the microsatellite markers could be useful to distinguish species. The number of private alleles is a measure of genetic distinctiveness [44]. However, a large number of private alleles could also be an artefact of low sample size. Increasing the number of samples decreases the average frequency of private alleles in the species [45]. Nonetheless, the number of private alleles for B. physalus, which has a higher sampling effort than the other species, suggests that these microsatellite markers are very informative for the study of population structure in this species.

Microsatellite markers can help to detect hybridization events when markers are numerous and informative. In the infraorder Cetacea, 62 putative hybridization events have been reported in captive and natural populations [46]. In captivity, all hybridizations occurred in Odontoceti species, and more precisely in the Delphinoidea superfamily, while in natural populations, 13 concerned Mysticeti species. Our target species B. physalus is involved in 11 hybridizations with Balaenoptera musculus. A complementary use of nuclear and mitochondrial markers makes it possible to better identify the parental species [47]. Our microsatellite markers will likely enable the detection and study of hybridization within the Balaenoptera genus.

The cross-amplification revealed some genetic differentiation between species (Fig. 1). When considering only axes 1 and 2, two species (M. novaeangliae and B. acutorostrata) are grouped together (Fig. 1a), but are separated when considering the third axis (Fig. 1b). Cross-amplification was tested on two Odontoceti species (P. macrocephalus and G. melas) and three Mysticeti species (B. physalus, B. acutorostrata and M. novaeangliae). The emergence and diversification of Mysticeti and Odontoceti occurred 35 Ma ago [48]. This divergence time did not prevent most of the loci developed for B. physalus (Mysticeti) from amplifying in Odontoceti species.

Phylogenetic studies have incorporated M. novaeangliae within the Balaenoptera genus [49]. The proximity of M. novaeangliae and B. acutorostrata with our target species is congruent with the phylogeny. The two Odontoceti species are the most distant on the PCoA, with G. melas, the most isolated, belonging to Delphinidae. Overall, for cetaceans, the cross-amplification was successful regardless of genetic distances, but the success seems to be species-dependent in other groups [50].

The success of amplification (25 de novo and seven previously published) and the high degree of polymorphism detected suggest that these newly developed microsatellite markers will be suitable for inferring genetic diversity and population and family structures. Furthermore, the cross-species transferability was successful and provides a new set of markers for four other cetacean species spread across the phylogeny, which will likely be useful for many other cetacean species. The availability of numerous microsatellite markers is critical for studying and understanding such ecologically important migratory populations.