Introduction

Hybridization is a fundamental evolutionary process that results from interbreeding between two distinct species, subspecies, or populations, which can lead to the mixing of gene pools, the disruption of locally adapted populations, and the re-characterization of evolutionarily significant units (Arnold 1992; Schwenk et al. 2008; Gholanhosseini et al. 2013). Introgressive hybridization describes the incorporation of alleles from one lineage into the gene pool of a second divergent entity through hybridization and backcrossing (Anderson 1949; Harrison and Larson 2014). Although animal hybridization in nature is relatively rare (see Coyne and Orr 2004; Mallet 2005; Vences and Wake 2007), at least 10% of avian species are known to hybridize (Grant and Grant 1992; McCarthy 2006; Aliabadian and Nijman 2007; Ottenburghs et al. 2015). This observation makes Aves a well-suited taxon for studying the evolution of reproductive isolation mechanisms that prevent species interbreeding (Arnold 1997; Mallet 2007; Price 2008; Gholanhosseini et al. 2013). Hybridization, although considered relatively rare, may in fact occur more frequently in the wild or in captivity than current estimates suggest. Moreover, introgressive hybridization has been well documented in many hybrid zones, providing evidence for the semipermeable nature of species boundaries (Harrison 1990).

There are multiple potential genetic consequences of hybridization, including the production of sterile or reduced-fitness offspring, the introgression of novel alleles, the extinction of species, and the formation of new species (Kraus and Miyamoto 1990; Arnold et al. 1991; DeMarais et al. 1992; Bullini 1994; Rieseberg et al. 1995; Rhymer and Simberloff 1996; Wolf et al. 2001; Coyne and Orr 2004; Schwarz et al. 2005; Morgan-Richards et al. 2009; Jančúchová-Lásková et al. 2015). The genetic consequences may also vary based on the level of genetic differentiation, reproductive isolation, and degree of sympatry between hybridizing species. Furthermore, genetic consequences may depend on the frequency of introgressive hybridization, which can be driven by skewed population sex ratios. Geographic isolation and environmental barriers may limit gene flow among populations and potentially decrease the impact of hybridization events (Mallet 2005; Slotte et al. 2008; Twyford and Ennos 2012). On the other hand, overlapping breeding ranges may promote introgressive hybridization among recently diverged species, particularly when mate choice is limited in one species, e.g., due to low abundance of individuals or skewed sex ratios (Hubbs principle; Hubbs 1955; Randler 2002). From an evolutionary perspective, hybrid individuals may have equal, reduced, or even greater fitness compared to their parental species. Decreased fitness of hybrids may promote the evolution of pre-zygotic mechanisms of reproductive isolation (Barton 2008). Alternatively, when hybrid individuals have equal or even higher fitness than their parental species, a stable hybrid zone may develop (Endler 1977; Barton and Hewitt 1985; Harrison 1990, 1993; Nielsen et al. 2003). Beneficial effects of hybridization were shown in a study of Darwin’s finches, in which the hybrid offspring of some crosses had higher fitness than their parents over a period of several years (Grant and Grant 1992).

Hybridization events among bird species have been reported within the Sphenisciformes (penguins). This avian order represents one of the most derived groups of living birds (Ksepka et al. 2006), originating about 60 million years ago (Slack et al. 2006) with the last common ancestor to all extant penguins appearing between 20 and 40 million years ago (Baker et al. 2006; Subramanian et al. 2013; Vianna et al. 2020). There are 18–20 extant species of penguins found throughout the southern hemisphere, ranging from Antarctica to their northernmost range at the Galápagos Islands. Several closely related species of penguins have overlapping ranges and, thus, could potentially hybridize. Divergence within the extant banded penguins (genus Spheniscus) and the crested penguins (genus Eudyptes) occurred only recently, within the last 4–6 million years (Gavryushkina et al. 2017; Cole et al. 2019; Vianna et al. 2020). Early studies based on genetic analysis (DNA-DNA hybridization; Sibley and Ahlquist 1990) suggested that Spheniscus and Eudyptes were the two penguin genera most likely to reveal hybridization events, due to their recent divergence and their geographic distributions. In fact, recent genomic analysis of all extant penguins suggests that episodes of genome-wide introgression have occurred during the evolution of both Eudyptes and Spheniscus penguins (Vianna et al. 2020). In contrast, Aptenodytes, the genus that includes the King (Aptenodytes patagonicus) and Emperor (A. forsteri) penguins, is the penguin group with the lowest probability of hybridization, given that these species represent one of the earliest divergences among the lineages of extant penguins (Baker et al. 2006).

To date, six studies have documented contemporary hybridization events in the wild within the genera Spheniscus and Eudyptes. A study by Simeone et al. (2009) provided evidence of interbreeding between Magellanic (Spheniscus magellanicus) and Humboldt (Spheniscus humboldti) penguins based on behavioral observations, the presence of adult individuals with intermediate morphological characteristics at a nesting site, and DNA evidence of a single hybrid individual (based on mitochondrial DNA restriction patterns). Hybridization studies on Eudyptes were all based solely on morphological or nesting observations, with no genetic confirmation of hybridization. A study by Napier (1968) reported a single mixed pair of an Erect-crested penguin (Eudyptes sclateri) and a Rockhopper penguin (Eudyptes chrysocome) from West Point Island, located within the Malvinas/Falkland Islands. Simpson (1985) also described a hybrid between a Rockhopper and Royal (Eudyptes schlegeli) penguin based on morphological features of a museum specimen, which was originally collected in 1957 from North Head, Macquarie Island. Woehler and Gilbert (1990) described interbreeding and mixed species pairs of Rockhopper and Macaroni (Eudyptes chrysolophus) penguins at Heard Island and Marion Island in the southern Indian Ocean, also based on morphological observations. White and Clausen (2002) reported a hybrid cross between Rockhopper and Macaroni penguins located in the Malvinas/Falkland Islands based on morphological characteristics of a single heterospecific breeding pair at a nesting site. However, neither eggs nor hybrid chicks were observed in this study and, thus, evidence of hybridization could not be confirmed. More recently, Morrison and Sagar (2014) observed a male Snares penguin (Eudyptes robustus) and a female Erect-crested penguin as a breeding pair on the North East Island of the Snares Islands, New Zealand. Specifically, they detected a heterospecific pair attending an egg at a nesting site, and then reported the presence of another egg approximately five days later. Based on the size of the eggs, Morrison and Sagar (2014) determined that the female was in fact an Erect-crested penguin. The fate of the two eggs, however, was not reported and, thus, it is not clear whether the hybrids were viable.

Here, we assessed genetic evidence for hybridization events between Magellanic (S. magellanicus) and Humboldt (S. humboldti) penguins in the wild. This study focuses on the genetic characterization of six individual penguin samples that were identified in the field as Magellanic penguins, but later identified in the lab as putative hybrids based on having mitochondrial DNA (mtDNA) cytochrome oxidase subunit I (COI) haplotypes characteristic of Humboldt penguins. These six putative hybrids were all found at the Puñihuil colony, located on Chiloé Island in southern Chile, within the overlapping range of Magellanic and Humboldt penguins (Fig. 1). To detect genetic evidence of hybridization and potential introgression of Humboldt penguin DNA in the region of sympatry, reference populations of Magellanic and Humboldt penguins were selected from the Atlantic and Pacific Oceans, respectively. Overall assessments of genetic variation in these reference populations allowed us to evaluate the utility of the mtDNA COI gene, microsatellites, DRβ1 sequences of the major histocompatibility complex (MHC), and CHD1 ZW sex-linked markers for detecting hybridization events. The use of genetic markers associated with maternal inheritance (i.e., mitochondrial and W sex-linked markers), in addition to nuclear markers with typical biparental inheritance (i.e., microsatellites and MHC), allowed us to assess the utility of these markers in determining the directionality of introgression; e.g., by detecting male versus female introgression of Humboldt penguin DNA into the Magellanic penguin gene pool.

Fig. 1 Distribution ranges of Magellanic (S. magellanicus; light gray stripes) and Humboldt (S. humboldti; dark gray) penguins during the breeding season. Reference populations of Magellanic penguins include samples from three colonies in the Atlantic Ocean (Caleta Valdés, Punta Tombo, and Cabo Dos Bahías). Reference populations of Humboldt penguins include zoo individuals with known ancestry to colonies from the northern distribution of the species in Perú. Sampled colonies of Magellanic penguins from a region of sympatry with the Humboldt penguin on the Pacific Ocean include Sebastiana, Puñihuil, and Ahuenco. BirdLife International (2018a, b) provided spatial data for species distribution ranges

The genetic markers studied provided confirmatory genetic evidence, at both the individual and population levels, of introgressive hybridization between Magellanic and Humboldt penguin species in the wild. Our results provide insights into the role of hybridization in regions of species sympatry and its potential consequences on the levels of genetic introgression, genetic diversity, and conservation of these penguin species.

Materials and methods

Sample collection and DNA extraction procedures

Blood samples from the reference population of Magellanic penguins were collected in 1998–2002 at three colonies from the Atlantic Ocean basin (Caleta Valdés, Punta Tombo, and Cabo Dos Bahías; n = 33 with 11 samples from each colony; see Fig. 1). In addition, blood samples from 23 individuals of Humboldt penguins were obtained in 2017 from the captive population at Chicago’s Brookfield Zoo (Illinois, USA). These Humboldt penguin samples included captive-born individuals with known ancestry traced back to individual penguins captured in the wild in northern Perú. The relatively isolated distribution of the ancestral colonies makes this group a proper reference sample of Humboldt penguin DNA with no genetic introgression from Magellanic penguins. We used these reference populations as “controls” of non-introgressed populations due to their disjunct geographic distributions from the area of sympatry. In addition to samples from the reference populations and the six putative hybrids, a total of 33 blood samples of Magellanic penguins were collected in 2004 at three breeding colonies (Sebastiana, Puñihuil, and Ahuenco; 11 samples per colony) from the Pacific Ocean coast, in the overlapping range of the species. In most cases, samples collected from populations of Magellanic penguins were identified morphologically based on their characteristic black throat band bordered by white feathers and the amount of pink pigmentation around the beak. We used samples from these colonies to assess potential introgression and gene flow between Magellanic and Humboldt penguins located in the region of sympatry.

Blood samples (~ 100–200 µL) from Magellanic penguins were obtained by puncture of the brachial vein and stored in Queen’s lysis buffer (0.01 M Tris, 0.01 M NaCl, 0.01 M EDTA, and 1% n-lauroylsarcosine, pH 7.5; Seutin et al. 1991). DNA extractions from blood samples were performed using standard phenol–chloroform extraction protocols followed by ethanol precipitation (Sambrook et al. 1989) or through Qiagen DNeasy Blood and Tissue DNA Extraction kits (Qiagen Inc., Valencia, CA, USA). Collection of blood from Magellanic penguins in the wild was performed with corresponding permits from Division of Fauna and Flora, Chubut Province, Argentina (Permit # 18 / 02FyFS), and the Division of Forestry (CONAF) and the Secretary of Fisheries of Chile (Permit # 3523, 2003). Samples were then transported to the United States with importation permits from the USDA-APHIS (USDA permit # 42579 to Robert C. Faucett and Alan Clark, University of Washington Burke Museum of Natural History; importation permit # 51802 to Juan L. Bouzat, Bowling Green State University). Humboldt penguin blood samples were collected by Chicago’s Brookfield Zoo veterinary personnel during routine annual animal checkups and submitted to our lab for genetic analysis.

Genetic analysis

Four types of genetic markers were analyzed to assess their potential use in detecting hybridization. These included the mtDNA COI gene (Hebert et al. 2003a, b; Bouzat et al. 2009), six microsatellite loci previously developed and tested in Spheniscus penguins (Akst et al. 2002; Bouzat et al. 2009, 2013), exon 2 of the class II DRβ1 gene of the major histocompatibility complex (MHC) (Kikkawa et al. 2005, 2009; Knaffler et al. 2012), and the sex-linked CHD1 intron marker on the Z and W chromosomes, which is commonly used for avian sex determination (Griffiths et al. 1998).

Amplification and sequencing of mtDNA

A section of the mtDNA COI gene, which is commonly used as a DNA barcode to identify species (Hebert et al. 2003a, b), was amplified using primers EM5287 (5′-CACATCAATGAGCTTGCAACTC-3′) and COI-R722 (5′-TAAACTTCAGGGTGACCAAAAAATYA-3′). These primers have previously been shown to amplify mtDNA of Spheniscus penguins (Bouzat et al. 2009, 2013). PCR amplifications were performed in 25 µL volumes containing approximately 40 ng of DNA, 1X GoTaq Flexi Buffer, 1 mM MgCl2, 0.08 mM of each dNTP, 0.4 µM of each primer, and 0.5 units of GoTaq Flexi DNA polymerase (Promega, Madison, WI, USA). The PCR amplification profile included an initial denaturing step for 3 min at 94 °C, followed by 35 cycles of 30 s at 94 °C, 30 s at 52 °C, and 30 s at 72 °C, with a final extension step of 5 min at 72 °C. PCR products were purified by ethanol precipitation and then sent for direct sequencing at the University of Chicago Comprehensive Cancer Center DNA Sequencing and Genotyping Facility.
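
The final concentrations listed above translate into pipetting volumes through the dilution relation C1·V1 = C2·V2. The following Python sketch illustrates the arithmetic for the 25 µL COI reaction. The stock concentrations (5X buffer, 25 mM MgCl2, 10 mM of each dNTP, 10 µM primers, 5 units/µL polymerase) and the 2 µL template volume are assumptions made for illustration; they are not reported in this study.

```python
# Illustrative sketch only: stock concentrations are assumed, not taken from the study.
FINAL_VOLUME_UL = 25.0

# component -> (assumed stock concentration, final concentration from the text)
COI_RECIPE = {
    "buffer (X)": (5.0, 1.0),
    "MgCl2 (mM)": (25.0, 1.0),
    "dNTP, each (mM)": (10.0, 0.08),
    "forward primer (uM)": (10.0, 0.4),
    "reverse primer (uM)": (10.0, 0.4),
}


def reagent_volumes(recipe, final_volume_ul):
    """Volume (uL) of each stock needed, from C1 * V1 = C2 * V2."""
    return {
        name: final * final_volume_ul / stock
        for name, (stock, final) in recipe.items()
    }


def water_volume(volumes, final_volume_ul, template_ul, enzyme_ul):
    """Water tops the reaction up to the final volume."""
    used = sum(volumes.values()) + template_ul + enzyme_ul
    if used > final_volume_ul:
        raise ValueError("components exceed the final reaction volume")
    return final_volume_ul - used


volumes = reagent_volumes(COI_RECIPE, FINAL_VOLUME_UL)
# 0.5 units of polymerase from an assumed 5 units/uL stock
enzyme_ul = 0.5 / 5.0
# 2 uL of template is a hypothetical volume carrying roughly 40 ng of DNA
water_ul = water_volume(volumes, FINAL_VOLUME_UL, template_ul=2.0, enzyme_ul=enzyme_ul)

for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} uL")
print(f"polymerase: {enzyme_ul:.2f} uL, water: {water_ul:.2f} uL")
```

The same function can be reused for the microsatellite, MHC, and CHD1 reactions by changing the final concentrations and the reaction volume.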

Microsatellite genotyping

Six microsatellite markers (B3-2, G2-2, H2-6, M1-2, M1-11, and G3-6) originally developed for Spheniscus penguin species (Akst et al. 2002; Bouzat et al. 2013) were used to assess potential hybridization events. These markers have previously shown variability within Magellanic and Galápagos (Spheniscus mendiculus) penguin species (Akst et al. 2002; Bouzat et al. 2009, 2013). Microsatellite amplification reactions were performed in 25 µL volumes containing approximately 40 ng of DNA, 1X GoTaq Flexi Buffer, 1.5 mM MgCl2, 0.2 mM of each dNTP, 0.4 µM of each primer, and 0.625 units of GoTaq Flexi DNA polymerase. The PCR amplification profile included an initial denaturing step for 2 min at 95 °C followed by 30 cycles of 45 s at 95 °C, 45 s at the respective annealing temperature, and 45 s at 72 °C, with a final extension step of 10 min at 72 °C. The annealing temperature was set at 51 °C for markers B3-2, G2-2, H2-6, M1-2, and M1-11, and at 53 °C for marker G3-6. All successful PCR amplification products were sent for fragment analysis at the University of Chicago.

Amplification and sequencing of MHC

PCR amplification of the DRβ1 of the penguin MHC was performed using primers Lpen.hum1F2 (5′-ACTCCTGGCACAGCCGCGTG-3′) and Lpen.hum2R (5′-TCCCCTGCAGCCCTCCCTG-3′), which were originally developed by Kikkawa et al. (2005, 2009) and have proved to be locus-specific for Spheniscus penguin MHC (Kikkawa et al. 2005, 2009; Knaffler et al. 2012). PCR amplification reactions were performed in 25 µL volumes containing approximately 40 ng of DNA, 1X of GoTaq Flexi Buffer, 1.5 mM of MgCl2, 0.2 mM of each dNTP, 0.5 µM of each primer, and 0.5 units of GoTaq Flexi DNA polymerase. PCR amplification procedures followed the protocol outlined by Knaffler et al. (2012). Direct sequencing of amplification products was initially used to classify individuals as either homozygotes or heterozygotes, based on the presence of double peaks in DNA sequence electropherograms. Individual alleles were then confirmed through cloning and sequencing of amplification products. MHC amplification products were cloned using the pGEM-T Easy Vector System (Promega Corporation, Madison, Wisconsin) following the protocol outlined by Arauco-Shapiro et al. (2020). Four to six colonies per individual were then selected for PCR amplification and direct sequencing of individual alleles. DNA sequencing of amplification products from clones was performed at the University of Chicago. This approach allowed us to confirm individual alleles for every genotype in all samples analyzed.

CHD sex-linked markers

PCR amplification of the chromo-helicase-DNA-binding protein gene (CHD1) was performed to target CHD1-Z and CHD1-W sex-linked DNA sequences. PCR was performed using primers CHD-P8 (5′-CTCCCAAGGATGAGRAAYTG-3′) and CHD-P2 (5′-TCTGCATCGCTAAATCCTTT-3′) which were originally developed by Griffiths and Tiwari (1995) and Griffiths et al. (1998) to amplify avian ZW chromosome-linked CHD1 sequences. PCR amplification reactions were performed in 20 µL volumes containing approximately 40 ng of DNA, 1X of GoTaq Flexi Buffer, 2.5 mM of MgCl2, 0.24 mM of each dNTP, 1.0 µM of each primer, and 0.5 units of GoTaq Flexi DNA polymerase. The PCR amplification profile included an initial cycle of 4 min at 94 °C, 30 s at 55 °C, and 1 min at 72 °C, followed by 33 cycles of 30 s at 92 °C, 30 s at 50 °C, and 45 s at 72 °C, with a final extension step of 5 min at 72 °C. Gel electrophoresis of amplification products was initially used to categorize individuals as either males (ZZ) or females (ZW). We chose to clone and sequence Z and W alleles for five males and five females from each group along with the six putative hybrids. CHD1 amplification products were cloned using the pGEM-T Easy Vector System. As with the MHC marker, four to six colonies per individual were used as a source of DNA for PCR amplification and direct sequencing of Z- and W-CHD1 alleles. Sequencing of amplification products from individual clones was outsourced to the University of Chicago.

Experimental design and data analysis

All sequences and genotypes were analyzed using either Geneious, version R6.1.8 (Kearse et al. 2012) or BioEdit, version 7.0.5.3 (Hall 1999). For mtDNA, Magellanic and Humboldt penguin haplotypes were compared to identify species-specific differences. Microsatellite genotypes were assigned by fragment analysis of electropherograms. We compared reference populations (Humboldt and Magellanic-Atlantic) to assess allele and genotype frequencies, and the potential presence of species-specific alleles. Similarly, MHC allele sequences and frequencies were compared among individuals both within and between species to detect informative markers of hybridization. CHD1 Z and W sequences of putative hybrids were also compared to the Magellanic and Humboldt reference populations. Since birds have a ZW sex-determination system, all female penguins (i.e., the heterogametic sex) should have only one CHD1 Z and one CHD1 W sequence, while males should have two CHD1 Z sequences. Consequently, males can have one (homozygote) or two (heterozygote) unique CHD1 Z sequences.
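
The expectation for the CHD1 marker can be written as a simple consistency check on the set of distinct sequences recovered from the clones of one individual. The sketch below is illustrative: the allele labels (e.g., "Z1", "W1") and the function name are hypothetical, and the "inconsistent" outcome is an assumption about how unexpected combinations (such as possible cloning artifacts or contamination) might be flagged.

```python
def classify_chd1(alleles):
    """Infer sex and Z zygosity from CHD1 allele labels of one individual.

    `alleles` holds hypothetical labels that start with 'Z' or 'W',
    one per sequenced clone (duplicates are expected).
    """
    unique = set(alleles)
    z = sorted(a for a in unique if a.startswith("Z"))
    w = sorted(a for a in unique if a.startswith("W"))
    if len(z) + len(w) != len(unique):
        raise ValueError("allele labels must start with 'Z' or 'W'")
    if len(z) == 1 and len(w) == 1:
        return "female (ZW)"
    if len(z) == 1 and not w:
        return "male (ZZ), homozygous"
    if len(z) == 2 and not w:
        return "male (ZZ), heterozygous"
    # Anything else does not fit a ZW system and should be re-examined.
    return "inconsistent"


print(classify_chd1(["Z1", "W1", "Z1", "W1"]))
print(classify_chd1(["Z1", "Z2", "Z1"]))
```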

Reference populations and diversity

To assess the utility of the studied markers for detecting evidence of hybridization, we compared both the putative hybrid individuals and the population of Magellanic penguins from the Pacific Ocean (in the region of sympatry with the Humboldt penguins) with the Magellanic and Humboldt reference populations. Because sample sizes differed between the Magellanic and Humboldt penguin reference populations (n = 33 versus n = 23, respectively), we developed rarefaction curves to assess the potential effect of sample size on the number of alleles detected at the MHC and microsatellite markers. Rarefaction curves showed that 25 random subsamples of 23 Magellanic penguins drawn from the reference population revealed an average of 46 alleles (range 41–53) across all six microsatellite loci, and an average of 17 MHC alleles (range 11–20). Rarefaction curves indicated that the higher variation in Magellanic penguins compared to Humboldt penguins did not result from differences in sample size (see Supplementary Material, Figure S1).
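
The subsampling step behind such a rarefaction analysis can be sketched as follows: draw a fixed number of individuals at random without replacement, count the distinct alleles across loci, and repeat. This is a minimal Python illustration, not the procedure actually used in the study; the toy genotypes, function names, and random seed are hypothetical.

```python
import random


def count_alleles(genotypes):
    """Total number of distinct alleles, summed across loci."""
    per_locus = {}
    for individual in genotypes:
        for locus, alleles in individual.items():
            per_locus.setdefault(locus, set()).update(alleles)
    return sum(len(alleles) for alleles in per_locus.values())


def rarefy(genotypes, subsample_size, n_draws=25, seed=0):
    """Mean, minimum, and maximum allele count over random subsamples."""
    rng = random.Random(seed)
    counts = [
        count_alleles(rng.sample(genotypes, subsample_size))
        for _ in range(n_draws)
    ]
    return sum(counts) / len(counts), min(counts), max(counts)


# Hypothetical diploid genotypes for 33 individuals at two loci
# (allele sizes are invented for illustration).
toy_genotypes = [
    {"B3-2": (100 + i % 7, 100 + (i * 3) % 7), "G2-2": (200, 200 + i % 2)}
    for i in range(33)
]

mean_count, lowest, highest = rarefy(toy_genotypes, subsample_size=23)
print(f"mean {mean_count:.1f} alleles (range {lowest}-{highest})")
```

Setting the subsample size to the smaller reference sample (here 23) puts the two species on an equal footing before allele counts are compared.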

Evaluation of putative hybrids

A genetic marker was characterized as an informative marker if it exhibited species-specific haplotypes and/or alleles that allowed tracing back the ancestry of hybrid individuals to either Magellanic or Humboldt penguins. We compared the six focal samples identified as putative hybrids and the potentially introgressed population of Magellanic penguins (from the zone of sympatry with Humboldt penguins in the Pacific Ocean) to the reference populations. The reference populations provided non-introgressed “controls” for the identification of species-specific alleles/haplotypes for each genetic marker analyzed. Occasionally, alleles that are at high frequencies in one species but are rare in the other species could also provide potential evidence of hybridization. For mtDNA, species-specific haplotypes identified in “control” reference populations would provide direct evidence of hybridization events (e.g., by detecting Humboldt mtDNA haplotypes in morphologically characterized Magellanic penguins). At the population level, an introgressed population would reveal presence of both Magellanic and Humboldt mtDNA species-specific haplotypes. Due to the maternal inheritance of mtDNA haplotypes, the presence of Humboldt mtDNA in a Magellanic penguin population would suggest the directionality of initial introgression by the breeding of a female Humboldt penguin with a male Magellanic penguin. The utility of nuclear markers (i.e., microsatellites and MHC) in determining hybridization events depends on the presence of ancestral polymorphisms shared by both species versus unique alleles, which may give species-specific signals of genetic ancestry. For example, F1 hybrid individuals should have genotypes with species-specific alleles from each parental species. Similarly, Magellanic penguin populations with Humboldt penguin introgression should show presence of alleles with both Magellanic and Humboldt penguin ancestry. 
ZW sex-linked markers may also suggest the directionality of hybridization, since W chromosomes are maternally inherited. Magellanic penguin populations should show only Z species-specific alleles from Humboldt penguins if the initial introgression resulted solely from the hybridization of male Humboldt with female Magellanic penguins. On the other hand, introgression through a female Humboldt would result in both Z and W Humboldt haplotypes and, thus, may not exclude contribution of DNA from male Humboldt penguins through the Z chromosome.
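
The notion of an informative marker described above reduces to set operations on the alleles observed in each reference population. The sketch below is a minimal Python illustration under stated assumptions: allele sizes are invented, the function names are hypothetical, and "species-specific" only means "not observed in the other reference sample", so the labels depend on how thoroughly each reference population was sampled.

```python
def species_specific(ref_magellanic, ref_humboldt):
    """Split alleles at one locus into private and shared sets."""
    mag, hum = set(ref_magellanic), set(ref_humboldt)
    return mag - hum, hum - mag, mag & hum


def trace_ancestry(genotype, private_magellanic, private_humboldt):
    """Label each allele of a genotype by the reference it is private to."""
    labels = []
    for allele in genotype:
        if allele in private_magellanic:
            labels.append("Magellanic-specific")
        elif allele in private_humboldt:
            labels.append("Humboldt-specific")
        else:
            labels.append("shared or novel")
    return labels


# Hypothetical allele sizes at a single microsatellite locus.
private_mag, private_hum, shared = species_specific(
    ref_magellanic=[150, 152, 154, 160],
    ref_humboldt=[154, 158],
)

# A genotype carrying one private allele from each species is the
# signal expected at an informative locus in an admixed individual.
print(trace_ancestry((152, 158), private_mag, private_hum))
```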

Detection of admixture at population level

In addition to the comparative analysis of individual genotypes/haplotypes, we performed a cluster analysis using Structure, version 2.3.4 (Pritchard et al. 2000), to characterize reference populations and identify potentially admixed individuals (i.e., F1 hybrid individuals or backcrossed hybrids beyond F1). Structure uses multi-locus genotype data (based on non-linked loci) and Bayesian model-based clustering methods for defining population structure, characterizing distinct genetic populations, and identifying admixed individuals within sample groups (Pritchard et al. 2000). Prior to running Structure on the MHC and microsatellite genotypes, we eliminated microsatellite locus M1-2 since linkage analysis in GENEPOP (Raymond and Rousset 1995; Rousset 2008) revealed significant disequilibrium between loci M1-2 and G3-6 across all populations (P ≤ 0.048). GENEPOP detected no deviations from Hardy–Weinberg equilibrium in either reference population, but did show that M1-2 and G3-6 deviated from HWE (P ≤ 0.011) at the Puñihuil colony in the Magellanic population from the Pacific Ocean (i.e., the colony from which the putative hybrids were sampled). For the initial Structure analysis of the reference populations, we performed independent analyses of the MHC locus and microsatellite loci using the location prior option to specify Magellanic versus Humboldt reference samples. Since the independent runs (MHC vs microsatellites) provided results consistent with that found when the loci were combined and run with no priors, we only present here the combined analysis of MHC DRβ1 and microsatellites (again, excluding the M1-2 locus). Structure analyses were done using the admixture ancestry model (with no priors), independent allele frequencies model, a burn-in period of 10,000 iterations, and 10,000 iterations to compute posterior probabilities for identifying the number of distinct clusters (K) estimated from the data. 
We performed 10 replicate runs for each K (for K = 1 to 5) to verify the consistency of our estimates between runs. At the inferred K, we observed that estimates of α varied little throughout runs and individual assignments (Q) were consistent across runs. Likewise, increasing the number of iterations to 100,000 did not change the inferred K or significantly change assignment values. We first performed a Structure analysis including only the reference populations to assess the ability of the selected markers to assign individuals to either Magellanic or Humboldt penguin controls. We then added the six potential hybrid samples to determine their assignment probabilities to the reference populations and quantify their potential admixture. Finally, a global Structure analysis that also included Magellanic individuals from the Pacific was used to assess levels of possible admixture in the region of sympatry. The online program Structure Harvester (Earl and vonHoldt 2012) was used to determine the optimal K from the data using the ΔK method (Evanno et al. 2005) (see Supplementary Material, Figures S2.1–S2.3). Lastly, we used the FullSearch algorithm within CLUMPP, version 1.1.2 (Jakobsson and Rosenberg 2007) to permute the ten runs at the inferred K to determine the optimal alignment of clusters and calculate Q across replicates.
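
The ΔK statistic of Evanno et al. (2005) compares the second-order rate of change of the log-likelihood, |L(K+1) − 2L(K) + L(K−1)|, with the standard deviation of L(K) across replicate runs. The Python sketch below shows one common way to compute it, using the mean Ln P(D) across replicates; it is not the Structure Harvester code, and the log-likelihood values are invented for illustration.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} from {K: [Ln P(D) of each replicate run]}.

    Delta K is undefined for the smallest and largest K tested, because
    it needs the likelihoods at both K - 1 and K + 1.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # ratio undefined when replicates are identical
        second_order = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_order / sd
    return result


# Hypothetical Ln P(D) values for three replicate runs at K = 1 to 5.
toy_lnp = {
    1: [-3000.0, -3001.0, -2999.0],
    2: [-2000.0, -2002.0, -1998.0],
    3: [-1990.0, -1980.0, -2000.0],
    4: [-1985.0, -1995.0, -1975.0],
    5: [-1980.0, -2000.0, -1960.0],
}

dk = delta_k(toy_lnp)
best_k = max(dk, key=dk.get)
print(f"K with the largest delta K: {best_k}")
```

Because ΔK cannot be evaluated at the smallest K tested, K = 1 has to be judged from the raw likelihoods rather than from this statistic.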

Determination of hybrid or admixed individuals

Based on the genetic analysis, individuals were identified as hybrids (admixed) when they either exhibited a combination of species-specific haplotypes/alleles from both species or had a much lower Structure assignment probability than the average assignment of samples to their reference populations. For individuals with Magellanic-specific mtDNA haplotypes, given the high probability of Structure assignments to their corresponding species (mean Q = 0.99; see "Results"), we used a conservative threshold value of Q < 0.80 to identify individuals with potential Humboldt penguin ancestry in the nuclear markers. Any individuals identified in this way were then evaluated for the presence of species-specific alleles that would confirm mixed ancestry across the studied markers.
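
The two criteria above can be expressed as a small decision rule. The Python sketch below is illustrative only: the function and argument names are hypothetical, the Q values in the example are invented, and the only value taken from the text is the 0.80 threshold.

```python
Q_THRESHOLD = 0.80  # conservative cut-off stated in the text


def flag_admixed(q_own_species, has_magellanic_specific, has_humboldt_specific,
                 threshold=Q_THRESHOLD):
    """Return the reasons (possibly none) for flagging an individual.

    q_own_species: Structure assignment (Q) to the cluster of the species
        the individual was assigned to a priori.
    has_*_specific: whether any species-specific allele or haplotype of
        that species was detected across the markers.
    """
    reasons = []
    if has_magellanic_specific and has_humboldt_specific:
        reasons.append("species-specific alleles from both species")
    if q_own_species < threshold:
        reasons.append("Q below threshold")
    return reasons


# Hypothetical individuals: (Q, Magellanic-specific?, Humboldt-specific?)
examples = {
    "ind_A": (0.99, True, False),
    "ind_B": (0.62, True, False),
    "ind_C": (0.95, True, True),
}
for name, args in examples.items():
    print(name, flag_admixed(*args) or "not flagged")
```

An individual flagged only by the Q threshold would still need the follow-up check for species-specific alleles before being treated as admixed.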

Results

The genetic characterization of Magellanic and Humboldt penguin reference populations allowed us to assess the usefulness of the studied genetic markers for detecting hybridization. With the exception of the ZW sex-linked marker, all genetic markers (COI, microsatellites, and MHC) revealed polymorphisms both within and among species, including species-specific haplotypes/alleles that could be used to provide evidence for hybridization.

Analysis of mtDNA COI sequences from Magellanic and Humboldt penguin reference populations revealed haplotypes that were not shared between the two species. Although COI haplotype differences within species were small (one to three base pairs), all haplotypes were detected in multiple individuals and/or confirmed through multiple sequencing runs (see Supplementary Material, Table S1 for frequency distribution of haplotypes). The detection of species-specific COI haplotypes is consistent with data previously reported in GenBank, supporting the use of this marker as a molecular barcode for species identification (Hebert et al. 2003a, b). Magellanic penguin reference samples revealed three distinct COI haplotypes that were not detected within the Humboldt penguin species (GenBank accession numbers: MN565806, MN565807, MN830876). Similarly, samples of Humboldt penguins revealed three distinct haplotypes that were not present in Magellanic penguin species (GenBank accession numbers: MN830877-MN830879). The characteristic haplotypes from Magellanic and Humboldt penguins differed by fourteen fixed base pair differences (see Supplementary Material, Figures S3 and S4). Analysis of all six putative hybrid individuals revealed a single mtDNA haplotype that was identical to the most common Humboldt penguin haplotype (Shum_01), thus providing evidence of likely hybridization between the two species (Table 1). In contrast, analysis of the Magellanic penguin samples from the Pacific Ocean colonies revealed three COI haplotypes, two of which were species-specific to the Magellanic penguin reference population and one additional haplotype (GenBank accession number: MN565808) that was not found in either reference population but differed by only one base pair from the most frequent haplotype in the Magellanic reference population (see Supplementary Material, Table S1 and Figure S4).

Table 1 Genotype and haplotype characterization of six putative hybrids between Magellanic and Humboldt penguins and an admixed individual detected by the Structure analysis

Analysis of microsatellite markers showed significant differentiation in the number and frequencies of alleles detected in each species (see Supplementary Material, Tables S2.1–S2.6 for frequency distribution of alleles). In the Magellanic penguin reference population (n = 33), the analysis of six loci revealed 53 alleles within the three colonies assessed. Of the 53 alleles described, 44 alleles were species-specific; i.e., they were only found in the reference population of Magellanic penguins and not in Humboldt penguins. When including the reference population of Magellanic penguins and those from the Pacific (n = 66), we detected a total of 63 alleles. On the other hand, Humboldt penguins (n = 23) revealed 16 alleles, seven of which were specific to this species when compared to the Magellanic reference population. In the Magellanic penguins, the highest variation was found at loci B3-2, G3-6, and M1-2, which revealed an average of 15 alleles per locus (range 8–19). The remaining loci (G2-2, M1-11, and H2-6) were less polymorphic, showing an average of three alleles per locus (range 1–6). Humboldt penguin samples had lower diversity at most microsatellite loci, revealing an average of three alleles per locus (range 1–7). Twenty-six microsatellite alleles were detected within the six putative hybrid samples; fifteen of these were species-specific to the Magellanic penguin reference population, whereas four alleles were species-specific to the Humboldt penguin. In addition, one of the putative hybrid individuals (PBC1) had an allele that was unique (i.e., was not detected in any other Magellanic or Humboldt individuals).

Analysis of the MHC class II DRβ1 marker revealed a total of 39 distinct alleles found across all penguin samples analyzed (GenBank accession numbers: MN565812-MN565826, MN830880-MN830905). Twenty-two of these 39 alleles were species-specific to the Magellanic penguin reference population, whereas six were species-specific to the Humboldt penguin (see Supplementary Material, Table S3 and Figure S5). Overall, Magellanic and Humboldt penguin reference populations shared two MHC alleles. Two of the six putative hybrid individuals (PBA1 and PBA2) revealed alleles that were species-specific to the Humboldt penguin, while one individual (PB02) revealed a distinct allele that was not found in any other penguin. The remaining putative hybrids showed MHC alleles that were mostly shared between the two species (Table 1; see Supplementary Material, Table S3 for MHC DRβ1 frequency distributions). An additional eight alleles were observed only within Magellanic individuals from the Pacific.

Analysis of the CHD1 sex-linked marker revealed three Z and three W alleles within the Magellanic penguin reference samples and those from the Pacific (GenBank accession numbers: MN830906-MN830908, MN830910-MN830912). On the other hand, the Humboldt samples revealed a single Z allele and a single W allele (GenBank accession numbers: MN830909, MN830913). In both cases, the monomorphic variant from the Humboldt penguin was identical to the most common Z and W alleles found within the Magellanic penguin species (see Supplementary Material, Table S4 and Figures S6.1–S6.2). All six putative hybrid individuals also had a single Z and a single W allele corresponding to the most frequent allele found across both species. This result ruled out CHD1 as an informative marker for detecting hybridization between Magellanic and Humboldt penguins: without Humboldt species-specific alleles, the contributions of Humboldt and Magellanic penguin DNA to the genomes of hybrid individuals could not be distinguished.

Our results showed that three of the four markers (COI, microsatellites, and MHC DRβ1) were informative because they provided both Magellanic and Humboldt species-specific alleles or haplotypes that could be used to trace species ancestry in hybrid individuals. The analysis of these markers determined that these individuals did not represent F1 hybrids, as evidence for this would require the presence of alleles from both species at all nuclear loci (i.e., microsatellites and MHC). The putative hybrid individuals, however, did provide positive evidence of genetic introgression of Humboldt penguin DNA through backcrossing events (Table 1). The detection of Magellanic penguin species-specific alleles at microsatellites and the MHC in four of the six individuals with characteristic Humboldt COI mtDNA provides positive evidence of introgressive hybridization (Table 1). These four individuals showed evidence of hybridization in at least four of the seven nuclear markers genotyped. In two of the putative hybrids, however, all alleles or haplotypes were either Humboldt species-specific or shared (i.e., none were specific to the Magellanic reference population) (see Table 1). Therefore, we cannot rule out the possibility that these two individuals are Humboldt penguins (rather than hybrids), which were likely misidentified during the collection process at the Puñihuil colony. The population analysis revealed some Humboldt species-specific (and/or high frequency) microsatellite and MHC alleles in the Magellanic penguin population from the Pacific Ocean basin, which is also consistent with Humboldt penguin genetic introgression.
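The F1 criterion described above can be sketched as a simple per-locus check. This is an illustrative simplification and not the authors' procedure: it assumes diploid genotypes and flags a locus as incompatible with F1 status when both alleles are specific to the same species (shared alleles are treated as uninformative). The locus names, allele values, and the function name `f1_incompatible_loci` are all hypothetical.

```python
def f1_incompatible_loci(genotype, mag_specific, hum_specific):
    """List loci where both alleles are specific to the same species.

    An F1 hybrid carries one allele from each parental species at every
    nuclear locus, so any locus returned here argues against F1 status.
    """
    incompatible = []
    for locus, (a1, a2) in genotype.items():
        both_mag = a1 in mag_specific[locus] and a2 in mag_specific[locus]
        both_hum = a1 in hum_specific[locus] and a2 in hum_specific[locus]
        if both_mag or both_hum:
            incompatible.append(locus)
    return incompatible

# Hypothetical species-specific alleles per locus; NOT the study's data.
mag_specific = {"locus1": {150, 154}, "locus2": {200}}
hum_specific = {"locus1": {160}, "locus2": {210}}

# A backcross-like genotype: two Magellanic-specific alleles at locus1.
backcross = {"locus1": (150, 154), "locus2": (200, 210)}
# An F1-like genotype: one allele from each species at every locus.
f1_like = {"locus1": (150, 160), "locus2": (200, 210)}

backcross_result = f1_incompatible_loci(backcross, mag_specific, hum_specific)
f1_result = f1_incompatible_loci(f1_like, mag_specific, hum_specific)
```

In this toy example the backcross-like genotype is flagged at `locus1`, whereas the F1-like genotype returns an empty list.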

The Structure analyses validated the use of microsatellites and MHC as informative markers for the potential detection of hybridization and quantification of admixture. All ΔK analyses showed that K = 2 was the optimal number of clusters estimated from the data, which always represented groups corresponding to the Magellanic and Humboldt penguins (see Supplementary Material, Figures S2.1–S2.3). The reference populations revealed that all individuals had high probabilities of assignment to the corresponding species (mean Q = 0.990; Fig. 2a). Although one Magellanic individual had only a 0.823 probability of assignment to the Magellanic reference population, this result appears to be driven by that individual being homozygous for an allele fixed in the Humboldt reference population. However, when location priors were applied (data not shown), this individual's assignment probability reached 0.95.
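ΔK is usually computed from the second-order rate of change of the log-likelihood across successive values of K, scaled by the standard deviation among replicate runs at that K. The sketch below shows one common way of computing it from mean log-likelihoods; the log-likelihood values and the function name `delta_k` are hypothetical and do not reproduce the study's Structure output.

```python
from statistics import mean, stdev

def delta_k(lnp):
    """Compute ΔK from replicate log-likelihoods.

    lnp maps K -> list of log-likelihoods from replicate runs.
    Returns {K: ΔK} for every K that has both neighbours K-1 and K+1.
    Assumes the replicates at each K are not all identical (nonzero SD).
    """
    m = {k: mean(v) for k, v in lnp.items()}
    result = {}
    for k in sorted(lnp):
        if k - 1 in m and k + 1 in m:
            second_diff = abs(m[k + 1] - 2 * m[k] + m[k - 1])
            result[k] = second_diff / stdev(lnp[k])
    return result

# Hypothetical log-likelihoods for three replicate runs per K.
runs = {
    1: [-1000, -1002, -998],
    2: [-800, -801, -799],
    3: [-790, -795, -785],
    4: [-785, -800, -770],
}

dk = delta_k(runs)
best_k = max(dk, key=dk.get)
```

With these made-up values the largest ΔK falls at K = 2, the pattern expected when two well-differentiated groups dominate the data.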

Fig. 2

Structure analysis of microsatellite and MHC loci of Magellanic and Humboldt penguin samples. Vertical bars represent individuals with assignment probabilities (Q, y-axis) to the Magellanic (gray) and Humboldt (white) populations. The gray and white horizontal bars below the assignment probabilities represent the corresponding species-specific mitochondrial haplotypes of those individuals as determined by COI sequences. Asterisks (*) indicate individuals of hybrid origin that were identified by this study. (a) Structure analysis (K = 2) for Magellanic and Humboldt reference populations. (b) Structure analysis (K = 2) for Magellanic and Humboldt reference populations and Putative Hybrids (PH). (c) Structure analysis (K = 2) for Magellanic and Humboldt reference populations, Putative Hybrids (PH), and the potentially introgressed population of Magellanic penguins from the Pacific Ocean.

Consistent with the results from our comparative analysis, the assignment of the six putative hybrids revealed that two individuals were most likely Humboldt penguins (rather than hybrids), as they had a 0.995 probability of assignment to the Humboldt penguin reference population (Fig. 2b). On the other hand, the other four samples represented admixed individuals with some degree of genetic introgression of Humboldt penguin DNA. This was confirmed by the presence of Humboldt mtDNA in individuals with relatively high Magellanic penguin genetic assignment probabilities based on nuclear markers (0.913–0.990 assignment as Magellanic). The Structure assignment of these samples showed that the two putative hybrid individuals with the largest admixture had 0.087 (individual PB01) and 0.062 (individual PBC1) probabilities of assignment to the Humboldt penguin reference population, indicative of limited genomic introgression of Humboldt penguin DNA (Fig. 2b). As in the assessment of individual genotypes (Table 1), the Structure analysis showed that none of the putative hybrids represented F1 hybrids (Fig. 2b). The Structure assignment of the sympatric population of Magellanic penguins in the Pacific Ocean revealed very low levels of genetic introgression at the population level, as the average probability of assignment of all Pacific samples (including the four confirmed hybrids) to the Magellanic penguin species was 0.974 (Fig. 2c). However, one Magellanic individual from the Pacific showed high nuclear introgression of Humboldt penguin DNA (with a 0.362 probability of assignment to Humboldt penguins). Analysis of individual markers showed that this individual (PBD2) from Puñihuil had Humboldt-specific or high-frequency alleles at two microsatellite loci and the MHC (Table 1).

Discussion

We found genetic evidence confirming hybridization events between Magellanic and Humboldt penguins in the wild. Our results showed that three of the four genetic markers analyzed (COI, microsatellites, and MHC DRβ1) were useful to assess hybridization because they provided species-specific alleles or haplotypes that could trace species ancestry in admixed individuals (e.g., F1 hybrids or backcrossed hybrids beyond F1). We provided genetic evidence confirming introgressive hybridization in four out of six putative hybrid samples that were identified as Magellanic penguins but had characteristic Humboldt mtDNA sequences. The remaining two individuals revealed both mitochondrial and nuclear marker variants that confirmed they were in fact Humboldt penguins, rather than hybrids.

Unfortunately, the genetic analysis of CHD1 sex-linked markers did not reveal Humboldt species-specific alleles. Therefore, no major inferences could be made regarding the directionality of introgression with CHD1 markers. However, the skewed sex ratio with more males than females likely favors male Magellanic penguins mating with female Humboldt penguins (Boersma et al. 2013; Gownaris and Boersma 2019). The analysis of the mtDNA COI gene provided information about maternal introgression of Humboldt penguin DNA into the sympatric Magellanic penguin population from the Pacific Ocean. It is therefore clear that hybrid crosses resulting from mating between male Magellanic and female Humboldt penguins have contributed to the genetic introgression of Humboldt mtDNA into Magellanic penguin populations. The analysis of the relative contribution of maternal versus paternal introgression of Humboldt DNA was limited by the lack of species-specific variation in the ZW CHD1 sex-linked marker and by the design of our study, which did not include Humboldt penguin samples from the region of sympatry. However, the ratio of Humboldt to Magellanic penguins at Puñihuil, the breeding colony where we detected introgressive hybridization, is heavily skewed (approximately 1:7) due to the low abundance of Humboldt penguins at the edge of their breeding range (see Simeone et al. 2009; Vianna et al. 2014). Likewise, long-term studies of Magellanic penguin breeding colonies from the Atlantic Ocean have reported male-biased adult sex ratios (Boersma et al. 2013; Gownaris and Boersma 2019; Gownaris et al. 2020). These observations suggest that our mtDNA results are consistent with the directionality of introgression predicted by the Hubbs principle (Hubbs 1955).

The Structure analysis confirmed that the microsatellites and the MHC DRβ1 loci are informative markers of hybridization. The analysis of reference populations provided a mean assignment probability of 0.990 to either Magellanic or Humboldt penguin species, indicating that the levels of variation in these markers allowed the correct assignment of samples to either species. Consistent with the individual genetic profile of the putative hybrid individuals at the mitochondrial and nuclear markers, the Structure analysis confirmed that four of the six samples represented admixed individuals with varying levels of Humboldt nuclear DNA introgression (0.010–0.087). Although these admixed individuals had Humboldt species-specific mtDNA haplotypes, they all had multiple nuclear loci with Magellanic alleles, together indicating that these individuals resulted from hybrid backcrosses.

The Structure analysis that included the Magellanic penguin population from the Pacific Ocean (i.e., the population from the region of sympatry with Humboldt penguins and where the putative hybrid samples were collected) allowed detection of one additional hybrid individual, which showed a higher level of introgression of Humboldt nuclear DNA (0.362). This individual (PBD2) had a characteristic Magellanic mtDNA haplotype but Humboldt-specific or high-frequency alleles at two microsatellite loci and the MHC, which also suggests admixture from hybrid backcrosses (see Table 1). At the population level, assignment probabilities indicated that the Magellanic penguin sympatric population revealed very limited Humboldt penguin introgression, with only five out of the 37 individuals analyzed showing some signs of past hybridization events. Structure assignment probabilities indicated an average of 9.2% Humboldt nuclear DNA in the five hybrid individuals, but only 2.6% when averaged across the whole population. It is worth noting that the five hybrid samples all come from the Puñihuil colony, which has a significant number of intermixed nesting sites of Magellanic and Humboldt penguins (Simeone et al. 2009; Bouzat and Boersma, personal observations), and thus has high potential for inter-species breeding interactions.
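The relationship between the per-hybrid and population-wide averages can be made explicit with a short worked calculation. This is a sketch that assumes the 2.6% population average was taken over the same 37 individuals mentioned above; the variable names are illustrative, and the value derived for the non-hybrid individuals is an implication of the reported figures, not a result reported by the study.

```python
# Figures reported in the text.
n_total = 37            # Pacific individuals analyzed
n_hybrids = 5           # individuals showing signs of past hybridization
mean_hybrids = 0.092    # mean Humboldt assignment among the five hybrids
mean_population = 0.026 # mean Humboldt assignment across all individuals

# Share of the sampled population showing admixture.
hybrid_fraction = n_hybrids / n_total

# Summed Humboldt assignment across everyone, and the part due to hybrids.
total_humboldt = mean_population * n_total
hybrid_share = mean_hybrids * n_hybrids

# Implied mean Humboldt assignment among the remaining individuals
# (assumes the 2.6% figure averages over all 37 individuals).
mean_others = (total_humboldt - hybrid_share) / (n_total - n_hybrids)
```

Under these assumptions, the five hybrids make up about 13.5% of the sample, and the remaining 32 individuals would average roughly 1.6% Humboldt assignment, a small background value of similar magnitude to the residual uncertainty in the reference populations (mean Q = 0.990).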

The ability to successfully detect hybridization events in the wild depends on whether genetic markers are informative of species ancestry, which in turn may depend on the species-specific variation detected. In this study, we assessed marker variation in reference populations of Magellanic and Humboldt penguins. Although the Magellanic reference population was sampled from breeding colonies on the Patagonian shores of the Atlantic Ocean, the Humboldt reference population was collected from captive individuals with ancestry to allopatric colonies in the northern range of the species (in Perú). One may argue that a sample of captive individuals may harbor limited variation due to potential inbreeding and isolation, and thus may affect assignment probabilities and the ability to detect admixed individuals in the wild. However, this was not an issue in our study, as the variation detected provided enough power to correctly assign with high probability (0.99) any reference sample to a single species in the Structure analysis. In a way, the characterization of two of the putative hybrids as misidentified Humboldt penguins could be considered as a “blind control” for the efficacy of the Structure analysis and the studied genetic markers in the assignment of unknown samples to the studied species. Furthermore, one would expect that additional samples from the wild would provide higher levels of variation in the reference population (Schlosser et al. 2009; Dantas et al. 2019), higher probability of detecting species-specific markers, and thus higher power to detect potential hybridization events.

The process of hybridization between closely related species plays an important role in defining the genetic integrity and overall genetic diversity of species (Grant and Grant 1992). Evolutionary theory predicts that closely related species have a higher probability of interbreeding and hybridizing with one another because they are less likely to have evolved reproductive isolation mechanisms (Price 2008). Therefore, one would expect that genetic divergence between species might be associated with their hybridization potential, particularly in recently evolved taxa such as Spheniscus. Speciation of extant species within the Spheniscus genus was relatively recent; i.e., diversification amongst species within this group occurred within the last four million years (Baker et al. 2006; Gavryushkina et al. 2017). Simeone et al. (2009) suggested that the current overlapping geographic distribution between Magellanic and Humboldt penguins represents a region of secondary contact, rather than an ancestral geographic area where the two species diverged. This is consistent with phylogenetic studies suggesting that Magellanic and Humboldt penguins are part of two independent evolutionary lineages (Grant et al. 1994; Thumser and Karron 1994; Thumser et al. 1996), with Magellanic penguins being more closely related to African penguins (S. demersus), and Humboldt penguins sharing ancestry with Galápagos penguins (S. mendiculus) (Bertelli and Giannini 2005; Baker et al. 2006; Ksepka et al. 2006; Gavryushkina et al. 2017; Vianna et al. 2020). Humboldt and Magellanic penguins are reported to cross-breed readily in captivity (McCarthy 2006). These reports and our study suggest that the recent genetic divergence between these species may not have led to the full development of reproductive isolation mechanisms (Price 2008), which would prevent the potential formation of hybrids. This is consistent with a recent study of extant penguin genomes, which reported genome-wide introgression between Magellanic penguins and the Galápagos/Humboldt ancestor (Vianna et al. 2020).

The present study, and the morphological observations and DNA analysis by Simeone et al. (2009), suggest that rates of hybridization and levels of introgression in the area of sympatry of Magellanic and Humboldt penguins are relatively low. Although signals of genetic admixture were found in 13% of the individuals analyzed, these had predominantly nuclear Magellanic ancestry with Humboldt mtDNA haplotypes, suggesting limited hybrid introgression into the Magellanic population. The more flexible breeding cycle of the Humboldt penguins and the more fixed seasonal cycle of the Magellanic penguins (García Borboroglu and Boersma 2013) decrease the likelihood of interbreeding between these species. From a conservation perspective, the limited levels of Humboldt genetic introgression into Magellanic penguins may not pose a significant threat to the latter. However, hybridization may be of special concern for Humboldt penguin populations in the region, which are experiencing continued decline (Vianna et al. 2014). Future studies evaluating hybrids in sympatric populations of Magellanic and Humboldt penguins would be required to further assess the directionality and relative significance of hybridization in these species.