Introduction

Genes involved in the immune response directly affect survival and are targets for natural selection as a consequence of the evolutionary arms race between host and pathogen (Apanius et al. 1997; Best et al. 2009; Hamilton 1980; Shultz and Sackton 2019). Many bird species migrate and thus must constantly cope with a plethora of pathogens and parasites endemic to multiple environments (Gill 2007; O’Connor et al. 2018). Thus, signatures of positive selection for amino acid diversification in avian immune genes should be detectable on an evolutionary time scale.

In this study, the immune loci of interest represent both major vertebrate immune gene families: one of the adaptive immune system (the Major Histocompatibility Complex, MHC) and one of the more ancient innate immune system (the Toll-like Receptor family, TLRs). Both encode receptors that bind microbial ligands and initiate attacks on foreign invaders (Acevedo-Whitehouse and Cunningham 2006; Vinkler and Albrecht 2009). Evidence in birds, primarily at the population level but also across species, suggests that the highly polymorphic MHC and, to some extent, the TLRs are subject to positive selection for amino acid diversification, particularly in the gene regions responsible for binding parasite-derived ligands (Alcaide and Edwards 2011; Borghans et al. 2004; Downing et al. 2010; Grueber et al. 2014; Khan et al. 2019; Minias et al. 2019; O’Connor et al. 2018; Sommer 2005). In sharp contrast, most functionally constrained genes (e.g., histones, ubiquitins, and heat shock proteins) are highly conserved within and among species, as they evolve under purifying selection that purges mutations that cause amino acid substitutions (Nei et al. 1997, 2000; Piontkivska et al. 2002). Thus, they can effectively serve as control genes for studies of diversifying selection.

The MHC multigene family of the adaptive immune system plays a key role in the immunological and evolutionary response to novel pathogens, with selection often favoring particular alleles or numbers of alleles that confer resistance while maintaining high levels of genetic variation in a population in the form of allelic diversity and heterozygosity (Hedrick 1994; Piertney and Oliver 2006). In this study, we focused on the primary classical MHC Class I locus, referred to herein as Mhc1 (sometimes named Uaa), which recognizes intracellular parasites such as viruses and certain protozoa such as haemosporidian parasites, thus affecting a host’s resistance to specific diseases such as avian malaria (Antonides et al. 2019; Bonneaud et al. 2006; Hess and Edwards 2002; Sepil et al. 2013). Mhc1 encodes a polypeptide (the alpha chain) that forms a receptor that recognizes and binds particular parasite-derived ligands; the antigens are then displayed on the cell’s surface for recognition by T-cells and subsequent attack on the foreign invader (Janeway 2005; Klein 1986). While most studies of MHC Class I diversity and evolution focus on the peptide-binding region of the alpha chain, we considered the entire coding sequence, as different regions of the gene are likely under different selection pressures and evolve via different mechanisms (Yang et al. 2000).

The TLR multigene family recognizes conserved molecular signatures of microbial classes and, upon binding a foreign ligand, induces a signal cascade for the inflammatory response (Kobe and Kajava 2001). Mechanisms governing intraspecific TLR diversity have been under exploration, including in avian species of conservation concern (Dalton et al. 2016; Gilroy et al. 2017; Grueber et al. 2013). TLR evolution has been thought to be governed primarily by purifying selection (Mukherjee et al. 2009; Roach et al. 2005), but recent research has suggested that TLR evolution also involves positive selection, particularly in ligand-binding regions (Alcaide and Edwards 2011; Grueber et al. 2014; Khan et al. 2019). Herein we utilize the Tlr2b locus (present only in Class Aves, a duplication of eukaryotic Tlr2), which evidence suggests forms heterodimers with Tlr1a and Tlr1b to recognize peptidoglycans of Gram-positive bacteria and glycolipids on the surface of protozoan parasites such as Plasmodium spp. (Alcaide and Edwards 2011; Campos 2001; Eriksson et al. 2014; Grueber 2015; Higuchi 2008; Krishnegowda 2005). In a study of a population of Bananaquits (Coereba flaveola) subject to avian malarial parasites, Tlr2b was one of three TLR genes found to have significant associations between allelic composition and infection status (Antonides et al. 2019). Thus, like Mhc1, Tlr2b may play an important role in the evolutionary arms race.

In addition to the immune genes, we utilized the housekeeping gene Polyubiquitin B (Ubb) to serve as a control (i.e., a gene which is not expected to evolve under any positive diversifying selection). Ubb is a conserved eukaryotic precursor to Ubiquitin, an abundant protein involved in regulation of the concentration of cell-cycle signaling proteins and the degradation of damaged proteins. We chose Ubb because the cellular function of the protein is well known, the gene is generally well-characterized, the mode of selection (purifying) is known, and orthologs can easily be identified among related species (Kimura and Tanaka 2010; Nei et al. 2000).

The avian lineages in this study encompass all extant birds within Class Aves, which are divided into the infraclasses Paleognathae (the tinamous and flightless ratites) and Neognathae (all other modern birds), which diverged around 100 MYA in the late Cretaceous. The Neognathae subsequently diverged about 88 MYA into the superorders Galloanserae (ducks, geese, chickens, and kin) and the prolific Neoaves, which underwent a rapid radiation around the Cretaceous–Paleogene (K–Pg) boundary (Jarvis et al. 2014). If the evolutionary rates of immune genes within and among these clades (Paleognathae, Galloanserae, and Neoaves) differ, it would be indicative of differential selection pressures that may be manifested by different mechanisms (e.g., purifying selection versus diversifying selection).

Evolutionary processes, both neutral and non-neutral, are reflected in genome-wide nucleotide substitution rates. In birds, the average background rate has been estimated at 0.0019 substitutions per site per million years (compared with an average background rate of 0.0027 in mammals) and does not differ significantly among clades (Jarvis et al. 2014). The evolutionary rates of genes or portions of genes subject to natural selection differ from those expected under neutrality, and are often quantified by nonsynonymous-to-synonymous substitution rate ratios (ω = dN/dS). Neutral evolution is reflected by ω = 1, an excess of synonymous mutations signals negative (purifying) selection (ω < 1), and an excess of nonsynonymous mutations signals positive (diversifying) selection (ω > 1). To explore mechanisms of avian immune gene evolution, we utilized full coding sequences primarily derived from whole genome sequences and bioinformatics methods that detect transient excesses of nonsynonymous over synonymous substitutions.
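The interpretation of ω can be illustrated with a short Python sketch. The function name classify_omega and the tolerance used to treat ω as equal to 1 are illustrative assumptions; the analyses in this study rely on likelihood ratio tests of codon models rather than on point comparisons of this kind.

```python
def classify_omega(dn: float, ds: float, tol: float = 1e-9) -> str:
    """Classify the selection regime implied by omega = dN/dS.

    dn, ds: nonsynonymous and synonymous substitution rates.
    tol: hypothetical tolerance for treating omega as exactly 1.
    """
    if ds <= 0:
        raise ValueError("dS must be positive for omega to be defined")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        return "neutral"
    if omega > 1.0:
        return "positive (diversifying)"
    return "negative (purifying)"


if __name__ == "__main__":
    # Illustrative rates only; these are not values from the study.
    for dn, ds in [(0.02, 0.10), (0.10, 0.10), (0.30, 0.10)]:
        print(dn, ds, classify_omega(dn, ds))
```

In practice a point estimate of ω > 1 is not sufficient on its own; significance is assessed by comparing nested models.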

We expected to find evidence of positive diversifying selection in the ligand-binding regions of the adaptive MHC gene (Mhc1) and, to a lesser extent (due to functional constraints), the innate TLR gene (Tlr2b), on evolutionary time scales within and among all major clades of birds, but not in the housekeeping gene Ubb. We additionally expected to identify a differential intensity of immune gene evolution among the avian clades. For example, the highly vagile Neoaves clade occupies a large variety of ecosystems and is therefore likely under more diverse selection pressures, which may lead to more rapid evolution of its immune genes relative to the other two clades [e.g., see O’Connor et al. (2018)]. To these ends, we quantified avian divergence at the representative genes and tested for selection within the genes as a whole. For those that demonstrated episodic positive diversifying selection, we examined selection at codon sites within the genes and among avian lineages (branches), and finally compared rates of gene evolution among the major avian clades. This study comprises a diverse and relatively large subset of avian species (34 for Mhc1, 29 for Tlr2b, and 37 for Ubb, representing all major clades) studied at orthologous loci representing two major immune gene families. Our approach provides enhanced detection of signatures of selection at the macroevolutionary scale, providing new insights into the evolutionary processes that shape the avian immunogenetic repertoire.

Methods

Sequence selection

We obtained the full coding sequence (CDS) of orthologous sequences for each of the three genes of interest (Mhc1, Tlr2b, and Ubb) for as many avian species as possible. Due to the fragmented nature of draft genome assemblies, the complete sequences are not present in all species with sequenced genomes, and many sequences available from targeted sequencing strategies are not intended to span the full CDS. For Mhc1, we used the CDS for the entire alpha chain. The alpha chain encodes both the variable regions α1 and α2, which form the peptide-binding groove, and the conserved region α3, which encodes the alpha chain immunoglobulin domain (Janeway 2005; Klein 1986). For the Tlr2b locus, we obtained the full CDS, including the variable region, the extracellular N-terminal LRR (leucine-rich repeat) region which is involved in pathogen recognition, the conserved TIR (Toll/interleukin-1 receptor) region, and the intracellular domain that initiates a signal cascade for downstream immune response (Alcaide and Edwards 2011; Kobe and Kajava 2001). We obtained the complete CDS for Ubb, which codes for a polymer of repeated conserved ubiquitin domains (Nei et al. 2000).

First, we identified avian orthologs of Mhc1, Tlr2b, and Ubb using protein similarity from the reference chicken genome at www.orthoDB.org (Zdobnov et al. 2017). This corresponded to orthoDB group IDs of EOG090F0AQT for Mhc1, EOG090F02AU for Tlr2b, and EOG090F07RP for Ubb. We then used the chicken protein for each locus as the query on the NCBI nucleotide database (www.blast.ncbi.nlm.nih.gov) using the tblastn algorithm to obtain nucleotide sequences of avian orthologs. The results were hand-curated to choose only those avian orthologs which contain RefSeq CDS sequences with all functional domains present as described above, the majority of which were derived from whole genome sequences. When alternative transcripts were present, the one with the longest sequence was chosen. Species in which the Tlr2 sequence was not distinguished between Tlr2a and Tlr2b were not included for that locus.

Multiple sequence alignments

An in-frame codon-based multiple sequence alignment (MSA) was constructed for each gene (i.e., one for Mhc1, one for Tlr2b, and one for Ubb) in the TranslatorX server (Abascal et al. 2010). First, the nucleotide CDS from each species was translated according to the standard genetic code, and an amino acid MSA numbered according to the chicken sequence was generated with T-Coffee v. 12.00 (Di Tommaso 2011). After protein alignment, poorly aligned or ambiguous regions (e.g., containing many large gaps) were removed with Gblocks (Castresana 2000) using the “less stringent” criteria of a minimum block length of 5 amino acids, a minimum number of sequences for a conserved or flanking position of 55%, and a maximum of 50% for gap positions in the sequences in the final alignments. The curated peptide alignment was then back-translated to the corresponding codon-based multiple nucleotide sequence alignment numbered according to the chicken sequence. This approach to DNA alignment ensures that gaps cannot be inserted within a codon, providing a biologically relevant alignment consisting of in-frame codons.
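The back-translation step can be sketched in a few lines of Python. This is a minimal illustration of the general idea and not the TranslatorX implementation; the function name back_translate is hypothetical, and the sketch assumes that the CDS has had its stop codon removed and that no alignment columns have yet been trimmed.

```python
def back_translate(aligned_protein: str, cds: str) -> str:
    """Thread an unaligned CDS onto its aligned protein, codon by codon.

    Each residue in the aligned protein consumes the next codon of the CDS;
    each gap character '-' becomes a three-nucleotide gap '---', so gaps can
    never split a codon.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = aligned_protein.replace("-", "")
    if len(cds) % 3 != 0 or len(codons) != len(residues):
        raise ValueError("CDS length must be exactly 3x the number of residues")
    remaining = iter(codons)
    out = []
    for symbol in aligned_protein:
        out.append("---" if symbol == "-" else next(remaining))
    return "".join(out)


if __name__ == "__main__":
    # Toy example: Met, gap, Lys, Val
    print(back_translate("M-KV", "ATGAAAGTT"))
```

Because whole codons are placed as units, the resulting nucleotide alignment is always three times the length of the protein alignment.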

Construction of individual gene trees

For each gene, codon-based nucleotide gene trees for the selected avian species were reconstructed by providing the multiple sequence alignments to IQ-TREE v.1.6.11 (Trifinopoulos et al. 2016). First, we used ModelFinder within IQ-TREE to choose the appropriate model of sequence evolution (Kalyaanamoorthy et al. 2017), as assessed by Bayesian Information Criterion (BIC) support measures among 185 codon models. The best substitution model for each locus was then used to infer the best gene tree using a maximum likelihood approach. For branch support analysis, we performed 1000 replicates for a non-parametric Shimodaira–Hasegawa-like approximate likelihood ratio test (SH-aLRT) (Guindon et al. 2010; Shimodaira and Hasegawa 1999) as well as 1000 replicates of ultrafast bootstrapping (Hoang et al. 2018). A 50% majority-rule consensus tree was constructed for each gene based on 1000 bootstrap trees. Graphical representation of gene trees was performed with FigTree v. 1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/).

To allow comparison of the gene tree and species tree topologies, we created species cladograms corresponding to the species represented in each gene using the NCBI taxonomy database (www.ncbi.nlm.nih.gov/taxonomy). The Robinson–Foulds symmetric distance (RF) was calculated as the number of splits in the species tree but not in the gene tree plus the number of splits in the gene tree but not in the species tree (Robinson and Foulds 1981). Comparison and distance calculations were performed and visualized in Phylo.io (Robinson et al. 2016).
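The RF calculation can be illustrated with a small Python sketch. It is not the Phylo.io implementation: trees are represented here as nested tuples of taxon names (an assumption made to avoid a Newick parser), and the function names are hypothetical. Each internal branch is recorded as the set of taxa on the side that excludes an arbitrary anchor taxon, so the same split is counted identically regardless of rooting.

```python
def leaf_set(node):
    """Return the frozenset of taxon names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def nontrivial_splits(tree):
    """Collect the non-trivial bipartitions of a nested-tuple tree."""
    all_taxa = leaf_set(tree)
    anchor = min(all_taxa)  # arbitrary reference taxon used to orient splits
    found = set()

    def walk(node):
        if isinstance(node, str):
            return
        side = leaf_set(node)
        if anchor in side:
            side = all_taxa - side  # always store the anchor-free side
        # Skip trivial splits (a single tip against everything else).
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        for child in node:
            walk(child)

    walk(tree)
    return found


def robinson_foulds(tree_a, tree_b):
    """Splits found in exactly one of the two trees (symmetric difference)."""
    return len(nontrivial_splits(tree_a) ^ nontrivial_splits(tree_b))


if __name__ == "__main__":
    # Toy five-taxon trees; the labels are placeholders, not study species.
    species_tree = ((("A", "B"), "C"), ("D", "E"))
    gene_tree = ((("A", "C"), "B"), ("D", "E"))
    print(robinson_foulds(species_tree, gene_tree))
```

In the toy example, the trees share the split separating D and E from the rest and each has one split the other lacks, giving a distance of 2.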

Detecting selection

Selection tests were performed using the HyPhy package via the DataMonkey Adaptive Evolution server (Pond et al. 2004; Weaver et al. 2018), based on the produced codon-based nucleotide MSAs (which allow silent substitutions to be taken into consideration) and the majority-rule consensus ML trees. The tests for selection are each based on calculations of the nonsynonymous-to-synonymous substitution rate ratios (ω = dN/dS) using codon models and Likelihood Ratio Tests (LRTs), with null hypotheses based on neutral evolution (ω = 1); negative (purifying) selection is indicated by ω < 1 and positive (diversifying) selection by ω > 1. The HyPhy methods first perform an initial global MG94xREV fit to optimize branch lengths and nucleotide substitution parameters, which are used as initial parameter values during model fitting for hypothesis testing. An advantage of these methods is the inclusion of synonymous rate variation, whereby dS is allowed to vary across sites and branches in the phylogeny, permitting more powerful detection of positive selection and reducing false discovery (Weaver et al. 2018). We sought to detect hierarchical signals of selection across entire genes, at particular codon sites within a gene, and at particular branches of the gene trees. Finally, we tested for differential selection among subsets of species to determine relative selection intensity by clade for each locus.

Gene-wide

To identify positive selection anywhere on the three gene trees, we used the algorithm BUSTED (Branch-site Unrestricted Statistical Test for Episodic Diversification) (Murrell et al. 2015). BUSTED fits a codon model with three rate classes, constrained as ω1 ≤ ω2 ≤ 1 ≤ ω3 and estimates the proportion of sites belonging to each ω class. Positive selection is then detected by comparing this model fit to a null model where positive selection is not allowed (ω3 = 1). If the null hypothesis is rejected, then there is evidence that at least one site has experienced positive selection, at least in one branch.

At sites

To determine at which codon sites, if any, positive selection was detected, the algorithm MEME (Mixed Effects Model of Evolution) was used (Murrell et al. 2012). MEME uses a mixed-effects maximum likelihood approach to test the hypothesis that individual sites have been subject to episodic positive or diversifying selection in a proportion of branches. Two ω rate classes were inferred per site, and corresponding weights (the proportion of branches evolving under that rate class) were calculated. The two rate classes were inferred by a single dS value (α) and two separate dN values (β− and β+) per site. In the null model, β− and β+ were constrained to be less than or equal to α, but in the alternative model β+ was not constrained. If β+ > α at a site, and was significant using the likelihood ratio test, positive selection was inferred for the site.

At branches

To test which individual branches (lineages) are subject to selection at one or more sites, we used the algorithm aBSREL (adaptive Branch-Site Random Effects Likelihood) (Smith et al. 2015). aBSREL models both site-level and branch-level ω heterogeneity and infers the optimal number of ω classes for each branch using AICc (small sample AIC). The alternative model was compared to a null model in which positive selection was not allowed in the rate classes, and a Likelihood Ratio Test was performed at each branch. p-values at each branch were corrected for multiple testing using the Holm-Bonferroni correction before determining significance (Holm 1979).
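The Holm-Bonferroni step-down correction can be sketched as follows. This is a generic illustration of the procedure described by Holm (1979), not the code used by the Datamonkey server; the function name holm_adjust and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order.

    The smallest p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, and each value is
    capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-branch p-values, not results from this study.
    raw = [0.01, 0.04, 0.03]
    for p, q in zip(raw, holm_adjust(raw)):
        print(f"raw={p:.3f} adjusted={q:.3f} significant={q <= 0.05}")
```

A branch is then called significant when its adjusted p-value falls at or below the chosen threshold.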

To detect differential selection among the three clades, we used the algorithm RELAX to determine if selection pressure was significantly relaxed or intensified on those branches relative to the rest of the tree (Wertheim et al. 2015). RELAX uses a random effects branch-site model to test whether a set of test branches evolves under a different stringency of selection than a set of reference branches. For the null model, a codon model with three ω classes was fitted to the entire phylogeny. The test for changes in selection stringency involved the selection intensity parameter k (≥ 0), which served as an exponent to the ω classes in the alternative model. Upon likelihood ratio testing between the null and alternative models, a significant result of k > 1 indicated that selection strength was intensified along the test branches relative to the reference branches (i.e., the clade in question relative to the rest of the gene tree), and a significant result of k < 1 indicated that selection strength was relaxed along the test branches. The RELAX algorithm shows a Type I error rate of 0.052 (Wertheim et al. 2015).
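The effect of the exponent k on the ω classes can be shown with a brief Python sketch. The reference ω values below are hypothetical and chosen only for illustration, and the function name relax_transform is an assumption; the sketch shows the transformation itself and does not perform any model fitting.

```python
def relax_transform(reference_omegas, k):
    """Raise each reference omega class to the power k (k >= 0)."""
    if k < 0:
        raise ValueError("the selection intensity parameter k must be >= 0")
    return [omega ** k for omega in reference_omegas]


if __name__ == "__main__":
    # Hypothetical reference classes: strong purifying, weak purifying, positive.
    reference = [0.1, 0.8, 4.0]
    for k in (0.80, 1.00, 1.28):
        transformed = relax_transform(reference, k)
        print(k, [round(omega, 3) for omega in transformed])
```

With k > 1, classes below 1 move further below 1 and classes above 1 move further above it, so both purifying and diversifying selection are intensified; with k < 1, every class moves toward ω = 1, which corresponds to relaxation.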

Results

Sequence selection

Our search of available avian sequences produced suitable full CDS sequences for 34 avian species for Mhc1, 29 species for Tlr2b, and 37 species for Ubb. See Supplementary Tables S1-S3 for the summaries of the species and their sequences, and Supplementary Figs. S1-S3 for the fasta CDS sequences. Overall, 18 species were common to both immune loci and 13 overlapped at all three loci.

For Mhc1 these included four species of Paleognathae (in three orders), six species of Galloanserae (in two orders), and 24 species of Neoaves (in 10 orders). The lengths of the Mhc1 sequences ranged from 759 to 1422 bp. In Tlr2b the sequences included four species of Paleognathae (in three orders), five species of Galloanserae (in two orders), and 20 species of Neoaves (in 12 orders). The length of the Tlr2b locus ranged from 1464 to 2440 bp. In Ubb, sequences ranged from 462 to 1378 bp, and included three species of Paleognathae (in three orders), five species of Galloanserae (in two orders), and 29 species of Neoaves (in 22 orders). The differences in sample sizes between the clades are reflective of the differences in the real number of species in those clades: of the 10,000 extant birds, ~ 0.5% are in the Paleognathae clade, 4.5% are in Galloanserae, and 95% are in the Neoaves clade (Eo et al. 2009; Sibley and Monroe 1990).

Multiple sequence alignments

The CDS sequences for each gene were translated to protein sequences, which after protein alignment and curation produced amino acid MSAs of 295 aa for Mhc1, 783 aa for Tlr2b, and 302 aa for Ubb (Supplementary Figs. S4–S6). Back translation to DNA sequences produced in-frame codon-based nucleotide MSAs of 885 bp for Mhc1 (Fig. S4), 2325 bp for Tlr2b (Fig. S5), and 906 bp for Ubb (Supplementary Figs. S7–S9). Each is numbered according to the chicken sequence. For Mhc1, 1.7% of the 295 codon sites were invariant (constant or ambiguous constant) and 267 sites were parsimony informative. For Tlr2b, 13.3% of the 783 codon sites were invariant and 558 sites were parsimony informative. For Ubb, 11.9% of the 302 codon sites were invariant and 219 sites were parsimony informative.

Construction of individual gene trees

For Mhc1, the best-fit codon substitution model determined by BIC within 95% confidence was MGK + F3X4 + R4, and for Tlr2b it was MGK + F3X4 + R3. In this nomenclature, MG represents the codon substitution model of Muse and Gaut (1994), with K representing the addition of a dN/dS rate ratio as well as a transition/transversion (ts/tv) rate ratio. The frequency type F3X4 denotes unequal nucleotide frequencies overall as well as unequal nucleotide frequencies over the three codon positions. The rate type R represents “free rate” heterogeneity, which relaxes the assumption of Gamma distributed rates (Soubrier et al. 2012; Yang 1995). Thus, the differences in the MHC and TLR models reflect a different number of rate categories. In Mhc1, the four categories in the form of (proportion of sites, relative rate) were (0.0999, 0.1638), (0.3670, 0.5176), (0.4459, 1.2027), and (0.0872, 2.9511). The three categories for Tlr2b were (0.3651, 0.2744), (0.5185, 1.0806), and (0.1164, 2.9175).
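As a quick consistency check on the free-rate categories reported above, the proportions of sites should sum to 1 and, because the rates are relative, the proportion-weighted mean rate should be close to 1. The short Python sketch below performs that arithmetic on the reported values; the variable and function names are illustrative.

```python
# (proportion of sites, relative rate) pairs as reported for each locus
MHC1_R4 = [(0.0999, 0.1638), (0.3670, 0.5176), (0.4459, 1.2027), (0.0872, 2.9511)]
TLR2B_R3 = [(0.3651, 0.2744), (0.5185, 1.0806), (0.1164, 2.9175)]


def summarize(categories):
    """Return (sum of proportions, proportion-weighted mean relative rate)."""
    total_proportion = sum(p for p, _ in categories)
    mean_rate = sum(p * r for p, r in categories)
    return total_proportion, mean_rate


if __name__ == "__main__":
    for name, cats in (("Mhc1", MHC1_R4), ("Tlr2b", TLR2B_R3)):
        total, mean = summarize(cats)
        print(f"{name}: proportions sum to {total:.4f}, mean rate {mean:.4f}")
```

Both loci pass the check to within rounding of the four reported decimal places. In each case the fastest category holds roughly 9 to 12% of sites and evolves at about three times the mean rate.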

The best-fit codon substitution model for Ubb was KOSI07 + F + G4. Here KOSI07 represents the empirical codon model of Kosiol et al. (2007), with F representing empirical codon frequencies counted from the data. The rate type G4 indicates the discrete Gamma model of Yang (1994) with four rate categories. These were modeled as: (0.25, 0.2406), (0.25, 0.6047), (0.25, 1.0559), and (0.25, 2.0988).

The best Maximum Likelihood trees were inferred from the codon-based nucleotide alignment and substitution model. For Mhc1 the best ML tree had a total tree length (the sum of branch lengths, each representing number of nucleotide substitutions per codon site) of 13.8, with the sum of the internal branch lengths representing 31.3% of the tree length. For Tlr2b, the total gene tree length was 4.8, with the internal branch lengths constituting 30.4% of the total length. For Ubb, the total gene tree length was 5.3 and 20.2% of the total length was represented by the internal branches. The majority-rule consensus gene trees produced by subsequent bootstrapping are shown in Fig. 1 for Mhc1 and Fig. 2 for Tlr2b.

Fig. 1

The gene tree of avian Mhc1 based on the majority-rule consensus maximum-likelihood tree of 34 species, produced by IQ-Tree (Trifinopoulos et al. 2016) and visualized with FigTree v. 1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/). Yellow highlighting indicates species in the Paleognathae clade, pink indicates the Galloanserae clade, and green indicates the Neoaves clade. Branches are labeled with Ultrafast bootstrap supports (Hoang et al. 2018). Scale bar shows molecular distance (number of nucleotide substitutions per codon site)

Fig. 2

The gene tree of avian Tlr2b based on the majority-rule consensus maximum-likelihood tree of 29 species, produced by IQ-Tree (Trifinopoulos et al. 2016) and visualized with FigTree v. 1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/). Yellow highlighting indicates species in the Paleognathae clade, pink indicates the Galloanserae clade, and green indicates the Neoaves clade. Branches are labeled with Ultrafast bootstrap supports (Hoang et al. 2018). Scale bar shows molecular distance (number of nucleotide substitutions per codon site)

Comparison of the gene trees with corresponding species trees (cladograms) showed differences at each locus, as visualized in Fig. S10 for Mhc1, Fig. S11 for Tlr2b, and Fig. S12 for Ubb. The RF distance and Euclidean distance between the Mhc1 gene tree and its species tree were 18 and 14.3, respectively, while for Tlr2b the distances were 15 and 13.5. For Ubb the RF distance was 6 and the Euclidean distance was 4.0.

Detecting selection

Gene-wide

For both Mhc1 and Tlr2b, the BUSTED algorithm found evidence (LRT, p value ≤ 0.05) of episodic diversifying selection. Evidence shows that for each gene tree, at least one site on one branch has experienced diversifying selection. For Ubb, no evidence for diversifying selection was found anywhere in the gene (LRT, p = 0.40) (Table 1). Therefore, no expanded testing for diversifying selection was performed with the Ubb sequences.

Table 1 Statistical results of model fits for the BUSTED (Branch-site Unrestricted Statistical Test for Episodic Diversification) algorithm (Murrell et al. 2015) performed with the Datamonkey server (Weaver et al. 2018) showing that for both Mhc1 and Tlr2b, there is evidence of episodic diversifying selection in the gene (the null model of no positive selection, ω3 = 1, is rejected; LRT: p < 0.05)

At sites

The MEME algorithm found evidence of episodic positive/diversifying selection (LRT, p-value ≤ 0.05) at particular codon sites in both immune loci. For the Mhc1 phylogeny, 25 out of 295 codon sites (8.5%) showed evidence for selection (Table 2), and 21 of 783 codon sites (2.7%) showed evidence for selection in Tlr2b (Table 3). These codon sites correspond to the numbering of the curated protein alignments in Figs. S3 and S4.

Table 2 The results of the MEME (Mixed Effects Model of Evolution) algorithm (Murrell et al. 2012) performed with the Datamonkey server (Weaver et al. 2018) for the Mhc1 locus, for the 25 out of 295 (= 8.5%) codon sites under positive/diversifying selection (p ≤ 0.050)
Table 3 The results of the MEME (Mixed Effects Model of Evolution) algorithm (Murrell et al. 2012) performed with the Datamonkey server (Weaver et al. 2018) for the Tlr2b locus, for the 21 out of 783 (= 2.7%) codon sites under positive/diversifying selection (p ≤ 0.050)

At branches

The algorithm aBSREL tested each locus for evidence of branch-specific selection, finding multiple lineages under selection in each locus (Table 4). In Mhc1, selection was detected at 13 (of the 65 = 20%) branches/nodes in the phylogeny (LRT, p-value ≤ 0.05). These included the American crow (Corvus brachyrhynchos), Golden-collared manakin (Manacus vitellinus), Red-billed gull (Chroicocephalus scopulinus), Sandhill crane (Antigone canadensis), North Island brown kiwi (Apteryx australis), Common ostrich (Struthio camelus), Killdeer (Charadrius vociferus), Bananaquit (Coereba flaveola), Blue-crowned manakin (Lepidothrix coronata), Ruff (Calidris pugnax), and Chimney swift (Chaetura pelagica). Additionally, selection was found at the node representing the divergence of the Peregrine falcon and the Saker falcon (Falco peregrinus and Falco cherrug), and the node representing the divergence of the Common ostrich from the node that represents the divergence of the Emu (Dromaius novaehollandiae) from the North Island brown kiwi and the Okarito kiwi (Apteryx rowi).

Table 4 Results for the aBSREL (adaptive Branch-Site Random Effects Likelihood) algorithm (Smith et al. 2015) as determined in the Datamonkey server (Weaver et al. 2018)

In Tlr2b, selection was detected at 5 (of the 55 = 9.1%) branches/nodes in the gene tree (LRT, p-value ≤ 0.05). These include the Common starling (Sturnus vulgaris), Bald eagle (Haliaeetus leucocephalus), Emperor penguin (Aptenodytes forsteri), and Budgerigar (Melopsittacus undulatus), and the node representing the divergence of the Chimney swift and Anna’s hummingbird (Calypte anna). Selection was also detected at the node representing the divergence of the clade consisting of the Bald eagle/Emperor penguin/Adelie penguin (Pygoscelis adeliae) with the clade consisting of the Saker falcon/American crow/Bananaquit/Eurasian blue tit (Cyanistes caeruleus)/Zebra finch (Taeniopygia guttata)/Ground finch (Geospiza fortis)/Saffron-crested tyrant-manakin (Neopelma chrysocephalum).
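The branch totals reported for the two gene trees (65 and 55) match the number of branches in an unrooted, fully bifurcating tree, which is 2n − 3 for n tips. The Python sketch below reproduces the reported percentages under that assumption; the source does not state how the totals were counted, so the formula is offered only as a plausible reading, and the function names are illustrative.

```python
def unrooted_branch_count(n_tips: int) -> int:
    """Branches (terminal plus internal) in an unrooted bifurcating tree."""
    if n_tips < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 tips")
    return 2 * n_tips - 3


def percent_under_selection(n_selected: int, n_tips: int) -> float:
    """Percentage of branches with significant evidence of selection."""
    return 100.0 * n_selected / unrooted_branch_count(n_tips)


if __name__ == "__main__":
    # Species counts and numbers of significant branches as reported.
    for locus, tips, selected in (("Mhc1", 34, 13), ("Tlr2b", 29, 5)):
        total = unrooted_branch_count(tips)
        pct = percent_under_selection(selected, tips)
        print(f"{locus}: {selected}/{total} branches = {pct:.1f}%")
```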

The results of the RELAX algorithm (Fig. 3 and Table S4) indicate that for Mhc1 the Galloanserae clade evolved at a relaxed rate (K = 0.80) relative to the rest of the tree, while the Neoaves clade evolved at an intensified rate (K = 1.28) relative to the rest of the tree. The Paleognathae did not show significant relaxation or intensification of selection. For Tlr2b, none of the three clades showed significant relaxation or intensification of selection relative to the rest of the tree.

Fig. 3

Omega (ω = dN/dS) distributions under the RELAX alternative model for the significant clades within Mhc1 (LRT, p ≤ 0.05) as determined by the RELAX algorithm (Wertheim et al. 2015) in the Datamonkey server (Weaver et al. 2018). In Mhc1, selection intensity is significantly relaxed in Galloanserae and increased in Neoaves relative to the rest of the tree. Table S4 shows the full RELAX model results. K relaxation/intensification parameter; LT likelihood ratio

Discussion

Our study is unique in comparing molecular evolution among representatives of both the MHC and TLR gene families, and our results indicate that the MHC gene is under stronger diversifying selection than the TLR gene. In summary, in both immune genes, we detected episodic diversifying positive selection at the overall gene level, at the lineage level, and at the individual codon site level, with a stronger effect on Mhc1 than Tlr2b. We additionally found evidence of differential selection pressures among the clades for Mhc1, with evolution intensified in the Neoaves but relaxed in the Galloanserae. Diversifying selection at the Mhc1 and Tlr2b loci is apparent when their patterns of molecular evolution are contrasted with Ubb, which did not show evidence for selection anywhere in the gene.

Mhc1 and Tlr2b have diverged at different rates, with Mhc1 having almost three times the gene tree length (the sum of the branches, which represent substitutions per codon site) of Tlr2b and thus evolving almost three times as fast (Figs. 1 and 2). The topologies of gene trees and species trees differ more in Mhc1 than in Tlr2b based on distance metrics, and both differ more than in Ubb (Figs. S10–S12). These differences occur primarily in the Neoaves clade, which rapidly radiated at the Cretaceous–Paleogene (K–Pg) boundary about 66 MYA, and incomplete lineage sorting is indicated at these loci (Suh et al. 2015).

We found episodic positive diversifying selection operating on particular codon sites in both loci, with a higher proportion at the adaptive immune system gene Mhc1 (8.5%) than the innate immune system gene Tlr2b (2.7%). Another recent report in birds found that in exon 3 of Mhc1, which encodes a portion of the variable peptide-binding region, 29% of codon sites were under positive selection in a group of 23 passerine families and 14% of sites were under positive selection in a group of 10 non-passerine Neognathae families (Minias et al. 2018). The increased percentage of sites under selection compared to our study is attributable at least in part to the sequences representing only exon 3, and not the full alpha chain including conserved domains, as well as their choice of species, which is consistent with our findings of increased selection intensity in Mhc1 in Neoaves. Studies of interspecific detection of site-level selection in Mhc1 genes in other taxa are scarce. In one, among six species of frogs from three families, approximately 8.0% of sites showed evidence of positive selection in the full alpha chain, based on average results from four different methods (5.0–9.7%) (Kiemnec-Tyburczy et al. 2012), on par with the 8.5% we found in birds.

While we found a larger portion of the Mhc1 locus than of the Tlr2b locus under diversifying selection in Class Aves, the percentage of the Tlr2b gene showing significant positive selection is higher in this study (2.7%) than in most previous reports, such as 1.6% in a comparison of seven species (Alcaide and Edwards 2011) and 0.3–3.0% in a comparison of 14 avian species (Grueber et al. 2014), with sequences representing only the variable LRR region and with no species included from the Galloanserae clade. The higher percentage of the Tlr2b gene under positive selection is in accordance with a recent study showing 4.5% of sites under selection in Tlr2b in 42 Neoaves species (Velova et al. 2018).

In mammals, 13 TLRs exist compared with 10 in birds; however, birds recognize a range of microbial ligands as broad as that recognized by mammals. Dimerization of either of the two avian TLR2s (TLR2a and TLR2b) with either of the two avian TLR1s (TLR1La and TLR1Lb) mimics the pattern recognition of mammalian TLR2 dimerized with mammalian TLR1, TLR6, or TLR10 (Brownlie and Allan 2011). Site-level interspecific positive selection in TLR2 is reportedly lower in mammals than in our findings, with the percentage of the gene under selection ranging from 0.8% with 23 mammalian species to 2% with 27 mammalian species (Areal et al. 2011; Huang et al. 2011). A study of 11 primate species found no significant positive selection in TLR2 (Wlasiuk and Nachman 2010). This indicates that the avian Tlr2b could be under stronger evolutionary pressure than mammalian TLR2, despite the higher overall background substitution rate in mammals. However, it has been suggested that the functional ortholog of mammalian TLR2 in birds is TLR2a, as the ortholog of Tlr2b has been lost in mammals (Huang et al. 2011).

Our branch-site tests showed 20% of branches/nodes in Mhc1 and 9.1% in Tlr2b under positive diversifying selection. These data agree with the site-level data in indicating that MHC evolves more rapidly than TLRs, presumably reflecting both the intensity of selection and the underlying evolutionary mechanisms (positive vs. negative selection), and consistent with adaptive immunity being less constrained than innate immunity. To our knowledge, this study is the first to report tests for positive selection at the branch (lineage) level across taxa at either the Mhc1 or Tlr2b gene, indicating particular species in which positive selection is detectable (Table 4).
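The site-level and branch-level comparisons above reduce to simple ratios of the reported percentages. The short Python sketch below only restates that arithmetic using the figures quoted in this section; the variable and function names are illustrative and not part of the original analysis.

```python
# Percentages of codon sites and of branches/nodes under episodic positive
# diversifying selection, as reported in this section.
reported = {
    "codon sites": {"Mhc1": 8.5, "Tlr2b": 2.7},
    "branches": {"Mhc1": 20.0, "Tlr2b": 9.1},
}


def fold_difference(mhc1_pct, tlr2b_pct):
    """Return how many times larger the Mhc1 percentage is than the Tlr2b one."""
    return mhc1_pct / tlr2b_pct


# Ratio of Mhc1 to Tlr2b at each level of analysis, rounded to two decimals.
ratios = {
    level: round(fold_difference(values["Mhc1"], values["Tlr2b"]), 2)
    for level, values in reported.items()
}

for level, ratio in ratios.items():
    print(f"{level}: Mhc1/Tlr2b = {ratio}")
```

Both ratios exceed two, roughly threefold for codon sites and just over twofold for branches, which is the basis for describing Mhc1 as having over twice the percentage of sites and lineages under selection.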

One of the lineages showing episodic diversifying selection at both immune loci was the bananaquit, a Caribbean passerine bird that is a common host of avian malaria. Its demography, evolutionary history, and parasite prevalence have been the subject of previous study (e.g., Bellemain et al. 2008; Ricklefs et al. 2011), and its genome and immune genes have been characterized and studied at the population level (Antonides et al. 2017, 2019). The Mhc1 and Tlr2b loci are key players in recognizing the ligands of avian malarial parasites, and the positive diversifying selection detected at these loci in the bananaquit lineage (without a priori specification) likely reflects the adaptability of a species whose population continues to expand in the face of haemosporidian parasites (Fahey et al. 2012). Knowledge of the evolutionary rates of immune genes in a species such as the bananaquit could potentially inform long-term viability estimates for species of conservation concern such as Hawaiian honeycreepers (Fringillidae), which, due to the range expansion of dipteran vectors, have been exposed to malarial parasites only recently in their evolutionary history (Foster et al. 2007; Liao et al. 2017).

Avian lineages occupy a wide range of habitats and ecological niches, and their genotypic and phenotypic diversity reflects this (e.g., Jetz et al. 2012; O’Connor et al. 2018). Variation in evolutionary rates between lineages or clades would suggest evolution under varying selection pressures. Among the three major clades, the average genome-wide background substitution rates vary but are not significantly different: the mean rates and their ranges (in substitutions per site per million years) are 0.0015 (0.0010–0.0020) for Paleognathae, 0.0019 (0.0017–0.0020) for Galloanserae, and 0.0019 (0.0013–0.0036) for Neoaves (Jarvis et al. 2014). We found that Mhc1 evolved at a relaxed rate in Galloanserae, 80% of the rate in the rest of the tree as a whole. In contrast, Mhc1 evolved more rapidly in Neoaves, at about 1.7 times the rate in the rest of the tree as a whole. This is consistent with a study of exon 3 of Mhc1 in 16 Neoaves and 3 Galloanserae species, which found evidence of stronger adaptive evolution in passerines than in non-passerines (Alcaide et al. 2013). We did not find differential selection pressure among clades in the Tlr2b gene. Taken together, these data indicate that the adaptive immune gene Mhc1 is more responsive to selection pressures among clades and has more functional significance over evolutionary timeframes than the innate immune gene Tlr2b.
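As a rough illustration of why the background rates quoted above are treated as comparable among clades, the following Python sketch checks whether the reported ranges overlap for every pair of clades. Range overlap is only an informal check assumed here for illustration; it is not the statistical test used by Jarvis et al. (2014) or in this study, and the names are hypothetical.

```python
from itertools import combinations

# Mean, minimum, and maximum genome-wide background substitution rates
# (substitutions per site per million years), as quoted above.
background_rates = {
    "Paleognathae": (0.0015, 0.0010, 0.0020),
    "Galloanserae": (0.0019, 0.0017, 0.0020),
    "Neoaves": (0.0019, 0.0013, 0.0036),
}


def ranges_overlap(a, b):
    """Return True if the (min, max) ranges of two clades share any values."""
    _, a_low, a_high = a
    _, b_low, b_high = b
    return a_low <= b_high and b_low <= a_high


# Informal pairwise check; this is an assumption-laden heuristic,
# not a significance test.
overlaps = {
    (x, y): ranges_overlap(background_rates[x], background_rates[y])
    for x, y in combinations(background_rates, 2)
}

for (x, y), shared in overlaps.items():
    print(f"{x} vs {y}: ranges overlap = {shared}")
```

Every pair of ranges overlaps, so the clade-specific Mhc1 rates (80% and about 1.7 times the rest of the tree) are not simply mirrored by the genome-wide background.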

Behavior, life history traits, and demography are among the factors that influence the selection pressures a particular lineage will encounter, and each has the potential to affect the evolution of its immune genes (e.g., Lee et al. 2008; O’Connor et al. 2018). The Neoaves underwent the highest known diversification rate among vertebrate radiations following the K–Pg mass extinction, which opened up an unprecedented variety of ecological niches (Alfaro et al. 2009; Feduccia 2003). Therefore, an increased likelihood of encountering parasites and a wider diversity of parasites could drive more rapid evolution of the Mhc1 gene in Neoaves. Conversely, the Galloanserae and Paleognathae may have a decreased risk of encountering novel parasites (e.g., because of more limited dispersal in the flightless lineages of those clades), which may relax selection pressure on Mhc1 in Galloanserae. However, as with the interpretation of the rest of our results, the necessarily small sample sizes (i.e., number of species) representing the Galloanserae and Paleognathae may impede detection of differences within and among them.

The framework used herein could be applied to detect varying modes and intensities of selection based on phylogenetic hypotheses. For example, migratory species are likely exposed to a wider variety of pathogens on average than non-migratory species, yet migratory behavior has been shown to reduce disease risk (Hall et al. 2014); one might therefore hypothesize intensification of selection on immune genes in the migratory group. Conservation status is another example: vulnerable and endangered species often exhibit reduced genetic diversity in general (Willoughby et al. 2015), and one might expect relaxation of selection on immune genes in those species relative to species of least concern because of the inefficiency of natural selection in small populations.
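Testing such hypotheses requires partitioning the tree into a group of interest and a reference group before selection is compared between them. The Python sketch below shows one way this could be prepared, by tagging chosen tips of a Newick tree with a curly-brace label. It assumes the downstream program reads labels of this form (a convention used by HyPhy); the function name, tag, tree, and species names are all hypothetical and are not taken from this study.

```python
import re


def label_foreground(newick, foreground_tips, tag="Foreground"):
    """Append {tag} to every tip name in `newick` that is in `foreground_tips`."""

    def tag_tip(match):
        name = match.group(1)
        return f"{name}{{{tag}}}" if name in foreground_tips else name

    # A tip name directly follows '(' or ',' and runs until ':', ',', ')' or ';'.
    # Internal node labels follow ')' and are therefore left untouched.
    return re.sub(r"(?<=[(,])([^(),:;]+)", tag_tip, newick)


# Hypothetical four-species tree with two migratory and two resident tips.
tree = "((migrant_a:0.1,resident_a:0.2):0.05,(migrant_b:0.3,resident_b:0.1):0.02);"
labelled = label_foreground(tree, {"migrant_a", "migrant_b"})
print(labelled)
```

Labelling only the tips keeps the hypothesis explicit; whether internal branches should also be assigned to the group of interest is a separate modelling decision that this sketch does not address.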

Conclusion

This study amassed a relatively large and phylogenetically diverse set of species for inference of avian divergence and signatures of selection at three loci, including representatives from both the adaptive and innate immune gene families. We found strong evidence of episodic diversifying selection in Mhc1 and Tlr2b gene-wide, at codon sites, and at branches (lineages). The control Ubb gene did not show evidence of selection, as expected given its conserved nature. This indicates that our results reflect biological mechanisms: selection promotes amino acid diversity in immune genes in such a way as to widen the spectrum of pathogen ligands that can be recognized. Mhc1 and Tlr2b have diverged at different rates, with Mhc1 having almost three times the gene tree length of Tlr2b and over twice the percentage of sites and lineages under selection, and thus evolving at least twice as fast. Our data provide insights into avian immune gene evolution, consistent with the adaptive immune system playing a more critical role in avian evolution than the innate immune system. Our data are also consistent with the adaptive immune system evolving more rapidly in the Neoaves than in other avian clades.