Introduction

The major histocompatibility complex (MHC) encompasses families of highly polymorphic genes encoding antigen-presenting molecules with a central role in the acquired immune response. MHC class I gene products, in association with β2-microglobulin, form molecules that bind peptides derived from intracellular pathogens and display them at the cell surface, where recognition by CD8+ T cells initiates an immune response (Klein 1986). The specificity of peptide binding is determined by residues of the peptide-binding region (PBR), which is formed from the α1 and α2 domains encoded by the second and third exons of MHCI genes, respectively. Classical MHC class Ia genes are widely expressed and are typified by high variability and positive selection acting upon substitutions at PBR sites, in addition to a large number of alleles maintained by the action of balancing selection. In contrast, nonclassical MHC Ib genes usually show reduced expression or limited tissue distribution, are less polymorphic than class Ia genes, and may have restricted or modified functions (Hughes and Nei 1988, 1989; Rodgers and Cook 2005).

Avian MHCI gene function and evolution are most fully described for the chicken (Gallus gallus, order Galliformes), where the BF–BL region of the “minimal essential” MHC (the B locus of microchromosome 16) contains two classical MHCI genes: the major, or predominantly expressed, BF2 locus and the less variable minor gene, BF1 (Kaufman 1999; Kaufman et al. 1999a, b). Additional MHCI gene family members, including at least one functional nonclassical locus, localize to the secondary MHC-Y region that segregates independently of the classical MHC (Miller et al. 1994; Afanassieff et al. 2001; Hunt et al. 2006; Hee et al. 2010). Expanded MHCI gene numbers are described for two other galloanseriforms: the Japanese quail (Coturnix japonica, order Galliformes) and the mallard duck (Anas platyrhynchos, order Anseriformes). However, only one (duck) or two (quail) genes exhibit the hallmarks of classical MHCI loci (with predominant expression of a single quail locus, Coja-E; Shiina et al. 2006), which suggests that a functional minimization of this gene family, at least in terms of classical gene function, could be common to birds (Shiina et al. 2004; Moon et al. 2005). Predominant expression of a major MHCI locus in galloanseriforms is attributed to tight linkage with the transporter associated with antigen processing (TAP) genes TAP1 and TAP2, which is believed to result in the coevolution of alleles with coordinated functional specificities within haplotypes (Kaufman 1999; Kaufman et al. 1999a; Mesa et al. 2004). Expanded numbers of MHCI genes are also found in passerines (perching birds, order Passeriformes), but functional categorization of all gene family members is not currently available, and it is not yet known if a major MHC class I locus is typical within this group (Westerdahl 2007).

Despite continuing investigation of avian MHCI genes, and an increasing interest in their use as markers of adaptive genetic variation in evolutionary ecology research (Edwards et al. 2000; Piertney and Oliver 2006; Babik 2010), there remain many important gaps in our knowledge of MHCI multigene family evolution in birds. Most studies have focused upon the isolation of cDNA transcripts or the variation within the polymorphic PBR exons, with the result that no full-length sequences have been described for non-galloanseriforms. Limited DNA sequence information, combined with differences in gene copy number, the presence of pseudogenes, and the inability to identify orthologous loci from sequence similarity in any but the most closely related species (Hughes and Nei 1989; Shawar et al. 1994) necessitates de novo characterization of MHCI genes in each studied species. However, locus-specific amplification in birds is commonly hampered by the existence of highly conserved sequences surrounding the PBR, and locus assignment based upon PBR sequence similarity is often impeded by conflicting evolutionary signals due to recent duplication, recombination and/or concerted evolution (Edwards et al. 2000; Hess and Edwards 2002; Westerdahl 2007; Babik 2010). Consequently, the evolutionary forces acting upon each gene cannot be disentangled from a larger picture of polymorphism and selective pressures averaged across all loci, and the relative contributions of classical versus nonclassical genes to overall variation cannot be evaluated. Finally, current knowledge of avian MHCI genes remains taxonomically restricted, which limits the potential of comparative methods and a broader understanding of the evolutionary processes acting upon these genes.

In this study, we aimed to isolate full genomic and cDNA MHCI sequences from the red-billed gull (Larus scopulinus, order Charadriiformes), and to survey the variation in PBR-encoding exons for all loci. The long-term study of red-billed gull breeding ecology (Mills 1989) makes this species an excellent choice as the pioneer for MHCI research in charadriiforms (shorebirds, gulls, and allies), as a large number of known family groups are available to facilitate the assignment of alleles to loci through segregation analyses. Characterization of the variation, selection, and expression of individual loci was pursued to assess whether a major classical MHCI locus exists in the red-billed gull. Ancillary methodological goals were to investigate the usefulness of noncoding sequence variation for the design of locus-specific amplification strategies, and to evaluate the applicability of reference-strand conformation analysis (RSCA, supplemental Fig. S1) as a tool for screening MHCI variation in a nonmodel bird species with no existing genetic resources. RSCA is emerging as a popular option for MHC genotyping (e.g., Noakes et al. 2003; Pratt et al. 2006; Worley et al. 2008; Lenz et al. 2009; Babik 2010). However, its use in avian MHC studies is largely limited to the red jungle fowl (G. gallus gallus; Worley et al. 2008; Gillingham et al. 2009), where extensive research of its domesticated descendant, the chicken, provides a priori knowledge of MHC organization and gene content that is unavailable for other species.

Materials and methods

Standard PCR and sequencing conditions

PCR reactions contained final concentrations of 1X commercially supplied buffer (20 mM Tris–HCl pH 8.4, 50 mM KCl), 1.5 mM MgCl2, 0.2 mM each deoxyribonucleotide triphosphate, 0.5 μM each of forward and reverse primers, 5% dimethyl sulfoxide, 1/3 U Platinum Taq DNA polymerase (Invitrogen), and 50–100 ng template DNA in a 25-μL final reaction volume. PCR cycling conditions were: initial denaturation at 94°C for 5 min followed by 35 cycles of denaturation at 94°C for 45 s, annealing for 45 s, and extension at 72°C, and a final elongation step at 72°C for 5 min. A minimum extension time of 30 s was used; for longer PCR fragments, extension times of 1 min per kilobase amplified were used.
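The extension-time rule above can be sketched as a small helper. This is only an illustration: the function name and the reading that the 30 s minimum acts as a floor under the 1 min per kilobase rate are assumptions, not part of the published protocol.

```python
def extension_time_s(amplicon_bp: int, min_s: float = 30.0, s_per_kb: float = 60.0) -> float:
    """Return an extension time in seconds for an amplicon of the given length.

    Assumption: the 30 s minimum is applied as a floor, and longer
    fragments scale at 1 min (60 s) per kilobase amplified.
    """
    if amplicon_bp <= 0:
        raise ValueError("amplicon length must be positive")
    return max(min_s, amplicon_bp / 1000.0 * s_per_kb)


if __name__ == "__main__":
    # Fragment lengths taken from the text; the labels are descriptive only.
    for label, bp in [("350-bp exon 3 fragment", 350), ("1,844-bp exon 3 to 3'UTR fragment", 1844)]:
        print(f"{label}: {extension_time_s(bp):.0f} s")
```

Under this reading, any fragment of 500 bp or less receives the 30 s minimum.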

Following electrophoresis, PCR products were recovered from agarose gels by band excision and filtered pipette tip centrifugation (Dean and Greenwald 1995). Standard sequencing reactions used 4 μL of recovered DNA with Big Dye Terminator v. 3.1 chemistry (Applied Biosystems), and fragments were resolved by automated capillary electrophoresis on an ABI 3100.

Isolation of MHCI cDNA transcripts

A directionally cloned, pooled cDNA library was commercially prepared (Bio S&T, Montreal, Canada) from spleen tissue harvested from a single adult female, with each library pool consisting of 1,000 clones carrying primary transcripts ligated into pBlueScript SK vector. Hybridization-based screening used a 167-bp probe amplified from a conserved region of exon 4. Following PCR amplification with primers MHC1AF/MHC1AR (Table 1), probe DNA was biotin-labelled using the NEBlot Phototope Kit (New England BioLabs). Hybridizations of colony-growth membranes were performed at 65°C, with subsequent detection of cDNA pools producing positive signals to exon 4 probe using the Phototope-Star Detection Kit for Nucleic Acids (New England BioLabs). DNA from each positive cDNA pool was amplified with primers MHC1OF/MHC1KR and sequenced to identify the MHCI allele present in the pool. Full transcript sequences were obtained from five cDNA pools per allele using primer combinations MHC1OF/M13F and MHC1SR/M13R.

Table 1 Primers used in MHC class I gene characterization and survey of peptide-binding region variation

PCR-based cDNA library screening was used to detect alleles that had been isolated from genomic DNA but were not found during hybridization-based screening. Amplification with gene-specific primer combinations A3F/R (Lasc-UBA), A4F/R (Lasc-UCA), and A5-6F/R (Lasc-UDA; Table 1) was used to identify cDNA pools uniquely containing transcripts from each of these loci. Full transcript sequences were subsequently isolated from three such pools per gene using each gene-specific primer in combination with either M13F or M13R and an annealing temperature of 60°C in all instances.

Isolation of genomic MHCI sequences

A ∼1,500-bp fragment spanning exons 2–5 and a 1,844-bp fragment covering exon 3 to the 3′UTR were amplified using primer combinations MHC1OF/MHC1KR and MHC1LF/MHC1ER, respectively, from the same individual used for cDNA library construction. PCR products were cloned into pCR 2.1 vector using The Original TA Cloning Kit (Invitrogen), and inserts amplified with M13F/M13R primers and sequenced according to standard methods. Sequences extending to the 5′ and 3′ ends of cloned fragments were obtained from shotgun sequencing and primer walking in cosmid clones containing each MHCI gene (A. Cloutier, unpublished data) from the same individual used for the cDNA and genomic cloning studies.

Screening of peptide-binding region variation

Sample collection and DNA extraction

Blood samples were collected from individually colour- and metal ring-banded gulls at a breeding colony in Kaikoura, New Zealand during the 1992–2004 breeding seasons. DNA was extracted using the DNeasy Blood & Tissue Kit (Qiagen). MHCI variation was screened in 470 individuals, consisting of 45 family groups where chicks had known maternity and paternity and 71 family groups with known maternity only (32 female–female social pairs; 39 male–female pairs with broods of mixed paternity or all extrapair young), with 1.75 ± 0.82 (mean ± SD) genotyped offspring per family group. Parentage assignments made from field observations were verified using multilocus microsatellite genotyping (J. Mills and A. Cloutier, unpublished data).

Reference-strand conformation analysis

A ∼1,300-bp fragment spanning the 3′ end of intron 1 through the 5′ end of intron 3 was simultaneously amplified from all MHCI genes with primers MHCI-F5/EX3R1 in 15 individuals. Following TA cloning and M13F/R amplification of insert DNA, 8–12 clones/individual were sequenced.

Five alleles (Lasc-UAA*03, UAA*04, UBA*01, UCA*01, and UDA*07) were tested as potential fluorescently labelled references (FLRs). A 350-bp fragment spanning exon 3 was amplified from cloned DNA with primers EX3F1/EX3R2 using the standard conditions outlined above, except: (1) the forward primer was 5′ end labelled with VIC dye (Applied Biosystems), and (2) a 5:1 ratio of forward to reverse primer was used. FLR products were purified using the QIAquick PCR Purification Kit (Qiagen) to remove unincorporated primers and stored at 4°C until use.

Products for simultaneous screening of MHCI genes were amplified from genomic DNA using primers EX3F1/EX3R2. Additionally, gene-specific RSCA was performed for Lasc-UCA and Lasc-UDA to allow preliminary locus assignment (i.e., assignment of alleles to Lasc-UCA, -UDA, or -UAA and -UBA). Initial PCRs used standard conditions with primers Lin2F/EX3R1 (Lasc-UCA) or Lin3F/EX3R1 (Lasc-UDA). PCR products were diluted 100-fold, and 2.5 μL of diluted DNA was used in a reamplification with primers EX3F1/EX3R2 and a touchdown profile of 71→68.5°C (−0.5°C/cycle), followed by 30 cycles at 68.5°C annealing temperature.

Two microliters of a 1/25 dilution of FLR and 2.5 μL of genomic PCR product were mixed, and hybridized as follows: 95°C for 10 min, decreased to 55°C (−1°C/sec), 55°C for 15 min, decreased to 4°C at maximum ramp speed, and 4°C for 15 min. Following hybridization, 15 μL of H2O was added to each reaction. Sample plates were prepared for processing on an ABI 3100 by combining the following in each well: 3 μL diluted hybridization mix, 0.3 μL LIZ500 size standard (Applied Biosystems), and 11.7 μL H2O. Samples were run under nondenaturing conditions using 5% Genescan polymer (Applied Biosystems), 1X TBE buffer, and the following run module settings: 30°C oven temperature, 1 kV injection voltage, 30 s injection time, 15 kV run voltage, and 2,700 s run time.
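As a quick check of the pipetting arithmetic described above, the volumes can be totalled as follows. The constant and function names are illustrative only; the volumes are those stated in the text.

```python
# Volumes (in microlitres) as stated in the hybridization and plate set-up steps.
FLR_UL = 2.0          # 1/25-diluted fluorescently labelled reference
PCR_UL = 2.5          # genomic PCR product
WATER_UL = 15.0       # water added after hybridization
ALIQUOT_UL = 3.0      # diluted hybridization mix loaded per well
SIZE_STD_UL = 0.3     # LIZ500 size standard
WELL_WATER_UL = 11.7  # water added per well


def hybridization_volume_ul() -> float:
    """Total volume of the diluted hybridization mix."""
    return FLR_UL + PCR_UL + WATER_UL


def well_volume_ul() -> float:
    """Total volume loaded in each well of the sample plate."""
    return ALIQUOT_UL + SIZE_STD_UL + WELL_WATER_UL


def pcr_product_fraction_in_well() -> float:
    """Fraction of the well volume that is original genomic PCR product."""
    return (PCR_UL / hybridization_volume_ul()) * (ALIQUOT_UL / well_volume_ul())


if __name__ == "__main__":
    print(f"hybridization mix: {hybridization_volume_ul():.1f} uL")
    print(f"well volume: {well_volume_ul():.1f} uL")
    print(f"PCR product fraction per well: {pcr_product_fraction_in_well():.4f}")
```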

PBR exon sequencing

Direct sequencing of gene-specific amplification products was used to obtain sequences for all alleles identified during RSCA screening. PCRs spanning exon 3 used primer combinations: Lin145F/Lin45-3UTR2 (Lasc-UAA), Lin145F/Lin1-3UTR1 (Lasc-UBA), Lin2F/EX3R1 (Lasc-UCA), and Lin3F/EX3R1 (Lasc-UDA). Whenever possible, homozygous individuals (as determined from RSCA profiles) were used. Otherwise, different heterozygous combinations of alleles were amplified and sequenced, and haplotype assignments made using PHASE v. 2.1 software (Stephens and Scheet 2005).

Exon 2 sequences were also obtained for at least three copies of each RSCA allele. Lasc-UCA and -UDA were amplified with gene-specific primer combinations MHCI-F5/Lin2R and MHCI-F5/Lin3R, respectively. Lasc-UAA and -UBA could not be amplified independently for exon 2. Instead, homozygotes for each gene were co-amplified using primers MHCI-F5/Lin145R and haplotypes inferred using PHASE. Low-frequency alleles that could not be typed in this manner were amplified using allele-specific primer combinations MHCI-F5/Lin45BOV and MHCI-F5/Lin45FX (Lasc-UAA), and MHCI-F5/Lin1-EX3R (Lasc-UBA).

Genotyping of multilocus gene family members is prone to the formation of chimeric sequence artefacts during PCR and cloning procedures (Longeri et al. 2002; Lenz and Becker 2008). We therefore validated all described alleles by ensuring that cloned allele sequences coincided with alleles found during RSCA screening of the same individual, by sequencing at least three copies of each allele originating from different individuals using gene-specific PCR reactions, and by confirming all allele identities and locus assignments with segregation analyses in known family groups.

Data analysis

Analysis of sequence variation

ChromasPro v. 1.5 (Technelysium Pty Ltd.) was used for sequence editing and contig assembly. Sequences were aligned using Clustal W (Thompson et al. 1994) as implemented in BioEdit Sequence Alignment Editor v. 7.0.5.3 (Hall 1999). Sequence variability statistics were calculated in Molecular Evolutionary Genetics Analysis v. 4 (Tamura et al. 2007), including average rates of synonymous (dS) and nonsynonymous (dN) substitutions per site and codon-based Z tests of positive selection according to the Nei–Gojobori method with Jukes–Cantor correction for multiple substitutions (Nei and Gojobori 1986). Allele frequencies of unrelated adults (N = 230) were calculated in Cervus v. 3.0.3 (Kalinowski et al. 2007), and Ewens–Watterson neutrality tests of allele frequency distributions (Ewens 1972; Watterson 1978) were conducted using Arlequin v. 3.5.1.2 (Excoffier et al. 2005).
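The Jukes–Cantor correction step of the Nei–Gojobori method can be sketched as below. This is not the MEGA implementation. It assumes that the numbers of synonymous and nonsynonymous differences and sites have already been counted for a pair of sequences, and the function names and example counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differences p for multiple substitutions.

    Uses d = -(3/4) * ln(1 - 4p/3), which is defined only for 0 <= p < 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion of differences must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(syn_diffs: float, syn_sites: float,
          nonsyn_diffs: float, nonsyn_sites: float) -> tuple[float, float]:
    """Return (dN, dS) from pre-counted differences and sites."""
    d_s = jukes_cantor(syn_diffs / syn_sites)
    d_n = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    return d_n, d_s


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not data from this study.
    d_n, d_s = dn_ds(syn_diffs=2, syn_sites=20, nonsyn_diffs=10, nonsyn_sites=50)
    print(f"dN = {d_n:.4f}, dS = {d_s:.4f}")
```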

Genetic Algorithm Recombination Detection (GARD; Kosakovsky Pond et al. 2006) was used to identify recombination breakpoints. LDhat v. 2.1 (McVean and Auton 2007) was used to calculate the recombination parameter 4Ner, population-scaled recombination rate (ρ), minimum number of recombination events (Rm), and the Watterson estimate of the mean population mutation rate (θ). Geneconv v. 1.81 (Sawyer 1999) was used to identify homogenized sequence tracts consistent with interlocus gene conversion.
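For orientation, the Watterson estimate of θ divides the number of segregating sites by a harmonic-number term that depends on sample size. The sketch below shows that standard formula. It is not LDhat code, and the function name and example inputs are hypothetical.

```python
from typing import Optional


def watterson_theta(segregating_sites: int, n_sequences: int,
                    n_sites: Optional[int] = None) -> float:
    """Watterson estimate of theta: S / a_n, with a_n = sum_{i=1}^{n-1} 1/i.

    If n_sites is given, the estimate is returned per site.
    """
    if n_sequences < 2:
        raise ValueError("at least two sequences are required")
    a_n = sum(1.0 / i for i in range(1, n_sequences))
    theta = segregating_sites / a_n
    return theta / n_sites if n_sites is not None else theta


if __name__ == "__main__":
    # Hypothetical example: 11 segregating sites among 4 sequences of 540 bp.
    print(watterson_theta(11, 4))
    print(watterson_theta(11, 4, n_sites=540))
```

Comparing a per-site θ of this kind with a per-site ρ gives a rough sense of the relative contributions of mutation and recombination at a locus.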

Phylogenetic analysis

The use of a bifurcating treelike structure to describe sequence evolution of PBR-encoding exons may be unrealistic in the presence of conflicting phylogenetic signals resulting from gene conversion and/or recombination reported for avian MHCI genes (Alcaide et al. 2009; Promerová et al. 2009). Instead, phylogenetic network methods that allow the modeling of incompatibilities in the dataset as parallel splits may be warranted. SplitsTree v. 4.10 (Huson and Bryant 2006) was used to construct a neighbor-net phylogenetic network from concatenated exon 2 and 3 nucleotide sequences (540 bp total), using a GTR + Γ substitution model with empirical base frequencies and gamma distribution shape parameter α = 0.17032, as recommended by FindModel (Tao et al. 2009), and 1,000 bootstrap resamplings of the dataset.

Inference of positive selection

OmegaMap v. 0.5 (Wilson and McVean 2006) implements a Bayesian strategy using reversible-jump Markov chain Monte Carlo (MCMC) simulations to co-estimate spatial variation in the selection parameter (ω = dN/dS) and recombination rate (ρ). Analyses were run on concatenated exon 2 and 3 nucleotide datasets based on adult allele frequencies for each gene. Three independent MCMC chains of 500,000 iterations were run for each dataset, with ten random orderings of the data, a thinning interval of 100, and the first 50,000 iterations discarded as burn-in after graphical inspection of the raw output. Objective priors were set for the transition to transversion rate ratio (κ), synonymous transversion rate (μ), and rate of insertion to deletion (φ) by specifying an improper inverse distribution and starting values of 3.0, 0.1, and 0.1, respectively. The spatial variation in selection (ω) and rate of recombination (ρ) were allowed to vary across sites, using an inverse distribution and starting values ranging from 0.01 to 100 for ω and 0.001–100 for ρ, and block sizes of 2 for both parameters.
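The chain settings above imply the following number of retained posterior samples. This is a back-of-the-envelope sketch that assumes one sample is recorded every 100 iterations and that the burn-in is counted in iterations; the function name is illustrative and this is not OmegaMap code.

```python
def retained_samples(iterations: int, burn_in: int, thinning: int) -> int:
    """Number of thinned samples left in one chain after discarding burn-in."""
    if thinning <= 0:
        raise ValueError("thinning interval must be positive")
    if not 0 <= burn_in <= iterations:
        raise ValueError("burn-in must lie between 0 and the chain length")
    return (iterations - burn_in) // thinning


if __name__ == "__main__":
    per_chain = retained_samples(iterations=500_000, burn_in=50_000, thinning=100)
    n_chains = 3
    print(f"{per_chain} samples per chain, {per_chain * n_chains} across {n_chains} chains")
```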

Results

Isolation and characterization of red-billed gull MHCI genes

Four MHCI genes were isolated from genomic DNA of a single individual and named MhcLasc-UAA, MhcLasc-UBA, MhcLasc-UCA, and MhcLasc-UDA in accordance with proposed nomenclature (Klein et al. 1990), and with alleles at each locus numbered in order of their validation during the RSCA screening procedure. Full-length sequences were obtained for alleles Lasc-UAA*11, Lasc-UBA*01, and Lasc-UCA*01 (GenBank accessions HM008713–HM008715), and a partial sequence covering exon 2 through the end of the 3′UTR was obtained for allele Lasc-UDA*03 (HM008716).

Hybridization-based cDNA library screening detected 77 (of 1440) library pools producing strong positive signals. Allele Lasc-UAA*08 (HM015819) or Lasc-UAA*11 (HM015818) was identified in each positive pool, and full-length cDNA sequences were obtained for each allele. Full-length transcripts for alleles Lasc-UBA*01 (HM015820) and Lasc-UCA*01 (HM015821), and a partial transcript spanning exon 2 through to the poly-A tail for allele Lasc-UDA*03 (HM015822) were isolated by PCR-based re-screening of the library. In all instances, exon and UTR sequences obtained for cDNA transcripts were identical to consensus sequences derived from cloning and primer walking in genomic DNA.

The red-billed gull MHCI genes show the expected eight exons encoding the signal peptide, α1, α2, α3, transmembrane, and cytoplasmic domains. No frameshifts or premature stop codons were observed, and canonical donor–acceptor splice sites (GT/AG) and polyadenylation recognition sequences (AATAAA or ATTAAA) were present in all sequences. Exon sizes are similar to those found in other birds, but longer introns (in particular, introns 1 and 2; supplemental Fig. S2) resulted in total gene lengths from start to stop codons of 4,370 bp for Lasc-UAA, 3,626 bp for Lasc-UBA, and 3,513 bp for Lasc-UCA as compared to a range of 1,957–2,555 bp in galloanseriform species (Japanese quail Coja-B1 [AB078884] and goose Anser sp. isolate G5 [AY387655], respectively).

The region spanning intron 3 to the beginning of the 3′UTR is highly similar among all loci, with a minimum pairwise nucleotide identity of 99% (between Lasc-UBA and -UDA, in 1,382 bp of sequence). There is a single variable site within the first 173 bp of 3′UTR sequence (of 807–857 bp in total) among all four loci. After this point, Lasc-UBA sequence cannot be aligned to those of the other three loci. In contrast, the 3′UTR sequences of Lasc-UCA and -UDA are identical over their entire lengths, and highly similar to Lasc-UAA (98.6% nucleotide identity), excepting a 47-bp deletion occurring 72 bp before the end of the 3′UTR. The first and second introns have much lower levels of nucleotide identity among loci and regions with long insertions/deletions, with the exception of Lasc-UAA and -UBA that share 97.9% sequence identity across the entire second intron (with a total size of 752–754 bp).

Conservation of sequence features with predicted functional or structural roles

Eight residues in the α1 and α2 domains contact mainchain atoms of the bound ligand, and anchor the peptide termini in a sequence-independent manner (Bjorkman et al. 1987; Saper et al. 1991; indicated by dark grey shading in Fig. 1). Classical MHCI genes of human and mouse have no more than two amino acid replacements at these sites, with substitutions tending to be of a conservative nature (Shum et al. 1999). Lasc-UAA*11 shows no deviation from the consensus “YYRTKWYY” sequence (in non-mammalian vertebrates, arginine is substituted for tyrosine at alignment position 87 [human leukocyte antigen (HLA)-A2 position 84]; Kaufman et al. 1994). Lasc-UBA has a slightly disfavoured substitution of asparagine (N) for tyrosine (Y) at position 7 (Betts and Russell 2003). Lysine (K) is substituted for arginine (R) at position 148 of Lasc-UCA; the same substitution occurs in the transcribed sequence from the great reed warbler shown in Fig. 1, in all described alleles in the scarlet rosefinch (Carpodacus erythrinus; Promerová et al. 2009), in both nonclassical (YF1, formerly YFV) and classical (BF2) chicken MHCI genes (Afanassieff et al. 2001), and in red-billed gull alleles Lasc-UAA*15, Lasc-UBA*03, and Lasc-UDA*08 (identified during RSCA screening, and shown in Fig. 2). In Lasc-UDA, the tyrosine (Y) at position 162 is replaced by cysteine (C). Y→C is a disfavoured substitution type in extracellular proteins (Betts and Russell 2003), and results in an unpaired cysteine within the α2 domain of all Lasc-UDA alleles (Fig. 2, position 157). A conservative substitution of serine (S) for threonine (T) also occurs in alleles Lasc-UDA*02 and Lasc-UDA*04 (Fig. 2, position 140) as well as in classical MHCI genes of the chicken (Livant et al. 2004) and scarlet rosefinch allele CaerU*23 (Promerová et al. 2009). Thus, no red-billed gull allele has more than two substitutions from the consensus sequence of mainchain-binding residues, although some nonconservative amino acid replacements are observed.

Fig. 1 Amino acid alignment of avian MHCI sequences and reptilian outgroup. The first residue of the α1 domain is considered as alignment position 1. Dots indicate identity with Lasc-UAA, while dashes indicate gaps. Conserved sequence features with predicted functional or structural roles are shown as: peptide mainchain-binding residues (dark grey shading), intra- and interdomain contacts (light grey shading), intradomain disulfide bridges (horizontal lines), N-glycosylation site (bracket), CD8 binding sites (“8”s), critical CD8 co-receptor sites (*), phosphorylated residues in cytoplasmic tail (black diamond) (Bjorkman et al. 1987; Saper et al. 1991; Grossberger and Parham 1992; Kaufman et al. 1994). Sequence sources are: Grca (Florida sandhill crane, Grus canadensis pratensis, AF033106); Coja (Japanese quail, C. japonica Coja-B1, AB078884); Gaga (chicken, G. gallus BF2, AL023516); Mega (turkey, M. gallopavo alpha chain 2, DQ993255); Anpl (mallard duck, A. platyrhynchos UAA, AY885227); Acar (great reed warbler, Acrocephalus arundinaceus, AJ005503); Sppu (tuatara, Sphenodon punctatus, DQ145788)

Fig. 2 Alignment of translated amino acid sequences for PBR exons of red-billed gull MHCI genes. Alleles with identical translations are shown as single sequences. Dots indicate identity to the majority rule consensus sequence. Annotation of conserved residues follows Fig. 1, and codons corresponding to PBR residues in human HLA-A2 are indicated by a black circle (Bjorkman et al. 1987; Saper et al. 1991)

Of 18 residues involved in MHCI intra- and interdomain contacts that are highly conserved across vertebrates (Grossberger and Parham 1992), 17 are strictly conserved in all red-billed gull genes (Fig. 1). Threonine (T) at position 10 is also conserved in Lasc-UCA, but is replaced by valine (V) in Lasc-UAA, -UBA, and -UDA. The T→V substitution is considered neutral in extracellular proteins (Betts and Russell 2003) and is also seen in the transcribed great reed warbler sequence.

Residues that form mainchain and domain contacts are also conserved in chicken allele BF2*2101, for which two conformational structures have been experimentally determined (Koch et al. 2007). Predicted PBR residues for these chicken MHCI molecules largely coincide with those of human HLA-A2 (exceptions are chicken PBR residues 61 and 121, which show low variability in the red-billed gull). However, no red-billed gull allele possesses the unusual combination of small residues at both positions 69 and 97, and residues of counterbalancing charge at positions 9 and 24 (basic and acidic residues, respectively; residue numbering follows Fig. 2) described for BF2*2101. It therefore seems unlikely that the enlarged central cavity of the peptide-binding groove and the novel charge-transfer system that permit promiscuous binding of peptides of unrelated sequence by BF2*2101 (Koch et al. 2007) are shared by the red-billed gull.

MHCI promoter elements are conserved in the red-billed gull (Fig. 3), although this region was not sequenced for Lasc-UDA. The enhancer A NFκB1 binding site of all red-billed gull sequences is identical to those of human HLA-A and HLA-B (van den Elsen et al. 1998), whereas substitutions occur in other avian sequences. Identifiable interferon-stimulated response element and SXY module elements (van den Elsen et al. 1998) are present in all avian sequences. Deletions are seen in the proximal promoter region of Lasc-UCA, and a point mutation occurs within the CAAT-box, but it is unknown how these might affect the initiation of transcription. cDNA transcripts were isolated for each locus; thus, while these differences may modulate expression level, they do not appear to result in pseudogenization.

Fig. 3 Avian MHCI promoter sequences. Alignment position “0” is the first position of the start codon triplet. Dots indicate identity with Lasc-UAA, while dashes indicate gaps. Regulatory element motifs are boxed. Sequence sources are as in Fig. 1

Sequence variation in peptide-binding region exons

Complete multilocus genotypes were obtained for 470 individuals using a combination of RSCA and sequencing of gene-specific amplification products. All patterns of allelic segregation within known family groups were consistent with Mendelian inheritance at each locus. Five reference alleles were tested for use in the RSCA procedure; of these, Lasc-UAA*03 and UAA*04 produced the highest quality data and greatest resolution between known alleles and were used in the screening of test samples (Fig. 4, supplemental Fig. S3). Some alleles could not be resolved by RSCA typing alone (listed in caption to Fig. 4). In particular, five of the nine alleles at locus Lasc-UDA could not be differentiated, and genotype assignments were confirmed using direct sequencing of gene-specific PCR products for all samples at this locus.

Fig. 4 Mean electrophoretic mobility of alleles hybridized with FLR-UAA*03 (black triangle) and FLR-UAA*04 (). Alleles not distinguished by RSCA alone are: UAA*16 vs. a combination of UAA*04 and UAA*16; UAA*07 vs. UBA*08; UDA*01, *03, *06, *07, and *09 from each other; and alleles at the upper size range of UBA*01 from those at the lower size range of UBA*02. Mean values for these alleles were calculated following final genotype assignment by direct sequencing of PCR products. Standard deviation error bars do not extend beyond value symbols

Thirty-eight unique nucleotide sequences were identified from exon 3 screening and sequencing. Sequencing of exon 2 revealed one allele (Lasc-UAA*02) with two different exon 2 haplotypes (Lasc-UAA*02a and b), thus raising the total number of alleles to 39 (deposited in GenBank under accession numbers HM025950–HM025988). For both exons combined, 34 unique alleles were found at the amino acid level. When translated, no Lasc-UAA alleles shared identical amino acid sequences across both exons, whereas two to three alleles identical at the amino acid level were found in each of the other three genes (Table 2, Fig. 2). Lasc-UAA has the greatest number of alleles and greatest nucleotide and amino acid variability among alleles (Table 2). Histograms of allele frequency data (Fig. 5) show a much more even distribution for the 16 Lasc-UAA alleles than is found for the other three genes, and the observed homozygosity at this locus is significantly lower than expected under neutrality, consistent with the action of balancing selection (Table 3).
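The observed homozygosity used as the test statistic in the Ewens–Watterson test is the sum of squared allele frequencies, so a more even frequency distribution gives a lower value. The sketch below computes only this statistic from allele counts; it does not reproduce the neutral expectation or significance testing performed in Arlequin, and the function name and example counts are hypothetical.

```python
from typing import Sequence


def observed_homozygosity(allele_counts: Sequence[int]) -> float:
    """Sum of squared allele frequencies (F) for one locus."""
    total = sum(allele_counts)
    if total <= 0:
        raise ValueError("allele counts must sum to a positive number")
    return sum((count / total) ** 2 for count in allele_counts)


if __name__ == "__main__":
    # Hypothetical counts for illustration only, not data from this study.
    even = [10] * 16            # 16 alleles at equal frequency
    skewed = [145] + [1] * 15   # one common allele and 15 rare alleles
    print(f"even distribution:   F = {observed_homozygosity(even):.4f}")
    print(f"skewed distribution: F = {observed_homozygosity(skewed):.4f}")
```

For k alleles the minimum possible value is 1/k, reached when all alleles are equally frequent.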

Table 2 Summary of sequence variation and global tests of positive selection for PBR exons

Fig. 5 Allele frequencies of red-billed gull MHCI genes

Table 3 Ewens–Watterson neutrality tests of allele frequencies

Thirty of the 38 PBR residues have predicted roles in sequence-dependent peptide binding—of these, 22 (73.3%) show non-overlapping amino acid identities between at least two genes (Fig. 2, Supplemental Fig. S4). These differences are mostly seen in comparisons involving either Lasc-UCA or -UDA, which have multiple PBR residues that are both invariant and locus-specific. Lasc-UCA is invariant at all PBR residues, while Lasc-UDA varies at a single non mainchain-binding site. In contrast, 29 of 30 non mainchain-binding residues show a full or partial overlap in amino acid identity between Lasc-UAA and -UBA, with Lasc-UBA alleles tending to encompass a subset of the variation found in Lasc-UAA.

All measures indicate substantial intralocus recombination within Lasc-UAA (Table 4). Notably, the per-site population recombination rate (ρ) slightly exceeds the estimate of mutation within the population (θ), indicating that both mutation and recombination contribute substantially to total sequence variation within this locus. In contrast, the other three genes show little or no evidence of intralocus recombination as a major force acting to generate allelic diversity.

Table 4 Summary of intralocus recombination analyses for concatenated exon 2 and 3 sequences

Lasc-UAA and -UBA show evidence of four significant interlocus gene conversion events and four additional, although statistically unsupported, regions of sequence identity within PBR exons 2 and 3 (supplemental Fig. S5). Conversely, Lasc-UCA and -UDA have numerous locus-specific substitutions throughout both PBR exons, although regions of nucleotide sequence identity that are possibly indicative of gene conversion cover the 5′ end of exon 3.

These differences in recombination and/or gene conversion are evident in the network reconstruction of PBR exon sequences, where Lasc-UCA and -UDA form well-differentiated locus-specific clades (Fig. 6). Alleles of Lasc-UBA also cluster in a gene-specific manner, although shorter branch lengths and multiple parallel splits indicate greater conflict in the phylogenetic signal differentiating this clade from alleles of Lasc-UAA. Lasc-UAA does not form a single gene cluster, and numerous alternative splits indicate the prevalence of conflicting evolutionary signals among alleles of this locus, consistent with the much higher estimate of intralocus recombination.

Fig. 6

Phylogenetic network of relationships among red-billed gull MHCI alleles from concatenated exon 2 and 3 nucleotide sequences. Percentage bootstrap supports are shown for Lasc-UBA, -UCA, and -UDA locus-specific clades

Inference of positive selection

The ratio of nonsynonymous to synonymous substitution (ω = dN/dS) is a widely used measure of the action of selection on coding sequences, with ω < 1 indicative of negative (purifying) selection and ω > 1 serving as a signal of positive (diversifying) selection. Global estimates of selection across predicted PBR sites (excluding those involved in binding peptide mainchain atoms) suggest that only Lasc-UAA is subject to diversification resulting from positive selection (Table 2).
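As a minimal illustration of the ω thresholds described above, the sketch below classifies a point estimate of dN/dS. It is a simplification under stated assumptions: the function name, the tolerance parameter, and the example values are hypothetical, and a point estimate alone carries no statistical support, which is why the analyses reported here rely on posterior probabilities instead.

```python
def classify_omega(dn, ds, tol=1e-9):
    """Label the selection regime implied by omega = dN / dS.

    dn, ds: nonsynonymous and synonymous substitution rates.
    tol:    hypothetical tolerance for treating omega as equal to 1.
    """
    if ds <= 0:
        raise ValueError("dS must be positive for omega to be defined")
    omega = dn / ds
    if omega > 1 + tol:
        return "positive"   # diversifying selection
    if omega < 1 - tol:
        return "negative"   # purifying selection
    return "neutral"


# Made-up rate estimates, for illustration only.
print(classify_omega(0.30, 0.10))
print(classify_omega(0.05, 0.20))
```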

Estimation of the spatial variation in selection can be more biologically meaningful than global tests when positively selected sites are interspersed with residues under strong functional constraint or when a priori restriction of analyses to sites “known” to be likely targets of positive selection (e.g., PBR residues) is based upon multiple sequence alignment rather than experimental determination of protein structure in the focal species (Yang et al. 2000; Hughes and Friedman 2008). OmegaMap analyses indicate that Lasc-UAA has the greatest number of residues with posterior probabilities of positive selection ≥0.95 (Fig. 7), and that a large proportion of these residues have predicted roles in sequence-dependent peptide binding (17 of 24 sites, or 70.8%).

Fig. 7

Spatial variation in selection across PBR exons. Point estimates of omega (ω) are plotted as solid black lines on a logarithmic scale, and higher and lower 95% highest posterior probability densities as dotted grey lines. The selection cutoff dN/dS = 1 is shown as a dotted black line. Sites with a posterior probability of positive selection ≥0.9 are indicated by plus sign, those ≥0.95 by double dagger sign, and predicted sequence-dependent PBR residues by black circle. Codon numbering corresponds to Fig. 2

Unlike Lasc-UAA, positive selection does not appear to greatly promote allelic diversification at the other MHCI loci. Six predicted PBR residues of Lasc-UBA showed evidence of positive selection at the 0.95 posterior probability cutoff. However, variation at three of these sites largely coincides with inferred gene conversion tracts between Lasc-UAA and -UBA. It is possible that the observed pattern at these sites reflects an absence of negative selection acting upon these shared PBR residues rather than positive selection, although this interpretation assumes a directionality to gene conversion events that requires further investigation. Lasc-UCA has no PBR sites showing evidence of positive selection, while only a single residue identified for Lasc-UDA has predicted involvement in non-mainchain peptide binding. However, caution should be used in interpreting an excess of nonsynonymous substitution as evidence of positive selection for these loci as, given the low number of observed substitutions, this pattern could equally reflect stochastic variation in dN and dS across codons (Hughes and Friedman 2008).

Discussion

Characterization of red-billed gull MHCI genes

The four MHCI genes presented here are the first complete avian MHCI sequences described outside galloanseriforms (gamebirds + waterfowl), and the first MHC class I genes isolated from Charadriiformes (shorebirds, gulls, and allies). It is likely that these genes represent all intact MHCI gene family members of the red-billed gull, as isolation and screening procedures used three different primer pairs placed in highly conserved regions and targeting different sequence segments. However, additional pseudogenes could exist as gene fragments or orphan exons, as are seen within the Japanese quail B locus (Coja-F, -G, and -H; Shiina et al. 2004). Sequence features typical of antigen-presenting MHCI genes are largely conserved in each red-billed gull gene, and no frameshift mutations, premature stop codons, or inappropriate splice signals were observed. Additionally, cDNA sequences were isolated for each locus, indicating that all are transcriptionally active in spleen tissue.

Only Lasc-UAA transcripts were isolated during hybridization-based cDNA library screening, whereas transcripts for the other three loci were instead obtained during PCR-based screening using gene-specific primers. The hybridization probe was amplified from a conserved region of exon 4 with a single variable nucleotide site among all loci, and was therefore unlikely to result in the biased detection of Lasc-UAA transcripts. Instead, the results suggest that only Lasc-UAA was expressed at a high enough level to produce strong positive signals from cDNA pools (i.e., positive pools likely contained multiple transcripts). In support of this reasoning, 12 of the 77 positive pools contained mixed sequences entirely attributable to transcripts of both Lasc-UAA*08 and Lasc-UAA*11. Major class I genes with high sequence variation and high expression levels across tissue types are described for the chicken and duck, and in each instance are located adjacent to the antigen transporter gene TAP2 (Kaufman 1999; Kaufman et al. 1999a; Mesa et al. 2004). Tight MHCI–TAP2 linkage may result in the coevolution of coordinated functional specificities of haplotypic alleles for these genes, and could explain the occurrence of a single, predominantly expressed MHCI locus (Kaufman 1999; Kaufman et al. 1999a). A similar position and orientation relative to TAP2 was found for Lasc-UAA from cosmid clone sequencing results (A. Cloutier, unpublished data), and Lasc-UAA might therefore represent a major class I gene in the red-billed gull.

In addition to its location relative to TAP2 and inferred higher expression level, the polymorphism and pattern of positive selection of Lasc-UAA are also consistent with its designation as a major classical MHCI gene. Among the four genes, Lasc-UAA has not only the greatest number of alleles and highest amino acid divergence among alleles, but also a more even allele frequency distribution that is suggestive of balancing selection acting to maintain allelic polymorphism, and an excess of nonsynonymous substitution concentrated in residues with predicted involvement in sequence-dependent peptide binding, which is also consistent with balancing selection for PBR diversity acting over longer evolutionary timescales (and resulting in a pattern of positive selection for individual codons, Garrigan and Hedrick 2003; Piertney and Oliver 2006). Both point mutation and intralocus recombination contribute to the observed variation of Lasc-UAA, and the effect of intralocus recombination is evident as multiple parallel splits among alleles of this gene in the network reconstruction of MHCI relationships. High levels of recombination are associated with high exon 3 sequence diversity in the scarlet rosefinch (C. erythrinus), although locus affiliations were unknown in that study (Promerová et al. 2009). Intralocus recombination contributes substantially to the generation of allelic diversity in the Eurasian kestrel (Falco tinnunculus; Alcaide et al. 2009), and to a lesser extent, is also reported for alleles of the major MHCI gene in the chicken (BF2; Hunt et al. 1994; Hunt and Fulton 1998). The relatively large size of Lasc-UAA intron 2 (754 bp) suggests that additional allelic variation could result from recombination within this region. While variation in exon 3 was screened for all individuals, exon 2 sequences were instead obtained from direct sequencing of PCR fragments in a subset of individuals, and the importance of recombination within intron 2 to total sequence variation at this locus remains to be determined.

In the chicken and duck, the occurrence of a major MHCI gene is accompanied by mutational events acting to reduce or eliminate the expression of additional loci, including deletions or nucleotide substitutions in upstream regulatory elements, missing polyadenylation signals, or premature stop codons (Moon et al. 2005; Shaw et al. 2007). In contrast, Lasc-UBA, -UCA, and -UDA each show conservation of sequence features that are typical of functional MHCI genes, and all are transcribed in the spleen. However, differences in polymorphism and selection from those seen for Lasc-UAA suggest that they function as either minor classical class Ia or nonclassical class Ib genes.

Although alleles of Lasc-UBA form a locus-specific clade in network reconstructions of PBR exon nucleotide sequences, substantial overlap occurs between Lasc-UAA and -UBA in the identities of residues with predicted involvement in sequence-dependent peptide binding, with alleles of Lasc-UBA tending to encompass a subset of the variation observed for Lasc-UAA. While this similarity suggests both loci could bind and present foreign peptides in a similar manner, the presumably lower expression level of Lasc-UBA in spleen tissue, in combination with its more limited variation, suggests that this gene is a minor classical class I locus. Both Lasc-UBA and the minor MHCI locus of the chicken (BF1) have fewer alleles, lower sequence variation, and less evidence of positive selection than their “major” classical counterparts, Lasc-UAA and BF2, respectively (Livant et al. 2004; Shaw et al. 2007). However, the eight alleles of Lasc-UBA are less variable than BF1, with only 14 polymorphic amino acid residues observed within the α1 and α2 domains, as compared to 32 variable sites in nine alleles for BF1 (Livant et al. 2004). Furthermore, 16 of these variable BF1 sites have amino acid identities that are not entirely shared with alleles of the major BF2 locus, including a cluster of conserved locus-specific residues within the α1 domain (Livant et al. 2004). In contrast, only three positions in Lasc-UBA encode residues that are not found in any Lasc-UAA allele. Further differences are seen in the upstream regulatory regions, where BF1 alleles have deletions within the enhancer A element or transcriptional start sites of the proximal promoter (Shaw et al. 2007), whereas Lasc-UAA and -UBA share high sequence similarity across the entire promoter region and are identical at all predicted transcription factor binding sites.

Linkage relationships of putative major and minor genes also appear to differ between the red-billed gull and chicken. Serotyping and restriction fragment length polymorphism analyses of offspring from experimental matings suggest that recombination is rare within the BF (class I)–BL (class II) region of the chicken B locus (Skjødt et al. 1985; Hála et al. 1988; Kaufman et al. 1999b), although the identification of a recombination breakpoint within the BF region from DNA sequencing indicates that reshuffling of major and minor MHCI alleles among haplotypes does occasionally occur (Hosomichi et al. 2008). In contrast, although low offspring numbers per family and low polymorphism of Lasc-UBA limited analyses, we detected recombination of Lasc-UAA/UBA parental haplotypes in 7 of 12 red-billed gull family groups, suggesting that haplotypes undergo frequent recombination. An intriguing possibility is that limited Lasc-UBA variation could signal a functional constraint for alleles to retain PBR motifs that are at least somewhat compatible with all TAP translocation specificities, as recombination would frequently reassort alleles of these genes among haplotypes. Alternatively, greater sequence similarity of these red-billed gull loci could reflect a more recent gene duplication event, heightened by sequence homogenization arising from multiple gene conversion events. Fuller knowledge of the function and evolutionary relationships of these loci requires a survey of TAP polymorphism and investigation of major MHCI–TAP coevolution in the red-billed gull, quantitative measurement of gene expression across tissues, identification of epitopes bound by alleles of each gene, and investigation of MHCI genes in other shorebirds to infer the timing of duplication events.

In contrast to the greater sequence variation of the putative classical loci, Lasc-UCA and -UDA showed extremely limited variation. Of the few variable sites, none (Lasc-UCA) or one (Lasc-UDA) corresponded to nonsynonymous substitution of residues with predicted involvement in sequence-dependent binding of ligands, and both genes have a number of locus-specific residues at these sites. Thus, both Lasc-UCA and -UDA possess divergent and mostly invariant binding-groove motifs, while largely maintaining other conserved sequence features of functional MHCI molecules, which suggests that they could be nonclassical MHC Ib genes with restricted peptide-binding specificities, or could present an alternative repertoire of non-peptidic ligands.

Nonclassical MHCI genes with limited ranges of peptide-binding capabilities or differences in the manner of contact made with the bound peptide have been identified in both humans and mice (Shawar et al. 1994; Rodgers and Cook 2005). Afanassieff et al. (2001) suggested that YF1 (formerly YFV), a nonclassical MHCI gene of the chicken Y locus, could also bind peptides in a novel manner due to substitutions at four mainchain-binding positions. More recently, Hee et al. (2010) determined the crystal structure of allele YF1*7.1 complexed with β2-microglobulin. They suggested that the “hybrid” structure of YF1*7.1, consisting of a hydrophobic binding groove situated within a protein architecture highly similar to that of classical MHCI molecules, could allow the binding of non-peptidic ligands, and demonstrated that YF1*7.1 can indeed bind nonself lipids in vitro (Hee et al. 2010). Whether nonclassical genes of the red-billed gull serve a similar function is currently unclear. Although substitutions occur at mainchain-binding sites of both Lasc-UCA and -UDA, no more than two substitutions at these sites occur in any allele and only one change involves a radical amino acid replacement. The structural importance of this radical Y→C substitution that results in an unpaired cysteine within the α2 domain of all Lasc-UDA alleles remains to be determined, although unpaired cysteines within the α1 domain are involved in MHCI homodimer formation in human HLA-B27 (Allen et al. 1999). Greater amino acid identity is seen between chicken YF1*7.1 and classical BF2*2101 than between YF1*7.1 and either red-billed gull locus, whether considering the entirety of exons 2 and 3 (0.581, 0.519–0.525, and 0.491–0.513 for YF1 compared to BF2, Lasc-UCA, and -UDA, respectively), or the 27 residues identified as lining the binding groove of YF1*7.1 (Hee et al. 2010; 0.481 for YF1–BF2, versus 0.444 for YF1–Lasc-UCA, and 0.370–0.407 for alleles of Lasc-UDA). The number of hydrophobic residues occupying these 27 sites in the putative nonclassical loci of the red-billed gull (12–13 residues) is also lower than in both YF1*7.1 (16 residues) and BF2*2101 (14 residues). Thus, while nonclassical loci of both the chicken and red-billed gull may serve to increase an otherwise limited repertoire of bound ligands presented by a major classical MHCI gene, further work is needed to determine the structure, function, and tissue expression patterns of Lasc-UCA and -UDA.
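The amino acid identity values quoted above are proportions of matching residues between aligned sequences. The sketch below shows one simple way such a proportion could be computed. It is an assumption-laden illustration rather than the method used in this study: the function name and toy sequences are hypothetical, and skipping alignment gaps ("-") is one possible convention among several.

```python
def proportion_identity(seq_a, seq_b, gap="-"):
    """Fraction of aligned, ungapped positions with identical residues.

    Assumes seq_a and seq_b are already aligned (equal length).
    Positions where either sequence has a gap are skipped; this is an
    illustrative convention, not one taken from the paper.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if gap not in (x, y)]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    matches = sum(x == y for x, y in pairs)
    return matches / len(pairs)


# Toy aligned fragments (hypothetical, not real MHCI sequences).
identity = proportion_identity("GSHSLRYF", "GSHSMRYF")
print(round(identity, 3))
```

For scale, over a 27-residue comparison a proportion of 0.481 is what 13 matching positions would give (13/27 ≈ 0.481), and 0.444 is what 12 would give.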

Screening of MHC variation in nonmodel organisms

Investigation of MHCI sequence evolution, and the use of these genes as molecular markers in studies of natural populations, requires an efficient and reliable methodology to screen MHCI allelic variation. RSCA allows the rapid screening of a large sample size with high reproducibility and minimal labour, and has proved effective in studies of MHC genes across a range of vertebrates (e.g., Noakes et al. 2003; Pratt et al. 2006; Worley et al. 2008; Lenz et al. 2009). RSCA is particularly suited to the initial characterization of MHCI variation in nonmodel organisms by targeting a subset of individuals carrying each allele for subsequent sequence determination. Despite the advantages of the RSCA screening approach, and the ability of our protocol to discriminate between some alleles that differed by a single nucleotide, not all red-billed gull MHCI alleles could be distinguished by RSCA alone. Further optimization will require the testing of additional reference alleles, including cloned alleles that incorporate sequence artefacts, to develop a reference panel that can unambiguously identify all alleles.

Consistent with other studies of MHCI sequence variation in birds and reptiles (e.g., Miller et al. 2007; Shaw et al. 2007; Alcaide et al. 2009; Promerová et al. 2009), sequences immediately flanking the PBR exons were highly conserved, and the placement of PCR primers within these regions resulted in simultaneous amplification of all red-billed gull MHCI genes. However, differences in intron 2 and/or the 3′UTR permitted the design of gene-specific primer pairs spanning exon 3 for each locus. By employing direct sequencing of gene-specific amplification products, it was possible to fully resolve all genotypes, to easily obtain sequences of novel alleles detected during the screening procedure, and to confirm the accuracy of allele sequences obtained from cloning. The usefulness of noncoding regions for developing locus-specific protocols necessarily relies on the extent and nature of sequence divergence between duplicate genes, which is in turn dependent upon the time since duplication, mutation rate, extent of recombination and gene conversion, and population-level processes such as genetic drift (Innan 2009). Gene-specific amplification may be impossible for some species (e.g., Westerdahl et al. 2004; Miller et al. 2007) without the use of long-range PCR using primers in adjacent genomic regions, and larger MHCI gene numbers in passerines (Westerdahl 2007) may further complicate research endeavours within this group. Nevertheless, our results indicate that efforts to sequence noncoding regions of MHCI loci could be amply rewarded by the ability to characterize individual loci, and could not only enhance the utility of these genes as molecular markers in evolutionary ecology research, but could also greatly contribute to the study of MHC gene family evolution across the breadth of avian diversity.