Introduction

Major histocompatibility complex (MHC) class II molecules are heterodimeric cell-surface glycoproteins expressed by professional antigen presenting cells, including dendritic cells, macrophages, and B lymphocytes, where they present pathogen-derived exogenous peptides to CD4+ T-cells (Kambayashi and Laufer 2014). The cardinal MHC class II molecules encoded by the human leukocyte antigen (HLA) DQ and DR loci are highly polymorphic, with over 3000 alleles identified (http://www.ebi.ac.uk/imgt/hla/). MHC class II polymorphisms have been associated with many disease conditions, most of which have involvement with immune and/or inflammatory responses (Trowsdale 2011). The MHC class II alpha and beta chain gene structure is well conserved across mammals, with multiple genes comprising the DR and DQ loci (Kelley et al. 2005; Yuhki et al. 2007; Yuhki et al. 2003). Overall, the alpha chain genes are generally less polymorphic than their beta chain counterparts.

The equine MHC region, designated as the equine leukocyte antigen (ELA) complex, was originally defined using classical serological techniques in international workshops (Lazary et al. 1988). The MHC haplotypes identified using the lymphocyte microcytotoxicity assay were based on polymorphism in MHC class I antigens, which are highly immunogenic in normal equine pregnancy (Antczak et al. 1984). Recently, a system for MHC haplotyping based on polymorphic intra-MHC microsatellites has been described (Tseng et al. 2010), and some of these haplotypes have been linked to haplotypes defined serologically. The choice of closely related horses homozygous for the common ELA-A3 haplotype as donors for the equine bacterial artificial chromosome (BAC) library (Gustafson et al. 2003) and full genome sequence (Wade et al. 2009) facilitated characterization of the structure and polymorphism of the equine MHC class I region (Tallmadge et al. 2010; Tallmadge et al. 2005).

By contrast, the equine MHC class II region has received much less attention. Crepaldi et al. (1986) identified equine MHC class II molecules using monoclonal antibodies and demonstrated expression of MHC class II antigens on the surface of resting T lymphocytes. Lazary et al. used mixed lymphocyte cultures (MLC) to show that response to stimulation in MLC experiments segregated in families based on their ELA serotype (Lazary et al. 1980). A few years later, the same group described three class II haplotypes including the ELA-W13 type that is now recognized as the class II region of the ELA-A3 haplotype (Lazary et al. 1986).

Following the publication of full length complementary DNA (cDNA) sequences for DQA and DQB genes from an ELA-A2 homozygous Thoroughbred stallion (Szalai et al. 1994a, 1994b), most subsequent molecular studies of equine MHC class II genes focused on sequencing of the second exon from multiple class II loci using genomic DNA. Investigations of the equine DRA locus identified polymorphism and evidence for selection in DRA that had not been previously reported in other species (Albright-Fraser et al. 1996; Brown et al. 2004; Janova et al. 2009). Studies of DRB exon 2 sequences provided evidence for at least three DRB genes, although the number of loci was believed to vary across MHC haplotypes (Fraser and Bailey 1996). Fraser and Bailey also published work on the DQA locus detailing substantial polymorphism and the identification of a DQA fragment unlinked to ELA (Fraser and Bailey 1998). To our knowledge, this has not been independently verified. Exon 2 sequences for the DQB locus have also been studied resulting in the identification of at least two highly polymorphic loci (Horin and Matiasovic 2002; Villegas-Castagnasso et al. 2003).

In the NCBI horse genome assembly [http://www.ncbi.nlm.nih.gov/genome/145], we identified ten MHC class II loci that appeared to encode expressible genes. We also found these genes in clones of the equine BAC library (Gustafson et al. 2003). We verified the expression of these six DQ and four DR loci in the ELA-A3 haplotype and then amplified and sequenced these genes using cDNA prepared from lymphocytes obtained from horses homozygous for the ELA-A2, ELA-A5, ELA-A9, and ELA-A10 haplotypes, assigning the resulting sequences to their respective loci where possible. The use of MHC homozygous horses from only a few common ELA haplotypes of the Thoroughbred and Standardbred breeds facilitated phasing of the MHC class II genes, although this also reduced the amount of population-level variation that could be detected in this study.

Materials and methods

Experimental animals

Eight ELA homozygous horses representing five independent MHC haplotypes were used in this study (Table 1). The horses were from the herd maintained at Cornell University’s Equine Genetics Center. Animal care was performed in accordance with the guidelines set forth by the Institutional Animal Care and Use Committee of Cornell University.

Table 1 Horses tested in this study

The five MHC haplotypes were originally defined using alloantisera characterized in international workshops (Lazary et al. 1988). MHC homozygosity in these horses was confirmed using multiple independent methods, including selective breeding/segregation, mixed lymphocyte cultures, intra-MHC microsatellite analysis (Brinkmeyer-Langford et al. 2013; Tseng et al. 2010), and direct sequencing of MHC class I (Tallmadge et al. 2010; Tallmadge et al. 2005) and class II genes (this report). Lines of the ELA-A2 and ELA-A3 haplotypes have been bred and maintained at Cornell for over 30 years. At least two horses each of the ELA-A5, ELA-A9, and ELA-A10 haplotypes were used to confirm homozygosity in those haplotypes. All five MHC haplotypes are found commonly in the Thoroughbred and Standardbred breeds (Antczak et al. 1986). Results of microsatellite typing and additional information about the MHC complexes and genetic relationships of the horses can be found in Table S1, Text S1, and Fig. S1.

Horse blood samples and cDNA synthesis

Peripheral blood lymphocytes obtained from external jugular vein blood samples (Antczak et al. 1982) were snap frozen in liquid nitrogen and stored at −80 °C. RNA was isolated from frozen samples using the RNeasy kit (Qiagen, Valencia, CA) following the manufacturer’s protocol. One microgram of total RNA was treated with DNase I (Invitrogen, Carlsbad, CA) to degrade contaminating genomic DNA and then used in first strand cDNA synthesis reactions with M-MLV Reverse Transcriptase (USB, Cleveland, OH) and Oligo-dT primers (Invitrogen) in a final volume of 100 μl.

BAC library screening

The CHORI 241 equine BAC library was constructed in 2001 by Dr. Pieter de Jong at the Children’s Hospital of Oakland Research Institute, Oakland, CA, using DNA from neutrophils of stallion no. 3474, a Thoroughbred horse bred to be homozygous by descent for the ELA-A3 serotype. The library has approximately 11.8× coverage of the equine genome and contains over 190,000 recombinant clones with an average insert size of 170 kb. The library was screened for MHC class II positive clones using radiolabeled overgo probes for DQA, DQB, DRA, and DRB loci as previously described (Gustafson et al. 2003). The minimum tiling path through the class II region from Gustafson and colleagues is shown in Fig. 1 and includes the identities of the BAC clones that form the path. The expressed gene loci shown in Fig. 1 include class II loci originally identified by Gustafson et al., along with two additional loci identified from the whole genome sequence (DQA3 and DQB3).

Fig. 1

Map of equine MHC class II region on ECA 20. Upper line demarcates the class II region from 32.6 to 33.6 Mb based on the Broad Assembly v2. Expressed genes are shown directly beneath the ECA 20 ruler. Length of genes is proportional to width of arrows; direction of gene transcription follows direction of arrows. Pseudogenes are noted below expressed genes. Overlapping bacterial artificial chromosome clones (BACs) from the minimal tiling path are labeled in horizontal boxes. Microsatellite loci are shown in vertical boxes: EqMHC1 (M. Binns, personal communication); COR112, COR113, COR114 (Tseng et al. 2010); UM011 (GenBank acc. no. AF195130). See Table S1 for details of microsatellites

Full sequencing of BAC 288J19

Prior to the availability of the whole genome sequence of the horse in 2007, the full sequence of a 207 kb BAC clone from the published contig of the horse MHC was characterized. BAC 288J19 was selected for its positive hybridization signals for the DQA, DQB, DRA, and DRB loci. BAC DNA was purified, sonicated, size selected, and sub-cloned into the pUC18 plasmid in preparation for sequencing. Sequencing was carried out using an Applied Biosystems DNA Sequencer. The resulting sequence data were assembled using Staden software. Sequencing of other BACs in the class II minimum tiling path was beyond the scope of this project; however, sub-clones of selected BACs were used to examine fragments of genomic sequence associated with class II loci not found in BAC 288J19.

Next generation/Illumina sequencing of selected ELA homozygous horses

DNA from one homozygous horse for each of the ELA haplotypes A2, A5, A9, and A10 was sequenced using Illumina technology (Table 1). In short, 1 μg of genomic DNA was sheared to a 200–700 bp size distribution by Adaptive Focused Acoustics using a Covaris E220 instrument (Covaris, Inc., Woburn, MA) under the following conditions: 50 μl total volume, 10% duty cycle, 175 intensity, 200 cycles per burst, and 50 s in frequency sweeping mode. The remainder of the library preparation followed the manufacturer’s protocol for the NEBNext DNA Library Prep Master Mix Set for Illumina (catalog no. E6040L, New England Biolabs, Inc., Ipswich, MA). Briefly, the sheared DNA was end-repaired to generate blunt ends, A-tailed, ligated to Illumina-compatible adaptors, and size selected using Agencourt AMPure XP Beads (Beckman Coulter, Inc., Indianapolis, IN); finally, the adaptor-ligated DNA fragments were subjected to PCR enrichment. The quality of the final next generation sequencing (NGS) libraries was evaluated on an Agilent DNA 1000 chip using a 2100 Bioanalyzer instrument (Agilent Technologies, Inc., Santa Clara, CA); each sample showed a narrow distribution with a peak size of approximately 300 bp. Each sample was sequenced in one lane on the Illumina HiSeq 4000 in a paired-end 150 bp run (300 cycles). Reads were aligned to the EquCab2.0 whole genome sequence (WGS) of the horse (Broad Institute, Cambridge, MA) using BWA (Li and Durbin 2010). After alignment, Samtools (Li et al. 2009) was used to convert the SAM output to BAM format and then to sort and index the BAM file. Reads were visualized using the Integrative Genomics Viewer (IGV) (Broad Institute). Read files were submitted to the SRA database at NCBI under accession SRP082688 (Table 1). There was an average of 546 million reads (range 523–584 M) and an average of 30× coverage (range 29–32×) for the four horses that were sequenced.
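The reported depth can be checked with simple arithmetic: mean coverage is approximately the number of reads multiplied by the read length and divided by the genome size. The sketch below assumes a horse genome size of about 2.7 Gb (an assumption, not a figure stated in this section), treats the read counts as individual 150 bp reads, and ignores unmapped and duplicate reads.

```python
# Rough mean-coverage estimate: reads * read_length / genome_size.
# GENOME_SIZE_BP is an assumed approximate horse genome size, not a value from this study.
GENOME_SIZE_BP = 2.7e9
READ_LENGTH_BP = 150  # paired-end 150 bp run; each read is counted individually


def mean_coverage(n_reads, read_length=READ_LENGTH_BP, genome_size=GENOME_SIZE_BP):
    """Return the expected mean depth of coverage, ignoring unmapped and duplicate reads."""
    return n_reads * read_length / genome_size


for label, n_reads in [("minimum", 523e6), ("average", 546e6), ("maximum", 584e6)]:
    print(f"{label}: {mean_coverage(n_reads):.1f}x")
```

Under these assumptions, the three estimates come out at roughly 29×, 30×, and 32×, in line with the reported range.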

RT-PCR primer design

Locus-specific primers for the ten expressed equine DQ and DR genes were developed in three steps. First, we used the sequences obtained from BAC 288J19 to design primers for the DRA, DQA1, and DRB1 genes (the DQB locus in BAC 288J19 is a pseudogene). These primers were tested using lymphocyte cDNA from the BAC library DNA donor, horse no. 3474. Second, we used sequences obtained from sub-cloning and sequencing of other DQ and DR loci identified in the BAC library contig (Gustafson et al. 2003). The new DQB and DRB sequences were compared to BAC 288J19 gene sequences and aided the design of primers specific for the new loci. Third, we used the WGS when it became available in combination with our existing sequences to design primers for the DQA2, DQA3, and DQB3 genes. With the exception of the DQB3 locus, all primers are located in the 5′ and 3′UTRs of the target genes (Table S2). In order to obtain locus-specific primers for DQB3, it was necessary to locate the primers within the coding sequence. The amplified DQB3 sequences are missing the leader sequence and a few nucleotides at the 3′ end of the transcript. PCR primer oligonucleotides were designed using Primer3 software on the web (http://frodo.wi.mit.edu/primer3/) and then synthesized by Integrated DNA Technologies (Coralville, IA).

MHC class II gene amplification, cloning, and sequencing

cDNA was synthesized from RNA of peripheral blood lymphocytes from eight MHC homozygous donor horses (Table 1) as described above and used in RT-PCR reactions. RT-PCR amplification of DQ and DR loci was carried out using Pfu DNA polymerase (Stratagene, La Jolla, CA) in 25 μl reactions containing 1× PCR buffer, 0.4 mM each dNTP, 0.6 μM each primer, 2.5 U Pfu, and 2 μl of template cDNA. Amplification conditions were 1 cycle of 95 °C for 45 s; 33 cycles of 95 °C for 30 s, 58 °C for 30 s, and 72 °C for 2 min; and 1 cycle of 72 °C for 10 min, using an Eppendorf Mastercycler (Eppendorf, Hamburg, Germany). PCR amplicons were purified using the Qiaquick PCR Purification kit (Qiagen) and cloned into the pCR®4 Blunt TOPO vector (Invitrogen) following the manufacturer’s protocol. Sequencing reactions were carried out at the Cornell University Life Sciences Core Laboratory Facility’s DNA Sequencing Center using an ABI 3700 automated sequencer (Applied Biosystems).
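As a quick planning aid, the cycling program described above can be written down as data and its total hold time summed. This is only a sketch: it ignores instrument ramp times, and the variable names are illustrative.

```python
# Hold-time total for the cycling program described above (ramp times ignored).
# Each step is (temperature in degrees C, hold time in seconds).
INITIAL_DENATURATION = [(95, 45)]
CYCLE = [(95, 30), (58, 30), (72, 120)]
N_CYCLES = 33
FINAL_EXTENSION = [(72, 600)]


def hold_seconds(steps):
    """Sum the hold times of a list of (temperature, seconds) steps."""
    return sum(seconds for _, seconds in steps)


def program_seconds(initial, cycle, n_cycles, final):
    """Total hold time of the full program in seconds."""
    return hold_seconds(initial) + n_cycles * hold_seconds(cycle) + hold_seconds(final)


total = program_seconds(INITIAL_DENATURATION, CYCLE, N_CYCLES, FINAL_EXTENSION)
print(f"{total} s (about {total / 60:.0f} min)")
```

The hold times alone add up to 6585 s, or just under 2 h per run before ramping is taken into account.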

Sequence analysis

Sequences derived from RT-PCR were trimmed and assembled using VectorNTI software (Invitrogen). Full-length sequences were then aligned using the MegAlign package (DNASTAR, Inc., Madison, WI). All sequences were obtained multiple times and often from multiple PCRs. Phylogenetic trees for each locus (DQA, DQB, and DRB) were constructed using the neighbor joining method (Saitou and Nei 1987) in the MEGA version 4 program (Tamura et al. 2007). To determine the level of support for each node, bootstrap resampling was performed with 10,000 replications. An HLA orthologue for DQA, DQB, and DRB was used in each analysis as an out-group to root the phylogenetic tree.
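Bootstrap support is obtained by resampling alignment columns with replacement and rebuilding the tree for each replicate. The sketch below illustrates only the resampling step and a simple p-distance matrix in plain Python. The sequences are made-up toy data, the function names are illustrative, and the tree building itself (done in MEGA in this study) is not reproduced.

```python
import random


def p_distance(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping each row (taxon) intact."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def distance_matrix(alignment):
    """Pairwise p-distances keyed by (name1, name2), names in sorted order."""
    names = sorted(alignment)
    return {(m, n): p_distance(alignment[m], alignment[n])
            for i, m in enumerate(names) for n in names[i + 1:]}


# Toy aligned sequences (hypothetical, not data from this study).
toy = {"alleleA": "ATGGCCAAGT", "alleleB": "ATGGCTAAGT", "outgroup": "ATGACTAAGA"}
rng = random.Random(0)  # fixed seed so the replicate is reproducible
replicate = bootstrap_alignment(toy, rng)
print(distance_matrix(toy))
print(distance_matrix(replicate))
```

In a full analysis, a tree would be built from each replicate's distance matrix, and the support for a node would be the fraction of replicate trees that contain it.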

Nomenclature

Newly discovered MHC class II sequences were originally given local names reflecting the locus and order in which they had been discovered. These loci were then renamed to reflect their physical orientation as displayed by the NCBI annotation of the equine assembly v.2 from the Broad Institute. Sequences have been officially named to adhere to approved MHC nomenclature (Ellis et al. 2006), e.g., Eqca DQA1*00101, and all sequences have been deposited into both GenBank and the IPD MHC database (http://www.ebi.ac.uk/ipd/mhc/). In this naming convention, Eqca refers to Equus caballus, DQA1 refers to the DQA1 locus, and *00101 refers to the allele name. The first three digits of the *00101 designation refer to the major type of an allele. Major types are distinguished by more than four differences in the predicted amino acid sequence of an allele when compared to another major type; thus, *00101 and *00201 would differ by more than four amino acids. The last two digits refer to the minor type of the allele and represent variants that differ at four or fewer residues. Alleles *00101 and *00102 would therefore differ by four or fewer predicted amino acids. The ELA-A3 allele possessed by horse no. 3729 and thus represented in the equine WGS was always named the *00101 allele. All other alleles were named by comparison to the no. 3729 reference allele. Furthermore, subsequent alleles were always named using the order of ELA-A2, ELA-A5, ELA-A9, and ELA-A10. Table 2 and Table S3 list the named alleles characterized for each haplotype. Using the DQB1 locus as an example, the alleles *00101 through *00501 represent ELA-A3, ELA-A2, ELA-A5, ELA-A9, and ELA-A10, respectively.
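The major/minor rule described above can be expressed as a short comparison of two aligned predicted protein sequences. The following is a minimal sketch under the threshold stated in the text (more than four amino acid differences defines a new major type). The function names and the toy peptides are hypothetical and are not sequences from this study.

```python
MAJOR_TYPE_THRESHOLD = 4  # from the text: more than four amino acid differences -> new major type


def aa_differences(seq1, seq2):
    """Count differing positions between two aligned, equal-length protein sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


def compare_alleles(seq1, seq2):
    """Classify a pair of predicted protein sequences under the major/minor rule."""
    n = aa_differences(seq1, seq2)
    if n > MAJOR_TYPE_THRESHOLD:
        return "different major types"
    if n > 0:
        return "same major type, different minor types"
    return "identical predicted protein"


# Hypothetical toy peptides, not sequences from this study.
reference = "MILNKALLLGALALTTVMSP"
variant_minor = "MILNKALLLGALALTAVMSP"  # 1 difference from the reference
variant_major = "MVLNRALMLGTLALSTVMAP"  # 6 differences from the reference

print(compare_alleles(reference, variant_minor))
print(compare_alleles(reference, variant_major))
```

Alleles with identical predicted proteins fall outside the five-digit scheme and would have to be distinguished, if at all, at the nucleotide level.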

Table 2 MHC class II genes and alleles in five horse ELA haplotypes

The name given to the DRB3 allele from the ELA-A2 haplotype is reported here as Eqca DRB3*001010001. The previously mentioned naming rules apply to this allele as well, but supplementary digits have been added to describe additional polymorphism between this allele and the Eqca DRB3*00101 allele shared with the ELA-A3, ELA-A5, and ELA-A9 haplotypes. The sixth and seventh digits (00) represent synonymous changes within the coding sequence (none in this case), and the eighth and ninth digits (01) represent differences in a non-coding portion of the gene, which in this case is a single nucleotide change in the 5′ untranslated region (UTR) (data not shown).
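The digit layout described in this section (three digits for the major type, two for the minor type, and optional pairs for synonymous and non-coding differences) can be illustrated with a small parser. This is a sketch of the convention as stated here, not an official IPD-MHC tool. The regular expression and field names are assumptions made for illustration.

```python
import re

# Species prefix, locus, then 3 (major) + 2 (minor) digits, optionally followed by
# 2 digits for synonymous changes and 2 digits for non-coding differences.
ALLELE_RE = re.compile(
    r"^(?P<species>[A-Za-z]+)[ -](?P<locus>[A-Z]+\d*)\*"
    r"(?P<major>\d{3})(?P<minor>\d{2})"
    r"(?:(?P<synonymous>\d{2})(?P<noncoding>\d{2})?)?$"
)


def parse_allele(name):
    """Split an allele name such as 'Eqca DQA1*00101' into its named fields."""
    match = ALLELE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized allele name: {name!r}")
    return match.groupdict()


print(parse_allele("Eqca DQA1*00101"))
print(parse_allele("Eqca DRB3*001010001"))
```

For the nine-digit DRB3 name, the parser returns major type 001, minor type 01, synonymous field 00, and non-coding field 01, matching the description in the text. Fields absent from a five-digit name are returned as None.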

Results

Genomic organization of the equine MHC class II region

The availability of the equine WGS enabled us to examine the equine MHC class II region at the sequence level and provided a physical location for and associated annotation of class II region genes. The equine WGS contains ten apparently functional DR and DQ genes: one DRA, three DRB, three DQA, and three DQB genes. In addition, the WGS assembly also contains four pseudogenes from the DQ and DR families (DQA4Ψ, DQB4Ψ, DRB4Ψ, and DRB5Ψ) (Fig. 1; Tables 2 and S2). The pseudogenes have insertions and/or deletions that render them non-functional.

We compared the gene content described in the horse BAC MHC study (Gustafson et al. 2003) with the annotated equine sequence assembly from NCBI [http://www.ncbi.nlm.nih.gov/genome/145]. Analysis of gene content by overgo hybridization in the BAC library study underestimated the number of loci, particularly when two genes of the same family were in close proximity to one another, as is the case for DQA2 and DQA3 and DQB2 and DQB3.

Expression of DQ and DR genes in the ELA-A3 haplotype

Using the resources of the equine BAC library and WGS, we were able to validate expression of the ten apparently functional DQ and DR genes in the ELA-A3 haplotype. By identifying the genomic sequence of each of the genes from BAC or WGS sources, predicting the messenger RNA (mRNA) sequence for the same locus, and comparing that to the actual sequence we amplified from lymphocyte cDNA, we were able to confidently characterize each gene. The mRNA coding sequence for equine alpha chain genes consists of 765 (DRA) or 768 (DQA) nucleotides across 4 exons, which encode a protein of 255 (DRA) or 256 (DQA) amino acids, respectively. The coding sequence for equine beta chain genes consists of 798 (DQB) or 801 (DRB) nucleotides across 6 exons and encodes a protein of 266 (DQB) or 267 (DRB) amino acids, respectively.

With one exception, our locus-specific primers for each gene amplified a single sequence from cDNA isolated from lymphocytes of horses homozygous for the ELA-A3 serotype (horses no. 3474 and no. 3729), and these sequences matched the underlying genomic DNA sequence. This information allowed us to determine expression of the ten MHC class II genes in other ELA haplotypes. The single exception was found in the DRB2 gene in horses carrying sub-haplotypes of the serologically defined ELA-A3 haplotype that is observed commonly in horses of the Thoroughbred breed. Three sub-haplotypes defined by intra-MHC microsatellites have been described. The Thoroughbred mare Twilight (horse no. 3729), the DNA donor of the horse reference genome sequence, is homozygous for the ELA-A3a sub-haplotype (Tseng et al. 2010; Wade et al. 2009). The closely related Thoroughbred stallion Bravo (horse no. 3474), the DNA donor for the equine bacterial artificial chromosome (BAC) library, is heterozygous for the ELA-A3b and ELA-A3c haplotypes (Tseng et al. 2010). The ELA-A3a and ELA-A3b sub-haplotypes carry the DRB2*00101 allele, while the ELA-A3c sub-haplotype carries the DRB2*00201 allele (Tables 1 and 2; Figs. 3 and S1).

Expression and polymorphism of DQ and DR genes in other haplotypes

After refining our primer design on lymphocyte cDNA from horse no. 3474, we tested the primers on a cohort of five Thoroughbred and Standardbred horses. The horses had been determined to be homozygous for the MHC by both serological and mixed lymphocyte culture assays (see the “Materials and methods” section; Table 1) and represented the ELA-A2, ELA-A5, ELA-A9, and ELA-A10 haplotypes (Tallmadge et al. 2010). The ELA-A2 Thoroughbred stallion (horse no. 0834) was the donor for multiple MHC class I and II sequences in the public domain (Barbis et al. 1994; Carpenter et al. 2001; Szalai et al. 1994a, b). We amplified only a single sequence at any MHC class II locus from each horse of these four haplotypes (Tables 2 and S3). Each haplotype comprised a unique, independent complement of alleles, although a few alleles were shared between different haplotypes.

We were only able to amplify DQA3 locus mRNA sequence from the ELA-A3 haplotype and not from the other haplotypes tested, despite multiple attempts. After these unsuccessful attempts, we used NGS reads of genomic DNA from our ELA-A2, ELA-A5, ELA-A9, and ELA-A10 homozygotes to examine the locus at the genomic level. For the ELA-A2, ELA-A5, and ELA-A9 haplotypes, we observed high coverage of the locus. For ELA-A10, few reads aligned to the DQA3 coordinates, and in many areas of the locus, no reads aligned at all, leading us to speculate that the DQA3 gene might not be present in ELA-A10.

For ELA-A2, ELA-A5, and ELA-A9, we observed that each haplotype possesses a mutation in the DQA3 sequence that introduces a premature stop codon. ELA-A5 possesses a c/t mutation in the DQA3 exon 2 region at chr20:33,105,088 that changes the predicted amino acid from arginine to a stop codon. ELA-A9 has a 1 bp deletion nearby at chr20:33,105,101 that would cause a frame shift and subsequent stop codons in the predicted mRNA transcript. Finally, ELA-A2 has a g/t mutation in exon 4 at chr20:33,106,153 that changes the predicted amino acid from glutamic acid to a stop codon. Each of these mutations was confirmed by Sanger sequencing. Even if these genes are transcribed into mRNA, they are unlikely to produce a mature protein.
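The three classes of mutation described above can be illustrated with the standard genetic code. In that code, CGA is the only arginine codon that a single C-to-T change converts to a stop (TGA), and a G-to-T change at the first position of either glutamic acid codon (GAA, GAG) yields a stop (TAA, TAG). The sketch below uses a short hypothetical coding sequence, not the DQA3 sequence, and assumes the changes are read on the coding strand.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(cds):
    """Translate complete codons, stopping after the first stop codon ('*')."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        residue = CODON_TABLE[cds[i:i + 3]]
        protein.append(residue)
        if residue == "*":
            break
    return "".join(protein)


def substitute(seq, index, ref, alt):
    """Replace the base at a 0-based index, checking the expected reference base."""
    if seq[index] != ref:
        raise ValueError(f"expected {ref} at index {index}, found {seq[index]}")
    return seq[:index] + alt + seq[index + 1:]


# Hypothetical toy coding sequence (not DQA3): M R E L N W stop.
reference = "ATGCGAGAACTGAACTGGTAA"
arg_to_stop = substitute(reference, 3, "C", "T")   # CGA (Arg) -> TGA (stop)
glu_to_stop = substitute(reference, 6, "G", "T")   # GAA (Glu) -> TAA (stop)
frameshift = reference[:6] + reference[7:]         # 1 bp deletion shifts the reading frame

for label, seq in [("reference", reference), ("Arg>stop", arg_to_stop),
                   ("Glu>stop", glu_to_stop), ("1 bp deletion", frameshift)]:
    print(f"{label}: {translate(seq)}")
```

In this toy case, each variant truncates the predicted protein well before the normal stop codon, which is the reasoning behind the conclusion that the affected DQA3 alleles are unlikely to yield a mature protein.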

Analysis of the predicted amino acid alignments generated for the DQA, DRB, and DQB genes identified locus-specific residues for each gene in this cohort of five MHC haplotypes. In the case of DQA and DRB, there were multiple unique residues for each gene, and the sequences were clustered in the resulting phylogenetic trees in a locus-specific manner (Figs. 2a, b and 3a, b). For DQA and DRB, there is one highly polymorphic locus (DQA1 and DRB1) and other less polymorphic loci (DQA2, DRB2, DRB3). At the DRA locus, we identified four unique alleles from the five haplotypes that we examined (Table 2).

Fig. 2

Amino acid alignment and phylogenetic tree of ELA-DQA genes. a Amino acid alignment of ELA-DQA genes showing the first three exons. Locus-specific residues are boxed. b Phylogenetic tree of ELA-DQA genes. Trees were generated using the neighbor-joining algorithm in MEGA with predicted amino acid sequences as input. DQA1 and DQA2 alleles segregate on distinct branches of the tree. DQA3 is a separate branch off of the DQA1 cluster. Genomic evidence confirms DQA3 as a unique locus

Fig. 3

Amino acid alignment of ELA-DRB genes. a Amino acid alignment of alleles from three ELA-DRB genes. Locus-specific residues are boxed. b Phylogenetic tree of ELA-DRB genes. Trees were created using the neighbor-joining algorithm in MEGA with nucleotide sequences as input. Codons predicted to contact antigen have been removed prior to alignment. Each gene locus separates cleanly in this tree. Note: Two DRB2 sequences were detected in different horses homozygous for the ELA-A3 serological haplotype. The MHC haplotypes of these horses can be distinguished using intra-MHC microsatellite markers (Table S1). The haplotypes associated with this DRB2 polymorphism are designated ELA-A3a (DRB2*00101) and ELA-A3c (DRB2*00201). All other expressed MHC class I and II genes of these haplotypes are identical

For the DQB genes, we observed only one unique amino acid residue in the first three exons for both DQB1 and DQB2 (Fig. 4). The lack of unique residues reduced the clarity of the associated trees (Fig. 5a, b). By analyzing the nucleotide coding sequence minus the codons predicted to contact antigen (Fig. 5a) or the 3′UTR sequence only (Fig. 5b), we identified molecular signatures for the DQB1 and DQB2 loci. Additional methods (minimum evolution, maximum parsimony) using two tests and options (bootstrap, interior branch test) and multiple models (Jukes–Cantor, Kimura 2-parameter, Tajima–Nei, etc.) yielded virtually the same trees. Additional full-length sequence at the DQB3 locus and additional alleles from other haplotypes at all three DQB loci may resolve the trees more clearly.
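Removing the codons predicted to contact antigen before tree building amounts to deleting a fixed set of codon positions from each in-frame coding sequence. A minimal sketch of that masking step is shown below. The function name, the toy sequence, and the codon positions are hypothetical, since the actual contact positions are not listed in this section.

```python
def drop_codons(cds, codon_positions):
    """Remove the given 1-based codon positions from an in-frame coding sequence."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    to_drop = set(codon_positions)
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(codon for number, codon in enumerate(codons, start=1)
                   if number not in to_drop)


# Hypothetical example: mask codons 2 and 4 of a four-codon toy sequence.
toy_cds = "ATGGCCAAGTTT"
print(drop_codons(toy_cds, [2, 4]))
```

Applying the same position list to every allele keeps the remaining sites aligned, so the masked sequences can be passed directly to the distance and tree-building steps.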

Fig. 4

Amino acid alignment of ELA-DQB genes. Locus-specific residues are boxed. Only exons 1–3 are shown

Fig. 5

Phylogenetic trees of equine DQB. a Phylogenetic tree of ELA-DQB genes with codons predicted to contact antigen removed. Trees were created using the neighbor-joining algorithm in MEGA with nucleotide sequences as input. DQB2 (lower branch) and DQB3 (upper branch) are defined clearly in this tree, but DQB1 is not. b Phylogenetic tree of 3′UTR sequences for ELA-DQBs 1 and 2. DQB1 clusters into a unique locus with high confidence in this tree (upper branch). 3′UTR sequence for the DQB3 gene was not available for analysis

Two earlier studies characterized a full length DQA and DQB gene from the ELA-A2 haplotype (Szalai et al. 1994a, b). The same homozygous stallion was used in this study. For DQA, we only amplified one ELA-A2 sequence in this study, allele DQA1*00201; it matched the sequence from Szalai (GenBank acc. no. L33909) at the level of 99%. The DQB sequence reported by Szalai and colleagues (GenBank acc. no. L33910) matched the ELA-A2 DQB2*00201 allele reported here at the level of 99%. It is likely that the differences observed between the two studies can be attributed to advances in sequencing technology over the last two decades.

Gene conversion event in ELA-A9 DRB genes

We observed a potential gene conversion event in the ELA-A9 haplotype involving the three expressed DRB genes characterized in this study (Fig. 6). In our sequence analysis, we detected a common motif in a DRB3 allele (DRB3*00101) that is identical at the corresponding nucleotides to the DRB2 allele found in our ELA-A9 horse (A9_DRB2*00401). However, no other DRB2 alleles identified in our study share this motif, and in fact, they are all identical to one another at these positions except for one nucleotide change observed in the ELA-A2 sequence (A2_DRB2*00301). In addition, a similar sequence is also found in the ELA-A9 DRB1 sequence (A9_DRB1*00401), and it differs distinctly from the nucleotides possessed by the other four haplotypes at these corresponding positions.

Fig. 6

Nucleotide sequence supporting a possible gene conversion event for the ELA-A9 DRB genes. Partial nucleotide sequence of ELA-DRB exons 1 and 2 showing a segment of nucleotides common to the DRB3 locus (bottom sequences) that are also found in the ELA-A9 DRB2 sequence (A9_DRB2*00401). A similar motif is also found in the ELA-A9 DRB1 allele (A9_DRB1*00401). Codons that are considered to contact antigen are boxed in the reference A3_DRB1*00101 allele

Discussion

This study represents the first comprehensive investigation of full-length MHC class II sequences in the horse. We began by validating the expression of ten MHC class II DQ and DR genes found in the ELA-A3 haplotype. The mare that provided DNA for the equine WGS, horse no. 3729 (Wade et al. 2009), is homozygous for the ELA-A3 serotype and the ELA-A3a microsatellite-defined MHC haplotype and identical by descent for the MHC. We developed locus-specific primer sets that amplified the entire coding sequence of each gene, with the exception of DQB3, where a near full length sequence was obtained. The ten genes were amplified from cDNA templates developed from peripheral blood lymphocytes. The structural signatures of exon length and polymorphic regions of the equine class II genes are consistent with other mammalian species.

We compared the structural organization of the ELA class II region from the WGS with the model put forward in the BAC library study (Gustafson et al. 2003). The maps produced by the two methods show good correlation in the order and number of loci. However, the added resolution of the WGS identified additional loci that were not identified in the BAC library study due to the close physical proximity of the genes (DQA2 and 3, DQB2 and 3). Furthermore, mapping of the end sequences for the BACs in the class II contig via BLAST showed a high level of agreement between the two maps.

Tseng and colleagues reported three separate microsatellite haplotypes for the ELA-A3 horses in the Cornell MHC defined herd, with the polymorphism located in the class II region (Tseng et al. 2010). Horse no. 3729, the DNA donor for the WGS, is homozygous for the ELA-A3a microsatellite variant, while horse no. 3474, the BAC library DNA donor, is heterozygous for the ELA-A3b and ELA-A3c haplotypes. We found minor sequence differences only in the DRB2 gene of the ELA-A3c sub-haplotype horse no. 3474 but not in any of the other MHC class II genes carried by this horse. The DRB2 sequences of the ELA-A3c sub-haplotype, designated DRB2*00201, had three nucleotide changes in the second exon compared to the DRB2*00101 allele of horse no. 3729 (ELA-A3a sub-haplotype), resulting in changes to two consecutive predicted amino acid residues, plus a few other changes outside of exon two (Tables 2 and S3; Fig. 3). Thus, the microsatellite variation in horses carrying ELA-A3 haplotype appears greater than the MHC class II structural gene polymorphism. No other MHC gene sequence polymorphism has been identified within haplotypes in the Cornell herd of homozygous ELA-A3 horses. Providing similar results, the ELA-A10 horse that we tested was heterozygous for microsatellites in the distal MHC class II region (Tables 1 and S1) but had no detectable variants in any expressed MHC class II genes (Tables 2 and S3).

In this study, we took advantage of the availability of horses homozygous for five MHC haplotypes to define the phase of the expressed MHC alleles. In the horse, the DQA, DRB, and DQB gene families each have at least one highly polymorphic locus. For DRB, a pattern similar to that in humans was observed, with one highly polymorphic locus (DRB1), and in the case of the horse, two other less polymorphic loci. In humans, the DRB1 locus has over 1800 alleles, while the next most polymorphic locus, DRB3, has only 77 alleles (http://hla.alleles.org/nomenclature/stats.html). For DRB2, we observed five alleles, but they only represent three major types, defined as alleles that differ by more than four amino acids. All five alleles at DRB1 identified in this study are major types, with more than four amino acid changes observed when comparing any two alleles. For DRB3, we observed only two alleles. Horse DQA has a similar pattern to DRB, with DQA1 showing higher polymorphism than the DQA2 locus (Table 2).

The DQB gene sequences we obtained had a higher level of polymorphism at each of the three loci compared to the equine DQA and DRB loci. In the cow, sequencing of DQB exon 2 in a small cohort of British Friesian cattle identified eight previously unidentified DQB alleles, providing supporting evidence for extensive DQB polymorphism in cattle (Nasir et al. 1997). However, one DQB sequence was found in 21 of 22 cows tested, indicating that a locus with very low polymorphism also exists, at least in the Friesian breed.

Four horse DRA exon 2 alleles have been reported to date (Albright-Fraser et al. 1996; Brown et al. 2004; Janova et al. 2009). Three of these alleles (DRA*0101, *0201, *0301) match alleles found in our results. The fourth exon 2 allele, DRA*JBH11, was not found in our cohort. Full-length sequences were named with current convention; thus, the previously reported exon 2 alleles DRA*0101, *0201, and *0301 are homologous to the full-length sequences Eqca DRA*00101 (shared by ELA-A2 and ELA-A3), Eqca DRA*00103 (found in ELA-A9), and Eqca DRA*00102 (found in ELA-A5), respectively. A novel fourth allele named Eqca DRA*00104 was identified in the ELA-A10 haplotype. This allele had not been reported before. Although low, this level of polymorphism at the DRA locus is higher than reported in most other species, with the exception of other members of the genus Equus (Kamath and Getz 2011; Vranova et al. 2011). Furthermore, Janova and colleagues demonstrated that this polymorphism is associated with positive selection at the DRA locus, which may have functional implications for the host (Janova et al. 2009).

With the exception of DQA3, which was detected only in the ELA-A3 haplotype and may be a pseudogene in many haplotypes, we found conservation of the other nine expressed MHC class II genes in the five haplotypes we examined. This is in sharp contrast to the equine MHC class I region, which appears to show large differences in MHC class I gene content between haplotypes (Tallmadge et al. 2010). Because the horses and MHC haplotypes used in the present study were also characterized by Tallmadge et al. (2010), the two reports provide a complete phased accounting of expressed MHC class I and class II genes of five common horse MHC haplotypes. Based on our results, we predict that the same level of conservation holds across the majority of ELA haplotypes.

Because we focused on five well-characterized MHC haplotypes, we have determined variation across all expressed MHC class II loci in only these few haplotypes. However, this information can be used to interpret new MHC class II sequences from additional haplotypes. How much of the variation in the equine MHC is attributable to the evolution and maintenance of independent alleles versus recombination of existing haplotypes? Our data indicate that the studied haplotypes are largely independent. Early data from serological investigations (Antczak et al. 1986) and more recent population studies performed using polymorphic intra-MHC microsatellites (Tseng et al. 2010) suggest that there is likely to be extensive genetic variation in the equine MHC class II region. Tseng and colleagues identified 50 microsatellite haplotypes in a study of over 350 horses. Only about 50% of the microsatellite-defined haplotypes were associated with serologically defined ELA types, and several common ELA types were split into sub-haplotypes by the microsatellites (Tseng et al. 2010). The classical ELA serological assays, which are based largely on alloantibodies raised against paternal MHC class I antigens as a result of pregnancy, probably underestimate MHC variation, while, as described here, the microsatellite-defined variation may overestimate variation in MHC class II structural genes.

Finally, our study identified a potential gene conversion event for DRB genes in the ELA-A9 haplotype (Fig. 6). Gene conversion is a mechanism whereby novel polymorphism is created in a recipient gene from a template of nucleotides originating in a donor gene (Hogstrand and Bohme 1999). Gene conversion events have been reported in MHC class II loci of the pig (Brunsberg et al. 1996) and sheep (Hickford et al. 2004), and a recent study in grouse reports a significant role for gene conversion in maintaining diversity in the class II region of that species (Minias et al. 2016).

Disease association studies have linked the horse MHC class II region to several important conditions, including uveitis (Fritz et al. 2014), insect bite hypersensitivity (Andersson et al. 2012; Klumplerova et al. 2013), and sarcoid skin tumors (Lazary et al. 1994; Staiger et al. 2016). In addition to the structural gene variation reported here, there is evidence for polymorphism in upstream regulatory regions of equine DRB genes that could also affect function (Diaz et al. 2005).

In conclusion, we have characterized the expression and polymorphism of ten MHC class II genes, first in the ELA-A3 haplotype and then in four other haplotypes commonly found in Thoroughbred and Standardbred horses. This study represents the most thorough analysis of full-length class II genes in the horse and provides the basis for study of these genes in other horse breeds and ELA haplotypes.