Introduction

Pig-tailed macaques (Macaca nemestrina) are important nonhuman primate models for infectious disease research, including studies of influenza, chlamydia, and tuberculosis (Gardner and Luciw 2008; Jegaskanda et al. 2013; Patton et al. 2014; Shen et al. 2004). They are particularly valuable for studying HIV infection because they carry a nonfunctional TRIM5α variant, which removes a major barrier to HIV-1 replication in macaques (Brennan et al. 2007; Liao et al. 2007). As a result, pig-tailed macaques can be challenged with minimally modified HIV-1 strains that may more closely mimic the course of HIV infection in humans (Hatziioannou et al. 2009, 2014; Igarashi et al. 2007). Some pig-tailed macaques also express the major histocompatibility complex (MHC) class I allele Mane-A1*084 (previously known as Mane-A*10), which is associated with reduced SIV viral load, making it possible to study spontaneous SIV/HIV control in this population (De Rose et al. 2008; Smith et al. 2005). Mane-A1*084 is also associated with a decreased risk of developing SIV encephalitis, a lentivirus-induced central nervous system disease (Mankowski et al. 2008). Pig-tailed macaques are also known to be susceptible to hepatitis C, dengue, chikungunya, Japanese encephalitis, malaria, and Kaposi’s sarcoma-associated herpesvirus, among other pathogens, making them potentially valuable models for many additional human infectious diseases (Bruce et al. 2013; Nakgoi et al. 2014; Putaporntip et al. 2010; Sourisseau et al. 2013).

The MHC encodes gene products that present peptides to T cells, dictating the specificity of cellular immune responses to pathogens and other nonself peptides. Recent studies have significantly improved knowledge of the complement of MHC class I alleles expressed by pig-tailed macaques (Fernandez et al. 2011; O’Leary et al. 2009). However, characterization of MHC class II alleles in this population is far from complete. MHC class II molecules are heterodimers of alpha and beta chains encoded by the Mane-DRA, Mane-DQA, and Mane-DPA genes and the Mane-DRB, Mane-DQB, and Mane-DPB genes, respectively; these molecules are expressed on antigen-presenting cells and display peptides to CD4+ T cells. Specific MHC-II molecules have been implicated in human immune responses to tuberculosis, influenza vaccination, and hepatitis B virus, and MHC-II molecules are also involved in several autoimmune disorders, such as rheumatoid arthritis, type 1 diabetes, and celiac disease (Anderson et al. 2013; Jones et al. 2006; Kamatani et al. 2009; Kuranov et al. 2014; Moss et al. 2013; Raychaudhuri et al. 2012). Therefore, a complete characterization of the MHC-II alleles expressed by nonhuman primate models such as pig-tailed macaques is essential to understanding the immune responses of these animals in infectious disease challenge and therapeutic studies.

To date, only 141 MHC-II alleles have been named for the pig-tailed macaque population, and only 16 of these are available as full-length sequences. In this survey, we performed full-length MHC-II allele discovery for the DRA, DRB, DQA, DQB, DPA, and DPB loci in 32 pig-tailed macaques using a novel next-generation sequencing method. This method is faster and higher-throughput than traditional cloning and Sanger-based sequencing methods, facilitating more rapid full-length MHC-II allele discovery (Creager et al. 2011; Karl et al. 2009). We also performed an exon 2-based genotyping assay to validate the full-length sequencing results with an independent set of primers specific for the DRB, DQA, DQB, DPA, and DPB loci.

Materials and methods

Animals

Total RNA samples from 32 pig-tailed macaques were provided by investigators at Johns Hopkins University (Baltimore, MD, USA). These animals were cared for according to the regulations and guidelines of the Institutional Animal Care and Use Committee at that institution.

cDNA synthesis and PCR amplification

Complementary DNA (cDNA) was synthesized from the provided RNA using the SuperScript™ III First-Strand Synthesis System for RT-PCR (Invitrogen, Carlsbad, CA, USA). Polymerase chain reaction (PCR) was performed to amplify two different regions of six MHC class II loci (DRA, DRB, DQA, DQB, DPA, and DPB) independently. The first PCR amplified full-length MHC class II products, with primers located in the 5′ and 3′ untranslated regions (UTRs) of each locus (primer sequences provided in Supplemental Fig. 1a). The full-length amplification primers also included multiplex identifier (MID) molecular barcodes and the adapters necessary for Roche/454 FLX+ sequencing (Supplemental Fig. 1a). cDNA templates were amplified using high-fidelity Phusion™ polymerase (New England Biolabs, Ipswich, MA, USA) and the full-length amplification primers on an MJ Research Tetrad Thermocycler (Bio-Rad Laboratories, Hercules, CA, USA) under the following conditions: initial denaturation at 98 °C for 3 min; 25 to 28 cycles of 98 °C for 5 s, 60 °C for 10 s, and 72 °C for 20 s; and a final extension at 72 °C for 5 min. After 25 cycles, aliquots of each reaction were checked for amplification on a Flash Gel (Lonza Group Ltd, Basel, Switzerland), and reactions with no detectable product at that stage received additional cycles. PCR products were then purified twice with AMPure XP SPRI beads (Agencourt Bioscience Corporation, Beverly, MA, USA) at a 1:1 ratio of bead volume to sample volume to remove primer dimers, and quantified using the Quant-iT dsDNA HS Assay kit and a Qubit fluorometer (Invitrogen). Full-length amplicons were normalized to between 0.2 and 3.0 ng/μl depending on locus (the same target concentration was used for all samples of a given locus) and then pooled in equal volumes by locus (32 samples per pool).
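
The per-locus normalization step is a simple C1V1 = C2V2 dilution. The Python sketch below illustrates it under stated assumptions: the function names and the 20 μl final volume are hypothetical, and using samples that are already below the target undiluted is our assumption, since the text does not say how such samples were handled.

```python
def normalization_volumes(measured_ng_per_ul, target_ng_per_ul, final_volume_ul):
    """Return (amplicon_ul, diluent_ul) that bring a sample to the target concentration.

    Based on C1 * V1 = C2 * V2. Samples already at or below the target cannot be
    brought up by dilution, so they are used undiluted (an assumption).
    """
    if measured_ng_per_ul <= 0 or target_ng_per_ul <= 0:
        raise ValueError("concentrations must be positive")
    if measured_ng_per_ul <= target_ng_per_ul:
        return final_volume_ul, 0.0
    amplicon_ul = target_ng_per_ul * final_volume_ul / measured_ng_per_ul
    return amplicon_ul, final_volume_ul - amplicon_ul


def plan_locus(concentrations, target_ng_per_ul, final_volume_ul=20.0):
    """Apply one target concentration to every sample of a locus, as in the text.

    concentrations: {sample_id: Qubit reading in ng/ul} (hypothetical layout).
    """
    return {
        sample: normalization_volumes(conc, target_ng_per_ul, final_volume_ul)
        for sample, conc in concentrations.items()
    }
```

For example, plan_locus({"animal01": 10.0, "animal02": 1.5}, target_ng_per_ul=2.0) gives (4.0, 16.0) for animal01 (4 μl of amplicon plus 16 μl of diluent) and (20.0, 0.0) for animal02. Equal volumes of each normalized sample would then be pooled per locus.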

The second PCR reaction amplified exon 2 products of MHC class II DRB, DQA, DQB, DPA, and DPB (primer sequences provided in Supplemental Fig. 1b). These amplification primers also included consensus sequences (CS1 and CS2) necessary for 4-primer amplicon tagging with the Fluidigm Access Array™ System (Fluidigm, San Francisco, CA, USA). In brief, this system performs limited rounds of PCR with target-specific primers fused to CS1 and CS2 linkers and then additional rounds of PCR with CS1/CS2 linker primers fused to the indices and adapters necessary for Illumina MiSeq sequencing (Supplemental Fig. 1b). An array of up to 48 different samples can be individually barcoded and amplified with up to 48 different primer pairs, with PCR reactions occurring within an integrated fluidic circuit (IFC) microfluidics chip. PCR was performed essentially according to the 4-Primer Amplicon Tagging Protocol from Fluidigm, substituting High-Fidelity Phusion™ Hot Start Flex master mix (New England Biolabs) for the FastStart High Fidelity PCR System (Roche, Indianapolis, IN, USA). Master mix, cDNA samples, and barcoded outer primers were loaded into the IFC microfluidics chip through the sample inlets, and primers specific for exon 2 of MHC class II DRB, DQA, DQB, DPA, and DPB were loaded through the primer inlets. Samples, master mix, and primers were all combined in the PCR reaction chambers of the IFC microfluidics chip by the IFC Controller, after which the chip was moved to the FC1 Cycler for thermal cycling.
The PCR program was also modified from the manufacturer’s protocol as follows: mixing steps of 50 °C for 2 min followed by 70 °C for 20 min; initial denaturation at 98 °C for 2 min; ten PCR cycles of 98 °C for 10 s, 60 °C for 30 s, and 72 °C for 20 s; two C0t cycles of 98 °C for 10 s, 80 °C for 30 s, 60 °C for 30 s, and 72 °C for 20 s; eight additional PCR cycles; two additional C0t cycles; eight final PCR cycles; five final C0t cycles; and a final extension at 72 °C for 5 min. Following thermal cycling, the samples were harvested from the IFC chip using the IFC Controller, which pushes a total of 10 μl for each sample back into the sample inlets. Each inlet then contained a pool of six different MHC exon 2 amplicons for that sample. Two pools were created, one from samples 1–16 and one from samples 17–32, by combining 4 μl of each sample, and each pool was purified with 1.3× volume AMPure XP SPRI beads (Agencourt Bioscience Corporation) to remove primer dimers. Quantification was again performed using the Quant-iT dsDNA HS Assay kit and a Qubit fluorometer (Invitrogen), and the two MHC exon 2 pools were normalized to 0.45 ng/μl and combined with other sample pools for Illumina MiSeq sequencing.
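
To make the modified Access Array program easier to check, it can be written down as data and tallied. This Python sketch is only a transcription aid: the data layout and function names are hypothetical, hold times exclude the ramping time between temperatures (which the text does not give), and it does not control any instrument. It also computes the bead volume for one purification pool.

```python
# Each step is (temperature in degrees C, hold time in seconds).
PCR_CYCLE = [(98, 10), (60, 30), (72, 20)]
COT_CYCLE = [(98, 10), (80, 30), (60, 30), (72, 20)]

# (stage, repeats, steps), transcribed from the modified program in the text.
PROGRAM = [
    ("mixing", 1, [(50, 120), (70, 1200)]),
    ("initial denaturation", 1, [(98, 120)]),
    ("PCR", 10, PCR_CYCLE),
    ("C0t", 2, COT_CYCLE),
    ("PCR", 8, PCR_CYCLE),
    ("C0t", 2, COT_CYCLE),
    ("PCR", 8, PCR_CYCLE),
    ("C0t", 5, COT_CYCLE),
    ("final extension", 1, [(72, 300)]),
]


def cycle_counts(program):
    """Total number of PCR and C0t cycles in the program."""
    counts = {}
    for stage, repeats, _steps in program:
        if stage in ("PCR", "C0t"):
            counts[stage] = counts.get(stage, 0) + repeats
    return counts


def hold_time_seconds(program):
    """Sum of programmed hold times; ramp times between temperatures are ignored."""
    return sum(repeats * sum(sec for _temp, sec in steps)
               for _stage, repeats, steps in program)


def bead_volume_ul(n_samples, ul_per_sample, ratio):
    """SPRI bead volume for a pool built from equal volumes of each sample."""
    return n_samples * ul_per_sample * ratio
```

With the steps as transcribed, the program comprises 26 PCR cycles and 9 C0t cycles (35 in total) and 4,110 s (68.5 min) of programmed hold time; each 16-sample pool of 4 μl per sample (64 μl) takes about 83.2 μl of beads at 1.3×.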

Sequencing of MHC class II alleles

Sequencing of the full-length MHC class II amplicons was performed on the Roche/454 GS FLX+ platform following the manufacturer’s protocols for emulsion PCR and pyrosequencing (Roche). Five pools (32 samples per pool, one locus per pool: DRA, DQA, DQB, DPA, and DPB) were each sequenced on 1/8 of a GS FLX+ plate. The DRB pool was run on two 1/8-plate regions because more alleles were expected per animal at this locus.

Sequencing of the exon 2 MHC class II and MHC class I amplicon Fluidigm pools was performed on the Illumina MiSeq following the manufacturer’s protocols (Illumina, San Diego, CA, USA), with the appropriate supplemental sequencing primers added per the Fluidigm protocol. The 32 pig-tailed macaque samples made up 1/6 of a MiSeq run.

Data analysis

For the full-length amplicons, sequences were analyzed using Geneious Pro version R6.1.5 (Biomatters Limited, Auckland, New Zealand). Sequences were quality trimmed from both ends to an error probability limit of 0.01, binned by animal using the MID tags, and stripped of primer sequences. Sequences for each animal were assembled at 99 % stringency to tolerate potential homopolymer errors for all loci except DRA, which was assembled at 100 % stringency because a high degree of similarity between alleles was expected. Contigs of assembled sequences were further trimmed to include only regions with at least 5× sequence coverage and then exported for inter-animal sequence comparison in CodonCode Aligner (CodonCode, Dedham, MA, USA). The resulting full-length sequences were compared to a curated database of known macaque MHC class II alleles using the Basic Local Alignment Search Tool (BLAST), and unnamed alleles were submitted to GenBank (accession numbers KJ801668-KJ801772) as well as to the IMGT/MHC Nonhuman Primate Immuno Polymorphism Database-MHC (IPD-MHC) and the NHP Nomenclature Committee (de Groot et al. 2012; Robinson et al. 2013).
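
As a rough illustration of what an error probability limit of 0.01 and a 5× coverage cutoff mean, the Python sketch below implements a modified-Mott trimming scheme of the kind used by phred, in which each base scores the limit minus its error probability and the highest-scoring contiguous stretch is kept, plus a coverage trim that removes contig ends below a minimum depth. Geneious has its own implementation; this sketch is not claimed to reproduce its output, and the function names and interval representation are hypothetical.

```python
def mott_trim(phred_quals, limit=0.01):
    """Return half-open (start, end) of the retained region of a read.

    Each base scores limit - p_error, with p_error = 10 ** (-Q / 10).
    The maximal-scoring contiguous segment (Kadane's algorithm) is kept,
    so low-quality bases are removed from both ends. An all-poor read
    returns (0, 0).
    """
    best_start = best_end = 0
    best_score = run_score = 0.0
    run_start = 0
    for i, q in enumerate(phred_quals):
        run_score += limit - 10 ** (-q / 10)
        if run_score < 0:
            run_score = 0.0          # restart the candidate segment after this base
            run_start = i + 1
        elif run_score > best_score:
            best_score = run_score
            best_start, best_end = run_start, i + 1
    return best_start, best_end


def coverage_trim(contig_len, alignments, min_depth=5):
    """Return (start, end) of the contig bounded by positions with >= min_depth reads.

    alignments: half-open (start, end) read placements on the contig. Only the
    contig ends are trimmed; internal dips in coverage are not split out.
    Returns None if no position reaches min_depth.
    """
    delta = [0] * (contig_len + 1)
    for start, end in alignments:
        delta[start] += 1
        delta[end] -= 1
    depth, covered = 0, []
    for pos in range(contig_len):
        depth += delta[pos]
        if depth >= min_depth:
            covered.append(pos)
    if not covered:
        return None
    return covered[0], covered[-1] + 1
```

For a read with qualities [10, 10, 30, 30, 30, 30, 10, 10], mott_trim keeps positions 2 to 5 (returns (2, 6)), because each Q10 base (error probability 0.1) pushes the running score below zero at both ends.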

The exon 2 amplicons were analyzed using a custom command-line pipeline, as previously described (Wiseman et al., manuscript submitted), that merged the R1 and R2 MiSeq reads, trimmed all primers, and used the Bowtie ultrafast short-read aligner (University of Maryland, College Park, MD, USA) (Langmead et al. 2009) to compare the sequences against a custom database of known alleles trimmed to the same length as the sequenced amplicons. Results of the Bowtie analysis were reported as a table of the number of MiSeq reads per animal matching each known allele, which was used to compare results across animals in the cohort.
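
The exon 2 read-processing steps (merge, primer trim, match, tally) can be sketched in a few functions. This is not the pipeline described by Wiseman et al.: it is a hypothetical Python illustration that merges read pairs at their longest exact overlap and, in place of the Bowtie alignment step, looks up exact matches in a dictionary of trimmed allele sequences, so reads carrying sequencing errors are simply discarded. All function names are hypothetical.

```python
from collections import Counter, defaultdict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(r1, r2, min_overlap=20):
    """Merge R1 with the reverse complement of R2 at their longest exact overlap."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    r2_rc = revcomp(r2)
    for olen in range(min(len(r1), len(r2_rc)), min_overlap - 1, -1):
        if r1[-olen:] == r2_rc[:olen]:
            return r1 + r2_rc[olen:]
    return None  # no acceptable overlap


def trim_primers(seq, fwd, rev):
    """Strip the forward primer and the reverse-complemented reverse primer."""
    rev_rc = revcomp(rev)
    if (seq.startswith(fwd) and seq.endswith(rev_rc)
            and len(seq) >= len(fwd) + len(rev_rc)):
        return seq[len(fwd):len(seq) - len(rev_rc)]
    return None


def count_table(reads_by_animal, allele_db, fwd, rev, min_overlap=20):
    """reads_by_animal: {animal: [(r1, r2), ...]}; allele_db: {trimmed_seq: allele}.

    Returns {animal: Counter({allele: n_reads})}. Exact-match lookup stands in
    for the Bowtie alignment used in the actual pipeline.
    """
    table = defaultdict(Counter)
    for animal, pairs in reads_by_animal.items():
        for r1, r2 in pairs:
            merged = merge_pair(r1, r2, min_overlap)
            if merged is None:
                continue
            insert = trim_primers(merged, fwd, rev)
            allele = allele_db.get(insert) if insert is not None else None
            if allele is not None:
                table[animal][allele] += 1
    return table
```

The resulting nested counter corresponds to the per-animal, per-allele read table described above; a real pipeline would also need to tolerate mismatches and report unmatched reads.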

Results and discussion

Full-length MHC class II allele analysis

Advances in next-generation sequencing over the last few years allow far more sequences to be analyzed per animal, at read lengths comparable to those of traditional cloning and Sanger-based sequencing. This has made it feasible to address gaps in the databases of known alleles for species such as the pig-tailed macaque. As of June 2014, the IPD-MHC database of named MHC class II sequences for this species contained 141 alleles (11 DRA, 99 DRB, and 31 DQB), compared with 437 named alleles for rhesus macaques and 675 for cynomolgus macaques (de Groot et al. 2012; Robinson et al. 2013). Of the 141 named pig-tailed macaque alleles, only 16 (11 Mane-DRA and five Mane-DRB) were full-length sequences. The remaining 125 alleles cover only the highly polymorphic exon 2 region, and such partial sequences are insufficient for downstream applications such as defining peptide binding motifs or creating MHC/peptide tetramers. We developed a method for full-length MHC class II allele discovery using the Roche/454 GS FLX+ sequencing system and applied it to a cohort of 32 pig-tailed macaques, examining an average of ~9,800 sequences per animal across all six loci (DRA, DRB, DQA, DQB, DPA, and DPB). In this cohort, we identified a total of 128 distinct full-length alleles: 15 DRA, 44 DRB, 23 DQA, 24 DQB, 13 DPA, and 9 DPB alleles (Table 1). Of these, 73 alleles differed at one or more positions from all sequences in the database of named alleles, 44 were extensions of previously named alleles, and only 11 matched named full-length sequences. Almost half of these sequences (61 of 128) are identical to alleles previously described in rhesus and/or cynomolgus macaques, which is consistent with the previously observed sharing of pig-tailed macaque MHC class I sequences and the high levels of MHC class II allele sharing between macaque species (Lafont et al. 2004; Doxiadis et al. 2006; O’Leary et al. 2009; Creager et al. 2011).
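
The three categories used above (identical to a named full-length allele, extension of a named partial sequence, or differing from all named sequences) can be expressed as a simple lookup. This Python sketch assumes exact string comparison against hypothetical dictionaries of named sequences; the actual comparison used BLAST against a curated database, which also reports near matches and handles differing sequence extents more carefully.

```python
from collections import Counter


def classify_allele(full_seq, named_full, named_partial):
    """Classify a full-length sequence against named alleles.

    named_full / named_partial: {allele_name: sequence} (hypothetical inputs).
    Returns (category, name) with category "match", "extension", or "novel".
    """
    for name, seq in named_full.items():
        if full_seq == seq:
            return "match", name
    for name, seq in named_partial.items():
        if seq in full_seq:  # the named partial (e.g. exon 2) sequence occurs intact
            return "extension", name
    return "novel", None


def tally(sequences, named_full, named_partial):
    """Count how many sequences fall into each category."""
    return Counter(classify_allele(s, named_full, named_partial)[0] for s in sequences)
```

Applied to every full-length allele in a cohort, tally returns counts comparable in kind to the 11 / 44 / 73 breakdown reported above.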

Table 1 MHC class II alleles identified in pig-tailed macaques

After identifying the panel of 128 full-length alleles in our cohort, we inferred haplotypes for each of the three regions sequenced (DR, DQ, and DP). Haplotypes are groups of alleles at linked loci that are inherited together on a single chromosome. These region-specific class II haplotypes were determined by looking for combinations of alpha and beta alleles shared between multiple animals, for instance, the combination of Mane-DRA*01:03:01, Mane-DRB*W020:01, and Mane-DRB1*03:14, which was observed in six animals (Supplemental Fig. 2). Haplotypes shared between two or more animals were inferred first; any remaining MHC-II A and B alleles not assigned to shared haplotypes were then analyzed. In most instances, haplotypes observed in just a single animal could be inferred because the animal’s other chromosome was already accounted for by one of the shared haplotypes. In the rare cases in which both haplotypes of an animal were unshared, inferences were based on allelic similarity to shared haplotypes. Inferring haplotypes allows these full-length sequencing results to be considered in terms of groups of alleles shared by multiple animals. Each region was considered independently because the animals sequenced in this survey had no known direct relationship, so finding multiple animals sharing a complete MHC-II region (DR, DQ, and DP) was far less likely than finding shared region-specific haplotypes.
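
The subtraction logic used to infer single-animal haplotypes can be sketched as follows, assuming presence/absence genotypes per region and a list of shared haplotypes already identified. The function name and data layout are hypothetical, and the sketch leaves the rare animals with two unshared haplotypes for manual, similarity-based review. The shared haplotype in the example is the one given in the text; the other allele names are placeholders.

```python
def infer_haplotypes(genotypes, shared_haplotypes):
    """Infer each animal's second haplotype by subtracting a shared one.

    genotypes: {animal: set(alleles)} for one region (DR, DQ, or DP).
    shared_haplotypes: haplotypes (sets of alleles) already seen in >= 2 animals.
    Returns {animal: (shared, inferred_other) or None}; None marks animals whose
    two haplotypes are both unshared and need manual review.
    """
    result = {}
    for animal, alleles in genotypes.items():
        result[animal] = None
        for hap in shared_haplotypes:
            hap = frozenset(hap)
            if hap <= alleles:
                other = frozenset(alleles - hap)
                # An empty remainder is consistent with homozygosity for `hap`.
                # Caveat: an allele carried on BOTH chromosomes (e.g. the same DRA)
                # is removed from `other`, since presence/absence data cannot
                # show it twice.
                result[animal] = (hap, other or hap)
                break
    return result


shared = [{"Mane-DRA*01:03:01", "Mane-DRB*W020:01", "Mane-DRB1*03:14"}]
genotypes = {  # hypothetical genotypes; placeholder names are not real alleles
    "animal1": shared[0] | {"DRA_placeholder", "DRB_placeholder"},
    "animal2": {"DRA_placeholder", "DRB_placeholder", "DRB_placeholder2"},
    "animal3": set(shared[0]),
}
haplotypes = infer_haplotypes(genotypes, shared)
```

Here animal1 receives the shared haplotype plus an inferred second haplotype of the two placeholder alleles, animal3 is treated as homozygous, and animal2 is flagged (None) for manual review.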

Of the MHC-II loci, DRB is the best studied and the most polymorphic in macaques. As in humans, macaques express a single DRA gene and multiple DRB genes per chromosome. For the DR region, we observed 32 distinct haplotypes by full-length sequencing (Fig. 1; Supplemental Fig. 2). This is far greater diversity than has been observed in Mauritian and Filipino cynomolgus macaques, in which the apparent sequence diversity of the DR region can be described by seven and ~20 haplotypes, respectively (Blancher et al. 2012; O’Connor et al. 2007; Wiseman et al. 2013). This is not particularly surprising, as both of these geographically isolated insular cynomolgus macaque populations are thought to have arisen from comparatively small founding groups. By contrast, this cohort of pig-tailed macaques showed less diversity than a recent study of the DRB region in Indian rhesus macaques, in which 38 different Mamu-DRB haplotypes were observed (Wiseman et al., manuscript submitted). However, the Indian rhesus macaque study included over 600 animals, so examination of additional pig-tailed macaques is quite likely to reveal additional diversity in the Mane-DRB region.

Fig. 1

Mane-DR region haplotypes. Mane-DRA alleles are listed across the top and Mane-DRB groups of shared alleles are listed down the left. Shaded boxes correspond to haplotypes observed in 32 pig-tailed macaques. Numbers in the shaded boxes indicate how many times each haplotype was observed (out of 64 total chromosomes). Grey alleles are those observed only with the exon 2 genotyping assay primers (not full-length). The question mark indicates one or more unknown Mane-DRA and Mane-DRB alleles that were not recovered by either method; these alleles would be predicted to be encoded on haplotypes with the unpaired observed Mane-DRB and Mane-DRA alleles, respectively

For the DQ region, we observed 30 distinct haplotypes (Fig. 2; Supplemental Fig. 2). This is again consistent with higher DQ region diversity in pig-tailed macaques than in Mauritian and Filipino cynomolgus macaques, in which seven and 13 haplotypes have been described, respectively (Blancher et al. 2014; O’Connor et al. 2007; Wiseman et al. 2013). However, many of the distinct DQ haplotypes are closely related variants of each other. For example, nine of the 30 haplotypes pair an allele of the Mane-DQA1*01 lineage with an allele of the Mane-DQB1*06 lineage. This pattern of distinct but closely related haplotypes also appears in the Mauritian and Filipino cynomolgus macaque populations: four of the seven Mauritian and seven of the 13 Filipino cynomolgus macaque DQ haplotypes combine distinct Mafa-DQA1*01 and Mafa-DQB1*06 alleles. Indian and Burmese rhesus macaques have also been shown to carry at least eight different Mamu-DQA1*01/Mamu-DQB1*06 haplotypes (Doxiadis et al. 2013). It remains to be tested whether haplotypes with such minimal nucleotide differences differ functionally in peptide presentation.
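
Grouping haplotypes by allele lineage, as done for the DQA1*01/DQB1*06 combinations, amounts to reading the first field after the asterisk in each allele name. The Python sketch below assumes names follow the Species-Gene*lineage:... pattern used in this article; the function names and the example haplotypes are hypothetical placeholders, not haplotypes from this cohort.

```python
from collections import Counter


def lineage(allele):
    """Split 'Mane-DQA1*01:nov:01' into ('Mane-DQA1', '01'): gene and lineage field."""
    gene, fields = allele.split("*", 1)
    return gene, fields.split(":", 1)[0]


def lineage_pairs(haplotypes):
    """Count (DQA1 lineage, DQB1 lineage) combinations among DQ haplotypes."""
    return Counter((lineage(dqa)[1], lineage(dqb)[1]) for dqa, dqb in haplotypes)


# Hypothetical placeholder haplotypes (DQA1 allele, DQB1 allele).
example_haplotypes = [
    ("Mane-DQA1*01:nov:01", "Mane-DQB1*06:nov:01"),
    ("Mane-DQA1*01:nov:02", "Mane-DQB1*06:nov:02"),
    ("Mane-DQA1*05:nov:01", "Mane-DQB1*03:nov:01"),
]
pair_counts = lineage_pairs(example_haplotypes)
```

In this toy set, two distinct haplotypes collapse into the same ("01", "06") lineage pair, which is the kind of closely related variation described above.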

Fig. 2

Mane-DQ region haplotypes. Mane-DQA1 alleles are listed across the top and Mane-DQB1 alleles are listed down the left. Shaded boxes correspond to haplotypes observed in 32 pig-tailed macaques. Numbers in the shaded boxes indicate how many times each haplotype was observed (out of 64 total chromosomes)

For the DP region, we observed 12 distinct haplotypes (Fig. 3; Supplemental Fig. 2). The relatively low diversity of the DP region compared with the DR and DQ regions is not unique to pig-tailed macaques: Mauritian cynomolgus macaques have only six distinct DP region haplotypes, and only nine different DP haplotypes have been described for Filipino cynomolgus macaques (Blancher et al. 2014; O’Connor et al. 2007; Wiseman et al. 2013).

Fig. 3

Mane-DP region haplotypes. Mane-DPA1 alleles are listed across the top and Mane-DPB1 alleles are listed down the left. Shaded boxes correspond to haplotypes observed in 32 pig-tailed macaques. Numbers in the shaded boxes indicate how many times each haplotype was observed (out of 64 total chromosomes). Grey alleles are those observed only with the exon 2 genotyping assay primers (not full-length). The question mark indicates one or more unknown Mane-DPB1 alleles not recovered by either method; these alleles would be predicted to be encoded on haplotypes with the unpaired observed Mane-DPA1 alleles

Exon 2 genotyping survey of MHC class II alleles

The primers used to amplify the full-length MHC class II sequences were designed using all available information for macaque 5′ and 3′ UTRs; because many of the studies performed to date have focused on the exon 2 region of MHC class II alleles, very little UTR sequence is available. We therefore sought to validate our full-length sequencing results by performing a genotyping survey of the same 32 pig-tailed macaques using an independent set of primers. The primers for this assay were designed against known exon 2 genomic DNA sequences of Mauritian cynomolgus macaques for five of the six MHC class II loci (DRA was omitted because of high sequence similarity between alleles, even within exon 2). Mauritian cynomolgus macaque MHC-II sequences were used because this is one of the best-studied macaque populations for the complete MHC region.

Overall concordance between the full-length and exon 2 sequencing primers across all five MHC-II loci was good: on average, 74 % of the alleles amplified with the full-length primers were also detected by the exon 2 primers (Fig. 4; Supplemental Fig. 3). Individual loci ranged from 61 % (DRB) to 100 % (DPB) concordance between the full-length and exon 2 primer sequencing results. The discrepancies between methods most likely reflect primers that are not yet optimized for the exon 2 assay. These primers were designed for a genotyping assay using genomic DNA as the template and are based on the consensus of all known Mauritian cynomolgus macaque alleles for each locus. Alleles with mismatches in the primer-binding sites typically amplify less efficiently than those perfectly matched to the primers. The full-length cDNA sequences obtained in this study can be used to design additional primers for alleles underrepresented in the exon 2 survey, giving more consistent representation of all alleles in future genotyping assays that use RNA as the starting material.
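
The concordance figures above are fractions of full-length alleles that were also detected by the exon 2 primers. The text does not say whether the 74 % average was pooled across loci or averaged per locus; this Python sketch computes per-locus values and their unweighted mean as one plausible reading. The function name, input layout, and example counts are hypothetical.

```python
def concordance(counts):
    """Per-locus and mean concordance between full-length and exon 2 results.

    counts: {locus: (both, full_length_only, exon2_only)} allele counts.
    Concordance = fraction of full-length alleles also detected by exon 2
    primers; exon2_only alleles do not enter this metric.
    Returns (per_locus, unweighted_mean).
    """
    per_locus = {}
    for locus, (both, full_only, _exon2_only) in counts.items():
        detected_full = both + full_only
        per_locus[locus] = both / detected_full if detected_full else float("nan")
    values = [v for v in per_locus.values() if v == v]  # drop NaN entries
    mean = sum(values) / len(values) if values else float("nan")
    return per_locus, mean
```

For hypothetical counts {"L1": (3, 1, 0), "L2": (2, 0, 1)}, the per-locus concordances are 0.75 and 1.0 and the unweighted mean is 0.875; a pooled estimate over the same counts (5 of 6) would differ.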

Fig. 4
figure 4

Exon 2 genotyping assay confirmation of full-length sequencing results. Number of alleles identified by locus with both sets of primers, full-length primers only, and exon 2 primers only. DRA is omitted since it was not included in the exon 2 assay

In addition to detecting the majority of the full-length alleles at each locus, the exon 2 primers also identified some additional alleles for the DRB, DPA, and DPB loci that were not recovered with the full-length primers. For the DRB locus, 11 additional putative alleles were detected with the exon 2 primers. Four of these putative alleles (Mane-DRB1*03:10, Mane-DRB1*04:nov:04, Mane-DRB1*07:nov:01, and Mane-DRB5*03:nov:02) formed previously undescribed DR region haplotypes with a pair of Mane-DRA*02 lineage alleles, bringing the total number of distinct DR haplotypes detected in this cohort to 36 (Fig. 1). Similarly, one additional putative DPA allele and three additional putative DPB alleles added three DP region haplotypes, bringing the total to 15 DP haplotypes in this cohort (Fig. 3). The identification of these additional putative alleles with the exon 2 assay suggests that the full-length sequencing primers also require some optimization, particularly for the DR and DP loci. We assume that, for these loci, the alleles detected only with the exon 2 assay are also present in the full-length cDNA template molecules, but that the full-length primers are not sufficiently matched to these specific alleles to allow amplification. For the DQA and DQB loci, the full-length primers appear to have amplified all alleles present in these 32 pig-tailed macaques, as no additional putative DQ alleles were detected by the exon 2 assay alone. Primer design for both full-length sequencing and the exon 2 genotyping assay will improve as more macaque alleles are identified.
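
One way to flag alleles likely to be underrepresented is to compare each primer with the corresponding binding site in the full-length sequences and count mismatches, paying particular attention to those near the 3′ end, where mismatches are generally most disruptive to primer extension. The Python sketch below is a hypothetical helper, not part of the methods used here: it assumes the binding site is given in the same orientation as the primer, supports IUPAC degenerate bases in the primer, and uses an arbitrary 5-base 3′ window.

```python
# IUPAC nucleotide codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer, site, three_prime_window=5):
    """Return (total_mismatches, mismatches_in_3prime_window).

    primer: 5'->3' primer sequence (IUPAC codes allowed).
    site: template sequence in the same orientation and length as the primer.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must be the same length")
    mismatch_positions = [
        i for i, (p, s) in enumerate(zip(primer.upper(), site.upper()))
        if s not in IUPAC[p]
    ]
    window_start = len(primer) - three_prime_window
    near_3prime = sum(1 for i in mismatch_positions if i >= window_start)
    return len(mismatch_positions), near_3prime
```

For example, primer "ACGTRACGTA" against site "ACGTGACGTT" has one mismatch, at the terminal 3′ base (R matches G), so the function returns (1, 1); scanning every full-length allele this way could prioritize which alleles need new primers.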

Implications for infectious disease research

In this survey of just 32 pig-tailed macaques, we identified a total of 128 unique full-length MHC-II alleles, including 44 extensions of previously identified exon 2 sequences and 73 alleles unmatched against the database of named alleles. This increases the full-length MHC-II database for pig-tailed macaque alleles from 16 to 133 sequences, representing a more than eightfold increase in characterized full-length alleles. These sequences make it more feasible to create reagents like MHC/peptide tetramers and define peptide binding motifs for common MHC-II pig-tailed macaque alleles. However, which MHC-II alleles are commonly shared in this population remains to be determined.

The exon 2 Fluidigm genotyping assay developed here is an important tool for researchers using pig-tailed macaques in infectious disease studies. With some primer optimization, this assay can be used to genotype large numbers of pig-tailed macaques and determine the frequencies of MHC-II alleles across the DR, DQ, and DP regions. The most common alleles provide a starting point for determining the role of MHC-II in infectious disease and vaccine studies and for developing reagents such as MHC/peptide tetramers to study MHC-II-restricted immune responses in a given disease model. Large-scale examination of allele frequencies in pig-tailed macaque populations available for research can also help reduce MHC class II bias in study results by balancing animals with commonly expressed alleles between control and study groups.
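
Estimating how common an allele is and balancing its carriers between groups can be sketched as follows. This Python example is illustrative only: function names are hypothetical, genotypes are assumed to be presence/absence calls such as those from the exon 2 assay (which yield carrier frequencies rather than true allele frequencies), and the stratified randomization shown is one generic way to balance groups, not a procedure from this study.

```python
import random


def carrier_frequency(genotypes, allele):
    """Fraction of animals carrying `allele` in presence/absence genotypes.

    This is a carrier (phenotype) frequency, not an allele frequency:
    presence/absence data cannot distinguish heterozygotes from homozygotes.
    """
    if not genotypes:
        raise ValueError("no animals")
    return sum(allele in alleles for alleles in genotypes.values()) / len(genotypes)


def balanced_groups(genotypes, allele, seed=0):
    """Randomize animals to two groups, stratified by carrier status."""
    rng = random.Random(seed)
    carriers = sorted(a for a, g in genotypes.items() if allele in g)
    others = sorted(a for a, g in genotypes.items() if allele not in g)
    groups = {"control": [], "study": []}
    k = 0  # one running counter keeps total group sizes within one of each other
    for stratum in (carriers, others):
        rng.shuffle(stratum)  # random order within each stratum
        for animal in stratum:
            groups["control" if k % 2 == 0 else "study"].append(animal)
            k += 1
    return groups
```

Alternating assignment within each shuffled stratum keeps the number of carriers in the two groups within one of each other, regardless of the random seed.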

There is recent paradigm-shifting evidence that MHC-II molecules may play an even greater role in immune response to pathogens than previously appreciated. It has been shown that SIV-expressing rhesus cytomegalovirus vectors can elicit SIV-specific CD8+ T cell responses that recognize epitopes restricted by MHC class II molecules in Indian rhesus macaques (Hansen et al. 2013). This suggests a role for MHC-II molecules beyond presentation of peptides to CD4+ T cells. Therefore, a thorough understanding of the MHC-II complement of pig-tailed macaques as well as other nonhuman primates used as models for HIV and other infectious disease research is essential to fully understand immune response to vaccines and infections. The methods presented here can be applied to characterizing the MHC-II DR, DQ, and DP regions of rhesus and cynomolgus macaques as well as other nonhuman primates.