Introduction

In 2012, we published a nomenclature report focused on the major histocompatibility complex (MHC) genes and alleles of the great apes as well as Old and New World monkey species (de Groot et al. 2012). Since then, research on the MHC of non-human primate (NHP) species has intensified, most often concerning species that serve as models for human biology and disease. In addition, there has been steady growth in MHC data derived from diverse NHP species that are studied for conservation biology purposes (Cao et al. 2015; de Groot et al. 2017a, b; Hans et al. 2017; Maibach et al. 2017; Wroblewski et al. 2017; Arguello-Sanchez et al. 2018).

The MHC is a large genomic region (approximately 5 million base pairs in length) that is packed with different genes, many of which are polymorphic. Mapping allelic polymorphisms is still a challenge, although recent technical advances such as next-generation sequencing are speeding up the discovery of alleles, thereby increasing the reported number of MHC genes and alleles. The Immuno Polymorphism Database (IPD)-MHC Non Human Primate Database (https://www.ebi.ac.uk/ipd/mhc/group/NHP) is the platform used to store and retrieve quality-controlled and annotated MHC sequences of various non-human primate species. To cope with current and future developments, this platform was recently upgraded to allow the processing and annotation of large flows of data (Maccari et al. 2017).

In humans, the MHC is referred to as the human leukocyte antigen (HLA) complex, and it has an active WHO Nomenclature Committee for Factors of the HLA System. The committee’s most recent complete nomenclature report was published in 2010 and described the currently used naming convention for HLA factors (Marsh et al. 2010). Whenever possible, the conventions laid out by that committee have also been applied to the non-human primate MHC equivalents. However, when these rules are inadequate or inapplicable (for instance, when there is no apparent evolutionarily related counterpart in humans), specific non-human primate nomenclature is introduced. As indicated in the previous NHP nomenclature report (de Groot et al. 2012), lineages and alleles of different MHC genes may have been named arbitrarily, based largely on the order in which they were discovered. There are, however, exceptions: for some lineages shared between species, the same digits are used. HLA-DRB1*03:27, Patr-DRB1*03:03, and Mamu-DRB1*03:09, for example, are the official names for human, chimpanzee, and rhesus macaque alleles, respectively, that descend from an ancient DRB1 lineage that predated the speciation of these primates (Bontrop et al. 1999). Since the 2012 report, large amounts of new data have become available, which have allowed us to rename a number of lineages/alleles with more biologically representative designations.
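The naming pattern described above (a species prefix, a gene, and colon-separated fields in which the first field denotes the lineage) can be illustrated with a small parser. This is a minimal sketch, not an IPD-MHC tool: the `AlleleName` type and `parse_allele` function are hypothetical, and the sketch assumes names of the form `Prefix-Gene*field:field`. Applied to the three DRB1 alleles cited above, it shows that they share both the gene and the lineage digits.

```python
from typing import NamedTuple, Tuple


class AlleleName(NamedTuple):
    prefix: str              # species prefix, e.g. "Patr" ("HLA" for human)
    gene: str                # gene or locus, e.g. "DRB1"
    fields: Tuple[str, ...]  # colon-separated fields; fields[0] is the lineage


def parse_allele(name: str) -> AlleleName:
    """Split a name such as 'Patr-DRB1*03:03' into prefix, gene and fields (hypothetical helper)."""
    prefix, _, rest = name.partition("-")
    gene, sep, designation = rest.partition("*")
    if not (prefix and gene and sep and designation):
        raise ValueError(f"unrecognised allele name: {name!r}")
    return AlleleName(prefix, gene, tuple(designation.split(":")))


names = ["HLA-DRB1*03:27", "Patr-DRB1*03:03", "Mamu-DRB1*03:09"]
parsed = [parse_allele(n) for n in names]
# Collect the (gene, lineage) pairs; a single pair means the lineage is shared.
shared = {(p.gene, p.fields[0]) for p in parsed}
print(shared)  # {('DRB1', '03')}
```

Only the lineage field is compared here; the allele-level fields (27, 03, 09) are assigned independently within each species.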

This nomenclature report presents a detailed description of the most recent rules regarding NHP-specific nomenclature. It also provides an overview of the annotated data that are currently available in the IPD-MHC NHP Database. Finally, it summarizes the recent upgrade of this platform and its new features, both those already available and those forthcoming, which should facilitate future use of the database by researchers in the field.

Guidelines: nomenclature of the non-human primate MHC

Nomenclature for the MHC systems of NHP species

The first nomenclature proposal regarding the major histocompatibility complex of different species was published in 1990 (Klein et al. 1990). Most of the rules then proposed for species prefixes are, in essence, still valid. In brief, the Mhc symbol is followed by a four-letter abbreviation of the species’ scientific name: the first two letters are derived from the genus name, and the last two letters from the species name. For the sake of convenience, the prefix “Mhc” is often omitted. A complicating factor is that there is no officially accepted consensus on non-human primate taxonomy, as the status of many species is still under discussion (Groves 2014). The 1990 nomenclature report used scientific species names based on those given by Corbet and Hill (1986); at present, genus and species names are assigned according to Groves (2005). Table 1 provides a register of officially accepted MHC symbols for 56 different NHP species for which annotated MHC genes or alleles may have been published and are maintained by the IPD-MHC database. Research on other vertebrate species has resulted in the discovery and description of MHC systems as well (e.g., see the IPD-MHC Database; www.ebi.ac.uk/ipd/mhc/), and most have been named according to the nomenclature proposal originally published in 1990 (Klein et al. 1990). There is, however, a possibility that the MHCs of two or more different species have inadvertently been given identical species designations (Ballingall et al. 2018; Maccari et al. 2018). Therefore, this committee advises research communities working on the MHCs of other groups of species to develop and publish an MHC register, so that such confusion can be avoided as much as possible.
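The four-letter prefix rule, and the reason a published register matters, can be sketched in a few lines. This is an illustrative sketch only: `mhc_prefix` and `find_collisions` are hypothetical helpers that apply the rule mechanically, whereas the official symbols are those listed in the register (Table 1). As the collision check shows, the bare rule can yield the same four letters for two different species; Pan paniscus and Papio papio both reduce to "Papa".

```python
from collections import defaultdict
from typing import Dict, List


def mhc_prefix(scientific_name: str) -> str:
    """First two letters of the genus plus first two letters of the species (mechanical rule only)."""
    genus, species = scientific_name.split()[:2]  # ignore any subspecies epithet
    return genus[:2].capitalize() + species[:2].lower()


def find_collisions(names: List[str]) -> Dict[str, List[str]]:
    """Return derived prefixes that would be shared by more than one species."""
    by_prefix: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        by_prefix[mhc_prefix(name)].append(name)
    return {p: spp for p, spp in by_prefix.items() if len(spp) > 1}


for sp in ["Macaca mulatta", "Pan troglodytes", "Gorilla gorilla", "Pongo abelii"]:
    print(sp, "->", mhc_prefix(sp))
print(find_collisions(["Pan paniscus", "Papio papio", "Macaca mulatta"]))
```

A shared register resolves such cases by recording one agreed symbol per species, rather than relying on each group to derive the prefix independently.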

Table 1 Register of official MHC symbols for non-human primates, and the genes and number of alleles represented in the IPD-MHC NHP Database

Nomenclature for NHP MHC genes, lineages, and alleles

As the human MHC (HLA), located on the short arm of chromosome 6, is the most thoroughly studied MHC, the HLA community has longstanding experience in dealing with issues of nomenclature. For that reason, the NHP committee uses the HLA system as both a guideline and a reference to name MHC genes, lineages, and alleles in NHP species. Our 2012 report gives a detailed description of the important nomenclature issues and how they apply to NHP species (de Groot et al. 2012). These guidelines are still applicable; where updates are needed, the changes implemented are described in this report. Table 1 provides an overview of the various classical and non-classical MHC genes that are now described for the great and small apes, Old World monkeys (OWM), and New World monkeys (NWM). The next section describes in more detail the specific nomenclature applicable to particular NHP.

Non-human primate-specific nomenclature

In most cases, NHP MHC genes/lineages/alleles can be named according to the 2010 nomenclature rules described by the WHO Nomenclature Committee for Factors of the HLA System (Marsh et al. 2010). However, research into the MHC of non-human primates has revealed genes/lineages that are absent from, or undetectable in, the HLA system. In those cases, non-human primate-specific nomenclature was introduced. For example, classical class I genes in OWM species, such as the rhesus monkey, show extensive gene copy number variation that is absent from the HLA (Vogel et al. 1999; Daza-Vamenta et al. 2004; Otting et al. 2005). An overview and explanation of all specific prefixes and suffixes that have been introduced in the non-human primate nomenclature is provided in Table 2.

Table 2 Specific prefixes and suffixes used in the MHC nomenclature of non-human primates

Nomenclature for the MHC class I genes in non-human primates

It has become clear that true orthologs of the HLA-A, HLA-B, and HLA-C genes are present only in the great ape species (Hans et al. 2017; Parham and Guethlein 2018). However, some great ape species may have additional class I genes (Fig. 1); for example, some chimpanzee MHC haplotypes have an additional HLA-A-like gene, designated Patr-AL (Adams et al. 2001). Some gorilla MHC haplotypes also have an additional A-related gene, named Gogo-Oko, which shares features with the classical MHC class I genes (Lawlor et al. 1991; Watkins et al. 1991a; Hans et al. 2017). Moreover, some gorilla haplotypes have another A-related gene, designated Gogo-A*05. This gene appears to be the equivalent of the human pseudogene HLA-Y and may also be a pseudogene in gorillas (Hans et al. 2017). The orangutan A gene (Popy-A) is closely related to Patr-AL (Adams et al. 2001; Gleimer et al. 2011), emphasizing that true orthologs of HLA-A are present only in African great apes.

Fig. 1

Schematic overview of the MHC class I A and B/C region haplotypes that are present in humans and the various great ape species. The figure was adapted from Hans et al. (2017). The orthologs of the HLA-A, HLA-B, and HLA-C genes are presented in blue boxes. The orthologs of the A-related genes are in red boxes. The orthologous B genes present only in gorilla and orangutan are depicted in green boxes. Bn indicates that copy number variation exists for the MHC-B gene in orangutan

Chimpanzees possess one copy each of a B and a C gene per haplotype, as is the situation in humans. In gorillas, haplotypes with one copy each of a B and a C gene are also observed; in addition, some gorillas have an extra B gene (Gogo-B*07) (Hans et al. 2017). In orangutans, the C gene can be present or absent (Adams et al. 1999; de Groot et al. 2016), whereas the B gene exhibits copy number variation, with a minimum of two B genes per haplotype (Chen et al. 1992; de Groot et al. 2016). Because segregation data and sufficient genomic data are lacking, the different paralogous B genes in orangutans have yet to be given official gene designations. The additional Gogo-B*07 gene, which differs from the B clade shared by humans, chimpanzees, and gorillas, appears to be most similar to the orangutan B genes (Hans et al. 2017). Figure 1 gives a schematic impression of the MHC-A and MHC-B/MHC-C region haplotypes that are present in humans and great apes.

In various Old World monkey species, family studies and genomic data have determined which genes are carried on the same haplotype. To differentiate the paralogous genes in these OWM, a nomenclature protocol was introduced that mirrors the designation of the various human HLA-DRB region genes. For example, Mamu-A1, Mamu-A2, and Mamu-A3 are closely related MHC class I A genes of the rhesus macaque, which can be present on the same MHC haplotype. Similar configurations are present in other macaque species (Fig. 2). Currently, the orthologous and paralogous relationships of the various OWM MHC-B genes are poorly understood. We anticipate that the precise order of the class I genes on macaque MHC haplotypes will be determined in the near future, which should aid in devising a precise and sensible nomenclature. A detailed description of macaque MHC class I nomenclature is given in the 2012 nomenclature report (de Groot et al. 2012).

Fig. 2

Haplotype distribution of the Mhc-A genes in three macaque species. Mamu is Macaca mulatta, Mafa is M. fascicularis, and Mane is M. nemestrina. The presence of a haplotype in a particular macaque species is indicated with a different colored circle (Budde et al. 2010; Orysiuk et al. 2012; Doxiadis et al. 2013; Karl et al. 2013; Shiina et al. 2015). The brown open circles indicate the haplotypes that were detected in several pedigreed groups of Southern pig-tailed macaques, but the related data have not yet been published (B. Lafont, personal observations). In M. fascicularis, the A6-gene haplotype distribution is not yet known (Otting et al. 2007; Saito et al. 2012)

Nomenclature of MHC-class II alleles

Orthologs of HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1, HLA-DRA, and HLA-DRB are present in all non-human primate species investigated so far. In the past, only exon 2 of MHC class II alleles was sequenced, but this practice is giving way to the sequencing of full-length cDNAs and genes (Otting et al. 2017). When naming non-human primate class II alleles, the HLA class II nomenclature is used when applicable. In some species, however, the class II genes have evolved differently from the human genes. For example, HLA-DPA1 is conserved, whereas the orthologous macaque gene is polymorphic. Below, we describe the rationale behind naming the DQ, DP, DR, and DM genes of non-human primates (Maccari et al. 2017).

For the HLA-DQA1 gene, six allele groups/lineages are defined (Marsh et al. 2000), named DQA1*01 to *06. Great apes and OWM have alleles that group with the HLA-DQA1*01 and DQA1*05 lineages, and this is reflected in the nomenclature. Thus, chimpanzee Patr-DQA1*01:01 belongs to the lineage orthologous to HLA-DQA1*01, and rhesus macaque Mamu-DQA1*05:01 belongs to the lineage orthologous to HLA-DQA1*05. Other alleles can belong to lineages that are specific to a species or to a subset of non-human primate species. These lineages are numbered in series starting from DQA1*20, thus reserving DQA1*07 to *19 for HLA-DQA1 lineages that have yet to be discovered. Previously, lineage numbers were given to sequences that do not meet the current criterion for a name (a full-length exon 2 sequence): for example, gibbon DQA1*22. We encourage researchers to extend these incomplete sequences in order to confirm their lineage assignment.

Equivalents of the HLA-DQB1*02, *03, *05, and *06 lineages are present in great apes, whereas in OWM only the DQB1*06 lineage has been found. Non-human primate DQB1 lineages with no similarity to HLA-DQB1 lineages are numbered in series starting from DQB1*15.
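The DQA1 and DQB1 numbering rules above amount to fixed thresholds on the lineage field, which the following sketch encodes. It is illustrative only: `lineage_class` and the two lookup tables are hypothetical, the thresholds (DQA1*20, DQB1*15, and the reserved DQA1*07 to *19 band) are taken from the text, and because the text states no reserved band for DQB1, the sketch does not model one.

```python
# First lineage number of the NHP-specific series for each gene (from the rules above).
NHP_SERIES_START = {"DQA1": 20, "DQB1": 15}
# Lineage numbers held back for HLA lineages not yet discovered (DQA1 only;
# no equivalent reserved band for DQB1 is stated, so none is modelled).
RESERVED = {"DQA1": range(7, 20)}


def lineage_class(gene: str, lineage: str) -> str:
    """Classify a DQA1/DQB1 lineage field under the numbering rules (hypothetical helper)."""
    n = int(lineage)
    if n >= NHP_SERIES_START[gene]:
        return "NHP-specific"
    if n in RESERVED.get(gene, range(0)):
        return "reserved for HLA"
    return "HLA lineage range"


for gene, lin in [("DQA1", "05"), ("DQA1", "12"), ("DQA1", "22"), ("DQB1", "06"), ("DQB1", "15")]:
    print(f"{gene}*{lin}: {lineage_class(gene, lin)}")
```

Gibbon DQA1*22, for example, falls in the NHP-specific series, while Mamu-DQA1*05:01 falls in the range shared with HLA.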

The DRA gene is relatively conserved in humans and most non-human primates, with all allele designations being part of the DRA*01 group. An exception is the macaque DRA gene, which exhibits substantial variation in exon 1. To accommodate this variation, a second group of alleles, DRA*02, is defined for macaques. The DRB gene is duplicated in humans as well as in non-human primates. The non-human primate genes/loci are numbered DRB1, DRB3, DRB5, and DRB6 and, in essence, follow the HLA system. In cases where a sequence or group of sequences cannot yet be assigned to a gene, the gene number is omitted and the lineage number is preceded by W, denoting a temporary “workshop” designation. For instance, Poab-DRB*W118:01 is the most recently designated DRB allele of the Sumatran orangutan, but its precise relationship to the DRB genes of humans and other great apes is not yet sufficiently understood.
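The workshop convention can be recognised from the name alone: the gene number is missing and the lineage field starts with W. The sketch below flags such names. It is a minimal illustration; `DrbName` and `split_drb` are hypothetical helpers, and the sketch assumes DRB names of the form `Prefix-DRB[n]*lineage:allele`.

```python
from typing import NamedTuple, Optional


class DrbName(NamedTuple):
    prefix: str            # species prefix, e.g. "Poab"
    locus: Optional[str]   # e.g. "DRB1"; None when the gene number is omitted
    lineage: str           # first field, e.g. "03" or "W118"
    workshop: bool         # True for a temporary "W" designation


def split_drb(name: str) -> DrbName:
    """Split a DRB allele name and flag workshop ('W') designations (hypothetical helper)."""
    prefix, _, rest = name.partition("-")
    gene, _, designation = rest.partition("*")
    lineage = designation.split(":")[0]
    locus = None if gene == "DRB" else gene
    return DrbName(prefix, locus, lineage, lineage.startswith("W"))


print(split_drb("Poab-DRB*W118:01"))
print(split_drb("Mamu-DRB1*03:09"))
```

Once such a sequence is assigned to a gene, it would receive a numbered locus and a regular lineage number.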

HLA-DPA1 is oligomorphic, having four sub-lineages of alleles. In contrast to humans, the macaque DPA gene is polymorphic, with alleles grouping into clusters that differ from the human equivalents. Historically, the DPA1*01 lineage name was used for humans and great apes, whereas the macaque alleles were given the names DPA1*02, *04, and *06 to *13, and the baboon DPA alleles were named DPA1*14, *15, and *16.

Non-human primate DPB nomenclature is more complicated because the gene evolved differently in humans and OWM. In HLA-DPB1, the exchange of sequence motifs by recombination has played a prominent role in generating allelic diversity. Over a thousand sequences are currently known, named HLA-DPB1*01:01 up to HLA-DPB1*1069:01 (Gyllensten et al. 1996; Marsh et al. 2010). The DPB gene in OWM is more polymorphic than in humans, and its allelic differences are generated mainly by point mutations. Moreover, the alleles group into distinctive phylogenetic lineages (Otting et al. 2017). Consequently, the nomenclature of HLA-DPB barely overlaps with that of non-human primate DPB. All human alleles are part of the DPB1*01 lineage, whereas the alleles of macaques and other non-human primates are distributed among the DPB1*01 to *30 lineages.

Full-length chimpanzee class II cDNA sequences have recently become available in the IPD-MHC Database (Otting et al. 2019). In addition, class II sequences of gorillas and two orangutan species have been deposited (N. Otting, personal observations). The allele names, accession numbers, and individual animals studied are given in Table 3. Phylogenetic analyses of great ape, macaque, and human DPB1 sequences indicate that DPB polymorphism is limited in great apes compared with macaques (data not shown). Nevertheless, the great ape DPB alleles cluster into groups and have been given the lineage numbers DPB1*01 to *05. Of note, there is no similarity between great ape and Old/New World monkey DPB1 alleles that share the same lineage number.

Table 3 New class II alleles in gorilla (Gogo), orangutan (Poab and Popy), and chimpanzee (Patr). The newly detected alleles are listed in alphabetical order, together with the accession numbers and names of the animals in which the sequences were present. The gorilla group (Gogo) included the sire Jambo along with three offspring Mapasa, M’Zungu, and Wimbe. These animals share a haplotype, the alleles of which are depicted in blue. The two subspecies of orangutan share a haplotype as well, except for 1- or 2-nt differences in some alleles. This haplotype is depicted in light green for the Sumatran Pongo abelii (Poab) and in dark green for the Bornean Pongo pygmaeus (Popy)

Recent full-length class II sequencing of chimpanzee Patr-DPB (Otting et al. 2019) identified errors in the exon 2 sequences that were described in the 1990s. After correction, the Patr-DPB alleles received new designations as presented in Table 4.

Table 4 Renaming of DPB1 alleles in chimpanzees (Patr)

Orthologs of the non-classical MHC class II genes, HLA-DM and HLA-DO, are also present in non-human primates, and the DMA alleles of macaques have been named DMA*02 (Min et al. 2019). Full-length DMB sequences have been reported for chimpanzees, gorillas, and orangutans (Alvarez et al. 1998). Chimpanzee and gorilla DMB alleles cluster with HLA-DMB*01 and were given the names Patr-DMB*01 and Gogo-DMB*01, respectively. Orangutan and macaque DMB alleles are named DMB*02 and *03, respectively (Alvarez et al. 1998; Min et al. 2019). Among the OWM, DO sequences have only been described for macaques, and they have been named DOA*01 and DOB*01 (Lian et al. 2018).

The IPD-MHC NHP Database

The IPD-MHC NHP Database (https://www.ebi.ac.uk/ipd/mhc/group/NHP) is part of the IPD-MHC platform (https://www.ebi.ac.uk/ipd/mhc/). This platform was recently upgraded, which has resulted in the incorporation of sequence updates. In addition, new tools have been made available and the submission procedure has been improved (Maccari et al. 2017).

The first generation of the database went online in March 2002 (Robinson et al. 2003, 2010), and it has since expanded greatly (Fig. 3). Today, the database includes MHC data from great and small apes, OWM, and NWM, and archives more than 7400 allele sequences derived from 54 NHP species (Table 1). Owing to the increasing interest in studying the MHC of strepsirrhine species (e.g., lemurs, lorises, galagos, pottos) (Averdam et al. 2009, 2011; Pechouskova et al. 2015; Kaesler et al. 2017; de Winter et al. 2019), we intend to deposit the data for these species in the IPD-MHC NHP Database in the near future.

Fig. 3

Annual growth of the IPD-MHC NHP Database

The database includes MHC class I (full-length or minimal exons 2 and 3) and class II (full-length or minimal exon 2) sequences, which have been submitted and published by numerous authors. Since the 2012 report (de Groot et al. 2012), the database has almost doubled in size and includes data from an additional eight species: Hylobates moloch (de Groot et al. 2017a), Gorilla beringei (Hans et al. 2017), Pongo abelii (de Groot et al. 2016), Cercocebus atys (Heimbruch et al. 2015; Wang et al. 2015), Chlorocebus pygerythrus (Gieger et al., unpublished), Macaca assamensis (Yan et al. 2013), Macaca leonina (Lian et al. 2016, 2018), and Alouatta pigra (Arguello-Sanchez et al. 2018) (Table 1, red numbers in column 2019). For some species already established in the IPD-MHC NHP Database, the data have been extended. These species are Pan troglodytes and Pan paniscus (Wroblewski et al. 2015; de Groot et al. 2017b; Maibach et al. 2017; Wroblewski et al. 2017; Otting et al. 2019), Gorilla gorilla (Hans et al. 2017), Pongo pygmaeus (de Groot et al. 2016), Cercopithecus mitis (Liu et al. 2014), Chlorocebus sabaeus (Aarnink et al. 2014), Macaca arctoides (Yan et al. 2013), Macaca fascicularis (Lawrence et al. 2012; Orysiuk et al. 2012; Blancher et al. 2014; van der Wiel et al. 2015; Karl et al. 2017; Otting et al. 2017), Macaca mulatta (Karl et al. 2013; Dudley et al. 2014; van der Wiel et al. 2015; Otting et al. 2017), Macaca nemestrina (Karl et al. 2014; van der Wiel et al. 2015; Otting et al. 2017; Semler et al. 2018), Macaca thibetana (Yan et al. 2013; Min et al. 2019), Papio anubis (Otting et al. 2016; Morgan et al. 2018; van der Wiel et al. 2018), Papio hamadryas (Morgan et al. 2018), Aotus nancymaae and Aotus vociferans (Lopez et al. 2014), Ateles fusciceps (Cao et al. 2015), and Callithrix jacchus (van der Wiel et al. 2013; Otting et al. and Mueller et al., unpublished) (Table 1, blue numbers in column 2019).

The recently improved IPD-MHC Database can also host genomic sequences and provides a multiple sequence alignment tool for comparing genomic and non-genomic data (Maccari et al. 2017). This tool supports single-gene alignments as well as inter- and intra-species gene alignments for all species groups within the IPD-MHC database. By default, the alleles in an alignment are first grouped by identity at the first field (the first two digits), which represents the lineage. If a particular lineage contains several alleles, the number of additional alleles is indicated in brackets next to the first allele (Fig. 4). Clicking on this number displays the corresponding sub-alignment, which makes large alignments easier to view. In addition, the level of detail shown in an alignment can be varied by changing the value in the “resolution level” field. Four resolution levels can be chosen: 01 is the lineage level; 01:01 is the allele level; 01:01:01 shows all alleles, including those that differ only by synonymous substitutions; and 01:01:01:01 shows all alleles, including those that differ only by non-coding variation.
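The grouping and resolution levels described above can be mimicked by truncating each designation to its first n fields and collapsing alleles that then coincide. This sketch only illustrates the idea and is not the IPD-MHC implementation: `at_resolution` and `collapse` are hypothetical helpers, and the allele list is made up for the example. The bracketed count mirrors the number of additional alleles shown next to the first allele of a lineage.

```python
from typing import Dict, List, Tuple


def at_resolution(designation: str, level: int) -> str:
    """Keep the first `level` fields: 1 = lineage, 2 = allele, 3 = synonymous, 4 = non-coding."""
    return ":".join(designation.split(":")[:level])


def collapse(alleles: List[str], level: int = 1) -> List[Tuple[str, int]]:
    """Group alleles at `level`; return each group's first allele and the number of others it hides."""
    groups: Dict[str, List[str]] = {}  # dicts keep insertion order (Python 3.7+)
    for allele in alleles:
        gene, _, designation = allele.partition("*")
        key = gene + "*" + at_resolution(designation, level)
        groups.setdefault(key, []).append(allele)
    return [(members[0], len(members) - 1) for members in groups.values()]


# Hypothetical allele list, for illustration only.
alleles = ["Patr-A*03:01", "Patr-A*03:02", "Patr-A*03:03", "Patr-A*04:01"]
for first, extra in collapse(alleles, level=1):
    print(f"{first} [{extra}]" if extra else first)
```

At level 1 the three A*03 alleles collapse behind Patr-A*03:01; at level 2 every allele in this list is shown separately.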

Fig. 4

Partial alignment of some chimpanzee A (Patr-A) alleles. The human HLA-A*01:01:01:01 allele is taken as a reference sequence. The brackets after the Patr-A*03:01 allele indicate that the A*03 lineage contains six additional alleles. A dash indicates identity to the consensus, and a nucleotide replacement is represented by the conventional one-letter code

The curators of the IPD-MHC NHP Database are responsible for providing official designations for newly identified alleles. Alleles/sequences can be submitted using the online submission tool, which is found on the IPD-MHC Database homepage (https://www.ebi.ac.uk/ipd/mhc/). Currently, only one sequence can be submitted at a time; however, a bulk submission tool is being developed. To enhance the reliability of the alleles deposited in the IPD-MHC NHP Database, we encourage scientists involved in non-human primate MHC research to submit the sequences they have identified in their cohort studies, even if these are identical to already published alleles. Every 6 months, the IPD-MHC Database releases new data, updating the website with all novel NHP sequences and with additions or corrections to previously deposited allele sequences.