The development of novel high-throughput sequencing (HTS) technologies has opened new avenues for exploring and understanding the human genome. Recently, studies of genomic contigs obtained by next-generation sequencing (NGS) in Homo sapiens have identified a series of novel duplicated immunoglobulin (IG) heavy variable (IGHV) genes and new alleles, refining our view of the immunoglobulin heavy chain (IGH) locus (Watson et al. 2013), particularly regarding copy number variation (CNV). IMGT gene and allele names were assigned to the novel IGHV genes and alleles by the IMGT Nomenclature Committee (IMGT-NC) (Lefranc 2014) and have been incorporated in IMGT/GENE-DB (Giudicelli et al. 2005) and in the reference directories of the international ImMunoGeneTics information system® (IMGT®) (http://www.imgt.org), the global reference in immunogenetics and immunoinformatics (Lefranc 2014; Lefranc et al. 2009).

Overall, the changes introduced within the last year to the IMGT/V-QUEST reference directory (Brochet et al. 2008) regarding the functional IGHV genes include the following: (i) the enrichment of the IGHV reference dataset through the addition of 5 novel genes and 17 new alleles (6 from the novel genes and 11 from pre-existing IGHV genes), (ii) the assignment of definitive IMGT gene names to 6 previously unmapped IGHV genes, (iii) the update of the reference sequence of 7 IGHV alleles and (iv) the removal of two IGHV alleles that have become obsolete due to these changes (Table 1). These updates not only contribute to an improved characterization of the expressed repertoire but also raise the question of whether re-evaluation of previous results is warranted.

Table 1 Synopsis of the changes (new genes and alleles and upgrade of previous data) recently introduced into the IMGT/V-QUEST reference directory for the IGHV functional genes based on comparison of the IMGT/V-QUEST reference directory release 201308-3 (20 February 2013) versus the IMGT/V-QUEST reference directory release 201408-4 (20 February 2014)

Precise determination of somatic hypermutation (SHM) status strongly depends on alignment of the nucleotide sequence of the rearranged IGHV gene against that of its closest germline counterpart (Ghia et al. 2007; Langerak et al. 2011). Hence, changes in the relevant reference directories could, in principle, affect the IGHV gene and allele assignment and the interpretation of SHM status of IG rearrangement sequences in health and disease, including B cell malignancies, where immunogenetic analysis of the clonotypic IGHV genes has offered valuable insight into their ontogeny (Sutton et al. 2013).
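The dependence of SHM status on the reference directory can be illustrated with a minimal Python sketch. It assumes pre-aligned, equal-length sequences and a plain position-by-position identity. This is a simplification and not the IMGT/V-QUEST algorithm. The function names, the allele names (`TOYV1*01`, `TOYV1*02`) and the 20-nucleotide sequences are invented for illustration only.

```python
def percent_identity(query, germline):
    """Percent of matching positions between two pre-aligned sequences."""
    if len(query) != len(germline):
        raise ValueError("sequences must be pre-aligned to the same length")
    matches = sum(q == g for q, g in zip(query.upper(), germline.upper()))
    return 100.0 * matches / len(query)


def closest_germline(query, directory):
    """Return (allele name, percent identity) of the best-matching reference."""
    scored = ((name, percent_identity(query, seq)) for name, seq in directory.items())
    return max(scored, key=lambda item: item[1])


# Toy sequences, invented for illustration (not real IGHV alleles).
old_directory = {"TOYV1*01": "ACGTACGTACGTACGTACGT"}
# The updated directory gains one allele that differs at a single position.
new_directory = {**old_directory, "TOYV1*02": "ACGTACGTACTTACGTACGT"}
query = "ACGTACGTACTTACGTACGT"

print(closest_germline(query, old_directory))  # one apparent "mutation"
print(closest_germline(query, new_directory))  # identical to the new allele
```

In this toy case, the single difference that looks like a somatic mutation against the old directory turns out to be a polymorphism once the new allele is available, so both the assigned allele and the percent identity change.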

Chronic lymphocytic leukaemia (CLL) amply exemplifies this concept. Indeed, in CLL, the SHM status of the clonotypic rearranged IGHV gene has been established as a strong and accurate prognosticator of patient outcome (Damle et al. 1999; Hamblin et al. 1999). Based on a cutoff value of 98 % germline identity (GI), clonotypic IGHV-IGHD-IGHJ gene rearrangement sequences are defined either as mutated (<98 % GI) or unmutated (≥98 % GI) and patients are, accordingly, assigned to a “mutated” (M-CLL) or “unmutated” (U-CLL) category, each associated with a distinct prognosis (Chiorazzi and Ferrarini 2011; Chiorazzi et al. 2005; Damle et al. 1999; Hamblin et al. 1999). For these reasons, CLL is an excellent candidate to explore the biological and clinical impact of recent developments regarding the IGH locus composition.

In order to address this issue, we took advantage of a large dataset of IGHV-IGHD-IGHJ gene rearrangement sequences from patients with CLL consolidated within the context of the multi-institutional IMGT/CLL-DB initiative (http://www.imgt.org/CLLDBInterface/Welcome.do). IGH rearrangement sequences were analyzed through the use of the IMGT/HighV-QUEST portal for NGS immunoglobulin (IG) and T cell receptor (TR) sequences (Alamyar et al. 2012a, b; Li et al. 2013). This tool is capable of analyzing up to 500,000 sequences per batch and provides the same high quality results as IMGT/V-QUEST online. IMGT/HighV-QUEST uses, at a given time, the same tool version and the same reference directory release as IMGT/V-QUEST (Table 2).

Table 2 The analysis involved IMGT/V-QUEST reference directory releases 201308-3 (20 February 2013) and 201408-4 (20 February 2014) (http://www.imgt.org/IMGT_vquest/share/textes/datareleases.html). For each analysis a different IMGT/HighV-QUEST version (http://www.imgt.org/HighV-QUEST/help.action?section=upgrades) was available and, thus, applied: namely versions 1.1.2 (01 October 2012) and 1.2.0 (14 November 2013), which in turn correspond to IMGT/V-QUEST program versions 3.2.30 (28 January 2013) and 3.3.0 (20 February 2014), respectively (http://www.imgt.org/IMGT_vquest/share/textes/programversions.html). The different program versions correspond to improvements in the algorithms, the impact of which has not been analysed in this study

Specifically, we analyzed 8066 productive IGHV-IGHD-IGHJ rearrangement sequences from our consortium both before and after the latest update to the IMGT/V-QUEST reference directory (Table 2).

Overall, differences due to the updates of the IMGT/V-QUEST reference directory were identified in a total of 405/8066 sequences (5 % of the cohort) and fell into two categories. The first category (291/405 sequences, 71.9 %) comprises rearranged sequences for which the IGHV gene name or allele name changed without any change in the percent GI (as expected for changes due only to the definitive IMGT nomenclature). The gene name changes were practically confined to two genes, IGHV4-38-2 (75/291 sequences, 25.8 %) and IGHV5-10-1 (46/291 sequences, 15.8 %). The allele changes mostly concern updates of the IGHV2-5 gene alleles (165/291 sequences, 56.7 %) (Fig. 1a). The second category (114/405 sequences, 28.1 %) comprises rearranged sequences for which the percent GI changed. These are sequences for which a novel IGHV allele or an updated reference sequence became the closest germline match; to a large extent, the changes were due to the addition of only three novel IGHV alleles, namely IGHV1-18*04 (28/114 sequences, 24.5 %), IGHV3-11*06 (28/114 sequences, 24.5 %) and IGHV3-64D*06 (31/114 sequences, 27.2 %), the latter assigned to the novel IGHV3-64D gene (Fig. 1b) (Table 2).
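The two categories described above can be expressed as a simple comparison of the assignments obtained for each sequence under the two reference directory releases. The following Python sketch is illustrative only: the `Assignment` record, the `categorize` function, the sequence identifiers, the placeholder name `PROVISIONAL-NAME*01` and all percent GI values are hypothetical and do not come from the cohort.

```python
from collections import Counter
from typing import NamedTuple


class Assignment(NamedTuple):
    allele: str        # closest IGHV gene and allele
    percent_gi: float  # percent germline identity of the V region


def categorize(before, after):
    """Classify the difference between two analyses of the same sequence."""
    # Compare at two decimals to avoid spurious floating-point differences.
    if round(before.percent_gi, 2) != round(after.percent_gi, 2):
        return "gi_change"
    if before.allele != after.allele:
        return "name_change"
    return "unchanged"


# Hypothetical records; names and values are invented for illustration.
before_run = {
    "seq1": Assignment("PROVISIONAL-NAME*01", 96.53),
    "seq2": Assignment("IGHV1-18*01", 99.65),
    "seq3": Assignment("IGHV3-23*01", 100.0),
}
after_run = {
    "seq1": Assignment("IGHV4-38-2*01", 96.53),  # renamed, same percent GI
    "seq2": Assignment("IGHV1-18*04", 100.0),    # novel allele, higher percent GI
    "seq3": Assignment("IGHV3-23*01", 100.0),    # unaffected
}

tally = Counter(categorize(before_run[s], after_run[s]) for s in before_run)
print(tally)
```

A change in percent GI is checked first because it is the category with potential prognostic consequences; a sequence whose allele name and percent GI both changed is therefore counted once, as a percent GI change.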

Fig. 1 Impact of novel IGHV genes and alleles on the interpretation of immunogenetic results in CLL. A total of 8066 productive IGHV-IGHD-IGHJ gene rearrangement sequences from the IMGT/CLL-DB initiative were analyzed with IMGT/HighV-QUEST before and after the update of the IMGT/V-QUEST reference directory. Differences were identified in a total of 405 sequences (5 % of the cohort) falling into two categories: (a) change in the IGHV gene or allele name (291 sequences) or (b) change in the percent GI (114 sequences). (c) Distribution of the mutational sets in the cohort before the update. Changes in the percent GI that led to changes in the mutational set (50 sequences) are shown with arrows (additional information concerning cases with changes in the mutational set or cases that are referred to as “other” can be found in Supplemental file 1)

To assess whether the changes in the percent GI had an impact on CLL prognosis, and following our previously published definitions (Murray et al. 2008), we subdivided the 8066 sequences of the cohort into four different sets based on the level of SHM: truly unmutated (100 % GI) (n = 2636, 32.7 %), minimally mutated (99–99.9 % GI) (n = 609, 7.6 %), borderline mutated (98–98.9 % GI) (n = 397, 4.9 %) and mutated (<98 % GI) (n = 4424, 54.8 %). As expected, the recent updates in the IMGT reference directories did not affect the percent GI of those IGHV-IGHD-IGHJ rearranged sequences that were reported as truly unmutated in the previous IMGT/HighV-QUEST analysis. Changes in the percent GI led to a different mutational set assignment for 50/114 sequences (43.8 %), corresponding to 50/8066 sequences (0.61 %) of the whole cohort. In detail: (i) 2 sequences moved from the mutated to the borderline mutated set, thus shifting from M-CLL to U-CLL according to the 98 % GI cutoff value; (ii) 15 sequences moved from the borderline mutated set to either the minimally mutated (11/15) or the truly unmutated (4/15) set; and (iii) 33 sequences moved from the minimally mutated to the truly unmutated set (Fig. 1c) (Supplemental file 1).
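The set definitions and the reported transitions can be summarized in a short Python sketch. The function and variable names are hypothetical, and the handling of values that fall between the stated ranges (for example 98.95 % GI, treated here as borderline mutated) is an assumption rather than a rule given in the text. The transition counts are those reported above.

```python
def mutational_set(percent_gi):
    """Assign one of the four SHM-based sets from the percent GI."""
    if percent_gi >= 100.0:
        return "truly unmutated"
    if percent_gi >= 99.0:
        return "minimally mutated"
    if percent_gi >= 98.0:  # assumption: [98, 99) is treated as borderline
        return "borderline mutated"
    return "mutated"


def cll_category(set_name):
    """Map a mutational set to M-CLL or U-CLL using the 98 % GI cutoff."""
    return "M-CLL" if set_name == "mutated" else "U-CLL"


# Transitions between sets reported in the text: (before, after) -> count.
reported_moves = {
    ("mutated", "borderline mutated"): 2,
    ("borderline mutated", "minimally mutated"): 11,
    ("borderline mutated", "truly unmutated"): 4,
    ("minimally mutated", "truly unmutated"): 33,
}

total_moved = sum(reported_moves.values())
crossed_cutoff = sum(
    count
    for (old, new), count in reported_moves.items()
    if cll_category(old) != cll_category(new)
)
print(total_moved, crossed_cutoff)
```

Under these definitions, only a move out of the mutated set changes the M-CLL/U-CLL category; the other transitions stay within U-CLL.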

The advent of HTS technologies has led to an explosion of information at a scale unprecedented in cancer research. Within the field of immunogenetics, recent updates of the IMGT/V-QUEST reference directories, brought about by haplotype HTS studies, have introduced novel duplicated IGHV genes (CNV) and new alleles. These developments have implications for immunology at large, since the reliable identification of SHM is fundamental for the understanding of immune processes occurring in both healthy and pathological states, where even a single change can make a substantial difference (Barbas et al. 1995; Murray et al. 2008). Therefore, discriminating true SHM from unrecognized polymorphisms is essential for biological inference based on immunogenetics and, as shown here, this requires up-to-date approaches and reference directories (Lefranc 2014). Given the biological and prognostic significance of SHM, our findings indicate that researchers and physicians need to be alerted to these developments and should consider re-evaluating sequence data, especially for IGHV-IGHD-IGHJ gene rearrangements that until now were viewed as borderline mutated (GI close to 98 %), where caution is warranted.