Introduction

Charcot-Marie-Tooth disease (CMT) represents a group of degenerative disorders of the peripheral nervous system, characterized by progressive weakness and atrophy of distal limb muscles and sensory abnormalities. CMT is the most common type of inherited peripheral neuropathy, affecting 1 in 2,500 individuals worldwide [1]. Autosomal recessive CMT (ARCMT) is considered to be rare in the European CMT population, accounting for less than 10 % of patients. This frequency is most likely underestimated due to the small size of sibships, with many ARCMT patients going unrecognized and being considered sporadic cases. ARCMT, like other rare recessive disorders, is more frequent (30–50 % of CMT patients) in populations with a high prevalence of consanguineous marriages (e.g., in the Mediterranean basin or in the Middle East) [2].

ARCMT can be divided into demyelinating (CMT4), axonal (ARCMT2), or intermediate types based on clinical, electrophysiological, and neuropathological criteria. The clinical presentation is similar to that of other types of CMT; however, the disease course is more severe. The first symptoms appear early in life (they can even be congenital or begin in early infancy) and result in delayed developmental milestones. Progression of muscle atrophy and weakness often leads to loss of ambulation. The widespread involvement of peripheral nerves is frequently associated with vocal fold paralysis, sensorineural deafness, and bulbar, facial, and diaphragmatic weakness, and can lead to early death in the most severely affected patients. Accompanying phenotypic features, like scoliosis, glaucoma, myelin outfoldings, or neuromyotonia, can guide the correct genetic diagnosis of some ARCMT subtypes [3–6].

To date, 27 genes have been causally associated with ARCMT; however, they explain only a small proportion of cases. The known genes have been identified mainly by linkage analyses and candidate gene screenings in large pedigrees or in specific ethnic minorities (e.g., Gypsies), while smaller families have often remained undiagnosed. Furthermore, no single gene or mutation accounts for the majority of patients, and most of them carry private defects in rare genes. This extensive non-allelic heterogeneity challenges the molecular genetic diagnosis and follow-up of ARCMT patients and hampers the development of effective treatment strategies.

In the current study, we analyzed a cohort of 174 nuclear families or isolated patients diagnosed with autosomal recessive Charcot-Marie-Tooth disease. Combining gene-based screening with homozygosity mapping and mutation analysis of positional CMT candidates, we were able to provide a molecular diagnosis to 41.3 % of the families. The gene screening was guided by genetic rather than clinical criteria, and we were able to expand the phenotypic spectrum of mutations in the SH3TC2 gene. In addition, homozygosity mapping allowed us to estimate the actual degree of inbreeding and the corresponding size and number of homozygous regions in each patient or family, which facilitated mutation identification. Our results present an overview of the ARCMT genetic architecture and provide guidelines for future gene screenings in ARCMT patient cohorts.

Materials and methods

Standard protocol approvals, registrations, and patient consents

All patients or their legal representatives signed an informed consent form prior to enrolment. This study was approved by the local institutional review boards.

Patient cohort

In this study, we included 174 index patients with peripheral neuropathies who either showed autosomal recessive inheritance or were sporadic patients descended from consanguineous marriages. Diagnosis of CMT in the probands was established by an experienced physician and was based on neurological examination and electrophysiological evaluation.

In 87 selected families, we performed SNP genotyping and subsequent candidate gene sequencing. The patients in the genotyped cohort were diagnosed with CMT1 (38), CMT2 (25), CMT-intermediate (2), hereditary motor neuropathy (2), and hereditary neuropathy of unspecified type (20). Families included in this part of the study were of Turkish (83 %), Arab (5.7 %), Roma (4.5 %), or other (6.8 %) origin. We had data on parental consanguinity for 34 pedigrees, in which the parents were double first cousins (2), first cousins (23), first cousins once removed (6), second cousins (2), or second cousins once removed (1). Another 23 families had an unknown degree of relatedness; for 12, no information on parental consanguinity was available, and for 18, no consanguinity was reported.

SNP genotyping and homozygosity mapping

Whole genome SNP genotyping of 155 individuals from 87 ARCMT families was performed with the Illumina Human660W-Quad (42 families) or OmniExpress (45 families) platforms. Data analysis was based on the human genome references hg18 and hg19, respectively. We used the GenomeStudio software to calculate individual call rates (CR) and prepare files for bioinformatics analysis. Our quality filtering excluded samples with CR < 98 % from further analysis. Subsequent filtering steps were conducted with the PLINK software [7]. We excluded SNPs with a low genotyping rate (uncalled in more than 10 % of individuals) and with low heterozygosity (minor allele frequency <0.05). Runs of homozygosity (ROH) were detected in PLINK by scanning the genome with a window of 50 consecutive SNPs, of which 94 % had to be homozygous, allowing three heterozygous SNPs. Overlap between sliding windows was 5 %. Acceptable missingness was three uncalled SNPs within the window. The allowed gap between SNPs was 1,000 kb. Subsequently, we selected only regions ≥1 Mb in size that contained a minimum of 100 SNPs. In the intra-familial analysis, overlapping homozygous segments were compared pairwise and were considered to be shared between individuals if allelic matching was declared for 95 % of all jointly homozygous and non-missing SNPs. Individual and shared homozygous regions were visualized in Microsoft Excel using in-house developed Perl scripts. We developed a Microsoft Excel-based script to automatically check the selected homozygous regions for the presence of known CMT genes and of genes involved in autosomal recessive forms of ataxia or hereditary spastic paraplegia associated with peripheral neuropathy symptoms.
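The region-selection thresholds described above (segments ≥1 Mb in size, at least 100 SNPs, and a maximum gap of 1,000 kb between adjacent SNPs) can be illustrated with a minimal Python sketch. It is not PLINK's sliding-window algorithm: it is a simplified scan for consecutive homozygous calls, and the function name `find_roh`, the genotype encoding, and the default arguments are assumptions made for illustration only.

```python
from typing import List, Tuple


def find_roh(positions: List[int], genotypes: List[int],
             min_bp: int = 1_000_000, min_snps: int = 100,
             max_gap_bp: int = 1_000_000) -> List[Tuple[int, int, int]]:
    """Return (start_bp, end_bp, n_snps) for runs of homozygous SNPs.

    genotypes: 0 = homozygous, 1 = heterozygous (hypothetical encoding).
    Simplification: any heterozygous call or an over-long gap ends a run,
    whereas the PLINK settings in the text tolerate a few heterozygous
    or missing calls per window.
    """
    runs: List[Tuple[int, int, int]] = []
    start = None  # index of the first SNP in the current run
    prev = None   # index of the last SNP in the current run

    def close(first: int, last: int) -> None:
        # Keep the run only if it satisfies both the SNP-count and size filters.
        n_snps = last - first + 1
        size_bp = positions[last] - positions[first]
        if n_snps >= min_snps and size_bp >= min_bp:
            runs.append((positions[first], positions[last], n_snps))

    for i, (pos, g) in enumerate(zip(positions, genotypes)):
        if g != 0:
            # A heterozygous call terminates the current run.
            if start is not None:
                close(start, prev)
            start = prev = None
            continue
        if start is not None and pos - positions[prev] > max_gap_bp:
            # Too large a gap between adjacent SNPs also terminates the run.
            close(start, prev)
            start = None
        if start is None:
            start = i
        prev = i

    if start is not None:
        close(start, prev)
    return runs
```

With SNPs spaced every 10 kb, a single heterozygous call splits an otherwise homozygous stretch into two candidate regions, each of which is kept only if it still passes both filters.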

Mutation analysis

Total genomic DNA was isolated from peripheral blood samples and used as a template in the polymerase chain reactions (PCR). All coding exons and exon–intron boundaries of HINT1, GDAP1, SH3TC2, MTMR2, PRX, FGD4, MFN2, SBF2, NEFL, LMNA, NDRG1, HK1, and LRSAM1 were amplified using primer oligonucleotides designed with Primer3 (primer sequences and PCR conditions are available upon request). Subsequently, PCR products were purified with the exonuclease I-shrimp alkaline phosphatase enzymes (USB, Cleveland, USA) and sequenced in both directions using the BigDye® Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems, Foster City, USA). Electrophoretic separation of fragments was performed on an ABI3730xl DNA Analyzer (Applied Biosystems, Foster City, USA). Sequences were analyzed with the SeqMan™ II program (DNASTAR Inc., Madison, USA). Mutations were described according to the latest conventions of the HGVS nomenclature (http://www.hgvs.org/mutnomen) with nucleotide numbering based on the protein and mRNA sequences published online (www.ncbi.nlm.nih.gov/): HINT1 (NM_005340.5, NP_005331.1), GDAP1 (NM_018972, NP_061845), SH3TC2 (NM_024577.3, NP_078853.2), MTMR2 (NM_016156.5, NP_057240.3), PRX (NM_181882, NP_870998.2), FGD4 (NM_139241.2, NP_640334.2), MFN2 (NM_014874.3, NP_055689.1), SBF2 (NM_030962.3, NP_112224.1), NDRG1 (NM_001135242.1, NP_001128714.1), and HK1 (NM_033498.2, NP_277033.1). All sequence variants were confirmed by an independent PCR and resequencing of the original or newly obtained DNA samples. Segregation analysis of the mutations with the disease phenotype was performed in all available family members. For the newly identified mutations, 100 ethnically matched (Turkish, Belgian, Bulgarian) control individuals were screened.

Haplotype analysis

Haplotype sharing between families with the common GDAP1 deletion was ascertained with four highly informative short tandem repeat (STR) markers surrounding the gene (D8S286, D8S551, D8S1144, D8S548). STRs were first PCR-amplified with fluorescently labeled primer pairs (sequences are available at www.ncbi.nlm.nih.gov/), and fragments were subsequently combined with a formamide and GeneScan™ 500 Liz® Size Standard (Applied Biosystems, Foster City, USA) mixture (ratio 1:30) and size-separated on an ABI3730xl DNA Analyzer. Genotyping results were analyzed with Local Genotype Viewer, an in-house developed software program (http://www.vibgeneticservicefacility.be/).

Multiplex amplicon quantification assay (MAQ)

The presence of a partial 3′-end deletion in GDAP1 was investigated by the MAQ assay (http://www.multiplicon.com/). In this assay, we performed multiplex PCR of ten fluorescently labeled amplicons targeting the genomic region of GDAP1 and six reference amplicons located at randomly selected genomic positions outside the GDAP1 region and other known copy number variations (CNVs). PCR fragments were mixed with a formamide and GeneScan™ 500 Liz® Size Standard (Applied Biosystems, Foster City, USA) solution (ratio 1:30) and size-separated on an ABI3730xl DNA Analyzer. The ratio of peak areas between the target and the reference amplicons was calculated. Comparison of the normalized peak area values between patients and control individuals allowed determination of a dosage quotient (DQ) for each target amplicon, calculated by the MAQ-S package (http://www.multiplicon.com/). DQ values below 0.75 were considered indicative of amplicon deletion.
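The dosage quotient logic can be sketched in a few lines of Python. This is an illustration only and not the MAQ-S implementation: the function names, the use of the mean reference peak area for normalization, and any peak areas passed in are assumptions; only the 0.75 deletion threshold is taken from the text.

```python
from statistics import mean
from typing import Sequence

DELETION_THRESHOLD = 0.75  # DQ below this value is taken as indicative of a deletion


def normalized_area(target_area: float, reference_areas: Sequence[float]) -> float:
    """Target peak area divided by the mean reference peak area (assumed normalization)."""
    return target_area / mean(reference_areas)


def dosage_quotient(patient_target: float, patient_refs: Sequence[float],
                    control_target: float, control_refs: Sequence[float]) -> float:
    """Ratio of the patient's normalized peak area to the control's normalized peak area."""
    return (normalized_area(patient_target, patient_refs)
            / normalized_area(control_target, control_refs))


def is_deleted(dq: float) -> bool:
    """Apply the deletion threshold to a dosage quotient."""
    return dq < DELETION_THRESHOLD
```

Under this scheme, an amplicon present in two copies gives a DQ near 1, a heterozygous deletion is expected to give a DQ near 0.5, and a homozygous deletion a DQ near 0.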

Results

Mutation analysis of the most common ARCMT genes

Initially, we performed systematic analysis of 156 ARCMT families with unknown molecular defects (Fig. 1). We screened this cohort for the most frequently mutated ARCMT genes known to date, independent of the clinical phenotype of the patients. Mutation analysis of the coding regions and exon–intron boundaries of GDAP1 and HINT1 and the known hotspot mutations (R954* and R1190*) in SH3TC2 identified 15, 17, and 9 molecular defects, respectively. Some of the mutations were described previously [5, 8–12], and the novel or known pathogenic defects in non-reported pedigrees are presented in Table 1.

Fig. 1 Project strategy scheme

Table 1 Mutations identified by the gene-based screening of HINT1, GDAP1, and SH3TC2 and after homozygosity mapping

Additionally, 12 patients with known Roma ethnic background were screened for the private founder mutations in NDRG1 (R148*) and HK1 (AltT2 G>C + G>A intron). We identified ten individuals homozygous for the NDRG1 mutation and two for the HK1 alterations.

Collectively, this analysis provided molecular diagnosis to 53 out of the 156 families (33.9 %).

Homozygosity mapping analysis

As the remaining known ARCMT genes have very low frequency (<2 %) or are described in single families, we opted for an approach that would allow identification of the underlying defects in the rest of the patients while respecting the genetic heterogeneity of CMT. To this end, we selected 69 ARCMT pedigrees from the initial cohort, based on the available genealogical information, reported consanguinity, and ethnic or geographic clustering. An additional 18 consanguineous families were included only at this stage of the study and therefore were not pre-screened for any gene (Fig. 1).

We performed genotyping with high-density SNP arrays of 155 individuals from these 87 families, followed by homozygosity mapping. We determined the median number of homozygous segments ≥1 Mb in size per affected individual. This number progressively decreased from 61 for patients whose parents were double first cousins, to 48 for first cousins, 35 for first cousins once removed, and to 40 for second cousins. The average sizes of the homozygous SNP blocks were 7.1, 5.4, 4.7, and 2.6 Mb, respectively (Supplementary Table 1).

We also correlated the proportion of homozygous autosomal genome with the degree of consanguinity. In descendants from first cousin marriages, the observed median genome homozygosity was 8.2 %, while the theoretical predictions suggest 6 % (1/16). Similarly, the calculated median values were higher by at least 2–3 % than the expected ones for all other consanguinity groups (Supplementary Table 1). Interestingly, patients with ARCMT and no documented history of consanguinity showed increased homozygosity of their genome that was often higher than the homozygosity observed in descendants from second cousin marriages (Supplementary Fig. 1).
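The theoretical expectation used in this comparison is the inbreeding coefficient of the offspring, which is 1/16 for a first cousin marriage. The following Python sketch tabulates the standard textbook values for the consanguinity classes in this cohort and computes the excess of an observed value over the expectation. The dictionary and function names are hypothetical, and the sketch is not part of the original analysis.

```python
from fractions import Fraction

# Expected autozygous fraction of the autosomal genome in the offspring
# (standard inbreeding coefficients for each parental relationship).
EXPECTED_F = {
    "double first cousins": Fraction(1, 8),
    "first cousins": Fraction(1, 16),
    "first cousins once removed": Fraction(1, 32),
    "second cousins": Fraction(1, 64),
    "second cousins once removed": Fraction(1, 128),
}


def expected_percent(relationship: str) -> float:
    """Expected genome homozygosity by descent, in percent."""
    return float(EXPECTED_F[relationship] * 100)


def excess_homozygosity(observed_percent: float, relationship: str) -> float:
    """Observed minus expected homozygosity, in percentage points."""
    return observed_percent - expected_percent(relationship)
```

For the first cousin group, the expectation is 6.25 %, so the observed median of 8.2 % corresponds to an excess of about two percentage points.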

To reduce the number and the size of homozygous regions containing putative ARCMT-causing mutations, we included the genotypes of additional sibs, available in 36 % of the families. Combined analysis of two affected individuals decreased the number and size of homozygous blocks by ∼60 %. Inclusion of unaffected sibs also decreased the percentage of homozygous genome (and number of homozygous blocks) by 12 (33) %, 48 (50) %, or 73 (68) %, respectively for one, two, or three unaffected individuals.

Furthermore, we could not identify large autozygous regions shared by the two affected individuals in the Belgian family cmt68, which suggested the presence of compound heterozygous mutations. Subsequent linkage analysis combined with whole genome sequencing revealed two compound heterozygous mutations in HINT1, a novel gene at that time [6].

Gene and mutation contribution to ARCMT

We used the homozygosity data, rather than clinical features of the patients, to Sanger sequence any known ARCMT gene located within a large (>1 Mb) homozygous region. With this approach, we successfully identified mutations in 19 out of 87 families (22.9 %) (Table 1). We found mutations in SH3TC2 (four), HINT1 (two), GDAP1 (two), FGD4 (two), MFN2 (two), PRX (two), SBF2 (two), MTMR2 (two), and NDRG1 (one). Although NEFL, LMNA, and LRSAM1 were located in large homozygous regions in several families, we could not find mutations in the coding regions of these genes.
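The prioritization step, in which known genes are looked up inside large homozygous regions, amounts to an interval-overlap query. A minimal Python sketch is given below; it is not the Excel-based script used in the study. The function name `genes_in_roh`, the tuple layout, and the placeholder gene names are assumptions, and real gene coordinates would have to come from an annotation matching the genome build of the genotype data (hg18 or hg19).

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start_bp, end_bp)


def genes_in_roh(roh_regions: List[Region], genes: Dict[str, Region],
                 min_size_bp: int = 1_000_000) -> List[Tuple[str, int]]:
    """Return (gene, region_size_bp) for genes overlapping a homozygous region
    larger than min_size_bp, sorted with the largest region first."""
    hits: List[Tuple[str, int]] = []
    for chrom, start, end in roh_regions:
        size = end - start
        if size <= min_size_bp:
            continue  # only regions larger than the threshold are considered
        for name, (g_chrom, g_start, g_end) in genes.items():
            # Two intervals overlap if each starts before the other ends.
            if g_chrom == chrom and g_start <= end and g_end >= start:
                hits.append((name, size))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Sorting by region size reflects only one possible ordering of candidates for sequencing; any gene inside a qualifying region remains in the list regardless of its rank.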

In total, our screening efforts resulted in the identification of 14 non-reported sequence variations in GDAP1 (five), SH3TC2 (three), SBF2 (two), MTMR2 (two), and FGD4 (two) (Table 1). Importantly, we found a partial homozygous deletion of GDAP1 in three presumably unrelated families originating from different parts of Turkey (cmt131, cmt239, and cmt1226). Haplotype analysis with flanking microsatellite markers revealed a shared chromosomal region surrounding GDAP1, suggesting a founder effect (Supplementary Fig. 2a). The deleted genomic segment encompasses intron 5, exon 6, and the 3′ untranslated region (3′ UTR) of the gene. Heterozygosity of this deletion in the parents was confirmed by the MAQ assay (Supplementary Fig. 2b).

For all variants, we confirmed segregation with the disease phenotype in available family members. Presence of the newly identified mutations in ethnically matched control individuals was also excluded.

The phenotypic characteristics of the patients with newly identified mutations are presented in Supplementary Table 2.

Our combined target- and homozygosity-based gene screening approach identified disease-causing mutations in 73 ARCMT families and provided molecular diagnosis to 41.3 % of the 174 ARCMT index patients analyzed (Fig. 2).

Fig. 2 Gene contribution to ARCMT in the studied cohort

Discussion

ARCMT is a disorder found in all ethnic groups; however, its prevalence varies significantly between populations and is rather low in Western societies when compared to autosomal dominant CMT forms. ARCMT patients are dispersed among clinical and research centers, and very few large patient collections exist worldwide. Accordingly, the majority of studies performed so far focused on single ARCMT genes, specific phenotypes, or populations. Here, we present the findings in 174 nuclear families and sporadic patients diagnosed with ARCMT. To our knowledge, this is the first systematic study providing a general overview of the genetic landscape of ARCMT in a large cohort of patients.

Considering the extensive clinical and genetic heterogeneity of ARCMT, we pursued a genetic approach to unravel the molecular defects in our patients. Initially, we examined them for GDAP1, SH3TC2, and the recently discovered HINT1, followed by homozygosity-based CMT gene screening. In the ARCMT field, homozygosity mapping and/or linkage analysis has proven successful in small diagnostic studies [13] and in the identification of novel genes [6, 14–16]. Therefore, we SNP-genotyped the affected patients of 87 families with high-density arrays, providing 550,000–700,000 equally spaced genetic markers per individual. We selected autozygous stretches ≥1 Mb in size and containing at least 100 consecutive homozygous SNPs. We used the number of SNPs as an inclusion criterion because non-random locations of the human genome with low recombination rates harbor regions of increased homozygosity-by-state that are characterized by a low number of polymorphic markers [17]. Using these parameters, the average size of a homozygous block per ARCMT individual whose parents are first cousins was 5.5 Mb and the average number of homozygous regions was 48. These numbers could be further reduced by including additional affected or non-affected family members, when available.

We applied homozygosity mapping as an unbiased prioritization tool for analysis of the numerous known ARCMT loci. We sequenced a total of 12 CMT genes located within homozygous regions larger than 1 Mb and found mutations in nine of them, i.e., SH3TC2, HINT1, GDAP1, FGD4, MFN2, PRX, SBF2, MTMR2, and NDRG1. Genes with identified mutations resided in the largest shared homozygous block in 35 % of families. In patients whose parents were first cousins, the average size of the disease-associated homozygous region was 19 Mb (n = 8, range 4.4–45 Mb). Notably, in family cmt112, in which we detected a high degree of parental relatedness (calculated autosomal genome homozygosity of ∼5 %), the disease-associated region was only 1.7 Mb in size. This finding has important practical consequences, particularly in light of the numerous reports recommending that the mutation search be focused primarily on genes residing in one of the largest homozygous regions [18]. One should consider the genetic map density and the criteria for homozygosity in order not to miss small homozygous stretches that could also contain a disease-causing mutation.

Homozygosity analysis facilitated the molecular diagnosis of 22 % of patients in the genotyped cohort. Our success rate is lower than that of a recently published study [13], in which homozygosity mapping provided an ARCMT diagnosis in 63 % of the cases. Fischer et al., however, investigated only 24 index patients, while our cohort is substantially larger. Also, the fact that 7/24 patients with known ARCMT mutations were retrospectively analyzed for homozygosity could inflate the detection rate. Despite the differences in sensitivity, the outcomes of both studies are complementary in showing that homozygosity mapping is an efficient tool for providing a molecular diagnosis in a highly heterogeneous genetic disease like ARCMT.

We identified 14 novel ARCMT mutations. New sequence variations were found in GDAP1 (five), SH3TC2 (three), SBF2 (two), FGD4 (two), and MTMR2 (two). A homozygous partial deletion in GDAP1 was identified in three Turkish families. STR marker analysis showed ancestral haplotype sharing between them, thus revealing a new GDAP1 founder mutation in the Turkish population. Importantly, since the deletion encompasses intron 5, exon 6, and the 3′ UTR, this small copy number variation can be easily missed if only parents are analyzed (e.g., the index patient is deceased) or if it occurs in trans with another heterozygous mutation. Thus, dosage analysis of exon 6 should be considered for ARCMT patients having only one heterozygous GDAP1 mutation.

The probands carrying the GDAP1 deletion showed an early onset phenotype and had walking difficulties requiring support. In cmt239, pectus carinatum was present, which could be considered a skeletal deformity due to CMT, along with hoarseness and hypophonia. Although the patients had the same mutation, electrophysiological differences were observed between them. While cmt1226 and cmt239 had severely reduced or unrecordable nerve conduction velocities (NCVs), cmt131 had considerably higher NCVs. Similar clinical heterogeneity was documented for other GDAP1 mutations, even among patients within the same pedigree [10]. The peripheral nerve biopsies were characterized by absence of large myelinated fibers and overt signs of demyelination and axonal degeneration. Overall, this partial GDAP1 deletion was characterized by mixed features, highlighting GDAP1 as a gene causing an intermediate CMT phenotype.

Interestingly, our unbiased genetic approach allowed us to establish important genotype–phenotype correlations. In family cmt1224, with the novel p.L780P SH3TC2 mutation, the proband had median NCV values ≥40 m/s. The conduction velocities of patients with SH3TC2 mutations reported so far are usually within the range 4–34 m/s (http://neuromuscular.wustl.edu/time/hmsn.html). In the available literature, the only two exceptions are a CMT patient with median motor NCVs of 42.0 m/s, amplitude of 1.9 mV, and unexcitable median sensory nerves [12] and a patient reported in [19] with median motor NCVs of 39.8 m/s and amplitude of 6.6 mV, and median sensory NCVs of 43.0 m/s and amplitude of 2.9 mV. Our findings extend the electrophysiological spectrum of SH3TC2 mutations and suggest that, along with patients with intermediate conduction velocities, patients with axonal electroneurographic findings should also be considered for testing of this gene.

In patient cmt1236 with the novel p.A639Pfs*6 SH3TC2 mutation, cerebellar dysfunction (tremor, bilateral horizontal nystagmus, titubation of the head with cerebellar dysmetria) was apparent. This type of central nervous system involvement has not been associated with SH3TC2 mutations so far. We cannot exclude, however, that an additional gene might cause the cerebellar features in the patient.

The clinical phenotype of the remaining patients with newly identified mutations was in agreement with the genotype–phenotype correlations reported in the literature.

Our extensive screening efforts could not identify genetic defects in 58 % of the patients. Overall, there were no major clinical differences between the CMT individuals with known and unknown genetic causes. We cannot exclude that mutations located in unknown regulatory regions or deep-intronic sequences were missed because we focused on protein-coding regions. Furthermore, mutations in genes associated with other neuromuscular disorders that mimic CMT clinically could cause the disease in part of the families. Nevertheless, our findings underscore the heterogeneous molecular etiology of ARCMT and imply that many disease-causing genes remain unidentified.

The findings in this large-scale study allow us to suggest some guidelines for molecular genetic analysis of patients with ARCMT. Three genes stand out as major players in our ARCMT cohort, namely GDAP1 (10.9 %), HINT1 (10.3 %), and SH3TC2 (7.5 %). Sanger or panel sequencing of these genes, which consist of only six, three, and two hotspot exons, respectively, would provide molecular diagnosis in ∼25 % of the cases. It is also meaningful to first exclude known population-specific founder mutations. For example, the overall contribution of NDRG1 is only 6.3 %; however, it reaches 17.9 % in the Gypsy population and should be tested in any patient with this ethnicity and demyelinating type of CMT [20]. The remaining known ARCMT genes each make a rather limited contribution to this disease (∼1 %). Therefore, after screening the three most common genes, and in view of constantly decreasing running costs, one could directly proceed with whole exome sequencing (WES) of the index patient. WES presents an interesting future alternative to CMT gene panels, provided that capture and coverage are improved. This way, the detection of known genetic causes and the search for variations in new candidate genes can be performed simultaneously.

Homozygosity mapping with SNPs extracted from whole exome sequencing data can be used as a prioritization tool, pointing to the regions encompassing known or novel ARCMT genes [21, 22]. A WES-based homozygosity approach, however, will only identify autozygous loci that are located within regions containing a sufficient number of informative SNPs. Therefore, an alternative could be SNP genotyping of at least one affected individual prior to whole exome sequencing. Our data suggest that in patients originating from populations with a prolonged history of consanguinity, the percentage of homozygous autosomal genome might be at least 2–3 % higher than the theoretical predictions [23]. Woods et al., studying patients with autosomal recessive disorders originating from Pakistani and Arab populations, reported similar findings [18]. In their study, mainly patients from first cousin marriages (n = 38) were analyzed, and the percentage of homozygous autosomal genome was on average 11 %, while in our cohort it was 8.6 % (n = 27). Interestingly, among the ARCMT patients with identified mutations, but no data or negative data for consanguinity, the percentage of homozygosity was also increased and could be as high as for descendants of a second cousin marriage. Therefore, in sporadic patients with an increased degree of homozygosity, recessive CMT inheritance should primarily be suspected. In contrast, the absence of large (>1 Mb) or numerous homozygous regions in an ARCMT family most likely indicates that compound heterozygous mutations underlie the phenotype.

In conclusion, our findings contribute to the knowledge on the molecular basis of ARCMT by providing an overview of the ARCMT genetic landscape, updating the ARCMT gene frequencies and broadening the mutation and clinical spectrum of known genes, like GDAP1, SH3TC2, MTMR2, SBF2, and FGD4. Furthermore, we propose guidelines for molecular investigation of ARCMT patients considering recent technological advances. Our findings have major implications for future molecular diagnostics and research in the field of peripheral neuropathies and other disorders with extensive genetic heterogeneity.