Introduction

Alkaptonuria (AKU) [MIM 203500] is a rare autosomal recessive disorder of both historical and medical interest. It represents a classic example of a discrete biochemical lesion resulting from a single gene deficiency that gives rise to a degenerative disease. AKU was one of the four diseases described by Sir Archibald Edward Garrod as being the result of the accumulation of intermediates due to metabolic deficiencies. He linked ochronosis with the accumulation of alkaptans in 1902 (Garrod 1902), and his views on its mode of inheritance were summarized in a 1908 Croonian lecture at the Royal College of Physicians (Garrod 1908). Some 50 years passed before the enzymatic defect was narrowed down to a deficiency of homogentisate 1,2-dioxygenase (HGD) [E.C.1.13.11.5] (La Du et al. 1958). The HGD gene was first cloned from Aspergillus nidulans (Fernández-Cañón and Peñalva 1995), and the genetic basis of human AKU was elucidated 1 year later, when the first HGD mutations were demonstrated (Fernández-Cañón et al. 1996). Since then many AKU-causing mutations have been described in patients, but our understanding of the pathogenesis of the disease in affected tissues and of genotype-phenotype correlations still requires further study.

Mapping and cloning of the human AKU gene

Several groups worked on mapping the HGD gene in various organisms starting in the early 1990s. Pollak et al. mapped the human AKU gene to chromosome 3q21–q23 using homozygosity mapping in two highly inbred AKU families, one of Turkish origin and the other from the United States (Pollak et al. 1993). At the same time, an alkaptonuric mouse was detected at the Pasteur Institute in Paris, and the gene locus was localized to murine chromosome 16 (Montagutelli et al. 1994). Based on this work, Janocha et al. in the same year mapped the human AKU gene to chromosome 3q2 in six pedigrees of Slovak origin using the synteny between murine and human chromosomes (Janocha et al. 1994).

The human HGD gene itself was identified and characterized by the groups of Miguel A. Peñalva and Santiago Rodríguez de Córdoba from Madrid (Fernández-Cañón et al. 1996; Fernández-Cañón and Peñalva 1995). These authors showed that it is a single-copy gene that spans a genomic sequence of 54,363 bp. A 1,715 bp long HGD transcript is split into 14 exons ranging from 35 to 360 bp and coding for the HGD protein, which is composed of 445 amino acids (Fernández-Cañón et al. 1996; Granadino et al. 1997). Approximately 26% of the gene sequence of the human HGD represents repetitive elements (Granadino et al. 1997). Expression of the HGD gene is restricted to the liver, kidney, small intestine, colon, and prostate (Fernández-Cañón et al. 1996).

The HGD gene mutations

The first clear proof that enzymatic loss in AKU is caused by mutations within HGD was provided by Fernández-Cañón et al. in 1996 by identification of the first two missense mutations in Spanish AKU families: P230S in exon 10 and V300G affecting exon 12 (Fernández-Cañón et al. 1996). Both mutations segregated with the disease in studied families, and for P230S, the authors also provided biochemical evidence that it is a loss of function mutation (Fernández-Cañón et al. 1996).

Mutation screening within the HGD gene has been performed in several countries, and until recently 96 mutations and 33 HGD polymorphisms had been encountered, including three variable dinucleotide repeats, HGO1–3 (Aquaron et al. 2009; Beltrán-Valero de Bernabé et al. 1998, 1999a, b; Felbor et al. 1999; Fernández-Cañón et al. 1996; Gehrig et al. 1997; Goicoechea De Jorge et al. 2002; Grasko et al. 2009; Higashino et al. 1998; Ladjouze-Rezig et al. 2006; Mannoni et al. 2004; Muller et al. 1999; Phornphutkul et al. 2002; Porfirio et al. 2000; Ramos et al. 1998; Rodríguez et al. 2000; Toth et al. 2010; Uyguner et al. 2003; Vilboux et al. 2009; Walter et al. 1999; Zatkova et al. 2000a, b; AKUdatabase: http://www.alkaptonuria.cib.csic.es). In addition, recently we described 11 novel HGD mutations discovered during the analysis of 13 index AKU patients from Slovak families and a further 15 index cases from different countries sent to our laboratory for mutation analysis (Zatkova et al. 2011). Another eight novel mutations were identified in 21 AKU patients from the United Kingdom and will be published soon, bringing the total number of known HGD mutations to 115 (Fig. 1a, Supplementary Table 1).

Fig. 1

a Distribution within the HGD gene of 115 AKU mutations reported so far in about 267 families, 30 single nucleotide polymorphisms (SNPs), and 3 simple sequence repeats (SSRs). Mutations G115fs* (c.413_434 + 35del57) and V157fs* (c.470-1_494del25) are caused by genomic deletions that are predicted to cause skipping of exons 6 and 8, respectively, thus leading to a frameshift. Missense changes E87A (exon 4) and N278D (exon 11) were found in the United States and Slovakia, respectively, during screening of healthy individuals. Since no pathogenic effect could be evaluated, they have been reported among SNPs (Vilboux et al. 2009; AKUdatabase). b Comparison of the proportions of HGD mutation types as identified worldwide and in Slovakia

We also created and continue to run a new online HGD mutation database (http://hgddatabase.cvtisr.sk/), in which data can be found related to AKU-causing mutations identified thus far as well as other variants of the HGD gene. Nomenclature has been corrected, and all AKU variants are now described according to the Human Genome Variation Society (HGVS) nomenclature additions (den Dunnen and Antonarakis 2000).

As can be seen in Fig. 1a, AKU-causing mutations are distributed throughout the entire HGD gene with a somewhat higher prevalence in exons 6, 8, 10, and 13. The proportions of the different types of HGD mutation are compared in Fig. 1b. Missense mutations are the most numerous with 77/115 (67.0%), followed by an equal number of small deletions and insertions causing frameshift and of splicing mutations, each with 14/115 (12.2%), and then nonsense mutations with 7/115 (6.1%). The frequencies of individual mutations and the number and origin of patients carrying these mutations are summarized with accompanying literature references in Supplementary Table 1. As can be seen in the table, the 23 most frequent mutations are present in 361 out of 496 (72.8%) AKU chromosomes observed worldwide.
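The proportions quoted in this paragraph can be recomputed directly from the counts. The following is only a minimal sketch: the counts are taken from the text, while the variable and function names are illustrative assumptions and not part of any published analysis.

```python
# Counts of HGD mutation types as given in the text (115 known mutations in total).
TOTAL_MUTATIONS = 115
counts = {
    "missense": 77,
    "frameshift (small deletions/insertions)": 14,
    "splicing": 14,
    "nonsense": 7,
}


def percentage(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Share of each mutation type among all known HGD mutations.
shares = {kind: percentage(n, TOTAL_MUTATIONS) for kind, n in counts.items()}

# Mutations that fall outside the four categories named in the paragraph.
unlisted = TOTAL_MUTATIONS - sum(counts.values())

# Share of AKU chromosomes carrying one of the 23 most frequent mutations.
top23_share = percentage(361, 496)

for kind, share in shares.items():
    print(f"{kind}: {share}%")
print(f"not in the four named categories: {unlisted}")
print(f"23 most frequent mutations: {top23_share}% of chromosomes")
```

Note that the four categories named in the paragraph add up to 112 of the 115 mutations, so the remaining three presumably belong to mutation types not itemized here.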

As can be seen in Supplementary Table 1, there are about 30 AKU chromosomes reported in which no HGD mutation was identified (Aquaron et al. 2009; Beltrán-Valero de Bernabé et al. 1998; Ladjouze-Rezig et al. 2006; Mannoni et al. 2004; Muller et al. 1999; Phornphutkul et al. 2002; Vilboux et al. 2009). These chromosomes might carry deep intronic mutations affecting splicing that could not have been identified when only exons with short neighboring intronic parts were analyzed in the patients. Alternatively, they might carry mutations in the promoter region or in other cis-regulatory sequences that are likewise not captured by classic mutation detection methods.

HGD gene mutation hot spots

By re-examination of the mutations and polymorphisms that had been reported in HGD up to 1999, Beltrán-Valero de Bernabé et al. showed that the “CCC” sequence motif and its inverted complement, “GGG,” are preferentially mutated (Beltrán-Valero de Bernabé et al. 1999a). Subsequently, nucleotide c.342 + 1 G was also described as a mutational hot spot in HGD, since it was found mutated into A on two different Slovak haplotypes, and also into T in a patient from the Netherlands (Zatkova et al. 2000a). Therefore, this nucleotide and “CCC” triplets together with CpG dinucleotides are considered to be mutational hot spots in the HGD gene.
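As an illustration of how such hot-spot motifs could be located in a coding sequence, the sketch below scans a DNA string for "CCC", "GGG", and CpG ("CG") occurrences. It is a hypothetical helper written for this text, not the method used in the cited studies, and the example sequence is made up rather than taken from HGD.

```python
import re

# Motifs described in the text as preferentially mutated, plus the CpG dinucleotide.
HOTSPOT_MOTIFS = ("CCC", "GGG", "CG")


def find_motifs(sequence, motifs=HOTSPOT_MOTIFS):
    """Return the 1-based start positions of every occurrence of each motif.

    Overlapping occurrences are reported, so "CCCC" yields two "CCC" hits.
    """
    seq = sequence.upper()
    hits = {}
    for motif in motifs:
        # A zero-width lookahead lets the regex engine report overlapping matches.
        pattern = re.compile(f"(?={re.escape(motif)})")
        hits[motif] = [m.start() + 1 for m in pattern.finditer(seq)]
    return hits


# Made-up toy sequence for demonstration only (NOT an HGD sequence).
toy_sequence = "ATGCCCCGTGGGACGT"
print(find_motifs(toy_sequence))
```

Positions are reported on the given strand only; because "GGG" is the complement of "CCC" on the opposite strand, scanning for both motifs covers hits on either strand.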

Crystal structure of the human HGD enzyme and impact of the mutations

The establishment of the crystal structure of the human HGD enzyme provided a framework for understanding the pathogenic effect of the AKU mutations (Titus et al. 2000). The active form of HGD is organized as a hexameric protein, dimer of trimers: two disc-like trimers stacked base-to-base around two-fold axes to form hexamers. The active site is formed by the C-terminal domain and the trimer interface, and it contains His292, His335, His365, His371, and Glu341. HGA binds in the active site to Glu341, His335, and His371 via the Fe2+ atom.

Many noncovalent bonds between amino acid residues (hydrogen, salt, and hydrophobic bonds) are required to maintain the spatial structure of the monomer, of the trimer, and ultimately of the hexamer. Thus, intersubunit interactions are important for the activity of the HGD enzyme, and the complex structure of the functional HGD protein can be easily disrupted by mutations.

The impact of several mutations on the HGD enzyme’s function was studied by Rodríguez et al., who expressed His-tagged mutant HGD proteins in E. coli and projected the mutations onto the HGD molecular structure (Rodríguez et al. 2000). All tested AKU substitutions resulted in significant loss of activity, and some of the mutations also caused reduced solubility. Although most of the mutant enzymes showed a complete lack or very low levels of enzyme activity, five mutations led to specific activity that was 22–37% of the wild type (E42A, Y62C, A122D, D153G, and M368V). The catalytic efficiency of these five mutant recombinant enzymes was reduced to 7–25% of the wild type. For example, in M368V it was 14% of the wild type (Rodríguez et al. 2000).

Mutations identified in HGD have been grouped according to their predicted structural consequences (Rodríguez et al. 2000): (1) mutations that directly affect active site residues, disrupting cofactor binding (H371R, H292R); (2) mutations that indirectly affect the active site by disturbing its conformation (R330S, W97G); (3) mutations that interfere with the folding of the HGD subunit by disrupting the intra-subunit hydrophobic structure (W97G, V300G, V181F, F227S, P230S, P230T), disrupting electrostatic interactions (D153G, K248R), removing a polar amino acid from an important position (S189I), or introducing unfavorable steric contacts (G161R, G270R, G360R); (4) mutations that affect intersubunit interactions, either within the trimer (E42A, E168K, L25P, W60G, Y62C, A122D, M368V) or between trimers (R225H, I216T, R53W, W322R, D294N); and (5) C-terminal truncating mutations, which abolish an intersubunit interaction and cause general structural defects, since the C-terminal residues between Leu430 and Asn440 protrude away from the HGD subunit to contact the adjacent three-fold related subunit (Rodríguez et al. 2000).

From more recent works, Grasko et al. reported that the K57N missense mutation most likely exerts its effect by interfering with substrate traffic at the active site (Grasko et al. 2009).

DNA diagnostics and validation of novel AKU variants

Identification of both AKU-causing mutations provides a final confirmation of AKU diagnosis. During DNA diagnostics, when a novel variant is found and no functional studies are available, especially for missense changes, several programs are used to predict the variant’s possible pathogenic consequences for the structure and function of the protein. Recently, Vilboux et al. (2009) assessed the potential effect of all missense variations on HGD protein function, using five bioinformatic tools specifically designed for interpretation of missense variants (SIFT, POLYPHEN, PANTHER, PMUT, and SNAP). In general, the prediction tools are estimated to be between 50 and 80% accurate (Bromberg and Rost 2007; Ng and Henikoff 2006). Since the HGD crystal structure is already known (Titus et al. 2000), POLYPHEN and SNAP, which use information on the 3D structure of the protein, may be more reliable, as was also shown in the study of Vilboux et al. (2009). However, even these tools have their limitations for predicting the entire intricate pattern of the intra- and inter-subunit interactions that the HGD oligomeric enzyme requires for its activity and that can be inactivated at multiple levels by single residue substitutions (Rodríguez et al. 2000). In addition, the conformation of the active site, and consequently HGD’s function, is highly dependent on the HGD quaternary structure.

For example, in the absence of hexamer association, the R225, I216, R53, and W322 side chains would be solvent-accessible and predictably tolerant to mutations (Rodríguez et al. 2000), but it is known that they are disease-causing mutations. It is possible that some amino acid substitutions, which would be benign if HGD functioned as a monomer, show deleterious effects due to disturbances to the higher organization of the functional hexamer.

Vilboux et al. (2009) also analyzed the potential effect of splice-site variants using the two different tools BDGP and NetGene2. Another tool for predicting the effect of splicing mutations is the Human Splicing Finder, which, in addition to sequence analysis, also enables a quick mutation check (Desmet et al. 2009).

In order to identify all possible pathogenic changes, during the molecular diagnostics of the disease it is generally recommended to sequence the entire coding region, i.e., all exons of the studied gene, usually also including the neighboring intronic sequences. In order to confirm the pathogenic effect of a novel missense variant, it is recommended to test whether the variant segregates in the family and associates with the disease. It is also helpful to show that the amino acid position affected by the change is conserved among species.

Genotype-phenotype relationship

As recently reported, no apparent correlation exists between a patient’s genotype and the level of excreted homogentisic acid (Vilboux et al. 2009). Moreover, the excretion of homogentisic acid in the urine differed between siblings, and this most likely also depends on diet. For a patient to display AKU symptoms, a loss of more than 99% of the enzymatic activity is required; thus, the absence of a clear correlation between genotype and phenotype may be explained by the variability in residual HGD enzymatic activities (Vilboux et al. 2009).

Another reason why no extensive genotype-phenotype studies have been performed on AKU is that no clinical details were reported for the patients in whom mutations were identified. There were a few attempts to assess this correlation for patients from Algeria and France (Aquaron et al. 2009; Ladjouze-Rezig et al. 2006). The homozygous state was known in two Algerian patients: one with a mild phenotype (S181I), perhaps owing to the 3% of residual activity and/or the patient’s young age, and the second, a subject deceased at the age of 60 from renal insufficiency, with a severe phenotype (IVS1–1 G > A) (Ladjouze-Rezig et al. 2006). Another patient from La Reunion Island who carried a homozygous mutation V300G (Aquaron et al. 2009) showed a quite severe phenotype, which agrees with the finding that this mutated protein shows only 1.9% of the specific activity of the wild type (Rodríguez et al. 2000). The residual activity values were obtained by measuring purified His-tagged mutant HGD proteins expressed in a bacterial system (Rodríguez et al. 2000).

There is another interesting paradox regarding the HGD enzyme. There are several AKU mutations that, in functional assays, showed catalytic activities in the range of 7–25% of the wild type (Rodríguez et al. 2000). One is the prevalent mutation M368V, found in homozygosity and also in compound heterozygosity; the others are rare mutations, each detected in just one family: A122D (in a patient in combination with M368V), D153G (together with an undetected second mutation), and E42A and Y62C (both found in homozygosity) (Rodríguez et al. 2000). It is important to note that the disease phenotype is caused by these apparently partial loss-of-function mutations. On the other hand, the heterozygous carriers of AKU are healthy.

Due to the complex structure of the HGD enzyme, genotype-phenotype correlation studies are complicated when patients with two different mutations are to be considered. In order to fully understand all the factors influencing the functionality of the enzyme, it is important to study the function and efficiency of hybrid HGD hexamers. It may also be necessary to measure the stability of the transcripts carrying different mutations in order to understand whether they actually contribute to the formation of HGD proteins.

Population genetics of AKU

AKU has a very low prevalence of 1:100,000–250,000 in most ethnic groups. So far, 626 AKU patients have been reported in about 40 countries worldwide (Ranganath et al. 2011). Interestingly, countries such as Slovakia and the Dominican Republic exhibit an increased incidence of this disorder of up to 1:19,000 (Milch 1960; Srsen and Varga 1978).
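To give a feeling for what these figures imply for the frequency of healthy carriers, the sketch below applies the Hardy-Weinberg relation to an autosomal recessive disorder. This is an illustrative calculation added here, not one reported in the cited sources. It assumes random mating and treats the quoted figures as the frequency of affected homozygotes (q²), an assumption that consanguinity or genetic isolates would violate; the function name is hypothetical.

```python
import math


def carrier_frequency(incidence: float) -> float:
    """Expected heterozygote frequency 2pq under Hardy-Weinberg equilibrium.

    `incidence` is the assumed frequency of affected homozygotes (q**2).
    """
    q = math.sqrt(incidence)  # frequency of the disease allele
    p = 1.0 - q               # frequency of the normal allele
    return 2.0 * p * q


# Figures quoted in the text, interpreted as homozygote frequencies (assumption).
for label, incidence in [
    ("1:250,000", 1 / 250_000),
    ("1:100,000", 1 / 100_000),
    ("1:19,000", 1 / 19_000),
]:
    freq = carrier_frequency(incidence)
    print(f"incidence {label}: carriers ~{100 * freq:.2f}% (about 1 in {1 / freq:.0f})")
```

Under these assumptions, an incidence of 1:250,000 corresponds to roughly 1 carrier in 250 people, whereas 1:19,000 corresponds to roughly 1 in 69. In inbred or isolated populations the true carrier frequency may be lower than this estimate, because homozygotes arise more often than random mating would predict.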

A map showing all countries and the number of the AKU patients in whom HGD mutations have been identified is shown in Fig. 2. AKU-causing mutations found in these countries and the geographical distribution of all individual HGD mutations in the world can be found in Fig. 3.

Fig. 2

Map showing the number of AKU patients in whom HGD mutations have been identified thus far, by country

Fig. 3

Summary of HGD mutations found in all countries where genetic testing has been performed in AKU patients. Selected mutations are identified by color in order to highlight their distribution. Numbers in parentheses indicate the number of analyzed patients

In order to trace the origin of the identified AKU mutations, the allelic associations of HGD intragenic polymorphisms are usually employed. Where possible, HGD haplotypes are constructed based on the segregation in the family of some known single nucleotide polymorphisms [IVS3-112 C/T (c.176-112 C/T), H80Q (c.240 T/A), IVS4 + 31A/G (c.282 + 31A/G), IVS5 + 25 T/C (c.342 + 25 T/C), IVS6 + 46 C/A (c.434 + 46 C/A), IVS11 + 18A/G (c.879 + 18A/G)] and three dinucleotide repeats (HGO-3/D3S4556, HGO-1/D3S4496, HGO-2/D3S4497). Allelic associations (haplotypes) of all HGD mutations analyzed thus far can be downloaded from the novel HGD gene mutation database (http://hgddatabase.cvtisr.sk/).

Using this type of analysis, Beltrán-Valero de Bernabé et al. (1998) showed that patients from different countries who shared the same mutation (M368V, V300G, or P230S) also shared the same haplotype. Where the haplotypes did differ among these patients, they differed only in regions distal to the mutation position, so the differences could be explained by recombination events (see the table “HGD haplotypes associated with the AKU mutations” in the HGD mutation database). The authors therefore concluded that these were most likely old mutations introduced to Europe with the founder populations, which then spread throughout western Europe along with the different migrations (Beltrán-Valero de Bernabé et al. 1998).

The missense change M368V is the most frequent mutation found mainly in Europe and present in 59 out of 526 AKU chromosomes (11.2%) (Supplementary Table 1). It has so far been reported in Finland, France/Armenia, Germany, Portugal, Slovakia, Spain, Switzerland/Belgium, the Netherlands, the United States, and the United Kingdom.

There are also several other mutations spread throughout the world, such as S59fs (R58fs), which has been identified in patients from Finland, La Reunion, Slovakia, India, Turkey, UAE, the United Kingdom, and the United States. One of the first identified AKU mutations, V300G, has been found in France, Germany, La Reunion, Portugal, Slovakia, Spain, the United States, and the United Kingdom, while P230S occurs in Spain, Turkey, Slovakia, the United States, and the Canary Islands.

On the other hand, there are mutations that are rather specific to some countries or regions, for example IVS5 + 1 G > A in Slovakia and the Czech Republic, H371fs (P370fs) in Slovakia, and C120W in the Dominican Republic (Fig. 3).

Haplotype analysis was used to study cases in countries with increased incidence of AKU. Two mutations were identified in 16 Dominican AKU chromosomes: C120W (14/16), which is the classic founder mutation, and G270R (2/16), which most likely represents a recurrent mutational event, since Dominican patients carrying this mutation showed a different haplotype from the Slovak and Italian ones (Goicoechea De Jorge et al. 2002).

A total of 12 different AKU mutations have been established in Slovakia in 104 AKU chromosomes from 50 families and are reported in the literature (Gehrig et al. 1997; Muller et al. 1999; Zatkova et al. 2000a, b, 2003, 2011). Thus, the increased incidence of AKU in this relatively small country is difficult to explain by a classic founder effect.

Recently, the initial results of screening family members with a history of AKU in the southern region of Jordan have been reported (Al-Sbou and Mwafi 2010). The authors presented nine cases of AKU in one Jordanian family, and within a short time, the number of all identified AKU cases in Jordan rose significantly. Mutation analysis of these patients is underway. It is therefore possible that the incidence of this disease in many countries is higher than previously thought.

AKU with other than autosomal recessive inheritance

Several families have previously been reported in which AKU appeared to be inherited dominantly (Khachadurian and Feisal 1958; La Du 1958; Milch 1955; Oexle et al. 2008). In some of these, this segregation was explained by extended consanguinity. However, in a family presented by Oexle et al. (2008), haplotype analysis did not support the likelihood that identity-by-descent at the HGD locus had resulted in pseudo-dominant alkaptonuria. These authors suggested that dominant mutations of a different gene, such as a hitherto unrecognized cofactor, may be responsible for AKU. Although there seems to be no simple HGA-HGD feedback control in eukaryotic organisms, AKU could also be caused by a disturbance in such a mechanism, for example by constitutive activation of negative control on HGD expression (Oexle et al. 2008).

Slovak AKU genetic specificities

In the rather small population of 5 million in Slovakia, 208 patients have been registered (Srsen et al. 2002) and a total of 12 different HGD mutations have been established, revealing a remarkable allele heterogeneity of AKU in this country.

An allelic association analysis was performed for 11 HGD intragenic polymorphisms in a total of 69 AKU chromosomes from 32 Slovak pedigrees. These were then compared to the HGD haplotypes of all AKU chromosomes carrying identical mutations characterized thus far in non-Slovak patients in order to study the possible origin of these mutations (details are available in Zatkova et al. 2011). Based on the analysis and comparison of haplotypes, two groups of HGD mutations were observed in Slovakia. The first group comprises the mutations P230S, V300G, S59fs (R58fs), M368V, and IVS1-1 G > A, which account for 17.3% of the Slovak AKU chromosomes and thus provide a marginal contribution to the AKU gene pool in this country. The most frequent European mutation, M368V, is present in one copy in only two unrelated Slovak families. Mutations of this group are shared by different populations and have most likely been introduced into Slovakia by the founder populations that spread throughout Europe (Zatkova et al. 2000a). The second group consists of the remaining seven mutations, which account for 82.7% of the Slovak AKU chromosomes. These include the most prevalent G161R (44.2%), D153fs (G152fs) (14.4%), H371fs (P370fs) (11.5%), and G270R (7.7%), as well as IVS5 + 1 G > A, present on three AKU chromosomes, and the S47L and E178G mutations, each observed in only one patient. It is likely that mutations from this second group originated in Slovakia.
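The shares quoted for the two groups can be cross-checked with simple arithmetic. The sketch below assumes that the percentages refer to the 104 Slovak AKU chromosomes mentioned earlier in the text and that S47L and E178G each occupy a single chromosome; the per-mutation chromosome counts are back-calculated from the rounded percentages for illustration and are not taken from the source.

```python
# Assumption: the percentages refer to the 104 Slovak AKU chromosomes cited in the text.
TOTAL_CHROMOSOMES = 104

# Percentages quoted for the four most prevalent second-group mutations.
reported_pct = {
    "G161R": 44.2,
    "D153fs (G152fs)": 14.4,
    "H371fs (P370fs)": 11.5,
    "G270R": 7.7,
}

# Second-group mutations for which the text gives counts rather than percentages
# (one chromosome each for S47L and E178G is an assumption).
reported_counts = {"IVS5+1G>A": 3, "S47L": 1, "E178G": 1}

# Back-calculate whole-chromosome counts from the rounded percentages.
inferred = {
    name: round(pct * TOTAL_CHROMOSOMES / 100) for name, pct in reported_pct.items()
}

second_group_total = sum(inferred.values()) + sum(reported_counts.values())
first_group_total = TOTAL_CHROMOSOMES - second_group_total

print(inferred)
print(f"second group: {second_group_total} chromosomes "
      f"({100 * second_group_total / TOTAL_CHROMOSOMES:.1f}%)")
print(f"first group: {first_group_total} chromosomes "
      f"({100 * first_group_total / TOTAL_CHROMOSOMES:.1f}%)")
```

Under these assumptions the back-calculated counts reproduce the 82.7% and 17.3% shares given in the text, which suggests that the quoted figures are internally consistent.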

The distribution of the identified mutations within Slovak territory is also interesting. As previously reported, examination of the geographical origin of Slovak AKU mutations shows remarkable clustering in a small area in northwest Slovakia, with these mutations most likely originating in this area and spreading into other regions after the breakdown of genetic isolates in the 1950s (Zatkova et al. 2000a).

As the combined sequence and haplotype analysis shows, 7 of the 12 AKU mutations (58.3%) that most likely originated in Slovakia are associated with hypermutated sequences in HGD, while worldwide the proportion is 40/115 (34.8%) (Fig. 1b). Therefore, it is possible that an increased mutation rate in the HGD gene in a small geographical region is responsible for the high genetic heterogeneity in Slovak AKU (Zatkova et al. 2000a). However, it remains unclear which mechanism acted specifically on the HGD gene to increase its mutation rate, since similar targets are also present in other genes without an evident elevated gene frequency in Slovakia (Srsen et al. 2002; Zatkova et al. 2000a). The increased number of mutations could also be the result of random accumulation of mutations in the region. It has been discussed that the Valachian colonization during the 14th to 17th centuries may also have played a role in the increased prevalence of AKU in Slovakia (Srsen et al. 2002; Zatkova et al. 2000a). The preservation of the most prevalent AKU variants, which either arose in Slovakia or were brought there, may be the result of a founder effect and genetic drift, due to the geographic isolation of villages in northwest Slovakia.

Perspectives

In AKU, strong collaboration between the patients’ associations and researchers has been observed, and this can serve as an example for other rare disease initiatives. There are several support networks for AKU patients, including the AKU Society (http://www.alkaptonuria.info), French ALCAP (http://www.alcap.fr/), Italian AIMAKU, and the emerging U.S. AKU society. Amongst other support services, they provide patients with the best information about the latest news, research, and treatments of AKU. In Slovakia, the National Center for Alkaptonuria and Ochronosis (NCAO) is being established by the National Institute of Rheumatic Disease (NIRD) in Piestany. In 2009 the AKU Society and the University of Liverpool started a joint collaborative research project, FindAKUre (http://www.findakure.org), which is the first of its kind to examine the causes of AKU. This project will enable researchers to use their ochronosis models to provide a fundamental understanding of the development of this condition and help to develop potential therapies.

Several treatment strategies have already been suggested for AKU, including nitisinone, a triketone herbicide that inhibits 4-hydroxyphenylpyruvate dioxygenase, the enzyme that produces HGA (Anikster et al. 1998; Phornphutkul et al. 2002; Suwannarat et al. 2005; Suzuki et al. 1999). There are also interesting recently published preclinical studies concerning antioxidant biomolecules and their effect in preventing pigment deposits in cartilaginous tissue (Tinti et al. 2010).

The molecular and genetic analysis of AKU patients certainly presents a useful data source for genotype-phenotype correlations and also for future clinical trials. We believe that appropriate genetic information will help investigators and clinicians gain a better understanding of the significance of different HGD variants found in their patients. In order to fully understand all the factors influencing the functionality of the enzyme with the perspective of possible molecular therapy, it is also important to study hybrid HGD hexamers carrying various combinations of the AKU-causing mutations.