Introduction

Low back pain (LBP) from degenerative disc disease (DDD) is one of the most common disorders seen in general and orthopaedic practices. As many as one out of three consultations in general practice are related to LBP. It is a significant cause of work-related sick leave and results in a loss of working hours, to the detriment of all industrialized societies.

Degenerative disc disease has been attributed to the accumulation of environmental factors, primarily mechanical insults and injuries, imposed on the “normal” aging changes. Such factors include occupation, sporting activities, spinal injuries, cigarette smoking and atherosclerosis. Numerous studies of these exposures have produced mixed findings related to the presence and degree of association with disc degeneration [10, 21, 22, 37, 48, 50, 51]. In contrast, a number of studies have shown an association between genetic influences and disc degeneration [6, 7, 39, 47, 53]. A study of monozygotic twins with different environmental backgrounds showed that disc degeneration might be explained primarily by genetic influences [7]; the commonly implicated environmental factors had only very modest effects. Another study of a large population sample of twins has shown quantitative measures of disc degeneration to have a large genetic component [53]. Indeed, a number of studies have identified specific genetic risk factors (genes) associated with DDD [1], with the risk of developing DDD quoted to be increased up to six times that of the general population [12]. It is also likely that DDD is a complex/multifactorial disease determined by the interplay between gene(s) and the environment [56].

This review focuses on the evidence for genetic disposition to DDD, the genes or biological processes that are implicated, and the need to consolidate resources and clarify phenotype definition to take advantage of the new technologies in genetic analysis to enhance our understanding of DDD.

Genetic risk factors for DDD

While studies have indicated a contribution from familial aggregation and genetic influences, it was not clear whether there existed genes with a relatively strong effect. So far, the genes associated with DDD have been identified using a candidate gene approach, and their effect sizes are generally modest. Apart from the vitamin D receptor (VDR), the other genes that have been identified to be associated with DDD all code for molecules that contribute to or affect the integrity/function of the extracellular matrix present in cartilage and responsible for the mechanical properties of intervertebral discs. This is related to the general thinking that disc failure is directly related to cumulative stress loading on the disc, leading to loss of structural integrity and thus function of the disc unit. A number of predisposing genes have been reported but, at present, only the associations of the VDR and COL9A2 genes with DDD have been verified in different ethnic populations.

Vitamin D receptor

Vitamin D receptor is a steroid nuclear receptor, better known for its role in normal bone mineralization and remodelling [26]. Its gene polymorphisms (FokI and TaqI) are thought to contribute to common disorders such as osteoporosis [42, 49, 65], osteoarthritis (OA) [34, 66] and others [68]. VDR was the first reported gene associated with DDD, in a study of Finnish monozygotic twins [69], with alleles of the TaqI and FokI polymorphisms being associated with reduced magnetic resonance imaging (MRI) signals of thoracic and lumbar discs. This association was later confirmed in a study of 205 Japanese volunteers and patients aged between 20 and 29 years, with the Tt genotype of the TaqI polymorphism being more frequently associated with multilevel disc disease, severe disc degeneration and disc herniation than the TT genotype [33]. The association of the TaqI polymorphism with DDD was further substantiated in a population study in Chinese with an odds ratio of 2.61 [12]. Interestingly, this association is age-dependent, with an even higher odds ratio of 5.97 in a subgroup analysis of individuals under 40 years of age. The age correlation is consistent with the finding in young Japanese volunteers and patients [33]. The association in the Chinese cohort was with changes in the MRI signal intensity of the nucleus pulposus, and no association was detected for structural defects such as annular tears and Schmorl’s nodes [12]. It is of interest that the frequency of the risk t-allele differs significantly among the three major ethnic populations, being 8% in Asians, 31% in Africans and 43% in Caucasians [68].

The replication of the TaqI polymorphism in three different ethnic populations presents VDR as the most robust of associated genes identified for DDD. The mechanism by which the TaqI polymorphism disposes to risk susceptibility is not clear as it is a synonymous polymorphism in exon 9 of VDR. However, a recent functional study in osteoblasts has shown that a risk haplotype containing the t-allele of TaqI increases mRNA decay by 30% relative to a haplotype containing the T-allele [18], which could impact the vitamin D signalling efficiency.

The only “functional” polymorphism amongst the over 25 different polymorphisms [68] that have been identified in the VDR gene is FokI. The FokI polymorphism in exon 2 eliminates the first ATG translation initiation codon, so that translation starts at the second start codon, 9 bp downstream. Thus, there are two possible protein forms that differ by three amino acids. A study has shown that the short form (M4 or the f-allele) has a higher transcriptional activity than the long form (M1 or the F-allele) [70]. However, confirmation of the FokI association with DDD awaits replication in other ethnic populations.
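The start-codon mechanism described above can be illustrated with a minimal sketch. The sequence below is an invented toy sequence chosen only so that a second in-frame ATG lies 9 bp downstream of the first; it is not taken from VDR, and the "first ATG wins" rule is a simplifying assumption about translation initiation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def count_amino_acids_from_first_atg(dna):
    """Return (start_index, n_amino_acids) for the reading frame opened by the first ATG.

    Simplifying assumption: translation always begins at the first ATG found.
    """
    start = dna.find("ATG")
    if start == -1:
        return None
    n_amino_acids = 0
    for i in range(start, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            break
        n_amino_acids += 1
    return start, n_amino_acids

# Hypothetical toy sequences (not real VDR sequence).
with_first_atg = "ATGCCTGAAATGAAAGTTCTGTAA"     # first ATG intact
without_first_atg = "ACGCCTGAAATGAAAGTTCTGTAA"  # first ATG changed to ACG

long_start, long_length = count_amino_acids_from_first_atg(with_first_atg)
short_start, short_length = count_amino_acids_from_first_atg(without_first_atg)

print("start shift (bp):", short_start - long_start)
print("length difference (amino acids):", long_length - short_length)
```

In this toy case the start site moves 9 bp downstream and the product is three residues shorter, which mirrors the two protein forms described in the text.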

Vitamin D can influence sulphate metabolism, which is important for sulphation of glycosaminoglycans (GAGs) during proteoglycan synthesis [20]. Thus, one hypothesis is that the polymorphisms affect receptor level and function, leading to changes in the structural characteristics of the extracellular matrix in the intervertebral disc.

Genes encoding the collagen IX molecule (COL9A2 and COL9A3)

Genes with rare mutations giving rise to severe skeletal disorders with spinal defects are candidates for harbouring common variants associated with common forms of disc degeneration. This was the basis for the identification of two alleles of the genes COL9A2 and COL9A3, which code for collagen IX, an extracellular matrix molecule present in cartilage and the nucleus pulposus of the intervertebral disc [19], as being associated with DDD [3, 44].

An amino acid substitution (Gln326Trp) mutation in the α2 chain of collagen IX (the Trp2 allele) was found in 6 out of 157 Finnish patients (4%) but in none of 174 control individuals without sciatica, the symptomatic criterion used in this study [3]. Because of the low frequency of the Trp2 allele in the Finnish population, the association was established using family linkage analysis, indicating a disease-causing mutation. Interestingly, the Trp2 allele is present at a much higher frequency in Southern Chinese, close to 20% of the population [29]. In an association study of 804 individuals recruited from the general population with MRI assessment for DDD, the Trp2 allele was shown to be an age-dependent risk factor for DDD, with an odds ratio of 2.4 [29]. Following age stratification, the greatest effect was observed in the 40–49 year age-group. Furthermore, affected Trp2 individuals had a tendency for more severe degeneration.

The association of the Trp2 allele with DDD is reliable, having been replicated across two different ethnic populations using symptomatic and non-symptomatic criteria. The Trp2 allele was also associated with structural changes of the disc, with odds ratios of 2.4 and 4.0 for annular tears and end-plate herniations, respectively [29].

Another amino acid substitution (Arg103Trp) mutation in the α3 chain of collagen IX (the Trp3 allele) was found in the same Finnish cohort to be associated with sciatica [44]. The Trp3 allele was found in 40 of 164 patients (24%) compared with 30 of 321 asymptomatic controls (9%) and represents a risk factor. The presence of at least one Trp3 allele increases the risk of DDD about threefold [44]. Surprisingly, this allele appeared to be absent in Southern Chinese [29]. The different frequencies of the Trp2 and Trp3 alleles in the Finnish and Chinese populations suggest that risk factors for DDD vary between ethnic groups. Association of the Trp2 and Trp3 alleles with DDD was not confirmed in a study of Southern Europeans from Athens, Greece [30]. However, the cohort size was relatively small, with only 105 DDD patients (mean age = 39 ± 8.5 years) and 102 controls (mean age = 35.1 ± 7.9 years).
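As a rough check on the "about threefold" figure, the carrier counts quoted above can be placed in a 2×2 table and a crude odds ratio computed. This is a minimal sketch, not the analysis used in the cited study: it is unadjusted, and the Woolf (log-based) confidence interval is an added assumption for illustration.

```python
import math

def odds_ratio_with_ci(case_carriers, case_total, control_carriers, control_total, z=1.96):
    """Crude odds ratio with a Woolf (log-scale) confidence interval."""
    a = case_carriers                      # carriers among cases
    b = case_total - case_carriers         # non-carriers among cases
    c = control_carriers                   # carriers among controls
    d = control_total - control_carriers   # non-carriers among controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Counts as quoted in the text: 40 of 164 patients, 30 of 321 controls.
trp3_or, trp3_low, trp3_high = odds_ratio_with_ci(40, 164, 30, 321)
print(f"Trp3 crude OR = {trp3_or:.2f} (95% CI {trp3_low:.2f} to {trp3_high:.2f})")
```

The crude odds ratio from these counts is a little above 3, in line with the threefold increase in risk reported.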

How the products of the Trp2 and Trp3 alleles act as risk factors is not clear because the precise function of collagen IX in cartilage or intervertebral disc is not known. Collagen IX is proposed to act as a bridging molecule important for the maintenance of tissue integrity [5]. Thus, the presence of tryptophan, a large and hydrophobic amino acid not normally present in the collagen helical region, may affect matrix properties by altering the interaction of collagen IX with the cartilage/disc collagen fibrils or with other matrix molecules. The protein product of the Trp2 allele is present in cartilage matrix during development and did not appear to interfere with the covalent cross-linking to collagen II-containing fibrils [40], but its impact later in life has not been studied. The importance of collagen IX in the function of the spine was further highlighted in a study showing that the Trp2, Trp3 and other sequence variations in collagen IX are associated with degenerative lumbar spinal stenosis [43].

Aggrecan (AGC1)

Aggrecan is the major proteoglycan component in cartilage and the nucleus pulposus of the intervertebral disc. Its key function is to maintain hydration of the disc structure, attracting water molecules through the highly negatively charged GAG moieties, which are mainly chondroitin sulphate (CS) chains. Thus, from the point of view of structural integrity and the associated loss of water content in a degenerating disc, this extracellular matrix molecule is considered a good candidate for genetic association studies. Within the aggrecan core protein, CS chains are present in two adjacent regions, the CS1 and CS2 domains. In the human AGC1 gene, the region coding for the CS1 domain exhibits size polymorphism, commonly known as variable number of tandem repeats (VNTR), in exon 12, ranging from 13 to 33 repeats [16].

The functional properties of aggrecan may thus vary between individuals with different lengths of the VNTR coding for the attachment sites of CS chains, with a difference of as many as 40 CS chains per aggrecan core protein between the shortest and the longest AGC1 alleles. Indeed, the VNTR polymorphism was shown to be associated with DDD in Japanese [32] in a small study group of 64 young women (mean age = 21.3 years; range 20–29), with half of the participants shown to have normal discs by MRI. A significant relationship was also reported between the distribution of the allele sizes and the severity of the degeneration, but there was no significant association with disc herniation [32]. A later study of another population, ethnicity not disclosed [52], did not find an association in a study group comprising 44 individuals with DDD by MRI (mean age = 55 years; range 18–69) and 58 control subjects (mean age = 57 years; range 20–79). These DDD patients all presented with LBP and were candidates for surgical decompression. Like that of the Japanese study, this cohort is too small to be reliable. Furthermore, the Japanese study focused on young individuals with “early onset” DDD, making it difficult to compare the two studies directly given the differences in mean age and age range between the cohorts. Additional, much larger studies of the VNTR polymorphism or of other single nucleotide polymorphisms (SNPs) identified through the HapMap project [2] are needed to validate AGC1 as a predisposing gene for DDD.
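The repeat and chain numbers quoted above imply a simple linear relationship, sketched below. The per-repeat figure is back-calculated from the numbers in the text (13 to 33 repeats, up to 40 chains difference), and the sketch assumes that every repeat contributes the same number of attachment sites and that all sites carry a CS chain; both are simplifying assumptions.

```python
SHORTEST_REPEATS = 13       # from the text
LONGEST_REPEATS = 33        # from the text
MAX_CHAIN_DIFFERENCE = 40   # from the text

# Attachment sites per repeat implied by the quoted figures (an inference, not a measured value).
implied_sites_per_repeat = MAX_CHAIN_DIFFERENCE / (LONGEST_REPEATS - SHORTEST_REPEATS)

def cs_chain_difference(repeats_a, repeats_b, sites_per_repeat=implied_sites_per_repeat):
    """Difference in potential CS chains between two VNTR alleles (idealized)."""
    return abs(repeats_a - repeats_b) * sites_per_repeat

print("implied attachment sites per repeat:", implied_sites_per_repeat)
print("chains gained from 26 to 27 repeats:", cs_chain_difference(26, 27))
```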

Collagen I (COL1A1)

The Sp1 polymorphism (TT/GT/GG) in intron 1 of the COL1A1 gene, located in the binding site for the transcription factor Sp1, is associated with low bone mineral density, increased bone loss, higher bone turnover and increased fracture risk [25, 67]. COL1A1 codes for the α1(I) chain of the collagen I molecule, the major collagen in bone matrix. Collagen I is also present in the annulus fibrosus of the intervertebral disc, providing tensile strength [19]. Clinical and epidemiological studies have observed that osteoporosis is inversely related to intervertebral disc degeneration. This relationship was the basis for studying the association of this polymorphism with DDD in a Dutch population, which demonstrated that individuals with the TT genotype had a higher risk of disc degeneration than individuals with the GT and GG genotypes [45]. However, the authors rightly indicated the limitations of their study, particularly the definition of disc degeneration, for which they used the Kellgren scoring system [35] on lateral radiographs of the thoracic and lumbar spine (T4-L5). Interestingly, their finding was somewhat supported by a study of young military recruits with an extremely small sample size of 24 cases and 12 controls, with DDD status assessed by MRI [64]. Again, further studies of larger cohorts with clearer definitions of DDD are required to consolidate the association of this gene.

Matrix metalloproteinase 3

In the promoter region of the human matrix metalloproteinase 3 (MMP3) gene, a common polymorphism was identified where one allele contains a run of six adenosines (6A allele) and the other five (5A allele) [71]. The 5A/6A polymorphism appears to be involved in the regulation of MMP3 expression, with the 5A allele having twice the promoter activity of the 6A allele [72]. MMP3 (stromelysin 1) is a key enzyme that degrades components of the extracellular matrix such as proteoglycans and collagens and is implicated in the degeneration of the intervertebral disc [24]; it is thus a candidate gene for DDD. A recent study in 54 young (mean age = 21.4; age range 18–28) and 49 elderly (mean age = 74.3; age range 64–94) Japanese subjects showed that the 5A5A and 5A6A genotypes were associated with a higher number of degenerative discs than the 6A6A genotype in the elderly group, but not in the young group [61]. This finding also needs to be interpreted carefully. Essentially, there are two studies of very small sample size: a young group with disc degeneration assessed using MRI, and an elderly group with degeneration-related changes assessed by radiographs using the Kellgren scoring system [35]. Conclusions should therefore not be drawn from a comparison of the two age groups, as the phenotype definitions are not identical. Furthermore, normal controls were not available for proper genetic testing for predisposition in either the young or the elderly group. As yet, no replication of this association with DDD has been reported.

Cartilage intermediate layer protein

Cartilage intermediate layer protein (CILP) was identified as a matrix constituent of human articular cartilage that appeared to be upregulated in OA. The protein was found to be restricted in its distribution to specific zones in the intermediate layer of cartilage, hence the name CILP [38]. In a recent case–control association study of 20 candidate genes using sequence variations selected from the Japanese SNP Database and the Applied Biosystems SNP resource, a functional SNP in CILP, +1184T→C, in exon 8 was shown to be associated with DDD in a Japanese sample set (OR = 1.61, 95% CI = 1.31–1.98) [54]. This is a highly significant association, with P = 0.00002 following correction for multiple testing. The allelic change results in the amino acid substitution Ile395Thr.

Cartilage intermediate layer protein is expressed widely in intervertebral discs and its expression increases as disc degeneration progresses [54]. It co-localizes with TGF-β1, inhibiting the TGF-β1-mediated induction of extracellular matrix proteins such as aggrecan and collagen II, through direct interaction with TGF-β1 [54]. Functional studies showed that the C allele (coding for Thr395) resulted in increased binding and inhibition of TGF-β1, suggesting the regulation of TGF-β1 signalling by CILP has a crucial role in the aetiology and pathogenesis of DDD [54]. This study further underscores the importance of the extracellular matrix component in the aetiology of DDD, and the role of ECM not only in the structural integrity of the tissue, but in the regulation of signalling molecules that may be important in tissue repair and maintenance. This finding opens new ideas for the search of candidate genes for DDD as well as novel therapeutic treatments that target specific pathways such as TGF-β signalling.

A point of interest in this association study with CILP is the selection criteria for the cohort, which consisted of 467 individuals with DDD and 654 controls. All of the cases had a history of unilateral pain radiating from the back along the femoral or sciatic nerve to the corresponding dermatome of the nerve root for more than 3 months, all had an MRI scan and 367 underwent surgery for lumbar disc herniation to relieve symptomatic pain [54]. While it is noted that degenerative changes are necessary for the disc to herniate [41], it is also clear that herniation-induced pressure on the nerve root alone cannot be the cause of pain, because over 70% of asymptomatic individuals have disc herniations pressing on the nerve root but no pain [8, 9]. Thus, the Japanese cohort for the association of CILP [54] may not represent DDD in general but a special, extreme subset of individuals with painful disc herniations. Replication in other populations with similar phenotype criteria, or with MRI assessment independent of symptomatic pain, would help to establish the significance of this association for DDD.

Interleukin-1, disc bulging and symptomatic pain

Interleukin-1 (IL-1) has regulatory roles in both health and disease. It is produced in response to infection, injury or antigenic challenge, and can elicit a broad spectrum of responses affecting neurologic and metabolic systems. Thus, IL-1 is a possible candidate linking neurological sensations, metabolic changes and environmental factors in DDD. Recent studies associating a functional SNP (+3954C→T) in exon 5 of the IL-1β gene with DDD [57] and LBP [58] highlight this possibility, as does its interaction with other genetic risk factors such as the Trp3 allele [59]. Given that Trp3 also interacts with environmental factors such as obesity [56], a complex network of genetic and environmental interactions for DDD is emerging.

There are three members (IL-1α, IL-1β and IL-1RN) of the IL-1 gene family represented in the IL-1 gene cluster. IL-1α and IL-1β are strong inducers of inflammation, while IL-1RN modulates their effect by acting as a receptor antagonist. The IL-1β +3954 T allele is associated with increased IL-1 levels [46]; similarly, the TT genotype of the IL-1α (-889C→T) promoter polymorphism increases transcription of IL-1α relative to the CC genotype [17]. An effect of IL-1 on disc function may be related to its role in inducing matrix metalloproteinases that degrade proteoglycans [55] such as aggrecan in the disc, and to its negative effect on the synthesis of proteoglycans and collagens [23].

The IL-1β +3954 T and IL-1α -889 T alleles are associated with disc bulges, with odds ratios of 2.4 and 3.0, respectively [57]. Thus, proinflammatory mediators such as IL-1 could play a role in disc bulging and possibly pain. One hypothesis is that, in symptomatic individuals with DDD, the nerve roots are “sensitized” to the pressure [11], possibly by molecules arising from an inflammatory cascade with the expression of prostaglandin E2, phospholipase A2, nitric oxide, TNF-α, IL-6 and MMPs produced by cells in herniated discs [31]. Thus, IL-1 may be important in mediating matrix degradation and pain in disc degeneration. However, the association still needs to be verified in a large data set.

Identifying new candidate genes for genetic analysis of DDD

In a complex/multifactorial disease like DDD, it is likely that many risk factors are involved. So far, all the genetic analyses have been carried out using the candidate gene approach, and the genes selected are based on our knowledge of disc biology, albeit limited. The focus has been on extracellular matrix components such as collagen and proteoglycan molecules, and processes that can disrupt the function of these molecules in the disc [4]. This approach is likely to remain a very fruitful one. With emerging new knowledge of disc biology and degeneration, as well as knowledge generated from high-throughput analysis of gene expression profiles, new candidates will no longer be limited to extracellular matrix molecules. In addition, clues to possible candidate genes can come from rare genetic diseases such as human chondrodysplasias and from mouse models with spinal involvement. An example is a recent report of mutations in the chondroitin 6-O-sulphotransferase-1 (C6ST-1) gene (CHST3) giving rise to severe spinal involvement [62]. Alterations in GAG sulphation can result in changes in disc hydration, thus making CHST3 a good candidate for DDD.

Other candidates can be drawn from related degenerative skeletal diseases such as OA. Given that the intervertebral disc has similar matrix molecules and a similar structural function to a synovial joint, providing spine motion and resisting compressive forces, risk factors identified for OA [60] are potential candidates for DDD. Furthermore, it has been reported that osteoporosis has an inverse relationship to OA of the spine [14, 15], while OA in the spine is associated with disc degeneration. The reason for these relationships is not known, but it has been suggested that the inverse relationship may be attributed to a shared set of genetic risk factors underlying both disorders.

In addition to studying individual genes that could lead to alterations in extracellular matrix synthesis or degradation, cell survival and nutrition supply, we should consider the interaction of genes involved in particular biochemical or cellular processes. An example would be genes for the degradation of the extracellular matrix that can have a combined genetic effect; these could include the extracellular matrix components, specific matrix metalloproteinases and cytokines that can activate these enzymes. It is likely that risk factors with small effects can only be revealed in interaction association studies.

Considerations for genetic analysis of DDD

While a number of predisposing genes have been identified for DDD, most have only moderate effects with relatively low odds ratios. Furthermore, a number of the studies were carried out with relatively small sample sizes, and others need to be substantiated by replication in other populations. At present, only the associations for VDR and COL9A2 have been replicated in more than one population. Lack of replication may be related to the complexity of DDD, the different criteria used for genetic studies of DDD and the size of the cohorts. This problem is not unique to DDD but applies to genetic association studies in general [28, 63]. Therefore, for genetic association studies to be successful, one needs large sample sizes, small P values, reported associations that make biological sense and alleles that affect the gene product in a physiologically meaningful way. It is advisable to adhere strictly to a recommended checklist to minimize problems in reporting genetic associations with complex outcomes [13].

Importance of clear phenotype definition, age effect and size of the cohort

A clear phenotype definition is an important prerequisite for genetic studies. Using sciatica as an example, while it may seem to make clinical sense to use this common symptom as a definition for genetic study, one needs to bear in mind that there is more than one cause of sciatica, and each of these causes may have a different genetic predisposition. Moreover, different people may respond differently to pain, from differences in central perception to differences in local inflammatory response [8, 9]. All of these can be further confounded by psychosocial issues, such as injured worker compensation.

Thus, the ideal would be to use a clear-cut definition. One possible way to define the phenotype would be to use the severity of intervertebral disc degeneration. As one of the first signs of DDD is the loss of water content of the disc, this could be used as a reliable and reproducible indicator of degeneration. The major disadvantage would be the need to perform MRI, but this phenotype has been used in a number of genetic studies and has been shown to have power in detecting genetic predisposition [12, 29]. With a clear unified definition, and information on other structural changes, it is possible to perform valid stratification analyses, to compare different association studies and to carry out meta-analysis across different populations. An important issue to address in a degenerative disease like DDD is the effect of age. This is perhaps one of the most important confounding variables, and it needs to be standardized or adjusted for by statistical means.
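One common way to adjust an association for age is to stratify by age group and pool the stratum-specific odds ratios. The sketch below uses the Mantel-Haenszel estimator as one such option (logistic regression with age as a covariate is another); it is not the method of any study cited here, and all counts are hypothetical values invented purely for illustration.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata.

    Each stratum is (a, b, c, d):
      a = carrier cases, b = non-carrier cases,
      c = carrier controls, d = non-carrier controls.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts for two age strata (not data from any cited study).
age_strata = [
    (20, 80, 10, 90),   # under 40 years
    (50, 50, 30, 70),   # 40 years and over
]

stratum_ors = [(a * d) / (b * c) for a, b, c, d in age_strata]
pooled_or = mantel_haenszel_or(age_strata)

print("stratum-specific ORs:", [round(x, 2) for x in stratum_ors])
print("age-adjusted (Mantel-Haenszel) OR:", round(pooled_or, 2))
```

The pooled estimate is a weighted average of the stratum-specific odds ratios, so it is only meaningful when those are reasonably similar; a strongly age-dependent effect is better reported per stratum.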

Approaches for genetic analysis for disc degeneration

There are typically two approaches that have been used to map genetic variants: linkage analysis and association studies. Families with multiple affected individuals and multiple generations are used in linkage analysis to detect genetic regions that are more likely to segregate with the disease than would be expected by chance. Linkage analysis is typically the most effective means of mapping single gene “Mendelian” diseases with high penetrance. It has had limited success in identifying polygenic disease genes because the statistical power is diluted as the number of genes involved increases. Nevertheless, if one can collect a sufficient number of families with multiple members with early-onset DDD, then a linkage study of these families, which are likely to have a significant genetic predisposition to DDD, might have adequate power to detect novel genes for DDD.

It is believed that common variants often cause common diseases and that comparison of genetic differences in case and control samples will provide more statistical power to identify susceptibility genes. Association studies start with a collection of well-defined case and control samples, and use genetic polymorphisms in the candidate genes to determine whether there are genetic differences between these two groups. This approach has been used in most of the association studies of DDD so far. The limiting factors have been the selection of candidates to be tested and the availability of common polymorphisms for testing, particularly non-synonymous (functional) SNPs for the genes of interest. However, the completion of the HapMap project (http://www.hapmap.org/) has provided a rich resource of common polymorphisms for genetic testing [2]. More significantly, genes can be more efficiently surveyed using tag SNPs for regions of the gene that are in strong linkage disequilibrium with other SNPs [27], thus minimizing the number of SNPs to be genotyped.
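The idea of tag SNPs can be sketched as a greedy selection: repeatedly pick the SNP that captures the most still-untagged SNPs at a chosen linkage disequilibrium threshold. The sketch below is an illustration under stated assumptions, not the HapMap procedure: it uses the squared Pearson correlation of genotype dosages (0, 1 or 2 copies of an allele) as a stand-in for haplotype-based r², the 0.8 threshold is a conventional but arbitrary choice, and the SNP names and genotypes are hypothetical.

```python
from statistics import mean

def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:   # monomorphic SNP: correlation undefined
        return 0.0
    return cov * cov / (var_x * var_y)

def greedy_tag_snps(genotypes, threshold=0.8):
    """Return {tag_snp: [snps it captures]} using greedy pairwise tagging."""
    untagged = set(genotypes)
    tags = {}
    while untagged:
        best, best_captured = None, set()
        for candidate in sorted(untagged):   # sorted for deterministic ties
            captured = {
                snp for snp in untagged
                if snp == candidate
                or r_squared(genotypes[candidate], genotypes[snp]) >= threshold
            }
            if len(captured) > len(best_captured):
                best, best_captured = candidate, captured
        tags[best] = sorted(best_captured)
        untagged -= best_captured
    return tags

# Hypothetical dosages for eight individuals at four SNPs.
toy_genotypes = {
    "snp_a": [0, 1, 2, 0, 1, 2, 0, 1],
    "snp_b": [0, 1, 2, 0, 1, 2, 0, 1],   # identical to snp_a
    "snp_c": [2, 1, 0, 2, 1, 0, 2, 1],   # mirror image of snp_a
    "snp_d": [0, 0, 1, 1, 2, 2, 0, 1],   # only weakly correlated with the rest
}

tag_map = greedy_tag_snps(toy_genotypes)
print(tag_map)
```

In this toy example, two of the four SNPs need to be genotyped to capture all four at the chosen threshold.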

Conclusion: the way forward

It is estimated that about 1,000 cases and 1,000 controls are required for each study to have sufficient power to identify susceptibility genes with only a moderate effect. Well-defined samples sharing a similar environment are recommended, as is the use of quantitative traits to define cases rather than a disease definition, which can be rather subjective. Thus, the issues highlighted earlier regarding the importance of clear phenotype definition, age effect and size of the cohort are all relevant to the way forward for association studies of DDD. A large consortium of international groups with a common interest should be established to provide opportunities for multiple-population testing to enhance the power and significance of the findings.
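How sample size relates to power can be sketched with a normal approximation for comparing carrier frequencies between cases and controls. This is a rough illustration, not the calculation behind the estimate quoted above: the control carrier frequency, odds ratio and significance level below are hypothetical inputs, and the approximation ignores multiple testing and genotyping error.

```python
from math import sqrt
from statistics import NormalDist

def carrier_freq_in_cases(control_freq, odds_ratio):
    """Carrier frequency in cases implied by a control frequency and an odds ratio."""
    odds = odds_ratio * control_freq / (1 - control_freq)
    return odds / (1 + odds)

def approx_power(n_cases, n_controls, control_freq, odds_ratio, alpha=0.05):
    """Approximate power of a two-sided test for a difference in carrier frequency."""
    p0 = control_freq
    p1 = carrier_freq_in_cases(p0, odds_ratio)
    standard_error = sqrt(p1 * (1 - p1) / n_cases + p0 * (1 - p0) / n_controls)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_effect = abs(p1 - p0) / standard_error
    return NormalDist().cdf(z_effect - z_alpha)

# Hypothetical scenario: 10% carrier frequency in controls, odds ratio of 1.3.
for n in (100, 250, 500, 1000, 2000):
    power = approx_power(n, n, control_freq=0.10, odds_ratio=1.3)
    print(f"{n} cases / {n} controls: approximate power = {power:.2f}")
```

Under these assumptions, power rises steeply with cohort size, which is why small cohorts are unreliable for detecting modest odds ratios.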

Candidate-gene association studies are limited in their ability to detect the full genetic basis of the disease because this approach relies on having predicted the correct genes on the basis of a biological hypothesis or the location of known linkage regions. The genome-wide association approach makes no assumptions about the location of the causal variants and represents an unbiased yet fairly comprehensive approach, even in the absence of knowledge of the function or location of the causal genes. The availability of low-cost, high-throughput genotyping technologies has made it feasible to consider genome-scan analysis for case–control studies of multigenic diseases. The applicability of this approach has been highlighted recently by the identification of a novel gene in age-related macular degeneration [36]. Interestingly, in this study, sufficient power was obtained for a genome-wide screen of 96 cases and 50 controls surveying only 116,204 SNPs across the genome. The selection criteria for the cases were very strict, again highlighting the importance of accurate phenotype definition.

Finally, supporting evidence for risk genes should include functional data such as cell-based assays. Definitive answers would come from in vivo testing of risk factors in animal models to validate susceptibility genes for DDD. For this, emerging high-definition MRI technology for small animals such as the mouse would be highly desirable to assess disc degeneration in a controlled environment.