Introduction

Cerebral palsy (CP) is the most frequent cause of physical disability in childhood, with a prevalence of 2–2.5 per 1000 live births that has changed little in 50 years.1 It is a clinically heterogeneous group of non-progressive disorders, primarily affecting movement and posture. Co-morbidities can include intellectual disability (ID), autism spectrum disorder, epilepsy, speech and language deficits, and visual and hearing impairments.2 Known epidemiological risk factors include preterm delivery, intrauterine growth restriction, intrauterine infection and male gender (male:female ratio 1.3:1).3, 4, 5 However, causative pathways are poorly understood. Acute or chronic intrapartum fetal compromise, historically considered the cause of CP, is found in <10% of cases.6 Substantial empiric recurrence risks for certain types of CP and identification of mutant genes in a small number of familial cases are consistent with a genetic contribution to CP.7, 8, 9 Defining that contribution has been hampered by the largely sporadic nature of the disorder. Candidate CP gene mutation screening and association studies have been inconclusive.10 A recent genome-wide search for copy number variants in 50 CP cases did not yield any de novo or obvious candidate mutations.11

High-throughput sequencing of whole-exome-captured genomic DNA (WES) is an efficient strategy for finding rare, disease-causing mutations, including de novo mutations, and has established that de novo mutations are associated with a sizeable proportion of sporadic cases of ID, autism spectrum disorder and schizophrenia.12, 13, 14, 15 To investigate the possible contribution of de novo and other rare variants to CP, we sequenced the exomes of 183 sporadic cases and, where available, their unaffected parents.

Materials and Methods

Study cohort

The study cohort comprised 183 Caucasian cases with CP and 263 parents: 98 case-parent trios; 67 case-parent duos (DNA from only one parent) and 18 singletons (no parental DNA). No case with a confirmed diagnosis of CP was excluded. The CP diagnosis was confirmed by a pediatric rehabilitation specialist using standard published criteria relating to non-progressive disorders of movement control and posture.16 Brain imaging reports were available for 112 of the cases (61%). The cohort’s overall phenotypic, clinical and demographic characteristics (Table 1) were very similar to population distributions described in the 2013 report of the Australian Cerebral Palsy Register (see Web Resources). Pediatric specialist evaluation of the available medical records for potential known causes of CP was completed before enrollment in this study. Comprehensive details of the cohort are provided in Supplementary Table A.

Table 1 Clinical characteristics of 183 individuals with CP

Approval from the Women’s and Children’s Health Network Human Research Ethics Committee, Adelaide, Australia, and Child and Adolescent Health Service Ethics Committee at the Princess Margaret Hospital, Perth, Australia and the Internal Review Board at Baylor College of Medicine, Houston, USA was obtained for this project. Written informed consent was obtained from participants or their parents.

DNA extraction

DNA extraction from lymphoblastoid cell lines was performed on a fully automated large volume nucleic acid purification system (Qiagen Autopure LS; Qiagen, Stanford, CA, USA), which ensures high quality DNA from Epstein–Barr virus-transformed lymphoblastoid cell lines. DNA from the blood of the affected child and of the parents was extracted using a QIAamp DNA Blood Mini Kit and a QIAamp DNA Blood Maxi Kit (Qiagen, Stanford, CA, USA), respectively, following the manufacturer’s instructions.

DNA library construction

Genomic DNA samples were constructed into Illumina paired-end pre-capture libraries according to the manufacturer’s protocol (Illumina Multiplexing_SamplePrep_Guide_1005361_D) with modifications as described in the BCM-HGSC protocol (https://hgsc.bcm.edu/sites/default/files/documents/Illumina_Barcoded_Paired-End_Capture_Library_Preparation.pdf). Briefly, 1 μg of genomic DNA in 100 μl volume was sheared into fragments of ~300–400 base pairs in a Covaris plate with E210 system followed by end repair, A-tailing and ligation of the Illumina multiplexing PE adapters. Pre-capture ligation-mediated PCR was performed for seven cycles of amplification using the 2 × SOLiD Library High Fidelity Amplification Mix (a custom product manufactured by Invitrogen, Grand Island, NY, USA). Universal primer IMUX-P1.0 and a pre-capture barcoded primer IBC were used in the PCR amplification. In total, a set of 12 such barcoded primers was used on these samples. Purification was performed with Agencourt AMPure XP beads (Beckman Coulter, Inc., Brea, CA, USA) after enzymatic reactions. Following the final XP beads purification, the quantity and size distribution of the pre-capture ligation-mediated PCR product were determined using the LabChip GX electrophoresis system (PerkinElmer).

Whole-exome capture

Six pre-capture libraries were pooled together (~166 ng per sample and 1 μg per pool) and hybridized in solution to the HGSC VCRome 2.1 design17 (42 Mb, NimbleGen) according to the manufacturer’s protocol, the NimbleGen SeqCap EZ Exome Library SR User’s Guide (Version 2.2), with minor revisions. Human COT1 DNA and full-length Illumina adapter-specific blocking oligonucleotides were added into the hybridization to block repetitive genomic sequences and the adapter sequences. Post-capture ligation-mediated PCR amplification was performed using the 2 × SOLiD Library High Fidelity Amplification Mix with 14 cycles of amplification. After the final AMPure XP bead purification, the quantity and size of the capture library were analyzed using the Agilent Bioanalyzer 2100 DNA Chip 7500 (Agilent Technologies, Santa Clara, CA, USA). The efficiency of the capture was evaluated by performing a quantitative PCR-based quality check on the four standard NimbleGen (Roche NimbleGen, Madison, WI, USA) internal controls. Successful enrichment of the capture libraries corresponded to an estimated ΔCt of 6 to 9 relative to the non-enriched samples.

DNA sequencing

Library templates were prepared for sequencing using Illumina’s cBot cluster generation system with TruSeq PE Cluster Generation Kits (Illumina, San Diego, CA, USA; Cat. no. PE-401-3001). Briefly, these libraries were denatured with sodium hydroxide and diluted to 3–6 pM in hybridization buffer to achieve a load density of ~800 K clusters per mm². Each library pool was loaded in a single lane of a HiSeq (Illumina) flow cell, and each lane was spiked with 2% phiX control library for run quality control. The sample libraries then underwent bridge amplification to form clonal clusters, followed by hybridization with the sequencing primer. Sequencing runs were performed in paired-end mode using the Illumina HiSeq 2000 platform. Using the TruSeq SBS Kits (Cat. no. FC-401-3001), sequencing-by-synthesis reactions were extended for 101 cycles from each end, with an additional seven cycles for the index read. Sequencing runs generated ~300–400 million successful reads on each lane of a flow cell, yielding 5–6 Gb per sample. With these sequencing yields, samples achieved an average of 92% of the targeted exome bases covered to a depth of 20 × or greater.
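The per-sample yield quoted above can be checked with simple arithmetic on the figures given in this section and the pooling of six libraries per lane described under Whole-exome capture. The sketch below is only a back-of-the-envelope check, not part of the study's pipeline. It assumes that each counted read contributes 101 bases and that reads are split evenly among the six pooled samples, neither of which is stated explicitly in the text; the function name is illustrative.

```python
def per_sample_yield_gb(reads_per_lane, read_length=101, samples_per_lane=6):
    """Approximate sequence yield per sample in gigabases (Gb).

    Assumes every read contributes `read_length` bases and that reads are
    split evenly across the samples pooled in one lane (an idealization).
    """
    total_bases = reads_per_lane * read_length
    return total_bases / samples_per_lane / 1e9


low = per_sample_yield_gb(300e6)   # lower bound of the reported reads per lane
high = per_sample_yield_gb(400e6)  # upper bound of the reported reads per lane
print(f"{low:.2f} to {high:.2f} Gb per sample")
```

Under these assumptions the estimate comes out at roughly 5 to 6.7 Gb per sample, broadly in line with the reported 5–6 Gb.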

Analysis

Analysis was conducted as previously described.18 Briefly, Illumina sequence data were aligned to the human reference genome (hg19) with BWA.19 Variant qualities were recalibrated with GATK.20 Variants were called using ATLAS-SNP and the SAMtools program pileup. Reads were locally realigned at presumptive insertion or deletion events. Differences between the human reference and the sequence reads (variants) were identified. Variants were annotated with their minor allele frequency in normal populations, previous association to disease, predicted effect on the human gene models, association of the gene with disease and, when parental data were available, their inheritance. Variants were considered de novo if neither parent had the variant, and candidate recessive and inherited X-linked variants were selected by segregation analysis. The availability of unaffected siblings would have provided additional power to address the significance of potential causative variants for CP; however, owing to the constraints of human research ethics, unaffected siblings were not collected for this study.
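The de novo rule stated above (a variant present in the case and absent from both parents) can be illustrated with a minimal sketch. This is not the pipeline used in the study: it reduces each genotype to a simple carrier flag, ignores read depth and genotype quality, and the function name and return labels are hypothetical. It also makes explicit why duos and singletons cannot yield de novo calls.

```python
def classify_variant(child, mother=None, father=None):
    """Classify a variant seen in a case by parental carrier status.

    Each argument is True (variant present), False (variant absent) or
    None (no DNA available for that parent, as in duos and singletons).
    """
    if not child:
        return "not_in_case"
    if mother is True or father is True:
        return "inherited"
    if mother is False and father is False:
        return "de_novo"
    # At least one parent is untested and no tested parent carries the variant.
    return "unknown"


print(classify_variant(True, mother=False, father=False))  # trio, neither parent carries it
print(classify_variant(True, mother=False))                # duo, father untested
```

A de novo call therefore requires both parents to be tested, which is why the de novo analysis in this study is restricted to the case-parent trios.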

Sanger sequencing was performed across two sites to test segregation and to validate rare variants predicted to alter the gene product: (1) HGSC—an in-house, automated pipeline designed primers for de novo variant sites. Sanger sequencing was performed using BigDye terminator chemistry 3.1 (Applied Biosystems, Foster City, CA, USA) and sequenced using an ABI 3730xl DNA analyzer (Applied Biosystems). Sequencing data were analyzed using in-house software, SNP-D and consed; (2) The University of Adelaide—primers incorporating the candidate variant for a subset of de novo and X-linked variants were designed using Primer3 (v.0.4.0; www.primer3plus.com/web_0.4.0/input.htm). Sanger sequencing was performed using BigDye terminator chemistry 3.1 (Applied Biosystems) and sequenced using an ABI prism 3700 genetic analyzer (Applied Biosystems). Sequencing data were analyzed using DNASTAR Lasergene 10 Seqman Pro (DNASTAR, Inc. Madison, WI, USA). All validation was performed using genomic DNA isolated from whole blood.

DNA variant and gene prioritization

Where possible, we followed recent guidelines for investigating causality of sequence variants in human disease before future experimental validation in animal and in vitro models.21 We identified variants that were in known disease genes whose clinical spectrum overlaps CP or de novo in any gene. We checked these against the Single Nucleotide Polymorphism database (dbSNP), 1000 Genomes or the Exome Variant Server. Synonymous and intronic de novo variants were not included when they were predicted to be neutral. Unique variants were assessed using multiple criteria. To select the best possible candidate genes and variants, we used: (1) a combination of Residual Variation Intolerance Score (RVIS)22 and Combined Annotation-Dependent Depletion (CADD)23 with cut offs for RVIS<50th percentile for known and novel genes and CADD>10 for known Online Mendelian Inheritance in Man disease genes and CADD>20 for novel candidate genes; (2) the type of variant (that is, frameshift, splice, stopgain and missense); (3) in silico prediction of functional effect at the amino-acid level by various algorithms;24, 25 (4) evolutionary conservation; (5) brain expression pattern; (6) the predicted effect of haploinsufficiency26 and (7) previous disease association reported in Online Mendelian Inheritance in Man. RVIS and CADD are complex, multidimensional tools.22, 23 RVIS ranks genes in terms of intolerance to functional genetic variation and CADD integrates several well-known tools, including PolyPhen and SIFT.22, 23 We assessed the validity of RVIS and CADD as prioritization models using two cut off scores (RVIS<25, CADD>20 and RVIS<50, CADD>10) applied to recent ID,12, 13 autism13, 27, 28 and schizophrenia14 WES papers to confirm whether the genes identified would be selected under this model. We performed a t-test through Partek (see Web Resources) to compare the haploinsufficiency scores26 of de novo variants (where a score was available) with non-mutated genes.
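The RVIS and CADD cut offs in criterion (1) amount to a simple two-threshold filter, sketched below. This covers only that first criterion; criteria (2) to (7) required further assessment and are not modeled. The class, field and function names are hypothetical, and the example variants and their scores are invented purely to exercise the thresholds.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str                # placeholder gene label, not a real gene
    rvis_percentile: float   # RVIS percentile of the gene (lower = less tolerant)
    cadd: float              # scaled CADD score of the variant
    known_omim_gene: bool    # True if the gene is a known OMIM disease gene


def passes_rvis_cadd(variant):
    """Apply the stated cut offs: RVIS < 50th percentile for all genes,
    CADD > 10 for known OMIM disease genes, CADD > 20 for novel candidates."""
    if variant.rvis_percentile >= 50:
        return False
    cadd_cutoff = 10 if variant.known_omim_gene else 20
    return variant.cadd > cadd_cutoff


# Invented examples for illustration only.
examples = [
    Variant("GENE_A", rvis_percentile=12.0, cadd=15.0, known_omim_gene=True),
    Variant("GENE_B", rvis_percentile=12.0, cadd=15.0, known_omim_gene=False),
    Variant("GENE_C", rvis_percentile=80.0, cadd=30.0, known_omim_gene=True),
]
selected = [v.gene for v in examples if passes_rvis_cadd(v)]
print(selected)
```

The same variant-level CADD score can pass for a known disease gene and fail for a novel candidate, reflecting the stricter evidence threshold applied to genes without a prior disease association.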

In total, 2.5 Tbp of sequence data were generated from 446 individuals and aligned to the human reference genome sequence (hg19). This yielded an average redundant coverage of 78.8 × with 92% of targeted, coding bases having at least 20 × redundant read coverage. Nine out of 14 predicted causative genes for CP had better than 90% of the gene covered with >20 × reads (Supplementary Table B).

Results

Case-parent trios (n=98)

De novo mutations

We identified 61 de novo mutations in 43 cases from 98 case-parent trios (44%; one to four de novo mutations per individual), 60 autosomal and one X-chromosomal (Supplementary Table C). The de novo rate for protein-altering mutations (one stopgain, five splice site, three frameshift deletions, two frameshift insertions, two non-frameshift deletions and 48 missense mutations) was 0.62 per individual. The rate was within the lower end of the range previously reported for ID (0.63 and 1.49 per individual)12, 13 and autism spectrum disorder (0.70 per individual).14 Assessment of the haploinsufficiency scores26 for de novo mutations (where scores were available; n=38) identified a slight enrichment compared with non-mutated genes, but this was not significant (Supplementary Figure A). On the basis of RVIS and CADD and a set of multidimensional prioritization criteria (see Materials and Methods), we selected 10 de novo mutations in 98 case-parent trios as potentially relevant to CP causation. These included one splice site and nine missense de novo mutations. Two different de novo mutations were identified in TUBA1A, in two cases, and the remaining eight de novo mutations occurred in different genes (Tables 2 and 3).
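The rate and percentage reported above follow directly from the counts in this paragraph. The short check below uses only those counts; the variable names are illustrative.

```python
# Counts of de novo mutations by type, as listed in the text.
de_novo_counts = {
    "stopgain": 1,
    "splice site": 5,
    "frameshift deletion": 3,
    "frameshift insertion": 2,
    "non-frameshift deletion": 2,
    "missense": 48,
}
n_trios = 98
n_cases_with_de_novo = 43

total_de_novo = sum(de_novo_counts.values())
rate_per_individual = total_de_novo / n_trios
percent_cases = 100 * n_cases_with_de_novo / n_trios

print(total_de_novo)                  # total de novo mutations
print(round(rate_per_individual, 2))  # de novo mutations per individual
print(round(percent_cases))           # percentage of trios with at least one
```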

Table 2 Novel variants in known (OMIM) genes in CP cases identified in 98 trios: CADD score>10 and RVIS %<50
Table 3 Novel candidate genes in CP cases identified in 98 trios; CADD score>20 and RVIS %<50

Of the 10 de novo mutations predicted to be causative for CP, four occurred in genes associated with neurological disorders (Table 2): p.P480L in KDM5C, a lysine-specific histone demethylase and known X-linked ID (XLID) gene;29 p.G1050S in SCN8A, associated with cognitive impairment;30 and p.R123C and p.L152Q in TUBA1A, associated with neuronal migration disorders.31 Six de novo mutations occurred in genes not known to be associated with disease (Table 3): a single mutation (c.957+1G>A) predicted to affect splicing in AGAP1, which directly regulates AP-3-dependent trafficking,32 and five missense mutations in JHDM1D, MAST1, NAA35, WIPI2 and RFX2. In addition, we identified seven loss-of-function variants in non-disease associated genes (CDK17, ENPP4, LTN1, MIIP, NEMF, SSPO and UBQLN3), but these genes either had a high RVIS percentile or there were other frequent loss-of-function variants in these genes in Exome Variant Server.

Inherited X-chromosome and recessive variants

In addition to the de novo mutations, which included one X-chromosome de novo mutation (in KDM5C), four maternally inherited X-chromosome variants in four male cases (4%) predicted to be causative for CP were also seen in the 98 case-parent trios (Tables 2 and 3). Two were in known Online Mendelian Inheritance in Man disease genes: p.P161A in L1CAM associated with L1 syndrome,33 and p.R493C in PAK3 associated with XLID. The remaining two were in genes not yet associated with human disease: a nonsense mutation p.K163X in CD99L2, and a missense variant p.G2533S in TENM1 (Tables 2 and 3 and Supplementary Table D).

No predicted deleterious homozygous autosomal recessive variants were identified. However, we identified a single case with compound heterozygous variants, c.11562+2 T>A (paternal allele) and c.7158+5 G>A, p.R2682Q and p.A2854D (maternal alleles) in the HSPG2 gene (Supplementary Table E). HSPG2 mutations are associated with a wide variety of phenotypes, often including chondrodysplasia. However, isolated muscle stiffness without obvious signs of chondrodysplasia has also been noted in individuals with mutations in HSPG2.34 Although perlecan deficiency has been shown to underlie the chondrodysplasia, severe muscle stiffness may also induce bone deformities. In individuals with HSPG2 mutations, muscle stiffness or bone deformities are often the first symptoms. Deformities of lower limbs and feet are common.34 This individual showed joint limitation of his knee, ankle and foot and equinus deformity.

Whole-exome sequencing

In summary, 14% of the 98 case-parent trios had a predicted causative variant. Six (6%) had a de novo (4) or inherited (2) predicted deleterious mutation in known disease genes. Eight (8%) had an implicated pathogenic variant, de novo (6) or inherited (2), in novel candidate CP genes. Multiple prioritization criteria, including the assessment of RVIS and CADD scores, helped to assess the deleteriousness of these mutations (see Materials and Methods). We tested this model in previously published ID, autism and schizophrenia WES data sets (Supplementary Figures B–I and also Supplementary Table F). These conditions have a high level of locus heterogeneity involving hundreds of independent risk loci and this was reflected in the gene distribution pattern. We also looked at the multispecies alignment for each of the 12 predicted causative missense mutations, de novo (9) and inherited (3), and found a high level of conservation across species (Supplementary Table G).

In two of the 67 duo cases, we identified additional variants (p.E304K in L1CAM and p.M100I in PAK3) in the two X-linked genes that we had predicted to be causative in the trio cases (Supplementary Table H). Five singleton cases had a variant in known disease genes associated with phenotypes overlapping with CP, or in genes associated with neurological disorders that are frequently comorbid with CP (Supplementary Table I). As inheritance of these variants was unknown, we could not confidently associate these with CP.

Pathway analyses

Biological system analysis of the predicted causative CP genes (n=14) was conducted using Ingenuity Pathway Analysis (www.ingenuity.com). Three genes (L1CAM, PAK3 and TUBA1A) were identified (P=0.006) as being involved in Axonal Guidance Signaling. The top associated network was developmental disorder, hereditary disorder and neurological disease (Supplementary Figure J).

Clinical associations

Of 15 cases (13 trio cases and two duo cases) with a predicted pathogenic variant, 14 (93%) had positive findings on brain imaging. Overall, brain imaging was available for 112 cases. A predicted pathogenic variant was found in two out of 21 cases with intraventricular hemorrhage, four out of 46 cases with white matter damage, five out of 20 cases with developmental brain malformations and three out of 15 cases with unilateral cerebral infarction. One out of seven cases with normal imaging had a predicted pathogenic variant. No predicted pathogenic variants were found in two cases with in utero infection or in one case with a brain tumor (Supplementary Table J).

Twenty-two out of 98 trio cases had a diagnosis of ID. Seven of the 22 cases (0.32) had a predicted pathogenic variant. These included one novel protein-truncating mutation (0.05; AGAP1), four missense mutations (0.18) in known ID genes (KDM5C, L1CAM, SCN8A and TUBA1A) and two novel (0.09) missense mutations (JHDM1D and NAA35; see also Supplementary Tables K and L). These proportions were slightly lower than previously reported for ID, where three out of 22 (0.14)12 and six out of 45 (0.13)13 novel truncating mutations and four out of 22 (0.18)12 and 11 out of 45 (0.24)13 variants in known ID genes were found.

Discussion

CP encompasses a large group of non-progressive childhood movement and posture disorders and can occur as an isolated finding or together with additional phenotypic features.5 Previous estimates have suggested that the contribution of genetic variants to the burden of CP is about 2%.35 We followed, where possible, recent guidelines to resolve which of the many variants may be implicated in human disease.19 Using strict criteria, we found that 14% of 98 case-parent trios had variants that were putatively disease causing in five known disease genes and eight novel candidate genes. L1CAM and PAK3 genes, which we implicate in CP from our trio cases, had one additional predicted deleterious variant each in two of the 67 duo cases. Overall the predicted deleterious variants of this study were in genes other than the currently known CP genes, suggesting a considerable genetic heterogeneity underlying CP.7, 8, 36 We used multiple criteria to select CP-relevant unique variants, including a combination of two recently developed prioritization algorithms, RVIS22 and CADD.23 We tested these algorithms in other genetically heterogeneous neurological disorders12, 13, 14, 15, 28 and found a similar distribution pattern between our CP-relevant variants and those found in ID.

In total, five known disease genes had variants predicted to be causative for CP. These included KDM5C, SCN8A, TUBA1A, L1CAM and PAK3. Weakness, poor muscle control and spasticity have been reported in patients with mutations in these genes.29, 30, 33, 37, 38 In combination with the current sequencing results, these signs of movement and posture disturbances implicate CP as a previously unrecognized diagnosis of the clinical spectrum associated with mutations in these genes.

KDM5C is a well-known XLID gene with varying clinical features including mild–severe ID, microcephaly, spasticity and seizures.29 Recently, the same variant (p.P480L) was reported as a novel mutation in two male siblings diagnosed with ID.39 Although the majority of KDM5C affected individuals are males, carrier females with milder phenotypes including ID and spasticity have been reported.29, 40 The female case in this study (26026P) showed spastic diplegia, ID and speech dyspraxia. Two cases, 106115P and 169451P, had different novel de novo missense mutations in TUBA1A. De novo mutations in this gene have been reported in a wide spectrum of neuronal migration disorders.31 Furthermore, spastic diplegia or quadriplegia, ID and microcephaly are common in individuals with TUBA1A mutations.37, 41, 42 Case 106115P was diagnosed with diplegic CP, optic atrophy, adducted thumbs, seizures and moderate ID. Case 169451P had diplegic CP. Brain magnetic resonance imaging of both individuals showed TUBA1A-associated anomalies, such as cerebellar and corpus callosum anomalies, but not the characteristic sign of pachygyria.41, 42

One female case (43043P) diagnosed with hemiplegic CP and ID had a missense mutation, p.G1050S, in SCN8A. ID, epilepsy and varying degrees of motor dysfunction including ataxia and spasticity have been described in children with mutations in SCN8A.13, 30, 43 In mice with partial or complete loss of function of Scn8a, ataxic gait, tremor, dystonia, muscle atrophy and loss of hind-limb function have been reported.43

Two males (142424P and 156438P) had different maternally inherited X-chromosome variants in L1CAM. L1CAM is involved in neurite outgrowth and when mutated is associated with L1 syndrome, a disorder with variable features including hydrocephalus, ID, spastic paraplegia, adducted thumbs and agenesis of the corpus callosum.33 The phenotype of both cases is clearly compatible with L1 syndrome, although case 156438P was less severely affected, showing only subclinical signs of hydrocephalus with ventricular enlargement on brain computed tomography.

We also found two male cases with different maternally inherited missense variants in PAK3, a known XLID gene. PAK3 belongs to a family of serine/threonine p21-activating kinases, critical effectors that link Rho GTPases to cytoskeleton reorganization and nuclear signaling. Two essential domains have been identified, the N-terminal regulatory region with a CRIB domain and a C-terminal catalytic domain that includes a kinase domain.44 One of the variants affects the CRIB domain (p.M100I in 165447P) and the other the kinase domain (p.R493C in 15015P). Neither mutation has been reported previously. Case 165447P was diagnosed with a left cerebral artery infarction in the neonatal period and was later diagnosed with dystonic CP. Case 15015P was born prematurely and neonatal brain magnetic resonance imaging showed a grade IV intraventricular hemorrhage. He was diagnosed with hemiplegic CP and epilepsy and showed cognitive abilities in the upper limit of the low average range. Different PAK3 mutations may lead to different clinical outcomes. Although isolated loss of PAK3 kinase activity causes non-syndromic ID, a dual molecular effect is seen in those cases with a syndromic neurocutaneous phenotype.38 In the latter phenotype, hypotonia, apparent motor delay with inability to walk, hyperreflexia and afinalistic movements have also been described.38 Out of 2400 males in the NHLBI Exome Sequencing Project (ESP), only two missense variants are reported, which suggests that missense variants in this gene are not tolerated in males.

In addition to these known disease genes, we identified a splice variant in MAOB, which might be of interest for CP. Most duplications and deletions of MAOB include nearby genes and have been associated with severe ID.45 Two missense variants in known XLID genes, IQSEC2 and ZDHHC9, were identified in two cases. Although they met our prioritization criteria, we did not consider them causative of CP mostly because no ID was reported for either case.

Brain imaging reports were available for 61% of cases. Thirteen per cent of these had a potentially pathogenic variant to explain their CP. It has been well recognized that malformations of cortical development often have a genetic basis.46 Our findings of putative pathogenic mutations in cases with diffuse white matter damage or a unilateral thrombotic event raise the possibility of a genetic contribution in these pathologies.

The results of this preliminary study suggest that CP is genetically heterogeneous, likely reflecting its clinical heterogeneity. The classification of CP as a non-progressive disorder of movement and posture has been defined by international criteria.4, 5 All individuals with CP in this study met these criteria for CP at time of diagnosis. Physical examination before inclusion was performed by experienced pediatric rehabilitation specialists. The fact that mutations in known syndromic ID genes such as TUBA1A and L1CAM have been identified within this cohort confirms the clinical variability of these disorders and suggests that movement disturbances are part of their phenotypic spectrum.

It should be emphasized that these data, on their own, do not definitively implicate the identified gene variants in CP causation. Ultimately, ongoing cellular, molecular and animal model functional studies of these candidate CP genes will provide final evidence of pathogenicity. As yet, the variants cannot be used in diagnostic screening or clinical decision making.

With the exception of TUBA1A, L1CAM and PAK3, where two different mutations were identified for each gene, only single mutations were observed for the remaining genes. Much larger sample sizes are required to confirm the involvement of any single gene and this can be addressed through targeted re-sequencing of candidate genes.47 Whole-exome sequencing and whole-genome sequencing of new CP cohorts will undoubtedly reveal other candidate mutations and potential novel syndromes associated with movement disturbances. In time, we may gain a better understanding of the interactions between heterogeneous genetic susceptibility, environment and chance that appear to underlie this relatively common, complex and burdensome neurodevelopmental disorder.