Introduction

Intellectual disabilities (ID) have an estimated prevalence of approximately 3 % in the general population [1, 2]. IDs are genetically extremely heterogeneous, and etiologies include aneuploidies, segmental deletions/duplications, and numerous monogenic causes. It is likely that many causes of ID have yet to be identified [1, 2]. Due to the genetic heterogeneity and small percentage of monogenic ID attributable to any one gene, as well as the number of genes involved in ID that remain to be discovered, whole exome sequencing (WES) is a useful genomic tool to identify the etiology of ID in clinical diagnostic testing. In cases of severe ID, as a result of the decreased reproductive fitness, a significant portion of genetic causes are due to de novo mutations that can be readily identified by exome sequencing using a trio comparison of the proband and the biological parents [3, 4].

Among the genes identified from clinical exome sequencing, the most frequently identified gene from one large series was ARID1B [5]. Chromatin modification is an important regulator of gene expression. Among three major epigenetic chromatin regulatory mechanisms, the ATP-dependent SWI/SNF chromatin modifier has been linked with neurodevelopmental disorders including ID and autism. The SWI/SNF complexes are evolutionarily highly conserved and alter the accessibility of DNA to nuclear factors through modification of DNA-nucleosome topology in an ATPase-dependent manner. They play essential roles in regulation of gene expression, embryogenesis, organ development and tumorigenesis [6, 7]. Based on the subunit composition, SWI/SNF can be grouped into two subcomplexes, BRG1- or hBRM-associated factors (BAF) and polybromo-associated BAF (PBAF) [8, 9]. Both subcomplexes contain ARID protein family members. BAF complexes contain ARID1A (BAF250a) or ARID1B (BAF250b) [10], while ARID2 (p200; BAF200) is a subunit of the PBAF complexes [11]. Haploinsufficiency of ARID1B has been identified as a frequent cause of intellectual disability, agenesis of the corpus callosum [12, 13], and autism [14, 15]; and mutations in ARID1A and ARID1B cause Coffin-Siris syndrome [16, 17]. Studies also have identified other core catalytic units of the SWI/SNF complexes as causes of neurological diseases, including SMARCA2 in Nicolaides-Baraitser syndrome (NBS) [18] and SMARCB1, SMARCA4, and SMARCE1 in Coffin-Siris syndrome [19].

In the present study, we performed clinical whole exome sequencing (WES) in patients with ID and identified four independent, likely gene-damaging novel variants in ARID2. Despite the well-established association between BAF subunits and neurodevelopmental disorders, little is known about PBAF function in human diseases, and this is the first report of mutations in ARID2 as a cause of intellectual disabilities.

Materials and methods

Informed consent was obtained from all individual participants included in the study. Additional informed consent was obtained from all individual participants for whom identifying information is included in this article. This study was approved by the Institutional Review Board of Columbia University.

Whole exome sequencing

Genomic DNA was extracted from whole blood from affected children and their parents. Exome sequencing was performed on exon targets captured using the Agilent SureSelect Human All Exon V4 (50 Mb) kit (Agilent Technologies, Santa Clara, CA). One microgram of DNA from each blood specimen was sheared into 350–400 bp fragments, which were then repaired, ligated to adaptors, and purified for subsequent PCR amplification. Amplified products were then captured by biotinylated RNA library baits in solution following the manufacturer’s instructions. Bound DNA was isolated with streptavidin-coated beads and re-amplified. The final isolated products were sequenced using the Illumina HiSeq 2000 sequencing system with 100-bp paired-end reads (Illumina, San Diego, CA). DNA sequence was mapped to the published human genome build UCSC hg19/GRCh37 reference sequence using BWA with the latest internally validated version at the time of sequencing, progressing from BWA v0.5.8 through BWA-Mem v0.7.8 [20]. Targeted coding exons and splice junctions of known protein-coding RefSeq genes were assessed for average depth of coverage and data quality threshold values, with a minimum depth of 10× required for inclusion in downstream analysis. Local realignment around insertion-deletion sites was performed using the Genome Analysis Toolkit v1.6 [21]. Variant calls were generated simultaneously on all sequenced family members using SAMtools v0.1.18 [20]. All coding exons and surrounding intron/exon boundaries were analyzed. Automated filtering removed common sequence changes (defined as >10 % frequency present in 1000 Genomes database).
Whole exome sequence data for all sequenced family members were analyzed using GeneDx’s XomeAnalyzer (a variant annotation, filtering, and viewing interface for WES data), which includes nucleotide and amino acid annotations, population frequencies (NHLBI Exome Variant Server and 1000 Genomes databases), in silico prediction tools, amino acid conservation scores, and mutation references. Variants were filtered based on inheritance patterns, gene lists of interest, phenotype, and population frequencies, as appropriate. Resources including the Human Gene Mutation Database (HGMD), 1000 Genomes database, NHLBI Exome Variant Server, OMIM, PubMed, ClinVar [22–25], and Exome Aggregation Consortium (ExAC; Cambridge, MA; URL: http://exac.broadinstitute.org; accessed June 2015) were used to evaluate genes and sequence changes of interest. Additional searches were performed using specific gene lists related to ID. Identified sequence changes of interest were confirmed in all family members by di-deoxy Sanger sequence analysis using an ABI3730 (Life Technologies, Carlsbad, CA) and standard protocols with a new DNA preparation.
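The automated part of the filtering described above can be illustrated with a minimal sketch. It assumes a simplified, hypothetical record layout (the `Call` class, its field names, and the example variants are illustrative only and are not part of XomeAnalyzer or of the pipeline used in this study); only the two thresholds, a population frequency of >10 % and a minimum depth of 10×, are taken from the text. The manual curation steps are not modeled.

```python
from dataclasses import dataclass
from typing import Tuple

MAF_CUTOFF = 0.10  # common-variant threshold stated in the text (>10 % in 1000 Genomes)
MIN_DEPTH = 10     # minimum depth of coverage stated in the text (10x)


@dataclass
class Call:
    """One variant site genotyped jointly in a trio (hypothetical record layout)."""
    variant: str
    maf: float                     # population allele frequency, 0.0 if unobserved
    proband_alt: bool              # proband carries the alternate allele
    mother_alt: bool
    father_alt: bool
    depths: Tuple[int, int, int]   # read depth in (proband, mother, father)


def is_candidate_de_novo(call: Call) -> bool:
    """Keep rare, adequately covered variants seen in the proband but in neither parent."""
    if call.maf > MAF_CUTOFF:
        return False  # common sequence change, removed by automated filtering
    if min(call.depths) < MIN_DEPTH:
        return False  # insufficient coverage in at least one family member
    return call.proband_alt and not call.mother_alt and not call.father_alt


# Illustrative, made-up calls; none of these correspond to real patient data.
calls = [
    Call("var_a", 0.25, True, False, False, (40, 38, 41)),  # common in the population
    Call("var_b", 0.0, True, False, False, (40, 35, 38)),   # rare, absent from both parents
    Call("var_c", 0.0, True, True, False, (52, 47, 44)),    # inherited from the mother
    Call("var_d", 0.0, True, False, False, (33, 29, 5)),    # father covered below 10x
]

candidates = [c.variant for c in calls if is_candidate_de_novo(c)]
print(candidates)
```

In this sketch only `var_b` is retained. Candidates of this kind would still require the manual review and Sanger confirmation described in the text.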

Results

Exome sequencing was performed in 970 probands with neurodevelopmental disorders, most of whom had previous nondiagnostic genetic testing including chromosome microarrays. WES produced an average of ~13 GB of sequence per sample. Mean coverage of captured regions was ~150× per sample, with >98 % covered with at least 10× coverage, an average of >92 % of base call quality of Q30 or greater, and an overall average mean quality score of >Q36. Filtering of common SNPs (>10 % frequency present in 1000 Genomes database) resulted in ~4400 variants per proband sample. After automated filtering of variants with a minor allele frequency (MAF) of >10 %, manual curation was performed to filter less common variants with MAF of 1–10 % and single variants in genes inherited from unaffected parents, to evaluate predicted effects of rare variants together with the known function of the genes and associated human conditions, and to examine overlapping phenotypes of individuals with de novo variants in the same gene. Among the 970 probands with neurodevelopmental disorders, there were four probands with predicted loss of function mutations in ARID2 with similar phenotypes. No truncating variants in ARID2 were found in the NHLBI Exome Variant Server (approximately 6000 individuals of European and African-American ancestry), in the Database of Single Nucleotide Polymorphisms (dbSNP), or in a local database of over 15,000 exomes. The ExAC database contained eight loss of function (LoF) variants (URL: http://exac.broadinstitute.org; accessed June 2015), of which three were annotated in non-canonical transcripts; their annotations in the canonical transcript (ENST00000334344) are missense and intronic variants, which are unlikely to cause LoF. Four variants have abnormal allele balance, likely due to somatic mosaicism. Additionally, one stop-gain variant is within 5 % of the C-terminus and is therefore unlikely to result in nonsense-mediated decay. One apparent LoF variant is part of a multinucleotide polymorphism (MNP). Based on the information above, we believe that among the eight LoF variants listed in the ExAC database, five are unlikely to be LoF variants, and the remaining three are potential somatic mosaic LoF variants. Therefore, the presence of these variants in ExAC does not contradict our suggestion that germ line de novo mutations in the ARID2 gene are responsible for the phenotype in our patients.

Four probands from four families were identified with novel variants in ARID2 that are predicted to result in loss of function. Among the four novel variants, three were de novo, while the origin of the fourth variant could not be determined because parental samples were not available. Two of the variants (p.H1481fs*4 and p.V846fs*3) cause frameshifts, and the other two (p.L343* and p.Q1440*) are nonsense variants in the 1835 amino acid protein. All four variants are novel and were not present in Exome Aggregation Consortium (ExAC; http://exac.broadinstitute.org/), the Database of Single Nucleotide Polymorphisms (dbSNP; http://www.ncbi.nlm.nih.gov/SNP/), 1000 Genomes (1000G; http://www.1000genomes.org/), or Exome Variant Server (ESP; http://evs.gs.washington.edu/EVS/), and all are predicted to disrupt ARID2 proximal to the two highly conserved zinc finger motifs (Fig. 1).

Fig. 1

Location of ARID2 mutations. Germ line mutations identified in the ARID2 gene in four probands diagnosed with intellectual disabilities. The figure shows the ARID2 protein, with the deleterious variants indicated by red arrows. p.L343* and p.Q1440* are nonsense mutations; p.V846fs*3 and p.H1481fs*4 cause frameshifts. ARID domain, AT-rich interaction domain; RFX, RFX-like DNA binding motif; GLN, glutamine-rich region; Zinc Fingers, C2H2 zinc fingers [27]

These four patients ranged in age from 6 to 15 years, and all have some degree of neurodevelopmental delay (Table 1). All of them have gross motor delays and began walking at 20–24 months of age. Hypotonia was frequently reported. They also demonstrate global developmental delay including speech and fine motor delays. Developmental quotient/intelligence quotient ranges from 50 to 89. All of the children also have behavioral issues although there is a range of reported behavioral issues including attention deficit hyperactivity disorder, tics, anxiety, obsessions, repetitive behaviors, and sensitivity to loud noises and certain food textures. Seizures were not observed.

Table 1 Clinical features of individuals with ARID2 mutations

All of the patients had a history of short stature or failure to thrive, although all had normal birth parameters. Gastroesophageal reflux disease and constipation are common.

Two of the patients have Wormian bones of the skull, one of whom also had plagiocephaly. A third patient has plagiocephaly that did not require treatment. One patient has a mild pectus excavatum, and one child has kyphoscoliosis and conductive hearing loss.

MRI of the brain was performed in two patients and demonstrated mild periventricular leukomalacia in one patient and a small arachnoid cyst with prominent lateral ventricles in the other patient.

All patients have dysmorphic facial features including micrognathia or retrognathia, low set or posteriorly rotated ears, epicanthal folds, down slanting palpebral fissures, highly arched palate, and frontal bossing (Fig. 2).

Fig. 2

Facial characteristics of individuals with ARID2 mutations

One patient also had a cleft palate, diaphragmatic eventration, and atrial septal defect. Birth defects were not observed in the other three patients.

Discussion

We have identified ARID2 (HGNC ID: 18037) as a likely novel genetic cause of ID through WES. Clinical WES revealed four novel, likely gene-damaging variants in ARID2 out of 970 ID patients evaluated. Each of our four patients has global developmental delay with notable problems with speech, as well as behavioral problems including ADHD, anxiety, and obsessions that are common among patients with ID and patients with Coffin-Siris syndrome or Nicolaides-Baraitser syndrome, two other SWI/SNF-A chromatin-remodeling complex disorders. Notably, none of the children have had seizures, and there are no major brain malformations. Wormian bones, frontal bossing, micrognathia, and retrognathia are distinguishing features in these ARID2 patients not observed with other chromatin remodeling disorders.

We believe it is likely that the clinical features shared by these patients are due to ARID2 mutations. Patient 2 has birth defects including an atrial septal defect, cleft palate, and diaphragmatic eventration, and also has a 421 kb 11q12.1 duplication. The parents were not tested for this duplication, so inheritance could not be determined, but the duplication is small and we do not believe it is likely to account for the majority of the clinical presentation. Whether these birth defects are low frequency manifestations of ARID2 mutations or are due to a second genetic contribution will await the characterization of more patients with ARID2 mutations. Patient 4 also carries a novel, de novo Y629N variant in EP300 that is predicted to be deleterious by SIFT, PolyPhen2, GERP, and CADD. Mutations in EP300 cause Rubinstein-Taybi syndrome-2 [26] (Table 1). The patient clinically does not resemble individuals with Rubinstein-Taybi syndrome, so it is unclear whether the EP300 variant is contributing to the patient’s neurodevelopmental delays.

ARID2 (NP_689854.2) contains a consensus AT-rich DNA interaction domain located at the N-terminus, followed by RFX, GLN (glutamine-rich region) and two classic C2H2 zinc fingers at the C-terminus, which directly bind to DNA or interact with proteins [8, 27]. All four ARID2 mutations in our cases (p.L343*, p.Q1440*, p.H1481fs*4, and p.V846fs*3) are predicted to be deleterious since they result in frameshift/truncated proteins, and all cause the loss of the two conserved zinc finger motifs, located at amino acids 1634–1690 [27] (Fig. 1). On the basis of the predicted deleterious effect of these variants on the protein and the fact that all three cases with parents available were de novo, we conclude that haploinsufficiency of ARID2 is a likely novel cause for ID in these four patients.
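The positional argument above can be checked with a short sketch. The protein length (1835 amino acids), the zinc finger coordinates (1634–1690), and the four variant names are taken from the text; the helper `first_affected_residue` and its simple pattern are illustrative assumptions and not a general parser for protein variant nomenclature. The "last 5 %" comparison borrows the rule of thumb applied in the Results to one ExAC stop-gain variant and is applied here to the four patient variants only as an illustration.

```python
import re

PROTEIN_LENGTH = 1835      # ARID2 length in amino acids, as stated in the text
ZINC_FINGER_START = 1634   # first residue of the zinc finger region (1634-1690)

variants = ["p.L343*", "p.V846fs*3", "p.Q1440*", "p.H1481fs*4"]


def first_affected_residue(variant: str) -> int:
    """Return the residue number at which the variant first alters the protein.

    Hypothetical helper: it only handles the short forms used in this article,
    e.g. 'p.L343*' (nonsense) and 'p.V846fs*3' (frameshift).
    """
    match = re.match(r"p\.[A-Z](\d+)", variant)
    if match is None:
        raise ValueError(f"unrecognized variant format: {variant}")
    return int(match.group(1))


positions = {v: first_affected_residue(v) for v in variants}

# Every variant alters the protein upstream of the zinc finger region.
all_proximal = all(pos < ZINC_FINGER_START for pos in positions.values())

# Residue number beyond which a variant falls in the last 5 % of the protein.
last_5_percent_start = PROTEIN_LENGTH * 0.95
in_last_5_percent = [v for v, pos in positions.items() if pos > last_5_percent_start]

for name, pos in sorted(positions.items(), key=lambda item: item[1]):
    print(f"{name}: residue {pos}, {ZINC_FINGER_START - pos} residues before the zinc fingers")
print("all proximal to zinc fingers:", all_proximal)
print("variants in last 5 % of protein:", in_last_5_percent)
```

For the frameshift variants, the new stop codon lies a few residues downstream of the first affected residue (three or four codons, per the variant names), which remains well upstream of residue 1634.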

ARID2 belongs to the ARID family of proteins, characterized by the AT-rich DNA interaction domain [28], and plays an important role in development, tissue-specific gene expression and control of cell proliferation [29, 30]. ARID2 is ubiquitously expressed throughout the developing spinal cord, brain and other embryonic tissues such as heart and liver in mouse [31, 32]. It is one of the three ARID proteins in SWI/SNF subunits and is an intrinsic component of the PBAF complex [11].

PBAF and BAF are related, yet distinct chromatin remodeling complexes which share multiple core complex components including the catalytic subunit BRG1, and have unique components including ARID1A (BAF250a) and ARID1B (BAF250b) in the BAF complex, and polybromo (PB1, BAF180), ARID2 (BAF200), and BRD7 in the PBAF complex [11, 33]. Through combinatorial assembly of the subunits, SWI/SNF complexes play distinct and essential roles in cell fate specificity and lineage conversion during development. BAF complexes play essential roles in controlling cell renewal, neural progenitor cell division, and neuronal maturation. A switch in subunit composition of SWI/SNF complexes BAF45a/53a and BAF45b, c/53b has been identified in the transition from neural stem/progenitors to post-mitotic neurons [31], with ARID2 physically interacting with BAF complexes in different developmental stages [34]. The importance of the BAF complex in human brain development has emerged through recent discoveries that mutations in subunits including SMARCA2, SMARCB1, SMARCA4, ARID1A, and ARID1B have all been implicated in ID and related neurobehavioral disorders including Nicolaides-Baraitser syndrome [18], Coffin-Siris syndrome [17], autism [2], and schizophrenia [35]. The high frequency of BAF subunit mutations in neurological disorders underscores the fundamental importance of BAF in the development and function of the central nervous system [36].

Beyond the role of ARID2 in brain development and function, it also appears to have a role in skeletal and cardiac development. A recent study using comprehensive single cell-resolution analysis in Caenorhabditis elegans revealed that inhibition of ARID2 (swsn-7 ortholog) caused a severe delay in cell cycle progression, suggesting an important role as a chromatin regulator in embryogenesis [37]. In MC3T3-E1 pre-osteoblasts, used as a model of osteoblast differentiation, ARID2 plays an essential role, and ARID2 depletion severely impaired the mature mineralization of osteoblasts [38]. ARID2 is found in a multiprotein complex that participates in SRF-dependent gene regulation, and affects the promoter activity of cardiac genes in NIH3T3 cells [27]. SRF regulates cell survival during murine embryonic development [39], is required for PI3-kinase-regulated cell proliferation [40], and plays an important role in the regulation of immediate-early genes and muscle-specific genes [41]. Arid2 (BAF200) homozygous mutant mice died between E12.5 and E14.5, and Arid2 homozygous mutant mouse embryos exhibited multiple cardiac defects [32]. These data suggest that ARID2 is required for the regulation of cardiomyocyte proliferation. ARID2 interacts with CHD7 (chromodomain helicase DNA-binding domain, member 7, an ATP-dependent chromatin remodeler mutated in CHARGE syndrome) to promote neural crest formation and cell migration in human neural crest cells [42]. CHARGE syndrome is characterized by congenital anomalies including heart and craniofacial malformations, growth retardation, and developmental delay [43, 44]. Some of those features are seen in our ARID2 patients (Table 1).

Recent sequencing studies have identified frequent mutations in genes encoding subunits of SWI/SNF chromatin remodeling complexes in a wide spectrum of human cancers, and BAF complex subunit gene mutations occur in 20 % of all human cancers. SWI/SNF complexes are widely involved in the regulation of gene expression through interactions with various transcription factors, recruiting coactivators and corepressors at target promoters, and they play an important role in modulating DNA repair [45–47]. A recent study also shows that the PBAF remodeling complex is important for double-strand break (DSB)-induced transcriptional silencing and promotes repair of a subset of DNA DSBs at early time points [48]. ARID2 has been recently identified as a key cancer gene for hepatocellular carcinoma [49, 50], melanoma [51, 52], and lung cancer [53]. Recently, a missense mutation R314C in ARID2 was reported in an individual with autism [54]. R314C is the second most frequent somatic missense mutation in the ARID2 gene in COSMIC, suggesting that this missense mutation leads to loss-of-function of ARID2. These cancer data further strengthen our prediction that haploinsufficiency of ARID2 is responsible for the intellectual disabilities of our patients.

To our knowledge, this is the first report of mutations in ARID2 associated with developmental delay and intellectual disabilities. Additional functional studies and evaluation of this gene in a larger patient population will be needed to definitively prove causality and fully characterize the spectrum of associated clinical features.