Introduction

Autism spectrum disorders (ASDs) are a group of developmental disorders that involve qualitative deficits in social interaction and communication, and restricted/repetitive interests and behavior. Included among the ASDs are autistic disorder, with deficits in all three domains observed before the age of 3 years; Asperger’s disorder, where developmental language milestones are achieved within normal limits; and pervasive developmental disorder–not-otherwise-specified (PDD–NOS), which may involve later onset, impairments in only two domains, or lesser severity (APA 2004).

Twin, sibling, and family studies of ASDs demonstrate increased risk of autism and autism-related traits to immediate family members (reviewed by Freitag 2007). Briefly, twin studies (Bailey et al. 1995; Folstein and Rutter 1977; Ritvo et al. 1985; Steffenburg et al. 1989) show MZ twin concordance rates between 36 and 96%, with DZ twin concordance rates ranging between 0 and 30%. Among siblings of autistic individuals, the recurrence risk is estimated at 6%, and increased rates of cognitive and communication abnormalities, social impairments, rituals and repetitive play are noted (Bishop et al. 2006; Bolton et al. 1994; MacLean et al. 1999; Silverman et al. 2002; Spiker et al. 1994). Studies of parents of children with autism show increased rates of various social impairments (Bailey et al. 1995, 1996; Bolton et al. 1994). These studies strongly suggest that genes play an important role in autism susceptibility.

Both rare and common susceptibility alleles have been implicated in the genetics of autism. Recent work indicates a substantial role for rare variants, including chromosomal rearrangements (Szatmari et al. 2007) and copy number variants (Glessner et al. 2009). Genome scans of autism using linkage analysis have been performed in ten independent datasets (Allen-Brady et al. 2009; Auranen et al. 2002; Barrett et al. 1999; Cantor et al. 2005; IMGSAC 1998; Philippe et al. 1999; Risch 1999; Schellenberg et al. 2006; Shao et al. 2002b; Yonan et al. 2003). Some signals have been replicated, including 7q (Barrett et al. 1999; Palferman et al. 2001a; Risch 1999; Schellenberg et al. 2006; Shao et al. 2002b; Yonan et al. 2003), 2q (Buxbaum 2001; Palferman et al. 2001b; Philippe et al. 1999), and 17q (Cantor et al. 2005; Yonan et al. 2003), while other regions have shown interesting results in individual datasets [e.g. 5p (Yonan et al. 2003), 13q (Barrett et al. 1999), 4q (Schellenberg et al. 2006), 19p13 (Liu et al. 2001)]. Failure to replicate linkage signals in some regions is likely due to phenotypic or genotypic heterogeneity and/or different ascertainment schemes.

Attempts to address this heterogeneity fall into three broad classes. The first approach is to identify a subset of affected individuals with distinctive characteristics, and stratify the data accordingly. Families have been classified using gender of autistic individuals (Cantor et al. 2005; Lamb et al. 2005; Schellenberg et al. 2006; Stone et al. 2004), presence of developmental regression (Molloy et al. 2005; Schellenberg et al. 2006), severity of obsessive–compulsive behaviors (Buxbaum et al. 2004), delay in the development of phrase speech (Bradford et al. 2001; Buxbaum 2001; Shao et al. 2002a) and IQ (Liu et al. 2008). Despite the reduction in sample size using this approach, linkage signals on 17q11 and 2q were greatly strengthened by considering male-only sib-pairs (17q11) (Cantor et al. 2005; Stone et al. 2004) and by stratifying on phrase speech delay (2q) (Buxbaum 2001; Shao et al. 2002a). The second approach to reducing heterogeneity is to analyze a related quantitative trait, rather than the qualitative autism diagnosis. This approach may increase power, as analysis based on a quantitative trait may address milder expressions of the phenotype that are obscured by a binary diagnosis, and the dataset need not be stratified. Previously examined quantitative traits include social impairment (Duvall et al. 2007; Liu et al. 2008), non-verbal communication ability (Chen et al. 2006), age at first word, and age at first phrase (Alarcon et al. 2002, 2005), the latter two of which identified a region on chromosome 7q35 (Alarcon et al. 2002, 2005). Finally, genome-wide association studies using very large sample sizes have been carried out. These studies ignore subtleties in the phenotype, in hopes that the large samples used will provide the statistical power necessary to overcome heterogeneity. Results have been disappointing so far: regions have been implicated on 5p14 (Ma et al. 2009; Wang et al. 2009) and 5p15 (Weiss and Arking 2009), but replication of these signals has not been achieved (Anney et al. 2010).

Increased IQ discrepancy (between performance and verbal IQ) is seen in autistic individuals and is correlated with core components of the autism phenotype. IQ discrepancy is a quantitative trait, and is easily measured on parents and unaffected siblings. A review of 23 studies considering the cognitive profile of individuals with autism found that verbal IQ is generally lower than performance IQ (Lincoln et al. 1998), a pattern frequently found in a recent study of over 450 preschoolers with autism (Munson et al. 2008), although not universally observed (e.g. Mayes and Calhoun 2003; Minshew et al. 2005; Siegel et al. 1996; Szatmari et al. 1990). Decreased verbal IQ relative to performance IQ can be thought of as an indirect measure of communication impairment, one of the core components of the autism phenotype. Several studies have documented correlation of IQ discrepancy with autism symptoms in the social and communication domains. Smaller IQ discrepancy is correlated with better language skills (Lincoln et al. 1998), while IQ discrepancy in the direction of higher non-verbal skills is correlated with impaired social skills (Tager-Flusberg and Joseph 2003). Recent studies have found that large IQ discrepancies in either direction were associated with social symptoms of autism (Black et al. 2009), while intellectual disability (as measured by a composite IQ score) was only weakly correlated with severe autistic traits (Hoekstra et al. 2009).

We present the first genetic analysis of IQ discrepancy as a continuous autism-related phenotype. We describe correlations between IQ discrepancy and social and communication impairments, and present evidence for the genetic basis of IQ discrepancy from clustering and segregation analysis. We describe evidence of linkage to five genomic regions based on a genome-scan, including one region that reaches genome-wide significance.

Materials and methods

Ascertainment and phenotyping

Families with two or more children with an ASD were recruited as part of the National Institutes of Health Collaborative Programs of Excellence in Autism. Contributing centers included the University of Washington and the University of Pittsburgh. Exclusion criteria included proband age below 3 years, presence of a known genetic condition, and history of a serious head injury or of significant sensory or motor impairment, such as blindness or cerebral palsy. This study was approved by the University of Washington Institutional Review Board, and informed consent was obtained from all participants and/or their parents.

Children with a reported ASD were directly assessed using the Autism Diagnostic Interview-Revised (Lord et al. 1994) (ADI-R) and Autism Diagnostic Observation Schedule-Generic (Lord et al. 2000) (ADOS-G) by trained clinicians, and assigned a DSM-IV (APA 2004) diagnosis. Individuals designated as affected met DSM-IV criteria for an ASD based on ADI-R, ADOS-G and expert clinical judgment. Affected status in this dataset corresponds to the “broad” autism diagnosis used in a previous study (Schellenberg et al. 2006).

A total of 287 families with two or more affected children were recruited, including 278 families from the University of Washington and 9 from the University of Pittsburgh. Family structures are primarily nuclear (n = 270), with 2–8 children. Other family structures include those with half-siblings (single or multiple; n = 15) and extended families (n = 2). Information regarding race was available for 267 families, 227 of whom self-identified as White, 7 Asian, 4 Black or African-American, 1 Native Hawaiian or Pacific Islander, 1 American Indian, and 27 mixed race.

Cognitive ability was assessed in all willing participants using standardized assessments appropriate to the individual’s age: for ages 2.5–4.5 years, the Mullen Scales of Early Learning (Mullen 1997); for ages 4.5–6 years, the Wechsler Preschool and Primary Scale of Intelligence-Revised (Wechsler 1989) (WPPSI-R); for ages 7–17 years, the Wechsler Intelligence Scale for Children (Wechsler 1991) (WISC-III); and for ages 17 and older, the Wechsler Adult Intelligence Scale (Wechsler 1997) (WAIS-III). The Wechsler short form subtests consisted of Vocabulary, Comprehension, Block Design and Object Assembly. Performance IQ (PIQ), verbal IQ (VIQ) and full-scale IQ (FSIQ) were estimated based on these subtests as recommended (Sattler 2001). From the Mullen scales, we calculated VIQ based on receptive and expressive language scales, and PIQ based on visual reception and fine motor scales. The IQ discrepancy phenotype was defined as PIQ-VIQ and was available for 1,127 of the 1,359 individuals (82%) in these families. Some subtleties in the discrepancy phenotype arise in individuals who receive the lowest possible score (floor), on either or both of the VIQ or PIQ scales. When an individual floored on VIQ alone (n = 102), their discrepancy score was treated as censored from above. Similarly, flooring on PIQ alone (n = 6) represents censoring from below, but due to analytical constraints prohibiting censoring on both ends of the distribution, these scores were taken at nominal value. For individuals who floored on both PIQ and VIQ (n = 58), the IQ discrepancy was considered to be missing. The remaining individuals with missing data were mostly founders.
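The floor-handling rules described above can be summarized in a short sketch. This is an illustration only, not the code used in the study: the function name, the `Discrepancy` record, and the floor scores passed in as arguments are hypothetical, and the actual floor values depend on the instrument administered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Discrepancy:
    value: Optional[float]  # PIQ - VIQ, or None when treated as missing
    censored: bool          # True when VIQ alone is at floor


def iq_discrepancy(piq, viq, piq_floor, viq_floor):
    """Apply the floor rules from the text to one individual.

    piq_floor and viq_floor are hypothetical inputs: the lowest
    obtainable scores on the instrument that was administered.
    """
    if piq is None or viq is None:
        return Discrepancy(None, False)
    piq_at_floor = piq <= piq_floor
    viq_at_floor = viq <= viq_floor
    if piq_at_floor and viq_at_floor:
        # Both scales at floor: the discrepancy is treated as missing.
        return Discrepancy(None, False)
    # VIQ alone at floor: the true VIQ may be lower than recorded, so the
    # nominal PIQ - VIQ is flagged as censored. PIQ alone at floor is
    # taken at nominal value, as described in the text.
    return Discrepancy(piq - viq, viq_at_floor and not piq_at_floor)
```

For example, under these assumptions `iq_discrepancy(90, 40, 40, 40)` yields a nominal score of 50 flagged as censored, while `iq_discrepancy(40, 90, 40, 40)` yields −50 with no flag.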

Genotyping

Genotype data on a genome-scan of micro-satellite markers were available for 1,157 (85%) of 1,359 individuals. The remaining 15% represents unavailable founders. The mean distance between the 387 markers used was 9.29 centiMorgans (cM), and the largest gap was 22.90 cM. These genotype data, determined with methods previously described (Schellenberg et al. 2006), are henceforth referred to as the ABI dataset.

Supplemental micro-satellite marker data provided 95 new markers on the five chromosomes of interest (28, 20, 19, 14 and 14 markers on chromosomes 2, 6, 10, 15, and 16, respectively). These genotypes were available for up to 857 individuals in the 199 families that were part of a collaboration with the Autism Genome Project (AGP) (Szatmari et al. 2007). An integrated map containing both ABI and supplemental AGP markers was generated for each chromosome, using the information in the Rutgers map (build 35.1) (Kong et al. 2004). The average marker spacing of the integrated maps over the five chromosomes was 4.9 cM, and the largest gap was 15.1 cM. The addition of the AGP dataset does not increase the number of families contributing to the analysis, but rather increases the marker density for those already contributing through the ABI dataset.

A total of 210 families have two or more children who are both phenotyped and genotyped, and therefore contribute to both linkage and trait-modeling analyses. The remaining families have phenotype data alone, and therefore contribute information only to trait-modeling analysis. A subset of these families was used in a previous linkage analysis of the binary autism phenotype (Schellenberg et al. 2006).

Statistical analyses

Correlation and clustering analyses

We tested the null hypotheses that (a) IQ discrepancy is uncorrelated between affected siblings; and (b) high IQ discrepancy does not cluster in families. We considered the 189 families with exactly two affected children for whom IQ discrepancy was measured, and calculated the Pearson correlation coefficient for the continuous IQ discrepancy phenotype. We also defined a qualitative phenotype, “high PIQ”, which is observed when the IQ discrepancy is greater than or equal to 15 points. Each sib-pair was then designated either concordant or discordant for high PIQ status. In each of 10,000 permutations, the individuals with high PIQ were randomly assigned among the 189 families, and the resulting number of concordant sib-pairs was recorded. The P value is the fraction of permutations in which the number of concordant sib-pairs exceeded the number in the observed data.
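The permutation procedure for the clustering test can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis code: the function name and the seed handling are hypothetical, and each family is assumed to contribute exactly two affected children, as in the subset described above.

```python
import random


def concordance_permutation_p(n_families, n_high, observed_concordant,
                              n_perm=10_000, seed=0):
    """One-sided permutation P value for clustering of a binary trait.

    n_families: number of sib-pairs (two children per family)
    n_high: number of children with the high-PIQ phenotype
    observed_concordant: concordant sib-pairs in the observed data
    """
    rng = random.Random(seed)
    # One label per child; True marks high PIQ.
    labels = [True] * n_high + [False] * (2 * n_families - n_high)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Children 2i and 2i+1 form family i; a pair is concordant
        # when both children carry the same label.
        concordant = sum(labels[2 * i] == labels[2 * i + 1]
                         for i in range(n_families))
        if concordant > observed_concordant:
            exceed += 1
    return exceed / n_perm
```

Called with the counts for this sample (189 families, 219 children with high PIQ, 114 concordant pairs), the function returns a Monte Carlo estimate of the P value that varies slightly with the seed and the number of permutations.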

Joint segregation and linkage analysis

Oligogenic joint segregation and linkage analyses, in addition to segregation analysis alone, were carried out using Loki 2.4.7 (Daw et al. 1999; Heath et al. 1997), a package that models genetic traits using a Bayesian Markov chain Monte Carlo (MCMC) approach. This approach provides computationally tractable simultaneous multipoint analyses of censored continuous traits while incorporating a flexible model that allows for multiple quantitative trait loci (QTLs). The power and reliability of these methods have been described using simulated and real datasets [e.g. HDL variation (Gagnon et al. 2003)], while Wijsman and Yu (2004) provide a detailed review of the attributes and interpretation of the methodology.

The IQ discrepancy trait is modeled as

$$ y_{j} = \alpha + \sum\limits_{i = 1}^{k} {Q_{i,j} } + \varepsilon_{j} $$

where $y_{j}$ is the value of PIQ–VIQ for individual j, $\alpha$ the baseline mean, $Q_{i,j}$ the genotype effect due to QTL i according to individual j’s genotype, and $\varepsilon_{j}$ the normally distributed random error. The total number k of QTLs is a random variable estimated in the modeling process. Prior assumptions include (a) the number of QTLs follows a Poisson distribution with mean 2; (b) the genotype effects for each QTL are normally distributed with mean zero and common variance τ = 400; and (c) QTLs are uniformly distributed across the genome.
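To make the structure of this model concrete, the sketch below draws one configuration of QTLs from the stated priors and generates trait values for unrelated individuals. It is an illustration only and does not reproduce Loki: the uniform allele-frequency draw, the Hardy–Weinberg genotype sampling, the residual standard deviation, and all function names are assumptions introduced here, and pedigree structure is ignored.

```python
import math
import random


def draw_poisson(rng, mean):
    """Draw a Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def draw_model(rng, mean_qtls=2.0, tau=400.0):
    """Draw k QTLs; each has an allele frequency and three genotype effects."""
    k = draw_poisson(rng, mean_qtls)
    model = []
    for _ in range(k):
        freq = rng.random()  # assumption: uniform allele frequency
        # Genotype effects ~ Normal(0, variance tau), one per genotype.
        effects = [rng.gauss(0.0, math.sqrt(tau)) for _ in range(3)]
        model.append((freq, effects))
    return model


def simulate_individual(rng, model, alpha=0.0, resid_sd=10.0):
    """y_j = alpha + sum_i Q_{i,j} + eps_j for one unrelated individual."""
    y = alpha
    for freq, effects in model:
        # Assumption: genotypes sampled under Hardy-Weinberg proportions.
        n_alleles = int(rng.random() < freq) + int(rng.random() < freq)
        y += effects[n_alleles]
    return y + rng.gauss(0.0, resid_sd)
```

In the actual analysis these quantities are not simulated forward but sampled from their posterior distribution given the observed phenotypes and pedigrees.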

By use of a Bayesian reversible-jump MCMC sampler, estimates of posterior distributions are obtained conditional on observed data and prior assumptions. Parameters for which the posterior distribution is estimated include the number k of QTLs, the allele frequencies and genotype effects for each, and if marker genotype data were included, QTL locations. Both segregation and linkage analyses were based on 500,000 iterations after 5,000 iterations of burn-in, with values in alternate iterations saved for estimation of the posterior distributions. Single-marker analyses were based on 150,000 iterations.

In analyses where the observed data include marker genotypes, the strength of evidence for linkage to an interval of the genome can be summarized using the Bayes Factor:

$$ \text{BF} = \frac{q_{1} / \left( 1 - q_{1} \right)}{q_{0} / \left( 1 - q_{0} \right)}, $$

where $q_{1}$ is the posterior probability of a QTL being in the interval of interest, while $q_{0}$ is the prior probability of a QTL being in that interval, based on the assumption of uniform QTL distributions on the genome. Thus, the BF is the ratio of the posterior odds of linkage to the prior odds of linkage. We use the base-10 logarithm of the BF (calculated in 4 cM intervals across each chromosome) to render the scale similar to that of a traditional LOD score. This is a Bayesian approach, and therefore frequentist P values are not appropriate for interpretation of a single observed BF.
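As a small worked illustration of this definition, the function below computes log(BF) from a posterior and a prior interval probability. The function name is hypothetical, and both probabilities are taken as given inputs; how they are estimated from the MCMC output is not shown.

```python
import math


def log10_bayes_factor(q1, q0):
    """Base-10 log of the ratio of posterior odds to prior odds.

    q1: posterior probability of a QTL in the interval
    q0: prior probability of a QTL in the interval
    Both must lie strictly between 0 and 1.
    """
    if not (0.0 < q1 < 1.0 and 0.0 < q0 < 1.0):
        raise ValueError("q1 and q0 must lie strictly between 0 and 1")
    posterior_odds = q1 / (1.0 - q1)
    prior_odds = q0 / (1.0 - q0)
    return math.log10(posterior_odds / prior_odds)
```

When the posterior equals the prior the function returns 0, and positive values indicate that the data have raised the odds of a QTL in the interval above their prior level.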

Generation of empirical P values

Region-wide empirical P values were generated by simulation in each of the five regions of interest. We defined a region of 60 cM on each chromosome of interest that included the location of the linkage signal, and used the program SimSuite (Igo and Wijsman 2008) to simulate 1,000 sets of marker data for each. Joint segregation and linkage analyses were performed for each simulated dataset, and the results were summarized by calculation of log(BF) across the region. The P value is the fraction of simulated datasets in which the maximum log(BF) anywhere in the region exceeds the maximum obtained in the original dataset.
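The final step of this procedure reduces to a simple proportion, sketched below. The function name is hypothetical, and the simulated maxima are assumed to have been obtained already by running the joint analysis on each simulated dataset.

```python
def empirical_p_value(observed_max, simulated_maxima):
    """Fraction of simulated datasets whose maximum log(BF) in the
    region exceeds the maximum observed in the original dataset."""
    if not simulated_maxima:
        raise ValueError("at least one simulated dataset is required")
    exceed = sum(1 for s in simulated_maxima if s > observed_max)
    return exceed / len(simulated_maxima)
```

With 1,000 simulated datasets, the smallest non-zero P value obtainable in this way is 0.001.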

Results

Table 1 shows age and sex demographics along with IQ summary statistics for the 287 families, subdivided into groups of founders (individuals without parents in the pedigree; either parents or grandparents), unaffected children, and affected children. The distributions of the IQ discrepancy score are shown in Fig. 1, and summarized in Table 1. In parents, the IQ discrepancy score is centered near zero (mean = −0.99), with similar proportions of individuals in the high (PIQ–VIQ ≥ 15; 16%) and low (PIQ–VIQ ≤ −15; 19%) tails. In unaffected children, the mean discrepancy score is higher (mean = 6.67) than in parents, and the proportion of individuals scoring in the high tail (33%) is twice that in parents. Finally, the mean discrepancy score in affected children is higher still (mean = 18.69) and a larger proportion of individuals fall in the high tail (58%), than in the low end (6%) of the distribution. These results for children with autism spectrum disorders are in contrast to the distribution of IQ discrepancy scores in the reference population for the WISC-III, where 12.4 and 14.3% of children fell in the low and high tails, respectively (Sattler 2001). In adults assessed with the full WAIS-III, 17.6% of a reference sample fell in the high and low tails combined (Wechsler 1997), as compared to 37% in our sample. It is notable that FSIQ scores in adults in our sample are high, possibly reflecting recruitment bias.

Table 1 Demographic characteristics and summary statistics for full-scale IQ (FSIQ), verbal IQ (VIQ), performance IQ (PIQ) and IQ discrepancy (PIQ–VIQ) for 287 families
Fig. 1 Empirical distributions of IQ discrepancy (PIQ–VIQ) in parents (solid line), unaffected siblings (dashed line) and affected individuals (dot-dash line)

Phenotypic correlations

Previously identified correlations between IQ discrepancy and core autism symptoms are replicated in our sample. Table 2 summarizes the correlation of IQ discrepancy and raw scores of both social and communication subscales of the ADOS and ADI, with affected individuals grouped according to which module was administered (ADOS) or their verbal/non-verbal status (ADI). ADOS modules correspond to the expressive language level of the person being tested: module 1 is for individuals who are pre-verbal or have single words; module 2 is for individuals with phrase speech; and modules 3 and 4 are for children and adults with fluent speech, respectively. Correlation of IQ discrepancy with the ADOS social and communication subscales was significantly different from zero in individuals who were administered module 3. Borderline significance was observed in those given module 4, and for the social subscale in those given module 2. Correlation of IQ discrepancy with ADI social and communication subscales was also significantly different from zero in verbal individuals (see Online Resource 1 for more detail).

Table 2 Spearman correlation between IQ discrepancy and ADOS and ADI social and communication subscales

Data cleaning

RELPAIR version 2.0 (Epstein et al. 2000) was used to check family structures and identify potential sample swaps, using the assumption of a 1% genotyping error rate. Using the ABI dataset, we identified families with monozygotic twins and apparent non-paternities. Mendelian inconsistencies were identified using Loki (Heath et al. 1997). In the ABI dataset, a total of 1,156 inconsistencies were identified, from a total of 555,573 genotypes, for an overall error rate of 0.2%. These inconsistencies were coded as missing for all individuals in the pedigree in question.

Correlation and clustering analyses

Correlation and clustering analyses indicate that (a) IQ discrepancy is correlated in affected siblings and (b) high IQ discrepancy clusters in families. The correlation coefficient for the quantitative phenotype was 0.27 (P = 0.00014). There is also significant evidence (P = 0.0067) that the high PIQ phenotype clusters within families. Of the 378 affected children in the 189 families with exactly two affected children, 219 had high PIQ. Seventy-five sib-pairs were discordant for high PIQ, while 114 were concordant (both had high PIQ in 72 families, both did not in 42 families).

Segregation analysis

MCMC oligogenic segregation analysis demonstrates that there is substantial evidence of familial transmission of IQ discrepancy in this dataset. After approximate Winsorization (Igo et al. 2006) of two potentially influential observations (scores of 71 and −68, Winsorized to 50 and −50, respectively), three clear models emerge (see Table 3). Model A, with a broad-sense heritability of 33%, has a minor-allele frequency of 0.25, where the rare homozygote has dramatically increased IQ discrepancy. Model B, with a broad-sense heritability of 18%, has a similar structure, with a minor-allele frequency of 0.36 and a smaller increase in IQ discrepancy in the rare homozygote. In Model C, the minor allele (frequency 0.47) conveys risk to the rare homozygote of increased IQ discrepancy in the opposite direction to models A and B (PIQ < VIQ).

Table 3 Models resulting from MCMC segregation analyses of IQ discrepancy, allowing for censoring from above

Joint segregation and linkage analysis

Genome-scan analyses identified five chromosomal regions with substantial evidence of linkage to IQ discrepancy. Figure 2 shows the log(BF) across the genome. Using a log(BF) of 1 as the lower threshold for notable linkage signals, we identified five regions. By far the strongest signal genome-wide is on chromosome 10, where log(BF) achieves a maximum value [logmax(BF)] of 2.53 at D10S197, and log(BF) > 1 over a 12 cM region. Chromosomes 2 and 16 yielded moderate signals with logmax(BF) values of 1.69 (near D2S2259) and 1.89 (near D16S3091), respectively. The region of interest covers 12 cM on chromosome 2, and 32 cM on chromosome 16. The weakest signals are on chromosomes 6 and 15, where logmax(BF) is 1.11 and 1.21, respectively. On chromosome 6, log(BF) > 1 in only one 4 cM interval (between D6S441 and D6S1581), although log(BF) is elevated above background over an interval of 14 cM. The interval on chromosome 15 is larger, with log(BF) > 1 over 12 cM (between D15S131 and D15S205). Single-marker analyses in all five chromosomal regions (see Online Resource 2) confirm that each signal is due to multiple markers, and support the multipoint results. The addition of the AGP markers on the chromosomes of interest results in only subtle changes in the interpretation of the results. Figure 3 shows log(BF) on chromosomes 2, 6, 10, 15 and 16, using both the ABI marker set alone, and the combined ABI and AGP markers. Differences are small and likely reflect stochastic variation among MCMC runs.

Fig. 2 Genome-scan results using the ABI dataset. The base-10 logarithm of the Bayes Factor, calculated in 4 cM intervals, is plotted against the genetic location along each chromosome using the Haldane map. Chromosomes are labeled across the bottom of the figure

Fig. 3 Whole-chromosome scan results for each of the five chromosomes of interest, using both the ABI dataset (solid line) and the combined ABI and AGP datasets (dot-dash line). The base-10 logarithm of the Bayes Factor is plotted against the genetic location (cM) along the chromosome

Empirical P values confirm that the linkage signals on chromosomes 10 and 16 showed strongest significance, with the chromosome 10 signal achieving genome-wide significance (Table 4). On chromosome 10, logmax(BF) = 2.65 had a region-wide empirical P value of 0.001, which corresponds to a genome-wide P value of 0.05 (after a conservative Bonferroni-style correction based on the total genome length assumed in the analysis). On chromosome 16, logmax(BF) = 1.67 had a region-wide empirical P value of 0.015 (genome-wide P = 0.53), and signals on chromosomes 2, 6 and 15 had region-wide P values of 0.03, 0.047 and 0.053 (Bonferroni-corrected genome-wide P values 0.78, 0.91 and 0.93), respectively.
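The genome-wide values quoted above are numerically consistent with a correction of the form 1 − (1 − p)^n applied to each region-wide P value p, with n = 50 independent regions (for instance, fifty regions of 60 cM). The value n = 50 is an inference from the reported numbers and is not stated in the text, so the sketch below should be read as a consistency check under that assumption rather than as the procedure actually used; the function name is hypothetical.

```python
def genome_wide_p(region_p, n_regions=50):
    """Probability of at least one region-wide P value this small
    among n_regions independent regions.

    n_regions=50 is an assumption inferred from the reported values.
    """
    return 1.0 - (1.0 - region_p) ** n_regions


# Region-wide P values reported for chromosomes 10, 16, 2, 6 and 15.
for p in (0.001, 0.015, 0.03, 0.047, 0.053):
    print(p, round(genome_wide_p(p), 2))
```

Under this assumption, the five region-wide P values map to 0.05, 0.53, 0.78, 0.91 and 0.93, matching the genome-wide values reported.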

Table 4 Region-wide empirical P values based on 1,000 simulated datasets for each chromosome

The three models identified by the segregation analysis reappear in the joint segregation and linkage analysis, with results supporting the specific localization of each model. The QTLs with locations on chromosomes 10 and 16 represent models A and B, respectively, while those on chromosome 15 overwhelmingly represent model C. The models are less clearly defined on chromosomes 2 and 6: QTLs with chromosome 2 locations represent both models B and C, while those on chromosome 6 represent models A and C.

Discussion

The work presented here demonstrates that IQ discrepancy is a useful autism-related phenotype, and may also be of interest in other disorders that involve deficits in language and/or communication. In the current study, increased performance IQ relative to verbal IQ was observed in siblings of autistic individuals, and to an even greater degree in the probands themselves. Such an observation is important, and can only be demonstrated in a dataset such as ours where the phenotype has been measured in additional relatives. IQ discrepancy in the same direction as observed here is also observed in specific language impairment [where children have language below age-expectations but normal or higher non-verbal intelligence (Rice et al. 2005)], learning disabilities including dyslexia (e.g. Alm and Kaufman 2002), and in some samples of children with ADHD (Wechsler 1991). The genes that affect IQ discrepancy in autism may also be relevant to these other disorders, but only further study can answer this question.

The use of IQ discrepancy as a phenotype for genetic analysis, while novel, is supported by results in cognitively normal subjects from linkage studies of component domains of FSIQ, and the observation of consistent discrepancy in autism samples (reviewed in “Introduction”). Previous genome-scan linkage analyses (Dick et al. 2006; Luciano et al. 2006; Posthuma et al. 2005) of FSIQ, PIQ, and VIQ have demonstrated that the different domains of IQ (i.e. PIQ and VIQ) may map to different locations in the genome. Potential limitations with the use of IQ discrepancy include (a) greater relative variability due to the use of a difference measure, and (b) the possibility that PIQ and VIQ may assess slightly differing aspects of intelligence at different points in development (Sattler 2001). Despite these challenges, our results clearly demonstrate that strong signals of a genetic basis can persist. Replication of our signals in other datasets will require careful selection of a phenotype that is the same or very similar to that measured here, and may also require availability of the phenotype in parents and siblings to achieve sufficient power. For example, identical tests may be necessary because different tests measuring the same domain can yield different results, and even within the same test, subtle differences in subtest composition can be important.

Analysis of IQ discrepancy as an autism-related phenotype in these multiplex families demonstrated that this trait segregates in a manner consistent with the presence of several major genes. Joint segregation and linkage analysis highlighted five regions with evidence for linkage, the strongest on chromosomes 10p12 and 16q23. Regions implicated by other groups for linkage to the binary autism phenotype on 2q, 7q and 17q were not replicated in the current analysis, suggesting that IQ discrepancy addresses a novel aspect of the autism phenotype that has not been explored before. From an analytical standpoint, IQ discrepancy has two particular strengths. First, as a continuous measure, it maximizes power to detect linkage. Second, it is directly measurable in both adults and children, with the availability of parental data contributing information about the transmission of the trait from parent to child.

This is the first genome-scan in which chromosome 10 yielded the strongest signal for an autism phenotype, although the region has been previously implicated. The first analysis of the International Molecular Genetic Study of Autism Consortium (IMGSAC 1998) implicated exactly the same marker (D10S197). Analyses of two distinct subsets of the Autism Genetic Resource Exchange database found weak evidence of linkage in male-only affected sib-pairs (Cantor et al. 2005) and in sib-pairs with obsessive–compulsive traits (Buxbaum et al. 2004), at locations about 15 and 25 cM p-ter to D10S197, respectively. Further evidence implicating this region includes a report of a child with PDD-NOS and a complex chromosomal rearrangement involving breakpoints in both 10p11 and 10p12 (Zwaigenbaum et al. 2005). Together these results indicate that further consideration of this region of chromosome 10 is warranted.

The chromosome 16 signal, while statistically less significant, is particularly interesting because the maximum evidence of linkage (at D16S3091) falls in the same location as a strong linkage signal for a non-word repetition (NWR) phenotype in a sample of families with specific language impairment (SLI) (Newbury et al. 2004). Studies to date of NWR in both autism and SLI are reviewed in Whitehouse et al. (2008). SLI is often thought to result from deficits in phonological short-term memory (Gathercole and Baddeley 1990), quantified using tests based on NWR, and poor performance on such tests is a defining feature of SLI (Tager-Flusberg and Cooper 1999). Significantly, poor performance on NWR tasks has also been observed in a subset of autistic individuals with language impairments (Kjelgaard and Tager-Flusberg 2001). This striking finding of deficits in NWR tasks in both SLI and autism has been interpreted in different ways. Some argue that there is a common underlying etiology for both SLI and the language impairment aspects of autism (Tager-Flusberg 2006). Alternatively, others argue that other aspects of language impairment differ in the two populations, and in fact the types of error made in the NWR tasks are also different (Whitehouse et al. 2008). Allelic heterogeneity provides a plausible explanation for both the coincident linkage findings and the observation of differing errors on NWR tests in the SLI and autistic populations. Allelic heterogeneity is a well known phenomenon, and includes the classic case of Duchenne and Becker muscular dystrophies, both of which involve mutations in the dystrophin (OMIM #300377) gene, but with very different presentation and disease course (Bushby 1992). The coincidence of our linkage signal for IQ discrepancy in autism with a strong linkage signal for NWR in SLI supports the idea of a common etiology for aspects of these disorders, and provides an exciting new avenue of study.

The region of interest on chromosome 16 contains an obvious candidate gene: CDH13. D16S3091, which is at the location with the strongest evidence for linkage, is in the gene for CDH13, a T-cadherin. CDH13 is expressed in adult cerebral cortex and medulla, and may play a role in neural cell proliferation (Takeuchi et al. 2000). There is a high degree of amino acid conservation between mouse and human T-cadherin sequence, suggesting that the gene is of particular importance (Takeuchi et al. 2000). Thus, CDH13 is a plausible candidate gene for autism and related phenotypes. In a recent genome-wide association study, a SNP in CDH13 was significantly associated with autism (P = 0.000845) in combined samples from the Autism Genetic Resource Exchange and the Autism Case Control cohort (Wang et al. 2009). Furthermore, CDH13 has been implicated in three genome-wide association studies of ADHD (Franke et al. 2009). Children with ADHD and ASD share impairments in executive functioning, have overlapping behavioral concerns, and show involvement of common neurological pathways and neuro-anatomical structures (Corbett et al. 2009). It is plausible that mutations in CDH13 contribute to ADHD, SLI and autism, a parsimonious hypothesis that would relate three complex developmental phenotypes through a gene expressed in the brain. Confirmation of this hypothesis awaits replication of the linkage signal in other datasets, and close examination of the region with dense SNP panels or DNA sequencing.