Introduction

Over the past 15 years, there has been increasing interest in the early identification of autism spectrum disorders (ASD). In this context, several studies have examined the stability of early diagnosis (Lord 1995; Cox et al. 1999; Moore and Goodson 2003; Charman et al. 2005). Building on those studies, we conducted a meta-analysis focussing on the stability of the diagnosis of pervasive developmental disorder not otherwise specified (PDD-NOS) when it is first made in young children.

Three diagnostic categories are included in ASD: autistic disorder (AD), pervasive developmental disorder not otherwise specified (PDD-NOS) and Asperger’s syndrome (AS). PDD-NOS is an important ASD subtype because it is diagnosed frequently, yet it is the least studied (Matson and Boisjoli 2007). Prevalence estimates are 2.6 per 10,000 for AS, 13 per 10,000 for AD and 20.8 per 10,000 for PDD-NOS (Fombonne 2005). Although PDD-NOS appears to be diagnosed more often than AD, very few studies have examined the predictive validity and stability of this diagnostic category.

Early identification of ASD facilitates participation in specialized intervention programs. Studies of the impact of early intervention have reported significant improvements in communication skills and social behaviour, and fewer maladaptive behaviours (Charman and Baird 2002). Eligibility for these programs is often limited to children with a formal diagnosis of autism, which underlines the importance of an accurate early diagnosis. However, earlier identification of ASD raises important issues, such as the difficulty of diagnosing ASD and of differentiating AD from PDD-NOS in very young children.

Diagnosing autism at a very young age is possible but clinically challenging because of its overlap with severe language delays or disorders and with general developmental delays due to mental retardation (Lord 1995; Charman and Baird 2002). Van Daalen et al. (2009) reported that even experienced clinicians disagreed on the distinction between ASD and intellectual disability without ASD in 2-year-old children. Lord (1995) also reported that diagnostic differences between autistic and non-autistic children become clearer at age 3: at this age, children with severe mental handicap and severe communication difficulties were less often over-included as autistic than at age 2. This lack of predictive validity of early ASD diagnoses may be due in part to developmental factors. Studies of family home movies reveal that stereotyped behaviours and restricted interests appear later in the course of ASD, which impedes early diagnosis (Saint-Georges et al. 2010).

Although the differentiation between delayed and deviant development remains clinically challenging in the first 2 years of life, inter-rater reliability for the distinction between ASD and non-ASD tends to be good to excellent (Van Daalen et al. 2009). Mahoney et al. (1998) found that experienced clinicians could reliably differentiate children with and without ASD (κ = 0.67). This result is consistent with the DSM-IV autism field trial (Klin et al. 2000), which reported excellent inter-rater reliability (κ = 0.95) for the distinction between AD and non-PDD diagnoses. More recently, Van Daalen et al. (2009) found good agreement between psychiatrists (κ = 0.74) for the distinction between ASD and non-ASD in 2-year-old children. The finer distinction between ASD subtypes, especially between AD and PDD-NOS, seems more problematic, with authors consistently reporting lower inter-rater agreement: κ = 0.18 (Mahoney et al. 1998); κ = 0.65 (Klin et al. 2000); κ = 0.51 (Van Daalen et al. 2009). These values are consistent with the conclusion of Witwer and Lecavalier (2008) that clinicians and researchers were not able to discriminate the three ASD subtypes on the basis of the current DSM criteria. In addition to these concerns about the inter-rater reliability of the PDD-NOS diagnosis, concerns have been raised about its predictive validity. While several studies have reported high stability for AD when diagnosed at age 2 (Lord 1995; Cox et al. 1999; Moore and Goodson 2003; Charman et al. 2005), lower stability has been observed for PDD-NOS (Cox et al. 1999). Walker et al. (2004) suggested that these results stem from the inability of current autism assessment instruments to diagnose PDD-NOS accurately and from the ambiguous DSM-IV-TR conceptualization of the category.
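As a brief illustration of what the κ values above measure, Cohen's κ compares the observed agreement between two raters, p_o, with the agreement expected by chance from their marginal frequencies, p_e: κ = (p_o - p_e) / (1 - p_e). The Python sketch below is a minimal two-rater implementation; the cited studies may have used other variants (for example, multi-rater statistics), and the 2 × 2 counts in the example are hypothetical, not drawn from any of these studies.

```python
from typing import Sequence


def cohens_kappa(table: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for two raters from a square contingency table.

    table[i][j] counts cases that rater A placed in category i
    and rater B placed in category j.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("table must be square")
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table is empty")
    # Observed agreement: proportion of cases on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: sum over categories of the product of both raters' marginal proportions.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_totals[c] * col_totals[c] for c in range(k)) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical example: two clinicians rating 50 children as ASD / non-ASD.
hypothetical = [[20, 5],
                [10, 15]]
print(round(cohens_kappa(hypothetical), 2))  # prints: 0.4
```

In this hypothetical table the two clinicians agree on 70% of cases, but 50% agreement would be expected by chance alone, so κ = (0.7 - 0.5) / (1 - 0.5) = 0.4. This chance correction is why κ can be modest even when raw agreement looks reassuring.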

In order to improve the reliability and validity of the diagnostic categories, a number of standardised instruments, including the Autism Diagnostic Observation Schedule (ADOS) and the Autism Diagnostic Interview-Revised (ADI-R), have been developed in recent years. The ADOS is a standardised diagnostic instrument based on direct observation of the child in several domains: social interaction, communication, play and imagination, and stereotyped behaviours. Each domain has cut-off scores for classification, and the scores for the first two domains are summed to give a total score. Children can be classified as autistic, PDD-NOS or non-autistic (Lord et al. 1999). The ADI-R is a clinician-administered parent interview that evaluates the child’s communication, social development, play, and restricted, repetitive and stereotyped behaviours. Children can be classified as autistic or non-autistic (Lord et al. 1994). Because ADI-R diagnostic evaluation relies mostly on the clinical presentation around 4–5 years of age, the utility of this instrument in very young children has been questioned (Charman and Baird 2002). Another disadvantage of the ADI-R is that it has no scoring algorithm for PDD-NOS or AS (Scahill 2005). Despite the development of these empirically validated instruments, clinical judgment by experienced clinicians remains the gold standard for an ASD diagnosis, especially in very young children (Charman and Baird 2002). Given these limitations, standardised diagnostic instruments should be used with caution.

According to DSM-IV-TR (2000), PDD-NOS should be used when there is an impairment in social interaction associated with either communication impairments or repetitive and stereotyped behaviours, interests and activities, but the criteria for another PDD are not met. Under DSM-IV-TR criteria, PDD-NOS therefore remains a heterogeneous group, because it includes all children on the autism spectrum who are not classified as having AD or AS. Concerns have consequently been raised about the validity of the PDD-NOS category, with many questioning its vague criteria and the heterogeneity of the individuals it includes (Towbin 2005). Walker et al. (2004) also highlighted that children with diverse and heterogeneous clinical features receive the PDD-NOS diagnosis for lack of a better clinical definition. Addressing these concerns, Witwer and Lecavalier (2008) reviewed studies that examined differences between ASD subtypes, but they could not draw conclusions about the validity of the three subtypes because researchers applied the diagnostic criteria inconsistently. The validity of the distinction between ASD subtypes therefore remains a topic of debate among clinicians and researchers. Given the low inter-rater reliability for the distinction between AD and PDD-NOS, Van Daalen et al. (2009) recently questioned the validity and utility of differentiating PDD-NOS from AD at a very young age, and suggested restricting the clinical diagnosis to ASD or non-ASD in 2-year-old children. This is consistent with Walker et al. (2004), who suggested that ASD subtypes be differentiated only in children older than 36 months, so that repetitive behaviours and cognitive and language impairments have time to become apparent.

Thus, because PDD-NOS refers to a heterogeneous construct associated with several concerns about diagnostic validity, many clinicians assume that this category is less stable over time (Towbin 2005). However, this assumption should be tested against the literature. This paper aims to examine the temporal stability of the PDD-NOS category using a meta-analytic method, to explore the developmental trajectories of PDD-NOS over time and, finally, to shed light on the reliability of early PDD-NOS diagnoses in young children.

Methods

Study Inclusion

The literature search was conducted in the Ovid Medline database to locate articles published from January 1996 to October 2009. We obtained 180 articles by combining a diagnostic term (pervasive developmental disorder/diagnosis or autistic disorder/diagnosis) with a descriptor term (prospective studies or longitudinal studies or early diagnosis or follow-up studies or outcome assessment). To be included in this meta-analysis, studies had to follow a longitudinal design, provide original information about PDD-NOS diagnostic stability and use DSM criteria. Based on these criteria, we selected 8 articles for inclusion in the meta-analysis (see Fig. 1 for the article selection process). Most studies retrieved by the electronic search concerned AD and Asperger’s Disorder rather than PDD-NOS, which explains the low number of articles included in this review.

Fig. 1

Article selection process. AD autistic disorder, PDD-NOS pervasive developmental disorder not otherwise specified, ASD autism spectrum disorder

Response Criteria and Hypothesis

To explore diagnostic stability, we chose as the response criterion the percentage of individuals who received the same diagnosis at Times 1 and 2. We hypothesized that PDD-NOS would be less stable than AD.

Statistical Analysis

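The statistical analysis was performed with EasyMA software (Cucherat et al. 1997). We used the relative risk (RR) as the measure of stability: the proportion of children with AD at Time 1 who kept that diagnosis at Time 2, divided by the corresponding proportion for PDD-NOS, so that an RR above 1 indicates greater stability for AD. A random-effects model was chosen because the chi-square test of heterogeneity, performed first, was significant (χ² = 21.56, p = 0.003).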
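The reported heterogeneity result can be checked with a short calculation. The Python sketch below assumes that the statistic is Cochran's Q referred to a chi-square distribution with k - 1 = 7 degrees of freedom for the eight pooled studies (EasyMA's exact output is not reproduced here), and computes the upper-tail probability from the power series of the regularized incomplete gamma function, using only the standard library. The function name `chi2_sf` is illustrative.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with df degrees of freedom.

    Sums the power series of the regularized lower incomplete gamma function
    P(a, z), with a = df / 2 and z = x / 2, and returns 1 - P(a, z).
    Adequate for moderate x; not intended for extremely small tail probabilities.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a  # n = 0 term of the series
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


# Heterogeneity statistic reported above; df = 8 studies - 1 is an assumption.
q, k = 21.56, 8
p = chi2_sf(q, k - 1)
print(f"Q = {q}, df = {k - 1}, p = {p:.4f}")  # prints: Q = 21.56, df = 7, p = 0.0030
```

Under that assumption the computed p-value, about 0.003, matches the value reported above. Where SciPy is available, `scipy.stats.chi2.sf(21.56, 7)` computes the same quantity.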

Results

Demographic and Assessment Characteristics of the Studies

After combining all the studies, we obtained a total sample of 322 children with AD and 122 children with PDD-NOS. In all studies, the first diagnostic assessment was conducted before 36 months of age. Mean age at Time 1 ranged from 21.6 to 33.0 months, and mean age at Time 2 from 34.8 to 113.8 months. Time intervals ranged from 12 to 84 months. All studies used clinical diagnoses based on DSM-IV criteria; two of them used DSM-IV-TR criteria (Turner et al. 2006; Turner and Stone 2007). All studies except Stone et al. (1999) and Eaves and Ho (2004) used the ADI-R, and all studies except Eaves and Ho (2004) used the ADOS as diagnostic measures. Turner and Stone (2007) used the ADI-R at Time 2 only, Chawarska et al. (2007) used the ADI-R at Time 1 only, and Stone et al. (1999) used the ADOS at Time 2 only. All studies reported the initiation of treatments between Times 1 and 2.

Diagnostic Stability

As mentioned in the Methods section, we used the relative risk (RR) as the measure of stability over time. When the eight trials were combined, the pooled RR was 1.95 (95% CI 1.294–2.934; p < 0.001), indicating that the diagnostic stability of AD was higher than that of PDD-NOS (see Fig. 2). Taken individually, all trials displayed an RR higher than 1.5 (indicating higher diagnostic stability for AD than for PDD-NOS) except Chawarska et al. (2007), who found slightly lower diagnostic stability for AD than for PDD-NOS (RR = 0.987).
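For readers less familiar with random-effects pooling, the Python sketch below shows one common way to combine per-study relative risks, the DerSimonian-Laird estimator. Each study's log RR and its variance are computed from four counts (children with a stable diagnosis among those diagnosed with AD, and with PDD-NOS, at Time 1), and a 95% confidence interval is formed on the log scale. It is an illustration under stated assumptions, not a reproduction of the analysis: the text does not state which random-effects estimator EasyMA uses, the `Study` fields and function names are hypothetical, and the counts in the example are placeholders, not the data summarized in Fig. 2 or Table 2.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical per-study counts (field names are illustrative)."""
    stable_ad: int   # children with AD at Time 1 who kept AD at Time 2
    n_ad: int        # children with AD at Time 1
    stable_pdd: int  # children with PDD-NOS at Time 1 who kept PDD-NOS at Time 2
    n_pdd: int       # children with PDD-NOS at Time 1


def log_rr(s: Study) -> tuple[float, float]:
    """Log RR of a stable diagnosis (AD relative to PDD-NOS) and its variance."""
    a, n1, c, n2 = s.stable_ad, s.n_ad, s.stable_pdd, s.n_pdd
    # If any cell of the 2x2 table is empty (no one or everyone stable),
    # add 0.5 to each of the four cells: +0.5 to a and c, +1 to each group total.
    if 0 in (a, n1 - a, c, n2 - c):
        a, c, n1, n2 = a + 0.5, c + 0.5, n1 + 1, n2 + 1
    rr = (a / n1) / (c / n2)
    var = 1 / a - 1 / n1 + 1 / c - 1 / n2
    return math.log(rr), var


def pool_random_effects(studies: list[Study]) -> dict[str, float]:
    """DerSimonian-Laird random-effects pooled RR with a 95% confidence interval."""
    ys, vs = zip(*(log_rr(s) for s in studies))
    w = [1 / v for v in vs]  # inverse-variance (fixed-effect) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(studies) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in vs]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "rr": math.exp(y_re),
        "ci_low": math.exp(y_re - 1.96 * se),
        "ci_high": math.exp(y_re + 1.96 * se),
        "q": q,
        "tau2": tau2,
    }


# Hypothetical counts for three studies (NOT the data of this meta-analysis).
example = [Study(30, 40, 5, 15), Study(45, 50, 4, 12), Study(20, 30, 6, 10)]
result = pool_random_effects(example)
print({name: round(value, 3) for name, value in result.items()})
```

Because the weights 1/(v + τ²) become more nearly equal as the between-study variance τ² grows, a random-effects pooled RR can differ noticeably from a crude ratio computed on summed counts.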

Fig. 2

Relative risk (RR) for diagnosis stability at T2 according to diagnosis at T1 (PDD-NOS vs. AD) (random model). PDD-NOS pervasive developmental disorder not otherwise specified, AD autistic disorder, Stable Dg number of participants with the same diagnosis at T1 and T2

Because all studies selected for the meta-analysis examined AD and PDD-NOS diagnoses made before 36 months, this meta-analysis provides information about diagnostic stability in very young children. The diagnostic changes reported in the selected studies are summarized in Table 2. AD stability exceeded 50% in every study, whereas PDD-NOS was unstable over time, with less than 45% stability in every study except Chawarska et al. (2007). Overall, 76% of the children diagnosed with AD before 36 months retained the same diagnosis, whereas only 35% of the children diagnosed with PDD-NOS did so, over a mean period of 3 years.

Of note, two of the selected studies showed stability values that differed from those of the other studies. Over a 15-month inter-assessment period, Chawarska et al. (2007) found a very high PDD-NOS stability (100%) in a very small sample (n = 6). Turner and Stone (2007) reported the lowest AD stability (53%); however, it remained higher than the PDD-NOS stability reported in the same study (30%).

Developmental Trajectories for PDD-NOS

Because diagnostic stability provides no information on the outcome of children first diagnosed with PDD-NOS who did not retain the diagnosis, we examined the diagnostic trajectories between the two assessment times. Movements within the spectrum over time are presented in Fig. 3. After combining the eight trials, we obtained 322 children with AD and 122 with PDD-NOS at Time 1. Of the 322 children with AD at Time 1, 245 remained AD (76%), 47 moved to PDD-NOS (15%) and 30 moved off the autism spectrum (9%). Of the 122 children with PDD-NOS at Time 1, 43 remained PDD-NOS (35%), 48 moved to AD (39%) and 30 moved off the autism spectrum (25%). In keeping with these findings, children with PDD-NOS appear to be distributed almost equally over time among three outcomes: (1) persistence of the same condition, (2) worsening of the phenomenology and (3) remission.
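The trajectory percentages above follow directly from the pooled counts. The short Python sketch below recomputes them as proportions of each Time 1 group, using only the counts given in the text; the dictionary layout is merely illustrative. Note that the three PDD-NOS outcome counts sum to 121 rather than 122, so the sketch divides by the stated Time 1 totals rather than by the sum of the outcomes.

```python
# Pooled Time 1 -> Time 2 counts as reported in the text (eight studies combined).
n_time1 = {"AD": 322, "PDD-NOS": 122}
transitions = {
    "AD": {"AD": 245, "PDD-NOS": 47, "non-ASD": 30},
    "PDD-NOS": {"PDD-NOS": 43, "AD": 48, "non-ASD": 30},
}

# Percentage of each Time 1 group ending in each Time 2 category.
percentages = {
    (start, end): 100 * count / n_time1[start]
    for start, outcomes in transitions.items()
    for end, count in outcomes.items()
}

for (start, end), pct in percentages.items():
    print(f"{start} -> {end}: {pct:.0f}%")
```

Running it reproduces the 76%, 15% and 9% figures for AD and the 35%, 39% and 25% figures for PDD-NOS.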

Fig. 3

Developmental trajectories within the autism spectrum. AD autistic disorder, PDD-NOS pervasive developmental disorder not otherwise specified, non ASD non autism spectrum disorder. Numbers in brackets refer to the Discussion section of the text

Discussion

We first discuss the findings related to diagnostic stability, then the developmental trajectories of PDD-NOS over time, and finally the reliability of early PDD-NOS diagnoses in children younger than 36 months.

PDD-NOS Diagnostic Stability

As reported in the Results section, the meta-analysis confirms the hypothesis that diagnostic stability is higher for AD than for PDD-NOS (pooled RR 1.95, p < 0.001). Pooled data from the selected studies indicated that an AD diagnosis was stable over time (76% stability), whereas PDD-NOS tended to be unstable (35% stability) (see Fig. 3).

A relatively small difference between AD and PDD-NOS stability was found by Turner and Stone (2007), who reported the lowest AD stability (53%). Table 2 shows that, compared with the other studies, more children with AD moved off the autism spectrum in Turner and Stone (2007) (31%). In that study, all children received speech therapy and the majority also received additional interventions (see Table 1). The children in Turner and Stone (2007) may therefore have shown greater improvement over time owing to speech therapy, which could in turn explain the lower AD stability. This interpretation is supported by Turner et al. (2006), who associated speech therapy with better outcome, and is consistent with evidence of possible benefits of early targeted intervention (Charman and Baird 2002). However, it should be treated with caution, as many children in the other studies also received interventions during the inter-assessment period.

Table 1 Descriptive characteristics of the studies included in the current meta-analysis of diagnostic stability of AD and PDD-NOS over time
Table 2 Diagnostic changes between Times 1 and 2 evaluations

Unlike the other studies, in which PDD-NOS stability was below 45%, Chawarska et al. (2007) found very high stability (100%). This result could be explained by the young age of this sample (Time 2 corresponded to age 3 years). Some studies have reported that stereotyped behaviours, resistance to change and restricted interests appear later in children’s development and are identified less consistently in 2- and 3-year-old children (Charman and Baird 2002; Sutera et al. 2007; Kleinman et al. 2008). At 3 years of age, therefore, not all symptoms are necessarily displayed, which could explain why PDD-NOS diagnoses remained stable in this study. In addition, as for the previously discussed study, methodological considerations must be taken into account. In Chawarska et al. (2007), inclusion was based on a selection process that combined information from both the ADOS-G Module 1 and the ADI-R scored with the algorithm developed for children under 2 years. This could have produced more homogeneous groups recruited with more stringent criteria, which in turn fostered stability; consequently, high stability rates were found for both AD (90%) and PDD-NOS (100%). Moreover, at Time 2 the authors did not use the ADI-R, relying on observations from the ADOS-G, and chose a more advanced module (Module 2) for 4 of the 6 PDD-NOS participants to match their progress in speech development. Finally, the short interval between the two assessments (15 months) is likely to have contributed substantially to the high stability rates.

In sum, our meta-analysis confirms the hypothesis that AD has higher diagnostic stability than PDD-NOS, which tends to be unstable over time (35% stability). Against this overall finding, the discrepancies observed in two studies (Chawarska et al. 2007; Turner and Stone 2007) could be explained mainly by methodological differences in study design. Further studies on the stability of PDD-NOS should use both the ADI-R and the ADOS at each diagnostic assessment.

Developmental Trajectories for PDD-NOS

All studies included in the meta-analysis examined diagnostic stability in very young children (Time 1 corresponded to about 2 years of age in all studies). Two studies reported short-term stability from age 2 to age 3 years (Stone et al. 1999; Chawarska et al. 2007). These two studies reported the highest PDD-NOS stabilities of all the studies (42% and 100%, respectively), as well as high AD stabilities (72% and 90%, respectively). As mentioned above, stereotyped behaviours, resistance to change and restricted interests seem to appear later in children’s development (Charman and Baird 2002; Sutera et al. 2007; Kleinman et al. 2008). Because not all symptoms are necessarily displayed at 3 years of age, this could explain why PDD-NOS stability is high in these studies.

Four studies reported stability from age 2 to age 4 years (Eaves and Ho 2004; Turner and Stone 2007; Sutera et al. 2007; Kleinman et al. 2008). These four studies reported low PDD-NOS stabilities (22%, 30%, 35% and 33%, respectively). Stereotyped behaviours, resistance to change and restricted interests seem to emerge at 4 or 5 years of age, which can lead to a change from a PDD-NOS to an AD diagnosis and could explain these low stabilities (Kleinman et al. 2008; Sutera et al. 2007). The same studies reported relatively high AD stabilities (91%, 53%, 68% and 70%, respectively).

Finally, two studies reported long-term stability from age 2 to age 9 years (Lord et al. 2006; Turner et al. 2006). These two studies reported high AD stabilities (85% and 89%, respectively) and low PDD-NOS stabilities (30% and 29%, respectively). Taken together, these findings indicate that AD tends to be a stable and reliable diagnosis in 2-year-old children, whereas PDD-NOS tends to be unstable when established before 36 months.

From a developmental perspective, Fig. 3 displays the trajectories of the PDD-NOS condition within the spectrum over time (12–84 months). In developmental terms, PDD-NOS appears to correspond to four situations (see Fig. 3): (1) a clinical construct per se, likely describing a subgroup of children with autistic features (35% retained the same diagnosis at Time 2); (2) a developmental disorder, e.g. communication delay, that is not so pervasive in some patients (25% moved off the autism spectrum at Time 2); (3) a possible evolution of AD in some children who show improvement (15% moved from AD to PDD-NOS); and (4) a provisional diagnosis pending the full expression of AD criteria (39% moved from PDD-NOS to AD). The relevance of the fourth pathway is supported by studies reporting that stereotyped behaviours, resistance to change and restricted interests appear later in children’s development; the emergence of these behaviours by Time 2 can lead to a change from a PDD-NOS to an AD diagnosis (Kleinman et al. 2008; Sutera et al. 2007). Under these assumptions, illustrated in Fig. 3, PDD-NOS can be conceived as a group of heterogeneous pathological conditions including prodromal forms of later AD, remitted or less severe forms of AD, and developmental delays in interaction and communication. Further studies on the stability of PDD-NOS should provide adequate information on the participants’ gender, IQ, and verbal and motor skills.

Reliability of PDD-NOS Diagnoses Before 36 Months

Towbin (2005) raised concerns about the validity of the PDD-NOS category because its criteria were vague and the group was heterogeneous. The DSM-IV-TR states that the PDD-NOS category “should be used when there is a severe and pervasive impairment in the development of reciprocal social interaction associated with impairment in either verbal and nonverbal communication skills, or with the presence of stereotyped behavior, interests, and activities, but the criteria are not met for a specific Pervasive Developmental Disorder” (APA 2000). Although it requires impairment in at least two of the three domains, PDD-NOS remains an exclusion category, used when the patient’s phenomenology does not meet the criteria for Asperger’s Disorder or Autistic Disorder. In the same vein, Witwer and Lecavalier (2008), who carefully examined the discriminant validity of ASD subtypes, found that the only difference between PDD-NOS and AD was symptom frequency: patients with PDD-NOS presented fewer symptoms in all three core domains than patients with AD. This finding is essentially tautological, since it follows directly from the definition of the PDD-NOS category. The authors found no external validator for the PDD-NOS category other than anxiety symptoms, and they cautioned that even this finding should be interpreted carefully because it might reflect differences in IQ between the groups studied. In addition, inter-rater agreement for PDD-NOS was the lowest among ASD diagnoses (Mahoney et al. 1998). In sum, the literature offers little support for the discriminant validity and inter-rater reliability of the PDD-NOS category.

Our meta-analysis confirms the hypothesis that AD has higher diagnostic stability than PDD-NOS, which tends to be unstable over time, with a stability rate of 35% over a mean follow-up of 3 years. Although two studies reported high short-term stability rates for PDD-NOS between 2 and 3 years of age, long-term stability studies (from age 2 to age 9) showed that PDD-NOS tends to be an unstable diagnosis in young children when established before 36 months. The literature therefore does not appear to support the discriminant and predictive validity of the PDD-NOS category in young children. This finding has clinical implications: children whose PDD-NOS diagnosis was established before 36 months should be reassessed at a later age. Given that PDD-NOS, when diagnosed in very young children, covers a wide variety of pathological conditions, including prodromal forms of later AD, remitted or less severe forms of AD, and developmental delays in interaction and communication, it should be considered at most a provisional diagnosis.

With respect to the proposed revision of the DSM-IV-TR criteria for ASD, which would merge the three subtypes, AD, AS and PDD-NOS, into a single category, Autism Spectrum Disorder (ASD), our study confirms the lack of support for reliably distinguishing PDD-NOS as a diagnostic entity. However, our study also demonstrates the heterogeneity of the PDD-NOS group, which would affect the predictive validity of the proposed DSM-V entity. The international clinical and research consensus on the robustness of AD, as currently defined by more stringent DSM criteria, would be lost, and international communication and comparison between interventions would be jeopardized.

Limitations

Several limitations of this meta-analysis should be noted. First, in most studies, the clinical diagnosis at follow-up was not independent of the initial diagnosis. Kleinman et al. (2008) suggested that a truly blind evaluation at Time 2 would be preferable because it would reduce potential bias in the diagnostic determination. However, Stone et al. (1999) evaluated the influence of having the same or a different clinician on diagnostic stability and found no significant difference between the two situations at Time 2. Further studies are needed to confirm this result. Second, general factors that affect the evolution of AD and PDD-NOS were not documented. For example, the impact of interventions on final outcome was not measured, except in Turner et al. (2006). In addition, there was not enough information to conduct subgroup meta-analyses based on gender ratio and IQ, the latter being recommended by Witwer and Lecavalier (2008).

Conclusion

Our meta-analysis of the eight longitudinal studies on PDD-NOS published from 1996 to 2009 showed that the PDD-NOS diagnosis was less stable than the AD diagnosis. When PDD-NOS was diagnosed before 36 months, the overall stability rate was 35% over a mean follow-up of 3 years. Consistent with the previous literature on the reliability of the PDD-NOS diagnosis in young children, our meta-analysis did not support the discriminant and predictive validity of this category. From a clinical standpoint, children whose PDD-NOS diagnosis was established before 36 months should therefore be reassessed at a later age. In addition, further studies on the stability of PDD-NOS should use both the ADI-R and the ADOS at each diagnostic assessment and provide adequate information on the participants’ gender, IQ, and verbal and motor skills.