Introduction

With a growing body of evidence assessing the efficacy of psychological and pharmacological treatments for anxiety in children with autism spectrum disorders (ASD; Vasa et al. 2014; Sukhodolsky et al. 2013; Rudy et al. 2013; Williams et al. 2010), there is a need to validate and optimize the measurement of treatment outcomes. The Pediatric Anxiety Rating Scale (PARS; RUPP 2002) is a clinician-administered measure of the presence and severity of anxiety symptoms over the past week for use with children and adolescents. The PARS has gained traction as a popular outcome measure following its use in a number of landmark trials with typically developing children (e.g., RUPP 2001, 2002; Walkup et al. 2008). Diagnostic measures of anxiety exist, such as the Anxiety Disorders Interview Schedule (ADIS-IV; Silverman and Albano 1996), but they assess the presence and severity of each disorder individually rather than overall severity across anxiety disorders. The PARS parallels other clinician-rated measures of overall disorder severity, such as the Children’s Yale-Brown Obsessive Compulsive Scale (CY-BOCS; Scahill et al. 1997) and the Yale Global Tic Severity Scale (YGTSS; Leckman et al. 1989), by providing a single index of anxiety severity across disorders. This makes the PARS well suited to tracking treatment progress and outcome in clinical trials as well as in routine clinical care. Guidelines delineating optimal definitions of treatment response and remission on the PARS have been developed for typically developing children (Caporino et al. 2013; Ginsburg et al. 2011); however, it is unclear whether these recommendations generalize to the measurement of treatment outcome for anxiety in children with ASD.

Psychological and pharmacological treatment studies use a variety of criteria to assess and classify treatment outcomes, typically metrics of treatment response and/or symptom remission. Accurately identifying whether an individual is responding to treatment is important for clinical decisions about whether to continue or augment treatment, as well as for classifying outcome in clinical trials. Remission is typically a more conservative outcome than response and refers to the point at which the patient no longer meets diagnostic or severity criteria for the illness on standardized measures (Steele et al. 2006; Bandelow 2006). Remission status informs decisions about continuing or augmenting treatment (especially pharmacological treatment) and about discontinuing treatment. The way response and remission are measured varies between studies, and standardizing measurement is important both to facilitate comparison between trials and to aid dissemination into clinical practice. Clinical trials often report mean differences or statistical effects on symptom measures, but this information is of little value to clinicians benchmarking an individual patient and ultimately exacerbates barriers to disseminating empirical findings. Metrics that clinicians can calculate themselves allow providers to ascertain whether a patient is experiencing benefits comparable to those seen in a clinical trial, and may assist with decisions about the likely treatment trajectory of an individual client. Treatment response and remission can be assessed via the percent reduction in symptoms or clinical cut-offs (Steele et al. 2006; Bandelow 2006) and via specific measures of improvement (e.g., Clinical Global Impression-Improvement [CGI-I]; Guy 1976).
While these metrics are used to define outcome in some clinical trials and have the most utility for benchmarking in clinical practice, the thresholds used to classify treatment outcomes often vary between studies and measures, limiting the comparability of treatment effects (Tolin et al. 2005). There is a need for guidelines, developed on validated outcome measures, that maximize the accuracy of classifying treatment outcomes in relevant populations.

The PARS has been used as an outcome measure following CBT for anxiety in pediatric ASD (Storch et al. 2013; White et al. 2013; Wood et al. 2015) and has shown adequate inter-rater and test–retest reliability, as well as good construct validity, in youth on the autism spectrum (Storch et al. 2012b; Kerns et al. 2015). Cut-offs on the PARS for differentiating youth with anxiety disorders from those without have been examined in typically developing youth and, more recently, in a small sample of youth with ASD. In a sample of typically developing children, Ginsburg et al. (2011) found optimal cut-off scores of 11.5 on the five-item PARS and 17.5 on the seven-item PARS for screening for clinical levels of anxiety. Replicating these analyses in a small ASD sample, Kerns et al. (2015) found that a cut-off of 11.5 yielded low sensitivity (0.53) despite 95 % specificity. They suggested that a lower cut-off of 7 optimally differentiated youth with ASD who had an anxiety disorder from those who did not, with 95 % sensitivity and 71 % specificity.

To examine the measurement of treatment response and remission, a signal detection study using receiver operating characteristic (ROC) procedures in 438 typically developing children with a primary diagnosis of social, separation, or generalized anxiety disorder found that a 35 % reduction in symptoms on the PARS optimally predicted treatment response, while a 50 % reduction in symptoms, or a cut-off score of 8–10, most reliably predicted symptom remission (Caporino et al. 2013). However, this study used the six-item version of the PARS from the landmark Child/Adolescent Anxiety Multimodal Treatment Study (CAMS; Walkup et al. 2008) rather than the five- or seven-item versions validated by the scale developers (RUPP 2001, 2002), limiting the comparability of these results with other studies.

While the Caporino et al. (2013) guidelines are useful for defining outcome on the PARS in typically developing children, that study excluded children with ASD. Diagnosing anxiety in youth with ASD is complex and can be confounded by symptoms related to ASD, such as the drive for routine and sameness, repetitive and ritualized behaviors, and avoidance of stimuli related to sensory sensitivities (Kerns and Kendall 2012; Kerns et al. 2014). Reductions in anxiety symptoms appear to be smaller in clinical trials for anxiety in children with ASD than in trials with typically developing children (Storch et al. 2013, 2014). These children may continue to experience residual impairment related to their developmental diagnosis despite improvements in their anxiety symptoms. As such, optimal cut-offs for identifying clinically meaningful change in anxiety symptoms among children on the autism spectrum may differ from those relevant for typically developing youth, for whom restoration of ‘normal’ functioning may be possible. As clinical trials among anxious youth with ASD continue to proliferate, there is a need for standardized criteria defining response and remission on commonly used anxiety measures in this population, including the PARS.

Following the procedures used to develop guidelines for defining treatment response and remission on clinician-administered measures in other populations, including adult OCD (Tolin et al. 2005; Farris et al. 2013; Lewin et al. 2011), pediatric OCD (Storch et al. 2010), tic disorders (Jeon et al. 2013; Storch et al. 2011), and typically developing children with anxiety assessed with the PARS (Caporino et al. 2013), this study aimed to define optimal PARS criteria for clinical response and remission of anxiety in youth with ASD, benchmarked against gold-standard outcome measures. Remission was defined as no longer meeting diagnostic criteria for a current primary anxiety disorder on the ADIS-IV and scoring mildly ill or better (≤3) on the Clinical Global Impression Scale-Severity (CGI-S). Response was defined as an improvement rating of much improved or very much improved (1 or 2) on the Clinical Global Impression Scale-Improvement (CGI-I). Both the percent reduction in symptoms and clinical cut-off scores were examined, given that these metrics help standardize criteria in research trials as well as in clinical practice.

Method

Participants

Participants were 108 children with ASD and their parents, recruited at one site across four treatment outcome studies of CBT for anxiety in youth with ASD (Storch et al. 2013; Wood et al. 2015; Ehrenreich-May et al. 2014; Storch et al. 2014; Lewin and Storch 2014). One study recruited children (aged 7–11 years; Storch et al. 2013), one recruited adolescents (aged 11–16 years; Storch et al. 2014), one recruited young adolescents (aged 11–14 years; Ehrenreich-May et al. 2014; Wood et al. 2015), and the final study recruited youth across this age range (6–17 years; Lewin and Storch 2014). In all studies, participants were required to have an anxiety disorder diagnosis and a diagnosis of Autism, Asperger syndrome (AS), or PDD-NOS based on the Autism Diagnostic Interview-Revised (ADI-R; Lord et al. 1994) and/or the Autism Diagnostic Observation Schedule (ADOS; Lord et al. 1999). Participants were excluded if they were actively suicidal, required a high level of care (inpatient), had recently initiated or changed antidepressant medication, or had significant cognitive impairment (IQ < 70). Youth in this study were aged 6–17 years (M = 10.97, SD = 2.29; 39.8 % aged over 11 years, 29.6 % aged over 12 years, and 13.9 % aged over 13 years) and were recruited through a specialty pediatric neuropsychiatry clinic. Participant demographics and diagnostic information are provided in Table 1.

Table 1 Sample demographics

Treatment

Three studies used the same family-based CBT manual for anxiety in children with ASD, the Behavioral Interventions for Anxiety in Children with Autism (BIACA) treatment (Wood and Drahota 2005). BIACA is a modular treatment implemented flexibly according to clinical need and a treatment algorithm. It includes traditional CBT components for anxiety (e.g., graded exposure, parent training) as well as ASD-specific components (e.g., social skills training). Treatment consisted of 16 sessions of up to 90 min, with at least eight sessions devoted to in vivo exposure to feared stimuli; developmentally appropriate adaptations were used for adolescents. The fourth study used a primarily exposure-based treatment for anxiety with heavy parental involvement, consisting of 12 sessions of up to 90 min. The first session covered psychoeducation about anxiety and hierarchy generation, and the remaining 11 sessions consisted of exposure tasks.

Measures

PARS (RUPP 2002)

The PARS is a clinician-administered interview that assesses overall anxiety severity over the past week. The presence of 50 anxiety symptoms is rated yes/no in separate interviews with the child and parent, and the severity, distress, and impairment associated with these symptoms are rated on six-point scales, with higher scores indicating greater severity. Clinician ratings are based on the combined parent and child reports; where reports were discrepant, preference was given to the parent report. The five-item version is recommended for use in clinical trials and comprises items assessing the frequency of anxiety symptoms, distress, avoidance, anxiety-related interference at home, and interference outside the home (e.g., at school, with peers). It excludes the symptom-count item and the item assessing physical symptoms, given the potential overlap of physical symptoms with SSRI side-effects in pediatric samples (RUPP 2002).
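As an illustration of the scoring just described, the Python sketch below sums a five-item PARS total. The item keys are hypothetical labels chosen for this example, and the 0–5 range per item is an assumption consistent with the six-point rating scale described above; under that assumption the five-item total ranges from 0 to 25.

```python
# Hypothetical keys for the seven PARS severity items (order not significant),
# each assumed to be rated 0-5 on the six-point scale.
SEVEN_ITEMS = (
    "number_of_symptoms",
    "frequency",
    "distress",
    "physical_symptoms",
    "avoidance",
    "interference_home",
    "interference_out_of_home",
)
# Items dropped when computing the five-item total.
EXCLUDED_FROM_FIVE_ITEM = frozenset({"number_of_symptoms", "physical_symptoms"})


def pars_total(ratings: dict[str, int], five_item: bool = True) -> int:
    """Sum PARS severity ratings (five-item range 0-25, seven-item 0-35)."""
    total = 0
    for key in SEVEN_ITEMS:
        if five_item and key in EXCLUDED_FROM_FIVE_ITEM:
            continue
        score = ratings[key]
        if not 0 <= score <= 5:
            raise ValueError(f"{key} must be rated 0-5, got {score}")
        total += score
    return total


# Hypothetical ratings for one child.
example = {
    "number_of_symptoms": 4, "frequency": 3, "distress": 3,
    "physical_symptoms": 4, "avoidance": 3,
    "interference_home": 3, "interference_out_of_home": 3,
}
five = pars_total(example)                    # 15
seven = pars_total(example, five_item=False)  # 23
```

The two excluded items contribute 8 points to the seven-item total here, which is one reason cut-offs derived on one version cannot be transferred directly to another.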

ADIS-IV (Silverman and Albano 1996)

The ADIS-IV is the gold-standard clinician-administered interview for diagnosing anxiety and related disorders. Interviews are conducted separately with the parent and child, and final clinician diagnoses are based on the combined reports. Where reports were discrepant, preference was given to the parent report, given strong agreement between parent and clinician reports and poor diagnostic agreement between youth reports and clinician and parent ratings (Storch et al. 2012a). The ADIS-IV assesses the presence of individual disorders based on DSM-IV-TR criteria (American Psychiatric Association 2000) and provides a severity rating for each disorder on a 0–8 scale, with scores ≥4 indicating that full diagnostic criteria are met. The ADIS-IV demonstrates good inter-rater reliability in children and adolescents with high-functioning ASD (Ung et al. 2014) and is regularly used in treatment trials for anxiety in youth with ASD (e.g., Storch et al. 2013, 2014; Reaven et al. 2012; Chalfant et al. 2007; Ehrenreich-May et al. 2014; McNally Keehn et al. 2013).

Autism Diagnostic Interview-Revised (ADI-R; Lord et al. 1994)

The ADI-R is a clinician-administered semi-structured interview that assesses the presence and severity of ASD based on DSM-IV-TR criteria (American Psychiatric Association 2000). The ADI-R was administered to the primary caregiver at baseline by a certified doctoral-level assessor to confirm the presence of ASD.

Autism Diagnostic Observation Schedule: Module 3 or 4 (ADOS; Lord et al. 1999)

The ADOS is a structured observational assessment often used in conjunction with the ADI-R. It consists of structured and unstructured tasks administered to the child to elicit social interaction skills, identify stereotyped behaviors, and assess atypical language use. Module 3 was used with verbally fluent children and Module 4 with verbally fluent adolescents. The ADOS was administered by the same certified doctoral-level assessor who administered the ADI-R.

CGI-S/I (Guy 1976)

The CGI-S is a clinician-rated single-item measure of overall symptom severity rated on a 7-point scale where 1 = normal, not at all ill and 7 = extremely ill. The CGI-I is a one-item measure of treatment-related improvement in symptoms rated by clinicians on a 7-point scale where 1 = very much improved to 7 = very much worse. The CGI scales are widely used in pediatric anxiety studies with typically developing children and children with ASD, and display sensitivity to treatment effects and good predictive validity (Storch et al. 2013; White et al. 2013; Wood et al. 2015; Walkup et al. 2008; Caporino et al. 2013; RUPP 2002).

Procedure

Participants completed an assessment before and after treatment. For participants randomized to a waitlist or treatment-as-usual condition, data from the assessments immediately before and after their entry into active treatment were used. Assessments were conducted by assessors who had completed standardized training, including didactic instruction, video rating training, and supervised administration, and who were supervised on diagnostic and rating measures by an experienced licensed clinical psychologist. Inter-rater reliability on the PARS was acceptable across the four trials (ICC = 0.79–1.0). Therapists were doctoral-level clinical psychology students or post-doctoral psychologists who completed standardized training consisting of didactic instruction, guided reading, observation, in vivo supervision, and ongoing supervision with an experienced licensed clinical psychologist. Quality assurance measures were used in each trial.

Data Analysis

Consistent with previous methodology (e.g., Caporino et al. 2013; Storch et al. 2010, 2011; Tolin et al. 2005; Lewin et al. 2011), ROC analyses derived from signal detection theory were used to identify PARS cut-offs for treatment response and remission against gold-standard criteria (ADIS-IV and CGI). Remission was defined as loss of the primary anxiety diagnosis on the ADIS-IV together with mild or better symptoms on the CGI-S (≤3). Response was defined as a CGI-I rating of much or very much improved (1 or 2), and the optimal percent reduction in symptoms for identifying response was assessed. Table 2 shows the contingency table from which the fit statistics were derived.
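The gold-standard classifications amount to two simple rules. The Python sketch below is a minimal illustration, assuming that "loss of primary diagnosis" means the ADIS-IV clinician severity rating for the baseline primary disorder falls below the diagnostic threshold of 4; the record type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PostTreatment:
    """Hypothetical post-treatment record for one participant."""
    primary_csr: int  # ADIS-IV severity (0-8) of the baseline primary anxiety disorder
    cgi_s: int        # CGI-Severity: 1 = normal ... 7 = extremely ill
    cgi_i: int        # CGI-Improvement: 1 = very much improved ... 7 = very much worse


def remission(r: PostTreatment) -> bool:
    # Primary diagnosis lost (severity below the >= 4 diagnostic threshold)
    # AND rated mildly ill or better on the CGI-S.
    return r.primary_csr < 4 and r.cgi_s <= 3


def response(r: PostTreatment) -> bool:
    # Much improved (2) or very much improved (1) on the CGI-I.
    return r.cgi_i <= 2


remitted = PostTreatment(primary_csr=3, cgi_s=3, cgi_i=2)
responder_only = PostTreatment(primary_csr=4, cgi_s=3, cgi_i=2)
```

The second record shows why remission is the more conservative outcome: a participant can be much improved on the CGI-I while still meeting full criteria for the primary disorder.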

Table 2 Contingency table for calculating test fit statistics based on gold standard criteria and test cut-offs on the Pediatric Anxiety Rating Scale (PARS)

Six metrics were used to assess optimal cut-offs. Sensitivity (the true positive rate) is the proportion of participants meeting remission/response criteria on gold-standard measures who are correctly identified by the test cut-off (True Positive/[True Positive + False Negative]). Specificity (the true negative rate) is the proportion of participants not meeting remission/response criteria on gold-standard measures who are correctly identified by the cut-off as not meeting criteria (True Negative/[False Positive + True Negative]). The Positive Predictive Value (PPV) is the proportion of participants classified as meeting criteria by the cut-off who also meet gold-standard criteria for remission/response (True Positive/[True Positive + False Positive]). The Negative Predictive Value (NPV) is the proportion of participants classified as not meeting criteria by the cut-off who also do not meet gold-standard criteria (True Negative/[True Negative + False Negative]). Efficiency is the simple agreement between the cut-off and gold-standard criteria ([True Positive + True Negative]/N). Cohen’s kappa is a measure of agreement between the gold-standard classification and the PARS cut-off that corrects for chance agreement; it was included given the limitations of simple agreement (Kraemer et al. 2012) and the theory underlying quality ROC statistics (QROC; Kraemer et al. 2002). Cohen’s kappa can provide an improved measure of efficiency when there is error in gold-standard measurement (e.g., the CGI). These statistics were evaluated for percent reductions in symptoms on the PARS in 5 % increments for response and remission criteria, and for absolute post-treatment cut-off scores for identifying remission.
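The six metrics follow directly from the 2 × 2 contingency table. The Python sketch below computes them. The counts in the usage example are not the study's data. They are approximate counts back-calculated to be consistent with the rounded values reported for the 15 % response cut-off (sensitivity 0.86, specificity 0.72, PPV 0.89, NPV 0.66; 73.1 % responders among 108 participants).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table2x2:
    tp: int  # meets gold standard, flagged by the PARS cut-off
    fp: int  # does not meet gold standard, flagged by the cut-off
    fn: int  # meets gold standard, not flagged
    tn: int  # does not meet gold standard, not flagged


def fit_statistics(t: Table2x2) -> dict[str, float]:
    n = t.tp + t.fp + t.fn + t.tn
    efficiency = (t.tp + t.tn) / n  # simple (observed) agreement
    # Agreement expected by chance, from the marginal totals of the table.
    p_test = (t.tp + t.fp) / n
    p_gold = (t.tp + t.fn) / n
    chance = p_test * p_gold + (1 - p_test) * (1 - p_gold)
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.fp + t.tn),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "efficiency": efficiency,
        "kappa": (efficiency - chance) / (1 - chance),  # Cohen's kappa
    }


# Illustrative counts back-calculated from the rounded 15 % response results
# (N = 108, 79 responders); not the study's actual table.
stats = fit_statistics(Table2x2(tp=68, fp=8, fn=11, tn=21))
rounded = {k: round(v, 2) for k, v in stats.items()}
```

For these illustrative counts, efficiency is about 0.82 but kappa only about 0.57, showing how chance correction lowers agreement when most participants fall in one gold-standard category.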

While there are a variety of ways to select the optimal cut-off, primary emphasis was given to the highest Cohen’s kappa, because this statistic maximizes agreement while accounting for chance agreement between the score and gold-standard criteria. Because these cut-offs are likely to be used when deciding whether to augment treatment for non-response or discontinue treatment after remission, secondary criteria were also set for the response and remission analyses. For treatment response, a negative test result implies that treatment may be augmented (e.g., with adjunct pharmacological therapy); a false negative may therefore unnecessarily increase the treatment burden. As such, high sensitivity was prioritized as the secondary criterion. For remission, a positive test result implies that treatment may be discontinued; a false positive may therefore increase the risk of premature discontinuation and relapse, so high specificity was the secondary criterion.
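The selection rule can be sketched as a sweep over percent-reduction cut-offs in 5 % steps, followed by choosing the highest kappa and, among near-ties, the cut-off with the better secondary metric. The tolerance defining a near-tie (0.03 here) is a hypothetical parameter introduced for illustration; the text describes the secondary criterion qualitatively, not as a fixed tolerance. All data values below are hypothetical.

```python
from typing import NamedTuple


class Row(NamedTuple):
    threshold: float
    kappa: float
    sensitivity: float
    specificity: float


def agreement(pred: list[bool], gold: list[bool]) -> tuple[float, float, float]:
    """Return (kappa, sensitivity, specificity) for one dichotomized cut-off."""
    pairs = list(zip(pred, gold))
    tp = sum(p and g for p, g in pairs)
    fp = sum(p and not g for p, g in pairs)
    fn = sum(g and not p for p, g in pairs)
    tn = sum(not p and not g for p, g in pairs)
    n = len(pairs)
    observed = (tp + tn) / n
    p_pred, p_gold = (tp + fp) / n, (tp + fn) / n
    chance = p_pred * p_gold + (1 - p_pred) * (1 - p_gold)
    kappa = (observed - chance) / (1 - chance) if chance < 1 else 0.0
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return kappa, sens, spec


def sweep(pct_reduction: list[float], gold: list[bool],
          step: int = 5, max_pct: int = 70) -> list[Row]:
    """Evaluate 'percent reduction >= t' for t = step, 2*step, ..., max_pct."""
    rows = []
    for t in range(step, max_pct + 1, step):
        pred = [p >= t for p in pct_reduction]
        rows.append(Row(t, *agreement(pred, gold)))
    return rows


def choose(rows: list[Row], secondary: str, tolerance: float = 0.03) -> Row:
    """Highest kappa wins; among rows within `tolerance` (hypothetical) of the
    best kappa, prefer the higher secondary metric ('sensitivity' or 'specificity')."""
    best_kappa = max(r.kappa for r in rows)
    near_best = [r for r in rows if best_kappa - r.kappa <= tolerance]
    return max(near_best, key=lambda r: (getattr(r, secondary), r.kappa))


# Hypothetical percent reductions; here the gold standard happens to equal ">= 40 %".
pct = [10, 20, 35, 45, 50, 60, 5, 25, 40, 15]
gold = [p >= 40 for p in pct]
best = choose(sweep(pct, gold), secondary="specificity")

# Two hypothetical rows with nearly equal kappa: the secondary metric decides.
rows = [Row(30, 0.52, 0.90, 0.68), Row(40, 0.50, 0.75, 0.80)]
```

With specificity as the secondary metric (remission) the near-tie resolves to the 40 % row; with sensitivity (response) it resolves to the 30 % row.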

Results

Descriptive Statistics

First, the four study samples were compared on demographic and treatment outcome information. There were no significant differences between the samples in gender (F(3, 104) = 0.58, p = 0.629) or ASD diagnosis (χ2 = 2.86, p = 0.826). There was a significant difference between the samples in age (F(3, 104) = 33.66, p < 0.001). Unsurprisingly, post hoc Tukey tests indicated that the mean ages of the adolescent study (M = 12.73, SD = 1.36) and young adolescent study (M = 12.35, SD = 1.23) samples were significantly higher than those of the child study (M = 9.12, SD = 1.30) and child/adolescent study (M = 9.95, SD = 2.66) samples (all p’s < 0.001).

Given the heterogeneity in age, ASD diagnosis, and anxiety diagnosis, we conducted further analyses to examine whether sample heterogeneity could influence the clinical utility of these findings. Regarding age, there were no significant age differences between participants who did and did not meet criteria for response (t(106) = 0.65, p = 0.52) or remission (t(106) = −0.46, p = 0.648). A one-way ANOVA indicated no significant differences in age by ASD diagnosis (Autistic disorder, Asperger’s disorder, or PDD-NOS), F(2, 105) = 1.40, p = 0.252. Age was not significantly correlated with either PARS outcome measure: percent reduction on the PARS (r = 0.02, p = 0.848) or post-treatment PARS score (r = 0.007, p = 0.945). Regarding ASD diagnosis, one-way ANOVAs indicated no differences by ASD diagnosis in percent reduction on the PARS (F(2, 105) = 0.55, p = 0.58) or post-treatment PARS score (F(2, 105) = 0.33, p = 0.720). ASD diagnosis also did not differ significantly by response status (χ2(2) = 1.57, p = 0.457) or remission status (χ2(2) = 0.40, p = 0.818). Regarding anxiety diagnoses, most participants (86.1 %) were diagnosed with more than one anxiety disorder, limiting the utility of separate signal detection analyses by individual disorder profile. Given these results, the samples were combined to increase the statistical power, precision, and generalizability of the signal detection analyses.

There was an average 30.7 % (SD = 26.0) reduction in symptoms on the PARS from pre-treatment (M = 15.44, SD = 2.72) to post-treatment (M = 10.66, SD = 4.21; t(107) = 12.62, p < 0.001, d = 1.21). Symptom severity on the CGI-S also decreased significantly from baseline (M = 3.57, SD = 0.78) to post-treatment (M = 2.66, SD = 0.79; t(107) = 10.49, p < 0.001, d = 1.00). Of the total sample, 73.1 % met criteria for treatment response on the CGI-I and 32.4 % met the more conservative remission criteria. Requiring both remission components was slightly more conservative than using either measure alone: 38 % met remission criteria on the CGI-S alone and 46.7 % on the ADIS-IV alone.
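Percent reduction is computed per participant as 100 × (pre − post)/pre. Because the sample figure is a mean of individual percentages, it need not equal the percent reduction computed from the group means: applying the formula to the pre- and post-treatment means above gives about 31.0 % rather than 30.7 %. The sketch below shows the same effect with two hypothetical children (not study data).

```python
from statistics import mean


def percent_reduction(pre: float, post: float) -> float:
    """Percent reduction in PARS total from pre- to post-treatment."""
    if pre <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (pre - post) / pre


# Two hypothetical children: (pre, post) = (20, 10) and (10, 9).
pre_scores = [20.0, 10.0]
post_scores = [10.0, 9.0]
individual = [percent_reduction(a, b) for a, b in zip(pre_scores, post_scores)]
mean_of_individual = mean(individual)  # (50 + 10) / 2 = 30.0
# Ratio of the means instead: (15 - 9.5) / 15, about 36.7 %.
from_group_means = percent_reduction(mean(pre_scores), mean(post_scores))
```

The child with the higher baseline dominates the ratio of means, which is why the per-participant average is the relevant figure for benchmarking individual patients.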

Prediction of Treatment Response Using Percent Symptom Reduction on the PARS

We conducted ROC analyses to examine the ability of the PARS to predict response and remission criteria. Figure 1 shows the ROC curve for treatment response.
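The AUC reported for each ROC curve has a convenient interpretation: it equals the probability that a randomly chosen participant who meets the gold-standard criterion has a larger percent reduction than a randomly chosen participant who does not, with ties counted as one half (the Mann–Whitney formulation). The Python sketch below computes it by brute-force pairwise comparison, which is adequate for samples of this size; the function name and example values are illustrative only.

```python
def auc(scores: list[float], gold: list[bool]) -> float:
    """Area under the ROC curve via pairwise (Mann-Whitney) comparisons."""
    pos = [s for s, g in zip(scores, gold) if g]
    neg = [s for s, g in zip(scores, gold) if not g]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(pos) * len(neg))


# Hypothetical percent reductions and gold-standard response status.
example_auc = auc([10, 30, 20, 40], [False, False, True, True])  # 3 of 4 pairs concordant
```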

Fig. 1

ROC curve for the percent reduction in anxiety severity on the Pediatric Anxiety Rating Scale (PARS) to predict treatment response (based on Clinical Global Impression-Improvement ratings of much improved or very much improved)

The area under the curve (AUC) was 0.876. Table 3 presents the six metrics for predicting treatment response from the percent reduction of symptoms on the PARS at 5 % increments (up to 70 % reduction). Optimal agreement with treatment response criteria was reached at a 15 % reduction on the PARS, with a sensitivity of 0.86 and specificity of 0.72. The positive and negative predictive values were 0.89 and 0.66, respectively.

Table 3 Prediction of clinical response based on Clinical Global Impression-Improvement (CGI-I) scale ratings (much improved or very much improved) using percent reduction of symptoms on the Pediatric Anxiety Rating Scale (PARS)

Predicting Remission Using PARS Percent Reduction

Two measures of change on the PARS were examined in relation to remission criteria: percent reduction in symptoms, and absolute value cut-offs. Figure 2 shows the ROC curve for the percent reduction in symptoms on the PARS (AUC = 0.810).

Fig. 2

ROC curve to predict symptom remission (based on loss of primary diagnosis on the anxiety disorders interview schedule and Clinical Global Impression-Severity ratings of normal to mild symptoms) based on the percent reduction in symptoms on the Pediatric Anxiety Rating Scale (PARS) and post-treatment Pediatric Anxiety Rating Scale (PARS) score

As shown in Table 4, optimal agreement with remission criteria was reached at a 30 % reduction (k = 0.52). However, when specificity was incorporated as the secondary criterion, a 40 % reduction had a very similar kappa value (0.50) but higher specificity (0.79 for 40 % reduction compared to 0.70 for 30 % reduction) and a better PPV (0.63 for 40 % reduction compared to 0.59 for 30 % reduction). In addition, a 40 % reduction had the highest efficiency, that is, simple agreement with the gold-standard criteria for remission (0.77). Sensitivity decreased (0.88 for 30 % reduction compared to 0.74 for 40 % reduction); however, this index was given lower priority for remission than for response. At this cut-off, the PPV suggests that 63 % of those classified as remitted by the cut-off are likely to reflect true remission (with 37 % likely to be false positives), while the NPV of 0.86 suggests that only 14 % of those classified as not remitted are likely to be false negatives.

Table 4 Prediction of clinical remission based on loss of primary diagnosis on the anxiety disorders interview schedule and Clinical Global Impression-Severity (CGI-S) scale ratings (normal to mild symptoms) using percent reduction of symptoms on the Pediatric Anxiety Rating Scale (PARS)

Predicting Remission Using PARS Total Scores

The second metric often used to classify remission is an absolute cut-off score at post-treatment. ROC analysis assumes that higher test scores reflect a greater probability of a positive result. For the post-treatment PARS score, however, lower scores indicate a greater probability of a positive result (remission). To preserve the expected direction of the relationship, post-treatment PARS scores were reversed when producing the ROC graph. Figure 2 shows the ROC curve (AUC = 0.852) for predicting remission status from the post-treatment PARS score. A post-treatment score of 10 or below achieved optimal agreement with remission criteria, with a sensitivity of 0.97 and specificity of 0.72 (see Table 5). The positive predictive value was 0.62 and the negative predictive value was 0.98.
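The reversal step and the resulting classification rule can be illustrated as follows. Negating the scores is one way to reverse their direction (any strictly decreasing transformation yields the same ROC curve); the function names and example scores are hypothetical, and the cut-off of 10 is the value reported in this study.

```python
def reverse_for_roc(post_scores: list[float]) -> list[float]:
    # Negate so that higher values indicate greater likelihood of remission,
    # the direction ROC procedures assume.
    return [-s for s in post_scores]


def predicted_remission(post_scores: list[float], cutoff: float = 10) -> list[bool]:
    # A post-treatment PARS score at or below the cut-off is classified as remitted.
    return [s <= cutoff for s in post_scores]


post = [4, 10, 11, 18]  # hypothetical post-treatment five-item PARS scores
flags = predicted_remission(post)        # [True, True, False, False]
reversed_scores = reverse_for_roc(post)  # [-4, -10, -11, -18]
# On the reversed scale the same rule reads "reversed score >= -cutoff".
same = [r >= -10 for r in reversed_scores] == flags
```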

Table 5 Prediction of clinical remission based on loss of primary diagnosis on the anxiety disorders interview schedule and Clinical Global Impression-Severity (CGI-S) scale ratings (normal to mild symptoms) using post-treatment cut-off scores on the Pediatric Anxiety Rating Scale (PARS)

Comparison of Response and Remission Cut-offs

For visual comparison, Fig. 3 plots the kappa statistic for agreement with response criteria and with remission criteria across the series of PARS percent-reduction cut-offs. The graph shows the highest agreement with response criteria at a 15 % reduction and the highest agreement with remission criteria at a 30 or 40 % reduction. Based on the secondary criterion described above, a 40 % reduction was considered the optimal criterion for identifying remission.

Fig. 3

Comparison of the quality index of agreement (k) for percent reduction in anxiety symptoms on the Pediatric Anxiety Rating Scale (PARS) cut-offs predicting treatment response (per Clinical Global Impression [CGI]–Improvement) and remission (per CGI-severity and anxiety disorders interview schedule) for children with autism spectrum disorders

Discussion

The optimal way to define and measure treatment response in the treatment of comorbid anxiety and ASD remains unclear and varies between measures and studies. Researchers and clinicians face a number of important decisions when choosing outcome measures, including selecting appropriate measures and identifying guidelines for assessing outcome in the relevant population. The PARS is a promising measure of anxiety severity that has been increasingly used in clinical trials in recent years, including treatment trials for children with ASD (Storch et al. 2013; White et al. 2013; Wood et al. 2015; Storch et al. 2014). In line with recent research identifying optimal guidelines for defining treatment response and remission on the PARS in typically developing children, this study aimed to identify optimal guidelines for defining treatment response and remission for anxiety in youth with ASD. Results suggest that a 15 % reduction in symptoms was optimal for identifying those who had responded to treatment, lower than the 35 % identified in typically developing anxious youth (Caporino et al. 2013). The percent reduction in symptoms is often used to judge whether treatment is benefitting an individual patient, and informs decisions about whether continued treatment or treatment augmentation is warranted. These results suggest that even small reductions in anxiety symptoms are likely to indicate clinically meaningful improvement in the context of ASD.

The percent reduction in symptoms is also often used in psychological and pharmacological treatment to identify when treatment may be augmented, titrated, or discontinued following remission of symptoms. This requires balancing accurate identification of those who meet remission criteria, to avoid over-servicing clients, against the risk of prematurely discontinuing treatment, which may increase the risk of relapse. As with treatment response, the optimal percent reduction in symptoms for identifying remission of anxiety was lower in children with ASD than in typically developing youth (40 % compared to 50 %; Caporino et al. 2013). Typically developing children may return to relatively normal levels of functioning following treatment; individuals with ASD, however, are likely to continue to have residual symptoms and impairments related to their ASD diagnosis despite remission of their anxiety. Together with the complexities of differential diagnosis of anxiety in the context of ASD (Kerns and Kendall 2012; Kerns et al. 2014), these results suggest that small changes in anxiety symptoms in youth with ASD are likely to be clinically significant. For example, being able to get to bed on time following remission of bedtime fears and worries is likely to have a significant impact on both family and child functioning. Similarly, although these children will still have social impairments, being able to greet others or respond to their questions and interactions is likely to have a meaningful impact across a variety of contexts.

The percent reduction in scores has the benefit of measuring individual change; however, it is influenced by baseline severity, so individuals with lower baseline severity may experience a floor effect with this metric. Cut-off scores are an alternative metric of treatment outcome that use a threshold to identify those who are likely to no longer meet clinical caseness. Results suggest that a post-treatment score at or below 10 was most strongly related to anxiety remission in youth with ASD, similar to the 8–10 range identified for typically developing children (Caporino et al. 2013). Thus, while the reduction in anxiety symptoms needed to correspond with remission and response criteria appears to be smaller for children with ASD than for typically developing children, there appears to be a threshold of symptoms on the PARS, indicating the presence of an anxiety diagnosis, that is consistent across developmental status.
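A short worked example shows how the two metrics can disagree because of baseline severity. It applies this study's thresholds (a 40 % reduction and a post-treatment score of 10 or below) to two hypothetical children with different baseline scores; the scores are invented for illustration.

```python
REMISSION_PCT = 40.0   # percent-reduction threshold for remission reported here
REMISSION_CUTOFF = 10  # post-treatment score threshold reported here

children = {"A": (12, 9), "B": (22, 13)}  # hypothetical (pre, post) PARS scores

results = {}
for name, (pre, post) in children.items():
    pct = 100.0 * (pre - post) / pre
    results[name] = {
        "pct_reduction": pct,
        "remitted_by_pct": pct >= REMISSION_PCT,
        "remitted_by_cutoff": post <= REMISSION_CUTOFF,
    }
```

Child A, with a mild baseline, falls below the cut-off after only a 25 % reduction, whereas child B improves by about 41 % but remains above the cut-off.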

This study has a number of strengths. While most psychometric and clinical studies of pediatric anxiety disorders exclude youth with ASD, the inclusion of youth with comorbid anxiety and ASD in this study increases the generalizability of the results to the complex presentations seen in clinical settings, where ASD diagnoses are commonplace. Despite these strengths, some limitations warrant consideration. Firstly, although use of the five-item PARS is consistent with the recommendations of the scale developers (RUPP 2001, 2002) and with previous clinical trials in ASD populations (Storch et al. 2013; Wood et al. 2015), it limits comparability with studies of other ASD samples (White et al. 2013) and of typically developing youth that have used the six-item scale (e.g., Caporino et al. 2013; Walkup et al. 2008). There is a clear need for further research comparing cut-offs on the five-item scale in typically developing children with anxiety. Secondly, in comparison to previous studies assessing cut-off values on clinician-rated measures (e.g., Caporino et al. 2013; Lewin et al. 2011; Storch et al. 2010, 2011; Tolin et al. 2005), even the ‘best’ cut-off values still had relatively low agreement (κ ≤ 0.60) with response and remission criteria. This is likely to be a result of the low internal consistency of the PARS at baseline (α = 0.56), which increases the variability of the measure, although internal consistency improved at post-treatment (α = 0.83). Low internal consistency is a recurrent finding with the PARS, with the authors of these studies suggesting that this is because the items are related but not redundant (RUPP 2002; Storch et al. 2012b). The lower agreement and increased variability in scores may also reflect the complexities involved in assessing anxiety symptoms that overlap with ASD symptomatology. Thirdly, the same assessor administered the PARS and the gold-standard measures during the assessment.
While it is possible that severity ratings on the PARS were influenced by knowledge of diagnostic status, this is more likely to have affected the absolute cut-off scores than the percent-reduction cut-offs, given that assessors were blind to scores from the previous assessment. Fourthly, although the age range and the inclusion of participants with varied ASD diagnoses increase the clinical utility and generalizability of these results, youth with a co-occurring intellectual disability were not included, and the results are unlikely to generalize to this population. Finally, the two metrics presented here as measures of change (percent reduction in symptoms and cut-off scores) each have advantages and disadvantages. Although the percent reduction in symptoms accounts for individual change, it is influenced by baseline symptom severity, and individuals with lower baseline severity may be constrained by a floor effect. Cut-off scores, by contrast, provide no information about individual change and may be considered a blunter measure. Researchers and clinicians are advised to weigh the strengths and limitations of each metric when selecting outcome criteria for their particular needs.
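Assuming the agreement statistic reported among the limitations above is Cohen's kappa (κ), it corrects raw percent agreement for the agreement expected by chance from each classification's base rates. The sketch below computes κ for two binary classifications, such as remission flagged by a cut-off versus remission on the criterion measure. The function name and the example data are hypothetical, chosen only to show how 80 % raw agreement can correspond to a κ below 0.60.

```python
from typing import Sequence


def cohens_kappa(rater_a: Sequence[bool], rater_b: Sequence[bool]) -> float:
    """Cohen's kappa for two binary classifications of the same individuals."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of agreement.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each classification's marginal rates.
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Hypothetical classifications of ten children (not study data).
cutoff_remitted = [True, True, True, False, False, False, False, True, False, False]
criterion_remitted = [True, True, False, False, False, True, False, True, False, False]

print(round(cohens_kappa(cutoff_remitted, criterion_remitted), 2))
```

Here the two classifications agree for 8 of 10 children, but because 52 % agreement would be expected by chance, κ is about 0.58. In practice, `cohen_kappa_score` in scikit-learn's `sklearn.metrics` module computes the same statistic.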

The PARS is a promising measure of anxiety severity across disorders and, given its brevity relative to other clinician-administered diagnostic measures, has considerable utility in research and clinical settings. While guidelines exist for optimal cut-offs on the PARS for identifying clinical remission and treatment response, their applicability to youth with ASD had not been established. Results suggest optimal cut-offs of 15 % and 40 % reduction in symptoms for identifying response and remission, respectively, or a post-treatment score at or below 10 for identifying remission. The percent-reduction values are lower than those recommended for typically developing children, suggesting that even small changes in anxiety in youth with ASD are likely to have notable and clinically meaningful effects on functioning, whereas the absolute cut-off values at post-treatment appear to be similar in youth with ASD and typically developing youth. These results have implications for the standardization of response and remission criteria in treatment trials and for the benchmarking of outcomes in clinical practice.