Introduction

Flexibility is a complex construct encompassing a range of interrelated characteristics, behaviors and cognition (Dajani and Uddin 2015). Conceptualized as a key component of executive functioning (Chan et al. 2008; Diamond 2013), flexibility includes: easily and efficiently switching between tasks (Schmitz and Voss 2012); shifting attention to different features within a paradigm (Hedden and Gabrieli 2010; Wagner et al. 2004); adapting responses/learning based on rewards and punishments (Cools et al. 2002; Hornak et al. 2004; Votinov et al. 2015), etc. Reduced flexibility is associated with a number of conditions including traumatic brain injury (Busch et al. 2005; Pang et al. 2016; Whiting et al. 2015), obsessive–compulsive disorder (Francazio and Flessner 2015; Morein-Zamir et al. 2016; Zhang et al. 2015), Prader-Willi syndrome (Benarroch et al. 2007; Woodcock et al. 2009), anxiety disorders (Arlt et al. 2016; Lawson et al. 2015), depression (Nolen-Hoeksema 2000; Stange et al. 2016) and autism spectrum disorders (ASD; Blijd-Hoogewys et al. 2014; Gioia et al. 2002; Rosenthal et al. 2013). Flexibility skills are related to outcomes in childhood and adulthood (e.g., Engel de Abreu et al. 2014; Genet and Siemer 2011); for example, flexibility is related to and predictive of adaptive skills in ASD (Gilotty et al. 2002; Pugliese et al. 2015, 2016). Improving flexibility skills can impact other key abilities. For example, cognitive behavioral intervention targeting flexibility and planning in children with ASD improved social skills as well as executive functioning (Kenworthy et al. 2014).

Flexibility has both cognitive and behavioral components, and distinctions between the cognitive and behavioral are not always absolute. For example, less flexible thinking can lead to behavioral rigidity when expectations are violated. Consider 12-year-old Daniel building a model airplane. When called by his father to come take the dog for a walk, Daniel struggles to respond flexibly, announcing that he will forget what step he’s on and never be able to finish the model. These “stuck” thoughts lead to a rigid interchange with his father. Ultimately, Daniel comes downstairs, but rigidly refuses to walk the dog and has a meltdown. In this example, as in many scenarios from daily life, both cognitive and behavioral manifestations of inflexibility are apparent. The cognitive aspects of flexibility are emphasized in performance-based neuropsychological measures of executive function (e.g., Delis et al. 2001; Van Eylen et al. 2015) such as the Wisconsin Card Sorting Test (Greve et al. 2002), but are underrepresented in current informant report measures. Current informant reports tend to focus on behaviors, or if they include cognitive flexibility-related items, do so from a uni-dimensional (single factor) perspective. Cognitive flexibility has clear conceptual overlap with the construct of insistence on sameness (a component of the restricted and repetitive behaviors and interests in ASD and other conditions), and strong relationships have been noted between measures of these two constructs (Lopez et al. 2005).

Current Informant Report Measures of Flexibility and Related Constructs

The Behavior Rating Inventory of Executive Function (BRIEF) is a well-studied and validated informant report measure of everyday executive function, which includes a single 8-item “shift” factor targeting flexibility (Gioia et al. 2000). Seven of the “shift” items assess aspects of a child’s insistence on sameness. The Behavior Flexibility Rating Scale—Revised (BFRS-R; Green et al. 2007) is a 16-item parent report scale specifically targeting the behavior responses/agitation caused by day-to-day flexibility challenges (e.g., “The person wants something that is not available”). Factor analyses of the BFRS-R have emphasized factors related to the type of flexibility challenge presented (i.e., interruptions/disruption vs. position/location of the flexibility challenge; or flexibility towards objects, flexibility towards the environment and flexibility towards persons; Peters-Scheffer et al. 2008; Pituch et al. 2007).

Restricted and repetitive behaviors and interests, a behavioral constellation related to flexibility (Kenworthy et al. 2009; Lopez et al. 2005; South et al. 2007), are the subject of several informant report measures. The Repetitive Behavior Scale—Revised (RBS-R) is a well-studied, internally consistent and validated multidimensional measure of restricted and repetitive behaviors (Bodfish et al. 2000; Lam and Aman 2007; Leekam et al. 2007; Mirenda et al. 2010) which assesses six types of behavior problems: Stereotyped Behavior Problems, Self-Injurious Behavior Problems, Compulsive Behavior Problems, Ritualistic Behavior Problems, Sameness Behavior Problems, and Restricted Behavior Problems. The Interests Scale specifically targets strong interests (Bodfish 2004; Turner-Brown et al. 2011). The Autism Diagnostic Interview—Revised (ADI-R) provides a diagnostic assessment of ASD, and includes, among other domains, 12 items assessing repetitive and restricted behaviors and interests (RRBI) specific to ASD diagnosis (Lord et al. 1994); these 12 items reflect more than one factor, including a repetitive sensory and motor behaviors factor as well as an insistence on sameness factor (Bishop et al. 2006; Cuccaro et al. 2003; Richler et al. 2007; Szatmari et al. 2006).

The Flexibility Scale (FS) was developed as a multi-dimensional parent/informant report measure of a child or adolescent’s real-world flexibility skills. The FS was not designed to replace existing behavioral flexibility or RRBI measures, but instead to complement them by sampling richly a broad range of cognitive (and primarily verbally and/or socially mediated) expressions of everyday flexibility. The FS presents specific descriptions of flexibility characteristics, including areas in which less-flexible approaches may be a strength (e.g., “Enjoys categorizing information” or “Likes to know everything about a topic”). This recognizes, as highlighted in the concept of neurodiversity, that characteristics that may be problematic in some settings can be a source of strength and ability in other settings (Armstrong 2015; Kapp et al. 2013). The item-set for the FS was developed based on the cognitively-related components of flexibility previously described in the broad flexibility literature: routinized thinking (e.g., Evans et al. 1997; Lam and Aman 2007; Zandt et al. 2007), strong interests (e.g., Anthony et al. 2013; McHale et al. 2001; South et al. 2005), insistence on sameness (e.g., Bishop et al. 2013; Hus et al. 2007; Szatmari et al. 2006) and weak idea generation skills (Generativity; e.g., Kenworthy et al. 2008; Turner 1999). Rigid/picky eating challenges, common among youth with ASD (e.g., Bandini et al. 2010; Kuschner et al. 2015), were included as exploratory items, conceptually hypothesized as potential components/expressions of flexibility around food.

Study Aims and Hypotheses

Flexibility is an important construct to assess in many disorders, and the FS is expected to be relevant well beyond ASD. In line with previous findings of the trans-diagnostic nature of behavioral inflexibility, but with greatest severity in ASD (Bodfish et al. 2000), this study specifically develops the FS in youth with ASD: examining its multi-dimensionality through factor analytic techniques; assessing possible FS covariates of age, IQ and gender; assessing convergent and divergent validity in the context of measures with known sensitivity in ASD; and assessing discriminant validity between ASD participants and typically developing participants, with these hypotheses:

  1. Based on previous studies of flexibility, the FS is predicted to present four distinct, but related, factors: routinized thinking/behavior, insistence on sameness, strong interests and weak idea generation skills. Items related to food selectivity are also included as an exploratory factor.

  2. The following are predictions regarding the reliability and content validity (convergent, divergent, and discriminant) of the FS:

    a. The FS factors will be internally reliable, given previous findings of internal consistency in the dimensional assessment of flexibility (e.g., Peters-Scheffer et al. 2008; Pituch et al. 2007).

    b. The FS will be positively correlated with the BRIEF (Gioia et al. 2000), with particularly strong relationships between the FS and the Shift domain of the BRIEF. The BRIEF Shift domain relationships will be strongest for insistence on sameness-related factors/components of the FS.

    c. The FS will be positively correlated with two performance-based flexibility tasks (Delis Kaplan Verbal Fluency Total Switching Accuracy and Trail Making Test Switching), but not correlated with non-switching conditions of these same tasks (DKEFS Category Fluency and Trail Making Motor Speed; Delis et al. 2001; Reitan 1992).

    d. Based on previous literature linking flexible thinking to RRBI ASD symptoms it is hypothesized that the FS will be positively correlated with the Autism Diagnostic Observation Schedule (ADOS; ADOS 2; Lord et al. 2000, 2012) restrictive/repetitive behaviors and ADI-R repetitive and restrictive behaviors total scores, but not correlated with ADI-R Verbal Communication or Social totals. Additionally, it is hypothesized that the FS will be highly correlated with the conceptually parallel domains of the RBS-R (sameness behaviors, ritualistic behaviors, restricted behaviors; Bodfish et al. 2000) and with the Interests Scale (Bodfish 2004). The FS will show smaller correlations with the factors of the RBS-R that are more related to repetitive movements and intellectual disability presentations (stereotyped behaviors and self-injurious behaviors), and the FS generativity domain will not relate to any RBS-R factors.

    e. The FS will discriminate based on diagnostic group (ASD vs. typically developing (TD)) given previous findings that flexibility performance/functioning distinguishes children with ASD from TD controls (Van Eylen et al. 2015). It is predicted that youth with ASD will have significantly greater flexibility problems as measured by the total score and subdomain scores on the FS as compared to TD controls.

Methods

Participants

Participants were 221 children and adolescents (182 males) with ASD between the ages of 6 and 17 (M = 10.68, SD = 2.09), and 57 typically developing controls (47 males) between the ages of 6 and 17 (M = 11.31, SD = 2.50). Typically developing controls (TD) were included in this study to test the ability of the FS to discriminate TD versus ASD groups. Participants with parent-reported comorbid genetic conditions, traumatic brain injury and neurological disorders that could affect cognitive functioning were excluded. Participants were assessed at either Children’s National Medical Center or The Children’s Hospital of Philadelphia. Trained and experienced clinicians diagnosed all participants with ASD using DSM-IV-TR criteria. All participants with ASD met criteria established by the NICHD/NIDCD Collaborative Programs for Excellence in Autism (Lainhart et al. 2006) using the ADI-R (Lord et al. 1994) and/or the ADOS (Lord et al. 2000) or ADOS 2 (Lord et al. 2012). The mean IQ for ASD and TD groups differed (ASD M = 107.77, SD = 18.95; TD M = 117.77, SD = 14.32, t(276) = 3.71, p < .001). Therefore, for analyses comparing ASD and TD groups, a subgroup of the ASD sample was used, removing all ASD participants with IQ less than 92 to match the IQ range of the TD group (TD IQ range: 92–149; subgroup of ASD group IQ range: 92–154). This resulted in two more comparable “higher-IQ” groups, which did not differ in age, gender ratio, or IQ (all ps > .05). However, the IQ difference between the groups still approached significance (ASD M = 114.13, SD = 15.39; TD M = 117.77, SD = 14.32, t(232) = 1.579, p = .116), as did age differences (ASD M = 10.77, SD = 2.00; TD M = 11.31, SD = 2.50, t(232) = 1.663, p = .098). Therefore, all analyses comparing the ASD and TD groups were conducted twice, with and without covarying IQ and age, and the significance of the results of the two analyses was compared.
A subset of participants with ASD completed two additional neuropsychological measures (DKEFS Verbal Fluency; N = 89; Delis et al. 2001; Trails A and B; N = 83; Reitan 1992). Those participants who received the neuropsychological measures did not differ from those who did not in IQ (IQ of those who received neuropsychological measures: M = 107.86, SD = 19.62; IQ of those who did not receive neuropsychological measures: M = 107.62, SD = 17.67, t(219) = 0.090, p = .929), age (Age of those who received neuropsychological measures: M = 10.66, SD = 2.27; Age of those who did not receive neuropsychological measures: M = 10.69, SD = 1.69, t(219) = −0.075, p = .940), or gender (χ²(1) = 0.124, p = .725). Table 1 provides characterization and performance information for the full ASD sample. Table 2 provides this information for the ASD subgroup with IQ > 91 and the TD controls.
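As an illustration of the full-sample IQ comparison reported above, the following Python sketch recomputes an independent-samples t statistic from the published means, standard deviations and group sizes. It assumes a pooled-variance (Student's) t-test, which is consistent with the reported 276 degrees of freedom but is not stated explicitly in the text; the function name pooled_t is hypothetical.

```python
from math import sqrt

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom from summary statistics.

    Assumption: a Student's t-test with equal variances; the text does not
    name the exact test that was used.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m2 - m1) / se, df

# Summary statistics as reported for the full ASD sample and the TD controls.
t, df = pooled_t(107.77, 18.95, 221, 117.77, 14.32, 57)
print(f"t({df}) = {t:.2f}")
```

Because the inputs are rounded summary statistics, the result is expected to be close to, but not necessarily identical with, the reported t(276) = 3.71.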

Table 1 ASD Participants and Measures (N = 221, except as noted)
Table 2 Comparison of ASD (IQ > 91) and TD Groups

Measures

Flexibility Scale

The Flexibility Scale (FS) is a parent/caregiver report measure developed to assess the multidimensionality of flexibility in youth. The FS was developed by four executive function specialists (L.G. Anthony, L. Kenworthy, B. Yerys and G.L. Wallace) through an iterative process. First, a comprehensive literature review was conducted to identify current understandings of the multidimensionality of flexibility. Review of clinical interview notes from previous cases helped to focus item-development on real world manifestations of flexibility/inflexibility, such as what parents often say about the flexibility challenges their children face, including what their children report as hard for them as well as areas of great (albeit over-focused) depth of knowledge. Finally, the face validity of the items and their appropriateness for different categories was assessed by a larger group of neuropsychologists and developmental psychologists.

Study participants completed 50 items, with a total score as an overall measure of flexibility problems calculated by summing all items, with some positively-worded items reverse scored (e.g., “Easily generates new ideas.”). Questions present observable day-to-day characteristics/responses that highlight a child’s cognitive flexibility style. The FS item-set was generated based on known factors, as described above. The FS assesses more than just problems by including cognitive characteristics and approaches that may be understood as strengths in specific contexts. The FS has a four-point ordinal Likert scale for each item: 0 = no, 1 = somewhat, 2 = very much, 3 = always. Higher scores (after reverse scoring) mean greater endorsements of problems/characteristics. See the “Introduction” section for a summary of how the scale was initially developed.
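The scoring rule described above can be sketched in Python as follows. This is a minimal illustration, not the scale's official scoring procedure: it assumes that reverse scoring maps a response x to 3 − x on the 0 to 3 scale, and the item identifiers and the function name fs_total are hypothetical.

```python
# Response scale from the text: 0 = no, 1 = somewhat, 2 = very much, 3 = always.
MAX_RESPONSE = 3

def fs_total(responses, reverse_items):
    """Sum item responses after reverse scoring the positively worded items.

    Assumption: a reverse-scored response x becomes MAX_RESPONSE - x.
    """
    total = 0
    for item, value in responses.items():
        if not 0 <= value <= MAX_RESPONSE:
            raise ValueError(f"response out of range for {item}: {value}")
        total += MAX_RESPONSE - value if item in reverse_items else value
    return total

# Hypothetical three-item example; these identifiers are not actual FS item labels.
responses = {"item_a": 2, "item_b": 0, "generates_ideas": 3}
reverse_items = {"generates_ideas"}
total = fs_total(responses, reverse_items)
print(total)
```

In this example, a response of "always" on the positively worded item contributes 0 to the problems total, so higher totals continue to indicate greater endorsement of flexibility problems.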

Parent Report Measure of Executive Function

Behavior Rating Inventory of Executive Function

The Behavior Rating Inventory of Executive Function (BRIEF; Gioia et al. 2000) is an 86-item, parent-report inventory that measures EF skills in children ages 5–18 years. Each item is scored on a Likert scale from 1 (Never) to 3 (Often). The BRIEF contains eight scales corresponding to the following EF subdomains: inhibition, shift, emotional control, initiation, working memory, planning and organization, organization of materials, and self-monitoring. Raw scores on each of the domains are converted to standardized T-scores.

Cognitive and Executive Function Performance-Based Measures

IQ

Full-scale IQ was measured by one of several different standardized, well-normed tests: the Wechsler Abbreviated Scale of Intelligence (Wechsler 1999; 53.9% of the sample), the Wechsler Abbreviated Scale of Intelligence II (Wechsler and Zhou 2011; 10.5% of the sample), the Wechsler Intelligence Scale for Children—IV (Wechsler 2003; 12.3% of the sample), the Wechsler Adult Intelligence Scale—IV (Wechsler 2008; 0.5% of the sample) or the Differential Ability Scales II (DAS II; Elliott 2007; 22.8% of the sample). The test given depended on a participant’s previous testing status and age. Correlations between the WASI II and WISC IV Full-Scale IQ are reported to be 0.88 (Wechsler and Zhou 2011) and correlations between the DAS II and WISC IV Full-Scale IQ are reported to be 0.93 (Kuriakose 2014). All Full-Scale IQ scores are reported as standard scores (M = 100, SD = 15).

DKEFS Verbal Fluency

The Delis-Kaplan Executive Function System (D-KEFS; Delis et al. 2001) is a standardized measure with extensive normative data that consists of nine tests that measure a variety of executive functions. The D-KEFS Verbal Fluency tests as administered in this study measure the child’s ability to fluently retrieve words under two conditions: Category Fluency (i.e., naming animals) and Switching (alternating between two categories when naming). Two standardized scores are produced: category fluency total correct (i.e., a measure of how many words are produced in the simple category condition) and switching fluency accuracy (i.e., a measure of the child’s ability to accurately switch between categories). Performance is reported as scaled scores, with higher scores indicating better fluency (category fluency) or flexibility (switching fluency accuracy).

Trail Making Test

The Halstead-Reitan Trail Making Test - Intermediate (Reitan 1992; Strauss et al. 2006) is a standardized two-stage neuropsychological measure of visual search/motor speed and task switching. In the first round, the child is timed drawing a line to connect consecutive numbers. The second round presents numbers and letters, and the child is timed drawing a line switching between consecutive letters and numbers (i.e., 1-A-2-B, etc.). Trail making motor speed is a standardized score based on the first round, and represents simple visual search and motor speed. The second score, trail making switching minus motor speed (trail making switching–motor speed), is the switching (second) round standard score minus the non-switching (first) round standard score. Trail making switching–motor speed is understood as a measure of the switch-cost of the second condition and accounts for the child’s underlying (first round) visual search and motor speed.

Autism Symptoms/Diagnostic Measures

ADOS

The ADOS (Lord et al. 2000) is a semi-structured, observational assessment that scores a participant’s response to social presses for communication, reciprocal social behavior, and repetitive behaviors and stereotyped interest patterns. There are five different modules of the ADOS, each designed for a different developmental/communication level. The revised ADOS, the ADOS 2 (Lord et al. 2012), presents a new scoring algorithm, though with a majority of items for ratings common with the original measure. A majority of participants received the ADOS Module 3 (N = 164) or the ADOS 2 Module 3 (N = 41), modules which have similar restrictive/repetitive behaviors domain item content. Due to the updated administration and scoring of the ADOS 2 Module 3, which differs somewhat from the administration and scoring of the original ADOS Module 3, participant scores were standardized based on other ASD participants in the study who received the same ADOS module. Those standardized scores (z-scores) were then used for analyses to allow for a Module 3 Restrictive/Repetitive Behaviors total across the first and second versions of the ADOS.
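The within-module standardization described above can be sketched in Python as follows. This is an illustrative sketch only: the scores are made up, the function name zscores_within_group is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev

def zscores_within_group(scores_by_group):
    """Convert raw scores to z-scores separately within each group.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    result = {}
    for group, scores in scores_by_group.items():
        m = mean(scores)
        s = stdev(scores)
        result[group] = [(x - m) / s for x in scores]
    return result

# Made-up restrictive/repetitive behavior totals, keyed by the module received.
raw = {
    "ADOS Module 3": [1, 2, 3, 4],
    "ADOS 2 Module 3": [0, 2, 4, 6],
}
z = zscores_within_group(raw)
for module, values in z.items():
    print(module, [round(v, 2) for v in values])
```

After this step each module's scores have a mean of 0 and a standard deviation of 1, which is what allows scores from the two ADOS versions to be pooled into a single variable.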

ADI-R

The Autism Diagnostic Interview-Revised (Lord et al. 1994) is a structured parent interview about the child’s developmental history, with an emphasis on communication, social development, and repetitive and restrictive behaviors. The ADI-R was administered using the diagnostic algorithm for scoring, which assesses the presence of symptoms in earlier childhood (i.e., between the ages of 4–5) as well as over the course of development since that time. With this scoring, the measure assesses the presence of these symptoms over the individual’s lifetime, and does not give an immediate current level of ASD-symptoms. The ADI-R scores used in analyses are the communication, social, and repetitive and restrictive behaviors totals. Higher scores indicate greater reported impairment.

Parent Report Measures of RRBIs

Repetitive Behavior Scale—Revised

The Repetitive Behavior Scale—Revised (RBS-R; Bodfish et al. 2000) is a parent report measure with items rated on a four-point Likert-scale, and contains six subscales: Stereotyped Behavior, Self-Injurious Behavior, Compulsive Behavior, Ritualistic Behavior, Sameness Behavior, and Restricted Behavior. The items are a series of behaviors (e.g., Ritualistic behavior at eating/meal time). In addition to the subscales, all items are summed to create a Total score. The RBS-R was initially developed to assess the severity of a variety of repetitive behaviors in a broad range of individuals with developmental/intellectual disabilities. Factor analysis of the RBS-R has supported five or fewer factors in several investigations in ASD (Bishop et al. 2013; Lam and Aman 2007), as well as validation of the measure in relationship to the ADI (Mirenda et al. 2010). In the current study, due to a site-based administration error which occurred with the first 118 participants at the Children’s National Medical Center (CNMC) site, the participant parents did not complete the entire Restricted Behavior domain of the RBS-R for their children with ASD. The remaining 60 CNMC parents did complete this domain, as did all 43 families from The Children’s Hospital of Philadelphia group, resulting in a total of 103 complete administrations. Therefore, analyses involving the RBS-R Total Behavior score (which consists of the sum of all sub-domains) and the Restricted Behavior domain are limited to those 103 participants with complete RBS-R administrations. The group of participants who did not complete the entire Restricted Behavior domain were not different from the group that did in terms of their IQ (IQ of those who did not complete RBS-R Restricted: M = 108.67, SD = 18.82; IQ of those who did complete RBS-R Restricted: M = 106.77, SD = 18.82, t(219) = 0.744, p = .458) or gender (χ²(1) = 0.004, p = .950).
However, the two groups did differ in age (Age of those who did not complete RBS-R Restricted: M = 10.11, SD = 1.72; Age of those who did complete RBS-R Restricted: M = 11.31, SD = 2.30, t(219) = −4.451, p < .001).

Interests Scale

The Interests Scale (IS; Bodfish 2004; Turner-Brown et al. 2011) is a parent report checklist that asks the parent/caregiver to rate a child’s specific interests (e.g., “Interest in machines, how things work”) as being present currently, present in the past, or not ever present. The parent is then asked to indicate the child’s three primary interests and several additional questions about these interests including how intense they are, how much they interfere with social interactions and flexibility, and the need for accommodation around those interests. The IS yields a Total Number of Current Interests Endorsed score and a Total Intensity score. Higher scores indicate greater number or intensity of interests. The IS is a sensitive measure in children with ASD, with findings of increased intensity of interests and a relationship between intensity and other symptoms (including ASD symptoms and executive dysfunction; Anthony et al. 2013).

Procedure

Participants were evaluated as part of a research protocol. This project was conducted in compliance with standards established by the institutions’ Institutional Review Boards, including procedures for informed consent.

Data Analysis

Refinement of Item Set and Identification of Factor Structure of FS

The preliminary FS was revised through an iterative scale-development procedure drawing upon techniques employed in the development of other informant-report scales, such as the Behavior Rating Inventory of Executive Function (BRIEF; Gioia et al. 2000). An exploratory factor analysis (EFA) with 221 participants with ASD was performed on the preliminary FS items using Mplus, Version 7 (Muthén and Muthén 1998) based on polychoric correlations, because the FS items use an ordinal response scale (0–3). An oblique rotation method (Promax) was selected to allow for presumed correlations between the factors. Three methods were employed to assess the optimal number of factors to extract: evaluation of root mean square error of approximation (RMSEA), theory-driven evaluation of the various factor solutions, and parallel analysis. RMSEA is a measure of model fit, and factor solutions with RMSEA values under 0.05 are considered a “good fit” (Preacher et al. 2013). The RMSEA is a standard indicator of model fit used in contemporary EFA studies (e.g., Bishop et al. 2013; Takishima-Lacasa et al. 2014). The parallel analysis was conducted with the program ViSta-PARAN (Ledesma and Valero-Mora 2007).

The item loadings for each of the factors in the factor solution were evaluated according to specific common criteria in the literature, with the goal of refining the FS factors such that each would have as few cross-loaded items as possible, as many highly-loaded items as possible and an acceptable internal consistency (Costello and Osborne 2005; Tabachnick and Fidell 2001). Specifically, items with loadings of less than 0.32 on all factors, or with no significant factor loadings, were dropped, as were items with cross-loadings greater than 0.32 but with no loading at 0.4 or above. A second EFA was conducted on the remaining item-set. The resulting item-set and factor structure was reviewed by a team of 10 international experts in ASD and/or executive function, who provided recommendations for factor descriptions/names and for decisions around item factor assignments and item removal. A final EFA procedure was conducted (as before).
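The loading-based retention rules described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: loadings are compared in absolute value, a loading of exactly 0.32 is treated as salient, the significance-based criterion is omitted because it requires standard errors that are not reported here, and the expert review step is not modeled. The function name keep_item and the example loadings are hypothetical.

```python
SALIENT = 0.32  # minimum loading treated as meaningful (threshold from the text)
STRONG = 0.40   # a cross-loaded item needs at least one loading this large

def keep_item(loadings):
    """Return True if an item survives the two loading-based rules.

    Assumptions: absolute loadings are used, and the statistical
    significance criterion mentioned in the text is not checked.
    """
    magnitudes = [abs(x) for x in loadings]
    n_salient = sum(m >= SALIENT for m in magnitudes)
    if n_salient == 0:
        return False  # no loading reaches 0.32 on any factor
    if n_salient >= 2 and max(magnitudes) < STRONG:
        return False  # cross-loaded, with no loading at 0.4 or above
    return True

# Made-up loadings on five factors for four hypothetical items.
examples = {
    "clean_item": [0.55, 0.10, 0.05, 0.02, 0.08],
    "weak_item": [0.20, 0.25, 0.10, 0.05, 0.01],
    "weak_cross_loader": [0.35, 0.36, 0.05, 0.02, 0.03],
    "strong_cross_loader": [0.45, 0.35, 0.00, 0.04, 0.02],
}
decisions = {name: keep_item(vals) for name, vals in examples.items()}
print(decisions)
```

Under these rules a cross-loaded item with one loading of 0.4 or above is retained at this stage; in the procedure described above, such items were subsequently considered by the expert team for factor assignment or removal.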

Reliability and Validity of Revised FS evaluated in the ASD group

Once a final factor analytic solution was obtained, internal reliability (Cronbach’s alpha) and inter-factor correlations were evaluated. Hypothesized covariates were evaluated, still in the ASD group alone. To evaluate construct validity based on the a priori hypotheses, partial correlations with the covariates of IQ, age and gender were conducted between the FS and the other measures. Strong relationships were predicted between the FS and other parent report measures. Because the ADI-R scores used in this study were based on report of the child’s behaviors since infancy, and therefore potentially temporally distant/different from current functioning, relatively weaker, albeit significant correlations were predicted between the FS and the ADI-R RRBI total. The method variant correlation effect sizes (i.e., parent report FS correlations with performance-based tasks) were predicted to be smaller given previous findings that relationships between executive function performance tasks and parent report of real world manifestations of executive function are more difficult to capture (Barkley and Murphy 2010; Toplak et al. 2013; Vriezen and Pigott 2002). Finally, FS mean scores were compared in the subset of participants with ASD and IQ greater than 91 and the TD group and effect sizes were calculated. Given the trend toward different IQs in the groups, these comparisons were conducted again using ANCOVA with IQ covaried and the significance of the results compared.
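To illustrate what a partial correlation does, the following Python sketch computes the correlation between two measures after removing the linear influence of a single covariate. It is a simplification: the analyses described above adjusted for IQ, age and gender together, which would require residualizing both measures on all three covariates rather than the one-covariate formula used here. All data and variable names are made up.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Made-up scores for six hypothetical participants.
age = [6, 8, 10, 12, 14, 16]
fs_total_score = [30, 28, 25, 27, 20, 18]
other_measure = [70, 72, 65, 66, 60, 58]

r_raw = pearson(fs_total_score, other_measure)
r_partial = partial_corr(fs_total_score, other_measure, age)
print(f"zero-order r = {r_raw:.3f}, partial r (age removed) = {r_partial:.3f}")
```

When the covariate is uncorrelated with both measures, the partial correlation equals the zero-order correlation; the more the covariate is related to both, the more the two values diverge.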

Results

FS Extracted Factors

Potential factor solutions are reported in order, beginning with models with the fewest factors based on the principle of parsimony. The one-factor solution had poor model fit (RMSEA = 0.094) and produced an apparent general flexibility factor that conflated aspects of flexibility known to be distinct. The two-factor model also had poor fit (RMSEA = 0.065) and produced an apparent general flexibility factor and a problems generating ideas factor. Again, the general flexibility factor conflated distinct subcomponents of flexibility. The three-factor solution was also unsatisfactory given its model fit (RMSEA = 0.055) and its conflation of routines/rituals and problems with transitions/change into one factor; these components have been previously identified as distinct (e.g., Lam and Aman 2007). The remaining apparent factors in the three-factor solution were related to special interests and problems generating ideas. The four-factor solution produced apparent factors in the areas of routines/problems with change, over-focused interests, problems with the social aspects of flexibility, and problems generating ideas. Again, this solution’s conflation of items related to routines and rituals and problems with transitions/change into one factor was problematic, as described above. The RMSEA was 0.050. The five- and six-factor models were also explored. The six-factor solution was problematic, as it isolated the selective-eating-related items in a single factor and produced a sixth factor with just one item, which was highly cross-loaded on another factor. The five-factor solution was the smallest factor solution (i.e., the most parsimonious) to achieve an RMSEA below 0.05 (RMSEA = 0.045). Evaluation of the five-factor solution produced theoretically sound and distinct factors, based on known flexibility-related domains and clinical experience: Routines/Rituals, Transitions/Change, Special Interests, Social Flexibility, and Generativity.
Four of these factors were predicted in the authors’ initial design of the instrument and by previous studies. The Social Flexibility factor has parallels in the literature, with findings of the interrelatedness of social functioning and executive function/flexibility in development as well as in intervention studies (e.g., Kenworthy et al. 2014, 2009; Pellicano 2007; Stichter et al. 2010, 2012). As a final check, parallel analysis of the FS items was conducted and indicated that the five-factor EFA solution was the best fit for the data.
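The RMSEA-based part of this selection can be sketched in Python using the values reported above. The sketch covers only the fit criterion; the theory-driven evaluation and the parallel analysis that also informed the final choice are not modeled, the RMSEA for the six-factor solution is omitted because it is not reported, and the function name smallest_good_fit is hypothetical.

```python
# RMSEA values as reported in the text, keyed by number of factors extracted.
rmsea_by_factors = {1: 0.094, 2: 0.065, 3: 0.055, 4: 0.050, 5: 0.045}

# "Good fit" cutoff stated in the Data Analysis section (values under 0.05).
THRESHOLD = 0.05

def smallest_good_fit(rmsea, threshold):
    """Return the smallest number of factors whose RMSEA is strictly below threshold."""
    candidates = [k for k, value in sorted(rmsea.items()) if value < threshold]
    return candidates[0] if candidates else None

n_factors = smallest_good_fit(rmsea_by_factors, THRESHOLD)
print(n_factors)
```

The four-factor solution sits exactly at the cutoff (0.050) and therefore does not satisfy a strict "under 0.05" rule, which is why the five-factor solution is the most parsimonious model meeting the criterion.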

FS Factor Loadings and Item Set

Nine items were dropped after the initial five-factor EFA of the original 50-item FS, one item due to factor loadings less than 0.32, and eight items due to cross-loadings and no factor loading of 0.4 or higher. Of note, items related to repetitive movements (e.g., “Paces”, “Enjoys repetitive movements”) were among the items that dropped out, and these items did not factor together in six- or seven-factor solutions. A second EFA with the remaining 41 items retained the same apparent factor structure, and was presented to the team of international experts. The team made suggestions about appropriate descriptors for each factor (see above), and cross-loaded items were evaluated for appropriate factor assignment or deletion. For example, the team recommended the following deletions: “Comfortable with unscheduled time” (reverse scored) cross-loaded on Transitions/Change and Generativity, and was dropped due to the cross-loading (a lack of specificity of the item); “Difficulty when rules are not explicit” cross-loaded on Transitions/Change and Social Flexibility and was dropped due to the cross-loading as well as poorer theoretical fit with the other items on these two factors. A third EFA was conducted on the 33 remaining items, resulting in a similar factor structure as before. Six additional items were removed based on cross-loadings and/or loadings below 0.4. Regarding the three eating-related items (“Will eat only certain foods; picky eater”, “Has other special preferences around eating [Example: insisting that foods don’t touch each other, insisting on using a particular eating utensil]”, and “Eats food in a peculiar way [Example: picking it apart instead of biting into]”), the first two items loaded moderately on Routines/Rituals, but with subsequent EFA, showed poorer loadings and were dropped.
The latter item originally had moderate loadings on the Social Flexibility factor, but was dropped after refinement of this factor both for psychometric and conceptual reasons. The final 27-item FS with factor loadings is presented in Table 3.
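The first-round item screening described above can be illustrated with a minimal sketch. The 0.32 and 0.4 thresholds come from the text; the definition of a cross-loading (a loading of at least 0.32 on two or more factors) is an assumption, since the text does not define it, and the function name, return labels, and example loadings are hypothetical. Later rounds also relied on expert judgment, which this sketch does not model.

```python
SALIENT = 0.32  # minimum loading treated as salient (threshold from the text)
STRONG = 0.40   # loading needed for a clear primary factor (threshold from the text)


def screen_item(loadings, salient=SALIENT, strong=STRONG):
    """Classify one item from its loadings on each factor.

    Returns one of:
      'drop_low'   - no loading reaches the salient threshold
      'drop_cross' - cross-loads (assumed: >= 2 salient loadings) with no strong loading
      'review'     - cross-loads but has a strong loading; left to expert judgment
      'keep'       - exactly one salient loading
    """
    magnitudes = [abs(value) for value in loadings]
    n_salient = sum(m >= salient for m in magnitudes)
    if n_salient == 0:
        return "drop_low"
    if n_salient >= 2 and max(magnitudes) < strong:
        return "drop_cross"
    if n_salient >= 2:
        return "review"
    return "keep"


# Hypothetical loadings on five factors, for illustration only.
examples = {
    "item_a": [0.10, 0.20, 0.05, 0.31, 0.00],
    "item_b": [0.35, 0.38, 0.02, 0.10, 0.04],
    "item_c": [0.62, 0.05, 0.11, 0.08, 0.01],
    "item_d": [0.55, 0.36, 0.00, 0.03, 0.09],
}
decisions = {name: screen_item(values) for name, values in examples.items()}
```

Items flagged for review correspond to the cross-loaded items that the expert team assigned to a factor or deleted on conceptual grounds.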

Table 3 Five factor solution for the flexibility scale in ASD (N = 221) using polychoric EFA with an oblique rotation (Promax)

Reliability and Validity of Revised FS Evaluated

Internal Consistency

Internal reliability was adequate for each scale: Routines and Rituals (α = 0.750), Transitions/Change (α = 0.906), Special Interests (α = 0.795), Social Flexibility (α = 0.854), and Generativity (α = 0.878).
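For readers unfamiliar with the α statistic reported above, the following is a minimal sketch of the standard Cronbach's alpha formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the small response matrix are hypothetical and are not FS data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])  # number of items on the scale
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's summed scale score.
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings from four respondents on a two-item scale.
toy_responses = [[2, 3], [4, 4], [1, 2], [3, 5]]
alpha = cronbach_alpha(toy_responses)  # 8/9, about 0.889
```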

Relationships Between Factors

Relationships between factors are shown in Table 4. Strong relationships were observed between each of the factors and the Total Problems with Flexibility score, except Generativity, which was only moderately correlated with Total Problems. All correlations between factors were of moderate size, except the following: (1) Special Interests and Social Flexibility had a weak correlation, as did Generativity and Social Flexibility, and (2) Generativity was otherwise unrelated to FS factors.

Table 4 Flexibility scale domain Pearson correlations (ASD only)

FS Covariates

See Table 5 for a summary of relationships between the FS and age and IQ; differences in FS scores based on gender are presented below. In accordance with previous findings of generally decreased symptoms over time in school-age youth with ASD (Eaves and Ho 1996; Piven et al. 1996), including higher-order RRBI (Esbensen et al. 2009), increased age was related to fewer Total Flexibility Problems (r = −.200, p = .003) and Social Flexibility problems (r = −.231, p = .001). There was also a very small negative relationship between Special Interests and age (r = −.167, p = .013), which contrasts with some reports in the literature of increased circumscribed interests with greater age (Bishop et al. 2006; South et al. 2005). In accordance with previous findings of reduced flexibility problems with higher IQ (Van Eylen et al. 2015), IQ was negatively related to Routines/Rituals (r = −.228, p = .001) and Generativity problems (r = −.311, p < .001). IQ was unrelated to other factors, consistent with previous findings of no relationship between insistence on sameness-related ASD symptoms and IQ (Richler et al. 2010). As age, IQ and gender were related to components of the FS in the ASD group, these variables were used as covariates in all other correlations. In accordance with some previous findings of less pronounced restricted interests in females with ASD (Supekar and Menon 2015), based on an ANCOVA controlling for age and IQ there was a significant effect of gender on Special Interests, F(1,217) = 13.186, p < .001, with males with ASD showing greater endorsements on this factor, p < .001, 95% CI [.53, 1.798]. ANCOVAs with Total Flexibility Problems and other FS factors showed no differences between genders (Total Flexibility Problems, F(1,217) = .627, p = .429; Routines/Rituals, F(1,217) = 2.659, p = .104; Transitions/Change, F(1,217) = .412, p = .522; Social Flexibility, F(1,217) = .167, p = .683; and Generativity, F(1,217) = 1.004, p = .317).

Table 5 Relationship of FS total and domain scores to age, intelligence, repetitive behaviors, and executive functions; ASD only, N = 221 except where noted

FS Construct Validity: FS and BRIEF

See Table 6 for FS and BRIEF relationships. There were strong relationships between the FS and BRIEF (parent report of everyday executive function), with a specific pattern of the strongest relationships between conceptually similar domains. As predicted, the FS Transitions/Change and BRIEF Shift were highly related (r = .701, p < .001); these domains have conceptually similar items focused on insistence on sameness. The FS Total also showed a very strong relationship with BRIEF Shift (r = .679, p < .001).

Table 6 FS correlations with parent report BRIEF, controlling for age, gender and IQ, ASD only; N = 212

FS Construct Validity: FS and Performance-Based Switching Tasks

See Table 5 for FS and performance-based switching task relationships. As predicted, there were no significant relationships between D-KEFS Category Fluency and the FS. Also as predicted, D-KEFS Switching Accuracy was related to FS Total (r = .227, p = .036) and Social Flexibility (r = .312, p = .004). Consistent with the hypothesis, there were no significant relationships between the FS and Trail Making Motor Speed. A trend toward a relationship was observed between Trail Making Switching–Motor Speed and Transitions/Change from the FS (r = .234, p = .050), but no other relationships were significant. Overall, relationships between the FS and method variant tasks provided some support for construct validity, particularly when considering that the D-KEFS and Trail Making tests included much smaller numbers of participants (n = 74–89), and that relationships between performance tasks and real-world skills are difficult to capture.

FS Construct Validity: FS and ADOS/ADI

See Table 5 for FS and ADOS/ADI relationships. Construct validity was explored with partial correlations using age, IQ, and gender as covariates, to test predictions of FS relationships with established ASD diagnostic measures. Contrary to the hypothesis, FS Total and all FS subdomains were unrelated to ADOS RRB, except Special Interests, which had a very small, but significant, correlation (r = .149, p = .034). As predicted, the ADI-R RRBI Total had small significant correlations with FS Total Problems, Routines/Rituals, Transitions/Change, and Special Interests (r = .214, p = .002; r = .207, p = .003; r = .169, p = .016; r = .156, p = .027, respectively); however, the ADI-R RRBI was not correlated with Social Flexibility or Generativity. Consistent with the hypothesis, the ADI-R Social Total and Verbal Communication Total did not relate to FS Total or subdomains, except for significant, though small, unpredicted relationships with Generativity (r = .177, p = .012; r = .247, p < .001, respectively).
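The partial correlations used throughout these validity analyses can be sketched with the standard recursive formula, which removes one covariate at a time: r_xy.z = (r_xy − r_xz·r_yz) / √((1 − r_xz²)(1 − r_yz²)). This is an illustration only; the source does not state which software or computational method was used, and the function names and toy vectors below are hypothetical. A binary covariate such as gender would be entered as a 0/1 variable.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x, y, covariates):
    """Correlation of x and y after partialling out each covariate in turn."""
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[0], covariates[1:]
    # Each term is itself a partial correlation controlling the remaining covariates.
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data: x and y are related only through the shared covariate z.
z = [-3, -1, 1, 3]
x = [-2, -2, 0, 4]   # z plus a component orthogonal to z
y = [-4, 2, -2, 4]   # z plus a different orthogonal component
raw = pearson(x, y)                   # about 0.65
controlled = partial_corr(x, y, [z])  # approximately 0
```

In the toy data the raw correlation is sizeable while the partial correlation is essentially zero, which is the pattern a covariate-driven relationship would produce.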

FS Construct Validity: FS and RBS-R and IS

See Table 7 for FS and RBS-R and IS relationships. As predicted, there were many strong relationships between the FS and other parent report measures of RRBI (see Table 7). Consistent with the hypothesis, the FS Total and RBS-R Total Behavior were very highly correlated (r = .747, p < .001). Conceptually similar subdomains between the FS and RBS-R were strongly correlated: FS Routines/Rituals and RBS-R Ritualistic Behavior (r = .600, p < .001); FS Problems with Transitions/Change and RBS-R Sameness Behavior (r = .698, p < .001); FS Special Interests and RBS-R Restricted Behavior (r = .518, p < .001). FS relationships with more conceptually distant RBS-R subdomains (i.e., Stereotyped Behavior and Self-Injurious Behavior) were, as predicted, small to moderate, with the exception of RBS-R Restricted Behaviors, which had strong relationships with FS Routines/Rituals and Transitions/Change (r = .511, p < .001; r = .532, p = .001). As predicted, the FS Generativity had no significant relationships with any RBS-R domains. Small unpredicted relationships were observed between the IS Current Interests and the FS (except there was no relationship with Social Flexibility). IS Total Intensity had moderate relationships with FS Total, Routines/Rituals, Transitions/Change, and Social Flexibility (r = .419, p < .001; r = .328, p < .001; r = .325, p < .001; r = .401, p < .001), but surprisingly had only a small relationship with FS Special Interests (r = .182, p = .007). FS Generativity was unrelated to IS Total Intensity, and had a small negative relationship with IS Current Interests (r = −.192, p = .005).

Table 7 FS correlations with parent report RRBI measures, controlling for age, gender and IQ; ASD only

Comparing FS in Diagnostic Groups

Refer to Table 2 for comparisons of FS scores in the ASD subgroup (IQ > 91) and the TD group. As predicted, the two groups had significantly different scores on all FS factors and Total Problems with Flexibility, and the effect sizes were large (Cohen’s d ranging from 1.23 to 2.84). These analyses were conducted a second time using ANCOVA and covarying IQ and age, with no change in the significance of comparisons (all ps < .001).
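As a reference for the effect sizes above, here is a minimal sketch of Cohen's d using the pooled standard deviation. The source does not state which variant of d was computed, so the pooled-SD form is an assumption, and the function name and scores below are hypothetical rather than FS data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical scores for two small groups.
d = cohens_d([2, 4, 6], [1, 3, 5])  # 0.5
```

By the conventional benchmarks (0.2 small, 0.5 medium, 0.8 large), the reported values of 1.23 to 2.84 all fall well above the threshold for a large effect.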

Discussion

This study is innovative for its focus on the measurement and multidimensionality of real-world higher-order executive function flexibility skills and problems. Of particular significance, based on the most parsimonious factor solution of the FS, is the emergence of a factor linking cognitive flexibility and social functioning (“Social Flexibility”), a linkage which has been foreshadowed in previous studies (Fisher and Happé 2005; Kenworthy et al. 2014; Pugliese et al. 2016; Stichter et al. 2010, 2012). Through a multi-step iterative process synthesizing previous research findings, employing EFA techniques and garnering input from an international team of ASD experts, the FS is presented as an instrument with a clear factor structure, solid internal reliability, and emerging evidence for construct validity when compared with existing executive function/flexibility report, as well as RRBI report. Also significant are the findings of small, but meaningful relationships between the FS and performance-based tasks, as these relationships suggest that switch-cost tasks capture aspects related to real-world flexibility skills. Documenting such relationships between task performance and parent report in the area of executive function has been challenging (Barkley and Murphy 2010; Toplak et al. 2013; Vriezen and Pigott 2002), yet was accomplished in this study, perhaps related to the larger sample size/increased power. The eating-related items did not factor with the five FS factors; avoidant/restrictive eating has been previously linked to sensory sensitivities, specifically (Zucker et al. 2015).

The five factors of the FS, described as Routines/Rituals, Transitions/Change, Special Interests, Social Flexibility, and Generativity, all have solid internal reliability and sufficient item totals (four factors have five or more items per factor, and Generativity has four, which is considered acceptable, but less than ideal in terms of generalizability; Costello and Osborne 2005). Four of the five factors are similar to the a priori predicted factors, and the FS produced the additional Social Flexibility factor, not originally predicted. Of note are the significant relationships between this factor and performance-based tasks of switching. Linkages between social functioning and executive function have been clearly described in the treatment literature, and the Social Flexibility factor may capture that overlap (Fisher and Happé 2005; Kenworthy et al. 2014). The Social Flexibility factor also has potential overlap with emotional regulation (ER), as several FS items on the Social Flexibility factor have apparent ER components (i.e., “Is a good sport”, “Difficulty taking turns”, “Gets upset when losing a game”); ER relates to flexibility in general (Gioia et al. 2000) and socialization skills in ASD (Mazefsky 2015). Clearly, the Social Flexibility items are related to items that could be found on a social or adaptive measure, but what unifies them beyond social or adaptive skills is the underlying flexibility demand intrinsic in these items (e.g., “taking turns”). This factor’s strong relationship to other FS domains as well as to neuropsychological switching tasks suggests that it is capturing something more specific than just social functioning. Given this new dimension of measurement, the FS may be useful in capturing how flexibility difficulties relate to social skills. For example, the FS could be useful as an outcome measure in both flexibility and social skill intervention studies.

Mirroring some previous reports of reduced RRBI and executive function flexibility problems with greater age, FS total problems were less intense in the older participants. This was a modest association, however, and inspection of the factor correlations with age suggests that this relationship may be related, at least in part, to older children experiencing fewer difficulties with restricted interests and social flexibility. Although age-related effects cannot clearly be surmised from cross-sectional data, these findings encourage further exploration of developmental trajectories with the FS domains. Regarding IQ, there has been some evidence of reduced flexibility problems and RRBI with higher IQ (Bishop et al. 2013; Van Eylen et al. 2015), but some of these studies have included a broad IQ range, including individuals with intellectual disability, and IQ-related effects may have been driven by the differences between those with and without intellectual disability. In this study, in which children had generally average-range IQ, higher IQ was related to reduced problems with Routines/Rituals and Generativity, but not to the overall Total Problems with Flexibility or other factors. By omitting individuals with intellectual disability, it may be observed that overall in children without intellectual disability, IQ has little impact on the expression or intensity of flexibility problems in ASD. Finally, the finding of greater Special Interests in boys than girls with ASD raises the question of whether girls have fewer intense special interests, or if by the nature of their somewhat different interests (Anthony et al. 2013), these interests are experienced as less noticeable or impactful. Additionally, typically-developing boys have been found to have more special interests than TD girls (DeLoache et al. 2007), so this finding is not specific to ASD.

Although difficulty with generativity may be a key component of executive dysfunction in ASD, the performance of this factor was unimpressive, both in terms of its lack of relatedness to other FS factors, executive function and RRBI measures, and its unexpected relationship with social and verbal symptoms of ASD (on the ADI-R). It was not possible to parse its potential relationship with language skills or verbal IQ, but it is notable that even with full scale IQ accounted for as a covariate in construct validity comparisons (of which a sizeable portion is verbal IQ), significant relationships with the ADI-R remained. This factor may be capturing some aspects of language skills that are not accounted for by verbal IQ. Previous findings of generativity in ASD are mixed. A study of word generation in ASD reported no relationship with RRBIs, but instead relationships with communication symptoms (Dichter et al. 2009). In contrast, significant relationships were observed between inflexibility/RRBI and problems generating ideas on the Uses of Objects test (Bishop and Norbury 2005; Van Eylen et al. 2015) and between flexibility and a verbal fluency task (Kenworthy et al. 2013). The concept of generativity may well be important in the executive function of youth with ASD (e.g., Turner 1999), but given the present findings, more work is clearly needed to determine how this construct relates to clinical and research outcomes.

The relevance and utility of the FS in comparison to other informant report measures lies in its multi-dimensional assessment of flexibility; focus on higher-order, verbally and socially mediated expressions of flexibility as opposed to behavior problems; inclusion of items capturing the potential strengths associated with less-flexible thinking; and introduction of the Social Flexibility factor, foreshadowed in the executive function treatment literature. When comparing the BRIEF Shift domain to the FS subdomains, the Transitions/Change factor and BRIEF Shift domain are highly correlated, which is not surprising given that seven of the eight items making up the BRIEF Shift domain specifically target problems tolerating transitions and change. However, the relationships between the BRIEF Shift and other FS factors are much less strong, highlighting the distinctiveness of these other FS factors from the BRIEF Shift. Comparing the FS and the RBS-R, the total intensity of presentation between the measures is highly correlated, whereas the factor structures and subdomain relationships are less symmetrical and more nuanced. A focus on subdomains of the FS is appropriate, as its structure is best expressed in five factors, not one, by all indicators of best fit for EFA. The RBS-R Sameness Problems and FS Transitions/Change show the strongest subdomain correlations, clearly measuring highly related insistence on sameness constructs. However, the relationships between other RBS-R and FS domains are less parallel. For example, the RBS-R Restricted Behavior Problems domain has similar level relationships to three FS domains, while the FS Social Flexibility factor shows smaller relationships with all RBS-R domains. The fact that the measures are related, but not parallel in terms of factor structure and subdomain relationships, suggests that each is capturing different, but related constructs based on the focus of their item-sets: the FS with its focus on higher-order and more verbally/socially mediated expressions of flexibility and the RBS-R with its focus on problems with repetitive behaviors.

This study has several limitations. It employed a convenience sample of individuals who volunteered to participate in research that required visits to a hospital in dense urban areas, and in many cases, part of the study included a neuroimaging component. This is not likely to represent the ASD population at large, and a future study should work to obtain a community sample of individuals with ASD that represents the population at large (matching U.S. census rates in race/ethnicity, socioeconomic status and urbanicity). Although the item set was generated through an iterative process based on known components of flexibility in ASD as well as hypothesized factors based on extensive clinical experience, it is possible that the item set does not capture a full range of skills that comprise flexibility. In this light, there are particular concerns with the generativity-related factor, as discussed above. There were insufficient TD participants to explore the factor structure in the non-ASD group. Due to a site error, a reduced number of participants completed the entire RBS-R, reducing the number of participants with RBS-R Restricted Behavior scores and RBS-R Total scores. Those who received and did not receive the complete RBS-R did not differ in IQ or gender, but those who received the complete RBS-R were older (mean difference of slightly more than 1 year). It is not known how this might have impacted the results. A further limitation of the study is that the FS was developed for informant report and in its current iteration does not include a self-report version. Capturing and assessing the inner experience of flexibility and inflexibility from a child’s perspective will be an important future direction, though likely challenging in ASD given the reduced insight youth with ASD often have regarding their internal states. Finally, this study investigated the FS in youth with ASD and typically developing controls, but not in other populations with known flexibility challenges (e.g., anxiety disorders, OCD). In spite of these limitations, the FS is found to be a psychometrically-sound, brief parent-report measure of dimensions of executive function flexibility skills in ASD without intellectual disability, including early support for construct validity.

Future studies should evaluate the measure over time, and especially in the context of executive function as well as social interventions. Future research with the FS may also build on the concept of neurodiversity in ASD; the FS was designed to include content reflecting potential strengths inherent in less flexible thinking. An example of an application of this concept would be evaluation of whether the ability to “get stuck” on daily routines and rituals supports a young person’s ability to “get stuck” on beneficial cognitive behavioral therapy scripts and routines, such as in the Unstuck and On Target executive function intervention (Kenworthy et al. 2014). Finally, future studies should examine the FS in other populations for which flexibility is a known challenge, including comparisons of factor structures of the FS across different disorders. The FS is available by request from the corresponding author for general use.