Executive functioning (EF) is one of the most important constructs in understanding individuals’ academic performance. Executive functions (EFs) are a group of higher-level cognitive functions that allow individuals to initiate, maintain, monitor, adjust, and complete goal-directed actions (Dawson and Guare 2010; Dempster 1992; Lezak 1995; Miyake et al. 2000). Major EFs identified from cognitive research include working memory (the ability to mentally hold and manipulate information, often while performing another task and/or dealing with distractions), inhibition (the ability to suppress prepotent responses), and mental flexibility (the ability to switch between tasks or rules or sets; Chan et al. 2008; Diamond 2014; Engle 2002; Miyake et al. 2000; Miyake and Friedman 2012). These core functions facilitate higher-level processes such as problem-solving, reasoning, and decision-making (Collins et al. 2012; Lunt et al. 2012) and predict academic performance in students from PreK to college (Baars et al. 2015; Best et al. 2011).

In college students, for example, EF predicts academic performance above and beyond high school grades and standardized test scores (Credé and Kuncel 2008). EF deficits are also correlated with symptoms and behaviors that negatively impact academic performance, such as anxiety, stress, depression, adjustment problems, and procrastination (Petersen et al. 2006; Rabin et al. 2011; Wingo et al. 2013). The robust relation between EF and academic success has triggered a recent proliferation of promising interventions (e.g., executive coaching; Dawson and Guare 2012) designed to improve students’ EF skills. However, it is unclear how best to plan, tailor, and measure the effectiveness of these interventions.

Measurement of EF

EF can be measured using direct performance tests or rating scales. The bulk of previous EF research has been conducted using direct performance tasks or neuropsychological tests, such as remembering and manipulating digits (digit span; e.g., Diamond 2014). However, researchers have increasingly suggested that EF behavior rating scales are more ecologically valid than direct performance tests (Barkley 2012; Barkley 1991; Dawson and Guare 2010; Dehn 2008; Isquith et al. 2013; Samuels et al. 2016; Toplak et al. 2013). Furthermore, direct tests of EF and EF rating scales are weakly correlated (e.g., r = .19)—indicating that these methods measure different aspects of EF (Toplak et al. 2013). Neuropsychological tests are administered in highly controlled settings and are not representative of how individuals use EFs in their daily lives, whereas rating scales measure EF behaviors that occur in natural environments across time. Additionally, rating scales offer increased efficiency, accessibility, and convenience, especially given advances in web-based administration and scoring platforms. This is particularly important when using measures with large groups to inform interventions and at multiple time points to evaluate interventions.

Currently Available EF and Related Scales

Currently available adult EF rating scales either have high efficiency and accessibility but poor technical adequacy (e.g., Adult Executive Functioning Inventory or ADEXI; Holst and Thorell 2018), or strong technical adequacy but high financial and time costs for training, administration, and scoring. The latter are designed to detect clinical disorders as part of individual diagnostic evaluations rather than to inform and track intervention effectiveness (e.g., the Behavior Rating Inventory of Executive Function-Adult or BRIEF-A, Roth 2005; and the Barkley Deficits in Executive Function Scale or BDEFS, Barkley 2011). There are also study strategies scales specifically designed for college students that include EF (e.g., Learning and Study Strategies Inventory or LASSI, Weinstein et al. 1987); however, these scales tend to blur boundaries between EF and related, but non-synonymous, constructs, such as study skills, learning preferences, and psychological symptoms (Credé and Kuncel 2008). Given the importance of EF for academic success and the limitations of available adult EF rating scales, practitioners would benefit from access to reliable and valid, non-clinical, affordable, efficient, and academically focused measures of a range of specific EF-related behaviors. This study is the first step in providing an EF scale that meets these needs.

The Current Study

The current study describes the refinement and preliminary psychometric evaluation of the self-report Executive Skills Questionnaire-Revised (ESQ-R) rating scale, a substantial revision of the informal checklists (all called Executive Skills Questionnaires, ESQs) previously published in a series of popular and widely available books for educational support professionals that offer a variety of interventions for different EF skill areas (Dawson and Guare 2010, 2012, 2018). We selected items from the original ESQ versions, generated and refined additional items, used factor analyses to reduce the item pool, and evaluated the resulting ESQ-R version for preliminary reliability and validity evidence.

Method

Participants

We recruited 410 participants enrolled at a regional public university as undergraduate, graduate, or post-baccalaureate/non-degree-seeking students through the university Psychology Participant Pool and campus-wide advertisements. Informed consent and, when appropriate for age, parental consent and child assent were obtained for all individual participants included in the study in procedures approved by the university IRB (Committee for the Protection of Human Subjects or CPHS). (There were two 17-year-olds in the current sample, both of whom participated through the Participant Pool, which requires parent/guardian consent to enroll in the Pool and participate in approved research).

Of the 410 participants who consented to the study, 36 (8.8% of the original sample) were removed due to incomplete data (i.e., participants who started the questionnaire but did not complete it). Most non-completers exited the survey early, after reading the consent form and directions but before completing the ESQ or other measures. (On average, non-completers completed 13.94% of the entire questionnaire, which included the consent form and instructions.) Comparisons between completers and non-completers are not available because demographic and other individual data were gathered as the final items in the questionnaire to reduce stereotype threat and other demographic-related impacts on participant responses.

The total sample with completed ESQ-Rs included 374 participants. Demographic characteristics appear in Table 1. Mean age of participants was 26.28 years (range = 17–55, SD = 7.61, with 70% between ages 17 and 27). Mean GPA was 3.25 (range = 0.0–4.0, SD = 0.61, with 65% between 2.7 and 3.7). When compared to the demographic composition of the USA according to the 2010 census, women and Hispanic participants were overrepresented, while men were underrepresented. When compared to the university student body at the institution from which participants were recruited, women and White participants were overrepresented.

Table 1 Demographic characteristics (N = 374)

Procedures

Data collection was part of a larger series of studies and occurred in three waves over a 1-year period. Therefore, sample sizes differ slightly for the various measures that were included in consecutive versions of the study questionnaire. For all waves, participants received a link to an online questionnaire administered through Qualtrics and completed the measures on their own devices. The questionnaire included the online consent form, followed by the measures described below presented in blocks by topic area (e.g., EF, psychological symptoms, etc.) and randomized within each block, followed by demographic questions. (Participants under 18 years old were also required to obtain parental consent prior to registering for research participation through the university psychology subject pool.) To obtain preliminary test-retest estimates, a subset of 38 participants took the measures twice. Average time between administrations was 100 days (SD = 72), as most participants took the questionnaire once in the Fall semester and again in Spring (see “Results”). All participants earned course credit for participation.

Measures

The Executive Skills Questionnaire-Revised (ESQ-R)

The ESQ-R self-report rating scale integrates current scientific understanding of core EFs (Chan et al. 2008; Diamond 2014; Miyake et al. 2000; Miyake and Friedman 2012) with an ecologically valid understanding of EF that is directly applicable to academic contexts and tasks and directly tied to available EF interventions. It represents a substantial revision of Dawson and Guare’s various ESQ versions (Dawson and Guare 2010, 2012, 2018), based on the psychometric and expert review procedures described in the current manuscript.

In the original ESQ versions and popular books, Dawson and Guare conceptualize EF as “executive skills” (ESs; Dawson and Guare 2010, 2012, 2018). This term highlights the malleability of “skills,” as opposed to the traditional conceptualization of EF as “abilities,” which implies inherent or stable competencies that cannot be improved through intervention. The conceptualization of EF as skills also encompasses broader academic and behavioral manifestations of the major EFs than traditional laboratory EF tasks. These skills are observed when individuals apply core EFs to real-world academic tasks, such as studying for tests, planning large projects, and paying attention in class. The 11 ES areas included in the original ESQ versions and on the ESQ-R are planning/prioritization (P), organization (O), time management (TM), working memory (WM), metacognition (M), response inhibition (RI), emotional control (EC), sustained attention (SA), task initiation (TI), flexibility (F), and goal-directed persistence (GDP). The ESQ-R includes items designed to measure this broad range of ESs to emphasize academic applicability and strengthen the link to intervention.

The ESQ-R directions state, “Read each item and decide how often it’s a problem for you.” We changed the original ESQ response scales, which ranged from Strongly Disagree (1) to Strongly Agree (7) on the adult version (Dawson and Guare 2012) and Big Problem (1) to No Problem (5) on the earlier child/adolescent student version (Dawson and Guare 2010), to a frequency-based response scale that includes the following options: Never or Rarely (0), Sometimes (1), Often (2), and Very Often (3). This frequency-based response scaling method better reflects an attempt to measure the quantity of constructs (DeVellis 2017), enhances sensitivity to change over short periods of time (Fok and Henry 2015), and is similar to the response scaling on some of the most well-validated behavior self-report scales (e.g., the Behavior Assessment System for Children, Third Edition; Reynolds and Kamphaus 2015). Items describe difficulties with ESs. Scores for each item range from 0 to 3, with higher scores indicating more ES problems. The order of items is randomized for each participant to minimize order effects.

The original ESQ checklists include slightly different versions for different age groups (Dawson and Guare 2010, 2012, 2018). We selected the items from the versions geared toward older students that were most relevant to academic tasks and most representative of the current scientific understanding of EF (Chan et al. 2008; Diamond 2014; Miyake et al. 2000; Miyake and Friedman 2012). After selecting items from the original ESQs to represent each of the 11 skill areas proposed by Dawson and Guare, we refined item wording using guidelines from survey and scale development literatures (e.g., eliminating compound and confusing items, rewording negatively worded items, removing specific examples to broaden applicability, etc.; DeVellis 2017; Fok and Henry 2015; Holmbeck and Devine 2009; Visser et al. 2000).

An initial 32-item pool was reviewed by one of the authors of the original ESQ versions and the books in which they appear. This content expert is a doctoral-level licensed psychologist and school psychologist with extensive training, clinical expertise, and publications and presentations focused on ESs. We pilot tested the initial 32 items with 30 college students (90% undergraduate; 67% women, 30% men, and 3% prefer not to say; 47% White, 10% Black or African-American, 10% Asian, 3% American Indian or Alaskan Native, 3% prefer not to say; 27% Hispanic, Latino, or Spanish origin). For the 32-item pilot version, internal consistency was good (Cronbach’s alpha = 0.95); however, expert review indicated that some ES areas were underrepresented. Applying the Spearman-Brown Prophecy Formula suggested that increasing the number of items to 60 would increase internal consistency reliability to 0.97. With consultation from the ES expert, we created new items to reflect ESs in underrepresented areas, such that each ES had a minimum of four items, until the development team and the expert agreed that all ESs received adequate coverage. This resulted in an expanded candidate item pool of 61 items, which was administered to study sample participants and used in the factor analyses described below. The 61 candidate items were distributed across the ES areas hypothesized by Dawson and Guare (2010) as follows: five planning/prioritization (P), five organization (O), six time management (TM), five working memory (WM), six metacognition (M), five response inhibition (RI), six emotional control (EC), seven sustained attention (SA), four task initiation (TI), six flexibility (F), and six goal-directed persistence (GDP). Internal consistency for the total sample (n = 374) for the 61-item version was good (Cronbach’s alpha = 0.96). Test-retest reliability for the subsample who took the measures twice (n = 38) was r = .74 for all 61 candidate items.
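
As a worked check of the Spearman-Brown projection mentioned above (a standard psychometric identity, applied here with the pilot alpha of .95 and a lengthening factor of k = 60/32):

$$\rho_{\text{new}} = \frac{k\,\rho_{\text{old}}}{1 + (k - 1)\,\rho_{\text{old}}} = \frac{1.875 \times .95}{1 + .875 \times .95} \approx .97$$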

Convergent Validity Measures

We administered two additional nonclinical, self-report EF scales that are also widely available, efficient, and suitable for older students. The 14-item Adult Executive Functioning Inventory, Self-Report form (ADEXI; Holst and Thorell 2018) measures skills and behaviors related to working memory and inhibitory control. Participants use a five-point scale (e.g., 1 = Not True and 5 = Definitely True) to respond to statements by indicating how well each one “describes [them] as a person.” Higher scores indicate more difficulties. Holst and Thorell reported a two-factor solution for the ADEXI, with the working memory and inhibitory control factors highly correlated (r = .69). Reported internal consistency was .91 for the whole sample and .89 for the nonclinical sample, and reported test-retest reliability was .71 (bivariate correlation) and .67 (intra-class correlation). For the current sample (n = 374), the ADEXI showed adequate internal consistency (Cronbach’s alpha = .90) but poor test-retest reliability (r = .52, n = 38).

We also used eight EF-relevant items from the Current Behavior Scale (CBS; see Biederman et al. 2008), with modified instructions (past 2 weeks instead of past 6 months, to emphasize current behaviors). This informal measure has been published only as part of a research manuscript, but it was authored by the developer of the BDEFS (Barkley 2011) and is similar to that scale. For the current sample (n = 374), the CBS showed adequate internal consistency (Cronbach’s alpha = .91) but low test-retest reliability (r = .52, n = 38).

Discriminant Validity Measures

Psychological difficulties such as depression, anxiety, and stress are known to negatively impact EF (Ajilchi and Nejati 2017; Wingo et al. 2013) but are not synonymous with EF difficulties. The Generalized Anxiety Disorder 7-item scale (GAD-7; Spitzer et al. 2006) is a global measure of anxiety symptoms over the past 2 weeks. Participants use a four-point scale (e.g., 0 = Not at all sure and 3 = Nearly every day) to rate how often they experience specific anxiety symptoms. Higher scores indicate higher levels of anxiety. The GAD-7 has reported internal consistency of .92 (Cronbach’s alpha) and test-retest reliability of .83 (intra-class correlation). In the current sample, internal consistency was alpha = .91 (n = 364) and test-retest reliability was r = .79 (n = 38).

The 10-item Perceived Stress Scale (PSS; Cohen et al. 1983) assesses global perceived situational stress levels. Participants use a five-point scale (e.g., 0 = Never and 4 = Very Often) to indicate how often they experienced various feelings and thoughts in the past month. Four items are positively worded (and reverse scored), and the other six are negatively worded (i.e., ask about problems). Higher scores indicate higher levels of stress. For the current sample, internal consistency was alpha = .79 (n = 374) and test-retest reliability was r = .58 (n = 38).

The 21-item Depression Anxiety Stress Scales (DASS-21; Lovibond and Lovibond 1995) asks participants to indicate how much given statements applied to them over the past week. There are seven items for each subscale: Depression, Anxiety, and Stress. Participants rate items on a four-point scale (e.g., 0 = Did not apply to me at all and 3 = Applied to me most of the time). All items are worded in terms of problems, such that higher scores indicate more symptoms. Research with a large non-clinical adult sample indicated good internal consistency (Cronbach’s alpha = .91, .80, and .84 for Depression, Anxiety, and Stress, respectively) and supported the three-factor structure of the Depression, Anxiety, and Stress subscales (Sinclair et al. 2012). For the current sample, internal consistency was alpha = .94 (n = 364) for the total DASS score, .88 for the Depression scale, .85 for Anxiety, and .86 for Stress. Test-retest correlations were .69 for the total score, .59 for Depression, .60 for Anxiety, and .74 for Stress (n = 38).

Criterion Validity Measures

We evaluated criterion validity for the ESQ-R by investigating correlations with university grade point average (GPA) and academic engagement, an important correlate of achievement and adjustment (Zhang et al. 2012). Students self-reported current GPA on the university’s standard 4.0-point scale. Grade data could not be obtained directly from the university for administrative reasons; however, a previous meta-analysis showed that “self-reported grades generally predict outcomes [such as future GPA] to a similar extent as actual grades” (Kuncel et al. 2005, p. 76), and the correlation between self-reported and actual GPAs in their combined sample of 12,089 college students was r = .90.

The 21-item Student Course Engagement Questionnaire (SCEQ; Handelsman et al. 2005) asks students to rate their academic engagement over the past week on a scale from 1 = Not at All Engaged to 5 = Very Engaged. The measure has a four-factor structure: skills, emotions, participation, and performance, with internal consistency ranging from .76 to .82 across the scales. The SCEQ authors reported that SCEQ scores explained 26% of the variance in homework grades and 30% in final exam grades. For the current sample, internal consistency was alpha = .91 (n = 364) and test-retest reliability was r = .79 (n = 38).

Results

Factor Analyses

We conducted a series of exploratory factor analyses (EFAs) and confirmatory factor analyses (CFAs) to reduce the item pool. No missing data handling was necessary, as only participants with full data for the ESQ-R were included in the factor analyses. The latent trait hypothesized to underlie all ESQ-R items is ES difficulties, with higher scores associated with more difficulties. First, we conducted principal components analyses (PCAs) with Varimax rotation and without constraints on the number of factors, using the criterion of eigenvalues greater than 1.0, in SPSS 25.0. We inspected the results and reduced the number of factors for subsequent models when fewer than three items loaded on an identified factor. We then ran the EFA on the same items, constrained the analysis to that number of factors (e.g., five), and flagged items for removal that showed either (a) no loadings above 0.40 on any factor or (b) loadings of 0.40 or above on more than one factor (cross-loadings). Flagged items were removed after expert review for content (i.e., to ensure approximately equal weight given to the full range of ES areas and to reduce redundancy with other items); items from original ESQ versions published in Dawson and Guare books were given priority for retention.
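
A minimal sketch of this item-screening logic appears below. It is illustrative only: the actual analyses were conducted in SPSS 25.0, the variable names are hypothetical, and the factor_analyzer package is used simply as a convenient stand-in for a rotated principal components solution.

```python
# Illustrative sketch of the eigenvalue and loading rules described above
# (assumption: `items` is a DataFrame of ESQ-R item responses scored 0-3).
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer  # pip install factor-analyzer

def screen_items(items: pd.DataFrame, cutoff: float = 0.40):
    # Retain as many factors as there are eigenvalues of the item
    # correlation matrix greater than 1.0.
    eigenvalues = np.linalg.eigvalsh(items.corr().to_numpy())
    n_factors = int((eigenvalues > 1.0).sum())

    # Principal-components extraction with Varimax rotation, constrained
    # to that number of factors.
    fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax", method="principal")
    fa.fit(items)
    loadings = pd.DataFrame(fa.loadings_, index=items.columns)

    # Flag items with no loading >= .40 on any factor, or loadings >= .40
    # on more than one factor (cross-loadings).
    salient = (loadings.abs() >= cutoff).sum(axis=1)
    mask = ((salient == 0) | (salient > 1)).to_numpy()
    flagged = loadings.index[mask].tolist()
    return n_factors, loadings, flagged
```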

Using Mplus 7.1 (Muthén and Muthén 2013), we ran the remaining items through separate CFAs for each of the factors from the initial EFA. Additional problematic items were identified from CFA modification indices that suggested correlated residuals among items. Items with high suggested correlations were removed when their content overlapped with the correlated items (again, with priority given to retaining original ESQ version items). Once items were removed, the new item pool was assessed through another series of PCAs and CFAs. This process continued until each CFA had either good fit or was just-identified (i.e., the number of estimated parameters equals the number of elements in the covariance matrix, which results in zero degrees of freedom and an inability to estimate fit statistics), and no additional meaningful modification indices were present.
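
For reference, the just-identified cases follow standard CFA identification arithmetic: assuming a single-factor model with the factor variance fixed to 1 and uncorrelated residuals, a factor with p indicators estimates 2p parameters against p(p + 1)/2 observed variances and covariances, so

$$df = \frac{p(p+1)}{2} - 2p, \qquad p = 3: \; df = 6 - 6 = 0 \;\text{(just-identified)}.$$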

In the first round of analyses, the EFA identified eight poorly fitting items. After examining item content to ensure removal would not narrow representation of the range of ES areas, these items were dropped from further analyses. Another EFA was then conducted with the remaining 53 items, and a five-factor model was identified. Fit for each factor in the CFAs was mixed or poor (χ2(9–299) = 82.15–683.04, all p < .001; RMSEA = .06–.21; CFI = .76–.94), and modification indices identified additional items to consider for removal. After examining item content, 11 additional items were dropped from the analyses, which resulted in 42 remaining items (of the initial 61-item pool).

In the second round of analyses, the EFA indicated that these 42 items still fit a five-factor model. One item cross-loaded on multiple factors and, after examining item content, was removed from further analyses. Separate CFAs were run for each factor from the EFA using the remaining 41 items. Fit for three factors was mixed (χ2(2–230) = 25.29–497.03, all p < .001; RMSEA = .06–.18; CFI = .90–.95), fit for one factor was good (χ2(9) = 10.44, p = .32; RMSEA = .02; CFI = 1.00), and one factor was just-identified. CFA modification indices identified items to consider for removal, and six additional items were dropped from further analyses, which left 35 items.

For the third round of analyses, the EFA again supported a five-factor model. Based on cross-loadings and item content, an additional six items were dropped from further analyses. Separate CFAs were run for each factor from the EFA using the remaining 29 items. Fit for one factor was acceptable (χ2(119) = 226.88, p < .001; RMSEA = .05; CFI = .96), fit for two factors was good (χ2(2) = 3.52, p = .17; RMSEA = .05; CFI = 1.00 and χ2(2) = 3.48, p = .18; RMSEA = .05; CFI = .99), and two factors were just-identified. Using modification indices and item content, an additional four items were dropped from the one factor with acceptable fit, which left 25 items from the original measure. After dropping these four items, the factor that previously had acceptable fit had good fit (χ2(44) = 54.68, p = .13; RMSEA = .03; CFI = .99). After establishing good fit in the individual factors, a full model estimating all five factors was run. The full five-factor model had acceptable fit (χ2(265) = 423.38, p < .001; RMSEA = .04; CFI = .95). Table 2 shows the 25 retained items’ loadings on the five factors.

Table 2 Retained items on the 25-item ESQ-R Scale and factor loadings

We used participants’ total scores from this 25-item ESQ-R version to estimate reliability and correlations with other measures, with the total score calculated as the sum of scores for all 25 items (possible range 0–75). The items were distributed across the ES areas hypothesized by Dawson and Guare (2010) as follows: two planning/prioritization (P), two organization (O), two time management (TM), two working memory (WM), three metacognition (M), three response inhibition (RI), four emotional control (EC), two sustained attention (SA), one task initiation (TI), two flexibility (F), and two goal-directed persistence (GDP).
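
A minimal scoring sketch, under the assumption that item responses are stored in a data frame with one column per retained item (column names are hypothetical):

```python
# Total ESQ-R score: sum of the 25 retained items, each scored 0-3
# (possible range 0-75); higher scores indicate more ES difficulties.
import pandas as pd

def esqr_total(responses: pd.DataFrame, item_cols: list) -> pd.Series:
    assert len(item_cols) == 25, "the final ESQ-R version retains 25 items"
    return responses[item_cols].sum(axis=1)
```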

Reliability

Reliability estimates were calculated for ESQ-R total scores using Classical Test Theory (CTT). For the 25-item ESQ-R total score, internal consistency estimates were excellent: Cronbach’s alpha = .91, Guttman split-half coefficient = .91. Internal consistency estimates for the items in the five factors described previously were as follows: .89 for Factor 1 (11 items), .74 for Factor 2 (4 items), .76 for Factor 3 (3 items), .75 for Factor 4 (3 items), and .65 for Factor 5 (4 items; see Table 2 for items included in each factor).
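
The sketch below illustrates how these CTT estimates can be computed from item-level data. It is illustrative only: variable names are assumptions, and the split-half shown uses an odd/even split of items, which may differ from the split used by the statistical package that produced the reported value.

```python
# Classical test theory reliability estimates from a DataFrame of item scores.
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def guttman_split_half(items: pd.DataFrame) -> float:
    # Guttman split-half coefficient for an odd/even split of the items
    half_a = items.iloc[:, ::2].sum(axis=1)
    half_b = items.iloc[:, 1::2].sum(axis=1)
    total_var = (half_a + half_b).var(ddof=1)
    return 2 * (1 - (half_a.var(ddof=1) + half_b.var(ddof=1)) / total_var)
```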

Test-retest reliability for ESQ-R total scores (25-item version) between Time 1 and Time 2 was r = .70 (n = 38). Using the final 25 items, there was no significant effect of delay interval between Time 1 and Time 2 on Time 2 ESQ-R scores (β = .05, t(36) = 1.33, p = .19) or on the absolute difference between Time 1 and Time 2 ESQ-R scores (β = .003, t(36) = 0.19, p = .85). Thus, despite variability in delay among the 38-participant subsample, delay was not associated with a consistent score increase or decrease, and scores did not become significantly more inconsistent over time.
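
A sketch of these delay analyses, assuming a retest data frame with hypothetical columns delay_days, esqr_t1, and esqr_t2, and ordinary least squares regressions with an intercept (the original analyses may have been specified differently):

```python
# Regress Time 2 scores, and the absolute Time 1 - Time 2 difference,
# on the number of days between administrations.
import pandas as pd
import statsmodels.api as sm

def delay_effects(retest: pd.DataFrame):
    X = sm.add_constant(retest["delay_days"])
    time2_model = sm.OLS(retest["esqr_t2"], X).fit()
    abs_diff = (retest["esqr_t1"] - retest["esqr_t2"]).abs()
    diff_model = sm.OLS(abs_diff, X).fit()
    return time2_model, diff_model
```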

Convergent, Discriminant, and Criterion Validity

Table 3 shows correlations between ESQ-R total scores and other EF measures (range r = .56–.74), as well as psychological symptom measures (r = .38–.55), student academic engagement (r = −.40), and college GPA (r = −.07). All correlations were statistically significant at the p < .001 level, except for that between ESQ-R scores and GPA (r = − .07, p = .175, n = 374). ESQ-R scores were also significantly correlated with age (r = − .118, p = .023, n = 373). Notably, of all the measures administered in the current study, only the student academic engagement measure correlated significantly with GPA (r = .199, p < .001, n = 374).
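
These coefficients are simple bivariate Pearson correlations; a sketch of how they might be computed is shown below (column names are hypothetical, and pairwise deletion is assumed because sample sizes differ across measures):

```python
# Pearson correlations between ESQ-R total scores and the other measures.
from scipy.stats import pearsonr

def validity_correlations(df, criterion="esqr_total",
                          others=("adexi", "cbs", "dass", "gad7", "pss", "sceq", "gpa")):
    results = {}
    for col in others:
        pair = df[[criterion, col]].dropna()
        r, p = pearsonr(pair[criterion], pair[col])
        results[col] = {"r": r, "p": p, "n": len(pair)}
    return results
```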

Table 3 Correlations among measures

Discussion

The current study described the refinement and initial psychometric evaluation of the Executive Skills Questionnaire-Revised (ESQ-R), a self-report ES rating scale. We designed the ESQ-R to adequately represent a range of ESs important for academic success in a way that is specifically tied to available EF interventions.

ESQ-R Development

Initial development was successful in improving the efficiency of the measure by reducing the candidate pool of 61 items to a final 25-item version that retained representation of all 11 ES areas originally hypothesized by Dawson and Guare (2010). Notably, factor analyses resulted in a five-factor structure for the ESQ-R, which deviates from Dawson and Guare’s (2010) original conception of 11 distinct ES areas. The five factors appear to represent ESs related to making, adjusting, monitoring, and sticking with a plan (Factor 1, Plan Management: 11 items); time management and switching tasks (Factor 2, Time Management: 4 items); organization of materials (Factor 3, Materials Organization: 3 items); emotional regulation (Factor 4, Emotional Regulation: 3 items); and impulsivity/inhibition (Factor 5, Behavioral Regulation: 4 items). All factors had internal consistency estimates at or above .70 in the current sample, except for Factor 5. This suggests that additional development may be needed for consistent and comprehensive representation of behavioral regulation in the ESQ-R.

The first factor is the largest and includes items originally hypothesized to represent Dawson and Guare’s ES areas of planning/prioritization, metacognition, emotional control, sustained attention, flexibility, and goal-directed persistence. It is common to identify a large and inclusive first factor, but this factor’s representation of multiple ES areas from Dawson and Guare’s (2010) model complicates the one-to-one link between ESQ-R scores and specific ES area interventions. Although future test development efforts may identify items that would more clearly distinguish among the included ES areas in this factor, it is also possible that there are fewer distinguishable ES areas than originally hypothesized and that interventions for one ES area in this factor (e.g., metacognition) would result in skill transfer to related areas (e.g., planning/prioritization). This hypothesis would need to be explicitly tested with participants receiving ES interventions.

The latter explanation is supported by the similarity of the current results to those found for clinical EF rating scales, which generally include two to five factors. For example, Barkley (2011) identified five EF factors measured on the BDEFS, and Roth and colleagues (2013) identified a three-factor structure on the BRIEF-A. In the future, researchers should examine the relations among factors on the ESQ-R and such clinical EF scales. Unfortunately, we were unable to include these scales because of funding constraints. The cost of such measures, as previously noted, is also a likely obstacle for practitioners wishing to use them to plan, tailor, and evaluate interventions—especially at multiple time points and with large groups of students.

Psychometric Properties of the 25-Item ESQ-R

For the 25-item ESQ-R total score, we found excellent internal consistency and adequate test-retest reliability across a wide range of delay intervals, with no significant effects of delay on scores. These are important properties for an intervention-focused measure. Additionally, although our test-retest sample was small, the ESQ-R had the strongest test-retest reliability (r = .70) of the researcher-developed EF measures administered (r = .52 for both the CBS and the ADEXI), and ESQ-R scores were moderately related to scores on those measures (r = .69 to .74). Although further research regarding test-retest stability versus sensitivity to change is needed, the current results are promising for the ESQ-R as a repeatable measure of intervention outcomes.

The current convergent and discriminant validity results are also promising and begin to elucidate a nomological network. Examining patterns in Table 3, ESQ-R correlations with the other EF measures were all higher than those between the ESQ-R and the psychological symptom measures (e.g., DASS, GAD-7, and PSS), and all correlations were in the expected directions (i.e., scores indicating more problems on the ESQ-R were associated with scores indicating more psychological symptoms).

Further, ESQ-R scores were significantly correlated with student academic engagement scores (SCEQ), which were, in turn, correlated with students’ self-reported GPAs. However, the direct correlation between ESQ-R scores and GPA was not significant. This is likely because, as Table 1 indicates, our sample had relatively high GPAs, which may have produced restriction of range and hampered our ability to detect relations at the lower end of the GPA distribution, among struggling students for whom ESs may matter most. Of all the measures administered in the current study, only academic engagement scores showed a significant correlation with GPA, suggesting that the weak correlation between ESQ-R scores and GPA may be a shared feature of EF rating scales rather than unique to the ESQ-R.

Limitations and Future Research

The current study has several limitations and areas for future improvement. First, sample size and representativeness could be improved; cross-validation studies with larger, more representative samples may yield somewhat different correlations and reliability estimates, as well as alternative factor structures. The test-retest sample should also be expanded, especially given the intent of the ESQ-R as an intervention-focused measure. Further, the current sample included multiple individuals with (self-reported) disability conditions such as attention deficit/hyperactivity disorder (ADHD), which could have influenced results; however, other measures have included such heterogeneous samples and have touted this inclusivity as an advantage (see Barkley 2011). In fact, future studies may benefit from explicitly recruiting clinical samples with conditions known to involve EF impairment, such as ADHD.

Future research should focus on further scale refinement and additional psychometric data collection to support using the ESQ-R as a comprehensive but efficient measure for informing and measuring effectiveness of EF interventions. We plan to further evaluate basic psychometric properties (e.g., factor structure in different samples) and advanced measurement characteristics (e.g., invariance across cultural and clinical groups), as well as relations among scores on the ESQ-R and other measures of improvement (e.g., actual GPA, grades, other EF scales, and tests) across time. Future studies should also examine ESQ-R scores as predictors of retention and other important academic outcomes. Finally, we plan to evaluate the psychometric properties of the 25-item ESQ-R with an extended age range, including middle and high school students, and to adjust item content according to the resulting data. This will increase applicability to different populations of older students who may benefit from ES interventions and for practitioners who need efficient, reliable measures to evaluate these possible benefits.

Conclusion

In the current study, we addressed limitations of available EF measures by developing a comprehensive but time- and cost-efficient ES self-report scale with adequate to excellent reliability and validity. Future studies are needed to increase sample representativeness and expand psychometric evidence. Given the current results, the ESQ-R is a promising tool for practitioners to plan, tailor, and evaluate the effectiveness of interventions for multiple ES areas.