Introduction

Youth with internalizing disorders are common, but underserved in community and school care. They often “fly under the radar” because they tend not to disrupt class or violate school rules, and many may not have obvious outward displays of symptoms. Although under-identified, epidemiological research indicates that approximately 14 and 32 % of American youth will have a mood or anxiety disorder, respectively, at some point during childhood or adolescence (Merikangas et al., 2010). Internalizing forms of psychopathology are associated with a host of negative psychosocial and academic outcomes, including impaired social relationships, engaging in substance use and other risky behavior, development of future mental health problems, and decreased academic achievement or school failure (Copeland, Miller-Johnson, Keeler, Angold, & Costello, 2007; Grover, Ginsburg, & Ialongo, 2007; Ialongo, Edelsohn, & Kellam, 2001). Compared with the rates of mental health services received by youth with disruptive behavior disorders, children with internalizing disorders are particularly unlikely to receive any treatment for their disorder (Bradshaw, Buckley, & Ialongo, 2009; Merikangas et al., 2011). When treatment is received, it is commonly within the school setting (Merikangas et al., 2011). A recent meta-analysis supports that both depression and anxiety disorders can be treated in schools effectively using cognitive-behavioral interventions traditionally delivered in clinical or research settings (Mychailyszyn, Brodman, Read, & Kendall, 2012). For schools to provide mental health services in a proactive and preventative fashion, accurate and early identification of vulnerable children with elevated symptomatology is crucial.

Prevalence and Form of Anxiety and Depression in Children

Anxiety disorders are most likely to first manifest during the elementary school years, whereas onset of a depressive episode typically occurs later, during adolescence (Merikangas et al., 2010). Among elementary school-aged children, prevalence rates for anxiety range widely, from 5 to 6 % point prevalence estimates among 9–10 year olds in prospective research (Copeland, Angold, Shanahan, & Costello, 2014) to a cumulative prevalence of 23 % by age 10 using retrospective recall methods (Merikangas et al., 2010). Rates are consistently smaller for depressive disorders among children (i.e., 1–2 %; Goldman, 2012). Even more children have sub-threshold levels of internalizing disorders, symptoms that are problematic in and of themselves and predict more severe forms of emotional distress later.

The nature of the primary symptoms of depression varies across developmental stages, with school-aged children more apt to show signs of irritability or acting-out behaviors, more likely to report a lack of fun (vs. boredom in adolescents) and somatic complaints, and less likely to manifest symptoms of sleep disturbance or appetite/weight change (Goldman, 2012). Although anxiety disorders are marked by heterogeneity of symptoms across distinct disorders, common symptoms that reflect unique expressions of anxiety in children include tantrums, crying, freezing, clinging to caregivers, and shrinking from social situations, as well as less recognition that the fear is unreasonable or excessive (Beesdo-Baum & Knappe, 2012). The most common forms of anxiety in elementary school children include separation anxiety, phobias, and social anxiety, whereas panic attacks, agoraphobia, and generalized anxiety are more likely to onset in adolescence (Beesdo-Baum & Knappe, 2012). Both mood and anxiety disorders are characterized by persistence of symptoms across time, from a period of weeks (major depressive episode) to months (social anxiety, specific phobia; American Psychiatric Association, 2013). Brief symptoms of sadness or fearfulness that remit spontaneously may be less concerning to school mental health providers than symptoms that persist across even short durations, such as from one week to the next.

Methods of Identifying Students in Need of Mental Health Services

Masia Warner and Fox (2012) pointed out that a major challenge to researchers focused on school-based intervention for internalizing disorders is “to find effective and efficient methods to reach students with anxiety and depression,” as the school-wide screenings that may be best practice are unlikely to be feasible without external support (p. 194). Relying on teachers to systematically identify which students display clearly described symptoms of mental health problems is one option that is appealing due to its face validity (teachers presumably know their students well due to their daily contact), efficiency, and relatively low cost. While teacher nominations are an effective way to identify students with externalizing behaviors (Kalberg, Lane, Driscoll, & Wehby, 2011; Lane & Menzies, 2005), teachers’ ability to recognize internalizing symptoms is less established. Nevertheless, educators are often relied on to serve as gatekeepers in the first stage of identifying students who may benefit from mental health services (e.g., Walker & Severson, 1992), including through school–community partnerships (e.g., McLennan, Reckord, & Clarke, 2008) or efficacy trials (e.g., Chiu et al., 2013). To shed light on the validity of this method, we evaluated the accuracy of teacher nominations in identifying elementary school students who repeatedly reported elevated levels of anxiety and depression. A summary of extant empirical support for this and alternate methods follows.

Common methods of systematically identifying students in need of mental health services include universal screening of all students’ symptoms using behavior rating scales completed by student self-report or by informants (i.e., parents, teachers), review of archival data sources, namely office discipline referrals, and teacher nominations (Dwyer, Nicholson, & Battistutta, 2006; Layne, Bernstein, & March, 2006; Levitt, Saka, Romanelli, & Hoagwood, 2007). A method’s accuracy is often discussed in terms of its sensitivity and specificity (Levitt et al., 2007). Sensitivity refers to the proportion of children with a positive diagnosis on a criterion (i.e., positive cases for a given condition) who are correctly identified by the method, such as a specific rating scale. A method’s specificity pertains to the proportion of individuals without the condition who are correctly identified as negative cases. Specificity is important due to the potential stigma and cost associated with false identification.
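
The two definitions above can be illustrated with a minimal Python sketch. It cross-tabulates a method's flags against criterion status and derives sensitivity and specificity from the four resulting counts. The names ScreeningAccuracy and tally, and the ten-student data set, are hypothetical and are not drawn from any study cited here.

```python
from dataclasses import dataclass


@dataclass
class ScreeningAccuracy:
    """Counts from cross-tabulating a screening method against a criterion."""
    true_pos: int   # has condition, flagged by the method
    false_neg: int  # has condition, not flagged
    true_neg: int   # no condition, not flagged
    false_pos: int  # no condition, flagged

    @property
    def sensitivity(self) -> float:
        # Proportion of positive cases that the method correctly identifies.
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        # Proportion of negative cases that the method correctly leaves unflagged.
        return self.true_neg / (self.true_neg + self.false_pos)


def tally(flagged, has_condition) -> ScreeningAccuracy:
    """Build the four counts from paired boolean sequences (one pair per child)."""
    pairs = list(zip(flagged, has_condition))
    return ScreeningAccuracy(
        true_pos=sum(f and c for f, c in pairs),
        false_neg=sum((not f) and c for f, c in pairs),
        true_neg=sum((not f) and (not c) for f, c in pairs),
        false_pos=sum(f and (not c) for f, c in pairs),
    )


# Hypothetical data for ten students (illustration only).
flagged = [True, False, True, False, False, False, True, False, False, False]
has_condition = [True, True, False, False, False, False, True, True, False, False]

acc = tally(flagged, has_condition)
print(f"sensitivity = {acc.sensitivity:.2f}, specificity = {acc.specificity:.2f}")
# prints: sensitivity = 0.50, specificity = 0.83
```

In this toy example the method finds two of four positive cases and wrongly flags one of six negative cases, which shows how a method can have high specificity while still missing half of the children with the condition.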

Universal Screening

Universal screening using self-report measures is the most common method used to identify youth with mental health concerns (Weist, Rubin, Moore, Adelsheim, & Wrobel, 2007). This method entails gathering symptom-focused data on all students within a specific population, with intent to identify those at-risk for academic failure and/or emotional and behavioral difficulties (Glover & Albers, 2007). High sensitivity is prioritized to ensure that a child with an emotional or behavioral problem is not overlooked and subsequently not identified for treatment (Glover & Albers, 2007; Levitt et al., 2007). Schools are often mentioned as a venue for screening (Center for Mental Health in the Schools, 2005); however, on-going debate surrounds the appropriateness of conducting mental health screening within the school setting. Reasons cited in support include early identification of students in need, practicality (i.e., identifying large groups of students at once), cost efficiency (i.e., savings associated with ready access to youth and fewer intensive interventions in the future), and facilitation of student success at school via the subsequent treatment of students identified in the screening process (Center for Mental Health in the Schools, 2005). Arguments against universal mental health screening include potential family opposition to querying youth about symptoms deemed a private matter, insufficient treatment resources to handle the influx of referrals for students who are identified, low specificity (high error rates are especially troublesome when subgroups of students are over-identified), lack of sufficient follow-up assessment resources to correct false positives, and high costs (Center for Mental Health in the Schools, 2005).

Archival School Records

A less intrusive way of examining the functioning of all students within a school involves reviewing existing data, most commonly in the form of office discipline referrals (ODRs). However, ODRs better identify students with externalizing concerns than internalizing concerns (Richardson, Caldarella, Young, Young, & Young, 2009; Walker, Cheney, Stage, & Blum, 2005), possibly because the latter are less likely to violate school rules via disruptive behaviors. While tracking ODRs does not appear to be a viable school-wide means to locate anxious and depressed students, other indicators in school records such as attendance data may be more relevant (Richardson et al., 2009). Extant research with middle school students found low sensitivity from attempts to predict a subgroup with elevated depressive symptoms (31 % of sample that exceeded a clinical threshold on the Mood and Feelings Questionnaire; Angold & Costello, 1987) from information in records, including demographic features (gender, race/ethnicity), special education status, home language spoken, grade point average (GPA), attendance, and frequency of suspensions (Kuo, Vander Stoep, Herting, Grupp, & McCauley, 2013). The most promising combination of predictors (GPA, home language, gender) detected less than a quarter of the subgroup of students with elevated depression.

Teacher Nominations

This method entails asking teachers to identify students (often three) in their classroom who exhibit symptoms of specific forms of psychopathology (e.g., internalizing or externalizing behavior). Teacher nominations are the first stage in what is widely considered a gold-standard multiple-gating system to identify elementary school students with behavior disorders (i.e., the Systematic Screening for Behavior Disorders [SSBD]; Walker & Severson, 1992). Assumed advantages of teacher nomination procedures include that they are efficient, non-intrusive, cost-effective, and in theory universal in that teachers are asked to consider all students in a classroom. However, nomination biases have been identified with respect to student gender and problem type. Boys are more likely to be nominated as demonstrating an emotional or behavioral disorder (Roeser & Midgley, 1997; Soles, Bloom, Heath, & Karagiannakis, 2008). Teachers are as much as five times more likely to put forth students who exhibit externalizing behaviors (vs. internalizing behaviors) when asked to identify students with moderate to severe emotional and/or behavioral difficulties (Soles et al., 2008), in line with teachers’ expressed comfort with identifying externalizing forms of mental health problems (Williams, Horvath, Wei, Van Dorn, & Jonson-Reid, 2007), and limited confidence with identifying depression or anxiety disorders in students (Walter, Gouze, & Lim, 2006). While abundant empirical support exists for the use of teacher nominations as an accurate method for identifying students with externalizing behaviors (e.g., Kalberg et al., 2011; Lane & Menzies, 2005), very few studies could be located that specifically examined accuracy in relation to anxiety and depression.

Accuracy of Teacher Nominations to Detect Students with Internalizing Symptoms

A few relevant studies have provided support for teachers’ ability to detect students with elevated internalizing symptoms by examining nominated versus non-nominated students’ mean scores on outcome measures. For instance, teachers of fifth-grade students who were asked to judge (yes/no) if an individual student in their class “has emotional or behavioral difficulties serious enough that s/he could benefit from seeing a psychologist” (p. 120) nominated students who self-reported more depressive symptoms, anger, and negative affect toward school than their non-nominated classmates, and teachers rated these students as having more disruptive behavior and anxiety (Roeser & Midgley, 1997). Regarding symptoms of depression, fourth-grade students whom teachers nominated as withdrawn were rated by peers as less likeable, interacted less frequently during recess, and reported more distressed cognitions than students nominated by teachers as popular (Ollendick, Oswald, & Francis, 1989). In contrast to these two studies that looked at mental health problems not aligned with specific clinical disorders, Layne, Bernstein, and March (2006) asked teachers (grades 2–5) to indicate “the three most anxious children in the classroom” (p. 386). On the Multidimensional Anxiety Scale for Children (MASC; March, 1997), the nominated children reported significantly more symptoms of anxiety (Total Anxiety T score M = 57.9) than the non-nominated children (M = 54.5). This finding supports the notion that elementary school teachers can select children whose anxiety levels differ from those of their classmates.

We identified only three studies that examined the accuracy of teacher nominations for internalizing concerns in relation to a criterion, specifically students’ clinical status (anxious or depressed) as defined by meeting full diagnostic criteria (Auger, 2004; Moor et al., 2007) or in the elevated range of a well-established measure of psychopathology symptoms (Dadds, Spence, Holland, Barrett, & Laurens, 1997). In the only published study relevant to anxiety, Dadds et al. (1997) reported low sensitivity for teacher nominations gathered during a school-wide universal screening process conducted to identify students for participation in a school-based anxiety intervention. Approximately 10.5 % of children (grades 3–7) from eight schools self-reported elevated symptoms on the Revised Children’s Manifest Anxiety Scale (RCMAS; Reynolds & Richmond, 1985). Teachers nominated up to three children in their class who “displayed the most anxiety, i.e., were shy, nervous, afraid, inhibited” (p. 628); approximately 9.7 % of the child sample was identified as anxious by teachers. Comparisons of agreement between the clinical subgroups indicated little overlap. Teachers identified only 19.3 % of students who self-reported high levels of anxiety, thus missed 80.7 % of symptomatic youth. Regarding specificity, approximately 8.5 % of students who reported anxiety symptoms in the average range were misidentified, in that teachers nominated them as anxious. This study is limited by the uncertain reliability of children’s anxiety reports, as the RCMAS was administered only once (vs. repeated by the clinical sample to exclude students with temporary elevations).

Sensitivity of teacher nominations appeared somewhat better with respect to identifying depression among secondary students. In a large sample of 13–16 year olds from eight high schools, Moor et al. (2007) identified approximately 8.4 % of students as clinically depressed through a clinical interview. This prevalence is consistent with estimates yielded from epidemiology research on adolescent depression, particularly among high school age youth (Goldman, 2012; Merikangas et al., 2010). Almost half of the participating teachers were “guidance teachers” with designated responsibilities for ensuring the personal and social well-being of their students. The teacher sample identified 4.5 % of all students as possibly/probably depressed (achieved via unlimited nominations of students from class lists containing names of the youth under study). Regarding the overlap between these two clinical samples, teachers correctly recognized as depressed 41–52 % of students diagnosed with depression. Findings provided preliminary support for teachers’ ability to identify adolescents who exhibit depressive symptoms, although in a unique group of educators. In contrast, Auger (2004) found that most middle school teachers failed to identify as “depressed to a degree that some type of intervention would be helpful” (pp. 381–382) the 5 students (of 356 participants) who were diagnosed as clinically depressed. Of note, this 1.4 % prevalence rate is akin to the 1.6 % prevalence of depressive disorders identified in a prior study of 11–13 year olds (Costello, Mustillo, Erkanli, Keller, & Angold, 2003). Only 27 % of teachers of the five depressed students correctly nominated them as such; clinically depressed students were thus missed by 73 % of teachers. Of the 351 students who did not emerge as depressed during the screening and diagnosis process, 91 % of teachers correctly did not nominate these individuals (leaving 9 % false positives).

Taken together, these studies provide somewhat stronger support for teachers’ ability to identify clinically depressed adolescents as compared to children with elevated levels of anxiety. Teacher sensitivity to students with depressive symptoms may be even greater among elementary school samples, in part due to increased teacher–student familiarity likely to come from spending the majority of a school day together (vs. only one class period together as in most models of secondary school).

Purpose of the Current Study

As schools have become a major setting through which youth receive mental health services, accurate methods of identifying symptomatic students are necessary. Drawbacks associated with universal self-report screenings necessitate consideration of alternative methods. The accuracy of one efficient method, teacher nominations, is relatively understudied with regard to identifying students with anxiety and depression. When the target age group is limited to elementary school students, we identified no evaluations of the accuracy of teacher identification of depressive symptoms, and only one study of anxiety in the same age range (although limited by a one-time assessment of symptoms). To facilitate a proactive and preventative approach to school-based mental health services, it may be more relevant to develop accurate methods for identifying those children who demonstrate repeatedly elevated, yet not necessarily clinical or diagnosable, levels of symptoms. Identifying at-risk youth earlier increases the likelihood of preventing impairments in educational functioning and overall quality of life (Albers, Kratochwill, & Glover, 2007; Levitt et al., 2007). Thus, the current study evaluated the sensitivity and specificity of teacher nominations to identify elementary school students who repeatedly self-reported at-risk levels of anxiety or depression symptoms.

Method

Participants

Children

Student participants attended one of two elementary schools within a large school district in a Southeastern state. The National Center for Educational Statistics (NCES) classified the locales of both schools as suburb of a large city. The schools are approximately 5 miles apart and received similarly positive school grades (A or B) in the 2011–2012 school year. Student participants were 238 students (males = 47.5 % of the sample) in grades 4 (56.3 %) and 5 (43.7 %), ages 9–12 years old (M = 10.10; SD = 0.82). For ethnicity, 92 students (38.7 % of sample) identified as Hispanic or Latino. For race, 76 students (31.9 %) identified as White, Non-Hispanic; 52 (21.8 %) African American; 40 (16.8 %) multiracial; 4 (1.7 %) American Indian/Alaskan Native; 3 (1.3 %) Asian; and 63 (26.5 %) students identified with another group, predominantly Hispanic White (n = 61). A total of 194 students (81.5 %) reported receiving free or reduced-price school lunch, which is used as an indicator of low socio-economic status (SES). When compared to the demographic features of the combined population of the two participating schools (N = 1,516 students K–5, per NCES database), the sample was similar in terms of proportion of students who were male (population = 52.8 %; χ²(1) = 0.38, p = .54) and Hispanic (population = 41.1 %; χ²(1) = 2.01, p = .16). Significant differences (p < .05) were noted between the sample and population in terms of SES (population = 73.3 % received free or reduced-price lunch; χ²(1) = 8.55, p < .004) and race (χ²(4) = 16.95, p < .003), with students identified as African American under-represented (43.9 vs. 35.9 % of non-Hispanic students in population and sample, respectively) and White over-represented (41.2 vs. 52.4 % of non-Hispanic students in population and sample, respectively).

Teachers

The students were served in a total of 26 elementary school classrooms (11 in School A, 15 in School B). Several classrooms used a co-teaching model in which some students were split between two teachers for part of the day according to subject (for instance, with a different teacher 2 h per day for reading instruction). The 26 participating teachers were the primary teachers for a given class and had an average of 19 students in their class (SD = 4.23). Teachers were predominantly female (84.6 % of sample), and 15.4 % identified as Hispanic or Latino. For race, most teachers identified as White (69.2 %); the remaining were African American (26.9 %) or multiracial (3.9 %). For education level, the highest degree earned was as follows: Bachelors (50.0 %), Masters (46.2 %), and Specialist (3.8 %). Four teachers (16.0 %) reported receiving previous professional development on children’s mental health issues.

Procedures

Recruitment

All teachers of fourth and fifth-grade classrooms in both schools were invited to participate and were offered a $25 gift card as incentive. All 26 teachers (100 %) consented. Parent consent forms were distributed to all 493 students in the 26 classrooms. A class-level incentive of a donut party was offered to classes in which 75 % of consent forms were signed and returned (achieved by 8 classes), and small individual incentives (i.e., bracelets) were provided to each student who returned a consent form. In total, 275 students returned consent forms, resulting in a 55.8 % response rate (66 % for school A, and 49 % for school B). Of these 275 consent forms, 244 provided affirmative permission for student participation, yielding a 49.5 % student participation rate. Across classrooms, the participation rate ranged from 19 to 95 % (Mdn = 45 %; M = 48.72 %; SD = 21.36 %).

Data Collection

Four months into the school year (January 2011), the 238 students with parent consent to participate who were present at school the week of data collection gathered in small groups at a private location on campus. After providing written assent, students completed a brief demographic form followed by the two measures described below. To ensure confidential responding, children were assigned code numbers (no names on papers) and seated with ample space between students. To facilitate accurate responding, the authors and members of their research team monitored students’ completion of measures. When students stopped responding or raised their hands, researchers privately pronounced and explained the meanings of any words that participants found unclear. Students’ raw scores were converted to T scores using age and gender norms provided in the manuals. We selected a T score cut-point of 60 as the clinical threshold for what we refer to as “at-risk” or “elevated.” A T score of 60 indicates symptom severity that is one standard deviation above the mean. The interpretative guidelines in the manuals for the narrowband measures of depression and anxiety we used characterize T scores between 61 and 65 as “above average,” 66–70 as “much above average,” and above 70 as “very much above average.”
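
For readers less familiar with the metric, the relation between a T score and distance from the normative mean can be written out. This is the standard linear definition of a T score; the test manuals supply age- and gender-specific norm tables, so the published conversions may not follow this formula exactly. The symbols x (raw score), μ (normative mean), and σ (normative standard deviation) are notation introduced here for illustration.

```latex
T = 50 + 10\,\frac{x - \mu}{\sigma},
\qquad\text{so that}\qquad
T = 60 \;\Longleftrightarrow\; \frac{x - \mu}{\sigma} = 1 .
```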

One week later, students whose T scores fell at or above 60 for either rating scale completed the same rating scale(s) a second time. Students with T scores under 60 at Time 2 were excluded from the “at-risk” sample. These assessment procedures are consistent with recommendations for multiple-stage screening procedures to identify students with elevated anxiety (Laurent, Hadler, & Stark, 1994) or depression (Reynolds, 1986), specifically via school-wide survey with a narrowband measure of psychopathology followed by second administration of the same measure to students whose initial scores exceeded a predetermined threshold. During a large-scale screening process to detect depressed middle school students, Auger (2004) also used a 1-week interval between measure administrations in order to eliminate false positives. The two-stage student self-report process is intended to increase confidence in the validity of the risk status of students identified as at-risk for elevated depression or anxiety. Removing students with transient negative affect or fears from the at-risk group essentially acknowledges (a) the conceptualization of anxiety and depression as persistence in symptoms over time (American Psychiatric Association, 2013), (b) measurement error inherent to rating scales, which may contribute to some false positives among students who are near the clinical threshold, and (c) the tendency for respondents to report more severe psychopathology the first time they complete a given measure (for a discussion, see Reynolds, 1986). In our study, parents of students whose scores twice exceeded the threshold (T ≥ 60) promptly received a letter with contact information for community mental health agencies, and were offered an opportunity to participate in school-based group counseling targeting symptoms of anxiety or depression.

Data were collected from teachers on Tuesday of the same week students completed the first round of measures (student data collection occurred on a Wednesday or Friday, depending on school attended). Teachers received a list of their students who had parent consent to participate. A cover letter included the directions below, and provided behavioral descriptors of childhood anxiety and depression:

Please nominate up to three (3) students that, based on your knowledge of this student and his/her typical behavior, demonstrate symptoms of anxiety and/or depression; You may nominate a student for anxiety, for depression, or for both conditions, for a total number of up to six (6) students. Please do not discuss your nominations with any colleagues; please complete this form independently.

The next portion of the page included two columns. The left column listed 11 behavioral descriptors of symptoms of anxiety: appears nervous; acts in a fearful manner; cries, tantrums, freezes in social situations; reluctant or afraid to attend school; acts jittery or fidgety; worries often; is timid or unassertive; has trouble separating from caregiver; worry about harm befalling caregiver; physical complaints (headache, stomachache); and fear of being humiliated or embarrassed. The right column listed 10 behavioral descriptors of depression: cries often; looks sad; excessively shy; avoids or withdraws from social situations; lack/diminished interest in peers or activities; prefers to spend time alone; has a lack of energy/appears tired; might act irritable or agitated; changes in appetite—increased or decreased; and difficulty concentrating.

Under the appropriate column, teachers were provided three blank lines and asked to write “Students showing elevated anxiety” and “Students showing elevated depression.”

Measures

Multidimensional Anxiety Scale for Children (MASC; March, 1997)

The MASC is a 39-item measure of anxiety for children and adolescents ages 8–19 that assesses four types of anxious symptoms (physical symptoms, social anxiety, harm avoidance, and separation/panic). Students reported the degree to which they experienced each feeling or behavior, using a 4-point Likert scale from 0 (Never True About Me) to 3 (Often True About Me). The total MASC anxiety scale (sum of all 39 symptoms) was used as the indicator of level of anxiety. Regarding reliability, the total MASC anxiety scale has demonstrated excellent internal consistency (α = .88–.89) and test–retest reliability (r = .93; March, 1997). Regarding validity, March (1997) reports the MASC evidenced 90 % sensitivity when used to discriminate a sample of children and adolescents with anxiety disorder diagnoses from a non-anxious comparison group; further, the correlation between the MASC total anxiety score and the RCMAS is high (r = .63). We used the MASC in part because the norm sample includes children as young as 8 years old.

Children’s Depression Inventory (CDI; Kovacs, 2003)

The CDI is a 27-item measure of depression in youth ages 7–17. The CDI assesses five dimensions of depressive symptoms: negative mood, interpersonal difficulties, negative self-esteem, ineffectiveness, and anhedonia. The total score (sum of all 27 symptoms) was used as the indicator of level of depression. The CDI total score has demonstrated good to excellent internal consistency (α = .71–.89) and acceptable test–retest reliability (r = .74–.87; Kovacs, 2003). The manual summarizes a large number of studies that find the CDI discriminates children who are depressed from non-depressed youth, and several studies finding large, positive associations between the CDI total score and other measures of depression (Kovacs, 2003). We used the CDI in part because it includes normative data for children as young as 7 years old.

Overview of Data Analyses

Based on scores on the CDI and MASC, children were classified into two groups: (1) “at-risk” = T scores ≥60 at both time points that students completed the rating scales (i.e., 7 days apart), and (2) “not elevated” = T scores <60 at either time point. Using those subsamples, teacher nomination status (yes or no) was compared with the dichotomized rating scale variables (elevated or not elevated) for each measure separately.
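
The classification rule just described can be sketched in a few lines of Python. This is an illustrative reading of the rule, not the authors' analysis code. The function name risk_status, the student codes, and the example scores are hypothetical; a Time 2 value of None stands for students who scored below the cut-point at Time 1 and so were not retested.

```python
from typing import Optional

CUT_POINT = 60  # T score threshold treated as "at-risk" in the study


def risk_status(t1: float, t2: Optional[float], cut: float = CUT_POINT) -> str:
    """Classify a student on one measure from Time 1 and Time 2 T scores.

    A student is "at-risk" only if both administrations reach the cut-point.
    Anyone below the cut-point at either time point is "not elevated".
    """
    if t1 >= cut and t2 is not None and t2 >= cut:
        return "at-risk"
    return "not elevated"


# Hypothetical students: (Time 1 T score, Time 2 T score or None if not retested)
students = {
    "S01": (64, 62),    # elevated at both time points
    "S02": (61, 55),    # elevated at Time 1 only
    "S03": (48, None),  # not elevated at Time 1, so not retested
}

for code, (t1, t2) in students.items():
    print(code, risk_status(t1, t2))
```

Under this rule a student such as S02, whose elevation does not persist across the week, falls in the "not elevated" group, which is what removes transient elevations from the at-risk sample.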

As summarized in Table 1, diagnostic efficiency statistics (Green & Zar, 1989; Landau, Milich, & Widiger, 1991) were calculated for: (1) the percentage of students correctly identified by teachers (i.e., sensitivity), (2) the proportion of internalizing students missed by teacher nominations (i.e., false negative), (3) the percentage of students misidentified by teachers (i.e., false positive), and (4) the proportion of self-reported negative students that teachers did not nominate (i.e., specificity, or true negative). Using the score method (Newcombe, 1998), 95 % confidence intervals were constructed around each proportion.
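
A minimal sketch of the interval calculation follows, assuming that "the score method" refers to the Wilson score interval without continuity correction, which is the score interval evaluated by Newcombe (1998). The function name wilson_interval is hypothetical, and this is not the authors' code.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score confidence interval for a single proportion.

    successes: number of cases in the category of interest (e.g., true positives)
    n:         number of cases in the denominator (e.g., all at-risk students)
    z:         standard normal quantile; 1.959964 gives a two-sided 95 % interval
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


# Sensitivity for depression as reported in the Results: 11 of 22 at-risk students nominated.
lower, upper = wilson_interval(11, 22)
print(f"{100 * lower:.1f} % to {100 * upper:.1f} %")
```

For 11 of 22, this sketch gives limits of roughly 30.7 % and 69.3 %, in line with the interval reported for sensitivity in the Results. Unlike the simpler Wald interval, the score interval stays within 0 to 100 % and behaves reasonably with small subgroups such as the at-risk samples here.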

Table 1 Accuracy formulas and sample sizes for student self-report of symptoms of depression and anxiety

Results

Prevalence of Elevated Anxiety and Depressive Symptoms

A total of 238 students took part in the initial administration (Time 1) of the MASC and CDI. All students with elevated scores at Time 1 were re-administered the appropriate measure at Time 2, indicating a 0 % attrition rate across the 7–10 day interval. At Time 2, 42 students (17.6 % of the sample) were re-administered the MASC, and 34 students (14.3 %) were re-administered the CDI. Eighteen of those students were re-administered both measures. Among all 238 participants, the association between Time 1 MASC and CDI scores was moderate (r = .42). Mean scores on the MASC and CDI across time are presented in Table 2. The proportion of the student sample who twice reported at-risk levels of symptoms was 11.3 % (n = 27) for anxiety and 9.2 % (n = 22) for depression. Eleven of these students (4.6 % of the sample) had elevated scores on both measures. In sum, 38 students (16.0 % of the sample) twice reported elevated scores on the MASC, CDI, or both measures. These at-risk samples exclude the 15 students for anxiety and 12 students for depression whose MASC or CDI scores, respectively, were elevated at Time 1 only.

Table 2 Means, SD, and ranges for MASC and CDI T scores

The across-time correlations were .42 (p = .01) and .51 (p < .001) for the CDI and MASC, respectively, within the restricted samples of 34 and 42 participants with elevated scores at Time 1. Those correlations could be particularly sensitive to outliers due to the relatively small sizes of the subgroups. After removing the participants whose difference scores were nearly 2 or more SD from the sample M, the across-time correlations were .52 (p = .003; n = 32) and .55 (p < .001; n = 40). Although these values are still smaller than the test–retest correlations reported in the technical manuals, the attenuated correlations are not surprising given the restricted range in our subgroups of symptomatic students, and are in line with results from prior studies with clinical samples (e.g., for the CDI, r = .62 after 10 days among 96 children in a psychiatric hospital; Nelson & Politano, 1990). Corrected for range restriction (Chan & Chan, 2004), our across-time correlations were .75 (n = 34) and .91 (n = 42) for the CDI and MASC, respectively.
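
For context, one widely used correction for direct range restriction is Thorndike's Case II formula, shown below. The text does not state the exact formula that was applied, so this is offered as an assumption about its general form rather than a description of the authors' computation. Here r is the correlation observed in the restricted subgroup, S and s are the standard deviations of the selection variable in the unrestricted and restricted groups, and u is their ratio.

```latex
r_{c} = \frac{r\,u}{\sqrt{1 + r^{2}\left(u^{2} - 1\right)}},
\qquad u = \frac{S}{s}
```

When the subgroup is much less variable than the full sample (u well above 1), the corrected value can be considerably larger than the observed correlation, which is the pattern seen in the corrected values reported above.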

The mean number of student participants in a classroom was 9.15 (SD = 4.03). Teachers nominated an average of 1.77 of the participating students in their class for depression; 15.4 % of teachers nominated 0 students, 23.1 % nominated 1 student, 30.8 % nominated 2 students, and 30.8 % nominated 3 students. Teachers nominated an average of 1.85 of the participating students in their class for anxiety; 19.2 % of teachers nominated 0 students, 23.1 % nominated 1 student, 11.5 % nominated 2 students, and 46.2 % nominated 3 students. In total, teachers nominated 46 students as possibly depressed (19.3 % of the sample) and 48 students as possibly anxious (20.2 % of the sample). Twelve of these students (5.0 % of the sample) were nominated for both depression and anxiety. Thus, teachers nominated 82 students (34.5 % of the sample) for depression, anxiety, or both.

Teacher Identification of Students with Elevated Depressive Symptoms

Sensitivity

As summarized in Table 3, 11 of the 22 students who self-reported CDI T scores at or above 60 at Time 2 were also nominated by their teachers as demonstrating elevated depressive symptoms, yielding a sensitivity rate of 50 %. Teachers thus accurately nominated approximately one-half of participants who repeatedly self-reported at-risk levels of depression. A 95 % confidence interval yielded a lower limit of 30.72 % and an upper limit of 69.27 %.

Table 3 Accuracy of teachers in identifying students with elevated depressive and anxiety symptoms

Miss Rate

Eleven of the 22 students who self-reported CDI T scores at or above 60 at Time 2 were not nominated by their teachers as demonstrating elevated depressive symptoms, yielding a miss rate of 50 % (95 % CI 30.72–69.27).

Specificity

A total of 181 of the 216 students who self-reported CDI T scores <60 (i.e., not “at-risk”) were also not nominated by their teachers as demonstrating elevated depressive symptoms, yielding a specificity rate of 83.80 %. Thus, teachers correctly identified 83.80 % of students who did not self-report elevated levels of depression, by intentionally not nominating them. A 95 % confidence interval yielded a lower limit of 78.33 % and an upper limit of 88.14 %.

Misidentified Rate

Thirty-five of the 216 students whose self-reports on the CDI corresponded to T scores <60 were identified by their teachers as demonstrating elevated depressive symptoms, yielding a misidentified rate of 16.20 % (95 % CI 11.86–21.66).
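The four depression rates reported above all derive from one 2 × 2 classification of teacher nomination against self-reported at-risk status. A minimal sketch of that arithmetic follows, using the counts given in the text; the function and variable names are illustrative only.

```python
def screening_rates(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return the four rates (as percentages) from a 2 x 2 screening table.

    tp: at-risk and nominated        fn: at-risk but not nominated
    tn: not at-risk, not nominated   fp: not at-risk but nominated
    """
    at_risk = tp + fn
    not_at_risk = tn + fp
    return {
        "sensitivity": 100 * tp / at_risk,
        "miss_rate": 100 * fn / at_risk,
        "specificity": 100 * tn / not_at_risk,
        "misidentified_rate": 100 * fp / not_at_risk,
    }


# Depression counts as reported in the text: 11 identified and 11 missed of
# the 22 at-risk students; 181 not nominated and 35 nominated of the 216 others.
depression = screening_rates(tp=11, fn=11, tn=181, fp=35)
for name, value in depression.items():
    print(f"{name}: {value:.2f} %")
```

Sensitivity and miss rate share the at-risk denominator, and specificity and misidentified rate share the not-at-risk denominator, so each pair sums to 100 %.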

Teacher Identification of Students with Elevated Anxiety Symptoms

Sensitivity

As summarized in Table 3, 11 of the 27 students who self-reported MASC T scores at or above 60 at Time 2 were nominated by their teacher as demonstrating elevated anxiety symptoms, yielding a sensitivity rate of 40.74 %. Teachers thus accurately nominated almost 41 % of children who in fact repeatedly self-reported at-risk levels of anxiety. A 95 % confidence interval yielded a lower limit of 24.52 % and an upper limit of 59.27 %.

Miss Rate

Sixteen of the 27 students who self-reported MASC T scores at or above 60 at Time 2 were not nominated by their teacher as demonstrating elevated symptoms of anxiety, yielding a miss rate of 59.26 % (95 % CI 40.72–75.48).

Specificity

A total of 174 of the 211 students who self-reported MASC T scores less than 60 (i.e., not “at-risk”) were not nominated by their teachers as demonstrating elevated anxiety symptoms, yielding a specificity rate of 82.46 %. Thus, teachers correctly identified 82.46 % of students who did not self-report elevated levels of anxiety, by intentionally not nominating them. A 95 % confidence interval yielded a lower limit of 76.76 % and an upper limit of 86.99 %.

Misidentified Rate

Thirty-seven of the 211 students whose self-reports on the MASC corresponded to T scores <60 were identified by their teachers as demonstrating elevated anxiety symptoms, yielding a misidentified rate of 17.54 % (95 % CI 13.00–23.23).
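The 95 % confidence intervals attached to these proportions can be approximated with a short sketch. The text does not name the interval method; the Wilson score interval is assumed here because it lands within a few hundredths of a percentage point of the reported limits for the anxiety miss rate and misidentified rate. The function name is illustrative.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> tuple:
    """Wilson score interval for a proportion, returned as percentages.

    Assumed method: the source reports the limits but not how they were
    computed. z defaults to the two-sided 95 % normal quantile.
    """
    p = successes / n
    denom = 1.0 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return 100 * (center - half_width), 100 * (center + half_width)


# Anxiety miss rate: 16 of 27 at-risk students were not nominated.
miss_low, miss_high = wilson_interval(16, 27)
# Anxiety misidentified rate: 37 of 211 not-at-risk students were nominated.
misid_low, misid_high = wilson_interval(37, 211)
print(f"miss rate CI: {miss_low:.2f}-{miss_high:.2f}")
print(f"misidentified rate CI: {misid_low:.2f}-{misid_high:.2f}")
```

With only 27 at-risk students the interval for the miss rate spans roughly 35 percentage points, which is why the small at-risk subgroup matters when interpreting these estimates.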

Post hoc Analyses

In line with the previously mentioned biases in teacher nominations by student gender and symptom characteristics (Roeser & Midgley, 1997; Soles et al., 2008), additional analyses were conducted on the students who reported at-risk levels of anxiety or depression. These analyses tested whether students who were not identified by their teachers differed in any systematic way from their counterparts who were accurately identified, specifically with respect to clinical presentation (i.e., symptom severity) or demographic characteristics.

Table 4 presents the mean CDI scores (Time 1 data presented because, by design, most students were not administered this measure at Time 2) and demographic characteristics by identification group. An independent samples t test comparing the CDI scores among the students who consistently reported elevated symptoms of depression indicated that the mean score of the 11 students identified by their teachers as demonstrating elevated symptoms of depression (M = 69.90) was not significantly higher than the mean score of the 11 students not identified by their teachers (M = 68.00), t (20) = 0.68; p = .50. Thus, the missed students were not a significantly less symptomatic group. Although a higher proportion of males was identified by teachers (54.55 % in the true positive group vs. 27.27 % in the false negative group), this difference in proportions was not statistically significant, χ 2 = 1.69, p = .19. The distribution of demographic groups across identification groups was not significantly different with regard to the other characteristics examined (race, ethnicity, grade level, SES).
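The gender comparison for depression can be reproduced with a small sketch of a Pearson chi-square test on a 2 × 2 table. The cell counts are inferred from the reported percentages (54.55 % of 11 is 6 males identified; 27.27 % of 11 is 3 males missed) rather than read from Table 4, and the absence of a continuity correction is an assumption that matches the reported statistic.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and its p value on 1 degree of freedom.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, the upper-tail chi-square probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: identified vs. missed at-risk students; columns: male vs. female.
# Counts inferred from the percentages in the text: 6 of 11 and 3 of 11 males.
chi2, p = pearson_chi2_2x2(6, 5, 3, 8)
print(f"chi-square = {chi2:.2f}, p = {p:.2f}")
```

With only 11 students per group, a difference of three boys between groups is well within what chance would produce, which is consistent with the nonsignificant result reported above.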

Table 4 Clinical and demographic features of each depression group

Table 5 presents the mean MASC scores and demographic characteristics by identification group. An independent samples t test comparing the MASC scores among the students who consistently reported at-risk symptoms of anxiety indicated that the mean score of the 11 students identified by their teachers as demonstrating elevated symptoms of anxiety (M = 69.36) was not significantly higher than the mean score of the 16 students not identified by their teachers (M = 65.88), t (25) = 1.50; p = .15. Thus, the missed students were not a significantly less symptomatic group. Although a higher proportion of males was identified by teachers (63.64 % in the true positive group vs. 31.25 % in the false negative group), this difference in proportions was not statistically significant, χ 2 = 2.77, p = .096, but may be considered a trend in the data. The distribution of demographic groups across identification groups was not significantly different with regard to race, ethnicity, grade level, and SES.

Table 5 Clinical and demographic features of each anxiety group

Discussion

We found that approximately 9 and 11 % of elementary school children repeatedly reported at-risk levels of depression and anxiety, respectively. Teachers accurately identified as symptomatic 50 % of these students with elevated depressive symptoms, and 41 % of students with elevated anxiety symptoms, providing moderate support for the sensitivity of teacher nominations in identifying children with internalizing problems. Symptomatic children were not missed by teachers systematically as a function of their symptom severity or demographic features, with the exception of a nonsignificant trend for gender in the identification of elevated anxiety (at-risk girls were somewhat more likely to be missed than their at-risk male peers). For specificity, teachers misidentified as possibly depressed about 16 % of students who denied elevated symptoms, and misidentified a similar proportion of youth as anxious (about 17.5 %).

Accuracy of Teacher Nominations to Identify Students with Internalizing Problems

Sensitivity

The sensitivity rates obtained in the current study are similar to those obtained in previous research on identification of depressed students, and exceed the more modest sensitivity uncovered in the one relevant study of students with elevated anxiety. Prior studies of secondary students yielded sensitivity estimates between 27 and 52 % with respect to identifying students diagnosed with clinical depression, with higher rates within samples of teachers that included many with designated social-emotional guidance responsibilities (Auger, 2004; Moor et al., 2007). Our findings suggest that general education elementary school teachers are likely to also identify about half of students with elevated depressive symptoms, and extend this conclusion to using a criterion variable more likely to be employed in the schools (i.e., at-risk status on a valid self-report measure of symptoms vs. psychiatric diagnosis).

Regarding anxiety, Dadds et al. (1997) found teacher nominations detected roughly 19 % of primary school students who self-reported at-risk symptom levels on a different narrowband measure of anxiety. The higher sensitivity (about 41 %) obtained in the current study may reflect differences in study design, including (a) focus on a more persistently symptomatic group, as students who reported elevated anxiety only once were excluded from the at-risk sample, and (b) provision of a more comprehensive list of behavioral descriptors of symptoms to teachers.

The converse of the sensitivity finding is that teachers also missed (by intentionally not nominating) 50 % of students who in fact experienced elevated depressive symptomatology. This miss rate parallels the 48–59 % miss rates obtained with teachers of middle school students (Moor et al., 2007). The miss rates for anxiety obtained in the current study (59 %) and previous research (81 %; Dadds et al., 1997) suggest teachers may have more trouble detecting atypical levels of anxiety in children, or that this form of distress may have less observable features than depression.

Specificity

Compared with their accuracy level in identifying students with elevated psychopathology, teachers were more accurate in identifying students with typical levels of depression (84 %) and anxiety (82 %). However, lower rates of misidentification (8–9 %) have been obtained in earlier studies of anxiety (Dadds et al., 1997) and depression (Auger, 2004). The finding that 16–18 % of children were misidentified in the current study underscores the need for follow-up assessments of teacher-nominated students. From a prevention standpoint, a larger number of misidentified youth may be tolerated if the more liberal identification procedure contributes to a higher sensitivity rate. In an early gatekeeping process, it is arguably better to identify a student incorrectly at first than to miss a symptomatic child entirely and thereby preclude possible intervention. Thus, most concerning is the elevated miss rate; relying on teacher nominations alone, many students may have gone unidentified and unserved.

The substantial number of students “missed” and misidentified by teachers in the current study suggests that many teachers have some trouble detecting symptoms of internalizing distress in students who experience at-risk levels of problems. No identification training was provided to teachers, and the majority (84 %) reported little to no prior professional development in children’s mental health issues. Teachers’ recognition of symptoms might have been greater following training in manifestations of anxiety and depression in children. However, Moor et al. (2007) found such training failed to improve middle school teachers’ recognition of students’ depressive symptoms. A growing body of literature thus casts doubt on the validity of teachers as completely accurate gatekeepers of students in need of services targeting internalizing forms of mental health problems, particularly anxiety.

Implications for Research and Practice

Numerous concerns have been raised with regard to using a universal screener to identify students with mental health concerns (Center for Mental Health in Schools, 2005; Levitt et al., 2007). The current study verified the efficiency of an alternative: asking teachers to identify students with anxiety and/or depression. We found that the teacher nomination process circumvented some of the barriers inherent to universal screeners. Teachers took 10–15 min to complete the nomination forms; the entire nomination process for two grade levels in a school was completed in <2 h. The costs associated with the nomination process were negligible.

Benefits aside, elementary schools that elect to use a teacher nomination procedure in lieu of universal screening can expect to miss about half of the students who report elevated depressive symptoms, and up to 60 % of children who experience elevated anxiety. Girls are particularly likely to be missed (Roeser & Midgley, 1997; Soles et al., 2008). A more comprehensive approach that involves gathering symptom frequency and severity from all students is justified by the potential risks associated with missing a student with internalizing problems (e.g., school refusal, suicidal thoughts). However, issues associated with universal screeners might preclude schools from selecting this option. Logistical barriers we encountered included extensive time and personnel demands to (a) confidentially administer the measures to groups of students, (b) score measures quickly, and (c) track down absent students for later screening. The cost of purchasing the copyrighted measures totaled several dollars per student. Some parents were resistant to permitting their children to provide information about their mental health. Specifically, 6.3 % of parents refused in writing to allow their child to participate in the study, and 44.2 % of consent forms were not returned at all, possibly indicative of a passive decline (albeit confounded with students not bringing the forms home). Although we cannot verify the reason(s) for the low participation rate, anecdotal communication with school personnel implicated the stigma that surrounds mental health. This stigma apparently deterred many families from participating in what the consent form described as, in part, a “free screening of [students’] mental health,” in that parents would be notified in writing if a student indicated elevated levels of anxiety or depression and would be provided with appropriate referral options.

In sum, there are both pros and cons to identification methods such as universal screeners and teacher nominations. The identification methods used by schools should vary based upon their purpose, type of symptoms, and intention for follow-up. For example, schools that aim to identify all students with at-risk symptoms should look no further than a universal screener, as results from the current study indicate that not all students will be identified using educator nominations. However, if the school aims to only identify some students who might benefit from a limited amount of supplemental mental health services available at the school (e.g., from a single school-based or community provider), the nomination procedures used in the current study may prove a feasible method to identify a specific subgroup of students. If a clinical assessment will occur between initial identification and clinical service provision, then a school should not be discouraged by methods associated with low specificity rates.

Although not the primary focus of the current study, our findings lend further support for a multiple-stage process in school-wide screenings. Akin to the attenuation effects identified in prior studies that required youth to self-report their internalizing symptoms at multiple time points using the same measure (e.g., Masip, Amador-Campos, Gómez-Benito, & del Barrio Gándara, 2010), we also found a tendency for students’ scores to decrease from the first to second completion of the narrowband measure. Just under two-thirds of students initially identified as symptomatic maintained symptoms in the elevated range (T ≥ 60) 1 week later (specifically: 64.29 % for anxiety and 64.71 % for depression). In earlier school-wide screenings (grade levels 4–8), the proportion of students exceeding a symptom threshold at the first stage who went on to exceed the same threshold on the same narrowband measure at the second stage was 69.35 % in Laurent et al.’s (1994) study of anxiety symptoms (threshold: T ≥ 60) and 67.74 % in Auger’s (2004) study of depression (threshold: T ≥ 70). Thus, approximately 1/3 of students who initially exceed a clinical threshold during a school-wide screening may have spurious elevations. If only the data from the initial school-wide screening had been used to identify an at-risk sample for inclusion in an early intervention, eventual improvements in the sample may have been over-attributed to intervention effects. Future research could test for differential intervention impacts as a function of whether baseline risk is stable (i.e., elevated at repeated administrations of a measure) or transient (i.e., only elevated at the initial time point).

Limitations and Directions for Future Research

Given the lack of studies that have evaluated the accuracy of teacher nominations as a method for identifying elementary school children with internalizing problems, replication is needed prior to making definitive conclusions. Findings in the current study are limited to a relatively small sample of older elementary school students. This sample reflects the lower age limits of children appropriate for inclusion in efficacious targeted interventions for anxiety and depression (Kendall, Furr, & Podell, 2010; Stark, Streusand, Krumholz, & Patel, 2010). Although the 49.5 % participation rate and resulting sample size were in line with rates in prior relevant studies (e.g., Layne et al., 2006), students who were African American or of higher SES were less likely to participate in our study. It is also possible that students with the highest elevations in anxiety or depressive symptoms were excluded from teacher consideration due to lack of parental consent. Full student participation would afford teachers additional students eligible for nomination, possibly including some of those that evidenced elevated symptoms. Future studies with larger samples of students and teachers, with more complete student participation, may produce narrower confidence intervals and more trust in the obtained proportions of interest.

Although not employed in this study, an educational training for teachers about depression and anxiety may have been useful to illustrate how internalizing symptoms manifest. Such training could include group discussions regarding the range of challenges students encounter due to depression and anxiety, and video demonstrations of young children depicting internalizing symptoms in various school settings. Notably, Moor et al. (2007) found such a workshop did not improve middle school teachers’ identification of depressed students; rather, after training, teachers simply nominated fewer students as depressed. More research is needed to identify efficacious means of educating teachers on signs and symptoms of internalizing disorders in students, particularly among developmental groups (e.g., elementary school children) they may not as readily associate with emotional distress.

Additional studies are needed to determine (a) the unique features of students who repeatedly report diminished mental health but are not identified by teachers, and (b) the “true” clinical status of students we deemed misidentified. Regarding the former, we detected a trend for missed students to be girls, but ruled out other demographic features and sub-clinical levels of psychopathology. These students may have other factors that contribute to their likelihood of going unnoticed, such as strong academic skills or involved families. Regarding the latter, the current study is limited by the sole reliance on student self-report data to determine the presence of psychopathology. Validity of self-report may be compromised by issues related to readability, recall of symptoms (i.e., reflecting on their feelings/behaviors at the time of the screening and not during the time frame as directed by the measure), and students’ own abilities to evaluate their feelings or problems. In the absence of other methods of determining students’ symptoms (e.g., parent ratings or clinical interviews), we cannot verify the validity of students’ self-reports. However, students identified as at-risk endorsed elevated symptoms two times, which lends some support to the reliability of their personal evaluations. Further, previous research found children as young as first grade can accurately self-report feelings of depressed mood (Grover et al., 2007). Future studies can focus on the misidentified quadrant to determine if parents’ and clinicians’ ratings are more in line with students’ or teachers’ perceptions.

Conclusions

School-based treatment of anxiety and depression is hampered by the lack of effective and efficient means for identifying the students in need of services (Masia Warner & Fox, 2012). Our study found teacher nominations identified about 40–50 % of elementary school students with elevated anxious and depressive symptomatology. From a “glass half full” perspective, systematic educator nominations appear a promising method for identifying a sizable proportion of children who may otherwise go unreported, particularly in social contexts in which more invasive universal strategies such as school-wide self-report screenings may not be feasible or there are limited resources to provide subsequent targeted interventions. On the other hand, our finding that classroom teachers missed about 50 and 60 % of their students who consistently reported at-risk levels of depression and anxiety, respectively, is rather alarming. It calls into question the validity of this method as an initial step in identifying youth who may be in need of mental health services (for examples, see Chiu et al., 2013; McLennan et al., 2008; Walker & Severson, 1992) and indicates a need to use a more comprehensive universal strategy when attempting to locate all students in potential need of services. In line with best practice guidelines for multi-modal assessment, our finding that teachers falsely identified approximately 1 in 6 non-symptomatic students as depressed or anxious underscores the need for nominated students to be further evaluated with data from another source.