Introduction

For several decades there has been widespread international agreement that pediatric primary care should expand its focus to include mental as well as physical health problems, and that routine screening for psychosocial problems is a strategy for achieving more integrated care [23, 48]. Only in the past few years, however, have several states and countries established large-scale screening programs [29, 41]. In the USA, for example, the nation’s health goals [45, 46] have for the past two decades recommended routine screening for mental health problems as a part of primary care for both children and adults, and for the past decade court orders in a number of states have reaffirmed the requirement for screening of children from low-income families who have government health insurance [29, 42]. Even so, it was not until 2008 that the first state in the USA made screening a formal requirement [18, 40].

The prevalence and impact of children’s mental health problems outside the USA have also long been recognized in both developed and developing countries. Over the past three decades, the World Health Organization has repeatedly advocated for the inclusion of enhanced mental health screening and treatment programs in primary health care and in schools [12, 19, 52]. Studies of the global burden of mental health problems have been widely accepted and influential [35, 48] and have been applied to children as well as adults [52]. Over this period, there have been many studies of children’s mental health problems and interventions around the world. The Pediatric Symptom Checklist (PSC) has been translated into two dozen languages and validated in Japan [20], the Netherlands [41], Austria [44], the Philippines [7] and Brazil [36], as well as Chile [10]. The Strengths and Difficulties Questionnaire, another brief screen like the PSC, has been translated into 50 languages and the Child Behavior Checklist into even more [50]. Despite the widespread concern about children’s mental health throughout the world over many decades, only a single country, the Netherlands [41], mandates mental health screening for all children. In Chile, screening is done only in some designated high-risk schools [17], and in the USA only a single state enforces its screening requirement, and only for children with public health insurance [29].

Although many would agree that screening for earlier recognition of psychosocial dysfunction has face validity, data on the implications of a positive mental health screen for children’s overall adjustment, health and academic achievement are limited. This gap is evident in the US Preventive Services Task Force’s recent decision to endorse routine screening for depression (in adolescents), but not routine screening for overall mental health because of the lack of evidence linking screened risk to real-world outcomes and improved functioning after intervention [47]. Nevertheless, the needs are great enough and the face validity of interventions strong enough that recent position papers by the American Academy of Pediatrics recommend routine screening especially in settings where services are readily available [4, 5].

Screening for depression is an example of what is called ‘narrow band’ screening, while screening for overall psychosocial functioning or mental health is called ‘broadband’ screening [37]. Many studies have documented the relationship between academic outcomes and specific diagnoses like depression [13], diagnosis-specific measures like the Children’s Depression Inventory [2], domain-specific problems like accepting a teacher’s authority [39] and general conditions like serious emotional disturbance [49]. However, only one of the narrow band studies mentioned above [49] used the kind of standardized academic achievement tests of reading and mathematics that are now recognized as a standard and required by all 50 states in the USA [1, 32].

Two recent papers have explored—with mixed results—the relationship between narrow band mental health scores and other academic achievement tests (although not the type that is now required by the states). Breslau and his colleagues [6] conducted a secondary analysis of a longitudinal sample of 823 urban children who had been followed from birth and reported that using a domain-specific teacher report measure administered at age 6 years, internalizing, externalizing and attention problems all predicted math and reading achievement at age 17 years, although when entered in a stepwise fashion only attention problems were significant predictors. Duncan and colleagues [11], in a secondary analysis of six large longitudinal studies, found that on domain-specific behavioral measures completed by teachers or parents at school entry (ages 5–6), it was only attentional problems that predicted math and reading achievement test scores at the end of primary school (ages 11–12 years). Neither externalizing nor internalizing problems predicted academic test scores.

The relationship between broadband measures of mental health and academic outcomes has been explored in a number of smaller studies. In a sample of 156 preschoolers who were screened with a brief teacher-rated version of the BASC (Behavior Assessment System for Children), total problem score had a strong negative correlation with reading and math achievement test scores in the first grade [24]. In samples of 166 middle school and 185 high school students, positive screening scores on the parent- or student-rated Pediatric Symptom Checklist (PSC) have been shown to be related to higher rates of school absence, poor grades and/or repeating a grade in school [14, 33].

In summary, all of the studies reviewed have significant limitations with regard to understanding the relationship between the kinds of broadband mental health screens that are recommended or required in pediatrics and the kinds of standardized achievement tests that are now the de facto standard throughout the USA. The current paper addresses these limitations by using two different broadband screens in a large sample to predict scores on national achievement tests. These analyses were possible because Chile employs achievement tests much like those used in the USA and because it uses broadband screens to identify students at risk in a national mental health program designed to improve educational outcomes.

Habilidades para la Vida [translated as Skills for Life (SFL)] is a school-based mental health promotion and intervention program that is offered by the National Board of School Aid and Scholarships (JUNAEB) of Chile. Beginning in the 1990s, JUNAEB started to provide programs aimed at promoting equal opportunity to education and preventing school dropout for all students in Chile. In addition to school meals and programs correcting dental, vision, hearing and orthopedic problems that could interfere with learning, a program targeting mental health was added in 1998. SFL had evolved over the 1990s from studies of the prevalence of mental health problems in Chile, the adaptation and validation of screening instruments for that country and the development of the intervention itself. The intervention followed the three-tiered model (promotion/targeted prevention/intervention) recommended by the World Health Organization in a seminal report [19] and was developed over time in collaboration with local communities and in consultation with a number of authorities in children’s mental health, including Drs. Sheppard Kellam and Thomas Anders [10]. The program’s voluntary adoption by so many local school districts is probably due to their perception that it is effective in reducing behavior problems, and this in turn is one of the most important factors in its steady growth from a handful of schools in 1998 to more than 1,000 at the present time.

Skills for Life is available on a voluntary basis to selected public and subsidized private schools in Chile that meet criteria for “high risk” based in part on a formula created by the World Health Organization that takes school-level indicators of family income, maternal education and other factors into account. In Chile, about 20% of all schools are considered to be high risk by this standard. The SFL program reached national scale in 2002 with 402 schools participating, and since that time it has grown to include 1,172 schools and 181,352 students [23].

In keeping with the original WHO model, SFL offers mental health promotion activities for all students, teachers and parents in participating schools, preventive intervention workshops for children screened at risk and referral to outside professionals for students found to be at highest risk. For screening, SFL employs brief standardized measures—developed in the USA and adapted and validated in Chile—as broadband screens for mental health problems for all of the first grade students in these schools. Students identified with problems on the teacher screen are referred to a 15-session school-based preventive intervention and those with problems identified by the parent screen are referred to their primary care providers for further evaluation.

The current paper examines whether mental health matters in this elementary school sample, that is, whether the first grade mental health screening scores of more than 11,000 students predict academic achievement test scores 3 years later. If the answer is yes, then the value of broadband psychosocial screening in pediatric and educational settings would be better established and the stage would also be set for assessing the impact of large-scale interventions using mental health and academic measures like these.

Methods

Sample

All subjects in the current study began as first grade students in 2002 in schools that were participants in the Skills for Life Program. The mental health screening scores of these students from the first grade were merged with their SIMCE achievement test scores from the fourth grade in 2005.

Design, setting, and population

Data on academic testing come from the SIMCE (Sistema de Medición de la Calidad de la Educación), which is required for all students in Chile in the fourth, eighth and tenth grades [31]. The analytic data set for this paper required the combination of two different databases: (1) national fourth grade SIMCE test scores, including data from the parent background questionnaire, and (2) scores from first grade teacher- and parent-reported mental health screening and risk factor questionnaires. After Chilean staff had merged the data using the national student ID number as the match key, identifying information was removed and the data set was sent to the USA for the analyses reported here. This procedure was reviewed and approved as exempt by the Partners Human Research Review Board in February of 2009 as well as by the requisite authorities in Chile.
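
The merge and de-identification step described above can be sketched roughly as follows. This is only an illustration in Python, not the procedure the Chilean staff actually used; the field names (`student_id`, `toca_risk`, `psc_total`, `simce_summary`) and the sample records are hypothetical.

```python
# Hypothetical records; the field names are illustrative assumptions,
# not the actual SFL or SIMCE database schema.
screening = [
    {"student_id": "A1", "toca_risk": True, "psc_total": 70},
    {"student_id": "A2", "toca_risk": False, "psc_total": 48},
    {"student_id": "A3", "toca_risk": False, "psc_total": 66},
]
simce = [
    {"student_id": "A1", "simce_summary": 221.0},
    {"student_id": "A3", "simce_summary": 252.5},
    {"student_id": "A4", "simce_summary": 240.0},
]


def merge_and_deidentify(screening_rows, simce_rows, key="student_id"):
    """Inner-join two lists of records on `key`, then drop the identifier."""
    simce_by_id = {row[key]: row for row in simce_rows}
    merged = []
    for row in screening_rows:
        match = simce_by_id.get(row[key])
        if match is None:
            continue  # no fourth grade score, so the student is unmatched
        combined = {**row, **match}
        del combined[key]  # remove identifying information after the match
        merged.append(combined)
    return merged


analytic = merge_and_deidentify(screening, simce)
print(len(analytic))  # number of matched students
```

In this toy example only students present in both sources are retained, which mirrors the way unmatched students fall out of the analytic sample.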

Measures

Outcomes

Academic achievement

For each of three subject areas (math, science and language), the SIMCE test includes 40–51 questions. In a 1999 national standardization, SIMCE scores were calculated to have a mean of 250 and a standard deviation of 50 [31]. In this paper, we report all three subtest scores, although for greater simplicity in the text, we focus primarily on the average of the three subject areas for each student.

Predictors

Mental health high risk

The main predictor variables were high (vs. lower) student mental health risk as assessed independently on the Chilean versions of the Teacher Observation of Classroom Adaptation-Revised (TOCA-R) and the Pediatric Symptom Checklist (PSC). Screens were administered during the second half of the school year after teachers had known their students for at least 4 months.

The TOCA-R is a valid and reliable measure, which has been used in the USA for over two decades and was one of the primary measures in an influential series of studies of elementary school classroom behavior, intervention and adult outcomes that started in the 1990s [51]. The TOCA-R is a brief structured interview with a teacher that is administered by a trained assessor. Teachers respond to items pertaining to the child’s adaptation to classroom task demands as a way to identify students with problems. The TOCA-R [26] has 31 items, each scored on a six-point Likert scale from ‘never’ to ‘always’, and groups of four to seven items are summed to produce scores on six problem-specific subscales. The studies referenced above have shown that scores on one of the TOCA-R subscales (authority acceptance) in elementary school are predictive of real-world health, academic and mental health problems into adulthood [25, 38]. In the 1990s, Chilean social scientists, university staff, investigators and educational officials translated, adapted and validated the TOCA-R for a Chilean context in a series of studies. The resulting measure is called the TOCA-RR.

The SFL program uses the TOCA-RR Global score in the first grade as the primary determinant of overall risk. TOCA Global Risk is a dichotomous variable assigned by the SFL national program staff at the end of the first grade on the basis of the pattern of high-risk scores on the six subscales. All students who are at risk on the TOCA Global score in the first grade are referred to school-based workshops in the second grade. Because the cohort followed in this paper was the first in the national implementation of SFL, there were high rates (30–50%) of missing data about workshop attendance and referral to primary care, so it was not possible to study the association between the interventions and academic test scores in this cohort. However, exploratory analyses using these variables were conducted and are reported below.

The second mental health screen used in SFL is the Chilean version of the Pediatric Symptom Checklist (PSC-Cl). The PSC is a one-page questionnaire listing a broad range of children’s emotional and behavioral problems [22]. Over the past two decades, the PSC has been one of the most widely used psychosocial screens for children in the USA; it has been validated in a national sample and in many subpopulations [27] and recommended for use as a part of the Medicaid EPSDT program in many states [42]. A recent study [15] has shown a high correlation between the short version of the PSC (the PSC-17) and a number of well-accepted brief measures of specific types of psychopathology (like the Children’s Depression Inventory [28]), as well as with overall psychiatric diagnosis on the K-SADS-L [3].

The PSC also went through an extensive adaptation to the Chilean context by the same team that validated the TOCA-RR [10, 16, 17]. The PSC-Cl contains 33 items, each of which is reported by the parent as never, sometimes or often present as on the US PSC. On the PSC-Cl, the items are weighted as 1, 2 and 3, whereas on the US version they are rated as 0, 1 and 2. On both versions, a total score is obtained by summing the weighted scores for all items and higher scores indicate more problems and more serious risk. As on the US form, PSC-Cl total scores can be recoded into a dichotomous score based on a pre-determined cutoff score. For the PSC-Cl, scores of 65 or higher indicate mental health risk [10]. To obtain an estimate of overall mental health risk in the first grade, we combined the categorical TOCA-RR and PSC-Cl scores to distinguish between students who were at risk on both measures, only one or neither.
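
The scoring rules just described (33 items weighted 1 to 3, a cutoff of 65, and a three-level combined classification) can be illustrated with a short sketch. The function names and the example form are hypothetical, and the TOCA-RR Global risk flag is taken as a given input because the rule the program staff use to assign it is not specified here.

```python
PSC_CL_ITEMS = 33    # number of items on the PSC-Cl, from the text
PSC_CL_CUTOFF = 65   # totals of 65 or higher indicate risk, from the text
WEIGHTS = {"never": 1, "sometimes": 2, "often": 3}  # PSC-Cl item weights


def psc_cl_total(responses):
    """Sum the weighted responses for a completed 33-item PSC-Cl form."""
    if len(responses) != PSC_CL_ITEMS:
        raise ValueError("expected 33 item responses")
    return sum(WEIGHTS[r] for r in responses)


def psc_cl_at_risk(total):
    """Dichotomize a PSC-Cl total score at the pre-determined cutoff."""
    return total >= PSC_CL_CUTOFF


def combined_risk(toca_at_risk, psc_at_risk):
    """Classify a student as at risk on 'both', 'one' or 'neither' screen."""
    count = int(toca_at_risk) + int(psc_at_risk)
    return {0: "neither", 1: "one", 2: "both"}[count]


# Hypothetical parent form: 10 'often', 13 'sometimes' and 10 'never'.
form = ["often"] * 10 + ["sometimes"] * 13 + ["never"] * 10
total = psc_cl_total(form)  # 10*3 + 13*2 + 10*1 = 66
print(total, psc_cl_at_risk(total), combined_risk(False, psc_cl_at_risk(total)))
```

With this weighting, a complete form can total anywhere from 33 to 99.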

Covariates

The PSC-Cl form also assesses five risk factors: (1) whether the mother was a teenager when the child was born, (2) whether the father lives with the child, (3) whether the child lives with a relative disabled by mental illness, (4) whether the family participates in organized social activities, and (5) whether the child has a chronic illness leading to one or more school absences a month. These items are all dichotomous, requiring parents to give yes/no answers. These variables were added to the Chilean version of the PSC in 1992 and have remained a part of it ever since, because officials have found them to provide useful information [30]. The TOCA form in both the Chilean and US versions contains an additional summary item that asks the teacher to rate the child’s overall academic progress on a six-point Likert scale ranging from excellent to failing. To ascertain the impact of mental health problems on standardized test scores in the fourth grade in these analyses, we used this variable to control for academic ability in the first grade. In keeping with the dichotomous approach to risk factors shown in the analyses presented in Table 2, the Likert categories of failing, very poor and poor achievement in the first grade were grouped to operationalize a category of academic risk and contrasted with Likert ratings of fair, good and very good achievement, which were grouped together to operationalize a lack of academic risk. Scores on this one TOCA-RR item and on the five PSC-Cl risk factor items are not included in the TOCA-RR or PSC-Cl total scores.

We also used several background factors from the SIMCE parent questionnaire as additional covariates: child gender; family socioeconomic status (SES; a recoding of the family’s reported monthly income into five levels); whether the school was public or private (with or without public subsidy); and mother’s and father’s level of education (elementary, high school, college). These five SIMCE variables were collected during fourth grade.

Data analysis

Mental health and academic achievement

We first examined bivariate relationships between the key predictor variables and SIMCE scores without adjustments and then the same relationships taking covariates into account. We used inference via multiple imputation (MI) to make full use of the observed data and to incorporate the added uncertainty due to missing data. MI is a simulation-based inferential tool operating on multiple “completed” data sets, in which the missing values are replaced by plausible values based on random draws from their respective predictive distributions [53]. Our imputation model accounted for clustering due to school and for relationships that are important to our substantive analyses, while also reflecting the potential causes of missing data. Based on this model, multiple draws were used to impute missing data due to mismatched or incomplete scores on the PSC.
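
Results from the multiple completed data sets are conventionally combined with Rubin's rules, which add the between-imputation variance to the average within-imputation variance. The sketch below illustrates only that pooling step; the number of imputations and the coefficient values are hypothetical, and the software and settings actually used for these analyses are not stated here.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m completed data sets (Rubin's rules).

    estimates: point estimate from each completed data set
    variances: squared standard error from each completed data set
    """
    m = len(estimates)
    q_bar = mean(estimates)     # pooled point estimate
    w = mean(variances)         # within-imputation variance
    b = variance(estimates)     # between-imputation variance (n - 1 divisor)
    t = w + (1 + 1 / m) * b     # total variance
    return q_bar, math.sqrt(t)


# Hypothetical risk coefficients (in SIMCE points) from m = 5 imputations.
est = [-21.0, -20.0, -22.0, -21.5, -20.5]
var = [1.0, 1.1, 0.9, 1.0, 1.0]
pooled, se = pool_rubin(est, var)
print(pooled, se)
```

The pooled standard error is larger than any single-data-set standard error would suggest, which is how the added uncertainty due to missing data enters the inference.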

Results

In 2002 when the students in the program entered school, 402 schools in 43 districts with 25,442 first grade students had elected to participate in the SFL program. Schools submitted usable SFL questionnaire data on 20,135 of these students and 17,252 (68%) of the students had completed forms for the TOCA. It was possible to match 11,185 (65%) of these students with their SIMCE scores in the fourth grade. Completed PSC-Cls were obtained for 8,510 (76%) of these children, of whom 7,903 (93%) had complete data on all five PSC-Cl risk factors and all other variables. This became the primary analytic sample for the current paper.
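
The retention percentages at each stage can be checked directly from the reported counts. In the short sketch below, the stage labels are paraphrases, and the overall retention figure is derived from those counts rather than reported in the text.

```python
def retention(n, denominator):
    """Percentage of the denominator retained, rounded to a whole percent."""
    return round(100 * n / denominator)


# (stage, students retained, denominator), using counts reported in the text.
flow = [
    ("completed TOCA forms", 17252, 25442),
    ("matched to fourth grade SIMCE scores", 11185, 17252),
    ("completed PSC-Cl", 8510, 11185),
    ("complete data on all variables", 7903, 8510),
]

percentages = [retention(n, d) for _, n, d in flow]
overall = retention(7903, 25442)  # analytic sample as a share of all enrolled

for (stage, n, _), pct in zip(flow, percentages):
    print(f"{stage}: {n} ({pct}%)")
print(f"overall: {overall}%")
```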

Table 1 compares students in this subsample (‘Teacher and parent-screened sample’; column 1) with students in the ‘Teacher but not parent-screened sample’ (those whose parents did not fill out a PSC-Cl; column 2). These two subsamples are similar overall, with no differences on type of school and gender and only slight differences on socioeconomic status and summary SIMCE score: the PSC completers were significantly less likely to be from the lowest SES groups and had higher SIMCE scores.

Table 1 Information on analytic subsamples

Table 2 shows the bivariate relationships between risk factors and standardized test scores (with no adjustments) in the fourth grade for those in the teacher- and parent-screened sample. All but one of the 11 risk factor variables were significantly associated with lower scores on the SIMCE ‘3 subject summary score’ in the fourth grade. Since the patterns for all three subjects and the summary score were similar, in the text we describe only the SIMCE summary score findings. As shown in the first (‘Summary’) column under ‘SIMCE fourth grade achievement test scores’ in Table 2, the only exception to this pattern was with regard to gender: boys had significantly higher SIMCE summary scores (246.99) than girls (244.71), although this obscured a more complex pattern in which boys had higher science and math scores and girls had higher language test scores.

Table 2 Relationship between risk factors in the first grade and achievement test scores in the fourth grade for cases with complete data

Continuing down the ‘Summary’ column in Table 2, public school students had scores that were about 14 points lower than students in private or subsidized private schools (242.94 vs. 257.04; F = 116.23, p < .001). Students from the lowest SES group had scores that were about 38 points lower than students from the highest (upper middle) SES group (F = 180.85, p < .001). Students whose mothers had only grade school educations scored an average of 29 points lower than students whose mothers had educations beyond high school (F = 210.44, p < .001). Students whose fathers had only grade school educations scored an average of 32 points lower than students whose fathers had education beyond high school (F = 265.08, p < .001). Students who had had teenaged mothers, absent fathers, family histories of mental illness or frequent school absences had SIMCE scores that were from three to eight points lower than did students without these risk factors. The final rows of Table 2 show that the strongest predictor of standardized test scores in the fourth grade is teacher rating of academic achievement problems in the first grade. For the 5% or so of students who were rated by teachers as having poor, very poor or failing achievement in first grade, the average fourth grade SIMCE scores were 49 points (almost a full standard deviation) lower than they were for students rated by their teachers as having fair, good or very good academic achievement in the first grade (199.45 vs. 248.43; F = 445.83, p < .001).

Table 3 shows the relationship between mental health risk assessed in the first grade (as measured by the TOCA-RR and PSC-Cl separately and combined) and standardized test scores in the fourth grade, again with no adjustment for covariates. In the analytic sample, 11% of the students were classified as being at risk on the TOCA-RR, as were 11% of the students on the PSC-Cl. On the two measures combined, 3% of students were at risk on both measures, 16% on one and 81% on neither. Students rated as being at risk on the TOCA-RR in the first grade had fourth grade SIMCE summary scores that were about 28 points lower than students not at risk on Global TOCA-RR score in the first grade (220.95 vs. 248.98; F = 284.61, p < .001). Students rated as being at risk on the PSC-Cl in the first grade had fourth grade SIMCE summary scores that were 22 points lower than students not at risk on PSC-Cl in the first grade (F = 176.51, p < .001). As shown in the last three rows in Table 3, students with both teacher- and parent-rated mental health problems in the first grade scored an average of nearly 40 points lower on summary SIMCE score than students coded as being at risk on neither of the screens (F = 198.89, p < .001). Students coded as being at risk on only one of the two screens had SIMCE scores in between.

Table 3 Relationship between mental health and behavior risk in the first grade and academic test scores in the fourth grade unadjusted for covariates

Table 4 shows the same relationships as Table 3, except that these results control for all of the covariates listed in Table 2 and use linear mixed-effects models to control for school-level variation and to handle missing data in the covariates and the responses using inference by multiple imputation. Even after these adjustments, teacher- and parent-rated mental health risk assessed separately and/or together in the first grade were still found to be significantly associated with poorer standardized test scores in the fourth grade. In these analyses, the apparent difference in SIMCE average score related to mental health risk decreases slightly from 28 to 21 points on the TOCA-RR, from 22 to 16 points on the PSC-Cl, and from 39 to 33 points on the two measures combined, still about two-thirds of a standard deviation in summary SIMCE score. The pattern on all of the SIMCE subtests is the same.
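
The conversion from SIMCE points to standard deviation units is simple arithmetic and can be reproduced as below. This sketch assumes the standard deviation of 50 from the 1999 national standardization, not a standard deviation estimated from the analytic sample; the dictionary keys are illustrative labels.

```python
SIMCE_SD = 50.0  # standard deviation from the 1999 national standardization


def in_sd_units(point_difference, sd=SIMCE_SD):
    """Express a difference in SIMCE points as a fraction of one SD."""
    return point_difference / sd


# Adjusted differences in SIMCE summary score reported in the text.
adjusted = {"TOCA-RR": 21, "PSC-Cl": 16, "both screens": 33}
effects = {name: in_sd_units(points) for name, points in adjusted.items()}

for name, value in effects.items():
    print(f"{name}: {value:.2f} SD")
```

On this scale the adjusted difference for students at risk on both screens is 0.66 SD, consistent with the "about two-thirds of a standard deviation" description.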

Table 4 Relationship between mental health and behavior risk in the first grade and academic test scores in the fourth grade adjusted for covariates, school-level clustering, and missing data uncertainty via multiple imputation

In a final set of analyses not shown here, the data on workshop attendance and referral to specialists were added as covariates to the multivariate equations shown in Table 4, and the results did not change appreciably. The relationships between TOCA-RR and PSC-Cl risk, separately and combined, and SIMCE summary score were still significant and of the same magnitude as reported above.

Discussion

Results from this study demonstrate that being identified with a mental health problem in the first grade on a broadband teacher, parent or combined screen predicted significantly poorer performance 3 years later on routinely given standardized achievement tests, the most widely used benchmark for children’s real-world functioning in the USA and in many other countries. The study also shows that mental health problems stand on their own as an independent risk factor after controlling for other major risk factors.

This study allowed us to quantify the impact of mental health problems on student achievement. Using a multivariable analysis of variance approach, when the two measures of mental health were combined, the unadjusted 39-point differential in achievement test scores between students with both and students with neither of the mental health screening risks was slightly larger than the 38-point difference in achievement between the poorest and upper middle class children, making mental health the second most powerful predictor of achievement in the study (after the 49-point impact of teacher rating of academic performance at baseline). Paternal and maternal education were the fourth and fifth most powerful variables and of almost comparable magnitude (impacts of 32 and 29 points, respectively), whereas the other social risk factors studied were of smaller magnitude (3–8 points), although still significantly associated with lower test scores. Socioeconomic status, native ability and parental education [9, 43] have long been established as the most powerful predictors of academic achievement in elementary school. This study’s finding that mental health is of comparable magnitude is therefore of great importance, because it provides empirical support for efforts in Chile, in the USA and by international agencies such as the WHO that are based on the premise that mental health really does matter for children.

Our findings are limited to some extent by the large amount of missing data on some of the variables in this real-world sample. However, when multiple imputation, a procedure that can be used to account for the ‘missingness’ of some of the variables, was used in the current analyses, the observed relationships between mental health screening scores and achievement test scores were still significant and of almost the same magnitude.

Another limitation of the current study is that it does not address the key question of the impact on achievement of the intervention that was provided to some of the students in this sample. With about one-third of the analytic sample missing follow-up data on attendance at the school-based workshops and/or referral to outside specialists in this first year of national data collection, it was not possible to assess the impact of interventions in this sample. A paper in preparation [34] based on a later cohort, which has less missing data on these variables, does report a significant relationship between participating in the SFL workshops and improved academic outcomes.

It should also be noted that this study was done in a single country. Chile is in some ways unique in its combination of characteristics of developed and developing nations [8]. As one of the most prosperous nations in the southern hemisphere, it has resources for screening and intervention that other developing nations may lack and it has the flexibility to design large new programs that many developed countries may lack. Overall, however, the prevalence of psychosocial problems and the need for academic achievement are concerns for children around the world.

A final limitation of this study is that although the SIMCE test is thought to be similar to standardized academic achievement tests given in the USA, there have been few if any studies that prove this. Therefore, the power of the PSC and TOCA to predict achievement test scores in the USA remains to be demonstrated.

Conclusion

Although the limitations noted above constrain its conclusions somewhat, the current study nevertheless provides the strongest evidence to date that students with overall, broadband mental health risk have lower levels of subsequent academic achievement as measured by standardized academic achievement tests when compared with students who are not at overall mental health risk. Unlike poverty, parental education and preexisting academic ability—the other major predictors of academic success in this study—mental health is a risk factor that may yield to intervention.

In the USA, programs such as Medicaid and Head Start have long acknowledged, but never fully acted upon, the need to address the mental health needs of children. Neither program has effectively implemented mental health screening or treatment on a national scale despite program standards that require this [21, 42]. In the rest of the world too, the burden of mental health problems has long been acknowledged, but until now not specifically addressed through national programs [12, 19, 52]. By providing data on the strong association between psychosocial risk identified through screening and poorer achievement test scores in a large real-world sample, the current study provides clear evidence that mental health matters for children. We hope that this evidence will encourage efforts to fully implement the recommended mental health standards for screening and intervention in pediatric and educational settings in the USA and other countries as well. As recommendations for screening are followed increasingly throughout the world, it should be possible to ascertain whether current findings about the relationship between mental health measures and academic achievement test scores are replicated in other countries, which use different tests. Perhaps even more importantly, existing data in Chile can be analyzed to ascertain whether a relatively inexpensive school-based intervention is associated with improved outcomes. If preliminary findings that show improved outcomes in Chile are confirmed, then child advocates in other countries will have the evidence they need to create and test similar programs in their own countries.