A vital component of school mental health (SMH) programming is program evaluation, which is the process of assessing how well SMH services meet local needs. SMH evaluators are interested in the value of a program while taking into account available resources and local goals, primarily to provide feedback to stakeholders (Wholey, Hatry, & Newcomer, 2010). Unfortunately, high-quality program evaluation is difficult to achieve (cf., de Anda, 2007; Rones & Hoagwood, 2000; Weist, 2005). In this chapter, we examine program evaluation from the standpoint of rural school practitioners (e.g., school counselors, school psychologists, consulting clinical psychologists). We contrast our topic with school-based research, which is commonly conducted by outside experts who investigate specific treatments or disabilities to advance a broader knowledge base. Program evaluations, on the other hand, are local audits that are typically not intended to speak beyond the evaluation setting.

SMH program evaluation introduces procedural challenges that are unlike those commonly encountered by school-based researchers. For instance, SMH referrals can reflect the entire spectrum of child mental health needs (Farmer, Burns, Phillips, Angold, & Costello, 2003), requiring evaluators to examine outcomes across a wide range of referral questions. Evaluators might also examine the impact of multiple, unrelated interventions to assess whether their combination reduces the need for costlier services (i.e., program accountability). For such reasons, evaluation is not synonymous with research, although one might inform the other.

If it is assumed that published SMH effectiveness studies indirectly reflect program evaluation, the findings would be a cause for concern. Even though most research studies in the schools are conducted by experienced researchers, outcomes are often mixed or even disappointing (e.g., Hoagwood & Erwin, 1997; Kimber, Sandell, & Bremberg, 2008; Peltonen, Qouta, Sarraj, & Punamäki, 2012; Watabe, Stewart, & Owens, 2013; Wei, Szumilas, & Kutcher, 2010). Research conducted in rural schools is particularly discouraging due to methodological weaknesses and inconclusive findings (Arnold, Newman, Gaddy, & Dean, 2005). For such reasons, new efforts are underway to identify the factors that prevent efficacious treatments from working in real-world settings (i.e., implementation science; Cook & Odom, 2013). Likely barriers include competing staff responsibilities, logistical issues, and a lack of educator support (Langley, Nadeem, Kataoka, Stein, & Jaycox, 2010), but clearly these same factors can complicate program evaluation as well. We believe the field is overdue for a revolution in program evaluation science to advance training and methodology in this misunderstood and often overlooked aspect of SMH.

In this chapter, we discuss professional standards and measurement concerns in the evaluation of rural SMH programs. We do not treat rurality solely in cultural terms, because regions vary considerably, but there are economic and organizational concerns that are important to consider in the context of program evaluation. Unlike their urban and suburban counterparts, rural practitioners cannot easily consolidate resources to achieve an economy of scale (cf., Chakraborty, Biswas, & Lewis, 2000), so a relatively large proportion of resources goes to redundant services across schools. Whereas a large school district might pool resources, rural school teams often need to develop procedures independently and with fewer contributors. Thus, we focus our discussion on the components of program evaluation that rural evaluators working in small groups might find most challenging. In instances where program evaluation concerns apply more broadly, we point readers to several helpful resources and keep our comments brief. We conclude the chapter by offering our thoughts on how to improve the evaluation of rural SMH programs, based on over a decade of experience collaborating with diverse rural schools.

Standards for Program Evaluation

Program evaluation is not unique to school mental health. In fact, there is a rich history of program evaluation in many applied settings, resulting in widely accepted and readily applicable standards of practice. Although a complete review of these standards is beyond the scope of this chapter (see Yarbrough, Shulha, Hopson, & Caruthers, 2011, for a comprehensive discussion), we examine utility, feasibility, and accuracy standards, with an emphasis on the challenges that can cause rural SMH evaluations to fall short.

Utility Standards

To meet the needs of stakeholders (e.g., teachers, parents, students, community leaders, taxpayers), it is vital to ensure that evaluation outcomes are relevant and relatable. Utility standards for program evaluation pertain to the usefulness of the evaluation data (Yarbrough et al., 2011). In the evaluation of mental health programs, stakeholders will clearly want to know whether the program reduced morbidity and perhaps improved safety in the setting. However, mental illness generally carries greater stigma in rural communities than in urban and suburban settings (Hoyt, Conger, Valde, & Weihs, 1997; Jones, Cook, & Wang, 2011), so evaluation outcomes that emphasize mental health labels could discourage participation, given the relative lack of anonymity in small communities. At the same time, typical barriers to accessing child mental health care in rural communities, such as a lack of transportation, scarcity of community-based mental health care providers, and a lack of health insurance, might be circumvented by SMH services (Owens, Andrews, Collins, Griffeth, & Mahoney, 2011). Thus, the challenge is to communicate the potential benefits of a SMH program while avoiding issues that might discomfit families, educators, and other stakeholders.

In our view, rural SMH evaluation is strengthened when evaluators measure outcomes related to client functioning—particularly school-related functioning—while avoiding potentially stigmatizing mental health labels. Outcomes related to children’s disorganization and attention problems, for example, provide information that is more helpful to rural families than a measure of “ADHD.” Similarly, a program that improves grades by helping students overcome procrastination and distraction may speak more effectively to rural parents than a program for adolescents who are “depressed.” By shifting away from stigmatized labels, program outcomes may become salient for rural families and educators (Owens et al., 2011).

Similarly, some stakeholders might expand the definition of SMH program success beyond academic or attendance outcomes to include student safety. School administrators and school board members in particular are often interested in seeing tangible reductions in student disciplinary actions. Although disciplinary actions predict long-term unwanted outcomes (e.g., Walker, Steiber, Ramsey, & O’Neill, 1993), these data can be problematic because schools rarely standardize the reporting systems. The lack of standardization may be a particular problem in rural schools (e.g., Michael et al., 2013), given the remoteness of some schools relative to others. In the interest of utility, evaluators might need to help teachers standardize the office referral process so that these questions can be answered validly and reliably over time; a minimal sketch of what a standardized referral record might capture follows below. Without standardized definitions, stakeholders from different disciplines and backgrounds can end up using different terms to describe similar concepts—a phenomenon that interdisciplinary teams often lament.
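
In practice, standardizing referrals often means agreeing in advance on a fixed set of reporting fields and behavior categories. The sketch below illustrates one way such a record might be structured; the field names and categories are hypothetical examples rather than a published standard, and would need to reflect locally agreed-upon definitions.

```python
# A minimal sketch of a standardized office disciplinary referral record.
# All categories, field names, and example values below are illustrative placeholders.
from dataclasses import dataclass
from datetime import date

BEHAVIOR_CATEGORIES = {"defiance", "physical aggression", "verbal aggression",
                       "property damage", "other"}

@dataclass
class OfficeReferral:
    student_id: str
    referral_date: date
    behavior: str      # must match one locally defined category
    location: str      # e.g., "classroom", "hallway", "bus"
    action_taken: str  # e.g., "parent contact", "in-school suspension"

    def __post_init__(self):
        # Rejecting free-text behavior descriptions keeps definitions consistent
        # across teachers and schools.
        if self.behavior not in BEHAVIOR_CATEGORIES:
            raise ValueError(f"Unrecognized behavior category: {self.behavior}")

# Hypothetical usage:
referral = OfficeReferral("S1042", date(2015, 10, 6), "defiance",
                          "classroom", "parent contact")
```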

Feasibility Standards

Feasibility standards promote the efficiency of program evaluation, in part by assuring practicality and cost-effectiveness (Yarbrough et al., 2011). In rural settings, the feasibility of evaluation can be a concern because resources are limited; in our experience, options for staffing and funding evaluation efforts are few. We have also found that rural school administrators are hesitant to invest in new programs due to initial cost concerns, and that once existing programs become widely accepted, practitioners are hesitant to invest additional time and resources to evaluate their outcomes.

Beyond resource concerns, many SMH practitioners do not know how to demonstrate program effectiveness without expert help. School counselors, for example, receive little formal training in program evaluation beyond service recording (e.g., the number of counseling sessions provided) (Astramovich, Coker, & Hoskins, 2005). At the same time, stakeholders have a right to know that SMH practitioners are meeting their fiduciary responsibilities as a condition of continued investment (Poirier & Osher, 2006). Thus, evaluation costs are as integral a component of SMH programs as staffing and material costs, even though evaluation costs can take resources away from service provision. For this reason, we believe that feasibility concerns pose some of the greatest challenges for rural SMH evaluation. For program evaluation efforts to advance, it will be vital for evaluators to make use of affordable measurement tools (see the Instrumentation section below); but perhaps more importantly, it will be vital to demonstrate the long-term cost savings when children receive effective mental health care in the schools. If evaluators can clearly demonstrate student benefit and cost-effectiveness, stakeholders in rural areas will likely find SMH initiatives desirable.

Accuracy Standards

Accuracy standards relate to the need to ensure that the conclusions based on the evaluation data are justified (Yarbrough et al., 2011). One of the most important considerations in evaluation accuracy is treatment integrity, defined as the degree to which programs are implemented as intended (Schulte, Easton, & Parker, 2009). When integrity is poor, outcomes can be compromised and the quality of the outcome evaluation is weakened (Durlak & DuPre, 2008). Even high-quality outcome evaluations reflect the effectiveness of the program as delivered rather than as intended. If, for example, a Daily Report Card (DRC) intervention is implemented only intermittently rather than on a daily basis, its impact is likely to be weakened (Owens, Murphy, Richerson, Girio, & Himawan, 2008), but it would be a mistake to conclude that the DRC is useless. Thus, we must evaluate the process of intervention, and treatment integrity measures speak to these concerns. Poor integrity might explain why some efficacious programs fail when implemented in naturalistic school settings (Atkins, Frazier, Adil, & Talbott, 2003).

Although published effectiveness studies in rural schools are rare, findings from two studies in Appalachian public schools have implications for achieving evaluation accuracy. First, Owens et al. (2008) examined evidence-based practices in rural elementary schools using a naturalistic referral process rather than participant recruitment. The researchers included measures of treatment integrity in their study, including both dosage and adherence indicators. In terms of dosage, they assessed the number of sessions attended and the number of days the teachers collected intervention data. In terms of adherence, clinicians rated parents’ adherence to the home-based procedures of the home-school intervention component. The results suggest that teachers and parents implemented the interventions with acceptable (albeit imperfect) integrity, thereby providing support for the accuracy of the results (Owens et al., 2008). In other words, the authors could safely conclude that their results spoke to the efficacy of the intervention as designed.

Second, Albright and Michael (2013) evaluated a SMH program (Assessment, Support, and Counseling (ASC); Albright et al., 2013; Michael, Wandler, & Quick, 2010) in rural high schools that also responded to real-world referrals rather than participant recruitment. To evaluate the accuracy of their results, the researchers recorded dosage variables similar to Owens and colleagues (e.g., number of sessions), but also included client ratings of the clinicians’ effectiveness. Client ratings of the clinicians taken at treatment termination provided some evidence that the program was directly targeting client needs as they perceived them, which in this case were mostly crisis intervention needs. In effect, the client ratings spoke to the clinicians’ competency in addressing mental health needs, which is important to ensure client buy-in.

Taken together, these studies suggest that the accuracy of program evaluation can be improved through adherence and competency measures including service tracking, permanent products, and satisfaction ratings. Program evaluations would clearly benefit from integrating similar measures, but other integrity strategies used in effectiveness trials may be less useful. For example, researchers often create manuals for their treatments to ensure adherence to a set protocol of interventions, but this may not always be a realistic option in referral-based SMH programs. Albright and Michael (2013), for instance, describe their treatments as including cognitive-behavioral therapy, crisis intervention, and school consultation, but due to the variety of referrals, manuals for every service proved infeasible. SMH program evaluators are likely to face this same challenge (see also Albright et al., 2013; Michael et al., 2013).

Assessment Strategies

Next we turn our attention to assessment strategies, which are the specific techniques used when collecting data during the evaluation process. Much is known about best-practice assessment because the techniques are similar across treatment research and program evaluation. Still, given the challenges faced when evaluating SMH programs, it is vital to consider the standards outlined above. Even valid and reliable measures could have limited utility, for example, if the costs are too high or if the constructs measured do not convey meaningful results to stakeholders. For this reason, our assessment recommendations integrate the aforementioned evaluation standards—utility, feasibility, and accuracy—throughout.

To begin, it is important to think strategically about the purpose of a program evaluation. Aligning this purpose with the goals of stakeholders can be critical to assuring the relevance of the findings, but ensuring that the measures and methods selected adequately address the purpose will further increase the relevance of the evaluation. Errors in these initial decisions can be costly. For example, we are personally aware of a community mental health agency that was contracted to provide SMH services in a rural school district. As part of these efforts, the service provider was required to perform a program evaluation in the first year that was to be considered by the school board prior to renewal for the following year. The service provider kept track of the number and length of sessions with students, as well as satisfaction data collected from teachers at the end of the year. At the board meeting, the service provider reported contact with large numbers of students at every school and data about the number of sessions per student. In addition, they provided quotes from teachers who said positive things about their services. Nevertheless, as one of the school board members noted, none of these data spoke to whether the services benefitted students. No data were collected from or about students, and the perspective of parents was ignored. Having nothing other than service records and teacher quotes upon which to base their decision, the school board cancelled the contract. We believe that these events highlight the importance of understanding the goals of the program and choosing evaluation questions together with the program’s stakeholders. As is clear in this example, the purpose of the evaluation guides the choice of outcome domains and sources of information, consistent with the aforementioned evaluation standards.

Outcome Domains

In a review of SMH effectiveness studies, Rones and Hoagwood (2000) identified three outcome domains that might prove useful for program evaluation depending on the purpose of the evaluation: symptoms, functional impairments, and service usage (e.g., special education).

Symptoms. Most psychiatric disorders and other conditions of interest to school mental health programs are identified and differentiated from each other by the presence of specific symptoms. Symptoms are the observable behavioral features of a disorder, and assessing change in symptoms has been one of the most prominent methods of evaluating outcomes from treatment of mental health problems among youth (Weisz, Doss, & Hawley, 2005). Symptoms also provide a means of identifying youths who present with subclinical manifestations of disorders (i.e., are at risk for developing a disorder) and may still benefit from school-based mental health programs (Angold, Costello, Farmer, Burns, & Erkanli, 1999). Although symptoms have been the most prominent means of evaluating outcomes among youth (Weisz et al., 2005), researchers increasingly recognize that a narrow focus on symptom reduction is insufficient and that outcomes beyond symptoms are needed to contextualize results (refer to Utility Standards above).

Functional impairments. Another prominent outcome domain in the treatment research and program evaluation literatures involves the problems (or distress) in daily life caused by symptoms of a disorder, referred to as functional impairment. Impairment is a required feature for the diagnosis of all child and adult disorders in the Diagnostic and Statistical Manual of Mental Disorders (DSM-5; American Psychiatric Association (APA), 2013) and represents the impact of the symptoms or disorder on the individual or others. It is often impairment, rather than symptoms of the disorder, that leads to the need for mental health services. In some cases, impairments may be a better predictor of adverse outcomes (e.g., school failure) than a formal diagnosis (Vander-Stoep, Weiss, McKnight, Beresford, & Cohen, 2002) and may therefore be a better measure of treatment outcomes than symptoms alone. Youth who receive mental health services often experience impairment in multiple domains, most notably in academic and social functioning. As we mentioned above, program evaluations can benefit by including measures of change in students’ academic and social impairment over time. In terms of analysis, the reliable change index can tell the evaluator the degree to which each child’s change over time exceeds typical variation, based on the test-retest reliability of the instrument (see Jacobson & Truax, 1991). Similar calculations can be applied to symptom measures, but stakeholders in schools are likely to find reductions in impairment—particularly academic impairment—most compelling.
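
For readers who want a concrete starting point, the sketch below computes the Jacobson and Truax (1991) reliable change index from pre- and post-treatment scores. The standard deviation and test-retest reliability values in the example are hypothetical; in practice they would come from the instrument’s manual or normative data.

```python
# A minimal sketch of the reliable change index (RCI; Jacobson & Truax, 1991).
# |RCI| > 1.96 is conventionally treated as reliable change (p < .05).
import math

def reliable_change_index(score_pre, score_post, sd_baseline, retest_reliability):
    standard_error = sd_baseline * math.sqrt(1 - retest_reliability)  # SE of measurement
    s_diff = math.sqrt(2 * standard_error ** 2)                       # SE of the difference score
    return (score_post - score_pre) / s_diff

# Hypothetical example: an impairment rating drops from 42 to 31 on a scale
# with SD = 9.5 and test-retest reliability = .85.
rci = reliable_change_index(42, 31, 9.5, 0.85)
print(f"RCI = {rci:.2f}")  # roughly -2.11, exceeding the 1.96 threshold
```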

Service use. Another important indicator that may be of interest to stakeholders is the use of school or community services following program implementation. A recent report on children’s mental health indicates that one in five children experience a mental health disorder each year, resulting in an estimated $247 billion cost in terms of treatment utilization, special education, juvenile justice services, and decreased productivity (Centers for Disease Control and Prevention, 2013). It is not surprising, then, that many program evaluations measure use of school and community services related to discipline, special education, mental health, and the juvenile justice system (Rones & Hoagwood, 2000). For example, in their evaluation of Project ACHIEVE, a comprehensive school reform process focused on improving outcomes for at-risk and underachieving students, Knoff and Batsche (1994) measured the number of students referred for special education assessment and the number placed in special education as outcomes. Similar types of service-use measures could clearly be useful in local program evaluations as well, and even seem to be implied as an outcome measure by many tiered prevention models (cf., Walker & Gresham, 2013).

Sources of Information

Several sources of information might be considered when conducting SMH program evaluation, but it may be difficult to determine whose data are most useful, feasible, and accurate. At the outset, it is important to determine whether teacher, parent, child, or clinician reports, direct observations, or school records will meet evaluation standards, so we discuss the strengths and weaknesses of each of these sources in turn.

Teacher reports. Teacher reports seem vital to high-quality SMH program evaluations, given the need for assessing school-related impairments; however, teacher reports are generally more accurate for identifying or assessing externalizing rather than internalizing problems, due to the overt/covert nature of these domains (McMahon & Frick, 2005; Pelham, Fabiano, & Massetti, 2005; Silverman & Ollendick, 2005). For example, teachers may easily observe when a student is refusing to comply with requests or is out of his or her chair (externalizing problems), but not notice when a student feels hopeless or worried (internalizing problems). Thus, we generally recommend prioritizing teacher reports when monitoring externalized concerns and parent or student self-reports (described below) when monitoring internalized concerns. In secondary schools, the use of teacher reports is further complicated by the fact that students typically work with multiple teachers during the school day, so it is not always clear which teacher(s) ought to be included. There is some evidence for systematic teacher bias, including the tendency for women and early service teachers to be more severe in their ratings of externalizing disorders than men or more experienced teachers (Schultz & Evans, 2012), so evaluators must use caution when choosing, or weighing disagreements among, several teacher reporters.

Parent reports. Professional recommendations for child and adolescent assessment include the use of parent report for both internalizing and externalizing disorders (Hunsley & Mash, 2007). Parent report is important to assessing for childhood problems because parents are typically involved in children’s day-to-day lives, making them informed reporters. Parent reports are also important because children may not be reliable when reporting the temporal sequence of their problems (Klein, Dougherty, & Olino, 2005). Thus, collecting parent reports can greatly inform outcome assessments of SMH programs, but there are some limitations. For example, parent psychopathology can bias these reports and we can reasonably expect such concerns in many SMH cases. As a case in point, it has been demonstrated that mothers with depression over-report symptoms of ADHD (Pelham et al., 2005) and depression in children (Klein et al., 2005). As a result, program evaluation cannot safely rely on these reports alone, and teacher ratings might be used to confirm or supplement parent data.

Child self-report. In general, child and adolescent self-reports are less reliable for externalizing problems than for internalizing problems. Children and adolescents with behavior disorders often underestimate aggressive behaviors, symptoms, and overall impairment when rating themselves (McMahon & Frick, 2005; Pelham et al., 2005). But self-reports can be useful when assessing outcomes related to covert forms of conduct problems, such as drug use, risky sex, and dangerous driving behaviors. For internalizing problems, such as anxiety and depression, child and adolescent self-reports can be more valid than either teacher or parent report. One caution regarding self-report of internalizing problems is that some groups of children and adolescents (e.g., younger children, African American, and Hispanic American youths) may be more likely to minimize problems, a bias sometimes referred to as social desirability (Silverman & Ollendick, 2005).

Clinician report. In some instances, evaluators might be tempted to have clinicians and interventionists report their impressions of outcomes, particularly when other information is missing or otherwise unavailable. The potential for bias in these reports is obvious, particularly when evaluations affect resources, but there are instances where clinicians can provide context for other evaluation outcomes. For instance, clinician report of the therapeutic alliance can predict outcomes to some degree (Elvins & Green, 2008; McLeod, 2011). As such, clinician alliance ratings can lend support for treatment competence, consistent with accuracy standards (assuming adequate validity and reliability). Clinician reports have also been used to measure how well parents have followed through with aspects of intervention. For example, Owens et al. (2008) asked clinicians to rate parent adherence to a home-school collaborative intervention, but the accuracy of these data was unclear.

Multiple informants. Given the potential for source biases, multiple-informant assessment is widely recommended, but inter-informant disagreement is common (Achenbach, McConaughy, & Howell, 1987). SMH practitioners often grapple with how to integrate the differing reports, understanding that disagreement between raters does not necessarily mean that the reports are invalid. Differences between raters could be due to differences in tolerance for child and adolescent behaviors or differences in how students behave across situations. Thus, variations in report across informants could offer valuable insight into target behavior, but potential rater biases still need to be considered.

Several suggestions for integrating reports from multiple informants can be found in the literature (Klein et al., 2005). One strategy is the “or” rule, which assumes that a behavior is present if it is reported by any informant. Alternatively, there is the “and” rule, which requires that at least two informants confirm an observation. A third approach that more closely resembles clinical practice is the “best estimate” strategy, which relies on clinical judgment to integrate varying reports from informants. Although the “best estimate” strategy can introduce clinician bias, there is some evidence to suggest that the reliability of this estimate can be high (e.g., Klein, Ouimette, Kelly, Ferro, & Riso, 1994).
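
As an illustration, the sketch below shows how the “or” and “and” rules might be operationalized when informant reports are reduced to simple endorsements. The informant names and values are hypothetical, and the “best estimate” strategy is noted only in a comment because it depends on clinical judgment rather than a fixed rule.

```python
# A minimal sketch of the "or" and "and" rules for combining informant reports.
# Reports are simplified to booleans indicating whether each informant endorsed
# the target behavior; informant names and values are illustrative only.
def or_rule(reports):
    """Count the behavior as present if any informant endorses it."""
    return any(reports.values())

def and_rule(reports, minimum=2):
    """Count the behavior as present only if at least `minimum` informants endorse it."""
    return sum(reports.values()) >= minimum

reports = {"teacher": True, "parent": False, "student": True}
print(or_rule(reports))   # True: one endorsement is enough
print(and_rule(reports))  # True: two informants endorsed the behavior
# The "best estimate" strategy has no fixed formula; a clinician would weigh the
# same reports in light of each informant's vantage point and likely biases.
```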

Direct observation. Systematic direct observation (SDO) typically involves observing students in their normal environments (e.g., classrooms) to assess changes in behavior. SDO can lead to accurate and contextualized measurement of behaviors that cause impairment, but observations often require significant staff resources (i.e., training, time) and often have limited reliability and generalizability without repeated observations (Hintze & Matthews, 2004). These concerns can render direct observations infeasible for large numbers of students. Moreover, low-frequency behaviors (e.g., physical aggression) are poor targets for SDO because the observer may not see the behavior of interest. Still, SDO of academic (e.g., off-task or disruptive behavior) and social (e.g., withdrawal) behaviors can provide important information about changes in student behavior over time.

Existing school data. Existing school data (e.g., grades, disciplinary referrals, attendance, and gradebook data) are attractive because of their availability, relevance, and freedom from biases caused by social desirability, parent psychopathology, or evaluator judgment. For these reasons, existing data can serve as a convenient and valid outcome measure of SMH programs in some instances. But there are limitations, because these data generally do not specify the duration, frequency, or intensity of specific problems (Riley-Tillman, Kalberer, & Chafouleas, 2005). There can also be questions regarding the reliability and validity of existing school data. For example, grading policies can vary across teachers or school districts, and grades are affected by external factors such as poverty and other major life stressors. Given these limitations, we recommend that evaluators only include school data as part of a multi-method, multi-informant strategy. When multiple sources of information support the same conclusion, evaluators can have confidence in their outcomes.

Instrumentation

Researchers and program evaluators alike have searched for “generic” measures to assess outcomes for myriad referral questions (Schulte, 2010). Similarly, it would be convenient to identify a General Outcome Measure (GOM; Deno, Mirkin, & Chiang, 1982) for each outcome domain with strong psychometrics and relevance (Walker & Gresham, 2013), but so far generic measures and GOMs have proven elusive. Instead, there are multiple candidate instruments for the various outcomes of interest, each with strengths and weaknesses. The options and tradeoffs can be overwhelming, especially given that instrumentation introduces as much variance into outcome estimates as do the actual treatments (Wilson & Lipsey, 2001). Thus, instrumentation is a critical element of program evaluation that requires careful consideration.

At the outset, evaluators must develop hypotheses about which outcomes the program interventions are most likely to impact, and then select relevant measures for each outcome of interest. It can also be useful to consider and measure possible distal outcomes as part of this process. Distal outcomes can include nontarget symptoms (e.g., reduction in child depressive symptoms after an intervention targeting ADHD), additional functioning measures (e.g., peer sociometric ratings after a social skills intervention), environmental impact measures (e.g., reduction in maternal depressive symptoms after a behavioral parent training intervention), and client satisfaction measures (e.g., parent satisfaction with group parent training). Perhaps for this reason, treatment research utilizes an average of 12 participant measures (Weisz et al., 2005), which is likely to prove unrealistic for rural SMH program evaluations. But by expanding the scope of assessment beyond one or two outcomes, evaluators can gain a comprehensive understanding of treatment outcomes relative to the target concerns, as well as the child’s overall level of functioning (e.g., classroom functioning, family functioning).

Table 21.1 provides a brief overview of instruments that can be useful for SMH program evaluation. Given the limited resources available in rural communities, we highlight free instruments that are readily available online. Each instrument listed includes information regarding the class of instrument (e.g., rating scale), source (e.g., parent, teacher), domain (e.g., symptoms, impairment, satisfaction, classroom or family functioning), construct assessed (e.g., depression, academic performance), and age range for which it has been validated. Of course, many more instruments could be useful depending on the needs of evaluators, so readers are encouraged to refer to the treatment effectiveness and program evaluation literatures. Readers needing a more thorough review of the various classes of instruments than is provided here are encouraged to read one of the excellent published reviews (e.g., Pelham et al., 2005; Riley-Tillman et al., 2005). Below we offer brief overviews of some of the instruments highlighted in Table 21.1.

Table 21.1 Publicly available measures of symptoms and impairment

Symptoms. In the symptom domain, instruments generally fall into two categories: broadband and narrowband. Broadband scales measure a wide variety of behavior concerns, generally including both externalizing and internalizing symptoms, whereas narrowband ratings focus on specific concerns, such as anxiety or depression. In program evaluation, we would predict that broadband ratings would prove most useful, for the reasons stated at the beginning of this chapter. For example, the Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997) is a 25-item behavioral rating scale that can be used to screen for problems or monitor progress in a number of domains. Parents and teachers are asked to rate the severity of emotional symptoms, conduct problems, symptoms of hyperactivity/inattention, peer relationship problems, and prosocial behavior. As such, the SDQ could be a useful outcome measure suitable for program evaluation purposes.
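
As a rough illustration of how a broadband instrument yields scores that can be monitored over time, the sketch below sums SDQ-style subscale scores and a total difficulties score. It assumes items have already been coded 0-2 in the scored direction (i.e., reverse-scored items handled) and grouped by subscale; it is not the publisher’s official scoring key, and the example ratings are hypothetical.

```python
# A minimal sketch of summarizing SDQ-style responses for progress monitoring.
# Assumes each subscale's five items are already coded 0-2 in the scored direction.
SUBSCALES = ["emotional", "conduct", "hyperactivity", "peer", "prosocial"]

def summarize_sdq(item_scores):
    """item_scores: dict mapping each subscale name to its five 0-2 item codes."""
    subscale_totals = {name: sum(item_scores[name]) for name in SUBSCALES}
    # The total difficulties score omits the prosocial scale (possible range 0-40).
    total_difficulties = sum(score for name, score in subscale_totals.items()
                             if name != "prosocial")
    return subscale_totals, total_difficulties

# Hypothetical pre-treatment ratings from a teacher:
pre = {"emotional": [2, 1, 2, 1, 2], "conduct": [1, 1, 0, 1, 0],
       "hyperactivity": [2, 2, 1, 2, 1], "peer": [1, 0, 1, 0, 0],
       "prosocial": [1, 2, 1, 2, 2]}
print(summarize_sdq(pre))
```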

It is also conceivable that narrowband scales could be useful in evaluation if these measures are targeted to specific referral questions. If the evaluation is stratified by referral category, for example, narrowband scales might be used for each category. There are several examples of narrowband rating scales that could be useful for these purposes. For example, the Disruptive Behavior Disorders Rating Scale (DBD; Pelham, Gnagy, Greenslade, & Milich, 1992) assesses symptoms of a number of externalizing disorders common in childhood and adolescence, including attention deficit/hyperactivity disorder (ADHD), oppositional defiant disorder (ODD), and conduct disorder (CD). The items on this instrument closely parallel the diagnostic criteria found in the DSM.

Functional impairment. Similar to the measures of symptoms, instruments that assess functional impairment can focus on specific domains of impairment (e.g., academic, social) or on global outcomes. For example, the Academic Performance Rating Scale (APRS; DuPaul, Rapport, & Perriello, 1991) is a measure completed by teachers to rate a child’s academic performance over the past week across a number of subject domains and academic abilities. Sometimes it may be more useful to assess the impact of treatment across multiple domains of functioning or on the overall functioning of a child. The Impairment Rating Scale (IRS; Fabiano et al., 2006) has parents and teachers rate the severity of impairment, and the need for treatment, in multiple areas (e.g., relationships with peers, siblings, parents, and teachers; academic performance; self-esteem; and overall functioning) resulting from the child’s presenting problems. Overall impairment can also be assessed using rating scales completed by the clinician (e.g., the Children’s Global Assessment Scale (CGAS); Shaffer et al., 1983) or by parent and child report (e.g., the Columbia Impairment Scale (CIS); Bird, Shaffer, Fisher, & Gould, 1993).

Systemic outcomes. Beyond measures of symptoms and impairment, program evaluators may be interested in the impact SMH programs have on families and schools. For example, the Parenting Stress Scale (PSS; Berry & Jones, 1995) has been used to obtain parents’ reports of the stress they experience in their parenting role. Although we do not include such instruments in Table 21.1, readers can refer to Pritchett et al. (2011) for an extensive list of family functioning measures that could prove useful in SMH program evaluations.

Conclusion

We have explored the goals of program evaluation and how those goals might be achieved for rural SMH programs. As we have shown, program evaluation is not synonymous with school-based research, which often has expert support, limited foci, and implications for the broader field. Program evaluation, by comparison, is intended to assess whether SMH programs meet the needs and expectations of a local community. Standards for program evaluation guide the practice, but rural evaluators are likely to encounter economic and organizational challenges along the way. We have highlighted several of these challenges in this chapter, but we cannot anticipate all potential difficulties; readers will need to consider the possible roadblocks in their setting when planning an evaluation. High-quality program evaluation requires that utility, feasibility, and accuracy standards are maintained.

Early in this chapter, we claimed that school mental health was overdue for a revolution in program evaluation science. It seems this need has been overlooked partly due to confusion between treatment research and program evaluation. Field-based treatment research has the laudable goal of establishing the effectiveness of a given treatment or program, but we should also recognize that even the most well-established techniques could fail because of setting-specific incompatibilities. That failure may or may not represent a threat to the “evidence-based” status of the treatment, but it will most certainly have implications for the future of that program in that particular setting. Thus, best practices require that program evaluations are conducted in all localities, regardless of whether the treatments are “evidence-based” or not. Of course, this statement in itself is not revolutionary, but the implications for training are. It is clear from the literature that SMH professionals are largely unprepared to conduct high-quality program evaluations without external support and added resources. If SMH professionals assume that quality evaluations are only conducted by expert researchers, the true impact of local SMH programs will go unexamined. In our experience collaborating with rural schools, this is certainly the case—SMH practitioners rarely have their programs evaluated. When programs are evaluated, practitioners seem to assume that convenient data, such as service records and client grades, are sufficient to meet the needs of the stakeholders. But as we have pointed out, such data are inadequate for meeting the standards of quality program evaluation. The solution will require fundamental changes to how SMH professionals approach and evaluate their practice.