
Evidence-based medicine (EBM) is a clinical practice that uses the most empirically supported methods and most current evidence in the clinical decision-making process (Sackett, Rosenberg, Gray, Haynes, & Richardson, 1996). EBM is a patient-focused, self-correcting model, intended to improve the quality of care for clients. EBM offers several benefits: it (1) allows clinicians to utilize updated research in decision making; (2) improves efficacy and efficiency for treatment provision; (3) decreases the use of ineffective or untested methods; (4) informs both clinician and client about the best treatments by utilizing publicly available information on different treatment routes; and (5) provides an empirical basis for constructing health-care policy (Romana, 2006).

In the field of mental health, the surge in EBM promotion pushes for the application of both evidence-based assessment (EBA; Hunsley & Mash, 2007; Youngstrom, 2013) to enhance treatment guidance and empirically supported treatments (EST; Chambless & Hollon, 1998; Southam-Gerow & Prinstein, 2014) to ensure that the chosen approaches have been empirically validated. This chapter focuses on EBA for rural settings while seeking to integrate the guidelines of EBM with traditional psychological assessment through a client- and test-centered approach. Given the lack of literature geared toward rural settings, this chapter partly uses a “lessons learned” approach gained through the authors’ real-world experiences while still providing an empirical rationale for EBA.

The benefits of adopting an EBA approach to diagnosis are substantial. Based upon multiple studies examining decision-making skills in diagnosis with real clinicians, obtaining an accurate diagnosis is significantly more likely when working through the EBA approach than when relying solely upon clinician judgment (Ægisdóttir et al., 2006; Dawes, Faust, & Meehl, 1989; Jenkins, Youngstrom, Washburn, & Youngstrom, 2011; Meehl, 1954). Data collected through intake or subsequent sessions are used more efficiently and objectively than when we impressionistically interpret the same information. The EBA approach reduces a clinician’s tendency to over-interpret cues of risk in ways that often lead to over-diagnosis of disorders (Croskerry, 2003; Gigerenzer, 2002; Jenkins et al., 2011). Lastly and most importantly for academic and community settings alike, using this type of conscientious approach to diagnosis is not only economical but also more feasible. Once it becomes habit, this method saves already overscheduled clinicians’ valuable time (Straus, Glasziou, Richardson, & Haynes, 2011; Youngstrom, Choukas-Bradley, Calhoun, & Jensen-Doss, 2014). For these reasons, the EBA approach is a commonsense model to overcome some of the difficulties faced in rural settings.

Rurality is difficult to define, and often those unfamiliar with rural settings mistakenly believe that all are identical. More individuals are living at or below the poverty level in rural areas, and there is less access to specialized medical and mental health services, including psychological care (Owens, Watabe, & Michael, 2013; Slama, 2004). Rather than thwarting the adoption of EBA within the rural setting, these barriers provide all the more reason to employ the method. Rural areas need a simple diagnostic approach that saves money and time, increasing diagnostic validity and, in turn, accurate treatment selection.

An Introduction to Evidence-Based Assessment

EBA seeks to calculate the probability of a diagnosis when forming decisions concerning further assessment or initiation of treatment (Straus et al., 2011). Each client has a single probability of having a given diagnosis lying on a continuum between 0 and 100%. There are two thresholds that divide the continuum into three major zones of clinical action. EBA refers to these thresholds as the wait-test threshold and the test-treatment threshold (Straus et al., 2011; Youngstrom & Frazier, 2013; see Fig. 7.1). Finding the probability of a diagnosis below the wait-test threshold suggests that a diagnosis can be ruled out with confidence, while a probability estimate of a diagnosis above the test-treatment threshold suggests that a diagnosis can be confidently assigned, and treatment tailored to that diagnosis or diagnoses can begin (Straus et al., 2011).

Fig. 7.1 The decision continuum with three major zones of clinical action divided by the wait-test threshold and the test-treatment threshold. Note: This is not drawn to scale, and where the thresholds are placed will vary depending on the client.

Determining where to set these thresholds requires clinical judgment based on the risks and benefits associated with a diagnosis or lack of a diagnosis (Youngstrom, Jenkins, Jensen-Doss, & Youngstrom, 2012). For example, clinicians may decide that if the probability of a diagnosis is 30% or below, then the odds of a client having the diagnosis in question are sufficiently low to cease further assessment. Thus, 30% would signify the wait-test threshold. Conversely, the clinician may decide that if the probability of a diagnosis is 80% or higher, then the odds of a client having the diagnosis in question are sufficiently high to start treatment. Here, 80% would signify the test-treatment threshold. So long as the client is above 30% and below 80%, further testing may continue. Once the client reaches 30% or lower, or 80% or higher, then the appropriate action can be taken (i.e., discontinue further assessment or initiate treatment, respectively). These numbers are not official suggestions, as they will vary from one diagnosis to the next. In the case of treatments where a risk of harm is minimal (e.g., exposure therapy for panic disorder; Olatunji, Deacon, & Abramowitz, 2009), the test-treatment threshold may be lower relative to disorders where there is a greater risk of harm (e.g., those requiring high doses of psychiatric drugs that carry significant adverse effect risks).
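The three zones of clinical action can be sketched in a few lines of Python. This is only an illustration: the function name `clinical_zone` and the zone labels are our own, and the default thresholds of 30% and 80% are the example values from the text above, not recommended cutoffs.

```python
def clinical_zone(probability, wait_test=0.30, test_treatment=0.80):
    """Return the zone of clinical action for a diagnostic probability (0 to 1).

    The default thresholds are the illustrative 30% and 80% values from the
    text; in practice they are set per diagnosis and per client.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if not 0.0 <= wait_test < test_treatment <= 1.0:
        raise ValueError("need 0 <= wait_test < test_treatment <= 1")
    if probability <= wait_test:
        return "wait"   # at or below the wait-test threshold: stop assessing
    if probability >= test_treatment:
        return "treat"  # at or above the test-treatment threshold: start treatment
    return "test"       # in between: continue assessment


for p in (0.25, 0.55, 0.85):
    print(f"{p:.0%} -> {clinical_zone(p)}")
```

Lowering `test_treatment` for a low-risk treatment, as in the exposure therapy example, simply widens the "treat" zone.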

How does a clinician figure out these probabilities for each case? New approaches take care of all the algebra for us. A paper tool called a “probability nomogram” turns this into an exercise in connecting the dots (Youngstrom, 2013; see Fig. 7.2). The base rate of a disorder within a clinical setting is a good starting probability for a client before we integrate other information we have learned about them (Meehl, 1954). This base rate goes on the left-hand line of the nomogram. Factors such as high scores on diagnostic measures increase the probability of a specific diagnosis. The factor by which such information changes the odds of a diagnosis is known as a diagnostic likelihood ratio (DLR). DLRs go on the middle line of the nomogram. For example, an externalizing T score of 81+ on the Child Behavior Checklist (CBCL; Achenbach & Rescorla, 2001) increases the odds of a bipolar disorder diagnosis 4.3-fold for adolescents when compared to all other presenting problems at an outpatient clinic (Youngstrom et al., 2004). Looking at the nomogram in Fig. 7.2, an individual might have started with a 2% chance of a bipolar diagnosis according to the base rate in the general population (Van Meter, Moreira, & Youngstrom, 2011); we then move across the nomogram by lining up the pretest probability at 2% with the likelihood ratio (middle column) at 4.3, ending with a revised posttest probability of about 8%. With each new source of data for a given client, we revise this probability. Each updated probability can be used as the new pretest probability when more data and subsequent DLRs are obtained, highlighting the potential value of multiple sources and reporters of information, such as base rates, risk factors, self-report measures, teacher or parent report measures, and diagnostic interviews. There also are apps for smartphones and web pages that will do the calculations, but the nomogram provides a visual reference that can be engaging to use in session, with the client and clinician exploring “what-if” scenarios and discussing whether more assessment would be helpful for answering the client’s questions.

Fig. 7.2 Probability nomogram for combining prior probability with risk factors and test results. See Jenkins et al. (2011) for a detailed example of using the nomogram with a clinical case.
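For readers who prefer to check a nomogram reading numerically, the same calculation can be sketched in Python. The sketch assumes the standard odds form of Bayes' theorem (convert the pretest probability to odds, multiply by the DLR, convert back); the function name `posttest_probability` is our own.

```python
def posttest_probability(pretest_probability, dlr):
    """Revise a probability with one diagnostic likelihood ratio (DLR).

    Mirrors what the nomogram does graphically:
    probability -> odds, odds * DLR, odds -> probability.
    """
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest_probability must be strictly between 0 and 1")
    if dlr <= 0:
        raise ValueError("dlr must be positive")
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * dlr
    return posttest_odds / (1.0 + posttest_odds)


# The bipolar example from the text: 2% base rate, DLR of 4.3.
print(f"{posttest_probability(0.02, 4.3):.1%}")  # prints 8.1%
```

A DLR of exactly 1 leaves the probability unchanged, which is a quick sanity check on any calculator or app used in place of the paper nomogram.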

The general model of an EBA battery includes broad measures, brief screeners specific to certain diagnoses, self-report measures, other-report measures (i.e., parents, teachers), and structured interviews (Youngstrom, 2013). The battery promotes the use of multiple sources of information spanning multiple syndromes (De Los Reyes, Thomas, Goodman, & Kundey, 2013). Clinicians are encouraged to adopt a similar approach within their own settings based on local referral patterns, demographic and cultural factors, along with individual client needs.

In order to calculate the most accurate probability of a diagnosis, the application of a step-by-step process within an EBA framework is essential (Youngstrom, 2013). These steps can be viewed in four stages: (1) pre-diagnosis, (2) diagnostic procedures, (3) treatment planning, and (4) ensuring that client preferences are taken into account at every stage. Following this process allows multiple sources of data to be considered within that framework. The overarching benefit of this framework is that it is inexpensive and simple, and requires little additional time over a standard intake protocol while greatly enhancing accuracy (Youngstrom, 2013). Table 7.1 lists the steps for EBA, elaborated upon in greater detail below, estimating the time each step requires and highlighting how a wealth of data can be collected in what many might consider to be a narrow window of time by clinical standards (i.e., between half a session and two sessions, depending on the complexity of the case). A composite case example for “Walter” will be used to illustrate this process at each stage.

Table 7.1 Outline of the four stages of EBA

Four Overarching Stages of EBA

Walter, a 14-year-old White male, was referred for school-based mental health services by his school counselor 2 months into his ninth-grade fall semester. His school counselor noticed that Walter’s grades had dropped significantly from As and Bs in middle school to Ds and Fs soon into high school. Walter appears to have lost all motivation for future plans because of the difficulty concentrating he has experienced in his classes. Walter disclosed to his school counselor that he gets distracted easily in class by what others are doing around him, and that he has begun to feel insecure about participating in class. The school counselor arranges a meeting between the school-based clinician and the parents. The school-based clinician becomes concerned given that Walter’s mother was diagnosed with attention-deficit hyperactivity disorder (ADHD) in middle school. Walter’s parents give their consent to a psychological evaluation administered through the school.

1. Pre-diagnosis. The first stage might be viewed as a preparation stage, before any actual diagnosis begins. What are the frequent clinical issues in Walter’s geographic and demographic region? Are there relevant social issues, such as general or specific substance abuse or multigenerational issues of psychopathology? Once these clinical issues and the variables surrounding them are identified, it is important to determine whether the EBA tools necessary for these common issues are available (Youngstrom, 2013). There are reviews that critically evaluate different psychological instruments and determine the evidence for their reliability and validity (Hunsley & Mash, 2008). Within this first stage it is important to note that few clinical settings will be able to maintain the resources needed to address all possible diagnoses. Prioritize the common issues, as roughly 80% of cases in most clinics will present with the same approximate 20% of clinical issues (Youngstrom, 2013). Thus, the goal ought to be maintaining the most updated tools for these more common diagnoses. Once the most common clinical issues are identified, then your clinical setting can be more prepared to address them. Measures for rare conditions can get added to the tool kit as time, resources, and need dictate.

Base rates. Next, find benchmark base rates for conditions at clinics similar to yours (Youngstrom et al., 2012). This rate gives Walter’s clinician a sense of the starting point in pursuing the proper diagnosis for Walter. If the clinician knew nothing else about him, then the base rate would be the most accurate guess as to how likely it is that he meets the criteria for the diagnosis in question (Meehl, 1954). The hypothesized diagnosis for Walter is ADHD, and so the school-based clinician will start the diagnostic process by obtaining a base rate for ADHD. In the general population, the lifetime prevalence rate for moderate-severe ADHD for adolescents is ~4% (Merikangas et al., 2010); therefore, this is an appropriate starting base rate to use in examination of Walter’s profile.

Determining base rates in rural settings within the scientific literature can be difficult given the underrepresentation of rural mental health research. Fortunately, prevalence rates generally tend to be similar across demographic settings (Kessler et al., 1994). Should a clinician be unable to find an accurate published base rate estimate for a specific diagnosis within his/her client population, Plan B would be to estimate the rate oneself. Dividing the number of cases with a specific diagnosis by the total number of cases reviewed provides the base rate estimate (Youngstrom et al., 2014). For example, should a clinician see 75 clients in a year, 3 of whom are diagnosed with bipolar disorder, then the base rate of bipolar disorder in this setting would be 4% (i.e., 3/75). This process takes some time to set up, but the long-term benefits of having these base rates are substantial, and updating them regularly would require simply revising the number of cases meeting a diagnosis and the number of cases reviewed in total. These rates will provide the starting point for further assessment and subsequent treatment guidance. It is important to find base rates not just for one’s geographic location, but also for one’s actual treatment setting.
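A clinic that keeps even a simple list of diagnoses per case can automate this "Plan B" tally. The sketch below is one possible approach, not a prescribed record format: the function name `local_base_rates` and the caseload data are hypothetical, chosen to reproduce the 3-in-75 example.

```python
from collections import Counter


def local_base_rates(diagnoses_per_case):
    """Estimate base rates from a reviewed caseload.

    diagnoses_per_case: one collection of diagnosis labels per case.
    Returns {diagnosis: cases with that diagnosis / total cases reviewed}.
    """
    total_cases = len(diagnoses_per_case)
    if total_cases == 0:
        raise ValueError("at least one reviewed case is required")
    # set() ensures a diagnosis is counted once per case, even if listed twice.
    counts = Counter(dx for case in diagnoses_per_case for dx in set(case))
    return {dx: n / total_cases for dx, n in counts.items()}


# Hypothetical year of 75 clients, 3 of whom were diagnosed with bipolar disorder.
caseload = [["bipolar disorder"]] * 3 + [["other"]] * 72
rates = local_base_rates(caseload)
print(f"{rates['bipolar disorder']:.0%}")  # prints 4%
```

Updating the estimate then amounts to appending new cases to the list and rerunning the tally.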

Finding base rate estimates in the literature can be done by searching various scientific databases (e.g., Medline, PsycINFO, PubMed, Google Scholar). Medline and PsycINFO are high-quality sources for psychological research, but subscriptions are expensive. PubMed and Google Scholar are free resources one can access from anywhere with an Internet connection. Before initiating a search, it would be wise to become familiar with the syntax used by each search engine (i.e., whether quotations are required, or using “and” or “or” within the search phrase). If one is attempting to find base rates for depression, then a search consisting of the key words “depression” and “prevalence” or “epidemiology” may yield results (Youngstrom et al., 2012). Further, clinicians can change the search terms to include populations of interest (e.g., “rural”) or settings of interest (e.g., “community”) to narrow the results (Table 7.2).

Table 7.2 Base rates for various diagnoses in different settings

Risk factors. Certain risk factors or clinical signs for various disorders encourage further diagnostic hypotheses after finding a base rate for the diagnosis in question (Morrison, 2006). Different risk factors can be more indicative of certain disorders, though it is important to consider contextual factors such as family history of mental disorders or abuse. Generally speaking, risk factors may provide greater insight into which disorder the clinician ought to consider assessing. For example, clients presenting for disruptive behaviors may exhibit symptoms shared by multiple diagnoses, such as ADHD and bipolar disorder (Biederman, Klein, Pine, & Klein, 1998). However, in the event that such clients present with early-onset depression or a decreased need for sleep, then these risk factors might warrant further testing for bipolar disorder, given how often they are warning signs for bipolar disorder and how weakly they are associated with ADHD (Youngstrom et al., 2012).

Clinicians can use scientific databases to determine relevant risk factors for different disorders. Some examples might include a family history, or characteristic symptom clusters. A clinician searching for risk factors associated with a diagnosis may again check the databases mentioned when discussing base rates by searching for “ADHD” (or any diagnosis of interest) and “risk factor.” If searching for a specific risk factor, such as heritability, then a search consisting of “depression,” “offspring,” and “heritability,” may narrow results specifically to heritable risk factors (Youngstrom et al., 2012). To identify a broader set of risk factors, remove the search terms relevant to heritability to expand the search. Web pages curated by the National Institute of Mental Health or the Centers for Disease Control often pull together relevant information, but the depth of coverage is uneven across disorders. Browsers let us bookmark our favorite pages and save our favorite searches, making it fast to update or tailor them (e.g., swapping “anxiety” for “ADHD”). It is an activity that fits easily in spaces created by “no shows” and cancellations, and we can share the best information with our colleagues.

Looking at Walter’s possible risk factors, we learned earlier that Walter’s biological mother was diagnosed with ADHD in middle school. Previous research suggests that the DLR associated with an immediate family member carrying an ADHD diagnosis is 4–5 (Faraone, Biederman, & Friedman, 2000; Faraone et al., 2000). As such, Walter’s risk, starting at 4.2% as a result of the base rate of ADHD in a general adolescent population, can now be calculated to be at 11% using the nomogram in Fig. 7.2.

This first stage is not so much a task to do for a single client, but rather a way of reconfiguring our practice to work more efficiently. It may require a large amount of time up front in order to obtain a list of common reasons for referral, base rates, necessary assessment tools, and risk factors. Having these readily accessible saves an inordinate amount of time long-term as clinicians will not have to continually research these variables for every new referral, but rather every few years as such information is updated.

2. Diagnostic procedures. This second stage can be seen as the initiation of more active diagnostic procedures. Broad-scale measurement tools (e.g., Behavior Assessment System for Children, 2nd Edition [BASC-2]; Reynolds & Kamphaus, 2004) are useful in capturing a broad array of symptomatology within an individual such as Walter. Although these measures may not be highly specific to detecting specific disorders, they are useful in refining the probability of a specific diagnosis, and they can be decisive at ruling out some otherwise common possibilities. Broadly speaking, examining the internalizing and externalizing subscales helps to narrow a tentative diagnosis. For example, a high CBCL externalizing scale score will increase the odds of a diagnosis of pediatric bipolar disorder 3- or 4-fold, while a low externalizing scale score decreases the odds 20-fold (Youngstrom et al., 2004).

The benefit of utilizing broad-scale information is not solely to refine tentative diagnoses, but also to incorporate new DLRs to update the probability of the diagnosis in question more objectively. One common way of calculating a DLR uses the sensitivity and specificity of a measure (Straus et al., 2011). The sensitivity refers to the percentage of cases who have the disorder in question that would be correctly classified by the measure, while specificity refers to the percentage of cases who do not have the disorder in question and would be correctly classified as not having the disorder by the measure (Youngstrom & Frazier, 2013). These sensitivity and specificity percentages can be used to calculate a DLR simply by taking the sensitivity and dividing it by the “false alarm rate” (i.e., 1 − specificity). This results in a value that has a multiplicative effect on the odds of a diagnosis. DLRs below 1 suggest a reduction in the probability of a diagnosis, while a value above 1 will increase the probability of a diagnosis.
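Written out, the definition above and its multiplicative effect on the odds take the following form. The symbols p_pre and p_post (pretest and posttest probability) are our own notation for this sketch, and the second relation is the standard odds form of Bayes' theorem rather than a formula quoted from the sources cited.

```latex
\[
\mathrm{DLR} = \frac{\text{sensitivity}}{1 - \text{specificity}},
\qquad
\frac{p_{\text{post}}}{1 - p_{\text{post}}}
  = \mathrm{DLR} \times \frac{p_{\text{pre}}}{1 - p_{\text{pre}}}
\]
```

Because the DLR multiplies odds rather than probabilities, a DLR of 2 does not double the probability except when the starting probability is very small.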

If Walter meets the criteria for ADHD, he is expected to receive high ratings on diagnostic scales devoted to attention problems and/or hyperactivity. A subscale on the BASC-2 parent report assesses for attention problems. Research has shown that a T score of 59.5 or higher on the attention problems subscale of the BASC-2 parent report differentiates children with ADHD from those without any disorders with a sensitivity of 93.3% and a specificity of 93.5% (Ostrander, Weinfurt, Yarnold, & August, 1998). As such, the DLR would be calculated as .933/[1 − .935] = 14.35. Thus, children whose parents’ ratings yield a T score of 59.5 or higher on the attention problems subscale of the BASC-2 parent report see the odds of an ADHD diagnosis increase 14.35-fold when the comparison group is children without any evidence of pathology. Building on the previous stage, this ratio is applied to the existing probability stemming from all previous steps to calculate a newly revised probability that takes all of these data into account. Assuming that the parent form of the BASC-2 filled out by Walter’s parents yielded a T score of 60, this DLR of 14.35 can now be applied to his previous diagnostic probability of 11%. Using Fig. 7.2, Walter’s revised diagnostic probability for ADHD is now between 60 and 65%.
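The hand calculation of the BASC-2 DLR can be reproduced with a short Python sketch. The function name `dlr_from_sensitivity_specificity` is our own, and the guard for a specificity of exactly 1 is an added assumption about how to handle an undefined ratio.

```python
def dlr_from_sensitivity_specificity(sensitivity, specificity):
    """DLR for a positive result: sensitivity / (1 - specificity).

    Both arguments are proportions between 0 and 1 (e.g., 0.933 for 93.3%).
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must be between 0 and 1")
    false_alarm_rate = 1.0 - specificity
    if false_alarm_rate == 0.0:
        # A perfectly specific test has no false alarms, so the ratio is undefined.
        raise ValueError("specificity of 1.0 gives an undefined DLR")
    return sensitivity / false_alarm_rate


# BASC-2 parent-report attention problems, T >= 59.5 (values from the text).
basc2_dlr = dlr_from_sensitivity_specificity(0.933, 0.935)
print(f"{basc2_dlr:.2f}")  # prints 14.35
```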

Studies that report DLRs are uncommon, so it is typically a productive strategy to seek studies that report sensitivity and specificity values in order to calculate the DLR by hand, given the ease of calculation. Searching for such studies might include a search containing key words such as “ADHD” (or any disorder of interest) and “sensitivity and specificity” (Youngstrom et al., 2012). It is important to determine the assessment score relevant to the sensitivity and specificity percentages, as sensitivity and specificity, and subsequently DLRs, shift based on scores (Tables 7.3 and 7.4). Further, it is imperative to determine whether the measure used in the study differentiates individuals with the specified diagnosis from those with no traces of pathology, or from those with some other pathology. Finally, when searching for studies reporting sensitivity and specificity values, it is important that the sample for the study contains at least ten individuals with the diagnosis and ten individuals without (Kraemer, 1992).

Table 7.3 Diagnostic likelihood ratios for various diagnoses from broad assessments
Table 7.4 Diagnostic likelihood ratios for various diagnoses from brief screening assessments

Brief screens. After using broad measurement tools to narrow the tentative diagnoses, the clinician ought to gather brief screening instruments that focus on the leading clinical hypotheses for Walter (Youngstrom et al., 2012). Looking at symptoms that might be most associated with certain disorders helps refine diagnostic probability estimates (e.g., grandiosity indicative of bipolar symptomatology). While these symptoms may not always create the greatest level of dysfunction, the goal is to find symptoms that will weaken the case for certain diagnoses while possibly strengthening others. For example, if Walter does not experience patterns of decreased need for sleep paired with excitability, the probability of the issue at hand being pediatric bipolar disorder, a diagnosis that often competes with ADHD, decreases (Youngstrom et al., 2012). The results from these brief screening devices tend to outperform the broad scales, as these briefer measures will be more focused on the symptom patterns of interest. For example, Table 7.3 reports the DLRs associated with mood disorders for the CBCL, a broad scale, while Table 7.4 reports the DLRs associated with the Kutcher Adolescent Depression Scale (KADS; Brooks, 2004), which is narrowly focused on depression. As a rule of thumb, comparing these DLRs suggests that the KADS may be more useful than the CBCL in detecting depression given the higher DLR (see Youngstrom, 2014 for more detail about methods for comparing tools).

In choosing a short screener to administer to Walter, the clinician selects the Strengths and Difficulties Questionnaire (SDQ; Goodman, 1997). After calculating the results, the clinician finds that Walter scored a 7 on the hyperactivity subscale, which yields a DLR of 7.3 for ADHD (He, Burstein, Schmitz, & Merikangas, 2013). Using Fig. 7.2, we can revise Walter’s probability of having ADHD from the previous 60–65% to just over 90%.
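The successive revisions in Walter's case can be traced with a small Python sketch that carries the odds forward from one finding to the next. It starts from the 11% probability reported after the family-history step and applies the two DLRs given in the text; the function and variable names are our own. Multiplying DLRs in sequence assumes that the findings are not redundant with one another, which is a simplification.

```python
def update_sequentially(starting_probability, findings):
    """Apply a series of DLRs, returning the revised probability after each.

    findings: list of (label, dlr) pairs, applied in order.
    Assumes the findings contribute non-redundant information.
    """
    if not 0.0 < starting_probability < 1.0:
        raise ValueError("starting_probability must be strictly between 0 and 1")
    odds = starting_probability / (1.0 - starting_probability)
    history = []
    for label, dlr in findings:
        if dlr <= 0:
            raise ValueError("each DLR must be positive")
        odds *= dlr                                   # update the running odds
        history.append((label, odds / (1.0 + odds)))  # convert back to probability
    return history


walter = update_sequentially(
    0.11,  # probability after base rate and family history, as given in the text
    [
        ("BASC-2 parent attention problems", 14.35),
        ("SDQ hyperactivity", 7.3),
    ],
)
for label, probability in walter:
    print(f"{label}: {probability:.0%}")
# prints:
# BASC-2 parent attention problems: 64%
# SDQ hyperactivity: 93%
```

Both values agree with the nomogram readings in the text (60–65%, then just over 90%).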

Searches for brief screeners specific to certain diagnoses can use a similar strategy to the one above. Searching for “sensitivity and specificity” specific to diagnoses of interest is the best way to do this, again with the search engines listed in previous steps (Youngstrom et al., 2012). It may be possible to include specific tools of interest within the search terms, though again this will narrow the possible pool of results. Calculating a DLR here would be no different than how it is done for broad-scale measures.

Multiple perspectives. Obtaining multiple perspectives can help to clarify the likelihood of a diagnosis (De Los Reyes et al., 2013). Parent-report measures about Walter’s symptoms may add important information regarding the frequency and severity of the dysfunction related to reported symptoms (Carlson & Youngstrom, 2003). Teacher-report measures can inform about the pervasiveness of Walter’s symptoms into school contexts (Achenbach & Rescorla, 2001; McDermott, 1995). Specifically, the teacher can help to determine whether Walter’s symptoms are home specific or whether they occur in multiple contexts. However, a plan must be established in cases where there is disagreement among raters, as cross-informant agreement tends to be modest at best (Achenbach, McConaughy, & Howell, 1987). Savvy clinicians consider the contexts under which parent report might be more credible than self-report (Youngstrom et al., 2011). Oftentimes, making this decision may require deferring to what the literature has to say about more specific diagnoses. For example, self-report measures for internalizing disorders (i.e., depression, anxiety) are often seen as more accurate (Rothen et al., 2009), while parent or teacher reports for externalizing disorders may be more favorable, particularly for externalizing behaviors which may involve low insight, such as bipolar disorder (Pini, Dell’Osso, & Amador, 2001; Dell’Osso et al., 2002). Having less formal education or misusing substances may reduce the credibility of parent report (Youngstrom et al., 2011).

The data integration process requires caution, as it can sometimes either inflate or greatly undermine the true probability of a diagnosis. For instance, having Walter’s parent fill out multiple measures, all of which are assessing the same symptoms, could inflate the estimated probability of an actual diagnosis if all of these measures were used in revising the probability of a diagnosis (Youngstrom et al., 2012). This is easy to see when using multiple, redundant measures, as the clinician would essentially be counting the same symptoms multiple times from the same reporter, treating them as though they are separate when revising the probability.

Finding studies that look at parent- or teacher-report measures involves similar search terms to those seeking either broad-scale measurement tools or diagnosis-specific instruments (Steps 4 and 5). If a clinician knows of specific teacher or parent forms of interest, then those terms can be included within the search to try and find those specific instruments. Further, including broad terms such as “parent” or “parent report” may be of use.

Similar to the first stage, finding studies that report sensitivity and specificity figures for different measures and having them readily available saves clinicians a great deal of work in the future despite requiring an up-front investment of time. A clinician can create a list of DLRs associated with different measures (e.g., Tables 7.3 and 7.4), and, upon scoring the various measures given to a client or other informants, determine whether any of the results will yield a DLR. Again this highlights that, once these numbers (i.e., base rates, risk factors, DLRs for various measures) are obtained, the EBA process moves fluidly and adds little to no additional time beyond a typical intake while greatly enhancing accuracy.

Intensive assessment. At this point, the clinician ought to be seeking to finalize the updated tentative diagnosis with more rigorous assessment measures. As such, the use of a structured clinical interview may be seen as a more individualized, intensive diagnostic assessment tool. Such interviews will help the clinician to rule out diagnoses that do not fit Walter’s presenting symptoms, which are primarily externalizing in nature, while strengthening the case for diagnoses consistent with reported symptoms. Structured interviews with sound psychometric properties include the Mini-International Neuropsychiatric Interview for Children and Adolescents (MINI-KID; Sheehan et al., 1998) and Kiddie Schedule for Affective Disorders and Schizophrenia (K-SADS; Kaufman et al., 1997). These interviews are separated into different diagnostic sections, allowing the clinician to only assess specific diagnostic syndromes (e.g., anxiety disorders, mood disorders), though it would be considered more thorough and complete to assess for all diagnoses given the risk of comorbidity, as is common for child and adolescent psychopathology (Costello, Mustillo, Erkanli, Keeler, & Angold, 2003).

Often clinicians shy from using structured interviews for fear that clients dislike them, rapport will be damaged, reimbursement will be impossible, or professional autonomy will be constrained. Interestingly, all of these concerns lack empirical validity, as client surveys suggest that they appreciate the thoroughness of structured interviews without feeling a loss in rapport (Suppiger et al., 2009), Medicaid and other insurance companies will reimburse for diagnostic interviews (Youngstrom et al., 2014), and semi-structured interviews still allow for clinical judgment supported by thorough data (Axelson et al., 2003; Kaufman et al., 1997).

Walter is given a MINI-KID to confirm ADHD while ruling out other possible diagnoses. The results lend more support to the proposed diagnosis of ADHD while ruling out other possible competing diagnoses (e.g., bipolar disorder). At this point Walter has roughly a 90% probability of having ADHD, in addition to convergent support from the MINI-KID. Based on the test-treatment threshold chosen by Walter’s parent, this probability may be sufficient to begin treatment with the confidence that Walter is in fact struggling with ADHD.

3. Treatment planning. After establishing a diagnostic formulation, EBA shifts to measuring Walter’s functioning as well as factors that could moderate treatment selection or response. Such information includes medical history, past or current use of medications, comorbidities, and academic and social functioning. Any potential medical rule-outs need to be clarified at this point assuming that they have not been addressed previously (Youngstrom et al., 2012). Relational and systemic factors such as family functioning and presence of high-expressed emotion will often alter treatment approach (Cicchetti & Toth, 1998; McClellan, Kowatch, & Findling, 2007). Measures tailored to address functionality and quality of life can help the clinician to tailor treatment not solely towards symptom reduction, but also towards greater functioning amidst the presence of symptomatology. One such measure is the KINDL-R (Bullinger et al., 2008).

The rural setting itself may be a moderator of validity for both assessment and treatment. Rural areas have been known to exhibit heightened stigma against mental health services and diagnoses of psychopathology in general (Owens et al., 2013). Placing a diagnostic label on a person seeking services may not be necessary for treatment gains. Thoughtful clinicians use personal judgment informed by research on mental health stigma in regard to this issue, and seek supervision if necessary when deciding whether and how to share diagnostic information with the client or other involved parties.

Pharmacological treatment in addition to psychosocial intervention will often be a part of the treatment plan, especially with more chronic and debilitating syndromes. If this is the case, be prepared as a clinician to act as a “stop-gap” entity, providing treatment while the client is placed on a waiting list for psychiatric care. The availability of psychiatrists within rural settings, particularly those specialized for children and adolescents, is astoundingly low (Gamm, Stone, & Pittman, 2008). Complicating matters, prescription abuse and misuse can be significantly greater in rural settings than in more urban areas (Anderson, Neuwirth, Lenardson, & Hartley, 2013). Clients may feel as though medication is the only option for symptom relief; therefore, extra attention spent in early intervention while planning for the most impairing symptoms may set up hope and motivation for the treatment process.

Assuming that connections have not already been made, clinicians in a rural setting ought to use this step to create connections within the local community. Given the lack of mental health resources in rural settings, knowledge of existing or surrounding resources (e.g., safe houses, food banks), nonprofit organizations (e.g., boys and girls clubs), religious support, and other key leaders within the community is imperative for comprehensive care (Owens et al., 2013). Further steps within the EBA framework assume that treatment is under way.

Process monitoring. Once treatment has been initiated, the clinician is encouraged to shift from treatment implementation to treatment monitoring, tracking progress being made towards the client’s goals. Ongoing assessment of symptom severity can help the clinician objectively examine the progress being made towards the decided outcome. Measuring changes in severity and defining treatment targets are associated with better therapeutic outcomes (Finn & Tonsager, 1997; Lambert et al., 2006; Poston & Hanson, 2010). Options include brief screeners or checklists specific to different diagnoses that are sensitive to treatment effects (West, Celio, Henry, & Pavuluri, 2011; Youngstrom et al., 2013), and/or brief screeners with broad coverage that gauge functionality across multiple symptom syndromes, such as the YOQ-30. Progress monitoring helps define the starting point for therapy, along with both short- and long-term goals.

The outcome battery used to measure treatment gains can be brief because it concentrates on the key diagnoses and dimensions of the case formulation and can omit the conditions that were initially in contention but ruled out. The battery should assess current symptom severity for the principal diagnosis. Combining information from multiple sources, which might include parent or teacher scales, will help to gain greater insight into the generalizability of improvement across multiple contexts (Youngstrom et al., 2013). It is rarely necessary to readminister a semi-structured interview to confirm the loss of a diagnosis outside of a research protocol. On the other hand, if a client fails to make expected gains, or if they show new problems, then the assessment strategies from earlier steps could be helpful in revising the diagnoses and formulation.

Progress and outcome. At the end of treatment, measures should be readministered to determine the level of progress over the course of therapy. The outcome battery usually will be a shorter version of the initial, pretreatment battery, and this step can look similar to process monitoring. It should assess current symptom severity for the principal diagnosis, along with related syndromes. As before, combining information from multiple sources will help to gain greater insight into the generalizability of improvement across multiple contexts (Youngstrom et al., 2013).

Using a Reliable Change Index (RCI; Jacobson & Truax, 1991) is especially helpful in tracking progress between pre- and posttreatment assessment scores. The RCI is a statistical method to determine whether a client has experienced meaningful change over the course of treatment. For a client to be classified as experiencing meaningful change, two criteria must be met: (1) the client must begin treatment with a level of symptomatology within a clinical range and then end treatment in a subclinical range, and (2) the amount of change must be large enough that it cannot be attributed to random fluctuations in symptoms (Jacobson & Truax, 1991). For example, a score of 29 on the YOQ-30 has been established as the clinical cutoff: scores of 29 or higher are deemed clinical, and those below 29 are subclinical (Burlingame et al., 2004). Further, the amount of change on the YOQ-30 that must occur between pre- and posttreatment scores to qualify as reliable change is 10 points (Burlingame et al., 2004). To illustrate RCI in the case of the YOQ-30, a client who starts treatment with a score of 40 and ends treatment with a score of 20 would have shown meaningful change, because treatment started with a clinical score and ended with a subclinical score, and the 20-point change between pre- and posttreatment scores is greater than 10. In this case, the client would be considered “recovered” (Jacobson & Truax, 1991). If this same client were to end with a score of 30 (i.e., a clinical score), the 10-point change would still meet the reliable-change threshold, so we could confidently say that it is not due solely to random fluctuation. In this case, the client would be “improved” and not yet “recovered” as the posttreatment score remains clinical (Jacobson & Truax, 1991). RCI can be calculated for most measures, often based on the data within the manual (see Jacobson & Truax, 1991 for a more in-depth explanation on how to calculate RCI).
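The two criteria above can be sketched in a few lines of Python. This is a minimal illustration, not a scoring tool: the function names are hypothetical, the cutoff (29) and reliable-change value (10) are the YOQ-30 figures quoted in the text, and a change of exactly 10 points is assumed to count as reliable, in line with the 40-to-30 example. The "deteriorated" label is an added convenience that the text does not discuss. The second function follows the Jacobson and Truax (1991) formula for the standard error of the difference; any standard deviation and reliability passed to it would need to come from the manual of the measure in use.

```python
import math

# YOQ-30 values as reported in the text (Burlingame et al., 2004).
YOQ30_CLINICAL_CUTOFF = 29   # scores of 29 or higher are clinical
YOQ30_RELIABLE_CHANGE = 10   # minimum pre-post drop treated as reliable


def classify_change(pre, post, cutoff=YOQ30_CLINICAL_CUTOFF,
                    reliable_change=YOQ30_RELIABLE_CHANGE):
    """Label a pre/post score pair using the two criteria described above.

    Assumes lower scores mean fewer symptoms, as on the YOQ-30.
    """
    drop = pre - post
    reliable = drop >= reliable_change                 # criterion 2
    crossed_cutoff = pre >= cutoff and post < cutoff   # criterion 1
    if reliable and crossed_cutoff:
        return "recovered"
    if reliable:
        return "improved"
    if -drop >= reliable_change:
        # Added label (assumption): a reliable increase in symptoms.
        return "deteriorated"
    return "no reliable change"


def reliable_change_threshold(sd, reliability, z=1.96):
    """Smallest raw-score difference treated as reliable.

    S_diff = sqrt(2 * SE^2), where SE = sd * sqrt(1 - reliability)
    (Jacobson & Truax, 1991). The inputs are whatever the test manual
    reports; none are supplied here.
    """
    se = sd * math.sqrt(1.0 - reliability)
    s_diff = math.sqrt(2.0 * se ** 2)
    return z * s_diff


# The two worked examples from the text:
print(classify_change(40, 20))  # recovered
print(classify_change(40, 30))  # improved
```

Keeping the cutoff and reliable-change value as parameters lets the same check be reused for other measures once their manuals supply the corresponding figures.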

Maintenance. Relapse prevention is a common goal for many disorders. As such, the final step of the EBA system focuses on preventing relapse via long-term monitoring of treatment gains and environmental triggers. This might entail specific tasks the individual can complete on a semi-regular basis, or it might incorporate occasional follow-up sessions, or a combination of both (Youngstrom et al., 2012). For example, generalizing exposure tasks over long periods of time and across multiple contexts helps to offset spontaneous recovery or renewal effects, even if these tasks are initiated after the client reaches subclinical symptomatology (Craske et al., 2008). As such, even after a client can be considered in remission for an anxiety disorder, it would be wise to schedule monthly or bimonthly sessions to continue with exposure tasks over a longer period of time, generalized to new contexts.

4. Client preferences. Each client will seek treatment with preconceived attitudes that will influence engagement in the assessment and therapeutic process. These attitudes and preferences can be used to adjust the wait-test and test-treatment thresholds. For example, parents who want to be certain of a bipolar disorder diagnosis before initiating a pharmacological regimen for their child may suggest a higher test-treatment threshold, requiring that the estimate of probability for such a diagnosis be 80% or higher. Conversely, a client suspected of having depression who is eager to initiate treatment may suggest a lower test-treatment threshold, requiring that the estimate of probability for such a diagnosis be 60%. In both cases, the client’s attitudes help to formulate a cost-benefit analysis which the clinician can then apply to all data attained from the EBA process. After the most accurate DLR is calculated for a given client, both clinician and client can determine together whether the estimated probability of a diagnosis is sufficiently high to initiate treatment, along with the type of treatment that would be pursued, or whether the client would prefer to postpone any psychological treatment.
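For readers who want to see how a DLR turns into the estimated probability that is compared against these thresholds, the sketch below applies the standard odds form of Bayes' theorem: convert the prior probability to odds, multiply by the DLR, and convert back. The function name and the example numbers are hypothetical and are not drawn from any published norms or from Walter's case.

```python
def posterior_probability(prior, dlr):
    """Update a prior probability with a diagnostic likelihood ratio (DLR).

    Uses the odds form of Bayes' theorem:
        posterior odds = prior odds * DLR
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if dlr <= 0.0:
        raise ValueError("dlr must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * dlr
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical illustration: a 50% prior combined with a DLR of 4
# gives odds of 4 to 1, that is, a probability of 0.8.
print(posterior_probability(0.50, 4.0))
```

A DLR of 1 leaves the probability unchanged, values above 1 raise it, and values below 1 lower it.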

Early on in the process, Walter’s parents decide to set the wait-test threshold at 20% and the test-treatment threshold at 80%. As such, assessment can continue until the school-based clinician reaches a probability for a diagnosis that is at or outside of this range: 20% is low enough that Walter’s parents are comfortable with him no longer pursuing assessment, and 80% is high enough that they can feel comfortable that he probably has ADHD and treatment can begin.
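The decision rule implied by these two thresholds can be written out explicitly. The following is a minimal sketch with a hypothetical function name. The defaults are the 20% and 80% values chosen by Walter's parents, and probabilities exactly at a threshold are treated as having reached it, following the "at or outside of this range" wording.

```python
def next_step(probability, wait_test=0.20, test_treatment=0.80):
    """Map an estimated diagnostic probability onto the next clinical step."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if not 0.0 <= wait_test < test_treatment <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= wait_test < test_treatment <= 1")
    if probability >= test_treatment:
        return "begin treatment"
    if probability <= wait_test:
        return "stop assessing"
    return "continue assessing"


# Walter's estimated probability of roughly 90% exceeds the 80% threshold.
print(next_step(0.90))  # begin treatment
print(next_step(0.50))  # continue assessing
```

Because the thresholds are parameters, a family preferring a lower test-treatment threshold (for example, 60%) can be accommodated by passing different values.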

Conclusion

Evidence-based assessment provides a framework that combines the best available evidence with the individual needs of the client. Once this framework is installed in a clinical practice, it adds little or no time or expense to working with clients, and over the long run it can actually lead to substantial savings by avoiding unnecessary testing and preventing over-diagnosis, misdiagnosis, and missed diagnoses, which in turn leads to better treatment matching and ultimately better outcomes. Using an EBA approach is in some ways a natural fit with work in rural mental health, because the challenges of working in rural settings require a pragmatic attitude that repeatedly asks not just “What works?” but also “What could work here, for this person, with the available resources?” The principles and habits of EBA are practical and client centered in a way that is well suited to the rapidly changing realities of work in rural settings.