As advocated by the Institute of Medicine (IOM), the field of prevention generally uses a three-tier model: universal programs that address the needs of all individuals, selective programs that target individuals at elevated risk for problem behaviors that have not yet begun to manifest, and indicated programs for those exhibiting the early stages of problem behaviors. Each tier addresses individual needs while working to reduce problems at the population level (Gordon 1987; Mrazek and Haggerty 1994). Implementing and delivering services at all three levels has been advocated as an effective method for increasing community resilience and preventing problem behaviors (Felix et al. 2007; Winslow et al. 2005), reducing youth violence (Sprague et al. 2003; Walker et al. 1996), and reducing youth substance use (Dishion et al. 2002). However, implementing a multi-tiered prevention strategy requires a significant commitment of time and resources. This commitment can overwhelm providers or displace other valuable programs, and it requires an ongoing needs-assessment infrastructure that can further strain local resources (Furlong et al. in press). For these reasons, local agencies and projects may wish to choose the one intervention approach that produces the greatest impact on the outcome the intervention is designed to prevent. One way of comparing the effectiveness of programs across the IOM classification domains is to convert findings into standardized difference scores, which can then be directly compared (e.g., Derzon 2007; Lipsey and Wilson 1993; Tobler et al. 2000).

Comparisons between substance abuse prevention program (SAPP) types have generally found selective and indicated programs to be more effective (e.g., Derzon et al. 2005). Although accurate with respect to the effectiveness of these programs for program participants, the assumption that these effectiveness estimates can be used to infer the relative effects of the programs on the population is not well founded. Although we might expect some modest diffusion of such effects, the conservative assumption is that programs affect only those exposed to the intervention. Moreover, to the extent that selective and indicated programs target youth who are more likely to engage in the outcome, the effectiveness estimates obtained from these samples will be larger, and more likely to reach statistical significance, than estimates obtained from universal samples, which include a large number of non-users (Cuijpers 2003).

SAPP effectiveness estimates are confounded not only by sample type but also by program characteristics. Selective and indicated programs are often more intensive and require greater resources to implement. At a minimum, selective and indicated program recipients must be identified, which can involve a non-trivial expense and can also expose recipients to potential stigma. Prevention resource managers must choose the program or strategy that delivers the greatest ‘bang for the buck.’ If the ‘bang’ of an efficient SAPP is reducing substance use and its negative consequences in a given population, then estimating the potential impact of the program on reducing negative population-level outcomes constitutes a central component of efficient resource management. To date, there has been no mechanism for estimating program effects on population-level substance use in a way that makes these estimates comparable across universal, selective, and indicated prevention programs.

In this article, we introduce an approach to standardizing estimates derived from empirical tests of SAPP so that the effect of each program is comparable at the level of the population it targets. This adjustment provides prevention resource managers with a stronger and sounder basis on which to select among universal, selective, and indicated programs designed to reduce population substance use in their communities.

The IOM Distinction in Substance Abuse Prevention

The ultimate goal of SAPP is to prevent, or mitigate, the negative consequences that occur as a result of substance use in the population; however, there is little comparative evidence indicating whether it is preferable to target general or high-risk populations to achieve this goal. The IOM classification schema, as applied to SAPP (Substance Abuse and Mental Health Services Administration [SAMHSA] 2003), provides a useful framework for making such comparisons. Two key considerations when comparing the efficacy of different types of programs are whether the programs (a) have a substantial effect on changing or preventing the behavior of those targeted, and (b) produce a sufficient change in behavior given the costs expended.

One criticism of universal prevention programs is that they typically have a small impact on preventing or decreasing substance use and its consequences because they target many who will never initiate substance use (cf. Pentz 1994). Indeed, the average effect size for universal programs is relatively small (d = .20; Tobler et al. 2000). Universal programs have been found to delay the initiation of substance use among low-risk individuals, although there are exceptions (e.g., Kulis et al. 2005), and they may be ineffective for those who have already initiated use (Masterman and Kelly 2003). This evidence notwithstanding, universal prevention programs constitute the vast majority of prevention programming (SAMHSA 2007).

The small effects documented in evaluations of universal programs may suggest that such programs are not worth the costs and burden they impose, or these small effects may be largely attributable to the low base rate of substance use in universal samples, especially among younger adolescents. Cuijpers (2003) has convincingly demonstrated that infrequent outcomes yield considerably less statistical power, making it easier to demonstrate program effects for selective and indicated programs, whose samples are characterized by considerably higher base rates of substance use, than for universal programs.
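To make this base-rate argument concrete, the sketch below (an illustration we constructed, not an analysis reported in the studies cited) computes the approximate power of a two-group comparison to detect the same relative reduction in use at a low versus a high base rate; the sample size, base rates, and size of the reduction are assumed values chosen only for illustration.

```python
from scipy.stats import norm

def power_two_proportions(p_control, p_treat, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    p_bar = (p_control + p_treat) / 2
    se_null = (2 * p_bar * (1 - p_bar) / n_per_group) ** 0.5
    se_alt = (p_control * (1 - p_control) / n_per_group
              + p_treat * (1 - p_treat) / n_per_group) ** 0.5
    z_crit = norm.ppf(1 - alpha / 2)
    diff = abs(p_control - p_treat)
    return norm.cdf((diff - z_crit * se_null) / se_alt)

n = 400                    # hypothetical participants per condition
relative_reduction = 0.25  # the same 25% relative reduction in both scenarios

# Low base rate, as in a universal sample of young adolescents (assumed value)
print(power_two_proportions(0.10, 0.10 * (1 - relative_reduction), n))
# Higher base rate, as in a selective or indicated sample (assumed value)
print(power_two_proportions(0.40, 0.40 * (1 - relative_reduction), n))
```

With these assumed values, the identical relative reduction is far more likely to reach significance in the higher base-rate sample, which is the pattern Cuijpers describes.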

The choice among universal, selective, and indicated programs is fraught with consequences. Selective and indicated programs are often more costly, as they require specialized data collection, data management, analysis, and infrastructure to identify at-risk youth and youth exhibiting problem behavior, using procedures of uncertain sensitivity and specificity that may stigmatize the individuals involved. Selective and indicated programs also require a commitment to provide identified youth with often intensive services delivered by highly trained prevention specialists. Using the median of the average implementation cost ranges for model programs targeting alcohol and/or marijuana (SAMHSA 2007), programs with selective and/or indicated components ($23,269) cost nearly twice as much to implement as programs offering only universal components ($12,347). Even so, the costs of indicated prevention may well be justified when considering the total economic costs to society of substance abuse (e.g., treatment, incarceration, and morbidity; Caulkins et al. 1999). Reinforcing this justification is the observation that treatment, which shares some similarities with indicated prevention, is a more targeted and thus more cost-effective strategy for preventing substance use than universal school-based SAPP (Caulkins et al. 1999). Underscoring this point, Miller and his colleagues (Miller et al. 2007) suggest, based on cost-benefit ratios, that programs targeting students and families of students at risk (i.e., selective populations) may be more cost effective than universal prevention strategies. On the other hand, selective and indicated programs are not without limitations. For instance, group settings may allow peers to reinforce negative behavior, leading to iatrogenic program effects (Dishion et al. 1999).

Demonstrated risk for substance use or abuse is often a condition for receiving funding to implement selective or indicated programs (Baker et al. 2006), and this requires operationally defining “at risk” or “initiation of problem behavior.” A recent meta-analysis by Derzon (2007) suggests that many commonly identified risk and protective factors are only modestly associated with alcohol, tobacco, and marijuana use (r range: .05–.35), with many effect sizes toward the bottom of this range. Because of multicollinearity among factors, composite measures of risk and protective factors may fare little better than commonly available predictors. Poverty, or the percentage of youth in free or reduced-price lunch programs, is often used as the operational definition of risk, yet it may simply covary with more proximal predictors of substance use (e.g., social disorganization, opportunities for prosocial involvement). There is mounting evidence of the limited precision with which we identify youth for selective and indicated programming, and this limits our ability to use these strategies to prevent and reduce youth substance use.

Given the relative costs and benefits of universal, selective, and indicated programs, Pentz (1994) suggests an approach that incorporates all three levels of programming to prevent substance use. More specifically, she recommends that students in selective and indicated populations receive indicated counseling and that universal programming be provided as a complement to these more targeted services. This approach has yielded evidence of effectiveness (Conduct Problems Prevention Research Group [CPPRG] 2000), and students who receive both universal and indicated components have been shown to have more positive outcomes (Lochman and Wells 2002). Although this may be an ideal solution, it may also be cost prohibitive.

The study that most directly speaks to the differences among universal, selective, and indicated programs is a meta-analysis of 25 programs recognized as model programs by SAMHSA (Derzon et al. 2005; Hansen et al. 2009). This study calculated a representative effect size across the substance use and substance-use-related outcomes for each of these 25 programs. Unweighted mean effects of around d = .30 were found for four indicated and four selective and indicated programs, whereas 13 universal programs had an unweighted mean effect size of around d = .10. Although this study provides effect sizes for universal, selective, and indicated programs that are comparable with respect to the effect of the programs on the individuals exposed to them, it does not permit estimates of the potential impact of these programs in reducing population-based substance use and its related consequences.

In sum, current comparisons between universal, selective, and indicated programs are predicated on the assumption that program effects on participants are of primary interest. The current study differs by focusing on the impact of these programs from a public health perspective. That is, what would be the effect of taking these program types to scale and applying them in the general population? If adopted, how many people might abstain from use or reduce the extent of their use? The adjusted effect sizes proposed in this paper reflect this public health focus and provide estimates of program effects on the larger population.

Method

Studies Used

The studies examined for this meta-analysis were drawn from the data set compiled by Derzon and colleagues (Derzon et al. 2005; Hansen et al. 2009), the purpose of which was to examine the effects of various program characteristics on the magnitude of intervention effects for risk and protective factors and the use of various substances. This work summarized the evidence cited by the 102 programs identified as effective or model by the National Registry of Effective Prevention Programs (NREPP; SAMHSA 2003). Of the 102 programs, 48 provided implementation manuals. Of these, 25 had published studies or provided reports on the effectiveness of the program in preventing or reducing substance use that could contribute to the meta-analysis. Studies were excluded if (a) only statistical significance levels were reported, (b) results were too complex to summarize (e.g., multi-way statistical interactions), (c) results were moderated by third factors (e.g., outcomes were secondary data that were difficult to interpret), or (d) sample sizes were not reported in a way that allowed results to be properly weighted. Overall, 43 discrete studies had been conducted on the effectiveness of these 25 programs for reducing and preventing the use of alcohol, tobacco, and marijuana. These studies comprised the principal data for the current investigation.

The 201 effect sizes from these 43 studies were collapsed to represent program effects on the use of alcohol, tobacco, marijuana, and other drugs for each of the 25 programs examined. We began by aggregating multiple effect sizes for each substance within each study sample. These estimates were then averaged, by substance, to calculate a summary effectiveness estimate for each independent sample, and the sample estimates were in turn combined to produce a summary effectiveness estimate for each of the 25 programs. Only after each program's effectiveness had been estimated were estimates averaged across programs. All averaging was done using inverse variance weighting, which has the benefit of giving studies with larger sample sizes more influence (Hedges and Olkin 1985). Ultimately, effects for other drugs were not examined, because only seven programs presented evidence that they examined the use of drugs other than tobacco, alcohol, and marijuana.
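As a point of reference, the following is a minimal sketch of the inverse-variance weighting used at each stage of this aggregation; the effect sizes and sampling variances shown are hypothetical, and the helper assumes each estimate's variance has already been computed.

```python
import numpy as np

def inverse_variance_mean(effects, variances):
    """Weighted mean effect size, with each estimate weighted by 1/variance so
    that larger (more precise) studies receive more influence."""
    effects, variances = np.asarray(effects), np.asarray(variances)
    weights = 1.0 / variances
    mean = np.sum(weights * effects) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))  # standard error of the weighted mean
    return mean, se

# Hypothetical alcohol effect sizes and variances from one study sample
d_alcohol = [0.12, 0.25, 0.08]
v_alcohol = [0.010, 0.015, 0.012]
print(inverse_variance_mean(d_alcohol, v_alcohol))
```

The same helper can be reapplied at each level of aggregation (substance within sample, sample within program, and program within IOM type).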

In total, 21 studies yielded 56 effect sizes for 11 programs addressing tobacco use; 29 studies provided 110 effect sizes for 18 programs addressing alcohol use; and 15 studies provided 27 effect sizes for 11 programs addressing marijuana use (see Table 1). Overall effects were based on a large number of participants for tobacco (N = 25,332), alcohol (N = 31,740), and marijuana (N = 10,219). Only seven studies reported effects for other illegal drugs, and they reported them in disparate ways (i.e., different combinations of substances), so mean effect sizes could not be reliably estimated. Studies limited to effects on other illegal drugs were therefore excluded from effect size calculations.

Table 1 IOM type, number of effect sizes reported, number of studies included, and weighted ES by program

Some effects reported in these studies seemed unreasonably large. For example, the Guiding Good Choices program reported effect sizes exceeding d = 6.80, a difference of nearly seven standard deviations in use between treatment and control groups (Spoth et al. 2001, 2002). To reduce the potential for such extreme values to bias our results, three outlier effect sizes were Winsorized to increase the reliability of our overall estimates of effects. Maximum and minimum cutoffs were created by calculating the effect size that would be observed if, at post test, there were 0% use in one group while the base rate of lifetime use (National Survey on Drug Use and Health [NSDUH] 1999) in the other group remained unchanged; that is, the cutoff represents complete abstinence in the intervention group with no change in the comparison group. The standard deviation of the base rate served as the denominator of this calculation, which is essentially Glass's delta. Under these assumptions, one tobacco effect size was Winsorized to 1.60, one alcohol effect size to 2.08, and one marijuana effect size to 1.38. We performed the analyses reported below both including and excluding the one program providing these Winsorized estimates, and the pattern and magnitude of results were nearly identical (±.01). Lacking any compelling methodological or statistical rationale to remove this program, we opted to include it in all reported analyses.
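The sketch below illustrates the Winsorizing rule as described above, under the simplifying assumption that the base rate's standard deviation is the binomial value \( \sqrt{p(1-p)} \); the base rate and effect sizes shown are placeholders rather than the NSDUH values used in the analysis.

```python
import numpy as np

def winsor_cutoff(base_rate):
    """Largest plausible effect: 0% use in one group versus an unchanged lifetime
    base rate in the other, scaled by the base rate's standard deviation
    (a Glass's-delta-style estimate, here assuming a binomial SD)."""
    sd = np.sqrt(base_rate * (1 - base_rate))
    return (base_rate - 0.0) / sd

def winsorize(effect_sizes, base_rate):
    """Clip observed effect sizes to the [-cutoff, +cutoff] range."""
    cut = winsor_cutoff(base_rate)
    return np.clip(effect_sizes, -cut, cut)

# Placeholder lifetime base rate and observed effects (illustrative only)
print(winsorize(np.array([0.15, 0.42, 6.80]), base_rate=0.70))
```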

Additional Data Sources

The NREPP web site contains data on a number of characteristics of programs identified as effective (SAMHSA 2007). These include the focus of the program (alcohol, tobacco, illegal drugs, inhalants, steroids, binge drinking, parents, social skills, violence, and/or health), the age group(s) targeted by the program (0–11, 12–17, 18–24, 25 and older), average program duration in years, IOM classification, and the program's cost range. The middle of the cost range was used in the analyses reported here. Because 6 of the 25 programs examined did not have all the data we needed on the NREPP web site, we conducted a short web survey of the program developers to obtain the missing information; five of the six developers responded. More accurate cost estimates (per student/family costs) were available, but they could be accurately calculated for only 14 (predominantly universal) of the 25 programs (Miller et al. 2007).

Defining Programs as Universal, Selective, or Indicated

As can be seen in Table 1, the NREPP classification of programs by IOM category suggests that many are designed to be implemented with multiple populations, as characterized by level of risk. For instance, the authors of All Stars report that their program can be effective for both universal and selective samples (SAMHSA 2007). For reasons that will become clear in the section on adjusting program effects to the population level, we found it necessary to define each program as universal, selective, or indicated. Toward this end, studies were classified into a discrete IOM category according to the sample on which the program was tested. More specifically, a research associate coded each study for whether it reported inclusion criteria (criminal offenders; foster youth, homeless youth, or youth from troubled homes; school drop-outs; students experiencing academic failure; communities experiencing social disorganization and/or poverty; schools characterized by low performance or social disorganization; scoring high on one or more risk factors using a validated measure; having initiated substance use; or abusing substances). Based on this coded information, each study was then coded as being implemented with a universal, selective, or indicated population. The first author also coded five randomly selected studies; agreement between coders was perfect. When several studies for a program were administered to populations in multiple IOM categories, the program was classified by the most frequent type of population with which it was implemented. Ties were classified into the more specific IOM type. For example, if half of the studies for a given program were conducted with selective samples and the other half with indicated samples, the program was defined as indicated. This decision rule made our approximations more conservative: because the proportion of the population that is indicated is smaller than the proportion that might receive selective prevention services, it decreases the magnitude of the population-adjusted effect sizes.

Defining the Proportion of the Population that is at Risk or Exhibiting Problem Behavior

To calculate approximated effects, we began by obtaining estimates of the percentage of study participants who were at risk (selective) or exhibiting problem behaviors (indicated). More specifically, an estimate of p (the proportion of the population targeted) in the formulas that appear below had to be obtained for universal, selective, and indicated interventions for each substance. This paper assumes that, if taken to scale, universal interventions target the entire population (i.e., 100% of the underlying sample of interest). Although this may not be literally true, as students can self-select out of an intervention, the magnitude of effect sizes was not substantially changed even when we assumed that universal programs are administered to as little as 80% of the population.

Data from the 1999 National Survey on Drug Use and Health (NSDUH 1999) were used to estimate the percentage of youth at risk for substance use. The 1999 NSDUH was chosen for two reasons: (a) it was the last survey year that included a comprehensive assessment of 24 risk and protective factors for youth 12–17 years of age, and (b) its survey year corresponded to the mean publication year of the studies examined here. The risk and protective factors measured in the 1999 NSDUH are modeled after the survey used for the Communities that Care model (Pollard et al. 1998). A simple count was taken of the number of risk factors (or absent protective factors) each individual in the sample reported, an approach that has been used elsewhere (Derzon 2007). Positive identification of possessing a risk factor or lacking a protective factor was defined as being in the upper or lower 33% of the distribution of risk or protective scores, respectively.

Three probit regressions were run using the number of positive identifications as a predictor of lifetime tobacco, alcohol, and marijuana use. All analyses indicated that the number of positive identifications was a statistically significant (p < .001) predictor of lifetime substance use. These models were then used to determine the point in the distribution of risk at which at least 50% of individuals reported using the substance in their lifetime. These risk levels corresponded to eight positive identifications for tobacco and alcohol and 11 for marijuana. The percentage of individuals with this level of risk was then computed from the data: 31% of youth were at risk for lifetime tobacco or alcohol use and 12% were at risk for lifetime marijuana use. These proportions were used to represent the proportion of individuals in the population targeted by selective programs.
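A sketch of this two-step procedure, using simulated data in place of the 1999 NSDUH records, is shown below; the simulated relationship between risk counts and lifetime use, and the 50% cut point applied to it, are assumptions made only to illustrate the mechanics.

```python
import numpy as np
from scipy.stats import norm
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated stand-in for NSDUH data: number of positive risk identifications
# (0-24) and lifetime use, with use becoming more likely as risk accumulates.
n = 5000
risk_count = rng.integers(0, 25, size=n)
lifetime_use = rng.binomial(1, norm.cdf(-1.5 + 0.15 * risk_count))

# Probit regression of lifetime use on the number of positive identifications
X = sm.add_constant(risk_count.astype(float))
probit_fit = sm.Probit(lifetime_use, X).fit(disp=0)

# Smallest risk count at which the predicted probability of lifetime use
# reaches 50%; this defines the "at risk" (selective) threshold
counts = np.arange(0, 25)
predicted = probit_fit.predict(sm.add_constant(counts.astype(float)))
threshold = counts[np.argmax(predicted >= 0.50)]

# Proportion of the (simulated) population at or above that threshold,
# i.e., the value of p used for selective programs
print(threshold, np.mean(risk_count >= threshold))
```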

We used a slightly higher threshold to define the percentage of those exhibiting problem behaviors. Because lifetime use can represent a single isolated occurrence (i.e., initial experimentation), an indicator of more regular or recent use (past-30-day use) was used. Again, we relied on the 1999 NSDUH to calculate estimates of 30-day use by age group (12–17, 18–25, and 26 and older), matched as closely as possible to the age group for which a given program was designed. Defined in this manner, the percentages of those with problem levels of tobacco use were 17%, 45%, and 30% for those aged 12–17, 18–25, and 26 and older, respectively; the corresponding percentages were 17%, 57%, and 49% for alcohol and 7%, 14%, and 3% for marijuana. These proportions were used to represent the proportion of individuals in the population targeted by indicated programs.

Making IOM Program Effects Comparable

Our approach to developing a common metric for comparing program effects across the three IOM classifications involved considering the effects of interventions on the population as a whole, as opposed to their effects only on those participating in the study. A common way of conceptualizing intervention-based research is along the dimensions of outcome (positive vs. negative) and treatment group (intervention vs. comparison); such designs assume a common 2 × 2 table. Conceptually, the approximation we propose adds a third treatment condition for those who would be excluded from the intervention because they did not meet selection criteria (e.g., non-high-risk students when a selective intervention is implemented). The approximation assumes that the intervention has no impact on these excluded individuals. The result is a 2 × 3 design, with two outcome categories (positive vs. negative) and three treatment categories (intervention vs. comparison vs. not considered).

When calculating prevention program effectiveness, effect sizes are typically calculated using Hedges' g (here denoted \( d_{G} \)). This effect size is defined as the difference between the intervention and comparison group means \( \left( {\overline{X}_{k} } \right) \), divided by a pooled estimate of the standard deviation based on the sample standard deviations \( (S_{k}) \), group sizes \( (n_{k}) \), and the total sample size \( (N) \). Thus, \( d_{G} \) represents the difference between groups on a common metric of standard deviation units.

$$ d_{G} = \frac{{\overline{X}_{1} - \overline{X}_{2} }}{{\sqrt {\frac{{\left( {n_{1} - 1} \right)S_{1}^{2} + \left( {n_{2} - 1} \right)S_{2}^{2} }}{{N - 2}}} }} $$
(1)

An approximation that adjusts the sample size to incorporate those excluded from, as well as those included in, an intervention study requires an estimate of the proportion of the population targeted by the study (p). The adjusted sample sizes are calculated simply as:

$$ \hat{N} = \frac{N}{p}\;{\text{or}}\;\hat{n}_{k} = \frac{{n_{k} }}{p} $$
(2)

Similarly, the mean and standard deviations must be adjusted based on the approximated sample sizes, as can be seen in formulas (3) and (4), respectively.

$$ \widehat{\overline{X}}_{k} = \frac{{n_{k} \overline{X}_{k} }}{{\frac{{n_{k} }}{p}}} $$
(3)
$$ \hat{S} = \sqrt {\frac{{S_{k}^{2} \left( {n_{k} - 1} \right)}}{{\frac{{n_{k} }}{p} - 1}}} $$
(4)

For example, suppose a selective intervention includes 33% of the population and returns the following data: \( \overline{X}_{1} = 3.78 \), \( \overline{X}_{2} = 4.23 \), \( S_{1} = 1.23 \), \( S_{2} = 1.34 \), \( n_{1} = 1003 \), \( n_{2} = 860 \), and \( N = 1863 \). Using formula (1) above, this study yields an effect size of −.35 for the targeted portion of the population. Using formulas (2), (3), and (4) to adjust the estimate to include the portion of the population not targeted returns the values: \( \widehat{\overline{X}}_{1} = 2.49 \), \( \widehat{\overline{X}}_{2} = 2.79 \), \( \hat{S}_{1} = 1.00 \), \( \hat{S}_{2} = 1.09 \), \( \hat{n}_{1} = 1520 \), and \( \hat{n}_{2} = 1303 \). Inserting these values into formula (1), the population-adjusted effect size is \( \widehat{\overline{d}}_{G} = - .29 \). This effect size represents the population-adjusted impact of the intervention for both those targeted and not targeted by the intervention. Note that for universal interventions p = 1.00, so the population-adjusted effect size calculated with formulas (2), (3), and (4) is identical to the study-estimated effect size.
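A compact sketch of formulas (1) through (4) follows; the inputs are hypothetical values chosen for illustration, and the functions simply apply the formulas as written, so that setting p = 1.00 reproduces the unadjusted estimate obtained for universal programs.

```python
import math

def hedges_d(x1, x2, s1, s2, n1, n2):
    """Formula (1): standardized mean difference using the pooled SD."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (x1 - x2) / math.sqrt(pooled_var)

def population_adjusted_d(x1, x2, s1, s2, n1, n2, p):
    """Formulas (2)-(4): rescale the group sizes, means, and SDs to the full
    population, assuming no program effect on the proportion (1 - p) not
    targeted, then recompute formula (1) on the adjusted values."""
    n1_hat, n2_hat = n1 / p, n2 / p                      # formula (2)
    x1_hat, x2_hat = p * x1, p * x2                      # formula (3)
    s1_hat = math.sqrt(s1**2 * (n1 - 1) / (n1_hat - 1))  # formula (4)
    s2_hat = math.sqrt(s2**2 * (n2 - 1) / (n2_hat - 1))
    return hedges_d(x1_hat, x2_hat, s1_hat, s2_hat, n1_hat, n2_hat)

# Hypothetical selective-program data (illustrative values only)
print(hedges_d(1.10, 1.30, 0.80, 0.85, 500, 500))                     # study-level d
print(population_adjusted_d(1.10, 1.30, 0.80, 0.85, 500, 500, 0.33))  # adjusted d
print(population_adjusted_d(1.10, 1.30, 0.80, 0.85, 500, 500, 1.00))  # universal case
```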

Results

The average weighted effect sizes were first calculated for each substance using inverse variance weighting. The homogeneity of the effect sizes (the degree to which all effect sizes suggest the same magnitude of relationship) was also assessed for each of the three substances examined. Because there were few selective and indicated programs, and because many programs were designed to target both selective and indicated individuals, we collapsed across selective and indicated programs for the purposes of comparison. As can be seen in Table 2, programs were most successful in preventing and reducing tobacco use \( \left( {\bar{d}_{G} = .18} \right) \), but the distribution of program effect sizes was heterogeneous (Q(2) = 74.58, p < .05). Programs had the smallest combined impact on preventing and reducing alcohol use \( \left( {\overline{d}_{G} = .08} \right) \), and the distribution of impacts for alcohol was extremely heterogeneous (Q(2) = 106.74, p < .05). The impact of programs on preventing and reducing marijuana use was also heterogeneous (\( \overline{d}_{G} = .13 \); Q(2) = 39.75, p < .05). When we later examined homogeneity by IOM group, the pattern of results was identical, and nearly all effect-size distributions were heterogeneous; the only exceptions occurred when effect sizes were consistently near zero. Because there were few programs to examine, the major focus of these analyses is on the overall magnitude of effects (i.e., mean effect sizes), as opposed to statistical significance.
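For reference, a minimal sketch of the homogeneity (Q) statistic used here, computed from hypothetical program-level effect sizes and variances, is:

```python
import numpy as np
from scipy.stats import chi2

def q_homogeneity(effects, variances):
    """Cochran's Q: the weighted sum of squared deviations from the
    inverse-variance-weighted mean, referred to a chi-square with k - 1 df."""
    effects, variances = np.asarray(effects), np.asarray(variances)
    weights = 1.0 / variances
    mean = np.sum(weights * effects) / np.sum(weights)
    q = np.sum(weights * (effects - mean) ** 2)
    df = len(effects) - 1
    return q, df, chi2.sf(q, df)  # Q, degrees of freedom, p value

# Hypothetical effect sizes and variances for three groups of programs
print(q_homogeneity([0.05, 0.18, 0.40], [0.004, 0.006, 0.010]))
```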

Table 2 Weighted ES, approximated weighted ES, and sample sizes by IOM type

Table 2 presents observed and adjusted effect sizes disaggregated by IOM program type. As expected, the observed impact on alcohol and marijuana use of selective and indicated programs is greater than that of universal programs. The observed average impact of selective and indicated programs \( \left( {\overline{d}_{G} = .22} \right) \) is markedly larger than that of universal programs on alcohol use \( \left( {\overline{d}_{G} = .07} \right) \). Although the difference is smaller, on average, selective and indicated programs also show a larger impact on marijuana use \( \left( {\overline{d}_{G} = .18} \right) \) than do universal programs \( \left( {\overline{d}_{G} = .12} \right) \). However, the average effect of universal programs \( \left( {\overline{d}_{G} = .18} \right) \) was greater than the average impact of selective and indicated programs \( \left( {\overline{d}_{G} = .04} \right) \) in preventing and reducing tobacco use.

The overall impact of NREPP programs remains relatively unchanged when adjusted to represent effects on the population as a whole. This holds for all three substances examined: tobacco \( \left( {\overline{d}_{G} = .18\;\& \;\widehat{\overline{d}}_{G} = .18} \right) \), alcohol \( \left( {\overline{d}_{G} = .08\;\& \;\widehat{\overline{d}}_{G} = .07} \right) \), and marijuana \( \left( {\overline{d}_{G} = .13\;\& \;\widehat{\overline{d}}_{G} = .11} \right) \), and is likely because the majority of programs sampled here were universal interventions, whose effect sizes did not change as a result of the approximation. As expected, the adjustment had no effect on universal programs (see Table 2): because they target the entire population, their results were not changed by our adjustments for selection into selective or indicated programs.

What does change, and often dramatically, are the average effect sizes for selective and indicated programs. When adjusted for population impact, their average effect sizes were reduced by approximately half: the overall impact dropped from \( \overline{d}_{G} = .04 \) to \( \widehat{\overline{d}}_{G} = .02 \) for programs reporting tobacco use outcomes, from \( \overline{d}_{G} = .22 \) to \( \widehat{\overline{d}}_{G} = .13 \) for those targeting alcohol use, and from \( \overline{d}_{G} = .18 \) to \( \widehat{\overline{d}}_{G} = .06 \) for those targeting marijuana use. Although more effective than universal programs in reducing and preventing marijuana use among the individuals directly served, selective and indicated interventions were less effective than universal programs in reducing marijuana use at the population level. At the population level, universal programs remained more effective than selective and indicated programs in reducing tobacco use and less effective in reducing alcohol use.

The match between the population for which a program was developed and the population with which it was implemented, as classified by the IOM, was explored by examining their cross-tabulation. We found a significant but modest relationship between program type and selection of participants based on relevant criteria, \( \chi^{2}(2) = 7.31 \), p < .01, \( r_{c} = .41 \). The surprisingly small magnitude of this relationship can be explained by the fact that only 6 of the 15 reports of selective and indicated studies mentioned selection criteria.
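The cross-tabulation test can be reproduced with a sketch like the one below; the table entries are hypothetical, and we assume here that \( r_{c} \) denotes Cramér's V derived from the chi-square statistic.

```python
import numpy as np
from scipy.stats import chi2_contingency

def association_test(table):
    """Chi-square test of association and Cramér's V for a contingency table."""
    table = np.asarray(table)
    chi2_stat, p_value, dof, _ = chi2_contingency(table)
    v = np.sqrt(chi2_stat / (table.sum() * (min(table.shape) - 1)))
    return chi2_stat, dof, p_value, v

# Hypothetical cross-tabulation: rows = IOM type (universal, selective, indicated),
# columns = whether the study report mentioned selection criteria (no, yes)
table = [[24, 2],
         [4, 3],
         [5, 3]]
print(association_test(table))
```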

Discussion

Examination of this preliminary sample of 25 programs suggests that policy makers cannot assume that study findings for a group of individuals generalize to the population as a whole. Policy makers should carefully consider whether their goal for SAPP is to produce changes for a specific group of individuals or for a larger and more general target population. The results of the present study suggest that this focus plays a critical role in weighing the relative merits of universal, selective, and indicated prevention programming. More specifically, when considering effects at the individual level, universal programs were modestly more successful in reducing tobacco use, whereas selective and indicated programs were modestly more successful in reducing alcohol and marijuana use. When considering effects on the population, universal programs maintained their modest advantage for tobacco use, and selective and indicated programs maintained a modest, though smaller, advantage for alcohol use. For marijuana use, however, we unexpectedly found that selective and indicated programs were more successful when considering program effects on individuals but less successful when considering program effects at the population level. From a computational standpoint, the latter finding is likely due to marijuana's relatively low base rate of use compared with alcohol.

These results suggest that universal programs are clearly both more effective and more cost-effective, at both the individual and population level, in reducing tobacco use. Also, given the often low base rate of use, universal programming may be more effective and cost-effective at the population level for reducing marijuana use. Alcohol is the most commonly abused substance among youth, and its use is more strongly affected, at both the individual and population level, by selective and indicated programming. For alcohol use, the question remains whether these differences in effectiveness persist once the sometimes prohibitive costs of universal programming are considered. Brief follow-up analyses examined the partial correlation between IOM status (selective and indicated vs. universal) and program effectiveness (represented by the weighted effect size for each program) after controlling for published approximations of program cost. These preliminary analyses suggested that, independent of the money invested in a program, selective and indicated programming continued to be more effective than universal programming in reducing alcohol use at both the individual (d = .39) and population level (d = .16).

The results of this study and the methods used here also underscore the difficulties inherent in identifying target populations for selective and indicated programs. The present study found a medium-to-large relationship between the audience that programs were designed to target and the audience with whom programs were actually implemented; nevertheless, this relationship is smaller than one might expect for studies designed to validate a particular program or strategy. The modest size of this relationship may reflect the fact that many of the selective and indicated studies reviewed here did not report their selection criteria for the targeted audiences; more likely, the relationship that was observed reflects efforts by study investigators and program developers to identify appropriate targets for their prevention programming. We chose to define programs based on the samples with which they were implemented, rather than on how the developers of each program defined the sample with which it should be implemented, because the former better reflects what the effect sizes reported in studies truly represent with respect to the samples or populations they affect.

Identifying selective and indicated populations is difficult, as it entails the additional cost of acquiring data from the population of interest as well as decisions about selection criteria. Identifying individuals who have already started to exhibit early signs of substance abuse (i.e., indicated populations) may be easier, as initiation of substance use can itself indicate risk for later problem behavior. It is much more difficult, however, to systematically and operationally define criteria for selecting individuals who are at risk (i.e., selective populations). The lack of reliable and strong predictors of later problem behavior (e.g., Derzon 2007) represents a major gap in the literature and a major challenge for selective and indicated programs seeking to demonstrate their true levels of effectiveness with the populations they target.

Although the methodology of this study is not innovative, we have adapted established techniques to more accurately estimate the population-level effects of adopting any particular IOM intervention strategy. Because much of the required data are readily available from current studies, our approach provides a cost-effective advance in understanding what works to prevent substance abuse. The study presents a tool that allows practitioners and policy makers to consider all available information when deciding which populations to target with prevention strategies. This tool has the potential to inform policy makers as to which program types have the largest impact on reducing substance use at the population level. This is especially important because policy makers typically make decisions at the population level, and although prevention programming is often aimed at producing population change, its effectiveness is typically measured and assessed at the individual level.

Two limitations qualify the conclusions drawn from this paper. First, the results presented here are limited to the 25 programs that were included in the NREPP and met the selection criteria for this study. Twenty-five programs were too few to identify statistically significant differences; although the estimates we obtained are reasonably accurate, the findings should be replicated with a larger sample of studies to assure that they are reliable. Second, the universe of studies considered here was limited to those testing the effects of programs identified on SAMHSA's model or promising program lists. Although this ensures that we examined only programs with some level of empirical support, the results presented here may not represent the true state of affairs in prevention, as many universal, selective, and indicated programs are unrepresented. Our results are thus best viewed as illustrating a potential tool for prevention practitioners and policy makers, and our conclusions must be considered preliminary until they are confirmed with a larger sample of prevention programs.

In this study we sought to examine which IOM population constitutes the most effective target for reducing substance use. The answer appears to depend on the substance use prevention needs of the population, and specifically on whether change is desired at the individual or the population level. Cigarette use appears to be most efficiently targeted by universal programming (at both the individual and population level), and alcohol use tends to be most efficiently targeted by selective and indicated programming (at both the individual and population level). If a reduction in marijuana use is desired, universal interventions appear to be more efficient in producing change at the population level, whereas selective and indicated programs are more appropriate at the individual level. Producing change at the population level appears to be most efficiently accomplished by universal programs for substances with low base rates of frequent use (i.e., tobacco and marijuana), whereas universal programs are less effective for substance problems that are easier to identify because of their high base rates of frequent use (i.e., alcohol in this study).