Introduction

Randomized trials recruit a single client cohort and deliver a program on a one-time basis. Trials that identify effective prevention measures often prompt the creation of operational programs that deliver the program on an ongoing basis at a larger scale, with more clients and sites than in the trial alone and continuous replacement of program completers and dropouts with new clients. A critical question when scaling up from randomized trials to operational programs is how costs will differ (Foster, Dodge, & Jones, 2003; Suter, 2010). Too often, prospective analyses of returns on investment in replications simply use cost estimates from randomized trials (e.g., Aos, Lieb, Mayfield, Miller, & Pennucci, 2004; Lee, Aos, Drake, Pennucci, Miller, & Anderson, 2012; Miller & Levy, 2000). Operational program developers may try to do better. They can budget for positions and other resources needed to start a replication or operate it as designed, and then add a contingency fund to offset the costs of variations from the ideal. Funders and policymakers, however, are more concerned with actual program costs after start-up. They know what the original trials cost, but need to know whether the scaled-up program will benefit from economies of scale or lose efficiency as it adds fidelity controls and layered management.

While cost estimates from randomized trials typically are available net of research costs, these do not represent sound estimates of replication costs. University overheads and fringe benefits, for example, are unlikely to mirror those in local health departments or private agencies. Replicators also are more likely to garner donated resources (Foster et al., 2003).

This article provides a case study on cost variations between randomized trials and broad implementations. We focus on costs of the Nurse-Family Partnership (NFP), a labor-intensive program of regular prenatal and postnatal home visitation by registered nurses that targets low-income mothers and their first-borns. NFP started its first large-scale operational program in 1996 after three randomized trials established its effectiveness (Kitzman et al., 1997; Korfmacher, O’Brien, Hiatt, & Olds, 1999; Olds et al., 1997). It had served 177,517 families as of 2012 (NFP National Services Office, 2013).

NFP Model and Staffing

NFP has a target of 64 visits to each family that participates in the program. Home visits usually begin between weeks 13 and 28 of pregnancy. Ideally, the nurse’s schedule for a family includes four weekly visits after enrollment in the program and weekly visits for 6 weeks after birth; visits every other week through the child’s 21st month; and monthly visits for months 22–24. Visits never extend beyond month 24.

In reality, entry after week 13, scheduling problems, attrition, and early graduation dramatically reduce the total number of visits. In three randomized trials, NFP enrollees averaged 31 visits (33 in Memphis, Kitzman et al., 1997; 28 in Denver, Korfmacher et al., 1999; and 32 in Elmira, Olds et al., 1997). Prenatal visits averaged 7.5 (6.5, 9, and 7, respectively). In scale-ups, lifetime visits averaged 25.2 (8.4 prenatal, 11.6 in year 1, and 5.2 in year 2) among 10,367 families served by well-established NFP sites between July 1996 and December 2001 (O’Brien et al., 2012; year 2 count, personal communication, David Olds, 6/29/2012).

NFP assigns each family a primary nurse, whose targeted caseload is 25 families. Client turnover typically causes caseloads to fall slightly below their target. The largest risk factor for client attrition is nurse turnover (odds ratio = 7.5; O’Brien et al., 2012), so managers of randomized trials could not reassign clients as caseloads shrank due to graduation and attrition. The visiting nurses report to a full-time nurse supervisor, who manages a maximum of eight nurses. The supervisor provides clinical supervision with reflection, demonstrates the integration of theories, and facilitates professional development through one-to-one clinical supervision, case conferences, team meetings, and field supervision.

A team of professionals from public health policy and administration, nursing, education, and program evaluation at the NFP National Services Office (NSO) and partner organizations supports agencies that implement NFP. The team helps to develop and fund the programs, train the nurses and supervisory nurses, monitor implementation fidelity to 18 model program elements, and operate a centralized data system. The data system pools client and program data collected by nurse home visitors and their supervisors. Reports from the data system guide practice, assess and guide program implementation, inform clinical supervision, enhance program quality, and demonstrate program fidelity.

Methods

The randomized trials that preceded NFP scale-up published total costs per family served net of research costs, but without cost category breakdowns. To analyze the costs in operational programs, the NFP NSO contacted states with well-established programs to secure data available on total expenditures for NFP Fiscal Year 2009–2010 including fringe benefits, overheads, and the value of any donated goods and services received. Six states supplied this information. Because state NFP programs typically delivered services through subgrants to local health departments and community-based organizations and did not require a detailed accounting of how funds were spent, they were unable to disaggregate total costs into cost categories like personnel, fringe benefits, and transportation. Anecdotal information suggests that salaries and fringe benefits for nurses and nurse-supervisors dominated the costs.

We secured data on families served, days participating in the program per family, visits per family, mean visit duration, number of full-time equivalent (FTE) visiting nurses, number of nurse supervisors, and caseload per FTE nurse from the NFP NSO data system. We adjusted state costs to national prices using mean hourly wages for registered nurses by state (U.S. Bureau of Labor Statistics, 2011). To compare them with operational program costs, we inflated published costs per family in the randomized trials to 2010 dollars using the Employment Cost Index, total compensation for nurses (U.S. Bureau of Labor Statistics, 2013), as a price adjuster series.
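The two price adjustments described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' computation: we assume state costs were scaled by the ratio of the national to the state mean registered-nurse wage, and trial costs by the ratio of index values in the two years. The function names and every numeric input are hypothetical placeholders, not Bureau of Labor Statistics figures.

```python
def adjust_to_national_prices(state_cost, state_rn_wage, national_rn_wage):
    """Scale a state's cost by national/state mean RN hourly wage (assumed formula)."""
    return state_cost * national_rn_wage / state_rn_wage


def inflate_to_target_year(cost, index_base_year, index_target_year):
    """Inflate a cost by the ratio of a price index in the target and base years."""
    return cost * index_target_year / index_base_year


# Hypothetical inputs, for illustration only.
adjusted = adjust_to_national_prices(9000.0, state_rn_wage=36.0, national_rn_wage=33.0)
inflated = inflate_to_target_year(8000.0, index_base_year=80.0, index_target_year=112.0)

print(f"State cost at national prices: ${adjusted:,.2f}")
print(f"Trial cost in target-year dollars: ${inflated:,.2f}")
```

A state with above-average nurse wages has its cost scaled down, and a state with below-average wages has it scaled up, so that interstate differences reflect resource use rather than local price levels.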

We computed costs per day of program participation and per visit by state, their standard deviations among states, and their pooled means across the six states. To estimate national average costs, we multiplied the average cost per participation day by the national average number of participation days from the NSO data system. When money is invested it earns interest. In addition to reporting the total cost per client over 2.5 years of service, we therefore report present value, the amount that would have to be invested at client enrollment to pay the costs as they came due. Specifically, we present both undiscounted cost estimates and estimates with future costs discounted to present value using mid-year discounting at the 3% rate recommended by the Panel on Cost-Effectiveness in Health and Medicine (Gold, Siegel, Russell, & Weinstein, 1996). Discounting means that the further in the future costs occur, the less they count.
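Mid-year discounting can be illustrated with a short Python sketch. It assumes the usual convention that costs incurred in year t after enrollment are treated as falling at the midpoint of that year. The function name and the cost stream below are hypothetical; the split of NFP costs across years is not reported here, so this sketch does not reproduce the published present values.

```python
def present_value_mid_year(costs_by_year, rate=0.03):
    """Discount each year's cost as if it were incurred at mid-year.

    costs_by_year[0] is the cost incurred during the first year after enrollment.
    """
    return sum(
        cost / (1 + rate) ** (year + 0.5)
        for year, cost in enumerate(costs_by_year)
    )


# Hypothetical cost stream over three program years, for illustration only.
costs = [4000.0, 3500.0, 1200.0]
undiscounted = sum(costs)
discounted = present_value_mid_year(costs)

print(f"Undiscounted: ${undiscounted:,.2f}")
print(f"Present value at 3%: ${discounted:,.2f}")
```

Because most NFP costs fall within roughly the first two years after enrollment, discounting at 3% lowers the total only modestly.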

We discussed our findings with both the program developer, David Olds, and with NFP NSO operational staff, seeking insights into the causes of the cost difference we observed.

Results

The 13,268 NFP clients nationwide who began service during 2008 averaged 24.2 visits, of which 7.6 were prenatal. However, this average was variable; among state programs, it ranged from 18.2 to 37.5 visits with a standard deviation of 4.3. Replications averaged fewer visits per client than trials (p = .05). Participation days per family averaged 503.4 with a standard deviation between states of 56.1.

The six state programs that supplied data for this study served more than 19,000 families in Fiscal Year 2009–2010 (Table 1). The average family had 25.0 visits and stayed in the program for 511 days. A 6:1 nurse-to-supervisor ratio meant that supervision was more intensive than in the model program. Nurses served an average of 33 families per year, including those with only one or two visits.

Table 1 Undiscounted costs of delivering NFP services in 6 states, July 2009–June 2010

NFP costs per client participation day averaged $17.35, with an interstate range from $15.09 to $20.48. Costs per family varied widely across the six states with an undiscounted average of $8870. With a national average participation of 503.4 days, costs per family would have been $8734 (present value $8580). That cost estimate is comprehensive, and includes salaries, fringe benefits, administration and supervision, offices, supplies, travel, and NSO fees.
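The national cost per family reported above is the product of the per-day cost and the national mean participation days. A minimal Python check of that arithmetic follows. The variable names are ours, and the implied discount is simply backed out from the two reported totals rather than derived from the discounting schedule, which is not detailed here.

```python
COST_PER_DAY = 17.35        # mean cost per participation day, six states
NATIONAL_MEAN_DAYS = 503.4  # national mean participation days per family
REPORTED_PRESENT_VALUE = 8580

cost_per_family = COST_PER_DAY * NATIONAL_MEAN_DAYS

# Share of the undiscounted cost removed by discounting, backed out of the totals.
implied_discount = 1 - REPORTED_PRESENT_VALUE / cost_per_family

print(f"Undiscounted cost per family: ${cost_per_family:,.0f}")
print(f"Implied reduction from discounting: {implied_discount:.1%}")
```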

Table 2 compares means and standard deviations of cost and utilization data available for both trials and full implementation. The cost per family in operational programs was 70% of the $12,398 present-value cost across the trials in Denver ($11,846, Miller et al., 2010), Elmira ($11,979, Olds, Henderson, Phelps, Kitzman, & Hanks, 1993), and Memphis ($13,370, Glazner, Bondy, Luckey, & Olds, 2004). Despite the small number of observations, the differences in cost per family served were statistically significant (p = .01). One driver of the difference we observed is that the replication sites had fewer visits per family than the trials. The 24.24 average visits per family in replication are 78.2% of the 31-visit average in the three trials, a difference that is marginally significant at p = .053. A second driver is a lower cost per visit. Because variance in cost per visit is large in these small samples, the 11.6% reduction in cost per visit from trials to replications ([$400 − $353.70]/$400) is not significant (p = .30).
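The ratios in the preceding paragraph can be checked with a few lines of Python. This is an arithmetic sketch only; it does not reproduce the significance tests. The text does not say which replication figure underlies the cost ratio, so the sketch computes it from both the present-value and the undiscounted national estimates, each of which comes out at roughly 70%.

```python
from statistics import mean

# Present-value cost per family in each trial, 2010 dollars (from the text).
trial_cost_per_family = {"Denver": 11846, "Elmira": 11979, "Memphis": 13370}
mean_trial_cost = mean(trial_cost_per_family.values())

# National replication estimates (from the text).
replication_pv = 8580
replication_undiscounted = 8734

ratio_pv = replication_pv / mean_trial_cost
ratio_undiscounted = replication_undiscounted / mean_trial_cost

# Visits per family and cost per visit, replication versus trials.
visit_ratio = 24.24 / 31
per_visit_reduction = (400 - 353.70) / 400

print(f"Mean trial cost per family: ${mean_trial_cost:,.0f}")
print(f"Replication/trial cost ratio: {ratio_pv:.1%} (PV), "
      f"{ratio_undiscounted:.1%} (undiscounted)")
print(f"Replication/trial visit ratio: {visit_ratio:.1%}")
print(f"Reduction in cost per visit: {per_visit_reduction:.1%}")
```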

Table 2 Mean and standard deviation of NFP visits/family, cost/visit, and cost/family in randomized trials and scale-up, by location of the NFP program (in 2010 dollars)

Differences in visit duration did not appear to contribute to the differing cost per visit. In the Denver trial, visits averaged 77 min during pregnancy and 72 min during infancy and toddlerhood (Korfmacher et al., 1999). By comparison, through 2011, visit length in replication averaged 76 min during pregnancy, 74 min during infancy, and 73 min during toddlerhood (NFP NSO, 2013).

Nurses maintained families in their caseloads even if they missed multiple visits, only replacing them if they left the service area, graduated (typically at 24 months after birth), or declined to make further appointments. With nurse salaries a dominant cost driver and a fixed target caseload limit per nurse, senior NFP NSO staff reported that program costs were largely determined by days in the program, and not visits completed, travel time to visits, or frequency of trips that become missed appointments rather than visits. Thus, although reduced utilization may have reduced program effectiveness, it minimally affected the cost per family served. Since targeted caseload per nurse was the same in trials and replications, differing numbers of visits per client would have little impact on differences in costs per client served.

Operational staff said that the key drivers of cost differences between trials and replications were time in program and the difference in caseloads in replication versus in the wind-down period of the trials. Nationally, program caseloads for established replication sites averaged 20 families per nurse with a standard deviation of 4.1 in 2009 (computed from an unpublished NFP NSO staffing spreadsheet), essentially the same as the mean caseload of 21 per nurse in trials during the first 6 months of infancy (Kitzman et al., 1997; Korfmacher et al., 1999; Olds et al., 1997). NFP trials hired and trained nurses as recruitment started, then filled their caseloads over a 12- to 15-month recruitment period. Because the design of the NFP keeps each family paired with their original nurse-provider until the infant is 24 months old, nurse caseloads in the trials averaged about 10–12, not only during recruitment but also during a year of wind-down as families dropped out or graduated. Those small caseloads were less efficient and practically doubled nursing expenses per family during start-up and wind-down relative to an operational program. Similarly, although the 7.4 nurse average per supervisor in replication was slightly less efficient than the 8.3 average reflected in the trials, during time periods when nurses served fewer families, supervision costs per family served rose in the trials. Thus, replication clearly gained from economies of scale resulting from program continuity.
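The caseload effect described above is simple division: a nurse's annual cost is spread over however many families she serves. The Python sketch below uses a hypothetical annual nurse cost (no salary figures are reported here); the ratio between settings does not depend on that value.

```python
def nurse_cost_per_family_year(annual_nurse_cost, caseload):
    """Annual nursing cost attributable to each family at a given caseload."""
    return annual_nurse_cost / caseload


ANNUAL_NURSE_COST = 90_000.0  # hypothetical salary plus fringe benefits

operational = nurse_cost_per_family_year(ANNUAL_NURSE_COST, caseload=20)
trial_at_12 = nurse_cost_per_family_year(ANNUAL_NURSE_COST, caseload=12)
trial_at_10 = nurse_cost_per_family_year(ANNUAL_NURSE_COST, caseload=10)

# Nursing cost per family relative to an operational program at full caseload.
ratio_at_12 = trial_at_12 / operational
ratio_at_10 = trial_at_10 / operational

print(f"Caseload 12 vs 20: {ratio_at_12:.2f} times the nursing cost per family")
print(f"Caseload 10 vs 20: {ratio_at_10:.2f} times the nursing cost per family")
```

At caseloads of 10 to 12, nursing cost per family is roughly 1.7 to 2 times that at a caseload of 20, which is consistent with the statement that small caseloads practically doubled nursing expenses during start-up and wind-down.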

Slightly offsetting those gains, unlike in the trials, operational programs had to devote time to recruitment and enrollment on an ongoing basis, especially in programs with high client turnover. Because nurse caseload was capped, however, differences in visits per family should not affect salary expenses per family served.

Discussion

At least in this labor-intensive program, despite greater attrition in scale-up, costs associated with scale-up were lower than those in randomized trials, probably because nurses in operational programs quickly added new clients as existing clients graduated or dropped out, whereas nurses in the randomized trials simply had downtime. In particular, economies of scale may account for the 11.6% reduction in costs per visit from trial to replication.

Limitations

Participation days per family were not available for the trials we reviewed. Therefore, our unit cost comparisons were based on cost per visit, which was a second-best measure. Because neither randomized trial nor operational program data were available in cost categories like salaries and travel, it was impossible to fully explore the causes of the differences we observed. For example, we do not know whether differences in nurse salaries or in fringe benefit levels were a major reason for the cost differentials. Our small sample also limited our power to detect cost differences. Although the six states that provided data were geographically spread, their costs may not be fully representative of the United States. It also would have been desirable to examine standard deviations across individual or local programs as well as across state programs.

The costs shown here comprised program delivery costs but excluded participant costs. They also were average costs per family served, not the marginal costs that would be added to serve an additional family. Moreover, the costs shown were for caseloads consistent with the range of 25–31 visits per family achieved in the trials and replications. For program completers, in replication, NFP has completed roughly the same 65% of visits during infancy and 60% during toddlerhood that were completed in the trials (NFP NSO, 2013). Because caseloads were geared to that visit load, a program that completed the ideal 52 postnatal visits would require a smaller caseload per nurse.

Program developers and managers have performed quasi-experimental and randomized controlled studies aimed at optimizing dosage (Olds et al., 2013). Those studies led to recent program changes that increased nurse-family collaboration in deciding on visit frequency, content, and location, which the studies suggested will reduce attrition and improve outcomes (Olds et al., 2013). In reducing attrition, however, they may raise costs per client. Moreover, we estimated costs for a typical NFP program. A recent analysis that investigated how client turnover varies by maternal demographic characteristics and risk status (O’Brien et al., 2012) may provide a stronger basis for projecting costs when planning programs to serve specific catchment areas.

Comparison with Other Programs

The literature provides few studies as comparisons. Crowley, Jones, Greenberg, Feinberg, and Spoth (2012) reported that program delivery costs of three family-centered and school substance abuse prevention programs were lower in scale-up. However, the costs of centralized efforts to place programs in communities and assure implementation fidelity offset that reduction. The authors suggest that replication planning needs to consider “total costs of adoption, implementation, and sustainability” (p. 257). NFP costs in replication were lower than costs in trials when viewed in that total cost context.

Cisler, Holder, Longabaugh, Stout, and Zweben (1998) estimated that costs of different alcohol treatment regimens in a research setting varied from 2.3 to 3.0 times those in replication, depending on regimen. The lowest cost regimen differed between research and replication. Those differences were so large that we speculate that the costs of research data collection were not removed from the randomized trial cost estimates.

Ginexi and Hilton (2006) suggested that program costs vary widely across implementations. Differing population cultural characteristics, agency settings, and existing infrastructures can drive those variations (Chatterji, Caffray, Jones, Lillie-Blanton, & Werthamer, 2001; Suter, 2010).

Our estimates across six state programs supported these observations. They suggest that replicators should not expect a simple rule to guide their cost estimation for scale-ups. Nevertheless, NFP replication probably achieved some economies of scale.