Nurse-Family Partnership (NFP) is a program of intensive prenatal and postnatal home visitation by registered nurses. It targets low-income mothers and their first children. Visits start prenatally and ideally continue through age 2; 25–30 home visits over 17 months are typical.

NFP’s goals are to help parents improve (1) prenatal health and pregnancy outcomes, (2) child health and development through more sensitive and competent care, and (3) their own life course by developing and fulfilling a vision for their future, planning future pregnancies, completing their education, and finding work (Olds et al. 2002). Prenatally, NFP focuses on improving diet; reducing alcohol, tobacco, and other drug use during pregnancy; coordinating prenatal care; identifying pregnancy complications and treating them early; and helping expectant mothers plan their future (Kitzman et al. 1997). Postnatal priorities shift to ensuring that the baby has a safe and healthy home; improving child physical care, emotional care, play skills, and communication skills that promote developmental gains; encouraging breast-feeding; maintaining gains in maternal health behavior; reducing domestic violence (an issue given greater attention after the first randomized trial); and setting and achieving personal life-course goals.

Reviews of social service programs (e.g., Promising Practices Network http://www.promisingpractices.net/program.asp?programid=16, Lee et al. 2012; Miller and Levy 2000) consistently conclude that strong evidence shows NFP works. Recruitment for the program’s first randomized controlled trial began in Elmira, NY, in 1978 (Olds et al. 1986). The program model developers conducted additional trials in Denver and Memphis (Kitzman et al. 1997; Olds et al. 2002). These trials tracked participants longitudinally. Independent trials in Orange County (California), Louisiana, and the Netherlands added supporting evidence on short-term effects (Mejdoubi et al. 2014; Nguyen et al. 2003; Sonnier 2007). Less robust evaluations of NFP effectiveness in broad-based implementation also are accumulating (e.g., Rubin et al. 2011).

Lee et al. (2012) used meta-analytic techniques to assess eight outcomes across the three trials by NFP’s developers. This article is broader. It provides a systematic review of findings on 21 outcomes, including 10 with evidence from independent trials or operational programs. It also adjusts all outcomes downward to account for imperfect fidelity in replication.

NFP began program replication in 1996. Unlike many operational programs, NFP replication is highly regimented and closely monitored (NFP National Service Office 2011; Olds et al. 2002, 2013). Use of the NFP model and name is limited to implementing agencies that contract with the NFP National Service Office (NSO), participate in centralized training and extensive reporting (including longitudinal data by client), pay fees to the NSO to administer the data system and monitor quality, and comply with 18 quality elements, including standards governing maximum caseloads of nurses and supervisors, time spent on NFP’s six domains, and nurse qualifications. The NSO trains all nurse administrators, nurse supervisors, and nurse home visitors. NSO regional staff talk with state program coordinators at least weekly. Model improvements are evaluated in rigorous pilot studies (e.g., Ingoldsby et al. 2013) and then rolled out to all sites.

By December 2013, 177,517 pregnant women had enrolled in operational NFP programs (NFP National Service Office 2014). Online Table 1 describes the enrollees. Estimated costs were $8742 per family served (Karoly and Bigelow 2005) and $1.55 billion in total (in 2010 dollars). This article aims to estimate how NFP will affect the lives of these women and their babies; future research will address the associated return on investment.
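The total-cost figure follows directly from the enrollment count and the per-family cost. A short Python check, assuming both figures are expressed in the same 2010 dollars:

```python
# Total program cost = enrollees x estimated cost per family served.
ENROLLEES = 177_517        # pregnant women enrolled in NFP by December 2013
COST_PER_FAMILY = 8_742    # dollars per family served (Karoly and Bigelow 2005)

total_cost = ENROLLEES * COST_PER_FAMILY
print(f"Total cost: ${total_cost:,} (about ${total_cost / 1e9:.2f} billion)")
```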

Between Fiscal Years 2010 and 2014, the federal Maternal, Infant, and Early Childhood Home Visiting (MIECHV) Program provided $1.5 billion in funding to expand evidence-based home visiting programs. NFP programs received perhaps one quarter of all MIECHV funding. Thus, estimates of NFP’s outcomes can inform and bolster periodic reauthorization discussions.

Methods

To identify evaluations, we contacted NFP program model developers, replicators, and NSO staff and searched the literature. We identified 39 evaluation reports on NFP spread over time and place. This included 23 reports on the three randomized trials by the program model’s developers. We extracted effectiveness estimates for 21 outcomes and added evidence captured by the NFP NSO’s mandatory reporting system on six of them. Randomized trials by the program model’s developers provided all published evidence on eight of the outcomes. We computed impacts on three additional outcomes—preterm second births, subsidized child care, and Medicaid spending per child recipient—from documented impacts.

Table 1 summarizes the randomized trials and rates their quality. It shows enrollment by arm. Both Elmira and Memphis included arms that only received prenatal visits. Postnatal outcomes were not tracked in Memphis for this arm and the associated control group.

Table 1 Characteristics and Quality of Randomized Nurse-Family Partnership Trials

Louisiana trial data are less reliable than data from other trials because of heavy early dropout and loss to follow-up. Documentation is incomplete (simply a list of significant findings) and study staff refused to provide access to unpublished supporting tables. The Orange County trial’s birth-outcome evaluation was conducted early, before some pregnancies reached term. Planned Orange County follow-up data were not collected at age 1, and county staff were unable to provide birth outcomes for mothers not included in the published report. We excluded a German trial because it did not use nurses as its home visitors. As in the paraprofessional visitor cohort of the Denver trial (Olds et al. 2002), NFP delivery by German social workers and midwives had minimal effectiveness (Sandner 2013a, b).

This article looks across trials to decide which outcomes are assured and which are tentative. Some outcomes, however, were measured only in recent trials or time periods. For example, child psychological assessments first used in Memphis suggested NFP-associated improvements. That finding led to a more probing assessment in Denver that pinpointed the improvements. Such evolution prevents cross-trial comparison. As the last row in Table 1 shows, another source of nonequivalence is the variation in follow-up timing between trials, notably in Elmira, where follow-ups were spaced 8–10 years apart.

The replication studies on operational programs use quasi-experimental designs. They compare outcomes for NFP mothers to outcomes for other mothers. Their quality is reduced by imperfect comparison group matching. Rubin et al. (2009, 2011) and Matone et al. (2012a, b), for example, used propensity scoring to select a comparison group but lacked the information needed to exclude families who declined NFP services. Decliners probably were at higher risk than those who accepted service. Conversely, lack of data on risk factors used in targeting NFP service offers means the comparison group also may include families at lower risk than NFP families. Thus, the direction of bias is unclear.

We estimated program effectiveness using mixed methods. For binary outcomes (e.g., whether a birth was preterm or whether the child was injured), we meta-analytically pooled estimates across the randomized trials, favoring estimates that were regression-adjusted to achieve sample balance. We used systematic review methods for continuous outcomes because some NFP effectiveness estimates came from studies that published mean differences in effect but no precise measure of their variance. Also, only one or two effectiveness estimates existed for those outcomes, so we lacked enough studies to develop cross-study estimates of effect using meta-analytic regression.
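The article does not name the estimator it used for pooling binary outcomes. As an illustration only, the sketch below assumes a standard inverse-variance (fixed-effect) pooling of log risk ratios computed from raw event counts; unlike the authors' preferred approach, it ignores regression adjustment. The trial counts are hypothetical placeholders, not NFP data.

```python
import math

def log_risk_ratio(events_t, n_t, events_c, n_c):
    """Log risk ratio (treatment vs. control) and its variance; assumes no zero cells."""
    rr = (events_t / n_t) / (events_c / n_c)
    var = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return math.log(rr), var

def pool_fixed_effect(trials):
    """Inverse-variance fixed-effect pooling; returns (pooled RR, 95% CI low, 95% CI high)."""
    weights, estimates = [], []
    for trial in trials:
        est, var = log_risk_ratio(*trial)
        weights.append(1 / var)
        estimates.append(est)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return math.exp(pooled), math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se)

# Hypothetical (events_t, n_t, events_c, n_c) per trial; NOT actual trial data.
trials = [
    (20, 200, 30, 200),
    (35, 400, 50, 410),
    (12, 150, 18, 140),
]
rr, lo, hi = pool_fixed_effect(trials)
print(f"Pooled reduction: {1 - rr:.1%} (95% CI {1 - hi:.1%} to {1 - lo:.1%})")
```

A random-effects estimator would widen the interval when trials disagree; with so few trials, the fixed-effect form is simply the easier one to read.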

To arrive at effectiveness estimates, we pooled data from randomized trials or computed a mean estimate across them. As described below, we made exceptions for infant mortality (because the trials were not powered to detect changes), welfare spending (because eligibility rules changed after the Elmira trial), and immunizations (because replication data favored one trial over another).

Programs typically are less effective in replication than in randomized trials (Lee et al. 2012). Our estimates assume, somewhat arbitrarily, that effectiveness declines in proportion to the decline in visits per family from trials to operational programs. That assumption implies outcomes in replication will be 78.2 % of trial outcomes. We used the Crystal Ball® add-in to Excel to run bootstrap simulations that estimated 95 % confidence intervals (CIs) around our estimated savings, based on our standard error estimates for percentage gains, 10 % standard errors for unit medical costs, and a triangular distribution for the replication factor matching interstate variation in visit rates.
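The simulation step can be approximated outside Excel. The sketch below is a parametric Monte Carlo analogue of the Crystal Ball runs, not the authors' model: it draws a percentage gain from a normal distribution, a unit cost with a 10 % standard error, and a replication factor from a triangular distribution with its mode at 78.2 %, then reads the 95 % interval from the 2.5th and 97.5th percentiles. Every numeric input other than 78.2 % and the 10 % cost error (the effect size, its standard error, the triangular endpoints, the baseline rate, and the family count) is a hypothetical assumption.

```python
import random
import statistics

def simulate_savings(n_draws: int = 10_000, seed: int = 42) -> tuple[float, float, float]:
    """Monte Carlo draws of savings; returns (mean, 2.5th percentile, 97.5th percentile)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        gain = rng.gauss(0.40, 0.08)                        # hypothetical trial effect and its SE
        unit_cost = rng.gauss(10_000, 0.10 * 10_000)       # hypothetical unit cost with 10 % SE
        replication = rng.triangular(0.65, 0.90, 0.782)    # hypothetical endpoints, mode at 78.2 %
        cases_avoided = 1_000 * 0.20 * gain * replication  # hypothetical: 1,000 families, 20 % baseline
        draws.append(cases_avoided * unit_cost)
    draws.sort()
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return statistics.mean(draws), lower, upper

mean, lower, upper = simulate_savings()
print(f"Mean savings ${mean:,.0f}; 95 % interval ${lower:,.0f} to ${upper:,.0f}")
```

Fixing the seed makes the interval reproducible; increasing `n_draws` stabilizes the percentile endpoints.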

Our outcome estimates often include the incidence of each problem absent intervention. Additional baseline levels used to compute NFP savings were as follows: (1) percentage of unmarried pregnant women who report smoking during the third trimester (20.6 % nationally, from online analysis of 2010 National Survey on Drug Use and Health data), (2) national repeat teen birth rate (Ikramullah et al. 2011), (3) 22 % of first-time low-income births involving pregnancy-induced hypertension (PIH), from New York City Medicaid data prior to NFP implementation (Senter et al. 2010), consistent with the 18 % rate in the pooled Memphis and Elmira control groups, (4) national neonatal mortality rate of 0.419 % (Martin et al. 2011), with the rate for low-income infants in Illinois at 1.33 times the average (8.1/6.1) (University of Illinois at Chicago 2010), (5) national child maltreatment rates for low-income families, by year of age and by type (e.g., physical abuse) (Sedlak et al. 2010), (6) 17.4 % of children aged 0–2 annually treated for injury nationally (Corso et al. 2006), (7) national youth arrest rates by year of age in 2009 (Snyder 2011), with an estimated 5.3 % of youth crimes resulting in arrest (Miller and Hendrie 2015), and (8) alcohol, tobacco, and marijuana usage patterns at ages 12–15, from online analysis of 2010 National Survey on Drug Use and Health data.
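Item (4) can be read as a two-step calculation. Assuming the national rate and the Illinois ratio are combined multiplicatively (the text gives both figures but not their product), the implied low-income baseline works out as follows:

```python
NATIONAL_NEONATAL_RATE = 0.00419   # 0.419 % (Martin et al. 2011)
LOW_INCOME_MULTIPLIER = 8.1 / 6.1  # Illinois low-income vs. average ratio (University of Illinois at Chicago 2010)

# Assumption: the multiplier scales the national rate directly.
low_income_rate = NATIONAL_NEONATAL_RATE * LOW_INCOME_MULTIPLIER
print(f"Multiplier: {LOW_INCOME_MULTIPLIER:.2f}")
print(f"Low-income baseline: {low_income_rate:.3%}")
```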

Results

Table 2 summarizes evidence-based outcomes, our best estimates of effectiveness, and projected cumulative outcomes by 2031 for NFP clients enrolled in 1996–2013. Tables 3 and 4 provide evidence supporting the estimates. Here, we describe the rationale for our choices. All effects are statistically significant at the 95 % confidence level or greater unless otherwise stated.

Table 2 Expected life status and financial outcomes when first-time low-income mothers receive Nurse-Family Partnership home visitation services and projected total outcome change due to 177,517 NFP enrollments in 1996–2013
Table 3 Estimates by randomized trial and pooled estimates for 10 dichotomous NFP program outcomes
Table 4 Evidence about six outcomes of NFP by study

Reduced Smoking During Pregnancy

NFP mothers smoke 24.2 % less tobacco during their pregnancy. Rationale for percentage chosen: Cotinine testing is the gold-standard measure of tobacco use. Therefore, we chose the Denver trial’s value (multiplied by the 78.2 % expected in replication) over the self-reported estimates. The Pennsylvania (PA) study and the NFP data system captured the number of smokers rather than the quantity smoked. Their information came from birth certificates or other self-reports, which are an unreliable source of data on smoking during pregnancy (Northam and Knapp 2006).

Reduced Pregnancy-Induced Hypertension (PIH)

PIH declined by 31.3 %. Rationale: We multiplied a pooled 40 % PIH reduction in Elmira and Memphis by the 78.2 % replication factor.
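Every replication-adjusted estimate in these Results is the trial-based reduction multiplied by 0.782. A small Python check, using trial figures quoted in the rationales, reproduces the reported percentages after rounding to one decimal place:

```python
REPLICATION_FACTOR = 0.782  # replication-to-trial ratio of visits per family

# Pooled or single-trial reductions (%) quoted in the Results rationales.
trial_reductions = {
    "Pregnancy-induced hypertension": 40.0,
    "Preterm first births": 18.8,
    "Abortions through age 3": 39.2,
    "Child maltreatment, ages 4-15": 39.7,
    "Language delay": 50.0,
    "Youth arrests, ages 11-19": 57.0,
    "Youth substance use, ages 12-15": 68.0,
    "TANF payments": 7.2,
    "Food stamp payments": 12.3,
}

def replicated(trial_pct: float, factor: float = REPLICATION_FACTOR) -> float:
    """Scale a trial effect to expected operational effectiveness, rounded to 0.1 %."""
    return round(trial_pct * factor, 1)

for outcome, pct in trial_reductions.items():
    print(f"{outcome:<34}{pct:5.1f} % -> {replicated(pct):5.1f} %")
```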

Fewer Preterm First Births

NFP reduces preterm births (before 37 weeks) by 14.7 %. Rationale: Because we want to estimate the impact of NFP in the USA, we multiplied the 18.8 % pooled decrease across five US randomized trials by the 78.2 % replication factor. We suspect that this estimate is a conservative lower bound, both because prenatal visits per family have not declined from trials to replication and because three analyses of operational programs observed 30 % reductions.

Fewer Infant Deaths

NFP participation reduces infant deaths by 45.4 %. Rationale: We chose the 58 % (95 % CI 44–70 %) mortality reduction from Cox’s OK study over the Cincinnati rate because its results were not commingled with those of another program. We conservatively defined it as mortality reduction before age 1 and chose it over the sustained 60.7 % reduction in Memphis through age 9. Although this evidence came from operational programs, comparison group biases (see the online supplement) led us to multiply it by the 78.2 % replication factor.

Improved Birth Spacing

NFP mothers have 31.2 % fewer closely spaced second births (within 24 months), thus reducing the risk of costly complications. In years 3–12 post-partum, NFP neither raises nor lowers the birth rate. Rationale: The pooled 39.9 % reduction in close spacing from the three randomized trials is the highest-quality estimate. Applying the 78.2 % replication factor yields a 31.2 % reduction in closely spaced births in replication, close to the 27 % decline for young mothers in PA and the 31 % decline in New York City. Multiplying the percentage reduction by the US 2008 repeat teen birth rate of 23.46 % (Ikramullah et al. 2011) (a more conservative choice than the 28.0 % rate among controls in the pooled trials) suggests NFP mothers choose to bear an average of .0735 fewer subsequent children than controls (or .094 before the replication adjustment, a number used in the online supplement).
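The birth-spacing chain can be retraced step by step. The reported .0735 appears to come from rounding the pre-adjustment product to .094 before applying the replication factor; that ordering is an inference from the numbers, not something the text states. Without the intermediate rounding, the product is about .0732.

```python
POOLED_TRIAL_REDUCTION = 0.399   # pooled cut in closely spaced second births, three trials
REPEAT_TEEN_BIRTH_RATE = 0.2346  # US 2008 repeat teen birth rate (Ikramullah et al. 2011)
REPLICATION_FACTOR = 0.782

# Percentage reduction expected in replication (reported as 31.2 %).
replicated_reduction = POOLED_TRIAL_REDUCTION * REPLICATION_FACTOR

# Fewer subsequent births per mother before the replication adjustment, rounded to .094.
pre_adjustment = round(POOLED_TRIAL_REDUCTION * REPEAT_TEEN_BIRTH_RATE, 3)

# Assumed ordering: the replication factor is applied to the rounded value.
fewer_births = pre_adjustment * REPLICATION_FACTOR
unrounded = POOLED_TRIAL_REDUCTION * REPEAT_TEEN_BIRTH_RATE * REPLICATION_FACTOR

print(f"Reduction in replication: {replicated_reduction:.1%}")
print(f"Fewer births per mother: {pre_adjustment} before, {fewer_births:.4f} after adjustment")
print(f"Without intermediate rounding: {unrounded:.4f}")
```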

Fewer Abortions within 48 Months of the First Birth

30.7 % reduction in abortions through child age 3. Rationale: We multiplied the 39.2 % reduction in pooled data from Elmira (low-income mothers), Memphis, and Denver by the 78.2 % replication factor.

Fewer Subsequent Preterm Births

NFP mothers have 0.035 fewer subsequent preterm births. Computations: The online supplement describes the calculations. They account for the preterm birth rate for any subsequent birth and the rate elevation for closely spaced births.

Increased Breast-feeding Attempts

11.2 % (7.6 percentage point) increase in mothers who tried breast-feeding. Rationale: In pooled Elmira and Memphis data, breast-feeding rose 9.7 percentage points. Multiplying by 78.2 % yields an estimated 7.6 percentage point increase (an 11.3 % increase over the 2011 WIC-eligible breast-feeding level). This estimate should be conservative because it is lower than the 10.0–11.6 percentage point increases observed in operational programs.

Reduced Intimate Partner Violence (IPV)

16.1 % reduction in IPV through child age 4. Rationale: Reports of violent victimization are subject to recall bias (Bushery 1981). Therefore, we favored 6-month over 3-year recall periods in Denver. As the online supplement details, we adjusted reports with recall periods beyond 6 months for recall bias and computed 6-month victimization rates from longer-term reports. Pooling US data from comparable time periods (including using the presumably understated 2 % estimate from Memphis at ages 0–5 multiple times), average reductions were 31.7 % prenatally, 19.5 % at ages 0–2, and 26.9 % at age 4. Reductions of 12.5–15.1 % at ages 6 and 9 were not significant even at the 80 % confidence level, so we assumed reductions ended at age 4. From ages 0–4, IPV was reduced by 20.6 %, which we multiplied by the 78.2 % replication factor. We defined IPV rates per 6-month period absent NFP as the 13.7 % postnatal probability and 18.1 % prenatal probability in the pooled control groups from the three US trials.

Fewer Childhood Injuries

Through age 2, NFP babies have 32.6 % fewer injuries treated in emergency departments (EDs) or admitted to hospital. Rationale: Multiplying the pooled 41.6 % reduction across the Elmira, Louisiana, and Memphis trials by 78.2 % suggests a 32.6 % reduction in replication.

Fewer Child Maltreatments

NFP reduces child maltreatment by 31.0 % at ages 4 through 15. Rationale: We multiplied the 39.7 % US reduction from Elmira (which is slightly lower than the Dutch reduction) by the 78.2 % replication factor. Child maltreatment follows a severity distribution, so we assume that unconfirmed case counts will change as CPS-confirmed (substantiated or otherwise indicated) counts do. That assumption is conservative because NFP increases detection and captures evidence required for substantiation (Olds et al. 1995), which should cause a larger decrease in unconfirmed than in confirmed cases. Temporally, reductions are concentrated at ages 4–15 (Zielinski et al. 2009). Our analysis conservatively assumes that any effect before age 4 is subsumed in the broader reduction in nonfatal injury through age 2.

Better Language Development

NFP reduces language delay by 39.1 %, thus reducing the need for preschool or school-based remedial services. Rationale: Although the Elmira and Memphis trials demonstrated language development gains, Denver measured them more clearly. We multiplied the 50 % reduction in Denver by the 78.2 % replication factor.

Fewer Youth Criminal Offenses

NFP reduces youth arrests by 44.6 % at ages 11 through 19, with reduced arrests of girls predominating and arrest probabilities equalizing by age 19. Rationale: To date, this outcome has been reported only in Elmira. We multiplied Elmira’s 57 % reduction by the 78.2 % replication factor. We assumed that the reduction in crimes committed mirrored the reduction in arrests.

Reduced Youth Substance Abuse

NFP reduces alcohol, tobacco, and marijuana use by 53.2 % from age 12 through at least age 15. Rationale: We multiplied the 68 % average reduction in Elmira and Memphis by the 78.2 % replication factor.

Increased Immunizations

NFP participation is associated with a 13.0 % (9.1 percentage point) increase in the probability that children covered by Medicaid will have complete immunizations at age 2. Rationale: We multiplied the 11.6 percentage point increase versus Elmira controls without transport assistance by the 78.2 % replication factor. The Memphis trial estimate on this measure is contaminated because the trial reminded controls about immunizations and transported them to appointments. Although two operational program comparisons found statistically significant differences of 19 and 22 percentage points (p < .05), neither was based on a carefully matched sample.

Reduced TANF Payments

NFP reduces Temporary Assistance for Needy Families (TANF) payments by 5.6 % for 12 years post-partum. These savings result from fewer subsequent births and altered earning patterns that reduce TANF eligibility and payments per eligible family. Rationale: We multiplied the 7.2 % average reduction from the TANF-specific Memphis and Denver evaluations by the 78.2 % replication factor. Applying this percentage to current TANF participation data accounts for the decline in participation since 1996.

Reduced Food Stamp Payments

NFP reduces food stamp payments by 9.6 % for at least 12 years post-partum. These savings result from fewer subsequent births and altered earning patterns that reduce food stamp eligibility and payments per eligible family. Rationale: We multiplied the 12.3 % average reduction across the three trials by the 78.2 % replication factor.

Reduced Need for Medicaid Coverage

NFP reduces person-months on Medicaid by 7.6 % for at least 15 years post-partum, and these savings are expected to continue. The participation reductions have two causes. First, the lower second-birth rate resulting from NFP services, and possibly differences in earning patterns, increase the rate at which mothers and, to a lesser extent, first-born children graduate from Medicaid (although fewer children would graduate today because the Children’s Health Insurance Program and the Affordable Care Act raised income eligibility thresholds in many states). Second, because NFP mothers bear fewer children, fewer second children need coverage. The births avoided are closely spaced ones at high risk of costly complications. Associated Medicaid cost savings include both birth-related costs and the costs of continuing Medicaid participation by these second babies. Rationale: We multiplied the 9.8 % average reduction across the three trials by the 78.2 % replication factor.

Lower Costs if on Medicaid

NFP reduces the present value of Medicaid spending per child recipient by 8.5 % from birth through age 18 (bootstrap-estimated 95 % CI 4.5 %, 12.5 %). As documented above, NFP reduces smoking during pregnancy and related prematurity, pregnancy-associated preeclampsia, child injury in the first two years of life, medical and mental health spending on victims of child maltreatment, and second births with complications, and it increases adherence to immunization schedules. Those health status improvements should reduce the Medicaid claims costs of mothers and first-born children. Rationale: Limited data prevented direct evaluation of savings in the randomized trials. The online supplement models the savings. We divided the modeled savings by the present value of annual Medicaid spending per child recipient from birth through age 18, excluding live birth costs: $35,287 (Henry J Kaiser Family Foundation 2014).
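The present-value framing can be illustrated briefly. The sketch discounts an annual spending stream at 3 %, the rate the article applies to its savings estimates (assumed here to apply to this denominator as well), and backs out the per-child savings implied by the 8.5 % reduction and the $35,287 figure. The flat $2,500-per-year stream is purely hypothetical and is not meant to reproduce the Kaiser figure.

```python
def present_value(annual_amounts: list[float], rate: float = 0.03) -> float:
    """Discount annual amounts to the birth year (index 0 = year of birth, undiscounted)."""
    return sum(amount / (1 + rate) ** year for year, amount in enumerate(annual_amounts))

PV_SPENDING_PER_CHILD = 35_287  # PV of Medicaid spending per child recipient, birth-18, excl. birth costs
SAVINGS_SHARE = 0.085           # reported reduction in that present value

# Per-child present-value savings implied by the two reported figures.
implied_savings = SAVINGS_SHARE * PV_SPENDING_PER_CHILD

# Hypothetical flat stream of $2,500 per year for ages 0 through 18 (19 payments), illustration only.
illustrative_pv = present_value([2_500.0] * 19)

print(f"Implied PV savings per child: ${implied_savings:,.0f}")
print(f"PV of the hypothetical stream: ${illustrative_pv:,.0f}")
```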

Reduced Subsidized Child Care, Second Births

An estimated 4.85 % of the second babies who, absent NFP, would have been born within two years of the first birth would have used subsidized child care funded by the Child Care and Development Block Grant. Computations: 4.85 % of Medicaid and SCHIP children use subsidized child care nationwide (Office of Child Care 2010). We multiplied that rate by the .0735 fewer subsequent births per NFP mother derived above.
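The child-care computation is a single product of two rates. A minimal check, treating .0735 as fewer subsequent births per NFP mother, as in the birth-spacing section:

```python
SUBSIDIZED_CARE_SHARE = 0.0485    # Medicaid/SCHIP children using subsidized child care (Office of Child Care 2010)
FEWER_SUBSEQUENT_BIRTHS = 0.0735  # fewer subsequent births per NFP mother (birth-spacing section)

fewer_subsidized_per_mother = SUBSIDIZED_CARE_SHARE * FEWER_SUBSEQUENT_BIRTHS
print(f"{fewer_subsidized_per_mother:.5f} fewer subsidized child-care users per NFP mother")
print(f"{fewer_subsidized_per_mother * 1_000:.2f} per 1,000 mothers")
```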

Other Outcomes

As the online supplement details, low birth weight, subsequent miscarriages, intimate partner violence after age 4, maternal criminal offenses, maternal depression, and grade retention declined in some trials but not in others, or changed consistently but not enough to differ statistically from controls at the 90 % confidence level.

Outcomes Achieved

The last column in Table 2 shows the estimated problems that program enrollments in 1996–2013 prevented or are projected to prevent, with 95 % confidence intervals for those estimates. NFP enrollments through 2013 will prevent a projected 500 infant deaths, 10,000 preterm births, 4700 abortions, 13,000 dangerous closely spaced second births, 42,000 child maltreatment incidents, 16,000 other child injuries, 36,000 intimate partner violence incidents, 90,000 violent crimes by youth, 594,000 property and public order crimes (e.g., vandalism, loitering) by youth, 36,000 youth arrests, and 41,000 person-years of youth substance abuse. They will also lead an additional 16,000 children to complete their immunization schedules.
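Several projected totals can be approximated as enrollment times a per-enrollee change. The sketch below is a simplification under stated assumptions: it applies each effect to all 177,517 enrollees, ignoring that the immunization estimate is defined for Medicaid-covered children, and it omits the confidence intervals reported in Table 2. Even so, it lands on the rounded 13,000 closely spaced births and 16,000 immunized children.

```python
ENROLLEES = 177_517  # NFP enrollments, 1996-2013

def projected_change(per_enrollee_effect: float, enrollees: int = ENROLLEES) -> float:
    """Expected total change if each enrollee contributes the same average effect."""
    return per_enrollee_effect * enrollees

# Per-enrollee effects taken from the Results section.
closely_spaced_avoided = projected_change(0.0735)  # fewer closely spaced second births per mother
# Simplification: the 9.1-point gain is applied to every enrollee, not only Medicaid-covered children.
extra_immunized = projected_change(0.091)

print(f"Closely spaced second births avoided: {closely_spaced_avoided:,.0f}")
print(f"Additional children fully immunized: {extra_immunized:,.0f}")
```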

Because NFP families earn more and space births further apart, they place fewer burdens on government safety net programs. NFP is expected to eliminate the need for 4.8 million person-months of child Medicaid coverage. In 2010 dollars, converted to present value at a 3 % discount rate, it will reduce estimated spending on TANF by $250 million, on food stamps by $540 million, and on Medicaid by $2.2 billion. Safety net savings will total $3.0 billion.

Discussion

NFP has broad-reaching effects on the lives of mothers and children. Longitudinal NFP trials, therefore, can assess hundreds of outcomes. Statistically, testing at the 95 % confidence level means that, among outcomes NFP does not truly affect, about 5 in 100 will nonetheless appear significant by chance; these artifacts are random events that do not represent true differentials. Given that, this systematic review fills a critical need by identifying findings that are consistent across trials or are significant at the 99 % confidence level, which makes them more likely to remain significant after statistical adjustment for the large number of outcomes tested.
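The multiple-testing point can be made concrete. Treating tests as independent (trial outcomes are in fact correlated) and assuming a hypothetical 200 outcomes with no true effects, the sketch computes the expected number of chance findings at the 95 % and 99 % confidence levels, the probability of at least one, and the Bonferroni per-test threshold.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected count of 'significant' results when no true effects exist."""
    return n_tests * alpha

def prob_at_least_one(n_tests: int, alpha: float) -> float:
    """Chance of at least one false positive across independent null tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test significance level that holds the family-wise error rate at family_alpha."""
    return family_alpha / n_tests

N_OUTCOMES = 200  # hypothetical number of outcomes tested across trial follow-ups
for alpha in (0.05, 0.01):
    print(f"alpha={alpha}: expected false positives {expected_false_positives(N_OUTCOMES, alpha):.0f}, "
          f"P(at least one) {prob_at_least_one(N_OUTCOMES, alpha):.3f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha(N_OUTCOMES):.5f}")
```

Even at the 99 % level, a few chance findings remain likely among hundreds of outcomes, which is why cross-trial consistency adds weight.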

Ethnic diversity of the trial populations is both a strength and a weakness. Reassuringly, trial findings replicate across cultures. Differential effectiveness, however, could reflect cultural differences rather than lack of replicability. Olds et al. (1986) reported Elmira results for white mothers only, while Orange County was restricted to teenaged Hispanic mothers. By design, these trials essentially become subgroup analyses. If outcomes vary by race/ethnicity, our practice of computing pooled cross-trial impacts giving each family equal weight may not yield a valid picture for the USA as a whole. At the same time, stratifying by race would virtually exclude Asians and Native Americans and force reliance on subgroup analyses for blacks. The smaller samples in subgroup analyses tend to lack statistical power and yield wide confidence intervals.

Our analysis has additional limitations. Some outcomes were evaluated in only one trial. Even pooling across six trials, impacts on birth outcomes are clouded by modest statistical power. (Ongoing trials should elucidate these effects.) Recent changes in safety net program rules, smoking rates, and teen birth rates reduce our confidence that related trial outcomes are replicable. Impact estimates also are less certain for outcomes like child maltreatment and medically treated injuries, where nurse presence can increase reporting or change treatment decisions. Estimated Medicaid savings largely are computed, not observed. Although effectiveness is likely to decline from trials to operational programs, the degree of decline is unclear, so our adjustment unavoidably is somewhat arbitrary. Finally, our national estimates implicitly assume that operational program and trial populations are similar. Supporting that assumption, the birth-proximal outcomes generally are replicated or exceeded in the trials not conducted by the developers, in the national data system, and in many of the methodologically weaker evaluations of program effectiveness in operational programs.

NFP clearly achieved most of its goals. It enriched the lives of participating low-income mothers and their offspring. It will also benefit society more broadly by reducing crime and safety net demand. The $3.0 billion in expected TANF, food stamp, and Medicaid spending reductions (95 % CI, $2.0–$4.1 billion) far exceed the program’s estimated $1.55 billion cost.

Federal policy has embraced home visiting programs. Our findings affirm that home visiting using the NFP program model makes major differences in the lives of low-income families. It reduces intimate partner violence, child maltreatment, youth crime, and youth substance abuse; increases families’ economic independence; and saves both money and lives. Expanding MIECHV and other public funding for NFP thus seems a wise investment. Nevertheless, the high cost per family requires a substantial up-front investment.