Significance

Malnutrition could be eliminated through evidence-based interventions identified by rigorous evaluations. We recently assessed the effect of a multidisciplinary program in a frozen conflict zone and reported higher odds of stunting and lower odds of anemia among children receiving the program than among children in the control group. However, the conditional models used in that analysis estimate the effect of interventions at the subgroup level. In community interventions, the causal effects of interventions in specific populations are of interest and can be estimated by marginal models. We compared the findings from marginal models with those from conditional models and discussed their public health relevance.

Introduction

Child malnutrition remains an unresolved issue, particularly in lower and middle-income countries (de Onis et al., 2013). Although the global prevalence of stunting, an indicator of poor linear growth in children, has generally decreased over the past twenty years, this trend is due to disproportionate progress in infant and child feeding and maternal and prenatal health in high-income countries (de Onis et al., 2013). The global prevalence of anemia—also considered an indicator of malnutrition and poor health—has declined slowly, by only 12% between 1995 and 2011 (World Health Organization, 2018). This grim situation warrants massive efforts to reduce malnutrition. With current trends in reduction of prevalence of stunting (de Onis et al., 2013), it is unlikely that goals set by the World Health Assembly will be met (McGuire, 2015). Elimination of malnutrition—one of the sustainable development goals—could be achieved by informed and evidence-based interventions (United Nations Development Programme, 2019).

Numerous interventions have sought to reduce the prevalence of stunting, though not all have been successful. In a recent cluster-randomized trial conducted in rural Zimbabwe, infant and young child feeding combined with a water, sanitation, and hygiene intervention was not superior to infant and young child feeding alone in reducing stunting (Humphrey et al., 2019). In a systematic review of the literature on intervention programs to reduce stunting in lower and middle-income countries, several approaches exhibited benefits in a number of settings, including education and counseling for nutrition, vitamin supplementation, immunization, water, sanitation, and hygiene interventions, food safety, security, and social safety net interventions (Hossain et al., 2017).

Identifying interventions to reduce the prevalence of stunting is challenging and requires robust and rigorous tools for evaluation study design and analysis. Among the studies assessing the effect of interventions on childhood stunting included in the recent systematic review (Hossain et al., 2017), only one intervention was assessed using a randomized trial design (Arifeen et al., 2009). In randomized trials, the population to which the effect estimates apply is implicit in the design; in observational studies, both the interpretation of the observed effect estimates and the populations to which they can be generalized are debatable. Observational evaluation studies in the field of maternal and child health have often used parametric methods such as linear or logistic regression (conditioning on potential confounders) to assess intervention effects on outcomes such as stunting (Fenn et al., 2012; Haddad et al., 2014; Lima et al., 2010). In the absence of a thorough discussion of the estimated effects, it can be challenging to identify the populations or sub-populations that will benefit from public health interventions.

We recently assessed the effect of a multidisciplinary program implemented in the restive Tavush province of Armenia (Balalian et al., 2017; Simonyan et al., 2019). The border regions of Tavush province have been affected by a chronic frozen conflict with neighboring Azerbaijan for more than 20 years (A. N. I. Armenian studies Research Center, 2015; The Economist, 2016). In 2013, the Fund for Armenian Relief of America (FAR) initiated a multidisciplinary community development program to break the cycle of poverty (BCP). The details of this program have been previously described (Simonyan et al., 2019). In brief, BCP included food supplementation and healthcare capacity building at the family, kindergarten, and community levels, provision of medical equipment and supplies, and other activities aimed at improving women’s and children’s health (Supplementary Table 1). The communities were selected based on their proximity to the border and available resources. In 2016, BCP was expanded to include nine new communities (Simonyan et al., 2019).

We reported that the odds of stunting were significantly higher (OR 1.92; 95%CI 1.13–3.26) and the odds of anemia significantly lower (OR 0.24; 95%CI 0.16–0.36) among children who lived in communities included in the BCP program compared with children living in communities not included in the program (Simonyan et al., 2019). However, the OR estimates were obtained by logistic regression models after conditioning on potential confounding variables. Conditional models estimate the effect of an intervention at the subgroup level, whereas in community interventions the effect of the intervention in specific populations (e.g., the treated population) is of interest. Marginal models allow the estimation of the average treatment effect in the entire population (ATE), the average treatment effect among the treated (exposed) (ATT), and the average treatment effect in the overlap population, estimated using overlap weights (ATO). These effects are calculated using methods based on propensity scores; they offer improved interpretability and have significant policy and public health implications (Li et al., 2018; Pang et al., 2016).

The current study aimed to examine the causal effect of the BCP intervention program on anemia and stunting at the population level by applying and comparing three advanced marginal epidemiological methods: propensity score matching, inverse probability weighting (IPTW), and overlap weighting (OW). We further sought to demonstrate the differences in the effect estimates offered by each of these methods in comparison with conditional models, their interpretation, and their relevance to the evaluation of public health interventions.

Methods

Study Setting and Sampling

The current study is a secondary analysis of data collected in 2016 to evaluate the FAR BCP program. The study settings and sampling methods have been previously described (Balalian et al., 2017; Simonyan et al., 2019). Briefly, in 2016 we sampled 983 children living in five intervention (BCP program) communities (n = 347) and eight control communities (n = 636). We estimated the number of children required within each age group to detect a 5% difference in the population prevalence of stunting and anemia between exposed and unexposed groups using the proc power procedure in SAS (with α = 0.05 and power = 0.80).

Measurements and Definition of Variables

The primary and secondary outcomes of the study were stunting and anemia, respectively. Each child’s weight and height (recumbent length for children younger than 24 months) were measured by physicians at each study site using electronic scales and measuring boards or stadiometers. Stunting was defined as a height more than two standard deviations below the median height-for-age of the WHO growth standards (height-for-age z-score < −2 SD) (World Health Organization, 1995). Children’s hemoglobin (HgB) levels were measured using the HemoCue® HB 301, a device designed for quick analysis of capillary blood HgB (Morris et al., 2007; US Food & Drug Administration, 2019). A child was considered anemic if the blood hemoglobin level was less than 110 g/L. The socioeconomic and demographic characteristics of the children’s families and the children’s diet in the past 24 h were recorded using a self-administered survey and food frequency questionnaire completed by the children’s caregivers (Simonyan et al., 2019).
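The two outcome definitions above can be illustrated with a minimal sketch. It assumes that a height-for-age z-score against the WHO growth standards has already been computed for each child (the z-score computation itself is not shown), and the function names and example records are hypothetical.

```python
def is_stunted(height_for_age_z: float) -> bool:
    """Stunting: height-for-age z-score below -2 SD of the WHO reference median."""
    return height_for_age_z < -2.0


def is_anemic(hemoglobin_g_per_l: float) -> bool:
    """Anemia: blood hemoglobin below 110 g/L."""
    return hemoglobin_g_per_l < 110.0


# Hypothetical records for illustration only (not study data).
children = [
    {"id": 1, "haz": -2.4, "hgb": 118.0},
    {"id": 2, "haz": -0.7, "hgb": 104.0},
]

for child in children:
    child["stunted"] = is_stunted(child["haz"])
    child["anemic"] = is_anemic(child["hgb"])
    print(child["id"], child["stunted"], child["anemic"])
```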

Children were considered “exposed” to the BCP program if they were living in one of the five communities where the program was implemented during 2013–2016.

Ethical Considerations

The study protocol was approved by the Institutional Review Board at the Center of Medical Genetics and Primary Health Care, Yerevan, Armenia. Caregivers gave oral consent to participate in the study.

Statistical Methods

Missing Data

We used multiple imputation with the fully conditional specification procedure (Van Buuren et al., 2006), creating 15 imputed datasets with 200 iterations between imputations, to impute missing data for the following covariates: (a) length at birth (n = 72); (b) whether the child had ever had diarrhea (n = 21); (c) father’s employment (n = 15); (d) mother’s education (n = 12); (e) mother’s employment (n = 5); and (f) child’s body mass index (n = 3). Data analyses were performed using SAS 9.4 and R (v. 3.6.0).

We estimated the causal effects of the BCP program on stunting and anemia by matching the observations using propensity scores and weighting the observations using the inverse of the propensity scores and by overlap weights.

Propensity Score Calculation

We calculated the propensity score for each child by fitting a logistic regression model in which exposure to the BCP program was set as the outcome. The model included: kindergarten attendance, child age in months, weight of the child at birth, length of the child at birth, minimum dietary diversity, any history of previous diagnosis of intestinal parasite infection, any history of diarrhea, mother’s and father’s height, mother’s education, mother’s employment status, father’s employment status, monthly household expenditure, presence of sewage and water system in residence, access to electricity and natural gas at home, having participated in community education events, and having received any printed materials related to maternal and child health and safety, all of which were used as covariates in the multivariable models of the original study (Simonyan et al., 2019). Since 15 imputed datasets were created, the mean probability of being exposed across the 15 imputed datasets was used as the propensity score for each observation in the matched, IPTW, and OW analyses, as suggested by Mitra and Reiter (2012).
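The pooling step described above, in which each child's propensity score is the mean of the predicted probabilities across imputed datasets, can be sketched as follows. This is an illustration under stated assumptions: the per-imputation probabilities are taken as already estimated by a logistic regression fitted separately to each imputed dataset, and the function name and example values are hypothetical.

```python
from statistics import mean


def pooled_propensity(scores_by_imputation):
    """Average each child's predicted exposure probability across imputations.

    scores_by_imputation: one list per imputed dataset, each holding the
    predicted probability of exposure for every child, in the same child order.
    """
    n_children = len(scores_by_imputation[0])
    if any(len(scores) != n_children for scores in scores_by_imputation):
        raise ValueError("every imputation must score the same children")
    # zip(*...) regroups the values so that each tuple belongs to one child.
    return [mean(child_scores) for child_scores in zip(*scores_by_imputation)]


# Hypothetical probabilities for three children from three imputed datasets
# (the study used 15 imputed datasets).
scores = [
    [0.20, 0.55, 0.80],
    [0.24, 0.50, 0.78],
    [0.22, 0.60, 0.82],
]
print(pooled_propensity(scores))
```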

Propensity Score Matching (PSM)

Children who lived in communities that participated in the BCP program were matched with children living in the unexposed communities. Matching was performed by the propensity to reside in areas that engaged in the BCP program.

We matched the exposed and unexposed children (ratio of 1:1) using the greedy matching method. The caliper was set to 0.2 standard deviations of the logit of the propensity score (Austin, 2011; Rosenbaum & Rubin, 1985). Finally, we compared the matched exposed and unexposed groups and calculated the ATT of stunting and anemia and corresponding 95% confidence intervals (95%CI) by fitting negative-binomial models using the proc genmod procedure in SAS 9.4.
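A minimal sketch of 1:1 greedy nearest-neighbor matching without replacement, using a caliper of 0.2 standard deviations of the logit of the propensity score, is given below. It is not the SAS routine used in the analysis. Two details are assumptions made for illustration: the standard deviation is taken over the logit scores of the whole sample, and exposed children are processed in the order given.

```python
import math
from statistics import stdev


def logit(p):
    return math.log(p / (1.0 - p))


def greedy_match(ps_exposed, ps_unexposed, caliper_sd=0.2):
    """Return (exposed_index, unexposed_index) pairs matched 1:1 without replacement."""
    logit_e = [logit(p) for p in ps_exposed]
    logit_u = [logit(p) for p in ps_unexposed]
    # Assumption: caliper width uses the SD of the logit over the full sample.
    caliper = caliper_sd * stdev(logit_e + logit_u)

    available = list(range(len(logit_u)))  # unexposed children not yet matched
    pairs = []
    for i, value in enumerate(logit_e):
        if not available:
            break
        # Greedy step: take the closest remaining unexposed child.
        j = min(available, key=lambda k: abs(logit_u[k] - value))
        if abs(logit_u[j] - value) <= caliper:
            pairs.append((i, j))
            available.remove(j)
        # Otherwise the exposed child stays unmatched and is excluded.
    return pairs


# Hypothetical propensity scores for illustration only.
print(greedy_match([0.30, 0.80], [0.31, 0.10, 0.50]))
```

In this example the second exposed child has no unexposed counterpart within the caliper and is dropped, which is how a matched sample can end up smaller than the exposed group.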

Inverse Probability Weighting (IPTW)

We calculated the ATT by the stabilized IPTW (IPTW-ATT) using the proc genmod procedure in SAS 9.4 (Hernan & Robins, 2010). Inverse probability weighting was used to standardize the sample in both the exposed and unexposed groups. The inverse probability weights were calculated based on the propensity for exposure to the BCP program. Each observation was assigned a weight such that the contributions of under- and over-represented observations in the unexposed group were leveled with the distribution of covariates in the exposed group. “Exposed” observations were given a weight of P(Exposed)/(propensity score to be included in the FAR program), and “unexposed” observations were assigned a weight of P(Unexposed)/(1 − propensity score to be included in the FAR program) (Williamson et al., 2012). We further stabilized the weights to preserve the original sample size (Hernan & Robins, 2010). IPTW ensures that the distribution of measured covariates is independent of exposure assignment (Austin, 2011; Rosenbaum & Rubin, 1985).
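The weight formulas stated above can be written out as a short sketch. The first function encodes the weights exactly as given in the text, with P(Exposed) taken as the sample proportion exposed (an assumption). The second function is included only for comparison: it is a commonly used textbook form of weights targeting the ATT and is not taken from the source. Function names and example values are hypothetical.

```python
def stabilized_iptw(exposed, propensity):
    """Weights as written in the text.

    Exposed:   P(Exposed)   / propensity score
    Unexposed: P(Unexposed) / (1 - propensity score)
    """
    p_exposed = sum(exposed) / len(exposed)  # assumption: sample proportion
    weights = []
    for is_exposed, ps in zip(exposed, propensity):
        if is_exposed:
            weights.append(p_exposed / ps)
        else:
            weights.append((1.0 - p_exposed) / (1.0 - ps))
    return weights


def att_weights(exposed, propensity):
    """Common textbook ATT weights, for comparison only (not from the source).

    Exposed children keep weight 1; unexposed children get ps / (1 - ps).
    """
    return [1.0 if e else ps / (1.0 - ps) for e, ps in zip(exposed, propensity)]


# Hypothetical data: 1 = exposed, 0 = unexposed.
exposed = [1, 0, 0, 1]
propensity = [0.50, 0.50, 0.25, 0.80]
print(stabilized_iptw(exposed, propensity))
print(att_weights(exposed, propensity))
```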

Overlap Weights (OW)

Overlap weighting weights the study population to reduce the contribution of outlying observations that have the highest or lowest propensity for receiving the intervention. Overlap weights increase the contribution of observations with a similar propensity to receive or not receive the intervention by assigning higher weights to them. This weighting method does not inflate the sample size and ensures balance in the measured baseline characteristics between the groups, mimicking the conditions achieved by randomized controlled trials (Li et al., 2018). Observations exposed to the FAR BCP intervention are weighted by the propensity of not being exposed to the intervention (1 − propensity score to be included in the FAR program), while unexposed observations are weighted by the propensity of receiving the intervention (propensity score to be included in the FAR program). The ATO corresponds to the effect of the BCP intervention among the participants who share the most overlap in characteristics and are therefore assigned higher weights. The ATO was calculated (Li et al., 2018) using the proc genmod procedure in SAS 9.4.
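The overlap weights described above follow directly from the propensity score. The sketch below is illustrative, with a hypothetical function name and hypothetical example values.

```python
def overlap_weights(exposed, propensity):
    """Exposed children get 1 - ps; unexposed children get ps."""
    return [(1.0 - ps) if e else ps for e, ps in zip(exposed, propensity)]


# Hypothetical data: 1 = exposed, 0 = unexposed.
exposed = [1, 0, 1, 0]
propensity = [0.90, 0.10, 0.50, 0.50]
print(overlap_weights(exposed, propensity))
```

The exposed child with a propensity of 0.90 and the unexposed child with a propensity of 0.10 each receive a weight of about 0.1, while the two children with a propensity of 0.50 receive 0.5. Observations near the extremes are down-weighted rather than excluded, and every weight stays between 0 and 1.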

Results

The sociodemographic and dietary characteristics of study participants are presented in Table 1. In 2016, 1300 children were invited to participate in the evaluation study. The response rate was 85% (Supplementary Fig. 1). The final sample included 347 children from exposed and 636 from unexposed communities. The propensity for BCP intervention ranged from 0.06 to 0.87 (Table 1). We matched 308 children exposed to the FAR program to 308 unexposed children based on the propensity scores. A total of 367 observations were excluded from the propensity score matching analysis by the matching process. Most of the excluded unexposed observations had higher propensities to participate in the FAR program. In general, propensity score matching achieved its goal of obtaining a balanced propensity of exposure among those exposed and unexposed to the FAR program (Fig. 1). The average treatment effects among the children exposed to the program calculated by propensity score matching (PSM-ATT) were similar to the odds ratio estimates calculated in the conditional models by Simonyan et al. (2019): 1.93 (95%CI 1.15–3.28) for stunting and 0.28 (95%CI 0.19–0.42) for anemia (Fig. 2; Supplementary Table 2).

Table 1 Comparison of study participants in communities receiving and not receiving the intervention (children 6 months to 6 years old)
Fig. 1 Mean propensity among exposed and unexposed groups. A (left) before matching (n = 983) and B (right) after 1:1 matching (n = 616). Blue bars indicate participants exposed to the program and red bars indicate participants not exposed to the program. The number of participants is on the x axis and the propensity score for exposure to the intervention is on the y axis

Fig. 2 Effect estimates from propensity score matched, IPTW, and overlap weighting analyses compared with the results of Simonyan et al. (2019). OR odds ratio, RR risk ratio, CI confidence interval, ATT average treatment effect among exposed, IPTW inverse probability weighting, ATO average treatment effect in the overlap population using overlap weights

The ATT estimate for stunting calculated by IPTW was lower (IPTW-ATT 1.82; 95%CI 1.16–2.86) than the estimate calculated by logistic regression models conditional on confounders (OR 1.92; 95%CI 1.13–3.26). Nevertheless, the ATT estimate for anemia (IPTW-ATT 0.33; 95%CI 0.23–0.46) was similar to that calculated by propensity score matching. The effect estimates calculated by overlap weights were reduced slightly for stunting (ATO 1.75; 95%CI 1.14–2.68) and for anemia (ATO 0.31; 95%CI 0.22–0.43) compared with IPTW-ATT and PSM-ATT (Fig. 2; Supplementary Table 2).

Discussion

The current study found a slightly attenuated association between participation in the FAR program and stunting in a propensity score matching analysis, compared with the effect estimate from the models constructed by Simonyan et al. (2019). The ATT obtained from the IPTW analysis and the ATO calculated by overlap weights also showed a slightly lower risk of stunting following participation in the intervention program. Despite the identical covariate adjustments in the conditional and marginal models assessing the risk of stunting, there were notable differences between the estimates obtained from these models. However, the estimates of the effect of exposure to the BCP program on the risk of anemia from the conditional and marginal models were very similar.

We posit several reasons for the observed findings of our study. First, it is implausible that the BCP intervention caused stunting; rather, we believe the observed estimated effects of the intervention could reflect the lingering effects of pre-existing (unobserved) community-specific stressors that led these communities to be selected as target sites for the FAR intervention. Second, as Simonyan et al. (2019) described, the higher risk of stunting in the intervention communities could be due to surveillance bias among physicians trained by the FAR BCP program, or to a higher proportion of stunting among children in the intervention communities in 2013 compared with the control communities that could not be alleviated by a three-year program. Due to the lack of baseline measures in the control communities, we are unable to determine whether the intervention and control communities were similar at baseline in terms of outcomes and covariates.

We observed slight differences in effect estimates across the marginal models and in comparison with the conditional models. The characteristics of the underlying models may explain these observed differences. Conditional methods estimate a weighted average of the stratum-specific effects of the intervention on health outcomes; they do not necessarily estimate the ATE or the ATT (Pang et al., 2016). Both IPTW and propensity score matching can be used to estimate the ATT. The propensity-score matched sample is a subset of the study population in which the covariates are balanced by matching the exposed and unexposed observations on the propensity score (Austin, 2011; Hernan & Robins, 2010). Propensity score matching also reduces the sample size because unmatched observations are excluded. Since the caliper matching method applies stringent criteria to match exposed with unexposed participants, it did not find an unexposed match for every exposed child. Hence, the analytical sample did not coincide with the initially intended population. Thus, the ATT from propensity score matching could be interpreted as the effect estimate among the exposed who were matched to an unexposed subpopulation, and it might vary across studies (Lunt et al., 2009).

In contrast, IPTW achieves covariate balance by creating a pseudo-population in which each individual is weighted by the inverse of the conditional probability of receiving the intervention actually observed. IPTW can be used to calculate either the ATT or the ATE, depending on the weights used to create the pseudo-population (Austin, 2011). Nonetheless, extreme probabilities of receiving or not receiving the intervention can inflate the variance and reduce the balance between intervention groups.

Overlap weights attempt to address some of these challenges by assigning higher weights to the observations with a similar propensity of receiving or not receiving the intervention (observations with propensity scores closer to 50%) to achieve exact balance in baseline characteristics (Li et al., 2018). The average treatment effect in the overlap population could be interpreted as the intervention effect estimate among the populations with the most overlap in background characteristics between the exposed and unexposed groups. In our study, the ATO refers to the children exposed and unexposed to the BCP program who had the most overlap in their background characteristics.

The non-collapsibility of odds ratios between conditional and marginal models (Austin, 2011; Greenland, 1987; Pang et al., 2016) in observational studies may be another source of the difference we found between our conventional analysis and the marginal models. An effect measure is considered collapsible if the conditional and marginal estimates coincide. For example, in randomized clinical trials, the conditional and marginal estimates of risk differences and of differences in means coincide. Odds ratios and hazard ratios, however, do not share this property, even in the absence of confounding. Therefore, the conditional and marginal models might not result in the same effect estimates (Austin, 2011; Greenland & Morgenstern, 2001). In the absence of confounding, both conditional and marginal models are valid and can be interpreted based on the question the researcher seeks to answer (Pang et al., 2016).
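A small worked example may make non-collapsibility concrete. The risks below are hypothetical and chosen only for illustration: two equally sized strata, with exposure unrelated to stratum membership, so there is no confounding by stratum.

```python
def odds(p):
    return p / (1.0 - p)


def odds_ratio(risk_exposed, risk_unexposed):
    return odds(risk_exposed) / odds(risk_unexposed)


# Hypothetical stratum-specific risks as (exposed, unexposed).
strata = [(0.5, 0.2), (0.8, 0.5)]

# Conditional (stratum-specific) effect measures.
conditional_or = [odds_ratio(e, u) for e, u in strata]
conditional_rd = [e - u for e, u in strata]

# Marginal risks: simple averages, because the strata are equally sized
# and exposure is distributed identically within each stratum.
marginal_exposed = sum(e for e, _ in strata) / len(strata)
marginal_unexposed = sum(u for _, u in strata) / len(strata)
marginal_or = odds_ratio(marginal_exposed, marginal_unexposed)
marginal_rd = marginal_exposed - marginal_unexposed

print("conditional OR:", conditional_or)
print("marginal OR:", round(marginal_or, 3))
print("conditional RD:", conditional_rd)
print("marginal RD:", round(marginal_rd, 3))
```

Both stratum-specific odds ratios equal 4, yet the marginal odds ratio is about 3.45. The risk difference is 0.3 within each stratum and 0.3 marginally, so it collapses while the odds ratio does not.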

We note that marginal models provide clarity over conditional models by offering effect estimates comparable in well-defined (sub)populations of interest. However, they do not eliminate threats to internal validity. Calculation of propensity scores depends on accurate measurement of relevant covariates, thorough sampling strategies, and selection of source and target populations to reduce the possibility of bias due to unmeasured confounding, differential measurement errors, and selection bias.

Marginal models based on propensity scores are alternative epidemiological methods that can be applied to estimate the causal effects of interventions on specific, well-defined (sub)populations that can benefit from those interventions. These methods may be used in similar quasi-experimental settings, provided that sufficient information is available to calculate the probability of participating in the intervention program, and they have significant public health and policy implications.