Introduction

Compared to conventional epidemiologic risk factor research, which estimates probabilistic average associations at a single level, multilevel modeling accommodates complexity in contextual heterogeneity [1,2,3,4]. Hence, multilevel modeling has become increasingly common in the health literature that aims to promote population health and health equity. To date, multilevel studies have been restricted to modeling variation at the higher level(s) only (i.e., between populations), while assuming a single constant variance at the individual level (i.e., within populations). The differential variation in health between individuals within a population remains unstudied [5, 6].

This assumption of homogeneous individual-level variance is unrealistic for many physiological, behavioral, and social outcomes [7]. People of different age, gender, race/ethnicity, and socioeconomic status may be healthier or sicker on average, and may also differ in how variable their health status is. An alternative approach proposed by Goldstein recognizes and explicitly models such “complex level-1 variation” as a function of specified predictor variable(s) [8]. While the technical advantages of this methodology have been discussed previously [7, 8], its substantive importance has been underappreciated and it has not been applied in the health literature. Identifying groups with particularly large inequality in health experiences and understanding factors that systematically contribute to differential variation have important implications for individual and population health [5, 9].

Increasing within-group inequality in body mass index (BMI) has been consistently documented in different populations [10,11,12]. This paper aims to explicitly model individual heterogeneity in women’s BMI using global data pooled from 57 low- and middle-income countries (LMICs). In this novel application, we specifically investigate the following three questions. First, we explore whether women of different sociodemographic characteristics are more or less variable in terms of their BMI measures. For instance, there are physiological reasons to anticipate that older women, on average, have higher BMI than younger women, but it is also important to quantify whether variability in BMI also systematically depends on age. Second, we attempt to model the full complexity of inter-individual variation in BMI as a function of a combination of sociodemographic variables. Lastly, to better understand the stochastic and systematic components of individual variation in BMI, we examine the proportion of variance explained by basic socioeconomic characteristics, each individually and collectively, for different age groups.

Methods

Data source and sampling plan

The data for this study were extracted from the latest cross-sectional Demographic and Health Surveys (DHS) conducted in 57 LMICs between 2005 and 2014. DHS includes standardized and representative sampling of participants, objective measurement of anthropometric measures, and high response rates [13]. Given the standardized data collection procedures across countries and consistent content over time, DHS allows comparability across populations cross-sectionally and over time [13]. DHS collected individual observations following a probability-based cluster sampling procedure, which was then adapted to specific contexts within each country. Sampling frames were first developed on the basis of non-overlapping units of geography (identified as the primary sampling units (PSUs)) that cover the entire country, and a fixed proportion of households was then selected using systematic sampling within each PSU [14].

Study population and sample size

Our analytic study population was pooled from DHS rounds V, VI, or VII, whichever was the latest round for each of the 57 countries. Of the 912,444 women, three groups did not meet the eligibility criteria for the analysis: 4,801 women younger than 15 or older than 49 years, 68,571 women who were pregnant at the time of the survey, and 175,380 women who were not measured for height and/or weight under the study protocol. A further 19,891 women were excluded for missing anthropometric measures and 322 for having biologically implausible height (< 100 or > 200 cm) and/or weight (< 20 or > 200 kg) measures. Lastly, 164 women were excluded for missing information on education level and marital status, leaving 643,315 women in the final analytic sample (Fig. 1).
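The sample flow above can be checked with a few lines of arithmetic. The sketch below uses only the counts reported in this section; the variable names and the short exclusion labels are illustrative, not taken from the DHS files.

```python
# Sample-flow arithmetic using the counts reported in the text.
# Labels are shorthand for the exclusion reasons described above.
initial = 912_444
exclusions = [
    ("age outside 15-49 years", 4_801),
    ("pregnant at time of survey", 68_571),
    ("not measured for height/weight per protocol", 175_380),
    ("missing anthropometric measures", 19_891),
    ("biologically implausible height or weight", 322),
    ("missing education or marital status", 164),
]

remaining = initial
for reason, n_excluded in exclusions:
    remaining -= n_excluded
    print(f"excluded {n_excluded:>7,} ({reason}); remaining {remaining:,}")

final_sample = remaining
```

The running total ends at the 643,315 women reported as the final analytic sample.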

Fig. 1

Final analytic sample from 57 low- and middle-income countries (Demographic and Health Surveys, 2005–2014). *Albania (2008–2009), Armenia (2005), Azerbaijan (2006), Bangladesh (2014), Benin (2011–2012), Bolivia (2008), Burkina Faso (2010), Burundi (2010), Cambodia (2014), Cameroon (2011), Chad (2014–2015), Colombia (2010), Comoros (2012), Congo (Brazzaville) (2011–2012), Congo (DRC) (2013–2014), Cote d’Ivoire (2011–2012), Dominican Republic (2013), Egypt (2014), Ethiopia (2011), Gabon (2012), Gambia (2013), Ghana (2014), Guinea (2012), Guyana (2009), Haiti (2012), Honduras (2011–2012), India (2005–2006), Jordan (2012), Kenya (2014), Kyrgyz Republic (2012), Lesotho (2014), Liberia (2013), Madagascar (2008–2009), Malawi (2010), Maldives (2009), Mali (2012–2013), Moldova (2005), Mozambique (2011), Namibia (2013), Nepal (2011), Niger (2012), Nigeria (2013), Pakistan (2012–2013), Peru (2012), Rwanda (2014), Sao Tome and Principe (2008–2009), Senegal (2010–2011), Sierra Leone (2013), Swaziland (2006–2007), Tajikistan (2012), Tanzania (2010), Timor-Leste (2009–2010), Togo (2013–2014), Uganda (2011), Yemen (2013), Zambia (2013–2014), Zimbabwe (2010–2011)

Outcome

BMI (kg/m²) was calculated as weight (kg) divided by the square of height (m²). Trained investigators weighed each woman using a solar-powered scale with an accuracy of 100 g and measured height using an adjustable board calibrated in millimeters.
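As a minimal sketch of the outcome definition, the functions below compute BMI from weight in kilograms and height in centimeters and apply the plausibility bounds given in the sample description (height 100–200 cm, weight 20–200 kg). The function names and the choice of centimeters as the input unit are assumptions made for illustration.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2: weight divided by height (in meters) squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def is_plausible(weight_kg: float, height_cm: float) -> bool:
    """Plausibility bounds quoted in the text: 100-200 cm and 20-200 kg."""
    return 100 <= height_cm <= 200 and 20 <= weight_kg <= 200


# Hypothetical example values, not study data.
example_bmi = bmi(60.0, 160.0)
print(f"BMI = {example_bmi:.1f} kg/m^2, plausible = {is_plausible(60.0, 160.0)}")
```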

Explanatory variables

The following five sociodemographic variables were considered in the analysis: age, place of residence, household wealth, education, and marital status. Women’s age was categorized into seven groups of 15–19, 20–24, 25–29, 30–34, 35–39, 40–44, and 45–49 years. A binary variable for place of residence was defined as census-based urban versus rural. In DHS, household wealth was captured through a composite index of relative standard of living derived from country-specific indicators of asset ownership, housing characteristics, and water and sanitation facilities, which was then divided into quintiles for each country [15]. Women’s education was coded in four categories indicating no schooling, primary schooling, secondary schooling, and higher schooling. Finally, a variable with three categories of never married, married/living together, and divorced/separated/widowed was used for women’s marital status.
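The seven age bands described above can be derived from age in completed years with simple integer arithmetic. This is an illustrative sketch; the function name `age_group` and the string labels are assumptions, not the variable coding used in the DHS recode files.

```python
def age_group(age_years: int) -> str:
    """Map an age of 15-49 years to one of the seven 5-year bands."""
    if not 15 <= age_years <= 49:
        raise ValueError("age outside the eligible range of 15-49 years")
    lower = 15 + 5 * ((age_years - 15) // 5)  # lower bound of the band
    return f"{lower}-{lower + 4}"


# All bands produced over the eligible age range, in ascending order.
bands = sorted({age_group(a) for a in range(15, 50)})
print(bands)
```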

Statistical analysis

The individual files from the 57 countries were combined to create a pooled dataset. First, a linear regression model for BMI, adjusting for all the pre-specified covariates (women’s age, place of residence, marital status, education, and wealth), was constructed to serve as the base model for comparison with subsequent models (Model 1). In this ordinary least squares model, which assumes homoscedasticity, the level-1 (between-individual) residuals are assumed to share a single variance: \(e_{0} \sim{\text{i}}.{\text{i}}.{\text{d}}. N(0,\sigma_{e0}^{2} )\). Next, we applied Goldstein’s method [8] to model BMI variance as a function of age (Model 2a), type of residence (Model 2b), wealth quintiles (Model 2c), education (Model 2d), and marital status (Model 2e). In these models, the level-1 residuals are no longer summarized with one variance; instead, a variance-covariance matrix is estimated. All the calculations used to derive the variances for Models 2a to 2e are described in detail in the Technical Appendix. To check for consistency in the patterning of BMI variation by wealth and by education, we also conducted country-specific analyses for Models 2c and 2d. To explore potential interaction between age and socioeconomic factors, we first included fixed effects for the interaction between age and education (Model 3a). Then, we further modeled BMI variance as a function of all possible cross-classifications of age groups and education levels (Model 3b). This procedure was repeated for the interaction between age and wealth quintiles (Models 4a and 4b; more details are presented in the Technical Appendix). Lastly, we performed an analysis stratified by age group to test whether the amount of BMI variance explained differed across these subgroups after adjusting for each of the socioeconomic predictors individually and for all of them collectively.
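To illustrate what modeling level-1 variance as a function of a predictor means, consider the binary place-of-residence predictor. The form below is the standard textbook parameterization of complex level-1 variation for a 0/1 predictor and is offered only as a sketch; the exact specification used here is the one in the Technical Appendix. The symbols \(x_{ij}\), \(\mathbf{z}_{ij}\), \(u_{j}\), \(\sigma_{e01}\), and \(\sigma_{e1}^{2}\) are notation assumed for this illustration.

```latex
% i indexes women, j indexes PSUs.
% x_{ij} = 1 for urban, 0 for rural (assumed coding); z_{ij} holds the fixed-part covariates.
\begin{aligned}
y_{ij} &= \mathbf{z}_{ij}^{\top}\boldsymbol{\beta} + u_{j} + e_{0ij} + e_{1ij}\,x_{ij},\\
\operatorname{var}\!\left(e_{0ij} + e_{1ij}\,x_{ij}\right)
  &= \sigma_{e0}^{2} + 2\,\sigma_{e01}\,x_{ij} + \sigma_{e1}^{2}\,x_{ij}^{2}.
\end{aligned}
```

Because \(x_{ij}^{2} = x_{ij}\) for a 0/1 predictor, \(\sigma_{e01}\) and \(\sigma_{e1}^{2}\) are not separately identified, so one of them is conventionally constrained to zero. With \(\sigma_{e1}^{2} = 0\), the rural variance is \(\sigma_{e0}^{2}\) and the urban variance is \(\sigma_{e0}^{2} + 2\sigma_{e01}\). Setting both \(\sigma_{e01}\) and \(\sigma_{e1}^{2}\) to zero recovers the homoscedastic Model 1.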
The proportion of variance explained by covariate adjustment was computed by subtracting the variance of the model with more terms from the variance of the initial model and expressing the difference as a percentage of the initial variance. AIC values from the different models were compared to assess model fit [16]. All analyses were adjusted for country fixed effects and PSU random effects. Multilevel modeling was performed using MLwiN 2.32.
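The two quantities used throughout the analysis, group-specific residual variance and percentage of variance explained, can be sketched in plain Python. This is a deliberately simplified stand-in: it estimates each group's variance around its own mean on simulated data, whereas the models in this paper were fitted in MLwiN with covariates, country fixed effects, and PSU random effects. The data, group labels, and function names are hypothetical, and the percentage is assumed to be taken relative to the initial model's variance.

```python
import random
import statistics


def groupwise_variance(values, groups):
    """Variance of values around their own group mean, per group."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    return {g: statistics.pvariance(v) for g, v in by_group.items()}


def percent_variance_explained(var_initial, var_adjusted):
    """Reduction in variance as a percentage of the initial variance."""
    return 100.0 * (var_initial - var_adjusted) / var_initial


# Simulated (not study) data: two groups with different means AND spreads.
random.seed(0)
groups = ["rural"] * 5000 + ["urban"] * 5000
values = ([random.gauss(22.0, 3.7) for _ in range(5000)]
          + [random.gauss(24.0, 4.5) for _ in range(5000)])

variances = groupwise_variance(values, groups)
pooled = statistics.pvariance(values)
for group, variance in variances.items():
    print(f"{group}: variance = {variance:.1f}")
print(f"single pooled variance = {pooled:.1f}")
```

A single pooled variance hides the fact that one simulated group is noticeably more dispersed than the other, which is the pattern a heteroscedastic level-1 specification is meant to expose.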

Results

Of the 643,315 women included in the final analytic sample, the average BMI was 23.2 kg/m² with a standard deviation (SD) of 5.1 kg/m². Both the mean and the dispersion of BMI varied by sociodemographic predictors, such that older women had a higher mean and SD than younger women (mean: 25.5 kg/m², SD: 5.9 kg/m² vs. mean: 20.9 kg/m², SD: 3.4 kg/m²). The SD in BMI was also larger for women with higher education and in wealthier quintiles (Table 1). In the multivariable linear regression model, the predictor variables were on average associated with BMI in the expected direction (Supplementary Table 1).

Table 1 Mean BMI and standard deviation (SD) by individual predictors

Assuming \(e_{0} \sim{\text{i}}.{\text{i}}.{\text{d}}. N(0,\sigma_{e0}^{2} )\) in Model 1, the variance in BMI at the individual level was 16.7 (95% CI: 16.6, 16.7) (Table 2). However, results from Models 2a to 2e indicate that the individual-level residual variance in BMI depended on each of the sociodemographic variables. The BMI variance within the oldest age group was approximately 2.4 times as large as the variance within the youngest group: 23.2 (95% CI: 22.9, 23.5) for women aged 45–49 years versus 9.8 (95% CI: 9.8, 9.9) for women aged 15–19 years. Women were much more variable in terms of BMI in urban areas (variance estimate (VE): 20.2; 95% CI: 20.1, 20.3) compared to rural areas (VE: 13.9; 95% CI: 13.8, 13.9). By education level, BMI was found to be the least variable among women with no formal education (VE: 14.2; 95% CI: 14.1, 14.3) and the most variable among those who had completed higher education (VE: 19.7; 95% CI: 19.5, 19.9). A similar pattern was observed for wealth quintiles, with the VE being substantially smaller for the poorest quintile group (VE: 13.6; 95% CI: 13.5, 13.7) compared to the wealthiest group (VE: 20.1; 95% CI: 20.0, 20.2). In terms of marital status, widowed/divorced/separated women had a VE of 20.8 (95% CI: 20.6, 21.1), which was much larger than the VE for never married women (VE: 12.5; 95% CI: 12.4, 12.6) (Table 2). The pattern of increasing variation in women’s BMI according to education and wealth was consistently found in country-specific analyses, with some differences in magnitude (Supplementary Table 2).
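To put the group contrasts above on a common scale, the variance estimates quoted in this paragraph can be turned into ratios. The sketch below uses only the point estimates reported in the text; the dictionary keys and the function name are illustrative.

```python
# Point estimates of BMI variance quoted in the text (Models 2a-2e),
# stored as (smaller-variance group, larger-variance group).
reported_ve = {
    "age (15-19 vs 45-49)": (9.8, 23.2),
    "residence (rural vs urban)": (13.9, 20.2),
    "education (none vs higher)": (14.2, 19.7),
    "wealth (poorest vs wealthiest)": (13.6, 20.1),
    "marital (never married vs widowed/divorced/separated)": (12.5, 20.8),
}


def variance_ratio(smaller: float, larger: float) -> float:
    """How many times as large the larger variance is."""
    return larger / smaller


ratios = {name: variance_ratio(lo, hi) for name, (lo, hi) in reported_ve.items()}
for name, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
    print(f"{name}: ratio = {ratio:.2f}, SD ratio = {ratio ** 0.5:.2f}")
```

Since a variance ratio is the square of the corresponding SD ratio, the contrasts look less extreme on the SD scale, which is worth keeping in mind when comparing these figures with the SDs in Table 1.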

Table 2 Individual-level variance estimates (VE) in BMI modeled as a constant and as a function of each of the selected sociodemographic characteristics

On average, there was a significant interaction between age and socioeconomic factors, with the positive association between age and BMI being stronger for more educated and wealthier women (Supplementary Table 3). Independent of such interactive effects in the fixed part, residual variance in BMI also differed across combinations of sociodemographic groups. Even within the same age group, BMI variance was much larger for more educated and wealthier groups. For instance, while the overall variance for 15–19 year olds was 9.8 (95% CI: 9.8, 9.9), the VE ranged from 7.7 (95% CI: 7.5, 7.9) for 15–19 year olds with no formal education to 11.9 (95% CI: 11.4, 12.4) for those who had completed higher education. Among 45–49 year old women, the variance was smallest for the least educated group (VE: 19.1; 95% CI: 18.7, 19.4) and largest for the secondary education group (VE: 27.6; 95% CI: 26.9, 28.1) (Fig. 2a). A similar trend was observed by wealth quintiles. Within the youngest age group, the VE ranged from 8.3 (95% CI: 8.2, 8.5) for the poorest quintile to 11.7 (95% CI: 11.5, 11.9) for the wealthiest quintile. Within the oldest age group, the VE ranged from 19.3 (95% CI: 18.8, 19.8) for the poorest quintile to 26.6 (95% CI: 26.0, 27.3) for the wealthiest quintile (Fig. 2b). The exact VEs from Models 3b and 4b are presented in Supplementary Table 4.

Fig. 2

Differential variability in individual BMI (95% CI) by (a) women’s age and education and (b) women’s age and wealth quintiles. Adjusted for all covariates (women’s age, place of residence, education, wealth, marital status), country fixed effects, PSU random effects, and fixed effects for the interaction between (a) age and education and (b) age and wealth quintiles

Lastly, adjusting for type of residence, education, wealth, and marital status explained less than 1% of the individual variation in BMI for all age groups. The explanatory variables that had statistically significant association with BMI on average had almost no systematic contribution to individual variation, and this remained consistent across all age groups (Table 3). As expected, the fixed effect estimates were largely unaffected by modeling level-1 heterogeneity, but the model fit as measured by AIC was better when BMI variance was modeled as a function of the selected predictors (Supplementary Table 5).

Table 3 Variance estimate (VE) stratified by seven age groups and proportion explained (%) by basic socioeconomic factors independently and jointly

Discussion

In this global analysis pooling the latest cross-sectional data from 57 LMICs, we provide empirical evidence that assuming individual variation in BMI to be constant masks underlying systematic differences in variability by sociodemographic characteristics. Women who were older, lived in urban areas, were more educated, were wealthier, or were widowed/divorced/separated exhibited higher variability in their BMI measures than their reference groups. Existing literature on the association between socioeconomic status and BMI [17] should be interpreted with the understanding that average associations are abstractions that do not correctly represent individual heterogeneity. Modeling variance in health outcomes directly as a function of explanatory variables and quantifying the extent to which individuals differ more from each other in some groups than in other groups can uncover important insights into how different types of people vary among themselves [8, 18].

The technical advantages of modeling complex level-1 variation have been discussed previously. Very briefly, heterogeneous models are recommended to obtain unbiased estimates of standard errors for the fixed-effect and random-effect parameters for any data that violate the homogeneity assumption [19] because the validity of inferences about model parameters may be severely affected [20, 21]. In multilevel analysis, inappropriate assumption of homoscedasticity at level-1 can lead to exaggerated estimates of contextual differences as heterogeneity between-individuals can confound between-context variation [7, 8]. In our analyses, we found that modeling the complex level-1 variation indeed improved the statistical fit of all models.

The observed pattern in BMI heterogeneity by socioeconomic predictors in our analysis was somewhat contrary to what was found in the context of the United States, where the increase in BMI dispersion was shown to be greater for disadvantaged groups: non-Hispanic blacks, individuals with less than a high school education, and high school graduates [10]. According to Schmalhausen’s Law, under severe or unusual stress conditions, even small environmental and genetic differences may lead to major effects in manifested physiology and development [22]. Assuming that predictors such as education and wealth are a reasonable proxy for an individual’s social condition, BMI was indeed shown to be more sensitive to random disturbances caused by more stressful conditions in the US, but more variable under better-off conditions in LMICs. This reinforces the need to further examine how diverse aspects of the social and physical environments interact with individual characteristics or susceptibilities and result in varied effects in different contexts [23].

We explored two possible explanations for the presence of complex variation in BMI. First, the same set of factors may have differential effects on individuals, also known as treatment heterogeneity in clinical settings [24]. Interaction effects are routinely tested in the fixed part of the model when the same exposure-outcome relationship is, on average, expected to differ for particular subgroups. Our findings suggest that, independent of such average interactive effects, there may be important interaction in variability as well. For instance, in addition to BMI variance being larger for older women, the observed age heterogeneity in BMI was also systematically patterned by education level and wealth quintiles. Inequality in BMI within the oldest and the wealthiest (or most educated) group was more than three times as large as that within the youngest and the poorest (or least educated) group. The increase in BMI heterogeneity by age may be explained by differential long-term weight gain that results from a combination of accumulated effects of obesogenic behaviors, including changes in the consumption of specific foods and beverages and amount of physical activity, as well as biological changes over the life course [25, 26]. The consistent social gradient in BMI heterogeneity by age further suggests that some health behaviors or proclivities that affect long-term weight are likely conditioned by common social processes, including childhood conditions [27].

The second explanation for the observed complexity in BMI heterogeneity is that a completely different set of factors may drive the variation within different segments of the population. In this case, in-depth substantive expertise is necessary to conduct stratified analyses by meaningfully defined subgroups with potentially different sets of risk factors relevant for each. We attempted to explore the stochastic and systematic components of the within-group variation by age group, but found little support for differential amounts of variation being explained across the age groups. A recent study from Indonesia found that BMI in men tends to become more variable over time compared to women, and that socioeconomic factors explained more of the variation in BMI among men [12]. Because DHS anthropometric data are restricted to women of reproductive age in most countries, we were not able to test for gender-specific determinants of BMI heterogeneity at a global scale. For India, a country where BMI was measured for 64,958 men aged 15–49 years, we found that socioeconomic factors explained 3.5% of the variation in men and 1.1% in women (data not shown). Future investigations using a larger set of predictors can potentially identify determinants of inter-individual variation that are specific to each of the subgroups that exhibit substantially larger inequality.

While we have pooled our global analytic sample from nationally representative surveys across a broad range of countries, the cross-sectional nature of our dataset precludes exploration of potential changes in individual variation over time. A study based on repeated cross-sectional data in Indonesia has shown increasing within-group inequalities in BMI over time (1993–2008) that were greatest among individuals in low education and low per capita expenditure groups [12]. Whether this trend holds true across other LMICs should be tested. Additionally, our study population was restricted to young and middle-aged women. However, we found the same pattern in BMI heterogeneity by sociodemographic factors among a sample of men in India for whom BMI measures were available (Supplementary Table 6). Lastly, our analysis was limited to a small number of predictor variables that were consistently measured across the countries, but future studies should explore data with diverse sociodemographic, dietary consumption, and genetic information to provide more comprehensive evidence on individual heterogeneity in BMI as well as their relative contributions to variously defined subgroups.

In conclusion, our study suggests that individual heterogeneity should be routinely questioned and empirically tested rather than being treated as an “error” or “chance” phenomenon that can be averaged away [28]. The nonrandom patterns of variability observed in BMI suggest that examination of the degree and pattern of heterogeneity between individuals within a population may provide information not evident from the analysis of mean values alone [29]. This approach is especially pertinent given the increasing within-population inequalities for many health outcomes and related risk factors [30, 31]. Identifying groups with systematically large variation and understanding their specific determinants are important for preventive strategies that aim to reduce health disparities [32].