Abstract
The economics of obesity literature implicitly assumes that measured anthropometrics are error-free, and they are often treated as a gold standard when compared with self-reported data. We use factor mixture models to analyse measurement error in both self-reported and measured anthropometrics with nationally representative data from the 2013 National Health Survey in Brazil. A small but statistically significant fraction of measured anthropometrics is attributable to recording errors. In contrast, only between 10 and 23% of our self-reported anthropometrics are free from any measurement error, because self-reports are imprecisely recorded and affected by reporting behaviour. Post-estimation analysis allows us to calculate hybrid anthropometric predictions that best approximate the true body weight and height distributions. BMI distributions based on the hybrid measures do not differ between our factor mixture models, with and without covariates, and are generally close to those based on measured data. BMI based on self-reported data, by contrast, underestimates the true BMI distribution. “Corrected self-reported BMI” measures, based on common methods that mitigate reporting error in self-reports using predictions from corrective equations, do not seem to be a good alternative to our “hybrid” BMI measures. Regression models for the association between BMI and health care utilization show only small differences when they are based on our hybrid measure rather than measured BMI, and these differences are concentrated in the far-right tail of the BMI distribution. More pronounced differences are observed, at the lower and upper tails of BMI, when the hybrid measure is compared with self-reported or “corrected self-reported” BMI.
1 Introduction
Obesity is a strong predictor of overall mortality (Li et al. 2021; Prospective Studies Collaboration et al. 2009) and an important risk factor for several noncommunicable diseases, such as cardiovascular diseases, diabetes, musculoskeletal disorders and some cancers (Lin et al. 2020). A large literature has explored the economic and social ramifications of obesity, such as poorer labour market outcomes, increased health care utilization and the associated public health costs (e.g. Cawley 2004, 2015; Rooth 2009). Moreover, studies have investigated and measured socioeconomic inequalities in obesity (e.g. Bilger et al. 2017; Davillas and Benzeval 2016; Zhang and Wang 2004).
Despite an influential report on the importance of physically measured health indicators for understanding how the social and economic environment may get under the skin, several multi-purpose social science datasets continue to collect only self-reported weight and height data (Cawley 2015). Some existing studies use datasets that collect measured anthropometrics, often in addition to self-reported anthropometric data (e.g. Cawley 2015; Cawley et al. 2015; Davillas and Jones 2021; Gil and Mora 2011). Studies that analyse measurement error in anthropometric data typically compare self-reported and measured anthropometric data, and this research treats the measured data as error-free “gold-standard” measures. Specifically, Cawley et al. (2015), using the US National Health and Nutrition Examination Survey (NHANES) for 2003 to 2010, compare self-reports with measured weight and height data. They find that reporting error in self-reported data is non-classical. Respondents who are underweight according to measured anthropometrics tend to over-report their weight, whereas overweight and obese respondents tend to under-report it. Gil and Mora (2011) use data from the 2006 Catalan Health and Health Examination Surveys to compare self-reported with measured anthropometrics. They find that social norms regarding ideal weight may affect reporting bias in self-reported anthropometrics. Respondents who are more satisfied with their own body image are less prone to under-report their weight, although these results depend on how social norms on body image are defined.
Using UK data, Davillas and Jones (2021) conducted an experiment to explore the extent of measurement error in body mass index (BMI) when self-reported body weight and height data are compared with measured anthropometrics. This study shows non-classical reporting error in height and weight. Taller people seem to report their height more accurately, and reporting errors in self-reported body weight increase sharply for those with greater measured weight. Further analysis shows that heterogeneity in self-reported anthropometrics is associated with within-household measured BMI data. A study employing Swedish data (Ljungvall et al. 2015) finds reporting error in self-reports of BMI (called misreporting in the study) when these are compared with measured anthropometric data. It also finds systematic social patterning in misreporting, which matters for the estimation of the education and income gradients in BMI when these are based on self-reports. O’Neill and Sweetman (2013), using a selected sample of mothers from the Irish Cohort Study, find that self-reported BMI is subject to substantial measurement error compared with measured anthropometrics. This error also causes an overestimation of the relationship between BMI and income. Finally, the role of interviewers is examined by Olbrich et al. (2022). Using various datasets from the USA, UK and Germany, this study shows that interviewers play an important role in differences between reported and measured body height, as well as in changes in reported height over survey waves.
Beyond assuming that measured anthropometrics are error-free, these existing studies mostly compare self-reported and measured anthropometric data that were collected with a considerable time gap, or in settings where respondents were informed about the subsequent physical measurements (Cawley et al. 2015; Gil and Mora 2011). The related medical literature is often based on selected age groups or non-representative samples. It neither aims to characterize the measurement error nor to quantify its implications for economic modelling (e.g. Engstrom et al. 2003; Gorber et al. 2007; Keith et al. 2011).
Although measured anthropometrics are assumed to be error-free in much of the existing literature, their accuracy may be affected by several factors. For instance, recent evidence has documented the influence of interviewers on the reliability of measured and self-reported body height data in different surveys (e.g. Finn and Ranchhod 2017; Olbrich et al. 2022). Potential sources of measurement error include both unintentional and intentional recording errors (e.g. Finn and Ranchhod 2017; Groves 2005; Olbrich et al. 2022). Unintentional errors include accidental mistakes when transferring values from the measurement equipment to the survey materials. Intentional errors include fabricating parts of the measurement or even conducting physical measurements on the wrong respondent. Such errors may be difficult to detect if the interviewers did visit the household to conduct the interview (Olbrich et al. 2022). Moreover, in some datasets, interviewers may visit the household more than once to complete a socioeconomic questionnaire and collect physical measurements, including anthropometrics; this increases the likelihood of mis-identification. More broadly, the literature has discussed measurement error in more objectively measured nurse-collected and blood-based health data (Davillas and Pudney 2020a,b). These studies use latent variable models to account for measurement error, but they do not aim to model measurement error explicitly or to explore its potential implications for economic models. Overall, there is limited research with access to both self-reported and measured anthropometric data collected within the same survey wave.
Our paper contributes to the literature in various ways. We model potential measurement error in both self-reported and measured anthropometrics (i.e. body weight and body height). We use data from the 2013 National Health Survey of Brazil (Pesquisa Nacional de Saúde; PNS 2013). This nationally representative dataset collects measured and self-reported body weight and height from the same individuals within the span of a household interview. In Brazil, obesity has increased systematically since the 2010s, with one in every five adults experiencing obesity (Triaca et al. 2020). Projections of obesity-related costs in Brazil show that annual health care costs may almost double from 2010 ($5.8 billion) to 2050 ($10.1 billion), with a total health care cost of $330 billion over 40 years (Rtveladze et al. 2013). As such, obesity is an important public health concern for Brazil.
To analyse measurement error in the Brazilian data, we use a factor mixture model initially proposed by Kapteyn and Ypma (2007). This Kapteyn and Ypma (KY) factor mixture model has been applied and extended by Jenkins and Rios-Avila (2020, 2021, 2023a) to analyse measurement error in self-reported and administrative income data. To the best of our knowledge, the KY factor mixture model has not been used to analyse measurement error in self-reported and measured anthropometric data. The existing literature assumes no measurement error in measured body weight and height data. Our analysis, by contrast, allows us to model different types of error in both self-reported and measured anthropometrics. Specifically, we test the hypothesis that measured anthropometrics contain data recording errors. The self-reported anthropometric data, in turn, are assumed to be subject to a wider set of measurement errors. These include imprecision due to the scale of the self-reported data, which are only recorded as whole numbers (in cm or kg), non-classical mean-reverting errors and other types of remaining errors. As our data permit, we also estimate factor mixture models that account for individual-level covariates. These models explore the extent to which true latent anthropometrics and reporting error in self-reported anthropometrics, as well as their dispersion, vary across population groups. The absence of interviewer-level data, however, prevents us from exploring heterogeneity in measured anthropometrics due to interviewer characteristics.
Our analysis also allows us to estimate the probability of each type of measurement error in both self-reported and measured data. Of particular interest, given that measured anthropometric data are often considered error-free (e.g. Cawley 2015; Davillas and Jones 2021; Gil and Mora 2011), our results suggest that a small but systematic fraction of measured anthropometrics contain data recording errors. Turning to self-reported weight and height, the estimated probabilities that the self-reported anthropometrics equal the true body weight and height are relatively low, at 10% and 23%, respectively.
Post-estimation analysis allows us to generate a set of predictions of the distribution of the true latent weight and height data that combine information from both self-reported and measured anthropometrics. We select the best-performing predictions of the latent weight and height distributions using reliability measures and mean squared errors, estimated from simulated out-of-sample predictions. After choosing our preferred prediction for the factor mixture models with and without covariates, we use our sample data to compute body weight and height measures that approximate the true values. These are then used to calculate our proxies of the true BMI distribution.
Finally, we compare the distributions of BMI based on self-reported data, measured data and our proxies of true BMI. The latter are very close to the distribution of BMI based on measured anthropometrics, while BMI based on self-reported data underestimates the true BMI distribution. We also use “corrected self-reported BMI” as an additional measure; this is a conventional approach in the existing economics of obesity literature to correct self-reported data for reporting error (Cawley 2015). Our results show that these “corrected self-reported BMI” measures are not a good alternative to our “hybrid” BMI measures. In addition, we explore the potential implications of measurement error in both self-reported and measured anthropometrics. As an illustration, we estimate linear regression models for the frequency of hospital admissions in the past 12 months, using each of the self-reported, “corrected self-reported”, measured and hybrid BMI measures in turn as the explanatory variable. We find only moderate differences between results based on the hybrid BMI measure and those based on measured anthropometrics, and these are concentrated in the far-right tail of the BMI distribution. More pronounced disparities are observed, at the lower and upper BMI tails, when our hybrid measures are compared with BMI measures based on self-reported or “corrected self-reported” data.
Understanding and characterizing measurement error in both self-reported and measured anthropometrics has important public health implications. Self-reported and/or physically measured anthropometrics are collected in several nationally representative surveys. Several datasets are frequently used for obesity research but are limited to self-reports of body weight and height. Examples are the Survey of Health, Ageing and Retirement in Europe (SHARE), the European Community Household Panel (ECHP), the German Socio-Economic Panel (GSOEP), the National Longitudinal Survey of Youth (NLSY), the Medical Expenditure Panel Survey (MEPS) and the Behavioural Risk Factor Surveillance System (BRFSS). Recent advances in survey measurement allow measured anthropometrics to be collected as part of multi-purpose social science surveys, improving the reliability of anthropometric data (Cawley et al. 2015). Data from nationally representative surveys are used to estimate obesity prevalence at the national level as well as for international comparisons (Ng et al. 2014). Measurement errors in self-reported and measured anthropometrics may affect within- and between-country and regional comparisons of obesity prevalence, as well as estimates of the population at increased health risk. Depending on the size of the measurement error, this may misguide public policies aimed at mitigating regional or cross-country differences in excess body weight. Moreover, studies that quantify the (public) health care costs associated with obesity and related diseases often rely on survey data (Cawley and Meyerhoefer 2012). This research is influential and is used to justify government programmes to prevent obesity on the grounds of external costs (USDHHS 2010).
The extent to which these estimates may be biased due to measurement error in measured and/or self-reported anthropometrics collected in surveys is of relevance from a public health point of view given the cost savings from reducing obesity prevalence.
The rest of the paper is organized as follows. Section 2 presents the methods used to analyse measurement error in both self-reported and measured anthropometric data. Our data source and descriptive statistics are presented in Sect. 3. Section 4 presents the results of our analysis and the post-estimation predictions. It also gives a preliminary analysis of the potential implications of measurement error in both self-reported and measured anthropometrics for economic research. Section 5 concludes and provides a summary of our findings.
2 Methods
We adapt the factor mixture model proposed by Kapteyn and Ypma (2007) to model the relationship between measured and self-reported anthropometrics. This model has been applied and extended by Jenkins and Rios-Avila (2020, 2021, 2023a) to analyse measurement error in income data. In this study, we apply the KY model to measurement error in both self-reported and measured anthropometric data on weight and height, using the 2013 National Health Survey of Brazil.
We assume that the true values of each anthropometric measure (weight or height) for an individual \(i\) \(\left( {\xi_{i} } \right)\) are unobserved, but we can observe both measured \(\left( {r_{i} } \right)\) and self-reported \(\left( {s_{i} } \right)\) anthropometrics. Table 1 provides a description of the types of errors in measured and self-reported anthropometric data that can be captured by our factor mixture model.
Measured anthropometrics are collected at the end of the individual questionnaire in our dataset. According to the survey protocol, it is possible for the individual interview to be completed in more than one visit, so it may be the case that physical measurements (which are time consuming as they include anthropometrics and blood pressure measurements) may not take place on the same day. Also, the measured anthropometrics are recorded by the interviewer by hand in the survey materials. Thus, measured anthropometrics may suffer from (unintentional or intentional) recording error related to entering values from the measurement equipment to the survey materials, fabrication of the measurement of anthropometrics by the interviewerFootnote 1 or even physical measurements taken from the wrong household member (especially if the main interview and physical measurements are not collected on the same day). These measurement errors, although they may occur with low frequency, could have a non-negligible impact on data reliability.
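Thus, in the case of measured anthropometrics, we assume that the distribution of each anthropometric measure is a mixture of two types of observation:
\[
r_{i} = \begin{cases} \xi_{i} & \text{with probability } \pi_{r} \quad \text{(R1)} \\ \zeta_{i} & \text{with probability } 1 - \pi_{r} \quad \text{(R2)} \end{cases} \tag{1}
\]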
where measured anthropometrics \(\left( {r_{i} } \right)\) equal the true value with probability \(\pi_{r}\) (case R1). With probability \(1 - \pi_{r}\) (case R2), however, measured anthropometrics do not equal the true value, and an error-ridden measure is observed instead. In the spirit of the KY factor mixture model, this erroneous anthropometric measure, which is incorrectly attributed to individual \(i\), is denoted by \(\zeta_{i}\).Footnote 2 The true values and those with recording errors are each assumed to be independently and identically normally distributed across individuals: \(\xi_{i} \sim N\left( {\mu_{\xi } ,\sigma_{\xi }^{2} } \right)\), \(\zeta_{i} \sim N\left( {\mu_{\zeta } ,\sigma_{\zeta }^{2} } \right)\). This implies that the marginal distribution of \(r_{i}\) is a mixture of two normals. Given the types of error captured by \(\zeta_{i}\), as described above, we assume that there is no correlation between \(\xi_{i}\) and \(\zeta_{i}\).Footnote 3 This assumption contributes to the identification of the full model, because it implies that the erroneous measurements are also uncorrelated with the self-reported anthropometrics.
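To make the two-case mixture for measured anthropometrics (cases R1 and R2) concrete, the sketch below evaluates the implied marginal density of \(r_{i}\), \(\pi_{r}\,\phi(r;\mu_{\xi},\sigma_{\xi}) + (1-\pi_{r})\,\phi(r;\mu_{\zeta},\sigma_{\zeta})\). This is a minimal illustration in Python, not the estimation code used in this paper. The function names and any parameter values passed to them are hypothetical.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Univariate normal density N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def measured_density(r: float, pi_r: float, mu_xi: float, sigma_xi: float,
                     mu_zeta: float, sigma_zeta: float) -> float:
    """Marginal density of a measured anthropometric r_i.

    Case R1 (probability pi_r):     r_i = xi_i   ~ N(mu_xi, sigma_xi^2)
    Case R2 (probability 1 - pi_r): r_i = zeta_i ~ N(mu_zeta, sigma_zeta^2)
    """
    return (pi_r * normal_pdf(r, mu_xi, sigma_xi)
            + (1.0 - pi_r) * normal_pdf(r, mu_zeta, sigma_zeta))
```

When \(\sigma_{\zeta}\) is much larger than \(\sigma_{\xi}\), a small share of recording errors mainly fattens the tails of \(r_{i}\) and leaves the centre of the distribution close to \(N(\mu_{\xi},\sigma_{\xi}^{2})\).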
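Each of our self-reported anthropometrics (i.e. weight or height) is assumed to be a mixture of three types of observation:
\[
s_{i} = \begin{cases} \xi_{i} & \text{with probability } \pi_{s} \quad \text{(S1)} \\ \xi_{i} + \rho\left( {\xi_{i} - \mu_{\xi } } \right) + \eta_{i} & \text{with probability } \left( {1 - \pi_{s} } \right)\left( {1 - \pi_{\omega } } \right) \quad \text{(S2)} \\ \xi_{i} + \rho\left( {\xi_{i} - \mu_{\xi } } \right) + \eta_{i} + \omega_{i} & \text{with probability } \left( {1 - \pi_{s} } \right)\pi_{\omega } \quad \text{(S3)} \end{cases} \tag{2}
\]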
Table 1 describes all sources of measurement error in self-reported anthropometrics that are captured in Eq. 2. Specifically, we assume that the self-reported anthropometric \(\left( {s_{i} } \right)\) equals the true latent value \(\left( {\xi_{i} } \right)\) with probability \(\pi_{s}\) (case S1). The self-reported values are recorded as integers, so this case only applies when the true value is a whole number.Footnote 4 Otherwise (cases S2 and S3), there must be some imprecision in \(s_{i}\) due to the scale of measurement. This imprecision is captured by the error term \(\eta_{i}\), which is independent of the true value \(\left( {\xi_{i} } \right)\). It reflects the different ways in which respondents may round their responses to whole numbers, along with random noise in the self-reports. In addition, we allow for the possibility of non-classical mean-reverting (or mean-diverging) survey measurement error, which is captured by the term \(\rho \left( {\xi_{i} - \mu_{\xi } } \right)\). A value \(\rho < 0\) implies mean reversion and \(\rho > 0\) implies mean divergence.Footnote 5 Existing studies comparing measured with self-reported data have shown the presence of mean-reverting errors in self-reported body weight (Cawley et al. 2015). The second case (S2), which allows for both sources of error, occurs with probability \(\left( {1 - \pi_{s} } \right)\left( {1 - \pi_{\omega } } \right)\). The third case (S3) occurs with probability \(\left( {1 - \pi_{s} } \right)\pi_{\omega }\). It adds a third source of measurement error \(\left( {\omega_{i} } \right)\), which allows for additional random noise among respondents who make further errors in their self-assessments of height or weight (see Table 1). Both measurement errors are assumed to be independently and identically normally distributed: \(\eta_{i} \sim N\left( {\mu_{\eta } ,\sigma_{\eta }^{2} } \right)\) and \(\omega_{i} \sim N\left( {\mu_{\omega } ,\sigma_{\omega }^{2} } \right)\).
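The data-generating process for self-reports (cases S1 to S3) can be simulated directly, which helps build intuition for what the mixture model is trying to recover. The Python sketch below rests on two assumptions. First, it draws \(\xi_{i}\), \(\eta_{i}\) and \(\omega_{i}\) from independent normals. Second, it assigns cases with the probabilities stated above. For simplicity it ignores the restriction that case S1 can only occur when the true value is a whole number. The function name and its arguments are hypothetical.

```python
import numpy as np


def simulate_self_reports(n, mu_xi, sigma_xi, pi_s, pi_omega, rho,
                          mu_eta, sigma_eta, mu_omega, sigma_omega, seed=0):
    """Simulate (xi_i, s_i, case_i) from the three-case self-report mixture."""
    rng = np.random.default_rng(seed)
    xi = rng.normal(mu_xi, sigma_xi, n)           # true latent values
    eta = rng.normal(mu_eta, sigma_eta, n)        # imprecision / rounding noise
    omega = rng.normal(mu_omega, sigma_omega, n)  # additional random error (S3 only)

    # Case assignment: S1 w.p. pi_s, S2 w.p. (1-pi_s)(1-pi_omega), else S3.
    u = rng.uniform(size=n)
    cut_s2 = pi_s + (1.0 - pi_s) * (1.0 - pi_omega)
    case = np.where(u < pi_s, 1, np.where(u < cut_s2, 2, 3))

    # S2 adds mean-reverting error rho*(xi - mu_xi) plus eta; S3 also adds omega.
    s_err = xi + rho * (xi - mu_xi) + eta
    s = np.where(case == 1, xi, np.where(case == 2, s_err, s_err + omega))
    return xi, s, case
```

With \(\rho < 0\), respondents far above the mean report values pulled back towards \(\mu_{\xi}\). This matches the pattern of heavier respondents under-reporting their weight described in the literature above.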
Note that the survey team undertook significant effort to minimize the risk of equipment failure for physical anthropometric measurements; our dataset employs international measurement protocols and validated equipment, which is calibrated daily to ensure reliability of the measurements. The procedures for taking anthropometric measures are defined to prevent biologically inaccurate measures and were done in partnership with the Laboratory for Nutritional Evaluation of Populations (LANPOP), part of the Public Health School in the University of São Paulo (Damacena et al. 2015; Szwarcwald et al. 2014). Also, the availability of two repeated physical measurements of body weight and height (we took the second measure for our main estimation and, for sensitivity analysis, the average of these measures) further reduces the likelihood of errors related to equipment failure. Thus, we do not capture this potential source of error in our factor mixture models.Footnote 6
The full KY model defines a mixture of six latent classes that correspond to the combinations of case R1 or R2 with case S1, S2 or S3. Table 2 describes all the potential latent classes. For instance, class 1 (R1-S1) consists of error-free self-reported (S1) and measured (R1) data and occurs with probability \(\pi_{r} \pi_{s}\). The full model is a mixture of the six bivariate normal distributions for the observed outcome pairs (\(r_{i}\), \(s_{i}\)), each with its own mean vector and covariance matrix (see Jenkins and Rios-Avila (2020, 2021) and Kapteyn and Ypma (2007) for full details).
The parameter estimates are obtained by maximizing the model log-likelihood (see Kapteyn and Ypma 2007, Appendix B), with identification relying on the existence of the “completely labelled” group that contains observations with error-free anthropometrics (class 1: R1-S1). Parameters \(\mu_{\xi }\) and \(\sigma_{\xi }^{2}\) are identified from these “completely labelled” observations and this contributes to identification of the other unknown parameters from the mixture of normals implied by the model specification (see Kapteyn and Ypma (2007) for further details on identification). Kapteyn and Ypma (2007) provide the expressions for the probability density functions and the associated log-likelihood function. Employing Jenkins and Rios-Avila’s (2023b) user-written Stata command, we fit the full Kapteyn and Ypma (2007) model by maximum likelihood, assuming that the sample likelihood function is a finite mixture of latent class distributions. Our analysis is done separately for each of our anthropometric measures, i.e. for weight and height.
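The likelihood can be sketched by writing out the moments of \((r_{i}, s_{i})\) implied by each latent class. Under cases S2 and S3, \(E(s_{i}) = \mu_{\xi} + \mu_{\eta}\) (plus \(\mu_{\omega}\) for S3) and \(\mathrm{Var}(s_{i}) = (1+\rho)^{2}\sigma_{\xi}^{2} + \sigma_{\eta}^{2}\) (plus \(\sigma_{\omega}^{2}\) for S3). Under case R1 the covariance is \(\mathrm{Cov}(r_{i}, s_{i}) = (1+\rho)\sigma_{\xi}^{2}\), and under case R2 \(r_{i}\) is independent of \(s_{i}\). In the simplified Python sketch below, class R1-S1 is treated as the “completely labelled” group, i.e. observations with \(r_{i} = s_{i}\). This group contributes \(\pi_{r}\pi_{s}\) times the univariate density of \(\xi_{i}\), while the other five classes enter as bivariate normals. It uses SciPy and is not the Jenkins and Rios-Avila Stata implementation; Kapteyn and Ypma (2007, Appendix B) give the exact expressions. The dictionary keys are hypothetical parameter names.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal, norm


def class_components(p):
    """(probability, mean, covariance) of (r_i, s_i) for the five classes other
    than R1-S1, whose bivariate distribution is degenerate (r_i = s_i)."""
    a = 1.0 + p["rho"]
    v_xi, v_zeta = p["sd_xi"] ** 2, p["sd_zeta"] ** 2
    v_eta, v_omega = p["sd_eta"] ** 2, p["sd_omega"] ** 2
    s_cases = {  # case: (probability, E[s_i], Var[s_i], Cov[xi_i, s_i])
        "S1": (p["pi_s"], p["mu_xi"], v_xi, v_xi),
        "S2": ((1 - p["pi_s"]) * (1 - p["pi_omega"]), p["mu_xi"] + p["mu_eta"],
               a * a * v_xi + v_eta, a * v_xi),
        "S3": ((1 - p["pi_s"]) * p["pi_omega"],
               p["mu_xi"] + p["mu_eta"] + p["mu_omega"],
               a * a * v_xi + v_eta + v_omega, a * v_xi),
    }
    comps = []
    for s_case, (p_s, m_s, v_s, c_xs) in s_cases.items():
        if s_case != "S1":  # R1: r_i = xi_i, so Cov(r_i, s_i) = Cov(xi_i, s_i)
            comps.append((p["pi_r"] * p_s, np.array([p["mu_xi"], m_s]),
                          np.array([[v_xi, c_xs], [c_xs, v_s]])))
        # R2: r_i = zeta_i, independent of s_i
        comps.append(((1 - p["pi_r"]) * p_s, np.array([p["mu_zeta"], m_s]),
                      np.array([[v_zeta, 0.0], [0.0, v_s]])))
    return comps


def log_likelihood(p, r, s):
    """Sample log-likelihood of the simplified six-class mixture."""
    r = np.asarray(r, dtype=float)
    s = np.asarray(s, dtype=float)
    labelled = np.isclose(r, s, rtol=0.0, atol=1e-9)  # class R1-S1
    ll = np.empty(r.shape)
    ll[labelled] = np.log(p["pi_r"] * p["pi_s"]) + norm.logpdf(
        r[labelled], p["mu_xi"], p["sd_xi"])
    if (~labelled).any():
        y = np.column_stack([r[~labelled], s[~labelled]])
        terms = np.vstack([
            np.log(prob) + np.atleast_1d(multivariate_normal.logpdf(y, mean=m, cov=c))
            for prob, m, c in class_components(p)])
        ll[~labelled] = logsumexp(terms, axis=0)  # log of the mixture density
    return float(ll.sum())
```

In practice this function would be passed to a numerical optimizer over suitably transformed parameters, for example logits for the probabilities and logs for the standard deviations.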
2.1 Accounting for covariates
Following Jenkins and Rios-Avila (2020, 2021), our baseline factor mixture model is based on unconditional distributions. However, allowing the measurement error distributions to vary with observed characteristics increases flexibility. It can also be used to assess whether the distributions of measurement errors differ across population sub-groups (Jenkins and Rios-Avila 2023b). Goodness-of-fit comparisons based on the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are used to compare our factor mixture models with and without covariates.
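The AIC and BIC comparison uses the standard formulas \(\mathrm{AIC} = 2k - 2\ln L\) and \(\mathrm{BIC} = k\ln n - 2\ln L\), where \(k\) is the number of estimated parameters and \(n\) is the sample size; lower values are preferred. The Python sketch below applies them to two models. The log-likelihoods and parameter counts in any call are hypothetical, not estimates from this paper.

```python
import math


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 lnL (lower is better)."""
    return k * math.log(n) - 2.0 * loglik


def preferred(models: dict, n: int) -> tuple:
    """models maps a label to (loglik, k); returns (best by AIC, best by BIC)."""
    best_aic = min(models, key=lambda m: aic(models[m][0], models[m][1]))
    best_bic = min(models, key=lambda m: bic(models[m][0], models[m][1], n))
    return best_aic, best_bic
```

Because BIC penalizes each extra parameter by \(\ln n\) rather than 2, the two criteria can disagree when a covariate model adds many parameters for a modest gain in fit. Reporting both is therefore informative.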
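Jenkins and Rios-Avila (2023a) extend the Kapteyn and Ypma (2007) model to allow transformations of relevant parameters to be specified as linear indices of characteristics \(\left( {X_{i} } \right)\):
\[
G\left( {\gamma_{i} } \right) = \alpha_{\gamma } + X_{i} \beta_{\gamma } \tag{3}
\]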
where for each factor mixture model parameter of interest (\(\gamma\)), \(\alpha_{\gamma }\) is a constant and \(\beta_{\gamma }\) are the slopes associated with individual-level characteristics \((X_{i}\)). The function \(G\left( \cdot \right)\) is the specific transformation function of the parameter of interest. These are the identity function for means \(\left( \mu \right)\), the logarithmic function for SDs \(\left( \sigma \right)\) and Fisher’s z transformation for correlations \(\left( \rho \right)\).
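Each transformation is inverted to recover the parameter in its natural metric: means use the index directly, SDs use \(\exp(\cdot)\) and correlations use \(\tanh(\cdot)\), the inverse of Fisher's z. The Python sketch below shows this mapping. The dictionary and function names are hypothetical, and it covers only the three transformations named above.

```python
import math

# Inverse transformations mapping a linear index back to the natural metric.
INVERSE_LINK = {
    "mean": lambda z: z,  # G = identity
    "sd": math.exp,       # G = log, so sigma = exp(index) > 0
    "corr": math.tanh,    # G = Fisher's z (atanh), so rho = tanh(index) in (-1, 1)
}


def linear_index(alpha, beta, x):
    """alpha_gamma + X_i beta_gamma for one individual's covariate vector x."""
    return alpha + sum(b * v for b, v in zip(beta, x))


def parameter_value(kind, alpha, beta, x):
    """Individual-specific parameter gamma_i in its natural metric."""
    return INVERSE_LINK[kind](linear_index(alpha, beta, x))
```

Working on the transformed scale keeps the implied standard deviations positive and the correlations within \((-1, 1)\) for any covariate values. This constraint is the usual motivation for these transformations.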
In practice, and for parsimony in the estimation of our factor mixture models with covariates, we parameterize errors in self-reported data using age, gender and region of residence. Existing studies argue that measurement error in self-reported body weight and height data depends on respondents’ characteristics (e.g. Cawley et al. 2015; Davillas and Jones 2021). Specifically, for the self-reported anthropometrics, we model the \(\mu\) and the \(\sigma\) of the imprecision error \(\left( {\eta_{i} } \right)\) and of the additional random error \(\left( {\omega_{i} } \right)\) as a function of individual characteristics (age groups, gender and region of residence); we also condition the non-classical mean-reverting error on these respondent-level characteristics.
Moreover, for the latent true body weight and height, we assume that the mean \(\left( {\mu_{\xi } } \right)\) varies by respondents’ age, gender and region of residence; the same covariates are also used for the SD equation \(\left( {\sigma_{\xi } } \right)\). Earlier research has considered these demographics as basic correlates of obesity (e.g. Baum and Ruhm 2009; Davillas and Jones 2020). Finally, we model the distribution of measured anthropometrics without covariates, given that the protocols for physical measurements of anthropometrics collected in surveys are the same for all respondents (irrespective of gender, age and region of residence). The characteristics of the interviewers responsible for the physical measurements might be more relevant sources of measurement error in measured anthropometrics (Olbrich et al. 2022).Footnote 7 However, these are not available in our dataset.
For the factor mixture models with covariates, we express the estimated parameters in their natural metrics by computing Average Predicted Margins (APMs). For each measurement model parameter of interest \(\left( \gamma \right)\), we predict \(\gamma\) for every individual in our sample using the fitted model, with all other covariates at their observed values. We then calculate the sample average of \(\gamma\) and its associated standard error. For presentation purposes, we report how each measurement error parameter \(\left( \gamma \right)\) varies across covariates using the APMs (Jenkins and Rios-Avila 2023a,b). For example, for a gender dummy, we calculate the APM for males by setting all sample values of gender to male and then averaging the predictions over the whole sample; APMs for females are calculated analogously. This allows us to test whether there are systematic gender differences in the APM for each parameter of interest \(\left( \gamma \right)\). For comparison purposes, in addition to APMs across population groups (by gender, age group and region of residence), we also report the corresponding APMs for all observations in the sample.
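The counterfactual averaging behind an APM can be sketched in a few lines of Python. The sketch assumes a parameter modelled on the log scale, such as an SD, so the default inverse transformation is \(\exp\). The function name and the covariate layout are hypothetical; column 0 stands for a male dummy. Standard errors, which in practice would come from the delta method or a bootstrap, are not computed here.

```python
import numpy as np


def average_predicted_margin(alpha, beta, X, col=None, value=None,
                             inverse_link=np.exp):
    """APM of gamma_i = inverse_link(alpha + X_i beta) over the sample.

    If col is given, that covariate is set to `value` for every observation
    (e.g. everyone male) before predicting; other covariates keep their
    observed values.
    """
    X = np.array(X, dtype=float)   # copy, so the caller's data are untouched
    if col is not None:
        X[:, col] = value          # counterfactual assignment for all rows
    gamma_i = inverse_link(alpha + X @ np.asarray(beta, dtype=float))
    return float(gamma_i.mean())
```

Comparing `average_predicted_margin(..., col=0, value=1.0)` with `value=0.0` gives the male and female APMs. Because the transformation is nonlinear, their difference generally depends on the observed distribution of the other covariates.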
2.2 Post-estimation predictions
As a post-estimation exercise, we generate predictions of the distribution of true latent weight and height (e.g. Meijer et al. 2012). In line with Jenkins and Rios-Avila (2023a), we select the most reliable prediction among all the potential hybrid measures of weight and height. We then calculate BMI as weight (in kg) divided by the square of height (in metres). We compare the distributions of hybrid, self-reported and measured BMI. To create “hybrid” anthropometric predictions that combine information from both self-reported and measured anthropometrics, we use the estimated parameters of our mixture models, separately for models with and without covariates.Footnote 8
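Because heights are recorded in centimetres, the BMI formula requires a unit conversion. The minimal Python helper below is hypothetical, not code from this paper. It would be applied to each respondent's (hybrid, self-reported or measured) weight and height.

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """BMI = weight (kg) / height (m)^2, converting height from cm to metres."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2
```

For example, `bmi(70.0, 175.0)` gives a BMI of about 22.9.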
Specifically, in line with Meijer et al. (2012), we compare, for models both with and without covariates, a number of approaches that combine measured and self-reported data to obtain the best prediction of the “true” anthropometrics of interest. Meijer et al. (2012) begin by deriving two predictors for the case of a single latent class (the latent classes in our analysis are described in Table 2). One predictor minimizes the mean squared error (MSE), and the other minimizes the MSE subject to unbiasedness. Because class membership is unobserved, Meijer et al. (2012) propose three ways to proceed. First, compute the within-class predictors for each class and combine them in a weighted average, using the unconditional or conditional class probabilities as weights. Second, predict class membership and then use the within-class predictor for the predicted class. Third, derive predictors that minimize the total mean squared prediction error. The weighted approach can use unconditional or conditional weights, and both the weighted and two-stage approaches can use either within-class predictor. This gives six potential predictors in total. Finally, a system-wide predictor minimizes MSE under the assumption of linearity and imposing the condition of unbiasedness.
Following Meijer et al. (2012), we generate seven “hybrid” measures to approximate true body weight and height: (1) Weighted (unconditional), (2) Weighted (unconditional) unbiased, (3) Weighted (conditional), (4) Weighted (conditional) unbiased, (5) Two-stage, (6) Two-stage unbiased and (7) System-wide linear. Predictors 1 to 6 use two within-class predictors for \(\xi\). The first set, \(\hat{\xi }_{i}^{j}\), is used for predictors 1, 3 and 5 and minimizes the mean squared error (MSE), \(E\left[ {\left( {\xi_{i} - \hat{\xi }_{i}^{j} } \right)^{2} |\xi_{i} , i \in J} \right]\). The second set, \(\hat{\xi }_{i}^{Uj}\), is used for predictors 2, 4 and 6 and minimizes the MSE subject to \(E\left( {\xi_{i} - \hat{\xi }_{i}^{Uj} | i \in J} \right) = 0\). Predictors 1 and 2 provide weighted predictions using the unconditional class probabilities \(\pi_{j}\). Predictors 3 and 4 provide weighted predictions using the conditional (posterior) class probabilities \(\pi_{j} \left( {r_{i} ,s_{i} } \right)\). Predictors 5 and 6 use a two-step Bayesian classification: the predicted class membership is obtained first, and then the predictor for the predicted class is used. Finally, the seventh predictor \(\left( {\xi_{7i} } \right)\) is the system-wide predictor that minimizes MSE under the assumption of linearity and imposing the condition of unbiasedness.
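The combination step shared by predictors 1, 3 and 5 can be sketched independently of how the within-class predictions are obtained. The Python sketch below assumes three inputs from a fitted model: the unconditional class probabilities \(\pi_{j}\), the class-specific densities of \((r_{i}, s_{i})\) and the within-class predictions \(\hat{\xi}_{i}^{j}\). It derives posterior probabilities by Bayes' rule and forms the three combined predictors. The function name is hypothetical. The unbiased variants and the system-wide predictor are not implemented.

```python
import numpy as np


def combine_predictors(pi, dens, xi_hat):
    """Combine within-class predictions of xi_i across latent classes.

    pi     : (J,)   unconditional class probabilities pi_j
    dens   : (n, J) class-specific densities f_j(r_i, s_i) from a fitted model
    xi_hat : (n, J) within-class predictions of xi_i for each class j
    """
    pi = np.asarray(pi, dtype=float)
    dens = np.asarray(dens, dtype=float)
    xi_hat = np.asarray(xi_hat, dtype=float)

    # Weighted (unconditional): weights are the prior class probabilities.
    weighted_uncond = xi_hat @ pi

    # Posterior class probabilities pi_j(r_i, s_i) via Bayes' rule.
    joint = dens * pi                              # broadcasts over rows
    post = joint / joint.sum(axis=1, keepdims=True)

    # Weighted (conditional): weights are the posterior class probabilities.
    weighted_cond = (post * xi_hat).sum(axis=1)

    # Two-stage: pick the most probable class, use its within-class prediction.
    best = post.argmax(axis=1)
    two_stage = xi_hat[np.arange(xi_hat.shape[0]), best]

    return weighted_uncond, weighted_cond, two_stage, post
```

Under the specification in Eqs. 1 and 2, any class with case R1 has \(r_{i} = \xi_{i}\), so its within-class prediction is simply \(r_{i}\). The prediction problem is therefore non-trivial only for the R2 classes.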
To assess the precision of those predictions, we estimate reliability statistics and the MSE.Footnote 9 These are computed with respect to the seven “hybrid” measures that come from the sample simulations for body weight and body height based on estimated parameters for the factor mixture models both with and without covariates. Simulation analysis is done using the user-written Stata command “ky_sim” (Jenkins and Rios-Avila 2023b).
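Given simulated true values and the corresponding predictions, the two precision statistics are straightforward to compute. The Python sketch below assumes that reliability is the squared correlation between the prediction and the simulated truth, a common definition that may differ from the one used in the paper (see Footnote 9). It is not the `ky_sim` implementation, and the function names are hypothetical.

```python
import numpy as np


def mse(xi_true, xi_pred):
    """Mean squared prediction error against simulated true values."""
    xi_true = np.asarray(xi_true, dtype=float)
    xi_pred = np.asarray(xi_pred, dtype=float)
    return float(np.mean((xi_pred - xi_true) ** 2))


def reliability(xi_true, xi_pred):
    """Assumed definition: squared correlation between prediction and truth."""
    c = np.corrcoef(np.asarray(xi_true, dtype=float),
                    np.asarray(xi_pred, dtype=float))[0, 1]
    return float(c * c)


def rank_predictors(xi_true, predictions):
    """Return {name: (reliability, mse)} ordered from lowest to highest MSE."""
    stats = {name: (reliability(xi_true, pred), mse(xi_true, pred))
             for name, pred in predictions.items()}
    return dict(sorted(stats.items(), key=lambda kv: kv[1][1]))
```

A prediction that is shifted by a constant still has a reliability of 1 but a non-zero MSE under this definition. This is one reason to look at both statistics when choosing among the seven hybrid measures.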
We provide some further analysis to explore the implications of measurement error in both self-reported and measured anthropometrics for empirical research on the association between obesity and health care utilization. Specifically, we compare results when each of the self-reported, measured and hybrid BMI measures is used in turn as the explanatory variable. Measurement error is non-classical when it is systematically associated with the true values. Such error may bias regression models that use anthropometrics as a regressor, even when instrumental variable analysis is employed to deal with endogeneity or errors in variables (Cawley et al. 2015; O’Neill and Sweetman 2013).
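A stylized simulation illustrates why non-classical error matters for regression estimates. The Python sketch below makes several hypothetical choices: a single continuous BMI regressor, a linear outcome equation and arbitrary parameter values. It is not the hospital admissions specification used in this paper. The outcome depends on true BMI, while the self-report contains mean-reverting error \(\rho(\xi_{i} - \mu_{\xi})\) plus noise \(\eta_{i}\). The script compares the OLS slopes obtained with each regressor.

```python
import numpy as np


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


rng = np.random.default_rng(42)
n = 200_000
beta = 0.05                            # hypothetical effect of true BMI on the outcome
mu_bmi, sd_bmi = 27.0, 5.0             # hypothetical true BMI distribution
rho, mu_eta, sd_eta = -0.1, -0.5, 3.0  # hypothetical mean-reverting reporting error

bmi_true = rng.normal(mu_bmi, sd_bmi, n)
bmi_self = bmi_true + rho * (bmi_true - mu_bmi) + rng.normal(mu_eta, sd_eta, n)
y = 1.0 + beta * bmi_true + rng.normal(0.0, 1.0, n)

b_true = ols_slope(bmi_true, y)
b_self = ols_slope(bmi_self, y)

# Large-sample value of b_self under this design:
# beta * (1 + rho) * sd_bmi^2 / ((1 + rho)^2 * sd_bmi^2 + sd_eta^2)
b_self_theory = beta * (1 + rho) * sd_bmi**2 / ((1 + rho)**2 * sd_bmi**2 + sd_eta**2)
```

The large-sample formula shows that, depending on \(\rho\) and \(\sigma_{\eta}\), the slope based on self-reports can be attenuated or even inflated relative to \(\beta\). The bias is not simply the attenuation expected under classical error.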
3 Data
Data on self-reported and measured anthropometrics are extracted from the 2013 National Health Survey of Brazil (Pesquisa Nacional de Saúde, PNS 2013).Footnote 10 This is a cross-sectional dataset that is nationally representative for all Brazilian states and geographic regions. The survey focuses on the use of health care services, population health conditions and surveillance of chronic noncommunicable diseases and their associated risk factors. The PNS 2013 collects demographic and socioeconomic characteristics of all household members. In each household, one randomly selected member aged 18 or older has their body weight and height measured and also self-reports the same anthropometrics.Footnote 11 This results in a working sample of 37,335 respondents, men and non-pregnant women aged 20 or older, with valid self-reported and measured weight and height data. We focus on adults (aged 20+) to avoid any puberty-related changes in body size.
3.1 Self-reported and measured body weight and height data
Self-reported body weight and height are collected as part of the survey questionnaire. Measured weight and height are each collected twice by a trained survey team member at the end of the questionnaire. Weight is measured with a portable digital scale, following standard measurement protocols that require respondents to remove their shoes, heavy clothes, accessories and objects from their pockets (PNS 2013). Following common practice in the literature that uses measured health data, we take the second measurement of weight and height for our base case analysis to reduce potential errors in measured anthropometrics (e.g. Johnston et al. 2009; Davillas and Pudney 2017). A sensitivity analysis uses the average of the two measurements.Footnote 12
For height, a portable stadiometer is used to measure stature (PNS 2013). Measurement protocols for body height require respondents to remove their shoes and other accessories, if possible, and to keep at least three points of the body in contact with the posterior surface of the stadiometer (PNS 2013). International measurement protocols, together with validated and daily calibrated equipment, are used for the anthropometric physical measurements. These procedures were established in partnership with the Laboratory for Nutritional Evaluation of Populations (LANPOP), part of the School of Public Health at the University of São Paulo, to prevent biologically inaccurate anthropometric measurements (Szwarcwald et al. 2014).
Our analysis models all the hypothesized errors in measured and self-reported anthropometrics that are relevant to our dataset, as described in detail in Table 1. Along with the unconditional factor mixture measurement error models, we also estimate models that account for a parsimonious set of covariates to explore potential differences in measurement error patterns across population groups. Specifically, these models account for the respondent’s gender and age, with age captured by a six-category age-group variable (20–29, 30–39, 40–49, 50–59, 60–69, and 70 or more). Region of residence is captured by a categorical variable for the five geographical regions (often called macro-regions) of Brazil as defined by the Brazilian Institute of Geography and Statistics: North, Northeast, Central-West, Southeast and South.
3.2 Descriptive statistics
Figure 1 shows histograms of the raw difference between measured and self-reported body weight and height, as well as for BMI derived from the measured and self-reported anthropometrics; a normal distribution is overlaid on each histogram. The horizontal axis shows the raw reporting error in the units of each measure; negative values indicate that self-reports are higher than measured values, and vice versa. If every respondent’s self-reported and measured anthropometrics were identical, each histogram would be a single bar at zero.
Overall, across the graphs for body weight, height and BMI, the distribution of the raw difference between measured and self-reported values deviates from the normal distribution. Specifically, the histograms show more mass around zero than the normal distribution. Compared with the normal distribution, less mass is observed at moderate and larger raw differences for all anthropometrics, while there is more mass at very large raw differences. Finally, the distribution of the raw body height difference appears to be more skewed.
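The departures from normality described above, with more mass near zero and in the far tails and less at moderate values, are the signature of a heavy-tailed (leptokurtic) distribution. One simple way to quantify them, sketched below under our own assumptions (these are not statistics reported in the paper), is to compare the share of raw differences within \(k\) standard deviations of the mean with the share implied by a normal distribution, and to compute the excess kurtosis. The simulated mixture in the usage example is purely illustrative.

```python
import math

import numpy as np


def compare_with_normal(diff, k=0.5):
    """Contrast the central mass of raw differences with a fitted normal.

    Returns (empirical share within k SDs of the mean,
             share implied by a normal distribution,
             excess kurtosis: 0 for a normal, > 0 for heavy tails).
    """
    diff = np.asarray(diff, dtype=float)
    z = (diff - diff.mean()) / diff.std()
    empirical_share = np.mean(np.abs(z) <= k)
    normal_share = math.erf(k / math.sqrt(2.0))  # P(|Z| <= k), Z ~ N(0, 1)
    excess_kurtosis = np.mean(z ** 4) - 3.0
    return empirical_share, normal_share, excess_kurtosis


# Illustrative only: most respondents report almost exactly,
# a minority report with large errors.
rng = np.random.default_rng(1)
raw_diff = np.concatenate([rng.normal(0.0, 0.3, 7000), rng.normal(0.0, 4.0, 3000)])
print(compare_with_normal(raw_diff))
```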
Descriptive statistics for the self-reported and measured weight and height data as well as for BMI measures are presented in Table 3.Footnote 13 The mean self-reported weight (71.5 kg) is slightly smaller than the mean measured weight (72 kg). Mean self-reported height is 0.8 cm higher than measured height. Table 3 also shows that the mean absolute difference between the self-reported and measured data (expressed in terms of percentage of the measured values) is about 3% for body weight, 1% for height and 4.5% for the derived BMI measure.
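A minimal sketch of the percentage-difference statistic above, assuming it is computed as each respondent’s absolute difference between self-reported and measured values, expressed as a percentage of that respondent’s measured value and then averaged (the exact definition behind Table 3 is an assumption here). BMI is derived as weight in kilograms divided by squared height in metres. The respondent values are hypothetical.

```python
import numpy as np


def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) divided by squared height (m)."""
    height_m = np.asarray(height_cm, dtype=float) / 100.0
    return np.asarray(weight_kg, dtype=float) / height_m ** 2


def mean_abs_pct_diff(self_reported, measured):
    """Mean of |self-reported - measured| as a percentage of the measured value."""
    s = np.asarray(self_reported, dtype=float)
    m = np.asarray(measured, dtype=float)
    return 100.0 * np.mean(np.abs(s - m) / m)


# Hypothetical respondents (not PNS data).
w_self, w_meas = [70.0, 80.0], [72.0, 80.0]
h_self, h_meas = [165.0, 180.0], [164.0, 180.0]
print(mean_abs_pct_diff(w_self, w_meas))
print(mean_abs_pct_diff(bmi(w_self, h_self), bmi(w_meas, h_meas)))
```

Because BMI divides weight by squared height, over-reported height and under-reported weight both push self-reported BMI down, which helps explain why the BMI discrepancy exceeds that of either component.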
Existing literature argues that reporting error in self-reported body weight may be mean reverting when compared with measured data (Cawley et al. 2015): respondents with high (low) measured body weight tend to under-report (over-report) their body weight. To provide preliminary evidence on this, under the assumption that measured data are not subject to measurement error (an assumption we relax later), Fig. 2 shows the mean raw difference (measured minus self-reported) in body weight and height across deciles of the measured anthropometrics. For body weight, the mean raw reporting error becomes less negative across the first three deciles. This indicates that, on average, self-reported weight is higher than measured weight for those with the lowest measured weight. For the higher deciles of measured body weight, the raw error is positive and progressively increasing, indicating that measured weight exceeds self-reported weight, with under-reporting becoming more evident for those with higher measured weight.
Figure 2 also displays the mean raw differences for height. There is a progressively less negative mean raw difference moving to those of higher measured height up to the 80th percentile of measured height, i.e. self-reports of height are higher than measured data on average, with the over-reporting (almost) monotonically reducing in magnitude for those with higher measured height. For the two tallest deciles, the mean raw reporting error is positive, suggesting that those of very high measured height tend to under-report their height. Overall, and despite the observed differences between weight and height, these results show that respondents with high (very high) measured body weight (height) tend to under-report, while over-reporting is evident for those of lower measured values. These summary statistics provide initial evidence on the presence of mean-reverting error in self-reported anthropometrics under the assumption that measured data are not subject to measurement error. Although this is an assumption that we relax in our factor mixture models, this motivates accounting for mean reversion (or mean divergence) in the measurement error models.
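The decile profiles in Fig. 2 can be computed along the following lines. This is a sketch with hypothetical names: observations are grouped by deciles of the measured value, and the mean raw difference (measured minus self-reported) is computed within each group. Under a purely mean-reverting reporting rule, as in the simulated example, the profile is negative in the lowest deciles and positive in the highest, which is the pattern the text interprets as mean reversion.

```python
import numpy as np


def mean_diff_by_decile(measured, self_reported, n_groups=10):
    """Mean raw difference (measured minus self-reported) by group of measured values."""
    m = np.asarray(measured, dtype=float)
    s = np.asarray(self_reported, dtype=float)
    diff = m - s  # positive => self-report below the measured value
    edges = np.quantile(m, np.linspace(0.0, 1.0, n_groups + 1))
    # Interior cut points assign each observation to group 0, ..., n_groups - 1.
    groups = np.searchsorted(edges[1:-1], m, side="right")
    return np.array([diff[groups == g].mean() for g in range(n_groups)])


# Illustrative mean-reverting reports: everyone shades 20% towards the mean.
rng = np.random.default_rng(2)
measured = rng.normal(72.0, 15.0, 5000)
reported = measured + 0.2 * (measured.mean() - measured)
print(np.round(mean_diff_by_decile(measured, reported), 2))
```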
4 Results
4.1 Estimates of structural parameters: mixture model without covariates
Table 4 presents the estimates for the KY model (expressed in their natural metrics). Following Jenkins and Rios-Avila (2020), completely labelled observations are defined as those with \(\left| r_{i} - s_{i} \right| \le \delta\). The model presented in Table 4 assumes \(\delta = 0\), i.e. completely labelled observations are only those with no difference between self-reported and measured values. Under this demanding requirement, and given that measured and self-reported values are recorded with different precision, completely labelled cases represent just 10% and 23% of our observations for weight and height, respectively. A sensitivity analysis tests the robustness of our results when this requirement is relaxed.
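The share of completely labelled observations for a given tolerance \(\delta\) is straightforward to compute. The sketch below uses an illustrative function name and hypothetical data. It also includes an optional `round_measured` flag that rounds measured values to the nearest integer before the comparison, mirroring the rounding sensitivity check reported in the Appendix (Table 12). Rounding half-up is our assumption; NumPy’s `np.round` would instead round halves to the nearest even integer.

```python
import numpy as np


def completely_labelled_share(measured, self_reported, delta=0.0, round_measured=False):
    """Share of observations with |r_i - s_i| <= delta."""
    r = np.asarray(measured, dtype=float)
    s = np.asarray(self_reported, dtype=float)
    if round_measured:
        r = np.floor(r + 0.5)  # round half up to the nearest integer
    # Small tolerance guards against floating-point noise in the comparison.
    return np.mean(np.abs(r - s) <= delta + 1e-9)


measured = [72.0, 71.6, 80.3]  # hypothetical scale readings (0.1 kg precision)
reported = [72.0, 72.0, 80.0]  # hypothetical self-reports (whole kg)
print(completely_labelled_share(measured, reported))  # delta = 0
print(completely_labelled_share(measured, reported, round_measured=True))
```

The example shows why the base-case share is so low: with \(\delta = 0\), any self-report in whole kilograms can only match a scale reading that happens to end in .0.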
Table 4 shows that the mean of latent true body weight (\(\mu_{\xi}\)) is 71.9 kg (with a standard deviation \(\sigma_{\xi} = 14.9\)). The latent true weight distribution has a higher mean (by about 0.4 kg) than self-reported body weight (Table 3); the p-value for the difference in means is less than 0.01. The estimated mean of true body height is 164.5 cm (with a standard deviation \(\sigma_{\xi} = 9.4\)). This is 0.7 cm lower than the mean of self-reported height (Table 3).
The probability (\(\pi_{r}\)) that measured weight and height reflect the corresponding true values is high: 98.6% for weight and 96.7% for height. Error-prone measured body weight and height therefore occur with a low, but statistically significantly non-zero, probability (\(1 - \pi_{r}\)) of about 1.4% (p-value < 0.01) and 3.3% (p-value < 0.01), respectively. Error-prone measurement of body weight (reflecting recording errors) has an estimated mean (\(\mu_{\zeta}\)) of 78.9 kg for these erroneous observations, which is 7 kg (almost 10%) higher than the estimated mean of true weight; recording error in measured weight is also associated with a higher standard deviation (\(\sigma_{\zeta} = 19.4\)) than the estimated true weight distribution (\(\sigma_{\xi} = 14.9\)). Similarly, error-prone measured body height (subject to potential recording error) has an estimated mean (\(\mu_{\zeta}\)) of 159.8 cm for the erroneous observations, about 4.7 cm (2.9%) below the estimated mean of true height, and a lower estimated standard deviation than the true height distribution (\(\sigma_{\zeta} = 8.9\) compared with \(\sigma_{\xi} = 9.4\)).
Turning to self-reported weight and height, the estimated probability (\(\pi_{s}\)) that the self-reported anthropometrics equal the true body weight and height (i.e. they are free from any measurement error) is, as expected given the difference in precision of the two measures, relatively low at about 10% and 24%, respectively. Table 4 shows that mean reversion (\(\rho\)) in both self-reported body weight and height is small in magnitude (close to zero), although statistically significant at the 1% level. This indicates that, after accounting for all other sources of measurement error in self-reported data, mean reversion plays a limited role. Error due to reporting precision (precision error) in self-reported body weight and height has mean values (\(\mu_{\eta}\)) of −0.33 kg for weight and 0.4 cm for height. The estimated probability of Case S2 observations, \((1 - \pi_{s})(1 - \pi_{\omega})\), is about 62% for weight and 44% for height. Moreover, Table 4 shows that the probability \((1 - \pi_{s})\pi_{\omega}\) that self-reported anthropometric data contain additional measurement error (Case S3) is about 28% for self-reported weight and 31% for self-reported height.
Table 4 (Panel B) presents estimates of the membership probabilities for the six latent classes (as described in Table 2). The first latent class consists of error-free self-reported (S1) and measured (R1) anthropometric data, with a probability of 10% for body weight and 23% for height. The probability of error-free measured anthropometrics combined with survey reporting error in self-reported anthropometrics, \(\Pr(R = 1, S = 2)\), is about 61% for weight and 43% for height. The probability of error-free measured anthropometrics and additional reporting error in self-reported data, corresponding to the third latent class, is 27% for weight and 30% for height. The remaining latent classes, which involve recording errors in measured anthropometrics, have small probabilities. For instance, the probability that observations contain error in the self-reported data and recording error in the measured anthropometrics, corresponding to the fifth latent class, \(\Pr(R = 2, S = 2)\), is 0.9% for weight and 1.5% for height. Overall, these results indicate that although recording errors in measured body weight and height are non-negligible in size (the mean of the error-prone measurements differs from the true mean by about 7 kg and 4.7 cm, respectively), they occur with small probability.
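The class probabilities in Panel B are consistent with products of the error probabilities reported above. This is what one would expect if the error process for measured values (R) is independent of that for self-reports (S). The sketch below computes the six class probabilities under that assumption. The value \(\pi_{\omega} \approx 0.31\) for weight is not reported in this section; it is backed out from the reported Case S3 probability, \((1 - \pi_{s})\pi_{\omega} \approx 0.28\) with \(\pi_{s} = 0.10\).

```python
def latent_class_probs(pi_r, pi_s, pi_omega):
    """Six latent class probabilities Pr(R = r, S = s), assuming R and S independent.

    R = 1: measured value error-free; R = 2: recording error.
    S = 1: self-report error-free; S = 2: reporting error without the
    additional noise term; S = 3: reporting error plus additional noise.
    """
    p_r = {1: pi_r, 2: 1.0 - pi_r}
    p_s = {
        1: pi_s,
        2: (1.0 - pi_s) * (1.0 - pi_omega),
        3: (1.0 - pi_s) * pi_omega,
    }
    return {(r, s): p_r[r] * p_s[s] for r in (1, 2) for s in (1, 2, 3)}


# Body weight: pi_r and pi_s from the text; pi_omega backed out as described above.
probs = latent_class_probs(pi_r=0.986, pi_s=0.10, pi_omega=0.31)
for cls, p in probs.items():
    print(cls, round(p, 3))
```

Up to rounding, this reproduces the roughly 10%, 61%, 27% and 0.9% reported for weight in classes 1, 2, 3 and 5.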
We conducted a sensitivity analysis in which measured body weight and height are rounded to the nearest integer (Table 12, Appendix). This puts measured and reported data on the same scale, but it masks the part of measurement error that is attributable to the lack of precision in the recording of self-reported data. The six latent class probabilities change, reflecting the change in the proportion of completely labelled cases, \(\Pr(R = 1, S = 1)\). For instance, the increase in the probability of completely labelled cases relative to our base case (from 10% to 26.3% for weight, and from 23.3% to 32.4% for height) is mirrored by lower probabilities for latent classes two and three (Table 4 vs. Table 12).
Finally, we conducted a sensitivity analysis to explore whether our results presented in Table 4 are sensitive to using the average of the two weight and height measurements to define measured anthropometrics (for the mixture models). The corresponding parameter estimates and latent class probabilities (Table 13, Appendix) are practically identical to those presented in Table 4.
4.2 Mixture model with covariates
Table 5 reports the AIC and BIC for the KY models, separately for body weight and height, with no covariates (i.e. our baseline model), and for the KY models that account for our set of covariates. Across all factor mixture models for body weight and height, those that account for covariates have lower AIC and BIC as opposed to the counterparts without covariates (baseline models). Overall, it seems that models with covariates perform better than our baseline models, suggesting that the former can be used to explore potential differential patterns in measurement error across individual characteristics.
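For reference, AIC and BIC are computed from each model’s maximized log-likelihood \(\ln L\), number of estimated parameters \(k\) and number of observations \(n\) as \(\mathrm{AIC} = 2k - 2\ln L\) and \(\mathrm{BIC} = k\ln n - 2\ln L\); lower values are preferred. The sketch below uses hypothetical log-likelihoods and parameter counts, not the values in Table 5. It shows that a model with covariates is preferred by BIC only if its gain in log-likelihood outweighs the heavier \(\ln n\) penalty on its extra parameters.

```python
import math


def aic_bic(log_likelihood, n_params, n_obs):
    """Akaike and Bayesian information criteria (lower is better)."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    return aic, bic


n = 37_335  # working sample size
# Hypothetical fits: a baseline model vs. a model with 30 extra parameters.
baseline = aic_bic(log_likelihood=-250_000.0, n_params=12, n_obs=n)
with_covariates = aic_bic(log_likelihood=-249_700.0, n_params=42, n_obs=n)
print(baseline, with_covariates)
print("covariates preferred by both:", all(c < b for c, b in zip(with_covariates, baseline)))
```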
Tables 6 and 7 report the estimates of our factor mixture model with covariates for body weight and height, respectively. We report the APMs for the full sample ("all"), and for the specific groups of individuals based on the set of covariates we account for in the case of true latent body weight and height, precision error \(\left( {\eta_{i} } \right)\), additional random noise \(\left( {\omega_{i} } \right)\) and the mean-reverting error in self-reported anthropometrics (Panel A). Tables 6 and 7 also report the estimates of error probabilities and class probabilities (Panel B). Regarding the estimates of the membership probabilities for the six latent classes (Tables 6 and 7, Panel B), these are very similar to the corresponding results without covariates (Table 4).
True anthropometrics: The estimated mean of true body weight is 71.9 kg, with an SD of 13.8 (“all” estimates). As in our model without covariates (Table 4), the latent true weight distribution has a higher mean (by about 0.4 kg) than self-reported body weight (p-value of the difference in means < 0.001). Differences across individual characteristics are as expected. In line with existing findings (Fryar et al. 2021), men have a higher average and more dispersed body weight than women; gender differences in APMs are highly statistically significant (as shown by “+++”, reflecting the statistical significance of the pairwise comparisons of APMs by gender). Moreover, there is an inverted U-shaped association between mean true body weight and age (Baum 2007), and the dispersion of the true weight distribution also varies across age groups. We also observe systematic regional differences in mean latent weight (“+++” in Table 6, reflecting the joint significance of between-region differences in APMs). The APM for mean true body weight is higher in the South and the Southeast than in other regions, consistent with existing evidence of higher obesity rates in these regions of Brazil (Rimes-Dias et al. 2022).
Turning to latent true height (Table 7), the estimated mean is 164.4 cm (with a standard deviation \(\sigma_{\xi} = 6.848\)). In line with our model without covariates (Table 4), the estimated true mean height is lower (by 0.8 cm) than the mean of self-reported height (Table 3). Males and younger individuals have a higher mean true height, but also a more dispersed height distribution (Fryar et al. 2021). Table 7 also shows regional variation in the mean and standard deviation of the true height distribution.
Measured anthropometrics: The mean and standard deviation of error-prone measured body weight and height (\(\mu_{\zeta}, \sigma_{\zeta}\)) are comparable to the corresponding parameters from our baseline models without covariates (Tables 6 and 7 versus Table 4); estimating models with covariates thus does not change the conclusions of our analysis. Note that, given the absence of interviewer-level data, we do not condition these parameters on covariates when estimating the factor mixture models presented in Tables 6 and 7. Specifically, error-prone measured body weight has a mean and standard deviation (\(\mu_{\zeta} = 78.792\), \(\sigma_{\zeta} = 18.952\)) that are higher than the corresponding values for the true weight distribution. In line with our baseline model without covariates, the estimated mean of measured body weight for cases subject to (intentional or unintentional) recording error is around 7 kg (almost 10%) higher than the estimated mean of true weight.
Regarding height, the estimated mean of measured height for cases subject to error (error-prone measured body height) is around 163.0 cm. This is close to the corresponding estimate from the baseline model without covariates (about 160 cm in Table 4) and confirms our baseline finding that the mean of error-prone measured height is lower than the estimated mean of true height. In line with our baseline models, errors in measured anthropometrics occur with a probability (\(1 - \pi_{r}\)) that is very low (about 1.5% for both body weight and height) but statistically significantly different from zero (p-values < 0.01).
Measurement error heterogeneity in self-reported weight: The estimated mean precision error in self-reported body weight (\(\mu_{\eta}\)) is −0.29 (with a standard deviation \(\sigma_{\eta} = 1.60\)). Taking these values as a benchmark (“all” estimates, Table 6), there are systematic differences in the precision error distribution across population groups. The mean reporting precision error is positive for males (0.057) and negative for females (−0.60); gender differences in APMs are highly statistically significant (“+++”, reflecting the pairwise comparisons of APMs by gender). The standard deviation of the imprecision error is also higher for men. In other words, men and women appear to follow different reporting patterns for weight: a small upward mean bias with higher dispersion for males, and a downward bias with lower dispersion for females. Turning to age, the APM of mean imprecision error is negative for all age groups, with variation in both the mean imprecision error in body weight and its dispersion across age groups. There are systematic regional differences in mean imprecision error (“+++” in Table 6, reflecting the joint significance of between-region differences in APMs). The APM for mean imprecision error is more negative for the Southeast (−0.347) and Northeast (−0.292), while it is −0.180 for the South.
The additional random error in self-reported body weight has a negative mean (\(\mu_{\omega}\)), −0.460, and a standard deviation \(\sigma_{\omega} = 4.97\). As with the imprecision error, there is a gender difference: the random error is much more negative, and highly significant, for females (−0.713) than for males (−0.178). There are also systematic age variations (the joint test for between-age-group differences in APMs is statistically significant at the 1% level). Notably, the APM for the mean random error is negative and (non-monotonically) increasing in absolute terms for older respondents, while it is positive and larger in absolute magnitude for the oldest age group (70+); the oldest age group also has the highest dispersion (APM for the standard deviation is 5.7). APMs for the mean random error are negative and statistically significant for all regions, and the mean random errors and their dispersion differ between regions.
Turning to the APM for mean reversion (\(\rho\)), we confirm our results from the models without covariates: mean reversion is small in magnitude (close to zero) but statistically significant (“all” APM in Table 6). There are gender, between-age-group and between-region differences in mean-reverting patterns.
Measurement error heterogeneity in self-reported height: The estimated mean and standard deviation of the precision error in self-reported body height (\(\mu_{\eta}, \sigma_{\eta}\)) are 0.50 and 1.92, respectively (“all” estimates, Table 7). Taking these values as a benchmark, we observe differences in the precision error distribution across covariates (Table 7). The mean reporting precision error does not differ systematically by gender. The mean imprecision error in body height increases almost monotonically across age groups, and the dispersion of the imprecision error distribution follows a similar pattern in older age groups. There are systematic regional variations in mean imprecision error: the APM for mean imprecision error is most positive for the South (0.70), while the Northeast has the lowest estimated APM (0.39).
The additional random error in self-reported body height has a positive mean value \(\left( {\mu_{\omega } } \right)\), 1.28, and a standard deviation of \({ }\sigma_{\omega } = 4.96\) (“All” estimates in Table 7). Mean values of the additional random noise are higher for females as opposed to males (2.34 vs. 0.09). There are also systematic age variations (as shown by “+++” in Table 7)—notably the APM for the mean random error is positive and (mostly) increasing with age; the oldest age groups (60–69 and ≥ 70) have the highest mean of the random error and the highest dispersion. Systematic regional differences in APM for mean random error as well as for the dispersion of the distribution of the errors are also evident.
As in the baseline model without covariates (Table 4), the mean reversion \(\left( \rho \right)\) for the case of self-reported height is small in magnitude (close to zero) but statistically significant (“all” APM in Table 7).
4.3 Post-estimation analysis
Table 8 shows the precision of the seven types of “hybrid” predictions for body weight, for our factor mixture models with and without covariates, using simulations with 1000 replications. The results for height are shown in Table 9. Our first reliability measure is analogous to the slope coefficient from a (hypothetical) regression of true anthropometrics on the observed anthropometric measure; higher values correspond to greater reliability, and a value greater than one indicates mean reversion. Reliability 2 is the squared correlation between true anthropometrics and the observed anthropometric measure. These reliability measures should only be used to assess how close a given measure is to the relevant true value and should not be compared across model specifications. For the models of body weight and height, with and without covariates, all hybrid measures have very high reliability coefficients. A closer look at Tables 8 and 9 shows that the smallest MSE is found for the weighted (conditional) prediction for both anthropometric measures, in models with and without covariates. This predictor therefore performs best in terms of MSE in the out-of-sample simulations, and the weighted (conditional) prediction is our preferred “hybrid” prediction for both weight and height.
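The three precision statistics can be written compactly for a single simulated replication, in which the true values are known because the data are generated from the estimated model parameters. The sketch below is our illustration under that assumption (the function name is hypothetical); in the paper the statistics are computed over 1000 replications. Reliability 1 is \(\mathrm{Cov}(\xi, \hat{\xi})/\mathrm{Var}(\hat{\xi})\), Reliability 2 is \(\mathrm{Corr}(\xi, \hat{\xi})^{2}\), and the MSE is \(E[(\hat{\xi} - \xi)^{2}]\).

```python
import numpy as np


def precision_stats(true_vals, predicted):
    """Reliability 1, Reliability 2 and MSE of a prediction against known true values."""
    t = np.asarray(true_vals, dtype=float)
    p = np.asarray(predicted, dtype=float)
    cov = np.cov(t, p)                           # 2 x 2 sample covariance matrix
    reliability1 = cov[0, 1] / cov[1, 1]         # slope of true on predicted
    reliability2 = np.corrcoef(t, p)[0, 1] ** 2  # squared correlation
    mse = np.mean((p - t) ** 2)
    return reliability1, reliability2, mse


true_w = np.array([60.0, 70.0, 80.0])
print(precision_stats(true_w, true_w))        # perfect prediction
print(precision_stats(true_w, 1.1 * true_w))  # scaled prediction: reliability 1 < 1
```

The scaled example shows why the MSE complements the reliability measures: a prediction that is a scaled-up copy of the truth has Reliability 2 equal to one but a non-zero MSE.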
Simulation analysis helps to identify the preferred predictors of latent true body weight and height. Having chosen the preferred prediction, we apply it to the sample data to predict the latent true anthropometrics and calculate a BMI measure that aims to approximate the true BMI distribution. Table 10 provides descriptive statistics for this preferred “hybrid” BMI measure obtained from models with and without covariates. Table 10 also presents descriptive statistics for the “corrected self-reported BMI” measure, which is frequently used in studies that do not have access to measured anthropometric data (Cawley 2015).Footnote 14 Although the “corrected self-reported BMI” is not derived from the mixture measurement error models, including BMI measures based on “corrective” equations in Table 10 allows comparison with a popular measure used in the existing literature in the absence of measured BMI data.
The “hybrid” BMI measures, with and without covariates, and the BMI based on measured data are very close both at the mean and across their distribution. At the right tail, however, the p75 and p90 are slightly lower for the “hybrid” measures (both with and without covariates) than for the BMI based on measured data; this is also reflected in the difference in inter-quantile ranges between the “hybrid” measures and measured BMI. By contrast, BMI based on self-reported data is lower at the mean and at every reported quantile, and has a lower dispersion, compared with both the “hybrid” measures (from models with and without covariates) and the BMI based on measured data. Statistics for the “corrected self-reported BMI” depart from those for self-reported BMI: the “corrected self-reported BMI” has a higher mean and higher quantiles up to the median when compared with the “hybrid” measures, while the opposite holds at higher quantiles (p75 and p90).
Overall, these results suggest that similar “hybrid” measures are obtained from the models with and without covariates and that these measures are very close to the BMI measure based on the measured weight/height data. On the other hand, the BMI based on self-reported data under-estimates the “true” values. This indicates that the recording error in measured anthropometrics does not translate into major differences between the “hybrid” and the measured anthropometrics as a result of their small likelihood of occurrence in our sample. Finally, the distribution of the popular “corrected self-reported BMI” does not perform well as an alternative to measured BMI or our “hybrid” BMI measures.
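Corrective equations are typically estimated in a calibration sample that contains both measures. Measured weight (height) is regressed on self-reported weight (height), and the fitted coefficients are then used to predict “corrected” values where only self-reports are available. The sketch below shows the simplest version of this idea under our own assumptions: a single linear equation per measure, with no covariates and no separate equations by gender. The specification behind the corrected measure in Table 10 may differ, and the calibration data are hypothetical.

```python
import numpy as np


def fit_corrective_equation(self_reported, measured):
    """OLS of measured on self-reported values; returns (intercept, slope)."""
    s = np.asarray(self_reported, dtype=float)
    X = np.column_stack([np.ones_like(s), s])
    coef, *_ = np.linalg.lstsq(X, np.asarray(measured, dtype=float), rcond=None)
    return coef


def predict_corrected(coef, self_reported):
    """Apply the fitted corrective equation to self-reported values."""
    return coef[0] + coef[1] * np.asarray(self_reported, dtype=float)


def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) divided by squared height (m)."""
    return np.asarray(weight_kg, dtype=float) / (np.asarray(height_cm, dtype=float) / 100.0) ** 2


# Hypothetical calibration data in which heavier people under-report more.
w_self = np.array([55.0, 65.0, 75.0, 85.0])
w_meas = np.array([55.0, 66.0, 77.0, 88.0])
h_self = np.array([160.0, 168.0, 175.0, 182.0])
h_meas = np.array([159.0, 167.0, 174.0, 181.0])
w_coef = fit_corrective_equation(w_self, w_meas)
h_coef = fit_corrective_equation(h_self, h_meas)
corrected_bmi = bmi(predict_corrected(w_coef, w_self), predict_corrected(h_coef, h_self))
print(w_coef, h_coef, np.round(corrected_bmi, 2))
```

Because the correction is a fitted conditional mean, it shrinks individual predictions towards the regression line. This is one reason its distribution need not match that of true BMI, even when it corrects the mean.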
4.4 Implications for the association between BMI and health care utilization
In this sub-section, we provide evidence to test the sensitivity of econometric analyses where BMI is used as an explanatory variable. We compare results with the “hybrid” BMI measures, estimated from our factor mixture models, with those based on self-reported, “corrected self-reported” and measured anthropometrics.
We estimate linear regression models to measure the association between BMI and the frequency of hospital admissions in the previous 12 months (Table 11). To facilitate interpretation of specifications that use polynomials in BMI, Fig. 3 presents adjusted predictions at representative values (APRs), i.e. the predicted health care use at selected BMI values with all other variables held at their mean values (based on the models presented in Table 11). As shown in Fig. 3, the APRs for health care use are practically identical for the two “hybrid” BMI measures. The APRs for measured and “hybrid” BMI are similar across most of the BMI distribution, but they differ at the far-right tail (BMI values above 41.5 kg/m²). The results for self-reported BMI depart from those for our “hybrid” BMI measures (and measured BMI), especially at the lower (BMI below 23.5 kg/m²) and upper (BMI above 37 kg/m²) tails of the distribution. The APRs for the “corrected self-reported” BMI lie between those for self-reported and measured BMI, much closer to the latter.
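A minimal sketch of adjusted predictions at representative values for a specification with a polynomial in BMI follows. The cubic degree, the function names and the simulated data are our assumptions for illustration, since the exact polynomial order used in Table 11 is not restated here. The model is fitted by OLS, and predictions are then computed over a grid of BMI values with the other regressors held at their sample means.

```python
import numpy as np


def design(bmi, covariates, degree):
    """Intercept, BMI powers 1..degree and the covariate block."""
    bmi = np.asarray(bmi, dtype=float)
    powers = np.column_stack([bmi ** d for d in range(1, degree + 1)])
    return np.column_stack([np.ones(len(bmi)), powers, covariates])


def fit_ols(bmi, covariates, y, degree=3):
    """OLS coefficients for the polynomial-in-BMI specification."""
    coef, *_ = np.linalg.lstsq(design(bmi, covariates, degree), y, rcond=None)
    return coef


def adjusted_predictions(coef, bmi_grid, covariate_means, degree=3):
    """Predicted outcome at each BMI value, other regressors at their means."""
    grid = np.asarray(bmi_grid, dtype=float)
    cov_block = np.tile(np.asarray(covariate_means, dtype=float), (len(grid), 1))
    return design(grid, cov_block, degree) @ coef


# Simulated data (illustrative): admissions rise away from BMI 25, plus one covariate.
rng = np.random.default_rng(3)
bmi = rng.uniform(17.0, 45.0, 2000)
age = rng.uniform(20.0, 80.0, (2000, 1))
admissions = 0.05 + 0.0004 * (bmi - 25.0) ** 2 + 0.002 * age[:, 0]
coef = fit_ols(bmi, age, admissions)
grid = [20.0, 25.0, 30.0, 35.0, 40.0]
print(np.round(adjusted_predictions(coef, grid, age.mean(axis=0)), 4))
```

Running the same grid through models fitted with each BMI measure yields curves like those in Fig. 3. Because a polynomial is least constrained where data are sparse, differences between measures tend to show up first at the tails.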
Our aim in this sub-section is not to recommend whether or not “hybrid” BMI measures, based on factor mixture model specifications, should be used in empirical research on the economics of obesity. We only argue that, in analysing the association between BMI and health care utilization, results based on our “hybrid” BMI measures show some (limited) differences from those obtained with measured data at the far-right tail of the BMI distribution. Given our evidence that a small but statistically significant fraction of measured anthropometrics is attributable to recording errors, these differences suggest that hybrid measures may offer some advantages in analyses targeting the far-right tail of the BMI distribution, although more research is needed on this.
5 Conclusion
Existing research in the economics of obesity shows that self-reported data are subject to measurement error, which can lead to biased estimates in empirical research that relies on self-reported anthropometrics (e.g. Cawley 2015; Cawley et al. 2015; Davillas and Jones 2021; Gil and Mora 2011; O’Neill and Sweetman 2013). These analyses, however, assume that measured anthropometrics are error-free, treating them as a “gold standard” against which self-reported data are compared. The literature provides little discussion of the measurement errors that measured anthropometrics may themselves entail. This is particularly relevant given the growing integration of physical health measurements, alongside traditional self-reported measures, into large-scale social surveys. To fill this gap in the literature, we use the KY factor mixture model (Kapteyn and Ypma 2007) to analyse and characterize measurement error in both self-reported and measured anthropometrics using nationally representative data from the 2013 National Health Survey in Brazil.
We find that a very small, but statistically significant, fraction of measured anthropometrics may contain recording errors. The estimated probability that self-reported anthropometrics are free from any measurement error is, as expected, relatively low, at about 10% for body weight and 23% for height; these results are robust to accounting for covariates in our factor mixture models. This suggests that respondents’ lack of awareness of their true anthropometrics, combined with the limited precision of self-reported questionnaire responses, may be sources of the observed measurement error. For example, it has been argued that improving people’s knowledge of their exact anthropometric values (through monitoring interventions) may improve their ability to report them accurately (Sherry et al. 2007).
Of particular interest, our analysis reveals that, after accounting for other sources of error in self-reported data, mean-reverting errors in self-reported anthropometrics are small in magnitude. These findings contradict the existing literature that compares self-reported with measured anthropometrics and reports strong mean-reverting patterns, at least for body weight (e.g. Cawley et al. 2015). However, unlike our analysis, most studies that compare self-reported with measured anthropometrics assume that measured anthropometrics are error-free and do not account for other potential sources of measurement error. Our study is, therefore, potentially useful for exploring the sources of measurement error that may affect both self-reported and measured anthropometrics and the magnitude of bias that each source of error may cause.
Factor mixture models that account for covariates are used to explore potential heterogeneity across population groups, both in reporting errors in self-reported anthropometrics and in the true latent anthropometrics. A limitation of our analysis is the absence of interviewer-level data. As a result, our factor mixture models include no covariates in the measurement error component of measured anthropometrics. Had such data been available, we could have examined how measurement error in the physical measurements varies with interviewer characteristics and offered relevant recommendations to survey data teams for improving measurement protocols.
Latent true anthropometrics vary across age groups, by gender and across regions broadly in line with the existing literature (e.g. Arntsen et al. 2023; Baum and Ruhm 2009; Davillas and Jones 2020; Fryar et al. 2021; Rimes-Dias et al. 2022). Males have higher true mean body weight and height. Mean true body weight follows an inverted U-shaped pattern with age, and mean true height decreases monotonically with age, reflecting birth cohort effects and loss of height as people become older (e.g. Arntsen et al. 2023).
Overall, we find that being older is associated with larger reporting errors (due to imprecision or other random errors) in self-reported body weight and height. These results highlight the role of age-related changes in cognitive and communicative functioning in self-reported data (Knäuper et al. 2016). Age-related changes in cognitive ability, question interpretation and memory retrieval have been shown to affect people’s self-reports as they grow older. This is particularly relevant for self-reports of body weight and height, which draw on respondents’ cognitive ability, memory and ability to process information. Moreover, we find systematic gender differences in measurement error in self-reported weight and height, with women’s reports containing more error than men’s. These results are broadly in line with the existing literature (e.g. Cawley et al. 2015; Gil and Mora 2011). For example, women may experience greater social stigma than men for excess weight, or other social norm pathways affecting reporting behaviour in anthropometrics may be the relevant underlying mechanisms (Gil and Mora 2011; Puhl and Heuer 2009; Sattler et al. 2018). The observed regional differences in self-reported measurement error may reflect cultural, socioeconomic and demographic differences across Brazilian regions. However, because our analysis uses aggregate regional variation, it is not possible to disentangle the role of particular characteristics in shaping these results.
To explore the practical implications of our measurement error results, we use post-estimation analysis and out-of-sample simulations to estimate hybrid anthropometric predictions that best approximate the true body weight and height distributions. Our proxies for the true BMI distribution are very close to the distribution of BMI based on measured anthropometrics. By contrast, BMI based on self-reported data appears to under-estimate the true BMI distribution. “Corrected self-reported BMI” measures, based on conventional methods to mitigate reporting error in self-reports using predictions from corrective equations, do not perform as well as our “hybrid” BMI measures.
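BMI is weight in kilograms divided by squared height in metres. Under-reported weight and over-reported height therefore both push self-reported BMI down. The minimal Python sketch below shows this arithmetic for a single hypothetical respondent whose values are invented for illustration:

```python
def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index in kg/m^2 from weight in kg and height in cm."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


# Hypothetical respondent: measured 80 kg and 170 cm; self-reports 78 kg and 172 cm.
bmi_measured = bmi(80.0, 170.0)
bmi_self_reported = bmi(78.0, 172.0)
print(f"{bmi_measured:.2f} {bmi_self_reported:.2f}")  # 27.68 26.37
```

A 2 kg under-report combined with a 2 cm over-report lowers BMI by more than 1.3 kg/m² in this example.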
We analyse the potential implications of measurement error when different BMI measures are used as the explanatory variable in econometric models of health care utilization. Econometric results based on measured data and on our hybrid BMI measures are similar across most of the BMI distribution, with only small differences at the very top of the distribution (BMI values above 41.5 kg/m²). When results based on self-reported or “corrected self-reported” data are compared with those based on our hybrid BMI measures, differences also appear at the lower and upper BMI tails. Our findings further confirm existing evidence that BMI based on self-reported data may bias econometric results when BMI is used as an explanatory variable (e.g. Cawley et al. 2015). They also suggest that conventional ways of correcting self-reported anthropometrics may not mitigate this bias.
Measured anthropometrics may encompass some systematic measurement error, but our estimates suggest a very low prevalence of errors and this is reflected in the presence of only small differences, concentrated at the right tail of the distribution, compared to our proxies of true BMI. Nevertheless, the possibility of errors in measured anthropometrics should be acknowledged when searching for an error-free adiposity measure, especially when focusing on the extreme right tail of the distribution of BMI.
Notes
These fabrication errors (if they exist) are unlikely to result in mean reversion or mean divergence and are more likely to be fairly random. Existing studies have shown evidence of misperception of body size (Zelenytė et al. 2021). This suggests that interviewers may not be able to accurately predict participants’ body weight or height (if not measured), and thus could not make guesses that lead to mean reversion or mean divergence (i.e. guesses strongly correlated with true body weight and height).
The factor mixture measurement error model proposed by Kapteyn and Ypma (2007) assumes that observed administrative income data are a mixture of correct matches and mismatches (with survey data). However, they argue that, over and above potential mismatches in the linkage between administrative and survey data, the two sources may also capture conceptually different things. As such, they argue that there is no loss of generality in assuming that measurement error in administrative data may reflect different sources. Analogously, in our analysis, measurement error in measured anthropometrics may reflect different sources (as described above). These include, in particular, interviewer errors in transferring values from the measurement equipment to the survey materials, fabrication of anthropometric measurements by the interviewer, or physical measurements taken from the wrong household member.
Even in the case of fabricated interviews or when anthropometric measurement is not conducted for the intended respondent, this may be a strong assumption if quality control takes place. However, there is no such quality control undertaken in the dataset used in our analysis (as well as in many other multi-purpose social science datasets that collect anthropometrics).
Self-reported anthropometrics are collected as integer values (cm for height and kg for weight), while the corresponding physical measurements are recorded to one decimal place. Where the respondent provided a non-integer self-reported body weight and/or height (for example 61.5 kg), the interviewer recorded an integer value (such as 61 kg or 62 kg).
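The recording rule in this note adds an imprecision error even for a respondent who knows their weight exactly. The minimal Python sketch below assumes true values known to one decimal place and recorded as the nearest integer. It uses NumPy's `rint`, which sends exact halves to the nearest even integer, so 61.5 becomes 62. The note does not state which integer interviewers choose, so this rounding rule is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical true weights (kg), known to one decimal place like the measured data.
true_w = np.round(rng.normal(70.0, 15.0, 50_000), 1)
# Assumed recording rule: nearest integer, halves to even (61.5 -> 62, 62.5 -> 62).
recorded = np.rint(true_w)
rounding_error = recorded - true_w

print(np.abs(rounding_error).max())   # never more than half a unit
print(rounding_error.mean())          # close to zero
print(rounding_error.var())           # about 0.085, near the continuous-uniform 1/12
```

Under this assumed rule the recording step alone contributes an error with a standard deviation of roughly 0.3 kg (or cm). Rounding down or up consistently would instead shift the mean error by about half a unit.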
Mean reversion (ρ < 0) means that respondents with high (low) values of true anthropometric measures, relative to the true mean, tend to under-report (over-report) their body weight and height in self-reports; the opposite is the case for mean divergence (ρ > 0).
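A short simulation can check this sign convention. The minimal Python sketch below assumes a linear reporting rule, s = ξ + ρ(ξ − μ) + ε, where ε is pure noise. This functional form and all parameter values are illustrative assumptions, not the paper's estimated model. With ρ < 0, respondents above the mean under-report on average, those below over-report, and the slope of the reporting error on ξ is close to ρ.

```python
import numpy as np


def simulate_self_reports(xi, rho, noise_sd, rng):
    """Assumed linear reporting rule: s = xi + rho * (xi - mean(xi)) + noise."""
    return xi + rho * (xi - xi.mean()) + rng.normal(0.0, noise_sd, xi.size)


rng = np.random.default_rng(2)
xi = rng.normal(70.0, 15.0, 100_000)          # hypothetical true weights (kg)
s = simulate_self_reports(xi, rho=-0.1, noise_sd=2.0, rng=rng)
err = s - xi
above = xi > xi.mean()
print(err[above].mean(), err[~above].mean())  # negative above the mean, positive below
# Regressing the reporting error on xi gives a slope close to rho.
slope = np.polyfit(xi, err, 1)[0]
print(slope)
```

Setting `rho` to a positive value reverses both signs, which corresponds to mean divergence.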
Moreover, one may argue that survey mode could influence measurement error in self-reported anthropometrics. For example, social desirability bias is much lower with self-completion than with the open interview (Bowling 2005). Assuming that being taller and not having excess weight are more socially desirable, shorter people and those with excess weight may therefore show distinct reporting patterns across collection modes. However, existing studies do not confirm such influences on reporting errors. Davillas and Jones (2021) find that measurement errors in anthropometrics do not differ by mode of interview: similar patterns are observed when self-reported anthropometrics are collected using randomly assigned open interview and self-completion modes. Along similar lines, Cawley et al. (2015), who also discuss mean reversion in reporting error in weight, highlight that interviewers in their datasets do not amend or correct the self-reported anthropometrics based on measured data. No additional interviewer effects are therefore expected.
Failures of measurement equipment may also be a source of measurement error in physical measurements of anthropometrics. However, we believe that the risk of equipment failure is less relevant in our dataset given the prevention mechanisms and protocols described above.
The mean square error is computed as \(E\left( {{\text{predictor}} - \xi } \right)^{2} = {\text{Bias}}^{2} + {\text{Variance}}\). Reliability measures are computed as follows: \({\text{Rel}}1\left( r \right) = {\text{cov}} \left( {\xi ,r} \right)/{\text{var}} \left( r \right)\), \({\text{Rel}}1\left( s \right) = {\text{cov}} \left( {\xi ,s} \right)/{\text{var}} \left( s \right)\), \({\text{Rel}}2\left( r \right) = {\text{cov}} \left( {\xi ,r} \right)^{2} /\left[ {{\text{var}} \left( \xi \right) \cdot {\text{var}} \left( r \right)} \right]\) and \({\text{Rel}}2\left( s \right) = {\text{cov}} \left( {\xi ,s} \right)^{2} /\left[ {{\text{var}} \left( \xi \right) \cdot {\text{var}} \left( s \right)} \right]\). Further details can be found in Jenkins and Rios-Avila (2023a).
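The formulas in this note translate directly into code. The minimal Python sketch below assumes the latent truth ξ is known, which is only the case in a simulation, and reads “Variance” as the variance of the prediction error (predictor − ξ). The simulated self-report, with a −1 kg shift and 3 kg noise, is a hypothetical example.

```python
import numpy as np


def reliability_stats(xi, pred):
    """MSE (with its bias/variance split), Rel1 and Rel2 of a predictor of xi."""
    xi, pred = np.asarray(xi, float), np.asarray(pred, float)
    err = pred - xi
    bias, var_err = err.mean(), err.var()     # population (ddof=0) moments
    mse = np.mean(err ** 2)                   # equals bias**2 + var_err
    c = np.cov(xi, pred)                      # 2x2 sample covariance matrix
    var_xi, var_p, cov_xp = c[0, 0], c[1, 1], c[0, 1]
    rel1 = cov_xp / var_p                     # Rel1 = cov(xi, pred) / var(pred)
    rel2 = cov_xp ** 2 / (var_xi * var_p)     # Rel2 = cov^2 / (var(xi) * var(pred))
    return {"mse": mse, "bias": bias, "variance": var_err, "rel1": rel1, "rel2": rel2}


rng = np.random.default_rng(5)
xi = rng.normal(70.0, 15.0, 100_000)                 # hypothetical true weights (kg)
s = xi + rng.normal(-1.0, 3.0, xi.size)              # hypothetical error-prone report
stats = reliability_stats(xi, s)
print(stats)   # mse near 1 + 9 = 10; rel1 and rel2 near 225 / 234 (about 0.96)
```

With purely additive noise, as here, Rel1 and Rel2 coincide. They diverge once the error is correlated with ξ, for example under mean reversion.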
The 2013 National Health Survey of Brazil is publicly available online: https://www.ibge.gov.br/estatisticas/sociais/saude/9160-pesquisa-nacional-de-saude.html?=&t=microdados.
In PNS-2019, which collected data in 2019, body weight and height were measured for a much smaller sub-sample of respondents because of the difficulty of taking physical anthropometric measurements for the full sample selected for individual interviews (Reis et al. 2022). By contrast, in PNS-2013, anthropometric measurements were carried out on all residents selected for the individual interview, except pregnant women (Damacena et al. 2015). Collecting both self-reported and measured anthropometrics in the same wave is necessary for our research question and for the estimation requirements of our factor mixture models. Measured anthropometrics are available for only a small fraction of the PNS-2019 sample, and timeliness is not a constraint given the scope and nature of our research question. We therefore use the PNS-2013 data for our analysis.
Figure 4 (Appendix) plots the absolute differences between the first and second physical measurements of body weight and height. The mass of the absolute differences is concentrated at zero. Only a few observations have absolute differences between the first and second measurements that exceed 1.5 kg (for body weight) or 1.5 cm (for body height).
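The quantity plotted in Figure 4 is simple to compute. The minimal Python sketch below uses five hypothetical pairs of repeated weight measurements and the 1.5 threshold mentioned above. The data and the function name are illustrative.

```python
import numpy as np


def share_large_discrepancies(first, second, threshold=1.5):
    """Absolute differences between two repeated measurements and the share
    exceeding `threshold` (kg for weight, cm for height)."""
    abs_diff = np.abs(np.asarray(first, dtype=float) - np.asarray(second, dtype=float))
    return abs_diff, float(np.mean(abs_diff > threshold))


# Hypothetical first and second weight measurements (kg) for five respondents.
first_w = np.array([70.2, 65.0, 82.4, 90.1, 58.3])
second_w = np.array([70.2, 65.1, 82.4, 92.0, 58.3])
abs_diff, share = share_large_discrepancies(first_w, second_w)
print(share)  # 0.2: one of five pairs differs by more than 1.5 kg
```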
The corresponding kernel density distributions for self-reported and measured body weight, height and BMI are presented in Figure 5 (Appendix). It seems that both self-reported and measured body height data have approximately normally shaped distributions, although right-skewed distributions are observed for the case of body weight and BMI. This is important as our model assumes normality for the factor distributions and identification of the components of the mixture of normals stems from non-normality in the (joint) distribution of observed outcomes.
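The identification argument rests on departures from normality. The minimal Python sketch below compares a single normal distribution with a two-component normal mixture. The component means, standard deviations and mixing weight are hypothetical values, not estimates from the paper. The single normal has sample skewness near zero, while the mixture is clearly right-skewed, which is the kind of shape seen for weight and BMI.

```python
import numpy as np


def sample_skewness(x):
    """Moment-based skewness: third central moment divided by sd cubed."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5


rng = np.random.default_rng(3)
n = 200_000
single_normal = rng.normal(70.0, 12.0, n)
# Hypothetical mixture: 80% from N(65, 10^2) and 20% from N(90, 15^2).
from_second = rng.random(n) < 0.2
mixture = np.where(from_second, rng.normal(90.0, 15.0, n), rng.normal(65.0, 10.0, n))

print(sample_skewness(single_normal))  # close to 0
print(sample_skewness(mixture))        # about 0.89 in theory: right-skewed
```

Skewness is only one symptom of non-normality. The factor mixture model uses the full joint distribution of the observed outcomes, which this sketch does not attempt to reproduce.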
Existing studies in the economics of obesity that rely on self-reported anthropometrics often estimate corrective equations (or use the coefficients from existing equations). These equations are based on the relationship between measured and self-reported body weight and height in alternative data sources (Cawley 2015). To mimic these correction procedures, we estimate analogous “corrective” equations by regressing measured weight and height on the self-reports and a vector of demographics (results are available in the Appendix, Table 14). The predictions from these equations serve as self-reports of body weight and height corrected for reporting error. These predictions form our “corrected self-reported BMI” measure, whose results are presented in Tables 10 and 11.
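The minimal Python sketch below illustrates such a corrective equation on simulated data. Measured weight is regressed by OLS on the self-report and two demographics (age and a female indicator), and the fitted values serve as “corrected” self-reports. The data-generating process and the choice of covariates are illustrative assumptions. Fitting and applying the equation on the same sample is also a simplification; in practice the coefficients often come from a separate dataset.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5_000
# Hypothetical data containing both measured and self-reported weight (kg).
age = rng.uniform(18, 80, n)
female = (rng.random(n) < 0.5).astype(float)
measured = rng.normal(72.0, 14.0, n)
# Assumed reporting pattern: under-reporting, larger for women, recorded as integers.
self_rep = np.rint(measured - 1.0 - 0.5 * female + rng.normal(0.0, 2.5, n))

# Corrective equation: OLS of measured weight on the self-report and demographics.
X = np.column_stack([np.ones(n), self_rep, age, female])
beta, *_ = np.linalg.lstsq(X, measured, rcond=None)

# Fitted values are the "corrected" self-reports.
corrected = X @ beta
print(beta)
print(np.mean(self_rep - measured), np.mean(corrected - measured))
```

Because the regression includes an intercept, the corrected values match measured weight on average by construction. This says nothing about how well they track individual values or the tails of the distribution.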
References
Arntsen SH, Borch KB, Wilsgaard T, Njølstad I, Hansen AH (2023) Time trends in body height according to educational level: a descriptive study from the Tromsø Study 1979–2016. PLoS ONE 18(1):e0279965
Baum CL II, Ruhm CJ (2009) Age, socioeconomic status and obesity growth. J Health Econ 28(3):635–648
Baum CL (2007) The effects of race, ethnicity, and age on obesity. J Popul Econ 20:687–705
Bilger M, Kruger EJ, Finkelstein EA (2017) Measuring socioeconomic inequality in obesity: looking beyond the obesity threshold. Health Econ 26:1052–1066
Bowling A (2005) Mode of questionnaire administration can have serious effects on data quality. J Public Health 27(3):281–291
Cawley J (2015) An economy of scales: a selective review of obesity’s economic causes, consequences, and solutions. J Health Econ 43:244–268
Cawley J, Meyerhoefer C (2012) The medical care costs of obesity: an instrumental variables approach. J Health Econ 31(1):219–230
Cawley J (2004) The impact of obesity on wages. J Hum Resources 39(2):451–474
Cawley J, Maclean JC, Hammer M, Wintfeld N (2015) Reporting error in weight and its implications for bias in economic models. Econ Hum Biol 19:27–44
Damacena GN, Szwarcwald CL, Malta DC et al (2015) The development of the National Health survey in Brazil, 2013. Epidemiologia e Serviços De Saúde 24:197–206
Davillas A, Benzeval M (2016) Alternative measures to BMI: exploring income-related inequalities in adiposity in Great Britain. Soc Sci Med 166:223–232
Davillas A, Jones AM (2020) Regional inequalities in adiposity in England: distributional analysis of the contribution of individual-level characteristics and the small area obesogenic environment. Econ Hum Biol 38:100887
Davillas A, Jones AM (2021) The implications of self-reported body weight and height for measurement error in BMI. Econ Lett 209:110101
Davillas A, Pudney S (2017) Concordance of health states in couples: analysis of self-reported, nurse administered and blood-based biomarker data in the UK Understanding Society panel. J Health Econ 56:87–102
Davillas A, Pudney S (2020a) Biomarkers as precursors of disability. Econ Hum Biol 36:100814
Davillas A, Pudney S (2020b) Biomarkers, disability and health care demand. Econ Hum Biol 39:100929
Engstrom JL, Paterson SA, Doherty A et al (2003) Accuracy of self-reported height and weight in women: an integrative review of the literature. J Midwifery Womens Health 48(5):338–345
Finn A, Ranchhod V (2017) Genuine fakes: the prevalence and implications of data fabrication in a large South African survey. World Bank Econ Rev 31(1):129–157
Fryar CD, Carroll MD, Gu Q, Afful J, Ogden CL (2021) Anthropometric reference data for children and adults. U.S. Department of Health & Human Services, National Center for Health Statistics, United States
Gil J, Mora T (2011) The determinants of misreporting weight and height: the role of social norms. Econ Hum Biol 9:78–91
Gorber SC, Tremblay M, Moher D, Gorber B (2007) A comparison of direct vs. self-report measures for assessing height, weight and body mass index: a systematic review. Obes Rev 8(4):307–326
Groves RM (2005) Survey errors and survey costs. Wiley
Jenkins SP, Rios-Avila F (2020) Modelling errors in survey and administrative data on employment earnings: sensitivity to the fraction assumed to have error-free earnings. Econ Lett 192:109253
Jenkins SP, Rios-Avila F (2021) Measurement error in earnings data: replication of Meijer, Rohwedder, and Wansbeek’s mixture model approach to combining survey and register data. J Appl Economet 36(4):474–483
Jenkins SP, Rios-Avila F (2023a) Reconciling reports: modelling employment earnings and measurement errors using linked survey and administrative data. J R Stat Soc Ser A Stat Soc 186(1):110–136
Jenkins SP, Rios-Avila F (2023b) Finite mixture models for linked survey and administrative data: estimation and post-estimation. Stata J 23(1):53–85
Johnston DW, Propper C, Shields MA (2009) Comparing subjective and objective measures of health: evidence from hypertension for the income/health gradient. J Health Econ 28(3):540–552
Kapteyn A, Ypma JY (2007) Measurement error and misclassification: a comparison of survey and administrative data. J Labor Econ 25:513–551
Keith SW, Fontaine KR, Pajewski NM, Mehta T, Allison DB (2011) Use of self-reported height and weight biases the body mass index–mortality association. Int J Obes 35(3):401–408
Knäuper B, Carrière K, Chamandy M, Xu Z, Schwarz N, Rosen NO (2016) How aging affects self-reports. Eur J Ageing 13:185–193
Li J, Simon G, Castro MR, Kumar V, Steinbach MS, Caraballo PJ (2021) Association of BMI, comorbidities and all-cause mortality by using a baseline mortality risk model. PLoS ONE 16(7):e0253696
Lin X, Xu Y, Jl Xu et al (2020) Global burden of noncommunicable disease attributable to high body mass index in 195 countries and territories, 1990–2017. Endocrine 69(2):310–320
Ljungvall Å, Gerdtham UG, Lindblad U (2015) Misreporting and misclassification: implications for socioeconomic disparities in body-mass index and obesity. Eur J Health Econ 16:5–20
Meijer E, Rohwedder S, Wansbeek T (2012) Measurement error in earnings data: using a mixture model approach to combine survey and register data. J Bus Econ Stat 30:191–201
Ng M, Fleming T, Robinson M et al (2014) Global, regional, and national prevalence of overweight and obesity in children and adults during 1980–2013: a systematic analysis for the Global Burden of Disease Study 2013. The Lancet 384(9945):766–781
O’Neill D, Sweetman O (2013) The consequences of measurement error when estimating the impact of obesity on income. IZA J Labor Econ 2(1):1–20
Olbrich L, Kosyakova Y, Sakshaug JW (2022) The reliability of adult self-reported height: the role of interviewers. Econ Hum Biol. https://doi.org/10.1016/j.ehb.2022.101118
PNS (2013) Pesquisa Nacional de Saúde 2013 – Manual de Antropometria. Instituto Brasileiro de Geografia e Estatistica. Rio de Janeiro. Available at: https://biblioteca.ibge.gov.br/visualizacao/instrumentos_de_coleta/doc3426.pdf
Prospective Studies Collaboration, Whitlock G, Lewington S et al (2009) Body-mass index and cause-specific mortality in 900000 adults: collaborative analyses of 57 prospective studies. The Lancet 373(9669):1083–1096
Puhl RM, Heuer CA (2009) The stigma of obesity: a review and update. Obesity 17(5):941–964
Reis RCPD, Duncan BB, Malta DC et al (2022) Evolution of diabetes in Brazil: prevalence data from the 2013 and 2019 Brazilian National Health Survey. Cad Saude Publica 38:e00149321. https://doi.org/10.1590/0102-311X00149321
Rimes-Dias KA, Costa JC, Canella DS (2022) Obesity and health service utilization in Brazil: data from the National Health Survey. BMC Public Health 22(1):1474
Rooth DO (2009) Obesity, attractiveness, and differential treatment in hiring: a field experiment. J Hum Resources 44(3):710–735
Rtveladze K, Marsh T, Webber L et al (2013) Health and economic burden of obesity in Brazil. PLoS ONE 8(7):e68785. https://doi.org/10.1371/journal.pone.0068785
Sattler KM, Deane FP, Tapsell L, Kelly PJ (2018) Gender differences in the relationship of weight-based stigmatisation with motivation to exercise and physical activity in overweight individuals. Health Psychol Open 5(1):2055102918759691. https://doi.org/10.1177/2055102918759691
Sherry B, Jefferds ME, Grummer-Strawn LM (2007) Accuracy of adolescent self-report of height and weight in assessing overweight status: a literature review. Arch Pediatr Adolesc Med 161(12):1154–1161
Szwarcwald CL, Malta DC, Pereira CA et al (2014) Pesquisa Nacional de Saúde no Brasil: concepção e metodologia de aplicação. Cien Saude Colet 19(2):333–342
Triaca LM, Jacinto PA, França MTA, Tejada CAO (2020) Does greater unemployment make people thinner in Brazil? Health Econ 29:1279–1288
U.S.D.H.H.S. (2010) The Surgeon General’s Vision for a Healthy and Fit Nation. U.S. Department of Health and Human Services, Office of the Surgeon General, Rockville, MD
Zelenytė V, Valius L, Domeikienė A et al (2021) Body size perception, knowledge about obesity and factors associated with lifestyle change among patients, health care professionals and public health experts. BMC Fam Pract 22(1):1–13
Zhang Q, Wang Y (2004) Socioeconomic inequality of obesity in the United States: do gender, age, and ethnicity matter? Soc Sci Med 58(6):1171–1180
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Davillas, A., de Oliveira, V.H. & Jones, A.M. A model of errors in BMI based on self-reported and measured anthropometrics with evidence from Brazilian data. Empir Econ (2024). https://doi.org/10.1007/s00181-024-02616-w
DOI: https://doi.org/10.1007/s00181-024-02616-w