1 Introduction

Obesity is a strong predictor of overall mortality (Li et al. 2021; Prospective Studies Collaboration et al. 2009) and an important risk factor for several noncommunicable diseases such as cardiovascular diseases, diabetes, musculoskeletal disorders and some cancers (Lin et al. 2020). A large literature has explored the economic and social ramifications of obesity, such as poorer labour market outcomes, increased health care utilization and associated public health costs (e.g. Cawley 2004, 2015; Rooth 2009). Moreover, studies have investigated and measured socioeconomic inequalities in obesity (e.g. Bilger et al. 2017; Davillas and Benzeval 2016; Zhang and Wang 2004).

Despite an influential report on the importance of physically measured health indicators for understanding how the social and economic environment may get under the skin, several multi-purpose social science datasets continue to collect only self-reported weight and height data (Cawley 2015). Some existing studies do use datasets that collect measured anthropometrics, often in addition to self-reported anthropometric data (e.g. Cawley 2015; Cawley et al. 2015; Davillas and Jones 2021; Gil and Mora 2011). Studies that analyse measurement error in anthropometric data typically compare self-reports and measured anthropometric data; this research explicitly assumes that measured anthropometric data are error-free “gold-standard” measures. Specifically, Cawley et al. (2015), using the US National Health and Nutrition Examination Survey (NHANES) for the period between 2003 and 2010, compare self-reports with measured weight and height data. They find that reporting error in self-reported data is non-classical, with those who are underweight, based on measured anthropometrics, tending to over-report and overweight and obese respondents tending to under-report their weight. Gil and Mora (2011) use data from the 2006 Catalan Health and Health Examination Surveys and compare self-reported with measured anthropometrics. They find that social norms regarding ideal weight may affect reporting bias in self-reported anthropometrics. Those respondents who are more satisfied with their own body image are less prone to under-report their weight, although these results are subject to the definition of social norms on body image.

Using UK data, Davillas and Jones (2021) conducted an experiment to explore the extent of measurement error in body mass index (BMI), when self-reported body weight and height data are compared to measured anthropometrics. This study shows non-classical reporting error in height and weight: taller people seem to report their height more accurately, and reporting errors in self-reported body weight increase sharply for those of greater measured weight. Further analysis shows that heterogeneity in self-reported anthropometrics is associated with within-household measured BMI data. A study employing Swedish data (Ljungvall et al. 2015) finds reporting error in self-reports of BMI (called misreporting in the study), when compared to measured anthropometric data, and systematic social patterning in misreporting, which matters for estimation of the education and income gradients in BMI when based on self-reports. O’Neill and Sweetman (2013), using a selective sample of mothers from the Irish Cohort Study, find that self-reported BMI, as compared to measured anthropometrics, is subject to substantial measurement error, which also causes an overestimation of the relationship between BMI and income. Finally, the role of interviewers is examined by Olbrich et al. (2022). Using various datasets from the USA, UK and Germany, this study shows that interviewers play an important role in differences between reported and measured body height data as well as in the changes in reported height over survey waves.

Over and above the fact that these existing studies assume that measured anthropometrics are error-free, they mostly compare self-reports and measured anthropometric data that were collected with a considerable time difference or where respondents were informed about the subsequent physical measurements (Cawley et al. 2015; Gil and Mora 2011). The related medical literature is often based on selected age groups and non-representative samples, and it aims neither to characterize the measurement error nor to quantify the implications of the measurement error for economic modelling (e.g. Engstrom et al. 2003; Gorber et al. 2007; Keith et al. 2011).

Despite the fact that measured anthropometrics are assumed to be error-free in much of the existing literature, the accuracy of measured anthropometrics may indeed be affected by several factors. For instance, recent evidence has documented the influence of interviewers on reliability of measured and self-reported body height data in different surveys (e.g. Finn and Ranchhod 2017; Olbrich et al. 2022). Potential sources of measurement error include both unintentional (such as accidental recording errors from measurement equipment to survey materials) and intentional (i.e. fabricating parts of the measurement or even conducting physical measurements on the wrong respondent) recording errors (e.g. Finn and Ranchhod 2017; Groves 2005; Olbrich et al. 2022). These may not be easy to detect if the interviewers visited the household to conduct the interview (Olbrich et al. 2022). Moreover, in some datasets, interviewers may visit the household more than once to complete a socioeconomic questionnaire and collect physical measurements, including anthropometrics; this increases the likelihood that mis-identification takes place. More broadly, the literature has discussed the presence of measurement error in more objectively measured nurse-collected and blood-based health data (Davillas and Pudney 2020a,b). These studies use latent variable models to account for measurement error, but they do not aim to explicitly model measurement error or to explore its potential implications for economic models. Overall, there is limited research that has access to both self-reported and measured anthropometric data collected within the same survey wave.

Our paper contributes to the literature in various ways. We model potential measurement error in both self-reported and measured anthropometrics (i.e. body weight and body height). We use data from the 2013 National Health Survey (Pesquisa Nacional de Saúde; PNS 2013) of Brazil, which is a nationally representative dataset that allows for measured and self-reported data on body weight and height to be collected from the same individuals within the span of a household interview. In Brazil, obesity has systematically increased since the 2010s, with one in every five adults experiencing obesity (Triaca et al. 2020). Projections of the obesity-related costs in Brazil show that the annual health care costs may double from 2010 ($5.8 billion) to 2050 ($10.1 billion), for a total health care cost of $330 billion over 40 years (Rtveladze et al. 2013). As such, obesity is an important public health concern for Brazil.

To analyse measurement error in the Brazilian data, we use a factor mixture model, initially proposed by Kapteyn and Ypma (2007). This Kapteyn and Ypma (KY) factor mixture model is applied and extended by Jenkins and Rios-Avila (2020, 2021, 2023a) to analyse measurement error in self-reported and administrative income data. To the best of our knowledge, the KY factor mixture model has not been used to analyse measurement error in self-reported and measured anthropometric data. Unlike the existing literature that assumes no measurement error in measured body weight and height data, our analysis allows us to model different types of errors in both self-reported and measured anthropometrics. Specifically, we test the hypothesis that measured anthropometrics encompass data recording errors. Moreover, the self-reported anthropometric data are assumed to be subject to a wider set of measurement errors. These include the precision of the scale for the self-reported data, which are only recorded as whole numbers (in cm or kg), non-classical mean-reverting errors and other types of remaining errors. As permitted by our data, we also estimate factor mixture models that account for individual-level covariates to explore the extent to which true latent anthropometrics as well as reporting error in self-reported anthropometrics, and their dispersion, may vary across population groups. The absence of interviewer-level data, however, prevents us from exploring heterogeneity in measured anthropometrics due to interviewer characteristics.

Our analysis also allows us to estimate the probability of each type of measurement error in both self-reported and measured data. Of particular interest, given that measured anthropometric data are often considered error-free (e.g. Cawley 2015; Davillas and Jones 2021; Gil and Mora 2011), our results suggest that a small but systematic fraction of measured anthropometrics contain data recording errors. Turning to self-reported weight and height, the estimated probabilities that the self-reported anthropometrics equal the true body weight and height are relatively low, at 10% and 23%, respectively.

Post-estimation analysis allows us to generate a set of predictions of the distribution of the true latent weight and height data that combine information from both self-reported and measured anthropometrics. Based on reliability measures and mean squared errors, estimated using simulated out-of-sample predictions, we select the best performing predictions of latent weight and height distributions. After choosing our preferred prediction, for our factor mixture models with and without covariates, our sample data are used to compute body weight and height measures that approximate the true values; these are then used to calculate our proxies of the true BMI distribution.

Finally, we compare the distributions of BMI using self-reported, measured and our proxies of true BMI; the latter are very close to the distribution of BMI based on measured anthropometrics, while the BMI based on self-reported data under-estimates the true BMI distribution. We also employ the “corrected self-reported BMI”, a conventional measure used in the existing economics of obesity literature to correct self-reported data for reporting error (Cawley 2015), as an additional measure. Our results show that these “corrected self-reported BMI” measures are not a good alternative to our “hybrid” BMI measures. In addition, we provide evidence to explore the potential implications of the measurement error in both self-reported and measured anthropometrics. As an illustration, we compare results when each of the self-reported, “corrected self-reported”, measured and hybrid BMI measures is used as an explanatory variable in linear regression models for the frequency of hospital admissions in the past 12 months. We find only moderate differences in the results between the hybrid BMI measure and the one based on measured anthropometrics, and these are concentrated in the far-right tails of the BMI distribution. More pronounced disparities are observed, at the lower and higher BMI tails, when our hybrid measures are compared with BMI measures based on self-reported or “corrected self-reported” data.

Understanding and characterizing measurement error in both self-reported and measured anthropometrics has important public health implications. Self-reported and/or physical measurements of anthropometrics are collected in several nationally representative surveys. For example, the Survey of Health, Ageing and Retirement in Europe (SHARE), the European Community Household Panel (ECHP), the German Socio-Economic Panel (GSOEP) as well as the National Longitudinal Survey of Youth (NLSY), the Medical Expenditure Panel Survey (MEPS) and the Behavioural Risk Factor Surveillance System (BRFSS) are datasets that are frequently used for obesity research but are limited to self-reports of body weight and height data. Recent advances in survey measurement allow for measured anthropometrics to be collected as part of multi-purpose social science surveys to improve data reliability on anthropometric measurement (Cawley et al. 2015). Data from nationally representative surveys are used to estimate obesity prevalence at the national level as well as for international comparisons (Ng et al. 2014). Measurement errors that may contaminate both self-reported and measured anthropometrics may affect within and between country and region comparisons of obesity prevalence and estimates of the population at increased health risks. Depending on the size of the measurement error, this may mislead potential public policies to mitigate regional or cross-country differences in excess body weight. Moreover, studies that quantify the (public) health care costs associated with obesity and related diseases often rely on survey data (Cawley and Meyerhoefer 2012); this research is influential and is used to justify government programmes to prevent obesity on the grounds of external costs (USDHHS 2010). The extent to which these estimates may be biased due to measurement error in measured and/or self-reported anthropometrics collected in surveys is of relevance from a public health point of view given the cost savings from reducing obesity prevalence.

The rest of the paper is organized as follows. Section 2 presents the methods used to analyse measurement error in both self-reported and measured anthropometric data. Our data source and descriptive statistics are presented in Sect. 3. The results of our analysis, post-estimation predictions and a preliminary analysis of the potential implications of measurement error in both self-reported and measured anthropometrics for economic research are presented in Sect. 4. Section 5 concludes and provides a summary of our findings.

2 Methods

We adapt the factor mixture model, proposed by Kapteyn and Ypma (2007), to model the relationship between measured and self-reported anthropometrics. This model has been applied and extended by Jenkins and Rios-Avila (2020, 2021, 2023a) to analyse measurement error in income data. In this study, we apply the KY model to measurement error in both self-reported and measured anthropometric data, on weight and height, using the 2013 National Health Survey of Brazil.

We assume that the true values of each anthropometric measure (weight or height) for an individual \(i\) \(\left( {\xi_{i} } \right)\) are unobserved, but we can observe both measured \(\left( {r_{i} } \right)\) and self-reported \(\left( {s_{i} } \right)\) anthropometrics. Table 1 provides a description of the types of errors in measured and self-reported anthropometric data that can be captured by our factor mixture model.

Table 1 Types of measurement error and their sources used in the factor mixture models

Measured anthropometrics are collected at the end of the individual questionnaire in our dataset. According to the survey protocol, the individual interview can be completed in more than one visit, so the physical measurements (which are time consuming as they include anthropometrics and blood pressure measurements) may not take place on the same day as the main interview. Also, the measured anthropometrics are recorded by the interviewer by hand in the survey materials. Thus, measured anthropometrics may suffer from (unintentional or intentional) recording error related to entering values from the measurement equipment to the survey materials, fabrication of the measurement of anthropometrics by the interviewerFootnote 1 or even physical measurements taken from the wrong household member (especially if the main interview and physical measurements are not collected on the same day). These measurement errors, although they may occur with low frequency, could have a non-negligible impact on data reliability.

Thus, in the case of measured anthropometrics, we assume that the distribution of each anthropometric measure is a mixture of two types of observation:

$$ r_{i} = \left\{ {\begin{array}{*{20}l} {\xi_{i} } \hfill & {{\text{with }}\;{\text{probability }}\;\pi_{r} } \hfill \\ {\zeta_{i} } \hfill & {{\text{with }}\;{\text{probability}}\;{ }\left( {1 - \pi_{r} } \right)} \hfill \\ \end{array} } \right. $$
(1)

where measured anthropometrics \(\left( {r_{i} } \right)\) equals the true value with probability \(\pi_{r}\) (case R1). However, for certain respondents, measured anthropometrics may not equal the true value, with probability \(1 - \pi_{r}\) (case R2); thus, an error-ridden measure \(\left( {\zeta_{i} } \right)\) is observed in this case. In the spirit of the KY factor mixture model, this erroneous anthropometric measure, which is incorrectly attributed to individual \(i\), is denoted by \(\zeta_{i}\).Footnote 2 The true values and those with recording errors are both assumed to be independently and identically normally distributed: \(\xi_{i} \sim N\left( {\mu_{\xi } ,\sigma_{\xi }^{2} } \right)\), \(\zeta_{i} \sim N\left( {\mu_{\zeta } ,\sigma_{\zeta }^{2} } \right)\); this implies that the marginal distribution of \(r_{i}\) is a mixture of two normals. Given the type of errors that are captured by \(\zeta_{i}\), as described above, we assume that there is no correlation between \(\xi_{i}\) and \(\zeta_{i}\).Footnote 3 The assumption that the erroneous measurements are uncorrelated with the true values contributes to the identification of the full model as it implies that these measurements are also uncorrelated with the self-reported anthropometrics.
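As an illustration of Eq. 1, the following minimal Python sketch evaluates the two-component normal mixture density of a measured value \(r_{i}\). The function names and all parameter values are hypothetical placeholders chosen for the example; they are not estimates from this study.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def measured_density(r, pi_r, mu_xi, sigma_xi, mu_zeta, sigma_zeta):
    """Marginal density of a measured value r implied by Eq. 1.

    With probability pi_r the measurement equals the true value (case R1);
    otherwise it is a recording error drawn from N(mu_zeta, sigma_zeta^2) (case R2).
    """
    return (pi_r * normal_pdf(r, mu_xi, sigma_xi)
            + (1.0 - pi_r) * normal_pdf(r, mu_zeta, sigma_zeta))


# Hypothetical values for body weight in kg (illustration only).
EXAMPLE_PARAMS = dict(pi_r=0.95, mu_xi=72.0, sigma_xi=15.0,
                      mu_zeta=70.0, sigma_zeta=25.0)

print(measured_density(72.0, **EXAMPLE_PARAMS))
```

Setting `pi_r` to 1 collapses the mixture to the distribution of the true values, which corresponds to the error-free assumption made in much of the earlier literature.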

Each of our self-reported anthropometrics (i.e. weight or height) is assumed to be a mixture of three types of observation:

$$ s_{i} = \left\{ {\begin{array}{*{20}l} {\xi_{i} } \hfill & {{\text{with}}\;{\text{probability}}\; \pi_{s} } \hfill \\ {\xi_{i} + \eta_{i} + \rho \left( {\xi_{i} - \mu_{\xi } } \right) } \hfill & {{\text{with}}\;{\text{probability}}\; \left( {1 - \pi_{s} } \right)\left( {1 - \pi_{\omega } } \right)} \hfill \\ {\xi_{i} + \eta_{i} + \rho \left( {\xi_{i} - \mu_{\xi } } \right) + \omega_{i} } \hfill & {{\text{with}}\;{\text{probability}}\; \left( {1 - \pi_{s} } \right)\pi_{\omega } } \hfill \\ \end{array} } \right. $$
(2)

Table 1 describes all sources of measurement errors in self-reported anthropometrics that are captured in Eq. 2. Specifically, we assume that the self-reported anthropometrics \(\left( {s_{i} } \right)\) equals the true latent value \(\left( {\xi_{i} } \right)\) with probability \(\pi_{s}\) (case S1). The self-reported values are recorded as integers so this case only applies when the true value is a whole number.Footnote 4 Otherwise (cases S2 and S3), there must be some imprecision in \(s_{i}\) due to the scale of measurement. This imprecision, reflecting different ways in which respondents may round their responses to whole numbers along with random noise in the self-reports, is captured by the error term \(\eta_{i}\). This error is independent of the true value \(\left( {\xi_{i} } \right)\). In addition, we allow for the possibility of non-classical mean-reverting (or mean-diverging) error (survey measurement error), which is captured by the term \(\rho \left( {\xi_{i} - \mu_{\xi } } \right)\).Footnote 5 Existing studies comparing measured with self-reported data have shown the presence of mean-reverting errors in self-reported body weight (Cawley et al. 2015). The second case (S2), which allows for both sources of error, occurs with probability \(\left( {1 - \pi_{s} } \right)\left( {1 - \pi_{\omega } } \right)\). The third case (S3), which occurs with probability \(\left( {1 - \pi_{s} } \right)\pi_{\omega }\), adds a third source of measurement error \(\left( {\omega_{i} } \right)\) to allow for additional random noise that may occur for some respondents who make additional errors in their self-assessments of height or weight (see Table 1). The measurement errors \(\eta_{i}\) and \(\omega_{i}\) are both assumed to be independently and identically normally distributed: \(\eta_{i} \sim N\left( {\mu_{\eta } ,\sigma_{\eta }^{2} } \right)\), and \(\omega_{i} \sim N\left( {\mu_{\omega } ,\sigma_{\omega }^{2} } \right)\).
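The data-generating process described by Eqs. 1 and 2 can be sketched as a small simulation. This is only an illustrative sketch: the function name `simulate_pair` and every parameter value below are hypothetical and are not taken from the estimates in this paper.

```python
import random


def simulate_pair(rng, p):
    """Draw one (true, measured, self-reported) triple following Eqs. 1 and 2."""
    xi = rng.gauss(p["mu_xi"], p["sigma_xi"])  # true latent value

    # Eq. 1: the measured value is correct (R1) or a recording error (R2).
    if rng.random() < p["pi_r"]:
        r = xi
    else:
        r = rng.gauss(p["mu_zeta"], p["sigma_zeta"])

    # Eq. 2: the self-report is correct (S1), or carries imprecision and a
    # mean-reverting/diverging term (S2), or additionally extra noise (S3).
    if rng.random() < p["pi_s"]:
        s = xi
    else:
        eta = rng.gauss(p["mu_eta"], p["sigma_eta"])
        s = xi + eta + p["rho"] * (xi - p["mu_xi"])
        if rng.random() < p["pi_omega"]:
            s += rng.gauss(p["mu_omega"], p["sigma_omega"])
    return xi, r, s


# Hypothetical parameter values for body weight in kg (illustration only).
EXAMPLE_PARAMS = {
    "mu_xi": 72.0, "sigma_xi": 15.0,        # true values
    "pi_r": 0.95, "mu_zeta": 70.0, "sigma_zeta": 25.0,
    "pi_s": 0.10, "mu_eta": -0.5, "sigma_eta": 1.5,
    "rho": -0.05,                           # negative: reports pulled toward the mean
    "pi_omega": 0.20, "mu_omega": 0.0, "sigma_omega": 6.0,
}

rng = random.Random(2013)
for _ in range(3):
    print(simulate_pair(rng, EXAMPLE_PARAMS))
```

With a negative `rho`, respondents whose true value lies above the mean tend to under-report and those below the mean tend to over-report, which is the mean-reverting pattern discussed above.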

Note that the survey team undertook significant effort to minimize the risk of equipment failure for physical anthropometric measurements; our dataset employs international measurement protocols and validated equipment, which is calibrated daily to ensure reliability of the measurements. The procedures for taking anthropometric measures are defined to prevent biologically inaccurate measures and were done in partnership with the Laboratory for Nutritional Evaluation of Populations (LANPOP), part of the Public Health School in the University of São Paulo (Damacena et al. 2015; Szwarcwald et al. 2014). Also, the availability of two repeated physical measurements of body weight and height (we took the second measure for our main estimation and, for sensitivity analysis, the average of these measures) further reduces the likelihood of errors related to equipment failure. Thus, we do not capture this potential source of error in our factor mixture models.Footnote 6

The full KY model defines a mixture of six latent classes that correspond to the combination of cases R1 or R2 with S1, S2 or S3. Table 2 describes all the potential latent classes. For instance, class 1 (R1, S1) consists of error-free self-reported (S1) and measured (R1) data and occurs with probability \(\pi_{r} \pi_{s}\). The full model is a mixture of the six bivariate normal distributions for the observed outcome pairs (\(r_{i}\), \(s_{i}\)), each with different means and covariance matrices (see Jenkins and Rios-Avila (2020, 2021) and Kapteyn and Ypma (2007) for full details).

Table 2 Groups (latent classes) in mixture model of self-reported and measured anthropometrics
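The class probabilities implied by combining the cases of Eq. 1 with those of Eq. 2 can be enumerated with a short Python sketch. It assumes, in line with the probability \(\pi_{r} \pi_{s}\) given for class 1, that the measured and self-reported cases combine multiplicatively; the input values are hypothetical.

```python
from itertools import product


def class_probabilities(pi_r, pi_s, pi_omega):
    """Return the probability of each (measured case, self-reported case) class."""
    measured = {"R1": pi_r, "R2": 1.0 - pi_r}
    reported = {
        "S1": pi_s,
        "S2": (1.0 - pi_s) * (1.0 - pi_omega),
        "S3": (1.0 - pi_s) * pi_omega,
    }
    # Each latent class pairs one measured case with one self-reported case.
    return {(r, s): measured[r] * reported[s]
            for r, s in product(measured, reported)}


# Hypothetical mixing probabilities (illustration only).
for label, prob in class_probabilities(0.95, 0.10, 0.20).items():
    print(label, round(prob, 4))
```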

The parameter estimates are obtained by maximizing the model log-likelihood (see Kapteyn and Ypma 2007, Appendix B), with identification relying on the existence of the “completely labelled” group that contains observations with error-free anthropometrics (class 1: R1-S1). Parameters \(\mu_{\xi }\) and \(\sigma_{\xi }^{2}\) are identified from these “completely labelled” observations and this contributes to identification of the other unknown parameters from the mixture of normals implied by the model specification (see Kapteyn and Ypma (2007) for further details on identification). Kapteyn and Ypma (2007) provide the expressions for the probability density functions and the associated log-likelihood function. Employing Jenkins and Rios-Avila’s (2023b) user-written Stata command, we fit the full Kapteyn and Ypma (2007) model by maximum likelihood, assuming that the sample likelihood function is a finite mixture of latent class distributions. Our analysis is done separately for each of our anthropometric measures, i.e. for weight and height.

2.1 Accounting for covariates

Following Jenkins and Rios-Avila (2020; 2021), the factor mixture model is based on unconditional distributions. However, allowing the measurement error distributions to vary across observed characteristics has the advantage of increased flexibility and can be used to assess whether the distributions of measurement errors differ across population sub-groups (Jenkins and Rios-Avila 2023b). Goodness of fit tests based on the Akaike information criterion (AIC) and Bayesian information criterion (BIC) are used to compare our factor mixture models with and without covariates.

Jenkins and Rios-Avila (2023a) extend the Kapteyn and Ypma (2007) model to allow transformations of relevant parameters to be specified as linear indices of characteristics \(\left( {X_{i} } \right)\):

$$ G\left( \gamma \right) = \alpha_{\gamma } + \beta_{\gamma }^{\prime } X_{i} . $$
(3)

where for each factor mixture model parameter of interest \(\left( \gamma \right)\), \(\alpha_{\gamma }\) is a constant and \(\beta_{\gamma }\) are the slopes associated with individual-level characteristics \(\left( {X_{i} } \right)\). The function \(G\left( \cdot \right)\) is the specific transformation function of the parameter of interest. These are the identity function for means \(\left( \mu \right)\), the logarithmic function for SDs \(\left( \sigma \right)\) and Fisher’s z transformation for correlations \(\left( \rho \right)\).
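To make the role of \(G\left( \cdot \right)\) in Eq. 3 concrete, the sketch below maps a linear index back to the natural metric of a parameter using the three transformations named above (identity, logarithm, Fisher's z, i.e. the inverse hyperbolic tangent). The dictionary `LINKS`, the function `natural_value` and the coefficient values are hypothetical names and numbers introduced only for illustration.

```python
import math

# Each entry holds (G, inverse of G) for one type of parameter.
LINKS = {
    "mean": (lambda v: v, lambda g: g),   # identity for means
    "sd": (math.log, math.exp),           # log for standard deviations
    "corr": (math.atanh, math.tanh),      # Fisher's z for correlations
}


def natural_value(kind, alpha, beta, x):
    """Invert G to recover the parameter from the index alpha + beta'x (Eq. 3)."""
    index = alpha + sum(b * xj for b, xj in zip(beta, x))
    _, inverse = LINKS[kind]
    return inverse(index)


# Hypothetical coefficients: an SD equation with one dummy covariate.
print(natural_value("sd", alpha=0.4, beta=[0.1], x=[1.0]))
```

Working on the transformed scale keeps the implied standard deviations positive and the implied correlations inside (-1, 1) for any value of the linear index.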

In practice, and for parsimony in the estimation of our factor mixture models with covariates, we parameterize errors in self-reported data using age, gender and region of residence. Existing studies argue that measurement error in self-reported body weight and height data depends on respondents’ characteristics (e.g. Cawley et al. 2015; Davillas and Jones 2021). Specifically, for the self-reported anthropometrics, we model the \(\mu\) and the \(\sigma\) of the imprecision error \(\left( {\eta_{i} } \right)\) and of the additional random error \(\left( {\omega_{i} } \right)\) as a function of individual characteristics (age groups, gender and region of residence); we also condition the non-classical mean-reverting error on these respondent-level characteristics.

Moreover, for the latent true body weight and height, we assume that the mean (\(\mu_{\xi }\)) varies by respondents’ age, gender and region of residence; the same covariates are also used for the SD equation (\(\sigma_{\xi }\)). Earlier research has considered these demographics as basic correlates of obesity (e.g. Baum and Ruhm 2009; Davillas and Jones 2020). Finally, we model the distribution of measured anthropometrics without covariates, given that the protocols on physical measurements of anthropometrics collected in surveys are the same for all respondents (irrespective of gender, age and region of residence). Interviewer characteristics, for those who are responsible for the physical measurements, might be more relevant sources of measurement error in measured anthropometrics (Olbrich et al. 2022).Footnote 7 However, these are not available in our dataset.

For the factor mixture models with covariates, we calculate the estimated parameters in their natural metrics, computing the Average Predicted Margins (APMs); for each measurement model parameter of interest \(\left( \gamma \right)\), we predict \(\gamma\) for every individual in our sample using the fitted model and assuming all other covariates are at their observed values and, then, calculate the sample average of \(\gamma\) (and its associated standard error). For presentation purposes, we report how each measurement error parameter \(\left( \gamma \right)\) varies across covariates using the APMs (Jenkins and Rios-Avila 2023a,b). For example, for a gender dummy, we calculate the APMs for males by setting all sample values of gender to male and then taking the average over the whole sample; APMs for females are calculated analogously. This allows us to test whether there are systematic gender differences in the APM for each particular parameter of interest \(\left( \gamma \right)\). For comparison purposes, in addition to APMs across population groups (by gender, age groups and region of residence), we also report the corresponding APMs for all observations in the sample.
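The APM calculation for a gender dummy can be sketched as follows for a standard deviation parameter modelled on the log scale. The toy sample, the coefficient values and the function names are all hypothetical, and the sketch omits the standard errors that accompany the APMs in the analysis.

```python
import math


def predict_sd(coef, male, age_50_plus):
    """Predicted SD in its natural metric from a log-linear index."""
    index = coef["alpha"] + coef["male"] * male + coef["age"] * age_50_plus
    return math.exp(index)


def average_predicted_margin(sample, coef, male_value):
    """Set gender to male_value for everyone, keep other covariates as observed,
    predict for each individual and average over the whole sample."""
    preds = [predict_sd(coef, male_value, row["age_50_plus"]) for row in sample]
    return sum(preds) / len(preds)


# Hypothetical toy sample and coefficients (illustration only).
SAMPLE = [
    {"male": 1, "age_50_plus": 0},
    {"male": 0, "age_50_plus": 1},
    {"male": 0, "age_50_plus": 0},
]
COEF = {"alpha": 1.0, "male": 0.2, "age": -0.1}

apm_male = average_predicted_margin(SAMPLE, COEF, male_value=1)
apm_female = average_predicted_margin(SAMPLE, COEF, male_value=0)
print(apm_male, apm_female)
```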

2.2 Post-estimation predictions

As a post-estimation exercise, we generate predictions of the distribution of the true latent weight and height (e.g. Meijer et al. 2012). In line with Jenkins and Rios-Avila (2023a), we employ the most reliable prediction among all the potential hybrid measures of weight and height and then calculate BMI as weight (in kg) over the square of height (in metres). We compare the distributions of hybrid, self-reported and measured BMI. We take the estimated parameters of our mixture models, separately for the case of models with and without covariates, to create “hybrid” anthropometric predictions that combine information from both self-reported and measured anthropometrics.Footnote 8
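The BMI calculation described above amounts to the following short sketch. It assumes that height is supplied in centimetres, as in the self-reported data, and converts it to metres; the function name is a hypothetical choice for the example.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight in kg divided by squared height in metres."""
    if weight_kg <= 0 or height_cm <= 0:
        raise ValueError("weight and height must be positive")
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


print(round(bmi(70.0, 175.0), 2))
```

The same function can be applied to self-reported, measured or hybrid weight and height pairs, which yields the alternative BMI measures that are compared in the analysis.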

Specifically, in line with Meijer et al. (2012), both with and without covariates, we compare a number of approaches that combine measured and self-reported data to obtain the best prediction of the “true” anthropometrics of interest. Meijer et al. (2012) begin by deriving two predictors for the case of a single latent class (as described in Table 2 for our analysis): one that minimizes the mean squared error (MSE) and one that minimizes the MSE conditional on unbiasedness. Because class membership is unobserved, Meijer et al. (2012) proposed three ways to proceed: (1) compute the within-class predictors for each class and combine them in a weighted average using the (un)conditional class probabilities for weighting; (2) predict class membership and then use the within-class predictor for the predicted class; and (3) derive predictors that minimize the total mean squared prediction error. Because either the predictor based on MSE or on MSE conditional on unbiasedness could be the within-class predictor for each of the three approaches listed above, there are six potential predictors in total. Finally, a system-wide predictor minimizes MSE under the assumption of linearity and imposing the condition of unbiasedness.

As described above, following Meijer et al. (2012), seven “hybrid” measures to approximate the true body weight and height are generated in our study: (1) Weighted (unconditional), (2) Weighted (unconditional) unbiased, (3) Weighted (conditional), (4) Weighted (conditional) unbiased, (5) Two-stage, (6) Two-stage, unbiased and (7) System-wide linear. Predictors 1 to 6 use two sets of within-class predictors for \(\xi\). The first set, \(\hat{\xi }_{i}^{j}\), used for predictors 1, 3 and 5, minimizes the MSE, \(E\left[ {\left( {\xi_{i} - \hat{\xi }_{i}^{j} } \right)^{2} |\xi_{i} , i \in J} \right]\). The second set of predictors, \(\hat{\xi }_{i}^{Uj}\), used for predictors 2, 4 and 6, minimizes the MSE conditional on \(E\left( {\xi_{i} - \hat{\xi }_{i}^{Uj} | i \in J} \right) = 0\). Predictors 1 and 2 provide weighted predictions using the unconditional within-class probabilities \(\pi_{j}\). Predictors 3 and 4 provide weighted predictions using conditional or posterior within-class probabilities \(\pi_{j} \left( {r_{i} ,s_{i} } \right)\). Predictors 5 and 6 use a two-step Bayesian classification; i.e. the predicted class membership is obtained first and, then, the class-specific predictor of the predicted class is used. Finally, the seventh predictor \(\left( {\xi_{7i} } \right)\) is the system-wide predictor that minimizes MSE under the assumption of linearity and imposing the condition of unbiasedness.
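The way the weighted and two-stage predictors combine within-class predictions can be sketched in Python. The sketch takes the within-class predictors and the class densities of the observed pair as given inputs, so it does not reproduce the derivations in Meijer et al. (2012); the function names and all numbers are hypothetical.

```python
def posterior_class_probs(priors, densities):
    """Bayes' rule: conditional class probabilities given the observed (r, s) pair."""
    joint = [p * f for p, f in zip(priors, densities)]
    total = sum(joint)
    return [j / total for j in joint]


def weighted_predictor(weights, within_class_preds):
    """Weighted average of within-class predictors (predictors 1 to 4)."""
    return sum(w * x for w, x in zip(weights, within_class_preds))


def two_stage_predictor(posteriors, within_class_preds):
    """Pick the most probable class, then use its predictor (predictors 5 and 6)."""
    best = max(range(len(posteriors)), key=posteriors.__getitem__)
    return within_class_preds[best]


# Hypothetical inputs for one respondent and six latent classes.
priors = [0.095, 0.684, 0.171, 0.005, 0.036, 0.009]  # unconditional pi_j
densities = [0.00, 0.04, 0.01, 0.00, 0.002, 0.001]   # class densities at (r_i, s_i)
preds = [71.0, 71.0, 71.0, 70.0, 69.4, 69.8]         # within-class predictions (kg)

post = posterior_class_probs(priors, densities)
print(weighted_predictor(priors, preds))   # unconditional weights
print(weighted_predictor(post, preds))     # conditional (posterior) weights
print(two_stage_predictor(post, preds))    # two-stage classification
```

Using the MSE-minimizing or the unbiased within-class predictions as `preds` gives the two variants of each approach.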

To assess the precision of those predictions, we estimate reliability statistics and the MSE.Footnote 9 These are computed with respect to the seven “hybrid” measures that come from the sample simulations for body weight and body height based on estimated parameters for the factor mixture models both with and without covariates. Simulation analysis is done using the user-written Stata command “ky_sim” (Jenkins and Rios-Avila 2023b).
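A minimal sketch of how such an assessment could be carried out on simulated data is given below. The MSE follows its usual definition; the reliability statistic shown is the squared correlation between prediction and true value, which is an assumption made for illustration and need not coincide with the definitions used in this paper (see Footnote 9). The simulated values are hypothetical, and the sketch does not use or reproduce the “ky_sim” command.

```python
import random


def mse(pred, truth):
    """Mean squared error of predictions against the simulated true values."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def squared_correlation(pred, truth):
    """Squared Pearson correlation, used here as an assumed reliability measure."""
    n = len(truth)
    mean_p = sum(pred) / n
    mean_t = sum(truth) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, truth))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    return cov * cov / (var_p * var_t)


# Hypothetical simulation: true weights and two candidate predictors with
# different amounts of noise (illustration only).
rng = random.Random(7)
truth = [rng.gauss(72.0, 15.0) for _ in range(5000)]
candidates = {
    "precise": [t + rng.gauss(0.0, 1.0) for t in truth],
    "noisy": [t + rng.gauss(0.0, 5.0) for t in truth],
}
for name, pred in candidates.items():
    print(name, mse(pred, truth), squared_correlation(pred, truth))
```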

We provide some further analysis to explore the implications of the measurement error in both self-reported and measured anthropometrics for empirical research on the association between obesity and health care utilization. Specifically, we compare results when each of the self-reported, measured and hybrid BMI measures is used as an explanatory variable. If measurement error is non-classical, i.e. systematically associated with the measured values, it may cause bias in regression models that use anthropometrics as a regressor, even in the case where instrumental variable analysis is employed to deal with endogeneity or errors in variables (Cawley et al. 2015; O’Neill and Sweetman 2013).

3 Data

Data on self-reported and measured anthropometrics are extracted from the 2013 National Health Survey of Brazil (Pesquisa Nacional de Saúde; PNS 2013).Footnote 10 This is a cross-sectional, nationally representative dataset for all Brazilian states and geographic regions. The survey focuses on use of health care services, population health conditions and surveillance of chronic noncommunicable diseases and their associated risk factors. The PNS-2013 collects demographics and socioeconomic characteristics of all household members. For each household, a randomly selected household member aged 18 or older is chosen for their body weight and height to be measured along with self-reports of the same anthropometrics.Footnote 11 This results in a working sample of 37,335 respondents, men and non-pregnant women aged 20 or older, with valid self-reported and measured weight and height data. We focus on adults (aged 20+) to avoid any puberty-related changes in body size.

3.1 Self-reported and measured body weight and height data

Self-reported body weight and height data are collected as part of the survey questionnaire. Measured weight and height are collected twice by a trained survey team member at the end of the questionnaire. Weight is measured by a portable digital scale, following standard measurement protocols which require that the respondents remove their shoes, heavy clothes, accessories and objects from their pockets (PNS 2013). Following common practice in the literature, when measured health data are used, we take the second measurement for weight and height for our base case analysis to reduce any potential errors in measured anthropometrics (e.g. Johnston et al. 2009; Davillas and Pudney 2017). A sensitivity analysis is done using the average of the two measures.Footnote 12

For height, a portable stadiometer is used to measure stature (PNS 2013). Measurement protocols for body height require that the respondent must remove their shoes and other accessories, if possible, and keep at least three points of the body on the posterior surface of the stadiometer (PNS 2013). International measurement protocols together with validated and daily calibrated equipment are employed for anthropometric physical measurements. These procedures are settled in partnership with the Laboratory for Nutritional Evaluation of Populations (LANPOP), part of the Public Health School in the University of São Paulo, to prevent biologically inaccurate anthropometric measurements (Szwarcwald et al. 2014).

Our analysis allows for modelling all hypothesized errors in the measured and self-reported anthropometrics that are relevant to our dataset; these are described in detail in Table 1. Along with the unconditional factor mixture measurement error models, we also estimate models that account for a parsimonious set of covariates to explore potential differential patterns in measurement errors across population groups. Specifically, in these models, we account for the respondent’s gender, while the respondent’s age is captured by a 6-category age group variable (20–29, 30–39, 40–49, 50–59, 60–69, and 70 or more). Region of residence is captured by a categorical variable for the five geographical regions (often called macro regions) of Brazil as defined by the Brazilian Institute of Geography and Statistics: North, Northeast, Central-West, Southeast and South.

3.2 Descriptive statistics

Figure 1 shows histograms of the raw difference between measured and self-reported body weight and height data, as well as for BMI derived from the measured and self-reported anthropometrics; a normal distribution is overlaid on each histogram. The horizontal axis is the number of units of raw reporting error; negative numbers indicate that self-reports are higher than measured values, and vice versa. If every respondent’s self-reported value equalled the measured value, each histogram would collapse to a single bar at zero.

Fig. 1 Histograms of the raw difference between measured and self-reported (measured − self-reported) anthropometrics. Note: A normal density curve is overlaid on each histogram

Overall, across the graphs for body weight, height and BMI, the distribution of the raw difference between reported and measured values deviates from the normal distribution. Specifically, there is more mass around zero for the raw difference, as shown by the histograms, as opposed to the normal distribution. Compared to the normal distribution, less mass is observed with moderate and larger raw differences for all anthropometrics, while there is more mass at very high raw differences. Finally, it seems that the distribution for the raw body height difference is more skewed.

Descriptive statistics for the self-reported and measured weight and height data as well as for BMI measures are presented in Table 3.Footnote 13 The mean self-reported weight (71.5 kg) is slightly smaller than the mean measured weight (72 kg). Mean self-reported height is 0.8 cm higher than measured height. Table 3 also shows that the mean absolute difference between the self-reported and measured data (expressed in terms of percentage of the measured values) is about 3% for body weight, 1% for height and 4.5% for the derived BMI measure.

Table 3 Descriptive statistics and (raw and absolute) difference between measured and self-reported data
Fig. 2 Differences between measured and self-reported weight/height data by decile groups of measured anthropometrics

Existing literature argues that reporting error in body weight self-reports may be mean reverting, when compared with measured anthropometric data (Cawley et al. 2015); respondents with high (low) values of measured body weight data tend to under-report (over-report) their body weight in self-reports. To provide some preliminary evidence of this, under the assumption that measured data are not subject to measurement error (an assumption we will relax later), Fig. 2 shows the mean raw difference (measured − self-reported) in body weight and height data across deciles of the measured anthropometrics. Our results for body weight show that the mean raw reporting error becomes less negative moving across the first three decile groups. This indicates that, on average, the self-reported weight is higher than measured weight for those with the lowest measured weight data. For the higher deciles of measured body weight, there is a progressively increasing positive raw error indicating that measured weight is higher than the self-reports, with the under-reporting becoming more evident for those with higher measured weight.

Figure 2 also displays the mean raw differences for height. There is a progressively less negative mean raw difference moving to those of higher measured height up to the 80th percentile of measured height, i.e. self-reports of height are higher than measured data on average, with the over-reporting (almost) monotonically reducing in magnitude for those with higher measured height. For the two tallest deciles, the mean raw reporting error is positive, suggesting that those of very high measured height tend to under-report their height. Overall, and despite the observed differences between weight and height, these results show that respondents with high (very high) measured body weight (height) tend to under-report, while over-reporting is evident for those of lower measured values. These summary statistics provide initial evidence on the presence of mean-reverting error in self-reported anthropometrics under the assumption that measured data are not subject to measurement error. Although this is an assumption that we relax in our factor mixture models, this motivates accounting for mean reversion (or mean divergence) in the measurement error models.
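The decile comparison behind Fig. 2 can be sketched as follows. This is a minimal illustration on simulated data, not the PNS-2013 sample: the function name `mean_diff_by_decile`, the mean-reversion coefficient of −0.1 and the noise level are all hypothetical choices made only to show the mechanics.

```python
import random
from statistics import mean


def mean_diff_by_decile(measured, self_reported):
    """Mean of (measured - self-reported) within decile groups of the measured value."""
    pairs = sorted(zip(measured, self_reported))  # order respondents by measured value
    n = len(pairs)
    diffs = []
    for d in range(10):
        lo, hi = d * n // 10, (d + 1) * n // 10  # index bounds of decile group d + 1
        diffs.append(mean(m - s for m, s in pairs[lo:hi]))
    return diffs


# Hypothetical data: self-reports are pulled towards the sample mean (rho < 0) plus noise.
random.seed(0)
mu, rho = 72.0, -0.1
measured = [random.gauss(mu, 15.0) for _ in range(5000)]
self_reported = [m + rho * (m - mu) + random.gauss(0.0, 2.0) for m in measured]

diffs = mean_diff_by_decile(measured, self_reported)
for d, value in enumerate(diffs, start=1):
    print(f"decile {d:2d}: mean(measured - self-reported) = {value:+.2f} kg")
```

Under these assumptions the lowest decile shows a negative mean difference (over-reporting) and the highest a positive one (under-reporting), which is the qualitative pattern described for body weight. The sketch treats the measured value as error-free, exactly the assumption the factor mixture models later relax.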

4 Results

4.1 Estimates of structural parameters: mixture model without covariates

Table 4 presents the estimates for the KY model (expressed in their natural metrics). Following Jenkins and Rios-Avila (2020), the completely labelled observations are defined as those observations with \(\left| {r_{i} - s_{i} } \right| \le \delta\). Our model presented in Table 4 assumes \(\delta = 0\), i.e. the completely labelled observations are only those with no differences between self-reported and measured values. Under this demanding requirement, given the differences in precision of the scales used for measured and the self-reported outcomes, the completely labelled cases represent just 10% and 23% of our observations for weight and height, respectively. Sensitivity analysis is also conducted to test the robustness of our results when this requirement is relaxed.
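The labelling rule can be illustrated with a short sketch. The helper `share_completely_labelled` and the five weight pairs are hypothetical; they only show how the share of completely labelled cases responds to the tolerance \(\delta\) and to rounding the measured values to the scale of the self-reports.

```python
def share_completely_labelled(measured, self_reported, delta=0.0):
    """Share of observations with |r_i - s_i| <= delta."""
    n_labelled = sum(
        1 for r, s in zip(measured, self_reported) if abs(r - s) <= delta
    )
    return n_labelled / len(measured)


# Hypothetical weights (kg): measured to one decimal, self-reported in whole kilograms.
measured = [70.0, 71.6, 80.2, 65.0, 58.4]
self_reported = [70.0, 72.0, 80.0, 66.0, 58.0]

strict = share_completely_labelled(measured, self_reported)              # delta = 0
relaxed = share_completely_labelled(measured, self_reported, delta=0.5)  # wider tolerance
rounded = share_completely_labelled(
    [float(round(r)) for r in measured], self_reported
)  # measured values rounded to the nearest integer

print(strict, relaxed, rounded)
```

With \(\delta = 0\), only the pair that agrees exactly counts as completely labelled. Relaxing \(\delta\) or rounding the measured values raises the share mechanically, because small differences driven by reporting precision are no longer treated as errors.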

Table 4 Estimates of factor mixture model for body weight and height

Table 4 shows that the mean of latent true body weight \(\left( {\mu_{\xi } } \right)\) is 71.9 kg (with a standard deviation \(\sigma_{\xi } = 14.9\)). The distribution of the latent true weight has a higher mean (by about 0.4 kg) than the mean of self-reported body weight (Table 3); the p-value for the difference in means is less than 0.01. The estimated mean of true body height is 164.5 cm (with a standard deviation \(\sigma_{\xi } = 9.4\)). This value is lower (by 0.7 cm) than the mean of the self-reported height (Table 3).

The probability \(\left( {\pi_{r} } \right)\) that measured weight and height reflect the corresponding true values is high: 98.6% for weight and 96.7% for height. This indicates that error-prone measured body weight and height data occur with a probability \(\left( {1 - \pi_{r} } \right)\) that is low but statistically different from zero: about 1.4% (p-value < 0.01) and 3.3% (p-value < 0.01), respectively. Error-prone measurement of body weight (reflecting recording errors) leads to an estimated mean \(\left( {\mu_{\zeta } } \right)\) of 78.9 kg for these erroneous observations, which is 7 kg (or almost 10%) higher than the estimated mean of true weight; data recording error in measured weight is also associated with a higher standard deviation \(\left( {\sigma_{\zeta } = 19.4} \right)\) compared to the estimated true weight distribution \(\left( {\sigma_{\xi } = 14.9} \right)\). Similarly, error-prone measured body height (that is subject to potential recording error) has an estimated mean \(\left( {\mu_{\zeta } } \right)\) for the erroneous observations of 159.8 cm, which is lower than the estimated mean of the true height (by about 4.7 cm, i.e. 2.9% of the mean of the true height), as well as having a lower estimated standard deviation compared to the true height distribution (\(\sigma_{\zeta } = 8.9\) compared to \(\sigma_{\xi } = 9.4\)).

Turning to self-reported weight and height, the estimated probability \(\left( {\pi_{s} } \right)\) that the self-reported anthropometrics equal the true body weight and height (i.e. they are free from any measurement error) is, as expected given the difference in precision of the two measures, relatively low at about 10% and 24%, respectively. Table 4 shows that mean reversion \(\left( \rho \right)\) in the case of both self-reported body weight and height data is small in magnitude (close to zero) although statistically significant at the 1% level. This indicates that after accounting for all other sources of measurement error in self-reported data, mean reversion seems to play a limited role. Error due to the reporting precision (precision error) in self-reported body weight and height data has mean values \(\left( {\mu_{\eta } } \right)\) of − 0.33 kg for weight and 0.4 cm for height. The estimated probability of the Case S2 type of observations, \(\left( {1 - \pi_{s} } \right)\left( {1 - \pi_{\omega } } \right)\), is about 62% for weight and 44% for height. Moreover, Table 4 shows that the probability \(\left( {1 - \pi_{s} } \right)\pi_{\omega }\) that self-reported anthropometric data contain additional measurement error, Case S3, is about 28% for self-reported weight and 31% for self-reported height.

Table 4 (Panel B) presents estimates of the membership probabilities for the six latent classes (as described in Table 2). The first latent class consists of error-free self-reported (S1) and measured (R1) anthropometric data with a probability of 10% for body weight and 23% for height. The probability that there are error-free measured anthropometrics and survey reporting error in self-reported anthropometrics, \(\Pr \left( {R = 1,S = 2} \right)\), is about 61% for weight and 43% for height. The probability of error-free measured anthropometrics and additional reporting error in self-reported data, corresponding to the third latent class, is 27% for weight and 30% for height. Regarding the remaining latent classes, where there are recording errors in measured anthropometrics, we find small probabilities. For instance, the probability that weight and height observations contain error in the self-reported data and recording errors in the measured anthropometrics, corresponding to the fifth latent class, \(\Pr \left( {R = 2,S = 2} \right)\), is 0.9% and 1.5% for weight and height, respectively. Overall, these results indicate that although there are non-negligible recording errors in measured body weight and height data (about 7 kg and 4.7 cm difference on average as compared to true body weight and height, respectively), their probability of occurrence is small.
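The link between the mixing probabilities and the six class probabilities can be sketched as below. The sketch assumes that class membership for the measured value (R) and for the self-report (S) combine multiplicatively, which is consistent with the rounded figures quoted in the text. The function name is hypothetical, and the value of \(\pi_{\omega}\) is backed out from the rounded 10% and 28% figures rather than taken from Table 4.

```python
def latent_class_probs(pi_r, pi_s, pi_w):
    """Joint probabilities Pr(R = r, S = s) for the six latent classes.

    pi_r: Pr(measured value equals the true value)          -> case R1
    pi_s: Pr(self-report equals the true value)             -> case S1
    pi_w: Pr(additional random noise | self-report in error) -> splits S2 and S3
    """
    p_r = {1: pi_r, 2: 1.0 - pi_r}
    p_s = {
        1: pi_s,
        2: (1.0 - pi_s) * (1.0 - pi_w),
        3: (1.0 - pi_s) * pi_w,
    }
    return {(r, s): p_r[r] * p_s[s] for r in (1, 2) for s in (1, 2, 3)}


# Body weight, using rounded values from the text; pi_w = 0.28 / 0.90 is an
# approximation implied by those rounded values, not a reported estimate.
weight = latent_class_probs(pi_r=0.986, pi_s=0.10, pi_w=0.28 / 0.90)
for (r, s), p in weight.items():
    print(f"Pr(R = {r}, S = {s}) = {p:.3f}")
```

With these inputs, Pr(R = 1, S = 2) is about 0.61 and Pr(R = 2, S = 2) about 0.009, in line with the 61% and 0.9% reported for weight. Small discrepancies for other classes reflect the rounding of the inputs.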

We conducted a sensitivity analysis, where measured body weight and height data are rounded to the nearest integer (Table 12, Appendix); this allows us to have the same scale in measured and reported data, but it masks the part of measurement error that is attributable to lack of precision in the recording of the self-reported data. There are differences in the six latent class probabilities, reflecting the difference in the proportion of completely labelled cases \(\left( {\Pr \left( {R = 1,S = 1} \right)} \right)\). For instance, the increase in the probability of completely labelled cases relative to our base case results (from 10% in the base case to 26.3% in the sensitivity analysis for weight; and from 23.3% to 32.4% for height) is reflected in the reduction in the latent class probabilities for classes two and three (Table 4 vs. Table 12).

Finally, we conducted a sensitivity analysis to explore whether our results presented in Table 4 are sensitive to using the average of the two weight and height measurements to define measured anthropometrics (for the mixture models). The corresponding parameter estimates and latent class probabilities (Table 13, Appendix) are practically identical to those presented in Table 4.

4.2 Mixture model with covariates

Table 5 reports the AIC and BIC for the KY models, separately for body weight and height, with no covariates (i.e. our baseline model), and for the KY models that account for our set of covariates. Across all factor mixture models for body weight and height, those that account for covariates have lower AIC and BIC than their counterparts without covariates (baseline models). Overall, it seems that models with covariates perform better than our baseline models, suggesting that the former can be used to explore potential differential patterns in measurement error across individual characteristics.
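For reference, the two criteria have the standard textbook forms below. The notation is ours and does not appear in Table 5: \(k\) is the number of estimated parameters, \(n\) the number of observations and \(\hat{L}\) the maximised likelihood.

\[
\mathrm{AIC} = 2k - 2\ln \hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln \hat{L}
\]

Assuming the full working sample of 37,335 respondents enters estimation, \(\ln n \approx 10.5\), so BIC charges roughly five times the AIC penalty of 2 for each additional parameter. A lower BIC for the covariate models is therefore the more demanding of the two comparisons.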

Table 5 AIC and BIC for the factor mixture models for body weight and height: models with and without covariates

Tables 6 and 7 report the estimates of our factor mixture model with covariates for body weight and height, respectively. We report the APMs for the full sample ("all"), and for the specific groups of individuals based on the set of covariates we account for in the case of true latent body weight and height, precision error \(\left( {\eta_{i} } \right)\), additional random noise \(\left( {\omega_{i} } \right)\) and the mean-reverting error in self-reported anthropometrics (Panel A). Tables 6 and 7 also report the estimates of error probabilities and class probabilities (Panel B). Regarding the estimates of the membership probabilities for the six latent classes (Tables 6 and 7, Panel B), these are very similar to the corresponding results without covariates (Table 4).

Table 6 Estimates of factor mixture model for body weight with covariates
Table 7 Estimates of factor mixture model for body height with covariates

True anthropometrics: The estimated mean of true body weight is 71.9 kg, with a standard deviation of 13.8 ("all" estimates); as in the case of our model without covariates (Table 4), these results show that the distribution of the latent true weight has a higher mean (by about 0.4 kg) than the mean of self-reported body weight (p-value of the difference in means < 0.001). Differences across individual characteristics are as we expect. In line with existing findings (Fryar et al. 2021), men have greater average and more dispersed body weight than women; gender differences in APM are highly statistically significant (as shown by “+++” reflecting the statistical significance of the pairwise comparisons of APMs by gender). Moreover, there is an inverted U-shaped association between mean true body weight and age (Baum 2007); variations in the dispersion of the true weight distribution are also observed across age groups. We also observe systematic regional differences in mean latent weight (with “+++” in Table 6, reflecting the joint significance of between-region groups differences in APM); the APM for the mean true body weight is higher in the South and the Southeast than in other regions; this is consistent with existing literature on higher obesity rates for these regions in Brazil (Rimes-Dias et al. 2022).

Turning to the latent true height (Table 7), the estimated mean is 164.4 cm (with a standard deviation \(\sigma_{\xi } = 6.848\)); in line with the corresponding values from our model without covariates (Table 4), the estimated true mean height is lower (by 0.8 cm) than the mean of self-reported height (Table 3). Higher mean true height, but also a larger dispersion in the relevant distribution, is observed for males and younger individuals (Fryar et al. 2021). Regional variations in the mean and standard deviation of the true height distribution are also evident in Table 7.

Measured anthropometrics: We find that the mean and the standard deviation of measured body weight and height (\(\mu_{\zeta } , \sigma_{\zeta } )\) are comparable to the corresponding parameters from our baseline models without covariates (Tables 6 and 7 versus Table 4), and thus, estimating models with covariates does not change the conclusions of our analysis. Note that, given absence of interviewer-level data, we do not condition these parameters on covariates when estimating the factor mixture models presented in Tables 6 and 7. Specifically, the error-prone measured body weight has a mean and standard deviation (\(\mu_{\zeta } = 78.792\), \(\sigma_{\zeta } = 18.952\)), which are higher than the corresponding values for the true weight distribution; in line with our results from our baseline model without covariates, the estimated mean of measured body weight for those cases that are subject to (intentional or unintentional) recording error is around 7 kg (or almost 10%) higher than the estimated mean of true weight.

Regarding height, the estimated mean of the measured height for those cases that are subject to error (error-prone measured body height) is around 163.0 cm; this is very close to the corresponding estimated mean from the baseline model without covariates (around 160 cm in Table 4) and confirms our baseline results, suggesting that the mean of the error-prone measured height is lower than the estimated mean of true height. In line with our baseline models, our analysis shows that errors in measured anthropometrics occur with a probability \(\left( {1 - \pi_{r} } \right)\) that is very low in magnitude (about 1.5% for body weight and height) but systematically different from zero (p-values < 0.01).

Measurement error heterogeneity in self-reported weight: The estimated mean value of precision error in self-reported body weight \(\left( {\mu_{\eta } } \right)\) is − 0.29 (with a standard deviation \(\sigma_{\eta } = 1.60\)). Taking these values as a benchmark (“All” estimates, Table 6), there are systematic differences in the precision error distribution across population groups. The mean of the reporting precision error is positive for males (0.057), while it is negative for females (− 0.60); gender differences in APM are highly statistically significant (as shown by “+++” reflecting the statistical significance of the pairwise comparisons of APMs by gender). Moreover, the standard deviation of the imprecision error is higher for men. In other words, it seems that men and women have different patterns of reporting for self-reported weight; there is systematic mean upward bias for males with a higher dispersion, while a downward bias with lower dispersion is observed for females. Turning to age groups, the estimated APM of mean imprecision error is negative for all age groups, with variations in both mean imprecision error in body weight (and its dispersion) across age groups. There are systematic regional differences in mean imprecision error (as evident by the “+++” in Table 6 reflecting the joint significance of between-region groups differences in APM); the APM for the mean imprecision error is more negative for the Southeast (− 0.347) and Northeast (− 0.292), while the South has an APM for mean imprecision error of − 0.180.

The additional random error in self-reported body weight data has a negative mean \(\left( {\mu_{\omega } } \right)\), − 0.460, and a standard deviation of \(\sigma_{\omega } = 4.97\). As in the case of the imprecision error, there are gender differences, with random error being much more negative and highly significant for females (− 0.713) than males (− 0.178). There are also systematic age variations (with the joint test for between-age groups differences in APMs being statistically significant at the 1% level); notably the APM for the mean random error is negative and (non-monotonically) increasing in absolute terms for older respondents, while it is positive and higher in (absolute) magnitude for the oldest age group (70 +); the oldest age group has the highest dispersion (APM for standard deviation is 5.7). There are negative and statistically significant APMs for mean random error for all regions and between-region differences in the mean random errors and their dispersion.

Turning to the APM for mean reversion \(\left( \rho \right)\), we confirm our results from the models without covariates suggesting the mean reversion is small in magnitude (close to zero) but statistically significant (“all” APM in Table 6). There are gender, between-age group and between-region differences in mean-reverting patterns.

Measurement error heterogeneity in self-reported height: The estimated mean and standard deviation of precision error in self-reported body height \(\left( {\mu_{\eta } ,\sigma_{\eta } } \right)\) are 0.50 and 1.92 (“All” estimates, Table 7), respectively. Taking these values as a benchmark, we observe differences in the precision error distribution across covariates (Table 7). The mean of the reporting precision error does not differ systematically by gender. The imprecision error in body height is almost monotonically increasing across age groups, with a similar pattern for the dispersion of the imprecision reporting error distribution for older age groups. There are systematic regional variations for the mean imprecision error; the APM for the mean imprecision error is most positive for the South (0.70), while the Northeast has the lowest estimated APM for mean imprecision error (0.39).

The additional random error in self-reported body height has a positive mean value \(\left( {\mu_{\omega } } \right)\), 1.28, and a standard deviation of \({ }\sigma_{\omega } = 4.96\) (“All” estimates in Table 7). Mean values of the additional random noise are higher for females as opposed to males (2.34 vs. 0.09). There are also systematic age variations (as shown by “+++” in Table 7)—notably the APM for the mean random error is positive and (mostly) increasing with age; the oldest age groups (60–69 and ≥ 70) have the highest mean of the random error and the highest dispersion. Systematic regional differences in APM for mean random error as well as for the dispersion of the distribution of the errors are also evident.

As in the baseline model without covariates (Table 4), the mean reversion \(\left( \rho \right)\) for the case of self-reported height is small in magnitude (close to zero) but statistically significant (“all” APM in Table 7).

4.3 Post-estimation analysis

Table 8 shows the precision of the seven types of “hybrid” predictions for body weight, for our factor mixture models with and without covariates, using simulations with 1000 replications. The results for height are shown in Table 9. Our first measure of reliability (Reliability 1) is analogous to the slope coefficient from a (hypothetical) regression of true anthropometrics on the observed anthropometric measure; higher values correspond to greater reliability, and a value greater than one indicates mean reversion. Reliability 2 represents the squared correlation between true anthropometrics and the observed anthropometric measure. These reliability measures should only be used to assess how close a given measure is to the relevant true value and should not be compared across model specifications. For the models of body weight and height, with and without covariates, all hybrid measures provide very large reliability coefficients. A closer look at Tables 8 and 9 shows that the smallest MSE is found for the weighted (conditional) prediction for both anthropometric measures and across models with and without covariates. This indicates that these predictors perform better, as shown by the MSE using out-of-sample simulations, and thus, the weighted (conditional) prediction is our preferred “hybrid” prediction for both weight and height.
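The three precision statistics can be sketched directly from their verbal definitions. This is an illustration under stated assumptions, not the computation carried out by “ky_sim”: the function `precision_stats` is hypothetical, and the simulated true values and predictor (true value plus unit-variance noise) are invented for the example.

```python
import random
from statistics import mean


def _cov(x, y):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def precision_stats(true, predicted):
    """Reliability 1 (slope of true on predicted), Reliability 2 (squared correlation), MSE."""
    var_true = _cov(true, true)
    var_pred = _cov(predicted, predicted)
    cov_tp = _cov(true, predicted)
    return {
        "reliability1": cov_tp / var_pred,
        "reliability2": cov_tp ** 2 / (var_true * var_pred),
        "mse": mean((p - t) ** 2 for t, p in zip(true, predicted)),
    }


# Hypothetical simulation: a predictor equal to the true weight plus small noise.
random.seed(1)
true = [random.gauss(72.0, 15.0) for _ in range(20000)]
pred = [t + random.gauss(0.0, 1.0) for t in true]

stats = precision_stats(true, pred)
print({name: round(value, 4) for name, value in stats.items()})
```

In this setup both reliability statistics are close to one and the MSE is close to the noise variance. A predictor can score well on the reliability statistics yet still differ from another in MSE, which is why the text ranks the hybrid predictors by MSE.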

Table 8 Precision of “hybrid” body weight predictions
Table 9 Precision of “hybrid” body height predictions

Simulation analysis helps to identify the preferred predictors for the latent true body weight and height. After choosing the preferred prediction, the sample data are used to compute the true latent anthropometric measures in order to calculate a BMI measure that aims to approximate the true BMI distribution. Table 10 provides descriptive statistics of this preferred “hybrid” BMI measure obtained from models with and without covariates. Table 10 also presents descriptive statistics for the “corrected self-reported BMI” measure, a frequently used measure in studies that do not have access to measured anthropometric data (Cawley 2015).Footnote 14 Although the “corrected self-reported BMI” is not derived from the mixture measurement error models, adding BMI measures in Table 10 that are based on “corrective” equations allows comparisons with a popular measure used in the existing literature in the absence of measured BMI data.

Table 10 Distributions of BMI based on preferred “hybrid” anthropometric predictions, BMI based on self-reported, “corrected self-reported” and measured body weight/height data
Table 11 Linear regression models of healthcare utilization in the last 12 months on BMI measures

The “hybrid” BMI measures, with and without covariates, and the BMI based on measured data are very close both at the mean and across their distribution. It seems, however, that at the right tail, the p75 and p90 are slightly lower for the “hybrid” measures (both with and without covariates) than for the BMI based on measured data; this is reflected in the difference in inter-quantile ranges between the “hybrid” measures and measured BMI. On the other hand, BMI values that are based on self-reported data are always lower at the mean and across quantiles of the distribution, and have a lower dispersion, compared to both the “hybrid” measures (obtained from models with and without covariates) and the BMI based on measured data. Statistics for the “corrected self-reported BMI” depart from the self-reported BMI values; the “corrected self-reported BMI” has a higher mean and higher quantiles, up to the median of the distribution, when compared to the “hybrid” measures, while the opposite is the case at higher quantiles (p75 and p90).

Overall, these results suggest that similar “hybrid” measures are obtained from the models with and without covariates and that these measures are very close to the BMI measure based on the measured weight/height data. On the other hand, the BMI based on self-reported data under-estimates the “true” values. This indicates that the recording error in measured anthropometrics does not translate into major differences between the “hybrid” and the measured anthropometrics as a result of their small likelihood of occurrence in our sample. Finally, the distribution of the popular “corrected self-reported BMI” does not perform well as an alternative to measured BMI or our “hybrid” BMI measures.

4.4 Implications for the association between BMI and health care utilization

In this sub-section, we provide evidence to test the sensitivity of econometric analyses where BMI is used as an explanatory variable. We compare results with the “hybrid” BMI measures, estimated from our factor mixture models, with those based on self-reported, “corrected self-reported” and measured anthropometrics.

We estimate linear regression models to measure the association between BMI and the frequency of hospital admissions in the previous 12 months (Table 11). To facilitate interpretation of specifications that use polynomials in BMI, Fig. 3 presents the adjusted predictions at representative values (APRs), i.e. the predicted health care use across selected BMI values with all the other variables kept at their mean values (based on the models presented in Table 11). As shown in Fig. 3, the APRs for health care use are practically identical for the “hybrid” measures of BMI. Although the APRs for health care use for the measured and “hybrid” BMI measures are similar across their distribution, they differ at the very high tails of the BMI distribution (above BMI values of 41.5 kg/m²). Turning to self-reported BMI, the relevant results depart from those of our “hybrid” BMI measures (and measured BMI) especially at the lower (BMI values below 23.5 kg/m²) and higher tails (BMI values above 37 kg/m²) of the BMI distribution. The APRs for health care use for the corrected self-reported BMI lie between the corresponding results for self-reported and measured anthropometrics, lying much closer to those for BMI based on measured data.
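The construction of the APRs can be sketched as follows, assuming a quadratic specification in BMI. The coefficients, covariate names and data below are entirely hypothetical and are not estimates from Table 11; the sketch only shows how predictions are evaluated at chosen BMI values while the other covariates are held at their sample means.

```python
from statistics import mean


def adjusted_predictions(coefs, covariates, bmi_values):
    """Predicted outcome at each BMI value, with other covariates fixed at their means."""
    x_bar = {name: mean(column) for name, column in covariates.items()}
    # Contribution of the constant and of the covariates evaluated at their means.
    base = coefs["const"] + sum(coefs[name] * x_bar[name] for name in x_bar)
    return [base + coefs["bmi"] * b + coefs["bmi_sq"] * b * b for b in bmi_values]


# Hypothetical OLS coefficients for a model with a quadratic in BMI.
coefs = {"const": 0.9, "bmi": -0.06, "bmi_sq": 0.0012, "age": 0.002, "female": 0.03}
# Hypothetical covariate columns; only their sample means enter the APRs.
covariates = {"age": [25, 40, 55, 70], "female": [1, 0, 1, 0]}

bmi_grid = (18.5, 25.0, 30.0, 40.0)
aprs = adjusted_predictions(coefs, covariates, bmi_grid)
for bmi, pred in zip(bmi_grid, aprs):
    print(f"BMI {bmi:5.1f}: predicted use = {pred:.3f}")
```

Repeating this calculation with coefficients from models based on each BMI measure and plotting the predictions over a common BMI grid gives curves of the kind compared in Fig. 3.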

Fig. 3 Predicted health care use across selected BMI values (based on OLS models in Table 11)

Our aim in this sub-section is not to guide whether or not “hybrid” BMI measures, based on factor mixture model specifications, should be used in empirical research on the economics of obesity. We only argue that, in the context of the analysis of the association between BMI and health care utilization, results based on our “hybrid” BMI data show some (limited) differences, compared to those obtained from measured data, at the far-right tails of the BMI distribution. Given our evidence that a small but statistically significant fraction of measured anthropometrics is attributed to recording errors, the observed differences at the far-right tails of the BMI distribution suggest that hybrid measures may offer some potential advances when targeting the far-right tails of the BMI distribution, although more research should be undertaken on this.

5 Conclusion

Existing research in the economics of obesity shows that self-reported data are subject to measurement error, which can lead to biased estimates in empirical research that relies on self-reported anthropometrics (e.g. Cawley 2015; Cawley et al. 2015; Davillas and Jones 2021; Gil and Mora 2011; O’Neill and Sweetman 2013). These analyses, however, explicitly assume that measured anthropometrics are error-free as they are treated as “gold standards” when compared to self-reported data. The literature provides little discussion of the potential measurement errors that measured anthropometrics may entail. This gap is of particular relevance given developments in large-scale social surveys that involve the integration of physical health measurements, in addition to traditional self-reported measures. To fill this gap in the literature, we use the KY factor mixture model (Kapteyn and Ypma 2007) to analyse and characterize measurement error in both self-reported and measured anthropometrics with nationally representative data from the 2013 National Health Survey in Brazil.

We find that a very small, but statistically significant, fraction of measured anthropometrics may contain recording errors. The estimated probability that the self-reported anthropometrics are free from any measurement error is, as expected, relatively low at about 10% and 23% for body weight and height data, respectively; these results remain robust with and without accounting for covariates in our factor mixture models. This highlights that respondents’ lack of awareness of their true anthropometrics, in combination with the lack of precision of the self-reported questionnaires, may be sources of the observed measurement error. For example, it has been argued that enhancing people’s knowledge of their exact anthropometric values (by monitoring interventions) may improve their ability to accurately report their anthropometric values (Sherry et al. 2007).

Of particular interest, our analysis reveals that mean-reverting errors in self-reported anthropometrics are low in magnitude, after accounting for other sources of errors in self-reported data. These findings contrast with the existing literature that compares self-reported with measured anthropometrics, which argues that there are strong mean-reverting patterns, at least for body weight (e.g. Cawley et al. 2015). It should be noted, however, that, unlike our analysis, most studies that compare self-reported with measured anthropometrics assume that measured anthropometrics are error-free and do not account for other potential sources of measurement error. Our study is, therefore, potentially useful for exploring the sources of measurement errors that may affect both self-reported and measured anthropometrics and the magnitude of bias that each source of error may cause.
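As a purely illustrative aid, the notion of a mean-reverting reporting error can be sketched with a small simulation. This is not the KY factor mixture model estimated in the paper. It assumes a hypothetical data-generating process in which the self-report equals the true value plus rho times the deviation of the true value from its mean, plus random noise, so that a negative rho corresponds to mean reversion. The function names, the parameter values (mean 70 kg, standard deviation 15 kg) and the assumption that the true value is observed without error are all illustrative choices, not taken from the analysis.

```python
import random
import statistics


def simulate_reports(n, rho, noise_sd, seed=0):
    """Simulate hypothetical true and self-reported body weights (kg).

    Assumed process: reported = true + rho * (true - mean) + noise.
    rho < 0 gives mean-reverting error: heavier people under-report
    and lighter people over-report.
    """
    rng = random.Random(seed)
    true = [rng.gauss(70.0, 15.0) for _ in range(n)]  # illustrative values
    mean_true = statistics.fmean(true)
    reported = [
        t + rho * (t - mean_true) + rng.gauss(0.0, noise_sd) for t in true
    ]
    return true, reported


def error_slope(true, reported):
    """OLS slope from regressing the reporting error on the true value.

    Under the assumed process this slope estimates rho, provided the
    true value is observed without error.
    """
    errors = [r - t for t, r in zip(true, reported)]
    mean_true = statistics.fmean(true)
    mean_err = statistics.fmean(errors)
    cov = sum((t - mean_true) * (e - mean_err) for t, e in zip(true, errors))
    var = sum((t - mean_true) ** 2 for t in true)
    return cov / var


if __name__ == "__main__":
    for rho in (0.0, -0.1):
        true, reported = simulate_reports(n=20000, rho=rho, noise_sd=2.0)
        print(f"assumed rho = {rho:+.2f}, "
              f"estimated slope = {error_slope(true, reported):+.4f}")
```

In this sketch a slope close to zero indicates little mean reversion, while a clearly negative slope indicates that reporting error falls as the true value rises. The interpretation relies on the benchmark being error-free, which is precisely the assumption that the factor mixture approach relaxes.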

Factor mixture models that account for covariates are used to explore the potential heterogeneity across population groups in reporting errors in self-reported anthropometrics, as well as in the true latent anthropometrics. A limitation of our analysis is the absence of interviewer-level data. This necessarily limits our factor mixture models, with no covariates accounted for in the measurement error in measured anthropometrics. Had such data been available, we could have examined how measurement error in the physical measurements varies across interviewer characteristics and provided relevant recommendations to survey data teams to improve measurement protocols.

Latent true anthropometrics vary across age groups, by gender and across regions broadly in line with the existing literature (e.g. Arntsen et al. 2023; Baum and Ruhm 2009; Davillas and Jones 2020; Fryar et al. 2021; Rimes-Dias et al. 2022). Males have higher true mean body weight and height; mean true body weight has an inverted U-shaped relationship with age; mean true height is monotonically decreasing with age, reflecting birth cohort effects and loss of height as people become older (e.g. Arntsen et al. 2023).

Overall, we find that being older is associated with higher reporting errors (due to imprecision error or other random errors) in self-reported body weight and height. These results highlight the role of age-related changes in cognitive and communicative functioning in self-reported data (Knäuper et al. 2016); it has been shown that age-related changes in cognitive ability, question interpretation and memory retrieval affect people’s self-reports as they become older. This is of particular relevance in the case of self-reports of body weight and height, as they involve respondents’ cognitive ability, memory and ability to process information. Moreover, we find systematic gender differences in measurement error in self-reported weight and height data, with women reporting with more errors than men. These results are broadly in line with the existing literature (e.g. Cawley et al. 2015; Gil and Mora 2011). For example, it may be the case that women experience greater social stigma than men for having excess weight, or that other related social norm pathways affecting reporting behaviour in anthropometrics are the relevant underlying mechanisms (Gil and Mora 2011; Puhl and Heuer 2009; Sattler et al. 2018). The observed regional differences in self-reported measurement error may reflect cultural, socioeconomic and demographic differences across Brazilian regions; however, it is impossible to disentangle the role of particular characteristics in shaping these results given the aggregate regional variations we employ in our analysis.

To explore the practical implications of our measurement error results, post-estimation analysis and out-of-sample simulations are employed to estimate hybrid anthropometric predictions that best approximate the true body weight and height distribution. Our proxies of the true BMI distribution are very close to the distribution of BMI based on measured anthropometrics. On the other hand, BMI based on self-reported data seems to underestimate the true BMI distribution. “Corrected self-reported BMI” measures, based on conventional methods to mitigate reporting error in self-reports using predictions from corrective equations, do not perform as well as our “hybrid” BMI measures.

We also analyse the potential implications of measurement error when different BMI measures are used as an explanatory variable in econometric models of health care utilization. We find similar econometric results whether they are based on measured data or on our hybrid BMI measures across most of the BMI distribution, with only small differences observed at the very high tail of the distribution (above BMI values of 41.5 kg/m²). Differences are also observed in the econometric results based on self-reported or “corrected self-reported” data when compared to our hybrid BMI measures at the lower and upper tails of the BMI distribution. Our findings further confirm existing evidence suggesting that BMI based on self-reported data may bias econometric results when BMI is used as an explanatory variable (e.g. Cawley et al. 2015), and suggest that conventional ways to correct self-reported anthropometrics may not mitigate this bias.

Measured anthropometrics may contain some systematic measurement error, but our estimates suggest a very low prevalence of errors, and this is reflected in the presence of only small differences, concentrated at the right tail of the distribution, compared to our proxies of true BMI. Nevertheless, the possibility of errors in measured anthropometrics should be acknowledged when searching for an error-free adiposity measure, especially when focusing on the extreme right tail of the BMI distribution.