Introduction

Obesity is a growing public health concern linked to a range of health problems, including diabetes, cardiovascular disease, depression, and some forms of cancer [1–3]. Since 1975, worldwide obesity has more than tripled, and over 650 million people live with obesity today [4]. With these rising trends, demand for obesity treatment has increased [5, 6]. Weight loss interventions include lifestyle, medical, and surgical management, each with different risks and benefits.

To support the delivery of evidence-based healthcare, a comprehensive understanding of the full range of outcomes associated with weight loss treatments is essential. Although the clinical benefits of various weight loss treatments are well established [7–13], outcomes from the patient's perspective have been poorly described [14, 15]. Weight loss intervention, particularly bariatric surgery, can be a life-changing event that significantly affects patients' health and well-being, yet outcomes from the patient's perspective are not systematically and rigorously measured. The patient perspective is best captured with patient-reported outcome (PRO) measures (PROMs), and collecting PRO data provides a means to measure the outcomes most relevant to patients undergoing weight loss treatment. In comparative effectiveness research, PROMs could support an evidence-based approach to selecting optimal weight loss treatments [16].

A recent systematic review of PROMs in bariatric surgery found that the BODY-Q was the most rigorously developed and validated PROM for this population [17]. The BODY-Q was developed in a mixed-methods study that involved a literature review, qualitative and cognitive interviews with patients, and expert input [18]. It measures outcomes related to health-related quality of life (HRQOL), appearance, and experience of healthcare [19]. The BODY-Q consists of a set of independently functioning scales, which makes it possible to add new scales that assess additional concepts of interest as they are identified [20, 21]. One gap in the BODY-Q is the lack of scales for eating-related concepts that are particularly relevant to weight loss treatment. In this study, we developed and field-tested 5 new BODY-Q scales that measure the following PROs, which are not adequately covered by existing PROMs: weight loss expectations, eating behaviors, eating-related distress, eating-related symptoms, and work life.

Methods

The new BODY-Q scales were developed using a multiphase iterative mixed-methods approach that has been described in detail elsewhere [22]. This paper describes phases I (qualitative research) and II (quantitative research).

Phase I: Qualitative Research

To develop the new BODY-Q scales, we reexamined the general and specific codes related to eating in the original set of 63 patient interview transcripts [23]. A set of items covering expectations, eating behavior, eating-related distress, eating-related symptoms, and work life was then written at the lowest possible reading grade level. For each scale, we developed instructions, a recall period, and response options.

Local research ethics board approval at the coordinating center in Canada (Hamilton Integrated Research Ethics Board) was obtained prior to starting the study. Participants provided verbal consent to participate in the study at the beginning of recorded phone interviews.

To refine the scales and establish content validity for people with obesity, we recruited participants from the original BODY-Q field-test study [18] who had indicated that they were willing to remain involved in BODY-Q research. These participants were emailed an invitation to take part in a cognitive interview reviewing the new BODY-Q scales, and the first 10 to respond were interviewed in September and October 2018. In addition, 7 new participants were recruited from the St Joseph's Healthcare bariatric program in Hamilton (Canada) between November 2018 and January 2019.

Cognitive interviews were conducted by an experienced qualitative interviewer. Participants were asked to give feedback on the instructions, items, and response options, to identify any items that were confusing or not relevant, and to identify any missing content. Interviews were audio-recorded, transcribed, and coded. All data related to the items, response options, and instructions were transferred to a Microsoft Excel worksheet for analysis, and the scales were revised iteratively throughout the process.

We also obtained input from experts in bariatric surgery and weight loss management. Experts from the 4 participating countries were selected for their extensive experience in bariatric surgery, weight loss management, bariatric nutrition, and/or PROM development. They were sent the new BODY-Q scales by email or shown them in person, and gave feedback on the scales' instructions, items, and response options. Experts were also asked whether the scales covered all issues they considered clinically important.

After the patient and expert input phase, the new BODY-Q scales were translated into Dutch and Danish following recommended guidelines [24, 25]. The translation process involved two independent forward translations, one backward translation, expert feedback, and cognitive debriefing interviews with patients undergoing weight loss treatment. Feedback from experts and patients was used to revise and finalize the Dutch and Danish translations.

Phase II: Quantitative Research

The field-test study of the new BODY-Q scales took place between June 2019 and January 2020. This study was conducted in accordance with the Handbook for Good Clinical Research Practice of the World Health Organization and the Declaration of Helsinki principles and was approved by the regional and local institutional review boards or data protection agency (Brigham and Women’s Hospital, United States (US); Medical Research Ethics Committees United, The Netherlands; Danish Data Protection Agency).

The new BODY-Q scales were included in three cohort studies of bariatric and weight loss management patients aged 18 years and older in the US (Brigham and Women's Hospital), The Netherlands (OLVG, Amsterdam; St. Antonius Ziekenhuis, Nieuwegein; Catharina Hospital, Eindhoven), and Denmark (Odense University Hospital and Hospital of Southwest Jutland), and in a web-based survey on the online crowdworking platform Prolific Academic (www.prolific.co) that included participants from the US and Canada. After participants gave (online) consent, data were collected either face-to-face in the outpatient clinic using iPads, with data entered into a secure web-based application (Research Electronic Data Capture (REDCap); US) [26], or by email containing a URL that linked directly to a secure web-based application (REDCap or Castor EDC; The Netherlands, Denmark, and Prolific) [26, 27]. The surveys included demographic questions and the 5 new BODY-Q scales. Participants were sent up to 3 email reminders, 7 days apart. At the end of the questionnaire, participants from the cohort studies in the US and The Netherlands were invited to take part in a test-retest study. Those who agreed were emailed a URL link to the questionnaire 1 week after their appointment and were sent up to two email reminders, 7 days apart. Branching logic ensured that only participants who had worked at a job with co-workers in the past 3 months completed the work life scale, and that only participants who were awaiting bariatric surgery or had attended only their first appointment in the weight management clinic completed the expectations scale.

Prolific requires that participants be paid at least $6.50 per hour. Participants took between 20 and 30 min to complete the survey and received $3.50 per completed survey. Participants were informed about the payment before they agreed to participate.
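As a quick arithmetic check, the reported completion times imply the following effective hourly pay for the $3.50 reward, both of which are above the $6.50 minimum:

```latex
\frac{\$3.50}{30/60\ \text{h}} = \$7.00/\text{h},
\qquad
\frac{\$3.50}{20/60\ \text{h}} = \$10.50/\text{h}
```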

Data Analysis

Patient characteristics were described as mean ± standard deviation (SD) or as percentages and were analyzed using SPSS software (IBM SPSS Statistics, version 26.0, IBM Corp). The psychometric analysis used Rasch measurement theory (RMT) [28] and was conducted in RUMM 2030 software (RUMM version 2030, RUMM Lab.). The following statistical and graphical tests were performed to identify the best subset of items to retain in each scale: threshold maps, item fit statistics, dependency, targeting, differential item functioning (DIF), and the person separation index (PSI) (Supplementary Information, Table 1). For the item fit analysis, we adjusted the sample size to 500, because with very large samples the chi-square P-values flag even trivial misfit as significant. The following subgroups were examined in the DIF analysis: sample (Prolific versus clinical), age (18 to 29, 30 to 39, 40 to 49, 50 to 59, 60 years or older), body mass index (BMI) classification (24.9 or less, 25.0 to 29.9, 30.0 to 34.9, 35.0 to 39.9, 40.0 or more), language (English, Danish, Dutch), sex (male, female), treatment (bariatric surgery, weight management), and bariatric surgery group (preoperative, first year, 1–2 years, 3 or more years after surgery). For each scale, raw scores were converted via the Rasch logit score to a score ranging from 0 (worst) to 100 (best). For the test-retest (TRT) data, intraclass correlation coefficients (ICCs) were calculated using a two-way random effects model; higher ICC values indicate greater test-retest reliability. The proportions of participants scoring at the floor and ceiling were computed; a high proportion at either extreme can indicate that a scale does not cover the full range of the construct.
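The text specifies a two-way random effects model for the ICC but not whether the absolute-agreement or consistency form, or single or average measures, was used. The Python sketch below assumes the common single-measure, absolute-agreement form, ICC(2,1) in Shrout and Fleiss' notation, computed from two-way ANOVA mean squares. The function name `icc_2_1` is hypothetical, and this is not the SPSS implementation used in the study.

```python
import numpy as np


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: array-like of shape (n_subjects, k_occasions); for a test-retest
    design, one row per participant and two columns (test, retest).
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # per-participant means
    col_means = x.mean(axis=0)  # per-occasion means
    # Two-way ANOVA decomposition of the total sum of squares
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_total = np.sum((x - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Small illustrative matrix (6 subjects rated on 4 occasions), not study data
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
print(round(icc_2_1(example), 2))  # 0.29
```

Because the occasion (column) variance enters the denominator, systematic shifts between test and retest lower this form of the ICC; a consistency form would ignore them.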

For concurrent validity, we included the EQ-5D-5L, which has 2 parts [29]. Part 1 is a 5-item questionnaire that measures mobility, self-care, usual activities, pain/discomfort, and anxiety/depression on 5 levels, from level 1 (no problems) to level 5 (unable to/extreme problems) [29]. Part 2 is the EQ-VAS, which asks participants to choose a number from 0 to 100 to indicate their current health status, with higher scores indicating better health. The EQ-5D-5L has been validated for use in bariatric surgery, with adequate convergent/divergent validity. Because the new BODY-Q scales and the EQ-5D-5L measure dissimilar constructs, we expected the new scales to correlate weakly to moderately with the EQ-5D-5L and the EQ-VAS. We also expected the eating behavior and eating-related distress scales to correlate more strongly with each other than with the expectations, work life, and eating-related symptoms scales. In addition, we tested three hypotheses. First, patient characteristics (age, sex, and race) would correlate weakly with the new BODY-Q scales. Second, worse scores on the new BODY-Q scales would be associated with higher BMI. Third, participants waiting for bariatric surgery would report lower mean scores on the eating behavior, eating-related distress, eating-related symptoms, and work life scales than participants who had undergone bariatric surgery.

Depending on the normality of the data, Pearson or Spearman correlations were used to examine associations between continuous variables, and t-tests, one-way analysis of variance (ANOVA), or their non-parametric equivalents were used to examine differences between groups. P-values <0.05 were considered statistically significant for these tests of construct validity.
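Spearman's rho is Pearson's r computed on ranks (with tied values given their average rank), which is why it serves as the fallback when scores are not normally distributed. A minimal, dependency-free Python sketch follows; the normality check that decides between the two is not shown, and the function names and example values are illustrative.

```python
from statistics import fmean


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical BMI values and 0-100 scale scores (not study data)
bmi = [24.0, 31.5, 36.2, 42.8, 55.1]
score = [80, 62, 60, 41, 40]
print(round(pearson_r(bmi, score), 3), spearman_rho(bmi, score))
```

Here the relationship is perfectly monotonic but not linear, so Spearman's rho is exactly −1 while Pearson's r is somewhat weaker.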

Results

Qualitative Phase

The new BODY-Q scales were reviewed by 17 patients (15 female and 2 male), 12 of whom had undergone bariatric surgery. The mean age of participants was 48 years (range 32–62 years). The scales were also reviewed by 20 experts from 4 countries (Canada, Denmark, The Netherlands, the US): 5 bariatric surgeons, 3 psychologists, 1 bariatric physician, 1 bariatric physician assistant, 1 bariatric dietician, 2 bariatric nurses, 2 plastic surgeons, 1 plastic surgery trainee, and 4 physician trainees (PhD and/or resident). The Supplementary Information shows sample patient quotes from the cognitive interviews (Table 2) and, for each round, the number of items reviewed by participants and experts that were retained, revised, dropped, and added (Table 3). At the end of the process, the set of scales included 89 items.

Instructions and Response Options

Based on participant feedback and expert input, the instructions were slightly revised across rounds. For response options, the expectations scale measures how likely participants thought a set of statements would apply to them (very unlikely, somewhat unlikely, somewhat likely, very likely); the work life scale measures agreement with a set of statements (definitely disagree, somewhat disagree, somewhat agree, definitely agree); and the remaining scales measure frequency (never, sometimes, often, always). Participants generally had no problems with the response options in any round. Some participants wanted an additional neutral option (e.g., neither agree nor disagree), but in RMT the scoring and interpretation of neutral options are unclear [30]. Additionally, 4–5 response options are considered ideal [31].

Translation

To finalize the field-test version, two items that proved difficult to translate into Dutch and Danish were dropped. After these deletions, the field-test version of the new scales consisted of 87 items.

Quantitative Phase

For the clinical samples, response rates varied by country: 62% in The Netherlands, 59% in Denmark, and 94% in the US. A total of 4123 participants started the survey, and 4004 (2057 Prolific and 1947 clinical) completed at least 1 scale and were included in the analysis. Table 1 shows the sample characteristics. Most participants were female (N=2743, 69%); the mean age was 45 years (SD 13), and the mean BMI was 31.1 kg/m² (SD 8.7).

Table 1 Patient characteristics of the field-test sample by country of recruitment

Data from the 1038 underweight and normal-weight participants were excluded from the RMT analysis.

The RMT analysis reduced the 87 items to 59 items across 5 scales. Items excluded during the item reduction phase of the RMT analysis are shown in Supplementary Information, Table 4; most were excluded because of misfit to the Rasch model. All items in 4 scales had ordered thresholds, demonstrating that respondents could appropriately discriminate among the response options (see example in Supplementary Information, Fig. 1). In contrast, 7 of the 15 items in the expectations scale showed disordered thresholds. To simplify scoring, the two response options very unlikely and somewhat unlikely were combined into one category (unlikely), and the RMT analysis used the following rescored items: unlikely, 0; somewhat likely, 1; very likely, 2. After rescoring, all 15 items had ordered thresholds.
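The category collapse for the expectations scale amounts to a simple recoding. The sketch assumes responses are stored as the text labels listed above; the mapping follows the rescoring reported here, while the function name and the handling of missing answers (kept as `None`) are illustrative assumptions.

```python
# Collapsed scoring for the expectations scale: the two "unlikely" options merge
RESCORED = {
    "very unlikely": 0,
    "somewhat unlikely": 0,
    "somewhat likely": 1,
    "very likely": 2,
}


def rescore_expectations(responses):
    """Map response labels to collapsed item scores; missing answers (None) stay None."""
    return [None if r is None else RESCORED[r] for r in responses]


print(rescore_expectations(
    ["very unlikely", "somewhat unlikely", "somewhat likely", "very likely", None]
))  # [0, 0, 1, 2, None]
```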

The item fit statistics provided evidence of validity for the 5 scales (Supplementary Information, Table 5). For each scale except the work life scale, the observed data fit the Rasch model, with non-significant overall model fit (Table 2). All 59 items had non-significant chi-square P-values after Bonferroni adjustment, and item fit residuals were within ±2.5 for 22 of the 59 items (Supplementary Information, Table 5). Residual correlations exceeded 0.20 for 11 item pairs within 4 scales, indicating some degree of dependency. However, subtest analyses showed that these correlated items had only a marginal influence on the reliability of the scales (Table 2).

Table 2 Scale level results

In terms of targeting, the percentage of participants who scored within each scale's range of measurement ranged from 76% for the eating-related distress scale to 99% for the eating behavior scale (Table 2). Supplementary Information, Fig. 1 shows an example of targeting for the eating behavior scale. The person measurements of patients with obesity, undergoing medical weight loss, or undergoing weight loss surgery were spread across all levels of the construct measured by each scale, matching the item locations; this is reflected in the mirrored distributions of persons (top half of the graph) and items (lower half of the graph).

Supplementary Information, Table 6 lists the items that showed DIF by patient characteristic in the adjusted and unadjusted analyses. When these items were split by the relevant patient characteristic, Pearson correlations between the original and split person locations were 0.995 or higher, showing a negligible impact on scoring. This indicates that total scores on the 5 scales can be compared across subgroups of participants with different characteristics.

The RMT analysis provided evidence of reliability for the 5 scales, with PSI values (with and without extremes) ≥0.70 and Cronbach alpha values ≥0.81 (Table 2). For the TRT study, the response rate was 70% in The Netherlands and 43% in the US, and a total of 303 participants completed the retest 1 week after the initial assessment. ICC values were ≥0.79 for all 5 scales (Table 2), indicating sufficient test-retest reliability.
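Cronbach's alpha compares the sum of the item variances with the variance of the total score: alpha = k/(k − 1) × (1 − Σ item variances / total-score variance). A minimal NumPy sketch, assuming a complete respondents × items matrix with no missing responses (statistical packages handle missing data in their own ways); the function name and example responses are illustrative.

```python
import numpy as np


def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) matrix without missing values."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]  # number of items in the scale
    sum_item_var = x.var(axis=0, ddof=1).sum()  # sum of the k item variances
    total_var = x.sum(axis=1).var(ddof=1)  # variance of the summed scale score
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical responses of 5 people to a 4-item scale scored 0-3
responses = [[0, 1, 0, 1], [1, 1, 2, 1], [2, 2, 2, 3], [3, 2, 3, 3], [1, 0, 1, 1]]
print(round(cronbach_alpha(responses), 2))
```

Alpha approaches 1 when items covary strongly and falls to 0 when item covariances are zero, because the total-score variance then equals the sum of the item variances.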

After the scores were transformed to 0–100, the percentages of participants scoring at the floor (lowest score) and ceiling (highest score) for each scale were: expectations, 1.3% and 17.0%; eating behavior, 0.1% and 1.3%; eating-related distress, 0.2% and 26.4%; eating-related symptoms, 11.1% and 1.3%; and work life, 0.3% and 21.8%.
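Floor and ceiling percentages are simple proportions of participants at the two ends of the 0–100 range. The Python sketch below also includes a linear mapping from a Rasch person logit to 0–100; that mapping is an illustrative assumption, since the exact transformation used for the BODY-Q is not described here. Both function names are hypothetical.

```python
def logit_to_0_100(logit, min_logit, max_logit):
    """Hypothetical linear rescaling of a Rasch person logit to 0 (worst) to 100 (best)."""
    return 100 * (logit - min_logit) / (max_logit - min_logit)


def floor_ceiling(scores, floor=0.0, ceiling=100.0):
    """Percentages of non-missing scores at the floor and at the ceiling."""
    valid = [s for s in scores if s is not None]
    n = len(valid)
    pct_floor = 100 * sum(s == floor for s in valid) / n
    pct_ceiling = 100 * sum(s == ceiling for s in valid) / n
    return pct_floor, pct_ceiling


# Hypothetical person logits on a scale spanning -4 to +4 logits
scores = [logit_to_0_100(x, -4.0, 4.0) for x in (-4.0, -1.0, 2.0, 4.0, 4.0)]
print(scores)  # [0.0, 37.5, 75.0, 100.0, 100.0]
print(floor_ceiling(scores + [None]))  # (20.0, 40.0)
```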

The results for concurrent validity are shown in Table 3. The expectations, eating behavior, and eating-related symptoms scales correlated weakly with the EQ-5D-5L and the EQ-VAS. The eating-related distress scale correlated moderately with anxiety/depression and the EQ-VAS, and the work life scale correlated moderately with the EQ-5D-5L items (except self-care) and the EQ-VAS. The eating behavior and eating-related distress scales correlated more strongly with each other than with the scales measuring unrelated constructs (expectations, work life, and eating-related symptoms). The expectations scale correlated only with the eating behavior scale. Most results were consistent with the three hypotheses. Patient characteristics (age, sex, and race) correlated weakly with the new BODY-Q scales. All new BODY-Q scales except the expectations scale correlated with BMI, with the strongest correlation between the eating-related distress scale and BMI (r = −0.249, P < 0.001) (Table 3). Mean scores differed significantly across BMI classification groups for all scales except the expectations scale, with lower (worse) scores for participants with class 1 to 3 obesity (P < 0.001 on ANOVA) (Fig. 1). Only the eating behavior scale correlated with time since surgery (r = −0.141, P = 0.01). Mean scores differed by bariatric surgery group, with the lowest scores in preoperative patients (P < 0.001 on ANOVA); the exception was the eating-related symptoms scale, on which preoperative patients, who had not yet had bariatric surgery, scored better.
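The group comparisons above rely on one-way ANOVA. As a sketch of what that test computes, the F statistic is the between-group mean square divided by the within-group mean square; the Python code below computes it from scratch for illustration only. The P-value would then come from the F distribution with (k − 1, n − k) degrees of freedom, for example via `scipy.stats.f_oneway`, which is not used here. The group values are made-up numbers, not study data.

```python
from statistics import fmean


def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA across any number of groups."""
    all_scores = [s for g in groups for s in g]
    grand_mean = fmean(all_scores)
    k, n = len(groups), len(all_scores)
    group_means = [fmean(g) for g in groups]
    # Between-group variation: group size times squared deviation of group mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: squared deviations from each group's own mean
    ss_within = sum((s - m) ** 2 for g, m in zip(groups, group_means) for s in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # 27.0
```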

Table 3 Correlations between BODY-Q scales, EQ-5D-5L, and patient characteristics
Fig. 1 Mean scores for the 5 new BODY-Q scales by BMI classification

Discussion

The new BODY-Q scales extend the range of patient-important concepts covered by the BODY-Q. They were developed using existing qualitative interview data from the original BODY-Q study together with new input from patients and experts, to ensure that their content was comprehensive and relevant to patients seeking or undergoing weight loss interventions. A modern psychometric approach, RMT analysis, was used to refine the content of the new scales while keeping them grounded in the experience of patients and experts. The detailed RMT analysis provided evidence of the validity and reliability of the new BODY-Q scales in a large international, multilingual sample. Different sample types and languages were included to increase the generalizability of the new scales for international uptake. Importantly, the scales functioned the same across patients who differed by sample type, age, sex, and language, and they were appropriately targeted to patients with obesity or undergoing weight loss treatment. The new BODY-Q scales can therefore be used in national and international comparative effectiveness studies of weight loss interventions.

Understanding patients’ expectations of bariatric surgery is critically important to ensuring effective preoperative informed consent [32]. Failing to establish realistic expectations can result in inappropriate patient selection and significant postoperative distress. The expectations scale of the BODY-Q directly assesses this important metric. Realistic patient expectations have been found to be important for better postoperative satisfaction and HRQOL outcomes [33, 34]. An improved understanding of the expectations of patients may help clinicians better educate and select patients who are most appropriate for weight loss treatment [35]. Future research is needed to examine how patients’ expectations of weight loss treatment are related to outcomes after treatment.

Eating plays a critical role in weight loss treatment, and one of the main goals of treatment is to achieve better eating habits [36–39]. For example, bariatric surgery alters the patient's anatomy to encourage smaller meals. Poor postoperative eating habits place patients at risk of weight regain [38, 40, 41], so identifying predictors of poor eating behaviors may help in the preoperative setting to select suitable bariatric surgery candidates. Additionally, patients often have strong emotions attached to eating. Achieving better control of eating may lead to positive feelings, whereas failing to change eating habits may lead to negative feelings such as frustration [39, 40]. Other aspects of eating are also affected after bariatric surgery: some patients experience unpleasant side effects, including nausea, vomiting, and diarrhea [36]. The new BODY-Q eating scales measure a range of eating behaviors, eating-related distress, and symptoms associated with eating. These scales are the first independently functioning scales specifically targeted to patients living with obesity and to patients undergoing weight loss treatment. They allow quantitative measurement of important eating-related outcomes and can be used in research and in individual clinical care wherever the collection and use of PRO data for weight loss treatment would be informative. Further application of these scales may improve understanding of differences in eating behavior, eating-related distress, and eating-related symptoms after different surgical procedures, which could inform decision-making.

Living with obesity can negatively affect functioning at work, which can compromise overall HRQOL. Patients living with obesity may feel stigmatized at work and have fewer opportunities than their colleagues, and improving work opportunities is an important reason for seeking weight loss treatment [36, 42]. Patients who have undergone bariatric surgery report improved work opportunities and effectiveness at work, as well as increased recognition from colleagues [43–47]. The BODY-Q work life scale is a rigorously developed PROM that assesses these obesity-associated work life experiences, which are closely related to HRQOL.

This study has some limitations. First, we used an online platform (Prolific) to recruit part of our sample. Although Prolific allows a large sample to be recruited in a short time, the Prolific sample included more Caucasian participants than the US clinical sample, and the number of Canadian participants was limited. In addition, although we collected BMI for the Prolific sample, the survey did not ask about bariatric surgery or other weight loss treatments, waist circumference, or body composition. Furthermore, we did not collect data on medications that could have affected, for example, appetite or mood. Second, because few participants were awaiting bariatric surgery, we did not have enough data to examine DIF and test-retest reliability for the expectations scale. Third, the study data were cross-sectional, which is appropriate for PROM development; however, longitudinal studies using the new scales are needed to assess their ability to measure change and to determine minimal important differences.

Conclusion

The new BODY-Q scales enable rigorous assessment of outcomes that are particularly relevant to patients who live with obesity or undergo weight loss treatment. These scales equip clinicians and researchers with measurement instruments that can promote high-quality, patient-centered care.