Introduction

Overweight and obesity increase the risk of developing type 2 diabetes [1], cardiovascular disease [2] and cancer [3], among other conditions. Despite efforts to lose weight and maintain weight loss, weight regain after considerable weight loss is often reported [4,5,6]. Modifiable energy-balance behaviours, such as a healthy diet and physical activity, are crucial for weight loss and weight maintenance [7, 8].

Behavioural interventions focusing on diet and physical activity show beneficial effects in at-risk populations [9, 10], and there is now a reasonably robust evidence base for the effectiveness of theory-based behaviour change interventions focusing on individual and social factors [11,12,13] (for a review, see [14]). Recent systematic reviews have reported evidence that increases in autonomous motivation, self-regulation skills and self-efficacy mediated long-term weight management in lifestyle interventions (e.g. [15, 16]), and interventions based on self-determination theory (SDT) are designed to include these components. The SDT literature appears to offer promising guidance for explaining long-lasting health-related behaviours [17,18,19], including those relevant to weight management [20] and long-term exercise among women [21].

Goal content from the perspective of self-determination theory

Self-determination theory [22] posits that humans have three innate psychological needs that are required for psychological growth: the need for autonomy, or the experience of authorship, volition and integrity of one's actions; the need for competence, or the feeling of being effective in responding to challenges; and the need for relatedness, or the sense of belonging and being cared for. Need-supportive social environments, which nourish these needs, are crucial when striving for behaviour change. However, different health goals provide different motivational energy for health behaviours such as physical activity or healthy eating.

In goal content theory (a sub-theory of SDT), these behavioural goals are therefore defined as intrinsic or extrinsic based on the nature of their content (e.g., to improve health vs. to improve image [23, 24]), and each kind of goal may relate differently to the satisfaction of the basic psychological needs (e.g., [25]). Intrinsic goals develop from within the individual and are more likely to satisfy psychological needs, for example “to seek out novelty and challenges, to extend and exercise one’s capacity to explore and to learn” [22, p. 70]. In contrast, extrinsic goals (e.g., wealth, possessions, appearance) have an outward, instrumental orientation, typically focused on attaining external indicators of worth, and are unrelated to, or may even undermine, psychological need satisfaction [26]. The definition of intrinsic and extrinsic goals (the “what” of the behaviour) is conceptually distinct from the behavioural regulation that motivates people towards that goal, which can range from highly autonomous to highly controlled (the “why” of the behaviour; for a review, see [22, 27]).

The measurement of goal content (intrinsic vs. extrinsic) is critical for understanding the relationships between goals, behavioural regulations, health behaviours and long-term psychological well-being [28]. The content of people’s goals when pursuing weight loss maintenance is an important but understudied area related to the maintenance of energy-balance behaviours over time (i.e., physical activity and healthy eating). Some advances have been made in the development of self-report measures of goal content. For example, the Goal Content for Exercise Questionnaire [29] evaluates three intrinsic (i.e., skill development, social affiliation and health management) and two extrinsic (i.e., social recognition and image) exercise goals, and previous work has reported positive associations of intrinsic goals with exercise engagement and psychological well-being (e.g., [27]).

Furthermore, we aimed to confirm measurement invariance and cross-cultural generalizability. According to Sue [30], cross-cultural comparisons that test the equivalence of the results of a psychometric instrument are a fundamental approach to testing the cross-cultural applicability of theories and models. Testing measurement invariance is key when psychological measures are used for group comparisons, such as between countries with different languages and cultural backgrounds, as it reveals the degree to which item response patterns are preserved and retain their meaning across all tested groups [31, 32]. Simply put, using a non-invariant measure to compare groups is uninformative, because the instrument performs differently across groups. In that case, mean differences may be misinterpreted [33], which would be detrimental when testing the efficacy of interventions in large multi-country trials. When measurement invariance is not met, observed differences may reflect differential functioning of the measure across groups rather than true differences in the construct being measured [31]. Following this argument, and because women are generally more concerned than men with weight loss maintenance-related outcomes (e.g., body dissatisfaction) [34], we also investigated measurement invariance of the model across gender.

Within the context of a large European trial (the NoHoW H2020 Trial) based on the SDT framework, we assessed the conceptual structure of the motives for weight loss maintenance to help understand whether the intervention was effective in supporting participants’ transition from extrinsic to intrinsic motives. Understanding the motivational role of people’s goals for weight loss maintenance may help in the development of more effective interventions. Drawing on SDT, the present study developed an adapted instrument to assess the content of people’s goals for maintaining weight loss.

Aims

The objectives of this study were, first, to develop the Goal Content for Weight Maintenance Scale (GCWMS) and, second, to examine its psychometric properties by testing: (1) its factorial validity; (2) its measurement invariance across groups, by comparing the stability of the factorial model in a large and diverse population of adults who had lost weight from three European countries (Denmark, Portugal and the United Kingdom); (3) its internal consistency and its capacity to discriminate between intrinsic and extrinsic goals as defined in SDT; and (4) its convergent and external validity, by comparison with validated measures of behavioural regulation for exercise and eating.

Methods

Participants

The data used for this validation procedure were collected at baseline in a European Commission-funded intervention project, the NoHoW Trial (trial registration number ISRCTN88405328). The NoHoW project is testing a digital toolkit to support weight maintenance in a 2 × 2 factorial randomised controlled trial in 1627 European adults (M age = 44.01 ± 11.86 years; 68.7% women) from Denmark, Portugal and the United Kingdom (UK) who had achieved 5% weight loss in the 12 months prior to trial enrolment (for more information about this project, see [35]).

Following the guidance of Hair et al. [36], we included only individuals with complete data, to use the largest possible sample without imputing values (73 of 1627 individuals were excluded). Multivariate outliers were detected using the Mahalanobis distance, and 43 observations with a relatively high \(D_{\mathrm{M}}^{2}\) (p < 0.001) were removed [37]. After data cleaning, the sample available for this study comprised 1511 individuals (M age = 44 years; SD = 11.9; 68.3% women). Approximately 33.7% were from Portugal (N = 509; M age = 40, SD = 9.7; 44.8% women), 34.1% from the UK (N = 516; M age = 44.7, SD = 13; 79.3% women) and 32.2% from Denmark (N = 486; M age = 47.4, SD = 11.4; 81.3% women). Full descriptive statistics for the final sample after missing-data and outlier removal, including demographics per country, are available in supplementary file 1 (https://osf.io/vfmgq/).
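The two cleaning steps above (complete cases only, then removal of cases whose squared Mahalanobis distance has p < 0.001) can be sketched in Python as follows. This is a minimal illustration, not the NoHoW analysis code: it assumes responses are held in a NumPy array with NaN for missing answers, the function name `clean_sample` is hypothetical, and the data are synthetic. It uses the usual convention that \(D_{\mathrm{M}}^{2}\) is referred to a chi-square distribution with as many degrees of freedom as there are items.

```python
import numpy as np
from scipy.stats import chi2


def clean_sample(X, alpha=0.001):
    """Listwise deletion, then removal of multivariate outliers.

    X: array of shape (participants, items); NaN marks a missing answer.
    Returns (retained_rows, outlier_mask); the mask refers to the
    complete cases that survived step 1.
    """
    X = np.asarray(X, dtype=float)
    # Step 1: keep complete cases only (no imputation).
    complete = X[~np.isnan(X).any(axis=1)]
    # Step 2: squared Mahalanobis distance of each case from the centroid.
    centred = complete - complete.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(complete, rowvar=False))
    d2 = np.einsum("ij,jk,ik->i", centred, cov_inv, centred)
    # Step 3: compare D^2 with a chi-square distribution (df = number of
    # items) and flag cases whose upper-tail p value is below alpha.
    outliers = chi2.sf(d2, df=complete.shape[1]) < alpha
    return complete[~outliers], outliers


# Hypothetical data: 500 respondents x 16 items, one incomplete case and
# one implausible alternating 1/7 response pattern.
rng = np.random.default_rng(42)
data = rng.normal(4.0, 1.0, size=(500, 16))
data[0, 3] = np.nan
data[1] = [1.0, 7.0] * 8
retained, flagged = clean_sample(data)
```

Because outliers are flagged only after listwise deletion, the mask indexes the complete cases rather than the raw rows, which keeps the two exclusion counts (missing data vs. outliers) separate, as in the text.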

Scale development and procedure

The GCWMS items (see the full original scale in supplementary file 2: https://osf.io/5n9h2/) were adapted from two well-validated and commonly used scales assessing goal content (Goal Content for Exercise Questionnaire [GCEQ] [29]) and motives (Exercise Motives Inventory-2 [38]) for exercise. The original GCEQ was adapted to the weight loss maintenance framework by changing the stem (i.e., from “I exercise to…” to “I manage my weight to…”) and the wording of the behaviour expressed in each item (i.e., from “exerciser” to “healthy person”). However, the original four Skill Development items are exercise-specific and could not be adapted to weight loss maintenance. Therefore, four items focusing on challenge and skill development (e.g., “To give me goals to work towards”) were taken from the Exercise Motives Inventory-2 [38]. The initial item pool comprised 16 items addressing four types of goals that people may have for maintaining their weight loss, in a non-orthogonal model: Challenge and Health Management as intrinsic goals, and Image and Social Recognition as extrinsic goals.

These changes were made by a panel of specialists in psychology, psychometrics, obesity and behaviour change. All items and instructions were then translated into Portuguese and Danish and back-translated into English by an external company (Ipsos Mori), and then revised by the panel of specialists to ensure that content and meaning were preserved. The original GCEQ response format (a 7-point scale from 1, strongly disagree, to 7, strongly agree) was used to allow participants to indicate the extent to which each item was important to them.

Data collection

Responses were collected via the Qualtrics™ online platform as part of the NoHoW trial; all participants provided signed informed consent. Ethical approval was granted by the institutions involved in the study (University of Leeds [17–0082; 27 February 2017], University of Lisbon [17/2016; 20 February 2017] and the Capital Region of Denmark [H-16030495; 8 March 2017]). All data were anonymised, and each participant was given a unique identification code. Information on data handling is available elsewhere [35].

Data analysis

We conducted confirmatory factor analysis (CFA) using maximum likelihood estimation to test the validity and invariance of the factorial structure within and across countries and genders. More specifically, this study aimed to (1) test the factorial validity of the GCWMS in a European sample, (2) cross-validate the findings in a second, independent European sample, and (3) test for invariance of the factorial measurement and structure across gender (women/men) and country (Portugal/Denmark/UK) in the larger sample. To this end, we analysed model fit indices, internal reliability, construct validity and construct reliability. To test the factorial structure of the GCWMS, we cross-validated the modified model with split-sample data in a three-stage process. First, a preliminary CFA was conducted on a randomly selected sub-sample comprising approximately half of the Portuguese participants, to test the factorial validity of the four-factor model for weight loss maintenance, because the original factorial structure was conceived in a different behavioural domain (exercise) [29]. The Portuguese sample was used because it had the most balanced proportion of men and women (44.8% women). Second, the remaining Portuguese participants were used to test the hypothesised model changes independently. Third, the fit of the final model was examined in the full sample. Because large samples are prone to multivariate outliers, we first inspected the data for normality (skewness and kurtosis), checked for potential multivariate outliers and computed Mardia’s coefficient of multivariate kurtosis (see supplementary file 3 for AMOS outputs: https://osf.io/hs5aj/).
Because this value exceeded the recommended cut-off for multivariate normality (> 5.0) [39], we used the Bollen–Stine bootstrap with 2000 samples for all analyses [40] and Spearman’s rho as the correlation coefficient to test the strength of association between scale factors.
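For readers unfamiliar with Mardia’s coefficient, the following Python sketch computes it from raw item scores. It is an illustration under stated assumptions, not the AMOS implementation: we assume the coefficient is expressed as excess multivariate kurtosis, \(b_{2,p} - p(p+2)\), and that the > 5.0 cut-off is applied to its normalised estimate (critical ratio), \(\big(b_{2,p} - p(p+2)\big)/\sqrt{8p(p+2)/n}\). The function name `mardia_kurtosis` and the simulated data are hypothetical.

```python
import numpy as np


def mardia_kurtosis(X):
    """Return (excess, critical_ratio) for Mardia's multivariate kurtosis.

    excess         = b2p - p(p + 2), with b2p the mean of squared D^2
    critical_ratio = excess / sqrt(8 p (p + 2) / n)
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centred = X - X.mean(axis=0)
    # Maximum-likelihood covariance (divisor n), as in Mardia's definition.
    S = centred.T @ centred / n
    d2 = np.einsum("ij,jk,ik->i", centred, np.linalg.inv(S), centred)
    b2p = np.mean(d2 ** 2)
    expected = p * (p + 2)  # value of b2p under multivariate normality
    excess = b2p - expected
    critical_ratio = excess / np.sqrt(8 * expected / n)
    return excess, critical_ratio


# Hypothetical 13-item data: normal vs. heavy-tailed (Student t, df = 3).
rng = np.random.default_rng(1)
_, cr_normal = mardia_kurtosis(rng.normal(size=(2000, 13)))
_, cr_heavy = mardia_kurtosis(rng.standard_t(df=3, size=(2000, 13)))
use_bootstrap = cr_heavy > 5.0  # the cut-off cited above
```

Under normality the critical ratio is approximately standard normal, so values well above 5 signal heavy tails; that is the situation in which a bootstrap correction such as Bollen–Stine is preferred over the plain maximum likelihood chi-square.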

Confirmatory factor analysis was conducted using AMOS Version 25 [41]. Kline’s criteria [37] were used to assess univariate normality (skew index ≤ 3.0; kurtosis index ≤ 10.0). The cut-off values adopted for good model fit were those proposed by Hair et al. [42] and Schumacker and Lomax [43] for samples with N > 250 and 12 to 30 observed variables: a normed chi-square (χ2/df) below 5 reflects good model fit; comparative fit index (CFI) and goodness-of-fit index (GFI) values close to 0.90 or 0.95 indicate good model fit; and standardized root mean square residual (SRMR) and root mean square error of approximation (RMSEA) values below 0.05 indicate good fit, whereas values between 0.05 and 0.08 with a CFI of 0.92 or higher indicate close model fit. Convergent validity was established by the average variance extracted (AVE ≥ 0.50; [42]), and internal consistency was assessed through Cronbach’s alpha (α ≥ 0.70) and composite reliability (CR ≥ 0.70) [44]. We also inspected the modification indices (MI) to identify potential model misspecifications, treating MI < 10 as non-threatening [39].
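The three reliability and validity criteria above can be computed as in the Python sketch below. It uses the standard textbook formulas (AVE as the mean squared standardised loading; CR as \((\sum\lambda)^2 / [(\sum\lambda)^2 + \sum(1-\lambda^2)]\); Cronbach’s alpha from item and total-score variances) and is not taken from the authors’ outputs. The loadings are hypothetical, and the CR formula assumes uncorrelated errors within a factor, which would not strictly hold for a factor with a freely estimated error covariance.

```python
import numpy as np


def average_variance_extracted(loadings):
    """AVE: mean squared standardised loading of a factor's items."""
    lam = np.asarray(loadings, dtype=float)
    return float(np.mean(lam ** 2))


def composite_reliability(loadings):
    """CR = (sum lambda)^2 / [(sum lambda)^2 + sum(1 - lambda^2)].

    Uses 1 - lambda^2 as each item's error variance (standardised
    solution) and assumes no correlated errors within the factor.
    """
    lam = np.asarray(loadings, dtype=float)
    num = lam.sum() ** 2
    return float(num / (num + np.sum(1 - lam ** 2)))


def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return float(k / (k - 1) * (1 - item_var / total_var))


# Hypothetical standardised loadings for a four-item factor.
loadings = [0.7, 0.7, 0.7, 0.7]
ave = average_variance_extracted(loadings)  # about 0.49
cr = composite_reliability(loadings)        # about 0.79
meets_criteria = ave >= 0.50 and cr >= 0.70
```

The example illustrates that a factor can satisfy CR ≥ 0.70 while its AVE falls just short of 0.50, because CR aggregates loadings across items whereas AVE averages the variance each item shares with the factor.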

To test the degree of measurement invariance of the scale by gender and country, we conducted the commonly used likelihood ratio test (the chi-square difference between two nested models). Because this test is sensitive to sample size [37], the change in CFI was used as the primary criterion of invariance (models were considered equivalent when CFI decreased by no more than 0.01) [29, 39, 45]. Further, we checked the changes in SRMR (ΔSRMR ≤ 0.030 for metric invariance; ≤ 0.010 for scalar or residual invariance) and RMSEA (ΔRMSEA ≤ 0.015 for metric, scalar and residual invariance), following Chen’s guidelines [46]. We employed a sequential model-testing approach in which increasingly constrained models were specified and compared to evaluate (1) configural invariance (Model 1, estimated freely across all groups simultaneously; i.e., whether items were associated with the same constructs across groups); (2) metric invariance (Model 2; i.e., equivalence of the item loadings on the factors); (3) scalar invariance (Model 3; i.e., equivalence of item intercepts for metric-invariant items); and (4) residual invariance (Model 4; i.e., equivalence of item residuals for metric- and scalar-invariant items) [47].
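The decision rule for each step of the sequence can be sketched in Python as below. This is a simplified reading of the criteria stated above, applying all three jointly, rather than a full implementation of Chen’s guidelines. Differences are taken as the more constrained model minus the less constrained one, so a negative ΔCFI is a loss of fit. The names `Fit` and `invariance_supported` are hypothetical; the example fit values are those reported in the Results for gender (scalar vs. metric) and for Portugal–Denmark (residual vs. scalar).

```python
from dataclasses import dataclass


@dataclass
class Fit:
    cfi: float
    srmr: float
    rmsea: float


# Cut-offs as stated in the text: maximum tolerated decrease in CFI and
# maximum tolerated increases in SRMR / RMSEA for each invariance step.
CFI_LIMIT = 0.010
SRMR_LIMIT = {"metric": 0.030, "scalar": 0.010, "residual": 0.010}
RMSEA_LIMIT = 0.015


def invariance_supported(less: Fit, more: Fit, step: str) -> bool:
    """Compare a constrained model (`more`) with the preceding, less
    constrained model (`less`) using the delta fit-index criteria."""
    d_cfi = more.cfi - less.cfi
    d_srmr = more.srmr - less.srmr
    d_rmsea = more.rmsea - less.rmsea
    return (d_cfi >= -CFI_LIMIT
            and d_srmr <= SRMR_LIMIT[step]
            and d_rmsea <= RMSEA_LIMIT)


# Gender: Model 3 (scalar) vs. Model 2 (metric).
gender_scalar_ok = invariance_supported(
    Fit(cfi=0.942, srmr=0.062, rmsea=0.053),
    Fit(cfi=0.942, srmr=0.062, rmsea=0.051), step="scalar")

# Portugal-Denmark: Model 4 (residual) vs. Model 3 (scalar).
pt_dk_residual_ok = invariance_supported(
    Fit(cfi=0.934, srmr=0.083, rmsea=0.056),
    Fit(cfi=0.889, srmr=0.079, rmsea=0.069), step="residual")
```

Expressing the criteria as one-sided limits makes explicit that only a deterioration in fit counts against invariance; an improvement (for example, a lower RMSEA after constraining parameters) never does.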

To examine external convergent validity, two other questionnaires were used to examine the correlations of goal content scores with measures of behavioural regulation: the Behaviour Regulations for Exercise Questionnaire 3 (BREQ3 [48, 49]) and the Regulations for Eating Behaviour Scale (REBS [50]). The BREQ3 was originally developed to assess six behavioural regulations in the exercise domain, as conceptualized in SDT. It contains 24 items rated on a 7-point response scale (strongly disagree to strongly agree) measuring amotivation, external regulation, introjected regulation, identified regulation, integrated regulation and intrinsic motivation. The same six behavioural regulations for eating behaviour were assessed using the 18-item REBS with the same 7-point response scale. The BREQ3 and REBS subscale scores were aggregated to form scores for the second-order latent factors of autonomous and controlled motivation for exercise and for eating behaviour, as proposed in SDT, and these scores were subsequently used in the correlational analysis with the GCWMS factors. All BREQ3 and REBS latent factors displayed good internal consistency (BREQ3: controlled motivation α = 0.761, autonomous motivation α = 0.948; REBS: controlled motivation α = 0.742, autonomous motivation α = 0.893). Based on the SDT literature, we anticipated that intrinsic goals (Challenge and Health Management) would be more strongly positively correlated with autonomous than with controlled motivation for healthy eating and exercise, and that extrinsic goals (Image and Social Recognition) would be more strongly positively correlated with controlled than with autonomous motivation.
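A simple observed-score approximation of this analysis is sketched in Python below: subscale scores are averaged into autonomous and controlled composites and then correlated with a goal factor using Spearman’s rho. This is an assumption-laden illustration, not the authors’ second-order latent model. We assume the conventional SDT grouping (identified, integrated and intrinsic as autonomous; external and introjected as controlled; amotivation excluded). The dictionary keys, the function name `motivation_composites` and all scores are hypothetical.

```python
import numpy as np
from scipy.stats import spearmanr

# Assumed SDT grouping of the six regulations; amotivation is not part
# of either composite. Keys are illustrative labels only.
AUTONOMOUS = ("identified", "integrated", "intrinsic")
CONTROLLED = ("external", "introjected")


def motivation_composites(subscales):
    """Mean subscale score for autonomous and controlled motivation.

    subscales: dict mapping regulation name -> 1-D array of scores.
    """
    autonomous = np.mean([subscales[k] for k in AUTONOMOUS], axis=0)
    controlled = np.mean([subscales[k] for k in CONTROLLED], axis=0)
    return autonomous, controlled


# Hypothetical scores for 200 participants: autonomous regulations are
# simulated to depend on Health Management goals, the others are not.
rng = np.random.default_rng(7)
n = 200
health_management = rng.normal(5.0, 1.0, n)
regulations = {name: 0.3 * health_management + rng.normal(3.5, 1.0, n)
               for name in AUTONOMOUS}
regulations.update({name: rng.normal(3.0, 1.0, n)
                    for name in CONTROLLED + ("amotivation",)})

autonomous, controlled = motivation_composites(regulations)
rho, p_value = spearmanr(health_management, autonomous)
```

Spearman’s rho is used here, as in the text, because it depends only on ranks and is therefore less affected by the non-normal score distributions noted earlier.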

Results

Confirmatory factor analysis

Following the recommended procedures for factorial validation [51], the first factor analysis was conducted on a smaller sub-sample randomly extracted from the NoHoW Trial database. To reduce sources of measurement error, we extracted participants from one country only. The Portuguese sample was chosen because it had the most balanced number of participants of both genders (N = 509; M age = 40 years, SD = 9.7 years; 44.8% women). Two samples were extracted: random sample 1 (N = 260; M age = 39.9 years, SD = 9.4 years; 44.2% women) and random sample 2 (N = 249; M age = 40.1 years, SD = 10.0 years; 45.4% women) (full details on demographics are available in supplementary file 1: https://osf.io/vfmgq/).
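The random split into a calibration and a validation sub-sample can be sketched in Python as below. The exact randomisation procedure used in the study is not reported, so this is only an illustration: the function name `split_sample`, the seed and the participant IDs are hypothetical, and `n_first = 260` is chosen merely to reproduce the sub-sample sizes reported above.

```python
import numpy as np


def split_sample(ids, n_first, seed=None):
    """Randomly split participant IDs into two disjoint sub-samples.

    The first sub-sample receives n_first IDs; the second receives the rest.
    """
    ids = np.asarray(ids)
    shuffled = np.random.default_rng(seed).permutation(ids)
    return shuffled[:n_first], shuffled[n_first:]


# Hypothetical IDs for the 509 Portuguese participants.
portuguese_ids = np.arange(509)
calibration, validation = split_sample(portuguese_ids, n_first=260, seed=2017)
```

Keeping the two sub-samples disjoint is what allows the second CFA to test the re-specified model on participants who played no part in suggesting the modifications.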

The original model (see supplementary file 3: https://osf.io/hs5aj/) was first tested in random sample 1 and showed poor fit to the data: χ2(98) = 455.310; p < 0.001; χ2/df = 4.646; CFI = 0.857; GFI = 0.822; SRMR = 0.117; RMSEA = 0.119 (LL = 0.108; UL = 0.130). Modification indices for the regression weights revealed that Item 10 (“To be slim so to look attractive to others”) cross-loaded on Social Recognition (Item10 ← Social R.; MI = 58.439), suggesting that allowing Item 10 to load also on the Social Recognition factor would improve model fit. Modification indices also showed evidence of misspecification associated with the error variances of Item 9 (“To improve my overall health”) and Item 1 (“To increase my resistance to illness and disease”), which could reflect some overlap in item content [39], as both belong to the Health Management factor (err1 ↔ err9; MI = 27.203).

For the second analysis, we added two parameters: (1) an error covariance between Item 9 and Item 1, and (2) a cross-loading path allowing Item 10 to load also on the Social Recognition factor. With these changes, the goodness-of-fit indices indicated better model fit (χ2[96] = 318.359; p < 0.001; χ2/df = 3.316; CFI = 0.911; GFI = 0.871; SRMR = 0.098; RMSEA = 0.095 [LL = 0.083; UL = 0.106]), although the modification indices suggested room for improvement. There was evidence of item content overlap, as suggested by the covariance of the error variance of Item 16 (“To measure myself against personal standards”), originally belonging to the intrinsic Challenge factor, with the error variances of the extrinsic Items 14 (err16 ↔ err14; MI = 22.733) and 15 (err16 ↔ err15; MI = 30.431). Item 16 also cross-loaded on two theoretically incongruent extrinsic factors (Item16 ← Social R.; MI = 22.446; Item16 ← Image; MI = 39.853) and had a low factor loading (λ = 0.47). Finally, Item 3 (“I manage my weight to be well thought of by others”) cross-loaded on the Image factor (Item3 ← Image; MI = 14.146).

A second iteration was specified by removing Item 16 and allowing Item 3 to load also on the Image factor. With these changes (see each updated model iteration in supplementary file 3: https://osf.io/hs5aj/), the model achieved acceptable fit, with no further justification for additional re-specification: χ2(81) = 194.619; p < 0.001; χ2/df = 2.403; CFI = 0.952; GFI = 0.917; SRMR = 0.070; RMSEA = 0.074 (LL = 0.060; UL = 0.087).

To confirm the model adjustments, a second, independent analysis was conducted with the remaining half of the sample from the selected country (random sample 2: N = 249; M age = 40.1 years; SD = 10.0 years; 45.4% women). The analysis in this sub-sample revealed marginal fit of the new hypothesized model (χ2(81) = 216.811; p < 0.001; χ2/df = 2.677; CFI = 0.937; GFI = 0.897; SRMR = 0.065; RMSEA = 0.082 [LL = 0.069; UL = 0.096]). Inspection of the modification indices indicated misspecification associated with the error variances of Items 3 and 10. Although both items are extrinsic, they were originally hypothesized to belong to different factors. Therefore, we decided to delete Items 3 and 10 because of recurrent cross-loading issues. The final model (see the updated model iteration in supplementary file 3: https://osf.io/hs5aj/) showed reasonable fit to the data (χ2(58) = 145.031; p < 0.001; χ2/df = 2.501; CFI = 0.949; GFI = 0.921; SRMR = 0.069; RMSEA = 0.078 [LL = 0.062; UL = 0.094]). Given the non-threatening modification indices, we saw no theoretical or statistical justification for further re-specification and considered this the most parsimonious model to represent the data.
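The degrees of freedom reported for each iteration follow from counting observed moments and free parameters, \(df = p(p+1)/2 - q\). The Python sketch below makes this bookkeeping explicit. It assumes four correlated factors identified by fixing one loading per factor to 1 (the count is the same if factor variances are fixed instead). The function name `cfa_degrees_of_freedom` is hypothetical, and the parameter counts are inferred from the re-specifications described above. With these assumptions the function returns 98, 96, 81 and 58, matching the reported values.

```python
def cfa_degrees_of_freedom(n_items, n_factors, n_cross_loadings=0,
                           n_error_covs=0):
    """df = p(p + 1)/2 - q for a CFA with correlated factors and one
    fixed (marker) loading per factor."""
    moments = n_items * (n_items + 1) // 2
    free = ((n_items - n_factors)               # free primary loadings
            + n_cross_loadings                  # additional loadings
            + n_items                           # error variances
            + n_factors                         # factor variances
            + n_factors * (n_factors - 1) // 2  # factor covariances
            + n_error_covs)                     # correlated errors
    return moments - free


iterations = {
    "original 16-item model": cfa_degrees_of_freedom(16, 4),
    "+ Item 10 cross-loading, + err1-err9": cfa_degrees_of_freedom(16, 4, 1, 1),
    "- Item 16, + Item 3 cross-loading": cfa_degrees_of_freedom(15, 4, 2, 1),
    "final 13-item model": cfa_degrees_of_freedom(13, 4, 0, 1),
}
```

Each added path costs one degree of freedom, whereas deleting an item removes p moments but only two parameters (its loading and error variance), which is why df falls from 96 to 81 to 58 even as the model becomes simpler.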

For the final step of factorial validity inspection (Table 1), we conducted a CFA on the final model (Fig. 1) with the full sample of 1511 participants.

Table 1 Goodness-of-fit indices of the CFA model iteration process across samples
Fig. 1 GCWMS refined model: confirmatory factor analysis. Social R., social recognition factor; Health M., health management factor

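The final factorial structure showed acceptable fit to the data in the full sample: χ2(58) = 599.982; p < 0.001; χ2/df = 10.345; CFI = 0.940; GFI = 0.941; SRMR = 0.063; RMSEA = 0.079 (LL = 0.073; UL = 0.084). The χ2/df value exceeded the cut-off of 5, which is consistent with the sensitivity of the chi-square statistic to large samples [37], whereas the remaining indices were within the adopted criteria. All items showed meaningful factor loadings (Table 2; λ ≥ 0.5).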
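The normed chi-square and the RMSEA point estimate can both be recomputed from χ2, df and N, which is a quick consistency check on reported fit statistics. The Python sketch below uses the common formula \(\mathrm{RMSEA} = \sqrt{\max(\chi^2 - df, 0) / [df\,(N-1)]}\). Some programs use N rather than N − 1, which makes no difference at this sample size. The function names are hypothetical, and the confidence limits (LL, UL) are not reproduced here.

```python
import math


def normed_chi_square(chi2, df):
    """chi2 / df, compared with the cut-off of 5 adopted in the Methods."""
    return chi2 / df


def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Full-sample CFA of the final model.
print(round(normed_chi_square(599.982, 58), 3), round(rmsea(599.982, 58, 1511), 3))
# prints: 10.345 0.079

# Original 16-item model in random sample 1 (N = 260).
print(round(rmsea(455.310, 98, 260), 3))
# prints: 0.119
```

Because the RMSEA divides the excess chi-square by both df and N, it is far less inflated by large samples than χ2/df, which explains why the full-sample model can exceed the χ2/df cut-off while still showing acceptable RMSEA.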

Table 2 Confirmatory factor analysis factor loadings

Reliability and validity

Composite reliability and convergent validity indices are presented in Table 3. Convergent validity was borderline for the Health Management factor, whose average variance extracted was slightly below the recommended level of 0.50 [36]. The intrinsic goals (Health Management and Challenge) showed a significant, moderate positive inter-correlation (rho = 0.37; p < 0.001), as did the two extrinsic goals (Image and Social Recognition; rho = 0.47; p < 0.001). All other correlations were weak or null, in accordance with theoretical predictions.

Table 3 Cronbach’s alpha (diagonal), Spearman’s correlations (below diagonal) and factorial validity outcomes

External convergent validity was examined through the associations between each goal and conceptually related constructs from other motivation-based scales (Table 4). In line with theoretical assumptions, the Health Management factor showed weak to moderate positive correlations with autonomous motivation (BREQ rho = 0.29; p < 0.001; REBS rho = 0.37; p < 0.001), and the two extrinsic goal factors were weakly to moderately positively associated with controlled motivation as assessed by the BREQ (Image rho = 0.22; p < 0.001; Social Recognition rho = 0.41; p < 0.001) and the REBS (Image rho = 0.27; p < 0.001; Social Recognition rho = 0.41; p < 0.001). The Challenge factor showed weak correlations with both autonomous and controlled motivation, although its correlations with controlled motivation were higher than expected (for full information, please refer to supplementary file 4: https://osf.io/t5pkb/).

Table 4 Correlations between weight maintenance goal content and autonomous and controlled motivation

Multi-group invariance—gender

The results of the measurement invariance analysis are presented in Table 5. The overall good fit of the multi-group GCWMS model (χ2/df = 5.363; CFI = 0.944; GFI = 0.940; SRMR = 0.063; RMSEA = 0.054 [LL = 0.050; UL = 0.058]) supports configural invariance across participant gender (i.e., the same number of factors is present in each group, and the factors are explained by the same set of items). For metric invariance, the goodness-of-fit results also indicated a well-fitting model with factor loadings constrained to be invariant (χ2/df = 5.184; CFI = 0.942; SRMR = 0.062; RMSEA = 0.053). The chi-square difference was significant, but the changes in fit indices were all below the suggested cut-off values (ΔCFI = − 0.002; ΔSRMR = − 0.001; ΔRMSEA = − 0.001), supporting the conclusion that the factor loadings operated similarly for men and women.

Table 5 Scale invariance analysis showing fit statistics for the unconstrained model versus the constrained models [country and gender]

Next, we tested scalar invariance with item intercepts and factor loadings constrained to be equal across groups (Model 3). This model showed good fit to the data (χ2/df = 4.914; CFI = 0.942; SRMR = 0.062; RMSEA = 0.051). The chi-square difference test between models was significant, but the differences in fit indices were within the suggested cut-offs (ΔCFI = 0.000; ΔSRMR = 0.000; ΔRMSEA = − 0.002). Based on the delta fit-index criteria, these results support scalar invariance of the GCWMS across gender, indicating no evidence of differential scoring of each factor’s items between men and women.

The final step was to test residual invariance by constraining the item residuals to be equal across groups. The overall fit indices suggested acceptable fit to the data (χ2/df = 4.985; CFI = 0.935; SRMR = 0.062; RMSEA = 0.051), and the differences from the scalar invariance model (Model 3) were all below the proposed cut-offs for residual invariance (ΔCFI = − 0.007; ΔSRMR = 0.000; ΔRMSEA = 0.000). In line with the proposed procedures and guidelines [32, 45, 47, 52], these results provide statistical support for full factorial invariance of the final GCWMS model across gender.

Multi-group invariance—country

The fit indices of the individual country models and the cross-cultural measurement invariance models are also compared in Table 5. Although the single-country models for Denmark and the UK showed only marginal fit, the configural invariance models (Model 1) showed good fit to the data for all pairwise country comparisons (Portugal–UK: χ2/df = 4.583; CFI = 0.937; SRMR = 0.067; RMSEA = 0.059; Portugal–Denmark: χ2/df = 4.272; CFI = 0.941; SRMR = 0.075; RMSEA = 0.057; UK–Denmark: χ2/df = 4.745; CFI = 0.926; SRMR = 0.075; RMSEA = 0.061), supporting the same organization of the constructs in all three countries. The metric invariance models showed acceptable fit according to the goodness-of-fit criteria (Portugal–UK: χ2/df = 4.559; CFI = 0.933; SRMR = 0.071; RMSEA = 0.059; Portugal–Denmark: χ2/df = 4.282; CFI = 0.936; SRMR = 0.078; RMSEA = 0.057; UK–Denmark: χ2/df = 4.614; CFI = 0.923; SRMR = 0.078; RMSEA = 0.060). Although all chi-square difference tests were significant, the changes in CFI, SRMR and RMSEA were below the adopted cut-off values (Portugal–UK: ΔCFI = − 0.004; ΔSRMR = 0.004; ΔRMSEA = 0.000; Portugal–Denmark: ΔCFI = − 0.005; ΔSRMR = 0.003; ΔRMSEA = 0.000; UK–Denmark: ΔCFI = − 0.003; ΔSRMR = 0.003; ΔRMSEA = − 0.001), and, following Cheung and Rensvold’s guidelines [45, 52], these results support the assumption that the GCWMS factors have the same meaning across the evaluated countries.

After constraining item intercepts and factor loadings simultaneously to test scalar invariance, the multi-group models showed marginally acceptable fit to the data for all pairwise country comparisons (Portugal–UK: χ2/df = 4.380; CFI = 0.931; SRMR = 0.076; RMSEA = 0.057; Portugal–Denmark: χ2/df = 4.157; CFI = 0.934; SRMR = 0.083; RMSEA = 0.056; UK–Denmark: χ2/df = 4.522; CFI = 0.919; SRMR = 0.087; RMSEA = 0.059). The chi-square difference tests were all significant, but the differences in fit indices between Model 3 (scalar invariance) and Model 2 (metric invariance) were all below the suggested cut-off values (Portugal–UK: ΔCFI = − 0.002; ΔSRMR = 0.005; ΔRMSEA = − 0.002; Portugal–Denmark: ΔCFI = − 0.002; ΔSRMR = 0.005; ΔRMSEA = − 0.001; UK–Denmark: ΔCFI = − 0.006; ΔSRMR = 0.001; ΔRMSEA = 0.000). Following recommended procedures and guidelines [32, 45, 47, 52], we conclude that constraining the item intercepts across groups did not meaningfully worsen model fit, which supports scalar invariance (i.e., mean differences in the hypothesized constructs account for all mean differences in the shared variance of the items).

Lastly, residual invariance was tested (Model 4), and fit to the data was poorer for all pairwise country comparisons (Portugal–UK: χ2/df = 5.286; CFI = 0.903; SRMR = 0.081; RMSEA = 0.065; Portugal–Denmark: χ2/df = 5.781; CFI = 0.889; SRMR = 0.079; RMSEA = 0.069; UK–Denmark: χ2/df = 4.432; CFI = 0.913; SRMR = 0.088; RMSEA = 0.059). The chi-square difference tests were significant, and the decrease in CFI relative to the previous model (Model 3) exceeded the proposed cut-off for the Portugal–UK (ΔCFI = − 0.028) and Portugal–Denmark (ΔCFI = − 0.045) comparisons. Therefore, residual invariance of the GCWMS across countries was not supported.

Discussion

This study reported the development and factorial validation of the Goal Content for Weight Maintenance Scale. After re-specifying the model in line with the rationale underlying the scale’s development and following the recommendations of Byrne [39] and Brown [51], we went further and tested the validity of the measurement model across independent samples and groups. Using this systematic, theory-informed methodology, we observed acceptable fit of the factorial model to the data and good internal consistency of all subscales. According to Self-Determination Theory, goal content is a core feature of sustained health behaviour and of motivation for weight loss maintenance [26]. Examining whether people’s goals for sustained weight loss maintenance are extrinsic or intrinsic may help practitioners and researchers understand the quality, sustainability and likely effectiveness of their motivation. However, advances in this field have been hampered by the lack of validated psychometric instruments to measure the content of people’s weight loss maintenance goals.

Sixteen items derived from measures of goal content and motives for exercise were specified and reviewed by an expert panel to assess four goals: Health Management, Challenge, Image and Social Recognition. The initial CFA suggested theoretically consistent and pragmatic scale modifications. Modification indices indicated that item 10 of the Image factor (“To be slim so to look attractive to others”) cross-loaded on Social Recognition, which is plausible given the item’s reference to other people’s perceptions. The error variances of item 1 (“To increase my resistance to illness and disease”) and item 9 (“To improve my overall health”) covaried, indicating content overlap.

In subsequent analyses, modification indices associated with item 16 (“To measure myself against personal standards”) provided evidence of cross-loadings on the extrinsic factors (Image and Social Recognition) that were incongruent with the SDT framework. The implicitly competitive content of this item may be misleading and may prompt social comparison of one’s abilities (an extrinsic orientation). Because of this incongruence, we deleted item 16. Additionally, item 3 (“To be well thought of by others”) cross-loaded on the Image factor and showed a misspecified error covariance with item 10. Respondents may have read this item as referring to social image rather than to social identification. We recommend that items 1, 3, 9, 10 and 16 be refined further by a panel of specialists, supported by face validity analysis among people trying to maintain their weight loss. Furthermore, we suggest investigating the validity and reliability of a shorter version of the GCWMS with 12 or even 8 items, which would suit the increasingly used digital platforms that allow faster and simpler assessments (e.g., apps or web-based platforms instead of paper and pencil).

In line with previous research using the GCEQ (the scale on which the GCWMS was based), our analysis confirmed that the associations between goal content subscales were aligned with the proposed motivational model; that is, extrinsic goals were inter-correlated, as were intrinsic goals. Also as expected, extrinsic goals (Image and Social Recognition) were positively correlated with controlled motivation for exercise. The Health Management goal, a hypothesised intrinsic goal, correlated positively with autonomous motivation. In contrast, challenge-oriented goals correlated positively with both autonomous and controlled motivation. One reason for this might be that some items in this subscale (e.g., "To give me goals to work towards"; "To give me personal challenges to face") can be interpreted either as personal skill development or as an outcome-oriented pursuit. It would be of interest to verify this correlation pattern in future studies. Future research could also refine this factor's items to capture a sense of self-development by focusing on the process rather than on a goal-oriented mind-set, e.g., "To feel competent and in control"; "To overcome my current difficulties".

Measurement invariance tests confirmed full factorial invariance of the GCWMS across gender. Across countries, residual invariance was not achieved. Although residual invariance is required for full factorial invariance, it is not a prerequisite for testing mean differences [53]. Based on these results, scale scores can therefore be compared across gender and across the three countries analysed (Portugal, UK, and Denmark), as the underlying latent variables appear to be interpreted in the same way across these groups. We further demonstrated that the factor structure of the GCWMS is consistent across the studied groups, so scoring of the scale's items can be assumed to be unbiased across them. We are confident that further refinements may improve the scale's performance and achieve full factorial invariance across countries as well.
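The nested logic of invariance testing (configural, then metric, then scalar, then residual constraints) can be sketched as follows. The sketch assumes a commonly used decision rule in which a more constrained model is retained when its CFI drops by no more than .01 relative to the preceding model; this section does not state which criterion the study applied. The function names and the CFI values in the example are hypothetical and are not results from this study.

```python
from typing import Dict, List

# Nested invariance models, from least to most constrained.
STEPS: List[str] = ["configural", "metric", "scalar", "residual"]


def highest_supported_level(cfi: Dict[str, float], max_drop: float = 0.01) -> str:
    """Return the most constrained level whose CFI drop from the previous
    level stays within max_drop (assumed Delta-CFI <= .01 rule)."""
    supported = STEPS[0]  # the configural model is the baseline
    for previous, current in zip(STEPS, STEPS[1:]):
        if cfi[previous] - cfi[current] > max_drop:
            break  # stop at the first constraint that worsens fit too much
        supported = current
    return supported


def latent_means_comparable(level: str) -> bool:
    """Mean comparisons require at least scalar invariance (equal loadings
    and intercepts); residual invariance is not needed."""
    return STEPS.index(level) >= STEPS.index("scalar")


# Hypothetical CFI values for a cross-country comparison (not study results).
example_cfi = {"configural": 0.950, "metric": 0.948, "scalar": 0.942, "residual": 0.925}
level = highest_supported_level(example_cfi)
print(level, latent_means_comparable(level))  # prints: scalar True
```

In this hypothetical case, residual invariance fails while scalar invariance holds, which mirrors the situation described above: means can still be compared even though full factorial invariance is not reached.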

The initially poor fit indices of the GCWMS may be explained by its derivation from two other scales that were originally designed to assess goal content for exercise. Future re-specification of items to address weight loss maintenance-related goals might improve the scale (possibly informed by qualitative research on people's experiences). However, identifying latent constructs that focus on more distal outcomes (weight) rather than on tangible behaviours (such as exercising) could be challenging.

Strengths and limitations

The findings are supported by advanced multivariate analysis, which allowed the model to be tested in a large sample of individuals engaged in a weight loss maintenance intervention. The scale was also carefully developed from previous original work [29, 38]. The use of a large data set from three European countries allowed testing of measurement invariance across gender and culture. The data were collected as part of the baseline assessment of a large controlled trial, providing confidence in the quality of the recruitment and data collection procedures.

Although nesting this study within a controlled trial had advantages, it may also limit the generalisability of the scale to the general population, as participants in the NoHoW trial had successfully lost ≥ 5% of their body weight in the previous 12 months and may be more motivated in their weight management efforts than the general population. Additionally, the absence of residual invariance across countries may result from measurement error related to the complex process of translating the scale into different languages and cultural contexts [32].

Future studies should refine the GCWMS items that were removed due to cross-loading, as well as items whose error variances were found to covary, suggesting content overlap. New, more specific weight management items could also be formulated to improve the scale's performance. A qualitative approach using interviews may be particularly useful for item refinement.

What is already known on this subject?

The literature that offers a motivational perspective on the regulation of eating behaviour is limited and does not pay adequate attention to the motivational dynamics involved in weight maintenance behaviours. To the best of our knowledge, no scales are available to assess the nature of people's goals for weight loss maintenance.

What does this study add?

This study provides initial evidence for the validity and reliability of scores derived from the Goal Content for Weight Maintenance Scale and confirms measurement invariance across gender and three European countries. The findings suggest that the GCWMS can accurately measure the content of weight loss maintenance goals, supporting its usefulness for advancing the measurement of theory-based aspects of motivation for weight loss maintenance. This will enable a better understanding of what contributes, in motivational terms, to weight loss maintenance and why.