Introduction

In Western societies, beauty standards promote a thin-ideal body. However, because of its extreme thinness, this ideal body is impossible to achieve for most women (Brownell 1991). Women nonetheless perceive it as allowing them to have a more successful life (Engeln-Maddox 2006). Body dissatisfaction (BD) may therefore appear in women who have internalized the thin-ideal but struggled to achieve it (Stice and Shaw 2002). BD is defined as the discrepancy between the ideal body a woman wants to achieve and the perception she has of her body (perceived body; Furnham and Greaves 1994). However, it is more than the simple discrepancy between these two bodies, given that perceived and ideal bodies have both been shown to explain significant proportions of BD variance (Williamson et al. 1993). Regarding ideal bodies, eating disorder (ED) symptoms were shown to be related to lower ideals and, ultimately, greater BD by the increase in the perceived/ideal discrepancy (MacNeill and Best 2015). Concerning the accuracy of body size estimation (BSE), distortions in own body perception (e.g., overestimation of one’s body size) were highlighted among patients with anorexia nervosa and in nonclinical samples (Cornelissen et al. 2013; Gardner and Brown 2014) and might influence the degree of BD experienced: the greater the distortions, the higher the BD (Williamson et al. 1993). Given that BD plays a role in various problems (e.g., low self-esteem, sexual difficulties; O’Dea 2012; Wiederman 2012) and is a common feature of ED (anorexia nervosa and bulimia nervosa; Fairburn 2008), reliable assessment of the processes underlying it—namely the perceived body, the ideal body, and their discrepancy—is necessary.

Nevertheless, the assessment of distortions in own body perception is complex, given that distortions are conceptualized as cognitive biases (Smeets and Panhuysen 1995). In this view, top-down/cognitive mechanisms influence perception and therefore require specific evaluation conditions to allow their activation. More specifically, BSE might involve the activation of appearance self-schemas (cognitive structures that overvalue information about body shape/weight) that would influence the accuracy of the outcome when individuals cannot rely on physical information (i.e., in ambiguous situations; Williamson et al. 1999). For instance, Docteur et al. (2012) showed that an overestimation bias appeared among women from the community in a recall condition (i.e., representation, from memory, of one’s own body size). Conversely, when participants had access to visual information (i.e., when they were able to estimate their body size with the aid of a mirror), they made few errors of estimation. Therefore, the recall condition, which can be considered an ambiguous situation, potentially appealed to their knowledge and beliefs about their own body size (stored in appearance self-schemas), altering the accuracy of BSE. Ambiguous situations are therefore theoretically (and empirically) required for a valid assessment of estimation biases.

So far, BSE (perceived body) and ideal body perception have been assessed with a wide range of techniques (for a review, see Gardner 2012), the most widely used being photograph distortion and figure rating scales (FRS). In photograph distortion techniques, participants use a computerized program to adjust a modified photograph of themselves until it fits their perceived or ideal body. FRS consist of a continuum of drawings or photographs ranked from thinnest to most obese where participants have to select the figure that best fits their perceived and/or ideal body. These two methods can assess own and ideal body perceptions and their discrepancy. Moreover, since participants do not have access to visual information about their body during task completion, these two methods can be considered ambiguous; they potentially activate participants’ appearance self-schemas, requiring them to invoke their own body representations. Nevertheless, unlike photograph distortion techniques, FRS have the advantage of being easier to implement and quicker to complete (Gardner and Brown 2010). Thus, they are less time-consuming and more convenient for clinical practice.

A large number of FRS have therefore been developed (for a review, see Gardner and Brown 2010), but all current FRS have limitations. First, the majority consist of a small number of figures (nine or even fewer; Gardner and Brown 2010), which often exhibit clear differences in terms of body mass index (BMI) in order to represent a continuum of body shapes, resulting in easy selection of a figure as the perceived or ideal body. These continua are therefore not ambiguous, allowing the possibility of choosing a figure by simple deduction without referring to own body representations: the participant can easily remove the figures considered to be distant from her own body, choosing from the few remaining figures. Thus, activation of appearance self-schemas during task completion seems implausible. Second, concerning the figures’ characteristics, facial features are shown in some figures, which can draw attention away from the body (Gardner et al. 2009). Moreover, schematic/drawn figures contain few details and are thus less realistic than photographs. They also convey the artist’s representation of body shapes, leading to potential inaccuracy or errors compared with real body dimensions and BMIs (Gardner 2012). FRS containing such figures therefore do not include information concerning their specific BMIs. Thus, although these FRS can estimate perceived and ideal bodies and their discrepancy, they do not allow assessment of the accuracy of BSE (Gardner and Brown 2010). Third, the use of photographs might reduce participants’ abilities to project themselves in the photographed bodies because it could be more difficult to project oneself into the body of someone else than into an unreal body (belonging to nobody), as in schematic/drawn figures. Finally, the psychometric properties (e.g., reliability, validity) of some FRS are unavailable (Gardner 2012) and the test–retest reliability of the FRS whose psychometric properties are known might have been overvalued. Indeed, the small number of figures presented probably led to a higher probability of choosing the same figure at the retest by remembering the figure chosen or performing the same deduction as in the first test.

Overview of the Present Study

To overcome these limitations, we developed a new FRS consisting of a broad continuum of 27 realistic, validated computer-generated women’s bodies selected from the database proposed by Moussally et al. (2016). This database contains women’s bodies that were gradually manipulated from extreme thinness to extreme fatness. Moussally et al. (2016) obtained valence and thinness/fatness ratings of bodies and proposed different subcategories of bodies from these ratings (e.g., fat/negative, thin/positive, or thin/negative bodies). These authors also showed that the thin bodies were more positively judged by women with high levels of BD or thin-ideal internalization. More important, the thinness/fatness ratings were not influenced by these variables. This is central in the field of FRS, given that, to obtain a valid assessment, participants have to be able to estimate the size/BMI of the body chosen.

Bodies from this database are presented without their heads, and BMI information for each body is provided to the researchers, allowing our FRS to measure the accuracy of BSE. Furthermore, the computer-generated bodies are standard bodies that do not belong to a real person, meaning that each participant has to make the effort to project herself into the bodies. Finally, given the large number of bodies and the fine distinctions between them, the selection of a body in our FRS is not as easy as in other FRS. Indeed, to make their choice, women have to consider the different bodies and refer to the image they have of their own body, which might activate their appearance self-schemas and the potential cognitive distortions related to them. By increasing the task’s difficulty, we thus aim to provide a valid measure of own and ideal body perception.

The psychometric properties of our FRS were examined in a large sample of women from the community. Specifically, we analyzed the task’s content, construct and convergent validities, and sensitivity. Test–retest reliability was tested with a short-term longitudinal design at 14 days. We also obtained preliminary data concerning the scale’s discriminant power in a sample of women with ED (compared with a control sample) to highlight the utility of this new FRS in clinical samples.

Methods

Participants

The community sample included 113 women volunteers recruited in Geneva (Switzerland) with advertisements displayed around the university campus and posted on social networks. Inclusion criteria were defined as follows: age 17–35 years, native or fluent French speakers, having grown up in a Western country, normal or corrected-to-normal vision. The presence of a current self-reported ED was an exclusion criterion. The patient sample included 20 women with a primary diagnosis of ED (10 patients with anorexia nervosa, restricting type [AN-R], 10 patients with an ED not otherwise specified [EDNOS]) recruited in the inpatient units of two psychiatric clinics (Brussels/Braine-l’Alleud, Belgium). The diagnoses, based on DSM-IV-TR criteria (APA 2000), were obtained from the patients’ medical records. Inclusion criteria were the same as for the community sample. Finally, 20 age-matched women (Mann–Whitney U test; z = 0.30, p = 0.766; see Footnote 1) served as a control sample. They were randomly selected from the community sample database on the basis of the following criteria: similar age, normal weight (BMI 18.50–24.99 kg/m2), no self-reported past or current psychiatric or neurological disorders. We also ensured that control and patient groups differed only on BMI (Mann–Whitney U test; z = 4.69, p < 0.001, abs(r) = 0.742), with ED patients presenting a lower mean BMI (see Table 1). The two samples did not differ on education level [χ2(3) = 2.12, p = 0.549], percentage of individuals whose mother tongue was not French [χ2(1) = 2.06, p = 0.151], or percentage of individuals who had corrected vision problems [χ2(1) = 0.96, p = 0.327]. The characteristics of the three samples are described in Table 1.

Table 1 Characteristics of sample groups

Materials

Personal Information Sheet

Participants answered questions about their personal data (age, mother tongue, education level, vision problems, psychiatric or neurological disorders). Concerning the patient sample, their primary diagnosis was obtained from their medical records.

The Computer-Generated Figure Rating Scale

The Computer-Generated Figure Rating Scale (CGFRS) is composed of 27 computer-generated women’s bodies (without heads) dressed in white panties and undershirts. The bodies were selected from Moussally et al.’s (2016) database and represent a continuum of body shapes from extreme underweight to morbid obesity (estimated BMIs range from 12.69 to 69.56 kg/m2). The 27 bodies are distributed into the BMI classes proposed by the World Health Organization (1995, 2000, 2004) as follows: (1) moderate to severe underweight: eight bodies with a BMI <17.00 kg/m2; (2) mild underweight: three bodies with a BMI between 17.00 and 18.49 kg/m2; (3) normal range: five bodies with a BMI between 18.50 and 24.99 kg/m2; (4) overweight: three bodies with a BMI between 25.00 and 29.99 kg/m2; and (5) obesity: eight bodies with a BMI > 30.00 kg/m2. Figure 1 illustrates examples of bodies.
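The class boundaries listed above can be expressed as a small lookup. The following is a minimal sketch, not part of the CGFRS materials: the function name `who_bmi_class` is hypothetical, BMIs are assumed to be rounded to two decimals, and a BMI of exactly 30.00 is assumed to count as obesity (the text only specifies "> 30.00").

```python
def who_bmi_class(bmi: float) -> str:
    """Map a BMI (kg/m2) to one of the five classes used for the CGFRS bodies.

    Assumption: a BMI of exactly 30.00 is treated as obesity.
    """
    if bmi < 17.00:
        return "moderate to severe underweight"
    if bmi < 18.50:
        return "mild underweight"
    if bmi < 25.00:
        return "normal range"
    if bmi < 30.00:
        return "overweight"
    return "obesity"


# Number of bodies per class as stated in the text (8 + 3 + 5 + 3 + 8 = 27).
BODIES_PER_CLASS = {
    "moderate to severe underweight": 8,
    "mild underweight": 3,
    "normal range": 5,
    "overweight": 3,
    "obesity": 8,
}

# The two extremes of the continuum reported in the text.
thinnest_class = who_bmi_class(12.69)
largest_class = who_bmi_class(69.56)
```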

Fig. 1 Representative examples (8 of 27) for each body mass index class of computer-generated women’s bodies used in the “own body” and “ideal body” versions of the Computer-Generated Figure Rating Scale (CGFRS)

The CGFRS comes in two different versions to measure the participants’ perceptions of their own (CGFRS–OB) and ideal (CGFRS–IB) bodies; both use the same continuum of 27 bodies. In the CGFRS–OB, participants have to indicate which body on the continuum best fits their own body. The raw score obtained corresponds to the subtraction of the participant’s BMI from the chosen body’s BMI (perceived BMI). A positive score indicates that the participant perceives herself as bigger than she is; a negative score indicates that she perceives herself as thinner. The absolute value of this raw score can be computed to obtain the distance (without taking into account the direction of the divergence) between the two BMIs, corresponding to the participant’s perception/estimation bias for her own body.

In the CGFRS–IB, participants have to indicate which body on the continuum is the closest to their ideal body. The raw score obtained corresponds to the subtraction of the ideal BMI (chosen body’s BMI) from the perceived own BMI (obtained with the CGFRS–OB). A positive score indicates that the participant would like to be thinner than she perceives herself; a negative score indicates that she would like to be bigger. Again, the absolute value of the raw score allows one to obtain the absolute distance between the two BMIs, corresponding here to the participant’s body satisfaction/dissatisfaction (the greater the discrepancy between the ideal and the perceived BMI, the more dissatisfied she is).

In sum, each CGFRS version results in three scores: (1) the BMI of the chosen body (perceived or ideal BMI); (2) the raw score assessing the distance between two BMIs: namely, the participant’s BMI and the perceived BMI (OB score) or the perceived and ideal BMIs (IB score); and (3) the absolute distance corresponding to the perception/estimation bias or the body satisfaction/dissatisfaction score. The CGFRS can be downloaded from the web (http://www.unige.ch/fapse/PSY/groups/upnc/CGFRS.zip).
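The scoring rules described above reduce to two subtractions and two absolute values. The sketch below illustrates them under stated assumptions: the names `CgfrsScores` and `score_cgfrs` are hypothetical, and the example BMIs are invented for illustration rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass
class CgfrsScores:
    perceived_bmi: float    # BMI of the body chosen in the CGFRS-OB
    ideal_bmi: float        # BMI of the body chosen in the CGFRS-IB
    ob_score: float         # perceived BMI minus actual BMI
    ib_score: float         # perceived BMI minus ideal BMI
    estimation_bias: float  # absolute value of the OB score
    dissatisfaction: float  # absolute value of the IB score


def score_cgfrs(actual_bmi: float, perceived_bmi: float, ideal_bmi: float) -> CgfrsScores:
    """Compute the raw and absolute CGFRS scores for one participant."""
    ob = perceived_bmi - actual_bmi  # > 0: perceives herself as bigger than she is
    ib = perceived_bmi - ideal_bmi   # > 0: would like to be thinner than perceived
    return CgfrsScores(perceived_bmi, ideal_bmi, ob, ib, abs(ob), abs(ib))


# Invented example: actual BMI 21.0, chosen "own" body 23.5, chosen "ideal" body 19.0.
example = score_cgfrs(actual_bmi=21.0, perceived_bmi=23.5, ideal_bmi=19.0)
```

In this invented case the OB score is 2.5 kg/m2 (overestimation) and the IB score is 4.5 kg/m2 (wish to be thinner).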

Stimulus Evaluation Tasks

The participants evaluated the 27 bodies in two computerized tasks. First, they evaluated the bodies’ plausibility (plausibility task). Each picture appeared on the screen accompanied by the following instructions: “Indicate how likely it is that a woman with such a body exists.” The picture remained on the screen until the participant evaluated it with a rating scale ranging from 1 (not at all) to 9 (absolutely). The bodies appeared one after the other in random order.

In the second task (classification task), the 27 bodies appeared on the screen again, one after another, and the participants had to assign each body to one of the following BMI categories (“Indicate which category you think that this body belongs to.”): (1) emaciation/undernourishment (see Footnote 2), (2) underweight (see Footnote 2), (3) normal weight, (4) overweight, (5) obesity. A sixth possibility allowed participants to specify that they were not sure of the category. The pictures also appeared in random order and remained on the screen until the participant responded.

Questionnaires

BD was also assessed with the Body Shape Questionnaire (BSQ; Cooper et al. 1987; French validation: Rousseau et al. 2005). This 34-item questionnaire evaluates concerns and affects toward one’s own body shape/weight. Answers are given on rating scales ranging from 1 (never) to 6 (always). Its internal consistency in the present samples was excellent (Cronbach’s α = 0.96–0.97).

The participants also completed the Eating Disorder Inventory (EDI; Garner et al. 1983; French validation: Criquillion-Doublet et al. 1995), which assesses cognitive and behavioral features of ED multidimensionally. In this study, we focused on two EDI subscales: BD (nine items) and drive for thinness (DT; seven items). The DT subscale assesses the pursuit of thinness, which includes excessive concerns about weight (wish to lose weight and fear of gaining weight) and dieting. The items are scored on a rating scale ranging from 1 (always) to 6 (never). The internal consistency of these subscales was likewise excellent (Cronbach’s α = 0.89–0.90).

Anthropometric Data

Participants were weighed and measured. Their height (in meters) and weight (in kilograms) were recorded to two decimal places and used to calculate their BMI. Participants also stated their ideal weight on a separate sheet; this value was used to obtain their self-reported ideal BMI.
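Both BMIs follow from the standard formula, weight divided by squared height. A minimal sketch follows; the function name `bmi` and the example values are illustrative assumptions, and the self-reported ideal BMI is assumed to combine the stated ideal weight with the measured height.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m2: weight divided by squared height."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Invented participant: measured 60.00 kg and 1.65 m, stated ideal weight 55.00 kg.
actual_bmi = bmi(60.00, 1.65)
self_reported_ideal_bmi = bmi(55.00, 1.65)  # ideal weight with the measured height
```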

Procedure

After giving their informed consent, participants took part in a first session. They completed the personal information sheet and the CGFRS and then evaluated the 27 bodies by performing the plausibility and classification tasks. Finally, they completed the questionnaires and their anthropometric data were determined.

A follow-up was organized after exactly 14 days. During this second session, participants completed both versions of the CGFRS only. All participants performed the same procedure. The sessions were organized separately for each participant, meaning that the date of a participant’s first session defined the date of her second session. The Ethics Committees of the Faculty of Psychology and Educational Sciences at the University of Geneva and the Saint-Luc Academic Hospital (Belgium) approved the study protocol.

Objectives and Statistical Analyses

The psychometric properties of the CGFRS were examined in the community sample. The task’s sensitivity was investigated with the descriptive statistics of the CGFRS raw scores (scores without absolute values, namely OB and IB scores) by ensuring that they showed variability among participants, without ceiling or floor effects.

Content validity was assessed with (1) the plausibility evaluation of the bodies and (2) the percentage of correct classifications of the bodies into the BMI categories to which they belong (emaciation/undernourishment, underweight, normal weight, overweight, obesity). Concerning the plausibility ratings, we wanted to ensure that the bodies were, on average, considered realistic, although we expected that extremely thin or obese bodies might be evaluated as less plausible than bodies in the center of the CGFRS continuum, given that they are more extreme. Concerning classification performance, errors made in classifying bodies were examined in parallel with the percentages of correct classifications to ensure that, when participants chose a body in the CGFRS, they were able to approximately identify the BMI category underlying it. A weighted kappa coefficient was also computed to examine the classification agreement beyond chance.
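For readers unfamiliar with the statistic, a weighted kappa can be sketched as follows. This is an illustration only: the weighting scheme is not stated in this section, so linear disagreement weights are an assumption, as is the exclusion of "not sure" responses before the computation. The function name `weighted_kappa` is hypothetical; categories are coded 0 to 4 in order of increasing BMI.

```python
def weighted_kappa(true_cats, rated_cats, n_categories):
    """Cohen's weighted kappa with linear disagreement weights |i - j|.

    true_cats:  category each body actually belongs to (integers 0..n_categories-1)
    rated_cats: category assigned by the participant for the same trials
    """
    if len(true_cats) != len(rated_cats) or not true_cats:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true_cats)
    k = n_categories

    # Observed joint proportions of (true category, rated category).
    observed = [[0.0] * k for _ in range(k)]
    for t, r in zip(true_cats, rated_cats):
        observed[t][r] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]                       # true marginals
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]  # rated marginals

    # Weighted disagreement actually observed vs. expected under independence.
    obs_dis = sum(abs(i - j) * observed[i][j] for i in range(k) for j in range(k))
    exp_dis = sum(abs(i - j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp_dis == 0:
        raise ValueError("kappa is undefined when only one category is used")
    return 1.0 - obs_dis / exp_dis


# Toy data: four trials, two categories, one near miss.
toy_kappa = weighted_kappa([0, 0, 1, 1], [0, 1, 1, 1], n_categories=2)
```

With linear weights, a body placed in the adjacent category is penalized less than one placed two or more categories away, which suits ordered BMI categories.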

Construct validity was assessed by the degree of correspondence, using Pearson’s correlations, between the CGFRS scores and their counterparts assessed with validated measures (e.g., correspondence between the perceived BMI and the participant’s BMI, or the CGFRS body satisfaction/dissatisfaction score and the BD scores of the questionnaires). Moreover, given that the accuracy of BSE has been shown to be related to BD (Williamson et al. 1993), the relationship between these CGFRS scores (perception/estimation bias and body satisfaction/dissatisfaction scores) was also examined, as was the relationship between the perception/estimation bias and the BD scores of the questionnaires.

Convergent validity was examined with Pearson’s correlations between the CGFRS absolute scores (perception/estimation bias and body satisfaction/dissatisfaction score) and the DT score, as well as the participants’ BMI, given that BMI and ED symptoms—including DT—are related to BSE and/or BD (e.g., Cash 2012; Cash and Deagle 1997; Probst et al. 1998; Stice and Shaw 2002).

Test–retest reliability in the community sample (N = 110/113; see Footnote 3) was assessed for the perceived and ideal BMI scores, as well as for the OB and IB raw scores (from which the perception/estimation bias and body satisfaction/dissatisfaction scores are derived). The intraclass correlation coefficients (ICCs) between the ratings during the two sessions were examined with a two-way random effects model by using an absolute agreement definition (coded as ICC(A,1); McGraw and Wong 1996) in order to take into account random and systematic variations between the test and the retest (Weir 2005). The coefficients were interpreted by following the standards proposed by Shrout (1998): 0.41 to 0.60 = fair; 0.61 to 0.80 = moderate; ≥0.81 = substantial.
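The ICC(A,1) definition and the interpretation bands above can be sketched from the two-way ANOVA mean squares. This is a minimal illustration, not the software used in the study: the function names `icc_a1` and `shrout_label` are hypothetical, the toy ratings are invented, and values below 0.41 are simply labeled "below fair" because the text lists no band for them.

```python
def icc_a1(ratings):
    """ICC(A,1): two-way model, absolute agreement, single measurement.

    ratings: one row per participant, one column per session (test, retest).
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    n, k = len(ratings), len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 participants and 2 complete sessions")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares for participants (rows), sessions (columns), and residual error.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + (k / n) * (ms_cols - ms_err)
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (ms_rows - ms_err) / denom


def shrout_label(icc):
    """Interpretation bands quoted in the text (applied to two-decimal values)."""
    value = round(icc, 2)
    if value >= 0.81:
        return "substantial"
    if value >= 0.61:
        return "moderate"
    if value >= 0.41:
        return "fair"
    return "below fair"  # bands under 0.41 are not listed in the text


# Invented data: every participant shifts by +1 between sessions. Rank order is
# preserved, but absolute agreement is penalized for the systematic shift.
shifted = icc_a1([[1, 2], [2, 3], [3, 4]])
```

The shifted example yields a coefficient of about 0.67 rather than 1, which shows why the absolute agreement definition captures systematic as well as random variation between test and retest.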

Finally, for discriminant validity, the patient sample was compared with the control sample on the different CGFRS scores, as well as on plausibility ratings and classification performance in order to examine whether (1) the CGFRS was able to dissociate between women with a high level of ED symptoms and women from the community, and (2) the content validity subcomponents showed similar results in the patient sample and in the community sample. Mann–Whitney U tests were performed because the assumption of normality was violated for some variable distributions in both samples.
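The abs(r) effect sizes reported alongside the Mann–Whitney z statistics appear consistent with the conventional conversion r = |z| / √N. The sketch below rests on that assumption (the formula is not stated in the text); the function name `mann_whitney_r` is hypothetical, and N = 40 assumes all 20 patients and 20 controls entered the comparison.

```python
import math


def mann_whitney_r(z: float, n_total: int) -> float:
    """Effect size r for a Mann-Whitney U test: |z| divided by sqrt(N)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return abs(z) / math.sqrt(n_total)


# BMI comparison between patients and controls: z = 4.69, assumed N = 40.
r_bmi = round(mann_whitney_r(4.69, 40), 3)
```

Under these assumptions the conversion reproduces the abs(r) = 0.742 reported for the BMI comparison.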

Results

Descriptive Statistics and Sensitivity

Descriptive statistics for the community sample are displayed in Table 2. Interestingly, concerning the OB and IB scores, participants from the community sample were spread over the possible outcomes. More specifically, for the CGFRS–OB, 21.24% of the participants perceived themselves as thinner than they were (OB score ≤−1 kg/m2), 32.74% had a perception/estimation close to their actual BMI (−1 kg/m2 < OB score < 1 kg/m2), and 46.02% perceived themselves as bigger than they were (OB score ≥ 1 kg/m2). For the CGFRS–IB, 11.50% of the participants would have liked to be bigger than they perceived themselves (IB score ≤−1 kg/m2), 16.81% were quite satisfied with their bodies (−1 kg/m2 < IB score < 1 kg/m2), and 71.68% would have liked to be thinner than they perceived themselves (IB score ≥ 1 kg/m2).
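The three bands used above (score ≤ −1, between −1 and 1, score ≥ 1 kg/m2) can be reproduced with a simple binning function. In this sketch the function names are hypothetical and the scores are synthetic; the counts of 24, 37, and 52 participants are inferred from the reported OB percentages for N = 113 and are not stated in the text.

```python
from collections import Counter


def band(score: float, threshold: float = 1.0) -> str:
    """Assign a raw OB or IB score (kg/m2) to one of the three bands."""
    if score <= -threshold:
        return "negative"  # OB: perceives herself thinner; IB: wants to be bigger
    if score >= threshold:
        return "positive"  # OB: perceives herself bigger; IB: wants to be thinner
    return "close"         # within 1 kg/m2 of the reference BMI


def band_percentages(scores):
    """Percentage of participants in each band."""
    if not scores:
        raise ValueError("scores must not be empty")
    counts = Counter(band(s) for s in scores)
    return {name: 100.0 * counts[name] / len(scores)
            for name in ("negative", "close", "positive")}


# Synthetic OB scores built from the inferred counts (24 + 37 + 52 = 113).
synthetic_ob = [-2.0] * 24 + [0.0] * 37 + [2.0] * 52
ob_pct = band_percentages(synthetic_ob)
```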

Table 2 Descriptive statistics of the community sample for own and ideal body versions of the Computer-Generated Figure Rating Scale and the questionnaires

Content Validity

Stimulus Plausibility

On average, the bodies were assessed as plausible (possible responses from 1 to 9; M = 5.77, SD = 1.22; significance of the difference from the center of the evaluation continuum [5.00]: t(112) = 6.69, p < 0.001, d = 0.629). Figure 2 displays the plausibility judgments for each body. At a descriptive level, the ratings presented a curvilinear relationship with the bodies’ estimated BMI. The higher or lower the bodies’ estimated BMIs, the less plausible participants found them to be (severe/moderate underweight: M = 3.47, SD = 1.50; obesity: M = 5.44, SD = 2.02). Conversely, the more the bodies’ estimated BMIs were in or around the normal range, the more plausible participants assessed them to be (mild underweight: M = 7.29, SD = 1.44; normal range: M = 7.94, SD = 1.05; overweight: M = 7.65, SD = 1.23; see Footnote 4).
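The comparison with the scale midpoint can be approximated from the summary statistics alone. The sketch below assumes a one-sample t test and Cohen's d computed as the mean difference divided by the sample SD; the function name is hypothetical. Because the inputs are the rounded M and SD, the results only approximate the reported t(112) = 6.69 and d = 0.629.

```python
import math


def one_sample_t_and_d(mean: float, sd: float, n: int, mu0: float):
    """One-sample t statistic and Cohen's d from summary statistics."""
    if sd <= 0 or n < 2:
        raise ValueError("sd must be positive and n at least 2")
    diff = mean - mu0
    t = diff / (sd / math.sqrt(n))  # t with n - 1 degrees of freedom
    d = diff / sd                   # standardized mean difference
    return t, d


# Mean plausibility rating vs. the midpoint (5.00) of the 1-9 scale, N = 113.
t_stat, cohen_d = one_sample_t_and_d(mean=5.77, sd=1.22, n=113, mu0=5.00)
```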

Fig. 2 Plausibility judgments of the 27 computer-generated bodies. a Severely to moderately underweight bodies (estimated BMI < 17.00 kg/m2). b Mildly underweight bodies (estimated BMI between 17.00 and 18.49 kg/m2). c Bodies in the normal range (estimated BMI between 18.50 and 24.99 kg/m2). d Overweight bodies (estimated BMI between 25.00 and 29.99 kg/m2). e Obese bodies (estimated BMI > 30.00 kg/m2)

Stimulus Classification

On average, the bodies were correctly classified (percentage of correct classification: M = 65.39, SD = 10.41). Figure 3a displays the percentage of correct classifications for each body. The majority of the bodies presented a percentage of correct classifications above the random classification threshold of 16.67% (i.e., 100/6 possible responses; percentage at which correct classifications differed significantly from 16.67%: ≥ 27.50%, z ≥ 3.09, p < 0.050). Interestingly, as shown in Fig. 3a, the bodies for which the classification percentage was lowest were those on the borderline between two BMI categories. For these bodies, the analysis of errors revealed that, except for one body (see Footnote 5), participants classified them in the nearest incorrect category (e.g., “underweight” instead of “emaciation/undernourishment”; see Fig. 3b). Concerning the different classes of bodies, although the mildly underweight bodies presented a lower classification percentage than did those of the other classes (see Footnote 6), all classes presented a percentage of correct classification above the random classification threshold (severe/moderate underweight: M = 64.82, SD = 14.42; mild underweight: M = 29.50, SD = 33.26; normal range: M = 91.68, SD = 13.29; overweight: M = 58.70, SD = 36.26; obesity: M = 65.49, SD = 16.22). Finally, the weighted kappa coefficient was high (κw = 0.791, 95% CI = [0.780, 0.803]), denoting a strong agreement beyond chance.
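The 27.50% cut-off can be related to the chance level of 16.67% with a one-sample z test for a proportion. This is a sketch under assumptions: the section does not name the test used, the function name `proportion_z` is hypothetical, and n = 113 assumes that every community participant classified every body.

```python
import math


def proportion_z(p_obs: float, p_chance: float, n: int) -> float:
    """z statistic comparing an observed proportion with a chance proportion."""
    if not 0 < p_chance < 1 or n <= 0:
        raise ValueError("p_chance must lie in (0, 1) and n must be positive")
    standard_error = math.sqrt(p_chance * (1 - p_chance) / n)
    return (p_obs - p_chance) / standard_error


# 27.50% correct classifications vs. chance with six response options.
z_cutoff = proportion_z(p_obs=0.275, p_chance=1 / 6, n=113)
```

Under these assumptions the statistic comes out at roughly 3.09, in line with the value quoted in the text.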

Fig. 3 Classification of the 27 computer-generated bodies. a Percentage of correct classifications. The horizontal line corresponds to the random classification threshold of 16.67%. (a) Severely to moderately underweight bodies (estimated BMI < 17.00 kg/m2 corresponding to the “emaciation/undernourishment” classification category). (b) Mildly underweight bodies (estimated BMI between 17.00 and 18.49 kg/m2 corresponding to the “underweight” classification category). (c) Bodies in the normal range (estimated BMI between 18.50 and 24.99 kg/m2 corresponding to the “normal weight” classification category). (d) Overweight bodies (estimated BMI between 25.00 and 29.99 kg/m2). (e) Obese bodies (estimated BMI > 30.00 kg/m2). b Analysis of classification errors. “Closest category” corresponds to the nearest incorrect classification category. “Others” includes all other incorrect classification categories. (a–e) Definitions identical to those in a

Construct Validity

For the chosen bodies, the perceived BMI was closely related to the participant’s actual BMI (r = 0.84, p < 0.001) and the ideal BMI correlated strongly with the self-reported ideal BMI (r = 0.63, p < 0.001). Of the absolute scores, the body satisfaction/dissatisfaction score was closely related to the BD score measured with the BSQ (r = 0.64, p < 0.001) and the EDI (r = 0.71, p < 0.001). Moreover, perception/estimation bias was positively related to the body satisfaction/dissatisfaction score (r = 0.50, p < 0.001) and to the BD score assessed with the BSQ (r = 0.38, p < 0.001) or the EDI (r = 0.42, p < 0.001), meaning that the more distorted the participants’ perceptions of their own bodies were, the higher their BD was (independently of the instrument chosen).

Convergent Validity

The perception/estimation bias presented moderate correlations with the DT score (r = 0.26, p = 0.005) and the participants’ BMI (r = 0.38, p < 0.001). Therefore, the higher the participants’ BMI or DT, the more distorted the participants’ perceptions of their own bodies. On the CGFRS–IB, the body satisfaction/dissatisfaction score was closely related to the participants’ DT (r = 0.48, p < 0.001) and BMI (r = 0.67, p < 0.001). Therefore, higher BMI and DT were associated with higher BD.

Test–Retest Reliability

Concerning the CGFRS–OB, the perceived BMI presented substantial test–retest reliability (ICC(A,1) = 0.89, p < 0.001) and the reliability of the OB score was moderate (ICC(A,1) = 0.61, p < 0.001). For the CGFRS–IB, the ideal BMI (ICC(A,1) = 0.80, p < 0.001) and the IB score (ICC(A,1) = 0.80, p < 0.001) presented moderate reliability (see Footnote 7).

Discriminant Validity

We first ensured that patient and control samples differed on body-image-related variables. As expected, they differed significantly on BD (BSQ score: t(35) = 5.86, p < 0.001, d = 1.918; EDI subscale: t(35) = 3.33, p = 0.002, d = 1.091) and DT (Mann–Whitney U test; z = 2.96, p = 0.003, abs(r) = 0.467): ED patients presented a higher BD assessed with the BSQ (M patients = 139.24, SD patients = 36.34; M controls = 75.60, SD controls = 29.68) and the EDI (M patients = 14.49, SD patients = 7.81; M controls = 6.65, SD controls = 6.49) and a higher level of DT (M patients = 11.13, SD patients = 6.63; M controls = 4.45, SD controls = 6.18).

The discriminant validity results (see Table 3) show that, although they had a significantly lower BMI than did controls, ED patients did not differ from controls on perceived BMI, meaning that they perceived themselves as having a BMI similar to the one the controls chose. Logically, this divergence in perception/estimation implies that ED patients had a higher perception/estimation bias than did controls: the bodies they chose on the CGFRS–OB were more distant from their real BMI than were the controls’ choices. On the CGFRS–IB, ED patients’ ideal BMI was significantly lower than that of controls (i.e., their ideal was to achieve a thinner body than controls) and their BD assessed with the CGFRS was higher, replicating the results with the questionnaires. Regarding plausibility ratings, ED patients evaluated the bodies as being more plausible than controls did. This result seems to be particularly due to their ratings of the severely/moderately underweight bodies: ED patients evaluated them as being more plausible than controls did, but did not differ from controls on the other classes of bodies. Finally, ED patients did not differ from controls on their general classification performance. However, the exploration of group differences in classes of bodies revealed that ED patients classified obese bodies better than controls did (see Footnote 8).

Table 3 Group comparisons: Mann–Whitney U tests performed on scores obtained from the Computer-Generated Figure Rating Scale, plausibility ratings, and classification performance

Discussion

The CGFRS was developed to provide a new and reliable measure of distortions in BSE and BD. This study therefore aimed to validate the CGFRS by determining its psychometric properties on the basis of a large sample of women from the community and by highlighting its discriminant validity in a small sample of women with ED.

Concerning the results obtained in the community sample, the raw scores of the CGFRS–OB and CGFRS–IB versions showed great variability, suggesting that participants have very different perceptions of their own and ideal bodies (e.g., some perceived themselves as thinner than they were, others as bigger). By appropriately dissociating individuals, the CGFRS presents satisfactory sensitivity.

Its content validity is also satisfactory. First, on average, the bodies were judged as being plausible/realistic. However, the ratings varied as expected with the bodies’ estimated BMIs: bodies with very high or low BMIs were judged as being less plausible than bodies in or around the normal weight range, probably because they are more extreme and less frequent in the general population (Penman and Johnson 2006). Second, the bodies’ classification agreement was strong and, on average, they were correctly classified. Participants were thus able to identify the BMI category to which the bodies belong, with the exception of some bodies on the category borderlines, which may be considered ambiguous. Interestingly, the mildly underweight bodies presented the lowest correct classification percentage: most participants who classified these bodies incorrectly assigned them to the normal weight category (nearest incorrect BMI category for most of these bodies). This result probably illustrates the impact of Western societies’ thinness standards on participants’ judgments of body shapes. More specifically, Moussally et al. (2016) showed that thin bodies with BMIs ranging from normal weight to moderate underweight (up to a BMI of 16.12 kg/m2) were assessed as attractive. It is therefore possible that thinness standards conveyed by the media play a role in the development of a cognitive distortion by suggesting that mild and even moderate underweight corresponds to a normal BMI (for the impact of thinness exposure on perceptions of body normality, see Glauert et al. 2009). Consequently, mildly underweight body shapes may be harder to classify than other bodies.

Concerning construct validity, the CGFRS scores correlated highly with their counterparts assessed with validated measures, and the perception/estimation bias score was highly associated with the BD scores (of the CGFRS and the questionnaires), supporting the supposition that one’s own body perception and body satisfaction/dissatisfaction are not independent. Convergent validity was also adequate, with moderate to high associations.

The test–retest reliability was generally satisfactory. More specifically, there was strong agreement between perceived BMIs over time as well as relatively strong concordance in ideal BMIs over time. However, the reliability of all scores, especially of the CGFRS–OB raw score, was not perfect, denoting variability in measurement. This could be explained by the fact that, although participants chose similar perceived or ideal bodies at both sessions, the bodies they chose were not identical, which affected the scores and their reliability. Nevertheless, given that the perceived and ideal BMIs chosen did not, on average, differ over time, the sources of variability cannot be considered systematic. Furthermore, as suggested earlier, the test–retest reliability of existing FRS may have been overestimated because of the small number of figures they contain. In the CGFRS, the number of figures (and thus the difficulty of the task) is increased, which may explain the slightly lower test–retest reliability: its scores are sensitive to small variations. On the other hand, the reliability of the different scores was at least moderate, meaning that, even if biases in measurement are present, their effects are limited (Shrout 1998).
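The distinction drawn above between imperfect agreement and systematic change can be illustrated with a minimal sketch. It assumes an intraclass correlation of the two-way random-effects, absolute-agreement, single-measure form; this passage does not state which coefficient was computed, so the choice is an assumption. The BMI values and the function names `icc_2_1` and `mean_shift` are hypothetical and serve only as illustration.

```python
from statistics import mean


def icc_2_1(session1, session2):
    """Two-way random-effects, absolute-agreement, single-measure ICC
    for two sessions (assumed form; not confirmed by the source)."""
    n, k = len(session1), 2
    rows = list(zip(session1, session2))
    grand = mean(list(session1) + list(session2))
    row_means = [mean(r) for r in rows]
    col_means = [mean(session1), mean(session2)]
    # Sums of squares for participants (rows), sessions (columns), and residual
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def mean_shift(session1, session2):
    """Average retest-minus-test difference; near zero means no systematic drift."""
    return mean([b - a for a, b in zip(session1, session2)])


# Hypothetical perceived BMIs (kg/m²) chosen at test and retest
test_bmi = [19.5, 22.0, 24.5, 21.0, 27.0, 18.5]
retest_bmi = [20.0, 21.5, 24.5, 21.5, 26.5, 18.5]

print("ICC:", round(icc_2_1(test_bmi, retest_bmi), 3))
print("Mean shift:", mean_shift(test_bmi, retest_bmi))
```

In this made-up example, participants pick neighboring but not identical figures, so the coefficient falls below 1 while the mean shift stays at zero. This is the pattern of unsystematic variability described in the paragraph.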

The discriminant validity indicated that the CGFRS absolute scores significantly distinguish between a sample of women with ED and a control sample: patients had a higher perception/estimation bias and BD, replicating results described in the literature. Indeed, Cash and Deagle’s (1997) meta-analysis showed that women with anorexia nervosa or bulimia nervosa presented more inaccurate BSE and greater BD.

Finally, other interesting results should be discussed. First, ED patients and control women presented similar perception patterns: on average, both chose an ideal body that was thinner than the perceived body. This was also observed in the community sample: most participants wanted to be thinner, replicating the findings of previous studies (e.g., Crossley et al. 2012) and illustrating the valuation of thinness in Western societies (Stice and Shaw 2002). Second, ED patients judged severely/moderately underweight bodies (bodies closest to most of their BMIs) as being more plausible and classified obese bodies (bodies farthest from theirs) better than controls did. ED patients therefore seem to perceive the signs of obesity more correctly than control women do, perhaps because obesity is the opposite of their ideal body and is therefore more threatening for them. This result could also be conceptualized as greater “realism” (widely highlighted in the field of depression; Moore and Fresco 2012) among ED patients regarding signs of obesity. Concerning the higher plausibility ratings of the severely/moderately underweight bodies, ED patients did not misperceive signs of underweight, given that they did not differ from controls in terms of classification errors. This result might rather suggest that ED patients have seen severely/moderately underweight bodies more frequently over their lifetime (e.g., during their hospitalizations) and are therefore better able to judge the plausibility of these bodies. Nevertheless, we cannot rule out the possibility that a self-protection mechanism (giving higher ratings to bodies close to their BMIs or to their ideal body) also underlies the high level of plausibility assigned.

Limitations and Future Prospects

The study design has several limitations. For instance, the plausibility task was always completed before the classification task. The first task could therefore have biased participants’ responses on the second task. Likewise, the first session could have sensitized participants to their body shape, influencing their answers in the second session and, thus, the test–retest reliability. Moreover, we did not evaluate the divergent validity of the CGFRS. The higher correlations obtained for construct validity (between the body satisfaction/dissatisfaction score and the questionnaires) than for convergent validity (with the DT score) showed that the CGFRS–IB correctly evaluated BD and dissociated it from other body-image-related constructs. Nevertheless, the addition of a measure more distant from body image (e.g., an anxiety scale) could have been valuable. Similarly, we did not include an alternative FRS, preventing comparisons of the CGFRS to existing FRS. Finally, the anthropometric data were assessed only at the first session, given the short interval between test and retest. Some weight changes, even if minimal, could thus have affected the retest, negatively influencing the test–retest reliability of the CGFRS–OB raw score.

Concerning the patient sample, it would have been valuable to confirm the patients’ diagnoses with a validated diagnostic instrument/interview. Moreover, although the data supported the discriminant validity of the CGFRS, the sample size was small: replication studies in larger samples are required. Furthermore, half the patients presented a diagnosis of AN-R and the average BMI was under the normal range, so it is also important to replicate the results with obese patients and/or patients with a BMI in the normal range to determine whether some of our findings can be attributed to our specific patient sample.

Finally, the computer-generated bodies were principally designed for Caucasian women (Moussally et al. 2016). As Moussally et al. suggested, it is possible to present the bodies in grayscale to attenuate the impact of ethnicity. Nevertheless, given that this study validated the CGFRS in Caucasian women, future studies are necessary to examine whether the CGFRS is a valid measure in ethnically diverse samples. Indeed, Kronenfeld et al. (2010) have shown that body ideals, as well as the perception of the figures themselves (persons with a similar BMI from a different ethnic group could select significantly different own bodies), can vary with ethnicity. More importantly, given that individuals’ body shapes can also vary with ethnicity (McClure et al. 2012), further explorations are needed to determine whether the CGFRS bodies are representative of different ethnic groups or whether the use of the CGFRS is restricted to Caucasian women.