Introduction

Body image is an elaborate construct. A key element is subjective evaluation of the body, which is substantially more negative in relevant clinical groups than in non-clinical individuals. Such evaluations are made using affective, cognitive, and behavioural assessments of size, fitness, function, health, sensation, and aesthetic properties [1]. Body image can have a strong impact on an individual’s self-esteem, relationships, and quality of life. As well as being central to eating disorders, excessive concerns about weight and appearance can trigger problems such as anxiety and depression, leading to deterioration in quality of life [2]. Many factors have been proposed as causes of negative body image, including sociocultural pressures, peer pressure, teasing, and developmental changes [3, 4]. Such causal factors are potentially important for planning prevention. However, in clinical terms, it is the factors that maintain negative body image that are key, as they are the targets of most effective therapeutic approaches [5, 6].

There are three principal safety behaviours that maintain negative body image. Each serves the short-term function of reducing the individual’s anxiety, but results in long-term worsening of the anxiety and body image itself. The best known and understood of these behaviours are body avoidance and body checking [7], assessed using measures such as the Body Image Avoidance Questionnaire [8]. Modifying avoidance and checking behaviours and cognitions through behavioural experiments and exposure is an effective way of reducing such negative body image [9, 10]. In contrast, body comparison is less well researched and understood.

Body comparison is the use of other people’s physical and related attributes to evaluate one’s own appearance relative to theirs. Comparison is a characteristic that is found in many domains of human function. Social comparison theory [11] hypothesises that humans have an innate drive to evaluate their own opinions, abilities, progress, and standing in life. To fulfil this need, individuals identify standards against which they can compare themselves to others, and appearance is one of those factors. Women are particularly likely to engage in frequent comparisons with peers, judging their weight and shape in relation to others [12]. The importance of such comparison with others’ appearance is demonstrated by the way that it has been implicated in the links between poor social security and eating pathology [13], and in research that shows making appearance comparisons has a causal relationship with greater state body dissatisfaction [14].

In Western society, emphasis is on a slim female figure. Therefore, many women are likely to feel pressure to lose weight to achieve a more favourable comparison to their peers and other images (e.g., people in the media). Social media exacerbate this comparison behaviour, particularly outlets such as Instagram, where visual judgements and comparison are emphasised [15]. This goal of making a favourable comparison is achievable at times (thus providing the immediate positive return from the safety behaviour), but is made less achievable in the long term by the tendency to make the comparisons with role models who are likely to be slim (e.g., media figures) and the individual’s overestimation of their own body size relative to that of others [16, 17].

Given the links between body image and disordered eating, social anxiety, self-esteem, depression, body dysmorphic disorder, and eating disorders [1, 18], a greater understanding of the role of body comparison is needed to plan and deliver interventions for those individuals who use those behaviours. The first step in developing such understanding is the development of a clinically useful measure. There are some such measures in existence already, such as the Body, Eating, and Exercise Comparison Orientation Measure (BEECOM) [19], Upwards and Downwards Physical Appearance Comparison Scales (UPACS and DACS) [20], and different forms of the Physical Appearance Comparison Scale (PACS) [21, 22]. However, each has limitations. The different versions of the PACS and the UPACS and DACS focus solely on weight and body image, omitting the other aspects of comparison (e.g., personality and social). Similarly, the BEECOM was designed around assessing exercise comparison within the context of eating-disordered behaviours. There is a need for a body comparison measure that includes the full range of common comparisons: physical, interpersonal, and appearance. Physical comparison involves individuals making a perceptual comparison of their own body shape with those of others. Interpersonal/social comparison involves the individual considering what traits other people have in comparison to themselves. Appearance comparison occurs when other people’s clothes, hair, make-up, and presentation of self are considered in relation to one’s own appearance. It is possible that each of these elements of comparison is related to body image disturbance and eating pathology, and that any or all of these elements could be usefully addressed in clinical work. However, the role of these elements of body comparison remains to be established.

Therefore, this study will develop a measure of body comparison that addresses the full range of components of body comparison. Such a measure needs to be reliable, valid, and clinically relevant. It is hypothesised that the developed measure will have distinct factors that measure the three hypothesised components of body comparison, and that each will have a strong level of internal consistency and test–retest reliability (including trait status rather than state). Its clinical validity will be determined relative to an existing measure of comparison (the BEECOM), by contrasting the two measures in terms of association with eating and other measures of psychopathology.

Method

Ethics

This study was reviewed and approved by the Ethics Committee of the University of Sheffield, UK.

Design

This study used a mixed comparative and correlational design, with cross-sectional independent measures.

Participants

A sample of 412 adults completed the study. Of these, 314 were female and 98 were male (mean age = 31.72 years; SD = 12.87; range = 18–67 years). Convenience sampling was used to recruit participants. The sample consisted largely of volunteers recruited through the University volunteer participant system. However, to extend the age range, the study was also circulated to personal contacts. The only exclusion criterion was an age below 18 years.

Measures

The following measures were used to measure comparison, eating behaviours and cognitions, levels of anxiety and depression, and body satisfaction.

Comparison of self-survey (CoSS) This was the primary measure to be validated in this study, and was devised for the purposes of this research. Initially, 40 items were generated by the team, based on clinical experience and gender cognisance. The items were then reviewed, and some were reworded to ensure clarity of what was being asked. Others were removed because of their similarity to other items. The items were then externally reviewed by non-clinical individuals. Finally, 37 items were included in the original CoSS. No reverse-scored items were generated. Items are answered on a seven-point Likert-type scale ranging from 1 = Never to 7 = Always. The relevant items are given in Table 1 and the Appendix. The measure was completed by all participants, and repeated by the test–retest participants.

Table 1 Principal component analysis (varimax rotation) of CoSS scales for non-clinical participants (N = 412), with item mean scores and internal consistency of resulting scales

Body, eating, and exercise comparison orientation measure (BEECOM) [19] The BEECOM is an 18-item self-report scale, with three factors—body, exercise, and eating comparison. Items are answered on a seven-point Likert-type scale ranging from 1 = Never to 7 = Always. No reverse items are included. The scores for the three factors were calculated, as well as an overall comparison score. Higher scores indicate a greater tendency towards comparison. The BEECOM has good psychometric properties including good concurrent validity (r = .42–.76), and test–retest reliability (total scale, r = .90; body factor, r = .85; eating factor, r = .88; exercise factor, r = .84) [19].

ED-15 [23] The ED-15 is a 15-item self-report measure of eating-disordered cognitions and behaviours. For the purpose of this study, only ten questions were used. Items are answered on a seven-point Likert-type scale ranging from 0 = Not at all to 6 = All the time. No reverse items are included. These questions form two factors: Weight and Shape, and Eating. Higher scores indicate a greater level of eating pathology. The scores for the two factors were calculated, as well as an overall eating pathology score. The ED-15 has strong psychometric properties, including test–retest reliability (r = .85–.93), concurrent validity (r = .56–.89), and convergent validity (r = .31–.63). The ED-15 also shows good clinical validity [23].

Generalized anxiety disorder questionnaire (GAD-7) [25] The GAD-7 is a seven-item self-report measure, used for screening and measuring the severity of anxiety. Items are answered on a four-point Likert-type scale ranging from 0 = Not at all to 3 = Nearly every day. No reverse items are included. Higher scores indicate a higher level of anxiety. An overall anxiety score was calculated. The GAD-7 has satisfactory psychometric properties including internal consistency (α = .92), test–retest reliability (r = .83), and concurrent validity (r = .72–.74; [21]). Clinical validity is good in a general and primary care population [24]. However, clinical validity in a psychiatric population is weaker. Convergent validity is strong (r = .74–.75) [26].

Patient health questionnaire (PHQ-9) [27] The PHQ-9 is a nine-item self-report measure of depression, which is used widely within clinical settings for screening and the measurement of outcome. Items are answered on a four-point Likert-type scale ranging from 0 = Not at all to 3 = Nearly every day. No reverse items are included. Higher scores indicate a higher level of depression. An overall depression score was calculated. The PHQ-9 has well-established psychometric properties, including good internal consistency (Cronbach’s alpha = .86–.89). Clinical validity within the general and primary care population is excellent, as is convergent validity [27].

Body satisfaction scale (BSS) [28] The BSS is a 16-item self-report measure, which determines the individual’s level of satisfaction with their body. There are two factors—head and body. Items are answered on a seven-point Likert-type scale ranging from 1 = Very satisfied to 7 = Very unsatisfied. No reverse items are included. Higher scores indicate a lower level of body satisfaction. Scores for the two factors were calculated, as well as an overall satisfaction score. The BSS has reasonably high internal consistency (α = .79–.89) [28].

Procedure

The survey was distributed using Qualtrics survey software via email to a volunteer pool. A brief explanation of the study was given in the body of the email, before participants used a link that directed them to the study. In total, 451 individuals activated the link. Of these, 412 completed the CoSS, the primary measure (completion rate = 91.35%). Completion of the other measures was as follows: BEECOM, N = 384 (85.14%); ED-15, N = 379 (84.04%); GAD-7, N = 399 (88.47%); PHQ-9, N = 395 (87.58%); and BSS, N = 378 (83.81%). The total survey was completed by 373 individuals (82.71%). Because completion rates differed across measures, N varies slightly from analysis to analysis.
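The completion rates reported above can be reproduced from the counts given in this section. The following is a minimal sketch in Python; the function and variable names are illustrative assumptions and are not part of the study materials.

```python
# Counts as reported in the Procedure section.
ACTIVATED = 451  # individuals who activated the survey link

completed = {
    "CoSS": 412,
    "BEECOM": 384,
    "ED-15": 379,
    "GAD-7": 399,
    "PHQ-9": 395,
    "BSS": 378,
    "Total survey": 373,
}


def completion_rate(n_completed, n_activated=ACTIVATED):
    """Percentage of link activations that produced a completed measure."""
    return round(100 * n_completed / n_activated, 2)


# Hypothetical helper structure: measure name -> completion rate (%).
rates = {name: completion_rate(n) for name, n in completed.items()}

for name, rate in rates.items():
    print(f"{name}: N = {completed[name]}, completion rate = {rate}%")
```

Each rate is simply the number of completers divided by the 451 activations, expressed as a percentage to two decimal places.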

Following completion of the measures, participants were thanked for their help, and asked if they would repeat the CoSS 2 weeks later for test–retest purposes. Of the participants in the first wave of data collection, 145 volunteered during that stage and undertook the second stage, re-taking the CoSS online.

Data analysis

Factor analysis was used to determine the number of scales in the CoSS. Principal Component Analysis was used, with Varimax rotation. Items were included only if they loaded at above .5 on the relevant factor, and if there was more than .2 difference in loading between factors. Cronbach’s alpha was used to establish the internal consistency of the resulting scales, and whether that consistency was enhanced by the removal of any items. Test–retest reliability was established using Pearson’s correlation coefficients and paired t tests. Gender differences were tested using independent sample t tests for all of the scales. The concurrent validity of the CoSS was determined through correlation with the BEECOM (Pearson’s correlations). Finally, the clinical validity of the CoSS and BEECOM was compared by testing their associations with the GAD-7, PHQ-9, BSS, and ED-15, first using Pearson’s correlation coefficients, and then using multiple regressions to determine the most parsimonious model of associations.
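The item inclusion rule and the internal consistency statistic described above can be illustrated with a short Python sketch. This is not the analysis code used in the study. The function names are hypothetical, the use of absolute loadings is an assumption, and the alpha calculation uses the standard formula on complete data with sample variances.

```python
from statistics import variance


def assign_item(loadings, min_loading=0.5, min_gap=0.2):
    """Return the index of the factor an item is retained on, or None.

    loadings: rotated loadings of one item, one value per factor.
    Assumption: absolute loadings are compared; the item is kept only if its
    highest loading is above min_loading and exceeds the next highest loading
    by more than min_gap.
    """
    ranked = sorted(((abs(v), i) for i, v in enumerate(loadings)), reverse=True)
    top_loading, top_factor = ranked[0]
    if top_loading <= min_loading:
        return None
    if len(ranked) > 1 and top_loading - ranked[1][0] <= min_gap:
        return None
    return top_factor


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: one list of item scores per respondent (no missing values).
    """
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical loadings for four items on two factors (not study data).
example_loadings = [[0.72, 0.10], [0.55, 0.45], [0.30, 0.40], [0.15, 0.68]]
assignments = [assign_item(item) for item in example_loadings]
```

In this hypothetical example, the first and fourth items are retained (on the first and second factor, respectively), the second is excluded because it cross-loads, and the third is excluded because neither loading exceeds .5.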

Results

Factor structure of the CoSS

In keeping with the exploratory nature of the factor analysis at this initial developmental stage, Principal Component Analysis was used to determine the factor structure of the CoSS. The most meaningful factor structure emerged with a Varimax rotation. The result of this analysis is presented in Table 1. Seven factors emerged, but only two met the criteria of having an eigenvalue greater than 1.0 and being visually apparent in scree analysis. The larger and smaller of these two factors accounted for 30.9% and 8.7% of the variance, respectively.

Using the criteria outlined above for item inclusion (factor loading > .5; difference of at least .2 between loadings), 22 of the 37 items loaded onto one of the two factors. The remaining 15 did not meet these criteria, and were, therefore, excluded from further consideration. As shown in Table 1, 12 items (2, 3, 4, 6, 8, 9, 12, 14, 17, 20, 30, and 35) loaded onto the larger factor. Given the content of those 12 items, this factor was labelled ‘Appearance Comparison’. The other factor consisted of ten items (1, 5, 10, 13, 16, 23, 25, 32, 34, and 37) that related to comparison with the personality of other people, and, therefore, was labelled ‘Social Comparison’. The internal consistencies of the items in the two scales were strong (Cronbach’s alpha = .916 and .891 for the Appearance and Social comparison scales, respectively).

Scoring of the CoSS

The two scales were scored by taking the item mean score for each factor, and a total score was calculated using the mean of the full set of 22 items (range = 1–7 for all scales). The mean scores for the total sample are given in Table 1, with higher scores indicating a higher level of comparison. In case of missing items when a respondent completes the measure, it is important to know whether the omission of any items is possible without making the scores unreliable. Inspection of the Cronbach’s alpha if items were deleted indicates that the scales remained internally consistent even if a number of items were removed. Therefore, in case of respondent error, it is recommended that a maximum of two items can be omitted from each CoSS scale (and the item mean adjusted accordingly) without invalidating the measure.
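The scoring rule above can be sketched as follows, using the item numbers reported for the two factors. This is an illustrative Python sketch rather than an official scoring script. The function names are hypothetical, and returning no total score when either scale is unscorable is an assumption, as the text does not specify how the total should be handled in that case.

```python
# Item numbers as reported for the two CoSS factors.
APPEARANCE_ITEMS = (2, 3, 4, 6, 8, 9, 12, 14, 17, 20, 30, 35)
SOCIAL_ITEMS = (1, 5, 10, 13, 16, 23, 25, 32, 34, 37)
MAX_MISSING_PER_SCALE = 2  # recommended maximum number of omitted items


def scale_mean(responses, items, max_missing=MAX_MISSING_PER_SCALE):
    """Item mean over answered items, or None if too many items are missing.

    responses: dict mapping item number -> rating (1-7), or None if omitted.
    """
    answered = [responses[i] for i in items if responses.get(i) is not None]
    if any(not 1 <= value <= 7 for value in answered):
        raise ValueError("CoSS ratings must lie between 1 and 7")
    if len(items) - len(answered) > max_missing:
        return None
    return sum(answered) / len(answered)


def score_coss(responses):
    """Return the Appearance, Social, and total item mean scores."""
    appearance = scale_mean(responses, APPEARANCE_ITEMS)
    social = scale_mean(responses, SOCIAL_ITEMS)
    total = None
    if appearance is not None and social is not None:
        # Assumption: the total is the mean of all answered items among the 22.
        total = scale_mean(
            responses,
            APPEARANCE_ITEMS + SOCIAL_ITEMS,
            max_missing=2 * MAX_MISSING_PER_SCALE,
        )
    return {"appearance": appearance, "social": social, "total": total}


# Hypothetical respondent: 2 on every Appearance item, 6 on every Social item.
example = {i: 2 for i in APPEARANCE_ITEMS}
example.update({i: 6 for i in SOCIAL_ITEMS})
example_scores = score_coss(example)
```

Dividing by the number of answered items, rather than the full scale length, is what adjusts the item mean when one or two items are omitted.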

Test–retest reliability of the CoSS scales

All 145 of the retest sample completed the Appearance Comparison scale and 134 completed the Social Comparison scale. The mean gap was 13 days (range = 11–21). Their mean scores for each scale were as follows: Appearance Comparison at time 1 = 3.02 (SD = 1.18); Appearance Comparison at time 2 = 2.98 (SD = 1.21); Social Comparison at time 1 = 3.09 (SD = .97); and Social Comparison at time 2 = 3.07 (SD = 1.02). There were no differences between the scores at the two time points for either CoSS scale, using paired t tests (Appearance Comparison—t = 1.02, NS; Social Comparison—t = .48, NS). The Pearson’s correlations between time 1 and 2 scores were as follows for the two scales: Appearance Comparison—r = .93, P < .001; and Social Comparison—r = .90, P < .001. These strong correlations and lack of difference in mean scores show that the two CoSS scales have strong test–retest reliability.
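The two test–retest statistics used here, Pearson’s correlation and the paired t test, can be computed as in the following Python sketch. The scores shown are hypothetical and are not study data, the function names are illustrative, and the sketch returns only the test statistics (P values would require a t distribution, for example from a statistics library).

```python
import math


def pearson_r(x, y):
    """Pearson's correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for time 1 vs. time 2."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
    return mean_d / (sd_d / math.sqrt(n)), n - 1


# Hypothetical scale scores for four respondents at two time points.
time1 = [3.0, 4.0, 5.0, 6.0]
time2 = [2.0, 4.0, 4.0, 6.0]

r = pearson_r(time1, time2)
t, df = paired_t(time1, time2)
```

A high correlation shows that respondents keep their rank order across the two occasions, while a non-significant paired t test shows that the mean level has not shifted; both are needed to support the reliability claim.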

Gender differences in body image, body comparison, mood, and eating

Table 2 shows the mean scores for the CoSS, BEECOM, ED-15, GAD-7, PHQ-9, and BSS for the males and females who completed the main study. These means were broadly comparable with the existing norms for the measures. Females had significantly higher scores than males for CoSS Appearance Comparison, BEECOM body and eating comparison, ED-15 weight and cognition factors, depression, and the BSS body factor. This suggests that females were more likely to make comparisons of their appearance, body, and eating, more likely to display eating pathology and to be depressed, and were less satisfied with their bodies than males.

Table 2 Mean scores on measures of comparison, eating, anxiety, depression, and satisfaction for the non-clinical males and females

Concurrent validity of the CoSS appearance and social scales

CoSS scores were correlated with the BEECOM scores. Table 3 shows the pattern of correlations. The CoSS Appearance scale was associated with the BEECOM Body and Eating scales. However, the CoSS Social scale was less strongly associated with those scales. Finally, the BEECOM Exercise scale had the weakest associations with both CoSS scales. This pattern of findings suggests that the CoSS Appearance scale maps onto the function of two BEECOM scales, and that the CoSS Social and BEECOM Exercise scales measure distinct constructs, which are not measured by the other comparison measure.

Table 3 Pearson’s correlation (r) between CoSS and BEECOM scales. All correlations were significant at P < .001

Clinical validity of the CoSS and BEECOM

To determine the clinical validity of the CoSS relative to the BEECOM, initially, each BEECOM and CoSS scale was correlated with the measures of eating (ED-15), body satisfaction (BSS), anxiety (GAD-7), and depression (PHQ-9). Thereafter, multiple regression analyses were used to determine the most parsimonious set of BEECOM and CoSS scales needed to explain the variance in each measure of psychopathology.

Bivariate correlations Pearson’s correlations were used to determine the associations of the two comparison measures (CoSS and BEECOM) with anxiety, depression, eating attitudes, and body satisfaction. Table 4 shows that both the CoSS and the BEECOM were correlated with almost all of the clinical measures, suggesting comparable levels of clinical value. However, it is noteworthy that the CoSS tended to have stronger correlations with the depression and anxiety measures than the BEECOM.

Table 4 Pearson’s correlation (r) between the CoSS and BEECOM scales and GAD anxiety, PHQ depression, BSS satisfaction, and ED-15 scales

Multiple regressions In contrast, when controlling for the other comparison scales (using multiple regressions), Table 5 demonstrates that there are different patterns of association for different pathologies. The CoSS scales were the only comparison scales that were associated with anxiety and depression. In contrast, a mixture of CoSS and BEECOM scales explained the maximum variance in ED-15 and BSS body scores. It is particularly worthy of note that it was the CoSS Social Comparison scale which was associated with both BSS scales.

Table 5 Multiple regressions showing associations between the comparison scales (CoSS and BEECOM) and psychopathology measures (GAD-7 anxiety, PHQ-9 depression, BSS, and ED-15)

Discussion

The aim of this study was to develop a measure of body comparison that addresses the full range of components of body comparison that are routinely encountered in clinical practice. The measure that emerged consisted of 12 items that loaded onto Appearance Comparison, and 10 that loaded onto Social Comparison. Each had strong internal consistencies and test–retest reliability. Females had significantly higher scores than males for appearance and body/eating comparison, but not for social or exercise comparison. The overlap between BEECOM and CoSS scales was noteworthy—the CoSS Appearance Scale appears to measure similar constructs to both the BEECOM Eating and Body scales [19], while the CoSS Social and BEECOM Exercise scales appeared to measure different constructs. It is noteworthy that the other hypothesised construct—physical comparison—did not emerge as a separate factor. This outcome suggests that individuals focus more on the way that other people present themselves (e.g., dress) than on their physical shape per se.

Considering the clinical validity of the two comparison measures, both were related to the measures of eating pathology and body dissatisfaction, though in different ways. However, the CoSS had unique associations with anxiety and depression. Therefore, it seems appropriate to conclude that the measure of comparison used should be determined by the specifics of the clinical question being asked at the time [13, 15, 19].

It is important to note that CoSS social comparison was more strongly associated with body image than either CoSS appearance comparison or any of the BEECOM scales. This pattern of linkage indicates that one should not assume that it is appearance comparison that drives body image; rather, it may be interpersonal comparison that does so. Therefore, it is possible to hypothesise that it is the impact of comparison on self-esteem that results in negative body image, rather than its impact on perceptual features.

The association of the CoSS scales with anxiety and depression is particularly important in understanding the negative impact of comparison. The most appropriate model for understanding this linkage is to consider both body and social comparison as safety behaviours [29]. Despite the initial positive experience of comparison, as with all safety behaviours, the outcome is an increased level of anxiety and a longer term experience of lowered mood [30]. Body comparison is particularly likely to have these negative effects with relatively limited positive effects, because of the nature of comparison and self-perception. While downward comparison (relative to individuals who are seen as worse than oneself) is associated with positive self-perception (at least in the short term), upward comparison has the opposite effect, resulting in poorer self-perception [14]. However, the fact that women in particular tend to see their own bodies relatively negatively [16, 17] means that downward comparisons are harder to find. Thus, comparison is more likely to be upward, resulting in worse self- and body-perception. The CoSS appears to demonstrate this theoretically coherent association more clearly than the BEECOM, which may explain the links of the CoSS scores with the mood and anxiety scales.

Limitations and future research directions

The current study has some limitations. The sample consisted of non-clinical adults (largely female), with limited ethnic diversity. The possibility of age, gender, and ethnicity influencing these findings should be explored in further studies. For example, it is possible that males are influenced by comparison behaviours in the same way, but the fact that they use comparison less than females [31] means that they are affected less than women. Similarly, it can be hypothesised that some cultures encourage comparison more or in different forms than other cultures do (e.g., countries where women are or are not encouraged to cover their bodies in everyday dress).

Clearly, it will be particularly important to extend this study to clinical samples, to determine the CoSS’s clinical utility across different eating disorder diagnoses. Such work should also compare the CoSS’s utility to that of other measures, such as the BEECOM, UPACS, DACS, and PACS. Testing and comparing the clinical validity of the different measures would allow for better understanding of the key elements of comparison that each measure addresses. It is possible that the questionnaires will yield a common core ‘comparison’ factor, which is equally valid across measures, but that some or all will have unique elements that predict different clinical behaviours (e.g., comparison of exercise, personality). In that case, the utility of each measure is likely to be dependent on the context. Such examination of the role of context would determine which of the measures is best suited to the formulation and treatment of eating and body image disorders.

Finally, while body and social comparisons have the expected association with eating and body concerns, including anxiety and depression [1, 18], it is important to note that these findings are correlational, and the causal conclusions that can be reached are limited. A greater emphasis should be placed on experimental studies [14, 15], considering the short- and long-term positive and negative impacts of manipulation of the type and level of comparisons that the individual makes. It will be particularly important to ensure that upward and downward comparison are considered separately, as they can be predicted to have different impacts on body image and eating pathology [11, 20].

Clinical implications

If these findings are replicated and extended in the ways outlined here, then appearance and social comparison should be considered for their clinical implications. Different forms of comparison should be considered in assessing, formulating, and treating eating disorders, particularly where there is substantial comorbid anxiety or depression. Such use of comparison as a safety behaviour [29] should also be considered in other conditions where body image is a key issue (e.g., body dysmorphic disorder). However, it will be important to note that the comparison of appearance (self-presentation) shown here does not equate to pure physical comparison with others, as that did not emerge as a separate factor. Interventions for such comparisons could include behavioural experiments, where the individual is taught to identify that the use of self-comparison might have positive short-term outcomes, but has more profound negative long-term outcomes [6]. The CoSS might be employed routinely as an assessment and outcome measure in such interventions.