Introduction

Muscle dysmorphia (MD) is a subcategory of body dysmorphic disorder characterized by an obsession with one’s body not being sufficiently lean and muscular, clinical depression, and social and occupational impairment [1]. Features associated with MD include the use of performance-enhancing substances, compulsive mirror checking, substantial time spent weightlifting, and excessive attention to diet [2]. The drive for muscularity, a term coined by McCreary and collaborators [3], describes an individual’s motivation to become more muscular and has been suggested to be associated with MD [4]. Both women and men are affected by MD; however, it is disproportionately diagnosed among men [5], particularly those who engage in sports that emphasize gains in muscle mass or power, such as football [6], weightlifting, or bodybuilding [7].

Alongside the growing body of literature on MD, a variety of measures have been developed for epidemiological and diagnostic purposes [8]. These measures specifically evaluate symptoms and diagnostic criteria of MD and include the Male Body Attitudes Scale [9], the Muscle Dysmorphic Disorder Inventory [10], the Muscle Dysmorphia Inventory [11], and the Muscle Appearance Satisfaction Scale (MASS) [12]. Because these scales are relatively new, little is known about their psychometric properties in the Mexican population.

In particular, the MASS has been widely used in studies characterizing MD samples [13, 14], making cross-cultural comparisons [15], and evaluating anthropometric correlates [16, 17], related factors [18], and predictors [19] of MD. It is a 19-item measure developed to assess body dysmorphic symptoms related to muscle size, and it covers the cognitive, affective, and behavioural domains of MD. The MASS was originally created using a 7-point Likert-type scale (1 = strongly disagree, 7 = strongly agree). To maintain consistency of scaling, the three items designed to evaluate muscle satisfaction (items 1, 4, and 14) are reverse-coded (5 = strongly disagree; 1 = strongly agree). Higher scores indicate a greater tendency towards MD.
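As a minimal sketch of the reverse-coding rule, the Python functions below reverse items 1, 4, and 14 on a 1-to-k scale, where a response x becomes (k + 1) − x. The function names are hypothetical, responses are assumed to be stored as a mapping from item number to integer rating, and because the number of response points differs across versions of the scale, it is a parameter (defaulting to 5, the format used in the present study).

```python
from typing import Dict, Mapping

# Muscle satisfaction items that the MASS codes inversely
REVERSED_ITEMS = frozenset({1, 4, 14})


def reverse_code(score: int, n_points: int) -> int:
    """Map a 1..n_points Likert response onto the reversed scale (1 <-> n_points)."""
    if not 1 <= score <= n_points:
        raise ValueError(f"response {score} outside 1..{n_points}")
    return (n_points + 1) - score


def recode_mass(responses: Mapping[int, int], n_points: int = 5) -> Dict[int, int]:
    """Return a copy of the item -> response map with items 1, 4 and 14 reversed."""
    return {
        item: reverse_code(score, n_points) if item in REVERSED_ITEMS else score
        for item, score in responses.items()
    }
```

For example, `recode_mass({1: 5, 2: 5, 4: 2, 14: 1})` returns `{1: 1, 2: 5, 4: 4, 14: 5}`: only the muscle satisfaction items are flipped, so that higher values point towards MD on every item.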

Mayville and collaborators [12] examined the factor structure of the MASS across two samples of male weightlifters and identified the following five subscales. (1) Bodybuilding dependence (items 2, 7, 8, 12, and 15; Cronbach’s α = 0.78–0.80) evaluates excessive weightlifting activity and compulsive tendency to work out. (2) Muscle checking (items 3, 11, 18, and 19; α = 0.79) examines mirror checking and reassurance-seeking behaviour to evaluate muscle appearance. (3) Substance use (items 5, 6, 9, and 17; α = 0.74–0.75) assesses the willingness to use anabolic/androgenic steroids and other substances to gain muscle mass. (4) Injury risk (items 10, 13, and 16; α = 0.76–0.77) measures the symptoms of overtraining and beliefs related to unsafe weightlifting behaviour. (5) Muscle satisfaction (items 1, 4, and 14; α = 0.73–0.75) assesses satisfaction with the individual’s own muscle size and shape. The MASS has good internal consistency (α = 0.82–0.87), test–retest reliability after 2 weeks (r = 0.82), and convergent validity with measures of body dysmorphic disorder and body satisfaction [12].
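Subscale and total scores under the item assignment of Mayville et al. [12] could be computed as in the following sketch. It assumes that items 1, 4, and 14 have already been reverse-coded and that subscale scores are simple item sums; the summing convention and the names `MAYVILLE_SUBSCALES` and `score_mass` are illustrative assumptions, not the original scoring instructions.

```python
from typing import Dict, Mapping, Tuple

# Item-to-subscale assignment reported by Mayville et al. [12]
MAYVILLE_SUBSCALES: Dict[str, Tuple[int, ...]] = {
    "bodybuilding_dependence": (2, 7, 8, 12, 15),
    "muscle_checking": (3, 11, 18, 19),
    "substance_use": (5, 6, 9, 17),
    "injury_risk": (10, 13, 16),
    "muscle_satisfaction": (1, 4, 14),
}


def score_mass(
    responses: Mapping[int, int],
    subscales: Mapping[str, Tuple[int, ...]] = MAYVILLE_SUBSCALES,
) -> Dict[str, int]:
    """Sum already-recoded item responses into subscale scores plus a total."""
    needed = {item for items in subscales.values() for item in items}
    missing = sorted(needed - set(responses))
    if missing:
        raise KeyError(f"missing responses for items {missing}")
    scores = {name: sum(responses[i] for i in items) for name, items in subscales.items()}
    scores["total"] = sum(scores[name] for name in subscales)
    return scores
```

Passing a different mapping, for instance one that drops items 15 and 17, would score an abbreviated version of the scale in the same way.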

The MASS has been validated in Brazil [22], the UK [20], Hungary [23], Spain [5], Mexico [24], and China [21]. In general, evidence of internal consistency, test–retest reliability, and construct validity supports its psychometric properties. Reported Cronbach’s α coefficients range from 0.77 [21] to 0.94 [5], and test–retest correlations range from 0.79 [12] to 0.94 [5]. However, previous studies have reported inconsistent results regarding the scale’s factor structure. This discrepancy may stem from differences in sample type [23], differences in the statistical methods used [20], or a combination of both.
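For reference, Cronbach’s α compares the sum of the item variances with the variance of the total score: α = k/(k − 1) · (1 − Σσᵢ²/σₓ²), where k is the number of items. A minimal NumPy sketch, assuming a complete respondents × items matrix with no missing data (the function name is illustrative):

```python
import numpy as np


def cronbach_alpha(items) -> float:
    """Cronbach's alpha for a complete (respondents x items) score matrix."""
    X = np.asarray(items, dtype=float)
    k = X.shape[1]
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = X.var(axis=0, ddof=1)      # sample variance of each item
    total_var = X.sum(axis=1).var(ddof=1)  # sample variance of the summed score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)
```

For example, `cronbach_alpha([[1, 2], [2, 1], [3, 3]])` evaluates to 2/3 (up to floating-point rounding): each item has variance 1 and the total score has variance 3.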

With regard to sample type, four studies have examined the factor structure of the MASS in male weightlifters [5, 21–23], and three of them [5, 21, 23] agreed that the MASS measures the five factors proposed by Mayville et al. [12]. However, in community samples of men (mostly undergraduate students), the structures obtained differed between studies, which identified four factors (injury risk and muscle checking, substance use, dependence on exercise, and muscle satisfaction [24]), three factors (muscle satisfaction, dependence and injury risk, and muscle checking and substance use [23]), or one factor (general MD symptoms [20]).

A limited number of studies [5, 20–24] have evaluated the adequacy of the MASS’s factor structure. Almeida et al. [22] found a 19-item four-factor model in a sample of Brazilian weightlifters. Ryan and Morrison [20] found a six-item one-factor model in a sample of employed adults and undergraduates in England. Babusa et al. [23] found an 18-item five-factor model in Hungarian weightlifters and a 17-item three-factor model in Hungarian undergraduates. González-Martí et al. [5] found a 19-item five-factor model in Spanish weightlifters. López et al. [24] found a 19-item four-factor model in a sample of Mexican undergraduate and high school students. Finally, Jin et al. [21] found a 17-item five-factor model in Chinese weightlifters. All of these studies (except Almeida et al. [22], who did not report the format) used a 5-point response format (1 = strongly disagree; 5 = strongly agree).

Moreover, the work of López et al. [24] was conducted with Mexican students, and we are unaware of any studies that have examined this construct in Mexican bodybuilders.

Given these gaps, the purpose of this study was to analyse the psychometric properties of the MASS in a sample of Mexican bodybuilders. We expected the Mexican version of the MASS to measure the same constructs as the original version.

We examined the internal consistency, test–retest reliability, and convergent validity of the scale, and we performed confirmatory factor analyses (CFA) to test competing models of the latent structure of the MASS, including that of the original version.

Method

Participants

The sample comprised 258 male participants aged 15–57 years (M = 25.63; SD = 7.34), who were recruited from 13 private gyms in northern Mexico City. Most participants reported being single (77%); regarding occupation, most were office workers (58%) or students (36%), and a few (5%) did not provide these data. All participants met the inclusion criteria: they were male bodybuilders who worked out with weights to enhance muscular development and achieve an ideal male physique, and who had spent at least 3 h a week in the gym (M = 10.13; SD = 3.94) for at least 6 months (M = 44.77; SD = 58.80). Bodybuilders who worked out less than 3 h per week were excluded. Because the questionnaires were administered individually and face-to-face, the response rate was 100%.

Measures

We used the MASS as described by Mayville et al. [12], with the 5-point response format proposed by Ryan and Morrison [20]. We also used the Drive for Muscularity Scale (DMS) [3], which assesses attitudes and behaviours related to muscular appearance with 15 items rated on a 6-point response format (1 = always; 6 = never). The items are reverse-coded, so higher scores indicate a stronger drive for muscularity (i.e., more concern about muscularity). The DMS has satisfactory internal consistency (α = 0.84) and convergent validity with the desire for increased muscle mass. In Mexico, Escoto et al. [25] identified a three-factor structure for the DMS, comprising attitudes (α = 0.87), substance intake (α = 0.72), and training adherence (α = 0.68), and verified the suitability of the scale by confirmatory factor analysis. In the present study, internal consistency estimated with McDonald’s omega was satisfactory for the total DMS (ω = 0.91) and for its subscales: attitudes (ω = 0.90), substance intake (ω = 0.85), and training adherence (ω = 0.71).

Procedure

Two independent professional bilingual translators forward-translated the original English version of the MASS into Spanish. These translators and two members of the research group then met to review, reconcile, and harmonize the forward translation. The reconciled forward translation was back-translated into English by two independent bilingual translators. The research group and the forward translators then reviewed the back-translation and compared it with the original English MASS. Next, 28 participants completed the questionnaire and were interviewed about each item and the response options. Overall, the response options “strongly disagree/agree” were changed to “definitely disagree/agree”, and the word “tone” was changed to “definition”.

Ethical approval for the study was obtained from the Ethics Committee of the Autonomous University of the State of Mexico. Participants were invited to take part in a study on body image. All participants signed informed consent forms before testing and were accompanied to a private room in the gym, where testing took place. Testing sessions lasted 20–30 min.

Data analysis

We performed confirmatory factor analyses (CFA) to test the competing models [5, 20–24] of the latent structure of the MASS (including the original version of the scale) using the structural equation modelling program EQS 6.1 [26]. To assess multivariate normality, we used R to calculate Mardia’s normalized kurtosis and skewness coefficients. It has been suggested that maximum likelihood estimation may produce biased results when the normalized Mardia coefficients are significant (p ≤ 0.05). Because the MASS variables were not normally distributed, model fit was examined with the robust (Satorra–Bentler) maximum likelihood method [27], based on the raw data matrix. Goodness-of-fit indicators [28] included χ² and normed χ² (χ²/df), the non-normed fit index (NNFI), the comparative fit index (CFI) [29], the incremental fit index (IFI) [29], and the goodness-of-fit index (GFI) [26]. The cut-off for the NNFI, CFI, IFI, and GFI was ≥0.90, whereas the standardized root mean square residual (SRMR) [30] and the root mean square error of approximation (RMSEA) were required to be <0.08 [31].
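The χ²-based indices and cut-offs described above can be illustrated with a short sketch. It uses the standard formulas RMSEA = √(max(χ² − df, 0)/(df(N − 1))), CFI = 1 − max(χ²ₘ − dfₘ, 0)/max(χ²ₘ − dfₘ, χ²ᵦ − dfᵦ, 0), the Tucker–Lewis form of the NNFI, and Bollen’s IFI, where subscript m denotes the target model and b the independence baseline. With robust estimation one would supply the Satorra–Bentler scaled χ², although software may apply further corrections to robust CFI and RMSEA; SRMR and GFI are omitted because they require the observed and model-implied covariance matrices. The χ² values in the usage line are hypothetical; the degrees of freedom are those implied by a 17-item, five-factor model with uncorrelated errors (153 − 44 = 109) and its independence baseline (153 − 17 = 136).

```python
import math


def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """Chi-square-based fit indices for a target model (m) against a baseline (b)."""
    d_m = max(chi2_m - df_m, 0.0)  # model noncentrality estimate
    d_b = max(chi2_b - df_b, 0.0)  # baseline noncentrality estimate
    rmsea = math.sqrt(d_m / (df_m * (n - 1)))  # uses N - 1, one common convention
    denom = max(d_m, d_b)
    cfi = 1.0 if denom == 0 else 1.0 - d_m / denom
    nnfi = (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1.0)  # Tucker-Lewis index
    ifi = (chi2_b - chi2_m) / (chi2_b - df_m)  # Bollen's incremental fit index
    return {"chi2/df": chi2_m / df_m, "RMSEA": rmsea, "CFI": cfi, "NNFI": nnfi, "IFI": ifi}


def meets_cutoffs(ix, incremental_min=0.90, rmsea_max=0.08):
    """Apply the cut-offs used here: NNFI, CFI, IFI >= 0.90 and RMSEA < 0.08."""
    incremental_ok = all(ix[key] >= incremental_min for key in ("NNFI", "CFI", "IFI"))
    return incremental_ok and ix["RMSEA"] < rmsea_max


ix = fit_indices(chi2_m=200.0, df_m=109, chi2_b=1500.0, df_b=136, n=258)
```

With these hypothetical inputs the CFI is about 0.93 and the RMSEA about 0.057, so `meets_cutoffs(ix)` is `True`.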

SPSS (Statistical Package for the Social Sciences) version 19.0 was used for two analyses. Temporal stability over a 2-week test–retest interval was assessed with a two-way mixed, single-measure intraclass correlation coefficient (ICC(3,1)), and convergent validity was assessed with Pearson’s correlations between the MASS and the DMS and between the MASS and training frequency. Internal consistency was estimated with McDonald’s omega [32] in the free software R.
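The ICC(3,1) used for test–retest stability can be computed from a two-way ANOVA decomposition, as in the classical Shrout–Fleiss formulation: ICC(3,1) = (MS_R − MS_E)/[MS_R + (k − 1)MS_E], where MS_R is the between-participants mean square, MS_E the residual mean square, and k the number of occasions. The sketch assumes the consistency definition of the coefficient and a complete participants × occasions matrix; the function name is illustrative.

```python
import numpy as np


def icc_3_1(scores) -> float:
    """ICC(3,1): two-way mixed, consistency, single measure."""
    Y = np.asarray(scores, dtype=float)  # rows = participants, columns = occasions
    n, k = Y.shape
    grand = Y.mean()
    ss_rows = k * ((Y.mean(axis=1) - grand) ** 2).sum()  # between participants
    ss_cols = n * ((Y.mean(axis=0) - grand) ** 2).sum()  # between occasions
    ss_total = ((Y - grand) ** 2).sum()
    ss_error = ss_total - ss_rows - ss_cols               # residual
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
```

Because this coefficient measures consistency, a uniform shift between test and retest (for example, every score rising by 2 points) still yields a value of 1.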

Results

Sociodemographic data are shown in Table 1. Most participants were office workers under 22 years old, single, with high school education or lower.

Table 1 Sociodemographic data for the sample

Confirmatory factor analysis

Before conducting the structural equation modelling analyses, we tested multivariate normality. The normalized Mardia coefficients (11.32–13.27 for kurtosis and 49.30–64.87 for skewness) indicated that the data for each model deviated from multivariate normality (p ≤ 0.05), except for Ryan and Morrison’s model (normalized Mardia kurtosis coefficient = 0.13, p = 0.89).
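Mardia’s coefficients can be computed directly from the data matrix. With centred data and the maximum likelihood covariance S, multivariate skewness is b₁ = n⁻² ΣᵢΣⱼ dᵢⱼ³ and kurtosis is b₂ = n⁻¹ Σᵢ dᵢᵢ², where dᵢⱼ = (xᵢ − x̄)ᵀS⁻¹(xⱼ − x̄). Under multivariate normality, nb₁/6 is approximately χ² with p(p + 1)(p + 2)/6 degrees of freedom, and the normalized kurtosis z = [b₂ − p(p + 2)]/√(8p(p + 2)/n) is approximately standard normal. The NumPy sketch below is a textbook version; whether it reproduces exactly the normalized values produced by the R software used in the study is an assumption.

```python
import math

import numpy as np


def mardia(X):
    """Mardia's multivariate skewness and kurtosis for an (n x p) data matrix."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    centered = X - X.mean(axis=0)
    S = centered.T @ centered / n                   # ML covariance (divides by n)
    D = centered @ np.linalg.inv(S) @ centered.T    # D[i, j] = d_ij as defined above
    b1 = (D ** 3).sum() / n ** 2                    # multivariate skewness
    b2 = (np.diag(D) ** 2).sum() / n                # multivariate kurtosis
    kurt_z = (b2 - p * (p + 2)) / math.sqrt(8.0 * p * (p + 2) / n)
    return {
        "b1": b1,
        "b2": b2,
        "skew_stat": n * b1 / 6.0,                  # approx. chi-square under normality
        "skew_df": p * (p + 1) * (p + 2) / 6,
        "kurt_z": kurt_z,                           # approx. N(0, 1) under normality
        "kurt_p": math.erfc(abs(kurt_z) / math.sqrt(2.0)),  # two-sided p-value
    }
```

Applied to simulated heavy-tailed data (e.g., `np.random.default_rng(0).standard_t(3, size=(500, 4))`), `kurt_z` is typically far above 1.96, the situation in which a robust estimator such as Satorra–Bentler is preferred over plain maximum likelihood.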

Table 2 summarizes and compares the model fit indices obtained with each estimation technique.

Table 2 Comparison of fit indices of confirmatory factor analysis for five models

For all six models, robust estimates and standard errors were used, assuming no multicollinearity among the variables. The models of Mayville et al. [12], Ryan and Morrison [20], and Jin et al. [21] showed acceptable fit to the data, clearly better than that of the remaining models. Although all three models have adequate goodness-of-fit indices, Ryan and Morrison’s model [20] does not, from a theoretical perspective, cover all dimensions of the construct. Finally, Jin et al.’s model [21] showed better goodness-of-fit indices than Mayville et al.’s model [12]; therefore, the 17-item model of Jin et al. [21] fits the Mexican sample best (Table 3).

Table 3 Descriptive statistics and comparative standardized factor loadings for the Mexican version of the MASS

In Jin et al.’s model [21], all item loadings on their factors were significant [M = 0.62; range from 0.47 (item 10, injury risk) to 0.83 (item 19, muscle checking)]. The muscle checking factor had the highest mean within-factor loading (0.72), whereas the substance use and bodybuilding dependence factors had the lowest (0.61). Correlations between the five MASS factors (Table 4) ranged from 0.01 (substance use and muscle dissatisfaction) to 0.65 (substance use and bodybuilding dependence).

Table 4 McDonald’s omega, test–retest intraclass correlations, and factor correlations for the MASS

Reliability of the MASS

The MASS and its subscales showed acceptable reliability, with all omega values >0.77 [33]; McDonald’s omega for the total scale was good (ω = 0.88). Test–retest reliability was estimated with intraclass correlation coefficients in a subsample of 38 participants (Table 4). The scale and subscale scores showed acceptable to good test–retest reliability (ICC = 0.75–0.91).
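For a single subscale, omega can be computed from a one-factor solution as ω = (Σλᵢ)²/[(Σλᵢ)² + Σθᵢ], where λᵢ are the factor loadings and θᵢ the error variances. The sketch assumes standardized loadings and uncorrelated errors, so θᵢ = 1 − λᵢ² unless error variances are supplied; the loadings in the example are hypothetical, not the values in Table 3, and the R routine used in the study may estimate omega from a different model.

```python
def mcdonald_omega(loadings, error_variances=None) -> float:
    """Omega total for a single-factor (congeneric) model with uncorrelated errors."""
    lam = [float(x) for x in loadings]
    if error_variances is None:
        # With standardized loadings, each item's unique variance is 1 - loading^2.
        error_variances = [1.0 - x * x for x in lam]
    common = sum(lam) ** 2  # variance attributable to the common factor
    return common / (common + sum(error_variances))
```

For example, four items that each load 0.70 give ω = 7.84/(7.84 + 2.04) ≈ 0.79.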

Convergent validity

MASS total and subscale scores were correlated with those of the DMS (Table 5). These associations were positive and significant, except for that between muscle dissatisfaction-MASS and substance intake-DMS. Notably, the correlations of muscle dissatisfaction-MASS with total-DMS and with training adherence-DMS were weak. We also examined the correlations of the MASS and its subscales with training frequency (hours of training per week). Training frequency was positively and significantly correlated with the MASS total score and its subscales, except for muscle dissatisfaction.

Table 5 Correlations of the MASS with DMS

Discussion

This study aimed to analyse the psychometric properties of the MASS in a sample of Mexican male bodybuilders by means of confirmatory factor analysis and evaluation of internal consistency, test–retest reliability, and convergent validity. Our confirmatory factor analysis supports three models: the one-factor model [20] and the five-factor models [12, 21]. We believe that the five-factor structure proposed by Jin et al. [21] best characterizes the construct. Accordingly, the scale could be used to distinguish five dimensions that have been linked to MD: muscle checking (items 3, 11, 18, and 19), muscle satisfaction (items 1, 4, and 14), substance use (items 5, 6, and 9), injury risk (items 10, 13, and 16), and bodybuilding dependence (items 2, 7, 8, and 12).

As in the study by Jin et al. [21], the 17-item model proved to be the most adequate for the Mexican version of the scale. Therefore, despite minor differences in the number of items per factor, our results support the five-factor structure proposed by Mayville et al. [12] for the MASS [5, 21, 23]. A previous study of the Mexican version of the MASS included high school and college students and identified four factors [24]; however, López et al. [24] did not report their criteria for factor retention and did not perform a CFA. In contrast, the present study included bodybuilders, and CFA supported the five-factor structure confirmed in other studies [5, 12, 21, 23].

In our study, the internal consistency of the MASS was similar to that reported among male weightlifters in most previous studies [12, 13, 15, 21, 23, 34] but lower than that estimated by González-Martí et al. [5]. For all subscales except bodybuilding dependence and injury risk, internal consistency was similar to that reported by Mayville et al. [12], Jin et al. [21], and Babusa et al. [23] but lower than that reported by González-Martí et al. [5]. The test–retest reliability coefficients were similar to those reported previously [5]; therefore, the MASS is generally stable when re-administered after 2 weeks.

To examine convergent validity, we correlated the MASS total and subscale scores with those of the DMS. In general, these associations were significant, indicating good convergent validity. However, the correlations of the muscle dissatisfaction-MASS subscale with the DMS total score and the training adherence-DMS subscale were weaker, although still significant, and the association between muscle dissatisfaction-MASS and substance intake-DMS was non-significant. Overall, DMS and MASS scores were related, suggesting similarity between the two measures and supporting the construct validity of the MASS. This result is congruent with the findings of a previous study [5], in which the drive for muscularity significantly predicted MD symptoms.

In addition, training frequency was positively and significantly correlated with the MASS total score and its subscales, except for muscle dissatisfaction. This finding is consistent with previous studies [35] indicating that several unhealthy behaviours (e.g., substance use) are linked to MD, the most common being weight training to gain muscle mass.

Our results indicate that the MASS is a robust and reliable measure of MD symptoms. However, as in previous studies [12, 18, 21–23], the muscle dissatisfaction-MASS subscale showed weak (or absent) correlations with some measures of exercise behaviour (e.g., bodybuilding dependence-MASS and training adherence-DMS) and of substance use to increase muscle mass (e.g., substance use-MASS and substance intake-DMS). Moreover, in a previous study [13], muscle dissatisfaction was the only MASS subscale that did not differ significantly between bodybuilders and weightlifters or among weightlifters grouped by risk level (low, moderate, or high). This subscale was also not significantly associated with meeting the diagnostic criteria for MD [34]. Therefore, future research should determine whether these findings are attributable to the scale itself, because other measures have shown a significant ability to differentiate between bodybuilders and non-bodybuilders [36, 37] and between men diagnosed with MD and typical gym users [38].

Our data revealed the following five important aspects: (1) The best model included 17 items grouped into five factors. (2) At the item level, the muscle checking factor showed the highest mean within-factor loading and the highest temporal stability. (3) The correlations between factors were all below 0.70, indicating a lack of overlap among them [39]. (4) The muscle dissatisfaction factor was the least correlated with the other factors, particularly with substance use and injury risk. (5) The bodybuilding dependence factor had the strongest correlations, mainly with substance use and injury risk.

This research has some limitations. First, our reliance on a non-probabilistic sample limits the generalizability of the present results. Second, we used only a small number of additional measures to investigate the construct validity of the MASS. Future studies should examine the relationship between the MASS and other individual variables, such as self-esteem, social desirability, exercise dependence, attitudes towards steroid use, perfectionism, preoccupation with food, and fat-free mass index. Future studies should also examine whether widely accepted theories (e.g., objectification, cognitive behavioural, and sociocultural theories) are useful predictors of MD symptomatology. To further support the validity of the MASS, researchers could administer it to women to test whether men are more preoccupied with muscularity. The stability of the MASS should also be evaluated over longer periods. Third, all participants completed the questionnaires in the same order, so order effects were not controlled; future studies should counterbalance the order of administration. Despite these limitations, this evaluation of psychometric properties supports the use of the 17-item, five-factor MASS to measure MD symptoms in Mexican male bodybuilders.