Introduction

The incidence of skin cancers has reached epidemic proportions in the United States; more than 1 million cases of basal and squamous cell carcinoma and 62,190 cases of malignant melanoma are expected to be diagnosed in 2006, resulting in a total of 10,710 deaths (7,910 from melanoma [1]). Research suggests that ultraviolet (UV) radiation through sun and sunbed/sunlamp exposure is a central risk factor for the development of skin cancers, but risk can be reduced through less UV exposure and sun protection [2]. A consistent finding in the literature is that a central reason for deliberate UV exposure is the positive effect exposure has on skin appearance [3–6]. Corresponding to this association, intervention research in the area has clearly shown that manipulation of appearance attitudes results in beneficial effects on these attitudes, intentions, and behaviors to tan less and sun protect more [7–10]. Despite the relevance of appearance for tanning and sun protection, little research has addressed multiple construct domains related to a tan appearance. Our previous studies in this area have demonstrated the multidimensional nature of appearance motives related to a tan appearance [11] and how these appearance constructs may be related to UV exposure and sun-protection intentions/behaviors in a series of mediational models [12]. In the current paper, we extend our work designed to uncover and quantify appearance-based motives to UV exposure and protection in several ways: test hypothesized models of factor structure, examine gender differences in these factor models, and evaluate the internal consistency, temporal stability, and criterion-related validity of the identified factors.

Appearance constructs are central to the development of theoretical models of UV exposure and protection behaviors. Previous research suggests that one of the best predictors of UV exposure behaviors and intentions is a positive attitude related to a tan appearance [3–6]. However, research in this area has been limited because of a focus on appearance constructs that are broad or confounded with indicators that are not appearance-based. In addition, for many of the measures, there is only selective evidence of reliability and validity (e.g., rarely evidence based on factor analysis). Cafri et al. [11] found, through factor analytic methods, that there are appearance reasons for pursuing a tan that are broad (e.g., general appearance enhancement) and specific (including reducing the appearance of acne and enhancing the appearance of body shape), an appearance reason for not pursuing a tan (skin aging), and sociocultural influences to tan (i.e., media, peer, and family), each of which is significantly related to UV risk and protective intentions. Given that a growing literature supports the efficacy of appearance-based interventions [7–10], identification of these more specific constructs and their interrelations affords researchers the opportunity to develop more refined models of risk that can be used to better identify intervention targets. This is particularly important because the above-referenced interventions target only a single factor related to appearance, the negative appearance effects resulting from skin aging. Moreover, as new appearance-based intervention targets are tested, scales that have strong evidence of validity and reliability can be used to precisely investigate mechanisms of change.

Although previous research identified several relevant appearance attitudes based on exploratory and confirmatory factor analytic methods [11], it did not consult the target population when constructing items, resulting in possible exclusion of important factors and item content [13], and it included only females in its sample. Although previous studies have found that females are more likely than males to use indoor tanning salons and engage in outdoor tanning behaviors [14–16], males, as a group, engage in tanning behaviors at levels that place them at risk for skin cancer, a risk that is increased when considering that their sun-protective practices tend to be less frequent than those of their female counterparts [17, 18]. Therefore, from a public health perspective, there is ample reason for research to be inclusive of both genders. Moreover, recent research suggests that males increasingly value their appearance and, as a result, engage in risky health behaviors [19]. Given that a tan is generally viewed as attractive, it is likely that appearance motives to tan are salient among males. Consequently, it is important both to include both genders in an examination of appearance motives related to UV exposure and protection and to examine possible gender differences.

The first goal of this study is to identify a valid factor structure within each gender. To achieve this goal, we use items from a previous study in the area [11], develop additional items based on focus groups, and test a hypothesized factor structure separately for males and females. Although these methods are necessary to establish construct validity, it is possible that the constructs will not be comparable across genders. To determine if this form of construct validity exists, it is necessary to test for measurement invariance, the second goal of this study. If certain levels of measurement invariance can be established, it is also possible to test for latent mean differences between males and females on the factors, which is the third goal. The fourth goal is to provide evidence of internal consistency, temporal stability, and criterion-related validity of the identified factors.

Method

Participants

Participants were 589 female and 335 male students from the University of South Florida. The inclusion criterion was being between 18 and 26 years of age. Participants with skin types V and VI [20] were excluded from the study because people with brown and black skin types are at a significantly reduced risk for developing skin cancer [21]. Participants were recruited from introductory psychology classes and given course credit for their participation. The study was completed online. All female data were collected October–November, 2005. The male data were collected during two time periods, October–November, 2005 (n = 164) and February–April, 2007 (n = 171), but using identical data collection methods. Given the differences between the two samples in terms of season and year in which the data were collected, it is necessary to test whether the groups are equal before aggregating the data. To test for group equality, the covariance matrices and means of all items were constrained equal across the two samples using the estimation methods and tests described in the planned analyses. The result was χ²(1,034) = 1,511.96, p < 0.05, RMSEA = 0.037 (90% CI 0.033, 0.041), CFI = 0.96, NNFI = 0.93, which suggests group equality. Consequently, data from the two time periods were aggregated. The female sample had a mean age of 19.88 (SD = 1.84), and the male sample had a mean age of 20.06 (SD = 1.89). The racial distribution of female participants was 80% White/Caucasian, 5% Black/African American, 3% Asian/Pacific Islander, <1% American Indian/Alaska Native, and 11% Other, and that of male participants was 78% White/Caucasian, 4% Black/African American, 5% Asian/Pacific Islander, 1% American Indian/Alaska Native, and 12% Other. Frequencies of participant skin types [21] for females were: 12% type I, 26% type II, 36% type III, and 26% type IV, and for males: 9% type I, 27% type II, 36% type III, and 28% type IV.
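As a rough check on the group-equality result above, the reported RMSEA can be approximately recovered from the chi-square value, its degrees of freedom, and the combined male sample size (164 + 171 = 335). The sketch below assumes the common single-sample formula, RMSEA = sqrt(max(χ² − df, 0) / (df × (N − 1))). Software packages differ in the exact sample-size term they use for multi-group models, so this is an illustration rather than a reproduction of the AMOS computation, and the function name is hypothetical.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """Point estimate of RMSEA under the single-sample formula (an assumption here)."""
    # Misfit beyond what is expected by chance, floored at zero.
    excess = max(chi_square - df, 0.0)
    return math.sqrt(excess / (df * (n - 1)))


# Values reported for the test of equality between the two male samples.
estimate = rmsea(chi_square=1511.96, df=1034, n=164 + 171)
print(round(estimate, 3))
```

Under this assumed formula the estimate agrees with the reported RMSEA to three decimal places.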

Measures

Appearance Factors

The item content representing the appearance factors can be found in the Appendix. Some of the items used were developed in a previous study [11]. The remaining items were developed based on focus groups. Participants of the focus groups were 18 students from the University of South Florida that closely resembled participants in the primary sample in demographic characteristics. Four focus groups and two individual interviews were conducted according to the methods prescribed by Vogt et al. [13]. The result of the focus groups and interviews was the generation of item content for some factors that had not been previously identified and adding item content to some already existing factors. Some items were also deleted based on redundancy with other items. The result was 44 total items.

UV Intention Outcome Variables

Intentions to behave were used to evaluate criterion-related validity because theory [22] and specific modeling efforts in this area [3] identify intentions as more proximal to attitudinal variables than behaviors.

Tanning Salon Intentions

Participants were asked to provide a 1-year estimate of times they plan to go indoor tanning (“Please give me your best estimate of how many times you PLAN TO use an indoor tanning salon in the NEXT 12 months” [10]). Participants respond to the item by checking the box that best approximates the range of times they intend to indoor tan (0, 1–10, 11–20, 21–30, etc.). Test–retest reliability of this item over a 7- to 10-day period was adequate in a previous sample of females (r = 0.90 [12]).

Future Sunbathing Intentions

Participants were asked to provide a 1-year estimate of times they plan to go sunbathing (“Please give me your best estimate of how many times you PLAN TO sunbathe in the NEXT 12 months” [12]). The scoring method is identical to that used for the tanning salon intentions item. Test–retest reliability of this item over a 7- to 10-day period was adequate in a previous sample of females (r = 0.88 [12]).

Sun-Protection Intentions

This is a measure modeled after the sun-protection behaviors scale [3], but with an adapted item wording that asks about intentions to protect by using sunscreen on the face, sunscreen on the body, hat use, sitting in the shade, and wearing additional clothing (e.g., item, “In the future, how often do you intend to use sunscreen with sun protection factor (SPF) 15 or higher on your face when you were in the sun?” [12]). The measure has shown adequate internal consistency (Cronbach’s α = 0.70) and test–retest reliability (r = .87) over a 7- to 10-day period in a previous study of females [12].

Planned Analyses

Study hypotheses were evaluated with AMOS 6.0 using structural equation modeling with maximum likelihood estimates of model parameters. Preliminary analyses included screening for outliers and examining the assumption of multivariate normality. There was evidence of multivariate non-normality in both male and female samples for the items used in the measurement models. Based on simulation studies [23, 24], the impact of non-normality on the χ² statistic, fit indexes, and estimation of SEs of parameters in the models tested in this study was deemed to be small (the largest absolute value of univariate skewness for any individual item for either gender was 1.30, and of kurtosis 1.31).

Confirmatory factor analysis (CFA) models were evaluated within each gender to determine the factor structure of the items. Given the relatively large number of items and hypothesized factors, estimating a single CFA model for each gender would be problematic because such models are unlikely to yield adequate fit, resulting in a search for misspecification that is extremely difficult [25]. Fitting a series of smaller models instead of a single model with all items/factors present can also cause problems, such as obscuring spurious or suppressor relations [25]. For these reasons, the jigsaw piecewise technique [25] was applied to examining the factor structure of the items. In this approach, models with a smaller number of items/factors are estimated separately, then brought together in a single model. In addition to examining model fit for the smaller models and the overall model, parameter shifts from the smaller to larger model are examined for evidence of misspecification. In the current application, the smaller models were estimated based on grouping items/factors in the context of three higher-order factors: appearance reasons to tan, appearance reasons not to tan, and sociocultural influences to tan. The a priori decision to evaluate factor structures in this way was based on previous theory in the body image field distinguishing perceived sociocultural influences from individual body image attitudes [26], a previous study indicating small nonsignificant correlations between appearance reasons to tan and appearance reasons not to tan [11], and interventions in the area differentially targeting these appearance motives.

After establishing models that fit well within each gender, the goal was to test for measurement invariance [27]. Several kinds of invariance can be identified in CFA models. For instance, equality of factor loadings across groups offers evidence that the unit of measurement for the underlying factors is the same across groups, that is, the factors have the same meaning for males and females [27]. It should be noted that in a first-order factor model, there is only one set of factor loadings, whereas in a higher-order factor model (i.e., a second-order model) there are both first- and second-order factor loadings [28, 29]. In the present study, there is a higher-order factor structure (viz. there are three higher-order factors that are inter-correlated); therefore, it is necessary to show that both types of factor loadings are invariant. This form of invariance is often referred to as “weak” invariance [30], which contrary to its title is of central interest from a construct validity perspective [31] and is a necessary precondition for establishing stricter forms of invariance. One stricter form of invariance is “strong” invariance [30], which is satisfied if the intercepts can be shown to be invariant. If this form of invariance exists, it suggests that the origins of the scales are equal across groups, a necessary condition for testing latent mean differences [27], one of the goals of this study. In a first-order factor model, it is only necessary that intercepts of measured variables are invariant, whereas to estimate mean differences of a higher-order factor, it is necessary that intercepts of the first-order factors be invariant as well [28, 29]. Finally, when tests of invariance suggest that the groups are not invariant on the parameters of interest, some argue that it is possible to identify a model that generalizes to both groups by isolating the source of the difference and allowing these parameters to be freely estimated in each group [32]. We apply this approach when appropriate, but consider its limitations in the discussion.

Several indices were used to evaluate the structural equation models. When the aim was to evaluate an individual model for fit, the root mean square error of approximation (RMSEA), comparative fit index (CFI), and the non-normed fit index (NNFI) were used because of their relatively good performance in simulation studies [33–35]. Several cut-off values were used to judge model-data fit: RMSEA < 0.05 suggested good fit, 0.05–0.08 suggested marginal fit, and >0.10 suggested questionable fit, and a CFI > 0.95 indicated good fit [33–35]. When testing for invariance of parameters, a series of hierarchically nested models are compared, moving progressively from an unconstrained model to more constrained models. The likelihood ratio test (i.e., Δχ²; differences between chi-square values) can be used to test for invariance, with nonsignificant increases in χ² suggesting the presence of invariance. Because this test is sensitive to sample size and non-normality [25], some have argued for the use of goodness of fit indices [36]. Based on simulation results of CFA models, one fit index criterion proposed is that CFI values should not decrease by more than 0.01 [36]. Because Δχ² and ΔCFI are the most widely researched [36, 37] and used indices for judging invariance in CFA models [28, 29], they are both applied to make determinations of measurement invariance. Specifically, both indices had to exceed their prespecified thresholds for a group of parameters to be judged as non-invariant. Based on the results of simulations [36, 37] and heeding the advice of others in the field [29], we regard ΔCFI as a possibly liberal test and Δχ² as conservative, but also consider the magnitude of ΔCFI in making determinations about the likelihood of invariance.
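The joint decision rule described in this paragraph (a set of parameters is judged non-invariant only when both the chi-square difference test and the CFI decrease exceed their thresholds) can be written as a short sketch. The function and argument names are hypothetical, the significance level is left as a required argument, and the example inputs are made up for illustration rather than taken from the study's tables.

```python
def is_noninvariant(delta_chi2_p: float, cfi_decrease: float,
                    alpha: float, cfi_threshold: float = 0.01) -> bool:
    """Apply the joint rule: flag non-invariance only if BOTH criteria are exceeded.

    delta_chi2_p  -- p value of the chi-square difference (likelihood ratio) test
    cfi_decrease  -- CFI of the less constrained model minus CFI of the more constrained one
    alpha         -- significance level for the chi-square difference test
    cfi_threshold -- largest CFI decrease still treated as consistent with invariance
    """
    chi2_flags = delta_chi2_p < alpha
    cfi_flags = cfi_decrease > cfi_threshold
    return chi2_flags and cfi_flags


# Hypothetical inputs: a significant chi-square difference with a tiny CFI drop
# is still treated as invariant under the joint rule.
print(is_noninvariant(delta_chi2_p=0.001, cfi_decrease=0.001, alpha=0.01))
# Hypothetical inputs: both criteria exceeded, so the constraints are rejected.
print(is_noninvariant(delta_chi2_p=0.001, cfi_decrease=0.020, alpha=0.01))
```

Requiring both criteria makes the combined rule at least as lenient as either index alone, which is why the magnitude of the CFI decrease is also worth inspecting.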

Results

Within Gender Analyses of Factor Structure

Initially, three separate higher-order confirmatory factor analysis models were tested among females. For the purposes of identification, each of the three models had one arbitrarily selected lower-order loading per lower-order factor fixed to a value of one, and one higher-order loading fixed to a value of one. In the first model, the higher-order factor represents appearance reasons for tanning, with three lower-order factors of general appearance enhancement, improvement in the appearance of acne, and improvement in body shape. In the second model, the higher-order factor represents appearance reasons not to tan and two lower-order factors of skin aging and immediate skin damage. The third model has sociocultural influences to tan as a higher-order factor and four lower-order factors of media, family, friends, and significant others. Pictorial representations of these models can be found by focusing on different portions of Fig. 1; the top and right-hand side represents the first model, the bottom represents the second model, and the left-hand side represents the third model. As can be seen from Table 1, the fit statistics suggest that the higher-order factor models adequately fit the data, whereas the unidimensional models (i.e., all items load on a single factor) did not. It should be noted that model 3 with females and males converged, but yielded a solution with a negative error variance for the error term of the friends factor. Based on the simulation and recommendations of Chen et al. [38], the source of this result was determined to be an error variance sufficiently close to 0 that sampling variability led to a negative estimate (constraining the error variance to 0 yielded: χ²(1) = 1.58, p > 0.05). This led to re-specification of the model by constraining the variance for the error term to 0. Subsequent models included this constraint.

Fig. 1 Confirmatory factor analysis model. Note: The standardized estimates are presented above for females (bold) and males (italics). All factor loadings are statistically significant at the 0.05 level

Table 1 Within gender analyses of CFA model fit

When all three of the models were incorporated into a single model (see Fig. 1), the result suggests adequate fit, in contrast to the alternative unidimensional models that did not (Table 1). Moreover, comparing the smaller models with the larger model in terms of standardized lower- and higher-order factor loadings suggests small shifts in the parameter estimates when the absolute value of the differences were aggregated across the parameters, M = .008 (SD = 0.017). The largest outlying differences were the loadings of the general and body shape factors on the higher-order factor of appearance reasons to tan, with an increase of 0.07 units and a decrease of 0.08 units (from the smaller to the larger model), respectively. The relatively small shift in parameter estimates suggests that no major misspecification is present. Furthermore, a test of statistical independence was conducted [39] to evaluate the distinctiveness of the sociocultural influences to tan and appearance reasons to tan factors. Applying single-factor constraints resulted in: χ²(894) = 2,330.77, which when compared to the model without the constraints (fourth entry in Table 1), was significant, Δχ²(df = 2) = 38.61, p < .05. This suggests that the single-factor constraints are unreasonable, and two higher-order factors fit the data better than one. Collectively, the results suggest that the aggregated model fits adequately among females.

Identical analyses were applied to studying the factor structure among males. As can be seen from Table 1, the fit statistics suggest that the hypothesized higher-order factor models adequately fit the data, whereas the unidimensional models did not. The exception to this generalization is model 2, for which the unidimensional model did have adequate fit, but the hypothesized higher-order model fit better. When all three of the models were incorporated into a single model, the result suggests adequate fit, in contrast to the alternative unidimensional models that did not (Table 1). As with females, model 3 yielded a solution with a negative error variance for the error term of the friends factor, which was determined to be an error variance close to 0 (constraining the error variance to 0 yielded: χ²(1) = 0.85, p > 0.05). This led to re-specification of the model by constraining the variance for the error term to 0. Subsequent models included this constraint.

As with females, small shifts in the parameter estimates were observed when comparing parameter estimates from the smaller models to the aggregated model, with the absolute value of the differences yielding little change, M = .006 (SD = .009). The largest difference was the loading of the friend factor on the higher-order factor of appearance reasons to tan, with a decrease of 0.03 units (from the smaller to the larger model). The relatively small shift in parameter estimates suggests that no major misspecification is present. Employing the test of statistical independence described above yielded χ²(894) = 2,063.92, p < .05 for a model with single-factor constraints. When compared to the model without the constraints (eighth entry in Table 1), the result was significant, Δχ²(df = 2) = 53.26, p < 0.05, suggesting that the two higher-order factors fit the data better than one. Collectively, the results suggest that the aggregated model fits adequately among males.
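The two nested-model comparisons reported for females and males both use a chi-square difference with 2 degrees of freedom. For exactly 2 degrees of freedom the upper-tail probability has the closed form p = exp(−x/2), so the reported significance can be checked without a statistics library. The sketch below is only an illustration of that arithmetic; the function name is hypothetical.

```python
import math


def chi2_sf_df2(x: float) -> float:
    """Upper-tail probability of a chi-square variate with exactly 2 degrees of freedom."""
    # A chi-square distribution with 2 df is an exponential distribution with mean 2.
    return math.exp(-x / 2.0)


# Chi-square differences reported for the single-factor constraint tests.
for label, delta in (("females", 38.61), ("males", 53.26)):
    p = chi2_sf_df2(delta)
    print(f"{label}: delta chi-square = {delta}, p = {p:.2e}, significant at .05: {p < 0.05}")
```

Both differences fall far in the upper tail, consistent with the reported p < .05 for each gender.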

Measurement Invariance

The next aim is to test for measurement invariance using the aggregated models for males and females. All previously applied constraints are included in tests of measurement invariance. The threshold for measurement invariance is ΔCFI = 0.01 and Δχ² with α = 0.01 (Bonferroni adjustment for primary model comparisons; 0.05/5). It is necessary to first evaluate the extent to which the model fits across groups without imposing any cross-group constraints (model 1, Table 2). This model fit adequately; therefore, the next model (model 2) tested whether constraining the first-order factor loadings equal across groups was feasible. Although the Δχ² criterion suggested that this was not a reasonable constraint, ΔCFI was not greater than 0.01 and was approximately ten times less than this value; therefore, it was concluded that the first-order factor loadings were invariant. When the second-order loadings, in addition to the first-order loadings, were constrained equal across groups, both Δχ² and ΔCFI suggested the presence of invariance (model 3). Collectively, these results imply that the unit of measurement for the underlying factors is the same across groups. Next, the intercepts of the measured variables were constrained equal, in addition to the aforementioned constraints (model 4). The result suggests that this constraint did not hold; Δχ² and ΔCFI both exceeded their prespecified thresholds. Consequently, we sought to identify a model that is partially invariant [28]. One approach to identifying such a model is examining modification indices for improvement in fit resulting from releasing the group equality constraints on specific items [28]. Six intercept equality constraints were relaxed based on modification indices (i.e., a univariate Lagrange Multiplier type test), which indicated that χ² would decrease significantly by releasing each constraint. The intercept parameters corresponded to items 1, 6, 9, 16, 23, and 24. Retrospective examination of the content of these items suggests that they target the appearance concerns of females more than males; therefore, the modifications seemed reasonable from a theory perspective. Comparing this partially invariant intercept model (model 5) to the model with only the first- and second-order factor loadings constrained equal (model 3) suggested the presence of invariance based on the ΔCFI criterion but not Δχ², which again is interpreted as being invariant by the criteria set forth in this paper. Constraining the intercepts of the lower-order factors equal across groups (model 6) suggested the presence of invariance. Consequently, a model that can be compared across groups with respect to its latent means was identified.

Table 2 Fit statistics related to tests of measurement invariance and latent mean differences

Internal Consistency, Temporal Stability, and Relationship with Outcome Variables

Using unit weights for individual items, the internal consistency of each of the lower-order factors was calculated (Table 3). The internal consistency ranged from 0.73 to 0.96. Using an independent sample of 14 males and 80 females, the temporal stability of the lower- and higher-order factors was evaluated by measuring individuals twice over the course of 1 week. The resulting reliability coefficients ranged from 0.74 to 0.92. Examining the relationship between the factors and the three intention variables (Table 4) suggests the presence of criterion-related validity.
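For readers unfamiliar with how a unit-weighted internal consistency estimate is obtained, the sketch below computes Cronbach's α, the coefficient named in the Measures section, from raw item scores. It assumes that this is the coefficient reported in Table 3; the function name and the responses are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha, where items[i][j] is respondent j's score on item i."""
    k = len(items)
    # Unit-weighted total score for each respondent.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses (3 items, 5 respondents), for illustration only.
example = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
print(round(cronbach_alpha(example), 2))
```

The coefficient approaches 1 as the items covary more strongly relative to their individual variances.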

Table 3 Descriptive statistics and reliability estimates by gender
Table 4 Factor and criterion-related correlations for females and males

Latent Mean Differences

The specifications for tests of latent mean differences of the higher-order factors were the same as those implemented in other studies [28, 29]. In this model, males were chosen as the reference group, therefore their higher-order latent means were set to zero, while the latent means for females were freely estimated. The result is the difference between the factor means of the two groups. Tests of latent mean differences of the second-order factors are based on α = 0.0167 (0.05/3 tests of mean difference). Using the partially invariant measurement model previously identified, the results suggest that females score significantly higher than males on sociocultural influences to tan, mean difference = 0.16, z = 3.52, p < 0.001, appearance reasons to tan, mean difference = 0.30, z = 4.41, p < 0.001, and appearance reasons not to tan, mean difference = 0.57, z = 8.34, p < 0.001.
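The comparison of each reported z statistic with the adjusted significance level (0.05/3) can be illustrated with a short calculation. The sketch below converts each z value to a two-sided p value using the standard normal distribution; treating the reported tests as two-sided is an assumption, and the function and dictionary names are hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard normal test statistic."""
    # 2 * (1 - Phi(|z|)) expressed through the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))


alpha = 0.05 / 3  # adjusted level for the three higher-order mean comparisons

# z statistics reported for the latent mean differences (females minus males).
reported_z = {
    "sociocultural influences to tan": 3.52,
    "appearance reasons to tan": 4.41,
    "appearance reasons not to tan": 8.34,
}

for factor, z in reported_z.items():
    p = two_sided_p(z)
    print(f"{factor}: z = {z}, p = {p:.1e}, below adjusted alpha: {p < alpha}")
```

All three p values fall below the adjusted level, in line with the reported p < 0.001 for each comparison.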

Discussion

Confirmatory factor analysis supported the validity of the factor structure within each gender, and tests of measurement invariance suggested that this factor structure is gender invariant. Estimates of internal consistency, temporal stability, and correlations with relevant intention outcomes further supported the reliability and validity of the measures of the factors. Tests of latent mean differences indicated that females had significantly higher means than males on all higher-order factors. In the sections that follow, the implications of the findings are considered in terms of how the statistical methods used in this paper can improve behavioral medicine research generally and how the specific findings contribute substantively to research on skin cancer prevention.

Factor Structure

The importance of factor analysis in evaluating the construct validity of scales assessing psychological constructs is widely accepted [41]. Less well known, however, are the merits of using factor analysis to test the assumption that two groups (e.g., males vs. females, adolescents vs. adults, etc.) are equivalent on a psychological construct. Indeed, whenever two groups are being compared, or if a single sample consists of a mixture of two groups, it is assumed that the dependent variable is measuring the same construct in both groups; otherwise, the results are not meaningful [40]. The merits of evaluating group equivalence of psychological constructs through tests of measurement invariance are clear; however, few studies in behavioral medicine employ such tests. Despite the lack of past studies evaluating the assumption of group equivalence, the increased use of structural equation models, combined with the growth in user-friendly software, will likely serve to increase the number of studies employing tests of measurement invariance.

In the current study, evidence of construct validity from conventional single-group factor analytic methods and tests of measurement invariance suggests that motives to tan and not tan are multidimensional and comparable across gender. It would therefore be prudent to use scales based on the higher- and lower-order factors identified in this study with future mixed-gender samples or studies designed to make comparisons across gender. Particularly lacking in skin cancer prevention research are studies designed to make inferences about risk factors (e.g., longitudinal studies) for increased UV exposure and decreased protection. Given the significant associations between the appearance constructs and these outcomes in the current study, and past research identifying the importance of appearance in tanning and sun protection [3–6], use of the constructs identified in this study to evaluate risk factors would be well worthwhile. With respect to intervention research specifically, the lower-order factors identified in this study can provide direction for the design of novel intervention targets. The necessity for new intervention targets is clear because appearance-based interventions focus almost exclusively on appearance reasons not to tan [7–10]. However, it should be mentioned that one recent multicomponent intervention has targeted a sociocultural reason to tan, media influence, with evidence of program efficacy [42]. In addition, scales based on the lower-order factors may be used to test mechanisms of change of existing and future appearance-based interventions.

Latent Mean Differences

The conventional approach to testing mean differences between two groups on a scale representing a psychological construct is to assign unit weights to the items, calculate a sum for each individual, calculate a mean and variance for each group, then compute a t test. A somewhat better approach is to test for differences based on latent means. The benefit of testing for latent mean differences is that it offers a more valid test of mean differences, by using an invariant measurement model with item weights based on factor loadings and by estimating means in the absence of measurement error [40]. As with measurement invariance, tests of latent mean differences offer an improvement to existing methods but are rarely implemented. For the same reasons stated previously, however, such tests should see increased use in the future.

The tests of latent mean differences conducted in this study offer substantive information about how males and females differ on the appearance constructs. Females were found to have higher scores than males on all higher-order latent factors. The finding that females have higher ratings of sociocultural influences to tan and appearance reasons to tan is consistent with their use of indoor tanning salons and sunbathing at levels greater than males [14–16], while the finding that females had higher ratings on appearance reasons not to tan is consistent with females engaging in more sun-protective practices than males [17, 18]. Therefore, it seems that the gender differences related to UV exposure and protection observed in past studies are consistent with, and may be explained in part by, appearance attitudes.

Limitations and Directions for Future Research

Despite the merits of testing for measurement invariance and latent mean differences, some caveats should be heeded. For instance, with respect to tests of measurement invariance, there are only a few simulations that can be used to inform statistical decisions. Although this limits confidence in the application of the statistical procedures to behavioral medicine research, such limitations are characteristic of many advanced statistical methods. Moreover, it is always possible to re-evaluate the results of previous studies in light of the results of newer simulations. Consequently, the benefits of conducting the kind of statistical procedures used in this paper would seem to outweigh their cost. With respect to the current study specifically, the lack of simulations with higher-order factor models, mean structures, and non-normal data [29] leads to concern about how effective the cut-off criteria were in detecting non-invariant parameters. Another concern is the validity of the latent mean difference test because equality of the origin points of the scales was not found on all items across groups. Although a partial scalar invariant model was identified, partial measurement invariance is somewhat controversial, and the invariant model was obtained through the use of modification indices, which requires cross-validation [28, 29].

Another important limitation of the present study and an area for future research is the need to evaluate the identified factors in more demographically diverse samples of adults and adolescents. For instance, it will be important to test the models developed here with a younger sample, preferably high school students, who have been found to engage in high levels of UV risk behaviors [14–16]. Improved identification and evaluation of factors involved in UV exposure and protection will ultimately lead to more success in the prevention of skin cancer.