Self-reported sun protective and sun exposure behaviors have often been the primary outcome variables of skin cancer prevention research. Lack of consistent use of these outcome measures led to convening a workshop of skin cancer prevention investigators in December 2005, and to the recommendation of standard measures of these behaviors (Glanz et al. 2008). Self-reported measures were needed because skin cancer risk behavior currently lacks a “gold standard” biological marker. While spectrophotometer readings assessing change in skin pigmentation provide a surrogate measure of ultraviolet radiation (UVR) exposure, the lack of precision under field conditions and difficulty interpreting changes in scores makes the technique unreliable (Creech and Mayer 1997; Milne et al. 2001). UV dosimetry, which captures the accumulated UVR dose through the use of film, offers a potential standard for determining personal UVR exposure. In a series of studies, Thieden and colleagues tested a small and relatively easy to use personal dosimeter to explore a variety of research questions including UVR dose ranges across various subgroups, different exposure levels between work days and days off, and different exposure levels between non-risky and risky behaviors (Thieden 2008; Thieden et al. 2001, 2004, 2009). Despite its promise, interpretation of an individual’s UVR dose required information on sun exposure behavior and dosimeter use compliance; thus, linkage to self-reported behavior was still required. In addition, accurate UVR exposure information required the respondent to wear the uncovered dosimeter continuously for the length of the study with special instructions if used when swimming. Dosimeter use could be quite burdensome for respondents in studies examining exposure over long time periods or for certain activities. 
Another criticism was that the devices were not able to record the use of some sun protection behaviors that were often a primary focus of study and intervention (e.g., wearing hats or sunglasses). Lastly, the devices were not widely available and required significant costs to be used with large samples.

There have been only limited attempts to validate measures of sun exposure and protection, despite the fact that the Society for Prevention Research’s Standards of Evidence includes validated measures as one of its principles (Flay et al. 2004). The most common strategies found in the literature for self-reported measures assessing sun exposure have been frequency estimates of minutes, hours or days, and Likert-type rating scales (e.g., never to always). For example, some studies asked for estimates of total sun exposure on an average weekday and weekend day (Diffey and Norridge 2009). Other studies asked respondents to report on sun exposure frequency from a Friday through Monday in order to capture two weekdays and two weekend days, or alternatively to keep a week’s worth of exposure diaries (Dixon et al. 2007; Mahler et al. 2007). If sun exposure is a consistent weekly behavior, the strategy of assessing a few days and extrapolating to longer time periods could prove valid. If, on the other hand, situation-specific factors (e.g., weather patterns, differences due to activities like going to the beach) lead to inconsistent exposure, then frequency estimates across longer periods such as a summer are needed to assess sun exposure.

Sun protection behaviors (e.g., sunscreen use, wearing hats) have typically been assessed using rating scales that measured consistency of use (e.g., always, often, sometimes, rarely, never). These scales typically used only the single words as labels, though sometimes the labels occurred with descriptors such as percentage ranges.

Sunbathing measurement has often used rating scales similar to those used for sun protection behaviors (Kulik et al. 2008; Pettijohn et al. 2009). A few studies have used frequency estimations over a specific time period (e.g., past week, a summer, a lifetime) to estimate sunbathing. Rating scales sacrificed precision in return for ease of use. They were typically analyzed using statistics that assumed interval properties for the scale.

Assessment of unprotected sun exposure, which requires assessing both frequency of exposure and frequency of protection, could not be performed using the measures typically reported in the literature. Rating scales did not account for differences in exposure frequency and, when used alone, could not estimate unprotected exposure. For example, a person who went out in the midday sun 100 times and used protection 50% of the time would experience 50 unprotected exposures (100 total exposures − 50 protected exposures). A second person who went out in the midday sun 10 times and used protection 0% of the time would experience 10 unprotected exposures. Rating scale-based measures may have rated the first person’s sunscreen use as “sometimes,” while the second person may have been rated as “never.” If frequency of unprotected exposure was a critical variable for the development of skin cancer, then this rating scale would not accurately reflect risk, even though the scale may be used to estimate the consistency of protection: the person rated “sometimes” accumulated five times as many unprotected exposures as the person rated “never.” The simplest way to measure unprotected exposure is to ask respondents to estimate their sun exposure frequency and their frequency of sun protection and calculate the difference between these.
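The arithmetic in the two-person example can be written out as a minimal sketch. The function name and the input check are illustrative assumptions, not part of the study's materials.

```python
def unprotected_exposures(total_exposures, protected_exposures):
    """Exposures during which no protection was used (hypothetical helper)."""
    if protected_exposures > total_exposures:
        raise ValueError("protected exposures cannot exceed total exposures")
    return total_exposures - protected_exposures


# Person A: 100 midday exposures, protection used on 50% of them.
person_a = unprotected_exposures(100, 50)
# Person B: 10 midday exposures, protection never used.
person_b = unprotected_exposures(10, 0)

print(person_a, person_b)  # 50 10
```

A consistency rating alone would order these two people the other way round, which is the point the paragraph above makes.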

In the current study, the accuracy of end-of-summer measures was assessed in comparison with weekly diaries of respondents about their sun exposure and sun protection behavior over a summer. The present study also evaluated the accuracy of the recommended standard self-reported measures (Glanz et al. 2008) in comparison with the end-of-summer and weekly measures with special emphasis on the interval properties of the scales and adequacy of quantifying unprotected sun exposure.

Method

Respondents

The study was approved by the East Tennessee State University Institutional Review Board, and all respondents signed informed consent documents before completing study material. Respondents were recruited by randomly sampling email lists of staff and students at public and private, two- and four-year colleges and universities in the Tri-Cities, TN region. Each potential participant received an email containing information about the study and a screening questionnaire about prior history of tanning. Recruitment was carried out in two waves, from January through April of 2007 and of 2008. There were no eligibility requirements other than being 18 years old or older.

Procedure

The validation of weekly diaries required respondents to provide daily electronic reports of their sun exposure and sun protection behaviors for 1 week during the summer via DatStat Illume (DIS) online data management system. The DIS provided a time stamp for the data. Respondents were then assessed at the end of the week on these behaviors using the weekly diary. From Memorial Day until Labor Day, weekly, monthly and end-of-summer assessments were conducted with the main study and control groups. Each week, research assistants contacted those who did not complete the weekly diary and reminded them to complete it. In addition, participants completed an assessment at the end-of-summer that estimated global self-reported sun exposure and protection behavior across the entire summer. Respondents were informed that the study was designed to estimate overall regional sun exposure and sun protection practices. Therefore, they did not know the underlying purpose of the research. Respondents were paid $5 for each weekly diary completed, and $30 for the end-of-summer survey.

A control group, who did not complete weekly surveys across the summer, was included to examine testing effects. The control group estimated their sun exposure, sun protection and sunbathing behaviors at 1-month, 2-month and end-of-summer surveys. It was also possible that monitoring could affect the recall accuracy of past sun exposure and sun protection behaviors. Unambiguous evaluation of such effects was not possible. In order to determine if monitoring had any effect on the accuracy of the sun exposure and protection estimates, the amount of monitoring in one of the control groups was varied. If no effect was seen, then it could be ascertained that completing the weekly diaries did not lead to increased accuracy at the end-of-summer. This control group engaged in identical assessments as the study group with the exception that they assessed their sun exposure and protection behaviors at 50% the level (i.e., biweekly rather than weekly). The study and control group respondents were recruited at the same time from individuals agreeing to participate and they were randomly assigned to one of the three groups (study group with weekly assessment, control group with biweekly assessment and control group with monthly assessment).

Measures

The specific questions for the daily diary included: “Please think carefully about what you did today between 10 AM and 4 PM. How many hours were you outside today between 10 AM and 4 PM? Did you wear sunscreen while you were outside? Did you wear a shirt with sleeves? Did you wear a hat? Did you stay in the shade or under an umbrella to avoid sun exposure? How many hours did you spend sunbathing?” The weekly survey questions were very similar (Table 1).

Table 1 Full text and response options for study measures

The end-of-summer survey assessed sun exposure, sun protection and sunbathing using schemes commonly used to measure these behaviors in the existing literature. These included previously recommended measures (Glanz et al. 2008), as well as items from the Behavioral Risk Factor Surveillance System (BRFSS) (Nelson et al. 2001). Respondents also provided end-of-summer numerical estimates of their sun exposure and sun protection behaviors across the summer using daily and hourly estimates. Subtracting the number of days respondents reported sun protection from the days they reported sun exposure provided an index of unprotected exposure. In addition, a consistency index could be created by dividing the number of days with reported sun protection by the number of days of sun exposure and multiplying by 100 to produce a percentage (i.e., 0–100%). Consistency indices greater than 100 were set to 100. All these survey items can also be found in Table 1. The rating scale and numerical items were separated with filler questions, and the order of presentation was counterbalanced. Careful instructions devised to yield high motivations for truthful responses were employed, and social desirability response tendencies were assessed (Paulhus 1984).
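The two derived indices described above can be sketched as follows. The function names are hypothetical, and two behaviors are assumptions that the text does not specify: returning `None` when no exposure days are reported, and leaving a negative unprotected index unclamped.

```python
def unprotected_index(exposure_days, protected_days):
    """Days of exposure minus days on which protection was reported."""
    # Assumption: a negative result (more protected days reported than
    # exposure days) is returned as-is; the text does not say how it was handled.
    return exposure_days - protected_days


def consistency_index(exposure_days, protected_days):
    """Protected days as a percentage of exposure days, capped at 100."""
    if exposure_days <= 0:
        return None  # Assumption: undefined when no exposure is reported.
    percentage = protected_days / exposure_days * 100
    return min(percentage, 100.0)  # Values above 100 are set to 100.


print(unprotected_index(40, 10))  # 30
print(consistency_index(40, 10))  # 25.0
print(consistency_index(10, 12))  # 100.0
```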

Approach to Statistical Analyses

The response distributions demonstrated significant skewness and were generally leptokurtic with many outliers. For this reason, in addition to traditional statistics, the analyses also used outlier-resistant robust estimators [i.e., 10% trimmed means, a percentage bend correlation coefficient, and a robust bootstrap regression method that could estimate CIs to examine significant differences from 1.0 for the slope and 0.0 for the intercept (Wilcox 2005)]. The use of count-based regression strategies was considered but deemed problematic for several reasons. First, these strategies are neither outlier resistant nor robust to assumption violations. Second, examination of the count distributions revealed that none of them corresponded to a Poisson distribution, a negative binomial distribution, or zero-inflated variants of them (Footnote 1). In short, these methods represent misspecified models. In addition, this study was designed to explore the relationship between behavior from diaries and end-of-summer recall, which ideally should be exactly linear with an intercept of 0.0 and slope of 1.0. Although the outcome variable is a count, traditional count regression models (e.g., Poisson, negative binomial, hurdle models) do not assume linear functions between predictors and outcomes. Such models are, instead, inherently non-linear in nature, with the form of non-linearity dependent upon the particular regression model employed (see Long 1997). Given this, the most reasonable way to approach the data seemed to be methods that are outlier resistant and that make no assumption about the distributions of the counts.

Rating scale measures were evaluated by examining estimates derived from the aggregated diaries at each rating scale value (e.g., how many days of sunbathing did respondents indicating they sunbathe “often” actually report on the diaries). For the sun protection scales, we used the consistency index for comparison. Whether scales satisfied the equal interval assumption was evaluated by examining whether there were equal interval increments between scale responses. We examined whether moving one unit on the scale (e.g., from never to rarely) was equivalent to other one-unit differences (e.g., from rarely to sometimes).
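The equal-interval check amounts to comparing the differences between diary-based means at adjacent scale points. The sketch below uses made-up category means purely for illustration; they are not values from this study.

```python
def adjacent_differences(category_means):
    """Difference in diary-based means between each pair of adjacent scale points."""
    return [b - a for a, b in zip(category_means, category_means[1:])]


def step_spread(category_means):
    """Largest minus smallest one-unit step; 0 would indicate equal intervals."""
    steps = adjacent_differences(category_means)
    return max(steps) - min(steps)


# Hypothetical mean consistency scores for never, rarely, sometimes, often, always.
hypothetical_means = [5, 12, 30, 58, 80]

print(adjacent_differences(hypothetical_means))  # [7, 18, 28, 22]
print(step_spread(hypothetical_means))           # 21
```

In practice, sampling variability means the steps will never be exactly equal, so the comparison is a descriptive one rather than a formal test.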

Frequency measures were evaluated by comparing end-of-summer estimates of days and hours of sun exposure or sun protection to values derived from the aggregated weekly diaries. We reported the traditional mean, which is outlier sensitive, as well as the 10% trimmed mean for weekly diary and end-of-summer measures. In addition, we reported the Pearson correlation between the weekly diaries and the end-of-summer measure, as well as a measure of correlation that was robust to outliers, the percentage bend correlation coefficient (Wilcox 2005). The degree of bias in the end-of-summer measure was calculated using the formula (End-of-Summer/Diary − 1) × 100. This formula gave the percentage by which the end-of-summer measure over- or underestimated the aggregated diary data. In order to test the significance of the differences between end-of-summer measures and diaries, confidence intervals were formed around the mean. All analyses were conducted using SPSS 18.0, or R version 2.13.1 for the robust estimators.
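A minimal sketch of two of these summary statistics follows. It reads the degree-of-bias index as a percentage, (End-of-Summer/Diary − 1) × 100, which is the reading that reproduces the 66% bias reported in the Results for 166.1 h versus 100.1 h. The trimmed mean is assumed to drop 10% of observations from each tail, and the sample data are made up.

```python
def mean(values):
    """Ordinary arithmetic mean (outlier sensitive)."""
    return sum(values) / len(values)


def trimmed_mean(values, trim_percent=10):
    """Mean after dropping trim_percent% of observations from EACH tail (assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    g = n * trim_percent // 100  # number of observations dropped per tail
    kept = ordered[g:n - g]
    return sum(kept) / len(kept)


def percent_bias(end_of_summer, diary):
    """Percentage by which the end-of-summer value over- or underestimates the diary value."""
    return (end_of_summer / diary - 1) * 100


hypothetical_hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # one extreme outlier
print(mean(hypothetical_hours))           # 14.5
print(trimmed_mean(hypothetical_hours))   # 5.5
print(round(percent_bias(166.1, 100.1)))  # 66
```

Because the index divides by the diary value, the same absolute discrepancy yields a much larger percentage when the behavior is infrequent: a discrepancy of 3 is 50% of a mean of 6 but 5% of a mean of 60.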

Individual-level comparisons of end-of-summer and diary measures

Comparisons of central tendencies were not sensitive to lack of individual agreement between end-of-summer and diary estimates. Individuals who underestimated their sun exposure or sun protection behavior can be balanced by individuals who overestimated these behaviors. Good agreement could be achieved in central tendency estimates that mask modest individual agreement. To examine individual agreement, we performed linear regression analyses predicting end-of-summer estimates from aggregated diary measures. Perfect correspondence at the individual level would result in a regression line with an intercept of 0.0 and a slope of 1.0.
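The masking problem can be illustrated with an ordinary least squares fit on made-up data. In the sketch below, two sets of hypothetical end-of-summer reports have the same mean as the diaries, but only one shows individual-level agreement. This is plain OLS for illustration only, not the robust method used in the study.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least squares line predicting y from x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical days outside for four respondents.
diary = [10, 20, 30, 40]
recall_exact = [10, 20, 30, 40]     # every respondent recalls accurately
recall_reversed = [40, 30, 20, 10]  # same mean (25), no individual agreement

print(ols_line(diary, recall_exact))     # (0.0, 1.0)
print(ols_line(diary, recall_reversed))  # (50.0, -1.0)
```

Both recall sets would show zero bias in a comparison of means, yet only the first yields the intercept of 0.0 and slope of 1.0 that indicate perfect individual correspondence.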

Initially, ordinary least squares (OLS) regression was used to examine individual-level comparisons. Residual analyses uncovered significant heteroscedasticity, with error variance generally decreasing as predictor variable scores increased (with the exception of sunbathing days and hours, which revealed increasing error variance with increasing predictor variable scores). Because of this heteroscedasticity, regression analyses were also run using a robust bootstrap method that can estimate CIs to examine significant differences from 1.0 for the slope and 0.0 for the intercept (Wilcox 2005). The robust regression technique used M-estimator regression with Schweppe weights and a value of κ = .10 in the Huber function.

Because many of the findings tended to be redundant, particularly for the sun protection measures, focus was given to sun exposure, sunbathing and sunscreen results. Any results that differ significantly from these have been highlighted and discussed.

Results

The study group was mainly female (78%) and Caucasian (83%) with a mean age of 24.7 years. Most respondents reported having had some college education (57%) with 27% having reported a college degree, 11% having reported an advanced degree and 5% having reported no college experience.

Approximately 84% of those approached about participating in the study agreed to participate. Of these, 91% completed full data for the summer months included in the study. There were high compliance rates with daily, weekly, biweekly and end-of-summer interviews (94%, 89%, 76% and 93%, respectively).

Analyses revealed no order effects in the data. Participation bias was examined by comparing responses on the screening survey for those who agreed to participation with those who refused. These groups did not differ significantly on screening survey measures. Attrition bias was examined by comparing those who completed the study with those who dropped out at any point in the study. No evidence of any systematic or meaningful bias was found in these comparisons. Only sporadic significant correlations with no conceptual meaning were found between social desirability scores and criterion variables reported in this study. For example, a significant negative correlation was found between social desirability tendencies and one of the rating scale measures assessing consistency of shirt wearing (r = −.11, p < .05) indicating individuals who scored higher on the social desirability measure were less likely to indicate they consistently wear a shirt with sleeves when out in the midday sun.

The weekly diaries were first validated on a group of 394 respondents. A second group was then recruited to complete the primary aims of this study, validating end-of-summer measures of sun exposure and sun protection. The study group that completed weekly diaries across the summer had a final sample size of 250 for analyses. The control groups had sample sizes of 53 (monthly assessment) and 65 (biweekly assessment) respondents.

Analyses found no significant differences in sun exposure or sunbathing frequency or hours, or in sun protection use estimates, between the control group with monthly assessments and the study group, indicating that monitoring did not influence the frequency of behavior. There were also no significant differences in recall accuracy between the control group with biweekly assessments and the study group, indicating that monitoring did not significantly influence accuracy of recall. For example, the biweekly group reported being outside for 129.8 h on the diaries and 120.5 h on the end-of-summer report. A paired samples t-test revealed this difference to be non-significant (t(52) = 1.12, p > .05), and the two measures were significantly correlated (r = .82, p < .05). The lack of testing effects in this study was congruent with the existing literature in other health-related areas (Halpern et al. 1994; Jaccard et al. 2002).

Missing Data

Amongst the main study respondents, there were occasional missing weekly surveys. The percentage of missing weekly data for any particular week averaged 2.4%. A dummy variable for each week indicating whether data were missing or not was created, and bias was evaluated by correlating these dummy variables with the end-of-summer recall survey data and the social desirability scale. Only irregular significant correlations appeared, which had little theoretical meaning [e.g., the number of days sunbathing significantly correlated with the presence vs. absence of missing data at Week 2 (r = .23, p < .05), but not with missing data from other weeks]. Analyses evaluating whether missing 1 week of data was related to missing weekly surveys at other points revealed low correlations. In addition to the evidence above that reflected random missing data, Little’s MCAR test indicated the data were missing completely at random (chi-square = 3295.3, df = 3260, p = .33). Single imputation using an EM algorithm for ML estimation under conditions of MCAR with low rates of missing data (i.e., <3%) is not substantially biased (Schafer and Graham 2002). Therefore, missing behavioral data for any given week were imputed based on non-missing weekly values using the EM algorithm.

Validation of Weekly Diaries

The accuracy of the weekly diaries was examined by comparing them with daily reports of behavior in 394 respondents. Table 2 presents the means and the 10% trimmed means for daily and weekly diary reports. The table also reports the Pearson correlation and the percentage bend correlation coefficients between daily and weekly diaries (Wilcox 2005). Degree of bias in weekly diaries was calculated using (Weekly/Daily − 1) × 100. In order to test the significance of the differences between weekly and daily diaries, confidence intervals were formed around the mean. The means and 10% trimmed means were quite similar between daily and weekly diaries, with trivial bias in most cases. While the correlation for sunbathing hours was somewhat modest (r = .62, p < .001) and the one for sunbathing days was moderate (r = .78, p < .001), the other correlations were strong (r’s = .82 to .89, p’s < .001).

Table 2 Weekly diary validation study: analysis of measures of central tendency and correlations for frequency of sun exposure and sun protection behavior for weekly and aggregated daily diaries

The weekly diary parameter estimates and aggregated daily diary reports were very similar indicating that the weekly surveys captured these behaviors, and were appropriate for estimating these behaviors across the summer. The weekly diaries were used to collect behavioral estimates across the summer. Weekly diary estimates were then aggregated and used to check the accuracy of end-of-summer self-reports.

Descriptors of Summer Sun Exposure and Protection Behavior Derived from Aggregated Diary Data

There were 9,867 reported acts of being outside between 10 am and 4 pm over the course of the summer for the 250 study respondents, with sunscreen use reported during 1,902 (19.2%) of the outdoor intervals. Respondents reported being outside a total of 25,018 h, for an average of over 100 h. In addition, respondents reported wearing shirts 4,174 times (42.3%), wearing a hat 1,202 times (12.2%) and seeking shade 2,609 times (26.4%). Respondents reported sunbathing 1,520 days for a total of 3,675 h. A total of 173 respondents (69.2%) reported sunbathing at least once. Respondents who sunbathed reported an average of over 21 h across the summer. Respondents reported 7,965 days outside without sunscreen during the summer, or approximately 32 days per person. All respondents reported going outside at least one time.

Accuracy of End-of-Summer Rating Scale Measures

End-of-summer rating scale measures of sun protection using the standard measures and BRFSS scales

The correlations of rating scales with behavioral consistency scores were generally moderate (i.e., r’s = .63–.76) for both the standard and BRFSS measures. The only exception was with the BRFSS rating scale measure of short- and long-sleeved shirt wearing, which only correlated .43 and .17 respectively, and the BRFSS hat measure that correlated .45.

The standard scales exhibited non-interval characteristics (Table 3). For example, the mean difference between never and rarely for sunscreen use was 7.60, while the mean difference between sometimes and often was 27.52. Similar results were obtained when the mean differences between rating scale values were examined for other sun protective measures on both the standard and BRFSS scales.

Table 3 Central tendency and variability of behavioral consistency scores calculated from the diaries associated with standard rating scale of sunscreen use

Sun exposure

Few respondents indicated they spent more than 3 h out per day on an average summer weekday (Table 4). The responses to the standard measures appeared to match the aggregated diary data reasonably well for low frequency behavior (i.e., reported being outside an hour or less). However, those who reported higher levels of exposure (i.e., >1 h per day) generally overestimated their average hours outside on the end-of-summer scale compared to the diary reports. These data also suggested that this measure should not be considered an interval scale. A difference of one unit on the standard measure, from 2 to 3 h, was equivalent to an average difference of 22 min (1 h 34 min minus 1 h 8 min) on the diary measure. However, an identical one-unit difference from 4 to 5 h on the standard measure was equivalent to an average difference of 1 h, 2 min (3:05 – 2:03) on the diary measure. Closer examination of the weekend average data revealed that the median hours outside for respondents endorsing 3 h on the end-of-summer measure was actually higher than the corresponding median hours for those endorsing 4 or 5 h.

Table 4 Central tendency and variability of average hours outside, sunbathing days and sunbathing hours calculated from the diaries associated with standard measures of average hours outside on week days and weekend days, sunbathing days, sunbathing hours

Next we looked at whether the standard measures (Glanz et al. 2008) would perform better if used to estimate total summer-long exposure. The estimate of average hours outside on weekday and weekend days was multiplied by the corresponding number of weekdays and weekend days across the summer and the products summed to calculate total hours outside for the summer. This estimate from the standard measure was 166.1 h, which differed significantly from the 100.1 total hours calculated from the aggregated diaries (t(249) = −12.48, p < .001). The degree of bias was 66% for this estimate. The standard sun exposure estimate exhibited a relatively poor match between its response alternatives and average weekday and weekend hours calculated from the diaries. It also did not exhibit equal interval properties. When used to estimate total exposure across the summer, this measure had a very large bias index when compared to the aggregated diaries.
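The summer-total calculation described above can be sketched as follows. The numbers of weekdays and weekend days in the assessment period are not stated in the text, so the day counts and the average hours below are hypothetical placeholders.

```python
def total_summer_hours(avg_weekday_hours, avg_weekend_hours, n_weekdays, n_weekend_days):
    """Sum of (average hours x number of days) over weekdays and weekend days."""
    return avg_weekday_hours * n_weekdays + avg_weekend_hours * n_weekend_days


# Hypothetical respondent: 1.5 h on an average weekday, 2.5 h on an average
# weekend day, over an assumed 70 weekdays and 28 weekend days.
estimate = total_summer_hours(1.5, 2.5, n_weekdays=70, n_weekend_days=28)
print(estimate)  # 175.0
```

Because every day of the summer is assigned the reported "average" value, any overestimate in the per-day rating is multiplied across the whole period, which is consistent with the large bias index reported for this measure.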

Estimating sunbathing using the standard rating scale

Descriptive statistics of the actual days and hours spent sunbathing during the summer, derived from the aggregated diaries for each sunbathing rating scale value, are also presented in Table 4. The very low number of respondents who endorsed “always” sunbathing (i.e., n = 3, 1.2%) was conspicuous. With each unit increase on the scale, the actual number of days and hours of sunbathing increased. However, much as with the sun exposure scale, the sunbathing scale was not consistent with interval scale properties. For example, moving one unit on the sunbathing rating scale from never to rarely was equivalent to a mean difference of 1.92 days and 5.19 hours. However, a comparable one-unit difference from rarely to sometimes was equivalent to a mean difference of 6.96 days and 17.53 hours.

Accuracy of End-of-Summer Frequency Measures

Sun exposure, sun protection and unprotected exposure

The end-of-summer frequency estimates were quite close to the diary estimates (Table 5). The differences between end-of-summer and diary estimates were non-significant for total hours outside between 10 am and 4 pm across the summer, number of days outside without sunscreen, number of days outside without a shirt with sleeves, and number of days outside in which shade was not sought. There was a slight tendency for total days outside to be overestimated in end-of-summer reports. The robust regression analyses tended to give regression lines with intercepts close to 0.0 and slopes close to 1.0 for the hours out end-of-summer frequency estimates, though the slope did differ significantly from 1.0 for this measure. There was a tendency to overestimate days out in the end-of-summer ratings by about 10%.

Table 5 Analysis of frequency estimation measures of central tendency, correlations and regression for frequency of sun exposure, sun exposure without sunscreen use, and sunbathing behavior for the summer long study

The degree of bias index was relatively large for sunscreen use days. However, the index of bias was defined in such a way that a small amount of absolute bias produced a large bias index in the case when behavioral frequency was low (e.g., an absolute bias of 3 units would result in an index of bias of 50% if the behavioral mean is 6 versus 5% if the behavioral mean is 60). This seemed to be the case with sunscreen behavior, where absolute bias was approximately 2–3 days. The correlations between diary and end-of-summer estimates were strong for sunscreen use. The results of the regression analyses indicated regression lines with an intercept of 1.0 and a slope not significantly different from 1.0.

End-of-summer estimates of days outside without sunscreen showed less than 1% bias when compared to estimates derived from the aggregated diary estimates. Bias was small and non-significant for hours out and days out without sunscreen. Correlations were moderate to strong for these frequency measures (r’s = .76–.81).

Sunbathing

The degree of bias index was relatively large for both sunbathing days and hours. However, much as with sunscreen use, absolute bias for sunbathing days was only 3 and for sunbathing hours about 5, despite bias indices of 55% and 50%, respectively. The correlations between diary and end-of-summer estimates were strong for both sunbathing behavioral estimates. Furthermore, the results of the regression analyses indicated regression lines with intercepts not significantly different from 0.0 and slopes not significantly different from 1.0 for both sunbathing days and hours.

Consistency of sun protection using numerical self reports

There was a tendency to over-report sun protection consistency to varying degrees (degree of bias ranged from 10% to 25%). However, correlational and robust regression results seemed to indicate that the individual-level comparison was quite close (i.e., intercepts not significantly different from 0.0, slopes not significantly different from 1.0). For example, mean sunscreen consistency was 22% using diary reports but 27% using end-of-summer reports. These estimates correlated highly (r = .83), with an intercept of 0.00 and slope of 1.03. These data indicated that, generally, when an individual’s diary-based sunscreen use consistency changed by 1%, their end-of-summer estimate of consistency changed by approximately 1% too.

Discussion

Frequency of sun exposure and protection was reliably obtained by asking participants for estimates using open-ended responses. This technique was easy to implement and had low subject burden. When compared with aggregate diary measures of behavior, the frequency estimation strategy had high levels of accuracy. For example, the mean of the respondents’ estimation for total hours spent outside over the past summer was almost 98 h in this study. The estimate from the diaries revealed an average of close to 100 h, a bias of about 2%. This bias was trivial given the time period assessed and the accompanying evidence for individual level correspondence seen in the strong correlations and linear relationship between end-of-summer reports and diaries.

Accuracy in self-reported open-ended responses over a relatively long time frame of three summer months was an important finding of this study. While it may seem reasonable that open-ended estimates should be more accurate over short time frames, studies in other health behavior areas indicated that the 3-month time period may be favorable in some situations. The use of counting strategies to estimate behavior would tend to favor shorter time frames. However, it was likely that recall strategies vary based on the frequency of behavioral engagement (Jaccard et al. 2002). Infrequent behavior probably led to attempts to recall specific episodes, while frequent behavior more likely resulted in the use of mental rules to estimate frequency. Short time frames may encourage counting episodes, which could be counterproductive for those with more frequent behavior (Menon 1993). Somewhat longer time frames, such as the 3 months in this study, may provide accurate recall because it was short enough to encourage counting strategies in individuals who performed the behavior infrequently, but long enough to motivate rule-based strategies for more frequent behavior.

Furthermore, estimates over shorter time frames were more likely to reflect situation-specific effects (Ajzen and Fishbein 1977; Jaccard and Wilson 1991). When estimates are summed across longer time frames, situational effects that increase behavior tend to be canceled out by situational effects that decrease it. For example, when both months occur in the time frame evaluated, a rainy month with less sun exposure would be balanced out by a drier month encouraging increased exposure.

Sun protection consistency estimates using frequency estimates of protection divided by frequency estimates of exposure demonstrated a slight tendency to overestimate consistency when means were examined. However, the results of the correlation and regression analyses indicated good correspondence between end-of-summer and diary measures on this index. The frequency of unprotected UV exposure was assessed by measuring both sun exposure and protection frequency and calculating an unprotected sun exposure index. This index was quite precise over the relatively long time frame of this study, demonstrating a bias of <1% in estimates of exposure without sunscreen. Similar small biases were found across other measures of unprotected exposure (e.g., sun exposure without a hat). Furthermore, these estimates exhibited excellent indices of fit for individual-level comparisons with strong correlations and good evidence for linear relationships.
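
As an illustration of the two derived indices, the sketch below computes a consistency ratio (protection frequency divided by exposure frequency, as described above) and an unprotected exposure count. The exact formula for the unprotected exposure index is not given in this section, so the subtraction used here is an assumption, as are the function names, the cap at 1.0 and the example respondent.

```python
def consistency(protected_days, exposure_days):
    """Share of exposure days on which protection was used (0.0 to 1.0)."""
    if exposure_days <= 0:
        return None  # no exposure reported, so consistency is undefined
    # ASSUMPTION: cap at 1.0 if more protected than exposed days are reported.
    return min(protected_days / exposure_days, 1.0)


def unprotected_days(protected_days, exposure_days):
    """ASSUMED index: exposure days not covered by the protective behavior."""
    return max(exposure_days - protected_days, 0)


# Hypothetical respondent: 40 days outside, sunscreen used on 10 of them.
print(consistency(10, 40))       # 0.25
print(unprotected_days(10, 40))  # 30
```

Because both indices are built from the same two open-ended frequency estimates, no additional questionnaire items are needed to obtain them.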

Generally, the strategy of simply asking individuals to estimate their frequency of exposure (in days and hours) and their use of sun protection measures using open-ended responses, and then using these responses to calculate unprotected exposure or consistency ratings, yielded estimates that were sufficiently accurate when judged against behavior recorded in the diaries. Robust regression analyses generally found that regressing end-of-summer frequency reports onto behavior aggregated from diaries resulted in slopes near 1.00 and intercepts near 0.00 (with the exception of total days outside and total days of sunscreen use, which had higher intercepts). It is important to note that the data in this study exhibited significant heteroscedasticity and non-normal distributions on the end-of-summer and diary reports. Such properties of outcome data can create significant problems and biases in the context of traditional ANOVA and regression analyses. Bootstrapping and outlier-resistant robust estimation techniques have made significant strides in recent years. It would be wise to consider these analytical tools when evaluating the types of risk measures examined in this study (Wilcox 2005).
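
To show why an outlier-resistant estimator matters for data of this kind, the sketch below implements the Theil-Sen estimator, one common robust regression method. This section does not state which robust estimator was used in the study, so Theil-Sen is a stand-in chosen for its simplicity, and the data are hypothetical.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (slope, intercept) from the Theil-Sen estimator."""
    # Slope: median of the slopes through every pair of points.
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    if not slopes:
        raise ValueError("need at least two distinct x values")
    slope = median(slopes)
    # Intercept: median residual once the slope is fixed.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept


# Hypothetical data: diary totals (x) and end-of-summer reports (y).
diary = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
report = [10.0, 20.0, 30.0, 40.0, 50.0, 200.0]  # last respondent is an outlier

slope, intercept = theil_sen(diary, report)
print(slope, intercept)  # 1.0 0.0
```

For these six points an ordinary least-squares fit gives a slope of 3.0, because the single extreme report dominates the fit. The median-based estimate stays at a slope of 1.0 and an intercept of 0.0, which is the pattern expected when end-of-summer reports track the diaries.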

A secondary finding of this study was that the standard measures for both the sun protection and sunbathing scales deviated from the interval assumption. The standard measures relied on rating scales to measure sun protection and sunbathing, and on estimates of typical weekday and weekend day exposure in a closed-ended response format to quantify sun exposure. This approach was appealing due to its simplicity, and the common belief that it was not possible to accurately estimate behavioral frequency across time periods as long as a summer. Since rating scales asked respondents to provide a single consistency estimate for their behavior during a specific time, this approach appeared to assume behaviors were relatively consistent, without significant impact from situation-specific factors that would vary over time. Furthermore, rating scales have typically been analyzed using parametric statistical approaches that assume the scales have interval properties. If the scales are treated as interval data, the deviations identified in this study may confuse the interpretation of central tendency measures and regression coefficients used in theoretical model and intervention efficacy evaluation. However, if treated as ordinal data, suitable analytical methods exist to appropriately analyze these scales. Based on these data, rating scales of sun protection or sunbathing demonstrating these non-interval properties should be treated as ordinal scales, and analyzed with statistics appropriate for non-interval data.
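
One example of a statistic appropriate for ordinal data is the Spearman rank correlation, which depends only on the ordering of responses and not on the spacing between scale points. The sketch below is a minimal illustration under stated assumptions: the 5-point rating labels and the data are hypothetical, and the section does not say which ordinal methods the authors have in mind.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical data: rating (1 = never ... 5 = always) and diary consistency.
rating = [1, 2, 2, 3, 4, 5]
diary_consistency = [0.00, 0.05, 0.10, 0.15, 0.60, 0.95]
print(round(spearman(rating, diary_consistency), 3))
```

In this made-up example the step from "3" to "4" corresponds to a much larger change in diary consistency than the step from "1" to "2", which is the kind of unequal spacing that violates the interval assumption. The rank-based coefficient is unaffected by that spacing.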

The standard sun exposure measures did not map well onto behavior estimated from the aggregated diary measures. This was particularly true for individuals who reported more than 1 h of exposure per day. For example, individuals who reported that they were normally outdoors for 3 h on the typical weekend day averaged more time outdoors in their diaries than those who had indicated they were ordinarily outdoors for 4 h. Attempts to use the standard sun exposure measures to estimate overall summer exposure were also not successful, with overestimation of summer exposure by 66%. Given that sun exposure behavior is influenced by numerous variables, such as the weather, location, activities, holidays and the people one is with, it is perhaps not surprising that measures assessing a typical day’s behavior did not easily capture it across a summer.
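
The section does not spell out how typical-day reports were scaled up to a summer total. The sketch below shows one plausible extrapolation, offered only as an assumption: typical weekday and weekend-day hours are multiplied by the number of such days in a 13-week summer and compared with a diary total. All names and numbers are hypothetical and are not intended to reproduce the 66% figure.

```python
def extrapolated_summer_hours(weekday_hours, weekend_hours, weeks=13):
    """ASSUMED scaling: 5 weekdays and 2 weekend days per week."""
    return weeks * (5 * weekday_hours + 2 * weekend_hours)


def percent_overestimate(estimate, diary_total):
    """Relative difference (%) between an estimate and the diary total."""
    return 100.0 * (estimate - diary_total) / diary_total


# Hypothetical respondent: 1 h on a typical weekday, 3 h on a weekend day.
estimate = extrapolated_summer_hours(1.0, 3.0)  # 13 * (5 + 6) = 143 h
print(estimate, percent_overestimate(estimate, 110.0))
```

An extrapolation of this kind treats every day as a typical day, so rainy spells, travel and other days with little or no exposure are counted at the typical rate. That is one way a typical-day measure could overstate exposure across a whole summer.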

The BRFSS measures of hat and short- and long-sleeved shirt use had modest to small correlations with the consistency index derived from the diary assessments. It is possible that the extra detail provided in the BRFSS items [i.e., separating short- and long-sleeved shirts; providing a more detailed description of the types of hats (see Table 1)] had a negative impact on their accuracy. However, it is more likely that the relatively low correlations reflected the fact that the wording of these items differed somewhat from the diary items, which assessed more generic hat and shirt use.

The limitations of this study included the time period examined and the population studied. First, the accuracy of these techniques was demonstrated for time periods up to 3 months. The accuracy of asking participants to look back and estimate skin cancer risk behaviors over longer time periods has not yet been evaluated. Of course, it would be possible to implement measurement strategies that simply asked respondents to estimate their behaviors every 3 months, a procedure that may not be difficult or particularly burdensome for them. It is also possible that the discrete seasonality of summer sun exposure may have enhanced the ability of individuals to accurately estimate their sun exposure and protection behavior in ways that would be more difficult for other behaviors (e.g., seat belt use).

The comparative approach used here rested on three assumptions. First, it assumed that individuals could accurately recall sun exposure and protection behavior on diaries over 7 days. This assumption was explicitly confirmed in the first study of this report by comparing weekly diary reports to aggregated daily diary reports. Furthermore, a study by Thieden and colleagues (2001) confirmed that sun exposure diaries were significantly correlated with objective measures of UVR exposure as measured on personal dosimeters. Second, this approach assumed that people would be truthful in their reports of sun exposure and protection behavior. This assumption is reasonable given that all comparisons made between self-report and objective measures of exposure or protection behaviors published thus far have indicated that study respondents are not influenced by social desirability tendencies (Buller et al. 1996; Girgis et al. 1993; Milne et al. 1999; O'Riordan et al. 2006; Oh et al. 2004). Lastly, this approach assumed that the act of obtaining repeated assessments of sun exposure and protection behavior would not affect either respondents’ behavior or their ability to recall their behaviors. The results of the analyses comparing the control to main study respondents indicated that monitoring did not significantly affect either the frequency of behavior or the recall accuracy.

A further limitation is that the sample was drawn from college students and staff in an urban/rural region of the Southeast and thus was not a national sample. The sample was also generally more educated than the general population. It is possible that more education could have led to more accurate end-of-summer self-reports. However, when the accuracy of the self-reported behavior in college graduates, including many with advanced degrees, was compared to that of respondents without a college degree, there was no evidence of differences in accuracy of self-reported behaviors. It is also important to remember that the end-of-summer assessments were initiated in the context of specific procedures that have been shown in other related areas to increase honesty and to accurately reflect actual behavior of relatively salient events. Specifically, the surveys were self-administered, guaranteed anonymity, stressed the importance of honest responding and avoided face-to-face interviews (Jaccard et al. 2003; Turner et al. 1997, 1998). It is also important to consider that the modest to moderate correlations found between sunbathing behavioral estimates derived from daily diaries and weekly diaries somewhat reduce our confidence in the comparisons using the sunbathing days and hours estimates.

In conclusion, there was minimal evidence for bias in end-of-summer self-reported frequency of behaviors. In general, previous concerns raised about global frequency estimates focused on possible under-reporting of self-reported health risk behavior. Overall, the relatively small biases found here tended toward over-reporting rather than under-reporting sun exposure. Rating scale measures performed reasonably well for sun protection behavior if they were treated and analyzed as ordinal data. However, rating scales sacrificed precision in return for ease of use. Given that the frequency measures used here were both easy to use and relatively precise, it appears that they should be considered in studies examining sun exposure and protection behavior across a summer. Furthermore, the frequency measures had the advantage of allowing calculation of an index of unprotected exposure that appeared relatively accurate. The standard scales for sun exposure did not fare well in these analyses, demonstrating a large bias from diary reports; thus, they should be used with care in studies attempting to assess sun exposure behavior across a long time period like a summer.

Prevention science depends on the quantification of behavioral risk factors through national surveys, the understanding of those behaviors through theoretical modeling, and their modification through efficacious interventions. Each of these, in turn, depends on the development of accurate outcome measurements. The ability to accurately estimate behavioral risk factors over relatively long time periods using measures that are easy and inexpensive to implement is crucial across all prevention areas. Rating scales are often relied upon, despite their relative lack of precision, because of their ease of use and the assumption that frequency estimation of behavior is inherently inaccurate. Other methods to obtain accurate behavioral frequency data such as diaries, observations or physiological measures can be expensive to researchers and burdensome for respondents. However, empirical testing of these assumptions is relatively rare in prevention science. An earlier study demonstrated that accurate recall of sexual intercourse and condom use frequency can be obtained using simple procedures across time periods as long as a year (Jaccard et al. 2002). The current study indicates the ability to obtain accurate behavioral frequency estimation on a different set of behaviors, sun exposure and sun protection, across a summer. These studies indicate that the assumptions about the inaccuracy of frequency estimation over relatively long time frames should be empirically examined in other areas of prevention. The current study and the study by Jaccard and colleagues (2002) provide potential methods for pursuing these empirical evaluations. Risk behavior frequency may be more easily and precisely estimated than previously thought possible.