INTRODUCTION

Sexual behavior has been implicated as a major route through which HIV/AIDS is transmitted, and is the route through which sexually transmitted infections (STIs) are spread. As such, a large focus of the HIV/STI prevention literature has been on understanding and promoting safer sexual behavioral change (Centers for Disease Control and Prevention [CDC], 2002). Although complete abstinence is the best way to eliminate one's risk for HIV and other STIs, sexually active individuals are unlikely to become abstinent and a recent national study indicates that adolescents who pledge to delay their initiation of sexual activity often fail to keep such pledges (Bearman & Bruckner, 2004). In fact, such individuals were found to have STI rates similar to those of individuals who did not take a pledge. In addition, data indicate that individuals use sex partner questioning and selection of what they view as uninfected partners to reduce their HIV/STI risk (Mays & Cochran, 1993; Noar, Zimmerman, & Atwood, 2004); however, because partners may lie or not know that they are infected with HIV or another STI, this strategy as a prevention technique is generally not advisable.

What remains as the best method to reduce HIV and other STI risk among sexually active persons is the correct and consistent use of condoms, and thus a large literature has been devoted to understanding the psychosocial correlates of condom use (for reviews, see Flowers, Sheeran, Beail, & Smith, 1997; Sheeran, Abraham, & Orbell, 1999). In addition, numerous sexual risk reduction programs have been focused on increasing condom use, with evidence suggesting that such programs can be effective in doing just that (e.g., Johnson, Carey, Marsh, Levin, & Scott-Sheldon, 2003).

Despite this large amount of research, there has been a lack of consensus as to the best way to measure and validate self-reports of sexual behavior generally and condom use specifically (Catania, Gibson, Chitwood, & Coates, 1990; Schroder, Carey, & Vanable, 2003a). This lack of agreement is problematic because it makes comparison of studies difficult (Pinkerton et al., 1998). For instance, if two sexual risk reduction interventions use different outcome measures, and one wishes to compare the effectiveness of the two interventions, this task becomes quite difficult. Similarly, if two studies devoted to the relationship of psychosocial variables to condom use achieve different results, it is often unclear if this is related to their differential measurement of condom use or to some other factor. Thus, use of different measures may make it difficult to cumulate findings from numerous studies over time, including an assessment of which factors are most related to condom use as well as which intervention programs are most effective in increasing condom use.

In addition, there are significant public health implications that directly relate to condom use measurement. For instance, if a study examining condom use in a certain community is conducted using an inappropriate condom use measure or a measure that is not sensitive to certain aspects of sexual behavior, then such a study may reach inappropriate conclusions regarding risk behavior. As one example, a surveillance study may be conducted using only a proportional (e.g., percent) measure of condom use. Although such a measure may accurately assess the proportion of times individuals use condoms, such measures do not take into account the frequency of sexual intercourse. Thus, if individuals in a community reduce their frequency of intercourse, which might translate into reduced STI risk, such an outcome would not be captured by the proportional measure. In this manner, accurate measurement can relate directly to public health impact and policy decisions about how to prevent STIs. In addition, moving toward consensus as to what are more and less advantageous ways to measure self-reported condom use might have broad implications for future studies that involve the measurement of condom use.

Condom Use Measurement and the HIV/STI Prevention Literature

What do we know about the sexual risk behavior literature and condom use measurement? Sheeran and Abraham (1994) conducted what is perhaps the most comprehensive review of condom use measurement, examining 72 studies of sexual risk behavior. They examined measurement of condom use in a group of correlational studies, and found great variation in how condom use was measured, identifying 94 distinct measures of condom use (some studies reported more than 1 measure). They found that the most common type of condom use measure was a frequency measure (37% of studies), followed by condom use at last intercourse (14%) and percentage of condom use measures (13%). The most common response alternatives were yes/no (34%), 3- to 4-point (24%), or 5- to 8-point Likert-type scales (14%), or “count data,” in which participants wrote the number of times they had sex with and without condoms (21%). In addition, the most common recall period used was 3–6 months (29%), followed by no recall period at all (18%), followed by 12 or more months (15%). Further, most condom use measures asked participants to respond with regard to all of their sexual partners (79%); only a minority of studies specified to which sexual partner the condom use questions apply (e.g., primary partner and casual partner). Similarly, the majority of condom use measures asked participants to respond with regard to “sex,” “intercourse,” or “coitus” (65%), without specifying what was meant by these terms. Only a minority of studies specified the types of sex, such as oral, vaginal, or anal sex.

Schroder et al. (2003a) conducted a more recent review of condom use measurement. This review differed from Sheeran and Abraham's (1994) study in that it examined condom use measurement across 116 correlational, methodological, and intervention studies. Schroder et al. divided studies into two major categories: (1) those that used relative frequency data, meaning those that examined frequency of condom use, and (2) those that used count data, in which participants wrote the number of times they had sex with and without condoms. They found that 64% of the studies used frequency data whereas 36% used count data. Further, correlational studies were more likely to use frequency data whereas intervention studies were more likely to use count data.

Schroder et al. (2003a) suggested that, from a public health perspective, count data are more specific to the risk of the participant. For instance, consider the scenario in which Person A has sex two times and uses a condom once, and Person B has sex 100 times and uses a condom 50 times. Both individuals may report that they use condoms “sometimes.” A researcher computing percentages would classify both individuals as using condoms 50% of the time, though clearly this does not capture all of the relevant information. In fact, Person A had unprotected sex only once as compared to Person B who had unprotected sex 50 times (Schroder et al., 2003a). In addition, if Person A's sexual partner is HIV negative and he/she is monogamous, and Person B has five sexual partners, then Person B is clearly at much higher risk of acquiring HIV or another STI. Such examples make the point that more specific measures of condom use, specific to partner, sex act, and perhaps using counts instead of frequencies, may often be appropriate. In some cases, measures of unprotected intercourse may be more appropriate than measures of condom use.
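The contrast between Person A and Person B can be made concrete with a short calculation. The sketch below is illustrative only: the function and variable names are hypothetical, and the counts are those given in the example above.

```python
def condom_use_summary(total_acts, protected_acts):
    """Return (proportion of protected acts, number of unprotected acts).

    Hypothetical helper for illustration; it is not taken from any of the
    reviewed studies.
    """
    if total_acts <= 0 or not 0 <= protected_acts <= total_acts:
        raise ValueError("counts must satisfy 0 <= protected_acts <= total_acts, total_acts > 0")
    proportion_protected = protected_acts / total_acts
    unprotected_acts = total_acts - protected_acts
    return proportion_protected, unprotected_acts


# Person A: sex twice, condom used once. Person B: sex 100 times, condom used 50 times.
person_a = condom_use_summary(total_acts=2, protected_acts=1)
person_b = condom_use_summary(total_acts=100, protected_acts=50)

print(person_a)  # (0.5, 1)
print(person_b)  # (0.5, 50)
```

The proportional measure is identical for both people, whereas the count of unprotected acts differs by a factor of 50, which is the information a percent-only measure discards.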

Further, Pinkerton et al. (1998) examined the current state of the literature on condom use and sexual risk measurement and concluded that what was needed was a standard set of questions that could be included in all studies of HIV prevention. Suggested questions related to condom use include the average number of acts of condom-protected and unprotected intercourse, estimated HIV prevalence among study participants, and the per-contact HIV transmission probability for unprotected intercourse. Pinkerton et al. (1998) made the important point that the prevalence of HIV in the population one is studying is vitally important. For instance, if one has unprotected sex in a high HIV prevalence area or community, he or she is at much greater risk of acquiring the disease than if one has unprotected sex in a low prevalence area. In this way, the same sex act in two different contexts can be very different in terms of risk for HIV infection.
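One way to see why prevalence matters is to combine the three quantities listed above in a simple Bernoulli-process model of cumulative risk. This model is an assumption introduced here for illustration, not a formula given in the text; it treats the partner's infection status as a single draw from the community prevalence and each unprotected act as an independent exposure. All parameter values below are hypothetical.

```python
def cumulative_infection_risk(prevalence, per_act_probability, unprotected_acts):
    """Approximate probability of infection under a simple Bernoulli model.

    Assumptions (illustrative, not from the reviewed studies):
    - one partner, infected with probability `prevalence`
    - each unprotected act transmits independently with `per_act_probability`
    """
    for p in (prevalence, per_act_probability):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if unprotected_acts < 0:
        raise ValueError("unprotected_acts must be non-negative")
    risk_if_partner_infected = 1.0 - (1.0 - per_act_probability) ** unprotected_acts
    return prevalence * risk_if_partner_infected


# Same behavior (10 unprotected acts), two hypothetical prevalence settings.
low_prevalence_risk = cumulative_infection_risk(0.01, 0.001, 10)
high_prevalence_risk = cumulative_infection_risk(0.20, 0.001, 10)

print(low_prevalence_risk, high_prevalence_risk)
```

Under these assumptions the same number of unprotected acts carries a risk that scales directly with prevalence, which is the sense in which identical behavior in two contexts can differ in risk.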

How Should Self-Reported Condom Use Be Measured?

These studies of condom use lead to an obvious question: What is the best way to assess self-reported condom use? Researchers have suggested a number of ways to maximize the precision of measures of condom use. Specifically, Sheeran and Abraham (1994) concluded that measures of condom use should be more specific to a number of dimensions of sexual behavior, and other researchers have echoed some of their suggestions. With regard to condom use measurement, they and others have suggested that researchers (1) use multiple-item measures to improve reliability of measures (Weinhardt, Forsyth, Carey, Jaworski, & Durant, 1998), (2) use 2- to 3-month recall periods, a recommendation echoed by other researchers who favor 3-month recall periods (e.g., Schroder, Carey, & Vanable, 2003b), (3) compute test–retest reliabilities (Catania et al., 1990), (4) weight (e.g., multiply) condom use by frequency of sex and/or number of sexual partners, so that measures better reflect risk (Sheeran & Abraham, 1994), (5) use measures that are specific to sexual partners (Sheeran & Abraham, 1994), (6) use measures that are specific to sex acts, rather than general measures (Fishbein & Pequegnat, 2000), (7) measure social desirability and/or self-reported honesty to aid in examining the validity of condom use measures (e.g., Zimmerman & Langer, 1995), (8) establish why individuals use condoms and whether or not they are in a monogamous relationship with an HIV-negative partner (Fishbein & Pequegnat, 2000; Miner, Robinson, Hoffman, Albright, & Bockting, 2002), and (9) conduct more longitudinal studies on condom use (Schroder et al., 2003b; Sheeran & Abraham, 1994).

Schroder et al. (2003b) suggested that continuous measures are superior to dichotomous measures. Thus, rather than ask a person a yes/no question regarding condom use, they suggest that one should ask a question of how often the respondent has engaged in the behavior. There are at least two reasons for this. First, asking someone how often something has taken place communicates that the researcher expects that the behavior occurs and that it is normative, which is an important signal to send to respondents (Weinhardt et al., 1998). Secondly, statisticians have long discussed the superiority of continuous over dichotomous measurement (e.g., Cohen & Cohen, 1983; Hunter & Schmidt, 1990). In fact, dichotomizing a continuous variable can have the effect of attenuating its relationship with other variables (Hunter & Schmidt, 1990). From this perspective, it makes little difference whether one measures the variable continuously and then dichotomizes it for purposes of analysis, or simply measures it dichotomously to begin with. Both of these scenarios have the potential to attenuate the relationship between condom use and other variables.
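The attenuation produced by dichotomizing can be illustrated with a small simulation. The sketch below uses entirely simulated data (a hypothetical continuous condom use score and a hypothetical correlate with a true correlation of .50) and a median split; it is an illustration of the statistical point, not a reanalysis of any reviewed study.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation computed from first principles."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)   # fixed seed so the sketch is reproducible
n = 20000                # hypothetical sample size
true_r = 0.5             # hypothetical population correlation

# Simulated continuous "condom use" score and a correlated predictor.
condom_use = [rng.gauss(0, 1) for _ in range(n)]
predictor = [true_r * c + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
             for c in condom_use]

# Median split: 1 = above the median, 0 = at or below it.
median = sorted(condom_use)[n // 2]
condom_use_dichotomized = [1 if c > median else 0 for c in condom_use]

r_continuous = pearson_r(condom_use, predictor)
r_dichotomized = pearson_r(condom_use_dichotomized, predictor)

print(round(r_continuous, 2), round(r_dichotomized, 2))
```

With normally distributed scores, a median split is expected to shrink the correlation to roughly 80% of its continuous value, so the dichotomized coefficient should come out noticeably smaller than the continuous one.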

There may be, however, cases where dichotomous measurement of condom use is entirely appropriate. For instance, gonorrhea is a highly contagious STI, and thus having sex one time with an infected person can easily lead to infection (Hook & Handsfield, 1999). In studying a population at high risk for gonorrhea, an investigator may only be interested in consistent condom users compared to all other condom users. Thus, it may matter little to such an investigator to distinguish among those using condoms “never” versus “sometimes,” because anything less than 100% condom use is a risk.

Finally, some research has focused on condom use errors and problems and the relationship to risk reduction. For instance, if someone uses condoms every time, but uses them incorrectly, then the ability of condoms to prevent disease is diminished. Research has shown that from populations as diverse as college students and STI clinic clients, many individuals do not use condoms correctly during sex, with problems ranging from putting on condoms incorrectly to only wearing the condom for part of the sex act (Crosby, Sanders, Yarber, Graham, & Dodge, 2002; Fishbein & Pequegnat, 2000; Sanders, Graham, Yarber, & Crosby, 2002). Thus, assessing individuals’ ability to correctly use a condom would be beneficial in studies of condom use.

The purpose of the current study was to review a sample of correlational studies of sexual risk reduction behavior, using recommendations made from methodological studies as a guide in evaluating and critiquing the literature. It remains unclear the extent to which numerous recommendations made regarding measurement of condom use have been put into practice. In addition, a particular focus was put on whether measures of condom use have improved since the appearance of Sheeran and Abraham's (1994) comprehensive review and call for higher quality measures. The current study reviewed a systematic set of published studies and examined a number of characteristics of condom use measurement, including (1) the type of measure used (e.g., frequency of condom use, proportion of condom-protected and unprotected occasions), (2) the number of levels of the condom use measure (e.g., dichotomous, 3-, 4-, or 5-point Likert-type scale), including whether the measure was analyzed in the way that it was measured, (3) the recall period used, (4) the extent to which measures were specific to sexual partners and types of sexual acts, (5) whether measures were weighted by frequency of intercourse and/or number of sexual partners, (6) whether multiple-item measures were used to measure condom use, (7) whether test–retest reliability was reported, (8) whether social desirability or self-reported honesty were measured, (9) whether other forms of birth control besides condom use were measured, and (10) whether one's ability to use condoms correctly was assessed.

METHOD

Search Strategy

The current study utilized a sample of k = 56 studies published in 53 articles in peer-reviewed journals between 1989 and 2003. The articles were initially collected for a meta-analysis on the relation between safer sexual communication (SSC) and condom use (Noar, Carlyle, & Cole, in press). The search strategy was as follows. First, comprehensive searches of both the PsycINFO and Medline computerized databases were conducted. Combinations of the keywords sexual, safer sexual, and condom were paired with combinations of the keywords communication, assertiveness, influence, negotiation, and compliance gaining. In some cases, wildcard characters were used to maximize the number of possible hits. For instance, this was done with the term safer sexual so that any articles using the term safe or safer and sex or sexual were included in the search results. All articles from this search that had the possibility of being relevant were located and examined to determine the extent of relevancy.

Second, reference lists of a number of reviews and meta-analyses in the area of communication and safer sexual behavior were combed and all articles that had the potential to be relevant were located (e.g., Allen, Emmers-Sommer, & Crowell, 2002; Cline, 2003; Fisher & Fisher, 1992, 2000; Flowers et al., 1997; Sheeran et al., 1999). Third, all issues (through the end of 2003) of Health Psychology, Health Communication, Journal of Health Communication, AIDS Education and Prevention, and Journal of Adolescent Health were searched for relevant articles. Finally, requests were sent to 17 well-known scholars for any references that might be relevant to the meta-analysis.

A decision was made to include only those works that were published in peer-reviewed journals, books, or book chapters. This decision was made for two reasons. First, published work was potentially of greater quality than the work that was unpublished. And secondly, because the SSC variable was often only one of many variables in a given study, it did not appear that a publication bias in favor of significant findings was present in this literature. In other words, it appeared that most of the studies would have been published regardless of whether or not the SSC variable was significantly related to condom use, thereby decreasing the chances of publication bias.

All articles that were considered for inclusion had to meet the following criteria in order to be included in the current study: (1) The authors had to include both an applicable measure of SSC (see Noar et al., in press) and an applicable dependent measure (condom use or unprotected intercourse), and (2) the authors had to examine the association between these two variables. In addition, the data reported in the articles needed to be able to be converted to an effect size. When none of the statistics reported in the study could be converted appropriately, the authors were contacted and the appropriate data were requested.

Four studies that were eligible could not be included because the authors were not able to provide the necessary data. In addition, a small number of eligible studies were excluded because the same data were published in part or whole in more than one research report. In these cases, the study that reported the most complete data was used, with preference given to longitudinal over cross-sectional reports. Using these criteria, a total of 53 articles contributing 56 studies (some articles reported data from more than one study) were included. Although the studies were initially collected for a meta-analysis, a comparison of a variety of characteristics of the 53 articles with a representative sample of studies reported in Sheeran et al. (1999) revealed remarkable similarity, suggesting that the current set of studies was reasonably representative of the literature at large. An exception to this was that 86% of the studies contained samples drawn from the United States, whereas the Sheeran et al. (1999) review contained only 65% of such samples, suggesting that samples from countries other than the United States were underrepresented in the current review.

Table I. Characteristics of the 56 Studies

Article Coding

Articles were coded on numerous dimensions by two independent coders (the second and the third author). The researchers coded all articles independently, and then compared the results to one another, all the while keeping track of the proportion of agreement for each coding category. Inter-coder reliability was calculated using both percent agreement and Cohen's (1960) kappa, with each being calculated for each coding category. Percent agreement was calculated by dividing the number of times the coders agreed on a response for each category by the total number of responses coded in that category. The mean percent agreement across all categories was 94%, whereas the mean kappa was .80, indicating very good reliability. All three authors met to discuss each article after it was coded in order to resolve any discrepancies that were present.
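For readers unfamiliar with the two reliability indices, the following sketch shows how percent agreement and Cohen's kappa can be computed for a single coding category. The two lists of codes are hypothetical and were made up for illustration; they are not the coders' actual data.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Proportion of items on which the two coders gave the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    agreements = sum(a == b for a, b in zip(codes_a, codes_b))
    return agreements / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Chance agreement: product of the coders' marginal proportions, summed over codes.
    expected = sum((freq_a[c] / n) * (freq_b[c] / n) for c in set(freq_a) | set(freq_b))
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for ten measures (1 = characteristic present, 0 = absent).
coder_2 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
coder_3 = [1, 1, 0, 1, 0, 1, 1, 1, 0, 0]

agreement = percent_agreement(coder_2, coder_3)
kappa = cohens_kappa(coder_2, coder_3)
print(round(agreement, 2), round(kappa, 2))  # 0.8 0.58
```

Kappa is lower than raw agreement here because, with both coders assigning a 1 to 60% of items, 52% agreement would be expected by chance alone.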

In addition, after all articles were coded, each characteristic was given a value that corresponded to the quality of that measure characteristic. When measures were strong on a characteristic, they were assigned a 1. When they were weak on a characteristic, they were assigned a 0. The values were then summed to give each measure a quality score with a maximum possible value of 10. For each dimension, values were assigned in the following way: measure type (dichotomy: 0, all other types: 1); recall period (6 months or less: 1, all other time periods: 0); partner specific (general measure: 0, all others: 1); sex act specific (general measure: 0, all others: 1); weighted (no: 0, yes: 1); multi-item scale (no: 0, yes: 1); test–retest reliability (no: 0, yes: 1); social desirability or self-reported honesty (no: 0, yes: 1); birth control (no: 0, yes: 1); and condom use skills (no: 0, yes: 1). Number of levels was not included because it was partially redundant with measure type, and whether the condom use measure was analyzed the way it was measured or differently was not directly relevant and thus not included. In addition, any characteristic not reported received a 0, whereas any nonapplicable characteristic (such as birth control for studies of gay men) received a 1 (so as not to penalize any measure in an area where a certain characteristic did not apply). Although the literature was not absolutely clear in terms of what distinguishes a strong from a weak characteristic of a condom use measure, this coding reflects the suggestions of researchers in this area (e.g., Sheeran & Abraham, 1994) and thus provides a reasonable estimate of quality for each measure.
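The scoring rules just described can be expressed as a short function. This is a minimal sketch under stated assumptions: the dictionary keys, the "n/a" sentinel, and the example measure are all hypothetical conveniences, and only the 0/1 rules themselves come from the description above.

```python
NOT_APPLICABLE = "n/a"  # hypothetical sentinel for characteristics that do not apply

# Yes/no characteristics; key names are illustrative, not from the coding manual.
YES_NO_CHARACTERISTICS = (
    "partner_specific", "sex_act_specific", "weighted", "multi_item",
    "test_retest", "social_desirability", "birth_control", "condom_skills",
)


def score_yes_no(value):
    """1 if present or not applicable; 0 if absent or not reported (None)."""
    if value == NOT_APPLICABLE:
        return 1
    return 1 if value is True else 0


def quality_score(measure):
    """Sum the ten 0/1 characteristic values for one condom use measure."""
    score = 0
    # Measure type: a dichotomy scores 0, any other reported type scores 1.
    measure_type = measure.get("measure_type")
    score += 1 if measure_type is not None and measure_type != "dichotomy" else 0
    # Recall period: 6 months or less scores 1; longer or unreported scores 0.
    recall_months = measure.get("recall_months")
    score += 1 if recall_months is not None and recall_months <= 6 else 0
    # Remaining eight yes/no characteristics.
    score += sum(score_yes_no(measure.get(key)) for key in YES_NO_CHARACTERISTICS)
    return score


# Hypothetical measure: a partner- and act-specific frequency item, 3-month recall,
# from a sample for which birth control does not apply.
example_measure = {
    "measure_type": "frequency", "recall_months": 3,
    "partner_specific": True, "sex_act_specific": True,
    "weighted": False, "multi_item": False, "test_retest": False,
    "social_desirability": False, "birth_control": NOT_APPLICABLE,
    "condom_skills": False,
}
print(quality_score(example_measure))  # 5
```

Treating an unreported characteristic as `None` and a nonapplicable one as a separate sentinel keeps the two cases distinct, which matters because they are scored in opposite directions.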

RESULTS

The 56 studies contributed 72 measures of condom use. This was the case because many studies contained more than one measure of condom use. The group of studies had a cumulative N of 18,680 participants. As can be seen in Table I, there was a reasonable amount of participant diversity in the studies although most studies were of heterosexually active individuals (86%), and of individuals from the United States (86%). In addition, most were cross-sectional studies (82%) of HIV-negative participants (89%) that were samples of convenience (88%).

Characteristics of Condom Use Measures

Table II reports on a number of characteristics of these 72 condom use measures. Selected characteristics are also summarized in Table III. As can be seen, the most common measure type was a frequency measure, with 36% (26 of 72 measures) using this type. Frequency measures were defined as those that asked participants how often they used condoms. This was followed by dichotomous-type measures (28%) and then by proportions (21%). Dichotomous measures were defined as those that asked yes/no questions about condom use, typically asking whether or not condoms were used in general or the last time one had sex. Proportion measures were defined as those that examined the proportion or percent of condom-protected occasions to overall sex occasions, and were typically computed from count data.

Table II. Characteristics of 72 Condom Use Measures Within 56 Studies

Number of levels (or response alternatives) that measures utilized ranged from 2 to 100, with 2 and 5 being employed most often (both at 28%). In addition, recall periods ranged from the past 24 hr to lifetime condom use, with 6 months being employed most often (20%), followed by no time frame given (15%), followed by 1 year (14%) and 2 months (13%).

Condom use measures varied greatly in the extent to which they were specific both to partner type and sexual act. The most specific measures were tailored to all partner types (16%), whereas the most general measures did not specify partner type at all (57%). Similarly, the most specific measures were tailored to vaginal, oral, and anal sex (17%), whereas the most general were measures that did not specify which type of sex they were assessing (12%). Given that these were largely studies of heterosexuals, a large number of measures (63%) assessed vaginal sex only.

Only three (4%) condom use measures were weighted, with two weighted by frequency of sex and one both by frequency of sex and number of sexual partners. What this meant was that these additional factors (e.g., frequency of sex) were combined with the condom use measure, sometimes by multiplying the condom use frequency by frequency of sex, in order to take these additional factors into account. A large number of measures (n = 28, 39%) were analyzed in a manner different from how they were measured. What this often meant was that categories were collapsed for purposes of analysis, often from many categories into two categories for logistic regression analysis (e.g., Cohen & Dent, 1992; Heckman et al., 1996; Shoop & Davidson, 1994). Only six (8%) measures were multiple-item scales, and these were derived from three studies (Grimley, Prochaska, & Prochaska, 1993; Huszti et al., 1998; Noar, Morokoff, & Redding, 2002). The Huszti et al. (1998) study used a multiple-item safer sex scale, whereas the Grimley et al. (1993) and Noar, Morokoff, and Redding (2002) studies used condom stage of change algorithms.

Further, only six (11%) studies either reported test–retest reliability of a condom use measure or stated that it had been examined in formative work, but did not necessarily report it. In addition, only four (7%) studies measured social desirability or self-reported honesty. Twenty-three (41%) studies assessed birth control, which was consistent with the large number of heterosexual samples in this group of studies; however, studies tended not to take the birth control measure into account when conducting analyses. Finally, nine (16%) studies contained some measure of correct condom use skills. These varied from self-efficacy scales, to condom use error and problem scales, to behavioral skill simulations in which the participant had to demonstrate the correct use of condoms using a penis model.

Table III. Summary of Characteristics of 72 Condom Use Measures

Quality of Measurement and Changes Over Time

Whether the quality of condom use measures has improved over time was examined. The quality scores were averaged together for each year and plotted on a graph. Because the review contained only two studies between the years 1989 and 1991, the quality scores for these years were averaged together. As can be seen in Fig. 1, the quality of condom use measures showed a modest trend toward higher quality measures over time. The proposition that condom use measures might have significantly improved over time, both because of the Sheeran and Abraham (1994) article appearing in the literature and the natural progression of this research literature, was then tested statistically. Quality scores from the 23 measures published in the 1989–1995 studies were averaged together, as were the scores from the 49 measures published between 1996 and 2003. The split was made in 1996 because studies that integrated suggestions from Sheeran and Abraham (1994) would likely have taken until at least 1996 to appear in the published literature. A t test comparing the 1989–1995 mean score (M = 2.96, SD = 1.40) with the 1996–2003 mean score (M = 3.57, SD = 1.00) was statistically significant, t(70) = 2.13, p < .05, ω² = .05. The results suggest that condom use measures have increased in quality over time, and that this finding is of approximately medium-sized magnitude in terms of effect size (Cohen, 1988).
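The comparison can be approximately reproduced from the summary statistics alone. The sketch below assumes a pooled-variance (equal variances) independent-samples t test, which is consistent with the reported 70 degrees of freedom but is not stated explicitly in the text; the function name is hypothetical.

```python
import math


def pooled_t_from_summary(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Independent-samples t statistic (pooled variance) from summary statistics.

    Returns (t, degrees_of_freedom), with t computed as group 2 minus group 1.
    """
    df = n_1 + n_2 - 2
    pooled_variance = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n_1 + 1 / n_2))
    return (mean_2 - mean_1) / standard_error, df


# Values as reported: 23 measures from 1989-1995, 49 measures from 1996-2003.
t_value, df = pooled_t_from_summary(2.96, 1.40, 23, 3.57, 1.00, 49)
print(round(t_value, 2), df)  # 2.12 70
```

Working from the rounded means and standard deviations gives a t of about 2.12 rather than the reported 2.13; a difference of that size is what one would expect from rounding in the published summary values.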

Further, the issue of which specific features of condom use measures have improved was examined. Using the scheme developed for the quality scores, percentages were calculated in order to examine the proportion of “strong” characteristics in each category. These were calculated for both the 1989–1995 and the 1996–2003 period, and the results are displayed at the bottom of Table II. The results revealed that characteristics of measures fell into two categories: those that improved over time and those that became slightly weaker over time, and the percent change is reported here. The results show that measures in the 1996–2003 period used improved measure types (+4%), improved recall periods (+13%), had more specificity to sex acts (+26%), increased calculation of test–retest reliability (+6%), increased measurement of social desirability/self-reported honesty (+2%), increased measurement of birth control methods besides condom use (+15%), and increased assessments of condom use skills (+5%). The results also revealed that measures in the 1996–2003 period had decreased partner specificity (−5%), used less weighting of measures (−7%), and contained fewer multi-item scales (−1%), indicating slightly weaker measurement on these characteristics.

Finally, whether longitudinal studies had higher quality measurement than cross-sectional studies was examined. Because the 1989–1995 period contained only one longitudinal study, a comparison could not be made within that time period. In the 1996–2003 period, the 13 measures from longitudinal studies were compared to the 36 measures from cross-sectional studies. A t test comparing the mean quality score from longitudinal studies (M = 3.38, SD = 1.12) with the mean quality score from cross-sectional studies (M = 3.64, SD = 0.96) was not statistically significant, t(47) = 0.78, p = .47, ω² = .008. This suggested that measures contained within longitudinal studies did not have superior measurement to those contained in cross-sectional studies.

DISCUSSION

Measuring sexual risk behavior is complex and researchers have responded to this complexity by developing risk indices (Burkholder & Harlow, 1996) and safer sex algorithms (Le Pont, Pech, Boelle, & The ACSAG Investigators, 2003; Miner et al., 2002; Noar & Morokoff, 2002), as well as conducting research on how and under what circumstances self-reported safer sexual behavior can be accurately assessed (for reviews, see Catania et al., 1990; Schroder et al., 2003a, 2003b; Weinhardt et al., 1998; Zea, Reisen, & Diaz, 2003). The current study specifically examined condom use measurement, evaluating the literature based on a number of recommendations that have been made by researchers. Overall, our study provided evidence that condom use measures have improved in quality over time, although the study also revealed that there was still clearly room for greater improvement to such measures. We now discuss each of the aspects of condom use measurement in more detail and then provide recommendations for future measures in this area.

Fig. 1. Quality scores for condom use measures between 1989 and 2003.

First, in the overall data, measure types were most often frequency or dichotomous measures. Although researchers have suggested that frequency measures are a good choice in many cases (e.g., Sheeran & Abraham, 1994), researchers would likely not support the widespread use of dichotomous measures. Although some of the dichotomous measures were “last time” measures, which can be useful because of their specificity, many of these were yes/no questions that simply asked respondents “if they use condoms.” Given that condom use is a continuous phenomenon, we would argue that this is not a sensitive way to measure it. This is also tied directly to the number of levels within each measure (e.g., response alternatives). When several response alternatives are provided to a condom frequency question, a researcher can gain reasonable insight into how often one uses condoms. When only two options are given, the information assessed is cruder and less sensitive to differences that may exist.

In addition, a number of measures were analyzed differently than they were measured (39%), with many continuous measures being collapsed into two categories, such as “never/sometimes” compared to “always” condom users. As previously discussed, this may be an appropriate strategy when studying populations in which there is a high prevalence of infectious STIs; however, this was not often the case within the studies in this review. Rather, it was often not clear why certain condom use groups were created for certain analyses. This is an important issue because the way in which condom use groups are created can affect the results of one's statistical analyses, and might lead to Type I or Type II errors if groupings are made inappropriately (see Crosby, Yarber, Sanders, & Graham, 2004). In the future, researchers should be more specific as to why they are creating certain classifications of condom users for certain analyses, and why these groupings are appropriate for the particular question being asked in the research.

A total of 21% of measures were proportion type measures, compared to the 36% that were frequency measures. Both measures get at the consistency of condom use, one through Likert-type scales and the other through calculating the percent of time condoms are used. One drawback of using 5-point Likert-type and similar scales is that individuals may liberally interpret these categories. For instance, Cecil and Zimet (1998) found that 63% of young adult college students thought that 90–95% condom use was “always” using a condom, whereas 54% thought that 5–10% condom use corresponded to “never” using condoms. White et al. (2000) found similar results in a study of high-risk heterosexually active adults. When proportion type measures are used, the number of condom-protected and unprotected sexual occasions one has had in a certain time period is assessed (i.e., count data). Number of unprotected occasions is an excellent dependent measure in and of itself (see Fishbein & Pequegnat, 2000; Schroder et al., 2003a). However, these measures require someone to report exactly how many condom-protected and unprotected sexual occasions they have had in a certain time period, which may be difficult for some individuals and populations. Researchers should weigh these issues carefully when designing and utilizing measures for their specific population of interest.
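
As a minimal sketch of how a proportion type measure and a count of unprotected occasions can be derived from the same self-reported counts, consider the following. The class name `CountReport`, its fields, and the two example respondents are hypothetical and serve only to illustrate the arithmetic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CountReport:
    """Self-reported counts for one recall period (hypothetical structure)."""

    protected: int    # condom-protected sexual occasions
    unprotected: int  # unprotected sexual occasions

    def __post_init__(self) -> None:
        if self.protected < 0 or self.unprotected < 0:
            raise ValueError("counts must be non-negative")

    @property
    def total(self) -> int:
        return self.protected + self.unprotected

    def proportion_protected(self) -> Optional[float]:
        # The proportion is undefined when no sexual occasions were reported.
        if self.total == 0:
            return None
        return self.protected / self.total


# Two hypothetical respondents with the same proportion of protected occasions.
low_frequency = CountReport(protected=2, unprotected=2)
high_frequency = CountReport(protected=20, unprotected=20)

for report in (low_frequency, high_frequency):
    print(report.proportion_protected(), report.unprotected)
```

Both hypothetical respondents use condoms on half of their sexual occasions, yet one reports 2 unprotected occasions and the other 20. This is one reason the number of unprotected occasions may be worth examining alongside the proportion.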

Next, recall period was considered. Some recall periods, such as 6 months and 1 year, were common periods to use. Although longer recall periods are likely to be more representative of one's behavior, little research has demonstrated the accuracy of recall periods of more than 3 months (Schroder et al., 2003b; Sheeran & Abraham, 1994). Troubling is the fact that more than half of the studies in this review either used a recall period of greater than 3 months or used no recall period at all. Researchers should be very cautious regarding the use of recall periods that are greater than 3 months until more research validates such longer time periods. In addition, studies that gave no recall period (15%) or used lifetime condom use (7%) may not have yielded particularly accurate responses. Short, specific periods are likely to yield the best responses from participants, although additional studies are needed in order to better evaluate the reliability of various recall periods.

Further, researchers have suggested that the more specific condom use measures are to partner type and sex act, the better the resulting data will be (Sheeran & Abraham, 1994). A total of 57% of measures were not specific to partner type, whereas 12% of measures appeared not to be specific to type of sex. Although creating measures that are specific to these characteristics can get complex, it is likely that the resulting data will be clearer and more accurate. For instance, studies have shown differences between main and casual partner condom use rates (Noar, Zimmerman et al., 2004) as well as vaginal, oral, and anal sex condom use rates (Lindley, Nicholson, Kerby, & Lu, 2003; Semple, Patterson, & Grant, 2002). For these reasons, distinguishing among partner types and sex acts may make clearer which sexual partner(s) and sexual act(s) one is asking about, and thus increase confidence in the meaningfulness of the data.

Additional aspects of condom use measures were also evaluated, including weighting by frequency of sex or number of sexual partners. Only 4% (three studies) weighted by one or both of these characteristics. The result is that studies may not be accurately assessing the true risk of participants if they are not taking into account the frequency of sex and/or the number of sexual partners of an individual. In addition, only a small number of studies (16%) assessed the ability of individuals to use condoms correctly. These studies most often employed self-efficacy scales, which are not ideal because an individual may have high self-efficacy but simply be incorrect in the way in which he or she is using a condom. Most ideal are face-to-face behavioral skills tests in which individuals demonstrate their ability to put a condom correctly on a penis model, including opening the package carefully, leaving space at the tip, and so forth (e.g., St. Lawrence et al., 1998). In cases where this is not feasible, condom skills instruments that ask a number of questions about correct condom use might be employed.

Finally, some methodological issues deserve mention. Nearly all of the condom use measures in this review were single item measures. The difficulty with single item measures is that their reliability is not known, and in this review studies tended to not examine test–retest reliability. To increase confidence in such measures, researchers should either move toward developing multi-item scales (e.g., Huszti et al., 1998), which were quite rare in this literature, or should calculate test–retest reliability on the single item measures. In the case of cross-sectional studies, researchers might conduct short longitudinal studies in their formative work to be sure that the test–retest reliability of measures used in the subsequent study is acceptable.
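
A minimal sketch of one way to estimate test–retest reliability for a single item measure follows. It assumes the same respondents answered the same 5-point condom use frequency item on two occasions and uses a Pearson correlation as the reliability index; other indices may be more appropriate for ordinal or dichotomous items. The scores and the function name `pearson_r` are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between paired scores from two administrations."""
    if len(x) != len(y):
        raise ValueError("both administrations need the same respondents")
    if len(x) < 2:
        raise ValueError("at least two respondents are required")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when scores do not vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical responses (1 = never ... 5 = always) at time 1 and time 2.
time_1 = [1, 2, 5, 4, 3, 5, 2, 4]
time_2 = [1, 3, 5, 4, 2, 5, 2, 5]

print(round(pearson_r(time_1, time_2), 2))
```

What counts as an acceptable coefficient is a judgment for the researcher and depends on the interval between administrations; the sketch only shows the calculation.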

A small number of studies used condom stage of change algorithms (Grimley et al., 1993; Noar, Morokoff, & Redding, 2002), which are somewhat different from multiple item scales. Such algorithms classify individuals into stages on the basis of their readiness to change their sexual risk behavior and are based on Prochaska and DiClemente's (1983) Transtheoretical Model of Change. In some contexts, these measures may be preferable to other condom use measures, as they integrate condom use behavior, intentions, and temporal indicators into one measure of condom use. For instance, such algorithms classify individuals into (1) precontemplation (no intention to use condoms consistently), (2) contemplation (plan to use condoms consistently in the next 6 months), (3) preparation (plan to use condoms consistently in the next 30 days), (4) action (currently use condoms consistently), and (5) maintenance (currently use condoms consistently and have been for 6 months or more) stages of change. In this way, a brief set of items can provide an abundance of information regarding one's condom use (see Noar & Morokoff, 2002, for algorithm), and some researchers have used condom stage measures to evaluate the efficacy of HIV/STI prevention interventions (e.g., Redding et al., 2004; Schnell, Galavotti, Fishbein, Chan, & The AIDS Community Demonstration Projects, 1996). In addition, although the Huszti et al. (1998) study used a safer sex scale that is summed into a score, it was also based on the concept of stages of change. This 3-item scale assessed individuals’ past condom use, future intention to use condoms, as well as whether individuals were currently abstinent with their romantic partner. The amount of time that behaviors such as condom use or abstinence had been taking place was also assessed. Such a scale attempts to take a broader assessment of safer sexual behaviors to include abstinence.
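
The five stage definitions listed above can be sketched as a simple classification rule. This is an illustration built only from those definitions; it is not the published algorithm, and the function name `classify_stage`, its argument names, and the assumed item format are hypothetical.

```python
def classify_stage(
    consistent_now: bool,
    months_consistent: float,
    intends_within_30_days: bool,
    intends_within_6_months: bool,
) -> str:
    """Assign a condom stage of change from a brief set of item responses.

    Assumes four hypothetical items: current consistent use, how long that
    use has lasted, and intention to start within 30 days or 6 months.
    """
    if months_consistent < 0:
        raise ValueError("months_consistent must be non-negative")
    if consistent_now:
        # Current consistent users are split by how long the behavior has lasted.
        return "maintenance" if months_consistent >= 6 else "action"
    if intends_within_30_days:
        return "preparation"
    if intends_within_6_months:
        return "contemplation"
    return "precontemplation"


# Hypothetical respondent: not yet a consistent user, plans to start within 6 months.
print(classify_stage(False, 0, False, True))
```

The ordering of the checks matters in this sketch: behavior is evaluated before intention, and the shorter intention window before the longer one, so that each respondent is placed in exactly one stage.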

In terms of other methodological issues, only 7% of studies assessed social desirability or self-reported honesty, and many of the studies did not report the correlation of these measures with condom use. Such correlations, should they be low, would give one more confidence in the validity of condom use measures. It is possible, even probable, that when participants answer questions about condom use they feel some pressure to report that they use condoms. Similar to a behavior such as wearing a seatbelt, condom use is likely to be viewed as socially desirable. By assessing correlations between condom use and social desirability or self-reported honesty, one can examine the extent to which this is true (e.g., Zimmerman & Langer, 1995). This might lead to the “fine tuning” of condom use instruments in order to discourage socially desirable responding.

Changes in Measures Over Time, Limitations, and Recommendations for Future Measures

This review found that condom use measures have increased in quality over time, with a nearly medium-sized effect, suggesting that researchers have improved their measures as more research has been conducted and perhaps in response to calls for improved measures published in the literature, such as Sheeran and Abraham (1994). When examined more specifically, a number of features were found to have improved, including measure types, recall periods, specificity to sex acts, test–retest reliability, measurement of social desirability/self-reported honesty, measurement of birth control, and assessment of condom use skills. It is promising that researchers have improved their condom use measures in these areas, although areas that improved only slightly, such as measure types, test–retest reliability, measurement of social desirability, and measurement of condom use skills, clearly need greater improvement in the literature. In addition, a small number of characteristics, including partner specificity, weighting of measures, and use of multi-item scales, worsened over time. Further, although the measures in the 1996–2003 period statistically improved with a quality mean score of 3.57, the quality score index had a maximum possible score of 10, and thus we still have much room to improve. The current review is a call for researchers to take a next step in improving the quality of condom use measures, to increase the sensitivity and appropriateness of such measures, as well as to increase the comparability of measures across studies. Such changes to measures will continue to increase the quality of the sexual risk behavior literature and thus the confidence in the findings from that literature.

We should note that the current review had a number of limitations. First, the studies reviewed were mostly correlational and cross-sectional in nature. Although the set of studies reviewed appears to be reasonably representative of this correlational literature, this literature may be vastly different as compared to the intervention literature. In fact, it may be that studies of sexual risk reduction interventions contain higher quality measures than do correlational studies, something that the current study did not examine. Second, and as already noted, studies conducted in countries other than the United States were underrepresented in the current review. Why was this the case? It may be that researchers from the United States have focused more on the safer sexual communication variable that was used as a criterion variable in the current review of studies. In fact, recent meta-analyses of the condom use literature conducted by researchers from the United Kingdom have noted the paucity of studies that have been conducted on the issue of safer sexual communication and negotiation, particularly when compared to the number of studies focused on variables such as attitudes and self-efficacy (see Flowers et al., 1997; Sheeran et al., 1999). Finally, because the safer sexual communication variable was used as a major focus of the search for studies, the sample of studies may be less representative of the literature than if other inclusion criteria had been used. As already noted, however, a comparison of the current set of studies to other published reviews (e.g., Sheeran et al., 1999) showed remarkable similarity on a number of demographic, measurement, and other study characteristics.

In conclusion, we summarize overall recommendations for future measures of condom use as follows:

  • Use measures most appropriate to the situation, as there is no “gold standard” in this area and differing contexts may call for differing types of measures.

  • In analyses, examine multiple measures of condom use and unprotected sex and compare findings for potential differences. If differences in findings emerge this would be important to take into account in the final analysis and reporting of the data.

  • Use the specific recommendations below to guide the development/choice of specific condom use measures.

Table IV. Recommendations for Measures of Condom Use

In addition, in the spirit of improving future measures, we have summarized a number of specific recommendations from the literature and present those recommendations for future condom use measures in Table IV. Currently, there is no agreed-upon “gold standard” in this area, and different situations may call for different types of measures. However, measures should vary across studies because they have to, not because of idiosyncratic choices made by researchers. Thus, to the extent that we can move in the direction of consensus as to stronger measures of condom use, the field will be strengthened. It is our hope that Table IV, which is a summary of important recommendations in this area from the literature, will help facilitate the continued improvement of condom use measures and the building of consensus in this area over time.