INTRODUCTION

Sex research has typically relied on participants’ self-reports of sexual behaviors, sexual orientation, sexual identity, and psychosexual developmental milestones. However, researchers have questioned the ability of individuals to provide reliable self-reports and whether current measurement strategies reliably assess sexual information (Catania, Gibson, Chitwood, & Coates, 1990; Schroder, Carey, & Vanable, 2003; Weinhardt, Forsyth, Carey, Jaworski, & Durant, 1998). Failure to reliably assess self-reported sexual information would have profound consequences for research on sexuality and sexual health. In the absence of reliable self-reports, the ability to predict sexual behavior or evaluate changes in behavior is greatly reduced. Furthermore, if measurement of sexual behavior is unreliable then, by definition, it is also invalid.

Despite the importance of reliable self-reports of sexually relevant information, few researchers have undertaken test–retest studies to evaluate whether individuals can provide reliable self-reports and whether current measures of sexual behavior reliably elicit such reports. Indeed, a recent review of the research conducted since 1990 identified only 15 studies that examined test–retest reliability of self-reported sexual behavior (Schroder et al., 2003), with far fewer studies conducted prior to 1990 (Catania et al., 1990). Taken together, research on the reliability of self-reported sexual behavior has been characterized as “a mixed bag” (Catania et al., 1990). The more recent research continues to have a number of limitations (see Schroder et al., 2003, for a recent critique). To adequately evaluate reliability, questions must assess the same behaviors for the same period of time (e.g., the past month) at both the test and the retest assessments. However, the overly long test–retest intervals (e.g., 6 months) used in much reliability research increase the likelihood that the two assessments will be nonoverlapping in time and therefore will not assess the same behaviors. Findings over such long test–retest periods are better characterized not as reliability of reports, but rather as consistent patterns of behavior (Nunnally, 1978). Similarly, overly short retest periods (e.g., 48 hr) increase the likelihood of participants recalling their original reports, thus artificially increasing reliability coefficients. The research is also limited in the scope of sexual behaviors examined, with many studies assessing the reliability of only a few global assessments of sexual behavior (e.g., number of partners, number of times had sex) rather than the reliability of specific sexual behaviors (e.g., frequency of oral, anal, or vaginal sex; with or without condoms; sex while using substances).

Catania et al. (1990) also noted the lack of research on the reliability of reported sexual behaviors by specific subpopulations, including different age groups, genders, and ethnic/racial groups. Indeed, the reliability and validity of adolescents’ and young adults’ self-reported sexual behaviors have been questioned. Several studies have suggested that a sizable number of youths admit to lying about their sexual experience (17%, Newcomer & Udry, 1988) or report being dishonest about their sexual behavior (8–24%, Siegel, Aten, & Roghmann, 1998). In particular, male youths are significantly more likely than female youths to overreport their sexual behavior (14% said that they had reported “a lot more” sexual behavior than they really had), whereas female youths underreport their behavior (8% said that they had reported “a lot less” sexual behavior than they really had; Siegel et al., 1998).

The test–retest reliability of youths’ sexual behavior has not been extensively examined (for review, see Brener, Billy, & Grady, 2003). Most of the reliability research among adolescents and young adults has found self-reported sexual behavior to be only moderately reliable (mean reliability in each study = .51–.66; Boekeloo, Schamus, Simmens, & Cheng, 1998; Brener, Collins, Kann, Warren, & Williams, 1995; Brener et al., 2002; Hearn, O’Sullivan, & Dudley, 2003), with only a single study finding high levels of reliability among youths (mean reliability = .92, Durant & Carey, 2002). The low reliability found in past research is partly due to methodological limitations such as nonoverlapping assessment periods (e.g., Boekeloo et al., 1998). The extant literature on adolescents is also limited by the inclusion of only a small number of sexual items, most of which are general in scope (e.g., whether youths ever had sex; Brener et al., 2002; Flisher, Evans, Muller, & Lombard, 2004). As such, more research is needed to examine a broad range of specific sexual behaviors (e.g., oral and anal sex, condom use during specific behaviors, and insertive versus receptive behaviors).

Although the reliability of self-reported sexual behavior has been questioned among adolescents in general, at particular risk for low reliability may be gay, lesbian, and bisexual (GLB) youths. Although social desirability and privacy concerns may reduce the reliability of reported sexual behaviors of all populations (for review, see Catania et al., 1990; Schroder et al., 2003), this may be particularly true among GLB individuals, given the stigma attached to same-sex sexuality. However, few studies have examined the reliability of sexual behavior among GLB individuals. Only three test–retest studies have examined the reliability of adult gay men's reported sexual behavior (Coates et al., 1986; McLaws, Oldenburg, Ross, & Cooper, 1990; Saltzman, Stoddard, McCusker, Moon, & Mayer, 1987). These studies found a wide range of reliability coefficients from poor to near perfect: .40–.99 (Coates et al., 1986), .08–.98 (McLaws et al., 1990), .34–.72 (Saltzman et al., 1987). To date, the reliability of reported sexual behaviors has not been examined among GLB youths.

Although the reliability of reported sexual behaviors has been examined (for review, see Schroder et al., 2003), the reliability of other aspects of sexuality remains unexamined, including sexual orientation, sexual identity, and the self-reported ages of various psychosexual developmental milestones. The ability of GLB youths to reliably report such aspects of their sexuality is critical for research on psychosexual or sexual identity development. In the only study identified that has examined the test–retest reliability of sexual orientation, Saltzman et al. (1987) found good reliability (.84) over a 6-week period among adult gay men. The only study to examine the reliability of retrospective reports of the ages of achieving various sexual milestones (e.g., age first kissed, age first sex) was conducted among heterosexual girls and it found high levels of reliability (mean r=.85, Hearn et al., 2003). However, no such studies have examined the reliability of sexual identity, orientation, or psychosexual milestones among GLB youths.

In an attempt to address the absence of research on the reliability of self-reported sexual behavior among GLB youths, this test–retest study examined the reliability of a broad range of sexual behaviors (both lifetime and in the past 3 months) over a 2-week period. In addition, the study examined the reliability of youths’ sexual identity, sexual orientation, and psychosexual developmental milestones. Furthermore, because past research has found gender differences in the reporting of sexual behavior, the reliabilities of male and female youths were examined separately.

METHOD

Participants

As part of a larger longitudinal study of 156 GLB youths aged 14–21 years, a subsample of 64 youths also participated in a sub-study of the test–retest reliability of their self-reported sexual behaviors, sexual identity, sexual orientation, and psychosexual developmental milestones. Youths were recruited from five GLB-focused organizations in New York City, including three community-based organizations and two student organizations from public colleges. Additional description of the larger study sample, including descriptive data of the youths’ sexual behavior, is available in earlier reports (e.g., Rosario et al., 1996; Rosario, Meyer-Bahlburg, Hunter, & Gwadz, 1999).

Initial interviews with youths who would compose the reliability sub-study were initiated 4 months following the start of the larger study. As such, recruitment and interview procedures had become established and interviewers had become experienced with the interview protocol. Youths who had recently participated in the baseline assessment of the longitudinal study were contacted by telephone and asked to participate in a second interview scheduled approximately 2 weeks after their original interview. So as not to bias their responses, youths were not told at baseline that the reliability of their responses would be assessed, nor were they told that the reason for the retest interview was to assess their reliability. We attempted to reinterview all participants who were accrued into the larger study after initiation of the reliability interviews. For the reliability subsample, specific attention was focused on obtaining approximately equal numbers of male and female youths and youths from all five recruitment sites.

Of the 64 youths who participated in the reliability sub-study, 35 (55%) were males and 29 (45%) were females. Youths were between the ages of 14 and 21 years (M=18.1, SD=1.9 for males; M=18.2, SD=1.5 for females). They self-identified as gay/lesbian (69%), bisexual (28%), or other (3%). The youths were of Latino (42%), Black (28%), White (19%), Asian (5%), and other (6%) ethnic backgrounds. Over one third (38%) of the youths reported that their mother or father received welfare, Medicaid, or food stamps (i.e., low socioeconomic status, SES). Most youths (87%) were recruited from community-based organizations and the remainder (13%) from college student organizations. A comparison of youths who participated in the reliability sub-study with youths who did not participate (e.g., those interviewed prior to the initiation of the reliability interviews) found no significant differences on gender, age, ethnicity/race, SES, sexual identity, or recruitment site.

Procedure

As part of the larger study, youths provided voluntary signed informed consent for a longitudinal series of interviewer-administered structured interviews. Parental consent was waived for those youths under age 18 years by the Commissioner of Mental Health for the State of New York. Instead, an adult in each community-based organization served in loco parentis to safeguard the rights of the underage participants. The study was approved by the Institutional Review Board of the Psychiatry Department of Columbia University and by the recruitment sites.

All interviews for the test–retest sub-study were conducted between January and June 1994, with follow-up interviews conducted approximately 2 weeks later (M=17.1 days, SD=4.36). A 2-week interval was selected for the test–retest administration because this interval is long enough to minimize recall of responses provided at the original assessment, but sufficiently brief both to reduce the likelihood of new sexual behaviors between the test and the retest assessments and to minimize the nonoverlapping portion of the reporting periods for recent sexual behaviors (e.g., in the past 3 months). Indeed, some researchers have suggested a 2-week interval as ideal for test–retest studies (e.g., Nunnally, 1978; Wiederman, 2002) for these reasons.
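The trade-off behind the choice of retest interval can be illustrated with a short calculation. The following is a minimal sketch, not part of the study's procedures: the function name is hypothetical, and treating the 3-month recall window as 90 days is an assumption made only for illustration.

```python
def overlap_fraction(window_days: float, gap_days: float) -> float:
    """Share of the retest recall window that was also covered at the first test.

    window_days: length of the recall window (e.g., "past 3 months").
    gap_days: days elapsed between the test and the retest interview.
    """
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    # Days covered by both recall windows; zero once the gap exceeds the window.
    shared_days = max(window_days - gap_days, 0.0)
    return shared_days / window_days


# Assumption: "3 months" is treated as 90 days; 17.1 days is the mean interval reported above.
print(f"{overlap_fraction(90, 17.1):.0%} of the recall window is shared")

# A 6-month gap with the same window leaves no shared reporting period at all.
print(overlap_fraction(90, 180))
```

Under these assumptions, roughly four fifths of the 3-month reporting period is common to both interviews, whereas a 6-month gap would leave the two reports describing entirely different periods.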

Interviews were conducted in a private room at each recruitment site. Each youth received $30 for his or her participation at both the initial and the retest assessments. Interviews were conducted by an ethnically diverse group of college-educated male and female interviewers who were purposefully matched to participants on gender, but not necessarily on race/ethnicity. No attempt was made to have the same interviewer conduct the baseline and retest interviews.

Every interviewer received 20 hr of training on conducting interviews on sexually sensitive topics and interviewing techniques (e.g., probing for accuracy of responses, tracking the logical consistency of responses over the course of the interview, building rapport with the youths; Dugan & Meyer-Bahlburg, 2003). Training was conducted by experts in the area of sexuality assessment. As part of their training, each interviewer conducted four practice interviews. Audiotaped interviews were monitored throughout the study to ensure quality and consistency. Interviewers received feedback from the researchers in both individual and group supervision.

Measures

Sexual behavior (both lifetime and in the past 3 months), sexual identity, sexual orientation, and psychosexual developmental milestones were assessed with the Sexual Risk Behavior Assessment Schedule for Homosexual Youths (SERBAS-Y-HM; Meyer-Bahlburg, Ehrhardt, Exner, & Gruen, 1994). The SERBAS-Y-HM is a semi-structured interview schedule with male (M-1) and female (F-1) versions. The SERBAS-Y-HM consists of approximately 300 items, but because of skip patterns throughout the interview, the number of items administered is dependent on the responses reported by each youth. It requires approximately 45 min to administer. The current version of the SERBAS-Y-HM is based on an earlier version of the SERBAS-Y for gay/bisexual male youths (Meyer-Bahlburg, Ehrhardt, Exner, & Gruen, 1988). Revisions were based on focus groups with GLB youths at community-based agencies serving these youths and discussions with staff serving these youths.

Lifetime Sexual Behaviors

A series of items assessed the lifetime prevalence of various sexual behaviors, including the number of sex partners, number of sexual encounters, sex in exchange for goods, and sexual partners at risk for HIV infection. After defining the various sexual behaviors to be assessed in the survey and the youths’ own terminology for each behavior, youths were asked to “count up” all the same-sex partners with whom they had “any kind of sex within their whole lifetime.” This was followed by a question about the total number of times they had sex with these partners. The lifetime prevalence of exchanging sex for goods was assessed by asking youths whether they had ever received money, drugs, or a place to stay from a same-sex partner in exchange for sex. Questions also assessed whether youths had ever given money, drugs, or a place to stay in exchange for sex, but no youths reported this behavior. Experiences with potentially risky sexual partners were assessed by asking youths whether they had ever had a same-sex partner who had injected drugs, had a sexually transmitted disease, or had tested positive for HIV/AIDS. Lesbian and bisexual female youths were asked whether they had ever had a sexual partner who was a gay or bisexual male. With the exception of this last question, identical questions were asked regarding the same behaviors with other-sex partners. The other-sex questions for these and all other subsections always followed the same-sex questions.

Recent Sexual Behaviors

A series of items assessed the prevalence of various sexual risk behaviors in the past 3 months. After requesting personally relevant events to clarify the 3-month period of interest, youths were asked whether they had any sex (previously defined for them) with a same-sex partner in the past 3 months. If appropriate, youths were then asked to “count up” the number of same-sex sexual partners they had in the past 3 months. Youths were subsequently asked the number of times they had engaged in various sexual behaviors with each of these same-sex partners (separately for active/insertive and passive/receptive), including vaginal–digital sex (for females only), oral sex, oral–anal sex, and anal sex (for males only). The total number of episodes for each sexual behavior was computed by adding the number of passive and active encounters. Additional items assessed, for each behavior, the number of sexual encounters in which condoms or other appropriate HIV barrier methods were used and the number of encounters in which the youths used drugs or alcohol right before or during sexual activity. We computed the number of unprotected sexual encounters by subtracting the number of protected encounters from the total number of encounters. Corresponding data on other-sex sexual behaviors were also collected. However, with the exception of the overall prevalence of any sex with the other sex, specific sexual behaviors with the other sex were too infrequent for reliability analysis; only 19% of youths reported any recent other-sex sexual behaviors.
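The derived counts described above (total episodes as active plus passive, and unprotected episodes as total minus protected) can be sketched as follows. This is an illustrative sketch only: the class, field, and function names are hypothetical and do not come from the SERBAS-Y-HM, and summing across partners is an assumption about how per-partner reports were aggregated.

```python
from dataclasses import dataclass


@dataclass
class BehaviorReport:
    """Counts for one sexual behavior with one partner in the past 3 months."""
    active: int      # episodes in the active/insertive role
    passive: int     # episodes in the passive/receptive role
    protected: int   # episodes in which a barrier method was used

    @property
    def total(self) -> int:
        # Total episodes = active encounters + passive encounters.
        return self.active + self.passive

    @property
    def unprotected(self) -> int:
        # Unprotected episodes = total episodes - protected episodes.
        if self.protected > self.total:
            raise ValueError("protected episodes cannot exceed total episodes")
        return self.total - self.protected


def summarize(reports):
    """Sum total and unprotected episodes across partners (assumed aggregation)."""
    total = sum(r.total for r in reports)
    unprotected = sum(r.unprotected for r in reports)
    return total, unprotected


# Hypothetical reports for two partners.
partners = [BehaviorReport(active=3, passive=2, protected=4),
            BehaviorReport(active=1, passive=0, protected=0)]
print(summarize(partners))
```

The consistency check in `unprotected` mirrors the kind of logical-consistency tracking that interviewers were trained to perform, although here it is applied to the recorded counts only.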

Sexual Identity

A single item assessed sexual identity, “When you think about sex, do you think of yourself as lesbian/gay, bisexual, or straight?” Youths who rejected these identities were coded as “other.”

Sexual Orientation

Sexual orientation was assessed with three items that asked youths to indicate the degree to which in the past 3 months their recent sexual attractions, thoughts, or fantasies focused on the same sex or the other sex: (1) when in the presence of other individuals in a public setting (i.e., sexual attractions), (2) when masturbating, dreaming, or daydreaming (i.e., sexual fantasies), and (3) when viewing erotic materials in films, magazines, or books (i.e., erotica). A 7-point, Kinsey-type response scale was used ranging from 0 (always girls/women) to 6 (always guys/men), with a midpoint 3 indicating equally guys/men and girls/women. The scale was reversed for female youths. Youths who indicated not experiencing the assessed event were coded as such. The mean of these three items was computed as an assessment of overall cognitive sexual orientation (Cronbach's α=.92 in the initial assessment of the reliability subsample).
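The scoring of the three orientation items can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring code used in the study: the function names are hypothetical; reversal for female youths is assumed to be 6 minus the recorded value; items coded as "not experienced" (represented here as None) are assumed to be omitted from the mean; and Cronbach's α is computed with the standard formula on complete cases.

```python
from statistics import mean, variance


def orientation_score(items, female):
    """Mean of the answered 0-6 items; None marks an event not experienced."""
    answered = [x for x in items if x is not None]
    if not answered:
        return None
    if female:
        # Assumed reversal so that higher scores have the same meaning for both genders.
        answered = [6 - x for x in answered]
    return mean(answered)


def cronbach_alpha(rows):
    """Standard Cronbach's alpha; rows are respondents, columns are items."""
    k = len(rows[0])
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: (attractions, fantasies, erotica).
print(orientation_score([5, 6, None], female=False))
print(orientation_score([1, 0, 2], female=True))
print(cronbach_alpha([[6, 6, 5], [3, 4, 3], [5, 5, 6], [1, 0, 1]]))
```

Because α depends on complete responses to all three items, any handling of "not experienced" codes (dropping the respondent, as assumed here, or another rule) would need to be stated when reporting it.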

Psychosexual Developmental Milestones

The youths were asked the ages when they first experienced various milestones in the development of sexual orientation, sexual behavior, and sexual identity. They were asked the ages when they were first (1) erotically attracted to, (2) had thoughts or fantasies about, and (3) were aroused by erotica focused on the same sex. Similar items assessed ages at which youths first experienced attractions, fantasies, and erotic arousal toward the other sex. Youths were asked the ages when they first engaged in various sexual behaviors with the same sex and the ages when they first engaged in various sexual behaviors with the other sex. On the basis of these responses, the minimum age reported was used as the age when they first had any sex with the same sex and the age when they first had any sex with the other sex. Finally, youths were asked about the ages when they first thought they “might be” bisexual, when they thought they “might be” gay/lesbian, when they thought they “really were” bisexual, and when they thought they “really were” gay/lesbian. Youths who indicated not experiencing the assessed event were coded as such.

Data Analysis

Test–retest reliability was computed using kappa (κ) for categorical variables (Cohen, 1968) and intraclass correlations (ICC) for continuous variables (e.g., Bartko, 1966). The rationale for the use of kappa and ICC over Pearson or Spearman correlations has been argued elsewhere (e.g., Schroder et al., 2003). Briefly, although interclass correlations (e.g., Pearson, Spearman) are appropriate for examining the relation between two independent variables, these correlations are inappropriate when the two variables share variance (e.g., two assessments of the same variable). In cases of common variance, intraclass correlations are used (e.g., McGraw & Wong, 1996). Because correlation coefficients are asymmetrically distributed, correlations were transformed using Fisher's r-to-z transformation (Hays, 1994), averaged, and then back-transformed to correlations, so that mean reliability coefficients could be obtained for each domain.
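The three computations named above can be sketched as follows. This is a hedged illustration rather than the analysis code used in the study. Several choices are assumptions: the kappa shown is the unweighted form; the ICC shown is the one-way random-effects form for two assessments, since the specific ICC model is not stated here; and coefficients of exactly ±1 are clipped before the r-to-z transformation (which is undefined at ±1), with the clipping value chosen arbitrarily. All function names are hypothetical.

```python
import math
from collections import Counter


def cohen_kappa(test, retest):
    """Unweighted kappa for two categorical assessments of the same people."""
    n = len(test)
    observed = sum(a == b for a, b in zip(test, retest)) / n
    c1, c2 = Counter(test), Counter(retest)
    # Chance agreement from the marginal category proportions.
    expected = sum(c1[c] * c2[c] for c in c1.keys() | c2.keys()) / n ** 2
    return (observed - expected) / (1 - expected)  # undefined if expected == 1


def icc_oneway(test, retest):
    """One-way random-effects ICC for k = 2 assessments (assumed model)."""
    n, k = len(test), 2
    pairs = list(zip(test, retest))
    grand = sum(a + b for a, b in pairs) / (n * k)
    means = [(a + b) / k for a, b in pairs]
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    ms_within = sum((a - m) ** 2 + (b - m) ** 2
                    for (a, b), m in zip(pairs, means)) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


def mean_reliability(coefficients, clip=0.999):
    """Average coefficients via Fisher's r-to-z, then back-transform."""
    # Assumption: clip so that atanh is defined for coefficients of exactly 1.
    zs = [math.atanh(max(-clip, min(clip, r))) for r in coefficients]
    return math.tanh(sum(zs) / len(zs))


# Hypothetical data for illustration only.
print(cohen_kappa(["gay", "gay", "bi", "bi"], ["gay", "gay", "bi", "gay"]))
print(icc_oneway([2, 5, 9, 14], [3, 5, 8, 15]))
print(mean_reliability([0.6, 0.9]))
```

Note that the z-based mean of .6 and .9 is somewhat higher than their arithmetic mean of .75, because the transformation gives more weight to coefficients near 1; this is the intended effect of averaging on the z scale.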

RESULTS

Lifetime Sexual Behaviors

Test–retest reliabilities of self-reported lifetime prevalence of sexual behaviors are presented in Table I. Overall, youths were found to reliably report lifetime prevalence of sexual behaviors (M=.89, range .69–1.00). The lifetime number of same-sex sexual partners (ICC=.96) and the prevalence of exchanging sex for goods with a same-sex partner (κ=1.0) were among the most reliably reported. The one exception to this trend was youths’ reports of the lifetime number of same-sex sexual encounters. The moderate reliability found for this variable (ICC=.49) was attributable to a low value among the female youths (ICC=.41) as compared with male youths (ICC=.81). Indeed, examination of this observed difference in the reliability coefficients indicated that although female youths were found to provide somewhat more reliable reports than male youths (M=.94 versus .88, respectively), female youths had a wider range of reliability coefficients (.41–1.00) than did male youths (.64–1.00) on reports of lifetime sexual behaviors.

Table I. Test–Retest Reliability of Reports of Lifetime Sexual Behaviors

Recent Sexual Behaviors

The reliability coefficients of sexual risk behaviors in the past 3 months are presented in Table II. Overall, youths reliably reported recent sexual risk behaviors (M=.96, range = .68–1.00), with male and female youths having nearly identical reliability (M=.96 and .94, respectively). Indeed, the prevalence of sexual behavior with an other-sex sexual partner (κ=1.0), the number of same-sex partners (ICC=.96), and the number of same-sex encounters (ICC=.91) were the most reliably reported. Reports of unprotected sex were all quite reliable (M=.93, range = .77–.99), with no apparent gender differences (M=.91 for males and .94 for females). Two gender-specific exceptions to this general pattern should be noted. First, youths (particularly female youths, κ=.60) were only moderately reliable in their reports of whether they had a same-sex sexual encounter in the past 3 months. Second, whereas reports of vaginal–digital, oral, and oral–anal sexual behaviors while on alcohol or drugs were generally reliable (range = .69–1.0), the reliability of reports of anal sex while using drugs or alcohol (which was asked only of male youths) was found to be poor (ICC = −.01–.24).

Sexual Identity, Sexual Orientation, and Developmental Milestones

The reliability coefficients of self-reported sexual identity, sexual orientation, and psychosexual developmental milestones are presented in Table III. Youths reliably reported (κ=.89) their sexual identity as gay/lesbian, bisexual, or other. Similarly, youths’ reports of sexual orientation were reliable when assessed as attractions to others in public and in their fantasies (ICC range = .85–.89), but both male and female youths were only moderately reliable in their reports regarding erotica (ICC range = .63–.66). Youths reliably reported the ages at which they experienced various psychosexual developmental milestones (M=.77, range = .66–.88), with female youths somewhat more reliable than male youths (M=.85 and .77, respectively). One gender-specific exception was noted: female youths were found to have only moderate reliability (ICC=.45) in reporting the age when they first were sexually “turned on” by same-sex erotica.

Table II. Test–Retest Reliability of Reports of Sexual Behaviors in the Past 3 Months

DISCUSSION

Despite the importance of reliable sexual information regarding GLB individuals, the current study, as far as we know, represents the first test–retest study of various aspects of sexuality among GLB youths. Overall, substantial to almost perfect reliability was obtained using the SERBAS-Y-HM among GLB youths on a variety of aspects of their sexuality, including lifetime sexual behavior, recent sexual behavior, unprotected sexual risk behavior, sexual identity, sexual orientation, and ages of psychosexual developmental milestones. The reliability found here is substantially higher than that found in most past research among primarily heterosexual adolescents or GLB adults.

Table III. Test–Retest Reliability of Reports of Sexual Identity, Sexual Orientation, and Psychosexual Developmental Milestones

Two potential explanations exist for the strong reliability found in this study. First, the SERBAS-Y-HM includes strategies that have been recommended by experts in sexual behavior assessment to enhance the reliability and validity of the behaviors assessed, including (1) defining sexual terms (e.g., what do you mean by “sex”; Wiederman, 2002), (2) using nontechnical jargon by exploring and using the youths’ own language and terms for sexual behaviors (e.g., “tossing salad”; Catania et al., 1990), (3) focusing on a short, 3-month recall assessment (Schroder et al., 2003), (4) using participant-nominated events in order to personally anchor and clarify the assessment window (Weinhardt et al., 1998), (5) assessing behaviors with respect to each specific partner, and (6) utilizing qualitative research to inform item content and language (Weinhardt et al., 1998). Second, the interviewers were highly trained and experienced with the administration of the SERBAS-Y-HM, comfortable with discussing sexual topics, and comfortable with the GLB population. Unfortunately, it is impossible to determine which aspects of the SERBAS-Y-HM or the interviewer training played critical roles in the reliability of the reports assessed here. Nevertheless, researchers are encouraged to employ measures that, like the SERBAS-Y-HM, incorporate strategies to enhance the reliability and validity of self-reported sexual information.

Despite the generally high reliability found among these youths, some exceptions were noted. Although there were generally few observed differences in the reliability of male and female youths’ reports, instances of moderate or low reliability were often gender-specific. For example, female youths were found to have only fair agreement on the number of same-sex sexual encounters in their lifetime, whereas male youths provided almost perfect reliability on this question. In contrast, male youths were found to provide poor reliability in their reports of anal sex while using alcohol or drugs (female youths were not asked about anal sex). This poor reliability may be due to the rarity of this behavior, which reduced the sample size and potential variability. Nevertheless, it should be noted that the number of moderate or low reliabilities observed was smaller than would be expected by chance. Future research must determine whether the low reliabilities are chance findings or indicate problematic measurement.

Unreliable findings also have serious methodological implications for sex research in general. For example, youths’ reports showed only moderate reliability (κ = .77 for males and .60 for females) on whether they had any same-sex sexual behavior in the past 3 months. Although this would suggest that youths are only moderately able to recall their recent sexual behaviors, in fact, they provided highly reliable reports of the number of recent partners and the number of recent specific sexual acts (e.g., vaginal, oral, anal; with or without a condom). This inconsistency suggests that perhaps, despite our efforts to clarify what we meant by “sex,” some youths were confused by this general term, but not when asked about specific behaviors. Thus, the use of general questions may be unreliable and research should focus on specific sexual behaviors. This would also imply that general questions should not be used to determine whether to skip a section of more detailed sexual inquiry; instead, specific behaviors should be assessed, regardless of any response to more general sex questions.

Given the recent advances in computer-assisted interviewing (e.g., Audio-CASI), some may question whether the use of a face-to-face interview for the assessment of sexual behavior is a reliable and valid method of assessment. Indeed, many have suggested that the greater privacy afforded by Audio-CASI assessments would increase the reliability and validity of self-reported sexual behavior (for review, see Schroder et al., 2003). Although some research has indicated that Audio-CASI results in more reports of potentially stigmatizing sexual behaviors than do face-to-face interviews (Des Jarlais et al., 1999), most of the research has identified only a small number of differences between interviews and Audio-CASI in the reports of sexual behaviors (Ellen et al., 2002; Macalino, Celentano, Latkin, Strathdee, & Vlahov, 2002; Metzger et al., 2000; Williams et al., 2000). Indeed, some of these observed differences are in the opposite direction, with more sexual behaviors disclosed via face-to-face interviews than with Audio-CASI (Ellen et al., 2002; Jennings, Lucenko, Malow, & Devieux, 2002; Williams et al., 2000). Furthermore, at least some past research has suggested that test–retest reliability of sexual behavior is greater in face-to-face interviews than when using Audio-CASI (Williams et al., 2000). Although it is unclear whether Audio-CASI results in more reliable and valid assessments of sexual behavior, face-to-face interviews may have some potential advantages in some populations, such as among those with low educational background or those who are uncomfortable using computers. Face-to-face interviews have the added benefits of allowing for exploring the individuals’ own terms for various sexual behaviors, perceiving possible confusion and clarifying questions, exploring potential logical inconsistencies, and building trust and rapport with the participant, none of which are adequately duplicated with the use of Audio-CASI. Indeed, this report provides evidence that sexual information can be reliably obtained via face-to-face interviews, and earlier reports from this study using the SERBAS-Y-HM provide evidence of the construct validity of this interviewer-administered assessment (e.g., Rosario, Hunter, Maguen, Gwadz, & Smith, 2001; Rosario, Mahler, Hunter, & Gwadz, 1999; Rosario, Schrimshaw, & Hunter, 2004).

The present sub-study has limitations. First, the sample size for the test–retest study was limited. Although we had a sufficient sample to examine reliability separately for male and female youths, we had insufficient numbers to examine potential ethnic/racial differences in reliability. A second limitation is that the sample was recruited from GLB-focused organizations in a major urban area. As such, these GLB youths may not be representative of the population of GLB youths. These youths may have been further along in the development of their GLB identity and more comfortable discussing their sexuality than youths who might not be involved in GLB organizations. As such, these youths’ reports may have been more reliable than might be found among samples less comfortable with their sexuality. Similarly, the findings from this ethnically diverse and urban sample may not generalize to other GLB populations. A third potential limitation is the use of a 2-week test–retest period. Although the 2-week retest is recommended by psychometric texts to limit recall (e.g., Nunnally, 1978) and is sufficiently brief to help ensure that new behaviors did not occur between test and retest (thereby biasing the reliability estimates), this brief retest period might still have allowed participants to recall their original responses, thereby artificially increasing their reliability coefficients. As such, future reliability research may wish to employ longer test–retest periods to determine whether the reliability in reports observed here is replicated over longer periods (but not so long as to assess behaviors in two nonoverlapping time periods). Finally, although this report demonstrated that GLB youths were able to reliably report sexual information, this study does not provide any information about the validity of these reports. Although reliability is necessary for validity, it is not sufficient. Thus, the high reliabilities identified here are not necessarily indicative that youths were accurate in their reports of sexual information. Future research into the validity of sexual reports is needed.

Despite these limitations, the findings provide preliminary but critical information regarding the reliability of self-reported sexual information among GLB youths. However, given the importance of reliable reports of sexual information and the scarcity of empirical reports examining reliability, future research is needed into the reliability of self-reported sexual information among all groups, including adolescents and GLB individuals.