Introduction

A long-standing question in survey research has been: When do people feel comfortable enough to provide honest answers to sensitive questions (Hyman, 1944; Parry & Crossley, 1950)? The results of surveys covering topics considered to be sensitive or stigmatized, such as a person’s abortion history (Fu, Darroch, Henshaw, & Kolb, 1998; Jones & Forrest, 1992; Jones & Kost, 2007), criminal record (Kleck & Roberts, 2012; Preisendörfer & Wolter, 2013), drug use (Aquilino, 1994; Johnson & Fendrich, 2005; Morral, McCaffrey, & Iguchi, 2000; Tourangeau & Smith, 1996), or sexual inclinations (Coffman, Coffman, & Ericson, 2016; Gates, 2013a; Tourangeau & Smith, 1996; Villarroel et al., 2006), have been found to be especially prone to systematic distortions. For example, less than half of induced abortions in the U.S. are reported (Jones & Kost, 2007), over one-third of subjects from a German sample provided inaccurate reports about their criminal history (Preisendörfer & Wolter, 2013), and over 25% of students sampled from a US university did not report the failing or near failing grades that were, in fact, part of their academic history (Kreuter, Presser, & Tourangeau, 2008). One meta-analysis of self-report validation studies, covering 38 studies and 226 sensitive questions, estimated that 42% of sensitive behaviors are not reported in surveys (Lensvelt-Mulders, Hox, & Van Der Heijden, 2005).

This perpetual difficulty in producing valid and accurate prevalence estimates has been largely attributed to the fact that people typically underreport sensitive or stigmatized information, while emphasizing or exaggerating socially valued attributes, attitudes or behavior (Barnett, 1998; Crowne & Marlowe, 1964; Krumpal, 2013; Lee, 1993; Paulhus, 2002; Rasinski, Willis, Baldwin, Yeh, & Lee, 1999; Singer, Von Thurn, & Miller, 1995; Stocké & Hunkler, 2007; Tourangeau, Rips, & Rasinski, 2000; Tourangeau & Smith, 1996)—a phenomenon that has been labeled “social desirability bias” (DeMaio, 1984; Paulhus, 1984, 1986). The strategy that people employ to manage their stigmatized social identities is known as “passing” (Kanuha, 1999) and is defined as “the management of undisclosed discrediting information about self” (Goffman, 1963, p. 42).

Among sensitive topics, surveys on sexual orientation are especially difficult to conduct for several reasons. First, unlike other sensitive behaviors (e.g., abortions or criminal history), there are no objective records of the public’s sexual attractions, fantasies, behaviors, or identities, and objective measurements are virtually impossible to obtain due to the discreet and somewhat subjective nature of sexual orientation. Second, there are unique complications associated with the wording of sexual orientation-related questions and response options (Badgett, 2009; Savin-Williams, 2006; Vrangalova & Savin-Williams, 2012), as well as complications that can arise from the fluid and situationally dependent nature of a person’s sexual orientation identity (Diamond, 2008; Kanuha, 1999; Katz-Wise, 2015; Mock & Eibach, 2012). Third, questions about one’s sexual orientation are particularly sensitive due, in part, to a specific form of social desirability bias—heteronormativity (Warner, 1993). Heteronormativity is the preconception that heterosexuality is the socially accepted norm and that those who diverge from the norm are not only deviant but undesirable.

Although recent reports suggest that societal acceptance of the LGBT community is on the rise in the U.S. (Pew Research Center, 2013), the effects of heteronormativity are still prevalent, with 39% of LGBT adults in the U.S. reporting having been rejected by a family member or close friend because of their sexual orientation or gender identity, and 58% reporting having been subjected to slurs and jokes (Pew Research Center, 2013). Furthermore, 50% of 40,117 respondents in 40 countries believe homosexuality is “morally unacceptable” (Pew Research Center, n.d.), and in many countries people who are believed to be non-heterosexual have been subjected to indignities ranging from discrimination and hate crimes to enforced psychiatric treatments, torture, and execution (Itaborahy & Zhu, 2014; Kitzinger, 2005).

Obtaining Accurate Measures of Sexual Orientation

Given the particularly high sensitivity surrounding questions regarding one’s sexual orientation (Coffman et al., 2016; Gates, 2013a, b) and the importance of obtaining accurate estimates of the non-heterosexual population for policy makers (Graham et al., 2011; Mayer et al., 2008; Sell & Holliday, 2014), an expert panel of survey specialists and sexual orientation researchers was recently convened by the Ford Foundation to develop a guide to asking questions about sexual orientation on surveys. After a multi-year effort, these experts concluded that, when possible, sexual orientation-related questions should be placed on the self-administered portions of a survey (Badgett, 2009).

This advice is supported by the findings of researchers from a variety of fields who have found that in the U.S. the use of survey modes that increase a sense of anonymity in respondents can mitigate impression management (Dwight & Feigelson, 2000) and can increase reporting of sensitive or stigmatized beliefs, behaviors, and identities (Dillman, Smyth, & Christian, 2014; Durant, Carey, & Schroder, 2002; Krumpal, 2013; Villarroel et al., 2006). Researchers have also found that use of self-administered survey modes—such as self-administered questionnaires (SAQs), computer-assisted self-interviews (CASIs), and audio computer-assisted self-interviews (ACASIs)—can increase the likelihood of respondents reporting they have engaged in stigmatized or embarrassing behaviors when compared with interviewer-administered survey modes—such as face-to-face interviews (FTF), computer-assisted personal interviews (CAPI), and computer-assisted telephone interviews (CATI) (Aquilino & LoSciuto, 1990; Chang & Krosnick, 2010; Jones & Forrest, 1992; O’Reilly, Hubbard, Lessler, Biemer, & Turner, 1994; Tourangeau & Smith, 1996, 1998; Turner et al., 1998; Turner, Lessler, & Devore, 1992; Turner, Ku, Sonenstein, & Pleck, 1996).

Booth-Kewley, Larson, and Miyoshi (2007) found this to be especially true of computer-based self-administered surveys because “computers create an impersonal social situation in which individuals feel more anonymous, more private, more self-absorbed, less inhibited, and less concerned about how they appear to others” (p. 3), and this view has been echoed by others (e.g., Badgett, 2009; Baker, Bradburn, & Johnson, 1995; Bradburn et al., 1991; Buchanan, 2000; Gnambs & Kaspar, 2015; Joinson, 1999; Keeter, McGeeney, Igielnik, Mercer, & Mathiowetz, 2015; Lucas, Gratch, King, & Morency, 2014; Tourangeau & Smith, 1996; Tourangeau & Yan, 2007; Trau, Härtel, & Härtel, 2013). While some early studies found no significant advantage to computer-based self-administered surveys over traditional SAQs (Locke & Gilbert, 1995; Wright, Aquilino, & Supple, 1998), a meta-analysis published in 2000 of 30 studies yielding 77 effect sizes concluded that computer-based surveys were somewhat better than other measures (including SAQs) in increasing the self-disclosure of sensitive behaviors (Dwight & Feigelson, 2000), and a recent meta-analysis of 39 studies yielding 460 effect sizes confirmed this finding, adding that the effect was strongest for the most sensitive behaviors (Gnambs & Kaspar, 2015).

Despite the evidence in favor of using non-intrusive survey methodologies, and the availability of more accurate methods for assessing sexual orientation (Cerny & Janssen, 2011; Chivers, Rieger, Latty, & Bailey, 2004; Chivers, Seto, & Blanchard, 2007; Savin-Williams, 2006), the majority of national surveys conducted in the U.S. have utilized relatively intrusive methodologies to ask about sexual identity, if they ask any sexual orientation questions at all (Sell & Holliday, 2014).

Taking the above factors into account, we set out to examine (1) the variability in 12 recent estimates of non-heterosexual identity prevalence in the U.S. as a function of the privacy and anonymity granted by the survey modes they employed, and (2) the comfort level that people report when asked to consider answering questions about their sexual orientation through 16 different survey modes.

Study 1: Non-Heterosexual Prevalence Estimates in the U.S.

We identified 12 nationwide surveys conducted between 2004 and 2015 that provided estimates of sexual orientation prevalence in the U.S. For each survey, we extracted the available sexual orientation identity estimates and examined the methods used to obtain them. Breakdowns of the estimates each survey produced by mode and response option are shown in Table 1 (see Footnote 1).

Table 1 Breakdown of non-heterosexual (NH) estimates by survey, mode, and response category

Method

We report the non-heterosexual estimates for each survey by taking the sum of explicit non-heterosexual identity survey responses (“Homosexual,” “Gay or Lesbian,” “Bisexual,” or “Something else”) and excluding ambiguous or non-responses (“Not sure,” “Don’t know,” and refusals to answer). Though research has shown that respondents select non-response options such as “I prefer not to answer” significantly more often for sensitive questions than for non-sensitive ones (Joinson, Woodley, & Reips, 2007), item non-response has not been shown to be a reliable indicator of item sensitivity, and non-responses cannot be assumed to be the result of social desirability (Beatty & Herrmann, 2002).
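
To make this estimation rule concrete, the sketch below applies it to a set of hypothetical response counts (the actual counts for each survey appear in Table 1):

```python
# A minimal sketch of the rule described above; all counts are hypothetical.
NH_LABELS = {"Homosexual", "Gay or Lesbian", "Bisexual", "Something else"}
EXCLUDED = {"Not sure", "Don't know", "Refused"}  # ambiguous and non-responses

def nh_prevalence(counts):
    """Share of valid respondents reporting an explicit non-heterosexual identity."""
    valid = {k: v for k, v in counts.items() if k not in EXCLUDED}
    nh = sum(v for k, v in valid.items() if k in NH_LABELS)
    return nh / sum(valid.values())

# Hypothetical survey of 10,000 respondents
example = {"Heterosexual": 9300, "Gay or Lesbian": 180, "Bisexual": 250,
           "Something else": 70, "Not sure": 120, "Refused": 80}
print(f"{nh_prevalence(example):.1%}")  # -> 5.1%
```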

In order of the size of the prevalence estimates each survey produced, the surveys were: a 2004–2005 National Epidemiologic Survey on Alcohol and Related Conditions Wave 2 (NESARC), the 2013 National Health Interview Survey (NHIS), the 2004–2006 National Survey of Midlife Development in the United States (MIDUS II), the 2010 National Intimate Partner and Sexual Violence Survey (NISVS), a 2012 survey by the Gallup organization (Gates & Newport, 2012), the 2006–2010 National Survey of Family Growth (NSFG), the 2014 General Social Survey (GSS), the 2009–2012 National Health and Nutrition Examination Survey (NHANES), the 2010 National Survey of Sexual Health and Behavior (NSSHB), a 2015 online survey by the YouGov organization (Moore, 2015), a 2013 online survey (Coffman et al., 2016) administered on Amazon’s Mechanical Turk (AMT)—an online subject pool frequently used by psychology and behavior researchers (Berinsky, Huber, & Lenz, 2012), and a 2012 survey that was freely available online (Epstein, McKinney, Fox, & Garcia, 2012).

Results

Among the 12 surveys we examined, we found that estimates of non-heterosexual identities were lower among surveys using interviewer-administered modes—with estimates ranging from 1.4 to 3.4% for FTF, CAPI, CATI, and telephone interviews—than among surveys using self-administered modes—with estimates ranging from 3.0 to 4.8% for SAQ, CASI, and ACASI (Table 1). Estimates obtained using Internet-based self-administered survey modes were the highest, with estimates ranging from 7.2 to 22.2% (Table 1). The prevalence of self-identified bisexuals appears to be especially predictable from the intrusiveness of the survey mode employed (Table 1). This could be due, in part, to a phenomenon known as “binegativity,” which is defined as negative attitudes toward bisexuals from both males and females, and both heterosexuals and homosexuals (Feinstein, Dyar, Bhatia, Latack, & Davila, 2016; Rust, 2002). There are, however, several factors other than mode effects that could have contributed to the distortion of these estimates.

In terms of sampling, coverage, and response rates, the Coffman et al. (2016) and Epstein et al. (2012) surveys used self-selecting samples, MIDUS II was a follow-up survey that had a 30% attrition rate, and the NISVS and NSSHB had low response rates, which may indicate a higher degree of self-selection than is typical (Table 2). However, low response rates are not necessarily indicative of a biased sample (Chang & Krosnick, 2009; Curtin, Presser, & Singer, 2000; Keeter, Miller, Kohut, Groves, & Presser, 2000). For six of the 12 surveys we examined, we could not locate a response rate, either because none was published or because the researchers did not provide one (Table 2). Given that the mean age of people who identify as LGBT in national surveys is often lower than the mean age of people who identify as heterosexual (Gates, 2014a), it is possible that the restricted age ranges of the NESARC Wave 2, MIDUS II, NSFG, and NHANES samples affected their estimates (Table 2). While the Coffman et al. (2016) study was posted on AMT, the Epstein et al. (2012) survey was freely available online and received both mainstream and LGB media coverage. Given that non-heterosexuals might be more likely to self-select to take an online test of this sort, the high figure obtained in this study probably overestimated the prevalence of non-heterosexuality in the general population.

Table 2 Sampling and coverage details of U.S. non-heterosexual prevalence estimates

There was also a high degree of variability in the question wording and response options provided by each survey: MIDUS II and NHANES conflated sexual attraction with sexual orientation identity, and Gallup conflated gender with sexual orientation identity (Table 3). In addition, only NHIS and NHANES provided separate response options depending on the respondent’s gender, and several of the surveys did not provide any type of “Other” response option (Table 3). Order effects could also have distorted the estimates, since only Gallup, GSS, and NHIS listed a non-heterosexual category as the first response option (Krosnick & Alwin, 1987). Unlike the rest of the surveys, which offered various sexual orientation labels as response options, Gallup and Coffman et al. (2016) only offered “yes,” “no,” or “don’t know” responses. The questions preceding the sexual orientation question could also have affected the estimates, but what their impact might have been is uncertain (Bradburn & Mason, 1964; Lee, McClain, Webster, & Han, 2016).

Table 3 Question and response options used in U.S. non-heterosexual prevalence estimates

In addition to the above areas, there were several areas of potential distortion that were unclear from the resources we were able to find. For example, whether participants were called by a computer or a person in the Gallup survey is not clear. When we contacted the Gallup organization to determine how participants were contacted, we were told that CATI, in which participants are asked questions over the phone by a human interviewer or an automated computer system and respond by pressing the number on their phone that corresponds to their answer, did not accurately describe Gallup’s telephone interviewing process. We were further told that “to protect our proprietary systems and avoid divulging information to market competitors, we do not share our processes and methodology” (Gallup Client Support, personal communication, October 18, 2013). The NHIS also asked an unreported proportion of subjects the sexual orientation question over the phone (B. W. Ward, personal communication, April 13, 2015), which could have resulted in a mix of mode effects. The YouGov survey did not provide specific details about the sampling and weighting methods used, and its small sample size, as well as its relatively large margin of error of 4.2%, makes the validity of its estimate somewhat suspect (Moore, 2015).

Taking the above distortions into account, we believe that the six surveys that were least problematic and easiest to compare are NESARC, NHIS, NISVS, GSS, NHANES, and NSSHB. There are still notable differences among these surveys other than the mode of administration (including the year NESARC was conducted, the low response rate of NISVS, and the limited age range of NHANES), but given the paucity of data on non-heterosexual prevalence in the U.S. (Sell & Holliday, 2014), we believe these six surveys allow for the best possible comparison of survey mode as it affects estimates of non-heterosexual prevalence (Fig. 1). Utilizing Welch’s t test and a version of Cohen’s d modified for unequal variances (Bonett, 2008), we found that the mean estimate for surveys that involved an interviewer (M = 2.4%, SD = 1.0%) was significantly different from the mean estimate for surveys that were self-administered (M = 5.5%, SD = 1.4%) (t = 3.13, p < .05; Cohen’s d = 1.97, 95% confidence interval [CI] = 0.46, 3.50). Although we realize that in some respects we are comparing apples and oranges, the fact that there seems to be an orderly relationship in these studies between the intrusiveness of the survey mode and the magnitude of the prevalence found is suggestive. If we consider the nationally representative online NSSHB to be our ground truth—7.2% of the population identifies as non-heterosexual—then the prevalence of non-heterosexuals may have been underestimated by anywhere from 50%, in the case of the relatively non-intrusive GSS CASI survey, to as much as 414%, in the case of the highly intrusive NESARC FTF survey (Fig. 1). However, the cause of these differences cannot be directly or solely attributed to the mode in which the survey was conducted (Dillman et al., 2014). It is possible that other sensitive behaviors measured by these surveys would follow the same pattern. If comparable studies exist that do not fit this pattern, we are not aware of them.
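
As an illustration of this comparison, the following sketch recomputes Welch’s t and an unequal-variance effect size from per-survey estimates. Only the endpoint values for each group are stated above, so the interior values below are placeholders rather than the actual Table 1 figures, and the standardizer shown is one of several unequal-variance variants in the spirit of Bonett (2008):

```python
# A minimal sketch, assuming illustrative per-survey estimates;
# only the endpoints (1.4, 3.4, 7.2) come from the text.
import numpy as np
from scipy import stats

interviewer = np.array([1.4, 2.3, 3.4])  # FTF, CAPI, CATI (%); middle value hypothetical
self_admin = np.array([4.1, 4.7, 7.2])   # CASI, ACASI, Online (%); partly hypothetical

# Welch's t test (no equal-variance assumption)
t, p = stats.ttest_ind(self_admin, interviewer, equal_var=False)

# Cohen's d standardized by the root of the average of the two variances,
# one unequal-variance standardizer discussed by Bonett (2008)
d = (self_admin.mean() - interviewer.mean()) / np.sqrt(
    (self_admin.var(ddof=1) + interviewer.var(ddof=1)) / 2
)
print(f"t = {t:.2f}, p = {p:.3f}, d = {d:.2f}")
```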

Fig. 1 Non-heterosexual prevalence estimates by mode and response option from six of the least problematic and easiest to compare US nationwide surveys. The FTF mode corresponds to the NESARC estimate, CAPI corresponds to the NHIS estimate, CATI corresponds to the NISVS estimate, CASI corresponds to the GSS estimate, ACASI corresponds to the NHANES estimate, and Online corresponds to the NSSHB estimate

Among the surveys we examined, only the GSS provided data that allowed us to compare mode effects within a survey. We learned from the researchers that a small portion of the GSS respondents were interviewed with a CATI mode rather than a CASI mode (T. Smith, personal communication, April 11, 2015). When we compared the non-heterosexual prevalence estimates between the participants surveyed by these two modes and employed weights provided by the GSS, we found that a larger portion of respondents reported a non-heterosexual identity with the CASI mode (4.7%) than with the CATI mode (1.8%) (χ²(1) = 5.73, p < .05), a finding that supports the notion that self-administered survey modes can produce larger and more accurate non-heterosexual prevalence estimates (Badgett, 2009).
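
The sketch below illustrates this kind of 2 × 2 mode comparison. The counts are hypothetical, chosen only to match the 4.7% and 1.8% proportions, and, unlike the published comparison, no survey weights are applied:

```python
# A minimal sketch with hypothetical, unweighted counts; the
# published comparison applied the GSS-provided weights.
import numpy as np
from scipy.stats import chi2_contingency

#                   NH    heterosexual
table = np.array([[ 80,   1620],   # CASI: 80/1700  ~ 4.7%
                  [  4,    216]])  # CATI: 4/220    ~ 1.8%
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")
```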

The experimental design employed by Coffman et al. (2016) to produce two prevalence estimates is also relevant to the present study. Their participants were randomly placed into one of two treatment groups. In one group, the item count technique (ICT) (Holbrook & Krosnick, 2010; Miller, 1984; Tsuchiya, Hirai, & Ono, 2007) was used to reduce social desirability bias and encourage honest responding. ICT, also known as an “unmatched count” or “list response” technique, is a between-subjects method in which a control group—the “Direct Report” group in this study—indicates how many of N questions are true, while an experimental group—the “Veiled Report” group in this study—indicates how many of N + 1 questions are true. By including a sensitive question in the Veiled Report group, researchers are able to estimate the population mean for the N + 1st item—the sensitive question—by comparing the mean number of affirmative responses in the two groups. In this study, the N + 1st question was: “Do you consider yourself to be heterosexual?” and the only response options were “yes” or “no.” Subjects in the Direct Report group answered the N + 1st question directly, on its own separate page, after responding to the N-item list. Using this experimental design, the researchers found an 11.3% non-heterosexual prevalence estimate in the “Direct Report” group and an 18.6% estimate in the “Veiled Report” group, a 64.2% increase (Coffman et al., 2016). The rationale behind ICT suggests that the larger value, 18.6%, is the more valid estimate (Miller, 1984).
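
A short sketch may make the veiled estimator concrete. The item counts below are hypothetical, and the estimator shown is the standard ICT difference in means, applied here to the “heterosexual” item as described above:

```python
# A minimal sketch of the ICT (veiled) estimator; all counts hypothetical.
import numpy as np

direct = np.array([2, 1, 3, 2, 0, 2, 1, 3])  # "how many of N items are true"
veiled = np.array([3, 2, 4, 2, 1, 3, 2, 4])  # "how many of N + 1 items are true"

# The mean difference estimates the proportion for whom the added
# (sensitive) item is true; here that item is "I am heterosexual."
p_heterosexual = veiled.mean() - direct.mean()
p_non_heterosexual = 1 - p_heterosexual
print(f"Veiled non-heterosexual estimate: {p_non_heterosexual:.1%}")  # 12.5%
```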

The main problem with the Coffman et al. (2016) study is the sample. Although AMT has been shown to provide valid research results (Berinsky et al., 2012; Buhrmester, Kwang, & Gosling, 2011; Paolacci & Chandler, 2014; Peer, Vosgerau, & Acquisti, 2014), some evidence suggests AMT samples are not representative of the US population (Chandler & Shapiro, 2016). That said, the ICT methodology employed by Coffman et al. avoids two problems inherent in most sexual orientation surveys: (1) the heteronormativity problem is reduced because participants do not have to answer the sexual orientation question directly, and (2) because the test is not identified as a sexual orientation test, participants are unlikely to self-select based on their interest in sexual orientation. On the surface, the survey contained entirely innocuous questions on a variety of subjects; the sexual orientation question was simply slipped in.

From the estimates and methodologies we have gathered, it appears that little attention has been paid to the accumulating evidence on the importance of asking sensitive questions in particular ways (Badgett, 2009). The dramatic variability we have found in estimates of sexual orientation prevalence in the U.S. suggests that much of the prevalence data available to policy makers may be distorted by survey methods known to suppress the reporting of sensitive information.

This is alarming, because the need to collect accurate and reliable data on the LGBT population has been cited as a crucial step in researching and addressing disparities experienced in this diverse population (Bradford & Mayer, 2008; Cochran & Mays, 2006; Gates, 2013b; GLMA, 2001; Graham et al., 2011; Mayer et al., 2008; Sell & Becker, 2001; Sell & Holliday, 2014; Ward, Dahlhamer, Galinsky, & Joestl, 2014). These disparities have been found in areas such as physical health (Cochran & Mays, 2007), mental health (Bostwick, Boyd, Hughes, & McCabe, 2010; Conron, Mimiaga, & Landers, 2010; Diaz, Ayala, Bein, Henne, & Marin, 2001; Dilley, Simmons, Boysun, Pizacani, & Stark, 2010; Gilman et al., 2001; McLaughlin, Hatzenbuehler, & Keyes, 2010; Roberts, Austin, Corliss, Vandermorris, & Koenen, 2010), substance abuse (Conron et al., 2010; Dilley et al., 2010; Gilman et al., 2001; Gruskin, Greenwood, Matevia, Pollack, & Bye, 2007; McLaughlin et al., 2010), domestic violence (Walters, Chen, & Breiding, 2013), health insurance coverage (Gates, 2014b), and disability (Fredriksen-Goldsen, Kim, & Barkan, 2012). Such disparities have been found across all age groups of this population, from teenagers (Bradford & Mustanski, 2014; Kann et al., 2011; Mustanski, Van Wagenen, Birkett, Eyster, & Corliss, 2014; Remafedi, French, Story, Resnick, & Blum, 1998; Russell & Joyner, 2001) to the elderly (Fredriksen-Goldsen et al., 2012).

Study 2: Comfort Level by Survey Mode

How can we produce the most accurate estimates? To address this question, we surveyed a diverse group of people regarding how comfortable they would feel in answering questions about their sexual orientation given a wide range of survey methods—from, at one extreme, highly intrusive methods that preserve neither privacy nor anonymity to, at the other extreme, non-intrusive methods that preserve both. We hypothesized (1) that self-reported comfort level would be predictable from the literature on sensitive topics and inversely related to the intrusiveness of the survey mode, and (2) that the comfort levels would be correlated with the non-heterosexual prevalence estimates obtained from the national sexual orientation surveys we analyzed.

Method

Participants and Procedure

On October 24, 2013, we administered our survey to 652 US participants through AMT with a solicitation entitled: “10–15 min opinion survey on comfortability with various research methods.” Due to the nature of participation on AMT, no measure of response rate was collected. Each participant was paid US $1 for participation in the study. The mean age was 33.60 years (SD = 11.44), there were more male (58.9%) than female (41.1%) participants, and the sample was demographically diverse (Table 4).

Table 4 Demographic characteristics and mean comfort level differences

Measures

We developed a list of survey mode scenarios that at one extreme were highly intrusive with respect to both privacy and anonymity and that at the other extreme were not (see “Appendix”). We investigated eight interviewer-administered methods, such as face-to-face, CATI, and CAPI, and eight self-administered methods, such as SAQ, CASI, and online surveys. Each survey mode scenario had two versions: an anonymous (name is not required) version and a non-anonymous (name is required) version. The wording of these question pairs was otherwise identical (“Appendix”).

In total, the questionnaire described 16 different survey modes that were listed in order, roughly, from least intrusive to most intrusive (“Appendix”). Each mode was described in a short paragraph, and participants were given the following instructions: “On a scale of −5 to +5 (where −5 means very uncomfortable and +5 means very comfortable), please rate how comfortable you would feel reporting accurate information about your sexual orientation (such as the opposite-sex or same-sex attractions you have felt or the opposite-sex or same-sex sexual encounters you have had) when asked in each of the ways listed below. Please note that some of these conditions are very similar, so read them carefully before answering.”

Results

We examined the results of our survey in the context of the literature on mode effects and sensitive questions. We utilized nonparametric statistical tests such as Spearman’s ρ, Mann–Whitney U, and Wilcoxon signed rank V throughout this study because our survey results lie on an ordinal scale. Subjects’ mean comfort level ratings varied greatly among survey modes and were consistent with the literature on asking sensitive questions about sexual orientation. We found that anonymous survey modes elicited significantly higher mean comfort levels (M = 1.43, SD = 2.49) than non-anonymous modes (M = −0.21, SD = 3.33) in aggregate (V = 142,364; p < .001), as well as individually (Fig. 2). Among the anonymous modes we examined, participants reported the highest mean comfort level for online surveys (M = 3.93, SD = 2.05) and the lowest for CAPI-VR (M = −1.44, SD = 3.75). Among the non-anonymous modes we examined, the highest mean comfort level was again reported for online surveys (M = 1.02, SD = 3.61), and the lowest was reported for CAPI-VR (M = −1.81, SD = 3.74).
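
The aggregate anonymous versus non-anonymous comparison is a paired (within-subject) contrast, so the sketch below applies the Wilcoxon signed rank test to per-subject mean ratings; the data are simulated, not our survey responses:

```python
# A minimal sketch with simulated ratings; not the actual survey data.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
n = 652
anon = rng.normal(1.4, 2.5, n)             # per-subject mean, anonymous modes
non_anon = anon - rng.normal(1.6, 3.0, n)  # typically lower for non-anonymous

V, p = wilcoxon(anon, non_anon)  # paired signed rank test
print(f"V = {V:.0f}, p = {p:.2g}")
```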

Fig. 2 The 16 survey modes described in the questionnaire grouped by anonymous and non-anonymous versions. Significant differences were found between anonymous and non-anonymous versions of each methodology. Error bars represent standard error of the mean

Our data also supported the finding that self-administered surveys elicit more sensitive behavior reporting than interviewer-administered surveys: self-administered modes received a significantly higher mean comfort rating (M = 1.90, SD = 2.60) than interviewer-administered modes (M = −0.70, SD = 3.36) (V = 162,012; p < .001). Our data were likewise consistent with the finding that computers are valuable in eliciting sensitive information, with the CASI scenario receiving a significantly higher mean comfort level rating (M = 3.14, SD = 2.52) than its paper-and-pencil counterpart, the SAQ (M = 2.31, SD = 2.98) (V = 44,575; p < .001). As with other studies that sampled from a highly literate population (e.g., Gnambs & Kaspar, 2015), we did not find a significant difference for audio-enhanced computer surveys: the mean difference between the CASI and ACASI modes was 0.2 (V = 9777; p = .99). The anonymous online survey mode not only elicited the highest mean comfort level rating from our participants (M = 3.93, SD = 2.05), but its rating was also significantly higher than that for CASI surveys (V = 35,273; p < .001), supporting the notion that an online survey mode can enhance feelings of privacy compared with CASI, because no interviewer is involved in the administration of an online survey (Gnambs & Kaspar, 2015).

We found only one significant difference in comfort level across gender, race, education, income, sexual orientation, and age groups (Table 4). Specifically, a significant difference emerged among age groups for comfort level with non-anonymous survey modes. A significant negative correlation also emerged between continuous age and comfort level (Spearman’s ρ = −0.13, p < .001). This relationship is consistent with previous findings concerning age and non-heterosexual orientation prevalence estimates (Gates, 2014a).

To examine the effects of age, gender, and sexual orientation on mean comfort for the anonymous, non-anonymous, self-administered, and interviewer-administered survey mode groups, we constructed four regression models. Given the ordinal nature of our response variable and the bimodal distribution of responses, we utilized logistic regressions and converted the mean comfort for each methodology to a binary response variable, where subjects received a one if they reported a mean comfort with that group of modes over zero (comfortable), and a zero otherwise (uncomfortable). We also collapsed the smallest demographic groups for age (45–64 and 65–74) and sexual orientation (“other” and “unsure”). Only the non-anonymous and the interviewer-administered survey mode groups returned significant coefficients, and diagnostic plots of deviance residuals as well as Hosmer–Lemeshow goodness of fit tests (Hosmer & Lemeshow, 2000) suggested that both logistic response functions were appropriate (non-anonymous, χ² = 6.16, p = .10; interviewer-administered, χ² = 0.45, p = .93). Coefficients were adjusted for dispersion and are presented as odds ratios with 95% confidence intervals in Table 5. For non-anonymous survey modes, membership in the 25–44 age group was associated with 0.64 times the odds of being comfortable relative to the 18–24 group. For interviewer-administered survey modes, a self-reported sexual orientation of gay was associated with 2.88 times the odds of being comfortable relative to a sexual orientation of straight. This latter finding suggests that people who openly identify as gay might be comfortable discussing the details of their sexual orientation with an interviewer because they have already declared their preferences, whereas someone who identifies as straight might feel uncomfortable disclosing potentially dissonant sexual preferences to an interviewer.
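
The following sketch illustrates this modeling setup on simulated data; the variable names and groupings mirror the description above but are assumptions for illustration, not our original analysis code:

```python
# A minimal sketch with simulated data; variable names are illustrative.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 652
df = pd.DataFrame({
    "mean_comfort": rng.uniform(-5, 5, n),                  # per-subject mean
    "age_group": rng.choice(["18-24", "25-44", "45plus"], n),
    "gender": rng.choice(["male", "female"], n),
    "orientation": rng.choice(["straight", "gay", "bisexual"], n),
})
df["comfortable"] = (df["mean_comfort"] > 0).astype(int)    # binarize at zero

model = smf.logit(
    "comfortable ~ C(age_group) + C(gender) + C(orientation)", data=df
).fit(disp=0)
odds_ratios = np.exp(model.params)   # report coefficients as odds ratios
odds_ci = np.exp(model.conf_int())   # 95% confidence intervals
print(odds_ratios.round(2))
```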

Table 5 Logistic regressions predicting mean comfort

When the prevalence estimates obtained from the six comparable studies we mentioned previously (Fig. 1) were compared to the mean comfort levels we obtained for their respective survey modes, we found a suggestive correlation (Spearman’s ρ = 0.77, p = .05). Generally speaking, we found that the higher the comfort level of a survey mode, the higher the non-heterosexual prevalence estimate it produced.
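
This correlation can be computed directly from the six mode-level pairs, as in the sketch below; the comfort means and the interior prevalence values shown are hypothetical stand-ins (the actual values appear in Figs. 1 and 2):

```python
# A minimal sketch; values are placeholders, ordered FTF..Online.
from scipy.stats import spearmanr

prevalence = [1.4, 2.3, 3.4, 4.7, 4.8, 7.2]  # % NH; partly hypothetical
comfort = [-1.2, 0.1, -0.7, 3.3, 3.1, 3.9]   # mode mean comfort; hypothetical

rho, p = spearmanr(prevalence, comfort)
print(f"Spearman's rho = {rho:.2f}, p = {p:.3f}")
```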

Discussion

Given that the most common surveys of sexual orientation prevalence in the U.S. conducted in recent years employ relatively intrusive methodologies, we conclude that those surveys may greatly underestimate the prevalence of non-heterosexuals and, therefore, that current public policy making in this area may be based on inaccurate estimates. Our findings suggest that integrating non-intrusive administration modes, such as anonymous online surveys, into national data collection efforts will produce more accurate estimates of non-heterosexual prevalence and that the methodology least likely to underestimate the prevalence of non-heterosexuals has the following characteristics: (1) It is self-administered online; (2) it is fully automated; (3) it assures people that their participation is completely anonymous; (4) it uses recruitment methods that do not compromise anonymity; and (5) it does not give participants the feeling of being observed or recorded.

We have taken liberties in the present study by comparing surveys that differ from one another not only in intrusiveness but also in other significant ways, including coverage, sampling methods, and non-response rate (Dillman et al., 2014). We defend these comparisons only by saying that these are, to our knowledge, all of the large-scale national surveys conducted in the U.S. over the past decade or so that have asked about sexual orientation in samples drawn from the general population, and that the limited manner in which we have compared these studies is not unreasonable (see Footnote 2).

Given the wide range of prevalence estimates that have been obtained from such studies, we believe it is important that standards be set that will allow future studies to be conducted with a higher degree of validity than appears to exist in studies to date. While we realize that the primary purpose of each of the surveys we examined was not necessarily to produce accurate estimates of sexual orientation prevalence, we believe that any studies that ask about sexual orientation should be informed by relevant research (Badgett, 2009; Savin-Williams, 2006; Vrangalova & Savin-Williams, 2012).

The validity of our survey results is limited by our sample, which was drawn from a pool of highly experienced online survey takers (AMT). It is possible that people with limited or no online experience would express less comfort when asked about the possibility of taking tests online and that people with little or no experience taking surveys would express less comfort over the possibility of taking any surveys at all. However, a growing body of research suggests that data gathered on AMT are as valid as data collected in other settings (Berinsky et al., 2012; Casler, Bickel, & Hackett, 2013; Mason & Suri, 2012) and perhaps even superior to other commonly used subject pools (Casler et al., 2013; Chandler & Shapiro, 2016). Follow-up research should repeat our procedure with (1) a sample of people with experience taking surveys but with little online experience, (2) a sample of people who have little or no experience taking surveys, and (3) control questions on non-sensitive behaviors or on sensitive behaviors unrelated to sexuality. Based on research related to the present study (Booth-Kewley et al., 2007; Coffman et al., 2016; Gnambs & Kaspar, 2015; Villarroel et al., 2006), however, we conjecture that, in each case, the overall pattern will be similar to the one we found—namely, that the more intrusive the survey mode, the less likely people will be to disclose sensitive information.

It is also possible that the comfort level ratings we found were affected by the order in which the items were presented; we did not vary the order (“Appendix”). We chose to use a fixed order for our questions in order to minimize confusion; the fixed order, we hoped, would draw attention to the specific restrictions we were adding to each question. A random order, we felt, could easily cause participants to overlook specific restrictions (since we were showing them long, compound sentences). Relevant research on order effects (e.g., Tourangeau, Couper, & Conrad, 2013) suggests that the fixed order of our questions might have distorted the ratings we found by between 0.20 and 0.30 points on an 11-point scale, but our comfort levels spanned a 6-point range (Fig. 2). We believe, therefore, that order was a contributing factor in the effect we observed but not a fatal confound. This issue should be explored in follow-up research.

Besides providing participants with a sense of anonymity and privacy, online surveys provide researchers with additional benefits such as reduced costs, rapid data collection, and easy access to large samples, both national and international (Rhodes, Bowie, & Hergenrather, 2003; Wright, 2005). Internet penetration in the U.S. is currently 88.5% and still increasing (Internet Live Stats, 2016), which means that Internet-based research will become even more advantageous and valid in future years. Despite these advantages, there are potential limitations to the validity of data collected through online surveys. Such surveys do not necessarily produce representative samples, which can limit the generalizability of the results (Rhodes et al., 2003; Wright, 2005). Some studies have found, however, that the validity of data collected from online surveys is comparable to that of data collected through SAQs (Fortson, Scotti, Ben, & Chen, 2006; Millar & Dillman, 2011; Nathanson & Reinert, 1999), telephone interviews (Chang & Krosnick, 2002, 2009; Graham & Papandonatos, 2008; Keeter et al., 2015), and CAPI (Ramo, Hall, & Prochaska, 2011)—the methods used in several of the national surveys we examined.

In spite of the limitations of the present study, we believe we have provided reasonably strong support for several key ideas that should be of interest to people working on sexual orientation issues, especially to people responsible for formulating relevant public policy: (1) survey methodology is critically important in determining non-heterosexual prevalence, (2) intrusive methodologies will likely lead to underestimates, and (3) for better or worse, technology is increasing our ability to elicit sensitive information from individuals.