Introduction

Many aspects of daily living, such as physical health, psychological well-being, and partner relationship quality, influence sexual functioning in otherwise sexually healthy individuals (Dunn, Croft, & Hackett, 1999; Kalmbach, Ciesla, Janata, & Kingsberg, 2012; Levine, 2003; Rosen & Bachman, 2008). Presently, evidence for the utility of self-report instruments of sexual functioning in men and women primarily comes from clinical samples (e.g., individuals with sexual dysfunction). As such, it is presently unclear whether existing measures of sexual functioning are sensitive to normative variations. Indeed, prior investigations have typically relied on using measures of sexual functioning which have not been validated for normative, healthier populations who are less likely to be encumbered by physical complaints and psychiatric illness. In order to advance the study of the influences and outcomes of normative sexual functioning, studies must establish the applicability of these measures to healthier populations. Additionally, the field presently suffers from a dearth of self-report instruments of male and female sexual functioning that allow for comparison between men and women. Parallel measures of male and female sexual functioning would allow for important comparisons between men and women in both clinical and research settings. As such, the present study sought to examine the utility of existing measures of sexual functioning in non-clinical samples of young adult men and women.

The Female Sexual Function Index (FSFI) was introduced by Rosen et al. (2000) who examined the measure by comparing middle-aged (M age = 40 years) women with female sexual arousal disorder with healthy controls. Using a sample of 259 women, exploratory techniques provided support for a five-factor model. This factor structure was consistent with the a priori model, which includes pain, lubrication, orgasm, satisfaction, and desire/arousal. However, Rosen et al. made a clinically-based decision to split desire and arousal into separate scales, as each construct can be defined independently. Rosen et al. also found the FSFI factors to have acceptable test–retest reliability and high internal consistency. Notably, Wiegel, Meston, and Rosen (2005) replicated these findings when comparing a sample of middle-aged women with various sexual disorders and a healthy control sample.

The FSFI has been validated for use in a variety of populations in which individuals may experience sexual difficulties. Indeed, analyses revealed FSFI factors to have high internal consistencies and have shown expected differences in sexual difficulties between middle aged women who are healthy and the following clinical populations: women who suffer from hypoactive sexual desire disorder (Meston, 2003), female sexual orgasmic disorder (Meston, 2003), vulvar intraepithelial neoplasia (Likes, Stegbauer, Hathaway, Brown, & Tillmanns, 2006), chronic pelvic pain (Verit & Verit, 2007), multiple sclerosis (Borello-France et al., 2008), and vulvodynia (Masheb, Lozano-Blanco, Kohorn, Minkin, & Kerns, 2004).

Carvalho, Vieira, and Nobre (2012) examined the factor structure of the FSFI in both clinical and non-clinical samples of Portuguese women. Unlike much of the prior research, they used confirmatory techniques to compare different models of the FSFI in their samples. Carvalho et al. compared a four-factor and a five-factor model for clinical samples: desire and arousal were combined in the four-factor model but separated in the five-factor model (sexual satisfaction was excluded from analyses). Although the five-factor model showed better fit as indicated by the fit indices, Carvalho et al. reasoned that, because the five-factor model violated a .70 measure distinctness threshold between the desire and arousal scales, the four-factor model was the better model for the clinical sample.

Though much empirical support of the FSFI has been found in clinical samples, two recent studies have examined its utility in non-clinical samples. Carvalho et al. (2012) found a five-factor model to best fit the data: desire, arousal, lubrication, orgasm, and pain (satisfaction was not examined). Opperman, Benson, and Milhausen (2013) used confirmatory techniques to examine the utility of the FSFI in a non-clinical sample of young Canadian women. They found that the original six-factor structure was the best fit for their data.

Similarly, studies have also provided empirical support for the utility of the Profile of Female Sexual Function (PFSF). McHorney et al. (2004) examined the PFSF in a sample of 580 oophorectomized women with hypoactive sexual desire disorder and healthy controls from North America, Europe, and Australia. Using exploratory factor analysis, they found a seven-factor structure: desire, arousal/orgasm, responsiveness, pleasure, self-image, concerns, and disinterest. However, McHorney et al. discarded the disinterest scale as it accounted for a small portion of the total variance. Further, McHorney et al. split the arousal/orgasm scale as arousal and orgasm are clinically observed to be separate constructs. In support of the measure’s reliability, McHorney et al. found good internal consistencies and test–retest reliability for each of the scales. The PFSF showed good clinical utility in its ability to discriminate participants experiencing sexual difficulties from the healthy controls. In further support, DeRogatis et al. (2004) found that the PFSF differentiated between clinical samples and healthy controls and provided support for the internal consistency of the measure’s factors.

Many of the validation studies of the FSFI and PFSF have used samples of middle-aged adults and individuals with various sexual dysfunctions and medical conditions. However, these instruments may also be useful for studying more normative variations in sexual functioning, as would be seen in healthier populations. By validating these measures in non-clinical samples, we can evaluate the sensitivity of these measures to be able to quantify the more subtle differences anticipated within normative levels of functioning. To our knowledge, the utility of the FSFI in non-clinical samples has only been evaluated in two studies (Carvalho et al., 2012; Opperman et al., 2013) whereas the PFSF has not been examined in non-clinical samples.

Both the FSFI and PFSF were developed for the assessment of female sexual functioning. Male sexual functioning has typically been assessed through the use of other instruments, such as the International Index of Erectile Function (IIEF) (Rosen et al., 1997). However, measures of female and male sexual functioning are not parallel to each other. For example, though the FSFI and the IIEF share some similarities (e.g., both measures assess sexual satisfaction), there remain some significant differences (e.g., the IIEF does not assess subjective arousal whereas the FSFI does). The development of maximally similar measures for use in both genders would allow researchers to compare the relative influence of variables on female and male sexual health. For instance, the present study was part of a larger effort interested in the relations between affective (i.e., depression and anxiety) and sexual problems and how these relations differ between men and women (Kalmbach et al., 2012). The use of non-parallel measures would complicate such research, as any observed gender differences could either be due to true differences between women and men or to differences between the measures themselves.

We chose to validate the FSFI and PFSF in men rather than to validate male sexual function measures in women. Measures of female sexual function place a lesser emphasis on the physical aspects of sexual function than measures of male sexual function. As many of the gender differences in sexual function are physiological, similar measures would need to place lesser emphasis on these physiological, gender-specific difficulties (e.g., lubrication and erection difficulties), though we also did not want to ignore them. As such, any measure with gender-specific items would have to be modified to be appropriate for men (e.g., by replacing lubrication difficulty items with erection difficulty items); however, this practice should not be considered an attempt to equate the two constructs. Notably, some measures of male sexual function attend more to difficulties with premature ejaculation than to delayed ejaculation. However, this study was part of a larger effort examining the relations between affective and sexual problems. Kennedy, Dickens, Eisfeld, and Bagby (1999) showed that delayed ejaculation has twice the prevalence of premature ejaculation in depressed men. As such, we chose to validate in men the FSFI and PFSF, which assess delayed and difficult orgasm.

The present study sought to examine the utility of the FSFI, Male Sexual Function Index (MSFI) (adapted for use in this study), and PFSF in young adults. As the FSFI has gender-specific items, we modified the FSFI to create the MSFI. The MSFI was created to be maximally similar to the FSFI to allow for comparisons between men and women, while minimizing measurement confound. Unlike the FSFI, the PFSF does not contain gender-specific items, so it was unaltered for administration to male participants. As past research has outlined the factor structures of these measures, we employed confirmatory factor analysis (CFA). We hypothesized that the previously identified factor structures of the FSFI and PFSF would hold true for our sample of young female adults. Although our review of the literature did not include male samples, we also elected to employ CFA for the examination of the MSFI and PFSF in young male adults. We hypothesized that the structure of the MSFI in the male sample would be parallel to the previously identified factor structure of the FSFI in women. Similarly, we predicted that the previously identified factor structure of the PFSF would also hold true for men (see Footnote 1).

Method

Participants

The sample consisted of 1,258 undergraduate students (748 women) who were screened to be antidepressant-free for at least 2 weeks prior to participating in the study. A total of 731 participants (409 women) reported sexual activity with a partner (oral, anal, or vaginal sex, sexual foreplay) in the past 30 days. The participant age range was 18–29 years (M = 19.56). The mean age at which participants first engaged in sexual intercourse was 16.22 years (SD = 2.16) for female participants and 16.18 years (SD = 2.47) for male participants. As for lifetime history of sexual experiences among the entire sample, 81.2 % reported having engaged in sexual intercourse, 82.9 % had received oral sex, 79.7 % had performed oral sex, and 78.8 % reported having performed self-stimulation. The mean number of lifetime sexual partners (defined as any persons with whom they engaged in any form of sexual activity) was 6.17 (SD = 2.16) for men and 3.81 (SD = 4.54) for women. The mean number of partners within 30 days of participating was .98 (SD = 1.08) for men and .76 (SD = .63) for women. Participants were largely heterosexual (87.3 %), with a smaller proportion being bisexual (10.7 %) or homosexual.

Procedure

Individuals were recruited from introductory psychology classes at a large midwestern university in the US and received course credit for their participation. The present study used a cross-sectional design. All instructions were changed to ask participants to report their experiences over the past month (30 days), both to standardize the assessment window across all measures and to encompass the average menstrual cycle. Additionally, definitions of sexual terms were provided to individuals prior to participation. The Institutional Review Board approved this study. All participants were required to provide written informed consent prior to participation.

Measures

The FSFI (Rosen et al., 2000) is a 19-item self-report questionnaire of female sexual functioning. The FSFI assesses sexual desire, psychological arousal, lubrication, pain, satisfaction, and orgasm achievement over the previous 30 days. We slightly modified the FSFI’s definitions of sexual activity and intercourse to also include anal sex, as we felt that it was a relevant sexual activity for both our heterosexual and homosexual participants. As some participants in the study were not expected to have sexual partners in the assessment window, items specific to sexual activity with a partner were given an additional response choice of “I have not had a sexual partner in the past 30 days” (e.g., Item 14). For items that were specific to sexual activity with a partner, we treated non-sexually active participants’ data as missing and computed prorated scores for the scale using mean substitution (Brotto, 2009; Meyer-Bahlburg & Dolezal, 2007). However, individuals’ factor scores were treated as missing if more than 25 % of data in a given factor was missing. Our rationale was that treating a response of “No Sexual Activity” as zero would artificially bias scores toward indicating higher dysfunction, whereas proration allowed us to estimate the total scale score based on participants’ other responses within the same scale. We decided to prorate only when the response rate was 75 % or above so as to minimize the impact of our estimation on the data.
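The proration rule described above can be illustrated with a short sketch. This is not the scoring code used in the study: the function name `prorated_scale_score`, the use of `None` to mark a "no sexual partner" response, and the example item values are assumptions made for illustration, and the sketch works on raw item sums rather than on any published domain weighting.

```python
from typing import Optional, Sequence


def prorated_scale_score(
    responses: Sequence[Optional[float]],
    min_response_rate: float = 0.75,
) -> Optional[float]:
    """Return a prorated scale total, or None if too many items are missing.

    `responses` holds one participant's item scores for a single scale;
    None marks an item treated as missing (hypothetical coding).
    """
    answered = [r for r in responses if r is not None]
    # Treat the scale score as missing when fewer than 75 % of items were answered.
    if not responses or len(answered) / len(responses) < min_response_rate:
        return None
    # Mean substitution: each missing item takes the participant's own mean
    # on the answered items, which equals mean * number of items.
    person_mean = sum(answered) / len(answered)
    return person_mean * len(responses)


# Hypothetical four-item scale with one partner-specific item missing.
print(prorated_scale_score([4, 5, None, 3]))
# Half of the items missing: the scale score is treated as missing.
print(prorated_scale_score([4, None, None, 3]))
```

Under this rule, a two-item scale can never be prorated, because a single missing item already drops the response rate to 50 %.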

The MSFI is a 16-item self-report questionnaire of male sexual functioning over the previous 30 days that was created by modifying the FSFI. As previously mentioned, the MSFI was created rather than using an existing measure of male sexual functioning to allow for some comparison of sexual problems between genders. That is, it was important to use maximally similar measures in order to decrease confounding between measures and biological sex. As with the FSFI, we slightly modified the MSFI’s definitions of sexual activity and intercourse to also include anal sex, as we felt that it was a relevant sexual activity for both our heterosexual and homosexual participants. In creating the MSFI, we replaced FSFI items assessing lubrication with items assessing erection difficulties [see Kalmbach et al. (2012) for complete list of new items]. Also, the Pain scale was removed when adapting the MSFI due to the low prevalence of sexual pain in men (see Rosen, 2000). However, the desire, psychological arousal, orgasm, and satisfaction items were unchanged from their FSFI phrasings. Like the FSFI, the MSFI contains items that are specific to sexual activity with a partner. Again, partner sex items were accompanied by a response option indicating an absence of partner sex during the assessment window. These data were considered missing, which allowed for proration of the scale scores.

The PFSF (McHorney et al., 2004) is a 37-item self-report questionnaire assessing sexual function. The PFSF is an assessment tool of sexual difficulties, including desire, psychological arousal, orgasm achievement, pleasure, responsiveness, concerns, and self-image over the past 30 days. As items on the PFSF are not specific to females, the items were unchanged for male participants. Further, because the PFSF contains items that are specific to sexual activity, participants were provided with a response option indicating an absence of sexual activity during the assessment window. Similarly to the FSFI and MSFI, these data were considered missing, which allowed for proration of the scale scores.

Data Analysis

All analyses were conducted separately for each gender. To evaluate the utility of the FSFI, MSFI, and PFSF in young adults, we examined the structure and reliability of these measures. To examine the factorial structures of the FSFI, MSFI, and PFSF, we conducted CFAs for each measure in its corresponding sample (i.e., FSFI and PFSF in women, MSFI and PFSF in men). A notable benefit of using CFA over exploratory techniques is that it allowed us to test the validity of the FSFI and MSFI two-item desire scales. Exploratory factor analysis requires a minimum of three to five items for a factor and, thus, cannot validate a two-item scale whereas CFA models with multiple factors require that each factor have two or more indicators (Fabrigar, Wegener, MacCallum, & Strahan, 1999; Kline, 2013; Velicer & Fava, 1998). We predicted that the FSFI and PFSF’s original structures would be supported in our sample of young, healthy women. Similarly, we hypothesized that the MSFI’s structure would correspond to its female counterpart’s original structure, and that the PFSF’s original structure would be supported in our male sample. To examine the reliability of these measures, we used Cronbach’s alpha to investigate the internal consistency of the scales of each questionnaire.
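For readers less familiar with the reliability index named above, Cronbach's alpha for a k-item scale is k/(k − 1) × (1 − sum of item variances / variance of the total score). The following is a minimal sketch of that formula using only the Python standard library; the function name `cronbach_alpha` and the small data set are hypothetical and are not taken from the study.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for complete data; each row is one respondent's item scores."""
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    n_items = len(rows[0])
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in rows]) for j in range(n_items)]
    # Sample variance of the summed scale score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total score has zero variance; alpha is undefined")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical two-item scale answered by four respondents.
example = [[1, 1], [2, 3], [3, 2], [4, 4]]
print(round(cronbach_alpha(example), 3))  # 0.889
```

The sketch assumes complete data; with prorated or missing responses, the rows would first need to be filtered or imputed.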

Results

Factorial Structure

FSFI in Women

We tested the six-factor structural model of the FSFI with all latent variable variances set to 1.0 and, based on Hu and Bentler’s (1998) recommendations, found adequate to good model fit, χ2(137) = 683.28, p < .001, CFI = .91, TLI = .88, RMSEA = .07. Examination of the factor loadings revealed notable differences between items that were worded positively versus those that were worded negatively. As such, we suspected that item-valence markedly influenced the observed structure. Items worded in a negative direction tended to cluster together, as did the items worded in a positive direction. Therefore, we employed a multi-trait multi-method (MTMM) CFA (Kline, 2011) to model method variance due to item-valence. This approach allowed for modeling item covariance as a function of (1) the associated sexual functioning construct and (2) whether items were worded in a negative or positive direction.
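The degrees of freedom reported for the initial model can be reproduced by counting parameters. The sketch below assumes a simple-structure CFA in which each item loads on exactly one factor, residuals are uncorrelated, all factor covariances are free, and factor variances are fixed to 1.0; only the last of these is stated in the text, and the function name `cfa_degrees_of_freedom` is hypothetical.

```python
def cfa_degrees_of_freedom(n_items: int, n_factors: int) -> int:
    """df for a correlated-factors CFA with factor variances fixed to 1.0.

    Assumes one loading per item, uncorrelated residuals, and freely
    estimated covariances among all factors.
    """
    # Unique variances and covariances among the observed items.
    observed_moments = n_items * (n_items + 1) // 2
    free_loadings = n_items
    residual_variances = n_items
    factor_covariances = n_factors * (n_factors - 1) // 2
    return observed_moments - (free_loadings + residual_variances + factor_covariances)


# 19 FSFI items on six factors.
print(cfa_degrees_of_freedom(19, 6))  # 137
```

Under the same assumptions, 16 items on five factors give 94 degrees of freedom and 37 items on seven factors give 608.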

To create our MTMM CFA model, we used two exogenous factors such that all positively-valenced items (e.g., How often did you become lubricated during sexual activity or intercourse?) were loaded onto one factor and all negatively-valenced items (e.g., How difficult was it to become lubricated during sexual activity or intercourse?) were loaded onto a second factor. These loadings were in addition to the already established loadings of the items onto their corresponding sexual functioning factors (see Fig. 1 for a partial model). We then tested this MTMM CFA model, which produced very good model fit, χ2(118) = 303.01, p < .001, CFI = .97, TLI = .95, RMSEA = .04 (see Footnote 2). Additionally, analyses showed that the MTMM CFA was a significant improvement over the initial model, Δχ2(19) = 380.27, p < .001. All items significantly loaded onto their respective factors (see Table 1 for factor loadings).
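The model comparison reported above is a chi-square difference test between nested models. As a hedged illustration, the sketch below recomputes the difference and its p value from the reported FSFI fit statistics using only the Python standard library; the helper names `chi2_sf` and `chi_square_difference` are hypothetical, and the upper-tail probability is obtained from the standard recurrence for the regularized incomplete gamma function rather than from a statistics package. The drop of 19 degrees of freedom is consistent with one additional method loading per FSFI item.

```python
import math
from typing import Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting values: Q(1/2, y) = erfc(sqrt(y)) and Q(1, y) = exp(-y).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), in log space.
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def chi_square_difference(
    chi2_restricted: float, df_restricted: int, chi2_full: float, df_full: int
) -> Tuple[float, int, float]:
    """Return (delta chi-square, delta df, p) for two nested models."""
    delta_chi2 = chi2_restricted - chi2_full
    delta_df = df_restricted - df_full
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Reported FSFI values: initial model versus MTMM model.
delta_chi2, delta_df, p = chi_square_difference(683.28, 137, 303.01, 118)
print(round(delta_chi2, 2), delta_df, p < 0.001)  # 380.27 19 True
```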

Fig. 1 Partial model of the Female Sexual Function Index multi-trait multi-method confirmatory factor analysis (error terms not shown in model)

Table 1 Female Sexual Function Index multi-trait multi-method confirmatory factor analysis factor loadings and standardized regression weights

PFSF in Women

We tested the seven-factor structural model of the PFSF and found poor model fit, χ2(608) = 4540.59, p < .001, CFI = .86, TLI = .84, RMSEA = .09. Similar to the FSFI, substantial variance in the PFSF items appeared to be due to item-valence. Thus, we tested an MTMM CFA model similar to the one previously described, which produced good model fit. However, the Responsiveness factor did not conform to the original structure. Upon examination, we found that the first two items produced high loadings of .57 and .63 whereas the loadings of the latter five items ranged from .02 to .19. These data suggested that the first two items of the Responsiveness scale were not measuring the same construct as the latter five items. Inspection of the scale items showed that the latter five items appeared to measure sexual avoidance (e.g., “I avoided having sex”) whereas the first item appeared to measure sexual initiation and the second item measured responsiveness. As the data suggested that the Responsiveness scale was measuring more than one construct, we removed the first two items of the scale, thus revising the Responsiveness factor into an avoidance factor. After re-running the MTMM CFA, our model yielded good fit, χ2(573) = 2488.53, p < .001, CFI = .93, TLI = .91, RMSEA = .06. Additionally, analyses showed that the MTMM CFA was a significant improvement over the initial model, Δχ2(35) = 2052.26, p < .001. All item loadings were significant (see Table 2 for factor loadings).

Table 2 Profile of Female Sexual Function multi-trait multi-method confirmatory factor analysis factor loadings and standardized regression weights

MSFI in Men

We tested the five-factor structural model of the MSFI and found poor model fit, χ2(94) = 440.82, p < .001, CFI = .86, TLI = .80, RMSEA = .08. However, similar to the analyses for women, we once again observed substantial variance in the MSFI items due to item-valence. We then tested an MTMM CFA model accounting for item valence, which yielded good fit, χ2(78) = 229.51, p < .001, CFI = .94, TLI = .89, RMSEA = .06 (see Footnote 3), and was a significant improvement over the initial model, Δχ2(16) = 211.31, p < .001. Item loadings on each factor were significant (see Table 3 for factor loadings).

Table 3 Male Sexual Function Index multi-trait multi-method confirmatory factor analysis factor loadings and standardized regression weights

PFSF in Men

We tested the seven-factor structural model of the PFSF and found poor model fit, χ2(608) = 3495.86, p < .001, CFI = .83, TLI = .80, RMSEA = .09. After again observing substantial variance due to item-valence, we tested an MTMM CFA model, which produced good model fit. However, the Responsiveness factor once again did not conform to its original structure. Similar to what was found in the female sample, the first two items did not correlate with the latter five items. After employing the same modification used in the female sample, we re-ran the MTMM CFA and our model produced adequate to good model fit, χ2(573) = 2255.29, p < .001, CFI = .90, TLI = .88, RMSEA = .07, and was a significant improvement over the initial model, Δχ2(35) = 1007.33, p < .001. All item loadings were significant (see Table 4 for factor loadings).

Table 4 Profile of Female Sexual Function multi-trait multi-method confirmatory factor analysis factor loadings and standardized regression weights

Reliability

Internal consistency was examined for each measure to evaluate inter-item correlations within each measure’s scales. The FSFI produced high internal consistencies in each of its subscales (see Table 5), with Cronbach’s alphas ranging from .81 (lubrication) to .89 (desire). Similarly, the internal consistencies in the PFSF for the female sample ranged from high (orgasm; α = .84) to very high (pleasure; α = .99) (see Table 5). The MSFI yielded internal consistencies ranging from adequate (orgasm; α = .66) to high (desire; α = .85) (see Table 6). The PFSF for the male sample produced adequate (orgasm; α = .71) to very high (pleasure; α = .97) internal consistency (see Table 6).

Table 5 Means, SDs, internal consistencies, and correlations among sexual functioning measures (women)
Table 6 Means, SDs, internal consistencies, and correlations among sexual functioning measures (men)

Discussion

The present study sought to examine the validity of commonly used measures of sexual functioning in a sample of healthy, young adults. Findings from our confirmatory analyses supported the original structures of the FSFI and MSFI, the latter of which was adapted for this study to be maximally similar to the FSFI (on which it was based). However, when evaluating the PFSF, the original structure required modification in both the male and female samples. Specifically, we revised the PFSF Responsiveness scale by removing the first two items, thus creating a Sexual Avoidance scale. This modification resulted in a supported structure of the PFSF in both sexes (see Footnote 4). Further supporting the utility of these measures, reliability analyses revealed adequate to very high internal consistencies among the scales of the three measures.

As reviewed in the Introduction, the validity of the FSFI and PFSF has largely been supported in numerous studies using older female samples with medical and/or sexual difficulties. To our knowledge, only two previous investigations have examined the FSFI in non-clinical women whereas the PFSF had yet to be examined in sexually healthy women. Additionally, with our use of confirmatory techniques, our findings supported the separation of the desire and arousal scales in the FSFI for use in young women. This research is important in that it advances the study of sexual health by providing empirical evidence supporting the appropriateness of administering the FSFI and PFSF to younger, healthier female populations. We believe that the ability to detect normal variations in sexual response is crucial to identifying biological and psychosocial influences and vulnerabilities that correspond to pre-morbid changes in sexual function. Therefore, we believe that our findings supporting the use of the FSFI and PFSF in non-clinical samples are important to this etiological research.

This study was also the first to support the MSFI and PFSF for use in young, healthy men. As previously stated, the MSFI was created for this study to be maximally similar to the FSFI whereas the PFSF, as administered to our male sample, was unaltered from its original form. Empirical support of maximally similar measures for use in men and women will allow researchers to evaluate the relative influences on female and male sexual health. By using these parallel measures, we were elsewhere able to demonstrate that affective problems were differentially predictive of sexual functioning for women versus men (Kalmbach et al., 2012). Due to the numerous biological differences between men and women, we caution against equating scores across gender. However, the development of maximally similar measures better enables researchers to make comparisons across gender, without differences in measures confounding their results. For example, parallel measures will allow researchers to better examine such hypotheses as: Do men and women differ in their normative fluctuations in desire across the lifespan? Does life stress differentially predict sexual dysfunction for men and women? Our findings showed that the FSFI, MSFI, and the modified PFSF can be helpful in answering these types of questions for researchers who are interested in male and female sexual functioning.

This study was also the first to propose a reconceptualization of the Responsiveness scale of the PFSF. In both female and male samples, analyses revealed that the PFSF did not conform to its original structure. Specifically, the first two items did not correlate with the latter five items (even after accounting for item valence), results that were replicated in both genders. This finding resulted in removing the first two items of the Responsiveness scale to form an Avoidance scale; the renaming of the scale was based on inspection of the items that revealed a theme of sexual avoidance. This revised Avoidance scale was supported in both men and women. Importantly, our finding not only argues for future studies to administer or score the PFSF in such a way as to account for this modified avoidance scale, but the finding also invites future investigations to create a more coherent responsiveness scale. We contend that both sexual avoidance and responsiveness are important aspects of sexual behavior. As such, both constructs warrant research interest. Because empirical evidence exists in support of both the PFSF’s responsiveness scale (McHorney et al., 2004) and avoidance scale (the present study), further psychometric studies of these scale conceptualizations are needed to examine their validity.

Lastly, this investigation supported the factor structure of these instruments, but it is important to note that this was enabled through the use of an MTMM approach to the factor analysis, which attempted to isolate the effects of item valence. Though this statistical approach may appear to simply account for nuisance variance in individual self-report, we believe that the significant impact of item valence across all measures for both genders has important psychometric implications for the study of sexual health. The superiority of MTMM models conceptually demonstrated that the wording of items led to observable differences in response. To illustrate, our findings suggest that asking a man how easily he can produce an erection will likely generate a response that is not simply the inverse of asking the same participant how difficult it is for him to produce an erection. Strongly agreeing with the former was empirically distinct from strongly disagreeing with the latter. To highlight the strength of the effect, even after employing multi-method techniques to account for this variance due to item valence, some factor loadings still showed differences between how positively and negatively worded items loaded onto factors of sexual functioning in which a mixture of both valences appear (see Tables 1, 2, 3 and 4). Based on those differences, the mixture of positive and negative items may have tempered the internal consistencies for these scales. Men appeared to particularly respond differently to positive and negative items on the erection and orgasm scales, which was reflected in their lower internal consistencies as compared to other factors. It remains uncertain whether writing items to detect dysfunction rather than writing items to detect healthy functioning produces more or less valid results for either men or women. Future studies are needed to examine these possibilities.

Though we believe this investigation contributes to our knowledge of normative sexual functioning and the assessment of sexual functioning, some study limitations should be acknowledged. One limitation is the absence of a premature ejaculation scale in both measures of male sexual functioning. Researchers who use the MSFI and/or PFSF to study sexual health in young men and who are interested in premature ejaculation, which is a common sexual complaint among men (Laumann, Paik, & Rosen, 1999), may want to use additional measures or scales. Also, future validation studies that wish to examine the utility of male sexual function measures for use in normal populations would benefit from examining measures that assess premature ejaculation. Furthermore, researchers have recently identified female premature orgasm as an area of interest in sex research (Carvalho et al., 2011). Thus, future investigations would benefit from examining premature orgasm not only in men but in women as well. Additionally, the present study did not attempt to examine the utility of measuring male sexual pain in our young, healthy sample. Though male sexual pain receives little attention in the field of sexual health research (Davis, Binik, & Carrier, 2009; Luzzi & Law, 2006), our understanding and treatment of male sexual pain would improve with both increased attention to the phenomenon and better methods to measure it. Further, we cannot generalize our findings in our sample of young, healthy men and women to other populations. Notably, though our samples included bisexual and homosexual men and women, these groups were not large enough to allow for examination of sexual orientation as a factor in our study.

Sexual health research would greatly benefit from future studies that build upon the findings of this study. Though much research exists supporting these measures in various female populations, the MSFI and PFSF warrant evaluation in male populations other than young adult men as studied here. Indeed, just as the validity of the FSFI and PFSF warranted examination in younger women, studies determining the applicability of the MSFI and PFSF to older and more sexually dysfunctional men deserve attention. Should these measures prove to be appropriate for these other male populations, the field of research examining determinants and consequences of male sexual difficulties would greatly benefit.

Additionally, to our knowledge, this study was the first to identify item valence and message framing as an important aspect of psychometrics in sex research. We observed that individuals responded to questions about their sexual functioning very differently depending on whether these questions were framed positively or negatively. Moreover, even after employing techniques to best account for variance due to positively and negatively worded items, which significantly improved all four models, our findings still showed some evidence of item valence playing a role in factor loadings (see Tables 1, 2, 3 and 4). Indeed, assessing the sexual response in functional and dysfunctional women and men remains difficult. Patient-reported outcomes are becoming the primary endpoints for clinical trials in treatments for sexual dysfunction since they are better suited to capture the multidimensional and subjective information collected in this research (Kingsberg & Althof, 2011). Yet much remains unknown about the importance of this phenomenon, and it is presently unknown whether one valence (i.e., positive or negative) is more valid than the other. Future studies are needed to address this question, comparing self-reported sexual functioning to functioning as assessed through an alternative methodology (e.g., corroborating reports from partners; physiological assessments). Also, it is possible that the item valence phenomenon may differ in other cultures with different attitudes toward sex. These findings highlight the importance of message framing not only in the psychometric validation of sexual functioning measures but also in other areas of sexual research.

Importantly, the use of confirmatory techniques is what allowed us to partition out the method variance due to item valence. For this reason, we urge future investigations to employ similar techniques, as the ability to account for message framing variance may result in more consistent findings across studies. Additionally, as the FSFI, MSFI, and IIEF contain two-item scales, future studies should avoid using exploratory factor analysis for these measures, given that EFAs require that each scale have a minimum of three to five indicators. As such, we strongly urge future investigations of these measures to utilize confirmatory techniques to allow for validation of two-item scales, which may also yield more reliable findings across studies.

The implications of this study also extend to clinical practice. Indeed, clinicians now have empirical evidence showing that these measures are appropriate for use in these populations. Therefore, these tools can be used to evaluate function and change in function in patients with subclinical sexual concerns as well as diagnosed sexual disorders. Also, not only are maximally similar measures of sexual function between men and women important to research, but they can also be particularly useful in couples counseling when the dyad is heterosexual. Having the ability to administer similar measures to both members of the couple allows for easy comparison. For instance, if both partners complete the same questionnaire, the clinician can more easily and efficiently compare levels of, say, sexual desire or arousal between partners.