I thank all the commenters for their thoughtful, thought-provoking responses. I agree with almost all the points raised, so that the entire suite of responses is, to me, a wonderful review of what we know and what we can reasonably speculate about human sexual orientation. Many commenters expressed surprise at findings I didn’t discuss, and I’m happy they had the opportunity to cover that additional material. For example, the discussion of partner preference in the animal literature (Adkins-Regan, 2017; Balthazart & Court, 2017; Baum & Bakker, 2017) is an excellent addendum to my comments. My difficulty relating partner preference in animals to human sexual orientation is obviously a judgment call, so now readers can judge for themselves. I was particularly delighted with Balthazart and Court’s (2017) suggestions for possible gene candidates affecting sexual orientation, and Skorska and Bogaert’s (2017) discussion of handedness and height, which I had neglected. Nevertheless, I have an apology to make and a few disagreements, to make my position on these topics clear for any future sex researchers.

Mea Culpa

As several commenters gently noted (Adkins-Regan, 2017; Balthazart & Court, 2017), it would have been better if my opening statement about the organizational role of androgens in sexual differentiation of brain and behavior had been limited to “mammals” rather than “vertebrates,” as things are obviously a bit different in birds and, truth be told, there is precious little literature on the remaining vertebrates. My goof.

Testosterone’s Role in Sexual Orientation of Men

In several commentaries, summaries of my position might be misinterpreted to indicate that I feel prenatal testosterone plays no role in affecting sexual orientation in men, only in women. To be clear, my position is that testosterone plays a very important role in affecting sexual orientation in men, i.e., that the reason about 95% of men are gynephilic is that they were exposed to ample levels of androgens such as testosterone before birth. The fact that lesbians, on average, appear to have been exposed to more prenatal androgen than straight women makes a role of prenatal androgens in gynephilia among males much more plausible, at least to me. My point about the differences in orientation in women versus men was, rather, that we cannot (easily) explain why a minority of men are androphilic, given that nearly all of them were exposed to much more prenatal androgen than almost any lesbian. The question is whether prenatal androgens can explain variation in men’s orientation, and so far, I find the data unsupportive.

As for the role of perinatal androgen in women’s sexual orientation, LeVay (2017) is quite correct to emphasize that many females become lesbians without any contribution from androgens. Like Skorska and Bogaert (2017), I have wondered if prenatal androgens alter the probability of a girl becoming a lesbian by acting not upon her brain, but upon her facial features (Weinberg, Parsons, Raffensperger, & Marazita, 2015), thereby altering the way others react to her. Like Pasterski (2017), I think a strong case is building that some variation in children’s gender nonconformity can be attributed to variation in prenatal androgen in girls, but not boys (Atkinson, Smulders, & Wallenberg, 2017; Wallien, Zucker, Steensma, & Cohen-Kettenis, 2008). More research is needed!

Timing is Absolutely Everything, and So is Location

Several commenters made the point that if variance in the amount of androgen exposure does contribute to variance in sexual orientation in men, it is always possible that the sensitive period during which androgens act on orientation might be different from the sensitive period affecting digit ratios (or otoacoustic emissions) (Baum & Bakker, 2017; McFadden, 2017; Skorska & Bogaert, 2017). If so, that could explain why the body markers of early androgen exposure might not differ between gay and straight men, even if one group had been exposed to less androgen than the other during some point in development. Of course, this logic is sound and I agree it is possible, plausible even.

But to fully explain behavioral differences between gay and straight men as related to androgens, we would have to posit not only different sensitive periods for androgen effects on somatic traits versus behavior, but also between the various classes of behavior since, as I pointed out originally, gay men who appear feminine in terms of occupational preferences and love interests are very masculine in terms of other behaviors, such as interest in casual sex and visual pornography.

McFadden (2017) goes further to point out that the cells responding to androgen during these sensitive periods are also spatially distinct, and therefore it is possible that androgen levels may be higher in one somatic location than another, leading to dichotomies in measured response to the hormone. Although we don’t think of highly lipophilic substances like steroids as likely to set up gradients in the body, we can sometimes detect the presence of steroid gradients (Rand & Breedlove, 1992). Plus, steroids do not work like magic, in a vacuum, but rather must rely on lots of cellular machinery to effectively affect any given target cell, and there may well be spatial gradients in the availability of cofactors and the like, such that the same level of androgen at two different sites may very well result in quite different levels of response. I find this argument plausible, too.

Likewise, I agree with the several commenters (LeVay, 2017; McFadden, 2017; Skorska & Bogaert, 2017) that if both low and high androgens increase the likelihood that a boy will grow up to be gay, then of course the average would be no different from straight men and markers of prenatal androgen would therefore not differ, either. I agree that we may someday be able to classify gay men into categories, such as tops and bottoms (Swift-Gallant, Coome, Monks, & VanderLaan, 2017), to detect subclasses of gay men who display indications of lower, or greater, levels of perinatal androgen than straight men.

Adventures in Variance

One of the few instances when I disagree with several commenters concerns thinking about sources of variance for any biological variable (Baum & Bakker, 2017; Pasterski, 2017), an issue that arises in many other published (or posted, or whispered) remarks about digit ratio work. Behavior is influenced by many different factors, so one can never say, “This factor and this factor alone determines the amount of behavior X, Y, or Z.” In fact, if you try to replace the word “factor” with some particular influence, and specify a particular behavior, the sentence soon becomes absurd. There is no behavior that is influenced by any one hormone alone. Natural selection simply does not work like that, nor does the universe.

But the same goes not just for the complex things we love like behavior, but for any biological variable, including things as objective as structure. For example, a genetic screen identified over 700 genes that affect human height, 83 of which had “major” effects, meaning a difference in alleles would result in differences in height of 1–2 cm (Marouli et al., 2017). If androgen is responsible for men being taller, on average, than women (does anyone doubt this?), then why aren’t all men taller than all women? After all, all the men were exposed to more androgen than (almost) all the women, right? For one thing, because both sexes, in addition to having different androgen exposures, carry a mix of those 700 genes. Given the influence of so many genes on the trait, if we were studying the effects of growth hormone on height, should we be surprised if everyone exposed to the hormone regimen did not respond exactly the same? If we found a group of people carrying a gene coding for an impaired growth hormone receptor, we would expect them to have a smaller average height than other people, but we would have to use statistics to see it because of the variability in both groups caused by the other 699 genes.

So, for me, the fact that digit ratios are “noisy,” that they do not perfectly reflect prenatal androgen exposure (Baum & Bakker, 2017; Pasterski, 2017), is simply an acknowledgment of the nature of all measures in the life sciences. Why should 2D:4D differ from hormone levels, anatomical measures, behavioral assays or any other measure we might use? Of course, there are things affecting digit ratios in addition to prenatal androgen. How could it be otherwise unless androgens work by magic? Variance is universal; get over it.

As for concerns about why digit ratios are more sensitive to androgen in the right hand than the left, we simply must accept that this is so, since meta-analysis confirms the sex difference is greater on the right than the left (Honekopp & Watson, 2010), and digit ratios in mice are also more responsive to perinatal hormone treatment on the right paw than on the left (Zheng & Cohn, 2011). It is, of course, unsatisfying that we don’t know why this is true, but then we don’t have any understanding of the developmental origins of many lateralities, such as why most people are right handed, or why language is more often analyzed by the left cerebral hemisphere than the right. Scientists studying handedness or language simply have to accept this is true without knowing why. That doesn’t make them reckless researchers, nor does it cast doubt on their findings about handedness or language. Similarly, not knowing why the right hand is more androgen-sensitive than the left does not mean that we can’t exploit the right hand’s sensitivity.

Doublethink

In the animal literature, the definitive proof that a particular mammalian sex difference is mediated by activation of the androgen receptor (AR) is to find that the trait in question is feminine in XY individuals with a genetically dysfunctional gene for AR. If the only difference between two groups of genetically male (XY) animals is that one has a functional AR and the other does not, then any differences between them must be due to the difference in that gene. If a sexually differentiated trait is fully masculine in Tfm rats with a dysfunctional AR gene, such as the volume of the SDN-POA (Morris, Jordan, Dugger, & Breedlove, 2005), then clearly ARs are not necessary for masculinization of that trait. On the other hand, if a trait, such as size of neurons within the SDN-POA, is fully feminine in Tfm rats (Morris et al., 2005), then functional AR is required for masculinization of that trait. Such results are the gold standard for proving a trait is, or is not, androgen-sensitive.

If it sounds like I’m belaboring the point here, it’s because there is a strange doublethink (Orwell, 1949) in both published reports that digit ratios are feminine in XY women with androgen insensitivity syndrome (AIS), echoed by Baum and Bakker (2017). These reports (Berenbaum, Bryk, Nowak, Quigley, & Moffat, 2009; van Hemmen, Cohen-Kettenis, Steensma, Veltman, & Bakker, 2017) somehow interpret their findings not as fulfilling the gold standard of demonstrating androgen sensitivity, but as casting doubt about whether digit ratios are androgen-sensitive. How can these authors, having just presented the definitive, conclusive evidence that the sex difference in digit ratios depends on androgen, simultaneously think it does not?

These are interesting examples of twists of logic one may go through in order to reject a measure one disapproves of. That both empirical reports disapprove of digit ratio research is made plain even in the final lines of their abstracts, where one expresses the opinion that digit ratios are “not a good marker” (Berenbaum et al., 2009), and the other concludes digit ratios are “not recommended” (van Hemmen et al., 2017). “Not recommended” comes as close to “no further research is needed” as I’ve ever seen in an article in PubMed. Having made these value judgments, both papers somehow have to reconcile them with the data, which solidly support the idea that digit ratios are indeed androgen-sensitive.

In the earlier report, the reason offered is again the variance bugaboo—because there is extensive overlap between groups in the distribution of digit ratios, including males versus females, they are not “good markers” (Berenbaum et al., 2009). In other words, if you have to use statistics to detect the effect, it is “not good.” Really, are we to restrict our measures to those that show no overlap between groups? I suppose that would make the use of statistical reasoning superfluous. If there is overlap, are we not to use statistics to gauge whether the difference is real? Because there is overlap between the sexes in virtually every morphological trait, including height, phallus size, and extent of beard growth, this reasoning would suggest that there are no sex differences at all in humans, and indeed has been marshaled to declare there are no sex differences in the human brain (Joel et al., 2015), which has been cogently disputed (Del Giudice et al., 2016). Yes, if you use digit ratios to compare levels of prenatal androgen exposure, you will have to gather large enough samples and use inferential statistics to judge whether the difference is real, like virtually every other behavioral or morphological trait studied by modern scientists. If they are “not good” or “not recommended,” then so are the other measures, like hormone assays, behavioral tests, and every morphological measure.

In the more recent report about digit ratios in AIS, yet another objection is offered, repeated in Baum and Bakker (2017), namely that because there was not greater variability in digit ratios of control women than women with AIS, that means digit ratios do not reflect androgen (van Hemmen et al., 2017). In both the original report and the commentary, this point about comparing variances is termed a “prediction,” but in fact, a prediction, by definition, is something proposed before the data are known, and as far as I know, this idea first appeared after the report of Berenbaum et al. (2009), in fact as a commentary (Wallen, 2009) upon that report. This “prediction” would be more accurately described as post hoc hand-waving in an attempt to reconcile the data with the authors’ expectations.

We can relate this objection to the earlier discussion about sources of variance for human height. With 700 different genes at work, if we gathered all the people carrying a particular allele for one of those genes, and found they were indeed shorter or taller than the rest, that would be a coup worthy of publication in Nature (Marouli et al., 2017). Would we also expect to be able to detect a reduced variance in height in that group, because one of those genes was the same, while the other 699 were left uncontrolled? I wouldn’t.
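The arithmetic behind this intuition can be sketched with a deliberately simple toy model. The assumptions here are mine, made only for illustration, and are not taken from Marouli et al. (2017): the 700 loci act additively and independently, each has two alleles at equal frequency, each has the same effect size, and there is no environmental variance at all (which, if anything, exaggerates the contribution of any single gene).

```python
import math

# Toy model: every name and number below is a hypothetical illustration,
# not an estimate from any cited study.
N_LOCI = 700          # loci affecting the trait
ALLELE_FREQ = 0.5     # assumed frequency of the "plus" allele at every locus
EFFECT = 1.0          # assumed effect per "plus" allele, arbitrary units


def additive_variance(n_loci, allele_freq=ALLELE_FREQ, effect=EFFECT):
    """Variance of a trait summed over n_loci independent diploid loci.

    Each locus carries Binomial(2, allele_freq) "plus" alleles, so it
    contributes 2 * p * (1 - p) * effect**2 to the trait variance.
    """
    return n_loci * 2 * allele_freq * (1 - allele_freq) * effect ** 2


# Variance in the general population: all 700 loci vary freely.
var_everyone = additive_variance(N_LOCI)

# Variance among people who share the same genotype at one locus:
# that locus is constant, while the other 699 still vary.
var_one_fixed = additive_variance(N_LOCI - 1)

variance_ratio = var_one_fixed / var_everyone
sd_ratio = math.sqrt(variance_ratio)

# Mean shift for carriers of two "plus" alleles at the fixed locus,
# relative to the population mean of 2 * p alleles at that locus.
mean_shift = EFFECT * (2 - 2 * ALLELE_FREQ)
mean_shift_in_sd = mean_shift / math.sqrt(var_everyone)

print(f"variance ratio (one locus fixed / all free): {variance_ratio:.5f}")
print(f"standard deviation ratio:                    {sd_ratio:.5f}")
print(f"mean shift of carriers, in population SDs:   {mean_shift_in_sd:.3f}")
```

Under these assumptions, holding one locus constant removes only 1/700 of the trait variance, so the carriers' spread is almost indistinguishable from everyone else's even though their mean is shifted.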

One flaw in the reasoning that women with AIS should show less variability in an androgen-sensitive trait than control women is that it assumes that all the women with AIS are equally androgen-insensitive. In fact, neither study determined which alleles the AIS women carried. As there is a great range of androgen insensitivity in humans (Mongan, Tadokoro-Cuccaro, Bunch, & Hughes, 2015), the differences in AR alleles between AIS women would be expected to add variance in any androgen-sensitive trait such as digit ratios.

Another indication of the post hoc nature of this “prediction” is that, of the many reports that have examined Tfm rats and mice to determine whether AR plays a role in sexual dimorphism, I can find no record of researchers ever suggesting such a finding about variances was needed to confirm a role for AR. In fact, in our several publications about brain measures in rodent equivalents of AIS, the testicular feminization mutant (Tfm), we have never seen significantly less variance in Tfm animals than in control females, even when the Tfm animals are clearly less masculine than males for the trait in question. Just leafing through our publications, looking solely at the brain measures where Tfm males are demasculinized compared to wild-type males, which proves the trait is androgen-sensitive, we see that variance is equal between Tfms and females for the volume of the left bed nucleus of the stria terminalis, but variance is slightly (not significantly) greater in Tfms than females on the right (Durazzo, Morris, Breedlove, & Jordan, 2007). In the ventrolateral portion of the ventromedial hypothalamus, Tfms also have greater variance in volume than do females (Dugger, Morris, Jordan, & Breedlove, 2007). In the SDN-POA, variance in neuronal somata size is greater in Tfms than females (Morris et al., 2005). Among five measures of astrocyte process complexity in the posterodorsal medial amygdala that are sexually differentiated, variance is about equal in Tfms and females for one, greater in females than Tfms in another, and greater in Tfms than females in the other three (Johnson, Breedlove, & Jordan, 2013). Thus, I can find no empirical support for the idea that androgen-insensitive individuals will necessarily display less variance than females for androgen-sensitive traits, even when sample sizes are ample to detect differences between the androgen-insensitive animals and wild-type males. Note that, if there were any merit to this criterion, then our work in rats and mice would be much more likely to conform to it, because our Tfm subjects within each species are all carrying identical AR alleles, and on a genetically homogeneous background, unlike the human subjects in the AIS studies (Berenbaum et al., 2009; van Hemmen et al., 2017).

In the absence of any empirical confirmation of this idea that truly androgen-sensitive measures should be less variable in individuals with dysfunctional AR alleles than control females, I see no reason to regard it as valid, much less compelling. My feeling is that this armchair speculation reflects a rather naïve view of variability in biological systems. There are no biological traits that respond solely to one influence; that would be an impossibility, as it would require, for example, that no genes influence the trait. My expectation is that, because androgen has only a minority effect on digit ratios, we would need enormous sample sizes before we could detect significantly reduced variance in digit ratios in AIS women versus controls.
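To give a feel for what “enormous” might mean, here is a rough power calculation. It is a sketch under stated assumptions, not an analysis of any actual data set: digit ratios are treated as normally distributed, the log of a sample variance is given its usual large-sample variance of about 2/(n − 1), and the fraction of variance in control women attributable to androgen is a purely hypothetical input (no such fraction is given in the reports discussed here). The function name and default values are my own.

```python
import math
from statistics import NormalDist


def n_per_group_for_variance_ratio(variance_ratio, alpha=0.05, power=0.80):
    """Approximate n per group to detect a given ratio of two variances.

    Uses the normal approximation Var(ln s^2) ~ 2 / (n - 1) for each of two
    independent, equally sized groups, so Var(ln(s1^2 / s2^2)) ~ 4 / (n - 1).
    Two-sided test at level alpha; assumes normally distributed data.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    log_ratio = math.log(variance_ratio)
    return math.ceil(1 + 4 * (z_alpha + z_beta) ** 2 / log_ratio ** 2)


# Hypothetical fractions of digit-ratio variance in control women that are
# due to androgen. If AIS removes that source entirely, the expected
# variance ratio (AIS / control) is 1 minus the fraction.
for androgen_fraction in (0.50, 0.20, 0.10, 0.05):
    ratio = 1 - androgen_fraction
    n = n_per_group_for_variance_ratio(ratio)
    print(f"androgen fraction {androgen_fraction:.2f}: "
          f"variance ratio {ratio:.2f}, about {n} per group")
```

If androgen accounted for half of the variance, a few dozen women per group would suffice; if it accounts for only a few percent, the required samples run into the thousands per group, far beyond what any study of AIS could recruit.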

Motes versus Beams

Perhaps most tellingly, none of these critics of digit ratio research offers a suggestion for a retroactive marker of prenatal androgen in adult humans that is “good” or “recommended” as an alternative to 2D:4D. I feel sure that anogenital distance (AGD) (Pasterski, 2017) would be more sensitive to prenatal androgen exposure than digit ratios, but unfortunately there are cultural barriers that make it impractical to measure AGD in large non-clinical samples. Most people would be unwilling to place their uncovered pudenda on the platen of my copier simply to aid the scientific enterprise.

But the strangest thing about the publications on AIS insinuating that digit ratios do not reflect prenatal androgen, either because there is overlap in the measures and/or because there is not reduced variance in women with AIS, is that they obsess over a single point casting doubt (perhaps) on the validity of the ratios, without coming to grips with the mountain of data supporting the validity of the ratios. If one really thinks digit ratios do not reflect perinatal androgen, then how does one explain the feminine nature of the ratios in XY women with AIS (Berenbaum et al., 2009; van Hemmen et al., 2017), to say nothing of the sex difference in humans (Manning, Scutt, Wilson, & Lewis-Jones, 1998) and mice (Brown, Finn, & Breedlove, 2002), the masculinized ratios in people with CAH (Brown, Hines, Fane, & Breedlove, 2002; Okten, Kalyoncu, & Yaris, 2002; Oswiecimska et al., 2012; Rivas et al., 2014) (the study of CAH highlighted in Pasterski [2017] examined only the left hand, which is less sensitive to perinatal androgen), the demasculinized ratios in men with Klinefelter’s syndrome (Manning, Kilduff, & Trivers, 2013), the correlation of digit ratios and AGD in a non-clinical population of women (Barrett, Parlett, & Swan, 2015), or the robust response of digit ratios to perinatal androgen manipulations in mice, which is disrupted when AR is genetically disabled selectively in the forelimbs (Zheng & Cohn, 2011)? When it comes to weighing the evidence on whether perinatal androgens affect human digit ratios, the skeptics worrying about a mote in my eye should give some thought to the beam lodged in their own. For that matter, if the digit ratios don’t reflect perinatal androgen, then how would one explain why they are masculinized in lesbians (Grimbos, Dawood, Burriss, Zucker, & Puts, 2010), especially butch lesbians (Brown, Finn, Cooke, & Breedlove, 2002) compared to straight women? Nail-biting? Manicure accidents? Pixie dust?