Introduction

There is a substantial and ever-growing body of work exploring male expression of elaborate or exaggerated traits such as armaments, ornaments, or same-sex aggression (Andersson 1994; Kotiaho 2001; Fairbairn et al. 2007; Cox and Calsbeek 2009; Schuett et al. 2010). Consequently, we have a fairly robust understanding of the selection pressures that favor the expression of such traits in males, the function of those traits, and the physiological mechanisms that regulate their expression (Andersson 1994; Badyaev 2002; Adkins-Regan 2005). Generally speaking, males compete for limited reproductive resources, e.g., mates and territories, via intrasexual or intersexual selection (Andersson 1994). Traits that improve access to breeding resources via competitive ability (hereafter, competitive traits; West-Eberhard 1983; Abrams and Matsuda 1994) are often favored, although directional sexual selection is frequently counterbalanced by viability selection owing to the costs of competitive traits (West-Eberhard 1983; Abrams and Matsuda 1994; Andersson 1994; Shuster and Wade 2003). Competitive traits often covary (e.g., larger individuals are also more aggressive), forming competitive phenotypes (West-Eberhard 1983; Andersson 1994), and males vary greatly in the degree to which they express competitive phenotypes, in ways that relate to reproductive success (Andersson 1994; Kotiaho 2001 and references therein).

Females in many vertebrate species also express competitive traits, and though these traits may differ in degree compared to males, they are often similar in kind (Andersson 1994; Owens and Hartley 1998; Amundsen 2000; Langmore et al. 2002; LeBas 2006; Clutton-Brock 2009; Watson and Simmons 2010; Rosvall 2011a). Surprisingly, despite the ubiquity of female expression of competitive traits, very little is known regarding the existence, extent, modulation, and functional consequences of interindividual variation in the expression of competitive phenotypes in females (Amundsen 2000; LeBas 2006; Clutton-Brock 2009; Rosvall 2011a). Currently, there is considerable discussion as to whether female expression of competitive traits is a nonadaptive by-product of selection favoring their expression in males (leading to a correlated response in females) or whether female expression is due to selection acting directly on females to favor such traits (Lande 1980; Amundsen 2000; Langmore et al. 2002; Blanckenhorn 2005; Ketterson et al. 2005; Chenoweth et al. 2008; Clutton-Brock 2009; Cox and Calsbeek 2009). To address this critical question, we need to determine which mechanisms underlie individual variation in phenotypic expression and ascertain whether selection acts on females according to degree of competitive trait expression.

Among male vertebrates, androgens, specifically testosterone (T), are the best-studied physiological mechanism regulating behavioral and morphological competitive traits. Experimental elevation of T in adulthood can alter the expression of numerous traits, leading to differences in the reproductive success and survival of males across many taxa (Balthazart and Ball 1995; Ketterson and Nolan 1999; Dufty et al. 2002; Adkins-Regan 2005; Reed et al. 2006; but see Lynn et al. 2002, 2005). Further, interindividual differences among males in circulating androgens, or in the ability to produce and secrete androgens in response to stimuli, have been related to the expression of competitive traits (McGlothlin et al. 2007b, 2008; Kempenaers et al. 2008; Williams 2008; While et al. 2010; but see Lynn et al. 2002, 2005; Van Duyse et al. 2004). Androgens are also present in females, and there is some evidence that variation in androgen exposure may similarly contribute to within-sex differences in female phenotype (Cristol and Johnsen 1994; Staub and De Beer 1997; Langmore et al. 2002; Adkins-Regan 2005; Ketterson et al. 2005; Mank 2007). If female phenotype is regulated, at least in part, by the same mechanisms as male phenotype, then females producing higher levels of androgens should also express more competitive morphology and/or behavior (Staub and De Beer 1997; Langmore et al. 2002; Adkins-Regan 2005; Ketterson et al. 2005; Mank 2007).

It is currently unclear what advantage, if any, a female gains from competitive traits. Female expression of competitive traits may be a nonadaptive by-product of selection on males or may result from selection acting directly on females (Lande 1980; Amundsen 2000; Blanckenhorn 2005; Ketterson et al. 2005; LeBas 2006; Chenoweth et al. 2008; McGlothlin and Ketterson 2008; Rosvall 2008; Clutton-Brock 2009; Cox and Calsbeek 2009). In species in which males compete for mates, males capable of expressing a more competitive phenotype may receive benefits that offset the costs; e.g., an increase in the number of mates may compensate for a shorter lifespan. For females, additional mates may not enhance fecundity, so expressing a more competitive phenotype may incur the costs of trait expression without corresponding benefits (Kotiaho 2001 and references therein).

Three predictions can be made regarding the expression of competitive traits and reproductive success in females: (1) if the presently observed level of trait expression reflects the outcome of direct and ongoing stabilizing selection on females, then females with sex-typical expression should have the highest fitness; (2) if the current level of trait expression is an outcome of genetic correlations between the sexes constraining optimal trait expression in females, then females with greater than average expression should have lower fitness than the norm; (3) if selection is acting directly on females to favor traits that improve access to breeding resources, then selection should be directional on competitive traits at least during episodes of breeding and females expressing the most competitive phenotypes should benefit most. By relating fitness estimates to individual variation in the expression of competitive traits, we aim to integrate proximate and ultimate frameworks in order to better understand the evolution and maintenance of female expression of competitive traits (Lande 1980; Badyaev and Martin 2000; Fairbairn et al. 2007; Williams 2008; Cox et al. 2009).

Here, we investigate free-living females in the mildly dimorphic dark-eyed junco (Junco hyemalis). Female juncos express traits that differ from those of males in degree but not in kind. Females are slightly smaller than males (males <2% larger in tarsus, <10% larger in wing and tail), slightly less ornamented, and follow a similar but lower seasonal profile of circulating T (Cawthorn et al. 1998; Nolan et al. 2002; Wolf et al. 2004; Ketterson et al. 2005). During the breeding season, males exhibit more readily observable intrasexual aggression (e.g., territorial singing). However, both sexes respond to experimental elevation of T with enhanced intrasexual aggression (Ketterson 1992; Clotfelter et al. 2004; Zysling et al. 2006; O’Neal et al. 2008), suggesting that T may be an important mediator of aggression in both sexes. Furthermore, recent work examining female aggression in juncos reports that females are consistently more aggressive toward same-sex intruders than toward opposite-sex intruders (Cain et al. 2011), suggesting that female aggression may act to limit competition from other females for paternal assistance or other reproductive resources (Yasukawa and Searcy 1982; Slagsvold et al. 1992; Sandell 1998; Langmore et al. 2002; Rosvall 2008).

Specifically, we addressed three questions regarding female expression of competitive traits. First, we asked whether female juncos show covariance in traits that may be important in a competitive arena (morphology and behavior). In juncos, individuals with longer wings have higher status in winter flocks (Ketterson 1979), leading us to predict that larger females will express greater aggression in a territorial context. Next, we asked whether hormonal phenotype (assayed using a physiological challenge, see below) would predict behavior and morphology, implicating T as a potential mediator of competitive phenotypes in females (Staub and De Beer 1997; Ketterson et al. 2005; Adkins-Regan 2005). Individual ability to produce T in response to this challenge is positively related to territorial aggression in males (McGlothlin et al. 2007b), and we predicted that females would show a similar relationship. Finally, in order to address whether the current levels of female trait expression are favored under direct selection or are more likely an indirect product of selection on males, we asked whether variation in intrasexual aggression was associated with nest success, a measure of fitness.

Methods

Study species, site, and population

Subjects were female Carolina dark-eyed juncos (J. hyemalis carolinensis), a socially monogamous songbird with biparental care (Nolan et al. 2002). This study took place from April 15 to August 10, 2008, on and around Mountain Lake Biological Station, in Giles County, Virginia (37°22′ N, 80°32′ W). The resident population was censused prior to the onset of breeding activity (late April). Details of the study site, field methods, and measurement techniques are described elsewhere (McGlothlin et al. 2005; Reed et al. 2006). Briefly, individuals were captured using baited mist nets and Potter traps. All individuals were banded with a serially numbered metal band and a unique combination of color bands for later field identification. Throughout the nesting season, we attempted to locate all nests and to identify the individuals associated with each nest by their color bands. Juncos build open nests on the ground, which makes nests relatively easy to locate (Nolan et al. 2002); each nest was monitored on a regular schedule until fledging (when offspring leave the nest) or failure (loss of offspring to a predator). If the young successfully fledged, the nest was classified as successful; if the nest was depredated or destroyed before fledging, it was classified as failed. The fate of one nest was unclear, and it was excluded from this analysis. Females were classified as successful if any of their nests fledged young and as failed if all of their nesting attempts failed before fledging.
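The female-level classification rule above can be written as a short Python function. This is a minimal sketch, not the authors' code: the fate labels and the function name are hypothetical, and treating a nest of unclear fate as absent from a female's record is our assumption, mirroring the exclusion of that nest from the analysis.

```python
from typing import Optional, Sequence

# Hypothetical fate labels for one female's nesting attempts; None marks a
# nest whose fate could not be determined.
FLEDGED, FAILED = "fledged", "failed"


def female_outcome(nest_fates: Sequence[Optional[str]]) -> Optional[bool]:
    """Classify a female from the fates of all her nesting attempts.

    Returns True if any nest fledged young, False if every attempt with a
    known fate failed, and None if no attempt has a known fate.
    """
    known = [fate for fate in nest_fates if fate is not None]  # drop unclear nests
    if not known:
        return None
    return any(fate == FLEDGED for fate in known)
```

For example, a female whose first nest was depredated and whose renest fledged young is classified as successful, because a single fledged nest is sufficient.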

Morphology

During the population census, all individuals were weighed and measured for tarsus, wing, and tail length (linear measures to the nearest 0.5 mm), and they were aged using a combination of mark–recapture data from previous years and plumage coloration (Nolan et al. 2002; McGlothlin et al. 2005). Older birds generally have longer feathers, so we adjusted wing length and tail length for age by centering. Because morphological measures are intercorrelated, we used a principal components analysis with varimax rotation to extract two variables that together captured >85% of the variance in morphology. Component 1 explained 58% of the variance (loadings: tail 0.89, wing 0.91, tarsus 0.37); for brevity, we refer to it as feather PC, although tarsus also loads positively. Component 2 explained 31% of the variance (loadings: tail −0.25, wing −0.14, tarsus 0.93); we refer to it as tarsus PC.
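The age adjustment and rotated PCA described above can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the JMP procedure actually used: we assume that "centering" means subtracting each age class's mean, that the PCA is run on the correlation matrix, and that varimax is applied without Kaiser row normalization. The function names and the synthetic data are hypothetical, and component signs may differ from JMP's output.

```python
import numpy as np


def center_by_age(values, ages):
    """Subtract each age class's mean (an assumed reading of 'centering')."""
    values = np.asarray(values, dtype=float)
    ages = np.asarray(ages)
    centered = values.copy()
    for age in np.unique(ages):
        mask = ages == age
        centered[mask] -= values[mask].mean()
    return centered


def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-8):
    """Varimax rotation; returns (rotated loadings, orthogonal rotation matrix)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated**3 - (gamma / p) * rotated @ np.diag(np.sum(rotated**2, axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        old_crit, crit = crit, np.sum(s)
        if old_crit != 0 and crit / old_crit < 1 + tol:
            break
    return loadings @ rotation, rotation


def rotated_pca(data, n_components=2):
    """PCA on the correlation matrix, followed by varimax rotation."""
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(eigvals)  # variable-component correlations
    rotated, rotation = varimax(loadings)
    # Unit-variance component scores, rotated with the same matrix as the loadings.
    scores = (z @ eigvecs / np.sqrt(eigvals)) @ rotation
    pct_var = 100 * np.sum(rotated**2, axis=0) / data.shape[1]
    return rotated, scores, pct_var


if __name__ == "__main__":
    # Synthetic stand-in for the measured females (not real data).
    rng = np.random.default_rng(1)
    n = 94
    ages = rng.integers(1, 4, size=n)
    size = rng.normal(size=n)
    wing = center_by_age(size + rng.normal(scale=0.5, size=n), ages)
    tail = center_by_age(size + rng.normal(scale=0.5, size=n), ages)
    tarsus = rng.normal(size=n)
    loadings, scores, pct = rotated_pca(np.column_stack([tail, wing, tarsus]))
    print(np.round(loadings, 2), np.round(pct, 1))
```

Rotation leaves the total variance captured by the retained components unchanged; it only redistributes it so that each variable loads mainly on one component, which is what makes labels such as "feather PC" and "tarsus PC" interpretable.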

Aggression towards a same-sex intruder

Intrasexual aggression was measured with a standardized behavioral assay described elsewhere (Cain et al. 2011). Briefly, we recorded each female’s behavioral response to a caged conspecific female (lure) between days 3 and 11 of incubation, where day 1 is the first day of incubation (females incubate eggs for 12 days, and nests are built from approximately May 15 to July 15). The lure was placed in a small wire cage with large openings permitting a clear view, positioned 1–3 m from the focal female’s nest, and covered by a camouflage-patterned cloth. The trial began when the female returned to within 5 m of the nest and the lure was uncovered. A single observer (KEC) watched all trials through binoculars and dictated all behaviors and locations to a second observer, who transcribed the data and operated a stopwatch. We recorded the amount of time spent within 0.25, 1, and 5 m of the lure and the number of attacks (swoops at the lure or actual contacts with the lure’s cage). The lure used for each trial was randomly assigned from a group of five females captured offsite and held throughout the season. Behavioral response to the lure was recorded for 30 min.

Females varied greatly in their response, spending 0–1,314 s within 0.25 m of the lure (mean 312 s, SE ±65 s; 1,800 s possible). In preliminary studies, some females responded intensely for a short time but did not persist, whereas others were slow to respond but persistent once they began responding. To capture both types of response, we recorded behavior separately for the first 10 min and for the entire 30-min trial. Because behaviors were intercorrelated, we used a principal components analysis to extract two variables, which together captured ~87% of the variance; component loadings are given in Table 1. Component 1 (general aggression PC) described 62% of the total variance, with higher scores indicating greater aggression; component 2 (latency PC) described an additional 23% of the variance, with higher scores indicating a longer latency to respond (i.e., less aggression). To determine whether behavioral responses were affected by other variables (female age, date of trial, number of eggs, and day of incubation), we used backward stepwise multiple regression to eliminate nonsignificant factors (P = 0.25 to enter, P = 0.10 to leave) and found no significant effects (all F < 0.5, all P > 0.50). We also tested for an effect of lure identity using a one-way ANOVA and found none (F < 0.10, P > 0.5). Because none of these variables was a significant predictor, all were excluded from later analyses.
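Splitting each trial into a 10-min window and the full 30-min trial can be illustrated with a small Python sketch. It assumes a hypothetical data format in which each visit to a distance zone (e.g., within 0.25 m) is logged as an (entry, exit) pair in seconds from the start of the trial; the function names, and the definition of latency as the first entry into the zone, are our assumptions rather than the published protocol.

```python
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]  # hypothetical (entry_s, exit_s) from trial start


def time_within(intervals: List[Interval], window_end_s: float) -> float:
    """Seconds spent in the zone between trial start and window_end_s."""
    total = 0.0
    for entry, exit_ in intervals:
        overlap = min(exit_, window_end_s) - entry  # clip visits at the window end
        if overlap > 0:
            total += overlap
    return total


def latency(intervals: List[Interval]) -> Optional[float]:
    """Time of first entry into the zone, or None if the female never entered."""
    return min((entry for entry, _ in intervals), default=None)


def summarize(intervals: List[Interval], first_window_s: float = 600.0,
              trial_s: float = 1800.0) -> Dict[str, Optional[float]]:
    """Per-trial summary for the first 10 min and for the whole 30-min trial."""
    return {
        "first_10_min_s": time_within(intervals, first_window_s),
        "whole_trial_s": time_within(intervals, trial_s),
        "latency_s": latency(intervals),
    }
```

A female that responds intensely at first and then leaves scores high in both windows but gains little beyond the first 10 min, whereas a slow but persistent female scores low early and high over the whole trial. Keeping both windows preserves that distinction for the PCA.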

Table 1 Loadings of the first two principal components of intrasexual aggression measured in the behavioral assay

Hormonal phenotype

Gonadotropin-releasing hormone (GnRH) is a neuropeptide that triggers the cascade of events in the brain and pituitary leading to the release of sex steroids by the gonads (Wingfield et al. 1991; Johnson 2000; Moore et al. 2002). An injection of exogenous GnRH (a GnRH challenge) therefore stimulates T secretion, and we use the strength of the response to this challenge as a measure of hormonal phenotype, enabling exploration of the relationships among T, behavior, and morphological phenotype (McGlothlin et al. 2007b, 2008). In female juncos, T in response to GnRH is highest in the 7 days prior to oviposition, when females are rapidly yolking eggs (Jawor et al. 2007). During this period, females have a distinct “torpedo-like” abdominal shape and are heavy for their size. All challenged females (n = 18) were torpedo-shaped and/or heavy: mean mass of challenged females was 24.0 ± 0.42 g (mean ± SE), compared with 21.5 ± 0.18 g for breeding female juncos in general (Nolan et al. 2002). Females were captured using baited mist nets or Potter traps and transported to a central processing area. Capture time and handling time (time elapsed between capture and initiation of the challenge) were recorded. The GnRH challenge protocol is detailed elsewhere and described only briefly here (Jawor et al. 2006a, 2007; McGlothlin et al. 2010). After processing, an initial blood sample was taken, followed by an intramuscular injection of 50 μL of a solution containing 1.25 μg of chicken GnRH-I (Sigma L0637; American Peptide 54-8-23). After 30 min had elapsed, a second blood sample was taken for the post-challenge hormone measure, hereafter referred to as challenge T. Samples were centrifuged, and the plasma was drawn off, frozen, and stored at −20°C until assayed. All challenges were administered between April 15 and May 15.
Sample sizes differ (challenged females, n = 18; morphology, n = 94; behavior, n = 31) because females were challenged only if they were yolking eggs at capture, whereas all captured females were measured for morphology and all females with nests were assayed for aggression.

Testosterone assays

To determine plasma T concentrations, samples were purified using long-column chromatography followed by a single competitive-binding radioimmunoassay (RIA) (Ketterson et al. 1991; Wingfield and Farner 1975; Wingfield et al. 1984). All samples were run in duplicate. Intra-assay variation was 2.2%, calculated as the coefficient of variation among values of three standard samples of known concentration. Recoveries were determined by adding approximately 2,000 cpm of tritiated T to each sample, and concentrations were corrected for incomplete recoveries, which averaged 82% (N = 36, SE = 2.9%). Initial T was 0.17 ± 0.12 ng/mL (mean ± SE). This mean was driven upward by one outlier; excluding it, initial T was 0.05 ± 0.02 ng/mL (mean ± SE). Mean post-challenge T (challenge T) was 0.349 ± 0.1176 ng/mL, with a range from 0.00 to 2.04 ng/mL. The maximum and mean initial and challenge T concentrations are lower than values previously reported in female juncos (e.g., Jawor et al. 2007), likely because the RIA used here is less sensitive than the enzyme immunoassay used in previous studies. Neither initial T nor challenge T was normally distributed; however, a Wilcoxon signed-rank test, including the outlier, revealed that challenge T was significantly greater than initial T, indicating that females responded to the challenge with an increase in T (Wilcoxon statistic = 68.5, P = 0.0016, N = 20). Initial T and challenge T were positively correlated (Spearman’s rho = 0.614, P = 0.0040, N = 20). Because initial T was generally undetectably low (only eight females had detectable initial T), we focus on challenge T for the remainder of the analysis. Although challenge T was not normally distributed, regression is generally robust to violations of normality (Box 1962), and we used log-transformed values to further minimize the violation.
Neither age nor mass showed a relationship with challenge T (all F < 0.90 and all P > 0.20). The time elapsed between capture and challenge ranged from 16 to 85 min (36 ± 20 min [mean±SE]).
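The recovery correction and intra-assay coefficient of variation described above reduce to simple arithmetic, sketched here in Python. The function names and the tracer counts are hypothetical, and averaging the per-standard CVs is our assumption about how the three standards were combined into a single intra-assay value.

```python
import statistics
from typing import Sequence


def recovery_fraction(cpm_recovered: float, cpm_added: float) -> float:
    """Fraction of the tritiated-T tracer recovered after chromatography."""
    return cpm_recovered / cpm_added


def correct_for_recovery(measured_ng_ml: float, recovery: float) -> float:
    """Scale a measured concentration up to account for incomplete recovery."""
    if not 0 < recovery <= 1:
        raise ValueError("recovery must be a fraction in (0, 1]")
    return measured_ng_ml / recovery


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Percent CV (sample SD / mean x 100) of repeated measurements."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


def intra_assay_cv(standards: Sequence[Sequence[float]]) -> float:
    """Mean percent CV across standards, each measured repeatedly in one assay."""
    return statistics.mean(coefficient_of_variation(s) for s in standards)
```

For example, if 1,640 of 2,000 added cpm are recovered (82%), a sample read at 0.041 ng/mL is corrected to 0.041 / 0.82 = 0.05 ng/mL.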

Statistical analysis

All statistical analyses were performed using JMP 8 for Mac (SAS Institute Inc.). We used separate forward stepwise multiple linear regression analyses (P < 0.15 to enter, P > 0.10 to leave) to examine the relationships between morphological traits (feather and tarsus PC) and the two aggression measures (general aggression and latency PC). Interaction terms were included in the full model to determine whether relationships with individual traits were strengthened or weakened when other traits were accounted for; interaction terms were eliminated if P > 0.25. To determine whether T in response to GnRH (challenge T) predicted morphology and aggression, we used two separate multiple linear regression analyses. Because the date of the challenge and the time elapsed between capture and initiation of the challenge may affect the response (Jawor et al. 2006a; McGlothlin et al. 2010), we used challenge T as the dependent variable in both analyses, which allowed us to control for these variables before examining the traits of interest (aggression or morphology measures) (McGlothlin et al. 2007b). To visualize these relationships, we calculated leverage pairs for each effect from leverage plots. Leverage pairs consist of the actual residuals from the best-fit line and the residual error without the effect in the model (Sall 1990). For traits that showed a nonsignificant relationship, we used retrospective power analysis to determine the lowest number of data points (least significant number [LSN]) required to find a significant relationship for the observed effect size (Thomas and Krebs 2003). Because our fitness measure is binary (fledge/fail), we used logistic regression to explore the relationship between behavior and nest success. A likelihood-ratio chi-square test compared the fit of this model with that of an intercept-only model assuming a constant probability of success.
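The logistic regression and its likelihood-ratio test against a constant-probability model can be sketched in Python with NumPy. This is an illustrative re-implementation, not JMP's code: the function names are hypothetical, the model is fitted by Newton-Raphson assuming no complete separation and both outcomes present, and the chi-square tail probability is computed in closed form only for df = 1 or even df.

```python
import math

import numpy as np


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Fit logistic regression by Newton-Raphson.

    Returns (coefficients, log-likelihood); coefficient 0 is the intercept.
    Assumes no complete separation (otherwise the estimates diverge).
    """
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    beta = np.zeros(design.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        score = design.T @ (y - p)                                 # gradient of log-likelihood
        information = design.T @ (design * (p * (1 - p))[:, None])  # negative Hessian
        step = np.linalg.solve(information, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-design @ beta))
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, loglik


def chi2_sf(x, df):
    """Upper-tail chi-square probability, for df = 1 or any even df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df % 2 == 0:
        half = x / 2
        return math.exp(-half) * sum(half**k / math.factorial(k) for k in range(df // 2))
    raise NotImplementedError("only df = 1 or even df are handled in this sketch")


def likelihood_ratio_test(X, y):
    """LR chi-square of the fitted model against an intercept-only model."""
    y = np.asarray(y, dtype=float)
    beta, loglik = fit_logistic(X, y)
    p0 = y.mean()  # constant success probability (needs both outcomes present)
    loglik0 = float(np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0)))
    stat = 2 * (loglik - loglik0)
    df = len(beta) - 1  # one degree of freedom per predictor
    return stat, df, chi2_sf(stat, df)
```

As a check on the arithmetic only (not on the data), chi2_sf(9.46, 2) equals exp(-4.73), about 0.0088, and chi2_sf(8.82, 1) is about 0.0030, in line with the P values reported for the overall model and for general aggression PC in the Results.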
We estimated the intensity of selection by calculating linear and nonlinear selection gradients from regressions of relative fitness (individual fitness divided by the population mean) on z-transformed aggression scores and body measures (Lande and Arnold 1983; Arnold and Wade 1984a, b). Quadratic regression coefficients were calculated with the linear component statistically controlled (Arnold and Wade 1984a, b). Reported quadratic coefficients and their standard errors are doubled, as recommended by Stinchcombe et al. (2008). Linear selection gradients estimate directional selection; nonlinear (quadratic) gradients estimate stabilizing or disruptive selection (Brodie et al. 1995).
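The regression approach to selection described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' JMP analysis: the function name is hypothetical, it returns point estimates only (no standard errors), and it omits cross-product terms, so correlational selection is not estimated. Quadratic coefficients are doubled following Stinchcombe et al. (2008).

```python
import numpy as np


def selection_analysis(fitness, traits):
    """Selection differentials and linear/quadratic gradients (point estimates).

    fitness: absolute fitness per individual (e.g., 1 = fledged, 0 = failed).
    traits: array of shape (n,) or (n, k).
    Returns (S, beta, gamma); gamma holds doubled quadratic coefficients.
    """
    W = np.asarray(fitness, dtype=float)
    w = W / W.mean()                                    # relative fitness
    Z = np.asarray(traits, dtype=float).reshape(len(w), -1)
    z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)    # z-transformed traits
    k = z.shape[1]
    ones = np.ones((len(w), 1))

    # Selection differential: covariance of relative fitness with each trait.
    S = np.array([np.cov(w, z[:, j])[0, 1] for j in range(k)])

    # Linear gradients: partial regression of w on all standardized traits.
    beta = np.linalg.lstsq(np.hstack([ones, z]), w, rcond=None)[0][1:]

    # Quadratic terms fitted with the linear terms in the model, then doubled.
    coef = np.linalg.lstsq(np.hstack([ones, z, z**2]), w, rcond=None)[0]
    gamma = 2 * coef[1 + k:]
    return S, beta, gamma
```

With a single standardized trait, the selection differential S and the linear gradient coincide, because the trait variance is 1; they diverge only when several correlated traits enter the model together.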

Results

Morphology and aggression

Individual females with higher feather PC scores and lower tarsus PC scores expressed elevated levels of overall aggression (greater aggression and shorter latencies) towards a same-sex intruder (Fig. 1; overall adjusted R² = 0.395, F(3, 26) = 6.44, P = 0.0027; feather PC: b = 0.76, P = 0.0304; tarsus PC: b = −1.13, P = 0.0036; feather PC × tarsus PC: b = −0.69, P = 0.0464). Conversely, there was a negative, but not significant, relationship between latency PC and tarsus PC; i.e., females with larger tarsus PC scores tended to have shorter latencies (Fig. 1; R² = 0.105, F(1, 27) = 2.94, P = 0.0986). Power analysis indicates that a sample size of 38 would be required to detect a significant relationship (P < 0.05) for the observed effect size (δ = 0.325, actual n = 27). There was no detectable relationship between feather PC and latency PC (Fig. 1; R² = 0.005, F(1, 27) = 0.12, P = 0.73).

Fig. 1

Visual illustration of the relationship between body measures and aggression measures. Points are leverage plot pairs (see the “Statistical analysis” section) showing the relationship between variables while controlling for other predictors in the model, akin to a partial correlation. Curved lines are 95% confidence intervals; R² values were calculated using leverage plot pairs, and P values are from multiple regression.
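The leverage plot pairs used in the figures are closely related to the standard added-variable (partial regression) plot, which Sall's (1990) leverage plots generalize. As a hedged stand-in for JMP's computation, the Python sketch below builds added-variable pairs: the focal predictor and the response are each residualized on the other predictors, and the slope through the resulting pairs equals the focal predictor's coefficient in the full model (the Frisch-Waugh-Lovell result). The function names are hypothetical, and JMP's leverage plots may shift or rescale the axes differently.

```python
import numpy as np


def _residualize(v, others):
    """Residuals of v after OLS regression on an intercept plus `others`."""
    design = np.column_stack([np.ones(len(v)), others])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef


def added_variable_pairs(y, X, j):
    """(x-residuals, y-residuals) for predictor j, adjusted for the others."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    others = np.delete(X, j, axis=1)
    return _residualize(X[:, j], others), _residualize(y, others)


def partial_slope_and_r2(ex, ey):
    """Slope through the pairs (equal to the full-model coefficient) and their R^2."""
    slope = float(ex @ ey / (ex @ ex))  # both residual vectors have mean zero
    r = float(np.corrcoef(ex, ey)[0, 1])
    return slope, r**2
```

The R² of these pairs is the squared partial correlation between the predictor and the response, which is why such plots are described as akin to a partial correlation.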

Challenge T, aggression, and morphology

Testosterone production ability (challenge T) was positively related to tarsus PC (Fig. 2, Table 2; overall model adjusted R² = 0.57, F(2, 18) = 8.56, P = 0.0018; tarsus PC: b = 0.16, P = 0.0193), and there was a positive, nonsignificant relationship between feather PC and challenge T (Table 2; P = 0.2420). Power analysis revealed that, for the observed effect size (δ = 0.053), a sample size of 49 would be required to detect a significant relationship between challenge T and feather PC. More aggressive females produced more T in response to the GnRH challenge. Challenge T was negatively related to latency PC and positively, but not significantly, related to general aggression PC (Fig. 2, Table 2; overall model adjusted R² = 0.54, F(4, 12) = 4.27, P = 0.0460; general aggression PC: b = 0.03, P = 0.1327; latency PC: b = −0.10, P = 0.0357). Power analysis indicates that, for the observed effect size (δ = 0.063), a sample size of 19 would be required to detect a significant relationship between challenge T and general aggression PC (actual n = 13).

Fig. 2

Visual illustration of the relationship between individual hormonal phenotype, morphology, and behavior. Points are leverage plot pairs showing the relationship between variables while controlling for other predictors in the model, akin to a partial correlation. Curved lines are 95% confidence intervals; R² values were calculated using leverage plot pairs, and P values are from multiple regression.

Table 2 Multiple regression models of the relationship between ability to produce T (challenge T) and traits of interest

Phenotype and nest success

Females with higher general aggression PC and lower latency PC scores had a significantly higher probability of rearing offspring to the age of fledging (nest leaving) (Fig. 3; logistic regression: overall χ²₂₄ = 9.46, df = 2, P = 0.0088; general aggression PC: χ² = 8.82, P = 0.0030; latency PC: χ² = 4.26, P = 0.0390). Analyzed another way, females whose nests survived to fledging had, on average, higher general aggression scores than females that were unsuccessful (Fig. 3; t(29) = −2.48, P = 0.0153); there was no difference in latency scores (t(29) = 0.833, P = 0.4121). The selection differentials and linear selection gradients were positive for standardized general aggression PC and weakly positive for standardized latency PC (general aggression: S = 0.274 ± 0.13; latency: S = 0.087 ± 0.13), suggesting directional selection for greater aggression. The nonlinear (quadratic) selection gradients were weakly negative for both aggression scores (general aggression: γ = −0.082 ± 0.18; latency: γ = −0.086 ± 0.34), suggesting weak stabilizing selection (quadratic coefficients and errors doubled as recommended by Stinchcombe et al. 2008).

Fig. 3

Relationship between general aggression PC and nest success. Individual nests are plotted according to nest fate and general aggression PC (overlapping points are jittered slightly for visual clarity). The curved line is the logistic regression line relating the probability that young in the nest survive to the age of leaving the nest (fledging) to the general aggression PC of the female.

Discussion

We found that, among female juncos, traits that may be important in reproductive competition (morphology and behavior) covary. Females with the highest feather PC scores and lowest tarsus PC scores were the most aggressive, indicating a relationship between morphology and intrasexual aggression. Exploring one potential mechanism of trait expression, we found that the ability to produce T in response to a GnRH challenge predicted individual latency to respond (more T, quicker response) and showed a positive, but not significant, relationship with general aggressive response. Production of T in response to GnRH was also positively related to body size, though significantly so only for tarsus PC. Finally, more aggressive females were also more likely to produce a successful nest. Selection gradients based on nest success indicated that, all else being equal, more aggressive females are favored by selection. Together, these findings indicate that female expression of competitive traits may result from the direct action of selection rather than being solely a nonadaptive product of selection on males. To our knowledge, this is the first study to simultaneously relate natural variation in short-term elevations of T to female expression of competitive traits.

Covariation in morphology, behavior, and physiology

The most aggressive birds were larger overall, as measured by the length of their flight feathers and tarsi, but, interestingly, aggressive females also had relatively short tarsi in relation to the length of their feathers. Focusing first on feathers, feather length is influenced by many factors, one of which is resource availability during growth (Grubb 1989; McGlothlin et al. 2007a). In juncos, wing and tail feathers are grown in the nest or regrown after the end of breeding, typically in autumn, making feather length a more dynamic measure of size (Nolan et al. 2002). Conversely, tarsus length is a static measure of body size that, in songbirds, reaches full length before the young leave the nest. Because traits related to competitive ability often depend on condition, a common conundrum is whether such traits predict future competitive ability or reflect past competitive ability. Thus, the question arises: are larger females more aggressive because they had better access to resources, or did they have better access to resources because they are more aggressive?

Females with smaller skeletons may have a lower basal metabolic rate (BMR), and, because the cost of feather growth is proportional to mass-specific BMR (Lindström et al. 1993), such females may be able to grow longer feathers during molt that later enable greater aggression during the breeding season. Alternatively, aggression may drive body size by providing better access to limited resources during feather growth (Fretwell 1969; Smith and Metcalfe 1997). If so, this would suggest that female competitive traits are condition dependent, as is often seen in males and was recently reported in female tree swallows (Rosvall 2011b). Previous work in wintering juncos showed that dominance was related to wing length (Ketterson 1979). Regardless, these findings indicate that aggression might be beneficial outside the context measured here, and future studies should explore the consequences of female phenotype outside the breeding season (Marra 2000).

Covariation between morphology and reproductive behavior of males has been reported in numerous species (Andersson 1994). Fewer studies have examined these relationships in females, but those that have often find covariation as well. In red-winged blackbirds (Agelaius phoeniceus), larger females (wing, tarsus, and mass PC score) are more aggressive (Langston et al. 1990). Female cardinals with darker face masks are more aggressive (Jawor et al. 2004). In American goldfinches, females’ colorful bills act as a signal of status (Murphy et al. 2009). In the horned dung beetle (Onthophagus sagittarius), females with relatively larger horns are more successful at acquiring resources necessary for breeding (Watson and Simmons 2010). Female Soay sheep (Ovis aries) with horns initiate and win more aggressive interactions than females without horns (Robinson and Kruuk 2007). Among female redstarts (Setophaga ruticilla), larger body size is related to access to higher-quality territories in the nonbreeding season (Marra 2000).

We used the strength of response to the GnRH challenge as a measure of hormonal phenotype. In male juncos, a standardized injection of GnRH (GnRH challenge) produces a repeatable transient increase in circulating T (Jawor et al. 2006a) that is correlated with the amount of T produced in response to a social challenge (McGlothlin et al. 2007b), thus providing a biologically relevant measure of interindividual variation in the ability to secrete T (Jawor et al. 2006a; McGlothlin et al. 2007b, 2010; Williams 2008). Here, we report that the ability to produce T in response to a GnRH challenge was significantly and positively related to the skeletal measure (tarsus PC) and positively, though not significantly, related to the overall size measure (feather PC), suggesting that T may be related to trait expression in females as well as males. Covariation between hormonal phenotype and morphology has been reported in males (Badyaev 2002; Adkins-Regan 2005; Hau 2007; McGlothlin et al. 2008; Cox et al. 2009; Ketterson et al. 2009). Much less is known about the role of androgens in regulating female morphology. However, in some birds and lizards, androgens can cause females to develop traits normally seen in males (Staub and De Beer 1997; Ketterson et al. 2005). For example, experimentally elevated T in leopard gecko females (Eublepharis macularius) led to the development of male-typical genitalia (Rhen et al. 1999), and experimentally elevated T induced female budgerigars (Melopsittacus undulatus) to develop male-typical cere color (Nespor et al. 1996).

We found that individuals that produced more T in response to GnRH responded aggressively more quickly (lower latency PC), and challenge T showed a positive, though not significant, relationship with general aggression. To our knowledge, this is the first study in females to relate individual variation in behavior to individual variation in the ability of the gonad to respond to a physiological challenge (GnRH). There is a substantial body of work detailing relationships between T and aggression in males (Wingfield et al. 1987, 2001; Ketterson et al. 1991; Ketterson 1992; Ketterson and Nolan 1999; Adkins-Regan 2005), and previous work in male juncos also found that the ability to produce T in response to a GnRH challenge was positively related to same-sex aggression (McGlothlin et al. 2007b). However, less is known about the role of T in female–female aggression, and the results of such studies are equivocal (Jawor et al. 2006b and references therein). A number of studies report that T is related to increased aggression or dominance in females, as is commonly found in males. For example, in female baboons (Papio sp.), fecal T levels are related to social aggression (Beehner et al. 2005); in female dunnocks (Prunella modularis), competition for male investment elevates T (Langmore et al. 2002); females with experimentally elevated T are more aggressive to same-sex intruders in tree swallows (Tachycineta bicolor) (Rosvall, personal communication) and juncos (Zysling et al. 2006); female spotless starlings (Sturnus unicolor) with experimentally elevated T were more aggressive and more likely to win nesting cavities or remain monogamous (Veiga et al. 2004; Sandell 2007); and leopard gecko females (E. macularius) with experimentally elevated T showed increased aggression (Rhen et al. 1999). However, numerous other studies have found no relationship between T and aggression (Elekonich 2000; Hau 2000; Jawor et al. 2006b; While et al. 2010), while others have found relationships between aggression and other steroid hormones (Van Duyse et al. 2004; Goymann et al. 2008; Pärn et al. 2008). Taken together, the covariation we report indicates that, to some extent, competitive traits covary in females, as is often seen in males. Our findings are correlative, however, which prevents us from determining whether a female’s hormonal phenotype affects her morphological phenotype, morphology affects hormones, or some combination of the two. These correlative patterns are nonetheless consistent with T playing an interrelated role in the mechanistic pathways regulating phenotypic variation among females, in a manner similar to that seen in vertebrate males.

Fitness consequences of female phenotype

The covariation we observed among females in competitive phenotype is of considerable interest, but without a relationship to fitness, there would be no evidence that selection might act directly on female phenotype or the mechanisms regulating it (Langston et al. 1990; Finch and Rose 1995; Williams 2008). We found that aggressive females had greater nest success, and we report positive selection gradients on aggression measures, suggesting that selection may directly favor a competitive phenotype. Previous work in the same population examined correlational selection on some of the same morphological features and found that fecundity selection favored females with smaller wings and longer tails (McGlothlin et al. 2005), whereas our results indicate that shorter wings and tails were related to lower aggression, which was in turn related to lower nest success. We do not have a ready explanation for this conflict, but note that in this study we used the eventual success of the nest through the nest cycle (fledge/fail at day 12) as our measure of reproductive success, while the previous study measured fecundity as the number of offspring surviving to day 6, halfway through the nestling period. The difference in findings may therefore be due to differences between years in the direction or strength of selection, or the period between days 6 and 12 may be an important selective episode. For instance, less aggressive females may be better equipped to produce more offspring when predation pressure is low or resources are ample, whereas aggressive, large females may be at an advantage when predation pressure escalates or limited resources require competition. Furthermore, because nest success in this population varies considerably from year to year (Raouf et al. 1997; Reed et al. 2006; Clotfelter et al. 2007), whatever the basis for the association between aggression and nest success, its impact on fitness is likely to vary over time. McGlothlin et al. (2005) also reported that viability selection favored females with longer wings (a trait related to increased aggression in our study), supporting the view that females derive some benefits from competitive trait expression. The implication is that, whatever the costs and benefits of being a more competitive female, the net benefit is likely to be context dependent.
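For readers less familiar with selection gradients, the sketch below illustrates the generic multiple-regression approach commonly used to estimate standardized linear selection gradients: relative fitness (absolute fitness divided by its mean) is regressed on traits standardized to unit variance, and the partial regression slopes are the gradients. The function name `linear_selection_gradients`, the input layout, and the use of ordinary least squares on a binary fitness measure such as fledge/fail are illustrative assumptions, not a description of the exact model used in this study.

```python
import numpy as np


def linear_selection_gradients(traits, fitness):
    """Estimate standardized linear selection gradients by multiple regression.

    Hypothetical helper for illustration only.
    traits:  array of shape (n_individuals, n_traits), e.g. aggression and
             morphology scores (assumed input layout).
    fitness: array of shape (n_individuals,), e.g. nest success coded 1/0.
    """
    traits = np.asarray(traits, dtype=float)
    fitness = np.asarray(fitness, dtype=float)
    # Standardize each trait to mean 0 and unit variance so gradients are comparable.
    z = (traits - traits.mean(axis=0)) / traits.std(axis=0, ddof=1)
    # Relative fitness: absolute fitness divided by the sample mean.
    w = fitness / fitness.mean()
    # Design matrix: intercept column followed by the standardized traits.
    X = np.column_stack([np.ones(len(w)), z])
    coef, *_ = np.linalg.lstsq(X, w, rcond=None)
    # Drop the intercept; the partial regression slopes are the gradients.
    return coef[1:]


if __name__ == "__main__":
    # Synthetic example (not study data): trait 0 raises the chance of nest success.
    rng = np.random.default_rng(0)
    traits = rng.normal(size=(100, 2))
    nest_success = (traits[:, 0] + rng.normal(size=100) > 0).astype(float)
    print(linear_selection_gradients(traits, nest_success))
```

With a binary fitness measure, least-squares coefficients like these are commonly reported as gradient estimates while significance is assessed with a logistic model; the sketch omits that step.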

If selection consistently favors more aggressive, competitive females and trait expression is heritable, then the female expression of competitive traits observed in the junco may be due in part to direct selective pressures, rather than solely to genetic correlations between the sexes and/or a by-product of selection on males. A number of recent studies have found similar advantages for competitive females. With respect to behavior, aggressive female skinks (Egernia whitii) produce more surviving offspring (Sinn et al. 2008), aggressive female tree swallows (T. bicolor) are more likely to acquire the nesting cavities essential for reproduction (Rosvall 2008), and aggressive female starlings (Sturnus vulgaris) are more likely to monopolize access to their mates and maintain monogamy (Sandell 1998). Similarly, with respect to morphology, female striped plateau lizards (Sceloporus virgatus) with more elaborate coloration are preferred in mate choice arenas (Weiss 2006), female barn swallows (Hirundo rustica) with longer tails produce more fledglings and more second clutches (Møller 1993; Cuervo et al. 1996), larger female red-winged blackbirds (Agelaius phoeniceus) are more aggressive and initiate breeding earlier (Langston et al. 1990), and larger, heavier female Eurasian red squirrels are more likely to hold a territory, live longer, and produce more young (Wauters 1995). Taken together, these results, and others like them, suggest that females are unlikely to resemble males solely because of strong genetic correlations; rather, females often experience positive selection for the expression of a competitive phenotype (Amundsen et al. 1997; Amundsen 2000; Langmore et al. 2002; LeBas 2006; Rosvall 2008, 2011a; Clutton-Brock 2009; Watson and Simmons 2010).

The mechanism underlying this apparent advantage for aggressive female juncos is as yet unclear. However, more aggressive females may be better at defending their nests from potential predators (Cain et al. 2011) or more likely to acquire high-quality reproductive resources, e.g., a nest site or mate (Sandell 1998; Forsgren et al. 2004; Jawor et al. 2004, 2006b; Illes and Yunes-Jimenez 2009; Murphy et al. 2009). Regardless, this apparent advantage suggests that females compete for breeding resources in a manner analogous to males and that greater aggression can be beneficial, at least at some stages of reproduction. On the other hand, nest success is only one component of fecundity, which is itself only one component of fitness, and there may be substantial costs that our study did not capture. For instance, other songbird studies have shown that aggressive or high-T females (O’Neal et al. 2008; Rosvall 2011c) and males (Cawthorn et al. 1998; Duckworth 2006) produce fewer, smaller, or lower-quality offspring, suggesting that such a phenotype may carry a cost. Similarly, high-ranking female olive baboons (Papio cynocephalus anubis) have shorter interbirth intervals and improved infant survival, but also a higher probability of miscarriage (Packer et al. 1995). Alternatively, higher-quality females may be better able to cope with the consequences of higher levels of endogenous T, as has been argued for males (Peters 2000; McGlothlin et al. 2010), allowing them to express greater aggression without the costs often reported in females with experimentally elevated T (e.g., Clotfelter et al. 2004; Zysling et al. 2006; O’Neal et al. 2008).

Currently, our understanding of why females express competitive traits, which appear similar in form to traits that are sexually selected in males, is limited. By quantifying the fitness consequences of these traits, we can begin to understand to what extent selection is acting directly on these traits in females and to what extent these traits are the nonadaptive by-product of selection on males (Lande 1980; Badyaev 2002). By also examining the proximate mechanisms that regulate dimorphic trait expression, we can elucidate how divergence between the sexes in trait expression may arise (Fairbairn et al. 2007; Cox et al. 2009). Additionally, although variation in female behavior has important influences on population parameters and the strength of sexual selection (Shuster and Wade 2003), we are only beginning to understand the role that androgens play in modulating behavioral variation, and the magnitude and importance of that variation among females in free-living populations (Staub and De Beer 1997; Amundsen 2000; Langmore et al. 2002; Ketterson et al. 2005; Jawor et al. 2007; Rosvall 2011a; Clutton-Brock 2009).