Introduction

Studies of sexual selection often focus on the fitness costs and benefits of signaling in males. In many species, however, both males and females possess elaborate signals and a complete understanding of sexual signaling should explain the expression of signals in both sexes (Amundsen 2000b). One influential view is that female ornaments may be non-functional byproducts of selection in males (Lande 1980; Amundsen 2000a). While genetic constraints undoubtedly play a role in shaping ornamentation in both sexes (Kraaijeveld 2014; Dale et al. 2015; Dunn et al. 2015), recent work has demonstrated that female signals may also serve as indicators of female quality that function in mate choice and that are maintained by selection (Doutrelant et al. 2008; Griggio et al. 2009; López-Idiáquez et al. 2016). In support of this idea, evidence suggests that males may modulate their investment in parental care or nest defense in response to variation in female ornamentation (i.e., the differential allocation hypothesis; Griggio et al. 2003; Mahr et al. 2012). Despite an increasing focus on female signal variation, studies of male signaling are still much more common (Rosvall 2011; Odom et al. 2014; Kokko 2017). One reason for this continued discrepancy is that it is often logistically easier to measure variation in the quantity rather than the quality of offspring produced, but variation in the quality of offspring may be relatively more important for understanding sexual selection operating on female ornamentation.

The most widespread paradigm for studying sexual selection in wild birds is to tally the total number of within-pair and extrapair offspring produced by each individual in a population in relation to a signal of interest (e.g., Safran et al. 2005). This approach has proven powerful for understanding differential reproductive success in males, where variance in the number of offspring produced is potentially high due to differential investment, extrapair paternity, and polygyny (Trivers 1974; Gerlach et al. 2012; Kokko et al. 2012) and has been used effectively in a wide range of species (Griffith et al. 2002; Westneat and Stewart 2003). In contrast, for many of the best-studied species, female reproductive output is less variable because it is limited by egg production (note that there are many exceptions to this pattern for species with different life histories; e.g., Apakupakul and Rubenstein 2015). Despite relatively low variance, some studies have demonstrated correlations between female ornamentation and seasonal reproductive success (e.g., Doutrelant et al. 2008; Freeman-Gallant et al. 2014); yet, even in species where females vary little in the number of offspring produced, mate selection mediated by signals could evolve if signals convey information about female parental behavior and the likely quality of offspring produced (reviewed in Nordeide et al. 2013).

For many birds, mortality is highest in the first year and any parental behavior that increases the likelihood of offspring survival and recruitment should increase fitness (Sæther 1989; Sullivan 1989; Martin 2015). One major way that female behavior can influence offspring phenotype is through variation in incubation behavior, particularly in species where incubation is performed entirely by females (Ardia et al. 2010; Martin et al. 2011). The critical role of incubation in development has long been recognized and there is a rich literature describing the trade-offs and demands of incubation (e.g., Conway and Martin 2000a, b; Martin et al. 2007). Fluctuations in the thermal environment that avian embryos experience during incubation are known to have profound effects on survival (LaManna and Martin 2016), immunity (Palacios and Martin 2006; Martin et al. 2011), growth rate (LaManna and Martin 2016), metabolism (Nord and Nilsson 2011), endocrine regulation (Uehling et al. 2020), and body condition (Ardia et al. 2010; Bleu et al. 2017). For parents, incubation is a costly behavior and is under strong selection for optimal performance (Conway and Martin 2000b; Visser and Lessells 2001). Across species, incubation behavior balances (i) the thermal demands of embryos, (ii) nest predation risk, and (iii) parental foraging demands (Conway and Martin 2000b; Martin and Schwabl 2008). Despite this strong selection, there is apparently considerable flexibility in incubation within individuals (LaManna and Martin 2016; Martin et al. 2018). For example, incubation bouts change with weather conditions (Conway and Martin 2000a; Cones et al. 2020) and are altered in response to experimental manipulations of nest temperature (Pérez et al. 2008; Ardia et al. 2009, 2010; Bleu et al. 2017), food availability (Pearse et al. 2004), and perceived predation risk (Fontaine and Martin 2006; LaManna and Martin 2016).

Given the importance of incubation behavior, males that mate with a female expressing a favorable incubation phenotype are likely to increase their long-term fitness via increased offspring quality. While a great deal of research on incubation has focused either on differences across species or on adjustments in response to manipulations, relatively few studies have investigated relationships between female incubation behavior and ornamentation (but see Matysioková and Remeš 2010; Hasegawa et al. 2016; Kötél et al. 2016; Plaza et al. 2018). Moreover, it is often unclear to what extent females have a stable “incubation phenotype” that is repeatable across conditions and nesting attempts. Interestingly, several papers explore the possibility that females might flexibly adjust their own incubation behavior in response to the signaling phenotype of their mate (Gorman et al. 2005; Hasegawa et al. 2012; Hasegawa and Arai 2016). While these studies do explicitly link incubation behavior and mate choice, the focus is on male signal variation, despite the fact that females express the behavior of interest.

We studied incubation behavior and female ornamentation in a free-living population of common yellowthroat warblers (Geothlypis trichas). Common yellowthroats are socially monogamous, migratory songbirds with female-only incubation and sexually dimorphic plumage. Males possess both a melanin-based black mask and a carotenoid-based UV-yellow bib, both of which function in sexual signaling (e.g., Freeman-Gallant et al. 2010; Whittingham et al. 2015). Females do not have the black mask, but do have a smaller and duller version of the yellow bib. Although the bib is less pronounced in females, a previous study in this population found that larger bibs in females were positively associated with age, body size, and fecundity, while more colorful bibs were associated with lower fecundity, increased nest failure due to predation, and decreased female survival prospects (Freeman-Gallant et al. 2014). Thus, there is some evidence that female bib attributes might be informative in this species, but it is unclear whether these same attributes are associated with female incubation behavior.

Here, we evaluated the hypothesis that female ornamentation functions as a reliable signal of incubation phenotype in common yellowthroats. This hypothesis results in four clear predictions and our dataset allowed us to directly address the first three of these: (i) females exhibit stable between-individual differences in incubation behavior, (ii) variation in incubation behavior influences offspring quality or survival, (iii) between-individual variation in female ornamentation is correlated with variation in incubation behavior, and (iv) male mate choice decisions are based on female ornamentation. We first used an observational dataset of incubation behavior in females—including observations of the same females across multiple years—to assess the repeatability of several aspects of incubation behavior both within and across nesting attempts. Next, we asked whether nestling survival and proxies for nestling quality (body size and condition) were correlated with female incubation behavior. Finally, we assessed whether any aspects of female phenotype, including variation in the size and color of the female bib, were correlated with variation in incubation behavior. Because early season mate choice decisions are difficult to observe in this species, we could not assess male choice of mates directly, but we discuss the implications of our findings for understanding mutual mate choice and the evolution of signals in both sexes.

Methods

General field methods

We studied common yellowthroats breeding in Saratoga County, NY, USA, from early May to late July 2011–2012. This population has been studied continuously since 2005 and details of the field site and general field methods are available in Freeman-Gallant et al. (2010). We captured males and females using mistnets shortly after arrival on the breeding territory (males) or during incubation (females). At the time of capture, we took morphological measurements (tarsus length, wing length, and mass), collected a small blood sample (< 30 μl), and attached an aluminum USGS band along with a unique combination of colored plastic leg bands. We also collected 4–6 feathers from the center of the yellow bib to measure coloration and a set of still photographs with the bird held against a standardized grid background to measure the size of the yellow bib (as in Freeman-Gallant et al. 2010).

We considered females to be “inexperienced” if they were first-time breeders in our study site and “experienced” if they were returning breeders. At our site, individuals typically return to a territory within 400 m of the previous year and the habitat surrounding the monitored area is unsuitable for yellowthroats (Taff et al. 2013). Thus, inexperienced birds are generally 1 year old, though we cannot rule out the possibility that a few inexperienced breeders may have dispersed as older adults into our study site.

We conducted a full census each day of the breeding season on all territories to determine important dates in the breeding cycle (i.e., arrival, clutch initiation, hatching, and fledging). Common yellowthroats build open cup nests near ground level. Once females had settled on a territory, we searched for nests by observing females during the nest building or—more rarely—the early incubation phase. Nests were checked every other day to determine clutch size, hatch dates, and ultimate fates of nestlings. We sampled nestlings on day 5 after hatching. At this time, we collected mass and tarsus measurements, took a small blood sample, and attached the USGS band. We continued checking nests every other day after measurements to record survival and fledging. Successful fledging is clearly distinguishable from predation because offspring remain on the territory and are actively fed by parents for several days to weeks after fledging. Nest predation rates are high in this population (~ 70% predation rate reported in Taff et al. 2014) and females typically re-nest after predation events up to six times in a season. After nest predation, we again followed females to locate subsequent nesting attempts. Occasionally females also produced a second or third clutch after successful initial attempts and we also found and monitored these nests.

Characterizing variation in ornaments

We measured the size and coloration of the yellow bib patch in females using photographs and feathers collected in the field. Total bib area was assessed from still photographs in ImageJ (http://rsb.info.nih.gov/ij) using the “Threshold Color” plugin (hue 20–50; saturation 100–255; as in Freeman-Gallant et al. 2010, 2014). Coloration was assessed using an Ocean Optics 2000 UV–VIS spectrophotometer to measure reflectance across the visual spectrum (320–700 nm). We collected and averaged four readings from each bird with the probe removed between each reading. As in Freeman-Gallant et al. (2014), we calculated total brightness (average reflectance from 320 to 700 nm), UV saturation (proportion of total reflectance from 320 to 400 nm), and carotenoid chroma (C_car), defined as (R_700 − R_450)/R_700, where R_λ is the reflectance at wavelength λ (nm). We used these three metrics to describe variation in yellow coloration.
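The three color metrics defined above can be illustrated with a short Python sketch. This is not the authors' code: the function name `color_metrics`, the 10 nm sampling interval, and the example reflectance values are hypothetical, and the sketch assumes that the band limits are inclusive and that readings at exactly 450 and 700 nm are present in the spectrum.

```python
def color_metrics(wavelengths_nm, reflectance):
    """Return (brightness, uv_saturation, carotenoid_chroma) for one spectrum.

    Assumes evenly spaced wavelengths that include 450 and 700 nm.
    """
    # Keep only the 320-700 nm window used in the text.
    pairs = [(w, r) for w, r in zip(wavelengths_nm, reflectance) if 320 <= w <= 700]
    if not pairs:
        raise ValueError("spectrum has no readings between 320 and 700 nm")

    total = sum(r for _, r in pairs)
    brightness = total / len(pairs)                              # mean reflectance
    uv_saturation = sum(r for w, r in pairs if w <= 400) / total  # UV share of total
    r_at = dict(pairs)
    carotenoid_chroma = (r_at[700] - r_at[450]) / r_at[700]
    return brightness, uv_saturation, carotenoid_chroma


# Hypothetical step-shaped spectrum: low reflectance below 500 nm, high above.
wavelengths = list(range(320, 701, 10))
reflectance = [10.0 if w < 500 else 40.0 for w in wavelengths]
brightness, uv_sat, c_car = color_metrics(wavelengths, reflectance)
```

For a real bird, the four replicate readings would first be averaged wavelength by wavelength before the metrics are computed.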

Temperature logger placement and analysis

To each nest, we added two Thermochron iButton temperature loggers within 1 day of the initiation of incubation (model DS1921G-F5#, Embedded Data Systems, Lawrenceburg, KY). Yellowthroat nests consist of a rough cup structure made of large reeds and grass outside of a smaller, thin cup of lining material made of very fine grass and pine needles. We positioned the loggers between the rough cup and lining material just below the eggs. In this position, the devices were not visible, but were close enough to the egg mass to be warmed when females were incubating the eggs.

The devices were set to record temperature ± 0.5 °C every 2 min. At this recording frequency, the memory available was sufficient to record ~ 3 days of temperature data for each device. Yellowthroat nests in our population hatch after 12.1 ± 0.6 SD days of incubation. For our temperature loggers, we programmed a delayed start time so that the first device recorded early in incubation (target days 2–5) and the second device recorded late in incubation (target days 9–11). In practice, the exact days recorded varied somewhat because the loggers were sometimes deployed before the start date of incubation was confirmed. Because our devices were not positioned directly against the brood patch of incubating females, we could not collect reliable information on the absolute temperature of incubated eggs, but we were able to determine the length of on and off bouts throughout the observation period.

The start and end of incubation bouts were determined manually by plotting the temperature profile for each nest and locating transition points where temperature began to rise sharply (start of an on bout) or fall sharply (start of an off bout). A similar approach has been used for iButton data in other studies (Ardia and Clotfelter 2006; Ardia et al. 2009). In most cases, transitions between on and off bouts were unambiguous. We excluded any sections of the recordings in which transitions could not be identified due to high ambient temperature or bad logger placement. In a few cases, entire records could not be scored or the devices malfunctioned. It was also common for predation to occur before all 6 days of incubation data had been collected; therefore, the total amount of time scored at each nest varied.

Typically, female yellowthroats remained on the nest actively incubating for the entire night from around 20:00–21:00 until around 5:00–6:00 the following morning. Although we occasionally observed off bouts during the night (perhaps when females were flushed off the nest), these occurrences were rare and we did not score incubation behavior during the night. We began scoring with the first off bout each morning and ended once females were back on the nest for the night. We also noted the time of the first off bout for each female to assess variation in the onset of daily activity across females.

Finally, we collected information on ambient temperature at the field site using a set of five data logging devices that were deployed throughout the site as part of a separate study (Song Meter 2+, Wildlife Acoustics, Concord, MA; Taff et al. 2014). Depending on the time of day, these devices recorded ambient temperature ± 2.0 °C every 5–25 min. We averaged ambient temperature across all devices and for each incubation bout scored we considered ambient temperature to be the reading taken closest to the midpoint of the bout. Using the bout level data described here, we calculated overall means for on and off bout length, percent of time spent incubating, and average onset of daily activity. In our analyses, we consider both bout level and nest level descriptors of incubation behavior.
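As a rough illustration of the nest-level summaries and the ambient temperature matching described above, the sketch below works from a list of already scored bouts. The tuple layout, the function names, and the example values are hypothetical. Computing percent time incubating as total on-bout time divided by total scored time is an assumption, since the text does not give the exact calculation.

```python
def summarize_bouts(bouts):
    """bouts: list of (start_min, end_min, state) with state 'on' or 'off'."""
    on = [end - start for start, end, state in bouts if state == "on"]
    off = [end - start for start, end, state in bouts if state == "off"]
    total_on, total_off = sum(on), sum(off)
    return {
        "mean_on": total_on / len(on),
        "mean_off": total_off / len(off),
        # Assumed definition: share of scored time spent on the nest.
        "constancy": total_on / (total_on + total_off),
    }


def ambient_at_midpoint(start_min, end_min, ambient_readings):
    """Return the ambient reading taken closest in time to the bout midpoint.

    ambient_readings: list of (time_min, temperature_c), already averaged
    across loggers.
    """
    midpoint = (start_min + end_min) / 2
    _, temperature = min(ambient_readings, key=lambda r: abs(r[0] - midpoint))
    return temperature


# Hypothetical scored morning: minutes since the first off bout.
bouts = [(0, 45, "on"), (45, 60, "off"), (60, 100, "on"), (100, 120, "off")]
ambient = [(10, 15.0), (50, 18.0), (90, 21.0)]
summary = summarize_bouts(bouts)
temp_for_second_bout = ambient_at_midpoint(45, 60, ambient)
```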

Accounting for covariates of incubation behavior

The goal of this study was to examine between-individual differences in incubation behavior. However, individual incubation bouts occurred across changing conditions that are known to influence incubation behavior directly and these effects may be more appropriately attributed to external conditions than to individual phenotype. Therefore, before addressing the main questions of the study, we examined the relationships between incubation behavior and the possible covariates of ambient temperature, date, time of day, and year. To do this, we fit two separate full models with either “on bout length” or “off bout length” of each individual bout as the response variable, the set of covariates listed above as predictors, and nest identity and female identity as random effects. Beginning with this full model, we used the “dredge” function from the “MuMIn” package in R to fit all possible combinations of covariates (Bartón 2016). From the selection table, we chose the single model with the strongest support (lowest AICc value). To illustrate the explanatory power of the environmental covariates, we calculated R² for the full model and intercept-only model using the method for mixed models described in Nakagawa et al. (2013).
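The ranking step behind a model selection table can be sketched from fitted log-likelihoods alone. The example below uses the standard formulas for AICc and Akaike weights; it does not reproduce MuMIn or refit any mixed model. The candidate model names, log-likelihoods, parameter counts, and sample size are all hypothetical.

```python
import math


def aicc(log_lik, k, n):
    """Small-sample corrected AIC for a model with k parameters and n observations."""
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def akaike_weights(aicc_by_model):
    """Convert AICc scores to weights that sum to one across the candidate set."""
    best = min(aicc_by_model.values())
    relative = {name: math.exp(-0.5 * (score - best))
                for name, score in aicc_by_model.items()}
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()}


# Hypothetical candidates: (log-likelihood, number of estimated parameters).
candidates = {
    "temperature + year": (-1000.0, 5),
    "temperature": (-1003.0, 4),
    "intercept only": (-1010.0, 3),
}
n_bouts = 500  # hypothetical number of bouts in the dataset

scores = {name: aicc(ll, k, n_bouts) for name, (ll, k) in candidates.items()}
weights = akaike_weights(scores)
best_model = min(scores, key=scores.get)
```

The parameter count k should include variance components and the residual variance as well as fixed effects, which is why it is supplied by hand here.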

Using the best-supported model, we extracted the random intercept estimates (i.e., best linear unbiased predictors, BLUPs) for each female and nesting attempt after controlling for the covariates included in the final model. We used these random effect estimates and the intercept to calculate on and off bout length for each nesting attempt after accounting for the conditions under which observations were made. We used the same approach to calculate corrected times for the onset of daily activity, except that this model included the time of sunrise instead of the start time of the bout as a predictor. Finally, we used the corrected on and off bout lengths for each nest to determine total incubation cycle length (on + off bout) and incubation constancy (percent time on nest) after accounting for covariates. These estimates represent the expected incubation behavior of each female for each nesting attempt under the average environmental conditions observed in our dataset. Although several covariates were correlated with incubation behavior (see “Results”), in all cases the corrected estimates were highly correlated with raw calculations of incubation behavior and the results of analyses using corrected or raw values were qualitatively similar. We report results using only the corrected incubation measures except when noted otherwise.
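A minimal sketch of the arithmetic in this step, assuming the model intercept and the female-level and nest-level random intercepts have already been extracted from a fitted mixed model. All numeric values and function names are hypothetical. Adding the intercept to both random effects is one reading of the description above, not a confirmed detail of the authors' calculation.

```python
def corrected_bout_length(intercept, female_blup, nest_blup):
    """Expected bout length (min) for one nesting attempt at average covariate values."""
    return intercept + female_blup + nest_blup


def cycle_and_constancy(on_min, off_min):
    """Return (cycle length in minutes, proportion of the cycle spent on the nest)."""
    cycle = on_min + off_min
    return cycle, on_min / cycle


# Hypothetical estimates for a single nesting attempt.
on_corrected = corrected_bout_length(intercept=45.0, female_blup=3.0, nest_blup=-1.0)
off_corrected = corrected_bout_length(intercept=15.0, female_blup=0.5, nest_blup=0.5)
cycle, constancy = cycle_and_constancy(on_corrected, off_corrected)
```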

Data analysis

We first calculated the within-individual repeatability of incubation behavior across observations within a single nesting attempt. Using data from individual bouts, we determined repeatability of on bout length, off bout length, and the onset of activity across days. We next calculated the repeatability of incubation behavior across multiple nesting attempts by the same female in a season. For these comparisons, we determined repeatability of averages from the full nesting attempt for on and off bout length, average cycle length (on + off bout), incubation constancy (percent time spent on nest), and average onset of daily activity. We calculated and reported “agreement repeatabilities” (i.e., repeatability with no covariates), because this is the most widely reported and easily comparable measure of behavioral repeatability (Nakagawa and Schielzeth 2011). However, “adjusted repeatabilities” (i.e., corrected for covariates) using the corrected incubation metrics described above were similar in both effect size and significance level in all cases. Repeatability measures were calculated using linear mixed models with either raw or corrected incubation measures as the response variable and female identity as the random grouping variable. Significance was assessed using a permutation test with 1000 iterations in the “rptR” package in R (Nakagawa and Schielzeth 2011; Stoffel et al. 2017).
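To make the idea of agreement repeatability concrete, the sketch below uses the classical one-way ANOVA estimator (among-female variance divided by total variance) together with a label permutation test. This is a simpler stand-in for illustration only: the analysis described above used linear mixed models in the rptR package, and the two approaches can give different values with unbalanced data. The female identifiers and measurements are hypothetical.

```python
import random


def anova_repeatability(groups):
    """groups: dict mapping female id -> list of measurements (at least 2 females)."""
    data = [g for g in groups.values() if g]
    a = len(data)
    n_total = sum(len(g) for g in data)
    grand_mean = sum(sum(g) for g in data) / n_total
    ss_among = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in data)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in data for x in g)
    ms_among = ss_among / (a - 1)
    ms_within = ss_within / (n_total - a)
    # n0 adjusts for unequal numbers of measurements per female.
    n0 = (n_total - sum(len(g) ** 2 for g in data) / n_total) / (a - 1)
    var_among = (ms_among - ms_within) / n0
    return var_among / (var_among + ms_within)


def permutation_p_value(groups, n_perm=1000, seed=1):
    """Shuffle measurements among females, keeping the number per female fixed."""
    rng = random.Random(seed)
    observed = anova_repeatability(groups)
    ids = list(groups)
    sizes = [len(groups[i]) for i in ids]
    pooled = [x for i in ids for x in groups[i]]
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, pos = {}, 0
        for i, size in zip(ids, sizes):
            shuffled[i] = pooled[pos:pos + size]
            pos += size
        if anova_repeatability(shuffled) >= observed - 1e-12:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical on bout lengths (min) for three females, three nests each.
on_bouts = {"A": [40.0, 42.0, 41.0], "B": [50.0, 52.0, 51.0], "C": [60.0, 61.0, 62.0]}
r = anova_repeatability(on_bouts)
p = permutation_p_value(on_bouts)
```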

We next asked whether there was any evidence that variation in incubation behavior was associated with differences in offspring survival or phenotype. For this analysis, we fit linear mixed models (LMMs) or generalized linear mixed models (GLMMs) that considered the response variables of incubation period (days from initiation of incubation to hatching; LMM), complete brood loss due to predation during incubation (binomial GLMM), hatching success for each egg laid (excluding nests that failed due to predation; binomial GLMM), and nestling mass and wing length on day 5 (LMMs). For each response variable, we fit a single model with the predictors of incubation constancy, total incubation cycle length (on + off bout), and onset of daily activity. We included only cycle length to describe the timing of incubation activity because on bout, off bout, and total cycle lengths were highly correlated. Total cycle length was also correlated with incubation constancy, but the correlation was much weaker (see “Results”). Each model included female identity as a random effect and models fit at the individual nestling level (hatching, wing, and mass) also included nest identity as a random effect. Significance of each predictor variable was assessed using likelihood ratio tests comparing the full and reduced models.
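The likelihood ratio test used to assess each predictor can be sketched as follows for the case where the reduced model drops exactly one parameter (one degree of freedom), which is an assumption of this example. With one degree of freedom, the chi-square tail probability has a closed form in terms of the complementary error function, so no statistics library is needed. The log-likelihood values are hypothetical.

```python
import math


def lrt_p_value_df1(loglik_full, loglik_reduced):
    """Likelihood ratio test for nested models that differ by one parameter."""
    statistic = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    # For a chi-square variable with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical log-likelihoods from a full model and a model without one predictor.
statistic, p_value = lrt_p_value_df1(loglik_full=-100.0, loglik_reduced=-101.9205)
```

For mixed models, both fits would normally use maximum likelihood rather than restricted maximum likelihood when the fixed effects differ.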

We used a similar approach to determine whether female condition (residuals of a mass on tarsus regression) or wing length predicted incubation characteristics. For this analysis, we limited the dataset to nests at which females were captured and sampled during incubation. Because females nested multiple times per season but were only captured once, the sample size for this analysis was much smaller. For each predictor (condition and wing length), we fit three simple models with incubation constancy, cycle length, or onset of daily activity as the response variable and with female identity included as a random effect. Significance was assessed with likelihood ratio tests.
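The body condition index described above (residuals from a regression of mass on tarsus length) can be computed with ordinary least squares in a few lines. The sketch assumes a simple linear regression with no transformation, which the text does not specify. The measurements are hypothetical.

```python
def condition_residuals(tarsus_mm, mass_g):
    """Residuals from an ordinary least squares regression of mass on tarsus length."""
    n = len(tarsus_mm)
    mean_x = sum(tarsus_mm) / n
    mean_y = sum(mass_g) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(tarsus_mm, mass_g))
    s_xx = sum((x - mean_x) ** 2 for x in tarsus_mm)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    # Positive residual: heavier than expected for structural size.
    return [y - (intercept + slope * x) for x, y in zip(tarsus_mm, mass_g)]


# Hypothetical females: tarsus length (mm) and mass (g).
condition = condition_residuals([19.0, 20.0, 21.0], [9.0, 10.5, 11.0])
```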

Finally, we asked whether female ornamentation predicted incubation behavior. For these analyses, we fit three LMMs with incubation constancy, cycle length, or onset of daily activity as the response variable. Predictors included bib coloration (UV saturation, yellow brightness, and carotenoid chroma), bib size, and female experience class (inexperienced vs. experienced). Each model included female identity nested within year as a random effect and significance was assessed with likelihood ratio tests comparing the full and reduced models.

In all the mixed models described above, continuous predictors were standardized to a mean of zero and standard deviation of one so that effect sizes in tables are directly comparable. All analyses and figures were produced in R version 3.3.3 (R Development Core Team 2017). Sample sizes vary where incomplete data forced the exclusion of some records. It was not possible to record data blind because our study involved focal animals in the field.
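Standardizing a continuous predictor to a mean of zero and a standard deviation of one amounts to the following. The sketch uses the sample standard deviation (n − 1 denominator), which is an assumption about the authors' procedure. The bib area values are hypothetical.

```python
import statistics


def standardize(values):
    """Center to mean zero and scale to unit sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    return [(v - mean) / sd for v in values]


# Hypothetical bib areas (mm^2) for three females.
bib_area_z = standardize([100.0, 120.0, 140.0])
```

After this step, a model coefficient describes the expected change in the response per one standard deviation change in the predictor, which is what makes effect sizes comparable across predictors measured in different units.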

Results

We collected incubation data from a total of 34 unique females and 71 nests. Most females produced multiple nests during the study (combining the 2 years, one nest n = 15; two nests n = 9; three nests n = 5; four nests n = 3; five nests n = 1; six nests n = 1). From these nests, we scored a total of 6054 on or off bouts across 314 days of incubation (92 ± 58.2 SD bouts per nest). Overall, the average incubation cycle lasted 59.8 ± 12.1 SD minutes (44.8 ± 10.3 min on and 15.0 ± 2.9 min off the nest), but there was considerable variation in cycle length across females (range of on + off bout length: 34.8 to 90.2 min). Across nests, average on bout length was positively associated with average off bout length (Fig. 1; Table 1; r = 0.65, P < 0.001), suggesting that although females employed different time allocation strategies, their behavior resulted in somewhat similar incubation constancy. Individuals also differed substantially in the timing of the onset of daily activity; on average, females first left the nest at 5:27 am EST (± 27.5 min SD).

Fig. 1 On bout length in relation to off bout length for each nest measured. Blue points are average lengths for each nest; gray lines indicate SD of bout lengths for on bouts (horizontal) and off bouts (vertical) for each nest

Table 1 Repeatability of female incubation behavior across bouts in a single nesting attempt and across nesting attempts in a season

Effect of environmental covariates

The length of on and off bouts and the timing of the onset of daily activity all varied significantly as external conditions changed. When comparing all possible environmental covariates as predictors of on bout length, the best-supported model included ambient temperature, year, time of day, and a quadratic effect for time of day (Table S1, wi for best model = 0.55). Females exhibited longer on and off bouts at higher ambient temperatures and in 2012 compared to 2011 (Fig. S1A; Table S2). On bout lengths also declined across the day, with a quadratic effect indicating longer on bouts near dawn and dusk and shorter on bouts at midday (Fig. S1C; Table S2).

For off bouts, the best-supported model included ambient temperature, date, and a quadratic effect for time of day (Table S1, wi for best model = 0.23). Three other very similar models received equivalent support (ΔAICc < 0.8); all of these top models included ambient temperature and date, but the inclusion of year or time of day was less certain (Table S1, combined wi of best four models = 0.79). Off bouts were longer at higher ambient temperatures and shorter later in the breeding season (Fig. S1B; Table S2). The quadratic effect of time of day indicated that off bouts were shortest near dawn and dusk and longest at midday (Fig. S1D; Table S2).

The best-supported model for the onset of daily activity included ambient temperature, year, date, and time of sunrise (Table S1, wi for best model = 0.99). Females began their daily activity later with increasing ambient temperature, at later dates in the season, and in 2012 compared to 2011, but became active earlier with earlier sunrise times (Fig. 2; Table S2).

Fig. 2 Onset of daily activity with respect to date and civil sunrise (orange line). Individual females are represented multiple times on different days

While environmental covariates were highly supported as predictors of all three incubation metrics, their explanatory power was small and the random effects of female and nest identity explained much more variation in incubation behavior (on bout length: pseudo-R² of best model = 0.38, intercept-only model = 0.33; off bout length: pseudo-R² of best model = 0.19, intercept-only model = 0.16; onset of daily activity: pseudo-R² of best model = 0.61, intercept-only model = 0.57). Corrected incubation behavioral metrics derived from these models (see “Methods”) were quite similar to raw estimates (correlation between raw and corrected values for on bout length: r = 0.94; off bout length: r = 0.95; constancy: r = 0.88; onset: r = 0.86).

Repeatability of incubation behavior within females

Within a single nesting attempt, the length of on bouts, length of off bouts, and time of the onset of daily activity were all significantly repeatable (Table 1, all P = 0.001). The onset of daily activity had the highest repeatability (0.47), while on bout length was intermediate (0.33) and off bout length was low (0.14). Across nests within a year, on bout length (0.49), total cycle length (0.45), and onset of daily activity (0.47) were all significantly repeatable (Table 1, all P < 0.02), but off bout length (0.08) was not repeatable. There was a non-significant trend for repeatability of incubation constancy (r = 0.31, P = 0.09). We did not calculate repeatability of females over multiple years because the sample size was very small (9 females) and because many of the attributes that we considered drivers of stable differences in incubation behavior (e.g., condition, ornamentation) also change within individuals across years. When we recalculated repeatabilities controlling for the covariates identified above, most estimates increased slightly, but the significance of estimates did not change for any comparisons (Table S3 shows adjusted repeatabilities controlling for environmental covariates).

Female and nestling phenotype in relation to incubation

None of the incubation measures that we scored predicted nest predation during incubation (Table S5, all P > 0.17). Unlike some prior studies in other species, we detected very little variation in incubation period. A total of 43 nests had both known start and end dates and survived until hatching. Of these, 34 had 12-day incubation periods, 3 had 11-day incubation periods, and 6 had incubation periods longer than 12 days. Given this lack of variation, it was unsurprising that incubation behavior did not predict incubation period (Table S5, P > 0.26), although the few nests that had longer incubation periods did tend to have lower incubation constancy (Fig. 3).

Fig. 3 Incubation constancy for nests with different incubation periods. Blue points show raw values for each individual nest. Box-and-whisker plots show the median, interquartile range, and up to 1.5*IQR for each group

In a binomial GLMM with the hatching fate of individual eggs as the response variable, both total incubation cycle length and the timing of the onset of daily activity were significant predictors of hatching success (Table 2, Fig. 4, P < 0.05). Eggs incubated in nests where females displayed shorter cycle lengths and an earlier onset of daily activity were significantly less likely to hatch. Nestling mass and wing length 5 days after hatching were not related to any incubation measures (Table 2, all P > 0.1).

Table 2 Incubation behavior as a predictor of hatching success, nestling mass, and wing length on day 5 after hatching. The response variable is the individual egg or nestling. Models include nest identity and female identity as random effects. Significance is assessed with likelihood ratio tests comparing full and reduced models

Fig. 4 Incubation cycle length (A) and onset of daily activity (B) for females from nests that did or did not hatch. In each panel, blue points show the raw data for each nest and box-and-whisker plots show the median, interquartile range, and up to 1.5*IQR for each group

Females in higher body condition and with longer wings had significantly higher incubation constancy, although the effect size was small (Fig. 5A, Table S6; body condition β = 0.01, 95% CI = 0.0001 to 0.026; P = 0.049; wing length β = 0.01, 95% CI = 0.001 to 0.026, P = 0.046). Cycle length and the onset of daily activity were not related to female body condition or wing length (Table S6; both P > 0.1).

Fig. 5 Relationships between female morphology and incubation. A Wing length is positively associated with incubation constancy. B Greater carotenoid chroma is positively associated with incubation constancy. One individual with very low carotenoid chroma is not shown in the plot; results are unchanged with this outlier included or excluded

Ornamentation as a predictor of incubation behavior

Females with greater bib carotenoid chroma had significantly higher incubation constancy (Table 3, Fig. 5B, P = 0.03). There was also a non-significant trend for females with greater carotenoid chroma to start daily activity earlier (Table 3, P = 0.07) and for females with larger bibs to start daily activity later (Table 3, P = 0.09). No other ornamentation metrics were related to incubation behavior (Table 3, all P > 0.1). Female breeding experience was not related to any incubation behaviors (Table 3, all P > 0.1).

Table 3 Female ornamentation as a predictor of incubation behavior. Models include female identity as a random effect

Discussion

Effective incubation behavior is critical for developing embryos, yet relatively few studies address the possibility that mate choice might be based on signals that provide information about a potential mate’s incubation phenotype. We found that female common yellowthroats display repeatable between-individual differences in incubation behavior and that these differences are stable across multiple nesting attempts, suggesting that incubation style is an intrinsic attribute of the phenotype at least within a breeding season. Indeed, female identity explained much more variation in incubation behavior than any combination of environmental covariates that we considered. Variation in incubation behavior was correlated with both hatching success and female body size, indicating that this variation has direct consequences for nestlings and that it may be costly or difficult to express the most successful incubation patterns. Incubation was also correlated with female plumage coloration, such that receivers evaluating plumage could gain information about the incubation phenotype of a potential mate. However, the specific aspects of incubation behavior that were related to ornamentation were not significantly repeatable and were not associated with measures of hatching success or offspring quality. Thus, while incubation behavior remains a plausible target for mate preferences, our results do not support the hypothesis that variation in ornamentation signals important aspects of incubation behavior in this species.

Because variation in incubation behaviors may contribute to nest predation (Conway and Martin 2000b) and nestling development (LaManna and Martin 2016), the expectation is that there will be strong and consistent selection for optimal incubation behavior. Indeed, inter-specific variation in incubation behavior is well explained by species differences in temperature, predation, and nesting substrate (Conway and Martin 2000b). All else being equal, species with higher predation risk have longer on- and off-bout lengths (Conway and Martin 2000b). In our population, predation rates are very high (~70%; Taff et al. 2014), yet we still observed remarkable variation between individuals in incubation behavior. In fact, the range of total cycle length that we observed (34.8 to 90.2 min) spans much of the range of inter-specific variation described by Conway and Martin (2000a, b). Thus, inter-specific explanations for the evolution of incubation differences appear to be insufficient to explain the existence and maintenance of such pronounced variation at an intra-specific level.

One possibility is that incubation behavior is highly plastic and context specific and that individual nesting attempts are made under different conditions. Under this scenario, females may differ relatively little in intrinsic incubation phenotype, but rather adjust their behavior based on current conditions. When conditions differ between territories or across dates, these local effects might generate the intra-specific variation that we observed. This explanation would seem to agree with manipulative studies that demonstrate different incubation behaviors after experimental manipulations of temperature, perceived predation, or food availability (Pearse et al. 2004; Ardia et al. 2009; LaManna and Martin 2016). Similarly, there is some evidence that females may adjust their incubation behavior based on the quality of their current mate (Gorman et al. 2005; Hasegawa et al. 2012; Hasegawa and Arai 2016). It is worth noting, however, that most manipulative studies compare averages across treatment groups and therefore cannot directly assess the repeatability of within-individual incubation patterns across nesting attempts. While behavioral changes in response to manipulated conditions may be reliably produced and biologically meaningful, it seems unlikely that these responses can explain the full range of variation that we observed. Like several previous studies (e.g., Conway and Martin 2000a; Fontaine and Martin 2006; Coe et al. 2015), we found evidence that incubation bout length and onset of daily activity were related to environmental conditions, such as ambient temperature, time of day, and date in the season. However, these predictors explained very little of the overall variation in incubation behavior.

An alternative, though not mutually exclusive, hypothesis is that females possess stable incubation phenotypes that result in similar incubation patterns across repeated nesting attempts. Under this scenario, stable differences in incubation likely result from some combination of genetic differences, current condition, or past experience. Our results are consistent with both environmental modulation of incubation and individual repeatability in incubation phenotype, but relatively more variation was explained by between-individual differences in incubation behavior. Regardless of the mechanism generating differences, we found that incubation behavior is consistent within a season and related to hatching success. It should be noted, however, that because our study was observational, we cannot at this point distinguish reductions in hatching success that are a direct consequence of incubation behavior from reductions caused by infertility. Nevertheless, the fact that there are consistent between-individual differences in incubation behavior suggests that there may be heritable variation that has fitness consequences. Thus, males that mate with females expressing favorable incubation phenotypes should attain higher long-term fitness, and signals that are related to incubation behavior could be targets of sexual selection. However, while female ornamentation was associated with incubation phenotype in common yellowthroats, it was not associated with the characteristics of incubation that were most consistent and that best predicted nestling quality.

A previous study in this population demonstrated that female coloration might act as a signal of female quality in some cases (Freeman-Gallant et al. 2014). That study found that female ornamentation is under both viability and fecundity selection, but the patterns of selection are contrasting and may limit the utility of the female bib as a target of sexual selection (Freeman-Gallant et al. 2014). In particular, bib carotenoid chroma—the metric that we found was correlated with incubation constancy—was associated with lower fecundity, increased nest predation, and lower survival (Freeman-Gallant et al. 2014). Thus, any benefits of mate choice based on incubation phenotype might be offset by the costs of negative viability and fecundity selection. Alternatively, the negative viability and fecundity selection observed in the earlier study might be a consequence of greater investment in offspring quality at the expense of quantity and future reproduction, but without data on offspring survival and recruitment it is impossible to distinguish between these possibilities. Taken together, the two studies suggest that female ornamentation is an unreliable signal in this species.

A full understanding of sexual signal evolution should explain ornamentation in both sexes, but female ornaments have been understudied, and the most common empirical approach used in studies of male ornaments (tallying the number of within-pair and extrapair young) is often less informative for females. Alternative approaches focused on offspring quality or survival and mutual mate choice have the potential to clarify when and how female ornamentation is selected as a sexual signal. While our study did not find clear support for the idea that signals might convey information about incubation behavior, we did find strong evidence that females vary widely and consistently in these behaviors with important consequences for nestling quality and survival. Future work should seek to identify different aspects of female investment in offspring quality that might be the targets of mate choice rather than focusing only on the number of offspring produced.