Introduction

Understanding natural selection and the selective processes that shape the evolution of phenotypic traits in living organisms hinges upon our ability to measure and aptly capture the adaptive influence of traits with respect to specific environments (Endler 1986). Studies of natural selection can be broadly separated into those that model the evolutionary process and those that attempt to measure the action of natural selection in populations in nature. Models of natural selection seek to reveal how the process works through mathematical expressions of change from one generation to the next (Wright 1931; Crow and Kimura 1970; Orr 2009). These models often assume that changes in trait frequencies are the result of natural selection, without the occurrence of other processes such as migration, mutation, drift, or epigenetic modifications of gene expression.

In contrast, evolutionary change in real populations may reflect a combination of these different processes. In addition, empirical studies of natural selection acting in real populations generally infer changes in trait frequencies from short-term measures of survival or reproduction. This is problematic for inferences about natural selection, since fitness advantages of particular trait values are often measured over a period shorter than a complete generation (Lande and Arnold 1983; Kruuk et al. 2002; Dobson et al. 2017). Such studies measure fitness as, for example, success at surviving a storm or at producing juvenile offspring, and then relate this fitness metric to trait values, rather than using success at producing offspring that themselves reproduce in the next generation (but see Boyce and Perrins 1987).

Fitness is the complex metric by which one typically measures the adaptive value of organismal traits in evolutionary ecology. Studies that refine the definition of fitness or seek alternative measures of fitness are legion (Fisher 1930; Hamilton 1964; Grafen 1982, 2015; Endler 1986; Lucas et al. 1996; Queller 1996; Oli 2003; Qvarnström et al. 2006; van de Pol et al. 2006; Hunt and Hodgson 2010; Dobson et al. 2012; Zhang et al. 2015; Scranton et al. 2016; Harris et al. 2017; Rubach et al. 2020; Levin and Grafen 2021). These studies usually propose some practical means of measuring the propensity of a given gene, group of genes, trait, or group of traits to spread through populations over generations (Sæther and Engen 2015). In the present paper, we use a trait-based definition of fitness: the change in frequencies of alternative forms of an organismal trait over time in a population (Dobson and Viblanc 2019). This definition has the advantage that trait variation integrates both genetic and non-genetic sources of variation (e.g., early environmental or maternal effects on trait expression). It is also practical, since it is phenotypic traits (e.g., individual mass, size, age) that are measured in field studies, with no assumption made about their underlying genetic or non-genetic architecture (e.g., the genetic variance–covariance matrix; Lande 1979; Arnold et al. 2008).

Regardless of the metric used, measuring fitness requires understanding how traits are transmitted through the population over time. In terms of evolution, it is the fitness of a trait form relative to others that counts (Ayala and Campbell 1974). For sexual organisms, such as many animals and plants, the transmission of traits is achieved through reproduction. Thus, most studies evaluate fitness by measuring the propensity of organisms to transmit traits through the number of offspring they produce. For iteroparous organisms, this includes successive reproductions, so that adult survival and longevity, as well as age at first reproduction and reproductive senescence, are also important components of fitness (Brommer et al. 2002; Oli et al. 2002; Bouwhuis et al. 2012). However, when measuring fitness, the most important issues are what ultimately influences changes in trait frequencies, and how our conclusions about trait evolution are shaped by the nature of our fitness measures.

If adaptive traits are those that spread in the population relative to other trait forms, then offspring that die before reproducing do not contribute to the fitness of a given trait. Yet, for studies in nature, our ability to monitor individuals over time is often limited: many animal species disperse at a young age (Greenwood 1980; Dobson 1982), and in many species age at maturity may be delayed (Cole 1954; Oli and Dobson 2003), so that the contribution of offspring to trait fitness through their own future reproduction is difficult to measure. Thus, as second-best proxies, studies of animals often rely on metrics such as clutch size, brood size, and litter size at birth or at offspring independence to quantify the number of offspring that can potentially contribute to the next generation. Yet, because many offspring die at a young age, it is unclear how trait fitness is subsequently influenced by further selection or changed by environmental stochasticity (Hadfield 2008). On the one hand, trait fitness may be influenced by genetic correlations and by selection on traits other than the one of interest. On the other hand, the further fitness is measured from the production of reproductive offspring, the more environmental variation may change the association between a phenotypic trait and realized fitness (the fitness of alternative traits when offspring begin reproduction).

The purpose of our study was to assess how the evaluation of fitness changes depending on the timing of measurement, using a 29-year long-term data set on male and female Columbian ground squirrels (Urocitellus columbianus). Specifically, we were interested in understanding (1) how individual fitness varied depending on whether offspring were counted at birth, at weaning, or at yearling age; and (2) whether the association between a phenotypic trait and fitness changed depending on the timing of the fitness measurement. To do this, we measured individual survival and reproduction, and estimated the number of gene copies that a breeding parent passes on to the next year by adding the parent’s survival from the current year to the next (1 for surviving, 0 for not) to half the number of offspring it produced. We then compared this measure to those of other individuals in the population to obtain an estimate of annual fitness (Qvarnström et al. 2006). We also estimated fitness from adult annual survival and reproduction separately; here the number of offspring was not halved (both annual fitness and these latter measures were used, for example, by Dobson et al. 2017). Offspring production may be measured at different time points, such as birth, weaning, or survival to a later age. We evaluated the importance of when offspring production is measured by examining the repeatability (the intraclass correlation coefficient “ICC,” used as a measure of consistency) of annual and reproductive fitness measures based on offspring counted at birth, weaning, and yearling age, and by investigating pairwise correlations among these measures.

We also evaluated how measuring fitness at different time points, and using different metrics, affects inferences about natural selection. For illustrative purposes, we focused on a single trait and examined the strength of selection on emergence date from hibernation, i.e., the date of spring emergence above ground. In ground squirrels, this occurs close to the termination of torpor (Williams et al. 2014). We did not know the genetic variance–covariance matrix for traits associated with emergence from hibernation, and thus could not differentiate selection among genetically correlated traits (Lande and Arnold 1983). However, adult female Columbian ground squirrels go through a single day of estrus and mating each year, about 3–5 days after emergence from hibernation (Murie and Harris 1982). The genetic correlation between emergence and estrus dates was 0.98 ± 0.01 SE (Lane et al. 2011), indicating that selection on one of these traits would inevitably affect the other. Since our sample sizes were greatest for emergence date from hibernation, we chose this trait for analyses. Columbian ground squirrels are hibernating rodents with a short and intense reproductive season lasting only a few months (Murie and Harris 1982; Dobson et al. 1992). The timing of emergence from hibernation is highly variable (Tamian et al. 2022), and previous studies have shown it to be both significantly heritable (h² = 0.22–0.34; Lane et al. 2011) and negatively associated with annual fitness when fitness is measured using offspring at emergence from their first hibernation, at about 1 year old (for adult females; Lane et al. 2012); these are prerequisite conditions for a response to selection. However, nothing is known about whether the strength or direction of selection gradients differs when fitness is measured at different time points, nor whether selection gradients on this trait exist for males. Thus, emergence date appeared to be a good candidate trait for investigating associations with estimates of fitness measured at different life stages.

Materials and methods

Study species and long-term population monitoring

Columbian ground squirrels were studied from 1992 to 2020 in the Sheep River Provincial Park, Alberta, Canada (50° 38′ 10.73″ N; 114° 39′ 56.52″ W; 1524 m; 2.3 ha). Individuals were fitted with permanent numbered ear tags (#1-Monel metal tag; National Band and Tag Company, Newport, KY) when weaned (or at first capture for immigrants). It was not possible to record data blind because our study involved focal animals in the field. In each year of the study, the entire population was trapped at spring emergence using 13 × 13 × 40 cm live traps (Tomahawk Live Trap, Hazelhurst, WI, USA) baited with a small amount of peanut butter. Each individual was then given a temporary, unique dorsal dye mark (Clairol® Hydrience black hair dye N°52 Black Pearl, Clairol Inc., New York, USA) for identification during field observations. We followed ground squirrels daily throughout the breeding season to assess breeding phenology and success. Females copulated with multiple males within 3–5 days following emergence from hibernation, typically during a single day of estrus (Murie and Harris 1982; Raveh et al. 2010). We determined female mating date through behavioral observations and by inspecting female genitalia (presence of a copulatory plug or plug material in abdominal fur, or sperm in vaginal smears; Murie and Harris 1982). Following mating, we identified females’ single-entrance nest burrows during gestation by observing females stocking them with dry grass (Murie et al. 1998), and marked the burrows with colored, flagged metal pins (1 m in length).

Females in the wild gave birth to an average of three (range one to seven) blind and hairless offspring in a specially constructed nest burrow, after about 24 days of gestation (Dobson and Murie 1987; Murie 1995). From 2000 to 2016, we caught pregnant females within 2–3 days of expected parturition, about 21–22 days after mating, and brought them to an on-site field laboratory (Hare and Murie 1992). Females were housed indoors in polycarbonate microvent rat cages (483 × 267 × 200 mm; Allentown Caging Equipment Company, Allentown, NJ), supplied with wood shavings and newspaper as nesting material. Females were provided with apple, lettuce, and horse feed (EQuisine sweet show horse ration; Unifeed, Okotoks, Alberta, Canada) ad libitum. After parturition, offspring were sexed and a small tissue biopsy was taken for paternity analyses (see below) by clipping a toenail bud, as previously described by Hare and Murie (1992). We returned mothers and their offspring to their flagged nest burrows, usually within a day of birth.

After a lactation period of approximately 27–28 days, offspring first emerged above ground around the time of weaning (Murie and Harris 1982). We trapped females and their entire litters on the day the young first emerged. Mothers were identified as the single lactating female associated with the natal burrow from which the young emerged. Juvenile ground squirrels hibernate within their colony of origin during their first winter, and juvenile males that emigrate typically do so towards the end of the following spring. Thus, in each year, we were able to recapture all yearling males and females that survived their first hibernation.

Paternity analyses

From 2001 to 2017, paternity was estimated following the methods of Raveh et al. (2010). Briefly, DNA was extracted from tissue biopsies of offspring, known mothers, and potential fathers using DNeasy Tissue extraction kits (Qiagen, Venlo, The Netherlands). We amplified 13 polymorphic microsatellite loci by polymerase chain reaction (PCR), using primer pairs previously developed for U. columbianus (GS12, GS14, GS17, GS20, GS22, GS25, and GS26; Stevens et al. 1997), Marmota marmota (BIBL18; Goossens et al. 1998), MS41 and MS53 (Hanslik and Kruckenhauser 2000), and Marmota caligata (2g4 and 2h6, Kyle et al. 2004; 2h4, GenBank accession no. GQ294553). We used PCR conditions and cycling parameters similar to those of Kyle et al. (2004), but with an annealing temperature of 54 °C. We tested for deviation from Hardy–Weinberg equilibrium (HWE) at each locus within cohorts, and for linkage disequilibrium between pairs of loci within cohorts, using exact tests.

Paternity assignment was done using CERVUS 3.0 (Marshall et al. 1998; Kalinowski et al. 2007). Paternity was assigned with 95–99% trio confidence (assumed dam–sire–offspring relationship). Analyses were conducted for each year separately. The input parameters for the simulation step of CERVUS were 10 000 cycles, 70 candidate fathers, 90% of the population sampled, and 1% genotyping error. Parental assignments were accepted when the offspring had no more than 2 mismatches with both parents.
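The acceptance rule above (no more than two mismatches across the assumed dam–sire–offspring trio) can be illustrated with a minimal sketch. It is not how CERVUS computes trio confidence, which relies on likelihood scores and simulation; the function names, the genotype encoding (pairs of allele sizes, `None` for a missing genotype), and the choice to skip loci with missing data are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical encoding: a diploid genotype is a pair of allele sizes; None marks a missing genotype.
Genotype = Optional[Tuple[int, int]]


def locus_mismatch(offspring: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the offspring genotype cannot be built from one maternal and one paternal allele."""
    if offspring is None or mother is None or father is None:
        return False  # loci with missing data are not counted as mismatches in this sketch
    a, b = offspring
    compatible = (a in mother and b in father) or (b in mother and a in father)
    return not compatible


def trio_mismatches(off: Sequence[Genotype], mom: Sequence[Genotype],
                    dad: Sequence[Genotype]) -> int:
    """Number of loci at which the offspring is incompatible with the dam-sire pair."""
    return sum(locus_mismatch(o, m, f) for o, m, f in zip(off, mom, dad))


def accept_trio(off, mom, dad, max_mismatches: int = 2) -> bool:
    """Hypothetical acceptance rule: no more than `max_mismatches` mismatching loci."""
    return trio_mismatches(off, mom, dad) <= max_mismatches


# Three-locus example (invented genotypes): only the second locus is incompatible,
# because allele 204 is carried by neither parent.
offspring = [(150, 152), (200, 204), (98, 98)]
mother = [(150, 156), (200, 200), (98, 100)]
father = [(152, 160), (210, 212), (98, 102)]
print(trio_mismatches(offspring, mother, father))
print(accept_trio(offspring, mother, father))
```

Counting loci rather than alleles keeps the rule close to the wording above; a real pipeline would also need to account for null alleles and genotyping error, which CERVUS handles through its simulation parameters.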

Annual survival, reproduction, and fitness

We calculated annual survival (Surv) as adult survival from the spring mating period to the time of emergence in the next spring (1 for surviving, 0 for not surviving), separately for males and females. We then quantified annual reproduction (R) as the number of offspring counted at birth, at weaning, and at yearling age, separately for males and females. All immature individuals (i.e., females not observed mating, or males with abdominal testes) were excluded from these calculations. However, all individuals considered mature (females observed mating and males with scrotal testes) were retained, even when their reproductive success was zero.

We calculated annual contributions to lifetime fitness, viz. annual fitness \((\lambda_{an})\), following the method of Qvarnström et al. (2006), as applied to ground squirrels by Lane et al. (2011, 2012) and Dobson et al. (2016, 2020):

$${\lambda }_{an}=Surv+ \frac{1}{2} R$$

For any given individual, R is halved because each offspring inherits only half of its genes from that parent.

The objective of our study was to determine whether fitness metrics were comparable when reproduction was measured at different time points. Thus, annual fitness was calculated based on R being measured as the number of offspring either produced at birth \((\lambda_{an\_b})\), at weaning \((\lambda_{an\_w})\), or surviving up to yearling age \((\lambda_{an\_y})\).
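As an illustration of the three stage-specific versions of annual fitness, the sketch below applies \(\lambda_{an} = Surv + \frac{1}{2}R\) to a single breeding record. The record structure, its field names, and the numbers in the example are hypothetical and do not represent the study's data format.

```python
from dataclasses import dataclass


@dataclass
class BreedingRecord:
    """One reproductive event of a mature adult (hypothetical field names)."""
    survived: int    # Surv: 1 if the adult survived to the next spring emergence, else 0
    n_born: int      # offspring counted at birth
    n_weaned: int    # offspring counted at weaning
    n_yearling: int  # offspring recaptured as yearlings


def annual_fitness(surv: int, r: float) -> float:
    """lambda_an = Surv + R / 2 (Qvarnström et al. 2006)."""
    return surv + 0.5 * r


def annual_fitness_by_stage(rec: BreedingRecord) -> dict:
    """Annual fitness with R measured at birth, at weaning, and at yearling age."""
    return {
        "lambda_an_b": annual_fitness(rec.survived, rec.n_born),
        "lambda_an_w": annual_fitness(rec.survived, rec.n_weaned),
        "lambda_an_y": annual_fitness(rec.survived, rec.n_yearling),
    }


# A surviving adult with 4 offspring at birth, 3 at weaning, and 1 yearling.
rec = BreedingRecord(survived=1, n_born=4, n_weaned=3, n_yearling=1)
print(annual_fitness_by_stage(rec))  # {'lambda_an_b': 3.0, 'lambda_an_w': 2.5, 'lambda_an_y': 1.5}
```

Because Surv enters all three versions identically, any difference among \(\lambda_{an\_b}\), \(\lambda_{an\_w}\), and \(\lambda_{an\_y}\) for an individual comes entirely from offspring lost between stages.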

Selection analyses

We tested for directional, stabilizing, or disruptive selection on emergence date using linear and quadratic regressions of annual fitness \(\lambda_{an}\) (Lande and Arnold 1983). We further ran selection analyses decomposing \(\lambda_{an}\) into its constituent elements: annual reproduction (fecundity selection) and annual survival (viability selection). These three fitness metrics were expressed relative to the population in a given year by dividing them by the annual population mean for annual fitness, reproduction, and survival, respectively, for males and females separately (Lande and Arnold 1983). Because selection operates among individuals, we first centered emergence dates within years by subtracting the population’s mean emergence date in a given year from individual emergence dates; the centered value thus expresses how many days earlier or later an individual emerged compared with others in the same year.
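The two transformations described above (dividing fitness by its annual mean and centering emergence date on the annual mean) can be sketched as follows. This is a minimal illustration that assumes the input holds a single sex, since analyses were run separately by sex; the dictionary keys and the example values are hypothetical, and skipping years in which every fitness value is zero is a design choice of the sketch.

```python
from collections import defaultdict
from statistics import mean


def relativize_and_center(records, fitness_key):
    """Add relative fitness ('w_rel') and year-centered emergence date ('z') to each record.

    Assumes `records` holds a single sex and that each record is a dict with hypothetical
    keys 'year', 'emergence' (day of year), and `fitness_key` (annual fitness,
    reproduction, or survival).
    """
    by_year = defaultdict(list)
    for rec in records:
        by_year[rec["year"]].append(rec)

    out = []
    for group in by_year.values():
        mean_fit = mean(r[fitness_key] for r in group)
        if mean_fit == 0:
            continue  # all fitness values are zero, so relative fitness is undefined; skipped here
        mean_day = mean(r["emergence"] for r in group)
        for r in group:
            new = dict(r)
            # Relative fitness: individual value divided by that year's mean.
            new["w_rel"] = r[fitness_key] / mean_fit
            # Centered trait: negative = emerged earlier than the year's mean, positive = later.
            new["z"] = r["emergence"] - mean_day
            out.append(new)
    return out


data = [
    {"year": 2001, "emergence": 100, "lam": 3.0},
    {"year": 2001, "emergence": 104, "lam": 1.0},
    {"year": 2002, "emergence": 110, "lam": 2.0},
    {"year": 2002, "emergence": 112, "lam": 2.0},
]
for r in relativize_and_center(data, "lam"):
    print(r["year"], r["z"], r["w_rel"])
```

With relative fitness averaging 1 in every year, a regression slope on the centered date measures how much fitness changes per day of earlier or later emergence, independent of between-year differences in mean fitness or mean phenology.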

We estimated (1) directional selection gradients (\(\beta\)) using univariate linear models, followed by (2) quadratic selection gradients (\(\gamma\)) from models that included both a linear and a quadratic term (Lande and Arnold 1983; Arnold and Wade 1984a). The general forms of these models are:

$$\omega = \alpha + \beta z + \varepsilon \tag{1}$$
$$\omega = \alpha + \beta' z + \frac{1}{2}\gamma z^{2} + \varepsilon \tag{2}$$

where \(\omega\) is the fitness measure considered (i.e., \(\lambda_{an}\), R, or Surv), \(\alpha\) is the intercept, z is the phenotypic trait (here, year-centered emergence date), and \(\varepsilon\) is an error term. Note that \(\gamma\) coefficients are reported following Lande and Arnold’s (1983) original formulation, in which the quadratic term is written as \(\frac{1}{2}\gamma z^{2}\), and therefore do not require doubling to be interpreted as stabilizing or disruptive selection gradients (Stinchcombe et al. 2008). We used linear mixed models (LMMs) that included individual ID as a random effect to account for repeated measures of the same individuals, and individual age as a random factor because of known fitness differences that occur with age (Broussard et al. 2003; Raveh et al. 2010). Statistical significance of the viability selection gradients was assessed with generalized linear mixed models (GLMMs) with a binomial error structure (Garant et al. 2007; Dobson et al. 2017). Directional selection is indicated by significant linear coefficients (β), with the sign of the coefficient indicating the direction of selection. Stabilizing or disruptive selection occurs when γ is significantly < 0 or > 0, respectively (Lande and Arnold 1983; McGraw and Caswell 1996). We ran the analyses separately for males and females because the variance of fitness metrics was far greater in males than in females (Jones et al. 2012).
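To show how \(\beta\), \(\beta'\), and \(\gamma\) relate to Eqs. (1) and (2), the sketch below fits both models by ordinary least squares. It deliberately omits the random effects for individual ID and age and the binomial error structure used for viability, so it is a simplified, fixed-effects-only illustration rather than the analysis reported here; all function names and the noise-free example data are hypothetical.

```python
def _solve(a, b):
    """Solve the small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(predictors, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return _solve(xtx, xty)


def selection_gradients(z, w):
    """Return (beta, beta_prime, gamma) for Eqs. (1) and (2); fixed effects only."""
    _, beta = ols([z], w)                                  # Eq. (1): linear model
    _, beta_prime, c = ols([z, [zi * zi for zi in z]], w)  # Eq. (2): linear + quadratic
    # Eq. (2) writes the quadratic term as (1/2) * gamma * z^2, so gamma = 2 * c,
    # where c is the ordinary regression coefficient on z^2.
    return beta, beta_prime, 2.0 * c


# Synthetic, noise-free example (not study data): w = 1 - 0.02 z - 0.005 z^2, i.e. gamma = -0.01.
z = [-3, -2, -1, 0, 1, 2, 3]
w = [1 - 0.02 * zi - 0.005 * zi ** 2 for zi in z]
beta, beta_prime, gamma = selection_gradients(z, w)
print(round(beta, 6), round(beta_prime, 6), round(gamma, 6))
```

The factor of 2 is where the doubling issue discussed by Stinchcombe et al. (2008) arises: software that fits a plain z² term returns c, not \(\gamma\), so the conversion has to be made explicitly unless the model is parameterized as in Eq. (2).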

Partitioning selection into additive episodes

Following Arnold and Wade (1984a, b), we partitioned selection on emergence date into additive episodes of selection by examining how selection changed when reproduction was measured at birth, at weaning, and when offspring reached yearling age. To do so, we separated reproduction into three distinct biological episodes: production of offspring at birth, offspring survival from birth to weaning, and offspring survival from weaning to yearling age. To better understand how each of these episodes contributed to total selection, we estimated a selection differential for each of them. A selection differential represents the change in the mean value of a phenotypic character (here emergence date) produced by selection. Because reproductive success at yearling age can be decomposed into the product of the number of offspring born and the proportions of offspring surviving the two following episodes, the selection differentials for the individual episodes sum to the total selection differential. Selection differentials are thus presented for each episode both as absolute values (shift in mean emergence date, in days) and as a percentage contribution to total selection.
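The additive partition can be sketched by computing the trait mean after each episode, weighting each adult by its offspring count at that stage; each episode's differential is the shift from the previous mean, so the three differentials telescope to the total. This is a minimal illustration of the Arnold and Wade logic under the assumption that the counts are consistent (no adult has more offspring at a later stage than an earlier one); the function name and example values are hypothetical.

```python
def episode_differentials(z, n_born, n_weaned, n_yearling):
    """Selection differentials for three successive episodes (after Arnold and Wade 1984a, b).

    z: trait values of breeding adults (e.g., year-centered emergence date, in days).
    n_*: each adult's offspring counts at birth, weaning, and yearling age.
    Raises ZeroDivisionError if no offspring remain at some stage.
    """
    def weighted_mean(weights):
        return sum(zi * wi for zi, wi in zip(z, weights)) / sum(weights)

    means = [
        sum(z) / len(z),            # before selection: all breeding adults
        weighted_mean(n_born),      # after episode 1: production of offspring at birth
        weighted_mean(n_weaned),    # after episode 2: offspring survival, birth to weaning
        weighted_mean(n_yearling),  # after episode 3: offspring survival, weaning to yearling
    ]
    episodes = [means[i + 1] - means[i] for i in range(3)]
    total = means[-1] - means[0]
    # Percent contributions are relative to the signed total, so an episode that
    # opposes the overall shift receives a negative percentage.
    percent = [100.0 * s / total for s in episodes] if total != 0 else None
    return episodes, total, percent


# Hypothetical example: earlier-emerging adults produce and retain more offspring.
z = [-2, 0, 2]
episodes, total, percent = episode_differentials(z, [4, 3, 2], [3, 3, 1], [2, 1, 0])
print([round(s, 3) for s in episodes], round(total, 3), [round(p, 1) for p in percent])
```

Weighting by cumulative counts is equivalent to taking the covariance between the trait and relative fitness within each episode among the offspring that survived the previous ones, which is why the differentials add up exactly.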

Statistics

All analyses were done in R v. 4.0.2 (R Core Team 2020). The 95% confidence intervals for male and female survival were obtained by parametric bootstrapping (10,000 simulations, 50% of the individuals resampled each time). The consistency of fitness metrics (repeatability, or intraclass correlation coefficient, ICC) was calculated at different stages using the “rptR” package in R (Stoffel et al. 2017) as \(\mathrm{ICC} = \frac{V_G}{V_P} = \frac{V_G}{V_G + V_R}\), where \(V_G\) is the among-individual variance, \(V_R\) is the within-individual (or residual) variance, and \(V_P\) is the total variance in fitness. The ICC for annual fecundity (number of offspring) was estimated using a generalized linear mixed model with a Poisson distribution (Stoffel et al. 2017), as appropriate for count data (we added 1 to offspring numbers to avoid zero values). Correlations between fitness metrics measured at different time points were assessed with Spearman’s rank correlation tests, because not all distributions were Gaussian. Because data at birth and paternity analyses were not available in all years, sample sizes vary; in the Results, n denotes the number of reproductive events, N1 the number of individuals, and N2 the number of years.
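To make the variance decomposition \(\mathrm{ICC} = V_G / (V_G + V_R)\) concrete, the sketch below uses the classical one-way ANOVA estimator of repeatability for a Gaussian trait, allowing unequal numbers of measurements per individual. This is a swapped-in, simpler technique: rptR fits mixed-effects models and provides bootstrap confidence intervals, and the sketch does not reproduce the Poisson model used for fecundity. Function names and example data are hypothetical.

```python
from collections import defaultdict


def icc_anova(ids, values):
    """Repeatability ICC = V_G / (V_G + V_R) from a one-way ANOVA (Gaussian data).

    ids: individual identity for each measurement; values: the fitness metric.
    Requires at least two individuals and more measurements than individuals.
    """
    groups = defaultdict(list)
    for i, v in zip(ids, values):
        groups[i].append(v)
    k = len(groups)
    n_total = len(values)
    grand = sum(values) / n_total
    ss_among = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (n_total - k)
    # n0: effective number of measurements per individual when group sizes differ.
    n0 = (n_total - sum(len(g) ** 2 for g in groups.values()) / n_total) / (k - 1)
    v_g = max((ms_among - ms_within) / n0, 0.0)  # among-individual variance, truncated at zero
    v_r = ms_within                              # within-individual (residual) variance
    return v_g / (v_g + v_r)


# Hypothetical example: three individuals measured twice, with large differences among them.
ids = ["a", "a", "b", "b", "c", "c"]
values = [1, 2, 5, 6, 9, 10]
print(round(icc_anova(ids, values), 3))
```

High values mean that individuals that score well in one year tend to score well in others; a value near zero, as for female fecundity, means most variation occurs within individuals across years.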

Results

Male and female annual survival

Sexually mature (scrotal) male survival from one breeding season to the next was on average 65% (CI95 = [59% – 71%], n = 209 reproductive events, N1 = 78, N2 = 21). In contrast, sexually mature females (i.e., those that estrus cycled and mated) had survival rates of 73% (CI95 = [70% – 76%], n = 732 reproductive events, N1 = 223, N2 = 28), on average.

Male and female annual fecundity

Adult male annual reproductive success varied from 0 to 29 offspring born (mean ± SD = 6.2 ± 5.7, n = 154 reproductive events, N1 = 61 fathers, N2 = 16 years), 0 to 26 offspring weaned (4.2 ± 4.8, n = 190, N1 = 73, N2 = 19), and 0 to 16 offspring surviving to yearling age (1.7 ± 2.6, n = 190, N1 = 73, N2 = 19). Adult female annual reproductive success was lower, and varied from 0 to 7 offspring born (3.0 ± 1.2, n = 454 reproductive events, N1 = 167 mothers, N2 = 21 years), 0 to 7 offspring weaned (2.1 ± 1.5, n = 759, N1 = 228, N2 = 29), and 0 to 5 offspring surviving to yearling age (0.9 ± 1.1, n = 732, N1 = 223, N2 = 28). Overall, offspring survival from birth to weaning was 75% (CI95 = [72% – 78%], N = 684) for male offspring and 77% (CI95 = [72% – 78%], N = 655) for female offspring; and from weaning to yearling age 43% (CI95 = [40% – 47%], N = 775) for male offspring and 42% (CI95 = [39% – 46%], N = 742) for female offspring. Annual fecundity was strongly consistent in males (ICC = 0.71, CI95 = [0.60 – 0.78]), but less so in females (ICC = 0.08, CI95 = [0.03 – 0.13]).

Male and female annual fitness

When calculated using all of the offspring born \((\lambda_{an\_b})\), weaned \((\lambda_{an\_w})\), or surviving to yearling age \((\lambda_{an\_y})\), annual fitness was strongly consistent in males (ICC = 0.67, CI95 = [0.60 – 0.73]), but less so in females (ICC = 0.49, CI95 = [0.45 – 0.54]). The correlation between annual fitness metrics generally weakened as the period between life stages increased, from 0.95 between birth and weaning to 0.73 between birth and yearling age in males (Fig. 1A) and from 0.74 to 0.65 in females (Fig. 1B).

Fig. 1

Pairwise correlation plots for (A) male and (B) female annual fitness metrics calculated from offspring counted at birth \((\lambda_{an\_b})\), weaning \((\lambda_{an\_w})\), or surviving to yearling age \((\lambda_{an\_y})\). The distribution of data is given on the diagonal. Spearman’s correlation coefficients are shown; ***P < 0.001

Selection on emergence date

Annual fitness (\(\lambda_{an}\)) selection

In males, we found directional selection for earlier relative emergence date when annual fitness was calculated based on offspring born (β = −0.024; Fig. 2A) and on offspring surviving to weaning (β = −0.018; Fig. 2C), but not when calculated based on offspring that survived to yearling age (β = −0.007; Fig. 2E) (Table 1). The strength of selection appeared to decrease across stages: \(|\beta_{\lambda_{an\_b}}| > |\beta_{\lambda_{an\_w}}| > |\beta_{\lambda_{an\_y}}|\). In females, we also found directional selection for earlier relative emergence date when annual fitness was calculated based on offspring born (β = −0.007; Fig. 2B) and on offspring surviving to yearling age (β = −0.013; Fig. 2F), but not when calculated based on offspring that survived to weaning (β = −0.004; Fig. 2D). Note that the directional selection coefficient for relative emergence date almost doubled in females between annual fitness calculated from offspring born and annual fitness calculated from offspring surviving to yearling age (−0.013/−0.007 = 1.86), but decreased in males (−0.007/−0.024 = 0.29).

Fig. 2

Selection on emergence date from regression of annual fitness \((\lambda_{an})\) on year-centered emergence dates in males (top panels) and females (bottom panels). Fitness was calculated with fecundity based on offspring counted at birth \((\lambda_{an\_b})\) (A, B), weaning \((\lambda_{an\_w})\) (C, D), or yearling age \((\lambda_{an\_y})\) (E, F). Significant regressions are indicated by black lines, and the gray ribbons represent 95% confidence intervals

Table 1 Linear (β) and non-linear (γ) selection gradients for viability selection (adult survival), fecundity selection (adult reproduction, number of offspring produced), and annual fitness selection (\({\lambda }_{an}\), see methods) on emergence date in Columbian ground squirrels. Coefficients estimated from LMMs are provided with their standard errors. Significant coefficients at P < 0.05 are indicated by an asterisk. Significance of coefficients was obtained with Gaussian error structure for fecundity and annual fitness, and binomial error structure for viability

In both adult males and adult females, we found no evidence of substantial disruptive or stabilizing selection on relative emergence date, regardless of whether annual fitness was based on offspring born, offspring surviving to weaning, or offspring surviving to yearling age (Table 1).

Viability selection

In males, we found directional viability selection for later emergence date (β = +0.022), but no evidence of non-linear viability selection on emergence date (Table 1). In females, there was no evidence of significant directional viability selection, but a suggestion of weak stabilizing viability selection on emergence date that approached significance (Table 1).

Fecundity selection

In males, we found directional selection for earlier emergence date when fecundity was calculated from the number of offspring born (β = −0.032, Fig. 3A), weaned (β = −0.033, Fig. 3C), and surviving to yearling age (β = −0.033, Fig. 3E) (Table 1). Selection gradients were of similar magnitude regardless of the period considered. We found no evidence of disruptive or stabilizing selection regardless of whether fecundity was calculated from the number of offspring born, weaned, or surviving to yearling age (Table 1).

Fig. 3

Fecundity selection on emergence date from regression of relative annual reproductive success on year-centered emergence dates in males (top panels) and females (bottom panels). Fecundity was calculated from offspring counted at birth (A, B), weaning (C, D), or surviving to yearling age (E, F). Significant regressions are indicated by black lines, and gray ribbons represent 95% confidence intervals

In females, there was mild directional selection for earlier relative emergence date when fecundity was calculated from the number of offspring born (β = −0.007, Fig. 3B), but not when calculated from offspring weaned (β = +0.002, Fig. 3D) or surviving to yearling age (β = −0.012, Fig. 3F) (Table 1). However, we found weak stabilizing selection on emergence date based on offspring surviving to yearling age that approached significance (\(\gamma\) = −0.005; Table 1, Fig. 3F).

Partitioning selection into additive episodes

In both males and females, selection on the number of offspring born provided a major selective advantage favoring an earlier emergence date (Table 2). For males, selection on the number of offspring born accounted for 93% of the total selection differential and shifted the mean emergence date by 2.5 days, whereas selection acting between birth and weaning and between weaning and yearling age was minor, accounting for only 5% and 2% of the total selection differential, respectively. For females, selection differentials were much smaller than in males (0.4 days vs. 2.6 days shift due to total selection). While selection on the number of offspring born, as well as from weaning to yearling age, also shifted emergence dates earlier (by 0.6 and 0.3 days, respectively), selection from birth to weaning counteracted this by shifting emergence dates later by 0.5 days.

Table 2 Episodes of fecundity selection (Arnold and Wade 1984a, b). The trait under selection was date of emergence from hibernation, an important phenological trait that is strongly associated with successful reproduction and fitness (Lane et al. 2012). Selection differentials are given in days

Discussion

Our results in Columbian ground squirrels show that, overall, measures of fitness are relatively well correlated whether fitness is measured from offspring counted at birth, at weaning, or at yearling age. Not surprisingly, the correlations between fitness measures generally waned over the progression of offspring life stages: overall, they were strongest between fitness measured from offspring numbers at birth and at weaning, and became weaker as the interval lengthened (from birth to yearling age). These results might have been expected, since offspring mortality was low between birth and weaning, and much higher between weaning and yearling age.

Columbian ground squirrel offspring are raised over a short lactation period of 27 days (Murie and Harris 1982), during which they are secluded in specialized burrow systems and protected by their mothers (Murie and Harris 1988). As expected, mortality is relatively low during this period (roughly 25%). After weaning, juveniles have only a few weeks of foraging to grow and gain sufficient body mass to survive their first hibernation (Dobson 1992; Dobson et al. 1992). Offspring in insufficient body condition are unlikely to survive to the next spring (Murie and Boag 1984; Dobson et al. 1986). Thus, not surprisingly, overwinter mortality of juveniles is high (> 50%) (Dobson and Murie 1987; Zammuto 1987; Neuhaus and Pelletier 2001), and the correlation between fitness metrics wanes rapidly as the interval between offspring life stages increases. Nonetheless, it is noteworthy that for annual fitness measures these correlations remain relatively high (> 0.60), so that the number of offspring weaned in a given year already appears to be a reasonable first approximation for estimating fitness. For instance, the correlation between annual fitness measured at weaning and at yearling age was 0.81 for adult males and 0.75 for adult females.

The sources of variation underlying the changing correlation between fitness measures from offspring birth to yearling age remain to be determined. They could include stochastic events that occur on an annual scale and accumulate over time, or other sources of biological variation known to affect early offspring survival, such as changes in the social environment (e.g., kin numbers; Viblanc et al. 2010; Barra et al. 2021). Selection on offspring before they begin to reproduce could also weaken or strengthen associations between fitness estimates (whether or not an evolutionary response to selection occurs). Nonetheless, it is noteworthy that, overall, the correlations between fitness measured from offspring counted at birth, weaning, or yearling age were all high (≥ 0.65) for annual fitness measures, suggesting that measuring fitness from offspring surviving to their first year is a good proxy for measuring actual fitness from offspring that are themselves able to pass on traits (as for adult females in Lane et al. 2012; Dobson et al. 2016).

How did measuring annual fitness at different time points affect our inferences on natural selection? To answer this question, we used the entire data set, calculating fitness from both male and female offspring, and regressed individual annual fitness on the date of emergence from hibernation, a phenotypic trait known to be heritable (Lane et al. 2011), responsive to environmental fluctuation (Dobson 1988), and negatively associated with fitness (Lane et al. 2012). We found little or no stabilizing or disruptive selection (no quadratic effects) on emergence date, only directional selection. In general, individuals emerging earlier from hibernation had higher estimated fitness, and this was especially true for males.
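A minimal sketch of the standard Lande and Arnold (1983) regression is given below. It regresses relative fitness (fitness divided by its mean) on the standardized trait to obtain the linear gradient β, and on the trait and its square to obtain the quadratic gradient γ, following the common convention of doubling the coefficient on the squared term. The function names and data are hypothetical, and the models actually fitted in this study (covariates, random effects, error structure) may differ.

```python
from statistics import mean, pstdev


def solve(a, b):
    """Solve the small linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns, y):
    """Ordinary least squares with an intercept; returns [intercept, b1, b2, ...]."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)


def lande_arnold(trait, fitness):
    """Linear (beta) and quadratic (gamma) selection gradients for a single trait."""
    wbar = mean(fitness)
    w = [f / wbar for f in fitness]                # relative fitness (mean 1)
    mu, sd = mean(trait), pstdev(trait)
    z = [(t - mu) / sd for t in trait]             # standardized trait
    beta = ols([z], w)[1]                          # linear-only model
    quad = ols([z, [zi * zi for zi in z]], w)      # model with z and z^2
    gamma = 2 * quad[2]                            # doubled z^2 coefficient
    return beta, gamma


# Hypothetical emergence dates (day of year) and annual fitness (offspring counts).
dates = [100, 102, 103, 105, 107, 108, 110, 113, 115, 118]
offspring = [6, 5, 6, 4, 4, 3, 3, 2, 2, 1]
beta, gamma = lande_arnold(dates, offspring)
print(f"beta = {beta:.3f}, gamma = {gamma:.3f}")
```

A negative β corresponds to the pattern described above (earlier emergence, higher fitness), and a γ close to zero to the absence of stabilizing or disruptive selection.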

In males, selection coefficients decreased in magnitude when fitness was based on offspring counted at birth, at weaning, and at yearling age, in that order. This likely occurred because of relatively strong mortality during the juvenile period, and supports the hypothesis that environmental stochasticity may dilute the association between a phenotypic trait and fitness as the time elapsed since the focal event (viz., emergence from hibernation) increases. This is important because, although the directional selection coefficients measured at different time points are consistent in sign (i.e., negative), the conclusions drawn from them would differ substantially. For instance, in males, significant linear (β) coefficients based on fitness calculated from the number of offspring counted at weaning would lead to the conclusion that directional selection (Lande and Arnold 1983) acts on emergence date, whereas non-significant coefficients based on fitness calculated from the number of offspring counted at yearling age would not (if relying on significance thresholds, but see below). Similarly, in females, relying on statistical significance, one might conclude that directional selection acts on emergence date when using offspring counted at birth, but not when using offspring counted at weaning or at yearling age, even though the selection gradients were strongest when using the number of offspring that reached yearling age (Table 1).
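Why significance-based conclusions can diverge while coefficients agree can be shown with a constructed example, sketched here under hypothetical assumptions and not using data from this study. The same trait is regressed on two fitness measures that share an identical underlying slope but differ in residual noise, as might happen if additional juvenile mortality adds variance to counts at yearling age. The noise pattern is chosen to be exactly uncorrelated with the trait, so both slopes are identical and only their standard errors differ.

```python
from math import sqrt
from statistics import mean


def slope_and_t(x, y):
    """Simple linear regression: slope, its standard error, and the t statistic."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se = sqrt(sse / (len(x) - 2) / sxx)
    return b, se, b / se


# Hypothetical emergence dates (days) for 20 individuals.
dates = list(range(1, 21))
# Noise pattern (+1, -1, -1, +1 repeated), exactly uncorrelated with dates.
noise = [1, -1, -1, 1] * 5

# Two hypothetical fitness measures with the same slope (-0.2 per day)
# but different residual noise.
fit_low_noise = [10 - 0.2 * d + 0.1 * e for d, e in zip(dates, noise)]
fit_high_noise = [10 - 0.2 * d + 3.0 * e for d, e in zip(dates, noise)]

for label, fit in (("low noise", fit_low_noise), ("high noise", fit_high_noise)):
    b, se, t = slope_and_t(dates, fit)
    # With 18 df, |t| must exceed about 2.10 for two-sided significance at 5%.
    print(f"{label:>10}: slope={b:+.3f}  SE={se:.3f}  t={t:+.2f}")
```

Both slopes equal -0.2, yet only the low-noise measure passes a conventional 5% threshold, which is the kind of divergence that a focus on effect sizes avoids.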

Decomposing annual fitness into its survival and reproductive components generally revealed consistent patterns. Interestingly, we found mild positive directional viability selection on emergence date for males (males that emerged later had better annual survival), but no viability selection for females. Positive viability selection in males might occur because males emerge from hibernation early to establish mating territories (Manno and Dobson 2008), but early emergence carries a survival cost (Turbill et al. 2011; Constant et al. 2020). For females, a weaker selective advantage of earlier emergence from hibernation was apparent, but it was significant only when the combined index of annual fitness was used. In general, for both sexes, fecundity selection on emergence date showed patterns broadly similar to those for annual fitness, except perhaps in females, for which mild stabilizing selection was found when offspring surviving to yearling age were used as the fecundity fitness estimate.
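Viability and fecundity selection can be contrasted by computing a linear gradient separately for each fitness component. The sketch below is a simplification under stated assumptions: it uses survival to the next year (0/1) as the viability component and offspring counts of all individuals as the fecundity component, whereas the study's combined index of annual fitness and its fecundity measures may be defined differently. The data and function name are hypothetical.

```python
from statistics import mean, pstdev


def linear_gradient(trait, fitness):
    """Univariate linear selection gradient (beta) on a standardized trait."""
    mu, sd = mean(trait), pstdev(trait)
    z = [(t - mu) / sd for t in trait]      # standardized trait (mean 0, variance 1)
    wbar = mean(fitness)
    w = [f / wbar for f in fitness]          # relative fitness (mean 1)
    # With z at unit variance, the OLS slope of w on z equals cov(z, w) = mean(z * w).
    return mean([zi * wi for zi, wi in zip(z, w)])


# Hypothetical males: emergence date, survival to next year (0/1), offspring weaned.
dates = [95, 98, 100, 103, 105, 108, 110, 112]
survived = [0, 1, 0, 1, 1, 1, 1, 1]
offspring = [7, 6, 5, 4, 3, 2, 1, 0]

viability = linear_gradient(dates, survived)   # survival component
fecundity = linear_gradient(dates, offspring)  # reproductive component
print(f"viability beta = {viability:+.3f}, fecundity beta = {fecundity:+.3f}")
```

In this toy example the two components pull in opposite directions (later emergence favored for survival, earlier emergence favored for reproduction), mirroring the trade-off suggested for males.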

When fecundity selection was separated into distinct episodes of selection (Arnold and Wade 1984a, b), we found that selection at birth accounted for the vast majority of the selection differential on emergence date in both males and females. For male emergence date, selection based on the number of young at birth was the strongest (by nearly 2.5 days, Table 2) and might have reflected sexual selection (primarily number of mates) and maternal effects during gestation. By contrast, selection during lactation, based on the number of offspring at weaning, was weak. This is consistent with males contributing virtually nothing to parental care in this species, and with male reproductive success being determined foremost by the number of females mated early in the season (CS et al., unpublished data). Finally, ecological influences on male emergence date may have been best reflected by the number of offspring that survived their first hibernation, but this influence was trivial. For females, selection on emergence date was relatively weak and only approached significance. Partitioning this pattern into episodes of selection yielded no important insights beyond a slight fitness advantage of emerging earlier from hibernation (by less than half a day).
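The episode partition of Arnold and Wade (1984a, b) can be sketched as the change in the fitness-weighted mean trait across successive stages: each episode's selection differential is the shift in mean emergence date (in days) produced by that episode, and the episode differentials sum to the total differential. This sketch assumes that fitness at each stage is simply the count of offspring alive at that stage; the stage labels, function names, and data are hypothetical.

```python
def weighted_mean(values, weights):
    """Mean of `values` weighted by `weights` (e.g., offspring counts)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def episode_differentials(trait, born, weaned, yearlings):
    """Selection differentials (trait units) for successive episodes of selection."""
    z_start = sum(trait) / len(trait)          # before selection
    z_birth = weighted_mean(trait, born)       # after selection up to birth
    z_wean = weighted_mean(trait, weaned)      # after lactation
    z_year = weighted_mean(trait, yearlings)   # after first hibernation
    return {
        "birth": z_birth - z_start,
        "lactation": z_wean - z_birth,
        "overwinter": z_year - z_wean,
        "total": z_year - z_start,
    }


# Hypothetical adults: emergence date (day of year) and offspring counts per stage.
dates = [100, 103, 105, 108, 110, 112]
born = [5, 4, 4, 3, 2, 0]
weaned = [4, 3, 3, 2, 2, 0]
yearlings = [2, 2, 1, 1, 0, 0]

for episode, s in episode_differentials(dates, born, weaned, yearlings).items():
    print(f"{episode:>10}: {s:+.2f} days")
```

Because each differential is a difference between consecutive weighted means, the episode values telescope to the total, so the share of selection attributable to each stage can be read directly in days.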

Taken together, our results from a 29-year study of Columbian ground squirrels show that annual fitness measures, although generally closely related, may lead to somewhat different conclusions about the strength (but not the direction) of selection acting on heritable phenotypic traits. We documented an overall dilution of the associations among fitness measures and, in most cases, a weakening association between fitness and emergence date as time passed, likely due to additional stochastic processes affecting offspring survival. Importantly, focusing on the statistical significance of associations between phenotypic traits and fitness led to contrasting conclusions depending on when fitness was measured. The values of the selection coefficients, however, were fairly consistent across fitness estimates. Stepping back from a traditional hypothesis-testing approach and focusing on the magnitude of effect sizes, as is increasingly recognized in evolutionary ecology (Yoccoz 1991; de Valpine 2014), likely provides more meaningful information on the strength and patterns of selection acting on phenotypic traits in living organisms.