1 Introduction

American foulbrood (AFB) is a disease of honeybees (Apis mellifera L.) caused by the spore-forming bacterium Paenibacillus larvae sensu Genersch et al. (2006). Beekeepers consider AFB to be the most damaging disease (Melathopoulos and Farney 2002; vanEngelsdorp and Otis 2000; Brødsgaard et al. 2001), and in North America, it has been routinely managed with prophylactic treatments of oxytetracycline. Resistance to the antibiotic, however, has evolved and spread widely (Alippi 1994; Miyagi et al. 2000; Evans 2003) and although alternative antibiotics have been identified, they are more persistent in honey (Thompson et al. 2007). Concern over residual antibiotics in food (Hwang et al. 2005) coupled with the link between agricultural antibiotics and resistance in human pathogens (Wegener 2003) has stimulated interest in managing AFB with reduced antibiotic use.

The search for alternative methods for managing AFB has renewed interest in breeding AFB-resistant traits into commercial bee populations. AFB resistance was successfully bred to high frequency in a government-maintained breeding population between 1935 and 1949 (reviewed by Rothenbuhler 1958 and Spivak and Gilliam 1998a) but has never been widely incorporated into commercial breeding populations since that time (Spivak and Gilliam 1998b). One reason for this may be that queen breeders, queen propagators and their honey-producing customers, until recently, viewed AFB resistance as only moderately important in their overall selection priorities (vanEngelsdorp and Otis 2000).

Hygienic behaviour, described 70 years ago (Woodrow and Holst 1942), is the most studied trait conferring AFB resistance. It is a behavioural character expressed by mid-aged adult worker bees. These bees selectively abort the development of larvae infected with P. larvae before the pathogen can form infective spores. Hygienic behaviour can dramatically reduce AFB infection. For example, bi-directional selection for the trait (Spivak and Reuter 2001a) resulted in significant divergence in AFB susceptibility in just four generations. Recent research has demonstrated that the trait is effective at controlling not only other brood diseases (reviewed by Spivak and Gilliam 1998b) but also the multiplication of Varroa destructor Anderson and Trueman (2000) (Spivak 1996; Spivak and Reuter 2001b; Ibrahim et al. 2007), the most damaging pest of honeybees. Coupled with the rise of oxytetracycline resistance in the Americas, these recent findings have renewed the interest of bee breeders in hygienic behaviour.

While there is little controversy over the ability of hygienic behaviour to reduce brood diseases and V. destructor infestations, it is uncertain whether the trait can be significantly increased in commercial populations without first increasing its frequency in a closed and elite breeding population (Harris and Newman 1994). Increases in the frequency of the trait, to date, have largely resulted from the selective breeding of small and closed experimental populations (Rothenbuhler 1964a; Gilliam et al. 1983; Spivak and Reuter 2001a, b) or, in one successful commercial example, successive transfer of stock from an experimental population to commercial breeders over several years (Spivak et al. 2009). A common practice among commercial queen breeders is to free-mate their selected queens to a population of largely unselected drones. Mass dam selection on its own is expected to result in a slow increase in the trait’s frequency as it appears to be only moderately heritable in breeding populations (Harbo and Harris 1999; Boecking et al. 2000). Hygienic behaviour has been characterised as a simple Mendelian recessive (Rothenbuhler 1964b), though more recent analyses suggest the trait has a multi-locus basis (Lapidge et al. 2002; Oxley et al. 2010). Selection for the trait, however, is further complicated by evidence suggesting that it is influenced by maternal effects in a Russian-derived population (Unger and Guzmán-Novoa 2009).

Palacio et al. (2000) suggested that hygienic behaviour establishes more rapidly in commercial populations than is predicted. They documented a dramatic increase in the frequency of hygienic behaviour in a commercial population of honeybees in Argentina. Although the queens were free-mated to an unknown mixture of selected and unselected drones, the proportion of colonies expressing hygienic behaviour increased from 43% to 74% within just four generations of maternal selection. Furthermore, selected colonies had half as many cases of AFB as unselected colonies. Similar increases in the frequency of hygienic behaviour among open-mated commercial colonies were also reported in 2005 by technical transfer specialists from the Province of Ontario Beekeepers’ Association (OBA) in Canada (OBA, personal communication). Unfortunately, these designs, which involve comparisons across generations in different years, confound heritable and environmental factors and, in turn, obscure the gains made from selection.

Queen breeders in the honey-producing Canadian prairies may be as well suited to increasing the frequency of hygienic behaviour within their operations as the Argentinean beekeepers studied by Palacio et al. (2000). These beekeepers produce large numbers of queens, openly mated within their operations, for their own use in establishing new nucleus colonies in the spring. The nucleus colonies grow to populations that are able to survive the winter and are then used for honey production and as the source of drones in the following year. Nucleus colonies mated the previous year may comprise up to 50% of the drone-producing colonies, as some beekeepers routinely kill all 2-year-old colonies to ensure only young and vigorous queens are present. Consequently, their populations experience relatively short generations, which favours selection. Furthermore, Canadian prairie queen breeders are able to exercise some control over their queens’ paternity by virtue of the large territory occupied by their colonies, the great number of colonies (>1,000) they operate and the relative scarcity of feral or other commercial colonies within their territory.

The objectives of our study were to: (a) determine whether the frequency of hygienic behaviour could be increased among four Canadian prairie beekeeping operations through maternal selection of queens used in the production of their nucleus colonies and (b) estimate the genetic parameters of hygienic behaviour in this population. This study also provided the opportunity to investigate the larger question of how effective maternal selection is, on its own, in increasing the frequency of traits of apicultural importance in commercial populations.

2 Materials and methods

The general scheme of the study was to select commercial honeybee populations for hygienic behaviour over four successive generations and to determine if selection increased the frequency of the trait. The effect of selection was determined by comparing the change in the frequency of the trait among generations and in the observed additive genetic and environmental variances.

2.1 Study population

Four commercial honeybee populations (D, M, S and W), each comprising between 1,300 and 3,000 colonies, underwent phenotypic selection for hygienic behaviour between 2001 and 2005 (Figure 1). Annual selection of breeders in these operations was followed by the production of nucleus colonies, which, by the following year, effectively replaced between 10% and 65% of the queens in these operations with selected progeny. These four populations were collectively referred to as Peace Select. A fifth population (Unselect) was included in the study beginning in 2003 (Figure 1). During the study, Unselect was not selected for hygienic behaviour and was used as an unselected benchmark to compare against the Peace Select populations.

Figure 1. Location of cooperating Peace River beekeeping operation extraction plants. Four beekeepers requeened replacement nucleus colonies with queens selected for hygienic behaviour (Beekeepers D, M, W and S) and one used unselected queens (Beekeeper Unselect). Each beekeeper’s colonies were located within 10–30 km of their extraction plants.

Selection for hygienic behaviour was initiated in May 2001 (Figure 2). Each of the four Peace Select beekeepers assembled a base population of between 10 and 27 colonies in a single apiary (Figure 2). These colonies were drawn from the general population on the basis that they exhibited superior apicultural phenotypes, such as large spring worker population, unbroken patterns of sealed brood, the absence of visible signs of disease and gentle temperament. These four base populations were then assayed for hygienic behaviour and the top two to six colonies mothered the subsequent generation of nucleus colonies. The breeder population for each successive generation of selection was drawn from a restricted subset of the previous generation of progeny, specifically, progeny of the previous generation’s two best breeders. Consequently, although progeny of each generation originated from two to six families, only progeny from the top two mothers could enter the next generation’s breeder population (Figure 2).
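The selection rule in the preceding paragraph can be sketched in a few lines of Python. This is a minimal illustration only: the function names, colony identifiers and scores are hypothetical and are not taken from the study's records.

```python
def select_dams(assay_scores, n_dams):
    """Return the IDs of the n_dams colonies with the highest removal scores.

    assay_scores maps a colony ID to its percentage of frozen brood removed.
    """
    ranked = sorted(assay_scores, key=assay_scores.get, reverse=True)
    return ranked[:n_dams]


def next_breeder_pool(progeny_dam, top_two):
    """Keep only progeny colonies whose dam is one of the top two breeders.

    progeny_dam maps a progeny colony ID to the ID of its dam.
    """
    return [colony for colony, dam in progeny_dam.items() if dam in top_two]


# Hypothetical base population: four colonies and their assay scores.
scores = {"A": 70.0, "B": 95.0, "C": 88.0, "D": 60.0}
dams = select_dams(scores, 3)          # between two and six dams per generation
top_two = dams[:2]                     # only their progeny may become breeders
progeny = {"p1": "B", "p2": "A", "p3": "C"}
print(dams, next_breeder_pool(progeny, top_two))
```

In this sketch all selected dams contribute nucleus colonies to the operation, but the pool of candidate breeders for the next round is restricted to daughters of the two best dams.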

Figure 2. Experimental design. The breeder dataset consisted of hygienic behaviour assay results measured on a subset of colonies, following successive generations of selection, from the larger population of each of four commercial beekeepers (D, M, W and S). The colonies with the best assay results were used exclusively as the dams for each subsequent generation. The half-sib dataset consisted of a randomly selected apiary within each beekeeper’s operation stocked with queen cells from all of the beekeepers in the study as well as cells from a beekeeper in the region not undergoing selection (Unselect), mated queens from a commercial offshore queen rearing supplier (Common) or a stock purebred for hygienic behaviour (Minnesota). The F2 half-sib apiaries did not include an Unselect apiary or the benchmark stocks (Unselect, Common and Minnesota), but these were included when testing the F3 and F4 generations.

In addition to the hundreds of selected progeny propagated annually within each Peace Select population, half-sib families from all four Peace Select populations were assembled at common apiary sites in June 2002, 2003 and 2004, the F2, F3 and F4 generations respectively, to enable estimates of quantitative genetic parameters using half-sib analysis (Figure 2). These apiaries consisted of 80 colonies established from ten queen cells from each of two families from each Peace Select beekeeper (10 cells × 2 families × 4 beekeepers = 80 colonies). The four half-sib test apiaries were located within the territory of each Peace Select beekeeper. Queens within apiaries, consequently, were mated to the same population of drones, whereas queens among apiaries were mated to different populations. Test apiaries were considered to be a random selection within each beekeeper’s operation and queen cells were randomly assigned among colonies within each apiary.

The design of the test apiaries was modified for the F3 and F4 generations and a fifth test apiary, located in the Unselect operation, was included. In addition, the following 36 (F3) or 52 (F4) benchmark colonies were added to the 80 colonies at each test yard: (1) 12 (F3) or 20 (F4) Unselect queen cells, (2) 12 (F3) or 20 (F4) mated Minnesota queens and (3) 12 (F3, F4) mated Common queens. Consequently, the total number of colonies at each of the five F3 and F4 test yards was 116 and 132, respectively. Some colonies did not survive until the time they were to be tested. The actual number tested appears in Figure 5.
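The planned colony counts per test yard follow directly from the design described above and can be checked with a short calculation. The sketch below uses only the numbers given in the text; the variable and function names are illustrative.

```python
# Peace Select colonies per test yard: 10 cells x 2 families x 4 beekeepers.
PEACE_SELECT = 10 * 2 * 4

# Benchmark colonies added to each test yard in the F3 and F4 generations.
BENCHMARKS = {
    "F3": {"Unselect": 12, "Minnesota": 12, "Common": 12},
    "F4": {"Unselect": 20, "Minnesota": 20, "Common": 12},
}


def planned_colonies(generation):
    """Return (benchmark colonies, total colonies) planned per test yard."""
    n_bench = sum(BENCHMARKS.get(generation, {}).values())
    return n_bench, PEACE_SELECT + n_bench


for gen in ("F2", "F3", "F4"):
    print(gen, planned_colonies(gen))
```

These are planned numbers only; as noted above, the number of colonies actually tested was lower because some colonies did not survive.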

The Unselect queen cells were reared from three (F3) or two (F4) different breeder queens that were selected for important apicultural traits, but not for hygienic behaviour. Two benchmark stocks used in the progeny test apiaries were from: (1) an offshore queen producer whose queens are widely used in Western Canada (Common) and (2) a closed population at the University of Minnesota (Spivak and Reuter 2001a, b) (Minnesota). Common queens were mothered by at least six different breeder queens and open mated within the offshore queen producer’s breeding populations. Minnesota queens were imported from M. Spivak’s lab in 2003. Each subsequent generation of Minnesota was selected and propagated by the authors. Selection involved rearing queens and drones from the six to eight progeny with the highest level of hygienic behaviour and crossing them freely at an isolated mating location or by instrumentally inseminating virgins with a mixture of semen. Semen mixing was achieved by inseminating each virgin with 6–8 μL of semen collected in a random sequence from the selected progeny colonies (Spivak and Reuter 2001a, b).

2.2 Environment

The study took place in the Peace River region, located in the province of Alberta in Western Canada (Figure 1). The region is on the north-western tip of the aspen parkland, which is a transitional biome between the arid prairies and the boreal forest, and is among the most productive honey-producing areas in the world. Nevertheless, there is considerable variation in colony productivity within this region among different years and among different locations in the same year (Szabo and Heikel 1987). Given that some of the beekeeping operations studied were greater than 100 km from one another, it can be assumed that there was considerable variation among locations within a given year.

Each new generation was established in June with a single Langstroth frame of brood covered with adult workers and a selected queen cell. Following the placement of the queen cells, beekeepers confirmed the maternity of the queens heading the colony by: (1) ensuring that the queen cell had emerged within a week of placement and all other queen cells were destroyed and (2) confirming that the colony had a laying queen within four weeks. Given the large size of the operations and logistical considerations of this study, the identity of the queen beyond this point was assumed, meaning that we could not discount the possibility of her replacement by supersedure and that her daughter might head the colony in the period following. We mitigated this possibility by including benchmark stocks in the study (described in Section 2.1). All colonies grew to populations capable of wintering and were then evaluated for hygienic behaviour the following May.

2.3 Phenotype measurements

Colonies were assayed for hygienic behaviour using the freeze-killed brood method as described by Spivak and Reuter (1998). The number of uncapped and removed pupae in each frozen brood patch was evaluated 24 and 48 h after freezing. Some colonies prior to 2003, however, were only evaluated at 48 h. Colonies were typically assayed twice within a period of three weeks, but would sometimes only be tested once if there was not a suitable patch to freeze on the second assay date, or if the colony had become queenless. In all years, assays were conducted in the month of May when colonies were actively foraging on willow (Salix spp.) and dandelion (Taraxacum officinale F.H. Wigg. ssp. officinale) flowers.

2.4 Statistical analysis

2.4.1 Phenotypic parameters

Two sets of data were used to test hypotheses and estimate additive genetic and environmental parameters, specifically: (1) the five generations (parental, F1, F2, F3 and F4) of breeder selection (Breed-Data) and (2) the three generations (F2, F3 and F4) of half-sib family tests (Sib-Data) (Figure 2). The Breed-Data consisted of two fixed effects: generation (GEN; four levels—results from the F1 were excluded because data were not comparable to other years) and bee breeder (BRD; four levels). The Sib-Data consisted of two fixed effects: GEN (three levels) and either BRD (F2 = four levels, F3 and F4 = seven levels) or breeder type (BRDTYP). For breeder type, all the Peace Select progeny were pooled into one level and each benchmark stock was considered a separate level (F2 = one level, F3 and F4 = four levels). The Sib-Data also included one random effect: apiary (APIARY; F2 = four levels, F3 and F4 = five levels), which was considered a random apiary within the 25–75 different apiaries within each beekeeping operation. In both datasets individual colonies were the experimental unit and the dependent variable was the percentage of frozen brood removed at 24 or 48 h (ra24%, ra48%) transformed by root arcsine (Zar 1999). Each variable (ra24% or ra48%) was calculated as the average percentage of two assay rounds, unless there was only one assay result.
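The construction of the dependent variable can be illustrated as follows. This is a minimal sketch, not the authors' SAS code: the function names are hypothetical, and it assumes that the assay rounds were averaged first and the mean percentage was then transformed, which is one reading of the description above.

```python
import math


def mean_removal(rounds):
    """Average percentage removal over the available assay rounds.

    rounds holds one value per assay round; None marks a missing round.
    """
    values = [r for r in rounds if r is not None]
    if not values:
        raise ValueError("colony has no assay result")
    return sum(values) / len(values)


def root_arcsine(percent):
    """Arcsine square-root transform of a percentage (0-100), in radians."""
    if not 0.0 <= percent <= 100.0:
        raise ValueError("percentage must lie between 0 and 100")
    return math.asin(math.sqrt(percent / 100.0))


# A colony assayed twice, and a colony with only one usable assay round.
ra_two_rounds = root_arcsine(mean_removal([90.0, 100.0]))
ra_one_round = root_arcsine(mean_removal([80.0, None]))
print(ra_two_rounds, ra_one_round)
```

The transform maps 0% to 0 and 100% to π/2 radians, stretching the scale near the extremes where percentage data tend to have compressed variance.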

The hypotheses that ra24% or ra48% did not differ among GEN, BRD or an interaction between the two were tested for the Breed-Data using PROC MIXED (Littell et al. 1996), and among GEN, BRDTYP or an interaction between these for the Sib-Data. Unlike the analysis of the Breed-Data, however, the Sib-Data model also included a random effect, APIARY, and its interaction with the main effects. Where there were significant fixed effects, means were separated using Tukey’s HSD test. Furthermore, where interactions between fixed effects were significant, simple main effects were also tested using the SLICE option of the LSMEANS statement in PROC MIXED.

We tested the hypothesis that the proportion of colonies from the breeding population expressing hygienic behaviour did not differ across GEN or across BRD by GEN using Pearson χ² contingency tests (SAS Institute 2001). Colonies were only considered positive for the trait if they removed an average of >95% of the frozen brood across both rounds of tests at either 24 or 48 h. Only colonies from the F2, F3 and F4 generations were used for the analysis, because colonies in these generations were largely evaluated across two consecutive rounds of assays. Where GEN or BRD by GEN was found to be significant, means were separated using a modified Tukey’s HSD test for proportional data (Elliott and Reisch 2006).
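The classification rule and the contingency test can be sketched as below. The code is a plain-Python illustration rather than the SAS procedure used in the study; the function names are hypothetical and the counts in the example table are invented solely to show the calculation.

```python
def is_hygienic(rounds, threshold=95.0):
    """True if mean removal across the assay rounds exceeds the threshold (%)."""
    return sum(rounds) / len(rounds) > threshold


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Invented counts: rows are two generations, columns are
# (hygienic, not hygienic) colonies.
example_table = [[10, 20], [20, 10]]
print(is_hygienic([96.0, 98.0]), pearson_chi_square(example_table))
```

Note that the threshold is strict: a colony averaging exactly 95% removal is not classed as hygienic under this rule.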

2.4.2 Genetic parameters

Genetic parameters were calculated from the hygienic behaviour results of the Peace Select half-sib daughters (F2, F3 and F4) and their dams (F1, F2 and F3). (Co)variance components were estimated using the DMU software package with an EM algorithm for the analysis of the multivariate mixed model (Madsen and Jensen 2006). The package estimated the covariances of random effects using a restricted maximum likelihood method.

The following additive model was used to estimate the (co)variance components:

$$ y = Xb + Za + e $$

where y is the vector of observations for the trait(s) in each analysis; X and Z are the incidence matrices relating the observations to the fixed and random effects, respectively; b is a vector that contains the fixed effects of GEN (1, 2, 3 and 4), APIARY and sire population; a is the vector of random additive genetic effects (colony); and e is the vector of random residual effects. The expectations and assumed variances are: E(y) = Xb; E(a) = E(e) = 0; V(a) = G; V(e) = R; cov(a, e′) = 0; and V(y) = ZGZ′ + R, where G is the direct product of the numerator relationship matrix (A) for colonies and the matrix of genetic variances and covariances (G = A ⊗ G₀), and R is the direct product of an identity matrix of order equal to the number of observations and the matrix of error variances and covariances (R = I ⊗ R₀). The numerator relationship matrix (A) was constructed on the assumption that the genotype of the queen (dam) was known and the sire unknown, precluding the necessity to modify the A matrix for the degree of relationship within a given population. The dataset consisted of only the Peace Select half-sib results (F2, F3 and F4) and their mothers (F1, F2 and F3), since data from the base population were not comparable. The genetic parameters of the base population were inferred using the relationship matrix. Heritability, correlations and estimated breeding values were estimated from the colony variance and the total variance. Repeatability was also calculated using a variance component from repeated-measures analyses of the two rounds of assay results (24 and 48 h).
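To make the dam-known, sire-unknown assumption concrete, the sketch below builds a numerator relationship matrix with the standard tabular method for a pedigree that records dams only. It is an illustration under stated assumptions, not the routine used by DMU or by the authors: it treats colonies as ordinary diploid individuals, assumes unknown sires are unrelated and non-inbred, and uses a hypothetical function name and pedigree.

```python
def relationship_matrix(dams):
    """Numerator relationship matrix A for a dam-only pedigree.

    dams[j] is the index of the dam of individual j, or None for a founder.
    Individuals must be ordered so that every dam precedes her offspring.
    """
    n = len(dams)
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = dams[j]
        if d is not None and not 0 <= d < j:
            raise ValueError("each dam must appear before her offspring")
        # With the sire unknown, no inbreeding term is added to the diagonal.
        A[j][j] = 1.0
        for i in range(j):
            # Relationship with an earlier individual passes through the dam only.
            value = 0.5 * A[i][d] if d is not None else 0.0
            A[i][j] = A[j][i] = value
    return A


# Hypothetical pedigree: 0 is a founder dam, 1 and 2 are her daughters
# (maternal half-sibs), and 3 is a daughter of 1.
A = relationship_matrix([None, 0, 0, 1])
for row in A:
    print(row)
```

Under these assumptions a dam and her daughter share a coefficient of 0.5 and maternal half-sibs share 0.25, which are the relationships a half-sib analysis relies on.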

3 Results

3.1 Phenotypic parameters

Breeder populations removed more freeze-killed brood in the years following selection (Figure 3; 24 h—F = 26.46; df = 3, 332; P < 0.01; 48 h—F = 19.85; df = 3, 347; P < 0.01) and, consequently, exhibited higher frequencies of hygienic behaviour (Figure 4; 24 h—χ² = 54.78; df = 2, 325; P < 0.01; 48 h—χ² = 78.37; df = 2, 326; P < 0.01). The mean proportion of frozen brood removed over 48 h, for example, increased from 68% in the base population to 86% following four generations of selection (Figure 3). Similarly, the proportion of breeder colonies considered hygienic at 48 h was only 6% in the F2 population, but rose to 44% in the F4 population (Figure 4).

Figure 3. Mean percentage removal of frozen brood after 24 and 48 h among colonies in the breeder populations for each Peace Select generation. Results from the F1 generation have been omitted because the data were not comparable (see text). Different Roman letters above each bar indicate significant differences among breeders for a given generation and assessment period (Tukey’s HSD, α = 0.05). Different Greek letters, in contrast, indicate significant differences among generations for each assessment period. Each succeeding generation of queens among each of the four Peace Select beekeepers was reared exclusively from the two colonies in their pool of putative breeders with the highest percentage removal.

Figure 4. Proportion of breeder colonies exhibiting hygienic behaviour (>95% removal) after 24 or 48 h on two consecutive assays. The first two generations of selection were omitted from the figure as they were tested with one assay, and thus, results were not comparable. Different Roman letters above each bar indicate significant differences among breeders for a given generation and assessment period (Tukey’s HSD, α = 0.05). Different Greek letters, in contrast, indicate significant differences among generations for each assessment period.

The GEN in which a colony was tested, however, was not the only factor that explained the variation in the proportion of brood removed among the breeder colonies. The breeding population (BRD; 24 h—F = 20.33; df = 3, 332; P < 0.01; 48 h—F = 8.39; df = 3, 347; P < 0.01) and the interaction between the breeding population and generation (GEN*BRD; 24 h—F = 3.16; df = 7, 332; P < 0.01; 48 h—F = 3.54; df = 8, 347; P < 0.01) also explained a significant amount of variation (Figure 3). Our analysis of simple main effects provides insight into the interaction between generation and breeding population as only in the base population were no differences among breeding populations detected (24 h—F = 0.54; df = 2, 332; P = 0.58; 48 h—F = 0.23; df = 2, 347; P = 0.88).

Unlike the breeding population, the generation in which half-sib colonies were tested did not influence the proportion of frozen brood removed at 24 (F = 3.73; df = 2, 6; P = 0.09) or 48 h (F = 2.18; df = 2, 7; P = 0.18). The source of the queens heading the half-sib colonies (BRDTYP), in contrast, was a significant explanatory variable for both the 24 (F = 17.16; df = 3, 11; P < 0.01) and 48 h tests (F = 7.39; df = 3, 11; P = 0.01). There were no GEN*BRDTYP interactions at 24 (F = 2.02; df = 3, 737; P = 0.11) or 48 h (F = 1.56; df = 3, 804; P = 0.20).

Although the Peace Select half-sib colonies collectively expressed a higher level of hygienic behaviour compared with the Common benchmark, their performance was similar to the Unselect benchmark (Figure 5). Furthermore, the Peace Select colonies did not exhibit the high levels of hygienic behaviour observed in the Minnesota benchmark stock. While there was some variation within the Peace Select colonies, most notably the families from Beekeeper M performing better than the families from Beekeeper W, only the stock from Beekeeper M outperformed the Unselect benchmark.

Figure 5. Mean percentage removal of frozen brood after 24 and 48 h among half-sib test colonies pooled across three generations (F2, F3 and F4). The number of colonies from each of the three generations appears on the bars in the following order (F2 + F3 + F4 = total in pool). Different Roman letters above each bar indicate significant differences among means at 24 or 48 h (Tukey’s HSD, α = 0.05). Different Greek letters, in contrast, above each benchmark and the pool of Peace Select beekeepers indicate significant differences among means at 24 or 48 h.

3.2 Genetic parameters

The percentage of frozen brood removed at 24 or 48 h was analysed untransformed, as estimates of the coefficients of skewness and kurtosis indicated that root arcsine transformation did not reduce the departures from normality.

The heritability of hygienic behaviour, expressed as the proportion of total variance attributable to differences in breeding values among the colonies, was moderate to low and only slightly higher at 48 h compared to 24 h (Table I). The level of heritability we observed suggests the hygienic behaviour assay was, at best, only somewhat reliable at predicting breeding values within the studied population. In contrast, the repeatability between the two rounds of testing ranged from moderate at 24 h (0.35 ± 0.13) to high at 48 h (0.48 ± 0.17). While repeatability is often taken as an upper bound on heritability, the higher values for repeatability suggest that permanent environmental effects play a role in the variation observed in hygienic behaviour.
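The relationship between heritability and repeatability described above can be written out with the textbook variance-ratio definitions. The sketch below is illustrative only: the function names are hypothetical, and the variance components are invented round numbers, not estimates from Table I.

```python
def heritability(var_additive, var_total):
    """Narrow-sense heritability: additive genetic variance over total variance."""
    return var_additive / var_total


def repeatability(var_additive, var_permanent_env, var_total):
    """Repeatability: all permanent between-colony variance over total variance."""
    return (var_additive + var_permanent_env) / var_total


# Invented variance components (arbitrary units) for illustration.
var_a = 20.0       # additive genetic (colony) variance
var_pe = 15.0      # permanent environmental variance
var_te = 65.0      # temporary environmental (residual) variance
var_p = var_a + var_pe + var_te

h2 = heritability(var_a, var_p)
rep = repeatability(var_a, var_pe, var_p)
print(h2, rep)
```

Because the numerator of repeatability contains the additive variance plus any permanent environmental variance, repeatability can never fall below heritability under this model, and the gap between the two reflects the size of the permanent environmental component.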

Table I Mean, standard deviation and estimated genetic parameters for the percentage of frozen brood removed at 24 and 48 h among the F2, F3 and F4 progeny colonies.

The mean estimated breeding value (EBV) for 24 and 48 h removal increased over generations (Table II), which suggests selection increased the overall expression of the traits. Nonetheless, while the EBV increased over 5-fold between the F2 and F4 generations for the 24 h scores, the average expected increase in removal by the progeny of the F4 generation was a modest 0.173%. The EBV for 48 h removal decreased in the F3 generation (−0.057%) but increased to 0.339% in the F4 generation.

Table II Estimated breeding values (EBV) of F2, F3 and F4 progeny colonies for the percentage of frozen brood removed at 24 and 48 h.

Correlations between the 24 h and 48 h assay results were high. Phenotypic correlation between observations at the two time periods was 0.90, whereas genetic correlation was 0.98 ± 0.22.

4 Discussion

Our study seemingly supports contradictory conclusions about the effectiveness of selecting for hygienic behaviour in open-mated commercial beekeeping operations. The analysis of the breeder populations suggests a significant increase in hygienic behaviour, particularly between the base population and F4 generation. In contrast, our analysis of the colonies in the half-sib family test apiaries suggests selection did not markedly improve the level of hygienic behaviour across the larger populations.

We suggest that the contradictions in our conclusions underscore a problem with year-to-year comparisons among breeding populations. While this kind of analysis has been used in the past to test the hypothesis that selection increases hygienic behaviour (Palacio et al. 2000), it is limited in that comparisons among successively selected generations are made across different years. In doing this, year is confounded with generation, making it impossible to partition year-to-year variation due to additive genetic effects from the year-to-year environmental variation. As an example, such environmental variation could include changes brought about in hygienic behaviour as related to the strength of nectar flows within a given year (Momont and Rothenbuhler 1971).

Ideally, bee breeders evaluate the effects of selection by comparing the performance of multiple generations in a common environment. Unfortunately, they do not have a reliable way to preserve honeybee germplasm for the term of a selection programme, unlike plant and other livestock breeders. To overcome this limitation honeybee breeders have assessed selection relative to an unselected benchmark population (Guzmán-Novoa and Page 1999) or a benchmark population bred in a negative direction (Spivak and Gilliam 1993; Spivak and Reuter 2001a). When the selected and benchmark populations diverge, the hypothesis that selection has no effect on a population is falsified. In our study, for example, we did not observe a significant divergence from our unselected benchmark and therefore conclude that selection did not significantly increase the frequency of hygienic behaviour.
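The difference between a year-to-year comparison and a comparison against a contemporaneous benchmark can be shown with a small numerical sketch. The function names and the mean removal values are hypothetical, chosen only to illustrate how a shared year effect can be mistaken for a selection gain.

```python
def naive_gain(selected_by_year):
    """Change in the selected population's mean between the first and last year."""
    years = sorted(selected_by_year)
    return selected_by_year[years[-1]] - selected_by_year[years[0]]


def difference_from_benchmark(selected_by_year, benchmark_by_year, year):
    """Selected minus benchmark mean, both measured in the same year."""
    return selected_by_year[year] - benchmark_by_year[year]


# Hypothetical mean percentage removal in two test years.
selected = {2003: 60.0, 2005: 80.0}
benchmark = {2003: 58.0, 2005: 77.0}

print(naive_gain(selected))                                   # 20.0
print(difference_from_benchmark(selected, benchmark, 2003))   # 2.0
print(difference_from_benchmark(selected, benchmark, 2005))   # 3.0
```

In this invented example the selected population improves by 20 percentage points across years, yet it exceeds the benchmark tested alongside it by only 2 to 3 points in either year, so most of the apparent gain would be attributable to the year rather than to selection.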

One reason for the slow response to selection in our study compared with other findings (cited below) may be the low heritability of the trait in our population. Whereas the repeatability (ρ = 0.35–0.42) of our assay was similar to that in other studies (ρ = 0.46 Boecking et al. 2000; ρ = 0.60 Lapidge et al. 2002), our estimates of heritability (h² = 0.17–0.25) lay at the lower end of those previously observed (h² = 0.65 Harbo and Harris 1999; h² = 0.36 Boecking et al. 2000; and h² = 0.45–0.63 Stanimirović et al. 2008). While each of these studies estimated repeatability and heritability differently, comparison with our results suggests that the variation in hygienic behaviour observed in our population is more influenced by environmental factors (Panasiuk et al. 2009). Regardless of the heritability, it was clear that 24 and 48 h assay results were highly correlated, not only phenotypically, but genetically. While this may not be surprising, as the two traits are temporally dependent on one another, it does suggest that selection based on either 24 or 48 h acts across the same set of loci.

We did observe variation in hygienic behaviour among the Peace Select stocks, with the progeny of one beekeeper (M) demonstrating higher levels of the trait than either commercial stock (Common) or unselected stock. One might expect that higher rates of requeening with selected stock would increase recruitment of selected drones to the breeding population each generation, increasing selection gains. It is unclear to what extent this was a factor in our study. While beekeeper M did report one of the higher rates of requeening in the study (40%), the beekeeper with the next highest level of hygienic behaviour, beekeeper S, had the lowest requeening rate (10%). In contrast, beekeeper D, with the highest rate of requeening (65%), did not have the highest levels of hygienic behaviour. It would be interesting to conduct a study with commercial producers whereby the rate of requeening could be tightly controlled to determine the effects on realised selection gains.

Another possibility for the variation in the trait observed among Peace Select beekeepers was the endemic level of hygienic behaviour within operations at the commencement of the study. While beekeeper M had a significant proportion of breeding colonies in the initial rounds of selection that removed >95% of the frozen brood, many of the other beekeepers had very few or none. One strategy that might have reduced this variation would have been to increase the pool of breeders from operations in which the trait is rarer to ensure that only colonies with a very high level of the trait were bred from.

The Peace Select population did not appear to differ from other European-derived populations with respect to hygienic behaviour. The initial level of hygienic behaviour we observed among the Peace Select beekeepers was equivalent to levels observed elsewhere. Early in a breeding programme conducted in Ontario, Canada, for example, colonies removed 57% of the frozen brood in 24 h (OBA, personal communication) compared with 54% in our study. In Argentina, where brood was pin killed instead of freeze killed, colonies initially removed 66% of their dead brood in 24 h (Palacio et al. 2000), although pin killing results in more rapid removal of brood than freeze killing (Spivak and Downey 1998). While some bee populations from Africa (Fries and Raina 2003; Kamel et al. 2003) and the Russian far-east (de Guzman et al. 2002) appear to have higher endemic levels of hygienic behaviour, the population we studied appeared similar to the majority of populations of European origin previously studied. Given the similarities, it is possible that other European-derived honeybee populations confront a similar challenge with low heritability.

Our findings appear to contradict those of Unger and Guzmán-Novoa (2009), who found evidence of strong maternal effects associated with hygienic behaviour. Such effects would suggest that dam selection might proceed more rapidly than we observed. It is possible, however, that such maternal effects are specific to the population they tested, which originated from the Primorsky region of Russia, and do not apply more generally to all European populations. A concerted survey for the presence of maternal effects among different selected populations would be needed to test this hypothesis.

Breeders have developed a number of strategies to overcome low heritability. Many of these strategies are predicated on the existence of a multi-tiered hierarchy that separates elite, or nucleus, breeding populations from the much larger production, or multiplier, populations (Harris and Newman 1994). The adoption of a closed, elite breeding population makes it feasible to conduct controlled mating, record pedigrees and conduct progeny testing, all of which facilitate the selection and establishment of traits with low heritability (Harris and Newman 1994). With respect to hygienic behaviour, this would be a departure from suggested efforts “to have many queen breeders that will select for the behaviour among their own lines of bees to maintain genetic variability within and among bee lines and to increase the behaviour in the general population of honeybees” (Spivak and Gilliam 1998b). To be clear, this does not mean that the decentralised strategy suggested by Spivak and Gilliam would be unable to realise improved levels of hygienic behaviour in some commercial populations. In fact, there is evidence of the recent successful implementation of this strategy in some US breeding populations (M. Spivak, personal communication). Nonetheless, in populations such as those we have studied, demonstrable increases in the trait could be accelerated by the implementation of elite breeding populations. Though more complex, breeding schemes in which selected genetics not only work their way down to production populations but superior families also flow up to the elite population would help maintain genetic variability and improve selection gains (Harris and Newman 1994).
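The value of controlled mating when heritability is low can be illustrated with the standard breeder's equation, R = h2 x (S_dam + S_sire) / 2, in which R is the expected response per generation and S is the selection differential applied to each parent. The following is a minimal sketch under stated assumptions, not an analysis of our data: it uses the simple diploid form of the equation, ignores haplodiploidy and the colony-level nature of the trait, and all numerical values and function names are hypothetical.

```python
def selection_response(h2, s_dam, s_sire=0.0):
    """Expected per-generation response R = h2 * (S_dam + S_sire) / 2.

    h2     : narrow-sense heritability (between 0 and 1)
    s_dam  : selection differential applied to dams (trait units)
    s_sire : selection differential applied to sires; 0.0 represents
             open mating in which drones are effectively unselected
    """
    if not 0.0 <= h2 <= 1.0:
        raise ValueError("heritability must lie between 0 and 1")
    return h2 * (s_dam + s_sire) / 2.0


# Hypothetical illustration only (these are not estimates from the study).
H2 = 0.2            # assumed low heritability
DIFFERENTIAL = 20.0  # assumed superiority of selected parents, in
                     # percentage points of frozen brood removed

open_mating = selection_response(H2, s_dam=DIFFERENTIAL)
controlled = selection_response(H2, s_dam=DIFFERENTIAL, s_sire=DIFFERENTIAL)

print(f"dam selection only:      {open_mating:.1f} points per generation")
print(f"dam and sire selection:  {controlled:.1f} points per generation")
```

Under these assumptions, applying the same selection differential to both parents doubles the expected response relative to dam selection alone, which is the kind of gain a closed elite population with controlled mating is intended to capture.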

A strategy that would allow honeybee breeders to overcome the low heritability without capitulating to a tiered breeding system is to use an assay that better predicts genotype. Not all assays for hygienic behaviour are equally heritable (Boecking et al. 2000). Our results, for example, demonstrated that the heritability of a 24-h assay was lower than that of a 48-h assay, suggesting that, at least for the base population, the 48-h assay offered more opportunity for genetic improvement. Furthermore, the heritability of other colony-level behavioural traits can vary considerably depending on environmental conditions, sometimes differing dramatically between different days of testing (Moritz et al. 1987). The freeze-killed brood assay might be improved by conducting the tests at a period when heritability is higher or by increasing the number of successive times the colony is assayed. In the latter case, however, our results suggest that two rounds of testing are adequate as these rounds were moderately repeatable. Another approach would be to replace the freeze-killed brood assay with an assay that detects the presence of genetic markers associated with the genes responsible for hygienic behaviour. At present, no markers have been identified that are unambiguously associated with the trait, although great strides have been made towards the identification of quantitative trait loci (QTL) associated with the genetic variation in hygienic behaviour. Recently, three QTLs were identified that could account for up to half the genetic variance in typical populations (Oxley et al. 2010). The search for these markers is likely to progress further, particularly in light of the characterization of new single-nucleotide polymorphisms (HBGSC 2006) and high-throughput identification of proteins (Chan et al. 2009).
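The diminishing return from additional rounds of testing follows from the standard quantitative-genetic expression for the heritability of the mean of n repeated records, h2_n = n x h2 / (1 + (n - 1) x r), where r is the repeatability of a single record. The sketch below illustrates this relationship; the heritability and repeatability values and the function name are hypothetical choices for illustration and are not the estimates obtained in this study.

```python
def heritability_of_mean(h2, repeatability, n):
    """Heritability of the mean of n repeated assays on the same colony.

    Uses h2_n = n * h2 / (1 + (n - 1) * r), where r is the repeatability
    of a single assay. Repeatability sets the upper limit for h2.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= h2 <= repeatability <= 1.0:
        raise ValueError("require 0 <= h2 <= repeatability <= 1")
    return n * h2 / (1.0 + (n - 1) * repeatability)


# Hypothetical values: low single-assay heritability, moderate repeatability.
H2_SINGLE = 0.2
REPEATABILITY = 0.5

for rounds in (1, 2, 3, 4):
    h2_n = heritability_of_mean(H2_SINGLE, REPEATABILITY, rounds)
    print(f"{rounds} round(s) of testing: heritability of mean = {h2_n:.3f}")
```

With these assumed values, the largest single improvement comes from the second round of testing, and each further round adds less; the value can never exceed h2 / r however many rounds are conducted.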

Given the low heritability of hygienic behaviour, it may be useful for researchers to search for other AFB-resistance traits with higher heritability. There are a number of other traits for AFB resistance whose heritability is unknown (Spivak and Gilliam 1998a). There is reason to suspect that at least one of these traits is heritable given the rapid success realised in breeding for AFB resistance in the USA between 1935 and 1949 (reviewed by Rothenbuhler 1958 and Spivak and Gilliam 1998a). The hypothesis that other AFB-resistance traits are as heritable as hygienic behaviour can be evaluated using an experimental design previously used to estimate the relative heritability of traits conferring resistance to V. destructor (Harbo and Harris 1999). In light of the challenges of breeding for hygienic behaviour, a re-examination of the heritability of other traits may make the propagation of AFB-resistant stock a more achievable prospect for bee breeders, particularly those who rely on open mating.