Introduction

Social evolution has repeatedly integrated individuals into larger functional units such as social insect colonies, with great evolutionary success (Maynard Smith and Szathmáry, 1995). Cooperative groups are better at manipulating and exploiting their environment, defending resources and brood, and allow for task specialization among group members (Oster and Wilson, 1978; Hölldobler and Wilson, 1990). However, the close proximity of group members to each other may also lead to increased competition for resources and an increased probability of disease transmission (Schmid-Hempel, 1998). Genetic diversity may therefore benefit cooperative groups by increasing group persistence and productivity through genotype-based task specialization (Oldroyd and Fewell, 2007) and enhancing disease resistance (Brown and Schmid-Hempel, 2003).

Social insect colonies are highly cooperative groups that usually consist of descendants of a single or a few reproductives. Their high genetic relatedness makes the colony members genetically similar and may thus decrease colony fitness. However, several mechanisms exist to increase the genetic diversity among colony members of social insects. All social insects investigated to date have an exceptionally high genetic recombination rate (Beye et al., 2006; Wilfert et al., 2007) and several taxa have evolved high levels of multiple mating (e.g. Fjerdingstad and Boomsma, 2000; Wattanachaiyingcharoen et al., 2003; Kronauer et al., 2007).

Several non-mutually exclusive hypotheses have been suggested to explain multiple mating in social insects. Most of them focus on the benefits of genetic diversity, including decreased intra-colonial conflict (H5 of Crozier and Page, 1985; Moritz, 1985; Ratnieks and Boomsma, 1995), reduced load from sterile male production (Page, 1980; H8 of Crozier and Page, 1985), enhanced division of labour (Crozier and Consul, 1976; H2 of Crozier and Page, 1985) and increased disease resistance (H2 of Crozier and Page, 1985; Sherman et al., 1988; Shykoff and Schmid-Hempel, 1991).

Among these hypotheses, enhanced disease resistance and enhanced division of labour have recently received the most attention. In both cases, multiple mating may be selected for because genetic diversity increases the performance of colonies or because genetic diversity within colonies reduces variation among colonies by reducing the influence of single patrilineal genotypes (Rueppell et al., 2008). A comparative simulation study (Brown and Schmid-Hempel, 2003) found resistance to parasites and diseases more plausible than the division of labour hypothesis, and empirical support for the importance of pathogens comes from honeybees (Seeley and Tarpy, 2007), bumblebees (Baer and Schmid-Hempel, 1999) and leaf-cutting ants (Hughes and Boomsma, 2004). However, the conditions for increased disease resistance via genetic diversity may be restrictive (van Baalen and Beekman, 2006) and other theoretical analyses have found support for the division of labour hypothesis (Waibel et al., 2006). In addition, empirical support for enhanced division of labour in genetically diverse colonies continues to accumulate (Fuchs and Schade, 1994; Jones et al., 2007; Mattila and Seeley, 2007; Mattila et al., 2008; Wiernasz et al., 2004, 2008; but see Fournier et al., 2008).

Many studies suggest that division of labour in social insects has a genetic basis (Robinson and Page, 1988, 1989; Fewell and Page, 1993; Oldroyd et al., 1994; Page and Fondrk, 1995; Page et al., 1995; Robinson and Page, 1995; Kryger et al., 2000; Jones et al., 2004, 2007; Chapman et al., 2007; Hughes and Boomsma, 2007; Jaffe et al., 2007). This genetic basis may consist of individual differences in response thresholds to task-related stimuli, leading to differential responsiveness and thus behavioural specialists (Page and Mitchell, 1998; Beshers and Fewell, 2001). Theoretically, more behavioural specialization can lead to more efficient task allocation and performance, increasing the effectiveness of division of labour and consequently enhancing colony performance. This argument only holds if response thresholds for different tasks vary among individuals independently of each other, which has been shown in studies of bumble bees (Weidenmuller, 2004) and honey bees (Goode et al., 2006).

Early models have shown that division of labour emerges from different response thresholds in simple group and task scenarios (Page and Mitchell, 1998). Later models incorporated more complex principles such as self-reinforcement (Beshers and Fewell, 2001), evaluated the effects of the genetic architecture of behavioural specialization (Bertram et al., 2003), and linked task performance to colony fitness and evolution (Waibel et al., 2006). Studies have traditionally focused on one task (Graham et al., 2006; Myerscough and Oldroyd, 2004; Bertram et al., 2003), two (Page and Mitchell, 1998), or a fixed set of five tasks (Waibel et al., 2006). However, more modelling efforts are needed to extend these scenarios. Task number is variable and may exceed 20 in the most complex insect societies (Wilson, 1976; Johnson, 2008a) and the maximum number of distinct matrilineal or patrilineal classes of workers matches that number in some species (Kronauer et al., 2007; Wattanachaiyingcharoen et al., 2003).

Furthermore, to our knowledge, no previous model has distinguished between two different types of task-associated variables in a typical social insect colony: variables that need to be maintained above a certain threshold, with little fitness cost of excess performance, such as nest defence (Breed et al., 1990), and variables that need to be maintained close to an optimum, such as temperature (Jones et al., 2004; Graham et al., 2006). Accordingly, honeybee workers only decrease and never actively increase the stimulus for nest defence but regulate brood nest temperature in both directions by either vibrating thoracic muscle to increase the temperature of the brood nest (Bastian and Esch, 1970) or by wing fanning and water evaporation to decrease the temperature (Chadwick, 1931; Lindauer, 1954). Another example of this dichotomy is pollen and nectar hoarding behaviour in honeybees: when given the opportunity, bees collect as much nectar as possible with no apparent detrimental effects of excess, while pollen hoarding is tightly regulated (Fewell and Winston, 1992) and workers can increase or decrease colony pollen levels by gathering or consuming pollen, respectively. We call the former maximum tasks, and the latter optimum tasks.

In this study, the effects of broad variations in task number and mating number on colony performance were simultaneously assessed by computer simulation. Two model versions were constructed, based on maximum tasks and optimum tasks, and each assessed with two scenarios of response thresholds. We did not attempt to simulate the specific biology of any social insect species but the models were conceived for a species with relatively large colony size and potentially high numbers of patrilines and tasks such as the honeybee.

Methods

Model description

The simulation models were constructed using C++ (compiled and executed under Windows) to systematically investigate the effect of varying numbers of tasks and patrilines on colony performance. Colonies of 14,000 workers were simulated, omitting the reproductives and brood as potential task performers. We focused on patriline genotype as the sole worker status variable that influenced a worker’s likelihood to perform a specific task. This simplification omits matrilineal genetic variation and environmental influences and represents a simple fixed-threshold model (Beshers and Fewell, 2001). Colony performance was used as a measure of how well a colony could regulate the actual value of a task-related colony variable with respect to its optimum value (optimum task model, Fig. 1a) or whether it kept the actual value above a threshold (maximum task model, Fig. 1b).

Fig. 1

Flow diagram of the simulation of the effectiveness of the division of labour in a theoretical insect colony made up of patrilines that differ in their response thresholds to a varying number of tasks. The first, optimum trait model (a) addresses colony traits that need to be maintained at an optimum value and for which workers perceive the deviation of the actual value from that optimum (o_i, optimal value of task-associated variable; a_i, actual value of task-associated variable; l_ij, response threshold). The second, maximum trait model (b) assumes that trait values only need to be maintained above a lower limit without penalty for higher values and that actual trait levels are directly perceived by workers (see text for further explanation)

Further assumptions were made: (1) All workers perform tasks at the same time and each task is completed in the same time step. This assumption was needed to eliminate bias due to the ordering of the patrilines in the simulation. (2) All workers from the same patriline react stereotypically given the same stimuli due to a genetic determination of worker response thresholds (Robinson and Page, 1988; Pankiw et al., 2002; Chapman et al., 2007). (3) Workers can accurately determine the actual and optimum level of all task-related variables when idle. While it may be unrealistic for all workers to have direct access to the current status of all task-related variables in a colony at all times, there is evidence that workers acquire colony-wide information (Johnson, 2008b) through patrolling and cell inspections (Lindauer, 1952; Kolmes, 1985; Dreller and Tarpy, 2000), trophallaxis (Camazine, 1993), pheromones (Pankiw et al., 1998; Pankiw and Page, 2001; Pankiw and Rubink, 2002; Pankiw, 2004a, b, 2007), and mechanical signals (Schneider et al., 1986; Schneider, 1987). (4) Each drone has the same probability to father a worker offspring, i.e. patrilines are randomly represented among the workers.
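Assumption (4) can be illustrated with a short sketch in which every worker draws its father at random. This is only one possible reading: the function name, the equal-probability draw per worker and the use of Python (rather than the C++ of the original models) are assumptions made here for illustration.

```python
import random
from collections import Counter

COLONY_SIZE = 14_000  # workers per colony, as stated in the model description


def sample_patriline_sizes(p, rng):
    """Assign each worker to one of p patrilines with equal probability.

    Hypothetical helper: returns the number of workers in each patriline.
    """
    counts = Counter(rng.randrange(p) for _ in range(COLONY_SIZE))
    return [counts.get(j, 0) for j in range(p)]


# Example: seven patrilines; sizes scatter around 14,000 / 7 = 2,000 workers.
sizes = sample_patriline_sizes(7, random.Random(1))
```

Under this reading, patriline sizes are not exactly equal but fluctuate around 14,000/p.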

Each colony, made up of a certain number of patrilines (j = 1,…,p), was given a certain number of tasks (i = 1,…,k) to perform in order to regulate colony variables such as temperature or food stores. The actual values of the colony variables (a_i for the ith task-associated variable at time step n) were allowed to vary in the range of values [0, 9,999] and were initially set to 0. The corresponding optimal values (o_i) were randomly generated in the range of values [3,750, 6,250]. Each variable had a corresponding response threshold which was randomly generated for each patriline (l_ij for task i and patriline j).
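The set-up described above can be sketched as follows. The text only states that values were "randomly generated" within the given ranges, so the uniform distribution, the use of floating-point values and all names in this snippet are assumptions, not the original implementation.

```python
import random

O_MIN, O_MAX = 3750, 6250  # range of the optimal values o_i


def init_colony(p, k, thr_range=(0, 6250), rng=None):
    """Return (actual, optimum, thresholds) for p patrilines and k tasks.

    thresholds[i][j] plays the role of l_ij (task i, patriline j).
    thr_range defaults to the wide optimum-task range; the wide
    maximum-task range would be (0, 9999).
    """
    rng = rng or random.Random()
    actual = [0.0] * k  # a_i, initially set to 0
    optimum = [rng.uniform(O_MIN, O_MAX) for _ in range(k)]
    thresholds = [[rng.uniform(*thr_range) for _ in range(p)] for _ in range(k)]
    return actual, optimum, thresholds


actual, optimum, thresholds = init_colony(p=3, k=4, rng=random.Random(0))
```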

In the optimum task model, the response threshold of a worker for a task was compared to the difference between the actual and optimum values of the task-associated variable (d_i = |a_i − o_i|) to determine a worker’s priority for this task (Fig. 1a). Therefore, the task was evaluated relative to the optimum value. The response thresholds in this model were randomly generated in the range of values [0, 6,250], which corresponded to the range of values for d_i. On the other hand, in the maximum task model, the priority of a task was directly determined by how much the actual value of the task-associated variable was below the absolute worker response threshold (Fig. 1b). The response thresholds in this model were randomly generated in the range [0, 9,999], which corresponded to the range of values for a_i. To evaluate whether having response thresholds closer to the optimum values for each task-associated variable would change the model predictions, the simulations were rerun with response thresholds in the narrower ranges of [0, 2,500] and [3,750, 6,250] for the optimum and maximum task model, respectively.

In both model versions, workers engaged in the task with the highest priority, provided that they were idle or had performed this task during the previous time step. Direct task switching was precluded to account for spatial task heterogeneity and other factors (e.g. physiological adjustment) that impede direct task switching (Beshers and Fewell, 2001; Tofts and Franks, 1992). Therefore, workers usually performed a selected task as long as they perceived a need for it (optimum task model: d_i > l_ij, maximum task model: a_i < l_ij) and became idle otherwise (Fig. 1). If a worker was idle, the actual task-associated values were unaffected. On the other hand, if a worker performed a task, the actual value for the task-related variable was either increased (maximum task model) or the difference between actual and optimal task-associated values, d_i, was decreased (optimum task model) by 1/100 unit (regardless of task number). The intermediate value of 1/100 was selected to simulate colonies that were effective but not over-efficient at regulating colony parameters. In contrast with the maximum task model, the ability of workers in the optimum task model to either increase or decrease the actual task-associated values leads to potentially conflicting behaviour among patrilines.
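The decision rule and the effect of work can be sketched for a single patriline as below. Several details are assumptions made for illustration: priority is taken as the amount by which the stimulus exceeds the threshold, a busy patriline keeps its task while the need persists and otherwise becomes idle, and all function names are hypothetical.

```python
IDLE = None
WORK_PER_WORKER = 0.01  # the 1/100 unit contributed by each working worker


def priority(model, a_i, o_i, l_ij):
    """Positive when a need is perceived (d_i > l_ij or a_i < l_ij)."""
    if model == "optimum":
        return abs(a_i - o_i) - l_ij
    return l_ij - a_i  # maximum task model


def next_task(model, current, actual, optimum, thresholds_j):
    """Task index for the next time step; thresholds_j[i] plays the role of l_ij."""
    prios = [priority(model, a, o, l)
             for a, o, l in zip(actual, optimum, thresholds_j)]
    if current is not IDLE:
        # No direct task switching: keep the task or fall idle.
        return current if prios[current] > 0 else IDLE
    best = max(range(len(prios)), key=prios.__getitem__)
    return best if prios[best] > 0 else IDLE


def apply_work(model, a_i, o_i, n_workers):
    """New actual value after n_workers act simultaneously on one task."""
    effort = WORK_PER_WORKER * n_workers
    if model == "optimum":
        direction = 1.0 if o_i >= a_i else -1.0
        return a_i + direction * effort  # may overshoot the optimum
    return a_i + effort
```

For example, 7,000 workers acting on a variable at 5,030 with an optimum of 5,000 move it to 4,960 in this sketch, i.e. past the optimum, which is the kind of over-regulation that only the optimum task model penalizes.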

After the adjustment of the actual task-associated values, the performance of a colony with p patrilines and k tasks at time step n, f(p,k)_n, was determined by averaging the negative of the deviations over all tasks. The negative of d_i was used in both models so that colonies with smaller deviations from the optimal value would have a higher colony performance value.

$$ f(p,k)_{n} = \frac{1}{k}\sum\limits_{i = 1}^{k} { - d_{i} } $$
(1)

In the optimum task model, d_i was the deviation of the actual value from the optimal value, i.e. d_i = |a_i − o_i|. In the maximum task model, d_i was defined the same as in the optimum model when o_i > a_i, and zero otherwise. Thus, a penalty for over-performing a given task by devoting too many workers to it is applied in the optimum task model. However, there is no penalty in the maximum task model, which is the second significant difference between these two models.
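As a worked illustration of Eq. 1 and the two definitions of the deviation, a minimal sketch (function names are hypothetical) could read:

```python
def deviation(model, a_i, o_i):
    """d_i: absolute deviation (optimum) or shortfall only (maximum)."""
    if model == "optimum":
        return abs(a_i - o_i)
    return max(o_i - a_i, 0.0)  # no penalty for exceeding o_i


def performance(model, actual, optimum):
    """Eq. 1: mean of the negative deviations over all k tasks."""
    k = len(actual)
    return sum(-deviation(model, a, o) for a, o in zip(actual, optimum)) / k


# Two tasks: one 1,000 units below and one 2,000 units above its optimum.
f_opt = performance("optimum", [4000.0, 7000.0], [5000.0, 5000.0])
f_max = performance("maximum", [4000.0, 7000.0], [5000.0, 5000.0])
```

Here the optimum task model yields −1,500, whereas the maximum task model ignores the overshoot of the second task and yields −500.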

At the end of each time step, each task level a_i was randomly perturbed by a value in the range [−250, 250] for the optimum task model and [−400, 100] for the maximum task model to simulate environmental influences. Following this, a new time step was initiated, except at the end of the simulation (Fig. 1).
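The perturbation step might be sketched as follows. The uniform draw and the clamping of a_i to the allowed range [0, 9,999] are assumptions (the text states the ranges but not how they were enforced), and the names are hypothetical.

```python
import random

A_MIN, A_MAX = 0.0, 9999.0  # allowed range of the actual values a_i
PERTURB = {"optimum": (-250, 250), "maximum": (-400, 100)}


def perturb(model, actual, rng):
    """Add an environmental perturbation to every a_i and clamp to the range."""
    lo, hi = PERTURB[model]
    return [min(A_MAX, max(A_MIN, a + rng.uniform(lo, hi))) for a in actual]


rng = random.Random(42)
after = perturb("maximum", [5000.0] * 100, rng)
```

With a uniform draw, the maximum-task perturbation has a mean of (−400 + 100)/2 = −150, so task levels tend to drift downward and workers must continually replenish them, whereas the optimum-task perturbation is symmetric around zero.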

Simulations

To obtain a realistic estimate of the performance of a particular colony type (with p = number of patrilines) and task scenario (k = number of tasks), the performance f(p,k)_n given by Eq. 1 was averaged over all N time steps in one simulation and over all S simulations that involved a unique colony type/task scenario (p, k) combination:

$$ f(p,k) = \frac{1}{SNk}\sum\limits_{s = 1}^{S} {\sum\limits_{n = 1}^{N} {\sum\limits_{i = 1}^{k} { - d_{i,n,s} } } } . $$
(2)

To empirically determine the number of time steps N required to obtain a reasonable estimate of performance, f(p,k), we calculated the range of time steps n which fulfilled

$$ \left| f(p,k)_{n} - \frac{1}{100{,}000}\sum\limits_{m = 1}^{100{,}000} f(p,k)_{m} \right| < \frac{1}{100}\left| \frac{1}{100{,}000}\sum\limits_{m = 1}^{100{,}000} f(p,k)_{m} \right|. $$
(3)

This allowed us to determine the range of time steps over which colony performance differed by less than 1% from the assumed true colony performance, which we took to be the performance after 100,000 time steps. The inequality held in an intermediate case (k = 7, p = 7) under a wide range of response thresholds (optimum task model: [0, 6,250]; maximum task model: [0, 9,999]) for n in [21,130, 100,000] in the optimum task model and for n in [34,229, 100,000] in the maximum task model. Accordingly, all subsequent simulations were run with 25,000 (optimum task model) and 35,000 (maximum task model) time steps. The same procedure was used to determine the number of independent simulations required, which indicated 2,615 and 4,933 simulations, respectively; we therefore ran 3,000 (optimum task model) and 5,000 (maximum task model) independent simulations of each p, k scenario in the following experimental design.
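One reading of the 1% criterion can be sketched as below. It assumes that the performance at step n is the running mean over the first n time steps and that the reference value is the mean over the whole run; the function name and the toy data are purely illustrative.

```python
from itertools import accumulate


def first_stable_step(per_step, tol=0.01):
    """Smallest n from which the running mean stays within tol of the final mean."""
    ref = sum(per_step) / len(per_step)
    running = (s / n for n, s in enumerate(accumulate(per_step), start=1))
    last_bad = 0
    for n, r in enumerate(running, start=1):
        if abs(r - ref) >= tol * abs(ref):
            last_bad = n  # criterion violated at step n
    return last_bad + 1


# Toy series: poor performance in the first two steps, then a steady state.
toy = [-1000.0, -500.0] + [-400.0] * 98
n_stable = first_stable_step(toy)
```

In this toy series the running mean is −400 − 700/n and the reference mean is −407, so the criterion is met from n = 64 onward.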

The numbers of tasks (k) and patrilines (p) were independently varied from 1 to 15, creating 225 different colony type/task scenarios. Since results converged toward the higher values of either the number of patrilines or the number of tasks (see results below), more simulations with values beyond 15 for either variable were deemed unnecessary, even though they might be biologically justifiable (Beshers and Fewell, 2001; Oldroyd and Fewell, 2007). The convergence was confirmed by running five additional scenarios with 1 patriline and 50 tasks, 10 patrilines and 50 tasks, 50 patrilines and 50 tasks, 50 patrilines and 10 tasks, and 50 patrilines and 1 task for the wide distribution of response threshold scenario (optimum task model: [0, 6,250]; maximum task model: [0, 9,999]). For each of the 230 combinations, we ran 3,000 independent simulations of the optimum task model (total of 690,000 simulations) with 25,000 time steps and 5,000 simulations of the maximum task model (total 1,150,000) with 35,000 time steps. We ran the same set of simulations with a narrow threshold scenario (optimum task model: [0, 2,500]; maximum task model: [3,750, 6,250]), varying k and p from 1 to 15. For each model version, the average performance of colonies composed of p patrilines facing k tasks, f(p,k), was calculated and transformed onto the interval [0, 1] according to Eq. 4 before calculating averages and standard deviations:

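$$ f^{*} (p,k) = \frac{{f_{\text{Max}} - \left| {f(p,k)} \right|}}{{f_{\text{Max}} }} $$
(4)

where f_Max = 6,250 is the theoretical maximum of the mean deviation |f(p,k)|; the absolute value is required because f(p,k) is non-positive by Eq. 2, so that f*(p,k) falls on the interval [0, 1]. Results were subsequently evaluated with regard to p patrilines and k tasks.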
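A short numerical illustration of the transformation in Eq. 4 follows. It assumes that f(p,k) enters through its magnitude, since Eq. 1 makes it non-positive and the result is meant to lie on [0, 1]; the function name and the example value are hypothetical.

```python
F_MAX = 6250.0  # largest possible mean deviation


def normalize(f_pk):
    """Map a non-positive performance f(p,k) onto [0, 1]; 1 means no deviation."""
    return (F_MAX - abs(f_pk)) / F_MAX


# A mean deviation of 412.5 units corresponds to 93.4% of the maximum performance.
example = normalize(-412.5)
```

A colony that never deviates from its targets scores 1, and a colony at the largest possible mean deviation of 6,250 scores 0.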

We used parametric statistics, justified by our large sample sizes and the central limit theorem (Kallenberg, 1997). Thus, for each model we used 2-factorial ANOVAs (patriline number × task number) with Dunnett’s T3 post hoc tests to assess differences between scenarios and multiple regressions to test statistical significance of overall trends. We used a subsample of the raw data (n = 500 for each scenario, N = 115,000 for each model) because the complete dataset proved to be too large to be handled by our statistical software (SPSS® version 14.0). For overall model comparisons, the averages for each of the task/patriline scenarios were evaluated by ANCOVAs with the number of patrilines and tasks as covariates.

Results

All simulations averaged over all 230 task/patriline combinations in the optimum task model resulted in an average colony performance of 93.4% (±5.0% SD) of the theoretical maximum performance. The mean values of each task/patriline scenario ranged from the minimum of 58.6% (±0.6%; calculated from 3,000 independent simulations) for one patriline facing 50 tasks to the maximum of 97.7% (±0.2%) for 50 patrilines facing 10 tasks. Overall, simulation scenarios were significantly different from each other (F(229,114770) = 2,843, P < 0.001) with post hoc tests revealing significant differences for most pairwise comparisons of different numbers of patrilines or tasks (Table 1). In general, colony performance increased with an increase in the number of patrilines and decreased with an increase in the number of tasks (multiple regression r² = 0.32, F(2,114997) = 262,857, P < 0.001, β_patrilines = 0.53, β_tasks = –0.28).

Table 1 Results of Dunnett’s T3 post hoc tests for differences between the different task/patriline scenarios in the optimum task model based on random subsets of 500 simulations for each scenario (only P values are given: above diagonal: task differences, below diagonal patriline differences)

For the maximum task model, the overall average was 93.8% (±7.6%); the lowest average performance was recorded for one patriline with 50 tasks (64.7 ± 0.1%) and the highest for 50 patrilines facing 10 tasks (99.0 ± 0.1%). The maximum task model did not differ significantly from the optimum task model (F(1,456) = 3.6, P = 0.058). Within the maximum task model, simulation scenarios were significantly different from each other (F(229,114770) = 2,528, P < 0.001) with post hoc tests revealing significant differences for most pairwise comparisons of different numbers of patrilines or tasks (Table 2). In general, colony performance increased with an increase in the number of patrilines and decreased with an increase in the number of tasks (multiple regression r² = 0.29, F(2,114997) = 23,418, P < 0.001, β_patrilines = 0.51, β_tasks = –0.26).

Table 2 Results of Dunnett’s T3 post hoc tests for differences between the different task/patriline scenarios in the maximum task model based on random subsets of 500 simulations for each scenario (only P values are given: above diagonal: task differences, below diagonal patriline differences)

Restricting the worker response thresholds to values close to the optimum increased colony performance overall in the optimum task model to an average of 96.1% (±0.2%), which was significant (F (1,451) = 110.6, P < 0.001). Overall, colony performance in the maximum task model with restricted response thresholds (92.6 ± 0.7%) was also significantly higher than in the original model (F (1,451) = 24.7, P < 0.001), although the effect was less pronounced.

Except for scenarios with few tasks (k ≤ 3), both model versions showed a monotonic increase of colony performance with an increasing number of patrilines present in the colony (Fig. 2). However, the performance increase per additional patriline decreased with the number of patrilines in both models, indicating diminishing returns for multiple mating. These functions differed slightly between model versions. Changes in performance caused by additional patrilines were generally steeper for more complex task scenarios, and most increments were negligible when there were more patrilines present than tasks (p > k). Post hoc tests of the patriline effect in subsets of the data with a constant number of tasks revealed non-significant differences between model scenarios only when p > k.

Fig. 2

Simulation results of colony performance as a function of the number of patrilines given 1–15 tasks under the optimum trait model with a wide (a) and narrow (b) distribution of worker response thresholds and under the maximum trait model with a wide (c) and narrow (d) range of response thresholds. Figure legend in (a) applies to all figure panels

The optimum task models exhibited a cross-over in performance (Fig. 2a, b), meaning that colonies that contained only a few patrilines performed better when faced with fewer tasks (e.g. post hoc tests for the one patriline scenario showed that any colony with <6 tasks performed significantly better than any colony with >6 tasks; Table 1) and the performance of colonies with many patrilines increased with an increasing number of tasks (e.g. post hoc tests for the 15 patriline scenario showed that any colony with <6 tasks performed significantly worse than any colony with >6 tasks; Table 1). This interaction was also significant in the overall ANOVA (F (199,114770) = 287, P < 0.001) and apparent from the different slopes of the functions in Fig. 3a, b. In contrast, the maximum task models showed more parallel patterns with the exception of the one-task scenario that was indifferent to increases in the number of patrilines (Fig. 2c, d), which may account for the overall significant interaction term between patrilines and tasks in the overall ANOVA (F (199,114770) = 280, P < 0.001).

Fig. 3

Simulation results of colony performance as a function of the number of tasks given 1–15 patrilines under the optimum trait model with a wide (a) and narrow (b) distribution of worker response thresholds and under the maximum trait model with a wide (c) and narrow (d) range of response thresholds. Figure legend in (a) applies to all figure panels

Colony performance slowly and steadily declined with task number in colonies with few patrilines in the optimum and the maximum task model (Fig. 3), but a performance maximum existed in all colonies with more than five patrilines in both model versions. In the optimum task model, these maxima increased with the number of patrilines present (Fig. 3a) but were always between four and six tasks in the maximum task model for all p > 5 (Fig. 3c). The shape of these functions was not significantly changed in the repeat simulations with narrow response threshold distributions (Fig. 3b, d, respectively).

Discussion

Although simulations are well suited to illustrate the benefits of multiple mating for the efficiency of division of labour in social insects, few simulation studies have systematically explored the relationship between mating number, task number and colony performance (Oldroyd and Fewell, 2007). This study shows systematic relationships between these variables and generally supports the notion that more patrilines lead to a more efficient division of labour with diminishing returns. Our simulations also show that this conclusion is robust to the width of the distribution of response thresholds and to whether maximum or optimum tasks are assumed. However, the performance of colonies with a narrow distribution of response thresholds centered on the optimum may be higher on average than that of colonies with a wide distribution of response thresholds. In addition, the results suggest that the fitness benefit of multiple mating through enhanced division of labour may depend on the number of tasks.

Previous theoretical studies have found a beneficial effect of increased genetic diversity on homeostatic task performance (Page and Mitchell, 1998; Myerscough and Oldroyd, 2004) and our simulations broadly corroborate these results. This emphasizes the value of the earlier studies, even though they were based on very simple models. Although our results resemble those of earlier studies, the extension to more tasks and patrilines is important because complex systems such as insect colonies may change abruptly in response to minor increases in complexity (Beekman et al., 2001). We systematically increased the complexity from the simplest possible case of a genetically homogeneous colony (one patriline) facing one task to scenarios that seemed to reflect the reality of some more complex social insect colonies with multiple tasks and patrilines. For both variables, the number of tasks and number of patrilines, the most effective changes were observed in the first third of the parameter space, which suggests that the coverage of the parameter space was sufficient. This conclusion was further supported by the few sample scenarios that we investigated with more extreme numbers of tasks and patrilines. The added complexity in those extreme scenarios did not result in new insights but increased computation time to a point that did not allow more scenarios to be tested.

Despite the relative complexity of our simulations, we made several significant simplifications. The most important one is the assumption that patriline alone determines the response threshold of workers. Age, experience, and environmental factors may affect response thresholds (Pankiw and Page, 1999, 2001; Pankiw et al., 2002; Weidenmuller, 2004) in addition to genotype, and genetic variation among workers is not due to patrilineal variation alone. All these factors could have been incorporated to add inter-individual variation but had to be omitted to keep simulation times reasonable. In addition, none of the omitted factors is associated with the two investigated variables and hence should not bias the results. In our view, the omitted factors and an individual-based model could have lessened the deterministic nature of the model and therefore increased the variability but not the mean effect of the investigated variables. This also holds true for the other assumptions such as simultaneous task performance, perfect task level assessment, and equal representation of all patrilines.

As in previous studies (Page and Mitchell, 1998; Myerscough and Oldroyd, 2004), we found that the efficiency with which a colony deals with a certain number of tasks increased with the number of patrilines. This conclusion is supported by empirical studies (Mattila and Seeley, 2007). However, most of the empirical studies are based on necessarily restricted testing of two experimental conditions from either end of our simulated genetic diversity range (e.g. 1 and 15 patrilines: Mattila and Seeley, 2007) and no variation in task number. Our results predict that very similar fitness benefits should result from five and from fifteen patrilines, compared to one. Tests that incorporate a third group of modest polyandry have not been performed but seem important to resolve the debate over whether the extremely high degree of polyandry is adaptive (Fuchs and Moritz, 1999) or not (Tarpy and Page, 2001).

Empirical studies on the fitness effect of task number seem less feasible. However, one possibility would be to compare colony fitness of colonies that experience either one large-scale experimental challenge or several different challenges of smaller scale. Another possibility would be a comparison of colony performances between a simple and a complex environment. The relation between fitness and task number is important because it may explain the prevalence of multiple mating in complex societies (Wiernasz et al., 2008; Oldroyd and Fewell, 2007; Kronauer et al., 2007) and it may also select for a certain degree of behavioural specialization.

As expected, colony performance declined with the number of tasks that genetically homogeneous colonies were facing. Without sufficient genetic specialization, an increasing number of tasks remain unaccomplished with increasing task number, lowering colony performance. However, performance was lower for one or two tasks than for multiple tasks in genetically diverse colonies, which can only be explained by the larger effect of stochasticity in the scenarios with few tasks. Stochasticity effects are probably also the cause of the irregular behaviour of some of the functions, but we ultimately do not know why these occur. Overall, our results suggest that too many tasks may decrease colony performance, regardless of the number of patrilines. The number of recognized acts or tasks in a behavioural repertory is somewhat determined by the scrutiny of the investigator and often exceeds 20 in a typical social insect (Hölldobler and Wilson, 1990). However, the tasks in this study are probably better understood as behavioural roles (sensu Oster and Wilson, 1978) due to the prevention of direct task switching. We found that colonies performed best or were not significantly different from the scenario with the best performance when four to six tasks were present. Although only three distinct worker castes are recognized for honey bee workers (Johnson, 2008a), the number of distinct behavioural roles, consisting of sets of interrelated tasks, in many other social insect species may be close to 4–6 (Oster and Wilson, 1978; Hölldobler and Wilson, 1990).

Larger colonies of social insects may face more tasks and display a more pronounced division of labour (Jeanson et al., 2007). Although the causal relationships of these associations are not clear, our simulations indicate how performance may be affected by increasing complexity when the number of patrilines matches the number of tasks. In both model versions, colonies with many patrilines facing many tasks outperformed colonies with few patrilines facing few tasks, which may be interpreted as an increase in robustness with increased system complexity. Model scenarios with few response thresholds and tasks suffer more from the stochastic perturbations and the random assignment of response threshold values (Rueppell et al., 2008). This effect is stronger for optimum tasks due to the potential for over-regulation: If too many workers perform the focal task, it may result in a fitness penalty because the actual task level is deviating from its optimum value in the opposite direction. This is not the case for maximum tasks, for which over-regulation has no negative fitness consequences. Consistent with this argument, we find that performance increases with increasing number of patrilines are more pronounced in the optimum task version than in the maximum task version of the model.

Previous studies either made no explicit assumptions regarding the nature of the tasks or used an optimum task scenario (Myerscough and Oldroyd, 2004). However, it is unclear whether an optimum or a maximum task scenario better describes the biological reality of social insect colonies. Many tasks may in fact represent intermediate cases between optimum and maximum tasks, although some tasks are strictly optimum tasks such as thermoregulation (Jones et al., 2004; Weidenmuller, 2004). For tasks that tend more towards maximum tasks such as brood care or nest maintenance, it could always be argued that a small fitness cost to over-performance exists due to time and energy allocation. Overall, the differences between the outcomes of the two model versions were minor and we therefore conclude that the nature of the tasks is not critical for our and other studies’ conclusions.

The distribution of response thresholds is significant for the evaluation of response threshold models of division of labour (Beshers and Fewell, 2001) but it is not clear whether a broad or narrow distribution of response thresholds is favourable (Fuchs and Moritz, 1999). Our results suggest that colonies with response thresholds that are narrowly distributed around the optimum outperform colonies with a wider distribution of response thresholds on average. However, a more stochastic environment, including changing optimum values for task-associated variables over time, may invert this conclusion.

In sum, our simulations show that the previously demonstrated benefit of multiple mating for task performance systematically extends to more complicated scenarios than previously investigated, regardless of model version and distribution of worker response thresholds. In all cases, we find the theoretically expected diminishing returns of multiple mating (Fuchs and Moritz, 1999). Therefore, our results suggest that enhanced division of labour may have selected for multiple mating in social insects but they cannot adaptively explain the extreme mating numbers in some species (Wattanachaiyingcharoen et al., 2003; Kronauer et al., 2007), even in complex task scenarios.