Introduction

The reproductive division of labor is an essential feature of insect societies. In some species (such as honeybees and many ant species), a clear morpho-physiological differentiation exists between colony members, with a few individuals having a reproductive phenotype while the majority are sterile and committed to lifelong helping tasks. In a large majority of species, however, such differentiation between reproductive and nonreproductive individuals is weak or even absent (Wilson, 1971). In such cases, a large proportion of (or even all) females retain the ability to develop ovaries. Reproduction thus involves a substantial part of the population, with different individuals showing various levels of reproductive activity (Keller and Reeve, 1994; Sherman et al., 1995). In particular, each colony member may show different levels of investment in reproductive physiology (development of ovaries) and different levels of egg-laying. Estimating the reproductive activity level of each individual is a crucial task in understanding many topics in sociobiology, such as reproductive skew, intracolonial cooperation-conflict dynamics, and evaluation of fitness values of alternative strategies (e.g., Vehrencamp, 1983; Reeve and Nonacs, 1992; Ratnieks and Visscher, 1989; Keller and Reeve, 1999; West-Eberhard, 2003).

Several methods can be used to obtain a measure of reproductive activity and partitioning in social insects. Researchers can directly observe and record the egg-laying activity of each female or individually assign the brood to colony members through genetic analyses. Both methods are, however, quite demanding (in terms of time and money, respectively) and, moreover, miss some aspects of reproduction: given the existence of egg destruction processes inside many social insect colonies (oophagy; e.g., Ratnieks and Visscher, 1989; Downing, 1991; Monnin and Peeters, 1997; Wenseleers and Ratnieks, 2006), genetic analyses give a picture of the final reproductive output (i.e., only those eggs that were not eaten by nest mates are genotyped), while direct observation measures the egg-laying rate, which does not necessarily correspond to the final reproductive output. Moreover, neither method measures anything related to the physiological investment in reproduction. Two individuals could have the same investment in reproductive physiology (well-developed ovaries) while reproducing to very different extents (e.g., Shukla et al., 2012). A very widely used measure that does capture the investment in reproduction is the assessment of ovarian development. Ovarian index (O.I.) procedures measure how developed the ovaries of an individual are by looking at different morphological features, such as number, size, and development of mature and immature oocytes. The calculation of ovarian indexes is pervasive in social insect studies. The advantage of the O.I. methodology lies in its immediacy: O.I. is usually easy to learn and perform, inexpensive (contrary to genetic analyses) and directly analyzable (contrary to video recordings of egg-laying activity).

The widespread use of O.I. by researchers has led to the development of a surprisingly large number of different procedures. The features taken into account vary from qualitative characteristics (shape, color, approximate size; e.g., Michener et al., 1971; Cant et al., 2006) to quantitative traits (e.g., length and width of oocytes and eggs, total length of the ovaries; e.g., Röseler et al., 1984). Some indexes are compiled by classifying each sample into one of several subjective categories (e.g., Suzuki, 1985), while others use one or more measures to obtain a continuous index of ovarian development (e.g., Dietemann and Peeters, 2000; Markiewicz and O’Donnell, 2001). Recently, a statistical approach which merges several qualitative and quantitative features has also been proposed (Deshpande et al., 2006; Liebert and Starks, 2006). Even though such a great assortment can be expected, since dozens of different social insect species have been investigated so far, it is nonetheless surprising that a similar variety is found even within a single species, such as the social wasp Polistes dominula. During the almost 70 years of research on this model species, many different methods have been used to obtain indexes of ovarian development (such as Röseler et al., 1984; Suzuki, 1985; Strassmann et al., 2004; Liebert and Starks, 2006). As the protocols according to which each method is compiled are quite different from one another, the fundamental issue of inter-method agreement arises. As far as we know, a direct comparison among methods, evaluating how far the results obtained with different procedures can be compared, has never been performed.

The assessment of ovarian development, in addition to being used to evaluate reproductive investment, can also be used to infer egg-laying activity: individuals with more developed ovaries are also those most likely to have higher egg-laying rates (e.g., Ortius and Heinze, 1999). Not surprisingly, evidence supports the assumption that, on average, more developed ovaries are found in fertile and egg-laying individuals, while non-egg layers or poor egg layers have only slightly developed or even undeveloped ovaries. Nonetheless, the real accuracy of ovarian indexes in estimating individual egg-laying rate is still unknown: notwithstanding the widespread use of O.I., a formal assessment of their reliability in estimating the real reproductive activity has never been performed.

In this paper, we used the paper wasp species Polistes dominula as a model organism to address these two issues, which have so far remained untested assumptions.

Materials and methods

The model species

Polistes dominula (sometimes misspelled as P. dominulus, Buck et al., 2008) is the most common species of the genus Polistes in temperate zones (Pardi, 1996; Cervo et al., 2000) and has become a key model species for the study of social evolution and reproductive skew over the last decades (1996). Being primitively eusocial, with only slight caste differentiation (all individuals are fully fertile, thus retaining the possibility to develop ovaries and lay eggs), this species offers the opportunity to compare reproductive skew among several individuals in each colony. A large variability in reproductive investment among individuals is present throughout the whole life-cycle, from the founding phase to the overwintering period (Pardi, 1948; Dapporto and Turillazzi, 2010; Dapporto et al., 2006) and rapid changes occur according to variation in the colonial social environment (e.g., breeder presence and/or fertility status, Strassmann et al., 2004; Liebig et al., 2005).

P. dominula has a typical Polistes colony cycle (see Reeve, 1991). Colonies are founded in spring by one or more mated overwintered foundresses (associative foundations), which exit from diapause and found new nests (pre-emergence phase). In associative foundations, a linear hierarchy is soon established among the females by agonistic interactions and subsequently maintained by ritualized dominance behavior (Pardi, 1942, 1996). One of the foundresses becomes the dominant individual (called the alpha female), monopolizing reproduction on the nest (more than 90 % of eggs are laid by the alpha female) (Queller et al., 2000). The remaining foundresses become subordinate, assume a worker-like behavior (foraging and provisioning) and show different degrees of reproductive investment according to their rank and the effort spent in colonial tasks (Pardi, 1948; Turillazzi, 1980; Dapporto and Turillazzi, 2010). At the end of May, the ‘worker phase’ begins with the emergence of the first brood (mostly females), which helps in the rearing of additional brood until the end of the season. Worker-reared brood will develop into male and female reproductives. In queenright colonies (composed of dozens to a few hundred wasps), workers have very poorly developed ovaries (even if a certain variability still exists among different individuals, Dapporto et al., 2005; Tibbetts et al., 2011). However, workers readily develop ovaries and lay eggs whenever an opportunity arises, such as in cases of disappearance of the breeder (Strassmann et al., 2004) or reduction of its fertility (Liebig et al., 2005). After mating, males die while mated females aggregate in large clusters in sheltered places (Dapporto et al., 2006). The large and dynamic variability in reproductive skew in P. dominula at both the colonial and population level makes this species a very suitable model to assess the reliability of the tools used for the assessment of reproductive skew.

Animal collection and rearing

Twelve colonies of P. dominula were collected in the surroundings of Florence (Italy) in spring, before worker emergence. They were transferred to the laboratory and reared in cubic glass cages with 15 cm sides. Water, sugar, fly maggots, and pieces of filter paper for nest building were provided ad libitum. The experiment started in the postemergence period, when each colony had at least five workers. All wasps were individually marked with Humbrol modelling paints on the thorax and wings for individual identification.

Brood removal and behavioral recordings

In order to obtain a large sample of reproductive individuals, we induced the ovarian development of subordinate individuals (subordinate foundresses and workers) by removing each day, for 22 consecutive days (in July), about half of the immature brood on each nest. This treatment is known to evoke ovarian development and egg laying in both subordinate foundresses and workers (Liebig et al., 2005; Monnin et al., 2009).

Behavioral observations to record egg-laying activity started 13 days after the beginning of the brood removal treatment. This time was necessary to allow wasps to detect the presence of empty cells and develop ovaries (ovarian development requires about 5–6 days in P. dominula, Monnin et al., 2009). Each colony was observed for about 22 h (except for one colony, which was observed for about 14 h), recording the individual laying each egg and the duration of the event. Egg laying was easy to recognize by the characteristic position of the individual, but it was not always possible to check for the presence of an egg inside the cell afterwards, i.e., for the efficacy of egg laying. We thus defined ovipositions as egg-laying postures that lasted 1 min or more (Monnin et al., 2009). Previous video recording data (Cini and Monnin, unpubl.) showed that the total number of ovipositions performed within 1 day is highly correlated with the number of ovipositions performed during the period from 10 am to 3 pm (Pearson correlation, r = 0.814, p < 0.0001, N = 52). We thus recorded ovipositions during this interval. All individuals were stored at −20 °C at the end of the observation period.
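
The oviposition criterion and the resulting egg-laying rate can be illustrated with a minimal Python sketch. The function name, the input layout (one `(wasp_id, duration_in_seconds)` pair per observed posture) and the example values are hypothetical and are not taken from the study; only the 1 min threshold comes from the text.

```python
from collections import Counter

# Threshold from the text: postures lasting 1 min or more count as ovipositions.
MIN_DURATION_S = 60


def oviposition_rates(events, hours_observed):
    """Return {wasp_id: ovipositions per hour of observation}.

    events: iterable of (wasp_id, duration_in_seconds) egg-laying postures
            (hypothetical input layout).
    hours_observed: total observation time for the colony, in hours.
    """
    counts = Counter(
        wasp for wasp, duration in events if duration >= MIN_DURATION_S
    )
    return {wasp: n / hours_observed for wasp, n in counts.items()}


# Illustrative data only: wasp "A" has one posture that is too short to count.
example_events = [("A", 75), ("A", 30), ("B", 120), ("A", 60)]
example_rates = oviposition_rates(example_events, hours_observed=22)
```

Wasps never seen in a qualifying posture are simply absent from the returned dictionary, which corresponds to a rate of zero.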

Choice of ovarian index methods

A bibliographic survey was carried out to identify a wide spectrum of different O.I. methods. We chose a subset of methods that comprises the most used, diverse, fast, and complex ones to make the comparison as general as possible. We focused only on quantitative O.I. methods (i.e., methods that provide a continuous variable as index) as they are frequently used and their outputs are easily treated statistically. The chosen methods are listed in Table 1, together with protocols and references. For some indexes, variants with slight differences exist. Below, we provide a detailed overview of the methods and their possible variants.

Table 1 The five methods used for compiling the ovarian indexes (details in the text)

1. Length method The index based on the mean length of the six most developed oocytes/eggs could either take into account the longest or the largest oocytes/eggs (e.g., Cervo and Lorenzi, 1996; Sledge et al., 2004). Longest and largest oocytes/eggs are not necessarily the same (pers. obs.). As length is more variable and thus more informative than width (coefficient of variation about 15 and 5 %, respectively), we chose to use the six longest oocytes/eggs as the most developed ones. The Length method has also been compiled by measuring the mean length of the six terminal oocytes/eggs (Pardi, 1948; Cervo and Turillazzi, 1989). While the terminal oocytes are usually also the longest ones, this is not always the case (pers. obs.). We did not include both methods, but rather chose the one based on the longest oocytes/eggs, which is by far the most common. Indeed, these two indexes gave nearly identical results (correlation between O.I. according to the two methods: Pearson correlation, r = 0.98, p < 0.0001, N = 76; correlation between wasp ranks according to the two methods: Spearman correlation, rs = 0.98, p < 0.0001, N = 76; mean difference in wasp ranks = 1.54 ± 3.05 positions).

2. Volume method Oocytes and eggs also show variability in their width. Le Lann et al. (2011) estimated the volume of each oocyte/egg by considering it as formed by two connected cones and calculating its volume as: \( V = \frac{2}{3}\,\pi\,\frac{L}{2}\left(\frac{W}{2}\right)^{2} \), where L stands for oocyte/egg length and W for width. We compiled this index by calculating the mean volume of the six longest oocytes/eggs.

3. Area method Some O.I.s take into account length and width to calculate the area of oocytes/eggs. For example, Foster et al. (2004) used length and width to calculate the area of the two largest eggs. We modified this index by directly measuring the area of the six longest oocytes (through ImageJ software) and taking the mean area as O.I. value.

4. PCA method Quantitative methods usually do not focus on the presence of yolk in oocytes, a feature which is, however, critical in differentiating between reproductive and nonreproductive individuals and which is usually taken into account in categorical methods. We thus also compiled a recently developed quantitative ovarian index method which merges the two approaches, taking into account descriptive features, such as yolk presence and oocyte number, as well as oocyte/egg measurements (Liebert and Starks, 2006).

5. Yolk ratio method We introduced this new additional method as it can be compiled very quickly. It is calculated as the ratio between the length of the vitellarium (i.e., the portion of the ovary which is filled with yolk, see Online Resource 1) and the total ovary length.
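
As an illustration of how the simplest of these indexes could be computed from raw measurements, the following Python sketch implements the Length, Volume and Yolk ratio methods as described above. It is a sketch under stated assumptions, not the authors' own procedure: the function names, the `(length, width)` tuple layout and the choice to return 0.0 when no oocyte was measured are all hypothetical.

```python
import math


def six_longest(oocytes):
    """oocytes: list of (length, width) tuples; return the six longest."""
    return sorted(oocytes, key=lambda lw: lw[0], reverse=True)[:6]


def length_index(oocytes):
    """Length method: mean length of the six longest oocytes/eggs."""
    top = six_longest(oocytes)
    if not top:  # assumption: no measurable oocyte gives an index of 0
        return 0.0
    return sum(length for length, _ in top) / len(top)


def double_cone_volume(length, width):
    """Two cones joined at their bases, each of height L/2 and radius W/2."""
    return (2.0 / 3.0) * math.pi * (length / 2.0) * (width / 2.0) ** 2


def volume_index(oocytes):
    """Volume method: mean double-cone volume of the six longest oocytes/eggs."""
    top = six_longest(oocytes)
    if not top:  # same assumption as in length_index
        return 0.0
    return sum(double_cone_volume(l, w) for l, w in top) / len(top)


def yolk_ratio_index(vitellarium_length, ovary_length):
    """Yolk ratio method: vitellarium length over total ovary length."""
    return vitellarium_length / ovary_length
```

All lengths and widths must be expressed in the same unit; the Yolk ratio is dimensionless, whereas the Volume index is in cubic units of that measure.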

Dissection and morphological measures

Seventy-nine wasps were dissected under a stereomicroscope at 25× magnification in tap water. We dissected all wasps seen laying at least once (N = 42) and a random subset of those never seen laying (N = 34). This sample is representative of all colonies (from 4 to 11 wasps dissected per colony). Only marked wasps were considered to avoid any just emerged individual. O.I. was not compiled for three specimens since they were not correctly preserved.

For each dissected wasp, the following measures were taken: length of the longest ovariole in the ovaries (from the beginning of the germarium to the beginning of the oviduct), length of the vitellarium inside the most developed ovariole, length and width of the six longest mature/immature oocytes, area of the six longest mature/immature oocytes, number of oocytes, and number of mature oocytes (for details about the measurements see Online Resource 1).

Statistical analyses

Ovarian index methods comparison

Pearson correlation coefficients were used to compare O.I. scores obtained with different methods. We performed correlation analyses both for raw values and for rank-transformed values (individual ranks were calculated on the basis of each method), as ranks give the relative position of each wasp in the population and are thus directly comparable among methods. Moreover, the mean difference between the ranks assigned to wasps by every pair of methods was calculated to give an average measure of the discrepancy in wasp ovarian development among different methods. A Friedman nonparametric test for paired data, followed by post hoc Wilcoxon tests with Bonferroni correction, was used to compare these distributions. Finally, Euclidean distance matrices based on wasp ranking, one for each pair of methods, were created to better grasp the global accordance in the relative position of each wasp in the whole population. Mantel tests were used to compare these distance matrices. As nest membership did not influence any of the variables (GLM, nest as random factor, p > 0.02 in all cases), we pooled all individuals.
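
The rank-based comparisons described above can be sketched in plain Python as follows. This is a minimal illustration, not the SPSS procedure actually used: the function names are hypothetical, the distance between two wasps is assumed to be the absolute difference of their ranks under one method, and the Mantel test is assumed to be a simple one-sided permutation test.

```python
import math
import random


def average_ranks(values):
    """Rank values from 1 (smallest) to n, giving ties their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_rank_difference(x, y):
    """Mean absolute difference between the ranks given by two methods."""
    rx, ry = average_ranks(x), average_ranks(y)
    return sum(abs(a - b) for a, b in zip(rx, ry)) / len(rx)


def mantel(x, y, n_perm=999, seed=0):
    """Permutation Mantel test between the rank-distance matrices of x and y.

    Returns (observed matrix correlation, one-sided permutation p value).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    dist_y = [abs(ry[i] - ry[j]) for i, j in pairs]

    def matrix_correlation(r):
        return pearson([abs(r[i] - r[j]) for i, j in pairs], dist_y)

    observed = matrix_correlation(rx)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = rx[:]
        rng.shuffle(shuffled)  # relabels wasps, i.e. permutes rows and columns
        if matrix_correlation(shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Calling `pearson(x, y)` on raw scores and `pearson(average_ranks(x), average_ranks(y))` on ranks gives the two correlation variants mentioned in the text.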

Ovarian indexes and egg-laying activity

In order to evaluate whether any of the methods is a reliable indicator of egg-laying activity, we carried out the same set of analyses used for the inter-method comparison (see above). Differences between the correlation coefficients among each pair of O.I. and egg-laying activity were tested using Steiger’s Z test (Steiger, 1980) for r coefficients calculated on raw scores, rank scores, and on rank matrices. The significance threshold was set at 0.005 because of multiple comparisons, according to the Bonferroni procedure. Finally, to visualize the similarity of the results of the different O.I. methods and egg-laying ranking, we performed a Multidimensional scaling on the Euclidean distance matrices created from rank data, using the SPSS Proxscal MDS procedure (initial condition: simplex).

Colony-level analysis

We explored whether O.I. methods could reliably predict the main breeders in each colony. We compared the ranks assigned to each wasp on the basis of each O.I. method and of the wasp egg-laying activity (ranking wasps separately for each colony). We counted the number of colonies in which (a) the main breeder or (b) the first two breeders were correctly classified on the basis of their O.I. according to different methods.
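
The colony-level count can be sketched in Python as below. The data layout and function names are hypothetical, and two choices are assumptions that the text does not specify: the first two breeders are required to match in order, and ties are broken by the order in which wasps are listed.

```python
def top_k(scores, k):
    """Return the k wasp ids with the highest scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def colonies_correctly_classified(colonies, k=1):
    """Count colonies whose top-k egg layers are also the top-k by O.I.

    colonies: {colony_id: {wasp_id: (ovarian_index, ovipositions)}}
              (hypothetical layout; wasps are ranked within each colony).
    """
    correct = 0
    for wasps in colonies.values():
        ovarian_index = {w: values[0] for w, values in wasps.items()}
        egg_laying = {w: values[1] for w, values in wasps.items()}
        if top_k(ovarian_index, k) == top_k(egg_laying, k):
            correct += 1
    return correct


# Illustrative data only, not taken from the study.
example = {
    "colony_1": {"a": (3.0, 5), "b": (2.0, 1), "c": (1.0, 0)},
    "colony_2": {"a": (3.0, 0), "b": (2.0, 4), "c": (2.5, 1)},
}
main_breeder_matches = colonies_correctly_classified(example, k=1)
first_two_matches = colonies_correctly_classified(example, k=2)
```

Setting `k=1` corresponds to criterion (a) and `k=2` to criterion (b).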

Statistical analyses were performed with the statistical package SPSS 20 and FZT software (http://psych.unl.edu/psycrs/statpage/FZT_backup.exe). All means are quoted ± standard deviation.

Results

Colony demography and wasp egg-laying activity

One hundred and seventy-nine acts of oviposition performed by 42 different wasps were recorded during the observation time. An average of 6.46 ± 11.5 egg-laying acts was observed for each colony. Egg-laying acts were performed by 4.17 ± 3.21 wasps per nest (range 1–11 wasps), equal to 22 ± 15 % of the colony individuals. The egg-laying rate among those wasps that laid at least one egg varied between 0.04 and 0.66 eggs per hour, with a median value of 0.11. At the end of the experiment, colonies had on average 68.7 ± 18.3 cells and 14.9 ± 7.1 females.

Ovarian index methods comparison

The O.I. values for each wasp calculated with different methods were all highly positively correlated, both considering the raw value and the rank assigned to each wasp according to each method (correlation coefficients range from 0.51 to 0.95; statistical details in Table 2). When comparing directly the rank of each individual, two different methods yield on average a difference of 7.38 ± 3.06 positions per wasp (Table 2). An example of the relationship between the ranks assigned by two different methods is shown in Fig. 1a (scatter plots for all other possible pairs are given in Online Resource 1).

Table 2 Accordance among different ovarian index methods
Fig. 1 Scatter plot of the relationships between individual wasp ranks calculated according to (a) two different ovarian index methods (x-axis PCA method, y-axis Length method) and (b) an ovarian index method (x-axis Length method) and egg-laying activity (y-axis). These plots are representative of the other pair-wise relationships among ovarian index methods and between ovarian index methods and egg-laying activity, respectively, which are provided in Online Resource 1

The matrices of wasp ranks calculated according to different methods were all positively correlated, with correlation coefficients ranging from 0.559 to 0.920 (Table 2).

Although all tests were highly positively correlated, some were more congruent than others (Online Resource 1). Indeed, the Friedman test revealed a significant effect of the method pair on discrepancy (χ² = 119.23, df = 9, p < 0.01). Among all methods, Yolk ratio is the one which shows the greatest discrepancy with the results of the other methods (Yolk ratio vs each of the other methods: always p < 0.05, except for the comparison between Volume–PCA and Area–Yolk ratio discrepancies, p = 0.230; comparisons among other methods, always not significant, p > 0.05, see Table in Online Resource 1). This picture is corroborated by Multidimensional scaling (Fig. 2).

Fig. 2 Multidimensional scaling of Euclidean distances calculated on wasp rankings according to each of the five ovarian index methods and egg-laying activity; symbols represent egg laying (star) and the different methods (cross Length, pentagon Volume, rhombus Area, square PCA, circle Yolk ratio)

Ovarian indexes and egg-laying activity

Overall, O.I. values calculated with different methods were all positively correlated with the egg-laying activity (Table 3). Wasp ranks calculated according to each ovarian index method and to egg-laying activity were also significantly positively correlated (Table 3). However, the correlation coefficients were significantly smaller than those among different methods: the highest correlation coefficient for a method × egg-laying pair is significantly lower than the lowest correlation coefficient for a method × method pair (Fisher Z test, Z = 3.06, p < 0.01). Indeed, the variance in egg laying explained by ovarian index measures ranged only from 0.13 to 0.33, while, when comparing O.I. methods with one another, it ranged from 0.51 to 0.94 (Table 2). The matrices of wasp ranks calculated according to O.I. methods and egg-laying activity were all positively correlated, and correlation coefficients ranged from 0.242 to 0.278 (Table 2), being thus not as high as for the inter-method comparison (Fisher Z test, Z = 2.59, p < 0.05). In addition, the average error of O.I. methods in estimating the egg-laying rank of a wasp is 16.27 ± 3.98 positions per wasp (Table 3, Fig. S3). The Friedman test revealed a significant effect of the method considered on error value (χ² = 11.733, df = 4, p = 0.019): pair-wise post hoc comparisons showed that a significant difference only exists between the performance of the Volume method and that of the Yolk ratio method (T = 0.816, p = 0.015, all other comparisons p > 0.05, see Fig. S3 and Table S2). Comparison of correlation coefficients generally confirmed these results: none of the methods had a higher correlation coefficient with egg-laying activity than the others, no matter which approach was considered (raw scores, ranks, or matrices of ranks). Only the Yolk ratio method proved to be significantly less correlated (only for raw values) with egg-laying activity when compared to the PCA and Volume methods, but not when compared to the Length and Area methods (Online Resource 1).
Finally, the relative similarity among different methods and their very limited accordance with egg laying data is also confirmed by Multidimensional scaling (Fig. 2), which clustered together the O.I. methods (in particular separating the Yolk Ratio method from the others) and put these methods quite far from egg-laying activity. The very low MDS stress value (0.0061) and the high value of the dispersion accounted for (99.98 %) suggest that the procedure provides an excellent reflection of the data structure (Clarke and Warwick, 2001).
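
For readers unfamiliar with the Fisher Z test mentioned above, the following Python sketch shows its textbook form for two correlation coefficients. It is an illustration under assumptions: this form treats the two coefficients as coming from independent samples, whereas here all coefficients were computed on the same wasps, and the text does not state which exact variant was applied. The function name and the example values are hypothetical.

```python
import math


def fisher_z_test(r1, n1, r2, n2):
    """Compare two correlation coefficients via Fisher's r-to-z transform.

    Returns (Z statistic, two-sided p value). Assumes the two coefficients
    come from independent samples of sizes n1 and n2 (both > 3).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / standard_error
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Illustrative values only, not the coefficients reported in the study.
z_stat, p_value = fisher_z_test(0.9, 76, 0.5, 76)
```

The proportion of variance explained quoted in the text is the square of the corresponding correlation coefficient.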

Table 3 Accordance among ovarian index and egg-laying activity

Colony-level analysis

Analysis at the colony level showed that O.I. poorly predicts the rank of the main breeders inside each colony. Depending on the O.I. method, the main breeder was correctly identified in only 6–9 of the 12 colonies, and the proportion of correct matches was even lower when considering the first two breeders (from 2 to 4 correctly classified colonies out of the 9 colonies examined).

Discussion

Ovarian index methods comparison

Our results show that all the investigated methods provide very similar scores for the ovarian index. The inter-method accordance is confirmed by all the different analyses we performed. Indeed, we found a globally high positive correlation among all pairs of different O.I. methods. The relative distances among wasps' O.I.s were positively correlated among all pairs of methods, thus suggesting that different methods provide both similar classification of individual position and similar landscapes of individual reproductive investment.

On average, two different methods provide a difference in individual wasp rank of about 7.38 ± 2.90 positions. Differences among methods are to be expected, as different traits are taken into consideration in each index. In particular, we found that the Yolk ratio method gives the most discordant results with respect to all the other methods. This can be expected considering that the Length, Area, Volume and PCA methods are based on measures related to the six largest oocytes, while the Yolk ratio method instead considers properties of the entire ovariole. At the same time, it is also easy to understand why these differences are rather limited, as physiological constraints likely make the development of all traits interdependent.

Given that different methods yield similar results, we suggest that the choice of the method to evaluate reproductive investment can be mainly driven by personal preferences, especially taking into consideration the feasibility and the speed of each method as well as the precision the study aims to attain (misplacing an individual by 7 positions out of 70 could be acceptable for some purposes but not for others). The wide use of the same standardized methods would nonetheless facilitate future comparisons among published results and extensive meta-analyses.

Ovarian indexes and egg-laying activity

Our study shows that none of the examined ovarian indexes is highly correlated with the real egg-laying activity of the individual. Overall, correlation coefficients are very low (under 0.573), thus explaining a very limited part of the variance in egg laying (at best 32 %). Indeed, the gap between the rank a wasp is assigned on the basis of its ovarian index (however it is calculated) and the rank assigned on the basis of its real egg-laying rate is on average quite large, being approximately 16 positions out of a total of 76 wasps in the population. None of the O.I. methods seems better than the others.

An average error of 16 positions is surely too large for any kind of study dealing with wasp reproductive activity, thus suggesting that O.I. (however calculated) is a very poor estimator of the reproductive activity of a wasp. O.I. could be useful in classifying nonreproductive, poorly reproductive, and highly reproductive wasps (a distinction easily produced by descriptive methods which do not require extensive measurements), but it is surely misleading in studies searching for fine reproductive differences. For example, considering every pair of wasps in our dataset for which the O.I. of the first wasp was greater than the O.I. of the second wasp (Length method, O.I. greater than zero for both wasps), the wasp with more developed ovaries was also the one laying more eggs in only about 55 % of the cases. This finding is also confirmed by the very poor predictive power of O.I. methods at the colony level, where the main breeder was not correctly inferred in one colony out of three.
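
The pairwise check described in the example above can be sketched in Python as follows. The function name and data layout are hypothetical; the sketch also assumes that pairs with identical O.I. are skipped and that a tie in the number of eggs laid does not count as a match, two details the text does not specify.

```python
from itertools import combinations


def pairwise_concordance(wasps):
    """Fraction of wasp pairs in which the higher O.I. also laid more eggs.

    wasps: list of (ovarian_index, eggs_laid) tuples (hypothetical layout).
    Only pairs in which both O.I. values are greater than zero and differ
    from each other are considered.
    """
    concordant = 0
    total = 0
    for (oi_a, eggs_a), (oi_b, eggs_b) in combinations(wasps, 2):
        if oi_a <= 0 or oi_b <= 0 or oi_a == oi_b:
            continue
        total += 1
        if oi_a > oi_b:
            eggs_high, eggs_low = eggs_a, eggs_b
        else:
            eggs_high, eggs_low = eggs_b, eggs_a
        if eggs_high > eggs_low:
            concordant += 1
    return concordant / total if total else float("nan")


# Illustrative data only; the last wasp is excluded because its O.I. is zero.
example = [(3.0, 5), (2.0, 1), (1.0, 2), (0.0, 9)]
fraction = pairwise_concordance(example)
```

A value close to 0.5 means that knowing which of two wasps has the more developed ovaries says little about which of them lays more eggs.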

The mismatches between O.I. and egg-laying activity are not so surprising, and they can be easily explained by the dominance/reproduction system in Polistes wasps. At the very beginning of their cohabitation, co-foundresses of P. dominula can have similar reproductive potential, but only one of them (the alpha female) has a substantial egg-laying activity (Pardi, 1942; Queller et al., 2000; Monnin et al., 2009). This high skew in egg-laying activity despite lower skew in reproductive investment is a common trait in primitively eusocial insect species, where reproductive flexibility allows for development of ovaries in many colony members, while oophagy ensures the main breeder's monopoly over reproduction (Fletcher and Ross, 1985).

Conclusion

On the basis of our results, we recommend great caution in extrapolating egg-laying activity levels from O.I. estimations, and we suggest that the three different facets of reproductive skew (reproductive investment, egg laying, and reproductive output) be evaluated using different procedures (Table 4). O.I.s allow a good estimation of the reproductive investment of an individual, while direct observation of egg-laying activity and brood genotyping allow the measurement of oviposition activity and reproductive output, respectively.

Table 4 Identification of the most suitable method for the estimate of different aspects of reproductive skew inside a social insect colony

Although our study was performed on only one species, we believe our results can be an incentive to address the same issue in other species of the same or different taxa. As Polistes paper wasps, and P. dominula in particular, have in the past been good models to tackle several topics such as insect sociality, communication, and physiology (Turillazzi and West-Eberhard, 1996), we feel that the results we found in P. dominula provide a valuable first step for this methodological investigation in many other systems as well.