Introduction

Nearly three decades ago, Hamilton and Zuk (hereafter H&Z) laid the foundation for the theory of parasite-mediated sexual selection (Hamilton and Zuk 1982). They postulated that male traits evolve to signal parasite resistance to females, and therefore, species under strong selection due to pathogens should have evolved males with more extravagant secondary sexual characters than species experiencing weak selection. To test their hypothesis, H&Z showed an interspecific relationship between the prevalence of blood parasites and features of male display in birds. Subsequently, the theory had an exceptionally strong impact on evolutionary biology, and it currently influences how we think about the evolution of sexually selected traits (their influential paper has received more than 1,700 citations so far; see also Milinski 2001).

However, despite the strong theoretical argument by the authors, follow-up studies questioned the validity of empirical support that H&Z obtained based on comparative data (e.g., Read and Harvey 1989; Read and Weary 1990; Johnson 1991; John 1995; Yezerinac and Weatherhead 1995). These studies re-examined the interspecific relationship between the prevalence of avian hematozoa and male display traits by correcting for several confounding factors that might affect the relationship in focus. These factors included the phylogenetic association of species, different ecological variables, and the subjective scoring of male traits. A control for these confounding factors is important because they can lead to spurious correlation between the estimates of parasitism and male conspicuousness if the focal variables vary non-randomly with respect to the confounding variable. For example, the relationship between prevalence and estimates of song complexity and continuity is diminished when accounting for the phylogeny of birds (Read and Weary 1990). This finding indicates that heavily parasitized species tend to have elaborate songs most likely because they share a common ancestor with these properties and not because song and prevalence are causally related. A similar phenomenon was observed for plumage coloration, which also showed weaker relationships with parasitism when phylogenetic associations were taken into account (Johnson 1991; Underhill and Kaleita-Summers 1995). Moreover, others demonstrated that interspecific patterns of feather coloration can be weakened by the sensitivity of color scoring to the subjectivity of observers (Read and Harvey 1989) or by an appropriate control for the effect of an interacting third variable (Johnson 1991; John 1995; Yezerinac and Weatherhead 1995). 
Interestingly, there are studies that reported a significantly positive association between prevalence of hematozoa and plumage colorfulness even when behavioral and ecological confounding factors and phylogeny were accounted for (Read 1987). As a result, the role of parasites in mediating sexual selection in birds is less than evident (e.g., Cox 1989; Endler and Lyles 1989; Møller 1990; Clayton et al. 1992). Note that most of the papers above were published more than 15 years ago, and subsequently, the H&Z hypothesis has rarely been tested in a comparative context (Møller et al. 2000).

Most of the discussions emerged in the light of statistical advancements that occurred in the nineties (e.g., standard phylogenetic tools became adopted at that time), and methodological issues had a strong influence on our understanding of the role of parasites mediating the evolution of sexual characters. Since then, the study of sexual traits as well as of parasitism has improved significantly, while the comparative toolbox also has developed. In this paper, we focus on three major advancements that may have consequences for assessing the relationship between avian hematozoan parasites and the expression of sexual traits in males.

First, recent progress in the field of molecular ecology now allows determination of parasite infection rates routinely by means of PCR-based protocols (Weiss 1995; McManus and Bowles 1996). This approach has been suggested to perform more sensitively than the classical microscopy-based methods because PCR amplification of target sequences is more efficient in detecting parasites at low intensity of infection than visual screening (Jarvi et al. 2002; Richard et al. 2002; Waldenström et al. 2004). The better performance of molecular screening compared with microscopic analysis may be expected when the latter is done by inexperienced observers or relies on the assessment of infection from an insufficient number of visual fields (Valkiūnas et al. 2008). In fact, there is considerable variation between microscopic studies in these respects, as there is no standard criterion applied for the minimal number of visual fields to be screened (Valkiūnas 2005). Accordingly, PCR-based detection methods can reveal positive infection by Plasmodium in species in which classical screening methods failed to find evidence of infection (Garamszegi 2010). However, the downside of the higher efficiency of PCR tools is that they can provide both false negatives and false positives and perform poorly in detecting multistrain infection when these methods are used improperly (Freed and Cann 2003, 2006; Cosgrove et al. 2006; Valkiūnas et al. 2006). Although there is no unequivocal support yet available for either method, detected prevalence might be affected by method sensitivity, which is likely to differ between molecular and microscopic studies (Richard et al. 2002; Fallon and Ricklefs 2008; Valkiūnas et al. 2008; Garamszegi 2010). Moreover, data for prevalence of blood parasites from different microscopic studies may be subject to different sensitivity thresholds, and thus, combined data from these studies may cause bias.

Second, the assessment of bird coloration has also improved, as portable spectrophotometers are now widely available to collect quantitative and objective data on reflectance in a standardized fashion (Montgomerie 2006). Objectivity is important because relying on human vision to evaluate bird colors may be misleading due to the differences between avian and human visual systems (Bennett et al. 1994). For example, while human eyes operate with three-color cones, birds use four single cone types and a double cone type that permit them to see in the ultraviolet (UV) and the near-UV part of the light spectrum (Bowmaker et al. 1997; Andersson et al. 1998; Hart et al. 2000). Moreover, in the avian cones, oil droplets filter the incoming wavelengths, which results in a narrowed spectral sensitivity of the cones and a sharpened color discrimination (Vorobyev 2003; Hart and Vorobyev 2005). Therefore, some aspects of avian coloration may be hidden from the human eye, and it is highly likely that birds perceive colors differently than human observers. As such, the use of subjective scores may fail to capture differences between species that are beyond our visual capacity (Eaton 2005). In particular, UV colors, to which humans are blind, may play an important function in sexual selection, as many feathers that are used in courtship displays reflect wavelengths in the UV spectrum (Hausmann et al. 2003; Håstad et al. 2005). This confounding phenomenon remained completely disregarded in the classical tests of the H&Z hypothesis.

Third, statistical approaches that account for shared ancestry of species and thus control for non-independence of data have also seen fundamental advancements in recent years (Felsenstein 2004; Garland et al. 2005; Freckleton 2009). Classical studies relied on independent contrasts, which fully and automatically remove the phylogenetic effects that confound the interspecific data. This data transformation seems inevitable, since the source data are structured by the phylogenetic relationship of species, which violates assumptions about statistical independence (Felsenstein 1985; Gittleman and Kot 1990; Harvey 2000). However, the contrast method inherently assumes that the actual data are necessarily biased and that phylogenetically corrected analyses provide results superior to those of tests that do not consider phylogenetic inertia. Removing effects caused by phylogeny can reduce the variance of the estimated regression or correlation coefficients, but without empirical support, it does not guarantee that the estimate that is corrected for phylogeny will be better than another estimate from the ordinary analyses that ignore phylogenetic relationships (Rohlf 2006). Phylogenetically corrected results should be preferred only if the data are truly structured by phylogenetic relationships, which is a question of observation within the data at hand (Abouheif 1999). To avoid phylogenetic overcorrection, recent phylogenetic approaches can adjust for phylogenetic effects to the degree to which the available data require (Freckleton et al. 2002).

Another currently emerging issue in the comparative literature is that not only the phylogenetic signal but also within-species variation and heterogeneity in sampling effort can bias the interspecific patterns (Ives et al. 2007; Felsenstein 2008; Garamszegi and Møller 2010). Intraspecific variation may appear due to true differences between populations or individuals or to measurement errors, which can pose two problems at the level of analysis. First, when data are structured by phylogenetic relationships, within-species variation can lead to biased estimation of biological effects, which can be particularly misleading if several species are involved in the comparative study (Ives et al. 2007). Second, low within-species repeatability can pose problems in terms of heterogeneity in data quality if within-species sample sizes (i.e., research effort) vary considerably among species (Garamszegi and Møller 2010). This can have consequences for the precision of parameter estimates, such as phylogenetic correlations or slopes. In fact, having large variation between species in terms of study effort violates the statistical assumption that each data point provides equally precise information about the deterministic part of the total process variation, i.e., the standard deviation of the error term is constant over all values of the predictor or explanatory variables (Sokal & Rohlf 1995). This problem is likely prevailing in the comparative study of parasite prevalence because estimates in different species correspond to vastly different within-species sample sizes (Jovani and Tella 2006). Although correction for phylogenetic effects has received considerable emphasis, error terms caused by imbalanced within-species sampling are neglected and have rarely been considered when testing the H&Z hypothesis using prevalence data (John 1995; Yezerinac and Weatherhead 1995).

The goal of this study was to re-examine the interspecific relationship between the prevalence of blood parasites and the expression of secondary sexual traits in birds by implementing the above-discussed methodological advancements. Accordingly, we considered differences between prevalence data obtained through microscopic and molecular tools and performed phylogenetic analyses separately for the two types of data. Furthermore, to avoid reliance on subjective human scores, we used quantitative estimates of both acoustic and visual signals of males. We adopted a perspective that better reflects avian signal perception for our measure of plumage coloration, as it was based on the spectrophotometric estimate of sexual dichromatism that also includes UV ranges. Finally, applying modern phylogenetic comparative approaches, we equally considered correction factors for phylogenetic inertia and heterogeneity in sampling effort. Prior to the analyses, we calculated repeatability of parasite prevalence in order to assess the role of within-species variation that can blur interspecific patterns. In light of the H&Z hypothesis, we predicted that the across species relationship between the prevalence of blood parasites and the degree of expression of male traits would be generally present even when the above potential confounders are taken into account. For these tests, we relied on a newly compiled interspecific dataset, which covered a large number of bird species from all over the world.

Materials and methods

Data set

Prevalence

Using keyword and cross-reference searches in the major scientific search engines, such as Web of Science, Google Scholar, and PubMed, information was gathered on infection by any of the four main avian hemo-protozoans (Haemoproteus, Plasmodium, Trypanosoma, and Leucocytozoon) as revealed by either molecular or microscopic analyses of blood samples. Studies that targeted fewer than three host species were disregarded because such studies would have caused a negligible increase in our sample size and might have introduced some potential bias as selecting few focal species based on a particular purpose in a study can lead to non-random sampling in our comparative context. We assumed that studies presenting information on a large sample of birds represent unbiased sources because host sampling in these cases was not particularly focused on areas, where hematozoa infection is established. This assumption is supported by the fact that most large-scale studies report a long list of species without infection. From each source, data were derived on prevalence for each host species and for each parasite genus (number of individuals found to be infected/number of individuals screened), sample size (number of individuals screened), and main screening method (PCR or microscopy). Bird species that were screened but found uninfected were included with zero prevalence. We used the raw dataset with multiple information per species to assess the importance of within-species variation in prevalence of blood parasites (see the calculation of repeatabilities below). For the interspecific tests, we calculated species-specific prevalence (for both microscopic and PCR methods separately) by combining sample sizes and occurrences of infections for the same host species across studies. Altogether, we found information on 3,542 bird species, which originated from the microscopic or PCR screenings of 179,452 and 8,809 individuals, respectively. 
From this list, information necessary for unraveling phylogenetic associations was available for 1,955 species, which could be used in comparative studies. Species with a known phylogenetic position have similar prevalence as species that cannot be placed in the phylogenetic tree (effect sizes for the difference in prevalence between the two groups of species: Haemoproteus, r = 0.002, 95% CI = −0.035/0.039; Plasmodium, r = 0.039, 95% CI = 0.002/0.076; Trypanosoma, r = 0.122, 95% CI = 0.085/0.159; Leucocytozoon, r = 0.098, 95% CI = 0.061/0.135). Therefore, focusing on species for which phylogenetic information is available does not cause considerable bias. Sources of the prevalence dataset can be found in Garamszegi (2010, 2011). Sample sizes for this study were further limited by the availability of song and color data on males (see sample sizes in Tables 1 and 2).
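
The pooling of prevalence across studies described above can be illustrated with a minimal sketch. It is not the code used for this study: the record layout, the function name `pool_prevalence`, and the example counts are hypothetical, and a single parasite genus is assumed for brevity.

```python
from collections import defaultdict


def pool_prevalence(records):
    """Pool counts per (species, method) across studies.

    records: iterable of (species, method, infected, screened) tuples.
    Returns {(species, method): (pooled prevalence, total screened)}.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [infected, screened]
    for species, method, infected, screened in records:
        totals[(species, method)][0] += infected
        totals[(species, method)][1] += screened
    # Pooled prevalence = total infected / total screened; species that were
    # screened but never found infected keep a prevalence of zero.
    return {key: (inf / n, n) for key, (inf, n) in totals.items() if n > 0}


# Hypothetical records for one parasite genus.
records = [
    ("Parus major", "microscopy", 12, 40),
    ("Parus major", "microscopy", 3, 60),
    ("Parus major", "PCR", 9, 20),
    ("Sylvia atricapilla", "microscopy", 0, 25),
]
pooled = pool_prevalence(records)
```

Keeping the total number of screened individuals next to each pooled prevalence preserves the sample size needed later for weighting by sampling effort.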

Table 1 Interspecific relationships between prevalence of four blood parasite genera and measures of song complexity and song output in birds
Table 2 Interspecific relationships between prevalence of four blood parasite genera and sexual dichromatism when relying on the avian vision model and when using both microscopically and molecularly derived parasite data

Song variables

We used four variables to represent song output and song complexity, following the definitions in Read and Weary (1992). Song output was measured as song duration (seconds) and intersong interval (seconds). We did not derive song continuity and song rate as defined by Read and Weary (1992) because they are directly calculated from song duration and intersong interval. Between-song complexity was estimated as song type repertoire size (number of different song types in an individual’s repertoire). We used syllable repertoire size (number of different syllable types within a song) to reflect within-song complexity. For species with extreme song complexity, general sources provide near-infinite data on repertoire size because it is difficult to estimate. In such cases, syllables are combined randomly and a particular combination occurs rarely, leading to the detection of numerous song types. Following common practice, for species with infinitely large song type repertoire size (>100), we assigned an arbitrary value of 1,000 (Read and Weary 1992).

We used song data from Read and Weary (1992), supplemented with information from other sources (listed in Møller et al. 2006; Garamszegi et al. 2007a). We could locate information on song for 189 species for which we also had phylogenetic and prevalence data based on microscopic assays. We did not match song data with PCR-based prevalence data because there were fewer than 20 overlapping species in these datasets, and such low sample sizes are inadvisable for phylogenetic comparative studies (Freckleton et al. 2002) and would provide effect sizes with very broad (r > 0.8) confidence intervals that are hard to interpret.

Plumage coloration

We extracted data from Armenta et al. (2008), which provided data for a large number of species on sexual dichromatism based on spectrophotometric color measurements that incorporated the full visual spectrum of birds (including UV). This source gave three independent (but strongly correlated) estimates, from which we focused on the receptor noise-limited model of color discriminability because it corresponds the best to the perspective of the biological receiver of the signal. This measure reflects the quantum catch of the photoreceptors in the eye following the model of Vorobyev et al. (1998). The model considers the spectral sensitivity of different cones as caused by the oil droplets, the reflectance of the sample and the background, the irradiance spectrum of ambient light, and photoreceptor noise. Armenta et al. (2008) calculated discriminability for different body regions by comparing the difference between the output of cone receptors when stimulated by the male color and the output under the stimulus of female color, which were subsequently summed over different body regions regardless of size.

We could derive color discriminability for 467 bird species in our parasite dataset, which also included PCR-based prevalence data for 104 species.

Color discriminability refers to the color difference between males and females (i.e., sexual dichromatism), while the predictions of H&Z particularly apply to male coloration. This is because species that experience strong selection pressure on female cryptic coloration may appear dichromatic independent of factors affecting male coloration (e.g., Hofmann et al. 2008a, b). Unfortunately, sex-specific data are not accessible from Armenta et al. (2008) even upon request. Therefore, we must rely on the general assumption that dichromatism reflects the intensity of selection acting on male coloration. To test this assumption, for a subset of birds (N = 206), one of us (APM) calculated male colorfulness scores along a six-point scale following the original descriptions of H&Z and using standard handbooks and field guides without seeing color discriminability values. We note that H&Z originally termed this variable “brightness,” but we avoid applying this terminology to human scores of colorfulness because in the recent coloration literature, the term is used to describe total reflectance. Relying on the illustrations of males, plumage coloration was scored simply with respect to all forms of coloration with 1 being very dull and 6 very striking. Scores of interspecific variation in plumage coloration visually assessed from handbooks and field guides have been repeatedly found to be repeatable and to show biologically relevant patterns in relation to other variables (Møller and Birkhead 1994; see Bennett and Owens 2002). We related these sex-specific estimates to color discriminability based on spectrometric data incorporating bird vision. Taking into account the potential errors around the subjective scoring of male coloration (see “Introduction”), we infer that the correlation between the two estimates was sufficiently high to fulfill our assumption (r = 0.657, 95% CI = 0.572/0.728).
However, the considerable amount of unexplained variance suggests that the results in association with color discriminability should be interpreted with caution.

The data used in the interspecific analyses are presented as an electronic supplementary material (Table S1).

Statistical approaches

Variables were transformed for the analyses (prevalence was arcsine-square-root-transformed; song variables were log10-transformed), but in some cases, the criterion for normal distribution was violated, e.g., due to zero-inflation or outliers. We have considered these confounding factors in the subsequent analyses (see below).
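
For illustration, the two transformations can be sketched as follows. This is a minimal example with hypothetical function names, assuming prevalence is expressed as a proportion between 0 and 1 and that song variables are strictly positive.

```python
import math


def arcsine_sqrt(prevalence):
    """Arcsine-square-root transform of a proportion in [0, 1], in radians."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be a proportion between 0 and 1")
    return math.asin(math.sqrt(prevalence))


def log10_transform(value):
    """Base-10 logarithm of a strictly positive song variable."""
    if value <= 0:
        raise ValueError("song variables must be positive before log10")
    return math.log10(value)


zero_prevalence = arcsine_sqrt(0.0)          # zero prevalence stays at zero
quarter_prevalence = arcsine_sqrt(0.25)      # asin(0.5), i.e. pi / 6
capped_repertoire = log10_transform(1000)    # arbitrary value for very large repertoires
```

Because the transform maps zero prevalence to exactly zero, a large share of uninfected species still produces a spike at zero, which is the zero-inflation mentioned above.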

Before testing our main predictions, we used the raw data that hold information on within-species variation to calculate the repeatability of prevalence traits to test if the weighted mean prevalence values calculated at the species level are biologically informative. Repeatabilities (R) were estimated for each parasite genus as \( R = \sigma_{\alpha }^2/\left( {\sigma_{\alpha }^2 + \sigma_{\varepsilon }^2} \right) \), where \( \sigma_{\alpha }^2 \) is the group (i.e., species) variance and \( \sigma_{\varepsilon }^2 \) is the residual variance. These variance components were obtained from a linear mixed effects model that relied on restricted maximum-likelihood for parameter estimation and used species identity as a random factor (Nakagawa and Schielzeth 2010). Given that detected prevalence may be different when using microscopic and PCR tools, we introduced the underlying methodology (PCR or microscopy) as an additional fixed factor to the model. We also introduced the continent of origin (Africa, Asia, Australia, Europe, North America, or South America) as another fixed factor. To deal with heterogeneity in sampling effort, we also applied statistical weights in the form of log10-number of individuals. To obtain 95% confidence intervals of the repeatability estimates, we used the bootstrap procedure recommended by Nakagawa and Schielzeth (2010).
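
The repeatability formula can be illustrated with a deliberately simplified sketch. Instead of the weighted, restricted maximum-likelihood mixed model with fixed effects described above, it assumes a balanced design and uses the classical one-way ANOVA estimators of the variance components; the function names and the example prevalences are hypothetical.

```python
def repeatability(var_species, var_residual):
    """R = sigma_alpha^2 / (sigma_alpha^2 + sigma_epsilon^2)."""
    total = var_species + var_residual
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_species / total


def anova_variance_components(groups):
    """Variance components from a balanced one-way ANOVA.

    groups: one list of prevalence estimates per species, all of equal
    length n >= 2. Returns (among-species variance, residual variance).
    """
    k, n = len(groups), len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need >= 2 species with equal group sizes >= 2")
    means = [sum(g) / n for g in groups]
    grand_mean = sum(means) / k
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, means) for x in g
    ) / (k * (n - 1))
    # The among-species component is truncated at zero if MS_between < MS_within.
    var_species = max((ms_between - ms_within) / n, 0.0)
    return var_species, ms_within


# Hypothetical data: three species, two prevalence estimates each.
groups = [[0.1, 0.3], [0.5, 0.7], [0.2, 0.4]]
var_species, var_residual = anova_variance_components(groups)
R = repeatability(var_species, var_residual)
```

In this toy example the among-species variance is 1/30 and the residual variance is 1/50, giving R = 0.625.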

We used the species-specific estimates to test the predictions of the H&Z hypothesis. Since the high proportion of species with zero prevalence caused zero-inflation, we sorted the prevalence data according to prevalence transitions, as follows. First, we introduced a binary state variable to describe if prevalence differs from zero or not. Second, within the subset of species with non-zero prevalence, we used the continuous estimate of prevalence. We tested the predictions with both the binary and continuous estimates in separate analyses reflecting different evolutionary transitions (i.e., obtaining infection vs. increasing existing infection rates). In addition, we also tested the predictions by using rank-transformed variables to deal with the potentially confounding effect of outliers or non-normal distributions. The analysis of the ranked data provided us with qualitatively similar results to those we report below based on parametric transformations. Effect sizes for the focal relationships in Tables 1 and 2 were highly positively correlated between data transformation methods (r = 0.967, 95% CI = 0.917/0.987). Therefore, for simplicity, we do not provide results that we obtained by non-parametric transformations.
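
The two-step treatment of zero-inflated prevalence can be sketched as below. This is an illustrative example only: the function name `split_by_transition` and the example values are hypothetical, and the phylogenetic modeling applied afterwards is not shown.

```python
def split_by_transition(prevalence_by_species):
    """Split species-level prevalence into the two analysis sets.

    Returns (presence, nonzero):
      presence: {species: 0 or 1}, the state variable for zero vs. non-zero
                prevalence, defined for all species;
      nonzero:  {species: prevalence}, restricted to infected species.
    """
    presence = {sp: int(p > 0) for sp, p in prevalence_by_species.items()}
    nonzero = {sp: p for sp, p in prevalence_by_species.items() if p > 0}
    return presence, nonzero


# Hypothetical species-level prevalences.
prevalence = {"species A": 0.0, "species B": 0.12, "species C": 0.0, "species D": 0.40}
presence, nonzero = split_by_transition(prevalence)
```

The first set addresses the transition from no infection to infection across all species, while the second addresses changes in infection rate among species that already harbor the parasite.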

We estimated the phylogenetically adjusted interspecific associations between the prevalence of different blood parasites as predictors and sexual traits in a pair-wise design. We calculated these associations by comparing the expression of male traits between species with zero and non-zero prevalence and also by correlating observed prevalences with estimates of the strength of sexual selection only in species with non-zero prevalence. In this study, we did not attempt to assess the role of confounding co-variates, as our ultimate goal was to assess whether the relationship as proposed by H&Z can be observed in the improved data and by applying enhanced methodology.

For the phylogenetic modeling, we used phylogenetic generalized least squares (PGLS) models, which incorporate a matrix of the expected covariances among species based on likelihood ratio statistics (Martins and Hansen 1997; Pagel 1999). This method enabled us to estimate the importance of phylogenetic corrections by calculating the phylogenetic scaling parameter lambda (λ), which varies between 0 (phylogenetic independence) and 1 (trait evolution corresponds to a Brownian motion model) (Freckleton et al. 2002). We conducted all analyses by setting the index of phylogenetic dependence (λ) to the value that was the most appropriate for the given model (i.e., provided the highest maximum likelihood).
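
How λ rescales the expected covariance matrix can be shown with a short sketch. It follows the standard definition of the λ transformation, in which the off-diagonal elements (shared evolutionary history) are multiplied by λ and the diagonal is left unchanged; the function name and the example matrix are hypothetical, and the maximum-likelihood search for λ is not shown.

```python
def lambda_transform(V, lam):
    """Scale the off-diagonal elements of covariance matrix V by lambda.

    lam = 0 removes all covariance among species (phylogenetic independence);
    lam = 1 returns V unchanged (Brownian motion on the given tree).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must lie between 0 and 1")
    n = len(V)
    return [
        [V[i][j] if i == j else lam * V[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical covariance matrix for three species: the first two share
# two thirds of their path length from the root, the third none.
V = [
    [3.0, 2.0, 0.0],
    [2.0, 3.0, 0.0],
    [0.0, 0.0, 3.0],
]
V_independent = lambda_transform(V, 0.0)
V_half = lambda_transform(V, 0.5)
V_brownian = lambda_transform(V, 1.0)
```

Intermediate values of λ therefore correct for phylogeny only to the degree supported by the data, rather than imposing a full correction.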

We observed huge variation among studies and species in terms of the number of individuals sampled (range, 1–5,607), and considerable variation within species (see “Results”) necessitating statistical approaches that can balance heterogeneity in sampling effort. Therefore, we also investigated more complex phylogenetic models that incorporated a weighting factor for the number of individuals studied, thus considering heterogeneities in sampling effort associated with the species-specific estimates of prevalence. In this PGLS matrix framework, we combined variance factors due to phylogenetic and weight effects as error terms in the form of the following equation Q = V + W, where V is the phylogeny matrix, W is the diagonal matrix of 1/weights scaled to the range of values in V (Martins and Hansen 1997). We used log10-transformed sample sizes as weights, which particularly de-emphasizes observations with small sample sizes. In this modeling, we used the maximum likelihood estimate of λ from the phylogenetic model of the correlated evolution, which did not yet include weight effects.
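
The construction of Q = V + W can be sketched as follows, under two labeled assumptions that go beyond what is stated above. First, weights are computed as log10(n + 1) so that a species represented by a single individual does not receive a weight of zero. Second, "scaled to the range of values in V" is interpreted as rescaling 1/weights so that their largest value equals the largest element of V. The function name and example values are hypothetical.

```python
import math


def combined_error_matrix(V, sample_sizes):
    """Return Q = V + W, with W a diagonal matrix of rescaled 1/weights."""
    n = len(V)
    if len(sample_sizes) != n or any(s < 1 for s in sample_sizes):
        raise ValueError("need one sample size >= 1 per species")
    # Assumption: log10(n + 1) keeps the weight positive when n = 1.
    weights = [math.log10(s + 1) for s in sample_sizes]
    inverse = [1.0 / w for w in weights]
    # Assumption: rescale so that max(1/weights) equals max(V).
    scale = max(max(row) for row in V) / max(inverse)
    w_diag = [scale * x for x in inverse]
    return [
        [V[i][j] + (w_diag[i] if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical two-species example with 9 and 99 screened individuals.
V = [[1.0, 0.5], [0.5, 1.0]]
Q = combined_error_matrix(V, [9, 99])
```

Here the poorly sampled species receives the larger added error variance (Q[0][0] = 2.0 versus Q[1][1] = 1.5), so it contributes less to the fitted relationship, while the off-diagonal phylogenetic covariances are unchanged.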

The PGLS analyses were performed in the R (version 1.35) statistical environment (R Development Core Team 2007), with additional phylogenetic functions included in the package caic. The phylogeny used in these calculations originated from the consensus tree of the bird supertree project (Davis 2008), which assembles numerous smaller phylogenies into a large-scale avian phylogeny. Since source phylogenies rely on different branch length resolutions, the supertree lacks information on branch lengths. Therefore, we set them to be equal, which is equivalent to an assumption of a punctuated model of evolution.

Although we could obtain a large sample size that results in high statistical power, we followed recent statistical recommendations that shift the focus from significance levels to effect sizes, as null-hypothesis testing puts too much emphasis on significance levels (Nakagawa and Cuthill 2007). This would be particularly dangerous at large sample sizes because small effects would easily appear statistically significant even if they are biologically irrelevant (Freckleton 2009). Accordingly, we present effect sizes (such as Pearson’s product–moment correlation coefficients as an estimate of effect size “r” sensu Cohen 1988) and the associated 95% confidence intervals (95% CI) that were calculated from the most appropriate phylogenetic model. Hence, for our interpretations, we focus on the strength of biological effects (small ≥ 0.1, medium ≥ 0.3, and large ≥ 0.5 according to Cohen 1988, which roughly correspond to ≥1%, ≥10%, and ≥25% explanatory power, respectively) and the precision with which they can be measured from the available data. We provide P values for the interested readers but avoid emphasizing the statistical significance of the results. We compare mean effect sizes between prevalence-transition groups (zero to non-zero prevalence vs. increasing non-zero prevalence) using weighted effect sizes (Garamszegi 2006).
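
The link between r and explanatory power, and the width of a confidence interval around r, can be illustrated with a short sketch. The interval below uses the Fisher z transformation, which is an assumption made here for illustration; the intervals reported in the tables come from the phylogenetic models. As a check, the Fisher interval for the non-phylogenetic correlation between colorfulness scores and color discriminability (r = 0.657, N = 206) closely matches the reported 0.572/0.728.

```python
import math


def explained_variance(r):
    """Proportion of variance explained by a correlation of size r."""
    return r * r


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation via the Fisher z transformation."""
    if n <= 3 or abs(r) >= 1:
        raise ValueError("need n > 3 and |r| < 1")
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Cohen's benchmarks: r = 0.1, 0.3, 0.5 explain 1%, 9%, and 25% of variance.
benchmarks = [explained_variance(r) for r in (0.1, 0.3, 0.5)]

# Correlation between colorfulness scores and color discriminability.
lower, upper = fisher_ci(0.657, 206)
```

The same function shows why small samples are uninformative: with fewer than 20 species, the interval around a weak correlation spans most of the range between small negative and large positive effects.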

Results

Within-species repeatability

Relying on a full sample of birds having data on prevalence, the mixed effect modeling revealed consistent within-species variation but only moderate repeatabilities (Haemoproteus, R = 0.484, 95% CI = 0.448/0.517; Plasmodium, R = 0.164, 95% CI = 0.132/0.196; Trypanosoma, R = 0.206, 95% CI = 0.164/0.249; Leucocytozoon, R = 0.397, 95% CI = 0.357/0.435). Therefore, there is considerable within-species variation even when methodology and sampling location (continent) were held constant as a fixed effect. This indicates that species-specific prevalence can only be estimated with certain errors in a statistical sense, which will likely reduce the ability of the interspecific tests to detect strong effects.

Interspecific patterns of prevalence

The phylogenetic correlations that are adjusted for sampling effort are given in Tables 1 and 2. In general, there was no indication of a strong relationship between sexual traits and prevalence of the four blood parasite genera in birds. Most of the associations between song and parasite pressure due to blood parasites based on microscopic data showed either weak or no effects (Table 1), but some of these relationships reached effect size intervals that correspond to ∼10% explanatory power (Figs. 1 and 2).

Fig. 1

The relationship between prevalence of blood parasites and song phenotypes in birds when considering transitions from zero prevalence to non-zero prevalence (all species are included). Only those relationships are shown for which effect size and confidence interval estimates reach a range that corresponds to an intermediate effect (see Table 1): a Leucocytozoon and syllable repertoire size and b Plasmodium and song type repertoire size. Horizontal lines within boxes show medians for the respective groups, the ends of the boxes are the 25th and 75th quantiles, whiskers give the data range that falls within the upper/lower quartile ±1.5 × (interquartile range), and numbers indicate sample sizes

Fig. 2

The relationship between prevalence of blood parasites and song phenotypes in birds when considering transitions from non-zero prevalence (i.e., species with zero prevalence are excluded). Only those relationships are shown, for which effect size and confidence interval estimates reach a range that corresponds to an intermediate effect (see Table 1): a Trypanosoma and song duration (N = 93), b Leucocytozoon and song duration (N = 97), c Plasmodium and syllable repertoire size (N = 82), and d Haemoproteus and song type repertoire size (N = 115). Data points are species-specific estimates. Regression line is from the phylogenetically corrected regression model that also takes statistical weights into account

We found evidence for only a weak, if any, relationship between plumage dichromatism based on avian vision and prevalence (Table 2). Given the large sample size resulting in the relatively narrow confidence intervals around the effect sizes, we consider that our estimates of the size of the effects are of high accuracy. Our conclusions were similar for both microscopy- and PCR-based parasite data, as confidence intervals around effect size estimates were largely overlapping. However, we avoid making strong conclusions from the models based on PCR data, as sample sizes were generally smaller than in models using microscopy data, which led to highly imprecise estimates (broad 95% CI).

We compared effect sizes corresponding to microscopy data between prevalence-transition categories. Detected effect sizes for the focal relationships were more likely to vary in the positive direction when we compared the expression of sexually selected characters between species with zero and non-zero prevalence than when we followed the change in prevalence in association with sexual selection by considering only species in which blood parasites are prevalent (Tables 1 and 2, Figs. 1 and 2). The mean effect size for the former comparison of species with parasite presence/absence was 0.051 (95% CI = 0.007/0.094, when the signs of the effect sizes for inter-song interval have been converted), while the same statistic for the latter correlational approach was −0.036 (95% CI = −0.098/0.026, when the signs of the effect sizes for inter-song interval have been converted). This corresponds to a considerable difference between the two categories (r = 0.349, 95% CI = 0.096/0.554). This difference indicates that sexually selected traits tend to be favored when infection by a blood parasite emerges in a species, while the expression of male traits tends to be reduced in parallel with increasing rate of infection in species in which the pathogen has already been established.

Discussion

Here, we re-examined the relationship between male sexual traits and prevalence of blood parasites in birds following the original predictions of H&Z using a large sample of birds and implementing recent methodological advancements. We first demonstrated that the repeatability of the species-specific estimate of prevalence of blood parasites is moderate, which suggests considerable within-species variation, e.g., due to differences between populations or methodological differences. The biological causes of such within-species variation remain to be investigated. We then derived effect sizes for the focal association at the across species level by relying on both microscopic and PCR-based data on prevalence as well as on photospectrometric measurements of trait expression in males and by adjusting for biases due to phylogenetic effects and within-species sample sizes as observed in the actual data. We examined the predictions of the H&Z hypothesis in two separate sets of comparisons based on prevalence transition categories. After adjustment for the different confounding effects, we failed to find generally strong support for the predicted relationship, as we detected only effect sizes of weak to intermediate magnitude. However, certain relationships reached a level that is evolutionarily meaningful. In fact, we found interspecific evidence that an increase from zero to non-zero prevalence is accompanied by an increase in trait expression in males, but further increase from non-zero prevalence to a higher level of infection is associated with a reduced degree of trait elaboration.

There are several explanations why interspecific correlations of parasite prevalence reveal little about the biological role of parasitism for mediating the elaboration of male sexual traits in birds. These explanations include the inappropriateness of prevalence data to reflect general parasite pressure, and the uncertain role of blood parasites in host–parasite co-evolutionary dynamics (Read 1988; Endler and Lyles 1989; Møller et al. 1999). Specifically, the prevalence of the four major hematozoan blood parasite genera detected in the sample of captured individuals might not mirror the true effect of parasites on hosts because such data correspond to chronic infection stages in surviving individuals while the mortality and morbidity costs remain undetermined (Atkinson et al. 2000; Valkiūnas 2005; Møller and Nielsen 2007). Furthermore, some studies emphasize the possibility that an unobserved third variable has a strong influence on both male signals and parasite prevalence, which makes it difficult to uncover the biological importance of parasites for sexual selection in a pair-wise comparison (Read 1987; Johnson 1991; John 1995; Yezerinac and Weatherhead 1995). We find these explanations equally applicable to the generally weak effects that we uncovered in this study. In addition, we suggest that the modest within-species repeatability of prevalence may also affect our ability to find strong interspecific relationships. Such modest repeatability is caused by considerable variation within species that might occur due to differences in disease exposure across space and time (Møller et al. 2011). Prevalence is also a population-specific parameter as several factors (habitat, host immunity, vector abundance, etc.) can affect population-specific infection rates. Accordingly, it is commonly found in the literature that prevalence can vary among host populations and over time within a population (Mendes et al. 2005; Ricklefs et al. 2005; Fallon and Ricklefs 2008; Evans et al. 2009; Garamszegi 2011).
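The within-species repeatability discussed above can be illustrated with a minimal sketch. It assumes repeatability is estimated as the intraclass correlation from a one-way ANOVA with species as the grouping factor, a common approach for unbalanced data. Whether this matches the estimator used in this study is an assumption, and the function name and prevalence values are hypothetical.

```python
def repeatability(groups):
    """Intraclass correlation from a one-way ANOVA on unbalanced groups.

    Assumption: each inner list holds population-level prevalence estimates
    for one species; raw proportions are analyzed without transformation.
    """
    a = len(groups)                                  # number of species
    sizes = [len(g) for g in groups]
    total_n = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / total_n
    means = [sum(g) / len(g) for g in groups]

    ss_among = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_among = ss_among / (a - 1)
    ms_within = ss_within / (total_n - a)

    # Effective group size for unequal numbers of estimates per species.
    n0 = (total_n - sum(n * n for n in sizes) / total_n) / (a - 1)
    var_among = (ms_among - ms_within) / n0          # among-species variance
    return var_among / (var_among + ms_within)


# Hypothetical prevalence estimates (proportions) for three species.
example = [[0.10, 0.20, 0.15], [0.40, 0.55], [0.05, 0.10, 0.00]]
print(f"repeatability = {repeatability(example):.2f}")
```

Values near 1 indicate that most variation in prevalence lies among species, whereas lower values indicate substantial variation among populations of the same species.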

The narrow confidence intervals around the repeatability estimates indicate a role for consistent interspecific variation, which suggests that interspecific variation in the prevalence of blood parasites may provide biologically meaningful information and that species-specific estimates may arise from selection. A body of comparative evidence shows that prevalence of blood parasites in birds can be related to host ecology and life history (Garvin and Remsen 1997; Tella et al. 1999; Scheuerlein and Ricklefs 2004; Arriero and Møller 2008), risk taking (Møller 2008; Møller et al. 2008), cognition (Garamszegi et al. 2007b), sociality (Tella 2002), and genetic polymorphism (Poulin et al. 2000).

Moreover, the chosen plumage and song characters may be poor indicators of pathogen resistance as well as the intensity of sexual selection at the interspecific level. For example, different acoustic or visual traits may be used in different species to provide information about male quality, and this could make it difficult to draw conclusions from the interspecific comparison of a trait that has different relevance for sexual selection in different species. As a matter of fact, recent reviews emphasized that the role of song repertoire size in mediating female choice varies among species (Byers and Kroodsma 2009; Soma and Garamszegi 2011). While some studies showed that males with large repertoire size appear better at combating pathogens (e.g., Buchanan et al. 1999), this does not necessarily hold for other species (e.g., Weatherhead et al. 1993; Saino et al. 1997; Garamszegi et al. 2004). In fact, the general relationship between parasitism and the color discriminability index calculated across all body regions might be even weaker, as this assessment of coloration does not account for the fact that females of different species rely on different visual cues (e.g., hue, chroma, brightness) that are displayed in specific body regions of males. Furthermore, we have also emphasized (see “Materials and methods”) that results in association with color discriminability should be interpreted with caution because of the confounding effect of female cryptic coloration. Despite the potentially weak relevance of the sexual traits used, a fair number of interspecific studies demonstrated that the investigated acoustic characters and sexual dichromatism might be relevant for the evolution of sexual signaling (e.g., Read and Weary 1992; Møller et al. 2000; Bennett and Owens 2002).

Our study also indicates that different selective mechanisms might be operating in different evolutionary stages of infections, which might have opposing effects. For statistical reasons (to avoid zero inflation), we separated data into two prevalence transition models, but these models may not only correspond to different statistical situations but also involve different evolutionary scenarios that have not previously been considered. We detected that species without hematozoan infection are more likely to have less elaborate sexual signals than species that are already exposed to parasite pressure. If there is a reliable relationship between current parasite prevalence and the co-evolutionary history between hosts and parasites (Hoberg et al. 1997), the detected pattern may reflect an early stage of evolutionary contact and should thus translate into a positive relationship between the focal traits, in general agreement with the prediction of the H&Z hypothesis. In contrast, when we considered species in which infection by blood parasites is already established, we were more likely to find a negative interspecific relationship between intensity of sexual selection and prevalence. Therefore, in an advanced stage of co-evolution, when hosts are already adapted to parasitic infections to a certain degree, mechanisms other than the H&Z mechanism may be applicable. For example, it remains plausible that host species experiencing a strong selection pressure from parasites cannot possess and maintain complex ornaments because of the immunosuppressive side effects that reduce the efficiency of immune defense (Sheldon and Verhulst 1996). Alternatively, low prevalence in species that have males with elaborate sexual characters can be caused by parasite-mediated sexual selection if individuals that die from the infection remain undetected. Therefore, low prevalence observed in a species based on the sampling of surviving individuals may actually mean strong parasite pressure leading to high mortality. Accordingly, the apparent negative correlation between male traits and detected prevalence in surviving individuals may conceal a positive evolutionary relationship between parasitism and sexual selection.

Mean effect size estimates in this study accounted for <1% to a maximum of 8% of the variance. While such effects may seem tiny, they correspond to small to intermediate effect sizes (sensu Cohen 1988). Across all meta-analyses in the biological sciences mean effect sizes account for about 7% of the variance (Møller and Jennions 2002), putting the present study into perspective. We caution against dismissing weak effects as being unimportant because a selection differential of just 1% can readily “change a mouse into an elephant” during a relatively small number of generations (Møller and Jennions 2002). Therefore, the relationships we uncovered here (especially for some song traits) may have important evolutionary implications. The potential importance of moderate effect sizes is also reflected by the fact that we found biologically meaningful patterns when we compared the mean effect sizes across prevalence transition stages.
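The link between variance explained and correlation-type effect sizes can be made concrete with a short sketch. It assumes the conventional benchmarks for r attributed to Cohen (1988), namely 0.10 (small), 0.30 (medium), and 0.50 (large). The function names are hypothetical and the labels are rough conventions rather than strict categories.

```python
import math

# Conventional benchmarks for r (Cohen 1988), checked from largest to smallest.
COHEN_BENCHMARKS = (("large", 0.50), ("medium", 0.30), ("small", 0.10))


def variance_explained(r):
    """Proportion of variance accounted for by a correlation coefficient."""
    return r * r


def r_from_variance(proportion):
    """Absolute correlation corresponding to a proportion of variance."""
    return math.sqrt(proportion)


def cohen_label(r):
    """Rough verbal label for the magnitude of r."""
    magnitude = abs(r)
    for label, threshold in COHEN_BENCHMARKS:
        if magnitude >= threshold:
            return label
    return "negligible"


# 8% of the variance corresponds to r of about 0.28, just under the
# conventional benchmark for a medium effect.
r_upper = r_from_variance(0.08)
print(f"r = {r_upper:.3f} ({cohen_label(r_upper)})")
```

Because variance explained is the square of r, effects that look negligible as percentages can still correspond to correlations near the small or medium benchmarks.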

Apparently, the consideration of some methodological advances in the fields of parasitology, behavioral ecology, and comparative biology when testing the H&Z hypothesis has a philosophical rather than a practical value. Although the sensitivity of PCR- and microscopy-based detection techniques might differ (Fallon and Ricklefs 2008; Valkiūnas et al. 2008; Garamszegi 2010), the conclusions for the focal relationship were similar when prevalence data had been tabulated separately for the two screening methods (Table 2). Moreover, spectrophotometric color data may seem superior to human plumage coloration scores, as the former rely on an objective avian visual model, which can be utilized in a standard way in different species (Eaton 2005; Montgomerie 2006; Armenta et al. 2008). Despite this advantage, we failed to find strong support for the H&Z hypothesis when using photospectrometric data. Other studies arrived at a similar conclusion when a strong correlation between sexual dichromatism based on human scores and photospectrometric estimates was observed (Armenta et al. 2008; Seddon et al. 2010). This indicates that the color differences that humans perceive among species of birds contain biologically relevant information. Finally, it is assumed that an appropriate control for phylogenetic relatedness of species and the adjustment for uneven sampling effort are important (Freckleton 2009; Garamszegi and Møller 2010). We found models that accurately adjust for phylogenetic errors and also consider sampling bias when revealing a relationship between focal traits. The only factor that we identified to affect our understanding about the roles of parasites in mediating sexual selection in birds is the consideration of prevalence transition scenarios. When we separated species based on zero/non-zero prevalence, we found different patterns of association with the degree of evolutionary contact between hosts and parasites.

We conclude that the combination of currently available data with recent methodological developments highlights the importance of within-species variation as a confounding factor for the relationship between blood parasitism and male sexual signals in birds. Furthermore, the degree of evolutionary contact between hosts and parasites should also be considered in interspecific tests of prevalence. We were generally able to uncover weak effect sizes in support of the H&Z hypothesis, which can vary from positive to negative directions. We note that although effects with 1–8% explanatory power may appear small to medium, they could readily be biologically important on an evolutionary time scale. We also note that experimental studies of the relationship between expression of male displays and parasitism show much stronger effects than correlational data, apparently because many confounding factors reduce the size of the biological signal (Møller et al. 1999). Likewise, studies based on expression of male displays and immunity show much stronger effects than studies of parasites that often have very weak or no discernible fitness consequences for their hosts (Møller et al. 1999). Finally, future studies may benefit from the consideration of different transition scenarios by taking the degree of infection into account.