Introduction

Vector-borne parasites account for approximately 30% of recent emerging infectious diseases, and growing evidence indicates that climate change and land cover have profound impacts on arthropod distributions and phenologies (Patz et al. 2000; Daszak et al. 2001; Altizer et al. 2013). The distribution and timing of vector-borne parasite transmission are therefore likely to be directly influenced by changes in environmental variables such as temperature, humidity, and rainfall as well as by availability of breeding habitat, canopy cover, and other landscape features. In spite of the benefit to human and animal health, predicting transmission hotspots is complicated by the fact that many vector-borne parasites are transmitted by a suite of related arthropods. For practical purposes, studies of disease-causing organisms transmitted by such groups are often restricted to a few dominant vector species (Lord 2010; Higa 2011) or they consider the arthropod genus as the smallest taxonomic unit (Allan et al. 2009). Improving our knowledge of environmental drivers of arthropod community structure may provide more precise links between underlying geographical variation and risk of infectious disease (Ostfeld et al. 2005; Blaustein et al. 2013; Wells and Flynn 2022).

There are several distinct reasons why changes in vector communities will impact parasite transmission. Modeling studies have shown that a parasite’s basic reproductive number, the number of secondary cases caused by an infectious individual in a susceptible population (Keeling and Rohani 2008), scales with the vector to host abundance ratio (Smith et al. 2012). Consequently, environmental conditions that promote the abundance of one or more vector species can increase the parasite’s transmission potential. However, interspecific interactions, which include competition between vector species (Marini et al. 2017) as well as predation of vectors (Quiroz-Martínez and Rodríguez-Castro 2007), are also impacted by environmental conditions which change over time and space to influence the inclusion or exclusion of certain species (Claflin et al. 2017). These biotic interactions may act in concert with environmental drivers of abundance or they may counteract them (Ferraguti et al. 2018; Altamiranda-Saavedra et al. 2020).

Beyond vector abundance, abiotic and biotic factors will also impact the composition of a vector community (Johnson et al. 2020). Vector species are known to vary in their ability to transmit parasites. Vectorial capacity (Kramer and Ciota 2015) is an integrated measure of a vector species’ ability to acquire and transmit a parasite and typically includes the blood feeding rate, the transmissibility of the parasite to and from the vector, the extrinsic incubation period (the time it takes for a vector to become infectious after acquiring the parasite), and the life span of the vector. A community composed of different vector species will have a characteristic mean and variance for vectorial capacity that determines the ability of the parasite to be effectively transmitted between host species. Importantly, it is not just the quantity of vectors in a community that needs to be considered, but also their quality, measured, for example, by mean vectorial capacity.
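To make the idea of a community-level mean and variance concrete, the following is a minimal sketch that assumes the classical Garrett-Jones form of vectorial capacity, expressed per vector as a²·b·pⁿ/(−ln p). The formula choice, the function names, and all parameter values are illustrative assumptions; they are not taken from this study or estimated for any Culicoides species.

```python
import math

def per_vector_capacity(a, p, n, b=1.0):
    """Per-vector form of the classical vectorial capacity formula.

    a: daily biting rate on the host of interest
    p: daily survival probability of the vector (0 < p < 1)
    n: extrinsic incubation period in days
    b: vector competence (assumed multiplier between 0 and 1)
    """
    return a ** 2 * b * p ** n / (-math.log(p))

def community_mean_variance(capacities, abundances):
    """Abundance-weighted mean and variance of capacity across species."""
    total = sum(abundances)
    mean = sum(c * w for c, w in zip(capacities, abundances)) / total
    var = sum(w * (c - mean) ** 2 for c, w in zip(capacities, abundances)) / total
    return mean, var

# Hypothetical species parameters, for illustration only.
species = [
    {"a": 0.30, "p": 0.85, "n": 10, "b": 0.5, "abundance": 200},
    {"a": 0.25, "p": 0.90, "n": 10, "b": 0.2, "abundance": 50},
    {"a": 0.40, "p": 0.80, "n": 12, "b": 0.1, "abundance": 500},
]
capacities = [per_vector_capacity(s["a"], s["p"], s["n"], s["b"]) for s in species]
mean_c, var_c = community_mean_variance(capacities, [s["abundance"] for s in species])
print(f"mean = {mean_c:.4f}, variance = {var_c:.6f}")
```

Two communities with the same total abundance can differ in this weighted mean, which is the sense in which vector quality matters alongside vector quantity.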

While we often think of vector communities varying across space, there is also variation through time. Species vary in their phenology via the dates of their emergence, peak abundance, and overwintering (Quaglia et al. 2020). As a result, distinct vector communities may generate high risk of transmission at different times of year (McMillan et al. 2019). Further, some communities may have longer (even continuous) transmission seasons compared to others (Park et al. 2016; Quaglia et al. 2020). Finally, in the context of generalist parasites, because vector species may vary in their blood meal preference among vertebrate host species (Takken and Verhulst 2013), vector communities will vary in their propensity to transmit parasites between different host species (Kilpatrick et al. 2005).

Metacommunity ecology can be leveraged to make the link between exogenous abiotic and biotic drivers and infectious disease risk (Mihaljevic 2012; Johnson et al. 2013; Mihaljevic et al. 2018). It enables the characterization of species occurrence patterns by integrating spatial and environmental information (Cottenie 2005). These characteristic patterns provide valuable ecological insight into community dynamics, highlighting the relationships between species, environmental heterogeneity, and dispersal mechanisms (Leibold et al. 2004). Applying metacommunity ecology to infectious disease problems provides a method of establishing potential links between environmental variables, host and vector communities, and the spatial distribution of disease. Specifically, it allows for connections between environment and disease to be evaluated, yielding an explanatory and predictive framework for disease patterns that may be interpreted via the environmentally structured community of vector species. For vector-borne diseases, where vectorial capacity of candidate species is often uncertain, methods that can link individual species or groups of species to environmental conditions and to disease patterns provide important evidence that may help focus future studies on vectorial capacity of particular species. Ultimately, by identifying key factors that determine vector communities, we become better positioned to estimate the impacts of anthropogenic changes, including land use change and climate change, on future vector communities and their likelihood of transmitting parasites of concern.

The application of metacommunity theory to vector-borne diseases requires spatially resolved disease occurrence data, and site-by-species presence/absence data of associated arthropods across a landscape with heterogeneous environmental conditions. To demonstrate that changing environmental conditions shape vector communities and associated disease risk, we examined the structure of a Culicoides vector metacommunity in relation to both environmental conditions and the occurrence of hemorrhagic disease (HD) in populations of white-tailed deer hosts (WTD, Odocoileus virginianus). This vector-borne disease is caused by several serotypes of epizootic HD virus and bluetongue virus, which are closely related double-stranded RNA orbiviruses (Attoui et al. 2009). HD is considered one of the most important diseases of WTD (Howerth et al. 2008) causing morbidity and mortality, and is transmitted only by biting midge species in the speciose genus Culicoides. To evaluate the temporal and spatial extent of HD, the Southeastern Cooperative Wildlife Disease Study (SCWDS) has compiled nationwide county level reports of morbidity and mortality in WTD since 1980. For the purpose of this study, data were restricted to 2007–2012 to coincide with available data on the biting midges.

The complex nature of HD transmission via the Culicoides community, coupled with the availability of long-term sampling of WTD populations, makes this an ideal system to explore infectious disease through the lens of metacommunity ecology. Our study provides evidence of a close, predictable relationship between environmental covariates, vector community composition, and disease occurrence in host populations. It extends our knowledge of disease-diversity relationships (Johnson and Thieltges 2010), both by including vector diversity and by analyzing communities at a finer scale than simply species richness. Lastly, it demonstrates the utility of metacommunity ecology in identifying transmission hotspots and the underlying environmental factors that modulate disease risk through their shaping of vector community structure, including anticipating vector community dynamics in times of global change. Such surveillance of vector communities is feasible, and given that vector communities often transmit parasites causing several diseases, there is considerable added value in increasing data and its availability as part of our efforts to anticipate and mitigate the emergence and transmission of vector-borne parasites.

Methods

Culicoides surveillance

Within the southeastern USA, approximately 50 species of Culicoides have been reported via light trap collections (from approximately 5500 trap nights; number of traps times number of nights) at over 200 discrete sites (Vigil et al. 2014). While some of these species have been implicated in the transmission of HD (Jones et al. 1977; Ruder et al. 2012), competency for transmission has not been studied systematically. Between 2007 and 2012, SCWDS conducted surveillance for novel Culicoides spp. in the Atlantic and Gulf coastal regions of the southeast (Louisiana, Mississippi, Alabama, Georgia, and Florida). These light-trap surveys were typically carried out during peak HD activity (late July through September) and involved eight to twelve light traps per site visit. They provide a presence/absence dataset of 50 Culicoides species; further details are given by Vigil et al. (2014).

Hemorrhagic disease reports

SCWDS has compiled nationwide county-level reports of morbidity and mortality in WTD since 1980. For the purposes of this study, data were restricted to 2007–2012. Counties that reported either morbidity or mortality in a given year were scored as HD positive. Consequently, for the 6-year study period, each county had a value between 0 and 6, i.e., the number of years that the county reported at least one case of morbidity or mortality attributable to HD. The georeferenced values were smoothed spatially using two-dimensional kernel density estimation via the kde2d function in the MASS package (Venables and Ripley) in the R programming environment, version 4.1.2 (R Core Team 2021). This method generates a non-parametric estimate of the density of disease reports (referred to hereafter as “disease score”) across the landscape. The spatial smoothing changes the scale from 0 to 6 (i.e., a county’s value over the 6-year period) to scores ranging from 0 to approximately 0.006. Values from this distribution were then extracted at the geographic coordinates of the Culicoides sampling sites to provide a measure of the regularity with which a location was likely to have experienced HD outbreaks.
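The smoothing and extraction steps can be illustrated with a short Python sketch. It is not the kde2d implementation used in the study: it assumes a product Gaussian kernel with fixed, hypothetical bandwidths, assumes that each county is weighted by its number of HD-positive years, and evaluates the density directly at the site coordinates rather than on a grid. All coordinates and counts below are invented for illustration.

```python
import math

def disease_scores(points, weights, sites, hx, hy):
    """Weighted 2D kernel density estimate evaluated at each site.

    points:  (x, y) coordinates of county reports
    weights: non-negative weight per county (here, years with HD reports)
    sites:   (x, y) coordinates at which the density is extracted
    hx, hy:  kernel standard deviations along x and y (assumed values)
    """
    norm = 2.0 * math.pi * hx * hy * sum(weights)
    scores = []
    for sx, sy in sites:
        total = 0.0
        for (px, py), w in zip(points, weights):
            zx = (sx - px) / hx
            zy = (sy - py) / hy
            total += w * math.exp(-0.5 * (zx * zx + zy * zy))
        scores.append(total / norm)
    return scores

# Hypothetical county centroids (longitude, latitude) and HD-positive years (0 to 6).
counties = [(-89.5, 32.5), (-83.5, 32.8), (-81.5, 28.0)]
positive_years = [6, 4, 0]
# Hypothetical trap-site coordinates.
sites = [(-89.0, 32.4), (-81.6, 28.2)]
scores = disease_scores(counties, positive_years, sites, hx=1.0, hy=1.0)
print(scores)
```

Because the estimate integrates to one over the landscape, the resulting scores are small numbers, which is why the original 0 to 6 scale is not preserved after smoothing.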

Metacommunity analysis

We characterized Culicoides community composition in two ways. First, we ordinated the site-by-species presence/absence matrix—describing the distribution of vector species among the sampled sites—using the method of reciprocal averaging (Presley 2020). This method maximizes between-site and between-species similarity, effectively swapping rows and columns so species occupying similar sites are adjacent and sites with similar species composition are adjacent in the matrix. The method also returns real-valued scores for sites and species, with more similar scores reflecting more similar occupancy patterns (species score) or composition of species (site score).
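Reciprocal averaging can be sketched in a few lines of Python as an iteration that alternately averages species scores within sites and site scores within species. This is a minimal illustration of the first ordination axis only, assuming a matrix with no empty rows or columns and at least some variation among sites; the function name and the toy matrix are hypothetical, and the sketch is not the routine used in the analysis.

```python
import math

def reciprocal_averaging(matrix, n_iter=200, tol=1e-10):
    """First-axis site and species scores for a presence/absence matrix.

    matrix: list of rows (sites), each a list of 0/1 values (species).
    Assumes every site has at least one species and every species occurs
    at least once.
    """
    n_sites, n_species = len(matrix), len(matrix[0])
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(matrix[i][j] for i in range(n_sites)) for j in range(n_species)]
    grand = sum(row_tot)

    def sites_from_species(u):
        # Site score = mean score of the species present at that site.
        return [sum(matrix[i][j] * u[j] for j in range(n_species)) / row_tot[i]
                for i in range(n_sites)]

    species = [float(j) for j in range(n_species)]  # arbitrary starting scores
    for _ in range(n_iter):
        sites = sites_from_species(species)
        # Species score = mean score of the sites where the species occurs.
        new = [sum(matrix[i][j] * sites[i] for i in range(n_sites)) / col_tot[j]
               for j in range(n_species)]
        # Centre and rescale (weighted by occurrences) to drop the trivial solution.
        mean = sum(c * u for c, u in zip(col_tot, new)) / grand
        new = [u - mean for u in new]
        scale = math.sqrt(sum(c * u * u for c, u in zip(col_tot, new)) / grand)
        new = [u / scale for u in new]
        change = max(abs(a - b) for a, b in zip(new, species))
        species = new
        if change < tol:
            break
    return sites_from_species(species), species

# Toy matrix in which species replace each other along a gradient.
toy = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
site_scores, species_scores = reciprocal_averaging(toy)
print(site_scores)
```

Sorting rows by site score and columns by species score yields the ordinated matrix on which range-based metrics are computed.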

We calculated standard metacommunity metrics: coherence, turnover, and boundary clumping (Leibold and Mikkelson 2002; Presley et al. 2010). Significant positive coherence means the site-by-species matrix has non-random structure (and is not a “checkerboard” pattern). Turnover and boundary clumping metrics further indicate the nature of nestedness and/or turnover, describing how the species composition changes across sites. These metrics were calculated using the R package metacom (Dallas 2014) on the matrix representing presence/absence data of species at each site. In this framework, coherence and turnover are calculated via statistical comparison of the empirical matrix with 1000 null matrices, generated under the constraint of maintaining the species richness of a site (row totals) and filling species ranges (columns) based on their marginal probabilities (the “swap” algorithm as described by Gotelli and Entsminger 2003). This establishes whether patterns in the data are beyond those expected by chance. Boundary clumping was established using Morisita’s index, with significance determined relative to a chi-squared distribution.
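The logic of the coherence test can be sketched as follows. This is a simplified Python illustration, not the metacom implementation: embedded absences are counted only within species ranges (columns) of an already ordinated matrix, null matrices keep site richness fixed and draw species in proportion to their occurrence totals as described above, and null matrices are not re-ordinated. The sign convention (positive z when the observed matrix has fewer embedded absences than the null mean) and all function names are assumptions made for this sketch.

```python
import random
from statistics import mean, stdev

def embedded_absences(matrix):
    """Count absences lying between the first and last presence of each species."""
    count = 0
    for j in range(len(matrix[0])):
        rows = [i for i in range(len(matrix)) if matrix[i][j]]
        if rows:
            count += (rows[-1] - rows[0] + 1) - len(rows)
    return count

def null_matrix(matrix, rng):
    """Random matrix with the same site richness; species drawn without
    replacement with probability proportional to their occurrence totals."""
    n_species = len(matrix[0])
    col_tot = [sum(row[j] for row in matrix) for j in range(n_species)]
    out = []
    for row in matrix:
        remaining = list(range(n_species))
        chosen = set()
        for _ in range(sum(row)):
            pick = rng.choices(remaining, weights=[col_tot[j] for j in remaining], k=1)[0]
            remaining.remove(pick)
            chosen.add(pick)
        out.append([1 if j in chosen else 0 for j in range(n_species)])
    return out

def coherence_z(observed, null_values):
    """Standardized difference between null and observed embedded absences."""
    return (mean(null_values) - observed) / stdev(null_values)

# Toy ordinated matrix (rows = sites, columns = species).
toy = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]
rng = random.Random(1)
nulls = [embedded_absences(null_matrix(toy, rng)) for _ in range(1000)]
print(embedded_absences(toy), coherence_z(embedded_absences(toy), nulls))
```

A complete analysis would ordinate each null matrix before counting, so the numbers printed here are illustrative of the procedure rather than of any published result.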

To provide additional support for our findings, we characterized the site-by-species presence/absence matrix as a bipartite network and tested for modularity, using Barber’s Q (Barber 2007) to quantify the tendency of Culicoides communities to cluster among sites. Such clustering forms distinct modules that may be associated with environmental drivers and has clear implications for disease transmission. First, to determine if the Culicoides metacommunity was significantly more modular than expected by chance, we generated 1000 null matrices using the swap algorithm detailed above. We then compared the null distribution of Q values to the empirical Q. Lastly, we created a one-mode projection of the bipartite network in which sites were connected to each other with edge weights given by the number of shared species. Analyzing this unipartite network with the Walktrap method (Yang et al. 2016) quantifies and identifies “modules” within which there are dense connections between sites, i.e., many species in common, but sparse connections between modules (Barber 2007).
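The two network quantities described above can be written down compactly. The sketch below computes Barber's bipartite modularity, Q = (1/m) Σ (A_ij − k_i d_j / m) over site-species pairs assigned to the same module, for a given module assignment, and builds the one-mode site projection weighted by shared species. It does not search for the best partition or run the Walktrap algorithm; the function names, module labels, and toy matrix are hypothetical.

```python
def barber_q(matrix, site_module, species_module):
    """Barber's bipartite modularity for a given assignment of sites and species.

    matrix:         site-by-species presence/absence (list of 0/1 rows)
    site_module:    module label for each site
    species_module: module label for each species
    """
    m = sum(sum(row) for row in matrix)                  # number of presences
    k = [sum(row) for row in matrix]                     # site degrees
    d = [sum(row[j] for row in matrix) for j in range(len(matrix[0]))]  # species degrees
    q = 0.0
    for i, row in enumerate(matrix):
        for j, present in enumerate(row):
            if site_module[i] == species_module[j]:
                q += present - k[i] * d[j] / m
    return q / m

def shared_species_weights(matrix):
    """One-mode projection: edge weight = number of species two sites share."""
    weights = {}
    for a in range(len(matrix)):
        for b in range(a + 1, len(matrix)):
            shared = sum(x * y for x, y in zip(matrix[a], matrix[b]))
            if shared:
                weights[(a, b)] = shared
    return weights

# Toy metacommunity with two perfectly separated modules.
toy = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
print(barber_q(toy, ["A", "A", "B", "B"], ["A", "A", "B", "B"]))
print(shared_species_weights(toy))
```

Significance testing would repeat the Q calculation on each null matrix and compare the empirical value to that distribution.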

Environmental covariates

To examine the relationship between Culicoides communities and abiotic variables, we obtained temperature (°C) and precipitation data (mm) along with spatial location data (decimal degrees) and elevation (m). Specifically, we used the 2009 PRISM data set (PRISM Climate Group, Oregon State University 2018) to obtain monthly estimates of minimum, maximum and mean temperature, and precipitation at each site as rasters; 2009 was the approximate temporal midpoint of the Culicoides sampling and is the year with the most samples (28%). From these data, we also derived the temperature range as the difference between the maximum and minimum temperature through the year. Site centroids, based on all light trap locations, provided latitude and longitude data from which site estimates for temperature and precipitation data were extracted. Elevation data were obtained from the USGS using the “get_elev_point” function in the elevatr package.

Linking site scores, covariates, and disease reports

The site score (see subsection on Metacommunity analysis, above) was then explained in terms of environmental covariates (Hijmans et al. 2005) using boosted regression trees (Elith et al. 2008) as implemented in the R package gbm (Ridgeway 2013). Boosted regression trees rely on decision trees, weighted by their predictive capability, in order to remove infrequent (“weak”) decision factors for making a branch in the tree. The remaining frequent (“strong”) decision rules are combined to form a single predictive model. In ecology, combining regression and boosting has been demonstrated to enhance predictive performance of models (Elith et al. 2008). Because site score is obtained from the method of reciprocal averaging, basically a form of correspondence analysis, these scores can be negative as well as positive (e.g., as in PCA), and indeed some were in our case. Consequently, before applying a square root transformation (to an approximately normal distribution), we first shifted the response variable to site score − min(site score), so that the minimum value was zero. The optimal number of trees was determined using fivefold cross validation (i.e., 80% training data, 20% testing data). Interactions were included by setting the interaction depth to 3, and 50,000 trees were iteratively fit (other model parameters were shrinkage = 0.001 and bag.fraction = 0.7). Model performance was assessed by calculating the root mean squared error, following visual inspection that this had stabilized within the 50,000 iterations of fitting. The final model provides relative contribution (RC) values for each predictor variable, calculated based on the number of decision trees in which a variable was included, weighted by the improvement in model performance as a result of the inclusion of the predictor.
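The shift-and-square-root transformation and the idea behind relative contribution can be illustrated with a deliberately small Python sketch. It is not the gbm model fitted in the study: it assumes single-split trees (no interactions, rather than interaction depth 3), uses no bagging or cross validation, and uses far fewer trees with a larger shrinkage. Relative contribution is computed here as each predictor's share of the total squared-error reduction, and the data are invented.

```python
import math

def best_stump(X, residuals):
    """Return (gain, feature, threshold, left_mean, right_mean) for the single
    split that most reduces squared error, or None if no split is possible."""
    n, total = len(X), sum(residuals)
    best = None
    for f in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][f])
        left_sum = 0.0
        for pos in range(1, n):
            left_sum += residuals[order[pos - 1]]
            lo, hi = X[order[pos - 1]][f], X[order[pos]][f]
            if lo == hi:
                continue  # cannot split between identical values
            right_sum = total - left_sum
            gain = left_sum ** 2 / pos + right_sum ** 2 / (n - pos) - total ** 2 / n
            if best is None or gain > best[0]:
                best = (gain, f, (lo + hi) / 2, left_sum / pos, right_sum / (n - pos))
    return best

def fit_boosted_stumps(X, y, n_trees=500, shrinkage=0.05):
    """Gradient boosting with stumps; returns (relative contributions, RMSE)."""
    pred = [sum(y) / len(y)] * len(y)
    influence = [0.0] * len(X[0])
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = best_stump(X, residuals)
        if stump is None:
            break
        gain, f, thr, left, right = stump
        influence[f] += gain
        pred = [p + shrinkage * (left if row[f] <= thr else right)
                for p, row in zip(pred, X)]
    total = sum(influence) or 1.0
    rmse = math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / len(y))
    return [100.0 * v / total for v in influence], rmse

# Invented data: site score depends on predictor 0 only.
X = [[float(i), float((2 * i) % 5)] for i in range(20)]
site_score = [-1.0 if i < 10 else 2.0 for i in range(20)]
# Shift so the minimum is zero, then take the square root.
y = [math.sqrt(s - min(site_score)) for s in site_score]
rc, rmse = fit_boosted_stumps(X, y)
print(rc, rmse)
```

With deeper trees, a variable can also earn contribution through splits that only matter in combination with other variables, which this stump-based sketch cannot capture.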
All RC values are bounded between 0 and 100 (and sum to 100), where low scores correspond to variables with little contribution to Culicoides community composition, and high scores indicate highly predictive variables. Relationships between the response variable and predictors were visualized using univariate partial dependence plots, which account for the average effects of other variables. Lastly, the Culicoides community composition described by the site score was related to disease score by visualizing disease score as a function of the rank order of site scores (including a loess regression trend), and by testing if the same rank order was statistically correlated with disease score (using Spearman’s rank correlation test). Disease score was also related to the network module a site belonged to using Student’s t test.
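Spearman's rank correlation, used above to relate site score to disease score, is the Pearson correlation of the ranked values. A minimal Python sketch follows, with ties given their average rank; the function names and example values are hypothetical, and no p-value is computed.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

# Invented site scores and disease scores for five sites.
site_scores = [-1.2, -0.4, 0.1, 0.9, 1.5]
disease = [0.0004, 0.0011, 0.0009, 0.0032, 0.0051]
print(spearman_rho(site_scores, disease))
```

Because only ranks enter the calculation, the result is unchanged by the monotonic shift and square-root transformation applied to site score elsewhere in the analysis.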

Results

Culicoides metacommunity structure

Culicoides communities were significantly coherent, meaning they exhibited non-random structure (z = 7.91, p < 0.0001), and additionally had non-significant species turnover (z = 1.62, p = 0.103) and significantly clumped species range boundaries (Morisita’s index = 7.33, p < 0.0001). The synthesis of these analyses is that the Culicoides communities exhibited a quasi-Clementsian structure (Fig. 1) in which species replace each other in discrete communities along some environmental or spatial gradient (Presley et al. 2010). This is further supported by the significant modularity of the site-by-species matrix (Q = 0.22, p < 0.0001), meaning site and species combinations cluster, resulting in dense connections between subsets of sites and species, but sparse connections between modules (Barber 2007). Across sites where virus transmission causes HD, no vector species is ubiquitous (Fig. 1; note the absence of a solid black bar).

Fig. 1

Culicoides distributions (columns) among sites (rows) sampled. A black rectangle within the matrix indicates presence of that particular Culicoides species at the corresponding site. Grayscale colors other than black indicate level of disease reporting (the “disease score” values shown on the grayscale legend are kernel density estimates × 10³)

Environmental and spatial covariates of metacommunity structure

Metacommunity analysis quantifies how similar sites are to each other in terms of species but does not automatically explain what features of the environment or configuration of sites explain similarity. In order to generate that explanation, the gradient along which Culicoides species form discrete communities was related to environmental and spatial variables via boosted regression trees using the site scores (square-root transformed for normality) as a measure of the latent gradient along which the Culicoides community was structured. The temperature range between the warmest and coldest months (RC = 23.3) and minimum temperature (RC = 17.7) were the most influential predictors of Culicoides community composition (Fig. 2), whereas maximum temperature was the least informative (RC = 5.4). The model, with only eight predictors, performed well; the response variable, sqrt(site score), ranged from approximately 0 to 3.5, and the root mean squared error was 0.205, converging after approximately 14,000 of the 50,000 trees generated. The top two predictors had opposite relationships with site score; higher temperature ranges were associated with high site scores, whereas higher minimum temperatures were associated with low site scores (Fig. 3). Essentially, this means the two community types are characterized by either low minimum temperatures and large temperature ranges or high minimum temperatures and small temperature ranges.

Fig. 2

The relative contribution of spatial and environmental variables to the structure of Culicoides communities determined from a boosted regression model that predicted site score based on site-level data

Fig. 3

Partial dependence plots for the top two predictors of sqrt(site score) from the boosted regression model. The sqrt(site score) is shown by the left-hand y axes and black lines, with gray histograms (and right-hand y axes) indicating the frequency of temperature ranges and values recorded across sites

Site scores and disease scores

Site score was a predictor of the level of disease reporting (Spearman rank correlation rho = 0.74, p < 0.0001; Fig. 4, illustrated with a loess trend line, span = 1.2), as was a site’s module membership (Student’s t test, t = −14.2, p < 0.0001; Fig. 4, illustrated with sites colored by module membership). Sites belonging to the same module have many vector species in common, compared to sites belonging to different modules, which have few species in common. High levels of disease reporting were associated with Culicoides communities that occur at locations with relatively low minimum temperatures, but with large temperature swings between summer and winter.

Fig. 4

Main plot shows sites ranked by the similarity in their community composition score (“site score”) and the level of hemorrhagic disease reports (see Methods for details). Sites are colored according to their membership of a network module. The inset plot shows each site colored according to its module, with edges weighted according to the number of shared vector species between sites. The illustrative trend line in the main plot is a loess regression (span = 1.2)

Discussion

Culicoides are important vectors worldwide, capable of transmitting micro- and macro-parasites in human and animal populations and causing severe economic impacts to the domestic livestock industry (Jennings and Mellor 1988; Mellor et al. 2000; Agbolade et al. 2006; Azevedo 2007; Rasmussen et al. 2012). In addition to contributing information on their ecological requirements and community structure, our study develops a method to integrate metacommunity ecology with infectious disease ecology in order to identify and explain hotspots of vector-borne transmission, and this method may be applied to other systems. Use of the regularity and spatial extent of HD reporting to inform an integrated “disease score” places emphasis on conditions under which transmission may occur, rather than times and places where large outbreaks occur. This is useful for HD, where outbreak size may depend on unmeasured covariates including population level immunity (Park et al. 2013). The method presented does not require direct information about the competency of candidate vector species, but can help to identify species that may be targeted for further research. In addition, it identifies ecological gradients over which groups of arthropods turn over. Further, by collapsing high-dimensional data into a scalar site score representing community similarity, it facilitates the relation of environmental drivers and community composition to levels of disease reporting using straightforward statistical models.

The metacommunity analysis revealed that no species of Culicoides was ubiquitous in its distribution, such that no single species is solely responsible for vectoring viruses. Indeed, Culicoides sonorensis, a known vector for HD (Ruder et al. 2012), was sampled only rarely and does not appear related to the level of HD reports referenced within the temporal and spatial extent of this study. Further, C. insignis, the documented vector for bluetongue virus in the Caribbean and Central and South America, occurs in this study frequently in areas of low disease reporting. Plausibly then, viruses causing HD are vectored by communities of vector species. The major measured environmental gradient influencing community composition in this system is the annual temperature range. Greater differences in temperature between summer and winter months are anticipated to occur under climate change (Rummukainen 2012), and this may cause restructuring of vector communities such that they are more efficient at transmitting the viruses causing HD.

Specifically, in our case study environmental changes including annual variation in temperature effectively flip communities between two types. These two “modules,” within which sites share many vector species but between which they share few, could manifest at the same site provided vector species are not dispersal limited. This means that a location previously inhospitable to the virus could become a source of transmission if environmental conditions changed to those preferred by the module associated with transmission. Notably, land-use change has also been associated with elevated disease risk due to altered vector communities (Shah et al. 2019; Guo et al. 2019; McMillan et al. 2020; da Silva Pessoa Vieira et al. 2022). Future research on environmental drivers of vector community restructuring seems likely to provide many more examples of vector community-mediated environment-disease associations. In our system, the spatial location of sites was informative in predicting the structure of Culicoides communities but was less important than key environmental variables. This spatial correlation is also reflected in previous studies characterizing the clustering of HD reporting; within the southeastern USA, Mississippi and Georgia are relative hotspots, followed by their shared neighbor, Alabama. Finally, peripheral states (Arkansas, Louisiana, and Florida) exhibit lower HD reporting (Park et al. 2016). The gradual transition in space from low to high areas of HD transmission due to vector community structure suggests that sites could transition between modules based on dispersal of constituent vector species.

It is well known that mosquito species have feeding preferences among host species (Takken and Verhulst 2013), and recent evidence suggests the same is true for Culicoides species (González et al. 2022). Such feeding preferences are important determinants of cross-species transmission (Simpson et al. 2012), which is relevant in this system as bluetongue viruses infect not only WTD but also economically important cattle (Walton 2004). Consequently, vector community structure may be leveraged to assess the risks of infection to target species of concern. Efforts to control communities of vector species are also anticipated to change the community structure itself, with unknown consequences for transmission. For example, species of Culicoides are known to be differentially susceptible to insecticides (Venail et al. 2015). Future research on the response of community structure to control efforts, including repellents and insecticides, promises to improve our understanding of risk beyond the likely oversimplified assumption that, barring evolutionary responses of arthropods, vector control should reduce transmission (Qureshi and Connolly 2021).

None of the ideas generated from this study are unique to the HD-Culicoides system. Many emerging and re-emerging infectious disease agents, including malaria parasites, West Nile, Schmallenberg, Dengue, Chikungunya, and Zika viruses (Lord 2010; Burt et al. 2012; Rasmussen et al. 2012; Corbel et al. 2013; Weaver et al. 2016; Ferraguti et al. 2018; McMillan et al. 2019; Hoi et al. 2020), are vectored by suites of arthropods. With appropriate sampling, metacommunity ecology could be used to understand how groups of vector species replace each other over landscapes in these systems and how this relates to disease prevalence in target host populations. Indeed, it has recently been argued that vector diversity should be included in an integrated approach to leveraging biogeography to improve health management (Murray et al. 2018). Our case study illustrates that focusing on a single vector species may fail to convey the full picture regarding potential transmission hotspots, environmental drivers of risk, and how these will change in the Anthropocene.