1 Introduction

Culex species are the major vectors of West Nile Virus (WNV) in the USA. The virus was first identified in North America in New York City in 1999. By the end of 2019, more than 51,801 cases of WNV disease had been reported to the Centers for Disease Control and Prevention (CDC) [1]. In 2020, 664 human cases and 52 deaths were reported to the CDC. In the southeastern USA, over 96% of the WNV-positive mosquito pools reported to the CDC from 1999 to 2010 were obtained from Culex mosquitoes, of which 64.6% were from Culex quinquefasciatus (CQ) [2]. The state of Georgia, and especially the Atlanta area, was a hotspot of WNV incidence in 2012, with 117 human WNV cases, 6 deaths, and 125 WNV-positive mosquito pools (over 81% of which were from CQ) reported by the Georgia Department of Public Health.

Transmission of vector-borne diseases is influenced by a wide range of environmental factors. Among these, meteorological variability is one of the most important drivers of inter-annual WNV transmission risk. Weather directly affects the distribution and abundance of vectors, pathogens, and hosts [3,4,5]. Culex species display seasonal behavior. Their activity reaches its minimum in winter, rises in spring to peak levels in summer, and continues until mid-fall [6]. Females that emerge in late summer search for sheltered areas where they remain inactive until spring. They become inactive when the temperature drops below 60 °F [7], while warm weather brings them out in search of water on which to lay their eggs [8]. Changes in meteorological conditions such as temperature, relative humidity, and wind speed can impact mosquito populations [9]. The greatest WNV transmission during the epidemic summers of 2002–2004 in the USA was linked to above-normal temperatures. Analysis of summer temperature deviations from the 30-year mean (1971–2000) in the USA showed that WNV always dispersed into new areas during years with above-normal temperatures, and that amplification occurred during summers with normal or above-normal temperatures. Subsequent cool summers were associated with decreased or delayed virus activity, especially at northern latitudes [10].

Temperature influences the development rate and fitness of immature mosquitoes and the biting rate and survival of adult female mosquitoes [11]. Drought can lead to a decline in the number of mosquito predators, and it may encourage birds to gather near standing water, where the virus can circulate more easily. High temperatures also speed the development of viruses within the mosquito carriers [12,13,14]. During periods of drought between rainfall events, blood-fed and potentially infected mosquitoes digest blood meals and wait for a heavy rainfall that floods the temporary pools to oviposit [15]. Rainfall and the surface moisture first create temporary freshwater habitats and also maintain permanent aquatic habitats that are used as egg-laying sites by female mosquitoes. Subsequently, rainfall saturates the ground and increases near-surface humidity levels [16].

The increase in the relative rate of WNV human cases from 2001 to 2005 in the USA has been linked to warmer temperatures, elevated humidity, and heavy precipitation independent of season using conditional logistic regression [17]. WNV mosquito infection changes from year to year spatiotemporally. For the temporal scale, higher temperature and less rainfall are associated with more human cases and also with the highest WNV prevalence in the mosquitoes. WNV infection can also be negatively correlated with the previous year’s precipitation [18]. For the spatial analysis, temperature plays a bigger role than precipitation in comparison to temporal patterns [18]. Drought followed by wetting of the land surface is associated with the spatiotemporal variability of human WNV cases [19]. Spring drought induces the amplification of WNV by concentrating vector mosquitoes in humid vegetated areas where nesting birds are present. This makes the virus transmission easier as birds are the natural host of WNV, and this virus is maintained in nature in a mosquito-bird-mosquito transmission cycle. Subsequent summer rainfall and wetting of the land surface enable the dispersal of infected mosquitoes into the open, sparsely vegetated areas they had avoided during the drought [20].

To control mosquito populations and to prevent disease, understanding this vector–environment relationship is essential. It is also helpful to understand the responses of WNV transmission risk to meteorological variability for public health policies so they can be adapted based on the consequent impacts [21]. Predictive models can be helpful in this regard to enhance the warning of high-risk periods for WNV and to describe the variations in mosquito abundance over time. Extensive attempts have been made to develop mosquito abundance prediction models which mostly rely on meteorological and environmental data from the days and weeks preceding the capture of mosquitoes [22]. Such models can be designed to provide continuous daily or weekly estimates of mosquito populations under the impacts of different environmental conditions. Ahumada et al. [23] proposed a discrete-time population model to simulate the temporal dynamics of CQ abundance. The model incorporated temperature and rainfall dependence and breeding site density dependent competition. This model simulated the mosquito population growth through time and at different elevations in Hawaii. Temperature was the major driving force behind mosquito population growth and abundance in Hawaii, but precipitation dependence also constrained population size which was evident during dry years.

A climate-based model was developed by [24] to predict mosquito abundance of WNV Culex species. Temperature, rainfall, evaporation, and photoperiod were used as inputs to the model. A moisture index was also created based on 7 days cumulative rainfall and evaporation. The model was developed on temperature-dependent functions including development rate and survival rate, a moisture index dependent function, and daily egg-laying rate.

The Dynamic Mosquito Simulation model (DyMSiM) developed by Morin and Comrie [25] was used in simulating CQ population dynamics in Florida and California. This model breaks up the larval phases into separate instar stages. The model used daily temperature and precipitation to drive population simulations throughout the year. This model revealed that dry conditions in California reduced mosquito populations due to loss of immature mosquito habitats, while drier late summer conditions in Florida decreased late-season adult mosquito populations.

In most of the previously mentioned analyses, the impact of meteorological conditions on mosquito abundance was limited to single point lags which consider the conditions at a certain time prior to trapping. Curriero et al. [9] introduced cross-correlation maps (CCMs) as a graphical method for visualizing the influence of preceding environmental conditions during a time lagged interval on the abundance of Ochlerotatus sollicitans species. Since then, this tool has been used to identify the timing and duration of potential meteorological effects on mosquito populations [22, 26,27,28]. In this study, to investigate the correlation between meteorological variables and inter-annual and seasonal variation in Culex mosquito population carrying WNV, CCMs were developed for mosquito data from central north Georgia (GA). The main goal was to develop an improved predictive model of CQ populations by extending effects of meteorological conditions over a range of time rather than a single point in time. Two modeling approaches were applied in this study, multi-regression and Artificial Neural Network (ANN); lagged meteorological data were fed into these models for prediction purposes. In addition, as there is a correlation between any two observations of the time series of mosquito count data, antecedent conditions of response variable up to 10 weeks prior to the event were added to the models as predictors. It was hypothesized that addition of past values of mosquito count data to the model improves the model performance and increases the prediction accuracy.

2 Materials and methods

2.1 Mosquito and meteorological data

In this study, effects of meteorological variation on female CQ abundance per trap night were explored for central north Georgia. The weekly meteorological data, including mean weekly precipitation, temperature, potential evapotranspiration (PET), and available moisture in the surface layer from 2002 to 2009, were downloaded from the National Weather Service, Climate Prediction Center (CPC) (http://www.cpc.ncep.noaa.gov/products/monitoring_and_data/drought.shtml). Climatic divisions were defined for the state of GA by the CPC, and as the Atlanta metropolitan area is located in division 2, the central north part of GA, weekly climatic data were obtained for this division (Fig. 1). Soil moisture is estimated by a one-layer hydrological model [29, 30]. The model takes observed precipitation and temperature and calculates soil moisture, evaporation, and runoff. Potential evapotranspiration is computed from observed temperature using the Thornthwaite method [31]. Mosquito data were obtained from 2002 to 2009 for the counties located in division 2. Mosquitoes had been collected, classified by species, pooled by date, location, species, and trap type, and tested for WNV infection [32]. Collections were made using paired CO2-baited CDC light traps [33] and gravid traps [34].

Fig. 1

A Climatic divisions for the state of Georgia defined by Climate Prediction Center (http://www.cpc.ncep.noaa.gov/products/monitoring_and_data/drought.shtml). B, C Average weekly weather and mosquito abundance data over the period 2002 to 2009 for the central north of GA, respectively

As Culex species either hibernate or become inactive during winter, no traps were set during winter and active mosquito counts were assumed to be zero for this period [28]. Figure 1 shows the average weekly precipitation, temperature and female CQ abundance over the period 2002–2009 for counties located in the study area.

2.2 Statistical analysis

Female CQ mosquito time-series data and its potential relationship with each meteorological variable were analyzed using cross-correlation maps (CCMs). This graphical approach characterizes the temporal structure of mosquito population size in association with meteorological variables. Using this method, the key antecedent environmental conditions, their timings, and durations were identified which can improve the ability of developing predictive models of vector abundances.

Let Y(t) and X(t) represent two time series with time index t. CCMs illustrate the correlation coefficients (r) between Y(t), in this study the number of captured female Culex mosquitoes at time t, and a meteorological variable X averaged over a time period starting at time t − j and ending at time t − k, with j ≥ k:

$$r(Y, {X}_{j,k})=cor(Y(t), \overline{X(t-j, t-k)})$$
(1)
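As an illustration of Eq. (1), the following minimal Python sketch computes a cross-correlation map using Spearman's rank correlation, the measure adopted in this study. It is not the authors' R code: the function names (`ranks`, `pearson`, `spearman`, `ccm`) and the toy series are assumptions introduced here, and the series are assumed to be complete, equally spaced weekly values.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = avg_rank
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def ccm(y, x, max_lag):
    """Return {(j, k): r(Y, X_{j,k})} for all 0 <= k <= j <= max_lag.

    X_{j,k} at time t is the mean of x over weeks t - j through t - k.
    """
    out = {}
    for j in range(max_lag + 1):
        for k in range(j + 1):
            ys, xs = [], []
            for t in range(j, len(y)):
                ys.append(y[t])
                xs.append(mean(x[t - j : t - k + 1]))
            if len(ys) >= 3:
                out[(j, k)] = spearman(ys, xs)
    return out


# Toy data (hypothetical): y simply repeats x with a 3-week delay.
x_demo = [(7 * i) % 11 for i in range(30)]
y_demo = [0, 0, 0] + x_demo[:27]
ccm_demo = ccm(y_demo, x_demo, max_lag=5)
```

In this toy case the cell (j, k) = (3, 3) recovers the 3-week delay with a correlation of 1. For real data, the cell with the largest absolute correlation marks the candidate interval lag.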

In this study, t ranges from 1 to 52 for a given year (i.e., a weekly time interval). Spearman’s rank-order correlation was used to calculate the correlations because it makes no assumption about the distribution of the data and does not assume a linear relationship between mosquito abundance and meteorological data. The CCMs were developed at the four-week moving average scale. Because preceding meteorological conditions up to 5 months prior to summer play a significant role in the life cycle of mosquitoes, and to ensure that a sufficiently long lag is searched to capture high correlations, the maximum time lag was set to 20 weeks. Other studies used similarly long time lags [28, 35]. In addition, the sample autocorrelation function (ACF) was computed for the time series of mosquito count data to identify the time interval over which a correlation in the data series exists. All analyses were performed in R statistical software (version 3.0.2) [36].
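The sample ACF mentioned above can be sketched in a few lines of Python. This is an illustrative implementation of the standard estimator (lagged autocovariance divided by the lag-0 autocovariance), not the study's R code; the name `sample_acf` and the example series are assumptions made here.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation of a series for lags 0..max_lag.

    Uses the usual estimator: the sum of products of deviations from the
    overall mean at lag h, divided by the sum of squared deviations.
    """
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    m = sum(series) / n
    dev = [v - m for v in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("ACF is undefined for a constant series")
    return [
        sum(dev[t] * dev[t + h] for t in range(n - h)) / c0
        for h in range(max_lag + 1)
    ]


# Hypothetical short series, used only to show the call.
acf_demo = sample_acf([1, 2, 3, 4, 5], max_lag=2)
```

Lags at which the autocorrelation remains clearly different from zero indicate how far back past counts may carry information about the current count.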

2.3 Predictive models

As the response variable is count data, a Poisson regression model was selected for prediction purposes. The Poisson regression model assumes that the logarithm of the expected count is linearly related to the predictors, and that the mean and variance of the data are equal. To overcome the limitations of statistical models, and to capture potentially complex nonlinear relationships between meteorological variables and mosquito abundance, an ANN model was also used. An ANN is a black-box, lumped model that can identify a relationship from given patterns, which makes it possible to model nonlinear relationships. ANNs can be categorized based on the direction of information flow and processing. In a feed-forward network, the connections between nodes run from an input layer, through one or more hidden layers, to an output layer [37] (Fig. 2). The most common method used to find the number of hidden layers and nodes is a trial-and-error approach [38]. In this study, the number of hidden neurons was varied from 4 to 6, and the number of hidden layers was set to 1 to build a parsimonious model and to avoid overtraining. The neural network was constructed using MATLAB version 7.10.0 (2010) and was trained by adjusting the weights that link its neurons.
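For reference, a Poisson regression of this kind can be written in standard notation as follows. The symbols \(\mu_t\), \(\beta_i\), \(x_{i,t}\), and \(p\) are introduced here for illustration and do not appear in the original text; \(x_{i,t}\) stands for the i-th predictor at week t.

$$Y(t) \sim \mathrm{Poisson}(\mu_t), \qquad \log \mu_t = \beta_0 + \sum_{i=1}^{p} \beta_i x_{i,t}, \qquad \mathrm{E}[Y(t)] = \mathrm{Var}[Y(t)] = \mu_t$$

Under this form, a one-unit increase in \(x_{i,t}\) multiplies the expected count by \(e^{\beta_i}\), with the other predictors held fixed.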

Fig. 2

An example of feed-forward artificial neural network structure with three vectors as inputs, 1 hidden layer with 4 neurons and two output vectors

Some meteorological variables are highly correlated with each other (e.g., evaporation and temperature), which causes high variance inflation in the Poisson regression model. To handle such collinearities, Principal Component Analysis (PCA) was used. PCA is a variable reduction technique that uses an orthogonal transformation to convert a set of observations of correlated variables into a set of values of linearly uncorrelated variables called principal components. PCA was applied to two sets of predictors: the first set included the interval lags with the highest correlation for each meteorological variable (temperature, precipitation, PET, and available moisture in the surface layer), and the second set included these interval lags together with antecedent values of the Culex mosquito count. Components that explained the variability of the observed data were fed into the ANN and Poisson regression models. For both the ANN and regression models, 70% of the time series data, selected randomly from the whole data set [39], was used for training and the remaining 30% was used for testing. Model performance was assessed with the coefficient of determination (R2), Nash–Sutcliffe efficiency (ENASH) [40], and bias ratio (RBIAS) [40].
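The three performance measures can be sketched as below. The Nash–Sutcliffe efficiency and R2 follow their usual definitions. The exact formula behind RBIAS is not given in the text, so the version here (total predicted minus total observed, as a percentage of total observed) is an assumption, as are the function names and the example values.

```python
def nash_sutcliffe(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def r_squared(obs, pred):
    """Coefficient of determination as the squared Pearson correlation."""
    mo = sum(obs) / len(obs)
    mp = sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (var_o * var_p)


def bias_ratio_percent(obs, pred):
    """ASSUMED definition of RBIAS: relative total bias in percent."""
    return 100.0 * (sum(pred) - sum(obs)) / sum(obs)


# Hypothetical observed and predicted values, for illustration only.
obs_demo = [1, 2, 3, 4]
pred_demo = [1.1, 2.2, 3.3, 4.4]
scores_demo = (
    nash_sutcliffe(obs_demo, pred_demo),
    r_squared(obs_demo, pred_demo),
    bias_ratio_percent(obs_demo, pred_demo),
)
```

A model that always predicts the observed mean scores an efficiency of 0, which makes this measure a useful baseline check alongside R2.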

3 Results

3.1 Cross-correlation maps (CCMs)

Results of CCMs for each meteorological variable and Culex mosquito abundance (female CQ species) data are shown in Fig. 3. Culex vector abundance was positively correlated with temperature averaged over 20 to 5 weeks prior to sampling, \(r\left(M,{T}_{\mathrm{20,5}}\right)\) = 0.82, and with PET averaged over 19 to 7 weeks prior to sampling, \(r\left(M,{PET}_{\mathrm{19,7}}\right)\) = 0.82, and negatively correlated with four-week moving average available moisture in the surface layer over 16 to 8 weeks prior to the capture event, \(r\left(M,{\theta }_{\mathrm{16,8}}\right)\) = −0.75. Four-week moving average precipitation over 20 to 13 weeks prior was weakly and positively correlated with mosquito abundance at week t, \(r\left(M,{P}_{\mathrm{20,13}}\right)\) = 0.1; precipitation one week prior to the mosquito capture event was weakly and negatively correlated with vector abundance, \(r\left(M,{P}_{\mathrm{1,1}}\right)\) = −0.09. As mosquito population density peaks in summer/early fall (Fig. 1C), counting back the lags with the highest correlation identifies the preceding late winter and spring as the most relevant time period.

Fig. 3

CCMs of 4 weeks moving average Culex mosquito abundance versus meteorological variables. Black or white rectangles show the interval lags with the highest correlation

3.2 Principal component analysis (PCA)

The weather data for the interval lags with the highest positive or negative correlation were fed into PCA to eliminate collinearity. Table 1 shows the proportion of variance explained by each component and how much each variable contributed to that principal component. PCs 1, 2, and 3 together explained 97% of the variance in the observed data. PC1 has negative loadings for temperature and PET and a positive loading for surface moisture; given that the Culex population peaks in summer/early fall, this corresponds to a cold and moist late winter and spring. PC2 has a strong negative loading for precipitation, which reflects low precipitation in early spring, and PC3 is positively related to precipitation one week prior to the trapping event.

Table 1 PCA for meteorological variables with highest correlation with four weeks moving average Culex mosquito abundance

3.3 Female Culex Quinquefasciatus abundance prediction

Results of the Poisson regression model showed that all three PCs have a negative relationship with Culex mosquito count data. A one-unit increase in PC1 decreased mosquito abundance by 50% (0.50, 0.48–0.52, 95% C.L.; p < 0.0001), that is, half as many female Culex mosquitoes. A one-unit increase in PC2 decreased mosquito abundance by 22% (0.78, 0.75–0.82, 95% C.L.; p < 0.0001) (Table 2). PC3 also has a statistically significant negative relationship with mosquito data (0.94, 0.91–0.97, 95% C.L.; p = 0.0004) (Table 2).
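The percentages above follow from reading the values in parentheses as multiplicative rate ratios, that is, exponentiated Poisson coefficients. This reading is an assumption, although it is consistent with 0.50 being reported as a 50% decrease. The short sketch below makes the arithmetic explicit; the function names and the coefficient and standard error passed to `rate_ratio_with_ci` are hypothetical and are not values reported in the study.

```python
import math


def percent_change(rate_ratio):
    """Percent change in the expected count per one-unit increase in a predictor."""
    return (rate_ratio - 1.0) * 100.0


def rate_ratio_with_ci(beta, se, z=1.96):
    """Rate ratio exp(beta) with an approximate 95% Wald interval.

    beta and se are a log-scale coefficient and its standard error.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Rate ratios as reported for PC1, PC2, and PC3.
changes_demo = [percent_change(rr) for rr in (0.50, 0.78, 0.94)]

# Hypothetical coefficient and standard error, for illustration only.
rr_demo, lower_demo, upper_demo = rate_ratio_with_ci(math.log(0.5), 0.02)
```

Under this reading, the PC3 ratio of 0.94 corresponds to a decrease of about 6% in expected abundance per unit increase.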

Table 2 Poisson model analysis of four weeks moving average Culex mosquito abundance

The three PCs were randomized and fed to the ANN model as inputs for prediction purposes. Figure 4 compares the performance of the ANN and the regression model against the observed data, after sorting the randomized data (combined training and testing data). The ANN predicted the four-week moving average mosquito abundance more accurately, with ENASH = 0.62 and RBIAS = 9%, than the regression model, with ENASH = 0.52 and RBIAS = 18%. To improve prediction accuracy, the antecedent four-week moving average Culex mosquito abundance up to 10 weeks prior to sampling was added to the PCA as a predictor. Table 3 shows the proportion of variance of the components for each set of PCs. PCs 1 to 4 explained about 98% of the variance in the observed data. Components 1 to 3 had the same interpretation as explained for Table 1. Component 4 corresponds to the antecedent mosquito abundance. These components were fed into the ANN and regression models for prediction purposes. Figure 5 compares the predicted and observed data for the testing and training periods. For all data sets, the ANN performed better than the regression model, with higher ENASH values and smaller RBIAS values, and the performance of both models gradually decreased as the lag of the antecedent mosquito data increased (Fig. 6). This indicates that by combining interval-lagged weather data with single-time-lag antecedent Culex mosquito abundance at the four-week moving average scale, a more accurate predictive model can be built.
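A minimal sketch of how an antecedent abundance value at a given lag can be attached to each week's predictors, followed by a random 70/30 split, is given below. It only illustrates the data preparation described in the text: the function names, the fixed seed, and the toy data are assumptions, and the PCA step applied in the study is omitted.

```python
import random


def add_antecedent(features, counts, lag):
    """Pair each week's predictors with the count observed `lag` weeks earlier.

    Returns a list of (predictors, target) rows, starting at week `lag`
    because earlier weeks have no antecedent value available.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1 week")
    return [
        (list(features[t]) + [counts[t - lag]], counts[t])
        for t in range(lag, len(counts))
    ]


def random_split(rows, train_fraction=0.7, seed=0):
    """Shuffle rows reproducibly and split them into training and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Toy data (hypothetical): one weather-derived predictor per week.
features_demo = [[i] for i in range(10)]
counts_demo = [10 * i for i in range(10)]
rows_demo = add_antecedent(features_demo, counts_demo, lag=2)
train_demo, test_demo = random_split(rows_demo)
```

Repeating the construction for lags of 1 to 10 weeks yields the ten predictor sets whose performance is compared in Fig. 6.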

Fig. 4

Comparison of predicted four weeks moving average Culex mosquito abundance by ANN and regression model versus observed data

Table 3 PCA for meteorological variables and lagged four weeks moving average Culex mosquito abundance time series. Each set contains antecedent mosquito data changing from 1 to 10 weeks prior

Fig. 5

Time series of predicted 4 weeks moving average Culex mosquito abundance by ANN and regression model versus observed data using different antecedent values of mosquito abundance and lagged weather data as predictors for training and testing periods

Fig. 6

ANN and regression models performances built using interval lagged weather data and antecedent 4 weeks moving average Culex mosquito abundance for a training period, b testing period

4 Discussion

In this study, the associations of preceding meteorological conditions and Culex mosquito abundance were explored to enhance our understanding of mosquito ecology and disease risk of CQ vectors carrying WNV. To determine the maximum correlations between mosquito data and meteorological variables, cross-correlation maps (CCMs) were generated. The association of vector abundance with leading meteorological variables under specific time interval lags results in more robust inference than analyses that are restricted to single predefined time lags [27]. Using CCMs and considering interval lag structures, both the timing and duration of the meteorological effects are displayed [9]. The relationships revealed between interval-lagged environmental factors and the abundance of mosquitoes carrying WNV can be used as leading indicators of vector abundance. Predicting WNV activity is an essential requirement for vector control, and studying the Culex species population dynamics in relation to meteorological factors like ambient air temperature, surface moisture, and precipitation could help to improve the ability of predicting the WNV risk.

Using PCA, the collinearities among the meteorological variables were removed, and the new components obtained at the four-week moving average scale were fed into the ANN and Poisson regression models as explanatory variables. Given that mosquito abundance peaks in summer and early fall, the results of the CCMs and the Poisson regression model indicated that elevated temperature and PET averaged over late winter and spring were closely associated with increased abundance of CQ in summer (taking mid-July as the peak mosquito count, Fig. 1C). This is consistent with other field studies, as larval and pupal development is temperature dependent [27]. Also, drier-than-normal conditions during spring, with low available moisture in the surface layer, create favorable conditions for the development of Culex vectors in summer. Prolonged above-normal temperature extends the duration of the mosquito season and vector activity. It also accelerates the development rate and influences the fitness of immature mosquitoes and the biting rate and survival of adult female mosquitoes [13, 41]. In addition, dry conditions facilitate vector development and increase the frequency of transmission events by gathering hosts and vectors around nutrient-rich water bodies [19].

An increase in the formation and persistence of mosquito development sites due to early-period precipitation is associated with an increase in the abundance of Culex mosquitoes [42]. Heavy rains and associated flooding can create extensive habitats for Culex mosquitoes, especially in late winter and early spring, right before the mosquito life cycle starts. Habitats can include temporary ground pools, pools along receding river floodplains, or natural or man-made containers. However, the impact of precipitation on the mosquito population is controversial [3]. Generally, regions with lower seasonal variation in precipitation, such as the southeastern USA, have a lower probability of WNV mosquito cases [43]. Also, the southeastern USA receives sufficient precipitation to support mosquito populations throughout the year, making temperature the controlling variable affecting Culex mosquito population dynamics [40]. The CCMs obtained for precipitation versus mosquito count data support this statement. Because of exponential growth rates and the complex interaction between mosquito abundance and rainfall, even small effects of weather conditions on a mosquito population could result in large effects in future generations [3, 28].

The ANN and Poisson regression models predicted the seasonal cycle of mosquito abundance fairly accurately. The predictions improved significantly when antecedent mosquito count data up to 10 weeks prior to the point of interest were added as predictors to the models. Adding 1-week antecedent mosquito count data to the ANN model as a predictor increased the ENASH value from 0.62 to 0.89 during the testing period. Adding 10-week antecedent mosquito abundance data to the ANN model also improved performance during the testing period, increasing ENASH from 0.62 to 0.68 (Fig. 6). The ANN predicted mosquito abundance slightly better than the regression model, which could be due to the ability of the ANN to represent nonlinear relationships, whereas Poisson regression is a log-linear model. Generally, including the antecedent mosquito count in the model increased the predictive power of both the ANN and regression models. This suggests that meteorological conditions and mosquito data from preceding weeks may be better indicators of future population dynamics for Culex quinquefasciatus and of WNV risk than the present size of the population alone.

5 Conclusions

The findings of this study and the ANN and Poisson regression models developed for prediction could have important implications for the control of West Nile Virus spread by Culex mosquito species. Most other studies developed regression-based mosquito abundance models with single-time-lag antecedent weather data up to 2 months as explanatory variables, without addressing the collinearities among meteorological variables and without extending the effects of weather conditions over a range of time. Multicollinearity can increase the variance of the coefficient estimates and reduce the statistical power of the analysis. In addition, a single time lag might not capture meteorological effects on mosquito abundance if preceding conditions contributed to breeding and survival over weeks to months [27]. By collecting rigorous weather and mosquito data during the important season between February and June, and by adding antecedent mosquito count data from 1–10 weeks prior, the size of vector populations likely to be seen in summer can be estimated and possible abnormal increases in rates of WNV infection can be monitored. These meteorological factors can be modeled under future warming conditions so that long-term predictions of shifts in risk can be estimated [44]. Such information could be used to plan mosquito control strategies and to prioritize the distribution of scarce mosquito control resources before the transmission season begins. It can also help in the early detection of virus circulation in mosquitoes and provide early warning of WNV outbreaks. In years with a mild late winter and warm spring, control operations such as applying insecticides can be initiated late in the winter to prevent the rapid development of mosquitoes in early spring and summer that results from increased survival of Culex mosquitoes through the winter.

Although weather is the main driver of WNV risk and meteorological factors increase the predictive power of determining the risk associated with WNV, further studies are needed to explore whether other environmental factors, such as socio-economic conditions and landscape and mosquito habitat characteristics, should be accounted for to provide a better understanding of disease risk and to develop a more comprehensive Culex mosquito dynamic simulation model.