1 Introduction

Since late 2003, highly pathogenic avian influenza (HPAI) outbreaks caused by infection with the H5N1 virus have led to the death of millions of poultry and tens of thousands of wild birds. As of February 8, 2012, 42 laboratory-confirmed human infections have occurred in China [1]. Although HPAI H5N1 has taken place in a limited number of provinces in China, spreading might occur at any time due to movement of domestic birds, migration of wild birds, and interaction of both. This ongoing H5N1 avian influenza epidemic in China poses risks to animals as well as human health, and will be elevated by the potential cross-species transmission to humans and subsequent re-assortment of avian and human influenza viruses in co-infected individuals [2]. Thus, it is urgent and important to model the risk of the H5N1 infection in China. Modeling may help to detect areas of unusually high and low risks so that actions may be taken in advance to allow better resource allocation for prevention and control.

So far, studies aiming to identify HPAI H5N1 risk factors and predict risk have been undertaken in many countries where the disease was introduced, such as Thailand [3–5], Vietnam [6], Indonesia [7], Bangladesh [8], the U.S. [9], the Netherlands [10], Romania [11] and Southern Africa [12]. Only three studies have tried to model the risk of HPAI H5N1 in China [13–15]. Despite this research effort, a central goal remains: to understand the factors favoring the continuing reoccurrence of the virus [16]. Specifically, little is known about the agro-ecological conditions associated with highly pathogenic avian influenza H5N1 virus spread and persistence [4, 13].

In this research, risk refers to the likelihood distribution for the number of cases of avian influenza in a particular area. The goal of modeling the risk of avian influenza is to examine spatial variation in risk in terms of number of cases for a given country or region. There are few examples applying spatial modeling methods to predict avian influenza risks [17]. Logistic regression models have been used widely [11, 13, 15, 16, 18]. One may implement logistic regression to characterize the statistical association between avian influenza cases or outbreaks and environmental covariates. However, considering risk modeling in a spatial context, particularly in the case that the areas are small, one would expect “residual” dependence between counts in areas that are geographically close, due to unmeasured risk factors or errors in the data that have spatial structure [19]. In such cases, simple logistic regression modeling is insufficient.

To account for spatial dependence in the residuals, a model involving spatial autocorrelation may be fitted [20], such as a spatial regression model [9, 21, 22] or a Bayesian geostatistical logistic regression model [23]. Ignoring autocorrelation may lead to the erroneous conclusion that a variable is significant in explaining avian influenza cases when the variable is in fact insignificant [9]. Such spatial regression models have the advantage that both environmental covariates and spatial autocorrelation can be estimated and full posterior distributions can be produced to quantify uncertainties in the parameters of interest [23]. Only in recent years have researchers begun to apply spatial regression models to avian influenza risk [9], and these models have not yet been applied to avian influenza risk in China.

Compared with previous studies, this research brings two improvements. First, we modeled the risk of H5N1 in poultry and wild birds, while other studies examined the risk in poultry only [15]. Surveillance for HPAI in wild birds will help to predict the spread of the avian influenza virus, and is an important component of a comprehensive surveillance program [24]. Therefore, it is included here. As the first attempt to model the distribution of HPAI H5N1 risk in China, Fang et al. [13] predicted areas at high risk in ecological areas that would not support the maintenance and transmission of the virus (e.g., the extremely large desert regions of Inner Mongolia, Tibet and Xinjiang autonomous regions) [15]. Moreover, the research conducted by Cao et al. [14] was mainly for risk analysis, rather than for risk modeling. Second, a generalized linear mixed model combined with variogram modeling was used for risk modeling here to account for spatial dependence. These two new aspects shed light on the risk of HPAI H5N1 in wild birds and poultry in China.

2 Methods

2.1 Data and Test for Spatial Dependence

Data on the number of cases of HPAI H5N1 in wild birds and poultry in China reported from January 2004 to March 2011 were provided by the World Organisation for Animal Health (OIE) [25]. Basic geographic data were provided by the Data Sharing Infrastructure of Earth System Science [26]. During the period January 2004 to March 2011, three main epidemic waves occurred in the number of HPAI H5N1 cases in China (Fig. 1). Some periodicity is evident. The H5N1 outbreaks in poultry during wave I were mainly distributed in Central and South China, while outbreaks in wild birds appeared only in the South (Fig. 2a). During wave II, the outbreaks in wild birds expanded from Southern to Western and North-Eastern China, and outbreaks in poultry moved to the North-Western and Northern parts of China, while outbreaks in Central China decreased significantly (Fig. 2b). During wave III, the HPAI H5N1 outbreaks in both poultry and wild birds decreased further. Outbreaks among wild birds were distributed in the West and South of China only, and those among poultry were mainly distributed in Xinjiang, Tibet and Guangdong Province (Fig. 2c). The HPAI H5N1 outbreaks in wild birds were distributed along the bird migration flyways, especially along the eastern one (Western Pacific Route) and the western one (Middle-Asia India Route) (Fig. 2).

Fig. 1. Temporal distribution of monthly HPAI H5N1 case numbers reported in China

Fig. 2. Spatial distribution of HPAI H5N1 outbreaks reported during the three main epidemic waves in China. (a) Wave I: 01 January of 2004–30 December of 2004; (b) Wave II: 01 January of 2005–30 December of 2006; (c) Wave III: 01 January of 2007–31 March of 2011

Spatial dependence is the propensity for nearby locations to influence each other and to possess similar attributes [27, 28]. It is necessary to test for spatial dependence in the model residuals. If spatial dependence exists in the model residuals then it needs to be considered in the model for predicting HPAI H5N1 cases in different geographical areas. To test for spatial dependence in the model of HPAI H5N1 cases in wild birds and poultry in China, Ripley’s K function and Moran’s I statistic as the statistical measures of spatial dependence for point locations were used here [29].

Usually, the L-function \( L(d) \) is used instead of the K-function to test for autocorrelation in a spatial point distribution [29]. Figure 3 shows that the observed value of \( L(d) \) for the HPAI H5N1 cases in wild birds and poultry in China between 2004 and 2011 lies outside the two envelope bounds (minimum and maximum), which form the confidence interval of the Monte Carlo test. The value of \( L(d) \) increases as the separation distance increases from 10 km to 1200 km. This shows that the spatial distribution of the HPAI H5N1 cases in wild birds and poultry in China in 2004–2011 was clustered. The value of \( L(d) \) increases rapidly between 11 km and 24 km and then slowly reaches a plateau (Fig. 3). This might be related to a poultry activity radius of 11–24 km and a wild bird activity radius of greater than 24 km in China.
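The envelope test described above can be illustrated with a minimal sketch. It assumes a naive estimator of \( K(d) \) without edge correction, a square study region, and the transformation \( L(d) = \sqrt{K(d)/\pi} \); the coordinates, function names and number of simulations are hypothetical and are not taken from the analysis reported here.

```python
import math
import random

def ripley_l(points, d, area):
    """Naive L(d) = sqrt(K(d) / pi), with no edge correction (an assumption)."""
    n = len(points)
    # Count ordered pairs (i, j), i != j, separated by at most d.
    pairs = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= d
    )
    k = area * pairs / (n * (n - 1))  # estimate of K(d)
    return math.sqrt(k / math.pi)

def csr_envelope(n, d, side, n_sim=99, seed=0):
    """Min and max of L(d) over simulated uniform (random) point patterns."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_sim):
        sim = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
        values.append(ripley_l(sim, d, side * side))
    return min(values), max(values)

# Hypothetical coordinates (km): a tight cluster plus one distant point.
cases = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (50.0, 50.0)]
side = 100.0
observed = ripley_l(cases, d=2.0, area=side * side)
lo, hi = csr_envelope(n=len(cases), d=2.0, side=side)

# An observed L(d) above the upper envelope bound suggests clustering at distance d.
print(observed, lo, hi, observed > hi)
```

Under complete spatial randomness \( L(d) \) is approximately equal to \( d \), so values well above the simulated maximum point to clustering. A production analysis would normally add edge correction and use the true study-region boundary.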

Fig. 3. Ripley’s K function for the HPAI H5N1 cases in wild birds and poultry

In addition, spatial autocorrelation analysis was performed on the incidence rate of HPAI H5N1 in poultry using Moran’s I statistic (Table 1). The Moran’s I statistics for the incidence rate of HPAI H5N1 in poultry for the years 2004, 2005 and 2011 are significantly greater than zero, which means that the HPAI H5N1 incidence in poultry is positively spatially autocorrelated (Table 1). Complete wild bird population data at county level in China are unavailable. Therefore, spatial autocorrelation analysis was not performed on the incidence rate of HPAI H5N1 in wild birds. These test results show that spatial dependence does exist in the HPAI H5N1 cases in wild birds and poultry in China from 2004 to 2011. It is therefore important to include a spatial component in the model used to predict the HPAI H5N1 cases in various geographic areas, and a spatial regression model is suitable for this research.
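For readers unfamiliar with the statistic, the following is a minimal sketch of global Moran’s I, \( I = \frac{n}{S_{0}} \frac{\sum_{i}\sum_{j} w_{ij}(x_{i}-\bar{x})(x_{j}-\bar{x})}{\sum_{i}(x_{i}-\bar{x})^{2}} \), where \( S_{0} \) is the sum of all weights. The binary contiguity weights and the incidence values are hypothetical; the weighting scheme actually used for Table 1 is not stated in the text.

```python
def morans_i(values, weights):
    """Global Moran's I for a list of values and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(
        weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n)
    )
    den = sum(x * x for x in dev)
    return (n / s0) * num / den

# Four hypothetical counties in a row; neighbours share a border (binary weights).
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth = [10.0, 8.0, 2.0, 1.0]        # similar values sit next to each other
alternating = [10.0, 1.0, 10.0, 1.0]  # dissimilar values sit next to each other

print(morans_i(smooth, w))       # positive: positive spatial autocorrelation
print(morans_i(alternating, w))  # negative: negative spatial autocorrelation
```

Significance is usually assessed with a permutation test or a normal approximation, which this sketch omits.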

Table 1. Global spatial autocorrelation by Moran’s I statistic on the incidence rate of HPAI H5N1 in poultry of China

2.2 Environmental Covariates

Ten environmental covariates were considered as risk factors for HPAI H5N1 cases in wild birds and poultry in China. The covariates include human population density, annual mean temperature, annual precipitation, poultry density, mean elevation, Euclidean distance to lakes and wetland, minimum distance to the nearest bird migration route, minimum distance to the nearest road, minimum distance to the nearest city and road density (Table 2).

Table 2. Environmental variables associated with the HPAI H5N1 cases in wild birds and poultry of China

Human population density was chosen as one of the risk factors because it was found to be associated with HPAI H5N1 in several studies conducted in countries with different agro-ecological conditions such as Thailand, Bangladesh, Vietnam, Romania and China [6, 8, 15, 16, 30, 31]. Human population density may indicate higher levels of trading activity [18]. Disclosure of the HPAI H5N1 cases in Thailand poultry markets in 2006 and 2007 suggested that the HPAI virus had continued to spread among poultry through trade activities despite the presence of control measures [32].

Annual mean temperature and annual precipitation were chosen as climatic factors here. Some researchers found that a sudden drop in temperature occurred shortly before HPAI H5N1 outbreaks among birds in the Eurasian regions in 2005 and 2006 [33]. Climate change and subsequent immune-suppression may have allowed the H5N1 virus to proliferate more efficiently in birds that were already carrying the virus, thereby hastening the inter-species spread of the virus and the deaths of wild birds [33]. HPAI H5N1 virus transmitted by migratory birds could be spread during the migration period, and the speed of such a spread may be elevated in particularly cold winters [34]. Lower levels of moisture and precipitation may affect the availability of food resources and thereby influence the distribution of wild birds [18].

Previous research has shown an association between poultry density and HPAI H5N1 outbreaks [3, 15, 16, 30]. In China, the HPAI H5N1 cases were not always positively related to poultry density [13, 15]. Chickens in areas with high population densities are usually bred in industrialized farms with good animal husbandry practices and proper vaccination [35]. Mean elevation was chosen as an environmental factor here because some research has reported an increased HPAI H5N1 risk in lowland and river delta areas [6, 16, 36]. It has also been demonstrated that HPAI H5N1 outbreaks in South Asia and China are significantly associated with elevation [15, 16]. Suitable habitats are normally concentrated in the lowlands, and elevation influences the availability of food resources and shelter for waterfowl, which are natural hosts for the HPAI H5N1 virus [18]. Water bodies and wetlands have been found to be significantly associated with HPAI H5N1 outbreaks in China, India and Bangladesh [13, 37, 38]. Lakes and wetlands are important for migratory (and local) waterfowl and provide potentially suitable habitats [18].

Some researchers have found that infected wild birds can carry the avian influenza virus for long distances during migration [39]. Wild bird migration is important for avian influenza virus transmission [13]. Usually migratory birds cannot fly the full distance to their annual migratory destination at once. Instead, they usually interrupt their migration to rest and refuel [40]. Avian influenza virus may be spread between wild and domestic birds when migratory birds search for food, water and shelter [13]. Proximity to cities and proximity to roads were included as risk factors here since they relate to poultry trade and movement which may facilitate the mechanical spread of the HPAI H5N1 virus [13]. During long distance transportation, a variety of birds and animals from various origins are caged on top of each other, possibly providing an easy cross-infection route for avian influenza. Moreover, many open live poultry markets are established along or near roads, which may further increase the chance of avian influenza virus transmission [13].

2.3 Spatial Regression Model

The statistical model represents the number of avian influenza cases per geographical unit as a Poisson-distributed random variable, which is appropriate for analyzing disease cases in which some geographic units have many cases but most units have few or no cases [9, 21]. Because not accounting for spatial autocorrelation when predicting the number of HPAI H5N1 cases may lead to the erroneous conclusion that an environmental variable is significant when it is in fact non-significant [9], a spatial regression model was used here to predict the number of HPAI H5N1 cases per geographical area.

Spatial regression models are regression models with a term to account for spatial dependence, which is assumed to arise from some unobservable latent variable(s) that are spatially correlated [41]. Spatial regression models take many forms. Here a generalized linear mixed model (GLMM) incorporating a variogram model was used to explore the statistical association between HPAI H5N1 cases in wild birds and poultry and environmental factors, to quantify the relative importance of the main environmental factors, and to predict the number of HPAI H5N1 cases in geographical areas [9, 21].

The key elements of a classical linear model are (i) the observations are independent, (ii) the mean of the observation is a linear function of some covariates, and (iii) the variance of the observation is a constant [42]. The extension to generalized linear models (GLM) consists of modification of (ii) and (iii) above; by (ii)’ the mean of the observation is associated with a linear function of some covariates through a link function; and (iii)’ the variance of the observation is a function of the mean [42]. Generalized linear mixed models (GLMM) are natural extensions of GLM and linear mixed models that allow for additional components of variability due to latent random effects [43].

The Poisson log-linear mixed model was used in this research. The Poisson distribution is often used to model responses that are counts [42]. Suppose that, given the random effects \( \alpha \), the counts \( y_{1} \ldots y_{n} \) are conditionally independent such that

$$ y_{i} |\alpha \sim Poisson(\lambda_{i} ) $$
(1)
$$ \log (\lambda_{i} ) = x^{\prime}_{i} \beta + z^{\prime}_{i} \alpha $$
(2)

where \( x^{\prime}_{i} \) and \( z^{\prime}_{i} \) are known vectors, \( \beta \) is a vector of unknown parameters (the fixed effects), and \( \lambda_{i} \) is the expected number of occurrences during the given interval [42].
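As a worked illustration of formulas (1) and (2), the sketch below evaluates the conditional mean \( \lambda_{i} \) and the Poisson log-probability of a count for one observation. All numerical values, and the function names, are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math

def poisson_mean(x, beta, z, alpha):
    """Formula (2): lambda_i = exp(x_i' beta + z_i' alpha)."""
    eta = sum(a * b for a, b in zip(x, beta)) + sum(a * b for a, b in zip(z, alpha))
    return math.exp(eta)

def poisson_log_pmf(y, lam):
    """Formula (1): log P(y_i = y | alpha) for a Poisson count with mean lam."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)

# Hypothetical covariates and effects for a single geographical unit.
x = [1.0, 0.5]       # fixed-effect covariates
beta = [0.2, -0.4]   # fixed-effect coefficients: x' beta = 0.2 - 0.2 = 0
z = [1.0]            # random-effect design vector (random intercept)

lam_no_effect = poisson_mean(x, beta, z, alpha=[0.0])          # exp(0) = 1
lam_with_effect = poisson_mean(x, beta, z, alpha=[math.log(2)])  # exp(log 2) = 2

print(lam_no_effect, lam_with_effect)
print(poisson_log_pmf(0, lam_no_effect))  # log P(y = 0 | lambda = 1) = -1
```

The example shows how a positive random effect raises the expected count multiplicatively on the original scale, because the effects are additive on the log scale.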

Here, glmmPQL() in the MASS package of R was used to run a GLMM [44, 45]. The Poisson log-linear model with a random intercept estimated through the PQL method can be written as:

$$ \ln (\lambda_{i} ) = \beta_{0} + x_{i} \beta_{i} + b_{i} $$
(3)

where \( \lambda_{i} \) is the expected number of avian influenza cases in geographical unit \( i \), \( \beta_{0} \) and \( \beta_{i} \) are the unknown parameters for the fixed effects, \( x_{i} \) are the environmental covariates, and \( b_{i} \) are the random effects, with the distributional assumption:

$$ b_{i} \sim N(0,\sigma^{2} ) $$
(4)

Formula (4) means that the random effects of \( b_{i} \) are normally distributed with a mean of 0 and a variance of \( \sigma^{2} \).

A GLMM was chosen here because it is not only one of the fundamental tools in the analysis of longitudinal data in epidemiology [44, 45], but also allows for a spatial correlation structure through its random effects term [9]. The random effects term \( b_{i} \) is similar to the residual (error) term in classical linear models. Thus, the GLMM incorporates spatial autocorrelation in the residuals through its random effects term. Here, variogram modeling of the spatial autocorrelation in the residuals \( r_{i} \) was used, where:

$$ r_{i} \sim N(\mu ,\sigma^{2} ) $$
(5)
$$ \sigma^{2} = I\sigma_{1}^{2} + F\sigma_{2}^{2} $$
(6)
$$ F = \exp ( - d_{ij} /\rho ) $$
(7)

where the residuals \( r_{i} \) of the GLMM (formula (3)) are normally distributed with mean \( \mu \) and variance \( \sigma^{2} \), \( \sigma_{1}^{2} \) is the nugget of the residuals’ semi-variance, \( \sigma_{2}^{2} \) is the sill, \( \rho \) is the range, \( d_{ij} \) is the lag distance between locations \( i \) and \( j \), and \( I \) is the adjusted coefficient.
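To make formulas (6) and (7) concrete, the sketch below evaluates the exponential term \( F \) and the resulting variance term at several lag distances, exactly as the formulas are written above. The nugget, sill and range values are hypothetical; only the adjusted coefficient of 3.5 is taken from the text, and the function names are illustrative.

```python
import math

def exponential_term(d, rho):
    """Formula (7): F = exp(-d / rho)."""
    return math.exp(-d / rho)

def variance_term(d, nugget, sill, rho, coeff=1.0):
    """Formula (6): coeff * nugget + F(d) * sill, with coeff playing the role of I."""
    return coeff * nugget + exponential_term(d, rho) * sill

# Hypothetical variogram parameters (not the fitted values from this study).
nugget, sill, rho = 1.0, 4.0, 100.0

for lag_km in (0.0, 50.0, 100.0, 500.0):
    print(lag_km, round(variance_term(lag_km, nugget, sill, rho), 4))

# With the adjusted coefficient reported in the text, the nugget part is scaled by 3.5.
adjusted_at_zero = variance_term(0.0, nugget, sill, rho, coeff=3.5)
print(adjusted_at_zero)
```

The spatially structured part decays with distance and is governed by \( \rho \), while the nugget part is constant across lags, so at large lag distances only the scaled nugget remains.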

The final model used to predict the number of HPAI H5N1 cases per geographical area can be written as:

$$ \ln (\lambda_{i} ) = \beta_{0} + x_{i} \beta_{i} + r_{i} $$
(8)

where \( \lambda_{i} \), \( \beta_{0} \), \( x_{i} \), \( \beta_{i} \) and \( r_{i} \) are as in formulas (1)–(7).

3 Results

Table 3 shows that four environmental covariates are significant in predicting the HPAI H5N1 risks in wild birds and poultry in China between 2004 and 2011. The significant covariates are annual mean temperature, poultry density, distance to lakes and wetlands, and distance to bird migration routes. In particular, the estimated coefficient for poultry density is 0.00 to two decimal places and is interpreted as negative, meaning that the number of HPAI H5N1 cases in wild birds and poultry is negatively associated with poultry density. This may be partly explained by the fact that poultry are normally raised in industrialized farms where poultry density is high, and where poultry have been vaccinated and are well managed. In contrast, poultry density is low in rural villages, where poultry are usually raised in open backyards without vaccination. This would imply a rural-urban divide such that poultry raised in backyards in rural areas are more likely to become infected.

Table 3. Effects of environmental variables on the HPAI H5N1 cases in wild birds and poultry in China 2004–2011

There were 111 sample data points altogether in this research, 96 of which were chosen randomly and used to fit the GLMM, while the remaining 15 were used for validation. After the regression, a variogram was fitted to the residuals of the 96 training points to test for spatial autocorrelation. The semi-variance rises steadily as the lag distance increases from 0 to about 100 km, and then levels off at lag distances greater than 100 km (Fig. 4). This indicates that the residuals are spatially autocorrelated at lag distances between 0 and 100 km, and that no spatial autocorrelation is evident at lag distances greater than 100 km.
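The residual check described above relies on the empirical semi-variogram, \( \hat{\gamma}(h) = \frac{1}{2N(h)} \sum (r_{i} - r_{j})^{2} \), where the sum runs over the \( N(h) \) pairs of points whose separation falls in the lag bin \( h \). The following minimal sketch computes it for equal-width bins; the coordinates, residuals and bin width are hypothetical, and the model fitting step (e.g., a spherical or exponential curve) is omitted.

```python
import math

def empirical_semivariogram(coords, residuals, bin_width, n_bins):
    """Semi-variance per lag bin; None marks bins that contain no point pairs."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            b = int(math.dist(coords[i], coords[j]) // bin_width)
            if b < n_bins:
                sums[b] += (residuals[i] - residuals[j]) ** 2
                counts[b] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

# Four hypothetical sample points on a line (km) with made-up model residuals.
coords = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]
residuals = [1.0, 1.5, 3.0, 4.0]

gamma = empirical_semivariogram(coords, residuals, bin_width=10.0, n_bins=4)
print(gamma)  # bins cover [0, 10), [10, 20), [20, 30), [30, 40) km
```

A semi-variance that rises with lag distance and then flattens, as in this toy example and in Fig. 4, indicates that nearby residuals are more alike than distant ones up to the range at which the curve levels off.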

Fig. 4. Isotropic variogram of fitted spherical model for the 96 sample data’s residuals

Root mean square error (RMSE) was used for validation. The RMSE for the remaining 15 samples is 11.56 cases per 10 km × 10 km pixel when the adjusted coefficient \( I \) in formula (6) is 3.5. These validation statistics suggest that the final spatial regression model, a GLMM incorporating variogram modeling, has good predictive ability for HPAI H5N1 cases in geographical areas.
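The hold-out procedure can be sketched as follows: a random split of 111 samples into 96 training and 15 validation samples, and the RMSE, \( \sqrt{\frac{1}{n}\sum (y_{i} - \hat{y}_{i})^{2}} \), computed on the validation set. The random seed, the function names and the toy observed and predicted counts are hypothetical; the actual split used in this research is not reproduced here.

```python
import math
import random

def split_indices(n_total, n_train, seed=0):
    """Randomly split sample indices into training and validation sets."""
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)
    return sorted(idx[:n_train]), sorted(idx[n_train:])

def rmse(observed, predicted):
    """Root mean square error between observed and predicted case counts."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

train_idx, valid_idx = split_indices(n_total=111, n_train=96)
print(len(train_idx), len(valid_idx))

# Toy counts for three validation pixels (not data from this study).
toy_rmse = rmse(observed=[3, 0, 5], predicted=[1, 0, 2])
print(toy_rmse)
```

RMSE is expressed in the same units as the response (here, cases per pixel), so it is best judged against the typical magnitude of the observed counts.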

Moreover, the relative risks of HPAI H5N1 in wild birds and poultry in China were classified according to the predicted number of HPAI H5N1 cases in geographical areas (Fig. 5). The risk maps generated from the model show a heterogeneous distribution and, importantly, the risk of HPAI H5N1 in wild birds and poultry in China was found to vary strongly across regions. The highest predicted relative risk of HPAI H5N1 in wild birds and poultry occurs mainly in Northwest, Central and Southwest China, which are very near the Middle-Asia India bird migration route (Fig. 5). Another area of high predicted relative risk occurs in Southeast China, which is near the Western Pacific bird migration route (Fig. 5). This implies that wild birds and bird migration may play an important role in the spread of the HPAI H5N1 virus among wild birds and poultry in China.

Fig. 5. Predicted relative risk of the HPAI H5N1 in wild birds and poultry in China 2004–2011

4 Discussion

Risk modeling treats the entire transmission cycle as a black box, and focuses on the spatial position and environmental characteristics of sites where humans or poultry contract the disease [46]. As such, independent testing and repeated challenging of models to be predictive and general are central to this application of risk modeling [17]. In this research, GLMM incorporating variogram modeling was used to predict the number of HPAI H5N1 cases in geographical areas. The model was limited by the data available, which were themselves limited to those events that have occurred in the last decade. Given more data, it is possible that a greater number of environmental covariates may be seen to affect the number of HPAI H5N1 cases in wild birds and poultry in China, and more sample data would be available to use in validation. A traditional problem with risk distribution maps predicted by statistical models, based on linking the presence/absence of a disease or species to a series of predictors, is that they often lose much of their predictive power when extrapolated outside of the spatial range of the training data, which makes external validation difficult [16]. This problem exists in the present research also and, therefore, caution is warranted when extrapolating the results beyond the present spatial and temporal domains.

Another problem that cannot be avoided is how to choose environmental covariates and how to represent their effects on the number of HPAI H5N1 cases in wild birds and poultry, because the processes including environmental factors influencing the spread of HPAI H5N1 virus are not clearly understood [13]. Here, we have chosen ten well defined environmental covariates, but the analysis may benefit from addition of other covariates and other representations (e.g., the role of live bird markets, cropping intensity etc.). Moreover, environmental covariates have temporal variability and this could lead to temporal variability in HPAI H5N1 virus cases. Thus, more effective environmental covariates and their interactions, as well as temporal variability could be taken into account in future research.

5 Conclusion

This research has adopted an integrated spatial regression model to explore the associations between the number of HPAI H5N1 cases in wild birds and poultry in China and ten environmental covariates, so as to predict HPAI H5N1 risk in different geographical areas. This spatial regression model comprises a GLMM including a variogram model term, allowing a quantitative analysis of the effects of environmental covariates and spatial dependence in the HPAI H5N1 incidence residuals. The model is promising because it has a simple structure and good predictive capability. Thus, it can be applied to risk modeling of other subtypes of avian influenza and other diseases where spatial autocorrelation persists in model residuals.

The spatial regression model applied to risk modeling of HPAI H5N1 in wild birds and poultry in China has produced some interesting results. Four environmental covariates were significantly associated with the number of HPAI H5N1 cases in wild birds and poultry. These four covariates were annual mean temperature, poultry density, distance to lakes and wetlands, and distance to bird migration routes. Predicted high risk areas were identified in Northwest, Central, Southwest and Southeast China. These high risk areas lie along two bird migration routes: the Middle-Asia India Route and the Western Pacific Route. This implies that wild birds and bird migration may play an important role in outbreaks of HPAI H5N1 in China. Further research should be undertaken to explore these findings, with the possible goal of targeting these geographical regions for future surveillance and control.