Introduction

The infectious disease COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been one of the most severe pandemics in recent human history. The disease emerged in Wuhan (China) in late 2019. The World Health Organization declared COVID-19 a public health emergency of international concern on January 30th, 2020, and a pandemic on March 11th, 2020 (Srivastava 2021). During 2021 and 2022, affected countries implemented a series of effective measures to cope with the pandemic (Coccia 2023), which corresponded to containment strategies such as quarantines and full lockdowns, as well as mitigation strategies such as the promotion of social distancing, facemask wearing, and school closures (Coccia 2021a). Although the pandemic has been widely controlled in several countries, a series of variants are still circulating worldwide (e.g., delta, kappa, omicron). The virus is mainly transmitted from person to person through direct contact with droplets produced by coughing, sneezing, and talking, as well as by touching contaminated surfaces (Srivastava 2021). Droplets carrying the virus are commonly dispersed through the air over about 2 m, but studies have found that airborne transmission could occur at distances of up to 10 m (Setti et al. 2020). Studies have supported the essential role of environmental factors in COVID-19 transmission and mortality, which also interact with social factors, mobility (Kraemer et al. 2020; Kephart et al. 2021), and the trade market in urban areas (Coccia 2020a; Bontempi and Coccia 2021; Bontempi et al. 2021). The positive effect of population density on COVID-19 transmission has been demonstrated in several countries (Bontempi et al. 2021; Diao et al. 2021), while studies have also suggested a modulating role of socioeconomic status (Mena et al. 2021; Coccia 2022). Additionally, both people’s mobility within cities and trade market activities are highly correlated with COVID-19 transmission (Kraemer et al. 2020; Kephart et al. 2021).
This implies that a suitable socioeconomic condition for the spread of COVID-19 is overcrowded low-income areas in cities with high levels of mobility and trade market interchange. Although population density has been proposed as a main factor for COVID-19 transmission, Coccia (2020b) found that the effect of this factor is modulated by environmental factors such as air pollution and wind speed.

The role of air pollution in COVID-19 has been explored by multiple studies worldwide, revealing contrasting effects. For example, Bashir et al. (2020) found a negative association between PM2.5, PM10, SO2, and NO2 and the number of infected people in California during March and April 2020. In contrast, Menchaca et al. (2021) discovered a positive correlation between PM2.5 and COVID-19-associated mortality in the USA, which coincides with the results of Song et al. (2022) for PM2.5, PM10, SO2, and CO in China. In fact, the literature reviewed by Srivastava (2021) agrees with this positive correlation, which is also identified for NO2, CO, and O3. However, Song et al. (2022) also reported a negative correlation of O3 with COVID-19 infected cases. The medical evidence suggests that pollutant gases could contribute to COVID-19 transmission and mortality by promoting viral replication through increased patient vulnerability (chronic inflammation, as evidenced by Lai et al. 2021), while the lung epithelial and vascular endothelial expression of the angiotensin-converting enzyme-2 (ACE2) receptors, to which SARS-CoV-2 spike glycoproteins bind, tends to increase with pollutants (Lai et al. 2021). Thus, one of the main hypotheses suggests that pollutant gases should increase the transmission and mortality of COVID-19, due to an increase in host vulnerability and virus fatality (Liang et al. 2020). However, a series of studies have suggested that pollutant gases could interact with factors such as temperature, humidity, and wind speed, producing suitable conditions for COVID-19 transmission (Coccia 2021b).

A strong hypothesis about the interaction between air pollutants and weather conditions suggests that stable atmospheric conditions associated with a low wind speed promote COVID-19 by reducing air circulation and increasing pollutant concentration, which allows for extended permanence of viral particles in the air (Coccia 2020b, 2021b). Coccia (2020a, 2021b) also identified a negative association of wind speed with COVID-19 cases at a global scale. In contrast, Coşkun et al. (2021) revealed the opposite pattern for Turkey, proposing that wind speed promotes COVID-19 spread by increasing the circulation of air carrying viral particles. These differences could derive from several sources, including geographic, economic, and social context, highlighting the need to test these contrasting hypotheses. Regarding temperature, Coccia (2020b) proposed that it directly increases wind speed, which would reduce both air pollution and the conditions suitable for COVID-19 transmission. This same negative correlation between temperature and COVID-19 cases was identified in Brazil by Rosario et al. (2020), and by Sarkodie and Owusu (2020) across 20 analyzed countries. Nevertheless, Xie and Zhu (2020) reported a positive correlation in the case of China, which does not coincide with the studies mentioned above. In this sense, Li et al. (2022) propose that season, latitude, and scale could play an important role in determining the direction of the effects of meteorological conditions on COVID-19 transmission. Additionally, the behavior of COVID-19 in China, associated with the effects of air pollutants and weather conditions compared to other countries, as well as its role as the “source zone” of this pathogen, creates a need to expand our knowledge about the early stages of the COVID-19 pandemic.
Therefore, the implications of environmental conditions on the early stage of the COVID-19 outbreak in China can be crucial to improve our preparedness against the emergence of future potential pandemics. In this sense, the present study aims to determine the influence of socioeconomic, infrastructure, air pollution, and weather variables on the relative risk of infection during the initial phase of the COVID-19 pandemic in China.

Methods

Sample and data

The open access COVID-19 infection case database from Xu et al. (2020), updated in near real time, was used as input data. This database provides the geographic location of each case (latitude and longitude), detailing sex, age, travel history, date of symptom onset, and date of confirmation, among other parameters. All the cases confirmed since January 18th, 2020, were used, because earlier data do not provide a specific confirmation date. The cases were associated with urban areas of Chinese cities (n = 122) using the urban surface reported by Long and Huang (2017). This study analyzed all the infected population reported by Xu et al. (2020) in the urban areas mapped by Long and Huang (2017), reaching a total of 30,787 cases analyzed between January 18th and February 29th, 2020.

Measures of variables

Environmental conditions corresponded to four variable subtypes (Table 1): (A) socioeconomic, (B) urban infrastructure, (C) air pollution, and (D) weather. Socioeconomic and infrastructure variables were based on Long and Huang (2017), who provide a detailed database of socioeconomic, urban infrastructure, and vitality variables from government agencies, previous studies, and their own data. Socioeconomic variables included population density and income, while urban infrastructure included total roads, amenities, access to transit, distance to train stations, distance to green areas, distance to hospitals, and human modification index (Table 1). A series of variables related to air pollution and weather were generated using the cloud-based platform Google Earth Engine (Gorelick et al. 2017); air pollution included carbon monoxide and nitrogen dioxide based on the satellite products from the Sentinel-5 Precursor sensor, while weather variables included mean daily temperature, wind speed, and relative humidity (Table 1).

Table 1 Description of predictor variables used in the analysis

Models and data analysis procedure

This study aims to estimate the influence of the previously mentioned variables on the relative risk of COVID-19 contagion. First, the relative risk of COVID-19 infection was estimated on a weekly basis for the initial stage of the outbreak in Chinese cities (6 consecutive weeks) using the Besag, York, and Mollié (BYM) model (Besag et al. 1991) within a Bayesian framework (Lawson et al. 2003). One hundred twenty-two cities were analyzed, considering cities where at least one case was reported during the study period (18th January to 29th February 2020). A zero-inflated Poisson (ZIP) distribution was used in the model due to the excess zeros in the COVID-19 case data (y; see model code available in Supplementary material 1). The ZIP model contained two link functions: first, the probability (\(p_{i,j}\)) of true zeros (i.e., presence of the disease) in city i during the jth week was specified as:

$$\mathrm{logit}\left({p}_{i,j}\right)=\alpha +B\ {\mathrm{week}}_j$$

where α is a global intercept and B describes the temporal increase (or decrease) in the probability that the COVID-19 disease spreads to a city (considering a spatial adjacency matrix). Second, disease count data in each city and week were assumed to follow a Poisson process with mean parameter \({\lambda}_{ij}{E}_i\), where \({E}_i\) is the number of expected cases based on city population. Based on the BYM model, the relative risk (\(RR=\frac{y}{E}\)) of COVID-19 disease was estimated as:

$$\log\left({\lambda}_{ij}\right)={v}_i+{u}_i+\sum_i\beta\ {X}_i+\sum_{ij}\beta\ {Z}_{ij}$$

where \(v_i\) is a spatially unstructured error (i.e., an intercept parameter for each city), and \(u_i\) is a spatially structured error (i.e., correlation of neighboring cities on a spatial adjacency matrix). A set of 14 covariates was included, with their corresponding regression parameters (β), characterizing socioeconomic, urban infrastructure, air pollution, and weather conditions in each city (Table 1); some of them varied over time (\(Z_{ij}\), e.g., temperature), while others were stationary during the study period (\(X_i\), e.g., infrastructure). Non-informative normal priors were used for the uncorrelated spatial heterogeneity, while the correlated spatial heterogeneity was specified with conditional autoregressive (CAR) priors (see prior distributions for all parameters and model fitting in Supplementary material 1). To compare the magnitude of the coefficients, covariates were standardized to have mean zero and unit variance. Correlated variables (\(r_s > 0.7\)) were not included in the same model to avoid collinearity. To implement the CAR model, a geographic adjacency matrix was built through a Dirichlet tessellation process, in which the country was divided into smaller, contiguous, non-overlapping tiles, one per city. Models with different combinations of environmental variables were ranked based on their deviance information criterion (DIC) values. Model fit was determined based on DIC differences with respect to the best-supported model (∆DIC) and by assessing the agreement between the observed and expected values. The model was run by calling OpenBUGS version 3.2.3 from R using the R2OpenBUGS package (Lunn et al. 2009). Model convergence was assessed by examining the plots of posterior parameters and using R-hat statistics. Each model was run using three chains until the chains converged.
The results can be interpreted as a relative risk, that is, the ratio of the probability of an adverse outcome in an exposed group to its probability in an unexposed group: relative risk = (probability of event in exposed group) / (probability of event in unexposed group) (Richardson et al. 2004).
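The two-part structure of the ZIP model described above can be illustrated with a minimal Python sketch. It is not the OpenBUGS code from Supplementary material 1: the function names are hypothetical, the spatial random effects and covariates are collapsed into a single rate multiplier, the probability p is read as the probability that the disease is present in a city-week, and the expected counts are assumed to come from applying the overall case rate to each city's population (the text only states that they are based on city population).

```python
import math

def inv_logit(x):
    """Inverse of the logit link: maps alpha + B * week to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def expected_cases(populations, total_cases):
    """Expected cases per city under uniform risk (assumed definition of E_i)."""
    total_pop = sum(populations)
    return [pop * total_cases / total_pop for pop in populations]

def crude_relative_risk(observed, expected):
    """Unsmoothed relative risk, RR = y / E, for each city."""
    return [y / e for y, e in zip(observed, expected)]

def zip_log_pmf(y, p_presence, lam, expected):
    """Log-probability of count y under a zero-inflated Poisson.

    p_presence: probability that the disease is present (0 < p <= 1).
    lam:        relative-risk multiplier (lambda_ij > 0).
    expected:   expected cases E_i, so the Poisson mean is lam * expected.
    """
    mu = lam * expected
    log_pois = y * math.log(mu) - mu - math.lgamma(y + 1)
    if y == 0:
        # A zero arises either from absence or from a Poisson zero.
        return math.log((1.0 - p_presence) + p_presence * math.exp(log_pois))
    return math.log(p_presence) + log_pois

if __name__ == "__main__":
    # Hypothetical toy data: two cities and their observed weekly counts.
    pops = [1_000_000, 3_000_000]
    observed = [25, 15]
    exp_counts = expected_cases(pops, sum(observed))
    print("expected:", exp_counts)
    print("crude RR:", crude_relative_risk(observed, exp_counts))
    # Hypothetical intercept and weekly slope for the presence probability.
    p_week3 = inv_logit(-1.0 + 0.5 * 3)
    print("log-lik:", zip_log_pmf(observed[0], p_week3, 2.0, exp_counts[0]))
```

In the full model, the multiplier passed as `lam` would be the exponential of the linear predictor containing the unstructured and structured spatial terms and the covariate effects, and inference would be done by MCMC rather than by evaluating a single likelihood term.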

The weekly behavior of air pollution and weather conditions was analyzed across the six studied weeks, showing their temporal patterns and changes.

Results

Geographic distribution of COVID-19 relative risk in China

The provinces with cities that showed the highest relative risk (RR > 10) of COVID-19 disease corresponded to Hubei, Henan, and Guangdong (Fig. 1). Seven cities were identified with the highest relative risk (10 < RR < 78): Xiaogan, Ezhou, Wuhan, Suizhou, Jingzhou, Jingmen, and Xianning, all of them in Hubei province. In addition to these, a total of 17 cities presented a high relative risk (RR > 1), the five most affected being Xiangfan, Fuyang, Yichang, Shiyan, and Karamay. In the coastal zone, the most affected cities (RR > 1) were Putian, Zhongshan, Wenzhou, and Huizhou.

Fig. 1

Map of the estimated median risk per week for Chinese cities with COVID-19 cases from 18th January to 29th February 2020, showing different zoomed zones (A, B, C, D) with the names of cities and provinces. Each analyzed city is represented by a point whose color indicates the estimated relative risk (RR) and whose size represents the total number of cases during the analyzed period

Variables explaining COVID-19 relative risk

No significant effect was found for urban infrastructure variables, except for the human modification index, which presented a positive effect on COVID-19 relative risk (RR) (mean = 0.621; SD = 0.47; R-hat = 1.154; Table 2 and Fig. 2). Air pollution variables presented significant effects on RR: nitrogen dioxide (NO2) had a positive effect (mean = 0.101; SD = 0.015; R-hat = 1.031; Fig. 2), with a relatively low concentration during the period, except for the third week, when it reached a peak (Figs. 2 and 3). Meanwhile, carbon monoxide (CO) exhibited a negative correlation with COVID-19 RR (mean = −0.48; SD = 0.016; R-hat = 1.051; Fig. 2), and its concentration presented a decreasing trend during the studied period in the analyzed cities (Figs. 2 and 3).

Table 2 Mean coefficients of variables in relation to the relative risk of COVID-19 derived from the Bayesian analysis. The table shows the standard deviation across iterations, R-hat statistics, and the level of convergence of the Markov chains, distinguishing between models that converged (R-hat < 1.2 = *) and those that did not converge (R-hat > 1.2 = n/s)
Fig. 2

Bayesian posterior coefficients associated with the effects of weather (left panels) and air pollution (right panels) on the log of the relative risk of COVID-19. Red lines and colored shaded areas represent the Bayesian mean and 95% credible interval of the predicted relative risk. Bayesian estimates of the weekly relative risk values of the 122 cities are included. Subgraphs show the temporal changes of the corresponding variable over the six studied weeks

Fig. 3

Map of weekly pollutant gas concentration in China based on the analyzed satellite images from Sentinel-5 Precursor. The left column shows NO2 concentrations, while the right column corresponds to CO concentration

Regarding temperature, a strong negative effect on COVID-19 relative risk was identified (mean = −1.10; SD = 0.058; R-hat = 1.068; Table 2 and Fig. 2), presenting a sustained increasing trend from weeks 1 to 6. Relative humidity presented a positive correlation with RR of COVID-19 (mean = 0.67; SD = 0.024; R-hat = 1.008; Table 2 and Fig. 2), increasing from weeks 1 to 5 and then stabilizing at week 6. Wind speed presented a negative effect on COVID-19 relative risk (mean = −0.079; SD = 0.023; R-hat = 1.021; Table 2 and Fig. 2) showing an increase until week 4 and then decreasing until week 6.
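Because the coefficients above are estimated on the log scale of the relative risk and the covariates were standardized, one way to read them is as multiplicative changes in RR per one standard deviation increase in the covariate. The short Python sketch below illustrates that arithmetic using the posterior means reported in the text; the helper name `rr_multiplier` is hypothetical, and the calculation assumes the log link holds, keeps all other terms fixed, and ignores posterior uncertainty.

```python
import math

# Posterior mean coefficients as reported in the text (log relative-risk scale).
posterior_means = {
    "temperature": -1.10,
    "relative_humidity": 0.67,
    "wind_speed": -0.079,
    "nitrogen_dioxide": 0.101,
    "carbon_monoxide": -0.48,
}

def rr_multiplier(beta, sd_change=1.0):
    """Factor by which RR is multiplied when a standardized covariate
    shifts by `sd_change` standard deviations, all else held constant."""
    return math.exp(beta * sd_change)

if __name__ == "__main__":
    for name, beta in posterior_means.items():
        print(f"{name}: RR x {rr_multiplier(beta):.3f} per +1 SD")
```

Under this reading, the temperature coefficient of −1.10 corresponds to exp(−1.10) ≈ 0.33, that is, roughly a two-thirds reduction in relative risk per standard deviation of warming, whereas the wind speed coefficient of −0.079 corresponds to a reduction of about 8%.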

Discussion

This study supports the view that air pollution and weather conditions played an important role in the early phase of COVID-19 transmission in China. First, the results suggest that urban infrastructure variables did not have a significant effect on the relative risk of COVID-19. A similar result was reported for China by Diao et al. (2021), who found a non-significant relationship between population density and the spread and decay duration of the first wave. The same study also reported significant relationships in other countries such as Germany, Japan, and England, proposing that this difference is related to the early implementation of strict lockdown policies to control people's movement (Sun et al. 2020; Diao et al. 2021). The human modification index presented a positive correlation with the RR of COVID-19, which suggests that human-derived disturbances are positively associated with COVID-19.

A positive correlation of NO2 with the relative risk of COVID-19 was found in China, which coincides with the results of Zhu et al. (2020), who suggested that short-term exposure to higher concentrations of this pollutant is associated with an increased risk of COVID-19 infection. Travaglio et al. (2021) and Semczuk-Kaczmarek et al. (2022) reported a correlation between NO2 and COVID-19 deaths in England, while Ogen (2020) found the same relationship in Italy, France, Germany, and Spain. In turn, a negative correlation of COVID-19 relative risk with CO concentrations was identified, which contrasts with studies in other countries (Srivastava 2021). However, this pattern could be attributed to the strict lockdown measures adopted by China during the initial phase of the epidemic outbreak, whose effects have also been identified for spread duration (Diao et al. 2021) and mobility (Kraemer et al. 2020). Meanwhile, the positive association of NO2 with COVID-19 infection risk could be linked to the negative effect of this gas on human health, which has been demonstrated to increase the likelihood of respiratory problems (Khaniabadi et al. 2018). One of the main effects of NO2 is the inflammation of the lining of the lungs and the reduction of immunity to lung infections (Dauchet et al. 2018), which would increase the COVID-19 infection rate.

The results suggest that higher temperatures decrease the relative risk of COVID-19 disease (Figs. 1 and 2). This pattern was identified by a global study that analyzed 125 countries (Notari 2021), and by a recent study with data from 455 cities worldwide (Nottmeyer et al. 2023), both concluding that higher temperatures reduce COVID-19 spread. This negative relationship was also identified in Bangladesh by Emdadul and Rahman (2020), in Indonesia by Tosepu et al. (2020), and in Spain by Pérez-Gilaberte et al. (2023). Regarding humidity, a positive correlation was identified, which coincides with a previous study in China (Song et al. 2022) and with a global analysis across 1236 regions from 124 countries (Zhang et al. 2021); in contrast, other evidence proposes a negative correlation (Nottmeyer et al. 2023). These contrasting effects have been argued to be associated with geographical factors such as season and latitude (Li et al. 2022), as well as the implementation of measures to tackle the pandemic (Diao et al. 2021). Finally, our results suggest a negative correlation with wind speed, which coincides with previous studies (Coccia 2020a, b, 2021b; Islam et al. 2021; Rosario et al. 2020). In this sense, the identified effect of temperature and wind speed on the relative risk of COVID-19 in China could derive from the influence of atmospheric stability, which promotes transmission (Coccia 2021b). This is also in agreement with the positive association of COVID-19 relative risk with NO2, which has a higher density than CO (Crutzen 1979), a gas that did not present a positive correlation. Thus, the results obtained by this study are consistent with the hypothesis that atmospheric stability promotes COVID-19 transmission (Coccia 2021b).

Conclusions

This study contributes valuable information about the relationship of urban infrastructure, socioeconomic, air pollution, and weather variables with COVID-19 during the early stage of the pandemic in the first affected country. The methodology relied on a Bayesian approach, which provides an improvement over other methods, such as descriptive statistical analyses (correlation indices or regression analyses; Nazia et al. 2022). Bayesian methods provide advantages for the spatial modeling of infectious diseases because spatial units are heterogeneous and dependent at the same time, while the factors determining risk also covary (Hong et al. 2021).

However, some assumptions and limitations of this study should be noted to ensure a proper understanding of the results. Studies have suggested that the direction and significance of the effects of air pollution and weather on COVID-19 could vary widely with season, latitude, and scale of analysis (Li et al. 2022). Thus, considering the specific context of each country when analyzing this type of dynamics related to COVID-19 is highly recommended. In turn, published evidence also suggests that China presents a dissimilar behavior in the relationship between air pollution and COVID-19, which is related to the strict lockdown policies implemented by the government (Coccia 2022, 2023). Therefore, this result should be interpreted carefully, because the relationship between CO and the relative risk of COVID-19 may result from those governmental actions (Cole et al. 2020; Singh et al. 2020).

Regarding the analyzed hypotheses, the results show that NO2 had a positive effect on COVID-19 in China, but CO did not present the same trend. Meanwhile, it is highly possible that atmospheric stability played an important role in the early stage of COVID-19 in China, given the configuration of the identified effects of wind speed, temperature, and NO2. We argue that the implementation of policies focused on reducing air pollution could hinder the spread and transmission of infectious diseases in future pandemics, as shown by the evidence presented here for the case of COVID-19. In this sense, improving the quality of urban environmental conditions may contribute to better preparedness against future pandemics. Finally, we believe that our results can be useful for understanding the environmentally driven initial dynamics of COVID-19, informing potential effects of future pandemics.