1 Introduction

Environmental problems, especially air pollution caused by rapid urbanization, industrial growth, and vehicle emissions, adversely affect public health. Air pollution correlates directly with infectious diseases (Luo et al. 2020) and with respiratory diseases such as asthma and tuberculosis (TB) (Gorai et al. 2016; Jiandong et al. 2020). In 1952, the London smog episode resulted in almost 12,000 deaths due to atmospheric stagnation and elevated air pollution concentrations (Lee et al. 2016; Bert and Stephen 2002). The Air Quality Index (AQI) measures air pollution levels (Weichenthal et al. 2014). It depends on the average concentration of a pollutant, such as particulate matter (PM), measured over a specific time interval. As the AQI rises, the population experiences increasingly severe adverse health effects, and each AQI range is assigned a color code that reflects the severity of those effects.

The primary pollutant that contributes to air pollution is particulate matter, and a rise in PM concentrations has a negative impact on human health (Strak et al. 2010). Particulate matter has two subcategories: coarse particulate matter (PM10) and fine particulate matter (PM2.5). PM10 consists of inhalable particles with diameters of generally 10 µm or less, and PM2.5 consists of fine inhalable particles with diameters of generally 2.5 µm or less. PM2.5 is more dangerous than PM10 because particles this small (< 2.5 µm) can travel deep into the respiratory tract and cause breathing problems. It also remains in the atmosphere much longer, which prolongs exposure. PM2.5 concentrations correlate positively with asthma, pulmonary infections, and pneumonia. The particles enter through the nose and mouth and deposit in the respiratory tract, and exposure to high PM2.5 concentrations may worsen asthma symptoms (Hao and Linyu 2018; Jennifer et al. 2010). There is evidence that PM2.5 is increasing asthma among children and the elderly (Robert and Kazuhiko 2010; Ko et al. 2007).

Respiratory conditions like asthma are impacted by the effects of climate change on air pollution levels (Renato et al. 2010). Pathogens enter the respiratory system through ingestion, inhalation, contact, and iatrogenic transmission. During dust storms, small virus particles are suspended in aerosol and can travel through the air on airborne particles (Sutyajeet et al. 2016). The influenza virus survives for days on surfaces and for hours in aerosol; 21% of the virus is transmitted through aerosol, 29% by close contact between individuals, and 50% via contact with surfaces (Antonio et al. 1996). Asthma outbreaks are caused by air pollution and by meteorological parameters like high temperatures, precipitation, air pressure, and humidity (Iha et al. 2016; Killerby et al. 2018). Multiple viruses are affected by temperature and relative humidity, and influenza infection is inversely correlated with both (Lipsitch 2020).

A recent investigation examined the relationship between particulate matter and communicable disease fatality rates (Yao et al. 2020). Exposure to particulate matter may increase infectious respiratory mortality. Pollution partially accounted for lung infection-related mortality in Italy as of March 2020 (Conticini et al. 2020). Infectious respiratory diseases are strongly influenced by meteorological conditions. Given seasonal variations in meteorological conditions, there is no established quantitative association between health risk and the air quality index. Meteorological parameters such as temperature and precipitation are strongly correlated with virus transmission and viability. In addition, the virus transmits more efficiently in winter than in summer (Killerby et al. 2018; Charkraborty et al. 2021). The management of respiratory disorders still faces many challenges, including underdiagnosis and overdiagnosis, high mortality rates, and high costs associated with exacerbations (Exarchos et al. 2020). This study’s main objective is to investigate changes in respiratory infections related to particulate matter and to predict trends in respiratory casualty.

Traditional models such as Markov chains, the autoregressive integrated moving average, and regression models struggle to predict respiratory casualty: their assumptions are hard to satisfy, they tend to overfit, and their loss increases. These models also require long observation periods and complete datasets to obtain unbiased estimates. Deep learning models such as artificial neural networks (ANN) have made it possible to analyze and model complex health problems. These models do not require prior knowledge and are independent of the data’s statistical distribution. Using data released by the Indian government, we attempted to predict respiratory casualty caused by air pollution. We applied an improved deep learning algorithm to identify respiratory infections caused by various viruses and to predict respiratory casualty. We then verified the performance of the proposed algorithm and compared it with existing deep learning algorithms that use default regularisation.

1.1 Motivation

Air pollution causes many environmental and health issues, including cardiovascular disease, respiratory disorders, and early death. Anthropogenic activity releases hazardous gases and particles into the atmosphere, causing emergencies. Humans exposed to air pollution over an extended period are susceptible to lung cancer, asthma, chronic obstructive pulmonary disease, and other respiratory health issues. Accurate predictions help to issue early warnings when pollutant levels become hazardous. Prediction, tracking, early alerts for hazardous air quality, and preventative measures help manage pollution and improve air quality through environmental regulations and pollution control techniques. These actions also contribute to environmental sustainability. With precise predictions of air pollution levels, the public and the government can take the necessary actions.

1.2 Research contributions

  (1) Applied the enhanced ANN model to estimate the respiratory infection risk.

  (2) An improved regularization function is proposed to estimate accurate values and reduce the loss of estimation of the respiratory infection risk.

  (3) Extensive experiments were conducted on publicly available air pollution datasets, meteorological data, and respiratory infection data.

2 Related work

Burnett et al. (2018) discuss mortality risk factors, focusing on exposure to outdoor fine PM. They use the Global Exposure Mortality Model to calculate and analyse risk functions under exposure to elevated outdoor PM2.5 concentrations. Depending on the duration of exposure to fine particulate matter, air pollution is a substantial environmental risk factor for disability and cardiovascular death (Rajagopalan et al. 2018). Lelieveld et al. (2019) address empirical studies that argue for reevaluating the disease burden by quantifying the reduction in life expectancy caused by air pollution. Ferronato and Torretta (2019) discussed the significant impact of improper waste management: pollutant discharge contaminates land, water, and air and causes serious health problems, and waste that is improperly disposed of, treated, or managed becomes a vehicle for disease transmission. Wu et al. (2020) examined the possibility that long-term exposure to air pollution increases COVID-19-related health problems and mortality.

Pye and Nenes (2020) discussed the acidity of atmospheric particles and vapours and its effects on human health. According to Wu et al. (2020), air is the most fundamental medium for transmission, so it is important to understand how these respiratory viruses spread through various channels. Exposure to droplets and aerosols also affects human-to-human transmission by speeding up infection rates and inducing death (Shiu et al. 2019).

The risk of health deterioration is considerable as a result of both direct and indirect exposure to contaminated air (Weichenthal et al. 2014). Strak et al. (2010) concluded that there is a correlation between polluted air and disorders of the heart and lungs. Increased exposure to air pollution, regardless of the source, raises health hazards. A rise in PM2.5 exposure causes inflammatory bowel disorders, connective tissue diseases, and an increased risk of rheumatoid arthritis (Adami et al. 2022). Exposure to air pollution raises the risk of chronic conditions like asthma or lung cancer and of psychiatric conditions like schizophrenia and depression (Newbury et al. 2021). Long-term exposure to fine particles alters the functioning of the bloodstream, affecting how well the heart works. Additionally, coronary arteriosclerosis has been associated with compounds generated by traffic emissions (Hoffmann et al. 2007).

3 Materials and methods

3.1 Data collection

We collected daily data on air pollution, meteorological conditions, and respiratory infection cases (https://cpcb.nic.in; https://main.mohfw.gov.in/documents/Statistics; https://www.covid19india.org/). This study used the fine particulate matter concentration from August 31, 2018, to August 31, 2021 (https://cpcb.nic.in) and considered 8760 h. We divided the collected records into two parts: a training set (70%) and a test set (30%). The prediction targets were the numbers of respiratory infection, influenza, pneumonia, and SARS-CoV-2 cases. These case counts were extracted from the public domain (https://main.mohfw.gov.in/documents/Statistics), managed by the Indian Government Health Cooperation. We considered the number of cases in India from January 1, 2020, to December 31, 2021.
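The 70/30 division described above can be sketched as follows. The source does not say whether the records were split chronologically or at random; this sketch assumes a chronological split, which is a common choice for time-ordered measurements. The function name `split_train_test` and the placeholder data are hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(records: Sequence[T],
                     train_fraction: float = 0.7) -> Tuple[List[T], List[T]]:
    """Split time-ordered records into a leading training part and a trailing test part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    # Round to the nearest record so floating-point error cannot shift the cut.
    cut = round(len(records) * train_fraction)
    return list(records[:cut]), list(records[cut:])


# Placeholder indices standing in for the 8760 hourly records (not real data).
hourly_records = list(range(8760))
train, test = split_train_test(hourly_records)
print(len(train), len(test))
```

Keeping the test set strictly after the training set in time avoids evaluating the model on hours that precede its training data.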

3.2 Methods

First, to establish a basis for prediction, we examined the association between the independent and the dependent variables. To build the forecast model for respiratory infection cases, the ANN with enhanced regularisation first analyses the association between the dependent and the independent features. We then constructed the ANN model and evaluated the association between environmental features and respiratory infection cases. The approach used in the study is shown in Fig. 1.

Fig. 1 Flow of the respiratory disease mortality assessment from air pollution

3.3 Pre-processed data

There are some missing values, which were filled using the average of all the values for the particular feature.
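A minimal sketch of this mean imputation for a single feature is given below. How missing values are encoded in the source data is not stated, so the sketch assumes they appear as `None` or NaN; the function name and the sample readings are hypothetical.

```python
import math
from typing import List, Optional


def fill_missing_with_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None or NaN) with the mean of the observed entries."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    if not observed:
        raise ValueError("cannot impute a feature with no observed values")
    mean = sum(observed) / len(observed)
    return [mean if (v is None or math.isnan(v)) else v for v in values]


# Made-up PM2.5 readings with two gaps, for illustration only.
pm25 = [40.0, None, 60.0, float("nan"), 50.0]
filled = fill_missing_with_mean(pm25)
print(filled)
```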

3.4 Enhanced regularization

The ANN model used in this study consists of three kinds of layers: an input layer for data input, multiple hidden layers, and an output layer that produces the final result. The nodes of each layer are linked step by step to the nodes of the next, up to the output layer. Every connection has a weight. The error measured at the final output layer is backpropagated through the layers, and gradient descent uses it to update and optimise the weights. The enhanced regularization reduces the risk of overfitting the dataset by decreasing the overall size of the weights in the ANN model. Added to the loss function, the proposed regularization enhances the performance of the existing solution by removing the unwanted neurons and keeping only the essential ones, which minimises overfitting to the training set and speeds up processing.
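To make the layer structure concrete, the following sketch computes a forward pass through a small feedforward network. It is an illustration under stated assumptions, not the authors' architecture: it uses a single hidden layer for brevity, a sigmoid activation, and a linear output, none of which are specified in the source. All names and numbers are hypothetical.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Logistic activation (an assumed choice; the source names none)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs: List[float], weights: List[List[float]],
          biases: List[float]) -> List[float]:
    """Weighted sums for one layer; weights[j][i] links input i to neuron j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def forward(inputs: List[float], w_hidden: List[List[float]], b_hidden: List[float],
            w_out: List[List[float]], b_out: List[float]) -> float:
    """Input layer -> one sigmoid hidden layer -> single linear output neuron."""
    hidden = [sigmoid(z) for z in dense(inputs, w_hidden, b_hidden)]
    return dense(hidden, w_out, b_out)[0]


# Hypothetical example: two input features, two hidden neurons, one output.
prediction = forward(
    inputs=[0.3, 0.7],
    w_hidden=[[0.0, 0.0], [0.0, 0.0]],
    b_hidden=[0.0, 0.0],
    w_out=[[1.0, 1.0]],
    b_out=[0.0],
)
print(prediction)
```

With all hidden weights at zero, each hidden neuron outputs sigmoid(0) = 0.5, so the output is simply the sum of the two output weights times 0.5.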

Without this regularization, the weights continue to grow with every iteration, which slows the learning of the ANN model. This growth degrades the prediction and makes the neural network more complex.
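The effect of a weight penalty on the update step can be illustrated with standard L2 weight decay. This is a swapped-in textbook technique used here only to show the mechanism, not the improved regularization function of this study. The learning rate `lr` and coefficient `lam` are hypothetical values.

```python
from typing import List


def sgd_step(weights: List[float], grads: List[float],
             lr: float = 0.1, lam: float = 0.0) -> List[float]:
    """One gradient descent step; the L2 penalty (lam / 2) * w**2 adds lam * w to each gradient."""
    return [w - lr * (g + lam * w) for w, g in zip(weights, grads)]


# With zero data gradient, the penalty alone pulls every weight toward zero.
weights = [1.0, -2.0]
decayed = sgd_step(weights, grads=[0.0, 0.0], lr=0.1, lam=0.5)
unchanged = sgd_step(weights, grads=[0.0, 0.0], lr=0.1, lam=0.0)
print(decayed, unchanged)
```

Because the penalty gradient is proportional to the weight itself, large weights shrink faster than small ones, which counteracts unbounded growth across iterations.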

Regularization widens the applicability of the model and enhances its performance by maximizing the regularization coefficient’s value to provide a model that fits the data well. The improved regularization term is added to the loss function, and the regularization coefficient determines how important this term is. The general form is represented as

$$ \text{Result} = \text{loss}(\text{truth}, \text{prediction}) + \text{regularization term} $$
$$ \text{Improved Regularization Function (IRF)} = \sqrt[n]{a} \cdot a^{3} $$
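The two expressions above can be combined in a short sketch. The source does not define the symbols a and n, so the code makes explicit assumptions: a is taken to be the absolute value of a single weight, n is a positive integer hyperparameter, the loss is mean squared error, and the penalty is scaled by a coefficient `lam`. These choices are illustrative and may differ from the authors' implementation.

```python
from typing import Sequence


def irf(a: float, n: int) -> float:
    """Literal reading of the IRF formula: the n-th root of a, multiplied by a cubed."""
    if a < 0 or n < 1:
        raise ValueError("this sketch assumes a >= 0 and n >= 1")
    return (a ** (1.0 / n)) * (a ** 3)


def mse(truth: Sequence[float], prediction: Sequence[float]) -> float:
    """Mean squared error (an assumed choice of loss)."""
    return sum((t - p) ** 2 for t, p in zip(truth, prediction)) / len(truth)


def regularized_loss(truth: Sequence[float], prediction: Sequence[float],
                     weights: Sequence[float], n: int = 2, lam: float = 0.01) -> float:
    """Result = loss(truth, prediction) + lam * sum of IRF(|w|, n) over the weights."""
    penalty = sum(irf(abs(w), n) for w in weights)  # assumption: a = |w|
    return mse(truth, prediction) + lam * penalty


# Hypothetical numbers: a perfect fit, so only the penalty contributes.
result = regularized_loss([1.0, 2.0], [1.0, 2.0], weights=[1.0, -1.0], n=2, lam=0.5)
print(result)
```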

Regularization is a technique that calibrates the neural network in order to minimize the loss function and prevent overfitting or underfitting. It improves the performance of the learned model parameters on new or unseen data. Even with a huge dataset, the available data is small compared to the quantity of data the model can absorb. Due to the lack of new data, the model may overfit and become adept at representing only the training data (Yao et al. 2020). It may then perform poorly on new data, i.e., test data. This issue arises when the model has trained so exhaustively on the available training data that it is unable to generalize to fresh data.

Regularisation enables deep learning algorithms to perform effectively with different inputs and is specifically intended to reduce error on both training and test data. The neural network’s loss function measures how far the forecast values differ from the observed values. Adding the regularization term to the loss function determines how strongly the penalty influences training. The regularisation removes the unwanted neurons and works with the essential ones, and disabling and discarding the undesirable neurons reduces the processing time.
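The source does not state the criterion by which neurons are judged unwanted. One common heuristic, assumed here purely for illustration, is to discard hidden neurons whose incoming weights have all been driven close to zero by the penalty. The threshold and the weight values below are hypothetical.

```python
from typing import List


def active_neurons(incoming_weights: List[List[float]],
                   threshold: float = 1e-3) -> List[int]:
    """Indices of neurons whose largest absolute incoming weight exceeds the threshold."""
    return [j for j, row in enumerate(incoming_weights)
            if max(abs(w) for w in row) > threshold]


def prune(incoming_weights: List[List[float]],
          threshold: float = 1e-3) -> List[List[float]]:
    """Keep only the weight rows of neurons that are still active."""
    return [incoming_weights[j] for j in active_neurons(incoming_weights, threshold)]


# incoming_weights[j] holds the weights feeding hidden neuron j (made-up values).
hidden_weights = [[0.5, -0.2], [0.0001, -0.0002], [0.0, 0.3]]
kept = active_neurons(hidden_weights)
print(kept, prune(hidden_weights))
```

Fewer active neurons mean fewer multiplications per forward pass, which is one way a penalty on weight size could translate into shorter processing time.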

4 Experimental results

We collected the daily atmospheric data used in this study from Indian government websites (https://cpcb.nic.in). India has 270 stations for measuring air pollution, and each station’s status is listed by state. These stations continuously and automatically collect hourly data on air quality, and the information is publicly available. The major air pollutants, i.e., PM2.5, PM10, CO, NO2, SO2, and O3, were collected from hourly measurements at the 270 monitoring stations in India from 1 January 2019 to 31 December 2021 (https://main.mohfw.gov.in/documents/Statistics). We collected the meteorological data from Indian government websites (https://main.mohfw.gov.in/documents/Statistics). The dataset contained pre-processed hourly values of temperature, humidity, wind speed, wind direction, and rainfall. Table 1 tabulates the statistics of the daily pollutant data, and Table 2 shows the statistics of the daily temperature data.

Table 1 Statistics of the daily pollutant data
Table 2 Statistics of the daily temperature data

The first row in each table specifies the total number of samples for the specific feature. The remaining rows specify statistics such as the feature’s mean, standard deviation, maximum, and minimum. Tables 1 and 2 use the same statistical measures.
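The statistics listed in these tables can be computed per feature as in the sketch below. Whether the tables report the sample or the population standard deviation is not stated; the sketch assumes the sample version. The readings are made up for illustration.

```python
import statistics
from typing import Dict, List


def describe(values: List[float]) -> Dict[str, float]:
    """Count, mean, sample standard deviation, minimum, and maximum of one feature."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "max": max(values),
    }


# Made-up daily values for a single feature.
summary = describe([1.0, 2.0, 3.0, 4.0, 5.0])
print(summary)
```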

Figure 2 shows the estimation of asthma due to air pollution in Andhra Pradesh. The total population in Amaravathi is 84,835, and the people affected by asthma are 7353 children, 225 youth, 895 adults, and 6674 elders. The total population in Rajamahendravaram is 561,000, and the people affected by asthma are 4526 children, 262 youth, 674 adults, and 2463 elders. The total population in Tirupati is 729,000, and the people affected by asthma are 73,452 children, 5732 youth, 8456 adults, and 53,631 elders. The total population in Visakhapatnam is 2,138,000, and the people affected by asthma are 35,633 children, 7355 youth, 3365 adults, and 7944 elders.
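The per-city figures above can be turned into totals and population shares with simple arithmetic. The sketch below uses only the numbers quoted in this paragraph; the dictionary layout and function name are hypothetical.

```python
from typing import Dict, Tuple

# (total population, asthma cases by age group), as quoted in the text above.
asthma: Dict[str, Tuple[int, Dict[str, int]]] = {
    "Amaravathi": (84_835, {"children": 7353, "youth": 225, "adults": 895, "elders": 6674}),
    "Rajamahendravaram": (561_000, {"children": 4526, "youth": 262, "adults": 674, "elders": 2463}),
    "Tirupati": (729_000, {"children": 73_452, "youth": 5732, "adults": 8456, "elders": 53_631}),
    "Visakhapatnam": (2_138_000, {"children": 35_633, "youth": 7355, "adults": 3365, "elders": 7944}),
}


def affected_share(population: int, cases_by_group: Dict[str, int]) -> Tuple[int, float]:
    """Total affected people and their share of the population."""
    total = sum(cases_by_group.values())
    return total, total / population


totals = {city: affected_share(pop, groups) for city, (pop, groups) in asthma.items()}
for city, (total, share) in totals.items():
    print(f"{city}: {total} affected ({share:.1%} of population)")
```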

Fig. 2 Estimation of asthma in Andhra Pradesh due to the excess air pollution

Figure 3 shows the estimation of pneumonia due to air pollution in Andhra Pradesh. The total population in Amaravathi is 84,835, and the people affected by pneumonia are 5456 children, 43 youth, 3254 adults, and 5467 elders. The total population in Rajamahendravaram is 561,000, and the people affected by pneumonia are 23,631 children, 537 youth, 17,456 adults, and 26,832 elders. The total population in Tirupati is 729,000, and the people affected by pneumonia are 164,578 children, 1385 youth, 74,743 adults, and 168,486 elders. The total population in Visakhapatnam is 2,138,000, and the people affected by pneumonia are 646,671 children, 3545 youth, 363,456 adults, and 649,574 elders.

Fig. 3 Estimation of pneumonia in Andhra Pradesh due to excess air pollution

4.1 Mortality risk assessment

We evaluated the proposed model and compared its predictive performance with that of other algorithms. Table 3 tabulates the performance measurements of the LSTM, RNN, DNN, and LSTMGRU models, which have high predictive performance. The mean absolute error (MAE) for assessing the occurrence of asthma, influenza, and respiratory infection is approximately 6.55, and the root mean square error (RMSE) is about 6.19.
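For reference, the two error measures named above follow their standard definitions, sketched below. The observed and predicted values are made up and do not reproduce any result in Table 3.

```python
import math
from typing import Sequence


def mae(truth: Sequence[float], prediction: Sequence[float]) -> float:
    """Mean absolute error: the average of |truth - prediction|."""
    return sum(abs(t - p) for t, p in zip(truth, prediction)) / len(truth)


def rmse(truth: Sequence[float], prediction: Sequence[float]) -> float:
    """Root mean square error: the square root of the average squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, prediction)) / len(truth))


# Made-up case counts for illustration only.
observed = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]
print(mae(observed, predicted), rmse(observed, predicted))
```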

Table 3 Results of different classifiers on the test data

To predict a high or low risk of the spread of respiratory diseases due to air pollution, we used the improved ANN model, which assigns each region to one of five categories: high-high, low-low, low-high, high-low, and not significant. High-high identifies areas with a high incidence of disease, and low-low identifies areas with a low incidence of disease. Table 4 tabulates the cumulative mortality rate and its processing time for affected states in India.
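The source does not describe how the five labels are assigned. The labels resemble the quadrants used in local spatial association analysis, where a region's own level is paired with the level of its neighbours. The sketch below assumes that reading, with the first word describing the region and the second its neighbours; the significance flag is supplied by the caller because the test behind it is not specified. All names and numbers are hypothetical.

```python
def association_category(value: float, neighbour_mean: float,
                         overall_mean: float, significant: bool) -> str:
    """Label a region by its own level and its neighbours' level relative to the overall mean."""
    if not significant:
        return "not significant"
    own = "high" if value > overall_mean else "low"
    around = "high" if neighbour_mean > overall_mean else "low"
    return f"{own}-{around}"


# Made-up incidence values for illustration.
print(association_category(120.0, 110.0, 80.0, significant=True))
print(association_category(40.0, 110.0, 80.0, significant=True))
print(association_category(40.0, 30.0, 80.0, significant=False))
```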

Table 4 Cumulative mortality rate for affected states in India

Figure 4 shows the estimation of the mortality risk using the various deep learning models. The performance metrics MAE, RMSE, and r are used to compare the proposed model with the existing deep learning models. For the proposed model, the MAE is 1.10 for registered cases and 1.45 for mortality risk. The RMSE is 4.57 for registered cases and 5.84 for mortality risk. The r value is 0.98 for registered cases and 0.99 for mortality risk.
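The source does not define r; assuming it denotes the Pearson correlation coefficient between observed and predicted values, it can be computed as in the sketch below. The data are made up.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Made-up observed and predicted values for illustration.
print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

A value near 1 means the predictions rise and fall with the observations, but it does not by itself show that the predictions are close in magnitude, which is why r is reported alongside MAE and RMSE.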

Fig. 4 Comparison of the mortality risk of the proposed with the existing models

Table 5 tabulates the comparison of the performance metrics for respiratory infections using the enhanced ANN model. For asthma, the highest accuracy is 99.95% in Jharkhand and the lowest is 98.04% in Rajasthan; the highest sensitivity is 99.85% in Haryana and the lowest is 93.78% in Rajasthan; and the highest specificity is 99.75% in Chhattisgarh and the lowest is 92.65% in Rajasthan. For pneumonia, the highest accuracy is 99.86% in Arunachal Pradesh and the lowest is 93.84% in Dadra and Nagar Haveli; the highest sensitivity is 99.94% in Nagaland and the lowest is 91.17% in Gujarat; and the highest specificity is 99.37% in Haryana and the lowest is 91.46% in Daman and Diu.
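Accuracy, sensitivity, and specificity follow from the counts of a binary confusion matrix, as sketched below using the standard definitions. The counts are made up and are not taken from Table 5.

```python
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Made-up counts: 90 true positives, 80 true negatives, 20 false positives, 10 false negatives.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
print(metrics)
```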

Table 5 Comparison of the performance metrics for respiratory infections using Enhanced ANN model

5 Discussion

Soluble air contaminants are harmful to human health. The movement of air particles from regions of high concentration to regions of low concentration raises pollutant levels there. Finer particles stay in the atmosphere longer than coarser ones. Poor air quality damages the immune and respiratory systems and exacerbates virus replication (Schraufnagel et al. 2019), which raises the risk of hospitalization and death (Urrutia-pereira et al. 2020).

Almost 4.2 million deaths occurred due to air pollution, which damages the body’s organs, especially the respiratory and cardiovascular systems (Forouzanfar et al. 2015). Qin et al. (2020) recognized the association between high mortality and air pollution and demonstrated that long-term exposure to harmful pollutants made people more susceptible to COVID-19. Asthma, cardiorespiratory disorders, and respiratory inflammation are all caused by prolonged exposure to air pollution. Fine particulate matter is one of the main environmental health risk factors, responsible for millions of fatalities worldwide every year (Lelieveld et al. 2015).

Particulate matter adversely affects respiratory diseases (Forouzanfar et al. 2015), and long-term exposure to this pollutant worsens asthma symptoms (Qin et al. 2020). During the flu season, there was a correlation between particulate matter (PM2.5) and influenza cases in Beijing, China (Feng et al. 2016). If viruses adhere to particulate matter, they stay airborne for a long period and can spread through the air (Lindsley et al. 2010). Particulate matter also enhances viral replication in the human respiratory system (Xian et al. 2020).

Exposure to increasing levels of air pollution lowers life expectancy and affects the respiratory and cardiovascular systems (Casee and Newby 2011). Small particles have the highest deposition in the lungs and at extrapulmonary sites, resulting in systemic inflammation (Tseng et al. 2017). Particulate matter functions as a carrier for many viruses and increases the spread of the virus in aerosol because it creates a microenvironment suitable for virus persistence (Setti et al. 2020). Pollution is primarily caused by the harmful effect, and specific particle components provide sustenance for bacteria that serve as carriers (Wei et al. 2020). Cui et al. (2003) showed that regions with a high air pollution index have double the risk of death compared with regions with a low air pollution index and concluded that prolonged exposure to particulate matter might increase the mortality rate. Prolonged exposure to high quantities of particulate matter threatens older adults and compromises lung and cardiovascular tissue (Sancini et al. 2014).

The major drawback of traditional models such as Markov chain models, linear regression, and the autoregressive integrated moving average is that they introduce bias when estimating air pollution-related health impacts like asthma, tuberculosis, and respiratory infections. To overcome the limitations of linear mapping and to prioritize the dataset before predicting future outbreaks, we proposed the improved ANN, which is intended to prevent bias in predicting infection outbreaks due to air pollution. To represent the relationship between environmental factors and infectious respiratory cases, we generated the prediction model using the enhanced ANN. The results showed that the proposed model achieved the highest R2 value, 0.35, outperforming the existing models.

From the results, we observed that the proposed model minimized the computational error in predicting respiratory casualty due to air pollution. One limitation of this study is that we did not include individual-level data, especially demographic data.

6 Conclusion

Virus-laden particles smaller than 5 µm in diameter are related to respiratory infections in individuals, and such particles can remain in the air for hours. Air quality is crucial for sustainable human life. This study proposes a risk assessment model for respiratory diseases that uses improved regularization in an artificial neural network. We conclude that particulate matter and temperature could promote the transmission of asthma, pneumonia, influenza, and SARS-CoV-2. The main benefit of the proposed method is an environment-based early notification system that alerts chronic patients and people in all illness categories. With its parameters suitably adjusted and the corresponding data collected, the model could be applied widely in other countries. There are several limitations to this study. First, we did not include individual-level data, especially demographic data. Second, we cannot assess the rate of hospitalization due to other comorbidities.

6.1 Future work

In the future, the work will be tested on cardiorespiratory datasets, and its computational methods will be improved.