Introduction

As air pollution has evolved into a global issue, studies on its impact on human health are continuously being conducted [1, 2]. Air pollutants are known to cause various diseases. In particular, criteria air pollutants, including particulate matter, ozone, NO2, and SO2, are widely distributed in the general atmosphere, and long-term exposure to high concentrations is reported to be strongly related to an increased occurrence of respiratory [3, 4], central nervous system [5, 6], cerebrovascular [7, 8], and cardiovascular [9, 10] diseases.

The eye is an organ that is directly exposed to the atmosphere and, thus, to air pollution. Some epidemiological studies have reported that both long-term and short-term exposure to polluted air can cause symptoms such as itching and foreign body sensation [11,12,13,14]. Keratitis and conjunctivitis, which are representative eye diseases, are inflammations of the cornea, the conjunctiva, and the tissues surrounding the external surface of the eyeball [15, 16]. These eye diseases are known to be caused by allergic inflammation due to severe air pollution [17, 18]. A recent study indicated a strong correlation between air pollution and a rise in the number of eye-related outpatient visits [19]. The number of emergency room visits by conjunctivitis and keratitis patients has been reported to increase in the presence of high concentrations of air pollutants [20, 21]. In addition, damage to the ocular surface by air pollutants has been reported to induce tear film instability, leading to dry eye syndrome (DES) [22,23,24]. One study investigated the association between DES and SO2 concentrations [25], and another examined the link between outpatient visits for conjunctivitis and SO2 concentrations [26]. The association between O3 levels and DES has also been examined, and elevated O3 concentrations and reduced atmospheric moisture content were linked to DES patient visits in the Korean population [27]. Although many studies have sought correlations between individual air contaminants and eye diseases, only a few have examined how the occurrence of eye diseases varies with various standard air pollutants, sex, and regional air quality characteristics.

In 2018, Seo et al. developed a predictive model for conjunctivitis by analyzing weekly averages of air pollutants and meteorological factors in Seoul together with conjunctivitis incidence rates from 2011 to 2013 [28]. A feature of this predictive model is that it is a multilevel regression model with sex and age as independent variables. In 2020, Youn et al. developed a prediction model for the DES incidence rate using nationwide monthly air pollutant and meteorological data for Korea [29]. That study focused on understanding the link between the population rate across different age groups and the incidence rate of DES. By incorporating the population rate as an independent variable, the precision of the model was significantly improved. Although the model of Youn et al. [29] identified the correlation between baseline air pollutants and DES, it did not designate sex as an independent variable. Since epidemiological studies have established a connection between environmental elements and eye diseases, there is a need for a model that can forecast the occurrence of keratitis, conjunctivitis, and DES resulting from air pollution, to aid public health research.

In this study, a previously developed DES incidence rate prediction model was extended into keratitis and conjunctivitis prediction models. In addition, we analyzed the air pollutants and meteorological factors corresponding to the eye disease incidence data, developed an environmental eye disease incidence rate prediction model based on the nationwide prediction model, and analyzed regional deviations in this model.

Methodology

Record of Hospital Visits

The Health Insurance Review and Assessment Service (KHIRAS) supplied the hospitalization data for keratitis, conjunctivitis, and DES in Korea from January 2002 to December 2013. Based on the disease codes, data were obtained for a total of 5,874,259 patients across all eye diseases, including 347,635, 653,087, and 503,180 patients with keratitis, conjunctivitis, and DES, respectively. The ophthalmic outpatient data were classified by the Korea Institute of Health based on diagnosis history, without patients’ personal information. More than 95% of public health insurance information was acquired by the National Health Insurance Service (NHIS) of Korea [30].

Prediction Model Development

For the development of the model, monthly averages of both environmental and eye disease outpatient data were utilized. A prediction model based on daily averages might be influenced by weekend and holiday data and would therefore need to incorporate variables associated with the day of the week. To avoid this, data were examined on a weekly or monthly scale, which allows day-of-week variables to be omitted. In a previous study using three years of data, weekly averages were used because the number of data points obtained using monthly averages was small [28]. However, as this model utilized 14 years (2002–2015) of statistical data, it was appropriate to use monthly averages to minimize errors arising from delays in hospital treatments due to weather conditions. In addition, monthly averages were more appropriate than annual or weekly averages for analyzing seasonal effects on the incidence of eye diseases. The incidence rate of eye diseases is given by Eq. (1) and was calculated separately at the nationwide and regional levels.

$$\alpha =\frac{\text{number of }\,\beta }{\gamma }$$
(1)

Here, α is the incidence rate, β denotes the outpatients with each eye disease (keratitis, conjunctivitis, or DES), and γ is the population, i.e., the total number of individuals living in the administrative district. Initially, a correlation analysis was conducted to determine the factors most significantly affecting the number of patients with keratitis, conjunctivitis, and DES. This was achieved by employing the monthly averages (nationwide) for disease incidence, levels of air pollutants, and meteorological elements. We created a model to predict the nationwide incidence rates of keratitis, conjunctivitis, and DES by applying a general regression model to the results of the analysis. Then, models for the administrative districts were developed using the same methodology as for the nationwide model.
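As a minimal sketch of Eq. (1), the snippet below computes the incidence rate for two hypothetical districts and for their pooled total. The district names, counts, and the function name `incidence_rate` are illustrative assumptions and are not taken from the study data; pooling outpatients and population to obtain the nationwide rate is likewise an assumption about how the nationwide level is formed.

```python
def incidence_rate(outpatients: int, population: int) -> float:
    """Eq. (1): alpha = (number of outpatients, beta) / (population, gamma)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return outpatients / population


# Hypothetical monthly counts for two districts (illustrative values only).
districts = {
    "A": {"outpatients": 1_200, "population": 2_400_000},
    "B": {"outpatients": 300, "population": 1_000_000},
}

# Regional level: one rate per administrative district.
regional = {
    name: incidence_rate(d["outpatients"], d["population"])
    for name, d in districts.items()
}

# Nationwide level (assumed here): pool the counts, then apply Eq. (1) once.
total_outpatients = sum(d["outpatients"] for d in districts.values())
total_population = sum(d["population"] for d in districts.values())
nationwide = incidence_rate(total_outpatients, total_population)
```

Under this pooling assumption, the nationwide rate is a population-weighted average of the regional rates rather than their simple mean.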

Data Acquisition

Air pollutant and meteorological data were acquired hourly from January 1, 2002, to December 31, 2015, and data from 254 air pollution monitoring network sites were retrieved. Each administrative district had 3–70 monitoring sites (Fig. S1). The hourly meteorological factors were obtained from the same locations as the monitoring sites, and monthly averages were then calculated for the entire country and separately for each administrative district in Korea.
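The aggregation from hourly measurements to monthly averages can be sketched as follows. This is an illustration only: the function name `monthly_means`, the sample values, and the choice to skip missing readings (`None`) are assumptions, since the text does not describe how gaps in the hourly records were handled.

```python
from collections import defaultdict
from datetime import datetime
from statistics import fmean


def monthly_means(records):
    """Average hourly (timestamp, value) pairs into {(year, month): mean}.

    Missing readings (value is None) are skipped, which is an assumption.
    """
    buckets = defaultdict(list)
    for timestamp, value in records:
        if value is None:
            continue
        buckets[(timestamp.year, timestamp.month)].append(value)
    return {key: fmean(values) for key, values in sorted(buckets.items())}


# Hypothetical hourly readings for one pollutant at one site.
hourly = [
    (datetime(2002, 1, 1, 0), 40.0),
    (datetime(2002, 1, 1, 1), 50.0),
    (datetime(2002, 1, 1, 2), None),  # missing reading, skipped
    (datetime(2002, 2, 1, 0), 30.0),
]
result = monthly_means(hourly)
```

The same function could be applied to the readings of all sites together for a nationwide average, or to the sites of one administrative district for a regional average.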

Results

Monthly Patterns

Figure 1 shows the nationwide average monthly incidence rates of eye diseases from 2002 to 2013. The annual averages of keratitis, conjunctivitis, and DES all showed an increasing trend. Keratitis and conjunctivitis showed seasonal characteristics, such as an increased incidence in July–September, when temperature (T) and relative humidity (RH) are high; however, DES showed no direct seasonal characteristics. Outliers were seen only for keratitis and conjunctivitis and appeared in September 2002, 2003, and 2007. As these outliers could distort the model, they were excluded from the prediction model development (dashed circles in Fig. 1). The monthly average air pollutant and meteorological data are shown in Figs. S2 and S3, respectively.

Fig. 1 Monthly average of eye disease outpatient rates from 2002 to 2013

Correlation Analysis

Air pollutants and meteorological factors were normalized using Eq. (2) to adjust the scale of the factors in the developed prediction models; a constant of 4 was added in Eq. (2) to eliminate negative results:

$$\text{Normalized data}= \frac{\text{Original data}- \mu }{\sigma }+4$$
(2)

where µ is the mean and σ is the standard deviation of the data. Equation (3) describes the z-score (z), a standardized measure of how far a data point deviates from µ in units of σ.

$$z= \frac{\text{Original data}- \mu }{\sigma }$$
(3)
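Equations (2) and (3) can be illustrated with a short sketch. The function names and sample values are hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption, since the text does not state whether σ is the population or sample standard deviation.

```python
from statistics import fmean, pstdev


def z_scores(values):
    """Eq. (3): z = (original - mu) / sigma for each data point."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        raise ValueError("standard deviation is zero; z-score is undefined")
    return [(v - mu) / sigma for v in values]


def normalize(values, offset=4.0):
    """Eq. (2): the z-score shifted by a constant offset of 4."""
    return [z + offset for z in z_scores(values)]


# Hypothetical monthly averages of one factor.
normalized = normalize([10.0, 20.0, 30.0])
```

After this shift the normalized series has a mean of 4, and a value is positive whenever it lies less than four standard deviations below the mean.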

Table S1 summarizes the details of the variables used in the model. In a previous study, the population rates were used regardless of sex [29]. In this study, the population rates were used separately for men and women to improve the accuracy of the prediction model.

Table 1 displays the correlation coefficients between the incidence rates of keratitis, conjunctivitis, and DES and the air pollutants and meteorological data. There seems to be a positive correlation between the increase in O3 (× 4) over the years and the incidence rates of keratitis, conjunctivitis, and DES. To verify the relationship between the variables and the eye diseases, the correlation coefficients were calculated for each year and for each eye disease, as presented in Table 2. By eliminating the annual pattern of each air pollutant and meteorological factor, the correlation analysis revealed the seasonal influence of these variables on the eye diseases. As reported by Youn et al. [29], the incidence rate of DES increases when humidity is low and NO2 concentration is high. Moreover, z4 (AP) was found to be positively correlated with keratitis and conjunctivitis, but negatively correlated with DES.
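The idea of computing correlations within each year, so that a shared long-term trend does not mask the seasonal relationship, can be sketched as follows. The use of the Pearson coefficient is an assumption (the text only says "correlation coefficients"), and the data and function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def yearly_correlations(rows):
    """rows: iterable of (year, factor_value, incidence_rate)."""
    by_year = defaultdict(lambda: ([], []))
    for year, x, y in rows:
        by_year[year][0].append(x)
        by_year[year][1].append(y)
    return {year: pearson(xs, ys) for year, (xs, ys) in sorted(by_year.items())}


# Hypothetical data: both series rise from 2002 to 2003,
# but within each year they move in opposite directions.
rows = [
    (2002, 1.0, 3.0), (2002, 2.0, 2.0), (2002, 3.0, 1.0),
    (2003, 11.0, 13.0), (2003, 12.0, 12.0), (2003, 13.0, 11.0),
]
yearly = yearly_correlations(rows)
pooled = pearson([r[1] for r in rows], [r[2] for r in rows])
```

In this constructed example the pooled coefficient is strongly positive because both series grow over the years, while each within-year coefficient is negative, which is the kind of seasonal signal the per-year analysis is meant to expose.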

Table 1 Nationwide correlations among model input parameters
Table 2 Annual correlations between outpatient rates and air pollutant and meteorological data by disease

Figure 2 shows the average incidence rates of keratitis, conjunctivitis, and DES by sex and age group; the three eye diseases show different patterns. First, the incidence rate of keratitis increased from age 0 to age 30 in females, whereas in males it increased until age 20 and decreased rapidly thereafter (Fig. 2a). The incidence rate of keratitis was higher in women than in men over 10 years of age, although there was no significant difference between males and females over 40 years of age. Only under the age of 10 years did males have a higher incidence rate. In the case of conjunctivitis, the incidence rate decreased until the age of 30 years for both men and women and then increased again (Fig. 2b). For DES, a consistent rise in the incidence rate was observed in males from their 10s to their 80s (Fig. 2c). In females, the incidence rate escalated in their 20s, declined slightly from their 30s to their 40s, and then increased again in their 70s. Female incidence rates were higher than male incidence rates over the age of 10 years. The only finding common to keratitis, conjunctivitis, and DES is that, under the age of 10 years, the incidence rate is higher in males than in females.

Fig. 2 Nationwide outpatient rates by sex and age, for: a keratitis, b conjunctivitis, and c dry eye syndrome

Table 3 shows the correlation between the incidence rates of keratitis, conjunctivitis, and DES and the population rate for each sex and age group. Youn et al. selected a population rate, from nationwide and regional data, with a high correlation as an independent variable only for DES [29].

Table 3 Correlations between outpatient and population rates, by eye disease

Prediction Model Results

Table 4 displays the predictive models for the incidence rates of keratitis, conjunctivitis, and DES. Unlike previous studies [29], this research used linear, log, and exponential functions as independent variables in the regression models for each air pollutant and meteorological variable to develop more accurate predictions. The models also included a variable “M” representing males to account for different population rates by sex. Regarding air pollutants and meteorological factors, variables identified to have an impact through correlation analysis were included in each model, and the one with the highest absolute value of correlation was added incrementally. The function with the best predictive power was chosen by evaluating logarithmic, exponential, and linear functions. In all three eye disease models, the p-value was less than 0.05, in the in-sample and out-of-sample tests, which was statistically significant. In-sample test R2 values for keratitis, conjunctivitis, and DES were 0.924, 0.897, and 0.955, respectively. Out-of-sample test R2 values for keratitis, conjunctivitis, and DES (previous study [29]) were 0.920, 0.870, and 0.951 (0.9443), respectively.
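The selection among linear, logarithmic, and exponential functions can be sketched for a single predictor as shown below. This is a simplified illustration, not the authors' procedure: the actual models combine several variables that were added incrementally, whereas this sketch fits one variable by ordinary least squares and compares R2. The names `TRANSFORMS`, `fit_line`, `r_squared`, and `best_transform`, and the synthetic data, are hypothetical.

```python
import math

# Candidate functions applied to a (positive, normalized) input variable.
TRANSFORMS = {"linear": lambda x: x, "log": math.log, "exp": math.exp}


def fit_line(us, ys):
    """Ordinary least squares for y = slope * u + intercept."""
    n = len(us)
    mu, my = sum(us) / n, sum(ys) / n
    suu = sum((u - mu) ** 2 for u in us)
    if suu == 0:
        raise ValueError("predictor is constant")
    slope = sum((u - mu) * (y - my) for u, y in zip(us, ys)) / suu
    return slope, my - slope * mu


def r_squared(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def best_transform(xs, ys):
    """Return the transform name with the highest R2, plus all scores."""
    scores = {}
    for name, func in TRANSFORMS.items():
        us = [func(x) for x in xs]
        slope, intercept = fit_line(us, ys)
        scores[name] = r_squared(ys, [slope * u + intercept for u in us])
    return max(scores, key=scores.get), scores


# Synthetic example: the response depends on log(x), so "log" should win.
xs = [3.0, 3.5, 4.0, 4.5, 5.0]
ys = [2.0 * math.log(x) + 1.0 for x in xs]
best, scores = best_transform(xs, ys)
```

Because the inputs in this sketch are positive, as normalized values shifted by 4 typically are, the logarithm is defined for every data point.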

Table 4 Prediction model for nationwide test results

Figure 3 displays the predicted incidence rates of keratitis, conjunctivitis, and DES for both in-sample and out-of-sample data from the nationwide model. The incidence rate trends are similar for the two sets of data. To examine regional variations in the DES model, the data were divided into administrative districts (Table S2). There are seven metropolitan areas and nine provincial areas in Korea. Table 5 presents the regional prediction models for keratitis, conjunctivitis, and DES, which were developed using data from each region. The DES model achieved high R2 values of 0.744 (in-sample test) and 0.760 (out-of-sample test), indicating a good prediction of the disease trend. This is an improvement over the R2 values of the previous study [29], which were 0.709 (in-sample test) and 0.722 (out-of-sample test). However, the regional models for keratitis and conjunctivitis had relatively low coefficients of determination compared with the nationwide model.

Fig. 3 Incidence rates versus prediction of in-sample and out-of-sample for nationwide: a keratitis, b conjunctivitis, and c dry eye syndrome

Table 5 Test results of prediction models for administrative districts

Figures 4 and 5 display the results of comparing the predicted values for Area 27 (Daegu) and Area 28 (Incheon) with the actual data, respectively. As shown in the graphs, the overall trend is well aligned. However, the prediction model did not fit well in regions such as Chungnam (44) and Jeju (49). By developing a separate prediction model for each district, it is possible to create a model that can achieve a high R2 value, like the prediction model for the entire nation. When using the regional prediction models, the decrease in R2 indicates that there are other factors that contribute to regional differences, in addition to the factors used in these prediction models. For example, it may be the variation in the degree of harmfulness of air pollutants by region or the living environment of people in each region. Therefore, further regional studies are required.

Fig. 4 Incidence rates versus prediction of in-sample and out-of-sample for Daegu (27): a keratitis, b conjunctivitis, and c dry eye syndrome

Fig. 5 Incidence rates versus prediction of in-sample and out-of-sample for Incheon (28): a keratitis, b conjunctivitis, and c dry eye syndrome

Conclusion

In this study, disease patterns for conjunctivitis, keratitis, and DES were analyzed, and predictive models were developed. First, a correlation analysis was conducted between disease incidence rates, air pollutants, and meteorological factors, and its results were used as basic data for model development. As the previously developed DES model was a population model that did not differentiate results based on sex, this study developed optimized models for the three diseases and divided the population rate variable by sex, i.e., male and female. In addition, considering nonlinear functions for the air pollution and meteorological factors significantly increased the accuracy of the model. A nationwide model was developed first, followed by models for each region. In Incheon and Daegu, the models fit very well, but there are also areas (Chungnam and Jeju) where they do not fit well. The difference in accuracy between the nationwide and regional prediction models shows that there are additional regional factors affecting conjunctivitis, keratitis, and DES. Therefore, further research is needed on the deviation between the nationwide and administrative district models, the development of a weekly prediction model, and the creation of a predictive model utilizing artificial intelligence.