Introduction

Efficient air quality monitoring networks (AQMNs) provide information that allows the sources and locations of air pollution to be identified and have, therefore, attracted substantial attention in environmental engineering. Such information helps prevent the catastrophic damage that pollution can cause to people's livelihoods (Chen et al. 2006; Dong et al. 2022). Techniques for designing effective air quality monitoring systems based on communication and computing approaches have been the focus of several recent studies. In particular, assessing air quality systems gives engineers an opportunity to manage this environmental issue with efficient, adaptive, and intelligent techniques (Hadj Sassi and Fourati 2022). Different simulation techniques have been applied to AQMN design, including the figure-of-merit (FOM) approach (Liu et al. 1986), the cell approach (Elkamel et al. 2008), a passive diffusion sampler–based model (Lozano et al. 2009), the industrial source complex short-term 3 (ISCST3) model (Zoroufchi Benis et al. 2015), a dispersion model (Zoroufchi Benis et al. 2016), geographic information systems (GIS), cluster analysis (Li et al. 2018), an interpolation method (Boubrima et al. 2019), and a simple linear regression approach (Galán-Madruga 2021). In addition, the MCA method (Elkamel et al. 2008; Zoroufchi Benis et al. 2016), the ISCST3 model (Sharma and Chandra 2008; Kansal et al. 2011), GIS, and kriging (Chung et al. 2019; Galán-Madruga 2021; Wang et al. 2021) have been used to predict air pollutant levels around an emission source. However, although these simulation models are suitable for estimating pollutant concentrations around an emission source, this study focuses on designing an effective AQMN for a large area from existing and estimated monitoring stations. To this end, these tools can be supplemented with robust optimization algorithms (Trujillo-Ventura and Ellis 1991; Kao and Hsieh 2006).
Therefore, the optimal design of AQMNs has drawn substantial attention because of the severity of air pollution and the costs associated with installing and maintaining monitoring stations.

Optimization algorithms applied to AQMN design range from geospatial and statistical techniques to evolutionary and heuristic approaches (Sun et al. 2019; Verghese and Nema 2022), such as a heuristic optimization algorithm (Elkamel et al. 2008), an ant colony and genetic optimization model (Zoroufchi Benis et al. 2016), a genetic algorithm (Hao and Xie 2018), a stepwise genetic algorithm (Li et al. 2019), ant colony optimization (Rathee et al. 2019), and clustering analyses (Lu et al. 2011; Stolz et al. 2020). However, to the best of our knowledge, the literature has not considered optimizing AQMN parameters using the non-dominated sorting genetic algorithm II (NSGA-II), a robust multi-objective optimization method (Yazdandoost et al. 2022), particularly in the presence of uncertainties and information shared between stations. Hence, the present study develops an NSGA-II optimization model that accounts for uncertainties and the information transferred between stations as a novel method for optimal AQMN characterization.

Incorporating uncertainties into the NSGA-II optimization model is the most critical element of optimal AQMN design. Mutual information, its related uncertainties, and the fuzzy analytic hierarchy process (Mofarrah and Husain 2010) can serve this purpose. Transinformation entropy (TE) is a well-known approach for calculating the mutual information between pairs of monitoring stations. However, a preliminary review indicated that the literature has not considered adopting TE within an NSGA-II optimization framework for AQMN design together with a tool that accurately estimates potential air quality monitoring stations, such as Bayesian maximum entropy (BME). This highlights the need for an effective NSGA-II optimization model based on TE and BME for accurate optimal AQMN characterization. Such a model can be strengthened by treating the numbers as intervals using the robust nonlinear interval number programming (NINP) tool (Jiang et al. 2008; Nematollahi et al. 2022).

This study proposes a novel multi-objective optimization framework for the accurate and optimal characterization of AQMNs. The framework uses NSGA-II based on the TE and NINP approaches and the BME method, and it incorporates fuzzy set theory. First, the BME method is applied to generate potential monitoring stations, and the data of the existing and potential stations are collected. Second, the TE method calculates the shared information between each pair of stations, and TE-distance (TE-D) curves are plotted. The desired configuration of the monitoring network is also obtained at this stage. Third, an NSGA-II multi-objective optimization model is run to obtain a set of Pareto optimal AQMN configurations. Its objectives are to minimize the number of monitoring stations, the TE between selected stations, and the average and radius of the TE variations, and to maximize the fuzzy degree of membership for the TE values and the AQMN coverage. Finally, the Preference Ranking Organization METHod for Enrichment Evaluation (PROMETHEE) ranks the Pareto optimal solutions to select the superior optimal AQMN characterization. To demonstrate the usability of this methodology, it is applied to the real-world AQMNs of Los Angeles, Long Beach, and Anaheim, a major metropolitan area in California, USA. In summary, the proposed multi-objective optimization model fills the gaps in the literature by:

  1. Developing a multi-objective optimization framework using NSGA-II for AQMN characterization
  2. Adopting the BME method to generate potential air quality monitoring stations as input data to the multi-objective optimization framework
  3. Using the TE method to calculate the information shared between stations
  4. Applying the NINP technique in the optimization algorithm to account for the interval nature of the numbers
  5. Considering the TE and NINP approaches simultaneously within the multi-objective optimization framework
  6. Maximizing the fuzzy degree of membership for the TE values and the AQMN in the optimization model
  7. Employing the PROMETHEE method to obtain the superior optimal AQMN characterization
  8. Obtaining the superior optimal AQMN characterization for a major metropolitan area in California, USA, using the proposed framework

Methodology

As shown in Fig. 1, the proposed framework identifies the superior optimal AQMN in three primary stages: (i) data preparation, (ii) multi-objective optimization, and (iii) multi-criteria decision-making (MCDM). MATLAB® is used for data analysis, algorithm development, and modeling.

Fig. 1

Schematic view of the suggested framework to design a superior optimum AQMN

Stage 1 collects measured air pollutant concentration data, generates potential monitoring stations with the BME method, and calculates the TE between all potential and existing monitoring stations to develop a TE-D curve. To estimate the potential AQMN stations, the BME method uses data from all existing stations (i.e., station locations and pollutant concentrations) as input.

Stage 2 develops an NSGA-II optimization model to obtain the Pareto optimal solutions for the AQMN with six objective functions. These are maximizing the AQMN coverage and the fuzzy degree of membership, and minimizing (i) the TE values, (ii) the average of TE variations, (iii) the TE variation radii, and (iv) the number of stations. Finally, stage 3 ranks the solutions obtained from the optimization framework using the PROMETHEE technique to identify the most suitable optimal AQMN characterization.

Transinformation entropy

Transinformation entropy measures the amount of information obtained about one random variable by observing another (Vicente et al. 2011). A high TE between two variables indicates that the value of one provides substantial information about the value of the other. Conversely, a low TE suggests that the two variables are largely independent. TE is symmetric: the information obtained about A when B is known equals the information obtained about B when A is known. It is always non-negative and equals zero if and only if the two variables are independent.
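For discrete (e.g., binned) concentration series \(X\) and \(Y\) at two stations, these properties follow from the standard mutual-information form of TE. The natural logarithm is shown here as an assumption, because the logarithm base and binning scheme are not stated in this section:

$$T(X,Y)=H(X)+H(Y)-H(X,Y)=\sum_{x}\sum_{y}p(x,y)\ln\frac{p(x,y)}{p(x)\,p(y)}\ge 0,\qquad H(X)=-\sum_{x}p(x)\ln p(x)$$

Non-negativity, and the equality \(T(X,Y)=0\) if and only if \(p(x,y)=p(x)\,p(y)\) for all \(x,y\), follow from Gibbs' inequality. Symmetry follows from the symmetric form of the double sum.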

Multi-objective optimization model

In this research, the NSGA-II optimization model pursues six objectives. It (i) maximizes the coverage of the AQMN (Eq. (1)), (ii) minimizes the TE of the potential stations (Eq. (2)), (iii) minimizes the average of TE variations (Eq. (3)), and (iv) minimizes the radius of TE variations (Eq. (4)). It also (v) maximizes the fuzzy degree of membership of the stations, which Eq. (5) expresses as minimizing its complement \(1-{D}_{{\mathrm{m}}_{i}}\), and (vi) minimizes the number of stations (Eq. (6)), with the minimum number of stations given by Eq. (7).

$$\mathrm{Maximize}\;{Z}_{1}=\sum\limits_{i=1}^{{N}_{p}}\begin{cases}{b}_{i}\left(\dfrac{{D}_{\mathrm{max}}-{D}_{i}}{{D}_{\mathrm{max}}-{D}_{\mathrm{opt}}}\right) & \mathrm{if}\;{D}_{i}>{D}_{\mathrm{opt}}\\ {b}_{i} & \mathrm{if}\;{D}_{i}\le {D}_{\mathrm{opt}}\end{cases}$$
(1)
$$\mathrm{Minimize}\;{Z}_{2}=\sum\limits_{i=1}^{{N}_{p}}\begin{cases}{b}_{i}\left(\dfrac{{\mathrm{TE}}_{\mathrm{max}}-{\mathrm{TE}}_{i}}{{\mathrm{TE}}_{\mathrm{max}}-{\mathrm{TE}}_{\mathrm{min}}}\right) & \mathrm{if}\;{D}_{i}<{D}_{\mathrm{opt}}\\ {b}_{i} & \mathrm{if}\;{D}_{i}\ge {D}_{\mathrm{opt}}\end{cases}$$
(2)
$$\mathrm{Minimize\;}{Z}_{3}=\sum\limits_{i=1}^{{N}_{p}}{b}_{i}\left(\mathrm{max}\left({\mathrm{TE}}_{i}\right)+\mathrm{min}\left({\mathrm{TE}}_{i}\right)\right)/2$$
(3)
$$\mathrm{Minimize\;}{Z}_{4}=\sum\limits_{i=1}^{{N}_{p}}{b}_{i}\left(\mathrm{max}\left({\mathrm{TE}}_{i}\right)-\mathrm{min}\left({\mathrm{TE}}_{i}\right)\right)/2$$
(4)
$$\mathrm{Minimize\;}{Z}_{5}=\sum\limits_{i=1}^{{N}_{p}}{b}_{i}\left(1-{D}_{{\mathrm{m}}_{i}}\right)/{N}_{p}$$
(5)
$$\mathrm{Minimize\;}{Z}_{6}=\frac{\left(\sum\limits_{i=1}^{{N}_{p}}{b}_{i}-{N}_{\mathrm{min}}\right)}{\left({N}_{p}-{N}_{\mathrm{min}}\right)}$$
(6)
$${N}_{\mathrm{min}}=\left({D}_{\mathrm{max}}/{D}_{\mathrm{opt}}\right)+1$$
(7)

where \({Z}_{i}\) is the ith objective function; \({b}_{i}\) is an auxiliary binary variable equal to 1 if monitoring station \(i\) is selected and 0 otherwise; \({N}_{p}\) is the total number of candidate (existing and potential) monitoring stations; \({N}_{\mathrm{min}}\) is the minimum number of monitoring stations; \({D}_{i}\) is the distance between monitoring station \(i\) and the nearest monitoring station; \({D}_{\mathrm{opt}}\) and \({D}_{\mathrm{max}}\) are the optimum and maximum distances between stations, respectively; \({\mathrm{TE}}_{i}\) is the TE between monitoring station \(i\) and the nearest monitoring station; \({\mathrm{TE}}_{\mathrm{max}}\) and \({\mathrm{TE}}_{\mathrm{min}}\) are the maximum and minimum TE between stations, respectively; and \({\mathrm{D}}_{{\mathrm{m}}_{i}}\) is the degree of membership of each station.

The pseudocode of the algorithm mentioned above is presented in Fig. S3, Section S2, Supplementary Material.
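A sketch like the one below can evaluate Eqs. (1)–(7) for a single candidate network. It is not the study's MATLAB implementation, and it rests on three assumptions. First, \(D_i\) and \(\mathrm{TE}_i\) refer to the nearest selected station. Second, the TE interval bounds used in Eqs. (3) and (4) are supplied as the hypothetical arrays `te_lo` and `te_hi`. Third, \(\mathrm{TE}_{\min}\) and \(\mathrm{TE}_{\max}\) are taken over all candidate station pairs. The function name `aqmn_objectives` is likewise hypothetical.

```python
import numpy as np


def aqmn_objectives(b, dist, te, te_lo, te_hi, dm, d_opt, d_max):
    """Evaluate Z1-Z6 of Eqs. (1)-(7) for one candidate network (illustrative sketch).

    b            : (Np,) 0/1 selection vector (the b_i of the text)
    dist         : (Np, Np) distances between candidate stations
    te           : (Np, Np) point TE values between station pairs
    te_lo, te_hi : (Np, Np) assumed lower/upper TE bounds (interval numbers)
    dm           : (Np,) fuzzy degree of membership D_m,i of each station
    """
    b = np.asarray(b, dtype=bool)
    dm = np.asarray(dm, dtype=float)
    n_p = b.size
    sel = np.flatnonzero(b)
    if sel.size < 2:
        raise ValueError("at least two stations must be selected")

    # Assumption: D_i and TE_i refer to the nearest *selected* station (self excluded).
    sub = dist[np.ix_(sel, sel)].astype(float)
    np.fill_diagonal(sub, np.inf)
    nearest = sel[np.argmin(sub, axis=1)]
    d_i = dist[sel, nearest]
    te_i = te[sel, nearest]
    lo_i, hi_i = te_lo[sel, nearest], te_hi[sel, nearest]

    # TE_min / TE_max over all distinct candidate pairs (diagonal excluded).
    off_diag = te[~np.eye(n_p, dtype=bool)]
    te_min, te_max = off_diag.min(), off_diag.max()

    z1 = np.sum(np.where(d_i > d_opt, (d_max - d_i) / (d_max - d_opt), 1.0))     # Eq. (1)
    z2 = np.sum(np.where(d_i < d_opt, (te_max - te_i) / (te_max - te_min), 1.0))  # Eq. (2)
    z3 = np.sum((hi_i + lo_i) / 2)           # Eq. (3): interval midpoints
    z4 = np.sum((hi_i - lo_i) / 2)           # Eq. (4): interval radii
    z5 = np.sum(1.0 - dm[sel]) / n_p         # Eq. (5)
    n_min = d_max / d_opt + 1                # Eq. (7)
    z6 = (sel.size - n_min) / (n_p - n_min)  # Eq. (6)
    return z1, z2, z3, z4, z5, z6
```

Because NSGA-II implementations usually minimize every objective, \(Z_1\) would be passed to such an optimizer as \(-Z_1\).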

PROMETHEE

PROMETHEE ranks solutions through pairwise outranking comparisons on each criterion. VIKOR and TOPSIS, by contrast, are aggregating-function-based MCDM methods built on the concept of “closeness to the ideal” (Opricovic and Tzeng 2004). In addition, PROMETHEE is easy to use and can be employed in real-world planning problems (Ülengin et al. 2001). More importantly, it is suitable for both quantitative and qualitative data, which makes it more user-friendly than other methods. It also consists of two components that keep the ranking process as simple to follow as possible. Overall, PROMETHEE is a robust MCDM method for ranking a set of Pareto optimal solutions, \(T=\{{T}_{1},{T}_{2},\dots ,{T}_{n}\}\), according to a set of distinct objectives, \(C=\{{C}_{1},{C}_{2},\dots ,{C}_{m}\}\) (Brans and Vincke 1985; Pourshahabi et al. 2018). This study assigns equal weights to all objectives because they are considered equally important. Different sets of weights may lead to different results, but the approach can readily handle any set of weights given by the decision-maker. The results illustrate that the methodology can use weights and criteria suited to the conditions of different case studies.
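The minimal PROMETHEE II sketch below makes the flow computation concrete. It assumes the “usual” (strict) preference function on every criterion, which this section does not specify. Other PROMETHEE preference functions, such as linear or Gaussian ones, would change the flows. Equal weights correspond to the setting used in this study. The function name `promethee_ii` is hypothetical.

```python
import numpy as np


def promethee_ii(scores, weights, maximize):
    """Rank alternatives with PROMETHEE II using the 'usual' preference function.

    scores   : (n, m) criterion values, one row per Pareto solution
    weights  : (m,) criterion weights (normalised internally)
    maximize : (m,) booleans, True if larger is better for that criterion
    Returns (phi_plus, phi_minus, phi_net, ranking), ranking listed best first.
    """
    x = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    x = x * np.where(np.asarray(maximize), 1.0, -1.0)  # larger is now better everywhere
    n = x.shape[0]
    # diff[a, b, j] = x[a, j] - x[b, j]; usual criterion: P_j(a, b) = 1 if diff > 0 else 0
    diff = x[:, None, :] - x[None, :, :]
    pref = (diff > 0).astype(float)
    pi = pref @ w                          # aggregated preference index pi(a, b), shape (n, n)
    phi_plus = pi.sum(axis=1) / (n - 1)    # leaving flow
    phi_minus = pi.sum(axis=0) / (n - 1)   # entering flow
    phi_net = phi_plus - phi_minus         # net flow
    ranking = np.argsort(-phi_net, kind="stable")
    return phi_plus, phi_minus, phi_net, ranking
```

The leaving flow \(\phi^{+}\) measures how strongly a solution outranks the others, and the entering flow \(\phi^{-}\) measures how strongly it is outranked. The complete PROMETHEE II ranking sorts solutions by the net flow \(\phi=\phi^{+}-\phi^{-}\).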

Case study

The proposed framework is used to design the AQMNs for Los Angeles, Long Beach, and Anaheim in California, USA (Fig. 2). These cities are regarded as among the most industrial, populous, and polluted in the USA. They cover an area of about 5000 km², in which many people are exposed to air pollution and its health risks.

Fig. 2

The study area

In this case study, concentrations of three pollutants, nitrogen dioxide (NO2), carbon monoxide (CO), and ozone, were obtained from the U.S. Environmental Protection Agency website (U.S. EPA 2022). The data cover each of the 15 monitoring stations in the study area between January 2015 and December 2016. The locations of the existing stations are plotted in Fig. 5.

These 15 stations were selected from all existing stations mainly because the excluded stations have significant gaps in their time series data. Such data deficiencies would prevent BME from reliably estimating data for the potential monitoring stations. The precise locations of the selected stations are given in Table S1, Section S6, Supplementary Material.

Results and discussion

BME for potential monitoring station prediction

The BME approach predicts the concentrations of three air pollutants (CO, NO2, and ozone) at selected locations among the existing stations to provide input data for the proposed optimization framework. The potential stations are obtained with BME through a three-step procedure.

First, the daily pollutant concentrations, the station longitudes and latitudes, and the number of stations are collected as inputs to the BME model. The distribution of pollutant concentrations is then determined from frequency histograms, as illustrated in Fig. 3.

Fig. 3

Distribution of pollutants’ concentrations in the existing stations

Second, the data are analyzed spatially and temporally to estimate the pollutant concentrations at the considered times and locations. In these analyses, mean trend analysis averages the data to produce smooth spatial and temporal trends. These trends are plotted in Figs. S4 and S5, Section S7, Supplementary Material. Subsequently, the most suitable spatial and temporal covariance models are selected and fitted with their related parameters (Fig. 4).
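The covariance-modeling step can be illustrated with the sketch below. It computes an empirical spatial covariance of station time series in distance bins and fits an exponential model \(C(h)=c_0\exp(-3h/a)\) by linear regression on \(\ln C\). Several choices here are assumptions for illustration: the exponential form, the use of projected coordinates in km, the function names, and the log-linear fit. The covariance models actually selected in this study are those shown in Fig. 4, and a weighted nonlinear fit is usually preferable in practice.

```python
import numpy as np


def empirical_spatial_covariance(values, coords, bin_edges):
    """Average pairwise covariance of station time series within distance bins.

    values : (T, S) de-trended concentrations (T time steps, S stations)
    coords : (S, 2) projected station coordinates (assumed, e.g. in km)
    """
    v = values - values.mean(axis=0)
    cov = (v.T @ v) / v.shape[0]                         # (S, S) station covariances
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    iu = np.triu_indices(len(coords), k=1)               # each distinct pair once
    lag, c = d[iu], cov[iu]
    centers, means = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (lag >= lo) & (lag < hi)
        if in_bin.any():
            centers.append(lag[in_bin].mean())
            means.append(c[in_bin].mean())
    return np.array(centers), np.array(means)


def fit_exponential_covariance(h, c):
    """Fit C(h) = c0 * exp(-3 h / a) by least squares on log C (positive C only)."""
    keep = c > 0
    slope, intercept = np.polyfit(h[keep], np.log(c[keep]), 1)
    return np.exp(intercept), -3.0 / slope                # sill c0, practical range a
```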

Fig. 4

Spatial covariance models of pollutants

Finally, the BME approach predicts concentrations at nine potential stations whose spatial and temporal covariance structure (over latitude, longitude, and time) is similar to that of the existing stations. The locations of these potential stations are selected to cover the study area and help find the optimal AQMN (Fig. 5). The longitudes, latitudes, and station numbers of the BME potential stations are listed in Table S3, Section S8, Supplementary Material.
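A full BME implementation, with soft data and a space-time covariance, is beyond a short example. In a well-known special case, though, only hard data and a known mean and covariance are used, and the BME estimate then reduces to the simple kriging estimator. The sketch below shows only that special case. It assumes a known constant mean, projected coordinates, and an exponential covariance with hypothetical sill and range values. The function names are also hypothetical.

```python
import numpy as np


def exp_cov(h, sill, practical_range):
    """Exponential covariance C(h) = sill * exp(-3 h / range) (assumed model)."""
    return sill * np.exp(-3.0 * h / practical_range)


def simple_kriging(obs_xy, obs_val, target_xy, mean, sill, practical_range):
    """Simple-kriging estimates and variances at target locations.

    obs_xy    : (n, 2) coordinates of stations with hard data
    obs_val   : (n,) observed concentrations
    target_xy : (m, 2) coordinates of potential stations
    """
    d_oo = np.linalg.norm(obs_xy[:, None] - obs_xy[None, :], axis=-1)
    d_ot = np.linalg.norm(obs_xy[:, None] - target_xy[None, :], axis=-1)
    K = exp_cov(d_oo, sill, practical_range)   # (n, n) data-data covariance
    k = exp_cov(d_ot, sill, practical_range)   # (n, m) data-target covariance
    lam = np.linalg.solve(K, k)                # kriging weights, one column per target
    est = mean + lam.T @ (obs_val - mean)      # estimate = mean + weighted residuals
    var = sill - np.sum(lam * k, axis=0)       # simple-kriging variance
    return est, var
```

At an observed location the estimate reproduces the observation with zero variance. Far from all stations, it falls back to the assumed mean with variance equal to the sill.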

Fig. 5

Location of the existing and BME potential stations

Mathematical models are regarded as robust tools for selecting the most suitable positions of monitoring stations in AQMN optimization (Modak and Lohani 1985). The results show the importance of the BME method in redesigning and improving the AQMN of the case study. For instance, the BME-generated stations 17 and 24 for CO, 18 and 21 for NO2, and 17 and 21 for ozone are included in the most efficient AQMNs. They constitute almost half of the monitoring stations in the designed AQMNs.

Transinformation entropy approach

The TE approach is used in this study to calculate the mutual information between each pair of stations. First, a 24 × 24 matrix of the TEs between all existing and potential stations is computed. Its entry TE(a, b), in the ath row and bth column, is the TE between stations a and b.

A TE-D curve is then plotted from the TE between each pair of stations and the corresponding distance, as displayed in Fig. 6. The figure shows that the nearest stations share the most information and that TE decreases as distance increases. The point at which the slope of the TE-D curve approaches zero is taken as the optimum distance (\({D}_{\mathrm{opt}}\)), since TE remains at its minimum beyond this point.
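The TE matrix, TE-D pairs, and \(D_{\mathrm{opt}}\) steps can be sketched as follows. This is an illustration under stated assumptions, not the study's MATLAB implementation. TE is estimated with a plug-in 2-D histogram (10 bins, natural logarithm), and distances are great-circle distances on a sphere of radius 6371 km. The “zero inclination” point is located as the start of the first distance bin after which the slope of the binned TE-D curve falls below a user-chosen tolerance, `slope_tol`, which is a hypothetical parameter. All function names are hypothetical.

```python
import numpy as np


def transinformation(x, y, bins=10):
    """Plug-in TE (mutual information, in nats) of two series from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = p_xy > 0                           # empty cells contribute nothing
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))


def te_matrix(series, bins=10):
    """Symmetric S x S TE matrix for a (T, S) array of station time series."""
    s = series.shape[1]
    te = np.zeros((s, s))
    for a in range(s):
        for b in range(a, s):
            te[a, b] = te[b, a] = transinformation(series[:, a], series[:, b], bins)
    return te


def haversine_km(lat, lon):
    """Great-circle distances (km) between all pairs of stations."""
    lat, lon = np.radians(lat), np.radians(lon)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    h = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))


def optimal_distance(dist, te, bin_edges, slope_tol):
    """Start of the first (nearly) flat segment of the binned TE-D curve."""
    iu = np.triu_indices(len(te), k=1)      # each station pair once
    d, t = dist[iu], te[iu]
    centers, means = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (d >= lo) & (d < hi)
        if in_bin.any():
            centers.append(d[in_bin].mean())
            means.append(t[in_bin].mean())
    centers, means = np.array(centers), np.array(means)
    slopes = np.diff(means) / np.diff(centers)
    flat = np.flatnonzero(np.abs(slopes) < slope_tol)
    return centers[flat[0]] if flat.size else None
```

For the case study, `te_matrix` would receive a (days × 24) array of concentrations, and `haversine_km` would receive the latitudes and longitudes of the 24 stations.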

Fig. 6

A diagram of the TE-D curve and the fuzzy degree of membership (\({D}_{{\mathrm{m}}_{i}}\))

Finally, the fuzzy degree of membership (\({D}_{{\mathrm{m}}_{i}}\)) is used to reduce the uncertainty in the TE between each pair of stations. It equals zero at the lower and upper bounds of the TE-D curve and 1 on the curve fitted to all points of the TE-D curve.
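One way to compute such a membership is a triangular function that equals 1 on the fitted TE-D curve and falls linearly to 0 at the lower and upper bounds. The linear shape and the function name below are assumptions for illustration. The text fixes only the values at the bounds (0) and on the fitted curve (1).

```python
import numpy as np


def te_membership(te, fitted, lower, upper):
    """Triangular membership: 1 on the fitted TE-D curve, 0 at or beyond the bounds.

    All arguments are evaluated at the same pair distances; the bounds are
    assumed to satisfy lower < fitted < upper.
    """
    te, fitted, lower, upper = (np.asarray(v, dtype=float) for v in (te, fitted, lower, upper))
    rising = (te - lower) / (fitted - lower)     # below the fitted curve
    falling = (upper - te) / (upper - fitted)    # above the fitted curve
    mu = np.where(te <= fitted, rising, falling)
    return np.clip(mu, 0.0, 1.0)                 # values outside the bounds get 0
```

A station-level value \(D_{\mathrm{m}_i}\) could then be taken, for example, as the membership of the TE between station \(i\) and its nearest station. This aggregation is likewise an assumption.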

Multi-objective optimization model

The proposed multi-objective optimization framework uses the six objectives described above and yields a set of Pareto optimal solutions (Table S4, Section S9, Supplementary Material). The population size and number of generations of the NSGA-II model are 200 and 2400, respectively. Table 1 lists the maximum, minimum, and average values of the AQMN coverage, TE, average of TE variations, radii of TE variations, fuzzy degree of membership, and number of stations. These values are given for CO, NO2, and ozone over the 150 Pareto optimal solutions obtained.
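For reference, the two ranking ingredients that distinguish NSGA-II, fast non-dominated sorting and crowding distance, are sketched below. The sketch assumes objective vectors in which every objective is minimized, so maximized objectives such as coverage are negated beforehand. Selection, crossover, mutation, and the generational loop with the population size of 200 and 2400 generations are not reproduced. The function names are illustrative.

```python
import numpy as np


def fast_nondominated_sort(f):
    """Split the rows of f (n, m), all objectives minimised, into Pareto fronts."""
    n = len(f)
    dominates = [[] for _ in range(n)]   # indices of solutions that i dominates
    count = np.zeros(n, dtype=int)       # number of solutions that dominate i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if np.all(f[i] <= f[j]) and np.any(f[i] < f[j]):
                dominates[i].append(j)
            elif np.all(f[j] <= f[i]) and np.any(f[j] < f[i]):
                count[i] += 1
    fronts = [[i for i in range(n) if count[i] == 0]]
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominates[i]:
                count[j] -= 1
                if count[j] == 0:        # all dominators already assigned to fronts
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]                   # drop the trailing empty front


def crowding_distance(f):
    """NSGA-II crowding distance for the rows of one front f (k, m)."""
    k, m = f.shape
    dist = np.zeros(k)
    for j in range(m):
        order = np.argsort(f[:, j])
        span = f[order[-1], j] - f[order[0], j]
        dist[order[0]] = dist[order[-1]] = np.inf   # keep boundary solutions
        if span == 0 or k < 3:
            continue
        dist[order[1:-1]] += (f[order[2:], j] - f[order[:-2], j]) / span
    return dist
```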

Table 1 The multi-objective optimization model results

Relation between the number of monitoring stations, coverage, and transinformation entropy

Figure 7 shows the relationship between the number of stations in the monitoring networks, which varies from 4 to 24, their spatial coverage, and the redundant information (TE) for all pollutants. The spatial coverage objective is plotted raised to the power of −1 because it is maximized in the multi-objective optimization algorithm, whereas the other objectives are minimized.

Fig. 7

The relationship between the number of monitoring stations in the monitoring networks, coverage, and TE

PROMETHEE decision-making model

The PROMETHEE method is applied to identify the optimal AQMN characterization by ranking all Pareto optimal solutions of the multi-objective optimization model. Specifically, each solution is rated by its entering, leaving, and net flows. Table 2 presents the top ten solutions for CO, NO2, and ozone based on equal weights for the six objectives, from which the optimal AQMN is determined. In other words, the selected AQMN offers the best trade-off among minimizing the information transferred between stations, the related uncertainty of TE, and the number of stations, and maximizing coverage. The locations of these stations are shown in Fig. 8. As Table 2 shows, the ranking of the solutions follows directly from their net flows. Solution Nos. 30, 35, and 38 provide the best AQMN characterizations, with 4, 4, and 5 stations for CO, NO2, and ozone, respectively. Detailed results for the Pareto optimal solutions in six scenarios are provided in Table S5, Section S10, Supplementary Material.

Table 2 The ranking of the best ten solutions based on the PROMETHEE method
Fig. 8

The locations of the superior optimal AQMN in the study area

The optimal number of monitoring stations directly affects the expenses associated with operation, maintenance, and performance assessment (Zhao et al. 2022). Table 3 gives a simple comparison between the TE of the existing AQMNs and that of the proposed ones for each air pollutant. It indicates that the presented methodology considerably reduces the joint information between stations while also reducing the number of monitoring stations. In addition, the uncertainties of the observed data, including the average of TE variations, the radii of TE variations, and the fuzzy degree of membership, decrease significantly. Last but not least, the framework determines monitoring stations with an acceptable level of coverage and shared information.

Table 3 The comparison of the optimization model objectives for current monitoring stations and the proposed ones

Figure 9 compares the objective values of all three pollutants for different numbers of stations. The redundant information (TE) of the networks with 12, 14, and 9 stations is relatively close. Therefore, having more than 9 stations does not significantly reduce the mutual information between monitoring stations. The current networks have 15 stations, whereas the proposed networks have 4, 4, and 5 stations for CO, NO2, and ozone, respectively. The proposed networks therefore reduce the number of stations and the coverage by factors of 3.75 for CO and NO2 and 3 for ozone. Meanwhile, the redundant information for CO, NO2, and ozone decreases by factors of 8.25, 5.86, and 4.75, respectively, which is proportionally larger than the reduction in the number of stations. This disproportionately large reduction in shared information means that a relatively desirable level of information can be achieved at a much lower cost. However, decision-makers with a larger budget can obtain more coverage by installing more monitors.
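As a check on the reduction factors quoted above, the station-count ratios relative to the 15 current stations are:

$$\frac{15}{4}=3.75\quad(\mathrm{CO},\ \mathrm{NO_2}),\qquad \frac{15}{5}=3\quad(\mathrm{ozone})$$

The TE reduction factors of 8.25, 5.86, and 4.75 are therefore about 2.2, 1.6, and 1.6 times the corresponding station-count reductions.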

Fig. 9

The comparison of monitoring networks with different numbers of stations

Sensitivity analysis

A series of sensitivity analyses is performed on the model results under different scenarios to examine how sensitive the PROMETHEE ranking is to the objective weights. Figure 10 shows the objective weights in six scenarios. Figure 11 gives the results of the sensitivity analyses, including the best solution obtained by PROMETHEE for each scenario. These results show that TE and its related uncertainties are the determining factors in scenarios 4 and 5, where they have a pronounced effect on the number of stations. In scenarios 1, 2, and 3, by contrast, coverage increases the number of stations. Even so, TE and its associated uncertainties also affect the number of stations in some of these cases, such as scenarios 1 and 2. In scenario 3, where coverage has a significantly higher weight, the number of stations is higher than in the other scenarios.
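A weight-sensitivity loop of this kind can be sketched as follows. The scenario names and weight vectors are hypothetical placeholders, not the six scenarios of Fig. 10. PROMETHEE II is assumed to use the “usual” preference function on every criterion, and the function names are illustrative.

```python
import numpy as np


def net_flows(scores, weights, maximize):
    """PROMETHEE II net flows with the 'usual' preference function."""
    x = np.asarray(scores, dtype=float) * np.where(maximize, 1.0, -1.0)  # larger is better
    w = np.asarray(weights, dtype=float) / np.sum(weights)
    pi = ((x[:, None, :] - x[None, :, :]) > 0).astype(float) @ w          # pi(a, b)
    return (pi.sum(axis=1) - pi.sum(axis=0)) / (len(x) - 1)              # leaving - entering


def best_per_scenario(scores, scenarios, maximize):
    """Index of the top-ranked solution for each named weight vector."""
    return {name: int(np.argmax(net_flows(scores, w, maximize)))
            for name, w in scenarios.items()}
```

For example, with two criteria (coverage, maximized, and TE, minimized), shifting the weight from coverage to TE moves the top rank from a large, high-coverage network to a small, low-redundancy one. This mirrors the qualitative pattern described above.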

Fig. 10

The objectives weights in each scenario

Fig. 11

Results of sensitivity analysis for different weights

Conclusions

This study proposes a novel multi-objective optimization model using NSGA-II to obtain the most suitable optimal air quality monitoring network (AQMN) characterizations. The framework adopts transinformation entropy (TE) and fuzzy nonlinear interval number programming (NINP) to account for the mutual information and uncertainties among AQMN stations in the multi-objective model. The input data come from 15 existing stations and nine potential stations generated by the Bayesian maximum entropy (BME) method. They cover three pollutants, nitrogen dioxide (NO2), carbon monoxide (CO), and ozone, and widen the set of candidates from which the optimal AQMN can be selected. The Pareto optimal solutions of the NSGA-II model are obtained for six objectives: AQMN coverage, TE, average of TE variations, radii of TE variations, fuzzy degree of membership, and number of stations. These solutions are ranked with the PROMETHEE method, giving equal weight to all objectives, to find the most appropriate AQMN characterization. To demonstrate the efficiency and advantages of the methodology, it is applied to Los Angeles, Long Beach, and Anaheim in Southern California, USA. The results show that the AQMNs selected by PROMETHEE consist of 4, 4, and 5 of the 24 existing and potential stations for CO, NO2, and ozone, respectively. Hence, the optimal AQMNs retain adequate information even though the number of stations is reduced. This is achieved because the BME approach generates the potential AQMN stations, and the TE and fuzzy NINP techniques account for the shared information between stations within the multi-objective optimization model.
Note that this methodology can be implemented in other areas that have valid, reasonably well-distributed data, so that BME can produce good estimates. More importantly, there should be no significant gaps in the time series data. In addition, the other components of the framework, such as TE, NINP, and the fuzzy degree of membership, are numerical methods that can be applied in other case studies. Future studies could develop a framework that uses the non-dominated sorting genetic algorithm III (NSGA-III) to strengthen the optimization model and value of information (VOI) theory to extract the most information. Its results could then be compared with those of the methodology presented here.