1 Introduction

Air pollution is one of the most important challenges facing the world today as a result of technological development [1, 2]. It can be defined from several aspects. In terms of pathogenesis, it is pollution due to the presence of living or invisible organisms, such as bacteria and fungi, in an environmental medium such as water, air, or soil. As a chemical phenomenon, air pollution is an imbalance of the ecosystem caused by chemical effects; these pollutants can take the form of solid particles, liquid droplets, or gases. From the scientific point of view, it is a change in the harmonic interaction among the components of the ecosystem that paralyzes the efficiency of the system and makes it lose its ability to perform its natural role in the self-disposal of pollutants. This research presents an intelligent predictive design to address this phenomenon [3].

There are several measures of prediction error. The root mean square error (RMSE) measures the amount of error between two data sets; in other words, it compares a predicted value with an observed (known) value. It is also known as the root mean square deviation and is one of the most widely used statistics in GIS [4]:

\({\text{RMSE}} = \sqrt{\frac{\sum_{i=1}^{n} (F_i - A_i)^2}{n}}\)

where \(F_i\) are the forecasts (predicted values), \(A_i\) are the observed values (known results), and \(n\) is the sample size.

Cross-entropy loss is another widely used loss function, mainly in classification problems [5]. It is given by

\(H = -\sum_{i} y_i \log(\hat{y}_i)\)

where \(y_i\) is the target label and \(\hat{y}_i\) is the output of the classifier. The cross-entropy loss function is used when the output is a probability distribution, and in that setting it is preferred [6].

The symmetric mean absolute percentage error (SMAPE) is an accuracy measure based on percentage (or relative) errors. It is usually defined as [7]:

\({\text{SMAPE}} = \frac{1}{n}\sum_{t=1}^{n} \frac{|F_t - A_t|}{(|A_t| + |F_t|)/2}\)

where \(A_t\) is the actual value and \(F_t\) is the forecast value. The absolute difference between \(A_t\) and \(F_t\) is divided by half the sum of the absolute values of the actual and forecast values; this ratio is summed over every fitted point \(t\) and divided by the number of fitted points \(n\). If the actual and forecast values are both 0, the SMAPE score is set to 0 as well. This paper uses SMAPE to evaluate the quality of a prediction by comparing predicted to observed values.
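As a concrete illustration of the two measures used most heavily in this paper, the following minimal Python sketch (the function names are ours) computes RMSE and SMAPE for a pair of series, including the convention that SMAPE is 0 when both values are 0:

```python
import numpy as np

def rmse(actual, forecast):
    """Root mean square error between observed and predicted values."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.sqrt(np.mean((forecast - actual) ** 2))

def smape(actual, forecast):
    """Symmetric mean absolute percentage error; a term is 0 when both values are 0."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    safe = np.where(denom == 0, 1.0, denom)                # avoid division by zero
    terms = np.where(denom == 0, 0.0, np.abs(forecast - actual) / safe)
    return terms.mean()

# Example: rmse([3, 5, 7], [2.5, 5, 8]) and smape([3, 0], [2.5, 0])
```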

Forecasting is part of the decision-making process: finding estimated values for the future based on past data [8]. Several types of prediction can be distinguished [9]. First, predictive modeling refers to the task of developing a model that predicts the target's value as a function of explanatory variables; the main aim is to predict the value of a specific attribute from the values of the other attributes. Second, traditional prediction: during the first half of the twentieth century, many methods of extrapolating the future were used for decision-making. They were part of the planning process and succeeded in helping planners predict and make rational decisions about the future, but they are considered traditional means of dealing with the future when compared with modern methods and techniques in this field. Traditional methods include prediction by guessing, which depends on the intuitive way an individual assesses some aspects of the future; such predictions may fail more often than they succeed (Fig. 1).

Fig. 1 Relationships among the main three challenges

Deep learning techniques are a set of multi-level learning techniques derived from machine learning [10, 11], a field in which the computer tests algorithms and programs, learning to improve and develop them by itself. Modern computer vision, speech recognition, and future-prediction programs are all products of deep learning [12, 13]. The need for this approach has grown with the emergence of big data, because of its ability to handle such data. The computer needs preliminary data to understand the relationships between objects; in other words, deep learning is a set of algorithms that allow the machine to learn from itself and from events, developing itself through layers of neurons [14, 15]. The greater the number of neural layers, the greater the performance of the machine. Deep learning is distinguished from other techniques with a fixed level of learning by its ability to develop as the volume of data increases. To ensure the quality of machine learning through deep learning, as much data as possible must be provided. The relationship among these terms can be pictured as concentric circles, as shown in Fig. 2.

Fig. 2 Relationship among AI, machine learning and deep learning

In this paper, we present a new forecaster built by combining two techniques, LSTM and PSO. We first develop one of them, building DSN-PSO, to enhance the performance of the LSTM so that the result is highly efficient, low cost, and easy to use. Before building the forecaster, we design an electric circuit consisting of several devices (LoRa, the Waspmote platform, and five sensors).

LoRa is a modulation technique that allows sending data at extremely low data rates over extremely long ranges; for more detail see [16].

The Waspmote platform is an open-access architecture that allows devices and sensors to be connected as a platform; for more detail see [16, 17]. A sensor is a device, module, machine, or subsystem used in many applications to read data for a specific event or change in a specific environment in real time; for more detail see [18].

In this work, we deal with five types of sensors: the Grove Laser PM2.5 sensor (HM3301) to measure PM2.5 and PM10, the MQ-7 to measure carbon monoxide (CO), the MQ131 to measure ozone (O3), an NO2 sensor to measure nitrogen dioxide, and an SO2 sensor to measure sulfur dioxide, all used to collect data in real time. Figure 3 shows the electrical circuit that connects the main parts of the station.

Fig. 3 Connecting the main sensors of the designed station through the LoRa module

The main points this work attempts to achieve are:

  • Increase the accuracy of predicting air pollution levels in the coming days, so that precautionary measures can be taken against the risks of such pollution and attempts made to reduce it.

  • Provide an integrated system that forms part of the electronic management and chemical safety of laboratories.

  • Support decisions by health and environmental authorities and avoid early pollution risks by educating people.

  • Provide important statistical information in raw form that can contribute to the treatment of sources of air pollution, whether produced by human activity, such as factories and houses, or by nature, such as forest fires and volcanoes.

  • Keep the system inexpensive so that it does not burden the Ministries of Health and Environment.

  • Achieve an innovative method for the safety of personnel working in laboratories that deal with these chemicals, complying with the requirements of UNESCO for chemical safety.

A sensor is a device that detects the physical or chemical state of its surroundings: some measure temperature, some measure pressure, some measure gases, and some measure air quality. It converts the signals incident upon it into electrical impulses that can be measured or counted by a device such as a computer [27]. In other words, a sensor is a device, module, or subsystem that aims to detect events or changes in its environment and send the information to other electronics, often a computer processor. A sensor is always used together with other electronic devices.

There are many sensors to measure the concentrations that cause air pollution, but in this paper, we will focus on the sensors specific to our work:

1.1 Sensor–Grove PM2.5 Laser (HM3301)

It is a new generation of laser dust-detection sensor, used for continuous real-time detection of dust in the air. It measures PM2.5 and PM10 concentrations.

The main features of this sensor:

  • High sensitivity to dust particles 0.3 μm or greater.

  • Continuous detection of dust concentration in the air in real time.

  • Based on laser light scattering technology, readings are accurate, stable and consistent.

  • Low noise.

  • Energy consumption is very low.

1.2 Sensor–MQ-7

The MQ-7 gas sensor has high sensitivity to carbon monoxide. It can be used to detect different gases containing CO, and it is low in cost and suitable for different applications.

The main features of this sensor:

  • High sensitivity to combustible gas (CO) in a wide range.

  • Stable performance, long life, and low cost.

  • Simple drive circuit.

1.3 Sensor-MQ131

The MQ131 gas sensor is highly sensitive to ozone.

The main features of this sensor:

  • Good sensitivity to ozone over a wide range.

  • Long life and low cost.

  • Simple drive circuit.

1.4 Sensor–WSP1110 nitrogen dioxide sensor

Low-cost electrochemical nitrogen dioxide sensors provide exciting new opportunities for rapid and distributed outdoor air pollution measurements. This type of sensor is stable, long-lasting, requires little energy, and is capable of accurately measuring concentrations at the parts-per-billion (ppb) level.

The main features of this sensor:

  • High sensitivity, stable performance and long-life time

  • Small in size and light in weight

  • 5 V voltage, low consumption

  • Quick response reset function, simple drive circuit

  • Long-term stability (50 ppm overload).

1.5 Sensor–SO2

The SO2 sensor is designed to measure sulfur dioxide for applications in air quality monitoring, industrial safety, and air purification monitoring.

The main features of this sensor:

  • Small in size with low profile (15 × 15 × 3 mm).

  • Long life (10 years life expectancy).

  • Fast response (15 s typical).

2 Related works

The issue of air quality prediction is one of the critical topics related to human lives and health. The aim of the work presented herein is to develop a new method for such prediction that is based on the huge amount of available data and operates on data series. This section first reviews previous studies by researchers in this area and compares them based on the database used in each case, the methods applied to assess the results, the advantages of each method, and its limitations.

Li et al. [19] used a long short-term memory extended (LSTME) neural network model with combined spatial–temporal links to predict concentrations of air pollutants. In that approach, the LSTM layers automatically extract potential intrinsic properties from historical air pollutant data and accompanying data, while meteorological data and timestamp data are also incorporated into the proposed model to improve its performance. The technique was evaluated using three measures (RMSE, MAE, and MAPE) and compared with the STANN, ARMA, and SVR models. The work presented herein is similar in its use of the LSTM approach as part of a recurrent neural network structure but differs in its use of another evaluation measure.

Lifeng et al. [20] reported that the best predictions of air quality could be obtained using the GM(1,1) model with fractional-order accumulation, i.e., FGM(1,1), to find the expected average annual concentrations of PM2.5, PM10, SO2, NO2, 8-h O3, and 24-h O3. The measure used in that work was the MAPE. Application of the FGM(1,1) method resulted in much better performance compared with the traditional GM(1,1) model, revealing that the average annual concentrations of PM2.5, PM10, SO2, NO2, 8-h O3, and 24-h O3 would decrease from 2017 to 2020. The work presented herein is similar in that it predicts the concentrations of air pollutants and finds ways to address them, but differs in its use of the LSTM method for the predictions.

Wen et al. [21] combined a convolutional neural network (CNN) and LSTM neural network (NN), as well as meteorological and aerosol data, to refine the prediction performance of the model. Data collected from 1233 air quality monitoring stations in Beijing and the whole of China were used to verify the effectiveness of the proposed model (C-LSTME). The results showed that the model achieved better performance than state-of-the-art technologies for predictions over different durations at various regional and environmental scales. The technique was evaluated using three measures (RMSE, MAE, and MAPE). In comparison, the LSTM approach is also applied in an RNN in this work, but after having identified the best structure for the network. In addition, another evaluation measure is used herein.

Shang et al. [22] described a prediction method based on a classification and regression tree (CART) approach in combination with the ensemble extreme learning machine (EELM) method. Subgroups were created by dividing the datasets using a shallow hierarchical tree through the CART approach. At each node of the tree, EELM models were constructed using the training samples of the node, minimizing the validation errors sequentially in all of the subtrees of each tree by identifying the number of hidden neurons, where each node is considered a root. Finally, the EELM models on each path to a leaf are compared with the root of each leaf, selecting only the path with the smallest error to check the leaf. The measures used in that work were the RMSE and MAPE. The experimental results revealed that such a method can address the issue of global–local duplication of the prediction method at each leaf and that the combined CART–EELM approach worked better than the random forest (RF), ν-SVR, and EELM models, while also showing superior performance compared with EELM or seasonal k-means EELM. The work presented herein is similar in that it uses the same set of six air pollution indexes (PM2.5, O3, PM10, SO2, NO2, CO) but differs in terms of the mechanism applied to reduce air pollutants, applying the RNN method.

Li et al. [23] applied a new air quality forecasting method and proposed a new positive analysis mechanism that includes complex analysis, improved prediction units, data pretreatment, and air quality control problems. The system analyzes the original series using an entropy model and a data processing process. The multiobjective multiverse optimization (MOMVO) algorithm is used to achieve the required performance, revealing that the least-squares (LS)SVM achieved the best accuracy in addition to stable predictions. Three measures were used for the evaluation in that work, viz. RMSE, MAE, and MAPE. The results of the application of the proposed method to the dataset revealed good performance for the analysis and control of air quality, in addition to the approximation of values with high precision. The work presented herein uses the same evaluation measures but differs in its use of the LSTM approach in the RNN after identifying the best structure for the network.

Kim et al. [24] aimed to build annual-average integrated empirical geographic (IEG) regression models for the contiguous USA for six criteria pollutants during 1979–2015; to explore systematically the impact on model performance of the number of variables selected for inclusion in a model; and to provide publicly available model predictions. They computed annual-average concentrations from regulatory monitoring data for PM10, PM2.5, NO2, SO2, CO, and ozone at all monitoring sites for 1979–2015.

3 Building IFCsAP

The model presented in this paper consists of two phases. The first includes building the station as an electrical circuit to collect data on six concentrations in real time and save them on the master computer for preparation and processing in the next phase. The second phase focuses on processing the dataset after splitting it based on the station identifier; this phase passes through many levels of learning to produce a forecaster that can deal with huge datasets. All the activities of this research are summarized in Fig. 5, while the algorithm of the IFCsAP model is described in the main algorithm. To make the model easier to understand, we explain its first phase in Fig. 3 and its second phase in Fig. 4. The main concentrations used, with their allowable limits, are listed below (and encoded in the sketch after the list).

  • PM2.5: 10 µg/m3 (average allowable value per year), 25 µg/m3 (average allowable value in 24 h).

  • PM10: 20 µg/m3 (average allowable value per year), 50 µg/m3 (average allowable value in 24 h).

  • O3: 100 µg/m3 (average allowable value over eight hours). The recommended maximum value, previously set at 120 µg/m3 over eight hours, has been reduced to 100 µg/m3 based on recent findings of relationships between daily mortality and ozone levels in locations where the concentration of the substance is less than 120 µg/m3.

  • NO2: 40 µg/m3 (average allowable value per year), 200 µg/m3 (average allowable value per hour).

  • SO2: 20 µg/m3 (average allowable value in twenty-four hours), 500 μg/m3 (average allowable value in 10 min).
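The allowable limits above can be encoded directly for automated checking. A minimal sketch follows; the dictionary layout and function name are our own illustration:

```python
# Allowable limits from the list above, in µg/m3, keyed by (pollutant, averaging period).
LIMITS = {
    ("PM2.5", "annual"): 10, ("PM2.5", "24h"): 25,
    ("PM10", "annual"): 20,  ("PM10", "24h"): 50,
    ("O3", "8h"): 100,
    ("NO2", "annual"): 40,   ("NO2", "1h"): 200,
    ("SO2", "24h"): 20,      ("SO2", "10min"): 500,
}

def exceeds(pollutant, period, value_ugm3):
    """True if an averaged reading exceeds its allowable value."""
    return value_ugm3 > LIMITS[(pollutant, period)]
```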

Fig. 4 Block diagram of IFCsAP model


The dataset was collected through two types of resources: a public web resource, represented by the KDD Cup 2018 dataset, and the station built with multiple sensors to capture concentrations. The dataset needs to be handled before building the predictor, as follows (a preprocessing sketch follows this list).

  • Split the dataset by station and save each part in a separate file holding the name of that station.

  • Treat missing values by dropping each row that has one or more missing values.

  • Apply normalization to each column of each station's dataset so that the values of each concentration lie in the range [0, 1].
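To make these steps concrete, the following pandas/scikit-learn sketch performs the split, the drop, and the normalization in one pass. The column names (`station_id` and the six pollutant columns) are illustrative assumptions, not the names used in the original dataset:

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

POLLUTANTS = ["PM2.5", "PM10", "NO2", "CO", "O3", "SO2"]  # assumed column names

def preprocess(csv_path):
    df = pd.read_csv(csv_path)
    for station, group in df.groupby("station_id"):       # assumed station column
        clean = group.dropna(subset=POLLUTANTS).copy()    # drop rows with missing values
        scaler = MinMaxScaler(feature_range=(0, 1))       # scale each column to [0, 1]
        clean[POLLUTANTS] = scaler.fit_transform(clean[POLLUTANTS])
        clean.to_csv(f"{station}.csv", index=False)       # one file per station
```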

3.1 Developed long short-term memory (DLSTM)

This paper shows how PSO can be employed, by building a new algorithm called DSN-PSO as explained in Algorithm 2, to enhance the performance of the deep learning algorithm LSTM (for more detail, see the main steps for training the LSTM–RNN in the “Appendix”) by determining its structure and parameters. This is explained in detail in Algorithm 3 (Table 1).
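Algorithm 2 is given as an image in the original; as a hedged sketch of the underlying idea, a plain PSO can search over the LSTM's structural hyperparameters, with each particle scored by the validation error of a briefly trained network. The bounds, the decoding scheme, and the `fitness` stub below are our own illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

ACTIVATIONS = ["tanh", "relu", "sigmoid"]  # assumed candidate activation functions

def decode(p):
    """Map a continuous particle position to discrete LSTM settings."""
    return int(round(p[0])), int(round(p[1])), ACTIVATIONS[int(round(p[2]))]

def fitness(p):
    n_layers, n_units, activation = decode(p)
    # Train a small LSTM with these settings for a few epochs and
    # return its validation SMAPE (training code omitted in this sketch).
    raise NotImplementedError

def pso_search(fitness, bounds, n_particles=10, n_iters=20, w=0.7, c1=1.5, c2=1.5):
    """Minimise `fitness` over the box `bounds` = [(lo, hi), ...] with plain PSO."""
    lo = np.array([b[0] for b in bounds], float)
    hi = np.array([b[1] for b in bounds], float)
    pos = lo + np.random.rand(n_particles, len(bounds)) * (hi - lo)
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()
    for _ in range(n_iters):
        r1, r2 = np.random.rand(), np.random.rand()
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([fitness(p) for p in pos])
        better = vals < pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        gbest = pbest[pbest_val.argmin()].copy()
    return decode(gbest)

# Example: 1-3 hidden layers, 50-300 nodes per layer, activation index 0-2.
# best_layers, best_units, best_act = pso_search(fitness, [(1, 3), (50, 300), (0, 2)])
```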

Table 1 The parameters utilized in the models

3.2 Running the IFCsAP model

We will train the model and predict concentration movements for several epochs and see whether the predictions get better or worse over time. The algorithm below shows how the IFCsAP model is executed.


3.3 Evaluation stage

The symmetric mean absolute percentage error (SMAPE) is used in this paper as the measure to determine the accuracy and robustness of the predictor:

\({\text{SMAPE}} = \frac{1}{n}\sum_{t=1}^{n} \frac{|F_t - A_t|}{(|A_t| + |F_t|)/2}\)

where \(n\) is the number of samples, \(F_t\) is the forecast value, \(A_t\) is the actual value, and \(t\) indexes every fitted point.

If the actual and forecast values are both 0, the SMAPE score is set to 0. For each station, the concentration levels of PM2.5, PM10, NO2, CO, O3, and SO2 are forecast for the next 48 h. We calculate the values of this measure daily, continuously over one month, then sort these values and compute the average of the 25 lowest daily SMAPE scores. The main steps of the evaluation are shown in detail in Algorithm 5 (see the scoring sketch below).
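A minimal sketch of this scoring protocol, assuming the daily SMAPE values for a month have already been computed (for instance with the `smape` function shown in the introduction):

```python
import numpy as np

def monthly_score(daily_smape, k=25):
    """Average of the k lowest daily SMAPE scores over the month."""
    scores = np.sort(np.asarray(daily_smape, dtype=float))
    return float(scores[:k].mean())

# Example: scores for 30 consecutive days of 48-h forecasts for one pollutant.
# monthly_score([0.41, 0.38, 0.55, 0.47, 0.62, 0.39, 0.50, 0.44, 0.58, 0.36,
#                0.49, 0.53, 0.40, 0.45, 0.61, 0.37, 0.48, 0.52, 0.43, 0.56,
#                0.42, 0.46, 0.59, 0.35, 0.51, 0.54, 0.60, 0.38, 0.47, 0.57])
```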


4 Results of IFCsAP model

The results, and the justification for them, are explained in detail in this section.

4.1 Pre-processing

This stage consists of multiple steps performed on the dataset after collecting it; each step addresses one problem in the dataset, as discussed below.

4.1.1 Split station

The second column of Table 2 shows the result of splitting the dataset based on the name of the station, where each station is saved in a separate file holding its name.

4.1.2 Missing values [8]

Missing values are one of the problems affecting the final results of any model, especially a prediction model: the results of a predictor become more accurate when the predictor is built on true values; otherwise the results cannot be trusted. Therefore, in this model we drop any record that has missing values in each station. In general, the stations have different rates of missing values, as explained in the third column of Table 2 and in Fig. 6.

Table 2 Ratio after pre-processing the missing values
Fig. 5 Flowchart of research work activities

Table 2 describes the dataset after splitting it into 35 stations, each with the same number of records (8886) and six features. It also shows the dataset after handling the missing values, together with the dropping rate. Figure 6 shows the percentage of records with missing values in each station.

Fig. 6 Percentage of records with missing values in each station

4.1.3 Normalization

The dataset is normalized with MinMaxScaler so that values fall in the range [0, 1] [20, 25]. This is a necessary step for the proposed predictor. The main purpose of the normalization stage is to bring all values into the same range while preserving the nature of each feature in the dataset.

4.1.4 Split the dataset

Cross-validation is one of the best techniques for evaluating the performance of a given model. Because badly selected training and testing samples harm performance, cross-validation provides methods for wisely selecting the best samples for training and testing a given model, as shown in Table 3 (attached in the “Appendix”) and Fig. 7; a selection sketch follows.
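As a hedged sketch of this selection step (the paper reports both five- and ten-fold splits; `evaluate` stands in for training the model on one split and returning its SMAPE):

```python
import numpy as np
from sklearn.model_selection import KFold

def best_split(n_samples, evaluate, n_splits=10):
    """Try each fold as the test set and keep the split with the lowest error."""
    kf = KFold(n_splits=n_splits, shuffle=False)   # no shuffling: keep the time order
    splits = list(kf.split(np.arange(n_samples)))
    errors = [evaluate(train_idx, test_idx) for train_idx, test_idx in splits]
    return splits[int(np.argmin(errors))]          # (train_idx, test_idx) of best fold
```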

Table 3 Apply cross-validation for each station
Fig. 7 Distribution of the dataset based on five-fold cross-validation

Table 4 illustrates the idea of cross-validation; in this paper, ten-fold cross-validation is used for each station to determine the best samples from the training dataset to build the model and from the testing dataset to evaluate it.

Table 4 The best split for the dataset of each station

We note that the station with the highest percentage of missing values has a very high SMAPE score compared with stations with the lowest percentages of missing values. We conclude that using the dropping process makes the predictor's results more accurate compared with other methods used to handle missing values (Fig. 8).

Fig. 8 SMAPE based on the traditional LSTM

4.2 DSN-PSO

Selecting suitable parameters for any deep learning algorithm is considered one of the main challenges in the field; in general, all known LSTMs take a very long time to run before giving results. This section shows how DSN-PSO solves this problem and overcomes this challenge: the optimal structure and main parameters were found for DLSTM.

In other words, values such as the number of hidden layers, the number of nodes in each hidden layer, the weights among layers, the biases, and the type of activation function of the deep learning network are essential parameters that fundamentally affect DLSTM performance. In general, networks rely on the trial-and-error principle to select their parameters, which leads to long implementation times. Therefore, the main parameters of DLSTM are produced by DSN-PSO, as shown in Table 5, while Table 6 shows the best parameters representing the structure of DLSTM compared with the parameters of the traditional LSTM.

Table 5 The parameters of DSN-PSO
Table 6 The best parameters of DLSTM resulting from applying DSN-PSO, compared with the parameters of the traditional LSTM

Table 6 shows the best parameters (number of hidden layers, number of nodes in each hidden layer, weights, biases, and activation function) resulting from the DSN-PSO algorithm, which represent the initial structure of the DLSTM (Table 7).

Table 7 The difference between the actual and predicted values resulting from IFCsAP

4.3 DLSTM

DLSTM is based mainly on the LSTM algorithm, which is capable of handling large data and retains information over long periods because each cell contains a memory. In this stage, the parameters produced by DSN-PSO, which represent the structure of DLSTM, are forwarded to DLSTM together with the station's dataset generated from the best split of the ten-fold cross-validation, to train the DLSTM. The prediction values for each station (Station #1 to Station #35) are then computed based on the best split resulting from the ten-fold cross-validation. The best parameters produced by DSN-PSO as the structure of DLSTM represent one input layer with six nodes, each node representing one of the six concentrations; one hidden layer containing 250 nodes; and one output layer. All other parameters and the activation function are described in Table 8. We used 150 iterations, feeding in a batch size of 24 at each iteration (see the sketch below).
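A hedged Keras sketch of this structure; the look-back window length is our assumption, since the paper does not state it here:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

WINDOW = 24      # assumed look-back window, in hourly steps
N_FEATURES = 6   # PM2.5, PM10, NO2, CO, O3, SO2

model = Sequential([
    LSTM(250, input_shape=(WINDOW, N_FEATURES)),  # one hidden layer with 250 nodes
    Dense(N_FEATURES),                            # one output per concentration
])
model.compile(optimizer="adam", loss="mse")

# X_train has shape (samples, WINDOW, 6); y_train has shape (samples, 6).
# model.fit(X_train, y_train, epochs=150, batch_size=24)
```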

Table 8 SMAPE evaluation

A comparison of the actual and predicted values produced by DLSTM for the first station is shown in Fig. 9.

Fig. 9 Actual and predicted values for the first station

The predicted values for Station #18, based on the best split resulting from the ten-fold cross-validation, are compared with the real values in Fig. 10.

Fig. 10 Actual and predicted values for Station #18

The predicted values for Station #34, based on the best split resulting from the ten-fold cross-validation, are compared with the real values in Fig. 11.

Fig. 11 Actual and predicted values for Station #34

4.4 SMAPE evaluation

After building the DLSTM on the training dataset for each station, the model is evaluated by computing the SMAPE on the testing dataset.

The resulting score for each concentration is the average of its 25 lowest daily SMAPE scores. If a concentration misses a day, the score of that concentration on that day is imputed with the baseline score, as shown in the “Appendix” under Table 8.

5 Comparison between the traditional LSTM and IFCsAP based on SMAPE values

To demonstrate the success of the IFCsAP model, we compare the SMAPE values produced by the traditional LSTM and by IFCsAP, as shown in Table 9.

Table 9 Comparison of SMAPE between the traditional LSTM and IFCsAP

The table above shows the SMAPE results of the IFCsAP model in comparison with those of the traditional LSTM, which used the same dataset from the pre-processing stage (i.e., the same dropping, normalization, and training/testing split resulting from the ten-fold cross-validation) applied at each station. We found that the SMAPE results of the IFCsAP model are better than those of the traditional LSTM, as shown in Fig. 12.

Fig. 12 Comparison of SMAPE between the traditional LSTM and the IFCsAP model

6 Summary

The air quality index dataset is huge and requires intelligent, deep computation to extract useful patterns from it. The advantage of this dataset is that it is diverse and large, which allows accurate and reliable decisions. In addition, the data used in this work were obtained from more than one station, which is itself a challenge when building a stable prediction system for their behavior. A limitation of this dataset is that the concentrations that cause air pollution are usually unequal and unfamiliar to non-experts; the data contain missing values and are taken from stations that differ in the environments assigned to them.

DSN-PSO determines the parameters and activation function of DLSTM. Its advantage is that the execution time of the LSTM is reduced; its limitation is that it increases the complexity of the LSTM.

DLSTM is a development of LSTM by means of DSN-PSO; PSO is used to determine the optimal number of hidden layers, number of nodes in each hidden layer, weights, biases, and activation function. The advantage of DLSTM is that it is capable of dealing with huge data and contains memory cells to store information over the long term; its limitation is that it contains a huge number of parameters.

Evaluation is the process of calculating the error between the actual value and the predicted value. There are different types of error measures, including prediction measures (e.g., MSE, RMSE, MAE, and MAPE) and confusion-matrix measures (e.g., accuracy, F-score, and false positives). In this research, the SMAPE evaluation is used.

  • How can particle swarm optimization be useful in building a recurrent neural network (RNN)?

    PSO gradually modifies the behavior of each particle in a particular environment, depending on the behavior of its neighbors, until the optimal solution is obtained.

    On the other hand, neural networks use the trial-and-error principle in selecting their basic parameters, modifying them gradually until acceptable values are reached.

    Based on these properties of PSO and neural networks, we used the PSO principle to find the optimal parameters and the activation function of the neural network.

  • How can a multi-layer model be built by combining the two technologies, LSTM-RNN and particle swarm optimization?

    By building a new predictor called IFCsAP that combines DSN-PSO and DLSTM, where DSN-PSO is used to find the best structure and parameters for the LSTM, while DLSTM is used to predict the concentrations of air pollutants.

  • Is the SMAPE measure enough to evaluate the results of the suggested predictor?

    Yes, SMAPE is sufficient to evaluate the results of the predictor for the next 48 h.

  • What benefit results from building a predictor by combining DSN-PSO and DLSTM?

Combining DSN-PSO and DLSTM reduces the execution time by defining the network parameters, but at the same time it increases the computational complexity.

7 Conclusions

We can summarize the main contributions of this paper as follows. We built an integrated platform based on physical and software entities in the form of an integrated station (H/W and S/W) that is used for essential needs only and reduces the damage resulting from air pollution; the platform saves effort and cost through sensor programming, activating the sensors' role of reading data on pollution-causing concentrations in real time, increasing performance, and reducing effort, time, and cost. The station measures the concentrations that cause air pollution and relies on the principle of intelligent data analysis (IDA): data are collected from the stations, which are considered class nodes, over the wireless network that was built (LoRa and Waspmote) and sent to the computer, which is considered the master node. IFCsAP is fed with the data collected in real time, preliminary processing is performed on the data, and the predictor results are evaluated using the symmetric mean absolute percentage error (SMAPE). Using this measure, we evaluate the predicted levels of PM2.5, PM10, NO2, CO, O3, and SO2 for the next 48 h for each station.

Data often contain a proportion of missing or incomplete values, which increases the prediction or classification error. This problem can be addressed by deleting the entries that contain such values, yielding a more accurate forecast. The purpose of normalization is to convert the data into a specified range of values so that they can be dealt with more accurately in the subsequent stages of processing; in our work, the data were converted into the range [0, 1], because the activation function deals with data in that range.

IFCsAP was designed to deal with one of the most important problems facing the environment at the present time, resulting from increased pollution due to electronic waste, factories, and laboratories, and from the lack of real projects in Iraq to reduce air pollution rates. The designed model proved its accuracy and efficiency in predicting the concentrations that cause air pollution. The model is distinguished by a new tool called DLSTM, which is characterized by its ability to deal with large data and by its memory, which enables it to retain information over long periods. Experiments showed that the combination of the two designed tools, DLSTM and DSN-PSO, achieves more accurate results and reduces implementation time. The first tool, DSN-PSO, was used to select the best parameters determining the structure of the second tool, DLSTM, thus improving the performance of the deep learning models and producing an IFCsAP predictor that delivers more accurate and efficient results.

The following points give ideas for future work. PSO could be explored to tune other LSTM parameters, such as the learning rate, the maximum error, and the number of epochs, instead of relying on trial-and-error principles that take too long to find the optimal parameters for the LSTM network. PSO could also be combined with other deep learning models to find the best parameters and activation functions, instead of relying on trial and error to find the optimal network structure. Other types of swarm optimization (such as ant colony optimization, the cuckoo search algorithm, and glowworm swarm optimization) or a genetic algorithm could be used to find the best parameters and activation functions for the LSTM.