1 Introduction

With growing populations and ongoing urbanization, as well as the rising demand for public transportation, the need for metro tunnels has increased significantly. In subway tunnel excavation, it is necessary to estimate and control the surface settlements that develop after excavation, since they may damage surface structures [1]. In previous research [2, 3], many geotechnical and geometrical parameters, such as cohesion, Poisson’s ratio, Young’s modulus, angle of internal friction and face support pressure, have been considered in predicting the maximum surface settlement (MSS).

Surface settlements are influenced by three main groups of factors: excavation and support method, tunnel geometry and ground properties. The first group covers the excavation method (such as NATM or TBM), the excavation type (full face or sequential mining) and the support (such as anchoring, shotcrete, steel sets and lining). The second group, tunnel geometry, includes worksite conditions, depth, diameter, number of tunnels and distance between tunnels. The third group, ground properties, includes elasticity modulus, unit weight, cohesion, friction angle, Poisson’s ratio, groundwater and permeability [4].

Previously, empirical and analytical methods, as well as numerical analyses based on the finite difference (FD) and finite element (FE) methods, were developed to predict MSS [5, 6]. For instance, Schmidt [7] developed a method for estimating surface settlement above tunnels constructed in soft ground. Attewell and Farmer [8] evaluated ground disturbance caused by shield tunneling in a stiff, over-consolidated clay. In another study, Ocak [9] proposed a new equation for estimating the transverse settlement curve of twin tunnels and demonstrated, using the Otogar–Kirazli metro case studies, that it can estimate the transverse settlement with confidence. Atkinson and Potts [10] investigated the influence of the depth of burial and crown settlement on the surface settlement above shallow tunnels driven in soft ground. Hamza et al. [11] studied the ground movements due to the construction of cut-and-cover structures and a slurry shield tunnel of the Cairo Metro. Chi et al. [12] applied the conjugate gradient method to the back-analysis of tunneling-induced ground movement and established semi-empirical equations to predict the tunneling-induced ground movement in the silty clay and silty sand of the Taipei basin. Chou and Bobet [13] used twenty-eight tunnels to evaluate predictions from an analytical solution for shallow tunnels in saturated ground. Comparisons between predictions and observations from actual tunnels indicated good agreement, generally within a 15 % difference. In another analytical study, Park [14] applied elastic solutions to predict the tunneling-induced undrained ground movements for shallow and deep circular tunnels in soft ground. He showed good agreement between the predicted ground deformations and field observations for tunnels in uniform clay.
Short-term surface settlements for twin tunnels, located between the Esenler and Kirazlı stations on the Istanbul Metro line, were predicted by Ercelebi et al. [15]. For this purpose, they used three different methods, including FE, semi-theoretical (semi-empirical) and analytical methods to predict surface settlement caused by tunneling. Their results indicated that the FE method can be used as a reliable method to predict short-term settlement.

Apart from empirical models, artificial intelligence (AI) methods, such as the artificial neural network (ANN), fuzzy inference system and support vector machine (SVM), have been developed in recent years for solving problems in rock and geotechnical engineering [16–19]. These models have also been widely used and developed in the field of MSS prediction. Ocak and Seker [1] used three different methods, namely ANN, SVM and Gaussian processes (GP), to estimate surface settlement. They concluded that GP is more precise than the ANN and SVM models. In addition, a comprehensive study on the prediction of MSS by ANN and multiple regression was presented by Mohammadi et al. [3]. Their results demonstrated that ANN is the more suitable of the two techniques for predicting MSS.

Recently, the combination of evolutionary algorithms, such as particle swarm optimization (PSO) and the imperialist competitive algorithm (ICA), with ANN has been highlighted in the field of rock engineering [20, 21]. The results indicated that such algorithms are useful for designing the ANN. Nevertheless, to the best of the authors’ knowledge, evolutionary algorithms have not yet been proposed for MSS prediction. In this research, a combination of PSO and ANN is proposed to predict the MSS induced by tunneling along line 2 of the Karaj subway. Here, the PSO algorithm is used to optimize the ANN.

2 Theory and methods

2.1 Artificial neural network (ANN)

The ANN is one of the subsystems of AI and has been under development since the 1960s. Generally, the structure of an ANN, which is inspired by the human brain, consists of a group of computational units called neurons or nodes. These neurons are highly interconnected, and their capability for massively parallel distributed processing has been demonstrated by many studies [22–24]. A typical ANN consists of three layers, namely the input, hidden and output layers. The neurons are placed in these layers and linked to each other by weights. The effective (independent) variables of the problem are placed in the input layer and the objective variables in the output layer [25]. Theoretically, there are no restrictions on the number of hidden layers or the number of neurons in them; both can be determined by trial and error [26]. To construct an ANN model, the network is first trained so that it learns a mapping from the data. There are many training algorithms, such as the Levenberg–Marquardt (LM), conjugate gradient and scaled conjugate gradient algorithms [27]. The selection of the best algorithm depends on the given problem, the purpose of the network (such as classification or prediction), the number of datasets and so on. In the second step, the remaining datasets are used for testing to check the performance of the constructed model [28]. Although ANN is used as a quick solution for engineering problems, it has a number of disadvantages, notably a slow learning rate and the tendency to get trapped in local minima [29, 30].

2.2 Particle swarm optimization (PSO)

PSO, which was first introduced by Kennedy and Eberhart [31], is a simple and powerful optimization technique inspired by the social behavior of bird flocking or fish schooling. In the PSO algorithm, a number of simple particles are placed in the search space of an n-dimensional problem or function [32, 33]. Each particle represents a potential solution and evaluates the objective function at its current position. The next location of each particle is determined by combining some aspects of its own current and best positions with those of other swarm particles, with some random perturbations [34]. Eventually, the swarm can be expected to move close to the optimum of the fitness function [35].

Using Eqs. (1) and (2), the position and velocity of the particles can be determined and updated.

$$\mathop X\nolimits_{i}^{k + 1} = \mathop X\nolimits_{i}^{k} + \mathop V\nolimits_{i}^{k + 1}$$
(1)
$$\mathop V\nolimits_{i}^{k + 1} = \mathop V\nolimits_{i}^{k} + \mathop c\nolimits_{1} \mathop r\nolimits_{1} \left( {\mathop p\nolimits_{{{\text{best}},i}}^{k} - \mathop X\nolimits_{i}^{k} } \right) + \mathop c\nolimits_{2} \mathop r\nolimits_{2} \left( {\mathop g\nolimits_{\text{best}}^{k} - \mathop X\nolimits_{i}^{k} } \right)$$
(2)

where $X_i^k$ is the n-dimensional vector that represents the position of particle $i$ in the search space at iteration $k$, and $V_i^k$ denotes the velocity of this particle. The velocity vector drives the optimization process by reflecting both the experiential knowledge of the particle and socially shared information from the particle’s neighborhood [36], expressed through the distance of the particle from its own best position and from the swarm best position. The best position visited by the particle and the best position found by the swarm so far are represented by $p_{\text{best},i}$ and $g_{\text{best}}$, respectively, in Eq. (2). Furthermore, $r_1$ and $r_2$ are random values in the range of zero to one, and $c_1$ and $c_2$ are positive acceleration constants. The fitness function $f$ measures how close a solution is to the optimum and is used to determine $p_{\text{best},i}$ and $g_{\text{best}}$; the objective function therefore plays an integral role in this problem. For a minimization problem, the personal and global best positions at the next iteration are defined as:

$$p_{\text{best},i}^{k+1} = \begin{cases} X_i^{k+1}, & f\left(X_i^{k+1}\right) < f\left(p_{\text{best},i}^{k}\right) \\ p_{\text{best},i}^{k}, & f\left(X_i^{k+1}\right) \ge f\left(p_{\text{best},i}^{k}\right) \end{cases}$$
(3)
$$\mathop g\nolimits_{\text{best}}^{k} \in \left\{ {\mathop p\nolimits_{{{\text{best}},0}}^{k} , \cdots ,\mathop p\nolimits_{{{\text{best}},_{{n_{s} }} }}^{k} } \right\}\left| {f\left( {\mathop g\nolimits_{\text{best}}^{k} } \right)} \right. = \hbox{min} \left\{ {f\left( {\mathop p\nolimits_{{{\text{best}},0}}^{k} } \right), \cdots ,f\left( {\mathop p\nolimits_{{{\text{best}},_{{n_{s} }} }}^{k} } \right)} \right\}$$
(4)

where $n_s$ denotes the total number of particles in the swarm. The particles continue to move in the search space, with their positions being updated at each iteration, until the stopping condition is met.
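The update rules in Eqs. (1)–(4) can be sketched in a few lines of Python. This is an illustrative sketch only, not the implementation used in this study: the function name `pso_minimize`, the `sphere` test objective, the search bounds and the swarm settings are assumptions. The inertia weight `w` multiplying the previous velocity does not appear in Eq. (2) as written (which corresponds to `w = 1`); it is included here because an inertia weight of 0.25 is reported later in the paper.

```python
import random

def sphere(x):
    """Hypothetical test objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

def pso_minimize(f, dim, n_particles=30, iters=200, c1=2.0, c2=2.0, w=0.25,
                 bounds=(-5.0, 5.0), seed=0):
    """Global-best PSO following Eqs. (1)-(4); w scales the previous velocity."""
    rng = random.Random(seed)
    lo, hi = bounds
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    p_best = [x[:] for x in X]
    p_val = [f(x) for x in X]
    g = min(range(n_particles), key=lambda i: p_val[i])
    g_best, g_val = p_best[g][:], p_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Eq. (2): velocity update from personal and swarm best positions
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (p_best[i][d] - X[i][d])
                           + c2 * r2 * (g_best[d] - X[i][d]))
                # Eq. (1): position update
                X[i][d] += V[i][d]
            val = f(X[i])
            if val < p_val[i]:  # Eq. (3): keep the better personal best
                p_best[i], p_val[i] = X[i][:], val
        # Eq. (4): swarm best is the best of all personal bests
        g = min(range(n_particles), key=lambda i: p_val[i])
        g_best, g_val = p_best[g][:], p_val[g]
    return g_best, g_val
```

Calling `pso_minimize(sphere, dim=2)` returns the best position found and its objective value. Because the personal bests are only ever replaced by better positions, the returned value never gets worse from one iteration to the next.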

3 Case study and data collection

In this research, datasets were collected from line No. 2 of the Karaj Subway in Iran. Karaj is one of the large cities of Iran, with 1.4 million inhabitants. Owing to the increasing population and urbanization of the city, construction of a new subway system is necessary. Construction of the operational line No. 2 of the Karaj Subway, with a total length of 27 km, started in February 2007. The tunnel is horseshoe-shaped, 7.8 m high and 8.4 m wide, and its depth varies from 7 to 14 m. The project connects Kamal-Shahr and Malaard, in the northwest and south of Karaj, respectively (see Fig. 1). Based on several considerations, including geotechnical analysis and economic studies, the tunnels were designed and built in two phases of 14.5 and 12.5 km, shown as AB and BC in Fig. 1, respectively. Both phases were excavated using the New Austrian Tunneling Method (NATM). Following NATM, the tunnel excavation in this project was designed in three sections, as shown in Fig. 2. The heading was excavated in step 1, followed by steps 2 and 3 in turn. The stated tunnel dimensions (7.8 m height and 8.4 m width) include the lining. After excavating step 1, the exposed area is supported using steel fiber–reinforced shotcrete.

Fig. 1 Location of line 2 of the Karaj Subway

Fig. 2 Steps of tunnel excavation in line 2 of the Karaj Subway using NATM

In this research, 143 datasets were collected from laboratory and in situ tests. The horizontal to vertical stress ratio (coefficient of earth pressure), cohesion and Young’s modulus were measured and used as input parameters. To determine the coefficient of earth pressure, in situ horizontal stress and in situ vertical stress tests were conducted. In addition, the values of MSS were carefully measured and used as the output parameter. The range of these parameters over all 143 datasets is given in Table 1. To measure MSS, settlement markers were installed and grouted about 100 cm into the ground at intervals of approximately 25 m along the tunnel alignment, and the surface settlements were recorded. In addition, three or five surface settlement markers, spaced approximately 5–7.5 m apart, were installed in each transverse section, as depicted in Fig. 3.

Table 1 The range of measured parameters for MSS prediction
Fig. 3 Schematic diagram of settlement marker locations in line 2 of the Karaj Subway

4 Prediction of MSS

In this section, the modeling procedures of the ANN and hybrid PSO-ANN models for MSS prediction are described. These models were constructed in the MATLAB environment (version 2013b). To develop the models, the datasets were divided into two groups: training and testing datasets. Previous researchers have recommended various percentages for the testing datasets [37–39]. In the present study, 80 % of the datasets were used for model development and 20 % for checking the performance of the developed models. The random selection of training and testing data was carried out by a MATLAB code written by the authors.
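The authors’ MATLAB code for the random split is not reproduced in the paper. As a rough illustration of the idea, a random 80/20 split of 143 datasets could be sketched in Python as follows; the function name `split_train_test` and the seed are hypothetical.

```python
import random

def split_train_test(n_samples, train_fraction=0.8, seed=42):
    """Return shuffled index lists for the training and testing subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train_fraction)  # truncates: 143 * 0.8 -> 114
    return indices[:n_train], indices[n_train:]

train_idx, test_idx = split_train_test(143)
print(len(train_idx), len(test_idx))  # 114 29
```

With 143 datasets this yields 114 training and 29 testing datasets, matching the counts reported in the results section.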

4.1 Prediction of MSS by ANN model

In this part, an attempt has been made to estimate MSS using ANN procedure. In the first stage of this modeling procedure, the prepared database was normalized to simplify the design procedure as follows:

$$X_{\text{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
(5)

where $X$ and $X_{\text{norm}}$ are the measured and normalized values, respectively, and $X_{\max}$ and $X_{\min}$ are the maximum and minimum values of $X$. Normalizing the numeric values of the input and output parameters is recommended to achieve a reasonable solution [17–21].
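As a brief illustration, Eq. (5) and its inverse can be written in Python as below. The function names and the sample values are hypothetical and serve only to show the scaling; they are not measurements from this study.

```python
def min_max_normalize(values):
    """Scale a sequence to the range [0, 1] using Eq. (5)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("all values are identical; Eq. (5) is undefined")
    return [(x - x_min) / (x_max - x_min) for x in values]

def denormalize(x_norm, x_min, x_max):
    """Invert Eq. (5) to recover a value in the original units."""
    return x_norm * (x_max - x_min) + x_min

# Hypothetical parameter values, for illustration only
sample = [10.0, 20.0, 40.0]
normalized = min_max_normalize(sample)
```

The minimum of each parameter maps to 0 and the maximum to 1, so all inputs and the output share a common scale during training. Predictions made on the normalized scale can be converted back with `denormalize`.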

In the next stage of ANN modeling, the prepared database is divided into training and testing datasets for model development and model evaluation. The testing datasets are used to evaluate the performance of the developed models. In ANN modeling, the selection of the training algorithm and the determination of the network architecture are the most difficult tasks [40, 41]. Among the ANN training algorithms mentioned in Sect. 2.1, LM was selected to train the ANN systems. Many researchers have highlighted the efficiency of the LM algorithm, compared with other training algorithms, in solving engineering problems (e.g., [42–44]). Moreover, as noted by many scholars (e.g., [45–47]), an ANN with only one hidden layer is able to approximate almost any function. A model with one hidden layer is also attractive because it reduces model complexity and, as a consequence, the likelihood of overfitting. Hence, in this study, all proposed AI models were designed with one hidden layer.

In the next stage of ANN design, the number of hidden nodes ($N_h$) in the hidden layer should be determined. Sonmez et al. [48] and Sonmez and Gokceoglu [49] stated that the number of hidden nodes has a strong impact on the prediction performance of an ANN model. Previous researchers have proposed several equations for determining $N_h$, as shown in Table 2. Based on this table, the upper limit for $N_h$ is $2N_i + 1$, where $N_i$ is the number of input parameters. Considering the equations in Table 2 and the prepared datasets, a range of 1–7 hidden nodes is suitable for the MSS problem in this study, and the proper $N_h$ within this range should be obtained by trial and error. For this purpose, a series of ANN models was designed using the mentioned parameters. The prediction performance of the constructed models was checked using both the coefficient of determination ($R^2$) and the root mean square error (RMSE), as presented in Table 3. In this table, the model for each number of hidden nodes is run five times. A model with lower RMSE and higher $R^2$ values is preferable. Based on the obtained results, run 2 of ANN model No. 4, with $N_h = 4$, shows higher $R^2$ and lower RMSE values than the other constructed models. Therefore, an architecture of 3 × 4 × 1 was selected for solving the MSS problem with the ANN model. The evaluation of the ANN model is discussed further in Sect. 5.
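The trial-and-error search described above can be outlined as a simple loop. The sketch below is an assumption-laden illustration: `train_and_score` is a hypothetical callable that would train one network and return its RMSE and $R^2$, the selection here uses RMSE alone for simplicity (the study considers both indices), and `placeholder_scorer` returns synthetic numbers only so that the loop can be run.

```python
def select_hidden_nodes(train_and_score, n_inputs, runs=5):
    """Try N_h = 1 .. 2*N_i + 1, several runs each; keep the lowest RMSE."""
    best = None
    for n_h in range(1, 2 * n_inputs + 2):
        for run in range(1, runs + 1):
            rmse, r2 = train_and_score(n_h, run)
            if best is None or rmse < best["rmse"]:
                best = {"n_h": n_h, "run": run, "rmse": rmse, "r2": r2}
    return best

def placeholder_scorer(n_h, run):
    """Synthetic stand-in for network training; not results from this study."""
    rmse = 0.10 + 0.01 * abs(n_h - 4) + 0.001 * run
    return rmse, 1.0 - rmse

best = select_hidden_nodes(placeholder_scorer, n_inputs=3)
```

With three inputs the loop covers $N_h$ = 1 to 7, five runs each, which mirrors the 35 candidate models summarized in Table 3.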

Table 2 Equations proposed by previous investigators for determining the number of hidden nodes
Table 3 $R^2$ and RMSE values of the constructed ANN models

4.2 Prediction of MSS by PSO-ANN model

As mentioned in Sect. 1, this study attempts to improve the prediction performance of the ANN model by incorporating the PSO algorithm, in order to develop a predictive model with a higher degree of accuracy for MSS prediction. In this system, PSO minimizes a cost function by adjusting the weights and biases of the network. The modeling procedure of the hybrid PSO-ANN model for predicting MSS is described below.
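To make the idea of adjusting weights and biases with PSO concrete, the following Python sketch treats each particle as a flat vector holding all parameters of a 3 × 4 × 1 network and evaluates the RMSE of that network as the cost. The parameter layout, the tanh hidden activation, the linear output and the choice of RMSE as the cost are assumptions for illustration; the paper does not specify these details.

```python
import math

N_IN, N_HID, N_OUT = 3, 4, 1
# Weights and biases of a 3 x 4 x 1 network: 12 + 4 + 4 + 1 = 21 values
N_PARAMS = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT

def predict(particle, x):
    """Forward pass of a one-hidden-layer network stored in a flat vector."""
    assert len(particle) == N_PARAMS
    pos = 0
    hidden = []
    for _ in range(N_HID):
        w = particle[pos:pos + N_IN]   # input weights of this hidden node
        b = particle[pos + N_IN]       # bias of this hidden node
        pos += N_IN + 1
        hidden.append(math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b))
    w_out = particle[pos:pos + N_HID]  # hidden-to-output weights
    b_out = particle[pos + N_HID]      # output bias
    return sum(wi * hi for wi, hi in zip(w_out, hidden)) + b_out

def cost(particle, inputs, targets):
    """RMSE between predictions and targets; the quantity PSO would minimize."""
    errors = [(predict(particle, x) - t) ** 2 for x, t in zip(inputs, targets)]
    return math.sqrt(sum(errors) / len(errors))
```

Each particle position in the swarm is then one candidate set of 21 network parameters, and the swarm search replaces gradient-based training of the network.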

4.2.1 Swarm size

The number of particles, or swarm size, has a significant impact on the performance of the hybrid PSO-ANN technique. According to previous studies, there is no specific way to determine the proper swarm size. It is therefore common practice to obtain the swarm size from a parametric study using the trial-and-error method (e.g., [55, 56]). Table 4 presents the results of PSO-ANN models for various numbers of particles together with their RMSE and $R^2$ values. In these analyses, 100 iterations and an architecture of 2 × 5 × 1 were considered. In addition, following suggestions in the literature [55, 56], velocity coefficients of 2 ($c_1 = c_2 = 2$) and an inertia weight of 0.25 were used in all PSO-ANN models of this study.

Table 4 Results of PSO-ANN models for various numbers of particles in predicting MSS

As can be seen in Table 4, selecting the best swarm size directly is difficult. To overcome this problem, a ranking technique introduced by Zorlu et al. [57] was used. In this technique, each performance index (RMSE or $R^2$) is ordered within its class and the best value is assigned the highest rating. For example, values of 0.882, 0.887, 0.920, 0.928, 0.894, 0.898, 0.922, 0.913, 0.918, 0.930, 0.902 and 0.938 were achieved for $R^2$ of the training datasets of models 1–12, respectively, and ratings of 1, 2, 8, 10, 3, 4, 9, 6, 7, 11, 5 and 12 were assigned to them, respectively. The same procedure was applied to the RMSE values and to the testing datasets. Afterwards, for each PSO-ANN model, the ratings of RMSE and $R^2$ for both training and testing datasets were summed to give a total rank. According to the total rank results, PSO-ANN model No. 10, with a swarm size of 400, shows the highest total rank. Hence, 400 was chosen as the optimum swarm size for predicting MSS.
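The rating step can be reproduced with a short Python function. The sketch below uses the training $R^2$ values quoted above; the function name `rank_scores` is hypothetical, and ties (which do not occur in these values) would simply receive consecutive ratings in this simplified version.

```python
def rank_scores(values, higher_is_better=True):
    """Assign rating 1 to the worst value and rating n to the best value."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks

# Training R^2 values of PSO-ANN models 1-12, as quoted in the text
r2_train = [0.882, 0.887, 0.920, 0.928, 0.894, 0.898,
            0.922, 0.913, 0.918, 0.930, 0.902, 0.938]
print(rank_scores(r2_train))
# [1, 2, 8, 10, 3, 4, 9, 6, 7, 11, 5, 12]
```

For RMSE, where smaller is better, the same function would be called with `higher_is_better=False`. The total rank of each model is the element-wise sum of its four rating lists (RMSE and $R^2$, training and testing).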

4.2.2 Termination criteria

The termination criterion in this study is the maximum number of iterations ($I_{\text{Max}}$). A common way to determine $I_{\text{Max}}$ is to compare the network results at various iteration numbers. Previous researchers [32, 58] suggested various $I_{\text{Max}}$ values for different engineering problems. For instance, $I_{\text{Max}}$ values of 400, 400 and 450 were recommended in the studies conducted by Jahed Armaghani et al. [32], Gordan et al. [58] and Tonnizam Mohamad et al. [59], respectively. Therefore, another parametric study was conducted, using the swarm sizes from the previous stage, to find $I_{\text{Max}}$, as displayed in Fig. 4. Here, the prediction performance of the network was checked using the RMSE results. In this analysis, a maximum of 1000 iterations, $c_1 = c_2 = 2$, an inertia weight of 0.25 and an architecture of 2 × 5 × 1 were applied. As shown in Fig. 4, after iteration No. 300 there are no significant changes in the network results for any swarm size. Hence, an $I_{\text{Max}}$ of 300 was chosen for the modeling process of this study.

Fig. 4 Results of the PSO-ANN network for determining $I_{\text{Max}}$

4.2.3 Network architecture

In this last step of the PSO-ANN design, five PSO-ANN models were trained using the PSO parameters obtained in the previous steps, as in the ANN design section. The prediction performance of these models was again assessed using the RMSE and $R^2$ results, as presented in Table 5. Considering both RMSE and $R^2$, run No. 3 gives the best PSO-ANN model for MSS prediction. More details about the evaluation of the developed PSO-ANN model are discussed in the following section.

Table 5 $R^2$ and RMSE values of the constructed PSO-ANN models

5 Results and discussion

In this study, two non-linear AI models, ANN and PSO-ANN, were developed to predict the MSS caused by tunneling. To evaluate the accuracy of these models, the results for the training (114 datasets) and testing (29 datasets) datasets, corresponding to 80 % and 20 % of all datasets, were compared with the measured MSS values. Three well-known performance indices, namely RMSE, $R^2$ and variance accounted for (VAF), were computed to check the performance of the predictive models:

$${\text{RMSE}} = \sqrt {\frac{1}{n} \times \mathop \sum \limits_{i = 1}^{n} \left[ {\left( {x_{i} - x_{p} } \right)^{2} } \right]}$$
(6)
$${\text{VAF}} = \left[ {1 - \frac{{{\text{var}}\left( {x_{i} - x_{p} } \right)}}{{{\text{var}}\left( {x_{i} } \right)}}} \right] \times 100$$
(7)
$$R^{2} = \frac{{\left[ {\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - x_{\text{mean}} } \right)^{2} } \right] - \left[ {\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - x_{p} } \right)^{2} } \right]}}{{\left[ {\mathop \sum \nolimits_{i = 1}^{n} \left( {x_{i} - x_{\text{mean}} } \right)^{2} } \right]}}$$
(8)

where $x_i$ is the measured value, $x_p$ is the predicted value, $x_{\text{mean}}$ is the mean of the measured values, ‘var’ denotes the variance and $n$ is the number of datasets. A model is excellent if RMSE is zero, VAF is 100 % and $R^2$ is one. The performance indices of the developed models are presented in Table 6. Based on Table 6, the lowest RMSE and the highest VAF and $R^2$ values are obtained from the PSO-ANN model. For instance, RMSE values of 0.04 and 0.05 for the training and testing datasets, respectively, show that the PSO-ANN model can predict MSS with a high level of accuracy. Furthermore, the relationships between the MSS values predicted by the selected ANN and PSO-ANN models and the measured MSS values, for training and testing datasets, are displayed in Figs. 5 and 6, respectively. The developed ANN model yields $R^2$ values of 0.939 and 0.940 for the training and testing datasets, respectively, whereas values of 0.973 and 0.968 are achieved by the selected PSO-ANN model. This indicates the superiority of the PSO-ANN model over the ANN model. Note that this comparison was performed using normalized datasets for both measured and predicted values.
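For reference, Eqs. (6)–(8) can be computed with a few lines of Python. This is a minimal sketch with hypothetical function names. The population variance is used in VAF, which is an assumption; because VAF depends only on the ratio of two variances over the same number of points, the sample variance gives the same result.

```python
import math

def rmse(measured, predicted):
    """Root mean square error, Eq. (6)."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def variance(values):
    """Population variance of a sequence."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def vaf(measured, predicted):
    """Variance accounted for in percent, Eq. (7)."""
    residuals = [m - p for m, p in zip(measured, predicted)]
    return (1.0 - variance(residuals) / variance(measured)) * 100.0

def r_squared(measured, predicted):
    """Coefficient of determination, Eq. (8)."""
    mean = sum(measured) / len(measured)
    ss_tot = sum((m - mean) ** 2 for m in measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return (ss_tot - ss_res) / ss_tot
```

A constant offset between measured and predicted values illustrates why the three indices are reported together: such predictions still give a VAF of 100 %, because the residuals have zero variance, while RMSE and $R^2$ both register the error.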

Table 6 Performance prediction of the developed ANN and PSO-ANN models
Fig. 5 $R^2$ values of the selected ANN model for training and testing datasets

Fig. 6 $R^2$ values of the selected PSO-ANN model for training and testing datasets

6 Sensitivity analysis

To determine the relative influence of each input parameter on the output parameter, a sensitivity analysis was performed using the cosine amplitude method [60]. This method is formulated as follows:

$$R_{ij} = \frac{{\mathop \sum \nolimits_{k = 1}^{n} (x_{ik} \times x_{jk} )}}{{\sqrt {\mathop \sum \nolimits_{k = 1}^{n} x_{ik}^{2} \mathop \sum \nolimits_{k = 1}^{n} x_{jk}^{2} } }}$$
(9)

where $x_i$ and $x_j$ represent the input and output parameters, respectively, and $n$ is the number of datasets. $R_{ij}$ lies in the range [0, 1]; the closer it is to 1, the stronger the influence of the input parameter on the output. In the present paper, the horizontal to vertical stress ratio, cohesion and Young’s modulus were selected as input parameters, while the output parameter is MSS. The strengths of the relations between the input and output parameters are given in Table 7. As can be seen from Table 7, the horizontal to vertical stress ratio is the most influential parameter on MSS in this research.
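Equation (9) is straightforward to compute. The Python sketch below uses a hypothetical function name and assumes the two sequences hold one input parameter and the output over the same datasets.

```python
import math

def cosine_amplitude(x_i, x_j):
    """Strength of relation R_ij between two data series, Eq. (9)."""
    numerator = sum(a * b for a, b in zip(x_i, x_j))
    denominator = math.sqrt(sum(a * a for a in x_i) * sum(b * b for b in x_j))
    return numerator / denominator
```

For non-negative data, such as the normalized values used in this study, the result lies between 0 and 1. It equals 1 when the two series are exactly proportional and 0 when their element-wise products are all zero.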

Table 7 Strengths of relation between input and output parameters

7 Conclusion

Accurate prediction of MSS is essential, especially in urban areas. For this purpose, a new application of the PSO-ANN model was proposed for predicting the MSS caused by tunneling along line 2 of the Karaj subway. To assess the performance of the PSO-ANN model, it was compared with a pre-developed ANN model. In total, 143 datasets were prepared and divided into 114 training and 29 testing datasets. The horizontal to vertical stress ratio, cohesion and Young’s modulus were taken as input parameters, while MSS was the output parameter. To evaluate the reliability and accuracy of the developed models, three performance indices, namely RMSE, VAF and $R^2$, were applied. The results revealed that the PSO-ANN model performs better than the ANN model in predicting MSS. $R^2$ values of 0.9725 and 0.968 for the training and testing datasets, respectively, indicate the high conformity of the PSO-ANN model, whereas the corresponding values for the ANN model were 0.939 and 0.94. Moreover, a sensitivity analysis of the input and output parameters showed that, for the datasets considered in this case study, the horizontal to vertical stress ratio has the strongest effect on MSS.