1 Introduction

Over the last few years, the world has witnessed enormous growth in urban areas. As space in urban environments has become more limited, subsurface structures such as tunnels and underground metro stations have become the method of choice for relieving congestion at the ground surface. Although underground structures have been effective in this respect, tunneling in urban environments still poses several problems and challenges.

Estimating the environmental impacts of tunnel construction is one of the most important stages of tunnel design in urban areas. Although tunnels in urban areas bring various long-term benefits, their construction may also raise important environmental issues. For instance, surface settlement caused by tunnel construction may cause considerable damage to adjacent buildings. Estimating tunneling-induced ground movements is therefore a major concern in the design of underground works. In general, five types of methods have been employed to predict these movements: empirical methods based on case history data, analytical methods based on closed-form solutions and elastic theory, experimental methods based on laboratory tests and centrifuge modeling, and, finally, numerical and artificial intelligence methods.

Peck [1] introduced an empirical relationship based on observations of transverse settlement troughs in several tunneling projects. He showed that the transverse settlement trough over a single tunnel can be described by a normal probability (Gaussian) curve, as shown in Fig. 1a. Peck [1] proposed the following equation to estimate transverse settlement troughs:

Fig. 1 (a) Transverse and (b) longitudinal settlement troughs

$${S_{\text{v}}}={S_{{\text{v}},{\text{max}}}}~{\text{exp}}\left( { - {y^2}/2{i^2}} \right),$$
(1)

where \({S}_{\text{v}}\) is the vertical surface settlement at a horizontal distance \(y\) from the tunnel’s center line, \({S}_{\text{v}, \text{max}}\) is the maximum surface settlement, which usually occurs above the tunnel’s center line, and \(i\) is the horizontal distance from the tunnel’s center line to the inflection points of the settlement trough. The slope of the transverse settlement trough is greatest at the inflection points, and \(i\) corresponds to the standard deviation of the normal probability curve.
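A minimal Python sketch of Eq. (1) is given below. It also computes the area under the Gaussian trough, \(\sqrt{2\pi}\,i\,S_{\text{v,max}}\), which is the standard link between the transverse trough and the ground-loss volume \(V_{\text{s}}\) that appears in Eq. (2). The function names and numerical values (S_v,max = 20 mm, i = 8 m) are hypothetical and serve only as an illustration.

```python
import math

def transverse_settlement(y, s_max, i):
    """Peck's Gaussian transverse settlement trough, Eq. (1).

    y:     horizontal distance from the tunnel center line
    s_max: maximum settlement above the center line
    i:     distance from the center line to the inflection point
    """
    return s_max * math.exp(-y ** 2 / (2.0 * i ** 2))

def trough_volume(s_max, i):
    """Area under the trough per unit tunnel length: sqrt(2*pi) * i * s_max.

    Use consistent units (e.g. s_max and i in metres give m^3 per metre).
    """
    return math.sqrt(2.0 * math.pi) * i * s_max

# Hypothetical example: s_max = 20 mm, i = 8 m
for y in (0.0, 8.0, 16.0, 24.0):
    print(f"y = {y:4.1f} m -> S_v = {transverse_settlement(y, 20.0, 8.0):.2f} mm")
```

At \(y = i\) the settlement has dropped to \(e^{-1/2} \approx 0.61\) of its maximum, which is where the trough is steepest.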

Based on the normal probability function, Attewell and Woodman [2] investigated several case histories of tunnel construction and developed an empirical solution for the longitudinal settlement of the ground surface. They showed that a cumulative probability curve gives a reasonable estimate of the longitudinal settlement trough, as shown in Fig. 1b. The settlement at any point above the tunnel axis along the longitudinal direction can be obtained as follows:

$$S=\frac{{{V_{\text{s}}}}}{{\sqrt {2\pi } i}}\left\{ {G\left[ {\frac{{x - {x_{\text{i}}}}}{i}} \right] - G\left[ {\frac{{x - {x_{\text{f}}}}}{i}} \right]} \right\},$$
(2)

where \({V}_{\text{s}}\) is the volume of ground loss per unit length of tunnel, \(x\) is the longitudinal position of the considered surface point, \({x}_{\text{i}}\) is the initial position of the tunnel, \({x}_{\text{f}}\) is the location of the tunnel face, and \(G\) is the cumulative normal probability function, i.e., the integral of the normal probability curve. At the tunnel face (\(x={x}_{\text{f}}\)), \(G\left[(x-{x}_{\text{f}})/i\right]\) equals 0.5, and as \((x-{x}_{\text{i}})\) approaches infinity, \(G\left[(x-{x}_{\text{i}})/i\right]\) approaches one.
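Eq. (2) can be sketched in Python as follows, with \(G\) evaluated through the error function, \(G(z) = [1 + \mathrm{erf}(z/\sqrt{2})]/2\). The names and input values are hypothetical; the sketch only reproduces the limiting behavior described above: half of the final settlement at the tunnel face, and the full value \(V_{\text{s}}/(\sqrt{2\pi}\,i)\) far behind it.

```python
import math

def G(z):
    """Cumulative standard normal distribution (integral of the normal curve)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def longitudinal_settlement(x, x_i, x_f, v_s, i):
    """Attewell and Woodman settlement above the tunnel axis, Eq. (2).

    x:   longitudinal position of the surface point
    x_i: initial position of the tunnel
    x_f: current position of the tunnel face
    v_s: volume of ground loss per unit length of tunnel
    i:   inflection-point distance of the trough
    """
    s_max = v_s / (math.sqrt(2.0 * math.pi) * i)   # settlement far behind the face
    return s_max * (G((x - x_i) / i) - G((x - x_f) / i))

# Hypothetical values: V_s = 0.4 m^3 per metre of tunnel, i = 8 m, face at 100 m
for x in (50.0, 90.0, 100.0, 110.0, 130.0):
    s = longitudinal_settlement(x, x_i=0.0, x_f=100.0, v_s=0.4, i=8.0)
    print(f"x = {x:5.1f} m -> S = {1000 * s:5.2f} mm")
```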

Analytical methods have been developed in different studies using the governing equations of elasticity [3,4,5,6]. However, these methods apply only to circular tunnels and are therefore inappropriate for non-circular tunnels, even under uniform geological conditions.

Over the last decades, numerical methods have been developed in response to increasing computing power and their ability to analyze complex geometries. Extensive research has been conducted to estimate tunneling-induced ground movements using numerical analyses [7,8,9]. However, the time and expense required for a full three-dimensional analysis with advanced non-linear soil constitutive models are considerable. In addition, the accuracy of the calculations depends strongly on the selection of an appropriate mesh type and size.

Various laboratory model tests have been conducted to investigate the ground movements and collapse mechanisms induced by tunneling in different types of soil [10,11,12,13]. These tests are carried out either at normal gravity (1 g) or at elevated gravity in a centrifuge to investigate the factors most relevant to ground-tunnel behavior. The tunneling procedure is modeled either by placing soil around a pre-installed tube representing the tunnel and controlling the support pressure, or by pre-cutting the tunnel opening and installing a lining system [14]. In physical modeling, tunnel-ground responses are investigated by a variety of techniques, including trap doors, rigid tubes, pressurized air bags, polystyrene foam and organic solvents.

In recent years, several studies have employed artificial neural networks (ANNs) to predict tunneling-induced ground movements [15,16,17,18]. Because the relationships between the input and output data are complex, a training algorithm is needed to fit the network to the data. The backpropagation (BP) algorithm appears to be the most popular training algorithm in civil engineering disciplines because of its simplicity. However, the tendency of the BP algorithm to become trapped in local minima has been well documented [19,20,21]. This conflicts with the aim of ANN simulation, which is to find the global minimum of the error function. Furthermore, the BP algorithm learns slowly and consequently cannot readily be applied to large problems. To overcome these limitations, ANN learning performance has been improved by integrating ANNs with other computing paradigms such as genetic algorithms, fuzzy logic and ant colony algorithms.

This paper aims to develop and apply a hybrid method combining particle swarm optimization (PSO) with ANNs (PSO-ANN) to predict tunneling-induced three-dimensional ground movements. For this purpose, an optimized PSO-ANN model was generated from data obtained from the Karaj Urban Railway (KUR) Project in Iran. The maximum surface settlement and the inflection points of the transverse and longitudinal settlement troughs were then predicted by the proposed model. These predicted values were combined with the classical empirical relationships proposed by Peck [1] and Attewell and Woodman [2] to obtain the three-dimensional surface settlements. This method can reduce the limitations of ANNs and make them more suitable and accurate for predicting tunneling-induced ground movements.
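As an illustration of how predicted trough parameters can yield a three-dimensional settlement field, the sketch below multiplies the transverse Gaussian shape of Eq. (1) by the longitudinal cumulative term of Eq. (2). This product form, the use of separate transverse and longitudinal inflection distances (`i_y`, `i_x`), and all names and values are assumptions made for illustration; they are not necessarily the exact formulation used in this paper.

```python
import math

def G(z):
    # Cumulative standard normal distribution
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def settlement_3d(x, y, s_max, i_y, i_x, x_i, x_f):
    """Surface settlement at (x, y); x runs along the tunnel axis, y across it.

    s_max: maximum settlement far behind the face (e.g. a model prediction)
    i_y:   transverse inflection distance; i_x: longitudinal inflection distance
    x_i:   start of the tunnel; x_f: current position of the face
    """
    transverse = math.exp(-y ** 2 / (2.0 * i_y ** 2))        # shape of Eq. (1)
    longitudinal = G((x - x_i) / i_x) - G((x - x_f) / i_x)    # bracket of Eq. (2)
    return s_max * transverse * longitudinal

# Hypothetical example: 20 mm maximum settlement, face 100 m from the start
for x in (50.0, 100.0, 120.0):
    row = [settlement_3d(x, y, 20.0, 8.0, 10.0, 0.0, 100.0) for y in (0.0, 8.0, 16.0)]
    print(f"x = {x:5.1f} m:", ", ".join(f"{s:5.2f}" for s in row), "mm")
```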

2 A hybrid PSO-ANN model

Researchers in different scientific disciplines have attempted to improve the learning performance and generalization ability of ANNs by integrating them with PSO algorithms [22,23,24,25]. Since PSO is a robust global search algorithm, it can be used to adjust the weights and biases of an ANN to improve its performance [26]. Compared with PSO, conventional ANN training is more likely to become trapped in a local minimum, whereas PSO is capable of finding the global minimum. A hybrid PSO-ANN model therefore takes advantage of both methods: PSO searches for the global minimum in the search space, and the ANN uses the PSO output to find the best results.

2.1 Artificial neural networks

An artificial neural network (ANN) is an information-processing paradigm, inspired by biological nervous systems, that approximates unknown functions from observed data. ANNs are particularly suitable for problems too complex to be modeled and solved by classical mathematical and traditional procedures. They learn by example and approximate the relationships among the data without the prior knowledge that most empirical and statistical methods require.

McCulloch and Pitts [27] proposed the threshold logic unit as the earliest artificial neuron. Building on their work, Rosenblatt [28] developed the first ANN, called the perceptron, which is the simplest kind of feedforward neural network. The perceptron forms a linear combination of its inputs and produces an output of +1 or −1. No algorithm could adapt a multilayer neural network to minimize its error until 1985, when the BP algorithm was discovered. To calculate the weight changes, BP propagates the error at the output layer back to the hidden layer. ANNs have since been developed further in several studies [29,30,31]. BP, a supervised learning method, is the most common method of training ANNs in prediction modeling [31]. It attempts to reduce the error between the actual and calculated outputs across the layers of the network, and learning occurs by changing the connection weights according to the magnitude of this error.
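To make the perceptron description concrete, the sketch below implements a single threshold unit with ±1 outputs and trains it with Rosenblatt's perceptron learning rule, in which the weights change only when the output is wrong. This is the classical single-layer rule, not BP; the logical AND data set, learning rate and epoch limit are illustrative choices.

```python
def sign(z):
    """Threshold activation of the perceptron: +1 or -1."""
    return 1 if z >= 0 else -1

def perceptron_output(w, b, x):
    # Linear combination of the inputs followed by the threshold unit
    return sign(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_perceptron(samples, lr=1.0, max_epochs=100):
    """Rosenblatt's rule: adjust weights in proportion to the output error."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in samples:
            err = target - perceptron_output(w, b, x)   # 0, +2 or -2
            if err != 0:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
                errors += 1
        if errors == 0:          # every sample is classified correctly
            break
    return w, b

# Logical AND with inputs and outputs scaled to +1 / -1 (linearly separable)
AND = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([perceptron_output(w, b, x) for x, _ in AND])  # → [-1, -1, -1, 1]
```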

An ANN is formed by nodes (neurons) arranged in arrays of parallel, interconnected processing units. A multilayer perceptron (MLP) is a feedforward ANN that maps a set of input data onto a set of appropriate outputs. An MLP network consists of several layers of nodes, commonly referred to as the input layer, the hidden layer(s) and the output layer. The main processing is done by the neurons in the hidden layer(s), whereas the input and output neurons mainly distribute and collect the signals.
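The layered mapping of an MLP can be sketched as a forward pass. In this hypothetical example, all weights and biases are stored in one flat list, a layout that allows a single PSO particle to represent a complete network. The tanh hidden activation, the linear output neurons and the parameter ordering are assumptions, not details taken from this paper.

```python
import math

def n_params(n_in, n_hidden, n_out):
    """Number of weights and biases in an n_in-n_hidden-n_out MLP."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

def mlp_forward(params, x, n_in, n_hidden, n_out):
    """Forward pass of a one-hidden-layer MLP stored as a flat parameter list.

    Assumed layout: for each hidden node, n_in weights then its bias;
    then, for each output node, n_hidden weights then its bias.
    """
    if len(params) != n_params(n_in, n_hidden, n_out):
        raise ValueError("parameter vector has the wrong length")
    k, hidden = 0, []
    for _ in range(n_hidden):                  # hidden layer: weighted sum + tanh
        z = sum(params[k + j] * x[j] for j in range(n_in)) + params[k + n_in]
        hidden.append(math.tanh(z))
        k += n_in + 1
    outputs = []
    for _ in range(n_out):                     # output layer: linear weighted sum
        z = sum(params[k + j] * hidden[j] for j in range(n_hidden)) + params[k + n_hidden]
        outputs.append(z)
        k += n_hidden + 1
    return outputs

# 8 inputs and 8 hidden nodes as in the initial model of Section 3.3; 1 output is assumed
params = [0.1] * n_params(8, 8, 1)
print(n_params(8, 8, 1), mlp_forward(params, [1.0] * 8, 8, 8, 1))
```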

2.2 Particle swarm optimization

PSO was proposed by Kennedy and Eberhart [32] and further developed by Shi and Eberhart [33]. The method was originally inspired by the social behavior of bird flocks and fish schools. In a PSO system, particles are scattered in a multidimensional search space whose points are potential solutions of an objective function. The movement of the particles is governed by the following two criteria:

1. Each particle considers its own best result as its individual experience.
2. All particles share the overall experience of the swarm, that is, the best result in the whole system.

In other words, each particle determines its movement through the search space by combining the history of its own fitness values with the experience of neighboring particles. Particles therefore adjust their positions using the best positions encountered by themselves and by their neighbors [34].

At the start of the optimization, a given number of particles is scattered randomly throughout the multidimensional search space, which determines their initial positions. Each particle represents a feasible solution, and an objective function defines the goal of the swarm. The fitness of each particle is the value of the objective function at its position. Once the fitness values of all particles have been determined, the velocity of each particle is calculated from the two criteria above using the following velocity function:

$$\overrightarrow {{v_{new}}} =\vec {v}+{r_1}{C_1} \times \left( {\overrightarrow {{p_{best}}} - \vec {p}} \right)+{r_2}{C_2} \times \left( {\overrightarrow {{g_{best}}} - \vec {p}} \right),$$
(3)

where \(\overrightarrow{{v}_{new}}\), \(\overrightarrow{v}\) and \(\overrightarrow{p}\) are the new velocity, current velocity and current position of a particle, respectively; \({C}_{1}\) and \({C}_{2}\) are predefined velocity coefficients (acceleration coefficients); \(\overrightarrow{{p}_{best}}\) is the personal best position of the particle; and \(\overrightarrow{{g}_{best}}\) is the global best position among all particles. \({r}_{1}\) and \({r}_{2}\) are random numbers drawn from a uniform distribution on (0, 1). Following Kennedy and Eberhart [32], \({r}_{1}\) and \({r}_{2}\) are included in the update scheme to prevent the particles from settling into a single, unchanging direction. The new position of each particle (\(\overrightarrow{{p}_{new}}\)) is then obtained from its new velocity:

$$\overrightarrow {{p_{new}}} =\vec {p}+\overrightarrow {{v_{new}}} .$$
(4)

An inertia weight \(\left(w\right)\) was introduced into the PSO velocity function by Poli et al. [35] to provide more precise control over the velocity of the particles. The inertia weight controls the momentum of the swarm and prevents a phenomenon known as swarm explosion by determining how much of each particle’s previous velocity contributes to its new velocity [36]. The updated velocity function is:

$$\overrightarrow {{v_{new}}} =w\; \cdot \;\vec {v}+{r_1}{C_1} \times \left( {\overrightarrow {{p_{best}}} - \vec {p}} \right)+{r_2}{C_2} \times \left( {\overrightarrow {{g_{best}}} - \vec {p}} \right),$$
(5)

where \(w\) is the inertia weight. Figure 2 shows the standard flowchart of a PSO algorithm. This iterative process continues until the swarm meets the termination criterion or criteria. Researchers have used different criteria, the most common being reaching a given number of iterations and reaching a predetermined accuracy level.

Fig. 2 Standard flowchart of PSO
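The flowchart and Eqs. (3) to (5) can be condensed into a short global-best PSO, sketched below in Python for minimizing an arbitrary objective function. The default coefficients (w = 0.7298, C1 = C2 = 1.49618) are widely used values from the literature and are an assumption here, not the values tuned in this paper; setting w = 1 recovers Eq. (3). The only termination criterion is a maximum number of iterations, and the global best is updated as soon as any particle improves on it.

```python
import random

def pso(objective, dim, n_particles=30, n_iter=200, w=0.7298, c1=1.49618,
        c2=1.49618, bounds=(-5.0, 5.0), seed=0):
    """Minimize `objective` over `dim` variables with global-best PSO.

    Velocity update: Eq. (5); position update: Eq. (4). With w = 1 the
    velocity update reduces to Eq. (3). Termination: fixed number of iterations.
    Returns (gbest position, gbest value, gbest value after each iteration).
    """
    rng = random.Random(seed)
    lo, hi = bounds
    # Particles scattered randomly over the search space; zero initial velocity
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # personal best positions
    pbest_f = [objective(p) for p in pos]        # fitness of each particle
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]     # global best of the swarm
    history = []
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()   # uniform on [0, 1)
                vel[k][d] = (w * vel[k][d]
                             + r1 * c1 * (pbest[k][d] - pos[k][d])
                             + r2 * c2 * (gbest[d] - pos[k][d]))   # Eq. (5)
                pos[k][d] += vel[k][d]                              # Eq. (4)
            f = objective(pos[k])
            if f < pbest_f[k]:                   # update personal best
                pbest[k], pbest_f[k] = pos[k][:], f
                if f < gbest_f:                  # update global best immediately
                    gbest, gbest_f = pos[k][:], f
        history.append(gbest_f)
    return gbest, gbest_f, history


def sphere(p):
    """Simple test function with its global minimum 0 at the origin."""
    return sum(v * v for v in p)

best, best_f, hist = pso(sphere, dim=3)
print(f"best value after {len(hist)} iterations: {best_f:.3e}")
```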

2.3 The hybrid PSO-ANN model

The main objective of ANN training is to adjust a set of weights and biases so as to minimize an objective function, usually the root mean square error (RMSE). PSO and ANNs employ different approaches to minimize this function. As discussed earlier, conventional ANN training is more likely to converge to a local minimum, whereas PSO is more capable of finding the global minimum and continues searching around it. The best results are therefore expected from a hybrid PSO-ANN model that combines the benefits of both: PSO looks for the global minimum in the search space, and the ANN completes the job by using it to find the best results.

An appropriate objective function is required to train ANNs with PSO algorithms. Since the main target of an ANN is the minimum difference between predicted and actual values, RMSE can be used as the objective function. In this case, each particle represents a candidate solution for minimizing RMSE, and each component of a particle represents one ANN weight or bias. Figure 3 shows how a group of random particles is initialized before the learning process of the hybrid PSO-ANN model starts. The ANN weights and biases are assigned randomly as the initial positions of the particles. Training then starts by calculating, for each particle, the error between the network outputs obtained with these initial weights and biases and the actual values. The hybrid model continuously reduces this error by improving the positions of the particles during the iterations. The updated positions are calculated by the velocity function, which includes \(\overrightarrow{{p}_{best}}\) and \(\overrightarrow{{g}_{best}}\), the positions that have produced the smallest error so far for each particle and for the whole swarm, respectively. A position adjustment toward the best solutions is thus produced, and a new error is obtained from the updated positions. Error minimization continues until the termination criteria of the model are met. The optimization output is then given by the final \(\overrightarrow{{g}_{best}}\) value.
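The training loop described above can be sketched by combining a small MLP with the PSO update of Eq. (5). Each particle is a flat list of all weights and biases, and its fitness is the RMSE between the network outputs and the actual values; no BP is involved. This is a minimal illustration under stated assumptions (one hidden layer with tanh activation, a single linear output, the coefficient values shown and a hypothetical toy data set); it is not the authors’ Matlab code.

```python
import math
import random

def mlp_predict(params, x, n_in, n_hid):
    """One-hidden-layer MLP: tanh hidden nodes, one linear output (assumed)."""
    k, hidden = 0, []
    for _ in range(n_hid):
        z = sum(params[k + j] * x[j] for j in range(n_in)) + params[k + n_in]
        hidden.append(math.tanh(z))
        k += n_in + 1
    return sum(params[k + j] * hidden[j] for j in range(n_hid)) + params[k + n_hid]

def rmse(params, X, Y, n_in, n_hid):
    """Objective function: RMSE between network outputs and actual values."""
    sq = [(mlp_predict(params, x, n_in, n_hid) - y) ** 2 for x, y in zip(X, Y)]
    return math.sqrt(sum(sq) / len(sq))

def train_pso_ann(X, Y, n_hid=4, n_particles=20, n_iter=100,
                  w=0.7298, c1=1.49618, c2=1.49618, seed=1):
    """Train the MLP with PSO only; returns gbest weights and RMSE history."""
    n_in = len(X[0])
    dim = n_in * n_hid + n_hid + n_hid + 1       # one component per weight or bias

    def fitness(p):
        return rmse(p, X, Y, n_in, n_hid)

    rng = random.Random(seed)
    # Random initial weights and biases = initial particle positions
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest, pbest_f = [p[:] for p in pos], [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]
    history = [gbest_f]                          # RMSE of gbest before iterating
    for _ in range(n_iter):                      # termination: maximum iterations
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d] + r1 * c1 * (pbest[k][d] - pos[k][d])
                             + r2 * c2 * (gbest[d] - pos[k][d]))     # Eq. (5)
                pos[k][d] += vel[k][d]                                # Eq. (4)
            f = fitness(pos[k])
            if f < pbest_f[k]:
                pbest[k], pbest_f[k] = pos[k][:], f
                if f < gbest_f:
                    gbest, gbest_f = pos[k][:], f
        history.append(gbest_f)
    return gbest, history

# Hypothetical toy data (not KUR data): y = 0.5*x1 - 0.3*x2 on a 5 x 5 grid
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
X = [(a, b) for a in grid for b in grid]
Y = [0.5 * a - 0.3 * b for a, b in X]
weights, history = train_pso_ann(X, Y)
print(f"RMSE: initial gbest {history[0]:.4f}, after training {history[-1]:.4f}")
```

Because the global best is only replaced by a strictly better particle, the recorded RMSE history never increases, which is the behavior monitored in the convergence analyses of Section 3.3.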

Fig. 3 Development process of a hybrid PSO-based ANN model

3 Model implementation for surface settlement prediction

3.1 The database

A large amount of reliable data is required to develop a comprehensive hybrid PSO-ANN network. For that reason, Phase I of Line No. 2 of the Karaj Urban Railway (KUR) Project in Iran was chosen to develop the model. Extensive geotechnical studies were performed before and during tunnel construction, and ground movements were frequently monitored during construction.

The 27-km-long Line No. 2 of KUR is planned to connect the north-west of Karaj city to its south, running from Kamal-Shahr to Malard railway stations. The metro line consists of a single tunnel (with a double track) and 23 underground stations, constructed by the New Austrian Tunneling Method (NATM). Figure 4 shows Line No. 2 of KUR, whose construction is divided into two phases: phase I, constructed between Kamal-Shahr and Station Q, and phase II, under construction between Station Q and Malard.

Fig. 4 Schematic view of KUR Line No. 2 and location of subway stations

The tunnel in the KUR project is excavated with the NATM technique. It is classified as a shallow tunnel, with a depth varying between 6.5 and 23.5 m below the ground surface. A two-section excavation was designed to improve the self-stability of the ground during the interval between excavation and completion of the lining installation: the upper part (top heading) is excavated in one step, while the lower part (bench) is excavated in two steps. The tunnel has a horseshoe shape, with a height of 9 m and a width of 9.6 m after excavation, and a height of 7.8 m and a width of 8.4 m inside the lining. The primary support system consists of a 0.30-m-thick shotcrete lining with wire mesh and lattice girders. The lattice girders are curved steel frames installed every 120 cm along the tunnel axis on the inner side. A 30-cm-thick reinforced concrete layer forms the final tunnel lining. The tunnel dimensions and excavation sequence are shown in Fig. 5.

Fig. 5 Tunnel dimensions and construction sequence

3.1.1 Geotechnical studies

Comprehensive geotechnical studies, including various in situ and laboratory tests, were performed before the start of the KUR Project. The ground conditions were determined from borehole samples and geophysical techniques. Visual inspection of the soil during excavation identified, from top to bottom, inorganic clay with clayey sand, clayey sand, and occasionally silty sand, overlying clayey and silty gravel.

Boreholes were drilled at 100-m intervals along the tunnel alignment to provide information about the thickness of the soil layers, the physical and mechanical characteristics of the soil, and the groundwater level. In total, 136 boreholes were drilled in phase I of Line No. 2 of KUR, to a depth of 25 m along the tunnel route and 30 m at the stations. No water table was encountered in the soil surrounding the tunnel and stations. In addition, 38 hand-dug wells were excavated for visual inspection of the soil layers.

3.1.2 Surface settlement measurement

Steel rod settlement markers were installed along the tunnel alignment as part of an extensive monitoring program to measure surface settlement frequently. The markers were grouted to a depth of 100 cm to isolate them from external surface movements such as movements of the asphalt. Monitoring was conducted with conventional surveying techniques. Because tunneling in this project was mostly conducted below streets, arrays of settlement markers were placed along the streets to determine the properties of the settlement trough. At some of the stations, extensometers were also installed to measure subsurface settlement. To obtain a reliable source of ground movement data, transverse arrays of settlement markers were placed in three configurations along the tunnel axis and at the stations, as shown in Fig. 6.

Fig. 6 Schematic plan of settlement markers’ location in the KUR Project

1. The first group of arrays, consisting of five markers spaced 7.5 m apart, installed every 25 m above the stations’ axis;
2. The second group of arrays, consisting of five markers, installed 50 m away from the stations’ portals;
3. The third group, consisting of three settlement markers, installed every 100 m along the tunnel alignment.

3.2 Input and output parameters

In general, tunneling-induced ground movements are caused by three factors: immediate settlement due to tunnel excavation, deformation of the tunnel lining, and consolidation. Immediate settlement, the major factor, is a function of the tunnel depth and diameter, the geological and geotechnical conditions, and the construction procedure. Deformation of the tunnel lining plays an insignificant role in surface settlement and is usually neglected. Long-term settlement due to primary and secondary consolidation occurs in saturated soils below the groundwater table. To predict tunneling-induced ground movements with ANNs, all parameters that influence surface settlement must be identified, because ANNs rely solely on the given data and have no prior knowledge of the settlement mechanism. The accuracy of ANN predictions of tunneling-induced ground movements therefore depends strongly on the relevance and accuracy of the selected parameters. Excluding inappropriate inputs may also improve the prediction results and reduce the training time.

In general, the parameters affecting ground movements can be divided into two broad groups: (1) geological and geotechnical conditions, and (2) tunnel geometry and tunneling method. Accordingly, several influential geotechnical parameters, namely SPT N-value, soil cohesion, friction angle, soil unit weight, elastic modulus and Poisson’s ratio, were selected as input data to predict tunneling-induced ground movements. Soil cohesion is a suitable parameter for evaluating ground movements only in fine-grained soils. Because the KUR Project includes a wide range of fine-grained and coarse-grained soils, both the SPT N-value and soil cohesion were used as input data. Groundwater level, a critical parameter in all excavation methods, can strongly affect ground movements. As mentioned previously, however, the hand-dug wells and boreholes showed no water table in the soil surrounding the tunnel and stations, so this parameter was not included in the simulations.

The excavation’s cross-sectional area and the tunnel depth are influential geometric parameters for surface settlement. A larger diameter is expected to cause a greater volume of ground to move toward the tunnel face. However, the excavation cross-sectional area is constant in the KUR Project, so the effect of the tunnel diameter on surface settlement can be neglected. Tunnel depth was therefore used as the geometric input parameter. Among the operational parameters, the advancement rate appears to influence surface settlement, and it was recorded during tunnel excavation. Observations in the KUR Project suggest that a low advancement rate or a stoppage of excavation can cause considerable surface settlement; the tunnel advancement rate was therefore also used as an input parameter. Table 1 shows the range of values of the parameters used to predict transverse and longitudinal surface settlements. A total of 123 input datasets, each including eight input parameters, were obtained along a 14.5-km stretch of phase I of Line No. 2 of the KUR Project.

Table 1 Input and output parameters in surface settlement prediction

To simulate surface settlements in the transverse and longitudinal directions, the parameters describing the settlement trough in both directions must be considered. The transverse settlement trough is determined by the maximum surface settlement and the horizontal distance from the tunnel’s center line to the inflection point (Eq. 1). In the longitudinal direction, the overall settlement profile of a tunnel consisting of a number of elements can be obtained by superposing the settlements caused by its individual elements (Eq. 2). Therefore, the maximum surface settlement and the locations of the inflection points in the transverse and longitudinal directions were used as output data. Table 1 shows the ranges of the parameters used to train the models. A total of 123 output datasets were obtained from the KUR Project, each corresponding to one input dataset.

3.3 Network design

The computer code for the hybrid PSO-ANN model was developed in Matlab. Note that the hybrid PSO-ANN network involves no BP: the purpose of integrating PSO into ANN training is to minimize RMSE by determining an appropriate set of weights and biases. PSO performs best when its parameters are selected properly, so the PSO parameters were defined through a series of sensitivity analyses, which gave the optimum values of swarm size, number of iterations and particle velocity coefficients (C1 and C2). The network architecture, i.e., the number of hidden layers and the number of nodes in each hidden layer, was then defined by trial and error.

To perform these analyses, a network with one hidden layer of eight nodes was used as the initial model. This model was trained several times, varying the parameter under study while keeping the other parameters fixed, to determine the optimum PSO parameters. The optimum network architecture was then determined by trial and error.

3.3.1 Swarm size

The first sensitivity analysis was performed on the swarm size (number of particles). A small swarm often fails to converge to the global solution, whereas a large swarm can delay convergence and lengthen the training time. To date, there is no theoretical rule for the optimal swarm size, so it is common to conduct a series of sensitivity tests on swarm convergence to find the optimum number of particles for each problem.

In this study, sensitivity analyses were performed for each candidate swarm size with a fixed number of 1000 iterations and a value of 2 for both coefficients C1 and C2. The objective was to find the swarm size giving the minimum RMSE and the maximum coefficient of determination (R²). The results of the analyses are shown in Fig. 7.

Fig. 7 Relationship between swarm size and network performance

Figure 7 shows a substantial increase in network performance between swarm sizes of 10 and 25 and a moderate increase between swarm sizes of 25 and 125. There is no significant change in the values of R² and RMSE for swarms of more than 125 particles. On the other hand, Fig. 8 indicates that the training time (elapsed time recorded on an Intel Core i7-2600 3.40 GHz processor) increased sharply with swarm size. Therefore, a swarm size of 125 was selected as the optimum swarm population.

Fig. 8 Total times taken to train the network with different swarm sizes

3.3.2 Termination criteria

The iterative process ends when the termination criteria are met. One or two termination criteria are usually defined: the first, and easier, criterion is reaching the required precision, while the second, a limit on the number of iterations, is more difficult to define. This limit can be determined by comparing the accuracy and convergence obtained in a sensitivity analysis with different maximum iteration numbers.

Small iteration numbers are normally suitable for problems with simple search spaces, whereas larger numbers are needed to achieve convergence and the required accuracy in more complex search spaces. Here, the maximum number of iterations is used as the termination criterion. Since the training time increases with the maximum number of iterations, a series of sensitivity analyses was applied to a hybrid PSO-ANN model to find the appropriate number of iterations. These analyses used a fixed iteration number of 1000, a value of 2 for the velocity coefficients (C1 and C2), one hidden layer with eight nodes, and different swarm sizes. The analyses monitored the Gbest value among all particles at each iteration.

Figure 9 shows the results of the sensitivity analyses on the iteration number. According to the figure, the most significant changes occurred in the first iterations, followed by moderate changes up to iteration 200. After this iteration, there were no significant changes in the Gbest values. Therefore, the maximum number of iterations was set to 200, and the optimization process stopped at this number of iterations.
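The maximum iteration number above was chosen by inspecting the Gbest curves in Fig. 9. One simple way to automate such a check, assumed here rather than taken from this paper, is to find the first iteration after which the relative improvement of Gbest over a fixed window of iterations falls below a tolerance. The helper name, window and tolerance below are hypothetical.

```python
def plateau_iteration(history, rel_tol=1e-3, window=20):
    """Return the first iteration t at which Gbest improves by less than
    rel_tol (relative to history[t]) over the next `window` iterations.

    history: Gbest objective value (e.g. RMSE) recorded at each iteration,
             assumed non-increasing as in global-best PSO.
    Returns the last index if no plateau is found.
    """
    for t in range(len(history) - window):
        current, later = history[t], history[t + window]
        if current == 0.0 or (current - later) / abs(current) < rel_tol:
            return t
    return len(history) - 1

# Hypothetical Gbest history: fast early improvement, then a flat tail
gbest_history = [10.0, 5.0, 2.0, 1.0, 1.0, 1.0, 1.0, 1.0]
print(plateau_iteration(gbest_history, rel_tol=0.01, window=2))  # → 3
```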

Fig. 9 Convergence process in each iteration for different swarm sizes

3.3.3 Velocity coefficients

The same model was used in the next set of sensitivity analyses to find the optimum values of the velocity coefficients (C1 and C2). The original coefficients of Kennedy and Eberhart [32] and the modified values of Clerc and Kennedy [37] were used as initial bounds to form a series of candidate combinations, shown in Table 2. A constant swarm size of 125 and the previously defined maximum of 200 iterations were used in all analyses. The network architecture was also the same as in the previous analyses.

Table 2 Results of sensitivity analyses for velocity coefficients

Based on the relationship between the velocity coefficients, the results can be divided into two groups: the first includes unequal coefficients (models 1–7), and the second contains equal coefficients (models 8–12) (see Table 2). In general, the models with C1 < C2 yielded better results than the others. In the second group, the best results were obtained when both coefficients were equal to 2. Overall, Model 2 yielded the best performance among all models; consequently, values of 1.333 and 2.667 were selected for C1 and C2, respectively.

3.3.4 Network architecture

Having determined the PSO parameters, the next important step toward superior performance of the hybrid PSO-ANN model is designing an optimal network architecture. Because the number of input parameters governs the number of nodes in the input layer, the architecture is constrained by the inputs; similarly, the number of nodes in the output layer is set by the number of output variables. The remaining design choices are therefore the number of hidden layers and the number of nodes in each hidden layer. Since no established theoretical procedure exists for choosing these, the optimal architecture is usually determined by trial and error.
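
Once the architecture is fixed, a PSO particle can encode all weights and biases of the network. The sketch below, in Python (NumPy), shows one common encoding for a fully connected network with one hidden layer; the tanh hidden activation, the linear output layer and the weight ordering are assumptions and are not taken from the authors' MATLAB code. The function `rmse_fitness` is the kind of error objective PSO would minimize.

```python
import numpy as np

def n_weights(n_in, n_hidden, n_out):
    """Number of weights plus biases in a one-hidden-layer network."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

def decode(particle, n_in, n_hidden, n_out):
    """Split a flat particle vector into (W1, b1, W2, b2)."""
    particle = np.asarray(particle, float)
    if particle.size != n_weights(n_in, n_hidden, n_out):
        raise ValueError("particle length does not match the architecture")
    sizes = [n_in * n_hidden, n_hidden, n_hidden * n_out, n_out]
    parts = np.split(particle, np.cumsum(sizes)[:-1])
    W1 = parts[0].reshape(n_in, n_hidden)
    W2 = parts[2].reshape(n_hidden, n_out)
    return W1, parts[1], W2, parts[3]

def forward(particle, X, n_hidden, n_out):
    """Network output for inputs X (rows = samples)."""
    W1, b1, W2, b2 = decode(particle, X.shape[1], n_hidden, n_out)
    hidden = np.tanh(X @ W1 + b1)   # assumed tanh activation
    return hidden @ W2 + b2         # assumed linear output layer

def rmse_fitness(particle, X, Y, n_hidden):
    """Error objective for PSO: RMSE over all samples and outputs."""
    pred = forward(particle, X, n_hidden, Y.shape[1])
    return float(np.sqrt(np.mean((pred - Y) ** 2)))
```

For the 8–12–3 network selected later in this study, `n_weights(8, 12, 3)` gives 9 × 12 + 13 × 3 = 147, so under this encoding each particle would search a 147-dimensional space.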

Twelve hybrid models were developed by trial and error, using one or two hidden layers with 6, 9, 12, 15, 18 or 21 nodes in each hidden layer to find the optimum network parameters. All models were trained with PSO using the optimized parameters. The performance of each model was evaluated by K-fold cross-validation [38]. This technique divides the data into K parts, K−1 of which are used for training while the remaining part is used to test the model. Repeating this process K times, with a different part held out each time, ensures that every sample is used in both the training and testing phases.
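
The K-fold procedure described above can be sketched in Python (NumPy) as a generator of train/test index pairs. The random shuffling and the seed are assumptions; the text does not state how the data were assigned to folds.

```python
import numpy as np

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) for each of the k folds.

    Every sample appears in exactly one test fold and in the
    training sets of the other k - 1 folds.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)  # sizes differ by <= 1
    for j in range(k):
        train_idx = np.concatenate([folds[m] for m in range(k) if m != j])
        yield train_idx, folds[j]
```

With `n_samples = 123` and `k = 5`, `np.array_split` produces folds of 25, 25, 25, 24 and 24 samples, so each run trains on 98 or 99 samples and tests on 25 or 24.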

A total of 123 datasets were used in the analyses. Fivefold cross-validation was employed: in each run, a model was trained on four folds (98 datasets) and tested on the remaining fold (25 datasets). Each model was thus trained and tested five times with different combinations of training and testing data, and its performance was evaluated by the average R2 and RMSE over the testing folds. The architecture that yielded the best performance (high R2 and low RMSE) was selected as the optimum network architecture for settlement prediction.
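
The averaged scores can be computed as below (Python, NumPy). R2 is taken here as the coefficient of determination, 1 − SS_res/SS_tot; this definition is an assumption, since some studies instead report the squared correlation coefficient. Each fold is passed as a pair of measured and predicted values for one output variable.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def average_scores(fold_results):
    """Average R2 and RMSE over test folds.

    fold_results: list of (measured, predicted) pairs, one per test fold.
    """
    r2_ave = float(np.mean([r2(t, p) for t, p in fold_results]))
    rmse_ave = float(np.mean([rmse(t, p) for t, p in fold_results]))
    return r2_ave, rmse_ave
```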

Table 3 summarizes the results. Model 3, with one hidden layer of 12 nodes, performed best among all models and was therefore selected as the optimum network architecture for predicting tunneling-induced ground movements. The R2 values for the selected PSO-ANN model were 0.94 and 0.93 for the training and testing datasets, respectively. Figure 10 shows the agreement between the actual and predicted values of the maximum surface settlement and the inflection points in the transverse and longitudinal directions for the testing datasets. The close agreement between predicted and actual values indicates that the proposed approach is an applicable and accurate tool for predicting ground surface settlement.

Table 3 Performance of trained PSO-based ANN and BP-ANN models
Fig. 10 Concordance between actual and predicted values for testing datasets

4 Comparison with pre-developed BP-ANN model

A comparative study of the hybrid PSO-ANN and pre-developed BP-ANN models was conducted to demonstrate the advantages of the proposed hybrid model. As with the PSO-ANN models, the optimum architecture of the BP-ANN was determined by trial and error and evaluated by fivefold cross-validation. The results are provided in Table 3. Interestingly, the same network architecture (Model 3, with one hidden layer of 12 nodes) was obtained for both the BP-ANN and hybrid PSO-ANN models. The R2 values for the selected BP-ANN model were 0.99 and 0.88 for the training and testing datasets, respectively.

Comparing R2ave and RMSEave (the averages of R2 and RMSE over the five folds) across models shows how PSO improves the applicability of ANN for predicting tunneling-induced ground movements. The R2ave values for the training and testing datasets of the different BP-ANN and hybrid PSO-ANN models are shown in Fig. 11. As Fig. 11a shows, the BP-ANN models generally have higher R2ave than the hybrid PSO-ANN models for the training datasets. However, as shown in Fig. 11b, the R2ave values for the testing datasets are considerably higher for the hybrid PSO-ANN models than for the BP-ANN models in all cases.

Fig. 11 R2ave for a training and b testing datasets in different hybrid PSO-based ANN and BP-ANN models

The same trend was observed when the two methods were compared in terms of RMSEave, as shown in Fig. 12. In the BP-ANN models, large differences were recorded between the training and testing values of both R2ave and RMSEave, indicating poorer generalization, whereas these differences are considerably smaller in the hybrid PSO-ANN models. This is attributed to the ability of PSO to minimize error efficiently: the PSO algorithm adjusts the weights and biases of the ANN so as to minimize the error objective function (RMSE). From these results, it can be concluded that the hybrid PSO-ANN models produce far more consistent results than the pre-developed BP-ANN models.

Fig. 12 RMSEave for a training and b testing datasets in different hybrid PSO-based ANN and BP-ANN models

5 Numerical examples

To demonstrate ground movement prediction with the proposed model, two measurement points were selected from parts of the KUR Project that differ in geotechnical conditions and tunnel depth. The transverse and longitudinal surface settlements at these points were obtained with the proposed model and compared with the measured settlements. The geotechnical properties and tunneling characteristics of the selected points are provided in Table 4. It should be noted that the applicability and accuracy of this approach are limited by the data used in the training step: outside the range of the training data, the network may not predict ground movements accurately.
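
One simple way to act on this limitation is to flag input features of a new case that fall outside the range seen during training. The Python (NumPy) function below is a hypothetical helper, not part of the original study. A per-feature min-max check is necessary but not sufficient: a case can lie inside every individual range and still be far from the training data.

```python
import numpy as np

def outside_training_range(X_train, x_new):
    """Return indices of features of x_new outside the training [min, max].

    X_train: training inputs, rows = samples, columns = input parameters.
    x_new:   one new input vector with the same column order.
    """
    X_train = np.asarray(X_train, float)
    x_new = np.asarray(x_new, float)
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    return np.flatnonzero((x_new < lo) | (x_new > hi)).tolist()
```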

Table 4 Geotechnical properties and tunneling characteristics at the measurement points

Figure 13 shows the transverse and longitudinal settlement troughs obtained by the proposed hybrid model at Point 1 (chainage 2 + 085 km). The maximum surface settlement and the inflection point in the transverse direction were obtained from the hybrid PSO-ANN model, and the transverse trough was then plotted using Peck's theory [1] (Eq. 1), as shown in Fig. 13a. The solid curve in this figure shows the actual settlements measured by settlement markers at distances of 0 m, 7.5 m and 15 m from the tunnel axis. Figure 13b compares the actual and predicted longitudinal settlement troughs. As in the transverse case, the maximum surface settlement and the inflection point in the longitudinal direction were obtained from the hybrid model, and the longitudinal trough was plotted using the equation of Attewell and Woodman [2] (Eq. 2). Figure 13a shows that the predicted transverse trough is in good agreement with the actual trough. In contrast, some differences exist between the actual and predicted longitudinal troughs, although their beginning and end parts agree closely. Finally, the three-dimensional settlement trough was plotted from the predicted transverse and longitudinal troughs, as shown in Fig. 14.
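
The construction of the troughs from the three predicted quantities (maximum settlement, transverse inflection point i and longitudinal inflection point i_x) can be sketched in Python (NumPy). Eq. 1 is Peck's curve; Eq. 2 is not reproduced in this section, so the cumulative normal form used below, with x measured from the tunnel face and positive over the excavated part, is an assumption, as is forming the three-dimensional trough as the product of the two profiles.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise error function (stdlib math.erf)

def transverse(y, s_max, i):
    """Peck's curve (Eq. 1): S_v = S_v,max * exp(-y^2 / (2 i^2))."""
    y = np.asarray(y, float)
    return s_max * np.exp(-y ** 2 / (2.0 * i ** 2))

def longitudinal(x, s_max, i_x):
    """Assumed cumulative-normal form of Eq. 2: S = S_max * Phi(x / i_x).

    x is measured from the tunnel face, positive over the excavated part.
    """
    x = np.asarray(x, float)
    return s_max * 0.5 * (1.0 + _erf(x / (i_x * math.sqrt(2.0))))

def surface(x, y, s_max, i, i_x):
    """3D trough as the product of the two profiles (an assumption).

    Returns an array of shape (len(x), len(y)).
    """
    X, Y = np.meshgrid(np.asarray(x, float), np.asarray(y, float), indexing="ij")
    return transverse(Y, s_max, i) * longitudinal(X, 1.0, i_x)
```

Under these assumptions, the settlement directly above the face (x = 0) is half of the maximum, and the transverse settlement at y = i is exp(−1/2) ≈ 0.61 of the value on the tunnel axis.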

Fig. 13 a Transverse and b longitudinal settlement trough obtained by proposed hybrid model at Point 1

Fig. 14 Three-dimensional settlement trough at Point 1

The transverse and longitudinal settlement troughs obtained by the proposed model for Point 2 (chainage 11 + 841 km) are shown in Fig. 15. The percentage error of the predicted maximum surface settlement is less than 5%, and both the transverse and longitudinal troughs are fairly close to the actual troughs. The three-dimensional settlement trough obtained by the proposed model at Point 2 is shown in Fig. 16.
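
The percentage error quoted above is the absolute difference between the predicted and measured maximum settlement relative to the measured value. A short Python helper follows; the numbers in the example are hypothetical and are not taken from Table 4.

```python
def percentage_error(predicted, actual):
    """Absolute percentage error of a prediction relative to the measured value."""
    if actual == 0:
        raise ValueError("measured value must be non-zero")
    return abs(predicted - actual) / abs(actual) * 100.0


# Hypothetical values (mm): predicted 19.2 vs measured 20.0 gives about 4 %.
print(round(percentage_error(19.2, 20.0), 2))
```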

Fig. 15 a Transverse and b longitudinal settlement troughs obtained by proposed hybrid model at Point 2

Fig. 16 Three-dimensional settlement trough at Point 2

6 Conclusion

This paper presents a new hybrid PSO-ANN model for predicting three-dimensional ground movements induced by tunneling. Comprehensive information, including geological and geotechnical conditions, instrumentation readings from surface settlement markers, and tunneling operation data, was collected from line No. 2 of the KUR Project in Iran. To perform the modeling, MATLAB code was prepared and the input and output parameters were determined. The input parameters comprised the geometrical and operational parameters of the tunnel and the geotechnical properties, whereas the outputs consisted of the maximum surface settlement and the inflection points of the settlement troughs in the transverse and longitudinal directions. A series of sensitivity analyses was conducted to find the optimum PSO parameters, and the optimum network architecture was determined by trial and error. Fivefold cross-validation was employed to evaluate model performance, and an optimized hybrid PSO-ANN model with eight inputs, one hidden layer of 12 nodes and three outputs was selected for ground movement prediction in the transverse and longitudinal directions. A comprehensive comparison between the pre-developed BP-ANN and the proposed hybrid PSO-ANN models, based on the performance of the trained and tested models, indicated that the hybrid models produce far more consistent results than the ordinary ANN models. Numerical examples from two points on the KUR line indicated that the proposed PSO-ANN model can accurately predict three-dimensional ground movements induced by tunneling.