1 Introduction

Owing to the inherent complexity of geotechnical materials, researchers increasingly replace tedious theoretical solutions with soft computing methods when solving geotechnical design and assessment problems. Geotechnical problems are characterized by great uncertainty and involve many factors that engineers cannot determine directly, which has led to the rapid rise in popularity of machine learning (ML) methods (Goh and Zhang 2014; Wang et al. 2020; Zhang et al. 2017b). ML algorithms can capture potential correlations in data without any prior assumptions (Goh et al. 2018; van Natijne et al. 2020; Zhang et al. 2015, 2019b). With the improvement of computing efficiency, research on artificial intelligence (AI) and deep learning (DL) is in full swing (Da'u and Salim 2020; Nguyen et al. 2019; Zhang et al. 2019c). AI, ML and DL are nested within one another, as shown in Fig. 1. AI is a science, like biology or mathematics, that studies ways to build intelligent programs capable of solving problems creatively, an ability traditionally regarded as a human prerogative. ML is a subset of AI that gives systems the ability to learn and improve automatically from experience without being explicitly programmed. DL, or deep neural learning, is a particular kind of ML that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts, without manual feature extraction. The performance of traditional ML algorithms, such as Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), Multivariate Adaptive Regression Splines (MARS) and Naïve Bayes (NB), tends to stagnate once the amount of data grows beyond a certain point (Goh et al. 2017; Zhang and Goh 2013; Zhang et al. 2020d). By contrast, the accuracy of DL, which belongs to the broader family of ML methods based on artificial neural networks, gradually increases as the dataset expands, provided the data are free of noise. DL therefore provides efficient tools for handling such data and extracting useful information to support reliable decisions in geo-engineering (Shrestha and Mahmood 2019).

Fig. 1 Venn diagram representing the relationships between AI, ML and DL (adapted from Goodfellow et al. 2016)

In the first decade of the twenty-first century, researchers mainly studied shallow artificial neural networks (ANNs) trained with optimization algorithms for specific problems, and the complexity of the networks was not extended much. In the 2012 ImageNet competition, Hinton and other scholars applied a DL method named AlexNet, a variant of the convolutional neural network (CNN), which greatly improved predictive accuracy and gave rise to the wide spread of DL algorithms across numerous fields and disciplines. The DL development timeline is shown in Fig. 2, from which it can be seen that DL has made great progress in recent decades. As one of the cutting-edge algorithms in AI, DL excels at defining complex nonlinear relationships between features in different domains, such as health and medicine (Lisboa 2002), business and management (Wong et al. 1997), natural language processing (NLP) (Mikolov et al. 2010), image processing (Ayyıldız and Çetinkaya 2017; Egmont-Petersen et al. 2002; He et al. 2015), geosciences and remote sensing (Lary et al. 2016), mathematics (Gao et al. 2019a, b, 2018a, b), civil engineering (Gandomi and Alavi 2012a, b; Lazarevska et al. 2014; Zhang et al. 2020c, 2014), early warnings related to geotechnical problems (Chou and Thedja 2016), and risk assessment (Adams and Kanaroglou 2016; Dong et al. 2017).

Fig. 2 DL development timeline (adapted from Favio Vázquez; https://www.google.com)

In practice, DL accounts for a relatively small share of research in geotechnical engineering compared with its extensive applications elsewhere, because the behavior of rock and soil is uncertain and variable. To the best knowledge of the authors, several researchers have gradually explored neural networks as new decision-making tools in the geotechnical field (Chua and Goh 2005; Li et al. 2012; Uncuoglu et al. 2008). Many detailed reviews of ANN research exist, discussing the future of AI technology (Jiao and Alavi 2020), the application of ANN in the general civil engineering field (Huang et al. 2019; Kapliński et al. 2016), meta-analyses of ANN applications in geotechnical engineering (Moayedi et al. 2020; Zhang et al. 2019d) and the development of ANN from shallow to deep models (Saikia et al. 2020). Nevertheless, some have pointed out that although the volume of monitoring data in the geotechnical field grows as development expands, it is still far from sufficient to fully apply DL, compared with some other fields (Phoon 2020; Zhong et al. 2020). The latest research in the geotechnical field shows that DL is a very promising means of solving geotechnical problems that involve many uncertain factors. As the fundamental tool of DL, ANN has been widely used in various geotechnical applications, such as underground openings (Lee and Sterling 1992), braced excavation (Goh et al. 1995; Zhang et al. 2020b), earth retaining structures (Calabrese and Lai 2013; Moayedi et al. 2011), slope stability (Chakraborty and Goswami 2017), modeling tunnel boring machine (TBM) performance (Benardos and Kaliampakos 2004), liquefaction (Baziar and Jafarian 2007), pile integrity testing (Watson et al. 1995), soil swelling (Erzin 2007; Garg et al. 2014), predicting geotechnical parameters (Asadi et al. 2011a, b, c; Lee et al. 2003), pile bearing capacity (Moayedi and Armaghani 2018; Moayedi and Rezaei 2019; Mosallanezhad and Moayedi 2017), kinematic soil-pile interaction response parameters (Ahmad et al. 2007), shallow and pile foundations (Shahin 2015, 2016), as well as corrosion monitoring (Mabbutt et al. 2012). Building on the findings of the ANN applications above, new methods have been proposed to further improve performance. With the rapid development of DL, its prospective applications in geotechnics and geoengineering should be broad and profound.

However, the investigations conducted so far were limited to specific topics and did not emphasize the new challenges and prospects of DL in this field. A comprehensive review of the published DL studies dedicated to geotechnical engineering is still needed. The aim of this paper is therefore to provide a structured and comprehensive review of the literature that applies DL in geotechnical engineering and to identify drawbacks in the current application process. The main contribution of this article is a detailed and updated state of the art on the application of DL algorithms to complex geotechnical tasks, together with an overview of the potential of these advanced algorithms and of how they can be explored in further applications. To illustrate the rationale and primary coverage of this paper, the research flowchart is presented in Fig. 3. As conveyed by the literature survey process, Sect. 2 counts the papers published on this topic over the past 20 years and briefly describes their distribution across journals and the publication trend. Four classes of DL methods, namely feedforward neural network (FNN), recurrent neural network (RNN), convolutional neural network (CNN) and generative adversarial network (GAN), have been the most widely used to date in geotechnical engineering. Section 3 presents the methodology and DL model architectures, followed by their practical applications in Sect. 4. Finally, Sect. 5 discusses the challenges and future prospects, and concluding remarks are presented in Sect. 6.

Fig. 3 Research frame of DL review in geotechnical engineering

2 Literature overview and distribution in journals

From the early 1900s up to the date of writing, more than fifty thousand research articles in the field of geotechnical engineering have been indexed in the Web of Science (WOS). However, when the search is narrowed to the application of DL in geotechnical engineering, only 158 publications remain, drawn from a very limited number of source titles. Figure 4 arranges the 158 articles by year of publication. DL applications in geotechnical engineering were extremely limited in the first decade of the twenty-first century, whereas the number of papers has increased sharply in the last decade, reaching forty-three publications in 2019. This rapid growth seems likely to continue, since many papers from the current year, 2020, were still being processed and could not yet be retrieved at the time of the search.

Fig. 4 Annual distribution of published journal papers focusing on the DL application in geotechnical engineering (Source: Web of Science; literature search last updated in November 2020)

To help readers locate DL research in the geotechnical field, the share of the selected articles among the top fifteen journal sources, most of which published more than one paper matching the corresponding keywords, is drawn as a pie chart in Fig. 5. The distribution of journal sources by number of papers and the specific application papers are tabulated in Table 1 (data from Web of Science). Note that book chapters, editorial notes, and master's and doctoral dissertations were not included. As illustrated in the diagram, IEEE ACCESS is the leading journal, having published over 28% of the papers on the use of DL in geotechnical engineering, followed by TUNNELLING AND UNDERGROUND SPACE TECHNOLOGY. Three of the top fifteen journals (IEEE ACCESS, SENSORS, and ADVANCES IN CIVIL ENGINEERING) are open-access journals.

Fig. 5 Proportion of journal sources by number of papers

Table 1 Top contributing journals for DL in geotechnical engineering and corresponding papers (Data from Web of Science)

According to the data retrieved from Web of Science, several DL algorithms are being used to solve geotechnical problems. Figure 6 illustrates the main DL methods applied in this field. In summary, FNN, RNN, CNN and GAN are the most popular for solving complex geotechnical problems. Their applications include soil parameter inference (Mollahasani et al. 2011), pile bearing capacity (Singh and Walia 2017), TBM performance (Ninić et al. 2017), stratum thickness prediction (Zhou et al. 2019a), landslide susceptibility (Xiao et al. 2018), rock physical parameter evaluation (Karimpouli and Tahmasebi 2019), etc. Moreover, GAN, a newly emerging unsupervised learning method, offers bright prospects for rock image identification and for reconstructing the structure of porous media.

Fig. 6 The main DL methods in applications

In addition, other deep neural networks and techniques, such as the FL-SegNet (focal loss SegNet) network (Dong et al. 2019), auto-encoders (Canchumuni et al. 2017; Liu and Wu 2016), transfer learning (Lu et al. 2020), deep belief networks (Chen et al. 2012; Hinton et al. 2006; Li et al. 2019a; Ye et al. 2019) and Restricted Boltzmann Machines (Canchumuni et al. 2019), have also been investigated to extract valuable geological parameters and information. However, these other DL methods are rarely applied and are difficult to develop rapidly in a short time (Ball et al. 2017). Thus, the following sections explain the principles and applications of the four main DL algorithms mentioned above in turn. Overall, the results indicate that combining DL techniques with available databases is a promising research direction with potential for engineering applications. Furthermore, DL is effective in processing images, an aspect in which it is also superior to shallow ANN techniques.

3 DL methodology

This section describes in detail the theory and structure of four DL models, namely FNN, RNN (including its improved version, LSTM), CNN and GAN. Each step of the process is explained with schematic diagrams.

3.1 Feedforward neural network

ANN, as its name indicates, mimics the biological learning process of the human brain, with the aim of building a comparably intelligent system (Barrow 1996; Gurney 1997). The biological neurons in the human brain correspond to the highly interconnected processing elements of the ANN structure, which are also called neurons (Fukushima and Miyake 1982). Through complex multilayered integration, these neurons form a network, named the artificial neural network.

The feedforward neural network (FNN) is the most basic and simplest artificial neural network model. The most common architecture of this computing paradigm is the multilayer perceptron, which consists of at least three layers: an input layer, a hidden layer and an output layer. FNN is capable of discovering complex patterns and solving a variety of statistical problems (Huang et al. 2018a; Qi et al. 2018; Qi and Tang 2018). By constructing a sequence of interrelated neurons, the FNN builds the correlation between input features and output labels. During training, the neurons are interconnected in multiple layers, and the weights of their connections are adjusted iteratively using the training dataset. Figure 7 displays a fundamental diagram of a multilayer feedforward neural network. In each layer, specific values and functions are computed during the forward pass. The intermediate variable Z is the weighted sum of the inputs, and y is the output obtained by applying the nonlinear activation function \(f\) of the layer to Z. W refers to the weight between two units in adjacent layers, indicated by subscript letters, and b stands for the bias value of the unit.

Fig. 7 Architecture of multilayer feedforward neural network (adapted from Shrestha and Mahmood 2019)

The training algorithm most widely used for FNN is the back propagation algorithm, which consists of a forward propagation phase and a backward propagation phase. The first phase transmits the input information to the output layer through the hidden layers, starting from randomly initialized weights. After forward propagation, some discrepancy between the resulting prediction and the desired output inevitably exists, which makes it necessary to adjust the current model. The second phase brings the predicted output closer to the desired one by updating the weights in each layer. Gradient descent is performed during back propagation: the derivative of the error with respect to each weight is calculated and propagated backwards through the layers, and the weights are updated to minimize the model error.
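The two phases can be sketched in a few lines of NumPy. This is a minimal illustration rather than any implementation from the reviewed studies: the single hidden layer, the sigmoid activation, the mean squared error, the learning rate and the names `train_fnn` and `predict` are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation, used here as the nonlinear function f."""
    return 1.0 / (1.0 + np.exp(-z))


def train_fnn(X, t, n_hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer FNN with back propagation.

    X: (n_samples, n_features) inputs; t: (n_samples, 1) targets in [0, 1].
    Returns the learned parameters and the mean squared error per epoch.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial weights and zero biases (illustrative choice).
    W1 = rng.normal(0.0, 1.0, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0, size=(n_hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(epochs):
        # Forward propagation: Z is the weighted sum, y = f(Z).
        Z1 = X @ W1 + b1
        y1 = sigmoid(Z1)
        Z2 = y1 @ W2 + b2
        y2 = sigmoid(Z2)
        err = y2 - t
        losses.append(float(np.mean(err ** 2)))
        # Backward propagation: chain rule from the output layer to the input.
        dZ2 = (2.0 / n) * err * y2 * (1.0 - y2)
        dW2 = y1.T @ dZ2
        db2 = dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * y1 * (1.0 - y1)
        dW1 = X.T @ dZ1
        db1 = dZ1.sum(axis=0)
        # Gradient descent: move each parameter against its gradient.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses


def predict(params, X):
    """Forward pass only, using trained parameters."""
    W1, b1, W2, b2 = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)


if __name__ == "__main__":
    # Toy XOR data, chosen only because it is not linearly separable.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    t = np.array([[0.0], [1.0], [1.0], [0.0]])
    params, losses = train_fnn(X, t)
    print("initial loss:", losses[0], "final loss:", losses[-1])
```

The recorded loss history makes the effect of the backward phase visible: each weight update uses only the derivatives propagated from the layer above it.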

When the number of hidden layers exceeds one, the FNN becomes a deep model. Deep FNN models learn multiple levels of representation; in other words, they are capable of accurately modeling the complex data interactions found in practical engineering problems. The learning method used to train such models is called DL (Bengio 2009). For complex tasks such as time series analysis, computer vision and speech recognition, increasing the number of hidden layers may provide more effective modeling capabilities (Bengio et al. 2007). However, training deep neural networks may lead to overfitting and performance degradation.

3.2 Recurrent neural network (RNN)

In the traditional ANN, nodes in different layers (input layer, hidden layer and output layer) are connected to each other, whereas the nodes within each layer are independent. In RNN, by contrast, adjacent nodes in a hidden layer are connected to each other. Each node of the hidden layer receives information from two sources: the hidden layer at the previous time point and the input layer at the current time point, which gives RNN the ability to memorize information from historical time points. The network then applies the information retained in memory to the output calculation of the current neuron while continually taking in new data as input. In short, RNN can describe real-time dynamic behavior using time-sequential data. A sketch of the network is given in Fig. 8, where the simplified structure on the left can be unfolded into the detailed one on the right. A complete network contains input, hidden and output layers (Williams and Zipser 1989), and the hidden layer at each time step is connected to the hidden layer at the next. For a given input sequence \(x = (x_{1} ,x_{2} , \cdots ,x_{T} )\), the hidden state vector sequence is \(h = (h_{1} ,h_{2} , \cdots ,h_{T} )\) and the output sequence is \(y = (y_{1} ,y_{2} , \cdots ,y_{T} )\). The output sequence y can be obtained by iterating the following equations from time t = 1 to T (Fan et al. 2014; Yang et al. 2019). Let \(x_{t}\), \(h_{t}\) and \(y_{t}\) represent the input, hidden and output vectors at sampling instant t, respectively. The hidden and output vectors at sampling time t can be calculated as

$$h_{t} = f(Ux_{t} + Wh_{t - 1} + b)$$
(1)
$$y_{t} = \alpha (Vh_{t} + c)$$
(2)

where U is the weight matrix between the input and hidden vectors; W is the weight matrix between the hidden vectors at successive time steps; V is the weight matrix that connects the hidden vectors to the output vectors; b and c are the bias vectors associated with W and V, respectively; and \(f\) and \(\alpha\) are activation functions, such as the sigmoid function.
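Iterating Eqs. (1) and (2) over a sequence can be sketched as follows. The choice of tanh for \(f\) and sigmoid for \(\alpha\), the zero initial hidden state and the function name `rnn_forward` are assumptions made for illustration; the source only states that the activations are functions such as the sigmoid.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def rnn_forward(x_seq, U, W, V, b, c, h0=None):
    """Iterate Eqs. (1) and (2) from t = 1 to T.

    x_seq: (T, n_in) input sequence
    U: (n_h, n_in), W: (n_h, n_h), V: (n_out, n_h)
    b: (n_h,), c: (n_out,)
    Returns the hidden states (T, n_h) and outputs (T, n_out).
    """
    # Assumed: the hidden state starts at zero unless h0 is given.
    h = np.zeros(W.shape[0]) if h0 is None else h0
    hs, ys = [], []
    for x_t in x_seq:
        # Eq. (1): the new hidden state mixes the current input with the
        # previous hidden state (f = tanh is an assumption).
        h = np.tanh(U @ x_t + W @ h + b)
        # Eq. (2): output from the hidden state (alpha = sigmoid).
        y = sigmoid(V @ h + c)
        hs.append(h)
        ys.append(y)
    return np.array(hs), np.array(ys)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, n_in, n_h, n_out = 5, 3, 4, 2
    U = rng.normal(0.0, 0.5, size=(n_h, n_in))
    W = rng.normal(0.0, 0.5, size=(n_h, n_h))
    V = rng.normal(0.0, 0.5, size=(n_out, n_h))
    hs, ys = rnn_forward(rng.normal(size=(T, n_in)), U, W, V,
                         np.zeros(n_h), np.zeros(n_out))
    print(hs.shape, ys.shape)
```

Because the same U, W and V are reused at every time step, a change in an early input propagates to all later hidden states through W, which is the memory effect described above.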

Fig. 8 Structure of the RNN (adapted from Yang et al. 2019)

Owing to its unique structure, RNN has a superior ability in time series prediction and can in theory handle arbitrarily long sequences. In practice, however, its weakness lies in processing long-distance information: the learning ability is weakened by the vanishing or exploding gradient problem, which makes it difficult to capture long-term temporal dependences (Bengio et al. 1994; Pascanu et al. 2013). To remedy this problem, Hochreiter and Schmidhuber (1997) proposed the long short-term memory (LSTM) network, an upgrade of the standard RNN. The detailed structure of an LSTM unit is shown in Fig. 9. An LSTM unit includes three gate controllers, known as the input, forget and output gates. For every piece of information passing through the unit, the gates decide whether it is remembered or forgotten. To avoid the vanishing gradient problem, the LSTM network implements temporal memory by switching these gates (Yuan et al. 2019). The external inputs of the basic LSTM unit are the previous cell state \(c_{t - 1}\), the previous hidden state \(h_{t - 1}\) and the current input vector \(x_{t}\). From Fig. 9, the three gates at the current time step are generated as Eqs. (3) to (5).

$$i_{t} = \sigma (W_{ix} x_{t} + W_{ih} h_{t - 1} + b_{i} )$$
(3)
$$f_{t} = \sigma (W_{fx} x_{t} + W_{fh} h_{t - 1} + b_{f} )$$
(4)
$$o_{t} = \sigma (W_{ox} x_{t} + W_{oh} h_{t - 1} + b_{o} )$$
(5)
Fig. 9 Structure of the LSTM unit (adapted from Yuan et al. 2019)

Inside the LSTM, the memory cell and the current hidden state in the memory block are calculated as Eqs. (6) and (7), with the activation functions given in Eqs. (8) and (9).

$$c_{t} = f_{t} c_{t - 1} + i_{t} \tanh (W_{cx} x_{t} + W_{ch} h_{t - 1} + b_{c} )$$
(6)
$$h_{t} = o_{t} \tanh (c_{t} )$$
(7)
$$\sigma (x) = 1/(1 + e^{ - x} )$$
(8)
$$\tanh (x) = (e^{x} - e^{ - x} )/(e^{x} + e^{ - x} )$$
(9)

where \(i_{t}\), \(f_{t}\), \(o_{t}\) and \(c_{t}\) are the values of the input gate, forget gate, output gate and memory cell in the memory unit; \(b_{i}\), \(b_{f}\), \(b_{o}\) and \(b_{c}\) are their corresponding bias values; the matrices \(W_{ix}\), \(W_{fx}\), \(W_{ox}\) and \(W_{cx}\) weight the input vector \(x_{t}\); the matrices \(W_{ih}\), \(W_{fh}\), \(W_{oh}\) and \(W_{ch}\) weight the previous hidden state \(h_{t - 1}\); products between gate values and states are taken element-wise; and σ and \(\tanh\) are the sigmoid and hyperbolic tangent activation functions, respectively. The output \(y_{t}\) can then be obtained by Eq. (2). The output sequence \(y = (y_{1} ,y_{2} , \cdots ,y_{T} )\) is obtained by iterating Eqs. (3) to (7) and Eq. (2) from time t = 1 to T (Yang et al. 2019).
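A single LSTM time step can be sketched as below. The cell update is written in its standard form, \(c_{t} = f_{t} \odot c_{t-1} + i_{t} \odot \tanh(\cdot)\), with element-wise products; the dictionary keys (`W_ix`, `W_ch`, and so on), the helper `init_lstm_params`, the small random initialization and the zero initial states are hypothetical choices made for this example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_lstm_params(n_in, n_h, seed=0):
    """Hypothetical helper: small random weights and zero biases."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in ("i", "f", "o", "c"):
        p[f"W_{g}x"] = rng.normal(0.0, 0.1, size=(n_h, n_in))
        p[f"W_{g}h"] = rng.normal(0.0, 0.1, size=(n_h, n_h))
        p[f"b_{g}"] = np.zeros(n_h)
    return p


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following Eqs. (3) to (7)."""
    i = sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_prev + p["b_i"])  # Eq. (3)
    f = sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_prev + p["b_f"])  # Eq. (4)
    o = sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_prev + p["b_o"])  # Eq. (5)
    # Candidate memory content, squashed to (-1, 1).
    g = np.tanh(p["W_cx"] @ x_t + p["W_ch"] @ h_prev + p["b_c"])
    # Eq. (6): forget part of the old cell state, admit part of the candidate.
    c = f * c_prev + i * g
    # Eq. (7): the output gate controls how much of the cell is exposed.
    h = o * np.tanh(c)
    return h, c


def lstm_forward(x_seq, p):
    """Run the unit over a sequence, starting from zero states (assumed)."""
    n_h = p["b_i"].shape[0]
    h, c = np.zeros(n_h), np.zeros(n_h)
    hs = []
    for x_t in x_seq:
        h, c = lstm_step(x_t, h, c, p)
        hs.append(h)
    return np.array(hs), c
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state is carried forward almost unchanged, which is how the unit preserves information over many time steps.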

3.3 Convolutional neural network (CNN)

CNN shows significant superiority over other DL algorithms in image classification and recognition, and researchers worldwide employ it for image recognition and computer vision tasks (Karimpouli and Tahmasebi 2019; Krizhevsky et al. 2017; Schmidhuber 2015). The first CNN model, LeNet-5, was proposed by LeCun et al. and was successfully used to identify handwritten digits at the end of the last century (LeCun et al. 1998). Because computational efficiency was quite limited at that time, CNNs could only process small datasets, and their potential was not adequately exploited. With the development of technology over the past 20 years, the emergence of big data (Zhang et al. 2016) and improvements in training algorithms (Dong et al. 2020), the feasibility of training deep CNNs for more complex recognition tasks has greatly improved (Khan et al. 2020). Microsoft has deployed a number of CNN-based optical character recognition and handwriting recognition systems (Simard et al. 2003). CNNs have also been employed for object detection in natural images (Vaillant et al. 1994; Zhao et al. 2019b), as well as for face recognition (Lawrence et al. 1997), medical diagnosis (Shen et al. 2017) and image understanding (Cireşan et al. 2012).

The structure of CNN is built on a large number of convolutional layers, which are obtained by convolving the image with many small kernels. These kernels act as feature identifiers that detect different features of the input data, which are usually images. Convolution alone is not sufficient, however; activation functions and pooling are also required. After feature extraction, the feature maps are connected to the output layer by a fully connected neural network. Figure 10 gives a comprehensive representation of the CNN structure for a rock image, which contains an input layer, feature maps and a fully connected network. Feature map extraction is the most cumbersome and complicated part and mainly includes the following steps (Karimpouli and Tahmasebi 2019):

  1. Convolutional layer. The input image or the previous feature map is convolved with a randomly initialized kernel (or filter) of size (height, width, channel). The feature map \(F_{k}\) is calculated as Eq. (10):

    $$F_{k} = (\sum\limits_{l} {W_{kl} *X_{l} } ) + b_{k}$$
    (10)

    where \(W_{kl}\) is the sub-kernel of the \(l\)th channel, \(X_{l}\) is the \(l\)th input channel, \(*\) represents the convolution operator, and \(b_{k}\) is a bias term. Since the convolutional layer is associated with \(L\) input channels, \(X\) contains \(M \times M \times L\) values and each kernel \(W_{k}\) contains \(N \times N \times L\) weights. Accordingly, the number of weights in a convolution block composed of \(K\) feature maps is equal to \(K \times N \times N \times L\), plus \(K\) bias terms.

    Fig. 10 Schematic illustration of the CNN (after Karimpouli and Tahmasebi 2019)

  2. Padding. To obtain a convolutional layer output of the same size as the input matrix, zero padding is applied: rows and columns of zero-valued cells are added around the input matrix.

  3. ReLU (Rectified Linear Unit) layer. The ReLU layer is an activation function that changes negative values to zero. In other words, it applies the function in Eq. (11) to every cell obtained from Eq. (10):

    $$F_{k} = \max \left( {0,\left( {\sum\limits_{l} {W_{kl} *X_{l} } } \right) + b_{k} } \right)$$
    (11)

    ReLU trains faster than the sigmoid or tanh activation functions (Krizhevsky et al. 2017). It induces sparsity in the hidden units and nonlinearity in the system without suffering from the vanishing gradient problem.

  4. Maximum pooling. The pooling layer, also known as down-sampling, combines all values in the pooling window into one value in the next layer. For example, maximum pooling takes the maximum value in each pooling window, whereas average pooling takes the mean. Pooling imposes a certain degree of spatial invariance on the network and, by summarizing the information in each window, reduces the computational cost (Garcia-Garcia et al. 2017).

  5. Stride. Convolution and max pooling are performed with an offset of "n" pixels between successive window positions, which is called the stride. A stride larger than one pixel leads to a smaller output. The stride therefore controls the size of the feature maps in the convolution and max pooling layers.

As in the conventional multilayer perceptron (MLP) neural network, a fully connected network (FCN) is used to map the feature maps or patterns obtained in the previous layers to the known outputs.
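The five steps above can be sketched with plain NumPy loops. This is an illustrative sketch, not an efficient implementation: the function names `conv2d`, `relu` and `max_pool` and the array layout (height, width, channel) are assumptions, and the convolution is computed without flipping the kernel (cross-correlation), as is common in DL libraries.

```python
import numpy as np


def conv2d(X, W, b, stride=1, pad=0):
    """Convolutional layer in the sense of Eq. (10).

    X: (M, M, L) input; W: (K, N, N, L) kernels; b: (K,) biases.
    Returns K feature maps stacked as (M_out_h, M_out_w, K).
    """
    # Padding: add rows and columns of zeros around the input.
    Xp = np.pad(X, ((pad, pad), (pad, pad), (0, 0)))
    K, N = W.shape[0], W.shape[1]
    # Stride: the window moves `stride` pixels between positions.
    out_h = (Xp.shape[0] - N) // stride + 1
    out_w = (Xp.shape[1] - N) // stride + 1
    F = np.zeros((out_h, out_w, K))
    for k in range(K):
        for r in range(out_h):
            for s in range(out_w):
                r0, s0 = r * stride, s * stride
                patch = Xp[r0:r0 + N, s0:s0 + N, :]
                # Sum over all L channels of the kernel, then add the bias.
                F[r, s, k] = np.sum(patch * W[k]) + b[k]
    return F


def relu(F):
    """ReLU layer, Eq. (11): negative values become zero."""
    return np.maximum(0.0, F)


def max_pool(F, size=2, stride=2):
    """Maximum pooling: keep the largest value in each window."""
    out_h = (F.shape[0] - size) // stride + 1
    out_w = (F.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w, F.shape[2]))
    for r in range(out_h):
        for s in range(out_w):
            r0, s0 = r * stride, s * stride
            window = F[r0:r0 + size, s0:s0 + size, :]
            out[r, s, :] = window.max(axis=(0, 1))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 8, 3))        # M = 8, L = 3
    W = rng.normal(size=(4, 3, 3, 3))     # K = 4, N = 3
    b = np.zeros(4)
    F = max_pool(relu(conv2d(X, W, b, stride=1, pad=1)))
    print(F.shape, "weights:", W.size, "biases:", b.size)
```

With a 3 × 3 kernel, a padding of one pixel and a stride of one, the feature maps keep the spatial size of the input; the pooling step then halves each spatial dimension.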

3.4 Generative adversarial network (GAN)

GAN was proposed by Goodfellow et al. and has become one of the most popular DL algorithms among generative models (Goodfellow et al. 2014). Although it is based on neural networks, GAN is distinguished by the simultaneous training of two separate networks, namely the generator and the discriminator; Fig. 11 shows its key logic. The generator produces new fake samples according to the features of the real data, making it difficult for the discriminator to distinguish the fake samples from the real ones. The discriminator, in turn, attempts to identify whether a given sample comes from the generator. The two networks are trained in an adversarial way, and this process allows each to keep improving toward its own goal.

Fig. 11 Schematic diagram of the GAN (adapted from Azevedo et al. 2020)

Figure 12 gives a conceptual description of the simultaneous training of the two models in GAN (Goodfellow et al. 2014). The black dots indicate the distribution of the training set, and the green line is the distribution of samples generated by the generator G. The blue dashed line is the discriminative distribution D, which distinguishes between the data of the black dots and those of the green line. A sample is drawn from the domain z and mapped to the domain x according to x = G(z). At first (Fig. 12a), the distribution from G differs from the true one, so D can separate the generated samples from the true ones. As G and D are trained repeatedly, the samples from G become similar to the training set, D can no longer distinguish them, and the probability D(x) equals 0.5. The loss function V(D, G) of GAN is defined in Eq. (12), which G is trained to minimize and D is trained to maximize.

$$\min\limits_{G} \max\limits_{D} V(D,G) = {\text{E}}_{x} \left[ {\log D(x)} \right] + {\text{E}}_{z} \left[ {\log (1 - D(G(z)))} \right]$$
(12)
Fig. 12 Distributions of generator and discriminator in GAN (Goodfellow et al. 2014)

When data from the domain x are entered, the ideal result is that the discriminator D outputs 1. Conversely, for fake data generated from the domain z, the ideal output D(G(z)) is 0. The generator G, on the other hand, is trained so that its fake data are treated as data from x; that is, G is optimized to drive the discriminator output D(G(z)) towards 1.
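The value function in Eq. (12) can be estimated from batches of discriminator outputs, which makes the opposing goals of D and G easy to check numerically. The function name `gan_value`, the clipping constant `eps` and the sample values are assumptions made for this illustration; no networks are trained here.

```python
import numpy as np


def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of V(D, G) in Eq. (12).

    d_real: discriminator outputs D(x) on real samples, in (0, 1)
    d_fake: discriminator outputs D(G(z)) on generated samples, in (0, 1)
    """
    # Clip to avoid log(0); eps is an illustrative numerical safeguard.
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1.0 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1.0 - eps)
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))


if __name__ == "__main__":
    # A confident discriminator: D(x) near 1, D(G(z)) near 0 (large V).
    print("strong D:", gan_value([0.99, 0.98], [0.01, 0.02]))
    # A fooled discriminator: D(x) = D(G(z)) = 0.5 everywhere.
    print("D = 0.5 :", gan_value([0.5, 0.5], [0.5, 0.5]))
    # The generator lowers V by pushing D(G(z)) towards 1.
    print("strong G:", gan_value([0.9], [0.9]))
```

When the discriminator outputs 0.5 for every sample, the estimate equals \(-\log 4\), the value reported by Goodfellow et al. (2014) for the point at which the generated and real distributions coincide.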

Since GAN was first proposed in 2014, several variants have appeared, such as cGAN (conditional GAN) (Mirza and Osindero 2014), WGAN (Wasserstein GAN) (Arjovsky et al. 2017), BEGAN (boundary equilibrium GAN) (Marzouk et al. 2019), LSGAN (least squares GAN) (Mao et al. 2017), DCGAN (deep convolutional GAN) (Radford et al. 2015), and so on. These variants make the learning model more stable and accurate than the original GAN.

4 DL applications in geotechnical engineering

This section presents the findings obtained from the reviewed literature. The specific applications are summarized in a chart according to the DL methods adopted in the papers. Note that the summary also covers hybrid and optimized DL models. In addition, the corresponding applications of each method are tabulated in detail.

4.1 Specific topics addressed by DL methods

To illustrate the applications of the four aforementioned methods in geotechnical engineering, this subsection classifies the specific topics addressed by each DL algorithm. Each publication was assigned to a category, and the results are organized into the figure and table below.

Figure 13 illustrates the distribution of the topics in which DL methods are used to assess or predict property parameters. The results show that tunnel construction is the most common application field, followed by geological information extraction. The colors in the figure also show that, apart from GAN, the main DL methods are applied widely across different topics, although their application directions clearly differ. In general, FNN is mainly good at the quantitative prediction of parameters. RNN tends towards mining the connections between features and making long-term inferences along the time dimension. CNN excels at image processing and image discrimination. GAN, the least used of the four, mainly exploits its advantages in image reconstruction and in capturing geological information.

Fig. 13 The specific topics of the four DL methods’ usage in geotechnical engineering

Figure 13 is intended to illustrate the different application fields of the DL methods more vividly. The specific case studies and references can be matched through Table 2, which classifies the applications of the four main methods with a detailed list of research fields. It should be emphasized that studies which merely increase the number of hidden layers of a neural network are quite numerous. Hence, references on multilayer neural network applications are listed selectively in this article; systematic reviews of ANN have already been provided by numerous researchers in the area of geotechnical engineering (Fatehnia and Amirinia 2018; Moayedi et al. 2020; Shahin 2015, 2016; Shahin et al. 2001).

Table 2 Literature survey on DL models used in geotechnical engineering

4.2 FNN applications

4.2.1 FNN model in geotechnical engineering

FNN is widely used, and many studies have explored its model structure. In past years, researchers only discussed shallow neural networks with small amounts of data. For deep neural networks, however, determining the network architecture is one of the most important and difficult tasks in establishing an ANN model (Maier and Dandy 2000). It requires selecting the optimum number of hidden layers and the number of nodes in each of them, yet there is no unified theory for determining an optimal ANN architecture. The numbers of nodes in the input and output layers are fixed by the features and labels chosen for the model.

Hornik et al. (1989) proved that a standard back-propagation network with just one hidden layer can approximate any measurable function to any desired degree of accuracy, provided that sufficient hidden units are available. However, practical research (He et al. 2016) shows that in many cases the predictions of deep neural networks (DNNs) are more accurate than those of shallow networks. Moreover, this existence theorem does not suggest any rule for choosing a proper hidden layer size. In geotechnical engineering, Bagińska and Srokosz (2019) compared shallow and deep neural networks for predicting the ultimate bearing capacity of shallow foundations and showed that DNNs have a significant advantage over shallow networks even when the experimental dataset used to prepare the models is small. Nejad et al. (2009) used FNN models with single and multiple hidden layers to predict pile settlement from approximately 1000 sets of standard penetration test (SPT) data. Karlik et al. (1998) identified an optimal ANN model with 3 hidden layers for the vibration of a beam-mass system, which performed better than a single-hidden-layer model.

FNN has recently been developed, extended, and applied by many researchers, who have successfully introduced this tool into the field of geotechnical engineering. For instance, the approach has been studied several times for soil property assessment. Kim et al. (2014) described an FNN model to estimate the subgrade resilient modulus, which has a substantial effect on pavement design, from the physical properties and stress state of the subgrade soil. Mollahasani et al. (2011) derived a multilayer perceptron FNN to estimate the undrained cohesion intercept (c) of soil from an experimental database established from a series of unconsolidated-undrained triaxial tests. The results indicate that the developed model can effectively estimate the c values of soil samples and provides significantly better prediction performance than the regression model. Baziar and Jafarian (2007) adopted a relatively large dataset to establish an FNN model for the correlation between the initial soil parameters and the strain energy required to trigger liquefaction in sands and silty sands. In addition, data recorded during some real earthquakes and some available centrifuge test data were used to validate the proposed ANN-based liquefaction energy model. The results clearly demonstrate the capability of the proposed model and the strain energy concept to assess the liquefaction resistance of soils.

In other respects, various practical issues have been addressed with this algorithm. Rahman et al. (2001) developed an FNN model to predict the uplift capacity of suction foundations for the anchorage of large compliant offshore structures and compared it with finite element based predictions. They demonstrated that, as more data become available, the model can be improved to make more accurate capacity predictions for a wider range of load and site conditions. Bui et al. (2020) compared the prediction performance of a deep FNN with conventional ML models, such as the C4.5 decision tree, support vector machine, and random forest, in landslide susceptibility assessment. Meanwhile, Nhu et al. (2020) investigated the capability of a deep FNN for landslide susceptibility mapping based on Keras, a Python DL library. Ranković et al. (2014) developed an FNN model to predict the piezometric water level in dams. An improved resilient propagation algorithm was used to train the FNN so as to minimize the error between the neural network predictions and the desired outputs. Salsani et al. (2014) used an FNN to model the relationship between roadheader performance and the parameters influencing tunneling operations, achieving a high correlation based on the geological conditions.

4.2.2 Hybrid models of FNN in geotechnical engineering

Although FNNs perform well in most cases, traditional neural networks still have some limitations, such as overfitting and complicated parameter selection. By combining them with other soft computing technologies, FNNs can obtain better modeling capabilities (Cui et al. 2009; Elbeltagi et al. 2005). Many optimization algorithms are inspired by phenomena that occur in nature. Particle swarm optimization (PSO), genetic algorithm (GA), firefly algorithm (FFA), cuckoo search (CS), bacterial foraging (BF), artificial bee colony (ABC), ant colony optimization (ACO), gravitational search algorithm (GSA), etc. are widely used in geotechnical engineering. Gordan et al. (2016) combined PSO and ANN to predict slope stability under seismic loading. Liu et al. (2012) proposed a genetic algorithm for determining the load carrying capacity of composite foundations. Singh and Walia (2017) developed four ANN models by applying the PSO, FFA, CS and BF optimization algorithms to determine the unit skin friction and unit end bearing capacity for the design of bored pile foundations.
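As an illustration of how such a nature-inspired optimizer works, the sketch below implements a basic global-best PSO in plain Python. It is a minimal example under stated assumptions, not the procedure of any cited study: the objective `sphere` is a hypothetical stand-in for the validation error of a network, and the swarm size, inertia and acceleration coefficients are common textbook values chosen for illustration.

```python
import random


def sphere(position):
    """Hypothetical stand-in objective; in a hybrid model this would be
    the validation error of the network for a given parameter vector."""
    return sum(x * x for x in position)


def pso(objective, dim, n_particles=20, n_iter=100, seed=0,
        inertia=0.7, c_cognitive=1.5, c_social=1.5, bound=5.0):
    """Minimal particle swarm optimization (global-best variant)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]              # personal best positions
    best_val = [objective(p) for p in pos]      # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # global best so far

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull toward the particle's own
                # best position, and pull toward the swarm's best position.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


best_position, best_value = pso(sphere, dim=2)
print(best_position, best_value)
```

In a hybrid PSO-ANN model, the position vector would typically encode network weights or hyperparameters, and each objective evaluation would involve training or evaluating the network, which makes the evaluation far more expensive than in this toy case.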

Typical references on FNN applications are listed in Table 2; the applications of the other three algorithms are listed there separately.

4.3 RNN applications

4.3.1 RNN model in geotechnical engineering

RNNs are a time-aware class of neural networks that operate on time series of variables and retain some memory of previous states. Thanks to this memory, previous conditions can be weighted and, where applicable, incorporated into the current state. This kind of DL method is proficient in natural language processing tasks, such as translation, speech generation from text and text classification (Van Houdt et al. 2020).
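The idea of carrying a state from one time step to the next can be sketched with a single recurrent unit. The example below is a simplified Elman-style cell without gates; the weights `w_in`, `w_rec` and `bias` are hypothetical fixed values, whereas in a real RNN they are learned from data.

```python
import math


def simple_rnn(inputs, w_in=0.5, w_rec=0.8, bias=0.0):
    """Run a one-unit Elman-style recurrent cell over a sequence.

    The weights are hypothetical fixed values; in practice they are learned.
    """
    hidden = 0.0           # initial state: no memory yet
    states = []
    for x in inputs:
        # The new state mixes the current input with the previous state.
        hidden = math.tanh(w_in * x + w_rec * hidden + bias)
        states.append(hidden)
    return states


# A single pulse followed by zeros: the state decays gradually instead of
# dropping to zero at once, which reflects the memory of the earlier input.
states = simple_rnn([1.0, 0.0, 0.0, 0.0])
print([round(s, 3) for s in states])
```

The gradual decay of the state also hints at why plain RNNs have difficulty retaining information over long time spans.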

In the geotechnical field, this algorithm has gradually been applied to time-series monitoring problems in tunnel construction. Ninić et al. (2017) proposed an RNN method and a process-oriented 3D simulation model for real-time simulation and monitoring-based predictions during the construction of machine-driven tunnels, to support decisions concerning the steering of tunnel boring machines (TBMs). Li and Gong (2019a) presented a diagonal RNN and an evolved particle swarm optimization (EPSO) algorithm as a model predictive control (MPC) system for slurry pressure balance during construction, which effectively regulates the slurry circulation and air pressure holding systems according to geological conditions. The simulation results demonstrated that the presented approach can accurately track the desired water-earth pressure and significantly enhance the robustness of the slurry supporting system in tunneling, and the novel EPSO also achieved higher convergence speed and precision than the classic algorithms used for comparison. In addition, Zhou et al. (2019a) proposed an RNN to establish a sequence model of stratum type and stratum thickness. The DL-based sequence model can describe the real stratum situation and is a complementary tool to the traditional 3D geological model. The prediction ability of the model is improved to a certain extent by including expert-driven learning, which provides a novel approach for the simulation and prediction of sequences in 3D geological modeling.

4.3.2 Optimized model LSTM in geotechnical engineering

Since RNNs struggle to capture long-term temporal dependencies, Long Short-Term Memory (LSTM) was developed as an advanced implementation that is more suitable for problems with long time spans. It is widely used for predicting landslide deformation, groundwater level, and tunneling deformation. A landslide is a dynamic process characterized by several features. Conventional static methods ignore the dynamic nature of landslide evolution and cannot consider the influence of time, which restricts the improvement of prediction accuracy. Current research has begun to focus on the dynamic LSTM model to predict landslides. For step-wise landslide displacement, the accumulated displacement can be separated into a trend term and a periodic term. Generally, the trend term is determined by the conditions of the slope body and reflects the long-term trend of the landslide, so it can be predicted by traditional static methods; the periodic term is controlled by external factors, such as rainfall and reservoir water level changes, and can be predicted efficiently using LSTM.
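The separation into trend and periodic terms can be sketched as follows. This is a minimal example that assumes a trailing moving average as the trend extractor, which is one common choice; the cited studies may use other decomposition schemes. The function names and the displacement values are hypothetical and are not measured data.

```python
def moving_average_trend(series, window):
    """Trailing moving average; early points average what is available."""
    trend = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        trend.append(sum(chunk) / len(chunk))
    return trend


def decompose(cumulative, window=12):
    """Split cumulative displacement into trend and periodic terms."""
    trend = moving_average_trend(cumulative, window)
    periodic = [c - t for c, t in zip(cumulative, trend)]
    return trend, periodic


# Hypothetical monthly cumulative displacements in mm (not measured data).
cumulative = [0, 2, 3, 5, 12, 20, 22, 23, 24, 25, 26, 28,
              29, 31, 32, 34, 42, 51, 53, 54, 55, 56, 57, 59]
trend, periodic = decompose(cumulative, window=12)
# The trend term would go to a static model, the periodic term to an LSTM
# together with rainfall and reservoir water level as inputs.
print(periodic[-3:])
```

By construction the two terms add up to the original series, so the final displacement prediction is obtained by summing the separately predicted trend and periodic terms.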

More recently, Yang et al. (2019) proposed LSTM as a dynamic model for predicting the displacement of the Baishuihe and Bazimen landslides in the Three Gorges Reservoir (TGR) area. An LSTM model was used to predict the relationship between periodic displacement and reservoir water level together with rainfall. The application demonstrates that the LSTM model provides a good representation of the measured displacements and gives a more reliable prediction of landslide displacement than the static support vector machine (SVM) model. Likewise, Xing et al. (2020) investigated a novel prediction model of landslide displacement with risk-averse adaptation. In this methodology, the double exponential smoothing method is used to predict the trend term of landslide displacement, while a hybrid model of LSTM and support vector regression (SVR) is developed to predict the periodic term. Applied to the Baishuihe landslide, the proposed approach maintains high prediction accuracy and reduces the underestimation rate. Similarly, to investigate landslide evolution, Xu and Niu (2018) applied an LSTM model to predict the total periodic component of the Baijiabao landslide in Zigui County, Hubei Province. The predicted results indicate that, to some extent, the dynamic model (LSTM) achieves more accurate results than the static models (i.e., SVR and BP). LSTM even displays better performance than the Elman network, which is also a dynamic method. Furthermore, Xie et al. (2019) adopted the LSTM method to investigate the dynamic failure mode of the Laowuji landslide. The displacement of the Laowuji landslide contains a trend component and a periodic component. The periodic component is predicted by the LSTM method, and the model input includes multiple factors: geological conditions, rainfall intensity, and human activities. The measured and predicted data show good consistency. Compared with a traditional model, namely Empirical Mode Decomposition (EMD), the LSTM model is more powerful in predicting landslide displacement triggered by multiple factors, and the idea offers a promising way to develop on-site landslide warning systems more efficiently and precisely. In addition, Xiao et al. (2018) employed four data-driven algorithms, namely decision tree (DT), support vector machine (SVM), back propagation neural network (BPNN), and LSTM, to evaluate landslide susceptibility along the China-Nepal highway. LSTM outperformed the other three models owing to its capability to learn time series with long temporal dependencies. This indicates that the dynamic change of geological and geographic parameters is an important indicator of landslide susceptibility.
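The double exponential smoothing step mentioned above for the trend term can be sketched as below. The example uses Holt's linear form, which is one common variant of double exponential smoothing; whether Xing et al. (2020) use this exact variant is an assumption. The function name, the smoothing factors `alpha` and `beta`, and the input values are hypothetical.

```python
def holt_forecast(series, alpha=0.5, beta=0.5, horizon=1):
    """Forecast with Holt's linear (double) exponential smoothing.

    alpha smooths the level, beta smooths the trend; both lie in (0, 1).
    The series must contain at least two points.
    """
    level = series[0]
    trend = series[1] - series[0]   # simple initial slope estimate
    for value in series[1:]:
        previous_level = level
        # Update the level from the new observation and the last forecast.
        level = alpha * value + (1 - alpha) * (level + trend)
        # Update the trend from the change in level.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]


# Hypothetical trend-term displacements in mm (not measured data).
trend_term = [10.0, 12.0, 14.0, 16.0]
print(holt_forecast(trend_term, horizon=2))
```

For a perfectly linear input the method extrapolates the line exactly; for real trend terms the two smoothing factors control how quickly the forecast reacts to recent changes.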

In addition, LSTM models can be used to predict water level changes over time. Supreetha et al. (2020) developed a groundwater level forecasting model using a hybrid of long short-term memory and the lion algorithm (LSTM-LA), based on a groundwater level and rainfall dataset from an observation well in India. The prediction accuracy of the hybrid LSTM-LA model was better than that of the FNN and the standalone LSTM models. Liang et al. (2018) established an LSTM model to study the water level variation of Dongting Lake and its relationship with the upstream Three Gorges Dam (TGD). The tests show that the LSTM model has better accuracy than the support vector machine (SVM) model. Furthermore, the model was adjusted to simulate the situation in which the TGD does not exist in order to explore the dam's impact.

Considering the uncertain geological conditions, evaluating the performance of the tunnel boring machine is an essential task during tunneling. Accordingly, Li et al. (2020a) employed LSTM to predict the total thrust and the cutterhead torque during the stable period of a boring cycle, using a total of 120 GB of data. This real-time prediction performs better than the classical theoretical model, in which only a single value can be obtained based on the rock properties. Gao et al. (2020a) verified the performance of LSTM in TBM penetration rate prediction. The machine parameters, rock mass parameters, and geotechnical survey data from the water conveyance tunnel of the Hangzhou Second Water Source project were collected to form a dataset. Compared with an RNN-based model and a traditional time-series prediction model, the autoregressive integrated moving average with explanation variables (ARIMAX), the overall performance of the proposed LSTM model is better. Moreover, in the period when the penetration rate increases rapidly, the error of the LSTM-based prediction curve is significantly smaller than those of the other two models. Gao et al. (2019c) adopted three kinds of RNN, namely the traditional RNN, LSTM and gated recurrent unit (GRU) networks, for the real-time prediction of TBM operating parameters based on in-situ TBM operating data. Comparative experiments against several classical regression models, such as support vector regression (SVR), random forest (RF) and Lasso, show that the proposed RNN-based predictors outperform the regression models in most cases. The feasibility of RNN for the real-time prediction of TBM operating parameters indicates that it can support the analysis and forecasting of time-continuous in-situ data collected from various construction equipment.

4.3.3 Hybrid models of LSTM in geotechnical engineering

Being essentially based on RNN, hybrid LSTM models can achieve time-series dynamic prediction more flexibly, for instance the quick determination of the attitude and position of a shield machine in tunneling while considering the uncertainty of the geological strata. Zhou et al. (2019b) presented a hybrid model, namely WCNN-LSTM, for dynamic multi-step-ahead prediction, using a DL-based forecasting framework that integrates wavelet transform (WT), CNNs, and LSTM. The prediction framework was tested with data collected from a Mixshield operated in the river-crossing tunnel project of Yangtze Sanyang Road, Wuhan, China. To verify the validity and demonstrate the prediction accuracy of the proposed method, three widely used predictive models, namely ARIMA, LSTM, and WLSTM, were introduced for comparison with the WCNN-LSTM model. The results reveal that the proposed model outperforms the other three models in predictive accuracy and provides decision support for adjusting the attitude and position in shield tunneling. In addition, Qi and Fourie (2018) proposed an LSTM model as a substitute for numerical modelling to estimate the rheological parameters of a rock tunnel and used the firefly algorithm (FA) to search for the optimum hyper-parameters. The performance of the resulting DeepLSTM-FA was verified using a tunnel response computed with the FLAC 2D finite difference program. Furthermore, an engineering case was used to validate the accuracy of the rheological parameters. The results demonstrated that DeepLSTM-FA can provide real-time stress and stability analysis for engineering projects. Moreover, LSTM can also be used for other monitoring issues, such as soft sensors for a debutanizer column and a penicillin fermentation process (Yuan et al. 2019).

Regarding the application of RNN in geotechnical engineering, its improved variant LSTM is used most often, as is evident from Table 2.

4.4 CNN applications

4.4.1 CNN model in geotechnical engineering

Owing to increasing computational power and a high demand for handling more complexity in computational methods, image-based problems have recently been addressed with AI, particularly DL. CNN is one of the DL derivatives; it can automatically learn the features required for image classification from training-image data, thus improving classification accuracy and efficiency without relying on manual feature selection. Recently, many researchers have used CNN for different applications, such as image classification (Krizhevsky et al. 2017), face detection (Li et al. 2015), face image synthesis (Abdolahnejad and Liu 2020), semantic segmentation (Garcia-Garcia et al. 2017), traffic signal recognition (Lv et al. 2014; Sermanet and LeCun 2011), text mining (Shi et al. 2018), speech recognition (Zhang et al. 2017c), human behavior recognition (Han et al. 2018; Sargano et al. 2017), 3D pose estimation (Mahendran et al. 2017), plant disease identification (Liu et al. 2018; Lu et al. 2017), medical image analysis and applications (Havaei et al. 2017; Litjens et al. 2017; Wallach et al. 2015), saliency detection (Lee et al. 2017) and so on. Following the rapid development of DL methods in computer vision and medical science, some attempts have been made in geosciences, especially in rock physics. For example, DL methods and CNNs have been applied to the classification of rock type (Cheng and Guo 2017; Ferreira and Giraldi 2017), borehole imaging for lithology detection (Zhang et al. 2017a), permeability prediction (Srisutthiyakorn 2016), landslide detection (Ding et al. 2016; Ghorbanzadeh et al. 2019a, b; Lei et al. 2019; Lv et al. 2020), reconstruction and analysis of rock porous media (Alqahtani et al. 2018; Laloy et al. 2017; Mosser et al. 2017) and rock image segmentation (Karimpouli and Tahmasebi 2019).

The automatic identification of rock type in the field would aid geological surveying, and the success of CNNs in image recognition has led geologists to investigate their application to identifying rock types. Wang et al. (2019) proposed a novel network, named three-dimensional super-resolution CNN, for computed tomography (CT) imaging of rock samples. Ran et al. (2019) proposed an accurate approach for identifying rock types in the field based on image analysis using a deep CNN. The proposed deep CNN model was trained and tested using 24,315 sample rock image patches and achieved an overall accuracy of 97.96%. Its application has effectively identified rock types from images captured in the field. Similarly, Wei et al. (2019) proposed a CNN for characterizing rock facies with feature engineering and data padding strategies. They tested the feasibility of this algorithm using a verifiable well logging dataset from the Panoma gas field in southwest Kansas. The results show that CNN has application potential in automatic rock facies characterization with high accuracy and efficiency. In addition, CNN can infer rock physical properties from rock images; for example, Karimpouli and Tahmasebi (2019) estimated P-wave and S-wave velocities from images of rock media to evaluate the physical parameters of the rock. He et al. (2019) proposed a CNN to continuously estimate the field strength parameters of rock, which achieves higher accuracy than the Mohr–Coulomb criterion and shows superior performance for UCS estimation of various rock types. Han et al. (2019a, b) proposed new deep convolutional networks based on the Inception-ResNet-v2 and Inception-v3 models for rock strength measurement based on spectrograms. The frequency spectrum collected by tapping the rock with a geological hammer is the input variable of the DL model. The classification accuracy of the model can reach 93%, thus overcoming the subjectivity of human judgment.

4.4.2 Hybrid models of CNN in geotechnical engineering

As an alternative and effective approach, hybrid CNN models also offer geotechnical engineers the tools required to make fast and better decisions, improve the quality of their work and reduce risks. Numerous scholars have applied this method to geological disaster assessment. Yu et al. (2017) proposed an algorithm based on a deep CNN and an improved region growing algorithm (RSG_R) for intelligent landslide detection. Fang et al. (2020) developed three hybrid methods, CNN-SVM, CNN-RF and CNN-LR, for landslide susceptibility mapping, which effectively improved the performance of the classifiers. For the classification and detection of tunnel defects, Huang et al. (2018a) presented an image recognition algorithm with a fully convolutional network to conduct semantic segmentation of cracks and leakage defects in metro shield tunnels. With respect to tunnel construction, He et al. (2017) proposed an interpretation model for ground penetrating radar point data based on the Wigner distribution (WD) and a deep CNN for tunnel geological prediction. Micro-electro-mechanical systems (MEMS) sensors are a very effective and promising means of ground vibration monitoring for early warning of geological disasters. Accordingly, Kang et al. (2019) developed a novel ground vibration monitoring scheme for MEMS sensed data based on a deep CNN. Experimental results on both data sets demonstrate that the proposed scheme significantly outperforms the other comparable schemes.

For image identification, Zhang et al. (2018b) established a hybrid model for geological structures based on transfer learning with the deep convolutional network Inception-v3 and compared this identification model with four other models, namely K-nearest neighbors (KNN), artificial neural network (ANN), extreme gradient boosting (XGBoost) and CNN. The accuracy of the ML methods was poor because it is hard to extract accurate image features from a pixel vector or histogram, and the traditional single CNN model overfitted strongly. Transfer learning based on a DL model was an effective method for classifying geological structure images. A transfer learning model for mineral microscopic images was also established based on the Inception-v3 architecture, considering that microscopic mineral image identification is tedious and time-consuming in the lab (Zhang et al. 2019e).

Indeed, CNN can even be used for on-site construction safety management. Ding et al. (2018) developed a hybrid DL model that integrates CNN and LSTM to automatically identify unsafe behaviors of employees on construction sites. The CNN model is applied to each frame to capture the spatial features obtained from video, and the LSTM network is used to learn the temporal information from the sequence of frames. Kim et al. (2018) proposed a construction equipment detection model based on a deep convolutional neural network, which is helpful for construction site management. This model is trained with a small amount of construction equipment data through transfer learning.

Some applications of CNN in geoscience and geoengineering are listed in detail in Table 2. It can be seen that this method has attracted more attention in recent years owing to improvements in computing power and storage capacity.

4.5 GAN applications

4.5.1 GAN model for image reconstruction

GAN was proposed in 2014 and therefore has a relatively short development history. It is a radically novel approach that opens new development opportunities in image synthesis (Mosser et al. 2017). Recently, researchers have investigated the use of GAN for image reconstruction of porous media. In studying the microstructure and macroscopic properties of rock, microcomputed tomography is widely used to achieve high-resolution digital imaging. However, using this technique to extract a large number of three-dimensional images of the pore space is often not experimentally feasible. To address this problem, GAN can be applied to reconstruct the solid-void structure of porous media for stochastic image reconstruction.

As shown by the results of Mosser et al. (2018), GANs can quickly and accurately reconstruct the evaluated image dataset, and the synthetic images generated by the GAN model accurately match the key statistical and physical parameters of these porous media. Likewise, Valsecchi et al. (2020) established a GAN-based model for two-dimensional to three-dimensional reconstruction of the structure of porous media, employing three rock image databases named Beadpack, Berea and Ketton. The experiments showed that three-dimensional reconstruction can be performed successfully using sets of two-dimensional images, which provides a large advantage in applicability over costly microcomputed tomography scans. In addition, according to Azevedo et al. (2020), the GAN method is well suited to generating geological models of discrete and continuous properties for stochastic subsurface model reconstruction. Moreover, a hybrid generative-model-based compressive sensing approach can be applied to recover the overall shape and thickness of cracks, owing to its outstanding ability to learn low-dimensional representations of different classes of images (Huang et al. 2020).

4.5.2 GAN model in image processing

Owing to the inherent limitations of microcomputed tomography, the trade-off between the field of view and the resolution of rock images has long been a research hotspot in computer vision. SRGAN (Super-Resolution Generative Adversarial Network) is a successful application of GAN to image super-resolution (SR) (Ledig et al. 2017). SRGAN proposes a new loss function, which effectively addresses the loss of high-frequency details in the restored image and yields a good visual experience. Chen et al. (2020) proposed a cycle-consistent generative adversarial network (CycleGAN)-based SR approach for real-world rock microcomputed tomography (MCT) images, which can model the mapping between rock MCT images at different resolutions. The high-resolution images reconstructed by SR CycleGAN show good agreement with the targets in terms of both visual quality and statistical parameters, which greatly improves the quality of rock images and overcomes the limitations of imaging systems on field of view and resolution. Similarly, Janssens et al. (2020) presented a conditional generative adversarial network (cGAN) method to handle the SR problem in the assessment of fluid flow characteristics while still generating visually more appealing results. It could therefore be of interest as a pre-processing step in the study of geological materials.

Furthermore, GAN is mostly used to explore reservoir properties in combination with existing geological information. Owing to its unique architecture, the model generates new models that do not exist among the initial models; cluster analysis is then performed on the regenerated models, and they are filtered to obtain the corrected model, thereby reducing the prediction uncertainty as much as possible (Kang and Choe 2020). Oliveira et al. (2019) evaluated the performance of cGANs as an interpolation tool for improving seismic data resolution on a public poststack seismic data set and compared the results with traditional cubic interpolation. The results show that cGANs outperform traditional algorithms and that the texture descriptor was able to better capture image similarities, producing results more coherent with visual perception.

Given its advantages in autonomous learning and random sample generation, as well as the diversity of the training samples it can produce, GAN has been very successful in generating realistic images in a large number of fields, such as natural image synthesis (Brock et al. 2018; Karras et al. 2017, 2019; Salimans et al. 2016), face synthesis (Bao et al. 2018), image style transfer (Chen et al. 2018a; Choi et al. 2018; Ma et al. 2018b; Taigman et al. 2016), image segmentation (Zhang et al. 2018c), facial expression recognition (Peng and Wang 2018) and so on. Nonetheless, lack of data is still one of the important factors restricting the development of DL. In particular, as seen in Table 2, geotechnical applications remain limited because data are difficult to access, yet this method is already a key technology in unsupervised learning and will be one of the important development directions of AI in the future.

5 Discussion

5.1 Critical analysis

The four main DL algorithms, together with the literature mentioned, are compiled in Table 2. The summary is organized by the data sources discussed in each case study, the methods adopted and the application area in geotechnical engineering. It is apparent from this short review that the different DL methods are motivated by the limitations of traditional theoretical approaches in solving particular problems in geotechnical engineering. Even when several DL methods can be applied to the same problem, the results still differ because of their different architectures. Thus, no single model can be presented as the most suitable for all geotechnical problems, because model selection depends on the underlying objectives, the scientific goals, and the model limitations. Notably, DL models are still more appealing than other ML methods owing to their accuracy and computational efficiency.

However, owing to the short development time of DL theories, the four DL algorithms still have some deficiencies. Firstly, for multilayer neural networks, simply increasing the number of network layers does not significantly improve the accuracy of the results. Secondly, the CNN algorithm is more proficient in handling supervised problems, which use labelled data to train the model (Yang et al. 2017). However, the performance of CNN strongly depends on the availability of large training datasets, optimized network architectures, and faster graphics processing units (Guirado et al. 2017). Although automated approaches with CNN have brought a series of improvements in image classification and object detection, the applications of CNN in geotechnical engineering are still limited. Thirdly, although the RNN algorithm is adept at processing large amounts of time-related data, missing data hold it back. Furthermore, while the GAN model is particularly strong in image processing, training-based methods such as GAN-based approaches have a high initial computational cost and long run times because of the required training phase. They therefore place higher demands on the computing configuration.

Although the computing configuration does restrict calculation efficiency, DL still has broad prospects in geotechnical engineering applications, such as landslide susceptibility assessment, slope deformation prediction, and tunnel defect detection. These methods can extract increasingly useful information from the raw data through each hidden layer, which makes full use of the available monitoring information and can exert a crucial influence on geological disaster prevention (Shaheen et al. 2016). However, several problems in data processing cannot be ignored. Firstly, although large amounts of monitoring data exist for large-scale geotechnical projects, very little data is obtained from medium or small-scale projects, which makes it very difficult to process and analyze the corresponding geotechnical problems. Moreover, a data sharing platform for major geological disasters has not yet been established, which limits the expansion and application of new data mining methods. Secondly, the output quality depends not only on the model but also on the quality of the acquired data. However, site data acquisition is often limited by the geological environment and the monitoring instruments. In addition, missing values or outliers in the data may affect the robustness of the model during data mining. As far as the current situation is concerned, a precise and fully automated process is still unachievable. Hence, mechanisms for remote intelligent early warning of geological disasters, covering equipment, consultation, and cooperative prediction, urgently need to be established. Furthermore, considering the spatial variability and uncertainty of geological conditions, it is difficult to ensure the generalization of the model during data analysis.
In summary, the exploration of AI applications, big data technology, and DL methods is in an ascending stage of development. Scholars still need to overcome various difficulties in the process of continuous exploration and advancement.

5.2 Future perspectives

This review places particular emphasis on the latest techniques used in geotechnical engineering in recent years and offers remarks on approaches that appear promising for the coming years but still require further improvement. DL methods can undeniably be adopted as a complementary measure to conventional theory. They may be used as a quick check on experts' solutions or even as an alternative approach (Thirugnanam et al. 2020). Interestingly, DL-based studies have already demonstrated better accuracy of pre-judgement than other approaches.

However, although DL has been applied in the civil engineering field, the research is rarely carried over into practical engineering (Lin et al. 2017). The main reason is that, in the last two decades, the majority of geotechnical engineering studies have been dedicated to conventional ML approaches such as RF, SVM and DT. In contrast, applications of DL methods to geotechnical engineering problems such as tunnel construction and landslide displacement prediction are still in their infancy (Jiao and Alavi 2020). Hence, there are tremendous opportunities for exploiting current algorithms and architectures and for further exploring optimization methods to solve more complex problems. It should be noted that DL training is currently constrained by overfitting and training time, and is highly susceptible to getting stuck in local minima; overcoming these challenges should become a research focus to accelerate breakthroughs (Shrestha and Mahmood 2019).

Furthermore, the performance of these approaches is strongly dependent on their network architecture, the sample patches selected for input, and graphics processing unit speed. This means that increasingly high-resolution images typically require more memory and a greater computational load; thus, an accurate model requires a balance between satisfactory performance and practicable computational time. In other words, when the resulting image quality is considered equal, one possible differentiator among these methods is computational run time. Therefore, how to optimize the algorithm architecture to compute faster is also a research direction for scholars.
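As a rough illustration of how memory grows with image resolution, the following sketch estimates the storage of a single feature map. The function name `feature_map_bytes`, the 64-channel layer, and the 32-bit floating-point storage are assumptions made for this example only; they are not taken from any of the reviewed studies.

```python
def feature_map_bytes(height, width, channels, bytes_per_value=4):
    """Approximate storage of one feature map (assumes 32-bit floats by default)."""
    return height * width * channels * bytes_per_value

# Hypothetical layer with 64 channels, evaluated at two input resolutions.
base = feature_map_bytes(256, 256, 64)
doubled = feature_map_bytes(512, 512, 64)
ratio = doubled / base

print(f"256 x 256: {base / 2**20:.0f} MiB")
print(f"512 x 512: {doubled / 2**20:.0f} MiB")
print(f"growth factor: {ratio:.0f}")
```

Under these assumptions, doubling the side length of the image quadruples the memory of every feature map of the same depth, which is one reason why run time and memory become the practical differentiators among methods of similar accuracy.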

In future development, in addition to continuously optimizing the algorithm architecture, how to effectively acquire high-quality data is also crucially important. The era of big data in particular provides more opportunities and challenges for DL. Given the 4Vs that characterize big data (volume, velocity, veracity, variety), data will be priceless if it can be properly managed and adequately analyzed (Kapliński et al. 2016). Conversely, DL methods may also be limited by the availability of data sources. If the data source is limited or of substandard quality, adopting or reusing a proposed method may not be feasible for other research. That is to say, the durability and accuracy of geological monitoring are significant to prediction systems, and the performance of a DL-based prediction model depends crucially on the monitoring data. As a consequence, many new monitoring devices must be adopted to collect data locally, such as GPS, wireless devices, sensors, and streaming communication generated by machine-to-machine interactions in engineering (Huang et al. 2019). Then, taking advantage of the rapid development of internet-based platforms, DL models can readily be deployed on such platforms to enable an advanced real-time geological monitoring system, and cloud computing techniques are expected to transmit and process geological data efficiently.

In addition, in terms of data source processing, especially for analyses of rock and soil materials, whose spatial variability is strong, it is necessary to adopt appropriate data preprocessing techniques, such as noise removal, outlier removal, spurious attribute removal, and proper correlation between features of different data sources (Saikia et al. 2020). At the same time, the application of DL rests mainly on the assumption that the training data set correctly represents the relationship between the attribute to be predicted and the input data. It should therefore be stressed that it is important to choose a training set that represents the population well.
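To make the preprocessing step concrete, a minimal sketch of outlier removal and gap filling for a single monitoring series is given below. It is only one possible approach: the rolling-median rule based on the median absolute deviation, the function name `clean_series`, the window length, the threshold, and the displacement readings are all assumptions chosen for illustration, not a procedure prescribed by the cited studies.

```python
import numpy as np

def clean_series(values, window=5, threshold=3.5):
    """Mask outliers with a rolling median/MAD rule, then fill gaps linearly."""
    x = np.asarray(values, dtype=float)
    n = x.size
    half = window // 2
    cleaned = x.copy()
    for i in range(n):
        if np.isnan(x[i]):
            continue  # missing reading, filled by interpolation below
        lo, hi = max(0, i - half), min(n, i + half + 1)
        neighbours = x[lo:hi]
        neighbours = neighbours[~np.isnan(neighbours)]
        if neighbours.size < 3:
            continue  # too few readings to judge this point
        med = np.median(neighbours)
        # 1.4826 * MAD approximates the standard deviation for Gaussian noise.
        scale = 1.4826 * np.median(np.abs(neighbours - med))
        if scale > 0 and abs(x[i] - med) > threshold * scale:
            cleaned[i] = np.nan  # treat as an outlier
    idx = np.arange(n)
    valid = ~np.isnan(cleaned)
    # Linear interpolation over both missing and rejected readings.
    return np.interp(idx, idx[valid], cleaned[valid])

# Hypothetical displacement readings (mm) with one gap and one spike.
displacement = [1.0, 1.1, 1.2, np.nan, 1.4, 9.0, 1.6, 1.7, 1.8]
filled = clean_series(displacement)
print(np.round(filled, 2))
```

In this example the spike of 9.0 is rejected and, together with the missing reading, replaced by a linearly interpolated value. For real monitoring data, the window and threshold would need to be tuned to the sampling interval and to the deformation rates expected at the site, so that genuine accelerations are not discarded as outliers.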

Besides, numerous previous studies have focused on deploying supervised learning methods. Unsupervised learning methods may nevertheless play a key role in geotechnical engineering in the future, because the main learning mechanism of humans and animals is unsupervised: discovering the world through observation rather than being informed about all objects (Gentine et al. 2018). Although supervised learning can synthesize all features to predict label values, unsupervised learning can mine potential connections more deeply and extract features automatically. For example, the DL method GAN can learn a function (the generator) able to generate data in various forms (images, speech, languages, etc.) from a large amount of unlabeled data, providing new ideas for image-based prediction; it has great potential even though it has so far been applied only rarely. From a developmental perspective, in the process of reviewing articles we found that only a handful of researchers use the GAN method in soil-related studies, whereas more use it in rock image identification studies. This indicates room for expansion in multi-dimensional image reconstruction of soil mass in geotechnical engineering.

Finally, DL offers a potential solution for a field that must handle massive data, by combining the advent of cluster computing environments with more powerful personal computers. DL-enhanced prediction models have also shown promising performance in analyzing huge and complicated geological data and extracting meaningful findings. It is our belief that major progress in big data analysis platforms is likely to be achieved by combining various DL methods with complex reasoning.

6 Concluding remarks

Currently, a variety of DL methods are adopted in geotechnical engineering practice, such as tunnel construction, slope displacement prediction, landslide susceptibility evaluation and pile bearing capacity assessment. Based essentially on the powerful nonlinear mapping and training capability of deep neural networks, DL can progressively extract higher-level features from the raw data, and both real-time and historical data can be utilized as inputs to develop DL models. Because it processes increasing amounts of data efficiently and requires less subjective judgement, DL has outperformed other ML methods in geotechnical applications.

Accordingly, this study presented a systematic review of the different DL approaches applied in the geotechnical field and bridged the knowledge gap by performing a structured review of relevant literature focusing on DL applications in recent years. A summary of the annual distribution of published journal articles focusing on DL applications is drawn from the Web of Science database. Meanwhile, the distribution of the selected articles among the contributing journals and the corresponding publications are presented in the form of tables and pie charts. Furthermore, all selected papers were grouped into categories according to the DL method adopted. The review of DL applications in various geotechnical aspects shows that the deep FNN, the RNN and its optimized version LSTM, the CNN and the GAN are widely used in geotechnical engineering; hence, the basic architectures of these most popular DL methods are also introduced.

With respect to specific applications, the literature review indicates that FNN has the longest development history and is the most widely used in the field of geotechnical engineering, while deep FNN is rarely used because of its limited ability to improve the estimation results. RNN is more suitable for time series problems, and its evolved version LSTM shows more satisfactory performance for long-term prediction, so it is widely used in landslide deformation prediction and tunnel boring machine parameter prediction. CNN is better at solving image processing problems, such as porous media reconstruction. The unsupervised learning algorithm GAN has appeared only in recent years, and its application in the geotechnical field is limited. However, its great potential lies in combination with other supervised learning algorithms, owing to its excellent generating ability.

From the results of this review, it is evident that the adoption of DL algorithms for geotechnical engineering is an emerging field of study, given the increasing trend in annual publications. As more monitoring data are collected from different sites, the efficiency of DL methods may aid intelligent early warning by providing data-driven inputs. Moreover, it can also be concluded that, based on the development of the technologies required for storing, computing, processing, analyzing, and visualizing big data, DL theory may be able to extract more valuable knowledge and better tap the potential links between information. Therefore, the combination of big data and DL is becoming the new trend of AI in geotechnical engineering.