1 Introduction

Homogeneity and isotropy are two properties that are take into account for most of the materials such as steel, concrete and timber in civil engineering design (Shahin et al. 2001; Park 2011). Although for the soils, it has been proved that due to complexity of geological formation which causes the imprecise physical processes, the geotechnical engineering properties of soil show varied and uncertain behavior (Jaksa 1995). Therefore developing analytical or empirical models in some simplified situations are feasible; however models that are more practical and less expensive than the analytical ones are of interest (Shahin et al. 2001; Park 2011). Inherent soil variability, loading, time and construction effects, human error, errors in soil boring, sampling, in situ and laboratory testing, characterization of the shear strength and stiffness of soils are some of the recognized uncertainty sources.

In this regard, Artificial Neural Networks (ANNs) as an alternate method is well suited to model complex problems where the relationship between the model variables is unknown (Hubick 1992). The ANNs are relatively crude electronic and computational models based on the neural structure of the brain consist of billions of highly interconnected neurons which provide a strong predicting and classification tool (e.g. Maulenkamp and Grima 1999; Zurada 1992; Fausett 1994).

Original ANNs as a sub system of Artificial Intelligence (AI) was introduced by McCulloch and Pitts (1943), and since then as an applicable tool have been used successfully for modeling of various fields of science and technology (e.g. Maier and Dandy 2000; Shahin et al. 2001; Zaheer and Bai 2003; Das 2005) in particular to almost all aspects of geotechnical engineering to solve complicated problems (e.g. Sayadi et al. 2013; Shahin et al. 2008; Basheer et al. 1996; Zhou and Wu 1994; Baziar and Ghorbani 2005; Goh 2002; Hanna et al. 2007; Kim and Kim 2006; Mayoraz et al. 1966; Fernandez-Steeger et al. 2002) as well as estimating geotechnical soil properties (e.g. Celik and Tan 2005; Lee et al. 2003; Yang and Rosenbaum 2002; Erzin 2007; Gribb and Gribb 1994; Sinha and Wang 2008; Cal 1995).

In the present paper applicability of ANNs in estimation of clay sensitivity (St) as one of the geotechnical soil properties has been notified. According to literature reviews the clay sensitivity parameter has a close relation to a unique type of high sensitive clay namely quick clay in Sweden (e.g. Nadim et al. 2008, Rankka et al. 2004; Rosenquist 1953; Torrance 1983; Lundström et al. 2009; Solheim et al. 2005; Abbaszadeh Shahri et al. 2015) and some other northern countries such as Norway, Canada, USA and Russia. Existence relation between this parameter with quick clay which is prone to slide and considered as the main responsible of occurred landslide in Sweden is the reason of focusing on this factor.

Reduction of clay shear strength to a very small fraction of its former value on remoulding at constant moisture content is called St and cyclic loading produced by wind, waves, ice and snow accumulation, earthquakes and other live loads cause cyclic stresses on foundations may lead to quick clay conditions and catastrophic failure. Terzaghi (1944) originally defined the St in terms of unconfined compressive strength (UCS) however; the concept of St as a ratio between the undisturbed undrained shear strength (Su) and the remaining strength after a so complete remoulding of the material that no further reduction can occur (disturbed undrained shear strength, Sur) is commonly used to describe the possible loss of strength in clay due to remoulding (Åhnberg and Larsson 2012). The Su can be assessed by in situ testing methods such as field shear vane and piezocone penetration test (PCPT or CPTu). The CPTu is a special type of Cone Penetration Test (CPT) which allows additional measurement of porewater pressure (u) generated during the penetration as well as cone tip resistance (qc), sleeve friction (fs) and depth (e.g. Baligh et al. 1980; Tumay et al. 1981; Zuidberg et al. 1982; Lunne et al. 1997; Cai et al. 2010; Abbaszadeh Shahri et al. 2015). The CPTu not only provide valuable information on soil types but are also useful in deriving correlations with the engineering properties of soil for the purposes of analysis and design of foundations. In the recent years, the CPT and CPTu have been used as standard investigation tools, mainly to determine quickly the soil profile (through the friction ratio) as well as estimation of the Su. In case of cohesive soils, the Su is the most important quantity for geotechnical design in clay (Schmertmann 1975; Anagnostopoulosi et al. 2003; Robertson 1999) and hence many empirical correlations have been developed to find a clear relationship between qc and Su from in situ tests such as CPTu and laboratory tests (e.g. Lunne and Kleven 1981; Jamiolkowski et al. 1982; Aas et al. 1986; Stark and Juhrend 1989; Mitchell and Brandon 1998; Lunne et al. 1986; La Rochelle et al. 1988; Rad and Lunne 1988). However, the accuracy of these correlations is poor, and their underlying theory is undependable (Kim et al. 2006).

The objective of the present paper is to evaluate the feasibility of different ANN algorithms to predict St using CPTu data. The quick propagation, conjugate gradient descent, quasi-Newton, limited memory quasi-Newton and Levenberg–Marquardt were the trained tested and developed ANN algorithms.

The optimized ANN model was selected using try and error method and tested by several statistical analyses criteria. The results showed that the conjugate gradient descent algorithm with minimum network root mean square error (RMSE) indicate better correlation with measured data. The performed sensitivity analysis in this study represented that depth and pore pressure are the two most effective and cone tip resistance is the least effective factors on prediction of St.

2 Study Area and Available Data

The selected area in this study with several subzones is around the Göta River near the Lilli Edet in the southwest of Sweden (Fig. 1) which has been recently studied by the Swedish Geotechnical Institute (SGI). A part of the mentioned area has been the subject of several geophysical investigations as well as CPTu analysis as referenced by Abbaszadeh Shahri et al. (2015).

Fig. 1
figure 1

Location of studied area in this paper (Swedish Geotechnical Institute (SGI) web site; http://bga.swedgeo.se/bga/)

Large number information for CPTu data and some laboratory test results are available in SGI website which this study used a total of 70 CPTu geotechnical test points and their available laboratory results belonging to subzones 5 and 7. The evidences of previous investigations indicate that the quick clay has been detected in glacial clay as layers or planes in clay sediments (Lindskog 1983; Klingberg 2010).

3 Estimation of Clay Sensitivity Using CPTu Data

To determine the St, the CPT or CPTu has been used to estimate the peak and minimum Su of clay soils through empirical relations (Robertson 1999). However there are several methods for estimating the minimum Su of soils from in situ tests (Worth 1984; Lunne et al. 1997; Seed and Harder 1990; Stark and Mesri 1992; Wride et al. 1999; Yoshimine et al. 1999; Ishihara 1993).

Typically, the peak Su from CPTu data is estimated using Eq. (1) as proposed by Lunne et al. (1997).

$${\text{S}}_{{{\text{u}}\left( {\text{peak}} \right)}} = \frac{{{\text{q}}_{\text{t}} -\upsigma_{\text{v}} }}{{{\text{N}}_{\text{k}} }}$$
(1)

where q t total cone penetration resistance corrected for unequal end area effects, N k empirical cone factor between 10 and 20 with an average of about 15, σ v total overburden stress

Another possibility to calculate Su is the use of effective cone resistance (qE) which can be defined by Eq. (2).

$${\text{S}}_{\text{u}} = \frac{{{\text{q}}_{\text{E}} }}{{{\text{N}}_{\text{ke}} }} = \frac{{{\text{q}}_{\text{t}} - {\text{u}}_{2} }}{{{\text{N}}_{\text{ke}} }}$$
(2)

where Nke is the empirical cone factor (for the expression using qE).

According to Lunne et al. (1997), the minimum (residual) Su in clays often assumed to be equal to fs, since the clay is almost fully remoulded as it passes the fs and hence the clay sensitivity can be define by Eq. (3).

$${\text{s}}_{\text{t}} = \frac{{{\text{S}}_{{{\text{u}}\left( {\text{peak}} \right)}} }}{{{\text{S}}_{{{\text{u}}\left( { \hbox{min} } \right)}} }} = \frac{{{\text{q}}_{\text{t}} -\upsigma_{\text{v}} }}{{{\text{f}}_{\text{s}} \times {\text{N}}_{\text{k}} }}$$
(3)

The correction of fs (Lunne et al. 1997) and qc (Robertson et al. 1986) for pore water pressure can be obtained by Eqs. (4) and (5).

$${\text{f}}_{\text{t}} = {\text{f}}_{\text{s}} - \frac{{\left( {{\text{u}}_{2} \times {\text{A}}_{\text{sb}} - {\text{u}}_{3} \times {\text{A}}_{\text{st}} } \right)}}{{{\text{A}}_{\text{s}} }}$$
(4)
$${\text{q}}_{\text{t}} = {\text{q}}_{\text{c}} + {\text{u}}_{2} \left( {1 - {\text{a}}} \right)$$
(5)

where f t corrected sleeve friction, u 2 water pressure at the base of sleeve (measured pore pressure), A sb cross section area of sleeve at the base, A st cross section area of sleeve at the top, A s surface area of sleeve, a ration between cone base un affected by the pore water pressure to total shoulder area.

The suggested normalized friction ratio (Fr) by Robertson (1990) is defined as Eq. (6) and hence the clay sensitivity can be estimated by Eq. (7) from CPTu data.

$${\text{F}}_{\text{r}} = \frac{{{\text{f}}_{\text{s}} }}{{{\text{q}}_{{{\text{t}} - }}\upsigma_{\text{v}}^{\prime} }} \times 100$$
(6)
$${\text{s}}_{\text{t}} = \frac{100}{{{\text{F}}_{\text{r}} \times {\text{N}}_{\text{k}} }}\mathop{\longrightarrow}\limits^{{{\text{N}}_{{{\text{k}} = 10 - 20}} }}{\text{s}}_{\text{t}} = \frac{10}{{{\text{F}}_{\text{r}} }} {\text{to s}}_{\text{t}} = \frac{5}{{{\text{F}}_{\text{r}} }}$$
(7)

where \(\upsigma_{\text{v}}^{\prime}\) is effective overburden stress.

In the case of soft clays the measured qc values are relatively small and hence even minor errors can influence the measured values significantly (Rémai 2013). Therefore for very soft clays the use of excess pore water pressure may be better to find a reliable correlation (Eq. 8). Based on Robertson et al. (1986) and Robertson (1990), pore pressure ratio (Bq) and normalized cone resistance (qcnrm) can be defined as Eqs. (9) and (10).

$$S_{u} = \frac{\Delta u}{{N_{\Delta u} }} = \frac{{u_{2} - u_{0} }}{{N_{\Delta u} }}$$
(8)
$$B_{q} = \frac{{u_{2} - u_{0} }}{{q_{t} - \sigma_{v} }}$$
(9)
$$q_{crnm} = \frac{{q_{t} - \sigma_{v} }}{{\sigma_{v}^{\prime} }}$$
(10)

where u 0 in situ pore pressure, \(\sigma_{v}\) and \(\sigma_{v}^{\prime}\) total and effective overburden stress and NΔu is the empirical cone factor (Lunne et al. (1985): 4 and 10; Karlsrud et al. (1996): 6 and 8; Hong et al. (2010): 4 and 9).The qcnrm and Fr in normally consolidated insensitive clays is around 2–6 and 5–10 % respectively (Robertson 1999).

Robertson (2008) defined the Eq. (11) for estimating the St.

$${\text{S}}_{\text{t}} = \frac{{{\text{S}}_{\text{u}} }}{{{\text{S}}_{\text{ur}} }} = \frac{7.1}{{{\text{F}}_{\text{r}} }}$$
(11)

where Sur is remoulded Su at the same water content of undisturbed Su.

By increasing the St of the clays, the fs and friction ratio will decrease (Robertson 1999). Based on this relation, in general St > 50 characterizes the quick clay, however as shown in Fig. 2 different classifications for sensitivity have been developed which show that the quick clay has been defined based on its geotechnical behavior rather than its composition. In Sweden quick clay are defined as clays with St > 50 and Sur < 0.4 kPa (Rankka et al. 2004) whereas in Norway, they are defined as clays with St up to 30 and Sur < 0.5 kPa (Lundström et al. 2009) and in Canada as clays with Sur < 1.0 kPa and a liquidity index of at least 1.2 (Robitaille et al. 2002).

Fig. 2
figure 2

Comparison of different classifications of St values (figure is provided by Abbaszadeh Shahri)

4 Applied Method

The efficiency of ANN in handling of highly non-linear relationships in data, even when the exact nature of such relationship is unknown can be considered as one of the major advantages of ANN. Therefore, due to the potential of mapping complex and non-linear relations between input and output variables of a system which are commonly used in non-linear engineering problems, the ANNs can be applied successfully in learning related classification, generalization, characterization and optimization functions.

The ability of ANNs for working with incomplete data, possess an error tolerance and show gradual convergence have been proved and hence they can easily form models for complex problems. Especially in the development of solutions for semi-structural or non-structural problems, ANN models can provide very successful results that are cheaper, faster and more adaptable than traditional methods. Hence, Networks are suitable approach to decompose a complex system into simpler elements or gathering simple elements to produce a complex one (Bar-Yam 1997).

Among of the various algorithms for training neural networks, the feed-forward back-propagation algorithm is the most efficient, simplest and also most general ones which are used for supervised training of multilayered neural networks. Back propagation works by approximating the non-linear relationship between the input and output by adjusting the weights values internally. It can further be generalized for the input that is not included in the training patterns (predictive abilities).

In general, training and testing are common two stages of the back propagation network. Before taking new information, a network should be trained to provide the most efficient learning procedure for multilayer neural network. This fact that back-propagation algorithms are especially capable of solving predictive problems makes them very popular. The operations of the back propagation neural networks can be divided into feed forward and back propagation steps. In the feed forward step, an input pattern is applied to the input layer and its effect propagates, layer by layer, through the network until an output is produced.

As shown in Fig. 3, an artificial neuron as the basic element of a neural network consists of input, weights, bias, transfer function, activation function and output. Each neuron receives inputs x1, x2, …, xn attached with a weight wi which shows the connection strength of input for each connection which then multiply by the corresponding weight of the neuron connection.

Fig. 3
figure 3

Basic component of an artificial neuron and simplified procedure of Δ-rule (gradient descent)

A bias (bi) can be defined as a type of connection weight with a constant nonzero value added to the summation of inputs and corresponding weights ‘u’, given in Eq. (12) (Cevik et al. 2011).

$$u_{i} = \mathop \sum \limits_{j = 1}^{H} w_{ij} x_{j} + b_{i}$$
(12)

The summation ui is transformed using a scalar-to-scalar function called an activation or transfer function (φ(ui)) yielding a value called the unit’s “activation”, given in Eq. (13).

$$Y_{i} = f \left( {u_{i} } \right)$$
(13)

In the output layer, the neuron computes the total weighted input xj (Eq. 14) and then calculates the activity yj using some function of the total weighted input (Fig. 3). The Fermi (Logistic or Sigmoid) and hyperbolic tangent functions (Eqs. 15 and 16) are very popular activation functions which this study considered both of them.

$$x_{j} = \mathop \sum \limits_{i = 1}^{n} y_{i} w_{ij}$$
(14)
$$y_{i} = \frac{1}{{1 + e^{{ - x_{j} }} }} sigmoid$$
(15)
$$y_{i} = \frac{{e^{{x_{j} }} - e^{{ - x_{j} }} }}{{e^{{x_{j} }} + e^{{ - x_{j} }} }} hyperbolic \, tangent$$
(16)

The network’s actual output value is then compared to the expected output, and an error signal is computed for each of the output nodes (Eq. 17).

$$E = \frac{1}{2}\mathop \sum \limits_{i = 1}^{n} \left( {y_{i} - d_{i} } \right)^{2}$$
(17)

Since all the hidden nodes have, to some degree, contributed to the errors evident in the output layer, the output error signals are transmitted backwards from the output layer to each node in the hidden layer that immediately contributed to the output layer. This process is then repeated, layer by layer, until each node in the network has received an error signal that describes its relative contribution to the overall error.

Once the error signal for each node has been determined, the errors are then used by the nodes to update the values for each connection weights until the network converges to a state that allows all the training patterns to be encoded (Eq. 18).

$$w_{ji} \left( {t + 1} \right) = w_{ji} \left( t \right) + \Delta w_{ji} \left( {t + 1} \right)$$
(18)

where t is the iteration number between the output and hidden layers and ∆w indicates the next value of the adaptation weights.

The back propagation algorithm looks for the minimum value of the error function in weight space using a technique called the Δ-rule or gradient descent (Fig. 3; Eq. 18) (Rojas 1996; Rumelhart et al. 1986). The weights that minimize the error function are then considered to be a solution to the learning problem.

5 Assessing the Optimized Network Architecture

In this paper, different types of networks using the MATLAB, NuMap7 (nonlinear regression/approximation networks) and NuClass 7 (nonlinear classification networks) which have developed in university of Texas at Arlington have been examined (training and testing) and developed to find an optimized ANN architecture model to predict the St.

Application of different ANN algorithms including quick propagation, conjugate gradient descent, quasi-Newton, limited memory quasi-Newton and Levenberg–Marquardt is advantage of this paper. The logistic, hyperbolic tangent and linear functions were used for activation of hidden and output layers and the sum-of squares were employed as output errors function respectively. The operation to find the optimized network architecture based on try and error method was started with one hidden layer and logistic activation function. Three components including the number of neurons, training algorithm and activation functions were considered and then the process was executed in three different stages. In each stage two of the mentioned components are fixed and the other will change. Therefore, numerous structures using different training algorithms with various activation functions as well as number of neurons in hidden layers were generated and controlled. For example, in number of neuron 10, several structures such as 4-4-3-3-1, 4-3-4-3-1, 4-3-2-5-1 and 4-3-5-2-1 separately was controlled for all training algorithms using hyperbolic tangent, logistic and then linear function. Then the operation was repeated and tested for the same structure using both logistic and hyperbolic tangent in different hidden layers. The value of network correlations and minimum root mean square error (RMSE) were the criteria to select the optimized network structure model. For each tested model the network error using RMSE was calculated and as presented in Fig. 4, the minimum error was observed in number of neurons 11 which is correspond to 4-4-4-3-1 structure. This ANN structure model under training of conjugate gradient descent ANN algorithm and using the logistic activation function both in hidden and output layers was selected as the optimized structure model (Fig. 5). The characteristics of used database and ANN algorithms are given in Tables 1 and 2. The percentage of data for training, testing and validation with randomized selection were considered as 55, 20 and 25 % respectively.

Fig. 4
figure 4

Network performances of applied ANN algorithms for different number of neurons and some of the tested structure architectures

Fig. 5
figure 5

Optimized network structure in this study

Table 1 Characteristics of employed ANN architecture and datasets
Table 2 Properties of used ANN algorithms

In order to evaluate the results, the regression analyses between measured and predicted clay sensitivity values using the optimized ANN structure were performed (Fig. 6; Table 3). To compare the data scattering a 1:1 slope line has been used to show the dispersion of obtained data which can be used for interpretation of exact prediction and correlation (Fig. 7).

Fig. 6
figure 6

Regression results of the measured and predicted clay sensitivity using the employed ANN algorithms

Table 3 Obtained results of applied ANN models for prediction of clay sensitivity using CPTu data
Fig. 7
figure 7

Dispersion of measured and predicted St values regarding 1:1 slope line for the selected area

6 Discussion

In the recent years the ANN algorithms were modified many times and as a large number of training algorithms appeared that generally have improved characteristics as compared with the original. The quick propagation method (Fahlman 1988; Fahlman and Lebiere 1990) is a heuristic modification of the back propagation algorithm shows good results when working with most problems. The conjugate gradient descent method (Shewchuk 1994) ensures perfect training speed when working with up to 2000 training sets.

The Levenberg–Marquardt method as an advanced non-linear optimization and fastest available algorithm for multi-layer perceptrons illustrates excellent results when working with small training sets. However, the Levenberg–Marquardt can only be used on networks with a single output unit with small networks (a few hundred weights) because its memory requirements are proportional to the square of the number of weights in the network. It is only defined for the sum squared error function and therefore it is only appropriate for regression problems (Levenberg 1944; Marquardt 1963; Lourakis 2005; Nielson 1999; Transtrum and Sethna 2012).

On the basis of Newton’s method, the quasi-Newton algorithm computes an approximate Hessian matrix during any iteration based on the gradients whereas the limited memory quasi Newton as a variation of quasi Newton avoids the need to store Hessian matrix and thus require less memory and can be used for bigger networks (Bertsekas 1995).

As mentioned above, due to number of data, the author didn’t expect to get suitable results with Levenberg–Marquardt method regarding to other tested algorithms because it works well for small data sets, however the quick propagation method showed moderate adaptability (Fig. 6). Hence the competition for best applicable algorithm was between the conjugate gradient descent, quasi Newton and limited memory quasi Newton methods.

According to obtained RMSE as selection criteria for optimized ANN model, the conjugate gradient descent showed the minimum RMSE but based on the network type and number of used data sets, the author also expected to get reasonable results from quasi Newton and limited memory quasi Newton regarding to quick propagation and Levenberg–Marquardt as shown in Fig. 6. In Fig. 7, the located points on the 1:1 slope line indicate exact prediction and correlation which can demonstrate the accuracy of predicted St values using the tested ANN algorithms.

Moreover, in addition of using RMSE and coefficient of determination (R2), the performance of selected ANN algorithms were tested by mean absolute percentage error (MAPE), variance absolute relative error (VARE), median absolute error (MEDAE) and variance account for (VAF) statistical criteria (Eqs. 1922). Higher value of VAF and lower values of MAPE, VARE and MEDAE illustrate better network performance in prediction of St as presented in Table 4.

$$MAPE = \frac{1}{n} \times \left[ {\mathop \sum \limits_{1 = 1}^{n} \left| {\frac{{t_{i} - x_{i} }}{{t_{i} }}} \right| \times 100} \right]$$
(19)
$$VARE = \frac{1}{n} \times \left[ {\mathop \sum \limits_{i = 1}^{n} \left( {\left| {\frac{{t_{i} - x_{i} }}{{t_{i} }}} \right| - mean\left| {\frac{{t_{i} - x_{i} }}{{t_{i} }}} \right|} \right)^{2} } \right] \times 100$$
(20)
$$MEDAE = median\left( {t_{i} - x_{i} } \right)$$
(21)
$$VAF = \left[ {1 - \frac{{var\left( {t_{i} - x_{i} } \right)}}{{var\left( {t_{i} } \right)}}} \right] \times 100$$
(22)

where ti and xi are measured and predicted values.

Table 4 Results of statistical criteria for tested ANN algorithms

The sensitivity analysis (Jong and Lee 2004; Eq. 21) as a method for determination of the effectiveness of each input parameter showed that depth with 47.932 % and pore pressure ratio with 42.76 % are the most and cone tip resistance with 37.59 % is the least effective input parameters for St prediction using the selected ANN model (Fig. 8).

$${\text{R}}_{\text{ij}} = \frac{{\mathop \sum \nolimits_{{{\text{k}} = 1}}^{\text{m}} \left( {{\text{x}}_{\text{ik}} \times {\text{x}}_{\text{jk}} } \right)}}{{\sqrt {\mathop \sum \nolimits_{{{\text{k}} = 1}}^{\text{m}} {\text{x}}_{\text{ik}}^{2} \mathop \sum \nolimits_{{{\text{k}} = 1}}^{\text{m}} {\text{x}}_{\text{jk}}^{2} } }}, \quad {\text{x}}_{\text{i}} \, {\text{and x}}_{\text{j}}{:}\,{\text{elements of data pairs}}$$
(23)
Fig. 8
figure 8

Sensitivity analysis of input parameters in prediction of clay sensitivity

To represent how the obtained results of CPTu data and ANN algorithms can fit to real data; tow test points from different subzones of studied area randomly have been selected. At the first, by use of the CPTu data the predicted soil profile for each test point based on Robertson et al. (1986) was provided and then correlation between results of confirmed ANN architecture between CPTu results and measured data as a function of depth were executed and plotted (Fig. 9). The collected laboratory results of clay sensitivity show that 34.96 % data falls in the range of 20 < St < 30, 33.21 % in the domain of St > 32 and 31.818 % in the range of 16 < St < 20. The detailed results of sensitivity classifications (Fig. 2) are given in Table 5 which can be compared and correlated with the presented test point examples in Fig. 9 respectively.

Fig. 9
figure 9

Comparison of measured sensitivity with predicted values by CPTu data and various types of tested ANN algorithms. The predicted soil profile for each test point have been obtained using proposed method by Robertson et al. (1986)

Table 5 Comparison of available data in the study area with define range by previous researchers

It is proved that the ANN models can be varied from case to case. However, the proposed model in this study can be a very good initial guess to develop and adapt for another area. Due to flexibility of proposed ANN model in this study, it is recommended that without changing the model structure, use other training algorithms and then change the activation functions alternately. In final step, the number of neurons or neuron arrays can be considered.

7 Conclusions

ANNs can be applied for problems where the relationships may be quite dynamic or non-linear and can provide an analytical alternative to conventional techniques which are often limited by assumptions. ANN can capture many kinds of relationships and allows to model phenomena which otherwise may have been very difficult or impossible to explain. This modeling capability, as well as the ability to learn from experience, has given ANNs superiority over most traditional modeling methods since there is no need for making assumptions about what the underlying rules that govern the problem in hand could be.

In the present paper several ANN algorithms with CPTu as inputs and clay sensitivity as output were tested and developed and the procedure to find the optimized ones were executed. The results of try and error method for testing numerous ANN structure architecture showed that increasing the number of hidden layers up to 4 in ANN will be able to improve the results and in this condition the prediction of clay sensitivity by ANN can be reliable and reasonable and then the 4-4-4-3-1 structure due to minimum RMSE and higher correlation coefficient was confirmed as the developed optimized network structure. The proposed model for the selected area in this paper can be update and adapt for another area. However, the optimized topology can play an important role for initial guess to develop and making compatible for another area.

Depending on the nature of the application and the strength of the internal data patterns it can generally expect a network to train quite well and hence according to obtained results in Table 2 and Fig. 4, in this study the conjugate gradient descent training algorithm gives better results and therefore it can be applied as an alternative method for reasonable prediction of sensitivity and also the condition of data scattering using a 1:1 slope line to show the correlation were presented.

The sensitivity analysis showed that the depth and pore pressure are the two most and cone tip resistance is the least effective factor on estimation of clay sensitivity in this study.

A comparison between the proposed classifications by other researcher with our data to show adaptability percentage was executed (Table 5). The performed correlation between the laboratory and ANN results with predicted soil profile using the CPTu data represent a good condition whereas the nearly most of all measured sensitivity and those obtained by CPTu data and ANN algorithms fall in the part of soil profile which indicate the sensitive fine grained.