1 Introduction

The undrained shear strength (USS) is the soil’s ability to resist shear stresses and is commonly utilized as a measure of this capability. When performing calculations related to various geotechnical topics, such as settlement induced by external forces, the USS value is widely standard as a crucial parameter (Behnam Sedaghat et al. 2023; Akbarzadeh et al. 2023; Sharma and Singh 2018). Accurate evaluation of USS is crucial for the authentic assessment of its sub-structure constancy. The USS value of the soil can be assessed through either in situ evaluations or laboratory examinations.

Geosciences, integral to geotechnical engineering, involve the study of Earth materials and their interactions. Essential components include soil mechanics, the cone penetration test (CPT) for in-situ analysis, USS as a key parameter for sub-structure stability, and laboratory examinations for precise property determination. Recent advancements include the application of machine learning (ML), such as support vector machines (SVM) (Wang 2005; Kordjazi et al. 2014) and artificial neural networks (ANN) (Mojumder 2020; Sulewska 2017; Shahin et al. 2001; Majdi and Rezaei 2013), to predict geotechnical properties, showcasing an evolving interdisciplinary approach in the field. The CPT is a geotechnical method involving a cone-tipped probe pushed into the ground, measuring resistance to penetration. It efficiently assesses soil properties, including shear strength and USS. CPT conducted on site are efficient for identifying types of soil, evaluating their properties such as shear strength, and various other geotechnical applications. Compared to traditional methods such as boring and experimental testing, in-situ CPT is a reliable, efficient, and cost-effective technique for evaluating soil characteristics. The CPT examination provides valuable and in-depth information about soil stiffness and strength that can aid in determining the soil’s ultimate bearing capacity (UBC), making it a useful tool for obtaining consistent soil profiles. In recent years, certain experimental methods have been expanded in order to estimate the USS using parameters obtained from in-situ cone penetration testing (Lunne 1982; Senneset 1982; Khajeh et al. 2021). Despite this, a variety of methods involve certain assumptions and judgments when selecting appropriate factors for correlation coefficients, such as the cone tip factor, \({N}_{{\text{kt}}}\) in CPT profiles and USS, which can have an impact on the calculation of shear strength. The assessment of the USS may be imprecise for different site scenarios due to this issue. For a true stability assessment of sub-structures, the USS should be evaluated accurately. Lab or field testing can be used to calculate the USS value of the soil. The quality of the collected data determines how accurate laboratory tests, such the undrained triaxial compression test and the direct shear test, can be (Prasad et al. 2007; Buò et al. 2019).

Additionally, this precision is significantly influenced by the friction coefficient of penetrometers at diverse depths (Tran et al. 2017; Tran and Sołowski 2019). The USS tests conducted in laboratories are found to be both laborious and time-consuming. The quality of laboratory conditions and imprecisions encountered in USS values obtained from soil samples also contribute to the challenges mentioned above (Zumrawi 2012). Experimental correlation techniques are frequently employed when measurements related to soil’s USS are absent or unreliable. The utilization of these methodologies is predicated upon the distinct attributes intrinsic to the soil (Hansbo 1957; Chandler 1988; Larsson 1980; Mbarak et al. 2020).

In recent years, the field of geotechnical engineering has experienced a notable proliferation in the application of ML techniques. The extensive utilization of diverse analytical models for the prediction of geotechnical properties of soils has been comprehensively documented in the existing literature (Onyelowe et al. 2021, 2022, 2023a, 2023b, 2023c; Gnananandarao et al. 2020). Specifically, the friction capacity of driven piles (Samui 2008), the soil’s shear strength (Ly and Pham 2020), the bearing resistance of shallow foundations (Kuo et al. 2009), and the peak friction angle of soils (Padmini et al. 2008) have been estimated using advanced methodologies such as SVM, ANN, and regression tree analysis (Kanungo et al. 2014). In contrast to single ML models like the least square support vector machine (LSSVM), hybrid ML models have been used in geotechnical research to improve the prediction power of models. Using cuckoo search optimisation (CSO) to forecast soil shear strength is one example of this (Tien Bui et al. 2019). The shuffling frog leaping algorithm (SFLA), salp swarm algorithm (SSA), wind-driven optimisation (WDO), and elephant herding optimisation (EHO) are some of the hybrid models that have been used to estimate soil shear strength (Moayedi et al. 2020).

A number of civil engineering applications, such as the investigation of frozen soil liquefaction and foundation settling using piles, have seen positive outcomes from the application of ANN (Tavana Amlashi et al. 2023; Das and Basudhar 2006; Ikizler et al. 2010; Nejad and Jaksa 2017; Yuan et al. 2022). The utilization of ANNs for the estimation of USS from CPT is proposed as a means of mitigating the limitations of traditional methods (Samui and Kurup 2012).

ANN methodologies exhibit a propensity to iterate the learning process of the human brain through prior examples and are then acquired by means of various mathematical algorithms. An ANN was employed to conduct a survey aiming to develop a model with improved accuracy in estimating USS through the utilization of CPT data. The data presented in this study consisted of empirical investigations and uneventful soil examinations conducted across different sites within Louisiana. Numerous ANN models were trained by incorporating diverse soil properties, such as sleeve friction (SF) and cone tip resistance. The results of the investigation demonstrate that the ANN approach exhibited superior performance in estimating the UBC of soil relative to the conventional method. The perspective mentioned above emphasizes the promising utility of these methodologies in the analysis of soil (Abu-Farsakh and Mojumder 2020). In a previous study, Bayesian optimization was utilized to optimize Random Forest (RF) and data-driven extreme gradient boosting (XGB) algorithms for the purpose of uncovering the relationships that exist between soil properties and the USS (Zhang et al. 2021). In previous studies, notably in Jamhiri et al. (2021, 2022), TA-based models demonstrated commendable results; however, a critical imperative exists to explore and implement innovative models grounded in ML and metaheuristic algorithms.

This paper uniquely contributes by unveiling the potential of employing the MLP model, integrating novel optimization algorithms, with a focus on predicting USS from CPT records. The study introduces a ML model designed to predict USS, leveraging an experimental dataset from reputable sources. Employing an MLP, the study constructs robust composite models integrating bonobo optimizer (BO), dynamic control cuckoo search (DCCS), and smell agent optimization (SAO) techniques for USS prediction in soils. This amalgamation enhances predictive precision by optimizing model parameters, ensuring robust and efficient predictions for USS. The resulting composite models exhibit a heightened ability to handle complex relationships within the dataset, improving adaptability to diverse data patterns.

Moreover, the streamlined optimization process of metaheuristics contributes to efficient convergence, ultimately bolstering model efficiency and generalization. In essence, this integration offers a synergistic approach that significantly augments the overall performance and versatility of the MLP model in soil applications. Various performance metrics, including R2, RMSE, NRMSE, and n10_index, are thoughtfully examined to evaluate the suggested models’ correctness.

2 Materials and methodology

2.1 Data gathering

The present study segments the USS data acquired through experimental techniques into four input variables, namely plastic limit (PL), liquid limit (LL), overburden weight (OBW), and SF. Subsequently, the USS value assumes the role of the identified output parameter. The dataset was stratified into two distinct phases: a train phase, which accounted for \(70\%\) of the data, and a test phase, which comprised \(30\%\) of the data. To predict the USS of soil from the CPT, the CPT dataset recorded from corresponding bore log data was gathered from different sites in Louisiana (Mojumder 2020). A quantitative representation of the parameters used in the model creation is shown in Table 1. This table exhibits certain distinct characteristics, encompassing the highest permissible value (Max) and the measure of variation denoted by the standard deviation (St. Dev). In the context of statistical analysis, the terms standard deviation (SD), mean (M), and minimum (Min) are often employed. The present investigation has yielded findings on the extreme values of SF, with maximum and minimum values of 1.4 and 0.02, respectively. Additionally, the variables \(L\)\(\left( {50,12} \right)\), LL\(\left( {133,24} \right)\), and OBW\(\left( {3.64,0.03} \right)\) were observed. Furthermore, the study set a target range for USS, with the upper and lower limits for this variable set at 5670 and 100, respectively.

Table 1 The examination of input and output variables statistically

2.2 Multi-layer perceptron (MLP)

The MLP is a frequently employed Neural Network methodology, typically trained using the backpropagation algorithm. The MLP has been designed specifically for modeling asset processes and facilitating the acquisition of knowledge through estimation and train techniques. MLP Neural Networks have gained recognition as a valuable tool for modeling complex and nonlinear processes that frequently occur in real-world scenarios. This is largely due to the innate adaptability of MLP networks, which possess formidable approximation capabilities (Moayedi and Hayati 2018). Three interconnected layers make up the architecture of an MLP model: Input, Output, and Hidden. The number of predictor variables is displayed by a few nodes in the input layer. Moreover, a single hidden layer within an MLP possesses the capability to represent multifarious functions through its respective hidden neurons proficiently. An insufficient quantity of neurons results in suboptimal functionality of neural networks. On the other hand, MLP neural networks pose challenges not only with regard to the complexity of training but also with a proclivity to overfitting. The nodes located in the output layer are correlated with the number of variables that have been modeled.

An MLP Neural Network is used in the function modeling challenge using one predictor to create a generalization of the nonlinear function \((h)\) in which \(X\in {R}^{D}\to Y\in {R}^{1}\). The variables denoted as \(Y\) and \(X\) correspond to the input and output parameters, in that order. The \(h\) denotes the function as mentioned above, and it is mathematically expressed in the form of Eq. (1):

$$Y=h\left(X\right)={s}_{2}+{M}_{2}\times ({k}_{b}\left({s}_{1}+{M}_{1}\times X\right))$$
(1)

M1 and M2 represents the weight matrixes of the output and hidden layers, respectively. s1 and s2 are bias vectors of the hidden and output layers, respectively. kb referred to the initiation purpose.

Both academic writing and real-world applications frequently use the activation functions of the tan–sigmoid and log–sigmoid. The equations in question have been denoted as Eqs. (2, 3) collectively:

$${h}_{b}\left(T\right)=\frac{1}{1+{\text{exp}}(-T)}$$
(2)
$${h}_{b}\left(T\right)=\frac{{\text{exp}}\left(T\right)-{\text{exp}}(-T)}{{\text{exp}}(T)+{\text{exp}}(-T)}$$
(3)

\(T\) denotes the input activation function.

2.3 Dynamic control cuckoo search (DCCS)

The Cuckoo birds are renowned for their magnificence, likely due to their forceful reproductive method or their melodious vocalizations (Yang and Deb 2009; Mareli and Twala 2018; Nguyen et al. 2016). Certain species of birds lay their eggs in shared nests, allowing them to selectively abandon some of the eggs in order to increase the likelihood of successfully hatching their own. Some bird species exhibit obligatory brood parasitism, where they deposit their eggs in the nests of other bird species, leading to a boost in their reproductive success. To simplify the identification of the standard DCCS algorithm, three idealized rules are utilized:

  1. 1.

    The nests that hold excellent eggs are preserved to ensure their availability for future generations.

  2. 2.

    Every individual bird lays a singular egg that is subsequently placed into any nest at random.

  3. 3.

    The quantity of host nests that currently prevail remains steady, and the probability of a host bird detecting a cuckoo’s egg is denoted as qb, which is a value that falls within the range of 0 and 1.

The final conjecture can be approximated through the introduction of a proportion pa of alternative nests (containing new and distinct solutions) among the set of n host nests.

One way to express the local random walk is as Eq. (4):

$${x}_{i}^{r+1}={x}_{i}^{r}+{\text{as}}\otimes F\left({q}_{b}-\delta \right)\otimes ({x}_{j}^{r}-{x}_{{\text{l}}}^{r})$$
(4)

\({x}_{j}^{r}\) and \({x}_{{\text{l}}}^{r}\) are different answers selected randomly using a permutation. \(F(u)\) is a Heaviside-related function. \(\delta\) is a value chosen at random from a regular distribution. \(as\) demonstrates the step’ s magnitude. ⊗ is the vectors’ entry-wise output.

To execute the global random walk, Lévy flights are used:

$${x}_{i}^{t+1}={x}_{i}^{t}+a L(t,\tau )$$
(5)

\(\alpha > 0\) is the scaling factor for step size, and

$$L\left(t,\tau \right)=\frac{\tau\Gamma \left(\tau \right){\text{sin}}\left(\frac{\pi \tau }{2}\right)}{\pi }\frac{1}{{t}^{1+\tau }} , \left(t" {t}_{0}>0\right)$$
(6)

The initial solution was developed regarding the succeeding:

$$x={\text{bL}}+({\text{bU}}-{\text{bL}})\times {\text{rand}}({\text{size}}\left({\text{bL}}\right))$$
(7)

A random number generator with a uniform distribution between \(0\) and \(1\) is referred to as a rand. The jth nest’s highest range is represented by bU, and its lowest range by bL.

Figure 1 shows the flowchart of DCCS combined with MLP.

Fig. 1
figure 1

Scheme of DCCS coupled with MLP

2.4 Bonobo optimizer (BO)

The BO is an advanced meta-heuristic algorithm that draws inspiration from the reproducible techniques and societal conduct of Bonobos. Das et al. (2019) created a population-oriented version of the BO algorithm. In essence, bonobos are divided into smaller groups, termed fission, to search for food and rest during the night. The incorporation of the method into the BO algorithm aimed to improve the search process’s efficiency. The algorithm examines the natural techniques and solutions used by bonobos as it has attained the optimal level of response. The Bonobo species uses four distinct methods to reproduce and create new bonobos, including extra-group mating, promiscuous, restrictive, and consortship (Das et al. 2019). The mating strategies may experience modification contingent upon the prevailing phase circumstance, whether negative (NP) or positive (PP). The current study clarifies the relationship between the PP state and the bonobo community under circumstances that provide enough food availability, genetic variety among the bonobos, advantageous mating results, and defense against potential threats from neighboring populations. Conversely, the NP denotes a detrimental circumstance within the broader societal context.

2.4.1 Promiscuous and restricted mating methods

The stage probability parameter, denoted as mm, corresponds to the mating behavior exhibited by the Bonobo species. Initially, the value of the variable representing the amount of mm is set at a value of 0.5, with incremental upgrades occurring throughout successive iterations. According to Eq. (8), the value is deemed to be identical or lower than mm.

$$n\_{{\text{bo}}}_{j}={{\text{bo}}}_{j}^{i}+{q}_{1}{t}^{a}\left({a}_{j}^{{\text{bo}}}-{{\text{bo}}}_{j}^{i}\right)+(1-{q}_{1}){t}^{t}{\text{flag}}({{\text{bo}}}_{j}^{i}-{{\text{bo}}}_{j}^{m})$$
(8)

bo = bonobo. \(n\_{{\text{bo}}}_{{\text{j}}}\) and \({a}_{j}^{{\text{bo}}}\) are the jth the variables of the new progenies. j is a variable number between zero and one. e stands for the number of variables. In q1, a random value between \(0\) and \(1\) is determined. \({{\text{bo}}}_{j}^{i}\) and \({{\text{bo}}}_{j}^{m}\) determine the values of the bonobos’ ith and mth associated variables, respectively. \({t}^{a}\) and \({t}^{t}\) are called division coefficients for \({a}^{{\text{bo}}}\) and mth bonobos, respectively.

When the ith bonobo achieves a more favorable outcome compared to the mth bonobo, it results in promiscuous mating. The flag has been denoted with the number 1 under these circumstances. Alternatively, in order to restrict pairing, a value of −1 is attributed to the \({a}^{{\text{bo}}}\) combination.

2.4.2 Methods of mating extra-group and consort ships

If the value of the part mm is lower than that of \(q\), the occurrence of these mating types will ensue. Alternatively, if the value of \({q}_{2}\) is equivalent to or lower than the probability of extra-group mating for the given group (\({m}_{{\text{xgm}}}\)), the resultant outcome will involve a bolstering of the solution through the process of extra-group mating.

$$\left\{\begin{array}{cc}{{\text{bo}}}_{j}^{i}+{e}^{\left({q}_{3}^{2}+{q}_{3}-2{q}_{3}^{-1}\right)\left({\text{Va}}{{\text{r}}\_}_{{{\text{max}}}_{j}}-{{\text{bo}}}_{j}^{i}\right) }& {\alpha }_{j}^{{\text{bo}}}\ge {{\text{bo}}}_{j}^{i}\\ {{\text{bo}}}_{j}^{i}-{e}^{\left(-{q}_{4}^{2}+2{q}_{4}-2{q}_{4}^{-1}\right)\left({{\text{bo}}}_{j}^{i}-{\text{Va}}{{\text{r}}\_}_{{{\text{min}}}_{j}}\right) }& {q}_{3}\le {m}_{e}\\ {{\text{bo}}}_{j}^{i}-{e}^{\left({q}_{3}^{2}+{q}_{3}-2{q}_{3}^{-1}\right)\left({{\text{bo}}}_{{\text{j}}}^{{\text{i}}}-{\text{Va}}{{\text{r}}\_}_{{{\text{min}}}_{j}}\right) }& {\alpha }_{j}^{{\text{bo}}}\le {{\text{bo}}}_{j}^{i}\\ {{\text{bo}}}_{j}^{i}+{e}^{\left(-{q}_{4}^{2}+2{q}_{4}-2{q}_{4}^{-1}\right)\left({\text{Va}}{{\text{r}}\_}_{{{\text{max}}}_{j}}-{{\text{bo}}}_{j}^{i}\right) }& {q}_{3}\ge {m}_{e}\end{array}\right.$$
(9)

The initial value of the me was set to 0.5, followed by gradual improvements aligned with the evolution’s inherent characteristics. This process enhances the search algorithm to produce the most promising results. The present study utilizes the notation \({\text{Var}}_{{{\text{ - min}}_{j} }}\) and \({\text{Var}}_{{{\text{ - max}}_{j} }}\) to represent the lower and upper limits of the jth variable correspondingly.

Alternatively, in certain scenarios, the application of the consortship mating technique leads to the production of a distinctive progeny, wherein the quantity of \(q_{2}\) exceeds that of \(m_{{{\text{xgm}}}}\), in accordance with Eq. (10):

$$n\_{\text{bo}}_{j} = \left\{ {\begin{array}{ll} {n\_{\text{bo}}_{j} + e^{{q_{5} }} {\text{flag}}\left( {1 + q_{1} } \right)\left( {{\text{bo}}_{j}^{i} - {\text{bo}}_{j}^{m} } \right)} & {q_{6} \le m_{e} } \\ {{\text{bo}}_{j}^{m} } & {{\text{otherwise}}} \\ \end{array} } \right.$$
(10)

In the present set of equations, the variables \(q_{1} ,q_{2} ,q_{3} ,q_{4} ,q_{5}\) are defined as random quantities that assume values within the range of zero and one. The schematic representation of the proposed algorithm coupled with MLP is depicted in Fig. 2.

Fig. 2
figure 2

Scheme of BO coupled with MLP

2.5 Smell agent optimization (SAO)

The sense of smell plays a critical role in upholding the world from its onset. The majority of living organisms detect the existence of noxious substances in their surroundings through their olfactory receptors (Buck 2004; Sakalli et al. 2020; Axel 2005). Incorporating the human sense of smell into the development of SAO is a normal occurrence (Axel 2005; Chapman and Cowling 1990; Abdechiri et al. 2013). The SAO’s overarching structure is determined by three modes, primarily derived from the smell perception steps. Firstly, the olfactory agent detects the olfactive molecules, evaluates their spatial position, and subsequently discerns a course of action; to either pursue the origin of the scent or disregard it. Secondly, the agent follows the scent particles in its quest to locate the source of the odor, relying on its previous modes of decision-making. Finally, the final mode of operation serves to impede agents from becoming entrapped within localized minima, hence safeguarding the agent’s ability to maintain its trail.

2.5.1 Sniffing mode

The initial stage of the process involves randomly identifying a location for the commencement of diffusion of odor molecules towards the agent, given that olfactory molecules possess a tendency to propagate in the direction of their target. The smell molecules can be initialized through the utilization of the mathematical formula designated as Eq. (11):

$$x_{i}^{\left( t \right)} = \left[ {\begin{array}{*{20}c} {x_{{\left( {1,1} \right)}} } & {x_{{\left( {1,2} \right)}} } & {x_{{\left( {1,D} \right)}} } \\ . & . & . \\ {x_{{\left( {N,1} \right)}} } & {x_{{\left( {N,2} \right)}} } & {x_{{\left( {N.D} \right)}} } \\ \end{array} } \right]$$
(11)

\(N\) is the overall number of scent molecules. \(D\) stands for the total number of the decision variables.

The position vector in Eq. (11) can be produced by applying Eq. (12), which allows the agent to choose its own optimal location inside the search space.

$$x_{i}^{\left( t \right)} = {\text{lb}}_{i} + r_{0} \times \left( {{\text{ub}}_{i} - {\text{lb}}_{i} } \right)$$
(12)

The terms “\({\text{ub}}\)” and “\({\text{lb}}\)” denote the upper and lower bounds established as a result of the variables of decision, correspondingly. r0 refers to a random value from zero to one.

Every individual scent molecule is allocated a fundamental velocity with which it diffuses from the smell source through the application of Eq. (13).

$${\upupsilon }_{i}^{\left( t \right)} = \left[ {\begin{array}{*{20}c} {{\upupsilon }_{{\left( {1,1} \right)}} } & {{\upupsilon }_{{\left( {1,2} \right)}} } & {{\upupsilon }_{{\left( {1,D} \right)}} } \\ . & . & . \\ {{\upupsilon }_{{\left( {N,1} \right)}} } & {{\upupsilon }_{{\left( {N,2} \right)}} } & {{\upupsilon }_{{\left( {N.D} \right)}} } \\ \end{array} } \right]$$
(13)

Every individual smell molecule serves as a potential indication of a potentially viable solution. The locus of potential explanations is derived from the situation vector \(x_{i}^{\left( t \right)} \in R^{N}\), as demonstrated in Eq. (11), and the molecular rapidity \({\upupsilon }_{i}^{\left( t \right)} \in { }R^{N}\), as articulated in Eq. (13). The velocity of the molecules is increased through the implementation of Eq. (14):

$$x_{i}^{t + 1} = x_{i}^{\left( t \right)} + {\upupsilon }_{i}^{t + 1} \times \Delta t$$
(14)

The expression \(\Delta t = 1\) denotes that the agent advances along the path of the optimization process in a parallel manner. The spatial coordinates of smell molecules are determined using Eq. (15):

$$x_{i}^{t + 1} = x_{i}^{\left( t \right)} + {\upupsilon }_{i}^{t + 1}$$
(15)

Each smell molecule possesses unique diffusion velocities, which facilitate its positional updates and evaporation during the process of scent analysis. The calculation of the revised velocity of scent molecules is accomplished through the utilization of Eq. (16):

$${\upupsilon }_{i}^{t + 1} = {\upupsilon }_{i}^{\left( t \right)} + {\upupsilon }$$
(16)

The update variable for velocity, denoted by \({\upupsilon }\), is obtained by applying Eq. (17):

$${\upupsilon } = r_{1} \times \sqrt{\frac{3KT}{m}}$$
(17)

The term \(k\) denotes the smell fixation factor that serves to normalize the impact of both mass and temperature on the scent molecules’ kinetic energy.

The mass and temperature of the scent molecules are denoted by \(m\) and \(T\), respectively.

Equation (15) is used to assess the scent molecule’s fitness at the updated sites.

Consequently, the process of sniffing has been accomplished, thereby enabling the determination of the precise location of the agent denoted as \(x_{{{\text{agent}}}}^{t}\).

2.5.2 Trailing mode

In the second operational mode, the simulated behaviors of an agent are directed toward the determination of the source of a particular smell. During the process of searching for a smell source, the agent can detect a new location that has a higher concentration of smell molecules through smell perception. The agent seizes the opportunity to explore the novel location by utilizing Eq. (18):

$$\begin{aligned} x_{i}^{t + 1} = & x_{i}^{\left( t \right)} + r_{2} \times {\text{olf}} \times \left( {x_{{{\text{agent}}}}^{t} - x_{i}^{\left( t \right)} } \right)\\ & - r_{3} \times {\text{olf}} \times \left( {x_{{{\text{worst}}}}^{t} - x_{i}^{\left( t \right)} } \right)\end{aligned}$$
(18)

r2 and r3 are numbers from zero to one. r2 penalizes the stimulus of \({\text{olf}}\) on \(x_{{{\text{agent}}}}^{t}\) and r3 penalizes the effect of the \({\text{olf}}\) on \(x_{{{\text{worst}}}}^{t}\).

The agent records \(x_{{{\text{agent}}}}^{t}\) and the \(x_{{{\text{worst}}}}^{t}\) gained from the mode of smelling. The aforementioned data play a pivotal role in the algorithm’s capacity to establish a state of equilibrium between exploitation and exploration, as denoted by Eq. (18).

2.5.3 Random mode

Smell molecules may differ in intensity over time if there is a considerable segmentation in their distance from one another. This variance might confuse the agent, which would cause the odor to disappear and make tracking extremely difficult. Due to its incapacity to remember trailing information, the agent is prone to being stuck in local minima. When the previously described situation occurs, when the agent enters the random mode, as shown by Eq. (19),

$$x_{i}^{t + 1} = x_{i}^{\left( t \right)} + r_{4} \times {\text{SL}}$$
(19)

Step length is denoted by SL. r4 is a random number that stochastically penalizes the quantity of SL.

Figure 3 indicates the scheme of SAO coupled with MLP.

Fig. 3
figure 3

Scheme of SAO coupled with MLP

2.6 Methods for evaluating performance

A range of evaluators were employed to appraise the application of hybrid models in predicting the USS’s value. The coefficient of determination (R2), Root mean Square Error (RMSE), Normalized Root mean square error (NRMSE), and n10_index are among the performance measures that are examined in this study. By calculating the difference between expected and actual values, these metrics are frequently used in statistical analysis to assess the precision and accuracy of models. The coefficient denoted as R2 is indicative of the extent of linear correlation between the actual and anticipated quantities. The RMSE is the square root of the average squared difference between estimated and actual values. In contrast, NRMSE normalizes RMSE by dividing it by the range of observed values, offering a dimensionless metric for more interpretable model accuracy assessment, especially across datasets with varying scales. Additionally, the n10_index is a performance metric that evaluates the percentage of predicted values falling within a factor of 10 of the corresponding actual values.

Equations (2023) provide the values of these metrics as stated above.

$$R^{2} = \left( {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {b_{i} - \overline{b}} \right)\left( {d_{i} - \overline{d}} \right)}}{{\sqrt {\left[ {\mathop \sum \nolimits_{i = 1}^{n} \left( {b_{i} - \overline{d}} \right)^{2} } \right]\left[ {\mathop \sum \nolimits_{i = 1}^{n} \left( {d_{i} - \overline{d}} \right)^{2} } \right]} }}} \right)^{2}$$
(20)
$${\text{RMSE}} = \sqrt {\frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {d_{i} - p_{i} } \right)^{2} }$$
(21)
$${\text{NRMSE}} = \frac{{\sqrt {\frac{1}{n}\mathop \sum \nolimits_{i = 1}^{n} \left( {d_{i} - b_{i} } \right)^{2} } }}{{\frac{1}{n}\mathop \sum \nolimits_{i = 1}^{n} (b_{i} )}}$$
(22)
$$n10 - {\text{index}} = \frac{n10}{n}$$
(23)

The sample numbers are denoted by \(n\), the experimental value is shown by bi, the predicted value is shown by di, the experimental amount’s mean is represented by \(\overline{b}\) and the predicted value’s mean is denoted by \(\overline{d}\).

3 Results and discussion

3.1 Hyperparameters and convergence curves

Hyperparameters are predefined settings in ML that significantly impact a model’s performance and optimization. This study focuses on only one crucial hyperparameter (Neuron), and its effects on the MLP model’s performance, detailed in Table 2. The research underscores the importance of carefully fine-tuning this hyperparameter for optimal results. This involves a deliberate and judicious selection and adjustment process to achieve precision and robust generalization while mitigating issues like overfitting and prolonged training.

Table 2 The results of hyperparameters for MLP

This study examined how well nine models (three models in three layers) predicted USS by tracking their convergence progress using RMSE as the selected metric. The results of these evaluations are showcased in Fig. 4. Models had initial RMSE values of just under 200 for MLSA3 to just above 650 for MLBO1. As the convergence process proceeded, there was a conspicuous decrease in RMSE values for all models. Considering the models’ performance during the iterations 0 to 150, it became evident that the MLDC3 model experienced the best performance and finished its convergence process with almost 50 for RMSE value.

Fig. 4
figure 4

The USS hybrid models’ output convergence curve

3.2 Comparing models’ performance

In the current work, nine different hybrid model types were created to forecast the USS utilizing MLP in conjunction with the DCCS, BO, and \(SAO\) algorithms. A training set and a test set, comprising \(70\%\) and \(30\%\) of the data, respectively, are separated out of the data in these hybrid models. Four statistical measures (R2, RMSE, NRMSE, and n10_index) were used to obtain a comprehensive assessment of the effectiveness of the deployed optimizers. The results are shown in Table 3. This section determines which model is better than the other by analyzing statistical metrics.

Table 3 Developed assessment results of models by evaluators

The R2 for first layer models clearly shows that, with \(0.9816\) and \(0.9610\) in the training and testing phases, respectively, MLDC1 achieves the greatest results. When RMSE, NRMSE, and n10_index are compared as three different evaluator types during the training and testing phase, MLDC1 has the lowest mistakes.

Moving on to the second layer, MLDC2 is related to the maximum values of R2 = 0.9906 and R2 = 0.9783 in the training and testing sections. With scores of \(102.13\) and \(142.44\) in the RMSE training and testing phases, respectively, that model (MLDC2) has the best performance. Moving on to the third layer, the MLDC3 shows the best results for all four criteria, indicating that it may be used to estimate USS value.

It is clear from comparing all nine models that the maximum values of R2, RMSE, NRMSE, and n10_index for MLDC3, MLBO1, MLBO1, and MLDC3 are, respectively, 0.9949, 257.79, 1.8414, and 0.8143 during the training phase. Furthermore, for MLDC3, MLBO2, MLBO2, and MLDC3, the maximum values of R2, RMSE, NRMSE, and n10_index in the testing portion are 0.9905, 244.99, 4.0833, and 0.8333, respectively. None of the nine models have received adequate training since the R2 values for each model in the test section are less than those in the train section. All things considered, out of the nine hybrid models, MLDC3 has the top ratings from four evaluators. It can therefore be applied appropriately in practical settings.

Figure 5 displays fragmented presentations that illustrate the relationship between the measured and anticipated values of the USS. The two assessment sets that the numerical data in question relates to are RMSE and R2. By acting as a distributed controller, RMSE causes an increase in density when its evaluation metric’s value decreases. In addition, the Test and Train data points are moved along the central axis by the R2 evaluator. Numerous other variables are included in this picture, such as the centerline at position \(Y = X\), a linear regression model, and two lines that are drawn at \(Y = 0.9X\) and \(Y = 1.1X\), respectively, below and above the centerline. The incorrect prediction of values that are greater than or less than the actual values results from the intersections of the line’s upper and lower ends, respectively.

Fig. 5
figure 5

The expected and measured values’ scatter plot

In this investigation, the LSSVR model was combined with three different optimizer strategies that were applied throughout the training and testing stages to create three unique models. The results of this investigation are shown in Fig. 5. Given the uniform directionality and closeness of the data points to the centerline, the R2 of MLDC3 seems to be more favorable than that of the other models. The empirical data showed that the precision of the Train phase values withdrew from that of the test phase in every case, but it was especially apparent in the setting of MLSA2. Overall, the results show that the MLDC3 hybrid model, which combines the MLP approach with the third-layer SAO optimizer, produced the best results in terms of R2 and RMSE during the learning and validation stages when the data from Fig. 5 is taken into account. The reason for this discovery is that the model performs better in R2 and can make smaller mistakes.

The agreement between the measured and anticipated USS values for each of the three hybrid model categories is evaluated in Fig. 6. The two separate sections of the diagrams are the ones devoted to validating and learning models. When an MLP is used in conjunction with a DCA method, the projected USS of hybrid models in the first layer MLDC1 shows superior conformity with evaluated testing and training data. Moving on to the second and third layers, MLDC2 and MLDC3 have the best accordance status in the Train and Test models, respectively. Conversely, the combination of MLP and BBO in the third layer, or MLBO3, clearly displays the least favorable agreement status of models.

Fig. 6
figure 6

The line-symbol diagram comparing the measured and expected USS

Figure 7 shows the discrepancies between the measured USS values for three different hybrid model types in \(3\) layers with nine models total. This figure shows that the training and testing sets of models’ maximum errors for MLBO2 are between more than −\(60\%\) and \(150\% .\) Conversely, in both the train and test models, minimum error values are associated with MLDC3, where the majority of errors are concentrated in the range of \(- 50\%\) to \(90\% .\)

Fig. 7
figure 7figure 7

Based on a time series graphic, the models’ error percentages are displayed

The distribution of errors in the anticipated USS values is shown by the box plot in Fig. 8. It is clear that all \(3\) models have a graph with a sharp state during the training phase of the MLP model, which indicates greater dispersion; however, during the Test phase, these curves have taken on a flatter shape, indicating that the data has become less accurate and more dispersed. Therefore, all of the created models in this layer have typically been trained incorrectly. In the test and train phases, MLDC2 had the best prediction performance in the second layer. Nearly every result in the third layer is in the range of \(0\)% in MLDC3, which performs the best out of all the \(9\) hybrid models.

Fig. 8
figure 8

The error % of created hybrid models using the half-box normal plot

4 Conclusion

To determine the soil’s USS, a multi-layer perceptron (MLP) model was employed in the current study. Even though the results obtained using the traditional method were effective, they had several shortcomings. The laboratory method is considered inefficient in terms of time management and involves significant costs. Artificial intelligence (AI) was used in place of the software-based technique to overcome the aforementioned constraints. It was noted that the system’s estimation of the USS showed a high degree of accuracy. The created models used four input variables OBW, PL, LL, and SF with the goal parameter USS to achieve the intended outcome. The correctness of the suggested models was evaluated using five different performance indicators in this study: R2, RMSE, NRMSE, and n10_index. The study used three different meta-heuristic optimisation techniques BO, DCCS, and SAO to improve the system’s performance. It can be inferred and calculated from the results that:

The characteristics that were researched were used to construct prediction models for the USS estimation. The suggested models demonstrated a good degree of accuracy in forecasting the USS when compared to the experimental data.

  • In MLDC1, MLBO1, MLSA1, MLDC2, MLBO2, MLSA2, MLDC3, MLBO3, and MLSA3, respectively, the sprinkling value of the projected data in the Test phase compared to the Train phase dropped by 2.1, 1.54, 2.26, 1.24, 0.94, 2.65, 0.42, 3, and 0.94%.

  • The models suggested may understate the USS by an average of about 160, according to the contrast of measured and anticipated values. In the learning phase, the MLBO1 had the largest error in the RMSE(257.9), while in the validation phase, the MLBO2 had the largest error (249.99); the MLDC3 had the lowest error (73.47).