1 Introduction

Swelling and collapsing upon wetting, water retention, and infiltration are vital behaviours of soils, especially in arid regions. Soil suction, which reflects the ability of the soil to attract and retain water, is a critical stress state variable that influences the strength, deformation, and hydraulic behaviour of unsaturated soils. The term total suction describes the combined effect of all the mechanisms that influence the water retention behaviour of soil. Total suction or total soil–water potential comprises several components, including matric, osmotic, gravitational, pneumatic, and piezometric potential. However, in the absence of externally applied gradients and under isothermal conditions, the matric and osmotic components are sufficient to describe the soil–water potential of unsaturated soils (Bulut and Wray 2005; Yong and Warkentin 1975; Yong 1999). Matric suction arises due to adsorptive and capillary forces in the soil matrix, whereas osmotic suction is caused by salts or contaminants in the soil pore water. Since matric suction is the most significant component of total suction, changes in the total suction of soil are usually caused by variations in matric suction (Agus et al. 2001; Arifin and Schanz 2009; Nam et al. 2010; Ng and Menzies 2007; Malaya and Sreedeep 2012).

Soil suction can be measured directly or indirectly in the laboratory or the field using various techniques and instruments. In direct measurements, the water and air phases are separated by a ceramic cup or disk, and the negative pore water pressure is measured directly. The suction plate, pressure plate, and tensiometer methods are examples of the direct measurement of matric suction (Agus and Schanz 2005). Indirect techniques are generally based on measuring the relative humidity of the air in moisture equilibrium with the soil and estimating suction using the Kelvin equation. Indirect suction measurement techniques include the thermocouple, transistor, or chilled-mirror psychrometer methods, the filter paper method, the thermal conductivity sensor technique, the evaporation method (HYPROP), and the electrical conductivity sensor technique (Woodburn et al. 1993; Leong et al. 2003; Fredlund and Wong 1989; Likos and Lu 2002; Erzin 2007; Satyanaga et al. 2019). The suction of a given soil varies with its moisture conditions. Suction is zero in a fully saturated state but can exceed 1 GPa in a completely dry state. The soil–water characteristic curve (SWCC) describes the relationship between water content and suction in the region between the saturated and dry states for a given soil (Zhai et al. 2019). Each instrument or technique used to measure suction has some limitations in terms of reliability, practicality, or cost, and none of them can individually provide satisfactory measures for the entire suction or moisture range (Guan 1996; Vanapalli et al. 1999; Rahardjo and Leong 2006; Bulut and Leong 2008). The WP4-T dew point potentiometer provides satisfactory results in the high suction range region of the SWCC while reducing the time and costs associated with suction measurements (Ebrahimi-Birang and Fredlund 2016).
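As an illustration of the Kelvin relation on which the indirect, humidity-based methods rely, the sketch below converts a relative humidity reading into total suction. It is a minimal example and not the procedure of any particular instrument; the function name `total_suction_kpa` and the constant values (in particular the fixed water density near 20 °C) are assumptions made here.

```python
import math

R_GAS = 8.314462   # universal gas constant, J/(mol*K)
M_WATER = 0.018016 # molar mass of water, kg/mol
RHO_WATER = 998.0  # assumed constant water density near 20 degC, kg/m^3


def total_suction_kpa(relative_humidity, temperature_c=20.0):
    """Total suction (kPa) from relative humidity (fraction in (0, 1]) via Kelvin's equation."""
    if not 0.0 < relative_humidity <= 1.0:
        raise ValueError("relative humidity must be in (0, 1]")
    t_kelvin = temperature_c + 273.15
    # psi = -(R * T * rho_w / M_w) * ln(RH), in Pa
    psi_pa = -(R_GAS * t_kelvin * RHO_WATER / M_WATER) * math.log(relative_humidity)
    return psi_pa / 1000.0
```

Under these assumptions, a relative humidity of 0.93 at 20 °C corresponds to a total suction a little below 10 MPa, which is why humidity-based devices are best suited to the high suction range.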

Several mathematical models (Brooks and Corey 1964; van Genuchten 1980; Williams et al. 1983; McKee and Bumb 1987; Fredlund and Xing 1994) have been developed to simulate the SWCC of a particular soil. On the other hand, these models have only been tested in certain soils and are only valid for specific suction ranges; for example, the Brooks and Corey, van Genuchten, McKee, and Bumb models are unsuitable for high suction ranges (Zapata 1999). Nam et al. (2010) observed that the Fredlund and Xing (1994) and van Genuchten (1980) models provide nearly identical SWCCs, except at high suction values. According to Lu and Khorshidi (2015), while many empirical SWCC models have been developed to cover the entire suction range, matric suction data greater than 10 MPa is limited; thus, the validity of these models in the high suction range is uncertain.

On a semi-logarithmic scale, the SWCC is typically S-shaped and consists of three zones separated by specific limit values. These limits can be expressed as saturated state water content, air entry value (AEV), and residual water content. The region between the saturated water content and the AEV is referred to as the capillary saturation zone; the region between the AEV and the residual water content is referred to as the desaturation zone or transition zone; and the region between the residual water content and the completely dry state is referred to as the residual saturation zone. The fundamental mechanisms influencing the water retention behaviour in these three regions are distinct. In the transition zone (the region dominated by capillary water), pore water is retained primarily as capillary menisci located between soil particles; consequently, capillary forces are effective, and suction is mainly a function of pore size distribution (Lu and Likos 2006; Zhai et al. 2018, 2021). In the residual zone (the adsorbed water-dominant region), the majority of pore water is retained as hydration on the particle surfaces; adsorptive forces are more critical in this region, and the effect of soil structure on the SWCC is negligible (Lu and Likos 2006; Malaya and Sreedeep 2012; Fang et al. 2022). In other words, different mechanisms control the moisture–suction relationship of soils over different suction ranges. It is difficult to explain the entire suction range using a model that only focuses on capillary mechanisms and ignores soil physicochemical interactions, especially in cohesive soils (Guo et al. 2021). Salager et al. (2013) indicated that the retention curves corresponding to different densities of the same soil converge and become unique after a specific suction value is exceeded in the suction versus water content plane. Chen et al. (2022) also demonstrated that at high suction levels, the SWCCs in terms of water content versus initial void ratio are independent of the initial void ratio. It can be deduced that the region where the void ratio or density changes do not affect suction is located within the SWCC’s residual zone.

The validity of most current prediction models for soils under low moisture conditions — typical for arid and semi-arid regions — is unclear. Models are needed to reliably estimate SWCC under these moisture conditions, particularly in cohesive soils. The present study attempted to develop prediction models for the residual saturation zone of the SWCC in cohesive soils based on the soils’ physical, chemical, and spectral properties. For this purpose, 162 gravimetric water content and total suction (> 10 MPa) measurements for 40 different soils were investigated. The SWCC at large suctions was simulated using a semi-logarithmic linear model with two parameters: the slope of the residual saturation zone of the SWCC (\({s}_{r}\)) and suction at zero water content (\({\psi }_{dry}\)). Correlation and regression analyses were conducted between these parameters and several soil properties. As a result of these analyses, empirical estimation models for the SWCC residual saturation zone in cohesive soils were developed. The estimation capabilities of the proposed models were evaluated using different performance measures and then discussed.

2 Material and methods

2.1 Some physical, chemical, and spectral properties of soils

The current study utilised experimental data obtained from a comprehensive survey by Demirci (2010). Forty soil samples for which sufficient suction and water content measurement data existed to establish a reliable relationship at high suction levels (> 10 MPa) were selected for evaluation. These soils had plasticity index values ranging from 9.5 to 70.1 and fine contents ranging from 46.8 to 97.2%. The soil sampling sites were located in the Turkish cities of Tokat (S1 to S6), Sivas (S7), Mersin (S8), Adana (S9), Burdur (S10 and S18), Antalya (S15 and S16), Mugla (S11 to S14), Isparta (S17), Afyon (S19, S20, and S22), Kutahya (S21), Ankara (S23 to S31), Istanbul (S32 to S38), and Kirklareli (S39 and S40), as shown in Fig. 1.

Fig. 1 Sampling sites of soils

The WP4-T dew point potentiometer was used to measure the total suction of the soils under specific moisture conditions following the ASTM D6836-02 D protocol. The ASTM D4318-10 standard was followed to determine the consistency limits of the soils. Mechanical sieving (following ASTM D422-63) and laser diffraction (Malvern Instruments Mastersizer 2000) were used to determine the grain size distributions of the soil samples. The specific surface area (SSALD) values were calculated from the grain size distributions using Malvern Mastersizer software. The surface area activity (ASSA) values were calculated by dividing the specific surface area by the clay content. The free swell index (FSI) values of the soil samples were determined using the procedure described by Sridharan and Rao (1988). The carbonate content of the soil samples was determined using the method described by Loring and Rantala (1992). The electrical conductivity (EC) values were measured using a conductivity meter (Delta OHM, model HD 2106.1), and the pH values were measured with a pH/mV meter (Delta OHM HD 2105.1). The soils’ cation exchange capacity (CEC) values were determined using the sodium acetate method (Bower et al. 1952). Following this method, the soil was washed with sodium acetate, soluble salts were extracted with ethyl alcohol, and the sodium produced by washing with ammonium acetate was quantified (Azadi and Baninemeh 2022). The specific surface charge (SSC) was calculated as the ratio of CEC/SSALD (Pesch et al. 2022). The reflectance properties of the soil samples were measured in the laboratory using an ASD field spectrometer. The spectral absorption feature characteristics (asymmetry and depth of an absorption band) were determined according to procedures outlined by van der Meer (2004). 
The abovementioned textural, physical, and chemical soil properties of cohesive soils were used as explanatory variables to develop prediction equations that define the residual saturation zone of the SWCC. Table 1 presents the descriptive statistics, such as the standard deviation, minimum and maximum values, and quartiles for some of the textural, physical, and chemical properties of the 40 cohesive soils examined in this study.

Table 1 Descriptive statistics of selected textural, physical, and chemical properties of soil samples

2.2 Statistical analysis

Multiple linear regression is an analysis method that uses observed data to develop a prediction model for a dependent variable by describing the relationship between the dependent variable and explanatory variables using a linear equation. This study used stepwise multiple linear regression to develop prediction models for the residual saturation zone. Stepwise regression is an iterative procedure in which the possible explanatory variables to be used in a prediction model are added to or removed from the model in each iteration based on the test statistics and statistical significance of the coefficients (Hosseinpour et al. 2018). Prediction models were developed based on the results of stepwise regression analysis, as shown in Eq. 1:

$$Y={\beta }_{0}+{\beta }_{1}\cdot {X}_{1}+{\beta }_{2}\cdot {X}_{2}+\dots +{\beta }_{n}\cdot {X}_{n}$$
(1)

where \(Y\) is the dependent variable; \({\beta }_{0}\), \({\beta }_{1}\),…, \({\beta }_{n}\) are regression coefficients; and \({X}_{1}\), \({X}_{2}\), …,\({X}_{n}\) are explanatory variables.

One of the most essential requirements for the predictive stability of a regression model is the absence of multicollinearity, which is defined as a high correlation between two or more explanatory variables. Multicollinearity between explanatory variables in a regression model can lead to unstable model estimates. The variance inflation factor (VIF) is a common indicator used to assess the multicollinearity of explanatory variables in a model. A VIF of 1 indicates that an explanatory variable is uncorrelated with the others, while a value greater than 5 indicates that the variables are highly correlated (Daoud 2017; Shrestha 2020). In this study, the upper limit for the VIF value was set at 3.
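The VIF check can be sketched as follows. This is an illustrative implementation and not the SPSS routine used in the study: each explanatory variable is regressed on the remaining ones, and the VIF is taken as 1 / (1 − R²) of that auxiliary regression. The helper names (`_solve`, `r_squared`, `vif`) are hypothetical.

```python
def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(y, columns):
    """R^2 of an ordinary least-squares fit of y on the given columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(design[k][i] * y[k] for k in range(n)) for i in range(p)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """VIF of each explanatory variable, given as a list of equally long columns."""
    result = []
    for j, target in enumerate(columns):
        others = [c for k, c in enumerate(columns) if k != j]
        result.append(1.0 / (1.0 - r_squared(target, others)))
    return result
```

A candidate set of explanatory variables would then be rejected whenever any returned value exceeds the chosen limit (3 in this study). Perfectly collinear columns make the auxiliary R² equal to 1 and are not handled by this sketch.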

Validation of a regression model is another crucial aspect of predictive stability. The two main methods for validating regression models are external validation and internal validation (cross-validation). External validation is achieved by collecting, analysing, and comparing new and previous data (Lucko et al. 2006). Cross-validation is a variable selection technique used to evaluate prediction models with a small data sample, and it is an effective method of model validation when collecting new data is impractical (Snee 1977). In this study, cross-validation of the regression models was performed using the K-fold procedure. The dataset was divided into four random and equal parts. A candidate model was trained on three subsets of the data, and its prediction performance was evaluated on a test set containing the remaining data. The model training and testing procedure was repeated to provide each of the four parts as a test set. The results from all four processes were combined to create a single validation set, and the cross-validated performance of the predictive model was evaluated by the performance measurements on the validation set (Jung and Hu 2015; Li et al. 2020; Liu et al. 2021). SPSS (version 25.0) software was used to perform the statistical analysis.
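The four-fold procedure described above can be sketched as below. The code only illustrates the splitting and pooling logic; the single-variable line fit stands in for the stepwise regression models, and all function names and the random seed are assumptions introduced for the example.

```python
import random


def k_fold_indices(n_samples, k=4, seed=0):
    """Shuffle the sample indices and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(x, y, fit, predict, k=4, seed=0):
    """Return (actual, predicted) values pooled over all k held-out folds."""
    actual, predicted = [], []
    for test_idx in k_fold_indices(len(y), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # Train on the other k-1 folds only.
        model = fit([x[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            actual.append(y[i])
            predicted.append(predict(model, x[i]))
    return actual, predicted


def fit_line(xs, ys):
    """Least-squares slope and intercept for one explanatory variable (placeholder model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return slope, my - slope * mx


def predict_line(model, x_value):
    """Evaluate the fitted line at x_value."""
    slope, intercept = model
    return slope * x_value + intercept
```

The pooled `actual` and `predicted` lists play the role of the single validation set on which the performance measures are computed.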

2.3 Performance metrics

In this study, the predictive performance of models was evaluated using six distinct metrics: the coefficient of determination (\({R}^{2}\)), the mean absolute error (MAE), the root mean square error (RMSE), the mean absolute percentage error (MAPE), the ratio of performance to deviation (RPD), and the ratio of performance to interquartile range (RPIQ). These metrics were calculated using the following equations:

$${R}^{2}={\left(\frac{\sum \left( {Y}_{i}-\overline{{Y }_{i}}\right)\left( {Y}_{i}^{*}-\overline{{Y }_{i}^{*}}\right)}{\sqrt{\sum {\left( {Y}_{i}-\overline{{Y }_{i}}\right)}^{2} \sum {\left( {Y}_{i}^{*}-\overline{{Y }_{i}^{*}}\right)}^{2}}} \right)}^{2}$$
(2)

where \({Y}_{i}\) is the actual value of the dependent variable, \(\overline{{Y }_{i}}\) is the average of the actual values of the dependent variable, \({Y}_{i}^{*}\) is the predicted value of the dependent variable, and \(\overline{{Y }_{i}^{*}}\) is the average of the predicted values of the dependent variable.

$$MAE=\frac{\sum \left|{Y}_{i}-{Y}_{i}^{*}\right|}{n}$$
(3)

where n is the number of observations.

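$$RMSE=\sqrt{\frac{\sum {\left({Y}_{i}-{Y}_{i}^{*}\right)}^{2}}{n}}$$
(4)

$$MAPE=\frac{\sum \left|\frac{{Y}_{i}-{Y}_{i}^{*}}{{Y}_{i}}\right|}{n}$$
(5)

$$RPD=\frac{SD}{RMSE}$$
(6)

$$RPIQ=\frac{IQ}{RMSE}=\frac{{Q}_{3}-{Q}_{1}}{RMSE}$$
(7)

where \(SD\) is the standard deviation of the actual values, \({Q}_{1}\) and \({Q}_{3}\) are the first and third quartiles of the actual values, and \(IQ\) is their interquartile range.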

The MAE determines the magnitude of the mean errors between the predicted and actual values, regardless of the direction of the errors. In the case of large error values, the model’s prediction performance may not be accurately reflected. Unlike the MAE, which gives equal weight to all errors, the RMSE accounts for variance by giving greater weight to larger error values. The MAPE helps evaluate the performance of predictive models since it is independent of the magnitudes of the observed and predicted values. Lewis (1982) categorises a model’s predictive capacity based on its MAPE values as highly accurate forecasting (< 10%), good forecasting (10 to 20%), reasonable forecasting (20 to 50%), or inaccurate forecasting (> 50%). The RPD is another metric used to assess prediction performance; higher values indicate the model’s robustness. If the RPD value is less than 1.5, the model lacks the ability to predict; if it is between 1.5 and 2, the model can distinguish between high and low values; if it is between 2 and 2.5, the model can be used for quantitative predictions; if it is between 2.5 and 3, the model has good estimation ability; and if it is greater than 3, the model has excellent predictive ability (Saeys et al. 2005). Ludwig et al. (2017) adapted the classification developed by Saeys et al. (2005) for RPD values to categorise the prediction performance of a model based on RPIQ values. According to Ludwig et al. (2017), if the RPIQ value is between 2.02 and 2.70, the model can distinguish between high and low values; if it is between 2.70 and 3.37, the model provides approximate quantitative predictions; if it is between 3.37 and 4.05, the model has good estimation ability; and if it is greater than 4.05, the model has excellent predictive power.
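The six metrics in Eqs. 2 to 7 can be computed along the following lines. This is a sketch under stated assumptions rather than the exact calculation used in the study: the MAPE is returned as a percentage so that it can be compared with the Lewis (1982) classes, the sample standard deviation is used for SD, and the quartiles follow the default convention of Python's `statistics.quantiles`, which may differ slightly from that of other software.

```python
import math
import statistics


def performance_metrics(actual, predicted):
    """Return R2, MAE, RMSE, MAPE (%), RPD and RPIQ for paired actual/predicted values."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # Percentage form of Eq. 5; undefined if any actual value is zero.
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n

    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov ** 2 / (var_a * var_p)  # squared Pearson correlation, Eq. 2

    sd = statistics.stdev(actual)                 # sample standard deviation (assumption)
    q1, _, q3 = statistics.quantiles(actual, n=4) # quartile convention is an assumption
    return {
        "R2": r2,
        "MAE": mae,
        "RMSE": rmse,
        "MAPE": mape,
        "RPD": sd / rmse,
        "RPIQ": (q3 - q1) / rmse,
    }
```

Because RPD and RPIQ both divide a spread measure of the actual values by the RMSE, they rank models on a given dataset in the same order as the RMSE; their value lies in the published threshold classes quoted above.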

The Wilcoxon signed-rank sum test was employed in addition to the performance indicators mentioned above to evaluate the statistical significance of the differences between the actual and predicted values. The Wilcoxon signed-rank test is a non-parametric test used to determine whether two populations differ statistically based on rank. The following equation can be used to calculate the Wilcoxon z-score:

$$z=\frac{\mathrm{max}\left({R}^{-},{R}^{+}\right) - \frac{n\left(n+1\right)}{4}}{\sqrt{\frac{n\left(n+1\right)\left(2n+1\right)}{24}}}$$
(8)

where \({R}^{+}\) and \({R}^{-}\) are the sums of the ranks of the positive and negative paired differences (\({Y}_{i}-{Y}_{i}^{*}\)), respectively, and \(n\) is the number of paired differences.

A two-tailed test is applied to decide if the difference between the actual and predicted values is statistically significant. Since the critical z value at the 95% confidence level (a 5% significance level) in a two-tailed test is 1.96, the null hypothesis of no difference is rejected if the absolute value of the calculated z value is greater than 1.96 or the p-value is less than the significance level. Conversely, it is assumed that there is no significant difference between the actual and predicted values if the absolute value of the z-score is less than 1.96 or the p-value is greater than 0.05 (Uzundurukan 2023).
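The z-score of Eq. 8 can be evaluated as in the sketch below. It follows the equation as written, using the normal approximation without tie or continuity corrections, so the result may differ slightly from that of statistical packages that apply such corrections. Dropping zero differences and assigning average ranks to ties are conventions assumed here, and the function name is hypothetical.

```python
import math


def wilcoxon_z(actual, predicted):
    """Return (z, R_plus, R_minus) for paired samples using the normal approximation."""
    diffs = [a - p for a, p in zip(actual, predicted) if a != p]  # zero differences dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks start at 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (max(r_plus, r_minus) - mean) / sd, r_plus, r_minus
```

The predictions are then regarded as not significantly different from the measurements when the absolute value of the returned z is below 1.96.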

2.4 Model used to describe the residual saturation zone of SWCC

In the current study, the residual saturation zone of SWCC was defined using the linear relationship expressed in Eq. 9 between total suction (\(\psi\)) values (in pF) and gravimetric water contents (\(w\)).

$$w=-a\cdot \psi +b$$
(9)

where \(a\) and \(b\) are the equation coefficients of the relationship.

The equation coefficients of the relationships between total suction (in pF) and gravimetric water content (%) have specific values for each soil included in this study. The assessment of equation coefficients for all soils demonstrates a linear proportional relationship between coefficients a and b. Thus, Eq. 9 can be rewritten as follows:

$$w=a \cdot \left(^{b}/_{a}- \psi \right)$$
(10)

where \(a\) represents the slope of SWCC in the residual saturation zone (\({s}_{r}\)) and \(^{b}/_{a}\) represents total suction at zero water content (\({\psi }_{dry}\)) of a particular soil. As a result, the expression in Eq. 9 can be transformed into the following form, which gives the gravimetric water content (\({w}_{i}\)) corresponding to a specific suction value (\({\psi }_{i}\)) within the boundaries of the residual saturation zone:

$${w}_{i}={s}_{r} \cdot \left({\psi }_{dry}- {\psi }_{i}\right)$$
(11)
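The step from Eq. 9 to Eq. 11 can be illustrated with a short sketch: a least-squares line is fitted to suction and water content pairs, and the coefficients are converted to \({s}_{r}=a\) and \({\psi }_{dry}=b/a\). The function names are assumptions, and any data passed to them in an example are hypothetical rather than measurements from the study.

```python
def fit_residual_zone(psi_pf, w_percent):
    """Fit w = -a * psi + b (Eq. 9) by least squares and return (s_r, psi_dry) = (a, b / a)."""
    n = len(psi_pf)
    mean_psi = sum(psi_pf) / n
    mean_w = sum(w_percent) / n
    slope = sum((p - mean_psi) * (w - mean_w) for p, w in zip(psi_pf, w_percent)) / sum(
        (p - mean_psi) ** 2 for p in psi_pf
    )
    a = -slope                # Eq. 9 writes the slope as -a
    b = mean_w + a * mean_psi # intercept of the fitted line
    return a, b / a


def water_content(psi_pf, s_r, psi_dry):
    """Gravimetric water content (%) at suction psi_pf (pF) from Eq. 11."""
    return s_r * (psi_dry - psi_pf)
```

For a soil whose points lie exactly on a line with \({s}_{r}=4\) and \({\psi }_{dry}=6.9\) pF, `fit_residual_zone` recovers these two values and `water_content` returns zero at 6.9 pF, as Eq. 11 requires.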

3 Results and discussions

3.1 Obtaining the model parameters for the residual saturation zone of the SWCC

Total suction and gravimetric water content datasets with suction values larger than 10 MPa (> 5 pF) were used to develop a model defining the residual saturation zone of SWCC. The linear relationship expressed by Eq. 9 was obtained between the total suction (in pF) and the gravimetric water content (%) for each of the soils examined. It was observed that the equation coefficients of this relationship had specific values for each soil. Figure 2 provides an example of the relationship obtained for the S24 sample.
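The correspondence between 10 MPa and roughly 5 pF can be checked with the sketch below. It assumes the conventional definition of pF as the base-10 logarithm of the suction head expressed in centimetres of water; the conversion constant and function names are introduced here for illustration.

```python
import math

# 1 kPa expressed as cm of water head, assuming rho = 1000 kg/m^3 and g = 9.80665 m/s^2
CM_WATER_PER_KPA = 10.19716


def kpa_to_pf(suction_kpa):
    """Convert suction in kPa to pF (log10 of the head in cm of water)."""
    if suction_kpa <= 0:
        raise ValueError("suction must be positive")
    return math.log10(suction_kpa * CM_WATER_PER_KPA)


def pf_to_kpa(pf):
    """Convert suction in pF back to kPa."""
    return 10.0 ** pf / CM_WATER_PER_KPA
```

Under this definition, 10 MPa (10 000 kPa) and 1 GPa (1 000 000 kPa) fall just above 5 pF and 7 pF, respectively.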

Fig. 2 Gravimetric water content (%)–total suction (in pF) relationship (for S24 soil sample)

The coefficients a and b, both of which have a specific value for each soil, were observed to have a linear proportional relationship (Fig. 3).

Fig. 3 Relationship between the a and b coefficients for all soil samples investigated in the study

Figure 3 demonstrates that the ratio b/a produces an approximate value applicable to all soils evaluated in the study. This ratio represents the suction at zero water content (\({\psi }_{dry}\)), while the coefficient a indicates the slope of the soil–water characteristic curve (\({s}_{r}\)) in the residual saturation zone. Consequently, Eq. 11 was derived by making appropriate modifications to Eq. 9, resulting in a model that properly reflects the behaviour of the SWCC within the residual saturation zone. It must be noted that Eq. 11 presents another representation of the semi-log-linear function initially developed by Campbell and Shiozawa (1992) to describe the relationship between water content and suction in soil.

Table 2 provides descriptive statistics for the \({s}_{r}\) and \({\psi }_{dry}\) values obtained from the 40 soils examined in this study.

Table 2 Descriptive statistics for \({s}_{r}\) and \({\psi }_{dry}\) values for the investigated soils

The \({s}_{r}\) values of the investigated soil samples were between 1.589 and 13.035. The values corresponding to the dimensionless slope \(SL\) (\(^{1}/_{{s}_{r}}\cdot 100\)) of the Campbell–Shiozawa model can be calculated as 62.93 and 7.67, respectively. Resurreccion et al. (2011) reported \(SL\) values between 19.2 and 276.92 for 41 illite-predominant clay samples. Chen et al. (2014) determined that the \(SL\) values of 24 clayey soils ranged from 7.6 to 135.6. Pittaki-Chrysodonta et al. (2019) investigated 144 different soils with \(SL\) values ranging between 16.71 and 165.54.
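The conversion between \({s}_{r}\) and the dimensionless slope \(SL\) quoted above is a one-line calculation, sketched here so that the reported range can be reproduced. The function name is hypothetical.

```python
def campbell_shiozawa_slope(s_r):
    """Dimensionless slope SL = 100 / s_r, following the relation stated in the text."""
    if s_r <= 0:
        raise ValueError("s_r must be positive")
    return 100.0 / s_r


# The lowest and highest s_r values reported for the 40 soils
sl_for_min_s_r = campbell_shiozawa_slope(1.589)
sl_for_max_s_r = campbell_shiozawa_slope(13.035)
```

Because \(SL\) is inversely proportional to \({s}_{r}\), the smallest \({s}_{r}\) gives the largest \(SL\) and vice versa.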

The \({\psi }_{dry}\) values obtained for the soils examined in this study ranged from 6.483 to 7.370 pF, with an average of 6.855 pF. Although previous research has assumed that the suction at zero water content can reach up to 1 GPa (or 7 pF), the exact limits have not been defined. Several studies have considered a value ranging from 6.7 to 7.1 for \({\psi }_{dry}\) (Ross et al. 1991; Rossi and Nimmo 1994; Fayer and Simmons 1995; Webb 2000; Groenevelt and Grant 2004; Schneider and Goss 2012; Lu and Khorshidi 2015). Karup et al. (2017) revealed that the matric potential at zero water content varies between 6.65 and 7.1, even in soils with similar mineralogy, in their study of 171 undisturbed soil samples.

Consequently, the \({s}_{r}\) and \({\psi }_{dry}\) values obtained from the semi-logarithmic linear function for the soils examined in this study are consistent with data previously reported in the literature.

3.2 Correlations between the model parameters and the soil properties

Spearman’s rho analyses were performed to determine whether the \({s}_{r}\) and \({\psi }_{dry}\) values are related to the soil properties selected for the study, and the statistical significance of the correlations was determined. The significance levels of the correlations and their corresponding correlation coefficients are shown in Table 3.

Table 3 Results of Spearman’s Rho correlation analysis

Based on WP4-T measurements on 41 Danish soils, Resurreccion et al. (2011) found strong correlations between the slope \(SL\) of the Campbell and Shiozawa (1992) model and the clay content and specific surface area. Arthur et al. (2013) and Schneider and Goss (2012) also observed significant relationships between \(1/SL\) and clay content. However, this study observed no significant relationship between the slope values and the clay content. According to Spearman’s rho analyses, there are significant correlations (p < 0.01) between \({s}_{r}\) and the liquid limit, the plastic limit, the shrinkage ratio, the activity, the free swell index, and the SWIR spectral parameters D1400 and D1900. According to Pittaki-Chrysodonta et al. (2019), higher absolute values of \(SL\) indicate sandier soils, while lower absolute values indicate clayey soils. However, Table 3 suggests that it is more meaningful to base this judgement on consistency limits rather than clay content, because consistency limits are controlled by the mineral composition, particle size distribution, and pore fluid chemistry, which govern the physicochemical forces that exist between soil grains and soil water (Zhou and Lu 2021). Therefore, even if it contains fewer clay particles, soil with higher plasticity and activity may have larger suction under the same humidity conditions than one with lower plasticity (Kocaman et al. 2022).

Several studies have been conducted to determine the relationship between spectral absorption features and the physical and chemical properties of soils (Escadafal 1993; Kariuki et al. 2003; Kariuki et al. 2004; Viscarra Rossel et al. 2006; Ben-Dor et al. 2009; Mulder et al. 2013; Dufréchou et al. 2015; Zhou et al. 2022; Taghdis et al. 2022). In most of these studies, geometric parameters, such as band position, band depth, band width, band area, and asymmetry, have been used to express spectral absorption properties (van der Meer 1999; Byun et al. 2023). Some studies have explored the relationships between water retention characteristics and spectral soil properties (Janik et al. 2007; Santra et al. 2009; Babaeian et al. 2015; Pittaki-Chrysodonta et al. 2021), but studies on the residual saturation zone remain limited (Pittaki-Chrysodonta et al. 2019; Norouzi et al. 2022). The spectral parameters of soil are strongly affected by clay mineralogy, and the presence of kaolinite and smectite significantly affects the intensity of the 1900 nm and 2200 nm features (Kariuki et al. 2003, 2004). Santra et al. (2009) discovered a strong correlation between basic soil and hydraulic properties and spectral reflectance in the 2000–2300 nm wavelength bands. Babaeian et al. (2015) stated that wavelength bands in the NIR-SWIR (700–2500 nm) region had predictive capacity in their study, which aimed to estimate Mualem–van Genuchten model parameters using spectral reflectance data. Pittaki-Chrysodonta et al. (2021) indicated that visible near-infrared spectroscopy can effectively estimate the soil water retention curve. Consequently, the strong relationship between \({s}_{r}\) and the SWIR spectral parameters D1400 and D1900 observed in this study is consistent with the literature.

Karup et al. (2017) stated that they could not find a distinct relationship between the matric potential at zero water content and clay mineralogy, organic matter, or clay content. In this study, the relationships between \({\psi }_{dry}\) and the selected soil properties are not as strong as those observed for \({s}_{r}\). The total suction at zero water content was observed to have weak correlations with the free swell index, particle fractions, surface area activity, liquid limit, and electrical conductivity.

3.3 Regression analysis between model parameters and soil properties

Following the correlation analysis between the selected soil properties and the \({s}_{r}\) and \({\psi }_{dry}\) values, the goal was to develop regression models for the \({s}_{r}\) and \({\psi }_{dry}\) values. Multiple linear regression analyses were performed in IBM SPSS Statistics v25 using the stepwise regression method. Table 4 presents the linear regression equations obtained between \({s}_{r}\) and the selected soil properties. Table 5 presents the regression equations for \({\psi }_{dry}\).

Table 4 The regression equations for \({s}_{r}\)
Table 5 The regression equations for \({\psi }_{dry}\)

Assessing Tables 4 and 5, it is evident that the coefficients of determination of the regression equations for \({s}_{r}\) values are acceptable, whereas those for \({\psi }_{dry}\) are relatively low. A four-fold cross-validation procedure was used to validate the regression models. The dataset was randomly divided into four equal parts, and the model training and testing procedure was repeated four times, with each of the four parts serving as a test set. The results of all four processes were combined to form a single validation set, and the predictive model’s performance was assessed using validation set performance measures. Figure 4 compares the performance metrics obtained for each fold, the combined validation set, and the regression set for \({s}_{r}\). The predictive performance comparison for \({\psi }_{dry}\) is given in Fig. 5.

Fig. 4 Variations of the performance metrics according to datasets and regression equations for \({s}_{r}\)

Fig. 5 Variations of the performance metrics according to datasets and regression equations for \({\psi }_{dry}\)

Figure 4 shows that, except for E1, all other regression equations can be used to accurately predict \({s}_{r}\) for the soils examined. The calculated performance criteria, on the other hand, show that E1’s predictive ability is acceptable.

The values of the performance criteria of the regression equations used for the \({\psi }_{dry}\) predictions are, as expected, lower than those obtained for \({s}_{r}\). Based on the calculated performance criteria, it can be concluded that the regression equations obtained for \({\psi }_{dry}\) cannot provide reliable predictions. The Wilcoxon signed-rank sum test was performed to determine whether the regression equations produced statistically significant results by comparing the predicted values of \({s}_{r}\) and \({\psi }_{dry}\) with their corresponding actual values. Table 6 summarises the results.

Table 6 Wilcoxon signed rank sum test result for actual values vs. predicted values

The Wilcoxon signed-rank sum test results revealed that the E5 equation used for \({\psi }_{dry}\) estimation did not produce statistically significant results. Consequently, this equation was not used as a part of the prediction models in the next part of the study.

Combinations of \({s}_{r}\) and \({\psi }_{dry}\) values predicted by different regression equations were substituted into Eq. 11, and the actual water content values of the 162 measured \({w}_{i}-{\psi }_{i}\) data pairs were compared to the predicted values. Table 7 shows the prediction performance for all predictive model combinations. Table 7 also shows the prediction performances for cases where the average value obtained for the soils studied (6.855 pF) was used instead of the suction value at zero water content. The most accurate predictions were provided by the P13 model, which employs the E4 equation for \({s}_{r}\) estimations and the E6 equation for \({\psi }_{dry}\) estimations. The P4 model, which uses the E1 equation for \({s}_{r}\) estimates and the average value of 6.855 pF for \({\psi }_{dry}\), had the worst predictive accuracy. Figure 6 compares the water content values calculated from the models with the best and worst predictive performances with the measured water content values.

Table 7 Prediction performances of different prediction models
Fig. 6 Predicted and measured gravimetric water contents for models with the best and worst prediction performance: (a) \({s}_{r}\) estimated from E4 and \({\psi }_{dry}\) estimated from E6; (b) \({s}_{r}\) estimated from E1 and the average value used for \({\psi }_{dry}\)

Figure 7 depicts a visual comparison of the performances of the prediction models for the threshold values given for the performance criteria in the literature.

Fig. 7 Comparison of prediction performances of the models

As shown in Table 7 and Fig. 7, the regression equation selected to predict the slope of the residual saturation zone of the SWCC has the greatest influence on the prediction performance of different models. Models employing regression equations other than the E1 equation, which employs only the liquid limit as an explanatory variable to predict \({s}_{r}\), provide acceptable estimates. Models that use the E3 and E4 regression equations to predict \({s}_{r}\) have the best predictive performance. Using the E2 regression equation to predict \({s}_{r}\) also yields good predictions. The regression equation used to predict suction at zero water content has a limited effect on the prediction performance of the models. Even in models employing the average value of 6.855 pF obtained for the soils examined in this study without any estimation equations for \({\psi }_{dry}\), acceptable estimations are achieved. For the P8 model, for example, knowing the liquid limit and the D1900 values, which are inexpensive and easy to determine, is sufficient. The performance metrics demonstrate that the P8 model makes good predictions for the soils under consideration in the study.

3.4 Limitations of the study

The study’s results were obtained from the analysis of experimental data from 40 cohesive soils with plasticity index values between 9.5 and 70.1 and fine content between 46.8 and 97.2%. The suction–water content measurements in the high suction range (> 10 MPa) were considered in line with the study’s objectives. Consequently, the models proposed in this study can only be used to predict the residual saturation zone of the SWCC in cohesive soils.

4 Conclusion

Models used to predict the SWCC generally depend on suction measurements within the transition zone; however, the validity of these models for soils under low-moisture conditions, typical of arid and semiarid regions, is uncertain. Thus, models are needed that can accurately estimate the residual saturation zone of the SWCC, especially in cohesive soils.

In this study, a semi-logarithmic linear model was used to simulate the residual saturation zone of the SWCC. The model has two parameters: the suction at zero water content (\({\psi }_{dry}\)) and the slope of the SWCC in the residual saturation zone (\({s}_{r}\)). Using an experimental dataset for 40 cohesive soil samples, correlations between model parameters and the soils’ physical, chemical, and mineralogical properties were examined. The \({s}_{r}\) values were observed to have strong correlations with the consistency limits, activity, and some spectral properties that reflect the quantity and mineralogical composition of the clay minerals in the soils. However, the correlations between \({\psi }_{dry}\) and soil properties in this study’s dataset are not very strong, consistent with previous studies. Using the physical, chemical, and spectral properties of soils as explanatory variables, stepwise regression analyses were used to obtain empirical equations that could be used to predict the model parameters. Sixteen estimation models for predicting the residual saturation zone of the SWCC in cohesive soils were developed by inserting combinations of empirical equations based on regression analyses for \({s}_{r}\) and \({\psi }_{dry}\) into the two-parameter semi-logarithmic linear model. The results show that using the semi-logarithmic linear model to describe the SWCC’s residual saturation zone provides satisfactory estimations. The fact that the prediction equations include physicochemical soil properties, such as the liquid limit and specific surface charge, indicates that parameters related to the amount and type of clay minerals should be included for good predictions of the models that define the dry region of the SWCC in cohesive soils. Furthermore, the results demonstrated that spectral absorption properties in the SWIR region have the potential to predict the residual saturation zone of the SWCC in cohesive soils.