1 Introduction

Black cotton soil is largely an inorganic clay with a high fraction of fine-grained particles, high plasticity, high compressibility and expansive characteristics [1]. Such soils are confined to temperate and semi-arid climatic zones and are well suited to cotton cultivation [2]. Their mineralogy is dominated by the clay mineral montmorillonite, and they are abundant in zones where annual precipitation is less than evaporation [3]. A number of studies have reported that this soil, like other deficient soils, may be unsuitable for engineering works because its soft sediments undergo large volume changes when the moisture content changes significantly [4,5,6,7,8,9,10,11,12,13,14,15].

The production of Portland cement, an essential ingredient of construction works, generates large quantities of a by-product, cement kiln dust (CKD), which is not suitable for re-use in the cement production process and is disposed of in millions of tons annually [16]. Researchers have shown great interest in finding applications for CKD in order to curtail the high cost of disposing of this industrial waste and the related environmental degradation [17,18,19]. CKD has been applied in various fields: as a stabilizer for waste water streams, as an anti-stripping agent in asphalt, as a soil fertilizer and in masonry products. In the cement manufacturing process, the kiln moves the raw materials gradually from its upper to its lower end, at a rate controlled by the rotational speed and slope of the kiln. At the hotter end of the kiln, chlorine, potassium, sodium and other elements present in the raw materials are wholly or partially volatilized and are not allowed into the clinker [20, 21]. A rapid flow of combustion air moves against the flow of raw materials, and the turbulent agitation of the raw feed together with the swift gas flow causes a large volume of particulate matter to be entrained in the combustion gases. The gas flow picks up the volatilized and partially burned raw materials, and the entrained particulates constitute the CKD, which is removed from the kiln exhaust gases and collected by pollution control equipment [22]. Producing low-alkali clinker from high-alkali raw materials while maintaining consistent, compliant operation generates larger quantities of CKD as industrial waste. In addition, high concentrations of volatiles deposit on the kiln walls and cause frequent plant shutdowns.
CKD is therefore withdrawn from the kiln systems of cement plants in order to purge volatile alkalis, sulphates and chlorides. Cement kiln dust is a highly alkaline, fine-grained by-product collected by the air pollution control devices on the kiln exhaust gas. Its chemical and physical characteristics may vary from one cement plant to another, depending on the raw materials used and the collection method in the plant [23, 24].

There is a need to improve the engineering properties of expansive soils used as pavement subgrade material, and using industrial waste for this purpose encourages its recycling and re-use. Several researchers have employed advanced mixture design methods to clarify the relationships (correlations) between the factor levels under investigation and their corresponding responses [25,26,27]. Salahudeen et al. [21] assessed the stabilization of CKD-blended black cotton soil using an artificial neural network (ANN), a multilayer perceptron trained with the back-propagation algorithm. Ten experimentally obtained input variables described the general engineering behaviour of the soil blend in terms of effective grain size, gradation coefficients, swell-shrinkage and specific gravity. There were two output variables, maximum dry density (MDD) and optimum moisture content (OMC). The model was validated using the loss function parameters P value and MSE, and the simulated network performance was satisfactory, with P values of 0.9884 and 0.983 for MDD and OMC, respectively. Olubanwo et al. [28] applied optimization techniques within the mixture experiment concept to design and proportion the cementitious portion of a bonded roller compacted fibre reinforced polymer modified concrete (BRCFRPMC). By constraining the variability range of the constituent paste, they generated a feasible design space of 13 experimental runs and derived the optimum consistency times for the composite and for consolidation with the substrate OPC concrete, at 34.10 and 34.90 s. The apparent maximum density was obtained in the range of 97.10 to 98.0% of the theoretical air-free density.

In line with the foregoing, expansive soils are blended with stabilizing agents to enhance their performance in construction works, using several mixture experimental design techniques [29, 30]. A mixture experiment is a special case of the response surface method in which the factors are the proportions of the components; because the proportions must sum to one, the components compete with one another, so increasing one necessarily reduces the others. For modelling purposes, all the responses of interest are first measured experimentally for each mixture combination in the design space, and the responses are then modelled as polynomial functions of the mixture components [31]. Because simplex designs for mixture experiments place only lower bounds on the factor levels, there are situations in which more complex constraints are appropriate or required. The extreme vertices design (EVD) is useful in this case, as it is flexible enough to impose additional constraints on the factor levels by specifying both upper and lower bounds on the components through linear constraints on the blends [32]. An EVD occupies a smaller sub-region within the simplex and is important when the chosen factor space is not an L-simplex. This situation arises when inter-dependencies between the mixture components require both lower and upper bounds to be set on the ingredients [33].

In the design of experiments, optimal designs are designs that are optimal with respect to a specified statistical criterion. They allow the model parameters to be estimated without bias and with minimum variance. A non-optimal design would require more experimental runs to estimate the parameters with the same precision, which increases the cost of experimentation. Design optimality depends on the statistical model and is evaluated with respect to a prescribed criterion associated with the variance matrix of the estimator. When several parameters are estimated in a statistical model, the inverse of their variance matrix is termed the information matrix, and each criterion compresses this matrix into a real-valued summary statistic. D-optimality, a popular criterion, maximizes the determinant of the information matrix \(\left| X^{\prime}X \right|\), or equivalently minimizes the determinant \(\left| \left( X^{\prime}X \right)^{-1} \right|\). I-optimality, in contrast, is associated with the variance of predictions: it minimizes the average prediction variance over the design space.
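To make the two criteria concrete, the sketch below compares two hypothetical four-run designs for a straight-line model on [-1, 1], not the mixture model used later. It computes the D-criterion \(|X'X|\) and approximates the I-criterion as the average of \(f(x)'(X'X)^{-1}f(x)\) over an evenly spaced grid. The designs, the model and the grid are illustrative assumptions.

```python
import numpy as np

def model_matrix(x):
    """Columns [1, x] for a straight-line model y = b0 + b1*x."""
    x = np.asarray(x, dtype=float)
    return np.column_stack([np.ones_like(x), x])

def d_criterion(design):
    """D-criterion: determinant of the information matrix X'X (larger is better)."""
    X = model_matrix(design)
    return np.linalg.det(X.T @ X)

def i_criterion(design, grid=None):
    """I-criterion: average scaled prediction variance f(x)'(X'X)^-1 f(x)
    over the region, approximated on a uniform grid (smaller is better)."""
    if grid is None:
        grid = np.linspace(-1.0, 1.0, 2001)
    X = model_matrix(design)
    m_inv = np.linalg.inv(X.T @ X)
    F = model_matrix(grid)
    # Row-wise quadratic form f_i' M^-1 f_i for every grid point
    variances = np.einsum("ij,jk,ik->i", F, m_inv, F)
    return variances.mean()

spread = [-1.0, -1.0, 1.0, 1.0]     # runs at the edges of the region
clustered = [-0.5, -0.5, 0.5, 0.5]  # runs bunched near the centre

for name, design in [("spread", spread), ("clustered", clustered)]:
    print(f"{name:9s}  |X'X| = {d_criterion(design):6.2f}   I = {i_criterion(design):.4f}")
```

The spread design has the larger determinant (16 versus 4) and the smaller average prediction variance, so it is preferred under both criteria here; with constrained mixture regions the two criteria can disagree, which is why the choice of criterion matters.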

The objective of EVD is to select design points that adequately cover the design space. Imposing additional lower and upper bounds on the mixture components confines the design points to a smaller portion of the simplex, termed the constrained region [34]. The constrained mixture problem with single component constraints (SCC), where q is the total number of ingredients, takes the general form given in Eqs. 1 and 2. When SCCs are imposed on the component proportions, the factor space takes the shape of an irregular polyhedron within the simplex [35]. The motivation of this study is that, despite the numerous applications of CKD in improving deficient soils, to the best of the authors' knowledge there have been few attempts to optimize additive proportions for soil treatment by means of the EVD strategy.

$$X_{1} + X_{2} + ... + X_{q} - 1 = 0$$
(1)
$$L_{j} \le X_{j} \le U_{j}, \quad j = 1,2,\ldots,q$$
(2)

where \(L_{j} \ge 0\) is the lower bound and \(U_{j} \le 1\) is the upper bound of component j.

When only lower bounds are imposed, the constrained region of the experimental design space is still a simplex, so it is possible to define a new set of components whose values range from 0 to 1. This makes design construction and model fitting easier within the constrained region of interest. These new components, \(X_{j}^{*}\), are termed pseudo components and are defined by Eq. 3. Pseudo components are used for mixture model fitting because the original components are often highly multicollinear in a constrained design space, and they are also used in computer-aided designs such as D-optimal and I-optimal mixture designs [36, 37].

$$X_{j}^{*} = \frac{X_{j} - L_{j}}{1 - L}, \quad \text{where } L = \sum_{j = 1}^{q} L_{j} < 1 \text{ is the sum of the lower bounds.}$$
(3)

When constructing a design in pseudo components, we specify the design points in \(X_{j}^{*}\) terms and then convert to the corresponding original component setting using the formula in Eq. 4.

$$X_{j} = L_{j} + (1 - L)X_{j}^{*}$$
(4)
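Eqs. 3 and 4 are a simple affine map and its inverse. The sketch below applies them to three components; the lower bounds are hypothetical placeholders, not the values in Table 1.

```python
import numpy as np

def to_pseudo(x, lower):
    """Actual proportions -> L-pseudo components (Eq. 3)."""
    x = np.asarray(x, dtype=float)
    lower = np.asarray(lower, dtype=float)
    L = lower.sum()
    if L >= 1.0:
        raise ValueError("the lower bounds must sum to less than 1")
    return (x - lower) / (1.0 - L)

def from_pseudo(x_star, lower):
    """L-pseudo components -> actual proportions (Eq. 4)."""
    x_star = np.asarray(x_star, dtype=float)
    lower = np.asarray(lower, dtype=float)
    return lower + (1.0 - lower.sum()) * x_star

# Hypothetical lower bounds for (soil, CKD, water); NOT the values in Table 1.
lower = [0.40, 0.30, 0.05]

centroid_star = np.full(3, 1.0 / 3.0)   # centroid of the pseudo-component simplex
actual = from_pseudo(centroid_star, lower)
print(actual, actual.sum())             # actual proportions still sum to 1
print(to_pseudo(actual, lower))         # recovers the pseudo components
```

A pseudo-component vertex such as (1, 0, 0) maps to the blend with the first component at its largest feasible share and the others at their lower bounds.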

For a q-component mixture with upper-bound constraints \(0 \le X_{j} \le U_{j}\), the U-pseudo components are defined mathematically in Eq. 5.

$$U\text{-pseudo component}_{j} = \frac{U_{j} - X_{j}}{U - 1}, \quad j = 1,2,\ldots,q, \quad \text{where } U = \sum_{j = 1}^{q} U_{j} > 1$$
(5)

The upper-bound constraints cause the feasible experimental region to lie entirely inside the original simplex, in inverted form, only if the condition in Eq. 6 holds.

$$\sum_{j = 1}^{q} U_{j} - U_{\min} \le 1$$
(6)

Here \(U_{\min}\) is the smallest of the q upper bounds, and the experimental region in this case is termed a U-simplex.
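The U-simplex test of Eq. 6 and the U-pseudo components of Eq. 5 can be checked numerically. In the sketch below the upper bounds and the blend are hypothetical and serve only to illustrate the formulas.

```python
import numpy as np

def is_u_simplex(upper):
    """Eq. 6: the upper-bounded region is an inverted simplex (U-simplex)
    lying inside the original simplex if sum(U) - min(U) <= 1."""
    upper = np.asarray(upper, dtype=float)
    return upper.sum() - upper.min() <= 1.0

def to_u_pseudo(x, upper):
    """Actual proportions -> U-pseudo components (Eq. 5)."""
    x = np.asarray(x, dtype=float)
    upper = np.asarray(upper, dtype=float)
    U = upper.sum()
    if U <= 1.0:
        raise ValueError("the upper bounds must sum to more than 1")
    return (upper - x) / (U - 1.0)

# Hypothetical upper bounds, chosen only to illustrate Eq. 6.
print(is_u_simplex([0.5, 0.4, 0.3]))   # 1.2 - 0.3 = 0.9, within the limit of 1
print(is_u_simplex([0.6, 0.5, 0.2]))   # 1.3 - 0.2 = 1.1, exceeds the limit of 1

u = to_u_pseudo([0.4, 0.35, 0.25], [0.5, 0.4, 0.3])
print(u, u.sum())                       # U-pseudo components also sum to 1
```

Because \(\sum_j (U_j - X_j) = U - 1\) whenever the \(X_j\) sum to one, the U-pseudo components always sum to one, which is what lets the inverted region be treated as a simplex.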

In this study, a mixture experiment was designed and optimized using the EVD method to optimize the use of cement kiln dust for stabilizing an expansive clay (black cotton soil) with respect to its mechanical characteristics. The optimal ingredient contents were then determined from multiple optimization criteria using the desirability function, and the EVD method was used to simplify the complex behaviour of the soil-additive blends. This work provides insight into the application of the constrained simplex method of experimental design to evaluating the engineering properties of soil-additive blends [38].

2 Materials and Methods

2.1 Test Materials

The soil was obtained by disturbed sampling from a deposit in Deba, Gombe State, Nigeria (latitude 10° 12ʹ 42.73ʹʹ N, longitude 11° 23ʹ 13.56ʹʹ E). On visual inspection, the soil is greyish black in colour. Before use, the greyish black expansive soil was broken into smaller fragments, air dried, pulverised and sieved through BS sieve No. 4 (4.75 mm aperture). The cement kiln dust used for the laboratory work was collected from a “mole hill” of CKD dumped at the discharge unit of Lafarge Cement Company in Calabar. A specific gravity test on the cement kiln dust gave a value of 2.65.

2.2 Test Methods

The tests were performed in accordance with BS 1377 [39] for the natural soil and BS 1924 [40] for the treated soil mixtures. The tests on the natural soil included particle size analysis, Atterberg limits, specific gravity, compaction, California bearing ratio and unconfined compressive strength. This study considers a three-component mixture of soil, CKD and water. Because upper and lower bounds are imposed on the components, the simplex is constrained, and the experimental points are positioned at the edges and vertices of the constrained region. The component ratios and the number of experimental runs were determined using an I-optimal design, and unconfined compressive strength (UCS) and California bearing ratio (CBR) tests were then carried out in the laboratory on the resulting mixtures. The experimental responses and their corresponding factor levels were analysed to model the mechanical behaviour of the soil-CKD blend. Statistical diagnostics and influence tests were carried out to validate the developed model, and numerical and graphical optimization was finally conducted using the desirability function to maximize the responses with respect to the factor levels, yielding the optimal soil-CKD mixture for maximum mechanical strength [41].
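The desirability approach can be sketched as follows, assuming the common Derringer-Suich one-sided form for responses to be maximized (weight 1) and a geometric mean to combine them; the software's exact settings are not stated here. The limits are the observed response ranges reported later (380 to 501 for UCS, 30 to 58 for CBR), and the candidate predictions 480 and 50 are hypothetical.

```python
import math

def desirability_max(y, low, target, weight=1.0):
    """One-sided Derringer-Suich desirability for a response to be maximized."""
    if y <= low:
        return 0.0
    if y >= target:
        return 1.0
    return ((y - low) / (target - low)) ** weight

def overall_desirability(ds):
    """Geometric mean of the individual desirabilities (0 if any is 0)."""
    if any(d == 0.0 for d in ds):
        return 0.0
    return math.exp(sum(math.log(d) for d in ds) / len(ds))

# Limits from the observed response ranges; candidate predictions are hypothetical.
d_ucs = desirability_max(480.0, low=380.0, target=501.0)
d_cbr = desirability_max(50.0, low=30.0, target=58.0)
print(d_ucs, d_cbr, overall_desirability([d_ucs, d_cbr]))
```

Numerical optimization then searches the constrained mixture region for the blend whose predicted responses give the largest overall desirability.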

2.2.1 California Bearing Ratio

The California bearing ratio test is an indicator of soil strength and was performed in accordance with BS 1377 [39] and BS 1924 [40] for the natural and treated soil mixtures, respectively. The soils were compacted at British Standard Light (BSL) compaction energy and tested in both soaked and unsoaked conditions. The specimens were compacted in three layers with a 2.5 kg rammer, each layer receiving 62 blows. The compacted CBR specimens were cured for 6 days, after which they were immersed in water for 48 h. The cured specimens were then loaded statically in the CBR machine until failure [42].

2.2.2 Unconfined Compressive Strength

The unconfined compressive strength (UCS) of the specimens was determined by the method in BS 1377 [39]. Both the natural soil and the CKD-treated soil mixtures were tested. The specimens were compacted at BSL energy and cured for 7 days, after which they were placed in the loading frame of the UCS testing machine [43].

2.3 Components Mixture Design Formulation

This section determines the actual ratio of the ingredients for each experimental run and the total number of runs. These ratios form the basis of the model development used to derive the optimal combination ratio for the soil-additive blend and to improve the engineering properties of the problematic soil [44].

2.3.1 Formulation of Constraints

The mixture components are constrained by upper and lower bounds established from the characteristics of the materials that make up the experimental blend. In most cases, economic, practical, environmental or physical considerations impose these limits. The three-component mixture investigated in this study consists of the problematic soil, CKD and water: the soil is treated with CKD as an admixture at varying moisture contents to enhance its mechanical properties. Based on the relevant literature [45, 46], the component constraints were formulated as single component constraints (SCC), presented in Table 1.

Table 1 Design constraints

2.3.2 Design of Factor Space and Simplex

The upper and lower limits of the SCC imposed on the factor levels cause the factor space to take the shape of an irregular polyhedron within the simplex. The feasible experimental region within the simplex, termed the constrained space, is then obtained by evaluating the SCC [47]. The degrees of freedom were also evaluated from the design matrix for the special cubic mixture model, as shown in Table 2. A minimum of three lack-of-fit degrees of freedom is recommended to ensure a valid lack-of-fit test; with fewer degrees of freedom, the test may fail to detect lack of fit [48].
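The degree-of-freedom bookkeeping can be sketched as below. The term counts follow the standard Scheffé forms (the three-component special cubic has 7 coefficients). The run counts are an assumption for illustration: 12 runs with the replicated edge point run twice, giving 11 unique points; Table 2 should be consulted for the actual split.

```python
from math import comb

def scheffe_terms(q, model):
    """Number of coefficients in a Scheffé mixture polynomial with q components."""
    counts = {
        "linear": q,
        "quadratic": q + comb(q, 2),
        "special cubic": q + comb(q, 2) + comb(q, 3),
    }
    return counts[model]

def degrees_of_freedom(n_runs, n_unique, p):
    """Split the residual df (n_runs - p) into lack-of-fit and pure-error parts."""
    return {
        "model": p - 1,                  # mixture models have no separate intercept
        "residual": n_runs - p,
        "lack_of_fit": n_unique - p,     # at least 3 is recommended
        "pure_error": n_runs - n_unique, # from replicated points
    }

p = scheffe_terms(3, "special cubic")    # 7 terms
print(degrees_of_freedom(n_runs=12, n_unique=11, p=p))
```

Under these assumptions the special cubic model leaves 4 lack-of-fit degrees of freedom, which satisfies the recommended minimum of three.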

Table 2 Design matrix evaluation for mixture special cubic model 3 factors: A, B and C

Power calculations were carried out on the mixture component constraints using Minitab 18 and Design-Expert software to determine the standard errors, variances and standard deviations of the mixture terms, which represent the model coefficients located at the vertices, edges, design planes and centroid of the simplex, at a significance level of α = 0.05, as shown in Table 3.

  • Standard errors should be smaller and similar within coefficient type.

  • The ideal variance inflation factor (VIF) value is 1. VIFs higher than 10 are cause for concern, and VIFs higher than 100 are cause for alarm, signifying poor estimation of the coefficients as a result of multicollinearity.

  • The ideal Ri-squared is 0. A higher Ri-squared indicates correlation between the terms, which could lead to poor models.

  • In mixture designs, the proportions of the ingredients must sum to one.

  • This is a constraint on the system and causes multicollinearity, which inflates the VIFs and the Ri-squared values and renders these statistics useless.

  • Precision-based metrics, such as the fraction of design space (FDS) statistics provided by the software, should be used instead.
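Why the sum-to-one constraint makes VIFs useless can be seen with the textbook definition \(\mathrm{VIF}_j = 1/(1 - R_j^2)\), where \(R_j^2\) comes from regressing column j on the other columns plus an intercept. This generic least-squares sketch is not necessarily how the design software computes VIFs for mixture terms, and both data sets are hypothetical.

```python
import numpy as np

def vif(X):
    """VIF for each column of X: regress it on the other columns plus an
    intercept and return 1 / (1 - R^2); perfect collinearity gives inf."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out.append(np.inf if 1.0 - r2 < 1e-12 else 1.0 / (1.0 - r2))
    return np.array(out)

# Two orthogonal coded factors: the ideal VIF of 1
orthogonal = [[-1, -1], [1, -1], [-1, 1], [1, 1]]
print(vif(orthogonal))

# Hypothetical mixture proportions summing to one: each component is an exact
# linear function of the others, so every VIF is unbounded
mixture = [[0.50, 0.30, 0.20], [0.40, 0.40, 0.20], [0.40, 0.30, 0.30],
           [0.45, 0.30, 0.25], [0.42, 0.36, 0.22]]
print(vif(mixture))
```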

Table 3 Power calculations are performed using response type "Continuous"

Design-Expert also generated the three-component simplex contour plot and factor space shown in Figs. 1 and 2, which show the positions of the experimental points within the feasible design space. The points fall into four space types: interior, edge, centre and vertex. The leverages, build types and space types were calculated from the information matrix. One lack-of-fit point was located in the interior, and replicates were located at one point on an edge of the simplex, as shown in Table 4. The average leverage was 0.5833 [49, 50].
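Leverage is the diagonal of the hat matrix \(H = X(X'X)^{-1}X'\), and its average always equals p/n; with the 7-term special cubic model and 12 runs this is 7/12 ≈ 0.5833, matching the reported value. The 12-point layout below is hypothetical (a simplex-centroid set, three axial check blends and two replicates in pseudo components) and is not the design in Table 6.

```python
import numpy as np

def special_cubic_matrix(points):
    """Model matrix for the three-component Scheffé special cubic model."""
    x = np.asarray(points, dtype=float)
    x1, x2, x3 = x[:, 0], x[:, 1], x[:, 2]
    return np.column_stack([x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x2 * x3])

def leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^-1 X'."""
    return np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))

# Hypothetical 12-run layout in pseudo components (not Table 6).
third, sixth = 1.0 / 3.0, 1.0 / 6.0
points = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1],                  # vertices
    [0.5, 0.5, 0], [0.5, 0, 0.5], [0, 0.5, 0.5],      # edge midpoints
    [third, third, third],                            # centroid
    [2 * third, sixth, sixth], [sixth, 2 * third, sixth], [sixth, sixth, 2 * third],
    [1, 0, 0], [0.5, 0.5, 0],                         # replicates
]
h = leverages(special_cubic_matrix(points))
print(h.round(3), h.mean())   # mean leverage = p / n
```

Individual leverages depend on where the points sit, but their sum is always p, so replicating a point lowers the leverage of each copy.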

Fig. 1
figure 1

Contour space and factor space simplex

Fig. 2
figure 2

Experimental factor space of the components in a three-component mixture space

Table 4 Measures derived from the information matrix

The software computes design statistics for the experimental design, including measures of multicollinearity, G-efficiency, the scaled D-optimality criterion and the I-optimality criterion. I-optimal designs, also known as IV (integrated variance) designs, minimize the average prediction variance across the experimental region. They are preferred in response surface methodology (RSM) when prediction is the main objective, as the algorithm selects the points that minimize the prediction variance across the design space [49, 50].

Condition Number of Coefficient Matrix = 61.376.

  • If this value is 100–1000, there is moderate to strong multicollinearity.

  • Values above 1000 indicate severe multicollinearity.

Maximum Variance Mean = 0.975.

Average Variance Mean = 0.376.

Minimum Variance Mean = 0.205.

G Efficiency = 59.8%

  • G Efficiency is inversely related to maximum variance.

  • Lack of fit runs and replicates tend to reduce the G Efficiency of a design.

Scaled D-optimality Criterion = 82.331.

  • When comparing designs, a smaller value is better.

Determinant of \((X^{\prime}X)^{-1}\) = 7.156E+5.

Trace of \((X^{\prime}X)^{-1}\) = 808.230.

I = 0.37922.

  • These statistics can be used to compare designs with the same number of runs; a smaller value is better.

From these results, the design summary is presented in Table 5.
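Two of the reported statistics can be reproduced from the others under the usual definitions, G-efficiency = 100·p / (n × maximum scaled prediction variance) and scaled D = n·|(X′X)⁻¹|^(1/p). This assumes p = 7 special cubic coefficients and n = 12 runs; the inputs are the values reported above.

```python
# Reported design statistics; p = 7 (special cubic terms) and n = 12 (runs)
# are assumptions consistent with the design described in this section.
p, n = 7, 12
max_variance_mean = 0.975        # maximum scaled prediction variance
det_inv_information = 7.156e5    # determinant of (X'X)^-1

g_efficiency = 100.0 * p / (n * max_variance_mean)   # percent
scaled_d = n * det_inv_information ** (1.0 / p)      # scaled D-optimality
average_leverage = p / n

print(f"G efficiency   = {g_efficiency:.1f}%")
print(f"Scaled D-opt.  = {scaled_d:.3f}")
print(f"Avg. leverage  = {average_leverage:.4f}")
```

These give 59.8%, 82.331 and 0.5833, matching the G efficiency, scaled D-optimality criterion and average leverage reported for the design. This supports reading the design as 12 runs of a 7-term model.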

Table 5 Design summary

2.3.3 Experimental Mixture Proportion Design

The runs were allocated from the information matrix, with the points located at the corners and other positions of the experimental region according to their space type. Twelve (12) experimental runs were generated; the number of runs can be modified to reduce lack-of-fit effects and to screen for errors within the factor space. The mixture components are bounded by the developed constraints to generate the proportions for the required runs. The mix proportion matrix, showing the actual and pseudo component proportions for the mixture experiments, is presented in Table 6. The responses measured in these runs are used to develop the model of the overall behaviour of the soil-additive blend [51].

Table 6 Experimental mix proportions

3 Results, Discussion and Model Validation

3.1 Characterization of Test Materials

The general classification and engineering properties of the test soil are presented in Table 7. The results indicate high plasticity and swelling potential; the soil is also poorly graded and exhibits expansive properties with soft materials. It is classified as A-7-6 (14) by the AASHTO method [52] and as CH by the USCS [53], which indicates a soil unsuitable for engineering works, with a low CBR of 3%, an MDD of 1.64 Mg/m3 and an OMC of 18%. The soil has a specific gravity of 2.40, and the grain size distribution of the natural soil shows that 72% of the particles fall within the silt–clay fraction (Table 8).

Table 7 Basic properties of the test soil
Table 8 Particle size fraction of unaltered soil

The chemical compositions of CKD and the test soil are presented in Table 9. CKD has a high lime content of 66.82%, with silica at 18.82%, alumina 6.34%, iron oxide 2.05% and a very low magnesium oxide content of 0.01%, while the black cotton soil has a high silica content of 48.5%, alumina 18.6%, iron oxide 2.2% and a very low lime content of 0.9%. Given these oxide contents, the abundant lime in CKD is expected, on hydration, to react with the alumina and silica abundant in the problematic soil to produce calcium silicate hydrate. The hydration products of this reaction are expected to enhance the mechanical behaviour and hence improve the problematic black cotton soil [54].

Table 9 Chemical composition of CKD and test soil

3.2 Mechanical Properties of the Treated Soil

For each of the twelve experimental runs, the formulated proportions were converted to effective masses (kg), and the ingredients were weighed accordingly. California bearing ratio (CBR) and unconfined compressive strength (UCS) tests were carried out on the soil-additive blends, with the results presented in Table 10. The results indicate improved mechanical strength for blends containing 30–40% CKD, 40–50% soil and 5–10% water [55].
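The conversion from proportions to weighed masses is a single multiplication per ingredient. The sketch below uses a hypothetical run and a hypothetical 6 kg batch; the actual proportions are those in Table 6.

```python
def batch_masses(proportions, batch_mass_kg):
    """Convert mixture proportions (summing to 1) into ingredient masses in kg."""
    total = sum(proportions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"proportions must sum to 1, got {total}")
    return {name: share * batch_mass_kg for name, share in proportions.items()}

# Hypothetical run and batch size; the actual proportions are in Table 6.
run = {"soil": 0.50, "ckd": 0.40, "water": 0.10}
print(batch_masses(run, batch_mass_kg=6.0))
```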

Table 10 Experimental results

4 Model Development and Validation

A square root transformation was applied to the experimental responses for the analysis. The UCS response ranges from 380 to 501, a max-to-min ratio of 1.31842, and the CBR response ranges from 30 to 58, a max-to-min ratio of 1.9333. The fit summary, diagnostic tests, and numerical and graphical optimization were used to determine the optimal proportions of the problematic soil-CKD blend that maximize the mechanical strength responses. Post-analysis, confirmation and coefficient tables were then generated to validate the model results using Design-Expert and Minitab 18 software [56, 57].

4.1 Fit Summary

The fit summary is a collection of statistical tools that helps to choose a starting point for the final model. The results presented include sequential sums of squares, lack-of-fit tests, R-squared values and summary statistics, which together indicate which model to select for in-depth study. The highest-order model that meets the specified criteria is 'suggested'. A model is aliased if there are not enough unique design points to estimate all of its coefficients. In the sequential sum of squares, the preferred model is the highest-order polynomial whose additional terms are significant (small Prob > F), since that model improves the fit the most. The degrees of freedom for each source equal the number of model coefficients added at that step. The lack-of-fit test uses an F value to compare the residual error with the pure error obtained from replicated design points. When the lack-of-fit error is significantly larger than the pure error, the residuals contain variation that a more appropriate model could account for.
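The lack-of-fit test described above reduces to an F ratio of two mean squares with a right-tail probability. The sketch below uses SciPy's F distribution; the sums of squares and degrees of freedom are hypothetical, not the values in Tables 12 or 15.

```python
from scipy import stats

def lack_of_fit_test(ss_lof, df_lof, ss_pe, df_pe):
    """F test of the lack-of-fit mean square against the pure-error mean square."""
    ms_lof = ss_lof / df_lof
    ms_pe = ss_pe / df_pe
    f_value = ms_lof / ms_pe
    p_value = stats.f.sf(f_value, df_lof, df_pe)   # Prob > F
    return f_value, p_value

# Hypothetical sums of squares, not the values in Tables 12 or 15.
f_value, p_value = lack_of_fit_test(ss_lof=0.040, df_lof=4, ss_pe=0.030, df_pe=3)
print(f"F = {f_value:.3f}, Prob > F = {p_value:.4f}")
```

A large Prob > F (non-significant lack of fit) is the desirable outcome, since it means the model residuals are comparable to pure error.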

4.1.1 Fit Summary for UCS Response

The fit summary for the UCS response is presented in Tables 11, 12 and 13 and shows a preference for the special quartic model, with R-squared, adjusted R-squared and predicted R-squared of 0.9988, 0.9958 and 0.9237, respectively. The lack-of-fit test gave a sum of squares of \(9.166 \times 10^{-3}\), a mean square of \(4.583 \times 10^{-3}\) and a lack-of-fit p value (Prob > F) of 0.4909. Because a high (non-significant) lack-of-fit p value is desirable, this model was selected as the response predictor [49, 50].

Table 11 Model summary statistics for UCS response
Table 12 Lack of fit tests for UCS response
Table 13 Sequential model sum of squares [Type I] for UCS response

4.1.2 CBR Fit Summary

The fit summary for the CBR response is presented in Tables 14, 15 and 16 and shows a preference for the quadratic model, with R-squared, adjusted R-squared and predicted R-squared of 0.9939, 0.988 and 0.9606, respectively. The lack-of-fit test gave a sum of squares of 0.033, a mean square of \(6.615 \times 10^{-3}\) and a lack-of-fit p value (Prob > F) of 0.2783 [49, 50].
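The adjusted R-squared values follow from \(R^2_{adj} = 1 - (1 - R^2)(n - 1)/(n - p)\). The sketch assumes n = 12 runs, p = 9 coefficients for the three-component special quartic (3 linear, 3 binary and 3 quartic blending terms) and p = 6 for the quadratic model.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p coefficients fitted to n runs."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p)

n = 12
print(f"UCS, special quartic (p = 9): {adjusted_r2(0.9988, n, 9):.4f}")
print(f"CBR, quadratic (p = 6):       {adjusted_r2(0.9939, n, 6):.4f}")
```

The results agree with the reported 0.9958 and 0.988 to within the rounding of the quoted R-squared values. This also shows how heavily the 9-term UCS model is penalized with only 3 residual degrees of freedom.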

Table 14 Model summary statistics for CBR response
Table 15 Lack of fit tests for CBR response
Table 16 Sequential model sum of squares [Type I] for CBR response

4.2 Analysis of Variance (ANOVA)

Analysis of variance was carried out for the models selected from the fit summary, the special quartic model for the UCS response and the quadratic model for the CBR response, to determine the statistical significance of the mixture factor levels using pseudo coding. The ANOVA results using the square root transformation are presented in Tables 17 and 18 for UCS and Tables 19 and 20 for CBR [58].

Table 17 ANOVA results for UCS response
Table 18 Statistical summary for UCS
Table 19 ANOVA results for CBR response
Table 20 Statistical summary for CBR

"Adeq Precision" measures the signal to noise ratio. A ratio greater than 4 is desirable. The ratio of 52.119 indicates an adequate signal. This model can be used to navigate the design space [49, 50].

4.2.1 Coefficient Estimates and Model Equations for UCS

The components, coefficient estimates, degrees of freedom, standard errors, variance inflation factors (VIF) and final equation in terms of L-pseudo components are presented in Tables 18, 21 and 22. The VIF measures how much the variance of a coefficient estimate is inflated by the lack of orthogonality in the design; if a factor is orthogonal to all other factors in the model, its VIF is 1 [59].

Table 21 Model coefficient estimates for UCS
Table 22 Final equation in terms of L-pseudo components

The equation in terms of coded factors can be used to make predictions about the response for given levels of each factor. By default, the high levels of the factors are coded as + 1 and the low levels of the factors are coded as − 1. The coded equation is useful for identifying the relative impact of the factors by comparing the factor coefficients [49].

“Adeq Precision” measures the signal to noise ratio. A ratio greater than 4 is desirable. The ratio of 39.180 indicates an adequate signal. This model can be used to navigate the design space [49, 50].
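As we understand the definition documented for Design-Expert, Adeq Precision compares the range of the predicted values at the design points with the average prediction standard error, where the average prediction variance is \(p\sigma^2/n\). The sketch below follows that definition; the predictions, residual mean square and model size are hypothetical.

```python
import math

def adequate_precision(predicted, sigma2, p):
    """Range of the predictions at the design points divided by the square
    root of the average prediction variance, p * sigma^2 / n."""
    n = len(predicted)
    mean_variance = p * sigma2 / n
    return (max(predicted) - min(predicted)) / math.sqrt(mean_variance)

# Hypothetical predictions, residual mean square and model size.
pred = [19.6, 20.4, 21.1, 21.9, 22.5, 20.8]
ratio = adequate_precision(pred, sigma2=0.004, p=6)
print(f"Adeq Precision = {ratio:.1f}", "adequate" if ratio > 4 else "inadequate")
```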

4.2.2 Coefficient Estimates and Model Equations for CBR

The components, coefficient estimate, degrees of freedom, standard error, variance inflation factor (VIF) and final equations terms of L-pseudo components computation results for the CBR response are presented in Tables 23, 24, 25.

Table 23 Model coefficient estimates for CBR
Table 24 Final equation in terms of L-pseudo components
Table 25 Final equation in terms of real components

The equation in terms of coded factors can be used to make predictions about the response for given levels of each factor. By default, the high levels of the factors are coded as + 1 and the low levels of the factors are coded as − 1. The coded equation is useful for identifying the relative impact of the factors by comparing the factor coefficients.

The equation in terms of actual factors can be used to make predictions about the response for given levels of each factor. Here, the levels should be specified in the original units for each factor. This equation should not be used to determine the relative impact of each factor because the coefficients are scaled to accommodate the units of each factor and the intercept is not at the center of the design space [60].
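Predicting from the L-pseudo equation means converting the actual proportions with Eq. 3, evaluating the Scheffé polynomial and undoing the square root transformation used in the analysis. The sketch below shows this for a quadratic model; the lower bounds and coefficients are hypothetical, and the fitted values are those in Tables 24 and 25.

```python
import numpy as np

def l_pseudo(x, lower):
    """Actual proportions -> L-pseudo components (Eq. 3)."""
    lower = np.asarray(lower, dtype=float)
    return (np.asarray(x, dtype=float) - lower) / (1.0 - lower.sum())

def predict_quadratic(x_actual, lower, coef):
    """Scheffé quadratic in L-pseudo components fitted to sqrt(response);
    squaring back-transforms the prediction to the original scale."""
    a, b, c = l_pseudo(x_actual, lower)
    sqrt_y = (coef["A"] * a + coef["B"] * b + coef["C"] * c
              + coef["AB"] * a * b + coef["AC"] * a * c + coef["BC"] * b * c)
    return sqrt_y ** 2

# Hypothetical bounds and coefficients; the fitted values are in Tables 24 and 25.
lower = [0.40, 0.30, 0.05]
coef = {"A": 5.5, "B": 7.2, "C": 6.1, "AB": 1.8, "AC": -0.9, "BC": 2.3}
print(predict_quadratic([0.45, 0.40, 0.15], lower, coef))
```

Because the pseudo components share the 0 to 1 scale, their coefficients can be compared directly, unlike the coefficients of the equation in actual proportions.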

4.3 Diagnostics Plots

Regression diagnostics were used to verify the assumptions of the regression model and to identify observations with undue influence on the analysis, using the studentized residual, the quotient of a residual and its estimated standard deviation. Studentized residuals are mainly used to detect outliers; an outlier is an observation that differs significantly from the others, owing to measurement variability or experimental error. Studentizing the residuals maps all the different normal distributions to a single standard normal distribution. Diagnostic tests were carried out for both the UCS and CBR responses [61].
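Externally studentized residuals divide each residual by a standard deviation estimated with that run deleted. The deletion variance can be obtained without refitting via \((n-p-1)s_{(i)}^2 = (n-p)s^2 - e_i^2/(1-h_{ii})\). The straight-line data below are hypothetical and stand in for the mixture model.

```python
import numpy as np

def externally_studentized_residuals(X, y):
    """Residual / (s_(i) * sqrt(1 - h_ii)), where s_(i) is the residual standard
    deviation with run i deleted, obtained without refitting."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    h = np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))   # leverages
    s2 = e @ e / (n - p)
    s2_deleted = ((n - p) * s2 - e ** 2 / (1.0 - h)) / (n - p - 1)
    return e / np.sqrt(s2_deleted * (1.0 - h))

# Hypothetical straight-line data; the seventh point (index 6) departs from the trend.
X = np.column_stack([np.ones(8), np.arange(8.0)])
y = np.array([1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 8.4, 7.9])
print(externally_studentized_residuals(X, y).round(2))
```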

4.3.1 Normal Probability Plot

The normal probability plot checks whether the residuals are approximately normally distributed, in which case most of the points lie close to the straight line rather than far from it. This assumption is essential for valid model estimation. Only definite patterns should be looked for, such as an “S-shaped” curve, which indicates that a transformation of the response may give a better analysis [62]. The normal probability plots for the UCS and CBR responses are presented in Figs. 3 and 4.
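A normal probability plot pairs each sorted residual with a standard normal quantile. The sketch below uses Blom's plotting position \((i - 0.375)/(n + 0.25)\), one common choice; the software may use a different position, and the residuals are hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(residuals):
    """Pair each sorted residual with a standard normal quantile, using
    Blom's plotting position (i - 0.375) / (n + 0.25)."""
    r = sorted(residuals)
    n = len(r)
    z = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(z, r))

# Hypothetical studentized residuals for twelve runs.
res = [0.3, -1.1, 0.8, -0.2, 1.6, -0.7, 0.1, -1.4, 0.5, 0.9, -0.4, -0.3]
for z, r in normal_plot_points(res):
    print(f"{z:6.2f}  {r:5.2f}")
```

Plotting these pairs, with the quantiles on one axis and the residuals on the other, gives points close to a straight line when the residuals are approximately normal.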

Fig. 3
figure 3

Normal plot of studentized residuals for UCS response

Fig. 4
figure 4

Normal plot of studentized residuals for CBR response

4.3.2 Residual vs. Predicted Plot

This diagnostic verifies the assumption of constant variance by plotting the externally studentized residuals on the y-axis against the predicted values on the x-axis. The plots for the UCS and CBR responses are presented in Figs. 5 and 6, respectively. The result implies an expanding variance, which indicates the need for a transformation. The scattered points lie very close to the zero studentized residual line, with a maximum and minimum of 15.4435 and −15.4435, respectively, for the UCS response and 4.98253 and −4.98253 for the CBR response [63].

Fig. 5
figure 5

Residuals vs. predicted plot for UCS response

Fig. 6
figure 6

Residuals vs. predicted plot for CBR response

4.3.3 Residuals vs. Run Plot

This diagnostic plots the externally studentized residuals on the y-axis against the experimental run order on the x-axis to check for lurking variables that may have influenced the response during the experiment. The plots for the two responses are presented in Figs. 7 and 8. The studentized residuals lie close to the zero line; a trend in this plot would indicate a time-related variable lurking in the background [64].

Fig. 7 Residuals vs. experimental run plot for UCS response

Fig. 8 Residuals vs. experimental run plot for CBR response

4.3.4 Predicted vs. Actual

This diagnostic plot presents the model-predicted response values on the y-axis against the actual values on the x-axis. It helps to identify a value, or group of values, that the model does not predict accurately, and it is shown in Figs. 9 and 10 for the two responses. The plotted results indicate a strong correlation between the experimental and the model-predicted values [65].
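Agreement in a predicted vs. actual plot can be summarised by the coefficient of determination measured against the 45-degree line, R² = 1 − SS_res/SS_tot. The plain-Python sketch below computes it; the function name is hypothetical and the numbers are placeholders, not values from Tables 26 and 27.

```python
def r_squared(actual, predicted):
    """1 - SS_res / SS_tot, measured against the 45-degree line y = x."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Placeholder actual and predicted values (not the study's data).
actual = [310.0, 355.0, 402.0, 448.0, 501.0]
predicted = [318.0, 350.0, 396.0, 455.0, 497.0]
print(round(r_squared(actual, predicted), 4))
```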

Fig. 9 Predicted vs. actual plot for UCS response

Fig. 10 Predicted vs. actual plot for CBR response

4.3.5 Box-Cox Plot for Power Transforms

This diagnostic plot provides guidance for selecting a power-law transformation of the response. The natural log of the residual sum of squares is plotted against lambda, and the recommended transformation corresponds to the best lambda, located at the lowest point of the curve. The Box-Cox power transform plots for the UCS and CBR responses are presented in Figs. 11 and 12, respectively. The results show the best lambda at 3 and 2.79 for UCS and CBR, respectively [66, 67].
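The Box-Cox curve described above can be reproduced in outline: for each candidate lambda the response is transformed, the model is refitted, and the natural log of the residual sum of squares is recorded; the best lambda is the minimiser. The sketch assumes NumPy and uses the geometric-mean-scaled form of the transform so that sums of squares are comparable across lambda. The function names are hypothetical, and the synthetic data are constructed so that a square-root transform (lambda = 0.5) is appropriate.

```python
import numpy as np


def box_cox_ln_ss(X, y, lambdas):
    """ln(residual sum of squares) of the model refitted to Box-Cox-transformed y.

    The transform is scaled by the geometric mean of y so that the sums of
    squares are comparable across lambda.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox needs a strictly positive response")
    gm = np.exp(np.mean(np.log(y)))                 # geometric mean of y
    ln_ss = []
    for lam in lambdas:
        if abs(lam) < 1e-12:
            z = gm * np.log(y)                      # limiting case lambda = 0
        else:
            z = (y**lam - 1.0) / (lam * gm ** (lam - 1.0))
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        ss = float(np.sum((z - X @ beta) ** 2))
        ln_ss.append(np.log(ss))
    return np.array(ln_ss)


def best_lambda(X, y, lambdas):
    """Lambda at the lowest point of the ln(SS) curve."""
    return float(lambdas[int(np.argmin(box_cox_ln_ss(X, y, lambdas)))])


# Synthetic data: sqrt(y) is (almost) linear in x, so lambda near 0.5 is expected.
x = np.arange(1.0, 11.0)
y = (1.0 + x) ** 2 * (1.0 + 0.001 * (-1.0) ** np.arange(10))
X = np.column_stack([np.ones_like(x), x])
lambdas = np.round(np.arange(-1.0, 3.01, 0.1), 2)
print(best_lambda(X, y, lambdas))
```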

Fig. 11 Power transforms Box-Cox plot for UCS response

Fig. 12 Power transforms Box-Cox plot for CBR response

4.4 Influence Plots

Statistical influence is evaluated through Cook's distance, leverage vs. experimental run, and the difference in fits (DFFITS) statistic vs. experimental run. The results are presented as graphical plots, which give a clearer view of the individual data points.

4.4.1 The Cook’s Distance

Cook's distance is commonly used to determine the influence of each data point in an ordinary least squares regression analysis. Influential points are particularly worth checking for validity, and they can also indicate regions of the feasible experimental design space where better performance may be achieved. The Cook's distance vs. experimental run plots for the two responses are shown in Figs. 13 and 14. For the UCS response, the Cook's distance values lie mostly between 0 and 1, with runs 5 and 9 standing apart, while for the CBR response only run 5 stands apart [68].
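Cook's distance combines the size of a run's residual with its leverage: D_i = (r_i²/p)·h_ii/(1 − h_ii), where r_i is the internally studentized residual, h_ii the leverage and p the number of model terms. The NumPy sketch below computes it for an ordinary least-squares fit; the function name and data are hypothetical. A common rule of thumb treats D_i > 1 as worth investigating, though thresholds vary between references.

```python
import numpy as np


def cooks_distance(X, y):
    """Cook's distance D_i = (r_i**2 / p) * h_ii / (1 - h_ii) for an OLS fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    H = X @ np.linalg.pinv(X)          # hat matrix
    h = np.diag(H)                      # leverages
    e = y - H @ y                       # ordinary residuals
    s2 = e @ e / (n - p)                # residual mean square
    r2 = e**2 / (s2 * (1.0 - h))        # squared internally studentized residuals
    return r2 / p * h / (1.0 - h)


# Hypothetical data: run 4 (0-based) carries a large error.
x = np.arange(1.0, 9.0)
y = 2.0 + 3.0 * x + np.array([0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05])
y[4] += 5.0
X = np.column_stack([np.ones_like(x), x])
D = cooks_distance(X, y)
print(np.round(D, 3))
print("runs with D > 1:", [i for i, d in enumerate(D) if d > 1.0])
```

The closed form avoids refitting the model once per run; it gives the same values as the definition based on deleting each run in turn.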

Fig. 13 The Cook’s distance for UCS response

Fig. 14 The Cook’s distance for CBR response

4.4.2 Leverage vs. Run

Leverage measures the potential of each point to influence the model fit. If a point has a leverage of 1.0, the model fits the observation at that point exactly. An experimental run with leverage greater than twice the average is generally regarded as having high leverage; such runs have few other runs near them in the factor space [69]. The average leverage is the number of terms in the model divided by the number of experimental runs in the design. The leverage vs. run plots for UCS and CBR are presented in Figs. 15 and 16, respectively.
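The leverage values and the twice-the-average rule described above can be computed from the diagonal of the hat matrix H = X(XᵀX)⁻¹Xᵀ. The sketch assumes NumPy and a full-rank model matrix `X` with one column per model term; the helper names and the design are hypothetical. Because the leverages of a full-rank model sum to the number of model terms p, their average is p/n, matching the definition in the text.

```python
import numpy as np


def leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^-1 X'."""
    X = np.asarray(X, dtype=float)
    return np.diag(X @ np.linalg.pinv(X))


def high_leverage_runs(X, factor=2.0):
    """0-based indices of runs whose leverage exceeds factor * (p / n)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    h = leverages(X)
    average = p / n                 # leverages sum to p for a full-rank X
    return [i for i, hi in enumerate(h) if hi > factor * average]


# Hypothetical one-factor design with an isolated run at x = 5.
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 5.0])
X = np.column_stack([np.ones_like(x), x])
print(np.round(leverages(X), 3), high_leverage_runs(X))
```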

Fig. 15 Leverage vs. run plot for UCS response

Fig. 16 Leverage vs. run plot for CBR response

4.4.3 DFFITS vs. Runs

DFFITS is a diagnostic statistic that shows how influential each experimental point is in a regression analysis. It measures the change in the predicted value at a run when that run is left out of the regression, scaled by its standard error; equivalently, it is the product of the externally studentized residual and the leverage factor \(\sqrt{h_{ii}/(1-h_{ii})}\), where \(h_{ii}\) is the leverage of run i. The DFFITS plots for the UCS and CBR responses are shown in Figs. 17 and 18, respectively. The plotted DFFITS values for the experimental runs lie very close to zero, within the limits of \(\pm\) 2.59808 and \(\pm\) 2.12132 for UCS and CBR, respectively [70].
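The equivalence between the leave-one-out change in the prediction and the product of the externally studentized residual and √(h_ii/(1 − h_ii)) translates directly into code. The NumPy sketch below uses that closed form rather than refitting the model n times. The function name and data are hypothetical, and the 2√(p/n) guideline printed at the end is a common rule of thumb, not the limits reported for this study.

```python
import numpy as np


def dffits(X, y):
    """DFFITS_i = t_i * sqrt(h_ii / (1 - h_ii)) for an OLS fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    H = X @ np.linalg.pinv(X)          # hat matrix
    h = np.diag(H)                      # leverages
    e = y - H @ y                       # ordinary residuals
    s2 = e @ e / (n - p)
    s2_del = ((n - p) * s2 - e**2 / (1.0 - h)) / (n - p - 1)
    t = e / np.sqrt(s2_del * (1.0 - h))     # externally studentized residuals
    return t * np.sqrt(h / (1.0 - h))       # scaled by the leverage factor


# Hypothetical data with a disturbed final run.
x = np.arange(1.0, 9.0)
y = 2.0 + 3.0 * x + np.array([0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05])
y[7] += 2.0
X = np.column_stack([np.ones_like(x), x])
f = dffits(X, y)
n, p = X.shape
print(np.round(f, 3), "rule-of-thumb limit:", round(2 * np.sqrt(p / n), 3))
```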

Fig. 17 DFFITS vs. runs plot for UCS response

Fig. 18 DFFITS vs. runs plot for CBR response

4.4.4 Diagnostic Plots and Influence Statistics Summary Report

The summary reports of the diagnostic plots and influence statistics, listing the actual and predicted values, the lambda values, the leverage, and the internally and externally studentized residuals against the standard order for the two responses, are presented in Tables 26 and 27.

Table 26 Report summary for the diagnostic plots and influence statistics for UCS
Table 27 Report summary for the diagnostic plots and influence statistics for CBR

4.5 Numerical Optimization

In a constrained mixture design, both upper and lower bounds on the components are specified a priori, and the candidate blends are obtained from a list of all combinations of these bounds, the number of which is given by the term \([q\left( 2 \right)^{q - 1} + 1]\), where q is the number of mixture components. To choose among the possible blends, and in addition to the model choice, a desirability function for multi-criteria optimization was incorporated. For each criterion, desirability is defined on a scale from 0 to 1, satisfying the condition \(\left( {0 \le d\left( {y_{j} } \right) \le 1} \right)\), in which 1 corresponds to the ideal response while 0 indicates that one or more of the criteria lie outside the acceptable values. Whether a value is accepted or rejected generally depends on the set goal, that is, the optimization direction: minimization, maximization or a target value [71, 72].
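The candidate points of a constrained mixture region can be generated with the extreme vertices approach of McLean and Anderson: for each component in turn, every other component is set to its lower or upper bound (2^(q−1) combinations), the free component is computed so that the blend sums to one, and the blend is kept only if that component stays within its own bounds. This gives q·2^(q−1) combinations to screen. Reading the "+1" in the expression above as the overall centroid is our assumption. The helper names are hypothetical, and the bounds in the example are illustrative, not the constraints of Table 28.

```python
from itertools import product


def extreme_vertices(lower, upper, tol=1e-9):
    """McLean-Anderson extreme vertices of {x : lower <= x <= upper, sum(x) = 1}.

    Returns (sorted unique vertices, number of bound combinations screened).
    """
    q = len(lower)
    screened = 0
    vertices = set()
    for free in range(q):
        others = [j for j in range(q) if j != free]
        for levels in product((False, True), repeat=q - 1):
            screened += 1
            x = [0.0] * q
            for j, at_upper in zip(others, levels):
                x[j] = upper[j] if at_upper else lower[j]
            x[free] = 1.0 - sum(x[j] for j in others)   # blend must sum to one
            if lower[free] - tol <= x[free] <= upper[free] + tol:
                vertices.add(tuple(round(v, 9) for v in x))
    return sorted(vertices), screened


def centroid(points):
    """Component-wise mean of a list of blends."""
    n = len(points)
    return tuple(sum(pt[i] for pt in points) / n for i in range(len(points[0])))


# Hypothetical bounds for (soil, CKD, water); not the constraints of the study.
verts, screened = extreme_vertices([0.40, 0.30, 0.05], [0.60, 0.50, 0.15])
print(screened, len(verts))        # q * 2**(q - 1) combinations screened
print(verts, centroid(verts))
```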

When the response is to be maximized, larger values are better, and the desirability function is determined using Eq. 7.

$$d\left( y_{j} \right) = \begin{cases} 0, & y_{j} < L \\ \left( \dfrac{y_{j} - L}{T - L} \right)^{r_{j}}, & L \le y_{j} \le T \\ 1, & y_{j} > T \end{cases}$$
(7)

When the response is to be minimized, smaller values are better, and the desirability function is determined using Eq. 8.

$$d\left( y_{j} \right) = \begin{cases} 1, & y_{j} < T \\ \left( \dfrac{U - y_{j}}{U - T} \right)^{r_{j}}, & T \le y_{j} \le U \\ 0, & y_{j} > U \end{cases}$$
(8)

When a target value is the goal, the response is best at the target T, and the desirability function is computed using Eq. 9.

$$d\left( y_{j} \right) = \begin{cases} 0, & y_{j} < L \\ \left( \dfrac{y_{j} - L}{T - L} \right)^{r_{j}}, & L \le y_{j} \le T \\ \left( \dfrac{U - y_{j}}{U - T} \right)^{r_{j}}, & T \le y_{j} \le U \\ 0, & y_{j} > U \end{cases}$$
(9)

where T is the target value, \(y_{j}\) is the estimated result of the jth response, L is the minimum acceptable result, U is the maximum acceptable result and \(r_{j}\) is the weight parameter of the jth desirability function. Based on these boundary conditions, a numerical multi-response optimization was conducted in which the optimum mixture ratio maximizes the weighted geometric average of the individual desirability functions \(d\left( {y_{j} } \right)\) within the feasible design space. With all responses weighted equally, the composite desirability takes the form expressed in Eq. 10.

$$D = [d\left( {y_{1} } \right) \times d\left( {y_{2} } \right) \times ... \times d\left( {y_{n} } \right)]^{\frac{1}{n}}$$
(10)

where n is the number of individual responses.
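Eqs. 7 to 10 can be written as small functions. The plain-Python sketch below follows the standard Derringer and Suich form, in which a response below the target is fully desirable (d = 1) when minimizing, and combines the individual desirabilities with the equally weighted geometric mean of Eq. 10. The function names are hypothetical, and the limits in the example are illustrative, not the criteria of Table 28.

```python
def d_maximize(y, L, T, r=1.0):
    """Eq. 7: larger is better; 0 below L, 1 above T."""
    if y < L:
        return 0.0
    if y > T:
        return 1.0
    return ((y - L) / (T - L)) ** r


def d_minimize(y, T, U, r=1.0):
    """Eq. 8: smaller is better; 1 below T, 0 above U."""
    if y < T:
        return 1.0
    if y > U:
        return 0.0
    return ((U - y) / (U - T)) ** r


def d_target(y, L, T, U, r=1.0):
    """Eq. 9: best at T; 0 outside [L, U]."""
    if y < L or y > U:
        return 0.0
    if y <= T:
        return ((y - L) / (T - L)) ** r
    return ((U - y) / (U - T)) ** r


def composite(desirabilities):
    """Eq. 10: equally weighted geometric mean of the individual desirabilities."""
    product = 1.0
    for d in desirabilities:
        product *= d
    return product ** (1.0 / len(desirabilities))


# Hypothetical limits, for illustration only.
d_ucs = d_maximize(450.0, L=200.0, T=500.0)
d_cbr = d_maximize(50.0, L=10.0, T=60.0)
print(round(d_ucs, 4), round(d_cbr, 4), round(composite([d_ucs, d_cbr]), 4))
```

Because Eq. 10 is a geometric mean, a single response with d = 0 drives the composite desirability to 0, so a blend that violates any one criterion is rejected outright.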

After the analysis of variance (ANOVA) and the diagnostic tests, numerical optimization was carried out to find the combination of factor levels that simultaneously satisfies the criteria (maximize, minimize, in range, equal to and target) set for each predictor and response, subject to the single-component constraints listed in Table 28. The “in range” criterion was assigned to the predictors and “maximize” to the response variables [73,74,75].

Table 28 Optimization criteria definition

The mixture optimization solutions are presented in Table 29. From the desirability function computation, the solution with the highest score was selected as the optimal solution: a desirability of 0.977 at a mix proportion of 0.45:0.44259:0.1071 for soil, CKD and water, respectively.

Table 29 Optimization solutions

4.5.1 Optimization Ramps and Bar Graph

The numerical optimization ramps give a graphical view of the optimal solution, with the optimal predictor settings shown in red and the optimal response values in blue. This tool helps in selecting the optimal solution graphically, as shown in Fig. 19.

Fig. 19 Optimization ramps

The desirability bar graph, with the predictor and response variables shown in blue and red, respectively, is presented in Fig. 20. The predictor parameters each produced a desirability score of 1, while the response variables produced 0.973206 and 0.977709 for UCS and CBR, respectively. Finally, an overall desirability score of 0.976957 was obtained [76, 77].

Fig. 20 Optimization bar graph

4.5.2 Optimization Trace Plot

Trace plots are used to evaluate the effects of all the mixture components in the factor space. The purpose of this plot is to show how sensitive the response is to deviations of each component from a reference blend [78, 79]. The trace (Piepel) plot, with U-pseudo coded units on the x-axis, for the CBR, UCS and optimal desirability responses is shown in Fig. 21. The contour plot is an important tool for visualizing the functional points of the feasible experimental region during the iterative solution of the mixture optimization. It represents a 3D response surface in 2-D form as contours of constant response [80]. The contour plots for the CBR, UCS and optimal desirability responses are shown in Fig. 22.
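A component trace can be sketched by moving one component away from a reference blend and adjusting the others in proportion to their reference amounts, x_j = s_j − Δ·s_j/(1 − s_i), which keeps the blend summing to one (Cox's direction). Our understanding is that Piepel's direction applies the same rule in pseudocomponent coordinates; treat that correspondence as an assumption. The U-pseudocomponent coding u_i = (U_i − x_i)/(ΣU − 1) is included because the plot's x-axis is described in U-pseudo units. The helper names, reference blend and upper bounds below are hypothetical.

```python
def cox_trace(reference, component, deltas):
    """Blends along Cox's direction for one component of a reference blend.

    The chosen component changes by delta; every other component changes in
    proportion to its reference amount, so each blend still sums to one.
    """
    si = reference[component]
    trace = []
    for delta in deltas:
        blend = [sj - delta * sj / (1.0 - si) for sj in reference]
        blend[component] = si + delta
        trace.append(blend)
    return trace


def u_pseudo(x, upper):
    """U-pseudocomponent coding: u_i = (U_i - x_i) / (sum(U) - 1)."""
    denom = sum(upper) - 1.0
    return [(ui - xi) / denom for xi, ui in zip(x, upper)]


# Hypothetical reference blend (soil, CKD, water) and upper bounds.
reference = [0.50, 0.40, 0.10]
upper = [0.60, 0.50, 0.15]
for blend in cox_trace(reference, component=1, deltas=[-0.05, 0.0, 0.05]):
    print([round(v, 4) for v in blend], [round(u, 4) for u in u_pseudo(blend, upper)])
```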

Fig. 21 Trace (Piepel) plot

Three-dimensional surface plots are diagrammatic presentations of the relationships between the three mixture components and the response variables, and also the desirability function, as shown in Fig. 23.
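Contour and surface plots of a three-component mixture are drawn over a triangular grid of blends. The plain-Python sketch below generates such a grid and maps each blend to 2-D ternary coordinates; the helper names are hypothetical. Evaluating a fitted model at each grid point and passing the coordinates to a triangulation-based plotting routine (for example matplotlib's `tricontour` or `plot_trisurf`) would give plots of this kind.

```python
from math import sqrt


def simplex_grid(m):
    """All three-component blends (i/m, j/m, k/m) with i + j + k = m."""
    return [(i / m, j / m, (m - i - j) / m)
            for i in range(m + 1) for j in range(m + 1 - i)]


def ternary_xy(blend):
    """Map a blend (a, b, c) to 2-D ternary coordinates.

    Pure a sits at (0, 0), pure b at (1, 0) and pure c at (0.5, sqrt(3)/2).
    """
    _, b, c = blend                 # the first component is implied by b and c
    return (b + c / 2.0, c * sqrt(3) / 2.0)


grid = simplex_grid(10)            # 66 blends on a 1/10 lattice
xy = [ternary_xy(g) for g in grid]
print(len(grid), xy[:3])
```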

Fig. 22 Contour plot

4.6 Post Analysis

The post-analysis results, comprising the confirmation report at a two-sided confidence level of 95%, the descriptive statistics of the model-predicted results and the component constraints (which must sum to one), are presented in Table 30. The results indicate component levels of 0.45, 0.43 and 0.12 for soil, CKD and water, respectively [81] (Fig. 23).
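A confirmation run is usually judged against a two-sided prediction interval for a single new observation, ŷ₀ ± t·s·√(1 + x₀ᵀ(XᵀX)⁻¹x₀). The NumPy sketch below computes it; the critical value t is passed in (for 95% it is the 0.975 quantile of Student's t with n − p degrees of freedom, obtainable for instance from `scipy.stats.t.ppf`). The function name, data and point `x0` are hypothetical, and the interval reported in Table 30 may have been computed differently by the authors' software.

```python
import numpy as np


def prediction_interval(X, y, x0, t_crit):
    """Two-sided prediction interval for one new observation at term vector x0.

    t_crit is the Student's t critical value with n - p degrees of freedom,
    e.g. about 2.447 for 95% and 6 degrees of freedom.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    s2 = e @ e / (n - p)                               # residual mean square
    xtx_inv = np.linalg.inv(X.T @ X)
    y0 = float(x0 @ beta)                              # predicted mean at x0
    se_pred = float(np.sqrt(s2 * (1.0 + x0 @ xtx_inv @ x0)))
    return y0, y0 - t_crit * se_pred, y0 + t_crit * se_pred


# Hypothetical data and confirmation point (intercept term, x = 4.5).
x = np.arange(1.0, 9.0)
y = 2.0 + 3.0 * x + np.array([0.1, -0.1, 0.05, -0.05, 0.0, 0.1, -0.1, 0.05])
X = np.column_stack([np.ones_like(x), x])
print(prediction_interval(X, y, [1.0, 4.5], t_crit=2.447))
```

For a mixture model, `X` would hold the Scheffé terms (no intercept column) and `x0` the same terms evaluated at the confirmation blend.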

Table 30 Confirmation report
Fig. 23 3D surface plot

4.6.1 Coefficient Table

The coefficient table, listing the optimization coefficients for the factor level combinations of the black cotton soil-CKD blend, is presented in Table 31. Special quartic and quadratic models were adopted for the mixture optimization, the former for modelling the UCS response and the latter for modelling the CBR response [82, 83].
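For three components, the Scheffé quadratic model has six terms (three linear and three binary blending terms), and the special quartic model adds three terms of the form x_i²x_jx_k. The sketch below builds these term vectors and evaluates a prediction. The function names are hypothetical, the coefficients are placeholders rather than the values of Table 31, and the blend must be expressed in the same coding (actual or pseudocomponent proportions) as the fitted coefficients.

```python
def scheffe_quadratic_terms(x):
    """Scheffé quadratic terms for a three-component blend x = (x1, x2, x3)."""
    x1, x2, x3 = x
    return [x1, x2, x3, x1 * x2, x1 * x3, x2 * x3]


def special_quartic_terms(x):
    """Special quartic: quadratic terms plus x1^2 x2 x3, x1 x2^2 x3, x1 x2 x3^2."""
    x1, x2, x3 = x
    return scheffe_quadratic_terms(x) + [x1 * x1 * x2 * x3,
                                         x1 * x2 * x2 * x3,
                                         x1 * x2 * x3 * x3]


def predict(coefficients, terms):
    """Linear predictor: sum of coefficient * term (no intercept in Scheffé form)."""
    if len(coefficients) != len(terms):
        raise ValueError("one coefficient is needed per model term")
    return sum(b * t for b, t in zip(coefficients, terms))


# Placeholder coefficients (NOT the values of Table 31).
b_quartic = [500.0, 300.0, 100.0, 50.0, -20.0, 10.0, 5.0, 5.0, 5.0]
print(predict(b_quartic, special_quartic_terms((0.45, 0.44, 0.11))))
```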

Table 31 Coefficient table

5 Conclusion

From the foregoing mixture optimization study, which used EVD to evaluate the mechanical properties of the problematic black cotton soil-CKD blend, the following conclusions can be drawn.

The treatment of the problematic soil with CKD improved the soil’s mechanical properties, producing maximum responses of 501 kN/m² and 58% for UCS and CBR, respectively, at a CKD ratio of 45% and a water ratio of 10%. Using the I-optimal design for the factor space, the mixture ratios and experimental runs were derived from the vertices, interior, center and edges of the simplex. Single-component constraints, bounded by upper and lower limits, were imposed on the mixture ingredients, whose sum must be unity. The mix ratios designed in this process were used in the laboratory to obtain the corresponding CBR and UCS responses, and the data obtained were used to develop models of the soil-CKD behaviour. The improvement in the mechanical properties of the studied soil makes it useful as a pavement subgrade material, while also encouraging the recycling and re-use of industrial waste, a fundamental aspect of waste management for a safe environment. Statistical fit tests, ANOVA, diagnostic and influence tests were carried out on the experimental responses, and model coefficients were derived for the UCS and CBR responses. The UCS response was modelled with a special quartic model, while a quadratic model was used for the CBR response. Numerical and graphical optimizations were then conducted to derive the optimal solution using a desirability scale ranging from 0 to 1. A desirability score of 0.977 was obtained at the optimal mix ratio of 0.45:0.44259:0.1071 for soil, CKD and water, respectively, producing optimal responses of 497.758 kN/m² and 57.3758% for UCS and CBR, respectively. Based on the outcome of this study, incorporating cement kiln dust has considerably enhanced the mechanical performance of the expansive clay.
However, other soil parameters, such as the resilient modulus, were not optimized in this study, which is a limitation; they could be investigated using the EVD method. Finally, from an economic point of view, the current study offers several benefits: (i) the re-use of waste materials in soil re-engineering as cement or lime substitutes reduces the cost of infrastructure construction, (ii) it will reduce CO₂ emissions from the cement production process, thereby promoting a sustainable environment, and (iii) it will reduce the amount of waste in society and ease the burden of inadequate waste management practices.