Introduction

Soil management plays a major role in agriculture because improving soil parameters improves crop productivity. In crop productivity and soil fertility management, cation exchange capacity (CEC) is a key property that controls basic soil functions such as water retention, contaminant and nutrient behavior, and pH. Accurate measurement of micronutrients and efficient soil management enhance crop productivity and soil fertility and help determine soil quality (Emamgholizadeh et al. 2023). Soil properties depend on environmental factors such as parent material, climate, and topography, while pH measures soil acidity and alkalinity. Population growth and high food demand require large amounts of fertilizer, which can create many problems for future generations (Zhang et al. 2019). Excessive use of fertilizers generates soil pollution and degrades soil quality in cultivated land (Reshma and Aravindhar 2022; Sankareshwaran et al. 2023; Senthil Pandi et al. 2022a; Dhiravidachelvi et al. 2023).

Soil is generally characterized by its organic matter, nutrients, moisture, heavy metals, clay, and organic carbon; the narrow wavelength range of multispectral bands provides complex variability measurements and continuous spectral monitoring of soil nutrients (Blesslin Sheeba et al. 2022). Weak spectral changes in soil are captured by hyperspectral remote sensing, which offers sufficient spectral resolution and may have greater sensitivity and response than multispectral remote sensing in soil nutrient estimation (Song et al. 2018; Senthil Pandi et al. 2022b; Kalpana et al. 2023). Soil nutrients are generally classified as high, medium, and low based on the fertilizers and required nutrients present in the soil. The pH value of soil is categorized as alkaline (above 7, up to 14), neutral (7), and acidic (below 7, down to 0), based on the acidic and basic substances present in the soil sample (Li et al. 2019). Figure 1 shows the pH level recommendation.

Fig. 1 pH level recommendation
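The three-way pH categorization described above can be written as a small rule. The following is a minimal sketch; the function name `ph_category` and the `tolerance` parameter are illustrative assumptions and are not part of the original study.

```python
def ph_category(ph, tolerance=0.0):
    """Map a pH reading on the 0-14 scale to acidic, neutral, or alkaline.

    `tolerance` is a hypothetical half-width around 7 that is still
    treated as neutral; the default of 0.0 keeps the strict textbook rule.
    """
    if not 0.0 <= ph <= 14.0:
        raise ValueError("pH must lie between 0 and 14")
    if ph < 7.0 - tolerance:
        return "acidic"
    if ph > 7.0 + tolerance:
        return "alkaline"
    return "neutral"


for reading in (5.2, 7.0, 8.4):
    print(reading, ph_category(reading))
```

A non-zero tolerance may be more practical for measured data, since sensor readings rarely equal 7 exactly.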

Earlier, machine learning–based approaches along with artificial neural networks (ANN) were employed to determine soil fertility levels, and regression studies were performed for predicting soil fertility in soil with different parameters such as electrical conductivity (EC), organic carbon, clay loam, water capacity, bulk density, sandy loam, and silt loam (Keerthan Kumar et al. 2019; Iorliam et al. 2020). Some soil fertility and pH level–related problems were identified and solved using different machine learning approaches; the Bayesian approach was implemented to determine soil nutrients such as iron, copper, nitrogen, zinc, phosphorous, organic carbon, and potassium, as well as pH levels (Jena et al. 2022). This paper introduces a novel approach to predicting and classifying village-wise soil fertility nutrients and soil pH; the soil fertility nutrients are classified as normal, medium, and high, while soil pH level is categorized as slightly acidic (SLA), moderately acidic (MA), highly acidic (HA), and strongly acidic (SA). With the increase in data availability and advancements in machine learning techniques, predicting soil nutrient levels has become a topic of interest for researchers.

Persistent carbon emissions can cause global warming, which drives climate change, including the melting of polar ice, rising sea levels, disturbance of natural habitats, and extreme weather events (Abbasi and Erdebilli 2023). Supply chain design (SCD) is utilized for reducing carbon emissions in the environment (Abbasi et al. 2023a). Many policymakers aim to design effective carbon policies to reduce greenhouse gas (GHG) emissions, which can establish efficient procedures for auditing and certifying facilities and equipment (Abbasi et al. 2021). To reduce carbon emissions, industries are reconfiguring the design of their supply chain networks (SCNs), which can result in low-carbon systems and energy-efficient operation (Abbasi and Choukolaei 2023). The main purpose of decommissioning is to properly separate parts and materials for recycling, replication, and reuse (Abbasi et al. 2022). Multi-criteria decision-making is a quantitative method that solves problems by creating a weighting scale that reflects the priorities of decision-makers (Abbasi and Daneshmand-Mehr 2022). Centers for the production, collection, and disposal of masks can affect the efficiency of the supply chain because the number and location of production, collection, distribution, and disposal centers directly affect the number of masks stored at each center, while the supply chain costs are also important to the health of the individual (Abbasi et al. 2023b).

The agricultural sector plays an important role in ensuring global food security and the livelihoods of millions of people. With the world’s population projected to reach 9.7 billion by 2050, agricultural practices must become more sustainable and efficient. This can be achieved by using precision agriculture techniques that allow farmers to make informed decisions about crop management, fertilizer application, and irrigation. In recent years, machine learning algorithms have been applied to soil nutrient classification and prediction with promising results. One popular machine learning algorithm for classification and prediction tasks is the multilayer perceptron neural network (MLPNN). In the field of agriculture, MLPNNs have been applied to predict crop yields, detect plant diseases, and classify soil types; however, their efficiency depends heavily on the choice of parameters, such as the number of layers, neurons, and activation functions. Optimizing these parameters is a challenging task, and traditional optimization algorithms such as gradient descent and genetic algorithms can be slow and inefficient.

To address these challenges, researchers have explored the use of metaheuristic optimization algorithms for MLPNN parameter optimization (Kaveh and Mesgari 2022). Metaheuristic algorithms are population-based optimization techniques that can efficiently explore a large search space and converge to near-optimal solutions. One such algorithm is the Whale Optimization Algorithm (WOA), which is based on the feeding behavior of humpback whales. In this work, the HWAO-MLPNN method, which tunes an MLPNN with a hybrid of WOA and the Archimedes Optimization Algorithm, is proposed to enhance the accuracy of soil nutrient classification. This approach has the potential to provide accurate soil nutrient information to farmers. The research is motivated by the need for sustainable agriculture practices that can help to meet the growing demand for food while minimizing environmental impact. By enhancing the accuracy of soil nutrient classification and prediction, the proposed approach can contribute to more efficient use of resources and better crop yields.

Soil fertility is a key determinant of crop yield and quality, and accurately predicting soil nutrient levels is critical to the sustainable and efficient use of resources. Nutrient deficiency or excess can cause significant damage to the environment and decrease crop productivity; therefore, soil nutrient management is a critical aspect of agricultural practices, especially in regions where land is limited and food production needs to be increased to meet growing demand. Traditional methods of soil nutrient analysis involve laborious and time-consuming laboratory tests that are expensive and require skilled personnel (Choudhary and Machavaram 2022). Such methods are also limited in their ability to provide spatial and temporal information on soil fertility levels.

In recent years, machine learning techniques, particularly neural networks, have shown promise in accurately predicting soil nutrient levels (Zhang et al. 2022). MLPNN is a type of neural network that has been used for soil nutrient classification and prediction; it has multiple layers of neurons that learn the features of the input data and map them to the output, which is the predicted nutrient level (Wu et al. 2020). Manual parameter optimization is tedious and time-consuming, and it carries a risk of suboptimal parameter settings that result in poor prediction accuracy. To address this problem, optimization algorithms have been developed that automate the process of parameter tuning; such algorithms can find parameter values that maximize the classification accuracy of soil nutrient levels (Yang et al. 2022). However, the WOA algorithm has some limitations in terms of balancing the exploration and exploitation phases of optimization. To overcome these limitations, a hybrid of the Whale Optimization Algorithm and the Archimedes Optimization Algorithm (HWAO) is proposed, which integrates the exploration phase of WOA and the exploitation phase of the Archimedes Optimization Algorithm (AOA) (Hashim et al. 2021). Therefore, the problem that this research aims to address is the need for an accurate and efficient method for predicting soil nutrient levels using MLPNNs while addressing the challenges of parameter optimization.

The specific objectives are:

  • To propose a novel HWAO-MLPNN that performs accurate classification and prediction of soil nutrients.

  • To optimize the parameters of the MLPNN using the HWAO algorithm to improve its classification accuracy.

  • To compare the performance of the optimized MLPNN with the baseline MLPNN and other optimization algorithms.

  • To provide insights into the relationship between soil nutrient levels and the input variables in the dataset.

The proposed HWAO algorithm for MLPNN parameter optimization is expected to contribute to the advancement of soil nutrient classification and prediction. The major contributions of this research are as follows:

  • Development of an MLPNN model for soil nutrient classification: This research proposes an MLPNN model for soil nutrient classification. This MLPNN model can predict soil nutrient levels with high accuracy and can be used as a tool for sustainable agriculture and efficient use of resources.

  • Optimization of MLPNN parameters using the HWAO algorithm: This research proposes the HWAO algorithm for optimizing the parameters of the MLPNN. The HWAO algorithm integrates the WOA and AOA algorithms that enhance the convergence rate in soil prediction.

  • Comparison of the performance of the optimized MLPNN: This research compares the performance of the optimized MLPNN with the baseline MLPNN and other optimization algorithms. The results show that the proposed HWAO algorithm outperforms other optimization algorithms and significantly enhances the classification accuracy of soil nutrient levels.

  • Insights into the relationship between soil nutrient levels and input variables: This research provides insights into the relationship between soil nutrient levels and input variables in the dataset. The analysis of the MLPNN weights and biases can help understand the importance of each input variable in predicting soil nutrient levels. This information can aid in making informed decisions for sustainable agriculture and efficient use of resources.

Overall, this research contributes to the advancement of soil nutrient classification and prediction by proposing an MLPNN model and an HWAO algorithm for parameter optimization. The proposed method can enhance the accuracy of soil nutrient classification and provide insights into the relationship between soil nutrient levels and input variables.

The remainder of this paper is organized as follows: the “Review of related works” section reviews related work with an overview of soil nutrient classification and comparison. The “Material and methods” section explains the materials and methods, including the study area, background information on each technique, and the optimization procedure. The “Experimental results” section presents the experimental results, simulation tools, optimized parameters, and performance evaluation with various measures. Finally, the “Discussion and conclusion” section provides a discussion and conclusion along with future research.

Review of related works

This section reviews existing research on soil nutrient prediction and classification to assess its effectiveness. It also identifies the difficulties involved in classification, the performance requirements, and the limitations. The studies on soil nutrient and pH classification are discussed as follows.

Various studies have investigated the use of machine learning (ML) models for soil nutrient prediction. Trontelj and Chambers (2021) employed a spectroscopic method and a nutrient characterization strategy to categorize nutrients into three groups, using phosphorous, potassium, and magnesium as examples. The support vector machine (SVM) model was found to outperform other ML models, but its performance varied by 25 to 35% depending on soil properties and structure. Lou et al. (2022) developed a preference neural network (PNN) model using the radial basis function (RBF) as a kernel for support vector regression. The PNN model addressed four fertility targets: nitrogen, pH level, organic matter, and salt. The performance range is increased by 25.81% and 27.99% respectively due to the fertility of the soil. Sirsat et al. (2018) used a support vector regression (SVReg) model for predicting five soil nutrients, including organic carbon, iron, zinc, phosphorous, and manganese, and found that the extreme randomized regression tree provided the best prediction accuracy of 0.89%. Benedet et al. (2021) utilized a random forest (RF) and a generalized linear model (GLM) for predicting soil fertility properties based on portable X-ray fluorescence (pXRF) data, and the RF model yielded more accurate predictions than the GLM model. The validation values ranged from 0.59 to 0.89, and spatial variability maps are required to identify soil fertility properties on the landscape.

Several studies have focused on developing machine learning models for predicting and classifying soil nutrients and fertility levels. Real-time data collected from IoT sensors such as pH, soil moisture, soil temperature, and color sensors have been utilized in these studies. Senapaty et al. (2023) classified soil using a Multi-class Support Vector Machine with a Directed Acyclic Graph–based Fruit Fly Optimization (MSVM-DAG-FFO) algorithm, which showed a better performance of 0.932%. However, the model was only able to classify a few types of soil nutrients, which could lead to low soil quality.

Yang et al. (2021) developed a Particle Swarm Optimization–based Extreme Learning Machine (PSO-ELM) for estimating soil nitrogen level and soil organic matter. The established method provided a high prediction performance of 0.73% compared to other related methods. However, it was not suitable for estimation on large-scale field and imagery samples. Emamgholizadeh et al. (2023) utilized the Particle Swarm Optimization–based Adaptive Network Fuzzy Inference System (ANFIS-PSO) model to predict soil fertility levels using different chemical and physical properties of soil. The ANFIS-PSO model achieved an RMSE of 0.212%, a coefficient of determination of 6.328%, a mean absolute error (MAE) of 0.78%, and a Lin’s concordance correlation coefficient (LCCC) of 0.67%. However, further development of soil fertility level prediction required a neural network model.

Escorcia-Gutierrez et al. (2022) classified soil nutrients and pH values using the Intelligent Soil Nutrient and pH Classification–based Weighted Voting Ensemble (ISNpHC-WVE) deep learning method. The WVE model assigned a weight vector to each deep learning model, and three deep learning models, namely deep belief network (DBN), gated recurrent unit (GRU), and bidirectional long short-term memory (BiLSTM), were employed for soil nutrient and pH classification. The model, evaluated using various performance measures, achieved a high classification accuracy of 0.9281%. However, large-scale datasets and real-time data were not considered in that work.

Reshma and Aravindhar (2022) used machine learning models such as decision tree (DT), support vector machine (SVM), and multilayer perceptron (MLP) for predicting and classifying soil nutrients. The MLP model achieved a better classification accuracy of 94% compared to other classification methods. However, for more nutrient classification, a hybrid activation function needed to be integrated with the MLP model. Data preprocessing methods were employed to remove data duplication and missing values.

In a study, Suchithra and Pai (2020) used an extreme learning machine (ELM) to classify village-wise soil fertility indices such as K, P, B, OC, and soil pH levels. The ELM model was integrated with various activation functions to solve the classification problems, and performance evaluation measures like the accuracy of 86.27%, kappa score of 50.68%, precision of 90%, specificity of 94%, sensitivity of 92.1%, and cross-validation accuracy of 59% were used to identify the best classification performance.

In another study, Song et al. (2018) determined the spatial variation of soil nutrients using the Back Propagation Neural Network–based Ordinary Kriging (BPNNOK) method. The study used 1287 soil samples for prediction, covering soil available potassium, soil total nitrogen, and soil available phosphorous. The soil contents were predicted after performing a dimensionality reduction process based on auxiliary variables. The study found that the BPNNOK model was effective in the mapping and monitoring of soil nutrients, achieving an accuracy of 0.915%, but there was room for improvement in spatial mapping performance for predicting more soil nutrient variations. Table 1 summarizes the related works.

Table 1 Summary of related works

Material and methods

The parameters measured in the soil samples collected from the Marathwada dataset include micronutrients (Cu, Zn, Fe, Mn, and B), pH, OC, electrical conductivity (EC), plant available primary nutrients (P, K), and a secondary nutrient (S). The predicted soil fertility index and soil pH levels were then stored in a database for the training, testing, and validation processes. To improve the prediction and classification performance, the data was partitioned into 70% for training, 20% for testing, and the remaining 10% for validation. During the classification phase, the HWAO-MLPNN model was used to classify the soil samples into the required soil fertility index and soil pH level. The MLPNN model was used to enhance the soil nutrient and pH level classification. The classification model categorized soil fertility in terms of organic carbon (OC), available phosphorus (P), boron (B), and available potassium (K).
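The 70/20/10 partition can be illustrated with a short sketch. The function name `split_indices`, the shuffling step, and the fixed seed are assumptions made for illustration; the original text does not state how the samples were assigned to each subset.

```python
import random


def split_indices(n_samples, train_frac=0.7, test_frac=0.2, seed=0):
    """Shuffle sample indices and split them into train/test/validation lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # assumption: random assignment
    n_train = int(round(train_frac * n_samples))
    n_test = int(round(test_frac * n_samples))
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]  # whatever remains (about 10%)
    return train, test, validation


train_idx, test_idx, val_idx = split_indices(1000)
print(len(train_idx), len(test_idx), len(val_idx))
```

Splitting on indices rather than on the records themselves keeps the sketch independent of how the soil samples are stored.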

Study area

This research focuses on the state of Kerala, where agriculture is considered the main source of income for rural people (Suchithra and Pai 2020). However, structural changes have been observed owing to a reduction in the Gross State Domestic Product (GSDP).

To address this issue, data were collected by the State Government of Kerala between 2014 and 2017 from the North Central Laterites region, covering an area of 171,469 ha (4.41%) of the State. The region has a tropical humid monsoon-type climate with a mean annual temperature of 27.6°C and rainfall of 2795 mm.

The MLPNN parameters are optimized using the HWAO algorithm to enhance the accuracy. This approach maximizes the classification accuracy of soil nutrient levels and improves the efficiency of the MLPNN. Figure 2 illustrates the overall architecture of the proposed model.

Fig. 2 Overall architecture of the proposed model

Background information for used techniques

In this section, the background information of the MLPNN, WOA, and AOA is described. The MLPNN model classifies the soil nutrients, and its convergence rate depends on its biases and weights. The WOA is used to enhance the exploration capacity while maintaining a fast convergence speed (Sun and Chen 2021). The AOA algorithm updates the global best position to balance the exploitation and exploration phases (Akdag 2022).

Multilayer Perceptron Neural Network (MLPNN)

The MLPNN is utilized in various practical problems because of its high generalization capability, non-linearity, fault tolerance, and robustness (Song et al. 2018). The MLPNN consists of three layers, and the connections between these layers carry weights in the range [−1, 1]. In MLPNN, each node performs two operations, namely the summation function and the activation function. The summation function (Fsi) combines the weights, inputs, and bias, as expressed in Eq. (1);

$${Fs}_i=\sum \limits_{j=1}^m{w}_{ji}\;{X}_j+{\textrm{B}}_i$$
(1)

where m denotes the number of inputs, Xj denotes the input variable, wji denotes the connection weight, and Βi denotes the bias respectively. The most commonly used activation function in MLPNN is the S-shaped sigmoid function. The activation function is denoted by αi(y) and is given by Eq. (2);

$${\alpha}_i(y)=\frac{1}{1+{e}^{-{Fs}_i}}$$
(2)

The output of each neuron, and in turn the output of the MLPNN (\({\textrm{Y}}_i\)), is obtained by applying the activation function to the summation function, as expressed in Eq. (3);

$${\textrm{Y}}_i={\alpha}_i\left(\sum \limits_{j=1}^m{w}_{ji}\;{X}_j+{\textrm{B}}_i\right)$$
(3)
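Equations (1) to (3) can be traced with a few lines of code. The sketch below is illustrative only; the function names and the example weights, biases, and inputs are assumptions and do not come from the study.

```python
import math


def summation(weights, inputs, bias):
    """Eq. (1): weighted sum of the inputs plus the bias of one node."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def sigmoid(fs):
    """Eq. (2): S-shaped sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-fs))


def layer_output(weight_rows, inputs, biases):
    """Eq. (3): activation of the summation, for every node in one layer."""
    return [sigmoid(summation(w, inputs, b))
            for w, b in zip(weight_rows, biases)]


# Hypothetical layer with three inputs and two nodes, weights in [-1, 1].
x = [0.5, -0.2, 0.1]
hidden_w = [[0.4, -0.6, 0.2], [-0.3, 0.8, 0.5]]
hidden_b = [0.1, -0.1]
print(layer_output(hidden_w, x, hidden_b))
```

Stacking two calls to `layer_output`, one for the hidden layer and one for the output layer, gives the forward pass of a three-layer network.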

The weights are updated to reduce the error and estimate the outcome; the MLPNN can address problems that occur across different classes and also has a non-linear mapping capability (Heidari et al. 2019). To mitigate the slow convergence rate and local optimum issues, a gradient descent–based model is employed with the aim of providing a high convergence rate and global optimum solutions for classification problems.

Whale Optimization Algorithm

The WOA is a metaheuristic optimization algorithm inspired by the hunting behavior of humpback whales, which can be used to solve challenging continuous problems (Tawhid and Ibrahim 2021). This algorithm employs three different stages, namely the surrounding prey stage, the prey hunting stage, and the bubble net attack stage.

  • Surrounding prey stage

At first, the whale does not know the prey’s location in advance. Therefore, the current best individual is treated as the target, and all other individuals move towards this optimal position. The new search agent position is updated by using Eqs. (4) and (5);

$${S}_j^{\delta }=\left|{H}_j\;{Y}^{\ast \delta}\left({\tau}_{iter}\right)-{Y}_j^{\delta}\left({\tau}_{iter}\right)\right|$$
(4)
$${Y}_j^{\delta}\left({\tau}_{iter}+1\right)={Y}^{\ast \delta}\left({\tau}_{iter}\right)-{B}_j\;{S}_j^{\delta }$$
(5)

From Eqs. (4) and (5), the current candidate solution is represented by Yj, \({Y}^{\ast}\) denotes the best solution, δ denotes the search space dimension, and |·| denotes the absolute value respectively. The parameters Bj, b, and Hj are derived as,

$${B}_j=2b\;{\Re}_1-b$$
(6)
$$b=2\left(1-\frac{\tau_{iter}}{M_{iter}}\right)$$
(7)
$${H}_j=2{\Re}_2$$
(8)

where τiter and Miter denote the current iteration number and the maximum number of iterations. \({\Re}_1\) and \({\Re}_2\) are random values in the interval [0, 1], and the convergence factor b decreases from 2 to 0 over the iterations.
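The surrounding prey update of Eqs. (4) to (8) can be sketched as follows. The function names are illustrative, and drawing one pair of coefficients per search agent (shared across all dimensions) is an assumption of this sketch.

```python
import random


def woa_coefficients(iteration, max_iter, rng):
    """Return (b, B, H) for one search agent at the given iteration."""
    b = 2.0 * (1.0 - iteration / max_iter)  # Eq. (7): decreases from 2 to 0
    B = 2.0 * b * rng.random() - b          # Eq. (6): lies in [-b, b]
    H = 2.0 * rng.random()                  # Eq. (8): lies in [0, 2]
    return b, B, H


def encircle(position, best, B, H):
    """Move one agent towards the best solution, dimension by dimension."""
    new_position = []
    for y, y_best in zip(position, best):
        S = abs(H * y_best - y)              # Eq. (4)
        new_position.append(y_best - B * S)  # Eq. (5)
    return new_position


rng = random.Random(42)
b, B, H = woa_coefficients(iteration=10, max_iter=100, rng=rng)
print(encircle([0.2, -0.7], [0.5, 0.1], B, H))
```

Because b shrinks over the iterations, the range of B narrows and the agents take smaller steps around the best solution late in the run.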

  • Bubble net attack stage

In this stage, the prey is attacked through a bubble net strategy, which constitutes the exploitation phase. Two approaches are used. The shrinking and surrounding approach is accomplished by decreasing the convergence factor. The spiral position update is evaluated between the current optimal position and the whale position, so that the whale captures food through a helix-shaped movement. The position is updated by using Eq. (9);

$${Y}_j^{\delta}\left({\tau}_{iter}+1\right)=\left\{\begin{array}{l}{Y}^{\ast \delta}\left({\tau}_{iter}\right)-{B}_j\;{S}_j^{\delta },\kern3.599998em \textrm{if}\kern0.24em q<0.5\\ {}{S}_j^{\hbox{'}\delta}\;\exp (ak)\;\cos \left(2\pi\;k\right)+{Y}^{\ast \delta },\kern0.84em \textrm{if}\kern0.24em q\ge 0.5\;\end{array}\right.$$
(9)

In Eq. (9), \({S}_j^{\hbox{'}\delta }\) denotes the distance between the whale and the prey, and a is a constant that defines the shape of the logarithmic spiral. k is a uniformly distributed random number in the range [−1, 1], and q is a random number between 0 and 1.
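Equation (9) can be sketched as a single function that switches between the two branches. The sketch assumes that the distance in the spiral branch is the absolute difference between the best position and the whale position, which follows the verbal definition above; the function name and the example values are illustrative.

```python
import math


def bubble_net_update(position, best, B, H, a, q, k):
    """Eq. (9): shrinking update when q < 0.5, spiral update otherwise.

    q is a random number in [0, 1] and k a random number in [-1, 1];
    both are passed in so that the caller controls the randomness.
    """
    if q < 0.5:
        # Shrinking and surrounding branch, same form as Eqs. (4) and (5).
        return [yb - B * abs(H * yb - y) for y, yb in zip(position, best)]
    # Spiral branch; distance = |best - position| is an assumption here.
    spiral = math.exp(a * k) * math.cos(2.0 * math.pi * k)
    return [abs(yb - y) * spiral + yb for y, yb in zip(position, best)]


print(bubble_net_update([0.2, -0.7], [0.5, 0.1], B=0.3, H=1.1, a=1.0, q=0.8, k=-0.4))
```

In a full optimizer, q and k would be redrawn for every agent at every iteration.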

  • Prey hunting stage

If the condition |B| > 1 is satisfied, the search agent position is updated relative to a randomly selected search agent rather than the best one. This is the exploration phase, which helps avoid the local optimum problem and enhances the global search capability. Otherwise, the search agent position is updated relative to the best solution. The position based on a random search agent is updated using Eq. (10);

$${\displaystyle \begin{array}{l}{Y}_j^{\delta}\left({\tau}_{iter}+1\right)={Y}_r^{\delta}\left({\tau}_{iter}\right)-{B}_j\;{S}_j^{\delta}\\ {}{S}_j^{\delta }=\left|{H}_j\;{Y}_r^{\delta}\left({\tau}_{iter}\right)-{Y}_j^{\delta}\left({\tau}_{iter}\right)\right|\end{array}}$$
(10)

The WOA algorithm is included to enhance the global exploration capability while maintaining a fast convergence speed.

Archimedes Optimization Algorithm

AOA is inspired by Archimedes’ principle, which describes the force exerted on an object when it is immersed in a fluid, and it has been applied to real-world problems (Houssein et al. 2021). The AOA algorithm mainly addresses the balance between exploration and exploitation. The steps involved in the AOA algorithm are as follows.

  • Update volume and densities

The solution volume and density are updated based on Eq. (11) and are formulated as,

$${\displaystyle \begin{array}{l}{V}_j^{\tau_{iter}+1}={V}_j^{\tau_{iter}}+r\times \left({V}_{best}-{V}_j^{\tau_{iter}}\right)\\ {}{D}_j^{\tau_{iter}+1}={D}_j^{\tau_{iter}}+r\times \left({D}_{best}-{D}_j^{\tau_{iter}}\right)\end{array}}$$
(11)

In Eq. (11), the volume of the global best solution is depicted by Vbest, the density of the global best solution Dbest, and r is the random number between 0 and 1.
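The update in Eq. (11) moves the volume and density of every object towards those of the best object. In the minimal sketch below, one random number r is drawn per object and reused for both quantities; this reading of Eq. (11), as well as the function name, is an assumption.

```python
import random


def update_volume_density(volumes, densities, best_index, rng):
    """Eq. (11): pull each object's volume and density towards the best object."""
    v_best, d_best = volumes[best_index], densities[best_index]
    new_volumes, new_densities = [], []
    for v, d in zip(volumes, densities):
        r = rng.random()  # assumption: one r per object, shared by both updates
        new_volumes.append(v + r * (v_best - v))
        new_densities.append(d + r * (d_best - d))
    return new_volumes, new_densities


rng = random.Random(7)
print(update_volume_density([0.2, 0.9, 0.5], [0.7, 0.1, 0.4], best_index=1, rng=rng))
```

Since r lies between 0 and 1, each updated value stays between the old value and the best value, and the best object itself is unchanged.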

  • Density factor and transfer operator

The collisions between the objects start the process of reaching the equilibrium state, and the transfer operator switches the objects from the exploration phase to the exploitation phase. The density factor supports the global search, and its value decreases over time. The transfer operator and density factor are expressed in Eqs. (12) and (13) as

$${\textrm{T}}_f=\exp \left(\frac{\tau_{iter}-{M}_{iter}}{M_{iter}}\right)$$
(12)
$${D_f}^{\tau_{iter}+1}=\exp \left(\frac{M_{iter}-{\tau}_{iter}}{M_{iter}}\right)-\left(\frac{\tau_{iter}}{M_{iter}}\right)$$
(13)

From Eqs. (12) and (13), the terms Τf, \({D_f}^{\tau_{iter}+1}\), τiter, and Miter are delineated as transfer operator, density factor, current iteration number, and maximum number of iterations respectively. As \({D_f}^{\tau_{iter}+1}\) gradually decreases, the search concentrates on the promising area, and the correct setting of this variable affects the balance between the exploitation and exploration phases.
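The behavior of Eqs. (12) and (13) over a run can be checked numerically. The sketch below uses illustrative function names and an assumed budget of 100 iterations. Solving Eq. (12) for a transfer operator of 0.5 gives an iteration number of Miter × (1 − ln 2), or about 0.31 × Miter, so the transfer operator stays at or below 0.5 for roughly the first 30% of the iterations.

```python
import math


def transfer_operator(iteration, max_iter):
    """Eq. (12): grows from exp(-1) at iteration 0 to 1 at the last iteration."""
    return math.exp((iteration - max_iter) / max_iter)


def density_factor(iteration, max_iter):
    """Eq. (13): decreases from e at iteration 0 to 0 at the last iteration."""
    return math.exp((max_iter - iteration) / max_iter) - iteration / max_iter


max_iter = 100  # hypothetical iteration budget
for t in (0, 30, 31, 100):
    print(t, round(transfer_operator(t, max_iter), 4), round(density_factor(t, max_iter), 4))
```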

  • Exploration phase and Exploitation phase

In the exploration phase, the objects are in a collision state, which corresponds to the condition Τf ≤ 0.5. A random material is selected and the acceleration of an object is updated according to Eq. (14) as

$${Acceleration}_j^{\tau_{iter}+1}=\frac{D_{\textrm{M}}+{V}_{\textrm{M}}+{Acceleration}_{\textrm{M}}}{D_j^{\tau_{iter}+1}\times {V}_j^{\tau_{iter}+1}}$$
(14)

where DΜ, VΜ, and AccelerationΜ denote the density, volume, and acceleration of the random material respectively.

If objects satisfy the condition of Τf > 0.5, the objects are in no collision state which means there are no collisions between the objects. The acceleration of an object is updated according to Eq. (15) as

$${Acceleration}_j^{\tau_{iter}+1}=\frac{D_{best}+{V}_{best}+{Acceleration}_{best}}{D_j^{\tau_{iter}+1}\times {V}_j^{\tau_{iter}+1}}$$
(15)

From Eq. (15), the density of the best object, volume of the best object, and acceleration of the best object are explained by using Dbest, Vbest, and Accelerationbest respectively.

  • Acceleration normalization

The acceleration is normalized according to Eq. (16),

$${Acceleration}_{j-n}^{\tau_{iter}+1}={U}_n\times \frac{Acceleration_j^{\tau_{iter}+1}-\mathit{\operatorname{Min}}(Acceleration)}{\mathit{\operatorname{Max}}(Acceleration)-\mathit{\operatorname{Min}}(Acceleration)}+{L}_n$$
(16)

The values of upper normalization Un and lower normalization Ln are 0.9 and 0.1 respectively. In general, the acceleration is high at the start and slowly decreases, which allows the solution to move towards the global best and avoid becoming trapped in local optima.
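The acceleration update and its normalization can be sketched together. The code follows Eqs. (14) to (16) exactly as they are written in this section; the function names and the handling of the degenerate case in which all accelerations are equal are assumptions.

```python
def acceleration(density_ref, volume_ref, accel_ref, density_j, volume_j):
    """Eqs. (14) and (15) as written above.

    The reference object is a random material when Tf <= 0.5 (Eq. 14)
    and the best object otherwise (Eq. 15).
    """
    return (density_ref + volume_ref + accel_ref) / (density_j * volume_j)


def normalize(accelerations, upper=0.9, lower=0.1):
    """Eq. (16): min-max scaling with Un = 0.9 and Ln = 0.1."""
    lo, hi = min(accelerations), max(accelerations)
    if hi == lo:
        # Assumption: when all values are equal, return the lower bound.
        return [lower for _ in accelerations]
    return [upper * (a - lo) / (hi - lo) + lower for a in accelerations]


raw = [acceleration(0.4, 0.6, 0.5, d, v) for d, v in ((0.5, 0.5), (0.8, 0.9), (0.3, 0.7))]
print(normalize(raw))
```

With these constants, the smallest acceleration maps to 0.1 and the largest to 1.0, because Eq. (16) adds Ln after scaling by Un.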

  • Update position

The position of the object is updated using Eq. (17) in the exploration phase and Eq. (18) in the exploitation phase as follows:

$${y}_j^{\tau_{iter}+1}={y}_j^{\tau_{iter}}+{Z}_1\times {Acceleration}_{j-n}^{\tau_{iter}+1}\times {D}_f\times \left({y}_r-{y}_j^{\tau_{iter}}\right)$$
(17)
$${y}_j^{\tau_{iter}+1}={y}_{best}^{\tau_{iter}}+\Im \times {Z}_2\times r\times {Acceleration}_{j-n}^{\tau_{iter}+1}\times {D}_f\times \left(I\times {y}_r-{y}_j^{\tau_{iter}}\right)$$
(18)

The values of Z1, Z2, Z3, and Z4 are 2, 6, 2, and 0.5 respectively, and the increasing variable is denoted by I. The flag ℑ changes the direction of motion and is given by

$${\displaystyle \begin{array}{l}\Im =\left\{\begin{array}{l}+1,\kern0.36em if\kern0.24em q\le 0.5\\ {}-1\kern0.48em if\kern0.36em q>0.5\end{array}\right.\\ {}q=2\times r-{Z}_4\end{array}}$$
(19)
$$I={Z}_3\times {\textrm{T}}_f$$
(20)
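The position update of Eqs. (17) to (20) can be sketched as below, following the equations as they are written in this section. Reusing the same random number r in Eqs. (18) and (19) and selecting the phase with the threshold Τf ≤ 0.5 are assumptions of this sketch, as are the function names.

```python
import random

Z1, Z2, Z3, Z4 = 2.0, 6.0, 2.0, 0.5  # constants stated in the text


def direction_flag(r):
    """Eq. (19): +1 when q = 2r - Z4 is at most 0.5, otherwise -1."""
    q = 2.0 * r - Z4
    return 1.0 if q <= 0.5 else -1.0


def update_position(y_j, y_rand, y_best, accel_norm, d_f, t_f, rng):
    """Eq. (17) while Tf <= 0.5 (exploration), Eq. (18) otherwise (exploitation)."""
    if t_f <= 0.5:
        return [y + Z1 * accel_norm * d_f * (yr - y)
                for y, yr in zip(y_j, y_rand)]
    r = rng.random()      # assumption: the same r feeds Eqs. (18) and (19)
    flag = direction_flag(r)
    increase = Z3 * t_f   # Eq. (20)
    return [yb + flag * Z2 * r * accel_norm * d_f * (increase * yr - y)
            for y, yr, yb in zip(y_j, y_rand, y_best)]


rng = random.Random(3)
print(update_position([0.2, 0.4], [0.9, 0.1], [0.5, 0.5], 0.6, 1.2, 0.8, rng))
```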

Finally, the objective function is computed to determine the global best solution of the current iteration.

HWAO-optimized MLPNN soil nutrient classification

The soil nutrient and pH level classification is conducted by finding optimal solutions in the HWAO-MLPNN model, and the optimal solutions are attained by balancing the exploitation and exploration phases. The parameters of the MLPNN model are optimized using the HWAO algorithm, and the procedures to predict the optimal solutions are provided in the following subsections.

Proposed HWAO algorithm

The proposed HWAO algorithm is a metaheuristic optimization algorithm that combines the WOA with the exploitation phase of the AOA.

The HWAO algorithm uses a spiral model and a search for prey to explore the search space, while the exploitation phase from AOA is used to exploit promising solutions. To balance these two phases, the algorithm introduces a new parameter called γ, which controls the probability of switching between the spiral model and the exploitation phase from AOA. This parameter depends on the search dimension of the problem and is used to give more flexibility in the exploration and exploitation phases. The step-by-step procedure of the proposed HWAO algorithm is described in Algorithm 1.

Algorithm 1 Pseudocode of proposed HWAO

The proposed HWAO algorithm is utilized for optimizing the parameters of the MLPNN, and the candidate solutions are randomly initialized within the search space boundaries. The algorithm then updates the positions of the solutions using the spiral model, the search for prey, and the exploitation phase from AOA to balance the exploration and exploitation phases of the optimization process. The γ parameter is used to control the probability of switching between the exploration and exploitation phases.
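One possible reading of the switching logic is sketched below. It is a hypothetical illustration and not a transcription of Algorithm 1: the function name `choose_phase`, the use of |B| > 1 to trigger the search for prey, and the treatment of γ as the probability of choosing the spiral model are all assumptions. The shrinking update of Eq. (5) is left out for brevity, and the way γ depends on the search dimension is not modelled.

```python
import random


def choose_phase(B, gamma, rng):
    """Pick the update rule for one search agent (hypothetical logic)."""
    if abs(B) > 1.0:
        return "search_for_prey"   # WOA exploration, Eq. (10)
    if rng.random() < gamma:
        return "spiral"            # WOA spiral model, Eq. (9)
    return "aoa_exploitation"      # AOA exploitation, Eq. (18)


rng = random.Random(5)
counts = {"search_for_prey": 0, "spiral": 0, "aoa_exploitation": 0}
for _ in range(1000):
    B = rng.uniform(-2.0, 2.0)     # early-run range of B, when b is close to 2
    counts[choose_phase(B, gamma=0.5, rng=rng)] += 1
print(counts)
```

Under this reading, a larger γ favours the spiral model and a smaller γ favours the AOA exploitation step.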

HWAO-optimized MLPNN soil nutrient classification (HWAO-MLPNN)

The classification problem in this research comprises two tasks: soil fertility index classification and pH prediction. For the fertility index task, the number of cultivation lands varies for each nutrient level. Similarly, for the pH prediction task, the aim is to predict the soil’s pH level, and there are four classes with a varying number of cultivation lands per class.

This method can enhance the prediction accuracy of soil nutrient classification using MLPNN by optimizing its parameters with the HWAO algorithm. The solution representation is a vector of real-valued parameters representing the weights and biases of the MLPNN, and the fitness function is the classification accuracy on a validation dataset. The optimization process aims to maximize the classification accuracy of soil nutrient levels by finding the optimal values of the parameters. The flowchart of the HWAO-optimized MLPNN model is shown in Fig. 3.

Fig. 3 Flowchart of HWAO-optimized MLPNN model

The ultimate goal of this research is to help farmers make informed decisions for sustainable agriculture and efficient resource use by accurately predicting soil nutrient levels. To achieve this goal, we propose an HWAO-optimized Multilayer Perceptron Neural Network (MLPNN) for soil nutrient classification. The optimization process aims to find the parameter values that maximize the classification accuracy of soil nutrient levels. The MLPNN has multiple layers of neurons that learn the features of the input data and map them to the output, which is the predicted nutrient level. To use the HWAO algorithm, we need to define a solution representation and a fitness function.

Algorithm 2 Pseudocode of HWAO optimized MLPNN (HWAO-MLPNN)

Solution representation: The solution representation in this problem is a vector of real-valued parameters that control the behavior of MLPNN and can be adjusted to maximize the classification accuracy of soil nutrient levels. The search space for these parameters is bounded to ensure that the solution representation is valid and feasible.

Fitness function: The fitness function of the problem is the classification accuracy of the MLPNN on a given dataset. The accuracy is computed using Eq. (22) by comparing the predicted nutrient levels of the MLPNN with the actual nutrient levels in the dataset.

To evaluate the fitness of each solution in the HWAO algorithm, we train the MLPNN using the solution representation and evaluate its classification accuracy on a validation dataset. The validation dataset is separate from the training dataset and is used to evaluate the generalization performance of the MLPNN. The fitness function is then computed as the classification accuracy on the validation dataset. Algorithm 2 shows the procedure to attain optimal classification accuracy.
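The solution representation and fitness evaluation can be illustrated with a small sketch. It assumes a single hidden layer with sigmoid activation and a flat vector that stores the hidden-layer weights, hidden biases, output weights, and output biases in that order; the helper names `decode`, `predict`, and `fitness` and this vector layout are hypothetical and are not taken from Algorithm 2. The sketch only scores a candidate vector on validation data and leaves out any additional training step.

```python
import math


def decode(vector, n_in, n_hidden, n_out):
    """Split a flat real-valued vector into (W1, b1, W2, b2)."""
    expected = n_hidden * n_in + n_hidden + n_out * n_hidden + n_out
    if len(vector) != expected:
        raise ValueError(f"expected {expected} parameters, got {len(vector)}")
    it = iter(vector)
    w1 = [[next(it) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [next(it) for _ in range(n_hidden)]
    w2 = [[next(it) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [next(it) for _ in range(n_out)]
    return w1, b1, w2, b2


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(params, x):
    """Forward pass; returns the index of the highest-scoring class."""
    w1, b1, w2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    scores = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    return max(range(len(scores)), key=scores.__getitem__)


def fitness(vector, x_val, y_val, n_in, n_hidden, n_out):
    """Classification accuracy of the decoded MLP on the validation set."""
    params = decode(vector, n_in, n_hidden, n_out)
    correct = sum(predict(params, x) == y for x, y in zip(x_val, y_val))
    return correct / len(y_val)
```

An optimizer such as HWAO would call `fitness` once per candidate vector in each iteration and keep the vector with the highest validation accuracy.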

Experimental results

This section describes the experimental analysis of the proposed HWAO-MLPNN model. The experiments are performed in Python, and the village-wise fertility index data are taken from the Marathwada dataset (Sirsat et al. 2017). The soil nutrients considered are the four nutrients P, K, OC, and B. The soil pH level is categorized into SLA, MA, HA, and SA based on the cultivation lands. The experimental tool, parameters, and performance analysis are explained in the following sections.

Experimental setup

The pH level and soil nutrient classification experiments are conducted in Python 3.10.10 on a machine with an Intel(R) Core(TM) i3-1005G1 CPU @ 1.20 GHz, 4 GB RAM, and a 64-bit Windows 11 operating system.

Hyperparameter configuration

Hyperparameter tuning is applied to improve the classification performance and reduce the loss of the proposed HWAO-MLPNN. The optimized parameters of each technique are provided in Table 2. For the MLPNN, the activation function is sigmoid, the number of hidden layers is 1, the number of hidden neurons is 35, the learning rate is 0.001, the decay rate is 0.002, and the momentum is 0.05. For the WOA, the convergence factor ranges over [2, 0], the activation function (a) is 1, the tested function and the population size are set as 30, and the maximum number of iterations is 500. For the AOA, the upper and lower values of the normalization range are 0.9 and 0.1, the position update constants Z1, Z2, Z3, and Z4 are 2, 6, 2, and 0.5, the population size is 30, the number of iterations is 1000, and the problem definition value is 10.

Table 2 Hyperparameter settings

Dataset description

The classification of soil nutrients and the validation of pH are carried out using the Marathwada dataset. The data are captured from the Marathwada region, which is located in Maharashtra, India, and the dataset uses important nutrients such as P, K, OC, and B for classification. The proposed HWAO-MLPNN is validated on this dataset, where it offers better classification accuracy than other state-of-the-art methods. The MLPNN model with sigmoid activation function uses 50 hidden neurons for soil nutrient classification and 150 hidden neurons for pH classification.

Performance evaluation measures

The performance is evaluated using MSE, AUC/ROC, accuracy (\({N}_{accuracy}\)), kappa (\({N}_k\)), specificity (\({N}_{specificity}\)), precision (\({N}_{precision}\)), F1-score (\({N}_{F1- score}\)), G-mean (\({N}_{G- mean}\)), and sensitivity (\({N}_{sensitivity}\)). These measures depend on the true positive value (\({N}_{TP}\)), true negative value (\({N}_{TN}\)), false positive value (\({N}_{FP}\)), and false negative value (\({N}_{FN}\)). The true positive and true negative values are the correctly predicted positive and negative values respectively. A false positive is a negative instance that is incorrectly predicted as positive, and a false negative is a positive instance that is incorrectly predicted as negative. The mathematical forms and the corresponding definitions are given below:

  • Mean Square Error (MSE)

The MSE measures the average squared error of the model, that is, how close the predicted values are to the actual values. The mathematical formulation of MSE is expressed as

$$MSE=\frac{1}{\textrm{N}}\sum \limits_{k=1}^{\textrm{N}}{\left({a}_k-{\hat{a}}_k\right)}^2$$
(21)

The number of instances, actual value, and predicted value are denoted by \(\textrm{N}\), \({a}_k\), and \({\hat{a}}_k\) respectively.
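As a small worked illustration of Eq. (21), the function below averages the squared differences between actual and predicted values. The function name `mse` and the sample values in the usage note are made up for illustration and are not taken from the experiments.

```python
def mse(actual, predicted):
    """Mean squared error of Eq. (21) for two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
```

For example, with actual values [1, 0, 1, 1] and predicted values [1, 1, 1, 0], two of the four predictions are off by one, so the squared errors sum to 2 and the MSE is 2/4 = 0.5.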

  • Area Under the Receiver Operating Characteristic Curve (AUC/ROC)

The ROC curve shows the classification performance at various classification thresholds and is drawn by plotting the true positive rate against the false positive rate. The AUC measures the two-dimensional area under the ROC curve and ranges from 0 to 1.
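One common way to obtain the curve and its area is to sweep the threshold over the predicted scores and integrate with the trapezoidal rule. The sketch below assumes a binary (for example one-vs-rest) setting with labels 0 and 1 and at least one instance of each; the function name `roc_auc` is hypothetical, and the source does not state how the AUC was computed.

```python
def roc_auc(y_true, scores):
    """Return (ROC points, AUC) for binary labels and real-valued scores."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Lower the threshold step by step, from the highest score to the lowest.
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    tp = fp = 0
    points = [(0.0, 0.0)]          # (false positive rate, true positive rate)
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Process tied scores together so they yield a single ROC point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))

    # Trapezoidal rule over consecutive ROC points.
    auc = sum((x1 - x0) * (y0 + y1) / 2.0
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc
```

The resulting area equals the fraction of positive-negative pairs in which the positive instance receives the higher score, which is a convenient sanity check on small inputs.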

  • Accuracy (Naccuracy)

Accuracy measures the closeness of the predictions to the actual values and indicates the quality of classification. It is computed as the ratio of correctly classified samples to the total number of samples taken for the classification process, as given in Eq. (22):

$${N}_{accuracy}=\frac{{N}_{TP}+{N}_{TN}}{{N}_{TP}+{N}_{TN}+{N}_{FP}+{N}_{FN}}$$
(22)
  • Kappa statistics (Nk)

The kappa statistic measures interrater reliability while controlling for the random agreement factor. The kappa value is evaluated by applying the equation below:

$${N}_k=\frac{P_a-{P}_p}{1-{P}_p}$$
(23)

In the above equation, \({P}_a\) denotes the observed agreement between the actual and predicted values, and \({P}_p\) denotes the agreement expected by chance.
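Eq. (23) can be evaluated directly from a confusion matrix. The sketch below assumes the standard Cohen's kappa reading of the two terms, with the first probability taken as the observed agreement (the diagonal share) and the second as the agreement expected by chance from the row and column totals; the function name `cohen_kappa` and the counts in the usage note are illustrative only.

```python
def cohen_kappa(cm):
    """Kappa of Eq. (23) from a square confusion matrix (list of rows)."""
    k = len(cm)
    n = sum(sum(row) for row in cm)
    # Observed agreement: share of samples on the diagonal.
    p_a = sum(cm[i][i] for i in range(k)) / n
    # Chance agreement: sum over classes of (row total * column total) / n^2.
    p_p = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(k)) / (n * n)
    return (p_a - p_p) / (1.0 - p_p)
```

For the made-up matrix [[20, 5], [10, 15]], the observed agreement is 35/50 = 0.7 and the chance agreement is (25 × 30 + 25 × 20)/2500 = 0.5, giving a kappa of (0.7 − 0.5)/(1 − 0.5) = 0.4.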

  • Specificity (Nspecificity)

Specificity measures how well the negative instances are identified and is calculated as the ratio of true negative values to the sum of true negative and false positive values. This measure is widely used in the medical field, where it indicates how accurately those who do not have a disease are identified. The specificity is calculated by using the equation below:

$${N}_{specificity}=\frac{N_{TN}}{N_{TN}+{N}_{FP}}$$
(24)
  • Precision (Nprecision)

In general, precision refers to the closeness of two or more measurements to each other. For classification, precision is the ratio of true positive values to the sum of true positive and false positive values. Equation (25) gives the precision of soil nutrient and pH classification.

$${N}_{precision}=\frac{N_{TP}}{N_{TP}+{N}_{FP}}$$
(25)

  • F1-score (NF1-score) and geometric mean or G-mean (NG-mean)

The harmonic mean of precision and sensitivity is known as F1-score and the geometric mean is calculated from the combined measurement of specificity and sensitivity. The equations for F1-score and geometric mean are represented as

$${N}_{F1- score}=2\ast \frac{N_{precision}\ast {N}_{sensitivity}}{N_{precision}+{N}_{sensitivity}}$$
(26)
$${N}_{G- mean}=\sqrt{N_{specificity}\ast {N}_{sensitivity}}$$
(27)
  • Sensitivity (Nsensitivity)

Sensitivity is the ratio of the number of correctly predicted positive instances to the total number of actual positive instances. A high sensitivity value indicates that the soil nutrient and pH levels are correctly classified. It is computed by using the equation below:

$${N}_{sensitivity}=\frac{N_{TP}}{N_{TP}+{N}_{FN}}$$
(28)
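Equations (22) and (24) to (28) can be evaluated together from the four counts. The sketch below is a direct transcription of those formulas; the function name `classification_metrics` and the counts in the usage note are made up for illustration and are not results from this study.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Evaluate Eqs. (22) and (24)-(28) from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)        # Eq. (22)
    specificity = tn / (tn + fp)                      # Eq. (24)
    precision = tp / (tp + fp)                        # Eq. (25)
    sensitivity = tp / (tp + fn)                      # Eq. (28)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (26)
    g_mean = math.sqrt(specificity * sensitivity)     # Eq. (27)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1,
        "g_mean": g_mean,
    }
```

With 40 true positives, 45 true negatives, 5 false positives, and 10 false negatives, the accuracy is 85/100 = 0.85, the specificity is 45/50 = 0.9, and the sensitivity is 40/50 = 0.8. The function does not guard against empty denominators, so a class with no positive predictions would need separate handling.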

Performance analysis

The performance of soil nutrient and pH level classification is evaluated using confusion matrices. A confusion matrix presents the classification results in a tabular format that can be easily interpreted, showing the number of true positive, true negative, false positive, and false negative values for each class. The matrix is generated by comparing the actual values with the classification results obtained from the model. In this study, a total of 620 cultivation lands were used for soil nutrient and pH level classification. The confusion matrix for soil nutrient classification is shown in Fig. 4, where the nutrient levels are categorized into low, medium, and high. The 3×3 matrix represents the number of cultivation lands that belong to each nutrient level.

Fig. 4 Confusion matrix for soil nutrient classification

Similarly, the confusion matrix for pH level classification is depicted in Fig. 5, where the pH levels are classified into SLA, MA, HA, and SA. The 4×4 matrix illustrates the number of cultivation lands that fall into each pH level category. The diagonal values of the matrix correspond to the correctly predicted values, while the off-diagonal values represent misclassifications.
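For a multi-class matrix such as these, the per-class true positive, true negative, false positive, and false negative counts can be read off in a one-vs-rest fashion. The sketch below assumes that rows hold the actual classes and columns hold the predicted classes; this orientation, the function name `one_vs_rest_counts`, and the small matrix in the usage note are assumptions for illustration and do not reproduce Fig. 4 or Fig. 5.

```python
def one_vs_rest_counts(cm):
    """Per-class (TP, TN, FP, FN) from a square confusion matrix.

    Assumes cm[i][j] counts samples of actual class i predicted as class j.
    """
    n = sum(sum(row) for row in cm)
    counts = []
    for k in range(len(cm)):
        tp = cm[k][k]                          # diagonal entry
        fn = sum(cm[k]) - tp                   # rest of the actual-class row
        fp = sum(row[k] for row in cm) - tp    # rest of the predicted-class column
        tn = n - tp - fn - fp                  # everything else
        counts.append((tp, tn, fp, fn))
    return counts
```

For the made-up matrix [[5, 1, 0], [2, 6, 1], [0, 1, 4]], the first class has 5 true positives, 1 false negative, 2 false positives, and 12 true negatives out of 20 samples.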

Fig. 5 Confusion matrix for pH level classification

The Receiver Operating Characteristic (ROC) curve is a graphical representation used to evaluate the classification performance of a model at different threshold levels, and the area under it (AUC) summarizes that performance in a single value. Figure 6 displays the AUC/ROC curve of the proposed HWAO-MLPNN model together with those of the other methods. The curve is generated by plotting the true positive rate against the false positive rate. The proposed model achieved an AUC of 0.981, which is the highest among all compared methods. The AUC/ROC graph shows how the true and false positive rates change as the threshold varies from 0 to 1.

Fig. 6 AUC/ROC representation

In Fig. 7, a graphical representation of the cross-validation accuracy analysis concerning the number of hidden neurons for soil nutrient classification is shown. The proposed HWAO-MLPNN model is evaluated for four soil nutrients, namely P, K, OC, and B. The results show that the cross-validation accuracy attained using the proposed HWAO-MLPNN model is 94% for P, 85% for K, 82% for OC, and 96% for B.

Fig. 7 Cross-validation accuracy analysis for soil nutrient classification

Figure 8 shows the cross-validation accuracy analysis for the classification of soil pH levels using different numbers of hidden neurons. Soil pH is an important parameter that indicates the level of acidity or basicity in the soil, and is generally categorized as SLA, MA, HA, and SA. The analysis reveals that the proposed HWAO-MLPNN model achieves a high cross-validation accuracy of 91.3% for SLA, 90% for MA, 92% for HA, and 93.4% for SA, with only small variations observed across the different pH levels.

Fig. 8 Cross-validation accuracy for pH level classification

The overall performance of the proposed HWAO-MLPNN model is summarized in Table 3, which includes the cross-validation accuracy, MSE, accuracy, specificity, sensitivity, geometric mean, kappa, F1-score, and AUC/ROC.

Table 3 Overall performance analysis

Comparison study

The proposed HWAO-MLPNN model is compared with other methods for performance evaluation, including Particle Swarm Optimization–based Multilayer Perceptron Neural Network (PSO-MLP) (Houssein et al. 2021), Convolutional Backpropagation-based Multilayer Perceptron (BP-MLP) neural network (Blesslin Sheeba et al. 2022), RBF, RF, SVReg, ELM, and Gaussian Process (GP). Figure 9 shows an accuracy analysis of all these methods. Accuracy measures the closeness of the predicted values to the actual values of the classification model. The proposed HWAO-MLPNN model shows the best performance with an accuracy of 98.1%. The accuracies of the other methods are 90%, 85%, 93%, 90.9%, 96.3%, 97.2%, and 95.6% for BP-MLP, RBF, SVReg, RF, GP, PSO-MLP, and ELM, respectively.

Fig. 9 Accuracy analysis

Figure 10 compares the specificity of the various methods, including BP-MLP, RBF, SVReg, RF, GP, the proposed HWAO-MLPNN model, PSO-MLP, and ELM. Specificity is a crucial measure of how reliably negative instances are identified without being misclassified as positive. The proposed HWAO-MLPNN model achieves the highest specificity value of 97.2%, while the RBF model has the lowest specificity value of 86.8%.

Fig. 10 Analysis of the specificity

Figure 11 displays the precision evaluation of the various methods, including the proposed HWAO-MLPNN model, BP-MLP, RBF, PSO-MLP, ELM, SVReg, RF, and GP. Precision is a metric that measures the accuracy of positive predictions. The proposed HWAO-MLPNN model achieved the highest precision of 98.5% among all the methods. The other methods attained precision values ranging from 85.1% (ELM) to 97.1% (GP). The proposed HWAO-MLPNN model therefore outperforms all other methods in terms of precision for soil nutrient and pH level classification.

Fig. 11 Comparative analysis of precision

Figure 12 compares the sensitivity of several methods, including GP, BP-MLP, PSO-MLP, SVReg, the proposed HWAO-MLPNN model, RBF, ELM, and RF. Here, sensitivity is the proportion of actual positive instances that are correctly classified, as defined in Eq. (28). The proposed HWAO-MLPNN model outperforms other state-of-the-art methods in terms of sensitivity, achieving a rate of 97.8%.

Fig. 12 Sensitivity analysis

Figure 13 presents a Mean Squared Error (MSE) analysis for the different methods, including the proposed HWAO-MLPNN model, SVReg, GP, BP-MLP, PSO-MLP, RBF, ELM, and RF. The MSE measures the difference between the predicted and actual values, and lower MSE values indicate a better fit of the model. However, MSE is less suitable as a loss for binary classification, as it does not guarantee a reduction in the cost function. The proposed HWAO-MLPNN model achieved the lowest MSE value of 0.019, indicating the best fit among all the methods compared.

Fig. 13 Analysis of MSE

Figure 14a and b display the F1-score analysis and G-mean analysis, respectively, for the compared methods, including GP, BP-MLP, PSO-MLP, RBF, ELM, SVReg, the proposed HWAO-MLPNN model, and RF. The F1-score is generally used to evaluate binary classification performance, particularly when positive values require more attention in the classification model, while the G-mean reflects how well accuracy is balanced across the classes. The proposed HWAO-MLPNN model shows the highest performance in both measures, with an F1-score of 96.3% and a G-mean of 94.5%, outperforming all other methods compared in this analysis.

Fig. 14 A comparison study of a F1-score and b G-mean

The cross-validation accuracy analysis of different methods such as PSO-MLP, RBF, GP, BP-MLP, the proposed HWAO-MLPNN model, RF, ELM, and SVReg is performed by using several hidden layers. Figure 15a and b describe the cross-validation accuracy during soil pH level classification and soil nutrient level classification respectively. Fertilizers change the soil pH level, which may increase or decrease the nutrient levels in soil. In Fig. 15a, the proposed HWAO-MLPNN model attained a high cross-validation accuracy of 98.3% for soil pH level classification. In Fig. 15b, the HWAO-MLPNN model attained a high cross-validation accuracy of 97.9% for soil nutrient classification compared with other state-of-the-art methods.

Fig. 15 Cross-validation accuracy analysis for a pH level classification and b soil nutrient classification

The proposed HWAO-MLPNN model has the following managerial implications. The results show that the HWAO-MLPNN system accurately classifies the soil nutrients and the pH level. This can enhance the quality of the soil environment and, ultimately, the profitability of agriculture. The cost of the HWAO-MLPNN model is also lower than that of the other methods, as shown in Fig. 16a, so that good classification of the soil nutrients and the pH level can be achieved at a low cost. The time taken by the system is also minimized, as shown in Fig. 16b, and it stays low as the number of tests increases. This indicates that the HWAO-MLPNN model can attain better classification of soil nutrients and pH levels.

Fig. 16 a Cost analysis. b Time analysis

Figure 17 shows the performance analysis of the proposed and the existing optimization algorithms. Existing optimization algorithms such as ACO (Nazari et al. 2022), ABC (Andrushia and Patricia 2020), and FOA (Hu et al. 2021) are compared with the proposed HWAO. ACO attains a performance of 93.2%, ABC attains 89.6%, and FOA attains 91.8%, whereas the proposed HWAO attains 98.1%. Thus, the proposed HWAO performs better than the other optimization algorithms.

Fig. 17 Performance analysis of the optimization algorithms

Statistical analysis

The statistical analysis of the HWAO-MLPNN and the existing methods RBF, SVReg, and PSO-MLP is shown in Table 4. To compare the effect of the HWAO-MLPNN method with the existing methods, a t-test was performed on the AUC of the models. If the p-value is less than 0.05, the null hypothesis is rejected. Each column of the t-test contains 20 samples, which represent the AUC of each subject. The HWAO-MLPNN method has the best AUC of 0.986, which is higher than that of the other methods; the effect size is denoted by es.
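The comparison can be illustrated with a short sketch that computes a two-sample t statistic and an effect size from two columns of AUC values. The source does not state which t-test variant or effect size definition was used, so the unequal-variance (Welch) statistic and Cohen's d with a pooled standard deviation are assumptions here, as are the function names. The p-value is left out because it requires the t distribution, which a statistics library would normally supply.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Two-sample t statistic without assuming equal variances."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


def cohens_d(sample_a, sample_b):
    """Effect size: mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    pooled = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / pooled
```

In the setting described above, each input would be one column of 20 AUC values. A large positive t statistic together with a large effect size would support rejecting the null hypothesis that the two methods have the same mean AUC.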

Table 4 Statistical analysis

Discussion and conclusion

Agriculture is the backbone of the Indian economy, which is sustained by the fertility of the soil. Soil is essential for food production; increasing soil quality also enhances the quality of the food, whereas a lack of soil nutrients can significantly reduce crop yields. Soil fertility can also be analyzed in terms of waste production, cost, and time consumption. This paper analyzes the classification accuracy of soil nutrients and pH levels, which can lead to improved soil quality, reduced fertilizer use, and enhanced quality of the environment.

Findings

The proposed HWAO-MLPNN model has demonstrated promising results in improving the classification accuracy of soil nutrients and pH levels in the Marathwada region. By utilizing the HWAO algorithm, the model effectively balanced the exploration and exploitation phases to achieve optimal solutions. The hyperparameter tuning process further improved the classification performance and reduced the loss. The analysis of MSE, AUC/ROC, accuracy, kappa statistics, specificity, precision, F1-score, geometric mean, cross-validation accuracy, and sensitivity showed that the proposed HWAO-MLPNN model outperformed other state-of-the-art methods such as PSO-MLP, BP-MLP, RBF, RF, MLP, SVReg, ELM, and GP. The model achieved an accuracy of 98.1%, precision of 98.5%, specificity of 97.2%, sensitivity of 97.8%, F1-score of 96.3%, G-mean of 94.5%, MSE of 0.019, and AUC of 0.981. These results demonstrate the effectiveness of the proposed model in accurately classifying soil nutrients. Moreover, the HWAO-MLPNN method can be further enhanced by integrating it with other efficient neural networks for fertilizer recommendation and for classifying more soil nutrients such as copper (Cu), zinc (Zn), iron (Fe), nitrogen (N), and sulfur (S). Additionally, the model can be used to help balance soil pH levels and maintain soil fertility when larger amounts of fertilizer are applied to cultivated land. Overall, this research work provides a comprehensive approach to improving soil fertility management, which can be beneficial for farmers in the Marathwada region and other regions with similar agricultural practices. By accurately classifying soil nutrients and pH levels, farmers can optimize their fertilizer application and ensure sustainable agricultural practices.

Research limitations

Soil fertility is a crucial aspect of crop production, and understanding the nutrient levels in the soil is vital for sustainable agriculture. Soil nutrient classification and prediction are complex tasks that require accurate data, advanced analytical techniques, and efficient algorithms. The proposed system only identifies the soil nutrient and pH levels and cannot recommend crop location based on the soil. It also cannot perform the time-series analysis needed to track changes in soil properties over time. Image-based classification is likewise not supported by the proposed system.

Recommendations for future research

The proposed HWAO-MLPNN model has shown promising results for soil nutrients and pH level classification. However, there is still room for improvement and further research can be conducted to enhance its performance. One area of future study is to perform soil classification by using CNN and RNN. CNNs are particularly suitable for image-based classification tasks and could be used to process satellite imagery for land cover classification. RNNs, on the other hand, can be used for time-series analysis and could be applied to analyze changes in soil properties over time.