1 Introduction

The agriculture sector is a crucial source of livelihood for people all around the globe. In addition to being the primary source of food, this sector plays a pivotal role in a country’s economy and creates employment opportunities. Worldwide, India ranks second among food producers, and agriculture contributes more than 50% of total national employment [1]. Furthermore, the share of agriculture and allied sectors reached 18.3% of the country’s gross value added in 2022–2023 [2]. However, rapid population growth has increased food demand, putting pressure on agricultural productivity. Moreover, 86% of all farmers in India are small and marginal, holding less than two hectares of land [3].

Furthermore, these small and marginal farmers still use conventional or customary farming practices. For instance, they rely on inherited knowledge for crop selection, preferring crops that are traditional or popular in their region. As a result, crop yield and land fertility may suffer, reducing farming returns. Increased soil acidity is one of the major consequences of inappropriate crop selection and insufficient soil nutrients. Further, environmental conditions, climate variability, and water levels influence crop quality and productivity. Crop selection is therefore an essential factor in increasing agricultural productivity and quality. Motivated by these challenges, this research identifies and addresses issues that arise in the production of crops.

The fusion of contemporary technologies with conventional sectors has the capacity to revolutionize established practices, and a notable instance of this synergy is the incorporation of data science and data mining in agriculture [4]. Data science in agriculture is emerging as a transformative power, providing sophisticated analytics and predictive modeling methods that assist farmers in maximizing yields, managing resources, and adapting to shifting environmental conditions. The integration of data science techniques with agricultural practices is reshaping farming methods for the modern era, leading to increased productivity and reduced costs. Further, the utilization of cutting-edge technologies like machine learning (ML) [5], deep learning [6, 7], big data analytics [8], and the internet of things [9, 10] has proven to be highly beneficial to the agricultural industry [11].

A machine learning-based Smart Crop Recommendation (SCR) framework is proposed to address the crop selection dilemma by considering factors such as temperature, precipitation, soil pH, humidity, and soil nutrients. In-situ soil and environmental factors are important for farming. Specifically, soil nutrients such as Nitrogen (N), Phosphorus (P), and Potassium (K) are essential for plant growth and for preventing disease. Soil pH, which indicates how acidic or alkaline the soil is, governs the chemical reactions in the soil. In addition, the development of plants is highly affected by soil electrical conductivity, which indicates soil fertility, water quality, and salinity. Rainfall is another critical factor for crop yield, as different crops may require different amounts of water.

The government has made efforts to improve agricultural productivity by providing soil health cards (SHCs) to individual farmers after analyzing the soil of their farms. An SHC lists the macro- and micronutrient levels of the soil on the corresponding farm. However, farmers’ traditional cultivation approaches fail to utilize this information to improve agricultural productivity.

The SCR framework is a simple and cost-effective approach that uses machine learning techniques to recommend crops based on local parameters. The main contributions of this paper are summarized as follows:

  • A Smart Crop Recommendation framework comprising two distinct phases is proposed to guide farmers in selecting optimal crops for enhanced returns.

  • The initial phase incorporates an artificial neural network (ANN) model designed to filter out unsuitable crops by considering farm soil nutrients and regional weather conditions.

  • The subsequent phase employs a regressor model utilizing the random forest to predict crop yields accurately.

  • The final crop recommendations are determined based on maximizing profit, considering both the cost of production (COP) and market price (MP).

  • Extensive experiments are conducted to demonstrate the efficacy of the proposed framework.

The rest of the paper is structured as follows: Sect. 2 provides a summary of related works, and Sect. 3 delves into the proposed methodology for the crop recommendation framework. In Sect. 4, the experimental setup, data exploration, and evaluation metrics are detailed. The analysis of the results for both phases is presented in Sect. 5, followed by a discussion. Finally, Sect. 6 concludes the work with the scope for future research.

2 Related Works

Digital agriculture, also known as smart farming or e-agriculture, is the utilization of digital tools and technologies to collect, store, analyze, share, and mine electronic data within the agricultural sector [12, 13]. Research in digital agriculture has made significant progress toward its goals of crop selection, yield prediction, and real-time farm management [14]. Cheema et al. [15] devised a diversified crop model utilizing various soil parameters to identify suitable crops. Their model employed a quantum value-based gravitational search algorithm (GSA) to optimize solutions, considering soil factors like pH, salinity, texture, nitrogen, phosphorous, and potassium as inputs for crop selection. Bakthavatchalam et al. [16] proposed a crop prediction system leveraging multilayer perceptron (MLP), JRip, and decision table classifiers based on diverse attributes. An implementation in the WEKA tool showed that the MLP was the best-performing model, with an accuracy of 98.22%.

Table 1 Summary of the relevant works

Jain et al. [17] proposed a soil-based machine learning comparative analytical framework that assesses soil characteristics and climate factors to predict crop yield classes (high, low, and medium). The comparative analysis shows that the support vector machine (SVM) achieved maximum accuracies of 85.62% and 75.64% for wheat and maize, respectively. Gupta et al. [18] presented a crop recommendation system integrating MapReduce and K-means clustering, considering crop yields per acre for various regions and different varieties grown in the target area. Mariammal et al. [19] proposed a feature selection technique named modified recursive feature elimination (MRFE) for crop prediction, aiming to identify essential features from crop data. Their approach demonstrated that MRFE outperformed various wrapper-based feature selection techniques utilizing a ranking algorithm, achieving an accuracy of 95%. Shams et al. [29] proposed XAI-CROP, a crop recommendation system leveraging explainable artificial intelligence (XAI) for transparency. The study extensively compares XAI-CROP with various machine learning models, reporting superior performance through low MSE (0.9412) and MAE (0.9874), indicating highly accurate crop yield predictions. The \(\text {R}^{\text {2}}\) value of 0.94152 indicates that XAI-CROP explains 94.15% of data variability, showcasing its interpretability and reliability.

Swathi et al. [26] proposed a model for crop classification and prediction based on soil nutrition in India to address issues of low yield. Various machine learning models are employed on datasets collected from Kaggle, including six crop types and 11 nutrients. The results indicate that extreme gradient boosting and naive bayes outperform other models with AUC scores of 0.994 and 0.993, respectively. Bandi et al. [27] proposed a voting classifier-based crop recommendation system, leveraging machine learning to enhance precision agriculture. The system addresses the challenges faced by farmers in optimizing crop production based on climate and soil properties by utilizing ensemble modeling with majority voting, and it achieved an impressive accuracy of 99.4%. This approach aims to minimize financial losses for farmers and enhance informed agricultural decision-making.

Khosla et al. [20] employed various models, such as support vector regression (SVR), random forest (RF), linear regression, and k-nearest neighbors, to predict crop yield across four major Kharif crops. They first forecasted rainfall using a modular artificial neural network model and used this prediction as input to SVR for crop yield estimation, finding that SVR outperformed the other machine learning models. Gopal et al. [21] introduced a hybrid model combining multiple linear regression (MLR) and an ANN for yield forecasting. The hybrid model used the MLR intercept and coefficients to initialize the input layer weights and bias, showing superior performance on paddy crops compared to conventional ML techniques. Devi et al. [31] conducted a study to identify significant factors affecting agricultural production, utilizing ordinary least squares and ridge regression. Time series data were used to measure the variability in the area, production, and yield for four selected crops based on adjusted \(\text {R}^{\text {2}}\) and RMSE values.

Elavarasan et al. [22] leveraged reinforcement learning to predict crop yield. They employed a Deep Recurrent Q-Network based on input parameters, achieving a notable 93.7% accuracy in crop yield anticipation. Olisah et al. [30] presented a deep neural network regressor (DNNR) for corn yield prediction to address the interaction between weather and soil variables. Outperforming random forest and extreme gradient boosting regressors, the DNNR achieved impressively low prediction errors of 0.0146 t/ha and 0.0209 t/ha. The study emphasized empowering smallholder farmers with a mobile application decision support system, thus incorporating education and farmer-to-market access modules for intelligent farming decisions and potential impact on food crises. Daniel et al. [23] proposed a web application for crop recommendation to aid farmers in selecting effective crops and organic fertilizers. Their algorithm incorporated a deep neural network to predict prices, enhancing farmers’ decision-making in crop selection.

Table 1 provides a comparative analysis of various studies on crop recommendation and yield prediction. The analysis reveals a significant need for improvement in the literature, as existing works have focused on a limited set of parameters to predict crop suitability. For example, [15, 17], and [23] solely utilized soil parameters to recommend crops, while [20] relied exclusively on rainfall as an input parameter for predicting crop yield. Notably, a specific soil type may be suitable for various crops, but the yield can be adversely affected if climatic conditions are unfavorable. This research proposes a crop prediction architecture to address these limitations.

3 Methodology

Small and marginal farmers frequently find themselves entangled in a cycle of decreased production, leading to insufficient earnings, limited savings, and minimal investments. Their struggles with reduced crop yields and profits are rooted in a limited understanding of crop selection and of the factors influencing crop growth. Since crop selection is the most critical factor in maximizing crop yield and profitability, this work aims to develop a smart crop recommendation framework for enhancing agricultural returns. The framework helps farmers decide on suitable crops based on various local parameters.

Fig. 1 Proposed framework

Let \(C_1\), \(C_2\), ..., \(C_i\) be ‘i’ different crops and \(F_1\), \(F_2\), ..., \(F_j\) be ‘j’ different farmlands. Each farmer is assumed to have an SHC for their farmland ‘\(F_j\)’ that provides the soil nutrient levels, and to receive regular meteorological updates from government agencies. The goal is to find suitable crops for each farmland based on soil and weather inputs. Figure 1 presents the proposed framework, which recommends diverse crops for each farm using a two-phase process. The first phase filters ‘n’ crops for each land ‘\(F_j\)’ from the available ‘i’ crops by matching the suitability of the crops to the local soil and weather conditions. The filtered crops are then fed to the next phase for further analysis. The second phase estimates the yield of each filtered crop on the farmer’s available land. The yield estimate is used to compute the return for each crop individually. Further, the cost of cultivation and the market price are used to estimate the net profit for each crop, and a list of crops is recommended to the farmer along with the net profit. Each of these phases is elaborated in the following subsections.

3.1 Crop Filter

Figure 2 depicts the first phase, which filters ‘n’ suitable crops. For each farm ‘\(F_j\)’, let \(w_1\)(t), \(w_2\)(t), ..., \(w_k\)(t) be the weather conditions, such as temperature and rainfall, at time t, and let \(s_1\)(t), \(s_2\)(t), ..., \(s_l\)(t) be the soil attributes, such as N, P, and K. The regular weather update \(w_k\)(t) at time ‘t’ is provided to farmers by the meteorological department or local government agencies for advance planning. In addition, the government-provided soil health card reports 12 soil parameters covering macro- and micronutrients: pH, electrical conductivity (EC), organic carbon (OC), nitrogen (N), phosphorus (P), potassium (K), sulphur (S), zinc (Zn), boron (B), iron (Fe), manganese (Mn), and copper (Cu). Crop growth is directly impacted by weather and soil conditions. Hence, the soil parameters are taken from the farmer’s SHC and the weather updates from government agencies in order to select the most suitable crops. A deep learning model is proposed to compute the probabilities \(p_1\), \(p_2\), ..., \(p_x\) from the input parameters and to rank the crops by these probabilities. Finally, the top ‘n’ crops are filtered and passed on to the second phase for further analysis.
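The ranking step at the end of this phase can be illustrated with a short Python sketch. It assumes the model has already produced one probability per crop; the function name `top_n_crops`, the crop list, and the probability values are hypothetical and are used only for illustration.

```python
def top_n_crops(probabilities, crop_names, n):
    """Rank crops by predicted probability and keep the n most suitable ones."""
    ranked = sorted(zip(crop_names, probabilities),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Hypothetical probabilities for one farm (not taken from the paper's data).
crops = ["rice", "maize", "papaya", "apple", "watermelon"]
probs = [0.05, 0.10, 0.40, 0.15, 0.30]

filtered = top_n_crops(probs, crops, n=3)
print(filtered)
```

The value of `n` is left as a parameter because the framework allows the number of filtered crops to be adjusted.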

Fig. 2 Crop filtration phase

Figure 3 depicts the architecture of the proposed ANN model used for the first phase. In a feed-forward ANN trained with backpropagation, the weights and biases, number of hidden layers, number of hidden neurons, learning rate, and number of training epochs are essential parameters affecting prediction accuracy. Hence, a trial-and-error approach was used to select parameter values that give accurate predictions. Seven inputs are given to the input layer, and a weighted sum of the inputs plus a bias is passed to the hidden layer. ReLU is used as the activation function for the hidden layers, whereas the softmax activation function is used in the output layer to predict probabilities. The input layer contains seven nodes, each hidden layer contains 512 nodes, and the output layer contains 17 nodes, one for each crop.
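The forward pass of such a network can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the trained model: the text does not give the number of hidden layers, so two are assumed here; the weights are random placeholders rather than learned values; and the order of the seven input features in `sample` is a guess.

```python
import math
import random

N_INPUTS, N_HIDDEN, N_CROPS = 7, 512, 17   # layer sizes stated in the text
N_HIDDEN_LAYERS = 2                        # assumption: not specified in the text


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (placeholders for trained values)."""
    weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Weighted sum of the inputs plus bias, one value per output node."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(z):
    return [max(0.0, v) for v in z]


def softmax(z):
    """Numerically stable softmax: outputs are positive and sum to 1."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, layers):
    """ReLU on every hidden layer, softmax on the output layer."""
    h = x
    for layer in layers[:-1]:
        h = relu(dense(h, layer))
    return softmax(dense(h, layers[-1]))


rng = random.Random(0)
sizes = [N_INPUTS] + [N_HIDDEN] * N_HIDDEN_LAYERS + [N_CROPS]
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# Hypothetical, already-scaled inputs; assumed order: N, P, K, temperature,
# humidity, pH, rainfall.
sample = [0.6, 0.4, 0.3, 0.5, 0.8, 0.55, 0.7]
probs = forward(sample, layers)
print(len(probs), round(sum(probs), 6))
```

With untrained weights the 17 probabilities are close to uniform; after training, the largest entries would identify the crops passed on to the second phase.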

Fig. 3 Proposed ANN architecture for crop filtration

3.2 Crop Yield Prediction

The second phase of the framework deals with the yield prediction for each of the ‘n’ filtered crops obtained from the first phase. Let \(L_1\), \(L_2\), ..., \(L_p\) be the farm locations and \(A_1\), \(A_2\), ..., \(A_q\) be the farm areas of the target lands. Figure 4 depicts the second phase of the SCR framework, which uses a regression model to predict the yield of each filtered crop individually on the farmer’s available land \(L_p\) with an area of \(A_q\). The model takes the filtered crops, season, location, and farm size as input and predicts the yield for the land. Different regression models, such as multiple linear regression, random forest regression, support vector regression (SVR), and XGBoost regression, are applied to identify the best-performing regressor for the proposed framework.

Fig. 4 Crop yield prediction phase

4 Experiment

This section empirically evaluates the performance of the proposed architecture. It begins by discussing the experimental setup, the dataset source, and the data analysis. Following this, the implementation details of the proposed model are elaborated, along with the algorithmic steps and the employed evaluation metrics.

4.1 Experimental Setup

The experimental configuration includes a 3.6 GHz Intel Core i5 processor and 8 GB of RAM. Python served as the programming language, and Google Colab Notebook was used for program execution. Standard software libraries, including Keras, TensorFlow, Matplotlib, and NumPy, were employed.

Table 2 Feature description (crop filtration phase)

4.2 Dataset

Two different datasets were used to evaluate the performance of the proposed SCR framework. The crop filtration dataset is obtained from Kaggle [32]. In this dataset, lands and crops are classified based on several weather and soil properties; it includes 2200 land samples covering 22 crops. However, only 17 crops (maize, rice, ..., and pomegranate) are considered for this phase, because data for these crops are also available for the next phase. Table 2 describes the features used in the first phase of the framework.

Table 3 Feature description (yield prediction phase)

The dataset utilized in the second phase was sourced from the Department of Economics and Statistics, Government of India [33]. Although the dataset comprises more than 30 crops, the selection was narrowed to the 17 crops for which data are available in both phases. Table 3 presents the attributes of the collected dataset for the crop yield analysis.

Fig. 5 Phase I dataset: (a) correlation matrix and (b) feature importance

4.3 Dataset Analysis

This section examines the soil and environmental data affecting the crop filtration and yield prediction procedures. Macronutrients like N, P, and K are required in substantial amounts for proper crop development. Figure 5a illustrates the correlation among the utilized features, showing a high correlation between potassium and phosphorous among the soil parameters, while humidity and rainfall show a moderate correlation. Figure 5b identifies the pivotal features in the crop filtration dataset, underscoring the significance of rainfall and humidity.

Fig. 6 N, P, and K values required by different crops

Figure 6 illustrates the comparison of nitrogen (N), phosphorus (P), and potassium (K) values needed by different crops. Cotton, apples, and grapes exhibit the highest macronutrient requirements for optimal growth, while lentils, black gram, and oranges have the lowest. The significance of soil macronutrients such as N, P, and K is relatively consistent across all crops. Overall, rainfall emerges as the most critical factor, with pH being the least influential among the specified parameters. Figure 7 presents the correlation matrix for the features used in the crop yield dataset.

4.4 Algorithm for the SCR Framework

The primary goal of this experiment is to develop a recommendation system that advises farmers on which crops to plant based on various factors such as soil constituents, crop traits, and climate. Algorithm 1 presents the detailed steps involved in crop selection using ANN-RF.

Fig. 7 Correlation matrix (crop yield data)

Algorithm 1 Top \( m \) Crop Recommendation

The algorithm is divided into two parts. The first part computes each crop’s rank and filters the top ‘n’ crops, and the second part predicts the yield of each filtered crop for the farmer’s land. It requires the soil health card details and the environmental values of each land as input. Finally, the top ‘m’ crops are recommended based on the maximum profit computed for each crop individually.
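The two parts described above can be sketched in Python as follows. This is an illustrative outline under stated assumptions rather than a transcription of Algorithm 1: the phase models are passed in as callables, the function and parameter names are hypothetical, and profit is assumed to be revenue (yield per hectare × area × market price) minus cost of production (cost per hectare × area). The demo numbers are invented.

```python
def recommend_crops(soil, weather, area_ha, crop_filter, yield_model,
                    cost_per_ha, price_per_tonne, n=5, m=3):
    """Return the top-m (crop, profit) pairs for one farm."""
    # Part 1: rank crops by suitability probability and keep the top n.
    probabilities = crop_filter(soil, weather)          # dict: crop -> probability
    filtered = sorted(probabilities, key=probabilities.get, reverse=True)[:n]

    # Part 2: predict the yield of each filtered crop and derive its profit.
    ranked = []
    for crop in filtered:
        yield_t_per_ha = yield_model(crop, area_ha)     # assumed unit: tonnes/ha
        revenue = yield_t_per_ha * area_ha * price_per_tonne[crop]
        cost = cost_per_ha[crop] * area_ha
        ranked.append((crop, revenue - cost))

    # Recommend the m most profitable crops.
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked[:m]


# Hypothetical stand-ins for the trained ANN filter and RF regressor.
def demo_filter(soil, weather):
    return {"papaya": 0.40, "watermelon": 0.30, "apple": 0.15,
            "maize": 0.10, "rice": 0.05}


def demo_yield(crop, area_ha):
    return {"papaya": 40, "watermelon": 25, "apple": 10}[crop]


price = {"papaya": 200, "watermelon": 150, "apple": 900}    # invented prices
cost = {"papaya": 3000, "watermelon": 2000, "apple": 5000}  # invented costs

top = recommend_crops(soil={}, weather={}, area_ha=2,
                      crop_filter=demo_filter, yield_model=demo_yield,
                      cost_per_ha=cost, price_per_tonne=price, n=3, m=2)
print(top)
```

Keeping the two models behind plain callables means the same outline works regardless of which classifier or regressor is plugged into each phase.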

4.5 Evaluation Metrics

For multiclass recommendation tasks, accuracy is the most straightforward metric: it is the fraction of instances for which the predicted and true categories match. Accuracy is defined in Eq. 1, where TP (true positive) indicates instances correctly predicted as positive, FN (false negative) represents instances incorrectly predicted as negative, FP (false positive) denotes instances incorrectly predicted as positive, and TN (true negative) signifies instances correctly predicted as negative. Additionally, for regression evaluation, the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (\(\text {R}^{\text {2}}\)) are employed to assess the model’s performance.

$$\begin{aligned} Accuracy= & {} \frac{TP+TN}{TP+FN+FP+TN} \end{aligned}$$
(1)
$$\begin{aligned} MAE= & {} \frac{1}{n}\sum _{i=1}^{n} |(Y_i-\hat{Y}_i)|\end{aligned}$$
(2)
$$\begin{aligned} RMSE= & {} \sqrt{\frac{1}{n}\sum _{i=1}^{n} (Y_i-\hat{Y}_i)^2} \end{aligned}$$
(3)
$$\begin{aligned} R^2= & {} 1-\frac{\sum (Y_i-\hat{Y}_i)^2}{\sum (Y_i-\bar{Y})^2} \end{aligned}$$
(4)

Here, \(Y_i\) is the actual value, \(\hat{Y}_i\) is the predicted value, and \(\bar{Y}\) is the mean of the actual values. Mean absolute error (MAE) in Eq. 2 calculates the average absolute difference between predicted and actual values; because all errors are weighted linearly, it is less sensitive to outliers than squared-error metrics. Root mean square error (RMSE) in Eq. 3 is the square root of the mean squared difference between predictions and actual values. It has the advantage of expressing errors in the same unit as the predicted quantity, and it assigns greater weight to larger errors. \(\text {R}^{\text {2}}\) in Eq. 4 is a statistical metric indicating the goodness of fit of a regression model; its values typically range from 0 to 1, although it can be negative when a model fits worse than predicting the mean.
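Equations 1 to 4 can be written directly in Python, as in the short sketch below. The function names and the sample values are illustrative; accuracy is computed in its multiclass form, as the fraction of matching labels.

```python
import math


def accuracy(y_true, y_pred):
    """Eq. 1 in multiclass form: fraction of predictions that match the truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y, y_hat):
    """Eq. 2: mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """Eq. 3: root mean square error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def r2(y, y_hat):
    """Eq. 4: one minus residual sum of squares over total sum of squares."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Illustrative values only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
print(mae(actual, predicted), rmse(actual, predicted), r2(actual, predicted))
```

For these sample values the residual sum of squares is 2 and the total sum of squares is 20, so MAE is 0.5, RMSE is the square root of 0.5, and \(R^2\) is 0.9.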

5 Results and Discussion

In this section, the effectiveness of the proposed crop recommendation framework has been rigorously evaluated through comparative experiments. The results of these experiments are comprehensively analyzed and compared with the results produced by state-of-the-art approaches. Additionally, the adaptability of the framework is explored through the lens of various research questions, providing a holistic view of its performance and potential.

5.1 Results Analysis of the Crop Filtration Phase

The proposed framework employs classification models to predict probabilities and regression models for yield prediction. The evaluation involves training and testing accuracy for the classification model and \(\text {R}^{\text {2}}\), RMSE, and MAE for the regression models. Table 4 showcases the accuracies of different models in predicting the most suitable crop based on environmental conditions. The results, as highlighted in the table, reveal that the proposed ANN model attains the highest training and testing accuracies of 99.27% and 99.10%, respectively. In contrast, the decision tree records the lowest training and testing accuracies of 98.50% and 97.41%, respectively. Consequently, the proposed framework utilizes the ANN for the crop filtration phase. Table 5 details the crops filtered by the ANN model for three lands; the number of filtered crops can be adjusted based on user preference.

Table 4 Comparative analysis (phase I)

Figure 8a and b illustrate the accuracy versus epoch and loss versus epoch graphs for the proposed ANN model, respectively. The loss curve indicates that the loss converges to its minimum at an early stage of training. Moreover, the ANN-based crop recommendation model consistently produces more accurate results than conventional models. Table 6 provides a comprehensive comparison of the results achieved in the crop filtration phase against state-of-the-art work, affirming the superiority of the proposed model.

Table 5 List of filtered crops for sample land

Fig. 8 ANN results: (a) accuracy versus epoch and (b) loss versus epoch

Table 6 Comparative analysis (phase I) with state-of-the-art work

5.2 Results Analysis of the Yield Prediction Phase

Further, various regression models were applied to evaluate the second phase’s performance, and the best-performing model was selected for yield prediction. Initially, the crop yield dataset was cleaned and rescaled, because its attributes are measured on different scales. The MinMaxScaler from Python’s scikit-learn library was used for this purpose, so that differences in scale do not distort the yield prediction. The dataset was rescaled using Eq. 5.

$$\begin{aligned} Y=\frac{X-X_{min}}{X_{max}-X_{min}} \end{aligned}$$
(5)

where Y is the rescaled value, X is the attribute’s value, \(X_{min}\) is the minimum value, and \(X_{max}\) is the maximum value of the attribute.
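As a small worked illustration of Eq. 5, the sketch below rescales a single attribute column in plain Python. The function name and the sample values are hypothetical; mapping a constant column to zeros is a choice made here to avoid division by zero.

```python
def min_max_scale(values):
    """Rescale one attribute to [0, 1] using Eq. 5: (X - X_min) / (X_max - X_min)."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Constant column: Eq. 5 is undefined, so return zeros (a choice made here).
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]


# Hypothetical farm areas in hectares.
areas = [10, 20, 40]
scaled = min_max_scale(areas)
print(scaled)
```

The smallest value maps to 0, the largest to 1, and 20 maps to (20 − 10) / (40 − 10) = 1/3.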

Table 7 Comparative analysis (phase II)

ML algorithms such as RF, DT, XGB, SVR, and ridge regression are applied to the preprocessed data for crop yield prediction. The performance of these models is compared in Table 7, and the model is selected based on performance metrics, including MAE, RMSE, and \(R^2\). The optimal scenario is when \(R^2\) is maximized and MAE and RMSE are minimized. Further comparisons of the achieved results are shown in Figs. 9 and 10. Figure 9a and b compare the models in terms of \(R^2\) and MAE, respectively. The graphs show that SVR is the worst-performing model, with the lowest \(\text {R}^{\text {2}}\) and highest MAE values; the other models achieve near-optimal values, with RF having the lowest MAE. Similarly, Fig. 10 compares the models in terms of RMSE. SVR has the highest RMSE value and the lowest model score, whereas RF and XGB have the lowest RMSE values and the highest model scores.

Fig. 9 Performance comparison of various models in terms of (a) \(R^2\) and (b) MAE

Tree-based models like RF, XGB, and DT exhibit more stable performance than the others, owing to their ability to partition the feature space into stable and accurate regions. For the ensemble models, this robustness is attributed to combining the outcomes of multiple trees; in the regression setting, the individual tree predictions are aggregated (averaged, in the case of RF) into the final estimate. RF has consistently demonstrated precision in various agricultural applications, particularly yield prediction, offering high accuracy, convenience, and practical utility in data modeling. It emerged as the best-performing model across multiple metrics and was therefore selected for the yield prediction stage. In contrast, SVR is the least effective model on this dataset; it fits a function that keeps training points within a tolerance margin defined by support vectors, which appears to capture the yield data less well. Figures 9 and 10 show that RF delivers the best performance, with an \(\text {R}^{\text {2}}\) score of 0.99. Consistently, the highlighted results in Table 7 show that RF outperforms the other models in predicting the yield of the crops obtained from the first phase.
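In a regression setting, an ensemble of trees typically combines its members by averaging their predictions. The toy sketch below illustrates that idea with bootstrap-aggregated one-split regression stumps on a single feature. It is a deliberately simplified stand-in, not a random forest implementation and not the model used in the experiments; all names and data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression stump by minimising the squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # Only one distinct x value in the sample: predict the overall mean.
        overall = mean(ys)
        return lambda x: overall
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(stump(x) for stump in stumps)


# Invented data: yield jumps from 1 to 5 once the feature exceeds 4.
feature = [1, 2, 3, 4, 5, 6, 7, 8]
yield_t = [1, 1, 1, 1, 5, 5, 5, 5]
predict = fit_bagged_stumps(feature, yield_t)
print(predict(2), predict(7))
```

Because each stump sees a different bootstrap sample, individual stumps may place the split slightly differently; averaging smooths out these differences, which is the source of the stability attributed to tree ensembles.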

Fig. 10 Performance comparison of various models in terms of RMSE

5.3 Results Analysis of the Final Recommendations

The yield, cost of production, and market price of each filtered crop for Land 1 are shown in Table 8. The COP data were collected from a government website [39], and the MP data were collected from the agriculture commodity market [40]. It can be seen that, of the five filtered crops, papaya, watermelon, and apple would be the most profitable for the farmer. Hence, these crops are recommended to the farmer by the proposed framework, as highlighted in the table. Further, Table 9 provides a comparative analysis of the results achieved in the yield prediction phase against state-of-the-art methods, affirming the efficiency of the proposed tree-based RF model.

Table 8 Crop yield and profit
Table 9 Comparative analysis (phase II) with state-of-the-art work

5.4 Discussion

One of the noteworthy developments presented in this study is the formulation of a two-phase framework designed to recommend crops. In the initial phase, an artificial neural network filters crops based on suitability probabilities computed from farm-specific parameters. In the second phase, the random forest model stands out for its exceptional performance in accurately predicting crop yield. The results achieved by these models reflect the proposed framework’s efficacy in situations requiring personalized farm recommendations. Furthermore, the following research questions (RQs) are framed to examine the applicability of SCR in current real-time farming settings.

  • RQ1: How does the smart crop recommendation framework utilizing an artificial neural network for crop filtration and random forest for yield prediction contribute to more accurate and informed crop recommendations compared to traditional farming practices?

    Traditional farming often relies on experience, intuition, and general practices passed down through generations that might not align with current or specific on-site farm conditions. In contrast, the SCR framework is a data-driven decision-making tool that analyzes extensive datasets, incorporating weather patterns, soil health, historical yields, and market trends. Further, the ANN excels at recognizing intricate patterns within a dataset, enabling precise crop filtration based on a myriad of local input parameters. This allows for a nuanced understanding of the complex relationships between various factors, something traditional practices struggle with. As an ensemble learning method, random forest contributes to robust yield predictions by aggregating outputs from multiple decision trees. This enhances prediction accuracy and provides a reliable estimation of potential yields, surpassing the capabilities of traditional methods that often rely on intuition or experience. Hence, this data-driven approach enables more precise and personalized recommendations than other approaches.

  • RQ2: How does integrating local input parameters, such as weather, soil conditions, cost, and market prices, into the SCR framework affect the accuracy and relevance of crop recommendations for farmers?

    By incorporating in-situ weather and soil condition data, the model adapts to the specific agro-climatic conditions of each farm. Considering costs and market prices ensures economic viability, helping farmers make informed decisions based on both yield potential and financial feasibility. This comprehensive integration not only refines the accuracy of crop predictions but also tailors recommendations to unique challenges and opportunities at the farm level, thus optimizing the overall effectiveness of the SCR framework.

  • RQ3: How does the proposed SCR framework address the challenges and limitations farmers face in traditional crop planning, and how does it contribute to sustainable agricultural practices?

    The proposed framework addresses several challenges inherent to traditional crop planning, contributing significantly to sustainable agricultural practices. The accuracy of its data-driven recommendations minimizes the risks associated with suboptimal crop choices, promoting resource efficiency and reducing financial losses for farmers. Furthermore, by incorporating sustainability-related inputs such as soil features, the SCR framework encourages the cultivation of crops aligned with environmental conservation goals, optimizing resource utilization and minimizing ecological impact. In essence, the SCR framework enhances the economic viability of farming and promotes ecologically sustainable agricultural practices.

  • RQ4: How does the SCR framework align with current technological advancements and the need for modernization in the agricultural sector to meet the increasing demands for both quantity and quality of food?

    The SCR framework utilizes machine learning models, including ANN and RF, to analyze extensive datasets and derive valuable insights in alignment with the broader trend of integrating technology into agriculture for improved productivity and decision-making. Additionally, by integrating local farm features, the framework embodies digital agriculture, adheres to modernization goals, implements personalized data-driven strategies, and addresses increasing food demands.

  • RQ5: What are the potential implications of implementing the proposed SCR framework on the agricultural sector’s overall economic viability and productivity, and how does it align with the long-term goals of ensuring food security and meeting the demands of a growing global population?

    The implementation of the SCR framework holds profound implications for the overall economic viability and productivity of the agricultural sector. Below are several ways in which it aligns with long-term goals of ensuring food security and meeting the demands of a growing global population:

    • By cultivating crops with higher market demand and favorable growth conditions, farmers are better positioned to improve their income, contributing to the overall economic health of the agricultural sector.

    • Precise recommendations tailored to specific agro-climatic conditions ensure that available resources are efficiently utilized, leading to higher yields per unit area and, consequently, improved overall productivity in the agricultural sector.

    • By aligning crop recommendations with ecological considerations, the SCR framework contributes to sustainable farming practices crucial for the long-term health of the agricultural sector.

    • Enhancing the precision of crop planning, mitigating risks associated with suboptimal choices, and adapting to changing conditions contribute to a more resilient and secure food supply chain.

    • With its data-driven and technologically adaptable approach, the SCR framework aligns with global efforts to modernize agriculture.

These research questions support the practicality of the proposed framework, especially in contexts characterized by resource limitations and constraints. Beyond its basic architecture, the framework fulfills a dual function of crop selection and yield prediction, eliminating the need for separate applications for these tasks. This streamlined approach enhances the model’s efficiency and practicality.

6 Conclusion

India stands as a leading producer of agricultural goods, yet there exists untapped potential for optimizing productivity. If crop yield and return are to be elevated, it is imperative to pinpoint factors that can enhance the current agricultural landscape. A critical determinant in crop production is the selection of the most suitable crop based on geographical and geological conditions. However, there is a noticeable deficiency in scientific agricultural literacy within the farming community, leading to reliance on conventional practices. Addressing such challenges through computational means is of paramount importance. Machine learning algorithms have emerged as a transformative bridge, reducing the knowledge gap between agricultural experts and farmers.

This research proposes a crop recommendation system that can identify the most fitting crops for specific regional conditions. The envisioned solution relies on a standardized dataset acting as a domain expert to inform decisions. Subsequently, a computational process harnesses this agricultural dataset, employing machine learning techniques to construct trained models. The proposed smart crop recommendation framework, grounded in machine learning, empowers farmers to make well-informed decisions on optimal crop selection. Uniquely positioned to handle challenges at the farm level, the two-phase SCR framework comprehensively analyzes local factors, recognizing the dynamic nature of diverse agricultural features. In the initial phase, the framework achieved a testing accuracy of 99.10% using an artificial neural network. In the next phase, the random forest demonstrated high performance, with an \(\text {R}^{\text {2}}\) score of 0.99. The experimental results attest to the proposed framework’s efficacy, positioning it as a practical and efficient real-time recommendation solution. In terms of practical implications, the simple and lightweight design of the suggested framework offers potential for future integration with handheld devices. Moreover, future work could expand the application’s utility (e.g., incorporating a closed-loop supply chain, predicting fertilizer needs, and recognizing plant diseases), thus providing more comprehensive solutions for crop and soil management.