1 Introduction

Machine Learning techniques are used in many sectors: in healthcare to predict suitable treatments, and in supermarkets and manufacturing companies to analyze customer behavior, such as which products customers use. For several years, Artificial Intelligence and ML techniques have also been applied in the agriculture sector to solve farmers’ problems. Crop production depends on several parameters, such as seed type, climate, fertilizer used, weather, and soil type.

A problem for most Indian farmers is a lack of knowledge and of proper assistance for precision farming. The objectives of this literature review are therefore to study and analyze existing Indian agriculture problems and the solutions provided for them using Machine Learning techniques, to study and analyze the soil parameters that affect agricultural production, and to find a novel approach for the proposed work.

The strength of a Machine Learning algorithm is that a model trained on a training dataset can predict the class of new samples even when a new sample does not exactly match any training sample. For example, a model trained on a dataset containing CAT and DOG faces would predict the class CAT for a Tiger face.

There are three main categories of Machine Learning approaches. The first, Supervised Learning, learns from experience data, i.e., empirical data; examples include classification and regression techniques such as KNN, Decision Tree, and Linear Regression. The second, Unsupervised Learning, learns from the observations in the dataset, i.e., patterns in the dataset; examples include clustering techniques such as K-means and DBSCAN. The third, Reinforcement Learning, learns from environment feedback in the form of rewards and penalties; an example is Deep Q Networks. Nowadays, deep learning algorithms are used to optimize models because they attempt to learn using a hierarchy of multiple layers [1].

2 Related Work

This literature review includes a survey of existing Indian agriculture problems and the solutions provided for them using Machine Learning techniques, a survey of the soil parameters that affect agricultural production, and a survey of different ML techniques to find a novel approach for the proposed work.

The identified research questions are:

Q1. Identify the machine learning algorithms used for the Agriculture Support System.

Q2. Identify the features used to design the Agriculture Support System.

Q3. Identify the model evaluation parameters and evaluation approaches used for the Agriculture Support System.

Q4. Identify the gaps in the field of the Agriculture Support System.

2.1 Bibliography Analysis

Figure 1 shows bibliography analysis for distribution of papers used for the literature review.

Fig. 1 Bibliography analysis (horizontal bar graph of document count by document type; Scopus journals have the highest count, books and theses the lowest)

Table 1 gives the count of documents referred for the survey on the basis of type of document and Table 2 gives the count of documents referred for the survey on the basis of publication year (Fig. 2).

Table 1 Number of papers referred on the basis of type
Table 2 Distribution of documents based on the publication year
Fig. 2 Number of documents referred on the basis of publication year (pie chart; from highest to lowest share: before 2016, 2020, 2019, 2018, 2017, and 2016)

Sirsat et al. developed 20 different classification models for classifying Indian agricultural soil parameters, including a soil nutrient (N, P, K) classification model, a soil pH classification model, a crop classification model, and a model for classifying soil by type. These classification problems were studied and implemented on the Marathwada dataset using Bagging, Boosting, Decision Tree (DT), K-Nearest Neighbor (KNN), Rule-Based (RB), Neural Network (NN), Random Forest (RF), and Support Vector Machine (SVM) models. The authors used Cohen's kappa (k), expressed in %, to measure the accuracy of these models. The results are listed below [2].

  • Best k for Decision Tree Soil classifier using Weka tool is 97.82%,

  • Best k for Random Forest Crop classifier using R language is 88.13%,

  • Best k for Random Forest pH classifier using R language is 47.32%,

  • Best k for Random Forest NPK classifier using R language is 33.6%,

  • Best k for Random Forest OC-F classifier using Weka tool is 90.65%.
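
Cohen's kappa corrects the observed agreement between predicted and true labels for the agreement expected by chance. The sketch below illustrates the standard formula only; the function name and the soil-type labels are hypothetical and are not taken from [2].

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two label sequences, corrected for chance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of samples where the prediction matches the truth.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement, estimated from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: one class everywhere
    return (p_o - p_e) / (1 - p_e)


# Hypothetical soil-type labels (assumption, for illustration only).
y_true = ["clay", "clay", "sandy", "sandy", "loam", "loam"]
y_pred = ["clay", "clay", "sandy", "loam", "loam", "loam"]
kappa_percent = 100 * cohen_kappa(y_true, y_pred)
print(f"kappa = {kappa_percent:.2f}%")
```

Because kappa discounts chance agreement, it can be much lower than plain accuracy when classes are imbalanced.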

Sirsat et al. proposed 76 different models to predict soil fertility based on the nutrient values of organic carbon (OC), phosphorus pentoxide (P2O5), zinc (Zn), iron (Fe), and manganese (Mn), using different regression techniques such as Linear Regression (LR), Generalized Linear Regression (GLR), Least Square (LS), Partial Least Square (PLS), LASSO, Ridge, Neural Network, Deep Learning, SVM, and Random Tree [3].

The authors used the R² measure to find the best regressor and reported the following results for the proposed models [3] (Table 3, Fig. 3).
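
For reference, R² (the coefficient of determination) compares a regressor's squared error with that of a baseline that always predicts the mean. The following is a minimal sketch of the textbook definition; the function name and the sample values are assumptions for illustration, not data from [3].

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical nutrient measurements (assumption).
measured = [1.0, 2.0, 3.0, 4.0]
perfect_fit = r_squared(measured, measured)
mean_baseline = r_squared(measured, [2.5, 2.5, 2.5, 2.5])
print(perfect_fit, mean_baseline)
```

A value of 1 means a perfect fit, 0 means the model is no better than predicting the mean, and negative values mean it is worse.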

Table 3 Comparative analysis of Indian agricultural soil parameters models accuracy
Fig. 3 Indian agricultural soil parameters models accuracy with data chart (horizontal bar graph of accuracy per soil model; the Decision Tree model is highest and the Random Forest NPK model is lowest)

Zhaoyu Zhai et al. presented a survey of agriculture 4.0 decision support systems (DSS) and their challenges. They systematically surveyed 13 representative DSS, including their applications in mission planning, water resource management, and food waste control [4].

Alexandre Barbosa et al. proposed a CNN-based model that predicts crop yield response in order to optimize nutrient management. The authors developed a CNN with Early Fusion (EF), a CNN with Late Fusion (LF), and a 3D CNN, and compared the results with Multiple Linear Regression (MLR), Fully Connected Network, Support Vector Network, and Random Forest models. Root Mean Square Error (RMSE) is used to measure the accuracy of the CNN models. The results show that CNN-LF had the lowest error across the nine tested fields (0.66), with CNN-RF giving the second-best result (0.76) [5].

Suchithra et al. proposed a model to support proper fertilizer utilization, reduce the analysis time required from experts, and improve soil quality. The measures used in this work are Accuracy, Kappa, Precision, Recall, and F-score, and the results given by the models are as follows [6].

  • Soil Nutrient Classification for Gaussian radial basis function: 80% (Optimal neurons 50),

  • pH classification for hyperbolic tangent function: 90% (Optimal neurons 150).

Himanshu Pant et al. proposed a model to enhance the precision of crop-fertility prediction using different supervised ML techniques. K-Means is used to identify the quality and fertility of the soil, with levels 1, 2, and 3, for the Nainital District dataset. The measures used for the classification problems are Precision, Recall, F1 Score, Support, and Accuracy, and the results are as follows [7].

  • SVM with 96.62% Accuracy (best classifier among all),

  • KNN with 91.01% Accuracy,

  • LR with 89.88% Accuracy,

  • LDA with 91.01% Accuracy.
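
Precision, Recall, F1 Score, and Support are computed per class from the counts of true positives, false positives, and false negatives. The sketch below shows these standard definitions; the function name and the fertility-level labels are hypothetical and do not come from the Nainital dataset.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Return (precision, recall, f1, support) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    support = tp + fn  # number of true samples of this class
    return precision, recall, f1, support


# Hypothetical soil fertility levels 1, 2, 3 (assumption).
y_true = [1, 1, 1, 2, 2, 3]
y_pred = [1, 1, 2, 2, 2, 3]
for level in (1, 2, 3):
    p, r, f, s = precision_recall_f1(y_true, y_pred, level)
    print(f"level {level}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")
```

Reporting these per class alongside overall accuracy shows which fertility levels a classifier confuses.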

Santhi et al. proposed a model to compare the categories of Farming and types of crops using crop and fertilizer recommendation methods based on soil test reports [8].

Manpriya et al. proposed an effective crop prediction technique for better crop production using more crop datasets. A deep NN with two hidden layers is used to predict appropriate crops for every district of India, and 124 crops are included in the work. The performance parameters used by the authors are Accuracy, Mean Absolute Error (MAE), and Mean Squared Error (MSE). A sigmoid activation function is used, and an SGD optimizer updates the parameters and weights to reduce the loss. The reported values are 99.19% Accuracy, an MAE of 0.0157, and an MSE of 0.0078 [9].
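
The error measures named in this section (MAE, MSE, and the RMSE used in [5]) all derive from the per-sample prediction error. The sketch below shows the standard formulas; the function name and the sample values are assumptions and are unrelated to the figures reported in [9].

```python
import math


def regression_errors(y_true, y_pred):
    """Return MAE, MSE, and RMSE for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)  # mean absolute error
    mse = sum(e * e for e in errors) / len(errors)   # mean squared error
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse)}


# Hypothetical yield values (assumption).
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 7.0]
scores = regression_errors(actual, predicted)
print(scores)
```

MSE and RMSE penalize large errors more heavily than MAE, which is why the two can rank models differently.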

Deshmukh et al. proposed a model for soil health analysis and soil quality prediction using the N, P, and K soil parameters. The CN2 Rule Inducer, with an accuracy of 0.94, was declared the best classifier. Figure 4 shows the performance comparison of soil quality and crop advice prediction using different classifiers [10].

Fig. 4 Soil quality and crop advice prediction (line graph across classifiers, with one line for crop advice prediction and one for soil quality prediction)

Labhade et al. developed a model to predict outcomes based on selected data and business requirements. Predictive analytics is performed using the KNIME tool. Figure 5 shows the accuracy and error rate of different classifiers in KNIME. According to the analysis, the Logistic Regression method gives the best accuracy for the student datasets [11].

Fig. 5 Accuracy and error rate for different classifiers using KNIME tool (3D horizontal bar graph with accuracy and error bars per classifier; the regression model has the best accuracy)

Viviliya et al. developed a hybrid model of J48 and Naive Bayes classifiers that recommends crops using ML techniques in order to increase crop yield. The models are built on a dataset with the parameters State, District, Crop year, Area, etc., yield information from 1997 to 2015, Season, Temperature, Rainfall, Water requirement, and type of soil. J48 gave the best accuracy, 95.53% [12].

Devdatta et al. implemented a crop yield prediction model that applies machine learning to a historical agriculture dataset and recommends fertilizers suitable for the crop. Classification models are built using SVM and RF, and the authors used Precision, Recall, F1-score, and accuracy (in %) as performance measures. The results are as follows [13].

  • Soil Classification model using RF with accuracy of 86.35% and SVM with 73.75%.

  • Crop Yield Prediction model using SVM with 99.47% accuracy and RF with 97.48% accuracy.

Rafael Hernández Moreno et al. presented a Multi-Layer Perceptron (MLP) ANN model with an input layer formed by soil parameters and an output layer of fertilizers and amendments. GridSearchCV is used to test and optimize the model [14].

Archana et al. proposed a DSS model using a voting-based ensemble classifier. The ensemble for crop recommendation combines Random Forest, Naive Bayes, and CHAID classifiers, takes N, P, K, Temperature, and other soil parameters as input, and achieved 92% accuracy [15].

Rajak et al. developed a model for crop prediction using an ensemble technique (majority voting). The algorithms selected for the ensemble are SVM, Random Forest, Naïve Bayes, and ANN (Multi-Layer Perceptron) [16].
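
Majority voting, as used in the ensembles of [15] and [16], assigns each sample the label predicted by most base classifiers. The sketch below assumes the base classifiers have already produced their predictions; the function name, crop labels, and tie-breaking rule are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of equally long label lists, one list per classifier.
    Ties go to the label seen first, i.e., to the earliest-listed classifier.
    """
    if not predictions or any(len(p) != len(predictions[0]) for p in predictions):
        raise ValueError("need at least one classifier and equal-length outputs")
    voted = []
    for sample_votes in zip(*predictions):
        # most_common keeps first-seen order among labels with equal counts.
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


# Hypothetical outputs of three base classifiers for three fields (assumption).
svm_out = ["rice", "wheat", "maize"]
rf_out = ["rice", "maize", "maize"]
nb_out = ["wheat", "wheat", "cotton"]
ensemble_out = majority_vote([svm_out, rf_out, nb_out])
print(ensemble_out)
```

An odd number of base classifiers avoids ties in two-class problems; with more classes, an explicit tie-breaking rule such as the one above is still needed.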

Devotha et al. presented a survey of the use of characterization techniques in the agriculture sector. They applied probabilistic and deterministic approaches, where supervised algorithms are used in the deterministic approaches and unsupervised algorithms in the probabilistic approaches [17].

Srivastava et al. presented a survey paper elaborating on different clustering techniques, such as DBSCAN, Agglomerative, K-means, and EM algorithms, for agriculture applications such as pollution forecasting and the combined classification of soil with GPS [18].

Bouighoulouden et al. proposed a model that uses PCA for feature reduction and K-means, implemented in the RStudio and Orange DM tools, to identify groups of productive and non-productive yield [19].

Dr. Madhavi Gudavalli et al. applied different clustering techniques to a wheat seed dataset. Three clusters (Kama, Rosa, and Canadian) are formed from pairs of attributes using the R tool. The authors reported that K-means is good for large datasets and Hierarchical clustering is good for small datasets [20].

Priya et al. built a model for the depiction of management zones and soil dataset analysis using the K-means, GK clustering, and Farthest First algorithms, of which Farthest First was found to be the best and fastest [21].

Utkarsha et al. developed a Modified K-Means algorithm and used it for crop prediction. District, zone, season, maximum temperature, minimum temperature, soil type, and average rainfall are considered for training the model. The work compares K-Means++ and K-Means with the Modified K-Means on crop data. Modified K-Means gave the highest-quality clusters, the maximum accuracy count, and the correct crop prediction [22].

Silas et al. used Association Rule Mining and clustering techniques to predict tea production in Kenya. The dataset contains 156 tea production records from 2003 to 2015. K-Means clustering in SPSS is used to form groups of similar productions [23].

Majumdar et al. presented an analysis using different ML techniques, such as Multiple LR, CLARA, PAM, and Modified DBSCAN, to identify the optimal parameters for maximizing crop production. Modified DBSCAN was declared the best at clustering data with similar rainfall, temperature, and soil type [24].

Vandana et al. proposed a model for the analysis of crop production and US arrest datasets. Among the techniques used, Hybrid K-means was declared the best. The Elbow, Gap Statistic, and Silhouette methods are used to select the optimal “K” value [25].
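
The Elbow method mentioned above runs K-means for increasing K and looks for the point where the within-cluster sum of squares (WCSS) stops dropping sharply. The sketch below uses plain K-means on one-dimensional data with a deterministic initialization; it is not the hybrid K-means of [25], and the function name and rainfall values are assumptions.

```python
def kmeans_1d(points, k, iterations=50):
    """Lloyd's K-means on 1-D data; returns (centroids, wcss)."""
    ordered = sorted(points)
    # Deterministic start (assumption): k evenly spaced points from the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda j: abs(x - centroids[j]))
            clusters[nearest].append(x)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        updated = [sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    # WCSS: squared distance of every point to its nearest centroid.
    wcss = sum(min((x - c) ** 2 for c in centroids) for x in points)
    return centroids, wcss


# Hypothetical rainfall readings forming three natural groups (assumption).
rainfall = [10, 11, 12, 50, 51, 52, 90, 91, 92]
wcss_curve = {k: kmeans_1d(rainfall, k)[1] for k in range(1, 5)}
for k, wcss in wcss_curve.items():
    print(k, wcss)
```

On this data the WCSS falls steeply up to K = 3 and barely changes afterwards, so the elbow suggests three clusters.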

Aurelia-Vasilicalana et al. used clustering methods to analyze organic farming patterns. The work identified three possible clusters [26].

Chunjiang et al. built a model that uses a Frequent Pattern Tree to mine association rules with multiple minimum supports (MSDMFIA). It overcomes the problem of the single minimum support used in traditional methods [27].

Geetha et al. used the Apriori algorithm to assess different association algorithms and applied them to a soil science database to identify meaningful relationships [28].
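
Association rule mining algorithms such as Apriori rank candidate rules by support and confidence. The sketch below computes only these two measures, not the full Apriori candidate generation; the function names and the soil records are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


# Hypothetical soil records encoded as item sets (assumption).
records = [
    {"pH=neutral", "N=high", "crop=wheat"},
    {"pH=neutral", "N=high", "crop=wheat"},
    {"pH=neutral", "N=low", "crop=maize"},
    {"pH=acidic", "N=high", "crop=rice"},
]
rule_support = support(records, {"pH=neutral", "N=high", "crop=wheat"})
rule_confidence = confidence(records, {"pH=neutral", "N=high"}, {"crop=wheat"})
print(rule_support, rule_confidence)
```

Apriori would keep only item sets whose support meets a minimum threshold before deriving rules and filtering them by confidence.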

Kane et al. proposed a model that uses association rules to classify home loan sales in an Irish retail bank. The associative classifier models used are CMAR, Classification Based Association (CBA), and SPARCCC [29].

Vasoya et al. proposed a model based on distributed and parallel computing for association rule mining on large datasets, in order to find frequent patterns in less time. A clustering process divides the large dataset into a number of clusters, and the clustered data are then used for the mining process [30].

Thakkar et al. used association rule mining algorithms such as Apriori and classification techniques such as ID3 and C4.5 to solve agricultural crop problems [31]. Khan and Singh [32] presented a survey of association rule mining methods for agriculture problems. The survey covers the Partition algorithm, Apriori, the Pincer search algorithm, the FP-Tree Growth algorithm, and the Dynamic Itemset Counting algorithm [32].

Mishra et al. presented a survey of associative classifiers (CBA, CMAR, MCAR, and GARC) applied to a soil dataset of Bhopal M.P District [33]. Sun et al. [34] presented an overview of associative classifiers, in which a conventional classification system such as C4.5 is compared with an associative classifier. In total, 27 UCI datasets are used for the comparison [34].

Prachitee et al. proposed a model of Classification Technique using Associative Classifier based on the Neural Network system (NNAC) to improve its accuracy. NNAC system performance is compared with the Classification Based Association on four different datasets from UCI repository [35].

Soni et al. proposed a solution for the health care domain that uses associative classifiers to predict a disease along with suitable treatments. The authors used class rule mining techniques: Associative Classification (AC) and Classification Association Rule (CAR) [36]. Classifiers that help the physician find associations among patient parameters (e.g., personal data, medical tests) have also been developed, and advanced association rule mining with classifiers is used to build AC models based on positive and negative rules, Temporal AC, AC using Fuzzy Association Rules, and Weighted AC [36].

Jinubala et al. proposed a model to classify pest level based on weather data using a constraint-based AC, which reached 92% accuracy compared with 59% for the traditional method [37]. Mattieva and Kavšeka [38] proposed a model using associative classification based on strong association rules. The average accuracy given by the model is 91.3%. Experiments were carried out on 15 datasets from the UCI ML repository [38].

Li Yu Hu et al. presented a novel CBA-based method, MMSCBA, which uses multiple minimum supports (MMSs) [39]. Dalvi et al. [40] proposed an ontology-based model for agricultural information retrieval (IR) that uses NLP to extract knowledge in the Marathi language [40].

Pai et al. presented ML models for identifying Kannada farmers’ queries using a speech recognition system on an agricultural dataset in the Kannada language. The dataset consists of the names of crops and the names of the districts of Karnataka state. MFCC, the most prominent feature extraction method in speech recognition, is applied to the crop and district data [41].

Savant et al. presented a survey of the existing system of the Maharashtra Government, a survey of clustering techniques, and a classification of farmers’ feedback [42]. Vispute et al. [43] proposed a model for automatic personalized content generation in the Marathi language using the LINGO algorithm. The work was tested on five different datasets, and personalization is done using the “Time Session”, “Number of hits”, and Bookmark methods [43].

Vispute et al. extended the previous work using the HADOOP parallel system platform for a Marathi dataset [44]. Vispute et al. [44] developed a model that automatically categorizes Marathi text documents for a dataset of three categories (Health Programs, Tourism, and Maharashtra festivals) using the Lingo clustering algorithm. The dataset contains 107 Marathi documents [45].

Sonigara et al. built a model for an effective information retrieval system that accepts data in heterogeneous forms, represents it in a common format, i.e., a text file, and categorizes Marathi data automatically using the LINGO algorithm [46].

Tayal and Meena developed a parallel solution for associative classification using the MapReduce approach on the HADOOP platform and tested it on six datasets available in the UCI repository. They developed two algorithms, MRMCAR-F and MRMCAR-L [47]. Figure 6 shows the accuracy comparison of the associative classification techniques proposed by Tayal and Meena for six different datasets of the UCI data repository.

Fig. 6 Comparison of proposed association classification techniques for six different datasets of UCI data repository [47] (line graph of average accuracy per dataset; all datasets show a fluctuating trend)

Figure 7 shows comparison of time required to execute different associative classification techniques proposed by Tayal and Meena [47].

Fig. 7 Comparison of execution time with the proposed associative classification techniques [47] (line graph of computation time for six datasets; all datasets show a decreasing trend)

Dang Nguyen et al. proposed an efficient constraint-based CARs model with an itemset constraint. To test the performance of the novel model, the authors used 14 different datasets, such as Adult, Breast, German, Chess, and Connect4. Figure 8 compares the execution time of the proposed models on the Adult dataset [48].

Fig. 8 Comparison of execution time required for different proposed models for adult dataset [48] (two graphs, runtime versus selectivity and runtime versus minSup; all show an increasing trend)

Wang et al. proposed an improved model that uses a dynamic property in associative classification [49]. Villuendas-Rey et al. [50] used and evaluated the Naïve Associative Classifier (NAC) on a financial dataset for simple, transparent, and accurate classification [50].

Figure 9 shows the overall AUC results of NAC compared with other classifiers. It shows that NAC outperforms the other algorithms.

Fig. 9 AUC results of NAC, compared with other classifiers [50] (vertical bar graph; NAC performs well and the others have an average performance)

Chen et al. proposed an efficient classification approach, Principal Association Mining, to design a compact classifier that generates a reduced set of association rules [51]. Padillo et al. [52] introduced LAC, a new Java library for associative classification. This library package includes the full taxonomy of the associative classification paradigm [52]. Loan et al. [53] developed a new model for extracting class-association rules [53]. Antonell et al. [54] used a fuzzy frequent pattern mining algorithm to propose a novel classification model. The authors tested the new approach on 17 datasets and presented a comparative analysis in which the new model gave better results than existing ones [54]. Hadi et al. [55] proposed an efficient model for text classification that combines features of Naïve Bayes and associative classifiers [55]. Thasleena et al. [56] developed an efficient classifier for XML documents using an associative classifier to overcome the drawbacks of the existing technology [56]. Mattieva et al. [57] proposed simple classification with “strong” class-association rules to improve classifier performance with good accuracy [57].

Nguyena et al. and Wang et al. proposed hybrid and efficient methods that solve problems using associative classification techniques [58, 59]. Villuendas-Reya et al. proposed a new model, NAC, based on an associative classifier, and tested and evaluated it on a financial dataset [60].

In the next part of the literature survey, covering agricultural decision support systems for precision farming, we compared different ML and Deep Learning algorithms and explored possible uses of these algorithms to solve multiple problems related to farming.

Many algorithms like SVM, Random Forest, and CNN were used to detect plant diseases. The result shows that CNN detects a greater number of diseases of plants with high accuracy [61,62,63,64,65].

Where crop and weed differ greatly in size or color, image processing-based algorithms work well. The survey indicates that CNN performs better than SVM and ANN due to its ability to learn the relevant features in depth from the image dataset. ANN is very accurate but requires huge amounts of training data and is slower [66,67,68,69,70].

For weather forecasting, research shows that different models such as ANN, CNN, and Recurrent NN (RNN) can be used. Of these models, Long Short-Term Memory (LSTM), a type of RNN, works exceptionally well for the sequential data of weather prediction [71,72,73,74,75].

Algorithms such as ARIMA and SARIMA, and RNN algorithms such as LSTM and Gated Recurrent Units (GRU), can be used to predict agricultural prices. The results show that, in general, LSTM models perform better than the others when more data is available, while ARIMA and SARIMA can perform reasonably well even with less data [76,77,78,79,80].

The remaining surveyed work offers solutions to a variety of problems, such as predicting the soil fertility level, detecting diseases, predicting yield based on weather conditions, and identifying the correct action during farming in different situations [81,82,83,84,85].

3 Common Findings from Literature Review

3.1 Results and Common Methodology

The methodology most commonly used by the authors is shown in Fig. 10. It includes the following basic steps for developing a model to solve a problem.

Fig. 10 Common flow diagram for Developing ML Model (circular flow: problem framing, data collection, preprocessing, analysis, engineering, training, evaluation, and prediction)

Table 4 shows the machine learning algorithms most used in the existing work, with details of the most efficient models (answer to Q1).

Table 4 Algorithms used by most of the existing work with best model details

The review of existing work found that agriculture decision support systems are developed to provide decisions about single areas of farming, such as crop yield prediction or fertilizer recommendation, using the machine learning techniques listed in Table 4.

The survey found that very little work provides solutions to agriculture problems using associative classifiers; only three papers do so. This existing work provides only a single decision to farmers at a time, considering agriculture parameters such as N, P, K, pH, Crop year, and Rainfall. Therefore, a more effective agriculture decision support system is needed for precision farming.

The features used in most of the work (answer to Q2) are sulfur, magnesium, potassium, zinc, nitrogen, calcium, boron, phosphorus, pH value, State, District, Crop year, Season, Area, Production and yield details, Rainfall details, Temperature details, Groundwater level, Water availability, type of soil, and organic carbon (OC).

The most used evaluation parameters (answer to Q3) are Accuracy (36 times), Kappa (8), Precision (27), Recall (27), F-score (24), RMSE (6), R² (5), WCSS (9), and Support and Confidence (5).

4 Conclusion

This systematic literature review showed that the referred documents used several features, depending on the research type, the requirements, and the selected dataset. Most of the work addresses yield prediction and applies machine learning algorithms, but on different features. Work has also been done on plant disease prediction and weed detection. The selected features depend on the objectives of the research. The best model can be identified by testing models with more and fewer features and with different ML techniques. According to the survey, most authors used rainfall, temperature, and type of soil as features; the most preferred classification algorithms are Neural Networks, Regression techniques, SVM, and Random Forest, and Random Forest worked better in most of the work. The most applied clustering algorithm is K-means.

According to the additional survey, Convolutional Neural Networks (CNN) with optimized parameters are used by most authors for image processing and classification, and the next most widely used DL algorithm is Deep Neural Networks (DNN). The survey also shows that very few authors (only 2) used associative classifiers and association rule mining techniques to solve agriculture problems.