
4.1 Introduction

Machine learning has made a significant impact across industries, including agriculture. Stakeholders in each sector increasingly use past data with intelligent machine learning algorithms for prediction and analysis. Valued at INR 56,564 billion in 2019, the agriculture industry plays a vital role in India's economy in terms of employment and contribution to GDP. Moreover, with India home to 18% of the world's population, the demand for agro-products has increased year after year [1, 2].

Farmers encounter some obstacles in using conventional methods of farming:

  • In the agriculture life cycle, climatic factors such as temperature, rainfall, and humidity are crucial. Rising deforestation and pollution are driving climatic changes, making it increasingly difficult for farmers to decide how to prepare the soil, sow seeds, and harvest. India is a land of widely varying temperature ranges and rainfall levels, which are essential factors in farm management. This variability also means that a wide range of crops can be grown in Indian fields.

  • Every crop demands a specific kind of soil nutrition. The three major soil nutrients are nitrogen (N), phosphorus (P), and potassium (K). Both nutrient deficiency and excessive use can lead to harvest failure.

  • Weeds are a major concern in crop protection. Left unregulated, they can directly reduce crop yield and halt its growth, and they absorb nutrients from the soil, causing nutrient shortages. Treating damaged plants across large acres of land can then require a substantial financial outlay.

In our research work, these crucial problems are addressed by applying artificial intelligence algorithms to enhance agronomic management in India, where agriculture is one of the major occupations of rural Indians. India has around 394.6 million acres of arable land, so growing more agricultural products helps not only humans but also the environment. Although India is a significant producer of wheat, rice, maize, and other cereals, growing these on every farm is impossible due to differing climatic conditions. This, however, widens the scope for producing a variety of crops with excellent yield under suitable weather conditions.

The topic of our research is agricultural yield amplification. The main aim is to help farmers increase production with suitable fertilizers and to make them aware of the diseases that staple Indian crops such as rice, wheat, apple, and tomato can contract. This matters because farmers, lacking thorough knowledge of their crops' properties, are often misinformed, and this misinformation leads to faulty courses of action and lower crop yield [3]. An infected crop becomes unwanted and no longer edible, and it even depletes the nutrients present in the soil; farmers must then destroy the old crop to grow a fresh one. For a long time in India, farmers have practiced stubble burning, a significant contributor to pollution and global warming. Here, an automated model predicts the crop yield for a specific soil type, periodically checks the plant's health and whether it is infected, and assesses its fertilizer requirement [4].

With various machine learning techniques, one can feasibly predict the best crop to cultivate for a given set of parameters [5]. Further, crop growth can be managed with fertilizer and disease prediction models, which help in the early detection of diseases and suggest changes in fertilizer based on nitrogen, phosphorus, and potassium (NPK) values and soil type. This leads to proper crop management and yield amplification. Through AI techniques, the information will be unbiased, and farmers will be able to make informed decisions about their net production [6]. This will increase farmers' profit margins and, in turn, exports, adding to India's GDP. The impact of this research is thus more lasting than it may first appear.

Recent technological advances in the Internet of Things, machine learning, and artificial intelligence are also being applied in agriculture for better productivity [7,8,9]. In our research, we aim to provide automation through which, using machine learning and neural network models, farmers can reduce their crop wastage rate. We divided our approach into three subparts: crop prediction, early detection of plant diseases, and fertilizer prediction. The first model predicts an appropriate crop to grow depending on the climatic conditions and soil contents of a particular area; we opted for the random forest classifier with hyperparameter tuning, which improved the model's accuracy and reduced its loss. The second model identifies any disease a crop may have before excessive damage occurs, using a Convolutional Neural Network (CNN). CNNs are widely used for image recognition; by feeding a picture of the diseased crop through the CNN's convolutional, dropout, and pooling layers, the model identifies the type of disease. We used the softmax activation function to reach maximum efficiency with reduced error. Finally, after a disease is identified, the third model suggests an appropriate fertilizer to improve the health of the remaining crops, depending on the nutrient content of the soil. We incorporated an Artificial Neural Network (ANN) using the Adam optimizer and regularization techniques to improve learning and prediction on new inputs.

Using these algorithms, we constructed our model, whose detailed workflow is presented in the subsequent sections of this chapter. Section 4.2 contains the literature review on farmland optimization and crop monitoring techniques. Section 4.3 provides an elaborated analysis of related works corresponding to our research. Section 4.4 presents the proposed model, with a qualitative and quantitative analysis of the machine learning and ANN techniques proposed in this research. Section 4.5 presents a result analysis that summarizes all our observations (descriptive as well as pictorial) and reports outputs obtained by tuning the algorithms' parameters. Section 4.6 concludes with a brief on the three connected models that help farmers reduce crop wastage.

4.2 Literature Review

Agriculture, a crucial sector of India, serves as the backbone of its economy. The prediction of various crops under suitable conditions has been highlighted using data mining techniques [1]. The growth rate of three major crops in India is shown in Fig. 4.1 [2]. Against this backdrop of agricultural growth, Singh et al. [3] reported on the sensitive issue of burning stubble and grasslands, long practiced to control weeds, insects, plant infections, and excess crop residue. Even now, stubble burning remains prevalent in some parts of India. Because it is quick, inexpensive, and effective, it has eased seeding and other soil operations at large scale. Wheat residue left by harvesters takes about one and a half months to decompose, and when farmers lack the time to wait before sowing fresh crops, stubble is burned to prepare a new seed bed. The dangers of burning wheat stubble, however, pose major threats to the environment: it directly harms the atmosphere, and indirectly plants and humans, as depicted in Fig. 4.2 [10].

Fig. 4.1
A grouped bar graph of the growth rate in percentage for wheat, rice, and maize. It plots yield and harvested area. In all three crops, the yield growth rate exceeds the harvested area.

Growth rate of 3 major crops in India [2]

Fig. 4.2
A stacked bar graph of the air quality in the harvest season and after. It plots the level of air quality index as severe and very poor for October 2016 and 2017, November 2016 and 2017, and December 2016 and 2017. December 2016 and October 2016 have the highest and lowest values, respectively.

Air quality in and after Harvest [10]

Burning causes soil nutrient loss of organic carbon, nitrogen, phosphorus, and potassium, and also deteriorates ambient air quality [3]. Burning wheat and other agricultural residues discharges trace gases such as methane and various oxides, along with ample quantities of suspended particulate matter, with adverse effects on humans. Many large-scale efforts have been made to tackle this deterioration. Conservation agriculture, notably for wheat-maize systems [5], can be practiced effectively if crop residue management plans are developed that consider the demand, quality, feasibility, and economics of residue management, which is an efficient way to preserve land. Figure 4.3 shows how important it is to care for leading crops like rice, wheat, and maize, given their demand and exports. Still, gaps remain in preserving soil as early as possible and in the early detection of infection. Our work aims to contribute improvements toward the conservation and preservation of natural soil.

Fig. 4.3
An area-cum-line graph of million metric tons over the years from 2005 to 2013 plots wheat and rice stocks and exports. The wheat and rice stock area ascends from 20 M M T to 70 M M T. The line for exports follows a fluctuating upward trend.

A comparison of the wheat and rice stocks with their respective exports

Once image processing has detected an infection, testing soil samples can break the chain of spoilage at the right time. There are many effective techniques to treat soil with suitable fertilizers as required [11]. This part of the research therefore concerns testing soil to determine the soil type, the weather conditions, and the contents of the fertilizer used. In this way, one can easily keep track and change fertilizers from time to time for a better outcome. The study in [11] reported regulating fertilizer use after inspecting the current soil content. Various efficient tests exist to find cointegration between fertilizer consumption and food grain production; Kumar and Indira [12] reported effective methods and tests supporting this cointegration relation. As for classifying infections, various recent machine learning approaches and techniques detect diseases and pests in agricultural products [13].

In the second phase, focusing on apple, a highly accurate model has been set up for early detection and classification of its infections. Turkoglu et al. [14] highlighted proper measures using a multi-model long short-term memory (LSTM) network with good accuracy and a rich feature set. Our chapter further adds techniques for early detection of apple infections with better accuracy using various activation functions.

4.3 Related Work

Crop yield mapping, yield estimation, matching of supply with demand, and crop management to increase productivity [15, 16] are essential steps to boost the economy. Machine learning provides low-cost and efficient solutions for crop yield prediction, and there are many efficient ways of predicting crop yield under the weather conditions over a field [17]. As mentioned earlier, crop production in India depends largely on temperature and moisture content in a specific growing area. The review in [18] clearly shows how crop yield depends on changes in the diurnal temperature range, indicating the impact of timing and temperature on yield. Similarly, our research deals with amplifying the yield of any crop under suitable climatic conditions.

In our work, we show how any spare land can grow a worthy crop rather than lying idle or being turned over to a factory. Growing ample crops creates a path to greenery and a cleaner atmosphere, which can directly improve the polluted air around us. So, following [17] in keeping an eye on crop-climate relationships, we created a model that predicts which crop is suitable to grow in an area depending on its climatic conditions and soil content. Once a crop is planted, initial fertilizer levels are set according to soil conditions. When applying any fertilizer, careful attention must be paid to the contents of the soil. Haynes and Naidu [19] explained the influence of fertilizers and manures on the soil's organic matter content and how they change the physical conditions of mineral-rich soil. Often, due to the poor irrigation facilities available to poor farmers, the soil becomes extensively waterlogged and some nutrients are diluted, which in turn affects crop yield. In [20], it is explained how waterlogged soil loses nutrients and becomes prone to infection; hence the need for fertilizers. A complete analysis of fertilizer use in India over the last 20 years is reported by Kumar and Indira [12], showing a long-run correlation between fertilizer use and food grain production, as in Fig. 4.4. They explained how farmers increase production through increased fertilizer consumption without considering the environmental and health consequences or the sustainability of agriculture [12].

Fig. 4.4
A multi-line graph of fertilizer consumption and food grain production over the years from 2000 to 2014. Both lines ascend with fluctuations.

Relation between fertilizer consumption and food grain production

India is a leading producer of wheat and maize, whose yields depend mostly on the monsoon and temperature. As staple grains, producing them in ample quantities is necessary. A comparative analysis of their yield under various machine learning techniques is elaborated in [15]. High-yielding varieties such as maize and wheat demanded greater fertilizer use, supported by the subsidy policy on fertilizers [12]. With vast acreage and ample yield at stake, crops should be protected from any blight. Our model therefore inspects crop condition in a timely fashion with high accuracy to minimize error.

Many plant leaf diseases have been studied, including those of maize (corn) with northern leaf blight, common rust, and gray leaf spot (shown in Fig. 4.5), with well-detailed analysis [21,22,23]. The model proposed in this work achieves an accuracy of 98.82%, higher than that reported by Geetharamani and Pandian [22, 24].

Fig. 4.5
3 photos of the diseases found in maize leaves labeled A, B, and C. It has the northern leaf blight on the edges of the leaf, common rust throughout the leaf, and gray spots on various parts of the leaf.

Diseases found in maize

Singh and Arora [23] used a convolutional neural network to distinguish between healthy and unhealthy wheat. Unhealthy wheat crops affected by leaf rust and stem rust are shown in Fig. 4.6. In this work, we enhanced prediction to an accuracy of 98.7% by selecting effective algorithms and activation functions, exceeding the accuracy reported in the literature [21, 23].

Fig. 4.6
2 photos of the diseases found in wheat leaves labeled A and B. It has leaf rust throughout the leaf, and stem rust along the stem.

Diseases found in wheat

Fruit diseases can cause substantial economic loss if not controlled in time. As an example, apple scab, rust, and black rot [22] affect the apple leaf and fruit (shown in Fig. 4.7). Alharbi and Arif [25] used a CNN to detect and classify apple diseases. In our work, we included additional concepts and better algorithms to predict diseases with an improved accuracy of 98.93%.

Fig. 4.7
3 photos of the diseases found in apples labeled A, B, and C. It has the apple scab on the sides and the middle of the apple, cedar apple rust on the bottom part, and apple black rot on the body.

Diseases found in apple

If necessary action is not taken in time, crops and fertilizers are wasted, leading to financial loss and poor resource management. If the waste is large due to late inspection, residues may be burned to grow fresh crops, drastically affecting the entire environment. Our research therefore uses the four most important features, temperature, rainfall, humidity, and soil pH, to predict a suitable crop.

4.4 Proposed Model

India's prime source of economy is agricultural produce, thanks to the vast areas of cultivable land it is blessed with. As stated earlier, biochemical and environmental factors can affect crop yield. Hence, we propose three interconnected models which, run in tandem, can help farmers produce more yield. The first is the crop prediction module: based on parameters like temperature, humidity, pH, and rainfall, it predicts which crop should ideally be grown in a particular area. After the crop grows, a snapshot can be taken to check whether it has been affected. If the crop is healthy, we continue with timely crop monitoring to prevent loss. If the crop is affected, we pass the image through our model and predict which disease it has. After this prediction, one can check which fertilizer is suitable to protect the crops from disease, based on soil and environment parameters: the surrounding temperature, the region's humidity level, the soil type, the crop type grown, and the concentrations of nitrogen, potassium, and phosphorus in that land. With these parameters, an appropriate remedy can be provided to the farmer. Figure 4.8 gives a diagrammatic representation of the proposed model's flow, showing step by step the relations between the crop prediction, disease classification, and fertilizer prediction modules.

Fig. 4.8
A flowchart of the crop production prediction module has various processes. It includes crop produced, diseases classification module, extracting soil features, fertilizer prediction module, and refresh crop production module, among others via found and not found.

Proposed model architecture

4.4.1 Machine Learning Approach for Crop Prediction

A machine learning model is built by combining statistics and mathematical models. Data and statistics help a model or network learn the relationships, dependencies, and equations among its participating factors; in this way, correlations between dependent and independent variables are developed. In [15], comprehensive research is presented on the importance of machine learning in agriculture. For a machine learning (ML) model to succeed, every step must be tuned carefully. Figure 4.9 shows how a machine learns and analyzes patterns.

Fig. 4.9
A flow diagram has the following flow. Training data, labeled or unlabeled, is fed to a machine learning algorithm that yields a classification or prediction rule with new examples and predicted outputs.

Basic architecture of a machine learning model

Here, training data is passed through a specific algorithm, which analyzes it and learns the correlations it contains. The model then infers prediction or classification rules to handle new examples or future data. In Fig. 4.10, a detailed chart of how the ML model works is presented.

Fig. 4.10
A block diagram of the working steps of a machine learning model has the following flow. Data collection, preparing and data pre-processing, learning algorithm, training model, and evaluating model and prediction.

Working steps of a machine learning model

4.4.1.1 Data Collection

Gathering data is crucial, as the quality and quantity of the data collected directly determine how good and accurate the predictive model can be; in general, more training data tends to yield higher model accuracy. Our data comes from Kaggle datasets with 3100 entries, with the attributes temperature, rainfall, pH, and field humidity used to predict the desired crop. Figures 4.11 and 4.12 depict the ranges of rainfall and temperature in our data, where the optimum ranges for most of the crops are 14–38 °C and 20–250 cm, respectively.

Fig. 4.11
A histogram cum line graph of rainfall range. It plots a downward trend with a peak between 30 and 130 millimeters rainfall on the horizontal axis. All values are estimated.

Rainfall range for crop prediction in India

Fig. 4.12
A histogram cum line graph of the temperature range. It plots a downward trend with a peak between 15 and 35 on the horizontal axis. All values are estimated.

Temperature range for crop prediction in India

4.4.1.2 Data Preparing and Pre-processing

First, in data preparation, we read and load the data into a suitable structure and prepare it for training. Pre-processing is often essential because collected data can be irregular, absurd, or erroneous, requiring adjustments such as deduplication, normalization, and error correction. Raw data should always be cleaned before being fed to the machine to enhance overall accuracy. Bhaya [26] discusses various beneficial pre-processing techniques for data mining in detail.

The next phase splits the data into training and testing sets so the learning algorithm can be validated. The test set is taken as 25% of the whole dataset, with the random state tuned to 1; the random state ensures that the generated splits are reproducible. Feature scaling of the entire data is then needed, as in almost all models, to give unbiased importance to every factor. In Table 4.1, a data frame is shown with varying data ranges in all columns; a sketch of the split follows the table.

Table 4.1 Dataset sample for crop prediction
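As a minimal sketch of this step, assuming the Kaggle data is loaded into a pandas DataFrame with the four feature columns and a crop label (the file and column names here are illustrative, not taken from the source):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("crop_data.csv")  # assumed file name
X = df[["temperature", "humidity", "ph", "rainfall"]]
y = df["label"]  # crop to be grown

# 25% held out for testing; random_state=1 makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)
```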

Due to this irregularity, the machine may give undue importance to a factor with a large value range and neglect others with small ranges, irrespective of their real contribution. We therefore need to standardize all variables to the same range. There are two main normalization methods, given in Eqs. 4.1 and 4.2.

$$ {X}_{new}=\frac{X-{X}_{mean}}{Standard\ Deviation} $$
(4.1)
$$ {X}_{new}=\frac{X-{X}_{min}}{X_{max}-{X}_{min}} $$
(4.2)

A broader perspective is provided in [27], highlighting various normalization methods and their influence. For our work, we used a standard scaler to normalize all values under each attribute so that each carries equal importance, as in Table 4.2; a sketch of this scaling follows the table. After this scaling, all values lie roughly in the range −3 to +3.

Table 4.2 Normalized dataset sample
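A minimal sketch of this standardization (Eq. 4.1) with scikit-learn's StandardScaler, continuing the variables from the earlier split:

```python
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training set only, then reuse its statistics
# on the test set so no test information leaks into training.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```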

4.4.1.3 Learning Algorithm

Machine learning is backed by algorithms that predict output values within an acceptable range after analyzing input data. These algorithms are programmed to learn and optimize their calculations for better performance and efficiency as new data arrives, developing intelligence over time. Machine learning draws on a wide range of algorithms to find good correlations among variables. We used supervised algorithms, namely Logistic Regression, Naive Bayes classification, K-Nearest Neighbor classification, and Support Vector classification, to make the machine learn correlations among variables [28,29,30]. In addition, Random Forest classification, which is derived from decision trees, was found to give the maximum accuracy (Fig. 4.13) with a good learning rate.

Fig. 4.13
A bar graph of accuracy versus models plots 6 models with their accuracy. The random forest classification model has the highest accuracy, at 0.9522, and the logistic regression classification model has the lowest accuracy, at 0.7516.

Accuracy of different models

Since the best result was achieved using a decision tree with the entropy criterion, entropy was also chosen as the criterion for the random forest. This gave an accuracy of 95.22%, the best among all the models.

4.4.1.4 Training the Model

As concluded in the previous section, applying the Random Forest algorithm, which builds on the decision tree algorithm, effectively increased the model's predictive power to an accuracy of around 95%. In [17], a well-designed approach to crop prediction using decision trees is presented. We therefore chose random forest as our proposed algorithm. Figure 4.14 shows the working of the Random Forest algorithm.

Fig. 4.14
A block diagram of the training set has several training samples 1 and training samples 2 with common voting that leads to a prediction. A test set block is given below the training set.

Working of random forest classifier

The working of the Random Forest algorithm can be understood through the following steps, which are repeated over the desired number of iterations to make the predictive model accurate and efficient.

  • Step 1 − For a given data, a random number of samples are selected.

  • Step 2 − For every sample, a decision tree is constructed which gives the prediction result.

  • Step 3 − For every result, voting is performed.

  • Step 4 − The most voted result is selected as the final result.

In Fig. 4.15, the code snippet shows the initialization of a random forest classifier and its fitting, which learns the correlations among attributes for predicting future data.

Fig. 4.15
A code snippet has the parameters for random forest classification. It imports the random forest classification module from s k learn dot ensemble and uses it to plot the entropy.

Parameters considered under random forest classification

This classifier object fits the training set to learn the relations, then validates against the testing set over iterations to find the best accuracy, as discussed in the section below. The final model is then used to predict future data in real-time scenarios.
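As a hedged reconstruction of the setup in Fig. 4.15 (the exact arguments of the original snippet are not reproduced here; the entropy criterion is stated in the text, and the estimator count and depth come from Sect. 4.5.1):

```python
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(
    n_estimators=1000,    # number of trees voted over (Sect. 4.5.1)
    criterion="entropy",  # split quality via information gain
    max_depth=10,         # limits tree depth to curb overfitting
    random_state=1,
)
clf.fit(X_train_scaled, y_train)     # learn correlations on training data
y_pred = clf.predict(X_test_scaled)  # predictions used for evaluation
```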

4.4.1.5 Evaluating the Model and Prediction

To evaluate our model's performance [29], we use different evaluation metrics to judge its predictive power before making predictions on unseen data. We used metrics such as F1-score, recall, and precision for validation. In [23, 31], the dependence of evaluation/accuracy metrics on various predictive models is shown.

Precision: Of all the instances the model predicts as a given class, precision measures how many truly belong to that class, as in Eq. 4.3

$$ Precision=\frac{TP}{TP+ FP} $$
(4.3)

Where, TP = True Positive, TN = True Negative, FN = False Negative, FP = False Positive.

Recall: Of all the instances that are truly positive, recall measures how many the model predicts as positive, as in Eq. 4.4

$$ Recall\ (True\ Positive\ Rate)=\frac{TP}{TP+ FN} $$
(4.4)

F1-score: The F1-score is the harmonic mean of precision and recall; its best value is 1, meaning perfect precision and recall, and its worst is 0, as in Eq. 4.5

$$ F1=2\times \frac{Precision\times recall}{Precision+ recall} $$
(4.5)

As noted in the data analysis study [23], the higher the F1-score, the better the predictive model, 0 being the worst possible and 1 the best. Taking all crops into account, the average of all evaluation metrics tends toward 1, indicating that our predictive model is well developed.

The average of all evaluation components was found to be:

  • Precision = 0.9519354839

  • Recall = 0.9451612903

  • F1_score = 0.9464516129

These components should generally be near 1 for a model to fit well. The model accuracy for crop production amplification was found to be 95.22% with the proposed Random Forest classifier using the entropy criterion.
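A brief sketch of how such averages might be computed with scikit-learn (the macro averaging shown here is an assumption; the chapter does not state which averaging was used):

```python
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
)

# y_test / y_pred continue from the training sketch above.
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred, average="macro"))
print("recall   :", recall_score(y_test, y_pred, average="macro"))
print("f1-score :", f1_score(y_test, y_pred, average="macro"))
```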

4.4.2 Disease Detection Prediction

A Convolutional Neural Network (CNN) is a type of deep learning algorithm used for image classification. It is a multi-layer neural network designed to analyze visual inputs and perform tasks such as image classification, segmentation, and object detection. Every CNN follows a pipeline in which it learns pixel-level patterns step by step and, with that knowledge, predicts and classifies new images. The steps are depicted in Fig. 4.16.

Fig. 4.16
A block diagram of the working principle behind a C N N model has the following flow. Image acquisition, image processing, feature extraction, disease classification, and image validation.

The working principle behind a CNN model

In this research, we briefly present a basic CNN overview in Fig. 4.17. The input resolution is reduced for better machine interpretation: lowering the pixel count removes the complexity of an image and lets the algorithm learn images effectively. We now follow a step-by-step procedure in which the applied algorithm detects and classifies diseases of maize, wheat, and apple.

Fig. 4.17
A diagram of a C N N model has layers of convolution 1 through 5 connected via 5 max pooling layers, and the fully connected layers of f e 6, 7, and 8 from left to right.

Basic architecture of a CNN model [32]

4.4.2.1 Image Acquisition

Image datasets for maize and apple diseases were taken from Kaggle and tested on real field images. The dataset is derived from the original data by offline augmentation. For wheat disease classification, images come from a variety of sources, some of them publicly available Google images.

When extracting images from the source into the machine, class_mode is assigned 'sparse' because each crop has multiple disease classes. The following is some background on how these infections develop; we highlight all the disease types focused on in this research. As mentioned earlier, maize and wheat are important crops, so utmost care should be taken with causes and precautions, along with timely checkups.

The diseases specified in Table 4.3 are generally caused by wet springs and humid weather. These diseases may not kill the host but cause fruit deformation and premature fruit drop. Their spread across fields can also be driven by windblown fungal spores, which can carry disease over long distances.

Table 4.3 Number of training and testing sets of all types of each crop

4.4.2.2 Image Pre-processing

Image pre-processing involves re-scaling every pixel value from the range (0, 255) to (0, 1). Some images have high pixel values and some low, which can make it difficult for a machine to recognize their full features; without scaling, the network would not treat all images as equally important. Scaling every image to the same range (0, 1) makes each image contribute more evenly to the total loss. A brief study of image processing is provided in [33].

From Keras, we import the ImageDataGenerator class to perform image enhancement, as shown in Fig. 4.18.

Fig. 4.18
A block diagram of the stepwise procedure involved in image processing has the following flow. Raw image, invert, thickening, resize into 64 by 64, and pre processed image.

Step wise procedure involved in image processing

In our proposed methodology, several small processing steps are taken to enhance the images: every image is resized to a 64×64 target size; the training data is augmented so the machine can withstand image flips (left and right) while preserving the true classification of any crop disease; and, for better feature extraction in later steps, all images are processed with a zoom limit of 0.2. A sketch of this pipeline is given below.
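A minimal sketch of the pipeline just described, using Keras (the directory path is a placeholder, not from the source):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,     # map pixel values from (0, 255) to (0, 1)
    zoom_range=0.2,        # random zoom up to the 0.2 limit
    horizontal_flip=True,  # tolerate left-right flips
)
train_set = train_datagen.flow_from_directory(
    "data/train",          # hypothetical dataset path
    target_size=(64, 64),  # resize every image to 64x64
    class_mode="sparse",   # integer labels for the disease classes
)
```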

As mentioned above, data augmentation is used in all feasible forms to improve learning for the neural network, effectively increasing the amount of relevant data in the dataset. Re-scaled images of all crops for every disease are shown in Fig. 4.19.

Fig. 4.19
11 photographs of the diseased leaves and stems of apples, maize, and wheat numbered accordingly. 4 apple leaves are at the top, 4 maize leaves are in the middle, and 3 wheat leaves and stems are at the bottom.

Re-scaled images of all crops diseases after pre-processing

Figure 4.20 shows how the whole image-processing stage connects image acquisition and feature extraction with data validation and classification. These procedural steps clarify the journey of each input image, from noise removal to final classification with sound validation accuracy.

Fig. 4.20
A flow diagram has 2 parts of preprocessing and post-processing with the following flow. Image acquisition, background removal, image enhancement, object representation, track initialization, track maintenance, data validation, and classification.

Cycle of pre-processing and post-processing

4.4.2.3 Feature Extraction

To recognize an image, a CNN must scan it in depth: all the features present in the image should be captured so that the key ones can be extracted to enrich classification accuracy. Feature extraction consists of convolution layers followed by max-pooling and an activation function. The working principle of this step is to treat the pre-trained network as an arbitrary feature extractor: the input image propagates forward, stops at a pre-specified layer, and the output of that layer forms the features. O'Shea and Nash [34] give an overview of how CNN architectures work.

Feature extraction increases the accuracy of learned models by distilling informative features from the input data. This phase not only enhances the final accuracy but also removes redundant data, reducing dimensionality and thus increasing training and inference speed.

For the feature extraction process in this research, we created two deep convolutional layers, each followed by a pooling layer and the 'ReLU' activation function, with the same parameters for each crop, as shown in Table 4.4. We found the model to perform well with two layers in feature extraction. A good amount of research on feature extraction was done in [25]; in agreement with that work, our model was rebuilt and optimized to greater accuracy.

Table 4.4 Parameters used in the feature extraction of the CNN architecture
4.4.2.3.1 Convolutional Layer

A Conv2D layer has filters (kernels); in our model the kernel size is 3×3 with 32 filters, as shown in Fig. 4.21. Filters are generally smaller than the input image and are slid across it so the whole image is covered. The image area that the filter currently covers is called the receptive field.

Fig. 4.21
A convolution 2 D layer has a kernel size of 3 by 3 and 32 filters that are represented via 4 square blocks. The first square block has a 3 by 3 grid filled with asterisk marks.

Kernel size and filters used in conv2D model

An image has three main channels, red, green, and blue, through which the Conv2D filter extends, possibly with different weights per channel. Convolution is performed individually for each channel, and the results are integrated into the final output, called the convolved image. After the convolution operation, each filter outputs a feature map. Figure 4.22 represents a generic Conv2D network [25]. The convolution layer output is given by Eq. 4.6.

$$ M_j^p=f\left(\sum_{i\in M_j} M_i^{p-1}\ast k_{ij}^p+N_j^p\right) $$
(4.6)

where $p$ represents the $p$th layer, $k_{ij}^p$ denotes the convolutional kernel, $N_j^p$ the bias, and $M_j$ a set of input maps.

Fig. 4.22
A convolution 2 D layer of a network has a 7 by 7 grid of the input image with the top left and bottom right grids of 3 by 3 highlighted. It leads to a filter grid of 3 by 3 with data, and an output grid of 3 by 3 with data.

Generic representation of a Conv2D network overlapping a filter dimension into input image to form the output shape [25]

4.4.2.3.2 Pooling Layer

The pooling layer mainly extracts sharp and smooth features. It also reduces the variance of the data and the computation required, for better estimation. Mathematically, the pooling output size is calculated as in Eq. 4.7.

$$ \textrm{Output}\ \textrm{size}=\frac{\left(\textrm{Input}\ \textrm{size}-\textrm{Pool}\ \textrm{size}+2\ast \textrm{padding}\right)}{\textrm{stride}}+1 $$
(4.7)

Gholamalinejad and Khosravi [35] have discussed various effective pooling methods in detail. In our work, we used max-pooling, represented as in Eq. 4.8; a diagram illustrating its working is shown in Fig. 4.23.

$$ V={\mathit{\max}}_{i,j=1}^{h,w}{S}_{i,j} $$
(4.8)
Fig. 4.23
A max pooling grid of 4 by 4 has data that leads to a max pooling filter of 2 by 2 grid with data. It has stride given as 2.

Max pooling – Single Depth slice

4.4.2.4 Disease Classification

This is the last phase of the architecture, where the disease is predicted. After max-pooling, the output is flattened, i.e., converted into a vector, since the classification result can be obtained only from a vector. Fully connected layers then flatten the network's 2D spatial features into a 1D vector that represents image-level features for classification [36].

Table 4.5 shows the four dense (deeply connected) layers with the units and activation function used in each. The hidden layers use 'ReLU' to increase the non-linearity of the representation. The output layer has 4 units, one for each disease class, and uses softmax, which is suitable for mutually exclusive multi-class classification as the multi-class generalization of logistic regression.

Table 4.5 Number of units and activation function in a dense layer
4.4.2.4.1 ReLU

The ReLU layer is used as an activation function between the convolution layer and the feature maps (Fig. 4.24) [25], converting all negative values to zero without affecting the size or dimensions of the image.

Fig. 4.24
A convolution 2 D layer has a 7 by 7 grid of the input image with the top left and bottom right grids of 3 by 3 highlighted. It is followed by a stack of feature maps pointed via arrows, which leads to a graph of Y versus sigma. The graph has a formula f of x equal to max x, 0 within parentheses from the origin 0.

Applying activation function Relu to feature maps of convolution layer [25]

These layers together make up feature extraction and give a baseline accuracy. We added a dropout layer to achieve better accuracy. This regularization technique prevents the model from overfitting by randomly setting input units to zero at each step during training [37], which discourages complex co-adaptations on the training data and thus reduces overfitting.

4.4.2.4.2 Softmax

Softmax can be viewed as a generalization of the sigmoid function to multiple classes. Its activation curve is shown in Fig. 4.25; a broader discussion of activation functions used in deep learning is given in [38].

Fig. 4.25
A graph of the softmax activation function has 4 quadrants of positive and negative values. It plots an S-shaped curve from the third to the first quadrant.

Graph of softmax activation function

In the compilation step for our CNN model, we used the Adam optimizer because it gave the best accuracy among the optimizers we evaluated.

Figure 4.26 gives an overview of the internal mechanism of our model, including all the layers. The fully connected layers are formed from the last few flattened layers of our CNN model and are instrumental in predicting the final output, i.e., a particular disease.

Fig. 4.26
The convolution plus pooling layers have an image of a diseased leaf in convolution plus nonlinearity that connects with the stacks of max pooling layers. It leads to the fully connected layers of v e c that form the N x binary classification of disease 1, 2, 3, and healthy leaves.

Internal mechanism of how fully connected layers are formed using maize common rust image

4.4.2.5 Image Validation

Several sample images from different sources were selected for validation, and the model predicted their diseases accurately. This was achieved through the high accuracy of our models, accomplished by tuning various parameters for a more precise outcome. Sect. 4.5.2 presents a detailed analysis of the validation tests and overall accuracy, and discusses how regularization techniques improve performance and reduce the margin of error.

The various CNN parameters, namely the number of layers, the number of units per layer, the kernel size, and the apt activation function for our model, were carefully tuned after several considerations, after which we obtained good accuracy. Figure 4.27 summarizes the model, showing each layer's type and output shape; a hedged sketch of this stack follows the figure.

Fig. 4.27
A table of a model titled sequential has 2 columns of layer type and output shape. It has 9 rows of data underneath that includes c o n v 2 d, max underscore pooling 2 d, and c o n v 2 d underscore 1, among others.

Proposed model summary for disease detection using CNN
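A hedged sketch of the stack summarized in Fig. 4.27, under the parameters stated in the text (two Conv2D layers of 32 3×3 filters each with max-pooling, dropout in the 25–50% range found optimal in Sect. 4.5.2, and a 4-unit softmax output); the width of the intermediate dense layer is our assumption, since the exact units are listed in Table 4.5:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                       # 2D feature maps -> 1D vector
    layers.Dense(128, activation="relu"),   # hidden width assumed
    layers.Dropout(0.25),                   # dropout in the 25-50% range
    layers.Dense(4, activation="softmax"),  # one unit per disease class
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",   # one class per sample
    metrics=["sparse_categorical_accuracy"],
)
model.summary()  # prints the layer/output-shape table as in Fig. 4.27
```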

4.4.3 Artificial Neural Networks for Fertilizer Prediction

Artificial neural networks are computing structures whose working is inspired by the human brain. They simulate tasks such as clustering, pattern recognition, and classification on a computer. The usability of these networks has now also reached agriculture, helping farmers increase their profits.

An ANN learns by adjusting the weights and biases of all its input parameters and then produces a prediction. The learning process is summed up in the steps in Fig. 4.28.

Fig. 4.28
A block diagram of the working of an A N N model has the following flow. Collected data, data preprocessing, input layer, hidden layer and feature extraction, and output layer and prediction.

Working of an ANN model

Each of the steps in the diagram is described below.

4.4.3.1 Data Collection

Our data source for this work is Kaggle. The parameters (Fig. 4.29) used to predict an apt fertilizer are the temperature conditions, the humidity level (an absolute value), and the moisture content of the soil. Soil types such as sandy, loamy, and red are among the determinants of the correct fertilizer. The amounts of nitrogen, potassium, and phosphorus, essential nutrients for good crop growth, are also included as parameters. Some of the fertilizers our model can predict are Urea, DAP, 14-35-14, and 28-28 (Fig. 4.30), among which urea is the most used in India.

Fig. 4.29
A table has 9 columns of temperature, humidity, moisture, soil type, crop type, nitrogen, potassium, phosphorous, and fertilizer name with 5 rows of data given underneath.

First five readings of the fertilizer dataset

Fig. 4.30
A pie chart of the fertilizer usage distribution has the following fertilizers in percentage. Urea, 22.7. D A P, 18.6. 14 35 14, 14.4. 28 28, 15.5. 17 17 17, 7.2. 20 20, 14.4. 10 26 10, 7.2.

Distribution of various fertilizers present in the dataset obtained

4.4.3.2 Data Pre-processing

The collected data is label-encoded. There may be a substantial difference in values between columns, which can cause the larger-valued columns to be prioritized, so normalization is performed. Among the various normalization methods [26], the mathematical representations of the Standard Scaler and Min-Max Scaler are given in Eqs. 4.1 and 4.2; we use the Standard Scaler in our model. A sketch of this preprocessing follows.
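A minimal sketch of the label encoding and scaling (file and column names are illustrative, loosely following the dataset layout shown in Fig. 4.29):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

df = pd.read_csv("fertilizer_data.csv")  # assumed file name

# Encode the categorical columns as integers.
for col in ["Soil Type", "Crop Type", "Fertilizer Name"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# Standardize the features (Eq. 4.1) and keep the fertilizer as target.
X = StandardScaler().fit_transform(df.drop(columns=["Fertilizer Name"]))
y = df["Fertilizer Name"]  # 7 fertilizer classes
```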

4.4.3.3 Input Layer

Artificial neural networks need data fed into the system as input. The data is assigned random initial weights and forwarded to the next, hidden layer. Figure 4.31 shows the input layer, where the neuron is represented by a circle into which all the inputs are summed; a bias is then added and an activation function is applied.

Fig. 4.31
An input layer has the input parameters of temperature, humidity, moisture, soil type, crop type, nitrogen, potassium, and phosphorous in an oval circle with the formula of f input parameters within parentheses + b. It gives prediction equal to fertilizer.

Input layer showing various input parameters

A bias is analogous to the constant term added in a linear equation; these per-neuron computations combine to form the network. After this internal computation, a predicted value is obtained. This is an abstract overview of how the computation is done.

4.4.3.4 Hidden Layer and Feature Extraction

Hidden layers are used so the machine learns enough about the data and the correlations among its attributes, which helps in predicting the class of any future data. We implemented our model with 2 hidden layers of 8 neurons each (Table 4.6).

Table 4.6 Hidden layers specification

When the input parameters are fed into the machine, each neuron in the hidden layers multiplies the input data by its weights, and a bias is added to the result as an additional parameter to adjust the output. Our neural network, with its hidden layers and neurons, is shown in Fig. 4.32.

Fig. 4.32
A neural connection network of the A N N model has inputs X 1 through 8. It also has an input layer, 2 hidden layers from 0 through 7, and output layers from 0 through 6 interconnected to each other.

Neural connections in our ANN model for fertilizer prediction

Activation functions [38, 39] are generally used in deep learning models to introduce non-linearity. Arya and Ankit [40] provide a brief analysis of learning and recovery with the 'ReLU' function. The ReLU activation function (Eq. 4.9, plotted in Fig. 4.33) is used after the features are extracted; it mitigates the problem of vanishing gradients and is computationally less expensive than activation functions such as 'tanh'.

$$ ReLU=f(x)=\begin{cases} x, & \text{if } x>0 \\ 0, & \text{if } x\le 0 \end{cases} $$
(4.9)

Later in Sect. 4.5.3, we have discussed how a regularization technique can render a better accuracy with increasing performance for our base model. In [39], a comparative analysis of various regularization techniques used in ANN is provided.

Fig. 4.33
An R e L U activation graph moves along the x axis from (minus 10, 0) and rises linearly from (0, 0) onwards. It also contains the formula R of z = max 0, z enclosed in parentheses.

ReLU activation function

4.4.3.5 Output Layer and Prediction

The output layer is also referred to as the classification layer. The sigmoid function is used for predicting the final output, and the output layer has 7 units (Fig. 4.34).

Fig. 4.34
2 code snippets have the configuration and compilation steps of the A N N model. These include the functions a n n dot add and a n n dot compile.

Output Layer configuration and compilation step
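A hedged reconstruction of the snippet in Fig. 4.34, following the text (2 hidden layers of 8 ReLU neurons, a 7-unit sigmoid output, and the Adam optimizer); the loss function and training arguments shown are our assumptions, since the figure's exact arguments are not reproduced here:

```python
import tensorflow as tf

ann = tf.keras.models.Sequential()
ann.add(tf.keras.layers.Dense(units=8, activation="relu"))     # hidden layer 1
ann.add(tf.keras.layers.Dense(units=8, activation="relu"))     # hidden layer 2
ann.add(tf.keras.layers.Dense(units=7, activation="sigmoid"))  # 7 fertilizer classes
ann.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # assumed loss for integer labels
    metrics=["accuracy"],
)
ann.fit(X, y, batch_size=32, epochs=100)  # illustrative training call
```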

We then call compile to combine all the layers, start the training phase, and finally predict the favored class. The optimizer used in compilation is Adam, as in Fig. 4.35. Among the varieties of stochastic gradient descent such as RMSProp and Adagrad, the Adam optimizer gives the best outcome. A more detailed analysis of Adam, and a variation on it, is given in [41].

Fig. 4.35
A graph of the Adam Optimizer plots an inverted S-shaped curve. It has a peak at 7 on the y axis and a dip at 2 on the x axis. There are various points marked on the curve. All values are estimated.

Curve of Adam optimizer

Many optimizers can be used, such as SGD and RMSProp [41], but the Adam optimizer is considered the best here due to its learning rate behavior, reliability, efficiency, and cost-effectiveness compared to the others. Table 4.7 shows the learning rates of different optimizers.

Table 4.7 Learning rate of various Optimizers

Here, Adam and RMSProp show a good learning rate of 0.001. In Sect. 4.5.3, we further discuss the results obtained using these various optimizers.

After training the model with the Adam optimizer, it was evaluated using metrics like precision, recall, and F1-score. The average values of these metrics were found to be close to 1, indicating that our model has good accuracy and fit.

The accuracy of the ANN model for fertilizer prediction was found to be 96% with our proposed network of 2 hidden layers of 8 neurons each and an output layer of 7 neurons, one for each of the 7 fertilizer classes in our dataset.

4.5 Result Analysis

4.5.1 Crop Prediction

We extended the hyper-parameter tuning of our proposed random forest algorithm by varying parameters like n-estimators and max-depth. N-estimators is the number of trees the algorithm builds internally before averaging their predictions; the more trees, the more options the algorithm can choose from, which helps the classifier learn better. Table 4.8 shows how an increase in the number of estimators yields better accuracy for the random forest classifier.

Table 4.8 Tabulation of accuracy achieved using different values of n-estimators

But n-estimators also depends on the shape of the original dataset. If n-estimators exceeds the number of input rows, the model can run into fitting problems, so utmost care must be taken while tuning this parameter; too few estimators can likewise cause problems in the designed model.

As max-depth increases beyond a point, accuracy decreases due to over-fitting. From Table 4.9, we can see how accuracy drops as max-depth increases and then levels off at a threshold. The model learns patterns and correlations in order to predict new data correctly, so an optimal max-depth must be chosen based on the number of features in the dataset.

Table 4.9 Comparison of accuracy achieved using different max-depth values

For our final model, we trained the Random Forest classifier with 1000 n-estimators and a max-depth of 10. With these hyper-parameters, the evaluation metrics, precision, recall, and F1-score, are close to 1.00. A sketch of such a tuning sweep is given below.
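A sketch of the kind of sweep behind Tables 4.8 and 4.9 (the grid values are illustrative, not the exact ones tabulated):

```python
from sklearn.ensemble import RandomForestClassifier

# Vary n_estimators and max_depth, recording test accuracy for each pair.
for n in [10, 100, 500, 1000]:
    for depth in [5, 10, 20]:
        model = RandomForestClassifier(
            n_estimators=n, max_depth=depth,
            criterion="entropy", random_state=1,
        )
        model.fit(X_train_scaled, y_train)
        acc = model.score(X_test_scaled, y_test)
        print(f"n_estimators={n}, max_depth={depth}: accuracy={acc:.4f}")
```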

To visualize the deviation of the predicted output from the actual output, a graph was plotted for 100 observations. In this graph (Fig. 4.36), the blue line indicates the predicted data and the red line the test data. The largely coinciding lines show that most of our predictions are in sync with the actual output, which accounts for the accuracy of 95.22%.

Fig. 4.36
A graph of accuracy versus the number of observations plots predicted data and given data for test data versus predicted data. Both of them follow intensely fluctuating trends.

Deviation of predictions from the actual result

A heatmap is a visualization technique that shows the multicollinearity among the attributes in the dataset. Values closer to 1 depict a positive correlation, whereas values closer to 0 mean there is no linear trend. The monochromatic scale beside the heatmap maps colour to correlation: as shown in Fig. 4.37, highly correlated pairs appear in a lighter shade, while the least correlated appear darker. The correlation of a variable with itself is always 1 and is depicted in white on the corresponding colour bar, since an attribute is completely correlated with itself. The heatmap in Fig. 4.37 also shows how characteristics like temperature, humidity, pH, and rainfall relate to our target variable: negative correlations appear between pH and rainfall and between temperature and the final label, while humidity and rainfall show a positive correlation.

Fig. 4.37
A heatmap plots the correlation between temperature, humidity, p H, rainfall, and labels on both the y and x-axis with the help of a color gradient scale.

Heatmap showing a correlation between different attributes in crop prediction
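A sketch of how such a heatmap is typically produced with seaborn, assuming the crop label is numerically encoded so it can enter the correlation matrix (df continues from the earlier crop-prediction sketches):

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import LabelEncoder

# Encode the crop label numerically for the correlation computation.
df["label"] = LabelEncoder().fit_transform(df["label"])
corr = df[["temperature", "humidity", "ph", "rainfall", "label"]].corr()
sns.heatmap(corr, cmap="gray", annot=True)  # lighter shade = higher correlation
plt.show()
```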

Figure 4.38 depicts the importance of the attributes used to predict the crop. India has a monsoon climate directly influenced by the water bodies surrounding the subcontinent, so, as per the graph, the amount of rainfall is the most critical factor in determining the crops that can be cultivated in a particular region of India. Ensemble learning with the ExtraTreesClassifier is used to determine feature importance, and for better visualization the values of its feature_importances_ attribute are plotted as a horizontal bar ('barh') graph.

Fig. 4.38
A horizontal bar graph plots the importance of different features in crop production for p H, temperature, humidity, and rainfall. Humidity and rainfall have the highest values of 0.30, whereas p H has the lowest value of 0.17. All values are estimated.

Importance of different features in crop production
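A sketch of this plot, continuing the feature matrix X and labels y from the earlier sketches:

```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import ExtraTreesClassifier

et = ExtraTreesClassifier(random_state=1)
et.fit(X, y)  # features: temperature, humidity, ph, rainfall

# Plot the learned importances as a horizontal bar graph.
importances = pd.Series(et.feature_importances_, index=X.columns)
importances.sort_values().plot(kind="barh")
plt.xlabel("importance")
plt.show()
```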

Finally, the accuracy we have achieved using the Random Forest classifier is 95.22%, which is better than other proposed models [17].

4.5.2 Disease Classification

For our proposed CNN architecture, we discuss the diseases affecting three crucial crops grown in India. Better accuracy was achieved through regularization techniques and by varying the number of epochs. Training the model for more epochs lets it better capture patterns and produce more correct predictions. Accuracy measures how well our model avoids mispredictions: the better the accuracy, the better the model. The relationship of accuracy to the number of epochs for all three crops is depicted in Fig. 4.39.

Fig. 4.39
A grouped bar graph of accuracy in percentage versus types of crop plots 10, 25, and 50 epochs for apples, maize, and wheat. The accuracy % for 10, 25, and 50 epochs are the highest under apple, maize, and maize and wheat, respectively.

Bar Graph depicting the varying values of accuracy with the number of epochs

Along with accuracy, another measure that provides insight into a model's robustness is the value of the loss function. Loss measures the distance between the true values of the problem and the values the model predicts: the greater the loss, the more errors are made on the dataset. In our model we used the sparse categorical cross-entropy loss function, since each dataset sample belongs to exactly one class. The lower the value of the loss function, the better the model's training. The relation of loss values to the number of epochs is shown in Fig. 4.40.

Fig. 4.40
A grouped bar graph of loss value versus types of crop plots 10, 25, and 50 epochs for apple, maize, and wheat. Wheat has a higher loss value for all epochs.

Bar graph depicting the variation of loss values with the number of epochs

It is evident from Figs. 4.39 and 4.40 that, among the settings tried, 50 epochs gave the best accuracy and the least loss, letting our model train efficiently. We therefore adopt 50 epochs as the benchmark for further training to improve model accuracy.

While modeling any architecture, one must take utmost caution while selecting the optimal number of epochs for model training based on the size of the dataset and the quality of images. For instance, if the size of the dataset is small, then a large number of epochs will cause our model to overfit. Similarly, if a small number of epochs are employed for a large dataset, the model will be under-fitted. Both of these extreme conditions can lead to an increase in the loss function, degrading the model’s efficiency.

The performance of our model is then amplified with a regularization technique: a dropout layer, as mentioned earlier in Sect. 4.4.2. During training, the dropout layer randomly switches off a fraction of the neurons so that the model does not overfit.
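A minimal sketch of a CNN block with such a dropout layer, in the spirit of the architecture of Sect. 4.4.2; the layer sizes and input shape here are illustrative, not the chapter's exact configuration:

```python
from tensorflow.keras import layers, models

def build_cnn(num_classes, dropout_rate=0.25, input_shape=(128, 128, 3)):
    """Illustrative CNN: conv blocks -> dropout -> softmax classifier."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation='relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation='relu'),
        # Dropout randomly zeroes this fraction of activations each step
        layers.Dropout(dropout_rate),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['sparse_categorical_accuracy'])
    return model
```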

From Fig. 4.41 we can infer that, as the dropout percentage decreases, model accuracy increases to some extent, improving learning while still reducing the probability of overfitting. Figure 4.42 shows that the loss function also decreases with a lower dropout percentage, making the model less erroneous.

Fig. 4.41
A grouped bar graph of accuracy in percentage versus types of crop plots 0.25, 0.50, and 0.75 dropouts for apple, maize, and wheat. Maize has a higher percentage of accuracy for all three types of dropouts.

Bar graph representing the varying accuracy values with different dropout values

Fig. 4.42
A grouped bar graph of loss value versus types of crop plots 0.25, 0.50, and 0.75 dropouts for apple, maize, and wheat. Apple and wheat have higher loss values for dropouts compared to maize.

Bar graph representing the varying loss values with different dropout values

The percentage value of the dropout layer must be chosen carefully: if a very high dropout percentage is used, most of the neurons are switched off and the model cannot learn properly, while too low a dropout percentage may lead to overfitting. While tuning our algorithm we observed that a 15% dropout gave reasonably good accuracy for some crops, but the validation curves were inconsistent, indicating that the learning process was not very efficient. From Figs. 4.41 and 4.42 we obtained an optimum dropout range: the best results, with high accuracy and low loss, were produced with the dropout percentage between 25% and 50%. Values below this range risk overfitting, and values above 50% lead to a poor learning curve. Alvin and Dae-Ki [42] explain the different dropout regularization techniques used in neural networks.

Here sparse categorical accuracy and sparse categorical cross-entropy are used to evaluate the model's performance and its efficiency in predicting new data (Figs. 4.43, 4.44 and 4.45).

Fig. 4.43
2 multi-line graphs of model accuracy and model loss for apples. The model accuracy graph of accuracy versus epoch plots train and validation lines that ascend in a concave downward manner with fluctuations. The model loss graph of loss versus epoch plots train and validation lines that descend in a concave up manner with fluctuations.

Sparse validation performance curve for crop ‘Apple’ with dropout = 0.25

Fig. 4.44
2 multi-line graphs of model accuracy and model loss for maize. The model accuracy graph of accuracy versus epoch plots train and validation lines that ascend in a concave downward manner with fluctuations. The model loss graph of loss versus epoch plots train and validation lines that descend in a concave up manner with fluctuations.

Sparse validation performance curve for crop ‘Maize’ with dropout = 0.25

Fig. 4.45
2 multi-line graphs of model accuracy and model loss for wheat. The model accuracy graph of accuracy versus epoch plots train and validation lines that ascend in a concave downward manner with fluctuations. The model loss graph of loss versus epoch plots train and validation lines that descend in a concave up manner with fluctuations.

Sparse validation performance curve for crop ‘Wheat’ with dropout = 0.5

The graphs compare the learning curves on validation and training data: the gap between them is small and the curves nearly superimpose, which aligns with the behavior expected of a well-trained CNN. Table 4.10 contains the final accuracy of the described architecture.

Table 4.10 Accuracy of defined crops

Crop | Accuracy (%)
Wheat | 98.70
Maize | 98.82
Apple | 98.93

Using our CNN architecture we obtained an accuracy of 98.70% for wheat disease prediction, which is 0.1% higher than the best model used in [23].

The overall classifier accuracy of our model is 98.82% for the common diseases affecting the maize crop, higher than that achieved by the models used in [21, 22]. In addition, we achieved a mean accuracy of 98.93% for classifying apple infections, greater than the mean accuracy obtained with the model defined in [22].

4.5.3 Fertilizer Prediction

As mentioned earlier in Sects. 4.4.2 and 4.4.3, Adam is the best-known optimizer used in CNN architectures. Adam stands for Adaptive Moment Estimation: it estimates the first and second moments of the gradient and uses them to adapt each parameter's update, combining gradient descent with momentum and the RMSProp algorithm.
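For reference, the standard Adam update for a parameter vector $\theta$ with gradient $g_t$ at step $t$ is:

$$
\begin{aligned}
m_t &= \beta_1\, m_{t-1} + (1-\beta_1)\, g_t \\
v_t &= \beta_2\, v_{t-1} + (1-\beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t} \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$

Here $m_t$ is the momentum-style first-moment estimate, $v_t$ the RMSProp-style second-moment estimate, and $\alpha$ the learning rate; the common defaults are $\beta_1 = 0.9$ and $\beta_2 = 0.999$.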

Figures 4.46 and 4.47 compare the validation accuracy and validation loss of the optimizers tried (Adam, RMSProp and SGD), and show why Adam is considered the best optimizer.

Fig. 4.46
A bar graph of accuracy versus optimizers used for validation accuracy plots Adam, RMSProp, and SGD. Adam has the highest accuracy of 95, whereas RMSProp has the lowest accuracy of 80.

Validation accuracy of various optimizers

Fig. 4.47
A bar graph of loss value versus optimizers used for validation loss plots Adam, RMSProp, and SGD. RMSProp has the highest loss value of 0.691, whereas Adam has the lowest loss value of 0.2055.

Validation loss values of various optimizers

As depicted in Fig. 4.46, the Adam optimizer achieves the best accuracy; the accuracy used here is sparse categorical accuracy. Similarly, Fig. 4.47 shows that Adam has the lowest loss value among the optimizers, and the lower the loss, the more efficient the model. RMSProp gives the maximum loss, meaning the predictions made with that optimizer are farthest from the ground truths. Since Adam gives the least loss and the greatest accuracy, it is the optimizer we use.
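A hedged sketch of how such a comparison can be run; X_train, y_train, X_val, y_val and num_classes are placeholders for the prepared fertilizer data, and the network width is illustrative:

```python
from tensorflow.keras import layers, models

results = {}
for name in ['adam', 'rmsprop', 'sgd']:
    # Fresh model per optimizer so every run starts from scratch
    model = models.Sequential([
        layers.Input(shape=(X_train.shape[1],)),
        layers.Dense(64, activation='relu'),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer=name,
                  loss='sparse_categorical_crossentropy',
                  metrics=['sparse_categorical_accuracy'])
    hist = model.fit(X_train, y_train, epochs=100,
                     validation_data=(X_val, y_val), verbose=0)
    results[name] = (hist.history['val_sparse_categorical_accuracy'][-1],
                     hist.history['val_loss'][-1])

for name, (acc, loss) in results.items():
    print(f'{name}: val_acc={acc:.4f}, val_loss={loss:.4f}')
```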

Fewer epochs help the neural network generalize better on unseen data; on the other hand, multiple passes over the data let the network revisit earlier samples and readjust its parameters, so that the model is not biased towards the last few data points seen during training.

An optimal value must therefore be chosen, since the number of epochs affects the distance between predicted and actual values. As evident from Fig. 4.48, the model trained for 500 epochs reaches the lowest loss. When the epochs are reduced to 250, the loss increases; similarly, when the epochs are increased to 1000, the loss increases significantly. Both extremes are detrimental to the prediction model, and the minimum loss for our model is obtained at 500 epochs.

Fig. 4.48
A bar graph of loss values versus the number of epochs for validation loss plots for 250, 500, and 1000 epochs. 1000 epochs have the highest loss value of 0.3306, whereas 500 epochs have the lowest loss value of 0.1116.

Validation loss values with changing Epochs

Our dataset has 99 rows, for which the optimal range of epochs was 350 to 700; outside this range the model can become erroneous. The best accuracy was obtained at around 500 epochs. With fewer than 350 epochs there is a risk of underfitting, and with more than 1000 epochs a risk of overfitting. An optimal value of 500 epochs therefore gave the best accuracy, as depicted in Fig. 4.49.
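The epoch sweep itself can be expressed as a loop over the candidate values, recording the final validation loss of each run; make_model is a hypothetical constructor returning a fresh, compiled model like the one in the optimizer sketch above:

```python
val_losses = {}
for epochs in [250, 500, 1000]:
    model = make_model()  # hypothetical: fresh, identically configured model per run
    hist = model.fit(X_train, y_train, epochs=epochs,
                     validation_data=(X_val, y_val), verbose=0)
    val_losses[epochs] = hist.history['val_loss'][-1]

print(val_losses)  # the chapter reports the minimum (0.1116) at 500 epochs
```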

Fig. 4.49
A bar graph of accuracy versus the number of epochs for accuracy plots for 250, 500, and 1000 epochs. 500 epochs have the highest accuracy of 95, whereas 1000 epochs have the lowest accuracy of 88.

Accuracy with changing Epochs

Following the detailed heatmap analysis in Sect. 4.5.1, we observe in Fig. 4.50 that the attributes are phosphorus, potassium, nitrogen, moisture, humidity and temperature. Among these, temperature and humidity are positively correlated, while phosphorus and nitrogen are negatively correlated.
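A minimal sketch of how such a correlation heatmap can be produced with seaborn, assuming a DataFrame fert_df whose columns match the attributes listed above (the name fert_df is illustrative):

```python
import seaborn as sns
import matplotlib.pyplot as plt

cols = ['temperature', 'humidity', 'moisture',
        'nitrogen', 'potassium', 'phosphorous']
corr = fert_df[cols].corr()

# annot=True writes each correlation coefficient inside its cell
sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
plt.tight_layout()
plt.show()
```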

Fig. 4.50
A heatmap plots the correlation between temperature, humidity, moisture, nitrogen, potassium, and phosphorous on both the y and x axes with the help of a color gradient scale.

Heatmap showing the correlation between attributes in fertilizer prediction

As discussed earlier, we set the optimal number of training epochs to 500. Even so, while the accuracy was good, the validation curve showed a slight tendency to overfit, which we regularized with a dropout layer. After iterating our ANN model through different dropout percentages, we found that a dropout rate of 25% achieved good accuracy and eliminated the overfitting (Fig. 4.51).
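Putting the chosen settings together (Adam, sparse categorical cross-entropy, dropout of 0.25, 500 epochs), a hedged sketch of a fertilizer-prediction ANN follows; the layer widths are illustrative rather than the chapter's exact architecture:

```python
from tensorflow.keras import layers, models

def build_fertilizer_ann(num_features, num_classes, dropout_rate=0.25):
    """Illustrative ANN for fertilizer classification."""
    model = models.Sequential([
        layers.Input(shape=(num_features,)),
        layers.Dense(64, activation='relu'),
        layers.Dropout(dropout_rate),   # the regularizer discussed above
        layers.Dense(32, activation='relu'),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['sparse_categorical_accuracy'])
    return model

# Hypothetical usage: six attributes as in Fig. 4.50, placeholder data
# model = build_fertilizer_ann(num_features=6, num_classes=num_fertilizers)
# history = model.fit(X_train, y_train, epochs=500,
#                     validation_data=(X_val, y_val))
```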

Fig. 4.51
A multi-line accuracy graph of accuracy versus epochs plots train and validation lines that follow an upward, heavily fluctuating trend.

Graph between epochs and accuracy for train and validation data

Figure 4.52 shows the accuracy and loss values obtained at various dropout percentages with the epochs fixed at the optimal value of 500. Maximum accuracy and minimum loss were obtained with a dropout value of 0.25, so this is the setting we adopted.

Fig. 4.52
A grouped bar graph of accuracy and loss value versus dropout value for changing dropout values with a fixed epoch of 500. It plots the accuracy and loss value of 4 dropout values. The dropout value of 0.25 has the highest accuracy with 96.02 % and the lowest loss value of 0.1081 compared to other values.

Comparing accuracy and loss function for varying dropout values

The optimal loss is obtained within a range of 200–500 epochs; values within this range return the minimum loss. As Fig. 4.53 shows, the curves follow a decreasing trend, indicating that our model is well trained. Furthermore, the gap between the training and validation loss curves is small and the curves nearly superimpose, showing that our proposed model agrees with the behavior expected of a well-trained network.

Fig. 4.53
A multi-line loss graph of loss versus epochs plots train and validation lines that descend as S-shaped curves.

Variation curve between epochs and loss function

4.6 Conclusions

Agriculture contributes a substantial percentage of the Indian economy, yet due to a lack of updated technology the agricultural sector still suffers from various problems. This chapter proposes a machine learning kit containing different data mining techniques for crop prediction and farm management. Automating such processes will increase the productivity of a farm to a large extent. Techniques like CNN, ANN and the Random Forest classifier will aid farmers in understanding what is best for their crops, allowing their profits to increase. The various alterations made to the hyper-parameters of these techniques helped us pick the optimum parameters for our models. Accuracy close to 95% was achieved in the first stage of our model, crop prediction, using the Random Forest classifier; among the given input parameters, the feature-importance analysis showed criteria like rainfall and humidity to be the most important. Secondly, for disease detection in crops, our CNN model with 4 hidden layers and a softmax activation function gave an accuracy of 98%. Lastly, our ANN model for predicting fertilizers suitable for any affected soil gave an accuracy of 95%, using the Adam optimizer and a regularization dropout value of 0.25. Also, heatmaps showed parameters like temperature and humidity to be positively correlated.

Machine learning and deep learning have been applied across industries, and if the agriculture sector adopts them quickly, we can expect a boom in agro-economic conditions. Precision and intelligent agriculture will soon be a research hotspot, and with continuing advances in AI and machine learning, the long wait for an intelligent farming solution will soon be over.

A further enhancement of this model is to use IoT devices to gain on-field values of soil moisture and NPK directly, and to serve them through cloud technologies where the ML models can be hosted. This will help in understanding high-dimensional patterns in seasonal climatic changes, which can help predict the effects of drought and other severe climatic repercussions.