Introduction

Disease detection is crucial in agriculture because disease is a natural part of plant life. Bananas are a principal food crop in Africa, Asia, and Latin America, accounting for 13% of global production, which clearly indicates the fruit's importance for domestic markets and food security [1]. Smallholder farmers, who operate 85% of the world's farms, confront numerous abiotic and biotic constraints [2]. Identifying banana plant diseases through image processing [3] is efficient and inexpensive, and it allows farmers to monitor plant health without manual inspection.

The most common diseases that occur in banana leaves are Panama disease, Moko disease, Sigatoka disease, black spot, banana bunchy top, infectious chlorosis, banana streak virus, and banana bract mosaic virus disease [4]. The fungus Mycosphaerella fijiensis causes black Sigatoka [5]: small chlorotic spots appear first, followed by thin dark stripes delimited by the leaf veins. Leaf speckle [6] is also caused by a fungus; its early signs are small light-brown spots that grow over time and eventually turn black. If a disease is detected early enough, however, the plant can still be treated and saved [7, 8].

Artificial intelligence (AI) [9,10,11,12,13,14] has been used to identify plant diseases from their visible symptoms, while deep learning models have been used to support the corresponding human decisions. Farmers can use AI-enabled mobile applications to receive disease alerts that help them diagnose problems and prevent crop damage. Deep learning [15] is an image processing approach that has been applied to object recognition and has increased classification accuracy for various crop diseases. Transfer learning, in which a pre-trained model is adapted to a new task, is one of the most widely used deep learning techniques [16, 17].

Deep transfer learning (DTL) was developed as a novel architecture for image processing and prediction with improved accuracy, and it has considerable potential for detecting crop disease [18, 19]. However, many Convolutional Neural Network (CNN) architectures have been trained on simple images showing a single leaf against an uncluttered background; the accuracy obtained on such datasets is therefore not reliable for images with intricate backgrounds and several leaves in the same frame. Until now, plant disease identification systems have mostly focused on identifying leaf symptoms [20, 21]. This paper focuses on identifying banana pest and disease signs in a dataset with complex backgrounds using a Convolutional Recurrent Neural Network–Region-Based Convolutional Neural Network (CRNN–RCNN) architecture. The major contributions of this paper are as follows:

  • A histogram pixel localization technique with a median filter is applied to minimize the overall training time of the CRNN network.

  • To minimize the data complexity and simplify the disease identification process, a region-based edge normalization technique is used for segmentation.

Review of Related Works

Lee et al. [22] designed a CNN that detects plant disease without the need for manual assistance. Grinblat et al. [23] developed a comparatively simple but effective neural network for identifying three legume species from the morphological vein patterns of their leaves. Mohanty et al. [24] compared two commonly deployed CNN architectures for identifying 26 plant diseases using an expanded leaf database covering 14 plant species. Their results are very efficient, with an automated detection rate of 99.35%; their main restriction, however, is that each image contains a single leaf, which suits simulation settings but not real-time implementation on agricultural farms.

Sladojevic et al. [25] followed a similar approach for identifying plant diseases from leaf images; their reported accuracy ranges between 91 and 98%, depending on the data used for testing. Pawara et al. [26] examined several conventional pattern recognition techniques alongside CNN designs.

Some researchers have succeeded in linking texture, color, and shape characteristics to the presence of disease, although such methods can only focus on one diseased leaf at a time and fail when the early symptoms are not yet visible [27]. Furthermore, existing datasets do not account for cases in which more than one disease is present in the same plant; as a result, the prototypes that have been trained and tested identify only the most visible disease, which is not necessarily the one most critical to the crop [28, 29].

Proposed Approach

In existing techniques, neural networks have been employed to identify plant diseases and to recognize textural features. As shown in Sect. 2, the use of deep learning architectures for identifying plant leaf diseases is expanding in the literature. However, gaps remain in the identification of banana leaf disease with deep learning architectures. The importance of efficient deep learning models with fewer parameters that can be trained rapidly without compromising performance cannot be overstated. The selection of a dataset is a common problem in the literature [16, 24,25,26].

The majority of the proposed approaches use controlled datasets, i.e., images shot under ideal conditions in a controlled environment. In practice, however, producing images of adequate quality and resolution for the detection and classification of banana leaf diseases is difficult. In this work, a CRNN–RCNN architecture is employed to identify and classify banana leaf diseases. Unlike previous studies, we use a dataset comprising true field data from an uncontrolled banana farm. The proposed technique aims to detect and classify banana diseases at an early stage, which can prevent them from spreading to nearby plants. The outline of the proposed methodology is shown in Fig. 1.

Fig. 1 Flow diagram of the proposed approach

Pre-Processing of Input Training Image

Histogram pixel localization is used to enhance contrast, and image noise is then removed with an integrated median filter. The histogram pixel localization algorithm, which is based on probability theory, establishes a gray-level mapping for the pixels. The gray function and its histogram transform are examined in parallel, levelling the grayscale distribution and thereby producing the image enhancement function.
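The paper does not give concrete parameters for this step, so the following is only a minimal sketch, assuming OpenCV is available: global histogram equalization stands in for the histogram pixel localization mapping, and a 3 × 3 median filter (an assumed kernel size) removes impulse noise.

```python
# Minimal pre-processing sketch (assumptions: OpenCV available; global
# histogram equalization approximates the histogram pixel localization
# mapping; the 3x3 median kernel size is an assumed default).
import cv2


def preprocess_leaf_image(path: str):
    """Contrast enhancement followed by median filtering."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    equalized = cv2.equalizeHist(gray)       # gray-level histogram mapping
    denoised = cv2.medianBlur(equalized, 3)  # remove salt-and-pepper noise
    return denoised
```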

Segmentation of the Pre-Processed Training Image

The segmentation using edge normalization is performed by the following steps (a code sketch of the main steps is given after Eq. (1)):

  a. On the basis of a certain threshold range, the image is transformed into a binary image.

  b. Image noise is minimized with a 5 × 5 Gaussian filter, which smooths the variation between the edge and non-edge areas.

  c. The gradient magnitude and edge direction are evaluated using the Sobel operator.

  d. The edge directions are grouped into the angles \(0^{\circ}, 45^{\circ}, 90^{\circ}, 135^{\circ}\).

  e. Pixels that do not have an optimal value along their edge direction are suppressed and replaced using the values of the two adjacent pixels along that direction.

  f. Edges that are not connected to the object's lines are discarded using two threshold rates \(T_1\) and \(T_2\). This condition is formulated as:

    $$f\left( x, y \right) = \begin{cases} 0, & f\left( x, y \right) < T_{1} \\ 128, & T_{1} \le f\left( x, y \right) \le T_{2} \\ 255, & f\left( x, y \right) > T_{2} \end{cases}$$
    (1)
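A minimal sketch of steps (b), (c), and (f) is shown below, assuming OpenCV and NumPy; the thresholds \(T_1\) and \(T_2\) are illustrative values only, and the directional non-maximum suppression of steps (d) and (e) is omitted for brevity.

```python
# Sketch of the edge-normalization segmentation (assumptions: OpenCV/NumPy;
# T1 and T2 are illustrative, not values from the paper; steps (d)-(e) are
# omitted for brevity).
import cv2
import numpy as np


def segment_edges(gray: np.ndarray, t1: float = 50.0, t2: float = 150.0) -> np.ndarray:
    blurred = cv2.GaussianBlur(gray, (5, 5), 0)          # step (b): 5x5 Gaussian
    gx = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)   # step (c): Sobel gradients
    gy = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)
    magnitude = cv2.magnitude(gx, gy)

    # Step (f), Eq. (1): suppress weak responses (0), keep intermediate
    # responses as candidates (128), and keep strong edges (255).
    out = np.zeros_like(gray, dtype=np.uint8)
    out[(magnitude >= t1) & (magnitude <= t2)] = 128
    out[magnitude > t2] = 255
    return out
```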

Extraction of Features from the Segmented Image

Features are extracted from the segmented image using Gabor-based binary patterns integrated with a convolutional recurrent neural network. The 2D Gabor wavelets are defined as follows:

$$\psi_{v,\mu}\left( z \right) = \frac{\left\| k_{v,\mu} \right\|^{2}}{\sigma^{2}}\, e^{ - \frac{\left\| k_{v,\mu} \right\|^{2} \left\| z \right\|^{2}}{2\sigma^{2}}} \left[ e^{i k_{v,\mu} \cdot z} - e^{ - \frac{\sigma^{2}}{2}} \right]$$
(2)

where \(v\) and \(\mu\) define the scale and orientation of the Gabor wavelets, \(z = (x, y)\), \(\left\| \cdot \right\|\) denotes the norm operator, and the wave vector is \(k_{v,\mu} = k_{v} e^{i\phi_{\mu}}\), where \(k_{v} = k_{\max} / \lambda^{v}\), \(\phi_{\mu}\) is the orientation (alignment) constant, and \(\lambda\) is the spacing factor between the wavelets in the frequency domain. The banana leaf image is Gabor-transformed by convolving the image with the Gabor wavelets.
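As a rough illustration of how such a filter bank can be applied, the sketch below uses OpenCV's real-valued cv2.getGaborKernel as a stand-in for the complex wavelet of Eq. (2); the number of scales and orientations, the kernel parameters, and the summary statistics are assumptions rather than values from the paper.

```python
# Gabor filter-bank sketch (assumptions: real-valued OpenCV kernels
# approximate Eq. (2); 5 scales x 8 orientations and all kernel parameters
# are illustrative choices).
import cv2
import numpy as np


def gabor_features(gray: np.ndarray, scales: int = 5, orientations: int = 8) -> np.ndarray:
    responses = []
    for v in range(scales):
        for mu in range(orientations):
            wavelength = 4.0 * (2 ** (v / 2.0))   # wavelength grows with scale v
            theta = np.pi * mu / orientations     # orientation phi_mu
            kernel = cv2.getGaborKernel(
                ksize=(31, 31), sigma=4.0, theta=theta,
                lambd=wavelength, gamma=0.5, psi=0.0)
            filtered = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kernel)
            responses.append(filtered.mean())     # simple per-filter summaries
            responses.append(filtered.var())
    return np.asarray(responses, dtype=np.float32)
```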

Region-Based Convolutional Neural Network Classification

The training of the RCNN is initiated (Fig. 2) from a network pre-trained for ImageNet classification, with the source application then trained on our dataset. For region selection, a sliding-window brute-force search strategy selects regions of fixed shape and size; the input image is then subjected to a selective search operation that retains several high-quality proposed regions. These proposed regions are typically chosen with varying scales, sizes, and shapes.

Fig. 2 Proposed RCNN architecture

Each proposed region is labeled with its category and ground-truth bounding box. Finding a perfect window for every object in the image, on the other hand, is computationally expensive and produces an excessive number of windows; therefore, a fixed set of window sizes and templates is used to slide over the images in this work. Several proposed regions per image are used to enhance speed, each requiring a CNN evaluation during disease diagnosis.
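As an illustration of the region-proposal step, the sketch below uses the selective search implementation shipped with opencv-contrib-python; the cap on the number of proposals is an assumed value, and the paper's actual proposal parameters may differ.

```python
# Region-proposal sketch (assumptions: opencv-contrib-python provides the
# selective-search implementation; the 200-proposal cap is illustrative).
import cv2


def propose_regions(bgr_image, max_proposals: int = 200):
    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(bgr_image)
    ss.switchToSelectiveSearchFast()   # trades some recall for speed
    rects = ss.process()               # array of (x, y, w, h) candidate boxes
    return rects[:max_proposals]
```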

Ranking Matrix

A ranking matrix \(f \in F\) represents the aggregate value for each output layer of the CRNN model and estimates a summary value for each instance \(x_{i}^{\prime} \in X\). It is therefore possible to establish a preference relation between the fully connected output pixel values of the model using a ranking function from the set \(F\), verified as:

$$x_{i}^{\prime} > x_{j}^{\prime} \Leftrightarrow f\left( x_{i}^{\prime} \right) > f\left( x_{j}^{\prime} \right)$$
(3)

Here, the ranking is determined from the indexing correlation of the pixel coordinates \(\left( x_{i}^{\prime}, x_{j}^{\prime} \right)\), so the learning problem is interpreted as a ranking problem. This formulation is widely used to classify pairs of pixel values \(\left( x_{i}^{\prime}, x_{j}^{\prime} \right)\) rather than classifying individual best or worst values. The relation established for a pair \(\left( x_{i}^{\prime}, x_{j}^{\prime} \right)\) is represented by a new vector using the following equation:

$$\left( x_{i}^{\prime} - x_{j}^{\prime}, \; z = \begin{cases} +1, & y_{i} > y_{j} \\ -1, & y_{j} > y_{i} \end{cases} \right)$$
(4)

Accordingly, two classes are specified for each pair of images \(\left( x_{i}^{\prime}, x_{j}^{\prime} \right)\): the pair is positive, labeled (+1), when \(x_{i}^{\prime}\) is correctly ranked ahead of \(x_{j}^{\prime}\), and negative, labeled (−1), when the reverse ordering is true. The equation is composed of instances \(x_{i}^{\prime} \in X\), where every pixel value belongs to the original instances; it thereby forms a new training data set S′ furnished with the new labels. In this paper, Bayesian optimization is used for hyperparameter tuning of the CRNN and RCNN; it selects the next hyperparameter value based on past iterations, taking all previous iterations into account rather than only the most recent one. In addition, the LDA technique is used to optimize the objective function, reducing the within-class scatter while maximizing the between-class scatter and thereby improving classification accuracy.
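A minimal sketch of how the pairwise samples of Eq. (4) can be constructed is given below; the feature vectors x and scores y are placeholders for the CRNN output values described above, not names from the paper.

```python
# Pairwise-ranking sketch for Eq. (4) (assumptions: NumPy arrays; x holds
# placeholder feature vectors and y their associated scores).
import numpy as np


def build_ranking_pairs(x: np.ndarray, y: np.ndarray):
    """Turn scored instances into difference vectors labeled +1 / -1."""
    pairs, labels = [], []
    for i in range(len(x)):
        for j in range(len(x)):
            if i == j or y[i] == y[j]:
                continue                   # ties carry no ranking signal
            pairs.append(x[i] - x[j])      # new instance x_i' - x_j'
            labels.append(1 if y[i] > y[j] else -1)
    return np.asarray(pairs), np.asarray(labels)
```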

Performance Analysis

The simulation results for banana leaf disease detection on the input dataset are discussed here. The banana dataset used in this work was developed by capturing photographs of both normal and abnormal (diseased) leaves on banana plantation farms in India, supplemented with some images from the web. A total of 1875 photographs were acquired, with 70% used for training, 20% for validation, and 10% for testing.
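One simple way to reproduce this 70/20/10 split, assuming scikit-learn and parallel lists of image paths and labels, is sketched below; the random seed and stratification are assumptions rather than details from the paper.

```python
# Dataset-split sketch (assumptions: scikit-learn available; stratification
# and the fixed seed are illustrative choices).
from sklearn.model_selection import train_test_split


def split_dataset(image_paths, labels, seed: int = 42):
    """70% train, 20% validation, 10% test, stratified by class."""
    x_train, x_rest, y_train, y_rest = train_test_split(
        image_paths, labels, train_size=0.70, stratify=labels, random_state=seed)
    # The remaining 30% is split 2:1 -> 20% validation, 10% test overall.
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, train_size=2 / 3, stratify=y_rest, random_state=seed)
    return (x_train, y_train), (x_val, y_val), (x_test, y_test)
```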

The image is enhanced and segmented using a region-based edge normalization approach, as shown in Fig. 3. As demonstrated in Fig. 4, feature extraction for the leaf is done with Gabor-based binary patterns and CRNN.

Fig. 3 Segmentation using region-based edge normalization

Fig. 4 Extracted features using Gabor-based binary patterns with convolutional recurrent neural networks

The overall comparison of accuracy, precision, recall, and specificity for the proposed and existing techniques is presented in Table 1. For precision, CNN gives 81.91%, DCNN 89.19%, KNN 78.45%, SVM 95.96%, and the proposed CRNN–RCNN 97.7%; among these, SVM and the proposed CRNN–RCNN give the best values. The proposed CRNN–RCNN model achieves an accuracy of 98% on the banana dataset, which is higher than the accuracy obtained with CNN (87.6%), DCNN (88.9%), KNN (79.56%), and SVM (92.63%).

Table 1 Comparison between proposed and existing techniques

The confusion matrix for the proposed classification scheme, built from the obtained output, is illustrated in Fig. 5. The per-disease (per-class) accuracy is derived from the confusion matrix, which also gives a quantitative picture of where the architecture misclassifies. The classifier results are inspected visually to determine which classes and features are most important. When the number of misclassifications between two particular classes is high, more training data must be collected for those classes so that the convolutional architecture can discriminate between them.
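For reference, the per-class metrics reported here can be derived from a confusion matrix as sketched below, assuming the usual convention that rows index true classes and columns index predicted classes.

```python
# Per-class metrics from a confusion matrix (assumptions: NumPy; cm[i, j]
# counts samples of true class i predicted as class j).
import numpy as np


def per_class_metrics(cm: np.ndarray):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    tn = cm.sum() - (tp + fp + fn)
    return {
        "precision":   tp / (tp + fp),
        "recall":      tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy":    (tp + tn) / cm.sum(),
    }
```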

Fig. 5 (a, b) Confusion matrix for classification using RCNN

Figure 6 shows a significant difference between the training and validation losses, indicating that the network tends to memorize the training data and therefore achieves higher accuracy on it than on the validation set.

Fig. 6 (a, b) Curves for training and validation accuracy and loss

Conclusion

Image processing techniques are used to recognize and classify images of the banana plant in order to detect disease with high accuracy. For banana plant disease detection and classification, the proposed technique can replace manual inspection, as it is less time-consuming and more accurate. It can help farmers diagnose diseases more accurately and detect them earlier, before they spread to nearby plants. The diseased image is segmented using a median filter and region-based edge normalization; the proposed system then performs feature extraction with the CRNN and optimal classification with the RCNN, potentially leading to a higher crop yield. As a result, two banana leaf diseases have been identified, and future research may extend the disease detection methods to the banana plant's fruit and stem. Compared with the CNN, DCNN, KNN, and SVM techniques, the proposed methodology achieves accuracy, precision, recall, and specificity values of 98%, 97.7%, 97.7%, and 98.69%, respectively.