1 Introduction

A number of problems are encountered when developing image-based plant disease detection systems. These include symptoms that share common visual attributes, different disorders with similar symptoms, and the simultaneous occurrence of multiple disorders in a single plant [1, 2]. External factors, such as the presence of parasites in the image and lighting variations, complicate the problem further. Studies based on image processing and traditional machine learning methods have achieved significant successes, but the desired success rates have not yet been reached. In recent years, important developments in deep learning have led to its use in plant disease detection. Some recent deep learning based studies on plant disease detection are summarized below.

Mohanty et al. [3] used a method based on a convolutional neural network (CNN) to develop a plant disease detection system and achieved an accuracy of 99.35%. Although the model gave good results, it performed poorly when tested on image sets obtained under different conditions. Sladojevic et al. [4] used a CNN model to detect plant diseases; in that study, 14 different diseases were diagnosed with an accuracy of 96.3%. Similarly, Dyrmann et al. [5] used deep learning architectures to classify plant species. Using a CNN as the model, they performed experiments on a total of 10,413 images covering 22 weed and crop types and obtained a classification accuracy of 86.2%. The system classified some plant species with low accuracy, which is thought to be due to the small number of training images for these species. Barbedo [2] investigated how the size and diversity of data sets affect the success of deep learning methods applied to plant pathology. The results showed that the technical limitations of plant disease classification have largely been overcome, but that training on data sets consisting of a small number of images prevents the effective use of such technologies. Lu et al. [6] developed a CNN-based method to diagnose rice diseases and obtained a classification accuracy of 95.48% on 10 common rice diseases.

Apart from the CNN algorithm, there are also studies using different deep learning models. Maniyar and Budihal [7] developed a deep learning method based on generative adversarial networks (GANs) for disease detection in plant leaves. High accuracy values were obtained with the proposed model. To address the problem of disease detection in plants, Jadon [8] proposed a novel metrics-based few-shot learning SSM network, which consists of stacked Siamese and matching network components. Experiments were performed on two data sets: the mini leaf diseases data set and the sugarcane diseases data set. The proposed approach reached an accuracy of 92.7% on the mini leaf data set and 94.3% on the sugarcane data set. Verma et al. [9] used capsule networks (CapsNet) to classify potato diseases and compared them with several popular pre-trained CNN models, such as ResNet18, VGG16, and GoogLeNet, to test the performance of their method. With 91.83% accuracy, CapsNet achieved performance comparable to state-of-the-art CNN models. In [10], the authors proposed an optimized capsule network model for banana leaf disease classification. The results again showed that the CapsNet algorithm achieves performance comparable to state-of-the-art CNN models.

There are also studies in the literature that use ensemble learning based methods to increase the success of existing algorithms. Turkoglu et al. [11] developed a hybrid of multi-model LSTM and CNNs for apple disease detection. Their experimental results show that the proposed ensemble learning-based method outperforms pre-trained deep learning models. Darwish et al. [12] developed an ensemble model of two pre-trained CNNs for the detection of apple diseases. Their experimental results demonstrate the effectiveness of the proposed diagnostic approach. In contrast to these deep learning and ensemble learning-based studies, there are also studies in which classical machine learning methods and ensemble learning methods are used together; see [13, 14].

When the literature is examined, it is seen that CNN-based algorithms are frequently used for the detection of plant diseases. However, the natural complexity of plant images still limits the performance of many CNN models. In addition, CNNs do not capture the hierarchy between the parts that form an object (for example, a face consists of eyes, a mouth, a nose, etc.). Objects need to be represented with different parameters to be recognized or classified independently of location, orientation and angle. Viewpoint-independent object recognition is possible thanks to the dynamic connections between capsules in a capsule network, which can be considered a customized convolutional neural network architecture. In this study, an efficient method called multi-channel capsule network ensemble (MCCNE), based on capsule networks and ensemble learning, is proposed for plant disease detection. The proposed method has 5 channels, each with a different information source. First, 5 different images are obtained by applying 5 different methods to the image of the plant leaf. Then, each of these images is presented as input to one of the 5 capsule networks. In the last stage, ensemble learning is applied to the outputs of these 5 capsule networks, with the majority voting algorithm determining the final output. The success of the method lies in combining capsule networks, which are strong classifiers fed by different sources of information, with ensemble learning, which combines the strengths of these different capsule networks. The major contributions of the proposed method are:

  • In this study, a novel approach based on deep learning has been developed for the determination of plant diseases with high accuracies.

  • The information source of the five channels is different. So, the developed system is multi-source. This provides a powerful architecture that is fed from five different sources, not from a single source.

  • To the best of our knowledge, no attempt has been made to utilize the combination of capsule networks and ensemble learning for plant disease detection.

  • The developed method has been compared with several CNN based state-of-the-art classification methods and better results have been obtained.

The article is organized as follows: Sect. 2 describes the methods used in this study. Section 3 describes the experimental setup, including the dataset, parameter settings and evaluation metrics. Section 4 contains the experimental results, comparative analyses and discussion. Section 5 concludes the study.

2 Methods

2.1 Capsule networks

Capsule networks and dynamic routing algorithms have been proposed as a solution to problems for which convolutional neural network models are inadequate [16]. Through routing-by-agreement, the network is intended to learn the properties that represent an object, such as thickness, scale and shift, even when its position, orientation, pose and angle change.

The steps by which a capsule network routes information from one layer to the next using dynamic routing are shown in Fig. 1. Lower-level capsules predict the output of higher-level capsules, and a lower-level capsule whose prediction agrees more strongly with a higher-level capsule becomes more strongly connected to that parent through a positive feedback loop.

Fig. 1 Illustration of CapsNet operator [16]

Let \({u}_{i}\) be the output of the \(i\)th capsule. Its prediction for the \(j\)th parent capsule is:

$$\hat{u}_{{j|i}} = W_{{ij}} \cdot u_{i}$$
(1)

Here, \({W}_{ij}\) is a weight matrix. The agreement between capsules is expressed through a coupling coefficient \({c}_{ij}\), which multiplies each prediction. The \({c}_{ij}\) values are calculated using the Softmax function given in Eq. (2).

$$c_{{ij}} = \frac{{\exp \left( {b_{{ij}} } \right)}}{{\mathop \sum \nolimits_{k} \exp \left( {b_{{ik}} } \right)}}$$
(2)

\({b}_{ij}\) and \({b}_{ik}\) are the log prior probabilities between two coupled capsules. Equation (3) calculates the input vector for the higher level \(j\) capsule.

$$s_{j} = \mathop \sum \limits_{i} c_{{ij}} \hat{u}_{{j|i}}$$
(3)

The nonlinear squash activation function ensures that long vectors are scaled to a length close to one and short vectors are shrunk to almost zero. The squash function is calculated as shown in Eq. (4).

$$v_{j} = \frac{{\left\| {s_{j} } \right\|^{2} }}{{1 + \left\| {s_{j} } \right\|^{2} }}\frac{{s_{j} }}{{\left\| {s_{j} } \right\|}}$$
(4)

Equations (1)–(4) form a whole routing procedure to calculate the value of \({v}_{j}\).
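As an illustration, the routing procedure of Eqs. (1)–(4) can be sketched in NumPy as follows. This is a minimal sketch and not the implementation used in this study. The function names, the array layout, the number of routing iterations (3) and the agreement update \(b_{ij} \leftarrow b_{ij} + \hat{u}_{j|i} \cdot v_{j}\) are assumptions; the update rule follows the usual routing-by-agreement formulation of [16] and is not spelled out in the equations above.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Eq. (4): rescale s to length ||s||^2 / (1 + ||s||^2), keeping its direction."""
    norm_sq = np.sum(s ** 2, axis=-1, keepdims=True)
    # eps (an assumption) avoids division by zero for all-zero vectors
    return (norm_sq / (1.0 + norm_sq)) * s / np.sqrt(norm_sq + eps)

def dynamic_routing(u, W, num_iters=3):
    """Route n_in lower-level capsules to n_out parent capsules.

    u: (n_in, d_in) outputs u_i of the lower-level capsules
    W: (n_in, n_out, d_out, d_in) weight matrices W_ij
    Returns v: (n_out, d_out) parent outputs, c: (n_in, n_out) couplings.
    """
    # Eq. (1): prediction vectors u_hat_{j|i} = W_ij . u_i
    u_hat = np.einsum("ijkl,il->ijk", W, u)      # (n_in, n_out, d_out)
    b = np.zeros(u_hat.shape[:2])                # log priors b_ij
    for _ in range(num_iters):
        # Eq. (2): softmax over the parent capsules for each lower capsule i
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)
        # Eq. (3): weighted sum of the predictions for each parent j
        s = np.einsum("ij,ijk->jk", c, u_hat)    # (n_out, d_out)
        # Eq. (4): squash to obtain the parent output v_j
        v = squash(s)
        # Assumed agreement update: b_ij += u_hat_{j|i} . v_j
        b = b + np.einsum("ijk,jk->ij", u_hat, v)
    return v, c

# Example with random data: 6 lower capsules (8-D) routed to 3 parents (16-D)
rng = np.random.default_rng(0)
v, c = dynamic_routing(rng.normal(size=(6, 8)), rng.normal(size=(6, 3, 16, 8)))
```

Each row of `c` sums to one, and every row of `v` has length below one, as expected from Eqs. (2) and (4).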

Equation (5) computes the margin loss of each output capsule \(k\);

$$L_{k} = T_{k} \max \left( {0,~m^{ + } - \left\| {v_{k} } \right\|} \right)^{2} + \lambda \left( {1 - T_{k} } \right)\max \left( {0,~\left\| {v_{k} } \right\| - m^{ - } } \right)^{2}$$
(5)

where \({T}_{k}\) is 1 when class \(k\) is actually present and 0 otherwise. \({m}^{+}\), \({m}^{-}\) and \(\lambda\) are hyperparameters. \(\lambda\) down-weights the loss for absent classes and thereby controls the impact of gradient backpropagation in the initial learning phase.
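A minimal NumPy sketch of Eq. (5) is given below for illustration. The function name and argument layout are assumptions; the default values of \({m}^{+}\), \({m}^{-}\) and \(\lambda\) are those reported in Sect. 3.2. The function returns the per-class terms \(L_{k}\); summing them to obtain a total loss, as is commonly done, is likewise an assumption.

```python
import numpy as np

def margin_loss(v_norm, target, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Per-class margin loss L_k of Eq. (5).

    v_norm: lengths ||v_k|| of the output capsules, one per class
    target: index of the class that is actually present (T_k = 1)
    """
    v_norm = np.asarray(v_norm, dtype=float)
    T = np.zeros_like(v_norm)
    T[target] = 1.0
    # Present class: penalize a capsule length below m_plus
    present = T * np.maximum(0.0, m_plus - v_norm) ** 2
    # Absent classes: penalize a capsule length above m_minus, weighted by lambda
    absent = lam * (1.0 - T) * np.maximum(0.0, v_norm - m_minus) ** 2
    return present + absent

# Example: three classes, class 0 is present
per_class = margin_loss([0.95, 0.05, 0.30], target=0)
total = per_class.sum()  # assumed aggregation over classes
```

In this example only the third capsule contributes to the loss, because its length exceeds \({m}^{-}\) although its class is absent.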

2.2 Ensemble learning

Ensemble learning builds several diverse classification models, gives each sample as input to all of them, and combines their outputs into a final class label using a voting mechanism. Many studies have shown that the ensemble learning approach provides a higher accuracy rate than a single classifier [17]. The Bagging method, which is a basic ensemble learning technique, is given in Algorithm 1. Here, the information source required for each classifier is obtained by the Bootstrap method, and the majority voting method is used at the output stage.

Algorithm 1 Bagging
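The Bagging procedure described above can be sketched in plain Python as follows. This is an illustrative sketch only: the 1-nearest-neighbour base learner, the function names, the number of models and the toy data are all assumptions and do not reproduce Algorithm 1 line by line.

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    """Draw len(X) training pairs with replacement (the Bootstrap step)."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_1nn(X, y):
    """Hypothetical base learner: a 1-nearest-neighbour classifier."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in X]
        return y[dists.index(min(dists))]
    return predict

def bagging_fit(X, y, n_models=5, seed=0):
    """Train n_models base learners, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_1nn(*bootstrap_sample(X, y, rng)) for _ in range(n_models)]

def bagging_predict(models, x):
    """Majority vote over the labels predicted by the individual models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters on a line
X = [(float(i), 0.0) for i in range(10)] + [(100.0 + i, 0.0) for i in range(10)]
y = [0] * 10 + [1] * 10
models = bagging_fit(X, y)
label = bagging_predict(models, (2.2, 0.0))
```

Because each model sees a different bootstrap sample, the models differ from one another, which is the source of diversity that the voting step exploits.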

2.3 Gabor filter

This filter is often used to perform texture analysis in the spatial domain and to reveal characteristic structures within an image [18]. The Gabor filter is a linear filter that provides edge detection and measures the texture within the image. With this filter, textures with the most consistent angle and scale are easily found and highlighted. A two-dimensional Gabor filter is given by:

$$G\left( {x,y} \right) = \exp \left( { - \frac{1}{2}\left[ {\frac{{x_{\theta } ^{2} }}{{\sigma _{x} ^{2} }} + \frac{{y_{\theta } ^{2} }}{{\sigma _{y} ^{2} }}} \right]} \right)\cos \left( {\frac{{2\pi x_{\theta } }}{\lambda } + \psi } \right)$$
(6)
$$x_{\theta } = x\cos \left( \theta \right) + y\sin \left( \theta \right)$$
(7)
$$y_{\theta } = - x\sin \left( \theta \right) + y\cos \left( \theta \right)$$
(8)
$$\sigma _{x} = \sigma ,\quad \sigma _{y} = \sigma /\gamma$$
(9)

As seen in Eq. (6), the two-dimensional Gabor filter applied in the spatial domain is the product of a sinusoidal function and a Gaussian function. The parameters of this filter are: the wavelength of the sinusoidal factor \((\lambda )\); the standard deviation of the Gaussian envelope \((\sigma )\); the orientation of the normal to the parallel stripes of the Gabor function \((\theta )\); the phase offset \((\psi )\); and the spatial aspect ratio \((\gamma )\). Equations (7) and (8) describe the coordinate rotation that enables the Gabor filter to detect structures extending in particular directions [18].
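For illustration, Eqs. (6)–(9) can be evaluated on a pixel grid to build a Gabor kernel, as in the NumPy sketch below. The function name, the odd kernel size and the use of integer pixel coordinates centred on the kernel are assumptions that the text does not specify; the example parameter values are those reported in Sect. 3.2. Convolving the kernel with an image is left out.

```python
import numpy as np

def gabor_kernel(size, sigma, theta, lam, gamma, psi):
    """Sample the 2-D Gabor function of Eq. (6) on a size x size grid (size odd)."""
    half = size // 2
    # Pixel coordinates centred on the middle of the kernel (assumed convention)
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_t = x * np.cos(theta) + y * np.sin(theta)      # Eq. (7)
    y_t = -x * np.sin(theta) + y * np.cos(theta)     # Eq. (8)
    sigma_x, sigma_y = sigma, sigma / gamma          # Eq. (9)
    envelope = np.exp(-0.5 * (x_t ** 2 / sigma_x ** 2 + y_t ** 2 / sigma_y ** 2))
    carrier = np.cos(2.0 * np.pi * x_t / lam + psi)
    return envelope * carrier                        # Eq. (6)

# Example with the parameter values of Sect. 3.2 and an assumed 5 x 5 kernel
kernel = gabor_kernel(size=5, sigma=0.35, theta=0.65, lam=0.5, gamma=0.25, psi=0.25)
```

At the kernel centre the Gaussian envelope equals one, so the centre value reduces to \(\cos (\psi )\).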

2.4 Principal component analysis (PCA)

One of the algorithms used to reduce the dimension of the feature vector and extract its principal features is PCA. PCA transforms a \(d\)-dimensional image \(X\) into an \(n\)-dimensional image \(Y\) with minimal loss [19].

The basic principle of PCA is to find the projection vectors in the directions of greatest variance. These vectors are obtained by computing the eigenvalues and eigenvectors of the covariance matrix of the data. The eigenvalues are sorted in descending order.

The eigenvectors corresponding to the first \(n\) of these \(d\) sorted eigenvalues form the columns of the projection matrix. Thus, the best projection matrix \(\left(W\right)\) is obtained. Multiplying the transposed projection matrix by the data, as in Eq. (10), yields the data with reduced dimensions:

$$Y = W^{T} X$$
(10)
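The steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: samples are stored as the columns of \(X\), the data are mean-centred before the covariance is computed (a step not stated explicitly in the text), and the function name is hypothetical.

```python
import numpy as np

def pca_project(X, n):
    """Project d-dimensional samples (columns of X) onto n principal directions.

    X: (d, m) data matrix with one sample per column
    Returns Y: (n, m) reduced data and W: (d, n) projection matrix.
    """
    # Assumed preprocessing: centre each feature on its mean
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / (X.shape[1] - 1)           # (d, d) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # sort in descending order
    W = eigvecs[:, order[:n]]                    # first n eigenvectors as columns
    Y = W.T @ Xc                                 # Eq. (10)
    return Y, W

# Example: reduce 50 random 20-dimensional samples to 10 dimensions
rng = np.random.default_rng(0)
Y, W = pca_project(rng.normal(size=(20, 50)), n=10)
```

The columns of `W` are orthonormal, and the rows of `Y` are ordered by decreasing variance.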

2.5 Multi-channel capsule network ensemble (MCCNE)

As mentioned above, capsule networks are a very powerful method. In this study, a new method that combines capsule networks and ensemble learning is proposed to increase the performance of capsule networks and to achieve a higher success rate in plant disease detection. The pipeline of this new method is presented in Fig. 2. As shown in Fig. 2, five pre-processing algorithms are first applied to each diseased leaf image. Each of the 5 resulting images is then presented as input to a channel consisting of a capsule network. Details of these processes are as follows.

Fig. 2 The pipeline of the proposed method

First, the plant leaf image is scaled to 28 × 28 pixels. Five different images are obtained from this image. The input of each of the first three channels is a particular color component: the R, B and G components of the leaf image, respectively. The RGB color space is commonly used and is the default color space for storing and representing digital images. As shown in Fig. 3, an RGB image consists of three primary color components: red, green and blue. The input of the fourth channel is the image obtained by applying the Gabor filter to the original image. The input of the fifth channel is the image obtained by applying PCA to the original image.

Fig. 3 A plant leaf image and its color channels. a Original image, b R channel, c G channel, d B channel
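The construction of the five channel inputs can be sketched as follows. This is only an illustration: the function name is hypothetical, and `gabor_fn` and `pca_fn` are placeholder callables supplied by the caller, because the text does not specify exactly how the Gabor filter and PCA are applied to produce an image. Resizing to 28 × 28 pixels is assumed to have been done beforehand.

```python
import numpy as np

def build_channel_inputs(image, gabor_fn, pca_fn):
    """Return the five channel inputs for one RGB leaf image.

    image: (H, W, 3) array in R, G, B plane order, already resized
    gabor_fn, pca_fn: hypothetical callables mapping the image to a 2-D array
    """
    image = np.asarray(image, dtype=float)
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    # Channels 1-3 follow the order given in the text: R, B, G
    return [r, b, g, gabor_fn(image), pca_fn(image)]

# Example with stand-in transforms (a plain grey-level average, for illustration only)
grey = lambda im: im.mean(axis=-1)
inputs = build_channel_inputs(np.zeros((28, 28, 3)), gabor_fn=grey, pca_fn=grey)
```

Each of the five arrays is then passed to its own capsule network.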

As can be seen, a different preprocessing algorithm is used in each of the 5 channels. The reasons for choosing these algorithms are briefly as follows. Many studies in the literature show that color channel selection is very important for many computer vision algorithms. These studies also show that the features extracted from different color channels (R, G and B) have different discrimination power [20, 21]. Therefore, the effect of color channels on the results was investigated in this study. As mentioned in Sect. 2.3, the Gabor filter makes it easy to find and highlight textures with the most consistent angle and scale. Because of this important property, its effect on the detection of plant diseases was investigated in this study. As mentioned in Sect. 2.4, PCA reduces the dimensionality of a data set consisting of many variables that are related to each other at different levels, while preserving as much of the variation in the data set as possible. For this reason, PCA is frequently used in feature selection to identify meaningful data. In this study, the aim was to measure the effect of PCA on plant disease detection in view of these advantages.

The block diagram of the single channel capsule network in the proposed architecture is presented in Fig. 4.

Fig. 4 The block diagram of the single channel capsule network in the proposed architecture. L1 Layer (Convolutional Layer): 9 × 9 convolution with 256 output channels. L2 Layer: convolution with 256 output channels. These channels are divided into 32 8-dimensional capsules. L3 Layer: fully connected with 16-D capsules. The number of capsules is 10. Note that dynamic routing is applied only between the PrimaryCaps and DigitCaps layers
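As a worked example, the layer sizes implied by this caption can be estimated with the short sketch below. Several values are assumptions that the caption does not state: unpadded convolutions, stride 1 in L1, and a 9 × 9 kernel with stride 2 in L2, as in the original CapsNet design [16]. The 256 L2 channels are interpreted as 32 capsule types of dimension 8 at every spatial position, which is also an assumption. The function names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial size after an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1

def capsule_layer_sizes(input_side=28, l1_kernel=9, l1_stride=1,
                        l2_kernel=9, l2_stride=2,      # assumed, as in [16]
                        l2_channels=256, primary_dim=8,
                        num_classes=10, class_dim=16):
    """Estimate feature-map sizes and capsule counts for one channel."""
    l1_side = conv_output_size(input_side, l1_kernel, l1_stride)
    l2_side = conv_output_size(l1_side, l2_kernel, l2_stride)
    capsule_types = l2_channels // primary_dim
    num_primary = l2_side * l2_side * capsule_types
    # One (class_dim x primary_dim) matrix W_ij per primary/output capsule pair
    routing_weights = num_primary * num_classes * class_dim * primary_dim
    return {
        "l1_side": l1_side,
        "l2_side": l2_side,
        "capsule_types": capsule_types,
        "num_primary_capsules": num_primary,
        "routing_weights": routing_weights,
    }

sizes = capsule_layer_sizes()
```

Under these assumptions, a 28 × 28 input gives a 20 × 20 feature map after L1, a 6 × 6 grid after L2, and 6 × 6 × 32 = 1152 primary capsules routed to the 10 output capsules.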

Algorithm 2 presents the training procedure applied to each channel during the training phase of the proposed method. Parameter1 and Parameter2 define the parameters of the first convolutional layer and the PrimaryCaps layer, respectively. The first convolutional layer and the second convolutional layer in the PrimaryCaps stage are denoted \(Conv1\) and \(Conv2\), respectively; \(conv\_layer\) and \(primary\_caps\_layer\) refer to the outputs of the first convolutional layer and the PrimaryCaps layer, respectively. Using the feature vector obtained from the convolutional layer and the PrimaryCaps layer (lines 7 and 8), primary_caps_layer is packaged as capsule \(u\) (line 9), where \(\widehat{u}\) indicates the contribution of one layer to the next layer. Next, the routing algorithm (lines 10–18) is used to generate the DigitCaps. In line 19, the error is calculated and the network parameters are updated.

In ensemble learning, different techniques can be used to obtain a single result from the results of different classifiers. In this study, the majority voting technique has been preferred for this purpose.

In this case, the output of the proposed method is calculated as follows, where \(L_{1}, \ldots , L_{5}\) are the class labels predicted by the five channels and the label that receives the most votes is selected.

$$L_{{{\text{output}}}} = \arg \max \left( {L_{1} ,L_{2} ,L_{3} ,L_{4} ,L_{5} } \right)$$
(11)
Algorithm 2 Training procedure for each channel of the proposed method
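The majority voting step of Eq. (11) can be sketched in plain Python as follows. The function name is hypothetical, and the tie-breaking rule (the label seen first among the tied labels wins) is an assumption, since the text does not state how ties between channels are resolved.

```python
from collections import Counter

def majority_vote(labels):
    """Return the class label predicted by the largest number of channels.

    labels: the predictions L_1, ..., L_5 of the five capsule networks.
    Ties are resolved in favour of the label that appears first (assumed rule).
    """
    counts = Counter(labels)
    return counts.most_common(1)[0][0]

# Example: three of the five channels predict class 3
output = majority_vote([3, 3, 1, 3, 7])
```

With five channels, any label predicted by at least three channels is guaranteed to be the output.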

3 Experimental setup

The experiments were carried out in Python on a high-performance computer with an NVIDIA GeForce RTX 2070 Graphics Processing Unit (GPU). The dataset and the parameter values of the algorithms used in this study are described below.

3.1 Dataset

The data set used in this study consists of images of healthy tomato crops and of tomato crops with 9 different diseases. It was taken from an open access repository [15] containing more than 50,000 leaf images, of which only the tomato leaf images were used. Table 1 and Fig. 5 provide a summary of the data set. The data set contains 14,785 images in total, divided into nine diseases. Finally, 80% of the data was used as the training set and the remaining 20% as the test set. This split follows the recommendations of studies [3] and [22].

Table 1 Dataset summary
Fig. 5 Extracted images from dataset

3.2 Parameter settings

The CapsNet is trained with the margin loss with \({m}^{+}\) = 0.9, \({m}^{-}\) = 0.1, and \(\lambda\) = 0.5. The CNN is trained using Stochastic Gradient Descent (SGD). The batch size, maximum epoch, and learning rate are set to 32, 30, and \(10^{-3}\), respectively. The weight decay is set to \(10^{-6}\), and the Nesterov momentum is set to 0.9.

Gabor filter parameters were investigated within the following ranges: \(0.001\le \sigma \le 1\); \(0.001\le \theta \le \pi\); \(0.001\le \lambda \le \pi\); \(0.001\le \gamma \le 1\); and \(0.001\le \psi \le \pi\). The grid-search technique gave the best results with \(\sigma\) = 0.35, \(\theta\) = 0.65, \(\lambda\) = 0.5, \(\gamma\) = 0.25 and \(\psi\) = 0.25.

Analyses were carried out for different numbers of principal components in PCA. The best trade-off between calculation time and success rate was obtained with 10 principal components. The success rate did not change significantly as the number of components increased further.

3.3 Performance evaluation metrics

Classification accuracy and the kappa statistic have been used in the performance evaluation stage. The classification accuracy is computed as shown in Eq. (12). This measure is the ratio of the number of correctly classified samples to the total number of samples.

$${\text{Accuracy}}~\left( {CA} \right) = \frac{{TP + TN}}{{TP + FP + FN + TN}} \times 100\%$$
(12)

where \(TP\) is the number of true positives, \(TN\) is the number of true negatives, \(FP\) is the number of false positives, and \(FN\) is the number of false negatives.

The Kappa statistic measures the agreement between the evaluations made by two or more evaluators. This value is computed as shown in Eq. (13).

$${\text{Kappa~}}\;{\text{Value}} = \left( {X_{0} - X_{c} } \right)/\left( {1 - X_{c} } \right)$$
(13)

where \({X}_{0}\) represents the classification accuracy, and \({X}_{c}\) is the accuracy that would be obtained by random guessing on the same dataset. The Kappa statistic lies in the range of −1 to 1, where −1 indicates a bad classification and 1 indicates a good classification.
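Both metrics can be computed from a confusion matrix, as in the sketch below. The function name is hypothetical. The chance accuracy \({X}_{c}\) is estimated here from the row and column totals of the confusion matrix, which is the usual convention for Cohen's kappa; the text does not state how \({X}_{c}\) was computed, so this is an assumption.

```python
def accuracy_and_kappa(confusion):
    """Compute accuracy (in %) and the kappa value from a square confusion matrix.

    confusion[i][j]: number of samples of actual class j predicted as class i.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n_classes))
    x0 = correct / total                       # observed accuracy, Eq. (12) as a fraction
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    # Assumed estimate of X_c: agreement expected by chance from the marginals
    xc = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (x0 - xc) / (1.0 - xc)             # Eq. (13)
    return 100.0 * x0, kappa

# Example: two balanced classes with 80 of 100 samples classified correctly
acc, kappa = accuracy_and_kappa([[40, 10], [10, 40]])
```

In this example the accuracy is 80% and the chance accuracy is 0.5, which gives a kappa value of 0.6.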

4 Results and discussion

The test results are shown in Table 2. Table 2 shows that the best results are obtained with the proposed method. The aim of combining the strengths of the different channels through the ensemble learning algorithm has therefore been achieved. The success of the fourth channel is higher than that of the other 3 channels. This is thought to be because the Gabor-filtered image contains important features of the image. Successful results have also been obtained with the third channel, whose input is the G component. In the literature, the G component is known to contain important and distinct information about a plant leaf compared to the R and B components [23]. Fig. 3 shows that the details on the leaf appear clearer in the G component. The lowest success rate has been obtained with the PCA channel. Although PCA is a successful pre-processing technique, it can cause the loss of important information in the data.

Table 2 Comparison of results for ensemble and individual classifier

The confusion matrix in Fig. 6 shows the success of the proposed model for each class, which makes it possible to evaluate the performance of the classifier visually. The rows define the output class, and the columns define the actual class. Diagonal cells correspond to correctly classified observations, while off-diagonal cells correspond to misclassified observations. The results were obtained by applying the proposed method to the test data and are written as percentages. Fig. 6 shows that, in general, all classes are classified with high accuracy, which indicates that the proposed method discriminates well between classes. Only the 9th class, Tomato mosaic virus, is classified with a lower accuracy rate. This is thought to be due to the small number of samples belonging to this class.

Fig. 6 Confusion matrix obtained for plant disease detection with the proposed method

Table 3 presents a comparative analysis with studies that use deep learning-based methods for plant disease recognition. For a fair comparison, all methods have been applied to the data set used in this study. The experiments have been performed on the same computer using the Python programming language. The parameter values reported by the authors in their articles have been used.

Table 3 Comparative analysis with existing methods

The first four studies in Table 3 are based on the CNN algorithm, while the studies in [9, 10] are based on capsule networks. As can be seen, these deep learning-based methods are quite successful. In terms of computation time, the CNN-based algorithms are faster. The MCCNE method achieves a higher accuracy rate than the other methods, but it requires a longer inference time. The reason for this is that the multi-level features in the proposed method increase the complexity of the model. This is a limitation of the proposed model that needs to be addressed in future work.

5 Conclusion

This study presents a new deep learning method, called multi-channel capsule network ensemble, for plant disease detection. The proposed method is based on ensemble learning and a five-channel capsule network fed from five different data sources. Experiments on a benchmark dataset, in which the method is compared with state-of-the-art methods, demonstrate that the developed method achieves higher accuracy than existing methods.

As mentioned in the discussion section, the computational cost of the proposed method is higher than other algorithms. In future studies, it is aimed to decrease the complexity of the network by sharing parameters between capsule layers.