1 Introduction

Crop diseases cause significant reductions in agricultural productivity worldwide [1]. Disease symptoms often first appear on crop leaves, and different diseases produce different symptoms, which provide an important basis for detecting disease occurrence and recognizing disease types. Image segmentation of diseased leaves is a key step in automatic disease recognition, and the segmentation quality directly affects subsequent recognition. Segmenting a diseased leaf image means dividing it into normal areas and lesion areas [1,2,3]; feature extraction and classifier construction in disease recognition are both based on these areas. Diseased leaf image segmentation remains one of the most difficult tasks in crop disease recognition, and it is also an active research topic in image processing and pattern recognition. Many image segmentation algorithms have been applied to diseased leaf images, including threshold segmentation, edge detection, mathematical morphology and fuzzy clustering [4,5,6]. Camargo and Smith [7], taking banana leaf black spot as an example, successfully separated lesions from the background in their crop disease classification and recognition system using an optimal histogram threshold segmentation method. Threshold segmentation is simple and efficient; its difficulty lies in selecting the threshold. The color and texture of crop pest and disease areas often differ from those of non-diseased areas [8, 9]. Baum et al. [10] used the Sobel edge detection operator to isolate barley plaque from the background. For rice leaf image segmentation, a rice leaf edge detection algorithm based on multi-strategy fusion has been proposed, which combines the minimum bounding rectangle algorithm, median filtering and the Canny operator. The disadvantage of edge detection methods is that their segmentation performance depends on the edge detection operator and their robustness is poor [11, 12].

Casady et al. [13] used mathematical morphology to segment rice canopy images, combined with the gray median method to extract the height and area of the rice canopy, and achieved good results. Crop leaves have also been segmented by combining linear mathematical morphology segmentation with non-linear edge detection. Fuzzy clustering segmentation belongs to unsupervised learning in pattern recognition [14]. It treats the membership of each pixel in an image as fuzzy, and it has been widely used in recent years. Jaware et al. [15] used Otsu’s algorithm [16] to calculate the masking threshold for the green pixels of a disease image, eliminated the zero-valued RGB pixels and the edge features of the infected area, and segmented the crop disease leaf image with an optimized K-means clustering method. Their experiments showed that the algorithm was efficient and accurate. Li et al. [17] used K-means clustering on the a and b components of the Lab color model to identify red spiders in color images, with remarkable results. Crop disease image data are often fuzzy and uncertain. The membership function in fuzzy clustering can model this fuzziness and uncertainty, so it can be applied effectively to image segmentation. Fuzzy clustering also has shortcomings, such as sensitivity to noise and to initialization and a large computational cost, which limit its practical application in agricultural production and need further improvement and optimization.
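
The threshold selection difficulty noted for threshold segmentation can be illustrated with Otsu's method [16], which picks the gray level that maximizes the between-class variance. The following is a minimal sketch in plain Python, not the implementation used in any of the cited works; the function name `otsu_threshold` and the toy gray values are assumptions made for illustration.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    `pixels` is a flat list of integers in [0, levels). Pixels <= t form
    the low class and pixels > t the high class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the low class
    sum0 = 0  # sum of gray levels in the low class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # low class still empty
        if w1 == 0:
            break     # high class empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Toy data (hypothetical): 50 dark "lesion" pixels and 50 bright pixels.
gray = [10] * 50 + [200] * 50
t = otsu_threshold(gray)
mask = [p > t for p in gray]  # True marks the bright class
```

On real leaf images the two classes overlap, which is why the chosen threshold is sensitive to illumination and background.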

In recent years, deep learning has achieved good results in image segmentation and image recognition. Many researchers have applied deep learning to crop disease segmentation and recognition with some success [18,19,20]. Hanson et al. [7] proposed a plant disease recognition method based on convolutional neural networks (CNNs), and their experimental results show that CNNs achieve high recognition accuracy. Good results have also been obtained in segmenting diseased grape leaves in natural scenes. Fully convolutional networks (FCNs) are an effective segmentation method that has been applied in several research fields [19, 21, 22]. In this paper, a crop disease leaf segmentation method based on a modified FCN is proposed.

2 Modified FCNs

Based on FCNs, a modified FCN model is constructed for crop disease leaf image segmentation. The network is trained on a set of maize leaf lesion images, learns the features of the lesion areas in these images, and realizes end-to-end segmentation of crop disease leaf images from input to output.

The main steps of crop disease leaf image segmentation by FCNs are as follows:

  1.

    The original image is input into the convolutional neural network model, and the initial feature maps are obtained by convolution. The convolution layer mainly consists of K convolution kernels of size N * N * C. After the original image is convolved with the kernels, a non-linear activation function is applied to enhance the feature extraction ability of the convolution layer. For an input of size M * M, this yields K feature maps of size (M − N + 1) * (M − N + 1). The convolution layer is computed as follows:

    $$ x_{j}^{(l)} = f\left( \sum\limits_{i \in \delta_{j}} W_{ij}^{(l)} x_{i}^{(l - 1)} + b_{j}^{(l)} \right) $$
    (1)

    where \( x_{i}^{(l - 1)} \) is the ith output feature map of hidden layer l − 1 (the input image for the first layer), \( x_{j}^{(l)} \) is the jth output feature map of hidden layer l, \( \delta_{j} \) is the set of input feature maps connected to output j, \( W_{ij}^{(l)} \) is the mapping weight matrix of the lth hidden layer, \( b_{j}^{(l)} \) is the bias of the lth hidden layer, and f is the activation function, which compensates for the limited expressive ability of a purely linear mapping. Here f(x) = max(0, x).

  2.

    Max pooling is adopted in this study. After the downsampling operation, an activation function is again applied to enhance the nonlinearity of the model. The pooling layer is computed as follows:

    $$ \begin{aligned} x_{i}^{(l)} &= w_{i}^{(l)} \operatorname{down}(a_{i}^{(l - 1)}) + b_{s} \\ a_{i}^{(l)} &= f(x_{i}^{(l)}) \end{aligned} $$
    (2)

    where l denotes the index of the current pooling layer, down(·) is the downsampling operation, \( w_{i}^{(l)} \) is a weight matrix, and \( b_{s} \) is a bias matrix.

  3.

    The fully convolutional network performs end-to-end, pixel-by-pixel classification. After features are extracted from the input image by the convolution layers, pooling layers and activation functions, the resulting feature map is passed to a pixel-wise classification layer. Common classifiers include SoftMax and SVM. The SoftMax classification layer is computed as follows:

    $$ {\text{softmax}}(x^{(i)})_{c} = \frac{\exp [w_{c}^{T} x^{(i)} ]}{\sum\limits_{j = 1}^{C} \exp [w_{j}^{T} x^{(i)} ]} $$
    (3)

    where \( x^{(i)} \) is the feature vector of pixel i in the feature map output by the convolution layers, \( w_{c} \) is the weight vector of class c, and C is the number of classes.

  4.

    After the pixels are classified, a loss function is used to evaluate how well the model has been trained. The lower the loss on the training set and the test set, the better the model is trained.
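
The four steps above can be sketched for a single-channel toy input in plain Python. This is a simplified illustration under stated assumptions, not the network used in this study: one kernel and one channel are used, the pooling weight and bias of Eq. (2) are omitted, and cross-entropy is assumed as the loss because the text does not name a specific loss function. All function names and values are hypothetical.

```python
import math


def conv2d_valid(image, kernel, bias=0.0):
    """Convolve an M x M image with an N x N kernel (no padding, stride 1)
    and apply f(x) = max(0, x). The output is (M - N + 1) x (M - N + 1)."""
    m, n = len(image), len(kernel)
    out = m - n + 1
    result = []
    for r in range(out):
        row = []
        for c in range(out):
            s = bias
            for i in range(n):
                for j in range(n):
                    s += kernel[i][j] * image[r + i][c + j]
            row.append(max(0.0, s))
        result.append(row)
    return result


def max_pool_2x2(fmap):
    """2 x 2 max pooling with stride 2 (an odd trailing row/column is dropped)."""
    h, w = len(fmap) // 2, len(fmap[0]) // 2
    return [[max(fmap[2 * r][2 * c], fmap[2 * r][2 * c + 1],
                 fmap[2 * r + 1][2 * c], fmap[2 * r + 1][2 * c + 1])
             for c in range(w)] for r in range(h)]


def softmax(scores):
    """Numerically stable softmax over the class scores of one pixel."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class (assumed loss)."""
    return -math.log(probs[label])


# Toy 6 x 6 input and a hypothetical 3 x 3 kernel.
image = [[float((r + c) % 3) for c in range(6)] for r in range(6)]
kernel = [[1.0, 0.0, -1.0] for _ in range(3)]

fmap = conv2d_valid(image, kernel)    # 4 x 4, since 6 - 3 + 1 = 4
pooled = max_pool_2x2(fmap)           # 2 x 2
probs = softmax([pooled[0][0], 0.0])  # toy two-class scores for one pixel
loss = cross_entropy(probs, 0)
```

In a real network these operations run over many channels and kernels at once, and the loss is averaged over all pixels of a batch.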

The coding network consists of 13 convolution layers and 5 pooling layers; each convolution layer comprises a convolution kernel (CONV), a batch normalization layer (Batch Normalization) and an activation layer (RELU). The convolution kernels are uniformly set to 3 * 3 with a stride of 1. Max pooling is used in the pooling layers, with a window of 2 * 2 and a stride of 2. The decoding network consists of convolution layers, upsampling layers and a SoftMax classifier. The scale of each upsampling layer is set to 2. Because the segmentation of maize leaf lesion images is a two-class problem (normal area and lesion area), the number of channels of the convolution kernel in the last layer of the decoding network is set to 2. Finally, the dense feature map produced by the decoding network is input into the classification layer to classify the pixels, which completes the segmentation of the maize leaf lesion image.
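
To make the layer settings concrete, the sketch below traces the feature-map width through 13 convolution layers and 5 pooling layers and back through 5 upsampling stages. Several details are assumptions rather than values given in the text: the grouping of the 13 convolutions into blocks of 2, 2, 3, 3, 3 (the VGG-16 layout), a padding of 1 for each 3 * 3 convolution, five upsampling stages in the decoding network, and an input width of 224.

```python
# Convolution layers per block; 2 + 2 + 3 + 3 + 3 = 13 follows the VGG-16
# layout (an assumption, the text only gives the totals).
CONVS_PER_BLOCK = [2, 2, 3, 3, 3]


def conv_out(size, kernel=3, stride=1, padding=1):
    """Output width of a convolution; padding=1 is assumed, not stated."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window=2, stride=2):
    """Output width of 2 * 2 max pooling with stride 2."""
    return (size - window) // stride + 1


def trace_sizes(size=224):
    """Return (coding, decoding) lists of feature-map widths."""
    coding = [size]
    for n_convs in CONVS_PER_BLOCK:
        for _ in range(n_convs):
            size = conv_out(size)  # 3 * 3, stride 1, padding 1: width unchanged
        size = pool_out(size)      # halves the width
        coding.append(size)
    decoding = [size]
    for _ in CONVS_PER_BLOCK:
        size *= 2                  # upsampling with scale 2
        decoding.append(size)
    return coding, decoding


coding, decoding = trace_sizes(224)
print(coding)    # [224, 112, 56, 28, 14, 7]
print(decoding)  # [7, 14, 28, 56, 112, 224]
```

Under these assumptions the decoding network restores the input resolution, so the two-channel output assigns one class score pair to every input pixel.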

The improved FCN model is implemented with the TensorFlow deep learning framework, using Python as the programming language. The experimental platform runs Ubuntu 16.04 with 32 GB of memory, an Nvidia GTX1080Ti GPU with 11 GB of memory, and an Intel(R) Core i7 processor. The network is trained with the mini-batch stochastic gradient descent (SGD) algorithm with momentum. To guarantee the nonlinearity of the model and improve the learning efficiency of the convolution layers, RELU is used as the activation function. Because of the large image data set and the limited computer memory, batch sizes of 32, 64 and 128 were considered. To ensure efficient training, the initial learning rate is set to 0.01, the momentum factor to 0.9, and the batch size to 128. After 1200 iterations the training of the network slows down, so the learning rate is reduced to 0.001 [23, 24].
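
The training schedule described above can be sketched as follows. This is an illustration of SGD with momentum in its classical form (deep learning frameworks may use an equivalent but differently scaled update), not the authors' training code; the function names and the toy loss L(w) = w² are assumptions.

```python
def learning_rate(iteration, base_lr=0.01, drop_at=1200, reduced_lr=0.001):
    """Step schedule: 0.01 for the first 1200 iterations, then 0.001."""
    return base_lr if iteration < drop_at else reduced_lr


def sgd_momentum_step(weights, grads, velocity, lr, momentum=0.9):
    """One update: v <- momentum * v - lr * g, then w <- w + v."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy problem (hypothetical): minimize L(w) = w**2, whose gradient is 2 * w.
w, v = [1.0], [0.0]
for it in range(2000):
    grads = [2.0 * w[0]]
    w, v = sgd_momentum_step(w, grads, v, learning_rate(it))
```

In actual training the gradient would be computed on a mini-batch of 128 images per iteration instead of on a closed-form toy loss.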

3 Experiments and analysis

Convolution kernels of different sizes affect the feature extraction ability of the network model. To better understand the network, the convolution kernels were visualized in this study. It can be seen that smaller convolution kernels (CONV) have smaller local receptive fields but capture more detail, which allows them to better extract the main information contained in the image and remove redundant feature information. To verify the effect of convolution kernel size on segmentation accuracy, the batch size was set to 128 and the learning rate to its default of 0.01. With average pooling, the 1 * 1 convolution kernels score 2.25 percentage points higher in the IOU evaluation index than the 3 * 3 and 5 * 5 kernels. The convolutional kernels are shown in Fig. 1.

Fig. 1. Visualization of different types of convolutional kernels

To test the segmentation performance of the proposed method, different segmentation methods were used to segment maize leaf lesion images. Three image segmentation networks, namely FCN-8s, DeepLabV3 and PSPNet, and two traditional image segmentation methods, logistic regression and SVM, were selected for comparison. The segmentation results of the different methods are shown in Fig. 2. It can be seen that FCN-8s produces obvious misclassification; DeepLabV3 segments better than FCN-8s but handles only the larger lesions well, and its accuracy on small lesions is poor; PSPNet segments well and captures lesion details more accurately, but some errors remain. The two traditional methods, logistic regression and SVM, perform basically the same: both can segment the lesions accurately, but because mathematical morphology processing is used, local edge information is lost and small lesion areas are over-segmented [24]. The method proposed in this study both preserves the integrity of the lesions and segments small lesion areas clearly. The segmentation results of the different methods are compared on the test set images, and the evaluation indices are shown in Table 1.

Fig. 2. Segmented lesion images by different segmentation methods

Table 1. The segmentation accuracies of different segmentation methods

Table 1 shows that the proposed method outperforms the other methods. In the segmentation experiments on the test set images, its IOU value reaches 0.9123. Compared with the convolutional neural network structures, the traditional segmentation methods need less training time because they do not require a large number of convolution operations. However, they need extensive image preprocessing before segmentation, their feature extraction must be designed manually, and they do not achieve end-to-end pixel segmentation.
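
For reference, the IOU (intersection over union) index for a binary lesion mask can be computed as in the following sketch. The text does not state whether the reported value is for the lesion class only or averaged over classes, so this lesion-class version is an assumption, as are the function name, the toy masks, and the convention of returning 1.0 when both masks are empty.

```python
def iou(pred_mask, true_mask, positive=1):
    """Intersection over union of the `positive` class in two 2-D masks."""
    intersection = 0
    union = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            p_pos = p == positive
            t_pos = t == positive
            intersection += p_pos and t_pos  # both masks mark the pixel
            union += p_pos or t_pos          # at least one mask marks it
    # Assumed convention: two empty masks agree perfectly.
    return intersection / union if union else 1.0


# Toy masks (hypothetical): 1 = lesion, 0 = normal area.
pred = [[0, 1, 1],
        [0, 1, 0]]
true = [[0, 1, 0],
        [0, 1, 1]]
score = iou(pred, true)  # 2 shared lesion pixels / 4 marked pixels = 0.5
```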

4 Conclusion

An improved FCN model is proposed for crop disease leaf image segmentation. It mainly consists of a coding network and a decoding network. The coding network is an improvement on the traditional VGG-16 network, and the structure of the decoding network mirrors that of the coding network. The main purpose of the decoding network is to reverse the pooling layers of the coding network by deconvolution and restore the output features of the coding network. The improved model segments crop leaf lesion images more accurately, avoids the manually designed feature extraction of traditional methods, and has a simpler structure than existing convolutional neural networks. The method adapts well to different backgrounds during segmentation: it can overcome the influence of complex environments and accurately segment the lesion areas in maize leaf images. It also has a clear advantage in segmentation speed, can realize real-time image segmentation, and lays a foundation for the subsequent accurate recognition of maize disease types.