1 Introduction

Plant diseases and insect pests have long been among the main factors restricting the sustainable development of agriculture. On the one hand, plant diseases cause heavy economic losses and have even caused famines. According to estimates by the Food and Agriculture Organization of the United Nations (FAO), about 10% of cereal production and 12% of cotton production are lost to diseases every year. The worldwide economic loss caused by pests is as high as 120 billion dollars per year, equivalent to China’s agricultural output value, one third of that of the United States, twice that of Japan, and more than four times that of the United Kingdom [1]. Production losses due to epidemics have been a global problem:

  1. In 1970, the epidemic of corn blight in the United States caused a loss of 1 billion US dollars [2].

  2. In 1990, China’s wheat stripe rust epidemic reduced wheat production by 2.5 billion kilograms. In 1993, the rice blast epidemic in the rice-growing areas of southern China reduced rice production by 15 billion kilograms [3].

  3. In 1845, the Irish famine that shocked the world was caused by the potato late blight epidemic. In 1942, a large rice area in Bangladesh suffered from flax spot disease, and by 1943, 2 million people had died of starvation [4].

  4. Rice dwarf disease was prevalent in some areas of Japan at the end of the 19th century, and more than 10,000 people starved to death because of it [5].

  5. Cocoa swollen branch disease is extremely devastating in Africa. Ghana alone cut down 179 million diseased trees from 1946 to 1981 [6].

On the other hand, insect pests are another factor restricting the development of the agricultural economy. More than 400 kinds of pests have been found in China, of which more than 40 are common in the northern region. They are divided into several major categories, such as leaf-eating, underground, gnawing, and sucking pests [7]. Compared with plant diseases, pests damage more parts of the plant, affecting the roots, stems, leaves, and fruits. According to relevant statistics, from 2006 to 2015 the area affected by crop diseases and insect pests in China ranged from 463 million hm\(^2\) to 507.5 million hm\(^2\). The National Agricultural Technology Extension Service Center forecast that the affected area would reach 300 million hm\(^2\) in 2020, causing colossal food and economic losses [8].

Fig. 1. Symptoms of individual plant pests and diseases.

Table 1. Common types of pests and diseases of some crops

There are many types of plant diseases and insect pests; Table 1 lists some common ones. Table 1 also shows that different crops can share the same diseases and pests, which opens the possibility of building a unified detector. Figure 1 shows images of individual plant pests and diseases. Figure 1(a) shows anthracnose on a grape leaf, Fig. 1(b) shows a plant infected with tobacco mosaic virus, Fig. 1(c) shows aphids on a vegetable leaf, and Fig. 1(d) shows a plant infected with scale insects. The occurrence of plant diseases and insect pests is related to their cyclical growth characteristics and is also affected by various other factors. When diseases and insect pests are not treated in time, the damage keeps spreading, and natural resources are seriously harmed [9].

Timely and effective control brings good economic benefits for planting. At present, plant diseases and insect pests are handled mainly through prevention, with chemical reagents as the main tool. Although chemical reagents can achieve satisfactory results in plant protection, they also cause unavoidable damage to the acid-base balance and structure of the soil. This harms the reuse of the soil and does not meet the requirements of ecological agriculture [10]. At the same time, limited awareness of the types of plant diseases leads to the use of the wrong chemical reagents, which does not solve the problem and instead poses a potential threat to the plants. Early identification of plant diseases can therefore reduce the later use of chemical agents and support sustainable development. For pest problems, new ecological management methods such as trap lamps, yellow plates, and sexual attractants, combined with real-time pest detection on farmland and determination of insect types, can control pests in a more timely and efficient manner. All of this places requirements on the detection of plant diseases and insect pests.

The detection of plant diseases and insect pests is challenging. In the past, the diagnosis of plant diseases relied mainly on manual identification, which is highly subjective and arbitrary. A serious misjudgment of the disease type can cause irreversible damage to crops. Farmers also identify diseases and pests by comparing their plants with online pictures, which is time-consuming and inefficient. They can consult disease experts as well [11], but the time and economic costs are high, and this approach is not suitable for large-scale diagnosis and management.

2 Traditional Image Detection

Traditional computer vision recognition of diseases classifies images using manually extracted features. This approach relies mainly on the prior knowledge of researchers, who design algorithms to extract and match the texture, color, and shape of the disease in order to recognize it [12]. Jia Jiannan and Ji Haiyan collected images of cucumber leaves with bacterial angular spot and downy mildew. They used the maximum inter-class variance (Otsu) method to extract ten lesion shape features with which to identify the two diseases [13]. Guanlin L et al. [14] used K-means clustering, an unsupervised segmentation method (HCM), to segment color images of three grape diseases. The results show that it can accurately segment diseased regions from images of grape diseases of various colors. Xiaodan M et al. [15] cut out images of each disease according to the characteristics of maize leaf diseases, counted the number, shape, and type of disease spots on a single leaf, and removed redundant spots. The threshold method and the area marking method are the core of this processing; combined with various other image operations, they show that automatic statistics of maize leaf diseases are practical. Phadikar S [16] identified four rice diseases by extracting the features of disease spots and using rough set theory to screen and model the features. Pugoy RAD and Mariano VY [17] converted the image to HSI color space, grouped the pixels with K-means clustering, and compared the clusters with each disease to produce a matching degree, from which the disease is identified.
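
The maximum inter-class variance (Otsu) method mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in [13]: the function name `otsu_threshold`, the 8-bit grayscale input, and the tiny synthetic "leaf" are all assumptions made for the example.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level t (0-255) that maximizes the between-class
    variance when pixels are split into {<= t} and {> t}.
    `gray` is assumed to be a uint8 array."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                   # probability of class {<= t}
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean up to t
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                 # one class is empty at these t
    return int(np.argmax(sigma_b))

# Hypothetical 4x4 grayscale leaf: dark tissue (50) with a bright lesion (200).
leaf = np.full((4, 4), 50, dtype=np.uint8)
leaf[1:3, 1:3] = 200
t = otsu_threshold(leaf)
lesion_mask = leaf > t                        # True where the lesion is
```

Shape features such as area or perimeter would then be computed on `lesion_mask`; which features to use is exactly the hand-designed step that the following paragraph criticizes.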

Extracting hand-crafted features is often costly, and features designed for one crop or disease cannot be directly reused for another. Researchers need to redesign the algorithm for each new crop and disease, which significantly increases the subsequent workload. Because of this complexity, the approach is difficult to deploy in practice [12]. In addition, because most traditional computer vision methods depend on hand-crafted features of crop diseases, their expressive ability is too limited to capture disease characteristics in general. Such features generalize poorly, adapt poorly to different environmental backgrounds, and are easily confused between diseases [18].

3 Development and Application of Deep Learning

Deep learning has developed rapidly in recent years thanks to large-scale datasets and increasingly powerful computing devices. In the field of image recognition, there are four basic categories of problems [19]: image classification, object detection, semantic segmentation, and instance segmentation (Fig. 2).

Fig. 2. Comparison of different vision tasks in image recognition.

Image classification (Fig. 2(a)) trains a classifier to determine the category of the target in the picture. Object detection networks (Fig. 2(b)) not only identify the category of the object but also predict its location with a bounding box, as in YOLO [20] and its improvements [21,22,23,24]. Semantic segmentation (Fig. 2(c)) trains a pixel-level classifier and generates a mask that gives the class and shape of the target object. Unlike object detection, however, semantic segmentation does not distinguish multiple individuals of the same category. Instance segmentation (Fig. 2(d)) uses different masks to distinguish both different categories and different individuals of the same category, as in Mask R-CNN [25]. In image recognition research on plant diseases, symptoms generally appear as lesions or color patches whose color and shape differ between diseases. The lesions are first extracted by image segmentation so that their color and shape parameters can be assessed; a machine learning model is then built to determine the type of disease [7]. According to the task performed, detection networks are divided into three categories: classification networks, localization networks, and segmentation networks.

Fig. 3. The network structure of VGG-16.

3.1 Classification Network

The most classic network structure among classification networks is VGG-16 [26], shown in Fig. 3. The early AlexNet network [27] built its model with five convolutional layers and relied on large convolution kernels (up to 11×11). Although a larger convolution kernel means a larger receptive field, it also means more parameters. The VGG network improves on AlexNet by replacing large kernels such as 5×5 and 7×7 with stacks of several 3×3 kernels, which increases the depth of the network while effectively reducing the number of parameters. Szegedy C et al. [28] proposed GoogLeNet, which uses the Inception architecture to replace the fully connected layer in the network. This structure not only fuses multi-scale features but also exploits the high computational performance of dense matrices to speed up the model. Chen et al. [29] transplanted the Inception architecture to the VGG network and used transfer learning to apply the feature extraction ability learned from other datasets to rice disease data, achieving a classification accuracy of 92\(\%\). Shuangping H et al. [30] proposed a detection method for rice ear blast based on GoogLeNet. It overcomes the influence of outdoor natural light, uses multi-scale convolution kernels to extract lesion features at different scales, and fuses them in cascade to identify rice blast. He K et al. [31] observed that deeper networks should perform better but suffer from a degradation problem that hinders the move to greater depth. They proposed a residual (shortcut) connection module, which combines low-level feature information directly with the convolutional feature information and thereby overcomes the loss of features in deep models. Xiaodong Y et al. [32] proposed the CDCNNv2 algorithm based on a residual network (ResNet-50), combining transfer learning with deep learning. By training on more than 30,000 images of diseases and insect pests of 10 types of crops, they obtained a classification model for the severity of diseases and insect pests with a recognition accuracy of 91.51\(\%\). At present, networks are developing towards lightweight designs such as SqueezeNet [33] and MobileNet [34]. Kamal K et al. [35] combined the depthwise separable convolution structure with a simplified MobileNet and used the Plant Village dataset for training and testing. The results show that the network can achieve high disease classification accuracy with a small model size.
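
The parameter savings behind stacked 3×3 kernels and depthwise separable convolutions can be checked with simple arithmetic. The sketch below is illustrative only: the helper names and the channel count `C = 64` are assumptions, biases are ignored, and stride 1 is assumed so that two stacked 3×3 layers cover a 5×5 receptive field and three cover 7×7.

```python
def conv_params(kernel, c_in, c_out):
    """Weights of a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out

def depthwise_separable_params(kernel, c_in, c_out):
    """Depthwise kernel x kernel conv (one filter per input channel)
    followed by a 1x1 pointwise conv (no bias)."""
    return kernel * kernel * c_in + c_in * c_out

C = 64  # hypothetical number of input and output channels

one_5x5 = conv_params(5, C, C)          # 25 * C^2
two_3x3 = 2 * conv_params(3, C, C)      # 18 * C^2, same 5x5 receptive field
one_7x7 = conv_params(7, C, C)          # 49 * C^2
three_3x3 = 3 * conv_params(3, C, C)    # 27 * C^2, same 7x7 receptive field

standard_3x3 = conv_params(3, C, C)
separable_3x3 = depthwise_separable_params(3, C, C)

print(one_5x5, two_3x3, one_7x7, three_3x3, standard_3x3, separable_3x3)
```

With 64 channels the stacked 3×3 layers need 28% and 45% fewer weights than the single 5×5 and 7×7 layers, and the separable layer needs roughly one eighth of the weights of a standard 3×3 layer, which is the kind of saving that lightweight networks rely on.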

Fig. 4. Two structures of object detection network.

3.2 Localization Network

The localization network can determine the type of lesions in the picture and mark their location on stems and leaves. Localization networks are generally divided into two types: single-stage detectors and two-stage detectors. Single-stage detectors are regression-based target detection algorithms represented by YOLO [20] and SSD [36]. Two-stage detectors are candidate-region-based target detection algorithms represented by the R-CNN [37] series [38]. A two-stage detector first uses an algorithm such as selective search or Edge Boxes to perform a preliminary screening of the input image. Next, candidate boxes likely to contain detection targets are selected, and the network then classifies the targets and corrects their coordinates. Finally, the results are drawn on the output image, as shown in Fig. 4(a). Faster R-CNN [39] is a widely used two-stage target detection algorithm that proposes candidate regions through a region proposal network (RPN) and then classifies and adjusts the candidate regions to complete localization and recognition. Ozguven et al. [40] changed the parameters of the CNN model to adapt the structure of Faster R-CNN [39] to the detection of beet leaf spot. They detected diseases and insect pests on single beet leaves photographed in the natural environment and reached a classification accuracy of 95.48\(\%\). Single-stage detectors adopt the idea of regression analysis, as shown in Fig. 4(b): they omit the candidate region generation stage and obtain the object class and location directly. Liu et al. [41] detected tomato pests and diseases with an improved YOLOv3 [22], which offers a high detection speed.
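
Whether a predicted box localizes a lesion well is commonly judged by its intersection over union (IoU) with the annotated box. The text above does not name an evaluation measure, so IoU is introduced here purely as an illustration; the function name and the `(x_min, y_min, x_max, y_max)` box layout are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as
    (x_min, y_min, x_max, y_max)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical predicted and annotated lesion boxes.
predicted = (0, 0, 2, 2)
annotated = (1, 1, 3, 3)
overlap = iou(predicted, annotated)   # 1 / (4 + 4 - 1)
```

The same measure is what candidate-box filtering steps typically use to decide whether two boxes describe the same object.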

3.3 Segmentation Network

The segmentation network classifies the image pixel by pixel, determining both the location and the shape of the lesion. Mask R-CNN [25] is a classic segmentation algorithm that adds a mask branch to the Faster R-CNN detector [39], itself a successor of Fast R-CNN [42]. Wang Q et al. [43] first used Fast R-CNN to locate lesions in pictures of tomato pests and diseases and then used Mask R-CNN to generate masks at those locations in order to analyze the shape of the lesions.

For instance segmentation and the deep learning classification and detection networks described earlier, a well-labeled dataset is an important foundation for achieving higher performance. Table 2 lists some existing public datasets.

Table 2. The existing pest and disease datasets

4 Multi-information Fusion Detection System

A single visible-light image carries limited information. Multi-information fusion, for example with infrared thermal imaging or spectral information, can be used to enrich it.

Infrared thermal imaging technology is a technology that uses various devices to detect infrared radiation images of objects, processes them with photoelectric technology, and finally converts them into visible images. Infrared thermal imaging technology has the advantages of extensive measurement range, strong sensitivity, and fast measurement. Based on this technology, the temperature distribution of each part of the object can be seen. The transpiration of plant leaves is negatively correlated with temperature. With diseases, the temperature of plant leaves will also change to a certain extent. Therefore, the use of infrared thermal imaging technology can detect plant diseases and physiological conditions in real-time [51]. In 2011, Wang Q et al. [52] used digital infrared thermal imaging technology to study the disease degree of apple leaves after inoculation with apple scab at different times. They also considered its effect on the transpiration of apple leaves. The study found that with the increase of the diseased area, the transpiration of the leaves will continue to increase, resulting in the continuous change of the maximum temperature difference of the leaves. The maximum temperature difference of the leaves is closely related to the proportion of the diseased area to the total leaf area. Consequently, infrared thermal imaging technology provides the possibility to quantify the extent of leaf lesions.
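
The two quantities related above, the maximum temperature difference of a leaf and the proportion of diseased area, are easy to compute once a thermal image and masks are available. The following is a minimal sketch under stated assumptions: the thermal image is a 2-D array of temperatures in degrees Celsius, `leaf_mask` and `lesion_mask` are boolean arrays of the same shape, and all values are synthetic rather than taken from [52].

```python
import numpy as np

def max_temperature_difference(thermal, leaf_mask):
    """Difference between the hottest and coldest leaf pixel."""
    leaf_temps = thermal[leaf_mask]
    return float(leaf_temps.max() - leaf_temps.min())

def diseased_area_ratio(lesion_mask, leaf_mask):
    """Fraction of leaf pixels that are marked as lesion."""
    lesion_on_leaf = np.logical_and(lesion_mask, leaf_mask)
    return float(lesion_on_leaf.sum() / leaf_mask.sum())

# Synthetic 4x4 leaf at 25.0 degrees with a 2x2 lesion at 23.5 degrees.
thermal = np.full((4, 4), 25.0)
leaf_mask = np.ones((4, 4), dtype=bool)
lesion_mask = np.zeros((4, 4), dtype=bool)
lesion_mask[1:3, 1:3] = True
thermal[lesion_mask] = 23.5

mtd = max_temperature_difference(thermal, leaf_mask)
ratio = diseased_area_ratio(lesion_mask, leaf_mask)
```

Collecting `(ratio, mtd)` pairs over many leaves is one way the reported relationship between the two could be examined.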

Spectral image analysis determines the nature, structure, or content of substances from their absorption, emission, or scattering spectra. Spectroscopic technology has the advantages of high sensitivity, strong specificity, accuracy, speed, and non-destructiveness, and it has been widely used to detect plant diseases and physiological conditions [51]. Jingyi Z et al. [53] applied principal component analysis to hyperspectral data of Cercospora leaf spot on beet and selected the first three principal components. Although samples of healthy leaves and of different disease degrees partly overlapped, the differences between samples were significant, and a support vector machine reached a recognition accuracy of 88.2\(\%\). When analyzing barley leaves infected with rice blast, Zhou RQ et al. [54] also used the first three principal components to accurately identify healthy and diseased parts of leaves based on the difference in spectral reflectance. The hyperspectral data were then processed with classification methods such as the BP neural network, fuzzy clustering, and the support vector machine. Wang Y et al. [55] used the first derivative for denoising and selected the first five principal components as the characteristic wavelengths. They then used support vector machine and extreme learning machine algorithms to build classification models based on characteristic wavelengths and on texture features, respectively. The support vector machine model built on the fused data is stable, and its accuracy on the prediction set reaches 98\(\%\). These studies combine spectral information with machine learning methods and achieve good results.
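
Reducing hyperspectral samples to their first three principal components, as in the studies above, can be sketched with a singular value decomposition. This is an illustrative sketch, not the pipeline of [53] or [54]: the function name, the random stand-in data (40 samples by 200 bands), and the choice of SVD over an eigendecomposition are assumptions.

```python
import numpy as np

def pca_scores(spectra, n_components=3):
    """Project samples (rows) onto the leading principal components.
    Returns the scores and the explained-variance ratio of each component."""
    centered = spectra - spectra.mean(axis=0)
    # Rows of vt are the principal directions, ordered by singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

rng = np.random.default_rng(0)
spectra = rng.normal(size=(40, 200))   # hypothetical reflectance spectra
scores, explained = pca_scores(spectra, n_components=3)
```

The resulting three-column `scores` matrix is what would be passed to a classifier such as a support vector machine in place of the full spectrum.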

5 Current Challenges and Future Directions

Current detection algorithms can identify images quickly and efficiently to some extent, but many challenges remain when they are applied to practical plant disease and insect pest detection.

Insufficient Datasets: Using machine learning to build a crop pest and disease identification model can help researchers quickly detect different crop pests and diseases and take effective treatment measures. In reality, however, the number of crop pests and diseases is far larger than imagined. How to collect samples quickly and effectively and build a more comprehensive database of crop pests and diseases is one of the difficulties that must be overcome today. Building such a database requires a long time and a large workforce. Wiesner-Hanks T et al. [56] describe a method for generating a maximum of high-quality training data from a minimum of expert effort. The method consists of two stages: experts first make simple and fast annotations of pests and diseases on low-resolution images, and these annotated images are then assigned to non-experts for pixel-by-pixel annotation, which yields high-quality labeled data faster. Embrapa [44] spent years building a database of plant pests and diseases. This database, called PDDB, has 2326 images of 171 diseases and 21 other symptoms affecting plants. Other large datasets are listed in Table 2.

Another way to address an insufficient training set is to apply data augmentation to the small number of existing images, for example with transformations such as CutMix [57] and Mosaic [23], which can effectively improve the robustness of the detection model. At the same time, generative adversarial networks (GANs) are developing rapidly and can be used to generate pictures from different perspectives. Going further and simulating the natural environment of plants to generate more data may be a future direction. The performance of mobile phones is also gradually improving. Mohanty SP et al. [58] achieved 99.35% accuracy by training models on large public datasets collected under controlled conditions, which demonstrates the possibility of using data collected by mobile phones to train models and assist disease diagnosis. It is foreseeable that cloud technology will significantly promote the establishment of databases in the future.
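
As an illustration of the augmentation idea, CutMix pastes a rectangular patch of one training image onto another and mixes the two labels in proportion to the pasted area. The sketch below follows that general recipe in NumPy; the function signature, the one-hot label vectors, and the synthetic 8×8 images are assumptions made for the example.

```python
import numpy as np

def cutmix(img_a, img_b, label_a, label_b, rng, alpha=1.0):
    """Paste a random rectangle of img_b into a copy of img_a and mix the
    one-hot labels by the fraction of pixels each image contributes."""
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)                 # target share of img_a
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    mixed = img_a.copy()
    mixed[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]
    # Recompute the share of img_a from the patch actually pasted,
    # because clipping at the border can shrink the rectangle.
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)
    mixed_label = lam_adj * label_a + (1.0 - lam_adj) * label_b
    return mixed, mixed_label

rng = np.random.default_rng(42)
healthy = np.zeros((8, 8))                       # stand-in image, class 0
diseased = np.ones((8, 8))                       # stand-in image, class 1
mixed, mixed_label = cutmix(healthy, diseased,
                            np.array([1.0, 0.0]), np.array([0.0, 1.0]), rng)
```

Because the stand-in images are all zeros and all ones, the mean of `mixed` equals the weight given to class 1, which makes the label mixing easy to check by eye.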

The Complexity of Multi-stage Detection: Identifying the different stages of diseases or pests is also challenging. The relationship between each disease stage and physiological indicators of the plant, such as water content and nitrogen content, still needs to be explored so that these indicators can be used for detection. Multi-information fusion can become a critical factor in intelligent prevention and control. A multi-platform detection system such as the one shown in Fig. 5 can be used: photos obtained by agricultural robots, drones, and hand-held detection equipment are sent to a cloud server for data cleaning. After the server removes abnormal data, it analyzes the remaining data and runs detection on it. At the same time, the database is maintained manually, and useful data are labeled to improve the recognition ability of the model. The server can thus give timely feedback in the early stage of pests and diseases and effectively prevent their development.

Influence of Complex Background: There are few studies on multi-leaf diseases and insect pests in the natural environment, and detection performance leaves room for improvement [41]. Some scholars have made attempts to handle complex backgrounds in particular. Liao et al. [59] proposed a new grayscale transformation method for separating plant foreground from background and then applied the Otsu method and a mixed cutting method based on the Sobel operator. The method successfully separates the foreground and background of plant leaves. They also improved AlexNet for 38 plant diseases, trained the model on 19,000 preprocessed images, and reached a final recognition rate of 98.4375%. Li et al. [60] proposed TSDDP, a detection method for diseases and insect pests on multi-leaf rose images. They first identified the leaves with YOLOv3, masked the background, and finally applied Faster R-CNN to the leaf regions to eliminate the influence of the complex background. Jaisakthi et al. [61] first used the GrabCut algorithm to segment the leaves from the background, removing regions unrelated to pests and diseases, and then used a global threshold method to segment the disease areas within the leaves and extract their texture and color features. Finally, a support vector machine was used to classify grape leaf diseases and insect pests. These methods can improve detection efficiency to a certain extent. However, most current research still classifies diseases and insect pests from single-leaf images taken in natural or experimental environments, which does not match practical application scenarios.

Detection Accuracy and Real-Time Performance: Generally speaking, the deeper the network model, the longer the training time, whereas lightweight models train and detect quickly at a slight cost in accuracy [45]. Transfer learning can make full use of existing large datasets and initial weights to speed up training. Malathi V et al. [11] proposed a rice pest classification and detection model based on transfer learning with a fine-tuned ResNet-50 model. Experiments show that the fine-tuned ResNet-50 model performs better than the pre-trained model, achieving a test accuracy of 95.012%. High-precision models often detect slowly and are unsuitable for time-critical operation. Lightweight networks have fewer parameters and detect faster, but they are less accurate than large networks. The balance between accuracy and speed still needs further consideration in the future. More powerful processors may ease this limitation and enable large-scale deep learning applications.

Fig. 5. Multi-platform detection system.

6 Conclusion

Detecting plant diseases and insect pests is very important for protecting plants. Detection technology has steadily improved, from traditional feature extraction for image recognition to current deep learning methods that build networks and train models. Many scholars have contributed to this progress and good results have been achieved, but many problems remain to be solved in practical application. With the further development of detectors, more intelligent farms can be established, and farmers will gain a comprehensive and detailed understanding of plant growth characteristics. At the same time, they can draw on local experience in controlling plant diseases and pests to prevent outbreaks in time, increase productivity, and reduce costs.