
1 Introduction

Thanks to modern innovations, human civilization now has the capacity to produce enough food to feed more than 7 billion people. Nevertheless, a number of factors such as climate change [1], pollinator decline, and plant diseases [2] continue to pose a threat to food security. Pests and pathogens are a worldwide danger to food security, but they can be especially devastating for smallholder farmers whose livelihoods depend on healthy crops. Small-scale farmers contribute more than 80% of agricultural production in developing countries, and reports of yield losses of at least 60% due to pests and infections are widespread [3].

In many circumstances, workers monitor crops only passively as they go about their daily tasks. The downside of this approach is that by the time the pest is recognized, a substantial proportion of the damage has already been done. On large farms, early pest detection requires a more organized methodology. Traps are, without a doubt, the most commonly used technique for pest monitoring [4, 5]. Pest monitoring can be viewed as three stages: acquiring the data (for example, from traps), detecting and classifying the pests, and acting on the results. The great majority of research in the literature is concerned with the second stage. The first stage is generally handled only briefly: an explanation of how the data were obtained is frequently included. The third stage is largely outside the purview of research.

2 Literature Review

The current study was stimulated by Gutierrez et al. [6], who conducted a comparative analysis of pre-trained deep learning models together with machine learning and computer vision approaches. The main goal of the study in [6] is to improve pest identification accuracy by using current frameworks such as TensorFlow and Keras to construct a deep convolutional neural network (CNN). In addition, several recent pre-trained models can be applied to the dataset to assess accuracy. Table 1 presents an overall survey of pest management approaches, the algorithms used, and their accuracy.

Table 1 Comparison of algorithms and their reported accuracy for each problem

2.1 Pest Detection Methods

The goal of detection methods is to separate a specific target pest from the rest of the scene in an image. This corresponds to a binary classification with the classes "target present" and "target absent." K-means clustering is a vector quantization approach for grouping a set of observations into k clusters, or k classes. The algorithm first divides the image into 100 × 100 blocks. The RGB and L*a*b* color spaces are then used as the foundation for an algorithm that pre-selects probable cluster centers before using K-means clustering to categorize each pixel. Spurious objects are removed using ellipse eccentricity rules.
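A minimal sketch of such a pipeline, assuming scikit-image and scikit-learn are available (the block-wise pre-selection of cluster centers is omitted, and k, the lowest-lightness heuristic, and the eccentricity threshold are illustrative assumptions rather than values from the cited work):

```python
import numpy as np
from skimage import color, measure
from sklearn.cluster import KMeans

def detect_pest_regions(rgb_image, k=3, ecc_threshold=0.95):
    """Cluster pixels in L*a*b* space and keep blob-like candidate regions.

    rgb_image: H x W x 3 float array with values in [0, 1].
    """
    lab = color.rgb2lab(rgb_image)                    # convert to L*a*b* colour space
    pixels = lab.reshape(-1, 3)

    # K-means groups the pixels into k colour clusters (background vs. candidate pests).
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixels)
    label_img = labels.reshape(lab.shape[:2])

    # Assumption: the cluster with the lowest mean lightness corresponds to candidate insects.
    target = min(range(k), key=lambda c: lab[..., 0][label_img == c].mean())
    mask = label_img == target

    # Remove spurious, highly elongated objects with an ellipse-eccentricity rule.
    kept = np.zeros_like(mask)
    for region in measure.regionprops(measure.label(mask)):
        if region.eccentricity < ecc_threshold:
            rr, cc = region.coords.T
            kept[rr, cc] = True
    return kept
```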

2.2 Pest Classification Methods

Classifying pests is a significant challenge, since such a classifier must not only distinguish among the targeted species but also deal with non-targeted species, which may be numerous. Each item was classified as a whitefly, aphid, or thrips based on the smallest distance between the extracted feature vector and the reference vectors associated with each class. For the classification of eight pest species, Xia [16] used the watershed method to segment the insects and then used the Mahalanobis distance with color characteristics extracted from the YCrCb color space. Dawei [8] used transfer learning with a pre-trained AlexNet CNN to classify 10 species in pictures collected in the field. Metwalli [19] presented the DenseFood model, a densely connected CNN model with several convolutional layers. The phrase "You Only Look Once" is abbreviated as YOLO.
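As a rough illustration of such distance-based classification (not the exact pipeline of [16]), the sketch below assigns an insect to the class whose reference colour-feature distribution is closest in Mahalanobis distance; the feature extraction step and the per-class statistics are assumed to come from a training set:

```python
import numpy as np

def mahalanobis(x, mean, cov_inv):
    """Mahalanobis distance between a feature vector and a class distribution."""
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def classify(feature_vec, class_stats):
    """class_stats: {name: (mean_vector, inverse_covariance)} estimated from training data."""
    return min(
        class_stats,
        key=lambda name: mahalanobis(feature_vec, *class_stats[name]),
    )

# Hypothetical usage with mean YCrCb colour features per species:
# stats = {"whitefly": (mu_w, icov_w), "aphid": (mu_a, icov_a), "thrips": (mu_t, icov_t)}
# label = classify(extract_ycrcb_features(segmented_insect), stats)   # feature extractor assumed
```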

As the name indicates, the technique requires just a single forward propagation through a neural network to detect objects. This means that predictions over the whole picture are made in a single run of the algorithm. The CNN is used to predict multiple bounding boxes and class probabilities at the same time. There are several variations of the YOLO algorithm; Tiny YOLO and YOLOv3 are two popular examples.
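As a rough illustration of this single-pass inference (not the training setup used in this work), a pre-trained YOLOv3 network can be run through OpenCV's DNN module; the file names and the 0.5 threshold are placeholders:

```python
import cv2
import numpy as np

# Placeholder paths; yolov3.cfg / yolov3.weights refer to the standard Darknet release files.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
out_names = net.getUnconnectedOutLayersNames()

image = cv2.imread("field_photo.jpg")
blob = cv2.dnn.blobFromImage(image, scalefactor=1 / 255.0, size=(416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)

# One forward propagation yields predictions at all three YOLO detection scales.
outputs = net.forward(out_names)
for scale_output in outputs:
    for det in scale_output:              # det = [x, y, w, h, objectness, class scores...]
        scores = det[5:]
        class_id = int(np.argmax(scores))
        confidence = float(det[4] * scores[class_id])
        if confidence > 0.5:              # placeholder confidence threshold
            print(class_id, confidence)
```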

3 YOLO V3 Architecture

The YOLO algorithm is very popular because of its speed and accuracy. YOLOv3's network design is made up of three distinct networks. The first is Darknet-53, which serves as the network's backbone. Next come the detection layers, also known as YOLO layers, followed by an upsampling network. Figure 2 depicts the network structure. The backbone network, Darknet-53, is used to extract features from the input picture. The basic components of Darknet-53 are residual blocks and 53 convolutional layers. A residual block is made up of a 1 × 1 and a 3 × 3 convolutional layer linked together via a shortcut connection. Figure 3 depicts the Darknet-53 architecture in its entirety.
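A minimal PyTorch sketch of one such residual block (illustrative, not the authors' implementation; the channel count in the usage example is an assumption):

```python
import torch
from torch import nn

class DarknetResidual(nn.Module):
    """Residual block used in Darknet-53: a 1x1 conv halves the channels, a 3x3 conv
    restores them, and a shortcut connection adds the block input to its output."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels // 2, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels // 2)
        self.conv2 = nn.Conv2d(channels // 2, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.act(self.bn2(self.conv2(out)))
        return x + out                      # shortcut connection

# e.g. a 256-channel residual block applied to a 52 x 52 feature map:
# y = DarknetResidual(256)(torch.randn(1, 256, 52, 52))
```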

Figure 1 shows an overview of the YOLOv3 structure. The numbers below each layer show the dimension reduction of the input at that layer. The gray layer is the input layer. The blue layers are part of the backbone network, Darknet-53. The red layers are upsampling layers, and the yellow layers are YOLO layers.

Fig. 1

Overview of YOLO V3

Fig. 2

Improvised YOLO V3 architecture

4 Improvised YOLO V3 Architecture

The model divides the image into an S × S grid and, for each grid cell, predicts B bounding boxes, confidence scores (C) for those boxes, and class probabilities (CP). The predictions are encoded as an S × S × (B × 5 + CP) tensor, where each of the B boxes contributes four coordinates plus a confidence score.

Dataset: Identifying a species from a photograph is a difficult task. The categorization of a picture is based on the assumption that the image contains just one species. However, in general, we want to identify all of the species in a photograph. Thankfully, biologists and taxonomists have created a taxonomic hierarchy to classify and organize species. Insects, spiders, crustaceans, centipedes, millipedes, and other arthropods are included in the ArTaxOr dataset. Figure 2 depicts the overall working of the improvised YOLO.
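For illustration, the S × S × (B × 5 + CP) prediction tensor described above can be decoded into class-specific box scores with a short NumPy sketch; the grid size, box count, and class count below are illustrative values, not the configuration of the proposed model:

```python
import numpy as np

S, B, NUM_CLASSES = 13, 3, 8            # grid size, boxes per cell, classes (illustrative)
pred = np.random.rand(S, S, B * 5 + NUM_CLASSES)     # stand-in for one network output map

boxes = pred[..., :B * 5].reshape(S, S, B, 5)        # per box: x, y, w, h, confidence
class_probs = pred[..., B * 5:]                       # per cell: class probabilities (CP)

# Class-specific score = box confidence * conditional class probability.
scores = boxes[..., 4:5] * class_probs[:, :, None, :]     # shape (S, S, B, NUM_CLASSES)

row, col, box, cls = np.unravel_index(scores.argmax(), scores.shape)
print(f"strongest detection: cell ({row}, {col}), box {box}, class {cls}, "
      f"score {scores[row, col, box, cls]:.3f}")
```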

The dataset consists of JPEG images of arthropods from the following orders: Araneae (spiders; adults and juveniles), Coleoptera (beetles; adults), Diptera (true flies, including mosquitoes, midges, crane flies, etc.; adults), Hemiptera (true bugs, including aphids, cicadas, planthoppers, shield bugs, etc.; adults and nymphs), Hymenoptera (ants, bees, wasps; adults), Lepidoptera (butterflies, moths; adults), Odonata (dragonflies, damselflies; adults), and Orthoptera (grasshoppers, locusts, crickets, etc.).
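Assuming a YOLO-style annotation format (one text line of "class x_center y_center width height" per object, normalised to the image size), the listed orders can be mapped to label indices as in this illustrative sketch; the file name and the mapping order are placeholders, not details from the dataset release:

```python
from pathlib import Path

# Class-index mapping for the arthropod orders listed above (index order is illustrative).
CLASSES = ["Araneae", "Coleoptera", "Diptera", "Hemiptera",
           "Hymenoptera", "Lepidoptera", "Odonata", "Orthoptera"]
CLASS_TO_ID = {name: i for i, name in enumerate(CLASSES)}

def read_yolo_labels(label_file: Path):
    """Read one YOLO-format label file: each line is 'class x_center y_center width height'."""
    boxes = []
    for line in label_file.read_text().splitlines():
        cls, x, y, w, h = line.split()
        boxes.append((CLASSES[int(cls)], float(x), float(y), float(w), float(h)))
    return boxes

# e.g. read_yolo_labels(Path("labels/hymenoptera_0001.txt"))   # hypothetical file name
```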

Figure 3 shows the pests predicted in sample pictures. The accuracy of each pest prediction is also marked in the image. The ground-truth image and the predicted image are shown together for comparing the accuracy of the prediction. A case in which the model fails to predict the class is also shown in the figure. Table 2 shows the accuracy comparison for each class.

5 Results and Discussions

See Fig. 3 and Table 2.

Fig. 3

Sample predictions for different classes

Table 2 Comparing the accuracy of different classes

6 Conclusion

It is difficult to automate pest monitoring. Machine learning algorithms have evolved to the point where the tools needed to develop an accurate system with real-world applications are now readily available. Gathering data that reflects the enormous variety observed in the field remains difficult, but as such data become more common and collection procedures become more refined, this may become less of an issue in the near future. However, as mentioned throughout this paper, there are still numerous research gaps to be filled, implying that the automation of pest monitoring will remain a fascinating study topic for many years. The proposed YOLO V3 architecture achieves around 95% accuracy across different pest predictions. Comparatively, YOLO v3 works better and provides better results than R-CNN. Adding more images for training will help to reduce the failure cases. If fewer images are available, we recommend using image augmentation to obtain better accuracy.
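As an illustration of that recommendation, a simple augmentation pipeline could be built with torchvision; the transform choices and parameters below are examples, not the configuration used in this work:

```python
from torchvision import transforms

# Example augmentation pipeline for training images; parameters are illustrative.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomRotation(degrees=10),
    transforms.ToTensor(),
])

# augmented = augment(pil_image)   # apply to each training image; bounding boxes must be
#                                  # transformed consistently when geometric augmentations are used
```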