1 Introduction

As the population grows, demand for fruit and vegetables increases, and supply must increase with it. Fruit and vegetables are essential to a healthy diet: they are a good source of vitamins, fiber, minerals, and other nutrients that keep us healthy and help prevent disease. Food production must preserve their color, taste, texture, and shape over a specific period [1]. The conventional farming system suffers from a lack of labor, which increases the challenges faced on farms [2]. In agriculture, object detection methods combined with robot guidance support tasks such as harvesting and detecting plant diseases [3].

Traditionally, most of this work is done manually; for example, experts are hired to assess quality when inspecting food or crops. However, manual inspection has flaws, such as human error and a lack of knowledge about the characteristics of fruit and vegetables. An efficient, consistent system suited to the recognition task is therefore required. The agriculture industry uses automated systems for detecting fruit and vegetables, including in the pre-harvest and post-harvest handling of crops, and these systems depend mainly on computer vision techniques. Computer vision plays a vital role in agriculture, where it is used for fruit classification, fruit harvesting, cataloguing tools, and fruit supervision in markets [4]. However, it is challenging to discriminate fruit by its visual appearance under varying lighting conditions and against complex backgrounds [5].

To address these problems, researchers have proposed various computer vision algorithms and techniques for grading fruit, such as classification, segmentation, and feature extraction. These techniques automate industrial processes, remove the need for manual inspection of food, and improve the quality inspection of fruit with the guidance of robots [6]. Some authors have focused on classifying individual fruits accurately. For example, three categories of oranges were discussed, each with its own properties such as color, taste, size, and cost; automatic classification of various fruits was found to be a challenging task [7]. Fruit detection is a crucial task and a state-of-the-art challenge. The Multi-Task Convolutional Neural Network (MTCNN) is a popular technique that has advanced object recognition and classification, precisely targeting objects such as fruit with superior accuracy and time utilization [8].

In another line of research, post-harvest quality measurement is essential for plant phenotyping and for ranking fruit, which supports grading fruit as good or poor, fresh or damaged. The Convolutional Neural Network (CNN) approach has been adopted to identify diseases and defects in fruit, specifically in peaches [9], lemons [10], pears [11], and blueberries [12]. A Faster Region-based Convolutional Neural Network (Faster-RCNN) with ResNet101, trained on the Common Objects in Context (COCO) dataset, was designed to detect green tomato plants with high precision and minor error [13]. External features of fruit such as color, shape, and size matter for trading, classification, and grading in supermarkets. Cherries usually grow in pairs and clusters. Their uneven shape causes disorder during development and lowers profitability in markets, and the fruit becomes damaged after a specific period. Hence, an efficient algorithm is required to protect the fruit from damage and increase its selling rate [14]. A semi-supervised approach combining U-Net and Faster RCNN models was used for yield estimation through the detection and counting of apples in orchards. U-Net was employed for the segmentation task, while a CNN counted fruit on the individual image dataset. The proposed methodology achieved a higher F1-score, which depends on the technique deployed [15]. The abbreviations used are listed in Table 1.

Table 1 List of abbreviations

2 Background Study

Computer vision is applicable to various areas of agriculture, such as crop production, monitoring, and harvesting. However, open issues remain, including technological issues, farming automation, environmental influences, and building scalable datasets. It is therefore necessary to develop a public database to overcome these agricultural challenges [16]. In another line of research, spatial challenges were discussed for innovative farmland, automated sensors, and robot farming that exploit computer vision in agriculture [17]. In agriculture, automated tools and techniques facilitate food grading, fruit harvesting, and production, and help preserve fruit for longer. In this context, researchers concluded that image processing filters and techniques [18], combined with a feature selection process [19], performed accurate classification for fruit recognition with the help of a novel Voronoi-diagram-based classifier and a neuro-fuzzy architecture [20]. Accurate classification and recognition of fruit using machine vision and computer vision techniques has been challenging because of factors such as the choice of accurate sensors, environmental influences, and heterogeneous inter-class and intra-class variation among fruits [21]. One drawback of computer vision is that designing a dataset is time-consuming and increases the computational cost. The literature shows that environmental factors directly influence the detection rate of fruit and vegetables, as well as the identification of disease in soybean. A model can generate different results using similar computer vision techniques on a related dataset when environmental conditions vary [22].

In one study, the CNN architecture is based on sliding convolution, which is insufficient for multi-class labelling because it handles only binary classification. Multi-class classification therefore remains challenging, and extending the dataset is crucial [23]. Automated estimation for fruit harvesting, in particular accurate detection of fruit ripeness, is still a challenging and laborious task. Earlier studies used machine vision to estimate fruit ripeness; deep learning with multiple features now gives promising results [24, 25].

2.1 Computer Vision and Agriculture

Precision Horticulture (PH) is a widely adopted technology used to maximize yield estimation, protect fruit from disease, and automate fruit harvesting in orchards [26]. In similar research, various algorithms were adopted for fruit-harvesting robots. With developments in agriculture and current imaging technologies, information is visualized better and fruit is targeted more precisely, which assists the fruit recognition process and supports the growth of fruit picking. The quality of a fruit detection system depends on lighting conditions, stroke, and the environment in which the robot operates, as well as on suitable sensors. Expert farmers are a basic need on farms, and collecting precise information about crop growth is crucial. Because of manual systems, agricultural industries face many problems, such as labor and time costs, lack of knowledge, and inexperienced workers, which reduce farming in orchards, as discussed in [27].

Much of this research focuses on measuring fruit quality. An in-depth comparison of computer vision and image processing for fruit and vegetable quality assessment in the food industry examined image features and the segmentation problem. The analysis of fruit and vegetables relies on color, shape, size, texture, and disease identification [28]. Deep learning has made tremendous progress in agriculture in the past few years. A CNN architecture used computer vision techniques to identify potato disease in plants. The performance of the proposed architecture varied with the ratio of training data (90–96) using database images in the tomato field [29]. In [30], a Generative Adversarial Network (GAN) and a CNN were introduced to recognize diseases in plant leaves with the support of Android apps. In similar research, AlexNet and SqueezeNet were deployed to detect nine kinds of disease in tomato farms.

The proposed model was trained on the PlantVillage dataset. AlexNet achieved an accuracy of 95.65%, while SqueezeNet attained an accuracy of 94.3% with fewer computational resources. According to [23], CNN has considerably better classification accuracy than the Support Vector Machine (SVM) for real-time identification of plant diseases. Using a cloud-based system, the proposed model trained on 1030 images yielded an accuracy of 93.4% on pomegranate and 88.7% on Firecracker images. In another study, DenseNet with 152 layers was proposed to classify multiple diseases accurately in 14 kinds of plants. DenseNet achieved 99.75% accuracy on the PlantVillage dataset using fewer parameters. It also improved the computational time and performed best compared with other architectures such as VGG16, ResNet-50, ResNet-101, ResNet-152, and Inception-v4 [31]. In similar research, tomato disease was identified using PlantVillage data, and AlexNet achieved higher accuracy than VGG16 with feasible computational time [32]. Transfer learning was used to detect tomato and sugar beet plants accurately. That study also compared six convolutional network architectures (ResNet-101, ResNet-50, AlexNet, Inception-v3, GoogLeNet, and VGG19) under various lighting conditions. In the experiments, AlexNet reached an accuracy of 98.0% and VGG19 an accuracy of 98.7% [33].

3 Literature Review of Surveys

One survey reported that Machine Learning (ML) has made tremendous progress in agricultural applications such as disease detection, weed detection, protecting crops from disease and, most commonly, prediction and yield estimation. Artificial Neural Networks (ANN) were exploited for this purpose, and new ML-based methods have been proposed to protect agricultural food products [34]. In other research, deep learning was employed to address the challenge of food manufacturing in the agriculture domain. Compared with the approaches mentioned above, deep learning achieved better accuracy and precision on classification and regression problems, and it reduced the regression error. Although it requires extensive training, deep learning has emerged in agriculture to solve various problems [35,36,37]. In similar research, a combination of ML and image processing techniques was developed to support an automated system for precise recognition and grading of fruit, discriminating fruit by appearance, diversity, and maturity level. Moreover, image processing facilitates continuous, sterilized, rapid growth in the fruit industry [38].

In similar research, Faster-RCNN with Inception-v2 and a Single-Shot Multi-box Detector (SSD) with MobileNet were deployed to count fruit in three categories (avocado, lemon, and Hass). The experiments showed that Faster-RCNN performed efficiently with 93.1% accuracy, compared with an estimated 90% for MobileNet, when counting fruit [39]. Another survey analyzed computer vision techniques that address professional challenges in various fields of agriculture; unmanned aerial vehicle techniques were used to track crop development, prevent disease, and automate the harvesting and quality evaluation of agricultural products, as discussed in [16]. A further study concluded that scarcity of datasets is a common problem because newly developed RGB-D sensors have not been used to classify fruit [21]. Quality inspection of fruit and vegetables based on texture, pattern, color, size, and shape characteristics has also been briefly analyzed. Despite advances in computer vision, multi-dimensional images were not used for the quality evaluation of fruit or vegetables; grading relied on a single image. A generic framework for classification, segmentation, sorting, and grading of multiple fruits is required [6]. In [40], various image processing techniques with CNN focused mainly on three tasks: fruit detection, quality assessment and control, and fruit classification. That study also supports robot harvesting. The results showed that CNNs and pretrained networks outperformed other methods on these tasks and achieved almost 100% accuracy. Another survey found that computer vision and ML were integrated to solve problems in the agriculture domain, briefly analyzing seeds, crops, and fruits and improving their quality [41]. ML, along with artificial intelligence, has been used for agriculture supply chain assessment. Various ML algorithms were used to develop the permissible agriculture supply chain, which increased yield [42]. The complete literature review is summarized in Table 2.

Table 2 Literature review table

3.1 Deep Learning Framework for the Detection of Fruit and Vegetable

Deep learning methods have been commonly used in recent research to successfully detect various kinds of fruit.

A: Artificial Neural Network (ANN)

Predicting vineyard yield is a necessary and challenging task for estimating the productivity rate in viticulture across vineyard zones. To address this problem, ANN was combined with vegetation index and vegetation fraction data obtained using computer vision techniques. The proposed methodology is based on remote sensors and Unmanned Aerial Vehicles (UAV), enabling early prediction instead of ground-based measurement [67, 68]. Many authors have studied aspects of fruit classification on public datasets of RGB images. In a similar study, the author presented the classification of 18 categories of fruit using a computer vision algorithm. The proposed scheme achieved 99.8% accuracy on fruits such as strawberries, blueberries, blackberries, pineapples, green grapes, red grapes, black grapes, and cantaloupes using a Feed Forward Neural Network (FNN) with a deep learning algorithm [43]. Another study classified three varieties of oranges using a hybrid Artificial Neural Network and Artificial Bee Colony (ANN-ABC) method with an accuracy of 97%, while Artificial Neural Network and Harmony Search (ANN-HS) provided 94% accuracy. In a comparative analysis, the proposed method had a significant advantage over the traditional K-Nearest Neighbor (KNN) approach, which reached 70.88% [7]. The ANN architecture is shown in Fig. 1.

Fig. 1 ANN. Diagram of a network with N inputs and one output: circles in the input layer are connected by arrows to circles in the hidden layer, which are in turn connected to a single circle in the output layer.
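To make the layered structure described for Fig. 1 concrete, the following is a minimal sketch of one forward pass through a network with N inputs, one hidden layer, and a single output. It is not the implementation used in any of the cited studies; the function names, the sigmoid activation, and all weight values are assumptions chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass: N inputs -> hidden layer -> single output node."""
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical example: three input features (for instance color, size, and
# texture scores) and two hidden neurons. All values are arbitrary.
features = [0.5, 0.2, 0.9]
hidden_weights = [[0.1, -0.4, 0.3], [0.7, 0.2, -0.5]]
hidden_biases = [0.0, 0.1]
output_weights = [0.6, -0.3]
output_bias = 0.05

score = forward(features, hidden_weights, hidden_biases, output_weights, output_bias)
print(f"output score: {score:.3f}")
```

In practice the weights would be learned from labelled fruit images rather than set by hand, and the output would be thresholded or extended to one node per class.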

B: DCNN

Recently, deep learning has made significant progress in object detection. In one study, a cascaded CNN architecture was used with an augmentation method to improve the detection of fruit in apple images collected in orchards. In addition, the ImageNet dataset was used to generate the dataset. The model was applied to other types of fruit, such as strawberries and oranges, on the test dataset and achieved remarkable results [8]. Because images are acquired from various sources, identifying objects accurately is difficult. For this purpose, a novel Fruit Detection and Recognition (FDR) algorithm was proposed, in which a well-known CNN architecture performs the classification of fruit. The model showed high accuracy on its own dataset of various images, with low computational complexity [69]. In another study, hyperspectral images were exploited to classify fruit and vegetables using a CNN pretrained on RGB data. The hyperspectral dataset was captured from real scenes. The analysis found that GoogLeNet with pseudo-RGB images calculated from hyperspectral images achieved an average accuracy of 85.23%, which was raised to 92.3% by using a kernel compression module [70].

A CNN model was designed to detect and recognize 60 categories of fruit. The model was trained on the Fruit-360 dataset for early detection of fruit. Experimental results showed that 96.3% accuracy was achieved when training the network. However, the model was limited to Fruit-360 and was not suitable for real-time applications [71]. Automatic identification of defects in fruit has also been analyzed using CNN architectures, for example for Tuta absoluta damage in tomato plants. For this purpose, pretrained networks such as Inception-v3, VGG16, VGG19, and ResNet were proposed. Inception-v3 performed best, with an accuracy of 87.2%. These pretrained networks readily distinguish the severity of Tuta absoluta damage at low, high, and no-Tuta levels.

Using video streaming, the results showed accuracies of 88% for mango, 83% for lime, and 99% for pitaya [72]. A fully convolutional network was developed for automatic detection and semantic segmentation of guava fruit and branches with 3D pose estimation in the orchard; on segmentation of guava fruit it achieved an accuracy of 0.893 with an IOU of 0.806. Although branch segmentation is a difficult task, the results were better than those of the traditional algorithm, with guava detection reaching 0.983 precision and 0.948 recall [64]. A CNN architecture was implemented for precise on-branch fruit recognition using the PH method. The proposed algorithm is designed for real-time applications. For the experiments, RGB images of six kinds of fruit (apricot, apple, nectarine, sour cherry, peach, and colored plums) were collected in orchards. The proposed model attained an accuracy of 99.76% with a cross-entropy loss of 0.019 and was reported to work more efficiently than the traditional approaches YOLO, ResNet, and VGG16 [26]. In another paper, a CNN model was presented to classify fruit and vegetables accurately. With the addition of the VGG architecture on a publicly available dataset, the model achieved 95.6% accuracy. This was accomplished using a data preprocessing step, a feature extraction method, and multiple classifiers evaluated with different performance metrics [73]. By fusing two feature learning algorithms, CNN and multi-scale multi-layered perceptrons, pixel-wise fruit segmentation was proposed for fruit detection, using watershed segmentation and the Circular Hough Transform (CHT) to separate individual fruits when counting them in images captured in orchards. The proposed model achieved its best results with watershed segmentation, with a squared correlation coefficient of 0.826 [51].

One paper presented a deep convolutional neural network framework, pretrained on a large dataset and then trained on a small custom dataset, for developing a high-performance fruit detection system. The authors used a Faster R-CNN network to combine two modalities, RGB and Near Infra-red (NIR) images, with early and late fusion, which enhanced the DCNN. The results showed that the proposed scheme gave better results than the conventional system [45]. A DCNN was also applied to the challenging classification of cherry fruit, which has an irregular shape. The performance of the proposed algorithm was enhanced by adding hybrid max and average pooling. With data augmentation, it reached 99.4% accuracy, higher than traditional ML methods such as KNN, ANN, EDT, and fuzzy logic [14]. Fig. 2 shows the architecture of a DCNN.

Fig. 2 DCNN. Flowchart: the input image passes through n stages, each consisting of kernels, convolution, ReLU, and max-pooling, followed by fully connected layers that produce the output, which is compared with the label in the loss function.
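The stage structure described for Fig. 2 (kernel, convolution, ReLU, max-pooling) can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not code from any cited work: the function names, the 5x5 toy image, and the 2x2 kernel are hypothetical, and a real DCNN would learn many kernels per stage using a deep learning library.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation of image with kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling; rows and columns that do not fill a window are dropped."""
    h = len(feature_map) // size
    w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b] for a in range(size) for b in range(size))
            for j in range(w)
        ]
        for i in range(h)
    ]


def stage(image, kernel):
    """One DCNN stage: convolution, then ReLU, then max pooling."""
    return max_pool(relu(conv2d(image, kernel)))


# Hypothetical 5x5 grayscale patch and a 2x2 diagonal-difference kernel.
image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 3, 1, 0, 1],
]
kernel = [[1, 0], [0, -1]]

pooled = stage(image, kernel)
print(pooled)
```

Stacking n such stages and flattening the final feature maps into fully connected layers gives the overall pipeline that the figure describes.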

C: Mask-R-CNN

Mask R-CNN was adopted on a public dataset to target three main problems: object recognition, semantic segmentation, and instance segmentation [43]. In another line of research, Mask-RCNN with a feature pyramid architecture was exploited for automated identification of strawberries for harvesting under various lighting conditions, occlusions, and complex backgrounds. The model's performance was evaluated on 100 images and achieved 95.7% precision and 98.4% recall with an intersection over union of 0.89 [59]. Mask-RCNN is specifically designed for object detection and instance segmentation, enabling pixel-wise detection of each fruit, as shown in Fig. 3. The developed framework was evaluated on RGB and HSV data captured under natural environmental conditions in orange orchards. The model reached an F1-score of 0.89 on the RGB and HSV images. Robot harvesting is one of the applications that benefit from the mask segmentation approach [60]. The WGISD dataset was used to detect wine grapes using the ResNet architecture for counting and tracking fruit. Bounding box techniques were applied for the object recognition task and successfully targeted grape clusters using a hit-and-miss structuring element strategy matched to the shape and size of the fruit. Instance segmentation maintains tracking with mask annotations using the CNN architecture. Mask R-CNN was deployed for all three approaches, while YOLO was employed only for the object recognition task, and the YOLO-v3 approach was used for multi-label classification. The results showed that the 3D model employed for grape segmentation reached an F1-score of 0.90 [43].

Fig. 3 Mask R-CNN. Flowchart: input image, feature extraction, feature maps (including the RPN), feature maps with projected region proposals, and classification for each ROI.

D: Faster-RCNN

Faster RCNN is one of the best-known frameworks for object recognition [74]. A multi-function architecture was proposed for the detection and segmentation of fruit for robot harvesting in apple orchards. The proposed Detection and Segmentation Network (DasNet) reached 83.6 Average Precision (AP) and an F1-score of 0.832, outperforming three traditional schemes: YOLO-v3, Faster-RCNN, and ResNet-101. In addition, the lightweight network achieved an F1-score of 0.827 on apple classification and 87.6 and 77.2 on segmentation of apples and branches in the orchard [75]. In fruit classification, Faster-RCNN was adopted for parallel detection of various fruits, including mango, apple, and almond; data augmentation improved performance and reduced the labeling cost [76]. The combination of Faster-RCNN with three residual networks (ResNet50, ResNet101, and ResNet Inception-v2) accurately detected tomato plants using the COCO dataset. Although the experiments required a long training time, the proposed technique improved accuracy, with an F1-score of 83.67, an AP of 87.83, and an IOU greater than or equal to 0.5. Faster-RCNN with ResNet101 therefore strengthens fruit counting and robot harvesting and is pertinent to yield prediction [13]. The architecture of Faster-RCNN is shown in Fig. 4.

Fig. 4 Faster-RCNN. Flowchart: the input image is converted to a feature map, which passes through the RPN and ROI pooling to give the output image; the RPN and ROI pooling stages include a fully connected (fc) layer, softmax, and regression.

E: YOLO Network

One study used a supervised YOLO-v2 architecture to detect green mangoes under various lighting conditions. A UAV was introduced as a novel method for visual detection in the orchard. The proposed algorithm showed 96.1% precision and 89.0% recall under illumination effects [77]. In another line of research, DenseNet-YOLO-v3 was introduced to detect anthracnose lesions on apples in orchards. DenseNet helps YOLO-v3 optimize feature extraction for its low-resolution layers. In addition, a Cycle-Consistent Generative Adversarial Network (CycleGAN) was deployed to enlarge the dataset. The proposed technique outperformed the traditional Faster-RCNN and VGG16 models, with less detection time in a real-time environment [78]. In another paper, a novel YOLO-V3-dense model was developed to detect fruit and monitor its growth stages in orchards, using real-time data for this purpose. DenseNet was exploited to replace a low-resolution feature layer in the YOLO architecture. The results showed that the YOLO-V3-dense model had a significant advantage over YOLO-v3 and Faster RCNN [79]. Figure 5 shows the YOLO network architecture.

Fig. 5 YOLO network. Flowchart, read clockwise, showing how the input image passes through successive conversion and detection stages to give the output image.

F: SSD

Faster-RCNN with Inception-v2 and a Single-Shot Multi-box Detector (SSD) with MobileNet were deployed to count fruit in three categories (avocado, lemon, and Hass). The experiments showed that Faster-RCNN performed efficiently with 93.1% accuracy, compared with an estimated 90% for SSD with MobileNet, when counting fruit [39]. A novel system based on EfficientNet and MixNet was developed for automatic detection of fruit to overcome prolonged training and testing time. The model was trained on 48,905 images. Experiments on the ImageNet dataset showed that MixNet speeds up the model and reduces the computation cost, while EfficientNet achieved better accuracy than the conventional system with fewer parameters [80].

4 Datasets

Datasets are crucial to solving research problems such as object recognition, classification, and segmentation. The availability of a dataset affects whether a model with a compatible algorithm can achieve the desired outcome. Enlarging the training dataset and developing new datasets are needed to achieve the best performance on critical, challenging problems. The availability of many images on the internet has helped in building various comprehensive datasets, and with millions of images available, the scalability of datasets has been addressed. Object recognition has improved remarkably with the significant development of datasets. The datasets reported in this survey are listed in Table 3.

Table 3 Datasets

5 Performance Assessment Metrics

Various assessment metrics are used to evaluate the performance of deep learning models, and they vary with the problem. Deep learning studies use evaluation parameters such as True Positive (TP), True Negative (TN), Precision, Recall, F1-Score, Average Precision (AP), Intersection over Union (IOU), Root Mean Square Error (RMSE), Mean Residual Error (MRE), and Relative Error (RE). These metrics are designed to validate the predictions of various models. Some researchers used individual metrics to measure performance, and some used a combination of metrics. Table 4 lists these metrics with their symbols and formulas.

Table 4 Performance evaluation metrics
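For the error-based metrics named above, the following is a minimal sketch using the standard textbook definitions of RMSE and relative error, which are assumed here rather than taken from Table 4. The function names and the fruit-count values are hypothetical and serve only as an illustration for counting or yield-estimation tasks.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def relative_error(predicted, actual):
    """Absolute error as a fraction of the true (non-zero) value."""
    if actual == 0:
        raise ValueError("relative error is undefined for a true value of zero")
    return abs(predicted - actual) / abs(actual)


# Hypothetical per-image fruit counts: model predictions versus manual counts.
predicted_counts = [10, 12, 8]
manual_counts = [12, 12, 6]

print(f"RMSE: {rmse(predicted_counts, manual_counts):.3f}")
print(f"RE:   {relative_error(90, 100):.2f}")
```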

A: P

P stands for precision, the fraction of predicted positive observations that are true positives (TP).

B: R

R stands for recall, the fraction of actual annotations that are detected as true positives (TP).
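As a small worked example of these two definitions, the sketch below computes precision and recall from raw detection counts. The function names and the counts are hypothetical; returning 0.0 when the denominator is zero is a convention assumed here.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of annotated objects that were detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical detector output on an orchard image set:
# 90 correct detections, 10 false alarms, 30 missed fruits.
tp, fp, fn = 90, 10, 30
print(f"P = {precision(tp, fp):.2f}")
print(f"R = {recall(tp, fn):.2f}")
```

With these counts the detector is right about most of what it reports (P = 0.90) but misses a quarter of the annotated fruit (R = 0.75).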

C: AP

AP stands for average precision, the metric most widely used for object detection tasks. It is typically used to quantify the detection accuracy of deep learning models.
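One common way to compute AP is sketched below: detections are ranked by confidence, and the precision observed at each true positive is averaged over the number of ground-truth objects. This is the non-interpolated variant and is an assumption for illustration; detection benchmarks often use interpolated variants, and the cited studies may differ. The function name and example data are hypothetical.

```python
def average_precision(detections, num_ground_truth):
    """Non-interpolated AP.

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of annotated objects of this class.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            # Precision at the rank where recall increases.
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truth


# Hypothetical detections for one fruit class with four annotated fruits.
detections = [(0.9, True), (0.6, True), (0.8, False), (0.7, True)]
ap = average_precision(detections, num_ground_truth=4)
print(f"AP = {ap:.3f}")
```

Whether a detection counts as a true positive is usually decided by an IOU threshold against the ground-truth boxes.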

D: IOU

IOU measures how accurately an object is localized and is mainly used for object detection, as shown in Fig. 6. A few studies used a threshold value greater than or equal to 0.5, which indicated better model predictions.

Fig. 6 IOU. Intersection over union (IoU) equals the area of overlap divided by the area of union, illustrated with two overlapping rectangles: one with an orange border marked ground truth and one with a blue border marked prediction.
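The ratio of overlap area to union area can be computed directly for axis-aligned bounding boxes. The following is a minimal sketch; the `(x_min, y_min, x_max, y_max)` box format, the function name, and the example coordinates are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Hypothetical ground-truth and predicted boxes for one fruit.
ground_truth = (0, 0, 10, 10)
prediction = (5, 0, 15, 10)

value = iou(ground_truth, prediction)
print(f"IoU = {value:.3f}, accepted at 0.5 threshold: {value >= 0.5}")
```

Here the overlap is 50 and the union is 150, so the IoU is one third and the prediction would be rejected at a 0.5 threshold.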

E: F1-Score

The F1-score is the harmonic mean of precision and recall and is widely used to test the performance of ML and deep learning models. Researchers often choose the F1-score to optimize model performance and reduce the FP and FN rates. The F1-score is best when it equals 1 and worst when it equals 0. The trade-off between precision and recall can be seen in Fig. 7. Few metrics account for both FP and FN rates simultaneously, which is why this metric is commonly used for specific problems. Table 5 shows the comparative analysis of fruit and vegetable detection using machine learning and digital image processing techniques.

Fig. 7 Tradeoff between the precision and recall curve. Line graph: recall is highest at low precision and decreases as precision increases.
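Because the F1-score is the harmonic mean of precision and recall, it can be written directly in terms of TP, FP, and FN as 2TP / (2TP + FP + FN). The sketch below uses that identity; the function name and the counts are hypothetical, and returning 0.0 for an empty denominator is an assumed convention.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, expressed with raw counts."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical counts: 90 correct detections, 10 false alarms, 30 misses.
score = f1_score(tp=90, fp=10, fn=30)
print(f"F1 = {score:.3f}")
```

The score falls between the corresponding precision (0.90) and recall (0.75) and sits closer to the lower of the two, which is why it penalizes a detector that trades many misses for few false alarms, or the reverse.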

Table 5 Comparative analysis of fruit/vegetable using machine learning and digital image processing techniques

6 Conclusion

We have compared state-of-the-art deep learning techniques in detail for detecting and classifying fruit and vegetables with computer vision. Computer vision supports steady, fast, and trustworthy fruit and vegetable yield estimation. Our work reports that our proposed deep learning framework copes with various challenges of the agriculture domain better than traditional ML approaches. Our study focuses mainly on computer vision techniques with deep learning algorithms for the accurate detection of fruit and vegetables, covering the developed datasets, training models, and performance evaluation parameters. Future research should focus on state-of-the-art deep learning algorithms, which would be the best approach to automating farming systems and addressing agricultural problems with the help of computer vision.

6.1 Key Challenges

With the development of deep learning algorithms, researchers have proposed many techniques, but challenges remain, and computer vision, as a promising technology, will continue to play an essential role in the quality inspection of fruit and vegetables. The demand for large-scale datasets has been increasing to address state-of-the-art challenges in farming. It would be interesting to adopt a hybrid approach of computer vision and artificial intelligence with diverse, scalable datasets [16]. Unstructured scenes, occlusion, varying lighting conditions, and inconsistent clustering have been significant challenges for detecting fruit in orchards [8]. Accurate detection of fruit and vegetables is a state-of-the-art challenge. Because of similar color, size, and shape characteristics, it is difficult to discriminate between, for example, an apple and a tomato. An efficient algorithm is needed to distinguish items with similar properties based on feature and texture information [71, 102].

Instance segmentation is still a challenging task because it must capture the overall geometry of the fruit, such as its color, shape, and size. In the future, it would be good practice to apply expert techniques to obtain the desired instance segmentation results [75]. Pose estimation is necessary to target the exact region of the object without colliding with other objects or the background, which improves harvesting profit. In [64], 3D pose estimation was improved, but it remains a challenging task. The research above analyzed numerous aspects of computer vision for fruit and vegetable detection and shows that existing systems still have open problems. One major flaw of classification and recognition systems is the limited availability of fruit and vegetable datasets. Another is a lack of knowledge about how to use techniques such as AI, ANN, ML, and fuzzy logic [73]. Because of poor illumination and background complexity in fruit detection, it is difficult to design an automated robot for fruit and vegetable yield estimation.

6.2 Future Directions

Several newly developed sensors have still not been used for the detection of fruit and vegetables. It is crucial to design a sufficiently large dataset to benefit from RGB-D sensors in the future [21]. There is also a need to explore an expert system for detecting lesions on multiple fruits and identifying the kind of lesion, which would facilitate diagnosis and help prevent plant diseases [78]. One major drawback of computer vision is its high computational complexity and long recognition time; implementing a system that takes less time for recognition would be the best practice. In addition, various descriptors are exploited to obtain the best performance for classifying and detecting fruit or vegetables based on color, shape, size, and texture, as discussed in [105]. Future research is attempting to exploit GANs to generate synthetic images similar to real data, which significantly affects the classification and recognition of objects in computer vision [106,107,108].