Introduction

Computer vision technology emulates human vision in inspecting the quality of fruit and vegetables: it electronically perceives an image, interprets it, recognizes its characteristics, and finally provides information to the quality grading and sorting machine. With advances in computing, systems that combine machine vision (MV) and image processing (IP) can now be applied in different fields of food engineering to accurately identify product features in on-line, in-line, at-line, and off-line approaches. Today, researchers use a combination of features, including color, morphology, and texture, to extract and analyze the desired data and achieve higher accuracy and efficiency than when these features are applied individually [1, 2].

Based on IP procedures, MV systems extract quantitative information from digital images captured by cameras for rapid, non-contact quality evaluation. Because MV systems are fast and non-destructive, destructive testing methods can be avoided in the quality assessment of products. Quality assessment is therefore performed under highly controlled lighting and photography conditions, using algorithms and mathematical relations [3, 4]. Traditional methods could access only the external information of a product, but with the advancement of science and technology, internal features such as texture can be extracted in addition to external information such as color, which improves defect detection. It should be noted that internal information cannot be accessed with RGB image acquisition systems; instead, some researchers have used alternative methods such as multispectral technology, NIR imaging [5], thermal imaging [6], X-ray [7], magnetic resonance imaging [8, 9], fluorescence imaging [10], and hyperspectral imaging [11, 12] to extract internal information. However, these methods face challenges such as limited speed and the need for expensive systems.

RGB images are used for physical information such as color, shape, and size, but unlike hyperspectral images, they cannot provide compositional properties. Hyperspectral images can reflect chemical information in addition to the characteristics of position, shape, and size. Therefore, by using MV systems, a non-destructive method can be developed for better defect detection in food products [13,14,15]. Various parameters are involved in classifying and grading agricultural products, including color, size, maturity, defects, and shape, which are chemical and physical parameters of the product [16]. Texture analysis is helpful for defect detection because it discriminates between defective and healthy regions of products; combining color and texture features is an appropriate way to increase the accuracy of defect detection applications [17].

Defect in fruit and vegetables is defined as “an imperfection or abnormality causes fruit and vegetables to be less healthy and valuable, while impairing their quality and utility.” Defects are one of the most common causes of fruit and vegetable quality loss in the agricultural industry. Degradation of the nutrients and appearance of the product due to defects affects the market and the price of the product. To overcome these problems, defect detection is considered one of the best approaches in the current state of the art. Defect detection is the process of identifying defects or damage in fruit and vegetables based on shape, color, texture, etc. [18]. An important issue in this field is the impact of defects on market value, which is directly related to how consumers choose the product. Today, with the advancement of science and technology in agriculture, defects are detected with greater accuracy than with traditional methods, which supports economic growth and improves product quality. It should also be noted that color, size, regular shape, and other quality parameters are the most important indicators for customers when choosing a product. Due to preharvest and postharvest processing and storage conditions, fruit and vegetables are prone to various defects that lead to losses along the food chain. Defects can be divided into internal and external ones; because of the nature of the defect, detection methods may differ from one case to another. The correct choice of detection systems and technologies therefore leads to a more accurate defect detection procedure that brings a high-quality product to market.

In this review, we focus on the application of computer vision systems (CVSs) in non-destructive defect detection and on the open issues in this field. Among horticultural products, we focus on fruit and vegetables, since they are an important part of agricultural production, defect detection is a major challenge for this category of products, and the separation of defective products can play a significant role in reducing losses and increasing quality and consumer satisfaction. This paper therefore surveys IP techniques and computer vision applications for fruit and vegetables, and reviews the various types of CVSs, segmentation methods, image features, and quality analysis of fruit and vegetables based on color, shape, size, and texture, as well as the types of their defects. Additionally, the principal components, basic theories, and corresponding analysis and processing methods are reviewed. We identify the hurdles that scientists are trying to overcome and discuss future trends for the research and development of CVSs.

Accordingly, the review is organized as follows: “Types of Defects” describes types of defects followed by “Computer Vision System”, which broadly and deeply describes the computer vision systems (CVSs) applied for fruit and vegetables. Further, in “Basic Steps in Computer Image Processing”, the basic procedure in IP is presented. In “Stem-Calyx Problems” and “Defects in Fruits and Vegetables”, external and internal defects detected by CVS in fruit and vegetables are respectively reviewed. In “Future Trends” and “Conclusion”, the future scope and conclusion are presented.

Types of Defects

Defect detection has three overall results: ensuring consistently high-quality products, reducing food losses, and enhancing profitability for the industry [19]. The defects commonly found in fruit and vegetables are internal defects; morphological disorders such as deformation and irregular shapes; pathological disorders; mechanical damages during harvest and processing; and physiological disorders such as metabolic changes (Fig. 1) [20].

Fig. 1 The defects that are commonly found in fresh fruit and vegetables [20]

However, there are no accurate and comprehensive data in the literature on fruit and vegetable losses arising from defects, and the studies that have addressed this issue have provided different and inconsistent information. Nevertheless, Gustavsson et al. [22] have provided useful information in this regard, which is presented in Fig. 2.

Fig. 2 The initial production lost or wasted at different stages of the food supply chain (FSC) for fruit and vegetables worldwide [22]

Figure 2 shows a series of pie charts corresponding to the initial production lost or wasted at different stages of the food supply chain (FSC) for fruit and vegetables worldwide.

In the FSC, five segments have been distinguished for fruit and vegetable loss/waste, as follows [22]:

  • Production: During harvest operations, losses can occur due to spoilage or mechanical damage. In this segment, morphological disorders, mechanical damages, and pathological disorders are considered.

  • Storage: During handling, storage, and transportation between farm and distribution, spillage and degradation can cause losses. Internal defects and mechanical damages can be considered in this segment.

  • Processing: Industrial or domestic processing, such as grading, sorting, and washing, can cause losses. Crops may also be broken, injured, or bruised in industrial processes.

  • Distribution: Losses occur in the stores that distribute products. In industrialized and developed countries, special facilities keep losses in the distribution of products to consumers low, but in non-industrialized countries these losses may be significant.

  • Consumption: Losses occur at the household level, where physiological disorders may cause fruit and vegetable losses.

In the fruit and vegetable sector, disease is a major factor in quality losses, leading to economic losses. Although many diseases and pests can be prevented by using pesticides and fertilizers, in many cases such diseases are not completely prevented; therefore, more inspections should be done in postharvest processes to separate the diseased crop as early as possible. An important issue is that the disease may spread over part of the imaged fruit or vegetable within a short time span [23]. Therefore, the application of CVSs for detecting different types of diseases is crucial in the fruit and vegetable sector. Nowadays, much research is conducted to propose robust CVSs that overcome such challenges. Table 1 presents recent advances of CVS technology in detecting different types of defects.

Table 1 Recent advances of the CVS technology in detecting different types of defects

Morphological Disorders

Deformation of the product into an irregular shape is one of the morphological disorders. By creating shadows or distortions, it can prevent the diagnosis of defects and diseases, and so may adversely affect the results obtained from a CVS: because the damaged areas may not be identified, the product would be considered healthy [34, 35]. Image processing may be helpful to detect the irregular shapes of products. Jahanbakhshi and Kheiralipour [36] used image processing to discriminate between regular and irregular carrot shapes; they extracted features including length, width, perimeter, and compactness, classified them with linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA), and reported classification accuracies of 92.59% and 96.30% for LDA and QDA, respectively.
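As an illustration of the kind of shape features mentioned above, the following minimal Python sketch computes length, width, perimeter, and compactness from a binary object mask. It is not the procedure of Jahanbakhshi and Kheiralipour [36]: the function name `shape_features`, the axis-aligned bounding-box definition of length and width, the pixel-edge perimeter, and the compactness formula 4πA/P² are all assumptions made for this example.

```python
import numpy as np

def shape_features(mask):
    """Length, width, area, perimeter, and compactness of one object in a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("mask contains no object pixels")
    # Axis-aligned bounding box of the object (assumed definition of length/width).
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    extent_rows = int(rows[-1] - rows[0] + 1)
    extent_cols = int(cols[-1] - cols[0] + 1)
    area = int(mask.sum())
    # Perimeter in pixel edges: count object/background transitions along both axes.
    padded = np.pad(mask, 1, constant_values=False).astype(np.int8)
    perimeter = int(np.abs(np.diff(padded, axis=0)).sum()
                    + np.abs(np.diff(padded, axis=1)).sum())
    return {
        "length": max(extent_rows, extent_cols),
        "width": min(extent_rows, extent_cols),
        "area": area,
        "perimeter": perimeter,
        # Assumed compactness definition: 4*pi*A / P^2 (larger means rounder).
        "compactness": 4.0 * np.pi * area / perimeter ** 2,
    }

# Hypothetical elongated object: a 4 x 10 pixel rectangle.
mask = np.zeros((8, 14), dtype=bool)
mask[2:6, 2:12] = True
features = shape_features(mask)
```

Feature vectors of this kind could then be passed to a discriminant classifier such as LDA or QDA; that step is omitted here.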

Internal Defects

For internal defects, special cameras have been introduced to detect the defects of agricultural products. These cameras reveal latent damages and disorders, including physiological and pathological disorders and mechanical damages. The ability of CVSs to detect such latent defects is of great importance along the food chain, since diagnosing internal diseases prevents the spread of disease, food losses, and consumer dissatisfaction [31, 37,38,39]. X-ray radiography was suggested by van Dael et al. [31] to detect internal defects of citrus, which resulted in 93.6% accuracy for lemons and 95.7% accuracy for oranges. Hyperspectral imaging was also used by Cen et al. [32] to detect internal defects in cucumber, with a reported accuracy of 91.1%.

Pathological Disorders

Another category is pathological damage, which is associated with attacks by fungi, viruses, bacteria, or other microbial pathogens [20]. Many disorders of pathological nature exist, and their manifestations in agricultural products may be visually similar regardless of the type of infection or product [40]. Thus, the ability to detect the infecting agent and/or the associated chemical reactions helps to identify the cause and accurately determine the specific disorder [41]. Sun et al. [42] suggested hyperspectral imaging to detect fungal disease in peaches based on chlorophyll content, achieving 98.75% accuracy.

Mechanical Damages

Another factor that affects the quality and shelf life of the product is mechanical damage arising from impacts, wind forces, and environmental conditions. Mechanical damage leads to metabolic changes, tissue failure, crushing, scratches, pigment deterioration, etc., on the surface or in the inner tissues of the product [43, 44]. Mechanical damage to citrus fruit includes bruising, cuts, and compression. Various factors, especially at the harvest stage, affect the sensitivity of the fruit; for example, high relative humidity can increase its sensitivity to mechanical damage. Environmental factors include hail and rain damage. Insect damage is also one of the most serious problems of the citrus industry. In oranges, for example, damage to the fruit surface includes insect injury, wind scarring, scarring, scale infestation, canker spot, dehiscent fruit, copper burn, and phytotoxicity [45]. A suitable technique for detecting mechanical damage is thermography, which uses the infrared radiation emitted by a heated object to provide good quantitative information about damage depth and size [46]. Hyperspectral imaging was also suggested by Cui et al. [47], who detected mechanical damage in blueberry with 87% accuracy.

Physiological Disorders

Sunburn, superficial scald, watercore, internal dryness, granulation, mealiness, browning, bitter pit, etc., occur due to physiological stresses related to temperature, respiration, transpiration, and nutrition of the product at different stages of growth, harvest, and postharvest processing [48, 49]. These disorders decrease the commercial value of the product [31]. Near-infrared technology can detect physiological disorders in melons, as reported by Ito et al. [50], and thermography can detect the occurrence of watercore in apples, as reported by Baranowski and Mazurek [51].

Computer Vision System

The traditional methods of defect detection are usually based on expert diagnosis. These methods are inconsistent, time-consuming, and costly, and human error due to fatigue and distraction is inevitable. CVSs can help to solve such challenges. In the analysis of fruit and vegetables, the lighting system is arranged as front lighting or backlighting. Front lighting is used to inspect surface quality attributes such as color, texture, and skin defects, whereas backlighting is used to inspect boundary quality attributes such as size and shape. Traditional, multispectral, and hyperspectral computer vision systems are used extensively for the quality analysis of food and agricultural products [52].

In general, as shown in Fig. 3, a CVS consists of these main components: camera, illumination, an image capture board, and a computer system. For hyperspectral and multispectral photography, a spectrometer and a sample displacement mechanism are normally used.

Fig. 3 A typical vision system for traditional, hyperspectral, and multispectral CVSs

CVSs can be divided into three categories: traditional CVS (T-CVS), hyperspectral CVS (H-CVS), and multispectral CVS (M-CVS). Table 2 presents different MV systems and their applications in the literature for defect detection in fruit and vegetables.

Table 2 Some studies conducted for defect detection in fruit and vegetables

Traditional Computer Vision System

As the human eye is sensitive to the basic colors red, green, and blue (RGB), T-CVSs act like the human eye, and their imaging is based on these three colors. Low cost and ease of use are the most important advantages of these systems. T-CVSs can be used to identify external diseases by extracting features such as color, size, and shape. T-CVSs have some weaknesses, although a significant part of the resulting identification problems can be resolved with the help of artificial intelligence algorithms. One weakness is the inability to recognize damaged areas that appear in the same color as healthy areas of the product. Another is that this type of CVS cannot detect internal damage because it cannot penetrate the product. A T-CVS is limited to capturing only three monochromatic images centered at 700.0 nm (red, R), 546.1 nm (green, G), and 435.8 nm (blue, B), which are very close to the colors received by the human eye. T-CVSs are therefore based purely on visible imaging and can provide only limited information, which limits the application of CV for defect detection, especially for early-stage internal defects [62]. Further disadvantages of T-CVSs are low classification accuracy and weaker capability compared with newer systems and methods. Figure 4 shows an RGB color image of a peach and its component images.

Fig. 4 An RGB color image of a peach and its component images [63]
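The decomposition of an RGB image into its three component images can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the image is a NumPy array of shape (rows, cols, 3), the helper names `split_rgb` and `to_gray` are hypothetical, the random array merely stands in for a real photograph, and the gray conversion uses the ITU-R BT.601 luma weights, which the reviewed studies do not necessarily use.

```python
import numpy as np

def split_rgb(image):
    """Return the red, green, and blue component images of an RGB array."""
    image = np.asarray(image)
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an array of shape (rows, cols, 3)")
    return image[..., 0], image[..., 1], image[..., 2]

def to_gray(image):
    """Weighted sum of the three channels (ITU-R BT.601 luma weights)."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.asarray(image, dtype=float) @ weights

# Random stand-in for a captured RGB image (hypothetical data).
rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

r, g, b = split_rgb(rgb)   # three gray-level component images
gray = to_gray(rgb)        # single gray image derived from the three
```

Each component is a two-dimensional gray-level matrix; stacking the three again along the last axis recovers the original color image.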

Hyperspectral Computer Vision System

Hyperspectral imaging is based on molecular vibrations, which makes it an attractive method for acquiring spectral and spatial data simultaneously for fruit and vegetables. H-CVS is mainly used for mechanistic research and for selecting efficient wavelengths. Unlike the T-CVS, which captures only three monochromatic images to imitate human vision, an H-CVS (hyperspectral imaging system) combines imaging and spectroscopy in one system to obtain a set of monochromatic images at hundreds or thousands of almost contiguous wavelengths. In hyperspectral imaging, the recorded spectra have fine wavelength resolution and cover a wide range of wavelengths.

The data structure of a hyperspectral image is commonly called a hypercube or data cube. Three methods can be used to acquire a hyperspectral image cube: point scanning, line scanning, and area scanning. As shown in Fig. 5, in point scanning the object or camera is moved in the x and y directions to capture the entire surface of the object, and the different wavelengths form the third dimension of the cube; it should be noted that this method is time-consuming. In line scanning, the object or camera is moved in only one direction to capture the entire surface. In area scanning, the entire surface of the object is imaged at once, and the third dimension of the cube is completed by repeating the acquisition at different wavelengths; this method works faster than the other two.

Fig. 5 Methods of acquiring the hyperspectral image cube [63]; (a) Point scanning approach, (b) Line scanning approach, (c) Area scanning approach
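The three acquisition modes can be contrasted with a small Python sketch that fills the same hypercube in three different orders. Everything here is an assumption made for illustration: the cube is stored as a NumPy array indexed as (row, column, band), the sizes are arbitrary, the random `scene` array stands in for the real object, and each function simply counts how many acquisition steps it needs.

```python
import numpy as np

ROWS, COLS, BANDS = 4, 5, 6                  # hypothetical cube size
rng = np.random.default_rng(0)
scene = rng.random((ROWS, COLS, BANDS))      # stand-in for the imaged object

def point_scan(scene):
    """One full spectrum per (row, col) position: rows * cols steps."""
    cube, steps = np.empty_like(scene), 0
    for i in range(scene.shape[0]):
        for j in range(scene.shape[1]):
            cube[i, j, :] = scene[i, j, :]
            steps += 1
    return cube, steps

def line_scan(scene):
    """One spatial line with all bands per step: rows steps."""
    cube, steps = np.empty_like(scene), 0
    for i in range(scene.shape[0]):
        cube[i, :, :] = scene[i, :, :]
        steps += 1
    return cube, steps

def area_scan(scene):
    """One complete 2-D image per wavelength: bands steps."""
    cube, steps = np.empty_like(scene), 0
    for k in range(scene.shape[2]):
        cube[:, :, k] = scene[:, :, k]
        steps += 1
    return cube, steps

point_cube, point_steps = point_scan(scene)
line_cube, line_steps = line_scan(scene)
area_cube, area_steps = area_scan(scene)
```

All three modes end with the same cube; they differ only in how many mechanical or spectral steps are needed, which is why point scanning is described as time-consuming.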

The greatest advantage of an H-CVS is the wide range of information provided in the hyperspectral image. Some external quality characteristics, such as early bruising, rottenness, and some other diseases, are unclear in traditional digital images, which makes their detection difficult or impossible. Some defects are also hard to detect because their texture and color are nearly identical to the skin of the product. Unlike images in the T-CVS, where spectral information is limited to three color channels, hyperspectral images contain hundreds or thousands of monochrome images in the spectral domain. When the data cube is large, significant storage capacity is needed, which can be considered a challenge. As another advantage, the spatial relationships among the different spectra allow accurate classification and segmentation of the images [64]. Disadvantages of the H-CVS include complexity and cost. Figure 6 shows the conceptual view of a hyperspectral image with spectral and spatial domains.

Fig. 6 The conceptual view of a hyperspectral image with spectral and spatial domains [65]
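The storage demand of a data cube follows directly from its dimensions. The short Python sketch below makes the arithmetic explicit; the image size, the number of bands, and the sample depths are hypothetical values chosen only for illustration, and `hypercube_bytes` is not taken from any library.

```python
def hypercube_bytes(rows, cols, bands, bytes_per_sample=2):
    """Uncompressed size of a rows x cols x bands cube in bytes."""
    return rows * cols * bands * bytes_per_sample

# Hypothetical sensors: 1024 x 1024 pixels in both cases.
rgb_bytes = hypercube_bytes(1024, 1024, 3, bytes_per_sample=1)   # 8-bit RGB image
cube_bytes = hypercube_bytes(1024, 1024, 256)                    # 16-bit, 256 bands

print(f"RGB image : {rgb_bytes / 1024**2:.0f} MiB")
print(f"Hypercube : {cube_bytes / 1024**2:.0f} MiB")
```

Under these assumptions a single hypercube occupies 512 MiB against 3 MiB for the RGB image, before any compression.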

Multispectral Computer Vision System

M-CVS differs from the hyperspectral imaging system in the number of monochromatic images in the spectral range. One of the advantages of the M-CVS is that the wavelengths of the monochromatic images captured can be chosen freely by using narrowband filters. Spectral imaging can allow the extraction of additional information that the human eye fails to capture with its visible receptors for red, green, and blue. Multispectral imaging measures light in a small number (typically 3 to 15) of spectral bands. M-CVSs are used for fast in-line applications. Figure 7 shows the apple images with different filters and manual segmentation.

Fig. 7 Examples of apple images and their manual segmentations. The first four columns present images from different filters, while the last one shows corresponding manual segmentations. Rows display apples damaged by different defect types [54]

In external quality inspection, multispectral systems with special filters make it possible to identify, at specific wavelengths, damage that is not visible to RGB cameras. Disadvantages of this system are that it must be programmed manually, calibration must be repeated several times, and problems such as camera lens distortion must be resolved.

A comparison between hyperspectral and multispectral imaging is shown in Fig. 8. The number of images captured by a hyperspectral system is much larger than that captured by a multispectral one, so their responses also differ: hyperspectral imaging gives a continuous spectrum, whereas a multispectral response is composed of several discrete spectral bands.

Fig. 8 The difference between hyperspectral and multispectral imaging [66]
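One way to picture this difference is to derive a few broad multispectral bands from a finely sampled hyperspectral cube by averaging the slices that fall inside each band. The Python sketch below does this under stated assumptions: the cube layout (row, column, band), the 5 nm sampling, the four band limits, and the function name `to_multispectral` are all hypothetical, and real multispectral cameras obtain their bands optically with filters rather than by averaging.

```python
import numpy as np

def to_multispectral(cube, wavelengths, bands):
    """Average the hyperspectral slices inside each (low, high) band in nm."""
    cube = np.asarray(cube, dtype=float)
    wavelengths = np.asarray(wavelengths, dtype=float)
    images = []
    for low, high in bands:
        inside = (wavelengths >= low) & (wavelengths < high)
        if not inside.any():
            raise ValueError(f"no hyperspectral slice inside {low}-{high} nm")
        images.append(cube[:, :, inside].mean(axis=2))
    return np.stack(images, axis=2)

# Hypothetical hyperspectral cube: 121 slices from 400 to 1000 nm in 5 nm steps.
wavelengths = np.linspace(400.0, 1000.0, 121)
cube = np.ones((4, 5, 121)) * wavelengths      # each slice holds its own wavelength

# Four hypothetical filter bands (nm).
bands = [(440, 460), (540, 560), (670, 690), (790, 810)]
multispectral = to_multispectral(cube, wavelengths, bands)
```

The result has only four images per pixel instead of 121, which illustrates why multispectral systems suit fast in-line use while hyperspectral systems serve wavelength selection.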

Basic Steps in Computer Image Processing

A typical CVS task can be grouped into three major steps, as shown in Fig. 9: image acquisition, which deals with components such as the camera, illumination, and digitizer; IP, which is divided into preprocessing, segmentation, and feature extraction; and image understanding, which entails image recognition and interpretation.

Fig. 9 Basic steps in IP

A CVS task succeeds only when all three sub-tasks of IP work well; if a single sub-task is disrupted, the desired result will not be obtained.

Image Acquisition

Image acquisition for processing is the main activity in MVSs. A good light source or lighting system and a proper camera are the most important tools to achieve a good digital image for processing.

Camera

In the first step, the right camera should be chosen to take images under controlled illumination and photography conditions. The most common imaging devices applied in MVSs are charge-coupled device (CCD) cameras, complementary metal–oxide–semiconductor (CMOS) cameras, magnetic resonance imaging (MRI), electrical tomography, and computed tomography (CT). CCD cameras have been used for nearly all computer imaging applications since the CCD was invented at the end of the 1960s. Their advantages are that they can be connected directly to a computer, can be highly sensitive to low light levels, and make images instantly available without film or dark-room processing. CMOS cameras were introduced in the mid-1990s. Compared with their CCD counterparts, CMOS sensors are faster, which allows higher video frame rates, and CMOS imagers provide higher dynamic range and require less current and voltage to operate. Therefore, to choose the right camera for a CVS task, the specific conditions and requirements of the task must be weighed first. Other important factors in this selection are resolution, analog or digital output, signal-to-noise ratio, minimum brightness required, signal output, and additional camera adjustments [52]. In hyperspectral and multispectral imaging, the camera analyzes a wide spectrum of light instead of just assigning primary colors (red, green, blue) to each pixel. These cameras work by imaging different wavelengths of light; the number of wavelengths used in hyperspectral imaging is far larger than in multispectral imaging, so the main difference between the two is the number of wavebands being imaged and how narrow the bands are.

Illumination

The illumination source is a crucial factor in acquiring informative, high-quality images. Any vision application requires that the examined features are clearly visible in the image. Even with advances in image analysis and processing algorithms, illumination still affects camera performance and thus the success and reliability of computer vision applications. Proper illumination improves detection and discrimination between sample features and reduces processing time. Therefore, improving the lighting conditions is preferable to writing complex algorithms, because it reduces both cost and execution time.

In general, computer vision applications in the agricultural industry face more challenges than those in other industries such as the automotive industry. Problems such as overlapping and shading are examples of these challenges that can be overcome with proper lighting. Designing a proper lighting system and tailoring the lighting conditions to the environment and the product under study is one of the important principles of computer vision applications. Light can shine on the product from the front or the back: front lighting is used for surface features and backlighting for subsurface features. The light reflected from the object passes through three glass filters (red, green, and blue) that simulate the standard observer functions for a particular illuminant. An optical detector behind each filter measures the amount of light passing through it, and these signals are expressed as X, Y, and Z values. The detector creates one image per filter, each of which is a two-dimensional matrix. Each image is individually a gray image, and the combination of the three presents a color image. According to the standard of the Commission Internationale de l'Éclairage (CIE), the X, Y, and Z values that make up the colors are obtained from Eq. (1). The tristimulus values for a color with spectral radiance \(L_{e,\Omega,\lambda}\) are given in terms of a standard observer; λ is the wavelength of the monochromatic light, with limits [380, 780] nm; and \(\overline{x},\;\overline{y},\;\overline{z}\) are the three color-matching curves of this standard, corresponding to the filters used and the light received [52].

$$\begin{array}{c}X=\int_\lambda L_{e,\Omega,\lambda}(\lambda)\,\overline{x}(\lambda)\,d\lambda\\Y=\int_\lambda L_{e,\Omega,\lambda}(\lambda)\,\overline{y}(\lambda)\,d\lambda\\Z=\int_\lambda L_{e,\Omega,\lambda}(\lambda)\,\overline{z}(\lambda)\,d\lambda\end{array}$$
(1)

The normalized color matrices x, y, and z (the chromaticity coordinates) are then obtained from Eq. (2):

$$\begin{array}{c}x=\frac X{X+Y+Z}\\y=\frac Y{X+Y+Z}\\z=\frac Z{X+Y+Z}\end{array}$$
(2)
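Equations (1) and (2) can be evaluated numerically as in the following Python sketch. It is a minimal illustration under loudly stated assumptions: the three curves used here are crude Gaussian stand-ins and are NOT the tabulated CIE color-matching functions, the flat radiance spectrum is hypothetical, and the integral is approximated with the trapezoidal rule on a 1 nm grid over the [380, 780] nm range given in the text.

```python
import numpy as np

def gaussian(lam, center, width):
    return np.exp(-0.5 * ((lam - center) / width) ** 2)

def integrate(values, lam):
    """Trapezoidal approximation of the integral over wavelength."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(lam)))

def tristimulus(radiance, lam, x_bar, y_bar, z_bar):
    """Eq. (1): integrate radiance times each matching curve."""
    return tuple(integrate(radiance * curve, lam) for curve in (x_bar, y_bar, z_bar))

def chromaticity(X, Y, Z):
    """Eq. (2): normalise each tristimulus value by their sum."""
    total = X + Y + Z
    return X / total, Y / total, Z / total

lam = np.linspace(380.0, 780.0, 401)            # wavelength grid in nm

# Gaussian STAND-INS for the matching curves (assumption, not CIE data).
x_bar = gaussian(lam, 600.0, 40.0) + 0.35 * gaussian(lam, 445.0, 20.0)
y_bar = gaussian(lam, 555.0, 45.0)
z_bar = 1.8 * gaussian(lam, 450.0, 25.0)

radiance = np.ones_like(lam)                    # hypothetical flat spectrum
X, Y, Z = tristimulus(radiance, lam, x_bar, y_bar, z_bar)
x, y, z = chromaticity(X, Y, Z)
```

By construction x + y + z = 1, so two of the three normalized values are sufficient to describe the chromaticity. For real measurements the stand-in curves would be replaced by the tabulated CIE standard observer data.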

Image Processing

IP operates on acquired images and produces new images in which the existing features are more visible, which facilitates further analysis. As mentioned, IP is divided into three main parts: image preprocessing, image segmentation, and feature extraction, each of which is described in detail below [52].

Preprocessing

The purpose of preprocessing is to enhance the quality of the acquired image, which is often degraded by distortion and noise in the optical and electronic systems of the input device. By improving the existing image, this step makes the desired and important features ready for further processing; geometric transformations of images, such as rotation and scaling, are also considered preprocessing operations [67].

In the preprocessing phase, the acquired images are improved in terms of quality and readability. These improvements aim to eliminate noise and to solve problems such as the blurring of images taken of objects moving on rails and insufficient lighting. The two main approaches are pixel preprocessing and local preprocessing. Pixel preprocessing converts the acquired image into an output image in which each output pixel depends only on the input pixel at the corresponding coordinates, whereas local preprocessing uses a small neighborhood of a pixel to produce a new brightness value in the output image [68]. Filters and masks are used to identify and correct anomalies. Useful smoothing filters in image preprocessing are the average filter, Gaussian filter, and median filter, which are commonly used to suppress noise in the captured images. Rong et al. [45] used the median filter for filling holes and removing the stem-end region. Mittal et al. [3] used a median filter with a 5 × 5 pixel window to remove noise. For fruit and vegetables, uneven illumination produces low contrast, which can be corrected by adjusting the brightness levels. One such method is histogram equalization, a non-linear technique that flattens the histogram and thereby spreads the pixel brightness values over a wider range. Image distortion is another concern of preprocessing; it can be corrected with geometric transforms such as rotation, mirroring, translation, transposition, and scaling. The most important drawback of preprocessing is that it is time-consuming, while success in the later phases depends heavily on this stage.
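The two kinds of preprocessing described above can be sketched in Python with NumPy. The median filter is an example of local preprocessing (each output pixel depends on a neighborhood), and histogram equalization is an example of pixel preprocessing (each output pixel depends only on the corresponding input pixel through a lookup table). This is a minimal sketch, not the implementation used in the cited studies: the function names, the edge-replicating border handling, and the 8-bit gray-level assumption are choices made for this example.

```python
import numpy as np

def median_filter(image, size=5):
    """Local preprocessing: replace each pixel by the median of its size x size window."""
    image = np.asarray(image)
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")     # replicate borders (assumption)
    windows = np.lib.stride_tricks.sliding_window_view(padded, (size, size))
    return np.median(windows, axis=(-2, -1)).astype(image.dtype)

def equalize_histogram(image, levels=256):
    """Pixel preprocessing: map gray levels through the normalised cumulative histogram."""
    image = np.asarray(image)
    hist = np.bincount(image.ravel(), minlength=levels)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    if cdf[-1] == cdf_min:                       # constant image: nothing to spread
        return image.copy()
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * (levels - 1))
    lut = np.clip(lut, 0, levels - 1).astype(np.uint8)
    return lut[image]

# Hypothetical test images (8-bit gray level).
noisy = np.full((9, 9), 100, dtype=np.uint8)
noisy[4, 4] = 255                                # single impulse-noise pixel
denoised = median_filter(noisy, size=5)

low_contrast = (100 + np.arange(100).reshape(10, 10) // 5).astype(np.uint8)
equalized = equalize_histogram(low_contrast)
```

In this example the isolated bright pixel disappears after the 5 × 5 median filter, and the narrow 100 to 119 gray-level range of the second image is stretched to the full 0 to 255 range.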

Segmentation

Segmentation is used to divide an image into separate areas. White or black backgrounds make segmentation easier because the threshold is easier to calculate from the histogram. Segmentation can also be used to diagnose diseases and defects. RGB color images are far more complex to process than gray images, to a degree that depends on the software used in the algorithms, whereas with gray images it is easier and faster to identify the samples when removing the background.

Gray-level thresholding is the simplest segmentation method and can easily be performed in real time. Many image regions or objects can be identified by the constant reflectivity or light absorption of their surface; a threshold can then be determined to separate objects from the background. Thresholding is computationally inexpensive and fast, and although it is the oldest segmentation method, it is still widely used in simple CVS tasks. According to Eq. (3), thresholding is the transformation of an input image (A) to an output (segmented) binary image (B):

$$B(i,j)=\begin{cases}1 & \text{if}\;A(i,j)>\text{Threshold},\\0 & \text{if}\;A(i,j)\leq\text{Threshold}\end{cases}$$
(3)

where (i, j) denotes the coordinates of a pixel in a two-dimensional image and the threshold acts as the cut-off for separating the object from the background. According to Eq. (3), the algorithm visits every pixel in the image, compares it with the threshold value, and labels it as background (value 0, black) or object (value 1, white). Proper threshold selection is necessary for successful segmentation. This method is suitable for images in which the objects do not touch each other and their gray levels are distinct from the background gray levels.
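Equation (3) translates almost literally into array code. The following Python sketch assumes the gray image A is a NumPy array; the function name `threshold_image` and the example values are hypothetical.

```python
import numpy as np

def threshold_image(A, threshold):
    """Eq. (3): B(i, j) = 1 where A(i, j) > threshold, otherwise 0."""
    return (np.asarray(A) > threshold).astype(np.uint8)

A = np.array([[10, 200],
              [128, 129]])
B = threshold_image(A, 128)
```

Note that a pixel exactly equal to the threshold is assigned to the background, as the second line of Eq. (3) requires.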

As mentioned, thresholding is based on the image histogram; when the color difference between the background and the sample is large, the separation is highly accurate. Rong et al. [45] used a histogram thresholding technique and selected a white background for better contrast between the sample and the background. Mittal et al. [3] used a dark background for rice to obtain a precise segmentation. Ireri et al. [61] chose a black non-reflective background for the red sample. Therefore, a background color suited to the sample color should be selected to perform a better segmentation.
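When the histogram is clearly bimodal, as it is for a sample on a contrasting background, the threshold can be chosen automatically. One widely used histogram-based choice is Otsu's method, which selects the gray level that maximizes the between-class variance. The Python sketch below is a minimal illustration assuming an 8-bit gray image; the function name `otsu_threshold` and the synthetic two-level image are hypothetical, and this is not claimed to be the procedure of the cited studies.

```python
import numpy as np

def otsu_threshold(image, levels=256):
    """Return the gray level t that maximises the between-class variance.

    Pixels with value <= t form one class and pixels > t the other.
    """
    hist = np.bincount(np.asarray(image).ravel(), minlength=levels).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                        # probability of class "<= t"
    mu = np.cumsum(prob * np.arange(levels))       # cumulative mean up to t
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0           # thresholds with an empty class
    return int(np.argmax(sigma_b))

# Hypothetical image: dark background (50) and bright sample (200).
image = np.array([50] * 60 + [200] * 40, dtype=np.uint8).reshape(10, 10)
t = otsu_threshold(image)
binary = (image > t).astype(np.uint8)
```

For this synthetic image any threshold between the two gray levels separates the classes perfectly; with real images the maximum of the between-class variance is usually a single gray level in the valley of the histogram.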

Researchers have proposed different strategies to subtract the background from the image, among which segmentation techniques operating in a single color space are preferred. The most useful image segmentation methods are described below with their advantages and disadvantages: the thresholding method, edge-based method, region-based method, traditional segmentation algorithms, watershed method, artificial neural networks, k-means, and Otsu’s thresholding [69].

The thresholding method looks for peak values in the histogram of the image to find similar pixels; it is simple and does not require complicated pre-processing, but many details can be omitted and threshold errors are common. The edge-based method relies on detecting discontinuities rather than similarities; it works well for images with good contrast between objects but is not suitable for noisy images. The region-based method partitions an image into homogeneous regions; it works well for images with a considerable amount of noise and can take user markers for fast evaluation, but it is time- and memory-consuming. Traditional segmentation algorithms divide an image into k homogeneous, mutually exclusive clusters, thereby obtaining the objects. The watershed method is based on a topological interpretation of image boundaries; the segments obtained are more stable and the detected boundaries are distinct, but the gradient calculation for ridges is complex. Artificial neural networks here refer to deep learning algorithms, namely convolutional neural networks (CNNs); they are easy to implement because ready-made libraries are available in Python and no hand-crafted algorithm has to be followed, and they suit many practical applications, but training a model on custom and business images is time-consuming and resource-intensive [69].

When the data are unlabeled, that is, not grouped into defined categories, k-means clustering is one of the best choices for segmentation. k-means is an unsupervised algorithm that partitions the given data into K clusters based on K centroids and is used to segment the area of interest from the background. Other clustering algorithms with better features tend to be more expensive. In this case, k-means is a good solution for pre-clustering, reducing the space into disjoint smaller sub-spaces where other clustering algorithms can be applied. k-means is also among the simplest methods to implement and run. Otsu's thresholding is one of the simplest ways to segment an image into foreground and background. It is a global thresholding technique that uses the histogram of the image to search for the threshold. Global thresholding loses accuracy under several conditions: a large amount of noise, a small difference between the mean foreground and background intensities, and large variances of the pixels that belong to the object and of those that belong to the background [70].
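The histogram-based threshold search of Otsu's method can be sketched as follows. This is a minimal NumPy illustration of the standard between-class variance criterion, assuming an 8-bit gray-level image; the function name `otsu_threshold` and the synthetic test image are assumptions for this example and do not reproduce the implementation of any cited study.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the gray level t that maximizes the between-class variance,
    with background = pixels <= t and object = pixels > t."""
    gray = np.asarray(gray, dtype=np.uint8)
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    omega = np.cumsum(prob)           # probability of the class "<= t"
    mu = np.cumsum(prob * levels)     # cumulative mean up to level t
    mu_total = mu[-1]

    # Between-class variance; it is undefined when one class is empty,
    # so those positions are set to 0 and can never be selected.
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0
    return int(np.argmax(sigma_b))


# Hypothetical image: gray level 50 as background, a 3 x 3 object at level 200.
img = np.full((6, 6), 50, dtype=np.uint8)
img[2:5, 2:5] = 200
t = otsu_threshold(img)
mask = img > t                        # same rule as Eq. (3)
print(t, int(mask.sum()))
```

In this sketch the threshold is searched over all 256 gray levels, which is why the method is fast but, as noted above, sensitive to noise and to overlapping foreground and background distributions.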

Feature Extraction

After the segmentation process, the features of the sample are extracted. These descriptors are usually applied for image recognition; they retain the useful information of an image while leaving out the redundant information [20, 71]. The features can be categorized as color, morphology, and texture. As a crucial step of the defect detection process in fruit and vegetables, feature extraction aims to manage the data, while feature selection aims to reduce the features to the most important ones with no or minimal loss of information [72]. This helps to develop more agile detection models by selecting the lowest possible number of features that gives the lowest error and the highest correct classification rate. Color, as the most basic feature descriptor, is extracted by various techniques comprising the RGB histogram, hue histogram, opponent histogram, and transformed color distribution [73]. High efficiency, ease of extraction, and powerful representation of the visual content of images are the main advantages of color features. Some color features are the color coherence vector, color histogram, color correlogram, and color moments. Color moments are effective and simple features; the mean, standard deviation, and skewness are the most common moments [68]. Mean, variance, correlation, entropy, energy, and contrast are the main texture descriptors widely used in MV applications. Space and Fourier descriptors, as well as image moments and the angular transform, are applied as shape descriptors [74].
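The three color moments mentioned above can be computed per channel as in the following minimal sketch. It assumes an H x W x 3 image array and an optional boolean object mask obtained from segmentation; the function name `color_moments` is hypothetical, and skewness is taken here as the standardized third central moment, which is one of several definitions in use.

```python
import numpy as np


def color_moments(image, mask=None):
    """Return [mean, standard deviation, skewness] for each channel of an
    H x W x 3 image. `mask` is an optional boolean H x W array selecting
    the object pixels."""
    image = np.asarray(image, dtype=np.float64)
    if mask is not None:
        pixels = image[np.asarray(mask, dtype=bool)]      # N x 3
    else:
        pixels = image.reshape(-1, image.shape[-1])       # N x 3
    mean = pixels.mean(axis=0)
    std = pixels.std(axis=0)
    third = ((pixels - mean) ** 3).mean(axis=0)
    # Skewness is set to 0 for a constant channel (std == 0).
    skew = np.divide(third, std ** 3, out=np.zeros_like(third), where=std > 0)
    return np.concatenate([mean, std, skew])


# Hypothetical 2 x 2 RGB patch (values invented for illustration).
patch = np.zeros((2, 2, 3))
patch[..., 0] = [[0, 2], [2, 4]]
patch[..., 1] = 5
patch[..., 2] = [[0, 0], [0, 4]]
features = color_moments(patch)
print(features)
```

The resulting nine-element vector can be used directly as input to a feature selection step or a classifier.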

Deep learning can also be applied to integrate feature extraction into the model, which results in superior performance over conventional IP techniques and subsequently enhances the performance of defect detection systems for fruit and vegetable applications [20]. Deep feature extraction based on a deep neural network is useful when the data are complex or when the training set is of limited size [75]. The architecture of CNNs consists of stages, starting with convolutional layers, which detect local connections of features from the previous layer, and pooling layers, which merge semantically similar features into one [20, 76]. Cen et al. [32] combined a stacked sparse auto-encoder (SSAE) with a CNN, named the CNN-SSAE system, to learn spectral and spatial features for defect detection in cucumbers, leading to higher accuracy and shorter detection times than those achieved with spectral data alone.

Therefore, success in developing algorithms and models for grading and classifying fruits and vegetables depends on the feature extraction strategy. Because feature extraction is so important to the success of MVS-based detection models, researchers have emphasized the development of feature extraction techniques [20].

Color Features

In this area, color spaces such as RGB, HSV, and LAB are widely used in research. RGB is a three-dimensional color space consisting of red, green, and blue. An RGB image can be viewed as a three-dimensional array in which each channel matrix holds the pixel values of one primary color. In RGB color photos, each pixel is a combination of the three primary colors red, green, and blue, and different combinations of these primaries create different colors. The RGB color space is well suited for capturing, storing, and displaying images, but for IP the information in this color space is not intuitive to interpret and requires more complex algorithms. The RGB and HSV color spaces are shown in Fig. 10.

Fig. 10 RGB (A) and HSV (B) color spaces [77]

For RGB to HSV conversion, Eq. (4) can be used: 

$$\begin{aligned}&V={\text{max}}={\text{max}}\;(R,\;G,\;B),\;\;\;{\text{min}}={\text{min}}\;(R,\;G,\;B),\\&S=({\text{max}}-{\text{min}})/\;{\text{max}}\;\;\;({\text{or}}\;S=0,\;\;{\text{if}}\;\;V=0),\\&H={60}\times\left\{\begin{array}{l}0+\frac{G-B}{{\text{max}}-{\text{min}}}\;\;{\text{if}}\;\;{\text{max}}=R\\2+\frac{B-R}{{\text{max}}-{\text{min}}}\;\;{\text{if}}\;\;{\text{max}}=G\\4+\frac{R-G}{{\text{max}}-{\text{min}}}\;\;{\text{if}}\;\;{\text{max}}=B\end{array}\right.\\&H=H+360,\;\;\;{\text{if}}\;\;H<0\end{aligned}$$
(4)
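A direct transcription of Eq. (4) for a single pixel is sketched below. It assumes that R, G, and B are scaled to the range [0, 1] and returns H in degrees; the function name `rgb_to_hsv` is chosen for this example. Eq. (4) leaves H undefined when max equals min (gray pixels), so setting H to 0 in that case is an assumption of this sketch.

```python
def rgb_to_hsv(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to (H, S, V) following
    Eq. (4). H is returned in degrees, S and V in [0, 1]."""
    cmax = max(r, g, b)
    cmin = min(r, g, b)
    delta = cmax - cmin

    v = cmax
    s = 0.0 if cmax == 0 else delta / cmax

    if delta == 0:
        h = 0.0  # assumption: hue of a gray pixel is set to 0 by convention
    elif cmax == r:
        h = 60.0 * (0 + (g - b) / delta)
    elif cmax == g:
        h = 60.0 * (2 + (b - r) / delta)
    else:
        h = 60.0 * (4 + (r - g) / delta)

    if h < 0:
        h += 360.0
    return h, s, v


# Primary colors, magenta, and a mid-gray pixel as simple checks.
for name, rgb in [("red", (1, 0, 0)), ("green", (0, 1, 0)),
                  ("blue", (0, 0, 1)), ("magenta", (1, 0, 1)),
                  ("gray", (0.5, 0.5, 0.5))]:
    print(name, rgb_to_hsv(*rgb))
```

For 8-bit images, the channel values would first be divided by 255 before calling the function.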

HSL (hue, saturation, lightness) and HSV (hue, saturation, value, also known as HSB or hue, saturation, brightness) are alternative representations of the RGB color model. In these models, the colors of each hue are arranged in a radial slice around a central axis of neutral colors that ranges from black to white. HSV is a cylindrical color model that remaps the RGB primary colors into dimensions that are easier for humans to understand; the three dimensions of the HSV color model are interdependent. HSL is another cylindrical color model that shares two dimensions with HSV while replacing the value dimension with a lightness dimension. L*a*b* is one of the most widely used color spaces and also has three dimensions. L* indicates lightness, where a value of 100 corresponds to white and 0 to black. +a* is the red axis, −a* is the green axis, +b* is the yellow axis, and −b* is the blue axis. Figure 11 represents the CIELAB color space.

Fig. 11 Representation of the CIELAB color space [78]

These color spaces are widely used in defect detection, grading, and other postharvest processing applications. Table 3 shows a summary of studies on the color inspection of fruits and vegetables.

Table 3 A summary of some studies about the color inspection of fruits and vegetables

Morphological Features

Binary images may contain many imperfections. In particular, the binary regions extracted by a simple segmentation method such as thresholding are often distorted by noise. Morphological IP can help to remove these imperfections according to the form and structure of the image. Morphological IP is a collection of non-linear operations related to the shape or morphology of features in an image. Morphological operations rely only on the relative ordering of pixel values, not on their numerical values. They apply a structuring element to an input image and create an output image of the same size, in which the value of each pixel is based on a comparison of the corresponding input pixel with its neighbors.
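The way a structuring element is compared with each pixel neighborhood can be illustrated with binary erosion, dilation, and their combination (opening), which removes small noise specks from a thresholded image. The sketch below assumes a square structuring element of odd size and treats pixels outside the image as background; the function names are chosen for this example.

```python
import numpy as np


def _neighborhoods(binary, size):
    """Stack the size x size shifted copies of a binary image so that
    axis 0 enumerates the neighbors of every pixel (zero padding)."""
    binary = np.asarray(binary, dtype=bool)
    pad = size // 2
    padded = np.pad(binary, pad, mode="constant", constant_values=False)
    h, w = binary.shape
    return np.stack([padded[i:i + h, j:j + w]
                     for i in range(size) for j in range(size)])


def erode(binary, size=3):
    """A pixel stays 1 only if all pixels under the element are 1."""
    return _neighborhoods(binary, size).all(axis=0)


def dilate(binary, size=3):
    """A pixel becomes 1 if any pixel under the element is 1."""
    return _neighborhoods(binary, size).any(axis=0)


def opening(binary, size=3):
    """Erosion followed by dilation: removes objects smaller than the element."""
    return dilate(erode(binary, size), size)


# Hypothetical segmented image: one 3 x 3 object plus one isolated noise pixel.
clean = np.zeros((7, 7), dtype=bool)
clean[2:5, 2:5] = True
noisy = clean.copy()
noisy[0, 6] = True
print(np.array_equal(opening(noisy), clean))
```

The output image has the same size as the input, as stated above, and the opening restores the object while discarding the noise pixel.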

The morphological features (shape, size, etc.) are used for classification and grading applications. Shape is a key factor for consumers: fruits and vegetables are expected to have a regular shape without any morphological disorders, and an irregular shape lowers the price of the product [92]. Morphological features and their mathematical formulas are presented in Table 4.

Table 4 Morphological features with their mathematical formula [93]

In CVS, several kinds of features are used to measure the shape of products, namely, Fourier descriptors, invariant moments, boundary encoding, and size-dependent measurements. Boundary encoding describes the shape by a chain code vector that records the sequence of coordinates of the pixels on the boundary. Invariant moments describe the shape by their magnitudes, which are invariant under translation, rotation, and changes in scale. Fourier descriptors describe the shape by taking the Fourier transform of the boundary of the two-dimensional object in the image. Size-dependent shape measurements use a single size parameter or combine two or more size parameters to form dimensionless expressions for shape description. Because of the irregular shape of fruits and vegetables, it is recommended to use more than one feature to achieve better accuracy in the sorting of agricultural products.

Besides, size is one of the effective factors in the pricing of agricultural products. Size can be measured with parameters such as area, perimeter, major and minor axes, length, and width, which can be calculated after the segmentation stage. The longest and shortest lines across the object, which are perpendicular to each other, indicate the major and minor axes. The general basis of morphological analysis is to transform the image into a structure in which the displayed information is more understandable. It is applied to the geometric structure of the image, which includes the size, shape, and distribution of sample particles. Research indicates that size-independent measurements are more accurate for quality evaluation than size-dependent ones. Kondo [94] used the maximum length and diameter to measure fruit size. Chen et al. [60] used the major diameter of an ellipse to identify broken rice grains. For the use of other geometric features such as perimeter, area, and aspect ratio to identify broken grains, see van Dalen [95] and Zareiforoush et al. [96].
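A few of the size parameters listed above can be computed from a segmented binary mask as in the following sketch. The definitions used here are assumptions for illustration and may differ from those in Table 4: area is the pixel count, perimeter is approximated by the number of object pixels with at least one 4-connected background neighbor, and the axis lengths are those of the ellipse with the same second central moments as the region. The function name `size_features` is hypothetical.

```python
import numpy as np


def size_features(mask):
    """Area, approximate perimeter, and ellipse axis lengths of one object.
    `mask` is a boolean H x W array containing at least two object pixels."""
    mask = np.asarray(mask, dtype=bool)
    ys, xs = np.nonzero(mask)
    area = int(ys.size)

    # Boundary pixels: object pixels that are not surrounded on all 4 sides.
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = int((mask & ~interior).sum())

    # Axis lengths from the eigenvalues of the pixel coordinate covariance:
    # a uniform ellipse with full axis length L has variance (L / 4) ** 2.
    cov = np.cov(np.vstack([xs, ys]), bias=True)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
    major, minor = 4.0 * np.sqrt(np.maximum(eigvals, 0.0))

    return {
        "area": area,
        "perimeter": perimeter,
        "major_axis": float(major),
        "minor_axis": float(minor),
        "aspect_ratio": float(major / minor) if minor > 0 else float("inf"),
    }


# Hypothetical elongated object: a 4 x 10 pixel rectangle.
obj = np.zeros((8, 14), dtype=bool)
obj[2:6, 2:12] = True
print(size_features(obj))
```

A dimensionless quantity such as the aspect ratio is an example of combining two size parameters into a shape descriptor.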

Texture Features

The texture of an image is a set of measures that gives information about the spatial arrangement of color or intensity in an image or in a selected region of it. There are two types of texture, region-based and boundary-based. On their own they are not good criteria for segmentation, but used in conjunction with other criteria such as color, they can help to achieve better segmentation of an image.

Image texture is divided into four categories: statistical, transform-based, model-based, and structural. Statistical texture includes the gray-level co-occurrence matrix (GLCM), the gray-level pixel-run length matrix, and the neighboring gray-level dependence matrix, in which texture is obtained from the ordering of the intensity values of pixels across the image [97]. Table 5 presents the GLCM equations for texture features.

Table 5 GLCM feature with their mathematical formula [98]
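The construction of a GLCM and the calculation of some of its descriptors can be sketched as follows. The feature definitions used here (contrast, energy as the sum of squared entries, homogeneity, entropy, and correlation) follow commonly used forms and are an assumption of this example; they may differ in detail from those of Table 5 [98]. The function names and the small test image are also illustrative.

```python
import numpy as np


def glcm(gray, levels, dx=1, dy=0, symmetric=True):
    """Normalized gray-level co-occurrence matrix for the pixel offset
    (dy, dx), with dx, dy >= 0 and gray values in 0 .. levels - 1."""
    gray = np.asarray(gray, dtype=np.intp)
    h, w = gray.shape
    ref = gray[:h - dy, :w - dx]       # reference pixels
    nbr = gray[dy:, dx:]               # their neighbors at the given offset
    P = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(P, (ref.ravel(), nbr.ravel()), 1.0)
    if symmetric:
        P = P + P.T
    return P / P.sum()


def glcm_features(P):
    """Common texture descriptors computed from a normalized GLCM."""
    i, j = np.indices(P.shape)
    mu_i, mu_j = np.sum(i * P), np.sum(j * P)
    var_i = np.sum(P * (i - mu_i) ** 2)
    var_j = np.sum(P * (j - mu_j) ** 2)
    nonzero = P[P > 0]
    return {
        "contrast": float(np.sum(P * (i - j) ** 2)),
        "energy": float(np.sum(P ** 2)),
        "homogeneity": float(np.sum(P / (1.0 + (i - j) ** 2))),
        "entropy": float(-np.sum(nonzero * np.log2(nonzero))),
        "correlation": float(np.sum(P * (i - mu_i) * (j - mu_j))
                             / np.sqrt(var_i * var_j))
        if var_i > 0 and var_j > 0 else 1.0,
    }


# Small 4-level test image with horizontal neighbor pairs (dx=1, dy=0).
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
P = glcm(image, levels=4)
print(glcm_features(P))
```

In practice, the gray levels are usually quantized to a small number of levels first, and features from several offsets and directions are averaged.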

For transform-based texture, a convolution mask is a small matrix that is applied to the image and slid across all pixels until the whole image is covered; one use of such masks is to determine the edges of the objects in an image. In transform-based methods (convolution masks, Fourier transform, and wavelet transform), texture is extracted from the corresponding spatial-frequency domain images. The other two types of image texture are model-based texture, which includes fractal and autoregression methods, and structural texture, which is rarely used in agriculture. Statistical texture is the most widely used in the texture analysis and evaluation of fruits and vegetables owing to its low computational cost and high accuracy; transform-based and model-based textures are also used in the evaluation and inspection of the external quality of fruits and vegetables [100].

Regarding texture, Kavdir and Guyer [101] used texture features based on spatial gray-level dependence matrices to process monochrome images of apples, both to detect defects and bruises on the apple skin and to differentiate the stem and calyx; they used backpropagation artificial neural network (BPANN), decision tree (DT), and k-nearest neighbor (K-NN) classifiers. Ramirez-Paredes and Hernandez-Belmonte [4] used local binary pattern (LBP), median ternary pattern (MTP), local phase quantization (LPQ), etc., as texture descriptors, along with color and shape properties, to extract the properties of malting barley.

Defect Detection

In the diagnosis of peel diseases, it should be borne in mind that the entire surface of the product must be examined; if damage lies on a part of the surface that is hidden from the camera and is therefore not identified, the product would wrongly be considered healthy. Accordingly, Bennedsen et al. [102] suggest obtaining multiple images while the product is rotated by conical rollers so that the entire surface can be imaged and explored.

In citrus, the color of the product is one of the main parameters in disease diagnosis. Because the color of an affected area on one fruit may match the color of a healthy area on another, the system chosen to assess the products and diagnose their defects should be well trained; a trainable system is therefore essential for examining a wide range of fruits without identification problems [103]. Since the disease usually occupies a smaller area on the surface of a product than the healthy area, Blasco et al. [104] used an MV system based on an unsupervised region-growing algorithm.

As mentioned, defects can be separated into two groups, external and internal. For external defects, the surface of the product and the image acquisition procedure are the important factors in defect detection, and the entire surface of the peel should be photographed. For internal defects, choosing an imaging device that can see inside the tissue is important for diagnosis, because almost all agricultural products are opaque.

Stem-Calyx Problems

As mentioned, one problem in diagnosing diseases is the presence of the stem and calyx, which may be misdiagnosed as defects. Rong et al. [45] applied median filtering and segmentation and then counted the number of regions: if the number of regions was two or more, the orange was considered defective; if there was only one region, a limit of 450 pixels was applied, and the orange was considered defective when the area of the region was greater than or equal to 450 pixels. The average accuracy of this study for 8 types of diseases was 97%. Li et al. [105] used the BPANN algorithm with one hidden layer to distinguish the stem and calyx from defective areas with an accuracy of 93%. Ireri et al. [61] used the mean g-r values in histogram thresholding and identified the areas related to the stem and calyx with 95% accuracy. Also, in the YCbCr color space, different calyxes in apples were detected by the k-means clustering method [35]. Liming and Yanchao [91] likewise used the mean g-r values for thresholding in strawberry fruit.
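The region-count rule attributed to Rong et al. [45] above can be sketched as follows. Only the decision logic (two or more regions, or a single region of at least 450 pixels) is taken from the text; the 4-connected labeling, the function names, and the list-of-lists mask format are assumptions made for this example and do not reproduce the authors' implementation.

```python
from collections import deque


def region_areas(mask):
    """Return the pixel area of every 4-connected region of a binary mask
    given as a list of rows containing 0/1 values."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    areas = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill of one region.
            seen[y][x] = True
            queue = deque([(y, x)])
            area = 0
            while queue:
                cy, cx = queue.popleft()
                area += 1
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            areas.append(area)
    return areas


def is_defective(mask, area_limit=450):
    """Two or more regions, or one region of at least `area_limit` pixels."""
    areas = region_areas(mask)
    if len(areas) >= 2:
        return True
    return len(areas) == 1 and areas[0] >= area_limit


def blank(h, w):
    return [[0] * w for _ in range(h)]


# Hypothetical masks: one small region (e.g., a stem end) and two regions.
one_small = blank(40, 40)
for y in range(5, 10):
    for x in range(5, 10):
        one_small[y][x] = 1
two_regions = [row[:] for row in one_small]
two_regions[30][30] = 1
print(is_defective(one_small), is_defective(two_regions))
```

A single small region is interpreted as the stem end, whereas a second region or a large single region is interpreted as a defect.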

In the study of Xiao-bo et al. [53] on apples, attention was paid to the number of regions. Since the stem and calyx cannot both appear in the same photo, a photo with two or more detected regions indicates a defective apple. This method does not require a special or complicated algorithm; it only needs to count the regions of interest.

As shown in Fig. 12, the calyx and stem of the tomato samples were detected and removed. Ireri et al. [61] performed the segmentation before classification to achieve better accuracy, and the detection algorithm was developed using 50 tomato images. They found that the average g-r value of the calyx-stalk scar (CS) was greater than that of the fruit, so the CS could be precisely detected and removed by a histogram threshold.

Fig. 12 Calyx and stalk scar detection and segmentation: the original RGB images in the first row, detected calyx and stalk scar in the middle row, and calyx and stalk scar segmentation in the last row [61]

Defects in Fruits and Vegetables

A useful way to obtain more accurate segmentation results is to measure the contrast between defective and sound products. Rong et al. [45] proposed sliding comparison window local segmentation for oranges, which allows the segmentation of defects such as insect injury, wind scarring, thrips scarring, scale infestation, canker spot, dehiscent product, copper burn, and phytotoxicity; they achieved a detection accuracy of 97% for defective oranges. After applying a median filter, the number of regions in an image was counted to discriminate the stem end from a defective area. They set a criterion of 450 pixels as the boundary between the stem end and a defective area; a region count equal to or greater than 2 indicated the existence of defects. The method obtained 93.8% accuracy in discriminating the stem end.

Using the major axis of the invariant-moment ellipse, broken rice grains were identified according to their length with 99% accuracy by an SVM classifier [60]. The length parameter played an important role in discriminating between sound and broken kernels. Furthermore, chalky areas were detected by centroid distance constraint and pixel positioning with 96.3% accuracy. The damaged and spotted areas were identified by edge detection and morphological processing, which resulted in 93.6% accuracy. The average running time of the method for detecting four types of defects at one time was 0.15 s, which shows great potential for fast and accurate quality inspection of rice.

Ramirez-Paredes and Hernandez-Belmonte [4] proposed a method that combines color and shape with several texture descriptors for malting barley grains. They identified defective samples and assessed the results with the F1 score, which is the harmonic mean of recall and precision; they achieved an accuracy of over 99%. The best texture descriptor was LPQ, which achieved 95.65% accuracy with a linear SVM and 99.96% accuracy with a radial basis function SVM (RBF-SVM). The best performance of this method resulted from combining the color histogram feature, the LPQ texture descriptor, and shape descriptors. Furthermore, RBF-SVM shows significant potential as a rapid and accurate classification algorithm.
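For reference, the F1 score mentioned above is computed from precision and recall as shown in this short sketch. The counts of true positives, false positives, and false negatives are hypothetical and are not results from the cited study.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1) from counts of
    true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 90 defective grains found correctly, 10 sound grains
# flagged by mistake, and 30 defective grains missed.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 3), round(r, 3), round(f1, 3))
```

Because the harmonic mean is dominated by the smaller of the two values, a high F1 score requires both few missed defects and few false alarms.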

In another study, Xiao-bo et al. [53] used three color cameras to photograph the surface of apples. By rotating the apples on rollers, the entire surface was photographed and nine images were captured per apple. It was concluded that the use of three cameras increases the accuracy to an acceptable level and reduces the error rate from 21.8 to 4.2%. The classification error rate would also be reduced from 15 to 11%. However, this method cannot be widely used. Regions of interest were counted in each of the nine images; because the stem end and calyx cannot both appear in the same image, a sample image with two or more regions of interest was considered to show a defective sample. In other words, the number of regions of interest is the key factor in detecting defects in apples. One of the concerns in photographing agricultural products is the curvature or sphericity of fruits and vegetables, which makes the distribution of the reflected light non-uniform and the results less accurate [59, 106, 107].

Ireri et al. [61] proposed an RGB imaging system for tomatoes in which the stem and calyx were removed using the histogram mean g-r value with 95% accuracy, to avoid their being identified as defects. Also, the damaged areas were identified in the LAB space with 98.9% accuracy. The highest accuracy, 97% for distinguishing the healthy and defective groups, was obtained using RBF-SVM. As the number of groups increases, the accuracy could decrease. As shown in Fig. 13, the histogram indicates that healthy tomatoes have a higher pixel intensity than defective ones. The histogram is a rapid technique for detecting the defective regions of products, but conditions such as proper illumination and a suitable camera have to be provided. Improper illumination affects the imaging procedure and thereby leads to incorrect results.

Fig. 13 Pixel intensity differences distinguishing the background, defective, transition, and healthy tomato regions [61]

Zhang et al. [11] used hyperspectral imaging in the visible-near infrared (Vis–NIR) spectral region to detect mandarin defects. The identification accuracy for 5 different peel conditions (sound and four types of defects) was about 96.63%. The results of this study also showed that multispectral imaging based on band ratio and principal component analysis (PCA) can be used to diagnose defects. Two wavelengths, 680 nm and 715 nm, were used as a multispectral imaging system, and the ratio image and the second principal component image of these two wavelengths were used to detect the defects. The results demonstrate that the multispectral imaging system can be considered a fast method for grading mandarins.

Xiong et al. [108] applied a laboratory hyperspectral imaging system to acquire hyperspectral images of litchi fruit for micro-mechanical damage detection. They used GLCM to extract the dimensional texture features, including contrast, correlation, energy, homogeneity, variance, mean value, and entropy. The partial least squares discriminant analysis (PLS-DA) method was used to analyze the hyperspectral data and establish a prediction model. The proposed method showed recall and precision of 94.10% and 93.95%, respectively. They also established an LS-SVM model based on the dimensional texture features of GLCM, with an average accuracy of 95%. The proposed future work focuses on exploring improved techniques to achieve on-line damage detection of litchi fruit. Combining hyperspectral technology with deep learning techniques is recommended for detecting mechanical damage of litchi fruit and bringing the system accuracy closer to 100%.

Bruising arising from mechanical impact is a very commonly occurring defect in fruits and vegetables. Early detection of bruises is useful for screening applications. The early invisible bruises that are not usually detectable with digital cameras can be identified by hyperspectral imaging. However, the challenge of fast detection of such defects at the early stages of development for automatic applications remains. Nturambirwe et al. [109] proposed a hyperspectral imaging system for detecting latent bruise damage and determining the state of bruise latency using classification learners of hyperspectral imaging data in “Golden Delicious” apples.

Artificial Intelligence in Defect Detection

With the advancement of artificial intelligence and its application in various sectors, including agriculture, this technology can also be used to detect defects of fruits and vegetables owing to its high accuracy and time savings. Several studies on defect detection of fruit and vegetables have been performed using artificial intelligence and have yielded promising results with very high accuracy. Artificial intelligence can be used for classification or regression, data analysis and feature extraction, data clustering, etc. For example, discriminant analysis, K-nearest neighbors, nonlinear regression, neural networks, support vector machines, and decision trees have been widely used for classification and regression applications in supervised learning. On the other hand, principal component analysis, fuzzy c-means, k-means clustering, Gaussian mixture models, and factor analysis have been used for clustering and dimensionality reduction of data in unsupervised learning.

CNN is one of the novel artificial intelligence methods applied in agricultural science; it is used to detect a wide range of defects, for example, physiological disorders [33] and mechanical defects [25]. After photos of the product are taken, they are processed by segmentation algorithms to detect defects. The processed images are the input of the artificial neural network, which learns from the available data by back-propagation and recognizes fruit and vegetable defects with high accuracy and in less time than human inspectors. CNN, as a newer formulation of neural networks referred to as deep learning, has been very successful in pattern recognition in images owing to the interconnectivity of its layers and automatic parameter extraction. Different CNN architectures exist for different applications, including AlexNet [110], ResNet [111], GoogLeNet [112], VGG Net [113], ZF Net [114], Region-based CNN (R-CNN), and LeNet-5 [115]. A typical CNN architecture including simple convolutional and pooling layers is shown in Fig. 14.

Fig. 14 The diagram of the convolutional neural networks

Wu et al. [116] used CNN and light backscattering to detect apple defects. An AlexNet model with an 11-layer structure was trained. The ReLU function was also used to prevent over-fitting and improve practicality. Fan et al. [117] achieved 92% accuracy for defective apple samples using CNN, which, after training, was much more accurate than the SVM method, which analyzed area counts. CNN is a deep learning algorithm used in classification and recognition applications because of its high accuracy. Each input image passes through a series of convolution layers with filters, pooling layers, and fully connected (FC) layers, and the SoftMax function is then applied to classify the object with probabilistic values between 0 and 1. Fig. 14 shows the diagram of the CNN model: an input image is converted into a series of other images by convolution masks, which slide over the original image matrix and create a new image with new matrix values; pooling reduces the size of the extracted matrix for faster processing; finally, the data are flattened for classification or recognition using the SoftMax function.
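The sequence of operations described above (convolution mask, ReLU, pooling, flattening, fully connected layer, and SoftMax) can be illustrated with a minimal NumPy forward pass. This is a didactic sketch only: the weights are random and untrained, the layer sizes are arbitrary, and the function names are assumptions for this example; it does not correspond to AlexNet or to the networks used in the cited studies.

```python
import numpy as np


def conv2d(image, kernel):
    """Slide a mask over the image ('valid' mode, no padding, stride 1)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for y in range(oh):
        for x in range(ow):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out


def relu(x):
    return np.maximum(x, 0.0)


def max_pool(x, size=2):
    """Keep the maximum of every non-overlapping size x size block."""
    h, w = x.shape[0] // size * size, x.shape[1] // size * size
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))


def softmax(z):
    e = np.exp(z - np.max(z))          # shift for numerical stability
    return e / e.sum()


def forward(image, kernels, fc_weights, fc_bias):
    """Convolution -> ReLU -> pooling for each mask, then flatten,
    fully connected layer, and SoftMax class probabilities."""
    maps = [max_pool(relu(conv2d(image, k))) for k in kernels]
    features = np.concatenate([m.ravel() for m in maps])
    return softmax(fc_weights @ features + fc_bias)


# Hypothetical 8 x 8 gray image, two 3 x 3 masks, two classes
# (e.g., "sound" and "defective"); all values are random placeholders.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernels = rng.standard_normal((2, 3, 3))
fc_weights = rng.standard_normal((2, 18))   # 2 maps x (3 x 3) pooled values
fc_bias = np.zeros(2)
probs = forward(image, kernels, fc_weights, fc_bias)
print(probs, probs.sum())
```

In a real system, the masks and fully connected weights are learned by back-propagation from labeled images rather than drawn at random.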

Benefits and Drawbacks

The most important capability of CVSs for the fruit and vegetable industry is implementing the grading and classification processes in a non-destructive, faster, and safer manner than conventional methods, leading to a reduction in the employment of laborers. However, illumination, dusty conditions, positioning, and partial occlusion are the main challenges. The accuracy of such CVSs is generally high; however, in most cases, 100% accuracy cannot be achieved. In contrast, workers inspect products more confidently and can grade and classify fruit and vegetables with almost 100% accuracy, but operator tiredness and distraction can significantly affect the accuracy.

Today, with the development of digital MVSs and ultra-fast computers, the time required for IP has been significantly reduced. This technological advancement is very useful for automated machine-based grading, as it allows these systems to operate continuously and at high speed.

The price of imaging systems such as MRI and HSI equipment is an important factor that should be considered. Also, there are some challenges in integrating the imaging devices with automatic sorting lines. The most important issue is the synchronization of the mechanisms of the screening system and response of CVS because new CVSs require a long time for image acquisition, IP, and decision making [62].

In the development of IP algorithms for fruit and vegetable grading systems, the procedure of the algorithm must be correctly and accurately designed and developed. If any of the steps (preprocessing, segmentation, feature extraction and selection, and the artificial intelligence-based models) contains bugs, these imperfections adversely affect the subsequent steps and ultimately reduce system performance. Therefore, it is better to develop these steps individually, to identify and fix probable bugs, and finally to integrate the overall algorithm.

The major issue is that, unlike industrial products, fruit and vegetables are very non-uniform and heterogeneous, and their properties depend on various factors such as climate, cultivation area, variety, and growing conditions. The variations and heterogeneities within a particular product are so great that in many cases automating the defect detection process by CVSs faces serious challenges. Consequently, the same CVS approach may produce different accuracies on the same product. Although real-time defect detection based on CVSs has recently been developed [117,118,119], there is still a long way to go before commercial application.

Future Trends

Defect detection in fruit and vegetables is still a difficult process because of the huge variation in defect types [68, 120]. In the future, more research should be conducted to propose solutions to the current challenges in the detection of defects in fruits and vegetables. The main challenges are uniform light distribution on curved surfaces, stem/calyx recognition, selection of effective and appropriate wavelengths for different products, image acquisition and processing times for spectral images, and defect discrimination. Deep learning, as a new approach in the field of IP, has attracted the attention of many researchers and may be one direction of future work for defect detection in fruit and vegetables; applying deep learning in combination with MVSs can help to overcome many of the problems in detecting defects. The development of low-cost CVSs based on hyperspectral and multispectral imaging, together with the miniaturization of such equipment while maintaining performance, can help them to be commercialized and used in industry. Research on the integration of CVSs with automatic screening lines is also helpful. Scanning the entire surface of the product is essential to detect defects accurately. Fruits and vegetables are usually close to elliptical, spherical, or conical in shape, which leaves half of the surface hidden from the camera. A rotating device would help to provide thorough scanning by taking several images from different angles.

In many studies, a single uniform background is used so that the fruit or vegetable can be separated from the background easily and rapidly. In industry, the presence of dust, noise, etc., degrades the performance of these simplified laboratory systems. Therefore, future research should expand training datasets to cover other conditions, such as various backgrounds and lighting angles during scanning. Future work can focus on improving the generalization ability of models to new data by applying data augmentation methods, and on applying cutting-edge classification methods such as deep learning techniques to build a more compact pipeline for image processing and classification applications [109].
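
As an illustration of the data augmentation idea mentioned above, the following is a minimal sketch in Python using NumPy. It is not taken from [109] or from any of the reviewed studies; the function name `augment`, the chosen transformations (horizontal flip, 90-degree rotation, brightness gain), and the gain range of 0.7 to 1.3 are assumptions made only for illustration, and a synthetic random array stands in for a real fruit image.

```python
import numpy as np


def augment(image, rng):
    """Return a randomly transformed copy of an H x W x 3 uint8 image.

    Hypothetical helper: the transformations and their ranges are
    illustrative assumptions, not a method from the reviewed literature.
    """
    out = image
    # Random horizontal flip (mimics a different fruit orientation).
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Random rotation by a multiple of 90 degrees in the image plane.
    k = int(rng.integers(0, 4))
    out = np.rot90(out, k=k, axes=(0, 1))
    # Random brightness gain (mimics a change in illumination).
    gain = rng.uniform(0.7, 1.3)
    out = np.clip(out.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    return out


rng = np.random.default_rng(0)
# Synthetic square stand-in for a fruit image; a real image would be loaded here.
fruit = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
# Generate eight augmented variants of the same image.
batch = [augment(fruit, rng) for _ in range(8)]
```

Each call produces a different variant of the same input, so a small set of laboratory images can be expanded to cover more orientations and lighting levels. Varying the background would require an additional segmentation step, which this sketch does not attempt.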

Conclusion

This study provides basic definitions of MV and IP and describes their use in the agricultural industry for product quality assessment, defect detection, grading, and classification. The different types of CVSs and the scope of their use are discussed, and the accuracy of these systems is then compared across the methods selected by researchers for different products. CVSs, including T-CVS, H-CVS, and M-CVS, have been widely used in the quality evaluation and inspection of agricultural products and can provide non-destructive, rapid, and accurate assessment.

We have found that CVSs have great potential for meeting the challenges of defect detection in fruit and vegetables. CVSs are powerful tools for accurate and rapid automatic external quality inspection of fruits and vegetables. Also, with the advancement of technology, traditional methods are no longer very efficient and should be replaced with novel methods such as deep learning or spectral imaging, because these new methods offer higher accuracy and speed than the traditional ones. We have given a comprehensive survey of the technical and technological aspects of the different systems employed for defect detection in fruit and vegetables; we have also discussed the different types of defects present in fruit and vegetables and explored in detail the challenges and issues related to each type. Most of the challenges arise because agricultural products do not have the same external, internal, and physicomechanical properties even within one type of fruit or vegetable, which severely affects the performance of algorithms and defect detection systems.

Novel methods of defect detection in the fruit and vegetable sector should be considered from two main aspects: the technological development of instruments with new capabilities that can overcome the existing challenges, and the development of data processing and analysis algorithms. Establishing a connection between these two aspects can solve many challenges. In addition, the development of high-speed computer systems capable of storing large amounts of information should not be overlooked.

Online defect detection is one of the most desirable applications of nondestructive techniques for grading and sorting fruit and vegetables. However, MVSs are still somewhat ineffective for internal quality assessment and internal defect detection. To improve the performance of such systems, other emerging technologies, including hyperspectral imaging, have been introduced that offer desirable capabilities for exploring internal defects in fruits and vegetables. But these systems are slow, and further efforts are needed to develop high-speed algorithms that adapt this emerging technology to industrial online grading applications. Further research on advanced learning algorithms with reduced IP time, improved feature selection, and robust preprocessing techniques is essential. In developing online systems, applying a proper learning algorithm is a prerequisite for successful defect detection. Therefore, standardizing and providing a framework for the machine learning techniques developed so far for different products would be very helpful in industrializing online defect detection systems.

MVS is the most successful technology for nondestructive, rapid defect detection in the fruit and vegetable industry, but it faces challenges in identifying internal defects, which requires further research. For surface defects, MVS in combination with deep learning has shown successful results by reducing the required IP time and providing a more effective feature extraction procedure. However, more research is essential to improve this method so that it can be applied to all products. New strategies for the innovative use of machine vision-based systems would be very effective in developing and improving these systems. One strategy may be to use high-speed computers and big data storage to build a comprehensive database of images of all defects in fruits and vegetables, which could be used to train deep learning models for automatic grading applications.