1 Introduction

Among the many fruit species grown worldwide, there are about 7500 known cultivars of apple. Worldwide production of apples in 2017 was 77.3 million tonnes, with India accounting for 2.3 million tonnes of the total. Apple grading is carried out mainly by human inspectors, which leads to misclassification. For many years, the food industry has relied on manual inspection, which is inconsistent, laborious, and expensive. Properties such as sweetness and firmness are typically assured by scientific techniques for agricultural products that are destructive and time-consuming [1, 2]. Thus, intelligent, rapid, and non-destructive techniques are required to grade apples [3]. Jackman et al. [4] describe computer vision systems as an alternative to manual inspection for food quality evaluation and control in agriculture. Hyperspectral imaging, commonly used in food science and technology, captures spatial and chemical information simultaneously [5, 6]. In contrast, multispectral imaging, which uses a small number of wavelengths [7], is inexpensive, simple, and rapid. Yu et al. [8] presented a novel edge-based active contour model for medical image segmentation that guarantees the stability of the evolving curve and the accuracy of the numerical computation. Chen et al. [9] proposed a novel matting method based on full feature coverage sampling and an accelerated traveling strategy to obtain good samples for robust sample-pair selection. Zhang et al. [10] presented a deep learning-based method for removing haze from a single input image by estimating a transmission map via joint estimation of clear image detail. Singh and Singh [11] presented a novel technique to grade apples using features such as the histogram of oriented gradients, Law's texture energy, the gray-level co-occurrence matrix, and Tamura features.

Apple varieties are classified as bi-colored (e.g., Fuji, Jonagold) or mono-colored (e.g., Granny Smith, Golden Delicious). Among works on grading mono-colored apples with ordinary machine vision, Leemans et al. [12] proposed a color model-based approach to locally segment defects at the pixel level on 'Golden Delicious' apples. Rennick et al. [13] used color information to extract intensity statistics features from enhanced monochrome images of 'Granny Smith' apples for defect detection. Blasco et al. [14] presented a system to sort apples by applying threshold segmentation to the defective area and attained an 85.00% recognition rate. Xing et al. [15] used the second and third principal components with moment thresholding to identify defective regions and achieved 86.00% accuracy. Suresha et al. [16] used red and green color components for classification and reported 100% accuracy. Dubey and Jalal [17] proposed k-means clustering with a multi-class support vector machine to detect apple fruit diseases and achieved 95.94% accuracy. Ashok and Vinod [18] used a probabilistic neural network approach for detecting healthy and defective apples and achieved 86.52% accuracy. Raihana and Sudha [19] presented a modified watershed segmentation method to detect defective apples using gray-level co-occurrence matrix-based feature extraction and achieved 91.33% accuracy. Ali and Thai [20] proposed a prototype of an automated fruit grading system to detect defects in apples. Moallem et al. [21] proposed a computer vision algorithm to grade 'Golden Delicious' apples using a multilayer perceptron neural network and reported 92.50% accuracy. Jawale and Deshmukh [22] proposed an automatic apple disease evaluation system that detects bruises using a thermal camera and image processing.

Techniques for grading bi-colored apples are outlined here. Wen and Tao [23] proposed a rule-based decision scheme for a single spectral system to sort 'Red Delicious' apples with 85–90% accuracy. Leemans et al. [24] proposed a method based on a Bayesian classification process for defect detection. Leemans et al. [25] used a multi-layer perceptron and quadratic discriminant analysis for classification, with recognition rates of 72.00% and 78.00% for bi-colored and mono-colored apples, respectively. Kleynen et al. [26] proposed a filter using quadratic discriminant analysis for detecting a wide range of defects with a multispectral vision system and achieved 90.80% accuracy. Leemans and Destian [27] proposed a quadratic discriminant classifier to detect defects in apples and achieved a 73.00% recognition rate. Unay and Gosselin [28] proposed an artificial neural network to eliminate false segmentation and achieved a 75.00% recognition rate. Kleynen et al. [29] incorporated Bayes' theorem with a multispectral vision system for the detection of defects in apples and achieved 90.00% accuracy. Kavdir and Guyer [30] proposed back-propagation to grade apples and achieved 84.00–89.00% accuracy. Kleynen et al. [31] presented a multispectral vision system to sort apples with a linear discriminant classifier and attained 90.00% accuracy. Unay and Gosselin [32] introduced pixel-wise processing to grade apples with an artificial neural network at 90.00% accuracy. Xiaobo and Jiewen [33] used an electronic nose system and a near-infrared machine vision system. Zou et al. [34] introduced multiple color cameras to scan the apple surface by thresholding with 96.00% accuracy. Unay et al. [35] used a minimal confusion matrix to extract features for classification with 93.50% accuracy. Bhargava and Bansal [36] reviewed various techniques for fruit and vegetable quality evaluation.

The works reviewed above differ in the quality categories considered, the apple varieties tested, the equipment used, and the imaging technique employed. As a result, it is difficult to find a common basis on which to compare the grading algorithms for the bi-colored and mono-colored groups, and the review concludes as follows: "Quality grading of apple fruit by machine vision is a burdensome task due to the variance of the problem. Thus, the research for a robust, generic and accurate grading system that works for all apple variations while respecting all forms of standards is still in progress" [35]. This paper introduces a fruit grading system that segments the defective part by fuzzy c-means segmentation, extracts several features from the segmented defective skin, and finally applies statistical classifiers to assign each apple to a quality category (healthy or defective).

2 Methodology

In this work, segmentation is first performed rigorously on the defective part of the apple, and the apple is then sorted by quality (healthy or defective). The segmented defect area provides a specific discrimination of defects in image space. Figure 1 shows the proposed methodology.

Fig. 1 The basic workflow for grading of apple fruit

2.1 Image acquisition

Our proposed algorithm uses four databases. The first database consists of 'Jonagold' apples imaged with filters of 80, 40, 80 and 50 nm bandwidth centered at 450, 500, 750 and 800 nm, respectively, using a high-resolution camera with 8 bits per pixel (created by Unay and Gosselin [32]); 1120 of the fruits are normal while 984 contain various defects such as bruise, russet, hail damage, scald, and rot, as shown in Fig. 2. The second database consists of 247 defective apple images (e.g., blotch, rot, scab) and 86 normal images downloaded from Google Images [37] by entering the keywords "healthy apple" and "apple + disease name" for defective apples. The third database contains 'Golden Delicious' apples (created by Blasco et al. [14]) with 100 images (74 healthy and 26 defective) acquired by an EOS 550D digital camera at a resolution of 0.03 mm per pixel. The fourth dataset consists of 112 images (56 healthy and 56 defective) of 40 apples taken at different angles at 4000 × 3000 pixels with a Redmi Note 5 mobile phone (created by the authors). The characteristics of the datasets are summarized in Table 1, and Fig. 3 shows some sample images.

Fig. 2 Samples of defective images: a bruise, b russet, c scald, d rot

Table 1 Characteristics of the dataset used
Fig. 3 Samples of healthy and defective dataset images: a Jonagold apples [32], b Google Images [37], c Golden Delicious apples [14], d created by the authors

2.2 Pre-processing and automatic defect segmentation

Images acquired by a camera contain various kinds of noise that degrade their appearance, so the raw images cannot provide sufficient data for further processing. The images are therefore enhanced by adjusting the intensity values or color map. The European Commission marketing standard [38] for apples defines quality categories that also require defect information. It is therefore necessary to segment the defect specifically, which is difficult because of variations in size, type, texture, and color. In this work, images are taken on a black background, and observation shows that the fruit can easily be separated from the background by thresholding and the defects segmented by fuzzy c-means clustering. Figure 4 depicts three samples from the database along with their segmentation: the first four columns present images from the different filters (Jonagold apples), the last column shows the corresponding manual segmentation, and each row displays an apple damaged by a different defect type. Fuzzy c-means (FCM) [39] employs fuzzy partitioning with graded memberships such that a data point can belong to all clusters. The aim is to minimize a dissimilarity function with respect to the cluster centroids.

Fig. 4 Examples of apple images and their segmentation

The membership matrix U is initialized for fuzzy partitioning subject to the constraint [39]:

$$ \sum_{i=1}^{c} u_{ij} = 1, \quad \forall j = 1, \dots, n $$
(1)

The dissimilarity function is given by,

$$ J\left(U, c_1, c_2, \dots, c_c\right) = \sum_{i=1}^{c} J_i = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} $$
(2)

where uij ∈ [0, 1] is the degree of membership of data point j in cluster i.

  • ci is the centroid of cluster i.

  • c is the number of clusters.

  • dij is the Euclidean distance between data point j and centroid i.

  • m (> 1) is the fuzziness exponent.

The conditions for reaching the minimum of the dissimilarity function are [39]:

$$ c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} x_j}{\sum_{j=1}^{n} u_{ij}^{m}} $$
(3)
$$ u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{d_{ij}}{d_{kj}} \right)^{2/(m-1)}} $$
(4)
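To make the update rules concrete, the following is a minimal NumPy sketch of the fuzzy c-means iteration described by Eqs. (1)–(4). The number of clusters (c = 2, defect versus non-defect), the fuzziness exponent m = 2, the stopping tolerance, and the example file name are illustrative assumptions rather than the exact settings used in this work.

```python
# Minimal fuzzy c-means sketch following Eqs. (1)-(4); parameters are illustrative.
import numpy as np

def fuzzy_c_means(pixels, c=2, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Cluster feature vectors `pixels` (n x d array) into c fuzzy clusters."""
    rng = np.random.default_rng(seed)
    n = pixels.shape[0]
    # Initialize membership matrix U so that each column sums to 1 (Eq. 1).
    u = rng.random((c, n))
    u /= u.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        um = u ** m
        # Update centroids (Eq. 3).
        centroids = (um @ pixels) / um.sum(axis=1, keepdims=True)
        # Euclidean distances d_ij between each centroid i and each point j.
        d = np.linalg.norm(centroids[:, None, :] - pixels[None, :, :], axis=2)
        d = np.fmax(d, 1e-10)  # avoid division by zero
        # Update memberships (Eq. 4).
        u_new = 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1)), axis=1)
        if np.linalg.norm(u_new - u) < tol:
            u = u_new
            break
        u = u_new
    return centroids, u

# Example: segment a grayscale apple image into defect / non-defect regions.
# img = skimage.io.imread("apple.png", as_gray=True)   # hypothetical file name
# pixels = img.reshape(-1, 1).astype(float)
# centroids, u = fuzzy_c_means(pixels, c=2)
# labels = np.argmax(u, axis=0).reshape(img.shape)      # hard segmentation map
```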

2.3 Feature extraction

Segmentation yields distinct, unconnected objects of different sizes and shapes. Each object can be handled separately or together with the others when deciding the quality of the fruit. These segmented regions are then used for feature extraction, as summarized in Table 2.

Table 2 Feature Extracted

2.3.1 Statistical and textural features

The statistical features are first-order measures of the probability of observing random gray pixel values and include mean, standard deviation, variance, smoothness, inverse difference moment, RMS, skewness, and kurtosis. Table 3 lists the corresponding indexes and formulae. These features do not take spatial relations between gray values into account. The textural features are second-order measures computed over pixel pairs and include energy, contrast, entropy, correlation, and homogeneity.

Table 3 A measure of statistical and textural feature
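The first-order statistics of Table 3 can be computed directly from the gray values of the segmented region. The sketch below is a minimal illustration; the region variable, the smoothness definition (1 − 1/(1 + variance), a common textbook formulation), and the returned feature names are assumptions, not the authors' exact implementation.

```python
# Sketch of first-order statistical features over a segmented region.
import numpy as np
from scipy.stats import skew, kurtosis

def first_order_features(region):
    """`region` is a 1-D array of gray values from the segmented defect area."""
    region = region.astype(float)
    variance = region.var()
    return {
        "mean": region.mean(),
        "standard_deviation": region.std(),
        "variance": variance,
        "rms": np.sqrt(np.mean(region ** 2)),
        "smoothness": 1.0 - 1.0 / (1.0 + variance),  # common smoothness definition
        "skewness": skew(region),
        "kurtosis": kurtosis(region),
    }
```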

Geometric moments and textural features are widely used in pattern recognition. Another prominent group of textural features is derived from gray-level co-occurrence matrices (GLCM) [40, 41], which record the number of occurrences of gray-level pairs as a square matrix. Among the GLCM features, the inverse difference moment (IDM) is related to smoothness, while variance and contrast assess local variations. For an M × N gray-level image I, the GLCM parameterized by an offset (Δx, Δy) is defined as [42]:

$$ C_{\Delta x, \Delta y}\left(i, j\right) = \sum_{p=1}^{N} \sum_{q=1}^{M} \begin{cases} 1, & \text{if } I\left(p, q\right) = i \text{ and } I\left(p + \Delta x, q + \Delta y\right) = j \\ 0, & \text{otherwise} \end{cases} $$
(5)

where I(p, q) is the gray level of image I at pixel (p, q).

In this work, Unser's texture features are chosen because the addition (a) and subtraction (s) of two random variables with equal variances are decorrelated; hence the addition and subtraction histograms are used to characterize texture [43]. For a non-normalized image I and a relative displacement (τ1, τ2), the addition and subtraction are defined as:

$$ a\left(p, q; \tau_1, \tau_2\right) = I\left(p, q\right) + I\left(p + \tau_1, q + \tau_2\right) $$
(6)
$$ s\left(p, q; \tau_1, \tau_2\right) = I\left(p, q\right) - I\left(p + \tau_1, q + \tau_2\right) $$
(7)

The addition and subtraction histograms for domain N are defined as:

$$ h_a\left(r; \tau_1, \tau_2\right) = \operatorname{card}\left\{\left(p, q\right) \in N \mid a\left(p, q; \tau_1, \tau_2\right) = r\right\} $$
(8)
$$ h_s\left(t; \tau_1, \tau_2\right) = \operatorname{card}\left\{\left(p, q\right) \in N \mid s\left(p, q; \tau_1, \tau_2\right) = t\right\} $$
(9)
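For illustration, the co-occurrence counts of Eq. (5) and the addition/subtraction histograms of Eqs. (6)–(9) can be computed for a single offset as sketched below. The offset values, the gray-level quantization, and the assumption of non-negative displacements are simplifications for readability; they are not the specific settings used by the authors.

```python
# Sketch of Eq. (5) (GLCM) and Eqs. (6)-(9) (addition/subtraction histograms).
import numpy as np

def glcm(image, dx, dy, levels=256):
    """Co-occurrence matrix C_{dx,dy} per Eq. (5); assumes integer image, dx, dy >= 0."""
    C = np.zeros((levels, levels), dtype=np.int64)
    rows, cols = image.shape
    for p in range(rows - dx):
        for q in range(cols - dy):
            C[image[p, q], image[p + dx, q + dy]] += 1
    return C

def sum_difference_histograms(image, t1, t2, levels=256):
    """Addition (h_a) and subtraction (h_s) histograms per Eqs. (6)-(9); t1, t2 >= 0."""
    rows, cols = image.shape
    ref = image[: rows - t1, : cols - t2].astype(int)
    shifted = image[t1:, t2:].astype(int)
    a = ref + shifted                                   # Eq. (6)
    s = ref - shifted                                   # Eq. (7)
    h_a = np.bincount(a.ravel(), minlength=2 * levels - 1)                 # Eq. (8)
    h_s = np.bincount((s + levels - 1).ravel(), minlength=2 * levels - 1)  # Eq. (9)
    return h_a, h_s
```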

2.3.2 Geometrical features

Features can also be extracted for recognition based on the geometry of the object. However, apple defects do not have characteristic shapes. Hence, we selected geometric features that include solidity, area, major and minor axis length, eccentricity, and perimeter, as listed in Table 4.

Table 4 Geometrical features based on shape

The geometrical features can be extracted using the following steps (a sketch is given after the list):

  • Step 1: Extract the "area" feature directly from the object.

  • Step 2: Form the convex hull using the Graham scan method [44].

  • Step 3: Fit an ellipse with the same second moments as the object to extract the "minor axis length", "major axis length" and "eccentricity" features.
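A minimal sketch of these steps is shown below using scikit-image, which computes the convex hull, fitted ellipse, and perimeter internally (so the Graham scan is not coded by hand here). The mask name and the choice of the largest connected object are assumptions for illustration.

```python
# Sketch of the geometrical features in Table 4 from a binary defect mask.
import numpy as np
from skimage.measure import label, regionprops

def geometrical_features(defect_mask):
    """`defect_mask` is a binary image of the segmented defect."""
    props = regionprops(label(defect_mask.astype(np.uint8)))
    if not props:
        return None
    region = max(props, key=lambda r: r.area)     # largest defect object
    return {
        "area": region.area,
        "solidity": region.solidity,              # area / convex-hull area
        "major_axis_length": region.major_axis_length,
        "minor_axis_length": region.minor_axis_length,
        "eccentricity": region.eccentricity,      # from the fitted ellipse
        "perimeter": region.perimeter,
    }
```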

2.3.3 Gabor wavelet feature

The Gabor wavelet was introduced by Dennis Gabor as a complex function that minimizes the product of its standard deviations in the time and frequency domains [45]. Mathematically,

$$ f(x) = e^{-\left(x - x_0\right)^2 / a^2}\, e^{-i k_0 \left(x - x_0\right)} $$
(10)

The Fourier transform of a Gabor wavelet is also a Gabor wavelet, given by:

$$ F(k) = e^{-\left(k - k_0\right)^2 / a^2}\, e^{-i x_0 \left(k - k_0\right)} $$
(11)
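As a small illustration of Eq. (10), the complex wavelet can be evaluated and applied to an image row as sketched below; the parameter values (x0, k0, a), sampling grid, and example row are illustrative assumptions rather than the filter bank actually used in this work.

```python
# Sketch of the 1-D complex Gabor wavelet of Eq. (10).
import numpy as np

def gabor_wavelet(x, x0=0.0, k0=5.0, a=1.0):
    """Gaussian envelope centred at x0, modulated by a complex exponential."""
    return np.exp(-((x - x0) ** 2) / a ** 2) * np.exp(-1j * k0 * (x - x0))

# Example: filter one image row to obtain a Gabor response magnitude.
# row = image[50, :].astype(float)                 # hypothetical image row
# kernel = gabor_wavelet(np.linspace(-4, 4, 33))
# response = np.abs(np.convolve(row, kernel, mode="same"))
```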

2.3.4 Discrete cosine transform (DCT)

DCT is a powerful transform for extracting suitable features. After applying the DCT to the entire image, some of the coefficients are selected to construct feature vectors. Most conventional approaches select coefficients in a zigzag manner or by zonal masking. The lowest-frequency coefficients are discarded in order to compensate for illumination variations [46].
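A hedged sketch of this idea follows: a separable 2-D DCT is applied, a small low-frequency block is kept, the coefficients are ordered diagonally (a zigzag-like approximation), and the first coefficient(s) are dropped. The block size and number of discarded coefficients are illustrative assumptions.

```python
# Sketch of DCT feature extraction with zigzag-like coefficient selection.
import numpy as np
from scipy.fftpack import dct

def dct_features(image, keep=8, discard=1):
    img = image.astype(float)
    # Separable 2-D DCT: transform rows, then columns.
    coeffs = dct(dct(img, axis=0, norm="ortho"), axis=1, norm="ortho")
    block = coeffs[:keep, :keep]                  # low-frequency corner (zonal mask)
    # Diagonal ordering approximating a zigzag scan: sort by (row + column) index.
    order = np.argsort(np.add.outer(np.arange(keep), np.arange(keep)), axis=None)
    vector = block.ravel()[order]
    return vector[discard:]                       # drop the lowest-frequency coefficient(s)
```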

This work uses several combinations of these features. An overview of the approach used in the present work is outlined in Fig. 5, and a sketch of how the feature sets are assembled follows the figure.

Fig. 5 Different combinations of extracted features
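The cascading itself is simple concatenation of the individual feature vectors, as sketched below; the exact composition of the 15-, 16- and 31-feature sets is not reproduced here, and the input vectors are assumed to come from extractors such as those sketched earlier.

```python
# Sketch of assembling the cascaded feature sets of Fig. 5 by concatenation.
import numpy as np

def build_feature_sets(stats_tex, geom, gabor, dct_vec):
    """Each argument is a 1-D feature vector from one extractor."""
    set_a = np.concatenate([stats_tex, gabor, dct_vec])          # statistical/textural + Gabor + DCT
    set_b = np.concatenate([geom, gabor, dct_vec])               # geometrical + Gabor + DCT
    set_all = np.concatenate([stats_tex, geom, gabor, dct_vec])  # all features cascaded
    return set_a, set_b, set_all
```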

2.4 Grading

Classification is the essential step in fruit grading. Given a sufficient number of training samples, grading is performed with the following classifiers.

Nearest Neighbor Classifier (k-NN) - k-NN measures the closeness of samples using a distance metric and assigns a sample to the most represented category among its closest k neighbors. It is a statistical classifier that relies on the proximity of samples, measured here by the Euclidean distance between points in the input data and the training data [47].

Algorithm

  1. Select the number of neighbors, k.

  2. Compute the Euclidean distance from the new point to the training points and select its k nearest neighbors.

$$ ED = \sqrt{\sum_{i=1}^{N} \left(q_i - p_i\right)^2} $$
(12)

where q = (q1, q2, …, qn) and p = (p1, p2, …, pn) are points in Euclidean space.

  3. Count how many of the k nearest neighbors fall in each category.

  4. Finally, assign the new point to the category containing the most of its k neighbors.

Figure 6 demonstrates the working of k-NN.

Fig. 6 k-NN illustration for k = 5 [49]
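For reference, a minimal scikit-learn sketch of this grading step is given below; it is not the MATLAB implementation cited later in Section 2.4, although k = 5 follows the parameter choice reported there, and the feature matrices are placeholders.

```python
# Sketch of k-NN grading with scikit-learn (k = 5, Euclidean distance).
from sklearn.neighbors import KNeighborsClassifier

def knn_grade(train_features, train_labels, test_features, k=5):
    """Labels are binary: 0 = healthy, 1 = defective."""
    clf = KNeighborsClassifier(n_neighbors=k, metric="euclidean")
    clf.fit(train_features, train_labels)
    return clf.predict(test_features)
```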

Support Vector Machine (SVM) - SVM, originally proposed for two-class problems, is used here for grading. It is a supervised learning method based on the structural risk minimization principle [48]. For binary labels, it computes the hyperplane that separates the classes with maximum margin. To prevent bias due to sample order, the samples are randomly shuffled before being presented to the classifier. Assuming two training classes of hollow and solid dots, H is the optimal hyperplane and H1 and H2 pass through the support vectors, the samples at minimum distance from H, as shown in Fig. 7.

Fig. 7 Linearly separable 2D hyperplane

The decision function is defined by

$$ f(x) = \operatorname{sgn}\left(\sum_{i=1}^{N} a_i^{*} y_i K\left(x, x_i\right) + b^{*}\right) $$
(13)

where ai* are the Lagrange multipliers, b* is the threshold, yi is either +1 or −1 and indicates the class of training point xi, and K(·, ·) is the kernel function.
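A hedged scikit-learn sketch of the SVM grading step is shown below; γ = 10 and C = 80 mirror the parameters reported in Section 2.4, while the use of an RBF kernel and the feature matrices are assumptions for illustration.

```python
# Sketch of SVM grading with scikit-learn (gamma = 10, C = 80).
from sklearn.svm import SVC

def svm_grade(train_features, train_labels, test_features):
    clf = SVC(kernel="rbf", gamma=10, C=80)   # RBF kernel assumed here
    clf.fit(train_features, train_labels)
    return clf.predict(test_features)
```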

Sparse Representation Classifier (SRC) - SRC is a non-parametric, prediction-based learning method that assigns a label to a test sample using a dictionary built from the training samples. The basic block diagram of SRC is shown in Fig. 8. Like k-NN, SRC does not require an explicit training phase. SRC was first introduced in [50], where a dictionary constructed from training samples is used for signal classification, represented by

$$ y_{m \times 1} = D_{m \times n}\, x_{n \times 1} $$
(14)

where y is the input signal, D is the dictionary and x is its sparse representation.

Fig. 8 Basic block diagram of SRC

SRC obtains the sparse representation by solving the following minimization problem:

$$ \min \left\Vert x \right\Vert_0 \quad \text{s.t.} \quad y = Dx $$
(15)

The above ℓ0-minimization is an NP-hard problem [50]; in practice it is relaxed to an ℓ1-minimization or approximated by a greedy pursuit.
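The sketch below illustrates one such approximation: the sparse code is obtained with orthogonal matching pursuit, and the test sample is assigned to the class whose training atoms yield the smallest reconstruction residual. The greedy solver, the sparsity level, and the interface are illustrative assumptions, not the authors' implementation.

```python
# Sketch of SRC: sparse coding via orthogonal matching pursuit, then class-wise residuals.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def src_classify(D, labels, y, n_nonzero_coefs=10):
    """D: (m x n) dictionary whose columns are training samples,
    labels: length-n class label per column, y: length-m test sample."""
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero_coefs, fit_intercept=False)
    omp.fit(D, y)
    x = omp.coef_                                  # approximate sparse representation of y
    residuals = {}
    for cls in np.unique(labels):
        x_cls = np.where(labels == cls, x, 0.0)    # keep coefficients of this class only
        residuals[cls] = np.linalg.norm(y - D @ x_cls)
    return min(residuals, key=residuals.get)       # class with the smallest residual
```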

The above classifiers were selected based on their architectural complexity. k-NN is a popular classifier that makes its decision based on the closeness of training samples under a distance metric. SVM is a widely used classifier that has proven its efficacy in various classification problems. SRC assigns a label to a test sample from its sparse representation over the dictionary of training samples. In this work, implementations from MathWorks Inc., LIB-SVM [51], and an adaptation of Quinlan's work [52] are employed for the k-NN and SVM classifiers, whereas SRC is implemented by the authors. After several trials, the optimum parameters were found to be k = 5 for k-NN and a kernel with γ = 10 and C = 80 for SVM.

2.5 Evaluation

The classification is evaluated by a k-fold (k = 10) cross-validation process. The dataset is partitioned into K complementary subsets; training is performed on K−1 subsets and validation on the remaining one. The whole process is repeated K times so that every subset is used once for validation, and the mean of the results is taken as the final estimate. Figure 9 shows a simplified diagram of the 10-fold cross-validation process.

Fig. 9 A tenfold cross-validation

The prediction performance of the classifiers is evaluated with the following measures: accuracy, sensitivity, and specificity. Accuracy is the proportion of all correct classifications (true positives and true negatives). Sensitivity, also known as recall or the true positive rate, is the probability that a defective sample is correctly detected. Specificity, known as the true negative rate, is the probability that a sound (healthy) sample is correctly detected.

$$ \text{Accuracy}\ (\%) = \frac{\text{True positives} + \text{True negatives}}{\text{Total}} \times 100\% $$
(16)
$$ \text{Sensitivity}\ (\%) = \frac{\text{True positives}}{\text{Total defective}} \times 100\% $$
(17)
$$ \text{Specificity}\ (\%) = \frac{\text{True negatives}}{\text{Total sound}} \times 100\% $$
(18)
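The evaluation procedure can be sketched as below: a stratified 10-fold split, a classifier fitted per fold, and the measures of Eqs. (16)–(18) averaged over the folds. Treating the defective class as positive, the choice of SVM parameters, and the placeholder feature matrices are assumptions for illustration.

```python
# Sketch of 10-fold cross-validation with accuracy, sensitivity and specificity.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC

def cross_validate(features, labels, n_splits=10):
    """labels: 1 = defective (positive class), 0 = healthy (sound)."""
    acc, sens, spec = [], [], []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(features, labels):
        clf = SVC(kernel="rbf", gamma=10, C=80)
        clf.fit(features[train_idx], labels[train_idx])
        pred = clf.predict(features[test_idx])
        tn, fp, fn, tp = confusion_matrix(labels[test_idx], pred, labels=[0, 1]).ravel()
        acc.append((tp + tn) / (tp + tn + fp + fn))  # Eq. (16)
        sens.append(tp / (tp + fn))                  # Eq. (17): true positive rate
        spec.append(tn / (tn + fp))                  # Eq. (18): true negative rate
    return np.mean(acc), np.mean(sens), np.mean(spec)
```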

3 Experimental results

The European Commission marketing standard [38] for apples defines one reject and three acceptable quality categories. Nonetheless, much of the literature performs healthy/defective grading because of the difficulty of collecting and labeling databases. Hence, to allow comparison with the reviewed literature, two-category sorting is adopted here. It is observed from several trials that, for two-category grading, Haralick features from the GLCM matrices degrade the accuracy. Fruit grading is performed with each classifier (k-NN, SRC and SVM), first using Statistical/Textural with Gabor wavelet & DCT features (15 features) and Geometrical with Gabor wavelet & DCT features (16 features) separately, and then with all features (31 features) combined. The total number of apples is 2104 (1120 healthy, 984 defective) for Set 1, 333 (86 healthy, 247 defective) for Set 2, 100 (74 healthy, 26 defective) for Set 3 and 112 (56 healthy, 56 defective) for Set 4. Note that each selected feature set is a combination of Statistical/Textural, Geometrical, Gabor wavelet, and DCT features.

We implement fruit sorting with the different classifiers (k-NN, SRC, SVM) to analyze their classification capacity and to determine which classifier is best for apple classification, using one feature set at a time. The highest recognition rates achieved with SVM for Statistical/Textural with Gabor wavelet & DCT (15 features) are 78.37% (Jonagold apples), 84.00% (Google dataset), 88.54% (Golden Delicious) and 82.57% (self-created). As this accuracy is unsatisfactory, a different combination of features (Geometrical with Gabor wavelet & DCT, 16 features) is considered, which results in 74.15% (Jonagold apples), 83.91% (Google dataset), 83.76% (Golden Delicious) and 79.64% (self-created). These recognition rates are still unsatisfactory. When all features are taken together (Statistical/Textural, Geometrical, Gabor wavelet and DCT, 31 features), the highest recognition rates achieved are 95.21% (Jonagold apples), 93.41% (Google dataset), 92.64% (Golden Delicious) and 87.91% (self-created), as shown in Table 5. Note that the feature set selected with each classifier is presented in Table 2. It can be seen that the SVM classifier is superior to the other classifiers and is best suited for apple recognition. It can also be observed that accuracy increases with the number of features, which is consistent with the literature.

Table 5 Various classifiers based fruit quality grading results for different database

An attempt has been made to study the impact of segmentation, and it was found that the segmentation technique employed in the proposed system accurately segments 93.24% of the images. Further, if only correctly segmented images are considered for training and testing, the defect detection accuracy of the proposed system increases from 95.21% to 96.02%. Although the increase is marginal, it indicates that with improved segmentation techniques the accuracy of the proposed system may increase further.

Moreover, not only the number of features but also the selection of the feature combination is important for identifying defects efficiently. When the number of combined features is increased further, the accuracy is observed to decrease. Therefore, the proposed combination of features is effective for two-category apple grading.

Figure 10 displays the results obtained. In each plot, k-NN, SRC and SVM are taken along the x-axis and the recognition rate obtained on the test images is represented on the y-axis. In each plot, it can be observed that the performance of all the combined features is far better than the performance of the smaller feature sets. Sensitivity and specificity indicate how well the samples were classified. Since classification accuracy alone only indicates the presence of errors, sensitivity and specificity are also reported to better describe the model: lower values indicate classification errors, while higher values indicate correctly classified classes. The false positive rate (FPR) and false negative rate (FNR) were used to measure the errors made by the method; the minimum and maximum values of the FPR are 8.01% and 41.67%, and of the FNR 11.51% and 44.34%, respectively.

Fig. 10 The accuracy rate for different databases with different classifiers

Significant contributions to fruit sorting were made by Unay et al. [34] and Moallem et al. [21]. The fruit databases employed in those works are identical to the ones used by the authors and are summarized in Table 6. Unay's and Moallem's works use the Jonagold apple [32] and Golden Delicious apple [14] databases and report accuracies of 85.60% and 92.50%, respectively. The accuracy obtained by the proposed system is comparable with Moallem et al. [21]; moreover, the increase of about 3%, from 92.50% to 95.21%, is encouraging and satisfactory. Comparative analysis of the proposed system with other existing techniques shows improved results on four different datasets. Hence, our approach contributes an improved recognition rate through cascaded features and a suitable classifier. Nevertheless, the system performance could be improved further by considering more combinations of features.

Table 6 Comparative Analysis for grading of apple fruit

3.1 Proposed methods comparison

The apple sorting results show that a high recognition rate can be achieved with both individual and cascaded features. Table 7 presents a detailed comparison of these methods. We observe that cascaded features generally outperform individual features in terms of the user's and producer's accuracy for each category, overall accuracy, and error. The kappa statistics are consistent with the marketing standard for apples [38], which defines three quality categories (Extra, Class I and Class II) with corresponding tolerances of 5%, 10% and 10%, as observed in the user's accuracy. They also reveal that the proposed method does not fit within these tolerance ranges, because the kappa statistics ignore up-graded and down-graded fruits, whereas the user's accuracy considers both.

Table 7 Maximum accuracy of the proposed system with different classifiers (in percentage)

Concerning computational complexity, individual features such as geometrical or statistical ones are relatively simple to compute, while cascaded features are more effective. In conclusion, the individual features are less reliable, whereas the combination of features is more accurate and reliable. The decision therefore depends on how powerful the user requires the computer vision-based apple inspection system to be.

3.2 Practical implementation

To keep up with industrial throughput, the inspection system has to process at least 10 apples/s. Moreover, to inspect the whole surface of the fruit, multiple images must be acquired. The proposed method runs on an Intel Pentium IV processor (1.5 GHz) with 256 MB of memory, with a computation time on the order of 3 s per view. The computational time can be reduced by using optimized software, efficient hardware, and several inspection units operating simultaneously. Currently, statistical and geometrical features are used; more discriminative features, e.g. local binary patterns, could also be evaluated in terms of inspection accuracy and computational cost.

In the present work, the experimental assessment is performed on single-view images, but it can be extended to multiple-view images as required by industry.

4 Conclusion

In this paper, a fully automatic sorting system for mono- and bi-colored apples is proposed. The fruit area is extracted from the background and segmented by the fuzzy c-means clustering algorithm. After segmentation, multiple features are extracted and fed to the binary (healthy or defective) classifier. The apple sorting results show that the combination of statistical, geometrical, Gabor, and DCT features is more accurate than the individual feature sets. The maximum accuracy of 95.21% with 31 features and the SVM classifier is encouraging and provides a reliable estimate. The results also show that the proposed system with the SVM classifier performs better than with the SRC and k-NN classifiers.

The system performance can be further improved by considering a larger number of apple images, different segmentation techniques, more significant features, and combinations of classifiers. In the future, a more generalized and robust system with improved performance may be developed. The idea of the proposed system may also be extended to other multimedia data such as social media [55], video data [56, 57] and graphics data [58, 59].