Abstract
The application of artificial intelligence methods in agriculture is gaining research attention, with a focus on improving planting, harvesting, and post-harvest handling. Fruit quality recognition is crucial for farmers during harvesting and sorting, for food retailers monitoring quality, and for consumers evaluating freshness. However, there is a lack of multi-fruit datasets to support real-time fruit quality evaluation. To address this gap, we present a new dataset of fruit images aimed at evaluating fruit freshness. The dataset contains images of 11 fruits categorized into three freshness classes, and five well-known deep learning models (ShuffleNet, SqueezeNet, EfficientNet, ResNet18, and MobileNet-V2) were adopted as baseline models for fruit quality recognition using the dataset. The study provides a benchmark dataset for the classification task, which could support research in the field of fruit quality recognition. The dataset is systematically organized and annotated, making it suitable for testing the performance of state-of-the-art methods and new learning classifiers. The research community in the fields of computer vision, machine learning, and pattern recognition could benefit from this dataset by applying it to various research tasks such as fruit classification and fruit quality recognition. The best classifier was ResNet18, with an overall accuracy of 99.8%. The study also identifies limitations, such as the small size of the dataset, and proposes future work to improve deep learning techniques for fruit quality classification tasks.
1 Introduction
Food waste has become a matter of concern in recent years, and studies are being carried out to find creative solutions to reduce it. It has been labelled as a significant issue for the long-term viability of the food supply, demand, and production chains [8]. Food has always been in great demand since it is the primary source of nutrition for all living things. Artificial intelligence (AI) approaches can be used to decrease food waste, for example through smart quality control systems that monitor environmental factors affecting food safety [40].
The food sector places a lot of importance on quality control, which covers the sorting and grading of fruits and vegetables so that they can be used effectively in the manufacturing process. By automating the quality inspection process using computer vision techniques, the production process may be made more efficient overall and human workers can be reassigned to more critical tasks [4]. One of the most crucial and time-consuming phases in the production of fruit products such as jams is grading and sorting the fruits according to their freshness. In essence, a jam production firm would rely heavily on this phase to determine the overall quality of the finished product. To do this, the freshness level must be assessed using measuring techniques and physical characteristics or visual appearance such as color, stiffness/firmness of texture, size, skin gloss, perfect shape, etc. [11, 39]. Quantitative analysis of these characteristics enables customers to purchase fruits and vegetables at the top level of freshness, making the experience more pleasant and healthful. Customer satisfaction is expected to rise as a result. In addition, a grocery store might adjust the price of fruits and vegetables that are about to go bad to keep them from spoiling in the shop, which benefits the store and also lessens waste [13].
Fruits are an important food product needed for balanced human nutrition, providing essential nutrients such as vitamins, minerals, sugars, and organic acids [45]. In smart agriculture, the automatic recognition of fruit quality plays an essential role in increasing production efficiency [5], minimizing sorting time [21], and reducing human intervention [33]. In the agricultural industry there is still a growing need for effective and efficient recognition and classification of fruits and vegetables based on their quality level [42].
Fruit quality recognition is needed to ensure that quality fruit products can be identified, sorted and preserved, thereby preventing food wastage and other losses, both economic and environmental [24]. The scarcity of datasets for fruit quality recognition, especially for multi-fruit quality evaluation, is one of the major factors affecting productivity [5], while existing datasets are mostly dedicated to quality detection for a single specific fruit. A recent review [45] on the application of AI methods in fruit, vegetable and mushroom quality assessment emphasized the role of creating public datasets with the aim of standardizing state-of-the-art methods and improving overall comparison. It is therefore important to develop standardized methods that can identify different fruit products and automatically sort them based on quality criteria.
Conventional quality assessment approaches are typically destructive and off-line in nature. These days, agricultural and food goods including fruit and vegetables are subject to applications of computer vision techniques for quality and safety evaluation and monitoring [21, 27]. Efforts have been made to develop a variety of non-contact, quick, accurate, and environmentally friendly methods for non-invasive examination of various food products, including fruits and vegetables, meat and meat products, fish, poultry, dairy products, eggs, among others [14, 16]. In the agricultural sector, classifying rotting and fresh fruits is a crucial duty and a significant issue since, if done incorrectly, decaying fruits can spread disease and destroy new crops. To save labour expenses associated with rejecting rotting fruits at the manufacturing stage, it is necessary to precisely determine the fruit’s freshness, especially for the harvesting robots that select only fresh fruits [23].
The development of distant and proximal sensing technologies has been greatly aided by the deployment of machine learning models, which span from traditional classification and regression methods to cutting-edge deep neural networks and transfer learning [25]. Precision agriculture is one of these crucial application areas [38]. Owing to developments in machine learning algorithms used in computer vision, proximal sensing technologies have been used for the detection, identification, and measurement of plants and harvests [30]. The application of modern information and communication technologies enables the promotion of qualitatively and quantitatively sustainable actions. Food processing may be aided by the deployment of proximity sensors in the field of operational technologies for food quality control [22]. Mobile and robotic applications support monitoring and treatment operations and thereby provide solutions for digital innovation processes. AI must be integrated into these systems to help the operator make meaningful decisions about the actual status of a plant’s vigour [28] and its fruit ripeness [10].
Pattern recognition and image processing are combined in computer vision. It is a non-destructive technique that enables the analysis and extraction of an image’s characteristics to facilitate classification. It is also acknowledged as a helpful tool for extracting measurements of external features like size, shape, color, and flaws for a variety of applications in the food industries, including assessing the stages of apple ripeness [6], assessing the quality of table grapes [7], mango-fruits recognition [47], evaluating freshness of parsley [48], banana [34], and identifying and recognizing plant diseases [1, 2, 17].
In this paper, we present a new dataset for tracking fruit freshness. We conduct an analysis of publicly available image datasets created for fruit quality evaluation. We concentrate on datasets made available in open format via sites for data exchange. The creation and distribution of publicly accessible datasets enables academics to focus more time and resources on the unbiased assessment and comparison of algorithms. This study aims to fill the gap left by the absence of multi-fruit image datasets in this field.
The novelty of this study can be outlined point-by-point as follows:
-
The study presents a new multi-fruit dataset of fruit images aimed at fruit freshness evaluation. This is a novel contribution as there is a lack of multi-fruit datasets to support real-time fruit quality evaluation.
-
The dataset contains the images of 11 fruits (banana, cucumber, grape, kaki, papaya, peach, avocado, pepper, strawberry, tomato, and watermelon) categorized into three freshness classes (fresh, mildly rotten, fully rotten). This is a novel approach as previous studies typically focused on a single type of fruit.
-
The study adopted five well-known deep learning models (ShuffleNet, SqueezeNet, EfficientNet, ResNet18, and MobileNet-V2) as baseline models for fruit quality recognition using the proposed dataset. This is a novel approach as it provides a comparison of the performance of these models on the new dataset.
-
Validity of trained models: The study tested the models trained on the proposed dataset on the benchmark FruitNet dataset and found that MobileNet-V2 outperformed the other models with an accuracy of 81.5%. This is a novel finding as it demonstrates the validity of the trained models on a different dataset.
The original contributions of this study are as follows:
-
We have created a new multi-fruit dataset for fruit freshness evaluation, filling a gap in the research field.
-
We have demonstrated the suitability of our dataset for training deep learning models, achieving high accuracy rates with ResNet18.
-
We have provided a benchmark dataset for fruit quality detection and classification, which can be used as a standard for testing performance of state-of-the-art methods and new learning classifiers.
-
We have evaluated the performance of baseline CNN models and identified the best one for fruit quality recognition.
-
We have shared our dataset publicly, increasing openness and reproducibility of results, which can benefit the research community in various fields such as computer vision, machine learning, and pattern recognition.
The remainder of the paper is structured as follows. Section 2 discusses the characteristics of publicly accessible image collections of plant fruits and leaves. Section 3 presents the new dataset, and Section 4 presents its exploratory analysis. Section 5 discusses the deep learning models used as baseline classifiers for fruit classification and the evaluation of their performance using the proposed dataset. Section 6 presents and compares the classification results using the proposed dataset and the external FruitNet dataset. Section 7 discusses our results, while Section 8 concludes this paper.
2 Related works and analysis of existing datasets
Recently several datasets of plant fruits and leaves were introduced for various tasks including plant disease recognition and quality evaluation.
Fenu & Malloci [15] introduced the DiaMOS Plant field dataset, made up of 3505 photos of pears with four different diseases, which was gathered to monitor and identify plant problems. Additionally, they conduct a comparative analysis of the datasets used in the literature that are intended for the classification and identification of leaf diseases, emphasizing the elements that increase the utility and informational value of the gathered data.
Medhi & Deb [29] offer a dataset of images showing several banana plant varieties and the diseases that affect them. The diseases and pests considered are Bacterial Soft Rot, Banana Fruit Scarring Beetle, Black Sigatoka, Yellow Sigatoka, Panama disease, Banana Aphids, and Pseudo-Stem Weevil. A potassium deficiency dataset has also been considered. The collection contains more than 8000 images.
Meshram & Patil [31] created an image dataset of high-quality Indian fruits that are widely consumed or exported. To this end, they built the dataset using six fruits: apple, banana, guava, lime, orange, and pomegranate. The dataset is divided into three folders: (1) Good quality fruits, (2) Bad quality fruits, and (3) Mixed quality fruits, with six fruit subfolders in each. The dataset contains over 19,500 images in processed format.
Rajbongshi et al. [35] present a dataset of guava photos that includes both leaf and fruit images. These images are categorized into six classes: Phytophthora, Scab, Styler end Rot, and Disease-free Fruit for guava fruits, and Red Rust and Disease-Free Leave for guava leaves. This dataset is primarily intended for researchers who use deep learning, machine learning, and computer vision to create systems that can identify guava diseases and help guava farmers in their farming.
In Rauf et al. [36], citrus fruits, leaves, and stem are shown in an image dataset. Images of healthy and diseased plants with illnesses including Black spot, Canker, Scab, Greening, and Melanose are included in the dataset, along with images of citrus fruits and foliage. The dataset is intended for researchers who create computer applications to assist farmers in the early diagnosis of plant diseases using machine learning and computer vision methods.
Hughes & Salathe [19] presented the PlantVillage dataset, which consists of 54,309 image samples of healthy and diseased leaves. The images span 14 crop species: soybean, potato, corn, apple, cherry, tomato, grape, peach, blueberry, raspberry, squash, strawberry, bell pepper, and orange. The diseased images include viral, fungal, mold, mite-induced and bacterial diseases, while the healthy category covers twelve plant species (Table 1).
These research papers present various image datasets of plants for disease recognition and freshness monitoring. These datasets cover a range of plant species, including pears, bananas, Indian fruits, guavas, and citrus fruits, with images of healthy plants and those with various diseases. The datasets aim to assist researchers in developing computer vision and machine learning techniques to identify diseases and monitor plant health. Such methods can assist farmers in the early diagnosis of plant diseases, leading to more effective treatment and better crop yields.
3 Proposed dataset
This study presents a new FruitQ-dataset to identify and monitor fruit freshness and level of rottenness. The dataset was composed from videos collected manually from the YouTube (Google, San Bruno, California, United States) video platform with the intention of creating a diverse fruit quality dataset. The selected YouTube videos cover eleven varieties of fruits; the description of each fruit time-lapse video, together with its duration and link, is summarized in Table 2. Since this work aims to identify and classify fruit quality, the images in the dataset were manually annotated according to their freshness as Fresh, Mild, or Rotten. This dataset is suitable for applying machine and deep learning methods to classification and detection tasks.
A total of 9421 images were collected, comprising 3010 fresh, 2376 mild (mildly rotten) and 4035 rotten fruit images. A detailed summary is given in Table 3.
The process for creating and evaluating the dataset is depicted in Fig. 1. The initial step is pre-processing the videos by applying de-watermarking to remove extra visuals, such as text and icon watermarks, from the videos.
Afterwards, a MATLAB (MathWorks, Inc., Natick, Massachusetts, USA) program was written to extract image frames from each YouTube video using the algorithm described in Table 4. Each extracted fruit image frame was saved and annotated appropriately. Further pre-processing resized the images from the original 1280 × 720 pixels to 224 × 224 pixels.
We have generated two variants of the dataset containing the same set of images but arranged differently:
-
The FruQ-Multi dataset is detailed in Table 5, which lists the number of image frames per fruit type for each class. The dataset can be used for developing and training new methods and models for fruit-dependent freshness evaluation and classification.
-
The FruQ-DB is the dataset that combines all fruits types, and it is categorized into three classes as fresh, mild, and rotten. The dataset can be used for developing and training new methods and models for fruit-independent freshness evaluation and classification.
The sample images of pre-processed FruQ-DB images based on their different classes rotten, mild and fresh are shown in Figs. 2a-c, 3 and 4, respectively. Table 6 illustrates the number of fruit object instances and their distribution for each target class.
4 Exploratory analysis of dataset
In this section we present the results of the exploratory data analysis (EDA) of the proposed dataset. We follow the recommendations presented in [26] as guidelines, and our workplan is as follows:
1. Gain as much insight into the dataset as possible by analyzing its structure;
2. Visualize potential relationships (direction and magnitude) between independent variables (i.e., image features) and outcome variables (i.e., target class);
3. Identify outliers and anomalies (values that differ significantly from the rest of the observations);
4. Identify relevant (i.e., important) features;
5. Create target (i.e., classification) models (a predictive or explanatory model).
4.1 Analysis of the structural composition of the dataset
Figures 5 and 6 show a visual representation of the structural relationships between the kinds of fruits and the image freshness labels in the proposed FruitQ dataset. We used an alluvial flow diagram (Fig. 5) and radial plots (Fig. 6). Visual inspection shows that the distribution of data is imbalanced.
For further analysis we use the imbalance ratio (IR) [50], which is a common metric to characterize the degree of class imbalance in a dataset. IR is defined as:
\[ \mathrm{IR} = \frac{N_{maj}}{N_{min}} \tag{1} \]
where \(N_{maj}\) is the sample size of the largest (majority) class and \(N_{min}\) is the sample size of the smallest (minority) class. Based on Eq. (1), the imbalance ratio is 1.7 according to freshness (the largest class is “Rotten” and the smallest class is “Mild”), and 7.8 according to fruit kind (the largest class is “Tomato” and the smallest class is “Strawberry”). The imbalance within the “Fresh” category is 12.5, within “Mild” 20.5, and within “Rotten” 9.8. The largest imbalance according to fruit kind is in the “Pepper” category (27.5), whereas the smallest imbalance is in the “Grape” category (1.5). In summary, the dataset is highly imbalanced, especially in the “Mild” and “Pepper” categories. These imbalances are also clearly seen in Fig. 6.
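As a minimal sketch of the imbalance ratio computation, the snippet below applies the definition to the freshness class counts reported in Section 3 (3010 fresh, 2376 mild, 4035 rotten). The function name `imbalance_ratio` and the dictionary layout are illustrative assumptions, not part of the original MATLAB workflow.

```python
def imbalance_ratio(class_counts):
    """Return N_maj / N_min for a mapping of class name -> sample count."""
    counts = list(class_counts.values())
    if not counts or min(counts) <= 0:
        raise ValueError("every class needs at least one sample")
    return max(counts) / min(counts)


# Freshness class counts reported for the dataset (9421 images in total).
freshness_counts = {"Fresh": 3010, "Mild": 2376, "Rotten": 4035}

ir_freshness = imbalance_ratio(freshness_counts)
print(f"IR by freshness: {ir_freshness:.1f}")  # prints "IR by freshness: 1.7"
```

The same function can be applied to per-fruit counts (Table 5) to obtain the fruit-kind and within-category ratios.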
4.2 Visualization of relationships between image features and target class
We consider the mean values of image pixels in the RGB (Red-Green-Blue) and HSV (Hue-Saturation-Value) color spaces. A comparison of images in the RGB and HSV color spaces by color and by freshness category is presented in Fig. 7. The most noticeable differences were observed in the HSV space, where the values of hue, saturation and value tended to increase as the fruits were decaying. Differences were also observed in the RGB space, where the intensity of the red, green and blue colors tended to decrease as the fruits were decaying.
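The mean color features can be illustrated with a short sketch that uses only the Python standard library. It assumes an image is available as a list of 8-bit (r, g, b) tuples; the function name `mean_rgb_hsv` and the toy pixel values are hypothetical. Note that hue is averaged arithmetically here, which ignores its circular nature; that is a simplification of this sketch.

```python
import colorsys


def mean_rgb_hsv(pixels):
    """Average 8-bit RGB pixels and their HSV equivalents.

    pixels: non-empty list of (r, g, b) tuples with values in 0..255.
    Returns two 3-tuples: mean (R, G, B) in 0..255 and mean (H, S, V) in 0..1.
    """
    if not pixels:
        raise ValueError("pixels must not be empty")
    rgb_sum = [0.0, 0.0, 0.0]
    hsv_sum = [0.0, 0.0, 0.0]
    for r, g, b in pixels:
        # colorsys expects channel values scaled to the range 0..1.
        hsv = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        for c, value in enumerate((r, g, b)):
            rgb_sum[c] += value
        for c, value in enumerate(hsv):
            hsv_sum[c] += value
    n = len(pixels)
    return tuple(s / n for s in rgb_sum), tuple(s / n for s in hsv_sum)


# Hypothetical toy "image" of two pixels.
mean_rgb, mean_hsv = mean_rgb_hsv([(200, 40, 40), (90, 60, 30)])
print(mean_rgb, mean_hsv)
```

Computing these six means per image yields one feature vector per image, which can then be grouped by freshness category for comparison.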
4.3 Statistical analysis of image features
Significant differences in feature distributions must be taken into consideration. We performed the statistical analysis using the p value of a two-sided Wilcoxon rank sum test. The test evaluates the null hypothesis that the two compared samples come from continuous distributions with equal medians, against the alternative that they do not. The test assumes that the samples are independent.
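For illustration, a two-sided rank sum test can be sketched in plain Python using the large-sample normal approximation. This is an assumption-laden sketch rather than the routine used in the study: tied values receive average ranks, no tie or continuity correction is applied, and exact small-sample p values are not reproduced. The function name `rank_sum_test` and the sample values are hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Returns (z, p). Ties get average ranks; the variance is not
    tie-corrected and no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the observations, remembering which sample each came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                    key=lambda t: t[0])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Positions i..j-1 hold ranks i+1..j; their average is (i + 1 + j) / 2.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    expected = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (rank_sum_x - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided tail probability
    return z, p


# Hypothetical per-image mean red values for two freshness categories.
fresh_red = [182.0, 175.5, 190.2, 168.4, 177.9, 185.1]
rotten_red = [120.3, 131.8, 115.6, 142.0, 125.4, 138.7]
z, p = rank_sum_test(fresh_red, rotten_red)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A p value below the chosen threshold (0.05 or 0.001 in this section) would indicate a statistically significant difference between the two categories for that feature.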
In the RGB color space, we have found the following statistically significant differences between the values of:
-
red color of fresh and mildly damaged fruits (p < 0.05), fresh and rotten fruits (p < 0.001), and mildly damaged and rotten fruits (p < 0.001);
-
green color of fresh and rotten fruits (p < 0.001), and mildly damaged and rotten fruits (p < 0.001);
-
blue color of fresh and mildly damaged fruits (p < 0.001), fresh and rotten fruits (p < 0.001), and mildly damaged and rotten fruits (p < 0.001).
Note that in all cases, the difference between the fresh and rotten categories is more significant than the difference between fresh and mildly damaged fruits.
In the HSV color space, we found statistically significant differences between the:
-
hue of fresh and mildly damaged fruits (p < 0.001), fresh and rotten fruits (p < 0.001), and mildly damaged and rotten fruits (p < 0.001).
-
saturation of fresh and mildly damaged fruits (p < 0.001), fresh and rotten fruits (p < 0.001), and mildly damaged and rotten fruits (p < 0.001).
-
value of fresh and rotten fruits (p < 0.001), and mildly damaged and rotten fruits (p < 0.001).
4.4 Analysis of distribution for anomaly and outlier detection
The outlier observations and skewed distributions, which will bias the results in the direction of their skew, can have a significant impact on classification results thus degrading the classifier’s ability to learn from the data [50]. The distribution of the color (red, green, blue) values in the two-dimensional RGB color space is shown in Figs. 8, 9 and 10, while the distribution of the HSV color space values is shown in Figs. 11, 12 and 13.
The fruit kind-based analysis using a two-sided Wilcoxon rank sum test showed that the difference between freshness categories is highly significant (p < 0.001), except for:
-
tomato in green color between fresh and mild categories (not significant).
-
pepper in red color and value between fresh and mild categories (p < 0.05).
-
grape in red, green, saturation and value between mild and rotten categories (not significant).
-
pepper in red color and in value (HSV) between mild and rotten categories (p < 0.05).
-
tomato in green color between mild and rotten categories (not significant).
The outlier analysis involves calculating the ratio of outliers in a particular dataset or subset of the dataset. Usually, outliers are defined as values differing by more than three standard deviations from the mean, as follows:
\[ r_o^f = \frac{1}{N_c} \left| \left\{ x_i^f : \left| x_i^f - \mu_f \right| > 3\sigma_f \right\} \right| \]
where \({r}_o^f\) is the outlier ratio according to some feature f, \(x_i^f\) is the value of feature f for instance i, \(N_c\) is the total number of instances in the class, and \(\mu_f\), \(\sigma_f\) are the mean and standard deviation of the feature f values.
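The three-standard-deviation rule can be sketched as follows. The function name `outlier_ratio` is hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption of this sketch, since the source does not specify which was used.

```python
import statistics


def outlier_ratio(values, n_sigma=3.0):
    """Fraction of values farther than n_sigma standard deviations from the mean."""
    if not values:
        raise ValueError("values must not be empty")
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumption)
    if sigma == 0:
        return 0.0  # all values identical, so nothing can be an outlier
    n_outliers = sum(1 for v in values if abs(v - mu) > n_sigma * sigma)
    return n_outliers / len(values)


# Hypothetical feature values for one class: 99 typical images and one extreme.
feature_values = [10.0] * 99 + [1000.0]
print(outlier_ratio(feature_values))  # prints 0.01
```

With very small classes the rule cannot flag anything, because a single value cannot lie more than three standard deviations from a mean it strongly influences, so the ratio is most informative for classes with many instances.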
Another approach is the distance sum-based outlier definition [46], which considers the sum of distances from a point to its k nearest neighbors as its outlier degree; the n points with the largest sums are designated as outliers, denoted O(k, n).
The parameters k and n can be selected freely, however here both parameters in determining nearest neighbors and outliers are set to 10, as recommended by [46].
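A brute-force sketch of the O(k, n) definition is given below. It assumes each image is represented by a numeric feature vector and uses Euclidean distance; both choices, as well as the function name `knn_distance_sum_outliers`, are assumptions made for illustration.

```python
import math


def knn_distance_sum_outliers(points, k=10, n=10):
    """Return indices of the n points with the largest k-nearest-neighbour distance sums."""
    if len(points) <= k:
        raise ValueError("need more than k points")
    scores = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append((sum(dists[:k]), i))
    scores.sort(reverse=True)  # largest outlier degree first
    return [index for _, index in scores[:n]]


# Hypothetical 2-D features: a 5 x 4 grid of typical points plus one distant point.
features = [(float(i % 5), float(i // 5)) for i in range(20)] + [(100.0, 100.0)]
print(knn_distance_sum_outliers(features, k=3, n=1))  # prints [20]
```

The pairwise comparison costs quadratic time in the number of points, which is acceptable for per-class subsets of a dataset of this size but would need a spatial index for much larger collections.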
The results of outlier analysis are presented in Fig. 14. It shows the outliers in the “Rotten” category of each fruit in the proposed dataset. From visual inspection one can note that all these fruits are highly decayed.
5 Baseline classification and performance evaluation
5.1 Classification
We have adopted five deep learning classifiers, namely, MobileNet-V2 [37], SqueezeNet [20], ShuffleNet [49], EfficientNet [43], and ResNet18 [18], for classification of fruit images according to the level of freshness or rottenness. Our choice of these classifiers is based on their strong performance, reduced computational complexity, and good generalization ability, especially in classification [9]. MobileNet-V2 was designed for lightweight, low-latency systems such as Internet-of-Things and computer vision applications in many fields. SqueezeNet is a lightweight model with fewer structural parameters and fewer calculations, and its structure and classification accuracy meet real-time requirements. ShuffleNet is an efficient CNN primarily designed for applications with low computational capability. EfficientNet is accurate, lightweight and robust in various image classification tasks. The residual block structure in the ResNet18 network may allow the model to learn deeper characteristics of images.
The image input size was configured per model: MobileNetV2, ResNet18, EfficientNet and ShuffleNet use an input size of 224 × 224 pixels, while SqueezeNet uses 227 × 227 pixels. The number of images in the FruitQ dataset was increased using data augmentation; the techniques adopted include random rotation, scaling, flipping, shearing, and translation. The experimental results for the deep learning models are promising, as presented in Section 6.
5.2 Performance evaluation
The performance of the baseline models in every round is evaluated using eight metrics, as presented in Eq. (1–7). These performance metrics were analysed for each dataset (FruitQ dataset and FruitNet dataset). The metrics, their descriptions and mathematical expressions are as follows.
Precision (PRC), also referred to as positive predictive value, is the ratio of the correctly predicted positive samples to the total number of all predicted positive samples:
\[ \mathrm{PRC} = \frac{TP}{TP + FP} \]
where TP, FP, TN and FN denote the numbers of true positive, false positive, true negative and false negative samples, respectively.
Negative Predictive Value (NPV) is the ratio of the correctly predicted negative samples to the total number of all predicted negative samples:
\[ \mathrm{NPV} = \frac{TN}{TN + FN} \]
Recall (RCL) is the ratio of correctly predicted positive samples to the total number of all positive samples:
\[ \mathrm{RCL} = \frac{TP}{TP + FN} \]
F1-Score is the harmonic mean of precision and recall:
\[ \mathrm{F1} = \frac{2 \cdot \mathrm{PRC} \cdot \mathrm{RCL}}{\mathrm{PRC} + \mathrm{RCL}} \]
Accuracy (ACC) is the ratio of all correctly predicted positive and negative samples to the total number of samples:
\[ \mathrm{ACC} = \frac{TP + TN}{TP + TN + FP + FN} \]
Balanced Accuracy (BAC) is used to provide a better evaluation in case of imbalanced data and it can be defined as:
where c ∈ [0, 1] is a penalization cost of misclassifying the positive sample.
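Cohen’s Kappa Coefficient (KAP) is used to estimate or measure the agreement between two samples. Kappa indicates the level of agreement and reliability, thereby providing a fair measure of model performance, as presented in the expression:
\[ \kappa = \frac{p_o - p_e}{1 - p_e} \]
where \(p_o\) is the observed agreement and \(p_e\) is the agreement expected by chance.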
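As a hedged illustration, the count-based metrics described in this section, together with Cohen's kappa in its standard form (observed agreement corrected for chance agreement), can be computed from the four cells of a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical; the cost-weighted balanced accuracy is left out because its expression depends on the chosen penalization cost c.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute PRC, NPV, RCL, F1, ACC and Cohen's kappa from confusion counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")

    def ratio(num, den):
        # Guard against empty denominators (e.g. no predicted positives).
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    npv = ratio(tn, tn + fn)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    accuracy = (tp + tn) / total
    # Chance agreement: product of predicted and actual marginals per class.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = ratio(accuracy - p_e, 1 - p_e)
    return {"PRC": precision, "NPV": npv, "RCL": recall,
            "F1": f1, "ACC": accuracy, "KAP": kappa}


# Hypothetical counts for a fresh-vs-rotten test run (not taken from the tables).
for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
    print(f"{name}: {value:.4f}")
```

For the three-class setting, the same function can be applied per class in a one-vs-rest fashion, treating the class of interest as positive and the other two as negative.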
6 Results
The experiments in this paper were carried out on a computer running the Windows 10 operating system, configured with a 64-bit 2.20 GHz Core i5 CPU, 8 GB of memory, and a 1 TB HDD. The deep learning classification framework was implemented and the experiments were performed in MATLAB R2022a (MathWorks, Inc.). The training and testing times on the FruitQ-DB dataset using the ShuffleNet, SqueezeNet, EfficientNet, ResNet18, and MobileNet deep learning models are summarized in Table 7.
6.1 Multi-class baseline experiment with FruitQ-datasets
This section presents a multiclass classification analysis of the FruitQ-datasets using five state-of-the-art DL models. The FruitQ-dataset was divided randomly in an 80:20 ratio, where 80% of the data was used for training and 20% for validation. In addition, we applied K-fold cross-validation to each model with K = 5; the results of the baseline models are summarized in Table 8.
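The random 80:20 split and the 5-fold partitioning can be sketched with index lists, as shown below. This is a simplified illustration and not the MATLAB procedure used in the experiments: the function names, the fixed seed, and the absence of per-class stratification are all assumptions of the sketch.

```python
import random


def train_val_split(n_samples, train_fraction=0.8, seed=0):
    """Randomly split sample indices into training and validation lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * n_samples))
    return indices[:cut], indices[cut:]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


# 9421 is the total number of images reported for the dataset.
train_idx, val_idx = train_val_split(9421)
print(len(train_idx), len(val_idx))  # prints "7537 1884"

for fold_number, (train, val) in enumerate(k_fold_indices(9421, k=5), start=1):
    print(fold_number, len(train), len(val))
```

Because the dataset is imbalanced, a stratified variant that splits each freshness class separately may give more stable validation estimates.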
Our baseline CNN models were trained using the Adam optimizer with an initial learning rate selected from {1e−3, …, 1e−5}, 20 epochs, and a mini-batch size of 256. To reduce overfitting, we applied L2 regularization with a value of 1e−4 and a dropout rate of 50%.
The best experimental performance was obtained using the ResNet18 model, with a validation accuracy of 99.8%, followed by 99.4% for SqueezeNet, 99.1% for EfficientNet, 96.8% for ShuffleNet, and 96.3% for MobileNet.
From Table 8, we can see that the ResNet18 model achieved the best recall rate, 100% for each of the fresh, mild, and rotten classes. In addition, its precision rate for the fresh and rotten classes is 100%, while for the mild class it is 99.8%. The model with the lowest performance is MobileNet, with an accuracy of 96.3%; for the rotten class its recall rate is 98.5%, precision 97.85%, specificity 96.57%, and balanced accuracy 96.18%. A graphical representation of the experimental results is depicted in Fig. 15a-d, where Fig. 15a shows that ResNet18 outperforms the other baseline models with an accuracy of 99.8%, and the lowest accuracy was achieved by MobileNet with 96.3%. Figure 15b shows that the rotten class had the best recall for all models, with the EfficientNet and ResNet18 models reaching 100% for both the mild and rotten classes. However, MobileNet-V2 has the lowest recall rate in comparison with the other baseline models.
The confusion matrices of the models are presented in Fig. 16a-d, showing the number of misclassified samples in each class. For all models, misclassifications occur between fresh and mild or between mild and rotten; no samples are misclassified between fresh and rotten, i.e., the classification rate of fresh vs. rotten is 100% for all the baseline models. ResNet18 shows the best results for all the performance metrics, but at a higher computational cost in terms of execution time and memory. Fine-tuning some parameters and applying dropout to reduce the number of unnecessary deep layers also effectively reduced the computational time and improved the detection rate.
6.2 Binary classification using FruitNet datasets on our baseline models
To further validate the results obtained, we tested our models on another dataset, the “FruitNet” dataset from Vishal et al. [32]. This dataset consists of 12,000 images of six types of fruits in two classes, “good” and “bad”. For this study, we renamed the classes “Fresh” and “Rotten” for consistency with our class nomenclature. Our test set therefore comprises 12,000 samples, with 6000 in the fresh class and 6000 in the rotten class.
In this study, we performed a binary classification task using two classes of the FruitQ-datasets, i.e., the Fresh and Rotten classes, to train the baseline models. For training, we conducted a series of experiments with the learning rate set to 0.001, and we applied the Adam optimizer to minimize the cost function. For each experiment, the deep learning models were fine-tuned to enhance the accuracy on the FruitNet dataset images used for testing. To reduce overfitting of our baseline models, we fine-tuned some of the parameters, such as lowering the learning rate where appropriate for a specific baseline model, until we arrived at optimal results: the learning rate was lowered to 1e−4, with a learn rate drop factor of 2e−4 to 1e−3, a maximum of 10 epochs, and a mini-batch size of 256. The classification results on the test dataset (FruitNet) for the five learning models MobileNetv2, SqueezeNet, ShuffleNet, EfficientNet, and ResNet18, after training each model with the FruitQ datasets, are presented in Table 9.
As seen from Table 9, the performance of the CNN classifiers still needs improvement for real-time fruit classification. The MobileNet classifier outperformed the rest of the models with an accuracy rate of 81.5%, recall rate of 76.2%, NPV rate of 78.47%, and F1-score of 80.45%. Specifically, MobileNet's accuracy was higher by 2.6%, 5.8%, 3.9% and 3.5% than that of the ShuffleNet, SqueezeNet, EfficientNet and ResNet18 models, respectively. In addition, its recall rate was higher than that of ShuffleNet by 5.9%, SqueezeNet by 15.53%, EfficientNet by 15.57% and ResNet18 by 10.97%. The precision rate of MobileNet was lower than that of SqueezeNet by 1.46%, EfficientNet by 10.85%, and ResNet18 by 2.31%. Overall, MobileNet improved accuracy by at least 2.6% over the other models on fresh and rotten fruit classification. The comparison of the deep learning models on the fresh and rotten fruit classification experiments in terms of accuracy, recall, precision, F1-score and NPV is presented in Fig. 17a-e.
Furthermore, we used the t-distributed stochastic neighbor embedding (t-SNE) algorithm [44], a data visualization tool, to visualize the classification results of the MobileNetv2 model by mapping the high-dimensional features of the two classes to clusters. Previous studies have shown that t-SNE can be used for reducing dimensionality, visualizing high-dimensional datasets, and showing the distribution of data samples [3]. Fig. 18a shows the mapping of the fresh and rotten fruit classes of the FruitQ dataset on the MobileNetv2 network. For the FruitQ dataset, the two classes are separated into two distinct clusters (fresh, rotten). We observed that the FruitQ clusters show a higher score for both the cluster-1 and cluster-2 fruit groups. Fig. 18b depicts the results of the MobileNetv2 model on the mapped FruitNet dataset.
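For reference, the standard t-SNE formulation from [44] can be summarized as below. The notation is introduced here and does not appear in the text above: x_i are the N high-dimensional feature vectors, y_i their low-dimensional map points, and σ_i a per-point bandwidth set through the perplexity parameter.

```latex
% Pairwise similarities in the high-dimensional feature space
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

% Pairwise similarities in the low-dimensional map (Student-t kernel)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Objective minimized over the map points y_i
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Because the objective preserves local neighborhoods rather than global distances, well-separated clusters in the map indicate that the network's features distinguish the two classes, while distances between clusters should not be over-interpreted.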
7 Discussion
The data are essential for those developing smart systems in agriculture and food engineering, especially in fruit quality recognition. This study provides a comprehensive analysis of a new multi-fruit dataset for fruit quality evaluation and demonstrates the suitability of the dataset for training deep learning models. The results of the study could help improve the accuracy and efficiency of fruit quality recognition systems and contribute to the advancement of the agricultural sector. The FruitQ-DB dataset provides a benchmark dataset for the classification task, which could improve research endeavors in the field of fruit quality recognition. It could be used as a standard benchmark dataset for testing the performance of state-of-the-art methods and new learning classifiers, as it is systematically organized and annotated. The study highlights the best methods for dataset construction, which could improve the completeness and representativeness of the dataset for further research efforts. The FruitQ-DB dataset is publicly available, which could increase openness and reproducibility of the results. The research community in the fields of computer vision, machine learning, and pattern recognition could benefit from the FruitQ-DB dataset by applying it in various research tasks such as fruit classification, fruit quality recognition, analysis, and comparison of deep/machine learning models and techniques.
Current datasets have some constraints, particularly those related to dataset size, representativeness, completeness, and performance baseline, which are discussed below:
- Dataset size: The number of disease types and the sample sizes are the biggest drawbacks of the present datasets. The “healthy” class often has few samples. In real applications, the model does not generalize well because of this class imbalance. Although the need for larger datasets is acknowledged, building them is difficult because of the manual labour and associated costs, made worse by the fact that few instances can be found for some classes. Data augmentation, transfer learning, and fine-tuning can help to mitigate this issue.
- Representativeness: The most frequently used acquisition protocol is data collection in controlled laboratory settings. Both the location and the method of collection affect how representative the dataset is. Controlled conditions cannot accurately reflect the range of variability found in the field. Algorithms frequently attain near-perfect accuracy on laboratory datasets, but their performance suffers greatly on field datasets. Few datasets also consider how symptoms change over the course of a full growing season. More work should go into identifying symptoms at their onset, since digital tools are needed at that point to take prompt action to halt the spread of the disease.
- Completeness: Completeness is defined as “the level of breadth, depth, and appropriateness of a datum according to its purpose” [41]. Even though certain datasets are well built, in some instances we found that the ground truth labels are still incomplete. Usability would increase if segmentation masks and bounding boxes were provided.
- Performance baseline: Having a performance baseline available can aid in the development and validation of new methodologies.
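The class imbalance noted under dataset size can be quantified before training. The sketch below is a complementary technique rather than one of the remedies named above (data augmentation, transfer learning, fine-tuning): it computes an imbalance ratio and inverse-frequency class weights that could be passed to a weighted loss. The label names and counts are hypothetical.

```python
from collections import Counter


def imbalance_ratio(labels: list) -> float:
    """Size of the largest class divided by the size of the smallest."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


def inverse_frequency_weights(labels: list) -> dict:
    """Weight each class by n_samples / (n_classes * class_count),
    so rare classes contribute more to a weighted loss."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical label list: a rare "healthy" class and a common "diseased" one.
labels = ["healthy"] * 10 + ["diseased"] * 90
weights = inverse_frequency_weights(labels)
```

With these counts the rare class receives a weight nine times larger than the common class, which matches the imbalance ratio of the label list.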
Moreover, the baseline classifiers faced some challenges when tested on the FruitNet dataset. FruitNet is a noisy dataset whose images have cluttered backgrounds; cleaning or removing these backgrounds should therefore substantially improve the accuracy of the learning models.
While the FruitQ dataset contributes to the field of fruit quality recognition, it is important to acknowledge its limitations and consider them in future research:
- Limited fruit variety: While the FruitQ dataset contains 11 types of fruits, it may not be representative of all fruits or fruit varieties. Therefore, the dataset may not be suitable for evaluating fruit quality recognition systems for fruits that are not included in the dataset.
- Limited freshness classes: The FruitQ dataset only categorizes fruits into three freshness classes (fresh, mildly rotten, and fully rotten), which may not capture the nuances of fruit quality in real-world scenarios. There may be other freshness categories, such as ripe but not yet overripe, that are important for fruit quality evaluation.
- Lack of real-world data: The FruitQ dataset only contains images of fruits taken under controlled conditions in a laboratory setting. The dataset does not include images of fruits taken under real-world conditions, such as in a grocery store or at a farm. Therefore, the dataset may not reflect the variability in fruit quality that occurs in real-world scenarios.
- Limited deep learning models: While the authors applied five well-known deep learning models to the FruitQ dataset, there may be other deep learning models that could perform better for fruit quality recognition. Therefore, the evaluation of deep learning models on the FruitQ dataset may not be exhaustive.
- Lack of external evaluation: While the authors tested the trained models on the benchmark FruitNet dataset, they did not evaluate the models on any other external datasets. Therefore, the generalizability of the models to other datasets may be limited.
8 Conclusions
We provide a benchmark dataset that serves as a baseline for the classification task. The purpose of this dataset is to support future research on recognizing fruit quality in real-world systems. A statistical analysis was carried out to evaluate the image features in the FruQ-DB dataset. We also compared the performance of well-known, computationally lightweight deep learning classifiers by training and testing them on the datasets.
We presented the publicly available fruit image datasets from the literature, together with the websites where they can be accessed. In addition to releasing our dataset, we reviewed the datasets that have been used in the literature to classify and identify fruit quality and freshness. The analysis emphasized the best practices for dataset construction, which affect the information content that the data can convey as well as their usefulness in describing the environment in which they were collected or observed. Several factors were considered when creating the proposed dataset. We intend to enlarge it to improve its completeness and representativeness for further work.
The FruQ-DB dataset is useful to the research community for two main reasons. First, it is a dataset for fruit quality detection or classification that could advance research on quality recognition for specific fruits or fruit varieties. Second, because it is systematically organized and annotated, it could serve as a standard benchmark for testing the performance of state-of-the-art methods and new learning classifiers. We evaluated baseline CNN models (ShuffleNet, SqueezeNet, EfficientNet, ResNet-18, and MobileNet) on the FruitQ dataset, and the results are impressive: ResNet-18 was the best classifier with an accuracy of 99.8%, followed by SqueezeNet at 99.4%, EfficientNet at 99.1%, ShuffleNet at 96.8%, and MobileNet at 96.3%. The FruitQ dataset is publicly available and shared to increase the openness and reproducibility of results. The research community in the fields of computer vision, machine learning, and pattern recognition could also benefit from these data by applying them in various research tasks such as fruit classification, fruit quality recognition, and the analysis and comparison of deep/machine learning models and techniques.
For future work, we intend to further improve the classification performance of deep learning techniques for the fruit quality classification task. In addition, we plan to apply segmentation and data augmentation methods to increase the diversity of the data and improve generalization. Further study will consider reducing overfitting and developing robust deep learning techniques for effective training and improved fruit quality detection. We also plan to expand the dataset by adding more fruits or additional images of the existing fruits, and by collecting images from different regions or seasons to increase its variability. In addition to image data, future work could explore the use of other types of data, such as texture, color, or spectral information, for fruit quality recognition.
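As a rough illustration of the kind of data augmentation mentioned above, the sketch below produces flipped and rotated copies of an image stored as a nested list of pixel values. The choice of transforms is an assumption; the text does not say which augmentations are planned.

```python
def hflip(img: list) -> list:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img: list) -> list:
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rot90_clockwise(img: list) -> list:
    """Rotate the image by 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img: list) -> list:
    """Return the original image plus three geometric variants."""
    return [img, hflip(img), vflip(img), rot90_clockwise(img)]


# Tiny 2x2 "image" with hypothetical pixel values.
variants = augment([[1, 2], [3, 4]])
```

Geometric transforms like these keep the freshness label unchanged, so each variant can reuse the label of its source image.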
Data availability
The dataset used in this study is available online: FruQ-DB (Version v1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.7224690 (accessed on 17 October 2022).
References
Abayomi-Alli OO, Damaševičius R, Misra S, Maskeliūnas R (2021) Cassava disease recognition from low-quality images using enhanced data augmentation model and deep learning. Expert Syst 38(7). https://doi.org/10.1111/exsy.12746
Almadhor A, Rauf HT, Lali MIU, Damaševičius R, Alouffi B, Alharbi A (2021) AI-driven framework for recognition of guava plant diseases through machine learning from DSLR camera sensor based high resolution imagery. Sensors 21(11). https://doi.org/10.3390/s21113830
Anowar F, Sadaoui S, Selim B (2021). Conceptual and empirical comparison of dimensionality reduction algorithms (PCA, KPCA, LDA, MDS, SVD, LLE, ISOMAP, LE, ICA, t-SNE). Computer Science Review, 40 https://doi.org/10.1016/j.cosrev.2021.100378
Bhargava A, Bansal A (2021) Fruits and vegetables quality evaluation using computer vision: A review. J King Saud Univ - Comput Inform Sci 33(3):243–257. https://doi.org/10.1016/j.jksuci.2018.06.002
Bird JJ, Barnes CM, Manso LJ, Ekárt A, Faria DR (2022) Fruit quality and defect image classification with conditional GAN data augmentation. Sci Hortic 293:110684
Cárdenas-Pérez S, Chanona-Pérez J, Méndez-Méndez JV, Calderón-Domínguez G, López-Santiago R, Perea-Flores MJ, Arzate-Vázquez I (2017) Evaluation of the ripening stages of apple (Golden Delicious) by means of computer vision system. Biosyst Eng 159:46–58. https://doi.org/10.1016/j.biosystemseng.2017.04.009
Cavallo D, Pietro Cefola M, Pace B, Logrieco AF, Attolico G (2019) Non-destructive and contactless quality evaluation of table grapes by a computer vision system. Comput Electron Agric 156(2018):558–564. https://doi.org/10.1016/j.compag.2018.12.019
Chauhan C, Dhir A, Akram MU, Salo J (2021). Food loss and waste in food supply chains. A systematic literature review and framework development approach. J Clean Prod, 295, 126438. https://doi.org/10.1016/j.jclepro.2021.126438
Chen L, Li S, Bai Q, Yang J, Jiang S, Miao Y (2021) Review of Image Classification Algorithms Based on Convolutional Neural Networks. Remote Sens 13:4712. https://doi.org/10.3390/rs13224712
Cho WH, Kim SK, Na MH, Na IS (2021) Fruit ripeness prediction based on DNN feature induction from sparse dataset. Comput, Mater Continua 69(3):4003–4024. https://doi.org/10.32604/cmc.2021.018758
Civille GV, Oftedal KN (2012) Sensory evaluation techniques—Make “good for you” taste “good”. Physiol Behav 107(4):598–605
Das S, Datta S, Chaudhuri BB (2018) Handling data irregularities in classification: Foundations, trends, and future challenges. Pattern Recogn 81:674–693. https://doi.org/10.1016/j.patcog.2018.03.008
Elbir Z, Caferoglu BA, Cihan O (2022) Freshness Grading of Agricultural Products Using Artificial Intelligence. In: Khan M, Khan R, Praveen P (eds) Artificial Intelligence Applications in Agriculture and Food Quality Improvement (pp. 29–54). IGI Global. https://doi.org/10.4018/978-1-6684-5141-0.ch003
Fahad LG, Tahir SF, Rasheed U, Saqib H, Hassan M, Alquhayz H (2022) Fruits and vegetables freshness categorization using deep learning. Comput, Mater Continua 71(2):5083–5098. https://doi.org/10.32604/cmc.2022.023357
Fenu G, Malloci FM (2021) DiaMOS plant: A dataset for diagnosis and monitoring plant disease. Agronomy 11(11). https://doi.org/10.3390/agronomy11112107
Fu Y, Nguyen M, Yan WQ (2022) Grading methods for fruit freshness based on deep learning. SN Computer Science 3(4). https://doi.org/10.1007/s42979-022-01152-7
Geetharamani G, Pandian A (2019) Identification of plant leaf diseases using a nine-layer deep convolutional neural network. Comput Electr Eng 76:323–338
He K, Zhang X, Ren S, Sun J (2016) Deep Residual Learning for Image Recognition. In: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30, pp. 770–778
Hughes D, Salathé M (2015). An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv preprint arXiv:1511.08060.
Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K (2016) SqueezeNet: Alex-Net-level Accuracy with 50x Fewer Parameters and <0.5 MB Model Size. arXiv 2019, arXiv:1602.07360
Ismail N, Malik OA (2022) Real-time visual inspection system for grading fruits using computer vision and deep learning techniques. Inform Proc Agricult 9(1):24–37. https://doi.org/10.1016/j.inpa.2021.01.005
Javaid M, Haleem A, Rab S, Pratap Singh R, Suman R (2021). Sensors for daily life: A review. In Sensors International (Vol. 2, p. 100121). Elsevier BV. https://doi.org/10.1016/j.sintl.2021.100121
Kang J, Gwak J (2022) Ensemble of multi-task deep convolutional neural networks using transfer learning for fruit freshness classification. Multimed Tools Appl 81(16):22355–22377. https://doi.org/10.1007/s11042-021-11282-4
Kaur P, Harnal S, Gautam V, Singh MP, Singh SP (2022). An approach for characterization of infected area in tomato leaf disease based on deep learning and object detection technique. Eng Appl Artif Intell, 115, 105210.
Kazi A, Panda SP (2022) Determining the freshness of fruits in the food industry by image classification using transfer learning. Multimed Tools Appl 81(6):7611–7624. https://doi.org/10.1007/s11042-022-12150-5
Komorowski M, Marshall DC, Salciccioli JD, Crutain Y (2016) Exploratory Data Analysis. In: Secondary Analysis of Electronic Health Records. Springer, Cham (CH)
Ma J, Sun D-W, Qu J-H, Liu D, Pu H, Gao W-H, Zeng X-A (2014). Applications of Computer Vision for Assessing Quality of Agri-food Products: A Review of Recent Research Advances. In Crit Rev Food Sci Nutr (Vol. 56, Issue 1, pp. 113–127). https://doi.org/10.1080/10408398.2013.873885
Mavani NR, Ali JM, Othman S, Hussain MA, Hashim H, Rahman NA (2022) Application of Artificial Intelligence in Food Industry—a Guideline. Food Eng Rev 14(1):134–175. https://doi.org/10.1007/s12393-021-09290-z
Medhi E, Deb N (2022). PSFD-musa: A dataset of banana plant, stem, fruit, leaf, and disease. Data in Brief, 43 https://doi.org/10.1016/j.dib.2022.108427
Melki P, Bombrun L, Millet E, Diallo B, ElChaoui ElGhor H, Da Costa J-P (2022) Exploratory Analysis on Pixelwise Image Segmentation Metrics with an Application in Proximal Sensing. Remote Sens 14:996
Meshram V, Patil K (2022). FruitNet: Indian fruits image dataset with quality for machine learning applications. Data in Brief, 40 https://doi.org/10.1016/j.dib.2021.107686
Meshram V, Thanomliang K, Ruangkan S, Chumchu P, Patil K (2020), "FruitsGB: Top Indian Fruits with quality", IEE.
Nemade SB, Sonavane SP (2020). Co-occurrence patterns-based fruit quality detection for hierarchical fruit image annotation. Journal of King Saud University-Computer and Information Sciences
Ni J, Gao J, Deng, L, Han Z (2020). Monitoring the change process of banana freshness by GoogLeNet. IEEE Access, https://doi.org/10.1109/ACCESS.2020.3045394
Rajbongshi A, Sazzad S, Shakil R, Akter B, Sara U (2022). A comprehensive guava leaves and fruits dataset for guava disease recognition. Data in Brief, 42 https://doi.org/10.1016/j.dib.2022.108174
Rauf HT, Saleem BA, Lali MIU, Khan MA, Sharif M, Bukhari SAC (2019). A citrus fruits and leaves dataset for detection and classification of citrus diseases through machine learning. Data in Brief, 26 https://doi.org/10.1016/j.dib.2019.104340
Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC (2018) MobileNet V2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, pp. 18–22
Sharma A, Jain A, Gupta P, Chowdary V (2021) Machine Learning Applications for Precision Agriculture: A Comprehensive Review. IEEE Access 9:4843–4873
Sherafati A, Mollazade K, Saba MK, Vesali F (2022) TomatoScan: An Android-based application for quality evaluation and ripening determination of tomato fruit. Comput Electron Agric 200:107214
Sonwani E, Bansal U, Alroobaea R, Baqasah AM, Hedabou M (2022). An Artificial Intelligence Approach Toward Food Spoilage Detection and Analysis. Frontiers in Public Health (Vol. 9). https://doi.org/10.3389/fpubh.2021.816226
Strong DM, Lee YW, Wang RY, (1997) Data quality in context. Commun ACM , 40, 103–110.
Suryawanshi Y, Patil K, Chumchu P (2022). VegNet: Dataset of vegetable quality images for machine learning applications. Data in Brief, 108657.
Tan M, Le QV (2020) EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2020, arXiv:1905.11946.
Van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J Mach Learn Res 9(11)
Wieme J, Mollazade K, Malounas I, Zude-Sasse M, Zhao M, Gowen A, … J. (2022) Application of hyperspectral imaging systems and artificial intelligence for quality assessment of fruit, vegetables and mushrooms: A review. Biosyst Eng 222:156–176
Xu H, Mao R, Liao H, Zhang H, Lu M, Chen G (2016) Index Based Hidden Outlier Detection in Metric Space. Sci Program 2016:1–14. https://doi.org/10.1155/2016/8048246
Yang J, Luo X, Zhang X, Passos D, Xie L, Rao X, ..., Ying L, (2022). A deep learning approach to improving spectral analysis of fruit quality under interseason variation. Food Control, 109108.
Zarnaq MH, Omid M, Firouz MS, Jafarian M, Bazyar P (2022) Freshness and quality assessment of parsley using image processing and artificial intelligence techniques. CIGR J 24(2):282–290
Zhang X, Zhou X, Lin M, Sun J (2018) Shufflenet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23, pp. 6848–6856.
Zhu R, Guo Y, Xue J-H (2020) Adjusting the imbalance ratio by the dimensionality of imbalanced data. Pattern Recogn Lett 133:217–223. https://doi.org/10.1016/j.patrec.2020.03.004
Funding
This research received no external funding.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
About this article
Cite this article
Abayomi-Alli, O.O., Damaševičius, R., Misra, S. et al. FruitQ: a new dataset of multiple fruit images for freshness evaluation. Multimed Tools Appl 83, 11433–11460 (2024). https://doi.org/10.1007/s11042-023-16058-6
DOI: https://doi.org/10.1007/s11042-023-16058-6