1 Introduction

Food waste has become a matter of growing concern in recent years, and studies are under way to find creative solutions to reduce it. It has been identified as a significant issue for the long-term viability of the food supply, demand, and production chains [8]. Food has always been in great demand since it is the primary source of nutrition for all living things. Artificial intelligence (AI) approaches can be used to decrease food waste, for example through smart quality control systems that monitor environmental factors that can affect food safety [40].

The food sector places great importance on quality control, which includes sorting and grading fruits and vegetables effectively so that they can be used in the manufacturing process. By automating quality inspection using computer vision techniques, the production process may be made more efficient overall, and human laborers may be assigned to more crucial jobs [4]. One of the most crucial and time-consuming phases in the production of fruit products such as jams is grading and sorting the fruits according to their freshness. A jam producer, for example, relies heavily on this phase to determine the overall quality of the finished product. To do this, the freshness level must be assessed using measuring techniques and physical characteristics or visual appearance such as color, firmness of texture, size, skin gloss, and shape [11, 39]. Quantitative analysis of these characteristics enables customers to purchase fruits and vegetables with the highest level of freshness, making the experience more pleasant and healthful, and customer satisfaction is expected to rise as a result. In addition, a grocery store might adjust the price of fruits and vegetables that are about to go bad to keep them from spoiling in the shop, which benefits the store and also reduces waste [13].

Fruits are an important food product needed for balanced human nutrition, providing essential nutrients such as vitamins, minerals, sugars, and organic acids [45]. In smart agriculture, the automatic recognition of fruit quality plays an essential role in increasing production efficiency [5], minimizing sorting time [21], and reducing human intervention [33]. In the agricultural industry there is still a growing need for effective and efficient recognition and classification of fruit products and vegetables based on their quality level [42].

Fruit quality recognition is needed to ensure that quality fruit products can be identified, sorted, and preserved, thereby preventing food wastage and other losses, such as economic and environmental ones [24]. The scarcity of datasets for fruit quality recognition, especially for multi-fruit quality evaluation, is one of the major factors affecting productivity [5], and existing datasets are mostly dedicated to quality detection for a single, specific fruit. A recent review [45] on the application of AI methods in fruit, vegetable, and mushroom quality assessment emphasized the role of creating public datasets with the aim of standardizing state-of-the-art methods and improving overall comparison. It is therefore important to develop standardized methods that can identify different fruit products and automatically sort them based on quality criteria.

Conventional quality assessment approaches are typically destructive and off-line in nature. Computer vision techniques are now applied to agricultural and food goods, including fruit and vegetables, for quality and safety evaluation and monitoring [21, 27]. Efforts have been made to develop a variety of non-contact, quick, accurate, and environmentally friendly methods for non-invasive examination of various food products, including fruits and vegetables, meat and meat products, fish, poultry, dairy products, and eggs, among others [14, 16]. In the agricultural sector, separating rotting fruits from fresh ones is a crucial task and a significant issue since, if done incorrectly, decaying fruits can spread disease and destroy new crops. To save the labour expenses associated with rejecting rotting fruits at the manufacturing stage, it is necessary to determine the fruit’s freshness precisely, especially for harvesting robots that select only fresh fruits [23].

The development of remote and proximal sensing technologies has been greatly aided by the deployment of machine learning models, which span from traditional classification and regression methods to cutting-edge deep neural networks and transfer learning [25]. Precision agriculture is one of the crucial application areas [38]. Proximal sensing technologies have been used for the detection, identification, and measurement of plants and harvest owing to developments in the machine learning algorithms used in computer vision [30]. The application of modern information and communication technologies enables the promotion of qualitatively and quantitatively sustainable actions. Food processing may be aided by the deployment of proximity sensors in the field of operational technologies for food quality control [22]. By assisting in monitoring and treatment operations, mobile and robotic applications are providing answers for digital innovation processes. AI must be integrated into these systems to help the operator make meaningful decisions about the actual status of a plant’s vigour [28] and its fruit ripeness [10].

Pattern recognition and image processing are combined in computer vision. It is a non-destructive technique that enables the analysis and extraction of an image’s characteristics to facilitate classification. It is also acknowledged as a helpful tool for extracting measurements of external features like size, shape, color, and flaws for a variety of applications in the food industries, including assessing the stages of apple ripeness [6], assessing the quality of table grapes [7], mango-fruits recognition [47], evaluating freshness of parsley [48], banana [34], and identifying and recognizing plant diseases [1, 2, 17].

In this paper, we present a new dataset for tracking fruit freshness. We conduct an analysis of publicly available image datasets created for fruit quality evaluation. We concentrate on datasets made available in open format via sites for data exchange. The creation and distribution of publicly accessible datasets enables academics to focus more time and resources on the unbiased assessment and comparison of algorithms. This study aims to fill the gap left by the absence of multi-fruit image datasets in this field.

The novelty of this study can be outlined point-by-point as follows:

  • The study presents a new multi-fruit dataset of fruit images intended for fruit freshness evaluation. This is a novel contribution as there is a lack of multi-fruit datasets to support real-time fruit quality evaluation.

  • The dataset contains the images of 11 fruits (banana, cucumber, grape, kaki, papaya, peach, avocado, pepper, strawberry, tomato, and watermelon) categorized into three freshness classes (fresh, mildly rotten, fully rotten). This is a novel approach as previous studies typically focused on a single type of fruit.

  • The study adopted five well-known deep learning models (ShuffleNet, SqueezeNet, EfficientNet, ResNet18, and MobileNet-V2) as baseline models for fruit quality recognition using the proposed dataset. This is a novel approach as it provides a comparison of the performance of these models on the new dataset.

  • Validity of trained models: The study tested the models trained on the proposed dataset on the benchmark FruitNet dataset and found that MobileNet-V2 outperformed the other models with an accuracy of 81.5%. This is a novel finding as it demonstrates the validity of the trained models on a different dataset.

The original contributions of this study are as follows:

  • We have created a new multi-fruit dataset for fruit freshness evaluation, filling a gap in the research field.

  • We have demonstrated the suitability of our dataset for training deep learning models, achieving high accuracy rates with ResNet18.

  • We have provided a benchmark dataset for fruit quality detection and classification, which can be used as a standard for testing performance of state-of-the-art methods and new learning classifiers.

  • We have evaluated the performance of baseline CNN models and identified the best one for fruit quality recognition.

  • We have shared our dataset publicly, increasing openness and reproducibility of results, which can benefit the research community in various fields such as computer vision, machine learning, and pattern recognition.

The remainder of the paper is structured as follows. Section 2 discusses the characteristics of the publicly accessible image collections of plant fruits and leaves. Section 3 presents the new dataset, and Section 4 presents its exploratory analysis. Section 5 discusses the deep learning models used as baseline classifiers for fruit classification and the evaluation of their performance using the proposed dataset. Section 6 presents and compares the classification results using the proposed dataset and the external FruitNet dataset. Section 7 discusses our results, while Section 8 concludes this paper.

2 Related works and analysis of existing datasets

Recently several datasets of plant fruits and leaves were introduced for various tasks including plant disease recognition and quality evaluation.

Fenu & Malloci [15] introduced the DiaMOS Plant field dataset, made up of 3505 photos of pears with four different diseases, which was gathered to monitor and identify plant problems. Additionally, they conduct a comparative analysis of the datasets used in the literature that are intended for the classification and identification of leaf diseases, emphasizing the elements that increase the utility and informational value of the gathered data.

Medhi & Deb [29] offer a dataset of images showing several kinds of banana plants and the diseases that affect them. The diseases and pests considered are Bacterial Soft Rot, Banana Fruit Scarring Beetle, Black Sigatoka, Yellow Sigatoka, Panama disease, Banana Aphids, and Pseudo-Stem Weevil. A potassium deficiency dataset has also been considered. The collection contains more than 8000 images.

Meshram & Patil [31] created an image dataset of high-quality Indian fruits that are widely consumed or exported. They built the dataset using six fruits: apple, banana, guava, lime, orange, and pomegranate. The dataset is divided into three folders: (1) Good quality fruits, (2) Bad quality fruits, and (3) Mixed quality fruits, with six fruit subfolders in each. The dataset contains over 19,500 images in processed format.

Rajbongshi et al. [35] present a dataset of guava photos that includes images of both leaves and fruit. These images are categorized into six classes: Phytophthora, Scab, Styler end Rot, and Disease-free Fruit for guava fruits, and Red Rust and Disease-Free Leaf for guava leaves. This dataset is primarily intended for researchers who use deep learning, machine learning, and computer vision to create systems that can identify guava diseases and help guava farmers in their farming.

In Rauf et al. [36], citrus fruits, leaves, and stem are shown in an image dataset. Images of healthy and diseased plants with illnesses including Black spot, Canker, Scab, Greening, and Melanose are included in the dataset, along with images of citrus fruits and foliage. The dataset is intended for researchers who create computer applications to assist farmers in the early diagnosis of plant diseases using machine learning and computer vision methods.

Hughes & Salathe [19] presented the PlantVillage dataset, which consists of 54,309 image samples of healthy and diseased leaves. The images cover 14 different crop species: soybean, potato, corn, apple, cherry, tomato, grape, peach, blueberry, raspberry, squash, strawberry, bell pepper, and orange. The diseased images include viral diseases, fungal diseases, mold diseases, diseases caused by mites, and bacterial diseases, while the healthy category covers twelve plant species (Table 1).

Table 1 Description of Existing Fruit databases

These research papers present various image datasets of plants for disease recognition and freshness monitoring. These datasets cover a range of plant species, including pears, bananas, Indian fruits, guavas, and citrus fruits, with images of healthy plants and those with various diseases. The datasets aim to assist researchers in developing computer vision and machine learning techniques to identify diseases and monitor plant health. Such methods can assist farmers in the early diagnosis of plant diseases, leading to more effective treatment and better crop yields.

3 Proposed dataset

This study presents a new FruitQ-dataset to identify and monitor fruit freshness and level of rottenness. The dataset was composed from videos collected manually from the YouTube (Google, San Bruno, California, United States) video platform with the intention of creating a diverse fruit quality dataset. The selected YouTube videos cover eleven varieties of fruits; the description of each fruit time-lapse, with the duration of the videos and their links, is summarized in Table 2. Because this work aims to identify and classify fruit quality, the classes in the dataset were manually annotated according to the level of freshness as Fresh, Mild, or Rotten. The dataset is suitable for applying machine learning and deep learning methods to classification and detection tasks.

Table 2 Fruit video sources from YouTube used for creating FruQ-DB

A total of 9421 images were collected, comprising 3010 fresh, 2376 mild (mildly rotten), and 4035 rotten fruit images. A detailed summary is given in Table 3.

Table 3 Dataset description

The process for creating and evaluating the dataset is depicted in Fig. 1. The initial step is pre-processing the videos by applying de-watermarking to remove extra visuals, such as text and icon watermarks, from the videos.

Fig. 1 Workflow of the dataset creation and evaluation

Afterwards, a MATLAB (MathWorks, Inc., Natick, Massachusetts, USA) program was written to extract image frames from each YouTube video using the algorithm described in Table 4. Each extracted fruit image frame was saved and annotated appropriately. Further pre-processing resized the images from the original size of 1280 × 720 pixels to 224 × 224 pixels.

Table 4 Algorithm for image frame extraction
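As a rough illustration of the frame extraction and resizing step, the sketch below keeps every `step`-th frame index and downsizes a frame with nearest-neighbour sampling. It is not the MATLAB program used in this work: the function names, the fixed sampling stride, and the nearest-neighbour interpolation are all assumptions made for illustration, and the example frame is synthetic.

```python
def sample_frame_indices(total_frames, step):
    """Return the indices of the frames kept when saving every `step`-th frame."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return list(range(0, total_frames, step))


def resize_nearest(image, out_h, out_w):
    """Resize a grid of pixels (a list of rows) by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Synthetic 1280 x 720 frame stored as rows of (R, G, B) tuples (hypothetical data).
frame = [[(x % 256, y % 256, 0) for x in range(1280)] for y in range(720)]

small = resize_nearest(frame, 224, 224)               # 224 x 224 output grid
kept = sample_frame_indices(total_frames=100, step=30)  # assumed stride of 30
```

Note that resizing 1280 × 720 directly to 224 × 224 changes the aspect ratio; whether cropping or padding was applied beforehand is not stated in the text.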

We have generated two variants of the dataset that contain the same set of images but arrange them differently:

  • The FruQ-Multi dataset, summarized in Table 5, organizes the image frames by fruit type and freshness class. It can be used for developing and training new methods and models for fruit-dependent freshness evaluation and classification.

  • The FruQ-DB dataset combines all fruit types and is categorized into three classes: fresh, mild, and rotten. It can be used for developing and training new methods and models for fruit-independent freshness evaluation and classification.

Table 5 Summary of Extracted Images per fruit types and class samples (FruQ-Multi)

Sample pre-processed FruQ-DB images from the rotten, mild, and fresh classes are shown in Figs. 2a-c, 3 and 4, respectively. Table 6 gives the number of fruit object instances and their distribution for each target class.

Fig. 2 Random FruQ-DB samples of class “Rotten”

Fig. 3 Random FruQ-DB samples of class “Mild”

Fig. 4 Random FruQ-DB samples of class “Fresh”

Table 6 Summary of extracted mixed images per class samples (FruitQ-DB)

4 Exploratory analysis of dataset

In this section we present the results of the exploratory data analysis (EDA) of the proposed dataset. We follow the recommendations presented in [26] as guidelines, and our workplan is as follows:

  1. Gain as much insight into the dataset as possible by analyzing its structure;

  2. Visualize potential relationships (direction and magnitude) between independent variables (i.e., image features) and outcome variables (i.e., target class);

  3. Identify outliers and anomalies (values that differ significantly from the rest of the observations);

  4. Identify relevant (i.e., important) features;

  5. Create target (i.e., classification) models (a predictive or explanatory model).

4.1 Analysis of the structural composition of the dataset

Figures 5 and 6 show a visual representation of the structural relationships between the kinds of fruits and the image freshness labels in the proposed FruitQ dataset, using an alluvial flow diagram (Fig. 5) and radial plots (Fig. 6). Visual inspection shows that the distribution of the data is imbalanced.

Fig. 5 Composition of a dataset using alluvial flow diagram. Each kind of fruit is related to fruit freshness class

Fig. 6 Composition of dataset by freshness type (a) and by fruit kind (b)

For further analysis we use the imbalance ratio (IR) [50], a common metric for characterizing how imbalanced a dataset is. IR is defined as:

$$\textrm{IR}=\frac{N_{\textrm{maj}}}{N_{\textrm{min}}}$$
(1)

where \(N_{\textrm{maj}}\) is the sample size of the largest (majority) class and \(N_{\textrm{min}}\) is the sample size of the smallest (minority) class. Based on Eq. (1), the imbalance ratio is 1.7 according to freshness (the largest class is “Rotten” and the smallest is “Mild”) and 7.8 according to fruit kind (the largest class is “Tomato” and the smallest is “Strawberry”). The imbalance within the “Fresh” category is 12.5, within “Mild” 20.5, and within “Rotten” 9.8. The largest imbalance according to fruit kind is in the “Pepper” category (27.5), whereas the smallest is in the “Grape” category (1.5). In summary, the dataset is highly imbalanced, especially in the “Mild” and “Pepper” categories. These imbalances are also clearly seen in Fig. 6.
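As a quick check of Eq. (1), the following minimal sketch computes the imbalance ratio from the per-class image counts reported in Section 3. The function name `imbalance_ratio` is ours and is used only for illustration.

```python
def imbalance_ratio(class_counts):
    """Imbalance ratio of Eq. (1): largest class size divided by smallest."""
    counts = list(class_counts.values())
    if not counts or min(counts) <= 0:
        raise ValueError("every class needs at least one sample")
    return max(counts) / min(counts)


# Per-class image counts of the freshness labels, as reported in Section 3.
freshness_counts = {"Fresh": 3010, "Mild": 2376, "Rotten": 4035}

ir = imbalance_ratio(freshness_counts)
print(f"IR by freshness: {ir:.1f}")  # prints "IR by freshness: 1.7"
```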

4.2 Visualization of relationships between image features and target class

We consider the mean values of image pixels in the RGB (Red-Green-Blue) and HSV (Hue-Saturation-Value) color spaces. A comparison of images in the RGB and HSV color spaces by color and by freshness category is presented in Fig. 7. The most noticeable differences were observed in the HSV space, where the values of hue, saturation, and value tended to increase as the fruits decayed. Differences were also observed in the RGB space, where the intensity of the red, green, and blue colors tended to decrease as the fruits decayed.
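The per-image features described above can be sketched as follows. This is only an illustration using Python's standard `colorsys` module, not the code used for the analysis: the function name, the 0-1 normalization, and the plain arithmetic mean of the hue channel (which ignores that hue is circular) are assumptions.

```python
import colorsys


def mean_color_features(pixels):
    """Mean R, G, B and H, S, V of an image given as (R, G, B) tuples in 0-255."""
    names = ("red", "green", "blue", "hue", "saturation", "value")
    sums = [0.0] * len(names)
    count = 0
    for r, g, b in pixels:
        rn, gn, bn = r / 255.0, g / 255.0, b / 255.0
        h, s, v = colorsys.rgb_to_hsv(rn, gn, bn)  # all outputs lie in [0, 1]
        for i, channel in enumerate((rn, gn, bn, h, s, v)):
            sums[i] += channel
        count += 1
    if count == 0:
        raise ValueError("image has no pixels")
    # Assumption: hue is averaged arithmetically, without circular statistics.
    return {name: total / count for name, total in zip(names, sums)}


# Hypothetical two-pixel "image": one pure red and one pure green pixel.
features = mean_color_features([(255, 0, 0), (0, 255, 0)])
```

Computing one such feature vector per image yields the samples that can then be compared across freshness categories.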

Fig. 7 Comparison of images in RGB and HSV spaces by color and by freshness category

4.3 Statistical analysis of image features

Significant differences in feature distributions must be taken into consideration. We performed the statistical analysis using the p value of a two-sided Wilcoxon rank sum test. The test evaluates the null hypothesis that the feature values of two freshness categories are samples from continuous distributions with equal medians, against the alternative that they are not. The test assumes that the samples are independent.
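To make the test concrete, here is a minimal sketch of a two-sided rank sum test using the large-sample normal approximation. It is an illustrative stand-in, not the routine used in this study: the function names are ours, and the sketch applies neither a tie correction to the variance nor a continuity correction.

```python
import math


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation): returns (z, p)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                          # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))         # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical feature values for two freshness categories.
z, p = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```

For small samples an exact test would be preferable to the normal approximation used here.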

In the RGB color space, we have found the following statistically significant differences between the values of:

  • red color of fresh and mildly damaged fruits (p < 0.05), fresh and rotten fruits (p < 0.001), and mildly damaged and rotten fruits (p < 0.001);

  • green color of fresh and rotten fruits (p < 0.001), and mildly damaged and rotten fruits (p < 0.001);

  • blue color of fresh and mildly damaged fruits (p < 0.001), fresh and rotten fruits (p < 0.001), and mildly damaged and rotten fruits (p < 0.001).

Note that in all cases, the difference between the fresh and rotten categories is more significant than the difference between fresh and mildly damaged fruits.

In the HSV color space, we found statistically significant differences between the:

  • hue of fresh and mildly damaged fruits (p < 0.001), fresh and rotten fruits (p < 0.001), and mildly damaged and rotten fruits (p < 0.001).

  • saturation of fresh and mildly damaged fruits (p < 0.001), fresh and rotten fruits (p < 0.001), and mildly damaged and rotten fruits (p < 0.001).

  • value of fresh and rotten fruits (p < 0.001), and mildly damaged and rotten fruits (p < 0.001).

4.4 Analysis of distribution for anomaly and outlier detection

The outlier observations and skewed distributions, which will bias the results in the direction of their skew, can have a significant impact on classification results thus degrading the classifier’s ability to learn from the data [50]. The distribution of the color (red, green, blue) values in the two-dimensional RGB color space is shown in Figs. 8, 9 and 10, while the distribution of the HSV color space values is shown in Figs. 11, 12 and 13.

Fig. 8 Distribution of green and blue colors in the fruit images across the freshness categories

Fig. 9 Distribution of red and blue colors in the fruit images across the freshness categories

Fig. 10 Distribution of green and red colors in the fruit images across the freshness categories

Fig. 11 Distribution of hue and saturation in the fruit images across the freshness categories

Fig. 12 Distribution of hue and value in the fruit images across the freshness categories

Fig. 13 Distribution of value and saturation in the fruit images across the freshness categories

The fruit kind-based analysis using a two-sided Wilcoxon rank sum test showed that the difference between freshness categories is highly significant (p < 0.001), except for:

  • tomato in green color between fresh and mild categories (not significant).

  • pepper in red color and value between fresh and mild categories (p < 0.05).

  • grape in red, green, saturation and value between mild and rotten categories (not significant).

  • pepper in red color and in value (HSV) between mild and rotten categories (p < 0.05).

  • tomato in green color between mild and rotten categories (not significant).

The outlier analysis involves calculating the ratio of outliers in a dataset or in a subset of it. Usually, outliers are defined as values that differ from the mean by more than three standard deviations, which gives the outlier ratio:

$${r}_o^f=\frac{\left|\left\{f:\left|f-{\mu}_f\right|>3{\sigma}_f\right\}\right|}{N_c}$$
(2)

where \({r}_o^f\) is the outlier ratio according to some feature \(f\), \(N_c\) is the total number of instances in the class, and \({\mu}_f\) and \({\sigma}_f\) are the mean and standard deviation of the values of feature \(f\).
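The three-standard-deviation rule described above can be sketched in a few lines. This is an illustration only: the function name is ours, the population standard deviation is an assumption (the text does not say whether the sample or population form was used), and the feature values are made up.

```python
import statistics


def outlier_ratio(values, n_sigma=3.0):
    """Fraction of values lying more than `n_sigma` standard deviations from the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    outliers = [v for v in values if abs(v - mu) > n_sigma * sigma]
    return len(outliers) / len(values)


# Hypothetical feature values for one class: 29 typical values and one extreme value.
feature_values = [10.0] * 29 + [100.0]
ratio = outlier_ratio(feature_values)  # one outlier out of 30 values
```

With very small classes a single extreme value can inflate the standard deviation enough to hide itself, so the rule is most informative when a class contains many images.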

Another approach is the distance sum-based outlier definition [46], which takes the sum of the distances from a point to its k nearest neighbors as its outlier degree; the n points with the largest sums are then declared outliers, denoted O(k, n).

The parameters k and n can be selected freely; here, both the number of nearest neighbors and the number of outliers are set to 10, as recommended by [46].
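A brute-force sketch of the O(k, n) definition is given below. It assumes Euclidean distance between feature vectors and a hypothetical function name; the text does not specify the distance measure or the implementation, and the example points are invented.

```python
import math


def distance_sum_outliers(points, k, n):
    """Indices of the n points with the largest sum of distances to their k nearest neighbours."""
    scored = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scored.append((sum(dists[:k]), i))
    scored.sort(reverse=True)  # largest outlier degree first
    return [index for _, index in scored[:n]]


# Hypothetical 2-D feature vectors: four clustered points and one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
outliers = distance_sum_outliers(points, k=2, n=1)
```

The pairwise search costs quadratic time in the number of points, which is acceptable for a few thousand images per class but would need a spatial index for much larger sets.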

The results of the outlier analysis are presented in Fig. 14, which shows the outliers in the “Rotten” category of each fruit in the proposed dataset. Visual inspection shows that all these fruits are highly decayed.

Fig. 14 Examples of outliers in the “Rotten” category of each fruit in the proposed dataset

5 Baseline classification and performance evaluation

5.1 Classification

We have adopted five deep learning classifiers, namely MobileNet-V2 [37], SqueezeNet [20], ShuffleNet [49], EfficientNet [43], and ResNet18 [18], for the classification of fruit images according to their level of freshness or rottenness. Our choice of these classifiers is based on their strong performance, reduced computational complexity, and good generalization ability, especially in classification [9]. MobileNet-V2 was designed for lightweight, low-latency systems such as Internet-of-Things and computer vision applications in many fields. SqueezeNet is a lightweight model with fewer structural parameters and fewer calculations, and its structure and classification accuracy meet real-time requirements. ShuffleNet is an efficient CNN primarily designed for applications with limited computational capability. EfficientNet is accurate, lightweight, and robust in various image classification tasks. The residual block structure of the ResNet18 network may allow the model to learn deeper characteristics of images.

The image input size differs between the five baseline models: MobileNetV2, ResNet18, EfficientNet, and ShuffleNet use an input size of 224 × 224 pixels, while SqueezeNet uses 227 × 227 pixels. The number of images in the FruitQ dataset was increased using data augmentation; the techniques adopted include random rotation, scaling, flipping, shearing, and translation. The experiments show promising results for the deep learning models, as presented in Section 6.
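Two of the augmentation operations named above, flipping and translation, can be illustrated with the following minimal sketch on images stored as nested lists. The function names, the 50% flip probability, the shift range, and the zero fill value are all assumptions for illustration; the augmentation settings actually used are not given in the text, and rotation, scaling, and shearing are omitted here.

```python
import random


def horizontal_flip(image):
    """Mirror an image (a list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def translate(image, dx, dy, fill=0):
    """Shift an image by (dx, dy) pixels, filling uncovered pixels with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = image[y][x]
    return out


def augment(image, rng, max_shift=1):
    """Apply a random horizontal flip followed by a random translation."""
    if rng.random() < 0.5:  # assumed flip probability
        image = horizontal_flip(image)
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return translate(image, dx, dy)


# Hypothetical 2 x 2 single-channel image and a seeded generator for repeatability.
augmented = augment([[1, 2], [3, 4]], random.Random(0))
```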

5.2 Performance evaluation

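The performance of every round of the baseline models is evaluated using eight metrics, with the corresponding expressions given in Eqs. (3)–(9). These performance metrics were analysed for each dataset (the FruitQ dataset and the FruitNet dataset). The metrics, their descriptions, and their mathematical expressions are given below, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.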

Precision (PRC): also referred to as Positive predictive value, is the ratio of the correctly predicted positive samples to the total number of all predicted positive samples.

$$PRC=\frac{TP}{TP+ FP}\times 100\%$$
(3)

Negative Predictive Value (NPV) is the ratio of the correctly predicted negative samples to the total number of all predicted negative samples.

$$NPV=\frac{TN}{TN+ FN}\times 100\%$$
(4)

Recall (RCL) is the proportion of positive samples that are predicted as positive, i.e., the ratio of correctly predicted positive samples to the total number of all positive samples.

$$RCL=\frac{TP}{TP+ FN}\times 100\%$$
(5)

F1-Score is the measure of the harmonic mean of precision and recall.

$${F}_1=2\times \frac{PRC\times RCL}{PRC+ RCL}$$
(6)

Accuracy (ACC) is the ratio of all correctly predicted positive and negative samples to the total number of samples.

$$ACC=\frac{TP+ TN}{TP+ TN+ FP+ FN}\times 100\%$$
(7)

Balanced Accuracy (BAC) is used to provide a better evaluation in case of imbalanced data and it can be defined as:

$$BAC=\left[c\times \left(\frac{TP}{TP+ FN}\right)+\left(1-c\right)\times \left(\frac{TN}{TN+ FP}\right)\right]\times 100\%$$
(8)

where c ∈ [0, 1] is a penalization cost of misclassifying the positive sample.

Cohen’s Kappa Coefficient (KAP) measures the agreement between the predicted and the true labels while accounting for the agreement expected by chance. Kappa indicates the level of agreement and reliability, which makes it a fair measure of model performance. The chance-agreement term is given by the expression:

$${f}_c=\frac{\left( TN+ FN\right)\left( TN+ FP\right)+\left( FP+ TP\right)\left( FN+ TP\right)}{n}$$
(9)

where \({f}_c\) is the number of agreements expected by chance and \(n\) is the total number of samples.
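The metrics of Eqs. (3)–(9) can be computed from the four confusion-matrix counts as in the sketch below. Several points are assumptions rather than details from the text: the function name, the equal weighting c = 0.5 in the balanced accuracy, and the final kappa step, for which the standard form KAP = (f_o − f_c)/(n − f_c) with f_o = TP + TN is used. For the three-class experiments, a one-vs-rest reduction per class is also assumed. The counts in the example are invented.

```python
def binary_metrics(tp, fp, tn, fn, c=0.5):
    """Metrics of Eqs. (3)-(9) from confusion-matrix counts (percentages except KAP)."""
    n = tp + fp + tn + fn
    prc = tp / (tp + fp) * 100              # Eq. (3), precision
    npv = tn / (tn + fn) * 100              # Eq. (4), negative predictive value
    rcl = tp / (tp + fn) * 100              # Eq. (5), recall
    spc = tn / (tn + fp) * 100              # specificity, the second term of Eq. (8)
    f1 = 2 * prc * rcl / (prc + rcl)        # Eq. (6), harmonic mean of PRC and RCL
    acc = (tp + tn) / n * 100               # Eq. (7), accuracy
    bac = c * rcl + (1 - c) * spc           # Eq. (8), balanced accuracy
    f_c = ((tn + fn) * (tn + fp) + (fp + tp) * (fn + tp)) / n  # Eq. (9)
    f_o = tp + tn                           # observed agreements
    kap = (f_o - f_c) / (n - f_c)           # assumed standard form of Cohen's kappa
    return {"PRC": prc, "NPV": npv, "RCL": rcl, "SPC": spc,
            "F1": f1, "ACC": acc, "BAC": bac, "KAP": kap}


# Hypothetical confusion-matrix counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

For these counts the chance-agreement term is f_c = (50 × 55 + 50 × 45)/100 = 50, so kappa is (85 − 50)/(100 − 50) = 0.7.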

6 Results

The experiments in this paper were carried out on a computer running the Windows 10 operating system, configured with a 64-bit 2.20 GHz Core i5 CPU, 8 GB of memory, and a 1 TB HDD. The deep learning classification framework was implemented, and the experiments were performed, in MATLAB R2022a (MathWorks, Inc.). The training and testing times on the FruitQ-DB dataset using the ShuffleNet, SqueezeNet, EfficientNet, ResNet18, and MobileNet deep learning models are summarized in Table 7.

Table 7 Training and testing times of deep learning models on the FruitQ-DB dataset

6.1 Multi-class baseline experiment with fruitq-datasets

This section presents a multiclass classification analysis of the FruitQ-datasets using five state-of-the-art DL models. The FruitQ-dataset was divided randomly in an 80:20 ratio, where 80% of the data was used for training and 20% for validation. In addition, we applied K-fold cross-validation with K = 5 to each model. The results of the baseline models are summarized in Table 8.

Table 8 Performance metrics for multiclass classification of FruitQ-DB
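The random 80:20 split and the five-fold cross-validation described above can be sketched on sample indices as follows. The function names and the fixed seeds are assumptions, and the sketch splits without stratifying by class, since the text does not say whether stratification was used.

```python
import random


def train_val_split(n_samples, train_fraction=0.8, seed=0):
    """Randomly split sample indices into training and validation index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(n_samples * train_fraction)
    return indices[:cut], indices[cut:]


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal, disjoint folds
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


# The dataset contains 9421 images (Section 3).
train_idx, val_idx = train_val_split(9421)  # 7536 training and 1885 validation indices
folds = list(k_fold_indices(9421, k=5))
```

Because the dataset is imbalanced (Section 4.1), a stratified variant that preserves class proportions in each fold may be worth considering.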

Our baseline CNN models were trained using the Adam optimizer with an initial learning rate selected from {1e−3, …, 1e−5}, 20 epochs, and a mini-batch size of 256. To reduce overfitting of the trained models, we applied L2 regularization with a coefficient of 1e−4 and a dropout rate of 50%.

The best experimental performance was obtained with the ResNet18 model, with a validation accuracy of 99.8%, followed by SqueezeNet (99.4%), EfficientNet (99.1%), ShuffleNet (96.8%), and MobileNet (96.3%).

From Table 8, we can see that the ResNet18 model achieved the best recall rate of 100% for each of the fresh, mild, and rotten classes. In addition, its precision rate is 100% for the fresh and rotten classes and 99.8% for the mild class. The model with the lowest performance is MobileNet, with an accuracy of 96.3%; for the rotten class, its recall rate is 98.5%, precision 97.85%, specificity 96.57%, and balanced accuracy 96.18%. A graphical representation of the experimental results is given in Fig. 15a-d. Figure 15a shows that ResNet18 outperforms the other baseline models with an accuracy of 99.8%, while the lowest accuracy, 96.3%, was achieved by MobileNet. Figure 15b shows that the rotten class had the best recall for all models, with EfficientNet and ResNet18 reaching 100% for both the mild and rotten classes. MobileNet-V2, however, has the lowest recall rate among the baseline models.

Fig. 15 The performance metrics for the FruitQ-DB classification with baseline models

The confusion matrices of the models are presented in Fig. 16a-e, showing the number of misclassified samples in each class. The number of misclassified samples varies between fresh vs. mild and mild vs. rotten for all models; however, there are no misclassified samples for fresh vs. rotten, i.e., the classification rate of fresh vs. rotten is 100% for all the baseline models. ResNet18 shows the best results for all the performance metrics, but at a higher computational cost in terms of execution time and memory. Fine-tuning some parameters and applying dropout to limit redundancy in the deep layers also effectively reduced the computation time and improved the detection rate.

Fig. 16 Confusion matrices for the FruitQ-DB classification experiments with baseline deep learning models: (a) ShuffleNet (b) SqueezeNet (c) EfficientNet (d) ResNet-18 (e) MobileNet-V2

6.2 Binary classification using fruitnet datasets on our baseline models

To further validate the results, we tested our models on another dataset, the “FruitNet” dataset from Vishal et al. [32]. This dataset contains a total of 12,000 images of six types of fruits in two classes: “good” and “bad”. For this study, we renamed the classes “Fresh” and “Rotten” to keep the class nomenclature consistent. Our test set therefore comprises 12,000 samples, divided into 6000 fresh and 6000 rotten samples.

In this study, we performed a binary classification task, using two classes of the FruitQ-datasets, i.e., the Fresh and Rotten classes, to train the baseline models. For training, we conducted a series of experiments with the learning rate set to 0.001, and we applied the Adam optimizer to minimize the cost function. For each experiment, the deep learning models were fine-tuned to enhance the accuracy on the FruitNet images used for testing. To reduce overfitting of our baseline models, we fine-tuned some of the parameters, lowering the learning rate where appropriate for a specific baseline model until we arrived at optimal results: the learning rate was lowered to 1e−4, the learn rate drop factor ranged from 2e−4 to 1e−3, the maximum number of epochs was 10, and the mini-batch size was 256. The classification results on the test dataset (FruitNet) for the five models (MobileNetv2, SqueezeNet, ShuffleNet, EfficientNet, and ResNet18), after training each model with the FruitQ datasets, are presented in Table 9.

Table 9 Performance of classifiers trained on the FruitQ dataset and tested on the FruitNet dataset

As seen from Table 9, the performance of the CNN classifiers still needs improvement for real-time fruit classification. The MobileNet classifier outperformed the rest of the models, with an accuracy of 81.5%, a recall of 76.2%, an NPV of 78.47%, and an F1-score of 80.45%. Specifically, the accuracy of MobileNet was higher by ↑2.6%, ↑5.8%, ↑3.9%, and ↑3.5% than that of the ShuffleNet, SqueezeNet, EfficientNet, and ResNet18 models, respectively. In addition, its recall was higher than that of ShuffleNet by ↑5.9%, SqueezeNet by ↑15.53%, EfficientNet by ↑15.57%, and ResNet18 by ↑10.97%. The precision of MobileNet was lower than that of SqueezeNet by ↓1.46%, EfficientNet by ↓10.85%, and ResNet18 by ↓2.31%. Overall, MobileNet improved accuracy on fresh and rotten fruit classification by at least 2.6% over each of the other models. The comparison of the deep learning models on the fresh and rotten fruit classification experiments in terms of accuracy, recall, precision, F1-score, and NPV is presented in Fig. 17a-e.

Fig. 17

The performance metrics for the FruitNet classification experiments: the test classification performance in terms of (a) Accuracy, (b) Recall, (c) Precision, (d) NPV, and (e) F1-Score
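For reference, the five reported metrics can be computed from the counts of a binary confusion matrix, as in the minimal sketch below. Which class is treated as positive is an assumption here (the text does not state it), and the counts are hypothetical, not values from Table 9.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, recall, precision, NPV, and F1 from binary confusion counts.

    tp/fn count positive-class samples (assumed here to be one of the two
    classes, e.g. "Rotten"); tn/fp count negative-class samples.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    npv = tn / (tn + fn) if (tn + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "npv": npv,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
```

Because precision and recall weight false positives and false negatives differently, a model can lead on recall and accuracy while trailing on precision, as observed for MobileNet above.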

Furthermore, we used the t-distributed stochastic neighbor embedding (t-SNE) algorithm [44], a data visualization tool, to visualize the classification results of the MobileNetv2 model by mapping the high-dimensional clusters for the two classes. Previous studies have shown that t-SNE can be used for reducing dimensionality, visualizing high-dimensional datasets, and showing how data samples are distributed [3]. Fig. 18a shows the results of mapping the fresh and rotten fruit classes of the FruitQ dataset with the MobileNetv2 network. For the FruitQ dataset, the two classes are separated into two distinct clusters (fresh, rotten). We observed that the clusters in the FruitQ dataset show a higher score for both the cluster-1 and cluster-2 fruit groups. Fig. 18b depicts the results of the MobileNetv2 model on the mapped FruitNet dataset.

Fig. 18

t-SNE map of the patterns in (a) the FruitQ dataset and (b) the FruitNet dataset: blue represents the fresh samples and red the rotten samples
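For readers unfamiliar with t-SNE, its standard formulation is summarized below. Here $x_i$ denotes the high-dimensional feature vector of sample $i$ (assumed to be features taken from the network, which the text does not specify), $y_i$ its two-dimensional map point, $N$ the number of samples, and $\sigma_i$ a per-sample bandwidth set by the chosen perplexity.

```latex
% Similarities in the high-dimensional space (Gaussian kernel)
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

% Similarities in the low-dimensional map (Student t kernel, one degree of freedom)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Objective minimized over the map points y_1, \dots, y_N
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Samples that are close in feature space receive a large $p_{ij}$, so minimizing $C$ places them close together in the map, which is why well-separated classes appear as distinct clusters.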

7 Discussion

The data are essential for those developing smart systems in agriculture and food engineering, especially in fruit quality recognition. This study provides a comprehensive analysis of a new multi-fruit dataset for fruit quality evaluation and demonstrates the suitability of the dataset for training deep learning models. The results of the study could help improve the accuracy and efficiency of fruit quality recognition systems and contribute to the advancement of the agricultural sector. The FruitQ-DB dataset provides a benchmark dataset for the classification task, which could improve research endeavors in the field of fruit quality recognition. It could be used as a standard benchmark dataset for testing the performance of state-of-the-art methods and new learning classifiers, as it is systematically organized and annotated. The study highlights the best methods for dataset construction, which could improve the completeness and representativeness of the dataset for further research efforts. The FruitQ-DB dataset is publicly available, which could increase openness and reproducibility of the results. The research community in the fields of computer vision, machine learning, and pattern recognition could benefit from the FruitQ-DB dataset by applying it in various research tasks such as fruit classification, fruit quality recognition, analysis, and comparison of deep/machine learning models and techniques.

Current datasets have some constraints, particularly those related to dataset size, representativeness, completeness, and performance baseline, which are discussed below:

  • Dataset size: The biggest drawbacks of the present databases are the number of illness types and the sample sizes. The “healthy” class has few samples in the dataset, and because of this class imbalance, the model does not generalize well in real applications. This indicates that, although the need for larger datasets is acknowledged, building them is difficult because of the manual labour and associated costs, which are made worse by the fact that few occurrences can be found for some classes. Data augmentation, transfer learning, and fine-tuning can help to address this issue.

  • Representativeness: Data collection in controlled lab settings is the foundation of the acquisition protocol that is used the most frequently. The location and method of gathering both have an impact on how representative the dataset is. The range of variability that can be detected in the field cannot be accurately reflected by controlled circumstances. When trained on laboratory datasets, algorithms frequently attain near-perfect accuracy, but when trained on field datasets, performance suffers greatly. Few datasets also considered how symptoms changed over the course of a full growing season. More work should go into identifying symptoms at the beginning of an emergency. Digital tools are necessary at this point to take prompt action to halt the spread of the disease.

  • Completeness: This is referred to as “the level of breadth, depth, and appropriateness of a datum according to its purpose” [41]. Even though certain datasets are well built, in some instances we found that the ground truth labels are still incomplete. Usability would increase if segmentation masks and bounding boxes were present.

  • Performance baseline: Having a performance baseline available can aid in the creation and approval of new methodologies.
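As an illustration of the dataset size constraint, the sketch below quantifies class imbalance and estimates how many augmented samples each class would need to match the largest class. The function names and the label list are hypothetical, and balancing every class up to the majority count is only one possible augmentation target.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Ratio of the largest class count to the smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def augmentation_targets(labels):
    """Extra (augmented) samples each class needs to reach the majority class count."""
    counts = Counter(labels)
    majority = max(counts.values())
    return {name: majority - n for name, n in counts.items()}


# Hypothetical label list for illustration only.
labels = ["diseased"] * 8 + ["healthy"] * 2

ratio = imbalance_ratio(labels)
targets = augmentation_targets(labels)
```

A ratio well above 1 signals that the minority class is under-represented; the per-class targets indicate where augmentation effort would be concentrated.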

Moreover, the baseline classifiers faced some challenges when tested on the FruitNet dataset. FruitNet is a noisy dataset whose images have noisy backgrounds; cleaning or removing these backgrounds would therefore contribute substantially to improving the accuracy of the learning models.

While the FruitQ dataset contributes to the field of fruit quality recognition, it is important to acknowledge its limitations and consider them in future research:

  • Limited fruit variety: While the FruitQ dataset contains 11 types of fruits, it may not be representative of all fruits or fruit varieties. Therefore, the dataset may not be suitable for evaluating fruit quality recognition systems for fruits that are not included in the dataset.

  • Limited freshness classes: The FruitQ dataset only categorizes fruits into three freshness classes (fresh, mildly rotten, and fully rotten), which may not capture the nuances of fruit quality in real-world scenarios. There may be other freshness categories, such as ripe but not yet overripe, that are important for fruit quality evaluation.

  • Lack of real-world data: The FruitQ dataset only contains images of fruits taken under controlled conditions in a laboratory setting. The dataset does not include images of fruits taken under real-world conditions, such as in a grocery store or at a farm. Therefore, the dataset may not reflect the variability in fruit quality that occurs in real-world scenarios.

  • Limited deep learning models: While the authors applied five well-known deep learning models to the FruitQ dataset, there may be other deep learning models that could perform better for fruit quality recognition. Therefore, the evaluation of deep learning models on the FruitQ dataset may not be exhaustive.

  • Lack of external evaluation: While the authors tested the trained models on the benchmark FruitNet dataset, they did not evaluate the models on any other external datasets. Therefore, the generalizability of the models to other datasets may be limited.

8 Conclusions

We provide a benchmark dataset with the aim of establishing a baseline for the classification task. The purpose of this dataset is to support future research on improving fruit quality recognition in real-world systems and thereby strengthen the capabilities of artificial intelligence in this domain. A statistical analysis was carried out to evaluate the image features in the FruitQ-DB dataset. We compared the performance of well-known deep learning classifiers, including less computationally intensive architectures, by training and testing them on the datasets.

We presented the publicly available fruit image datasets in the literature and included the websites from which they can be accessed. In addition to releasing our dataset, we reviewed the datasets that have been used in the literature to classify and identify fruit quality and freshness. The analysis emphasized the best methods for dataset construction, which affect the information content that the data can communicate as well as their usefulness in describing the environment from which they were collected or observed. Several factors were considered when creating the proposed dataset. To improve the dataset’s completeness and representativeness for further efforts, we intend to expand it.

The FruitQ-DB dataset is useful to the research community for the following reasons. First, it is a dataset for fruit quality detection or classification that would improve research endeavours on quality recognition for specific fruits or fruit varieties. Second, because it is systematically organized and annotated, it could serve as a standard benchmark for testing the performance of state-of-the-art methods and new learning classifiers. We evaluated the performance of baseline CNN models, namely ShuffleNet, SqueezeNet, EfficientNet, ResNet-18, and MobileNet, on the FruitQ dataset, and the results are very encouraging: ResNet-18 was the best classifier with an accuracy of 99.8%, followed by SqueezeNet with 99.4%, EfficientNet with 99.1%, ShuffleNet with 96.8%, and MobileNet with 96.3%. The FruitQ dataset is publicly available and shared to increase openness and reproducibility of results. The research community in the fields of computer vision, machine learning, and pattern recognition could also benefit from these data by applying them in various research tasks such as fruit classification, fruit quality recognition, and the analysis and comparison of deep/machine learning models and techniques.

For future work, we intend to further improve the classification performance of deep learning techniques for the fruit quality classification task. In addition, we plan to apply segmentation and data augmentation methods to increase diversity in the data and improve generalization. Further study will consider reducing overfitting and presenting a robust deep learning technique for effective training and enhanced fruit quality detection, as well as expanding the dataset by adding more fruits or additional images of the existing fruits and collecting images from different regions or seasons to increase the variability of the dataset. In addition to image data, future work could explore the use of other types of data, such as texture, color, or spectral information, for fruit quality recognition.