Introduction

Mango (Mangifera indica L.) is a highly sought-after fruit cultivated in tropical countries worldwide for its delectable taste and enticing aroma [1]. India is the largest mango producer globally, and its ‘Alphonso’ cultivar is renowned for its distinct flavor and high price. However, fruit from the major mango-growing regions is affected by internal physiological disorders such as Spongy Tissue (ST), jelly seed, insidious fruit rot, tip pulp, and soft nose [2,3,4]. The ‘Alphonso’ and ‘Keitt’ cultivars, although famous for their taste, suffer annual production losses of 20–30% and 10%, respectively, due to ST [2, 5]. The symptoms usually begin on the surface of the stone and spread towards the skin, reducing the fruit’s nutritional value and overall quality [6]. Consequently, stringent phytosanitary regulations in the United States and the European Union have reduced the export potential of these cultivars, and India’s Alphonso faced import bans in several countries for an extended period [7].

One of the main challenges in resolving quality disputes over mangoes is that fruits affected by ST show no external symptoms; the defect becomes apparent only when the fruit is cut [8]. Mangoes are typically harvested at 75–80% maturity to minimize losses and mechanical injuries along the long supply chain. However, these internal disorders often develop during ripening and can also appear during postharvest storage [9], which makes detecting spongy fruits at the farm level exceedingly difficult. Traditionally, ST has been investigated with the cut-and-open method on fruit randomly sampled from the orchard or fruit lot [6]. As a result of these issues, the marketability and popularity of Alphonso and Keitt mangoes are declining steadily [10]; consumers are wary of buying these fruits and tend not to buy them again. Non-destructive techniques such as VNIR spectroscopy, hyperspectral imaging, and Nuclear Magnetic Resonance (NMR) are already used to sort and grade mangoes by maturity and physicochemical characteristics [11]. What is still needed is a way to assess each individual fruit for physiological disorders at the grading and sorting stages of the supply chain, or before shipping for export.

Given the industry’s demand for a non-invasive technique to evaluate the internal quality of mangoes, X-ray imaging emerges as a prominent and sustainable solution [11]. X-ray imaging is commonly used in medicine, security, and luggage screening [12], and its use in the food and agricultural industry is growing [13]. Soft X-ray imaging offers numerous advantages: no sample preparation, non-destructiveness, cost-effectiveness, reliability, speed, ease of handling, and high throughput, producing an image in just 2 to 3 s [14, 15]. Previous studies have demonstrated the adaptability of X-ray imaging in horticulture, for example for detecting internal defects and bitter rot in apples [16, 17], internal rot in avocados [18], and internal disorders in pears [19], and for locating seed spoilage in mangoes [1]. Recent studies show that the use of X-ray imaging for detecting ST in mangoes has emerged as a significant research contribution. The technique could therefore become a new troubleshooting tool for the fruit export business, helping to keep the supply chain uninterrupted.

In recent years, researchers have increasingly adopted machine-learning approaches to address real-world problems. Deep learning (DL), a subfield of machine learning known for its high capacity to abstract and recognize patterns in images, has gained significant popularity [20]. One of the primary DL techniques is the convolutional neural network (CNN), now widely used in many domains, particularly for image classification tasks including fruit classification [1, 21]. Enhanced CNN architectures such as ResNet and VGGNet have been applied to classification tasks in agriculture for diverse purposes, and their performance in these applications has been investigated [22,23,24]. DL models are multilayer neural networks that learn features hierarchically in tasks such as image or language processing, and they are favored for complex, unstructured data because they can learn intricate patterns. Combining DL techniques with advanced imaging methods is proving highly effective for solving practical challenges across various fields.

The primary objective of this study is to implement a non-destructive, non-invasive X-ray imaging technique for detecting internal ST in Alphonso mango fruits. Second, the study applies several supervised DL models to the mango X-ray image dataset to classify each fruit into one of two categories: Non-spongy or internally damaged (Spongy). Grid search hyperparameter tuning within each fold of cross-validation was used to train the DL models effectively. Lastly, the DL models were compared with statistical tests to determine which performed significantly better.

Materials and methods

Mango fruit sample collection and preparation

A sample set of 324 Alphonso mango fruits at 75 to 80% maturity (judged by color and specific gravity) was collected from commercial orchards in Ratnagiri (16° 59’ 39” N, 73° 37’ 09” E) and Sindhudurg (16° 10’ 53” N, 73° 44’ 86” E) districts of Maharashtra state, India, during the 2022 and 2023 harvesting seasons. Immediately after harvest, the mangoes were pre-cooled. Mangoes with clean skin and no damage, wounds, or visible injuries were selected for image acquisition and numbered for identification. The samples were stored at room temperature and evaluated non-destructively at 5-day intervals for up to 25 days.

X-ray image acquisition

Mango samples were randomly taken from storage and placed in their natural resting position on a photostimulable phosphor (PSP) plate, and X-ray images were captured with a MARS 40–80 Fixed X-ray (Radio & Fluoro) system (Allengers Ltd, India) with adjustable voltage, current, and exposure. Dark current and brightness were corrected to reduce noise. The hermetically sealed, air-cooled X-ray source was focused on the sample table. Tube voltage (30–45 kV), tube current (4–10 mA), and current–time product (150–600 mAs) were varied to obtain optimal contrast, giving settings of 40 kV, 6 mA, and 600 mAs. The imaging plate (MD4.0T, AGFA, Hilton, NY) was read out, and the X-ray images were processed in Digix ECO software (Allengers, Chandigarh, India) and saved as PNG files.

Discrimination of spongy tissue

In the initial stage of the study, 324 mangoes were obtained for model training and validation, and 648 original X-ray images of these fruits were acquired non-destructively. At 5-day intervals (0, 5, 10, 15, 20, and 25 days after harvest), 54 mangoes were chosen at random for examination. The selected fruits were X-rayed and then cut in half along the equatorial plane near the stone to verify the presence of internal disorders such as ST and to validate the findings from the X-ray images. All mangoes were assigned to one of two categories, Non-spongy or Spongy, based on visual inspection by an expert panel. The agreement between the internal assessment of the cut fruits and the identification from X-ray imaging demonstrates the reliability of this technique.

Data augmentation

The 648 original X-ray images were augmented with label-preserving transformations to diversify the dataset. Data augmentation is a common technique to expand datasets, reduce overfitting, and improve generalization; by adding data for underrepresented classes, it also helped reduce the imbalance between classes [25]. The initial dataset contained images of both Non-spongy and Spongy mangoes. Five augmentation techniques were applied: horizontal flipping (50% probability), Gaussian blur (sigma 0.5–1), Gaussian noise (sigma 0–0.05), scaling along the x-axis (0.5, 1.0) and y-axis (1.5, 1.0), and horizontal shear (−30 to 30 degrees). These settings were chosen by trial and error; examples are shown in Fig. 1.
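
The five transformations can be sketched in plain NumPy as follows. This is an illustrative sketch, not the study's pipeline (the text does not name an augmentation library): it assumes grayscale images scaled to [0, 1], reads the x- and y-scaling settings as sampling ranges [0.5, 1.0] and [1.0, 1.5], uses nearest-neighbour resampling, and produces one augmented copy per technique, which is consistent with 648 × (1 + 5) = 3888 images in the final dataset. The helper names `gaussian_blur`, `warp_linear` and `augment_variants` are hypothetical.

```python
import numpy as np

# Illustrative NumPy-only augmentation sketch. Assumptions (not from the study):
# grayscale images in [0, 1], one augmented copy per technique, nearest-neighbour
# resampling with edge clamping, and a Gaussian kernel truncated at 3 sigma.


def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge padding; output keeps the input shape."""
    radius = max(1, int(round(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")  # blur rows
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")    # blur columns


def warp_linear(img, matrix):
    """Apply a 2x2 linear map (acting on (x, y) offsets from the image centre)
    by inverse nearest-neighbour lookup; out-of-range pixels take the nearest edge value."""
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.mgrid[0:h, 0:w]
    src = np.linalg.inv(matrix) @ np.stack([xs.ravel() - cx, ys.ravel() - cy])
    sx = np.clip(np.rint(src[0] + cx), 0, w - 1).astype(int)
    sy = np.clip(np.rint(src[1] + cy), 0, h - 1).astype(int)
    return img[sy, sx].reshape(h, w)


def augment_variants(img, rng):
    """Return five augmented copies of img, one per technique listed in the text."""
    flipped = np.fliplr(img) if rng.random() < 0.5 else img.copy()   # 50 % flip chance
    blurred = gaussian_blur(img, rng.uniform(0.5, 1.0))               # sigma 0.5-1
    noise_sigma = rng.uniform(0.0, 0.05)                              # sigma 0-0.05
    noisy = np.clip(img + rng.normal(0.0, noise_sigma, img.shape), 0.0, 1.0)
    scale = np.diag([rng.uniform(0.5, 1.0), rng.uniform(1.0, 1.5)])  # x and y factors
    scaled = warp_linear(img, scale)
    shear = np.tan(np.deg2rad(rng.uniform(-30, 30)))                  # -30 to 30 degrees
    sheared = warp_linear(img, np.array([[1.0, shear], [0.0, 1.0]]))
    return [flipped, blurred, noisy, scaled, sheared]


rng = np.random.default_rng(0)
image = rng.random((64, 64))              # stand-in for a normalised X-ray image
variants = augment_variants(image, rng)
print(len(variants), variants[0].shape)   # 5 (64, 64)
print(648 * (1 + len(variants)))          # 3888: originals plus augmented copies
```

In practice a dedicated augmentation library would normally be used; the NumPy version only makes each transformation explicit.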

Fig. 1

Augmentation of X-ray images using different techniques (A) Original, (B) Flipping and Gaussian noise, (C) Gaussian blur, (D) Rotation and shearing, (E) Scaling

Dataset preparation

A dataset of 3888 images, comprising the original and augmented X-ray images, was created for model training. It contained 1944 Non-spongy and 1944 Spongy images, so both classes were equally represented. For cross-validation, the dataset was divided into five folds, each containing 388 or 389 Non-spongy and 388 or 389 Spongy images. In each of five rounds, four folds (about 80%, 3110 images) were used for training and the remaining fold (about 20%, 778 images) for validation, with each fold serving once as the validation set. This approach was intended to give an unbiased estimate of model performance on new data, contributing to the reliability of the study. For testing the DL models, a separate dataset of 252 original X-ray images was prepared from 126 Alphonso mangoes randomly collected from the market.
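
The fold bookkeeping can be checked with a short stratified split. The sketch below uses a hypothetical helper, `stratified_folds`, and is not the authors' code: it shuffles each class separately and divides it into five near-equal parts, so each fold holds 388 or 389 images per class and each round trains on roughly 80% of the 3888 images.

```python
import numpy as np


def stratified_folds(labels, k=5, seed=0):
    """Shuffle each class separately and deal it into k folds of (nearly) equal size,
    so every fold keeps the 50:50 class balance."""
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        for f, part in enumerate(np.array_split(idx, k)):
            folds[f].extend(part.tolist())
    return [np.sort(np.array(f)) for f in folds]


labels = np.repeat([0, 1], 1944)   # 0 = Non-spongy, 1 = Spongy; 3888 images in total
folds = stratified_folds(labels)
for i, val_idx in enumerate(folds):
    train_idx = np.setdiff1d(np.arange(labels.size), val_idx)
    print(f"round {i + 1}: train={train_idx.size}, val={val_idx.size}, "
          f"per-class val counts={np.bincount(labels[val_idx]).tolist()}")
```

With 1944 images per class, four folds receive 389 images of each class and one receives 388, so the first four rounds train on 3110 images and validate on 778, and the last trains on 3112 and validates on 776. If all augmented copies of one original image should stay on the same side of the split, the same dealing can be applied to original-image IDs instead of to individual images.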

Deep learning models and their architecture for classification

DL is a subfield of machine learning, itself a branch of artificial intelligence, that studies and develops algorithms capable of learning from data and making predictions. In this study, X-ray imaging serves as a non-destructive method for detecting ST in mangoes, and the data are gathered as images. Classification and regression analysis are two common machine-learning applications for detecting defects in horticultural products [26]. Classification algorithms build models that assign data to categories, that is, they predict discrete outcomes. Machine learning focuses on automatically identifying patterns in data with computer algorithms and using those patterns for tasks such as categorization [27]. Numerous learning algorithms have been applied to evaluate the characteristics of horticultural products, including defect identification.

Deep neural network architecture of CNN model

The CNN model comprises the essential elements of filter-bank kernels, convolution, activation, and pooling layers. It is built from sequential layers: two convolutional layers (32 and 64 filters, 3 × 3 kernels, ReLU or LeakyReLU activation), followed by max-pooling, as shown in Fig. 2. A dropout layer is added to prevent overfitting. The flattened representation connects to a dense layer (128 neurons, ReLU or LeakyReLU), followed by a final dense layer with sigmoid activation for binary classification. The optimizer (‘RMSProp’ or ‘Adam’) and learning rate are configurable. The model is compiled with binary cross-entropy loss and accuracy as the evaluation metric, which makes it adaptable to diverse image classification tasks [21].
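
To make the layer sizes concrete, the sketch below traces tensor shapes and trainable parameters through the CNN described above. It rests on assumptions the text does not state: a 224 × 224 × 3 input (the size given for VGG16), 'valid' (unpadded) convolutions, and one 2 × 2 max-pooling layer after each convolution. The function names are hypothetical.

```python
def conv_valid(h, w, c_in, filters, k=3):
    """Output height/width/channels and parameter count of a 'valid' k x k convolution."""
    return h - k + 1, w - k + 1, filters, k * k * c_in * filters + filters


def describe_cnn(h=224, w=224, c=3, dense_units=128):
    """Trace shapes and trainable parameters through the two-block CNN (assumed layout)."""
    layers, total = [], 0
    for filters in (32, 64):
        h, w, c, params = conv_valid(h, w, c, filters)
        layers.append((f"conv {filters}", (h, w, c), params))
        total += params
        h, w = h // 2, w // 2                     # 2 x 2 max-pooling, no parameters
        layers.append(("max-pool", (h, w, c), 0))
    flat = h * w * c                              # dropout changes no shapes
    layers.append(("flatten", (flat,), 0))
    dense = flat * dense_units + dense_units      # weights + biases
    output = dense_units + 1                      # single sigmoid unit
    layers += [("dense", (dense_units,), dense), ("sigmoid", (1,), output)]
    return layers, total + dense + output


cnn_layers, cnn_total = describe_cnn()
for name, shape, params in cnn_layers:
    print(f"{name:10s} {str(shape):16s} {params:>12,d}")
print(f"total trainable parameters: {cnn_total:,d}")
```

Under these assumptions the first dense layer holds 23,888,000 of the 23,907,521 trainable parameters, which suggests why a dropout layer before it is useful. An equivalent Keras model built with the default `padding='valid'` would be expected to report the same shapes in `model.summary()`.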

Fig. 2

CNN architecture model used for binary classification of mango

Visual geometry group network (VGGNet 16)

In this study, a VGG16 deep neural network was used to classify mango images with 224 × 224-pixel inputs. The model has five blocks of convolutional layers for feature extraction, with the number of filters increasing from block to block. The extracted features were flattened and connected to a dense layer with two neurons that outputs the probabilities of the two classes. This architecture, known for its effectiveness in computer vision tasks, performed well in this study [28].

Visual geometry group network (VGGNet 19)

The standard VGG19 network has 16 convolutional layers with 3 × 3 filters and 3 fully connected layers. We adapted it for binary classification by freezing the VGG19 base layers and adding a 128-neuron dense layer with ReLU activation, followed by a binary classification layer. The model was compiled with binary cross-entropy loss and accuracy as the metric, with the optimizer chosen by grid search.
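
The convolutional bases of VGG16 and VGG19 differ only in the number of 3 × 3 convolutions per block. The sketch below counts them for a 224 × 224 × 3 input using the standard VGG configurations; the resulting base sizes (14,714,688 and 20,024,384 parameters) match those of the ImageNet models in Keras loaded with `include_top=False`. The head in the hypothetical `frozen_vgg19_head` assumes the frozen base is flattened (25,088 features) before the 128-unit dense layer; the text does not say whether flattening or global pooling was used.

```python
# Convolutions per block in the standard VGG configurations, and filters per block.
VGG_BLOCKS = {"VGG16": (2, 2, 3, 3, 3), "VGG19": (2, 2, 4, 4, 4)}
FILTERS = (64, 128, 256, 512, 512)


def vgg_base(name, size=224, channels=3):
    """Conv-layer count, output shape and parameter count of the VGG convolutional base
    (3 x 3 'same' convolutions, 2 x 2 max-pooling at the end of each block)."""
    n_conv, params, c = 0, 0, channels
    for n_layers, f in zip(VGG_BLOCKS[name], FILTERS):
        for _ in range(n_layers):
            params += 3 * 3 * c * f + f        # weights + biases
            c = f
            n_conv += 1
        size //= 2                             # spatial size halves after each block
    return n_conv, (size, size, c), params


def frozen_vgg19_head(units=128):
    """Trainable parameters of the assumed head: flatten -> dense(units) -> sigmoid."""
    _, (h, w, c), _ = vgg_base("VGG19")
    flat = h * w * c
    return (flat * units + units) + (units + 1)


for name in VGG_BLOCKS:
    print(name, *vgg_base(name))
print("trainable head parameters (frozen VGG19 base):", frozen_vgg19_head())
```

With the base frozen, only the 3,211,521 head parameters are updated during training, far fewer than the roughly 20 million in the convolutional base.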

ResNet

The ResNet model uses skip connections to mitigate the vanishing-gradient problem in deep neural networks. It is organized into stages (conv2, conv3, conv4, conv5) with increasing filter counts (64, 128, 256, 512) that capture increasingly complex features while reducing the spatial dimensions. The final output is flattened to a 100,352-element vector and connected to a dense layer for binary classification.
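
The 100,352-element vector follows from the standard ResNet50 layout with a 224 × 224 input (an assumption; the input size is not stated for ResNet): the spatial size halves from 224 to 7, and each bottleneck stage expands its inner width (64, 128, 256, 512) fourfold, giving 7 × 7 × 2048 = 100,352. A short trace, using a hypothetical `resnet50_trace` helper:

```python
def resnet50_trace(size=224):
    """Spatial size and channel count after each ResNet50 stage (standard layout)."""
    size //= 2        # conv1: 7 x 7 convolution, stride 2
    size //= 2        # 3 x 3 max-pooling, stride 2
    stages = []
    for stage, width, stride in [("conv2", 64, 1), ("conv3", 128, 2),
                                 ("conv4", 256, 2), ("conv5", 512, 2)]:
        size //= stride                                  # first block of the stage downsamples
        stages.append((stage, size, width, 4 * width))   # bottleneck expands width x4
    return stages


stages = resnet50_trace()
for stage, size, width, channels in stages:
    print(f"{stage}: {size} x {size}, bottleneck width {width}, output channels {channels}")
_, size, _, channels = stages[-1]
flat = size * size * channels
print("flattened:", flat, "| sigmoid-layer parameters:", flat + 1)
```

A single sigmoid unit on this vector therefore has 100,353 trainable parameters.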

AlexNet

The Keras Sequential model follows an AlexNet-style architecture: 96 filters (11 × 11, stride 4 × 4, ReLU) with max-pooling (3 × 3, stride 2 × 2) and batch normalization; then 256 filters (5 × 5, ReLU) with max-pooling; then three convolutional layers with 384 filters (3 × 3, ReLU) followed by max-pooling and batch normalization. These are followed by two fully connected layers of 4096 units (each with ReLU, dropout 0.4, and batch normalization) and a final layer of 1000 units (ReLU, dropout 0.4, batch normalization). Details of all DL models are provided in the supplementary materials.
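
A shape trace clarifies how the stacked layers shrink the input. The sketch below is hypothetical in two respects: it assumes the classic 227 × 227 × 3 AlexNet input size, and it assumes 'same' padding for the 5 × 5 and 3 × 3 convolutions and 'valid' padding elsewhere. Batch normalization and dropout do not change tensor shapes and are omitted.

```python
def out_size(n, k, s=1, padding="valid"):
    """Output length along one spatial axis, following Keras padding conventions."""
    if padding == "same":
        return -(-n // s)             # ceil(n / s)
    return (n - k) // s + 1


ALEXNET_STYLE = [                     # (layer, kernel, stride, padding, filters)
    ("conv 96", 11, 4, "valid", 96),
    ("max-pool", 3, 2, "valid", None),
    ("conv 256", 5, 1, "same", 256),
    ("max-pool", 3, 2, "valid", None),
    ("conv 384", 3, 1, "same", 384),
    ("conv 384", 3, 1, "same", 384),
    ("conv 384", 3, 1, "same", 384),
    ("max-pool", 3, 2, "valid", None),
]


def alexnet_trace(n=227, channels=3):
    rows = []
    for name, k, s, pad, filters in ALEXNET_STYLE:
        n = out_size(n, k, s, pad)
        channels = filters or channels   # pooling keeps the channel count
        rows.append((name, n, channels))
    return rows


rows = alexnet_trace()
for name, n, c in rows:
    print(f"{name:9s} -> {n} x {n} x {c}")
print("flattened size:", rows[-1][1] ** 2 * rows[-1][2])
```

With 'valid' padding throughout, the final feature map would instead be only 2 × 2, so the padding choice strongly affects the size of the first 4096-unit layer.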

Model optimization: Grid search with cross-validation

To address overfitting and underfitting in the DL models, we used grid search hyperparameter tuning. This systematic method optimized the CNN-based models by adjusting hyperparameters such as learning rate, batch size, dropout rate, optimizer type, and early-stopping criteria (Table 1). We tested four learning rates (0.001, 0.01, 0.1, and 0.2), four batch sizes (32, 64, 128, 256), four dropout rates (0.00, 0.10, 0.20, and 0.30), three optimizers (Adam, RMSProp, and SGD), and two activation functions (ReLU and LeakyReLU). Early stopping with a patience of 30 epochs was used to prevent overtraining. Binary cross-entropy loss and a classification threshold of 0.95 were used for binary mango classification. This gave 384 unique hyperparameter combinations (4 × 4 × 4 × 3 × 2) per training fold, ensuring a thorough exploration of the parameter space for optimal model settings.
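
The size of the grid follows from multiplying the options. Four learning rates, four batch sizes, four dropout rates and three optimizers give 4 × 4 × 4 × 3 = 192 combinations; the stated 384 is reached if the activation function (ReLU or LeakyReLU, which appears among the tuned settings reported for the models) is a fifth grid dimension, as assumed in this sketch.

```python
from itertools import product

# Hyperparameter grid from the text; the "activation" entry is an assumption
# (ReLU vs LeakyReLU is reported among the tuned settings).
GRID = {
    "learning_rate": [0.001, 0.01, 0.1, 0.2],
    "batch_size": [32, 64, 128, 256],
    "dropout": [0.00, 0.10, 0.20, 0.30],
    "optimizer": ["Adam", "RMSProp", "SGD"],
    "activation": ["ReLU", "LeakyReLU"],
}

# One dict per combination, e.g. {"learning_rate": 0.001, "batch_size": 32, ...}
combos = [dict(zip(GRID, values)) for values in product(*GRID.values())]
print(len(combos))          # 384 = 4 * 4 * 4 * 3 * 2
print(combos[0])
```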

Table 1 Hyperparameters used for training various DL models for image classification

Following common practice, we used a 5-fold cross-validation strategy. For each of the 384 hyperparameter combinations, a model was trained on four folds and evaluated on the remaining validation fold, rotating through all five folds. This maximized the use of the data for both training and validation, improving model robustness, and allowed a systematic exploration of the hyperparameter landscape while guarding against overfitting [29, 30].

Model selection, validation and testing of model

For each hyperparameter combination, model performance was measured as the accuracy across the five folds, which gave a dependable estimate of how the model with those hyperparameters performs on unseen (validation) data. The combination with the highest mean accuracy across all folds was selected as the optimal model configuration. Model performance was then assessed on the same test dataset of 252 images for all models.
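
The selection rule can be written as a nested loop over combinations and folds. In the sketch below, `select_best` and the `train_and_score` callback are hypothetical names: in the study, the callback would build and train a network with the given hyperparameters (including early stopping) on four folds and return its accuracy on the fifth. A toy scoring function replaces real training so the example runs instantly.

```python
from itertools import product
from statistics import mean


def select_best(grid, folds, train_and_score):
    """Grid search with k-fold cross-validation: for every combination, train on k-1 folds,
    score on the held-out fold, and keep the combination with the best mean score.
    train_and_score(params, train_folds, val_fold) -> accuracy is a hypothetical callback."""
    best_params, best_score = None, float("-inf")
    for values in product(*grid.values()):
        params = dict(zip(grid, values))
        scores = [
            train_and_score(params, folds[:i] + folds[i + 1:], folds[i])
            for i in range(len(folds))
        ]
        if mean(scores) > best_score:
            best_params, best_score = params, mean(scores)
    return best_params, best_score


# Toy usage: a made-up scoring function stands in for actually training a network.
def toy_score(params, train_folds, val_fold):
    assert len(train_folds) == 4 and val_fold not in train_folds
    return 0.9 - abs(params["learning_rate"] - 0.01) - abs(params["dropout"] - 0.1)


toy_grid = {"learning_rate": [0.001, 0.01, 0.1], "dropout": [0.0, 0.1, 0.2]}
cv_folds = ["fold1", "fold2", "fold3", "fold4", "fold5"]   # placeholders for index arrays
print(select_best(toy_grid, cv_folds, toy_score))
```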

Model evaluation

Model performance was evaluated on a fixed test dataset of 252 images, held out from cross-validation and identical for all models, by comparing predictions with the ground-truth labels ‘Non-spongy’ (0) and ‘Spongy’ (1). A confusion matrix was used to summarize the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) classifications, from which accuracy (Eq. 1), precision (Eq. 2), recall (Eq. 3), and F1-score (Eq. 4) were computed.

$$\text{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN} \tag{1}$$
$$\text{Precision}=\frac{TP}{TP+FP} \tag{2}$$
$$\text{Recall}=\frac{TP}{TP+FN} \tag{3}$$
$$\text{F1 score}=\frac{TP}{TP+\tfrac{1}{2}(FP+FN)} \tag{4}$$
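
Because per-class precision, recall and F1 are reported later for both Non-spongy and Spongy fruit, each class is treated in turn as the positive class. The sketch below, with a hypothetical `binary_report` helper and made-up labels, computes Eqs. (1) to (4) from raw predictions; Eq. (4) is algebraically the harmonic mean of precision and recall, 2·P·R/(P + R).

```python
def binary_report(y_true, y_pred):
    """Overall accuracy (Eq. 1) and per-class precision, recall, F1 (Eqs. 2-4).
    Labels: 0 = Non-spongy, 1 = Spongy; each class is scored as the positive class in turn."""
    pairs = list(zip(y_true, y_pred))
    report = {}
    for pos in (0, 1):
        tp = sum(t == pos and p == pos for t, p in pairs)
        fp = sum(t != pos and p == pos for t, p in pairs)
        fn = sum(t == pos and p != pos for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = tp / (tp + 0.5 * (fp + fn)) if tp + fp + fn else 0.0
        report[pos] = {"precision": precision, "recall": recall, "f1": f1}
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return report, accuracy


y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]   # made-up ground truth
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]   # made-up predictions
report, accuracy = binary_report(y_true, y_pred)
print(report)
print(accuracy)   # 0.8
```

For the toy labels, the Spongy class has TP = 3, FP = 1 and FN = 1, so its precision, recall and F1 are all 0.75, and the overall accuracy is 0.8.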

Statistical analysis and software

A Tukey HSD test was used to assess differences in mean accuracy between models at a significance level of α = 0.05. The DL models were developed in Python with TensorFlow on a Linux system with 32 GB of RAM and an NVIDIA GeForce GTX 1080 Ti GPU with 11 GB of memory.
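
For equal group sizes, Tukey's HSD compares every pair of group means with the studentized range statistic. The sketch below assumes, for illustration, that each model contributes one accuracy value per cross-validation fold (five values per model); the accuracies are invented, and the critical value 4.23 is the tabulated studentized-range quantile for α = 0.05 with 5 groups and 20 error degrees of freedom. The `tukey_hsd` function here is a hypothetical helper; SciPy (version 1.8 and later) provides `scipy.stats.tukey_hsd`, which computes p-values directly.

```python
import numpy as np
from itertools import combinations


def tukey_hsd(groups, q_crit):
    """Tukey HSD for groups of equal size n. For each pair, q = |mean_i - mean_j| / sqrt(MSE / n);
    a pair differs significantly when q exceeds the studentized-range critical value q_crit."""
    names = list(groups)
    data = np.array([groups[name] for name in names], dtype=float)   # shape (k, n)
    k, n = data.shape
    means = data.mean(axis=1)
    mse = ((data - means[:, None]) ** 2).sum() / (k * (n - 1))      # within-group mean square
    results = {}
    for i, j in combinations(range(k), 2):
        q = abs(means[i] - means[j]) / np.sqrt(mse / n)
        results[(names[i], names[j])] = (means[i] - means[j], q, bool(q > q_crit))
    return results


# Invented per-fold accuracies for five hypothetical models (not the study's data).
offsets = np.array([0.0, 0.01, -0.01, 0.005, -0.005])
groups = {name: base + offsets for name, base in
          [("model_A", 0.85), ("model_B", 0.90), ("model_C", 0.905),
           ("model_D", 0.95), ("model_E", 0.80)]}
# q_crit for alpha = 0.05, k = 5 groups and k * (n - 1) = 20 error degrees of freedom.
for pair, (diff, q, significant) in tukey_hsd(groups, q_crit=4.23).items():
    print(pair, f"diff={diff:+.3f}", f"q={q:.2f}", "significant" if significant else "n.s.")
```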

Results and discussion

Discrimination of ST and its behavior during storage

X-ray imaging, including computed tomography (CT), is an innovative, non-invasive machine-vision technique with significant potential for identifying internal flaws where other machine-vision methods are inadequate. Its distinct advantage is that X-rays can pass through most objects. X-ray images were analyzed to discriminate Non-spongy from ST-affected pulp. ST-affected fruits, which showed no apparent external symptoms, revealed evident mesocarp damage on the images, characterized by distinct dark gray patches that indicate air cavities within the flesh. These dark patches were surrounded by light gray areas representing healthy pulp. Although the outline of the endocarp (stone) was visible on the X-ray images, it did not interfere with the identification of ST. In contrast, Non-spongy fruits appeared uniformly light gray, indicating healthy pulp throughout (Fig. 3).

Fig. 3

Discrimination of mango fruits using X-ray imaging and cutting method

To confirm the internal disorders seen in the X-ray images, the samples were cut open for destructive verification. Mangoes occasionally develop ST at 75 to 80% maturity, but this is uncommon at that stage, and ST-affected pulp is difficult to detect early in ripening. As the mangoes ripen, the likelihood of ST increases significantly owing to a range of contributing factors. To follow the progression of ST, mangoes were stored for 25 days, and changes in ST severity were observed with X-ray imaging. In later ripening stages, the extent of ST increased markedly, spreading towards the exocarp. The texture and appearance of ST varied among fruits and changed as ripening advanced. By the end of the storage period, ST-affected mangoes had highly disintegrated pulp.

In the present study, ST was assessed for presence only, not for the volume affected. ST is morphologically variable: destructive analyses revealed diverse characteristics of Spongy pulp, including air pockets with discoloration, discolored pulp, white corky and hard pulp, and black, soft damaged pulp (Supplementary Fig. S1). The nature of the pulp affects X-ray transmittance and therefore introduces variation in the observed image characteristics. X-ray imaging was evaluated for its effectiveness in detecting the internal quality of mango fruit. Differences in image grey value result from differential X-ray absorption: firm fruit pulp absorbs more X-rays and therefore has lower grey values, whereas voids within the fruit and the surrounding air absorb less and have higher grey values. This difference in absorptivity is used to classify mango pulp into two categories, Non-spongy and Spongy. Healthy pulp absorbs more X-rays than Spongy pulp, so Non-spongy images show a uniform grey-value distribution [11]. Avocado fruits are susceptible to fungal fruit rots during ripening, and efficient, reliable sorting systems are crucial for ensuring fruit quality before shipment to retailers [31]. Traditional radiography requires scanning both sides of a mango because the stone at the center interferes with X-ray absorbance and transmittance. X-ray CT, which captures the entire fruit in 3D and reveals internal density variations, is a viable method for non-destructive examination of internal fruit disorders [19, 32]. However, although 3D CT classifiers capture spatial information, 2D radiography classifiers have proved as accurate as, or even more accurate (95 to 99% accuracy) than, 3D models (95 to 96%) [24].
Some studies categorize fruits such as pears or apples as consumable or non-consumable [19]. For mangoes affected by ST, however, such a distinction is unnecessary, because ST spreads aggressively once established in the fruit, rapidly drawing nutrients from the surrounding healthy pulp.
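
The grey-value contrast between healthy pulp and cavities follows from exponential attenuation. Under a simplified, monochromatic Beer–Lambert model (real X-ray tubes emit a polychromatic spectrum), a ray crossing a thickness $x$ of pulp with linear attenuation coefficient $\mu$ reaches the detector with intensity $I$; if a cavity of length $d$ (attenuation of air taken as approximately zero) replaces part of that path, the transmitted intensity rises by the factor $e^{\mu d}$:

$$I = I_0\,e^{-\mu x}, \qquad I_{\text{cavity}} = I_0\,e^{-\mu (x-d)} = I\,e^{\mu d} > I \quad (d > 0)$$

Whether this larger transmitted intensity is displayed as a higher or a lower grey value depends on the detector and display convention.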

Performance of hyperparameter tuning through grid search

In hyperparameter tuning, the architectures showed distinct preferences. The CNN, VGG19, and ResNet50 performed best with a learning rate of 0.001, indicating a preference for gradual weight updates. A small learning rate is often advantageous for complex models or intricate datasets because it navigates the loss landscape more precisely and avoids overshooting minima. In contrast, VGG16 and AlexNet favored a higher learning rate of 0.01, which may indicate that these models converge faster, possibly because of their less complex structures or different learning dynamics. All models except VGG16 performed best with a batch size of 128, a balance that avoids the extremes of very small or very large batches. VGG16's preference for a smaller batch size of 64 may reflect better handling of noise and generalization, since smaller batches give more frequent updates and traverse the loss landscape differently (Table 2).

Table 2 Results obtained from grid-search hyperparameter tuning with 5-fold cross validation technique for training various DL models

Augmentation techniques can significantly enhance DL performance by expanding limited datasets. DL models typically perform better when trained on large datasets, but in some fields, such as X-ray or NMR imaging, such extensive data are not readily available. Data augmentation addresses this by creating synthetic data through transformations, enlarging the dataset and improving model generalization [25, 33]. The DL algorithms were trained and validated on X-ray images divided into training and validation sets in an 80:20 ratio. Dividing the available data into subsets with similar distributions is essential; in our study, the dataset was split randomly, as in earlier studies applying DL to image data [23, 34, 35]. However, random splitting, while appearing fair, may yield overly optimistic results because the held-out data share the same distribution as the training data, so the choice of split can itself affect apparent model performance. To address this, 5-fold cross-validation was used, which usually gives a less biased and less optimistic performance estimate than a single random train-validation split. DL automates feature extraction, which makes it versatile for produce inspection, and grid search can optimize its hyperparameters despite the complexity of the models [31].

Regarding other parameters, a varied dropout rate was observed: CNN and ResNet50 were tuned to 0.10, indicating a lower regularization need, which might be due to their inherent architectural features. VGG models and AlexNet, with higher dropout rates of 0.20 and 0.40, respectively, suggest a greater need for regularization, possibly to combat overfitting in these architectures. The optimizer choice showed a division: RMSProp was favored for CNN, VGG16, and VGG19, which might be due to its effectiveness in handling the gradients in deeper networks, whereas Adam was chosen for AlexNet and ResNet50, potentially for its adaptive learning rate capabilities. Finally, the activation function was consistently ReLU for all models except the CNN, which opted for LeakyReLU. This choice could reflect the CNN’s requirement for addressing the dying ReLU problem, where LeakyReLU helps by allowing a small gradient when the unit is not active.

The industry’s key objective is to detect defective fruit accurately in order to maintain consumer satisfaction and financial viability. Balancing recall and precision is essential when assessing a classifier, given the biological variability of the fruit, to bolster consumer confidence and mango purchases. Evaluating overall model performance, especially for online mango classification, is critical for sustaining supply chains and reducing food waste. Detecting even minor defects while preserving healthy fruit minimizes financial losses. Hence, models with a high F1 score, such as VGG19 in our study, are essential for binary mango classification because they meet both criteria.

Binary classification of mango using radiography

The DL models successfully classified Alphonso mangoes into Non-spongy and Spongy classes from X-ray images (Fig. 4). The CNN model achieved precision of 81.76% and 95.12%, recall of 95.86% and 79.22%, and F1-scores of 0.88 and 0.86 for the Non-spongy and Spongy classes, respectively, with an overall test accuracy of 87.27%. AlexNet achieved precision of 86.33% and 94.95%, recall of 95.39% and 85.07%, and balanced F1-scores of 90.91% and 89.71% for the Non-spongy and Spongy classes, respectively, with an overall test accuracy of 90.21%. ResNet50 achieved 82.5% precision and 99.19% recall for the Non-spongy class (F1-score 90.05%) and 99.02% precision and 79.33% recall for the Spongy class (F1-score 88.06%), with an overall test accuracy of 89.18%. VGG16 achieved precision of 86.95% and 98.76%, recall of 98.91% and 84.91%, and F1-scores of 92.25% and 87.58% for the Non-spongy and Spongy classes, respectively, with an overall test accuracy of 91.75%. VGG19 achieved 93.47% precision and 98.82% recall for the Non-spongy class (F1-score 95.87%) and 98.47% precision and 93.14% recall for the Spongy class (F1-score 95.71%), with an overall test accuracy of 95.82%.

Fig. 4

Performance of various machine learning models on test dataset interpreted by confusion matrix (A) CNN, (B) AlexNet, (C) ResNet50, (D) VGG16, and (E) VGG19

After grid search hyperparameter tuning, the best parameters were selected based on validation accuracy and used to retrain each model on the entire training dataset for generalization; the models were then tested with these optimized hyperparameters (Fig. 5). VGG19 achieved the highest average F1 score for both classes, surpassing VGG16, ResNet50, AlexNet, and CNN, and also had the lowest standard deviation, indicating consistent performance. The marginal difference in F1 score between the Non-spongy and Spongy classes indicates comparable performance on both (Fig. 5A). VGG19 also had the highest recall for both classes, whereas CNN and ResNet50 showed a large difference in recall between the classes (Fig. 5B). VGG16 had the highest precision for Spongy fruits, followed by VGG19, whereas for Non-spongy fruits VGG19 outperformed the others and showed a smaller difference between the two classes (Fig. 5C). VGG19 had the highest overall accuracy, followed by VGG16, ResNet50, and AlexNet, while the standard CNN model lagged behind (Fig. 5D).

Fig. 5

Comparison of the models on the test dataset by (A) F1 score, (B) recall, (C) precision, and (D) accuracy

In direct comparison, ResNet50 had the highest recall for the Non-spongy class (99.19%) and the highest precision for the Spongy class (99.02%), showing its strength in recognizing Non-spongy fruits and in making correct Spongy predictions. VGG19 had the highest overall accuracy (95.82%), reflecting accurate predictions across both classes. Each model showed distinct strengths across precision, recall, F1 score, and overall accuracy, which underlines the importance of considering all of these metrics when selecting a model for a given classification task. In the Tukey HSD test, all pairwise differences in accuracy between models were significant except for the AlexNet–ResNet50 pair (Table 3; Fig. 6).

Table 3 Paired comparison of models
Fig. 6

Accuracy comparison of models using the Tukey HSD test

In this adaptation of the CNN architecture, VGG19 gave noteworthy classification outcomes. Using a dataset of 1430 X-ray images of mangoes, VGG19 achieved an accuracy of 95.82%, and its precision, recall, and F1 score, averaged over the Non-spongy and Spongy classes, all reached about 0.96.

In the present study, ResNet50 and AlexNet achieved test accuracies of 89.18% and 90.21%, respectively. Both outperformed the CNN model, presumably because their deeper architectures (and, in ResNet50, skip connections that carry early activations deeper into the network) allowed them to learn the parameters more effectively. Precision, recall, F1 score, and overall accuracy for all models are given in Table 4. The transfer-learning VGG19 architecture achieved higher accuracy than the other architectures, which we attribute to its five convolutional blocks with filter counts increasing from 64 to 512 and to its tuned hyperparameters with ReLU activation. Similarly, Ansah et al. [1] located seed spoilage in mango fruit using X-ray imaging and deep transfer learning, with a VGG16 DCNN model reaching 97.66% accuracy; these results agree closely with ours. Van de Looverbosch [19] used X-ray imaging with U-Net-based deep learning semantic segmentation to classify pears and achieved 99.4% accuracy for healthy versus defective fruit. Matsui et al. [31] used an automated X-ray machine with DL-based semantic segmentation to detect internal fruit rot in Hass avocado and achieved an accuracy of 0.98 in binary classification.

Table 4 Classification report of various machine learning algorithms on testing dataset

In our research, a single classification threshold of 0.95 was used for all models, reflecting the need for highly accurate classification to meet export standards. The threshold is applied to each model's predicted probability for the Spongy class: a fruit is classified as Spongy only when this probability exceeds 0.95, and as Non-spongy otherwise. With this approach, the models achieved accuracies ranging from about 87 to 96%. In contrast, another study used a threshold of 0.5 [22]. In the present study, VGG19 performed best among the models tested, and the Tukey HSD test confirmed that its accuracy differed significantly from that of the other models. This highlights the task-specific nature of model performance and led us to select VGG19 for further classification, given our objective and dataset.
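
The thresholding step can be sketched as follows. The helper `classify` is hypothetical; it assumes each model outputs a single probability for the Spongy class (for the two-neuron softmax head of VGG16, the Spongy-class output would play this role) and that a probability exactly equal to the threshold counts as Spongy. The probabilities are illustrative.

```python
import numpy as np


def classify(p_spongy, threshold=0.95):
    """Map predicted Spongy probabilities to labels: 1 = Spongy if p >= threshold, else 0."""
    return (np.asarray(p_spongy) >= threshold).astype(int)


probs = np.array([0.10, 0.60, 0.94, 0.97, 0.99])   # illustrative model outputs
print(classify(probs))         # [0 0 0 1 1]
print(classify(probs, 0.5))    # [0 1 1 1 1]
```

Raising the threshold from 0.5 to 0.95 makes Spongy predictions more conservative: Spongy precision tends to rise while Spongy recall falls. That pattern is consistent with the results reported above, where Spongy precision exceeds 94% for every model while Spongy recall ranges from about 79% to 93%.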

Hyperparameter tuning played a significant role in influencing model outcomes. To ensure a fair comparison, model evaluations were conducted on a balanced and sizable dataset using cross-validation techniques. In postharvest research, many studies typically adopt pre-existing model architectures and train them on their unique datasets. A notable challenge in the postharvest domain lies in ensuring the adaptability of models to different varieties, harvesting seasons, storage conditions, and treatments.

The full integration of X-ray imaging in fruit processing industries faces challenges due to the significant initial investment costs and the necessity for a high-voltage power source to generate X-rays. Implementing automated X-ray analysis on existing lines requires significant hardware and software improvements for quick and continuous scanning. This could streamline internal defect detection on a larger scale, replacing manual analysis. However, challenges include variations in image quality and magnification with different X-ray equipment [1, 24]. Future studies may explore cross-sectional comparisons with diverse equipment and larger datasets to enhance research validity. While the use of this technique raises concerns about the harmful effects of X-rays, effective shielding can mitigate human exposure [12]. Anticipated advancements in instrumentation and computational power, encompassing both hardware and software, suggest that this technique will gain increased applicability in the agriculture field in the future.

Conclusion

This research achieved accurate classification of mangoes by ST presence using DL models. To prevent underfitting and overfitting, thorough hyperparameter tuning was performed in each of the five cross-validation folds, yielding finely tuned models with robust performance. The non-destructive detection model introduced here serves as an objective benchmark and can significantly improve post-harvest inspection of Alphonso mangoes: it can increase inspection efficiency, reduce misclassification, and improve export prospects, fostering consumer trust.

Future research avenues include validating this model’s effectiveness with external datasets. This evaluation would assess its resilience when applied to samples with various internal irregularities or mechanical impairments. Such efforts aim to further refine and enhance the system’s performance, ensuring its reliability and applicability in diverse mango inspection scenarios.