Introduction

Oilseeds are a prominent part of diets worldwide and are well-known as the primary source of edible oil. Among oilseeds, flaxseeds are highly desirable due to the high contents of alpha-linolenic acid (an omega-3 fatty acid), fiber, and plant lignans. The global trade value of flaxseed was 425.3 million US dollars in 2021, with an expected compound annual growth rate of 12.8% in 2021–2026 (Mordorintelligence, 2022). The growth in the market demand mandates minimizing the loss in quality and quantity of flaxseed throughout the supply chain. Therefore, threats such as spoilage, insect infestation, and mechanical stress should be closely monitored and mitigated. To this end, scholars have evaluated the effect of various potential contributing factors to control the aforementioned threats.

In the case of mechanical stress, damaged seeds may lose viability and yield or become susceptible to insect and fungal infestation, ultimately downgrading sample quality and price. A detailed overview of the potential adverse effects of mechanical damage on seed properties can be found elsewhere (Chen et al., 2020). To minimize the mentioned problems, scientists have focused on exploring the effect of moisture content (MC) and/or impact stress on the breakage susceptibility of various seeds (Erkinbaev et al., 2019; Khazaei et al., 2008; Nadimi et al., 2022; Shahbazi, 2011; Shahbazi et al., 2012, 2014, 2017). The published works have indicated that the appropriate selection of MC or maximum induced impact stress could minimize the mechanical damage to seeds.

Despite several previous works in this domain, the majority of prior efforts investigated the effect of MC and impact energy (IE) on the exterior surfaces of the samples through a visual inspection (Erkinbaev et al., 2019; Khazaei et al., 2008; Shahbazi, 2011; Shahbazi et al., 2012, 2014, 2017). However, visual inspection is cumbersome, slow, subjective, and limited to detecting apparent external damage. Moreover, some studies revealed that seeds’ external and internal damage may not always be highly correlated (Nadimi et al., 2022). Hence, developing a rapid, reliable, and intelligent system that could automatically assess seeds’ mechanical damage beyond the surface has always been of great interest. In this regard, Nadimi et al. (2022) recently demonstrated the capabilities of radiographic imaging and machine vision techniques in evaluating internal mechanical damage to flaxseeds. Two simple percentile-based classification algorithms were developed using the gray level distributions of radiographic images of mechanically damaged flaxseeds to classify them into two broad groups of nil/low and medium/high damage. The authors suggested the implementation of advanced machine learning algorithms to better discriminate the mechanically damaged seeds, which was not in the scope of their study and hence was not explored (Nadimi et al., 2022).

The capability of state-of-the-art data analysis tools such as machine learning and deep learning in fruit and grain quality evaluation has already been demonstrated in several works (Divyanth et al., 2022a; Erkinbaev et al., 2022; Hosainpour et al., 2022; Li et al., 2022; Nadimi et al., 2021; Sabzi et al., 2022). For instance, scholars have reported the applications of image processing (Anami et al., 2015; Chaugule & Mali, 2014; Cubero et al., 2011; Dubey et al., 2006) and machine learning–based models in estimating the ripeness of fruits (Kangune et al., 2019; Kheiralipour et al., 2022; Khojastehnazhand et al., 2019; Nanyam et al., 2012), identifying grain dockage (Paliwal et al., 2003; Sharma & Sawant, 2017), and segregating grain types (Arora et al., 2020; Velesaca et al., 2021). Similarly, the efficacy of convolutional neural network (CNN) models in monitoring grain quality, detecting infestations, classifying grain grades and types, and identifying damaged kernels has been reported in various studies (Bhupendra et al., 2022; Cubero et al., 2011; Divyanth et al., 2022b; Velesaca et al., 2021). Despite these promising results, to our knowledge, no effort has been made to use these techniques to assess mechanical damage in flaxseed, an economically important nutraceutical and industrial oilseed. To address this knowledge gap, the present study aimed to employ machine learning and deep learning tools to classify mechanically damaged flaxseeds into four groups, viz., no damage (ND), low damage (LD), medium damage (MD), and high damage (HD).

Materials and Methodology

Samples

The samples and radiographic images being used in this study were previously explained in detail elsewhere (Nadimi et al., 2022). In summary, flaxseeds at three levels of MC (6, 8, and 11.5%) were subjected to four different stress levels, viz., 0 (control), 2, 4, and 6 mJ forming 3 × 4 = 12 treatments. For each treatment, three replicates of 100 seeds were imaged using a soft 2D X-ray imaging system (model: MX-20, Faxitron Bioptics, LLC, Tucson, AZ). Overall, 3600 seeds (3 (MC) × 4 (stress levels) × 100 (seeds) × 3 (replicates)) were imaged in this study.
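The treatment and sample counts above follow from a full factorial design. As a quick arithmetic check, the short Python sketch below enumerates that design. The variable names are illustrative and are not taken from the original MATLAB workflow.

```python
from itertools import product

# Factor levels as stated in the text.
moisture_levels = [6.0, 8.0, 11.5]   # moisture content, %
impact_energies = [0, 2, 4, 6]       # impact energy, mJ (0 = control)
replicates = 3
seeds_per_replicate = 100

# Every MC level is combined with every impact energy level.
treatments = list(product(moisture_levels, impact_energies))
total_seeds = len(treatments) * replicates * seeds_per_replicate

print(len(treatments), total_seeds)
```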

Research Workflow

As illustrated in Fig. 1, the proposed image processing algorithm involved image pre-processing, image labelling, feature extraction/selection, and image classification, which are discussed in the subsequent sections. All analyses were performed using MATLAB (R2022a, MathWorks Inc., Natick, MA) software, with its statistics and machine learning, image processing, and deep learning toolboxes. The MATLAB application was run on an Acer Nitro 5 laptop with a 9th-generation Intel Core i5 processor (32 GB/1 TB HDD/Windows 10 Home/GTX 1650 Graphics).

Fig. 1 Flow chart of the proposed machine vision algorithm for estimating damage in the seeds

Image Pre-processing

The pre-processing of radiographic images (Fig. 2a) consisted of five main steps: (i) image enhancement using the imadjust function (Fig. 2b), (ii) image binarization through global thresholding (imbinarize) (Fig. 2c), (iii) applying a morphological opening operation (image erosion followed by dilation) on the mask, (iv) obtaining the corresponding masked image (Fig. 2d), and (v) extraction of individual seeds (Fig. 2e) using bounding box coordinates (regionprops function) of the mask. Seeds with undesired segmentation (such as overlapping seeds) were removed from the dataset (~ 4.6% of the entire data set). Table 1 summarizes the applied image pre-processing steps with the corresponding MATLAB functions.
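To make steps (iii) and (v) concrete, the following is a minimal pure-Python sketch of a binary opening (erosion followed by dilation) and of bounding-box extraction on a toy mask. It is not the MATLAB implementation used in the study. The 3 × 3 structuring element, the 4-connectivity, and all function names are assumptions chosen for illustration.

```python
def _neighbourhood(mask, r, c):
    """Yield the 3x3 neighbourhood of (r, c); out-of-bounds pixels count as background."""
    h, w = len(mask), len(mask[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield mask[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0

def erode(mask):
    """A pixel survives only if its whole 3x3 neighbourhood is foreground."""
    return [[int(all(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]

def dilate(mask):
    """A pixel is set if any pixel in its 3x3 neighbourhood is foreground."""
    return [[int(any(_neighbourhood(mask, r, c))) for c in range(len(mask[0]))]
            for r in range(len(mask))]

def opening(mask):
    """Morphological opening: erosion first, then dilation."""
    return dilate(erode(mask))

def bounding_boxes(mask):
    """Return (row_min, col_min, row_max, col_max) per 4-connected component."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                stack, seen[r][c] = [(r, c)], True
                rmin = rmax = r
                cmin = cmax = c
                while stack:
                    y, x = stack.pop()
                    rmin, rmax = min(rmin, y), max(rmax, y)
                    cmin, cmax = min(cmin, x), max(cmax, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((rmin, cmin, rmax, cmax))
    return boxes

# Toy mask: one 3x3 "seed" plus a single isolated noise pixel.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
opened = opening(mask)
print(bounding_boxes(mask))    # two components before opening
print(bounding_boxes(opened))  # only the seed remains after opening
```

In this toy case the opening removes the isolated pixel while leaving the seed-sized blob intact, which is why it is applied before the bounding boxes are extracted.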

Fig. 2 Image processing procedure for extracting individual seeds from a group of 100 seeds: (a) original image, (b) enhanced image, (c) binary mask, (d) background masked image, (e) individual flaxseed images. Samples shown here are non-impacted (control) seeds at 8% MC

Table 1 Image pre-processing steps and parameter settings for MATLAB implementation

Seed Labelling

To obtain a comprehensive prediction of the severity of damage in flaxseeds, the individual seeds extracted as described in the “Image Pre-processing” section were carefully examined and segregated into four classes, i.e., ND, LD, MD, and HD. Damage was usually identified as a crack or indentation detectable in the radiographic images (see Fig. 3). The ND class represented sound/undamaged seeds (Fig. 3a), flaxseeds with slight damage (minor cracks) were assigned to LD (Fig. 3b), flaxseeds with multiple minor cracks and slight to medium indentations were assigned to MD (Fig. 3c), and HD seeds contained severe indentations and cracks (Fig. 3d). As expected, most of the ND seeds belonged to 0 mJ and/or 2 mJ IE, many flaxseeds of the LD class were from 2 mJ or 4 mJ IE categories, and most of the MD and HD seeds were impacted with 4 mJ and 6 mJ IE, respectively.

Fig. 3 Sample X-ray images of flaxseeds from the four classes (a) ND, (b) LD, (c) MD, and (d) HD. Ovals indicate the damaged areas

Seed Damage Analysis

As previously mentioned, algorithms needed to be developed to classify the flaxseeds into four classes, namely, ND, LD, MD, and HD. Two main strategies were deployed for this purpose—machine learning–based pattern recognition and a CNN-based approach (details are provided in the sections “Machine Learning and Pattern Recognition” and “Convolutional Neural Network”).

The dataset comprised 1452 images in the ND class, 723 in LD, 718 in MD, and 542 in HD. About 70% of the images in each class were reserved for training, while the remaining images were used as the test dataset (Table 2 provides detailed information on the dataset). The precision, recall, accuracy, and mean F1-score evaluation metrics were used to assess classification performance. For a given class, precision is defined as the ratio of true positives (TP) to the total number of objects predicted for this class (TP + false positives (FP)), while recall is the ratio of TP to the actual number of objects in that class (TP + false negatives (FN)). The F1-score is the harmonic mean of precision and recall. Accuracy is the percentage of samples correctly classified (TP + true negatives (TN)) by the model. The equations for these metrics are provided in Eqs. (1)–(4):

Table 2 Class-wise image distribution of the flaxseed X-ray image dataset
$$\mathrm{Precision}=\frac{TP}{TP+FP}$$
(1)
$$\mathrm{Recall}=\frac{TP}{TP+FN}$$
(2)
$$\mathrm{F1\text{-}score}=\frac{2\times \mathrm{Precision}\times \mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}$$
(3)
$$\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}$$
(4)
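Eqs. (1)–(4) can be evaluated per class directly from a confusion matrix. The Python sketch below shows one way to do so, assuming rows are true classes and columns are predicted classes. The confusion matrix values are invented for illustration and are not results from this study.

```python
def per_class_metrics(cm, labels):
    """Compute precision, recall, F1-score, and accuracy for each class (one-vs-rest)."""
    total = sum(sum(row) for row in cm)
    results = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                  # true class k, predicted otherwise
        fp = sum(row[k] for row in cm) - tp   # predicted k, true class otherwise
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        accuracy = (tp + tn) / total
        results[label] = {"precision": precision, "recall": recall,
                          "f1": f1, "accuracy": accuracy}
    return results

def overall_accuracy(cm):
    """Fraction of all samples that lie on the diagonal of the confusion matrix."""
    return sum(cm[k][k] for k in range(len(cm))) / sum(sum(row) for row in cm)

# Hypothetical 4-class confusion matrix (rows: true, columns: predicted).
labels = ["ND", "LD", "MD", "HD"]
cm = [
    [9, 1, 0, 0],
    [2, 6, 2, 0],
    [0, 1, 8, 1],
    [0, 0, 1, 9],
]
metrics = per_class_metrics(cm, labels)
for label in labels:
    print(label, {name: round(value, 3) for name, value in metrics[label].items()})
print("overall accuracy:", overall_accuracy(cm))
```

Note that the per-class accuracy of Eq. (4) counts true negatives and is therefore usually higher than the overall (multi-class) accuracy, which counts only the diagonal entries.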

Machine Learning and Pattern Recognition

As mentioned in the “Introduction” section, several previous works have utilized image texture, morphology, and color (TMC) features to examine the quality of agri-food products (Kheiralipour et al., 2022; Sabzi et al., 2022). Herein, an analogous approach was used to explore the feasibility of such information to assess mechanical damage in flaxseeds. The gray level co-occurrence matrix (GLCM) and gray level run-length matrix (GLRM) were used to derive the textural features. The GLCM is a measure of how often various combinations of pixel values (or gray levels) occur in a gray scale digital image (Mall et al., 2019). GLRM, on the other hand, represents the occurrences of consecutive and collinear pixels of similar gray levels in the image (Preetha et al., 2018). Texture feature calculations use the contents of GLCM and GLRM to provide a measure of the variation in image texture (pixel values) at the pixel of interest. For feature extraction, only the region of interest (ROI, i.e., the flaxseed) was used to extract information. The ROI was quantized into 16 gray levels (selected after trial-and-error on gray levels of 8, 16, 32, and 64). For each quantized X-ray image, four GLCM and four GLRM matrices with the orientations Θj ∈ {0°, 45°, 90°, 135°} were computed. Four statistics, namely, variance/inertia, correlation, uniformity, and homogeneity, were extracted from every GLCM matrix. From the GLRM matrices, 11 features were extracted, namely, short-run emphasis (SRE), long-run emphasis (LRE), gray level non-uniformity (GLN), run length non-uniformity (RLN), run percentage (RP), low gray level run emphasis (LGRE), high gray level run emphasis (HGRE), short-run low gray level emphasis (SRLGE), short-run high gray level emphasis (SRHGE), long-run low gray level emphasis (LRLGE), and long-run high gray level emphasis (LRHGE). The morphological features were the ROI’s regular area, convex area, perimeter, eccentricity, major axis length, minor axis length, and circularity. The mean and standard deviation (SD) of the pixel intensities in the gray-scale ROI were utilized as two additional color features. Thus, a total of 69 features were extracted from each seed, including 60 textural (4 features from GLCM × 4 orientations, and 11 features from GLRM × 4 orientations), seven morphological, and two color features.
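As an illustration of how the two texture matrices are built, the sketch below computes a GLCM and a GLRM for the 0° orientation on a small image that is already quantized, and derives a few of the statistics named above. It is a simplified pure-Python sketch rather than the MATLAB routines used in the study. The 4-level toy image, the non-symmetric GLCM, and the function names are assumptions made for readability (the study used 16 levels and four orientations).

```python
def glcm(img, levels, dr, dc):
    """Normalised, non-symmetric co-occurrence matrix for the pixel offset (dr, dc)."""
    h, w = len(img), len(img[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < h and 0 <= c2 < w:
                counts[img[r][c]][img[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    return [[n / total for n in row] for row in counts]

def glcm_stats(p):
    """Contrast (inertia), energy (uniformity), and homogeneity of a normalised GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    energy = sum(v ** 2 for _, _, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    return contrast, energy, homogeneity

def glrm_0deg(img, levels):
    """runs[g][l - 1] = number of horizontal runs of gray level g with length l."""
    w = len(img[0])
    runs = [[0] * w for _ in range(levels)]
    for row in img:
        start = 0
        for c in range(1, w + 1):
            if c == w or row[c] != row[start]:
                runs[row[start]][c - start - 1] += 1
                start = c
    return runs

def run_emphasis(runs, weight):
    """Weighted mean over all runs; weight(l) = 1/l**2 gives SRE, l**2 gives LRE."""
    total = sum(sum(row) for row in runs)
    return sum(n * weight(length)
               for row in runs
               for length, n in enumerate(row, start=1)) / total

# Toy image with 4 gray levels.
img = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(img, levels=4, dr=0, dc=1)          # 0 degrees: right-hand neighbour
contrast, energy, homogeneity = glcm_stats(p)
runs = glrm_0deg(img, levels=4)
sre = run_emphasis(runs, lambda l: 1 / l ** 2)
lre = run_emphasis(runs, lambda l: l ** 2)
print(contrast, energy, homogeneity, sre, lre)
```

The other three orientations follow by changing the offset (for the GLCM) or the scan direction (for the GLRM), which is how the 4 × 4 and 11 × 4 textural feature counts arise.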

Machine learning algorithms, namely, linear discriminant analysis (LDA), K-nearest neighbors (KNN), support vector machines (SVMs), and decision trees were employed as classifiers on the above-derived features. The results of SVM have been discussed in detail in the “Results and Discussion” section due to its superior performance. The other classifiers’ results have been attached as supplementary material (Table S1).

It should be noted that non-linear kernel-based classifiers such as SVM have demonstrated advantages over other machine learning algorithms in many similar studies (Divyanth et al., 2022b, c; Neelakantan, 2021; Sujatha et al., 2021; Wang & Paliwal, 2006) as these classifiers are known for their memory efficiency, faster prediction, and better computational complexity. The TMC-extracted data were z-score normalized (with a mean of 0 and standard deviation of 1) and the “quadratic” kernel function was used for the SVM classifier (optimized using the Classification Learner app).
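The normalization and kernel mentioned above can be sketched as follows. This is an illustrative Python sketch, not the Classification Learner configuration used in the study. The use of the sample standard deviation and the polynomial form (1 + x·y)² for the quadratic kernel are assumptions.

```python
from statistics import mean, stdev

def zscore_columns(rows):
    """Scale each feature (column) to zero mean and unit sample standard deviation."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    return [[(v - m) / s if s else 0.0 for v, m, s in zip(row, mus, sds)]
            for row in rows]

def quadratic_kernel(x, y):
    """Degree-2 polynomial kernel, assumed here to take the form (1 + x.y)**2."""
    return (1 + sum(a * b for a, b in zip(x, y))) ** 2

# Two hypothetical features on very different scales for three seeds.
features = [
    [1.0, 10.0],
    [2.0, 20.0],
    [3.0, 30.0],
]
z = zscore_columns(features)
print(z)
print(quadratic_kernel(z[0], z[2]))
```

After normalization both columns span the same range, so neither feature dominates the kernel value merely because of its units.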

Initially, all the features were used to develop the classification model. However, since redundant features increase the complexity of the model, such features were eliminated through variable importance analysis. In this study, a well-established statistical approach for means comparison, the analysis of variance (ANOVA) F-test algorithm was used to determine the optimal features (Johnson & Synovec, 2002; Kumar et al., 2015; Pathan et al., 2022). Subsequently, another SVM-based classification model was developed using only the optimum features.
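The idea behind ANOVA-based feature ranking can be illustrated with a one-way F statistic computed per feature: a feature whose class means differ strongly relative to its within-class scatter receives a high score. The sketch below is a plain-Python illustration with invented values. It ranks by the raw F statistic, which is an assumption, and the importance scores reported in this study may be expressed on a different scale.

```python
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F statistic for one feature, given its values grouped by class."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical feature values for four classes (e.g., ND, LD, MD, HD).
feature_values = {
    "feature_A": [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]],  # class means differ
    "feature_B": [[1, 5, 9], [2, 5, 8], [3, 5, 7], [4, 5, 6]],     # class means identical
}
scores = {name: anova_f(groups) for name, groups in feature_values.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

A cut-off on such scores then separates the features that are retained from those treated as redundant.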

Convolutional Neural Network

A typical CNN is designed using the following set of layers: convolution layers, which are defined by the convolution filters that extract semantic features from the previous layers; pooling layers, which reduce the dimensions of the data by connecting a group of neurons from the previous layer to a single neuron, thus minimizing the computational requirements and helping to generalize the features; and fully connected layers, which process the activations/features in the form of flattened matrices to classify the image.
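The three layer types described above can be traced on a toy example. The following pure-Python sketch applies one convolution filter, a 2 × 2 max pooling step, and a small fully connected layer. The image, the filter, and the weights are hypothetical and were picked only so that the intermediate values are easy to follow.

```python
def conv2d_valid(img, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [[sum(kernel[i][j] * img[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]

def relu(fm):
    """Element-wise rectified linear activation."""
    return [[max(0, v) for v in row] for row in fm]

def max_pool(fm, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    return [[max(fm[r + i][c + j] for i in range(size) for j in range(size))
             for c in range(0, len(fm[0]) - size + 1, size)]
            for r in range(0, len(fm) - size + 1, size)]

def dense(vec, weights, bias):
    """Fully connected layer: one weighted sum of the flattened input per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]

# 5x5 toy image with a vertical edge between columns 1 and 2.
img = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1],
          [-1, 1]]                       # responds to dark-to-bright vertical edges

feature_map = relu(conv2d_valid(img, kernel))   # 4x4 feature map
pooled = max_pool(feature_map)                  # 2x2 after pooling
flat = [v for row in pooled for v in row]       # flattened activations
scores = dense(flat, weights=[[0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5]], bias=[0, 0])
print(feature_map, pooled, scores)
```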

Herein, we used a transfer learning approach and evaluated the performance of six pre-built powerful and popular deep convolutional networks, viz., EfficientNet-B0 (Tan & Le, 2019), VGG19 (Simonyan & Zisserman, 2014), ResNet18 (He et al., 2015), MobileNet-v2 (Sandler et al., 2018), Inception-v3 (Szegedy et al., 2014), and Xception (Chollet, 2016). Transfer learning reduces the training time needed to learn to differentiate between classes. The results of EfficientNet-B0 are discussed in detail in the “Results and Discussion” section due to its better performance. Results of the other CNNs are provided as supplementary material (Table S2) for comparison.

In the EfficientNet family of networks (Tan & Le, 2019), the three dimensions of depth, width, and resolution are scaled jointly with a constant ratio (a technique called the compound scaling method) instead of being scaled up arbitrarily. A new baseline network was created and then scaled up according to the computational requirement. A compound scaling coefficient \(\upphi\) controls how many more resources are available for scaling, while \(\alpha\), \(\beta\), and \(\gamma\) specify how those resources are assigned to each dimension: \(depth\ (d)={\alpha }^{\upphi }\), \(width\ (w)={\beta }^{\upphi }\), and \(resolution\ (r)= {\gamma }^{\upphi }\). The constraint \((\alpha \times {\beta }^{2} \times {\gamma }^{2}) \approx 2\) was enforced, such that the total number of floating-point operations (FLOPs) increases by approximately \({2}^{\upphi }\) for a given scaling coefficient. A grid search strategy was used to identify the relationship between different scaling dimensions of the baseline network under the fixed resource constraint.

In the network used in this study, the value of \(\upphi\) was set to 1; hence, the values of \(\alpha\), \(\beta\), and \(\gamma\) were found to be 1.2, 1.1, and 1.15, respectively. The architecture comprises mobile inverted bottleneck convolutions (also called inverted residual blocks), where the skip connections are made between the narrow parts, i.e., the start and end of the block (introduced in the MobileNet-v2 model (Sandler et al., 2018)). In the residual blocks, the first step widens the network using a 1 × 1 convolution, which is followed by a 3 × 3 depth-wise convolution, and then a 1 × 1 convolution again to shrink the network to match the initial number of channels. The network was pre-trained on the ImageNet dataset (Deng et al., 2010) before training on our data.
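The reported coefficients can be checked against the compound scaling constraint with a few lines of arithmetic. The Python sketch below assumes the usual formulation in which depth, width, and resolution scale as the coefficients raised to the power of the compound coefficient, and in which the computational cost scales with depth × width² × resolution². The function names are illustrative.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth, width, resolution coefficients from the text

def compound_scale(phi):
    """Return the (depth, width, resolution) multipliers for a compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

def flops_multiplier(phi):
    """Approximate growth in FLOPs: depth x width^2 x resolution^2."""
    d, w, r = compound_scale(phi)
    return d * w ** 2 * r ** 2

for phi in (0, 1, 2):
    print(phi, compound_scale(phi), round(flops_multiplier(phi), 3))
```

For a coefficient of 1 the product is about 1.92, i.e., close to the target value of 2, and each further unit increase roughly doubles the cost again.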

The architecture of the CNN model is presented in Fig. 4. Since the last three layers in the original network are configured for 1000 classes (the number of classes in ImageNet), they were replaced by a new fully connected (FC) layer, a softmax layer, and a classification layer corresponding to four output classes. To fit the input size of the network, the images were resized to a dimension of 224 × 224 pixels by zero padding along the boundaries. Unlike interpolation-based image resizing operations, zero padding ensures that the morphological representations of the ROI (such as the area and perimeter) are not altered. Image geometry-based augmentation techniques, such as translation along the x- and y-axes, random rotations (−90° to +90°), and mirroring about the x- and y-axes, were applied to the training data. Stochastic gradient descent with momentum (sgdm) was chosen as the network training optimizer, with the following hyperparameters: initial learning rate of 0.001; momentum of 0.9; weight decay factor of 0.0001; and a mini-batch size of 32. The maximum number of epochs was limited to 200, and an early stopping condition was enabled.
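The zero-padding step can be sketched as below. The text states only that padding was applied along the boundaries, so centring the seed image inside the 224 × 224 canvas is an assumption of this sketch, as is the function name.

```python
def zero_pad_to(img, target_h=224, target_w=224):
    """Place img at the centre of a zero-filled target_h x target_w canvas."""
    h, w = len(img), len(img[0])
    if h > target_h or w > target_w:
        raise ValueError("image is larger than the target size")
    top, left = (target_h - h) // 2, (target_w - w) // 2
    canvas = [[0] * target_w for _ in range(target_h)]
    for r, row in enumerate(img):
        canvas[top + r][left:left + w] = row   # pixel values are copied unchanged
    return canvas

# Hypothetical 2x3 seed crop with a constant gray value of 5.
seed = [[5, 5, 5],
        [5, 5, 5]]
padded = zero_pad_to(seed)
nonzero_pixels = sum(1 for row in padded for v in row if v != 0)
print(len(padded), len(padded[0]), nonzero_pixels)
```

Because the pixels are copied rather than interpolated, the number of seed pixels (and hence the area) is the same before and after resizing.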

Fig. 4 Schematic representation of the EfficientNet-B0 CNN implemented for the classification of flaxseeds. Each block represents a mobile inverted bottleneck convolution (MBConv). The size of the feature map from each block is provided beside the arrow marks.

To evaluate the performance of the CNNs, the models’ accuracy (Eq. 4) and cross-entropy loss were assessed. The cross-entropy loss can be expressed as (Altuwaijri & Muhammad, 2022; Ji et al., 2022):

$${L}_{CE}=-\sum\nolimits_{i=1}^{n}{t}_{i}\log({p}_{i})$$
(5)

where \(n\) is the number of classes, \({t}_{i}\) is the correct (truth) label (either 0 or 1), and \({p}_{i}\) is the softmax probability for the \(i\)th class. More details on cross-entropy loss calculations are available elsewhere (Ji et al., 2022; Mahjoubi et al., 2022; Matlab Crossentropy, 2022; Yeung et al., 2022).
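Eq. (5) can be evaluated for a single image as in the following Python sketch, which converts raw network outputs to softmax probabilities and then computes the loss against a one-hot truth label. The logit values are invented, and the small clipping constant is an assumption added to avoid taking the logarithm of zero.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities; subtracting the maximum improves stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(targets, probs, eps=1e-12):
    """Eq. (5): negative sum of t_i * log(p_i) over the classes."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(targets, probs))

# One-hot truth label for the second of four classes (ND, LD, MD, HD).
target = [0, 1, 0, 0]

uniform_probs = softmax([0.0, 0.0, 0.0, 0.0])    # an uninformative prediction
confident_probs = softmax([0.0, 5.0, 0.0, 0.0])  # a confident, correct prediction

uniform_loss = cross_entropy(target, uniform_probs)
confident_loss = cross_entropy(target, confident_probs)
print(uniform_loss, confident_loss)
```

An uninformative four-class prediction gives a loss of ln 4, and the loss approaches zero as the probability assigned to the true class approaches 1.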

The details of CNN architectures for MobileNet, Inception, Resnet18, VGG19, and Xception can be found in the original research papers (Chollet, 2016; He et al., 2015; Sandler et al., 2018; Simonyan & Zisserman, 2014; Szegedy et al., 2014). Indeed, similar to the EfficientNet-B0 model described above, the final layers (FC, softmax, and classification layers) were adjusted according to our 4-class data, and the images were resized based on the given network’s input size requirement.

Results and Discussion

The internal and external damages in seeds were noticeable as darker regions in the X-ray images; i.e., the gray value at the impaired region was significantly less compared to the sound portions of the flaxseeds (see Fig. 3). As mentioned in the “Seed Damage Analysis” section, two different approaches were utilized to classify flaxseeds based on their severity of the damage.

Table 3 shows the results of the SVM classification models for classifying the mechanical damage in flaxseeds. The classifier using all the image features achieved an overall classification accuracy of 87.4%. The overall precision and recall for the model were 88.1% and 81.9%, respectively. The corresponding confusion matrix is provided in Fig. 5a. As anticipated, the flaxseeds of the ND class were classified with the highest precision of 92.7% and recall of 99.0%. From the confusion matrix, it can be observed that some seeds of the LD class were misclassified as ND, which explains the reduced recall of the LD class (72.7%). Its poor precision (70.8%) was due to MD flaxseeds being misclassified as LD (hence the reduced recall of the MD class). The HD class showed an appreciable F1-score of 90.3%. Most misclassifications were assigned to a class adjacent in damage severity to the true class.

Table 3 Analysis of classification results of SVM classifier on TMC extracted features on the test set
Fig. 5 Confusion matrices produced by SVM for classifying flaxseeds into the four classes (a) using all of the TMC features (b) with selected features

Some previous studies report that SVMs tend to overfit when too many features are utilized to develop the model (Koklu et al., 2022; Thaiyalnayaki & Joseph, 2021). Hence, as suggested earlier, the redundant features were removed through the ANOVA approach. The rankings of the features based on the importance scores are provided in Fig. 6. Interestingly, among the top 30 features, 24 were derived from the GLRM, including GLN, LGRE, LRHGE, LRLGE, and RLN. Out of the remaining six features, two belonged to color, and four were morphological features.

Fig. 6 Features arranged based on their contribution (variable/feature importance score from ANOVA) towards the classification performance of the SVM model

Considering the observed differences between feature importance scores, variables with scores over 100 were considered optimal and were used to develop another SVM-based classification model. This means that only about 25% of the TMC features were kept for further analysis. Those features include GLN (0°, 45°, 90°, 135°), average intensity, LGRE (0°, 45°, 90°), LRHGE (0°, 45°, 90°, 135°), LRLGE (45°, 90°, 135°), RLN (90°), and SRE (90°).
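The retained fraction can be verified by counting the listed features. The short Python check below simply tallies the orientations given above and compares them with the 69 extracted features. The dictionary layout is illustrative.

```python
# Orientations (degrees) of the retained GLRM features, as listed in the text.
selected_glrm = {
    "GLN": [0, 45, 90, 135],
    "LGRE": [0, 45, 90],
    "LRHGE": [0, 45, 90, 135],
    "LRLGE": [45, 90, 135],
    "RLN": [90],
    "SRE": [90],
}
selected_color = ["average intensity"]

n_selected = sum(len(angles) for angles in selected_glrm.values()) + len(selected_color)
n_total = 4 * 4 + 11 * 4 + 7 + 2   # GLCM + GLRM + morphological + color features
fraction = n_selected / n_total

print(n_selected, n_total, round(100 * fraction, 1))
```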

After removing the redundant feature representations, the classification accuracy improved slightly to 88.4% from the previously achieved 87.4% (Table 3; Fig. 5b presents the confusion matrix). The total misclassification cost for the MD class decreased by around 10%. There was no improvement in predicting images of the HD class; however, the precision of the LD class and recall rate of the MD class showed some improvement. These results validate the potential of the implemented optimum feature selection strategy in reducing the computation time and power without compromising the system performance.

The classification performance of the CNN model is illustrated in the confusion chart (Fig. 7). The CNN training was stopped early, guided by the fivefold cross-validation (CV) loss (Eq. 5) and CV accuracy (Eq. 4), after nearly 2100 iterations to avoid overfitting. The CV accuracy exceeded 80.0% at around the 500th iteration; however, the rate of increase was very gradual for the next 400 iterations and reached a saturation point (the training plot is depicted in Fig. 8). An overall accuracy of 91.0% was achieved on the test data and the final classification accuracy was 91.6% on fivefold cross-validation. Looking at the matrix, the model was able to identify ND and HD flaxseeds with almost 100% and > 96% recall rates, respectively. High precision values (> 93%) were obtained for all classes except LD (76%). The LD class experienced relatively poor precision (compared to other classes) since a noticeable number of the LD flaxseed samples were misclassified as MD and vice versa. The activation maps from the intermediate layers of the network were also inspected (Fig. 9). The model tends to learn finer and finer details present in the image as we move to the deeper layers. The initial layers just present the outlines of the shapes; however, the activations seem to fade and become more abstract as the input passes through subsequent layers of the network.

Fig. 7 Confusion matrix produced by the CNN for classification of flaxseeds into the four classes

Fig. 8 Training plots of the CNN for 4-class classification task: (a) accuracy plot and (b) loss plot (dotted line denotes the fivefold cross-validation performance)

Fig. 9 Activation maps derived from the (a) first convolutional, (b) third convolutional, and (c) fifth convolutional layers of the EfficientNet-B0 model for a given input image

Undoubtedly, CNN provides the best performance among the three classification models with the highest accuracy of 91.0%. The precision and recall rates for ND and HD classes were > 94%, with the MD class securing a recall rate of 93.2%. It can be noticed from Fig. 7 that the number of misclassified images of the MD-HD class has been reduced to a great extent when compared with the confusion matrices produced by feature extraction techniques.

In a relevant study (Nadimi et al., 2022), a percentile method based on SVM and LDA was applied to the gray level distribution of flaxseed X-ray images for a similar classification task. However, the maximum classification accuracies for 2-class and 4-class classifications were limited to 87.2% and 60.0%, respectively, which were obtained using an SVM model. The present study shows that the CNN model outperforms the previous models, with accuracies of 95.2% and 91.0% for 2-class and 4-class classification, respectively.

It is worth mentioning that image feature extraction techniques have accorded appreciable performances for grain quality assessment in the literature. An accuracy of 99.6% was achieved by Singh and Chaudhury (2020) for classifying eight rice varieties using textural features from GLCM and GLRM. Sapirstein et al. (1987) developed a discriminant analysis model primarily on grain morphological features (such as kernel length and width, area, aspect ratio, and contour length) that yielded 99.0% accuracy for classifying wheat, rye, barley, and oats in a four-way admixture. On a similar note, Visen et al. (2003) used textural and color characteristics to identify unknown grain types with over 90% accuracy. A high-speed system based on digital imaging was developed to identify defects in wheat kernels one by one using morphological and textural features of images captured at opposite angles (Delwiche et al., 2013). Analogous to our study, the derived morphological features were the area, perimeter, eccentricity, and major and minor axis lengths. In another study, an artificial neural network was used as a classifier on TMC extracted grain image features for the identification of mechanical damage to corn and barley (Nowakowski et al., 2011).

Despite all the research works mentioned above, our thorough literature review indicates that the present work is the first to utilize machine learning and deep learning algorithms to assess mechanical damage in flaxseeds. The developed model has the potential to be implemented as a pre-screening technique in the agriculture industry to reduce the time and labor currently used in the mechanical damage assessment of grain and oilseeds.

Conclusion

To the best of our knowledge, this work is the first in-depth exploration of mechanical damage to flaxseed using radiographic imaging and artificial intelligence algorithms. Various machine learning and deep learning tools such as pattern recognition, feature selection, and transfer learning were used. The feature selection revealed that the average pixel intensity and GLRM-derived features contributed most to discriminating the severity of mechanical damage. However, the best performance was achieved using the EfficientNet-B0 CNN model, where the damaged flaxseeds were classified into four classes with an accuracy of 91.0%.

We believe the developed model can open a promising pathway for the automated detection of mechanical damage in the grain and seeds industry through further research.