Comparative Evaluation of Some Quality Characteristics of Sunflower Oilseeds (Helianthus annuus L.) Through Machine Learning Classifiers

Çetin, Necati; Karaman, Kevser; Beyzi, Erman; Sağlam, Cevdet; Demirel, Bahadır

doi:10.1007/s12161-021-02002-7


Published: 15 March 2021

Volume 14, pages 1666–1681, (2021)

Food Analytical Methods

Necati Çetin ORCID: orcid.org/0000-0001-8524-8272¹,
Kevser Karaman²,
Erman Beyzi³,
Cevdet Sağlam¹ &
…
Bahadır Demirel¹


Abstract

Sunflower seeds are rich in oil and oleic acid and thus have valuable nutritional properties. Sunflower is also resistant to dry conditions and adapts easily to harsh environmental conditions. Physico-chemical properties play a great role in classification, grading, and quality assessment of sunflower seeds. In the present study, six different machine learning algorithms (decision tree, DT; random forest, RF; support vector machine, SVM; multiple linear regression, MLR; Naïve Bayes, NB; and multilayer perceptron, MLP) were used to evaluate the classification performance for six different sunflower oilseed varieties. Additionally, characteristic properties of the oilseeds were evaluated by multivariate tests (MANOVA) and discriminant analysis. The best accuracy values were recorded as 80.16, 79.68, and 78.89% for RF, SVM, and MLP, respectively, and the lowest MAE value (0.088) was observed in NB. The MANOVA, Wilks’ lambda, and Pillai Trace statistics revealed that differences in physical attributes of the sunflower varieties were significant (p<0.01). Colombi and Transol varieties, which had the lowest Mahalanobis distances, had similar attributes.


Introduction

Sunflower is native to North America and has long attracted the attention of scientists as an important industrial crop (Badouin et al. 2017). In evolutionary biology, the genus Helianthus is a long-standing model of hybrid traits (Rieseberg et al. 1995) and adaptive introgression (Vandenbrink et al. 2014). It is also seen as a model for tracking the sun and understanding the development of flowering in plant science (Tähtiharju et al. 2011). Sunflower seeds are rich in oil (40–50%) (Demir and Basalma 2018) and oleic acid (30%) (Avni et al. 2016). Sunflower seeds also have high polyunsaturated and unsaturated fatty acid contents and high linoleic acid contents (Kiriamiti et al. 2002; Salgin et al. 2006). The fact that sunflower was responsible for about 87% of vegetable oil production makes it more prominent than the other oilseeds. Sunflower seed is also rich in minerals, vitamins, flavonoids, glycosides, phenolic acids, alkaloids, tannins, caffeic acid, and sterols (Adeleke and Babalola 2020). The seeds constitute an important source of protein and offer a balanced source of amino acids (Kiriamiti et al. 2002; Salgin et al. 2006). Therefore, they play an important role in human nutrition and health. In general, 90% of sunflower seeds are used for oil production and the rest for fresh consumption as an appetizer (Khodabakhshian et al. 2009).

There are many different engineering traits that distinguish agricultural products. Shape and size properties (dimensions, area, mass, elongation and spherical parameters) are important quality indicators of agricultural products. These features should be well-known while designing machinery and systems used in different processes such as classification and quality assessment of agricultural products (Costa et al. 2011). The physical properties of sunflower seeds should also be well-known while designing various equipment such as cleaner, grader, dehuller, separator, and oil expeller (Gupta et al. 2007).

Size, mass and shape attributes can be determined for a single seed. However, the frequency distributions of these properties should be defined for whole sets of seeds when designing the relevant equipment (Khazaei et al. 2008). To minimize the number of damaged and defective seeds, reduce the power consumption of threshing machines and increase machine efficiency, the relevant machines and classifiers should be designed and optimized to match the physical properties of the seeds (Mirzabe et al. 2012). Shape, size, mass and biochemical attributes determine the quality of seeds for various purposes (Sorkheh et al. 2018). Product quality is characterized as a combination of the texture, flavor and visual appearance of the fruit, and consumers often shape their preferences based on these parameters (Kays 1999).

Machine learning-based product classification applications are effective tools used in design of accurate and reliable classifiers. Such applications include various algorithms such as decision tree, artificial neural network, genetic algorithm, regressions and fuzzy logic. Additionally, there are powerful algorithms for training of various machine learning models and for adapting difficult input-output mapping strategies as well as selecting and removing useful features. These models are generally used for correct selection of descriptive features in the quality assessment of agricultural products (Omid et al. 2010; Mollazade et al. 2012).

Several researchers studied size and shape attributes of sunflower seeds (Gupta and Das 1997; Santalla and Mascheroni 2003; Khodabakhshian et al. 2010; Mirzabe et al. 2012; Jafari et al. 2011; Malik and Saini 2016; Munder et al. 2017; Ortiz-Hernandez et al. 2020), mineral composition (Ingale and Shrivastava 2011; Kirbaslar et al. 2012), and fatty acid composition of sunflower seeds (Gupta and Shrivastava 2004; Ekin et al. 2005; Izquierdo and Aguirrezábal 2008; Werteker et al. 2010; Ingale and Shrivastava 2011; Alberio et al. 2016; Li et al. 2017). Some other researchers used different machine learning algorithms for quality traits of sunflower seeds (Kavdır and Guyer 2008; Omid 2011; Mollazade et al. 2012; Zielinska et al. 2012; Barbosa et al. 2015; Soltani and Omid 2015; Siedliska et al. 2017; Niazian et al. 2018; Abdipour et al. 2019; Gumus et al. 2018; Kurtulmuş 2020). However, the literature review revealed no studies on the discrimination and classification of sunflower seeds based on physical and chemical properties with the use of machine learning algorithms. Therefore, this study was conducted to (i) determine the size, shape, mass attributes, fatty acid, and mineral composition of sunflower seeds; (ii) distinguish the sunflower oilseed varieties; and (iii) classify and compare the samples by using different machine learning algorithms.

Materials and Methods

Plant Material and Sample Preparation

Seeds of six sunflower cultivars (Transol, Colombi, Tunca, 63MM54, LG5582, and P64LC53) were used as the plant material of the present study. Sunflower seeds were harvested in October 2018 from a local farm located about 48 km from Konya, Turkey (38° 18′ 54.5″ N 32° 46′ 35.6″ E). Foreign materials such as dust, dirt and bark, as well as deformed and immature seeds, were manually removed from the raw material. The seeds were stored in a fridge at 5 ± 0.5 °C until the time of analysis. Before the analyses, a sufficient quantity of seeds was removed from the fridge and kept under room conditions for 2 hours. Seed mass was determined with a precision balance (±0.001).

Determination of Proximate Composition

The methods specified in AOAC (2000) were used to determine crude ash and oil content of the sunflower seeds. Samples were dried in an oven at 105 °C for 4 hours to get dry matter content. All proximate analyses were performed in three replications.

Determination of Fatty Acid Composition

Fatty acid composition of the sunflower seeds was determined with the use of fatty acid methyl esters (Sukhija and Palmquist 1988) in a gas chromatography device equipped with a flame ionization detector and a capillary column. Helium (He) was used as the carrier gas. Resultant peaks were compared to authentic standards to get fatty acid quantities.

Determination of Element Composition

Initially, 0.5 g ground sample was supplemented with 10 ml nitric + perchloric acid. Following the wet-digestion of the resultant mixture, diluted samples were subjected to element (B, Ca, Cd, Cu, Fe, K, Mg, Mn, Na, Ni, P, S, and Zn) analyses in an ICP-OES device (Mertens 2005a, 2005b).

Shape and Dimensions

A digital caliper (±0.01 mm) was used to measure seed length (L, mm), width (W, mm) and thickness (T, mm). Then, the volume (mm³), sphericity (φ, %), shape index (SI), roundness (R), elongation (E), aspect ratio (AR), geometric mean diameter (Dg, mm), projected area (PA, mm²) and surface area (S, mm²) of the samples were calculated with the equations given in Table 1.

Table 1 Equations used to calculate the size and shape properties of the sunflower oilseed varieties
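
The equations of Table 1 are not reproduced in this text. As a rough illustration only, the sketch below computes four of the listed descriptors from the three caliper measurements using definitions that are common in the seed-engineering literature. These formulas are assumptions and may differ from the ones the authors actually used; the function name and the seed dimensions are hypothetical.

```python
def shape_descriptors(length, width, thickness):
    """Return common size and shape descriptors for one seed (inputs in mm).

    Assumed definitions (not taken from Table 1 of the paper):
      geometric mean diameter Dg = (L * W * T) ** (1/3)
      sphericity (%)             = 100 * Dg / L
      aspect ratio               = W / L
      elongation                 = L / W
    """
    dg = (length * width * thickness) ** (1.0 / 3.0)
    return {
        "geometric_mean_diameter_mm": dg,
        "sphericity_pct": 100.0 * dg / length,
        "aspect_ratio": width / length,
        "elongation": length / width,
    }


# Hypothetical seed: 8 mm long, 4 mm wide, 2 mm thick.
seed = shape_descriptors(length=8.0, width=4.0, thickness=2.0)
```

Under these assumed definitions, elongation is the reciprocal of aspect ratio, which would be consistent with the negative relationship between the two reported in the results.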

Feature Selection and Validation Methodology

Many different techniques are available for feature selection. In this study, correlation-based feature selection (CFS) was used. CFS combines a search algorithm with Pearson correlations to assess feature subsets. It considers the usefulness of individual features for predicting the class label together with the level of intercorrelation among them (Hall 1999). The best-first search algorithm was used to select the best subset (Witten and Frank 2005). The CFS (CfsSubsetEval) algorithm was applied to the physical properties of sunflower seeds with the use of Weka® v3.8 software (Hall et al. 2009), and 7 features (length, width, thickness, mass, projected area, sphericity, and aspect ratio) were selected. In the study, 105 seeds of each of the six varieties (630 seeds in total) were measured for each feature. A total of 4410 values (630 seeds × 7 features) were used for classification after CFS attribute selection.
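
The trade-off that CFS makes between feature-class correlation and feature-feature correlation can be illustrated with the subset merit heuristic described by Hall (1999). The sketch below is not the Weka implementation; the function name and the correlation values are hypothetical.

```python
import math


def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """Merit of a subset of k features (heuristic from Hall 1999).

    merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff)
    r_cf: mean feature-class correlation, r_ff: mean feature-feature correlation.
    """
    numerator = k * mean_feature_class_corr
    denominator = math.sqrt(k + k * (k - 1) * mean_feature_feature_corr)
    return numerator / denominator


# Hypothetical subsets of 7 features with the same relevance (r_cf = 0.5):
weakly_redundant = cfs_merit(7, 0.5, 0.2)
highly_redundant = cfs_merit(7, 0.5, 0.8)
fully_redundant = cfs_merit(7, 0.5, 1.0)  # no gain over a single feature
```

A subset scores higher when its features correlate with the class but not with each other, which is why a search over subsets can discard redundant descriptors.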

Cross-validation evaluates the generalization ability of a model by measuring its performance on data not used during training. This method is applied practically and effectively in classification and prediction (Stegmayer et al. 2013). In the present study, the k-fold cross-validation method was used to validate the classification models. In machine learning applications, the k value is usually chosen as 5 or 10 (Ataş et al. 2012); it was set to 10 in the present study. The dataset was divided into 10 subsets so that every subset had an equal proportion of examples from each class. Training and testing were carried out in 10 iterations. In each iteration, 1 subset was used for testing and the remaining 9 subsets were used for training, so that each of the k subsets was used exactly once for testing. The average of the errors over all iterations was then taken as the error estimate of the classifier. Evaluation metrics were calculated for each algorithm from the confusion matrix, and the performance of the classification algorithms was compared by using these metrics (Witten and Frank 2005; Stegmayer et al. 2013).
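
The 10-fold procedure described above can be sketched as follows. This is a minimal stand-in written in plain Python, not the Weka routine used in the study; the function name, the random seed, and the way leftover seeds are distributed are assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0  # running counter so leftover samples are spread over all folds
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


# Six varieties with 105 seeds each, as in the study (630 seeds in total).
varieties = ["Transol", "Colombi", "Tunca", "63MM54", "LG5582", "P64LC53"]
labels = [variety for variety in varieties for _ in range(105)]
folds = stratified_folds(labels, k=10)

for test_fold in range(10):
    test_idx = folds[test_fold]
    train_idx = [i for f, fold in enumerate(folds) if f != test_fold for i in fold]
    # A classifier would be fitted on train_idx and scored on test_idx here;
    # the 10 error values would then be averaged.
```

Because 105 is not divisible by 10, each fold holds 10 or 11 seeds of every variety, so class proportions are nearly but not exactly equal.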

Classification Techniques

Machine learning classification was carried out in the present study with the Weka® v3.8 software (Hall et al. 2009). Six different machine learning classifiers were run on a laptop with a Core i5 central processing unit (CPU) at 2.5 GHz and 8 GB of memory. The classification of varieties was based on the main physical properties. Shape (sphericity, aspect ratio), size (length, width, thickness, projected area) and mass were used as the classification criteria because these properties were chosen by CFS attribute selection.

Decision Tree (DT)

Decision tree (DT), a supervised learning algorithm, is able to learn from specific data and develop a set of rules to classify or predict an attribute (Drazin and Montag 2012). The method depends on a “divide and conquer” strategy (Safavian and Landgrebe 1991). DTs are arranged so that a class is rejected at each layer of the tree. The last class remaining at the bottom of the tree is taken as the assigned class. Two different DT induction algorithms are used in classification, namely J48 (the Weka implementation of C4.5) and logistic model tree (LMT) (Quinlan 1993; Landwehr et al. 2005). In this study, the J48 (C4.5) decision tree was used; the number of leaves was 28 and the size of the tree was 55. The schematic illustration of the applied DT is presented in Fig. 1.

Random Forest (RF)

Random forest algorithms build several decision trees, each trained separately, and combine them for classification. Since each tree is a separate classifier in RF, this algorithm is an ensemble classifier (Breiman 2001). Contrary to DT, the class of a sample is decided by the majority vote of the ensemble of trees constructed by RF (Berhane et al. 2018). Since the bootstrap and ensemble scheme can overcome the overfitting problem inherited from DT, there is no pruning step in RF (Ali et al. 2012). In addition, RF generally has a high predictive accuracy and is robust against noise (Breiman 2001; Rodriguez-Galiano et al. 2012).
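
The two ingredients mentioned above, bootstrap sampling and majority voting, can be sketched in a few lines. This is an illustration only, not the Weka random forest; the function names and the example votes are hypothetical, and no trees are actually trained here.

```python
import random
from collections import Counter


def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement (training set for one tree)."""
    return rng.choices(range(n_samples), k=n_samples)


def majority_vote(tree_predictions):
    """Return the class predicted by most trees (first seen wins a tie)."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)
sample = bootstrap_sample(630, rng)  # indices used to grow one hypothetical tree

# Hypothetical predictions of five trees for a single seed.
votes = ["Tunca", "Colombi", "Tunca", "Transol", "Tunca"]
predicted = majority_vote(votes)
```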

Support Vector Machines (SVMs)

SVM comprises different learning algorithms used in classification and regression models. In the SVM method, the optimal separating boundary (hyperplane) is determined from the distribution of the training vectors in the data set. The vectors located close to the hyperplane are defined as support vectors. If the space cannot be separated linearly, a kernel function is used to examine the relationships between the investigated traits (Vapnik 2000).

Multiple Linear Regression (MLR)

MLR elucidates the linear relationships of a dependent variable with the independent variables of interest. MLR, which is one of the prediction methods, was applied to the data sets in the study. With the prediction techniques used in the present analyses (Witten and Frank 2005; Wu et al. 2008), a response variable (y) was estimated from a range of predictor variables with the use of Eq. 1 (Vereecken and Herbst 2004; Hastie et al. 2009):

$$ y=a+\sum \limits_{i=1}^{n}{b}_i{x}_i+\varepsilon $$

(1)

where a is an intercept, x_i is a predictor variable, b_i is a regression coefficient, and ε represents the error.

Naïve Bayes (NB)

This algorithm computes the prior probability of each class and the conditional probability of each attribute value within each class. The attributes are assumed to be independent of each other. Classification is done using the known probabilities of each class and the known probabilities of each attribute. The posterior probability of class C_i is proportional to (Eq. 2):

$$ P\left({C}_i\right)\times P\left({a}_1={x}_1\ \mathrm{and}\ {a}_2={x}_2\dots \mathrm{and}\ {a}_n={x}_n\mid {C}_i\right) $$

(2)

Assuming that the attributes are independent, the value of this expression can be reached by the following Eq. 3:

$$ P\left({C}_i\right)\times P\left({a}_1={x}_1\mid {C}_i\right)\times P\left({a}_2={x}_2\mid {C}_i\right)\times ..\dots \times P\left({a}_n={x}_n\mid {C}_i\right) $$

(3)

Thus, the product is calculated for each class (i = 1, 2, …) and the class with the highest value is selected (Bramer 2007).
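
Equation 3 and the selection rule can be illustrated with a toy example. The sketch below handles categorical attributes by simple frequency counts without smoothing; the data, class names, and function names are hypothetical, and the treatment of the continuous seed measurements in the study is not shown.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(class, attribute index)][value] = frequency
    value_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(label, j)][value] += 1
    return class_counts, value_counts


def classify(row, class_counts, value_counts):
    """Score each class with Eq. 3 and return the best class and all scores."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(C_i)
        for j, value in enumerate(row):
            score *= value_counts[(label, j)][value] / count  # P(a_j = x_j | C_i)
        scores[label] = score
    return max(scores, key=scores.get), scores


# Hypothetical training data: (length class, mass class) for two varieties.
rows = [("long", "heavy"), ("long", "heavy"), ("long", "light"),
        ("short", "light"), ("short", "light"), ("short", "heavy")]
labels = ["A", "A", "A", "B", "B", "B"]

model = train_naive_bayes(rows, labels)
best, scores = classify(("long", "heavy"), *model)
```

For the query seed, variety A scores 0.5 × 1 × 2/3 and variety B scores 0.5 × 0 × 1/3, so A is selected.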

Multilayer Perceptron (MLP)

ANN models are composed of interconnected processing elements similar to biological neurons and weighted connections similar to synapses in the brain (Karray and Silva 2004). Multilayer perceptron (MLP) is a commonly used ANN for classification purposes (Omid et al. 2010). MLPs consist of input layers, hidden layers, and output layers. MLP is a feed-forward neural network (FFNN); that is, data flow from the input layers toward the output layers in a single direction. In classification, the MLP model is trained with the back propagation algorithm (Karray and Silva 2004). MLP is commonly used in classification of agricultural products (Marini et al. 2004; Mollazade et al. 2012). In this study, a 7-6-6 MLP structure with a learning rate of 0.3, momentum of 0.2, 500 epochs and a sigmoid activation function was used for classification (Fig. 1).
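
To make the 7-6-6 structure concrete, the sketch below runs one forward pass through such a network and shows a generic momentum weight update with the reported learning rate and momentum. It is an illustration under assumptions, not the Weka implementation: the weights are random rather than trained, the input values are hypothetical, and the exact update rule used by the software is not stated in the text.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(n_in, n_out, rng):
    """One weight row per output neuron, plus one bias per neuron."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def forward(x, layers):
    """Propagate the input through every layer in a single direction."""
    activation = x
    for weights, biases in layers:
        activation = [
            sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation


def momentum_step(gradient, previous_delta, learning_rate=0.3, momentum=0.2):
    """Generic gradient descent update with momentum for a single weight."""
    return -learning_rate * gradient + momentum * previous_delta


rng = random.Random(0)
layers = [init_layer(7, 6, rng), init_layer(6, 6, rng)]  # 7 inputs, 6 hidden, 6 outputs
n_parameters = sum(len(w) * len(w[0]) + len(b) for w, b in layers)

features = [0.5] * 7  # hypothetical normalized values of the 7 selected features
outputs = forward(features, layers)
predicted_class = max(range(len(outputs)), key=outputs.__getitem__)
```

With biases included, a 7-6-6 network has (7 × 6 + 6) + (6 × 6 + 6) = 90 trainable parameters.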

Statistical Analyses and Variety Classification

One-factor analysis of variance was applied to the experimental data with the use of SPSS v20.0 (IBM SPSS® 2010) software and significant means were compared with the use of Tukey’s test (p<0.05). Canonical correlation was applied to examine the relationships among the investigated parameters. A scatter plot with canonical discrimination functions presented the group centroids of varieties. The principal components were assessed for multivariate tests (MANOVA). Hotelling’s pairwise comparisons were conducted with the use of PAST v3.20 software to see the differences in sunflower oilseed varieties (Hammer et al. 2001).

Model Performance Evaluation

Model performance was assessed through statistical indicators including classification accuracy (A_c), sensitivity (S_e), specificity (S_p), F-measure (F), precision (P), kappa statistic (κ), mean absolute error (MAE), and root mean square error (RMSE), which were calculated using Eqs. 4, 5, 6, 7, 8, 9, 10, and 11, respectively. The kappa statistic measures interrater reliability among the classifiers. A kappa value of −1 indicates “total disagreement,” 0 indicates “random classification,” and 1 indicates “perfect agreement” (Viera and Garrett 2005). The following equations are used for calculations (Parker 2001):

$$ {A}_c=\frac{TP+ TN}{TP+ FP+ TN+ FN}\times 100 $$

(4)

$$ {S}_e=\frac{TP}{TP+ FN}\times 100 $$

(5)

$$ {S}_p=\frac{TN}{TN+ FP}\times 100 $$

(6)

$$ F=\frac{2\times P\times {S}_e}{P+{S}_e} $$

(7)

$$ P=\frac{TP}{TP+ FP} $$

(8)

$$ \kappa =\frac{Prob\left(\alpha \right)- Prob(e)}{1- Prob(e)} $$

(9)

$$ MAE=\sum \limits_{i=1}^n\frac{\mid {E}_i-{M}_i\mid }{n} $$

(10)

$$ RMSE=\sqrt{\frac{\sum \limits_{i=1}^n{\left({E}_i-{M}_i\right)}^2}{n}} $$

(11)

where n = number of data instances, M_i = measured target value, E_i = predicted target value, Prob(α) = total agreement probability, Prob(e) = hypothetical probability of chance agreement, TP = number of true positives, FP = number of false positives, TN = number of true negatives, and FN = number of false negatives. The accuracy was assessed through goodness of the prediction (Colton 1974). Correlation coefficients between 0 and 0.25 indicate little or no relationship, between 0.25 and 0.50 a fair degree of relationship, between 0.50 and 0.75 a moderate to good relationship, and between 0.75 and 1.0 a very good to excellent relationship.
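
Equations 4 to 11 can be checked on a small confusion matrix. The sketch below is an illustration with hypothetical numbers, not the study data. It assumes that rows are actual classes and columns are predicted classes, that overall accuracy for a multi-class matrix is the share of diagonal entries, and that the F-measure uses sensitivity as a fraction rather than a percentage.

```python
import math


def overall_accuracy(cm):
    """Percentage of correctly classified instances (diagonal of the matrix)."""
    total = sum(sum(row) for row in cm)
    return 100.0 * sum(cm[i][i] for i in range(len(cm))) / total


def one_vs_rest(cm, c):
    """Sensitivity, specificity, precision and F-measure for class c."""
    total = sum(sum(row) for row in cm)
    tp = cm[c][c]
    fn = sum(cm[c]) - tp                 # actual c, predicted as another class
    fp = sum(row[c] for row in cm) - tp  # another class, predicted as c
    tn = total - tp - fn - fp
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "sensitivity_pct": 100.0 * recall,
        "specificity_pct": 100.0 * tn / (tn + fp),
        "precision": precision,
        "f_measure": 2 * precision * recall / (precision + recall),
    }


def kappa(cm):
    """Cohen's kappa from observed and chance agreement."""
    total = sum(sum(row) for row in cm)
    observed = sum(cm[i][i] for i in range(len(cm))) / total
    chance = sum(
        sum(cm[i]) * sum(row[i] for row in cm) for i in range(len(cm))
    ) / total ** 2
    return (observed - chance) / (1.0 - chance)


def mae(predicted, measured):
    return sum(abs(e - m) for e, m in zip(predicted, measured)) / len(measured)


def rmse(predicted, measured):
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(predicted, measured)) / len(measured))


# Hypothetical two-class confusion matrix (rows: actual, columns: predicted).
cm = [[40, 10],
      [5, 45]]
accuracy = overall_accuracy(cm)
class0 = one_vs_rest(cm, 0)
agreement = kappa(cm)
```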

Relative operating characteristic (ROC) curves can be used to visualize the tradeoff between the rates of correct classification of positive and negative samples. A ROC curve is a graphical plot of the true positive rate, also known as sensitivity, versus the false positive rate, or one minus the specificity, as the discrimination threshold is varied. Each point on the ROC curve represents a classifier with a particular tradeoff between sensitivity and specificity. The information expressed by the ROC curve must be collapsed into a single response variable to compare the output of multiple classification schemes with statistical instruments (Pietersma et al. 2003; Stegmayer et al. 2013). For this reason, the area under the entire ROC curve (AUC) was suggested as an acceptable performance index (Bradley 1997): it lies between 0 and 1 and makes it easier to compare classifiers with each other. A higher AUC value indicates a better predictive capability of a model. When AUC is close to 1, most of the positive class samples have been given a higher score than any sample from the other classes, which means that there is a threshold that separates the classes almost perfectly.
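
The interpretation of AUC given above, that positive samples tend to receive higher scores than negative ones, can be computed directly by comparing every positive score with every negative score. The sketch below uses this pairwise formulation; the function name and the scores are hypothetical.

```python
def auc(positive_scores, negative_scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores for one variety against all others.
partial = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])  # 8 of 9 pairs ranked correctly
perfect = auc([0.9, 0.8], [0.1, 0.2])            # a threshold separates the classes
```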

Results and Discussion

Six sunflower varieties (105 single seeds each) were evaluated in terms of fatty acid and mineral composition, biochemical characteristics, and physical attributes, and the experimental findings are provided in the subsequent sections.

Biochemical Attributes

The proximate compositions of sunflower seeds are given in Table 2. The ash content of sunflower seeds ranged between 3.20 and 4.80%. The greatest ash content was obtained from Transol variety and the least was obtained from Tunca variety. Ash content of sunflower seeds was reported as 4.82–4.87% (Ingale and Shrivastava 2011). Present findings on ash content comply with the results of Gupta and Shrivastava (2004). Seed oil contents ranged from 39.9 to 48.1% with the greatest value in Tunca and the smallest value in Colombi varieties. Oil content of sunflower seeds was reported as between 38.13 and 50.20% (Ekin et al. 2005). Differences in oil contents were mostly attributed to environmental conditions, genotypes and cultural practices.

Table 2 Proximate composition of sunflower oilseed varieties

The fatty acid composition of sunflower seeds is given in Table 3. Significant differences were observed in crude oil content and fatty acid composition of sunflower varieties. Among the 10 fatty acids identified in sunflower seeds, linoleic acid (C18:2 n6c) was the major fatty acid, followed by oleic acid (C18:1 n9c) and palmitic acid (C16:0). Saturated fatty acids are responsible for the cholesterol level in humans; however, the effect depends on the carbon-chain length and bioavailability (Webb and Sanders 1991; Cater and Garg 1997). Accordingly, behenic (C22:0) and arachidic (C20:0) acids are less responsible than myristic (C14:0), palmitic (C16:0) and stearic (C18:0) acids. In the present study, behenic (C22:0) and arachidic (C20:0) acids were the minor fatty acids in all sunflower varieties. The highest myristic acid (C14:0) content was determined in P64LC53 and Tunca varieties (0.07%). The highest behenic (C22:0) and arachidic (C20:0) acid contents were observed in 63MM54 variety. Similar findings were reported by Izquierdo and Aguirrezábal (2008). Palmitic and stearic acids ranged between 3.58–5.91% and 2.66–3.49%, respectively. In a previous study, similar palmitic acid (C16:0) and higher stearic acid (C18:0) contents were reported (Alberio et al. 2016).

For dietary purposes, a higher content of mono-unsaturated fatty acids, especially oleic acid, is an important factor for human health (Esmaeili et al. 2012; Salimon and Farhan 2012). In recent years, many studies have also been carried out to increase the oleic acid content of seeds through genetic improvement. Present oleic acid (C18:1 n9c) contents varied between 22.3 and 26.5%. The oleic acid content of the studied materials is very low as compared to high oleic acid varieties. Present findings on oleic acid contents are consistent with the results of Li et al. (2017). Angeloni et al. (2017) reported that oleic acid contents varied with genotype and growing season. Werteker et al. (2010) reported highly variable oleic acid contents in sunflower seeds as compared to rapeseed and soybean seeds. Present linoleic acid (C18:2 n6c) contents ranged between 66.00 and 67.43%. Such findings comply with the results of Li et al. (2017). Linoleic acid is an essential fatty acid and plays an important role in various physiological functions of the human body, especially in the prevention of cardiovascular diseases (Connor 2000; Arts et al. 2001). Present linolenic acid (C18:3 n3) contents ranged between 0.04 and 0.10%. In a previous study, linolenic acid (C18:3 n3) contents were reported as between 0 and 3.9% (Werteker et al. 2010). Konuskan et al. (2017) indicated significant effects of variety, ecological and growing conditions and agronomic practices on fatty acid composition of oilseeds.

Table 3 Fatty acid composition (%) of different sunflower oilseed varieties

The element composition of the sunflower seed samples is given in Table 4. As can be seen from Table 4, 13 different elements were detected in the seed samples and differences in micro element composition of the varieties were found to be significant (p<0.05). Phosphorus (P), potassium (K), and sulfur (S) were the major elements with average concentrations of 6225.4, 5487.7, and 2454.2.6 mg/kg, respectively. The greatest K content was obtained from P64LC53 (8507 mg/kg) variety and the lowest from Transol (3675.7 mg/kg) variety. The greatest magnesium content was detected in P64LC53 and the least in Colombi variety. The lowest element concentrations in all sunflower varieties were observed for cadmium, which ranged between 0.04 and 1.63 mg/kg. Following this mineral, trace quantities of Ni, Ba, Cu, Fe, Mn and Zn were observed (ranging between 6.01 and 50.87 mg/kg). Kirbaslar et al. (2012) reported the mineral composition of some seeds including sunflower and stated that potassium was the major element (3094.1 mg/kg) and that Cu, Zn and Fe were observed in trace quantities (10.13, 28.30 and 38.65 mg/kg, respectively). Ingale and Shrivastava (2011) reported Ca and P contents of two sunflower genotypes as between 1070 and 1500 mg/kg and between 3900 and 4000 mg/kg, respectively. Among the present samples, 63MM54 and P64LC53 were found to be prominent for mineral composition. These two genotypes showed significantly higher concentrations for the majority of the major and minor elements (Table 4).

Table 4 Mineral composition of different sunflower oilseed varieties (mg/kg)

Shape and Dimensional Attributes

The gravitational, dimensional and area attributes of the varieties are provided in Table 5. There were significant differences in physical attributes of the sunflower varieties (p<0.01). Seed mass is an appropriate physical property to determine the quality of seed. The greatest seed weights were obtained from Colombi and LG5582 varieties and the least from 63MM54, P64LC53 and Tunca varieties. Similar to the present findings, Munder et al. (2017) reported seed weights of a hybrid sunflower variety (PR65H22) as between 0.024 and 0.108. Demir et al. (2018) indicated projected area as an important classification criterion. Projected areas of the present sunflower seed samples varied between 27.37 and 41.46 mm² with the greatest value from Colombi and the least from Tunca varieties. Mirzabe et al. (2012) reported projected area of Sirena sunflower variety as between 27.38 and 69.82 mm². The greatest surface areas were obtained from LG5582 and Colombi varieties (165.82 and 174.03 mm², respectively). However, Ortiz-Hernandez et al. (2020) reported greater surface areas for P64H41 precocious hybrid sunflower seeds than the present values. The greatest volume and geometric mean diameter values were determined in Colombi variety as 201.50 mm³ and 7.26 mm, respectively (p<0.05). Malik and Saini (2016) reported volume and equivalent (geometric mean) diameter values of PSH-996 variety at five different moisture contents (7.60–25.00% w.b.) as between 192.61 and 262.77 mm³ and between 7.15 and 7.93 mm, respectively, and those values were greater than the present ones. Gupta and Das (1997) reported geometric mean diameter of sunflower seeds as 4.72 mm. The greatest lengths and widths were obtained from Colombi and 63MM54 varieties. Santalla and Mascheroni (2003) reported seed length and width as 11.526 and 5.008 mm, respectively. Khodabakhshian et al. (2010) investigated the physical properties of Shahroodi variety and reported the thickness values as between 3.88 and 4.94 mm.

Table 5 Gravitational, dimensional and area attributes for sunflower oilseed varieties

The dimension and shape attributes are important for the design of sunflower dehullers and separators. Sphericity, shape and aspect ratio of the sunflower varieties are provided in Table 6. None of the sunflower varieties was close to spherical. The highest sphericity values were determined in 63MM54 and P64LC53 varieties as 60.82 and 60.36%, respectively. Gupta and Das (1997) reported average sphericity of Morden variety as 57%. Increasing sphericity values were observed with decreasing shape index values. All varieties with a shape index greater than 1.25 were described as oval. Decreasing roundness and sphericity values were observed with increasing shape index values. Roundness values close to unity indicate an almost circular seed shape. The average roundness values of the sunflower seeds were close to each other and varied between 0.27 and 0.37. According to Table 6, the highest aspect ratio was observed in 63MM54 variety and the greatest elongation values were found in Tunca (2.10) and LG5582 (2.14) varieties. Jafari et al. (2011) reported average elongation value of Shamshiri variety as 3.26. Elongation and aspect ratio were negatively correlated. Increasing aspect ratios were observed with decreasing sphericity.

Table 6 Sphericity, shape and aspect ratio attributes for sunflower oilseed varieties

Discrimination of the Sunflower Varieties

Results of canonical discriminant functions are provided in Table 7. The higher the eigenvalue, the more of the variation in the dependent variable the function explains. The effect size of a function depends on the square of its canonical correlation. The first three functions explained 93.3% of the total variation in physical attributes of the varieties. Since the number of results for biochemical attributes was insufficient, they were not handled by this method. Wilks’ lambda generally gives the best prediction model and indicates the importance of each predictor variable. In this work, canonical correlations for the 5 functions were significant (p<0.01). According to the Wilks’ lambda statistic, only 13% of the difference was not explained.

Table 7 Discriminant analysis results for the studied parameters of sunflower oilseed varieties

The standardized canonical discriminant function coefficients present the relative importance of 12 predictors for both physical and chemical attributes. In function 1, the highest loadings were observed for thickness and geometric mean diameter. In function 2, the greatest loadings were observed for thickness and aspect ratio. Additionally, for function 1, palmitic and oleic acid, for the function 2, arachidic and eicosenoic acids were the strongest predictors for the biochemical attributes.

Figure 2 shows the centroids of the 6 sunflower oilseed varieties based on their canonical discriminant functions. With respect to the differences between principal components, thickness, geometric mean diameter and aspect ratio were considered important distinguishing features. Thickness was identified as the most important feature for discrimination of varieties. The 63MM54, P64LC53 and LG5582 varieties, which are similar in many attributes, were located on the right of the canonical function 1 axis. Since Colombi and Transol are particularly similar in shape, they were located close to each other. Tunca, which has the lowest values in terms of most attributes, is clearly distinguished from the other varieties and located at the bottom of the canonical function 2 axis. With respect to aspect ratio, 63MM54 and P64LC53 varieties constituted a separate group.

Multivariate Tests (MANOVA) and Pairwise Comparison

The results of the MANOVA test, Mahalanobis distances and Bonferroni-corrected pairwise comparisons are given in Table 8. According to Wilks’ lambda and Pillai Trace statistics, differences among the varieties in terms of physical attributes were significant (p<0.01). The Pillai Trace statistic, considered to be the most reliable among multivariate evaluations, considers the sum of the variance that explains the greatest discrimination of independent variables in dependent variables. Wilks’ lambda expresses the percentage of variance in the dependent variables that is accounted for by differences in the independent variables. The smaller the Wilks’ lambda statistic, the greater the differences among the groups to be analyzed (Sayıncı et al. 2015). In most cases, varieties with a Mahalanobis distance lower than 3 showed statistically similar physical attributes (p>0.05). According to Mahalanobis distances among the sunflower varieties, the closest distances were observed between Colombi and Transol varieties, and between 63MM54 and P64LC53 varieties, while the distances between Tunca and all other varieties were the greatest. Confusion matrix results are presented in Table 8. Discriminant analysis was able to classify sunflower varieties with 77.94% accuracy. The highest rates of correct classification were obtained for Tunca and P64LC53 varieties. However, among all varieties, Colombi and 63MM54 had the lowest rates of correct classification.

Table 8 Differences among the sunflower oilseed based on seed outlines for physical attributes

Classification Results

Performance comparisons of the six classification algorithms (DT, RF, SVM, MLR, NB and MLP), based on size, shape, area and mass attributes, are given in Table 9. The confusion matrices of the classifiers are shown in Table 10. Classification accuracies ranged from 74.44 to 80.16%, and the RF algorithm had the greatest A_c and the lowest RMSE (0.218) compared with the other algorithms. The DT, with a tree size of 55 and 28 leaves, yielded a busy and complicated tree. Mass was identified as an important classification node in the structure of the trees. For this reason, if an automatic sorting system is developed based on the classifiers, the mass attribute can be used as a main criterion for product quality assessment (Banakar et al. 2017). The DT classification accuracy was 75.71%, the third lowest after NB (74.44%) and MLR (75.24%). Similar to the present findings, Zielinska et al. (2012) reported that the engineering properties-based classification accuracy of four different red clover seeds ranged from 41.0 to 78.5% for Naïve Bayes and from 52.8 to 85.4% for decision tree algorithms. Omid (2011) reported correct classification rates and RMSEs of 99.52% and 0.07 for the training set and 95.56% and 0.21 for the test set, respectively (Table 9).
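Error measures such as MAE and RMSE can be reported for a classifier when they are computed on predicted class probabilities rather than on hard labels. The sketch below assumes one common convention, in which each predicted probability is compared with a 0/1 indicator of the true class and the errors are averaged over all seeds and classes. The exact definition used in this study is the one given in its methods section and may differ. The function name, class labels and probabilities are hypothetical.

```python
import math

def probability_errors(prob_rows, true_labels, classes):
    """Return (MAE, RMSE) of class-probability estimates against 0/1 targets."""
    abs_sum = 0.0
    sq_sum = 0.0
    count = 0
    for probs, label in zip(prob_rows, true_labels):
        for cls in classes:
            target = 1.0 if cls == label else 0.0
            err = probs[cls] - target
            abs_sum += abs(err)
            sq_sum += err * err
            count += 1
    return abs_sum / count, math.sqrt(sq_sum / count)

# Hypothetical two-seed, three-class example (values are made up).
classes = ["A", "B", "C"]
probs = [
    {"A": 0.8, "B": 0.1, "C": 0.1},  # highest probability on the true class
    {"A": 0.4, "B": 0.5, "C": 0.1},  # highest probability on a wrong class
]
mae, rmse = probability_errors(probs, ["A", "A"], classes)
```

Under this convention the error measures depend on how confident the probability estimates are, so a classifier can rank differently on MAE or RMSE than it does on accuracy.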

Table 9 Performance results of machine learning classifiers

Table 10 Confusion matrix of classifiers for sunflower oilseed varieties (10-fold cross-validation)

Four main types of kernel function (polynomial, normalized polynomial, radial basis and Pearson VII function) are available for selecting the best SVM. In this study, the appropriate kernel function for SVM was the Pearson VII function. The classification matrix of SVM showed that 502 of the 630 sunflower seeds were classified correctly (Table 10), giving an accuracy of 79.68%. However, SVM had the highest MAE and RMSE values. Soltani and Omid (2015) reported an accuracy of 0.891 for an SVM with the Pearson VII function-based kernel. Mollazade et al. (2012) obtained the greatest r and the lowest RMSE from SVMs with polynomial and Pearson VII function-based kernels.
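For readers unfamiliar with the Pearson VII function-based kernel, the following is a minimal sketch of its usual form, together with the accuracy arithmetic quoted above. The parameters omega and sigma control the shape and width of the kernel; the values of 1.0 used here are illustrative assumptions, since the kernel parameters used in this study are not stated in this section. The function name is hypothetical.

```python
import math

def puk_kernel(x, y, omega=1.0, sigma=1.0):
    """Pearson VII function-based kernel between two feature vectors."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    scaled = 2.0 * dist * math.sqrt(2.0 ** (1.0 / omega) - 1.0) / sigma
    return 1.0 / (1.0 + scaled ** 2) ** omega

# Identical vectors give the maximum similarity of 1.0.
same = puk_kernel([1.0, 2.0], [1.0, 2.0])

# With omega = sigma = 1 the kernel reduces to 1 / (1 + 4 * dist^2).
half = puk_kernel([0.0], [0.5])

# Accuracy check for the SVM result reported in the text.
svm_accuracy = round(502 / 630 * 100, 2)
```

The kernel value falls from 1 toward 0 as the distance between two seeds in feature space grows, much like a radial basis kernel but with an adjustable tail.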

A multilayer perceptron (MLP) neural network was applied to analyze the size, shape, and mass of the sunflower seeds. Seven physical attributes were used as inputs, with six neurons in the hidden layer and six outputs (i.e., a 7-6-6 structure). The accuracy, Kappa, sensitivity, specificity, precision and F-measure values of the MLP classifier were 78.89%, 0.75, 0.789, 0.958, 0.790, and 0.789, respectively. Kavdır and Guyer (2008) applied artificial neural networks, MLP and DT classifiers to apples and reported the greatest classification accuracy for MLP (83.33%), followed by DT (75.56%). Gumus et al. (2018) reported that the accuracy of classifying six olive oils based on biochemical attributes ranged from 81.63 to 85.71%. The authors stated that the ANN results correlated well with the PCA results. Zielinska et al. (2012) used a multilayer perceptron and reported varietal classification accuracies between 57.6 and 89.6%. These findings support the present ones.
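The performance measures listed above can all be derived from a confusion matrix. The sketch below uses the standard one-vs-rest definitions and assumes that rows hold the true variety and columns the predicted variety. The 3x3 matrix is made up for illustration and is not taken from Table 10, and the function names are hypothetical.

```python
def accuracy(matrix):
    """Share of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total

def per_class_metrics(matrix):
    """One-vs-rest metrics per class; matrix[i][j] = true class i predicted as j."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    results = []
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp
        fp = sum(matrix[i][k] for i in range(n)) - tp
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + sensitivity
        f_measure = 2 * precision * sensitivity / denom if denom else 0.0
        results.append({
            "sensitivity": sensitivity,
            "specificity": specificity,
            "precision": precision,
            "f_measure": f_measure,
        })
    return results

# Hypothetical confusion matrix for three classes with 10 samples each.
example = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
overall = accuracy(example)
metrics = per_class_metrics(example)
```

A single summary value per classifier, like those quoted in the text, is then obtained by averaging the per-class values, optionally weighted by class size.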

The NB classifier had the lowest accuracy, kappa statistic, sensitivity, specificity and precision (74.44%, 0.69, 0.744, 0.949 and 0.745, respectively). Nevertheless, NB also had the lowest MAE value. The present findings agree with the results of Gumus et al. (2019), who reported accuracies of 79.59% for Naïve Bayes, 79.59% for decision tree, and 83.67% for support vector machine. In contrast, the accuracy and kappa statistic values of Naïve Bayes reported by Siedliska et al. (2017) for fresh cherries (87.6% and 0.724) were greater than those of the present work (74.44% and 0.690), whereas their random forest accuracy (80.3%) was similar to the present one. Barbosa et al. (2015) also reported a greater accuracy (95.4%) for NB and RF than the present study.
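The relation between accuracy and the kappa statistic can be checked with a short calculation. The sketch below gives the general Cohen's kappa from a confusion matrix and a shortcut that holds when every true class has the same number of samples, in which case the chance agreement equals one divided by the number of classes. Equal group sizes (630 seeds over six varieties) are an assumption made here for illustration. The function names and the example matrix are hypothetical.

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a confusion matrix (rows = true, columns = predicted)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[k][k] for k in range(n)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(matrix[k]) * sum(matrix[i][k] for i in range(n)) for k in range(n)
    ) / total ** 2
    return (observed - expected) / (1.0 - expected)

def kappa_balanced(accuracy, n_classes):
    """Kappa when every true class has the same number of samples."""
    expected = 1.0 / n_classes
    return (accuracy - expected) / (1.0 - expected)

# Made-up balanced confusion matrix, used only to compare the two functions.
example = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 2, 8],
]
kappa_general = cohen_kappa(example)
kappa_shortcut = kappa_balanced(23 / 30, 3)

# Accuracies quoted in the text, under the equal-group-size assumption.
kappa_nb = round(kappa_balanced(0.7444, 6), 2)
kappa_mlp = round(kappa_balanced(0.7889, 6), 2)
```

Under that assumption the shortcut gives 0.69 for NB and 0.75 for MLP, which is consistent with the kappa values reported above.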

MLR is a simple model used for estimation in agricultural practices (Abdipour et al. 2019). In this study, the minimum F-measure value was obtained for MLR, and the calculated values of A_c, S_e, and S_p were 75.24%, 0.752, and 0.950, respectively. Among all classifiers, MLR was the best classifier for the LG5582, 63MM54 and Tunca varieties. Similarly, Abdipour et al. (2019) used ANN and MLR and indicated that MLR was less capable of predicting seed yield in both training (R²=0.710, RMSE=0.064, and MAE=0.052) and testing (R²=0.686, RMSE=0.071, and MAE=0.054). The present findings agree with the results of Niazian et al. (2018), who reported the ability of the MLR model to predict seed yield in the training (R²=0.81) and testing (R²=0.79) stages. The authors also indicated that the performance of ANN was better than that of MLR.

The accuracy of the predictive models is presented in the ROC curves. As seen in Fig. 3, the receiver operating characteristic (ROC) curve, a graphical representation of the effectiveness of a predictive model, confirmed that the classifiers accurately identified the varieties. As expected, the highest ROC area values were obtained from RF, MLP, and SVM. In particular, the ROC area values indicate quite a high performance for automatic identification of the varieties under study, as the values obtained were higher than 0.84 in all cases.
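For a multi-class problem such as this one, a ROC area is usually computed per variety in a one-vs-rest fashion. The sketch below is a minimal illustration that uses the rank-based equivalence between the ROC area and the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. It is not the routine used by the authors, and the function name, scores and labels are hypothetical.

```python
def auc_one_vs_rest(scores, labels, positive):
    """ROC area for one class against all others via pairwise comparisons."""
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Made-up scores for class "A" on four seeds.
labels = ["A", "A", "B", "B"]
perfect = auc_one_vs_rest([0.9, 0.8, 0.3, 0.2], labels, "A")
partial = auc_one_vs_rest([0.9, 0.4, 0.6, 0.2], labels, "A")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the ROC areas above 0.84 reported here.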

Conclusion

The present study focused on the fatty acid and mineral profiles and the physical attributes of six different sunflower oilseed varieties, and detailed discrimination and classification were performed based on the physical attributes. In general, the greatest shape and dimension values were observed in the Colombi and LG5582 varieties, while the highest mineral and fatty acid composition values were observed in the 63MM54 and P64LC53 varieties. In terms of physical attributes, excluding shape index and elongation, the Tunca variety had the lowest values. The first three functions explained 93.3% of the total variation in the physical attributes of the varieties. According to the Wilks’ lambda statistic, the unexplained part of the similarities or dissimilarities among the groups was 13%. The Colombi and Transol varieties, which had the lowest Mahalanobis distance, had the closest attributes. Before classification, feature selection was applied to the physical attributes, and projected area, length, sphericity, width, thickness, aspect ratio and mass were identified as the best features by the CFS-Best First method. Among the six classifiers (DT, RF, SVM, MLR, NB and MLP), RF and SVM (with the Pearson VII kernel function) yielded the better outcomes. MLP with a 7-6-6 topology also classified the sunflower varieties quite well.

Correct classification of sunflower oilseeds that meet specific requirements is important in the food and agricultural industries. By providing a correctly classified final product, the proposed algorithms could help food and seed companies (Kurtulmuş 2020). By using the algorithms and techniques suggested in the study, a separator or sorter could be designed, and the actual classification accuracy of the system could be determined during the process. Cleaning, dehulling, packaging, and oil expelling apparatuses are generally designed based on the physical features of the seeds. Such features are also utilized in the design of conveyor systems used in product transportation. In addition, the specified features can be helpful in selection, breeding and quality assessment, and in the chemical and pharmaceutical industries (Çetin et al. 2020). This research showed that the classification of sunflower seeds with machine learning algorithms is feasible. The results of the tests were encouraging in that the proposed approach could classify related sunflower oilseed varieties. These results should also inspire future studies on seed classification with machine learning to use more datasets, features, and algorithms.

References

Abdipour M, Younessi-Hmazekhanlu M, Ramazani SHR (2019) Artificial neural networks and multiple linear regression as potential methods for modeling seed yield of safflower (Carthamus tinctorius L.). Ind Crop Prod 127:185–194. https://doi.org/10.1016/j.indcrop.2018.10.050
Adeleke BS, Babalola OO (2020) Oilseed crop sunflower (Helianthus annuus) as a source of food: nutritional and health benefits. Food Sci Nutr 8:4666–4684. https://doi.org/10.1002/fsn3.1783
Alberio C, Izquierdo NG, Galella T, Zuil S, Reid R, Zambelli A, Aguirrezábal LA (2016) A new sunflower high oleic mutation confers stable oil grain fatty acid composition across environments. Eur J Agron 73:25–33. https://doi.org/10.1016/j.eja.2015.10.003
Ali J, Khan R, Ahmad N, Maqsood I (2012) Random forests and decision trees. Int J Comput Sci 9:272–278
Angeloni P, Echarte MM, Irujo GP, Izquierdo N, Aguirrezábal L (2017) Fatty acid composition of high oleic sunflower hybrids in a changing environment. Field Crop Res 202:146–157. https://doi.org/10.1016/j.fcr.2016.04.005
AOAC (2000) Official methods of analysis of AOAC international, 17th edn. AOAC International, Gaithersburg
Arts MT, Ackman RG, Holub BJ (2001) Essential fatty acids in aquatic ecosystems: a crucial link between diet and human health and evolution. Can J Fish Aquat Sci 58(1):122–137. https://doi.org/10.1139/f00-224
Ataş M, Yardimci Y, Temizel A (2012) A new approach to aflatoxin detection in chili pepper by machine vision. Comput Electron Agric 87:129–141
Avni T, Anupriya S, Rai P, Maan K, Naryansamy CCN (2016) Effects of heating and storage on nutritional value of sunflower oil. DU J Undergr Res Innov 2(1):196–202
Badouin H, Gouzy J, Grassa CJ, Murat F, Staton SE, Cottret L et al (2017) The sunflower genome provides insights into oil metabolism, flowering and Asterid evolution. Nature 546(7656):148–152. https://doi.org/10.1038/nature22380
Banakar A, Zareiforoush H, Baigvand M, Montazeri M, Khodaei J, Behroozi-Khazaei N (2017) Combined application of decision tree and fuzzy logic techniques for intelligent grading of dried figs. J Food Process Eng 40(3):1–12. https://doi.org/10.1111/jfpe.12456
Barbosa RM, Batista BL, Barião CV, Varrique RM, Coelho VA, Campiglia AD, Barbosa F Jr (2015) A simple and practical control of the authenticity of organic sugarcane samples based on the use of machine-learning algorithms and trace elements determination by inductively coupled plasma mass spectrometry. Food Chem 184:154–159. https://doi.org/10.1016/j.foodchem.2015.02.146
Berhane T, Lane C, Wu Q, Autrey B, Anenkhonov O, Chepinoga V, Liu H (2018) Decision-tree, rule-based, and random forest classification of high-resolution multispectral imagery for wetland mapping and inventory. Remote Sens 10:580. https://doi.org/10.3390/rs10040580
Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recogn 30(7):1145–1159
Bramer M (2007) Principles of data mining, vol 180. Springer, London. https://doi.org/10.1007/978-1-4471-7307-6
Breiman L (2001) Random forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324
Cater NB, Garg A (1997) Serum low-density lipoprotein cholesterol response to modification of saturated fat intake: recent insights. Curr Opin Lipidol 8(6):332–336. https://doi.org/10.1097/00041433-199712000-00002
Çetin N, Yaman M, Karaman K, Demir B (2020) Determination of some physicomechanical and biochemical parameters of hazelnut (Corylus avellana L.) cultivars. Turk J Agric For 44(5):439–450
Colton T (1974) Statistics in Medicine. Little Brown and Co, New York, p 179
Connor WE (2000) Importance of n− 3 fatty acids in health and disease. Am J Clin Nutr 71(1):171–175. https://doi.org/10.1093/ajcn/71.1.171S
Costa C, Antonucci F, Pallottino F, Aguzzi J, Sun DW, Menesatti P (2011) Shape analysis of agricultural products: a review of recent research advances and potential application to computer vision. Food Bioprocess Technol 4:673–692. https://doi.org/10.1007/s11947-011-0556-0
Demir İ, Basalma D (2018) Response of different level of nitrogen and sulphur doses on oil yield and seed nutrients content of sunflower (Helianthus annuus L.). Fresenius Environ Bull 27(9):6337–6342
Demir B, Sayıncı B, Çetin N, Yaman M, Çömlek R, Aydın Y, Sütyemez M (2018) Elliptic Fourier based analysis and multivariate approaches for size and shape distinctions of walnut (Juglans regia L.) cultivars. Grasas Aceites 69(4):e271. https://doi.org/10.3989/gya.0104181
Drazin S, Montag M (2012) Decision tree analysis using Weka. Machine Learning-Project II, University of Miami, pp 1–3
Ekin Z, Tuncturk M, Yilmaz I (2005) Evaluation of seed, oil yields and yield properties of different sunflower (Helianthus annus L.) hybrid varieties in Van, Turkey. Pak J Biol Sci 8:683–686. https://doi.org/10.3923/pjbs.2005.683.686
Esmaeili A, Shaykhmoradi F, Naseri R (2012) Comparison of oil content and fatty acid composition of native olive genotypes in different region of Liam. Iran Intl J Agric Crop Sci 4(8):434–438
Fıratlıgil-Durmus E, Sárka E, Bubník Z, Schejbal M, Kadlec P (2010) Size properties of legume seeds of different varieties using image analysis. J Food Eng 99(4):445–451. https://doi.org/10.1016/j.jfoodeng.2009.08.005
Gumus ZP, Ertas H, Yasar E, Gumus O (2018) Classification of olive oils using chromatography, principal component analysis and artificial neural network modelling. J Food Meas Character 12(2):1325–1333. https://doi.org/10.1007/s11694-018-9746-z
Gumus O, Yasar E, Gumus ZP, Ertas H (2019) Comparison of different classification algorithms to identify geographic origins of olive oils. J Food Sci Technol 57:1–9. https://doi.org/10.1007/s13197-019-04189-4
Gupta RK, Das SK (1997) Physical properties of sunflower seed. J Agric Eng Res 66(1):1–8. https://doi.org/10.1006/jaer.1996.0111
Gupta M, Shrivastava SK (2004) Proximate composition of some new varieties of oil seeds. Int J Chem Sci 2(3):375–378
Gupta RK, Arora G, Sharma R (2007) Aerodynamic properties of sunflower seed (Helianthus annuus L.). J Food Eng 79(3):899–904. https://doi.org/10.1016/j.jfoodeng.2006.03.010
Hall MA (1999) Correlation-based feature selection for machine learning. PhD Thesis, University of Waikato, Hamilton
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explorations. Available online: http://www.cs.waikato.ac.nz. Accessed 12 April 2019
Hammer Ø, Harper DAT, Ryan PD (2001) PAST: Paleontological statistics software package for education and data analysis. Palaeontol Electron 4(1):9
Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference and prediction, 2nd edn. Springer-Verlag, New York. https://doi.org/10.1007/978-0-387-84858-7
Ingale S, Shrivastava SK (2011) Chemical studies of new varieties of sunflower (Helianthus annuus) LSF-11 and LSF-8 seeds. Agric Biol J N Am 2(8):1171–1181
Izquierdo NG, Aguirrezábal LAN (2008) Genetic variability in the response of fatty acid composition to minimum night temperature during grain filling in sunflower. Field Crop Res 106(2):116–125. https://doi.org/10.1016/j.fcr.2007.10.016
Jafari S, Khazaei J, Arabhosseini A, Massah J, Khoshtaghaza MH (2011) Study on mechanical properties of sunflower seeds. Food Sci Technol 14(1):6
Karray FO, Silva CD (2004) Soft computing and intelligent systems design: theory, tools and applications. Addison Wesley Pearson Press, New York
Kavdır I, Guyer DE (2008) Evaluation of different pattern recognition techniques for apple sorting. Biosyst Eng 99(2):211–219. https://doi.org/10.1016/j.biosystemseng.2007.09.019
Kays SJ (1999) Preharvest factors affecting appearance. Postharvest Biol Technol 15:233–247. https://doi.org/10.1016/S0925-5214(98)00088-X
Khazaei J, Jafari S, Noorolah S (2008) Lognormal vs. normal and Weibull distributions for modeling the mass and size distributions of sunflower seeds and kernels. World Conference on Agricultural Information and IT
Khodabakhshian R, Emadi B, Abbaspour Fard MH (2009) Aerodynamic properties of sunflower seed, kernel and its hull affected by moisture content and size, azargol variety as a case study. International Agricultural Engineering Conference, Bangkok
Khodabakhshian R, Emadi B, Fard MA (2010) Some engineering properties of sunflower seed and its kernel. Nong Ye Ke Xue Yu Ji Shu 4(4):37
Kirbaslar FG, Türker G, Özsoy-Günes Z, Ünal M, Dülger B, Ertas E, Kizilkaya B (2012) Evaluation of fatty acid composition, antioxidant and antimicrobial activity, mineral composition and calorie values of some nuts and seeds from Turkey. Records Nat Prod 6(4):339
Kiriamiti HK, Rascol E, Marty A, Condoret JS (2002) Extraction rate of oil from high oleic sunflower seeds with supercritical carbon dioxide. Chem Eng Process 41:711–718. https://doi.org/10.1016/S0255-2701(01)00191-X
Konuskan OMER, Konuskan DB, Levai CM (2017) Effect of foliar boron fertilization on chemical properties and fatty acid compositions of corn (Zea mays L.). Rev Chim (Bucharest) 68(9):2073–2075. https://doi.org/10.3965/j.ijabe.20160901.1790
Kurtulmuş F (2020) Identification of sunflower seeds with deep convolutional neural networks. J Food Meas Characterization 1–10. https://doi.org/10.1007/s11694-020-00707-7
Landwehr N, Hall M, Frank E (2005) Logistic model trees. Mach Learn 59:161–205. https://doi.org/10.1007/s10994-005-0466-3
Li WP, Shi HB, Zhu K, Zheng Q, Xu Z (2017) The quality of sunflower seed oil changes in response to nitrogen fertilizer. Agron J 109(6):2499–2507. https://doi.org/10.2134/agronj2017.01.0046
Malik MA, Saini CS (2016) Engineering properties of sunflower seed: effect of dehulling and moisture content. Cogent Food Agric 2(1):1145783. https://doi.org/10.1080/23311932.2016.1145783
Marini F, Magrı AL, Balestrieri F, Fabretti F, Marini D (2004) Supervised pattern recognition applied to the discrimination of the floral origin of six types of Italian honey samples. Anal Chim Acta 515(1):117–125. https://doi.org/10.1016/j.aca.2004.01.013
Mertens D (2005a) Plants preparation of laboratory sample. In: Horwitz W, Latimer GW (eds) Official Methods of Analysis of AOAC International, 18th edn. AOAC International, Gaithersburg, pp 1–2
Mertens D (2005b) Metal in plants and pet foods. In: Horwitz W, Latimer GW (eds) Official Methods of Analysis of AOAC International, 18th edn. AOAC International, Gaithersburg, pp 3–4
Mirzabe AH, Khazaei J, Chegini GR (2012) Physical properties and modeling for sunflower seeds. Agric Eng Int CIGR J 14(3):190–202
Mohsenin NN (1986) Physical properties of plant and animal materials: structure, physical characteristics and mechanical properties. Gordon and Breach Science Publishers, New York
Mollazade K, Omid M, Arefi A (2012) Comparing data mining classifiers for grading raisins based on visual features. Comput Electron Agric 84:124–131. https://doi.org/10.1016/j.compag.2012.03.004
Munder S, Argyropoulos D, Mueller J (2017) Class-based physical properties of air-classified sunflower seeds and kernels. Biosyst Eng 164:124–134. https://doi.org/10.1016/j.biosystemseng.2017.10.005
Niazian M, Sadat-Noori SA, Abdipour M (2018) Modeling the seed yield of Ajowan (Trachyspermum ammi L.) using artificial neural network and multiple linear regression models. Ind Crop Prod 117:224–234. https://doi.org/10.1016/j.indcrop.2018.03.013
Omid M (2011) Design of an expert system for sorting pistachio nuts through decision tree and fuzzy logic classifier. Expert Syst Appl 38(4):4339–4347. https://doi.org/10.1016/j.eswa.2010.09.103
Omid M, Khojastehnazhand M, Tabatabaeefar A (2010) Estimating volume and mass of citrus fruits by image processing technique. J Food Eng 100(2):315–321. https://doi.org/10.1016/j.jfoodeng.2010.04.015
Omobuwajo TO, Akande AE, Sanni LA (1999) Selected physical, mechanical and aerodynamic properties African Breadfruit (Treculia africana) seeds. J Food Eng 40:241–244. https://doi.org/10.1016/S0260-8774(99)00060-6
Ortiz-Hernandez AA, Araiza-Esquivel M, Delgadillo-Ruiz L, Ortega-Sigala JJ, Durán-Muñoz HA, Mendez-Garcia VH, Yacaman MJ, Vega-Carrillo HR (2020) Physical characterization of sunflower seeds dehydrated by using electromagnetic induction and low-pressure system. Innov Food Sci Emerg Technol 60:102285. https://doi.org/10.1016/j.ifset.2019.102285
Ozkan G, Koyuncu MA (2005) Physical and chemical composition of some walnut (Juglans regia L.) genotypes grown in Turkey. Grasas Aceites 56:141–146. https://doi.org/10.3989/gya.2005.v56.i2.122
Parker J (2001) Rank and response combination from confusion matrix data. Inf Fusion 2:113–120. https://doi.org/10.1016/S1566-2535(01)00030-6
Pietersma D, Lacroix R, Lefebvre D, Wade KM (2003) Performance analysis for machine-learning experiments using small data sets. Comput Electron Agric 38(1):1–17
Quinlan JR (1993) Combining instance-based and model-based learning. In Proceedings of the tenth international conference on machine learning, pp 236–243
Rieseberg LH, Van Fossen C, Desrochers AM (1995) Hybrid speciation accompanied by genomic reorganization in wild sunflowers. Nature 375:313–316. https://doi.org/10.1038/375313a0
Rodriguez-Galiano VF, Ghimire B, Rogan J, Chica-Olmo M, Rigol-Sanchez JP (2012) An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J Photogramm Remote Sens 67:93–104. https://doi.org/10.1016/j.isprsjprs.2011.11.002
Safavian SR, Landgrebe D (1991) A survey of decision tree classifier methodology. IEEE Trans Syst, Man, Cybernet 21(3):660–674. https://doi.org/10.1109/21.97458
Salgin U, Doker O, Calimli A (2006) Extraction of sunflower oil with supercritical CO2: experiments and modeling. J Supercrit Fluids 38(3):326–331. https://doi.org/10.1016/j.supflu.2005.11.015
Salimon J, Farhan N (2012) Physiochemical properties of Saudi extra virgin olive oil. Int J Chem Environ Eng 3(3):205–208
Santalla EM, Mascheroni RH (2003) Equilibrium moisture characteristics of high oleic sunflower seeds and kernels. Dry Technol 21(1):147–163. https://doi.org/10.1081/DRT-120017288
Sayıncı B, Ercişli S, Akbulut M, Şavşatlı Y, Baykal H (2015) Determination of shape in fruits of cherry laurel (Prunus laurocerasus) accessions by using Elliptic Fourier analysis. Acta Sci Polonorum, Hortorum Cultus 14(1):63–82
Siedliska A, Baranowski P, Zubik M, Mazurek W (2017) Detection of pits in fresh and frozen cherries using a hyperspectral system in transmittance mode. J Food Eng 215:61–71. https://doi.org/10.1016/j.jfoodeng.2017.07.028
Soltani M, Omid M (2015) Detection of poultry egg freshness by dielectric spectroscopy and machine learning techniques. LWT-Food Sci Technol 62(2):1034–1042. https://doi.org/10.1016/j.lwt.2015.02.019
Sorkheh K, Kazemifard A, Rajabpoor S (2018) A comparative study of fuzzy linear regression and multiple linear regression in agricultural studies: a case study of lentil yield management. Turk J Agric For 42:402–411. https://doi.org/10.3906/tar-1709-57
Sukhija PS, Palmquist DL (1988) Rapid method for determination of total fatty acid content and composition of feedstuffs and feces. J Agric Food Chem 36(6):1202–1206
Stegmayer G, Milone DH, Garran S, Burdyn L (2013) Automatic recognition of quarantine citrus diseases. Expert Syst Appl 40(9):3512–3517
Tähtiharju S, Rijpkema AS, Vetterli A, Albert VA, Teeri TH, Elomaa P (2011) Evolution and diversification of the CYC/TB1 gene family in Asteraceae—a comparative study in Gerbera (Mutisieae) and sunflower (Heliantheae). Mol Biol Evol 29(4):1155–1166. https://doi.org/10.1093/molbev/msr283
Vandenbrink JP, Brown EA, Harmer SL, Blackman BK (2014) Turning heads: the biology of solar tracking in sunflower. Plant Sci 224:20–26. https://doi.org/10.1016/j.plantsci.2014.04.006
Vapnik VN (2000) Methods of pattern recognition. In: The nature of statistical learning theory. Springer, New York, pp 123–180. https://doi.org/10.1007/978-1-4757-3264-1_6
Vereecken H, Herbst M (2004) Statistical regression. In: Pachepsky YA, Rawls WJ (eds) Development of pedotransfer functions in soil hydrology. Elsevier, Amsterdam, pp 3–18. https://doi.org/10.1016/S0166-2481(04)30001-2
Viera AJ, Garrett JM (2005) Understanding interobserver agreement: the kappa statistic. Fam Med 37(5):360–363
Webb DR, Sanders RA (1991) Caprenin 1. Digestion, absorption, and rearrangement in thoracic duct-cannulated rats. J Am Coll Toxicol 10(3):325–340. https://doi.org/10.3109/10915819109079813
Werteker M, Lorenz A, Johannes H, Berghofer E, Findlay CS (2010) Environmental and varietal influences on the fatty acid composition of rapeseed, soybeans and sunflowers. J Agron Crop Sci 196(1):20–27. https://doi.org/10.1111/j.1439-037X.2009.00393.x
Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann Press, San Francisco
Wu X, Kumar V, Quinlan JR, Ghosh J, Yang Q, Motoda H, ..., Zhou ZH (2008) Top 10 algorithms in data mining. Knowledge and Information Systems 14(1):1–37. https://doi.org/10.1007/s10115-007-0114-2
Zielinska M, Zapotoczny P, Białobrzewski I, Zuk-Golaszewska K, Markowski M (2012) Engineering properties of red clover (Trifolium pratense L.) seeds. Ind Crop Prod 37(1):69–75. https://doi.org/10.1016/j.indcrop.2011.12.002

Author information

Authors and Affiliations

Department of Biosystems Engineering, Faculty of Agriculture, Erciyes University, Kayseri, Turkey
Necati Çetin, Cevdet Sağlam & Bahadır Demirel
Department of Agricultural Biotechnology, Faculty of Agriculture, Erciyes University, Kayseri, Turkey
Kevser Karaman
Department of Field Crops, Faculty of Agriculture, Erciyes University, Kayseri, Turkey
Erman Beyzi

Corresponding author

Correspondence to Necati Çetin.

Ethics declarations

Ethics Approval

This study does not contain any studies with human participants or animals performed by any of the authors.

Informed Consent

Informed consent is not applicable in this study.

Conflict of Interest

Necati Çetin declares that he has no conflict of interest. Kevser Karaman declares that she has no conflict of interest. Erman Beyzi declares that he has no conflict of interest. Cevdet Sağlam declares that he has no conflict of interest. Bahadır Demirel declares that he has no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Çetin, N., Karaman, K., Beyzi, E. et al. Comparative Evaluation of Some Quality Characteristics of Sunflower Oilseeds (Helianthus annuus L.) Through Machine Learning Classifiers. Food Anal. Methods 14, 1666–1681 (2021). https://doi.org/10.1007/s12161-021-02002-7

Received: 11 November 2020
Accepted: 04 March 2021
Published: 15 March 2021
Issue Date: August 2021
DOI: https://doi.org/10.1007/s12161-021-02002-7
