Introduction

Chickpea seeds are rich in protein and carbohydrates and are therefore widely consumed worldwide. As an edible legume, chickpea has a significant place in the daily diets of low-income countries and is largely grown in the Mediterranean countries, Asia, Africa, and Europe (Sastry et al. 2019). Chickpea seeds contain 29% protein, 59% carbohydrate, 5% oil, 4% ash, and 3% fiber (Varol et al. 2020; Iqbal et al. 2006). Chickpea seeds have been reported to be effective in the prevention of various diseases such as obesity, colon cancer, diabetes, and cardiovascular diseases (de Camargo et al. 2019). Annual chickpea production was reported as 14,776,827 tons in the world and 470,000 tons in Turkey (FAO 2019).

Shape and size attributes of legumes offer reliable data for transportation, classification (Bacci et al. 2002), discrimination (Symons et al. 2003), and packaging and storage systems (Öztürk and Kara 2009). Such attributes are also used in the design of the planting systems of sowing machines (Sayinci et al. 2012) and in the design of harvesting systems. Color parameters play a great role in consumer demands and culinary preferences (Cetin et al. 2020). Chickpea seeds have great potential in the seed and food industry. However, the economic and efficient use of seeds requires knowledge of their physical properties (Ropelewska et al. 2022). Based on seed color and geographical distribution, chickpeas are divided into two types: desi (India-originated) and kabuli (Mediterranean and Central Asia-originated). Kabuli chickpeas are large with white and cream seed coat colors (Chavan et al. 1987; Ozaktan 2021).

Both destructive methods (random amplification of polymorphic DNA (RAPD) analysis, multisensory gas analysis) and non-destructive methods (imaging, spectral reflection, vibration sensors, and excitation systems) are employed in the identification of different products. In destructive methods, samples pass through different laboratory processes for identification (Abbaspour-Gilandeh and Azizi 2018; Ropelewska and Jankowski 2019; Fathizadeh et al. 2021). Besides being complex and time-consuming, destructive methods have other limitations, the most important of which is the possibility of damaging the sample (Abbaspour-Gilandeh and Azizi 2018). Image analysis has therefore been employed successfully as an inexpensive, accurate, and reliable approach for the assessment of seeds of different species (Ropelewska and Jankowski 2019; Sabanci et al. 2021; Ropelewska 2020).

Physical measurements of seeds are time-consuming and laborious. Novel technologies have been developed to overcome these problems. With these technologies, products can be detected, sorted, and classified simply and quickly (Soares et al. 2013). Such practical approaches are also needed to define the characteristics used in the quality assessment of seeds (Omid et al. 2010). Image processing and machine learning methods are powerful tools for describing the characteristics of seeds. The shape and size attributes of the seed are revealed in a short time with the image processing technique (Çetin et al. 2020). Machine learning offers nonlinear models that can predict past and future values from an input–output relationship (Zhang et al. 2012). Several researchers previously investigated the shape, area, size, and color attributes of chickpea and legume seeds (Masoumi and Tabil 2003; Nikoobin et al. 2009; Simonyan et al. 2009; Eissa et al. 2010; Abou-Salem and Abou-Arab 2011; Güzel and Sayar 2012; Kibar et al. 2014; Jogihalli et al. 2017; Sastry et al. 2019; Cetin et al. 2022). In these studies, principal physical attributes such as length, thickness, and width were generally measured with a caliper, and area, diameter, volume, and shape attributes were calculated from these measurements. Measuring relevant attributes with conventional methods significantly increases the time and labor required (Çetin et al. 2022). However, novel approaches such as texture image analysis and spectrophotometric analysis can be used for such purposes. In addition, discriminant analysis can be conducted with principal component analysis (PCA) and hierarchical cluster analysis (HCA). Instrumental and spectrophotometric analytical methods provide significant savings in labor and time. PCA is used to determine the attributes of cultivars, evaluate correlations among the investigated traits, and then identify superior cultivars for certain traits. HCA offers reliable discrimination and grouping of cultivars (Cetin et al. 2020).

Predicting seed mass from easily determined shape and size properties may support the design of sorting machines and classification processes (Gonzalez et al. 2017). Many researchers reported that machine learning yielded better performance in the prediction of agricultural product characteristics (Moosavi and Sepaskha 2012; Demir et al. 2017; Kus et al. 2017; Shabani et al. 2017; Çetin et al. 2021). There is a limited number of studies on seed mass prediction (Soares et al. 2013; Rad et al. 2015, 2017; Gurbuz et al. 2018; Demir et al. 2020; Sağlam and Çetin 2021). However, there are no studies on the mass prediction of chickpeas with the use of image processing and machine learning algorithms. This study was carried out to (I) determine the differences in seeds of 26 chickpea cultivars grown under the same ecology without using any chemical inputs, (II) detect shape and size attributes of chickpea seeds with an image analysis technique, (III) evaluate these data through hierarchical clustering analysis and principal component analysis and compare the relationships among them, and (IV) predict seed mass from physical attributes with different machine learning algorithms (MLP, RF, SVR, and k-NN) and compare the performance of these algorithms.

Material and method

Study area and sample preparation

In the present experiments, 26 kabuli chickpea cultivars (Akça, Akçin 91, Aksu, Aziziye 94, Azkan, Cevdetbey 98, Çağatay, Çakır, Damla 89, Dikbas, Er 99, Gökçe, Gülümser, Hasanbey, Ilgaz, Işık 05, İnci, İzmir 92, Menemen 92, Osmanbey, Sarı 98, Seçkin, Sezenbey, Uzunlu 99, Yaşa 05 and Zuhal) were used as the plant material. Osmanbey is a genotype rather than a standard cultivar.

Experiments were conducted on the experimental fields of the Agricultural Research and Implementation Center (38°42′54.2"N–35°32′42.4"E) of Erciyes University in 2018 and 2019 in a randomized blocks design with 3 replications. Each plot had 6 rows 30 cm apart with 5 cm on-row plant spacing. Conventional agronomic practices were not conducted on the experimental fields during the last 3 years. Following manual sowing, sprinkler irrigation was practiced until emergence and the first manual weed control; drip irrigation was then practiced through drip lines placed by each row. Climate parameters (monthly average temperature, °C; relative humidity, %; precipitation, mm) were considered for the experimental years together with the long-term (1931–2019) averages. Soil samples were taken from 0–30 cm soil depth at different points to represent the entire study area. Air-dried samples were passed through 2 mm sieves and subjected to soil analyses.

Image texture analysis of chickpea seeds

In the present study, 100 seeds were used in one image. Three replications were performed for each cultivar. A total of 600 seeds were used for each cultivar: 300 seeds (100 seeds × 3 replications) in 2018 and 300 seeds in 2019. Seed mass (M, g) was determined with a precision balance (± 0.001 g). The image acquisition system was composed of a digital CCD (Charge-Coupled Device) camera, lens, macro flash, lighting equipment, and imaging platform. Images were taken in a dark environment to get shadow-free images (Sayinci et al. 2012; Kara et al. 2013).

Chickpea seeds were placed on gray cardboard paper in two different orientations (horizontal and vertical) with the use of double-sided adhesive tape. Images were taken from a vertically positioned camera at a fixed height of 50 cm. To provide dimensional calibration, the unit conversion ratio of the images was obtained in pixels/mm by using a ruler on a fiberglass plate. A representative scheme of the imaging setup is given in Fig. 1. Captured images were transferred to a computer and saved as *.tiff files for descriptive analysis. Length (L, mm), width (W, mm), thickness (T, mm), projected area (PA, mm2), equivalent diameter (ED, mm), perimeter (P, mm), shape factor (SF, mm) and compactness (C, %) were determined in both horizontal and vertical orientations with the use of SigmaScan®Pro 5.0 and MATLAB software. Volume (V, mm3), shape index (SI), roundness (R), geometric mean diameter (Dg, mm), surface area (S, mm2), elongation (E) and sphericity (φ, %) were calculated with the equations given by Sayıncı et al. (2015) and Demir et al. (2018). Figure 2 shows an example image through the entire segmentation process. In the first step, the background of the original image was removed. In the second step, the gray-scale transformation was performed. In the third step, the gray-scale image was inverted. Finally, the remaining background pixels were removed with a morphological operator, and objects smaller than 100 pixels were removed. This threshold was chosen by trial and error.
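As an illustration of how calibrated dimensions feed the derived shape attributes, the sketch below converts a pixel length to millimetres and computes the geometric mean diameter and sphericity. The formulas Dg = (L·W·T)^(1/3) and φ = 100·Dg/L are the forms commonly used for seeds and are an assumption here; the study itself takes its equations from Sayıncı et al. (2015) and Demir et al. (2018), which may differ in detail. The function names and the example values are hypothetical.

```python
def pixels_to_mm(length_px, pixels_per_mm):
    """Convert a pixel length to mm using the calibration ratio (pixels/mm)."""
    return length_px / pixels_per_mm


def geometric_mean_diameter(length, width, thickness):
    """Assumed form: Dg = (L * W * T)^(1/3), all dimensions in mm."""
    return (length * width * thickness) ** (1.0 / 3.0)


def sphericity(length, width, thickness):
    """Assumed form: phi = Dg / L * 100, expressed in percent."""
    return geometric_mean_diameter(length, width, thickness) / length * 100.0


if __name__ == "__main__":
    # Hypothetical seed: 10.5 mm long, 8.5 mm wide, 8.8 mm thick.
    print(round(geometric_mean_diameter(10.5, 8.5, 8.8), 2))
    print(round(sphericity(10.5, 8.5, 8.8), 2))
```

A seed with equal length, width, and thickness gives a sphericity of 100% under these assumed forms, which is a quick sanity check on any implementation.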

Fig. 1

Representative scheme of the imaging setup

Fig. 2

Image segmentation process of chickpea seeds. a: Original image, b: Removal of background, c: Conversion into gray-scale, d: Inversion of gray-scale image, e: Improved binary image after removal of small objects, f: Segmented image with detected chickpea seeds

Chromatic characteristics

Color measurements of chickpea seeds of each cultivar were made with a chroma meter (Konica Minolta CR-400, Japan). Measurements were made in CIELab color space. L* (brightness, 0 dark, 100 light), a* (+ values red, − values green), and b* (+ values yellow, − values blue) values were measured. The following equations were used to calculate the color index (CI), hue angle (h*), and chroma (C*) values from the measured L*, a*, and b* values (McGuire 1992).

$$C^{*} = \sqrt{(a^{*})^{2} + (b^{*})^{2}}$$
(1)
$$h^{*} = \tan^{-1}(b^{*}/a^{*}), \quad \text{if } a^{*} > 0 \text{ and } b^{*} \ge 0$$
(2)
$$h^{*} = 180 + \tan^{-1}(b^{*}/a^{*}), \quad \text{if } a^{*} < 0 \text{ and } b^{*} \ge 0$$
(3)
$$h^{*} = 180 + \tan^{-1}(b^{*}/a^{*}), \quad \text{if } a^{*} < 0 \text{ and } b^{*} < 0$$
(4)
$$h^{*} = 360 + \tan^{-1}(b^{*}/a^{*}), \quad \text{if } a^{*} > 0 \text{ and } b^{*} < 0$$
(5)
$$CI = \frac{{1000\, \cdot a^{*} }}{{L^{*} b^{*} }}$$
(6)
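The color equations above can be sketched in a few lines of Python. This is a minimal illustration rather than the software used in the study: the function names are hypothetical, and the four quadrant cases of the hue angle are collapsed into a single `atan2` call followed by a modulo, which gives the same result in degrees on the 0–360 scale whenever a* is non-zero.

```python
import math


def chroma(a_star, b_star):
    """Eq. (1): C* = sqrt(a*^2 + b*^2)."""
    return math.hypot(a_star, b_star)


def hue_angle(a_star, b_star):
    """Eqs. (2)-(5): hue angle in degrees, mapped to [0, 360).

    atan2 returns an angle in (-180, 180]; taking it modulo 360
    reproduces the quadrant-by-quadrant offsets of 0, 180, and 360.
    """
    return math.degrees(math.atan2(b_star, a_star)) % 360.0


def color_index(l_star, a_star, b_star):
    """Eq. (6): CI = 1000 * a* / (L* * b*)."""
    return 1000.0 * a_star / (l_star * b_star)


if __name__ == "__main__":
    # Hypothetical reading: L* = 70, a* = 7, b* = 20.
    print(chroma(7.0, 20.0), hue_angle(7.0, 20.0), color_index(70.0, 7.0, 20.0))
```

Using `atan2` also avoids a division by zero when a* is exactly 0, a case the piecewise equations do not cover.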

Principal component and hierarchical cluster analysis

Principal component analysis (PCA) is generally used to identify cultivars that are superior for one or more traits, to reduce large datasets, and to highlight cultivars that are prominent for particular traits. Cluster analysis allows the grouping of samples based on their similarities. PCA and hierarchical cluster analysis were conducted with the use of Past3 software. For hierarchical cluster analysis, the squared Euclidean distance, Ward’s method, and the PCA correlation matrix were used (Demir et al. 2018).

Machine learning modeling

The prediction of seed mass was performed using WEKA machine learning software (Machine Learning Group, University of Waikato) (Witten and Frank 2005). The Correlation-based Feature Selection (CFS) subset evaluator and the Best First algorithm were used to determine the attributes with the highest power to predict seed mass (Sağlam and Çetin 2021). Models were developed on datasets (inputs) including cultivar and the shape and size attributes SFh, PAh, EDh, Dg, φ, T, SFv, PAv, and EDv. A total of 3000 data points were used, 300 for each attribute. The models were built with two test modes, tenfold cross-validation and a train (70%)-test (30%) split, using the Multilayer Perceptron (MLP), Random Forest (RF), Support Vector Regression (SVR) and k-Nearest Neighbors (k-NN) algorithms. For the k-NN algorithm, the Euclidean distance was used in the neighbor search, and k values of 1, 3, and 5 were tested for mass prediction. The Pearson VII universal kernel (PUK) function was chosen for SVR. For the prediction of seed mass, the MLP had a 10-10-1 structure (number of inputs, number of neurons in the hidden layer, and number of outputs). In all MLP predictions, the number of epochs was 1000, the learning rate was 0.3, the momentum coefficient was 0.1, and the sigmoid was chosen as the activation function. The applied MLP model structure is provided in Fig. 3. Model performance was assessed with the following statistical metrics: correlation coefficient (R), mean absolute error (MAE), root mean square error (RMSE), relative absolute error (RAE) and root relative squared error (RRSE), calculated using Eqs. (7–11) (Parker 2001).

$$R = \frac{1}{n - 1}\sum\limits_{i = 1}^{n} {\frac{{(M_{i} - \mathop M\limits^{.} )\,(E_{i} - \mathop E\limits^{.} )}}{{S_{M} \,S_{E} }}}$$
(7)
$$MAE = \sum\limits_{i = 1}^{n} {\frac{{{\mid }E_{i} - M_{i} {\mid }}}{n}}$$
(8)
$$RMSE = \sqrt {\frac{{\sum\limits_{i = 1}^{n} {\left( {E_{i} - M_{i} } \right)^{2} } }}{n}}$$
(9)
$$RAE = \frac{{\sum\limits_{i = 1}^{n} {{\mid }E_{i} - M_{i} {\mid }} }}{{\sum\limits_{i = 1}^{n} {{\mid }\mathop M\limits^{.} - M_{i} {\mid }} }} \times 100$$
(10)
$$RRSE = \sqrt {\frac{{\sum\limits_{i = 1}^{n} {(E_{i} - M_{i} )^{2} } }}{{\sum\limits_{i = 1}^{n} {(\mathop M\limits^{.} - M_{i} )^{2} } }}} \times 100$$
(11)

where n: Number of data, Mi: Measured target value, Ei: Estimated target value, Ṁ: Average of measured target values, Ė: Average of estimated target values, SE: Standard deviation of estimated target values, SM: Standard deviation of measured target values. Correlation coefficients (R) were analyzed to assess the success of estimations in accordance with the principles specified in Colton (1974). Correlation coefficients between 0 and 0.25 indicate no or slight correlation, 0.25–0.50 indicate moderate correlation, 0.50–0.75 indicate moderate or high correlation, and 0.75–1 indicate perfect correlation.
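Eqs. (7)–(11) can be computed together from paired lists of measured and estimated values. The sketch below is a plain-Python illustration, not the WEKA implementation used in the study; the function name is hypothetical, and S_M and S_E are assumed to be sample standard deviations, which is the reading that makes Eq. (7) the Pearson correlation coefficient given its 1/(n − 1) factor.

```python
import math


def prediction_metrics(measured, estimated):
    """Return R, MAE, RMSE, RAE (%) and RRSE (%) following Eqs. (7)-(11)."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_e = sum(estimated) / n

    # Sample standard deviations (assumed meaning of S_M and S_E).
    s_m = math.sqrt(sum((m - mean_m) ** 2 for m in measured) / (n - 1))
    s_e = math.sqrt(sum((e - mean_e) ** 2 for e in estimated) / (n - 1))

    # Eq. (7): correlation coefficient.
    r = sum((m - mean_m) * (e - mean_e)
            for m, e in zip(measured, estimated)) / ((n - 1) * s_m * s_e)

    abs_err = [abs(e - m) for m, e in zip(measured, estimated)]
    sq_err = [(e - m) ** 2 for m, e in zip(measured, estimated)]

    mae = sum(abs_err) / n                                   # Eq. (8)
    rmse = math.sqrt(sum(sq_err) / n)                        # Eq. (9)
    rae = sum(abs_err) / sum(abs(mean_m - m) for m in measured) * 100.0   # Eq. (10)
    rrse = math.sqrt(
        sum(sq_err) / sum((mean_m - m) ** 2 for m in measured)
    ) * 100.0                                                # Eq. (11)

    return {"R": r, "MAE": mae, "RMSE": rmse, "RAE": rae, "RRSE": rrse}
```

RAE and RRSE compare the model with a baseline that always predicts the mean of the measured values, so values below 100% mean the model does better than that baseline.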

Fig. 3

Structure (10-10-1) of the MLP model for prediction of the seed mass
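Of the algorithms compared in this study, k-NN is the simplest to illustrate. The sketch below shows k-NN regression with the Euclidean distance and an unweighted mean of the k nearest targets. It is a minimal illustration under stated assumptions, not the WEKA implementation used here: the function name and the toy data are hypothetical, and no attribute scaling is applied.

```python
import math


def knn_predict(train_x, train_y, query, k):
    """Predict a target as the mean of the k nearest training targets.

    train_x: list of feature tuples, train_y: list of target values,
    query: feature tuple, k: number of neighbors (1, 3, or 5 in the study).
    """
    # Pair every training target with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / k


if __name__ == "__main__":
    # Hypothetical data: (geometric mean diameter mm, thickness mm) -> mass g.
    features = [(9.0, 8.2), (9.3, 8.5), (9.6, 8.9), (9.9, 9.2)]
    masses = [0.38, 0.41, 0.45, 0.49]
    for k in (1, 3):
        print(k, round(knn_predict(features, masses, (9.5, 8.8), k), 3))
```

A larger k averages over more neighbors, which smooths the prediction and reduces sensitivity to a single atypical seed.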

Statistical analysis

Physical and color parameters were analyzed with JMP 13.2.0 software; image analyses of physical properties were performed with SigmaScan®Pro 5.0 (2004) and MATLAB software; and hierarchical clustering and principal component analysis (PCA) were performed with PAST software. The mean values of each parameter were subjected to the Shapiro–Wilk (W) test; after the values were confirmed to be normally distributed, analysis of variance was performed, and significant means were compared using the Tukey test (Düzgüneş et al. 1987).

Results and discussion

Physical attributes of chickpea cultivars

Mean values and Tukey groups of variance analysis for physical attributes of chickpea cultivars are given in Table 1. The mean values of seeds for both years (2018 and 2019) are given in the tables. As can be inferred from Table 1, the effects of years (Y) on seed mass, length and perimeter were not found to be significant. However, the effects of cultivars (G) and YxG interactions on mass, volume, length, thickness, geometric mean diameter, surface area and equivalent diameter (vertical) were found to be significant at a 1% level.

Table 1 Mass, dimension and area traits of chickpea cultivars and Tukey groups*

The highest seed mass was obtained from Ilgaz cultivar (0.49 g) and the lowest from Menemen 92 and İzmir 92 cultivars (0.38 g). The highest volumes were obtained from Sarı 98, Ilgaz, Çakır, Cevdetbey and Aksu cultivars, respectively, and the lowest from Menemen 92 cultivar. The greatest length was measured in Çakır cultivar (12.13 mm) and the lowest in Menemen 92 cultivar (10.13 mm). In terms of seed width, Cevdetbey, Sarı 98, Aksu and Ilgaz cultivars had the highest values, while İzmir 92 and Menemen 92 cultivars had the lowest values. The highest thickness was obtained from Aziziye 94 cultivar (9.29 mm). Geometric mean diameter values varied between 9.00 and 9.96 mm.

Nikoobin et al. (2009) reported seed lengths of chickpeas with different moisture content as between 8.46–9.71 mm. Sastry et al. (2019) reported length values (between 6.4–0.6 mm) lower than the present ones. Masoumi and Tabil (2003) reported the average seed width of large, small and desi types respectively as 8.35, 7.06 and 6.46 mm. Seed thickness and mass for kabuli type were reported as 8.25 mm and 0.507 g. Simonyan et al. (2009) reported seed widths of 'Rongai' cultivar as between 7.39–8.07 mm and thickness values as between 5.39–5.88 mm. Geometric mean diameter values of 'Rongai' cultivar varied between 7.17–7.90 mm. Çetin (2022) reported volumes of 5 different chickpea cultivars as between 76.38–232.77 mm3 and seed mass values as between 0.09–0.29 g. Present findings on seed volumes comply with the results of Masoumi and Tabil (2003), Simonyan et al. (2009) and Sastry et al. (2019).

The greatest projected area (vertical orientation) was obtained from Çakır (82.60 mm2) cultivar and the lowest from İzmir 92 (65.42 mm2) cultivar. The projected area (horizontal orientation) values changed between 63.46 and 81.44 mm2 and differences between the years were found to be significant (p < 0.05) and the average value was identified as 76.12 mm2 in the first year and 71.35 mm2 in the second year. The highest surface area (313.21 mm2) was obtained from Ilgaz cultivar and the cultivars Sarı 98 (312.32 mm2), Çakır (312.04 mm2), Cevdetbey 98 (309.83 mm2) and Aksu (306.26 mm2) were also placed into the highest group and the lowest value was obtained from Menemen 92 (255.82 mm2) cultivar. In vertical orientation, the greatest perimeter was obtained from Çakır (44.74 mm) cultivar and the lowest from Menemen 92 (33.48 mm) and İzmir 92 (33.86 mm) cultivars. While Seçkin (41.58 mm) cultivar had the greatest perimeter in horizontal orientation, the lowest was obtained from Menemen 92 (32.57 mm) cultivar. For perimeter, differences between the years were not found to be significant. Among the chickpea cultivars, Çakır (10.22 mm) cultivar had the highest equivalent diameter in vertical orientation and Zuhal (10.62 mm) cultivar had the greatest equivalent diameter in horizontal orientation.

Masoumi and Tabil (2003) reported the average projected area of large-type chickpeas as 66.09 mm2. The surface area is closely related to evaporation from the seed surface. George et al. (2007) stated that drying rates increased with the increasing surface area of the seed. Similar to the present findings, Kibar et al. (2014) reported the average surface area of bean seeds as 200.62 mm2 in non-irrigated areas and 239.99 mm2 in drip-irrigated areas. Çetin (2022) reported the projected area and surface area of 5 soybean cultivars respectively as between 21.76–45.75 mm2 and between 87.06–182.99 cm2.

The variance analysis results, mean values and Tukey groups for shape and color attributes of chickpea cultivars are provided in Table 2. Effects of YxG (1%) interactions on compactness (Ch), shape factor (SFv and SFh), a* and b* parameters were found to be significant. Effects of years on sphericity, shape index, elongation (Ev) and b* were also found to be significant at 1% level. Compactness (Cv) values varied between 16.56–29.12 and all cultivars were placed into the same statistical group. The greatest sphericity value (89.04%) was obtained from Menemen 92 cultivar and the lowest from Çakır (82.48%) cultivar. Among the chickpea cultivars, Çakır (1.35) had the highest shape index value. For roundness (in vertical orientation), Aziziye cultivar (0.84) was prominent, while Azkan cultivar (0.69) had the lowest value.

Table 2 Shape and color parameters of chickpea cultivars and Tukey groups*

The lowest roundness (horizontal orientation) value (0.66) was obtained from Zuhal cultivar, while the highest value (0.79) was obtained from Damla cultivar. A roundness value close to “1” indicates a circular seed shape (Çetin et al. 2020; Çetin 2022). Consistent with present findings, Simonyan et al. (2009) reported the equivalent diameter values of 'Rongai' chickpea cultivar at 9.7, 21.1, 23.9 and 29% moisture contents respectively as 7.26, 7.77, 7.83 and 8.01 mm. Sastry et al. (2019) reported mean sphericity values of desi, kabuli and intermediate-type chickpea seeds as 79.5, 85.7 and 84.5%, respectively. Those values were higher than the present ones. Nikoobin et al. (2009) indicated that the sphericity values of chickpea seeds with different moisture content varied between 86.69–88.50% for 'Arman' cultivar and between 86.30–87.65% for 'Hashem' cultivar. Eissa et al. (2010) reported the sphericity value of chickpea seeds as 85.53% for 'Giza 3' cultivar and 87.00% for 'Giza 195' cultivar. Simonyan et al. (2009) reported sphericity values of 'Rongai' cultivar at different moisture contents as between 77.4–76%. Çetin (2022) reported average sphericity, shape index and roundness as 88.47%, 1.20 and 0.78, respectively. The mean aspect ratio and elongation of soybean seeds were also reported as 0.78 and 1.30, respectively.

While Azkan (1.35) cultivar had the highest elongation in the vertical orientation, the lowest average was obtained from Aziziye 94 (1.18) cultivar. The highest average elongation in horizontal orientation was obtained from Çakır cultivar (1.35) and the lowest values were obtained from Damla 89 (1.19), Menemen 92 (1.20) and Çağatay (1.21) cultivars. When seeds are classified by shape ratio, those with a ratio of less than 1.25 are classified as round and those with a ratio greater than 1.25 are classified as oval (Çetin et al. 2020; Çetin et al. 2021). In the present study, chickpea cultivars were all classified as “oval” in shape. It was determined that the roundness and sphericity values decreased.

Similar to the current study, Nikoobin et al. (2009) reported the elongation value in horizontal orientation as 1.21 for 'Arman' cultivar and 1.22 for 'Hashem' cultivar. Elongation value in horizontal orientation was reported as 1.27 for both 'Arman' and 'Hashem' cultivars.

The highest L* value was obtained from Sezenbey (72.80) cultivar, and the lowest value was obtained from Cevdetbey (67.59) cultivar. The a* values varied between 6.23–8.18. The highest b* value was obtained from İzmir 92 (23.75) cultivar and the lowest values from Ilgaz (19.64) and Aksu (19.65) cultivars.

Jogihalli et al. (2017) reported L* and b* values of chickpea seeds respectively as 88.48 and 22.43, which were higher than the present values. Abou-Salem and Abou-Arab (2011) reported L* and b* values for chickpea seeds respectively as 82.22 and 19.69 for 'Giza' cultivar. Complying with the present findings, Güzel and Sayar (2012) reported L*, a*, and b* values of chickpea seeds respectively as 60.75, 6.77, and 16.14.

Correlations among physical properties of the cultivars

The correlation matrix between the physical properties of the chickpea cultivars is given in Table 3. Seed compactness (in vertical orientation) had a negative correlation with shape factor (in vertical orientation) (r = −0.93). The projected area in vertical orientation had a positive correlation with equivalent diameter in vertical orientation (r = 0.90), roundness in the vertical orientation (r = 0.90), volume (r = 0.95) and compactness in the horizontal orientation (r = 0.95). Compactness in horizontal orientation also had a positive correlation with equivalent diameter in the vertical orientation (r = 0.99) and elongation in vertical orientation (r = 0.99). Besides, there was a positive correlation between shape index and elongation in horizontal orientation (r = 0.95). In addition, a* was highly correlated with b* (r = 0.97).

Table 3 Correlation matrix for physical characteristics of chickpea cultivars

Similar to the present findings, Sastry et al. (2019) reported positive correlations of 100-seed weight with seed length (r = 0.901), width (r = 0.959) and thickness (r = 0.961); positive correlations of seed width with sphericity (r = 0.740) and shape index (r = 0.727) and positive correlations between sphericity and shape index (r = 0.958).

Hierarchical cluster analysis (HCA)

Hierarchical cluster analysis was used to cluster the cultivars based on their physical properties, and a dendrogram was generated (Fig. 4). The dendrogram was generated with the unweighted pair group method with arithmetic mean (UPGMA) algorithm using the Euclidean similarity index. Correlation coefficient (r) values for the first, second and combined years were calculated as 0.6858, 0.7280 and 0.8376, respectively. Three sub-clusters were formed under two main clusters in the dendrogram. There were two cultivars (Menemen and İzmir) in the sub-cluster of the first main cluster. The second main cluster was divided into two sub-clusters. While the two sub-clusters were divided into two sub-groups, the second sub-group included Cevdetbey, Sarı, Ilgaz and Çakır cultivars, which were similar in terms of physical properties. Aksu cultivar was separated from the rest of the second sub-group and was placed alone in the first group of the second sub-cluster of the third main cluster. In addition, Akça, Osmanbey and Gökçe cultivars were similar to each other and formed the first group of the second sub-cluster of the second main cluster. According to Euclidean distances, Akça – İnci and Damla – Işık cultivars were the closest cultivars in terms of physical characteristics.

Fig. 4

Dendrogram for physical characteristics of chickpea cultivars (Euclidean similarity index was used in the unweighted pair group method with arithmetic mean (UPGMA) algorithm. 1st Year: r = 0.6858; 2nd Year: r = 0.7280; Mean: r = 0.8376)

Sastry et al. (2019) stated that three main groups were formed in hierarchical cluster analysis for the physical and hydration properties of chickpea seeds. The first group consisted of 26 cultivars, largely represented by kabuli cultivars except for one desi cultivar; the second group consisted of 57 cultivars represented by desi type (30), kabuli type (19) and intermediate type (8) cultivars. The authors found that the third group, except for one intermediate cultivar, consisted of desi type (128) cultivars. Cultivars in the first group were prominent for seven traits (100 seed weight, seed volume, surface area, sphericity, seed shape direction, hydration capacity, swelling capacity), cultivars in the second group for four traits (seed moisture content, seed bulk density, porosity, swelling index) and cultivars in the third group for three traits (seed coat content, seed true density, seed hydration index).

Guidoti et al. (2018) performed hierarchical clustering analysis (HCA) for morphological, agronomic, and molecular characteristics of commonly grown bean cultivars and indicated that 17 cultivars were clustered into two large groups with a difference value of 0.81. The first group was composed of 35.3% of the examined cultivars and the second group was composed of 64.7% of the cultivars.

Principal component analysis (PCA)

In the principal components analysis, the first two principal components (PC1: 50.37 and PC2: 22.80) accounted for 73.17% of total variation in physical properties. The first principal component included length, geometric mean diameter, volume, and surface area traits and the second principal component included roundness (in the horizontal orientation), thickness, elongation (in horizontal orientation) and sphericity traits.

A positive correlation was identified among geometric mean diameter, surface area and volume. In addition, a close relationship was found between the projected area (in the horizontal orientation) and the equivalent diameter (horizontal orientation). Ilgaz, Sarı and Çakır cultivars were determined to be ahead of the other cultivars in terms of geometric mean diameter, volume and projected area. Aksu and Cevdetbey were found to be prominent for geometric mean diameter and surface area (Fig. 5).

Fig. 5

PCA analysis and group centroids for physical characteristics of chickpea cultivars *L: length; W: width; T: thickness; M: mass; V: volume; GMD: geometric mean diameter; SA: surface area; S: sphericity; SI: shape index; PAv: projected area (vertical); PAh: projected area (horizontal); EDv: equivalent diameter (vertical); EDh: equivalent diameter (horizontal); Pd: perimeter (vertical); Ph: perimeter (horizontal); SFv: shape factor (vertical); SFh: shape factor (horizontal); Cv: compactness (vertical); Ch: compactness (horizontal); Ev: elongation (vertical); Eh: elongation (horizontal); Rv: roundness (vertical); Rh: roundness (horizontal); L*: Lightness; a*: redness-greenness; b*: yellowness-blueness

Sastry et al. (2019) conducted a principal component analysis of physical and hydration properties of chickpea seeds. The first PC (PC 1) explained 59.3% of the total variation, followed by PC 2 (14.8%) and PC 3 (7.0%) (three principal components together explained 81.05% of total variation). PC1 included seed weight, geometric mean diameter, hydration capacity, seed length, width, thickness, volume, sphericity, surface area, and seed swelling capacity, PC2 included seed bulk density, seed porosity, and swelling index and PC3 included only seed true density. It was determined that only two of 15 traits in the first three PCs (seed coat content in PC 1 and seed true density in PC 3) had a negative effect on variation.

Kibar et al. (2014) evaluated the factor coefficients for quality traits of dry beans grown under non-irrigated and drip-irrigated conditions with the use of PCA and indicated that the first three principal components explained 70% of the total variation under non-irrigated conditions and two principal components (PC1 and PC2) accounted for 100% of the total variation in physical and mechanical properties under drip-irrigated conditions. While the first principal component (PC1) explained 70.16% of the total variation, the second principal component (PC2) explained 29.84% of the total variation.

Prediction results of machine learning algorithms

Machine learning models were built with data from 26 chickpea cultivars. The results were assessed with the statistical parameters of the seed mass prediction. Performance results of prediction using MLP, RF, SVR, and k-NN are presented in Table 4. Higher R and lower MAE, RMSE, RAE, and RRSE values indicate more successful prediction.

Table 4 Comparison of the performance of all machine learning algorithms for seed mass prediction

The highest correlation coefficients were obtained with RF and MLP: 0.8054 and 0.8043 for the train-test split, and 0.8231 and 0.8142 for k-fold cross-validation, respectively. RMSE is generally used by algorithms because it is mathematically easy to work with. The lowest RMSE was also observed in RF as compared to the other algorithms for both validation methodologies. For the train-test split, the lowest MAE, RMSE, RAE, and RRSE values were obtained with RF as 0.0276, 0.0414, 51.01%, and 59.27%, respectively. For k-fold cross-validation, RF also had the lowest MAE (0.0272), RMSE (0.0382), RAE (51.22%) and RRSE (56.78%) values. In this study, the lowest R values were found with the 1-NN algorithm as 0.7271 and 0.7059 for the train-test and k-fold methods, respectively. The greatest MAE, RMSE, RAE and RRSE values were also obtained with the 1-NN algorithm for both validation methodologies. Among the k-NN algorithms, the most successful “k” value was 5. The 5-NN algorithm had the greatest R, with values of 0.7974 and 0.8065, respectively. The 3-NN algorithm yielded R values similar to those of 5-NN for both validation methodologies.

Similar to the present study, Soares et al. (2013) reported R² values between 0.69 and 0.91 for mass prediction of banana bunches using six different ANN structures. Rad et al. (2017) obtained R values between 0.67 and 0.93 from four different ANN structures for eggplant mass prediction. Saglam and Cetin (2022) used MLP, kNN, RF, and Gaussian Processes (GP) to predict pistachio mass and stated that GP had the lowest RMSE (0.038 for nut and 0.029 for kernel mass prediction) and the greatest R (0.976 for nut mass and 0.948 for kernel mass prediction). Gurbuz et al. (2018) obtained a greatest correlation coefficient of 0.8603 for almond mass prediction using different data mining algorithms. Demir et al. (2020) used Modular Neural Network (MNN) and Radial Basis Neural Network (RBNN) structures to predict walnut mass from physical attributes and reported RMSE values between 0.60 and 0.89 for MNN and of 0.0002 for RBNN. In contrast, Rad et al. (2015) predicted melon mass using MLP algorithms and obtained an R² lower than the value found in the present study. Overall, each base learner performed well across all statistical parameters, with all achieving an R value of at least 0.7059. These results showed that all models were sufficiently successful in predicting seed mass.

One limitation of the study was that the large number of images placed a heavy load on the computer's processor during image processing and machine learning, which extended the processing time. Another limitation was that darker colors were not used for the background when the images were taken, which made image segmentation difficult. It is recommended that researchers pay attention to these two factors in future studies.

Conclusion

In this study, the physical properties of 26 different chickpea cultivars grown for two years under the same ecological conditions without any chemical inputs were determined using image processing techniques, and the relationships between these traits were evaluated. Highly positive correlations were seen between projected area (in vertical orientation) and equivalent diameter (in vertical orientation), roundness (in vertical orientation), volume, and compactness (in horizontal orientation). In addition, there was a high correlation between a* and b*. In the PCA, PC1 and PC2 together explained 73.17% of the total variation. The first principal component included length, geometric mean diameter, volume, and surface area; the second principal component included roundness (in the horizontal orientation), thickness, elongation (in the horizontal orientation), and sphericity. According to Euclidean distances, the Akça – İnci and Damla – Işık cultivar pairs were the closest in terms of physical characteristics. Random Forest and Multilayer Perceptron yielded better outcomes than the other machine learning algorithms. The present findings may provide significant data for quality grading, selection, and breeding studies, and for the food industry. Computer and machine vision techniques offered an accurate and rapid classification of chickpea cultivars and may offer effective and practical tools for agricultural machinery designers and seed companies.

In recent years, technological developments in the seed industry have significantly improved the quality and functional requirements of seeds. This study presents important knowledge for the seed industry. In future studies, the use of different algorithms, attribute selection methods, and species will contribute to the literature and to the optimization of systems. It is also thought that using deep learning techniques together with machine learning in these systems will increase efficiency.