Introduction

Almond [Prunus dulcis (Mill.) D.A.Webb syn. Prunus amygdalus (L.) Batsch] is a species of genus Prunus and subgenus Amygdalus (Rosaceae, subfamily Prunoideae). This species originated in Central Asia and dispersed through cold and xeric environments in the mountainous areas and deserts of western China and into Iran (Watkins 1976). Due to the high nutritive value of almond fruit and its favorable effects on human health (Kester et al. 1991; Amarowicz et al. 2005), the almond tree is of great importance throughout the world. Wild almond species commonly grow in areas between 28° and 38°N and 41° and 54°E and from 1,100 to 2,700 m altitudes (Kester et al. 1991).

Traditional methods for cultivar and genotype characterization and identification of almonds are based on phenotypic observations. Morphological traits are useful for preliminary evaluation because they allow fast and simple assessment and can be used as a general approach for assessing genetic diversity among morphologically distinguishable genotypes. Morphological characterization combined with multivariate statistical methods such as principal component analysis (PCA) and cluster analysis is useful for screening genotypes (Lansari et al. 1994; Prats-Moya et al. 1997; Talhouk et al. 2000; Sorkheh et al. 2009, 2010).

Multivariate techniques can help to evaluate large data sets and resolve several phenotypic and genotypic measurements into fewer, more interpretable and more easily visualized groups. PCA is a method of data reduction that transforms the original variables into a limited number of uncorrelated new variables. It is therefore useful for representing a set of variables with a much smaller set of composite variables that account for much of the variance in the original set. It facilitates visualization of differences among individuals and the identification of possible groups and relationships among individuals and variables (Martinez-Calvo et al. 2008).
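As a rough illustration of the data reduction described above, the following sketch extracts the first principal component from a correlation matrix by power iteration, using only the Python standard library. The trait values, function names and the choice of power iteration are assumptions made for illustration; they are not data or procedures from this study.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of trait columns (each a list of floats)."""
    n = len(columns[0])
    z = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        z.append([(x - mean) / sd for x in col])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def leading_component(matrix, iterations=500):
    """Power iteration for the largest eigenvalue and eigenvector of a symmetric matrix.

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    p = len(matrix)
    vec = [1.0 / (i + 1) for i in range(p)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    mv = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(v * m for v, m in zip(vec, mv))
    return eigenvalue, vec

# Hypothetical values for six genotypes (not data from the study)
nut_weight = [1.9, 3.2, 4.1, 5.0, 6.3, 7.7]
kernel_weight = [0.7, 1.0, 1.2, 1.5, 1.9, 2.2]
nut_length = [2.2, 3.9, 2.8, 4.5, 3.1, 3.6]

corr = correlation_matrix([nut_weight, kernel_weight, nut_length])
eigenvalue, loadings = leading_component(corr)
# The trace of a correlation matrix equals the number of traits,
# so eigenvalue / number of traits is the share of variance explained.
share = 100.0 * eigenvalue / len(corr)
print(f"PC1 explains {share:.1f} % of the variance; loadings: {loadings}")
```

The loadings show how strongly each original trait contributes to the composite variable, which is the sense in which a few components can stand in for many correlated traits.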

Iran, with a total land area of 1,648,195 square kilometers, lies between 25° and 39°N latitude and 44° and 63°E longitude and is primarily subtropical in the southern half of the country, temperate in the northern half, and mostly desert in the middle. This variability in environment and climate has produced an extensive diversity of plant germplasm (Ghahreman and Attar 1999). Almond production in Iran is based on locally adapted clones, with minimal to no inputs and traditional management. The objectives of this study were to describe the variability in 90 selected almond genotypes, determine the correlations among traits, identify the most useful variables for discriminating among genotypes, and detect relationships among genotypes. Furthermore, economically valuable traits were evaluated to identify useful genotypes for almond producers and breeding programs.

Materials and methods

Plant material

This study was carried out on 90 almond genotypes, with five trees per genotype as replicates. Genotypes were selected from a collection in Ashtian city, Markazi province, Iran, at 34°31′30″N latitude, 50°00′10″E longitude and 2,120 m above sea level. The city has an annual average temperature of 12.80 °C and an annual average precipitation of 350 mm. The genotypes had initially been selected from the main almond-producing areas of central Iran on the basis of regular fruit production and observed phenotypic diversity, mainly in relevant morphological traits of the tree and nuts as well as phenology, and were then planted in this collection. At the time of assessment, trees were mature (10 years old), healthy and fruit producing.

Morphological evaluation

Morphological and pomological characterization of each genotype was performed according to the guidelines provided by the International Plant Genetic Resources Institute (IPGRI) (Gulcan 1985). This manual lists the agronomic and biological characters to be recorded for almond trees. In this work, 60 phenotypic and agronomic characteristics, comprising 32 quantitative and 28 qualitative traits, were used to assess the range of morphological variation among the genotypes during two growing seasons (Table 1). Measurements of nut and kernel traits were based on 20 replicates, and the mean values were used. Variables such as leaf size (length and width) and fruit size (length, width and thickness) were measured with a digital caliper. Fruit weight was measured with an electronic balance with 0.01 g precision. Traits such as tree habit, tree vigor, leaf shape, fruit shape, fruit color and kernel taste were determined by rating and coding according to the almond descriptor (Table 2). Agronomic traits evaluated included flowering date (evaluated from March to April as relatively early, middle, late and very late within this period), pollen-pistil compatibility (evaluated by bagging flowers and classifying the resultant fruit set as self-compatible, i.e. fruit set comparable to nearby open-pollinated branches, or self-incompatible, i.e. no to very low fruit set), and ripening date (evaluated from the end of August to September as early, middle, late and very late within this period). The Tardy-Nonpareil cultivar (late-flowering) was used as a control for flowering time.

Table 1 Descriptive statistics for morphological characters for the studied almond collection
Table 2 Codes and states of the qualitative variables for the studied almond collection

The data from the two study years (2012 and 2013) were combined, and the average values were used for statistical analysis. The following parameters were calculated for the quantitative variables: mean, minimum value, maximum value, and coefficient of variation (CV %), the latter as an indicator of variability. One-way analysis of variance (ANOVA) was performed for all morphological traits using SAS software (SAS Inst. 1990). Correlations between traits were determined using Spearman correlation coefficients in SPSS software. Relationships among genotypes were investigated by principal component analysis (PCA) using SPSS statistics software. Mean values were used to create a correlation matrix from which standardized principal component (PC) scores were extracted. To avoid effects due to scaling differences, the mean of each character was standardized to Z scores prior to cluster analysis. Thereafter, Euclidean distance coefficients for pairs of entries (i.e. genotypes) were computed using the NTSYS-pc (Numerical Taxonomy and Multivariate Analysis for personal computer) software program version 2.00 (Rohlf 2000). To better understand the patterns of variation among genotypes, the distance matrix generated from the morphological data was used as input for cluster analysis based on the unweighted pair-group method of arithmetic average (UPGMA). In addition, a scatter plot of PC1 against PC2 was created using PAST statistics software (Hammer et al. 2001).
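The standardization, distance and clustering steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the software routine used in the study: the function names, the genotype labels G1 to G4 and the trait values are hypothetical.

```python
import math

def zscore_columns(rows):
    """Standardize each trait (column) to mean 0 and sample SD 1."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (len(col) - 1))
        stats.append((mean, sd))
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]

def euclidean(a, b):
    """Euclidean distance between two standardized trait vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def upgma(names, rows):
    """Average-linkage clustering; returns [(cluster_a, cluster_b, distance), ...]."""
    z = zscore_columns(rows)
    clusters = {i: (names[i],) for i in range(len(names))}
    dist = {(i, j): euclidean(z[i], z[j])
            for i in clusters for j in clusters if i < j}
    merges = []
    next_id = len(names)
    while len(clusters) > 1:
        # Join the two closest clusters
        (a, b), d = min(dist.items(), key=lambda kv: kv[1])
        merges.append((clusters[a], clusters[b], d))
        na, nb = len(clusters[a]), len(clusters[b])
        new_dist = {}
        for c in clusters:
            if c in (a, b):
                continue
            d_ac = dist[(min(a, c), max(a, c))]
            d_bc = dist[(min(b, c), max(b, c))]
            # UPGMA update: size-weighted mean of the two old distances
            new_dist[(c, next_id)] = (na * d_ac + nb * d_bc) / (na + nb)
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        dist.update(new_dist)
        merged = clusters[a] + clusters[b]
        del clusters[a], clusters[b]
        clusters[next_id] = merged
        next_id += 1
    return merges

# Hypothetical nut weight (g) and kernel weight (g) for four genotypes
names = ["G1", "G2", "G3", "G4"]
rows = [[2.0, 0.8], [2.2, 0.9], [6.8, 2.0], [7.4, 2.3]]
merges = upgma(names, rows)
for left, right, d in merges:
    print(left, right, round(d, 3))
```

The merge history is the information a dendrogram draws: each row joins two groups at the height given by their average pairwise distance.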

Results and discussion

Genotype characterization

Native almond germplasm in Iran is diverse. All traits differed significantly among the studied almonds according to the ANOVAs (P ≤ 0.01), indicating a high level of morphological variation (data not shown). This was confirmed by the relatively high coefficient of variation (CV) values obtained for the majority of the examined traits. In general, the highest levels of variation were found for kernel taste (CV = 82 %), suture opening of the shell (CV = 76 %), location of flower bud (CV = 67 %), percentage of double kernels (CV = 65 %) and kernel shriveling (CV = 65 %), whereas shell retention (CV = 7 %), green fruit length/width (CV = 9 %), green fruit width (CV = 10 %), kernel length (CV = 10 %), green fruit length (CV = 10 %), and kernel width (CV = 10 %) showed the smallest differences (Table 1). Zeinalabedini et al. (2012) reported a range of 5.85–76.06 % for the coefficient of variation of morphological traits in their almond germplasm. The coefficient of variation (CV %) does not depend on the unit of measure and is thus well suited to comparing variability across the traits examined. The CV value may be an indicator of the ability to distinguish between genotypes based on morphology. Morphological characteristics with a low CV are more homogeneous and repeatable among genotypes, while descriptors with high CV values are more discriminating and can be reliable markers for the characterization of genotypes. The studied genotypes include suitable candidates for breeding programs with respect to our objectives: extension of the blooming season, moderate vigor and fruit quality.
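The unit independence of the CV can be seen in a short sketch. The values below are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which form was used.

```python
import statistics

def cv_percent(values):
    """Coefficient of variation: sample standard deviation as a percentage of the mean."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)

# Hypothetical kernel lengths for five genotypes (not data from the study)
kernel_length_cm = [1.65, 2.10, 2.40, 2.75, 3.01]
kernel_length_mm = [10.0 * x for x in kernel_length_cm]

cv_cm = cv_percent(kernel_length_cm)
cv_mm = cv_percent(kernel_length_mm)
print(f"CV = {cv_cm:.1f} % in cm and {cv_mm:.1f} % in mm")
```

Because both the standard deviation and the mean scale by the same factor, the ratio is unchanged, which is why CV values of traits measured in grams, centimeters or rating codes can be set side by side.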

Vegetative tree habit

The genotypes investigated showed large differences in tree vigor, which ranged from weak through intermediate to strong; most genotypes showed medium vigor. Tree habit was drooping, spreading, spreading to upright, upright or extremely upright (Table 2); the dominant tree habit was spreading, in agreement with previous results (Colic et al. 2012; Zeinalabedini et al. 2012). Lateral branching, branch density, branch flexibility, and canopy density varied from low to high.

Leaves

Leaf density ranged from low to high. Four types of leaves were distinguished: flat, ovate, narrow elliptic, and elliptic. Leaf-base shape was predominantly obtuse, followed by acute. Leaf apex shape was acute or oblate (Table 2). The highest values for shoot leaf length and width were 8.56 and 2.88 cm, while the lowest were 3.90 and 1.38 cm, respectively. Spur leaf length ranged from 3.70 to 7.54 cm, while spur leaf width varied from 1.08 to 2.24 cm. Petiole length ranged from 0.91 to 3.39 cm (Table 1).

Bloom time and flower characteristics

Full blooming dates (50 % of flowers completely open) were recorded for all genotypes, and average values from the 2 years were used. The genotypes differed considerably in blooming time: flowering time varied from March 19 to April 8. Genotypes were then divided into six groups based on flowering date (early, middle, middle-late, late, late-very late and very late within this period), indicating differences in the chilling requirements of the studied genotypes, in agreement with the findings of Zeinalabedini et al. (2012). Most of the genotypes were middle-late flowering. Genotype Badam40 was the earliest to bloom, flowering on March 19. The latest genotype to bloom was Badam19, which was very late flowering (April 8), two days after Tardy-Nonpareil, a very late flowering cultivar (Sorkheh et al. 2009). Also, genotypes Badam80 and Badam82 were late to very late flowering. Frost resistance is a major breeding goal in many production areas owing to cultivated almond’s very early flowering time during late winter and early spring. Using genotypes with a late to very late flowering date (Badam19, Badam13 and Badam14) as cultivars in orchards, or to develop new cultivars with delayed flowering, would reduce frost damage. Disease damage is also reduced because their flowering is delayed beyond the rainy season, and delayed flowering allows more efficient use of increasingly scarce insect pollinators (Rickter 1972). In most genotypes, the flower buds were mainly on spurs, whereas distribution was mixed in some genotypes, in agreement with the findings of Colic et al. (2012) and Zeinalabedini et al. (2012). Genotypes with light pink petals predominated, with few having white or pink petals. Most of the genotypes had intermediate flower size. In terms of blooming density, the genotypes were classified into four categories; most of the genotypes had high density.

Furthermore, all of the studied genotypes were found to be self-incompatible and to need a pollinizer. Gametophytic incompatibility prevents self-fertilization, favors cross-pollination, and maintains genetic variability within seedling populations (Arulsekar et al. 1986). This trait, although negative from an agronomic point of view (Socias i Company 1992), would have contributed to the phenotypic and genotypic variability that likely ensured the wide distribution and adaptation of almond.

Ripening time and fruit characteristics

Harvest date (commercial ripening stage) was recorded for all genotypes at the experimental site. Ripening date varied widely among the studied genotypes. The earliest harvest time was found in Badam69 and Badam80 (August 7), followed by Badam45 and Badam58 (August 9), while the latest harvest time (September 9–11) was recorded for Badam28, Badam73, Badam57, Badam35 and Badam15. A harvest season about 1 month later than the earliest genotypes is an important feature of these five genotypes and makes them promising for future almond breeding. The ripening date for most almond genotypes fell between September 1 and 8. Fruit yield varied from very low to very high and differed greatly between genotypes. The highest yields were found for the six genotypes Badam13, Badam14, Badam62, Badam63, Badam80 and Badam82.

Oblong nut shape, intermediate shell color intensity, intermediate marking of the outer shell, and semi-hard shell were dominant in most genotypes. Nut weight varied from 1.94 to 7.74 g. Colic et al. (2012) reported a nut weight range of 2.53 to 6.00 g for almond. Nut length ranged from 2.24 to 4.52 cm and nut width varied from 1.72 to 3.62 cm. Kernel length ranged from 1.65 to 3.01 cm and kernel width varied from 1.11 to 1.92 cm, while kernel weight ranged between 0.72 and 2.24 g (Table 1). Colic et al. (2012) recorded 0.62 to 1.29 g for kernel weight. Kernel weight/shell weight ratio was 0.19 to 1.14. For another Iranian collection, Zeinalabedini et al. (2012) recorded 2.20–4.20 cm for nut length, 1.60–3.00 cm for nut width, 1.00–7.00 g for nut weight, 1.00–3.00 cm for kernel length, 1.00–1.70 cm for kernel width, and 0.50–2.30 g for kernel weight.

Kernel shape was flat to oblong, although oblong was dominant in most genotypes, in accordance with the results of Zeinalabedini et al. (2012). Slightly wrinkled kernels were dominant, in agreement with the results of Zeinalabedini et al. (2012) and Colic et al. (2012). Kernel pubescence ranged from low to high, although it was moderate in most genotypes. Kernel taste was predominantly sweet, with fewer genotypes being intermediate or slightly bitter. The genotypes differed in the frequency of double kernel occurrence. Percentage of double kernels ranged from 0.00 (in Badam9, Badam10, Badam11, Badam15, Badam61, Badam78 and Badam88) to 100 % (Badam46), with an average of 38.88 %. The occurrence of double pistils is markedly increased when the trees are exposed to high temperatures throughout the period of flower differentiation (Beppu et al. 2001). Kernel color, an important characteristic from a commercial viewpoint, varied from light to dark. Agronomic characters and the most important fruit traits for superior genotypes are presented in Table 3.

Table 3 Flowering date, harvest date, and most important fruit traits for superior almond genotypes in this investigation

Correlations among variables

Strong correlations were observed among most of the studied traits (S1). Tree canopy was positively correlated with tree habit (r = 0.25), tree vigor (r = 0.24), branching habit (r = 0.25), branch density (r = 0.28), trunk diameter (r = 0.25) and leaf density (r = 0.44), which corresponds with previous results (Talhouk et al. 2000; Nikoumanesha et al. 2011; Sorkheh et al. 2009). Significant positive correlation coefficients were determined between shoot leaf length and shoot leaf width (r = 0.59) and between spur leaf length and spur leaf width (r = 0.38), which corresponds with previous results (Talhouk et al. 2000; Sorkheh et al. 2009). The full-bloom season was positively correlated with ripening date (r = 0.43), nut weight (r = 0.37) and kernel weight (r = 0.31), in agreement with findings of Zeinalabedini et al. (2012). This could be interpreted as the tendency of late-bearer cultivars for having heavier nuts and kernels. In addition to the high correlations among nut traits (nut length, nut width, nut thickness, and nut weight), kernel traits including kernel length, kernel width, kernel thickness, and kernel weight, were also correlated with each other and with nut traits. For instance, a positive correlation was observed between nut weight and kernel weight (r = 0.53), which is in accordance with previous results (Ledbetter and Shonnard 1992; Talhouk et al. 2000; Sorkheh et al. 2010; Zeinalabedini et al. 2012). On this basis it can be concluded that these characters have a similar effect on determining cultivar cropping potential and also germplasm characterization. In addition, softness of shell was negatively correlated with nut weight (r = −0.57), kernel weight (r = −0.52), and kernel/shell weight ratio (r = −0.65) and agreed with previous results (Sanchez-Perez et al. 2007; Zeinalabedini et al. 2012). 
This result indicates that a softer shell is accompanied by a higher kernel-to-shell ratio and that soft-shell cultivars tend to bear relatively larger kernels (i.e. more kernel than shell). Also, kernel pubescence showed positive correlations with kernel shriveling (r = 0.40) and softness of shell (r = 0.45). A close relationship between traits could facilitate or hinder gene introgression, since strong selection for a desirable trait could favor the presence of another desirable trait from this population. Also, established relationships between some traits can help breeders in setting goals for parental partner selection and breeding (Dicenta and Garcia 1992).
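For readers unfamiliar with the Spearman coefficient used for these correlations, it is the Pearson correlation computed on ranks. The sketch below is a plain Python illustration with hypothetical values and function names; it is not the SPSS routine used in the study.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical nut and kernel weights (g) for six genotypes (not data from the study)
nut_weight = [1.9, 3.2, 4.1, 5.0, 6.3, 7.7]
kernel_weight = [0.7, 1.2, 1.0, 1.5, 2.2, 1.9]
rho = spearman(nut_weight, kernel_weight)
print(f"Spearman rho = {rho:.2f}")
```

Working on ranks makes the coefficient suitable for the coded qualitative traits as well as the measured ones, since only the ordering of genotypes matters.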

Principal component analysis (PCA)

Principal component analysis (PCA) proved useful for determining the most significant variables, as well as the stability of characteristics. For each factor, loading values above 0.54 were considered significant. Seventeen components explained 77 % of the total variance. The first three PCs accounted for 33.78 % of the variation (Table 4), indicating that the attributes loading on them have the highest variation between the genotypes and had the greatest impact on separation of the genotypes (Iezzoni and Pritts 1991). Green fruit width, ease of harvesting, shell weight, nut width, nut weight and kernel width had positive loadings, and full bloom date, softness of shell and kernel weight/shell weight ratio had negative loadings, in PC1 (14.28 % of the variability). Morphological characteristics related to green fruit length, green fruit length/width ratio, nut shape, nut length, nut length/width ratio, nut length/width/thickness ratio, kernel length, kernel length/width ratio and kernel shape were influential in PC2 (9.84 %). In PC3 (9.67 %), five traits had a stronger positive influence: green fruit thickness, kernel thickness, kernel weight, number of pistils and percentage of double kernels. Zeinalabedini et al. (2012) reported that nut and kernel weights had the highest loadings in the first two components. Furthermore, Lansari et al. (1994), Talhouk et al. (2000), and Sorkheh et al. (2009), who used a similar analysis to compare kernel, nut, and leaf characters in different almond collections, found that the variables contributing to nut and kernel size were more important than leaf traits. This indicates that these traits are useful not only for assessment of diversity but also for characterization of almond germplasm. The remaining components (PC4–PC17) explained less variability (43.22 % of total variance) and included other variables.
The aim of principal component analysis is to determine the number of main factors and thereby reduce the number of parameters needed to discriminate genotypes. In addition, associations between traits emphasized by this method may correspond to genetic linkage between loci controlling traits or to a pleiotropic effect (Iezzoni and Pritts 1991). Previously, PCA has been used to establish genetic relationships among cultivars and genotypes, to study correlations among tree traits and to evaluate germplasm of different Prunus species (Ruiz and Egea 2008; Sorkheh et al. 2009; Nikolic et al. 2010; Khadivi-Khub et al. 2012, 2013).
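The variance percentages quoted in this section can be cross-checked with simple bookkeeping. The sketch below uses only the figures given in the text; the variable names are illustrative. Small differences in the last digit are to be expected when percentages that were already rounded are added together.

```python
# Percentages of total variance as quoted in the text
pc_share = {"PC1": 14.28, "PC2": 9.84, "PC3": 9.67}
total_17_components = 77.0
reported_first_three = 33.78

first_two = pc_share["PC1"] + pc_share["PC2"]
first_three = sum(pc_share.values())
# Share left for PC4 to PC17, using the reported cumulative figure
remaining = total_17_components - reported_first_three

print(f"PC1 + PC2       = {first_two:.2f} %")
print(f"PC1 + PC2 + PC3 = {first_three:.2f} % (reported: {reported_first_three:.2f} %)")
print(f"PC4 to PC17     = {remaining:.2f} %")
```

The remainder of 43.22 % is spread over 14 components, so each of them carries on average far less variance than any of the first three.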

Table 4 Eigenvectors for the main variables for the first three principal component axes from PCA analysis in the studied almond collection

Cluster analysis and scatter plot

Morphological analysis based on different characters showed high polymorphism among the 90 studied almond genotypes. UPGMA cluster analysis using Euclidean distance coefficients was used to divide the genotypes into groups of increasing dissimilarity. The dendrogram identified two major clusters, containing 38 and 52 genotypes, respectively (Fig. 1). The first major cluster was divided into two subclusters. Genotypes Badam1, Badam43, Badam73, Badam52, Badam85, Badam37, Badam46 and Badam67 were placed in subcluster I, characterized by flower buds located on 1-year-old shoots, high fruit length, a very high percentage of double kernels and no shell retention, while the remaining 30 genotypes were placed in subcluster II. The second cluster included 52 genotypes that were likewise divided into two subclusters.

Fig. 1 Unweighted pair-group method of arithmetic average (UPGMA) dendrogram of the studied almond genotypes using Euclidean distance coefficient matrix, based on morphopomological data

Furthermore, a scatter plot of PC1 against PC2 was prepared to reflect relationships among genotypes in terms of phenotypic resemblance and morphological characteristics. The results supported the findings of the cluster analysis, with genotypes distributed into two groups (Fig. 2). Proceeding from negative to positive values of PC1 (left to right) in the scatter plot, the studied genotypes showed a gradual increase in green fruit width, ease of harvesting, shell weight, nut width, nut weight and kernel width. Proceeding from negative to positive values of PC2 (bottom to top), the studied genotypes showed a decrease in green fruit length, nut length and kernel length. The characters adopted in this study could be used to establish a catalog of local almond genotypes.

Fig. 2 Scatter plot for the first two principal components (PCs) for the studied almond genotypes based on morphopomological characters. Numbers in the plot represent genotypes

The UPGMA dendrogram and the scatter plot confirmed the high variation between genotypes, indicating that the studied germplasm is a good candidate gene pool for breeding programs. Discrepancies between the dendrogram and the scatter plot can be explained by the amount of variability considered in each analysis. The UPGMA cluster analysis was based on all morphological data and took into account the whole variability, while the cumulative variance explained by the first two principal components, on which the scatter plot was based, was relatively low (24.11 %). Yet the relative concordance between the results of PCA and cluster analysis showed that morphological analysis can provide reliable information on the variability in almond trees. In correspondence with our findings, other authors (Ledbetter and Shonnard 1992; Talhouk et al. 2000; Sorkheh et al. 2010; Zeinalabedini et al. 2012) showed that morphological evaluation is an efficient tool for characterization of almond germplasm and for species distinction. The overall analysis of all traits illustrates a wide diversity that may have important implications for management of the genetic resources.

Conclusion

Almond cultivation in Iran has a long history, and many genotypes grow in different regions of the country. These genotypes span a vast range of diversity in many traits. The present study was undertaken to determine morphopomological and phenological traits of almond germplasm from central Iran. The germplasm showed wide variation in full bloom date, harvest season, fruit yield, nut weight and kernel weight. One of the important findings of this study is that, among the 90 genotypes studied, 27 had high to very high yield, and most of these had a kernel weight of 1.50 g or above. This shows that most of the genotypes possessed a relatively good kernel weight. The highest kernel weights were found in genotypes Badam46 (2.24 g), Badam24 (2.13 g) and Badam43 (2.05 g), which represented the maximum values for this character. Thus, they can be considered unique and the most promising for breeding or commercial growing. Also, the late-flowering genotypes Badam19, Badam13 and Badam14 are less threatened by spring frost and so can be used in breeding for later flowering. Furthermore, with regard to later ripening time, genotypes Badam28, Badam73, Badam57, Badam35 and Badam15 can be considered unique and the most promising for breeding or commercial growing. A high correlation was found among some almond quality attributes, which could reduce the number of morphopomological traits that need to be studied in breeding programs and orchard management. In addition, principal component analysis (PCA) made it possible to establish groups of similar genotypes according to their quality characteristics, as well as to study relationships among morphopomological traits.