Introduction

Madhuca latifolia Macb. (syn. M. indica, Bassia latifolia; Sapotaceae), commonly known as the Indian butter tree, is a large, highly branched, deciduous tree. Indigenous to the Indian subcontinent and grown predominantly in dry tropical and sub-tropical regions of the continent, it is common in deciduous forests and dry sal plain forests in scattered distribution. It is planted on a large scale and extensively cultivated in northern India and the Deccan peninsula for its sweet fleshy corollas and ripe fruits, which are used as a major source of industrial alcohol as well as liquor. The seeds contain valuable edible oil, ranging from 38 to 57 % of seed weight, that is used by forest-dwelling tribal and local people for cooking and as medicine (rheumatism). The fatty oil is known in commerce as mahua butter, mahua fat, Illeppe butter or Bassia fat. Historically, mahua has been the single largest indigenous source of natural hard fat in soap manufacture by both the small-scale and organized sectors. Seed cake is used as organic manure and fish poison, and is exported as an effective wormicide to control earthworms in turf, lawns and golf courses. Mahua has several medicinal uses. Seed paste is used to cure muscle fatigue and improve vigour of skin. Bark essence is used in curing bleeding gums and ulcers. In addition to these benefits, mahua seed oil, because it has properties similar to those of diesel fuel, has gained importance as bio-diesel and is fast emerging as a viable alternative to fossil fuel.

The successful adoption of bio-fuels is reliant on the supply of feedstock from non-food crops with the capacity to grow on marginal land that is not destined for the cultivation of food crops (Hill et al. 2006). M. latifolia meets both of these criteria and is therefore a potential candidate to contribute significant amounts of fuel feedstock. Thus, it can form the basis of a highly promising, profitable, and self-sustaining platform for small-scale entrepreneurship and self-employment in rural areas, ensuring optimum utilization of wasteland resources and unemployed manpower. Although M. latifolia is well known as an oil-yielding tree with wide adaptability and many uses, little effort has been directed to improve it as a crop plant because of its long gestation period and slow growth. Even though M. latifolia supports the livelihood of tribal populations in India, no information is available regarding the total number of mahua trees, seed yield per tree and actual production. Hence, it is important to screen the naturally available M. latifolia genetic resources and to select the best planting material with high oil content for higher productivity. The selection of superior trees based on seed morphology and oil content may have greater impact than conventional breeding. No information is available on geographical variation and its influence on seed quality and quantity in mahua. Mahua is a backbone of the livelihood of tribal communities in Jharkhand. Genetic improvement through selection, evaluation and breeding is expected to benefit tribal socio-economics and improve livelihoods. For these reasons, we investigated genetic variation, association and divergence in seed traits and oil content among 23 accessions of M. latifolia collected from Jharkhand, India.

Materials and methods

Plus/superior tree selection

A reconnaissance survey was conducted to identify high-yielding candidate plus trees (CPTs) of M. latifolia at the fruiting stage from different predominantly naturalized locations in Jharkhand, India (Table 1; Fig. 1). We selected sample trees using a single tree selection method based on phenotypic assessment of characters of economic interest, such as yield potential, crown spread, total height, diameter at breast height (dbh), age of the tree, and apparent resistance to pests and disease. We selected a total of 23 CPTs at latitudes from 22° to 24°50′N and longitudes from 83°30′ to 87°E. The latitude and longitude of each CPT were recorded using Microsoft Encarta (2005 edition) and are listed in Table 1.

Table 1 Locational details of Madhuca latifolia Candidate Plus Trees (CPTs) selected in Jharkhand, India
Fig. 1 Distribution of Madhuca latifolia candidate plus trees in Jharkhand, India, mapped using Microsoft Encarta 2005. Details of the numbering are given in Table 1

Seed collection and recording of observations

A few kg of mature seeds were collected following a random sampling procedure from all four directions of the crown of each sampled tree during June-July 2005. The seeds collected from each tree were randomly divided into three replications, each consisting of 100 seeds, for measurement of morphometric traits, viz. seed length, seed breadth, aspect ratio, 2D surface area, seed perimeter, equivalent diameter and roundness, using an Image Analyzer (Leica). For each replication, seeds were spread on the glass platform of a macro-viewer, and images were captured using a charge-coupled device (CCD) camera and uploaded into the software Quantimet 500+ (Qwin). Qwin identified the objects based on our specification for seed colour and calibrated the captured images to the actual scale. The 2D measurements of the detected objects were defined as follows:

  a. Seed length: length of the object along its longest side, in mm

  b. Seed breadth: length of the object along its shortest side, in mm

  c. Aspect ratio: ratio of length divided by breadth

  d. 2D surface area: 2D surface area of the seed in the direction of measurement, in mm²

  e. Seed perimeter: total length of the boundary of the feature

  f. Equivalent diameter: diameter of the seeds measured in different directions and the mean, in cm

  g. Roundness: a shape factor that gives a minimum value of unity for a circle. This is calculated from the ratio of perimeter squared to area (Mahadevan et al. 1999)

$$ \text{Roundness} = \frac{\text{Perimeter}^{2}}{4 \times \pi \times \text{2D area} \times 1.064} $$
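The two derived shape descriptors (aspect ratio and roundness) can be illustrated with a minimal Python sketch. The function names and the test outline (an ideal circle of radius 10 mm) are hypothetical and are not part of the Qwin software. We assume here that the factor 1.064 compensates for the digitized (pixelated) boundary, so an ideal continuous circle evaluates to 1/1.064 rather than exactly 1.

```python
import math


def aspect_ratio(length_mm, breadth_mm):
    """Seed length divided by seed breadth (dimensionless)."""
    return length_mm / breadth_mm


def roundness(perimeter_mm, area_mm2, correction=1.064):
    """Perimeter squared over (4 * pi * 2D area * correction factor)."""
    return perimeter_mm ** 2 / (4.0 * math.pi * area_mm2 * correction)


# Hypothetical outline: an ideal circle of radius 10 mm.
radius = 10.0
circle_roundness = roundness(2.0 * math.pi * radius, math.pi * radius ** 2)

# Hypothetical elongated seed, 30 mm long and 15 mm broad.
example_aspect = aspect_ratio(30.0, 15.0)
```

Any outline that is less compact than a circle has a longer perimeter for the same area and therefore a larger roundness value.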

Besides these image parameters, the following estimates were also recorded for each genotype.

  h. 100-seed weight: weight of 100 pure seeds, in g

  i. Total oil content: estimated following the procedure of Sadasivam and Manickam (1992)

  j. Acid value: estimated following the procedure of Cox and Pearson (1962)

  k. Iodine number: estimated following the procedure of Horowitz (1975)

Data analysis

Best linear unbiased predictors (BLUPs) were obtained for each trait. BLUPs were subjected to a significance test at 5 % critical difference to quantify differences between CPTs. The seed traits of the 23 genotypes of M. latifolia were also analysed using analysis of variance (ANOVA) to assess the significance of differences between genotypes (Gomez and Gomez 1984). The phenotypic variation for each trait was partitioned into components due to genetic (hereditary) and non-genetic (environmental) factors and estimated using the following formula (Johanson et al. 1955):

$$ V_{p} = \frac{MSG}{r};\quad V_{g} = \frac{MSG - MSE}{r};\quad V_{e} = MSE $$
(1)

where, MSG, MSE and r are the mean squares of CPTs, mean squares of error and number of replications, respectively.
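As an illustration of Eq. 1, the following Python sketch derives MSG and MSE from a balanced data set and converts them into variance components. It assumes a one-way (completely randomized) layout with the same number of replications per tree; the layout, the function name and the three-tree data set are hypothetical and are not the measurements of this study.

```python
def variance_components(data):
    """data maps genotype name -> list of r replicate values (balanced)."""
    groups = list(data.values())
    g = len(groups)          # number of genotypes (CPTs)
    r = len(groups[0])       # number of replications
    grand_mean = sum(sum(values) for values in groups) / (g * r)
    means = [sum(values) / r for values in groups]
    # Mean square among genotypes (MSG) and mean square of error (MSE).
    msg = r * sum((m - grand_mean) ** 2 for m in means) / (g - 1)
    mse = sum((x - m) ** 2
              for values, m in zip(groups, means)
              for x in values) / (g * (r - 1))
    return {
        "MSG": msg,
        "MSE": mse,
        "Vp": msg / r,           # phenotypic variance, Eq. 1
        "Vg": (msg - mse) / r,   # genotypic variance, Eq. 1
        "Ve": mse,               # error variance, Eq. 1
        "mean": grand_mean,
    }


# Hypothetical trial: three trees, three replications each.
trial = {
    "tree_A": [10.0, 11.0, 12.0],
    "tree_B": [14.0, 15.0, 16.0],
    "tree_C": [18.0, 19.0, 20.0],
}
vc = variance_components(trial)
```

For this toy data set MSG = 48 and MSE = 1, so Vp = 16 and Vg = 47/3.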

The phenotypic variance (Vp) is the total variance among phenotypes when grown over the range of environments of interest. The genotypic variance (Vg) is the part of the phenotypic variance that can be attributed to genotypic differences among the phenotypes. The error variance (Ve) is the portion of the phenotypic variance attributed to environmental effects. To be able to compare the variation among traits, phenotypic coefficients of variation (PCV) and genotypic coefficients of variation (GCV) were computed according to the method suggested by Burton (1952):

$$ PCV = \frac{\sqrt{V_{p}}}{X} \times 100;\quad GCV = \frac{\sqrt{V_{g}}}{X} \times 100 $$
(2)

where Vp, Vg and X are the phenotypic variance, genotypic variance and grand mean for each seed trait, respectively.
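A short numerical sketch of the two coefficients follows. It takes the coefficient of variation in its usual sense, i.e. the square root of the variance component divided by the grand mean and expressed in percent; the input values (Vp = 16, Vg = 9, grand mean = 40) are hypothetical.

```python
import math


def coefficient_of_variation(variance, grand_mean):
    """Square root of a variance component as a percentage of the mean."""
    return math.sqrt(variance) / grand_mean * 100.0


# Hypothetical variance components and grand mean for one trait.
vp, vg, grand_mean = 16.0, 9.0, 40.0
pcv = coefficient_of_variation(vp, grand_mean)   # 4 / 40 * 100
gcv = coefficient_of_variation(vg, grand_mean)   # 3 / 40 * 100
```

Because Vg cannot exceed Vp under Eq. 1 when MSE is non-negative, GCV is never larger than PCV; a small gap between the two indicates a small environmental contribution.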

Broad sense heritability (h2b) was calculated according to Allard (1999) as the ratio of the genotypic variance (Vg) to the phenotypic variance (Vp). Genetic advance (GA) expected and GA as a percentage of the mean assuming selection of the superior 5 % of the genotypes were estimated following Johanson et al. (1955) as below:

$$ GA = K \cdot h^{2}_{b} \cdot \sqrt{V_{p}};\quad \text{Genetic gain} = \frac{GA}{X} \times 100 $$
(3)

where K is the selection differential (2.06 for selecting 5 % of the genotypes).
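The chain from variance components to expected gain can be sketched as follows. The function names and the input values (Vg = 9, Vp = 16, grand mean = 40) are hypothetical; only K = 2.06 is taken from the text.

```python
import math


def broad_sense_heritability(vg, vp):
    """Ratio of genotypic to phenotypic variance."""
    return vg / vp


def genetic_advance(vg, vp, k=2.06):
    """GA = K * h2b * sqrt(Vp), with K = 2.06 for 5 % selection."""
    return k * broad_sense_heritability(vg, vp) * math.sqrt(vp)


def genetic_gain(ga, grand_mean):
    """Genetic advance as a percentage of the grand mean."""
    return ga / grand_mean * 100.0


# Hypothetical variance components and grand mean for one trait.
vg, vp, grand_mean = 9.0, 16.0, 40.0
h2b = broad_sense_heritability(vg, vp)   # 0.5625
ga = genetic_advance(vg, vp)             # 2.06 * 0.5625 * 4
gain = genetic_gain(ga, grand_mean)
```

With these inputs, GA is 4.635 in the units of the trait and the genetic gain is about 11.6 % of the mean.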

Phenotypic (rp) and genotypic (rg) correlations were further computed to examine inter-character relationships among seed and seedling traits following Goulden (1952) as:

$$ r_{p} = \frac{\text{Cov}_{p}(x_{1}, x_{2})}{\left[ V_{p}(x_{1}) \cdot V_{p}(x_{2}) \right]^{1/2}} $$
(4)
$$ r_{g} = \frac{\text{Cov}_{g}(x_{1}, x_{2})}{\left[ V_{g}(x_{1}) \cdot V_{g}(x_{2}) \right]^{1/2}} $$
(5)

where Covp and Covg are phenotypic and genotypic covariances for any two traits x1 and x2, respectively, and Vp and Vg are the respective phenotypic and genotypic variances for those traits.
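The sketch below shows one way Eqs. 4 and 5 could be evaluated for two traits measured on the same trees. It assumes that the covariance components are obtained from mean cross products in the same way that Eq. 1 obtains variance components from mean squares (Covp = MPG/r, Covg = (MPG − MPE)/r); this analogy, the function names and the toy data are assumptions for illustration only.

```python
import math


def mean_products(x, y):
    """Between- and within-genotype mean cross products (balanced data).

    x and y are lists with one inner list of r replicate values per genotype.
    Passing the same trait twice returns its mean squares (MSG, MSE).
    """
    g, r = len(x), len(x[0])
    grand_x = sum(map(sum, x)) / (g * r)
    grand_y = sum(map(sum, y)) / (g * r)
    mean_x = [sum(v) / r for v in x]
    mean_y = [sum(v) / r for v in y]
    between = r * sum((a - grand_x) * (b - grand_y)
                      for a, b in zip(mean_x, mean_y)) / (g - 1)
    within = sum((xi - a) * (yi - b)
                 for vx, vy, a, b in zip(x, y, mean_x, mean_y)
                 for xi, yi in zip(vx, vy)) / (g * (r - 1))
    return between, within, r


def correlations(x, y):
    """Return (r_p, r_g) following Eqs. 4 and 5."""
    bxy, wxy, r = mean_products(x, y)
    bxx, wxx, _ = mean_products(x, x)
    byy, wyy, _ = mean_products(y, y)
    r_p = (bxy / r) / math.sqrt((bxx / r) * (byy / r))
    # Fails with a math domain error if a genotypic variance is negative.
    r_g = ((bxy - wxy) / r) / math.sqrt(((bxx - wxx) / r) * ((byy - wyy) / r))
    return r_p, r_g


# Hypothetical traits: the second is an exact linear function of the first.
trait_1 = [[10.0, 11.0, 12.0], [14.0, 15.0, 16.0], [18.0, 19.0, 20.0]]
trait_2 = [[2.0 * v + 1.0 for v in reps] for reps in trait_1]
r_p, r_g = correlations(trait_1, trait_2)
```

Because the second toy trait is an exact linear function of the first, both coefficients equal 1; real traits give values between −1 and 1, and the two coefficients generally differ.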

The mean observations for all traits for each season were standardized by subtracting the mean value of the character from each observation and dividing the result by the respective standard deviation. These standardized values, with mean 0 and standard deviation 1, were used for principal component analysis (PCA) in Genstat 10 to quantify the importance of different traits in explaining multivariate polymorphism. Cluster analysis was performed on the scores of the first three principal components (PCs) following Ward (1963). Mean, range and variance were computed for each trait and cluster. Means of clusters were compared using the Newman-Keuls procedure (Newman 1939; Keuls 1952). The homogeneity of variances among the clusters was tested using Levene’s test (Levene 1960).
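The standardization step and the share of variation explained by the first PC can be illustrated with the following standard-library sketch. It is not the Genstat procedure: the leading eigenvalue of the correlation matrix is approximated by power iteration (a substitute technique chosen for brevity), Ward clustering is not shown, and the four-tree, three-trait data set is hypothetical.

```python
import math


def standardize(rows):
    """Column-wise z-scores: subtract the mean, divide by the sample SD."""
    n, p = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in rows]


def correlation_matrix(z):
    """Correlation matrix of already standardized data."""
    n, p = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_component(matrix, iterations=500):
    """Approximate the largest eigenvalue and its eigenvector."""
    p = len(matrix)
    # Asymmetric start vector; it could in principle be orthogonal to the
    # leading eigenvector, in which case the iteration would not converge.
    vec = [float(i + 1) for i in range(p)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    eigenvalue = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(p))
                     for i in range(p))
    return eigenvalue, vec


# Hypothetical trait means (length, breadth, 100-seed weight) for four trees.
rows = [
    [30.1, 16.0, 220.0],
    [35.2, 18.1, 260.0],
    [27.9, 15.8, 231.0],
    [38.0, 19.0, 280.0],
]
z = standardize(rows)
eigenvalue, loadings = leading_component(correlation_matrix(z))
share_pc1 = eigenvalue / len(rows[0])  # trace of a correlation matrix = p
```

Since the eigenvalues of a correlation matrix sum to the number of traits, dividing the leading eigenvalue by that number gives the proportion of total variation attributed to the first PC.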

Results and discussion

BLUPs differed significantly among genotypes for all traits (Table 2; Fig. 2). Variability studies for seed traits revealed that genotype CPT-16 had the highest 100 seed weight (281.5 g) and oil content (51.0 %), followed by CPT-3 (263.9 g, 50.2 %). Maximum values for seed length (38.6 mm) and aspect ratio (2.2) were observed in CPT-15. The lowest 100 seed weight, oil content, aspect ratio, acid value and iodine number were recorded for genotypes CPT-4 (216.8 g), CPT-21 (37.8 %), CPT-10 (1.6), CPT-13 (13.4) and CPT-11 (62.6), respectively. Though range is a crude measure of variability in genotypes, it does give an idea of the spread of variation for a particular character. Wide variation was observed for seed length (27.3–38.6 mm), seed breadth (15.6–19.1 mm), 2D surface area (328.3–495.4 mm²), 100 seed weight (216.8–285.3 g), acid value (13.4–25.8 mg KOH/g), iodine number (62.4–78.6) and oil content (37.8–51.0 %) (Table 3). It has been documented that seed weight depends on reserve food material (endosperm), which is produced as a result of double fertilization, is dominated by maternal traits and is also influenced by nutrient availability at the time of seed setting and by environmental factors (Allen 1960; Johnsen et al. 1989). Both maternal and paternal (pollen grain) traits contribute to embryo development and its physiological function in the species. The occurrence of M. latifolia over a wide range of habitats with diverse geo-climatic conditions was expected to be reflected in the genetic constitution of its populations. In the present study, the significant variation in seed traits exhibited by the genotypes could be attributed to isolation, which influences gene flow. Significant variability of seed characters, such as seed size and weight, has also been reported for seeds of selected plus trees (Bagchi and Sharma 1989) and among various provenances of Santalum album (Veerendra et al. 1999).

Table 2 Mean performance of selected Madhuca latifolia genotypes for seed traits
Fig. 2 Seed diversity among selected candidate plus trees of Madhuca latifolia

Table 3 Estimates of variance components and other parameter for seed traits in Madhuca latifolia

The PCV and GCV were similar for all traits except acid value, which exhibited a striking difference between PCV (36.2) and GCV (23.5), indicating that for most traits genetic control was quite high (Table 3). The magnitude of the error variance was lower than that of the genotypic variance for all traits except acid value (data not given). Estimates of broad sense heritability ranged from 42.1 % (acid value) to 93.5 % (oil content), and for the other seven traits heritability was 60 % or more. Genetic gain ranged between 11.1 and 31.4 %, with equivalent diameter giving the lowest value and acid value the highest. Relatively high values of genotypic variance resulted in high estimates of heritability, which contributed to the high genetic gains expected in this material. The genotypic coefficient of variation, heritability and genetic gain were high for oil content, an important trait. Our results align with those reported by Kaushik et al. (2007a) for Jatropha curcas. Our high estimates of heritability combined with high GA suggest that population means for oil content can be changed considerably by selecting the superior 5 % of the genotypes.

Just as variation among clones is used to estimate genetic variation and genetic gain, covariance estimates between traits can be used to estimate genetic correlations between the traits (Foster 1986). In the genetic improvement of seed and oil yields of M. latifolia, a clear understanding of the relationships among different seed traits is essential. Correlation establishes the extent of association between seed traits and their attributes so that these components can be used as additional criteria for selection in a breeding program. Correlated quantitative traits are of major interest in an improvement program, as the improvement of one character may cause simultaneous correlated changes in other characters. Of the 110 (55 genotypic and 55 phenotypic) correlations, 16 genotypic and 15 phenotypic combinations were significant at 1 %, and a further 6 genotypic and 4 phenotypic combinations were significant at 5 % (Table 4). Skinner et al. (1999) suggested that only those correlation coefficients greater than 0.707 or smaller than −0.707 are biologically meaningful, so that 50 % of the variation in one trait is predicted by the other. Eighteen seed trait pairs showed such high correlation, and all were positive. However, the positive and highly significant association of 100 seed weight with oil content at both phenotypic (0.57) and genotypic (0.60) levels is also important, as oil content is the trait of economic interest (Table 4). A similar correlation trend was seen in Jatropha curcas (Kaushik et al. 2007a) and Pongamia pinnata (Kaushik et al. 2007b). Hence 100 seed weight may be a suitable criterion for selection of plus trees. Aspect ratio showed a significant negative correlation with acid value (−0.52) at the genotypic level and with iodine number at both phenotypic (−0.41) and genotypic (−0.46) levels. Genotypic correlation is an estimated value, whereas phenotypic correlation is derived from the genotype and environmental interaction (Chaturvedi and Pandey 2004). The genotypic correlation is, therefore, a more reliable estimate for examining the degree of relationship between character pairs. Since mahua seed oil is edible, acid value and iodine number, which influence the shelf life of the oil, are important characteristics. Hence high aspect ratio may be one of the criteria for selecting genotypes with low acid value and iodine number.
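The counts and the 0.707 threshold quoted above can be checked with a few lines of Python. The trait count of 11 is our tally of the seven image-derived traits plus 100-seed weight, oil content, acid value and iodine number; the helper name is hypothetical.

```python
import math

# Seven image traits (a-g) plus four further estimates (h-k).
n_traits = 11
n_pairs = math.comb(n_traits, 2)   # distinct trait pairs per correlation type


def shared_variation(r):
    """Proportion of variation in one trait predicted by the other (r squared)."""
    return r ** 2


at_threshold = shared_variation(0.707)      # just under one half
weight_oil_genotypic = shared_variation(0.60)
weight_oil_phenotypic = shared_variation(0.57)
```

Eleven traits give 55 pairs, matching the 55 genotypic and 55 phenotypic coefficients. A coefficient of 0.707 corresponds to roughly 50 % shared variation, whereas the 100 seed weight and oil content coefficients (0.57 and 0.60) correspond to about 32 and 36 %.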

Table 4 Genotypic (G) and phenotypic (P) correlation coefficients for seed traits in Madhuca latifolia

Genetic diversity in plant species forms the basis for selection and further improvement. Our information on the genetic structure and diversity relationships of CPTs provides a basis for planning and conducting future collections and for efficient utilization of genetic resources to realize the potential for maximizing seed and oil yield. Diversity among the seeds collected from each CPT is presented in Plate 1. A very large proportion of the total variation (80.8 %) was explained by the first three PCs (data not given). The first PC alone accounted for 46.2 % of the variation, followed by the second PC, which explained 19.8 %, and the third PC, which accounted for 10.2 %. Based on the loadings for the first three PCs, characters such as 2D surface area, aspect ratio, seed breadth, equivalent diameter, 100 seed weight, seed length, perimeter and oil content are important and adequate descriptors in this material.

Cluster analysis of the scores of the first three PCs resulted in four clusters (Fig. 3). The first cluster comprised four genotypes (CPT-10, CPT-13, CPT-19, CPT-23), the second comprised seven genotypes (CPT-5, CPT-7, CPT-8, CPT-9, CPT-14, CPT-16, CPT-21), the third comprised three genotypes (CPT-3, CPT-6, CPT-15), and the remaining nine genotypes (CPT-1, CPT-2, CPT-4, CPT-11, CPT-12, CPT-17, CPT-18, CPT-20, CPT-22) were grouped into the fourth cluster. The clustering pattern indicated that geographical diversity need not necessarily be related to genetic diversity. This kind of genetic diversity might be due to differential adoption, selection criteria, selection pressure and environment (Vivekananda and Subramaninan 1993). This suggests that genetic drift produced greater diversity than did geographic diversity (Singh et al. 1996). The absence of any relationship between genetic diversity and geographical distribution is in accordance with the findings of Kaushik et al. (2007a) and Gohil and Pandya (2008) for J. curcas. The range, mean, and variances for the four clusters are listed in Table 5. Cluster 3 was distinguished from the other clusters by significantly higher means for most seed traits except seed breadth, acid value, iodine number and oil content. Cluster 1 also appeared more divergent, as it had significantly higher mean values for acid value and iodine number. A comparative assessment of the means of the four clusters for 100 seed weight vis-à-vis oil content suggested that cluster 3 may be ideal for developing higher yields of seed and oil. Hence these genotypes (CPT-3, CPT-6 and CPT-15) can be used for direct selection and utilization in breeding programs. Earlier studies of crop plants indicated that inter-mating of divergent groups led to greater opportunity for crossing over, which would release latent variation by breaking up predominantly repulsion linkage (Thoday 1960), and the importance of using diverse parents in breeding was also stressed by Singh et al. (1981). Arya et al. (1999) reported significant variation in growth and biomass of tree seedlings at 1 year for Prosopis cineraria. Similar records of seedling growth were also reported for Eucalyptus and Casuarina sp. (Toky and Bisht 1991) and Bombax ceiba (Chaturvedi and Pandey 2001).

Fig. 3 Grouping of 23 Madhuca latifolia genotypes based on scores of the first three principal components

Table 5 Range, mean, and variance for different seed traits in four clusters of selected genotypes of Madhuca latifolia

Conclusions

CPT-3, CPT-5 and CPT-16 were found to be superior for 100 seed weight and oil content. Hence seeds of these CPTs can be assigned priority for extensive afforestation programs. The seed traits 100 seed weight and oil content are under strong genetic control; improvement in these characters may be achieved through further selection. Since 100 seed weight and oil content were positively correlated, while aspect ratio was negatively correlated with iodine number and acid value, 100 seed weight and aspect ratio can be included as potential selection criteria for M. latifolia. The divergence among the genotypes based on the first three PCs and the subsequent comparative assessment of the means of the four clusters for 100 seed weight and oil content suggested that cluster 3 may be ideal for developing higher 100 seed weight and oil content. Hence these genotypes, CPT-3, CPT-6 and CPT-15, can be used for direct selection and utilization in breeding programs.