Introduction

Tamarind (Tamarindus indica L.; 2n = 2x = 24; family Fabaceae; subfamily Caesalpinioideae) is an evergreen, slow-growing, multipurpose tree that has become naturalized in subtropical and semi-arid tropical regions of the world. Originating in Africa, it has spread through human intervention to Asia, America and Australia (Van den Bilcke et al. 2014). In India, it grows on village common lands, along rivers and roadsides, and in wooded grasslands, from as far north as Punjab to the dry, warmer areas of the southern and central regions, where it thrives best (Coates-Palgrave 1988). Tamil Nadu, Kerala, Maharashtra, and Madhya Pradesh are the major producing states, together contributing more than 90 percent of production (Israel et al. 2019). All parts of the tree, including fruit pulp, seeds, flowers, leaves, and wood, are useful to humans and animals and serve for medicine, fuel wood, construction, trade, and industrial processing (El-Siddig et al. 2006). Tamarind fruit, especially the sticky, sweet–sour pulp, is used to flavor drinks, syrups, sauces and curries, and to produce tamarind juice concentrate, tamarind pulp powder, tartaric acid, pectin, tartrates, and alcohol (Khaing et al. 2018). The seeds are ground to manufacture tamarind kernel powder (TKP) and tamarind seed gum or flour, which acts as a stabilizer, thickener, and gelling agent in a variety of processed foods (Glicksman 2020). The fruit is highly nutritious: 100 g of pulp provides 239 kcal of energy, 5 g of dietary fiber, minerals such as Ca (74 mg), Mg (92 mg), P (113 mg), K (628 mg) and Fe (2.8 mg), and vitamins such as thiamin (0.428 mg), riboflavin (0.152 mg) and niacin (1.94 mg), making it one of the cheapest sources of multivitamins and minerals for the poor (Food Data Central 2022). The mature pods are harvested between February and May, a lean period for rural employment when wages are low.
Women and children in the village pick, clean, and de-seed the pods, then pack the pulp for sale to supplement their income (Israel et al. 2019). The trees are therefore important for nutritional and livelihood security of local communities.

As in other caesalpinioid legume trees, T. indica flowers are mainly cross-pollinated, with honey bees and wasps as the primary pollinators (Nagarajan et al. 1998). Because tamarind is frequently propagated from seed, distinct provenances and accessions with wide morphological variability have emerged that can be used in breeding programs (Diallo et al. 2008). Exploiting this variability, valuable accessions have been identified in Thailand (Feungchan et al. 1996), India (Pareek and Awasthi 2002; Bhavani et al. 2021), Uganda (Okello et al. 2017), Mali (Van den Bilcke et al. 2014), and Ecuador (Álvarez et al. 2019). Local accessions of tamarind in India are reported to vary widely in yield, pod and pulp characteristics, flowering, and disease tolerance (Kulkarni et al. 1993; Shanthi 2003). Despite the acknowledged importance of T. indica, its phenotypic diversity and potential economic uses remain poorly studied. This gap hinders the development of sustainable strategies to harness its economic and ecological potential, limiting its wider adoption and impeding the optimization of resource allocation and conservation efforts. Engaging farming communities in participatory plant breeding, practicing ecological agriculture for healthier crops and environments, ensuring access to diverse, high-quality seeds and planting material, developing and marketing agricultural products, and enhancing the capacity of smallholder farmers to interact with formal research and extension services all contribute to and promote benefit sharing (Ruiz and Vernooy 2012). Therefore, the main objectives of this study were to use quantitative phenotypic traits to (a) elucidate the degree of phenotypic variation among provenances, (b) correlate phenotypic characteristics with regional ecology, and (c) provide a conceptual framework for T. indica-related diversity and conservation research.

Materials and methods

Collection of plant material

The current study examined fruit phenotypic variation in 422 individual T. indica trees sampled in eight Indian states: Mizoram, Karnataka, Chhattisgarh, Odisha, Maharashtra, Madhya Pradesh, Telangana, and Jharkhand (Fig. 1). Well-developed populations in these states were identified from the literature and from local knowledge provided by Forest Departments, National Bureau of Plant Genetic Resources centers, Krishi Vigyan Kendras and NGOs. At each location, adult fruiting trees that had grown naturally (rather than being planted) were sampled at random from among candidates identified through visual observation and discussions with local people. Selected trees were typically at least 50 m apart to reduce the likelihood of sampling closely related individuals. A GPS receiver was used to record the coordinates of each tree as part of its passport information. Details of the collection sites are presented in Table 1.
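The 50 m minimum spacing between sampled trees can be checked directly from the GPS passport coordinates. The sketch below is a minimal illustration, not the procedure used in this study: it assumes coordinates in decimal degrees and uses the haversine great-circle formula with a spherical Earth of radius 6371 km; the tree IDs and coordinates are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def pairs_closer_than(trees, min_dist_m=50.0):
    """Return (id, id) pairs of trees closer than min_dist_m.

    trees: dict mapping a hypothetical tree ID to a (lat, lon) tuple.
    """
    return [
        (id_a, id_b)
        for (id_a, pos_a), (id_b, pos_b) in combinations(trees.items(), 2)
        if haversine_m(*pos_a, *pos_b) < min_dist_m
    ]


# Hypothetical passport coordinates for three trees at one site.
site = {
    "T1": (12.9716, 77.5946),
    "T2": (12.9717, 77.5946),  # about 11 m north of T1
    "T3": (12.9760, 77.5946),  # about 0.5 km north of T1
}
print(pairs_closer_than(site))  # [('T1', 'T2')]
```

Any pair returned by such a check would flag trees that violate the spacing rule and may share recent ancestry.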

Fig. 1

Map of India showing distribution of sampling points of T. indica provenances in 8 states

Table 1 Location coordinates, abbreviation, sample size, altitude, climatic zone, mean annual temperature (MAT) and mean annual precipitation (MAP) of selected study sites

Morphological measurements

Ten fruits were randomly collected from each tree for morphological measurements. The pods were transported to the Indian Institute of Horticultural Research, Bengaluru, where they were air-dried for a few days and then stored in cotton bags. A total of sixteen quantitative traits were used to assess the extent of diversity. Total pod mass, pulp mass, shell mass, fiber mass, and seed mass were determined using a precision balance (0.01 g accuracy). Pod length was measured with a measuring tape as the distance from the pod tip to the pedicel; for curved pods, the measurement was taken along the outside of the curve. Pulp percentage was calculated as pulp mass/pod mass × 100. The real value of pulp (RVP), a common criterion for the primary screening of tamarind accessions in Thailand, was calculated as pulp percent × (pulp mass/100). Titratable acidity, reducing sugars and total sugars were determined following standard procedures (AOAC 2005).
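The pulp percent and RVP formulas above can be written out as a short Python sketch. The function names and the example pod are hypothetical; the sugar/acid helper assumes that sweetness is computed as total sugars divided by titratable acidity, which the abbreviation TS:A in the Results implies but the Methods do not state explicitly.

```python
def pulp_percent(pulp_mass_g, pod_mass_g):
    """Pulp percentage: pulp mass / pod mass x 100."""
    return pulp_mass_g / pod_mass_g * 100


def real_value_of_pulp(pulp_mass_g, pod_mass_g):
    """Real value of pulp (RVP): pulp percent x (pulp mass / 100)."""
    return pulp_percent(pulp_mass_g, pod_mass_g) * (pulp_mass_g / 100)


def sugar_acid_ratio(total_sugars, titratable_acidity):
    """Sweetness index, assumed here to be total sugars / titratable acidity."""
    return total_sugars / titratable_acidity


# Hypothetical pod: 20 g total mass, of which 8 g is pulp.
pp = pulp_percent(8.0, 20.0)         # 40 % pulp
rvp = real_value_of_pulp(8.0, 20.0)  # 40 x (8 / 100) = 3.2
```

Substituting pulp percent into the RVP definition gives RVP = (pulp mass)² / pod mass, so a tree scores highly only when it combines a large pulp fraction with heavy pulp; for the hypothetical pod, 8² / 20 = 3.2.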

Data analysis

Descriptive statistics, including the mean, range, standard deviation, and coefficient of variation (CV), were calculated for each variable. The CV, obtained by dividing the standard deviation by the mean and multiplying by 100, was used to compare variability among traits. Correlation and principal component analyses (PCA) were conducted using R software (ver. 3.1.2). Correlations among fruit traits, and between fruit traits and environmental variables, were assessed using Spearman rank correlation coefficients. PCA was performed on the data correlation matrix to identify the variables contributing most to total phenotypic variation among the accessions, and the results were plotted with the ggplot2 package in R. Differences between states were assessed using linear mixed-effects models, with state as a fixed factor and fruits nested within trees as a random factor. When significant differences between states were detected, multiple comparisons were performed using Tukey's HSD test (SAS package). The means of all characters were subjected to cluster analysis based on squared Euclidean distances, and a dendrogram was constructed using Ward's method.
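As a minimal sketch of the descriptive statistics, the CV and the max/min fold range quoted in the Results can be computed with Python's standard library. It assumes the sample standard deviation (n − 1 denominator), since the text does not say which estimator was used; the trait values are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean x 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100


def describe(values):
    """Descriptive statistics for one trait across trees."""
    return {
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        "sd": statistics.stdev(values),           # n - 1 denominator (assumed)
        "cv_percent": coefficient_of_variation(values),
        "fold_range": max(values) / min(values),  # max as a multiple of min
    }


# Hypothetical pulp masses (g) for three trees.
summary = describe([10.0, 20.0, 30.0])
```

For these values the mean is 20 g and the standard deviation 10 g, giving a CV of 50% and a threefold range.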

Results and discussion

Analysis of phenotypic diversity

Focusing on natural T. indica populations, our sampling covered twelve climatic zones in eight states, at altitudes ranging from 400 to 1100 m. The evaluation revealed a wide range of variation in the majority of traits (Table 2). Apart from pod length, pod breadth, pulp percent, reducing sugars and total sugars, all traits showed substantial differences between their minimum and maximum values. In particular, the maximum values of fiber mass, fiber percent and RVP were more than 73, 59 and 36 times their respective minimum values (Table 2). The coefficient of variation (CV, %) was used to measure the dispersion of each quantitative trait around its mean: the larger the CV, the greater the relative dispersion of the measured values. Most of the examined traits showed relatively high CVs. The highest levels of variation were found for fiber percent (FP: 94.62%) > fiber mass (FM: 66.03%) > real value of pulp (RVP: 63.22%) > pulp mass (PM: 53.97%) > acidity (A: 52.15%) > pod mass (PM: 49.93%) > shell mass (SM: 49.06%) > seed mass (SDM: 48.87%) > total sugars:acidity (TS:A: 45.62%) > pod length (PL: 29.29%) > total sugars (TS: 22.84%) > reducing sugars (RS: 22.31%), whereas pod breadth (PB: 19.92%), seed percent (SDP: 17.96%) and pulp percent (PP: 16.71%) showed the lowest variation. Widely distributed species have been observed to retain higher genetic variation than endemic woody species, which are known to have low genetic variation (Luan et al. 2006). Similar variation in tamarind fruit traits has also been reported by Pareek and Awasthi (2002), Singh and Singh (2005), Prasad et al. (2009) and Sharma et al. (2015). Algabal et al. (2012) reported high CV values for pulp weight (27.22%) and fiber weight (27.22%) in Karnataka, suggesting substantial variation and scope for improvement through selection.
Several other workers (Divakara 2009; Shanthi 2003; Hazarika and Lalrinpui 2020) have also reported high phenotypic CV for pulp weight in tamarind. Ayala-Silva et al. (2016) reported that pomological properties varied significantly among 13 accessions sourced from around the world and grown in Florida, USA. In Benin, Fandohan et al. (2010) found differences (3 to 53%) in fruit mass, pulp mass, seed length, seed breadth, and seed mass among distinct tamarind morphotypes, while Soloviev et al. (2004) and Van den Bilcke et al. (2014) found significant differences among tamarind accessions from Senegal and Mali, respectively. The mean RVP in our study (19.32) is higher than the maximum RVP (9.42) reported by Van den Bilcke et al. (2014) for collections from Mali and the mean RVP (15.09) of accessions identified for plantations in Thailand (Feungchan et al. 1996). The difficulty of separating the main fibers from the extremely sticky pulp, or from the hardened shell, contributes to the very large variation in fiber mass (Van den Bilcke et al. 2014). Variation in fruit size and seed number in T. indica has been found to be significantly influenced by cross-pollination and resource availability: self-incompatibility and resource constraints in self-pollinated flowers may reduce fruit size and the number of seeds per fruit (Diallo et al. 2008).

Table 2 Descriptive statistics for the quantitative fruit traits of T. indica

Tamarind fruits generally contain 8–18% (free) tartaric acid and 25–45% (reducing) sugars (El-Siddig et al. 2006). In a study from Pakistan, Hasan et al. (1978) reported 8.40–12.40% tartaric acid and 21.40–30.85% total sugars in tamarind fruit pulp. In our samples, tartaric acid ranged from 3.45 to 27.60 (mean 7.95), reducing sugars from 12.12 to 68.29 (mean 36.16), total sugars from 12.36 to 78.87 (mean 39.38), and the sugar/acid ratio from 0.51 to 13.69 (mean 6.08); these values are on the higher side of those reported by other authors (Singh and Singh 2005; Divakara 2009; Praveenakumar et al. 2020; Hazarika and Lalrinpui 2020; Mamathashree et al. 2022). This could be due to the large sample size and the diversity of provenances in our study. According to Van den Bilcke et al. (2014), tamarind accessions from Mali contained more tartaric acid (9.28–11.94%) but less sugar (27.94–32.25%) than the Thai sour reference (8.96% tartaric acid and 41.50% total sugars). Our findings indicate that the total acidity and total sugar content of Indian tamarinds are comparable to those of Thai sour types. Owing to their high tartaric acid content, Indian tamarind fruits taste sour (mean sugar/acid ratio of 6.08).

Principal component analysis (PCA) and biplot

We applied PCA to all quantitative traits to investigate the contribution of individual traits to phenotypic variation (Table 3). The first five PCs (eigenvalue > 1.0) accounted for 98.61% of the variation, indicating that these components captured most of the differences among accessions and had the greatest impact on their separation. In PC1 (60.94% of the variability), seed percent, shell percent, reducing sugars, total sugars and sugar/acid ratio had positive loadings, while pod mass, pulp mass, real value of pulp and fiber mass had strong negative loadings. Pulp percent (0.51) and acidity (0.36) strongly affected PC2 (19.32%). Reducing sugars (0.55) and total sugars (0.42) had a positive influence on PC3 (9.14%), while shell percent (0.65) was the most dominant trait in PC4 (9.14%). The last component (PC5) explained less variability (5.92% of total variance), with the highest loadings for pod breadth (0.52), pulp percent (0.43) and pod length (0.36). In summary, pulp traits, namely pulp percent, acidity, and total and reducing sugars, had the greatest impact on the phenotypic variation of T. indica. PCA helps to reduce the number of effective parameters needed to discriminate accessions. One explanation for the associations between traits highlighted by PCA is genetic linkage between loci controlling the traits, or the pleiotropic effects of genes (Iezzoni and Pritts 1991). PCA has been used extensively to evaluate germplasm, explore correlations between tree traits, and define genetic relationships among varieties and accessions (Khadivi-Khub and Anjam 2014; Kanupriya et al. 2019; Mishra et al. 2022). The first two PCs, which together accounted for 79.5% of the variance, were used to create a PCA biplot (Fig. 2).
Fiber percent, sugar/acid ratio, seed percent, pod length and seed mass contributed the most to PC1, while acidity, RVP, pulp mass and pulp percent were the major contributors to PC2. Acidity, RVP, and pulp mass lie on the same side of the biplot as the Mizoram accessions, indicating that trees sampled in this state tended to have high values for these traits. Trees growing in Maharashtra, Jharkhand and Odisha, on the other hand, were associated with sweeter fruit. The vectors for acidity, RVP and pulp mass point in the same direction, indicating that these traits were positively correlated, as were sweetness, pod length and seed percent.
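The PCA steps described above (eigen-decomposition of the trait correlation matrix, ordering components by eigenvalue, retaining those with eigenvalue > 1, and projecting standardized trees onto the PCs for a biplot) can be sketched with NumPy. This is an illustrative substitute for the R workflow used in the study; the data layout (one row per tree, one column per trait) and the synthetic example are assumptions.

```python
import numpy as np


def pca_correlation(X):
    """PCA on the trait correlation matrix.

    X: array of shape (n_trees, n_traits), one row per tree (assumed layout).
    Returns eigenvalues (descending), eigenvectors (one column per PC),
    percentage of variance explained, and PC scores for a biplot.
    """
    X = np.asarray(X, dtype=float)
    R = np.corrcoef(X, rowvar=False)             # traits are columns
    eigvals, eigvecs = np.linalg.eigh(R)          # R is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1]             # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = 100 * eigvals / eigvals.sum()     # trace of R equals n_traits
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized traits
    scores = Z @ eigvecs                          # coordinates of trees on the PCs
    return eigvals, eigvecs, explained, scores


def kaiser_retained(eigvals, threshold=1.0):
    """Number of PCs whose eigenvalue exceeds the threshold (Kaiser criterion)."""
    return int(np.sum(np.asarray(eigvals) > threshold))


# Hypothetical data: 200 trees x 3 traits; trait 2 nearly duplicates trait 1.
rng = np.random.default_rng(0)
t1 = rng.normal(size=200)
X = np.column_stack([t1, 2 * t1 + rng.normal(scale=0.01, size=200),
                     rng.normal(size=200)])
eigvals, eigvecs, explained, scores = pca_correlation(X)
```

Because the two redundant traits collapse onto one axis, PC1 carries roughly two thirds of the variance and the smallest eigenvalue is close to zero. Since the trace of a correlation matrix equals the number of traits, an eigenvalue above 1 means a component explains more variance than any single standardized trait, which is the rationale for the cutoff. Eigenvector signs are arbitrary, so a loading may appear with the opposite sign in other software without changing the interpretation.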

Table 3 The first five principal components (PCs) with loadings for quantitative traits in T. indica
Fig. 2

Biplot of the first two principal components for the sixteen quantitative variables of T. indica

Correlation analysis

Principal component analysis (PCA) revealed the traits contributing most to phenotypic variation, but it does not quantify the pairwise relationships among them. In addition, our sampling encompassed eight states with large differences in altitude. Therefore, correlations were analyzed among fruit traits, and between fruit traits and environmental variables. Most of the studied traits were significantly correlated (P < 0.05) (Fig. 3). Real value of pulp (RVP) showed positive correlations, ranging from 0.37 to 0.93, with all pod size traits. The high positive correlation between pod length and pulp recovery suggests that selecting for long pods would indirectly select for higher pulp recovery. Significant positive associations were found between pod mass and pod size (r = 0.694–0.776) and between pod length and seed mass (r = 0.868). Strong positive correlations were also observed between titratable acidity and both pod size and pulp recovery traits (r = 0.513–0.742), implying that breeding for pods with high pulp recovery and high tartaric acid content may be possible. Reducing and total sugars were negatively correlated with pod size and pulp recovery traits (r = −0.204 to −0.694). The sugar/acid ratio (sweetness) was positively associated with seed percent (r = 0.718) and shell percent (r = 0.537). Fruit quality traits with strong positive correlations (R² > 50%) are reported to be valuable in breeding programs and can influence selection strategies (Skinner et al. 1999). In practice, pod length, which is easy to measure during surveys and in situ evaluation, can be used to estimate pulp mass, pulp percent, and RVP, which require more time and resources to record (Maia et al. 2016).
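Spearman's coefficient is the Pearson correlation computed on ranks, which makes it robust to non-normal trait distributions and sensitive to any monotonic relationship. A minimal standard-library sketch, with tied values given the average of their ranks, is shown below; it returns ρ only and does not compute the P values used for Fig. 3. The pod length and pulp mass values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # positions start..end -> ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical pod length (cm) and pulp mass (g) for five trees.
pod_length = [10.0, 12.0, 15.0, 11.0, 14.0]
pulp_mass = [20.0, 25.0, 40.0, 22.0, 30.0]
rho = spearman(pod_length, pulp_mass)  # 1.0: the ranks agree perfectly
```

The threshold R² > 50% cited above corresponds to |r| > √0.5 ≈ 0.71, so correlations such as pod length with seed mass (r = 0.868) clear it, while pod mass with pod size (r = 0.694) falls just short.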

Fig. 3

Correlation coefficients among fruit traits, geographic and climatic factors in T. indica from eight states in India

Among geographic factors, altitude was positively correlated with pulp mass (r = 0.681), RVP (r = 0.749) and titratable acidity (r = 0.805), and negatively correlated with the sugar/acid ratio (r = −0.840). Latitude was positively correlated with reducing sugars (r = 0.569) and total sugars (r = 0.615), and longitude with pulp percent (r = 0.673) and titratable acidity (r = 0.738). We also identified significant correlations with climatic factors: mean annual temperature (MAT) was correlated with reducing sugars (r = 0.518), total sugars (r = 0.600), sugar/acid ratio (r = 0.890), pod breadth (r = −0.728), pod mass (r = −0.514), pulp mass (r = −0.636), pulp percent (r = −0.556), RVP (r = −0.715) and seed mass (r = −0.517), and mean annual precipitation (MAP) with pulp percent (r = 0.800), RVP (r = 0.501), titratable acidity (r = 0.782) and sugar/acid ratio (r = −0.628). Fandohan et al. (2010) and Van den Bilcke et al. (2014) found similar relationships between fruit morphological traits and MAT and MAP in tamarind. Our findings also agree with studies on grapes: Kliewer and Lider (1970) observed that higher temperatures led to increased sugar content and reduced tartaric acid levels, and Coombe (1987) noted that a 10 °C rise in temperature resulted in a 50% decrease in tartaric acid content. In grape berries, water deficit (low rainfall) resulted in low total acidity and sugar content (Des Gachons et al. 2005), and high temperatures lowered total acidity (Dokoozlian et al. 2000). In apples, Mignard et al. (2022) demonstrated a negative correlation between individual sugar and acid concentrations and precipitation across all accession groups.

Diversity between provenances

The coefficient of variation (CV) was used as an index of the phenotypic diversity of each provenance, with a higher CV indicating greater diversity. Among the eight provenances, Karnataka had the highest CV (49.51%), followed by Jharkhand (30.66%) and Mizoram (29.81%), while Maharashtra (29.50%) had the lowest (Table 4). All sixteen quantitative traits differed significantly (P < 0.05) among states. Mean pod mass, shell mass, fiber mass, pulp mass, RVP and seed mass were significantly higher in Karnataka than in the other states (Table 4). Mizoram had the highest mean pod breadth (2.47 ± 0.28), pulp percent (42.14 ± 4.94), RVP (27.26 ± 10.45) and acidity (14.75 ± 5.89). Jharkhand had the highest seed percent (31.71 ± 4.02), reducing sugars (39.25 ± 7.26), total sugars (42.00 ± 8.02) and sweetness (7.41 ± 2.99), followed by Madhya Pradesh, Telangana and Chhattisgarh. These states are therefore candidates for further investigation in the selection of sweet tamarind trees, whereas trees from Mizoram and Karnataka could be targeted for pulp recovery and acidity. Raut et al. (2022) reported a sugar/acid ratio of 2.27 for 22 tamarind accessions from Maharashtra, lower than the value in our study (6.54 ± 2.49) for 24 accessions from this state.

Table 4 Analysis of variance comparing provenances of T. indica for quantitative traits. Values represent the mean ± standard deviation. Means followed by the same superscript letter are not significantly different (Tukey's HSD test, p < 0.05)

Cluster analysis

Similarities and dissimilarities among the accessions were studied using a Ward's dendrogram, which confirmed high variability in quantitative traits. Morphological cluster analysis revealed four distinct clusters (Fig. 4), which were further subdivided into numerous sub-clusters based on similarity. Cluster I contained 12.55% of all accessions in the collection, the majority of them (42%) collected in Mizoram (Table 5). Most accessions in this cluster were sour, with a mean titratable acidity of 14.22 ± 6.36 and low reducing and total sugars. Sweetness was low (2.55 ± 1.41), so these trees could be valuable germplasm for tartaric acid extraction. Cluster II comprised 7.58% of all accessions, 50% of them from Karnataka. Members of this cluster had superior fruit quality in terms of pod size, pod and pulp mass, RVP, and seed mass, and a sweetness score of 5.73 ± 3.07, indicating moderate sweetness. Most of the 104 accessions in Cluster III came from Odisha and Chhattisgarh. Important attributes of this group were high pulp percent (41.09 ± 5.77), the second highest RVP (24.98 ± 7.44), low acidity (6.85 ± 2.64), and high reducing (40.50 ± 6.61) and total sugars (43.93 ± 7.71). With the highest sugar/acid ratio (7.08 ± 2.27), sweet-tasting accessions in this cluster may be useful for breeding programs aiming to improve this trait. Cluster IV contained the largest number of accessions (233) and had average fruit and pulp nutritional values. The clusters differed notably from one another, indicating that they are phenotypically distinct, with pulp quality and nutritional value as the main discriminating characters.
Morphological characterization has several limitations for assessing diversity: most characters are influenced by the environment, the descriptors do not reveal the full extent of diversity, and individuals that differ in key traits of interest may still cluster together (Prohens et al. 2005). These limitations can be addressed through molecular characterization, which, although initially costly, provides a more accurate picture of heritable diversity.
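Ward's method on squared Euclidean distances can be sketched with the Lance–Williams update, which recomputes the distance from every remaining cluster to a newly merged one without revisiting the raw data. This is a minimal illustration, not the software used in the study; it assumes each accession is represented by its vector of trait means, z-scored so that traits measured in different units weigh equally (the text does not state whether the data were standardized). The accession values are hypothetical.

```python
import statistics
from itertools import combinations


def standardize(rows):
    """Z-score each trait column (assumed step; the text does not specify it)."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def sq_euclidean(a, b):
    """Squared Euclidean distance between two trait vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_clusters(points, n_clusters):
    """Agglomerative Ward clustering on squared Euclidean distances.

    Merges the closest pair of clusters until n_clusters remain and returns
    each cluster as a sorted list of point indices.
    """
    clusters = {i: [i] for i in range(len(points))}
    dist = {frozenset((i, j)): sq_euclidean(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}
    next_id = len(points)
    while len(clusters) > n_clusters:
        pair = min(dist, key=dist.get)          # closest pair of clusters
        i, j = tuple(pair)
        d_ij = dist.pop(pair)
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        for k, members in clusters.items():
            nk = len(members)
            d_ki = dist.pop(frozenset((k, i)))
            d_kj = dist.pop(frozenset((k, j)))
            # Lance-Williams update for Ward's method
            dist[frozenset((k, next_id))] = (
                (ni + nk) * d_ki + (nj + nk) * d_kj - nk * d_ij
            ) / (ni + nj + nk)
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical trait vectors for five accessions (two traits each).
accessions = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0)]
groups = ward_clusters(standardize(accessions), n_clusters=3)
```

Stopping when four clusters remain would correspond to cutting the dendrogram into the four groups described above. A real analysis would normally rely on an established implementation, for example R's `hclust` with `method = "ward.D"` applied to squared distances.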

Fig. 4

Dendrogram showing the grouping of accessions in T. indica from eight states in India

Table 5 Characteristics of clusters based on quantitative variables in 422 accessions of T. indica. Values represent the mean ± standard deviation

Linking biodiversity with livelihood

An accession belonging to Cluster II was collected from Nanadanhalli village, Tumkuru district, Karnataka, and identified through participatory breeding. It was released under the name "Lakshamana" on the basis of its superior characteristics, including high pulp recovery, high RVP, high yield, regular bearing, pest and disease resistance, and locally preferred features such as light brown, sour pulp. Before release, the accession underwent DNA fingerprinting and profiling of sugars and organic acids using liquid chromatography with tandem mass spectrometry (LC–MS/MS) and high-performance liquid chromatography (HPLC) (Kanupriya et al. 2020). A Memorandum of Understanding was signed between the custodian farmer and ICAR-IIHR, Bengaluru. Under the agreement, the Institute propagated and sold plants of this accession until the farmer had acquired the skills needed to establish his own nursery (Fig. 5), and the farmer received 75% of the revenue from the sale of grafted plants. Meanwhile, he was trained in softwood grafting, enabling him to earn a steady income from his own nursery, and was encouraged to multiply the tree and establish a mother block. As a result, this valuable genotype has been conserved in the farmer's field, and the farmer has become a successful entrepreneur. This approach not only aided conservation but also addressed the challenges of genetic improvement in perennials, which is a time-consuming and difficult process (Sthapit et al. 2016).

Fig. 5

Model depicting the process of linking biodiversity conservation with livelihood security

Conclusions

Our extensive survey and fruit collection, encompassing twelve provenances from eight states, provide a comprehensive picture of the phenotypic diversity of tamarind in India. These findings can be used to select superior trees and unlock the species' full potential for promoting sustainable land use, enhancing soil fertility, maximizing ecosystem services, and contributing to medicine and food processing, leading to sustainable economic development. The study revealed significant variation, particularly in traits of interest such as pulp mass, RVP, and seed mass. Notably, we observed a significant positive correlation between pod length and pulp mass, which could support in situ evaluation. Promising locations for selecting elite trees with desirable fruit and pulp characteristics, which could be profiled for tartaric acid, sugars, and valuable active metabolites in future research, include Karnataka, Chhattisgarh, Mizoram, and Odisha. This study provides a foundation for selecting parent trees for breeding, generating mapping populations, and identifying desirable genes. As an initial step towards defining a working collection, it is crucial to conserve a few exemplary trees from each provenance while simultaneously empowering farmers to thrive as entrepreneurs.