Introduction

The walnut (Juglans regia L.) belongs to the order Fagales and the family Juglandaceae (McGranahan and Leslie 2009; Hussain et al. 2021; Zaini et al. 2020). It is a widely cultivated nut crop in temperate regions of the world, including the Indian state of Jammu and Kashmir (Shah et al. 2021, 2018). Walnuts are nutritionally dense and are considered the “bread of the future” (Jaćimović et al. 2020; Turdieva et al. 2012). Presently, China is the leading walnut-producing country, with a 43.31% share of global production. The United States, Iran, Turkey, Mexico, and India contribute 16.74%, 11.19%, 5.87%, 4.35%, and 0.88% of the global walnut output, respectively. Walnut is native to Eurasia, growing from the Balkans to Southwest China (Aradhya et al. 2017; Feng et al. 2018; Khadivi-Khub et al. 2015; Pollegioni et al. 2004). All species of the genus Juglans are diploid with a karyotype of 2n = 32 and have 16 linkage groups (Kefayati et al. 2017). There have been numerous attempts to create genetic linkage maps using RAPD, RFLP, and isozyme markers (Fjellstrom and Parfitt 1994; Woeste et al. 1996; Malvolti et al. 2001). However, there were not sufficient markers to cover all of the linkage groups, and some of the linked markers lacked sequence information. Recently, SNP and InDel markers were also used to construct genetic maps of walnut (Zhu et al. 2015; Luo et al. 2015). The first SSR-based linkage map was constructed by Kefayati et al. (2017) with a consensus map length of 1569.9 cM. The availability of the walnut genome (Martínez‐García et al. 2016) opened many frontier areas, including fine mapping of economic traits (Bernard et al. 2019; Marrano et al. 2019a; Ji et al. 2021) and sequencing of the genomes of other Juglans species (Stevens et al. 2018). The walnut genome version 1.0 (Martínez‐García et al. 2016) was highly fragmented and was significantly improved in the v1.5 genome assembly (Stevens et al. 2018). 
Recently, a high-quality chromosome-scale assembly (Chandler v2.0) helped to explain the complex biological processes in walnut (Marrano et al. 2020). This high-quality genome assembly was obtained by combining Oxford Nanopore long-read sequencing with chromosome conformation capture (Hi-C) technology. A few genomic studies indicate that walnuts grown in South Asian countries, particularly the Pakistani and Indian populations, are ancestral (Aradhya et al. 2017; Bernard et al. 2020a; Gaisberger et al. 2020; Roor et al. 2017). However, new phylogenomic studies reveal the hybrid origin of J. regia (Zhang et al. 2019). In Jammu and Kashmir, the crop has not been exploited in any intensive breeding program; therefore, the natural population possesses the highest genetic diversity (Shah et al. 2022). To effectively harness the latent potential of walnut, it is essential to accurately characterize this genetic diversity in order to breed new genitors and superior cultivars (Doğan et al. 2014). Phenotypic trait evaluation is a common approach to assess walnut diversity. However, such investigations are inefficient and expensive, and complex polygenic traits are difficult to assess directly (Nickravesh et al. 2023). These issues have been addressed by the development of DNA-based markers, which provide reliable results regardless of the external environment (Shah et al. 2020). Among the molecular markers, microsatellites (motifs of 1–6 bp in length) are the most reliable (Grover et al. 2007; Taheri et al. 2018) and are abundant and well distributed throughout the nuclear genome of eukaryotes (Kalia et al. 2011). Microsatellites are powerful and informative markers for assessing genetic diversity, finding the relationships among different germplasm populations, constructing linkage maps, validating walnut scions and source plants for reliable propagation, and investigating biotic or abiotic stresses (Ali Khan et al. 2016; Bernard et al. 2020a, b, 2019; Shah et al. 2018, 2020; Pollegioni et al. 2017; Doğan et al. 2014; Nickravesh et al. 2023). The genetic diversity in J. regia was first studied by Woeste et al. (2002), followed by other researchers (Bai et al. 2010; Chen et al. 2014; Dangl et al. 2005; Foroni et al. 2005, 2007; Hoban et al. 2008; Magige et al. 2022; Najafi et al. 2014; Robichaud et al. 2006, 2010; Ross-Davis et al. 2008; Topçu et al. 2015; Victory et al. 2006; Zhang et al. 2010, 2013). The first set of 13 SSR markers from J. regia was developed by Najafi et al. (2014). A second set of 94 SSR markers for walnut was developed by Topçu et al. (2015), of which only 19 were polymorphic. Topçu et al. (2015) developed another 276 SSR markers from repeat-enriched genomic libraries; among these, 185 were polymorphic. Although molecular markers aid in deciphering the population structure and differentiation of Juglans species (Victory et al. 2006; Foroni et al. 2005, 2007; Ross-Davis et al. 2008; Woeste et al. 2002), very few SSRs have been developed so far. These SSRs have been routinely used to infer walnut population structure (Bernard et al. 2020a, b; Wang et al. 2008; Ebrahimi et al. 2016), but their number is too small for constructing dense linkage maps, marker-trait association studies, and QTL mapping. Recently, a walnut SNP chip, currently the largest chip available in crops, was developed by Marrano et al. (2019b) and is widely used to map complex traits (Marrano et al. 2019b; Arab et al. 2019, 2022; Bükücü et al. 2020; Sideli et al. 2020). However, this chip is difficult to access for the scientific community in developing nations. Alternatively, SSRs, being neutral, can be used by labs that do not have a high-throughput genomics setup. The best and easiest way to develop a large number of SSR markers is to use the publicly available walnut genome (Martínez‐García et al. 2016). With the aid of bioinformatics workflows, it is easy to mine a huge number of genome-wide SSR markers. 
Many researchers exploited the genomic information to mine genome-wide SSR markers in different plant species in the past decade. For instance, genome-wide SSR markers were developed using bioinformatic approaches in pear (Liu et al. 2015), citrus (Hou et al. 2014), pomegranate (Patil et al. 2020b), spinach (Patil et al. 2020b), Lilium (Biswas et al. 2020), capsicum (Cheng et al. 2016), watermelon (Zhu et al. 2016), and Palmae (Manee et al. 2020).

To date, only 1300 SSRs are available for walnut (Foroni et al. 2005, 2007; Chen et al. 2014; Dangl et al. 2005; Hoban et al. 2008; Najafi et al. 2014; Robichaud et al. 2006; Ross-Davis et al. 2008; Topçu et al. 2015; Victory et al. 2006; Woeste et al. 2002; Zhang et al. 2010); hence, we explored the publicly available Chandler genome to mine genome-wide SSR markers. We report a new set of 162,594 genome-wide SSR markers. Preliminary wet-lab studies show that our SSRs are robust with high discriminatory power. Using these SSR markers, we found high diversity in walnut populations from northern India. Our SSR repository will help the scientific community actively working on walnut to saturate linkage maps, perform phylogenetic analysis, and map economically important traits. Further, this set will help to deduce the population structure of Juglans species, as most of these SSR markers are expected to show cross-transferability.

Materials and methods

Genome-wide SSR mining

The walnut genome (cv. Chandler) is publicly available at NCBI [Juglans regia (ID 17683) - Genome - NCBI (nih.gov)], and we downloaded it to a local server. We used the GMATA v2.0 tool (https://sourceforge.net/projects/gmata) to scan for genome-wide SSR markers as described previously (Wang and Wang 2016; Bhat et al. 2018). To design primers, standalone Primer3 was used in batch mode with the following parameters: product size 140–400 bp; primer length 19–25 bp with an optimal length of 22 bp; optimal primer Tm of 60 °C; and primers at least 200 bp away from the microsatellite locus. To calculate amplicon size and number of alleles, we used the standalone electronic PCR (e-PCR) module with default parameters. All text handling was performed using in-house Perl scripts.
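As an illustration of what genome-wide SSR scanning involves, the following minimal Python sketch detects perfect microsatellites with regular expressions. It is not the GMATA implementation; the function name `find_ssrs` and the minimum repeat thresholds in `MIN_REPEATS` are assumptions chosen only for this example and are not the settings used in the study.

```python
import re

# Minimum number of tandem copies per motif length.
# These thresholds are illustrative assumptions, not the GMATA settings.
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures a motif; \1{n-1,} requires n or more tandem copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. ATAT = AT x 2),
            # so each locus is reported once under its shortest motif.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), m.end(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


# Toy sequence with one (AT)6 and one (AAT)4 locus.
toy = "GGC" + "AT" * 6 + "CCG" + "AAT" * 4 + "GC"
for start, end, motif, count in find_ssrs(toy):
    print(start, end, motif, count)
```

Coordinates are 0-based with an exclusive end. A production tool additionally handles compound and imperfect repeats, both strands, and large multi-sequence FASTA input, none of which this sketch attempts.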

Selection of plant material and DNA extraction

We collected young leaves from 72 walnut genotypes, comprising 60 from Shopian (SW), 8 from Anantnag (AW), and 4 from Pulwama (PW). These populations were selected because they represent important walnut-growing districts of Kashmir with highly diverse agro-ecosystems and high phenotypic plasticity. The sampling locations are geographically separated from each other (Fig. S1). The plants were selected based on crucial morphological and pomological traits so that highly diverse genotypes were included for genotyping (Shah et al. 2021). The seedling plants thrive in their natural habitat without any management practices. Genomic DNA was isolated using the CTAB method (Doyle and Doyle 1987). RNase treatment was used to further purify the extract. DNA quality was examined on a 1% agarose gel, and DNA was quantified using a BioSpectrometer (Eppendorf, Germany).

Validation of selected SSRs

A set of 110 SSR primer pairs was selected from the 136,582 unique SSR markers that showed a single allele in e-PCR and was validated on 10 highly diverse samples chosen from geographically isolated places. For instance, from the Shopian population, we selected four samples that were at least 200 km apart. Similarly, we selected walnut genotypes from the other two districts. The 110 SSR markers were chosen from the large SSR repository based on the number of motif repeats. Markers that failed to amplify or produced monomorphic fragments were discarded. The remaining 35 markers were screened further to identify the highly polymorphic ones. Fifteen of these thirty-five markers were polymorphic but produced low-resolution bands and were therefore not used for fingerprinting. PCR amplification was carried out in 0.2 ml PCR tubes in a Biometra T gradient thermal cycler (Göttingen, Germany) using 2 µl of genomic DNA (25 ng/µl), 1 U of Taq polymerase (Thermo Scientific), 1.5 µl of 10× Taq polymerase buffer, 1.5 mM MgCl2, 200 µM of each dNTP, 0.4 µM of each primer, and 8.30 µl of deionized water in a final reaction volume of 15 µl. We used the following temperature regime: initial denaturation for 5 min at 94 °C; 35 cycles of denaturation for 1 min at 94 °C, primer annealing for 30 s at 60 °C, and primer extension for 30 s at 72 °C; and a final extension for 7 min at 72 °C. Amplified DNA fragments were resolved on a 3% agarose gel. Product sizes were determined using a 100 bp DNA ladder (Thermo Scientific) as the molecular size marker.

Data analysis

Genetic diversity and relationship analysis

The online marker efficiency calculator iMEC (https://irscope.shinyapps.io/iMEC/) was used to calculate multiple indices of marker efficiency, such as number of alleles (Na), expected heterozygosity, and discriminating power (Amiryousefi et al. 2018). DNA fragments of various sizes generated by the SSR markers were compared with the standard molecular weight marker and scored as discrete variables, using 1 to indicate the presence and 0 the absence of a band. The heatmap was generated from the SSR data of the 72 walnut genotypes using Euclidean distance and the Ward (unsquared distances) linkage method in ClustVis (https://bio.tools/clustvis) (Metsalu and Vilo 2015).
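For readers unfamiliar with the marker indices, the sketch below computes expected heterozygosity and polymorphic information content from allele frequencies using the standard textbook formulas, H = 1 − Σp² and PIC = 1 − Σp² − ΣΣ 2p_i²p_j². Whether iMEC applies exactly these expressions to the binary band scores used here is an assumption; the function names and the toy allele counts are hypothetical.

```python
from itertools import combinations


def allele_frequencies(counts):
    """Convert raw allele counts at one locus into frequencies."""
    total = sum(counts)
    return [c / total for c in counts]


def expected_heterozygosity(freqs):
    """H = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over pairs i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Hypothetical locus with four equally frequent alleles.
freqs = allele_frequencies([10, 10, 10, 10])
print(expected_heterozygosity(freqs), pic(freqs))
```

PIC is always at most H for the same frequencies, which is why a marker can have H above 0.5 while its PIC falls below that threshold.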

Genetic structure and admixture analysis

The population structure was analyzed using the Bayesian clustering algorithm implemented in STRUCTURE. The program was run with K values from 1 to 12. A burn-in period of 50,000 iterations followed by 500,000 MCMC replications was used for each value of K. No prior information was used to define the clusters. The number of populations was determined by maximizing the Ln likelihood of the data for different values of K (Evanno et al. 2005), and the optimal K was taken at the peak of ΔK (Earl and VonHoldt 2012). Genotypes with affiliation probabilities of 60% or higher were assigned to a specific group, while those with affiliation probabilities below 60% were classified as admixed. Analysis of molecular variance (AMOVA) was performed in Arlequin. Based on the geographic location of the samples and on the population structure inferred for the investigated genotypes, Arlequin was used to calculate the pair-wise genetic distances and the population differentiation coefficients within and among populations (Excoffier and Lischer 2010).
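The ΔK criterion and the 60% assignment rule can be sketched in a few lines of Python. This is a minimal illustration, assuming the commonly used form ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd(L(K)) computed from the mean Ln P(D) over replicate runs. The number of replicate runs per K is not stated above, and the likelihood values in `toy_lnp` are invented for demonstration only.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Return {K: Delta K} from replicate Ln P(D) values.

    lnp_by_k maps each K to a list of Ln P(D) values (at least two runs per K).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # Delta K is undefined at the smallest and largest K
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # identical replicates would cause division by zero
        delta[k] = abs(means[k + 1] - 2 * means[k] + means[k - 1]) / sd
    return delta


def assign_cluster(memberships, threshold=0.60):
    """Return the index of the dominant cluster, or None if admixed."""
    best = max(range(len(memberships)), key=lambda i: memberships[i])
    return best if memberships[best] >= threshold else None


# Hypothetical Ln P(D) values, two replicate runs per K.
toy_lnp = {1: [-1000, -1002], 2: [-900, -902], 3: [-890, -894], 4: [-885, -891]}
delta = evanno_delta_k(toy_lnp)
print(max(delta, key=delta.get))  # K with the highest Delta K
```

In this toy input the likelihood rises sharply from K = 1 to K = 2 and then flattens, which is the pattern that produces a ΔK peak at K = 2.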

Results

Frequency of SSRs in the walnut genome

A total of 198,924 SSR loci were identified in the 647 Mb walnut genome. Among these, primers were successfully designed for 162,594 loci (Table S1). The frequency of SSRs within the genome was 428.71 per Mb. Overall motif analysis shows that the frequency of SSRs falls as motif length increases. Dinucleotide motifs were predominant and accounted for 88.40% (175,075) of total SSRs, followed by trinucleotides (17,184) with a frequency of 8.3%, while octanucleotides were least frequent (< 0.1%; Fig. 1a). The frequency of dinucleotide motifs was 377.312 SSRs/Mb, and the frequency of SSRs/Mb decreased with increasing motif length (Fig. 1b).
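The two summary quantities used in this section, SSR density per Mb and the percentage share of each motif class, are simple ratios. The sketch below shows the arithmetic with invented counts; the function names and all numbers in the example are hypothetical and are not taken from Table S1.

```python
def ssr_density(n_loci, assembly_bp):
    """SSRs per megabase: locus count divided by assembly length in Mb."""
    return n_loci / (assembly_bp / 1_000_000)


def class_shares(counts):
    """Percentage of total SSRs contributed by each motif class."""
    total = sum(counts.values())
    return {motif_class: 100 * n / total for motif_class, n in counts.items()}


# Hypothetical example: 500 loci found in a 2 Mb scaffold set.
print(ssr_density(500, 2_000_000))
print(class_shares({"di": 80, "tri": 15, "tetra": 5}))
```

Note that the density depends on which sequence length is used as the denominator (total assembly length versus the length actually scanned), so the denominator should be reported alongside the value.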

Fig. 1

Distribution of motifs and SSR density in the walnut genome. Distribution of motif numbers and their percentage (%); the motifs from dinucleotides to octanucleotides are shown by colored boxes (a). Frequency distribution of SSRs/Mb for di- to pentanucleotide motifs. The horizontal axis depicts the motif type, whereas the vertical axis indicates the frequency of SSRs/Mb (b)

Motif type and motif repeats

We examined the top 20 single and paired motifs. Dinucleotides ranked first among both single and paired motifs, accounting for 88.40% and 88.6% of motifs, respectively (Figs. 2, 3). Within each class, some motif types were more prevalent than others. For instance, the AT motif was markedly overrepresented among dinucleotide motifs (28%) (Fig. 2a). Additionally, an examination of repeat counts revealed that the dinucleotide AT motif had the highest frequency (114.33 SSRs/Mb). Among the trinucleotides, the AAT motif had the highest frequency (5.99 SSRs/Mb), while tetranucleotides and pentanucleotides had fewer SSR repeats (Fig. 2b). The paired motif AT/AT was the most common and accounted for 28%, like the single motif (AT), followed by the TA/TA paired motif (Fig. 3a). The highest number of SSRs/Mb was obtained for the motifs AT/AT and TA/TA, followed by other paired motif types (Fig. 3b). Interestingly, two trinucleotide SSR motifs (ATA and TAT) were repeated 81 and 62 times, respectively. Another intriguing finding was that heptanucleotides had more repeats than tetranucleotides and pentanucleotides (Fig. 4).

Fig. 2

Distribution of individual motifs and SSRs/Mb in Chandler walnut genome. Distribution of individual motif type, number, and percentage from dinucleotides to tetranucleotides, which are discriminated from each other by different colors (a). Frequency of individual SSR motifs from di to tetra (b). The horizontal axis depicts the motif type, whereas the vertical axis indicates the frequency of SSRs/Mb

Fig. 3

Distribution of motif type, quantity, and percentage of paired nucleotides (di to tetra), which can be distinguished from one another by their respective colors (a). Pair-wise frequency distribution for di-, tri- and tetra-SSR motifs. The vertical axis shows the frequency of SSRs/Mb and the horizontal axis displays the paired motif type (b)

Fig. 4

Distribution of top ten motifs (di to penta) with their repeat numbers

In silico PCR

Most of the unique SSR markers produced a single allele (84.35%), and the remaining markers produced two or more alleles (15.65%; Table S2). The number of in silico alleles ranged from 1 to 131, and the average number of amplicons per mapped marker was 1.20. We also found that 99.82% of markers generated ≤ 10 in silico PCR products and 99.97% generated ≤ 50 in silico products (Table S2).
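The idea behind counting in silico alleles can be illustrated with a deliberately simplified Python sketch: locate every exact forward-primer site, pair it with every downstream site matching the reverse complement of the reverse primer, and report the resulting amplicon sizes. This is not the e-PCR program used in the study; it assumes exact matches on one strand only, and the function names, size limits, and toy sequences are hypothetical.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_silico_pcr(template, fwd, rev, min_size=50, max_size=1000):
    """Return amplicon sizes for exact primer matches on the given strand."""
    template = template.upper()
    fwd = fwd.upper()
    rev_site = revcomp(rev.upper())  # how the reverse primer appears on this strand
    sizes = []
    start = template.find(fwd)
    while start != -1:
        end = template.find(rev_site, start + len(fwd))
        while end != -1:
            size = end + len(rev_site) - start
            if min_size <= size <= max_size:
                sizes.append(size)
            end = template.find(rev_site, end + 1)
        start = template.find(fwd, start + 1)
    return sizes


# Toy template: forward site, an (AT)5 repeat, then the reverse site.
toy = "TT" + "ACGTAC" + "GG" + "AT" * 5 + "CC" + "GATTCA" + "AA"
print(in_silico_pcr(toy, "ACGTAC", revcomp("GATTCA"), min_size=10))
```

A primer pair that returns one size corresponds to a single in silico allele, while a pair returning several sizes would be counted as a multi-product marker. Real e-PCR tools also tolerate mismatches and gaps and search both strands.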

Validation and marker efficiency of microsatellite markers

When fingerprinted, the majority of SSR primers (65) produced monomorphic bands. Therefore, the 72 walnut genotypes were fingerprinted using a set of 20 highly polymorphic SSR markers, which produced 118 alleles. The number of alleles per primer ranged from 2 to 12, with an average of 5.2. The primer WSSR008 yielded the most alleles (12), followed by WSSR001 and WSSR026 with ten alleles each (Fig. 5; Table 1). The primer WSSR3 amplified the minimum of two bands. The observed amplicon sizes (110–500 bp) were consistent with the e-PCR band sizes (161–393 bp) of the SSRs. The polymorphic information content (PIC) was ≥ 0.5 for 75% of the markers and below 0.5 for 25% of the markers, with overall values ranging from 0.391 to 0.605 and an average value of 0.184 (Table 1). The expected heterozygosity index (H) ranged between 0.081 and 0.625, with a mean value of 0.514. The discriminating power had a mean value of 0.474 and a range of 0.081–0.590 (Table 1).

Fig. 5

Electrophoretic profiles of four SSR markers. Lane M1 is the 100 bp DNA marker. Lanes L1–L48 are walnut genotypes; a = WSSR1; b = WSSR016; c = WSSR018; d = WSSR002

Table 1 List of highly polymorphic 20 SSR markers and their summary statistics

Genetic relationship and admixture analysis

To determine which genotypes are similar and which differ from one another, molecular data matrices are commonly analyzed with methods such as heatmaps and principal coordinate analysis (PCoA). The heatmap created from the SSR data set using Ward's linkage clustering and Euclidean distance indicated two distinct groups (Fig. 6). The genotypes are clustered as shown by the PCoA ellipses (Fig. 7), with PC1 and PC2 accounting for 22.2% and 10.1% of the molecular variation, respectively. The PCoA showed that the walnut accessions were divided into two primary clusters (Fig. 7). According to the PCoA results, the accessions from Anantnag and Pulwama form a single group and are clustered within the Shopian population that is encircled by a red ellipse, except for a single genotype at the edge of the ellipse. Many Shopian accessions were present in the other cluster. The clustering patterns of the PCoA and the heatmap are in agreement.

Fig. 6

The heatmap categorizes the 72 walnut genotypes into 3 populations. The blue and light squares of the heatmap indicate the presence (1) and absence (0) of a locus in a particular sample. Red, blue, and green represent the three populations

Fig. 7

PCA biplot categorizes the genotypes into single cluster (encircled by green) with admixture from Pulwama and Anantnag (encircled by red). The Anantnag population is shown by red circle, Pulwama population by blue square, and Shopian population by green triangle

We used a model-based approach to study the genetic structure of walnut. To identify the true number of populations, two distinct methods, a non-parametric method (Wilcoxon test) and the delta K method, were applied. The non-parametric method could not give the exact number of populations; therefore, the delta K method was applied (Fig. 8). The distribution of delta K values showed only one peak (Fig. 8a), at K = 2, indicating two distinct populations. Among the 72 genotypes, 28 were placed in subpopulation I and 43 in subpopulation II (Fig. 8b). A single genotype, SW-46, showed admixture. Furthermore, the analysis revealed that the overall proportion of membership of the samples in each of the two clusters was 39.43% in cluster I and 59.72% in cluster II, excluding the admixed member. The percentage of genotypes with a membership coefficient ≥ 90% was 87.5%, 11.11% had a membership coefficient ≥ 60%, and 1.39% of the genotypes had a membership coefficient of ≤ 5%. The membership coefficients in the bar plot revealed that accessions SW-05 and SW-25 have gene flow from cluster II (green) and that accessions SW-01, SW-29, SW-37, and SW-38 received genetic material from cluster I (red). Similarly, the allele-frequency divergence between the two subpopulations (net nucleotide distance) was 0.0669, and the average distance (expected heterozygosity or gene diversity) between individuals in the same cluster was almost identical in cluster I (0.2303) and cluster II (0.2312). The fixation index (FST) measures the genetic differentiation among populations and is one of the most important and frequently used parameters for describing population structure. The FST values estimated by STRUCTURE were greater in subpopulation I (0.3134) than in subpopulation II (0.2389). 
The AMOVA based on geographical origin of the samples revealed greater molecular variation within populations (92.04%) than among populations (7.96%), whereas the analysis based on population structure (K = 2) showed 87.38% of the molecular variance within populations and 12.62% among populations (Table 2). The FST among the populations was 0.06 to 0.12, which lies within the 0.05–0.25 range and indicates a moderate level of genetic differentiation.

Fig. 8

Structure stratification indicates 2 populations of 72 walnut genotypes (a). The red and green represent the members of the two groups or clusters inferred by STRUCTURE harvester (b)

Table 2 Analysis of molecular variance of 72 walnut genotypes partitioned into populations based on their geographic location and structure differentiation

Discussion

In the present study, the walnut genome downloaded from NCBI (National Center for Biotechnology Information) was mined to develop a large number of microsatellite markers. Genome-wide SSR markers have been successfully developed in various plant species including jujube (Xiao et al. 2015), apple (Zhang et al. 2012), citrus (Biswas et al. 2014; Duhan et al. 2020; Liu et al. 2013), pomegranate (Patil et al. 2020b, 2021), Bunium persicum (Bansal et al. 2022), pear (Xue et al. 2018), watermelon (Zhu et al. 2016), and bottle gourd (Bonthala et al. 2022). In the current investigation, we characterized 162,594 genome-wide microsatellite markers for this important crop. To the best of our knowledge, this is the first study on J. regia that presents such a large number of genome-wide microsatellite markers. Because of its larger genome size (647 Mb), the number of SSRs in walnut is large compared with other crops. For instance, only 28,342 and 39,523 SSRs were mined from the foxtail millet and watermelon genomes, owing to their smaller genome sizes (Zhu et al. 2016; Pandey et al. 2013). In comparison, the density of SSRs within the walnut genome was 428.71 SSRs/Mb. However, it is surprising that SSR densities among the various woody plants did not differ considerably (Liu et al. 2018a). According to other studies, genome size and SSR density are negatively correlated (Cavagnaro et al. 2010; Liu et al. 2013; Morgante et al. 2002). Such differences may be due to variation in the search parameters used to mine SSRs from the genomes (Zhu et al. 2016) or to the different sequencing and assembly methods (Xu et al. 2013). After validation, this SSR set will help the scientific community to develop saturated linkage maps and map useful traits in walnut, which was not feasible with the limited number of available SSR markers. 
In addition, a large set of SSR markers will make it easier to map QTLs precisely, identify and exploit genes that control critical traits, conduct genome-wide association studies, enable selective breeding through genomic selection, and infer population structure. Microsatellite markers play a major role in genetic improvement of cereals and grasses but are yet to be explored in horticultural crops. For instance, SSRs shed light on gene regulation and genome organization, genetic diversity (Zhao et al. 2014; Göl et al. 2017), crop domestication (Zhao et al. 2014), variety and scion source validation (Arab et al. 2022; Nickravesh et al. 2023), comparative mapping (Zhu et al. 2016; Wu et al. 2017), genetic map construction (Bali et al. 2015; Tan et al. 2013), and breeding studies (Dossa et al. 2017).

Out of 192,924 SSR loci identified, primers were successfully designed for 162,594 (84.27%) loci. In the present investigation, the requirement of a 200 bp flanking region around each SSR is the likely reason that markers could not be designed for 15.73% of the loci. Most of these SSR loci were located at either the beginning or the end of a scaffold. The failure to develop successful primer pairs for every detected SSR locus in plant genomes is consistent with earlier observations (Pandey et al. 2013; Sonah et al. 2011; Parida et al. 2009). The designed SSR primers were subjected to the electronic PCR (e-PCR) module to check amplification efficiency. It is difficult to validate each primer pair in a thermocycler, but the e-PCR module is very useful for rapid screening and effective identification of informative markers (Patil et al. 2020a, 2021; Duhan et al. 2020). Hence, each microsatellite marker developed in the present study was checked using the e-PCR module with default settings. When subjected to in silico PCR, the majority of SSRs produced a single allele; however, a few SSR primers produced multiple bands. Many researchers have used in silico PCR amplification modules to validate microsatellites mined from plant genomes (Biswas et al. 2020; Shi et al. 2014; Wang et al. 2015). Of the designed primers, 110 microsatellite markers with different motifs and the longest repeats were selected for validation, because longer repeats have higher mutation rates, which can result in a high frequency of polymorphism (Bhat et al. 2018; Cavagnaro et al. 2010; Wren et al. 2000).

The frequency of microsatellites is negatively correlated with motif length. Frequency analyses of the different repeat classes in walnut revealed that dinucleotide repeats are the most abundant SSRs, accounting for 88.4% of total SSRs, while hepta-nucleotide repeats were least abundant, representing only 0.1% of total microsatellites. These results are in agreement with numerous studies of various crop species (Liu et al. 2013; Najafi et al. 2014; Tangphatsornruang et al. 2009; Topçu et al. 2015; Xu et al. 2013; Zhang et al. 2007; Zhu et al. 2012). Microsatellite abundance decreased considerably with increasing number of motif repeats. The dinucleotide repeats experienced the slowest rate of change, while other longer repeats experienced a higher rate of change. The results were inconsistent with those of some other studies, in which tetranucleotide repeats were the most abundant in Cucumis sativus, Medicago truncatula, Populus trichocarpa, and Vitis vinifera, while trinucleotide repeats were the most abundant in Glycine max, Arabidopsis thaliana, Oryza sativa, Setaria italica, and Sorghum bicolor (Cavagnaro et al. 2010). This is most likely a result of the different SSR identification criteria used. Dinucleotide and trinucleotide SSRs had higher repeat counts, whereas tetranucleotides, pentanucleotides, and hexanucleotides had fewer repeats of the SSR motif. Several plant species, such as citrus (Liu et al. 2013) and watermelon (Zhu et al. 2016), showed similar tendencies.

There were apparent differences in the frequency of the motifs. The AT/AT motif was the most prevalent dinucleotide repeat in the walnut genome. Likewise, ATA/TAT and AAAT/ATTT were the most common trinucleotide and tetranucleotide motifs, respectively, indicating that AT-rich motifs are the most frequent throughout the walnut genome, possibly because AT motifs are unlikely to undergo mutations. The predominant motif differs among species. For instance, AG/CT is the most abundant motif in rice (Zhang et al. 2007) and citrus (Liu et al. 2013), whereas the AT/TA motif is abundant in maize (Xu et al. 2013), cucumber (Cavagnaro et al. 2010), pomegranate (Patil et al. 2021), pepper (Zhong et al. 2021), and watermelon (Zhu et al. 2012). Such studies indicate overrepresentation of different motifs in different plant species.

Molecular diversity analysis of J. regia genotypes based on 20 microsatellite markers revealed a high level of polymorphism among the walnut genotypes, indicating the suitability of these markers for studying genetic diversity. Microsatellite markers have been widely used for studying walnut genetic diversity (Ahmed et al. 2012; Bai et al. 2010; Dangl et al. 2005; Foroni et al. 2005; Gunn et al. 2010; Shah et al. 2020; Victory et al. 2006; Woeste et al. 2002; Karimi et al. 2010). All primers showed a high rate of amplification success. However, some of the primers did not amplify in all genotypes, suggesting that these genotypes are distant from Chandler. Because walnut is diploid, the SSRs produced a maximum of two bands per locus, in accordance with earlier reports (Ahmed et al. 2012; Najafi et al. 2014; Mahmoodi et al. 2019). However, some primers (Walnut primer-7 and Walnut primer-11) produced multiple bands, suggesting their multi-locus nature.

The utility of SSR markers depends on the markers themselves, the accuracy of genotypic data acquisition, and the planting material (Liu et al. 2018b, 2017). We were able to find 20 highly polymorphic SSR markers that amplified distinct and consistent bands across the 72 walnut genotypes. The sizes of the amplified products were on par with the expected size of each locus, which shows that the primer binding sites were highly conserved. A few SSR markers produced low PIC values (< 0.5), while the majority produced PIC values > 0.5. The low PIC values may be due to the location of these markers in coding regions. SSRs found in coding regions are less prone to mutation than non-coding genomic SSRs (Kalia et al. 2011). The average PIC value of our SSR markers was lower than that reported by Guney et al. (2021). The variation in PIC values may be due to sampling technique, number of SSR markers, the size and type of SSR motif repeats, and the location of the SSR motifs in the genome (Orhan et al. 2020). The PIC value of the majority of the newly developed SSR markers is > 0.5, demonstrating their suitability for phylogenetic and diversity studies as well as construction of linkage maps (Biswas et al. 2014). The present study reports 5.2 alleles per primer, which is much lower than the 23.8 alleles per primer reported by Victory et al. (2006). It is interesting to note that, compared to agarose, MetaPhor agarose gel electrophoresis, polyacrylamide gel electrophoresis, and automated capillary DNA fragment analyzers reveal higher polymorphism (Ebrahimi et al. 2011; Dangl et al. 2005; Patil et al. 2020a). We anticipate that our polymorphic SSR markers would reveal a higher number of alleles if assayed through automated capillary systems or polyacrylamide gel electrophoresis. 
The variation in number of alleles amplified may also be due to highly diverse nature of the samples and number of SSRs tested.

Unraveling the degree of genetic diversity is necessary for accelerating walnut genetic improvement. To achieve this, molecular marker technologies, such as SSRs, have become a promising method for identifying genetic variation in a set of genotypes. In this context, the heatmap, PCoA, and structure analysis methods were effectively used to measure the genetic relationships and population differentiation (Ebrahimi et al. 2016; Shah et al. 2020; Pollegioni et al. 2011, 2015). According to Roor et al. (2017), the Himalayan range of Jammu and Kashmir is the native range of J. regia. The fragmentation and geographic isolation of the walnut populations in this area occur due to gene flow barriers and other natural factors (Pollegioni et al. 2015). This led to population differentiation in the natural range of walnut. However, other factors, such as human activities, can also contribute to the genetic structure of the autochthonous population (Gunn et al. 2010). Therefore, the population genetic structure revealed by our genetic data needs to be integrated with historical and linguistic sources to determine whether it is the product of natural factors, anthropogenic dispersal, or human cultural interactions. We observed higher molecular variance within the walnut populations, which may be attributed to the predominant cross-pollination of walnut (Victory et al. 2006; Pollegioni et al. 2014) and the higher gene flow. The low molecular variance among populations is related to long separation, avoidance of long-distance pollination, and the fragmented character of the populations, which causes pollination among near relatives only. These results are in accordance with earlier studies (Magige et al. 2022; Wang et al. 2022; Zhang et al. 2022). Therefore, when selecting J. regia material with high genetic diversity for the genetic improvement of walnut, individuals should be selected from within populations.

Conclusion

Walnut is an economically important nut crop with high diversity. Its long juvenile period is a bottleneck for genetic improvement. For walnut speed breeding, it is imperative to identify markers tightly linked to economic traits. Rapid progress has been made in the development of genomic tools over the past few years, such as the release of the genome sequence, which created new prospects for the development of numerous genetic markers like SSRs. To explore this opportunity, we identified 198,924 SSR loci and successfully designed primers for 162,594 of them. As 100 out of 110 tested SSRs amplified in the walnut genotypes, the e-PCR results suggest that most SSRs developed in the current study will generate an amplicon across walnut genotypes. The majority of our SSRs had PIC values above 0.5, which shows their suitability for assessing genetic diversity and population structure. To the best of our knowledge, this is the first study to scan SSRs genome-wide in walnut, and we present a microsatellite repository for the walnut scientific community. These SSRs will be helpful for walnut improvement, including the development of saturated genetic linkage maps, genetic structure analysis, QTL mapping, and marker-assisted selection.