Introduction

English or Persian walnut (Juglans regia L., 2n = 32) is a monoecious tree native to the mountain chains of Central Asia. It grows as a wild or semi-cultivated tree over a wide area from south-eastern Europe and the Caucasus to Turkey and Iran, through southern portions of the former Soviet Union into China and the eastern Himalayas. It is the most economically important of the 21 species belonging to the genus Juglans and the only one widely cultivated for its edible nuts. J. regia is an important and healthy food as well as a raw material for the timber industry. Recent epidemiological studies have associated walnut consumption with a reduced incidence of cardiovascular mortality [1].

Nuclear ribosomal genes (rDNA) are constituents of individual 18S-5.8S-28S repeats, tandemly repeated at one or more chromosomal loci. In plant genomes, variations in location and number of rDNA sites are commonly observed and the rDNA site loss or duplication is correlated with polyploidization events [2].

Internal transcribed spacers (ITS1 and ITS2) are part of the 18S-5.8S-28S nuclear ribosomal cistrons. In plants, ITS sequences vary in length from 500 to 700 bp in angiosperms to 1,500–3,700 bp in some gymnosperms [3, 4]. The ITS region evolves relatively rapidly and is a popular tool for reconstructing molecular phylogenies at the specific, generic or family levels [5–7]. ITS1, located between the 18S and 5.8S genes, may be especially valuable at the species level and below [8]. ITS2, located between the 5.8S and 28S genes, has proven to contain useful biological information at higher taxonomic levels [9]. ITS sequence data have provided, and may continue to provide, insight not only into phylogenetic history but also into polyploid ancestry, genome relationships, historical introgression and other evolutionary questions [10, 11]. The ITS regions and the 5.8S gene of the nuclear ribosomal DNA have been used successfully to investigate phylogenetic and biogeographic relationships within the genus Juglans, with results consistent with the known geological history of Juglans [12].

It has been observed that the functionality of the ITSs is related to specific post-transcriptional modification by cleavage of the primary transcript within ITS1 and ITS2 during maturation of the small subunit (SSU), 5.8S and large subunit (LSU) ribosomal RNAs. The structural integrity of the ITSs is an essential prerequisite for the correct processing of mature rRNA and for the biogenesis of active ribosomal subunits [13]. ITS sequences are subject to evolutionary constraints related to the maintenance of specific secondary structures that provide functionality [14, 15].

It has long been recognized that rDNA belongs to multigene families and that ITS sequences in particular may be subject to concerted evolution, which occurs primarily through unequal crossing over and gene conversion [16, 17]. Concerted evolutionary mechanisms can homogenize the sequence type within a genome, leaving only species- and clade-specific character-state changes. However, following hybridization, introgression, polyploidization or ITS pseudogenization events, a plant genome may harbour divergent ITS sequence types, which can be recovered by more exhaustive sampling of “ITS clones” from a single genome [1].

Traditional methods to identify walnut cultivars are based on phenotypic observations, but morphological characteristics are often affected by environmental and developmental factors, making differentiation difficult and sometimes impossible. For this reason, morphological characteristics cannot be used for screening large numbers of walnut samples, and a simple method of DNA analysis is clearly desirable as an alternative to the traditional authentication methods. First described by Newton and colleagues [18], the amplification refractory mutation system (ARMS) has become a standard technique that allows the discrimination of alleles at a specific locus differing by as little as 1 bp [19]. The basis of ARMS is that oligonucleotides with a mismatched 3′-residue will not function as primers in PCR under appropriate conditions [18].
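The ARMS principle described above can be illustrated with a minimal sketch. It reduces the chemistry to one rule (a primer is extended only if its 3′-terminal base matches the template) and ignores annealing temperature and mismatch type, which matter in practice. The function name and the toy alleles are hypothetical and are not taken from this study.

```python
def arms_primer_extends(primer: str, template: str, start: int) -> bool:
    """Return True if the primer's 3'-terminal base matches the template.

    `template` is the sense strand and the forward primer is written in the
    same 5'->3' orientation, so a perfect 3' match means identical letters.
    `start` is the 0-based template index where the primer's 5' end anneals.
    """
    three_prime_index = start + len(primer) - 1
    if start < 0 or three_prime_index >= len(template):
        raise ValueError("primer does not fit on the template")
    return primer[-1].upper() == template[three_prime_index].upper()


# Hypothetical alleles differing by a single base at index 7.
allele_a = "ACGTTGCA"
allele_b = "ACGTTGCG"
primer_for_a = "GTTGCA"  # 3' base is the allele-A nucleotide

print(arms_primer_extends(primer_for_a, allele_a, 2))  # True
print(arms_primer_extends(primer_for_a, allele_b, 2))  # False
```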

The aim of this study was to develop a simple DNA-based technique for the specific authentication of 18 walnut cultivars. We successfully employed ARMS-PCR to identify the cultivars by exploiting SNPs in the internal transcribed spacers (ITS1 and ITS2) of the 18S-5.8S-28S nuclear ribosomal DNA.

We found cultivar-specific polymorphisms in the ITS1 and ITS2 regions, and even in the 5.8S ribosomal RNA gene, which were used in ARMS analysis for cultivar discrimination. Moreover, we explored the ITS1 and ITS2 secondary structures to examine the effects of the SNPs on the ITS1 and ITS2 folds.

Materials and methods

Plant materials

A total of 18 cultivars of walnut (J. regia L.) were used in this study. The cultivars originated from distinct geographical areas: Italy, Hungary, France, USA, Spain, Portugal and Greece (Table 1). The plants investigated were taken from the collection field of the Fruit Tree Research Unit.

Table 1 English walnut J. regia L. cultivars studied with their geographic origin

Genomic DNA extraction

The extractions were performed in triplicate using young leaves randomly sampled from adult trees and frozen in liquid nitrogen until use. For each cultivar investigated, genomic DNA was isolated using 1 g of young leaves from three individual plants following the protocol described by Doyle and Doyle [20], and purified with the ICRISAT DNA extraction procedure [21] to improve the DNA quality. DNA quality and quantity were determined spectrophotometrically and by electrophoresis on 1% agarose gel stained by ethidium bromide [22].

PCR amplification and cloning

The ITS1-5.8S-ITS2 region, approximately 700 bp long, was amplified using the forward primer 18S (5′-AAGTCGTAACAAGGTTTCCGTA-3′) and the reverse primer 28S (5′-CCCGCTTATTGATATGCTTAAA-3′). The forward and reverse primers, located at the 3′ end of the 18S rDNA and the 5′ end of the 28S rDNA, respectively, were designed according to the known sequence of J. regia (GenBank accession no. AF399876). PCR was performed in a total volume of 50 μl, and the reaction mix consisted of 80–100 ng of template DNA, 200 μM of each dNTP (Roche), 200 μM of each primer and 2.5 U of Fast Start High Fidelity Taq DNA polymerase in 10 mM Tris/HCl pH 8.3 and 2.5 mM MgCl2 (Roche). A 9800 PerkinElmer thermal cycler (Applied Biosystems) was used to carry out PCR with a cycling profile of an initial denaturation step at 94°C for 7 min, followed by 35 cycles of 30 s denaturation at 94°C, 30 s annealing at 58°C and 30 s extension at 72°C. The final extension was at 72°C for 7 min. PCR products were gel-purified using the High Pure PCR Product Purification Kit (Roche), in accordance with the manufacturer’s instructions, and cloned into the pGEM-T Easy Vector System II (Promega). Five independent clones for each cultivar were sequenced in both directions using an automated DNA sequencer (ABI PRISM 310, Applied Biosystems/Perkin-Elmer) with universal SP6 and T7 primers. The sequencing run of both strands was repeated to confirm the sequence data.

DNA sequence analysis

Nucleotide sequence data were analyzed and compared to the GenBank-NCBI databases using the BLAST network service (http://www.ncbi.nlm.nih.gov/BLAST/). Multiple sequence alignments were performed using AliBee—Multiple Alignment 2.0 (http://www.genebee.msu.su/services/malign_reduced.html) with Clustal W algorithm [23] from the DDBJ Homology Search system (http://www.ddbj.nig.ac.jp).

ARMS-PCR strategy

For each investigated population, one or more specific nucleotide positions in ITS1 and ITS2 were identified. For each population, forward primers carrying the population-specific nucleotide at the 3′ end and common reverse primers (located in the 5.8S rDNA) were designed. Primers for the internal control (control 1 and control 2) were also designed. The primer sequences are listed in Table 2. Melting temperature (Tm) calculations were performed with the Oligo Calc: Oligonucleotide Properties Calculator server (http://www.basic.northwestern.edu/biotools/oligocalc.html).
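As a rough illustration of the primer design step, the sketch below extracts a forward primer whose 3′ end falls on a chosen SNP position and estimates its Tm with a commonly cited basic approximation. The function names, the 20-nt default length and the example sequence are assumptions made for illustration; the text does not state which calculation mode of Oligo Calc was used, so the formula here is not necessarily the one behind Table 2.

```python
def arms_forward_primer(sequence: str, snp_index: int, length: int = 20) -> str:
    """Return the forward primer ending (3') exactly on the SNP position.

    `snp_index` is the 0-based index of the SNP in the sense strand.
    """
    first = snp_index - length + 1
    if first < 0 or snp_index >= len(sequence):
        raise ValueError("primer would run off the sequence")
    return sequence[first:snp_index + 1].upper()


def basic_tm(primer: str) -> float:
    """Basic Tm estimate in degrees C (assumes only A, C, G and T).

    Shorter than 14 nt: 2*(A+T) + 4*(G+C).
    Otherwise:          64.9 + 41*(G+C - 16.4) / N.
    """
    primer = primer.upper()
    n = len(primer)
    gc = primer.count("G") + primer.count("C")
    if n < 14:
        return 2.0 * (n - gc) + 4.0 * gc
    return 64.9 + 41.0 * (gc - 16.4) / n


# Hypothetical sequence with the SNP at index 11.
toy = "AAAACCCCGGGGTTTTACGT"
primer = arms_forward_primer(toy, snp_index=11, length=8)
print(primer, basic_tm(primer))
```

Salt-adjusted or nearest-neighbour estimates would give different values, so the numbers from this sketch should be read only as a first approximation.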

Table 2 ARMS-PCR primers

Following a modified version of the PCR protocol described by Newton and colleagues [18], ARMS-PCRs were performed in a volume of 50 μl containing approximately 80 ng of genomic DNA, 1.5 mM of each dNTP (Roche), 1 μM of each primer and 1 U of Taq DNA polymerase in 10 mM Tris/HCl pH 8.3, 2.5 mM MgCl2 (Roche). The control 1 and control 2 primers were included in all reactions to provide an internal control PCR product. The PCR cycling program consisted of a preliminary denaturation at 94°C for 4 min, followed by 35 cycles of 30 s at 94°C, 45 s at 50 or 66°C, and 30 s at 72°C, with a final extension step of 7 min at 72°C. PCR products were then electrophoresed on 2% agarose gels and visualized in the presence of 0.5 μg/ml ethidium bromide.

A phylogenetic tree was constructed using the Neighbor-Joining algorithm implemented in MEGA 3.1.

The percentage of GC nucleotides in ITS1 and ITS2 from each cultivar was calculated using the GC calculator server (http://www.genomicsplace.com/gc_calc.html).
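The GC percentage reported by such a calculator is simply the share of G and C among all bases. A minimal sketch follows; the function name is hypothetical and the web server's handling of ambiguous bases may differ from the plain count used here.

```python
def gc_percent(sequence: str) -> float:
    """Percentage of G and C bases in a DNA or RNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


print(gc_percent("ATGC"))  # 50.0
```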

The RNA secondary structures of ITS1 and ITS2 were predicted with the mfold version 3.2 program (http://www.bioinfo.rpi.edu/applications/mfold), available on the World Wide Web [24]. Mfold predicts RNA structures, including suboptimal ones, by free-energy optimization at a default temperature of 37°C according to the Zuker dynamic programming algorithm [24]. The common secondary structures of the ITS1 and ITS2 RNAs were predicted using the Pfold server (http://www.daimi.au.dk/~compbio/pfold). The method generates a statistical sample of individual structures from an alignment of RNA sequences using an algorithm based on an explicit evolutionary model and a probabilistic model of structures [25].

RNA fold

The Sribo program on the Sfold (Statistical Folding and Rational Design of Nucleic Acids) server was used to predict the probable accessible target sites (loops) for trans-cleaving ribozymes in ITS1 and ITS2 [26]. The probability profiling approach of Ding and Lawrence [26, 27] reveals target sites that are commonly accessible in a large number of statistically representative structures of the target RNA. This approach bypasses the long-standing difficulty in accessibility evaluation caused by the limited representation of probable structures, and it offers high statistical confidence in the predictions. The probability profile for individual bases (W = 1) is produced for a region that includes the cleavage triplet (GUC by default) and two flanking sequences of 15 bases each, at every site of the selected cleavage triplet.
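The region definition used for the profile (a cleavage triplet plus 15 flanking bases on each side) can be sketched as a simple window extraction. This covers only the selection of regions, not the structure sampling or the probability profile computed by Sfold. The function name and the truncation of windows at the sequence ends are assumptions.

```python
def cleavage_windows(rna: str, triplet: str = "GUC", flank: int = 15):
    """List (position, window) pairs for every occurrence of `triplet`.

    Each window holds the triplet plus up to `flank` bases on either side;
    windows are truncated at the sequence ends (an assumption of this sketch).
    DNA input is accepted and converted to RNA (T -> U).
    """
    rna = rna.upper().replace("T", "U")
    windows = []
    pos = rna.find(triplet)
    while pos != -1:
        start = max(0, pos - flank)
        end = min(len(rna), pos + len(triplet) + flank)
        windows.append((pos, rna[start:end]))
        pos = rna.find(triplet, pos + 1)  # step by 1 to catch every site
    return windows


for position, window in cleavage_windows("GUCGUC", flank=1):
    print(position, window)
```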

Results and Discussion

Isolation and sequence analysis of ITS1-5.8S-ITS2 clones from 18 J. regia cultivars

Sequence data have been deposited in the GenBank Data Library under accession number

Table 3

We determined and compared the nucleotide sequences of the ITS1-5.8S-ITS2 region obtained from multiple sequence alignment of the five clones for each J. regia cultivar; the cultivars belong to seven geographic areas (Fig. 1; Table 1).

Fig. 1

Multiple sequence alignment of 18S-ITS1–5.8S-ITS2–28S rDNA from 18 walnut cultivars. Names of cultivars are listed on the left. ‘.’ indicates nucleotides identical to those of the J. regia consensus sequence; ‘-’ indicates a deletion. SNPs are indicated by letters

The length of the ITS regions in the cultivars investigated was similar to that expected [17, 28]. Length compensation between ITS1 and ITS2 is a family-specific trait of Juglandaceae and of other families such as Betulaceae, Scrophulariaceae and Viscaceae [4, 17]. ITS1 varied from 257 to 263 bp, the 5.8S region was 164 bp and ITS2 varied from 217 to 219 bp (Table 3). The average G+C content of ITS1, 5.8S and ITS2 from the 18 J. regia cultivars was 54.3–58.9% (Table 3). It has long been recognized that the G+C content of ITS1 and ITS2 is balanced in Juglandaceae, as in most organisms, and it has been suggested that a high GC content helps maintain the stability of DNA and RNA secondary structures, especially in stem-loops [14]. Multiple sequence alignment of the five ITS1-5.8S-ITS2 clones from each cultivar showed that the sequences were identical within cultivars. For each cultivar, the consensus sequence was used for the multiple alignment shown in Fig. 1.

Table 3 ITS1, 5.8S, ITS2 length and G+C % content for each cultivar analysed

Variations in the ITS1 and ITS2 sequences analysed included transition, transversion, substitution and deletion events. The aligned ITS dataset showed 244 variable positions, 168 in ITS1, 76 in ITS2 and 2 in the 5.8S region. There was an AC insertion/deletion in the ITS1 region (Fig. 1). Transitions (60%) predominated over transversions (40%) (Table 4). Transitions from C to T play an important role in the ITS evolution of plants by achieving part of the GC balance recognized in the spacers [14]. Moreover, CpG and CpNpG are the main sites of methylation in plant rDNA.
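The distinction between transitions (purine to purine or pyrimidine to pyrimidine) and transversions (purine to pyrimidine or the reverse) can be made concrete with a short sketch that tallies both classes for two aligned sequences. The function names and the example sequences are hypothetical; gap columns are skipped, which is one possible convention and not necessarily the one used for Table 4.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_substitution(a: str, b: str):
    """Return 'transition', 'transversion', or None if not a substitution."""
    a, b = a.upper(), b.upper()
    if a == b or a not in BASES or b not in BASES:
        return None  # identical, gap, or ambiguous character
    same_class = (a in PURINES) == (b in PURINES)
    return "transition" if same_class else "transversion"


def count_ts_tv(seq1: str, seq2: str):
    """Count (transitions, transversions) between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    kinds = [classify_substitution(a, b) for a, b in zip(seq1, seq2)]
    return kinds.count("transition"), kinds.count("transversion")


print(count_ts_tv("ACGT", "GCTT"))  # (1, 1)
```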

Table 4 Mutations in ITS1, ITS2 and rDNA 5.8S sequences of 18 walnut cultivars

SNPs are an excellent source of high-quality markers for the rapid identification of single-base differences within the genome [29]. SNPs are the most widely used molecular markers because they occur at a high density throughout the genome. They were used for the first time to classify 51 olive cultivars, allowing 49 varieties of olives grown for oil production in the Mediterranean area to be discriminated and new cultivars to be identified [30]. ITS sequences were also used to develop a molecular marker to distinguish Tribulus terrestris L. (Zygophyllaceae) from its adulterants [31]. ITS sequences show a relatively high number of SNPs and are easily isolated by PCR, because they are located between highly conserved sequences (the 18S and 28S genes) that serve as target sites for primer design. These two characteristics make ITS an important means of reconstructing molecular phylogenies and fingerprinting plants.

Relationship between Juglans cultivars

A Neighbour-Joining tree based on the nucleotide sequence alignment was constructed for the whole ITS1-5.8S-ITS2 region of the 18 cultivars. Two major clades of Juglans cultivars were recognized (Fig. 2). Clade I included 12 cultivars from Italy, France, Portugal, Spain, Greece, Hungary and USA (Sorrento, Castronuovo, Freni 2, Lara, Rego, Grand Jefe, Del Carril, FK5, FM6, A117, Hartley and Chico C) and J. regia (accession no. AF399876). Clade II comprised two French cultivars (Franquette and Marbotte), one Italian cultivar (Chiusa) and three American cultivars (Vina, Payne, Gustine).

Fig. 2

Phylogenetic tree depicting the relationship among different J. regia cultivars based on rDNA ITS sequences. The tree was inferred using the Neighbor-Joining method. Bootstrap values are shown as percentages at each node based on 500 replicates

In a few cases, cultivars that are geographically close or grow in similar pedoclimatic conditions were more similar to each other than to populations grown in different geographic areas and under different conditions. This is likely due to the movement of J. regia between Europe and North America and to phenomena such as high introgression during domestication, which results in an impoverishment of the gene pool [17], hybridization and polyploidization events [32], and/or incomplete intra- or inter-array homogenization processes [6].

SNPs effects on ITS1 and ITS2 secondary structure

Among the molecular markers used for walnut taxonomy, the ITS1-ITS2 of the ribosomal DNA genes are useful for distinguishing among specimens belonging to closely-related species.

Because the structural elements of ITS sequences are essential for the specific cleavage steps during ribosomal RNA maturation, we analyzed the effects of the SNPs on the ITS1 and ITS2 folds and compared the secondary structures of the cultivars investigated.

The secondary structures of ITS RNA sequences provide additional evolutionary information [33–35] and a rather simple molecular marker, which may be particularly useful when studying closely related species [36]. We relied primarily on the free-energy minimization approach to secondary structure inference, which assumes that the dominant interactions (H-bonding between bases and stacking between adjacent bases) are local and that the conformations adopted by RNA are equilibrium, lowest free-energy conformations [37].

However, the thermodynamically optimal structure does not necessarily reflect the in vivo structure, since (largely unknown) interacting factors such as other molecules (e.g. proteins) or tertiary structure constraints may intervene. Consequently, common structural elements of rRNA transcripts should be reconstructed not only by energy optimization (provided by mfold) but also by homologizing internal regions of sequences through mutual comparison (the ‘phylogenetic comparative method’), i.e. by weighing the plausibility of thermodynamically optimal and suboptimal hypotheses. Predicted ITS1 structures of the 18 J. regia cultivars are given in Fig. 3. The stems (double-stranded paired regions) are assumed to stabilize RNA secondary structures [38]. Comparison of the ITS1 secondary structures showed that all cultivars shared a common basic topology but differed in the number of loops, owing to the sequence variability found at the 5′ end of ITS1, which alters the overall structure of the ITS1 folds. These results suggest that the mutations affecting ITS1 secondary structure in different walnut cultivars have not simply accumulated at random but have been shaped by functional selection in ribosome biogenesis to preserve target accessibility, an essential prerequisite for correct RNA processing.

Fig. 3

ITS1 and ITS2 transcript secondary structures. Major helices are labeled I–III for ITS1 (a) and ITS2 (b), respectively

All sequences contain the following ITS1 motif: ‘GGCG-CGGTCT-GCGCCAAGGAA’ which corresponds to the published conserved angiosperm ITS1 motif: ‘GGCG-(4–7n)–GYGYCAAGGAA’ (where Y = C or T) by Liu and Schardl [39] (Table 5). In previous studies on many flowering plants this characteristic sequence has been reported in the middle of ITS1 and this sequence is presumed as a recognition site for processing of a primary transcript into the structural rRNA [39]. The highly conserved specific region of ITS1 which was found in our study, as well as for example in the Asteraceae [40] and in the Rosaceae genus Rosa [41], supports its conservative status and also its important role in secondary structure forming.
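The correspondence between the walnut sequence and the degenerate angiosperm motif can be checked mechanically by expanding IUPAC ambiguity codes into a regular expression. The sketch below is only an illustration of that check: the helper names are hypothetical, the spacer "(4–7n)" is encoded as four to seven unspecified bases, and the hyphens in the published motif are treated as visual separators.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Translate a degenerate motif (no separators) into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


# GGCG-(4-7n)-GYGYCAAGGAA, where Y = C or T.
ITS1_MOTIF = re.compile("GGCG" + "[ACGT]{4,7}" + iupac_to_regex("GYGYCAAGGAA"))


def has_its1_motif(sequence: str) -> bool:
    """True if the conserved ITS1 motif occurs anywhere in the sequence."""
    return ITS1_MOTIF.search(sequence.upper().replace("U", "T")) is not None


# Walnut motif from the text, written without the separating hyphens.
print(has_its1_motif("GGCGCGGTCTGCGCCAAGGAA"))  # True
```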

Table 5 Conserved ITS1 and ITS2 motifs in J. regia

A general secondary structural model could be set up, the main features of which are: (1) three highly conserved helices; (2) helix I, containing the UGUAAUG motif in the loop; (3) helix II, which is longer than the other two, highly conserved and contains a CUCCUCGUGUG motif; (4) helix III, containing the GGAAAC motif (Fig. 3). An additional helix between helices II and III, carrying the CGGUCUG motif, is present in Payne, Castronuovo, Franquette, Sorrento, Marbotte, Hartley, Gustine and Chico C and absent in Del Carril, Grand Jefe, A117, Freni 2, Rego, FK5, FM6, Lara, Chiusa and Vina. In addition, Castronuovo, Franquette, Marbotte, Chiusa, Vina, Sorrento, Grand Jefe, A117, Freni 2, Rego, FK5, FM6 and Lara present a further helix with the UAAACAAGG motif (Fig. 3).

The secondary structure of the walnut ITS2 sequences mainly corresponds to the common secondary structure (helices I–IV) found in angiosperms [15, 40, 42]. Given the position of the stem-loop regions of both spacers among angiosperms, it has been proposed that these act as a scaffold for the processing of the coding regions [39].

In the case of ITS2, distinct hallmarks of a core structure have been described: (1) four helices; (2) helix I, which presents the CUUAUG motif in the loop; (3) helix II, which contains the CUUCUG motif with a pyrimidine (U–U) mismatch that appears to play the most important role [15]; and (4) helix III, the longest, which contains a UGAGAA motif in the loop. A substitution from GT to CT and AG was observed in cultivars Chiusa and Marbotte, resulting in the absence of the UGAGAA motif in the loop of helix III.

Within helix III, all sequences contain three conserved motifs: ‘GCGCCACGACAATCGGTGGTTGAGA’, which corresponds to the published conserved angiosperm motif 4 ‘NNH-N-HRRYNNNAYGGTGGTWNNN’; ‘GTGTTGCC’, which corresponds to motif 5 ‘NYGYNGYN’; and ‘GCTC’, which corresponds to motif 6 ‘RCYY’ [39]. These motifs play an important role in the stem-loop formation of the secondary structure of ITS2.

A complex network of interactions takes place to assemble the pre-rRNA structural features directly involved in processing steps into a relatively compact structure. The identification of common structural elements indicates equivalent functionality of the corresponding molecules [43].

Subtle secondary structural motifs may participate in the ITS excision process, spatially positioned by the conserved framework of helices.

The biological function of an RNA molecule is determined by its secondary structure, and the sequence variations that contribute to differences between species are those that preserve the RNA functions [44]. The probability profiling approach revealed that the interior loops are the preferential target sites, whereas the exterior loops are target sites of difficult accessibility. Our results suggest that the changes in the ITS folds caused by the SNPs do not affect target accessibility, an essential prerequisite for correct RNA processing. Critical changes in the rRNA folding pattern due to evolutionary sequence variation in the ITS regions may have an important role in the kinetics of precursor rRNA formation for the efficient functioning of the rDNA cluster [44].

ARMS-PCR

Molecular characterization of Juglans cultivars was performed via ARMS-PCR. As shown in Fig. 1 there were 7 variable sites in ITS1 and 12 in ITS2.

The primers designed to differentiate the 18 J. regia cultivars were highly specific, as only the targeted cultivar was amplified by the corresponding primers. The 90 bp internal control was generated in all PCR amplifications, whereas no product was observed when the cultivar-specific primers were applied to other cultivars. We designed specific primers that carry at their 3′ ends the nucleotide matching the SNP, and used them in conjunction with a single common primer (Table 2). Mismatches at the 3′-OH terminus of the specific primers (purine/purine, pyrimidine/pyrimidine and purine/pyrimidine mismatches) were refractory to extension by the DNA polymerase, which lacks 3′-exonucleolytic proofreading activity [18], thus probably preventing the specific primer from priming on template DNA from other cultivars.

In particular, the C insertion located next to CpC residues at position 126 in the ITS2 region of Chiusa may be the result of C residue duplication due to replication slippage and short mismatches during replication [45]. A substitution from TA to GC at positions 7 and 8 in the ITS1 region is fixed in Payne, Gustine, Chico C and Hartley as population-specific single point sites. The primers designed allowed seventeen of the 18 J. regia cultivars to be differentiated (Fig. 4).

Fig. 4

ARMS-PCR strategy based on the SNPs. Lanes 1–18: PCR amplification products for ITS1 (a) and ITS2 (b) and the 90 bp internal control obtained from the genomes of Sorrento, Payne, Hartley, Chico C, Gustine, Freni 2, Vina, Chiusa, Franquette, Marbotte, Lara, Castronuovo, Grand Jefe, A117, Del Carril, FM6, FK5, Rego

SNPs can provide useful molecular markers for the development of an assay that discriminates different J. regia cultivars by using primers that hybridize with the specific nucleotide substitutions. Our results demonstrate the feasibility and usefulness of ARMS-PCR for differentiating seventeen of the 18 J. regia cultivars, suggesting that ITS-specific sequence variation could be used as a molecular marker in cultivar identification and discrimination. Molecular characterization can also be performed using other molecular markers such as RAPD, ISSR, RFLP and SSR. These methods are useful for revealing genetic polymorphisms among different walnut cultivars, but RAPD and ISSR are not suitable for cultivar identification because of their lack of reproducibility. RFLP is laborious, because it requires treating amplified DNA with restriction enzymes and maintaining probe clones. The drawback of SSR is that the size differences between the products amplified from each allele are usually small, complicating reliable scoring by standard agarose gel electrophoresis (Fig. 4).

In recent years, with increasing interest in the plant systematics community in resource improvement and conservation, several studies have been undertaken on new marker-assisted breeding development [46], molecular authentication of medicinal plants using PCR-RFLP and ARMS strategies [47–51], and genetic assessment based on ITS sequence polymorphism of commercially important cultivars and species of authentic geographical localization [51–53]. Because variety/species identification is an important link between the conservation and the utilization of a commercially important plant genetic resource such as J. regia, new and fast molecular authentication methods for cultivar discrimination are needed. The ARMS strategy is routinely used by molecular geneticists to identify new disease-associated mutations with good genotyping accuracy [18, 54, 55].

This strategy may therefore be a good tool for comparative genetic studies of agronomically important cultivars and for the identification of unnamed local cultivars, and can also provide useful information on the uniqueness of accessions from germplasm collections.