Introduction

Crop improvement relies on the effective use of molecular marker technologies. Numerous kinds of molecular markers have been developed over the last 25 years. Diversity array technology (DArT) involves the use of methylation-sensitive restriction enzymes to digest genomic DNA, thereby reducing the genome complexity and enriching the low copy sequences for marker development. This technique has been shown to be reproducible and cost effective. DArT technology allows the simultaneous typing of several hundred polymorphic loci spread over a genome without any previous sequence information (Jaccoud et al. 2001; Wenzl et al. 2004). For these reasons, DArT markers have been largely used in plant genetic studies, as anonymous genomic markers. In the present study, information derived by a posteriori sequencing of DArT clones in wheat was used to provide valuable knowledge of the features and functional meaning of DArT sequences; these findings are of great interest for mapping studies in wheat, as well as in many other species.

DArT arrays have been developed for a variety of species, from Arabidopsis (Wittenberg et al. 2005) to crops, including barley (Wenzl et al. 2004) and wheat (Akbari et al. 2006). Recent studies have demonstrated the suitability of DArT markers for genetic studies in nonmodel organisms, such as seed-free land plants (James et al. 2008) and wild species (Alsop et al. 2011). The allocation of DArT markers in common wheat was greatly increased through a study by Francki et al. (2009), which was based on the alignment of genetic and cytogenetic maps. Furthermore, increased densities of genetic maps in specific genomic regions of wheat have been obtained through the coupling of DArT technology with chromosome sorting (Wenzl et al. 2010). Over the last few years, a dedicated DArT genotyping platform has also been developed for durum wheat, which has allowed the incorporation of new markers into the genetic maps (Mantovani et al. 2008; Peleg et al. 2008).

The construction of genetic maps based on molecular markers is a prerequisite for definition of the genetic basis of qualitative and quantitative agronomically important traits, for discovery of new genes that are involved in the control of phenotypic variations, and for identification of molecular markers that are linked to genes that are useful for marker-assisted selection. A number of molecular linkage maps have been developed in wheat, and many of them make use of DArT markers, although relatively less attention has been paid to durum wheat (Zhang et al. 2008; Peleg et al. 2008; Mantovani et al. 2008; Francki et al. 2009; Gadaleta et al. 2009; Blanco et al. 2011, 2012).

DArT markers have also been used in the integration of data derived from individual datasets in consensus maps in many species, such as sorghum, triticale, barley and durum wheat (Wenzl et al. 2006; Mace et al. 2009; Alheit et al. 2011; Alsop et al. 2011; Trebbi et al. 2011; Marone et al. 2012). Integrated maps have been used for association mapping studies in bread and durum wheat (Crossa et al. 2007; Neumann et al. 2010; Maccaferri et al. 2011a).

In the present study, the sequences of 2,000 wheat DArT clones (http://www.diversityarrays.com/sequences) were analyzed, and the genetic features of a subset of these were studied in three durum wheat genetic maps. This analysis has revealed new and interesting insights into redundancy, correspondence with expressed sequences, and the functional aspects of these DArT markers.

Materials and methods

Plant material

Three durum wheat populations consisting of 123, 161 and 178 F8–F9 recombinant inbred lines (RILs) were developed by single-seed descent from the crosses of Creso × Pedroso (CP), Ofanto × Cappelli (OC), and Cirillo × Neodur (CN), respectively. The CP population was used previously to study the genetic basis of the durable leaf rust resistance of the cultivar Creso (Marone et al. 2009). The CP and OC populations have been used to map genes coding for different lipoxygenase isoforms on chromosome 4B (Verlotta et al. 2010). The CN map has allowed the identification of quantitative trait loci (QTLs) that explain the resistance against soil-borne cereal mosaic virus in the Neodur variety (Russo et al. 2011). Finally, the CP, OC and CN populations belong to a set of six genetic maps that have been used to construct a consensus map in durum wheat (Marone et al. 2012). Over the last few years, these maps have been expanded in terms of numbers of lines (OC and CN) and numbers of molecular markers (CP and CN).

Genotyping

Nearly 800 PCR-based markers (genomic SSRs, EST-SSRs, EST-sequence-tagged sites and target-region amplified polymorphisms [TRAPs]) were used to screen DNA from the parents and the RILs of the mapping populations. The PCR reactions were performed as described by Marone et al. (2009). Different thermal cycling conditions were used to amplify the TRAP markers, according to Liu et al. (2005). The amplification products were analyzed using capillary electrophoresis (ABI3130), with multiplexing of different fluorescent dyes. Electropherograms were analyzed with GeneMapper, version 4.0.

The DArT technology was provided by Triticarte Pty Ltd (Canberra, Australia; http://www.triticarte.com.au), a whole-genome profiling service laboratory. A bread wheat array was adopted for the CP and OC lines, as described by Akbari et al. (2006), whereas the CN population was genotyped with the durum wheat-specific array, as reported by Mantovani et al. (2008).

Segregation analysis and map construction

Goodness of fit for all of the loci to an expected 1:1 segregation ratio was tested using Chi-squared analysis. The genetic linkage maps were constructed using the Kosambi mapping function within the JoinMap 4 software (Van Ooijen and Voorips 2004), considering a minimum logarithm of odds (LOD) score of 4.0. The linkage groups obtained were assigned to chromosomes by comparing the marker positions to previously published durum wheat maps (Korzun et al. 1999; Elouafi et al. 2001; Nachit et al. 2001; Elouafi and Nachit 2004; Blanco et al. 1998, 2004; Xue et al. 2008) and to the hexaploid wheat SSR consensus map (Somers et al. 2004).
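The two calculations named above can be illustrated with a minimal sketch. This is not the JoinMap implementation; the function names `chi2_one_to_one` and `kosambi_cm` are hypothetical, and the allele counts in the usage note are invented for illustration only.

```python
import math


def chi2_one_to_one(n_a, n_b):
    """Chi-squared statistic and P value (1 d.f.) for an expected 1:1 ratio.

    n_a and n_b are the numbers of lines carrying each parental allele.
    """
    expected = (n_a + n_b) / 2.0
    stat = ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction r."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    # d (Morgans) = 1/4 * ln((1 + 2r) / (1 - 2r)); multiply by 100 for cM.
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))
```

For example, a hypothetical locus scored as 80:43 in a population of 123 lines would be flagged as distorted at P < 0.01, whereas a 62:61 split would not.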

Sequence analysis of the DArT clones

The sequences of the bread wheat DArT clones are available at the Triticarte website (http://www.diversityarrays.com/sequences.html). All of the 2,000 sequences were analyzed with the ContigExpress tool within the Vector NTI 9 software (Life Technologies), with some modifications to reduce the stringency of alignment (“mismatch score factor” = −1; “gap penalty factor” = 1; and “minimum number of good reads at clip position” = 2). ContigExpress is a program for assembling similar DNA fragments into longer consensus sequences that are commonly known as ‘contigs’, and in this case they were derived from the alignment of overlapping DArT clones. Each contig was visually checked and verified, and then characterized according to the extension of the overlapping region (percentage of overlapping base pairs of single DArT sequences with respect to the length of the contig), and to the level of diversity, defined as the percentage of the number of mismatches across the DArT sequences belonging to the same contig, with respect to the length of the overlapping region. The percentage of base diversity, and not the percentage of identity across the sequences grouped in the same contig, was chosen as the parameter to describe the alignments, as the DArT sequences showed variable quality (different numbers of N residues were present). Indels longer than 1 bp were also considered as mismatches. A similarity search was carried out using the nonredundant DArT sequences as queries with the BLASTN (threshold E-40) and BLASTX (threshold E-10) algorithms in the EST and in nonredundant protein sequences databases, respectively. BLAST matches with probability levels lower than the indicated thresholds were considered when the identity percentage was higher than 80 %. 
A BLASTN search was also carried out in the Cerealsdb database, which contains the draft assembly of the gene-rich regions of the Chinese Spring genome (http://www.cerealsdb.uk.net/CerealsDB/Documents/DOC_search_reads.php; threshold E-100). When a genomic sequence longer than the original DArT sequence was found, it was used as a query in a new BLASTX search. Annotations of the resulting proteins were confirmed or supplemented by searching known domains in the Pfam database (http://www.sanger.ac.uk/resources/databases/pfam.html). To identify the different classes and sub-classes of transposable elements, the corresponding DArT sequences were used as queries in a tBLASTX search against the Triticeae repeat-sequence (TREP) database.
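The acceptance criteria described above can be sketched as a simple filter. This is only an illustration under stated assumptions: the `Hit` record and the search labels are hypothetical, "E-40" is read as an E value of 1e-40, and applying the 80 % identity cut-off to the Cerealsdb search as well is an assumption, as the text states it only for the EST and protein searches.

```python
from dataclasses import dataclass

# E-value thresholds taken from the text, read as powers of ten.
E_VALUE_THRESHOLDS = {
    "blastn_est": 1e-40,         # BLASTN against the EST database
    "blastx_nr": 1e-10,          # BLASTX against nonredundant proteins
    "blastn_cerealsdb": 1e-100,  # BLASTN against the Cerealsdb assembly
}
MIN_IDENTITY = 80.0  # percent; assumed here to apply to every search


@dataclass
class Hit:
    """Hypothetical record holding one BLAST match."""
    query: str
    search: str
    evalue: float
    identity: float


def is_accepted(hit):
    """True when the hit is below the E-value threshold and above 80 % identity."""
    return hit.evalue < E_VALUE_THRESHOLDS[hit.search] and hit.identity > MIN_IDENTITY
```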

Results

Sequence analysis of the DArT clones

Redundancy of the DArT clones

The 2,000 DArT sequences were analyzed for clusters of redundant sequences. A total of 486 DArT sequences (24.3 %) were grouped into 214 contigs. The remaining 1,514 DArT sequences represented non-redundant sequences, i.e. sequences that did not show significant overlapping regions with each other. Figure 1a shows an example of the contig that groups the clones wPt-7558 and wPt-8841, where the two sequences overlap for a region that is 269 bp long, and that corresponds to 80.3 % of the contig length. Three mismatches are present in the overlapping region, and therefore a percentage of base diversity of 1.1 was calculated. The frequency distribution of the contigs in relation to the extension of the overlapping regions is reported in Fig. 1b. Overall, for 204 contigs (95 %) the extension of the overlapping region was greater than 60 % of the consensus. In particular, 171 contigs were characterized by overlapping regions of between 80 and 100 % of the length of the generated consensus sequence. Most of the contigs derived from two sequences (81 %), and contigs with up to seven sequences were also found, even if in very limited numbers (Online Resource 1). Finally, the majority of contigs (146; 68 %) were characterized by ≤2 % mismatches, and therefore by nearly identical sequences. The levels of diversity within the remaining contigs ranged from 2.1 to 14.6 % (Fig. 1c).
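The two descriptors used for each contig can be reproduced with a short calculation, shown here for the wPt-7558/wPt-8841 example. The function names are hypothetical, and the contig length of 335 bp is not reported in the text: it is back-calculated from the 269-bp overlap and the stated 80.3 %, and should be read as an assumption.

```python
def overlap_extension(overlap_bp, contig_bp):
    """Overlapping region as a percentage of the contig length."""
    return 100.0 * overlap_bp / contig_bp


def base_diversity(mismatches, overlap_bp):
    """Mismatches as a percentage of the overlapping region."""
    return 100.0 * mismatches / overlap_bp


# wPt-7558 / wPt-8841: 269-bp overlap with three mismatches.
diversity = round(base_diversity(3, 269), 1)        # 1.1
# Assumed contig length of 335 bp (back-calculated, not from the text).
extension = round(overlap_extension(269, 335), 1)   # 80.3
```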

Fig. 1

Main features of the contigs of the DArT sequences. a Example of a contig resulting from the alignment between clones wPt-7558 and wPt-8841. The overlapping region is underlined, and the mismatches are highlighted in gray. b Distribution of the contigs based on the extension of the overlapping region (percentage of overlapping bases on total length of the contig). c Distribution of the contigs based on the percentage of base diversity in the overlapping regions

DArT markers: a survey of markers corresponding to expressed sequences with high representation of genes related to disease resistance

The sequences of the DArT clones were analyzed using a similarity search that was carried out in different public databases with the BLASTN and BLASTX algorithms. As the sequences were short in many cases, matches were also searched for in the Cerealsdb database, and when a longer genomic clone was identified, this was used as a query for a new BLASTX search. The similarity search gave a positive result for a total of 1,124 DArT clones, which therefore represent putatively expressed sequences. In more detail, 207 clones gave a match with BLASTX searches only, while 917 clones showed positive results with both BLASTX and BLASTN searches in the EST database (Online Resource 2). Interestingly, 26 sequences had no matches in the wheat plant EST database, nor among the wheat genomic sequences of the Cerealsdb database. However, following the BLASTX search, these queries showed very high percentages of identity at the amino acid level with viral and bacterial proteins.

Based on the results of the similarity searches, putative functions are proposed for 718 DArT clones. The remaining putatively expressed sequences corresponded to EST or protein sequences without a functional annotation or known domains. Many gene families are present, although some functional categories are more represented than others. Figure 2a illustrates the more frequent classes. The most represented sequences within the expressed DArT clones were transposons and retrotransposons (17.4 % of the sequences with putative functions). A more detailed analysis was carried out to identify the different classes and sub-classes of transposable elements across the DArT sequences (Fig. 2b, Online Resource 2). Out of 124 DArT clones, 30 (24.2 %) correspond to DNA transposons and 94 (75.8 %) to retrotransposons. Regarding the first group, the most represented sub-class is Mutator (17 clones), while in relation to retrotransposons, most of the clones correspond to non-long terminal repeat (non-LTR) elements (56 clones). Copia and Gypsy elements are also represented among the LTR retrotransposons.

Fig. 2

Functional meaning of DArT sequences based on the similarity search. a The most represented functional classes of the DArT clones. b Representation of different classes of transposable elements (TE). Light gray bars represent DNA TE, and dark gray bars represent retrotransposons

Apart from these sequences, significant enrichment was found for genes putatively coding for nucleotide binding site–leucine-rich repeat (NBS–LRR) proteins (92 clones; 12.8 %) and protein kinases (48 clones; 6.7 %), with 11 of these characterized by LRR domains. Other classes with a high number of representatives include the E3 ubiquitin ligases, most of which contain an F-box domain, and various kinds of membrane transporters and proteases.

In many cases, an NBS–LRR domain was found within the translated sequence of the DArT clone itself; in the other cases, the DArT sequence did not cover the coding region of the gene, but an NBS–LRR motif was identified when the genomic clone found in the Cerealsdb database was used as a query. An example is seen with the clone wPt-1191, the sequence of which was located in the promoter or 3′-UTR region of the gene, or in the upstream intergenic region (Fig. 3). Similarly, the wPt-9515 clone corresponded to a sequence that is positioned downstream of the coding region. In all of these cases, even if the DArT clones do not themselves correspond to a sequence coding for an NBS–LRR motif, they are positioned very close to this coding region, and they therefore represent a molecular marker for the gene of interest. In this analysis, the assembly of the Cerealsdb database corresponding to a given DArT clone was only used for further BLASTX searches if there was an unambiguous relationship between them; i.e. if the DArT sequence mapped to a unique region of the assembly. Even if some caution is needed in identifying sequence similarities and alignments from database assemblies, these data provide a large-scale indication of the correspondence between DArT clones and nearby genes, which can be experimentally validated in cases of interest.

Fig. 3

Example of the similarity searches carried out for sequences of the DArT clones. The sequence of the DArT clone wPt-1191, for which no hits were found in the database of nonredundant protein sequences, was used as a query for a BLASTN search in the Cerealsdb database. The identified genomic sequence (Contig51364) was in turn used as a query for a BLASTX search. E values are indicated in brackets. Continuous lines indicate the nucleotide (wPt-1191 and Contig51364) and amino acid (ABA95335) sequences. Numbers indicate the positions of the nucleotides and amino acids that delimit the overlapping regions

Using Cerealsdb assemblies as queries in BLASTX searches gave us indications of the distance between DArT sequences and genes. The analysis was carried out on a subset of 50 Cerealsdb assemblies longer than 3,000 bp. For 19 assemblies, the DArT sequences were within the gene, inside introns, or surrounding an intron–exon junction; in one case the DArT sequence partially overlapped with the 3′-end of the coding region. In the remaining 30 cases, the DArT sequences were positioned clearly outside the coding region of the gene. In more detail, 17 DArT clones were located upstream of the start codon, with an average distance of 1,380 bp (the interval was 200 to 3,768 bp), while 13 DArT clones were positioned downstream of the stop codon, at an average distance of 1,332 bp (the interval was 0 to 4,400 bp).
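The classification of a DArT sequence relative to a coding region can be sketched as follows. This is an illustrative reconstruction, not the procedure actually used: the function name is hypothetical, coordinates are assumed to be 1-based and inclusive on the same assembly, and the gene is assumed to lie on the forward strand.

```python
def classify_position(dart_start, dart_end, cds_start, cds_end):
    """Locate a DArT sequence relative to a coding region.

    Returns a (label, distance_bp) pair. The distance is the number of
    bases separating the DArT sequence from the start or stop codon, so
    a directly adjacent sequence has a distance of 0.
    """
    if dart_end < cds_start:
        return "upstream", cds_start - dart_end - 1
    if dart_start > cds_end:
        return "downstream", dart_start - cds_end - 1
    # Anything else lies within, or partially overlaps, the coding region.
    return "overlapping", 0
```

With these conventions, a clone spanning positions 100 to 300 and a coding region starting at position 501 would be reported as upstream at a distance of 200 bp.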

Overview of the durum wheat genetic maps

Three durum wheat genetic maps are shown in the present study, to investigate the distributions of the DArT markers in the durum wheat genome, in relation to their sequences.

The number of polymorphic markers used to generate each genetic map ranged from 398 (CN) to 607 (OC). A total of 794 DArT, 372 gSSR, 4 TRAP and 71 EST-derived markers were assigned to at least one genetic map, whereas the common markers across the genetic maps ranged from 54 between OC and CN, to 115 between CP and OC (Online Resource 3). The numbers of markers, total map lengths, and the marker densities for these genetic maps are summarized in Table 1. All of the chromosomes are covered by the molecular markers, except chromosome 1A for OC. The numbers of linkage groups for CP, OC and CN are 23, 30 and 29, respectively.

Table 1 Summary of the main features of the three genetic maps

The proportions of the DArT and PCR-based markers for individual chromosomes in each map are summarized in Table 2. The lowest proportions of markers (<3 %) are on chromosomes 5B, 3A and 5A, for CP, OC and CN, respectively. In contrast, the highest proportions of markers are seen for chromosome 7B for OC, chromosome 6A for CP, and 6B for CN. Finally, the percentages of mapped markers are higher on the B than the A genome (A genome: 40.5 to 47.2 %, vs. B genome: 52.8 to 59.5 %).

Table 2 Distribution of DArT and PCR-based markers for the chromosomes and genomes in the three genetic maps

Some gaps with sizes from 30 to 40 cM are observed. The map characterized by the highest number of gaps is for CN, on chromosomes 1B, 2A, 3A, 5A and 6A. Similar gaps are also seen on chromosomes 5A and 6A for OC, and on chromosomes 2B, 3B and 5B for CP. There are gaps larger than 40 cM only on chromosome 2A in the CP map, and 4A and 5A in the OC map.
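Gaps of this kind can be listed directly from the ordered marker positions of a linkage group. The following is a minimal sketch with a hypothetical function name; the 30-cM default simply mirrors the lower bound mentioned above.

```python
def find_gaps(positions_cm, min_gap_cm=30.0):
    """Return (left, right, size) for adjacent markers at least min_gap_cm apart."""
    pos = sorted(positions_cm)
    gaps = []
    # Compare each marker with its immediate neighbor along the linkage group.
    for left, right in zip(pos, pos[1:]):
        size = right - left
        if size >= min_gap_cm:
            gaps.append((left, right, size))
    return gaps
```

Calling the function a second time with `min_gap_cm=40.0` would separate the larger gaps from those in the 30 to 40 cM range.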

The analysis of the marker distributions along the wheat chromosomes reveals that there are clusters of DArT markers for CP and OC (Online Resource 3). In particular, for CP, 21 DArT markers are clustered in 4 cM on chromosome 3B, and there is a group of 19 DArT markers on chromosome 6A, and a cluster of 20 DArT markers on chromosome 6B. On the same chromosome, for OC, 31 DArT markers are clustered in a region of 6.1 cM. Other clusters identified in this map are on chromosomes 4A (47 DArT markers in a region of 12.5 cM) and 7B (28 DArT markers in 8.6 cM).
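A marker cluster such as "21 DArT markers in 4 cM" can be located with a sliding window over the sorted map positions. The sketch below is one simple way to do this, with a hypothetical function name; it is not the method used to produce the figures reported above.

```python
def densest_window(positions_cm, width_cm):
    """Return (marker_count, window_start_cM) for the densest window.

    A window contains all markers whose positions lie within width_cm
    of the leftmost marker in that window.
    """
    pos = sorted(positions_cm)
    best_count, best_start = 0, None
    left = 0
    for right in range(len(pos)):
        # Shrink the window from the left until it fits within width_cm.
        while pos[right] - pos[left] > width_cm:
            left += 1
        count = right - left + 1
        if count > best_count:
            best_count, best_start = count, pos[left]
    return best_count, best_start
```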

The analysis of the three RIL populations identified a proportion of markers with significantly (P < 0.01) skewed segregation of the parental alleles from the expected 1:1 ratio, although with different frequencies across the three populations: 5 % for CP, 17 % for OC, and 10.6 % for CN. The majority of the loci with this distorted segregation show prevalence of the alleles inherited from Pedroso (52 %), Cappelli (65 %) and Cirillo (60 %). A total of 14 (2.4 %), 83 (13.6 %) and 34 (8.5 %) skewed markers are positioned in the CP, OC and CN genetic maps, respectively (Table 1). Not all of the skewed markers are included in the maps, due to a high level of missing data, or because they remained unlinked. The markers that show significant segregation distortion in each of the maps are shown in Fig. 4 (for chromosome 6B) and Online Resource 3. There are well-defined regions that comprise several skewed markers on chromosome 1B for CP, chromosomes 1B and 5B for CN, and chromosomes 2B, 4B, 5B, and 7B for OC. No common skewed regions can be identified across the three genetic maps.

Fig. 4

Distributions of the DArT and PCR-based markers for chromosome 6B in the CP, OC and CN genetic maps. The short arm is at the top. Black circles, approximate positions of centromeres. Purple common markers between CP and CN, red common markers between CP and OC, blue common markers between OC and CN, green common markers across all of the three maps. Asterisk markers showing significant (P < 0.01) segregation distortion. Box near the marker name, putative function of the DArT clones (LRR leucine-rich repeat, KIN protein kinase, TE transposable element). The dashed lines connect DArT markers that belong to the same contig (sky blue)

Mapping features of the sequenced DArT clones

Out of the 2,000 DArT markers for which a sequence is available, 291 were mapped on the three durum wheat maps reported in this study, as they were polymorphic in at least one of the three segregating populations presented here. The map positions of DArT markers sharing nearly identical sequences to each other were considered, as redundancy might explain the clustering of DArTs in the genetic maps. Eighteen DArT markers, which are grouped in 8 contigs, were identified for CP, while there are 39 (18 contigs) and 16 (7 contigs) markers in the OC and CN genetic maps. In all of these cases, the DArT markers belonging to the same contigs share the same genetic position, or map to the same chromosomal region. This result was also confirmed for the contigs containing markers that are common to two genetic maps. Two examples are reported in Fig. 4. The three DArT markers wPt-4386, wPt-6563 and wPt-1922 correspond to the same contig: they are characterized by 0.4 % nucleotide diversity and are located in the same genetic position on chromosome 6B for CP and OC. There are another three DArT markers on the same chromosome (wPt-2689, wPt-7777 and wPt-6994), which belong to a single contig with 1.4 % nucleotide diversity, that share the same genetic position on the two genetic maps. Another example is given by the markers wPt-0841 and wPt-9133 (0.0 % nucleotide diversity) on chromosome 7B for CP and OC (Online Resource 3). In conclusion, the detailed analysis of all of the redundant clones indicates that they are positioned in the regions characterized by DArT clusters.
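The comparison between contig membership and map position can be expressed as a small check. This is a sketch only: the function name is hypothetical, and the map position used in the example (12.0 cM) is invented for illustration, as the actual positions are given in Online Resource 3 and not in the text.

```python
def contig_spread(contig_markers, map_positions):
    """Return (chromosomes, spread_cM) for the mapped markers of one contig.

    map_positions maps a marker name to a (chromosome, position_cM) pair.
    The spread is None unless all mapped markers share one chromosome.
    """
    mapped = [map_positions[m] for m in contig_markers if m in map_positions]
    chromosomes = {chrom for chrom, _ in mapped}
    if len(chromosomes) != 1:
        return chromosomes, None
    cm = [position for _, position in mapped]
    return chromosomes, max(cm) - min(cm)


# Hypothetical positions for a contig whose markers co-locate on 6B.
example_map = {
    "wPt-4386": ("6B", 12.0),
    "wPt-6563": ("6B", 12.0),
    "wPt-1922": ("6B", 12.0),
}
result = contig_spread(["wPt-4386", "wPt-6563", "wPt-1922"], example_map)
```

A spread of 0 cM corresponds to markers sharing the same genetic position, while a small nonzero spread corresponds to markers mapping to the same chromosomal region.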

A total of 182 DArT clones that putatively correspond to expressed sequences are positioned in the three durum wheat maps described in this study. Surprisingly, a marker that corresponds to sequences with a high percentage of identity with viral or bacterial proteins, wPt-8090, was also mapped on chromosome 1A for CP (Online Resource 3). Regarding the most represented functional classes, 27 mapped markers correspond to transposable element-related sequences, 29 to genes that code for putative NBS–LRR proteins, and 11 to putative protein kinases. Their map positions are indicated in Fig. 4 (for chromosome 6B) and in Online Resource 3. Transposable element-related sequences map across all of the chromosomes, except chromosomes 1A, 2A, 5A, 4B and 5B. Chromosome 3B shows the highest number of these clones: 5 for OC and 4 for CN. There are contigs of DArT clones that correspond to transposable element-related sequences on chromosomes 3B (wPt-0990/wPt-6000 for CP and CN; wPt-3046/wPt-6216 for OC; wPt-2698/wPt-9049 for CN), 6B (wPt-6994/wPt-7777/wPt-2689 for CP and OC) and 7B (wPt-8919/wPt-4025 for CP). Clones similar to NBS–LRR genes are distributed on all of the chromosomes, except chromosomes 1A, 3A, 5A and 4B. These clones are more numerous on chromosome 4A for OC and on chromosome 2B for CP. There is a contig with clones that putatively correspond to NBS–LRR genes on chromosome 4A for OC (wPt-3729/wPt-1007); another contig with clones that correspond to putative protein kinases is on chromosome 3B for CP (wPt-0302/wPt-3536).

DArT markers as a suitable tool for collinearity analysis

The availability of DArT sequences offers an opportunity to expand the collinearity analysis across grass genomes, starting from the wheat genetic maps. The sequences that correspond to DArT markers mapped in the present study were used as queries in BLASTX searches, to identify segments of collinearity with the Brachypodium and rice genomes (http://www.phytozome.net/). In Online Resource 4, the results of the collinearity analysis are presented for all of the DArT markers and EST-SSRs mapped in the present study and for which a significant match was found on the Brachypodium and rice genomes. Syntenic regions were identified on durum wheat chromosomes 2A, 4A, 5A, 6A, 7A, 1B, 2B, 3B, 5B, 6B and 7B. Overall, most of the wheat genomic regions identified show correspondence only with the Brachypodium genome; in some cases, the correspondence is also maintained with the rice genome. On linkage group 5A-1 of CP, correspondence was found across three markers mapped on durum wheat (Xmag3794, wPt-2543 and Xmag4060), and regions on chromosomes 4 and 9 of Brachypodium and rice, respectively.

As resistance genes are of great importance in plant breeding, and as they often occur in clusters, regions that contained more than one DArT marker that corresponded to NBS–LRR genes were used to determine whether clusters of resistance genes can be identified in syntenic regions. Interesting results were found for three regions on chromosomes 4A (OC), 7B (OC) and 6B (CP and OC) (Online Resource 4). On chromosome 4A, 14 markers were considered (13 DArT and 1 EST-SSR) in a region spanning 20.1 cM. The best hits for the rice genome mapped to 9 different chromosomes, and therefore it was not possible to identify a well-defined collinearity segment, while there is a syntenic region on Brachypodium chromosome 1 (7 markers).

Further correspondence was found between Brachypodium chromosome 1 and a region of durum wheat chromosome 7B (linkage group 7B-2, OC), spanning 77.6 cM. Sixteen markers were considered in this case (11 DArT and 5 EST-SSR), 10 of which identified the best hit in the Brachypodium sequences on chromosome 1, while there were also hits on chromosomes 2, 3 and 5 for five markers. No similar sequences were found on the genome of Brachypodium for the marker wPt-3004. Six wheat markers showed correspondence with genes on chromosome 6 of rice.
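The way a syntenic region is inferred from best hits can be summarized by tallying the chromosome of the best hit for each marker in a region. The sketch below uses the counts reported for linkage group 7B-2 (10 of 16 markers on Brachypodium chromosome 1, five on chromosomes 2, 3 and 5, and no hit for wPt-3004). The function name and the placeholder marker names are hypothetical, and the 2/2/1 split of the five remaining hits is an assumption, as it is not given in the text.

```python
from collections import Counter


def dominant_chromosome(best_hits):
    """Return (chromosome, marker_count, fraction_of_all_markers).

    best_hits maps a marker name to the chromosome of its best hit,
    or to None when no similar sequence was found.
    """
    counts = Counter(c for c in best_hits.values() if c is not None)
    if not counts:
        return None, 0, 0.0
    chromosome, n = counts.most_common(1)[0]
    return chromosome, n, n / len(best_hits)


# Placeholder marker names; only wPt-3004 is named in the text.
hits_7b = {f"marker_{i}": "1" for i in range(10)}
hits_7b.update({"marker_10": "2", "marker_11": "2",
                "marker_12": "3", "marker_13": "3",
                "marker_14": "5", "wPt-3004": None})
summary = dominant_chromosome(hits_7b)
```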

Finally, considering the OC and CP maps, a collinear segment was identified in chromosome 3 of Brachypodium with respect to a region spanning about 27 cM on chromosome 6B (8 DArT markers) (Fig. 5). Two DArT markers that correspond to NBS–LRR genes (wPt-1437 and wPt-1725) are positioned at less than 2 cM apart for CP and OC, and might therefore indicate a cluster of NBS–LRR genes. For both of these sequences, the best hit in the Brachypodium genome was sequence Bradi3g00890, even if this showed different similarity levels, and there is another Brachypodium NBS–LRR sequence in the same region (Fig. 5). Moreover, the sequences that correspond to the wPt-1437 and wPt-1725 markers are similar to two NBS–LRR genes that were mapped on rice chromosome 4. When the genomic region between these two sequences was investigated, another two genes belonging to the same family were found, with all of these showing high similarity with respect to the durum wheat clones (E value < E-30).

Fig. 5

Partial genetic map of the region on chromosome 6B between the wPt-6154 and wPt-1725 markers and correspondence with putative syntenic loci on Brachypodium chromosome 3 and rice chromosome 4

Discussion

DArT markers are widely used for different kinds of studies in many plant species. For years, they have been used as anonymous genomic markers, with the clear advantage that no previous knowledge of the sequence is needed to carry out the genotypic analysis. Nevertheless, the acquisition of the sequence knowledge of DArT markers provides functional meaning to these markers, which thus become a useful tool for many applications, such as collinearity studies, fine mapping of loci of interest, and identification of candidate genes in association mapping.

In the present study, we report three durum wheat genetic maps in which DArT and PCR-based markers were integrated, providing a contribution to the wheat genetic mapping effort. In particular, 158 DArT markers were never mapped previously in the wheat genome. As the sequences of 2,000 wheat DArT clones were recently made available by Triticarte (http://www.diversityarrays.com/sequences), we carried out a detailed characterization of the sequences of these clones, and, for the sequenced DArT markers herein genetically mapped, the molecular features were studied in relation to their genetic positions.

General agreement was observed across these three maps for the marker orders and relative distances, and good correlations were also found when the three maps were compared with the bread and durum wheat maps available in the literature (for details, see Online Resource 5). Small differences across the populations can be explained by mis-scored alleles or by genetic factors, such as undetected rearrangements and differences in recombination rates for the same genomic regions in different segregating populations. A common feature of many wheat maps is the nonuniform distribution of molecular markers in the genome, and clusters of markers have often been reported (Somers et al. 2004; Liu et al. 2005; Chu et al. 2008). DArT markers have been found to cluster in particular regions, more frequently than microsatellites, as also reported in the present study. An example here is a region of about 20 cM in the distal part of the short arm of chromosome 6B for both CP and OC (Fig. 4). The clustering in distal regions might reflect the tendency of PstI-based markers towards hypomethylated telomeric chromosome regions (Moore 2000). A similar tendency was seen for PstI-based AFLP markers in tetraploid wheat (Peng et al. 2000).

Nevertheless, DArT clusters might also be explained by a high level of redundancy in the genomic representation (Semagn et al. 2006). James et al. (2008) analyzed 1,349 DArT markers in two lineages of seed-free land plants, and sequenced 148 of them. They showed that 10.7 % of the Asplenium and 16.2 % of the Garovaglia DArT markers shared over 99 % identity to other DArT markers at the nucleotide sequence level. Twenty-three percent of a set of 431 DArT clones were found to be redundant in Solanum lycopersicum (Van Schalkwyk et al. 2011). The present study provides quantification of the similarity levels across 2,000 DArT sequences: 24.3 % of the DArT markers were grouped in contigs, and nearly 15 % of these sequences were grouped in contigs with <2 % mismatches. The mapping of some of the DArT markers grouped in contigs showed clearly that a proportion of DArT clusters can be explained by a set of clones that are characterized by nearly identical sequences, even if the markers that share the same map position are not necessarily characterized by identical sequences. In light of these results, we can assume that the redundancy of DArT markers can be explained by clones that are derived from the same genomic sequence, or alternatively by regions that were duplicated in tandem and that have accumulated mutations.

The analysis carried out in the present study provides intriguing insights into the functional nature of these DArT markers. Very limited information is available on this aspect. BLAST searches of 74 Asplenium and 74 Garovaglia DArT marker sequences recovered 18 (24 %) and 14 (19 %) significant alignments, respectively, and included known mitochondrial, chloroplast and nuclear loci, as well as proteins and predicted expressed genes, without particular over-representation of any gene family (James et al. 2008). Sequences of 384 sugar-cane DArT clones have revealed that one-third of these markers came from the transcribed portion of the sugar-cane genome (Heller-Uszynska et al. 2011). Tinker et al. (2009) analyzed 1,700 nonredundant sequences of DArT clones developed in oat, and about 40 % of these showed highly significant BLAST similarities to gene sequences in public databases. In the present study, 64 % of the sequences match with protein or nucleotide sequences available in public databases. Our search method was based on the use of the DArT sequence and of the most similar Chinese Spring genomic clone, when available, as queries. This allowed us to carry out a more detailed analysis, and to find putative functions for a higher number of DArT clones than would be possible using only the sequence itself as a query.

An intriguing result of the sequence analysis performed in the present study is that a high proportion of the DArT clones correspond to sequences related to disease resistance in plants, and in particular to NBS–LRR resistance genes and protein kinases, many of which contain an LRR domain. Resistance (R) genes are numerous in the plant genome, and they often occur in clusters at specific loci following gene duplication and amplification events (Lehmann 2002). Around 600 R gene loci were identified by Shang et al. (2009) in a genome-wide comparison of the major class of R genes between the indica and japonica rice varieties. The large size of this gene family and, even more, the high level of polymorphism of its members can explain in part the high proportion of LRR genes identified among the DArT clones. In particular, diversifying selection and sequence exchanges have been detected across these genes (Chen et al. 2010). Moreover, the presence of a codon for leucine (CUG) in the restriction site of the PstI enzyme (5′-CTGCAG-3′) might also account for the high probability of PstI cutting sequences that encode an LRR domain.
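
The codon argument can be checked with a short Python sketch that translates the PstI recognition site in its three forward reading frames using the standard genetic code. The function names are hypothetical, and the sketch only shows that the site can encode leucine in one frame; it does not test whether PstI sites are actually enriched in LRR-coding sequences.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

PSTI_SITE = "CTGCAG"  # 5'-CTGCAG-3', as given in the text


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def codons_in_frame(seq, frame):
    """Complete codons of seq when reading starts at offset `frame` (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def site_translations(site=PSTI_SITE):
    """Map each forward frame to its list of (codon, one-letter amino acid) pairs."""
    return {
        frame: [(c, CODON_TABLE[c]) for c in codons_in_frame(site, frame)]
        for frame in range(3)
    }
```

In frame 0 the site reads CTG CAG, that is, leucine followed by glutamine; the other two frames contain only one complete codon each (TGC and GCA). Because the site is palindromic, `reverse_complement(PSTI_SITE)` returns the same sequence, so the same reading applies to both strands.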

Knowledge of the map positions of many NBS–LRR genes is of great importance for genetic studies that are aimed at the identification and fine mapping of resistance determinants. Forty DArT sequences that correspond directly or indirectly to NBS–LRR genes and protein kinases were mapped in the present study, and many of these are positioned in regions where resistance genes have been identified previously. As an example, NBS–LRR sequences were found in the region of chromosome 7B near the marker Xgwm146 in OC, where genes for resistance to leaf rust have been mapped in bread and durum wheat (Herrera-Foessel et al. 2008; Maccaferri et al. 2008; Marone et al. 2009). Furthermore, DArT markers that correspond to NBS–LRR genes were found as peak markers of a major QTL for resistance to SBCMV on chromosome 2B (wPt-1601) in the durum wheat population Meridiano × Claudio (Maccaferri et al. 2011b), and of minor QTLs in the CN population on chromosomes 3B (wPt-1191) and 7B (wPt-2883) (Russo et al. 2011).

Other evidence is available in the literature for the involvement in plant disease resistance of some DArT markers that correspond to NBS–LRR genes. The marker wPt-8460 on chromosome 2B was shown to be significantly associated with stem rust resistance by Yu et al. (2011), based on association mapping. No sequence is available for this marker; however, it is positioned 3 cM from the marker wPt-0189 in CP, which corresponds to an NBS–LRR gene. In the same study, Yu et al. (2011) showed a significant association between stem rust resistance and the marker wPt-1560, positioned on chromosome 1B. Based on the similarity search carried out in the present study, this marker corresponds to a protein kinase with an LRR domain.

The collinearity analysis carried out with all of the sequenced DArT markers and EST-SSRs mapped here identified many regions of the durum wheat genome that show syntenic relations with the rice and Brachypodium genomes. Moreover, pairs of DArT markers corresponding to R genes were identified in the genetic maps presented in the present study, and collinearity searches led to the confirmation of NBS–LRR clusters in syntenic regions of rice and Brachypodium, despite the fast rate of resistance gene evolution. The importance of establishing syntenic relations across genomes lies in the opportunity to use information derived from sequenced genomes for direct gene isolation in wheat. Even if the gene is not present at the orthologous position in the sequenced genome, a good level of conservation in the flanking genes can allow the development of molecular markers that can be used to saturate the region in wheat and can be useful in marker-assisted selection programs (Quraishi et al. 2011). The results of the present study clearly show that DArT markers positioned up to now in single genetic maps and in integrated maps, and used for linkage and association mapping of QTL, can reveal more extensive syntenic relationships between wheat and cereal model genomes, opening the way to faster identification of candidate genes for traits of interest.

In conclusion, a large number of studies have been published in which DArT markers have been used for mapping or genetic diversity analysis in wheat and many other species. Knowledge of the functional meaning of these widespread markers will greatly help the identification of candidate genes for traits under investigation. The high representation of R genes among DArT sequences makes these markers particularly useful for studies of the genetic determinants of disease resistance in plants.