Introduction

The highly polymorphic barley Bmy1 gene encoding endosperm-specific β-amylase provides an excellent model for comparative genomic analysis of allelic variants and for evaluation of the mechanisms of structural rearrangements of the gene and their eventual functional significance. Four allelic forms of β-amylase exhibit increasing levels of thermostability in the range of Sd1, Sd2L, Sd2H, and Sd3 alleles (Eglinton et al. 1998). The Bmy1 structural gene consists of seven exons and six introns (Yoshigi et al. 1995). The gene is located on the long arm of chromosome 4H (Kreis et al. 1987) and is characterized by a high level of polymorphism related to the thermostability and kinetic properties of the enzyme (Erkkilä et al. 1998; Erkkilä and Ahokas 2001; Kaneko et al. 2000; Ma et al. 2001; Paris et al. 2002). To distinguish the Bmy1 alleles, Paris et al. (2002) proposed using two non-synonymous single nucleotide polymorphisms (SNPs) of the coding region of the gene (cSNPs) linked to enzyme thermostability (G496 → C and C698 → T in exon III and exon IV, respectively). Although it relies on only two cSNPs, this approach was successfully used in several genotyping studies (Polakova et al. 2003; Malisheva et al. 2004; Sjakste and Röder 2004). Discovery of the synonymous cSNP G702 → A enabled Sd1 alleles to be subdivided into two subgroups (Polakova et al. 2003).

Comparison of the enzymatic activity of different allelic forms suggested that rearrangements of the promoter, intron II and especially intron III regions of the structural gene could be associated with the β-amylase properties together with amino acid substitutions (Erkkilä and Ahokas 2001). Currently, sequences of four Bmy1 allelic variants are publicly accessible, including the genes of Haruna Nijo (GenBank D49999), Adorra (GenBank AF061203), Hordeum spontaneum strain PI 296897 (GenBank AF061204), and Finnish landrace line HA52 (GenBank AJ301645). Simple alignment of these genomic sequences suggests the existence of numerous polymorphic loci in coding and non-coding regions of the gene. Some of them could influence enzymatic properties and/or could be used for more precise variety discrimination.

Intron III polymorphisms have already been studied in some detail. A 126 bp insertion/deletion event (indel) in the 5′ region is associated with allelic variants of the gene encoding enzymes of low or high thermostability, respectively (Erkkilä and Ahokas 2001). Genotyping of the 3′ region of the intron in 55 North European barley varieties revealed linkage between a 1 + 6 bp indel (S/L allele) and the C698 → T (V233A) allelic variation of exon IV (Sjakste and Röder 2004), which is responsible for the highest and the lowest thermostability of the enzyme, respectively (Paris et al. 2002). High intron III microsatellite (MS) length polymorphism was identified in varieties from all over the world (Sjakste and Röder 2004; Malisheva et al. 2004). However, no significant correlations were revealed between MS size and S/C698 genotype (Sjakste and Röder 2004; Malisheva et al. 2004). In the same work, we described several linkage blocks of three polymorphic events (1 + 6 bp indel, MS size, and C698 → T variation) that were transmitted through the generations of different independent pedigrees (Sjakste and Röder 2004). Our analysis suggested the existence of certain associations between MS locus variability and other polymorphisms of the structural gene. Comparative analysis of the publicly available genomic sequences of the four allelic variants could not reveal the details of these associations due to the lack of data.

Therefore, the first goal of this work was to obtain more details about Bmy1 intron III polymorphisms in different barley varieties. This task was achieved by extending the sequencing data using a set of Latvian barley cultivars. Secondly, we tried to develop approaches for more precise haplotype classification and characterization of inter- and intrahaplotype variability. Finally, we made an attempt to use intron III sequences of different haplotypes as a model system for comparative computational evaluation of the structural and eventual functional reorganization of the non-coding gene region.

Materials and methods

Detection and genotyping of polymorphisms

The set of 21 spring barley accessions used in the polymorphism detection and genotyping study was composed of 20 Latvian commercial varieties and the well-known Danish variety Maja (Table 1). Genotypes included in the set were previously analyzed for the genetic diversity of MS and/or Bmy1 alleles and for their inheritance through the generations of known pedigrees (Sjakste et al. 2003; Sjakste and Röder 2004). The seeds of all accessions studied were obtained from the Latvian Gene Bank of Cultivated Plants in Salaspils, Latvia. Genomic DNA was extracted according to previously described procedures (Plaschke et al. 1995) using young leaves of one plant of each accession per sample. Two or three individual DNA samples from each accession or bulk DNA from ten accessions were used in further analysis.

Table 1 Description of the accessions including year of release, origin/breeding station, immediate pedigree, form of spike, β-amylase intron III haplotype, and details on MS polymorphism

The intron III region was screened for polymorphisms by direct sequencing of the amplified fragment of the gene. The amplification and sequencing strategy is presented in Fig. 1. Primers were designed with the Primer 3.0 program using highly conserved regions of exon III, exon IV, and intron III identified from the alignment of the genomic sequences of four barley accessions available in the database: Haruna Nijo (GenBank D49999), Adorra (GenBank AF061203), Hordeum vulgare subsp. spontaneum NPGS PI 296897 (GenBank AF061204), and the Finnish landrace line HA52 (GenBank AJ301645).

Fig. 1

Scheme of the intron III region of the Bmy1 gene and strategy of the experiment. The relative positions of some polymorphisms are indicated according to the genomic sequence of accession Haruna Nijo (GenBank D49999)

PCR resulted in amplification of the region encompassing the whole intron III (primers Bmy1 AF: 5′-tgacagatgtatgccgatta-3′ and Bmy1 AR: 5′-ttgttggagtaccatgcaag-3′) or the 3′ region of the intron encompassing the MS portion (Bmy1 S1F: 5′-cggataccatgtaaaactgcac-3′ and Bmy1 S2R: 5′-tttttctgtaatggcaatggt-3′). Amplification was followed by direct sequencing of each PCR product in both forward and reverse orientation with the same primers plus two others specific for the 5′ region of intron III (Bmy1 S3F: 5′-ggcgtggtaaacctgacttg-3′ and Bmy1 S4R: 5′-tcaggtttaccacgccttg-3′) (Fig. 1). Two individual DNA samples were sequenced per accession. The 126 bp indel in the 5′ region was additionally genotyped by the size difference of the amplified products (primers Bmy1 AF and Bmy1 S4R) using agarose gel electrophoresis. Several DNA samples and/or bulk DNA were used in this case for each accession analyzed.
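As a quick sanity check on the primer sequences listed above, the following minimal Python sketch reports length, GC content, and reverse complement for each primer. Only the primer sequences come from the text; the helper names `gc_content` and `reverse_complement` are illustrative and are not part of the original protocol.

```python
# Primer sequences exactly as given in the text (5'->3').
PRIMERS = {
    "Bmy1 AF": "tgacagatgtatgccgatta",
    "Bmy1 AR": "ttgttggagtaccatgcaag",
    "Bmy1 S1F": "cggataccatgtaaaactgcac",
    "Bmy1 S2R": "tttttctgtaatggcaatggt",
    "Bmy1 S3F": "ggcgtggtaaacctgacttg",
    "Bmy1 S4R": "tcaggtttaccacgccttg",
}

_COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (lower case)."""
    return seq.lower().translate(_COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    seq = seq.lower()
    return (seq.count("g") + seq.count("c")) / len(seq)


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC = {gc_content(seq):.2f}, "
              f"reverse complement = {reverse_complement(seq)}")
```

Such a check is only a convenience for catching transcription errors; it does not replace primer evaluation in a dedicated design tool.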

Fluorescence was measured using an ABI Prism 7000 sequence detection system and analyzed with the ABI PRISM 7000 SDS software version 1.0.

Bioinformatics analysis

Alignments of the previously published Bmy1 intron III genomic sequences and sequences produced in this study were generated by the multiple alignment service ClustalW (http://www.clustalw.genome.jp/). Identification of transposable elements and transcription factor binding sites (TFBSs) was performed using Blast against TREP (http://www.wheat.pw.usda.gov/ITMI/Repeats/blastrepeats3.html) and Genomatix software (DiAlign TF, Release 3.1, and MatInspector, Release 7.4 tools, at http://www.genomatix.de/), respectively.
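To illustrate how polymorphic loci can be read off a multiple alignment, the sketch below scans gapped sequences column by column and labels each variable column as an SNP or an indel. This is a simplified illustration, not the procedure used in the study: the function name `classify_columns` and the toy sequences are hypothetical, and positions are alignment columns rather than Haruna Nijo coordinates.

```python
def classify_columns(aligned):
    """Return (column, kind) for every variable column of an alignment.

    aligned: dict mapping a sequence name to its gapped sequence;
    all sequences must have the same length. Columns are 1-based.
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("aligned sequences must have equal length")

    variable = []
    for i in range(length):
        column = {aligned[name][i].upper() for name in names}
        if len(column) == 1:
            continue  # invariant column
        kind = "indel" if "-" in column else "SNP"
        variable.append((i + 1, kind))
    return variable


# Toy alignment (hypothetical sequences, for illustration only).
toy = {
    "hap_A": "ACGTTGCA",
    "hap_B": "ACGT-GCA",
    "hap_C": "ACCTTGCA",
}

if __name__ == "__main__":
    print(classify_columns(toy))
```

Adjacent indel columns would still need to be merged into single events (for example, the 126 bp indel) before counting polymorphisms.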

Results

Genotyping results

Bmy1 intron III sequences of 20 Latvian (Table 1) and 1 Danish variety Maja were compared by alignment with the corresponding regions of 4 allelic variants of the Bmy1 gene (Haruna Nijo, Adorra, H. spontaneum strain PI 296897, and Finnish landrace line HA52). Data on polymorphic loci revealed are summarized in Tables 1 and 2 and Fig. 2.

Table 2 Haplotypes and eventual functional significance of polymorphisms of intron III of the Bmy1 gene. Function of variation was scored as generation (G) or loss (L) of TFBS. Numbering of polymorphic loci is given according to the genomic sequence of cultivar Haruna Nijo (GenBank D49999). TFBS family/matrix names are given according to the Genomatix Matrix Family Library (Version 5.0)
Fig. 2

β-Amylase intron III MS sequence alignment, formulas of the haplotype-specific MS motifs and the eventual TFBS positions. HN, HS, HA, and AD haplotypes are represented by the intronic sequences of the β-amylase gene of Haruna Nijo (D49999), H. vulgare subsp. spontaneum NPGS PI 296897 (AF061204), Finnish landrace line HA52 (AJ301645), and Adorra (AF061203), respectively. AB haplotype is represented by the intron III sequence obtained for Latvian cultivar Abava. The numbering of the MS position is indicated according to the genomic sequence of accession Haruna Nijo (D49999). SNPs that determine the differences in MS motif compared to HN haplotype are underlined. Predicted TFBSs are shown by the individual lines of different types under the MS sequences. Description of TFs is given in Table 4

All the previously described (Erkkilä and Ahokas 2001; Erkkilä et al. 1998; Sjakste and Röder 2004) polymorphic loci of the Bmy1 intron III were detected in our study. Data on the high level of MS length polymorphism obtained earlier for Latvian varieties (Sjakste and Röder 2004) were confirmed by the present results. Details on MS motif variations are reported for the first time (Table 1; Fig. 2).

Our study revealed two variable components of the repeated portion of the intron. The variable 5′ component of the MS could be represented either by (TG) m (Haruna Nijo, H. spontaneum strain PI 296897, Maja, Latvijas Vietejie, and Abava-like Latvian accessions), or by the restructured motifs of Finnish landrace HA52, Adorra, and Adorra-like Latvian accessions (Table 1; Fig. 2). The 5′-(TG) m repeat was shown to be followed immediately by a (G) n repeat in the genes of Maja and Latvijas Vietejie, similarly to Haruna Nijo and H. spontaneum strain PI 296897. The 5′ TG-rich region is separated from the 3′-(G) n repeat by a TT motif in Finnish landrace HA52, Adorra, and all Latvian accessions except Latvijas Vietejie (Table 1; Fig. 2). A high level of polymorphism of the 3′-(G) n repeat as well as several SNPs in the 5′- and the 3′-MS flanks were revealed in the accessions analyzed (Table 2; Fig. 2).

Summarizing the data on MS motif and length polymorphism, we describe here all variations of the region by five formulas (Fig. 2), and classify them as HN (Haruna Nijo-like), HS (H. spontaneum strain PI 296897-like), HA (Finnish landrace HA52-like), AB (Latvian cultivar Abava-like), and AD (Adorra-like) MS variants (Fig. 2).

Haplotype classification

All the 16 indels, 38 SNPs, 3 double SNPs (dSNPs), 1 fragment substitution (FR) and MS sequence motif [in contrast to the number of (TG) m and (G) n repeats] were used for the characterization of the haplotypes of the intron III of the Bmy1 gene. Five haplotypes were classified according to their MS motif as HN, HS, HA, AB, and AD haplotypes (Table 2; Figs. 2, 3).
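The feature classes listed above can be tallied in a few lines; the total agrees with the 59 polymorphisms cited in the Discussion for the AB haplotype. The dictionary keys are informal labels chosen here for illustration.

```python
# Polymorphic features used for haplotype characterization, as listed in the text.
polymorphism_counts = {
    "indels": 16,
    "SNPs": 38,
    "dSNPs": 3,
    "fragment substitution": 1,
    "MS motif": 1,
}

total = sum(polymorphism_counts.values())

if __name__ == "__main__":
    print(total)
```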

Fig. 3

Haplotype-specific co-localization of the intron III polymorphisms, TFBSs and remnants of mobile elements. HN, HS, AB, HA, and AD haplotypes are represented by the same accessions as in Fig. 2. Numbering is indicated in base pairs according to the genomic sequences of the mentioned accessions, sequence of AB haplotype is numbered starting from the first G of the intron. Each TFBS site is shown as a symbol with inserted abbreviated name. Correspondence of abbreviated names to the full TF names is explained in Table 4. Predicted positions of the conservative (I–VII) and variable (VM1 and VM2) modules are indicated in square brackets. Three pairs of arrows compare the 126, 38, and 21 bp indel loci between different haplotypes. MS motif sequences are given in frames

The Danish variety Maja and the old Latvian cultivar Latvijas Vietejie turned out to have HN haplotype. Intron III sequences of both accessions differ from HN only in three SNPs (allele A1630 is common with HS, alleles T2020 and G2044 are common with HS, HA, AB, and AD), and a repeat number of the MS components that formed (TG)9 for Maja and (G)18 for Latvijas Vietejie in contrast to the (TG)10 and (G)16 for Haruna Nijo.

Eleven Latvian varieties (Table 1) were shown to have AD haplotype of the intron III with MS motif TG(G)2(TG)4TT(G) n of high level of the (G) n variability between cultivars in repeat number, ranging from n = 12 (Ansis, Balga, Kombainieris, Linga, Rasa) to n = 18 (Idumeja, Imula, and Priekulu 1, Table 1).

Intron III of the Bmy1 gene of the other eight Latvian varieties studied was classified as the novel AB haplotype (Table 1; Fig. 2) with MS motif (TG)6TT(G) n . As for the AD haplotype, the MS component (G) n was revealed as highly polymorphic in repeat number, ranging from n = 14 (Ilga) to n = 18 (Agra). Absence of three insertions A1310-126 bp-T1311, A1943AT1944, and A2013AT2014, as well as the presence of alleles A1318, T1700, C1850, G1882, T1967, A2219 are the AB features common with HN, HS, HA haplotypes in contrast to AD. Alternatively, alleles A1384 and G1605 are common with HS, HA, and AD haplotype in contrast to HN. The presence of three insertions A1389-38 bp-T1426, C1862CA1864, A2003TG2005, and alleles G2002 and G2055, are similar to HN, HA, and AD in contrast to HS. Absence of the 21-bp fragment G1881-21 bp-T1902 and three alleles T1350, G1505, and T1962 are features common only to HA in contrast to HN, HS, and AD haplotypes. Alternatively, indel variants A1674A1675, and T1691CG1693, and allele T2041 are features common with HN, HS, and AD, in contrast to HA. Thirty-one mutations including six indels (A1628CCT1631 → AT, A1757-11 bp-C1769 → AC, C1849C1850 → C-4 bp-C, G2071CT2073 → GT, A2176T2177 → AAT, and T2186T2187 → T-6 bp-T), one fragment substitution C1564ACATTCGTAT1574 → AAATGATATT, and twenty-four SNPs/dSNPs were revealed as identical between AB and both HA and AD haplotypes (Table 2; Fig. 3).
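The MS motif formulas quoted in the text can be expressed as regular expressions, so that an MS core sequence is assigned to a motif class regardless of its repeat number. The sketch below is a hedged illustration and not the authors' classification procedure. It covers only the motifs spelled out in the text: AD, TG(G)2(TG)4TT(G)n; AB, (TG)6TT(G)n; and the (TG)m(G)n type shared by HN and HS, which differ in their flanks. The HA motif is shown only in Fig. 2 and is therefore omitted. The names `expand` and `classify_ms` are hypothetical helpers.

```python
import re

# Motif classes that are written out in the text; order matters because the
# most specific patterns are tested first.
MS_PATTERNS = [
    ("AD", re.compile(r"^TGGG(?:TG){4}TTG+$")),      # TG(G)2(TG)4TT(G)n
    ("AB", re.compile(r"^(?:TG){6}TTG+$")),          # (TG)6TT(G)n
    ("HN/HS-type", re.compile(r"^(?:TG)+G+$")),      # (TG)m(G)n
]


def expand(*parts):
    """Build a sequence from (unit, count) pairs, e.g. ("TG", 6)."""
    return "".join(unit * count for unit, count in parts)


def classify_ms(seq):
    """Return the motif class of an MS core sequence, ignoring repeat number."""
    seq = seq.upper()
    for label, pattern in MS_PATTERNS:
        if pattern.match(seq):
            return label
    return "unclassified"


if __name__ == "__main__":
    balga = expand(("TG", 1), ("G", 2), ("TG", 4), ("TT", 1), ("G", 12))
    ilga = expand(("TG", 6), ("TT", 1), ("G", 14))
    haruna_nijo = expand(("TG", 10), ("G", 16))
    for name, seq in [("Balga", balga), ("Ilga", ilga), ("Haruna Nijo", haruna_nijo)]:
        print(name, classify_ms(seq))
```

Because the patterns ignore the value of n, two accessions with different (G)n lengths fall into the same class, which mirrors the distinction drawn in the text between motif and repeat number.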

Interhaplotype variability

Table 3 summarizes the publicly available and our present data on interhaplotype variability of the Bmy1 allelic variants focusing on several exon III and exon IV cSNPs as well as on intron III polymorphisms. Five cSNPs were taken into account including three synonymous and two non-synonymous substitutions previously used in Bmy1 allele classification (Paris et al. 2002) and in several genotyping studies (Polakova et al. 2003; Sjakste and Röder 2004; Malisheva et al. 2004). Four exonic haplotypes including G495/T666/T698/G702/A741, C495/T666/T698/A702/A741, C495/C666/C698/A702/G741, and G495/T666/C698/A702/G741 are considered to belong to Sd1, Sd2L, Sd2H, and Sd3 Bmy1 alleles, respectively (Table 3). However, the newly described haplotypes G495/N666/T698/A702/N741 (Polakova et al. 2003) and N495/C666/C698/G702/G741 (Sjakste and Röder 2004) suggest more allelic variants of the gene. Exonic haplotype N495/T666/T698/G702/A741 belongs to accessions of both AB-like (varieties Abava and Agra) and to HA52 haplotypes of intron III. Alternatively, intronic HN and HS haplotypes belong to accessions of the same exonic haplotype C495/C666/C698/A702/G741.

Table 3 Summary of the previously published and present data on the interhaplotype variability of the Bmy1 allelic variants

Data presented in Table 3 illustrate the insufficiency of the Bmy1 allele discrimination based only on two cSNPs in positions 495 and 698, as well as the necessity to take into consideration the structural rearrangements in the non-coding gene regions.
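The point about two-cSNP classification can be made concrete with a small lookup. The sketch below encodes the four reference exonic haplotypes given in the text and checks which Sd alleles a query haplotype is compatible with, treating N as an undetermined base. The function name `compatible_alleles` is a hypothetical helper, and the comparison logic is an illustration rather than the genotyping system of Paris et al. (2002).

```python
POSITIONS = (495, 666, 698, 702, 741)

# Reference exonic haplotypes from the text, one base per position above.
SD_ALLELES = {
    "Sd1": "GTTGA",
    "Sd2L": "CTTAA",
    "Sd2H": "CCCAG",
    "Sd3": "GTCAG",
}


def compatible_alleles(haplotype, use_positions=POSITIONS):
    """List Sd alleles that match the haplotype at the chosen positions.

    haplotype: five-character string ordered as POSITIONS; "N" is a wildcard.
    """
    idx = [POSITIONS.index(p) for p in use_positions]
    hits = []
    for name, ref in SD_ALLELES.items():
        if all(haplotype[i] in ("N", ref[i]) for i in idx):
            hits.append(name)
    return hits


if __name__ == "__main__":
    polakova = "GNTAN"   # G495/N666/T698/A702/N741
    sjakste = "NCCGG"    # N495/C666/C698/G702/G741
    for hap in (polakova, sjakste):
        print(hap,
              compatible_alleles(hap, use_positions=(495, 698)),
              compatible_alleles(hap))
```

With positions 495 and 698 alone, the first haplotype is indistinguishable from Sd1 and the second is compatible with both Sd2H and Sd3; with all five cSNPs, neither matches any of the four reference alleles.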

Immediate pedigree analysis

Data on the pedigree histories of several Latvian barley varieties (Table 1) permit analysis of the transmission of the intron III polymorphisms through the generations. For example, Balga and Linga both descended from the same cross Gunilla/KM 1192 and possess the same MS TG(G)2(TG)4TT(G)12 and whole intron III AD-like haplotype. Abava, as one of the immediate parents, passes its haplotype to Ruja. Three of the Latvian varieties analyzed, including Abava and Ilga of AB haplotype and Imula of AD haplotype, are involved in the Idumeja (AD haplotype) pedigree (Fig. 4). The scheme illustrates the transmission of the AD-like haplotype from Imula to Idumeja. The presence of the 126 bp insertion as the main AD haplotype feature allows us to speculate that Akka also has AD haplotype.

Fig. 4

Pedigree of the Latvian barley cultivar Idumeja. Bmy1 intron III haplotype (Hapl.) together with allelic variants of indels are indicated according to the present data for cultivars Abava, Ilga, Imula, and Idumeja. Data on MS length, 1 + 6 bp indel, and C698 → T polymorphism of all the pedigree participants were published previously (adapted from Sjakste and Röder 2004); information on the 126 bp insertion of the intron III of the Bmy1 gene of cultivar Akka is presented for the first time. The 126 bp insertion and the 21 bp deletion in AD haplotype are marked in gray to underline the transmission of the polymorphic loci through generations

Functional analysis

Comparative analysis was performed to identify both conservative and variable functional elements in the intron III of the Bmy1 gene.

Consensus and conserved sequences

No polymorphisms were revealed in splice donor and branch sites of intron III. AD haplotype appeared to have one SNP A2219 → T in the splice acceptor site, unlike other haplotypes studied.

Several other highly conserved sequences of different length were derived from the alignment (see Table 3; Fig. 3). Those longer than 30 bp included a 53 bp sequence in the 5′ flank of the intron upstream of the site of the 126 bp indel, and the regions from A1427 to A1504, from T1631 to A1674, from T1701 to A1743, and from T2187 to A2218. Only two SNPs were revealed in the 73 bp region from T1311 to A1383 (Table 2).
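The lengths of the regions named above follow directly from their inclusive coordinates (Haruna Nijo numbering). A minimal sketch, with the dictionary labels chosen here for readability:

```python
# Inclusive coordinates of the regions named in the text.
REGIONS = {
    "A1427-A1504": (1427, 1504),
    "T1631-A1674": (1631, 1674),
    "T1701-A1743": (1701, 1743),
    "T2187-A2218": (2187, 2218),
    "T1311-A1383": (1311, 1383),  # region containing two SNPs
}


def span_length(start, end):
    """Length in bp of a region with inclusive start and end positions."""
    return end - start + 1


lengths = {name: span_length(*coords) for name, coords in REGIONS.items()}

if __name__ == "__main__":
    for name, length in lengths.items():
        print(f"{name}: {length} bp")
```

All four invariant regions exceed the 30 bp cut-off, and the T1311 to A1383 region comes out at the 73 bp stated in the text.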

Analysis on the TFBSs

Sequences of all five haplotypes studied were analyzed for the eventual presence of TFBS using the Genomatix software. Binding sites (BSs) for 37 plant transcription factors (TFs) of 27 families were taken into account according to the chosen parameters (core/matrix similarity≥0.85/0.85). Comments on TFs family/matrix information and their abbreviations (used in Fig. 3) are given in Table 4. The family/matrix name and its associated abbreviation (MYBST1.01 = M2, for example) are sometimes used in the text to avoid misunderstanding.
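The similarity cut-off described above amounts to keeping only predicted sites whose core and matrix similarity both reach 0.85. The sketch below shows that filter on a toy list; the matrix names are taken from the text, but the positions and scores are invented purely for illustration, and the `Hit` record is a hypothetical structure, not the output format of the Genomatix tools.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    matrix: str        # family/matrix name
    position: int      # hypothetical position within the intron
    core_sim: float    # core similarity
    matrix_sim: float  # matrix similarity


def passes(hit, core_min=0.85, matrix_min=0.85):
    """True if the hit meets both similarity thresholds."""
    return hit.core_sim >= core_min and hit.matrix_sim >= matrix_min


# Toy hits with invented positions and scores.
hits = [
    Hit("MYBST1.01", 120, 1.00, 0.93),
    Hit("WRKY.01", 340, 0.84, 0.97),   # fails on core similarity
    Hit("CCA1.01", 15, 0.85, 0.85),    # exactly at threshold, kept
]

kept = [h.matrix for h in hits if passes(h)]

if __name__ == "__main__":
    print(kept)
```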

Table 4 Description of the TFs which possess BSs in intron III of the Bmy1 gene

The W Box family (WBXF) and the MYB-like proteins (MYBL) are each represented by three of the predicted TFs. Two TFs each represent the plant GT-box elements (GTBX), DNA binding with one finger (DOFF), G-box/C-box BZIP proteins (GBOX), MYB proteins with single DNA-binding domains (MYBS), Myc-like binding factors (MYCL), and TATA-binding protein factor (TBPF). Other families are represented by only one member (circadian control factor CCAF/CCA1.01, for example).

All TFBSs revealed could be divided into three groups. Group 1 is formed by BSs found exclusively in the conservative non-polymorphic DNA sequences (conservative TFBSs). Group 2 unites TFBSs that arise or disappear due to polymorphisms (variable TFBSs), and group 3 – BSs found both in conservative and variable regions of the intron (dual TFBSs).
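The three-group scheme can be stated as a simple rule on where the binding sites of a given TF occur. The following sketch assumes each predicted site has already been labeled as lying in a conserved or a variable region; the function name `classify_tf` and the input labels are hypothetical.

```python
def classify_tf(site_contexts):
    """Assign a TF to group 1, 2, or 3 from the contexts of its binding sites.

    site_contexts: iterable of "conserved" / "variable" labels, one per site.
    """
    contexts = set(site_contexts)
    if contexts == {"conserved"}:
        return "conservative"   # group 1
    if contexts == {"variable"}:
        return "variable"       # group 2
    if contexts == {"conserved", "variable"}:
        return "dual"           # group 3
    raise ValueError("expected at least one site labeled 'conserved' or 'variable'")


if __name__ == "__main__":
    print(classify_tf(["conserved", "conserved"]))
    print(classify_tf(["variable"]))
    print(classify_tf(["conserved", "variable"]))
```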

The locations of the conservative, dual and some variable TFBSs within the intron III sequence of a particular haplotype are illustrated in Fig. 3. All other variable TFBSs (generated or lost as a result of sequence variations) are indicated in Table 2.

Conservative TFBSs

Eventually, eight types of conservative TFBSs could be found in portions of the intron III invariable between haplotypes (Table 4; Fig. 3) including TFBSs for the constitutively expressed BZIP protein responsible for light-induced nuclear import (CPRF2.01 = G2), Myb and Myb-like proteins (MYBST1.01 = M2, MYBPH3.01 = M4, and two sites for ATMYB77.01 = M3), soybean embryo factor 4 (SEF4.01 = SE), two sites for protein that binds SP8a and SP8b sequences of sporamin and β-amylase genes (SP8BF.01 = SP), and proteins of the W Box family (WRKY.01 = W2 and ZAP1.01 = W3). Besides the mentioned, conservative TFBS independent from interhaplotype variability of the region was predicted in the MS portion of intron for RAV1-5.01 (Fig. 2). Different TFs could interact with the same conservative regions of intron producing conservative modules consisting of several elements. Modules I and II were predicted to consist each of two conservative TFs (G2 and W3, and M3 and W2 correspondingly). The highly conservative 3′-end of the intron possesses binding capacity to three TFs of module III (SE, M4, and SP, Fig. 3).

Ten eventual TFBSs were evaluated as dual ones (Table 4). Haplotype independent dual TFBSs are illustrated in Fig. 3 including BSs of the circadian clock associated control factor 1 (CCA1.01 = C) and an ethylene insensitive factor (TEIL.01 = E) that flanks the 126 bp indel locus from its 5′- and 3′-ends correspondingly. The intron 5′-flank possesses the binding capacity to module IV (BSs of ATCTA.01 = LR and ATML1.01 = L), module V (two dual TFBSs of RY.01 = LE and GATA.01 = I, accompanied by two conservative TFBSs of MYBST1.01 = M2 and ATMYB77.01 = M3), and to GAAA motif element involved in pollen-specific transcriptional activation (GAAA.01 = P). This binding capacity is present in all the five haplotypes. Further in the 3′ intron direction, conservative binding capacity was predicted for TFBSs of nodulin consensus sequence (NCS1.01 = N), plant TATA box (TATA.02 = T2), module VI (of two TATA-binding proteins), and module VII (GATA.01 = I and S1F.01 = B2).

Variation in the TFBSs pattern

The functional significance of all the polymorphisms revealed was analyzed by Genomatix software. Data were scored as generation or loss of TFBSs as a result of sequence polymorphism, summarized in Table 2 and partly illustrated in Fig. 3. Among the polymorphic events analyzed, at least 24 variations could change the binding capacity of the 10 dual and 18 variable TFBSs (Table 4). Three indels of 126, 38, and 21 bp significantly changed the eventual pattern of TFBSs. The MITE element (126 bp insertion) adds six new BSs including three light responsive elements LR, two ethylene insensitive factors E and one storekeeper motif ST. Graphically, it looks like an expansion of LR + E modules in AD when compared to other haplotypes (Fig. 3). The 38 bp deletion is the feature of the HS haplotype and is responsible for the loss of TFBSs of the light response element (GAP.01 = GA), DOF factor PBF.01 = D1 (Fig. 3), and nodulin consensus sequence NCS1.01 = N (Table 2). The 21 bp insertion (present in AB and HA haplotypes) could generate additional TFBSs of both GT-box element (S1F.01 = B2) and MYB protein (OSMYBPH3.01 = M1) simultaneously with the loss of sequence affinity to the L1 box (ATML1.01 = L) and one of the plant TATA boxes (TATA.02 = T2). Deletion A1628CCT1631 and insertion A2176AT2177 result in the generation of new variable modules (VM1 and VM2, respectively) in AB, HA, and AD haplotypes (Fig. 3). Insertion A1674CA1675 results in eventual generation only in AB haplotype of the TFBS of the salt/drought responsive element ALFIN1.01 = SA.

In contrast to the mutations mentioned above, the A1757-11 bp-C1769 deletion and the C1849-4 bp-C1850 and T2186-6 bp-T2187 insertions do not seem to be followed by changes in sequence affinity to TFs (Table 2) in AB, HA, and AD haplotypes compared with HN and HS.
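Scoring a polymorphism as generation (G) or loss (L) of TFBSs reduces to comparing the sites predicted with and without the variant. The sketch below does this with multisets so that repeated sites, such as the three LR elements added by the MITE insertion, are counted. The abbreviations follow the text; the function name `score_tfbs_change` and the "UNCHANGED" placeholder site are hypothetical, and the empty "before" list for the MITE example is an assumption standing in for the insertion point.

```python
from collections import Counter


def score_tfbs_change(before, after):
    """Compare predicted TFBSs without and with a variant.

    Returns {"G": generated sites, "L": lost sites} as name -> count mappings.
    """
    before, after = Counter(before), Counter(after)
    return {"G": dict(after - before), "L": dict(before - after)}


if __name__ == "__main__":
    # 21 bp indel as described in the text: gains B2 and M1, loses L and T2.
    print(score_tfbs_change(["L", "T2", "UNCHANGED"], ["B2", "M1", "UNCHANGED"]))
    # 126 bp MITE insertion: six new binding sites.
    print(score_tfbs_change([], ["LR", "LR", "LR", "E", "E", "ST"]))
    # Variant with no predicted effect on TF affinity.
    print(score_tfbs_change(["UNCHANGED"], ["UNCHANGED"]))
```

A variant whose two site lists are identical returns empty G and L entries, which corresponds to the indels described above as not changing sequence affinity to TFs.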

Eventual function of the MS variations

As mentioned above, the binding capacity to RAV1-5.01 was predicted for MS of all haplotypes studied independently of the restructuring of this portion of the intron III. The binding capacity to other TFs depends on the spectrum of sequence variations (Table 2) and determines the individual pattern of TFBSs for each haplotype studied (Fig. 2). For example, the variation T2001 → G leads to the elimination of BSs for the type-B response regulator (ARR10.01) of the GARP family and two inducers of CBF expression (ICE.01) in AB and HA haplotypes compared with the HN and HS variants. The presence of the TT motif at the border between the (TG) m and (G) n components of MS eliminates binding capacity to the salt/drought responsive zinc-finger protein (ALFIN1.01) in favor of the CAAT box (CAAT.01) and MYB plant protein (GAMYB.01) in AB, HA, and AD haplotypes in comparison with HN and HS sequences. The 3′-flank of the HN haplotype possesses binding capacity to both the salt/drought responsive zinc-finger protein (ALFIN1.01) and the ethylene insensitive factor TEIL.01. The ALFIN1.01 site disappears in AB, HA, and AD haplotypes due to the T2001 → G substitution. Restructuring of the 3′ MS flank in HS haplotype leads to the loss of both ALFIN1.01 and TEIL.01 BSs and the appearance of the binding capacity to GAMYB.01 and MYCRS.01.

Variations in the number of (TG) m and (G) n repeats do not produce significant differences in the affinity of MS sequence to the plant TFBSs according to the Genomatix database. However, interesting information was obtained from alignment with vertebrate TFBSs (not shown). The number of (G) n repeats (accessions of HN, AB, and AD haplotypes) could change the spectrum and/or number of BSs for different zinc-finger proteins. In contrast, intrahaplotype variations in the number of TG repeats seem not to influence the TF binding capacity of the region (Haruna Nijo, Maja, and Latvijas Vietejie were compared).

Mobile element presentation

The position of the MITE element (126 bp indel) and the remnants of different retrotransposons within the intron III are illustrated in Fig. 3 for all five haplotypes studied. The presence of the MITE element is a characteristic of AD haplotype only. The remnants of retrotransposons of several classes were identified within highly conserved regions of the intron III. Their sequence motifs and positions are identical between all haplotypes studied (remnants of LTR gypsy in the 5′-flank and of CACTA in the 3′-flank of the intron sequence). The spectrum and positions of remnants of other mobile elements were revealed as variable between the haplotypes studied.

Discussion

Description of novel haplotypes

One of the goals of the current study was to extend data on polymorphisms of the intron III of the Bmy1 structural gene. Besides the four haplotypes reported earlier (Haruna Nijo, Adorra, Finnish landrace HA52, and H. spontaneum), we have described a new AB-like haplotype of intron III. The absence of the 126 bp MITE element, three indels, nine SNPs, and a specific MS motif are the polymorphisms that discriminate AB and AD haplotypes. At the same time, 44 other polymorphisms analyzed were revealed as shared between AB and AD haplotypes. These include the A2176T2177 → AAT and T2186T2187 → T-6 bp-T insertions in the 3′ end sequences of both haplotypes, previously reported as the 1 + 6 bp insertion or L-allele linked to the T698 mutation of exon IV (Sjakste and Röder 2004). Allelic variants of other polymorphic loci of AB haplotype were found to be common with HN, HA, and HS haplotypes in different combinations. Altogether, 59 polymorphisms including MS variations produce the unique Bmy1 intron III AB haplotype. It should be considered that even more alleles and haplotypes could probably be detected by the analysis of a different collection of genotypes from all over the world.

Haplotype classification

Paris et al. (2002) proposed a high-throughput genotyping system for identification of the Bmy1 allele taking into consideration two non-synonymous cSNPs in positions 495 and 698 of the Bmy1 gene. This approach was successfully applied in several genotyping studies (Polakova et al. 2003; Sjakste and Röder 2004; Malisheva et al. 2004). However, identification and genotyping of additional cSNPs (Polakova et al. 2003; Sjakste and Röder 2004) revealed the insufficiency of classification based only on two polymorphic loci. Polakova et al. (2003) subdivided the Sd1 allele in Sd1 and Sd1b. Exon IV haplotype C666/C698/G702/G741 of variety Drost (Sjakste and Röder 2004) suggests an additional Bmy1 allelic variant different from Sd2H. Considering the present data on intron III interhaplotype polymorphism, we confirm here that patterns T666/T698/A702/A741 and C666/C698/A702/G741 (accessions Imula, Linga and Latvijas Vietejie; Sjakste and Röder 2004) belong to AD and HN haplotypes of intron III, respectively. We show here that pattern T666/T698/G702/A741 could belong to both HA and AB haplotypes, just as pattern C495/C666/C698/A702/G741 is the trait of both HN and HS haplotypes. Discrimination of the Bmy1 alleles of similar exonic haplotypes could be achieved by taking into consideration intron III polymorphisms including the MS sequence motif. Our results clearly demonstrate the importance of information on the structural rearrangements in non-coding gene regions as well as in coding ones for allele identification.

Barley β-amylase is one of the key enzymes involved in the determination of malting quality. Structural reorganizations of the gene encoding the enzyme were shown to be responsible for kinetic properties, affinity, and thermostability of the gene product (Eglinton et al. 1998; Erkkilä and Ahokas 2001; Kaneko et al. 2000) that could contribute to barley malting quality. However, it is difficult to establish an association between particular structural variations of the gene-candidate and phenotypic manifestation of the polygenic trait. Therefore, we do not speculate here on the possibility of applying our findings directly to the evaluation of malting quality. On the contrary, we would like to demonstrate here the complexity and unreliability of such direct correlations. For example, the same G495/T698/G702 exonic haplotype belongs to Czech cultivars Olbram and Tolar of good and poor malting quality, respectively (Polakova et al. 2003), to Finnish landrace HA52 of high malting quality (Erkkilä and Ahokas 2001), and to Latvian accessions of AB haplotype (low to middle malting quality). Varieties Ansis, Rasa, and Sencis of AD haplotypes were the only accessions characterized as having middle to high malting quality in contrast to all other Latvian accessions of AD haplotype (M. Bleidere, Stende breeding station, Latvia, personal communication). Among the Latvian varieties analyzed, only the old local variety Latvijas Vietejie is considered to have high malting quality; however, this variety disappeared from the market at the beginning of the last century and hence was not available for quality control testing.

It was reported (Erkkilä and Ahokas 2001) that 126 bp indel in the 5′ region of intron III is associated with allelic variants of the gene encoding enzymes of low or high thermostability correspondingly. Enzyme thermostability, in turn, could influence malting quality (Eglinton et al. 1998). We confirm here that the presence of 126 bp MITE element in Bmy1 intron III appears to be a feature indicating low to middle malting quality of barley genotypes (Adorra, Latvian varieties of AD haplotypes). However, the absence of 126 bp MITE element is not directly associated with high malting quality as it is observed in barley genotypes of both high malting quality (Haruna Nijo, Maja, Latvijas Vietejie, Finnish landrace HA52) and low to middle malting quality (Latvian barleys of AB haplotype).

In this respect, we conclude that further studies on the association of haplotypes with traits and quality remain highly relevant for both fundamental research and practical application.

Interhaplotype variability

Each of the intron III haplotypes analyzed possesses a unique spectrum of allelic variants of the polymorphic loci. At the same time, some polymorphisms are common to several haplotypes. The MS sequence motif and its flanking sequences are among the main features that enable the discrimination of all the haplotypes analyzed. Sequence polymorphisms of the 3′ end of MS, in contrast to MS repeat number, were used earlier to differentiate two α-amylase gene families in crustaceans (Van Wormhoudt and Sellos 2003). The high variability of MS and its flanking sequences may be a common mechanism for the evolution of at least some amylase genes in different taxa.

It seems that the interhaplotype variability of intron III could be a consequence of structural reorganization of the sequence during both natural evolution (HS compared to cultivated barley) and the breeding process. The modern Latvian barley accessions analyzed belong to only two of the described intron III haplotypes, namely the AB and AD haplotypes. Pedigree analysis revealed stable transmission of this genomic portion through generations in different crosses over approximately 70 years of breeding in Latvia. It should be emphasized that many fragments of the intron III sequence, of different lengths, are conserved among all haplotypes studied and may be of more ancient origin than the polymorphic loci.

Intrahaplotype variability and validity of MS markers

A complex MS repeat is present in intron III of the Bmy1 gene. Each of the five haplotypes analyzed possesses a different sequence motif in this repeated portion. Variations in the repeat number of the (TG)m and/or (G)n MS components were shown in the present study for the HN, AB, and AD haplotypes. Such variations could result in an apparent similarity in the size of the MS portion of different haplotypes when it is determined as MS length polymorphism. This can give rise to mimicry of different haplotypes when MS fragment size analysis is used exclusively for genotyping studies. As shown here, approximately half of the Latvian varieties studied belong to the AB-like haplotype and the others to the AD haplotype. However, the spectrum of sizes of the MS sequences was similar in both haplotypes. Our previous results (Sjakste and Röder 2004) on the absence of linkage between MS length polymorphism and the C698 mutation of exon IV are confirmed by the present study. Indeed, it is not the number of repeats but the MS motif itself that is the main trait of the locus correlating with mutations in the coding region of the gene. Therefore, Bmy1 MS length polymorphism cannot be used for association studies but should be considered a trait of the intrahaplotype variability of the repeated portion of the gene. A high level of intrahaplotype variability in the number of repeats was shown previously for the MS of intron VI of the α-amylase gene of the crustacean Litopenaeus vannamei (Van Wormhoudt and Sellos 2003). Taking into account our own and published data, we stress the necessity of testing the applicability of MS marker size as the sole parameter of genomic region polymorphism in every particular association and linkage disequilibrium study.
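The size mimicry described above can be illustrated with a minimal Python sketch. It assumes a compound repeat of the simple form (TG)m(G)n; the haplotype names and repeat counts are hypothetical and are not values measured in this study.

```python
def ms_sequence(tg_repeats: int, g_repeats: int) -> str:
    """Build a compound (TG)m(G)n microsatellite sequence."""
    return "TG" * tg_repeats + "G" * g_repeats


def ms_length(tg_repeats: int, g_repeats: int) -> int:
    """Length in bp of a (TG)m(G)n repeat: 2 bp per TG unit, 1 bp per G."""
    return 2 * tg_repeats + g_repeats


# Hypothetical (m, n) repeat counts, chosen only for illustration.
alleles = {
    "haplotype_X": (10, 4),
    "haplotype_Y": (8, 8),
    "haplotype_Z": (11, 5),
}

# Group alleles by fragment length, as a size-only assay would see them.
by_length = {}
for name, (m, n) in alleles.items():
    by_length.setdefault(ms_length(m, n), []).append(name)

for length, names in sorted(by_length.items()):
    print(length, names)
```

In this sketch, haplotype_X and haplotype_Y have different motif compositions but the same fragment length, so a length-only assay could not separate them, whereas sequencing the motif would.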

Possible functional significance of the intronic rearrangements

In recent years it has become clear that introns are active participants in gene and genome function. They code for regulatory elements of autocatalytic activity, alternative splicing, and gene transcription, as well as for factors promoting mobility, including endonucleases and reverse transcriptases (Lewin 2004). This means that not only conserved sequences within introns possess functionality as regulatory elements, but variable regions could also be adaptively active. However, elucidating the functional role of intronic regulatory elements (IREs; Yutzey et al. 1989) remains a task for future investigations.

In plants, IREs that could be involved in the regulation of gene transcription have been reported for rice (Fiume et al. 2004), Lotus japonicus (Kapranov et al. 2001), and Arabidopsis (Gazzani et al. 2003; Michaels et al. 2003; Sheldon et al. 2002; Wang et al. 2002). Large deletions within the first intron of the vernalization locus VRN-1 are associated with spring growth habit in barley and wheat (Fu et al. 2005). The 126 bp deletion in intron III of the Bmy1 gene encoding barley endosperm-specific β-amylase is associated with the Sd2L allele of the gene, which encodes the enzyme of low thermostability (Erkkilä et al. 1998). A 1 + 6 bp indel in the 3′ region of the same intron is linked with the functional SNP of exon IV (Sjakste and Röder 2004).

In the present paper we attempted to use intron III sequences of different haplotypes of the Bmy1 gene as a model system for comparative computational evaluation of the structural and possible functional reorganization of a non-coding gene region. We show that highly conserved sequences within the intron could possess binding capacity for regulatory elements of several families, including plant circadian control factor, G/C BZIP proteins, GT-, I-, legumin- and W-boxes, light and pollen-specific elements, several MYB- and MYB-like proteins, L1BX, RAV5, SEF4, SPF1, TATA-BS, nodulin consensus sequence, and ethylene insensitive factors. Six conserved modules consisting of two or more TFs were predicted for all haplotypes studied. Sequence variations could change the pattern of TFBSs for regulatory factors of some of the TF families mentioned (except RAV5, SEF4, and SPF1) and of other TF families, including AHBP-, CCAAT-proteins, DOF-, GAP-, and MADS-boxes, MYC-like factors, salt/drought responsive element, and storekeeper motif. Several variable modules could be defined and assigned specifically to individual polymorphic loci. For example, insertion of the MITE element in the 5′ region of the intron results in the expansion of BSs for ATCTA light-responsive and ethylene insensitive factors in the AD haplotype. In turn, the 38 bp deletion in the HS haplotype eliminates the module of GAP- and DOF-proteins that is highly conserved in the other haplotypes. The 21 bp insertion creates an additional TATCCA site that binds a specific MYB protein involved in the regulation of sugar metabolism (Lu et al. 2002).
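How an insertion can create a new binding site such as TATCCA may be sketched as a simple exact-match scan of both strands. This is only an illustration and not the prediction tool used in the study: the sequences below, including the short insert, are made up and do not reproduce the Bmy1 intron III sequence or the actual 21 bp insertion.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def find_sites(seq: str, motif: str) -> list:
    """Return (0-based position, strand) for exact motif matches on both strands."""
    seq, motif = seq.upper(), motif.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc_motif and rc_motif != motif:
            hits.append((i, "-"))
    return hits


# Made-up sequences for illustration only.
without_insertion = "GACGTTGCAAGCTTGGC"
insertion = "CCTATCCAGG"  # hypothetical insert carrying a TATCCA site
with_insertion = without_insertion[:8] + insertion + without_insertion[8:]

print(find_sites(without_insertion, "TATCCA"))
print(find_sites(with_insertion, "TATCCA"))
```

Comparing the hit lists of two haplotype sequences in this way shows which sites are gained or lost. Exact matching is a deliberate simplification; realistic TFBS prediction typically relies on position weight matrices and tolerates mismatches.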

Many of the TFs predicted in our analysis have been extensively studied; their biological functions span the regulation of secondary metabolism, cellular morphogenesis, cell cycle, development, signal transduction, and disease resistance (Gubler et al. 1995; Gomez-Cadenas et al. 2001; Lu et al. 2002; Wullmout et al. 1998; Yanhui et al. 2006). Sequence variations could change the affinity and/or number of TFBSs, alter the pattern of functional regulatory elements and, as a result, influence gene expression.

Taking into account the high levels of inter- and intrahaplotype variability demonstrated in the present study, intron III of the barley seed-specific β-amylase gene can serve as an excellent model for evaluating the possible functional significance of both the conserved intron portions and the sequence rearrangements.