Grapevine leafroll-associated virus 3 (GLRaV-3) is the type species of the genus Ampelovirus (family Closteroviridae) [1]. It is an economically important virus, known to infect only Vitis spp., that has a negative impact on the wine and table grape industries worldwide. To date, only one report has claimed the complete nucleotide sequence of GLRaV-3: that of isolate NY-1 (AF037268), published by Ling et al. in 2004 [2]. In that report, the 5′ untranslated region (UTR) was found to be 158 nt long. Here we report that GLRaV-3 isolate GP18 has a 5′UTR of 737 nt. This extended UTR was also found in all other South African isolates of GLRaV-3 that were analysed.

Grapevine material (Cabernet Sauvignon) was harvested from a monitored vineyard in the Somerset West wine-producing region of South Africa. Isolate GP18 was obtained from a vine that displayed symptoms for the first time (Pietersen, pers. comm.). GP18 double-stranded RNA (dsRNA) was extracted from phloem tissue of woody canes using a cellulose (CF11) column method adapted from that first described by Hu et al. [3]. RT-PCR was performed with primer sets designed to cover a large portion of the genome (nucleotides 1,835–17,905 of AF037268) in 10 overlapping clones (Fig. 1). Amplicons were cloned and sequenced, and a consensus sequence was assembled using BioEdit [4]. Primer-derived sequences were excluded when assembling the consensus from these overlapping clones.
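Because primer-derived sequence was excluded from the consensus, neighbouring amplicons must overlap by more than a primer length for the assembled sequence to remain contiguous. The sketch below illustrates this check under stated assumptions: the amplicon coordinates and the uniform 20-nt primer length are hypothetical and do not describe the primers used in this study, and the function name is illustrative only (the consensus itself was assembled in BioEdit).

```python
def primer_trimmed_coverage(amplicons, primer_len=20):
    """Merge amplicon interiors after removing primer-derived ends.

    amplicons: (start, end) genome coordinates, 1-based and inclusive,
        including the primer-binding sites at both ends.
    primer_len: assumed length of every primer (hypothetical, uniform).
    Returns the merged intervals still covered by non-primer sequence.
    """
    interiors = sorted(
        (start + primer_len, end - primer_len)
        for start, end in amplicons
        if end - start + 1 > 2 * primer_len  # ignore amplicons that are all primer
    )
    merged = []
    for start, end in interiors:
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or abuts the previous interior: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


# Hypothetical amplicons; the last two overlap by only 11 nt,
# which is shorter than the assumed primer length.
amplicons = [(1835, 3900), (3850, 6100), (6090, 8200)]
print(primer_trimmed_coverage(amplicons))
# → [(1855, 6080), (6110, 8180)]: a 29-nt gap (6081-6109) remains
```

A reported gap would call for an additional amplicon spanning it. The outermost primer sites are never covered by this scheme, so terminal sequence has to come from other methods.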

Fig. 1

a Schematic representation of the genome organisation of GLRaV-3 GP18 (drawn to scale). b Lines indicate the regions cloned and sequenced by different techniques. Lines 1a2, 1a34, 1b, 2 + 3, 4, 5, 6, 7, 8 + 9 and 10–12 represent the clones generated by RT-PCR to sequence the majority of the genome. “PolyA 5′” and “PolyA 3′” represent the clones generated using poly(A) tailing, and “5′ RLM-RACE” shows the region amplified using RLM-RACE. “Spanning RT-PCR” represents the region amplified by RT-PCR to show that other isolates also have the extended 5′UTR. c Enlargement of the 5′ region, showing the start of the NY-1 sequence relative to the GP18 sequence

Poly(A) tailing was performed on dsRNA in an attempt to reach the 5′ and 3′ termini (Fig. 1) [5]. The 3′-terminus of GP18 was found to be similar to that of the NY-1 isolate. However, we were unable to reach the reported 5′-end of the NY-1 isolate and consistently obtained amplicons that started at position +50. After further experimentation with PCR conditions, a range of amplicons was generated that extended beyond the 5′-end of the NY-1 sequence. One possible explanation for this inconsistency with poly(A) tailing is the high uracil content of the region upstream of the +50 site in the NY-1 isolate. Uracil-rich stretches in the positive strand correspond to adenine-rich stretches in the negative strand, which could act as internal priming sites for the oligo(dT) primer and so yield fragments shorter than the true genomic size.
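The proposed mispriming mechanism can be made concrete by scanning the negative strand for adenine runs, which correspond to uracil runs on the positive strand. The sketch below is illustrative only: the toy sequence is hypothetical, the function names are our own, and the minimum run length of six is an assumed threshold, not a measured property of oligo(dT) priming.

```python
import re

# Watson-Crick complement for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def negative_strand(plus: str) -> str:
    """Return the negative strand (5'->3') of a positive-strand RNA sequence."""
    plus = plus.upper().replace("T", "U")
    return plus.translate(RNA_COMPLEMENT)[::-1]


def candidate_oligo_dt_sites(plus: str, min_run: int = 6) -> list[tuple[int, int]]:
    """Find adenine runs on the negative strand that oligo(dT) could anneal to.

    Returns 1-based, inclusive positive-strand coordinates of each run.
    min_run is an assumed threshold, chosen for illustration only.
    """
    minus = negative_strand(plus)
    n = len(minus)
    sites = []
    for match in re.finditer(f"A{{{min_run},}}", minus):
        # 0-based negative-strand index k pairs with positive-strand index n - 1 - k.
        sites.append((n - match.end() + 1, n - match.start()))
    return sorted(sites)


# Hypothetical U-rich toy sequence (positive strand, 5'->3').
toy = "GGUUUUUUUCGAGCAUUUUUUG"
print(candidate_oligo_dt_sites(toy))  # → [(3, 9), (16, 21)]
```

Each reported site is a position where an oligo(dT) primer could anneal internally, so an amplicon primed there would stop short of the true 5′-end.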

To determine the 5′ end of the GP18 genome, total RNA was extracted from grapevine phloem tissue and subjected to RNA ligase-mediated rapid amplification of cDNA ends (RLM-RACE) using the FirstChoice® RLM-RACE kit (Ambion, USA) according to the manufacturer’s instructions. The amplicon generated for the 5′-end of the GP18 genome was larger than expected (Fig. 1); it was cloned, and four clones were sequenced. The 5′UTR was found to extend beyond the sequence reported for the NY-1 isolate [2]. The adapter ligation reaction was repeated on the same total RNA after treatment with calf intestinal alkaline phosphatase (CIP) and tobacco acid pyrophosphatase (TAP), and a further five clones were sequenced. All nine clones contained the first 365 nt of the NY-1 sequence, preceded by an additional 579 nt upstream of the NY-1 5′-end.

The ability of RLM-RACE to determine the 5′-termini of multiple ssRNA viruses from total RNA in a single reaction was also investigated. RLM-RACE was performed on total RNA extracted from three grapevine plants (K1, K5 and K6) with mixed infections of GLRaV-2, GLRaV-3 and grapevine rupestris stem pitting-associated virus (GRSPaV), to determine the 5′-termini of these viruses in the same reaction. Sequencing showed that GLRaV-2 (AY881628) and GRSPaV (AY881626) have the same 5′-ends as previously reported, whereas GLRaV-3 showed the same extended 5′-end that was found in the GP18 isolate. The absence of similarly extended 5′-regions for GLRaV-2 and GRSPaV indicates that the GP18 result is not an artefact of the RLM-RACE reaction.

In addition, RT-PCR was used to amplify a fragment spanning the 5′-end of the NY-1 sequence from additional samples (K1, K2, GP16 and KK1; Fig. 1). A PCR product of 786 nt was amplified, and sequencing showed that all four samples contained the same extended 5′UTR.

The complete genome of GLRaV-3 isolate GP18 is 18,498 nt long and shares 93% nucleotide sequence identity with the NY-1 sequence over nucleotides 580–18,498. Nucleotide and amino acid sequence identities between GP18 and NY-1 are given in Table 1. The GP18 sequence is 579 nt longer than NY-1 at the 5′-end and has a 5′UTR of 737 nt. The extended 5′UTR has an adenine plus uracil (A+U) content of 68.4%, with a uracil content alone of 48.5%. The only other member of the genus Ampelovirus that has been completely sequenced is little cherry virus 2 (LChV-2, AF531505), whose genome contains a 539-nt region upstream of ORF1a. The 5′-region of LChV-2 and the 5′UTR of GLRaV-3 GP18 are much larger than the 5′UTRs of other members of the family Closteroviridae: GLRaV-2 (genus Closterovirus, AY881628), 105 nt; beet yellows virus (BYV; Closterovirus, AF190581), 107 nt; citrus tristeza virus (CTV; Closterovirus, DQ272579), 107 nt; and lettuce infectious yellows virus (LIYV; Crinivirus, NC_003617), 97 nt. The 5′UTR of GLRaV-3 GP18 contains two small ambisense ORFs with no similarity to LChV-2 ORF0, and these ORFs may not be expressed [6]. The function of such a large 5′UTR therefore remains open to speculation and will be investigated further. Another significant difference between the GP18 and NY-1 sequences is the 82-nt overlap between ORF1a and ORF1b. In the GP18 sequence, ORF1b could still be expressed via a +1 frameshift.
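The composition and identity values quoted above are simple counts over the sequence. A minimal sketch of how they could be computed follows; the toy sequences and function names are hypothetical, and identity is defined here as identical columns divided by columns where neither sequence has a gap, which is only one of several conventions in use (alignment software may count gaps differently).

```python
def au_and_u_content(seq: str) -> tuple[float, float]:
    """Return (A+U %, U %) for an RNA or DNA sequence; T is counted as U."""
    bases = [b for b in seq.upper().replace("T", "U") if b in "ACGU"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or U")
    a = bases.count("A")
    u = bases.count("U")
    return 100 * (a + u) / len(bases), 100 * u / len(bases)


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identity over ungapped columns of two pre-aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if x != gap and y != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    identical = sum(x == y for x, y in columns)
    return 100 * identical / len(columns)


au, u = au_and_u_content("AUUUGCAUUA")  # hypothetical toy sequence
print(f"A+U {au:.1f}%, U {u:.1f}%")  # → A+U 80.0%, U 50.0%
print(f"identity {percent_identity('AUG-CUUA', 'AUGACGUA'):.1f}%")  # → identity 85.7%
```

In the identity example, the gapped column is skipped and six of the remaining seven columns match.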

Table 1 Position of untranslated regions (UTRs) and open reading frames (ORFs) in the sequence of GLRaV-3 isolate GP18 (EU259806)

Putative functions for some of the ORFs were predicted using a Pfam 22.0 domain search [7]. The predicted domains were similar to those previously described for the NY-1 isolate, the only differences being the absence of a p-protease domain and the presence of a 2OG-Fe(II) oxygenase domain (aa 1,938–2,199) in ORF1a. Further investigation is needed to resolve these discrepancies.

The function of this large 5′UTR is unknown; a role in replication cannot be excluded and should be investigated further. In addition, expression of ORF1b needs to be analysed to determine the mechanism by which this protein is translated.
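To illustrate what expression of an overlapping ORF "as a +1 frameshift" means, the toy sketch below decodes a fixed number of codons in the zero frame, skips one nucleotide, and continues in the +1 frame until a stop codon, producing a fusion product. It does not model the GLRaV-3 shift site, which was not identified here: the sequence, the function names and the `shift_codon` parameter are hypothetical, and only the standard genetic code is assumed.

```python
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons ordered by first, second and third base in UCAG order.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str, start: int = 0) -> str:
    """Translate from 0-based `start` until a stop codon or the end of the sequence."""
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translate_plus1_frameshift(seq: str, shift_codon: int) -> str:
    """Decode `shift_codon` codons in frame 0, skip one nt, then continue in frame +1.

    shift_codon is hypothetical: the number of in-frame codons read before
    the ribosome slips forward by one nucleotide.
    """
    seq = seq.upper().replace("T", "U")
    upstream = translate(seq[:3 * shift_codon])
    if len(upstream) < shift_codon:
        return upstream  # a stop codon was reached before the shift site
    return upstream + translate(seq, 3 * shift_codon + 1)


toy = "AUGGCUUAAAGGCUGA"  # hypothetical: frame 0 stops at UAA (codon 3)
print(translate(toy))                      # → MA
print(translate_plus1_frameshift(toy, 2))  # → MAKG
```

In the toy sequence, the zero-frame product ends at the UAA stop, whereas the frameshifted product reads through it in the +1 frame, analogous to an ORF1a-ORF1b fusion protein.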