Bobone and alomae are diseases of taro [Colocasia esculenta (L.) Schott] that are restricted to Solomon Islands and Papua New Guinea and whose etiology remains unclear [1, 4, 8, 10]. Two types of rhabdovirus-like particles have been observed in infected taro plants: the smaller (~210 × 65 nm) were associated with the characterised taro vein chlorosis virus (TaVCV), while the larger (~300–350 × 50–55 nm), which were associated with the severe stunting and gall formation typical of bobone/alomae disease, were those of Colocasia bobone disease virus (CBDV) [8]. Here we report, for the first time, the complete genome sequence and organisation of a rhabdovirus isolated from bobone-diseased taro plants from Solomon Islands containing the larger virus particles, together with its phylogenetic relationships to other plant rhabdoviruses. Because we have not yet been able to establish that this virus causes bobone disease, we have named it Colocasia bobone disease-associated virus (CBDaV).

Taro plants with symptoms of bobone disease were obtained from Solomon Islands in 1995 and have been propagated continuously in a glasshouse at the University of Auckland, New Zealand, since then. These plants were subsequently found to be co-infected with the potyvirus dasheen mosaic virus (DsMV). Total RNA was isolated from the leaves of three taro plants using a Spectrum™ Plant Total RNA Kit with on-column DNase treatment (Sigma-Aldrich). Transcriptome sequencing of each plant was carried out by Macrogen Inc. (South Korea) on an Illumina HiSeq 2000, with transcriptome assembly by New Zealand Genomics Limited. To remove potyviral sequences, the reads from the three plants were pooled and aligned against the DsMV genome (NC_003537) [6]. Reads that did not map to DsMV were extracted using Samtools [7] and assembled de novo using Trinity [2]. To remove any remaining potyviral sequences, the assembled contigs were searched by BLAST against the genomes of DsMV, soybean mosaic virus (FJ640956), turnip mosaic virus (NC_002509), and watermelon mosaic virus (HQ384216). The remaining contigs were then queried against the GenBank database using BLASTn.
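The contig clean-up step can be pictured as a filter over BLAST tabular output. The Python sketch below is illustrative only: it assumes the contigs were searched with `-outfmt 6` (the 12 default columns, with the E-value in column 11), treats any hit at or below a hypothetical E-value cutoff of 1e-5 against one of the four potyvirus accessions as potyviral, and uses hypothetical function names. The thresholds actually used in this study are not stated.

```python
"""Drop assembled contigs that have BLAST hits to potyvirus reference genomes.

Hypothetical sketch: assumes BLAST tabular output (-outfmt 6, 12 default
columns) and an illustrative E-value cutoff; not the study's exact settings.
"""
from typing import Iterable, List, Set

# Potyvirus reference accessions named in the text.
POTYVIRUS_ACCESSIONS = ("NC_003537", "FJ640956", "NC_002509", "HQ384216")


def potyviral_contigs(blast_lines: Iterable[str],
                      max_evalue: float = 1e-5) -> Set[str]:
    """Return IDs of contigs with a significant hit to any potyvirus reference."""
    flagged: Set[str] = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        qseqid, sseqid, evalue = cols[0], cols[1], float(cols[10])
        # Subject IDs may carry a version suffix (e.g. NC_003537.1), so match by substring.
        if evalue <= max_evalue and any(acc in sseqid for acc in POTYVIRUS_ACCESSIONS):
            flagged.add(qseqid)
    return flagged


def remaining_contigs(all_ids: Iterable[str], blast_lines: Iterable[str]) -> List[str]:
    """Contig IDs that survive the filter and go on to the GenBank BLASTn search."""
    drop = potyviral_contigs(blast_lines)
    return [cid for cid in all_ids if cid not in drop]
```

Only contigs returned by `remaining_contigs` would then be queried against GenBank; a weak hit (E-value above the cutoff) does not remove a contig in this sketch.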

Two contigs of 2,663 and 9,581 bp, which overlapped by 43 bp, were found to be similar to the 3′ and 5′ regions, respectively, of the northern cereal mosaic virus (NCMV) genome. The sequence of the overlapping region was confirmed by reverse transcription polymerase chain reaction (RT-PCR) using the primers CBDVF1 (5′-GAGCCAAACTGTCAAAAGC-3′) and CBDVR2 (5′-CGATGCACTGGTCTTGATC-3′), followed by dideoxy sequencing of the PCR product at the Waikato University Sequencing Facility. The ends of the CBDaV genome were determined using the 3′ and 5′ RACE Systems for Rapid Amplification of cDNA Ends (Life Technologies). The genome organisation of CBDaV was determined by comparison with the following cytorhabdovirus and nucleorhabdovirus genomes: lettuce necrotic yellows virus (LNYV, NC_007642), barley yellow striate mosaic virus (BYSMV, KM213865), NCMV (AB030277 and GU985153), alfalfa dwarf virus (ADV, KP205452), lettuce yellow mottle virus (LYMoV, NC_011532), persimmon virus A (PeVA, NC_018381), TaVCV (NC_006942), eggplant mottled dwarf virus (EMDV, KJ082087), potato yellow dwarf virus (PYDV, GU734660), maize mosaic virus (MMV, AY618418), maize Iranian mosaic virus (MIMV, DQ186554), maize fine streak virus (MFSV, AY618417), rice yellow stunt virus (RYSV, AB011257), and Sonchus yellow net virus (SYNV, L32603). Nucleotide and amino acid sequences were aligned using MUSCLE in Geneious Pro v6.1.8 (http://www.geneious.com, [5]), and maximum-likelihood phylogenetic analysis was carried out using MEGA 6.0 [11]. Statistical support for branching patterns was assessed with 1,000 bootstrap replicates. The molecular weight and isoelectric point (pI) of each protein were predicted using the ExPASy Compute pI/Mw tool, and conserved domains were identified by searching the Pfam database.
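Joining the two contigs across their shared 43-bp overlap amounts to finding the longest suffix of one contig that equals a prefix of the other. A minimal Python sketch, assuming an exact (mismatch-free) overlap and contigs already in the same orientation; the function names and the 20-nt minimum overlap are hypothetical.

```python
"""Join two contigs that share an exact end overlap.

Hypothetical sketch: assumes a mismatch-free suffix/prefix overlap and that
both contigs are in the same orientation.
"""


def overlap_length(left: str, right: str, min_len: int = 20) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first and shrink down to min_len.
    for k in range(max_len, min_len - 1, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


def join_contigs(left: str, right: str, min_len: int = 20) -> str:
    """Merge `left` and `right`, keeping the shared overlap only once."""
    k = overlap_length(left, right, min_len)
    if k == 0:
        raise ValueError("no overlap of at least min_len nt found")
    return left + right[k:]
```

The joined length is `len(left) + len(right) - k`; in practice the junction would still need confirming, as was done here by RT-PCR and dideoxy sequencing.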

The complete CBDaV genomic RNA (accession number KT381973) is 12,193 nt long, and six major ORFs were identified in the antigenomic strand. Based on sequence similarity to proteins encoded by other rhabdoviruses, the CBDaV genome organisation is 3′ leader-N-P-P3-M-G-L-5′ trailer (Supplementary Tables 1 and 2), where N is the nucleocapsid gene, P the phosphoprotein gene, P3 the putative movement protein gene, M the matrix protein gene, G the glycoprotein gene, and L the polymerase gene. This genome arrangement is identical to that of LNYV, the type member of the genus Cytorhabdovirus; however, in BLASTX analyses the most similar sequences were those of NCMV (N and P proteins) or BYSMV (P3, M, G, and L proteins). The L polymerase was the only protein with identifiable conserved domains, namely pfam00946 (Mononegavirales RNA-dependent RNA polymerase) between amino acids 197 and 1045 (E-value 7.59e-151) and pfam14318 (Mononegavirales mRNA-capping region) between amino acids 1151 and 2035 (E-value 5.65e-61). Predicted molecular weights and pI values for each CBDaV protein are provided in Supplementary Table 3.
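Because CBDaV is a negative-sense RNA virus, its ORFs are read on the antigenomic strand, that is, the reverse complement of the genomic RNA. The Python sketch below illustrates that logic under stated assumptions: the input is the genomic strand written 5′→3′, ORFs are AUG-to-stop stretches in the three forward frames of the antigenome, and the 50-codon minimum is an arbitrary illustrative cutoff, not the criterion used to define the six CBDaV ORFs. All names are hypothetical.

```python
"""Find candidate ORFs on the antigenomic (positive-sense) strand.

Hypothetical sketch: assumes the input is the negative-sense genomic RNA
written 5'->3' (U or T accepted) and uses an illustrative minimum ORF size.
"""
from typing import List, Tuple

# Complement table for RNA or DNA letters; output is always RNA (U).
COMPLEMENT = str.maketrans("ACGUTacgut", "UGCAAugcaa")
STOPS = {"UAA", "UAG", "UGA"}


def antigenome(genomic: str) -> str:
    """Reverse complement of the genomic RNA, i.e. the mRNA-sense strand."""
    return genomic.translate(COMPLEMENT)[::-1].upper()


def find_orfs(seq: str, min_aa: int = 50) -> List[Tuple[int, int]]:
    """Return (start, end) 0-based, end-exclusive coordinates of AUG..stop ORFs.

    Scans the three forward frames; within a frame, each ORF starts at the
    first AUG after the previous stop (nested in-frame AUGs are not reported).
    """
    seq = seq.upper().replace("T", "U")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "AUG":
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_aa:  # codons before the stop
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

Typical use would be `find_orfs(antigenome(genomic_sequence))`; the resulting ORFs would then be assigned to N, P, P3, M, G and L by similarity searches, as described above.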

As is typical of rhabdoviruses, the coding region of the CBDaV genome is flanked by untranslated 3′ leader and 5′ trailer regions, which are 177 and 260 nt long, respectively, and are complementary to each other (data not shown) [3, 9, 12–14]. Like BYSMV, but unlike other cytorhabdoviruses [3, 15], CBDaV lacks a 3′ overhang of 1–2 nucleotides at the end of its 3′ leader sequence. Analysis of the untranslated sequences revealed a conserved region similar to that observed in other rhabdoviruses [3, 15] (Fig. 1a), allowing prediction of the 3′ and 5′ ends of each mRNA and of the intergenic sequences between the ORFs. A consensus intergenic sequence, 3′-AUUCUUUUU/G(G/A)Nₙ/CUC-5′, can be inferred, which is similar to that observed for other plant rhabdoviruses [3] (Fig. 1b).
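The complementarity between the leader and trailer can be quantified by pairing nucleotides inward from the two genome ends, as in a panhandle structure. A minimal Python sketch, assuming a full-length sequence written 5′→3′; the 20-nt window, the optional G·U wobble pairing and the function name are illustrative choices, not parameters taken from this study.

```python
"""Score complementarity between the two genome termini.

Hypothetical sketch: the sequence is written 5'->3' (U or T accepted); position i
from the 5' end is paired with position i from the 3' end, as in a panhandle.
"""

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def terminal_complementarity(seq: str, window: int = 20, allow_gu: bool = False) -> float:
    """Fraction of the first `window` 5' nt that pair with the mirrored 3' nt."""
    seq = seq.upper().replace("T", "U")
    window = min(window, len(seq) // 2)  # never let the two ends overlap
    pairs = (WATSON_CRICK | WOBBLE) if allow_gu else WATSON_CRICK
    matches = sum(
        (seq[i], seq[-1 - i]) in pairs  # 5' nt i pairs with 3' nt i
        for i in range(window)
    )
    return matches / window if window else 0.0
```

A score near 1.0 over the first few nucleotides indicates that the genome ends can base-pair; scanning several window sizes shows how far the complementarity extends into the leader and trailer.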

Fig. 1

a) Similarity between sequences corresponding to the predicted 3′ ends of CBDaV mRNAs, the intergenic sequences (IS) and the 5′ end of the next mRNA. Sequences are shown in the 3′-5′ sense of the viral mRNAs. Dashes indicate gaps introduced to optimise the alignment, and conserved nucleotides are highlighted in grey. b) Consensus sequences of the gene junction regions of CBDaV compared with those of other plant rhabdoviruses [3, 9]. c) Maximum-likelihood tree of plant rhabdovirus L polymerase open reading frame nucleotide sequences, inferred using the GTR + G + I model. The tree is midpoint-rooted, the insect vector of each virus is shown, and the nucleorhabdovirus and cytorhabdovirus clades are indicated by brackets. Bootstrap values greater than 50 are shown for the major nodes, and the scale bar indicates the number of substitutions per site. Virus abbreviations are as given in the text.
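The bootstrap values in panel c come from Felsenstein's nonparametric bootstrap, which MEGA performs internally: alignment columns are resampled with replacement, a tree is inferred from each pseudo-replicate, and a node's support is the percentage of replicate trees that contain that clade. The Python sketch below covers only the resampling and counting steps; representing each replicate tree as a set of clades (frozensets of taxon names) is a simplifying assumption, the function names are hypothetical, and tree inference itself is left to an external program.

```python
"""Nonparametric bootstrap of an alignment, as a sketch.

Hypothetical: the alignment is a dict of equal-length aligned sequences, and
tree building on each replicate is left to an external ML program.
"""
import random
from typing import Dict, FrozenSet, Iterable, List, Set


def bootstrap_replicates(aln: Dict[str, str], n_reps: int = 1000,
                         seed: int = 1) -> List[Dict[str, str]]:
    """Resample alignment columns with replacement, n_reps times."""
    lengths = {len(s) for s in aln.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to equal length")
    (n_cols,) = lengths
    rng = random.Random(seed)
    reps = []
    for _ in range(n_reps):
        # Same sampled column indices for every taxon keeps columns intact.
        cols = [rng.randrange(n_cols) for _ in range(n_cols)]
        reps.append({name: "".join(seq[c] for c in cols) for name, seq in aln.items()})
    return reps


def clade_support(replicate_clades: Iterable[Set[FrozenSet[str]]],
                  clade: FrozenSet[str]) -> float:
    """Percentage of replicate trees (given as sets of clades) containing `clade`."""
    trees = list(replicate_clades)
    return 100.0 * sum(clade in t for t in trees) / len(trees)
```

With 1,000 replicates, a node reported with support greater than 50 is one recovered in more than half of the replicate trees.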

Based on phylogenetic analysis of the nucleotide and amino acid sequences of the whole genome and of each ORF, CBDaV appears to be a cytorhabdovirus, with NCMV and BYSMV as its closest relatives. The L gene tree is shown as an example (Fig. 1c). The cytorhabdoviruses form two clades: one containing planthopper-vectored viruses, which includes CBDaV, and the other containing aphid-vectored viruses. CBDV is known to be vectored by the planthopper Tarophagus proserpina. The nucleorhabdoviruses were also separated according to their insect vectors, suggesting that vector type could be a useful feature for plant rhabdovirus classification. One striking difference among the planthopper-vectored cytorhabdoviruses is that NCMV and BYSMV both have multiple ORFs between the P and M coding sequences, whereas CBDaV has a single ORF in this position, as do the aphid-vectored cytorhabdoviruses. Interestingly, CBDaV has the smallest cytorhabdovirus genome reported to date (Supplementary Table 1), while the genome of TaVCV [9], which has the same genome organisation as CBDaV, is the smallest nucleorhabdovirus genome reported.

The CBDaV genome sequence will now enable investigations into genome variability and virus distribution. Characterisation of CBDaV isolates from bobone- and alomae-diseased taro plants in Solomon Islands and Papua New Guinea will help establish whether CBDaV is in fact CBDV, and may also shed light on the possible role of CBDV in the etiology of bobone and alomae diseases.