Capsicum chlorosis virus (CaCV) is considered a member of a tentative species in the genus Tospovirus, placed taxonomically within the family Bunyaviridae, which includes a large group of enveloped vertebrate-infecting viruses with tripartite RNA genomes of negative and ambisense polarity [14]. CaCV was first reported from Australia in 1999, infecting capsicum, chilli and tomato in Queensland [10]. Later, CaCV was discovered in New South Wales and Western Australia [5, 13]. Subsequently, several isolates of CaCV were reported from South-East Asia, including Thailand [7], India [8, 9], Taiwan [1] and China [2]. Complete genomes of CaCV from Thailand (isolate AIT) and Taiwan (isolate Ph) have been published [7, 17]. Molecular characterization of the genome sequences of Australian CaCV isolates has so far been limited to the N gene [7, 13]. In this report, we present the first complete genome sequence of an Australian CaCV isolate.

The CaCV isolate was collected by DM Persley in October 2013 in the Bundaberg district in southern Queensland from field-infected Capsicum annuum var. Hugo (bell capsicum) showing symptoms of light green chlorotic blotches, mottling and curling of young leaves. When mechanically inoculated onto C. annuum cv Yolo Wonder, this isolate produced typical CaCV symptoms of marginal and interveinal chlorosis, mild mottling, chlorotic spots and young-leaf deformation at 10 days post-inoculation [5, 11, 13]. The virus was identified as CaCV by reverse transcription polymerase chain reaction (RT-PCR) using N-gene-specific primers [5]. The virus was stored as symptomatic leaf material at -20 °C in the Queensland Department of Agriculture, Fisheries and Forestry plant virus collection as accession number 3432. We designated this CaCV isolate Qld-3432.

Total RNA was extracted 4 days after inoculation from mechanically inoculated leaves of C. annuum cv Yolo Wonder using an RNeasy Plant Mini Kit (QIAGEN) and treated with DNase (Ambion® TURBO DNA-free™). The presence of CaCV RNA in total RNA preparations was confirmed by RT-PCR using N-gene-specific primers. Two micrograms of the RNA preparation was submitted to the Australian Genome Research Facility (AGRF) for sequencing. Data files obtained using an Illumina RNA HiSeq 2000 sequencer were assembled de novo to obtain a draft complete genome sequence using Geneious 7.1 [6] software following the method applied by Wylie and colleagues [16]. Sequences of full-length ORFs, except RdRp, were independently determined by Sanger sequencing of pDONR221 clones following PCR amplification using gene-specific primers and Phusion (Finnzymes) proofreading DNA polymerase. In addition, the intergenic region (IGR) of S RNA was amplified by RT-PCR, cloned and sequenced using primers derived from the 3′ ends of the N and NSs ORFs and additional internal primers. Areas of the de novo-assembled genome with any ambiguities were validated with Sanger sequences. Complete genome sequences of the L, M and S RNA segments were deposited in NCBI/GenBank with the accession numbers listed in Table 1. Nucleotide (nt) and deduced amino acid (aa) sequences of CaCV Qld-3432-encoded proteins were compared with those of all CaCV isolates available from the NCBI database. Percentage identities of nt and aa sequences were calculated from BlastN and BlastP alignments (NCBI). Phylogenetic neighbour-joining trees were constructed from ClustalW multiple sequence alignments generated using Geneious 7.1 [6] with 1,000 bootstrap replicates and a 50 % threshold score. The GenBank accession numbers of the sequences used in multiple sequence alignments are as follows: CaCV S RNA: DQ256123 (AIT-Thailand), FJ011449 (Ch-Pan-India), DQ355974 (CP-China), FJ947157 (KK-Thailand), FJ947156 (NRA-Thailand), and KC953852 (Ph-Taiwan). M RNA: DQ256125 (AIT-Thailand), FJ011450 (Ch-Pan-India), and KC953854 (Ph-Taiwan). L RNA: DQ256124 (AIT-Thailand), GU199334 (Ch-Har-India), and KC95385 (Ph-Taiwan).
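The percentage identities reported here were taken from BlastN and BlastP alignments. As an illustration only, the following minimal Python sketch shows one common way to compute such a value from a pair of already aligned sequences. The function name `percent_identity` and the toy sequences are hypothetical, and the convention of dividing by the full alignment length (so that gapped columns count against identity) is an assumption; other tools use other denominators.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Assumption: identical residues are divided by the number of alignment
    columns, so a column containing a gap counts as a non-identical column.
    Columns in which both sequences carry a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # gap-only column carries no information
        columns += 1
        if a == b:  # here at most one residue is a gap, so this is a true match
            matches += 1

    if columns == 0:
        raise ValueError("alignment contains no comparable columns")
    return 100.0 * matches / columns


# Toy (hypothetical) aligned RNA fragments, not taken from any CaCV isolate.
print(percent_identity("AGAGCAAUCA", "AGAGCAAUCG"))
print(round(percent_identity("ACGU-A", "ACGUUA"), 1))
```

In practice the aligned strings would come from an alignment program rather than being typed by hand, and the identity convention should match the one used by that program.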

Table 1 Genome organization of CaCV Qld-3432

CaCV Qld-3432 shares a tripartite ssRNA genome structure and organization with other CaCV isolates and, more generally, with other tospoviruses. The genome consists of L RNA (8913 nt), M RNA (4846 nt) and S RNA (3944 nt) segments (Table 1), with sizes of ORFs and untranslated regions (UTRs) similar to those of other CaCV isolates. The nine terminal nucleotides (5′ AGAGCAAUC 3′) in all three genomic RNA segments of CaCV Qld-3432 are completely conserved. In addition, the 5′- and 3′-terminal sequences of the RNA segments are reverse complements, with perfect or nearly perfect base pairing in the terminal 18 (L), 9 (M) and 14 (S) nucleotides. N protein aa sequence analysis confirmed that CaCV Qld-3432 belongs to the watermelon silver mottle virus (WSMoV) group and clusters in clade III of CaCV isolates [3], together with isolates from Australia, India and Taiwan and some of the isolates from Thailand (data not shown). The CaCV Qld-3432 N protein aa sequence showed highest identity (97-98 %) to those of the previously reported Australian (Burdekin-1043, AY036058), Indian (Ch-Pan), Thai (KK and NRA) and Taiwanese (Ph) isolates. The CaCV Qld-3432 N protein sequence was 92-93 % identical to those of the CP-China and AIT-Thailand isolates. The NSs aa sequence showed highest identity (96 %) to that of the Ph-Taiwan isolate, but only 91 % and 88 % to those of the CP and AIT isolates, respectively.
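The terminal complementarity described above can be checked with a short script. The sketch below is illustrative and is not the procedure used in this study: the helper names are hypothetical, only Watson-Crick pairs are counted (G-U wobble pairs are ignored), and the toy segment is built from the conserved 5′ terminus AGAGCAAUC with an invented internal region and an idealised, perfectly complementary 3′ terminus.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def terminal_pairing(segment: str, window: int) -> int:
    """Count Watson-Crick pairs between the 5' and 3' terminal windows.

    Position i from the 5' end is tested against position i from the 3' end,
    as in a panhandle formed by the two termini of one RNA segment.
    """
    if window <= 0 or 2 * window > len(segment):
        raise ValueError("window must be positive and fit twice in the segment")
    five_prime = segment[:window].upper()
    three_prime_rc = reverse_complement(segment[-window:])
    return sum(1 for a, b in zip(five_prime, three_prime_rc) if a == b)


# Toy segment: conserved 5' terminus + hypothetical internal region
# + idealised 3' terminus (exact reverse complement of the 5' terminus).
toy_segment = "AGAGCAAUC" + "ACGUACGU" + reverse_complement("AGAGCAAUC")
print(terminal_pairing(toy_segment, 9))
```

Applied to a full-length segment, the same function could be called with increasing window sizes to find how far perfect or nearly perfect pairing extends from the termini.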

Interestingly, the S RNA IGR was 1663 nt long, considerably longer than the S RNA IGRs of other sequenced CaCV isolates, which range from 824 to 1332 nt. It had an A+U content of 76.6 %, similar to those of other CaCV isolates (range, 76-78 %), and shared 74 % nt sequence identity with those of Ph-Taiwan and the KK and NRA isolates from Thailand. The Ch-Pan-India, AIT-Thailand and CP-China isolates showed much lower nt sequence identities of 45 %, 42 % and 32 %, respectively. Phylogenetic analysis of the S RNA segment showed a close relationship of Australian CaCV to the Ph-Taiwan, KK-Thailand and NRA-Thailand isolates, which formed a single clade; the AIT-Thailand and CP-China isolates were clearly more distantly related (Fig. 1).
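The A+U content of an intergenic region is a simple base count. A minimal sketch follows, with a hypothetical function name and a toy sequence; it assumes that DNA-style input (T) should be read as U and that ambiguous bases such as N should be left out of the denominator.

```python
def au_content(sequence: str) -> float:
    """Percentage of A and U residues among unambiguous bases.

    Assumptions: T is treated as U so that cDNA-style records can be used
    directly, and any character other than A, C, G or U is skipped.
    """
    rna = sequence.upper().replace("T", "U")
    counted = [base for base in rna if base in "ACGU"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    au = sum(1 for base in counted if base in "AU")
    return 100.0 * au / len(counted)


# Toy (hypothetical) A/U-rich fragment, not an actual CaCV IGR sequence.
print(round(au_content("AAUUAUAUGC"), 1))
```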

Fig. 1

Phylogenetic neighbour-joining tree of S RNA of CaCV isolates and two more distant WSMoV serogroup members, groundnut bud necrosis virus (GBNV, NC_003619) and WSMoV (NC_003843). The scale bar shows the number of substitutions per site; bootstrap values are shown at the relevant branch points.

Similar relationships between CaCV isolates were seen when NSm, GP and RdRp aa sequences were compared (data not shown). Amino acid sequences of CaCV Qld-3432 ORFs were most similar (> 97 % identity) to those of Ph-Taiwan and least similar (90-94 % identity) to those of AIT-Thailand. Nucleotide sequence identities between the AIT-Thailand and Qld-3432 isolates were lowest when complete sequences of M (81 %) and L (84 %) RNA were compared, whereas Ph-Taiwan M and L RNA were 98 % and 97 % identical, respectively, to those of Australian CaCV. The M RNA IGR of CaCV Qld-3432 was 449 nt in length, which is within the reported 425- to 452-nt range of other CaCV isolates. It contained 82 % A+U residues, comparable to the M RNA IGRs of other isolates (78-83 %), and shared 93 % nt sequence identity with that of Ph-Taiwan, but only 52 % with that of AIT-Thailand.

The above sequence comparisons and phylogenetic analysis identified Ph-Taiwan as the closest known relative of Australian CaCV. A lack of complete genome sequences of additional CaCV isolates from Australia and from other countries currently limits geographical and temporal analyses. However, our results show that the genome sequences of CaCV AIT-Thailand and CP-China are considerably different from those of all other completely or partially sequenced CaCV isolates.

The major accepted criteria used in tospovirus species demarcation are N gene phylogeny and sequence identity, with > 90 % aa sequence identity used to classify isolates or strains of a tospovirus in the same species [4]. Based only on this N protein sequence identity threshold, AIT-Thailand and CP-China would be considered strains of CaCV. However, when IGR sequence identities of M and S RNA are included as criteria with a > 65 % nt sequence identity cutoff, both the AIT and CP isolates could be regarded as distinct from CaCV. Such a separation of the AIT isolate from CaCV is further supported by their different vector specificities. The AIT isolate is vectored by Ceratothripoides claratris [15], which was found not to be a vector of CaCV in Australia [13]. Further, C. claratris is not known to transmit any other tospovirus [12]. No vector information is currently available for the CP-China isolate. N and NSs aa sequence identities between AIT-Thailand and CP-China are 92 % and 88 %, respectively, whilst they share only 40 % nt sequence identity in the S RNA IGR, indicating that the AIT and CP isolates are as different from each other as they are from the other isolates of CaCV. The S RNA nt sequences of the AIT and CP isolates showed 62 % and 60 % identity to GBNV S RNA and 65 % and 62 % identity to WSMoV S RNA, respectively. These comparisons support the notion that the AIT-Thailand and CP-China isolates may be regarded as two distinct tospoviruses, separate from CaCV isolates, but within the WSMoV serogroup. Additional evidence may come once M and L RNA genome sequences of CP-China are available. Based on the available sequence data, we suggest that the taxonomic classification of AIT-Thailand and CP-China as isolates of CaCV should be reconsidered.
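The two demarcation rules discussed above can be written as a small decision sketch. The thresholds (> 90 % N protein aa identity, > 65 % IGR nt identity) and the identity values relative to Qld-3432 are taken from the text, using the lower bound wherever a range is reported. The class and function names are hypothetical, and requiring every available IGR comparison to exceed the cutoff is an assumption about how the combined criterion would be applied.

```python
from dataclasses import dataclass
from typing import Optional

N_AA_THRESHOLD = 90.0    # % aa identity of the N protein
IGR_NT_THRESHOLD = 65.0  # % nt identity of the S and M RNA IGRs


@dataclass
class Comparison:
    """Identities of one isolate relative to CaCV Qld-3432 (percent)."""
    isolate: str
    n_aa_identity: float
    s_igr_identity: float
    m_igr_identity: Optional[float] = None  # None when M RNA is not sequenced


def same_species_by_n(c: Comparison) -> bool:
    """N-protein-only criterion."""
    return c.n_aa_identity > N_AA_THRESHOLD


def same_species_with_igr(c: Comparison) -> bool:
    """N criterion plus the IGR cutoff on every available IGR comparison."""
    igrs = [v for v in (c.s_igr_identity, c.m_igr_identity) if v is not None]
    return same_species_by_n(c) and all(v > IGR_NT_THRESHOLD for v in igrs)


comparisons = [
    Comparison("Ph-Taiwan", n_aa_identity=97.0, s_igr_identity=74.0, m_igr_identity=93.0),
    Comparison("AIT-Thailand", n_aa_identity=92.0, s_igr_identity=42.0, m_igr_identity=52.0),
    Comparison("CP-China", n_aa_identity=92.0, s_igr_identity=32.0),
]

for c in comparisons:
    print(c.isolate, same_species_by_n(c), same_species_with_igr(c))
```

Under these assumptions, all three isolates pass the N-protein-only rule, whereas only Ph-Taiwan also passes the rule that includes the IGR cutoff, which mirrors the argument made in the text.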