The major winter legume crops (pulses) grown in Australia include chickpea (Cicer arietinum), field pea (Pisum sativum), lentil (Lens culinaris), faba bean (Vicia faba) and lupin (Lupinus angustifolius and L. albus). Pulse crops in Australia are infected by a number of viruses from the genus Polerovirus (family Luteoviridae), such as turnip yellows virus (TuYV), bean leaf roll virus (BLRV), and Phasey bean mild yellows virus (PBMYV) [8, 31], leading to leaf yellowing, reddening, rolling, and plant stunting. Poleroviruses are transmitted by aphids in a persistent (circulative, non-propagative) manner, are phloem-limited and possess a monopartite, linear, single-stranded RNA genome encapsidated in an icosahedral shell [5, 11]. The RNA is organized into six open reading frames (ORFs) [22]. ORF 0 encodes a suppressor of gene silencing, which influences host range and symptoms [4, 35]. ORFs 1 and 2 encode the RNA-dependent RNA polymerase (RdRp) and are required for viral replication [12, 33, 34]. ORF 3 encodes the coat protein (CP) and ORF 4 the transport or movement protein (MP) [21, 28, 32, 35]. The CP and the readthrough domain (RTD; ORF 5) together form a fusion protein, CP-RTD (P3-P5), which is required for virus accumulation, circulation and persistence within the vector.

Viruses in the family Luteoviridae, which includes poleroviruses, have frequently undergone intra- and interspecific recombination during evolution to produce new viruses and strains [10, 16, 17, 18, 23, 25]. Analysis of the various recombination events suggests that there are putative recombination breakpoints, or “hot spots”, throughout the genomes of a number of these viruses [25]. Breakpoints most often occur at the 3′ end of the RdRp region, at the 5′ end of the CP, and in the overlapping region of ORF 1 and ORF 2 in the RdRp. Recombination events can also occur between ORF 0 and the RdRp and between the CP and RTD [25]. Here, we present evidence of a recombination event creating a novel virus, hereafter referred to as faba bean polerovirus 1 (FBPV-1).

During virus surveys of pulse crops in 2014 in northern New South Wales (NSW) and southern Queensland, Australia, a total of 344 samples of chickpea (144 samples), lentil (20), field pea (14) and faba bean (71), as well as some surrounding weeds, were collected for analysis. To identify virus-positive samples, the stems of each plant were blotted onto a nitrocellulose membrane for tissue blot immunoassays (TBIAs) as described by Makkouk and Comeau [19]. Blots were processed using monoclonal antibody BLRV-5G4, which reacts to a broad range of members of the family Luteoviridae [2, 14, 24]. Total nucleic acid was isolated from 74 TBIA-positive samples using a BioSprint 15 workstation with a BioSprint 15 Plant DNA kit (QIAGEN, catalogue no. 941514) as per the manufacturer’s instructions but without the use of RNase A.

Reverse transcription (RT) PCRs were done using a primer pair that detects TuYV and beet western yellows virus (BWYV) but does not distinguish between the two. Synthesis of cDNA was done using SuperScript III reverse transcriptase (Thermo Fisher) as per the manufacturer’s instructions using primer AS3 [2]. PCR conditions were as described by Sharman et al. [30] using primers BWYV3969F (GTCTCCGARGCCTCTTCCCAA) and AS3 under the following cycling conditions: initial denaturation at 95 °C for 1 min, then 35 cycles of 95 °C for 15 s, 62 °C for 20 s, 56 °C for 10 s and 72 °C for 40 s; followed by a final extension step at 72 °C for 3 min. Of the 74 plants tested, 19 were positive.

The 19 samples positive for TuYV / BWYV were then tested by PCR using the TuYV-specific primer TuYV_3394F (CGCAGGCTTCGTTTCATCGA), located within the non-coding intergenic region (IR), and primer AS5 (5′-CCGGTTCYBCGTCTACCTATTTDG-3′) [7], located in a similar position to AS3 but with less degeneracy, and the PCR conditions described above. Twelve of the 19 samples were positive for TuYV. To obtain a PCR fragment from three of the samples that were negative for TuYV (isolates 5253 (faba bean), 5421 (field pea) and 5422 (faba bean)), the degenerate primers Pol3197F [26] and AS5 were used with the PCR conditions described above. The 1,054 bp PCR products were directly sequenced by the Australian Genome Research Facility (AGRF) using Sanger sequencing. After removal of primer sequences, the resulting sequences of approximately 1,010 nucleotides (nt), consisting of 234 nt of the partial 3′ end of the RdRp gene, the complete IR and the almost complete CP gene, were analysed with Geneious 9.0 (Biomatters). The almost complete CP sequences from the three isolates shared 99–100% aa identity with each other and over 93% aa identity with the CP of the TuYV type isolate (TuYV-FL1, GenBank accession NC_003743). However, the partial RdRp gene and IR shared only 63% nt identity with the same isolate of TuYV, and the closest match by BLAST [13] was to chickpea chlorotic stunt virus (CpCSV, NC_008249) with 91% aa identity.
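The degeneracy of the primers quoted above can be checked by counting how many distinct sequences each IUPAC-coded oligo represents. The following is a minimal sketch using only the primer sequences given in the text; the helper name `degeneracy` is hypothetical and is not part of the original analysis.

```python
from math import prod

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences represented by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


# Primer sequences as quoted in the text (5' to 3').
PRIMERS = {
    "BWYV3969F": "GTCTCCGARGCCTCTTCCCAA",
    "TuYV_3394F": "CGCAGGCTTCGTTTCATCGA",
    "AS5": "CCGGTTCYBCGTCTACCTATTTDG",
    "FBPV-1_3120F": "GGAATGTGGTTCTATCCAGGTTCTC",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, degeneracy {degeneracy(seq)}")
```

A degeneracy of 1 indicates a fully specific oligo, whereas larger values indicate a primer pool intended to tolerate sequence variation among related viruses.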

A specific primer for FBPV-1 (FBPV-1_3120F; GGAATGTGGTTCTATCCAGGTTCTC) in combination with AS5 was used to screen a range of TuYV PCR-positive (identified using BWYV3969F and AS3) field samples for FBPV-1 using the PCR conditions described above, except with an annealing temperature of 62 °C. FBPV-1 was found in chickpea, faba bean, field pea, lentil, marshmallow weed (Malva parviflora), and milk weed (Sonchus oleraceus) in mixed or single infections from various locations in NSW (Table 1).

Table 1 Details of FBPV-1 hosts and collection locations

Partial genome sequences of the FBPV-1 isolates described above (5253, 5421 and 5422) indicate that they are all members of the same species. We selected two FBPV-1 isolates, 5253 from faba bean and 5249 from chickpea (Table 1), to characterise their genomes using Illumina MiSeq Next Generation Sequencing (NGS) in two separate runs. Total RNA was extracted from isolate 5253 as per Asif et al. [3], and from isolate 5249 using Trizol® reagent (Invitrogen) as described by Reinhart et al. [27]. Library preparation and 150 bp paired-end sequencing with an Illumina MiSeq sequencer were done by the AGRF. Total RNA from both isolates was processed in the same manner, except that a ribosomal RNA depletion step (Ribo-Zero Removal (plant leaf); Illumina) was included for isolate 5249. Adaptor and primer sequences were removed from the obtained reads in Geneious 9 using the BBDuk plugin (part of the BBTools package) [6].

For isolate 5253, the total number of reads obtained was 4,953,772, which was reduced to 4,272,088 after trimming for quality using CLC Genomics Workbench 6.5 (CLCGW; CLC bio; quality score limit set to 0.01, maximum number of ambiguities set to 2, and reads below 75 nt removed). Reads were paired and the de novo assembly function of CLCGW was used to assemble contigs with automatic word size and bubble size, a minimum contig length of 500 nt, mismatch cost 2, insertion cost 3, deletion cost 3, length fraction 0.5 and similarity fraction 0.9. De novo assembly produced 9,615 contigs, which were sorted by length and subjected to a BLAST search [13]. Further analysis was done using Geneious 9. The 1,010 nt fragment of FBPV-1 isolate 5253 obtained by Sanger sequencing above, which includes the putative recombination site, was used as a mapping reference against the contigs, and one contig, 5,467 nt long, matched the FBPV-1 fragment. The number of paired reads that mapped to the FBPV-1 contig was 1,748, with an average depth of coverage of 44.9.

For isolate 5249, post-NGS analysis was performed as above, with the total of 4,147,508 reads reduced to 4,146,418 after trimming for quality. The number of contigs produced after de novo assembly in CLCGW was 4,551 and one 5,598 nt long contig mapped to the sequence of FBPV-1-5253 derived by NGS above. The number of paired reads that mapped to the FBPV-1-5249 contig was 369,406, with an average depth of coverage of 8,406. The number of paired reads that mapped to the reference FBPV-1-5253 was 387,548, with an average depth of coverage of 9,000. The contig obtained for FBPV-1-5249 was 5,608 nt. The NGS method for isolate 5249 produced over 200 times more reads that mapped to the reference FBPV-1-5253 than the NGS method for isolate 5253. It appears that the inclusion of a ribosomal RNA depletion step for the NGS of isolate 5249 greatly increased the number of virus reads obtained.
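The read statistics for the two sequencing runs can be compared with some simple arithmetic. The sketch below uses only the read counts reported above; the function names are hypothetical, and the virus-read fraction assumes that the mapped and trimmed read counts are expressed in the same units, which the text does not state explicitly.

```python
def fraction_retained(raw_reads: int, trimmed_reads: int) -> float:
    """Fraction of raw reads that remain after quality trimming."""
    return trimmed_reads / raw_reads


def fold_difference(reads_a: int, reads_b: int) -> float:
    """How many times more reads were obtained in run A than in run B."""
    return reads_a / reads_b


# Read counts as reported in the text.
RAW_5253, TRIMMED_5253, MAPPED_5253 = 4_953_772, 4_272_088, 1_748
RAW_5249, TRIMMED_5249, MAPPED_5249 = 4_147_508, 4_146_418, 387_548

if __name__ == "__main__":
    print(f"5253 retained after trimming: {fraction_retained(RAW_5253, TRIMMED_5253):.1%}")
    print(f"5249 retained after trimming: {fraction_retained(RAW_5249, TRIMMED_5249):.2%}")
    # Assumption: mapped and trimmed counts share the same units.
    print(f"5253 virus read fraction: {MAPPED_5253 / TRIMMED_5253:.3%}")
    print(f"5249 virus read fraction: {MAPPED_5249 / TRIMMED_5249:.2%}")
    print(f"Fold difference in mapped reads: {fold_difference(MAPPED_5249, MAPPED_5253):.0f}x")
```

The fold difference of roughly 222 is consistent with the statement that the ribosomal RNA depleted library yielded over 200 times more FBPV-1 reads.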

The 3′ terminal sequences of FBPV-1 isolates 5249 and 5253 were determined by addition of a poly(A) tail to purified RNA using poly(A) polymerase (New England Biolabs), followed by cDNA synthesis and PCR with primers Potyvirid primer 1 [9] and FBPV_5324F (AAGGCCTCCGCAAAGTCGGAGAAGCTTG). The 5′ terminal sequence was determined as described by Sharman and Thomas [29], ligating the Adaptor2 oligo to the 5′ end of FBPV-1 isolates 5253 and 5249. cDNA was synthesised using FBPV_507R (AGA TGC AGG CAC CAC GCG TTA AGT AGT C), and semi-nested PCRs were carried out using AdaptR2 [29], FBPV_507R and FBPV_272R (TGC GGG AAT GTG GAA GAA CGA GAG CTC C). Products were directly sequenced by the AGRF using Sanger sequencing. After contig assembly, mapping analysis, and terminal end sequencing, both FBPV-1 isolates had a complete genome length of 5,631 nt (GenBank accessions MH464873 (5253) and MH464874 (5249)).

The genome sequences of FBPV-1-5253 and FBPV-1-5249 share 97% nt identity over the 5,631 nt genome and have identical genome properties. As such, further analysis is based on FBPV-1-5253. FBPV-1 has a genome organization typical of a polerovirus (Fig. 1A), consisting of six putative ORFs labelled ORF 0 to ORF 5. The start of ORF 0 follows a 17 nt 5′ untranslated region, and the ORF extends 807 nt downstream to code for a 268 aa sequence. ORF 1, which is translated in a different reading frame to ORF 2, has a putative “shifty” sequence upstream of the ORF 1 stop codon, which causes a −1 frameshift to produce the P1-P2 RdRp fusion protein [22]. ORFs 3, 4 and 5 are 3′ of the intergenic non-coding region and code for the CP, MP and CP-RTD proteins, respectively.
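The relationship between the ORF 0 length and its predicted protein size can be verified with a short calculation. This sketch assumes that the 807 nt figure is the length of ORF 0 including its stop codon, and that genome coordinates are 1-based; both are assumptions made for illustration, and the function name is hypothetical.

```python
def protein_length_aa(orf_length_nt: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length."""
    if orf_length_nt % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_length_nt // 3
    # The stop codon does not contribute an amino acid.
    return codons - 1 if includes_stop else codons


UTR_5P_NT = 17        # 5' untranslated region length from the text
ORF0_LENGTH_NT = 807  # assumed to include the stop codon

orf0_start = UTR_5P_NT + 1                   # first nt of the start codon
orf0_end = orf0_start + ORF0_LENGTH_NT - 1   # last nt of the stop codon

if __name__ == "__main__":
    print(f"ORF 0 spans nt {orf0_start}-{orf0_end}")
    print(f"Predicted P0 length: {protein_length_aa(ORF0_LENGTH_NT)} aa")
```

Under these assumptions, 807 nt corresponds to 269 codons, of which 268 encode amino acids, matching the 268 aa P0 sequence reported above.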

Fig. 1

(A) Schematic representation of the FBPV-1 genome with the six ORFs (ORF 0-5). The dark grey ORFs depict the 5′ portion that is from a currently unknown virus, which shares some similarity with chickpea chlorotic stunt virus (CpCSV; NC_008249). The light grey ORFs depict the 3′ open reading frames that are from turnip yellows virus (TuYV; reference sequence NC_003743, and TuYV-WA-1 isolate; JQ862472). The genome-linked protein (VPg), necessary for RNA synthesis, is putatively at the 5′ end in viruses belonging to the genus Polerovirus. (B) Results of the RDP recombination analysis for FBPV-1. Lines indicate the percentage of similarity per alignment and the pink area indicates the recombinant part of the sequence. (C) Maximum-likelihood phylogenetic trees based on amino acid sequence alignments of FBPV-1 and other members of the genus Polerovirus. Sequences used for alignments: BChV-2a, AF352024; CpCSV, NC_008249; SABYV, NC_018571; BMYV, NC_003491; BWYV, NC_004756; CABYV, NC_003688; MABYV, NC_010809; CYDV-RPS, NC_002198; CYDV-RPV, NC_004751; BrYV, NC_016038; PBMYV, KT962999; PeVYV, NC_015050; TVDV, NC_010732; TuYV, NC_003743; TuYV_WA, JQ862472; TuYV_Anhui, KR706247

Phylogenetic relationships were determined by aligning amino acid sequences (Geneious; ClustalW: BLOSUM cost matrix) from a number of different poleroviruses. Maximum likelihood (ML) phylogenetic trees were generated in Geneious with RAxML using the GAMMA WAG protein model, the “rapid bootstrapping and search for best-scoring ML tree” algorithm and 500 bootstrap replicates (Fig. 1C). The phylogenetic trees illustrated that FBPV-1 is most similar to CpCSV for ORFs 0 and 1+2, and most similar to TuYV for ORFs 3 and 5.
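For readers working outside Geneious, the tree-building settings described above correspond approximately to a standalone RAxML invocation. The sketch below only assembles the command line; the executable name, alignment file name, run name and random seed are hypothetical placeholders, and the original analysis was run through Geneious rather than in this way.

```python
def raxml_command(alignment: str, run_name: str, bootstraps: int = 500,
                  seed: int = 12345, executable: str = "raxmlHPC") -> list[str]:
    """Build a RAxML command for rapid bootstrapping plus best-scoring ML tree search."""
    return [
        executable,
        "-f", "a",               # rapid bootstrap analysis and best-scoring ML tree search
        "-m", "PROTGAMMAWAG",    # protein data, GAMMA rate heterogeneity, WAG matrix
        "-p", str(seed),         # random seed for parsimony starting trees (placeholder)
        "-x", str(seed),         # random seed for rapid bootstrapping (placeholder)
        "-N", str(bootstraps),   # number of bootstrap replicates
        "-s", alignment,         # input alignment file (hypothetical name)
        "-n", run_name,          # suffix for output files (hypothetical name)
    ]


if __name__ == "__main__":
    cmd = raxml_command("orf0_aa.phy", "fbpv1_orf0")
    print(" ".join(cmd))
    # To run it, pass the list to subprocess.run(cmd, check=True)
    # on a system where RAxML is installed.
```

Returning the command as a list keeps the sketch testable without RAxML installed and avoids shell quoting issues when it is eventually executed.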

From the 5′ end of FBPV-1 to the intergenic region, the closest match by BLAST [13] in GenBank was the reference isolate of chickpea chlorotic stunt virus (CpCSV; NC_008249) from Ethiopia [1], with 66% nt identity (Fig. 1A and 1C). The putative P0 and RdRp proteins of FBPV-1 and the CpCSV reference isolate share 35% and 56% aa identity, respectively (Geneious alignment; ClustalW: BLOSUM cost matrix). Species demarcation criteria for poleroviruses stipulate that distinct species differ by more than 10% in aa sequence identity for any gene product [15]. Because P0 and the RdRp differ from those of the closest known virus by far more than 10%, FBPV-1 is considered a distinct polerovirus species.
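The species demarcation reasoning above can be expressed as a simple check on pairwise amino acid identities. This is a minimal sketch: the identity function counts matches over aligned columns in which neither sequence has a gap, which is one of several conventions and may differ from the Geneious calculation, and the toy sequences and function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def exceeds_demarcation(identity_pct: float, threshold_pct: float = 10.0) -> bool:
    """True if two gene products differ by more than the demarcation threshold."""
    return (100.0 - identity_pct) > threshold_pct


# Amino acid identities reported in the text for FBPV-1 versus CpCSV.
REPORTED = {"P0": 35.0, "RdRp": 56.0}

if __name__ == "__main__":
    for product, identity in REPORTED.items():
        print(f"{product}: {identity}% identity, distinct = {exceeds_demarcation(identity)}")
    # Toy aligned peptides (hypothetical), for illustration only.
    print(percent_identity("MKT-A", "MRTLA"))
```

Because the criterion applies to any gene product, a single protein exceeding the threshold is sufficient, even when other proteins such as the CP are highly similar to those of a known species.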

In contrast, the closest match by BLAST for the 3′ portion of FBPV-1, consisting of ORFs 3, 4 and 5, was TuYV isolate WA-1 from Western Australia (GenBank; JQ862472), with 94% nt identity. This region also shared 90% nt identity with the reference isolate TuYV-FL1 (NC_003743). At the amino acid level, the CP shares 100% identity with TuYV-WA-1 and 94% identity with TuYV-FL1. The MP shares 95% and 90% aa identity with TuYV-WA-1 and TuYV-FL1, respectively, and the RTD shares 90% aa identity with TuYV-FL1, confirming that this region of FBPV-1 is related to TuYV.

To check for potential recombination events and the breakpoint for FBPV-1, aligned sequences of FBPV-1-5253, TuYV-FL1, CpCSV (NC_008249), BrYV (NC_016038), BMYV (NC_003491), BWYV (NC_004756) and BChV-2a (AF352024) were examined using the recombination program RDP4 [20]. A putative recombination event for FBPV-1 (Fig. 1B), with TuYV and CpCSV, was strongly supported using the RDP, GENECONV, BootScan, MaxChi, Chimaera, SiScan and 3Seq methods, all with p-values less than 4 × 10⁻¹⁶. The predicted breakpoint is located approximately 162 nt downstream from the end of ORF 2, within the non-coding intergenic region (IR) of FBPV-1, a known “hot spot” for recombination events in poleroviruses [25]. The recombinant fragment extends from the IR to the 3′ end of the FBPV-1 genome (Fig. 1A and 1B). Therefore, the 3′ portion of FBPV-1, which encodes the CP, MP and CP-RTD, appears to be from a TuYV parent, while the first 3,427 nt, which encode the proteins P0 and RdRp (P1-P2), are from an unknown virus that shares some similarity with CpCSV.

Phylogenetic and recombination analyses indicate that an inter-species recombination event has occurred to create a new virus that we have called faba bean polerovirus 1 (FBPV-1). We have found FBPV-1 in NSW, Australia, across an area spanning about 570 km, in hosts from three plant families (Fabaceae, Malvaceae and Asteraceae) comprising four legume crop species and two weed species. The full extent of the geographic range and host range of FBPV-1 is not yet clear but is likely to be more extensive than we have found to date. Because the CP (ORF 3) of FBPV-1 is almost identical to that of TuYV, it is likely that many isolates of FBPV-1 have previously been mistakenly identified as TuYV using serological methods or PCRs that target the coat protein or the 3′ portion of the genome. Beyond the genetic differences between FBPV-1 and TuYV, it will be interesting to determine how these viruses differ in terms of transmission, host range and symptoms. As the ORF 0 of FBPV-1 is more closely related to that of CpCSV than to that of TuYV, and this region is predicted to influence host range and symptoms [4, 34], it is possible that FBPV-1 has biological properties distinct from those of TuYV. We are continuing investigations to determine whether the donor parent of the 5′ portion of FBPV-1 is present in Australia or whether the putative recombination event occurred overseas prior to arrival in Australia. Nevertheless, the close identity of the 3′ portion of the FBPV-1 genome to an isolate of TuYV from Western Australia suggests that the recombination event occurred in Australia and that the donor parent of the 5′ portion of FBPV-1 may also be present in the same environment.