Pitaya is a perennial tropical fruit crop of the genus Hylocereus in the family Cactaceae, which includes red pitaya (H. polyrhizus) and white pitaya (H. undatus). Pitaya fruits are rich in numerous nutrients such as vitamin C, polysaccharides, flavonoid, polyphenol and betalain, which are beneficial to human health [1, 2]. In the past years, as an economically important fruit crop, pitaya has been widely cultivated in southern and southwestern China, where its planting area has reached 40,000 hectares.

Usually, pitaya plants are vegetatively propagated using mature stem tissues, which results in the accumulation of multiple viruses. A series of virus-like symptoms, such as mottling, yellowing and chlorosis, have been observed previously on the surface of stems of pitaya plants [3]. In the past decades, a number of viruses belonging to the genus Potexvirus in the family Alphaflexiviridae, including cactus virus X (CVX), zygocactus virus X (ZyVX), schlumbergera virus X (SchVX), and pitaya virus X (PiVX), have been shown to infect Hylocereus plants, and have been reported in Taiwan, Hainan and Guizhou in China [3,4,5,6] and in other countries, including Brazil, Korea and the USA [7,8,9]. Until now, there has been no evidence of any other new viruses infecting pitaya.

In September 2017, we conducted field surveys of virus diseases in pitaya orchards in Guizhou, China. Stem tissues showing viral symptoms such as chlorosis, mottling and malformation were collected and quickly frozen in liquid nitrogen. Total RNA was isolated from the symptomatic tissues using TRIzol Reagent (Invitrogen, CA, USA), and the ribosomal RNA was then removed using a Ribo-Zero Plant Kit (Illumina, CA, USA), following the manufacturer’s protocol. Finally, cDNA libraries were constructed and sequenced with paired-end reads (2 × 150 bp) on an Illumina HiSeq 4000 platform. For the data analysis procedure, low-quality sequences were removed from raw reads and de novo assembly of clean reads was done using Trinity software [10]. The resulting contigs were used to search the NCBI GenBank database using the online tools BLASTx and BLASTn.

BLASTn analysis identified six contigs with lengths from 2,502 nt to 7,978 nt that showed 98-99% nucleotide sequence identity to with EpMoaV [11] and 70% identity to jujube mosaic-associated virus (JuMaV) [12]. The six contigs were assembled from 4,711-34,087 reads. BLASTx analysis revealed that the encoded polyprotein shared 34-37% amino acid sequence identity with those of blackberry virus F (BVF, KJ413252), citrus yellow mosaic virus (CYMV) [13], fig badnavirus 1 (FBV-1) [14] and yacon necrotic mottle virus (YNMoV) [15]. After reassembly and removal of overlapping sequences, a circular DNA sequence of 7,837 nt was generated.

To confirm the presence of pitaya badnavirus-like sequences, total DNA was isolated from symptomatic samples using by CTAB method and used as template for PCR amplification. Then, three specific primer sets covering the whole circular genome were designed based on the assembled circular DNA sequence (Fig.1). These primers are shown in Supplementary Table 1. The PCR reaction was performed using Ex Taq DNA polymerase (TaKaRa, Dalian, China), and amplicons were sequenced using the Sanger method. After de novo assembly and removal of overlapping sequences, the circular DNA genome sequence was reconstructed. This virus was tentatively named “pitaya badnavirus 1” (PiBV1) (Fig. 1), and its genome sequence was deposited in the GenBank database under the accession number MK991812.

Fig. 1
figure 1

Schematic representation of the PiBV1 genome. The “tRNA-Met binding site” was set as the start site of the PiBV1 genome. ORF1-4 are the four putative open reading frames (ORFs) of the PiBV1 genome. The three open boxes (PCR1-3) overlapping each other represent the PCR fragments used to construct the PiBV1 genomic DNA sequence

The complete genome of PiBV1 is 7,837 bp in length and has a GC content of 50.2%. The full-length genome showed 98.7% nucleotide sequence identity to EpMoaV, 54.6% to BVF and 40.0-48.2% to other members of the genus Badnavirus (Supplementary Table 2). Similar to other members of the genus Badnavirus [11, 12, 15, 16], a complementary sequence (5’-TGGTATCAGAGCGAGGTT-3’) of a plant tRNA-Met binding site was also identified in the PiBV1 genome (Fig. 1). Therefore, we defined these 18 nucleotides as the start site (TGGTATCAGAGCGAGGTT1-18). The PiBV1 genome was also found to contain a TATA box (TATA7647-7650) and a polyadenylation-like signal (AATAAA7797-7802), which are typical motifs found in badnavirus genomes [11, 17].

Four putative ORFs were predicted in the viral genome, using the program ORF Finder (http://www.ncbi.nlm.nih.gov/projects/gorf) (Fig. 1). Theoretical molecular masses were calculated using ExPASy (http://web.expasy.org/compute_pi/). ORF1 (nt position 192-731) encoded a 179-aa protein with a molecular mass of 19.9 kDa (Table 1) that shared 99.4% amino acid sequence identity with the protein encoded by ORF1 of EpMoaV, and 17.4-47.5% with those of other members of the genus Badnavirus (Supplementary Table 2). Similar to other badnaviruses [18], the ORF1 also contained a protein domain of unknown function (DUF1319). ORF2 (nt position 728-1,132) encoded a 134-aa protein (14.8 kDa), overlapping with the 3’ end of the ORF1. Pairwise comparisons suggested that the protein encoded by ORF2 shared 94.8% amino acid sequence identity with the ORF2 protein of EpMoaV and 14.4-36.5% with those of other members of the genus Badnavirus. ORF3 (nt position 1,129-7,140) encoded a 2,003-aa polyprotein precursor (225.7 kDa), and it shared a very high level of amino acid sequence identity (98.9%) with the ORF3 of EpMoaV and 34.7-46.0% identity with that of other members of the genus Badnavirus. ORF3 was found to contain a number of highly conserved regions encoding the movement protein, coat protein, viral aspartic protease, zinc knuckle finger, reverse transcriptase, and RNase H domains, which are homologous to those of other badnaviruses [15, 18,19,20]. ORF4 (nt position 6,888-7,289) encoded a putative 133-aa protein (14.2 kDa) overlapping with the 3’ end of ORF3 by 253 nt. The ORF4 of PiBV1 shared 97.7% amino acid sequence identity with that of EpMoaV.

Table 1 Putative open reading frames (ORFs) in the PiBV1 genome sequence

To examine the taxonomic position of PiBV1, phylogenetic trees were constructed using the neighbour-joining method in MEGA 7.0 [21]. ClustalX was used for multiple sequence alignment. The full-length genome nucleotide sequence and the deduced amino acid sequence of ORF3 were used for constructing phylogenetic trees with 23 members covering the two major clades of badnaviruses [11]. One thousand bootstrap replicates were used. The phylogenic tree based on full-length genome nucleotide sequences, revealed that PiBV1 and EpMoaV formed a single clade, and were closely related to BVF, dracaena mottle virus (DrMV), lucky bamboo bacilliform virus (LBBV), YNMoV and JuMaV (Fig. 2). A phylogenic tree based on the deduced amino acid sequence of the ORF3 polyprotein also showed similar topology (data not shown).

Fig. 2
figure 2

Phylogenetic analysis of PiBV1 and other members of the genus Badnavirus. The phylogenetic tree was constructed using the neighbour-joining method based on full-length nucleotide sequences. Numbers at the branch points are percent bootstrap values (1000 replications). The GenBank accession numbers for other members of the genus Badnavirus are listed in Supplementary Table 2. RTBV (rice tungro bacilliform virus, NC_001914), belonging to the genus Tungrovirus in the family Caulimoviridae, was used as an out-group

For the genus Badnavirus, the species demarcation criterion proposed by the ICTV is less than 80% nucleotide sequence identity in the reverse transcriptase and RNase H (RT-RNase H) regions [22]. In pairwise sequence comparisons, the RT-RNase H region of PiBV1 shared 99.4% nucleotide sequence identity with that of EpMoaV. In April 2019, EpMoaV was isolated from a member of the genus Epiphyllum of the family Cactaceae [11]. Given the high level of identity of the PiBV1 and EpMoaV genome sequences (98.7%), we supposed that PiBV1 and EpMoaV were two isolates of a badnavirus that infects members of the family Cactaceae. However, since the nucleotide sequences of the RT-RNase H regions of PiBV1 and other badnaviruses were less than 70.5% identical, we propose that PiBV1 can be considered a new member of the genus Badnavirus.

To summarize, we determined the complete genome sequence of PiBV1, a new badnavirus infecting red pitaya in Guizhou, China. Until now, all viruses identified in red pitaya have belonged to the genus Potexvirus of the family Alphaflexiviridae. To our knowledge, this is the first report of a badnavirus infecting red pitaya. Large-scale field investigations and tests are required for understanding the prevalence and genetic variation of PiBV1.