Introduction

Mycoviruses infect fungi and replicate in fungal cells. They are ubiquitous in various types of fungi, including endophytic, medical, entomopathogenic, and phytopathogenic fungi [1,2,3,4,5]. Most mycoviruses have a double-stranded RNA (dsRNA), positive-stranded RNA (+ssRNA), or negative-stranded RNA (-ssRNA) genome [6,7,8], but a single-stranded circular DNA virus has also been reported in filamentous fungi [9]. Mycoviruses with a dsRNA genome are classified into six recognized families whose members have different numbers of genome segments: Reoviridae (11–12 segments), Chrysoviridae (4–5 segments), Quadriviridae (4 segments), Megabirnaviridae (2 segments), Partitiviridae (2 segments), and Totiviridae (1 segment) [10]. With the development of viral metatranscriptomics techniques, the number of virological studies, especially those aimed at discovering novel viruses, is increasing [11,12,13,14]. Newly discovered mycoviruses have been shown to have unique molecular and biological properties, and new families such as “Botytiviridae” and “Fusagraviridae” have been proposed to accommodate these unassigned mycoviruses, which share molecular features with, but are significantly different from, other known mycoviruses [15,16,17,18].

Macrophomina phaseolina, belonging to the ascomycete family Botryosphaeriaceae, is a destructive necrotrophic fungus [19] that is capable of infecting over 500 plant species worldwide, including the important oil crops sesame (Sesamum indicum) and soybean (Glycine max) [20]. The typical symptom of sesame plants infected by M. phaseolina is charcoal rot; the fungus forms spindle-shaped lesions with a dark border and light-gray center that are covered with black pinhead-sized pycnidia and microsclerotia. This severe disease can kill plants and cause huge losses in annual yield [21]. Compared with other phytopathogenic fungi such as Sclerotinia sclerotiorum and Fusarium graminearum, mycoviruses or dsRNA elements have rarely been reported in M. phaseolina. A previous report suggested that approximately 21.7% of M. phaseolina strains isolated from diseased Cyamopsis tetragonoloba plants harbor dsRNA elements, but sequence data for these putative viruses have not been obtained [22]. A high-throughput-sequencing-based metatranscriptomics approach has recently been used to identify virus-related sequences in M. phaseolina strains isolated from diseased soybean plants. Eleven novel viruses temporarily assigned to five distinct families (Hypoviridae, Narnaviridae, Virgaviridae, Tombusviridae, and Chrysoviridae), as well as unassigned -ssRNA viruses belonging to the order Bunyavirales, were discovered [12].

Here, we report a new mycovirus, Macrophomina phaseolina fusagravirus 1 (MpFV1), isolated from strain 2012-19 of M. phaseolina. Genomic characterization of MpFV1 allowed its tentative taxonomic position to be determined. MpFV1 is suggested to belong to the proposed family “Fusagraviridae”, along with other recently reported similar mycoviruses found in Fusarium poae [15], Botrytis cinerea [16], and Rosellinia necatrix [17]. Our results expand current knowledge of the genomic diversity and host range of fusagraviruses and provide insights into their genome evolution.

Provenance and sequencing of strains

Strain 2012-19 of M. phaseolina was isolated from the diseased stem of a sesame plant with typical symptoms of charcoal rot, which was collected from Fuyang county, Anhui province, China, in 2012. Strain 2012-19 has a hypovirulent phenotype, characterized by abnormal colony morphology, slow growth on potato dextrose agar (PDA), and lower virulence on sesame plants (unpublished data). Mycelial agar plugs of strain 2012-19 were inoculated on PDA plates covered with cellophane membranes and cultured in the dark at 28–30 °C for 2–4 days. Mycelia were harvested using a medicine spoon and stored at -80 °C until further use. Discovery of new mycoviruses by next-generation sequencing was conducted based on a previously reported protocol with modifications [23]. Total RNA was extracted from 1.0 g of mycelia using an RNAiso Kit (TaKaRa, Dalian, China), and rRNA was depleted using a Ribo-Zero™ rRNA Removal Kit (Illumina, CA, USA). Paired-end sequencing libraries were prepared and sequenced on an Illumina HiSeq 2500 platform at Shanghai Bohao Biotechnology Co., Ltd. To obtain clean sequences, the raw reads from deep sequencing were processed to remove the adaptor sequences, and low-quality reads were discarded. The clean reads were then mapped against genome sequences of M. phaseolina using Bowtie (1.0) software. The unmapped reads were next assembled into longer contiguous sequences (contigs) using Velvet software, and the contigs were used to search the non-redundant (nr) protein database of GenBank (http://www.ncbi.nlm.nih.gov/). Contigs that were identical or complementary to viral genomic sequences were extracted and identified as potential viral sequences.

cDNA synthesis from strain 2012-19 was conducted according to the manufacturer’s instructions for the PrimeScript™ II 1st Strand cDNA Synthesis Kit (TaKaRa, Dalian, China). MpFV1 was identified using specific primers (F1, 5’-CCTGTTGAGCATACCG-3’; R1, 5’-ATTGTCCTGCCGAGCCT-3’) based on the putative viral sequence. To complete the 5’- and 3’-terminal genomic sequences, rapid amplification of cDNA ends (RACE) was performed using a SMARTer RACE cDNA Amplification Kit (Clontech, CA) with gene-specific primers (GSPs). GSP-R1 (5’-CGCCAGCGATGTAAGCGATGACCG-3’) and GSP-R2 (5’-GGCGTGAACTTTTGACGATCC-3’) were used for the 5’-RACE reaction. GSP-F1 (5’-GACGATTACCAGTTATTCGC-3’) and GSP-F2 (5’-CGGAAAAGACCTAATCAAGAATGA-3’) were used for the 3’-RACE reaction. The procedures were performed according to the user manual provided with the kit. All PCR products were purified, cloned into the pMD19-T vector (Sangon Biotech, Zhengzhou, China), and introduced into Escherichia coli Trelief 5α (TSINGKE Biotech, Zhengzhou, China) by transformation. At least three recombinant clones were sent to Sangon Biotech for sequencing.

Prediction of putative open reading frames (ORFs) was performed using ORF Finder at NCBI. A protein domain search was conducted using the Conserved Domain Database (CDD). Multiple sequence alignments of the protein sequences were performed using DNAMAN software (version 9) and the CLUSTALX program (version 2.1) [24]. A phylogenetic tree was constructed by the maximum-likelihood (ML) method in the MEGA (version 7) program with 1,000 bootstrap replicates [25].
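As a rough illustration of what ORF prediction involves, the following Python sketch scans the three forward reading frames of a nucleotide sequence for ATG-initiated ORFs. It is not the algorithm used by NCBI ORF Finder, whose options are not reproduced here; the function name `find_orfs`, the `min_codons` cutoff, and the toy sequence are assumptions introduced purely for illustration.

```python
# Minimal forward-strand ORF scan (illustrative only; not NCBI ORF Finder).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Return a sorted list of (start, end, frame) tuples.

    start/end are 1-based, inclusive, and end includes the stop codon.
    frame is the 0-based offset of the reading frame (0, 1, or 2).
    min_codons counts sense codons, i.e. it excludes the stop codon.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None  # 0-based index of the first ATG of the open ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start + 1, i + 3, frame))
                start = None
    return sorted(orfs)


# Hypothetical toy sequence: 2-nt leader, ATG, four GCT codons, TAA stop.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
toy_orfs = find_orfs(toy, min_codons=3)
print(toy_orfs)  # [(3, 20, 2)]
```

Under this coordinate convention, an ORF beginning at nt 891 would be reported in frame 2, because (891 - 1) mod 3 = 2.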

Sequence properties

From the assembled dataset remaining after discarding genomic sequences of M. phaseolina, we identified a long contig containing 9161 nucleotides (nt) that encoded two putative proteins related to proteins encoded by fusagraviruses or related mycoviruses. Thus, this contig was believed to be a partial sequence of a mycovirus, and this mycovirus was temporarily designated as MpFV1. Finally, the full-length sequence of MpFV1 was obtained and submitted to the GenBank database under the accession number MK780821.

The full genome of MpFV1 is a double-stranded RNA of 9289 nt that contains two discontinuous ORFs, namely ORF1 (nt 891-5168) and ORF2 (nt 5354-9166). The 5’-UTR and 3’-UTR are 890 nt and 123 nt long, respectively (Fig. 1A). The sequences at the 5’ terminus and 3’ terminus of the positive strand of MpFV1 were predicted to form stem-loop structures, and their initial ∆G values were -14.60 kcal/mol and -9.50 kcal/mol, respectively (Fig. 1D).
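The coordinates given above can be cross-checked with simple arithmetic. The short Python sketch below recomputes the UTR lengths, the ORF1–ORF2 spacer, and the expected protein lengths from the reported positions. It assumes that each ORF end coordinate includes the stop codon, an assumption that is consistent with the protein lengths reported in this section (1425 aa and 1270 aa); the variable and function names are illustrative.

```python
# Consistency check of the reported MpFV1 genome coordinates (1-based, inclusive).
GENOME_LEN = 9289
ORF1 = (891, 5168)
ORF2 = (5354, 9166)


def orf_lengths(start, end):
    """Return (length in nt, length in aa), assuming end includes the stop codon."""
    nt = end - start + 1
    if nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return nt, nt // 3 - 1  # subtract one codon for the stop


utr5 = ORF1[0] - 1              # nucleotides upstream of ORF1
utr3 = GENOME_LEN - ORF2[1]     # nucleotides downstream of ORF2
spacer = ORF2[0] - ORF1[1] - 1  # nucleotides between ORF1 and ORF2

orf1_nt, orf1_aa = orf_lengths(*ORF1)
orf2_nt, orf2_aa = orf_lengths(*ORF2)

print(utr5, utr3, spacer)  # 890 123 185
print(orf1_nt, orf1_aa)    # 4278 1425
print(orf2_nt, orf2_aa)    # 3813 1270
```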

Fig. 1

Schematic representation of the genomic organization and translational strategy of MpFV1. (A) Schematic diagram of the genomic organization of MpFV1, which contains two ORFs (ORF1 and ORF2). The dotted-line box indicates a possible extension of ORF2 via a -1 translational frameshift mechanism. ORF1 encodes a putative hypothetical protein (P1), whereas ORF2 encodes a putative protein containing two conserved domains, i.e., the RdRp and S7 domains. (B) Schematic representation of the predicted H-type RNA pseudoknot, which is located 1357 nt directly upstream of the RdRp_4 motif in MpFV1, and the proposed mechanism underlying -1 ribosomal frameshifting. (C) Predicted secondary structures for the 5’-UTR and 3’-UTR of the coding strand of MpFV1, constructed using RNA mfold online (http://unafold.rna.albany.edu/?q=mfold/RNA-Folding-Form). (D) Predicted secondary structure of the H-type RNA pseudoknot. The RNA secondary structure was predicted using the KnotSeeker program.

The 5’-proximal ORF1 of MpFV1 encodes a hypothetical protein comprising 1425 amino acids (aa), with a predicted molecular mass of 159.3 kDa and a predicted isoelectric point of 7.04. A multiple alignment of the sequence of the protein encoded by ORF1 showed that it shared the highest sequence identity (58%) with a hypothetical protein of Macrophomina phaseolina RNA virus 2 (MpRV2), but relatively low identity (26%–38%) with the hypothetical proteins of eleven other unclassified dsRNA viruses (Table 1). Moreover, a search using the CDD showed that the ORF1-encoded protein does not contain any conserved domains.

Table 1 Comparison of MpFV1 and its related viruses with regard to genome size and putative structures

MpFV1 has a putative shifty heptamer sequence (GGAAAAC, nt 5159–5165) located immediately upstream of the stop codon of ORF1 (Fig. 1A). Moreover, a candidate recoding stimulatory element (RSE) structure (H-type RNA pseudoknot located at nt 5219 to 5257) was predicted to lie just downstream of the slippery site (Fig. 1B). These two functional elements in MpFV1 are the cis-acting frameshift signals of the -1 ribosomal frameshifting mechanism, suggesting that the ORF2-encoded protein may be translated by ribosomal frameshifting. The 3’-proximal ORF2 of MpFV1 encodes an RNA-dependent RNA polymerase (RdRp) comprising 1270 aa, with a predicted molecular mass of 143.2 kDa and a predicted isoelectric point of 8.84. A BLASTp search of the protein encoded by ORF2 showed that it shared the highest sequence identity (61%) with the RdRp of MpRV2 but relatively low identity (31%–38%) with the RdRps of eleven other unclassified dsRNA viruses (Table 1). A search of the CDD and multiple protein sequence alignment indicated that the ORF2-encoded protein contains two conserved domains: a conserved RdRp domain (RdRp_4; pfam02123) with eight conserved motifs (I–VIII) that are characteristic of the RdRps of dsRNA viruses [26] (Supplementary Figure S1) and a conserved phytoreovirus S7 domain (S7; pfam07236) (Supplementary Figure S2). The MpFV1 S7 conserved domain is 115 aa long (aa 820 to 935), and polypeptides homologous to S7 were also identified in other fusagraviruses and related mycoviruses (Table 1).
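The frame relationship implied by a -1 frameshift can be checked directly from the reported coordinates. The Python sketch below tests whether ORF2 lies in the -1 frame relative to ORF1 and where the heptamer sits relative to the ORF1 stop codon and the predicted pseudoknot. It uses only positions given in the text; the helper `frame_of` and the variable names are illustrative, and the sketch checks coordinates only, not whether frameshifting actually occurs.

```python
# Frame check for the proposed -1 ribosomal frameshift in MpFV1.
# All coordinates are 1-based and inclusive, as reported in the text.
ORF1_START, ORF1_END = 891, 5168
ORF2_START = 5354
HEPTAMER = (5159, 5165)    # GGAAAAC
PSEUDOKNOT = (5219, 5257)  # predicted H-type pseudoknot


def frame_of(pos):
    """Return the 0-based reading-frame offset of a codon starting at pos."""
    return (pos - 1) % 3


orf1_frame = frame_of(ORF1_START)
orf2_frame = frame_of(ORF2_START)

# Slipping back one nucleotide lowers the frame offset by one (mod 3).
is_minus_one = (orf1_frame - orf2_frame) % 3 == 1

# In an X XXY YYZ heptamer, the original-frame codons begin at its 2nd nucleotide.
heptamer_in_orf1_frame = frame_of(HEPTAMER[0] + 1) == orf1_frame

stop_codon_start = ORF1_END - 2
gap_heptamer_to_stop = stop_codon_start - HEPTAMER[1] - 1
gap_heptamer_to_knot = PSEUDOKNOT[0] - HEPTAMER[1] - 1
knot_len = PSEUDOKNOT[1] - PSEUDOKNOT[0] + 1

print(orf1_frame, orf2_frame, is_minus_one)  # 2 1 True
print(heptamer_in_orf1_frame)                # True
print(gap_heptamer_to_stop, gap_heptamer_to_knot, knot_len)  # 0 53 39
```

With these coordinates, ORF2 is in the -1 frame relative to ORF1 and the heptamer ends directly before the ORF1 stop codon, in agreement with the description above.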

To examine the relationship between MpFV1 and other mycoviruses, we performed phylogenetic analysis using protein alignments of the conserved RdRp domains from MpFV1 and 20 other selected RNA viruses (Fig. 2). The results showed that MpFV1 clustered with the previously reported MpRV2 and 15 other unclassified dsRNA mycoviruses, forming a distinct clade, indicating a close evolutionary relationship. However, MpFV1 and its related mycoviruses are distantly related to the clade including members of the families Totiviridae and Partitiviridae. Furthermore, a new family, “Fusagraviridae”, was recently proposed to accommodate these MpFV1-related but unassigned dsRNA viruses identified in filamentous fungi [15]. Moreover, MpFV1 has a shorter intervening sequence between ORF1 and ORF2, a shorter 5’-UTR, and a longer 3’-UTR than MpRV2 at the corresponding locations. Thus, MpFV1 is a potential new member of the proposed family “Fusagraviridae”.

Fig. 2

Phylogenetic analysis of MpFV1 (marked with a red dot) and other related RNA viruses. The phylogenetic tree was generated by the maximum-likelihood method (1000 bootstrap replicates) based on the amino acid sequences of the putative RdRp domains using MEGA 7. Two partitiviruses (PsVF and PsVS) and two totiviruses (ScV L-A and ScV L-B) were included as outgroups. The scale bar corresponds to a genetic distance of 1.0 amino acid substitution per site.