Members of the family Secoviridae have one or two positive-sense genomic RNA segments. In members with two genome segments, each segment (RNA1 and RNA2) has a genome-linked viral protein (VPg) at its 5′ end and a poly(A) tract at its 3′ end and encodes a large polyprotein. The polyprotein of RNA1 contains the proteins necessary for replication, and the polyprotein of RNA2 contains one or two coat proteins (CPs) and a movement protein [1]. The family Secoviridae is divided into eight genera (Comovirus, Fabavirus, Nepovirus, Sequivirus, Waikavirus, Cheravirus, Sadwavirus, and Torradovirus) and includes a few unassigned species [2].

Yam (Dioscorea spp.) is an annual herbaceous plant belonging to the Dioscoreaceae family of monocotyledons. Yam plants are propagated vegetatively and are widely cultivated in Africa, Latin America and the Caribbean for their tubers [3]. Yam production in Brazil is concentrated mainly in the northeast region, where it has significant socio-economic importance [4].

Leaf samples showing mosaic symptoms were collected from 30 Dioscorea plants in the Brazilian states of Pernambuco and Paraíba and in the Federal District (Brasília). To find viral sequences in the samples by high-throughput sequencing, we prepared a virus-rich fraction from the pooled leaf samples and extracted total RNA from this fraction. Briefly, the viruses were partially purified from the pooled samples (2 g from each plant) following the protocol of Cali and Moyer [5] with minor modifications. Total RNA was isolated from the pellet of the semi-purified sample using a ZR Plant RNA MiniPrep kit (Zymo Research, Irvine, USA) following the manufacturer’s protocol. The total RNA was sequenced using an Illumina HiSeq 2000 (Illumina Inc., San Diego, USA) with 100-base paired-end reads at Macrogen Inc. (Seoul, South Korea). The reads were organized using CLC Genomics Workbench 6.5 (http://www.clcbio.com), and contigs were assembled using Geneious R7.1 (http://www.geneious.com/). Contigs were analyzed by BLASTx and protein BLAST searches against the viral reference genome database (RefSeq) at GenBank. Two of the assembled contigs (5979 bp and 3809 bp in length) were identified as secovirus sequences, and the virus was tentatively named “dioscorea mosaic-associated virus” (DMaV).

To detect RNA1 in each leaf sample, cDNA fragments of about 500 bp targeting RNA1 were amplified by reverse transcription polymerase chain reaction (RT-PCR) using specific primers (DMaVR1-4719F, 5′-TATCTACAAAATCGTCGGAGGAAC-3′; DMaVR1-5020R, 5′-TCAATCTCAGAAAAGGGCATGTG-3′). cDNA fragments of the expected size were amplified from 20 of the 30 plants used in this study. To validate the sequencing results obtained by NGS, nearly full-length genomic cDNA fragments were amplified by RT-PCR from a selected isolate from the state of Pernambuco (PE-81). cDNA was synthesized again using Superscript III reverse transcriptase (Life Technologies, Grand Island, USA) and oligo-d(T)50M10 (GCAGTGTTATCAACGCAGAT50) for both RNA segments. The PCR was performed using the specific primers listed in Supplementary Table 1. The amplified cDNA fragments were sequenced directly at Macrogen Inc. by primer walking. The complete genomic sequences were obtained by amplifying cDNAs of the 5′ terminus of each segment using a modified rapid amplification of cDNA ends (RACE) protocol [6] with the primers in Supplementary Table 1.
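As a quick sanity check on the detection primers quoted above, the following minimal Python sketch reports the length, GC content, and reverse complement of each primer. The sequences are copied from the text; the function names and the dictionary layout are illustrative assumptions and are not part of the published protocol.

```python
# Detection primers exactly as printed in the text (5'->3').
PRIMERS = {
    "forward_4719F": "TATCTACAAAATCGTCGGAGGAAC",
    "reverse_5020R": "TCAATCTCAGAAAAGGGCATGTG",
}

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Return the percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def summarize(primers: dict) -> dict:
    """Collect length, GC content and reverse complement for each primer."""
    return {
        name: {
            "length": len(seq),
            "gc_percent": round(gc_percent(seq), 1),
            "reverse_complement": reverse_complement(seq),
        }
        for name, seq in primers.items()
    }


if __name__ == "__main__":
    for name, info in summarize(PRIMERS).items():
        print(name, info)
```

A check of this kind only describes the oligonucleotides themselves; it says nothing about specificity or amplification efficiency, which would have to be assessed experimentally.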

The assembled full-length sequences of RNA1 (KU215538) and RNA2 (KU215539) were 5979 and 3809 nucleotides (nt) in length, respectively, excluding their poly(A) tails. RNA1 contains one large ORF (RNA1-ORF1) of 5663 nt, encoding a predicted polyprotein (201 kDa) carrying motifs associated with replication. Putative cleavage sites in the polyprotein encoded by RNA1 (Fig. 1) were identified at the dipeptides Q/S, separating a protease co-factor from a helicase; Q/G, separating the helicase from a protease; and Q/S, separating the protease from an RNA-dependent RNA polymerase (RdRp). The 5′ and 3′ untranslated regions (UTRs) of RNA1 were 148 and 151 nt in length, respectively. RNA2 contains one ORF (RNA2-ORF2) of 3590 nt, coding for a predicted polyprotein (134 kDa) with motifs of a CP and a cell-to-cell movement protein (MP) separated by a putative S/G cleavage site (Fig. 1). The 5′ and 3′ UTRs of RNA2 were 72 and 62 nt in length, respectively. The 3′ UTRs of RNA1 and RNA2 were 58.4 % identical.
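A percent-identity figure such as the one quoted for the two 3′ UTRs is normally derived from a pairwise alignment. The sketch below shows one common way to compute it in Python from two sequences that have already been aligned. The text does not state which tool or gap convention was used, so the convention here (columns with a gap in both sequences are skipped, a gap opposite a residue counts as a mismatch) is an assumption, and the function name is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: columns where both sequences carry a gap are ignored;
    a gap facing a residue is counted as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue  # uninformative column, skip it
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared
```

Other conventions (for example, dividing by the length of the shorter sequence or ignoring all gapped columns) give different values for the same alignment, so identities are only comparable when computed the same way.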

Fig. 1

Schematic representation of the genomic organization of dioscorea mosaic-associated virus RNA1 and RNA2. Putative cleavage sites are indicated. The motifs are indicated by arrows. Abbreviations: Co-Pro, protease cofactor; Hel, helicase; Pro, protease; RdRp, RNA-dependent RNA polymerase; MP, movement protein; CP, coat protein; Q, glutamine; S, serine; G, glycine. Numbers indicate amino acid positions.

A phylogenetic tree based on the amino acid sequence of the protease-polymerase (Pro-Pol) region between the protease CG motif and the RdRp GDD motif (CG/GDD) of DMaV and other members of the family Secoviridae indicated that DMaV is most closely related to chocolate lily virus A (CLVA), a recently reported putative secovirus, and to viruses belonging to as-yet unassigned species in the family Secoviridae: black raspberry necrosis virus (BRNV), strawberry mottle virus (SMoV) and strawberry latent ringspot virus (SLRSV) [7–9] (Fig. 2). The genomic organization of DMaV resembled that of secoviruses.

Fig. 2

Phylogenetic relationship of dioscorea mosaic-associated virus with other members of the family Secoviridae based on the protease-polymerase core region (CG/GDD motif). A phylogenetic tree was constructed using the maximum-likelihood (ML) method with 500 bootstrap replicates. The following viruses were used in the analysis: BRNV, black raspberry necrosis virus, CCE57809; SMoV, strawberry mottle virus, NP_599086; CLVA, chocolate lily virus A, YP_004936170; SDV, satsuma dwarf virus, BAA76746; CRLV, cherry rasp leaf virus, CAF21713; PYFV, parsnip yellow fleck virus, BAA03151; RTSV, rice tungro spherical virus, AAA66056; ToTV, tomato torrado virus, ABD38934; CPMV, cowpea mosaic virus, CAA25029; BBWV1, broad bean wilt virus 1, BAD00183.

Under the ICTV species demarcation criteria for the family Secoviridae, viruses are considered distinct species when their amino acid sequence identity is below 80 % in the Pro-Pol region or below 75 % in the CP [10]. The amino acid sequence identities of DMaV to other unassigned members of the family Secoviridae ranged from 54.6 % (CLVA) to 42.4 % (SMoV) in the Pro-Pol (CG/GDD) region and from 27.7 % (CLVA) to 21.9 % (SMoV) in the CP region, which is below the currently valid species demarcation cutoffs. In conclusion, dioscorea mosaic-associated virus is a putative new member of the family Secoviridae.
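The comparison against the demarcation cutoffs can be written out as a short Python sketch. It assumes that the 80 % (Pro-Pol) and 75 % (CP) values mark the species boundary, that identities below them indicate a distinct species, and that falling below either cutoff is sufficient (the "or" reading). The identity values are those reported above for CLVA and SMoV; the names `REPORTED` and `supports_new_species` are illustrative only.

```python
# Species demarcation cutoffs (percent amino acid identity), as cited in the text.
PRO_POL_CUTOFF = 80.0
CP_CUTOFF = 75.0

# Identities of DMaV to its closest relatives, as reported in the text.
REPORTED = {
    "CLVA": {"pro_pol": 54.6, "cp": 27.7},
    "SMoV": {"pro_pol": 42.4, "cp": 21.9},
}


def supports_new_species(pro_pol: float, cp: float) -> bool:
    """True if either identity falls below its cutoff (assumed 'or' criterion)."""
    return pro_pol < PRO_POL_CUTOFF or cp < CP_CUTOFF


def distinct_from_all(identities: dict) -> bool:
    """True if the query is below the cutoffs against every compared virus."""
    return all(
        supports_new_species(v["pro_pol"], v["cp"]) for v in identities.values()
    )


if __name__ == "__main__":
    for virus, v in REPORTED.items():
        print(virus, supports_new_species(v["pro_pol"], v["cp"]))
    print("distinct from all compared viruses:", distinct_from_all(REPORTED))
```

With the reported values, both regions fall well below their cutoffs for both viruses, so the outcome does not depend on whether the criteria are combined with "or" or with "and".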