Viruses of the genus Emaravirus have multipartite RNA genomes in which each segment encodes a single protein. RNA1, RNA2, RNA3, and RNA4 encode the RNA-dependent RNA polymerase (RdRp), glycoprotein precursor (GP), nucleocapsid protein (NP), and movement protein (MP), respectively [1]. Since 2007, more than 20 emaraviruses have been identified [2,3,4], and some of these viruses are known to be transmitted by specific eriophyid mites [5,6,7]. Chrysanthemum is one of the most important ornamental crops worldwide. In Japan, its flowers are produced not only for ornamental purposes but also as food. It is believed that domestic chrysanthemum cultivars were introduced into Japan from China in the fifth to eighth centuries.

The occurrence of leaf chlorotic ringspot or mosaic symptoms of chrysanthemum, known as “Mon-mon” disease, has been observed since the 1960s. It has long been believed that the symptoms are a kind of physiological abnormality caused by infestation with an eriophyid mite, Paraphytoptus kikus [8,9,10]. However, using reverse transcription polymerase chain reaction (RT-PCR) with newly developed emaravirus-specific degenerate primers in combination with a high-throughput sequencing (HTS) approach (see below), we obtained a partial nucleotide sequence from symptomatic leaves showing sequence similarity to the RNA1 of emaraviruses and found a correlation between the symptoms and detection of the emaravirus-like sequence. The putative emaravirus was tentatively named "chrysanthemum mosaic-associated virus" (ChMaV) [11].

To determine the complete nucleotide sequence of ChMaV, symptomatic leaves of chrysanthemum (cv. Kin-nishiki, an edible flower variety) were collected from a greenhouse in Toyohashi, Aichi, in 2018. Total RNA was extracted from one leaf sample exhibiting representative mosaic symptoms, using the rapid CTAB method [12], and this sample was subjected to HTS as described previously [7]. Out of a total of 127,707 contigs (maximum length, 14,620 nt; minimum length, 125 nt; mean length, 409 nt) derived from 33,954,546 paired-end, 150-bp reads, sequences containing open reading frames (ORFs) encoding proteins with amino acid sequence similarity to emaravirus proteins were identified by BLASTx and DELTA-BLAST searches [13, 14]. Both searches detected seven contigs of 7,144 nt, 2,054 nt, 1,307 nt, 911 nt, 1,353 nt, 1,164 nt, and 1,761 nt, whose deduced amino acid sequences showed the highest similarity to the following emaravirus proteins, respectively: 44.4% to pear chlorotic leaf spot-associated virus (PCLSaV) P1 [15], 33.0% to PCLSaV P2, 40.4% to PCLSaV P3, 38.9% to PCLSaV P3, 59.3% to PCLSaV P4, 24.8% to perilla mosaic virus (PerMV) P5 [7], and 28.6% to rose rosette virus (RRV) P5 [16]. The complete nucleotide sequences of these RNAs were determined via direct sequencing using specific primers designed based on the sequences of the contigs (Supplementary Table S1), and the terminal sequences were determined by 5′ and 3′ rapid amplification of cDNA ends using 5′ or 3′ full RACE Core Sets (Takara Bio). The seven RNAs contained conserved 13- and 11-nt sequences at the 5′ and 3′ termini (5′-AGUAGUGUUCUCC......AACACACUACU-3′), respectively, which is a common feature of the RNA segments of emaraviruses [1]. Based on the standard nomenclature for emaravirus RNAs and protein homologs, these seven RNAs were named RNA1, RNA2, RNA3a, RNA3b, RNA4, RNA5, and RNA6 (Fig. 1).
To identify other possible segments, RT-PCR was performed with the primers BamHI-ChMaV-5′ter-11mer-fw and BamHI-ChMaV-3′ter-11mer-rv (Supplementary Table S1), which contain a BamHI site followed by the conserved 11-nt sequence at the 5′ and 3′ terminus, respectively, and the amplification products were cloned and sequenced. Of the 48 clones sequenced, 44 were derived from previously determined RNA segments (30 from RNA3a, seven from RNA3b, three from RNA4, two from RNA5, one from RNA6, and one that included sequences from both RNA5 and RNA6), and the remaining four did not encode proteins; thus, no other emaravirus-like segments were found.
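The terminal-sequence criterion and the clone tally described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study: the function name `has_conserved_termini` and the dictionary keys are hypothetical, and only the two conserved terminal sequences and the clone counts are taken from the text.

```python
# Conserved terminal sequences reported for the seven ChMaV RNAs (5'->3').
FIVE_PRIME_END = "AGUAGUGUUCUCC"   # 13 nt at the 5' terminus
THREE_PRIME_END = "AACACACUACU"    # 11 nt at the 3' terminus


def has_conserved_termini(seq: str) -> bool:
    """Return True if a sequence starts and ends with the conserved termini.

    Hypothetical helper: accepts RNA or cDNA (T is read as U).
    """
    rna = seq.upper().replace("T", "U")
    return rna.startswith(FIVE_PRIME_END) and rna.endswith(THREE_PRIME_END)


# Clone counts from the RT-PCR cloning experiment (48 clones sequenced).
TOTAL_CLONES = 48
clone_counts = {
    "RNA3a": 30,
    "RNA3b": 7,
    "RNA4": 3,
    "RNA5": 2,
    "RNA6": 1,
    "RNA5+RNA6": 1,  # one clone containing sequences from both segments
}

assigned = sum(clone_counts.values())   # clones matching known segments
unassigned = TOTAL_CLONES - assigned    # clones that did not encode proteins

if __name__ == "__main__":
    # Made-up toy sequence, used only to exercise the helper.
    toy = FIVE_PRIME_END + "GGGGCCCC" + THREE_PRIME_END
    print(has_conserved_termini(toy))
    print(assigned, unassigned)
```

The tally confirms that the listed clone categories sum to the 44 clones assigned to known segments, leaving four unassigned clones.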

Fig. 1

Schematic representation of the organization of chrysanthemum mosaic-associated virus (ChMaV) genomic RNAs. Open reading frames are represented by rectangles, and those with amino acid sequence similarity to one another have the same color. The conserved 13- and 11-nucleotide sequences at the 5′ and 3′ termini of each segment are represented by blue and yellow boxes, respectively. Drawings are not to scale.

We previously reported that a partial nucleotide sequence of RNA1 was specifically detected in five symptomatic leaves, but not in five asymptomatic leaves, by RT-PCR [11]. Primer sets targeting RNAs 2, 3a, 3b, 4, 5, and 6 were designed and used for diagnosis (Supplementary Table S1). All of these RNAs were specifically detected in the five symptomatic leaves, whereas no RNAs were detected in the five asymptomatic leaf samples (data not shown), indicating that all of the segments identified are indeed associated with the mosaic symptoms.

RNA1 is 7,093 nt long and contains an ORF (nt positions 7,029 to 121) that encodes P1, a putative RdRp of 2,302 aa. The presence of the seven amino acid sequence motifs (Pre-A, F, and A–E) that are conserved in emaravirus RdRps was confirmed, as reported previously [11]. RNA2 is 2,054 nt long and contains an ORF (nt 1,984 to 83) that encodes P2, a putative GP of 633 aa. As with the P2 of other emaraviruses, ChMaV P2 contains a phlebovirus glycoprotein motif (G490CYSCTQG).

Like some other emaraviruses, the genome of ChMaV contains two derivatives of RNA3: RNA3a and RNA3b. RNA3a and RNA3b are 1,390 and 1,215 nt long, respectively, and each encodes an NP: P3a (ORF at nt 1,316 to 525) and P3b (nt 1,148 to 357), both of which are 263 amino acids in length. The amino acid sequence identity between ChMaV P3a and P3b is 48.3%, which is much lower than that between the two P3 variants of High Plains wheat mosaic virus (HPWMoV) (88.9%) or of PerMV (83.0%) [7, 17] (Fig. 2c). Of the three conserved motifs in P3 of emaraviruses [18], ChMaV P3a and P3b contain “N148RLA” and “G169XEX”, but “NX2SXNX3A” is missing. The lack of NX2SXNX3A is also observed in the P3s of PCLSaV, PerMV, CjaEV1, and CjaEV2 [7, 15, 19].
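The motif notation used above (X for any residue, a subscript number for a repeat count) maps directly onto regular expressions. The following is a minimal sketch of such a motif scan, assuming a plain one-letter amino acid string as input. The function name `find_motifs` and the toy sequence are hypothetical, and no actual ChMaV sequence is reproduced here.

```python
import re

# Conserved emaravirus NP motifs written as regular expressions.
# "X" in the motif notation means any residue; "X2"/"X3" mean two/three residues.
NP_MOTIFS = {
    "NRLA": re.compile(r"NRLA"),
    "GXEX": re.compile(r"G.E."),
    "NX2SXNX3A": re.compile(r"N.{2}S.N.{3}A"),
}


def find_motifs(protein: str) -> dict:
    """Return the 1-based position of the first match of each motif, or None."""
    hits = {}
    for name, pattern in NP_MOTIFS.items():
        match = pattern.search(protein.upper())
        hits[name] = match.start() + 1 if match else None
    return hits


if __name__ == "__main__":
    # Made-up toy peptide, not a real nucleocapsid protein sequence.
    toy_np = "MKNRLAQQGAEVTT"
    print(find_motifs(toy_np))
```

A result of `None` for `NX2SXNX3A` corresponds to the situation described for ChMaV P3a and P3b, in which the first two motifs are present and the third is missing.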

Fig. 2

Phylogenetic relationships of chrysanthemum mosaic-associated virus (ChMaV), emaraviruses, and selected viruses of the order Bunyavirales. Analyses were performed using the amino acid sequence alignments for (a) RNA-dependent RNA polymerase (RdRp), (b) glycoprotein precursor (GP), (c) nucleocapsid protein (NP), and (d) movement protein (MP). The numbers at each node represent bootstrap values in percent, and those < 60% were omitted. The scale bars represent the number of residue substitutions per site. The GenBank accession numbers of the proteins used for phylogenetic analysis are shown. Sequences of maize mosaic virus (MMV), a member of the genus Alphanucleorhabdovirus, order Mononegavirales, and of rice stripe virus (RSV, genus Tenuivirus, family Phenuiviridae) and tomato spotted wilt virus (TSWV, genus Orthotospovirus, family Tospoviridae), both of the order Bunyavirales, were used as outgroups. The members of the genus Emaravirus included actinidia chlorotic ringspot-associated virus (AcCRaV), actinidia emaravirus 2 (AcEV-2), aspen mosaic-associated virus (AsMaV), alfalfa ringspot-associated virus (ARaV), blackberry leaf mottle-associated virus (BLMaV), Camellia japonica-associated emaravirus 1 (CjaEV1) and CjaEV2, European mountain ash ringspot-associated virus (EMARaV), fig mosaic virus (FMV), High Plains wheat mosaic virus (HPWMoV), lilac chlorotic ringspot-associated virus (LiCRaV), jujube yellow mottle-associated virus (JYMaV), pear chlorotic leaf spot-associated virus (PCLSaV), perilla mosaic virus (PerMV), pistacia virus B (PiVB), blue palo verde broom virus (PVBV), pigeonpea sterility mosaic virus 1 (PPSMV-1), pigeonpea sterility mosaic virus 2 (PPSMV-2), raspberry leaf blotch virus (RLBV), rose rosette virus (RRV), redbud yellow ringspot-associated virus (RYRSaV), and ti ringspot-associated virus (TiRSaV). The subgroups of emaraviruses based on the trees for NP and MP are indicated by solid lines. The proteins of ChMaV are indicated by arrowheads.

RNA4 is 1,303 nt long and contains an ORF (nt 1,233 to 310) that encodes P4, a putative MP of 307 amino acids with a predicted molecular mass of 35.4 kDa. ChMaV P4 contains a conserved D110XR motif, which is present in the MPs of other emaraviruses [20]. However, the most conserved WKT motif is substituted by Y174KV, which is similar to YKT in the P4 of PCLSaV [15].

RNA5 is 1,154 nt long and contains an ORF (nt 1,098 to 574) that encodes P5, a 174-amino-acid protein of 20.1 kDa. ChMaV P5 exhibits amino acid sequence similarity to the P5 protein of PerMV and the P7 protein of HPWMoV, a suppressor of RNA silencing [21, 22]. RNA6 is 1,707 nt long and contains an ORF (nt 1,624 to 164) that encodes P6, a 486-amino-acid protein of 57.1 kDa. ChMaV P6 exhibits amino acid sequence similarity to raspberry leaf blotch virus (RLBV) P5 and RRV P7, whose functions are unknown [16, 23].
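The ORF coordinates and protein lengths reported for the seven segments can be cross-checked with simple arithmetic. The sketch below assumes that the coordinates are inclusive and that each ORF span includes its stop codon. This assumption is not stated in the text, but it reproduces every reported protein length. The ORFs run from the higher to the lower coordinate because they are given on the complementary strand. The names `ORFS` and `protein_length` are illustrative.

```python
# (start, end, reported protein length in aa) for each ChMaV ORF,
# with coordinates as given in the text (complementary strand, high -> low).
ORFS = {
    "P1": (7029, 121, 2302),
    "P2": (1984, 83, 633),
    "P3a": (1316, 525, 263),
    "P3b": (1148, 357, 263),
    "P4": (1233, 310, 307),
    "P5": (1098, 574, 174),
    "P6": (1624, 164, 486),
}


def protein_length(start: int, end: int) -> int:
    """Protein length in aa for an ORF with inclusive coordinates.

    Assumption: the span includes the stop codon, so one codon is subtracted.
    """
    orf_nt = abs(start - end) + 1
    if orf_nt % 3 != 0:
        raise ValueError(f"ORF length {orf_nt} nt is not a multiple of 3")
    return orf_nt // 3 - 1


if __name__ == "__main__":
    for name, (start, end, reported) in ORFS.items():
        computed = protein_length(start, end)
        status = "ok" if computed == reported else "MISMATCH"
        print(f"{name}: {computed} aa (reported {reported}) {status}")
```

Under this assumption, all seven computed lengths agree with the values given in the text.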

The ranges of amino acid sequence identity of P1 to P4 of ChMaV to the corresponding proteins of other emaraviruses, calculated by SDT v.1.2 [24], were as follows: P1, 31.3% (PerMV) to 45.0% (PCLSaV); P2, 22.4% (TiRSaV) to 32.6% (PCLSaV); P3a, 21.6% (PPSMV-1) to 39.8% (PCLSaV); P3b, 21.0% (PPSMV-1) to 36.4% (PCLSaV); and P4, 19.3% (EMARaV) to 59.6% (PCLSaV). ChMaV P5 shared the highest sequence identity (30.4%) with PerMV P5 (BBM96182), followed by P7 of Camellia japonica-associated emaravirus 1 (CjaEV1) (QHG11079) (29.1%), HPWMoV P7 (AIK23038) (27.8%), and PCLSaV P5 (QKY77007) and CjaEV1 P8 (QHG11080) (26.5%). ChMaV P6 shared the highest identity with RRV P7 (QJR96844) (28.4%), followed by its orthologous proteins [16] RRV P5 (QIB98227) (25.4%), HPWMoV P6 (AIK23037) (24.8%), HPWMoV P5 (AIK23036) (24.7%), PPSMV-1 P5 (CCW28369) (23.7%), and RLBV P5 (CBZ42028) (22.5%). Following the demarcation criterion for the genus Emaravirus that the amino acid sequences of the relevant gene products of RNA1 (RdRp), RNA2 (GP), and RNA3 (NP) differ by >25% [1], we concluded that ChMaV is a member of a distinct species of the genus.
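The species demarcation argument above reduces to a threshold comparison, which can be written out as follows. This is a sketch under one stated assumption: sequence difference is taken as 100% minus the percent identity. The identity values are the highest ones reported in the text for each protein (all with PCLSaV). The function name `is_distinct` is hypothetical.

```python
# Highest amino acid identity (%) of each ChMaV protein to any other
# emaravirus, as reported in the text (all with PCLSaV).
MAX_IDENTITY = {
    "P1 (RdRp)": 45.0,
    "P2 (GP)": 32.6,
    "P3a (NP)": 39.8,
    "P3b (NP)": 36.4,
}

MIN_DIFFERENCE = 25.0  # demarcation criterion: sequences differ by >25%


def is_distinct(identity_percent: float, min_difference: float = MIN_DIFFERENCE) -> bool:
    """True if the difference (assumed to be 100 - identity) exceeds the threshold."""
    return (100.0 - identity_percent) > min_difference


results = {name: is_distinct(value) for name, value in MAX_IDENTITY.items()}

if __name__ == "__main__":
    for name, distinct in results.items():
        diff = 100.0 - MAX_IDENTITY[name]
        print(f"{name}: differs by {diff:.1f}% -> distinct: {distinct}")
```

Because even the closest relative differs by more than 25% for every protein considered, the criterion is met for RdRp, GP, and both NP variants.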

The amino acid sequences of proteins P1 to P4 of ChMaV, known emaraviruses, and viruses belonging to the order Bunyavirales were aligned using MAFFT [24], and phylogenetic trees were reconstructed by the neighbor-joining method implemented in MEGA X [25] with 1,000 bootstrap replicates (Fig. 2). P1 to P4 of ChMaV consistently segregated together with those of PCLSaV and formed a cluster, namely, subgroup III, with PerMV, CjaEV1, and CjaEV2 (Fig. 2). Interestingly, these viruses were found in Japan or China [7, 15, 27, 28], and their host plants (Chrysanthemum morifolium, Pyrus pyrifolia, Perilla frutescens, and Camellia japonica) also originate from or are widely distributed in East Asia [29,30,31,32]. These observations suggest that subgroup III represents a unique genetic cluster of emaraviruses of East Asian origin, which may include members that have not yet been identified.