The family Endornaviridae (two genera: Alphaendornavirus and Betaendornavirus) includes viruses infecting plants, fungi, and oomycetes. Endornaviruses have a linear double-stranded RNA (dsRNA) genome of 9.8–17.6 kb, depending on the virus, that encodes a single long open reading frame (ORF); most have a discontinuity (nick) near the 5′ end of the plus strand [1, 2]. Endornaviruses are considered endogenous RNA viruses: they are not encapsidated into virions and have no extracellular phase. Infection by endornaviruses is virtually always persistent and symptomless. They have frequently been found in various healthy plants, ranging from algae to higher plants, without affecting the host phenotype [2,3,4,5,6]. Vicia faba endornavirus [7] and Helicobasidium mompa endornavirus 1 [6] are the two reported exceptions associated with altered host phenotypes.

Endornaviruses are unconventional viruses: there is no evidence of cell-to-cell movement, and the dsRNA occurs in every tissue and at every developmental stage at relatively constant concentrations [2, 8]. Although the dsRNA is found in the cytoplasm of host cells, it is transmitted very efficiently (> 98%) to progeny plants via pollen and ova [2]. Recently, Hao et al. [9] demonstrated horizontal transmission of an endornavirus between isolates of the necrotrophic fungus Botrytis cinerea. Endornaviruses have been reported to infect economically important crops such as avocado, barley, bell pepper, common bean, fava bean, melon, and rice; except for isolates of Vicia faba endornavirus [5], their overall effect on plants is not known.

Next-generation sequencing (NGS) techniques are transforming the study of viruses and opening new paths for viral discovery [10]; consequently, knowledge of endornaviruses has been expanding [11, 12]. The aim of the present study is to describe the complete genome sequences of two Cucumis melo endornavirus (CmEV) strains identified during an NGS investigation of human stool samples collected in Brazil. CmEV was recently reported in Cucumis melo in the USA (strain CL-01) [8] and South Korea (strain SJ1) and is being considered a possible novel species in the genus Alphaendornavirus. Currently, only two complete genome sequences (strains CL-01 and SJ1) and three partial sequences of CmEV are available in GenBank.

Between September 2010 and February 2016, a total of 250 fecal samples were collected from patients with acute gastroenteritis in the state of Tocantins, Northern Brazil, and screened for enteric viruses by NGS. CmEV was detected in only two samples (0.8%; 2/250), BRA/TO-23 and BRA/TO-74, both of which were also positive for group A rotavirus. Sample BRA/TO-23 was collected in August 2014 from a 2-year-old boy, and sample BRA/TO-74 was collected in December 2010 from a 6-month-old girl. Both patients had symptoms of acute gastroenteritis. The stool samples were sent to the Central Laboratory of Public Health of Tocantins State (LACEN/TO), a regional reference center for gastroenteritis surveillance. The state of Tocantins is characterized by cerrado vegetation (dry grassland and scrub forest), large rivers, and soybean plantations. Both patients lived in Araguaína, the second largest city in the state, located 384 km from the state capital, Palmas.

The deep sequencing procedure combines several previously described protocols for viral metagenomics and/or virus discovery [13, 14]. In summary, 50 mg of each fecal sample (BRA/TO-23 and BRA/TO-74) was diluted in 500 µl of Hanks’ balanced salt solution, added to a 2 ml impact-resistant tube containing lysing matrix C (MP Biomedicals, USA), and homogenized in a FastPrep-24 5G homogenizer (MP Biomedicals, USA). The homogenates were centrifuged at 12,000 × g for 10 min, and approximately 300 µl of the supernatant was passed through a 0.45 µm filter (Merck Millipore, Billerica, MA, USA) to remove eukaryotic and bacterial cell-sized particles. Approximately 100 µl of cold PEG-it Virus Precipitation Solution (System Biosciences, CA, USA), roughly one-fourth of the final volume in the tube, was added to each filtrate; the tubes were mixed gently and incubated at 4 °C for 24 h. The mixtures were then centrifuged at 10,000 × g for 30 min at 4 °C, and the supernatants (approximately 350 µl) were discarded. The virus-enriched pellets were treated with a combination of nucleases (TURBO DNase and RNase Cocktail Enzyme Mix, Thermo Fisher Scientific, CA, USA; Baseline-ZERO DNase, Epicentre, WI, USA; Benzonase, Darmstadt, Germany; and RQ1 RNase-Free DNase and RNase A Solution, Promega, WI, USA) to digest unprotected nucleic acids, and the mixtures were incubated at 37 °C for 2 h.
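The PEG-it volume above can be checked with simple arithmetic. Reading "one-fourth of the volume" as one-fourth of the final mixture (an interpretation, not stated explicitly in the protocol), the precipitant volume v satisfies v / (filtrate + v) = 1/4, so v is one-third of the filtrate. The function name below is hypothetical and the sketch is illustrative only.

```python
def precipitant_volume(filtrate_ul: float, final_fraction: float = 0.25) -> float:
    """Volume of precipitant (µl) so that it makes up `final_fraction` of the final mix.

    Solves v / (filtrate_ul + v) = final_fraction for v.
    Assumption: "one-fourth" refers to the final mixture, not the filtrate.
    """
    if not 0 < final_fraction < 1:
        raise ValueError("final_fraction must be between 0 and 1")
    return filtrate_ul * final_fraction / (1 - final_fraction)


if __name__ == "__main__":
    filtrate = 300.0  # approximate filtrate volume (µl) from the protocol text
    peg = precipitant_volume(filtrate)
    print(f"PEG-it to add: {peg:.0f} µl; final volume: {filtrate + peg:.0f} µl")
```

Under this reading, 300 µl of filtrate calls for about 100 µl of precipitant and gives about 400 µl in total, which is consistent with the roughly 350 µl of supernatant discarded after centrifugation.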

After incubation, viral nucleic acids were extracted using the ZR & ZR-96 Viral DNA/RNA Kit (Zymo Research, CA, USA) according to the manufacturer’s protocol. cDNA was synthesized with AMV Reverse Transcriptase (Promega, WI, USA), and the second strand was synthesized using DNA Polymerase I Large (Klenow) Fragment (Promega, WI, USA). A DNA library was then constructed with the Nextera XT Sample Preparation Kit (Illumina, CA, USA) and identified using dual barcodes. For size selection, Pippin Prep (Sage Science, Inc.) was used to select a 300 bp insert (range 200–400 bp). The library was deep-sequenced on a HiSeq 2500 sequencer (Illumina, CA, USA) with 126 bp paired-end reads. Bioinformatic analysis was performed according to the protocol described by Deng et al. [15]. Contigs were assembled de novo from the sequence reads using a 95% nucleotide identity threshold. According to the bioinformatics pipeline used [15], no reads related to human, plant, fungal, or bacterial sequences were obtained. Figure 2 shows the viruses (by type or species) detected in the BRA/TO-23 and BRA/TO-74 samples and the number of reads assigned to each.
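To illustrate what a 95% nucleotide identity threshold measures, the sketch below computes percent identity between two already-aligned sequences and compares it with a cutoff. It is not the pipeline of Deng et al. [15]; the function names are hypothetical, and ignoring gap columns is one common convention among several.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two equal-length aligned sequences.

    Columns where either sequence has a gap ('-') are ignored (one convention of several).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gap columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


def passes_threshold(a: str, b: str, threshold: float = 95.0) -> bool:
    """True if the aligned pair meets the identity cutoff (95% by default)."""
    return percent_identity(a, b) >= threshold


if __name__ == "__main__":
    # Hypothetical 20-nt aligned pair with a single mismatch at the last position.
    s1 = "ACGTACGTACGTACGTACGT"
    s2 = "ACGTACGTACGTACGTACGA"
    print(percent_identity(s1, s2), passes_threshold(s1, s2))
```

One mismatch in 20 aligned positions gives exactly 95% identity, which sits on the threshold; a second mismatch would drop the pair below it.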

In total, 88,074 and 1,600,771 paired-end reads were obtained from the BRA/TO-23 and BRA/TO-74 samples, respectively. Of these, 5.9% (n = 5228) from BRA/TO-23 and 1.03% (n = 16,444) from BRA/TO-74 showed BLASTx similarity (coverage of 4,8925× and 7,64×, respectively) to CmEV, represented by the complete dsRNA genome sequences of strains CL-01 (KT727022) [8] and SJ1 (KX641269). At the nucleotide level (BLASTn), identity was 97% with 99% query coverage, and E-values were approximately 1E-120 to 1E-110 (Fig. 2).
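The read percentages reported above follow directly from the read counts. The short check below recomputes them from the numbers given in the text; the dictionary layout and function name are illustrative only.

```python
# CmEV-related reads and total paired-end reads per sample, as reported in the text.
samples = {
    "BRA/TO-23": (5_228, 88_074),
    "BRA/TO-74": (16_444, 1_600_771),
}


def read_fraction(hits: int, total: int) -> float:
    """Percentage of all reads in a sample that were assigned to a given virus."""
    if total <= 0:
        raise ValueError("total read count must be positive")
    return 100.0 * hits / total


if __name__ == "__main__":
    for name, (hits, total) in samples.items():
        print(f"{name}: {read_fraction(hits, total):.2f}% of reads matched CmEV")
```

Rounded as in the text, this gives 5.9% for BRA/TO-23 and 1.03% for BRA/TO-74.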

The final genome analysis was performed using Geneious v9.1.8 (Biomatters Ltd., Auckland, New Zealand), and the ORF was predicted with the Geneious ORF finder. The BRA/TO-23 and BRA/TO-74 strains were nearly identical to strains CL-01 and SJ1, sharing 97% nucleotide identity. At the amino acid level, the Brazilian strains BRA/TO-23 and BRA/TO-74 showed 98% identity to the polyproteins of strains CL-01 (Cucumis melo alphaendornavirus; YP_009222598) [8] and SJ1 (Cucumis melo alphaendornavirus; ARI71634).
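ORF prediction here was done with the Geneious ORF finder. As a simplified stand-in (not the Geneious algorithm), the sketch below scans the three forward reading frames for the longest ATG-initiated ORF ending in a standard stop codon. It assumes the standard genetic code, ignores the reverse strand and alternative start codons, and the function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(seq: str) -> tuple[int, int]:
    """Return (start, end) of the longest ATG-to-stop ORF on the forward strand.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    Returns (0, 0) if no complete ORF is found.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGAAATTTTAGCC"))
```

For an endornavirus genome, the single long ORF would span nearly the whole plus strand; a real analysis would also check the reverse strand and report the translated polyprotein.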

The genomes obtained from the BRA/TO-23 and BRA/TO-74 samples were 14,737 and 15,004 bp long, respectively, and each contains a single large ORF, as is characteristic of the family Endornaviridae. A CDD search (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) of the deduced polyproteins of BRA/TO-23 and BRA/TO-74 revealed several domains with enzymatic functions. The most conserved domain, viral RNA-dependent RNA polymerase 2 (RdRp_2; CL03049; E-value = 1.44e-14 for BRA/TO-23 and 1.56e-14 for BRA/TO-74), was located at the C-terminus of the polyprotein and belongs to the pfam00978 family. Endornavirus polyproteins always carry this conserved RdRp domain (pfam00978), which places them in the alphavirus-like superfamily [11]. Other typical conserved replicase motifs, such as those of methyltransferase, helicase, and glucosyltransferase, were also identified (Table 1).

Phylogenetic analyses were conducted using MEGA 6.0 [16], with clade support assessed by bootstrapping with 500 replicates. For the amino acid sequences, the JTT matrix-based model was used, and evolutionary models were selected according to the likelihood ratio test. Maximum clade credibility trees were annotated with TreeAnnotator and viewed with FigTree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/). The two Brazilian CmEV strains identified in the present study form a separate clade together with reference strains CL-01 and SJ1. This clade is only distantly related to other members of the family Endornaviridae, supporting the conclusion that the Brazilian (BRA/TO-23 and BRA/TO-74), American (CL-01), and Korean (SJ1) strains belong to the same species, CmEV (Fig. 1).
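MEGA performs bootstrapping internally. The sketch below shows only the idea behind the nonparametric bootstrap used for clade support: each pseudo-replicate alignment is built by sampling alignment columns with replacement, a tree would be inferred from each replicate (not shown), and support for a clade is the percentage of replicate trees containing it. The function names and toy sequences are hypothetical.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """One nonparametric bootstrap pseudo-replicate: resample columns with replacement."""
    lengths = {len(s) for s in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    n_cols = lengths.pop()
    # The same resampled column indices are applied to every sequence.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def clade_support(replicate_clades: list[set[frozenset[str]]], clade: frozenset[str]) -> float:
    """Percentage of replicate trees (each given as its set of clades) containing `clade`."""
    if not replicate_clades:
        raise ValueError("no replicates given")
    hits = sum(clade in clades for clades in replicate_clades)
    return 100.0 * hits / len(replicate_clades)


if __name__ == "__main__":
    rng = random.Random(1)
    toy = {"seqA": "MKVLATG", "seqB": "MKILASG", "seqC": "MRVLSTG"}  # hypothetical toy alignment
    replicates = [bootstrap_alignment(toy, rng) for _ in range(500)]  # 500 replicates, as in the text
    print(len(replicates), replicates[0])
```

Resampling whole columns, rather than individual residues, keeps the sites paired across sequences, which is what makes each replicate a valid alignment.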

Table 1 Motif positions in the polyproteins of CmEV strains BRA/TO-23 and BRA/TO-74 compared with reference strains CL-01 and SJ1
Fig. 1

Phylogenetic tree of the Brazilian CmEV strains. A maximum likelihood tree was inferred from endornavirus polyprotein sequences, including CmEV reference genomes SJ1 and CL-01 and the nearly complete genomes of the Brazilian strains BRA/TO-23 and BRA/TO-74. The reference sequences were the best hits obtained by BLAST search. Values above branches indicate statistical support for each branch. Hatched brackets indicate genetic divergence (pairwise distances and standard errors) between clades. The tree was constructed using the JTT model and the composite likelihood method implemented in MEGA 6.0.

Plant viral content has already been described in human feces. Fecal samples contain substantial dietary material, so plant viral nucleic acids are expected to appear in large amounts in the resulting sequences [17]. Endornaviruses form a well-known group of endogenous plant viruses with a worldwide distribution [8]; therefore, their presence in feces may simply reflect recent dietary consumption. Endornaviruses have also been detected in fecal samples from Muscovy ducks in Australia [18]. A wide variety of plant viruses besides CmEV might thus be expected in samples BRA/TO-23 and BRA/TO-74; however, this was not observed. Figure 2 plots the number of reads against E-values: CmEV accounted for far more reads (large pink circles) than the other viruses detected (small blue circles), including plant viruses. Notably, only two of the 250 fecal samples screened contained CmEV reads.

Fig. 2

CmEV and other viruses detected in samples BRA/TO-23 and BRA/TO-74. The number of reads (x-axis and circle size) is plotted against E-values (y-axis, log scale). CmEV reads are represented by large pink circles, and other detected viruses, including plant viruses, by small blue circles.
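Plotting E-values on a log axis, as in Fig. 2, requires handling one practical detail: BLAST can report an E-value of exactly 0.0 for very strong hits, and the logarithm of zero is undefined. The sketch below clamps such values to a floor before taking log10; the floor value and function name are hypothetical choices, not taken from the study.

```python
import math


def log10_evalue(evalue: float, floor: float = 1e-180) -> float:
    """log10 of a BLAST E-value for plotting on a log axis.

    E-values of 0.0 (possible for very strong hits) are clamped to `floor`,
    an arbitrary illustrative choice, so that the logarithm is defined.
    """
    if evalue < 0:
        raise ValueError("E-values cannot be negative")
    return math.log10(max(evalue, floor))


if __name__ == "__main__":
    for e in (1e-110, 1e-120, 1.44e-14, 0.0):
        print(e, round(log10_evalue(e), 2))
```

Any clamped points should be flagged on the plot, because their position reflects the chosen floor rather than a measured value.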

The high proportion of CmEV reads in these two samples might reflect differences in how food mass is reduced and concentrated into feces through absorption during digestion, since both patients had acute diarrhea. Zhang et al. [17] showed that fluctuations in the amount of pepper mild mottle virus RNA in human feces could be associated with the kinds of food consumed, methods of food preparation, and idiosyncratic conditions in the gastrointestinal tract. On the other hand, earlier studies showed that plant viral particles can be assembled in Escherichia coli cells [19, 20], which has led to speculation that some feces-borne plant viruses might interact with microbes in the human gut [17]. However, evidence for active replication of CmEV in the human gut is currently lacking. Further research is needed to determine whether and how plant viruses interact with intestinal cells or microorganisms in the human gastrointestinal tract [17].

A common limitation of virome analyses is the frequent unavailability of epidemiological background information on potential consumption of contaminated food and of patient medical records. The present study is no exception: an epidemiological link between the patients and the virus is lacking. Endornaviruses are not known to be associated with human disease [8], and the gastroenteritis symptoms observed in patients BRA/TO-23 and BRA/TO-74 are probably linked to one or more of the enteric viruses detected in their fecal samples (e.g., Sapovirus, Human mastadenovirus F, or Salivirus) (Fig. 2). Nevertheless, the detection of CmEV in a fecal sample from a 6-month-old infant (BRA/TO-74) was an intriguing finding. Food diversification occurs at the transition between exclusive breast or formula feeding and the introduction of regular and significant quantities of non-milk foods [21]. In Brazil, parents are advised to begin food diversification between 4 and 12 months of age, generally starting with fruit/vegetable purées [22]. It can be speculated that the child was undergoing food diversification; indeed, some members of the Cucurbitaceae family, such as melon and watermelon, are among the fruits that can be introduced at this age (http://www.who.int). However, it should be emphasized that no cultural or epidemiological data were obtained from the patients analyzed.

Melons are members of the gourd family (Cucurbitaceae) and are important crops cultivated for fresh consumption [8]. The number of commercial Cucumis melo fields in the state of Tocantins has increased rapidly in the last decade (https://sidra.ibge.gov.br/home/ipca15/brasil). However, studies on the identification and genome characterization of endornaviruses in Brazil are virtually absent [1]. Therefore, predicting any association between the results found here and the plants cultivated in the state of Tocantins is hampered by a lack of genome sequencing data. It is worth mentioning that CmEV is not associated with any leaf symptoms; its possible effects on yield and fruit quality remain to be studied [23].

Although plant virus discovery was not among the original aims of this NGS surveillance of enteric viruses, metagenomic analysis offered an opportunity to identify the endornavirus species CmEV in Brazil for the first time. The full genetic diversity of endornavirus genomes remains to be described [12], especially in Brazil. The data obtained in this investigation will add to the growing body of data on the geographic distribution and molecular diversity of endornaviruses, and to knowledge of their ecology, epidemiology, and evolution.