Introduction

The known diversity of small circular Rep-encoding ssDNA (CRESS-DNA) genomes has increased rapidly, largely through the analysis of environmental and animal samples by deep sequencing following rolling-circle amplification [6, 11-13, 20, 32]. Among the many distinct taxa of CRESS-DNA viruses, a few are known to replicate in eukaryotic hosts ranging from plants and fungi to birds and mammals [7]. The cellular hosts of many viruses with a CRESS-DNA genome remain unknown. Members of the family Circoviridae include circoviruses known to infect birds and mammals, at times with pathogenic consequences. Genomes closely related to those of circoviruses, named “cycloviruses”, have been detected in mammalian tissues and feces as well as in the bodies of different insects [7, 9, 19, 25, 31]. For cycloviruses, the cellular hosts remain unknown but may include mammalian and/or insect cells; conceivably, the genomes may instead originate from consumed animals or plants. Another distinct clade of viruses with CRESS-DNA genomes, whose cellular hosts are also unknown, was described in the feces of chimpanzees and other mammals, including pigs and rats [1, 3-5, 10, 18, 21, 22, 25, 26, 30], and recently humans [14]. Yet another clade of CRESS-DNA genomes whose cellular hosts are not known was reported in the feces of pigs, seals, dromedaries and humans [4, 24, 27, 30]. Here we describe five CRESS-DNA genomes in human fecal samples that cluster within two of these phylogenetic clades of unknown cellular origin. We characterize these genomes and describe their prevalence as measured by PCR in human diarrhea fecal samples.

Materials and methods

Clinical samples and viral metagenomics

Fecal samples were obtained from Peruvian children with diarrhea of unknown etiology; all had pre-tested negative for rotavirus, adenovirus, norovirus, campylobacter, shigella, salmonella and Escherichia coli [8]. A total of 58 fecal samples were analyzed in pools of three to five samples. Fecal suspensions were first vortexed and clarified by centrifugation at 15,000×g for 10 min. The supernatant was filtered through a 0.45-µm filter (Millipore) to remove bacterium-sized particles. The filtrate was treated with a mixture of enzymes to digest unprotected nucleic acids [16], and viral nucleic acids were then extracted using a Maxwell 16 automated extractor (Promega). Random RT-PCR using primers with degenerate 3′ ends was then used to amplify RNA and DNA molecules, and a library was constructed for Illumina sequencing (MiSeq 2x250 bases) using a Nextera™ XT Sample Preparation Kit. Translated sequence reads showing similarity to viral sequences with an E score <10⁻⁵ were identified using BLASTx. Viral sequences were identified based on similarity to annotated viral genome sequences available in the GenBank non-redundant database, and results were mapped to the NCBI Taxonomy.
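The E-score filtering step described above can be illustrated with a minimal Python sketch. It assumes the BLASTx hits were exported in the BLAST+ tabular format (-outfmt 6) with its default 12 columns, in which the E-value is the 11th column; the text does not state which output format was actually used, and the function name and example rows are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # threshold stated in the text


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return (query_id, subject_id, evalue) for hits with an E-value below the cutoff.

    Assumption: BLAST+ tabular output (-outfmt 6) with the default 12 columns,
    where column 11 (index 10) holds the E-value.
    """
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        evalue = float(row[10])
        if evalue < cutoff:
            hits.append((row[0], row[1], evalue))
    return hits


# Made-up rows for illustration only; they are not data from the study.
example = "\n".join([
    "read1\tsubjA\t45.0\t80\t44\t0\t1\t240\t10\t89\t2e-12\t70.1",
    "read2\tsubjB\t30.0\t60\t42\t0\t1\t180\t5\t64\t0.003\t35.0",
])

kept = filter_hits(example)  # only read1 passes the 1e-5 cutoff
```

Only the first made-up read passes the cutoff; the second, with an E-value of 0.003, is discarded.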

Determination of genome sequences of novel circular viruses

Complete circular DNA viral genomes were amplified using inverse PCR (iPCR) with specific primers designed from short sequence fragments. iPCR amplicons were then sequenced directly by primer walking. Putative ORFs in the genome were predicted using NCBI ORF Finder. Sequence analysis was performed using ClustalX with default settings [23]. A sequence identity matrix was made using BioEdit. To identify stem-loop structures, nucleotide sequences were analyzed with Mfold [33]. Phylogenetic trees with 100 bootstrap resamplings of the alignment data sets were generated using the neighbor-joining method and visualized using the program MEGA version 6 [28]. Bootstrap values (based on 100 replicates) for each node are shown if >70 %.
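As an illustration of what a sequence identity matrix contains, the following Python sketch computes pairwise percent identity from already-aligned sequences. It is not BioEdit's implementation: it assumes that columns with a gap in either sequence are excluded from the comparison, which is one of several common conventions, and the sequence names and residues are invented.

```python
def percent_identity(a, b):
    """Percent identity of two aligned sequences of equal length.

    Assumption: columns where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Map every (name, name) pair to its percent identity."""
    names = list(aligned)
    return {(p, q): percent_identity(aligned[p], aligned[q])
            for p in names for q in names}


# Invented toy alignment, not sequences from the study.
toy = {"seqA": "MKV-LA", "seqB": "MKIALA"}
matrix = identity_matrix(toy)
```

With a different gap convention (for example, counting gap columns as mismatches) the values would be lower, so the convention should be stated alongside any reported identities.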

PCR assays for novel circular viruses

Three sets of primers for nested PCR based on the Rep coding regions were designed to screen nucleic acids from fecal supernatants.

For CRESS-DNA genome PE31, primers PESmaCV-F1 (5′-GGA TGT TAT TCG GTT GTG CGG A-3′) and PESmaCV-R1 (5′-CAC CGC AGA ATG ATC CGA CAA-3′) were used for the first round of PCR, and primers PESmaCV-F2 (5′-CAG TCC GTC TTT GAT GCT TTC-3′) and PESmaCV-R2 (5′-GGA GTA TGA GGC GAA AGA AGG-3′) were used for the second round of PCR, resulting in an expected amplicon of 255 bp. The PCR conditions were 95 °C for 5 min, 35 cycles of 95 °C for 30 s, 52 °C or 50 °C (for the first and second round, respectively) for 30 s and 72 °C for 1 min, followed by a final extension at 72 °C for 10 min.

For CRESS-DNA genome PE16, primers PESmaCV-F3 (5′-CTT TGG TTG TGT GTT CGT TAG GAC T-3′) and PESmaCV-R3 (5′-AGC AGC AAC ACA GGA TTC TTC-3′) were used for the first round of PCR, and primers PESmaCV-F4 (5′-TGG TAT CTC GTG TCG TAA ACT-3′) and PESmaCV-R4 (5′-GAA AAA GCC GAA AGA GGA GTG-3′) were used for the second round of PCR, resulting in an expected amplicon of 425 bp. The PCR conditions were 95 °C for 5 min, 35 cycles of 95 °C for 30 s, 50 °C or 49 °C (for the first and second round, respectively) for 30 s and 72 °C for 1 min, followed by a final extension at 72 °C for 10 min.

For CRESS-DNA genome PeCV-PE, primers PEPeCV-F1 (5′-TCC TTC ATC GTG GTA GCA ATG GTA G-3′) and PEPeCV-R1 (5′-CTG GTG TGT AAC GAG TTG GAC TTG C-3′) were used for the first round of PCR, and primers PEPeCV-F2 (5′-CTC ACG CTG TTT AGC CCT TTT G-3′) and PEPeCV-R2 (5′-TGA TAG CGT GGA ACG TCT TGT AGT-3′) were used for the second round of PCR, resulting in an expected amplicon of 430 bp. The PCR conditions were 95 °C for 5 min, 35 cycles of 95 °C for 30 s, 53 °C or 51 °C (for the first and second round, respectively) for 30 s and 72 °C for 1 min, followed by a final extension at 72 °C for 10 min.
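The three nested PCR assays differ only in their annealing temperatures and expected amplicon sizes, which can be summarized compactly. The Python sketch below restates the conditions given above as data; the class and function names are hypothetical, and the assumption that every step other than annealing is identical across assays and rounds is taken directly from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NestedAssay:
    target: str
    anneal_c: tuple   # annealing temperature in °C: (first round, second round)
    amplicon_bp: int  # expected second-round amplicon size


ASSAYS = [
    NestedAssay("PE31", (52, 50), 255),
    NestedAssay("PE16", (50, 49), 425),
    NestedAssay("PeCV-PE", (53, 51), 430),
]


def cycling_program(assay, round_number):
    """Return (step, temperature in °C, duration, cycles) for round 1 or 2."""
    if round_number not in (1, 2):
        raise ValueError("round_number must be 1 or 2")
    anneal = assay.anneal_c[round_number - 1]
    return [
        ("initial denaturation", 95, "5 min", 1),
        ("denaturation", 95, "30 s", 35),
        ("annealing", anneal, "30 s", 35),
        ("extension", 72, "1 min", 35),
        ("final extension", 72, "10 min", 1),
    ]


program = cycling_program(ASSAYS[0], 1)  # first round for genome PE31
```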

Results

Viral metagenomics overview

Viral-particle-associated nucleic acids were enriched and randomly amplified from 58 fecal samples from Peruvian children with diarrhea of unknown etiology. Thirty-six million sequence reads were generated in one Illumina MiSeq run. Using a BLASTx E-score cutoff of 10⁻⁵, sequence reads related to mammalian viruses consisted of norovirus (124,168 reads), adenovirus (86,714 reads), group A rotavirus (35,869 reads), anelloviruses (21,995 reads), enterovirus (4,670 reads), CRESS-DNA-like viruses (3,043 reads), salivirus (461 reads), group C rotavirus (198 reads), bocavirus (93 reads), picobirnavirus (74 reads), sapovirus (73 reads) and Aichi virus (2 reads).

New clade distantly related to members of the Circoviridae

One fecal sample pool of five samples contained 1,753 sequence reads sharing low levels of genetic similarity to the Rep and Cap proteins of circular ssDNA viruses found in feces from a human (GenBank KJ206566), a pig (GenBank KJ414134) and a seal (GenBank KF246569). The individual fecal sample within the pool containing these sequences was identified using PCR. Using inverse PCR and Sanger sequencing, we then determined the complete genome sequence of the virus. To facilitate description and discussion of that clade, we provisionally refer to that group of viral genomes as “pecoviruses” (Peruvian stool-associated circo-like viruses). The circular human-feces-derived pecovirus genome (PeCV-PE) contained 2,937 bases, with a GC content of 49 %, and two major ORFs encoding replicase and capsid proteins (GenBank KT600065). Similar to many CRESS-DNA genomes, the Cap and Rep genes were in opposite orientation [19]. The Rep and Cap proteins were 320 aa and 388 aa in length, respectively. The Rep showed 35 % aa sequence identity to Rep of porcine- and human-feces-derived CRESS-DNA genomes. A Rep-based phylogenetic analysis showed that PeCV-PE clustered together with previously reported CRESS-DNA genomes from human, seal, pig, and dromedary feces. While the genomes from animal feces all shared the highly conserved canonical nonamer (NANTATTAC) atop a stem-loop, the conserved nonamer was not found in the human-feces-derived PeCV genomes, which contained a stem-loop with the nonamer TTTTATGAG. Direct observation of the PeCV Rep sequence alignment revealed two rolling-circle replication motifs II (xHxH) and III (YxxK) [19]. The seal-feces-derived PeCV contained a rolling-circle replication motif I (LTVKN) that was not found in other PeCV genomes [24]. Similarly, the C-terminal region of the PeCV Rep protein possessed the ATP-dependent helicase Walker A (GxxGxGKS) and Walker B (IWFDEFNG) motifs [19, 24]. The Walker C motif (UxxN) was not found in any of the PeCV genomes [19, 24]. Pairwise comparison of all available PeCV sequences from the feces of humans, seals, pigs, and dromedaries demonstrated a high level of genetic diversity, with Rep identities of only 29-35 %, suggesting the existence of multiple species in that clade.
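Motifs written with wildcard positions, such as xHxH, YxxK, GxxGxGKS or the nonamer NANTATTAC, can be located with a simple pattern search. The Python sketch below is only an illustration of that idea and is not the procedure used here, where motifs were identified by direct inspection of the alignment. It assumes that "x" (protein) and "N" (nucleotide) each stand for any single residue, and the test sequences are invented.

```python
import re


def motif_to_regex(motif, wildcard):
    """Turn a motif such as 'xHxH' into a regex, with the wildcard matching any one character."""
    return "".join("." if ch == wildcard else re.escape(ch) for ch in motif)


def find_motif(sequence, motif, wildcard="x"):
    """Return (0-based start, matched text) for every occurrence, including overlapping ones."""
    # A lookahead is used so that overlapping occurrences are all reported.
    pattern = re.compile("(?=(" + motif_to_regex(motif, wildcard) + "))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


# Invented sequences for illustration only.
rcr_ii_hits = find_motif("AAGHCHTT", "xHxH")                         # protein motif II
nonamer_hits = find_motif("GGTAGTATTACCC", "NANTATTAC", wildcard="N")
human_hits = find_motif("TTTTATGAG", "NANTATTAC", wildcard="N")      # no canonical match
```

The last call shows that the nonamer TTTTATGAG reported for the human-feces-derived genomes does not fit the canonical NANTATTAC pattern, because its second position is T rather than A.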

Nested PCR was then used to specifically test for human-associated PeCV DNA in all 58 fecal samples from Peru, yielding a PeCV DNA prevalence of 15.5 % (9/58). The nested PCR screening for human-associated PeCV DNA was extended to feces from children with diarrhea of unknown etiology from South and Central America [2, 15, 17, 29]. PeCV DNA was found with a detection rate of 5.9 % (3/51) in Nicaraguan and of 3 % (3/100) in Chilean fecal samples. However, none of the diarrhea feces from Brazil (n=170) and Venezuela (n=40) were positive for PeCV DNA. Two PeCV DNA positives (one from Nicaragua and another from Chile) were selected to determine their complete genome sequences by inverse PCR and Sanger sequencing, generating Nicaraguan pecovirus (PeCV-NI, 2,928 bases; GenBank KT600066) and Chilean pecovirus (PeCV-CH, 2,928 bases; GenBank KT600067). All human-feces-derived genomes were closely related in their Rep gene and more divergent in their Cap genes, with Rep proteins sharing 99 % identity, while the Cap proteins of PeCV-NI and PeCV-CH showed only 76 % identity to that of PeCV-PE. Similar relationships were seen by phylogenetic analysis (Fig. 1B) and genome sequence alignments (Fig. 1C).
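The detection rates above are simple proportions of PCR-positive samples among those tested. As a worked check, the short Python sketch below recomputes them from the counts given in the text; the function and variable names are hypothetical, and rounding to one decimal place is an assumption made to match the reported figures.

```python
def detection_rate(positive, tested):
    """Percentage of PCR-positive samples, rounded to one decimal place."""
    if tested <= 0:
        raise ValueError("tested must be a positive number of samples")
    return round(100.0 * positive / tested, 1)


# (positive, tested) counts for PeCV DNA as reported in the text.
PECV_SCREEN = {
    "Peru": (9, 58),
    "Nicaragua": (3, 51),
    "Chile": (3, 100),
    "Brazil": (0, 170),
    "Venezuela": (0, 40),
}

rates = {country: detection_rate(*counts) for country, counts in PECV_SCREEN.items()}
```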

Fig. 1

New pecoviruses. A. Genome organizations and stem-loop structures. B. Phylogenetic trees generated based on Rep and Cap protein sequences from new pecoviruses detected in patients with unexplained diarrhea in Peru, Nicaragua, and Chile and other genetically related CRESS-DNA genomes. The scale indicates amino acid substitutions per position. C. Nucleotide sequence similarity among the new pecoviruses. The Rep and Cap locations are shown

Highly divergent small circular DNA genomes

During the initial viral metagenomics analysis, we also found eight sequence reads from two sample pools (four reads per pool) showing similarities to a chimpanzee-feces-associated CRESS-DNA genome (ChiSCV-GM488; GenBank GQ351272). Using iPCR, we determined the complete sequences of these two genomes (Fig. 2). We provisionally called these and related viruses “smacoviruses” (SmaCV2 and SmaCV3) [14]. The genomes of SmaCV2 (2,601 bases; GenBank KT600068) and SmaCV3 (2,541 bases; GenBank KT600069) also contained two major ORFs encoding Rep and Cap proteins in opposite orientation. The Rep (278 aa) and Cap (369 aa) of SmaCV2 showed their highest identities, 57 % and 50 %, to the Rep of chimpanzee-feces-associated ChiSCV-GM488 (GenBank GQ351272) and the Cap of porcine-feces-associated SmaCV6 (GenBank KJ577819), respectively. The Rep (267 aa) of SmaCV3 showed 95 % identity to that of the captive-chimpanzee-feces-derived smacovirus SF2 (GenBank KP233190), while their Cap proteins were 74 % identical. A large stem-loop containing a conserved nonamer (NANT(A/G)TTAC), a small adjoining stem-loop (TAAA), rolling-circle replication motifs I-III, and ATP-dependent helicase motifs A-C [19] were detected in SmaCV2 and -3. Phylogenetic analysis showed SmaCV2 and -3 to be distinct from the other recently described human-feces-associated smacoviruses (Fig. 2).

Nested PCR screening did not find additional SmaCV2-positive samples among the 58 Peruvian fecal samples. SmaCV3 DNA was amplified by nested PCR from another 10 Peruvian fecal samples, yielding detection rates in Peru of 1.7 % (1/58) for SmaCV2 and 19 % (11/58) for SmaCV3. None of the fecal samples from children with diarrhea of unknown etiology from South and Central America, including Nicaragua (n=51) [2], Brazil (n=170) [17] and Venezuela (n=40) [29], were SmaCV2 or -3 positive by PCR.

Individual metagenomics analysis of CRESS-DNA containing fecal samples

To investigate whether other viral pathogens were present in the pecovirus- and smacovirus-containing fecal samples from diarrheic patients, five PCR-positive samples from which whole CRESS-DNA genomes were derived were individually analyzed by deep sequencing. As expected, the PeCV sequences were found in the individual feces from Peru (4,704 of 6,971,262 total reads), Nicaragua (537 of 4,704,310 total reads) and Chile (32,603 of 13,458,436 total reads). The following enteric pathogens were also found in the PeCV-positive samples: group A rotavirus (150,301 reads) in the Peruvian sample, enterovirus A (124 reads) in the Nicaraguan sample and enterovirus B (99,112 reads) in the Chilean sample. Deep sequencing also confirmed the presence of SmaCV2 (100 of 448,892 total reads) and SmaCV3 (16 of 770,660 total reads). The SmaCV2- and SmaCV3-positive feces from Peru both showed the presence of nucleic acids from picobirnavirus (134 and 948 reads). The presence of known enteric pathogens indicates that the diarrhea experienced by the sample donors could be explained by these mammalian viruses rather than the CRESS-DNA viruses.

Fig. 2

New smacoviruses. A. Genome organizations and stem-loop structures. B. Phylogenetic trees generated based on Rep and Cap protein sequences from smacoviruses detected in Peruvian patients with unexplained diarrhea and other genetically related smacoviruses. The scale indicates amino acid substitutions per position. C. Nucleotide similarity between Peruvian smacoviruses and chimpanzee smacovirus. The Rep and Cap locations are shown

Discussion

Small circular ssDNA viral genomes were genetically characterized in fecal samples from cases of diarrhea of unknown etiology. We provisionally called these viruses “pecoviruses” and “smacoviruses” [14]. Pecoviruses currently include five distinct genomes from the feces of humans, seals, pigs and dromedaries. The first human-feces-associated PeCV was reported in a patient during an outbreak of acute gastroenteritis in the Netherlands (VS6600022) [27]. Other related genomes were found in feces from asymptomatic mammals [4, 24, 30]. Here, we also report PeCV genomes in human feces from Peru, Nicaragua and Chile, while diarrhea samples from Brazil and Venezuela were PCR negative. All PeCVs shared a similar genome organization, but the conserved nonamer located in a stem-loop differed between human-feces-associated PeCVs (containing the nonamer TTTTATGAG) and animal-feces-derived PeCVs (containing the canonical nonamer TAGTATTAC). The genetic characterization of three related PeCV genomes in human feces from Peru, Nicaragua and Chile plus the previously reported Dutch genome showed highly divergent capsid genes, indicating the presence of diverse PeCVs in human feces. Sequencing of SmaCV2 and -3 also indicated that human-feces-associated SmaCVs are highly diverse, and when compared to other smacoviruses, they exhibited a range of stem-loop structures.

Known or potential enteric viral pathogens were also detected in these diarrheic donors’ feces, which may have played more-direct roles in these patients’ diarrhea. As the cell tropism of these CRESS-DNA genomes remains unknown, so do their roles, if any, in these patients’ symptoms. Detection of these genomes in feces could reflect replication in human gut cells, passive transit of viruses in the food consumed by these individuals, or conceivably, even replication in gut-residing protozoa, fungi, or bacteria.