The family Circoviridae comprises small, icosahedral, non-enveloped viruses that infect eukaryotic organisms, including both invertebrate and vertebrate animals. The family is divided into two genera, Circovirus and Cyclovirus [16]. In both genera, the circular single-stranded DNA (ssDNA) genome measures 1.7 to 2.1 kb and contains two open reading frames (ORFs), which encode the replication-associated protein (Rep) and the capsid protein (Cap). The orientation and strand-specificity of these ORFs are key features that distinguish members of the two genera [3, 16]. Of interest, fossil circoviral elements, mainly restricted to the rep gene (or its non-functional derivative), have been identified integrated into eukaryotic host genomes [5, 8]. Differentiating these integrated fossil elements from replication-competent circoviruses and circovirus-like agents is critical for a better understanding of viral genome biology, ecology, possible disease associations, and, ultimately, virus taxonomy.

Circoviruses have been identified, with or without disease, in several bird species including parrots, pigeons, ravens, ducks, finches, and chickens [11, 12, 15, 17, 18, 19, 20]. In contrast, chickens are the only bird species in which cycloviruses have been identified [11, 12]. We recently conducted an ecological survey among wild birds to identify potential reservoirs of circoviruses pathogenic to domestic poultry. One virus whose genome sequence we determined could be classified in the genus Cyclovirus and showed close genetic relatedness to cyclovirus-like partial rep sequences amplified from the feces of a healthy Tunisian child and to a cyclovirus-like partial rep sequence detected in honey bees in Hungary [11, 13]. These earlier reports did not clarify whether the novel cyclovirus-like sequences are integrated genomic elements or parts of the genome of replication-competent exogenous viruses. We therefore attempted whole genome sequencing of a wild bird-origin cyclovirus-like agent.

For this study, cloacal swab specimens were resuspended in 1 ml of PBS buffer. Nucleic acid was extracted using the Direct-zol RNA MiniPrep Kit (Zymo Research), omitting the DNase treatment. A pan-viral degenerate primer set targeting the circoviral rep gene was used in a nested screening PCR assay [11, 13]. PCR mixtures (25 μl final volume) contained 1x DreamTaq Green buffer, 200 μM dNTP mix, 200 nM primers, 0.625 U DreamTaq DNA Polymerase (Thermo Fisher Scientific), and 1 μl of nucleic acid template. The cycling protocol used for the nested PCR was as follows: initial denaturation at 95 °C for 3 min; 40 cycles of denaturation at 95 °C for 30 s, annealing at 52 °C (first round) or 56 °C (second round) for 30 s, and extension at 72 °C for 1 min; and a final extension at 72 °C for 10 min [13]. The second-round PCR product was extracted from the gel slice with the Geneaid Gel/PCR DNA Fragments Extraction Kit and was directly sequenced using the BigDye Terminator v1.1 Cycle Sequencing Kit (Thermo Fisher Scientific) on an ABI PRISM 3100-Avant Genetic Analyzer.
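The programmed hold times of this cycling protocol can be tallied with a short sketch. It is illustrative only: the `Step` class and the function names are hypothetical helpers, not part of the published method, and temperature ramping between steps is ignored, so actual run times on a thermocycler will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One programmed hold of a PCR protocol (hypothetical helper type)."""
    label: str
    temp_c: float
    seconds: int


def programmed_seconds(initial, cycle, n_cycles, final):
    """Sum the programmed hold times; ramping between steps is ignored."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + n_cycles * per_cycle + final.seconds


def nested_round(anneal_c):
    """Build one round of the nested PCR as described in the text."""
    initial = Step("initial denaturation", 95, 3 * 60)
    cycle = [
        Step("denaturation", 95, 30),
        Step("annealing", anneal_c, 30),  # 52 C in round 1, 56 C in round 2
        Step("extension", 72, 60),
    ]
    final = Step("final extension", 72, 10 * 60)
    return initial, cycle, 40, final


first_round_s = programmed_seconds(*nested_round(52))
second_round_s = programmed_seconds(*nested_round(56))
print(first_round_s / 60, second_round_s / 60)  # minutes of programmed holds
```

Under these assumptions, each round of the nested PCR amounts to 5,580 s (93 min) of programmed holds, since the two rounds differ only in annealing temperature.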

A total of 16 cloacal swab samples collected from waterfowl species, including seven samples from mallard duck (Anas platyrhynchos), seven samples from lesser white-fronted goose (Anser erythropus) and two samples from great crested grebe (Podiceps cristatus), tested positive for the circoviral rep gene by PCR. All samples were collected during December 2013 around the town of Mezőberény in south-east Hungary. There are over a dozen ponds and lakes within and near the settlement, and a mid-size river (Kőrös) flows about 1-1.5 km north of the town; these are favorable conditions for many waterfowl species to inhabit the neighborhood.

One specimen collected from a mallard duck was selected for whole genome sequencing. The whole virus genome was amplified with back-to-back PCR primers (forward primer 5’-TCATCTCTTGAACTGGTGTGCC-3’ and reverse primer 5’-CTGTGACGCAATAACGAGGTC-3’) designed based on the sequence of the nested PCR product. The PCR mixtures (25 μl final volume) contained 1x Phusion Green HF buffer, 200 μM dNTP mix, 200 nM primers, 0.25 U Phusion DNA Polymerase (Thermo Fisher Scientific), and 1 μl of nucleic acid template. The cycling protocol used for the back-to-back PCR was as follows: initial denaturation at 98 °C for 30 s; 45 cycles of denaturation at 98 °C for 10 s, annealing at 57 °C for 30 s, and extension at 72 °C for 1 min; and a final extension at 72 °C for 10 min [13].
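The logic of back-to-back (inverse) priming on a circular template can be sketched as follows. This is a minimal illustration, not the authors' primer design workflow: the function names are hypothetical, the template is a made-up toy circle rather than the real genome, and the toy assumes the two primer sites abut exactly, which the text does not state. Only the two primer sequences are taken from the text.

```python
def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def find_circular(template, query):
    """0-based start of `query` on a circular template, or -1 if absent."""
    # Append the first len(query)-1 bases so matches spanning the origin are found.
    doubled = template + template[: len(query) - 1]
    return doubled.find(query)


def circular_amplicon_length(template, fwd, rev):
    """Length of the product primed by fwd (top strand) and rev (bottom strand)."""
    n = len(template)
    f_start = find_circular(template, fwd)
    r_start = find_circular(template, reverse_complement(rev))
    if f_start < 0 or r_start < 0:
        raise ValueError("primer site not found on the circular template")
    r_end = (r_start + len(rev)) % n   # first base past the reverse primer site
    span = (r_end - f_start) % n       # walk forward from fwd start to rev site end
    return span if span else n         # abutting primers copy the whole circle


# Primer sequences from the text; everything else below is a toy assumption.
FWD = "TCATCTCTTGAACTGGTGTGCC"
REV = "CTGTGACGCAATAACGAGGTC"
FILLER = "ACGT" * 10  # hypothetical stand-in for the rest of the genome

# Reverse primer site directly upstream of the forward primer site (back-to-back).
toy = reverse_complement(REV) + FWD + FILLER
# Rotate the circle so that one primer site spans the arbitrary origin.
toy_rotated = toy[10:] + toy[:10]

print(circular_amplicon_length(toy_rotated, FWD, REV), len(toy_rotated))
```

When the two primer sites abut, the product equals the full circle; any gap between the sites would be missing from the amplicon and would have to be covered by other data, such as the nested PCR product.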

The approximately 2 kb long amplicon generated by the back-to-back primers was initially processed for next-generation sequencing (NGS) on the Ion Torrent PGM instrument. We have previously described the procedures for library preparation, emulsion PCR, templated bead enrichment and sequencing of amplified PCR products, and applied the same strategy in this study [6]. De novo assembly, carried out with the Geneious software [9], gave a sequence scaffold for additional primer design. To confirm the sequence data obtained by semiconductor sequencing, we used these additional primers (data not shown) in a primer walking sequencing strategy on the same amplicon that served as template for NGS library preparation. Sequence reads were subsequently assembled into a single consensus genomic sequence using AliView [10], and the sequence was deposited in GenBank under the accession number KY851116.

The genome of the putative cyclovirus strain was 1,902 nt in length. Genome annotation [9, https://www.ncbi.nlm.nih.gov/orffinder/] identified the ORFs encoding Rep and Cap, with lengths of 981 nt and 759 nt, respectively. No introns were found in either ORF. The two ORFs were predicted to be located on complementary DNA strands (Fig. 1). The non-coding region between the 5’ ends of the rep and cap genes measured 158 nt, whereas the non-coding region between their 3’ ends was 4 nt long. The sequence of the nonanucleotide motif located upstream of the start codon of the cap gene was TAGTATTAC (Fig. 1). Collectively, these genomic features suggested that the duck-origin ssDNA virus can be classified within the newly proposed Cyclovirus genus.
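The reported lengths can be cross-checked with a few lines of arithmetic. The sketch below is illustrative: the constant and function names are hypothetical, and it assumes that the stated ORF lengths include the stop codon, which the text does not specify.

```python
GENOME_NT = 1902        # total genome length from the text
REP_NT = 981            # rep ORF
CAP_NT = 759            # cap ORF
INTERGENIC_5P_NT = 158  # between the 5' ends of rep and cap
INTERGENIC_3P_NT = 4    # between the 3' ends of rep and cap


def layout_is_consistent():
    """True if the two ORFs and two non-coding regions tile the circle exactly."""
    parts = REP_NT + CAP_NT + INTERGENIC_5P_NT + INTERGENIC_3P_NT
    return parts == GENOME_NT


def codon_count(orf_nt):
    """Number of codons in an ORF; an intact ORF must be a multiple of 3 nt."""
    if orf_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    return orf_nt // 3


# Assumption: ORF lengths include the stop codon, so residues = codons - 1.
rep_residues = codon_count(REP_NT) - 1
cap_residues = codon_count(CAP_NT) - 1
coding_fraction = (REP_NT + CAP_NT) / GENOME_NT

print(layout_is_consistent(), rep_residues, cap_residues, round(coding_fraction, 3))
```

The four segments sum to 1,902 nt, so the two ORFs and the two non-coding regions account for the whole circular genome, with about 91.5% of it coding.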

Fig. 1

Genomic organization of the novel duck associated cyclovirus 1, showing the rep and cap genes and the nonanucleotide motif

This finding was confirmed by phylogenetic analysis. The PhyML software was used to infer cyclovirus phylogeny [7]. The maximum likelihood algorithm with the GTR+G+I+F substitution model was selected, and SH-like support was chosen to validate tree topology (Fig. 2). Pairwise identities from the whole genome alignment were calculated with the Sequence Demarcation Tool v1.2 [14] using the MUSCLE alignment algorithm [4]; sequence identities to other cycloviruses ranged from 54.5% (FeACyV-1) to 60.6% (DfACyV-3) (Fig. 3). These values fell below the 80% species demarcation threshold [16]. Further sequence analyses showed a close genetic relationship between the duck-origin cyclovirus sequence and the human-origin TN4 (nt, 98%) and the honey bee-origin hb10 (nt, 100%) rep sequences along a ~470 nt fragment (nt 1240-1710) within the ORF encoding the Rep protein (data not shown). These data, together with current classification criteria, strongly suggest that this duck-origin cyclovirus isolate represents a novel species within the genus. Given the putative broad host range of the identified cyclovirus(es), however, assigning a host species may be challenging at present. It is unclear whether the duck cyclovirus-related sequences detected in other host species represent (i) viral genetic elements integrated into the respective animal genomic DNA, (ii) exogenous multi-host virus strains, (iii) exogenous viruses of as yet unidentified organisms that are capable of colonizing various invertebrate and vertebrate animals, or (iv) viruses that are ingested with water or food, where many newly described viruses possessing circular ssDNA genomes can accumulate [1, 2], and that merely pass through the intestine of the animals. The recent classification proposal for the Circoviridae, which provides new taxonomic criteria for cycloviruses, does not satisfactorily address the host origin of characterized strains. Nonetheless, given that the first representative full genome sequence originated from a wild duck specimen, we propose to introduce the name duck associated cyclovirus 1 for this isolate (DuACyV-1) and “Duck associated cyclovirus 1” for the species.
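The species demarcation step can be illustrated with a small sketch. It is not the Sequence Demarcation Tool itself: the function names are hypothetical, the aligned sequences are toy strings, identity is assumed to be counted only over alignment columns where neither sequence has a gap, and treating exactly 80% as "same species" is likewise an assumption. Only the 80% threshold and the 54.5% and 60.6% identities come from the text.

```python
SPECIES_THRESHOLD = 0.80  # genome-wide pairwise identity threshold from the text


def pairwise_identity(aligned_a, aligned_b, gap="-"):
    """Fraction of matching bases over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return matches / compared


def same_species(identity, threshold=SPECIES_THRESHOLD):
    """Assumption: identities at or above the threshold count as one species."""
    return identity >= threshold


# Toy alignment (hypothetical): 9 gap-free columns, 8 of which match.
toy_identity = pairwise_identity("ATGC-ATGCA", "ATGCTATGGA")

# Lowest and highest identities to known cycloviruses reported in the text.
reported = {"FeACyV-1": 0.545, "DfACyV-3": 0.606}
novel_species = all(not same_species(v) for v in reported.values())
print(round(toy_identity, 3), novel_species)
```

Because even the highest reported identity (60.6%) lies well below 80%, the conclusion does not depend on how the boundary case is handled.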

Fig. 2

A maximum likelihood phylogenetic tree of representative cycloviruses’ whole genome sequences. Branches with SH-like support < 80% are not shown. Scale bar represents nucleotide substitutions per site. Sequence names include GenBank accession numbers followed by viral species using the acronyms introduced by Rosario et al. [16]; the novel duck origin cyclovirus is highlighted

Fig. 3

A genome-wide pairwise identity matrix for representative cyclovirus strains. Sequence names include GenBank accession numbers followed by viral species using the acronyms introduced by Rosario et al. [16]; the novel duck origin cyclovirus is highlighted

In summary, this paper is the first to report a cyclovirus in a wild bird species. Our study illustrates how little is known about the ecology and epidemiology of cycloviruses, a gap that needs to be addressed in future research and taxonomy proposals.