The majority of viruses on Earth are proposed to have double-stranded DNA (dsDNA) genomes (Hendrix et al. 1999). This is not surprising, because all known cellular life uses dsDNA as its genetic material and all extant viruses require cellular hosts. All viruses depend on host translation machinery, regardless of the type of nucleic acid in their genomes. Most dsDNA viruses also depend on host transcription machinery, and many on host replication machinery. Because their genomes are made of the same material as those of their hosts, dsDNA viruses can readily use host transcription and replication machinery. In fact, many fundamental aspects of cellular DNA replication, transcription, and translation were elucidated by studying viruses with dsDNA genomes (e.g., Yaniv 2009; Watson et al. 2014). Most dsDNA viruses appear to evolve slowly, often in concert with their hosts (Duffy et al. 2008; Holmes 2009). Some authors theorize that the use of dsDNA as genetic material arose in viruses (Forterre 2002; Villarreal and DeFilippis 2000).

Viruses with RNA genomes are also highly successful and are infamous as the causative agents of many human diseases. Many authors consider RNA viruses to be remnants of the “RNA World”, possibly direct descendants of the original RNA replicators (e.g., Koonin et al. 2006; Forterre 2006; Jalasvuori and Bamford 2008; Holmes 2011). Extant RNA viruses usually have large populations, small genomes, and high mutation rates, and thus evolve rapidly (Holmes 2009). The astounding diversity and evolution of RNA viruses have been extensively reviewed elsewhere (Holmes 2009; King et al. 2011).

Some RNA viruses can use their genomes directly for translation without prior genome replication. These are the “positive strand” RNA viruses. Many animal and plant viruses have positive strand ssRNA genomes (King et al. 2011). However, using the same RNA as both genome and mRNA is inefficient: equal quantities of mRNA are made for structural and non-structural genes, even though many more structural proteins than non-structural proteins are required. There is also competition between use of the RNA as mRNA and genome replication. Most of the larger RNA viruses solve this problem with subgenomic RNAs, and some use multipartite genomes (Holmes 2009).

The “negative strand” RNA viruses package the RNA strand complementary to the coding strand as their genome. This strategy requires the virus genome to be transcribed into mRNA by a viral RNA-dependent RNA polymerase, which must enter the host cell together with the virus genome. This seemingly inefficient process can, however, be used to make large quantities of shorter mRNAs (e.g., for structural proteins) from portions of the negative strand genome, generally near its 3′ end.

Some viruses have dsRNA genomes (a negative and a positive strand) and thus require RNA helicases or replication to make mRNA. Some virus genomes even encode proteins on both the genome and the antigenome RNA (e.g., bunyaviruses; King et al. 2011). The replication of these dsRNA and “ambisense” viruses seems to be a modification of the mechanisms used by positive and negative strand RNA viruses and appears to combine the evolutionary advantages of both (King et al. 2011). These viruses are probably derived from either positive or negative strand RNA viruses (Holmes 2009).

The retroviruses are a special group of positive strand ssRNA viruses that use a reverse transcriptase (RT) to copy their genomes into dsDNA. This dsDNA is often integrated into the host genome (Acheson 2011), where it is replicated along with the host DNA. For virus reproduction, the integrated virus genome is transcribed and the resulting RNA is packaged into new virions. It is estimated that 8% or more of the human genome is retroviral in origin (Finnegan 2012). The integration strategy is clearly efficient for virus persistence (Villarreal 2004). Curiously, a number of viruses package dsDNA genomes but replicate through an RNA intermediate using reverse transcriptase. These viruses, such as the caulimoviruses and hepadnaviruses, appear to be closely related to the retroviruses (King et al. 2011).

The last class of virus genomes, and the focus of this manuscript, comprises genomes composed of single-stranded DNA (ssDNA). Most ssDNA viruses isolated to date have eukaryotic hosts (King et al. 2011), but there are ssDNA viruses that infect bacteria (Fane et al. 2006) and archaea (Mochizuki et al. 2012; Pietilä et al. 2009). Most ssDNA viruses have circular genomes (King et al. 2011). Moreover, circular ssDNA virus genomes have been detected in many ecosystems, from volcanic hot springs and the open ocean to soils, air, and animal feces (Diemer and Stedman 2012; Delwart and Li 2012; Rosario et al. 2012b). Quantifying the prevalence of ssDNA viruses has been challenging because they lack universally conserved genes, their genomes are small, and many metagenomic studies use strand-displacement amplification, which is known to preferentially amplify circular ssDNA genomes (Kim and Bae 2011) and thus skews estimates of their relative abundance.

Recently, it has become clear that ssDNA viruses are not only diverse but probably also an ancient virus lineage. The discovery of genes clearly derived from ssDNA viruses in the genomes of many different host organisms has led to the estimate that ssDNA viruses are at least 40–50 million years old, if not much older (Belyi et al. 2010; Liu et al. 2011; Koonin et al. 2006). Single-stranded DNA viruses have not been studied as extensively as dsDNA and RNA viruses, probably because they are not known to cause disease in healthy humans. However, ssDNA viruses cause disease in pigs (PCV-2) and birds (Beak and Feather Disease Virus), and are common plant pathogens (Martin et al. 2011; King et al. 2011). Moreover, ssDNA viruses may be involved in the emergence of novel viral pathogens (Firth et al. 2009).

Replication of ssDNA viruses is baroque. Viruses with ssDNA genomes must first synthesize the strand complementary to their genome before transcription and translation can take place. This extra DNA replication step seems to offer no clear evolutionary advantage over the strategies of dsDNA and RNA viruses. Some, if not most, ssDNA viruses replicate very rapidly, and most have very small genomes (Fane et al. 2006; King et al. 2011). Many ssDNA viruses have very high mutation rates, some as high as those of ssRNA viruses (Duffy et al. 2008). Homologous recombination rates in ssDNA virus genomes are also very high, and genome rearrangements are common (Lefeuvre et al. 2009; Martin et al. 2011). This combination of extremely rapid replication, high reproductive rates, and high rates of mutation and homologous recombination indicates that ssDNA viruses are at an evolutionary optimum (Duffy et al. 2008), which might partly explain their success.

A further clue, and a considerably more surprising one, was the discovery of ssDNA virus genomes that appear to have arisen by non-homologous recombination between DNA and RNA viruses. The first recognized example of this apparent RNA capture by a ssDNA virus was reported in April 2012 by Diemer and Stedman (2012). A sequence assembly from a virus metagenome from Boiling Springs Lake (BSL), a high-temperature, high-altitude, low-pH lake, contained contiguous DNA sequences encoding genes similar to both RNA virus capsid protein (CP) genes and ssDNA virus initiator of replication (rep) protein genes. The complete genome sequence was determined, and the presence of the virus genome in BSL was verified by PCR. The putative CP gene from this genome encodes a protein whose amino acid sequence is very similar to the CPs of two different RNA viruses that infect plant-pathogenic oomycetes, Plasmopara halstedii-A virus and Sclerophthora macrospora-A virus. By stark contrast, the putative rep protein was most similar to the rep proteins of animal circoviruses (ssDNA viruses). This virus genome was provisionally named BSL–RNA–DNA Hybrid Virus (BSL-RDHV; Fig. 1a). Soon thereafter, another ssDNA virus genome was reported from a dragonfly metavirome; it contained a circovirus-like rep protein gene and a putative CP gene with limited similarity to the CP gene of the ssRNA virus satellite tobacco mosaic virus, but no detectable similarity to the BSL-RDHV CP gene (Rosario et al., 2012a; Fig. 1c). A ssDNA virus genome associated with Daphnia mendotae cultures from the Finger Lakes in upstate New York contains rep and CP genes that are both very similar to the BSL-RDHV genes, but in ambisense orientation (Hewson et al. 2013; Fig. 1d). Finally, a consensus sequence from a viral metagenome from Tampa Bay contains not only rep and CP genes very similar to those of BSL-RDHV but also a similar gene orientation (McDaniel et al. 2013; Fig. 1b).
Moreover, many DNA virus metagenomes contain partial sequences of CP genes similar to that of the BSL-RDHV (Diemer and Stedman 2012; Lopez-Bueno et al. 2009; Whon et al. 2012; unpublished observations).

Fig. 1

Complete genomes of “RNA–DNA hybrid viruses” (RDHV). All genomes are shown to scale and ORFs are indicated as arrows. Panel a: BSL-RDHV, Boiling Springs Lake RNA–DNA Hybrid Virus, 4,089 nucleotides (Diemer and Stedman 2012). Panel b: TB_AmbHV, Tampa Bay Ambient Hybrid Virus, 3,587 nucleotides (McDaniel et al. 2013). Panel c: DfCycV, Dragonfly cyclicusvirus, 1,831 nucleotides (Rosario et al., 2012a). Panel d: DMClaHV, Daphnia mendotae DNA–RNA hybrid virus, 3,648 nucleotides (Hewson et al. 2013). Black arrows indicate ORFs whose products are most similar to rep proteins from circoviruses. Grey/green arrows indicate ORFs whose products are similar to capsid proteins from RNA viruses. Unfilled arrows represent ORFs with products of unknown function. The “lollipop” structure at the top of each genome map indicates the conserved stem-loop structure and putative origin of replication for all genomes. N.B. The CP gene of DfCycV is not similar to those of the other known RDHVs.

The most parsimonious explanation for the presence of these different RNA virus-like CP genes, in different orientations, in multiple ssDNA virus genomes is that the CP gene was acquired from a co-infecting ssRNA virus. Acquisition of a rolling circle replicase protein mRNA by an RNA virus, followed by reverse transcription of the whole virus genome, cannot be ruled out (Krupovic et al. 2009; 2012), but given the different genome orientations and the conservation of the viral rep protein sequences, this scenario seems less likely than RNA capture (see Saccardo et al. 2011; Liu et al. 2011). Moreover, acquisition of a novel capsid protein would probably be an evolutionary advantage to the virus, owing to a possibly expanded host range, avoidance of host defenses, and access to novel vectors for virus transport between hosts. However, it could also be a disadvantage if hosts had already developed defenses against the RNA virus CP. The advantage to an RNA virus of acquiring a rolling circle replicase protein and switching to a DNA genome is less clear. In an “RNA world”, a transition to a DNA genome could have been advantageous as a way to escape host defenses (Forterre 2006). DNA viruses could potentially use extant host resources better than RNA viruses, but current RNA viruses appear to be very successful.

The mechanism of replication of circular ssDNA viruses is unique (Fig. 2). It has been best studied in porcine circoviruses (PCV; Cheung 2012), plant geminiviruses (Gutierrez 2002), and the microviruses of bacteria (Fane et al. 2006); the latter seem to be very distantly related to the animal and plant ssDNA viruses. Most circular ssDNA viruses have a conserved stem-loop structure in their genomes, which presumably forms in the packaged ssDNA genome and is released on infection. In the first stage of viral replication, the ssDNA genome is copied into dsDNA by host mechanisms and proteins (Fig. 2a). For bacterial ssDNA viruses, primers appear to be formed by the cellular DNA primase (Fane et al. 2006). However, the source of the primer for the antigenome strand of animal and plant circular ssDNA viruses is not clear (Cheung 2012; Gutierrez 2002). Once the genome is double stranded, the stem-loop structure seems to play a critical role in further replication. It is not clear whether this region of the dsDNA replication intermediate (Fig. 2b) forms a cruciform structure or two parallel stem-loop structures, or simply remains double stranded. A “melting pot” model for this structure has been proposed for PCV replication (Cheung 2012) to account for observed template switching and the presence of a recombination hot spot at this location (Lefeuvre et al. 2009; Cheung 2012).

Fig. 2

Schematic diagram of the replication model for ssDNA viruses, indicating possible locations for mRNA incorporation. Panel a: Step 1 of DNA replication, from single-stranded (ss) to double-stranded (ds) DNA. Black lines represent the original genome and grey/green lines the newly replicated genome. The stem-loop structure (not to scale) is represented at the top of each genome. Panel b: Step 2 of DNA replication, leading to ssDNA genomes. The arrowhead indicates the rep protein. Dashed lines represent newly replicated DNA. Roman numerals represent intermediate steps in genome replication. Asterisks represent steps at which RNA could be incorporated into the replicating genome.

In any case, the hairpin (stem-loop) structure is critical for the initiation of single-stranded DNA synthesis by a so-called “rolling circle” mechanism. To start ssDNA replication, the virus rep protein nicks the DNA at the end of the loop within the stem-loop structure (Fig. 2b-I) and simultaneously forms a covalent link between a conserved tyrosine residue and the 5′ end of the non-template strand (Fig. 2b-II). This reaction appears to be similar to those catalyzed by type I topoisomerases and tyrosine recombinases (Yang 2010). The 3′ end generated by this cleavage serves as a primer for cellular DNA polymerases, which then copy the circular template strand (Fig. 2b-III and IV). After a complete ssDNA copy of the genome has been synthesized, rep reverses its activity and ligates the displaced non-template strand, forming a complete circular ssDNA genome and regenerating a nick for further genome replication (Fig. 2b-V). Often, multiple concatemeric genome copies are made before ligation (Martin et al. 2011; not shown in figure), providing ample opportunities for homologous recombination.
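The strand bookkeeping of this rolling circle mechanism can be sketched as a toy string model. This is a minimal, hypothetical illustration under strong simplifying assumptions: sequences are plain strings, the nick is an arbitrary index rather than a real stem-loop motif, enzyme chemistry is ignored, and the name `rolling_circle` is invented for this example. It shows only that nicking, extension around the circular template, and ligation of the displaced strand release unit-length copies of the genome, including when several turns first produce a concatemer.

```python
# Toy string model of rolling circle replication (hypothetical sketch).
# Watson-Crick pairing used to "copy" a template base.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def rolling_circle(plus: str, nick: int, turns: int = 1) -> list[str]:
    """Return the unit-length circular ssDNA genomes released.

    plus  -- virion-sense strand of the dsDNA replicative form, written as a
             linear string whose two ends are joined (circular)
    nick  -- index where the (modelled) rep protein nicks the plus strand
    turns -- full turns of synthesis before cleavage/ligation; values > 1
             mimic a concatemeric intermediate
    Each released circle is written starting at the nick site.
    """
    n = len(plus)
    if not 0 <= nick < n:
        raise ValueError("nick must fall within the genome")
    # Complementary (minus) strand stored in plus-strand coordinates,
    # so minus[i] is the base paired with plus[i].
    minus = [PAIR[base] for base in plus]
    # Steps I-II: after nicking, the rep-bound 5' end of the displaced strand
    # starts at the nick; the free 3' end upstream of it acts as the primer.
    displaced = plus[nick:] + plus[:nick]
    # Steps III-IV: the polymerase walks around the circular template,
    # adding the partner of each template base and peeling off older strand.
    synthesized = "".join(PAIR[minus[(nick + k) % n]] for k in range(turns * n))
    # Everything except the last turn's new strand has been displaced.
    concatemer = displaced + synthesized[: (turns - 1) * n]
    # Step V: cut at each regenerated origin and ligate each unit into a circle.
    return [concatemer[i * n:(i + 1) * n] for i in range(turns)]


if __name__ == "__main__":
    print(rolling_circle("ATGCCGTA", nick=3, turns=2))  # ['CCGTAATG', 'CCGTAATG']
```

Because each released unit is identical to the parental strand (rotated to start at the nick), the model makes explicit why concatemers, which join several such units end to end, give homologous sequences many chances to pair and recombine.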

The mechanism of circular ssDNA virus replication outlined above provides tantalizing hints about how RNA could have been captured to generate the RDHV-like viruses. The capsid protein mRNA is the most abundant mRNA of most RNA viruses, making it the most likely to be captured (Acheson 2011). Moreover, the capsid confers host specificity, determines which intermediate transmission vectors can carry the virus productively, and is critical for avoiding host defenses (Acheson 2011; Saccardo et al. 2011). Thus, a capsid mRNA would be the most likely of any captured RNA to be maintained by selection.

There are multiple steps in ssDNA virus replication at which RNA could be incorporated. First, the primer used for synthesis of the second strand of the virus genome (Fig. 2a) could be an mRNA from a co-infecting RNA virus. Similarly, an RNA virus mRNA could serve as a primer for rolling circle replication (Fig. 2b-II). Many RNA virus mRNAs have secondary structures at their 5′ and/or 3′ ends, which could mimic the stem-loop structure found in ssDNA viruses and serve as a substrate for rep protein binding, cleavage and/or ligation. Either of these mechanisms would generate a “head to tail” arrangement of the rep and CP genes, similar to that found in the BSL-RDHV, TB_AmbHV and DfCycV genomes (Fig. 1a, b and c). A different mechanism, or a subsequent gene inversion, can be invoked to explain the ambisense or “head to head” arrangement of rep and CP genes observed in DMClaHV (Fig. 1d). In this scenario, the viral rep protein could attach to the 5′ end of an mRNA instead of the 5′ end of the template strand DNA in step 2 of ssDNA replication (Fig. 2b-III). Finally, when the rep protein ligates the replicated ssDNA to generate a full-length circular genome, it could join the RNA virus mRNA instead of DNA (Fig. 2b-V). All of these scenarios require ligation of the “free end” of the captured mRNA to the virus DNA. It is not clear whether the rep protein is capable of this activity or whether cellular ligases perform RNA–DNA ligation. In theory, this process would act on any RNA, but only a CP gene is likely to be maintained by selection. Discovery of a ssDNA virus genome containing a trapped cellular mRNA would strongly support these mechanisms.
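The orientation argument in the preceding paragraph can be checked with a toy Python sketch. It is a hypothetical illustration only: the sequences are made up and far shorter than real rep or CP genes, genomes are treated as linear strings (genes spanning the arbitrary start of the circle are ignored), and the helper names are invented for this example. Joining a captured sense sequence directly into the genome strand leaves the CP gene on the same strand as rep (head to tail), whereas placing it in the opposite sense, as a stand-in for the alternative mechanism or a later inversion, yields an ambisense (head to head) arrangement.

```python
# Toy model of gene orientation after RNA capture (hypothetical sketch).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement: the opposite strand read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def insert_captured(genome_plus: str, captured: str, at: int,
                    inverted: bool = False) -> str:
    """Join a reverse-transcribed capture into the plus strand at `at`.

    inverted=False: the captured sense sequence is joined directly into the
    plus strand (e.g. mRNA used as a primer or ligated in place of DNA).
    inverted=True: the segment ends up in the opposite sense (stand-in for an
    alternative joining mechanism or a later gene inversion).
    """
    segment = revcomp(captured) if inverted else captured
    return genome_plus[:at] + segment + genome_plus[at:]


def strand_of(genome_plus: str, gene: str) -> str:
    """Report which strand carries `gene` in its coding sense."""
    if gene in genome_plus:
        return "+"
    if gene in revcomp(genome_plus):
        return "-"
    raise ValueError("gene not found on either strand")


def arrangement(genome_plus: str, gene_a: str, gene_b: str) -> str:
    same = strand_of(genome_plus, gene_a) == strand_of(genome_plus, gene_b)
    return "head to tail" if same else "head to head (ambisense)"


if __name__ == "__main__":
    REP = "ATGAAACCCGGGTAA"  # hypothetical rep ORF
    CP = "ATGTTTGGGCCCTGA"   # hypothetical captured CP ORF
    genome = "GGATCC" + REP + "TTTT"
    print(arrangement(insert_captured(genome, CP, len(genome)), REP, CP))
    # head to tail
    print(arrangement(insert_captured(genome, CP, len(genome), inverted=True), REP, CP))
    # head to head (ambisense)
```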

After RNA capture, two more steps are required to generate an RDHV-like virus. First, the captured mRNA must be reverse transcribed into DNA. Many, if not all, cells have reverse transcriptase (RT) activity (Finnegan 2012), so a cellular RT seems more likely than co-infection with a third, RT-encoding virus. The capture of an RNA virus CP gene would, after ligation and reverse transcription, yield two CP genes: one from the original ssDNA virus and one from the RNA virus. Presumably, the ssDNA virus CP gene was subsequently lost by recombination in current RDHV-like viruses, which have probably diverged considerably since the original capture event. The limited capacity of the viral capsid provides strong selection for smaller genomes (Fane et al. 2006), but it is equally likely that the original ssDNA virus capsid gene could be maintained after recombination. An alternative to the proposed RNA capture is that co-infecting viral mRNAs were reverse transcribed before being integrated by recombination into the dsDNA form of a ssDNA virus genome (Krupovic 2012). However, this scenario predicts that capsid proteins similar to those of RNA viruses should also be found in dsDNA viruses, which has not been observed to date.

Recombination between ssDNA and RNA virus genes may seem highly unlikely, but given the extremely large number of viruses, estimated at an astronomical 10³¹ on Earth (Hendrix et al. 1999), even extremely infrequent events will occur with non-negligible probability. Moreover, non-homologous recombination, or “moron acquisition”, is an accepted mechanism of gene acquisition in dsDNA viruses (Hendrix et al. 1999). It is thus critical to measure the potentially very low RNA ligase and endonuclease activities of ssDNA virus rep proteins to determine how frequent such RNA capture events are. Knowing this frequency may be important for detecting new viruses and predicting emerging infectious diseases (Morens and Fauci 2012).
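The scale argument in the preceding paragraph can be made concrete with a back-of-envelope calculation. This is a minimal sketch assuming that each of N viruses independently has the same small probability p of a capture event; both the independence assumption and the value of p used below are hypothetical illustrations, not measurements.

```latex
% X = number of capture events among N viruses, each with probability p
\mathbb{E}[X] = Np, \qquad
\Pr(X \ge 1) = 1 - (1 - p)^{N} \approx 1 - e^{-Np}
% Illustration with N = 10^{31} and a hypothetical p = 10^{-28}:
% Np = 10^{3}, so \Pr(X \ge 1) \approx 1 - e^{-1000} \approx 1
```

The approximation uses (1 - p)^N ≈ e^{-Np} for small p; once Np exceeds a few, at least one event is almost certain. N here counts viruses present at one time rather than replication events over time, so the sketch says nothing about rates.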

These putative RNA capture mechanisms may not be limited to ssDNA viruses. Recently, a number of sequences derived from RNA virus genomes have been discovered in dsDNA host genomes (Horie and Tomonaga 2011; Feschotte and Gilbert 2012; Cui and Holmes 2012). The mechanism or mechanisms of this integration are not known, but ssDNA virus rep proteins, whose genes are also known to be integrated in many host genomes, could possibly be involved (Liu et al. 2011).

Summary and Future Directions

Apparently due to their unique replication mechanism, ssDNA viruses can acquire RNA virus capsid protein genes. This capacity could give them an evolutionary advantage and strengthens the argument that ssDNA viruses are at an optimum in virus evolutionary space. Moreover, their ability to capture RNA may make ssDNA viruses particularly evolutionarily robust. Access to extremely diverse sequence space may compensate for, or even explain, the cumbersome process required for ssDNA virus replication (Fig. 2). It is critical to determine the actual abundance of ssDNA viruses, both those with RNA virus-like capsid proteins and those with canonical ssDNA virus capsid proteins. In addition, complete genomes of more ssDNA viruses, particularly those with RNA virus-like capsid proteins, should be determined; these genomes may help elucidate the mechanisms of RNA capture. It is also distinctly possible that some ssDNA viruses whose capsid proteins are not clearly related to those of other viruses, such as the recently proposed “Gemycircularviruses” (Yu et al. 2010; Rosario et al. 2012a), are related to as yet undiscovered RNA viruses. Finally, and most importantly, biochemical assays of ssDNA virus rep proteins with RNA substrates should be undertaken.