Introduction

Viruses are the most abundant biological entities, at least 10 times more numerous than bacteria, with counts of the order of 10⁷ and 10⁹ in marine waters and sediments, respectively, and they account for as much as 94 % of nucleic acid containing particles in the marine environment [18]. Viruses are known to infect hosts ranging from bacteria to mammals in the marine ecosystem [58]. The discovery of viral pathogens has been largely reactive, in the sense that new pathogens are usually discovered only when they are involved in disease or an epizootic. Until recently, virus discovery relied on methods such as purification of viruses from infected animals by density gradient ultracentrifugation and electron microscopy, followed by identification of their nucleic acids and development of molecular diagnostic techniques. Although successful, these methods are time consuming. Proactive characterization of putative viral pathogens from aquaculture species would be advantageous, especially for disease surveillance and management of aquaculture programmes. During the last decade, with the advancements made in molecular cloning, sequencing and bioinformatic tools, the structure of viral communities in marine biomes and near shore sediments, as well as novel viruses associated with infections in marine animals such as sea turtles and sea lions, have been described using metagenomic approaches [9, 14, 17, 43, 62, 63]. The term ‘metagenomics’ was coined by Handelsman in 1998 [35] and is defined as sequence based or function based cultivation independent analysis of the collective microbial genomes in a given environment. Metagenomic methods can be applied to study viruses in any system, including marine, terrestrial, and animal-associated environments. The first viral metagenomic study, reported a decade ago, described uncultured near shore viral communities using shotgun cloning and sequencing [17].
Since then, viral metagenomics has been used to describe viruses associated with cancerous tumors, nasopharyngeal samples, transplanted organs, blood and faeces of humans; terrestrial ecosystems such as deserts, prairie, rain forest soils, paddy soils and plants; extreme environments such as hypersaline environments, deep sea hydrothermal vents and hot springs; and marine and freshwater ecosystems [82]. This tool has not so far been harnessed in the aquaculture sector. The present article therefore provides an overview of virus discovery and viral diversity described in recent years using metagenomic approaches and discusses the application of this technology in aquaculture and fisheries.

Viruses Causing Diseases of Marine Animals Discovered Through Conventional Tools

Viruses are widespread in nature and a number of them have been implicated in disease and mortality in marine mammals, finfish, and shellfish. Morbilliviruses, RNA viruses belonging to the family Paramyxoviridae, are important pathogens of marine mammals [76], while herpesviruses have been reported to cause neoplastic diseases such as fibropapillomas in marine turtles [33]. The viruses causing disease of finfish in the marine environment include iridoviruses, which have been reported in over 140 different fish species worldwide [29]. Infectious salmon anaemia virus (ISAV), an enveloped negative sense single stranded RNA virus of the family Orthomyxoviridae [44], is a significant pathogen of salmon mariculture in Norway, Scotland, Canada and the United States. Another important virus of marine fish is the rhabdovirus involved in viral haemorrhagic septicemia (VHS), which is reported to cause epizootics in wild shoaling fish such as herring and mackerel and has been reported from over 50 species of fish [71]. In addition, marine finfish are also reported to be affected by herpesviruses [49] and nodaviruses [57]. Among other marine animals, molluscs are known to be affected by several viral diseases caused by iridoviruses, herpesviruses, birnaviruses and papova-like viruses, which have been reported to be responsible for economic losses in Europe and Japan [58]. The most important commercial aquaculture animals, the penaeid shrimp, are so far reported to suffer infections due to more than 20 viruses globally [20, 81]. Among these, white spot syndrome virus (WSSV), placed in a new family Nimaviridae [30], yellow head virus (YHV) [86] of the family Roniviridae [30] and Taura syndrome virus (TSV), a picornavirus [37], have been largely responsible for causing serious economic losses to shrimp aquaculture [31].
Other viruses have also been reported to cause infections in shrimp. These include parvoviruses such as infectious hypodermal and hematopoietic necrosis virus (IHHNV) [12], hepatopancreatic parvovirus (HPV) [52] and lymphoidal parvo-like virus (LPV) [65]; baculoviruses such as Baculovirus penaei, monodon-type baculovirus and baculoviral midgut gland necrosis type virus [52]; iridovirus [53]; togaviruses such as lymphoid organ vacuolization virus (LOVV) [13]; rhabdoviruses [59]; and spawner isolated mortality virus [32]. The number of viral pathogens from different geographical regions has been increasing with the reports of new diseases and syndromes such as infectious myonecrosis [67] and monodon slow growth syndrome [72].

Discovery of Novel Viruses of Marine Animals Using Metagenomics

A number of new viruses have been described recently in human and veterinary medicine, plant and marine sciences using viral metagenomics (Table 1). One such important discovery is a novel picornavirus (SePV-1) found in apparently healthy ringed seals (Phoca hispida), the most abundant mammal in the Arctic seas, hunted on the shore of the Beaufort Sea; its 3D polymerase sequence shares only 19.3–30.0 % mean protein sequence identity with those of other picornaviruses [43]. Sea turtle tornovirus 1 (STTV1) and California sea lion anellovirus (ZcAV), both sharing only limited sequence similarity to previously described viral genomes, have also been discovered recently [62, 63]. STTV1 was discovered in the fibropapilloma of a Florida green sea turtle; it has a single-stranded, circular genome of ~1,800 nucleotides and very low amino acid identity with chicken anemia virus [62]. ZcAV is an anellovirus discovered in the lung of a California sea lion that died in a respiratory-related mortality event; it has a single-stranded circular genome of 2,140 nucleotides, with only 35 % amino acid identity to feline anelloviruses in the ORF1 region [63]. Recently, a novel circovirus and two novel nodaviruses were discovered in asymptomatic wild shrimp (Penaeus duorarum) from Tarpon Springs, Florida, using metagenomic sequencing of DNA and RNA viruses purified from these shrimp [61]. The novel shrimp circovirus had a single-stranded DNA genome of 1,955 nucleotides and shared <50 % amino acid identity with any known virus in the GenBank database. All previously described circoviruses are reported to infect birds or pigs, and the shrimp circovirus represented a phylogenetic branch distinct from the known avian and porcine circoviruses, making it the first circovirus described in an invertebrate.
Two novel nodaviruses, which shared less than 60 % amino acid identity with known shrimp nodaviruses and likely represent a novel virus genus, were also identified in the shrimp [61]. In an aquaculture setting, the only work using random shotgun cloning and sequencing has been the description of Laem Singh virus (LSNV), associated with monodon slow growth syndrome of black tiger shrimp, by Sritunyalucksana and coworkers [72]. The new virus was reported to be an RNA virus showing significant deduced amino acid sequence similarity to the RNA-dependent RNA polymerases (RdRp) of viruses in the family Luteoviridae.

Table 1 Discovery of viruses in humans, terrestrial and marine animals, plants and insects and understanding viral ecology using metagenomics

Diversity of Viruses in Marine Ecosystem

Viral metagenomics began with the landmark paper by Breitbart et al. [17] describing viral diversity in seawater. Extrapolating from about 1,000 cloned sequences, they reported between 300 and 7,000 new viral types in seawater, of which 35 % were phages. They reported that the metagenomic sequences comprised repeat and mobile elements, bacteria, archaea and eukarya, that the majority of them included major families of dsDNA phages, and that over 65 % of the viral sequences generated were not significantly similar to known viruses. A similar study of dsDNA viruses in near shore sediments indicated much phylogenetic overlap with seawater bacteriophages and the presence of at least 10⁴ distinct genotypes per kilogram of sediment [14]. Subsequently, the Global Ocean Sampling (GOS) Expedition, initiated by Craig Venter in 2003, was one of the important projects exploring the marine metagenome on a larger scale (Table 1). Angly and coworkers [8], based on metagenomic analyses of 184 viral assemblages collected over a decade and representing 68 sites in four major oceanic regions, reported that 60–80 % of the metagenomic sequences were not similar to the viral sequences in the current GenBank databases. Global diversity was very high, presumably several hundred thousand species, and regional richness varied on a North–South latitudinal gradient. The marine regions had different assemblages of viruses. Cyanophages and a newly discovered clade of single-stranded DNA phages dominated the Sargasso Sea, whereas prophage-like sequences were most common in the Arctic. Most viral species were found to be widespread, and the difference between viral assemblages was attributed to variation in the occurrence of the most common viral species rather than to the exclusion of different viral genomes [8]. They reported that no RNA bacteriophages were detected, indicating that most marine bacteriophages have DNA genomes and that most hosts of marine RNA viruses may be eukaryotes.
Studies on RNA viruses in the ocean are relatively scanty. RNA viruses are known to infect marine organisms from bacteria to whales, but RNA virus communities in the sea remain less investigated. Using reverse-transcribed whole-genome shotgun sequencing, Culley and his colleagues [23] reported a diverse assemblage of previously unknown RNA viruses, including a broad group of marine picorna-like viruses and distant relatives of viruses infecting arthropods and higher plants. The occurrence of a diverse array of picorna-like viruses in the ocean was reported based on the analysis of conserved RdRp sequences amplified from marine virus communities [22]. Among the viral types recorded, the authors also reported a lytic pathogen of Heterosigma akashiwo, a toxic-bloom-forming alga responsible for severe economic losses to the finfish aquaculture industry.

Viral Metagenomics Methods

There is no single gene common to all viruses that could be used in the way ribosomal genes are used for bacterial profiling. This made it difficult to understand viral diversity and to discover new species until improved cloning and sequencing tools became available [28]. The first viral metagenomic study, reported a decade ago, described uncultured near shore viral communities using shotgun cloning and sequencing [17]. These earliest DNA viral metagenomes were linker-amplified shotgun libraries (LASLs), created by ligating dsDNA linkers to genomic DNA fragments and cloning them into a plasmid vector for subsequent Sanger sequencing [15, 17, 69]. LASLs were initially limited to dsDNA viruses. However, with the advent of strand-displacement amplification using Phi29 polymerase, the technology was extended to ssDNA viruses [16, 45]. The first RNA viral metagenomes were generated using random RT-PCR primers linked to adaptor sequences for cDNA synthesis [5].

The methods of shotgun sequencing for virus discovery, sequence independent virus discovery and viral diversity have been reviewed recently [6, 74]. The basic steps involved in viral metagenomics are preparation of viral nucleic acid that is free from host and contaminating nucleic acids, sequence independent amplification of viral nucleic acid, sequencing, and finally the use of bioinformatics tools to analyse the sequences generated. In any given sample, viral nucleic acid constitutes a very small proportion of the total. Viral isolation and purification protocols typically pass the sample through a 0.22 μm filter to remove bacteria and larger organisms. However, larger viruses such as giruses [77] may be eliminated in this step. Cesium chloride gradients have been commonly used to separate viral particles from free DNA and cellular material based on buoyant density [74]. While such refined methods improve enrichment and selection, viruses can be lost at each step, decreasing the overall viral diversity detected [74]. Hence, viral metagenomes have also been generated using filtration to remove larger non-viral particles, without a subsequent density centrifugation step. Metagenomic protocols also include steps to remove host genomic DNA/RNA. Viral preparations derived from plants or animals can be contaminated with DNA from host organisms, microbial flora, and microbial DNA from reagents [4]. DNase treatment is commonly used to degrade unwanted free DNA prior to extraction and amplification of viral nucleic acid [4, 74]. While DNase treatment does reduce the amount of contaminating DNA in viral samples, it does not completely remove it, and the use of bioinformatic filters may be necessary after sequencing [4]. The purity of viral nucleic acid preparations, especially with respect to contaminating bacterial and host genomic nucleic acids, can be examined by PCR.
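The post-sequencing bioinformatic filter mentioned above can be illustrated with a minimal Python sketch. It is not the method of any cited study: it assumes a simple exact k-mer match against a host reference, and the function names, the k-mer length (`k=8`) and the cut-off (`max_host_fraction=0.5`) are hypothetical choices made for illustration only. Real pipelines would normally align reads to the host genome with a dedicated aligner and would also consider reverse-complement matches.

```python
def kmers(seq: str, k: int) -> set:
    """Return the set of all k-length substrings of a sequence (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def host_fraction(read: str, host_index: set, k: int) -> float:
    """Fraction of a read's distinct k-mers that also occur in the host index."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        # Reads shorter than k have no k-mers; they are treated as non-host here.
        return 0.0
    return len(read_kmers & host_index) / len(read_kmers)


def filter_reads(reads, host_reference: str, k: int = 8,
                 max_host_fraction: float = 0.5) -> list:
    """Keep reads whose host k-mer fraction does not exceed the cut-off.

    Limitation (illustrative sketch): only the forward strand of the host
    reference is indexed, and matches must be exact.
    """
    host_index = kmers(host_reference, k)
    return [r for r in reads
            if host_fraction(r, host_index, k) <= max_host_fraction]


if __name__ == "__main__":
    # Toy sequences invented for the example; they are not real genomes.
    host = "ACGTACGTTAGCCGATAGCTAGGCTA"
    reads = ["TACGTTAGCCGATAG",   # exact fragment of the toy host
             "GGGGGGGGGGGGGGG"]   # shares no 8-mer with the toy host
    print(filter_reads(reads, host))
```

In this toy run the first read is discarded because all of its 8-mers occur in the host sequence, while the second read is retained for downstream analysis.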

The amount of total nucleic acid isolated from viral particles is often too low for sequencing, so amplification of viral nucleic acid may be required, depending on the sequencing technology used. The viral genomes present in the nucleic acid sample are amplified simultaneously and independently of their sequences; these methods have been reviewed recently [6, 25]. Sequence-independent amplification of nucleic acids can be achieved by methods such as sequence-independent single-primer amplification (SISPA), random PCR or displacement amplification. In SISPA, adapters are ligated to DNA or cDNA to enable sequence-independent amplification [68]. The random PCR method uses a primer consisting of a known adapter sequence at the 5′ end and a degenerate hexamer or heptamer at the 3′ end during the cDNA synthesis step, so that the cDNA is tagged with adapter sequences at both ends. A similar treatment can be applied to a DNA sample, enabling primer targeted sequencing. Multiple displacement amplification uses random primers in combination with a high fidelity displacement polymerase such as Phi29 polymerase, enabling rolling circle amplification of nanogram quantities of total viral DNA or cDNA to microgram quantities and generating adequate template for sequencing [24, 41, 75]. Viral RNA may also be amplified using whole transcriptome amplification methods [74, 75].

Laboratory methods such as pulsed field gel electrophoresis (PFGE) and randomly amplified polymorphic DNA (RAPD-PCR) assays have been used to quantify viral richness [38, 84, 85]. However, these methods do not always provide an accurate assessment of diversity, e.g. one band can represent multiple genomes in PFGE [84].

Metagenomic Sequencing

Metagenomic sequencing technologies differ in library preparation methods and in the length of reads produced. An important approach is to construct viral shotgun libraries for sequencing by Sanger’s method, as proposed by Breitbart and Rohwer [16]. Sanger’s sequencing method yields high quality sequence data with read lengths of about 400 base pairs. Next generation high-throughput pyrosequencing by 454 Life Sciences provides greater amounts of sequence data than Sanger’s sequencing method, enabling detection of viruses present even in low numbers in the sample [56], with no need for cloning [54]. The recent 454 Life Sciences GSFLX system provides read lengths ranging from 250 to 400 base pairs and is widely used. Even higher throughput technologies such as the Solexa/Illumina and SOLiD systems are now available, which can provide as much as 3–6 Gb of data per run. However, the average read length obtained with these systems is about 50–100 bp [66].
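The trade-off between run yield and read length quoted above can be made concrete with a short back-of-the-envelope calculation. The Python sketch below takes the 3–6 Gb and 50–100 bp figures from the text; the function names, the viral genome size (2,000 nucleotides) and the share of sequenced bases that belong to that genome (0.001 %) are assumptions chosen only to illustrate the arithmetic.

```python
def reads_per_run(run_yield_bp: float, read_length_bp: float) -> float:
    """Approximate number of reads in a run: total bases / bases per read."""
    if read_length_bp <= 0:
        raise ValueError("read length must be positive")
    return run_yield_bp / read_length_bp


def expected_coverage(run_yield_bp: float, target_fraction: float,
                      genome_size_bp: float) -> float:
    """Mean depth of coverage of one genome.

    target_fraction is the assumed share of all sequenced bases that derive
    from the genome of interest (an illustrative input, not a measured value).
    """
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return run_yield_bp * target_fraction / genome_size_bp


if __name__ == "__main__":
    # Yield and read-length pairs taken from the ranges quoted in the text.
    for yield_bp, read_len in [(3e9, 100), (6e9, 50)]:
        n = reads_per_run(yield_bp, read_len)
        print(f"{yield_bp:.0e} bp at {read_len} bp/read: about {n:.1e} reads")

    # Hypothetical rare virus: 2,000-nt genome contributing 0.001 % of bases.
    depth = expected_coverage(3e9, 1e-5, 2000)
    print(f"expected mean coverage: about {depth:.0f}x")
```

Under these assumed inputs, even a virus contributing a very small share of the sequenced bases would be covered several times over in a single run, which is the practical reason high-throughput platforms can detect low-abundance viruses despite their short reads.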

Bioinformatics for Viral Metagenomics

Bioinformatic analyses of viral metagenomes attempt to answer three questions: how many viruses are there (diversity), what are they (taxonomy), and what are they doing (function)? [83]. The sequence data generated first need to be assembled. Bioinformatic screens to identify and filter out host and other non-viral sequences from metagenomes should also be included. Several software tools, such as MEGAN [40], PathSeq [47] and CAMERA [70], are available for the analysis of metagenomic sequences, and several packages can rapidly parse and visualize BLAST results and assist in taxonomic assignment. Bioinformatic tools compare the viral sequences generated with known sequences maintained in an annotated database such as NCBI, using programmes such as BLASTn and BLASTx, to assign taxonomy (by comparison with known viral sequences, which also indicates divergent ones) and function to metagenomic sequences. MEGAN (http://ab.inf.uni-tuebingen.de/software/megan/) assigns metagenomic sequences to NCBI taxonomic classes based on significant BLAST similarities, and assigns taxonomy at the lowest (i.e. most specific) level possible using a lowest common ancestor algorithm [40]. The genome relative abundance and average size (GAAS) (http://www.sourceforge.net/GAAS) metagenomic tool also uses BLAST similarities to assign taxonomy. GAAS provides a set of viral community relative abundances based on all BLAST similarities for all sequences [9]. GAAS also normalizes for the length of the target genome in the database, which provides more accurate estimates of community composition [9]. The Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA) project (http://camera.calit2.net.) was established to bridge gaps and to develop global methods for monitoring microbial communities in the ocean.
CAMERA’s database includes environmental metagenomic and genomic sequence data, associated environmental parameters (“metadata”), pre-computed search results, and software tools to support powerful cross-analysis of environmental samples. However, these tools do not help in detecting completely novel viruses, since viral metagenomes often contain a large number of sequences with no similarity to known sequences, as reported in microbialite viromes, where unknown sequences accounted for 99 % [26]. Characterizing Short Read Metagenomes (CARMA) (http://www.cebitec.uni-bielefeld.de/brf/carma/carma.html) is software for characterizing the taxonomic composition and genetic diversity of short-read metagenomes and was originally designed for the analysis of environmental metagenomes obtained by the ultra-fast 454 pyrosequencing system [48]. Functional annotation of viral metagenomes by BLAST analyses may be possible for only a small percentage of the viral sequences derived. Other methods are available for functional annotation, including profile Hidden Markov Model approaches [87] and gene neighborhood analysis [36]. Viral diversity and community structure cannot usually be determined from BLAST comparisons, since many metagenomic sequences have no significant similarities to known organisms. PHAge Communities from Contig Spectrum (PHACCS) implements mathematical models to determine viral community structure and calculate alpha diversity measures from contig spectra [7, 17]. When metagenomic sequences are assembled, overlapping sequences are grouped together to form contigs, or contiguous sequences. PHACCS can be used from a web interface (http://biome.sdsu.edu/phaccs/). The programme takes four inputs: the calculated contig spectrum, the average fragment size in the metagenomic library, the minimum overlap length, and the average genome size [7, 9].
PHACCS tests several viral community structure models and outputs the best fit model, along with the estimated species richness, evenness, and Shannon diversity index [7].
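To make the diversity measures named above concrete, the following Python sketch computes richness, the Shannon index and evenness from a list of genotype abundances. It does not reproduce the contig-spectrum model fitting described for PHACCS [7]; it only illustrates the standard index definitions once an abundance distribution is available. The function name, the use of the natural logarithm, the choice of Pielou's formulation for evenness, and the example abundances are assumptions made for illustration.

```python
import math


def diversity_summary(abundances) -> dict:
    """Compute richness, Shannon index (H') and Pielou's evenness (J').

    abundances: counts or relative abundances, one value per viral genotype.
    Zero entries are ignored. H' = -sum(p_i * ln p_i); J' = H' / ln(richness).
    """
    counts = [a for a in abundances if a > 0]
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one positive abundance is required")

    richness = len(counts)
    proportions = [c / total for c in counts]
    shannon = -sum(p * math.log(p) for p in proportions)

    # Evenness is undefined for a single genotype (ln 1 = 0), so NaN is returned.
    evenness = shannon / math.log(richness) if richness > 1 else math.nan
    return {"richness": richness, "shannon": shannon, "evenness": evenness}


if __name__ == "__main__":
    # Hypothetical communities invented for the example.
    even_community = [25, 25, 25, 25]
    skewed_community = [97, 1, 1, 1]
    print(diversity_summary(even_community))
    print(diversity_summary(skewed_community))
```

Both toy communities have the same richness, but the evenly distributed one gives a higher Shannon index and an evenness of 1, whereas the community dominated by a single genotype scores much lower on both measures.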

Concluding Remarks

Viral metagenomics has been recognised as an important tool in human and veterinary medicine and has contributed significantly to the discovery of novel viruses associated with disease in humans and animals. A great deal of information has been generated on the diversity of viruses in marine ecosystems, and a number of new viruses associated with disease in marine animals have been discovered using this tool. So far, to our knowledge, the description of LSNV associated with monodon slow growth syndrome in farmed shrimp in Thailand is the only study that has used shotgun cloning and sequencing in aquaculture [72]. Recently, a novel circovirus and two novel nodaviruses have been identified in apparently healthy shrimp from western Florida using viral metagenomics [61]. Such proactive characterization of putative viral pathogens from aquaculture species before epidemics occur will have tremendous advantages, benefitting both disease monitoring efforts and aquaculture management. It is time that this tool is utilised to understand viral diversity and putative viral pathogens in aquaculture settings. Such information can have implications for biosecurity in aquaculture. Further, understanding viral ecology in the aquaculture ecosystem can throw light on the role of viruses in its food web dynamics.