Introduction

The family Totiviridae includes non-enveloped, icosahedral double-stranded RNA viruses. The 40-nm virions contain a single capsid protein of 70-100 kDa. Five genera have been recognized: Totivirus, Victorivirus, Leishmaniavirus, Giardiavirus, and Trichomonasvirus. Members of the genera Totivirus and Victorivirus infect yeast and fungi, while those of Giardiavirus, Leishmaniavirus and Trichomonasvirus infect parasitic protozoa [16]. An unassigned virus, infectious myonecrosis virus (IMNV), which infects Pacific white shrimp (Litopenaeus vannamei), clusters with members of the genus Giardiavirus in phylogenetic trees, but it is not clear whether it belongs to a new genus in the family Totiviridae or represents a new species in the genus Giardiavirus. Additional unclassified viruses include Armigeres subalbatus totivirus (AsTV), Drosophila totivirus (DTV), Omono River virus (OMRV), and Tianjin totivirus (ToV-TJ), which infect mosquitoes (adult Armigeres subalbatus), Drosophila melanogaster, Culex mosquitoes (Culex pipiens pallens) and bats (Myotis ricketti), respectively [4, 6, 18, 19]. Zhai et al. [19] proposed a new genus, “Artivirus”, for three arthropod-infecting totiviruses (AsTV, DTV and IMNV). Piscine myocarditis virus (PMCV) is another unclassified member of the family Totiviridae. PMCV has been reported to be a causative agent of cardiomyopathy syndrome (CMS) in farmed Atlantic salmon in Norway and has also been detected in wild salmon [3, 17]. Recently, a novel PMCV-like virus was detected in apparently healthy golden shiners (Notemigonus crysoleucas) on Minnesota baitfish farms [5]; it was the third totivirus reported in fish.

The genome of members of the family Totiviridae ranges in size from 4.6 to 6.7 kb and contains two ORFs, ORF1 and ORF2, which are in different reading frames of the genomic positive strand. The 5’-proximal ORF1 encodes a double-stranded RNA-binding motif (DSRM) and a major capsid protein (MCP), along with other proteins. The 3’-proximal ORF2 encodes an RNA-dependent RNA polymerase (RdRp) containing eight conserved motifs. The DSRM is located at the extreme 5’ end of ORF1 in IMNV and DTV; in OMRV and AsTV, however, it lies approximately 300 amino acids from the N-terminus of the predicted ORF1-encoded protein [4, 6, 7]. Viruses of this family have 2A-like sequence motifs in ORF1 encoding a conserved tetrapeptide, NPGP, which is known to mediate ribosome skipping, producing an apparent cleavage of the viral polyprotein precursor. This cotranslational event separates two polypeptides, the first ending with G and the second starting with P. An interesting feature of totiviruses is that the MCP is encoded in the C-terminal half of ORF1 and is generated by an unknown mechanism during ORF1 polyprotein cleavage [4, 7, 9]. The sequences of OMRV and IMNV contain two 2A-like motifs, whereas those of DTV and AsTV contain one. A -1 ribosomal frameshift has also been reported for viruses of the family Totiviridae. It is driven by a “slippery heptamer” sequence (NNNWWWH, where NNN represents any three identical nucleotides, WWW represents AAA or UUU, and H represents A, C or U), and a downstream pseudoknot structure further promotes the -1 frameshift [4, 7].
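The slippery-heptamer pattern described above can be expressed as a regular expression. The sketch below is a hypothetical helper (the name `find_slippery_heptamers` is ours, not from any published tool) that lists candidate sites in an RNA string. It checks only the heptamer pattern and ignores the downstream pseudoknot, so actual frameshift prediction still requires a structure-aware tool such as KnotInFrame.

```python
import re

# Heptamer pattern from the text: NNN = any three identical nucleotides,
# WWW = AAA or UUU, H = A, C or U. The lookahead (?=...) lets finditer
# report overlapping candidates; group 2 captures N for the back-reference.
SLIPPERY = re.compile(r"(?=(([ACGU])\2\2(?:AAA|UUU)[ACU]))")


def find_slippery_heptamers(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, heptamer) for each candidate site in seq."""
    rna = seq.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start() + 1, m.group(1)) for m in SLIPPERY.finditer(rna)]


if __name__ == "__main__":
    # Toy sequence (not a real totivirus sequence) with AAAUUUC at position 4.
    print(find_slippery_heptamers("GCGAAAUUUCGGA"))  # [(4, 'AAAUUUC')]
```

Because such a short pattern also occurs by chance, a scan of a complete genome will usually return several candidates, and only those near the ORF1/ORF2 overlap are of interest.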

The genome of IMNV is 7561 nucleotides (nt) long and contains two long, partially overlapping open reading frames (ORF1 and ORF2). ORF1 encodes a polyprotein of 1605 amino acids (aa) that is thought to be processed co- or post-translationally into four polypeptides (1, 2, 3, and 4), owing in part to the presence of NPGP tetrapeptides [6, 7, 11]. Polypeptide 1 is 93 aa long and is predicted to contain a classical DSRM, which is thought to be involved in suppressing host defense responses [6, 7]. The functions of polypeptides 2 and 3 are unknown. Polypeptide 4, the MCP, is the largest of the four ORF1 polypeptides (901 aa) [6, 9]. ORF2 is in the −1 frame relative to ORF1 and encodes a polypeptide of 912 aa, designated polypeptide 5, which includes the RdRp [6, 7, 9].

Golden shiner (Notemigonus crysoleucas) is one of the most important baitfish species and is distributed throughout the United States and southern Canada. According to the Census of Aquaculture, 76 golden shiner farms in the USA produced $17.1 million worth of product in 2005 [14]. Minnesota is the second largest producer of golden shiners, with 14 farms [14]. As with any anthropogenic movement of animals, transporting baitfish from one geographic area to another carries risks, including the dissemination of infectious agents. It is therefore important to characterize the pathogens that baitfish may carry in order to better assess the risk they pose to wild and farmed fish populations. With this in mind, we surveyed golden shiners being sold at retail for angling in Minnesota baitshops. A combination of cell culture and metagenomic analysis was used to detect viruses associated with these apparently healthy golden shiners. A novel virus related to members of the family Totiviridae was detected and is described in this report; it has been tentatively named “golden shiner totivirus” (GSTV).

Materials and methods

Source of samples

Golden shiners were collected from 56 baitshops located throughout Minnesota: 25 were sampled from May to October 2014 and 31 from December 2014 to February 2015. At each baitshop, thirty fish were collected with a dipnet and delivered alive on the same day to the Minnesota Veterinary Diagnostic Laboratory (MNVDL; St. Paul, MN). Upon arrival, the fish were euthanized with an overdose of MS-222 (Argent Chemical Laboratories, WA, USA). A thorough health examination was performed, after which the kidneys and spleen were removed for virus isolation [8]. Kidney and spleen tissues from five fish were pooled, giving six pooled samples per location and a total of 336 pools for virus isolation.
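The pool counts follow directly from the sampling design. A quick arithmetic check, assuming every baitshop supplied all 30 fish (the fish total is derived under that assumption, not reported):

```python
# Sampling design as described above.
FISH_PER_SHOP = 30
FISH_PER_POOL = 5
SHOPS_MAY_OCT_2014 = 25
SHOPS_DEC_2014_FEB_2015 = 31

pools_per_shop = FISH_PER_SHOP // FISH_PER_POOL              # 6
total_shops = SHOPS_MAY_OCT_2014 + SHOPS_DEC_2014_FEB_2015   # 56
total_pools = total_shops * pools_per_shop                   # 336
total_fish = total_shops * FISH_PER_SHOP                     # 1680 (derived)

print(pools_per_shop, total_shops, total_pools, total_fish)
```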

Virus isolation

Virus isolation was performed according to the USFWS and AFS-FHS Blue Book [15]. Briefly, a 10 % suspension of each pool was prepared in Hanks’ balanced salt solution. The samples were centrifuged, and the supernatant was used to inoculate Epithelioma papulosum cyprini (EPC), fathead minnow (FHM), and Chinook salmon embryo (CHSE) cell lines. The inoculated cells were incubated at 15 and 22 °C for 14 days. If no cytopathic effect (CPE) was observed after 14 days, a blind passage was performed on fresh cells, followed by an additional 14 days of incubation. Samples showing no CPE after two passages were considered negative.

Illumina sequencing

Clarified supernatants from the homogenized tissue pools sampled from May to October 2014 (n = 25) were prepared for Illumina MiSeq next-generation sequencing. Good-quality total RNA was extracted using TRIzol LS Reagent (Invitrogen, NY, USA) and purified using a QIAamp Viral RNA Mini Kit (QIAGEN, CA, USA). Extracted RNA was submitted to the University of Minnesota Genomics Center (UMGC) for reverse transcription to cDNA. The cDNA was fragmented, blunt-ended, and ligated to indexed (barcoded) adaptors so that all samples could be run in a single lane. Libraries were prepared using Illumina’s TruSeq RNA v2 sample preparation kit and sequenced on an Illumina MiSeq with 300 cycles of paired-end reads. The sequence reads were analyzed using CLC Genomics Workbench 7.5 (www.clcbio.com). After primer sequences were trimmed, contigs were generated by de novo assembly. The contigs were analyzed with BLASTx on the NCBI website (http://blast.ncbi.nlm.nih.gov/Blast.cgi). ORFs were predicted using the ORF Finder tool (http://www.ncbi.nlm.nih.gov/gorf/gorf.html).
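ORF prediction of the kind performed with ORF Finder can be illustrated with a minimal forward-strand scan. This is a simplified sketch, not the NCBI ORF Finder algorithm: it uses only ATG starts and the three standard stop codons, scans the three forward frames, and ignores ORFs that run off the end of the sequence (which matters for partial contigs). The function name and the `min_aa` default are hypothetical choices.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame, start_nt, end_nt) for forward-strand ORFs of >= min_aa residues.

    Coordinates are 1-based and inclusive; end_nt includes the stop codon.
    An ORF without an in-frame stop before the sequence end is not reported.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA-style input
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens an ORF in this frame
            elif start is not None and codon in STOPS:
                aa_len = (i - start) // 3  # residues, excluding the stop
                if aa_len >= min_aa:
                    orfs.append((frame + 1, start + 1, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    # Toy sequence: an 11-codon ORF (ATG + 10 x GCT) in frame 3, then TAA.
    toy = "CC" + "ATG" + "GCT" * 10 + "TAA" + "GG"
    print(find_orfs(toy, min_aa=5))  # [(3, 3, 38)]
```

Reporting the end coordinate with the stop codon included matches the convention used for GSTV ORF1 in the Results (start codon at nt 136-138, stop codon at nt 5113-5115).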

Sequence analysis and phylogeny

The nucleotide sequences generated in this study and those available in GenBank were aligned using the ClustalW method in MEGA 6.06 software [10]. The best substitution model for the aa sequences of each gene was selected on the basis of the lowest BIC (Bayesian information criterion) score in MEGA 6.06. The LG + G aa substitution model (gamma distribution with five rate categories) was used to generate maximum-likelihood phylogenetic trees, and branch support was assessed with 1000 bootstrap replicates [2]. Sequence comparisons and nt and aa pairwise identity scores were computed using Geneious Pro [1]. Frameshifting and pseudoknot predictions were performed using the KnotInFrame [12] and pKiss online tools with default settings, the thermodynamic model parameters of Turner et al. [13], and a temperature of 29 °C (http://bibiserv2.cebitec.uni-bielefeld.de/rna).
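Selecting a substitution model by lowest BIC amounts to computing BIC = k ln(n) - 2 ln(L) for each candidate model, where L is the maximized likelihood, k the number of free parameters and n the number of alignment sites, and taking the minimum. The sketch below uses hypothetical log-likelihoods and parameter counts purely for illustration; MEGA computes these quantities itself, and the numbers are not results from this study.

```python
import math


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates: dict[str, tuple[float, int]], n_sites: int) -> str:
    """Return the model name with the lowest BIC from {name: (lnL, k)}."""
    return min(candidates, key=lambda name: bic(*candidates[name], n_sites))


if __name__ == "__main__":
    # Hypothetical (lnL, k) values, NOT results from this study.
    candidates = {"LG": (-10000.0, 20), "LG+G": (-9800.0, 21), "WAG+G": (-9850.0, 21)}
    print(best_model(candidates, n_sites=867))  # LG+G
```

The ln(n) penalty means an extra parameter (such as the gamma shape) is kept only when it improves the log-likelihood by more than about ln(n)/2.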

Reverse transcription polymerase chain reaction (RT-PCR)

Primers targeting the RdRp gene were designed to develop an RT-PCR assay for screening study and field samples. The forward (5’-ATGTGTCTGATGTTGAAGCA-3’) and reverse (5’-AAGCAGCATAGTCAAATGGT-3’) primers, targeting nt 5224-5244 and 6306-6286 of the RdRp gene, generate an amplified product of 1082 bp. Once the reaction conditions were optimized, a single pooled sample from each of the 56 locations was tested by RT-PCR. If the pooled sample from a location was positive, its six samples were tested individually. RNA was extracted from individual samples using a QIAamp Viral RNA Mini Kit (QIAGEN, CA, USA). A QIAGEN OneStep RT-PCR Kit (QIAGEN, CA, USA) was used for RT-PCR reactions. The amplified PCR products were purified using a QIAquick PCR Purification Kit (QIAGEN, CA, USA) according to the manufacturer’s guidelines and then submitted to the UMGC for Sanger sequencing with the forward and reverse primers used for RT-PCR. The forward and reverse sequences were assembled using Sequencher 5.1 software (www.genecodes.com), followed by BLAST analysis (www.ncbi.nlm.nih.gov). The specificity of the RT-PCR was evaluated by testing GS-PMCV-like virus, golden shiner astrovirus, golden shiner picornavirus, golden shiner reovirus, and fish calicivirus.
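Basic properties of the two primers can be checked with a few lines of code. The sketch below computes GC content, a Wallace-rule melting temperature (2 °C per A/T plus 4 °C per G/C, a rough rule of thumb that is not necessarily how these primers were designed; the text does not say), and the reverse complement of the reverse primer, i.e., the sense-strand sequence expected at the 3’ end of the amplicon. Function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rule-of-thumb Tm in °C: 2 per A/T plus 4 per G/C (rough, short oligos only)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


FORWARD = "ATGTGTCTGATGTTGAAGCA"
REVERSE = "AAGCAGCATAGTCAAATGGT"

if __name__ == "__main__":
    for name, primer in (("forward", FORWARD), ("reverse", REVERSE)):
        print(f"{name}: GC {gc_percent(primer):.1f} %, Tm ~{wallace_tm(primer)} °C")
    print("sense-strand site of reverse primer:", reverse_complement(REVERSE))
```

Both primers come out at 40 % GC and about 56 °C by this rule, so they are at least balanced against each other; nearest-neighbor methods would give more realistic Tm values.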

Results

Virus isolation

No CPE attributable to an unknown virus was observed in any of the 336 samples on the EPC, FHM, or CHSE cell lines at 15 or 22 °C after two passages. Golden shiner reovirus was isolated from two of the 56 locations.

Sequence comparison and phylogenetic analysis

In two of the 25 pools, BLASTx analysis of contigs obtained by de novo assembly showed the closest match to IMNV. The sequence from this study is 7788 nt long, with a complete 5’ untranslated region (UTR) of 135 nt (nt 1-135), complete ORFs, and a partial 3’ UTR of 55 nt (nt 7734-7788). The sequence contains two ORFs (ORF1 and ORF2). The larger, ORF1, encodes a polypeptide of 1659 aa in frame +1 from nt 136 to 5115 (4980 nt), with a start codon at nt 136-138 and a stop codon at nt 5113-5115; this protein is 54 aa longer than the 1605-aa ORF1 protein of IMNV isolate ID-EJ-12-1 (AIC34743.1). Sequence alignment with previously published totivirus sequences revealed two 2A-like sequence motifs, EGVEKNPGP and GDIESNPGP, at aa positions 305-313 and 466-474, respectively (Fig. 1, Table 1). These 2A-like motifs differ from those of IMNV, which are at positions 86-94 (GDVESNPGP) and 370-378 (GDVEENPGP). Because of these 2A-like motifs, ORF1 is predicted to encode three polypeptides (polypeptides 1-3) of 312, 161, and 1186 aa, respectively. The largest, of 1186 aa, is predicted to be further cleaved co- or post-translationally into two smaller proteins of 319 and 867 aa at an NKKVHAYNGN motif at aa 783-792. The 867-aa fragment corresponds to the MCP of IMNV and other previously reported totiviruses. Hence, ORF1 of this novel golden shiner totivirus encodes four polypeptides, polypeptides 1, 2, 3 and 4, of 312, 161, 319 and 867 aa, with predicted molecular weights of 35.3, 17.9, 37.4 and 96.3 kDa, respectively. The DSRM is predicted to lie at aa 185-247, a position that is unique among totiviruses and differs from those in IMNV and OMRV, in which the DSRM has been reported at the extreme N-terminus (aa 1-60) and just upstream of the second 2A motif (before aa 500-508), respectively (Fig. 1, Table 1).
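The predicted 2A-mediated products can be checked against the motif coordinates given above. The sketch below builds a synthetic placeholder polyprotein (alanine filler with the two reported motifs inserted at aa 305-313 and 466-474), not the real GSTV sequence, and splits it after the G of each NPGP, as ribosome skipping is thought to do. It does not model the proposed non-2A cleavage at the NKKVHAYNGN motif, and the helper names are hypothetical.

```python
def split_at_2a(polyprotein: str) -> list[str]:
    """Split after the G of every NPGP, mimicking 2A-mediated ribosome skipping:
    the upstream product ends in ...G and the downstream product starts with P."""
    fragments, start = [], 0
    idx = polyprotein.find("NPGP")
    while idx != -1:
        cut = idx + 3  # index of the final P, i.e. just after NPG
        fragments.append(polyprotein[start:cut])
        start = cut
        idx = polyprotein.find("NPGP", cut)
    fragments.append(polyprotein[start:])
    return fragments


def place_motifs(length: int, motifs: dict[int, str], filler: str = "A") -> str:
    """Build a placeholder sequence with motifs at 1-based start positions."""
    residues = [filler] * length
    for start, motif in motifs.items():
        residues[start - 1:start - 1 + len(motif)] = motif
    return "".join(residues)


if __name__ == "__main__":
    # Placeholder polyprotein: only the length and motif positions come from the text.
    toy = place_motifs(1659, {305: "EGVEKNPGP", 466: "GDIESNPGP"})
    print([len(f) for f in split_at_2a(toy)])  # [312, 161, 1186]
```

The resulting lengths reproduce polypeptides 1-3 as reported, which confirms that the motif coordinates and fragment sizes in the text are mutually consistent.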

Fig. 1

Genomic arrangement of Golden Shiner Totivirus (GSTV)

Table 1 Comparison of unclassified members of family Totiviridae with Golden Shiner Totivirus (GSTV)

ORF2 (nt 5520-7733) encodes a 737-aa protein that corresponds to the RdRp of other members of the family Totiviridae; this product is designated polypeptide 5. Alignment of the putative ORF2 translation product with those of previously reported totiviruses confirmed the presence of the eight conserved RdRp motifs (Supplementary Fig. 1). The first potential AUG start codon of ORF2 lies 404 nt downstream of the ORF1 stop codon. However, a ribosomal frameshift is possible, because there is no stop codon in the same reading frame for 507 nt upstream of the putative ORF2 start codon. This possibility is supported by a “slippery heptamer” (AAAUUUC) at nt 5016-5022, 90 nt upstream of the ORF1 stop codon, followed by a predicted stable RNA structure (pseudoknot or hairpin) spanning nt 5023-5143 (Fig. 2). The predicted sequence at the ORF1-ORF2 fusion junction is NFQDGG (Fig. 2). The resulting overlap between ORF1 and ORF2 is 99 nt, shorter than the 172-nt and 199-nt overlaps in AsTV and IMNV, respectively. The complete 5’ UTR is 135 nt long, the same size as in IMNV, and a partial 3’ UTR of 55 nt was sequenced. Thus, the GSTV genome sequence is complete except for part of the 3’ UTR. Neither the 5’ nor the 3’ UTR showed notable nt sequence similarity to those of other totiviruses.
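The ORF and UTR sizes follow from the nucleotide coordinates. A short check, assuming 1-based inclusive positions with the stop codon counted inside each ORF, and using the ORF2 AUG at nt 5520 rather than the frameshifted product:

```python
def span(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval."""
    return end - start + 1


def encoded_aa(orf_start: int, orf_end: int) -> int:
    """Residues encoded by an ORF whose end coordinate includes the stop codon."""
    nt = span(orf_start, orf_end)
    if nt % 3:
        raise ValueError("ORF length is not a multiple of 3")
    return nt // 3 - 1  # subtract the stop codon


orf1_aa = encoded_aa(136, 5115)    # 1659
orf2_aa = encoded_aa(5520, 7733)   # 737
utr5_nt = span(1, 135)             # 135
utr3_nt = span(7734, 7788)         # 55
gap_nt = 5520 - 5115 - 1           # 404 nt between the ORF1 stop and the ORF2 AUG
print(orf1_aa, orf2_aa, utr5_nt, utr3_nt, gap_nt)
```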

Fig. 2

Predicted slippery heptamer (highlighted in yellow) and ORF1-ORF2 fusion motif (highlighted in gray)

Phylogenetic analysis was based on the ORF1 MCP and ORF2 RdRp sequences. In the MCP tree, GSTV formed a separate clade among totiviruses. GSTV was most closely related to IMNV, with a maximum aa sequence identity of 26.42-27.86 %, followed by DTV (26.59 %), AsTV (22.94 %) and OMRV (21.75 %) (Fig. 3, Table 2). Similarly, in the ORF2 (RdRp) tree, GSTV formed a separate clade, with maximum identities of 38.10 % and 38.50 % to IMNV and DTV, respectively (Fig. 4, Table 2). GSTV thus formed a separate lineage in the phylogenies of both ORF1 and ORF2 (Figs. 3 and 4).
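Pairwise identity values depend on how alignment gaps are treated. One common convention, sketched below, counts identical residues over alignment columns in which neither sequence has a gap. Geneious offers its own settings, so this sketch is illustrative only and is not guaranteed to reproduce the values in Table 2; the function name and example sequences are hypothetical.

```python
def percent_identity(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aln_a, aln_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded under this convention
        compared += 1
        matches += int(a == b)
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy aligned fragments: 4 identities over 5 ungapped columns.
    print(percent_identity("MKV-LA", "MRVQLA"))  # 80.0
```

Counting gapped columns in the denominator instead would lower the value for distant sequences, which is one reason identity figures from different tools are not always directly comparable.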

Fig. 3

Phylogenetic tree based on 867 amino acids of ORF1 (major capsid protein), constructed in MEGA 6.06 by the maximum-likelihood method using the LG + G model and 1000 bootstrap replicates. The study sequence is shown in bold

Table 2 Percent identity of Golden Shiner Totivirus (GSTV) with other totiviruses

Fig. 4

Phylogenetic tree based on 700 amino acids of ORF2 (RdRp gene), constructed in MEGA 6.06 by the maximum-likelihood method using the LG + G model and 1000 bootstrap replicates. The study sequence is shown in bold

GenBank accession number

The sequence obtained in this study was submitted to the GenBank database with accession number KT725636.

RT-PCR

The newly developed RT-PCR targeting the RdRp gene was specific for GSTV and did not amplify GS-PMCV-like virus, golden shiner astrovirus, fish calicivirus, golden shiner reovirus or golden shiner picornavirus (Fig. 5). A single band of 1082 bp indicated the presence of GSTV, and no band was observed for the other viruses used to test specificity. The two of 25 pools (8 %) found positive by Illumina sequencing were confirmed by the new RT-PCR, and the remaining 31 pooled samples were all negative. Testing of the 12 individual samples that made up the two positive pools (six per pool) identified five positive samples (2/6 in the first positive pool and 3/6 in the second). The sequences of these five PCR products were 100 % identical to the sequence obtained by Illumina sequencing of the pooled sample.
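The detection rates above can be tallied as follows. The location-level rate (2 of 56 locations) is derived here from the counts in the text rather than reported by the study, and the variable names are ours:

```python
# Counts reported in the text.
pools_sequenced = 25             # May-October 2014 locations, one pool each
locations_tested = 56            # all locations screened by RT-PCR
positive_locations = 2           # Illumina-positive pools, confirmed by RT-PCR
positives_per_location = [2, 3]  # positive samples out of 6 at each positive location
SAMPLES_PER_LOCATION = 6

rate_sequenced = 100 * positive_locations / pools_sequenced  # 8.0 %
rate_all = 100 * positive_locations / locations_tested       # about 3.6 % (derived)
samples_tested = SAMPLES_PER_LOCATION * len(positives_per_location)  # 12
samples_positive = sum(positives_per_location)                       # 5

print(f"{rate_sequenced:.1f} % of sequenced pools, {rate_all:.1f} % of locations, "
      f"{samples_positive}/{samples_tested} individual samples")
```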

Fig. 5

Gel electrophoresis (1.2 % gel) showing the specificity of the GSTV RT-PCR. Lanes: M = molecular marker; 1 = pooled sample 1; 2 = pooled sample 2; 3 = individual sample 1; 4 = individual sample 2; 5 = golden shiner piscine myocarditis-like virus; 6 = golden shiner astrovirus; 7 = golden shiner picornavirus; 8 = golden shiner reovirus; 9 = fish calicivirus; 10 = negative control. Positive samples show a single specific band of 1082 bp

Discussion

Golden shiner is one of the most important baitfish species in the United States. We used next-generation sequencing to screen for known and unknown viral pathogens circulating in the Minnesota retail baitfish market and found a novel virus of the family Totiviridae. The nearly complete GSTV genome obtained in this study is 7788 nt long, with a complete 5’ UTR (135 nt, nt 1-135), complete ORFs, and a partial 3’ UTR of 55 nt (nt 7734-7788). GSTV encodes five polypeptides in a similar order to OMRV and IMNV, but with a distinct arrangement that differs from both. Because the first 2A motif is at aa 305-313, polypeptide 1 (312 aa) is larger than those of OMRV (95 aa) and IMNV (93 aa). As in IMNV, a DSRM is present in polypeptide 1, but at aa 185-247 rather than at aa 1-90 as in IMNV [6, 9]. The ORF1-encoded MCP of GSTV (867 aa) is smaller than those of IMNV (901 aa) and OMRV (897 aa) [4, 6, 9, 19].

A -1 ribosomal frameshift was also predicted in GSTV, as in previously reported totiviruses. However, the GSTV slippery heptamer differs from those of IMNV, OMRV and other totiviruses. As in ToV-TJ, the predicted pseudoknot in GSTV starts immediately after the slippery heptamer, whereas in IMNV and OMRV the pseudoknot begins farther downstream of the slippery sequence [4, 6, 18]. As a result of the -1 frameshift, ORF1 and ORF2 overlap by 99 nt, less than the overlaps in AsTV (172 nt) and IMNV (199 nt) [6, 19]. The predicted sequence at the MCP-RdRp fusion junction is NFQDGG, which differs from SYFFRC and SGFYEC in AsTV and IMNV, respectively [7, 19]. The eight conserved RdRp motifs were present in GSTV, with some insertions and substitutions compared with the AsTV, IMNV, OMRV and ToV-TJ sequences [4, 9, 19].

In the analyses of the ORF1- and ORF2-encoded proteins, GSTV showed maximum aa identities of 27.86 % and 38.50 %, respectively, to previously reported totiviruses, and in both it formed a lineage separate from its closest relatives. The species demarcation criteria of the International Committee on Taxonomy of Viruses (ICTV) for the family Totiviridae include less than 50 % identity at the protein level [16]. Based on these criteria and the uniqueness of its genome organization, we propose that GSTV be considered a member of a new species in the family Totiviridae.
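The protein-identity part of the ICTV criterion reduces to a threshold test on the maximum identities reported above. This sketch checks only that one criterion; as the text notes, the proposal also rests on GSTV’s genome organization. The function and constant names are hypothetical.

```python
ICTV_AA_IDENTITY_CUTOFF = 50.0  # "less than 50 % identity at the protein level"


def meets_identity_criterion(max_identity: dict[str, float],
                             cutoff: float = ICTV_AA_IDENTITY_CUTOFF) -> bool:
    """True if every protein's best aa identity to known members is below the cutoff."""
    return all(value < cutoff for value in max_identity.values())


# Maximum aa identities of GSTV to previously reported totiviruses (from the text).
gstv = {"ORF1": 27.86, "ORF2": 38.50}
print(meets_identity_criterion(gstv))  # True
```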

To the best of our knowledge, GSTV is the fourth totivirus known in fish and the second totivirus detected in freshwater fish. IMNV was previously reported to be the first member of the family Totiviridae that infects the Pacific white shrimp L. vannamei. IMNV is considered highly pathogenic and is reported to cause high morbidity and mortality; the pathogenicity of GSTV, however, has not been evaluated. Future studies should address the pathogenicity, host range, and environmental requirements of this virus. The specific RT-PCR developed in this study should facilitate molecular epidemiological studies to determine the prevalence of GSTV in wild and farm-raised golden shiner populations.