Introduction

The red imported fire ant (Solenopsis invicta Buren) is an exceptionally invasive species native to South America [45]. Introduced into North America in the 1930s [45] from a source population in Formosa Province, Argentina [5], the ant has proven to be a very serious pest in the US, where annual damage estimates exceed $6 billion [36]. Comparative studies of populations in the native and introduced ranges have shown that fire ant population densities are up to ten-fold greater in infested areas of the US than in the native range [38, 39]. In addition, the US population is largely devoid of the natural enemies found in the native range, suggesting that S. invicta escaped its natural enemies during its founding in the US [3, 50, 64].

Although poorly studied in ants [7, 8, 13, 18, 33], viruses can provide natural control of insect pests as classical biological control agents or biopesticides [25]. Thus, efforts have focused on the discovery, characterization, and development of fire ant viruses to be used, ultimately, in their control [7, 33, 48]. To date, five RNA viruses and one DNA virus have been discovered from S. invicta. The RNA viruses include Solenopsis invicta virus 1 (Dicistroviridae) [46], Solenopsis invicta virus 2 (Polycipiviridae) [47], Solenopsis invicta virus 3 (Solinviviridae) [49], Solenopsis invicta virus 4 (Polycipiviridae) [35], and Solenopsis invicta virus 5 (tentatively placed in the Dicistroviridae) [59]. The impact of these virus infections on fire ants varies. Solenopsis invicta queens infected with Solenopsis invicta virus 1 (SINV-1) exhibit decreased body weight, which reduces the probability of successful colony founding [31]. Solenopsis invicta virus 2 (SINV-2) infections are also associated with negative impacts on queens, including significant reductions in fecundity, longer claustral periods, and slower growth of newly established colonies [31]. Solenopsis invicta virus 3 (SINV-3) reduces queen fecundity [51] and alters the feeding behavior of the worker caste, which results in colony starvation [54]. SINV-3 is host specific [40, 41] and the most virulent [51, 55] of the characterized fire ant viruses, making it a viable candidate for controlling fire ants in introduced areas [33]. SINV-3 has been intentionally released as a classical biological control agent against S. invicta populations in Tennessee, Florida, and California [34, 53]. The impacts of Solenopsis invicta virus 4 (SINV-4), Solenopsis invicta virus 5 (SINV-5), and Solenopsis invicta densovirus (SiDNV) have not been established [52].

The ongoing goal of our laboratory is to characterize the virome of Solenopsis invicta to identify viruses that could be used as natural control agents against this ant pest. Multiple metagenomic libraries were created from different developmental stages of S. invicta [workers [59], immatures (larvae and pupae), and dead workers taken from midden piles] collected from colonies across Formosa, Argentina, and sequenced on the Illumina MiSeq platform. Bioinformatic analyses revealed nine new RNA virus genomes, which were confirmed by Sanger sequencing. Phylogenetic analyses of the RNA-dependent RNA polymerase (RdRp) and the nonstructural polyprotein, together with genome characteristics, were used to place these new virus genome sequences taxonomically on a tentative basis: they include four new species of Dicistroviridae, one of Polycipiviridae, one of Iflaviridae, one of Totiviridae, and two genome sequences that were too divergent to be placed with taxonomic certainty.

Materials and methods

Solenopsis invicta collections

Samples of workers and brood (larvae and pupae) were obtained from 182 S. invicta colonies collected at 25 locations across eastern Formosa, Argentina (Table 1). After being extracted from the nest soil, live ants were transported to the USDA quarantine facility in Gainesville, Florida, where they were maintained in rearing trays. Voucher specimens have been deposited in the collection of the USDA-ARS Center for Medical, Agricultural and Veterinary Entomology (CMAVE), Gainesville, Florida.

Table 1 Sites from which Solenopsis invicta colonies were collected as source material for each gene library

RNA preparation

Total RNA was extracted from pooled groups of worker ants, brood (larvae + pupae), and dead worker ants obtained from midden piles. RNA was extracted using the Trizol method followed by the PureLink RNA Mini Purification Kit according to the manufacturer’s instructions (Thermo Fisher Scientific, Waltham, MA). RNA quality of each preparation was assessed by microfluidic analysis on an Agilent 2100 Bioanalyzer (Agilent, Cary, NC) using the RNA 6000 Nano kit according to the manufacturer’s instructions. Total RNA was submitted to GE Healthcare (Los Angeles, CA) for mRNA purification, library preparation, and Illumina RNA sequencing (MiSeq).

Library preparation and sequencing

Eight libraries were created for separate sequencing: four from worker ants (1W, 2W, 3W, 4W), two from brood (5B, 6B), and two from dead workers (7D, 8D). Collection information for each library is summarized in Table 1. The number of pooled ants used for the RNA preparation of each library was: 1W (n = 690 worker ants); 2W (n = 795 worker ants); 3W (n = 660 worker ants); 4W (n = 585 worker ants); 5B (n = 111 larvae + 30 pupae); 6B (n = 137 larvae + 34 pupae); 7D (n = 147 dead workers); 8D (n = 147 dead workers). Libraries 7D and 8D used the same source of RNA, but RNA for library 8D was treated with the GeneRead rRNA depletion kit (Qiagen, Germantown, MD) before library preparation. Total RNA (200 ng) purified from each of the pooled groups was used for mRNA purification with the Illumina TruSeq Stranded mRNA Library Preparation Kit (Catalog # RS-122-2101), following the manufacturer's low sample protocol. The RNA fragmentation step was omitted to maximize library insert length; instead, the sealed plate was incubated at 80 °C for 2 min to elute the primed mRNA from the RNA purification beads. Even without the dedicated fragmentation step, the RNA was fragmented, yielding an average final library size of 467 bp. Library sizes were determined empirically by microfluidic analysis on an Agilent 2100 Bioanalyzer, and libraries were quantified using the Quant-iT dsDNA kit with broad range standards (ThermoFisher Scientific, Q-33130). Samples were pooled at equimolar quantities and sequenced twice using the Illumina MiSeq 2 × 300 cycle kit with version 3 chemistry. The data were demultiplexed using the Illumina indices, and the two runs were combined to assign reads to individual samples. All other procedures followed the manufacturer's instructions.
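Pooling at equimolar quantities requires converting each library's mass concentration into molarity using its mean fragment size. The Python sketch below illustrates that arithmetic only. It assumes the common approximation of 660 g/mol per double-stranded base pair, and the library concentrations and the 50 fmol target are hypothetical values chosen for illustration, not the facility's actual pooling worksheet.

```python
"""Sketch: volumes for pooling sequencing libraries at equimolar amounts.

Assumptions (not from the study): 660 g/mol per dsDNA base pair, the
hypothetical concentrations below, and a hypothetical 50 fmol target.
"""

AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (common approximation)


def library_molarity_nm(conc_ng_per_ul: float, mean_size_bp: float) -> float:
    """Convert a dsDNA concentration (ng/uL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_size_bp)


def equimolar_volumes(concs_ng_per_ul: dict[str, float],
                      mean_size_bp: float,
                      fmol_per_library: float) -> dict[str, float]:
    """Volume (uL) of each library that delivers the same number of fmol.

    1 nM equals 1 fmol/uL, so volume = fmol / nM.
    """
    return {
        name: fmol_per_library / library_molarity_nm(conc, mean_size_bp)
        for name, conc in concs_ng_per_ul.items()
    }


if __name__ == "__main__":
    # Hypothetical Quant-iT readings; 467 bp is the mean library size reported above.
    concs = {"1W": 10.0, "5B": 5.0}
    for lib, vol in equimolar_volumes(concs, 467, fmol_per_library=50).items():
        print(f"{lib}: {vol:.2f} uL")
```

Because 1 nM corresponds to 1 fmol/µL, a library at half the molarity simply needs twice the volume to contribute the same number of molecules.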

Metagenomic analysis

Sample data were prepared by removing contaminants and trimming sequencing adaptors with the BBTools program BBDuk version 37.02 (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide). After quality and contaminant filtering, the total number of RNA reads for each sample was: 6,991,698 (1 worker); 7,049,900 (2 worker); 6,585,560 (3 worker); 7,485,078 (4 worker); 6,626,110 (5 brood); 7,590,164 (6 brood); 5,943,866 (7 dead worker [midden]); and 5,042,710 (8 dead worker [midden]). The reference genome for S. invicta (GenBank accession AEAQ00000000.1) was downloaded from NCBI. Low complexity regions of the genome were masked with RepeatMasker version open-4.0.7 (http://www.repeatmasker.org), a mapping reference was built from the masked genome with the BBTools program BBSplit version 37.02, and reads originating from S. invicta were then removed with BBSplit. A combined assembly was created from all eight samples using SPAdes version 3.11.1 in “meta” mode [2]. The SPAdes scaffolds were annotated using DIAMOND version 0.9.17 in translated protein search mode against the GenBank NR database of February 3, 2018 [4]. The DIAMOND annotations were loaded into MEGAN version 6.11.6, and RNA virus contigs were identified by the lowest common ancestor method [16]. All reads from each sample were mapped back to the combined assembly using Bowtie2 version 2.3.4.1 in “very sensitive” mode [26].
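The lowest common ancestor (LCA) step assigns each contig to the most specific taxon shared by all of its significant hits. The Python sketch below shows only that core idea on hypothetical, root-first lineage lists. It is not MEGAN's implementation, which additionally filters hits (for example, by bit score and a top-percent threshold) before computing the LCA.

```python
"""Simplified lowest-common-ancestor (LCA) assignment for one contig.

A sketch of the general idea only, not MEGAN's algorithm. Lineages are
hypothetical, root-first lists of taxon names, one per database hit.
"""


def lowest_common_ancestor(lineages: list[list[str]]) -> list[str]:
    """Return the longest root-first lineage prefix shared by every hit."""
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):  # walk all lineages one rank at a time
        if all(taxon == ranks[0] for taxon in ranks):
            shared.append(ranks[0])
        else:
            break  # hits disagree below this rank
    return shared


if __name__ == "__main__":
    hits = [  # hypothetical lineages of three hits for a single contig
        ["Viruses", "Picornavirales", "Dicistroviridae", "Cripavirus"],
        ["Viruses", "Picornavirales", "Dicistroviridae", "Aparavirus"],
        ["Viruses", "Picornavirales", "Iflaviridae"],
    ]
    print(" > ".join(lowest_common_ancestor(hits)))
```

A contig whose hits all fall within RNA virus lineages is assigned to a virus taxon even when the hits disagree at lower ranks; disagreement only pushes the assignment toward the root.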

Data availability

Raw sequence data from each library were deposited in the NCBI Sequence Read Archive under accession number SRP113235 (BioProject PRJNA394996). The complete virus genomes were deposited in GenBank under the accession numbers provided in Table 2. The code to reproduce the metagenomic and phylogenetic analyses is archived at https://doi.org/10.5281/zenodo.2430743.

Table 2 New virus genome sequences discovered from Solenopsis invicta

Virus genome re-sequencing

Nine near-complete RNA virus genomes were assembled from the Illumina-derived sequences. These sequences were used as templates to design oligonucleotide primers providing complete, overlapping coverage of each genome. Each genome was PCR amplified in sections (~ 1200 nucleotides long) from cDNA synthesized from RNA pooled from all of the libraries. Amplicons were cloned into the pCR4 vector and sequenced by the Sanger method. The ends of each genome were obtained by 5′ and 3′ rapid amplification of cDNA ends (RACE). For 3′ RACE, cDNA was synthesized with the GeneRacer Oligo dT primer (Invitrogen, Carlsbad, CA), and PCR was subsequently conducted with the GeneRacer 3′ primer and a gene-specific primer. For 5′ RACE, cDNA was synthesized with a gene-specific oligonucleotide primer, and PCR was subsequently completed with the gene-specific primer and the GeneRacer Abridged Anchor Primer. Amplicons generated during RACE reactions were also cloned into the pCR4 vector and submitted for Sanger sequencing. Genomes were assembled de novo with the CAP3 program in Vector NTI (Life Technologies, Carlsbad, CA). A minimum of three-fold genome coverage was obtained.
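The three-fold coverage criterion means that every position of a genome is covered by at least three Sanger reads. A minimal Python sketch of that check is shown below, assuming reads are given as hypothetical 0-based, end-exclusive (start, end) coordinates on the assembled genome; it uses a difference array so that depth is computed in a single pass.

```python
"""Sketch: minimum per-base read depth across a genome.

Read coordinates are hypothetical (0-based, end-exclusive). A genome meets
a 'minimum three-fold coverage' criterion if the returned value is >= 3.
"""
from itertools import accumulate


def min_coverage(genome_length: int, reads: list[tuple[int, int]]) -> int:
    """Lowest read depth at any genome position."""
    if genome_length <= 0:
        return 0
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        start, end = max(start, 0), min(end, genome_length)
        if start < end:  # skip empty or out-of-range spans
            diff[start] += 1
            diff[end] -= 1
    # Running sums of the difference array give the depth at each position.
    return min(accumulate(diff[:genome_length]))


if __name__ == "__main__":
    # Two overlapping hypothetical amplicons, each read three times.
    reads = [(0, 1600), (1400, 3000)] * 3
    print(min_coverage(3000, reads))
```

A gap between amplicons shows up immediately as a minimum depth of zero, which is why the primers were designed to give overlapping sections.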

Phylogenetic analysis

An initial analysis was conducted to broadly infer the taxonomic position of the new S. invicta virus genomes within the picorna-like virus superfamily [21]. RdRp regions of members of the picorna-like superfamily [21] and of the S. invicta viruses were aligned with MUSCLE [11] and analyzed by the Maximum Likelihood method with the JTT matrix-based model [19] to infer their evolutionary history [24]. The RdRp region of each S. invicta virus was also subjected to Blastp analysis [1] against the National Center for Biotechnology Information (NCBI) NR database to identify potentially related sequences for inclusion in subsequent, refined phylogenetic analyses.

Seven of the nine S. invicta genome sequences clustered within well-studied, well-represented groups: Dicistroviridae (SINV-6, SINV-9, SINV-12, SINV-13), Iflaviridae (SINV-11), Polycipiviridae (SINV-8), and Totiviridae (SMV). Two S. invicta virus sequences (SINV-7 and SINV-10) did not cluster clearly within any established group of the picorna-like virus superfamily. For these two sequences, close relatives identified by Blastp were included in the phylogenetic analysis [1].

Phylogenetic analysis was subsequently conducted with the translated amino acid sequences of the nonstructural polyproteins to refine taxonomic placement. For each group, the fire ant virus sequence and related reference sequences identified by Blastx were collected. The polyproteins were aligned using MAFFT version 7.394 in E-INS-i mode and trimmed to phylogenetically informative residues using trimAl v1.4.rev15 [6, 20]. The trimmed polyprotein alignment lengths were: Dicistroviridae, 957 positions; Iflaviridae, 1683 positions; Polycipiviridae, 1880 positions; SINV-7_10 group, 1628 positions; Totiviridae, 429 positions. An unrooted maximum likelihood tree was created using RAxML version 8.2.10 with the gamma rate model and the LG substitution matrix [28, 44]. The model and substitution matrix were selected based on AIC scores from the online tool Smart Model Selection in PhyML [14, 29]. One hundred bootstrap replicates were run. Trees were formatted, midpoint rooted, and annotated using the ETE toolkit version 3.1.1 [15].
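Model selection by AIC balances fit against complexity: AIC = 2k − 2 ln L, where k is the number of free parameters and L the maximized likelihood, and the model with the lowest AIC is preferred. The Python sketch below applies that rule; the log-likelihoods and parameter counts are hypothetical placeholders, not values from this study or output from Smart Model Selection.

```python
"""Sketch: choosing a substitution model by the Akaike Information Criterion.

AIC = 2k - 2 ln(L); the lowest AIC wins. All numbers below are hypothetical.
"""


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion for one fitted model."""
    return 2 * n_params - 2 * log_likelihood


def best_model(fits: dict[str, tuple[float, int]]) -> str:
    """fits maps model name -> (log-likelihood, number of free parameters)."""
    return min(fits, key=lambda name: aic(*fits[name]))


if __name__ == "__main__":
    fits = {  # hypothetical fits of three candidate models to one alignment
        "JTT+G": (-25410.3, 52),
        "LG+G": (-25371.8, 52),
        "LG+G+I": (-25371.1, 53),
    }
    for name, (lnl, k) in fits.items():
        print(f"{name}: AIC = {aic(lnl, k):.1f}")
    print("Selected:", best_model(fits))
```

In this toy example the extra invariant-sites parameter improves the log-likelihood by less than one unit, so its AIC penalty outweighs the gain and the simpler LG+G model is selected.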

Population specificity

Experiments were also conducted to determine whether the new virus sequences were present in the introduced North American S. invicta population by testing RNA samples pooled from S. invicta colonies collected in Gainesville, Florida (n = 56; October 2016), Biloxi, Mississippi (n = 16; May 2016), and Coachella, California (n = 47; May 2017). Total RNA was extracted from worker ants using the Trizol method according to the manufacturer’s instructions (Invitrogen). RT-PCR was conducted on the pooled US S. invicta samples; when a virus sequence was detected, additional RT-PCR analyses were conducted on RNA from individual colonies to estimate the prevalence of the virus among colonies. In addition, the complete genome of any virus detected in North American ants was sequenced for comparison with the Argentinean sequence.

Results

Nine new RNA virus genome sequences were discovered from the metatranscriptome of S. invicta field colonies collected across approximately 2.5 million hectares of the Formosa region of Argentina (Table 2). Eight of these viruses, namely Solenopsis invicta virus 6 (SINV-6), SINV-7, SINV-8, SINV-9, SINV-10, SINV-11, SINV-12, and SINV-13, exhibited sequence identity, domain motifs, and genome architecture consistent with positive-sense, single-stranded RNA viruses of the Picornavirales [27]. Phylogenetic analysis of the conserved RdRp regions of these virus genomes further supported this placement (Suppl. Fig. 1). SINV-7 was by far the most abundant virus sequence in all libraries (Fig. 1). Virus diversity was greater in libraries created from adult ants (both live and dead), which together contained all of the virus sequences. In contrast, the brood-derived libraries contained exclusively SINV-7 sequences (library 5), or SINV-7 with a small representation of SINV-12 (library 6) (Fig. 1). All previously described Solenopsis invicta viruses (i.e., SINV-1, SINV-2, SINV-3, SINV-4, SINV-5, and SiDNV) were also detected in these South American libraries; however, the focus of the current work was to describe new viruses.

Fig. 1

Proportion of unambiguous virus sequence reads binned by library designation. RNA was acquired from different developmental stages of Solenopsis invicta including workers, brood and dead workers (x-axis). Each virus genome (see legend) is represented by a unique color. The total number of reads for each sample after quality and contaminant filtering was 6,991,698 (1 worker); 7,049,900 (2 worker); 6,585,560 (3 worker); 7,485,078 (4 worker); 6,626,110 (5 brood); 7,590,164 (6 brood); 5,943,866 (7 dead worker [midden]); and 5,042,710 (8 dead worker [midden])
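Each bar in Fig. 1 is a per-library breakdown: reads assigned unambiguously to each virus genome divided by all virus-assigned reads in that library. The Python sketch below shows that normalization; the read counts are hypothetical, and in practice they would come from counting reads that map uniquely to each virus contig (for example, from the Bowtie2 alignments).

```python
"""Sketch: per-library proportions of virus reads, as plotted in Fig. 1.

The counts are hypothetical; real counts would come from reads mapped
unambiguously to each virus genome in one library.
"""


def read_proportions(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-virus read counts for one library into proportions."""
    total = sum(counts.values())
    if total == 0:
        return {virus: 0.0 for virus in counts}
    return {virus: n / total for virus, n in counts.items()}


if __name__ == "__main__":
    library_6b = {"SINV-7": 9500, "SINV-12": 500}  # hypothetical counts
    for virus, frac in read_proportions(library_6b).items():
        print(f"{virus}: {frac:.1%}")
```

Because each library is normalized separately, the bars show relative composition and do not reflect differences in total read depth among libraries.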

The SINV-6 through SINV-13 genomes possessed NTPase domains containing the conserved Walker A motif (Gx4GK[S/T]), indicative of helicase function, and an RdRp motif (Fig. 2). The RdRp is an essential protein encoded in the genomes of all RNA viruses that lack a DNA stage [32]. These S. invicta genome sequences required a reverse transcription step for successful PCR amplification; PCR without reverse transcription did not yield an amplicon (data not shown). Furthermore, each of these genomes encoded one to several large open reading frames (ORFs), in the sense orientation only (Fig. 2). Together, these results support the conclusion that these are positive-sense, single-stranded RNA genomes. The ninth virus, Solenopsis midden virus (SMV), exhibited genome characteristics and sequence identity consistent with the double-stranded RNA genomes of the Totiviridae.
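The Walker A motif written above as Gx4GK[S/T] translates directly into a regular expression: G, any four residues, G, K, then S or T. The Python sketch below scans a protein sequence for it. The fragment is hypothetical, and a regex hit only flags a candidate P-loop; it does not by itself demonstrate helicase function.

```python
import re

# Walker A (P-loop) motif as written in the text: G, any 4 residues, G, K, S/T.
WALKER_A = re.compile(r"G.{4}GK[ST]")


def find_walker_a(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched motif) for each non-overlapping hit."""
    return [(m.start() + 1, m.group()) for m in WALKER_A.finditer(protein.upper())]


if __name__ == "__main__":
    fragment = "MKLGPPGIGKSTLV"  # hypothetical protein fragment
    print(find_walker_a(fragment))  # [(4, 'GPPGIGKS')]
```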

Fig. 2

Diagrammatic representation of the predicted genome map and relative size (nucleotides) of each Solenopsis invicta virus sequenced. Solenopsis invicta virus 6 (SINV-6) through SINV-13 are positive-sense, single-stranded RNA genomes and SMV has a double-stranded RNA genome. Summary information for each virus is provided in Table 2

Phylogenetic analysis of the conserved RdRp region inferred their placement within the picorna-like virus superfamily [10, 21] (Suppl. Fig. 1). SINV-6, SINV-9, SINV-12, and SINV-13 clustered within the Dicistroviridae clade [58]. SINV-8 clustered within the recently described Polycipiviridae [35], SINV-11 within the Iflaviridae, and SMV within the Totiviridae (double-stranded RNA genomes). SINV-7 and SINV-10 were too divergent to cluster within an established group of the picorna-like virus superfamily [21]. However, close relatives of SINV-7 and SINV-10 were identified by Blastp analysis, and many of these relatives have been assigned to a “picorna-calici” clade as described by Shi et al. [43].

Dicistroviridae (SINV-6, SINV-9, SINV-12, SINV-13)

Phylogenetic analysis of the non-structural polyproteins placed four of the virus genomes (SINV-6, SINV-9, SINV-12, and SINV-13) clearly within the Dicistroviridae [58] (Fig. 3). SINV-6 was detected in both North and South American populations of S. invicta, so its genome was Sanger sequenced (Table 2) from Argentinean S. invicta (isolate = Formosa) and US S. invicta (isolate = Hogtown) samples for comparison. The RNA genome of both isolates was 9793 nucleotides (nts) long, excluding the polyadenylated 3′ terminus. Two large non-overlapping ORFs were predicted in the sense orientation: ORF1 (5′-proximal) was 5862 nts and ORF2 (3′-proximal) was 2085 nts (Fig. 2). Blastx [1] analysis revealed significant identity of ORF1 to non-structural proteins and of ORF2 to structural (capsid) proteins characteristic of viruses in the Dicistroviridae. The RdRp region of ORF1 exhibited significant identity to Bundaberg bee virus 1 (unclassified) and Acute lethal paralysis virus (Cripavirus genus) [58]. The intergenic internal ribosome entry site (IRES) bulge sequence (UGAUCU) found in cripaviruses was not detected in either SINV-6 isolate [17, 37]. Minimal sequence variation was observed between the isolates: the genomes were 97.6% identical, and the predicted translation products of ORFs 1 and 2 were 99.5 and 99.7% identical, respectively. Phylogenetic analysis of the nonstructural polyprotein (ORF1) supported placement within the Cripavirus clade. However, SINV-6 and some other cripaviruses appear to form a unique group that may represent a new genus related to, but distinct from, Cripavirus (Fig. 3).
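Percent identity between two isolates is the fraction of aligned positions at which they carry the same residue. The Python sketch below computes it for two sequences that have already been aligned to equal length. It assumes one common convention, counting only columns where neither sequence has a gap; other conventions give different values, and the convention used for the figures above is not stated in the text. The aligned fragments are hypothetical.

```python
"""Sketch: percent identity between two pre-aligned sequences.

Convention assumed here: identity over columns where neither sequence has
a gap. The example fragments are hypothetical.
"""


def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


if __name__ == "__main__":
    # Hypothetical aligned fragments from two isolates.
    print(f"{percent_identity('AUGGCU-UAC', 'AUGGCUAUAG'):.1f}%")
```

The same function applies to nucleotide genomes and to translated ORFs, which is why genome-level and protein-level identities can be reported side by side.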

Fig. 3

Phylogenetic (mid-point rooted) tree for the SINV-6, SINV-9, SINV-12, and SINV-13 (shown in red font) non-structural polyproteins and related sequences of the Dicistroviridae. Each genus (Triatovirus, Cripavirus, and Aparavirus) is noted. The host’s taxonomic class is shown in gray font and the accession number is provided after the virus name. Numbers on the internal nodes represent the number of nonparametric bootstrap samples out of 100 with the same topology. Legend for genetic distance is shown at the bottom left
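The node values in these trees come from nonparametric bootstrapping: alignment columns are resampled with replacement to the original length, a tree is inferred from each replicate, and a node's support is the number of the 100 replicate trees that recover it. The Python sketch below generates one such replicate for a hypothetical toy alignment; in this study RAxML performed the resampling and tree inference internally.

```python
"""Sketch: one nonparametric bootstrap replicate of an alignment.

Columns (sites) are resampled with replacement; the toy alignment is
hypothetical. Tree inference on each replicate is not shown.
"""
import random


def bootstrap_replicate(alignment: dict[str, str],
                        rng: random.Random) -> dict[str, str]:
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    # Draw column indices with replacement, then rebuild every sequence
    # from the same columns so that sites stay aligned across taxa.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


if __name__ == "__main__":
    toy = {"virusA": "MKLGPPGIGK", "virusB": "MRLGPAGVGK"}  # hypothetical
    rng = random.Random(1)
    for _ in range(3):
        print(bootstrap_replicate(toy, rng))
```

Resampling whole columns, rather than individual residues, preserves the pairing of residues across taxa at each site, which is what makes the replicates valid pseudo-alignments.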

The SINV-9 genome was 10,162 nts in length excluding the polyadenylated 3′ terminus. This genome sequence was not detected in S. invicta collected from the US range. Two large non-overlapping ORFs were predicted in the sense orientation (Fig. 2). ORF1 (5′-proximal) was 5600 nts and the translated sequence exhibited identity with helicase and RdRp protein sequences from viruses in the Dicistroviridae (Triatovirus genus). The translated polyprotein of ORF2 (3′-proximal, 290 nts) also exhibited sequence identity with triatoviruses (capsid proteins). This is the first S. invicta virus genome shown to cluster within the Triatovirus genus.

Phylogenetic analysis of the SINV-12 and SINV-13 nonstructural polyproteins showed that these sequences clustered within the Aparavirus genus (Fig. 3). The SINV-12 genome was 9535 nts excluding the polyadenylated 3′ terminus. Two non-overlapping ORFs (ORF1 and 2) and one overlapping ORF (ORF2b) were predicted in the sense orientation (Fig. 2). ORF1 (5′-proximal) was 5592 nts and ORF2 (3′-proximal) was 2807 nts. ORF2b overlapped ORF2 at the 5′ end. Sequence analysis revealed significant identity to non-structural proteins (ORF1) and structural (capsid) proteins (ORF2) characteristic of viruses in the Aparavirus genus of the Dicistroviridae. The overlapping ORF2b is also characteristic of aparaviruses [58].
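ORF prediction on a positive-sense genome amounts to scanning the three sense-strand reading frames for start-to-stop stretches; an overlapping ORF such as ORF2b shows up as a second ORF in a different frame of the same strand. The Python sketch below uses a simple rule (ATG to the first in-frame stop, at least `min_codons` codons long). The rule, the threshold, and the toy sequences are assumptions for illustration; real annotation may use other criteria, for example because dicistrovirus capsid ORFs can initiate at non-AUG codons.

```python
"""Sketch: predicting ORFs in the three sense-strand reading frames.

Assumed rule: ATG ... first in-frame stop, at least `min_codons` codons
(stop included). Threshold and example sequences are hypothetical.
"""
STOPS = {"TAA", "TAG", "TGA"}


def sense_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int, int]]:
    """Return (frame 1-3, 1-based start, 1-based end incl. stop) per ORF."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the longest ORF for this stop
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((frame + 1, start + 1, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    # Toy sequences with a small threshold so the short ORFs are reported.
    print(sense_orfs("AUGAAAUAA", min_codons=2))
    print(sense_orfs("CATGAAATAA", min_codons=2))
```

Only the sense strand is scanned, matching the observation that these genomes encode ORFs in the sense orientation only.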

The genome of SINV-13 was 8297 nts long, excluding the polyadenylated 3′ terminus. Two non-overlapping ORFs (ORF1, 5′-proximal and ORF2, 3′-proximal) were predicted in the sense orientation, with a third ORF (ORF2b) overlapping ORF2 at the 5′ end. Sequence analysis of ORF1 revealed significant sequence identity with Israeli acute paralysis virus; ORF2 had significant sequence identity with Kashmir bee virus. The overlapping ORF2b and the phylogenetic analysis support placement of SINV-13 in the Aparavirus genus. SINV-12 and SINV-13 were detected exclusively in South American S. invicta.

Polycipiviridae (SINV-8)

The SINV-8 genome sequence was 12,178 nucleotides long, with five non-overlapping ORFs in the sense direction and one overlapping ORF (ORF2b) predicted (Fig. 2). ORFs 1–4 exhibited significant sequence identity with putative capsid proteins of RNA viruses in the Polycipiviridae, and ORF5 with their helicase, protease, and RdRp. Phylogenetic analysis of the ORF5 polyprotein clustered SINV-8 within the new family Polycipiviridae, genus Sopolycivirus [35] (Fig. 4). The unique polycistronic genome architecture and the phylogenetic analysis support this taxonomic placement. The SINV-8 genome sequence was detected exclusively in Argentinean S. invicta.

Fig. 4

Phylogenetic (mid-point rooted) tree for the SINV-8 (shown in red font) non-structural polyprotein and related sequences of the Polycipiviridae. Each genus, Sopolycivirus, Hupolycivirus and Chipolycivirus is noted. The host’s taxonomic class is shown in gray font and the accession number is provided after the virus name. Numbers on the internal nodes represent the number of nonparametric bootstrap samples out of 100 with the same topology. Legend for genetic distance is shown at the bottom left

Iflaviridae (SINV-11)

The SINV-11 genome was 9200 nts excluding the polyadenylated 3′ terminus, with a single large ORF in the sense direction (Fig. 2). The translated sequence of the predicted ORF exhibited identity with capsid, helicase, and RdRp proteins of members of the Iflaviridae. Phylogenetic analysis of the polyprotein placed SINV-11 within the large family Iflaviridae (Fig. 5). This is the first iflavirus genome sequence reported from S. invicta. The SINV-11 genome sequence was detected exclusively in the South American population of S. invicta.

Fig. 5

Phylogenetic (mid-point rooted) tree for the SINV-11 (shown in red font) non-structural polyprotein and related sequences of the Iflaviridae. The host’s taxonomic class is shown in gray font and the accession number is provided after the virus name. Numbers on the internal nodes represent the number of nonparametric bootstrap samples out of 100 with the same topology. Legend for genetic distance is shown at the bottom left

Totiviridae (SMV)

The SMV genome sequence was 6766 nts long. Two overlapping ORFs were predicted in the sense orientation (Fig. 2). The predicted translation products of ORF1 (Gag) and ORF2 (Pol) exhibited identity with capsid and replicase proteins of the Totiviridae, respectively. PCR amplification required prior reverse transcription, indicating an RNA genome. Both sense and antisense oligonucleotide primers produced cDNA template in reverse transcription, providing evidence that the genome is double-stranded. Phylogenetic analysis of the ORF2 polyprotein sequence clustered SMV within a large group of arthropod-infecting toti-like viruses, distinct from the Totiviridae proper (Fig. 6). There are four recognized Totiviridae genera, each distinguished by its host: members of Giardiavirus and Leishmaniavirus infect protozoa, and those of Totivirus and Victorivirus infect fungi. The clade containing SMV and SMV-like viruses is distinct from the established Totiviridae genera and appears to exclusively infect arthropods (Fig. 6).

Fig. 6

Phylogenetic (mid-point rooted) tree for the SMV (shown in red font) non-structural polyprotein and related sequences of the Totiviridae and Toti-like viruses. The known genera, Totivirus, Giardiavirus, Victorivirus, and Leishmaniavirus are indicated. The host’s taxonomic class is shown in gray font and the accession number is provided after the virus name. Numbers on the internal nodes represent the number of nonparametric bootstrap samples out of 100 with the same topology. Legend for genetic distance is shown at the bottom left

Unclassified viral genomes (SINV-7, SINV-10)

Neither SINV-7 nor SINV-10 was detected in the North American S. invicta colonies examined. The SINV-7 genome was 10,257 nucleotides long, excluding the polyadenylated 3′ terminus. Two non-overlapping ORFs were predicted in the sense direction: one long ORF in frame 1 at the 5′ end of the genome (8517 nts) and one short ORF in frame 3 at the 3′ end of the genome (587 nts) (Fig. 2). No significant sequence similarity was detected for ORF2. The ORF1 polyprotein sequence exhibited the most significant alignments to the unclassified Milolii virus (98% sequence coverage with 37% identity; GenBank Accession: MF155030) from the ghost ant, Tapinoma melanocephalum, and to Bundaberg bee virus 8 (91% sequence coverage with 38% identity; GenBank Accession: AWK77859) from the honey bee, Apis mellifera [42]. Phylogenetic analysis of the polyprotein placed SINV-7 within a large cluster of as-yet unclassified viruses (Fig. 7).

Fig. 7

Phylogenetic (mid-point rooted) tree for the SINV-7 and SINV-10 (shown in red font) non-structural polyprotein and related, unclassified sequences. The host’s taxonomic class is shown in gray font and the accession number is provided after the virus name. Numbers on the internal nodes represent the number of nonparametric bootstrap samples out of 100 with the same topology. Legend for genetic distance is shown at the bottom left

The SINV-10 genome was 10,979 nts long, excluding the polyadenylated 3′ terminus. A single large ORF (10,236 nts) was predicted in the sense direction. Blastx analysis of the ORF revealed significant sequence identity with Hubei picorna-like virus 54 (99% ORF coverage; 53% sequence identity) from a myriapod metagenome [43] and with Riptortus pedestris virus 1 from the bean bug, Riptortus pedestris [65]. These viruses are not currently classified and may form a new virus family. Although the genome architecture resembles that of iflaviruses, phylogenetic analysis placed SINV-10 nearer the Solinviviridae. However, SINV-10 does not appear to possess the jelly roll domain observed in Solinviviridae genomes. This group of virus sequences (Fig. 7), including SINV-3, SINV-7, and SINV-10, exhibits significant divergence from members of the picorna-like virus superfamily and may represent a unique taxonomic group.

Discussion

The objective of the work presented here was to examine metatranscriptomes of different developmental stages of S. invicta collected across its native range (Formosa, Argentina) to expand the catalog of viruses that could be used as natural control agents against the US population. The nine virus genomes presented here increase the number of viruses discovered from S. invicta from six to 15, a 150% increase (the number of RNA viruses rises from five to 14, a 180% increase). The S. invicta viruses now include 13 positive-sense, single-stranded RNA viruses (SINV-1 to SINV-13) [46, 47, 49, 50, 59], one double-stranded RNA virus (SMV), and one DNA virus (SiDNV) [52]. These additions to the S. invicta virome offer potential new classical biological control agents for use outside South America, because eight of the nine appear to be found exclusively in Argentinean S. invicta.
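The size of the expansion depends on the baseline. Counting all previously known S. invicta viruses (SINV-1 to SINV-5 plus SiDNV) gives six before and 15 after; counting RNA viruses only gives five before and 14 after. The short Python calculation below makes both percentages explicit.

```python
def percent_increase(before: int, after: int) -> float:
    """Percentage change from `before` to `after`."""
    return 100.0 * (after - before) / before


NEW_GENOMES = 9
all_viruses_before = 6  # SINV-1 to SINV-5 plus SiDNV
rna_viruses_before = 5  # SINV-1 to SINV-5

print(percent_increase(all_viruses_before, all_viruses_before + NEW_GENOMES))  # 150.0
print(percent_increase(rna_viruses_before, rna_viruses_before + NEW_GENOMES))  # 180.0
```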

The tentative virus names that we have assigned indicate the associated source of the virus (i.e., Solenopsis invicta) and provide a unique designation (an Arabic numeral). The rules established by the International Committee on Taxonomy of Viruses (ICTV) for naming new viruses emphasize uniqueness above all other criteria. We feel that, although mundane, these tentative names satisfy this requirement and are consistent with previously ICTV-ratified names.

Additional studies will be required to establish the relationships between these new viruses and S. invicta. Fire ants are omnivores [61], so a virus sequence derived from the metatranscriptome may have originated from another organism that had been ingested by the ants. However, the virus genome sequences were highly represented in the metatranscriptomes (Fig. 1), suggesting that replication was occurring, as would be expected with active viral infections (as opposed to ingestion of a relatively small number of virus particles). Furthermore, each virus was detected in samples from numerous locations. These associations will need to be investigated to establish whether the ant serves as host for each virus. Regardless, the metagenomic approach greatly accelerates the prospecting phase of virus discovery [30]. Among the new viruses described, only SINV-6 was detected in the US S. invicta population, indicating that many of the new viruses may be candidates for classical biological control in the US. However, because a limited number of samples from the US range (from California, Mississippi, and Florida) was evaluated, it is possible that additional viruses are actually present in US S. invicta. These results also support the long-held assumption that S. invicta escaped its natural enemies during the US founding event(s) [38, 39].

There were clear differences in the representation of each virus sequence among developmental stages. SINV-7 was by far the most common virus genome sequence detected in all developmental stages, ranging from 60 to 100% of virus reads depending on the library examined (Fig. 1). Brood-derived virus sequences were limited to SINV-7 and SINV-12. This result was unexpected because the larval stage of S. invicta receives all incoming solid food from foraging worker ants. The larvae accept solid food from worker ants, digest it, and regurgitate it in liquid form back to the workers, who then redistribute it throughout the colony by trophallaxis [50, 62]. Thus, larvae should be exposed to the most pathogens because they are recipients of all incoming solid food. Several explanations are possible for the reduced virus representation in the brood metatranscriptome, including an inability of the viruses to replicate within the larval stage (stage-dependent tropism), destruction of viruses during larval digestion, and interference by the overrepresented SINV-7 with the replication of other viruses (virus–virus interactions) [9]. Stage- and tissue-specific tropism have been reported for other well-studied S. invicta viruses, including SINV-1 and SINV-3 [50].

All virus genomes except SMV were detected in the combined libraries created from live ants (workers and brood; libraries 1–6). The greatest virus diversity was observed in the libraries derived from dead worker ants (Fig. 1). SMV was the only virus genome sequence detected exclusively in the libraries created from dead worker ants. Phylogenetic analysis of the SMV RdRp placed it within the Totiviridae clade (Suppl. Fig. 1). When the entire translated ORF2 (Pol) sequence was analyzed, SMV clustered within a unique large group of Totivirus-like sequences (Fig. 6). This group is related to, but distinct from, the ICTV-recognized Totiviridae members within the genera Giardiavirus, Totivirus, Victorivirus, and Leishmaniavirus (Fig. 6). Members of the Totiviridae have only been reported to infect protozoan or fungal hosts [63]. The Totivirus-like group, including SMV, is not only phylogenetically unique but also distinguished by the fact that all of its members infect arthropods, suggesting that this group may be sufficiently divergent for a unique taxonomic assignment. Several members have been obtained from ant hosts, including Linepithema humile [60] (Linepithema Toti-like virus), Camponotus nipponicus [23] (Camponotus nipponicus totivirus), and C. yamaokai [22] (Camponotus yamaokai virus). However, because SMV was limited to libraries obtained from ant midden piles, we suspect that the virus may be infecting a fungal or protozoan opportunist. The actual host of SMV remains to be determined.

Four new genomes clustered within the Dicistroviridae: two aparaviruses (SINV-12 and SINV-13), one Cripavirus (SINV-6), and one Triatovirus (SINV-9) (Fig. 3). Interestingly, the Aparavirus genus appears to be bifurcating into two distinct phylogenetic groups. One group (Aparavirus 1; Fig. 3) includes viruses that infect terrestrial insects (all within the order Hymenoptera). The other group (Aparavirus 2) includes viruses obtained from marine hosts (Crustacea and Mollusca) (Fig. 3). SINV-9 is the first fire ant-derived virus in the Triatovirus genus; its nearest relative in the group is Black queen cell virus (Accession AF183905), which infects honey bees.

SINV-11 is the first iflavirus associated with fire ants. The Iflaviridae is a very large family with a single genus that requires taxonomic revision [57]. Hosts of members of the Iflaviridae are limited to arthropods (primarily insects), with the exception of King virus, which was obtained from a metagenomics project on a bat (Chiroptera) [12].

SINV-8 expands membership of the Sopolycivirus genus within the Polycipiviridae, a new family of viruses within the Picornavirales with a unique polycistronic genome architecture [35]. Members of the genus Sopolycivirus primarily infect ants (Formicidae). Three viruses from S. invicta now belong to this genus: SINV-2, SINV-4, and SINV-8 (Fig. 4). SINV-2 has been reported to be associated with significant reductions in fecundity, longer claustral periods, and slower growth of newly established S. invicta colonies [31].

Two virus genomes were rather divergent and did not cluster within any established virus taxon. SINV-7 and SINV-10 branch basally with the new family Solinviviridae [56] but are deeply divergent (Fig. 7). Blastx analysis of the translated ORFs of SINV-7 identified close relatives, including Milolii virus from ghost ants (MF155030), Alber virus (from an unknown ant species from Lebanon [Alex Greninger, personal communication]; KX580900), Bundaberg bee virus 8 from the honey bee (MG995704), HVAC-associated RNA virus (from an air conditioner filter; MG775312), and Blacklegged tick picorna-like virus 1 from a tick (MG647769). The nearest relatives of SINV-10 identified by Blast analysis were Hubei picorna-like virus 54 (KX884250) from an unknown arthropod host and Riptortus pedestris virus 1 (NC031750) from a hemipteran.

In conclusion, nine new RNA virus genomes have been discovered by metagenomic RNA sequencing of S. invicta samples collected across the Formosa region of Argentina, making the virome of S. invicta the best characterized of any ant species. Eight of the new viruses appear to be exclusive to the South American S. invicta population, indicating that they could be potential classical biological control agents against the US population. Additional investigation will be needed to establish the host, impact, and suitability of each virus as a natural control agent for S. invicta.