Introduction

Cotton (Gossypium hirsutum) is one of the most socioeconomically important crops worldwide and the main source of fiber for the textile industry [1, 2, 3]. Cotton crops are constantly challenged by several insect pest species [4]. The cotton boll weevil (CBW), Anthonomus grandis (Coleoptera: Curculionidae), is considered the major insect pest of cotton in South and North America, and infestation is highest during the transition from flowering to fructification [5, 6]. CBW adults feed on and lay eggs in cotton reproductive structures, often causing flower bud abortion [7, 8]. Because CBW larvae develop inside these structures, they also damage buds that are not aborted, impairing fiber quality [6, 9]. The high reproductive capacity, plasticity, and genetic variability of CBW, together with its survival in crop residues and stumps, have contributed to increases in its incidence, density, and geographic distribution [6, 10, 11, 12]. No conventional or transgenic cotton cultivar with satisfactory resistance to CBW is currently available to producers; consequently, insecticides are applied many times each year to manage this pest [13]. Moreover, CBW populations with reduced susceptibility to insecticides, as well as failures of chemical control in cotton crops, have already been reported in Brazil [14, 15]. In this context, the identification of new viruses that infect CBW may provide information leading to the development of molecular or biological tools for its effective control.

Here, we used an RNA sequencing approach based on next-generation sequencing (NGS) to investigate the presence of viruses and viral coding RNA in apparently healthy native adult CBW insects collected in September 2020 from a cotton field in Serra da Petrovina (16°47′53″S, 54°07′53″W), Pedra Preta, Mato Grosso state, Brazil. Pooled CBW insects (200 mg) were macerated with a mortar and pestle in SM buffer (100 mM NaCl, 8 mM MgSO4, and 50 mM Tris-Cl, pH 7.5). The homogenate was filtered once through cheesecloth and clarified by centrifugation three times at 4,000 × g for 10 min, and the resulting supernatant was used for RNA extraction with a QIAamp Viral RNA Mini Kit (QIAGEN, Hilden, Germany) according to the manufacturer’s instructions. rRNA was removed from the total RNA sample using a Ribo-Zero rRNA Removal Kit (Illumina, San Diego, CA, USA), and a cDNA library was constructed using a TruSeq RNA Library Preparation Kit (Illumina). The library was sequenced by Macrogen (Gangnam-gu, Seoul, Republic of Korea) on an Illumina HiSeq 2000 platform using paired-end reads. The raw reads were quality-trimmed and assembled de novo using MEGAHIT [16]. The resulting contigs were compared against an in-house viral RefSeq database using BLASTx, and those related to viral sequences were retrieved. To extend the assembled sequences as far as possible, the trimmed reads were mapped back to the respective viral contigs using Geneious 11.1.5 [17], which was also used for genome annotation. Open reading frames (ORFs) were confirmed by BLASTx searches against the NCBI non-redundant protein database (accessed August 2021).
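To illustrate the contig-screening step, the following Python sketch keeps, for each contig, the highest-scoring BLASTx hit from a tabular report. It assumes BLAST+ tabular output (`-outfmt 6`) with its 12 default columns (query, subject, percent identity, alignment length, mismatches, gap openings, query start/end, subject start/end, e-value, bit score). The e-value cutoff of `1e-5` and the name `best_viral_hits` are hypothetical choices for illustration; the filtering thresholds used in this study are not stated.

```python
"""Minimal sketch: keep the best BLASTx hit per contig from tabular output.

Assumes BLAST+ -outfmt 6 with the 12 default columns. The e-value cutoff is
a hypothetical choice, not a parameter reported for this study.
"""
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    contig: str
    subject: str
    pident: float
    evalue: float
    bitscore: float


def best_viral_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, Hit]:
    """Return the highest-bitscore hit per contig that passes the e-value cutoff."""
    best: Dict[str, Hit] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.rstrip("\n").split("\t")
        hit = Hit(
            contig=cols[0],
            subject=cols[1],
            pident=float(cols[2]),
            evalue=float(cols[10]),
            bitscore=float(cols[11]),
        )
        if hit.evalue > max_evalue:
            continue  # discard weak hits
        current = best.get(hit.contig)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.contig] = hit  # keep the strongest hit seen so far
    return best
```

Passing the lines of a tabular report to `best_viral_hits` yields one entry per contig with a significant hit; contigs whose best hit exceeds the cutoff are dropped entirely.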

Sequencing yielded 56,210,608 reads, of which 138,798 were identified as virus-related. De novo assembly of these viral reads generated a consensus sequence of 10,632 nucleotides (GenBank accession number OK413669, Supplementary Material S1), with a genome coverage of 1,440×. The genome contains a single predicted ORF of 8,913 nucleotides encoding a large polyprotein, flanked by a 5’-UTR of 1,158 nucleotides and a 3’-UTR of 561 nucleotides. The 5’ and 3’ ends of the viral genome were confirmed by rapid amplification of cDNA ends (RACE) using the 5’ and 3’ RACE System for Rapid Amplification of cDNA Ends, version 2.0 (Thermo Fisher Scientific), according to the manufacturer’s protocol (data not shown). The amplified 5’ and 3’ products were sequenced on a MinION platform with the Rapid RBK110.96 kit (Oxford Nanopore Technologies) following the manufacturer’s instructions, and the sequences were analyzed using Geneious 11.1.5. Functional regions of the polyprotein encoding structural and non-structural proteins, flanked by putative proteolytic cleavage sites, were identified by sequence alignment (Fig. 1). The genome organization of the novel virus closely resembles that of other members of the family Iflaviridae [18, 19].
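The reported genome layout is internally consistent: 1,158 + 8,913 + 561 = 10,632 nucleotides, and the 8,913-nucleotide ORF is a whole number of codons (2,971). The following Python sketch shows one simple way to locate the longest ORF in a positive-sense genome and derive the UTR lengths from its coordinates. It is not the Geneious workflow used in this study; it assumes ATG as the only start codon, the standard stop codons, forward-strand frames only, and an ORF length that includes the stop codon. The function names are hypothetical.

```python
"""Minimal sketch: find the longest forward-strand ORF and derive UTR lengths.

Assumptions: positive-sense genome given 5'->3', ATG start, standard stops
(TAA, TAG, TGA), ORF length counted including the stop codon.
"""
from typing import Tuple

STOPS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> Tuple[int, int]:
    """Return (start, end) of the longest ATG...stop ORF; end is exclusive.

    Returns (0, 0) if no complete ORF is found.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOPS:
                end = i + 3  # include the stop codon
                if end - start > best[1] - best[0]:
                    best = (start, end)
                start = None  # look for the next ATG in this frame
    return best


def genome_layout(seq: str) -> Tuple[int, int, int]:
    """Return (5'-UTR length, ORF length, 3'-UTR length)."""
    start, end = longest_orf(seq)
    return start, end - start, len(seq) - end
```

For the toy sequence `GGATGAAATTTTAACC`, `genome_layout` returns `(2, 12, 2)`: a 2-nt 5’-UTR, a 12-nt ORF (ATG AAA TTT TAA), and a 2-nt 3’-UTR. Real annotation would also weigh homology to known polyproteins, as was done here with BLASTx.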

Fig. 1

Genome organization of the putative new picorna-like virus, tentatively named "Anthonomus grandis iflavirus 1" (AgIV-1), identified in the cotton boll weevil (Anthonomus grandis). The 10,632-nucleotide viral genome comprises a 5’-UTR, a single large ORF of 8,913 nucleotides, and a 3’-UTR followed by a polyadenylated tail. The ORF encodes a polyprotein with the structural proteins VP1 to VP4 in the N-terminal region and the non-structural proteins helicase, protease, and RNA-dependent RNA polymerase (RdRp) in the C-terminal region. The N-terminus of the polyprotein contains a typical leader (L) polypeptide.

An amino acid sequence alignment generated with MAFFT [20] showed that the putative polyprotein was 32.13% identical to the corresponding sequence of a putative iflavirus (QKN89051.1) detected in samples from wild zoo birds in China. The International Committee on Taxonomy of Viruses (ICTV) [19] uses two criteria to demarcate species in the genus Iflavirus: (1) natural host range and (2) capsid protein amino acid sequence identity below 90% [21, 22]. Given its distinct host and an identity of only 32.13% across the entire polyprotein, which includes the capsid region, far below the 90% threshold, we conclude that this new picorna-like virus should be recognized as a member of a new species in the genus Iflavirus. We have tentatively named this putative new virus "Anthonomus grandis iflavirus 1" (AgIV-1).
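Percent identity depends on how gapped alignment columns are counted, and tools differ on this point. The Python sketch below uses one common convention (identical residues divided by the number of columns in which neither sequence has a gap) and compares a value with the 90% capsid threshold. Both the convention and the helper names are assumptions; the text does not state how the reported 32.13% was computed, and the threshold check covers only one of the two ICTV criteria.

```python
"""Minimal sketch: percent identity from a pairwise amino acid alignment.

Convention (an assumption): identical residues / columns where neither
sequence has a gap. Other tools divide by alignment length or by the
length of the shorter sequence, which gives different values.
"""


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over ungapped columns of two aligned sequences."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def below_capsid_threshold(identity: float, threshold: float = 90.0) -> bool:
    """True when identity falls below the 90% capsid demarcation threshold.

    This checks only the sequence criterion; host range is assessed separately.
    """
    return identity < threshold
```

For the toy alignment `MKV-LA` versus `MRVQLA`, five columns are ungapped and four are identical, so `percent_identity` returns 80.0, which `below_capsid_threshold` reports as below 90%.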

Phylogenetic analysis was performed on the putative novel virus and other iflaviruses (Supplementary Table S1), using the complete polyprotein sequences derived from their genomes. These sequences were aligned using MAFFT [20], and a maximum-likelihood tree was constructed with FastTree as implemented in Geneious 11.1.5 [17]; branch support was estimated using a Shimodaira-Hasegawa-like test (Fig. 2). Silva et al. [23] reported that iflaviruses do not form separate clades according to the insects they infect, suggesting that they have not co-diverged with their insect hosts at the order level [23, 24]. Our results are consistent with this observation: AgIV-1, from a coleopteran host, occupies a basal position within a clade that includes three iflaviruses found in lepidopteran hosts (Fig. 2).
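The tree in Fig. 2 is midpoint-rooted, meaning the root is placed halfway along the longest path between any two leaves. The following Python sketch illustrates this on a hypothetical unrooted tree stored as an adjacency map of branch lengths. It finds the longest leaf-to-leaf path with two farthest-node searches (valid for trees with non-negative branch lengths) and returns the edge and offset at which the root falls. It illustrates the technique only and is not the Geneious implementation; the tree format and function names are assumptions.

```python
"""Minimal sketch: midpoint rooting of an unrooted tree.

The tree is a hypothetical adjacency map {node: {neighbor: branch_length}}.
The root is placed halfway along the longest leaf-to-leaf path.
"""
from typing import Dict, List, Tuple

Tree = Dict[str, Dict[str, float]]


def _farthest(tree: Tree, source: str) -> Tuple[str, float, Dict[str, str]]:
    """Iterative DFS from source: farthest node, its distance, parent links."""
    dist = {source: 0.0}
    parent: Dict[str, str] = {}
    stack = [source]
    while stack:
        node = stack.pop()
        for nbr, length in tree[node].items():
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    far = max(dist, key=lambda n: dist[n])
    return far, dist[far], parent


def midpoint_root(tree: Tree) -> Tuple[str, str, float]:
    """Return (u, v, x): the root lies on edge u-v at distance x from u."""
    start = next(iter(tree))
    leaf_a, _, _ = _farthest(tree, start)  # one end of the longest path
    leaf_b, diameter, parent = _farthest(tree, leaf_a)  # the other end
    # Reconstruct the longest path by following parent links back to leaf_a.
    path: List[str] = [leaf_b]
    while path[-1] != leaf_a:
        path.append(parent[path[-1]])
    half = diameter / 2.0
    walked = 0.0
    for u, v in zip(path, path[1:]):
        length = tree[u][v]
        if walked + length >= half:
            return u, v, half - walked  # midpoint falls on this edge
        walked += length
    raise ValueError("tree has no edges")
```

For a tree with edges A-X (1), B-X (2), X-Y (1), Y-C (4), and Y-D (1), the longest path runs from B to C (length 7), so `midpoint_root` places the root 0.5 units from Y along the Y-C edge.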

Fig. 2

Phylogenetic analysis of the new picorna-like virus, tentatively named "Anthonomus grandis iflavirus 1" (AgIV-1), identified in the cotton boll weevil (Anthonomus grandis), and other viruses belonging to the families Iflaviridae and Dicistroviridae. The midpoint-rooted maximum-likelihood tree was constructed from complete polyprotein sequences using FastTree as implemented in Geneious 11.1.5. Branch support was estimated using a Shimodaira-Hasegawa-like test.

The identification of new viruses infecting the cotton boll weevil expands our knowledge of the diversity and evolution of these viruses and provides information that could support the development of biotechnological tools for controlling this pest. Few iflaviruses have been associated with symptoms or acute disease [25, 26]; deformed wing virus of honey bees is a notable exception [27]. Future studies of this new picorna-like virus will focus on its prevalence, infectivity, and potential use as a viral vector to deliver interfering RNA in biological control strategies.