Introduction

Silverleaf whitefly, Bemisia tabaci (Gennadius), is a polyphagous pest that causes severe damage to more than 600 plant species, directly by feeding and excreting honeydew that promotes sooty mould, and indirectly by transmitting more than 200 plant viruses (De Lima et al. 2021). It was first discovered in cotton fields in Greece in 1889 (Cock 1993) and was recorded on cotton in Pusa (Bihar, India) in 1905 (Misra and Lamba 1929). Owing to its remarkable ability to spread, develop, adapt, and dominate in new environments, it is regarded as one of the top 100 invasive alien species in the world (Ramos et al. 2018). Studies have also revealed the existence of B. tabaci cryptic species, which are morphologically indistinguishable but show distinctive biological, physiological, and genetic variation. This has shifted the nomenclature from biotypes (Costa et al. 1991), to races (De Barro et al. 2005), to genetic groups (Boykin et al. 2007) and finally to species (Tay et al. 2013).

To date, 46 cryptic species of B. tabaci, separated by 4% genetic divergence, have been identified globally under 11 genetic groups (Mugerwa et al. 2018; Lestari et al. 2021; Rehman et al. 2021). Of these, 10 cryptic species (Asia I, Asia II 1, Asia II 5, Asia II 7, Asia II 8, Asia II 11, Asia IV, China 3, China 7 and Middle East Asia Minor 1 (MEAM 1)) have been reported from India. Understanding the genetic variation and distribution of B. tabaci cryptic species has become increasingly important in light of ongoing climate change, increased domestic transportation of agricultural products, and intensive pest control practices.

Therefore, in the current study we first conducted an interhost survey, which showed that brinjal had the highest B. tabaci infestation; this led us to conduct an interlocation survey in the brinjal crop of Bihar. Molecular markers, viz. RAPD, SSR and mtCOI, were used to study genetic variability and to characterize the morphologically indistinguishable B. tabaci cryptic species (Mugerwa et al. 2018).

Materials and methods

Whitefly collection

During the survey, adult whiteflies were randomly collected with a handheld aspirator from the underside of leaves of different hosts in the Pusa region of Bihar, and from brinjal at different locations in Bihar (Kothia, Pusa, Dhrubgama, Mandai Dih, Mirapur, Madhurapur, Jhakra, Alipur Bihta, Faridpur, Charuipar and Dariapur). One location each from Telangana (Tadikonda) and Andhra Pradesh (Bapatla) was included as a check to explore the diversity (Table 1; Fig. 1). Adults taken from the same host plant at a sampling location were kept in the same tube, whilst those taken from different host plants were kept in separate tubes, all containing 99.9% ethanol. Samples were confirmed as B. tabaci using morphological keys (Calvert et al. 2001; Baig et al. 2015).

Table 1 Details of interhost and interlocation survey conducted during the year 2020-2021
Fig. 1 Survey during the year 2020-2021

Genomic DNA extraction

Genomic DNA was extracted following the method of Frohlich et al. (1999) with several modifications. Individual adult whiteflies from each population (interhost and interlocation) were selected randomly and placed in a 1.5 mL Eppendorf tube; 100 μL of 2% cetyltrimethyl ammonium bromide (CTAB), 10 μL of 20% sodium dodecyl sulphate (SDS), 10 μL of 0.1% 2-mercaptoethanol and 10 μL of proteinase K were added, and the sample was homogenized using plastic rods. After homogenization, 400 μL of 2% CTAB was added and the mixture was incubated at 65 °C for 45 min. After incubation, phenol:chloroform:isoamyl alcohol (25:24:1) was added, and the tube was shaken gently for 10 sec and centrifuged for 10 min at 14000 rpm. A total of 400 μL of supernatant was transferred to a new tube, mixed with an equal volume of chilled isopropanol and kept at −20 °C for 1 hr. It was then centrifuged at 12000 rpm for 8 min. The supernatant was discarded, 400 μL of chilled 70% ethanol was added, and the tube was centrifuged at 12000 rpm for 5 min. The supernatant was again discarded, the pellet was dried at room temperature and dissolved in 50 μL of TE buffer, and the DNA was stored at −4 °C until used as template for PCR amplification.

DNA amplification

PCR was carried out in a thermocycler with a 20 μL reaction mixture comprising 2 μL of template DNA (100 ng), 1.5 μL of forward and reverse primers, 8 μL of Taq mixture and 8.5 μL of nuclease-free water. The following cycling conditions were used: initial denaturation at 95 °C for 2.30 sec, 35 cycles of denaturation at 94 °C for 45 sec, annealing at the primer-specific temperature (Supplementary 1) for 30 sec and extension at 72 °C for 5 min, followed by a final extension at 72 °C for 10 min. The PCR products were then electrophoresed on a 2% agarose gel in TAE buffer with 5 μL ethidium bromide at 100 V for 30 min and visualized under UV light with a Syngene gel documentation system.

Amplicon scoring and data analysis of RAPD and SSR primers

Gel images acquired with the Syngene gel documentation system were scored into a binary data matrix (one and zero for the presence and absence of bands, respectively) with AlphaView SA software. The scored marker data matrix was then used to generate a dendrogram (Sneath and Sokal 1973) based on genetic dissimilarity in DARwin 6 software (Perrier et al. 2003). To determine the informativeness of the primers, the total number of amplified bands, number of polymorphic bands, Percentage of Polymorphic Bands (PPB), Polymorphism Information Content (PIC) (Anderson et al. 1993), Resolving Power (RP) (Prevost and Wilkinson 1999), Effective Multiplex Ratio (EMR) (Kumar et al. 2009) and Marker Index (MI) (Powell et al. 1996) were computed for each RAPD and SSR marker. The interhost and interlocation B. tabaci populations whose genetic variation was analyzed using RAPD and SSR primers (1st tier) were then promoted to the 2nd tier (universal primer: mtCOI analysis).
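The indices listed above can be illustrated with a minimal Python sketch that works on a scored 0/1 band matrix for a single primer. The matrix below is hypothetical, not data from this study. The formulas are the commonly cited forms (RP as the sum of band informativeness, EMR as the number of polymorphic bands multiplied by their fraction, MI as PIC × EMR). How the band frequencies entering PIC were defined is not stated here, so the definition used in the code is an assumption. The values reported for Btls1-2 (PIC 0.878, EMR 9.0, MI 7.902) are consistent with MI = PIC × EMR.

```python
def marker_indices(bands):
    """Summarise one primer from a 0/1 band matrix (rows = bands, columns = samples)."""
    n_samples = len(bands[0])
    total = len(bands)
    counts = [sum(row) for row in bands]  # number of samples showing each band

    # A band is polymorphic when it is present in some samples but not in all.
    n_poly = sum(1 for c in counts if 0 < c < n_samples)
    ppb = 100.0 * n_poly / total

    # PIC = 1 - sum(p_i^2). Assumption: p_i is band i's share of all band
    # occurrences scored for this primer.
    occurrences = sum(counts)
    pic = 1.0 - sum((c / occurrences) ** 2 for c in counts)

    # RP = sum of band informativeness Ib = 1 - 2 * |0.5 - p|, where p is the
    # proportion of samples carrying the band.
    rp = sum(1.0 - 2.0 * abs(0.5 - c / n_samples) for c in counts)

    # EMR = polymorphic bands x fraction of polymorphic bands; MI = PIC x EMR.
    emr = n_poly * n_poly / total
    mi = pic * emr
    return {"total": total, "polymorphic": n_poly, "PPB": ppb,
            "PIC": pic, "RP": rp, "EMR": emr, "MI": mi}


# Hypothetical scoring: 4 bands x 4 samples (illustration only).
example = [
    [1, 1, 1, 1],  # monomorphic band
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
result = marker_indices(example)
for name, value in result.items():
    print(name, round(value, 3))
```

Running the same function once per primer would give a table with the same columns as the primer summaries, provided the PIC definition matches the one actually used.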

Sequencing and phylogenetic analysis

The PCR products (mtCOI) were purified using a QIAquick PCR purification kit (Qiagen Inc., Valencia, CA, USA) and then directly sequenced on an ABI 3130XL genetic analyzer at Eurofins Genomics, Bengaluru, Karnataka. The obtained sequences (657 bp) were aligned in Molecular Evolutionary Genetics Analysis version X (MEGA X) using ClustalW to check for duplicates, gaps, indels and pseudogenes (Tamura et al. 2011). Maximum likelihood fits of 24 different nucleotide substitution models were performed, and the best model for phylogenetic tree construction was selected on the basis of the Bayesian Information Criterion (BIC) value (Felsenstein 1981). Further, estimates of evolutionary divergence between sequences and the maximum composite likelihood estimate of the nucleotide substitution pattern were computed in MEGA X. To ensure the proper reading frame, the sequences were translated into their corresponding amino acids using ExPASy Translate (Gasteiger et al. 2003) and then aligned with ClustalW to identify fully conserved, conserved and semi-conserved residues (Sievers et al. 2011). Neutrality tests, namely Fisher's (Fisher 1935) and Tajima's (Tajima 1989), were employed to determine whether the COI sequences conformed to neutral expectations. The generated mtCOI sequences were found to be 100% identical to B. tabaci and were submitted to the National Center for Biotechnology Information (NCBI) GenBank database (Altschul et al. 1997), and accession numbers were obtained for all the populations (Table 1).
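The model selection step can be sketched as follows. BIC is computed as k ln(n) − 2 ln L, and the model with the lowest value is preferred. The log-likelihoods and parameter counts below are invented placeholders, not output from MEGA X. Using the alignment length (657 sites) as the sample size n is also an assumption made for this illustration.

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k * ln(n) - 2 * lnL (lower is better)."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


# Hypothetical fits: model name -> (log-likelihood, number of free parameters).
fits = {
    "JC": (-1450.0, 0),
    "HKY+G": (-1400.0, 5),
    "GTR+G": (-1398.0, 9),
}
n_sites = 657  # alignment length, assumed here to be the sample size

scores = {model: bic(lnl, k, n_sites) for model, (lnl, k) in fits.items()}
best_model = min(scores, key=scores.get)
print(best_model, round(scores[best_model], 2))
```

With these placeholder numbers, the richer GTR+G model fits slightly better but is penalised for its extra parameters, which is the trade-off BIC is meant to capture.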

Results and discussion

RAPD analysis (1st tier)

RAPD polymorphism

The genetic variation of B. tabaci populations was explored with 11 RAPD primers, which amplified with polymorphism ranging from 90 to 100% and produced 110 bands altogether; the mean numbers of total and polymorphic bands per primer were 10.00 and 9.09, respectively. The highest PIC (0.81), in F2, and the highest EMR (14.00), MI (10.42) and RP (8.07), in F12, revealed their greater informativeness, whereas the low EMR (7.00) and RP (2.82) of OPA-15 and the low PIC (0.49) and MI (4.45) of OPA-5 revealed their lesser informativeness in examining variation among B. tabaci populations (Table 2). Among all the primers, OPA-11 was identified as a potential genetic marker because it produced a single monomorphic band alongside 90% polymorphism; had there been no monomorphic band, the populations would be considered distinct species (Maurya et al. 2020). Queiroz et al. (2017) observed 70% or more polymorphism in OPA-05 (70.0), OPA-10 (77.9), OPA-11 (73.8), OPA-13 (77.3) and OPA-15 (70.8), which is in line with our findings. Similarly, Hameed et al. (2012) and Hopkinson et al. (2020) employed RAPD primers to identify variation in B. tabaci populations.

Table 2 Details on amplification of RAPD region in genomic DNA of 29 B. tabaci populations

UPGMA clustering and dendrogram

The dissimilarity coefficients of the 29 populations were clustered into a dendrogram using the unweighted pair-group method with arithmetic averages (UPGMA). There were ten distinct clusters in the interlocation population, with Bapatla (cluster IV) having the highest dissimilarity coefficient of 46% and Madhurapur and Mirapur (cluster I) being the most closely related with a dissimilarity coefficient of 30%. Among the interhost populations, Okra and Dolichos bean (cluster VI) were the most closely related with a dissimilarity coefficient of 36%, whereas the Common jasmine population (cluster X) showed the highest dissimilarity coefficient of 57%. Cluster I of the interlocation populations, viz. Madhurapur, Mandai Dih, Mirapur, Dhrubgama, Jhakra and Kothia (Northern Bihar), and cluster II, comprising Charuipar, Faridpur and Dariapur (Southern Bihar), demonstrate that populations were differentiated according to their geographical locations (Fig. 2). Cluster V of the interhost populations, whose hosts belong to the Solanaceae family, shared a lower dissimilarity coefficient (37%). The results clearly demonstrated that interlocation populations were less diverged than interhost populations, because they were primarily collected from the same host, viz. brinjal. Similar patterns were reported with RAPD primers in B. tabaci (De Barro and Driver 1997), Myzus persicae (Zitoudi et al. 2001), Helicoverpa armigera (Lopes et al. 2017) and Leucinodes orbonalis (Murali et al. 2021).
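As an illustration of how UPGMA builds such clusters from band profiles, the following pure-Python sketch repeatedly merges the two clusters with the smallest average pairwise dissimilarity. The population names and band profiles are hypothetical. The Jaccard coefficient is assumed as the dissimilarity measure because the index selected in DARwin is not stated here.

```python
from itertools import combinations


def jaccard_dissimilarity(a, b):
    """1 - (bands shared by both profiles / bands present in at least one)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - shared / either


def upgma(profiles):
    """Return the merge history as (cluster_a, cluster_b, average dissimilarity)."""
    names = sorted(profiles)
    pair = {frozenset((a, b)): jaccard_dissimilarity(profiles[a], profiles[b])
            for a, b in combinations(names, 2)}

    def average(c1, c2):
        # UPGMA distance: unweighted mean over all between-cluster member pairs.
        return sum(pair[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average(*p))
        merges.append((c1, c2, average(c1, c2)))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(tuple(sorted(c1 + c2)))
    return merges


# Hypothetical 0/1 band profiles for three populations (illustration only).
profiles = {
    "PopA": [1, 1, 0, 1],
    "PopB": [1, 1, 0, 0],
    "PopC": [0, 0, 1, 1],
}
merges = upgma(profiles)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

The merge heights are the dissimilarity levels at which branches join in a dendrogram; a different dissimilarity index would change the heights and possibly the order of merges.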

Fig. 2 Dendrogram deduced from the matrix of pairwise distances in RAPD analysis of B. tabaci interhost and interlocation populations using UPGMA

SSR analysis (1st tier)

SSR polymorphism

The genetic diversity of B. tabaci populations was investigated using nine SSR primers, which amplified with 100% polymorphic bands and generated a total of 60 bands, with a mean of 6.66 total and polymorphic bands per primer. The highest PIC (0.878), EMR (9.0), MI (7.902) and RP (5.79), all in Btls1-2, revealed its greater informativeness, whereas the low PIC (0.664) of Bta1, the low EMR (4.00) and MI (3.196) of Btls 1-6 and the low RP (1.238) of Bta 11 revealed their lesser informativeness in examining the variability of B. tabaci populations (Table 3). As in our study, De Barro et al. (2003), Simón et al. (2007), Gauthier et al. (2008) and Ben Abdelkrim et al. (2017) used these primers to examine genetic variability among B. tabaci populations. In contrast, Valle et al. (2012) observed the lowest polymorphism percentage in the Bta11 primer, which showed 100% polymorphism in our study.

Table 3 Details on amplification of SSR region in genomic DNA of 29 B. tabaci populations

UPGMA clustering and dendrogram

The UPGMA technique was used to cluster the data into a dendrogram using the dissimilarity coefficients of the 29 populations. There were ten distinct clusters in the interlocation population: Alipur Bihta, Dhrubgama, and Mirapur (cluster V) were the most closely related, with a dissimilarity coefficient of 30%, while cluster VIII showed the highest dissimilarity coefficient of 42%. Among the interhost populations, Cucumber and Potato (cluster I) were the most closely related, sharing a dissimilarity coefficient of 38%, while Common jasmine was the most divergent, with a dissimilarity coefficient of 56% (Fig. 3). According to Fakrudin et al. (2004), such increased genetic variability might help a species evolve and adapt to new environments more quickly. The lower dissimilarity coefficients observed among the interlocation B. tabaci populations can be explained by the fact that they were all collected from the same host (brinjal), whereas the higher dissimilarity coefficients observed among the interhost populations could be due to their collection from different hosts. Similar patterns of differentiation were reported by Valle et al. (2012) and Reddy et al. (2022) using SSR primers in Bemisia tabaci and Helicoverpa armigera populations, respectively.

Fig. 3 Dendrogram deduced from the matrix of pairwise distances in SSR analysis of B. tabaci interhost and interlocation populations using UPGMA

mtCOI analysis (2nd tier)

To construct a B. tabaci cryptic species phylogenetic tree, we first obtained 29 mtCOI sequences from the interhost and interlocation populations of Bihar, along with those from Andhra Pradesh (Bapatla) and Telangana (Tadikonda); each produced an amplicon of the 657 bp mtCOI region (Supplementary 1). The phylogenetic tree was built using the Hasegawa-Kishino-Yano model with Gamma distribution (HKY+G) for the 29 sequences along with 44 reference sequences, with Bemisia atriplex, Bemisia afer and Trialeurodes vaporariorum as outgroups (Fig. 4) (Supplementary 2).

Fig. 4 Phylogenetic tree inferred from 657 bp sequences of 29 mtCOI genes, 44 cryptic species of B. tabaci and three outgroups

The 29 B. tabaci populations clustered with four cryptic species, viz. Asia I, Asia II 1, Asia II 7 and China 3 (Fig. 5). The sequences from interhost populations (Okra, Dolichos bean, Pointed gourd, Tomato, Potato, Mexican marigold, Cucumber, French bean, Indian jujube, Congress grass, White fig, Cluster fig and Common jasmine) and interlocation populations (Pusa, Bapatla, Dariapur, Charuipar, Faridpur, Mirapur, Jhakra, Mandai Dih, Dhrubgama, Alipur Bihta, Madhurapur, Kothia) clustered with Asia I, which was the major cryptic species, accounting for 25 of the 29 (86.21%) B. tabaci populations. Among the four reported cryptic species, Asia I therefore appears to have the greatest potential to expand and adapt in Bihar. Similarly, Roopa et al. (2015) sequenced 71 B. tabaci samples and found the Asia I cryptic species to be the most predominant, accounting for 44 of the 71 samples (61.97%).

Fig. 5 Phylogenetic tree inferred from 657 bp sequences of 29 mtCOI genes, four cryptic species of B. tabaci and three outgroups

Asia II is a genetically diverse group consisting of 13 cryptic species, Asia II (1–13) (Kanakala and Ghanim 2019); among them, only Asia II 1 and Asia II 7 were detected in the collected B. tabaci samples. Interestingly, the populations from Black nightshade and Chinese hibiscus, collected from a single location (Pusa region), clustered with Asia II 1 and Asia II 7, respectively, which suggests that cryptic species identity has an impact on host plant selection. Roopa et al. (2015) also found the Asia II 7 cryptic species on Chinese hibiscus, which supports our findings. Chowda-Reddy et al. (2012) stated that Asia II 7 was primarily found in Southern and Western India, which contrasts with our results, as the Asia II 7 cryptic species was found in Bihar (Eastern India). Moreover, the Barberton daisy population grouped with China 3 (Fig. 4), a cryptic species detected here for the first time in Bihar, which borders West Bengal, where Ellango et al. (2015) first discovered China 3.

In contrast to the earlier reports of Misra and Lamba (1929), Chowda-Reddy et al. (2012), Roopa et al. (2015) and Rangaswamy et al. (2019), among the four cryptic species found (Asia I, Asia II 1, Asia II 7, and China 3), Asia II 7 and China 3 were recorded for the first time in the Bihar region. Given the widespread occurrence of numerous cryptic species, the abundance of suitable hosts, climate change, the domestic transportation of agricultural products, and intensive pest control strategies, B. tabaci has a high likelihood of acquiring an adaptive advantage in various parts of the nation.

Multiple sequence alignment

Multiple alignment of the 29 B. tabaci nucleotide sequences revealed 305 completely conserved residues (Supplementary 5) and 105 Single Nucleotide Polymorphisms (SNPs). Furthermore, multiple alignment of the amino acid sequences identified 75 fully conserved residues, 46 conserved residues and 23 semi-conserved residues (Supplementary 6). This high level of nucleotide similarity indicates that the populations evolved from a common origin. Similarly, Wosula et al. (2017) and Kunz et al. (2019) found 7453 SNPs and 125 conserved amino acid residues, respectively, with amplification of the mtCOI region of B. tabaci.
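Counting conserved and variable positions in a nucleotide alignment can be sketched as below. The sequences are short hypothetical examples, not the study alignment. Skipping columns that contain gaps or ambiguous bases is an assumption of this sketch and may differ from how ClustalW or MEGA X summarise columns.

```python
def column_summary(aligned):
    """Count fully conserved and variable (SNP) columns in aligned DNA sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    conserved = 0
    variable = 0
    for column in zip(*aligned):
        bases = {base.upper() for base in column}
        if not bases <= set("ACGT"):
            continue  # assumption: ignore columns with gaps or ambiguity codes
        if len(bases) == 1:
            conserved += 1
        else:
            variable += 1
    return conserved, variable


# Hypothetical aligned fragments (illustration only).
alignment = ["ACGTAC", "ACGTAT", "ACCTAC"]
conserved_sites, snp_sites = column_summary(alignment)
print(conserved_sites, snp_sites)
```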

Pair wise genetic distance

The pairwise genetic distance among B. tabaci populations ranged from 0.00 to 0.47 (Supplementary 4), with an overall mean distance of 0.08. Similarly, Dinsdale et al. (2010) reported zero to 34% genetic distance among 198 B. tabaci populations.
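A simple way to see what a pairwise distance and an overall mean distance represent is the uncorrected p-distance, the proportion of compared sites that differ. The sketch below uses it as a stand-in; it is not the substitution-model-based estimate computed in MEGA X, and the sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of differing sites, ignoring positions with gaps or ambiguity."""
    compared = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
                if a in "ACGT" and b in "ACGT"]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(1 for a, b in compared if a != b) / len(compared)


def overall_mean_distance(sequences):
    """Mean of p-distances over all pairs of sequences."""
    distances = [p_distance(a, b) for a, b in combinations(sequences, 2)]
    return sum(distances) / len(distances)


# Hypothetical aligned fragments (illustration only).
fragments = ["ACGTACGT", "ACGTACGA", "ACGAACGA"]
mean_distance = overall_mean_distance(fragments)
print(round(mean_distance, 4))
```

Model-corrected distances are generally equal to or larger than the p-distance, because they account for multiple substitutions at the same site.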

Patterns of nucleotide substitution in mtCOI

As nucleotide composition is a crucial aspect of nucleic acids, the study examined base frequencies and found that thymine (T) (43.37%) and guanine (G) (19.00%) were the most and least frequent nucleotide bases, respectively. Similarly, Roopa et al. (2015) observed the highest frequency for thymine (43.10%) and the lowest for guanine (13.22%) in B. tabaci mtCOI sequences. Furthermore, the base composition of the mtCOI gene fragment was biased towards adenine (A) and thymine (T), with an overall A+T content of 67.54%, which is a universal feature of nucleotide diversity (Lynch 2008).
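Base composition and A+T content of a sequence can be computed as in the short sketch below. The fragment used is hypothetical and serves only to show the calculation.

```python
from collections import Counter


def base_composition(sequence):
    """Percentage of A, C, G and T among the unambiguous bases of a sequence."""
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


# Hypothetical fragment (illustration only).
composition = base_composition("ATTTGCATTA")
at_content = composition["A"] + composition["T"]
print(composition, at_content)
```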

Neutrality tests

The negative Tajima's D (-0.726310; Table 4) and Fisher's exact test with P value <0.05 (Common jasmine, Congress grass, Chinese hibiscus, Black nightshade, Cucumber, Kothia, Madhurapur, Dhrubgama, Alipur Bihta, White fig and Tadikonda) (Supplementary 7) indicated an excess of low-frequency polymorphisms. Thus, both tests supported the neutral theory of evolution, and these findings are in line with Tocko-Marabena et al. (2017), who reported a significant negative Tajima's D (-2.45317) for B. tabaci.
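The Tajima's D statistic compares the mean number of pairwise differences with the number of segregating sites scaled by sample size. A minimal sketch using the standard constants from Tajima (1989) is given below. The sequences are hypothetical, gaps are not handled, and the value in Table 4 was obtained in MEGA X rather than with this code.

```python
import math
from itertools import combinations


def tajimas_d(sequences):
    """Tajima's D for aligned, gap-free sequences of equal length."""
    n = len(sequences)
    if n < 4:
        raise ValueError("at least four sequences are needed")

    # S: number of segregating (variable) sites.
    s = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of pairwise differences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(1 for a, b in zip(x, y) if a != b) for x, y in pairs) / len(pairs)

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical sample in which every variable site is a singleton.
sample = ["AAAA", "AAAT", "AATA", "ATAA"]
d_value = tajimas_d(sample)
print(round(d_value, 4))
```

In this toy sample all three variable sites are singletons, so rare variants are over-represented and D is negative, which is the same direction as the value reported here.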

Table 4 Results from Tajima's Neutrality Test

Conclusion

In the present study, we were able to confirm the existence of four B. tabaci cryptic species (Asia I, Asia II 1, Asia II 7, and China 3) in Bihar. In particular, Asia II 7 and China 3 were discovered for the first time in the Bihar region, while Asia I cryptic species dominated all interhost and interlocation populations. Overall, this study contributes to the characterization of various B. tabaci cryptic species for assessing ongoing changes in genetic diversity, evolutionary history, and potential spread that enable effective pest management while avoiding overuse of insecticides and lowering environmental pressure.