Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is readily transmitted from person to person. We evaluated the emerging landscape of SARS-CoV-2 variants in Bangladesh from a retrospective study of nasopharyngeal swabs collected from 130 SARS-CoV-2-positive cases randomly selected over 6 months. Mutation analysis of whole-genome sequencing of 130 SARS-CoV-2 variants revealed 528 unique coding mutations, of which 102 were deletions, 6 were premature stop codons, and the remaining were substitutions. The most common mutation in the cohort was ORF1b:P314L, with a frequency of 98.5%. A total of 132 unique coding mutations were observed in the spike protein gene. Fourteen mutations were mapped to the spike protein receptor binding domain (RBD). These mutations increase the affinity between the spike protein and its human receptor, angiotensin converting enzyme 2 (ACE2), thereby increasing SARS-CoV-2 transmissibility. This study will help understand the SARS-CoV-2 virus and ultimately aid in monitoring and combatting the COVID-19 pandemic by furthering research on appropriate therapies. Analysis of age revealed closer association of the Delta variant with older populations and of the Omicron variant with younger populations. This may have important implications on how we monitor infections, distribute vaccines, and treat patients based on their ages.
1 Introduction
SARS-CoV-2, the virus responsible for the COVID-19 pandemic, was first identified in Wuhan, China, in late 2019. Since the beginning of the pandemic, as of April 2022, more than 504 million cases have been reported globally, with more than 6.2 million deaths (Yu et al. 2022). COVID-19 has had tremendous global and social impacts (Sanjay 2020). Bangladesh, a low- and middle-income country of nearly 170 million people, has been similarly affected.
SARS-CoV-2 testing is a principal bulwark in the response to the pandemic. The Bangladesh government imposed restrictions and quarantines in response to the pandemic, and also took the lead in launching testing in public and private facilities. Praava Health, a private healthcare facility, established one of the first PCR testing laboratories and led a concerted effort to test patients in Dhaka, Bangladesh, and neighboring areas. Testing for SARS-CoV-2 in people who have symptoms, and in those who have no symptoms but may have been exposed to the virus, can help prevent the spread. A positive test early in the course of the illness enables individuals to isolate themselves and to seek treatment earlier. It may also reduce the risks of infecting others and of developing severe disease, long-term disability, or death. Since nearly half of all SARS-CoV-2 infections are transmitted by people who show no symptoms, identifying asymptomatic and pre-symptomatic infected individuals plays a major role in controlling the pandemic (Johansson et al. 2021). Comorbidities such as heart disease, obesity, and diabetes are also more common in under-represented communities because of long-standing societal and environmental factors and impediments to healthcare access (Bajgain et al. 2021). COVID-19 can spread quickly in these communities, and the impact of that spread is high. Testing, particularly of asymptomatic and pre-symptomatic individuals, is the key to stopping this spread. According to the WHO, SARS-CoV-2 will be difficult to eradicate and will probably continue to circulate indefinitely with periodic outbreaks and epidemics. This will make testing critical for decreasing transmission (Morens et al. 2022).
The current study was designed to investigate the genomic diversity of SARS-CoV-2 variants isolated from Bangladeshi patients and to analyze the temporal profile of the mutational accumulations within the whole genome and within the gene encoding the spike protein.
2 Methods
2.1 Collection of samples and clinical data, nucleic acid extraction, and COVID-19 testing of samples
Nasopharyngeal swabs from patients were collected in viral transport media (VTM) according to CDC guidelines, and clinical information including age, gender, symptoms, clinical classification, and locality was recorded. Total nucleic acid was extracted using commercial kits according to the manufacturer's protocol. A 5-μL aliquot of total nucleic acid was subjected to RT-PCR screening following the CDC’s 2019 Novel Coronavirus (2019-nCoV) Real-Time RT-PCR Diagnostic Panel guide. From April 2021 to January 2022, a total of 130 positive samples with Ct values less than 30 were selected randomly for whole-genome sequencing at the Genomic Research Laboratory of the Bangladesh Council of Scientific and Industrial Research (BCSIR). Informed consent was obtained from all participants in the study.
2.2 Whole-genome sequencing of SARS-CoV-2 using MiniSeq
cDNA was generated by random hexamer-directed reverse transcription using 20 μL of RNA extract, 660 μM dNTPs, 5× RT ImProm-II reaction buffer (Promega), 50 ng hexanucleotides, 1.5 mM MgCl2, 20 U RNasin® Plus RNase Inhibitor (Promega, Madison, Wisconsin), and 1 U of ImProm-II™ Reverse Transcriptase (Promega). SARS-CoV-2 genomes were quantified by using a qRT-PCR assay targeting a conserved region of the envelope gene. Sequencing-ready libraries were prepared using cDNA from the CoV sample (CoVOC43), the viral pool sample (ViralPool) with Nextera Flex for Enrichment (Illumina, San Diego, California), and IDT for Illumina Nextera DNA UD Indexes. The total DNA input recommended for tagmentation is 10–1000 ng. After tagmentation and amplification, samples were enriched with the Respiratory Virus Oligos Panel (Illumina, San Diego, California), which features ~7800 probes designed to detect respiratory viruses, recent flu strains, and SARS-CoV-2. After enrichment, the prepared libraries were quantified, pooled, and loaded onto the MiniSeq™ sequencing system with an output of 2 × 76-bp paired-end reads for sequencing.
2.3 Data analysis
NextClade v2.9.1 (https://clades.nextstrain.org/) was used for mutation identification, clade assignment, and placing the sequences in the SARS-CoV-2 phylogenetic tree. Lineage analysis was carried out using Pangolin v4.2 (https://pangolin.cog-uk.io/). Genome sequences of 2828 SARS-CoV-2 isolates submitted in GISAID from 1 January 2021 to 1 February 2022 were collected and used as background data for sublineage analysis. Civet (https://github.com/artic-network/civet) was used to cluster sequences based on common mutations.
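As an illustration of how mutation calls of this kind can be tallied downstream, the following minimal sketch parses amino-acid mutation labels written in the gene:RefPositionAlt form used later in this paper (for example ORF1b:P314L and ORF1a:S3675-). The function names, the use of "*" to mark a premature stop codon, and the example labels are assumptions made for illustration; this is not the pipeline used in the study, and labels written in other styles (such as E156del) are not handled.

```python
import re
from collections import Counter

# Assumed label format: gene, colon, reference residue, position, and
# either a new residue, "-" (deletion) or "*" (premature stop codon).
LABEL = re.compile(r"^(?P<gene>[^:]+):(?P<ref>[A-Z*])(?P<pos>\d+)(?P<alt>[A-Z*-])$")


def classify(label: str) -> tuple[str, int, str]:
    """Return (gene, position, kind) for one amino-acid mutation label."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    alt = match["alt"]
    if alt == "-":
        kind = "deletion"
    elif alt == "*":
        kind = "stop"
    else:
        kind = "substitution"
    return match["gene"], int(match["pos"]), kind


def tally_unique(labels: list[str]) -> Counter:
    """Count unique labels by kind, ignoring repeats across samples."""
    return Counter(classify(label)[2] for label in set(labels))


if __name__ == "__main__":
    # Illustrative labels only; "ORF8:Q27*" stands in for a stop codon.
    example = ["ORF1b:P314L", "S:D614G", "ORF1a:S3675-", "ORF8:Q27*", "S:D614G"]
    print(tally_unique(example))
```

Deduplicating with a set before counting mirrors the distinction drawn in the Results between unique coding mutations and their per-sample frequencies.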
Genomic sequences were aligned using the FFT-NS-2 method of MAFFT v7.505 with the SARS-CoV-2 isolate Wuhan-Hu-1 complete genome (MN908947) as the reference. A neighbor-joining phylogenetic tree considering uniform rates of substitution according to the maximum composite likelihood model was constructed and visualized in MEGA11 software. Ambiguous positions were discarded by pairwise deletion. The 1st, 2nd, 3rd, and noncoding codon positions were included in the analysis. The original dataset was resampled 1000 times to derive the bootstrap values, and values corresponding to branches that were not reproduced in at least 30% of replicates are not shown in the tree.
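Two of the steps described above, pairwise deletion and bootstrap resampling, can be sketched in a few lines. The example below is a simplified illustration and not the MEGA11 procedure: it uses the plain proportion of differing sites (p-distance) in place of the maximum composite likelihood distance, and the function names and toy sequences are assumptions.

```python
import random

UNAMBIGUOUS = set("ACGT")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Pairwise deletion: a column is skipped for this pair only when either
    sequence has a gap or an ambiguity code there; other pairs may use it.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in UNAMBIGUOUS and b in UNAMBIGUOUS:
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def bootstrap_columns(alignment: list[str], rng: random.Random) -> list[str]:
    """Build one bootstrap replicate by resampling columns with replacement."""
    length = len(alignment[0])
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


if __name__ == "__main__":
    toy = ["ACGTN", "ACGAA", "AC-TA"]  # invented sequences
    print(p_distance(toy[0], toy[1]))
    rng = random.Random(0)
    replicates = [bootstrap_columns(toy, rng) for _ in range(1000)]
    print(len(replicates), len(replicates[0][0]))
```

In a full analysis a tree would be rebuilt from each replicate, and the bootstrap value of a branch would be the percentage of replicates in which that branch reappears.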
Summary statistics were shown as means ± SD for continuous variables and as percentages for categorical variables. Patients were stratified into six groups according to their age. The association of variants with age and gender was investigated via the chi-square test using GraphPad Prism 8.4.2 (www.graphpad.com). A p-value less than 0.05 was considered statistically significant.
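For readers who wish to reproduce this kind of test without GraphPad Prism, the sketch below computes a Pearson chi-square test of independence on a contingency table using only the Python standard library. It is an illustration under stated assumptions: the function names are hypothetical, the counts in the usage example are invented (the per-variant gender counts are not listed in the text), and every row and column total is assumed to be non-zero.

```python
import math


def chi_square_sf(x: float, dof: int) -> float:
    """Upper-tail probability of a chi-square variable with integer dof."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum over i < dof/2 of (x/2)^i / i!
        return math.exp(-half) * sum(
            half**i / math.factorial(i) for i in range(dof // 2)
        )
    # Odd dof: erfc term plus a finite sum over half-integer powers.
    tail = sum(
        half ** (i - 0.5) / math.gamma(i + 0.5)
        for i in range(1, (dof - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail


def chi_square_independence(table):
    """Return (statistic, dof, p_value) for a rows-by-columns count table."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    total = sum(rows)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return stat, dof, chi_square_sf(stat, dof)


if __name__ == "__main__":
    # Invented counts: rows are Beta, Delta, Omicron; columns are male, female.
    hypothetical = [[20, 27], [30, 30], [6, 8]]
    stat, dof, p = chi_square_independence(hypothetical)
    print(f"chi2={stat:.3f}, dof={dof}, p={p:.4f}, significant={p < 0.05}")
```

No continuity correction is applied here, so results for 2 × 2 tables may differ from software that applies one by default.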
3 Results
In this study, a total of 130 samples from patients who underwent SARS-CoV-2 testing at Praava Health were collected and analyzed. The demographic characteristics, vaccination status, and comorbidities of the study participants are presented in table 1.
Results presented in figure 1 demonstrate that of the patients who tested positive for SARS-CoV-2 by PCR, 23.8% were between 30 and 39 years, 23.1% were between 18 and 29 years, 18.4% were between 50 and 64 years, 18.4% were 65 years or older, and only 5.4% were between 1 and 17 years. Neither gender was more closely associated with infection status (results not shown). Our data indicate an increased association of hypertension (38.5%) and type 2 diabetes (32.5%) in SARS-CoV-2-positive patients.
3.1 Lineage and phylogenetic analysis
Lineage and phylogenetic analysis uses the unique mutations within a viral genome to assign standardized lineages/variants independent of location and sample size. The sequences were submitted to both the clade assigner of NextClade and the Pangolin SARS-CoV-2 lineage assigner. The NextClade results contained 9 unique clades, whereas the Pangolin assignment contained 13 lineages (figures 2 and 3).
Among the sequenced cases, the predominant Delta variants, comprising 60 samples (46.2%), belonged to the 21A, 21J, and 21I clades and B.1.617.2 and AY.122 Pangolin lineages. The Delta variant first emerged from India in October 2020 and was reported to have increased transmissibility (Petersen et al. 2022). The next largest clade identified by NextClade was 20H (Beta variant), comprising 47 samples (36.1%), all belonging to the Pangolin lineages B.1.351 and B.1.351.3. The next largest was a group of 14 samples (10.8%) belonging to the 21K and 21L clades (Omicron variant) and BA.1, BA.1.17.2, BA.1.1, BA.2.10, BA.2, and BA.2.10.1 Pangolin lineages (figures 2 and 3) (Pradhan et al. 2022).
3.2 Gender/age association with SARS-CoV-2 variants
No significant gender bias with SARS-CoV-2 variants was identified in this study (table 1), and gender-wise stratification of the major clades (Beta: B.1.351 + B.1.351.3; Delta: B.1.617.2 + AY.122; Omicron: BA.1 + BA.1.17.2 + BA.1.1 + BA.2.10 + BA.2 + BA.2.10.1) revealed that SARS-CoV-2 infection was not related to gender (p=0.3851) (figure 4).
Stratification of patients infected with the major variants Beta, Delta, and Omicron by age revealed that the highest percentage (29.8%) of patients infected with Beta variants were aged between 18 and 29 years, followed by those between 30 and 39 years, 50 and 64 years, and 40 and 49 years. Among patients infected with the Delta variant, the highest percentage (26.7%) belonged to the oldest age group (>65), followed by age groups 50–64, 30–39, and 18–29 years. A preponderance of patients infected with the Omicron variant were between 30 and 39 years of age. Our data indicate that the Beta variant was prevalent in younger populations (18–39 years), whereas the Delta variant more frequently infected the older population (>50 years), and the Omicron variant was more prevalent among the youngest population (18–39 years) (figure 5).
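The age stratification described above amounts to binning each patient's age and computing, per variant, the share of patients in each bin. A minimal sketch follows, assuming the six groups are 1–17, 18–29, 30–39, 40–49, 50–64, and 65 or older, as the ranges quoted in the Results suggest. The function names and the example records are hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

# Lower bounds of the six age groups (assumed from the ranges in the text).
LOWER_BOUNDS = [1, 18, 30, 40, 50, 65]
LABELS = ["1-17", "18-29", "30-39", "40-49", "50-64", "65+"]


def age_group(age: int) -> str:
    """Map an age in years to its group label."""
    index = bisect_right(LOWER_BOUNDS, age) - 1
    if index < 0:
        raise ValueError(f"age {age} is below the youngest group")
    return LABELS[index]


def percent_by_group(records):
    """records: iterable of (variant, age). Returns {variant: {group: percent}}."""
    counts = defaultdict(Counter)
    for variant, age in records:
        counts[variant][age_group(age)] += 1
    result = {}
    for variant, groups in counts.items():
        total = sum(groups.values())
        result[variant] = {g: round(100 * groups[g] / total, 1) for g in LABELS}
    return result


if __name__ == "__main__":
    # Invented records for illustration only.
    records = [("Delta", 70), ("Delta", 55), ("Delta", 66), ("Omicron", 31)]
    for variant, shares in percent_by_group(records).items():
        print(variant, shares)
```

The per-variant counts produced this way form the rows of the contingency table on which a chi-square test of association with age can be run.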
3.3 Chronological prevalence of different SARS-CoV-2 variants from April to July 2021 in the sampled population
Analysis of the significant lineages over time showed that B.1.617.2 was the predominant lineage for most of 2021, co-existing with the B.1.351, B.1.351.3, and AY.122 lineages. In early 2022, the viral population belonged exclusively to the BA.1, BA.1.17.2, BA.1.1, BA.2.10, BA.2, and BA.2.10.1 lineages (figure 6).
Plotting the dominant lineages over time revealed a high diversity among circulating viral strains almost throughout 2021, with a gradual shift of the circulating strains from Beta to Delta variants. From the beginning of 2021 to mid-2021, the Beta variants (B.1.351 and B.1.351.3) predominated, but were later replaced by the highly transmissible Delta variants (B.1.617.2 and AY.122). According to our study, the viral strain B.1.617.2 persisted the longest in 2021, with identification dates ranging from the end of May 2021 until the very end of 2021. At the beginning of 2022, the identified viral strains exclusively belonged to the BA.1, BA.1.1, and BA.2 lineages, indicating a further shift of the viral population from Delta to Omicron variants, congruent with the global scenario (figure 6).
The NextClade phylogenetic tree revealed that most of the sequences from our dataset belonged to the 20H (Beta, V2), 21A (Delta), and 21M (Omicron) clades. The neighbor-joining tree grouped one sequence from 21D (Eta) with the 20H (Beta, V2) sequences. The Delta clades (21A, 21J, and 21I) were grouped in a separate branch. The Delta sequences were evolutionarily closer to the earlier-emerging Beta sequences, while the sequences belonging to 20I (Alpha, V1) were closer to the Omicron (21K and 21L) sequences (figures 7 and 8).
3.4 Mutation analysis
The 130-sample cohort had an average of 34.01 coding mutations per sample (range 16–85) and a median of 31.0. A total of 528 unique coding mutations were observed, of which 102 were deletions, 6 were premature stop codons, and the remaining were substitutions. The number of coding mutations increased as the viral population shifted to highly mutated BA.1, BA.1.1, and BA.2 variants at the beginning of 2022 (figure 9), leading to greater genetic diversity in the most recently emerging variants (21L and 21K, Omicron) in the pandemic, as seen in figure 10. ORF1a harbored the greatest number of mutations, which can be attributed to its long ORF, which codes for a total of 10 proteins. Normalizing for ORF length, ORF1a appears significantly less tolerant to missense mutations than ORF7b, ORF8, the N gene, and the E gene (figure 11).
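The length normalization mentioned above can be illustrated with a small calculation: dividing the number of unique coding mutations in each ORF by its length in codons gives a density that can be compared across ORFs of very different sizes. In the sketch below the ORF lengths are approximate values for the Wuhan-Hu-1 reference and should be treated as assumptions, the mutation counts are invented, and the function name is hypothetical; the study's actual per-ORF counts are those shown in figure 11.

```python
# Approximate coding lengths in nucleotides for the Wuhan-Hu-1 reference.
# These values are assumptions for illustration and should be checked
# against the reference annotation before any real use.
ORF_LENGTH_NT = {"ORF1a": 13203, "E": 228, "ORF7b": 132, "ORF8": 366, "N": 1260}


def mutations_per_100_codons(counts, lengths=ORF_LENGTH_NT):
    """Return {orf: unique coding mutations per 100 codons}."""
    density = {}
    for orf, n_mutations in counts.items():
        codons = lengths[orf] / 3  # three nucleotides per codon
        density[orf] = 100 * n_mutations / codons
    return density


if __name__ == "__main__":
    # Invented counts of unique coding mutations per ORF.
    hypothetical_counts = {"ORF1a": 180, "E": 6, "ORF7b": 5, "ORF8": 20, "N": 50}
    ranked = sorted(
        mutations_per_100_codons(hypothetical_counts).items(),
        key=lambda item: item[1],
    )
    for orf, value in ranked:
        print(f"{orf}: {value:.1f} per 100 codons")
```

With counts of this shape, a long ORF can carry the largest absolute number of mutations and still have the lowest density, which is the pattern the text describes for ORF1a.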
Eight mutations were observed in more than 50% of the samples sequenced. The most common mutation found in the cohort was ORF1b:P314L, which occurred at a frequency of 98.5% (128 samples). The globally dominant D614G mutation in the spike protein occurred at the second-highest frequency of 84.6% (110 samples). The deletion mutations ORF1a:S3675-, ORF1a:G3676-, and ORF1a:F3677- were found in almost half of the samples with frequencies of 53.1%, 52.3%, and 50.8%, respectively. The other substitution mutations that occurred in more than 50% of samples were ORF1a:P2046L, S:T478K, and M:I82T.
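The frequencies quoted above are the number of samples carrying a mutation divided by the cohort size of 130. The sketch below shows that calculation on a synthetic cohort built only to match the two sample counts stated in the text (128 samples with ORF1b:P314L and 110 with S:D614G); the function names and the way the cohort is constructed are assumptions.

```python
from collections import Counter


def mutation_frequencies(samples):
    """samples: list of per-sample mutation label collections.

    Returns {label: percent of samples carrying it}, rounded to 1 decimal.
    """
    n_samples = len(samples)
    counts = Counter(label for labels in samples for label in set(labels))
    return {label: round(100 * c / n_samples, 1) for label, c in counts.items()}


def above_threshold(frequencies, percent=50.0):
    """Labels found in more than `percent` of samples, most frequent first."""
    kept = [(label, f) for label, f in frequencies.items() if f > percent]
    return sorted(kept, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Synthetic cohort of 130 samples matching the two counts given in the text.
    cohort = []
    for i in range(130):
        labels = []
        if i < 128:
            labels.append("ORF1b:P314L")
        if i < 110:
            labels.append("S:D614G")
        cohort.append(labels)
    print(above_threshold(mutation_frequencies(cohort)))
```

Each sample's labels are passed through a set so that a mutation reported twice for the same genome is counted once.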
A total of 132 unique coding mutations were observed in the spike protein with the 9 most prevalent mutations appearing in at least 35% of samples: D614G (84.6%), T478K (50.8%), P681R (47.7%), R158G (45.4%), T19R (45.4%), E156del (45.4%), F157del (45.4%), L452R (43.8%), and D215G (36.9%). Fourteen variations were mapped to the RBD of the spike protein involved in host receptor binding (figure 12). From as early as January 2021, the highly frequent T478K mutation emerged spontaneously multiple times, predominantly in Mexico, the United States, and India (in the B.1.617.2 lineage) (Di Giacomo et al. 2021). Present in the RBD, this mutation was predicted to hinder the Spike/ACE2 interaction (Saito et al. 2022). The P681R mutation located near the furin cleavage site was also highly conserved in the B.1.617.2 lineage. This mutation was found to facilitate cleavage of the spike protein and enhance viral fusogenicity (Kannan et al. 2021). The remaining highly prevalent mutations (T19R, E156del, F157del, R158G, and L452R) were also characteristic mutations of the B.1.617.2 lineage (Saito et al. 2022). The N501Y mutation (30%) was characteristic of B.1.1.7 and B.1.351. This mutation was found to increase the transmissibility of the virus by increasing the affinity between the spike protein and ACE2 (Liu et al. 2022).
Clustering sequences based on the presence of nucleotide substitution revealed five distinct clusters within the samples (table 2). Cluster 1 (n=12) was composed of sequences from clades 21K and 21L. The Delta lineage was stratified into three clusters: cluster 2 (n=5), cluster 3 (n=23), and cluster 4 (n=19). All three clusters contained sequences from 21J and 21A clades. Finally, cluster 5 (n=37) consisted exclusively of sequences from the clade 20H.
4 Discussion
Among the 130 samples in which SARS-CoV-2 was detected by PCR, 60 were taken from male patients and 70 were taken from female patients. Clinical information and vaccination status were recorded. Patients were categorized by age groups. Among the different age groups of positive cases, the greatest numbers of patients were between ages 30 and 39 years (23.8%) followed by the 18- to 29-year-old age group, which made up 23.1% of the cohort. Other investigators described similar findings (Kushwaha et al. 2021).
In Pangolin lineages, B.1.617.2 was the most prevalent, followed by the B.1.351 lineages. B.1.617.2 is also the most prominent lineage in India (Mlcochova et al. 2021). We also analyzed the clade of the selected sequences. Among the sequenced cases, the predominant Delta strains, comprising 60 samples (46.2%), belonged to the 21A, 21J, and 21I clades and B.1.617.2, AY.4, AY.12, AY.6, AY.10, AY.4.4, AY.39, and AY.43 Pangolin lineages. The frequencies of infection by the major clades were independent of patient gender. Beta, Delta, and Omicron variants infected patients of different ages at different frequencies. The highest percentage of patients infected with the Beta variant (29.8%) were between ages 18 and 29 years; patients between the ages of 30–39, 50–64, and 40–49 years were infected with the Beta variant at decreasing frequencies. Omicron was more prevalent in younger patients, whereas the Delta variant infected older populations. Vaccinated and unvaccinated patients infected with the Delta variant reportedly recover more slowly than those infected with the Alpha variant, as indicated by longer lengths of hospital stays and prolonged viral shedding (Kumar et al. 2022). The Omicron variant is 6–8 times more infectious than the Delta variant (Wang et al. 2022). Our study suggests that younger patients were less susceptible to other variants of SARS-CoV-2 compared with the Omicron variant.
During the study period, the B.1.617.2 variant was the predominant lineage for most of 2021 with co-existing B.1.351, B.1.351.3, and AY.4 lineages. In early 2022, the viral population belonged exclusively to the BA.2, BA.1.1, and BA.1 lineages. Analysis of phylogenetic trees showed that most of the samples belonged to the Delta and Beta variants, but that the Omicron variants contained the greatest number of mutations.
Phylogenetic trees also revealed that the Beta (20H) and Delta (21A, 21J, and 21I) variants were closely related. The Alpha (20I) and Omicron (21K and 21L) variants were less divergent in our study, and their emergence from 20B, as suggested by the global database, could not be reproduced in this study. From mutation analysis, we observed 528 unique coding mutations, of which 102 were deletions, 6 were premature stop codons, and the remaining were substitutions. The number of coding mutations significantly increased as the viral population shifted to highly mutated BA.1, BA.1.1, and BA.2 variants at the beginning of 2022, leading to greater genetic diversity in the variants that have emerged most recently in the pandemic (21L and 21K, Omicron). Eight mutations were observed in more than 50% of the samples sequenced. The most common mutation in this cohort was ORF1b:P314L, which occurred at a frequency of 98.5% (128 samples), while the D614G mutation in the spike protein (S_D614G) was found in 97% of the sequences in another study in Bangladesh (Rokshana et al. 2021). The deletion mutations ORF1a:S3675-, ORF1a:G3676-, and ORF1a:F3677- were found in almost half of the samples at frequencies of 53.1%, 52.3%, and 50.8%, respectively. The other substitution mutations that occurred at frequencies over 50% were ORF1a:P2046L, S:T478K, and M:I82T. Our results agree with similar findings reported in the literature (Di Giacomo et al. 2021). A total of 132 unique coding mutations were observed in the gene encoding the spike protein, and the 9 most prevalent of these, each appearing in at least 35% of samples, were D614G (84.6%), T478K (50.8%), P681R (47.7%), R158G (45.4%), T19R (45.4%), E156del (45.4%), F157del (45.4%), L452R (43.8%), and D215G (36.9%). The P681R mutation in the spike protein is highly conserved, facilitates cleavage of the spike protein, and enhances viral fusogenicity (Saito et al. 2022).
Fourteen variations were mapped to the RBD of the spike protein, which is involved in binding the host receptor. Similar results have been reported by others (Saito et al. 2022). Our study also demonstrates that mutations of the N gene occurred more frequently in the Omicron variant than in other variants. The current study investigated the genomic diversity of SARS-CoV-2 strains isolated from Bangladeshi patients and helps demonstrate the temporal profile of the mutational accumulations in the genome and spike protein over the study period. It suggests, for the first time, that patients of different ages may be differentially susceptible to the variants of SARS-CoV-2. This may have important implications for how aggressively we monitor infections, distribute vaccines, and treat patients based on their age.
References
Bajgain K, Badal S, Bajgain B, et al. 2021 Prevalence of comorbidities among individuals with COVID-19: A rapid review of current literature. Am. J. Infect. Control 49 238–246
Di Giacomo S, Mercatelli D, Rakhimov A, et al. 2021 Preliminary report on severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike mutation T478K. J. Med. Virol. 93 5638–5643
Johansson M, Quandelacy T, Kada S, et al. 2021 SARS-CoV-2 Transmission from people without COVID-19 symptoms. JAMA Netw. Open 4 2035057
Kannan SR, Spratt AN, Cohen AR, et al. 2021 Evolutionary analysis of the delta and delta plus variants of the SARS-CoV-2 viruses. J. Autoimmun. 124 102715
Kumar N, Quadri S, Ismaeel A, et al. 2022 COVID-19 recovery patterns across alpha (B.1.1.7) and delta (B.1.617.2) variants of SARS-CoV-2. Front. Immunol. 13 12606
Kushwaha S, Khanna P, Rajagopal V, et al. 2021 Biological attributes of age and gender variations in Indian COVID-19 cases: A retrospective data analysis. Clin. Epidemiol. Glob. Health 11 100788
Liu Y, Liu J, Plante KS, et al. 2022 The N501Y spike substitution enhances SARS-CoV-2 infection and transmission. Nature 602 294–299
Mlcochova P, Kemp SA, Dhar MS, et al. 2021 SARS-CoV-2 B.1.617.2 Delta variant replication and immune evasion. Nature 599 114–119
Morens D, Taubenberger J and Fauci A 2022 Universal coronavirus vaccines - An urgent need. N. Engl. J. Med. 386 297–299
Petersen JD, Lu J, Fitzgerald W, et al. 2022 Unique aggregation of retroviral particles pseudotyped with the delta variant SARS-CoV-2 spike protein. Viruses 14 1024
Pradhan S, Varsani A, Leff C, et al. 2022 Viral aggregation: The knowns and unknowns. Viruses 14 438
Rokshana P, Sultana A, Jahan Ara B, et al. 2021 Molecular Analysis of SARS-CoV-2 Circulating in Bangladesh during 2020 revealed lineage diversity and potential mutations. Microorganisms 9 1035
Saito A, Irie T, Suzuki R, et al. 2022 Enhanced fusogenicity and pathogenicity of SARS-CoV-2 Delta P681R mutation. Nature 602 300–306
Sanjay B 2020 The social impact of the COVID19 pandemic. ORF Iss. Brief 406
Wang L, Berger N, Kaelber D, et al. 2022 Incidence rates and clinical outcomes of SARS-CoV-2 infection with the omicron and delta variants in children younger than 5 years in the US. JAMA Pediatr. 176 811–813
Yu C, Qianyun L, Li Z, et al. 2022 Emerging SARS-CoV-2 variants: Why, how, and what’s next? Cell Insight 1 100029
Acknowledgements
The authors thank all patients and research investigators for their participation. This study was performed and supported by Praava Health Bangladesh Ltd. Many thanks are due to those who provided helpful comments and suggestions. Thanks to the Child Health Research Foundation and Bangladesh Council of Scientific and Industrial Research in Bangladesh for the technical support needed for the sequence analysis. We also thank Dr. Linda Hae Kum Lee for reviewing and editing the manuscript.
Ethics declarations
Conflict of interest
The authors report no conflict of interest in this work.
Additional information
Corresponding editor: Arindam Maitra
Cite this article
Azam, S., Sayem, M., Khan, S. et al. SARS-CoV-2 testing and its role in understanding the evolving landscape of the pandemic in Bangladesh. J Biosci 48, 55 (2023). https://doi.org/10.1007/s12038-023-00376-w
DOI: https://doi.org/10.1007/s12038-023-00376-w