Introduction

Coronaviruses (CoVs) were first reported in humans in 1962 [1]. SARS-CoV-2 belongs to the Order Nidovirales, Family Coronaviridae, Subfamily Orthocoronavirinae, Genus Betacoronavirus, Subgenus Sarbecovirus, Species severe acute respiratory syndrome-related coronavirus, and is a single-stranded RNA (ssRNA) betacoronavirus [2]. The virus can mutate rapidly and spread within the human population; however, the transmission rate may vary depending on the vaccination strategy and genetic makeup. At the time of writing, more than 12 variants have been reported by the WHO and CDC [3, 4]. The SARS-CoV-2 surface spike glycoprotein binds to human angiotensin-converting enzyme 2 (ACE2) receptors [5]. After anchoring, human TMPRSS2 cleaves and activates the spike protein to allow SARS-CoV-2 entry, which proceeds by endocytosis or by direct fusion of the viral envelope with the host membrane [6,7,8,9]. The positive-sense ssRNA genome of SARS-CoV-2 is approximately 30 kb long and contains 14 open reading frames (ORFs), some of which overlap [10]. With its 5′ cap and 3′ poly(A) tail, the SARS-CoV-2 genome serves as an mRNA for translation of the viral polyproteins. The 5′ and 3′ ends of the ssRNA contain highly organized untranslated regions (UTRs) that are critical for the control of RNA replication and transcription. The 5′ UTR contains seven stem-loop structures, while the 3′ UTR contains a stem-loop and a pseudoknot. The pseudoknot or stem-loop is thought to have a function in transcriptional control [11]. A total of 16 non-structural proteins (NSPs) have been identified, including the ORF1a-encoded proteases nsp5 (chymotrypsin-like protease, 3CLpro or Mpro) and the papain-like protease (PLpro) found in nsp3 [10, 12]. The remaining one-third of the genome comprises the viral structural protein genes shared by all CoVs (spike, envelope, membrane, and nucleocapsid).
Substitution mutations occur at specific locations, with codon changes from G to A or C to T in 65% of the viral genome. There are ≈100,000 possible single nucleotide substitutions in the SARS-CoV-2 genome [13, 14]. Mutation followed by natural selection results in evolution. However, most mutations are not beneficial for the organisms that carry them but may be favorable to the host [15, 16]. It has also been reported that interactions between viral and host proteins, including stem-loop structures, regulate virus replication and translation. Lowering the number of stem-loop structures in the positive strands of ssRNA viruses enhances RNA expression [17, 18]. Viral replication and transcription depend on different factors in the host cell [19]. Nucleotide changes in the spike ssRNA that allow evasion of selective pressure reduce antibody and antiviral drug efficacy. SARS-CoV-2 spike protein mutations are important for triggering attachment to ACE2 and may result in immunological evasion of IgG [20, 21]. The aim of this study is to predict the structural determinants of the SARS-CoV-2 spike protein based on probabilistic mutations in the spike RNA. The mutational impact on the diversification of the stem-loop secondary structure of the viral ssRNA was explored. Further, structural variation in the spike ssRNA was linked to the pattern of interactions of the viral spike protein with ACE2 and human IgG, which may reveal the replication and infection mechanisms within the human host. We believe that our study will provide a comprehensive understanding of the structural determinants of the SARS-CoV-2 spike protein, offering insights into both the genetic variability of the virus and its implications for infectivity. The ultimate goal is to contribute valuable knowledge that can inform public health strategies, vaccine development, and the ongoing battle against the COVID-19 pandemic.

Materials and Methods

Profiling of Mutational Landscape, Nucleotide Statistics, and Substitution Frequency

The GISAID database (https://www.gisaid.org), which documents the genetic properties of SARS-CoV-2, and Nextstrain (https://nextstrain.org), which tracks its mutational profile in the global population, were used. When accessed at the end of January 2022, the GISAID database contained a total of 7,375,845 SARS-CoV-2 genome sequence entries from patient samples.

We counted the nucleotides of the SARS-CoV-2 spike ssRNA sequence and calculated their corresponding frequencies. The substitution patterns and rates were estimated using the Tamura 3-parameter (1992) model [22] in MEGA-X [23].
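The nucleotide counting step described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names `nucleotide_frequencies` and `au_gc_content` and the short toy fragment are hypothetical and are not part of the MEGA-X workflow used in the study.

```python
from collections import Counter


def nucleotide_frequencies(rna):
    """Return (counts, percentage frequencies) of A, C, G, U in a sequence.

    Hypothetical helper for illustration; DNA-style input (T) is read as U.
    """
    seq = rna.upper().replace("T", "U")
    counts = Counter(base for base in seq if base in "ACGU")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/U characters")
    ordered_counts = {base: counts[base] for base in "ACGU"}
    freqs = {base: 100.0 * counts[base] / total for base in "ACGU"}
    return ordered_counts, freqs


def au_gc_content(rna):
    """Return the combined A+U and G+C percentages of a sequence."""
    _, freqs = nucleotide_frequencies(rna)
    return freqs["A"] + freqs["U"], freqs["G"] + freqs["C"]


if __name__ == "__main__":
    # Toy fragment only; the real input would be the full spike ssRNA.
    fragment = "AUGUUUGUUUUUCUUGUUUUAUUGCCACUAGUC"
    counts, freqs = nucleotide_frequencies(fragment)
    au, gc = au_gc_content(fragment)
    print(counts)
    print(f"A+U = {au:.1f}%, G+C = {gc:.1f}%")
```

Applied to a full spike sequence, the same two functions would give the A+U and G+C percentages of the kind reported in the Results.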

SARS-CoV-2 ssRNA Data Set Curation

Data sets of SARS-CoV-2 spike ssRNA sequences were primarily obtained from the NCBI GenBank database. Accession numbers: SARS-CoV-2 (Wuhan), MT079854.1; Alpha, MZ314997; Beta, MZ314998.1; Eta, MZ362451.1; Delta, OK091006.1; Gamma, MZ315141.1; Omicron, OL672836.1.

Prediction and Visualization of the RNA Secondary Structures

The RNA secondary structures of the viral genomic sequences were predicted using the online RNAfold web server at http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi [24]. Predicted RNA secondary structures were then visualized as structural diagrams using the FORNA server at http://rna.tbi.univie.ac.at/forna [25].

Prediction of Expected Variants of SARS-CoV-2

We characterized the predominant mutations occurring in the loop regions. Probable mutational hotspots were identified in loop regions of different sizes and edited (G → A and C → U substitutions) to give E.M.1 (9 ± 2), E.M.2 (13 ± 2), E.M.3 (19 ± 3), E.M.4 (27 ± 3), and E.M.5 (39 ± 3), where the values in parentheses are loop sizes in nucleotides. A C program was used to create the different mutant sequences from the provided probabilistic sequence data sets (Supplementary Fig. 8). Functions named diffchar and sdam were used for this purpose. The code was written using pointers, and the sample set was specified as pointers S1 to Sn. In total, nC1 + nC2 + nC3 + … + nCn sequence sets were generated as expected models (E.M.) (Fig. 1A). To eliminate cross-platform issues and improve compatibility, all code compiles without warnings when all warnings are enabled in the C compiler. The code has been tested in a Windows environment.
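The combinatorial step can be illustrated with a short Python sketch. It is not the C program used in the study (the internals of diffchar and sdam are not described here); the function names, the 12-nucleotide toy reference, and the one-substitution mutation sets below are all hypothetical. The sketch only shows how nC1 + nC2 + … + nCn non-empty combinations of n mutation sets arise, which for n = 5 gives 2^5 − 1 = 31 sequences.

```python
from itertools import combinations


def apply_substitutions(seq, substitutions):
    """Apply (position, ref_base, alt_base) substitutions; positions are 0-based."""
    bases = list(seq)
    for pos, ref, alt in substitutions:
        if bases[pos] != ref:
            raise ValueError(f"expected {ref} at position {pos}, found {bases[pos]}")
        bases[pos] = alt
    return "".join(bases)


def expected_models(reference, mutation_sets):
    """Yield (name, sequence) for every non-empty combination of mutation sets.

    With n sets this yields nC1 + nC2 + ... + nCn = 2**n - 1 sequences.
    """
    names = sorted(mutation_sets)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            subs = [s for name in combo for s in mutation_sets[name]]
            yield "+".join(combo), apply_substitutions(reference, subs)


# Hypothetical toy data: one G->A or C->U edit per loop-size class.
reference = "GCAUGCCAGGCU"
mutation_sets = {
    "EM1": [(0, "G", "A")],
    "EM2": [(1, "C", "U")],
    "EM3": [(4, "G", "A")],
    "EM4": [(5, "C", "U")],
    "EM5": [(8, "G", "A")],
}
models = dict(expected_models(reference, mutation_sets))

if __name__ == "__main__":
    print(len(models), "expected-model sequences generated")
```

The sketch assumes that the mutation sets touch disjoint positions; overlapping sets would need an explicit rule for which edit takes precedence.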

Fig. 1

(A) Data flow architecture of the pipeline for generating mutated sequences. (B) Program-generated sequences, together with SARS-CoV-2 (Wuhan), were validated by divergence phylogenetic trees

ssRNA Structural Stability

Predicted mutant ssRNA sequences were validated based on the thermodynamics and frequency of the MFE structure and on the ensemble diversity [24]. Significant MFE values across the genome provided the initial evidence for RNA structure formation in SARS-CoV-2 and other coronavirus genomes. MFE calculations reveal the contribution of sequence order to RNA folding, with higher values arising when the native sequence folds more favorably than shuffled controls.

Protein Structure Modeling

Suitable mutant sequences of the different SARS-CoV-2 spike proteins were selected, and their structures were modeled using SWISS-MODEL interactive modeling (https://swissmodel.expasy.org/, accessed February 2022). High-quality models were created using the prefusion spike glycoprotein structure (PDB id: 6VSB) as the template. The 3D models were then improved using 3Drefine (http://sysbio.rnet.missouri.edu/3Drefine/) and online protein structure refinement servers in a two-step procedure [25].

The Mutational Impact and Protein–Protein Interaction Analysis of the Modeled Structures

All of the modeled spike protein structures were chosen for docking after being evaluated by MolProbity clash score, Ramachandran favoured percentage, QMEANDisCo global score, and QMEAN score. HADDOCK 2.4 was used to perform protein–protein docking of the S (spike) protein with angiotensin-converting enzyme 2 (PDB id: 6M18) and with the heavy and light chains of human IgG (PDB id: 6YOR). To predict binding affinities and establish biological interfaces, we used the PRODIGY web server. The molecular interactions between the spike proteins and ACE2 and IgG were visualized and analyzed using PyMOL2 [26,27,28].

Surface-Topology Calculation of Proteins

Solvent accessibility distinguishes two surface features of a protein: pockets, which water can enter, and cavities, which it cannot. CASTp, the Computed Atlas of Surface Topography of proteins (http://sts.bioe.uic.edu/castp/index.html?j_5e8c7bec25090), was used to define pockets and cavities.

Evaluation of H-bonding in Docking Structure

The H-bond lengths between amino acid residues at the interfaces with the ACE2 receptor and human IgG were analyzed for the original SARS-CoV-2 isolate Wuhan-Hu-1 (WH1), the D614G mutant, the N501Y mutant, B.1.617.2 (Delta), and the Omicron variant using the PyMOL visualization system, in order to classify the H-bonds between residues as short or long. The free energy network of the stable protein–protein complexes was also analyzed using PyMOL.

Energetic and Kinetic Data Analysis

The HADDOCK web server was used to generate the protein–protein binding scores and cluster-size-based RMSD values, and the other energy terms, i.e., van der Waals, electrostatic, and desolvation energies, were evaluated. The binding affinity [ΔG (kcal mol-1)] and the dissociation constant [Kd (M) at 25 °C] describe the strength and affinity of the interactions.
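The binding affinity and the dissociation constant are linked by the standard thermodynamic relation ΔG = RT ln Kd. The short Python sketch below shows that conversion at 25 °C. The function names are hypothetical, and the ΔG value in the example is purely illustrative rather than a result taken from the docking runs.

```python
import math

# Gas constant in kcal mol^-1 K^-1 (8.314462618 J mol^-1 K^-1 / 4184).
R_KCAL = 1.987204e-3


def kd_from_delta_g(delta_g_kcal, temp_celsius=25.0):
    """Dissociation constant Kd (M) from binding free energy (kcal/mol)."""
    temp_kelvin = temp_celsius + 273.15
    return math.exp(delta_g_kcal / (R_KCAL * temp_kelvin))


def delta_g_from_kd(kd_molar, temp_celsius=25.0):
    """Binding free energy (kcal/mol) from dissociation constant Kd (M)."""
    temp_kelvin = temp_celsius + 273.15
    return R_KCAL * temp_kelvin * math.log(kd_molar)


if __name__ == "__main__":
    # Illustrative value only: a more negative dG gives a smaller Kd,
    # i.e. a tighter complex.
    dg = -12.0
    print(f"dG = {dg} kcal/mol -> Kd = {kd_from_delta_g(dg):.2e} M")
```

Because Kd depends exponentially on ΔG, a difference of about 1.4 kcal/mol at 25 °C corresponds to roughly a tenfold change in Kd.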

SARS-CoV-2 Spike Protein Mutational Saturation, Divergence, and Age Estimation

A timetree was constructed by applying the RelTime method [29, 30] to a user-supplied phylogenetic tree whose branch lengths were estimated using the Maximum Likelihood (ML) method and the Tamura-Nei substitution model [30]. Two calibration constraints were used to build the timetree. The approach of [31] was used to define minimum and maximum time constraints for nodes for which calibration densities were supplied, and to construct confidence intervals. Times were not calculated for out-group nodes because the RelTime method calculates divergence times using evolutionary rates from the in-group and does not assume that evolutionary rates in the in-group clade apply to the out-group. The estimated log-likelihood value of the tree is 6680.52. A discrete Gamma distribution with 5 categories (+ G, parameter = 0.0500) was used to describe evolutionary rate variation among sites. The rate variation model allowed some sites to be evolutionarily invariable [(+ I), 0.00% of sites]. The analysis included 14 nucleotide sequences. The 1st + 2nd + 3rd + noncoding codon positions were included. The final dataset contained a total of 4083 positions. Evolutionary analyses were conducted in MEGA11 [32].

Results

Phylogenetic Clusters Correlated with Mutational Pattern

We assessed 7,375,845 SARS-CoV-2 genomic sequences available in the GISAID database in January 2022. We also assessed the lineage diversity of SARS-CoV-2 variants through time using phylodynamic time tree analysis, which revealed mutational hotspots in the SARS-CoV-2 spike gene. Following the SARS-CoV-2 global lineage, the VOCs were included in our predicted models (Supplementary Fig. S1A-S1B). Nucleotide frequencies were calculated after retrieving the SARS-CoV-2 spike ssRNA sequence. The spike ssRNA contained more A and U (62.9%) than G and C (37.1%). Relative synonymous codon usage (RSCU) analysis revealed more U-rich codons in the spike ssRNA sequence (Supplementary Table S1). We also created a large data collection of 3822 complete coding sequences to build multiple sequence alignments. Furthermore, Tamura model analysis of substitution patterns demonstrated a high probability for both forms of substitution (G → A and C → U). After examining additional variants, all of the mutations were highlighted in the ssRNA of the Wuhan variant (Supplementary Fig. S2A–S2E). Our findings revealed U-richness in virtually all coding sequences of SARS-CoV-2 spike ssRNA; however, this pattern differed at first codon positions [33]. Our research strategy aimed to determine whether the observed patterns were attributable to a biased mutation process or to selection.

Expected Mutation Model

To resolve this enigma, we used the concept of probable substitution selection (excluding deleterious mutations), which has been shown to be prevalent in ssRNA viruses [20]. Next, we modeled the proportion of G → A and C → U substitutions, with more expected mutations in the small loop (9 ± 2 nucleotides) and fewer in the larger loop (39 ± 3 nucleotides), thus estimating the differing roles of mutation and selection. We designed five sets of expected ssRNA sequences depending on loop size, in which we identified eighty possible mutational spots.

Most of the mutations were observed within the loop regions of the spike ssRNA. Prior research indicates that ssRNA loop regions exhibit heightened motion dynamics as a consequence of their minimal covalent connections [34]. Additionally, we noted that most of the substitutions were G → A and C → U. The program estimated the probable combinations of mutated sequences (named E.M.) based on the sizes of the five types of loop structure. Expected mutations were edited into the loops of the ssRNA sequence and named E.M.1 to E.M.5, from small to large. Using the computational program, we obtained a total of 31 sets of expected models (E.M.) according to nC1 + nC2 + nC3 + … + nCn (Fig. 1B). The program-generated RNA sequences were then translated into protein sequences.
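The final translation step can be sketched as follows. This is a minimal illustration using the standard genetic code, not the tool used in the study; the names `CODON_TABLE` and `translate` are hypothetical, and the example codons were chosen only to show how a single G → A substitution can change the encoded residue.

```python
# Standard genetic code, laid out in the conventional U, C, A, G order
# (first base varies slowest, third base fastest).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(rna, to_stop=True):
    """Translate an RNA sequence in frame 0; '*' marks a stop codon.

    Trailing bases that do not fill a codon are ignored. Codons containing
    characters other than A, C, G, U raise KeyError.
    """
    seq = rna.upper().replace("T", "U")
    protein = []
    for start in range(0, len(seq) - len(seq) % 3, 3):
        residue = CODON_TABLE[seq[start:start + 3]]
        if residue == "*" and to_stop:
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGUUUGUUUUU"))         # four codons: AUG UUU GUU UUU
    print(translate("GGU"), translate("GAU"))  # a G->A change at codon position 2
```

A second-position G → A change such as GGU → GAU replaces glycine with aspartate, whereas many third-position changes are synonymous, which is why the position of an edit within the codon matters when the mutated ssRNA sequences are converted into spike protein models.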

Free-Energy Landscape and Stability of SARS-CoV-2 Spike ssRNA

The minimum free energy (MFE) method implemented in the widely used RNAfold web server was used to predict and analyze RNA secondary structures. This model relies on free energy principles based on empirical thermodynamic parameters. The analyzed sequence spanned nucleotides 1 to 3822, as indicated in the graph (Supplementary Fig. S4). According to the ssRNA stability results, the average MFE was −28.5 ± 0.5 kcal/mol, the free energy of the thermodynamic ensemble was −30 ± 0.6 kcal/mol, and the ensemble diversity was 18 ± 0.5 per 100 nucleotides. Eighteen sequences were thermodynamically stable, whereas nine were less stable. These twenty-seven SARS-CoV-2 spike ssRNA sequences were selected based on the overall minimum free energy associated with the different structural aspects, such as the thermodynamic ensemble, MFE, and ensemble diversity (Supplementary Fig. S4).

Optimization Toward Spike Protein Structure Modeling and Validation

Using the SWISS-MODEL workspace, we generated 27 homology models of the glycosylated full-length mutant SARS-CoV-2 spike protein by integrating experimental structural data and bioinformatic predictions, and the models with the best validation statistics were chosen for study. For the expected mutant SARS-CoV-2 spike sequences, the models with the best validation statistics were chosen from a total of seven homology models. The SWISS-MODEL server was used to analyze the stability of the homology structures based on the Z-score, QMEAN, and Ramachandran plot [35]. These computational methods are commonly used to assess the stability of the modeled spike proteins and the hACE2 cellular receptor for molecular docking. The new models had a QMEAN Z-score of −2.00 ± 0.2, a MolProbity score of 1.13 ± 0.03, and a Ramachandran favoured percentage > 91%. For a protein of this size, these values are within the allowed range, and the models were considered for further analysis. Supplementary Fig. S5 compares the quality of our models with current X-ray structures in terms of QMEAN Z-scores. The 3Drefine server was used to improve the models, and it generated five refined models. The top-ranking models were selected on the basis of favorable properties such as the lowest 3Drefine score, GDT-HA, RMSD, lowest RWplus score, and MolProbity [36].

Docking Performance and Bonding Pattern

Spike with ACE2 Binding Partner Identification

In the bound complexes, the highest numbers of interactions (30 ± 3 or more) were noted for E.M.2 (36), Gamma (32), and Omicron (30). A moderate number of bonds formed around the binding region for E.M.3 (29), E.M.4 (28), Wuhan (27), Beta (26), E.M.5 (25), E.M.3.4 (25), Delta (22), and E.M.2.5 (21), whereas lower numbers of interactions were found for E.M.3.5 (20), Eta (17), and Alpha (14) (Fig. 2). The highest number (22 ± 2) of polar amino acid interactions around the binding region was found for Omicron, E.M.2, and Beta. The highest number (13 ± 3) of non-polar amino acid interactions was noted in the complexes of E.M.2, E.M.3, E.M.4, E.M.5, Gamma, and E.M.3.4, whereas a lower number (7 ± 2) of non-polar residues interacted in the case of E.M.2.5, E.M.3.5, Beta, Delta, Alpha, Eta, and Omicron (Supplementary Fig. S6A-S6N). The binding free energy of the viral spike with ACE2 was calculated. The amino acid contribution to the free energy (−17 ± 1) indicated that the interactions were more favorable for E.M.3, E.M.2, and Omicron.

Fig. 2

Hydrogen bonds at the interfaces of SARS-CoV-2 spike with H. sapiens ACE2 and with H. sapiens IgG. Percentile stacked bar plot of the number of interacting residues in the docked complexes of the spike protein of different SARS-CoV-2 variants with human ACE2 and with the IgG heavy and light chains. Orange bars indicate ACE2 with spike, for which Delta showed the highest value. Ash-colored bars indicate spike with the IgG heavy chain, and light ash-colored bars indicate spike with the IgG light chain

Spike with IgG Binding Partner Identification

In the bound complexes with the heavy chain of human IgG, the highest numbers of interactions were found for E.M.2 (31) and E.M.3.5 (31). Moderate numbers of bonds were found for E.M.4 (29), E.M.2.5 (28), Omicron (26), E.M.5 (24), Wuhan (24), and E.M.3 (23). Lower numbers of bond interactions, for Alpha (20), Eta (17), Gamma (16), Beta (15), and E.M.3.4 (15), and the lowest number, for Delta (7), were identified as primary attractors (Fig. 2). For E.M.3.4, E.M.4, E.M.2, E.M.2.5, and E.M.3.5, a higher number (18 ± 2) of non-polar amino acid interactions was detected surrounding the binding domain. For Omicron and E.M.3.5, the results showed a higher number (16 ± 1) of polar amino acid bonding patterns.

The lowest number of polar and non-polar residue interactions was recorded for the Delta spike (2) with the human IgG heavy chain (Supplementary Fig. S7A-S7N). The binding free energy of the human IgG heavy chain with spike was estimated. The contribution of amino acids to the free energy (−18 ± 1) revealed that E.M.3.5, E.M.4, WHCV, and E.M.3 formed stable interactions. In the bound complexes of the IgG light chain with the SARS-CoV-2 spike, the highest number of interactions was found for Omicron (25) and the lowest for Delta (4). The other variants, including our expected model variants, formed 15 ± 5 bonds (Fig. 2). The highest number of non-polar amino acid interactions surrounding the binding domain was detected for E.M.3 (9), and the lowest for Delta, Eta, and E.M.5. For Omicron, the bound complex of the IgG light chain with the SARS-CoV-2 spike showed the maximum number (20) of polar amino acid interactions (Supplementary Fig. S7A-S7N). In our study, the interactome profile of ACE2 with the spike complex showed that E.M.2, Gamma, Omicron, E.M.3, and E.M.4 had the largest numbers of interactions, in that order. The selected complexes of spike protein with the IgG-H and IgG-L chains showed that Omicron, E.M.4, and E.M.2.5 had significantly higher numbers of interactions. Figure 2 shows that the highest docked surface areas are present in the Delta variant, and the more deleterious nature of the contact areas has been documented [37, 38].

The binding free energy of the SARS-CoV-2 spike protein in complex with human ACE2 was calculated. The results suggested that E.M.2, E.M.3, Omicron, and Gamma form stable complexes. The binding free energy of the human IgG heavy and light chains with the SARS-CoV-2 spike protein was also calculated. The contribution of amino acids to the free energy (−12 ± 1) indicates stable bound complexes of the spike of Alpha, Beta, E.M.2.5, and E.M.3.4 with the human IgG heavy and light chains (Table 1).

Table 1 Interactome profile showing the binding free energy of the spike protein of different SARS-CoV-2 variants with ACE2, IgG-H, and IgG-L

Viral membrane glycoproteins, i.e., spikes, are the most abundant trans-membrane proteins produced from the ssRNA, which has a compact stem-looped structure. The fast production of a large number of spikes and the processivity of the RdRp are therefore mandatory requirements. A large amount of U-A bonding is favorable for maintaining this processivity because lower energy expenditure is required. Additionally, with regard to protein structural stability, it has already been reported that trans-membrane proteins encoded with more uracil are more stable and may engage in promiscuous chaperone-like activities [39]. In SARS-CoV-2, both our study and others have noted more U-rich codons in the spike protein than in the viral cytosolic proteins. Spike protein stability through U-rich codon representation may therefore be regarded as one of the functions of host protein interactivity. A higher ratio of interactions between ACE2 and IgG (H + L) represented the more deleterious nature of a variant. In that regard, Delta may be judged to be in that category compared with the Wuhan or the Omicron variant. Second-position and third-position U-rich codons represent the amino acids Phe, Leu, Ile, Val, Cys, Arg, Ser, and Gly, respectively. One report suggests that the Leu/Ile/Phe-rich domain of some human viruses, i.e., polyomaviruses, including JCV, BKV, and SV40, can form stable dimers/oligomers mediated by a predicted amphipathic α-helix. Moreover, deletion of the α-helix renders the virus replication incompetent [40, 41]. Further studies are necessary in relation to SARS-CoV-2 function. Amino acids with smaller R groups, such as Val, Gly, Ser, and Cys, may impose lower steric hindrance and give more freedom to form secondary and tertiary structures, which supports the formation of more interactive energy ensembles. This is supported by Fig. 3.
In this regard, transactivation of some Cys-rich proteins in other RNA viruses such as HIV may be noteworthy; in addition, Val and Gly also support its metabolic function and help in anti-termination of transcriptional regulation [42, 43]. As individual amino acids, thiol-containing Cys and hydroxyl-containing Ser have their own capacity for interactivity. Positively charged Arg is helpful for polar interactions with the contact ligand and is able to form a combined hydrophobic-hydrophilic multi-domain structure for higher interactivity. The spike RBD protein and a region of the nucleocapsid protein have been shown to act as nuclear localization signals (NLS). These are enriched in positively selected amino acid replacements. These replacements have been shown to be linked with apparent epistatic interactions. They are also designated as the blueprints of major diversification in the SARS-CoV-2 phylogeny [44, 45].
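The ratio of spike–ACE2 interactions to spike–IgG (heavy plus light chain) interactions discussed above can be made concrete with a small calculation. The sketch below uses only the bond counts stated in the Results for Delta and Omicron (ACE2, IgG heavy chain, IgG light chain); the function name is hypothetical, and the ratio is offered as an illustration of the argument rather than as a validated severity score.

```python
def ace2_to_igg_ratio(ace2, igg_heavy, igg_light):
    """Ratio of spike-ACE2 interactions to spike-IgG (heavy + light) interactions."""
    igg_total = igg_heavy + igg_light
    if igg_total == 0:
        raise ValueError("no IgG interactions recorded; ratio is undefined")
    return ace2 / igg_total


# (ACE2, IgG heavy, IgG light) interaction counts as reported in the Results.
counts = {
    "Delta": (22, 7, 4),
    "Omicron": (30, 26, 25),
}
ratios = {name: ace2_to_igg_ratio(*values) for name, values in counts.items()}

if __name__ == "__main__":
    for name, ratio in sorted(ratios.items(), key=lambda item: -item[1]):
        print(f"{name}: ACE2 / IgG(H+L) = {ratio:.2f}")
```

With these counts, Delta has more ACE2 contacts than IgG contacts, while Omicron has the reverse, which is the pattern the text associates with a more versus less deleterious variant.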

Fig. 3

Increased stability and binding affinity at the interface of SARS-CoV-2 spike with H. sapiens ACE2 protein and IgG. (A) Diagrammatic representation of the key amino acid residues of H. sapiens ACE2 protein and IgG that bind more tightly to the spike (S) protein of different SARS-CoV-2 variants; high-density color gradients indicate the highest number of interactions, whereas low-density gradients indicate the minimum number of interactions. (B) Characteristics of the interactions between the host and the spike of different SARS-CoV-2 variants and their impact on disease profiling

Reinfection with and recovery from different coronavirus variants induce immunological protection. The spike ssRNA mutation rate, which is reflected particularly in the spike glycoprotein, was used to screen for and detect a mutation profile. The assumed E.M.3.4 model may be a variant of concern (VOC). Most of the mutational impact on the expected model variants lies in the receptor-binding domain, which increases infectivity by increasing binding affinity with ACE2 [46]. The SARS-CoV-2 Immunity and Reinfection Evaluation (SIREN) project, conducted by Victoria Hall and colleagues in the United Kingdom, revealed that being seropositive to SARS-CoV-2 through natural infection protects against both asymptomatic and symptomatic reinfection [47]. Other studies have already suggested that antibodies elicited through vaccination programs enhance protection against a range of SARS-CoV-2 variants. The expected model variants, in particular, exhibit a particular degree of immune escape, although most of them have high infectivity (Fig. 3A). However, previous serosurveys and studies suggest that neutral immunity and T-cell responses through vaccine platforms can protect against most of the variants, regardless of E.M.3.4 (Fig. 3B).

Time Tree of Empirical Datasets

Phylogenetic evaluation of global viral populations using the GISAID and Nextstrain nomenclature systems revealed multiple clades and lineages (Fig. 4A). A global phylogenetic analysis of the circulating genomes was performed to identify distinct groups and their unique mutational patterns. We used VOC SARS-CoV-2 lineages to create minimal spanning trees (Supplementary Fig. S1A-S1B) to visualize genetic links and distances between mutations from various countries. Because of host selection forces, mutations may differ between countries. Some mutation sites within the same country are constant, and the genome sequences show obvious regional peculiarities [38, 48].

Fig. 4

Evolutionary dynamics and time tree of SARS-CoV-2 spike ssRNA. (A) The SARS-CoV-2 global phylogenetic tree, built from genomes extracted from GISAID, as shown in Nextstrain. TreeTime was used to create a time-resolved phylogenetic tree with chosen metadata information. (B) Scale lengths represent the divergence time. The phylogeny based on co-evolution divergence times was estimated, and each branch labeled, using the ML method and the Tamura-Nei substitution model. The sampling interval is similar to the time frame, which allows the long-term rate of evolution to be assessed with high precision

These findings suggest that, although certain changes are essential in the development of SARS-CoV-2, others may be the consequence of the virus’s adaptation to different countries, natural environments, infectivity patterns, and immune-escape mechanisms. The time tree for the SARS-CoV-2 spike was built using ssRNA source trees. Seven expected variants and five variants of SARS-CoV-2 are highlighted and classified by commensalism and amensalism, respectively. The values along the branches are bootstrap percentages from 500 bootstrap replicates.

The cross-family continuity in reference to time-scale phylogeny was estimated using phylogenetic analysis [33]. The SARS-CoV-2 spike ssRNA dataset served as a model for SARS-CoV-2 infectivity and escapology in the time tree. Due to the former’s higher pace of development, there are also considerable changes in overall tree length. The genomic evolutionary rate of SARS-CoV-2 spike ssRNA was expressed as divergence time, and the values for E.M.5, E.M.4, E.M.3 with E.M.3.5, and E.M.2 with E.M.2.5 and E.M.3.4 are displayed, respectively (Fig. 4B). The expected evolutionary clock of SARS-CoV-2 was coordinated with future adaptive diversity in humans.

Discussion

We found that the SARS-CoV-2 spike ssRNA studied here has a skewed nucleotide composition in its coding regions, with U- and A-rich sequences. The spike ssRNA of SARS-CoV-2 contains 3822 nucleotides, and there are ≈ 11,000 possible single nucleotide substitutions [13]. One of the goals of this analysis is to explore the fingerprints of the RNA editing ratio in the long-term evolution of the coronavirus. A greater proportion of adenine-rich sequences, together with probable selection for amino acids encoded by A-rich codons, promotes avoidance of the MHC system [49]. We noticed a lower frequency of CG pairs and a higher frequency of U compared with A in the ssRNA. Zinc finger antiviral protein (ZAP), apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like (APOBEC), and adenosine deaminases acting on RNA (ADAR) promote C → U mutations in the SARS-CoV-2 ssRNA genome [13, 50, 51]. Regardless of the underlying mutational processes, the analyses performed in this research revealed an over-representation of C → U transitions. It has already been reported that oxygen tension alters the induction of bacterial mutations by other mutagens, which is regulated by hypoxia-inducible factor-1α [52,53,54]. Predicted ssRNA sequences were chosen and structured not only to be more thermodynamically stable but also to have a more compact structural ensemble than random sequences. Genomic structural stabilization at the expense of virulence and pathogenicity, as observed in some recent variants such as Omicron, resulted in higher spreadability but lower mortality in the human host. This suggests a co-surveillance adaptation strategy. Moreover, the selection of more U and A indicates an adaptation strategy of using simpler nucleotides to maintain a lower energy level for structural stabilization. Unlike several other viruses, SARS-CoV-2 has shown purifying selection strategies and has extensively utilized its mutational benefits.
Although a significant number of variants might have gone extinct, a large number of VOCs have survived. Involvement in ancestral recombination is a possible strategy for acquiring mutational advantages under positive selection. One report reveals that major selective forces act on the ‘501Y lineages’, repeatedly favoring convergent mutations at 35 genome sites of SARS-CoV-2 [55, 56]. Another report revealed that the genomes of SARS-CoV-2 are remarkably structured, with greater minimum folding energy differences (MFEDs) than previously examined viruses, such as the hepatitis C virus. High MFED values were shared by all coronavirus genomes analyzed, creating several hundred consecutive energetically favored stem-loops throughout the genome. This characteristic provided insight into the selection of the SARS-CoV-2 spike ssRNA for protein modeling. Surprisingly, the findings of the study revealed a strong correlation between phylogeny and divergence time (Fig. 4). E.M.3.4 carried the highest number of mutations, 113 and 87 relative to Omicron and the Wuhan reference strain, respectively (Supplementary Fig. S8). This also adds to the evidence that systematic adaptation in SARS-CoV-2 spike ssRNA may be a key factor for inferring infectivity and immune escape. All the predicted model variants, i.e., E.M.5, E.M.4, E.M.3, E.M.3.5, E.M.2, and E.M.2.5, showed a moderate or high infective rate but reduced severity due to the significant number of IgG light and heavy chain interactions. Only one variant, E.M.3.4, appears to cause serious symptoms, like the Delta variant. A high rate of vaccination results in a neutralization titer against SARS-CoV-2 variants, which may reduce serious symptoms. It has already been reported that the decline of the neutralization titer after 250 days represents a considerable loss of protection against SARS-CoV-2 infection [57].
Here, we show that our hypothesized E.M.3.4 provides an evidence-based model of a SARS-CoV-2 variant of concern that will assist in the design of vaccination strategies to control the pandemic’s future trajectory. The fitness cost that mutations impose on the virus is attributed to disruption of the stem-loop structure of the ssRNA and to a lower level of the energy ensemble, as observed in the current study. Among the three main events, namely infectivity (ACE2-TMPRSS2 role), propagation (replication), and immuno-escape (MHC-IgG recognition), the virus appears to strive for a balance. We hypothesize that these balancing features are evolutionarily acquired via mutational modifications of the SARS-CoV-2 ssRNA that follow thermodynamic and kinetic properties. Further structural constraints, such as stem-looping, impose genomic reconstruction for molecular epidemiology. Not only ACE2 binding or immune interaction but also the structural alteration and stability of the RNA-dependent RNA polymerase (RdRp) determine viral fitness. Multiple mutational analyses have determined the role of mutations in inducing alterations in RNA secondary structure and RdRp functionality, thereby governing the pathogenicity of the virus. The important finding that neutral variants occur more often in asymptomatic or less symptomatic persons, whereas deleterious variants occur more often in deceased persons, may suggest a definition of viral fitness. Killing the host terminates a large number of viral life cycles, whereas co-surveillance with the host increases the propagation and spreadability of the virus. Therefore, long-term evolutionary fitness lies in the second category. The fatal variants that killed large populations did not last long because they had a particular mutation associated with strong ACE2 binding, immune escape, or both. For example, the D614G mutation in the S-glycoprotein has an impact on immunity and partial vaccination escape [58].
The E484K and N501Y mutations in the receptor-binding domain (RBD) of spike were the major sources of neutralization resistance. The 501Y.V2 variants, on the other hand, do not show improved infectivity in a variety of cell types [59]. Five important mutations (T95I, A222V, G142D, R158G, and K417N), with diverse virulence and epidemiological consequences, were much more frequent in the Delta Plus variant than in the Delta variant [60]. A comparison of the mutational hotspots of the Omicron and Delta variants shows that the Omicron variant carries a large number of mutations in its spike protein [61]. However, according to the findings, the probability of severe outcomes after SARS-CoV-2 infection is significantly lower for Omicron than for Delta [62]. For Omicron infections, booster immunization with mRNA vaccines provides over 70% protection against hospitalization and mortality. The presence of a major cluster of mutations in the S protein has led to the development of more than 12 variants, indicating the beginning of antigenic drift for SARS-CoV-2. The emergence of such variants is particularly concerning since they may evade antibodies and result in increased ACE2 binding. Such divergence in the timing of viral identification might lead to an underestimation of the pandemic’s scale [63, 64]. The predicted S protein, particularly its RBD, which interacts with the host cellular receptor to gain access to host cells, is a viable vaccination target for SARS-CoV-2. As illustrated above, the E.M.3.4 variant may improve ACE2 affinity but may not necessarily confer antibody resistance. On the other hand, E.M.2, E.M.2.5, E.M.3, E.M.3.5, E.M.4, and E.M.5 resulted in lower ACE2 binding affinity and more IgG interactions. Other potential immunological correlates of protection will be tested and evaluated using our hypothesized models. The WHO has declared the VOCs to be resistant to neutralizing antibodies, making them more infectious and pathogenic [65, 66].

Conclusion

Our hypothesized models for SARS-CoV-2 are resistant to neutralizing antibodies, rendering the virus more infectious and pathogenic. Clade and lineage classifications will be altered due to the virus’s high diversity and mutability. It has already been reported that the SARS-CoV-2 amino acid substitution rate is 25.331 per year [67]. Our hypothesized model, E.M.3.4, may evolve in 2024–2025, and E.M.2 is expected to evolve in 2024. In addition, our research presents a modeling framework that combines imprecise data from previously circulating variants with antibody interactome profiles, which may provide a strategy for projecting the uncertain future of SARS-CoV-2 immunity. Our hypothesized VOC models and the evolutionary epidemiology of SARS-CoV-2 lineages may become more significant for evaluating the efficacy of control measures and monitoring co-evolutionary events in the future.