1 Introduction

Proteins engage in highly selective interactions in living systems at every scale. A variation (mutation) in the sequence can significantly perturb or completely abolish function, potentially leading to disease. There is therefore an important need to understand the impact of sequence variation on protein structure. The stability of proteins plays an important role in characterizing their function, activity and regulation [1].

One possible way to assess the effect of a mutation on protein binding affinity or stability is to measure it experimentally. However, experimental methods can be time-consuming and costly. With advances in computing and its integration with chemistry, physics, and biology, it has become practical to estimate the impact of mutations on protein stability theoretically, with results that approach experimental accuracy [2].

The current era of genome sequencing has unraveled a large number of human genetic variations, many of which may affect protein binding and function [3].

Protein stability refers to the ability of a protein to maintain its native three-dimensional structure under a given set of conditions. The Gibbs free energy (ΔG) is a thermodynamic parameter that describes the tendency of a system to change spontaneously from one state to another. In the context of protein stability, ΔG is a measure of the free energy difference between the folded (native) and unfolded (denatured) states of the protein [4].

A negative ΔG value indicates that the protein is stable in its folded state, while a positive ΔG value indicates that the protein is unstable and has a tendency to unfold. The magnitude of ΔG reflects the strength of the interactions that stabilize the folded protein, such as hydrogen bonds, hydrophobic interactions, and electrostatic interactions [5].
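The sign convention can be illustrated with a two-state model, which is an assumption not made explicitly in the text: if folding is treated as an equilibrium between a single unfolded and a single folded state, the folding equilibrium constant is K = exp(−ΔG/RT) and the folded fraction follows from it. The function name and the default temperature are illustrative choices.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fraction_folded(delta_g_kcal, temperature_k=298.15):
    """Folded fraction in a two-state model (assumption).

    delta_g_kcal is G_folded - G_unfolded in kcal/mol, so a negative
    value favours the folded state.
    """
    k_fold = math.exp(-delta_g_kcal / (R_KCAL * temperature_k))
    return k_fold / (1.0 + k_fold)


for dg in (-5.0, 0.0, 5.0):
    print(f"dG = {dg:+.1f} kcal/mol -> fraction folded = {fraction_folded(dg):.4f}")
```

At ΔG = 0 the two states are equally populated; a few kcal/mol in either direction shifts the population almost entirely to one state.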

Experimental techniques such as protein folding assays, circular dichroism spectroscopy, and differential scanning calorimetry can be used to measure protein stability and ΔG values under various conditions, such as changes in temperature, pH, and ionic strength. Computational methods such as molecular dynamics simulations and free energy calculations can also be used to predict protein stability and ΔG values based on the protein’s structure and environmental conditions [6].

The AIDS pandemic, caused by the retrovirus HIV-1, has claimed more than 30 million lives over the past four decades. Antiretroviral therapy (ART), which must be continued for life, has made the disease more manageable. The CD4+ T lymphocyte is the main target cell of HIV-1, which enters by binding to the receptor CD4 and to a co-receptor, e.g., CC-chemokine receptor 5 (CCR5). This binding prompts fusion of the viral and human cell membranes and initiates a complex intracellular life cycle that produces new viruses [7].

The stability of mutant HIV proteins, especially in relation to their binding to anti-AIDS drugs such as rilpivirine, and their impact on pathogenicity and virulence can vary significantly. HIV is known for its high mutation rate, which can lead to the emergence of drug-resistant variants and altered pathogenicity.

Rilpivirine is a non-nucleoside reverse transcriptase inhibitor (NNRTI) used in the treatment of HIV. Resistance to rilpivirine can develop through mutations in the HIV reverse transcriptase gene, specifically in the NNRTI-binding pocket. Common mutations associated with rilpivirine resistance include K103N, Y181C, and E138A. These mutations can reduce the binding affinity of the drug for the reverse transcriptase enzyme, leading to reduced drug efficacy [8]. Tryptophan (W) residues such as W229 and W234, which contribute to hydrophobic interactions in the NNRTI-binding pocket, are involved in the binding of rilpivirine to the reverse transcriptase enzyme. Tyrosine (Y) residues, exemplified by Y181, are essential for rilpivirine and the enzyme to form hydrogen bonds and engage in hydrophobic interactions. Furthermore, phenylalanine (F) residues such as F227 contribute to the hydrophobic pocket in which rilpivirine binds, increasing its binding affinity [8].

The basic mechanism of action of rilpivirine as an NNRTI derives from this combination of amino acid interactions within the reverse transcriptase enzyme.

Mutations in the reverse transcriptase gene can affect the binding of rilpivirine to the NNRTI-binding pocket of the enzyme. The loss of drug binding affinity can result in reduced inhibition of reverse transcription, allowing the virus to replicate. The specific mutations determine the degree of resistance, with some mutants showing higher resistance levels than others. The effect on drug binding can be assessed through in vitro studies and molecular modelling [9].

Mutations in HIV can influence viral pathogenicity and virulence. Some mutations may lead to changes in viral proteins that affect the virus’s ability to infect and replicate within host cells. Mutations that enhance viral fitness, replication, and immune evasion can contribute to increased pathogenicity. Conversely, some mutations may reduce viral fitness and replication. Studies on the impact of specific mutations on viral pathogenicity are ongoing, and the results can vary depending on the viral strain and host factors [10]. HIV is a very dangerous virus because it targets the immune system, particularly CD4+ T cells, which are essential for the body’s defence against disease. These cells are the main target of the virus, which depletes them, impairs immunity, and increases susceptibility to a variety of opportunistic infections and cancers. The pathogenicity of HIV is largely due to its ability to elude the immune response, establish persistent infection, and gradually weaken immune function [10].

HIV’s virulence varies from person to person and is influenced by the virus strain, the immune system of the individual, and the accessibility of treatment. Certain strains of HIV are more virulent than others and cause a rapid course of disease. Treatment becomes much more difficult because the virus can mutate quickly, giving rise to drug-resistant forms [10].

HIV’s capacity to integrate its genetic material into the host’s DNA is another factor contributing to its virulence. Integration allows the virus to create a latent reservoir of infected cells that can reawaken and release virus particles even after years of successful antiretroviral therapy. The search for a cure for HIV is significantly hampered by this viral reservoir.

The emergence of drug-resistant mutants, including those resistant to rilpivirine, poses a clinical challenge in the management of HIV infection. Alternative antiretroviral regimens may be required for individuals with drug-resistant strains. Monitoring for drug resistance through genotypic and phenotypic testing is essential in HIV clinical care to guide treatment decisions [10].

Computational chemistry is a multidisciplinary field that combines principles of chemistry, physics, and computer science to investigate chemical phenomena using computational methods. It involves the development and application of theoretical models, algorithms, and software tools to estimate the structures, properties, reactivity, and behaviour of molecules, materials, and chemical reactions.

The use of computational methods in chemistry has revolutionized the way researchers approach the study of molecules and materials. It enables the exploration of complex chemical systems that are often difficult or even impossible to study experimentally. Computational chemistry techniques provide insights into molecular interactions, reaction mechanisms, and properties of compounds, helping researchers to design new drugs, catalysts, and materials.

Computational chemistry has many applications, including drug discovery, materials science, catalysis, and environmental chemistry. Computational methods allow the properties of molecules and materials to be predicted with good accuracy without expensive and time-consuming experiments. This saves time and enables faster, more efficient development of new drugs, materials, and technologies.

Computational chemistry is a broad field that encompasses a wide range of computational methods and techniques used to study chemical systems. In addition to MD simulations and protein modelling, it includes techniques such as quantum chemistry, molecular mechanics, and molecular docking, among others [11].

Commonly used computational chemistry methods, including those applied in computer-aided drug design (CADD), are molecular mechanics, quantum mechanics, density functional theory, and molecular dynamics simulations. These methods vary in their level of accuracy and computational cost and are chosen based on the specific research question and the available computational resources.

Overall, computational chemistry plays an important role in advancing our understanding of chemical systems and developing new technologies that can improve our lives.

Computer-aided drug design (CADD) is a computational approach that involves the use of computer algorithms and software to assist in the drug discovery process. This approach uses various computational tools to identify potential drug candidates and optimize their properties before they are tested in the laboratory [12].

CADD has become an essential tool in drug discovery, allowing researchers to rapidly screen large numbers of compounds and optimize their properties before investing time and resources in expensive experimental studies.

Virtual screening is a computational technique used to predict the potential activity of small molecules (ligands) against a specific target protein. It involves the use of computer software to analyse large databases of molecules and predict their affinity and activity for a specific target. It can be used in drug discovery to identify potential drug candidates that can bind to the target protein and modulate its activity [13]. It is a powerful tool in drug discovery as it can significantly reduce the time and cost involved in the drug discovery process by identifying potential drug candidates with high affinity and specificity for the target protein.

Molecular Dynamics (MD) simulation is a computational technique used in computational chemistry to study the behaviour of atoms and molecules over time [14]. In an MD simulation, the system of interest is described by a set of equations of motion that define the behavior of each atom or molecule in the system. The equations of motion take into account the interactions between atoms or molecules, which are described by a potential energy function. MD simulations can be used to study a wide range of chemical and biochemical systems, including proteins, DNA, and small molecules. They can provide insights into the dynamics and thermodynamics of these systems, such as the conformational changes that occur in proteins and the binding of ligands to enzymes. The simulation proceeds by solving the equations of motion numerically, typically using a numerical integration method such as the Verlet algorithm or the leapfrog algorithm [15]. The simulation calculates the position, velocity, and acceleration of each atom or molecule at each time step, and the positions of the atoms or molecules are updated based on these calculations.
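The numerical integration step can be sketched for a single particle. The example below uses the velocity form of the Verlet algorithm (a common variant of the Verlet scheme mentioned above) on a one-dimensional harmonic oscillator; the potential, the parameter values and the function names are illustrative assumptions and do not correspond to any force field or MD package.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equation of motion for one particle in 1D."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # update position
        a_new = force(x) / mass              # acceleration at the new position
        v = v + 0.5 * (a + a_new) * dt       # update velocity with averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory


def harmonic_force(x, k=1.0):
    """Force from the potential energy function U(x) = k * x**2 / 2."""
    return -k * x


traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force,
                       mass=1.0, dt=0.01, n_steps=1000)
x_final, v_final = traj[-1]
print("numerical x(t=10):", x_final)
print("analytical x(t=10):", math.cos(10.0))
```

For this test system the analytical solution is x(t) = cos(t), so the numerical and analytical positions can be compared directly; in a real MD simulation the force comes from the full potential energy function of all atoms.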

In the context of protein modelling, MD simulations can be used to study the structural and dynamic properties of proteins, including their folding and unfolding processes, interactions with ligands, and conformational changes [16].

Protein modelling is the process of predicting the three-dimensional structure of a protein from its amino acid sequence. The three-dimensional structure of a protein is essential to understanding its function, interactions, and biochemical properties. There are several methods used to model protein structures, including homology modelling, ab initio modelling, and molecular dynamics simulations.

Homology modelling rests on the assumption that proteins with similar amino acid sequences adopt similar three-dimensional structures [17]. In homology modelling, a known protein structure is used as a template to predict the structure of the target protein. The accuracy of homology modelling depends on the similarity between the amino acid sequences of the target protein and the template protein.
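Sequence similarity between target and template is often summarized as percent identity. The following is a minimal sketch, assuming the two sequences have already been aligned (gaps written as "-") and counting identity over columns where neither sequence has a gap; this is one of several conventions in use, and the function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Hypothetical target/template fragments differing at one position.
print(percent_identity("PISPIETV", "PISPLETV"))
```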

Ab initio modelling, also known as de novo modelling, is a method that predicts the structure of a protein without using a template structure. Ab initio modelling is based on physical principles such as energy minimization and can be computationally expensive. This method is more challenging than homology modelling but can be used for proteins that do not have a close homolog with a known structure [18].

Protein modelling is an essential tool for understanding protein function and structure. It has applications in drug design, protein engineering, and understanding the mechanisms of protein-protein interactions. A protein may have multiple structures available, and structure-based methods may predict a different change in stability if a different structure of the same protein is used. A mutation can change the stability of a protein.

The DUET online server was used for these computations. DUET consolidates two complementary approaches (mCSM and SDM) in a consensus prediction, obtained by combining the results of the separate methods in an optimized predictor using Support Vector Machines (SVM) [16]. The approach improves the overall accuracy of the predictions in comparison with either method individually and performs as well as or better than similar methods. DUET is a bioinformatics web server created for gaining insight into the effects of non-synonymous single nucleotide polymorphisms (nsSNPs) on protein stability. It integrates two complementary methods into a consensus, optimized prediction as a way to leverage the best of SDM, a statistical potential energy function that relies on substitution tables derived from homologous protein families and incorporates constraints on residue environments during evolution, and mCSM, a machine learning algorithm that takes into account the residue 3D physicochemical environment summarized as a graph-based structural signature [19].

Mutations can be classified into three categories (a) “Good” which increases fitness, (b) “Indifferent or Neutral”, as the effects are too small and, (c) “Bad” which decreases fitness [19].

ΔΔG results (in kcal/mol) fall into three categories:

  A. ΔΔG > 0.5: Positive results suggest that a mutation would be destabilizing. These mutations are usually avoided during design and can be classified as “Bad”.

  B. 0.5 > ΔΔG > −0.5: Values near 0 are within the noise range and should be considered indifferent or neutral. These mutations can be included in the design to allow neutral changes that may compensate for other changes in the protein. They can be classified as “Neutral” or “Indifferent”.

  C. ΔΔG < −0.5: Negative results suggest that the mutation would lead to a more stable protein and can be classified as “Good”.
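The three categories can be expressed as a small helper. This is a minimal sketch that follows the sign convention and the ±0.5 kcal/mol threshold stated above; the function name is hypothetical, and assigning values exactly equal to ±0.5 to the neutral class is an assumption, since the text leaves the boundaries open.

```python
def classify_ddg(ddg, threshold=0.5):
    """Classify a ΔΔG value (kcal/mol) as "Bad", "Neutral" or "Good"."""
    if ddg > threshold:
        return "Bad"       # destabilizing
    if ddg < -threshold:
        return "Good"      # stabilizing
    return "Neutral"       # within the noise range (boundaries included by assumption)


for value in (1.2, 0.3, -0.1, -2.4):
    print(value, classify_ddg(value))
```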

Protein modelling of missense mutations involves predicting the structural and functional consequences of amino acid substitutions that alter the protein sequence. Missense mutations are single-nucleotide variations that change a single amino acid residue in a protein sequence, potentially affecting protein stability, interactions, or enzymatic activity.

There are several computational tools and methods available for protein modelling of missense mutations, including homology modelling, molecular dynamics simulations, and machine learning-based approaches. These methods use various algorithms to predict the effect of a missense mutation on protein structure and function, such as changes in protein stability, folding, dynamics, and interactions [20,21,22,23,24].

One common approach is to compare the predicted structure and stability of the wild-type protein with that of the mutated protein. If the mutation destabilizes the protein or alters its structural integrity, it may affect the protein’s function or interactions with other molecules.

Overall, protein modelling of missense mutations can provide valuable insights into the potential effects of genetic variations on protein structure and function, which can help in understanding the molecular basis of genetic diseases and designing therapeutic interventions.

The present study was undertaken to assess the impact of in silico mutations using ΔΔG as a measure of stability.

2 Materials and methods

This is an attempt to study the impact of mutation “on” and “by” specific amino acid residues. Missense mutations were introduced in silico to test their effect on the stability of the newly designed proteins.

In the present study, the HIV-1 reverse transcriptase structure 4G1Q [25], the target of NNRTIs, was downloaded from the Protein Data Bank (http://www.rcsb.org) and used to perform mutations and to assess and compare the relative stability of the designed proteins with the parent protein [26]. The DUET server was used to perform mutations in 4G1Q at twenty neighbouring residues surrounding the active ligand, within 6 Å of the centre of the ligand [19]. Replacing each of the 20 residues with each of the 19 other amino acids gave a dataset of 380 designed proteins. ΔΔG was then estimated for all 380 designed proteins to compare their relative stability with the parent protein, 4G1Q. A snapshot of protein 4g1q is presented in Fig. 1.
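The size of the dataset follows from replacing each selected residue with each of the 19 other standard amino acids. The sketch below enumerates such single point mutations; the function name is hypothetical, and the example sites are only a few of the positions named later in the text, not the full list of twenty.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids


def enumerate_mutations(sites):
    """List all single point mutations for sites given as e.g. "K101"."""
    mutations = []
    for site in sites:
        wild_type, position = site[0], site[1:]
        for residue in AMINO_ACIDS:
            if residue != wild_type:  # skip the wild-type residue itself
                mutations.append(f"{wild_type}{position}{residue}")
    return mutations


example_sites = ["K101", "K102", "K103", "Y181", "Y188"]  # illustrative subset
mutations = enumerate_mutations(example_sites)
print(len(mutations), mutations[:5])
```

Each site contributes 19 mutants, so twenty sites give 20 × 19 = 380 designed proteins.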

Fig. 1 Snapshot of 4g1q.pdb

The FASTA sequence of the protein 4g1q is given below.

>4G1Q_1|Chain A|Reverse transcriptase/ribonuclease H|Human immunodeficiency virus type 1 (11678)

MVPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFAAQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYPGIKVRQLSKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWETWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRETKLGKAGYVTNKGRQKVVPLTNTTNQKTELQAIYLALQDSGLEVNIVTDSQYALGIIQAQPDKSESELVNQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAG.

>4G1Q_2|Chain B|p51 RT|Human immunodeficiency virus type 1 (11678)

PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFKKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYPGIKVRQLSKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWETWWTEYWQATWIPEWEFVNTPPLVKLWYQ.

The major focus of the work was to understand the impact of single point mutations on the stability of the protein, using ΔΔG as a measure of stability. In order to understand the impact of non-synonymous single nucleotide polymorphisms (nsSNPs) on the structure and function of the proteome, as well as to guide protein engineering, accurate in silico methodologies are needed to study and predict their effects on protein stability. The change in folding free energy upon mutation (ΔΔG in kcal/mol) is used as the measure of the impact of a mutation. DUET, a web server for an integrated computational approach to studying missense mutations in proteins, was used for this purpose.

To do so, complementary information regarding the mutation, such as secondary structure (used by SDM) and a pharmacophore vector that accounts for the changes between wild-type and mutant residue (used by mCSM), is also calculated and used by DUET. As described previously, the pharmacophore vector is obtained by comparing the frequency of eight possible atom characteristics between wild-type and mutant residues (positive, negative, hydrophobic, hydrogen donor, hydrogen acceptor, aromatic, sulphur and neutral) [19].

The DUET server is a computational tool for estimating the change in free energy upon mutation. As described above, it combines the predictions of mCSM and SDM into a consensus ΔΔG estimate [19]. An overview of how the change in free energy is determined follows.

2.1 Preparation of input structures

The user typically provides two input structures for DUET analysis: the wild-type (WT) and mutant (MT) structures. These structures can represent a protein-protein complex, protein-ligand complex, or any other molecular system of interest. The WT structure represents the original or reference state, while the MT structure represents the mutant or perturbed state.

2.2 Molecular dynamics (MD) simulations

DUET performs molecular dynamics simulations for both the WT and MT structures. MD simulations involve the numerical integration of Newton’s equations of motion to simulate the behavior of atoms and molecules over time. During the simulations, the system’s potential energy is continuously sampled. DUET uses the CHARMM force field to calculate energy terms, including van der Waals interactions, electrostatic interactions, and solvation energies.

2.3 Energy calculations

At various time points during the MD simulations, DUET calculates the potential energy of the system for both the WT and MT states. The energy terms include intra-molecular energies (energies within the molecules), inter-molecular energies (energies between molecules, e.g., protein-protein or protein-ligand interactions), and solvation energies. Calculation of Binding Free Energy: DUET uses the energy values obtained from the MD simulations to calculate the binding free energy difference (ΔΔG) between the WT and MT states. The binding free energy is typically calculated using the following equation:

$$\Delta\Delta G = \Delta G_{\text{MT}} - \Delta G_{\text{WT}}$$

where ΔG_MT is the free energy of the mutant (MT) state, and ΔG_WT is the free energy of the wild-type (WT) state.

DUET performs statistical analysis on the energy data obtained from multiple MD trajectories to improve accuracy and reliability.

3 Results and discussion on DUET results

The effects of missense mutations introduced into the protein (4g1q.pdb) on the stability of the designed proteins are detailed in this section.

Missense mutations were introduced in silico at a total of 20 amino acid residues (AARs), yielding 380 designed proteins. The stability of the designed proteins was assessed by comparing their ΔΔG values, a metric of how a single point mutation affects protein stability, with the parent protein 4G1Q.

The impact of the mutations on protein stability, based on ΔΔG, is assessed in two ways:

  A. Impact on stability of the designed protein on mutation of a specific surrounding amino acid residue.

  B. Impact on stability of the designed protein by mutation of a specific amino acid residue.

The ΔΔG values of all 380 designed proteins, obtained on mutation of the surrounding amino acid residues, are presented in Table 1. The ΔΔG of 4G1Q is taken as zero for the purpose of comparison.

Table 1 ΔΔG values for the designed proteins* as obtained from DUET server

A bar graph showing the comparative ΔΔG values of all 380 designed proteins is presented in Fig. 2. Values above the x-axis (positive) correspond to proteins that are less stable than the parent 4g1q, while those below the x-axis (negative) correspond to proteins that are more stable than the parent 4g1q.

Fig. 2 Comparative ΔΔG of designed proteins on mutation of 20 AARs

From the ΔΔG values estimated with the DUET server and presented in Table 1, it is observed that, of the 380 designed (mutated) proteins, 41 exhibit positive and 339 exhibit negative ΔΔG values. This corresponds to 339 stable and 41 unstable proteins, indicating a stabilizing effect of mutation in nearly 90% of cases.
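The proportion quoted above follows directly from the two counts. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def stability_summary(n_negative, n_positive):
    """Percentages of designed proteins with negative and positive ΔΔG."""
    total = n_negative + n_positive
    return {
        "total": total,
        "pct_negative": 100.0 * n_negative / total,
        "pct_positive": 100.0 * n_positive / total,
    }


summary = stability_summary(n_negative=339, n_positive=41)
print(summary["total"], round(summary["pct_negative"], 1), round(summary["pct_positive"], 1))
```

With 339 negative and 41 positive values, about 89.2% of the 380 designed proteins have a negative ΔΔG.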

3.1 Effect of mutation on stability of a specific surrounding amino acid residue

Table 2 presents the order of stability of newly designed proteins formed on mutations of a specific AAR. This also gives a detailed insight into the effect of mutation of a specific AAR.

Table 2 Order of stability of designed protein of mutation of a specific SAAR*

The following observations are drawn from Table 2. Single point mutations produced 380 designed proteins, of which 339 are more stable and 41 are less stable than the parent 4G1Q. All the designed proteins obtained by mutating F227, P225, P236, V106, W229, Y183 and Y318 are more stable than the parent 4G1Q, suggesting that mutation at these positions has no destabilizing effect. Mutating P226, Y181, and Y188 produces 54 proteins (out of 57, i.e., 18 each) that are more stable than 4G1Q, suggesting that mutation of these AARs also stabilizes the designed protein, but to a lesser extent. Of the 41 unstable proteins, 21 were obtained when the lysine (K) residues K101, K102 and K103 were mutated. The highest number (08) of unstable designed proteins was obtained when K101 was mutated, while mutation of K102 and K103 yielded 7 and 6 unstable designed proteins, respectively. This suggests that lysine residues may be highly important in determining the stability of the protein. It further suggests that the instability introduced may promote denaturation, i.e., when lysine is mutated the stability of the protein decreases.

3.2 Effect of mutation on stability by a specific amino acid residue

Table 3 presents the impact of mutation by a specific AAR on the stability of the designed proteins.

Table 3 Effect of mutation by specific AAR*

The following conclusions are drawn from Table 3 on the influence of mutation by a particular AAR. Substitution by the AARs G, H, and K affects the stability of 4G1Q the most: all the designed proteins obtained on mutation by these AARs are more stable than the parent 4G1Q. A slightly smaller impact is observed when the mutation is performed by F and P, where only one designed protein less stable than the parent 4G1Q is obtained for each. The lysine (K) AAR produces the highest number (07) of unstable designed proteins. Among the various mutations, the most unstable designed proteins are obtained in 10 cases where K102 is mutated. Surprisingly, both mutation by and mutation of lysine create instability in the designed protein, suggesting that lysine should neither be mutated nor be used as the substituting residue.

The designed proteins have been classified on the basis of mutation of a specific AAR and their stability (ΔΔG) range. Table 4 shows the details of these mutations and stability (ΔΔG range) of designed proteins.

Table 4 Classification of designed proteins

Table 4 provides the following observations. As previously mentioned, 339 stable and 41 unstable designed proteins are obtained. Of the 339 stable designed proteins, 12 highly stable proteins, with ΔΔG values between −4.0 and −3.0, are obtained on mutation of F227, L100, Y188 and Y318. Of these 12, the maximum number (05) is obtained when Y318 is mutated. All 12 are obtained on mutation of hydrophobic AARs. 58 proteins have ΔΔG values between −3.0 and −2.0; within this range the highest number (09 each) is obtained when L100 and Y318 are mutated. 113 proteins have ΔΔG values between −2.0 and −1.0 and can be classified as moderately stable. 87 proteins have ΔΔG values between −1.0 and −0.5 and can be classified as relatively less stable. 99 designed proteins have ΔΔG values between −0.5 and 0.5; their stability cannot be assessed because these values are within the noise range, so they should be considered indifferent or neutral. A total of 11 highly unstable designed proteins, with ΔΔG values greater than 0.5, are obtained on mutation of E138_B, K101, K102, and K103. These unstable designed proteins are obtained when the charged AARs (E and K) are mutated.
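The bin counts quoted from Table 4 can be checked against the totals given earlier. The sketch below is a bookkeeping aid only; it assumes that the 87-protein bin covers −1.0 to −0.5 (the only range left uncovered between its neighbouring bins), and the bin labels are illustrative.

```python
# Counts per ΔΔG range (kcal/mol) as quoted in the text for Table 4.
# Assumption: the 87-protein bin spans -1.0 to -0.5.
bin_counts = {
    "-4.0 to -3.0": 12,
    "-3.0 to -2.0": 58,
    "-2.0 to -1.0": 113,
    "-1.0 to -0.5": 87,
    "-0.5 to 0.5": 99,
    "> 0.5": 11,
}

total = sum(bin_counts.values())

# Bins that lie entirely below zero.
clearly_negative = sum(bin_counts[label] for label in
                       ("-4.0 to -3.0", "-3.0 to -2.0", "-2.0 to -1.0", "-1.0 to -0.5"))

# Split of the neutral bin implied by the 339 negative / 41 positive totals.
negative_in_neutral = 339 - clearly_negative
positive_in_neutral = 41 - bin_counts["> 0.5"]

print(total, negative_in_neutral, positive_in_neutral)
```

The bins sum to 380, and the 99 proteins in the neutral range are consistent with the overall split of 339 negative and 41 positive ΔΔG values.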

Table 5 shows that all of the designed proteins produced by mutation of F227, P225, P236, V106, W229, Y183, and Y318 are more stable than the parent 4G1Q. These results are contrary to the belief that mutation induces instability in a protein and that naturally occurring proteins adopt the most stable form. The lysine residues (K101, K102 and K103) are the most affected AARs and produce the smallest number of stable designed proteins. Because the missense mutations were induced in silico, the results need to be verified experimentally.

Table 5 Shows the number of stable proteins obtained on mutation of a specific AAR

Another way in which the designed proteins have been classified is on the basis of mutation by a specific AAR and their stability range (ΔΔG). Table 6 shows the details of these mutations and stability (ΔΔG range) of designed proteins.

Table 6 Classification of designed proteins

The following observations are derived from Table 6. As previously stated, 339 stable and 41 unstable designed proteins are found. Of the 339 stable designed proteins, 12 highly stable proteins, with ΔΔG values between −4.0 and −3.0, are obtained on mutation by A, D, G, E, S and T. Of these 12, the maximum number (03 each) is obtained on mutation by G and S. No regular pattern linking the impact of mutation to a specific property of the substituting residue is observed. 58 proteins have ΔΔG values between −3.0 and −2.0; within this range the highest number (07 each) is obtained on mutation by N and T. 113 proteins have ΔΔG values between −2.0 and −1.0 and can be classified as moderately stable. 87 proteins have ΔΔG values between −1.0 and −0.5 and can be classified as relatively less stable. 99 designed proteins have ΔΔG values between −0.5 and 0.5; their stability cannot be assessed because these values are within the noise range, so they should be considered indifferent or neutral. A total of 11 highly unstable designed proteins, with ΔΔG values greater than 0.5, are obtained on mutation by E, I, L, M, and P. In this case, mutation by hydrophobic residues has given the most unstable designed proteins.

As can be seen from Table 7, the designed proteins that resulted from mutation by A, C, D, G, H, K, Q, N, S, and T are all more stable than the original 4G1Q protein. Mutation by L and Y gives the highest number (08) of unstable designed proteins, suggesting that I and L follow, better than the other AARs, the natural trend in which mutation causes instability in the protein.

Table 7 Shows the number of stable and unstable proteins obtained on mutation by a specific AAR

The comparative stability analysis reveals the combinations that give the 11 most unstable designed proteins; these are presented in Table 8.

Table 8 Mutated and mutation by AARs yielding most unstable designer proteins

From Table 8 it is observed that the combinations K102-I/L/P/E give the most unstable proteins.

4 Conclusions

The study has given surprising results: a higher number of stable designed proteins was obtained on mutation. As the work considers single point mutations and nothing else, the results are non-traditional. However, the environment at each position should be considered. If interacting molecules are not present in the model, such as at a known zinc-binding site, then a seemingly favourable mutation may not be favourable in reality.

A position that has a lot of negative ΔΔGs could mean that this position evolved a destabilizing residue because it is necessary for its catalytic activity, for binding another molecule, or because of another functionally relevant reason.

Moreover, it must be kept in mind that ΔΔG quantifies a single point mutation. Sometimes sufficient stability can only be attained through several interrelated changes, and ΔΔG can predict only one mutation at a time. To determine whether multiple mutations would have a cumulative effect on stability, the mutations must be introduced and some relax iterations run. Calculating even a nearly exact ΔΔG takes considerably more time.