Computational Analysis of Non-synonymous SNPs in ATM Kinase: Structural Insights, Functional Implications, and Inhibitor Discovery

Panchal, Nagesh Kishan; Samdani, Poorva; Sengupta, Tiasa; Prince, Sabina Evan

doi:10.1007/s12033-024-01120-x

Original Paper
Published: 15 March 2024

Molecular Biotechnology

Abstract

Ataxia telangiectasia-mutated (ATM) protein kinase, a key regulator of cellular integrity, is known for its role in the DNA damage response. This study investigates the broader impact of ATM on cellular processes and the potential clinical manifestations arising from mutations, aiming to expand our understanding of ATM’s diverse functions beyond its conventional roles. The research employs a comprehensive set of computational techniques to analyse ATM mutations. The mutation data are curated from the dbSNP and HuVarBase databases. The identified mutations are assessed for deleterious effects, protein stability, oncogenic potential, and biophysical characteristics. Conservation analysis provides insights into the evolutionary significance of these mutations. Molecular docking and dynamics simulation analyses are carried out for selected mutants, investigating their interactions with the inhibitors AZD0156, AZD1390, and quercetin to gauge potential therapeutic implications. Among the 419 mutations scrutinized, five (V1913C, Y2080D, L2656P, C2770G, and C2930G) are identified as both disease causing and protein destabilizing. The study reveals the oncogenic potential of these mutations, supported by findings from the COSMIC database. Notably, Y2080D is associated with haematopoietic and lymphoid cancers, while C2770G shows a correlation with squamous cell carcinomas. Molecular docking and dynamics simulation analyses highlight strong binding affinities of quercetin for Y2080D and of AZD0156 for C2770G, suggesting potential therapeutic options. In summary, this computational analysis provides a comprehensive understanding of ATM mutations, revealing their potential implications in cellular integrity and cancer development. The study underscores the significance of the Y2080D and C2770G mutations, offering valuable insights for future precision medicine targeting specific ATM mutations.
Despite informative computational analyses, a significant research gap exists, necessitating essential in vitro and in vivo studies to validate the predicted effects of ATM mutations on protein structure and function.

Introduction

Numerous DNA damage events occur in the human body every day as a result of exposure to diverse environments [1]. These conditions affect DNA through simple base alterations, base mismatches, inter-strand crosslinks, intra-strand crosslinks, bulky DNA adducts, DNA–protein crosslinks, single-strand breaks (SSBs), and double-strand breaks (DSBs) [2]. When normal cells are stressed and their DNA is damaged, the damage can be repaired by intact DNA repair pathways until the stress becomes severe enough to cause cell death or senescence [3]. The ataxia telangiectasia mutated (ATM) protein is one of the most important units of the DNA damage response system, acting as an intracellular sensor for DSBs [4]. It is generally found in cells as a dimer and undergoes autophosphorylation in response to DNA damage, resulting in the dissociation of the inactive complex. This activates a signalling cascade involving the phosphorylation of several substrates, which leads to two critical responses to DNA damage: the activation of cell-cycle checkpoints and the initiation of DNA repair. When the DNA repair mechanism fails, apoptosis is triggered [5]. ATM substrates include Mdm2, c-Abl, and p53, which affect the G1 checkpoint; Rad51, Nbs1, FANCD2, and BRCA1, which play a role in the transient IR-induced S-phase arrest; and Chk1, Chk2, and BRCA1, which control the G2 checkpoint [5, 6]. In response to DSBs, ATM phosphorylates approximately 700 targets and modulates networks involved in DNA repair, insulin-like growth factor signalling, the stress response, and other metabolic pathways. The large number of ATM targets during DNA repair or genomic stress is most likely a means of coordinating many pathways. ATM and other members of the PIKK family, such as the catalytic subunit of DNA-dependent protein kinase (DNA-PKc) and ATM- and Rad3-related (ATR), exhibit redundancy and collaborate in response to various forms of genotoxic stress (Fig. 1).

The structure of ATM is characterized by a butterfly-shaped dimer, formed by the combination of the FAT and kinase (KD) domains into a dimeric body referred to as FATKD (Fig. 2). Emerging from this body are the N-terminal α–α solenoids, spanning approximately 1900 residues and identified as the Spiral and Pincer domains. The Spiral domain covers residues 1–1166, followed by the Pincer domain encompassing residues 1167–1898 [7, 8]. Moving along the sequence, the FAT domain, named after FRAP, ATM, and TRRAP, extends from residue 1899 to 2613, while the Kinase domain occupies residues 2614–3056. As in other PIKKs, the Kinase domain comprises an N-terminal lobe (residues 2614–2770) and a C-terminal lobe (residues 2771–2957), with the catalytic cleft situated between them. The C lobe concludes with the FAT C-terminal domain (residues 3027–3056), a distinctive feature of the PIKK family that is absent in canonical kinases [9, 10]. Maintaining structural integrity is crucial, as mutations in key residues of the ATM protein can alter its structure and thereby lead to significant functional changes.
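The domain boundaries above lend themselves to a simple lookup. The following is a minimal sketch, not part of the original workflow, that parses a substitution label such as Y2080D and reports which of the four domains listed above contains the residue. The function and variable names are illustrative assumptions.

```python
import re
from typing import NamedTuple, Optional

# Inclusive residue boundaries as quoted in the text above.
ATM_DOMAINS = [
    ("Spiral", 1, 1166),
    ("Pincer", 1167, 1898),
    ("FAT", 1899, 2613),
    ("Kinase", 2614, 3056),
]


class Substitution(NamedTuple):
    wild_type: str
    position: int
    mutant: str


def parse_substitution(label: str) -> Substitution:
    """Parse a label such as 'Y2080D' into (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label.strip())
    if match is None:
        raise ValueError(f"not a single-residue substitution: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def domain_of(position: int) -> Optional[str]:
    """Return the domain containing the residue, or None if out of range."""
    for name, start, end in ATM_DOMAINS:
        if start <= position <= end:
            return name
    return None


# The five substitutions highlighted in the abstract.
for label in ("V1913C", "Y2080D", "L2656P", "C2770G", "C2930G"):
    sub = parse_substitution(label)
    print(label, domain_of(sub.position))
```

With these boundaries, V1913C and Y2080D fall in the FAT domain, and L2656P, C2770G, and C2930G fall in the Kinase domain.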

Understanding the role of single-nucleotide polymorphisms (SNPs) will help us better explain human genetic and phenotypic variability, particularly in complex illnesses. SNPs in the ATM gene can disturb all of the above-mentioned interactions, which are necessary for the kinase’s normal function, and several studies have connected SNPs in the ATM gene to a range of diseases [11,12,13]. SNPs in the biologically important regions of ATM can alter its normal function. Although ATM is an important kinase linked to DNA repair and a variety of malignancies, only a few computational studies have sought to identify disease-associated mutations and their effects on its structure and function.

Several computational analyses have been carried out in the past to find harmful SNPs in genes linked to human diseases [14,15,16,17]. Accordingly, the goal of this study was to assess the potential impacts of SNPs on distinct structural regions of ATM that might affect its function and perhaps play a role in cancer progression. To accomplish this, we first used several computational algorithms, such as MetaSNP, PMut, and PROVEAN, to evaluate the deleterious or disease-causing potential of SNPs. Additionally, web servers such as I-Mutant 3.0, iSTABLE, mCSM, SDM2, CUPSAT, and MUpro were used to evaluate the effect of SNPs on protein stability. The cancer-promoting potential and residue conservation of the SNPs were then evaluated with FATHMM-cancer and the ConSurf server, respectively. Following that, we modelled the mutant protein structures using the PyMOL mutagenesis plugin. Molecular docking of the wild type and mutants was performed against the ATM inhibitors AZD0156 and AZD1390 and the natural compound quercetin, and the best docked poses were analysed and represented. Lastly, molecular dynamics simulation was performed to validate the docking experiments.

Materials and Methods

The workflow that was followed is depicted in Fig. 3.

Data Collection

The list of ATM kinase mutations (SNPs) was collected from online mutational databases, namely HuVarBase (https://www.iitm.ac.in/bioinfo/huvarbase/) and dbSNP (https://www.ncbi.nlm.nih.gov/snp/). UniProtKB (https://www.uniprot.org/), a protein sequence database, was used to obtain the protein sequence of ATM kinase [UniProt Id: Q13315 (ATM_HUMAN)]. The 3D coordinates of the ATM kinase protein were obtained from the Protein Data Bank (RCSB PDB; http://www.rcsb.org/), PDB Id: 5PN0.

Deleterious Mutation Analysis

We used a number of publicly available tools for this research, which are briefly described below.

MetaSNP is a web-based tool, built on a random forest binary classifier, that helps identify disease-associated missense SNPs. It integrates four widely used methods, SIFT, PhD-SNP, PANTHER, and SNAP, which allows it to detect harmful variants more effectively than any one of them alone. The tool was trained and tested on the SV-2009 dataset using a 20-fold cross-validation procedure (https://snps.biofold.org/meta-snp/index.html) [18].

PROVEAN v1.1.3 (Protein Variation Effect Analyzer) is an online tool that predicts whether an amino acid substitution or indel will affect a protein’s biological function. The scores generated both within and between clusters are averaged to produce the PROVEAN score. The tool’s default threshold is −2.5: a variant scoring below the threshold is predicted to be “deleterious,” while a variant scoring above it is predicted to be “neutral” (http://provean.jcvi.org/about.php) [19].

The online server PMut uses a neural network algorithm to predict the pathological nature of missense mutations. The tool was trained on SwissVar, a manually curated variation database. It functions on two levels: first, it retrieves data from a local database of mutational hotspots, and then it assesses a specific SNP in a particular protein. The prediction score ranges from 0 to 1. Mutations scoring 0 to 0.5 are considered neutral, and mutations scoring 0.5 to 1 are considered disease causing (http://mmb.irbbarcelona.org/PMut/) [20].
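The PROVEAN and PMut cutoffs described above can be expressed as two small classification rules. This is a minimal sketch under stated assumptions: the text does not say how a score exactly on a cutoff is handled, so the treatment of the boundary values here (PROVEAN −2.5 counted as deleterious, PMut 0.5 counted as neutral) is an assumption, and the function names are illustrative.

```python
PROVEAN_CUTOFF = -2.5  # default threshold quoted in the text
PMUT_CUTOFF = 0.5      # boundary between neutral and disease-causing scores


def provean_call(score: float) -> str:
    """Label a PROVEAN score; a score exactly at the cutoff is assumed deleterious."""
    return "deleterious" if score <= PROVEAN_CUTOFF else "neutral"


def pmut_call(score: float) -> str:
    """Label a PMut score in [0, 1]; a score of exactly 0.5 is assumed neutral."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"PMut score must lie in [0, 1], got {score}")
    return "disease" if score > PMUT_CUTOFF else "neutral"


# Hypothetical scores, for illustration only.
print(provean_call(-3.1), provean_call(-1.0))
print(pmut_call(0.82), pmut_call(0.30))
```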

Protein Stability Check

mCSM is a programme that uses a graph-based approach to examine the effects of missense mutations on protein stability. It is trained on atomic distance patterns that describe the environment of each residue. mCSM provides a better understanding of mutations and their relationship to diseases for a large number of proteins. The programme evaluates stability using the predicted change in Gibbs free energy: a mutation with a predicted change greater than zero is considered “stabilising,” and a mutation with a predicted change below zero is considered “destabilising” (http://biosig.unimelb.edu.au/mcsm/) [21].

Site-directed mutator 2, or SDM2, is a computer programme that assesses the change in protein stability brought on by mutations. It evaluates the effects of mutations using environment-specific amino acid substitution tables that account for packing density and residue depth. The tool has been tested on nearly 2690 amino acid substitutions across more than 130 different proteins. If the change in Gibbs free energy is above 0, the mutation is predicted to be stabilising, and if it is below 0, it is predicted to be destabilising (http://marid.bioc.cam.ac.uk/sdm2) [22].

iSTABLE is a web server that predicts whether a mutation makes a protein more or less stable. The server uses support vector machines as integrators. It accepts two types of input: structural and sequence based. A stabilising mutation is indicated by a positive change in Gibbs free energy, while a destabilising mutation is indicated by a negative value (http://predictor.nchu.edu.tw/istable/) [23].

Cologne University Protein Stability Analysis Tool (CUPSAT) is a computer programme that analyses the effects of point mutations on protein stability. It predicts the difference in Gibbs free energy between wild-type/normal and mutant proteins. The findings include information on the mutation’s location, structure, and the specific effects of 19 different amino acid substitutions on protein stability. A positive Gibbs free energy value indicates a stabilising mutation, whereas a negative number indicates a destabilising mutation (http://cupsat.tu-bs.de/) [24].

I-Mutant 3.0 is a machine-learning-based technique that considers altered residues’ spatial surroundings in terms of surrounding residue types and surface accessibility. I-Mutant 3.0 has been trained to perform the following tasks: (I) Predict the direction of protein stability changes as a result of mutations (a classification task); (II) Predict the Gibbs free energy as a result of mutations (a function approximation task) (https://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi) [25].

MUpro predicts the effect of a mutation on protein stability using a suite of machine learning systems. The results are centred on two machine learning methodologies, support vector machines, and neural networks. It calculates the effect of mutation on protein stability using the value of the Gibbs free energy change. It also forecasts the direction of energy change using neural networks and support vector machines. Furthermore, it predicts protein stability without knowing the protein’s tertiary structure (http://mupro.proteomics.ics.uci.edu/) [26].
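The sign convention shared by the stability predictors above can be summarised in a few lines. The sketch below labels a predicted free energy change and checks whether every tool calls a mutation destabilising. It assumes that all six tools report values on the same sign convention (positive stabilising, negative destabilising), which the text states explicitly only for mCSM, SDM2, iSTABLE, and CUPSAT. The numeric values are hypothetical and are not results from the study.

```python
from typing import Dict


def stability_call(ddg: float) -> str:
    """Label a predicted free energy change using the sign convention in the text."""
    if ddg > 0:
        return "stabilising"
    if ddg < 0:
        return "destabilising"
    return "neutral"


def destabilising_in_all(predictions: Dict[str, float]) -> bool:
    """True only if at least one tool is given and every tool predicts a negative value."""
    return bool(predictions) and all(ddg < 0 for ddg in predictions.values())


# Hypothetical values for one mutation, for illustration only.
example = {
    "mCSM": -1.2,
    "SDM2": -0.8,
    "iSTABLE": -0.5,
    "CUPSAT": -2.1,
    "I-Mutant 3.0": -1.0,
    "MUpro": -0.9,
}
print({tool: stability_call(v) for tool, v in example.items()})
print(destabilising_in_all(example))
```

Requiring agreement across all tools is a conservative filter: a single stabilising or neutral call is enough to exclude a mutation.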

Cancer-Causing Potential

FATHMM-cancer is a web-based high-throughput tool for predicting the functional consequences of mutations. It forecasts the cancer-causing potential of specific mutations. Based on the default threshold score of “− 0.75”, this tool generates a prediction. A predicted score less than “− 0.75” indicates that the mutation is “cancer-promoting”, whereas a score greater than “− 0.75” indicates that the mutation is a “passenger” (http://fathmm.biocompute.org.uk/cancer) [27].
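As a small illustration of the threshold rule above, the sketch below applies the −0.75 cutoff to the five FATHMM-cancer scores reported in the Results of this paper. The handling of a score exactly equal to the cutoff is not specified in the text, so labelling it as a passenger here is an assumption.

```python
FATHMM_CANCER_CUTOFF = -0.75  # default threshold quoted in the text

# Scores reported in the Results section of this paper.
reported_scores = {
    "V1913C": -2.87,
    "Y2080D": -1.5,
    "L2656P": -2.4,
    "C2770G": -1.69,
    "C2930G": -2.64,
}


def fathmm_call(score: float) -> str:
    """Scores below the cutoff are cancer-promoting; others are passengers."""
    return "cancer-promoting" if score < FATHMM_CANCER_CUTOFF else "passenger"


for mutation, score in reported_scores.items():
    print(mutation, score, fathmm_call(score))
```

All five scores lie below −0.75, which matches the cancer-promoting predictions reported for these mutations.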

Biophysical Characteristics

The biophysical properties were examined using the Align GVGD server. The mutation list and a multiple sequence alignment were supplied as inputs. The output is organised by class, ranging from class 0 (most likely neutral) to class 65 (most likely deleterious) (http://agvgd.hci.utah.edu/agvgd_input.php).

Conservation Analysis

The conservation of amino acids is critical for understanding protein evolution and function. The ConSurf server is a computational tool that uses multiple sequence alignment to assess amino acid conservation in a protein based on phylogenetic relationships between homologous sequences. It has a scoring scale of “1 to 9”, with 1 indicating little or no conservation, 5 indicating moderate conservation, and 9 indicating high conservation. Furthermore, buried amino acids with a high conservation value are considered structural residues, whereas exposed amino acids with a high conservation score are considered functional residues (https://consurf.tau.ac.il) [28, 29].
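The structural versus functional distinction above can be written as a simple rule. The sketch below is illustrative only: the text does not state which grades count as "high" conservation, so the cutoff of 8 is an assumption, as is the use of a boolean flag for burial.

```python
HIGH_CONSERVATION_MIN = 8  # assumption: grades 8 and 9 are treated as highly conserved


def residue_role(grade: int, buried: bool) -> str:
    """Classify a residue from its ConSurf grade (1 to 9) and burial state."""
    if not 1 <= grade <= 9:
        raise ValueError(f"ConSurf grade must be between 1 and 9, got {grade}")
    if grade >= HIGH_CONSERVATION_MIN:
        # Highly conserved: buried residues are structural, exposed ones functional.
        return "structural" if buried else "functional"
    return "unclassified"


print(residue_role(9, buried=True))
print(residue_role(9, buried=False))
print(residue_role(5, buried=True))
```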

Mutant Protein Modelling and Quality Assessment

Using the mutagenesis plugin embedded in PyMOL (www.pymol.org), the Y2080D and C2770G mutant models were created with the wild-type ATM as the reference model. Subsequently, the SwissPDB viewer was used to remove high-energy configurations by energy minimization with the GROMOS 43B1 force field in both the mutant and wild-type ATM structures. This involved adjusting their coordinate geometries to release internal constraints and reduce the overall potential energy.

Drug-Likeness Property and ADME Check

The ADMETlab 2.0 server (https://admetmesh.scbdd.com/service/evaluation/cal) was used to evaluate the drug-likeness and pharmacokinetic properties of two known ATM kinase inhibitors, AZD0156 and AZD1390, as well as the natural compound quercetin, which has previously been shown to have anticancer properties.

Molecular Docking Analysis

The AutoDock software was used to perform molecular docking studies of AZD0156, AZD1390, and quercetin against the wild type and mutants [30]. The wild-type ATM and mutants were given all of the necessary polar hydrogens and solvation parameters and were assigned Kollman united atom charges. Grid (affinity) maps with 100 (X), 100 (Y), and 100 (Z) grid points and a spacing of 0.375 Å were created for the protein’s active site using the AutoGrid programme. The Lamarckian Genetic Algorithm (LGA) was used to perform the molecular docking, with each experiment comprising ten distinct runs [31]. Finally, the docked complexes with the highest binding affinity were visualised using Discovery Studio Visualizer and PyMOL.
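To give a sense of scale for the grid settings above, the following sketch works out the size of the search box from the number of grid points and the spacing. It assumes the spacing is in angstroms and that the edge length is the number of grid intervals multiplied by the spacing. The function name is illustrative.

```python
def box_edge_angstrom(npts: int, spacing: float) -> float:
    """Edge length of the grid box along one axis, in angstroms."""
    if npts <= 0 or spacing <= 0:
        raise ValueError("npts and spacing must be positive")
    return npts * spacing


NPTS = (100, 100, 100)  # grid dimensions quoted in the text
SPACING = 0.375         # grid spacing, assumed to be in angstroms

edges = tuple(box_edge_angstrom(n, SPACING) for n in NPTS)
volume = edges[0] * edges[1] * edges[2]

print(edges)   # (37.5, 37.5, 37.5)
print(volume)  # 52734.375
```

A cube of 37.5 Å per side covers a binding pocket and its immediate surroundings rather than the whole protein, which is consistent with maps built for the active site.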

Molecular Dynamic (MD) Simulation

MD simulations were conducted for docked complexes involving wild-type ATM, Y2080D, and C2770G as protein targets, along with the ligands AZD0156, AZD1390, and quercetin. GROMACS 2021 and the PRODRG server were employed to generate ligand and complex topologies. The complexes were solvated with simple point charge (SPC) water molecules, and Na⁺ and Cl⁻ ions were added for neutralization. The system underwent initial equilibration in the NVT ensemble (constant particle number, volume, and temperature), followed by equilibration in the NPT ensemble (constant particle number, pressure, and temperature). Subsequently, 10,000 picoseconds (ps) of MD simulation were conducted for the complexes.
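For orientation, the production length above can be converted to nanoseconds and to a number of integration steps. The time step is not reported in the text, so the 2 fs value used below is purely a hypothetical choice for the arithmetic.

```python
def ps_to_ns(time_ps: float) -> float:
    """Convert picoseconds to nanoseconds."""
    return time_ps / 1000.0


def number_of_steps(total_ps: float, dt_ps: float) -> int:
    """Integration steps needed to cover total_ps with a time step of dt_ps."""
    if dt_ps <= 0:
        raise ValueError("time step must be positive")
    return round(total_ps / dt_ps)


PRODUCTION_PS = 10_000.0  # production run length quoted in the text
ASSUMED_DT_PS = 0.002     # hypothetical 2 fs time step; not stated in the text

print(ps_to_ns(PRODUCTION_PS))                        # 10.0
print(number_of_steps(PRODUCTION_PS, ASSUMED_DT_PS))  # 5000000
```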

Post MD Analysis

The analysis of MD simulations results involved the utilization of trajectory files, including computations for Root-Mean-Square Deviation (RMSD), Radius of Gyration (Rg), and Solvent-Accessible Surface Area (SASA). Additionally, Principal Component Analysis (PCA) was performed using various built-in scripts in GROMACS. The graphical representation of all trajectory files was generated using the QtGRACE visualization software.
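To make two of the quantities above concrete, the sketch below computes RMSD and radius of gyration from plain coordinate lists. It only illustrates the definitions and is not the GROMACS implementation: the RMSD assumes the two structures are already superimposed, and the coordinates and masses are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(reference: Sequence[Coord], frame: Sequence[Coord]) -> float:
    """RMSD between two equally sized, already superimposed coordinate sets."""
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate sets must be non-empty and the same size")
    total = sum((a - b) ** 2 for p, q in zip(reference, frame) for a, b in zip(p, q))
    return math.sqrt(total / len(reference))


def radius_of_gyration(coords: Sequence[Coord], masses: Sequence[float]) -> float:
    """Mass-weighted root-mean-square distance of atoms from the centre of mass."""
    if len(coords) != len(masses) or not coords:
        raise ValueError("coords and masses must be non-empty and the same size")
    total_mass = sum(masses)
    centre = tuple(
        sum(m * c[i] for m, c in zip(masses, coords)) / total_mass for i in range(3)
    )
    weighted = sum(
        m * sum((c[i] - centre[i]) ** 2 for i in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted / total_mass)


# Hypothetical two-atom system, for illustration only.
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moved = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(ref, moved))
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0]))
```

A stable RMSD over time suggests the complex has settled into an equilibrium conformation, while Rg tracks how compact the protein remains.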

MM-PBSA Assessment

The g_mmpbsa package was employed in conjunction with GROMACS 2021 to perform the molecular mechanics Poisson–Boltzmann surface area (MM-PBSA) assessment and analyse the binding free energy of the wild-type ATM, Y2080D, and C2770G proteins in complex with the ligands (AZD0156, AZD1390, and quercetin). The binding energy was computed from the final 1000 ps of the 10,000 ps MD production run. The estimate of binding affinity considered both bonded and non-bonded interactions, treating the vacuum (molecular mechanics) and solvation contributions separately. The polar and non-polar solvation energies were calculated using the Poisson–Boltzmann equation and the solvent-accessible surface area (SASA), respectively. The binding free energy (ΔG binding) was determined using the following equation:

$$\Delta G_{\text{binding}} = \Delta G_{\text{complex}} - \left( \Delta G_{\text{protein}} + \Delta G_{\text{ligand}} \right)$$
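The equation above, together with the choice of the final 1000 ps as the averaging window, can be sketched as follows. This is not the g_mmpbsa implementation. The per-frame energies are hypothetical numbers used only to show the arithmetic, and the frame layout is an assumption.

```python
from statistics import mean
from typing import Sequence, Tuple

# (time in ps, G_complex, G_protein, G_ligand); layout assumed for illustration.
Frame = Tuple[float, float, float, float]


def binding_free_energy(g_complex: float, g_protein: float, g_ligand: float) -> float:
    """Binding free energy: G_complex minus the sum of G_protein and G_ligand."""
    return g_complex - (g_protein + g_ligand)


def mean_binding_energy(
    frames: Sequence[Frame],
    window_start_ps: float = 9000.0,
    window_end_ps: float = 10000.0,
) -> float:
    """Average the binding free energy over frames inside the time window."""
    values = [
        binding_free_energy(c, p, l)
        for t, c, p, l in frames
        if window_start_ps <= t <= window_end_ps
    ]
    if not values:
        raise ValueError("no frames fall inside the requested window")
    return mean(values)


# Hypothetical energies in kJ/mol, for illustration only.
frames = [
    (8000.0, -1490.0, -1200.0, -200.0),   # outside the window, ignored
    (9500.0, -1500.0, -1200.0, -200.0),   # contributes -100.0
    (10000.0, -1510.0, -1200.0, -200.0),  # contributes -110.0
]
print(mean_binding_energy(frames))  # -105.0
```

A more negative value indicates more favourable binding under this convention.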

Results

Distribution of ATM SNPs

For our analysis, we used a list of 419 ATM kinase SNPs, collected from public databases, that are located in different coding regions of the protein.

Analysis of Pathogenicity

The impact of missense SNPs on the amino acids they alter can be used to estimate their pathogenicity. This investigation therefore focused on ATM kinase missense mutations and their pathogenic or deleterious effects. A total of 419 SNPs were analysed with PROVEAN, PMut, MetaSNP, and the MetaSNP component predictors PANTHER, PhD-SNP, SIFT, and SNAP, which predicted 167, 89, 250, 311, 240, 283, and 269 SNPs to be deleterious, respectively (Fig. 4). A detailed dataset of the predicted results is presented in Supplementary Table 1. Overall, from this large pool of mutations, the investigation identified 54 SNPs as deleterious or pathogenic, residing in different domains of ATM. The details of these 54 deleterious SNPs, including scores and server predictions, are given in Table 1. These 54 SNPs were then analysed further for their effect on protein stability.
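One way to arrive at a shortlist from several predictors is to keep only the variants that every tool flags. The sketch below illustrates that idea with set intersection. It assumes the 54 SNPs were selected by unanimous agreement, which the text does not state explicitly, and the variant labels are placeholders rather than real ATM mutations.

```python
from typing import Dict, Set


def consensus_deleterious(calls: Dict[str, Set[str]]) -> Set[str]:
    """Variants flagged as deleterious by every predictor in the mapping."""
    if not calls:
        return set()
    return set.intersection(*calls.values())


# Placeholder variant labels, for illustration only.
calls = {
    "PROVEAN": {"var1", "var2", "var3"},
    "PMut": {"var2", "var3"},
    "MetaSNP": {"var1", "var2", "var3", "var4"},
}
print(sorted(consensus_deleterious(calls)))  # ['var2', 'var3']
```

Under this rule the consensus can never exceed the smallest individual count, which is consistent with 54 being lower than the 89 SNPs flagged by PMut.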

Table 1 Pathogenicity analysis

Analysis of Protein Stability

The impact of the 54 most deleterious mutations on protein stability was predicted using MUpro, iSTABLE, I-Mutant 3.0, mCSM, SDM2, and CUPSAT. Of the 54 mutations, 5 (V1913C, Y2080D, L2656P, C2770G, and C2930G) were predicted to be destabilising by all 6 web-based algorithms; the scores and predictions are listed in Table 2 and depicted graphically in Fig. 5.

Table 2 Protein stability analysis

Analysis of Oncogenic Nature of ATM Mutants

FATHMM-cancer was used to check the cancer-causing potential of the V1913C, Y2080D, L2656P, C2770G, and C2930G mutations. The scores obtained for V1913C, Y2080D, L2656P, C2770G, and C2930G were −2.87, −1.5, −2.4, −1.69, and −2.64, respectively, and all five mutations were predicted to have cancer-promoting potential (Table 3). Overall, these predictions indicated that the mutations have a role in cancer, and they were subjected to further analysis.

Table 3 Cancer-promoting analysis

Analysis of Biophysical Characteristics

The V1913C, Y2080D, L2656P, C2770G, and C2930G mutations were submitted to the Align GVGD server to assess their biophysical characteristics. The results showed that all of the mutations belong to class 65 (most likely deleterious) (Table 4).

Table 4 Biophysical characteristics

Prevalence of Mutations in Cancer

Following the FATHMM-cancer and Align GVGD predictions, we searched the COSMIC database for the occurrence of these mutations in cancer and found that Y2080D was reported in haematopoietic and lymphoid cancers and C2770G was reported in squamous cell carcinomas. Therefore, these two mutations were taken forward for further analysis.