In silico screening and analysis of nonsynonymous SNPs in human CYP1A2 to assess possible associations with pathogenicity and cancer susceptibility

Navapour, Leila; Mogharrab, Navid

doi:10.1038/s41598-021-83696-x


Article
Open access
Published: 02 March 2021

Volume 11, article number 4977, (2021)

Scientific Reports


Abstract

Cytochrome P450 1A2 (CYP1A2) is one of the main hepatic CYPs involved in the metabolism of carcinogens and clinically used drugs. Nonsynonymous single nucleotide polymorphisms (nsSNPs) of this enzyme could affect cancer susceptibility and drug efficacy. Hence, identification of pathogenic nsSNPs of human CYP1A2 could be of great importance in personalized medicine and pharmacogenetics. Here, 176 nsSNPs of human CYP1A2 were evaluated using a variety of computational tools, and 18 nsSNPs were predicted to be associated with pathogenicity. Further analysis suggested a possible association of 9 nsSNPs (G73R, G73W, R108Q, R108W, E168K, E346K, R431W, F432S and R456H) with the risk of hepatocellular carcinoma. Molecular dynamics simulations revealed higher overall flexibility, fewer intramolecular hydrogen bonds and a lower content of regular secondary structure for both cancer driver variants, G73W and F432S, compared with the wild-type structure. In the case of F432S, loss of the conserved hydrogen bond between Arg137 and a heme propionate oxygen may affect heme stability, and the marked increase in fluctuation of the CD loop could modify CYP1A2 interactions with its redox partners. Together, these findings propose CYP1A2 as a possible candidate gene for hepatocellular carcinoma susceptibility and provide structural insights into how cancer driver nsSNPs could affect protein structure, heme stability and interaction networks.


Introduction

The genomes of any two individuals, except identical twins, are 99.9% identical and differ by only 0.1%. Although this value seems very low, it corresponds to about 3 million differences among 3.2 billion base pairs¹. Single nucleotide polymorphisms (SNPs) are the most abundant genetic variations in the human genome and play a significant role in phenotypic diversity and in interindividual differences in susceptibility to complex diseases and in drug response^1,2. However, only a small fraction of SNPs is associated with pathogenicity, and these must be distinguished from a large pool of neutral variants. Although experimental techniques provide the most accurate and reliable approaches for assessing the consequences of a substitution, analyzing all SNPs in the human genome, or even in a single gene, is a major challenge because the experimental procedures are complex, time-consuming and costly³. Therefore, in silico approaches have attracted considerable interest from biologists, as they make it possible to screen a large number of SNPs in a relatively short time and at low cost, and to prioritize them for further experimental and clinical tests. Moreover, structure–function studies using molecular dynamics (MD) simulations could elucidate the molecular mechanisms of disease and may provide valuable insights into diagnosis and treatment^4,5,6,7.

Enzymes of the human cytochrome P450 (CYP) superfamily are the most important phase I xenobiotic-metabolizing enzymes and are among the most highly polymorphic proteins^8,9. SNPs in these enzymes play an important role in interindividual differences in response to drugs and other xenobiotics, as well as in susceptibility to various diseases¹⁰. Among the 18 cytochrome P450 families encoded by the human genome, members of the CYP1 family are particularly important because of their major contribution to the metabolism of carcinogenic compounds such as polycyclic aromatic hydrocarbons (PAHs)^11,12,13. This family has three members, CYP1A1, CYP1A2 and CYP1B1, grouped into the A and B subfamilies^11,13.

The human CYP1A2 gene is located on the long arm of chromosome 15 (15q24.1) and spans seven exons. The CYP1A2 protein is expressed exclusively in the liver and plays an important role in the metabolism of heterocyclic and aromatic amines, caffeine and melatonin^14,15,16,17,18. This enzyme is also responsible for the hepatic metabolism of many clinically used drugs, such as tacrine, zolpidem, clozapine and theophylline^19,20,21,22,23, whereas the other two members of this family do not play a significant role in drug metabolism.

The human CYP1A2 gene encodes a heme-binding protein of 516 residues. The three-dimensional (3D) structure of residues 34–513 has been determined in complex with the inhibitor α-naphthoflavone, although the N-terminal transmembrane helical domain is absent from this crystal structure¹⁴. According to this crystal structure (PDB ID: 2HI4¹⁴), CYP1A2 contains fifteen α-helices and five β-sheets¹⁴. The iron atom of the heme prosthetic group is coordinated by Cys458, which belongs to the consensus signature of cytochrome P450 proteins (PROSITE signature PS00086)²⁴. In addition, arginine 137 (R137) of the C helix is hydrogen-bonded to a heme propionate oxygen, further stabilizing the position of the heme.

CYPalleles is a website developed to standardize the nomenclature of human cytochrome P450 alleles (http://www.cypalleles.ki.se/⁹). It also provides genetic information and describes the molecular effects of variants on enzyme activity. More than 20 alleles of the CYP1A2 gene have been reported in CYPalleles, of which CYP1A2*6 (R431W), CYP1A2*8 (R456H), CYP1A2*11 (F186L), CYP1A2*15 (P42R) and CYP1A2*16 (R377Q) are the most studied^25,26,27,28,29. Nevertheless, the structural or functional consequences of the vast majority of CYP1A2 nsSNPs recorded in the NCBI dbSNP database have not yet been determined. Since CYP1A2 is one of the main hepatic CYPs involved in the bioactivation of carcinogens and the metabolism of clinically used drugs, SNPs of this enzyme could affect cancer susceptibility or drug efficacy. Therefore, identifying and evaluating pathogenic nonsynonymous SNPs (nsSNPs) of CYP1A2 is of major importance. This is also helpful in personalized medicine and in optimizing drug treatment for maximal efficacy and minimal side effects. In this study, nsSNPs of the CYP1A2 gene were evaluated with computational tools to identify pathogenic nsSNPs, and MD simulations were performed to assess how these nsSNPs affect protein structure.

Methods

Data collection

The human CYP1A2 protein sequence was obtained from the UniProt database³⁰ (UniProt ID: P05177). SNP data for the CYP1A2 gene were retrieved from NCBI dbSNP³¹ build 150. All nucleotide positions refer to GRCh37.p13 (hg19), annotation release 105. The three-dimensional structure of CYP1A2 (PDB ID: 2HI4¹⁴) was downloaded from the Protein Data Bank (https://www.rcsb.org³²).

In silico evaluation of nsSNPs

In silico evaluation of CYP1A2 nsSNPs was performed stepwise with a variety of computational tools, the output of each step serving as the input for the next. SIFT³³, PROVEAN³⁴, MutationAssessor³⁵, EFIN³⁶, LRT³⁷, FATHMM-MKL³⁸, PhD-SNP³⁹ and CADD⁴⁰ are sequence-based predictors that can be applied directly to amino acid or nucleotide sequences. PolyPhen2⁴¹, SNAP2⁴², SuSPect⁴³, PMUT⁴⁴ and MutPred2⁴⁵ are sequence- and structure-based tools that combine the user-provided sequence with structural features they extract themselves to predict whether SNPs have functional effects.

We categorized the tools into three groups (Table 1). SIFT³³, PROVEAN³⁴, MutationAssessor³⁵, EFIN³⁶, LRT³⁷, FATHMM-MKL³⁸, CADD⁴⁰, PolyPhen2⁴¹ and SNAP2⁴² were used to predict the impact of the nsSNPs on protein function. PhD-SNP³⁹, SuSPect⁴³, PMUT⁴⁴, MutPred2⁴⁵ and VEST-4⁴⁶ were used to assess the likelihood that a variant is pathogenic. CHASM-3.1⁴⁷ was used to identify possible cancer driver variants. All prediction scores were obtained directly from the tools' own web servers, except for VEST-4 and CHASM-3.1, which were obtained from the CRAVAT⁴⁸ server. In addition to the score, VEST-4 and CHASM-3.1 assign a p-value to each variant and an approximate false discovery rate (FDR) to each p-value. The p-value denotes the probability that a benign (or passenger) variant is misclassified as pathogenic (or driver).
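CRAVAT reports an approximate FDR alongside each VEST-4 and CHASM-3.1 p-value; its exact procedure is not described here. A common way to turn a set of p-values into FDR estimates is the Benjamini-Hochberg step-up method, sketched below to illustrate the idea rather than as CRAVAT's implementation. The variant names and p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR estimates) in input order."""
    m = len(p_values)
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    # and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four variants (not values from this study).
example = {"var_A": 0.01, "var_B": 0.04, "var_C": 0.03, "var_D": 0.20}
fdr = dict(zip(example, benjamini_hochberg(list(example.values()))))
for name, q in fdr.items():
    print(f"{name}: p = {example[name]:.2f}, FDR = {q:.3f}")
```

Ties and very small variant sets are handled the same way; only the rank ordering of the p-values matters.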

Table 1 Classification of the methods used for in silico evaluation of CYP1A2 gene nsSNPs.


Evolutionary conservation analysis

The evolutionary conservation of amino acid positions was calculated with the ConSurf^49,50 web server, which assigns each position a score between 1 (most variable) and 9 (most conserved). Homologues were searched against UniRef90 using CSI-BLAST (Context-Specific Iterated BLAST) with three iterations and an E-value cutoff of 0.0001.
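ConSurf derives its 1 to 9 grades from evolutionary rates estimated over a phylogenetic tree, which is beyond a short example. As a much simpler stand-in that captures the same intuition (positions that tolerate few residue types are conserved), the sketch below scores each column of a multiple sequence alignment by Shannon entropy and bins it linearly onto a 1 to 9 scale. The binning and the toy alignment are assumptions for illustration, not ConSurf's method.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of one alignment column, ignoring gaps; None if all gaps."""
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return None
    n = len(residues)
    return -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())


def conservation_grades(alignment, n_grades=9):
    """Map entropy linearly onto 1 (variable) .. n_grades (conserved).

    Illustrative binning only; ConSurf computes its grades differently.
    """
    max_entropy = math.log2(20)  # 20 equiprobable amino acids
    grades = []
    for column in zip(*alignment):
        h = column_entropy(column)
        if h is None:
            grades.append(None)  # gap-only column: no grade
            continue
        conserved_fraction = max(0.0, 1.0 - h / max_entropy)
        grades.append(1 + min(n_grades - 1, int(conserved_fraction * n_grades)))
    return grades


# Toy alignment (hypothetical sequences, one per row).
toy = ["MKLV", "MKIV", "MRLA", "MKLV"]
print(conservation_grades(toy))
```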

Prediction of transmembrane helix

Transmembrane helices were predicted with the TMHMM 2.0 (Transmembrane Hidden Markov Model)⁵¹ web server. TMHMM incorporates hydrophobicity, charge bias, helix length and grammatical constraints into the prediction of the different regions of a transmembrane protein.
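TMHMM is a trained hidden Markov model and is not reimplemented here. For intuition about why a hydrophobic N-terminal stretch is flagged as membrane-spanning, the sketch below uses a much simpler technique: a Kyte-Doolittle hydropathy sliding window (19-residue window, mean hydropathy above 1.6, the values Kyte and Doolittle proposed for membrane-spanning segments). The toy sequence is hypothetical and is not the CYP1A2 sequence.

```python
# Kyte-Doolittle hydropathy index per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """(1-based window centre, mean hydropathy) for every full window."""
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        profile.append((start + half + 1, mean))
    return profile


def hydrophobic_centres(seq, window=19, threshold=1.6):
    """Window centres whose mean hydropathy exceeds the threshold."""
    return [pos for pos, mean in hydropathy_profile(seq, window) if mean > threshold]


# Hypothetical sequence: charged N-terminus, 22-residue Leu stretch, charged tail.
toy_seq = "MSKKDE" + "L" * 22 + "KKDDEERRSSGG"
centres = hydrophobic_centres(toy_seq)
print(min(centres), max(centres))
```

Unlike TMHMM, this window method has no notion of helix length limits or loop topology, so it only flags candidate hydrophobic segments.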

Molecular dynamics simulation

All MD simulations were performed with GROMACS version 5.0.5⁵² using the CHARMM36 force field⁵³. The crystal structure of CYP1A2 (PDB ID: 2HI4¹⁴), with the atomic coordinates of the ligand (α-naphthoflavone) removed, was used as the starting structure for the wild-type (WT) protein. Initial structures of the variant proteins were generated from the WT structure using the mutate tool of Swiss-Pdb Viewer v4.1.0⁵⁴. Each protein was immersed in a cubic box of TIP3P water molecules, and water molecules were replaced by counter-ions to neutralize the system. Each neutralized system was then energy-minimized by steepest descent until the maximum force fell below 500 kJ mol⁻¹ nm⁻¹. Two position-restrained MD simulations were then carried out to equilibrate the solvent and ions around the protein. Temperature and pressure were maintained at 300 K and 1 bar by the V-rescale thermostat⁵⁵ and the Parrinello-Rahman barostat⁵⁶, respectively. After equilibration, each system was subjected to 200 ns of unrestrained MD simulation under the same conditions as the two position-restrained simulations. Bonds involving hydrogen atoms were constrained with the LINCS algorithm⁵⁷, and long-range electrostatic interactions were treated with the particle mesh Ewald method⁵⁸. The cut-off distance for Lennard-Jones and short-range (real-space) electrostatic interactions was set to 12 Å. A time step of 2 fs was used to integrate Newton's equations of motion.
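The production-run settings above can be collected into a GROMACS parameter (.mdp) file. The sketch below builds one from the stated values and makes the unit conversions explicit (200 ns at a 2 fs time step gives 10⁸ steps; 12 Å is 1.2 nm). Entries marked as assumed (cut-off scheme, coupling groups, coupling time constants, compressibility) are not given in the text, and CHARMM36-specific Lennard-Jones switching options are deliberately left out.

```python
def production_mdp(sim_time_ns=200.0, dt_fs=2.0, temp_k=300.0,
                   pressure_bar=1.0, cutoff_angstrom=12.0):
    """Build GROMACS .mdp settings for the production run described above.

    Values commented as 'assumed' are not stated in the text.
    """
    dt_ps = dt_fs / 1000.0                        # GROMACS time unit is ps
    nsteps = round(sim_time_ns * 1000.0 / dt_ps)  # 200 ns / 2 fs
    cutoff_nm = cutoff_angstrom / 10.0            # GROMACS length unit is nm
    params = {
        "integrator": "md",
        "dt": dt_ps,
        "nsteps": nsteps,
        "constraints": "h-bonds",                 # bonds to hydrogen atoms
        "constraint_algorithm": "lincs",
        "cutoff-scheme": "Verlet",                # assumed
        "coulombtype": "PME",
        "rcoulomb": cutoff_nm,
        "rvdw": cutoff_nm,
        "tcoupl": "V-rescale",
        "tc-grps": "Protein Non-Protein",         # assumed
        "tau_t": "0.1 0.1",                       # assumed
        "ref_t": f"{temp_k:g} {temp_k:g}",
        "pcoupl": "Parrinello-Rahman",
        "tau_p": 2.0,                             # assumed
        "ref_p": pressure_bar,
        "compressibility": 4.5e-5,                # assumed (water, bar^-1)
    }
    text = "\n".join(f"{key:<22} = {value}" for key, value in params.items())
    return params, text


params, mdp_text = production_mdp()
print(mdp_text)
```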

Trajectory analysis and visualization

Most of the trajectory analyses reported in this study were performed with built-in utilities of GROMACS version 5.0.5⁵². The root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg) and intramolecular hydrogen bonds were analyzed using gmx rms, gmx rmsf, gmx gyrate and gmx hbond, respectively. The secondary structure content of the proteins was calculated as a function of time using the DSSP program⁵⁹. Principal component analysis (PCA) was conducted using gmx covar and gmx anaeig. For free energy landscape (FEL) analysis, the all-atom RMSD with respect to the average structure and the radius of gyration were first computed for the analyzed frames and then passed to the gmx sham module to calculate the Gibbs free energy and construct the FEL. The conformation with the minimum free energy was extracted as the representative structure for visualization. Three-dimensional structures were visualized using Chimera 1.11⁶⁰, and the CaPTURE program⁶¹ was used to identify cation-π interactions in snapshots extracted from the MD trajectories.
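The free energy landscape step can be illustrated with the relation commonly used for this kind of analysis, G(x, y) = -k_B T ln[P(x, y)/P_max], where P is the joint probability of the two coordinates (here RMSD and Rg) estimated from a 2D histogram. The sketch below is a minimal NumPy version under that assumption, not a reproduction of gmx sham; the bin count and the synthetic input series are arbitrary choices.

```python
import numpy as np

KB = 0.0083144626  # Boltzmann constant, kJ mol^-1 K^-1


def free_energy_landscape(x, y, temperature=300.0, bins=30):
    """G(x, y) = -kB T ln(P / P_max) on a 2D histogram; empty bins become +inf."""
    hist, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        g = -KB * temperature * np.log(prob / prob.max())
    return g, x_edges, y_edges


def representative_frame(x, y, g, x_edges, y_edges):
    """Index of the first frame that falls in the minimum-free-energy bin."""
    ix, iy = np.unravel_index(np.argmin(g), g.shape)
    # Bin index of every frame, matching histogram2d's half-open bins.
    xi = np.clip(np.searchsorted(x_edges, x, side="right") - 1, 0, len(x_edges) - 2)
    yi = np.clip(np.searchsorted(y_edges, y, side="right") - 1, 0, len(y_edges) - 2)
    return int(np.flatnonzero((xi == ix) & (yi == iy))[0])


# Synthetic RMSD (nm) and Rg (nm) series standing in for real analysis output.
rng = np.random.default_rng(0)
rmsd = rng.normal(0.20, 0.02, 5000)
rg = rng.normal(2.20, 0.01, 5000)
g, xe, ye = free_energy_landscape(rmsd, rg)
frame = representative_frame(rmsd, rg, g, xe, ye)
print(frame, g[np.isfinite(g)].max())
```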

Results

The SNP dataset

The nsSNPs of the human CYP1A2 gene were retrieved from NCBI dbSNP³¹ build 150. nsSNPs that met at least one of the following validation criteria were included in the evaluation: (1) sequenced in the 1000 Genomes Project (1000G), (2) validated by multiple independent submissions to the refSNP cluster, (3) validated by frequency or genotype data, (4) genotyped by the HapMap project, (5) validated by submitter confirmation, or (6) both alleles observed in at least two chromosomes apiece. nsSNPs with no information on validation method (i.e., meeting none of these criteria) were excluded. The excluded nsSNPs included four known CYP1A2 alleles listed in CYPalleles: P42R (CYP1A2*15), S212C (CYP1A2*12), R377Q (CYP1A2*16) and N397H (CYP1A2*18). Because these alleles have been studied frequently, we made an exception and included them in our analyses. In total, 176 nsSNPs were prepared for analysis (Supplementary Table S1). More than half of the nsSNPs occurred in exon 2 (n = 94), and the others mapped to exons 3 (n = 10), 4 (n = 8), 5 (n = 17), 6 (n = 14) and 7 (n = 33). The G to A transition was the most frequent nucleotide substitution among the analyzed variants (29.5%), followed by C to T (23.9%), A to G (6.8%) and C to A (6.8%). At the protein level, the most common reference and substituted (missense) amino acids were Arg (n = 45) and Leu (n = 17), respectively. The most frequent amino acid substitutions were Arg to Trp (n = 12, 6.8%), Arg to Gln (n = 12, 6.8%), Arg to Cys (n = 7, 4.0%), Arg to His (n = 7, 4.0%) and Asp to Asn (n = 7, 4.0%) (Fig. 1).
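The counts and percentages above can be cross-checked with a few lines of code. The sketch below parses one-letter substitution labels, tallies reference-to-substituted pairs as percentages of the 176 analyzed variants, and checks that the per-exon counts sum to 176. The label format is an assumption (nonsense labels ending in Ter are rejected), and only a handful of labels from the text are used, not the full Supplementary Table S1.

```python
import re
from collections import Counter

# One-letter label: reference residue, position, substituted residue (e.g. R108W).
MISSENSE = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def parse_missense(label):
    """Split a label such as 'R108W' into ('R', 108, 'W')."""
    match = MISSENSE.match(label)
    if match is None:
        raise ValueError(f"not a one-letter missense label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def substitution_table(labels, total):
    """Count ref>alt pairs and express each as a percentage of `total` variants."""
    counts = Counter((ref, alt) for ref, _, alt in map(parse_missense, labels))
    return {f"{ref}>{alt}": (n, round(100 * n / total, 1))
            for (ref, alt), n in counts.most_common()}


# Exon counts as reported in the text; they should add up to the dataset size.
exon_counts = {2: 94, 3: 10, 4: 8, 5: 17, 6: 14, 7: 33}
assert sum(exon_counts.values()) == 176

# A few labels from this study, tallied against the full dataset of 176 variants.
print(substitution_table(["R108W", "R431W", "R108Q", "G73W"], total=176))
```

The same arithmetic reproduces the reported percentages, e.g. 12 of 176 variants is 6.8% and 7 of 176 is 4.0%.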

In silico evaluation of nsSNPs

As shown in Fig. 2, the 176 nsSNPs of human CYP1A2 were evaluated in a multi-step framework in which a variant had to be predicted as damaging by every tool in a step to proceed to the next step. First, all nsSNPs were evaluated with SIFT, PROVEAN, MutationAssessor, LRT, FATHMM-MKL, EFIN, CADD, PolyPhen2 and SNAP2 to identify functional nsSNPs; 38 nsSNPs were predicted to have functional effects by all of these methods (Supplementary Table S2). These nsSNPs were then subjected to pathogenicity evaluation with SuSPect, MutPred2, PMUT, PhD-SNP and VEST-4. Of the 38 examined nsSNPs, 18 (G52R, L65P, G73R, G73W, L98Q, R108Q, R108W, R136C, E168K, F205V, T324R, E346K, R355W, R377Q, H388Y, R431W, F432S and R456H) were classified as pathogenic by all five methods (Table 2). The evolutionary conservation of the affected amino acid positions was then calculated with the ConSurf^49,50 web server, whose scores range from 1 (highly variable) to 9 (highly conserved). The 18 pathogenic nsSNPs map to 16 distinct positions: one position (136) has a score of 8 and fifteen positions (52, 65, 73, 98, 108, 168, 205, 324, 346, 355, 377, 388, 431, 432 and 456) have a score of 9 (Fig. 3), indicating that almost all of the pathogenic nsSNPs affect evolutionarily conserved positions in CYP1A2.
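The unanimous-vote rule of this multi-step framework can be expressed compactly. In the sketch below, each tool is modelled as a hypothetical function that returns True when it calls a variant damaging; a variant survives a step only if every tool in that step agrees, and the survivors of one step are passed to the next. The tool names and their calls are made up for illustration; real scores and thresholds come from each web server.

```python
from typing import Callable, Dict, Iterable, List

# A predictor maps a variant label to True when it calls the variant damaging.
Predictor = Callable[[str], bool]


def unanimous(variants: Iterable[str], tools: Dict[str, Predictor]) -> List[str]:
    """Keep only variants that every tool in this step calls damaging."""
    return [v for v in variants if all(call(v) for call in tools.values())]


def stepwise_filter(variants: Iterable[str], steps: List[Dict[str, Predictor]]) -> List[str]:
    """Survivors of one step become the input of the next."""
    survivors = list(variants)
    for tools in steps:
        survivors = unanimous(survivors, tools)
    return survivors


def flags(*labels: str) -> Predictor:
    """Hypothetical tool that flags exactly the given variants."""
    flagged = set(labels)
    return lambda v: v in flagged


functional_step = {"tool_1": flags("G73W", "F432S", "P42R", "S10L"),
                   "tool_2": flags("G73W", "F432S", "P42R")}
pathogenic_step = {"tool_3": flags("G73W", "F432S"),
                   "tool_4": flags("G73W", "F432S", "P42R")}
print(stepwise_filter(["S10L", "P42R", "G73W", "F432S"],
                      [functional_step, pathogenic_step]))
```

Requiring unanimity favours specificity over sensitivity: one dissenting tool is enough to drop a variant.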

Table 2 Pathogenicity evaluation of functional CYP1A2 nsSNPs.


The filtered pathogenic variants were further analyzed with CHASM-3.1⁴⁷ to assess possible associations with cancer susceptibility. CHASM-3.1 comprises cancer-type-specific classifiers that predict driver variants for a particular cancer type. Since CYP1A2 is a hepatic enzyme, we selected the liver-viral (hepatocellular carcinoma) classifier to compute cancer driver scores. The results in Table 3 suggested a possible association with hepatocellular carcinoma for the G73R, G73W, R108Q, R108W, E168K, E346K, R431W, F432S and R456H variants (p-value < 0.05).

Table 3 Assessing the cancer susceptibility of pathogenic CYP1A2 nsSNPs using CHASM-3.1.


Evaluation of nsSNPs occurring in transmembrane helix

CYP1A2 is a membrane-bound protein anchored to the endoplasmic reticulum membrane by an N-terminal transmembrane helix. However, no complete structure of CYP1A2 including this region has yet been determined. The CYP1A2 sequence was therefore submitted to the TMHMM v2.0 server to predict the transmembrane helix, which the server estimated to span residues 7 to 28. Nine substitutions (S10L, L15F, S18C, S18Y, A19P, F21L, F25C, F25S and V27M) occur in this transmembrane helical region, none of which was found to be associated with pathogenicity. Moreover, evolutionary conservation analysis of the nsSNPs located in this transmembrane helix did not identify any conserved amino acid position other than Ser10 (Fig. 3).
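Selecting the substitutions that fall inside the predicted transmembrane helix (residues 7 to 28) amounts to a simple position filter. The sketch below assumes one-letter labels of the form reference residue, position, substituted residue; the extra labels outside the helix are included only to show that they are excluded.

```python
def variants_in_region(labels, start, end):
    """Return one-letter missense labels (e.g. 'S10L') whose position lies in [start, end]."""
    return [label for label in labels if start <= int(label[1:-1]) <= end]


tm_start, tm_end = 7, 28  # TMHMM prediction reported in the text
candidates = ["S10L", "L15F", "S18C", "S18Y", "A19P", "F21L", "F25C",
              "F25S", "V27M", "P42R", "G73W", "F432S"]
print(variants_in_region(candidates, tm_start, tm_end))
```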

Molecular dynamics simulation

To decide which cancer driver nsSNPs to subject to MD simulation, we re-applied all evaluation tools with stringent effectiveness/deleteriousness thresholds (Fig. 4). Two cancer driver nsSNPs, G73W and F432S, were predicted as deleterious by all tools under these thresholds and were selected for structural evaluation by MD simulation. The CYP1A2 structure (PDB ID: 2HI4¹⁴) with the ligand (α-naphthoflavone) removed was used as the wild-type (WT) protein, and the initial structures of the G73W and F432S variants were obtained by substituting the corresponding residues in the WT structure. The variant and WT structures were then subjected to 200 ns MD simulations to explore possible impacts of the substitutions on protein structure.

The root mean square deviation (RMSD) of the Cα atoms of each frame with respect to the starting structure (1D-RMSD) and to all other frames (2D-RMSD), as well as the radius of gyration (Rg), was calculated over the simulation time (Fig. 5). The mean 1D-RMSD of G73W (1.95 ± 0.20 Å) was similar to that of WT (1.94 ± 0.21 Å), whereas F432S showed a slightly larger deviation of Cα atom positions (2.24 ± 0.28 Å). The 2D-RMSD plots indicate that WT and the G73W variant converged to relatively stable conformations after about 40 ns, whereas the F432S variant reached such a stable conformation only after about 80 ns, suggesting that F432S underwent more structural change before settling into a stable structure (Fig. 5C). The time evolution of Rg also indicated that the structures had converged after about 80 ns (Fig. 5B). Taking these findings together, and to allow a consistent comparison, the analyses were focused on the last 120 ns of the trajectories (80 to 200 ns) for all three proteins.
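The 1D-RMSD, 2D-RMSD and Rg quantities described above can be sketched in NumPy. The example below superimposes frames with the Kabsch algorithm before computing RMSD, builds the symmetric pairwise (2D) RMSD matrix, and computes a mass-weighted radius of gyration (unit masses by default). It runs on a synthetic three-frame trajectory and is meant to show the definitions, not to replace gmx rms and gmx gyrate.

```python
import numpy as np


def kabsch_rmsd(a, b):
    """RMSD between two (N, 3) coordinate sets after optimal rigid-body superposition."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # exclude reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a @ rotation.T - b
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def rmsd_1d(traj, reference):
    """RMSD of every frame in a (frames, N, 3) array to one reference frame."""
    return np.array([kabsch_rmsd(frame, reference) for frame in traj])


def rmsd_2d(traj):
    """Symmetric matrix of pairwise RMSDs between all frames."""
    n = len(traj)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i, j] = matrix[j, i] = kabsch_rmsd(traj[i], traj[j])
    return matrix


def radius_of_gyration(frame, masses=None):
    """Mass-weighted Rg of one (N, 3) frame; unit masses if none are given."""
    w = np.ones(len(frame)) if masses is None else np.asarray(masses, dtype=float)
    centre = (w[:, None] * frame).sum(axis=0) / w.sum()
    return float(np.sqrt((w * ((frame - centre) ** 2).sum(axis=1)).sum() / w.sum()))


# Synthetic 3-frame trajectory: frame 1 is a rotated, shifted copy of frame 0,
# frame 2 is a slightly perturbed copy.
rng = np.random.default_rng(1)
x0 = rng.normal(size=(50, 3))
theta = np.radians(30.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
traj = np.stack([x0, x0 @ rz.T + 5.0, x0 + rng.normal(scale=0.1, size=x0.shape)])
print(rmsd_1d(traj, traj[0]))
print(rmsd_2d(traj))
print([radius_of_gyration(f) for f in traj])
```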

To gain more insight into local structural changes around the substitution sites, we used free energy landscape analysis of the last 120 ns of each MD simulation to extract the minimum-free-energy conformation as a representative structure. Gly73 is located in a short loop just after the A helix (residues 61–72). Its substitution by tryptophan allows the indole ring of Trp73 to be captured by the positively charged guanidinium group of Arg90, and a cation-π interaction between Trp73 and Arg90 formed after about 39 ns of simulation. The distance between the indole ring and the guanidinium group remained at about 4 Å for 70 ns, after which its fluctuations increased (Fig. 6A). To verify this interaction, 161 snapshots were extracted every 1 ns from 40 to 200 ns of simulation time and analyzed for cation-π interactions with the CaPTURE program⁶¹. CaPTURE also confirmed the cation-π interaction between Trp73 and Arg90, although the interaction weakened after 120 ns (Fig. 6B). In addition, secondary structure analysis showed that the C-terminal end of the A helix became unstable once the Trp73-Arg90 cation-π interaction was established (Fig. 6C). The second substitution site, Phe432, is located in a 3₁₀ helix flanked by helices K' and K". In the WT structure, most residues within 5 Å of the Phe432 side chain are nonpolar; of these, Ala370, Leu373, Trp421, Ile440, Leu444 and Met448 are involved in van der Waals interactions with the aromatic ring of Phe432. Comparison of the WT and F432S representative structures showed that replacing Phe432 with serine, which has a small polar side chain, abolished these interactions.
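CaPTURE evaluates cation-π interactions with its own criteria, which are not reproduced here. A rough geometric proxy for the Trp73-Arg90 contact is to track, frame by frame, the distance between the centroid of the aromatic ring atoms and the arginine guanidinium carbon, and to count frames below a cutoff. The 6 Å cutoff, the atom choice and the synthetic coordinates below are all assumptions.

```python
import numpy as np


def ring_cation_distance(ring_traj, cation_traj):
    """Per-frame distance between an aromatic ring centroid and a cationic atom.

    ring_traj: (frames, ring_atoms, 3); cation_traj: (frames, 3), e.g. Arg CZ.
    """
    centroids = np.asarray(ring_traj, dtype=float).mean(axis=1)
    return np.linalg.norm(centroids - np.asarray(cation_traj, dtype=float), axis=1)


def frames_in_contact(distances, cutoff=6.0):
    """Indices of frames whose centroid-cation distance is below the (assumed) cutoff."""
    return np.flatnonzero(np.asarray(distances) < cutoff)


# Synthetic example: planar hexagonal ring at the origin, cation 4 or 9 Angstrom above it.
angles = np.arange(6) * np.pi / 3
ring = np.stack([1.4 * np.cos(angles), 1.4 * np.sin(angles), np.zeros(6)], axis=1)
ring_traj = np.stack([ring, ring])
cation_traj = np.array([[0.0, 0.0, 4.0], [0.0, 0.0, 9.0]])
d = ring_cation_distance(ring_traj, cation_traj)
print(d, frames_in_contact(d))
```

A distance-only check ignores the angle between the cation and the ring plane, which energy-based tools take into account.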

To explore global structural changes upon substitution, we first measured the secondary structure content of the proteins over the analyzed time window. Both variants showed a decrease in β-sheet (Fig. 7A) and α-helical content (Fig. 7B). The average number of β-sheet-forming residues decreased from 40 ± 2 in WT to 34 ± 3 in G73W and 33 ± 3 in F432S, and the average number of residues in α-helices decreased from 220 ± 4 in WT to 212 ± 7 in G73W and 213 ± 6 in F432S. Detailed analysis of secondary structure elements revealed disruption of β-sheet 3′ in the G73W variant (Fig. 7C) and of β-sheet 4 in the F432S variant (Fig. 7D). No α-helix was completely lost; helices were only shortened by one or more residues.
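Secondary structure content of the kind reported above can be derived from per-frame DSSP strings. The sketch below counts residues assigned a given DSSP code in each frame (H for α-helix, E for β-strand) and computes, for each residue, the fraction of frames spent in that state, which is one way to spot a disrupted β-sheet. The toy DSSP strings are hypothetical, and reading real DSSP output is not shown.

```python
from statistics import mean, stdev


def residues_per_frame(dssp_frames, codes):
    """Number of residues assigned any of `codes` in each frame."""
    return [sum(ch in codes for ch in frame) for frame in dssp_frames]


def per_residue_fraction(dssp_frames, codes):
    """Fraction of frames in which each residue is assigned one of `codes`."""
    n_frames = len(dssp_frames)
    return [sum(frame[i] in codes for frame in dssp_frames) / n_frames
            for i in range(len(dssp_frames[0]))]


# Toy DSSP strings for three frames of an 8-residue segment.
frames = ["HHHHEE--", "HHH-EEE-", "HHHHEE--"]
helix = residues_per_frame(frames, "H")
sheet = residues_per_frame(frames, "E")
print(f"helix {mean(helix):.1f} +/- {stdev(helix):.1f}, "
      f"sheet {mean(sheet):.1f} +/- {stdev(sheet):.1f}")
print(per_residue_fraction(frames, "E"))
```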

Hydrogen bond analysis also showed fewer intramolecular hydrogen bonds in both variants: the average number decreased from 376 ± 9 in WT to 365 ± 9 in G73W and 363 ± 9 in F432S (Fig. 7E). The number of hydrogen bonds with occupancy above 70% also decreased, from 264 in WT to 237 in G73W and 249 in F432S. This reduction in the number and strength of hydrogen bonds suggested an increase in the overall flexibility of the variants. To examine whether the substitutions affect overall protein flexibility, we performed principal component analysis (PCA). Eigenvectors and eigenvalues were obtained by diagonalizing the covariance matrix of the Cα atom coordinates, and principal components were generated by projecting the trajectories onto the respective eigenvectors (Figs. 7F and 7G). The trace of the diagonalized covariance matrix was 530.27 Å², 693.11 Å² and 931.91 Å² for WT, G73W and F432S, respectively, confirming increased overall flexibility in both variants, with a considerably larger increase for F432S than for G73W.
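The PCA flexibility measure and the hydrogen bond occupancy filter above can both be sketched in NumPy. Assuming a trajectory already fitted to a reference, the covariance matrix of the flattened coordinates is diagonalized; its trace (equal to the sum of the eigenvalues and to the total positional variance) summarizes overall flexibility, and projections onto the first two eigenvectors give PC1 and PC2. Normalizing by the number of frames, the synthetic coordinates and the toy hydrogen bond existence matrix (1 means the bond is present in a frame) are assumptions.

```python
import numpy as np


def covariance_pca(traj):
    """PCA of a fitted (frames, atoms, 3) trajectory.

    Returns the covariance trace, eigenvalues (descending) and PC1/PC2 projections.
    """
    x = traj.reshape(len(traj), -1)
    x = x - x.mean(axis=0)
    cov = x.T @ x / len(traj)               # normalized by frame count (assumed)
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric matrix, so eigh
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projections = x @ eigvecs[:, :2]
    return float(eigvals.sum()), eigvals, projections


def stable_hbonds(presence, threshold=0.7):
    """presence: (frames, n_hbonds) 0/1 matrix. Indices with occupancy > threshold."""
    occupancy = np.asarray(presence, dtype=float).mean(axis=0)
    return np.flatnonzero(occupancy > threshold), occupancy


rng = np.random.default_rng(2)
toy_traj = rng.normal(scale=0.5, size=(200, 10, 3))  # synthetic, treated as fitted
trace, eigvals, pcs = covariance_pca(toy_traj)
print(f"covariance trace = {trace:.3f} (squared coordinate units)")
presence = np.array([[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 1, 0]])
print(stable_hbonds(presence))
```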

To gain more insight into structural flexibility, the RMSF of the Cα atoms as a function of residue number was calculated over the last 120 ns (Fig. 8A); colored blocks on the RMSF plot mark α-helices and β-strands according to the CYP1A2 crystal structure. The per-residue RMSF differences (ΔRMSF) of G73W and F432S relative to WT were also calculated and are shown in Fig. 8B, where positive values indicate residues that are more flexible, and negative values residues that are less flexible, than in WT. As seen in Figs. 8B and 8C, the G73W variant showed a significant increase in flexibility for β-sheet 3′ and its flanking loops, consistent with the disruption of β-sheet 3′ caused by the breaking of two hydrogen bonds between Ala297 and Asn300 (Fig. 7C). In the F432S variant, a sharp increase in the fluctuation of the CD loop was particularly prominent (Figs. 8B and 8C), and increased flexibility was also observed in the F helix, FG loop, G helix and GH loop.
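RMSF and ΔRMSF follow directly from their definitions: RMSF is the root mean square displacement of each atom from its average position, and ΔRMSF subtracts the WT value atom by atom. The sketch below assumes both trajectories are already fitted to a common reference and list the same Cα atoms in the same order, which holds for a single point substitution; the two-atom example is synthetic.

```python
import numpy as np


def rmsf(traj):
    """Per-atom RMSF of a (frames, atoms, 3) trajectory already fitted to a reference."""
    mean_structure = traj.mean(axis=0)
    return np.sqrt(((traj - mean_structure) ** 2).sum(axis=2).mean(axis=0))


def delta_rmsf(variant_traj, wt_traj):
    """Variant minus WT RMSF per atom; positive values mean more flexible than WT."""
    return rmsf(variant_traj) - rmsf(wt_traj)


# Two atoms, two frames: atom 0 is static, atom 1 moves +/-1 along x.
variant = np.array([[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
                    [[0.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]])
wild_type = np.zeros_like(variant)
print(rmsf(variant), delta_rmsf(variant, wild_type))
```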

The Cα RMSD of the CD loop, calculated over the entire F432S simulation, showed that this loop was displaced after about 14 ns (Fig. 9A). In addition, monitoring of CD loop interactions revealed that F432S lost the hydrogen bonding network in this region of the protein (Supplementary Table S3), and the salt bridge between Asp152 of this loop and Arg281 of the GH loop was also disrupted (Fig. 9B). Loss of these interactions is likely responsible for the displacement and higher mobility of the CD loop, as well as of the GH loop, in the F432S variant. Another notable change was a marked weakening of the conserved hydrogen bond between Arg137 of the C helix and the heme propionate oxygen, which occurred shortly after the CD loop moved (Fig. 9A). Taken together, these observations suggest that the displacement and increased flexibility of the CD loop induced breakage of the Arg137-heme hydrogen bond.
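A salt bridge such as Asp152-Arg281 is often monitored with a distance criterion between the carboxylate oxygens and the guanidinium nitrogens. The sketch below flags a frame as intact when the shortest oxygen-nitrogen distance is below a cutoff; the 4 Å value is a commonly used choice assumed here, not one taken from the text, and the coordinates are synthetic.

```python
import numpy as np


def salt_bridge_status(acid_oxygens, base_nitrogens, cutoff=4.0):
    """Per-frame salt-bridge check, e.g. Asp OD1/OD2 versus Arg NE/NH1/NH2.

    acid_oxygens: (frames, n_O, 3); base_nitrogens: (frames, n_N, 3).
    Returns (intact flags, shortest O-N distance) for each frame.
    """
    o = np.asarray(acid_oxygens, dtype=float)[:, :, None, :]
    n = np.asarray(base_nitrogens, dtype=float)[:, None, :, :]
    shortest = np.linalg.norm(o - n, axis=-1).min(axis=(1, 2))
    return shortest < cutoff, shortest


# Frame 0: bridge formed (2 Angstrom); frame 1: partners separated (19 Angstrom).
oxygens = np.array([[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
                    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]])
nitrogens = np.array([[[3.0, 0.0, 0.0], [10.0, 0.0, 0.0], [10.0, 0.0, 0.0]],
                      [[20.0, 0.0, 0.0], [20.0, 0.0, 0.0], [20.0, 0.0, 0.0]]])
intact, shortest = salt_bridge_status(oxygens, nitrogens)
print(intact, shortest)
```

The same distance-based check can be applied to the Arg137 side chain and the heme propionate oxygens to follow the weakening of that hydrogen bond, with a hydrogen-bond-appropriate cutoff.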

Discussion

In this study, we performed a comprehensive in silico evaluation using a wide variety of computational tools to identify pathogenic nsSNPs of the CYP1A2 gene. To our knowledge, only one previous study has evaluated the nsSNPs of human CYP1A2: Wang et al. used SIFT and PolyPhen to analyze the functional impact of thirty-three CYP1A2 nsSNPs and reported eleven of them as damaging substitutions⁶². We extended the analysis to more nsSNPs, hypothesizing that combining computational methods that use different approaches to distinguish pathogenic from neutral variants would give a more reliable and accurate estimate of the consequences of each substitution. Although these predictors estimate whether a substitution has a functional or pathogenic effect, they do not necessarily reveal the mechanism by which an SNP affects protein function or causes disease. This question can be explored with other experimental or computational techniques, including MD simulation.

To test this hypothesis, we first annotated the nsSNPs using a variety of computational methods to distinguish functional from neutral variants. Assessing the pathogenicity of the functional nsSNPs identified 18 pathogenic nsSNPs, and evolutionary conservation analysis indicated that almost all of them occupy conserved amino acid positions. Moreover, the CHASM-3.1 results suggested a possible association between G73R, G73W, R108Q, R108W, E168K, E346K, R431W, F432S and R456H and the risk of developing hepatocellular carcinoma.

The results of this study are in fairly good agreement with those published by Ito and colleagues, who reported reduced activity toward the substrates phenacetin and 7-ethoxyresorufin for CYP1A2*4 (I386F), CYP1A2*6 (R431W), CYP1A2*8 (R456H), CYP1A2*11 (F186L), CYP1A2*15 (P42R), CYP1A2*16 (R377Q) and CYP1A2*21 (S298R and Y495Ter). The nonsense substitution (Y459Ter) of CYP1A2*21 results in a truncated protein with reduced enzyme activity²⁵. Moreover, two allelic variants, CYP1A2*14 (T438I) and CYP1A2*20 (D436N), showed higher activity toward phenacetin than the wild-type enzyme. In the current study, the P42R, R377Q, I386F, R431W and R456H variants were predicted to be functional, of which R431W and R456H were also predicted to be associated with pathogenicity and cancer susceptibility.

Among the nsSNPs predicted as cancer drivers, G73W and F432S were still predicted as damaging by all methods even after more stringent thresholds were applied. These variants were therefore subjected to 200 ns MD simulations to explore the effect of the substitutions on protein structure. The substitutions altered structural features not only near the substituted residues but also in spatially distant regions. Both variants showed a reduction in the number and strength of intramolecular hydrogen bonds and in β-sheet and α-helical content, and principal component analysis (PCA) confirmed an increase in overall flexibility, especially for F432S. A drastic increase was also found in the flexibility of the CD loop, a long serine-rich stretch (residues 148–158) that extends into the solvent. Increased mobility of the CD loop has recently been reported in simulations of R377Q²⁷, where the experimentally observed loss of enzymatic activity of this variant was attributed to reduced heme stability caused by increased flexibility of the C helix, which is adjacent to the CD loop. The C helix is also adjacent to the heme prosthetic group and interacts with a heme propionate oxygen via a conserved arginine residue (Arg137); hence, any change in the flexibility of the C helix could affect heme stability. Moreover, the C helix is one of the main regions involved in interaction with redox partners such as cytochrome b5 (CYB5). Binding of CYPs to CYB5 is mediated through a groove on the proximal surface of the protein that includes the C helix. There is also growing evidence that the CD loop participates in the binding of some CYPs to CYB5, although the CYB5-interacting elements appear to be specific to each CYP. For example, the interacting region on apo CYP3A4 consists of helices B, C and D, the BB' and CD loops, the β-bulge and the meander region, whereas on CYP2E1 it is formed by helices C, J' and L, the β-bulge and the meander region^63,64. Taken together, it seems reasonable to expect that the high mobility of the CD loop in F432S may affect heme stability as well as the interaction with CYB5.

Conclusion

CYP1A2 is one of the main hepatic CYPs involved in the bioactivation of carcinogens and the metabolism of clinically used drugs; hence, nsSNPs of this enzyme could affect cancer susceptibility and drug efficacy. In the current study, 38 of 176 nsSNPs of the human CYP1A2 gene were predicted to be functional using a variety of computational tools. These functional nsSNPs were further analyzed for possible associations with pathogenicity and cancer susceptibility: 18 nsSNPs were predicted to be pathogenic, of which G73R, G73W, R108Q, R108W, E168K, E346K, R431W, F432S and R456H were also predicted to be associated with hepatocellular carcinoma. We also performed 200 ns MD simulations to explore how the G73W and F432S cancer driver variants affect protein structure. The simulations revealed several significant structural alterations, particularly for F432S. The most prominent were the increased flexibility of the CD loop and the loss of the hydrogen bond between the heme and Arg137 of the C helix, because these changes could affect heme stability as well as the interaction of the protein with cytochrome b5. These findings may inform the design of experimental studies and provide new insights into the structure–function relationship of CYP1A2 and other CYPs.

References

Kruglyak, L. & Nickerson, D. A. Variation is the spice of life. Nat. Genet. 27, 234–236 (2001).
Article CAS PubMed Google Scholar
Shastry, B. S. SNP alleles in human disease and evolution. J. Hum. Genet. 47, 561–566 (2002).
Article CAS PubMed Google Scholar
Jia, M. et al. Computational analysis of functional single nucleotide polymorphisms associated with the CYP11B2 gene. PLoS ONE 9, e104311 (2014).
George, D. C. P. et al. Evolution- and structure-based computational strategy reveals the impact of deleterious missense mutations on MODY 2 (maturity-onset diabetes of the young, type 2). Theranostics 4, 366–385 (2014).
AbdulAzeez, S. & Borgio, J. F. In-silico computing of the most deleterious nsSNPs in HBA1 gene. PLoS ONE 11, e0147702 (2016).
Kelly, J. N. & Barr, S. D. In silico analysis of functional single nucleotide polymorphisms in the human TRIM22 gene. PLoS ONE 9, e101436 (2014).
Pires, A. S., Porto, W. F., Franco, O. L. & Alencar, S. A. In silico analyses of deleterious missense SNPs of human apolipoprotein E3. Sci. Rep. 7, 2509 (2017).
Evans, W. E. & Relling, M. V. Pharmacogenomics: translating functional genomics into rational therapeutics. Science 286, 487–491 (1999).
Sim, S. C. & Ingelman-Sundberg, M. The Human Cytochrome P450 (CYP) Allele Nomenclature website: a peer-reviewed database of CYP variants and their associated effects. Hum. Genom. 4, 278–281 (2010).
Preissner, S. C. et al. Polymorphic cytochrome P450 enzymes (CYPs) and their role in personalized therapy. PLoS ONE 8, e82562 (2013).
Nelson, D. R. The cytochrome p450 homepage. Hum. Genom. 4, 59–65 (2009).
Zanger, U. et al. Genetics, epigenetics, and regulation of drug-metabolizing cytochrome P450 enzymes. Clin. Pharmacol. Ther. 95, 258–261 (2014).
Nebert, D. W. & Dalton, T. P. The role of cytochrome P450 enzymes in endogenous signalling pathways and environmental carcinogenesis. Nat. Rev. Cancer 6, 947–960 (2006).
Sansen, S. et al. Adaptations for the oxidation of polycyclic aromatic hydrocarbons exhibited by the structure of human P450 1A2. J. Biol. Chem. 282, 14348–14355 (2007).
Brøsen, K. Drug interactions and the cytochrome P450 system. Clin. Pharmacokinet. 29, 20–25 (1995).
Kim, D. & Guengerich, F. P. Cytochrome P450 activation of arylamines and heterocyclic amines. Annu. Rev. Pharmacol. Toxicol. 45, 27–49 (2005).
Tassaneeyakul, W. et al. Caffeine metabolism by human hepatic cytochromes P450: contributions of 1A2, 2E1 and 3A isoforms. Biochem. Pharmacol. 47, 1767–1776 (1994).
Skene, D. J. et al. Contribution of CYP1A2 in the hepatic metabolism of melatonin: studies with isolated microsomal preparations and liver slices. J. Pineal Res. 31, 333–342 (2001).
Wang, B. & Zhou, S.-F. Synthetic and natural compounds that interact with human cytochrome P450 1A2 and implications in drug development. Curr. Med. Chem. 16, 4066–4218 (2009).
Spaldin, V. et al. Determination of human hepatic cytochrome P4501A2 activity in vitro use of tacrine as an isoenzyme-specific probe. Drug Metab. Dispos. 23, 929–934 (1995).
Bertilsson, L. et al. Clozapine disposition covaries with CYP1A2 activity determined by a caffeine test. Br. J. Clin. Pharmacol. 38, 471–473 (1994).
Sarkar, M. A., Hunt, C., Guzelian, P. S. & Karnes, H. T. Characterization of human liver cytochromes P-450 involved in theophylline metabolism. Drug Metab. Dispos. 20, 31–37 (1992).
Pichard, L. et al. Oxidative metabolism of zolpidem by human liver cytochrome P450S. Drug Metab. Dispos. 23, 1253–1262 (1995).
Sigrist, C. J. et al. New and continuing developments at PROSITE. Nucl. Acids Res. 41, D344–D347 (2012).
Ito, M., Katono, Y., Oda, A., Hirasawa, N. & Hiratsuka, M. Functional characterization of 20 allelic variants of CYP1A2. Drug Metab. Pharmacokinet. 30, 247–252 (2015).
Lim, Y.-R. et al. Functional significance of cytochrome P450 1A2 allelic variants, P450 1A2*8, *15, and *16 (R456H, P42R, and R377Q). Biomol. Ther. 23, 189–194 (2015).
Watanabe, Y. et al. Prediction of three-dimensional structures and structural flexibilities of wild-type and mutant cytochrome P450 1A2 using molecular dynamics simulations. J. Mol. Graph. Model. 68, 48–56 (2016).
Zhang, T., Liu, L. A., Lewis, D. F. & Wei, D.-Q. Long-range effects of a peripheral mutation on the enzymatic activity of cytochrome P450 1A2. J. Chem. Inf. Model. 51, 1336–1346 (2011).
Ying, B.-L., Fa, B.-T., Cong, S., Zhong, Y. & Wang, J.-F. Insight into the mutation-induced decrease of the enzymatic activity of human cytochrome P450 1A2. Med. Chem. 6, 174–178 (2016).
Apweiler, R. et al. UniProt: the universal protein knowledgebase. Nucl. Acids Res. 32, D115–D119 (2004).
Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucl. Acids Res. 29, 308–311 (2001).
Berman, H. M. et al. The protein data bank. Nucl. Acids Res. 28, 235–242 (2000).
Sim, N.-L. et al. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucl. Acids Res. 40, W452–W457 (2012).
Choi, Y., Sims, G. E., Murphy, S., Miller, J. R. & Chan, A. P. Predicting the functional effect of amino acid substitutions and indels. PLoS ONE 7, e46688 (2012).
Reva, B., Antipin, Y. & Sander, C. Predicting the functional impact of protein mutations: application to cancer genomics. Nucl. Acids Res. 39, e118 (2011).
Zeng, S., Yang, J., Chung, B.H.-Y., Lau, Y. L. & Yang, W. EFIN: predicting the functional impact of nonsynonymous single nucleotide polymorphisms in human genome. BMC Genom. 15, 455 (2014).
Chun, S. & Fay, J. C. Identification of deleterious mutations within three human genomes. Genome Res. 19, 1553–1561 (2009).
Shihab, H. A. et al. An integrative approach to predicting the functional effects of non-coding and coding sequence variation. Bioinformatics 31, 1536–1543 (2015).
Capriotti, E., Calabrese, R. & Casadio, R. Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information. Bioinformatics 22, 2729–2734 (2006).
Kircher, M. et al. A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet. 46, 310 (2014).
Adzhubei, I. A. et al. A method and server for predicting damaging missense mutations. Nat. Methods 7, 248 (2010).
Hecht, M., Bromberg, Y. & Rost, B. Better prediction of functional effects for sequence variants. BMC Genom. 16, S1 (2015).
Yates, C. M., Filippis, I., Kelley, L. A. & Sternberg, M. J. SuSPect: enhanced prediction of single amino acid variant (SAV) phenotype using network features. J. Mol. Biol. 426, 2692–2701 (2014).
Ferrer-Costa, C. et al. PMUT: a web-based tool for the annotation of pathological mutations on proteins. Bioinformatics 21, 3176–3178 (2005).
Pejaver, V. et al. MutPred2: inferring the molecular and phenotypic impact of amino acid variants. bioRxiv, 134981 (2017).
Carter, H., Douville, C., Stenson, P. D., Cooper, D. N. & Karchin, R. Identifying Mendelian disease genes with the variant effect scoring tool. BMC Genom. 14, S3 (2013).
Carter, H. et al. Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Res. 69, 6660–6667 (2009).
Douville, C. et al. CRAVAT: cancer-related analysis of variants toolkit. Bioinformatics 29, 647–648 (2013).
Berezin, C. et al. ConSeq: the identification of functionally and structurally important residues in protein sequences. Bioinformatics 20, 1322–1324 (2004).
Ashkenazy, H. et al. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucl. Acids Res. 44, W344–W350 (2016).
Krogh, A., Larsson, B., Von Heijne, G. & Sonnhammer, E. L. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580 (2001).
Abraham, M. J. et al. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1, 19–25 (2015).
Huang, J. & MacKerell, A. D. CHARMM36 all-atom additive protein force field: validation based on comparison to NMR data. J. Comput. Chem. 34, 2135–2145 (2013).
Guex, N. & Peitsch, M. C. SWISS-MODEL and the Swiss-Pdb Viewer: an environment for comparative protein modeling. Electrophoresis 18, 2714–2723 (1997).
Bussi, G., Donadio, D. & Parrinello, M. Canonical sampling through velocity rescaling. J. Chem. Phys. 126, 014101 (2007).
Parrinello, M. & Rahman, A. Polymorphic transitions in single crystals: a new molecular dynamics method. J. Appl. Phys. 52, 7182–7190 (1981).
Hess, B., Bekker, H., Berendsen, H. J. & Fraaije, J. G. LINCS: a linear constraint solver for molecular simulations. J. Comput. Chem. 18, 1463–1472 (1997).
Darden, T., York, D. & Pedersen, L. Particle mesh Ewald: an N·log(N) method for Ewald sums in large systems. J. Chem. Phys. 98, 10089–10092 (1993).
Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577–2637 (1983).
Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
Gallivan, J. & Dougherty, D. Cation-π interactions in structural biology. Proc. Natl. Acad. Sci. U.S.A. 96, 9459–9464 (1999).
Wang, L.-L., Li, Y. & Zhou, S.-F. A bioinformatics approach for the phenotype prediction of non-synonymous single nucleotide polymorphisms in human cytochromes P450. Drug Metab. Dispos. 37, 977–991 (2009).
Zhao, C. et al. Cross-linking mass spectrometry and mutagenesis confirm the functional importance of surface interactions between CYP3A4 and holo/apo cytochrome b5. Biochemistry 51, 9488–9500 (2012).
Gao, Q. et al. Identification of the interactions between cytochrome P450 2E1 and cytochrome b5 by mass spectrometry and site-directed mutagenesis. J. Biol. Chem. 281, 20404–20417 (2006).

Author information

Authors and Affiliations

Biophysics and Computational Biology Laboratory (BCBL), Department of Biology, College of Sciences, Shiraz University, Shiraz, Iran
Leila Navapour & Navid Mogharrab

Contributions

N.M. and L.N. designed the research. L.N. performed the computational analyses and MD simulations. N.M. and L.N. interpreted the results and wrote the manuscript.

Corresponding author

Correspondence to Navid Mogharrab.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Navapour, L., Mogharrab, N. In silico screening and analysis of nonsynonymous SNPs in human CYP1A2 to assess possible associations with pathogenicity and cancer susceptibility. Sci Rep 11, 4977 (2021). https://doi.org/10.1038/s41598-021-83696-x

Received: 16 June 2020
Accepted: 03 February 2021
Published: 02 March 2021
DOI: https://doi.org/10.1038/s41598-021-83696-x
Springer Nature Limited
