Introduction

Polymorphism in the α adducin 1 (ADD1) gene is associated with hypertension. Hypertension is defined as a blood pressure of 140/90 mm Hg or greater. It is considered to be polygenic, and candidate genes are selected from pathways implicated in blood pressure regulation [1]. Several findings relate to genes of the renin–angiotensin–aldosterone system, in which variation in the angiotensinogen gene (AGT) has been associated with increased AGT levels and blood pressure in many distinct populations. A common variant in the angiotensin-converting enzyme (ACE) gene has also been associated in some studies with blood pressure variation in men [2]. The action of angiotensin II is triggered through stimulation of the angiotensin II type 1 receptor (AT1R) [3], which forms an important target for the control of angiotensin II-dependent hypertension [4]. Alterations in the regulation of the PLD2 gene have also been found to play an important role in the development of hypertension [5]. In our previous study, we reported the association of the PLD2 gene with hypertension [6]. In this report, we analyze a gene that has the potential to influence blood pressure via sodium homeostasis. ADD1 is one such gene: carriers of its variant display larger blood pressure changes with body sodium variation, which contributes to hypertension [7].

Single-nucleotide polymorphisms (SNPs) are a valuable resource for investigating the genetic basis of disease [8]. ADD1 gene polymorphism has the potential to influence blood pressure via sodium homeostasis [9]. The ADD1 gene has been investigated worldwide in the search for the heritable determinants of the blood pressure phenotype in humans, as reports have shown a positive association between polymorphism of the ADD1 gene and hypertension [10]. The ADD1 protein, found in the renal tubule, is involved in cellular signal transduction and interacts with other membrane-skeleton proteins that affect ion transport across the cell membrane [11, 12]. Mutated ADD1 may alter the actin–spectrin-based membrane skeleton [13], which may affect the regulation of other components of the Na transport system, such as the anion exchanger, epithelial Na channels [14, 15] and Na–K–Cl cotransport [16] in the luminal part of the cell. Thus, ADD1 can be considered as a ‘renal hypertensive gene’ that affects the capacity of the tubular epithelial cell to transport Na and, hence, affects blood pressure. Abnormalities in renal sodium reabsorption may be involved in the development and maintenance of experimental and clinical hypertension [7]. ADD1 has been considered a candidate gene for hypertension since the first report by Cusi et al. [10]. Therefore, we selected the ADD1 gene for our study to understand its association with hypertension. Bioinformatics tools for retrieving SNP data for a gene of interest have been documented [8]. This study is mainly concerned with the effect of ADD1 polymorphism on population variability and on structural changes in the α adducin 1 protein, assessed by computational methods, to understand its possible implications for hypertension.

The SNP rs4961 showed a significant damaging effect and SNP variability, with large differences in the minor allele frequency observed among various populations. This polymorphism, which causes a change from an aliphatic to an aromatic amino acid in the coiled and disordered region, might alter the function of that region of the protein and affect its stability.

Materials and Methods

Data Source and Selection of nsSNPs

A total of 1,113 SNPs associated with the ADD1 gene were retrieved from the single-nucleotide polymorphism database (dbSNP) [17]. We used the functional single-nucleotide polymorphism (F-SNP) database [18] to select non-synonymous SNPs (nsSNPs) of the ADD1 gene. F-SNP mined the Ensembl database, which provides genome annotations [19], to identify nsSNPs.

Identifying Deleterious nsSNPs Using F-SNP Database

The F-SNP database provides functional information about SNPs with respect to protein coding, splicing regulation and transcriptional regulation by mining a variety of web services and databases. To predict the damaging effect of coding nsSNPs [18], F-SNP mined Sorting Intolerant from Tolerant (SIFT) [20], Polymorphism Phenotyping (PolyPhen) [21], SNPeffect [22] and SNPs3D [23]. Other computational tools, such as Exonic splicing enhancer (ESEfinder) [24], RESCUE-ESE [25], Exonic splicing regulator (ESRSearch) [26] and Putative Exonic splicing enhancer (PESE) [27], were mined to identify SNPs in exonic splice regions. Golden Path was mined to identify SNPs in transcriptional regulatory regions [28].

Assessing Population-Based Genotypic Study of nsSNPs Using SPSmart Tool

The SNPs for Population Studies (SPSmart) tool [29] was used to access and combine large-scale genomic databases of SNPs for human population genetics. The SPSmart engine encompasses a range of data sets: the HapMap release; the Stanford University and University of Michigan CEPH-HGDP data, i.e., the Centre d’Etude du Polymorphisme Humain (CEPH) Human Genome Diversity Panel (HGDP); and the Perlegen SNP data set [29].

Comparative Modeling of Wild and Mutant Proteins and Predicting Disordered Regions

Modeller was used to build protein models from templates identified by sequence similarity to the target protein sequence [30]. We used the DisEMBL tool to predict the disordered/unstructured regions within the protein sequence [31].

Results and Discussion

Of the 1,113 SNPs associated with the ADD1 gene, 9 were identified as non-synonymous by the F-SNP database and are listed in Table 1 with their corresponding allele changes. We employed the SIFT and PolyPhen tools to assess the functional impact of the nsSNPs that result in amino acid changes. Six nsSNPs (rs4971, rs4972, rs4962, rs4963, rs4690006 and rs4961) are predicted to be damaging by SIFT, with tolerance index scores of ≤0.05 (Table 1). From sequence homology, SIFT calculates tolerance index scores ranging from 0 to 1, where 0 is damaging and 1 is neutral. SIFT uses position-specific scoring matrices to compute the tolerance index for all possible substitutions [20]. A score of ≤0.05 is the cut-off used by the SIFT program to identify damaging SNPs [20, 32]. The lower the tolerance index, the greater the functional impact a particular amino acid substitution is likely to have [32]. On the other hand, four nsSNPs (rs4971, rs4972, rs4962 and rs4961) are predicted to be damaging by PolyPhen, with position-specific independent count (PSIC) score differences of ≥1.5 (Table 1). The PolyPhen program uses sequence conservation and structural information to model the position of the amino acid substitution, calculates PSIC scores for each of the two variants and then computes the difference between the two PSIC scores [32]. A score difference of ≥1.5 is the cut-off to identify damaging SNPs [21]. The higher the PSIC score difference, the more likely the SNP is to be damaging [32].
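The two cut-offs described above can be sketched in a few lines of Python. This is only an illustration of the thresholding step, not the SIFT or PolyPhen algorithms themselves; the function names are hypothetical, and the variant labels and scores are invented placeholders, not the values of Table 1.

```python
# Cut-offs as stated in the text (SIFT [20, 32], PolyPhen [21]).
SIFT_CUTOFF = 0.05      # tolerance index <= 0.05 is called damaging
POLYPHEN_CUTOFF = 1.5   # PSIC score difference >= 1.5 is called damaging


def sift_damaging(tolerance_index):
    """Return True if a SIFT tolerance index (0 to 1) is at or below the cut-off."""
    if not 0.0 <= tolerance_index <= 1.0:
        raise ValueError("SIFT tolerance index must lie between 0 and 1")
    return tolerance_index <= SIFT_CUTOFF


def polyphen_damaging(psic_difference):
    """Return True if a PSIC score difference is at or above the cut-off."""
    return psic_difference >= POLYPHEN_CUTOFF


def classify(scores):
    """Map each variant label to its SIFT and PolyPhen calls.

    `scores` maps a label to a (tolerance_index, psic_difference) pair.
    """
    return {
        label: {"sift": sift_damaging(ti), "polyphen": polyphen_damaging(psic)}
        for label, (ti, psic) in scores.items()
    }


# Hypothetical placeholder scores (NOT taken from the study).
example_scores = {
    "variant_A": (0.01, 2.1),
    "variant_B": (0.04, 0.9),
    "variant_C": (0.60, 0.3),
}

for label, calls in classify(example_scores).items():
    print(label, calls)
```

A variant such as the placeholder variant_B, damaging by SIFT but not by PolyPhen, corresponds to the situation reported above in which six nsSNPs pass the SIFT cut-off but only four pass the PolyPhen cut-off.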

Table 1 Functional effect of nsSNPs obtained from SIFT and PolyPhen

F-SNP assesses the deleterious effect of SNPs by calculating a functional significance (FS) score for each nsSNP. A deleterious SNP has an FS score between 0.5 and 1 [18]. Seven nsSNPs (rs4971, rs4972, rs4962, rs4963, rs11792, rs4690006 and rs4961) have deleterious FS scores in the range of 0.5–1. All of them except rs11792 are deleterious because of changes in the protein-coding region. Putative ESEs are predicted for all nsSNPs from changes in the splicing regulation region. Numerous disease-associated polymorphisms exert their effects by disrupting the activity of ESEs, which are oligonucleotide sequences, distinct from the splice sites, that enhance splicing from an exonic location [25, 33]. A change in the transcriptional regulatory region is observed for rs4971, rs4972, rs4961, rs11792 and rs4690006 (Table 2). This can alter binding sites and thus disrupt proper gene regulation [18]. nsSNPs can destabilize proteins, interfere with the formation of domain–domain interfaces, affect protein–ligand binding or severely affect human health [34]. The functional effect of nsSNPs causing genetic disorders such as dementia has been reported in studies that used bioinformatics tools for the analysis [35]. The other two nsSNPs (rs13306092 and rs13306093) are predicted to have low FS scores and are not considered further in our study.
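The FS-score screening step can be sketched as a simple range filter. This is a minimal illustration that assumes FS scores have already been obtained from F-SNP; the helper names are hypothetical and the scores are invented placeholders, not the values of Table 2.

```python
# Range stated in the text: a deleterious SNP has an FS score between 0.5 and 1 [18].
FS_MIN = 0.5
FS_MAX = 1.0


def is_deleterious(fs_score):
    """Return True if the FS score falls in the deleterious range [0.5, 1]."""
    return FS_MIN <= fs_score <= FS_MAX


def split_by_fs(fs_scores):
    """Split variant labels into (retained, excluded) lists by FS score."""
    retained = sorted(v for v, s in fs_scores.items() if is_deleterious(s))
    excluded = sorted(v for v, s in fs_scores.items() if not is_deleterious(s))
    return retained, excluded


# Hypothetical placeholder FS scores (NOT taken from the study).
example_fs = {"variant_A": 0.82, "variant_B": 0.50, "variant_C": 0.17}

retained, excluded = split_by_fs(example_fs)
print("retained for further study:", retained)
print("excluded (low FS score):", excluded)
```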

Table 2 Functional prediction of nsSNPs by F-SNP database

The evolutionary approach to identifying functionally significant SNPs is based on patterns of genetic variation in populations, which can provide evidence of functional polymorphism [36]. We applied it to the seven nsSNPs (rs4971, rs4972, rs4962, rs4963, rs11792, rs4690006 and rs4961) by using the SPSmart tool [29] to obtain allele frequencies for populations across different databases and compare SNP genotypes. SNPs with a minor allele frequency (MAF) >0.05 are examined for disease association [37]. Among the seven nsSNPs, rs4963 and rs4961 are predicted to have population variability, with MAF values >0.05 in different populations within two datasets (HapMap and Perlegen Browser) (Table 3). Although the other nsSNPs have deleterious FS scores, no difference in MAF is found among the populations, and hence they are not considered to be significantly damaging. For rs4961, the MAF value is >0.05 in populations such as AFR (Africa), EUR (Europe), EAS (East Asia), GIH (Gujarati Indians in Houston, Texas), MEX (Mexican), AFA (African American), EUA (European American) and HCH (Han Chinese). On the other hand, for rs4963, the MAF value is >0.05 in populations such as AFR, EUR, EAS, AFA, EUA and HCH. Compared with rs4963, rs4961 shows large differences in MAF across a larger number of populations. Hence, the nsSNP rs4961 is selected for structural analysis.
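The MAF criterion can be illustrated with a short Python sketch. It assumes a biallelic SNP and per-population allele counts; the function names, population labels and counts are hypothetical placeholders, not SPSmart output or the values of Table 3.

```python
MAF_CUTOFF = 0.05  # SNPs with MAF > 0.05 are examined for disease association [37]


def minor_allele_frequency(count_a, count_b):
    """MAF of a biallelic SNP: the rarer allele's count over the total allele count."""
    total = count_a + count_b
    if total <= 0:
        raise ValueError("at least one allele must have been observed")
    return min(count_a, count_b) / total


def variable_populations(counts_by_population):
    """Return the sorted population labels in which the MAF exceeds the cut-off.

    `counts_by_population` maps a label to a (count_a, count_b) pair.
    """
    return sorted(
        pop
        for pop, (a, b) in counts_by_population.items()
        if minor_allele_frequency(a, b) > MAF_CUTOFF
    )


# Hypothetical placeholder allele counts (NOT taken from the study).
example_counts = {"POP1": (90, 10), "POP2": (99, 1), "POP3": (40, 60)}

print(variable_populations(example_counts))
```

Counting the populations returned for each SNP gives a simple way to compare two SNPs on the number of populations in which they are variable, which is the comparison made above between rs4963 and rs4961.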

Table 3 Minor allele frequencies in different populations predicted by SPSmart tool

A significant change in the protein-coding region is found for rs4961, which replaces glycine 460 (G460) in the ADD1 protein with tryptophan (W) because of the nucleotide change at position 2,038, where the codon GGG is replaced with TGG, i.e., G (guanine) to T (thymine) at the first base of the codon, which is the nsSNP. This indicates a change from an alkyl group to an aromatic group in the side chains of the amino acids. Both wild and mutant protein models for the SNP rs4961 are obtained by comparative protein modeling [30]. The template 3OCR, which has an E-value of 0.0 and better sequence identity with the query protein, is chosen for the target-template sequence alignments and used to construct the wild-type and rs4961 variant protein models of ADD1. The change from G to W at position 460 is observed to be in the loop region of the modeled protein, as shown in Fig. 1. α Helices and β sheets are the majority of secondary structures found in proteins, and they are interspersed with regions of irregular structure referred to as coil. These coil regions can possess structural significance and can be the location of the functional portion or active site of the protein [38]. The direction of the loop region is also found to differ between the native and mutant proteins, as depicted in Fig. 1. Experimental determination of protein structures has shown that loop regions are disordered, and thus do not achieve a stable structure. This type of loop region is referred to as random coil [38].
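The codon change described above can be checked with a minimal Python sketch. The codon table is a small subset of the standard genetic code (the four glycine codons and the tryptophan codon), and the function name and output format are hypothetical choices made for illustration.

```python
# Subset of the standard genetic code needed for this example.
CODON_TABLE = {
    "GGA": "G", "GGC": "G", "GGG": "G", "GGT": "G",  # glycine
    "TGG": "W",                                      # tryptophan
}


def describe_substitution(ref_codon, alt_codon, residue_number):
    """Describe a single-base codon change and the resulting amino acid change."""
    if len(ref_codon) != 3 or len(alt_codon) != 3:
        raise ValueError("codons must be three bases long")
    diffs = [i for i, (r, a) in enumerate(zip(ref_codon, alt_codon)) if r != a]
    if len(diffs) != 1:
        raise ValueError("expected exactly one differing base")
    pos = diffs[0]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    return {
        "codon_position": pos + 1,                      # 1-based position in codon
        "base_change": f"{ref_codon[pos]}>{alt_codon[pos]}",
        "protein_change": f"{ref_aa}{residue_number}{alt_aa}",
        "nonsynonymous": ref_aa != alt_aa,
    }


print(describe_substitution("GGG", "TGG", 460))
```

For GGG to TGG at residue 460, the sketch reports a G>T change at the first codon position and the protein change G460W, which is non-synonymous.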

Fig. 1 The native protein structure with glycine (460) and mutant protein structure with tryptophan (460) for SNP rs4961

The amino acid change is also located in a disordered region of the protein, as predicted by DisEMBL. Thermodynamically, disorder in a polypeptide chain is defined as the random-coil structural state. The sequence of amino acids determines not only the structure of a protein but also the lack of structure [39]. Many proteins have regions that remain unstructured, which are referred to as disordered [40]. A large group of functional sites is found primarily in unstructured parts of proteins [39]. It is therefore essential to be able to predict which regions of a target protein are potentially disordered or unstructured [31], since a polymorphism occurring in such a region can alter the structure and function of the protein. rs4961 also shows SNP variability, with large differences in minor allele frequency observed among various populations. Population-wide allele frequencies and the study of SNP variation among populations are important for determining the relevance of a disease-associated polymorphism and can be an efficient way to identify genetic regions or genes implicated in complex diseases and traits [41, 42].

Our in silico analysis indicates that the SNP rs4961, which expresses the amino acid variant G460W, has a significant damaging effect and can be considered functionally important. It has been reported that the G460W genotype for rs4961 of the ADD1 gene is associated with erythrocyte sodium transport [43]. Manunta et al. have reported that the tryptophan (Trp) adducin variants are characterized by reduced fractional excretions of lithium and uric acid, which suggests increased proximal sodium reabsorption and thereby a risk for hypertension [7]. The structural analysis of the ADD1 protein shows G460W to be in the coiled and disordered region; hence, this polymorphism, which causes a change from an aliphatic to an aromatic amino acid, may alter the function of that region of the protein and affect its stability. SNPs that alter the amino acid sequence (nsSNPs) appear to affect the stability of protein structure [44]. The primary structure (sequence of amino acids) determines the 3D structure and shape of the protein, which in turn determines the function that the protein performs. Because the polymorphism in the ADD1 gene changes the amino acid sequence, a change in the protein 3D structure is observed, which may lead to an increase in tubular sodium reabsorption and the risk for hypertension [43]. Hypertensive patients carrying the 460W allele of the gene encoding the cytoskeleton protein α-adducin, when compared with those having the wild-type G460 variant, show enhanced proximal tubular reabsorption of sodium [45] and experience larger blood pressure changes in response to sodium loading or diuretic treatment [10, 46]. It has been reported that patients with the 460W allele display larger blood pressure changes as a result of the modulated capacity of tubular epithelial cells to transport sodium [7].

We have examined the effects of SNPs using a succession of tools. This methodology should be structured and proposed as a standard screening process for assessing the effects of SNPs on genes. Bioinformatics tools play an important role in extracting information from databases and in the subsequent analysis. They help in predicting the functional effect of SNPs associated with genes related to certain diseases. Bioinformatics tools and databases also include structure/function annotations of genes and proteins, disease correlations and population variations. However, one limitation of this analysis is that it might not take into account newly discovered SNPs that have not yet been included in dbSNP. Our in silico analysis may be useful for understanding the genetic mechanism underlying the development of the disease and can provide evidence for functional polymorphism when genetic profiles are compared among populations.

Based on the overall results of this study, we conclude that the mutation from glycine to tryptophan at residue position 460 of the native ADD1 protein is a potential candidate cause of ADD1-associated hypertension. This result is well supported by earlier experimental studies on the ADD1 gene [7, 9]. Thus, ADD1 can be considered as a ‘renal hypertensive gene’ that affects the capacity of the tubular epithelial cell to transport Na and, hence, affects blood pressure [7].