Introduction

C-reactive protein (CRP) is a major acute-phase protein released from hepatocytes in response to any kind of stress in the body (Black et al. 2004). Elevated CRP levels independently confer risk for metabolic syndrome, cardiovascular disease and type-2 diabetes mellitus (T2DM) worldwide, including among Indians, a high-risk group for metabolic disorders (Ridker et al. 2004; Mahajan et al. 2009, 2012). Several factors such as age, gender, ethnicity, obesity and lifestyle affect CRP level (Ford et al. 2003; Khera et al. 2005; Rojo-Martinez et al. 2013), but genetic factors also account for substantial heritability (\(\sim \)35–56%) (Pankow et al. 2001; Wessel et al. 2007).

Ample evidence exists for higher CRP levels in Indians than in Europeans, Caucasians, Chinese and Malays (Chambers et al. 2001; Chandalia et al. 2003; Khoo et al. 2011). Most large-scale whole-genome assays linking circulating CRP level to genes and regulatory regions involved in metabolic, weight homoeostasis and immune pathways were conducted in European populations (Benjamin et al. 2007; Reiner et al. 2008; Ridker et al. 2008; Elliott et al. 2009; Dehghan et al. 2011). Few genomewide association studies (GWAS) have included nonEuropean subjects (Okada et al. 2011; Reiner et al. 2012; Wu et al. 2012; Dorajoo et al. 2013; Kong and Lee 2013; Kocarnik et al. 2014). Some studies that included Indian subjects (Elliott et al. 2009; Dorajoo et al. 2013) identified known variants associated with CRP level in LEPR, IL6R, CRP, HNF1A, APOC1, APOE, GCKR, IL6 and IL1F10, but none of these Indian subjects were living in India. Indians differ from other populations of the world in food habits, lifestyle patterns and genetic heterogeneity (Tabassum et al. 2011, 2013; Giri et al. 2014), which may account for altered CRP level in resident Indians as compared to nonresident Indians. Such differences highlight the need for a separate genetic study of CRP level in this population. This may also help in understanding the genetics of complex disorders that are associated with CRP level (Plomin et al. 2009). Although we earlier queried known genetic determinants of CRP in Indians using a candidate gene-based approach (Mahajan et al. 2011), we did not examine the contribution of genomewide variants that could reveal population-specific genetic players regulating CRP.

The present study aims to investigate genomewide common variants associated with plasma CRP level in a large sample of 4493 Indian subjects of Indo-European origin.

Materials and methods

Study population

A total of 4493 apparently healthy subjects residing in and around Delhi, who were volunteers of the Indian Diabetes Consortium (INDICO) (INdian DIabetes Consortium 2011), were enrolled for the current study. Blood samples were collected after overnight fasting. Subjects were well characterized for biochemical and anthropometric measures as described previously (Mahajan et al. 2009). Plasma CRP was measured using an ELISA kit (Biocheck, USA) and Cobas Integra 400 Plus (Roche Diagnostics, Mannheim, Germany) via immunoturbidimetric assay. CRP values are reported in mg/L.

This study was approved by the human research ethics committees of All India Institute of Medical Sciences and CSIR-Institute of Genomics and Integrative Biology, Delhi, India, and was carried out in accordance with the principles of the Declaration of Helsinki. Informed written consent was obtained from all study subjects before their participation in the study.

Discovery phase

Discovery phase (stage I) subjects were genotyped as part of the T2DM GWAS conducted in our laboratory using Illumina Human 610-quad bead chips (Tabassum et al. 2013). Normoglycemic subjects, who were controls in the T2DM GWAS, were chosen for the current study. Data for the subjects were generated as described previously (Tabassum et al. 2013). Stringent sample and single-nucleotide polymorphism (SNP) quality controls were followed. Subjects with call rate <0.95, sex discrepancies or unusually low or high heterozygosity (3 standard deviations (SD) away from the mean value) were excluded. Further, SNPs with minor allele frequency (MAF) <0.01 were removed. Among SNPs with MAF >0.01, those with a call rate <0.99 or a Hardy–Weinberg equilibrium (HWE) P value \(<1\times 10^{-6}\) were excluded. Mitochondrial and sex chromosome SNPs were also discarded. SNP pruning was performed using the --indep-pairwise command with \(r^{2}\) of 0.2 and a window size of 50 in PLINK v1.07 (Purcell et al. 2007). Identity-by-descent analysis was performed using 118,496 independent pruned SNPs. Related subjects were detected based on a Pi-hat score >0.1875, and the member of each pair with the lower sample call rate was removed. Principal component analysis was used to detect population outliers. The first 10 principal components were used to identify 29 subjects as population outliers (6 SD away from the mean value), who were removed. We limited our analysis to samples with CRP <10 mg/L from subjects not taking lipid-lowering drugs. CRP values were inverse normalized using R (http://www.r-project.org/). Finally, association of 519,533 autosomal SNPs with transformed CRP values was assessed in 1097 participants using the linear regression model in PLINK. Age, sex, body mass index (BMI), smoking status, alcohol consumption status, tobacco chewing status and the first three principal components were used as covariates in the model.
Quantile–quantile (QQ) analysis was used to compare the theoretical and observed distributions of P values using the qqman package in R (Turner 2014).
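
The inverse normalization step can be illustrated with a short sketch. The text states only that CRP values were inverse normalized in R; the rank-based form below, the tie handling by average rank and the offset of 0.5 are assumptions made for illustration, and the function name inverse_normal_transform is hypothetical.

```python
from statistics import NormalDist


def inverse_normal_transform(values, offset=0.5):
    """Rank-based inverse normal transform (illustrative sketch).

    Each value is replaced by the standard normal quantile of
    (rank - offset) / (n - 2 * offset + 1). Tied values share their
    average rank. The offset of 0.5 is an assumed convention.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    nd = NormalDist()
    denom = n - 2 * offset + 1
    return [nd.inv_cdf((r - offset) / denom) for r in ranks]


if __name__ == "__main__":
    # Hypothetical CRP values in mg/L, used only to exercise the function.
    crp = [0.4, 2.1, 1.3, 9.8, 2.1]
    print(inverse_normal_transform(crp))
```

The transformed values keep the rank order of the raw measurements while following an approximately standard normal distribution, which suits a right-skewed trait such as CRP to linear regression.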

Validation phase and meta-analysis

Stage I signals for association with CRP level (\(P<10^{-4}\)), together with previously known variants, were tested in 3396 independent normoglycemic individuals for association with CRP level. The selected SNPs were genotyped along with variants for ongoing GWAS of multiple other biochemical traits using Illumina GoldenGate technology (Shen et al. 2005). Overall, 204 samples (6%) were genotyped as replicates to assess experimental accuracy (replication error <0.01 was observed). Samples with a poor call rate (<90%) were discarded. SNPs with GenTrain score <0.6, cluster separation score <0.4 or call rate <90% were excluded. SNPs with MAF <0.01 or HWE \(P<1\times 10^{-6}\) were also discarded prior to genetic analysis.

We also evaluated the combined association status of five cis-acting variants (proximal to the CRP gene) and 10 trans-acting variants (distal to the CRP gene) that were genotyped in stage I of the current study and had previously been genotyped in another independent Indo-European sample set by our group (Mahajan et al. 2011). Cis-acting variants were rs3093077, rs4131568, rs2794520, rs2592887 and rs12093699 in the CRP gene. Trans-acting variants were rs1892534, rs12753193, rs753947 (LEPR), rs4129267 (IL6R), rs2464196, rs1169300, rs735396 (HNF1A), rs7953249 (HNF1A-AS1), rs2075650 (APOE) and rs10778213 (ASCL1-C12orf42). We used 385 subjects from our previous study as the replication cohort for all these variants except rs1892534, for which we used 1026 subjects from the previous study. Meta-analysis was conducted with METAL using the fixed effect inverse variance method (Willer et al. 2010).
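
The fixed effect inverse variance method named above can be sketched as follows. This is a minimal illustration of the standard estimator, not METAL's own code; the function name fixed_effect_meta and the example inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def fixed_effect_meta(betas, ses):
    """Combine per-study effect estimates by inverse variance weighting.

    betas: per-study effect sizes for the same allele.
    ses:   matching standard errors.
    Returns the pooled beta, its standard error, the Z score and the
    two-sided P value from the standard normal distribution.
    """
    weights = [1.0 / se ** 2 for se in ses]  # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    z = beta / se
    p = 2.0 * NormalDist().cdf(-abs(z))  # lower tail keeps precision for small P
    return beta, se, z, p


if __name__ == "__main__":
    # Hypothetical stage I and stage II estimates for one SNP.
    print(fixed_effect_meta([0.30, 0.22], [0.10, 0.05]))
```

Because each study is weighted by the inverse of its variance, the larger stage II sample contributes more to the pooled estimate than stage I whenever its standard error is smaller.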

Power calculation

The statistical power of the study was calculated using Quanto software (http://biostats.usc.edu/software). An additive model of inheritance was assumed for allele frequencies ranging from 0.01 to 0.50. A two-tailed test at a significance level of 0.05 and effect sizes ranging from 0.0001 to 0.66, obtained from the literature, were used for the power calculation, which showed that the present study is sufficiently powered to detect association of genetic variants (figure 1 in electronic supplementary material at http://www.ias.ac.in/jgenet/). An average CRP level of 2.45 mg/L and an SD of 2.15 mg/L were used to calculate the power of the study. MAF was calculated from combined data for discovery and replication phase subjects.
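
For orientation, the dependence of power on allele frequency and effect size can be approximated with a normal-theory calculation. This sketch is not Quanto's algorithm. It assumes Hardy–Weinberg proportions, an additive per-allele effect expressed in the same units as the trait SD (mg/L here, which the text does not state), and a large sample; the function name approx_power and the example inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n, maf, beta, sd, alpha=0.05):
    """Approximate power of a two-sided test for an additive SNP effect.

    n:    number of subjects
    maf:  minor allele frequency
    beta: per-allele effect, in the same units as sd (assumed)
    sd:   trait standard deviation
    """
    nd = NormalDist()
    # Fraction of trait variance explained by the SNP under HWE.
    r2 = 2.0 * maf * (1.0 - maf) * beta ** 2 / sd ** 2
    if r2 >= 1.0:
        raise ValueError("effect size implies more than 100% variance explained")
    ncp = n * r2 / (1.0 - r2)  # noncentrality of the 1-df test
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    root = sqrt(ncp)
    # Probability that the test statistic falls in either rejection tail.
    return nd.cdf(root - z_crit) + nd.cdf(-root - z_crit)


if __name__ == "__main__":
    # Sample size and SD from the text; MAF and beta values are illustrative.
    for beta in (0.1, 0.3, 0.66):
        print(beta, round(approx_power(4493, 0.30, beta, 2.15), 3))
```

Under these assumptions, power falls towards the significance level as the effect size approaches zero and rises with sample size, allele frequency and effect size.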

Fig. 1 QQ plot comparing the distribution of \(-\log_{10} P\) values observed for association of SNPs with CRP level in stage I with the theoretical distribution expected under the null hypothesis of no association with the phenotype. The genomic inflation factor (\(\lambda \)) was calculated to be 1.003.

Phenotypic variance explained by SNPs

To calculate the proportion of heritability explained by the identified SNPs, the GCTA tool (Yang et al. 2011) was used. We performed restricted maximum likelihood (REML) analysis with a single genetic relationship matrix (GRM; --grm option) for quantitative traits. Inverse normalized CRP values were used as input for the phenotype. The analysis was adjusted for categorical covariates (sex, smoking status, alcohol consumption status and tobacco chewing status) and continuous covariates (age, BMI and the first three principal components). The REML summary indicated that all the identified SNPs together explained 4.07% of the phenotypic variance in the Indian population.

Pathway analysis of significant genes

To understand the metabolic pathways involving CRP-regulating genes, pathway analysis was performed with significant genes (meta-analysis \(P<0.05\)) using GeneMANIA (Montojo et al. 2010).

Table 1 Descriptive characteristics of the study participants.

Fig. 2 Manhattan plot for the SNPs associated with CRP level in stage I. Genomic positions of the respective SNPs are shown on the x-axis (NCBI build 37).

Table 2 Association status of previously reported variants genotyped in stage II with circulating CRP level (\(P<1\times 10^{-3}\)) in Indian subjects.

Results and discussion

Genomewide analysis for circulating CRP

The present study is the first GWAS for circulating CRP level in Indians and ascertains the role of previously described variants in CRP regulation. Detailed characteristics of the study population are presented in table 1. Figure 1 shows the QQ plot, in which the observed P values agree well with the distribution expected under the null hypothesis. The genomic inflation factor (\(\lambda \)) was computed to be 1.003, indicating no appreciable inflation of the observed P values due to type-1 errors. Figure 2 depicts the association P values for 519,533 SNPs among 1097 subjects in stage I. Variants rs2965498 (effect size \(=\) 1.91, \(P\,\hbox {value}=1.14\times 10^{-7}\)) and rs2913036 (effect size \(=\) 1.64, \(P\,\hbox {value}=4.48\times 10^{-7}\)), lying 261 and 250 kb upstream of POM121 transmembrane nucleoporin-like 12 (POM121L12), respectively, were the strongest novel signals in stage I (figure 2). These two variants were in strong linkage disequilibrium (LD) with each other in stage I (\(r^{2}=0.79\)). However, these signals did not replicate in stage II.
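
The genomic inflation factor can be computed from the association P values as sketched below. The text does not say which estimator was used; the median-based definition shown here is a common convention and is assumed for illustration, and the function name genomic_inflation is hypothetical.

```python
from statistics import NormalDist, median


def genomic_inflation(p_values):
    """Median-based genomic inflation factor (lambda).

    Each two-sided P value in (0, 1] is converted to a 1-df chi-square
    statistic, and the median statistic is divided by the median of the
    1-df chi-square distribution (about 0.4549).
    """
    nd = NormalDist()
    # chi2 = z^2, where z is the normal quantile at p / 2.
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    expected_median = nd.inv_cdf(0.75) ** 2
    return median(chi2) / expected_median


if __name__ == "__main__":
    # Evenly spaced P values mimic the null distribution; lambda is close to 1.
    null_p = [(i + 0.5) / 1001 for i in range(1001)]
    print(genomic_inflation(null_p))
```

A value close to 1, such as the 1.003 reported here, means the median test statistic matches its null expectation; values well above 1 would suggest residual stratification or other systematic bias.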

Signals with stage I \(P<10^{-4}\), along with previously reported variants for CRP level, were followed up in stage II in 3396 subjects. Among the previously reported variants taken for replication, rs1183910 (HNF1A) came close to genomewide significance (\(P=5.03\times 10^{-8}\)), followed by strong association of rs4420638 near the APOC1 gene (\(P=3.41\times 10^{-7}\)) (table 2).

Table 3 Variants associated with circulating CRP level in meta-analysis (\(P<1\times 10^{-3}\)) among Indian subjects.

Meta-analysis in 4493 subjects (1097 subjects in stage I and 3396 subjects in stage II) identified the HNF1A variant rs7310409 (\(P=5.28\times 10^{-11}\)) as the top signal, which crossed genomewide significance (table 3). This was followed by strong replication of other known variants, rs1892534 (\(P=5.87\times 10^{-5}\)) and rs3093077 (\(P=2.59\times 10^{-4}\)), in/near LEPR and CRP, respectively, as discovered in our previous study (Mahajan et al. 2011).

The associations of the HNF1A variants rs7310409 and rs1183910 discovered in the Indian population are consistent with earlier findings in several different populations (Reiner et al. 2008; Ridker et al. 2008; Elliott et al. 2009; Dehghan et al. 2011; Okada et al. 2011; Wu et al. 2012; Dorajoo et al. 2013; Kong and Lee 2013; Kocarnik et al. 2014). Our study was sufficiently powered to detect GWAS-level association of HNF1A variants for the first time in the Indian population. This finding is further supported by prior large-scale genetic studies in Asian-Indian subjects of the London Life Sciences Population cohort (LOLIPOP) (CRP effect (%) \(= -13.6\), \(P=1.2\times 10^{-17}\) for rs1183910) (Elliott et al. 2009) and in South-Asian Indians of the Singapore Indian Eye Study (SINDI) (beta \(=\) 0.038, \(P=0.003\) for rs7310409 and beta \(=\) 0.046, \(P=3.96\times 10^{-4}\) for rs1183910) (Dorajoo et al. 2013), where strong associations of HNF1A variants rs1183910 and rs7310409 were detected.

Fig. 3 CRP level variation with respect to genotypes observed for genomewide associated SNPs at the HNF1A locus. Average CRP level in individuals (y-axis) is plotted against genotype (x-axis) for the GWAS SNPs identified in the current study. The homozygous protective genotype with no CRP-increasing allele, the heterozygous genotype and the homozygous risk genotype with two CRP-increasing alleles are denoted as 0, 1 and 2, respectively.

Figure 3 shows the mean CRP level of individuals carrying each of the three genotypes (homozygous protective, heterozygous and homozygous risk) at the two GWAS variants. Genotype correlated strongly with mean CRP level (\(r=0.99\) for rs7310409 and rs1183910).

Other regulatory features of identified loci

Table 4 Association status of known GWAS signals for CRP in Indian population.

Using the publicly accessible HaploReg and RegulomeDB databases (Boyle et al. 2012; Ward and Kellis 2012), other functional regulatory features for GWAS signals and their LD SNPs were extracted. These SNPs carry promoter histone marks (H3K9ac and H3K4me3), enhancer histone marks (H3K27ac and H3K4me1) or both, and DNase hypersensitivity in several tissues including liver (table 1 in electronic supplementary material). Other CRP-associated variants were also found to harbour similar regulatory features (table 2 in electronic supplementary material). HNF1A is considerably expressed in liver and functions as a transcription factor. It has been shown to bind promoter regions of the CRP gene and regulate CRP synthesis in liver cells (Li and Goldman 1996). Additionally, mutations in this gene account for a substantial proportion of cases of maturity-onset diabetes of the young (MODY-3), which occurs early in life (<25 years) and is characterized by autosomal dominant inheritance and insulin secretory defects (Fajans et al. 2001). Based on overlap with publicly available datasets, the observed HNF1A variants and their LD partners showed several regulatory features such as histone marks, DNase I hypersensitive sites and binding sites for various proteins and factors of the transcription machinery in liver and other tissues. This suggests a functional role of these variants in regulating HNF1A transcription and expression. Altered HNF1A expression could also potentially alter the expression of its target genes. Indians are at high risk for metabolic syndrome, and we earlier demonstrated that CRP level is a good predictor for detecting Indian individuals with metabolic syndrome (Mahajan et al. 2012). HNF1A might therefore be a key regulator of CRP and hence strongly associated with CRP levels in the high-risk Indian population. The identified HNF1A variants together accounted for 1.11% of the variance in CRP level in Indians.
The strong association (\(P\sim 10^{-7}\)) near the apolipoprotein gene APOC1 (explained variance \(=\) 0.95%) in our study was also in agreement with studies linking this region with CRP level (Elliott et al. 2009; Okada et al. 2011).

Association status of known GWAS signals for CRP in Indian population

We evaluated the association status of variants previously described for CRP in the GWAS catalogue (Welter et al. 2014) and the literature among Indians and other ethnic populations (table 4). The effect sizes were converted to a uniform unit by an absolute conversion factor before comparison. Reported variants associated with CRP level (\(P<0.05\)) in Indians were rs1183910 and rs7310409 in HNF1A, rs4420638 near APOC1, rs1892534 and rs12753193 in LEPR, rs3093077 and rs4131568 in CRP, rs1800961 in HNF4A, rs4129267 in IL6R and rs157580 in APOE. The majority of these variants have higher effect sizes in Indians than in other ethnicities. Among the known significant variants detected in our population (\(P<0.05\)), we observed directional consistency of effect sizes between the present study and the source study for all loci except rs4131568 at CRP. Identification of known signals in the genetically heterogeneous Indian population indicates that the major variants associated with circulating CRP level are substantially similar across the globe. Accordingly, the genes identified for CRP regulation in Indians were genes of metabolic and immune-function relevance, consistent with earlier studies in other populations (Reiner et al. 2008; Ridker et al. 2008; Elliott et al. 2009; Dehghan et al. 2011; Okada et al. 2011; Wu et al. 2012; Dorajoo et al. 2013; Kong and Lee 2013; Kocarnik et al. 2014). Since CRP levels have been found to differ in individuals with different lifestyles (exercise, smoking, food habit etc.) (Rojo-Martinez et al. 2013), we also evaluated the effects of discovered loci in the presence of endogenous environmental factors. Here, we considered the effects of identified loci in smokers, alcoholics, tobacco chewers and nonvegetarians in comparison to their counterparts with healthy habits among Indian subjects (table 3 in electronic supplementary material).
The effect size of LEPR variant rs12753193 was markedly lower among nonsmokers (effect size \(=\) 0.48) than among smokers (effect size \(=\) 0.86). Further, the effect size of this variant was higher in alcoholics (effect size \(=\) 0.91) than in nonalcoholics (effect size \(=\) 0.53). LEPR variant rs1892534 had a higher effect size in alcoholics (effect size \(=\) 0.9) than in nonalcoholics (effect size \(=\) 0.59). This variant also had a higher effect size in nonvegetarians (effect size \(=\) 1.3) than in vegetarians (effect size \(=\) 0.47). HNF1A variant rs1183910 showed a notable difference in effect sizes for CRP level among alcoholics (effect size \(=\) 0.38) versus nonalcoholics (effect size \(=\) 0.26) and nonvegetarians (effect size \(=\) 0.53) versus vegetarians (effect size \(=\) 0.19). Similarly, the effect size of IL6R variant rs4129267 was higher in tobacco chewers (effect size \(=\) 1.09) than in nontobacco chewers (effect size \(=\) 0.56). Further, variant rs7310409 in HNF1A had a higher effect size among nonvegetarians (effect size \(=\) 0.53) than among vegetarians (effect size \(=\) 0.26). These results suggest that the inherent environment in India might also influence the effect of identified genetic loci for CRP in Indians. All identified variants together contributed nearly 4% of the variance in CRP level in the Indian population, which indicates a major contribution of several other genetic and epigenetic players in modulating circulating CRP level in this population.

Pathway enrichment analysis

Pathway analysis of significant genes in meta-analysis (\(P<0.05\)) demonstrated significant enrichment of genes involved in pancreas development (table 4 in electronic supplementary material). In an earlier study that included the current study samples, we demonstrated that higher CRP levels were associated with T2DM in a BMI-independent manner (Mahajan et al. 2009). The enrichment of CRP-associated genes in pancreas development detected in the current study probably signifies shared genetic heritability underlying CRP level and T2DM, the adult-onset and most common form of diabetes across the globe including India. This is in agreement with the hypothesis that complex disorders are outcomes of precursor quantitative traits (Plomin et al. 2009). It underscores the importance of studying quantitative trait genetics to better understand the mechanistic heterogeneity of complex disorders across different populations.

In summary, our study is the first GWAS conducted for circulating CRP levels in Indians. It confirmed known associations of common variants at genomewide scale in a large sample of the Indo-European population.