Introduction

Stroke risk varies among ethnic populations, with much higher incidence and mortality reported in East Asians [1]. In China alone, 1.6 million people die of stroke every year, making stroke the leading cause of death and adult disability in the country, and there are about 2.5 million new stroke patients and 7.5 million stroke survivors each year [2]. Besides the low control rates of modifiable risk factors, differences in genetic background have been proposed as a potential cause of the uneven distribution of stroke risk across geographic regions and ethnic populations [3], but the specific genes responsible for the extra risk have not been identified to date.

In recent years, genome-wide association studies (GWAS) have identified several ischemic stroke susceptibility loci, including 4q25, 16q22, 12p13, 7p21, and 6p21 [4–8]. A recent GWAS [9] associated rs505922, a single nucleotide polymorphism (SNP) located in the first intron of the ABO gene, with stroke of the large artery atherosclerotic (LAA) and cardioembolic types, but not with stroke of the small vessel disease type. This association between ABO gene polymorphisms and the risk of specific stroke subtypes was detected only in a Caucasian population and remains to be tested in other ethnicities. Increasing evidence suggests that genetic risks differ depending on ischemic stroke subtype [10]. LAA stroke is the most common subtype in China [11].

Therefore, to validate the findings of the previous Caucasian GWAS in a Chinese Han population and to search for additional susceptibility loci for LAA stroke, a case–control study consisting of 644 cases and 642 controls was performed.

Material and Methods

Ethics Statement

This study was approved by the Ethical Review Board of Jinling Hospital (Nanjing, China). Written informed consent was obtained from each enrolled participant.

Study Population

Patients were enrolled from the Nanjing Stroke Registry Program (NSRP), which has been described in previous publications [12]. Between March 2012 and December 2014, consecutive patients with first-ever stroke were assessed for eligibility. Inclusion criteria for the cases were Chinese Han ethnicity; LAA stroke classified according to TOAST and confirmed by imaging within 14 days of stroke onset; and age 18 years or older. Exclusion criteria were malignancies; severe heart, lung, liver, or kidney dysfunction; autoimmune and inflammatory diseases; and hematological diseases.

The controls were screened from local residents who had a regular physical examination. Residents were included if they were ethnically Chinese Han, aged 18 years or older, without history of atherosclerotic diseases, and without history of stroke or cardiovascular diseases.

Fine-Mapping Strategy

By searching HapMap release 27 of the merged phases 2 and 3 (http://hapmap.ncbi.nlm.nih.gov/), an LD block of 25.4 kb around the rs505922 SNP in Han Chinese was identified. Using Haploview software version 4.2 (http://www.broadinstitute.org/haploview/haploview), tag SNPs for fine-mapping were determined based on their ability to tag surrounding variants [13].

The tag SNPs (Table 1) were selected based on the following criteria: (1) an r² threshold of 0.8; (2) minor allele frequency (MAF) ≥10 %; (3) Hardy-Weinberg equilibrium test P value ≥0.05; and (4) overall genotyping rate ≥75 %. The index SNP, rs505922, was also included. As a result, ten tag SNPs (rs630014, rs505922, rs500499, rs8176668, rs575259, rs8176722, rs2073824, rs8176725, rs8176731, and rs8176740) were selected for genotyping. Figure 1a shows the rs numbers and relative positions of these ten tSNPs. Figure 1b shows their LD blocks in controls, which were calculated with Haploview 4.2.
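As a rough illustration of the pairwise LD measures behind tag SNP selection and the LD plot, the sketch below computes D, D′, and r² for two biallelic loci from haplotype and allele frequencies. The function name `ld_stats` and the example frequencies are hypothetical, and this is not the Haploview implementation; it only shows the standard definitions, with a tag SNP assumed to capture a variant when r² reaches the chosen threshold.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A (locus 1) and B (locus 2)
    p_a  : frequency of allele A at locus 1
    p_b  : frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b
    # D' rescales D by its largest attainable magnitude given the allele frequencies
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two loci
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical frequencies, for illustration only
R2_THRESHOLD = 0.8  # assumed to match criterion (1) above
d, d_prime, r2 = ld_stats(p_ab=0.28, p_a=0.30, p_b=0.32)
is_tagged = r2 >= R2_THRESHOLD
```

D′ close to 1 with a lower r² typically indicates that the two SNPs have different allele frequencies, which is why r² rather than D′ is the usual criterion for tagging.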

Table 1 Information on the ten selected SNPs in the ABO gene region in a Chinese population
Fig. 1 a ABO genomic organization. Exons are presented as gray blocks; linkage disequilibrium blocks are indicated by dotted boxes. The SNPs analyzed in the present study are indicated by arrows according to their genomic position. b LD plot of the ABO gene using 10 SNPs in 642 ethnic Han Chinese controls. This plot was generated by the Haploview program. Three blocks were determined. The rs numbers (top, from left to right) correspond to the SNP names, and the pairwise D′ value indicates the degree of LD between two SNPs

Blood Sample Collection and Genotyping

Genomic DNA was extracted with a DNA isolation kit. Genotyping for variants of the ABO gene was conducted by the Improved Multiple Ligase Detection Reaction (iMLDR) [14], and 5 % of samples were randomly selected for repeat genotyping to assess reproducibility, which showed 100 % concordance.

SNP–SNP Interactions

Multifactor dimensionality reduction (MDR) (version 3.0_0_2) and MDR-permutation testing (MDRpt, version 1.0_beta_2) were performed to evaluate SNP–SNP interactions in the risk of LAA stroke. In MDR, multilocus genotypes were classified into high- and low-risk groups. With this method, the multidimensional genotype variables were transformed into a single-dimensional variable [15]. The validity of this transformed single-dimensional multilocus genotype variable in predicting LAA stroke risk was evaluated with cross-validation and permutation testing [16].
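The reduction step described above can be sketched as follows. This is a minimal illustration, not the MDR software itself: each multilocus genotype combination is labeled high risk when its case:control ratio meets or exceeds the overall ratio in the sample, and the resulting one-dimensional label is scored by balanced accuracy. The function names, the 0/1/2 genotype coding, and the toy data are assumptions made for the example, and the accuracy here is computed on the same data used for labeling rather than on held-out cross-validation folds.

```python
from collections import Counter


def mdr_label_cells(genotypes, is_case):
    """Label each multilocus genotype combination as high (1) or low (0) risk."""
    n_case = sum(is_case)
    n_ctrl = len(is_case) - n_case
    threshold = n_case / n_ctrl  # overall case:control ratio
    cases, ctrls = Counter(), Counter()
    for g, y in zip(genotypes, is_case):
        (cases if y else ctrls)[tuple(g)] += 1
    labels = {}
    for cell in set(cases) | set(ctrls):
        # A cell containing cases but no controls is treated as high risk
        ratio = cases[cell] / ctrls[cell] if ctrls[cell] else float("inf")
        labels[cell] = 1 if ratio >= threshold else 0
    return labels


def balanced_accuracy(labels, genotypes, is_case):
    """Mean of sensitivity and specificity of the high/low-risk label."""
    tp = fn = tn = fp = 0
    for g, y in zip(genotypes, is_case):
        pred = labels.get(tuple(g), 0)  # unseen cells default to low risk
        if y and pred:
            tp += 1
        elif y:
            fn += 1
        elif pred:
            fp += 1
        else:
            tn += 1
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Toy two-SNP data (hypothetical): genotype codes per subject, then case status
toy_genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
toy_status = [1, 1, 1, 0, 0, 0, 0, 1]
toy_labels = mdr_label_cells(toy_genotypes, toy_status)
toy_accuracy = balanced_accuracy(toy_labels, toy_genotypes, toy_status)
```

In a cross-validated analysis, the labels would be learned on the training folds and `balanced_accuracy` evaluated on the held-out fold to obtain the testing accuracy.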

Among multilocus models, the best candidate interaction model was the one with the highest testing accuracy and cross-validation consistency (CVC). A true positive model is expected to have an estimated testing accuracy higher than 0.5. We performed MDR in a cross-validation framework that can assess the predictive ability of the models [17]. We used permutation testing with 1000 permutations to assess statistical significance. A model was considered statistically significant when the P value was less than 0.05.
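The logic of the permutation test can be sketched in a few lines: case and control labels are shuffled repeatedly, the statistic is recomputed on each shuffle, and the P value is the share of shuffles that match or exceed the observed value. The sketch below is an assumption-laden illustration rather than the MDRpt implementation; `carrier_rate_gap` is a hypothetical stand-in for the model's testing accuracy, and the add-one correction in the P value is one common convention, not necessarily the one MDRpt uses.

```python
import random


def carrier_rate_gap(genotypes, is_case):
    """Toy statistic: |carrier frequency in cases - carrier frequency in controls|."""
    case_g = [g for g, y in zip(genotypes, is_case) if y]
    ctrl_g = [g for g, y in zip(genotypes, is_case) if not y]

    def carrier_rate(gs):
        return sum(1 for g in gs if g > 0) / len(gs)

    return abs(carrier_rate(case_g) - carrier_rate(ctrl_g))


def permutation_p_value(statistic, genotypes, is_case, n_perm=1000, seed=0):
    """Empirical P value from shuffling case/control labels n_perm times."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = statistic(genotypes, is_case)
    shuffled = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 20 carriers who are all cases, 20 non-carriers who are all controls
toy_genotypes = [1] * 20 + [0] * 20
toy_status = [1] * 20 + [0] * 20
p_value = permutation_p_value(carrier_rate_gap, toy_genotypes, toy_status)
```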

Statistical Analysis

Power analyses were conducted with PS software, version 3.0.14 (available at http://www.mc.vanderbilt.edu/prevmed/ps). A fixed MAF of 30 %, a type I error probability of 0.05, and an odds ratio (OR) of 1.4 were used to estimate the power. With these settings, the sample of this study yielded a power of 81.2 % to detect an association between the cases and the controls.
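The order of magnitude of this power estimate can be checked with a normal approximation for comparing two proportions. The sketch below is not the algorithm used by PS software; it assumes that the 30 % figure is the exposure frequency in controls, derives the corresponding frequency in cases from the OR, and applies a two-sided z test with 644 cases and 642 controls. The function names are hypothetical.

```python
from math import erf, sqrt


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def case_control_power(p0, odds_ratio, n_cases, n_controls, z_alpha=1.959964):
    """Approximate power of a two-sided two-proportion z test.

    p0         : exposure frequency in controls (assumed here to be the MAF)
    odds_ratio : odds ratio to detect
    z_alpha    : critical value for a two-sided alpha of 0.05
    """
    # Exposure frequency in cases implied by p0 and the odds ratio
    p1 = odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)
    p_bar = (n_cases * p1 + n_controls * p0) / (n_cases + n_controls)
    se_null = sqrt(p_bar * (1.0 - p_bar) * (1.0 / n_cases + 1.0 / n_controls))
    se_alt = sqrt(p1 * (1.0 - p1) / n_cases + p0 * (1.0 - p0) / n_controls)
    # The contribution of the opposite rejection tail is negligible and ignored
    return norm_cdf((abs(p1 - p0) - z_alpha * se_null) / se_alt)


power = case_control_power(p0=0.30, odds_ratio=1.4, n_cases=644, n_controls=642)
```

Under these assumptions the approximation gives a power of roughly 0.81, in line with the value reported above.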

Hardy-Weinberg equilibrium was tested with a chi-squared goodness-of-fit test on the genotype frequency distribution of each SNP in control subjects. Haploview v4.2 was used to analyze linkage disequilibrium (LD) and identify haplotype blocks.
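For a single biallelic SNP, the Hardy-Weinberg goodness-of-fit test can be sketched as below. Expected genotype counts are derived from the observed allele frequency, and the chi-squared statistic is referred to a distribution with 1 degree of freedom. The function name and the genotype counts are hypothetical, and the sketch assumes a polymorphic SNP (both alleles observed).

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-squared goodness-of-fit test for Hardy-Weinberg equilibrium.

    Returns (chi2, p_value) for genotype counts AA, AB, BB.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-squared distribution with 1 degree of freedom
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts, for illustration only
chi2, p_value = hwe_chi_square(n_aa=280, n_ab=300, n_bb=62)
in_equilibrium = p_value >= 0.05
```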

Chi-squared tests were used to compare categorical variables between groups. Student's t test was used to compare continuous covariates. Logistic regression models were used to evaluate the association between SNP polymorphisms and risk of LAA stroke. The Hosmer–Lemeshow goodness-of-fit test was performed to assess calibration of each logistic regression model; a P value of more than 0.05 was deemed satisfactory. The discriminative power of the multivariate logistic regression models was evaluated with a receiver operating characteristic (ROC) curve.
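To make the OR and its 95 % confidence interval concrete, the sketch below computes a crude (unadjusted) OR from a 2×2 table using the log-OR standard error. This is a simplification: the ORs reported in this study come from multivariate logistic regression with covariate adjustment, which this sketch does not reproduce. The function name and the carrier counts are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.959964):
    """Crude odds ratio with a Wald 95 % confidence interval from a 2x2 table."""
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR)
    se = sqrt(1.0 / exposed_cases + 1.0 / unexposed_cases
              + 1.0 / exposed_controls + 1.0 / unexposed_controls)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical dominant-model table: minor-allele carriers vs non-carriers
odds_ratio, lower, upper = odds_ratio_ci(
    exposed_cases=300, unexposed_cases=340,
    exposed_controls=360, unexposed_controls=280)
is_protective = upper < 1.0  # whole interval below 1
```

An interval that lies entirely below 1, as in this toy table, corresponds to a statistically significant protective association at the 0.05 level.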

Haplotypes were reconstructed with PHASE software (version 2.1). In haplotype analysis, haplotypes with frequencies less than 0.01 were omitted. MDR was used to identify potential interactions among the ten tSNPs in their effect on risk of LAA stroke. A two-sided P value of less than 0.05 was deemed statistically significant. Statistical analyses were performed using IBM SPSS Statistics version 22.0 (Armonk, NY: IBM Corp.).

Results

Baseline Characteristics

The clinical and demographic characteristics of the cases and controls are shown in Table 2. The cases and controls were well matched on age (P = 0.877) and sex (P = 0.683). The prevalence of hyperlipidemia was also similar between cases and controls (P = 0.767). The cases had a higher prevalence of conventional risk factors for vascular diseases, including a history of hypertension, diabetes mellitus (DM), and smoking, and body mass index (BMI) also differed significantly between the cases and the controls (P < 0.01). Thus, these variables (age, sex, BMI, hypertension, DM, hyperlipidemia, and smoking) were adjusted for in the multivariate logistic regression analysis to evaluate the main effects of ABO gene polymorphisms on risk of LAA stroke.

Table 2 Characteristics of cases and controls

Association Between ABO SNPs and LAA Risk

The genotype distributions in the controls were consistent with Hardy–Weinberg equilibrium (Table 1). In the univariate analyses, the genotype frequencies of two SNPs differed significantly between the cases and the controls: rs8176668 (P = 0.005) and rs2073824 (P = 0.001, Table 3). For the rs8176668 A>T polymorphism, the frequencies of the AA, AT, and TT genotypes were 52.3, 39.1, and 8.6 %, respectively, in cases, and 43.3, 47.0, and 9.7 %, respectively, in controls. For the rs2073824 G>A polymorphism, the frequencies of the GG, GA, and AA genotypes were 38.0, 44.6, and 8.8 % in cases and 28.0, 52.5, and 19.5 % in controls. In multivariate logistic regression analysis, adjusted for age, sex, BMI, hypertension, DM, hyperlipidemia, and smoking, the AT genotype of rs8176668 had a significant protective effect on LAA stroke (OR = 0.71; 95 % CI, 0.55 to 0.92). The AA genotype of rs2073824 also had a protective effect on LAA stroke (OR = 0.65; 95 % CI, 0.46 to 0.93). In the dominant-effect model, both rs8176668 (TT/AT vs AA, OR = 0.721; 95 % CI, 0.57 to 0.92; P = 0.001) and rs2073824 (AA/GA vs GG, OR = 0.61; 95 % CI, 0.47 to 0.79; P < 0.001) were associated with LAA stroke. The goodness-of-fit test demonstrated good calibration (for rs8176668, P = 0.553; for rs2073824, P = 0.830). For the rs8176668 multivariate logistic regression model, the area under the ROC curve was 0.737 with a 95 % CI of 0.71–0.76. For the rs2073824 multivariate logistic regression model, the area under the ROC curve was 0.742 with a 95 % CI of 0.72–0.77.

Haplotype Block Structure and LD Analysis

As shown in Fig. 1, LD analysis identified three blocks in the ABO gene. Seven of the ten selected SNPs were included in these three blocks. SNPs rs505922 and rs500499 were located in LD block 1; rs575259, rs8176722, and rs2073824 were in LD block 2; and rs8176731 and rs8176740 were in LD block 3. In haplotype analysis, after adjusting for age, sex, BMI, hypertension, DM, hyperlipidemia, and smoking, haplotype TC (OR = 0.72; 95 % CI, 0.54 to 0.95) and haplotype ACA (OR = 0.73; 95 % CI, 0.58 to 0.91) were associated with a decreased risk of LAA stroke (Table 4).

Table 3 Association between tag SNPs and risk of LAA stroke
Table 4 Haplotype analysis between cases and controls

SNP–SNP Interactions

A total of ten tSNPs in the ABO gene were analyzed by MDR. Table 5 summarizes the MDR results for interactions among the ten tag SNPs in their influence on risk of LAA stroke. In the single-locus model, rs2073824 was the most influential contributor to LAA stroke risk, with a testing accuracy (TA) of 0.55 and a cross-validation consistency (CVC) of 10/10, which is consistent with the results of the logistic regression analysis. In the multilocus analysis, the best model remained a single-locus model incorporating the rs2073824 SNP (testing accuracy = 0.5469, CVC = 5/10, P < 0.05).

Table 5 SNP–SNP interactions analyzed with MDR

Discussion

This study found that rs8176668 and rs2073824 in the ABO gene were associated with reduced risk of LAA stroke in a Chinese population. The sample size provided 81.2 % power to detect an OR of 1.4 at a minor allele frequency of 0.3. These results differ from those of the recent Caucasian GWAS [9]. These findings suggest that the susceptibility locus for LAA stroke identified by the Caucasian GWAS might not play a role in LAA stroke in the Chinese population.

The discrepancy between studies may be explained in two ways. The first explanation lies in the genetic differences between Chinese and Caucasian populations. Allele frequencies of the ABO gene vary by geographic region and ethnicity [18]. A previous study identified three novel ABO alleles in the Chinese Han population [19]. Differences in minor allele frequencies between Chinese and Caucasian populations were also observed. The minor allele in CEU (Utah residents with Northern and Western European ancestry from the CEPH collection) becomes the major allele in CHB (Han Chinese in Beijing, China) (rs2073824, G allele, CEU = 0.372, CHB = 0.608). The second explanation is that rs505922 may be a marker polymorphism rather than a causal variant.

The SNP rs505922, which was associated with LAA stroke in the Caucasian GWAS, is located in the first intron of the ABO gene. This study identified two novel SNPs, rs8176668 and rs2073824, associated with LAA stroke. They are located in the first and the sixth introns of the ABO gene, respectively. The two SNPs identified in this study differ from the one detected in the Caucasian GWAS, but all three are located in the ABO gene. The ABO gene is located on the distal long arm of chromosome 9 (9q34.1-q34.2) [20] and consists of seven exons and six introns spanning about 18–20 kb. The ABO gene encodes a glycosyltransferase, which transfers monosaccharides to the cell-surface H antigens and forms the antigenic structure of the ABO blood groups [21]. Several studies [22, 23] have shown that the ABO gene is involved in the development of coronary artery disease, thrombotic events, and hemorrhagic stroke. Although introns do not encode protein, they play essential roles in mRNA processing and transport. Some introns are known to enhance gene expression through a process named intron-mediated enhancement. We speculate that SNPs rs8176668 and rs2073824 may influence the splicing process, giving rise to different splice products and reducing the protein product of the ABO gene.

MDR is a powerful method for analyzing gene–gene interactions and has been widely used in genetic association studies of complex diseases. This method is therefore suitable for exploring the relationship between SNP–SNP interactions and risk of LAA stroke. However, no significant SNP–SNP interaction among these ten SNPs was observed.

This study has limitations that need to be considered when interpreting the results. First, the cases and controls were recruited from hospitals, which might result in selection bias. Second, we provided evidence for an association between ABO gene polymorphisms and risk of LAA stroke, but the underlying molecular mechanisms are still unknown and need to be determined by cell or animal studies in the future. Furthermore, we did not exclude residents with a family history of cardiovascular disease or stroke in control selection, which might introduce extra bias. Controls with a family history of cardiovascular disease may have higher frequencies of genotypes associated with cardiovascular disease. Therefore, excluding controls with a family history of cardiovascular disease may decrease the likelihood of type II error (false negative).

In conclusion, rs8176668 and rs2073824 in the ABO gene may be associated with LAA stroke in the Chinese Han population and may serve as candidate biomarkers for LAA stroke susceptibility screening. However, the molecular mechanism by which the ABO gene influences LAA stroke risk warrants further study.