Introduction

Cancer is a malignant neoplasia characterized by uncontrolled cell proliferation. More than 200 different kinds of cancer can affect the human body [1]. Among them, gallbladder cancer (GBC) is the most common biliary tract cancer in the North Indian population and is characterized by high lethality, an aggressive nature, and a dismal prognosis [2]. Despite advances in chemoradiation, the treatment outcome in advanced GBC remains poor because of chemoresistance and recurrence arising from tumor cell heterogeneity. Cancer stem cells (CSCs) are present in heterogeneous tumors and contribute to tumor progression, chemoresistance, and recurrence [3].

Cancer stem cells are identified by the expression of the surface markers CD44, ALCAM, EpCAM, and CD133 and the molecular markers NANOG, SOX-2, LIN-28A, ALDH1A1, and OCT-4. The surface markers are transmembrane glycoproteins involved in the proliferation, invasion, and metastasis of cancer stem cells [4, 5]. CD44+CD133+ cells showed chemoresistance and CSC-like characteristics in gallbladder cancer cell lines [6, 7]. Our earlier report showed an important role of CD44 gene variants in GBC susceptibility [8]. ALCAM-ALCAM interactions play a role in the development and maintenance of tissue architecture and in tumor progression [9]. ALCAM has been considered a prognostic marker in breast cancer [10]. EpCAM+ cells have been reported to possess a tumor-initiating role [11]. EpCAM is overexpressed in different tumors, including those of the colon, lung, pancreas, breast, and ovary [12].

The molecular marker ALDH1A1 is an enzyme of the dehydrogenase family that participates in cellular detoxification and differentiation through the oxidation of intracellular aldehydes [13], while Oct-4, Sox-2, and Nanog are transcription factors responsible for the pluripotency of stem cells. Oct-4, Sox-2, and Nanog act cooperatively on gene expression to maintain the pluripotency of cancer stem cells [14]. LIN28 encodes a microRNA-binding protein that binds to let-7 microRNA and inhibits production of the mature let-7 microRNA in stem cells [15].

Many studies have reported significant associations of CSC gene polymorphisms with susceptibility and treatment response in cancers [16–18]. To date, only a limited number of studies have examined CSC genetic variants in GBC [8]. Therefore, in this study, we investigated 15 germline polymorphisms in CSC surface and molecular marker genes that have previously been associated with different cancers, to evaluate their influence on the risk and treatment outcomes of GBC in the North Indian population. In the future, some cancer stem cell genetic variants may act as potential predictive and prognostic markers for GBC.

Materials and methods

Recruitment of subjects

In this study, 610 histologically confirmed gallbladder cancer patients and 250 controls were recruited after giving their informed consent. All healthy controls were age-, gender-, and ethnicity-matched, had no gastrointestinal disorders, and belonged to the same geographical region. The study protocol was approved by the Ethics Committee of SGPGIMS. Two hundred GBC patients receiving adjuvant or neoadjuvant chemoradiotherapy, given sequentially for three cycles, were followed for treatment response. Radiotherapy was given by IMRT and 3D-CRT techniques. The chemotherapeutic drugs gemcitabine, 5-fluorouracil, and cisplatin were given in combination. Two hundred patients were followed up for drug toxicity according to the National Cancer Institute (NCI) Common Terminology Criteria for Adverse Events (CTCAE), version 3.0 (http://ctep.cancer.gov). Tumor response was assessed in 140 NACT-treated patients according to the Response Evaluation Criteria in Solid Tumors (RECIST 1.1). Details of patient enrollment and the treatment plan are shown in Fig. 1.

Fig. 1

Study plan flow diagram. GBC susceptibility group and follow-up group with treatment outcomes

Patients with static or progressive disease were considered nonresponders, while those with a complete or partial pathological response were considered responders. For chemotoxicity, grade 0–1 was considered mild toxicity and grade 2–4 moderate to severe toxicity. Hematological toxicity was assessed in terms of anemia, leucopenia, and thrombocytopenia. Overall survival was evaluated in 199 patients on the basis of their histopathological and clinical stages. Because treatment modality and clinical stage affect the survival pattern, these 199 patients were divided into three groups (postoperative, metastatic, and locally advanced) for survival analysis. The median follow-up period was 18 months.

Basic characteristics of the study population and clinical data of the followed-up patients are given in Tables 1 and 2, respectively.

Table 1 Characteristics of the subjects
Table 2 Characteristics of followed up patients

SNP selection

The most informative tag SNPs were selected with Haploview software 4.2, requiring a minor allele frequency (MAF) >5 % in the GIH and CEU populations of the HapMap project database. The sample size was calculated using QUANTO 1.1, and the study achieved 80 % power.

Genotyping

Venous blood (4 ml) was collected from patients, and the genomic DNA was extracted from peripheral blood using the standard salting out method [19]. Genotyping of the SNPs was carried out using the TaqMan allelic discrimination assay, ARMS-PCR, and PCR-RFLP. Details of genotyping methods are summarized in Table S1 (supplementary data).

Statistical evaluation

Descriptive statistics of patients were presented as means and standard deviations for continuous measures and as frequencies and percentages for categorical measures. Associations between genotypes and treatment outcomes were examined using binary logistic regression and expressed as odds ratios (OR) or risk estimates with 95 % confidence intervals (CI). Statistical analysis was done using SPSS statistical analysis software, version 20.0. Haplotype analysis was done using the SNPStats software (bioinfo.iconcologia.net/SNPstats). The global P value reflects the difference between the means associated with pairs of haplotypes. To reduce the chance of obtaining false-positive results (type I errors), Bonferroni correction was applied in subgroup analysis. The Bonferroni correction is an adjustment made to P values when several dependent or independent statistical tests are performed simultaneously on a single dataset. To perform the correction, we divided the critical P value (0.05) by the number of comparisons being made (0.05/2 = 0.025). The statistical power of the study was then calculated on the basis of this modified P value (0.025).
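As an illustration of the risk estimates described above, the following minimal Python sketch computes an unadjusted odds ratio with a Wald 95 % confidence interval from a 2 × 2 table, together with the Bonferroni-adjusted significance threshold. It is not the SPSS logistic regression used in this study and does not adjust for age or gender; the function names and the example counts are hypothetical.

```python
import math


def odds_ratio_ci(case_exp, case_unexp, ctrl_exp, ctrl_unexp, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    Arguments are the four cells of a 2 x 2 table (carriers and
    non-carriers among cases and controls). All counts must be non-zero.
    z = 1.96 corresponds to a 95 % interval.
    """
    odds_ratio = (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(
        1 / case_exp + 1 / case_unexp + 1 / ctrl_exp + 1 / ctrl_unexp
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def bonferroni_threshold(alpha, n_comparisons):
    """Critical P value after Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical counts for illustration only, not data from this study.
or_, lo, hi = odds_ratio_ci(20, 10, 10, 20)
threshold = bonferroni_threshold(0.05, 2)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}); threshold = {threshold}")
```

A covariate-adjusted estimate, as reported in the tables, would instead require fitting a logistic regression model with the genotype and the covariates as predictors.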

The functional effects of the polymorphisms were determined using the online F-SNP database (http://compbio.Cs.Queens.CA/F-SNP/). The F-SNP database provides integrated information about the functional effects of SNPs obtained from 16 bioinformatics tools and databases and reports its output as an FS score. The functional effects are predicted at the splicing, transcriptional, translational, and posttranslational levels. As such, the F-SNP database helps to identify and focus on SNPs with potentially pathological effects on human health.

Generalized multifactor dimensionality reduction (GMDR) analysis was done to evaluate gene-gene interactions with adjustment for quantitative and discrete covariates [20]. In GMDR, the number of permutation replicates determines the accuracy of the permutation-based P value. We used 1000 permutation replicates to assess significance.

The best gene-gene interaction model was obtained by pooling multi-locus genotypes into low-risk and high-risk groups and was chosen on the basis of the highest testing accuracy, cross-validation consistency (CVC), and permutation results. Models were considered statistically significant at the 0.05 level.
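The pooling step can be sketched in Python as follows. This is a simplified illustration of classical MDR, not the GMDR software used in this study: it omits covariate adjustment, cross-validation, and permutation testing, and it scores accuracy on the same data used to assign the labels. All function names and the toy data are hypothetical.

```python
from collections import Counter


def mdr_risk_labels(genotypes, status):
    """Label each multi-locus genotype cell as 'high' or 'low' risk.

    genotypes: list of tuples, one tuple of genotype codes per subject.
    status: list of 1 (case) or 0 (control), same order as genotypes.
    A cell is high risk when its case:control ratio exceeds the overall
    case:control ratio of the sample.
    """
    cases = Counter(g for g, s in zip(genotypes, status) if s == 1)
    controls = Counter(g for g, s in zip(genotypes, status) if s == 0)
    threshold = sum(cases.values()) / sum(controls.values())
    labels = {}
    for cell in set(cases) | set(controls):
        n_case, n_ctrl = cases[cell], controls[cell]
        # A cell containing cases but no controls is treated as high risk.
        ratio = n_case / n_ctrl if n_ctrl else float("inf")
        labels[cell] = "high" if ratio > threshold else "low"
    return labels


def balanced_accuracy(genotypes, status, labels):
    """Mean of sensitivity and specificity for the high/low-risk labels."""
    true_pos = sum(
        1 for g, s in zip(genotypes, status) if s == 1 and labels[g] == "high"
    )
    true_neg = sum(
        1 for g, s in zip(genotypes, status) if s == 0 and labels[g] == "low"
    )
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    return 0.5 * (true_pos / n_cases + true_neg / n_controls)


# Toy two-locus data (0 = wild type, 1 = variant carrier), hypothetical.
toy_genotypes = [(1, 1)] * 3 + [(0, 0)] + [(0, 0)] * 3 + [(1, 1)]
toy_status = [1] * 4 + [0] * 4
toy_labels = mdr_risk_labels(toy_genotypes, toy_status)
print(toy_labels, balanced_accuracy(toy_genotypes, toy_status, toy_labels))
```

In a full analysis, the labels would be learned on training folds and the accuracy measured on held-out folds, which is what the testing accuracy and CVC refer to.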

Survival curves were constructed using the Kaplan-Meier method, and differences between groups were tested by the log-rank test. Hazard ratios with 95 % confidence intervals were calculated using the Cox proportional hazards model. A more intuitive survival model was generated by recursive partitioning (rpart version 3.1). Direct and functional protein interactions were determined using the STRING database.
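For readers unfamiliar with the Kaplan-Meier method, the product-limit estimator can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in this study; the function names and the example follow-up times are hypothetical, and the log-rank test and Cox model are not included.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times: follow-up time for each patient (e.g. in months).
    events: 1 if death was observed at that time, 0 if censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= removed
    return curve


def median_survival(curve):
    """First event time at which survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


# Hypothetical follow-up data, not data from this study.
example_curve = kaplan_meier([2, 4, 4, 6, 9], [1, 1, 0, 1, 0])
print(example_curve, median_survival(example_curve))
```

Comparing two such curves, for example by genotype group, is what the log-rank test formalizes.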

Results

Association of genetic variants with GBC susceptibility

In this study, we compared the frequency distribution of polymorphisms in the cancer stem cell surface markers CD44, ALCAM, EpCAM, and CD133 and the molecular markers ALDH1A1, OCT-4, SOX-2, LIN-28A, and NANOG between healthy controls and GBC patients. All studied polymorphisms were in Hardy-Weinberg equilibrium in controls (P > 0.05).
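The Hardy-Weinberg check for a biallelic SNP can be illustrated with a chi-square goodness-of-fit test on the three genotype counts. The following Python sketch is a generic textbook version with one degree of freedom; it is an assumption that a test of this form was used here, and the function name and genotype counts are hypothetical.

```python
import math


def hwe_test(n_major_hom, n_het, n_minor_hom):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Returns (chi-square statistic, P value with 1 degree of freedom).
    """
    n = n_major_hom + n_het + n_minor_hom
    p = (2 * n_major_hom + n_het) / (2 * n)  # major allele frequency
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("monomorphic locus: HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_major_hom, n_het, n_minor_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical control genotype counts, not data from this study.
chi2, p_value = hwe_test(25, 50, 25)
print(f"chi2 = {chi2:.3f}, P = {p_value:.3f}")
```

A P value above 0.05 in controls, as reported for all SNPs in this study, indicates no detectable deviation from equilibrium.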

The ALCAM rs1157G>A and rs10511244T>C polymorphisms were significantly associated with GBC risk at the genotypic [rs1157 (AA), OR (CI) = 2.4 (1.09–5.29), P value = 0.03] and allelic levels [rs1157 (A), OR (CI) = 1.42 (1.07–4.89), P value = 0.016; rs10511244 (C), OR (CI) = 1.42 (1.02–4.96), P value = 0.036] (Tables 3 and S2).

Table 3 Frequency distribution of CSCs surface markers polymorphisms in GBC and controls (age and gender adjusted)

To determine the effect of multiple single nucleotide polymorphisms within a gene, haplotype analysis was done. Haplotypes with a frequency below the rare-haplotype threshold (<0.05 %) were excluded from analysis. Bonferroni correction was applied in subgroup analysis. The Ars1157Crs10511244 haplotype of ALCAM was significantly associated with increased risk in GBC patients [Table S3, OR (CI) = 2.40 (1.34–4.32), P value = 0.0035], but the significance was lost after Bonferroni correction. The ALCAM Ars1157Crs10511244 haplotype remained significant after Bonferroni correction (P < 0.025) in GBC patients harboring gallstones [Table S4, OR (CI) = 3.71 (1.90–7.26), P value = 1e−04*] and in female GBC patients [Table S5, OR (CI) = 3.62 (1.53–8.57), P value = 0.0035*].

Individual CD44 gene polymorphisms showed no significant differences in frequency distribution between GBC cases and controls, but the CD44 Crs13347Ars353639Ars187116Crs187115 haplotype [OR (CI) = 0.51 (0.27–0.95), P value = 0.033] was significantly associated with a lower risk of GBC. Here also, the significance was lost after Bonferroni correction (Table S3).

There were no significant associations of EpCAM rs1126497T>C, EpCAM rs1421T>C, CD133 rs3130C>T, CD133 rs2240688T>G, ALDH1A1 rs13959A>G, OCT-4 rs3130932T>G, SOX-2 rs11915160A>C, LIN-28A rs4274112T>C, and NANOG rs11055786T>C polymorphisms with GBC risk in terms of overall case control frequency distribution, gender stratification, and gallstone status (Tables 3, S2, S3, S4, and S5).

Gene-gene interaction for GBC susceptibility

GMDR analysis was used to evaluate gene-gene interactions in GBC susceptibility, adjusting for the covariates age, gender, tobacco use, and gallstone status.

We performed separate gene-gene interaction analyses for surface and molecular markers. Among surface markers, ALCAM rs1157 and EpCAM rs1126497 [Table 4, OR (CI) = 1.8 (1.17–6.78), P value = 0.005] formed the best significant interaction model for GBC susceptibility. Among molecular markers, OCT-4 rs3130932 and LIN-28A rs4274112 [Table 4, OR (CI) = 1.84 (1.18–5.88), P value = 0.007] formed the best significant interaction model for GBC susceptibility.

Table 4 GMDR: gene-gene interaction

Association of CSC genetic variants with treatment response and toxicity

Univariable analysis did not show a significant difference in surface or molecular marker variants between responders and nonresponders to chemoradiotherapy at the allele, genotype, or haplotype level (haplotypes, Table S6). Results were consistent after multivariable analysis. Hematological and gastrointestinal toxicity of treatment in GBC patients was also not associated with any particular CSC genetic variant (Tables S7 and S8).

GMDR analysis was used to evaluate gene-gene interactions with clinical outcome, adjusting for the covariates tumor stage, age, gender, and treatment modality. The best combination model for clinical outcome is given in Table 4.

For surface markers, we did not obtain a significant interaction model for treatment response. However, for molecular markers, the best significant model for treatment response was NANOG rs11055786, OCT-4 rs3130932, and SOX-2 rs11915160 [Table 4, OR (CI) = 5.6 (1.71–18.3), P value = 0.003]. ALDH1A1 rs13959 [Table 4, OR (CI) = 3.0 (1.25–7.47), P value = 0.016] emerged as the best model for hematological toxicity.

Associations between the CSCs genetic variants and GBC survival

Kaplan-Meier survival curves were used to assess the associations between the CSC polymorphisms and survival time. Survival was calculated separately in the three patient groups (postoperative, metastatic, and locally advanced). The median survival period for metastatic patients was 9.5 months. Metastatic GBC cases with the ALCAM rs1157 GG genotype showed a higher survival rate than those with the GA+AA genotype (log-rank p = 0.026) (Fig. 2). In the Cox proportional hazards model, the ALCAM rs1157 GA+AA genotype in GBC cases was associated with a higher hazard ratio (HR = 1.7, CI = 1.03–2.84, Table 5) than the GG genotype. Kaplan-Meier survival curves did not show any significant difference in survival rate between wild-type and variant genotypes for the other CSC polymorphisms. Classical recursive partitioning is a statistical method that builds a decision tree, here using the log-rank test as the split criterion, to classify patients according to their survival times and Kaplan-Meier curves. Tree-structured survival analysis of our data showed that the ALCAM (GA+AA) end node had the highest hazard ratio and the shortest survival period compared with CD133 rs3130C>T and OCT-4 rs3130932 (Figs. 3 and 4).

Fig. 2

Kaplan-Meier survival curve with ALCAM rs1157 in metastatic GBC patients

Table 5 Cox-proportional hazard model
Fig. 3

Decision tree constructed by recursive partitioning analysis

Fig. 4

Kaplan-Meier survival curve of the risk groups based on a specific gene variant profile including CD133 rs3130G>T, OCT-4 rs3130932, and ALCAM rs1157 G>A

Discussion

Cancer stem cells are responsible for tumor growth, migration, invasion, aggressiveness, resistance, and pluripotency of a tumor. This study was conducted to determine the role of cancer stem cell gene variants in gallbladder cancer susceptibility and treatment outcomes.

CSC genetic variants and GBC susceptibility

Our results, obtained by analyzing 610 GBC patients and 250 controls, demonstrated that the ALCAM rs1157G>A and rs10511244T>C polymorphisms were significantly associated with increased risk of GBC in genotype and haplotype analyses. On the other hand, the CD44 Crs13347Ars353639Ars187116Crs187115 haplotype was significantly associated with a protective effect. ALCAM and CD44 are cell surface adhesion molecules that mediate cell-cell and cell-substrate adhesion interactions [9, 21]. A previous study also showed the ALCAM rs1157 homozygous variant to be associated with breast cancer susceptibility [22]. ALCAM rs1157G>A and ALCAM rs10511244T>C are located in the 3′UTR and intron 1 of the gene, respectively. On the basis of F-SNP predictions, we hypothesize that the variant alleles of ALCAM rs1157G>A and ALCAM rs10511244T>C may be involved in transcriptional regulation (Table 6). The variant alleles may increase ALCAM expression, which may disrupt adhesion interactions and thereby disturb tissue architecture [23].

Table 6 In silico analysis

In our study, the CD44 Crs13347Ars353639Ars187116Crs187115 haplotype conferred a lower risk of GBC, which reconfirms our earlier observation in a smaller number of samples [8]. However, the CD44 rs187116–rs187115 T-A haplotype has been reported to confer a higher risk of gastric adenocarcinoma [24]. The CD44 rs187115 variant genotype was also correlated with the risk of oral cancer development [25]. CD44 has been reported to be highly expressed in subserous gallbladder carcinoma [26], and overexpression of a CD44 protein variant was related to histologic dedifferentiation of gallbladder carcinoma [27]. Several mechanistic studies have shown a dual nature of CD44 following interaction with its ligand hyaluronic acid. In silico analysis using F-SNP predicted a change in transcriptional regulation for the rs13347, rs353639, and rs187115 SNPs (Table 6), supporting an influence of the selected CD44 genetic variants on gene transcription and splicing mechanisms.

On applying logistic regression and haplotype analysis, we did not find a significant association of the EpCAM rs1126497T>C, EpCAM rs1421T>C, CD133 rs3130C>T, CD133 rs2240688T>G, ALDH1A1 rs13959A>G, OCT-4 rs3130932T>G, SOX-2 rs11915160A>C, LIN-28A rs4274112T>C, and NANOG rs11055786T>C polymorphisms with GBC risk, either in the overall frequency distribution or after stratification by gender and gallstone status. Some other studies showed that EpCAM rs1126497 CT+TT and OCT-4 rs3130932T>G were associated with breast cancer susceptibility, whereas no association was observed with EpCAM rs1421 [28, 29].

In genetic association studies involving low-penetrance genes, the effect of individual genes on disease risk is rather limited, and gene-gene interactions have been proposed as important contributors to risk assessment. In gene-gene interaction analysis by GMDR with adjustment for covariates, we found that the surface marker SNPs ALCAM rs1157 and EpCAM rs1126497 formed the best significant interaction model, associated with increased GBC susceptibility. F-SNP predicted a functional role of EpCAM rs1126497 in splicing regulation. STRING 10 also suggested a protein-protein interaction between ALCAM and EpCAM (Fig. 5). Together, these observations suggest an interacting role of ALCAM and EpCAM in GBC susceptibility.

Fig. 5

STRING 10 protein-protein interactions for cancer stem cell surface marker

Likewise, GMDR analysis predicted an interactive role of the molecular marker SNPs OCT-4 rs3130932 and LIN-28A rs4274112 in GBC susceptibility. Lin-28a is an mRNA-binding protein that facilitates posttranscriptional regulation of the OCT-4 protein [30]. Our previous study also reported an association of LIN-28A rs4274112 (AG+GG) with lymph node metastases and of OCT-4 rs3130932 with hormone receptor-positive tumors in breast cancer [31]. These findings point to a new level of gene-gene interaction in gallbladder cancer susceptibility.

CSCs genetic variants and GBC treatment response

In both univariable and multivariable logistic regression and in haplotype analyses, none of the CSC polymorphisms was significantly associated with GBC treatment response. Even in GMDR analysis, CSC genetic variants in surface markers were not significantly associated with treatment response or toxicity. In one previous study, ALCAM rs1157 likewise showed no statistically significant association with recurrence of 5-FU-treated colon cancer [32], while in another study it was associated with colon cancer recurrence [17]. The CD44 rs187116–rs187115 T-A haplotype conferred a lower risk of gastric adenocarcinoma recurrence [24]. The relationship between CD44 expression and tumor prognosis is controversial. Certain studies have indicated that overexpression of CD44 is correlated with elevated chemoradiotherapy resistance and an increased risk of recurrence [33], whereas others have related poor prognosis to CD44 downregulation in tumor cells [34]. For EpCAM rs1126497T>C and rs1421T>C, our results on treatment response and toxicity are in line with a previous study that also did not show any significant effect of these variants on prognosis in non-small cell lung cancer [35].

Variants in molecular marker genes showed an interactive association with treatment outcomes. In GMDR analysis, the SNPs SOX-2 rs11915160, OCT-4 rs3130932, and NANOG rs11055786 together formed the best interaction model for predicting poor response to NACT in GBC. The cooperative Sox-Oct cis-regulatory element is essential for Nanog-mediated pluripotency. STRING 10 analysis showed interactions among the SOX-2, OCT-4, and NANOG proteins (Fig. 6). Our earlier report also suggested an interactive role of SOX-2 rs11915160, OCT-4 rs3130932, and NANOG rs11055786 in treatment outcomes in breast cancer [31].

Fig. 6

STRING 10 protein-protein interactions for cancer stem cell molecular markers

ALDH1A1 rs13959 emerged as the best model for higher-grade (grade 3–4) hematological toxicity. ALDH1A1 belongs to the dehydrogenase family and plays a role in the detoxification of active cyclophosphamide metabolites. Our findings suggest that the ALDH1A1 rs13959A>G genetic variation may affect the risk of severe hematological toxicity caused by myelosuppressive chemotherapy drugs. ALDH1A1 rs13959A>G is a synonymous polymorphism located in exon 3, and F-SNP predicted that it changes splicing regulation (Table 6). The variant allele could thus affect ALDH enzyme activity, resulting in an increased risk of chemotherapeutic drug toxicity. The A-A haplotype of the ALDH1A1 polymorphisms rs3764435C>A–rs63319C>A was also reported to be associated with grade 3–4 hematological toxicity in breast cancer [36].

CSC genetic variants and GBC survival

Survival analysis showed that the ALCAM rs1157 GA+AA genotype is associated with unfavorable survival in metastatic GBC cases. Cox proportional hazards regression analysis demonstrated that the GA+AA genotype had a higher hazard ratio. A Swedish population study showed similar results: ALCAM rs1157 homozygous minor allele carriers had worse survival in breast cancer [37]. CD44, EpCAM, CD133, NANOG, SOX-2, LIN-28A, OCT-4, and ALDH1A1 polymorphisms were not significantly associated with overall survival. However, the CD44 rs187116G>A–rs187115T>C T-A haplotype and the CD44 rs13347 variant genotype have previously been reported to be associated with a lower survival rate in breast cancer patients [24, 38]. The terminal node of the survival decision tree identified ALCAM GA+AA as the subgroup with the worst survival and the highest hazard ratio.

These data collectively suggest an important biological role of CSC genetic variants in the susceptibility to and prognosis of gallbladder cancer. However, functional studies are needed to elucidate the effects of CSC genetic variants in various cancers. In the future, targeting cancer stem cells may prove important in defeating gallbladder cancer.

In conclusion, our findings suggest an important role of CSC genetic variants in GBC susceptibility, prognosis, and survival outcomes.