Introduction

Breast cancer, the most common cancer among women worldwide, shows low incidence rates in African and Asian populations compared with high incidence rates in North American and Western European populations (more than 10-fold differences). Age-standardized breast cancer incidence rates in India range from 15 to 29 per 100,000 and have shown an increasing trend over the last two decades [1]. In Northeast India, breast cancer incidence rates in Aizawl district, Kamrup, and Imphal range from 14.6 to 19.6 per 100,000 women [2].

The Northeast Indian population reportedly has a distinct culture and distinct food habits, including extensive tobacco consumption. A high incidence of esophageal, gastric, and oral cancers associated with both smoking and smokeless tobacco has been documented in this region [3]. A strong association of breast cancer risk with betel quid consumption, and its potential to cause chromosomal damage and genetic alterations, have also been reported [4].

Tobacco carcinogens, particularly organic compounds, are lipophilic and therefore become concentrated in breast tissue. Once activated by enzymes expressed in breast epithelial cells, these carcinogens generate reactive oxygen species, oxidized bases, and bulky DNA adducts, and cause DNA double-strand breaks [5]. In addition, constant mutagenic assaults lead to deletions, amplifications, and mutations of critical genes [6]. DNA repair and cell cycle checkpoints are the two primary defense mechanisms against mutagenic exposures. Homologous recombination (HR) repair is a fundamental process that maintains genomic stability in cells. RAD51 (RAD51 recombinase), the central protein of the HR repair pathway, binds to DNA and promotes ATP-dependent homologous pairing and strand transfer reactions [7]. The interaction between RAD51 and BRCA2 (breast cancer 2, early onset) is critical for the cellular response to DNA damage. BRCA2, which is also implicated in DNA repair, modifies the DNA-binding activity of RAD51 by preventing the formation of multimeric RAD51 complexes [8]. TP53 (tumor protein p53) mediates an SOS response comprising apoptosis, cell cycle arrest, and DNA repair [9]. Cyclin D1 (CCND1) is a key regulatory protein of the G1/S cell cycle checkpoint, which monitors for unrepaired DNA damage [10]. It is therefore important to investigate the contribution of common genetic variations in these genes to breast cancer risk in relation to DNA damage arising from environmental risk habits. The present study examined the role of the TP53 72Arg>Pro, RAD51 135G>C, BRCA2 −26G>A, and CCND1 G870A polymorphisms and of BRCA2 gene mutations, and their interactions with environmental risk factors, in breast cancer risk in the Northeast Indian population.

Study subjects

Two hundred five histopathologically diagnosed incident breast cancer cases, recorded between December 2006 and 2009, who were willing to participate in the study were included. They were registered at Dr. Bhubaneswar Borooah Cancer Institute, Guwahati, and Civil Hospital, Aizawl, the collaborating centers in Northeast India. Two hundred seventeen voluntary, age- (±5 years) and sex-matched controls were selected from unrelated attendants who accompanied the cancer patients; this provided a readily available source of controls from the same socioeconomic background as the cases, reducing confounding bias. All cases and controls were residents of the northeastern part of India and belonged to the same ethnicity. Demographic data and characteristics such as age, sex, smoking habit, tobacco and betel quid use, and alcohol consumption were obtained from subjects using a standard questionnaire. Only patients with the breast as their primary site of cancer were included. Controls were selected on the basis of no history of any obvious systemic or infectious disease. All subjects provided written informed consent for participation in this research, which was conducted under a protocol approved by the institutional ethics committee of the Regional Medical Research Centre, North East Region (Indian Council of Medical Research). Smokers, chewers, and drinkers were classified into two categories: ever and never. Five milliliters of blood was collected in EDTA vials and stored at −70 °C until processed.

Genotyping assays

A 5-ml venous blood sample was collected from each participant in EDTA-coated vials. The blood was stored at −20 °C and transported to the Institute where the study was performed. Genomic DNA was isolated by the standard phenol–chloroform method. Genotyping for the TP53 72Arg>Pro, CCND1 G870A, and RAD51 135G>C polymorphisms was performed in 205 Northeast Indian breast cancer cases and 217 matched controls. Polymorphisms were analyzed by PCR–restriction fragment length polymorphism (PCR-RFLP) [11, 12]. Cases and controls were screened for mutations in all 27 exons of the BRCA2 gene by denaturing high-performance liquid chromatography (DHPLC), and any sequence variation was confirmed by sequencing. The −26G>A polymorphism in exon 2 of the BRCA2 gene was also analyzed by DHPLC. In addition, about 10 % of the samples, selected at random, were rechecked by the same method (multiplex PCR or PCR-RFLP).

Statistical analysis

Differences in the distribution of demographic characteristics and genotype frequencies between cases and controls were evaluated using the chi-square (χ²) test. Hardy–Weinberg equilibrium (HWE) was assessed using the χ² test. The breast cancer risk conferred by genotypes and other covariates was estimated by multivariable conditional logistic regression (LR). For all tests, a two-sided p ≤ 0.05 was considered statistically significant. Data analysis was performed with the SPSS version 16 software package.
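The HWE check described above compares observed genotype counts with those expected from the sample allele frequency. The following is a minimal sketch of the standard 1-degree-of-freedom χ² goodness-of-fit test for a biallelic locus, not the SPSS procedure the study used; the function name `hwe_chi_square` and the genotype counts are hypothetical and do not come from this study.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium at a biallelic locus.

    n_aa, n_bb: counts of the two homozygous genotypes; n_ab: heterozygote count.
    Returns (chi-square statistic, p value) with 1 degree of freedom.
    Assumes both alleles are observed (otherwise an expected count is zero).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # allele A frequency estimated from the sample
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # HWE proportions p^2, 2pq, q^2
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical control genotype counts (illustrative only, not study data).
stat, pval = hwe_chi_square(60, 100, 57)
print(f"chi2 = {stat:.3f}, p = {pval:.3f}")
```

A p value above 0.05 would be read as no evidence of departure from HWE.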

Multifactor dimensionality reduction analysis

The multifactor dimensionality reduction (MDR) software (version 2.0 beta8) was applied to identify high-order gene–gene interactions associated with breast cancer risk. MDR is a nonparametric, genetic-model-free method that overcomes the sample size limitations of LR in detecting and characterizing gene–gene interactions. In MDR, multilocus genotypes are pooled into high-risk and low-risk groups, reducing the predictors from n dimensions to one dimension (i.e., constructive induction). The new one-dimensional multilocus variable is evaluated for its ability to classify and predict disease status through cross-validation and permutation testing. The significance of each model as a predictor of disease status was derived empirically from 1,000 permutations, which accounts for multiple comparisons and helps identify false positives. MDR permutation results were considered statistically significant at the 0.05 level [13].
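The constructive-induction step described above can be illustrated with a short sketch, assuming the usual MDR rule that a multilocus cell is labeled high-risk when its case:control ratio meets or exceeds the overall case:control ratio. The function names and the toy genotypes are hypothetical; the sketch omits the cross-validation folds and the 1,000-fold permutation testing that the MDR software performs, and it is not the MDR software's code.

```python
from collections import Counter


def mdr_high_risk_cells(genotypes, status):
    """Pool multilocus genotype cells into a high-risk set (MDR constructive induction).

    genotypes: one tuple per subject holding that subject's multilocus genotype
    status: 1 for a case, 0 for a control
    """
    cases = Counter(g for g, s in zip(genotypes, status) if s == 1)
    controls = Counter(g for g, s in zip(genotypes, status) if s == 0)
    # Threshold T is the overall case:control ratio in the data.
    threshold = sum(cases.values()) / sum(controls.values())
    high_risk = set()
    for cell in set(cases) | set(controls):
        n_case, n_ctrl = cases[cell], controls[cell]
        # High-risk if the cell's ratio meets or exceeds T (cells with no controls included).
        if n_ctrl == 0 or n_case / n_ctrl >= threshold:
            high_risk.add(cell)
    return high_risk


def balanced_accuracy(genotypes, status, high_risk):
    """Mean of sensitivity and specificity of the one-dimensional high/low-risk variable."""
    tp = fn = tn = fp = 0
    for g, s in zip(genotypes, status):
        predicted_case = g in high_risk
        if s == 1 and predicted_case:
            tp += 1
        elif s == 1:
            fn += 1
        elif predicted_case:
            fp += 1
        else:
            tn += 1
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


# Hypothetical two-locus genotypes (TP53 codon 72, RAD51 135G>C); not study data.
x, y = ("Arg/Arg", "GC"), ("Pro/Pro", "GG")
genos = [x, x, x, x, y, y, y, y]
status = [1, 1, 1, 0, 1, 0, 0, 0]
cells = mdr_high_risk_cells(genos, status)
print(cells, balanced_accuracy(genos, status, cells))
```

In the full procedure, the high-risk cells are learned on the training folds and balanced accuracy is measured on the held-out fold, which corresponds to the testing balanced accuracy reported in the Results.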

Interaction entropy graphs

Interaction graphs were built with the Orange machine learning software package to visualize and interpret the MDR results. Entropy estimates were used to determine the information gain about a class variable (e.g., case–control status) obtained by merging two variables together. Such estimates are useful for building interaction graphs that facilitate interpretation of the relationships between variables. An interaction graph comprises a node for each variable, with pairwise connections between the nodes. The percentage of entropy removed (i.e., information gain) by each variable is shown at its node, and the percentage of entropy removed by each pairwise Cartesian product of variables is shown on the corresponding connection. Thus, the independent main effect of each single nucleotide polymorphism (SNP) can be compared with the interaction effect [14].
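The entropy quantities described above can be sketched as follows, assuming that each node reports the mutual information I(A;C) between a variable and case–control status C, and each edge reports the interaction gain I(AB;C) − I(A;C) − I(B;C) for the Cartesian product AB, each expressed as a percentage of H(C). This is an illustrative reimplementation, not Orange's code, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain(x, y):
    """Mutual information I(X;Y) = H(Y) - H(Y|X)."""
    n = len(y)
    h_cond = 0.0
    for xv, cnt in Counter(x).items():
        subset = [yv for xi, yv in zip(x, y) if xi == xv]
        h_cond += (cnt / n) * entropy(subset)
    return entropy(y) - h_cond


def interaction_gain(a, b, y):
    """Entropy removed by the pair beyond the sum of the two main effects."""
    ab = list(zip(a, b))  # Cartesian product of the two attributes
    return info_gain(ab, y) - info_gain(a, y) - info_gain(b, y)


def percent_of_class_entropy(gain, y):
    """Express an information gain as a percentage of H(case-control status)."""
    return 100.0 * gain / entropy(y)


# Toy XOR pattern: no main effects, purely interactive (hypothetical data).
a, b, y = [0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]
print(percent_of_class_entropy(info_gain(a, y), y),
      percent_of_class_entropy(interaction_gain(a, b, y), y))
```

A positive interaction gain indicates that the two variables together remove more entropy than their separate main effects, which is the pattern the Results describe for TP53 and BRCA2 exon 2.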

Classification and regression tree analysis

Classification and regression tree (CART) analysis was performed using SPSS version 16 to build a decision tree. The tree was created by repeatedly splitting a node into two child nodes, beginning with the root node, which contains the whole sample. Before constructing the tree, we chose the Gini criterion as the measure of goodness of split; splits were selected that maximize the homogeneity of the child nodes with respect to the value of the target variable. After the tree was grown to its full depth, it was pruned to avoid overfitting the model. Finally, the risk associated with the genotype combinations in the terminal nodes was evaluated by LR analysis. The odds ratios (ORs) and 95 % CIs were adjusted for age, with the terminal node containing the lowest percentage of cases as the reference [13].
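The Gini splitting criterion mentioned above can be illustrated with a small sketch: for one categorical predictor, every division of its categories into two groups is tried and the one with the largest decrease in weighted Gini impurity is kept. This is an assumption-laden illustration (hypothetical function names and data, no pruning, a single predictor), not the SPSS implementation.

```python
from itertools import combinations


def gini(labels):
    """Gini impurity of a binary outcome (1 = case, 0 = control)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n  # proportion of cases in the node
    return 1.0 - p ** 2 - (1 - p) ** 2


def gini_decrease(feature, labels, left_values):
    """Impurity decrease when subjects whose category is in left_values go to the left child."""
    left = [y for x, y in zip(feature, labels) if x in left_values]
    right = [y for x, y in zip(feature, labels) if x not in left_values]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


def best_split(feature, labels):
    """Exhaustively try every binary grouping of the categories of one predictor."""
    categories = sorted(set(feature))
    best = None
    for r in range(1, len(categories)):
        for left in combinations(categories, r):
            gain = gini_decrease(feature, labels, set(left))
            if best is None or gain > best[1]:
                best = (set(left), gain)
    return best


# Hypothetical CCND1 genotypes and case status (not study data).
ccnd1 = ["GG", "GG", "GA", "AA"]
status = [1, 1, 0, 0]
print(best_split(ccnd1, status))
```

A full CART grows the tree by applying this search to every predictor at every node and then prunes back branches that do not improve cross-validated error.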

False-positive report probability

Reports of gene–environment interaction studies are often challenged by false-positive findings, especially when results are generated by multiple comparisons. To estimate the false-positive report probability (FPRP) and to evaluate the robustness of the findings from MDR analysis, we used the Bayesian approach described by Wacholder et al. Considering the scarce epidemiological data for the study population and the inconsistent association of these SNPs with breast cancer risk, we set a fairly wide range of prior probabilities (10⁻⁶ to 10⁻¹), with statistical power estimated to detect an OR of 0.1, 0.2, and 0.3 and an α level equal to the observed p value. The FPRP cutoff was set at 0.5 [14].
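The Wacholder et al. calculation referred to above combines the observed p value (α), the power (1 − β) to detect a specified OR, and the prior probability π that the association is real: FPRP = α(1 − π) / [α(1 − π) + (1 − β)π]. The sketch below assumes that the standard error of ln(OR) is recovered from a reported 95 % CI and that power is computed for a two-sided test at level α; the function names and all numeric inputs are hypothetical and are not values from Supplementary Table 3.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def se_from_ci(lower, upper, level=0.95):
    """Standard error of ln(OR) recovered from a symmetric (log-scale) confidence interval."""
    z = _N.inv_cdf(1 - (1 - level) / 2)
    return (math.log(upper) - math.log(lower)) / (2 * z)


def power_two_sided(or_true, se_log_or, alpha):
    """Power of a two-sided Wald test at level alpha when the true OR is or_true."""
    z = _N.inv_cdf(1 - alpha / 2)
    shift = math.log(or_true) / se_log_or
    return (1 - _N.cdf(z - shift)) + _N.cdf(-z - shift)


def fprp(alpha, power, prior):
    """False-positive report probability (Wacholder et al., 2004)."""
    false_pos = alpha * (1 - prior)
    return false_pos / (false_pos + power * prior)


# Hypothetical result: OR 0.50 (95 % CI 0.30-0.83), observed p = 0.008.
alpha = 0.008
se = se_from_ci(0.30, 0.83)
for true_or in (0.1, 0.2, 0.3):
    power = power_two_sided(true_or, se, alpha)
    for prior in (0.1, 0.01, 0.001):
        print(true_or, prior, round(fprp(alpha, power, prior), 3))
```

A result would be called noteworthy when its FPRP falls below the chosen cutoff (0.5 here) for a given prior.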

Results

The mean age was 45.5 ± 12.86 years for the cases and 45.98 ± 14.44 years for the controls. Age (p = 0.07), tobacco smoking (p = 0.16), and tobacco chewing (p = 0.78) did not differ significantly between cases and controls and were not associated with breast cancer risk. However, a history of betel quid chewing and alcohol consumption were significantly associated with the risk of developing breast cancer (p < 0.001 and p = 0.003, respectively) (Supplementary Table 1).

Association of genetic and environmental factors with breast cancer risk by LR analysis

The distribution and main effects of genetic and environmental factors are summarized in Supplementary Tables 1 and 2. Betel quid chewing was significantly associated with breast cancer risk (OR = 4.98 (3.15–7.87); p < 0.001). The genotype distributions of the CCND1 and TP53 polymorphisms differed significantly between cases and controls (Supplementary Table 2). Main effects of genotypes were evaluated using multivariable LR. The homozygous and heterozygous variant genotypes (AA, AG) and the dominant model (AA + GA) of CCND1 showed a protective trend against breast cancer risk (0.28 (0.14–0.57), p < 0.001; 0.37 (0.20–0.68), p = 0.002; and 0.34 (0.18–0.62), p < 0.001, respectively). The "A" allele was also underrepresented in the breast cancer cases (0.68 (0.51–0.89), p = 0.006). The Pro/Pro genotype of TP53 also showed a protective trend against breast cancer risk (0.52 (0.28–0.95), p = 0.03), and the Pro allele of TP53 was significantly underrepresented in the cases (0.76 (0.58–1.00), p = 0.05) (Supplementary Table 2). No significant association with breast cancer risk was observed for the RAD51 polymorphism. Mutation screening of the BRCA2 gene revealed the 8415G>T: K2729N mutation in exon 18 in two cases. The frequency of the variant AA genotype of the −26G>A polymorphism in exon 2 was 17.1 % in cases and 14.7 % in controls. The combined frequency of the variant GG and AG genotypes of 10462A>G: I3412V in exon 27 was 4.9 % in cases and 2.8 % in controls. The BRCA2 polymorphisms were not associated with breast cancer risk.
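The allele-level comparisons above (e.g., the "A" allele of CCND1 being underrepresented in cases) are the kind of result obtained from a 2 × 2 table of allele counts. The paper does not state how the allele ORs were computed, so the following is only a sketch assuming a crude, unadjusted odds ratio with a Woolf (log-based) 95 % CI; it does not reproduce the multivariable conditional LR used for the genotype ORs. The function name and allele counts are hypothetical.

```python
import math
from statistics import NormalDist


def odds_ratio_woolf(a: int, b: int, c: int, d: int, level: float = 0.95):
    """Crude odds ratio with a Woolf confidence interval from a 2 x 2 table.

    a: variant alleles in cases     b: reference alleles in cases
    c: variant alleles in controls  d: reference alleles in controls
    All four cells must be non-zero.
    """
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper


# Hypothetical allele counts for 205 cases (410 alleles) and 217 controls (434 alleles).
print(odds_ratio_woolf(230, 180, 280, 154))
```

An OR below 1 with a CI excluding 1 corresponds to the allele being underrepresented among cases.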

Risk associated with SNPs stratified by betel quid chewing

Data were further stratified by betel quid chewing, as it was the strongest independent risk factor in LR. The risk associated with genetic factors among betel quid chewers (BQC) and non-betel quid chewers (NBQC) is shown in Table 1. The AA genotype and the dominant model (AA + AG) of CCND1 showed protection against breast cancer risk in BQC (0.28 (0.10–0.77), p = 0.01 and 0.32 (0.12–0.81), p = 0.01) and in NBQC (0.26 (0.09–0.78), p = 0.01 and 0.37 (0.16–0.87), p = 0.02). In addition, the GA genotype of CCND1 was associated with protection against breast cancer risk in the BQC subset (0.34 (0.13–0.90), p = 0.03). The "A" allele of CCND1 was also significantly underrepresented in breast cancer cases in both the BQC and NBQC subsets (0.64 (0.43–0.95), p = 0.02 and 0.57 (0.36–0.89), p = 0.01, respectively). The Pro/Pro genotype and Pro allele of TP53 showed protection against breast cancer in NBQC (0.29 (0.10–0.81), p = 0.01 and 0.51 (0.32–0.80), p = 0.003, respectively). In the BQC group, the C allele of RAD51 was overrepresented in cases and associated with breast cancer risk (2.03 (1.26–3.30), p = 0.002). Both cases carrying the BRCA2 8415G>T: K2729N mutation in exon 18 were betel quid chewers (Table 1).

Table 1 Distribution of genotypes of DNA repair and cell cycle genes among cases and controls in two sample subsets (NBQC and BQC)

MDR analysis

MDR analysis was applied to further explore gene–gene and gene–environment interactions. The best predictive models with up to four orders of interaction were chosen, along with their cross-validation consistency (CVC) and testing balanced accuracy (TBA). For the total data set, betel quid chewing was the best one-locus model, with a CVC of 10/10 and a testing accuracy of 0.6770, which was statistically significant (p < 0.001) as determined by 1,000-fold permutation testing. Among two-locus interactions, the combination of betel quid chewing and alcohol consumption was the most significant, with a CVC of 8/10 and a TBA of 0.6673 (p < 0.001). The three-locus model consisted of tobacco smoking, tobacco chewing, and betel quid chewing, with a CVC of 10/10 and a TBA of 0.6952 (p < 0.001). The best four-locus model, which included genes, consisted of RAD51 * TP53 * tobacco smoking * betel quid chewing, with a CVC of 10/10 and a TBA of 0.6869 (p < 0.001). MDR analysis in NBQC yielded a best four-locus model with a TBA of 0.6765 (p = 0.005) and a CVC of 10/10 (Table 2). No interaction models were obtained for the BQC subset.

Table 2 Multifactor dimensionality reduction (MDR) analysis revealing interactions in the total data set and NBQC

Interaction entropy graphs

As shown in the hierarchical interaction graphs in Fig. 1, betel quid chewing had a large independent effect (9.38 %) among environmental factors in the total sample set. A strong interaction (1.32 %) was seen between RAD51 and TP53. As in the total data set, the NBQC graph (Fig. 1b) showed a large interaction between TP53 and RAD51 (1.32 %). CCND1 had a large independent effect (1.89 %) in NBQC. In addition, TP53 (0.64 %) and EX2BRCA2 (0.11 %), considered independently, explained small percentages of the entropy in case–control status, whereas the interaction between these two loci explained a larger percentage (1.02 %).

Fig. 1

Interaction entropy graphs generated with Orange software. The interaction model describes the percentage of the entropy (information gain) removed by each variable (main effect: represented by nodes) and by each pairwise combination of attributes (interaction effect: represented by connections). Positive entropy (plotted in green) indicates nonlinear interaction. Attributes were selected on the basis of MDR results obtained for (a) the total dataset and (b) non-betel quid chewers. Labels: tbsmk, tobacco smoking; bqchw, betel quid chewing; EX2BRCA, exon 2 of the BRCA2 gene

CART analysis

Figure 2 shows the selected CART model constructed on all investigated genetic variants and environmental risk factors. The final tree contained nine terminal nodes. The first split of the root node was on betel quid chewing, indicating that it is the strongest risk factor for breast cancer. Among BQC, the subsequent splits showed interactions between CCND1, tobacco smoking, alcohol, and TP53. In NBQC, the first split was on TP53, which interacted with tobacco chewing and the BRCA2 exon 2 −26G>A polymorphism. Terminal node 14, comprising the lowest percentage of cases, was taken as the reference for calculating ORs for the other terminal nodes. Among betel quid chewers, significant risk was observed for terminal node 3, consisting of the CCND1 GG genotype (OR = 33.0; 95 % CI = 6.08–179.07, p < 0.001), and for terminal node 11 (BQC, CCND1 GA/AA, no smoking, alcohol) (OR = 42.00; 95 % CI = 5.11–345.11, p < 0.001). Risk was also observed for node 15 (BQC, CCND1 GA/AA, no smoking, no alcohol, TP53 Pro/Pro or Arg/Arg) (OR = 14.84; 95 % CI = 3.13–70.34, p < 0.001) and node 16 (BQC, CCND1 GA/AA, no smoking, no alcohol, TP53 Arg/Pro) (OR = 9.40; 95 % CI = 1.99–44.34, p < 0.001). In the NBQC group, risk was seen for terminal node 5, comprising NBQC with TP53 Arg/Arg (OR = 5.54; 95 % CI = 1.11–27.42, p = 0.03).

Fig. 2

Classification and regression tree (CART) analysis for the DNA repair and cell cycle genes and environmental risk factors. Terminal nodes are bordered in thick blue. Red-bordered odds ratio boxes indicate values with p < 0.05

FPRP

Supplementary Table 3 shows the FPRPs for BQC and NBQC obtained from the LR analysis, calculated using the statistical power to detect an OR of 0.1, 0.2, and 0.3 with an α level equal to the observed p value. The CCND1 results were reliable in both BQC and NBQC at prior probabilities of 0.05 and 0.025 for both OR = 0.1 and 0.2. TP53 also showed good reliability at prior probabilities of 0.05 and 0.025 for both OR = 0.1 and 0.2. In addition, at a prior probability of 0.05 and OR = 0.3, CCND1 in BQC and NBQC and TP53 in NBQC gave reliable results.

Discussion

The present study uses a multi-analytic approach to systematically examine the associations between breast cancer risk and a panel of genetic polymorphisms involved in DNA repair and cell cycle control. LR, MDR, and CART analyses consistently identified betel quid chewing as the most important risk factor for breast cancer in the Northeast Indian population. Betel quid is composed of areca nut (Areca catechu), catechu (Acacia catechu), and slaked lime (calcium oxide and calcium hydroxide), wrapped in betel leaf (Piper betle), together with tobacco. Betel quid chewing with tobacco results in high exposure to carcinogenic tobacco-specific nitrosamines (~1,000 mg/day, compared with ~20 mg/day in smokers). Swallowing the quid together with its juice leads to nitrosation of secondary and tertiary amines, favored by the acidic pH of the stomach. Slaked lime and the transition metals iron and copper lead to the generation of reactive oxygen species (ROS) in the oral cavity [15]. Because systemic absorption of betel quid ingredients carries their carcinogenic constituents to other organs and tissues, there is growing evidence that cancers other than oropharyngeal cancers may also be caused by betel quid chewing [16].

LR analysis indicated an important role of the CCND1 polymorphism in breast cancer pathogenesis irrespective of betel quid exposure. The CCND1 AA genotype (in both BQC and NBQC) and the GA genotype (in BQC) imparted protection against breast cancer risk. Moreover, CART analysis also showed an interaction between betel quid chewing and CCND1. A meta-analysis by Cheng Lu et al. of 5,371 breast cancer cases and 5,336 controls from seven published case–control studies showed that the A allele was significantly associated with increased breast cancer risk [17]. In contrast, three studies of nasopharyngeal, hepatocellular, and colorectal carcinoma in Southern Chinese, Taiwanese, and Singapore Chinese populations, respectively, reported the A allele to be protective, with allele frequencies similar to those in the present study (A = 0.60; G = 0.40; mean A allele frequency 0.63) [18]. Similarly, the G allele has been implicated in increased risk of colorectal cancer in Singapore and Turkish populations, gastric cancer in the Chinese population, and cervical and nasopharyngeal cancer in the Portuguese population [19–21]. In addition, the GG genotype has been associated with poorly differentiated tumors, a reduced disease-free interval in squamous cell carcinoma of the head and neck, and oral cavity/pharyngeal cancers [22, 23]. Cyclin D1 mRNA undergoes alternative splicing (transcripts a and b), resulting in protein products with possible functional differences [24]. The CCND1 A allele predisposes to transcript b and cyclin D1b production, which has been suggested to have tumorigenic effects, unlike transcript a [25]. However, the normal mucosa of controls is known to express both transcripts irrespective of genotype. Moreover, the GG genotype is associated with transcript b in colorectal carcinoma and leukemia patients [18, 23]. The present study suggests a population- and cancer-specific role of the CCND1 polymorphism. It may also be that the AA genotype favors "transcript a" production, thus conferring protection in this population. However, the relationship between the alleles is complex, and functional studies may help explain the exact biological basis of their effect on tumor behavior.

The RAD51 C allele was associated with increased breast cancer risk in BQC in the LR analysis. The RAD51 5′UTR 135G>C polymorphism may increase RAD51 protein expression, disrupting the fine balance of HR protein levels and inhibiting stimulation of the apoptotic pathway [7]. High levels of DNA damage have been observed in individuals with the CC genotype in endometrial cancer [26], in contrast to a reduced risk among heavy smokers with head and neck squamous cell carcinoma [27]. The BRCA2 K2729N variant seen in two BQC cases has previously been reported in 3 % of esophageal squamous cell carcinoma (ESCC) cases and controls in one study, in 0.62 % of breast cancer cases in the Chinese population, and in one familial ESCC case in the Turkmen population of Iran. The K2729N variant, located in the conserved BRCA2 C-terminal domain, is involved in the α-helix and β-sheet structures of oligonucleotide/oligosaccharide-binding fold 1 (OB fold 1). The FANCG-mediated release of RAD51 from BRCA2 onto damaged DNA is regulated by OB fold 1 [28] and could therefore be affected in these BRCA2-mutated samples. Moreover, as the interaction between BRCA2 and RAD51 is essential for DNA repair [29], the C allele may act indirectly to disrupt DNA repair, allowing the cell to accumulate more mutations. In addition, the K2729N variant is located in the domain through which BRCA2 binds MAGE-D1, a synergistic suppressor of cell proliferation, suggesting that cell proliferation may be deregulated in BRCA2-mutated samples owing to incorrect binding or nonbinding of MAGE-D1 to BRCA2.

The MDR analysis did not generate any significant model in BQC; however, CART analysis showed interactions between CCND1, tobacco smoking, alcohol, and TP53. These results are biologically plausible, as cyclin D1 modulates growth arrest and cell death in a p53-dependent manner following oxidative DNA damage, which is caused by ROS produced during the metabolism of betel quid and alcohol [30]. Overall, breast cancer etiology in BQC appeared to be governed by carcinogen exposure from betel quid chewing and by CCND1 genotypes, with minor roles for BRCA2 gene mutation and the C allele of the RAD51 gene.

In NBQC, LR, CART, and MDR analyses consistently revealed the predictive value of TP53. The Pro/Pro genotype of the TP53 polymorphism was protective against breast cancer risk, and CART replicated this result. Siddique and Sabapathy in 2006 [9] and Costa et al. in 2008 [31] reported that TP53-Pro is more efficient than TP53-Arg at inducing DNA repair target genes, in contrast to apoptotic and cell-cycle-arrest target genes [8, 31]. In addition, Siddique and Sabapathy documented preferential expression of the TP53-Pro allele at the RNA level in healthy Asian heterozygous individuals (TP53-Arg/Pro), whereas the TP53-Arg allele was preferentially expressed in most heterozygous breast cancer patients. The authors further showed that TP53-Arg-expressing cells are less able to remove micronuclei associated with chromosomal aberrations, suggesting that TP53-Arg may be less potent in reducing genomic instability and perhaps cancer predisposition. It has previously been shown that TP53 mutations are less frequent in TP53-Arg patients than in TP53-Pro patients, suggesting that TP53-Arg is a weaker allele that does not require mutation for carcinogenesis [9]. These findings are consistent with the protective effect of the Pro/Pro genotype and the risk associated with the Arg/Arg genotype observed in our study.

In addition, MDR generated a fourth-order model showing interactions among TP53 * BRCA2 −26G>A * RAD51 * CCND1 in NBQC. The −26G>A polymorphism in the 5′UTR of BRCA2 has a regulatory role that is further influenced by the codon 72 polymorphism of the TP53 gene [32]. Interactions between TP53 and RAD51 may influence DNA recombination and repair [33]. In the post hoc analysis, entropy graphs showed strong interactions between RAD51 and TP53, followed by TP53 and BRCA2 exon 2. CCND1 had an independent effect. Overall, breast cancer etiology in NBQC appeared to be governed by interactions among the TP53, RAD51, BRCA2, and CCND1 genes, with a major role for the TP53 codon 72 polymorphism.

The advantage of LR analysis is that it controls for confounding variables [34]. CART and MDR do not assume any specific parametric form and can uncover SNP–SNP interactions that are missed by LR [35]. In MDR, cross-validation and permutation testing reduce the chance of type I errors arising from multiple testing [34]. An important feature of CART is the influence of the first split on the tree structure. In our analyses, the main effect, betel quid chewing, appears in the first split; because this main effect is strong, the resulting tree can be interpreted as very stable [35]. Moreover, the robustness of our results is supported by the FPRP values obtained under different scenarios (Supplementary Table 3).

Limitations of the study include the candidate gene approach and the relatively small sample size. Inclusion of tag SNPs would have provided more convincing support for the associations. The Northeast Indian population is relatively homogeneous; thus, the likelihood of extensive population stratification in our study is generally lower than in more ethnically diverse locations. Further, dietary patterns and other factors such as family history were neither accounted for nor adjusted for in the analyses because of missing or uncollected data. However, given the strong interactions detected in this study, these potential confounders would probably have had a minor influence on the results. Moreover, cases and controls were matched for age and ethnicity, thereby controlling for any confounding effect of these variables.

Conclusion

Our data indicate that common genetic variations in DNA repair and cell cycle genes contribute to breast cancer risk. In addition, distinct patterns of predisposition were observed among BQC and NBQC breast cancer patients, indicating dissimilar susceptibility to breast cancer. BQC may be at elevated risk of breast cancer attributable to betel quid carcinogens, with minor roles for BRCA2 mutation and the C allele of RAD51, whereas NBQC may be at slightly lower risk owing to the protection offered by the TP53 Pro/Pro genotype. The CCND1 polymorphism conferred protection irrespective of betel quid chewing status.