Introduction

Gastric cancer (GC), or stomach cancer, is among the most serious threats to human health, causing over 700,000 deaths worldwide per year [1]. The prognosis is poor, with a 5-year survival rate of <5–15 %. Although the incidence and mortality rates of GC have been declining in recent years, it still ranks fourth in incidence and second in mortality among all cancers worldwide [2]. Moreover, over 70 % of these cases and deaths were estimated to occur in developing countries. In China, gastric cancer is one of the most common cancers, ranking third among lethal cancers and accounting for approximately 10 % of newly diagnosed cancers [3]. Generally, gastric cancer rates are about twice as high in males as in females.

The occurrence of GC can be attributed to many factors [4–6].

Several studies demonstrate the significance of the environment for gastric cancer risk. In the 1990 review by McCredie [7], data gathered from 177,167 cases revealed that, compared with Australians, a significantly higher incidence of GC was detected in Europe, the British Isles, and Asia. Other observations also highlight the impact of environmental and behavioral risk factors on the development of gastric cancer across the globe [8]. In addition to environmental and behavioral factors, infection with Helicobacter pylori plays an important role in the occurrence of GC: a meta-analysis by Eslick reviewed 42 cohort and case–control studies published after 1982 and identified a twofold increase in the risk of developing GC in patients with previous H. pylori infection [9]. Further, in a prospective study of 1,526 patients published by Uemura et al. [10], 1,246 tested positive for H. pylori. However, environmental and infectious factors induce GC only under certain conditions. With the recent development of genetic epidemiology and molecular epidemiology, an increasing number of studies focus on the genetic mechanism of GC, which plays a decisive role in its occurrence and development.

Development of GC is a multistage process. Normally, the progression from epithelial cells to tumor cells consists of at least five stages [11]. These sequential changes in the gastric mucosa may occur over many years as the result of interactions between environmental and genetic factors. The accumulation of multiple genetic alterations, including activation of oncogenes and inactivation of tumor suppressor genes, induces cancer development [12–16]. With the aid of molecular biology, the number of known polymorphic genes that modify the effects of identified or suspected carcinogens is increasing [17, 18]. However, previous studies focusing on individual genes have made little progress in explaining the genetic mechanism of GC. A growing number of studies attribute individual variation in cancer risk to combinations of specific variant alleles of different genes that are present in a significant proportion of the normal population [19]. The diverse patterns of association among these genes, along with environmental factors, could explain the high variation in GC incidence observed around the world. Individual genetic susceptibility may be critical in a variety of processes relevant to gastric carcinogenesis, including genes for mucosal protection against H. pylori such as TNF [20–22], polymorphisms in DNA repair [23, 24], tumor suppressor genes such as TP53 [25, 26], genes involved in steroid hormone biosynthesis and the progesterone receptor such as CYP19A1 and ALDH2 [27–29], and regulation of gene expression [30–32]. The potential functions of multiple genes have drawn considerable attention in recent years, and comprehensive research on the interactions and synergistic effects of these candidate genes is needed [33].

In this study, we chose genes that previous studies have shown to play important roles in the development and progression of gastric cancer [34–40], including enzyme metabolism genes such as GSTM1, GSTT1, CYP2E1, NAT, MTHFR, and ALDH2; DNA repair-related genes such as XRCC1; and inflammatory response genes such as IL- and TNF. PCR-RFLP (restriction fragment length polymorphism) and AS (allele-specific)-PCR methods were selected to obtain the products of the targeted genes. We then applied single-factor conditional logistic analysis to confirm the susceptibility associated with these genes and used a multi-gene risk analysis model to assess their synergistic effect on genetic susceptibility to primary gastric cancer.

Materials and methods

Patients studied

All patients were diagnosed with primary gastric cancer by pathological examination. There were 564 gastric cancer patients, including 453 males and 151 females, sampled in hospitals in Nanjing from 2005 to 2011. The mean age was 61.15 ± 12.61 years (range 18–87). Corresponding controls were determined to be cancer free and were matched to each case according to gender and age (within 5 years). This investigation was approved by the Research Ethics Committee, and written informed consent was obtained from all individuals. A uniform epidemiology questionnaire was designed to obtain information on gender, age, smoking, and alcohol consumption from personal medical records and individual interviews with cancer cases and control subjects. Blood samples (3 ml, obtained by venipuncture into ethylenediamine tetraacetic acid [EDTA]-containing vials) were drawn from cancer cases and controls and immediately stored at −60 °C until use. Subjects with incomplete clinical pathological data, an inadequate quantity of blood sample, or unsatisfactory genetic analyses were excluded.

DNA extraction and amplification

Genomic DNA was extracted from a leukocyte pellet by traditional proteinase K digestion followed by phenol–chloroform extraction and ethanol precipitation, and was stored at −20 °C after quantification with an ultraviolet spectrophotometer. Oligonucleotide primers were designed based on previous studies and sequences deposited in NCBI. PCR-RFLP (restriction fragment length polymorphism) and AS (allele-specific)-PCR were modified to detect target gene polymorphisms [41, 42]. PCR was performed in a 50-μl reaction system containing 1× buffer, pH 8.5, 200 mM deoxynucleotide triphosphates, 2.5 mM MgCl2, 1 mM of each primer, 100 ng genomic DNA, and 2.5 units of thermostable Taq DNA polymerase. Identification of target genes was performed according to previous studies [41]. All samples were tested in replicate, and no discrepancies were found.

Statistical analysis

Databases were established with EpiData 3.0. Statistical analysis was performed using the SPSS 18.0 software package (IBM, Armonk, NY, USA). Genotype frequencies were calculated after logic checks and correction of the data. The risk of gastric cancer was evaluated with the chi-square test, with P < 0.05 considered statistically significant. We used the odds ratio (OR) and 95 % confidence interval (CI) to describe the association between genotypes and gastric cancer risk. OR values were calculated with an unconditional logistic regression model, adjusted for confounding factors such as age and gender. Multi-gene risks were analyzed with the multi-gene risk analysis model proposed by Demchuk [43]. This model estimates the frequency of each genotype profile as the product of epidemiologically derived single-gene frequencies, and polygenic OR values are calculated from single-gene OR values under the assumption of no linkage disequilibrium. The model also assumes that the selected genes are biologically independent, and no epistasis at the level of protein function is considered. The multi-gene risk model therefore yields a multiplicative OR for a polygenotype: the combinatorial genotype OR is generated simply by multiplying the individual OR values.
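The multiplicative assumption described above can be written compactly. The notation here is ours and is not taken from the source: x_i = 1 if a person carries susceptibility variant i and 0 otherwise, OR_i is the single-variant odds ratio, and f_i is the single-variant carrier frequency. Taking the non-carrier frequency as 1 − f_i is an additional assumption that follows from treating each variant as simply present or absent and independent of the others.

```latex
\[
\mathrm{OR}(x_1,\dots,x_n) \;=\; \prod_{i=1}^{n} \mathrm{OR}_i^{\,x_i},
\qquad
f(x_1,\dots,x_n) \;=\; \prod_{i=1}^{n} f_i^{\,x_i}\,(1-f_i)^{\,1-x_i},
\qquad
x_i \in \{0,1\}.
\]
```

Under this form, a profile carrying no variant has OR = 1, and the frequencies of all 2^n profiles sum to 1.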

Results

Detection of potential risk genes and gene–gene interactions

Nine genes were selected as potential risk factors (Table 1): GSTM1, CYP2E1, NAT2 M1, NAT2 M2, NAT2 phenotype, XRCC1194, MTHFR1298, IL-, and VDR TaqI. We also conducted multivariate conditional logistic regression on these genes and confirmed seven risk genes and their genotypes at the 0.05 significance level. Detailed descriptions of CYP2E1(c1/c1), NAT2M1(T/T), NAT2M2(A/A), XRCC1194(T/T), NAT2 phenotype(slow acetylator), MTHFR1298C(A/C), and VDR TaqI(T/T) are shown in Table 2. The analysis of gene–gene interaction indicated an interaction effect in 12 pairs among all the gene–gene combinations (Table 3). Moreover, this effect was synergistic (OR > 1), meaning that it increases the risk of gastric cancer.

Table 1 Single-factor conditional logistic regression analysis
Table 2 Multifactor conditional logistic regression analysis
Table 3 Gene–gene interaction analysis

Multiple genetic variants combinatorial contribution analysis

The OR values calculated for the seven genetic variants were used to estimate the contribution of these variants to gastric cancer risk, as a preliminary analysis of the risk conferred by the susceptibility genes. In this model, each genetic factor was coded as present or absent. We use X to denote that a person carries a susceptibility genetic variant and 0 otherwise. Thus, there are 128 (2⁷) potential genotypic profiles. For example, (0000000) denotes a person who carries none of the susceptibility variants, with OR = 1, and (XXXXXXX) denotes a person who carries all seven, in which case the OR is the product of the individual variants’ OR values and the frequency is likewise the product of the individual frequencies.
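The enumeration of genotypic profiles can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the variant labels and the (OR, carrier frequency) values are hypothetical placeholders rather than the study's estimates, and using 1 − frequency for non-carriers is our assumption under independence.

```python
from itertools import product
from math import prod

# Hypothetical placeholder inputs: label -> (odds ratio, carrier frequency).
# These numbers are illustrative only and are NOT results from the study.
PLACEHOLDER_VARIANTS = {
    "variant_1": (1.4, 0.35),
    "variant_2": (1.6, 0.25),
    "variant_3": (1.3, 0.40),
    "variant_4": (1.8, 0.15),
    "variant_5": (1.5, 0.20),
    "variant_6": (2.0, 0.10),
    "variant_7": (1.7, 0.12),
}


def enumerate_profiles(variants):
    """List (label, OR, frequency) for every carry/not-carry combination."""
    names = list(variants)
    profiles = []
    for carried in product((0, 1), repeat=len(names)):
        # Profile OR: product of the ORs of the carried variants (1 if none).
        odds_ratio = prod(
            variants[name][0] for name, c in zip(names, carried) if c
        )
        # Profile frequency: carriers contribute f, non-carriers 1 - f
        # (assumption: variants are independent, no linkage disequilibrium).
        frequency = prod(
            variants[name][1] if c else 1.0 - variants[name][1]
            for name, c in zip(names, carried)
        )
        label = "".join("X" if c else "0" for c in carried)
        profiles.append((label, odds_ratio, frequency))
    return profiles


if __name__ == "__main__":
    for label, odds_ratio, frequency in enumerate_profiles(PLACEHOLDER_VARIANTS)[:4]:
        print(label, round(odds_ratio, 3), round(frequency, 5))
```

With seven variants the function returns 128 profiles, from (0000000) to (XXXXXXX), and the profile frequencies sum to 1 by construction.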

The OR values and frequencies of the seven genetic variants are shown in Fig. 1. Our data show that the frequency of these genes within a population and the magnitude of risk are highly correlated, such that very high-risk genotypes are exceedingly rare. The highest risk genotype is unlikely to exist in a given population and therefore has minimal value for screening purposes. Modeling the impact of three gene variants (NAT2M1, CYP2E1, and NAT2 phenotype [slow acetylator]) provides a pseudo-continuous log-normal distribution of relative disease risk in the population (Fig. 2a). Inclusion of the variants associated with MTHFR 1298 and NAT2M2 (Fig. 2b), and then XRCC1194 and VDR TaqI (Fig. 2c), shifts the distribution further toward higher risk. As we added more susceptibility genes to the model, the risk distribution broadened, allowing better separation of the population into high- and low-risk categories.

Fig. 1 Frequency and ORs of genotypes calculated using the 7 gene variants selected

Fig. 2 Distribution of relative disease risk calculated using gastric cancer-associated gene variants. a Three genes: NAT2M1(T/T), CYP2E1(c1/c1), and NAT2 phenotype (slow acetylator); b the genes in a plus MTHFR1298(A/C) and NAT2M2(A/A); c the genes in b plus XRCC1194(T/T) and VDR TaqI(T/T)

The present model also provides an opportunity to quantify the relative change in risk associated with the presence of genetic variants. This is exemplified in Fig. 3, where the dotted green line represents the risk profile for the most common genotypes modeled from three susceptibility gene variants (NAT2M1, CYP2E1, and NAT2 phenotype [slow acetylator]), the dashed blue line shows the risk profile when MTHFR 1298 and NAT2M2 are added, and the solid red line indicates the risk profile when all seven gene variants are present.

Fig. 3 Regression curves of cumulative frequency against OR values for the genotypic profiles: the dotted green line represents the risk profile for the most common genotypes modeled from three susceptibility gene variants (NAT2M1, CYP2E1, and NAT2 phenotype [slow acetylator]), the dashed blue line shows the risk profile when MTHFR 1298 and NAT2M2 are added, and the solid red line indicates the risk profile when all seven gene variants are present (Color figure online)
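The kind of curve summarized in Fig. 3 can be approximated by sorting the genotypic profiles by OR and accumulating their frequencies. The sketch below is illustrative only: the (OR, carrier frequency) pairs are hypothetical placeholders, the function name risk_curve is ours, and the code produces raw cumulative points rather than the fitted regression curves of the figure.

```python
from itertools import product
from math import prod

# Hypothetical placeholder (odds ratio, carrier frequency) pairs.
# Illustrative values only; NOT the study's estimates.
PLACEHOLDER_PAIRS = [
    (1.4, 0.35), (1.6, 0.25), (1.3, 0.40), (1.8, 0.15),
    (1.5, 0.20), (2.0, 0.10), (1.7, 0.12),
]


def risk_curve(pairs):
    """Return (OR, cumulative frequency) points ordered by rising OR."""
    points = []
    for carried in product((0, 1), repeat=len(pairs)):
        # Multiplicative model: profile OR and frequency are products
        # over variants (assumes independence between variants).
        odds_ratio = prod(o for (o, _), c in zip(pairs, carried) if c)
        frequency = prod(f if c else 1.0 - f for (_, f), c in zip(pairs, carried))
        points.append((odds_ratio, frequency))
    points.sort()
    curve, running = [], 0.0
    for odds_ratio, frequency in points:
        running += frequency
        curve.append((odds_ratio, running))
    return curve


if __name__ == "__main__":
    # Compare models built from the first 3, 5, and 7 placeholder variants.
    for k in (3, 5, 7):
        curve = risk_curve(PLACEHOLDER_PAIRS[:k])
        print(k, "variants:", len(curve), "profiles, max OR", round(curve[-1][0], 3))
```

Adding variants increases both the number of profiles and the largest attainable OR, which is the broadening of the risk distribution described in the text.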

Discussion

In this study, we used a case–control design to screen the effects of selected genes on gastric cancer susceptibility, and we found a significant association between changes in risk and genetic polymorphisms in Chinese Han people. This conclusion is consistent with previous studies, which have shown that the occurrence of gastric cancer is a complex multi-factor and multi-step process [4–6]. Yet some of the risk factors confirmed in these reports have particularly small effects by themselves, making them difficult to use in gastric cancer screening. There has also been substantial research on tumor susceptibility and genetic polymorphism [44, 45], but the combined effects of multiple genes on cancer susceptibility remain difficult to study without a standard method.

So far, most researchers have applied logistic regression models to explore gene–gene interactions [46], and the single-genotype OR values provided by such models are the available input for modeling polygenic disease association. However, the accuracy with which this approach captures true polygenic susceptibility remains to be determined, and most of the conclusions of these studies are difficult to follow because of the complexity of the process. Furthermore, as more SNPs and genetic polymorphisms are discovered, more samples are needed, which leads to the problem known as the “curse of dimensionality” [47].

To analyze data from multiple genes, we adapted the method of Demchuk, which was originally used to study the effects of multiple genetic polymorphisms on asthma, to evaluate qualitatively and quantitatively the risk associated with potential gastric cancer susceptibility genes. With this model, we can observe the change in risk with each additional gene. The frequency associated with each risk level will be important in defining the susceptible population that needs increased protection with respect to exposure, as well as for risk management. Moreover, this model allows exposure information to be incorporated as an independent variable, illustrating why variants such as those involved in atopy or chemical metabolism would need to be included separately when identifying the number of individuals in a population at increased risk.

With the multi-gene risk model, more risk variants were detected than in our previous research [41]. However, the major limitation of the multi-gene risk model is that epistatic relationships are not considered. Although the model assumes there is no statistical interaction, it does not account for potential biological interactions at the protein level that may modify risk. With the unveiling of the human genome, the challenge of understanding the mechanism of GC lies in revealing the function of genes, including gene–gene and gene–environment interactions. Genetic epidemiology and molecular epidemiology provide tools and methods that are helpful for discerning these relationships. The models widely used in previous studies may not fully explain all gene interactions, but with improvements in methodology and in the understanding of genetics, more models can be used to explore the genetic causes of disease, identify susceptible populations, and improve risk management. Besides the multi-gene risk model, other newer models have been used for genetic screening. Ritchie first proposed multifactor dimensionality reduction (MDR) in 2001 [48]; however, this model cannot be used for quantitative traits. In 2007, Lou et al. [49] extended the basic principles of MDR into a method called generalized multifactor dimensionality reduction (GMDR). As an interaction analysis method, GMDR is model-free, can be applied to different outcome variables including continuous ones, and permits adjustment for covariates to improve prediction accuracy. GMDR is also applicable to different types of samples, and in some respects it is superior to other statistical approaches for continuous variables [50]. However, all of these methods require further testing.

In conclusion, the polygenic model for genetic susceptibility contributes to the design of a virtual toxicology testing laboratory, which would help to reduce animal testing and adverse human exposures. More GC risk genes have been detected with this model in our recent work. With the rapid advances in the identification of genetic variants of GC, key susceptibility polygenotypes driving risk for this complex disease may be identified. We believe this research can help to develop an effective way of using these factors to screen high-risk groups, especially among Chinese Han people.