Introduction

Head and neck squamous cell carcinomas (HNSCCs), which comprise cancers of the larynx, pharynx and oral cavity, are the sixth most common malignancy and the seventh leading cause of cancer-related death worldwide [1,2,3]. The estimated number of new HNSCC cases in the USA increased from 48,010 in 2009 to 65,410 in 2019 [4,5,6], and China had about 74,500 new cases in 2015, according to the Chinese Cancer Society [7]. Several key factors contribute to HNSCCs, such as smoking and excessive alcohol use for smoking-related HNSCCs and prior infection with high-risk human papillomaviruses (HPVs) for HPV-positive HNSCCs, but only a small fraction of exposed individuals eventually develop HNSCCs [8,9,10,11]. Although the underlying mechanisms have not been thoroughly studied, interindividual variation in DNA repair may be responsible for differences in susceptibility to HNSCCs [12].

Benzo(a)pyrene diol epoxide (BPDE) is a classic smoking-related carcinogen that causes DNA damage by forming DNA adducts [13, 14]. DNA repair protects against such damage through several pathways, of which nucleotide excision repair (NER) is pivotal [15,16,17,18,19]. Nine core genes (XPA, XPB, XPC, XPD, XPF, XPG, ERCC1, DDB1 and DDB2) are central to the NER process, which comprises DNA damage recognition (DDB1, DDB2, XPA and XPC), unwinding of the DNA around the lesion (XPB and XPD), dual incision and removal of the damaged oligonucleotide (ERCC1, XPF and XPG), gap-filling synthesis, and ligation [20,21,22]. Altered function of these genes may lead to defective NER and subsequently increase cancer risk [21]. Inherited defects in NER genes can cause NER-associated disorders such as xeroderma pigmentosum [23,24,25].

Previous studies found that reduced DNA repair capacity, as measured by a host cell reactivation (HCR) assay, was associated with an increased risk of HNSCCs in non-Hispanic white populations [26, 27], and that reduced DDB1 mRNA expression was likewise associated with an increased risk of HNSCCs in a non-Hispanic white population [28]. To date, no study has explored these associations in the Chinese population, in which the composition of HNSCCs differs from that in the non-Hispanic white population. Specifically, oropharyngeal cancers account for the vast majority of HPV-positive HNSCCs in Western countries, and about 89.1% of the oropharyngeal cancer cases in the previous non-Hispanic white population study were HPV positive, whereas in the Chinese population most oropharyngeal cancers are HPV negative and thus mainly smoking-related [28,29,30,31]. In addition, the etiology of smoking-related HNSCCs differs considerably from that of HPV-positive HNSCCs [8, 32, 33]. Therefore, we conducted a case–control study of 483 subjects to evaluate the associations between mRNA expression levels of nine core NER genes and risk of HNSCCs in the Chinese population.

Materials and methods

Study participants

We recruited HNSCC cases and cancer-free controls from the First Affiliated Hospital of Xi'an Jiaotong University between January 2013 and April 2018. Eligible cases were aged 40 years or older and had newly diagnosed, histologically confirmed HNSCC with no other cancers. Controls were recruited from visitors (spouses, friends, and family members) accompanying patients to clinics other than the Head and Neck Clinic, where cases were accrued. Controls were biologically unrelated to the cases and frequency matched to them by age (± 5 years) and sex; they had no history of malignancy, had received no blood transfusions in the previous 6 months, and were not receiving immunosuppressive therapy. All subjects in the current study were Han Chinese. Written informed consent was obtained from all eligible cases and controls. As in previous studies [28, 34], participants who had smoked more than 100 cigarettes during their lifetime were defined as ever smokers; among them, those who had quit smoking at least 1 year earlier were defined as former smokers and the rest as current smokers; all others were defined as never smokers. Participants who had drunk alcoholic beverages at least weekly for 1 year were defined as ever drinkers; among them, those who had quit drinking more than 1 year earlier were defined as former drinkers and the rest as current drinkers; all others were defined as never drinkers. Thus, "ever smokers" and "ever drinkers" comprise former plus current smokers and drinkers, respectively. The HPV status of the subjects was determined by quantitative real-time PCR. In the non-Hispanic white population, expression levels of core NER genes were not correlated with HPV status [28].
Because only two cases were HPV positive, we could not determine whether NER gene expression was correlated with HPV status in the current Chinese population. These HPV-positive cases were therefore excluded to avoid additional heterogeneity. The study protocol was reviewed and approved by the Institutional Review Board of the First Affiliated Hospital of Xi'an Jiaotong University.
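
The smoking and drinking definitions above amount to a small decision rule. The sketch below encodes them in Python under stated assumptions: each participant's history is summarized by hypothetical fields (lifetime_cigarettes, drank_weekly_for_a_year, years_since_quit) that are illustrative and not the study's actual questionnaire variables. The text uses "at least 1 year" for quitting smoking but "more than 1 year" for quitting drinking, and the sketch keeps that asymmetry as written.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical summaries of questionnaire answers; field names are
# illustrative assumptions, not the study's actual variables.


@dataclass
class SmokingHistory:
    lifetime_cigarettes: int            # cigarettes smoked over the lifetime
    years_since_quit: Optional[float]   # None if the participant still smokes


@dataclass
class DrinkingHistory:
    drank_weekly_for_a_year: bool       # alcohol at least weekly for 1 year
    years_since_quit: Optional[float]   # None if the participant still drinks


def smoking_status(h: SmokingHistory) -> str:
    """Return 'never', 'former' or 'current' smoker."""
    if h.lifetime_cigarettes <= 100:    # ever smoker requires > 100 cigarettes
        return "never"
    if h.years_since_quit is not None and h.years_since_quit >= 1:
        return "former"                 # quit for at least 1 year
    return "current"


def drinking_status(h: DrinkingHistory) -> str:
    """Return 'never', 'former' or 'current' drinker."""
    if not h.drank_weekly_for_a_year:
        return "never"
    if h.years_since_quit is not None and h.years_since_quit > 1:
        return "former"                 # quit for more than 1 year
    return "current"


def is_ever(status: str) -> bool:
    """'Ever' smokers or drinkers are former plus current."""
    return status in ("former", "current")


print(smoking_status(SmokingHistory(lifetime_cigarettes=2000, years_since_quit=3)))  # former
print(drinking_status(DrinkingHistory(drank_weekly_for_a_year=True, years_since_quit=1)))  # current
```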

Sample preparation and quantitative real-time PCR

The methods have been described in detail previously [28, 35]. Because assays applied to the target tissue may not measure the inherent NER phenotype, we used peripheral blood lymphocytes as a surrogate tissue; these cells are less likely than the target tissue to have been exposed to etiologic agents. In brief, each subject donated 15 ml of blood, and T-lymphocytes were isolated from whole peripheral blood by Ficoll gradient centrifugation. Total RNA was extracted with TRIzol reagent (Invitrogen, Carlsbad, CA), and mRNA expression levels of the NER genes were measured by quantitative real-time PCR on an ABI 7900HT Sequence Detection System (Applied Biosystems, Foster City, CA). Each amplification reaction was performed in a final volume of 5 μl containing the primers, complementary DNA and Master Mix. The thermal cycling conditions were 95 °C for 5 min, followed by 40 cycles of denaturation at 95 °C for 15 s and annealing/extension at 60 °C for 1 min. 18S was used as the internal control, and each sample was analyzed in triplicate. The threshold cycle (Ct) was defined as the PCR cycle at which a significant increase in fluorescent signal was first detected. Expression of each of the nine NER genes relative to 18S was quantified as ΔCt = Ct(target gene) − Ct(18S). Therefore, higher ΔCt values indicate lower expression of the target mRNA.
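
The ΔCt calculation can be sketched in a few lines of Python. This assumes that the triplicate Ct values for each gene are averaged before subtraction, which the text does not state explicitly; the Ct values in the usage lines are made-up numbers, not study data.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_cts: Sequence[float], ref_cts: Sequence[float]) -> float:
    """Return ΔCt = mean Ct(target gene) - mean Ct(18S) for one sample.

    Averaging the triplicate wells first is an assumption of this sketch.
    """
    if not target_cts or not ref_cts:
        raise ValueError("each gene needs at least one Ct value")
    return mean(target_cts) - mean(ref_cts)


# Hypothetical triplicate Ct values for two subjects (not study data).
subject_1 = delta_ct([24.1, 24.3, 24.2], [10.0, 10.1, 9.9])
subject_2 = delta_ct([26.0, 26.2, 26.1], [10.0, 10.1, 9.9])

# A higher ΔCt means the target needed more cycles relative to 18S,
# i.e. lower expression of the target mRNA.
print(round(subject_1, 2), round(subject_2, 2))  # 14.2 16.1
print(subject_2 > subject_1)                     # True
```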

Statistical analysis

The distributions of demographic variables (e.g., age, sex, and smoking and drinking status) were compared between cases and controls by the chi-square test. Differences in the relative mRNA expression levels of NER genes between cases and controls were compared by the Wilcoxon rank-sum test. In addition, we performed a stepwise logistic regression analysis to identify the model that best predicted HNSCC risk.
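
The Wilcoxon rank-sum comparison of ΔCt values between cases and controls can be sketched with the Python standard library alone. This is a generic textbook version (mid-ranks for ties, tie-corrected normal approximation, no continuity correction), not the SAS procedure the authors ran, so its P values may differ slightly from software that applies a continuity correction or an exact test. The ΔCt values in the usage lines are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (W, z, two-sided P), where W is the rank sum of sample x."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    pooled = list(x) + list(y)
    w = sum(midranks(pooled)[:n1])
    expected = n1 * (n + 1) / 2
    # Tie correction: sum of (t^3 - t) over groups of t tied values.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var <= 0:
        raise ValueError("all values tied; test undefined")
    z = (w - expected) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p


# Hypothetical ΔCt values for one gene (not study data).
cases_dct = [15.2, 16.0, 15.8, 16.4]
controls_dct = [14.9, 15.1, 14.7, 15.5]
# z > 0 here: cases have higher ΔCt, i.e. lower expression, in this toy data.
print(rank_sum_test(cases_dct, controls_dct))
```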

The median expression values among controls were used as cutoffs to dichotomize expression when calculating crude odds ratios (ORs) and their 95% confidence intervals (CIs). The associations between mRNA expression levels and HNSCC risk were estimated by computing ORs and CIs from multivariate logistic regression models. Stratification analyses were further used to evaluate effect modification between NER mRNA expression levels and demographic variables. A multiplicative interaction was defined as OR11 > OR10 × OR01, where OR11 is the OR when both factors are present, OR10 is the OR when only factor 1 is present, and OR01 is the OR when only factor 2 is present.
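
The dichotomization, crude OR, and multiplicative-interaction criterion described above can be sketched as follows. Assumptions: expression is supplied as ΔCt values (so "low expression" means ΔCt above the control median), subjects exactly at the median are placed in the high-expression group, and the 95% CI uses Woolf's log-scale Wald formula. Adjusted ORs require the multivariate logistic model and are not reproduced here; the counts in the usage lines are hypothetical.

```python
import math
from statistics import median
from typing import Sequence, Tuple


def dichotomize(case_dct: Sequence[float],
                control_dct: Sequence[float]) -> Tuple[int, int, int, int]:
    """2x2 counts (a, b, c, d) using the control median ΔCt as cutoff.

    a = low-expression cases,  b = low-expression controls,
    c = high-expression cases, d = high-expression controls.
    Higher ΔCt means lower expression, so 'low' is ΔCt > cutoff.
    """
    cut = median(control_dct)
    a = sum(v > cut for v in case_dct)
    b = sum(v > cut for v in control_dct)
    return a, b, len(case_dct) - a, len(control_dct) - b


def odds_ratio_ci(a: int, b: int, c: int, d: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Crude OR = (a*d)/(b*c) with a Woolf 95% CI on the log scale."""
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the Woolf CI undefined")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


def interaction_ratio(or11: float, or10: float, or01: float) -> float:
    """OR11 / (OR10 * OR01); a value > 1 means OR11 > OR10 * OR01."""
    return or11 / (or10 * or01)


# Hypothetical counts and ORs, not study data.
print(odds_ratio_ci(40, 25, 60, 75))
print(interaction_ratio(6.0, 2.0, 1.5))  # 2.0
```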

To assess the contribution of mRNA expression levels to HNSCC risk prediction in the multivariate model, we constructed two risk models and compared their areas under the receiver operating characteristic (ROC) curve (AUCs): a baseline model including only demographic variables, and an mRNA model including mRNA expression levels in addition to these demographic variables. All tests were two-sided, and P < 0.05 was considered statistically significant. All statistical analyses were performed using SAS software (version 9.4; SAS Institute, Inc., Cary, NC).
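
The AUC used to compare the baseline and mRNA models can be computed directly from each model's predicted probabilities as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. This sketch assumes the predicted probabilities have already been obtained from the fitted logistic models; the numbers in the usage lines are hypothetical. A formal test of the difference between two correlated AUCs (for example, DeLong's nonparametric method) is not implemented here.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Concordance-based AUC: P(case > control) + 0.5 * P(tie)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for s1 in case_scores:
        for s0 in control_scores:
            if s1 > s0:
                concordant += 1.0
            elif s1 == s0:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks from a baseline model and an mRNA model.
baseline = auc([0.6, 0.4, 0.5], [0.5, 0.3, 0.45])
with_mrna = auc([0.7, 0.5, 0.6], [0.4, 0.3, 0.55])
print(baseline, with_mrna)
```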

Results

Characteristics of the subject population

Table 1 summarizes the distributions of demographic variables and tumor characteristics in cases and controls. There were no significant differences in the distributions of age, sex, and drinking status between cases and controls. The mean age was 60.1 years for cases (median 59; range 41–89) and 59.3 years for controls (median 59; range 41–90). Overall, 67.3% of cases and 63.8% of controls were male, 27.5% of cases and 31.5% of controls were former smokers, and 34.7% of cases and 31.5% of controls were former drinkers. Cases included more current smokers (28.3%) and current drinkers (31.5%) than controls (17.7% and 28.9%, respectively). The primary sites of the 251 HNSCCs were the oral cavity (54, 21.5%), oropharynx (111, 44.2%), and hypopharynx/larynx (86, 34.3%).

Table 1 Distributions of demographic variables and tumor characteristics between cases and controls

Differences in relative mRNA expression levels of NER genes between cases and controls

We compared mRNA expression levels of NER genes between cases and controls by the Wilcoxon rank-sum test (Table 2). Of the nine NER genes, only XPA and XPB had statistically significantly lower expression levels in cases than in controls (P = 0.029 and P = 0.001, respectively; Fig. 1a). Because the expression levels of the nine NER genes were measured in the same samples at the same time, they were likely to be correlated with each other. As shown in Supplementary Table 1, the mRNA expression levels of XPA and XPB were each statistically significantly correlated with those of the other eight NER genes. When the expression levels of both XPA and XPB were entered into the stepwise logistic regression analysis, only XPB remained in the final model (P = 0.001).

Table 2 Comparison of the mRNA expression levels of nine core nucleotide excision repair genes between cases and controls
Fig. 1

(a) Relative mRNA expression levels of nine NER genes in HNSCC patients and healthy controls, measured by quantitative real-time PCR; (b) modification effect of smoking status on XPB; (c) modification effect of drinking status on XPB.

Stratification analyses of the expression levels of XPA and XPB

In stratification analyses, cases had significantly lower mean XPB expression levels than controls in the subgroups aged ≤ 59 years and > 59 years, males, never, former and current smokers, and former and current drinkers (P < 0.001, P = 0.035, P < 0.001, P = 0.042, P < 0.001, P < 0.001, P = 0.008, and P < 0.001, respectively; Table 3). Likewise, cases had significantly lower mean XPA expression levels than controls among males, former smokers, current smokers, and current drinkers (P = 0.024, P = 0.015, P = 0.021, and P = 0.006, respectively). Within both cases and controls, XPA and XPB expression levels did not differ significantly by age or sex (P = 0.539, P = 0.507, P = 0.972, P = 0.490, P = 0.514, P = 0.829, P = 0.944, and P = 0.103, respectively). Among cases, current smokers and current drinkers had significantly lower XPA and XPB expression levels than former and never smokers and drinkers (P < 0.001, P < 0.001, P = 0.029, and P < 0.001, respectively), whereas among controls only one of the corresponding four comparisons reached significance (P = 0.077, P = 0.055, P = 0.130, and P = 0.043, respectively; Table 3). XPA and XPB expression levels did not differ significantly by tumor site (Supplementary Table 2).

Table 3 Stratification analyses of mRNA expression levels of XPA and XPB between cases and controls

Associations between mRNA expression levels of NER genes and risk of HNSCCs

To estimate HNSCC risk, the relative mRNA expression levels of NER genes were dichotomized at the median values in the controls (Table 4). Compared with high XPB expression, low XPB expression was associated with a crude OR for HNSCC of 1.70 (95% CI 1.18–2.44). After adjustment for all covariates in the logistic regression model, the OR for XPB remained essentially unchanged. When continuous mRNA expression values were used in the logistic regression analyses with adjustment for all covariates, there was also a dose–response relationship between reduced XPB mRNA expression and increased HNSCC risk (Ptrend = 0.001, Table 4).

Table 4 Logistic regression analysis of mRNA expression levels of nine NER genes in cases and controls

Interactions between XPB or XPA mRNA expression levels and selected variables

We further assessed possible multiplicative interactions between XPB or XPA expression levels and the demographic variables listed in Table 1. Smoking status and drinking status each had a significant multiplicative interaction with XPB expression levels (P = 0.001 and P = 0.042, respectively; Table 3), but not with XPA expression levels, in association with HNSCC risk. To further explore these interactions, we stratified the adjusted ORs by smoking and drinking status. The ORs for low (below-median) XPB expression were greater in former and current smokers than in never smokers (Fig. 1b), and greater in former and current drinkers than in never drinkers (Fig. 1c).

We further used ROC curves to assess how well models integrating demographic variables and XPB expression levels predicted HNSCCs. Compared with the model without them, the AUC was significantly improved in the model that additionally included XPB expression levels and their interaction with smoking status or drinking status (Fig. 2a; P = 0.041 and P = 0.009, respectively). In analyses stratified by smoking status, including XPB expression significantly improved the AUC in current smokers (Fig. 2d; P = 0.006) and former smokers (Fig. 2c; P = 0.021), but not in never smokers (Fig. 2b; P = 0.989). By contrast, the AUC was not significantly improved in current, former, or never drinkers (Supplementary Fig. 1a, b, c; P = 0.092, P = 0.110, and P = 0.941, respectively).

Fig. 2

Overall and stratified ROC curves by smoking status, calculated from multivariate logistic models. (a) The AUC was significantly improved in the model that included the combined effect of smoking status or drinking status and XPB expression levels, compared with the model that did not (P = 0.041 and P = 0.009, respectively). (b) In never smokers, the AUC was not significantly improved by including XPB expression levels (P = 0.985). (c) In former smokers, the AUC was significantly improved (P = 0.021). (d) In current smokers, the AUC was significantly improved (P = 0.006).

Discussion

In the current study, we confirmed the finding of our previous study in a non-Hispanic white population that reduced mRNA expression levels of NER genes are associated with an increased risk of HNSCCs. However, in the Chinese population the gene implicated was XPB rather than DDB1. Furthermore, smoking status and drinking status each had a significant multiplicative interaction with XPB mRNA expression levels on HNSCC risk, and the ROC analysis suggested that the combined effect of XPB expression and smoking status further improved risk prediction.

In a previous study, we measured NER mRNA expression levels in HNSCC cases and healthy controls from a non-Hispanic white population [28] and found that lower DDB1 mRNA expression levels were associated with an increased risk of HNSCCs. In Western countries, most oropharyngeal cancers are HPV-positive HNSCCs resulting from HPV exposure through sexual activity [8, 32]. In China, however, oropharyngeal cancers are primarily caused by tobacco smoking and alcohol consumption rather than HPV infection. Importantly, the etiologic mechanism of HPV-positive HNSCCs differs considerably from that of smoking-related HNSCCs [33]. We therefore conducted another case–control study to investigate the relationship between mRNA expression levels of NER genes and HNSCC risk in the Chinese population.

In this study, reduced mRNA expression levels of XPB, rather than DDB1, were associated with an increased risk of HNSCCs, which is consistent with a previous translational study [34]. The previous study found significantly lower mRNA expression levels of NER genes in female controls than in male controls, and suggested that reduced DDB1 mRNA expression may play a more important role in HNSCC risk in males than in females [28]. However, we observed no significant differences in XPB mRNA expression levels between male and female subjects, nor any interaction with sex. Interestingly, smoking and drinking status each modified the association for XPB, suggesting that XPB expression may be partly affected by smoking or drinking and that the association between reduced XPB expression and increased HNSCC risk may differ by smoking or drinking status. When we stratified the ORs by smoking status, the adjusted ORs for XPB were greater in former and current smokers than in never smokers, suggesting that low XPB expression confers a greater HNSCC risk in ever smokers. Although the transcript-level findings in the Chinese population differed from those in the non-Hispanic white population, smoking status appears to be more important in the etiology of smoking-related HNSCCs. This discrepancy may be explained, first, by fluctuations in expression levels measured by RT-PCR at different cell stages and, second, by the different types of oropharyngeal cancer (predominantly HPV-positive versus smoking-related) in the two racial groups. Moreover, XPA and XPB expression did not differ between tumor sites, suggesting that XPB mRNA expression may be a general biomarker for HNSCCs across sites.

The protein encoded by XPB is a subunit of transcription factor IIH (TFIIH) that functions as a DNA-dependent ATPase/helicase [36, 37]. The present finding is consistent with that of a previous translational study in a non-Hispanic white population, suggesting that altered transcript and protein levels of XPB may both contribute to HNSCC risk. The protein encoded by XPA is a zinc-finger DNA-binding protein that participates in NER by detecting DNA damage [38]. We found that XPA expression levels were statistically significantly lower in cases than in controls, which is consistent with the previous study [34]. However, the association between reduced XPA expression and increased HNSCC risk was not statistically significant.

In a previous study, NER gene expression levels improved the prediction of HNSCC risk more evidently in males than in females [28]. In the current study, the AUC was significantly improved by including the combined effect of smoking status and XPB expression, especially in ever smokers, suggesting that suboptimal XPB expression levels may play a more important role in HNSCC risk in ever smokers than in never smokers.

The PCR assay is a rapid, cost-effective, and efficient method for measuring mRNA expression levels of NER genes, and with it we found that reduced mRNA expression levels of NER genes were associated with an increased risk of HNSCCs in two different racial groups [28, 35, 39]. The PCR assay is therefore well suited to future epidemiologic studies. However, this study has several limitations. As in other hospital-based studies, the control group may not be representative of the general population; future studies will need a much larger sample size and controls recruited from the community. Future mechanistic studies are also needed to clarify the role of NER genes in the etiology of HNSCCs in the Chinese population.