Introduction

The selective estrogen receptor modulator (SERM) tamoxifen and third-generation aromatase inhibitors (AIs), including anastrozole, letrozole, and exemestane, have played a substantial role in decreasing breast cancer mortality, especially when used in the adjuvant setting [1]. Approximately 60–70% of newly diagnosed breast cancers are estrogen receptor (ER)-positive, but only 60% of these will respond to therapy [2]. It is not currently possible to identify which patients with ER-positive cancers will respond to anti-estrogens, nor is it possible to determine whether a particular treatment (tamoxifen or an AI) will be more effective for an individual patient.

Tamoxifen is an estrogen receptor antagonist in breast cancer cells, accounting for its favorable anti-neoplastic effect. There are data suggesting that the effectiveness of tamoxifen can be partially attributed to its metabolic activation to more potent anti-estrogenic metabolites including 4-OH-tamoxifen (4-OH-tam) and 4-OH-N-desmethyl-tamoxifen, also called endoxifen [3]. This bioactivation is mediated primarily by the cytochrome P450 (CYP) family member 2D6 (CYP2D6), which shows large phenotypical variations due to genetic polymorphisms [4]. Though hundreds of polymorphisms have been identified, the majority of variation in metabolic activity can be accounted for by a relatively small number of no function (*3, *4, *6) or diminished function (*10, *41) alleles [5].

Low-activity polymorphisms in CYP2D6 are associated with decreased plasma concentrations of endoxifen [3, 6,7,8]. We and others have hypothesized that tamoxifen efficacy would be diminished in patients who have lower endoxifen concentration [9, 10] or carry low-activity CYP2D6 polymorphisms [11,12,13,14]; however, these associations have not been established [15, 16]. These inconsistent results can be attributed to a number of factors, including differences in patient, tumor, or treatment characteristics or incomplete genotyping analysis [17, 18]. Alternatively, the true association may be null or near null [19, 20]. Given the potential clinical significance of a predictive biomarker for tamoxifen efficacy, it is necessary to conduct additional studies in large cohorts of tamoxifen treated patients.

Here we report the results from a retrospective pharmacogenetic analysis of a large prospectively collected patient cohort. Specifically, our objective was to test for an association between CYP2D6 phenotype and benefit of tamoxifen [21], utilizing DNA from tumors collected from patients treated with surgery and adjuvant tamoxifen or with surgery only (n = 500 for each group). Our prespecified hypothesis was that patients who had low CYP2D6 metabolic activity, based on CYP2D6 genotype, would have worse treatment outcomes in the tamoxifen treated cohort but similar outcomes in the surgery only group.

Methods

Patient cohort

This secondary analysis was performed using patients from two breast cancer databases and corresponding biobanks maintained by the Breast Center at Baylor College of Medicine (BCM, Houston, TX) that have been previously described in detail [22]. Briefly, the PPG/P01 database and biobank, funded by the National Cancer Institute (Bethesda, MD), collected tissue and data on disease, adjuvant treatment and outcomes from community physicians for patients with early breast cancer diagnosed between 1970 and 1999. Clinical characteristics and outcomes were similar to data from the Surveillance, Epidemiology and End Results (SEER) Registry for the same period. The second database and biobank, maintained as part of a Breast Cancer Specialized Program of Research Excellence (SPORE) grant funded by the National Cancer Institute, collected similar tissue and clinical data, with follow-up information coming from tumor registries for patients with early breast cancer who were diagnosed and treated between 1984 and 1999 at community hospitals throughout the United States. Comparison to SEER data for early breast cancer from approximately the same time period suggests that death has been very reliably ascertained, while disease recurrence was slightly under-ascertained. This is expected, given that the data were derived from hospital tumor registries rather than physicians' offices. The effective sample size is therefore slightly reduced; however, there is no reason to think that completeness of ascertainment differs by genotype.

Selection of patients from these databases for the BCM Breast Tumor DNA Bank-v1 has been described previously [22]. Briefly, Caucasian patients from either database who had ER+ tumors (≥3 fmol/mg protein), received surgery and tamoxifen (“treated”, n = 500) or no systemic treatment (“untreated”, n = 500), and had complete patient and tumor information and sufficient banked tumor material were selected. Treatment within these observational registries was in accordance with standard clinical practice. Duration of tamoxifen therapy reflects community practice during the time period and patient/physician preference. No patients were treated with adjuvant chemotherapy. A total of 213 and 787 samples came from the P01 and SPORE banks, respectively.

Genotyping and CYP2D6 phenotype assignment

Fresh, whole-tumor specimens were flash frozen and maintained in the biobank. These specimens thawed for approximately 3 days during a tropical storm that flooded the biobank, prior to being refrozen. DNA was isolated using Puregene® DNA Purification Kit (Qiagen) in the BCM Genetics Core. The DNA samples were genotyped for CYP2D6 gene variants using the Taqman® Allelic Discrimination assays (Applied Biosystems, Inc., Foster City, CA) as described previously [7]. The CYP2D6 gene variants determined include: *2 (rs1135840), *3 (rs35742686), *4 (rs3892097), *6 (rs5030655), *10 (rs1065852), *41 (rs28371725), and assays were run in a Step-One Plus instrument (Applied Biosystems, Inc. Foster City, CA). Detailed information on CYP2D6 allele nomenclature can be found at http://www.cypalleles.ki.se/cyp2d6.htm. Call rate for each allele genotyped was >99%; random selection and re-genotyping of approximately 10% of the samples yielded concordance >99%. Hardy–Weinberg equilibrium (HWE) was assessed for each polymorphism via exact tests by using the HWE function in the R package ‘genetics’. Each patient was assigned a predicted CYP2D6 phenotypic activity score (AS) based on the method recommended by PharmGKB by adding the AS assigned to each of the patient’s two alleles (*3, *4, *4×N, *6, *6×N = 0; *10, *41 = 0.5; *1, *2, *10×N, *41×N = 1; *1×N, *2×N = 2). Each patient’s AS was then transformed into a predicted CYP2D6 metabolizer activity phenotype (poor metabolizer (PM) = 0, intermediate (IM) = 0.5, extensive (EM) = 1.0–2.0, and ultra-rapid (UM) >2.0) [23, 24].
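The activity score assignment described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the per-allele values and phenotype cut-offs are taken from the text, while the function names and the "xN" spelling for duplicated alleles are assumptions made for this example.

```python
# Per-allele activity values as listed in the text.
# The "xN" suffix for gene duplications is a notation chosen for this sketch.
ALLELE_ACTIVITY = {
    "*3": 0.0, "*4": 0.0, "*4xN": 0.0, "*6": 0.0, "*6xN": 0.0,
    "*10": 0.5, "*41": 0.5,
    "*1": 1.0, "*2": 1.0, "*10xN": 1.0, "*41xN": 1.0,
    "*1xN": 2.0, "*2xN": 2.0,
}


def activity_score(allele_a: str, allele_b: str) -> float:
    """Sum the activity values of a patient's two alleles."""
    return ALLELE_ACTIVITY[allele_a] + ALLELE_ACTIVITY[allele_b]


def predicted_phenotype(score: float) -> str:
    """Map an activity score to the phenotype categories given in the text."""
    if score == 0.0:
        return "PM"  # poor metabolizer
    if score == 0.5:
        return "IM"  # intermediate metabolizer
    if 1.0 <= score <= 2.0:
        return "EM"  # extensive metabolizer
    if score > 2.0:
        return "UM"  # ultra-rapid metabolizer
    raise ValueError(f"unexpected activity score: {score}")


# Example: a *4/*41 diplotype scores 0 + 0.5 = 0.5 and maps to IM.
example = predicted_phenotype(activity_score("*4", "*41"))
```

For the primary analysis, only the contrast between a score of 0 (PM) and any score above 0 matters, so a patient's classification depends solely on whether both alleles are no-function alleles.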

Statistical analyses

This analysis had a prespecified primary outcome and method of quantifying CYP2D6 activity, and was calculated to have 80% power to detect a hazard ratio (HR) of 2.5 assuming 6% of cases were PM and a sample size of n = 500 in each group. The primary endpoint for all analyses was recurrence free survival (RFS), defined as the period of time following surgery until first recurrence or death, or censoring due to loss of follow-up. Overall survival (OS), the time from diagnosis to death or censoring due to loss of follow-up, was used for secondary analyses. Due to the sparseness of very long-term follow-up data, all patients and analyses were censored at 150 months (12.5 years) of follow-up. Survival curves were estimated using the Kaplan–Meier method. Clinical characteristics and tendency to be treated with tamoxifen differed between patients obtained from the P01 and SPORE databases; therefore, analyses were stratified by database.
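The text does not state which method was used for the power calculation. One common approximation for a two-group survival comparison is Schoenfeld's formula, which gives the number of events needed rather than the number of patients. The sketch below applies it to the stated design values (HR 2.5, 6% PM, 80% power, two-sided alpha of 0.05); it is offered as an assumption-laden illustration and may not match the authors' calculation. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hazard_ratio: float, exposed_fraction: float,
                      alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate number of events required by Schoenfeld's formula.

    exposed_fraction is the proportion of patients in the smaller group
    (here, the assumed proportion of poor metabolizers).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    allocation = exposed_fraction * (1 - exposed_fraction)
    return (z_alpha + z_beta) ** 2 / (allocation * math.log(hazard_ratio) ** 2)


# Design values quoted in the text: HR = 2.5, 6% of patients are PM.
required_events = schoenfeld_events(hazard_ratio=2.5, exposed_fraction=0.06)
```

Under these assumptions the formula asks for roughly 166 events. Whether a cohort of 500 patients supplies that many depends on the event rate over follow-up, which the formula itself does not address.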

Clinical characteristics, genotype frequencies, and outcomes were compared between tamoxifen treated and tamoxifen untreated patients using Chi square or Wilcoxon Rank-Sum tests, as appropriate. Cox proportional hazards analysis was used to identify clinical factors (age, progesterone receptor (PR) status, nodal status, tumor size, database) significantly associated with outcome. Genotype data were defined in two ways: in the primary analysis, PM patients (AS = 0) were compared to all other patients (AS > 0), and in secondary analyses the AS (0–3) was analyzed as a continuous variable. Associations between CYP2D6 PM status (AS = 0) and prognostic clinical variables (age, nodal status, tumor size) were analyzed separately in the tamoxifen treated and untreated cohorts using Chi square and Fisher’s exact tests, as appropriate. Statistical significance of a relationship between genotype and treatment outcomes was assessed using the log-rank test independently in the tamoxifen treated and untreated cohorts. Schoenfeld residuals were inspected and the proportional hazards assumption was tested using the Kolmogorov-type supremum test on 1000 simulated patterns. Variables that violated the proportional hazards assumption (database in all models and PR status in the untreated and combined model) were used as stratifiers in subsequent models. Multivariable models were constructed including significant clinical variables and CYP2D6 genotype to test for independent contribution of CYP2D6. All statistical analyses were performed using SAS v9.3 with two-tailed tests and a standard significance threshold of p < 0.05.
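To make the genotype comparison concrete, the following is a minimal, unstratified two-group log-rank test in plain Python. The analyses reported here were run in SAS v9.3 and were stratified by database, so this sketch illustrates the test statistic only and is not a reproduction of the authors' procedure. The function name and the toy data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]
                 ) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi_square, p_value) on 1 df.

    events_* hold 1 for an observed event (recurrence or death) and
    0 for a censored observation.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Patients still at risk just before time t, per group.
        n_a = sum(1 for ti, _, g in data if g == 0 and ti >= t)
        n_b = sum(1 for ti, _, g in data if g == 1 and ti >= t)
        # Events occurring exactly at time t, per group.
        d_a = sum(1 for ti, e, g in data if g == 0 and ti == t and e)
        d_b = sum(1 for ti, e, g in data if g == 1 and ti == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance is undefined for n == 1
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, p_value


# Toy data (months); every event in group A precedes every event in group B.
chi_sq, p = logrank_test(list(range(1, 11)), [1] * 10,
                         list(range(11, 21)), [1] * 10)
```

A stratified version would accumulate the observed, expected, and variance terms separately within each database and sum them before forming the statistic.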

Results

Patient characteristics

After exclusion of patients missing genetic or clinical information, 476 patients who received adjuvant tamoxifen and 481 patients who did not receive any adjuvant treatment were evaluable for pharmacogenetic analyses (Fig. 1). The demographic, disease, and treatment characteristics of the patients are reported in Table 1. All tumors were ER+ and 77% were PR+. In general, this patient population has favorable prognostic features such as small tumors (48% <2 cm) and low rates of metastasis (66% node negative). There are significant differences between patients who received tamoxifen treatment and those who did not in several of the patient and tumor characteristics, including age, tumor size, and nodal status. This expected finding reflects the non-randomized, community-based nature of the cohorts in this retrospective analysis: patients with more aggressive tumors were more likely to receive additional adjuvant treatment, as decided by their treating physicians. The median follow-up for patients was 121 and 124 months for tamoxifen treated and untreated patients, respectively.

Fig. 1

Consort diagram depicting the patient flow from initial selection from the SPORE or P01 databases into the final analysis

Table 1 Summary of patient, tumor, genetic, and outcomes data in tamoxifen treated and untreated cohorts

Association between clinical variables and treatment outcome

The DNA bank from which a patient was drawn was highly significantly associated with RFS (Table 2) and OS (data not shown) in both the tamoxifen treated and untreated cohorts; therefore, all analyses were stratified by bank. As expected, age, tumor size, and nodal status were independently associated with RFS in the treated and untreated cohorts (all univariate p < 0.05). PR status was not associated with outcome (p = 0.32).

Table 2 Associations with recurrence free survival in tamoxifen treated and untreated patients in univariate and multivariable analyses(a)

Genotyping results

The genotype counts for tamoxifen treated and untreated patients included in the analysis are reported in Supplementary Table 1. All minor allele frequencies were similar to expected frequencies in a predominantly Caucasian cohort [23]. Of note, the common no-activity CYP2D6*4 and diminished activity *41 alleles were within expected Hardy–Weinberg proportions. The CYP2D6*2 allele was not within the expected Hardy–Weinberg proportions; however, this is irrelevant as the *2 allele is categorized as metabolically normal (AS = 1), similar to wild-type *1 [24]. CYP2D6 diplotype was translated into a predicted activity phenotype for each patient (Supplementary Table 1).
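The Hardy–Weinberg checks reported above were run with the HWE function in the R package ‘genetics’. As an illustration of what an exact test of this kind computes, the sketch below enumerates every possible heterozygote count given the observed allele counts and sums the probabilities of outcomes no more likely than the one observed. It is a generic textbook formulation written for this example, not the R implementation, and the function name and genotype counts are hypothetical.

```python
import math


def hwe_exact_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Exact test p-value for Hardy-Weinberg equilibrium at a biallelic site."""
    n = n_aa + n_ab + n_bb      # number of individuals
    n_a = 2 * n_aa + n_ab       # copies of allele A
    n_b = 2 * n_bb + n_ab       # copies of allele B

    def log_prob(het: int) -> float:
        # Probability of `het` heterozygotes conditional on the allele counts.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * math.log(2)
                + math.lgamma(n + 1)
                - math.lgamma(hom_a + 1)
                - math.lgamma(het + 1)
                - math.lgamma(hom_b + 1)
                + math.lgamma(n_a + 1)
                + math.lgamma(n_b + 1)
                - math.lgamma(2 * n + 1))

    # Heterozygote counts must share the parity of the allele counts.
    probs = {het: math.exp(log_prob(het))
             for het in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    p_observed = probs[n_ab]
    # Sum outcomes at most as probable as the observed one (small float slack).
    p_value = sum(p for p in probs.values() if p <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical counts: one sample close to HWE, one with no heterozygotes.
p_balanced = hwe_exact_p(25, 50, 25)
p_no_hets = hwe_exact_p(50, 0, 50)
```

A marked heterozygote deficit, as in the second example, is the pattern that tumor loss of heterozygosity or population admixture would be expected to produce.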

Association between CYP2D6 and prognostic clinical variables

CYP2D6 poor metabolizer status (AS = 0) was not associated with age or tumor size in either the tamoxifen treated or untreated cohorts (all p > 0.05, data not shown). A nominal association with nodal status was detected in the tamoxifen treated patients, in which patients with CYP2D6 PM status were more likely to have ten or more positive nodes (5/28 = 17.9%) than patients with AS > 0 (16/449 = 3.6%) (p = 0.015). A similar association was not detected in the tamoxifen untreated patients (p = 0.42); however, the association maintained significance when the treated and untreated cohorts were combined (p = 0.026, Supplementary Table 2).

Association between CYP2D6 and treatment outcome in tamoxifen treated patients

In the primary analysis there was no association between CYP2D6 non-PM status (AS > 0) and RFS in tamoxifen treated patients (HR 0.68, 95% confidence interval (95% CI) 0.33–1.40, p = 0.29; Table 2 and Fig. 2 (left)). A Cox-based survival curve assuming average clinical variables (1–3 nodes, tumor size of 2–5 cm, and 66.5 years of age) is depicted in Supplementary Fig. 1 (left). Similarly, in a secondary analysis of AS as a continuous variable, there was no association with RFS (HR 1.16, 95% CI 0.84–1.62, p = 0.37, Table 2). After adjusting for relevant clinical covariates (age, tumor size, positive nodes), CYP2D6 non-PM status (p = 0.80) was not associated with RFS; however, there was a borderline significant association of worse RFS as CYP2D6 AS increased (HR 1.43, 95% CI 1.00–2.04, p = 0.05). CYP2D6 non-PM status (p = 0.28) and AS (p = 0.57) were not associated with OS in tamoxifen treated patients (data not shown).

Fig. 2

Recurrence free survival curves stratified by CYP2D6 PM status including 95% confidence intervals (shaded areas) and number at risk (along X-axis). In tamoxifen treated patients (left) there was no association between CYP2D6 genotype and recurrence free survival. In the tamoxifen untreated cohort (right) the patients with CYP2D6 non-poor metabolizer phenotype had significantly better recurrence free survival (HR 0.44, 95% CI 0.22–0.89, p = 0.023) than patients with poor metabolizer phenotype

Association between CYP2D6 and treatment outcomes in tamoxifen untreated patients

A parallel analysis was performed in the cohort of patients that did not receive adjuvant systemic treatment. In the univariate analysis CYP2D6 non-PM status was associated with superior RFS (HR 0.44, 95% CI 0.22–0.89, p = 0.023; Table 2 and Fig. 2 (right)). A Cox-based survival curve assuming average clinical variables (1–3 nodes, tumor size of 2–5 cm, and 66.5 years of age) is depicted in Supplementary Fig. 1 (right). In a secondary analysis of AS as a continuous variable, increasing AS was nearly significantly associated with improved RFS (HR 0.72, 95% CI 0.51–1.00, p = 0.051). In the multivariable model of RFS, nodal status did not maintain significance (p = 0.44). In adjusted analyses, patients with CYP2D6 non-PM status had superior RFS compared to patients with PM phenotype (HR 0.41, 95% CI 0.20–0.84, p = 0.015), and similar results were found when analyzing CYP2D6 AS as a continuous variable (HR 0.66, 95% CI 0.47–0.92, p = 0.015). CYP2D6 non-PM status (p = 0.83) and AS (p = 0.74) were not associated with OS in patients not receiving adjuvant treatment (data not shown).

Discussion

A number of studies have tested the hypothesis that patients with breast cancer who carry low-activity CYP2D6 genotypes have inferior tamoxifen treatment outcomes. A recent meta-analysis detected a small, but statistically significant, increase in tumor recurrence for patients with diminished CYP2D6 activity, particularly for those who carry two non-functional copies of CYP2D6 (PMs, AS = 0) [16]. However, this meta-analysis relied on data from several independent studies, and there is concern that publication bias as well as exclusion of several large studies [12, 14] may be artificially inflating meta-analysis estimates away from the null hypothesis [25,26,27]. Therefore, it is important that additional large, well-conducted analyses testing the CYP2D6/tamoxifen hypothesis are published, regardless of their findings. The current study utilized two large breast cancer registries and biobanks with long-term follow-up to test for an association between CYP2D6 genotype and recurrence free survival in two subcohorts, one that received adjuvant tamoxifen treatment and one that received no adjuvant treatment. All patients had ER+ tumors, none received adjuvant chemotherapy, and CYP2D6 allelic coverage was relatively comprehensive; these three factors have been identified as limitations of many of the previous retrospective studies [17]. In this analysis, there was no decrease in tamoxifen effectiveness for patients with CYP2D6 PM phenotype, though there was evidence of an association in the opposite direction when CYP2D6 activity score was analyzed quantitatively with adjustment for other important clinical characteristics. Additionally, in patients who did not receive adjuvant treatment, higher CYP2D6 metabolic activity was associated with superior outcomes.

After adjustment for clinical characteristics, we found that patients with low CYP2D6 activity have superior tamoxifen treatment outcomes. These data contradict the hypothesis that extent of metabolic activation of tamoxifen to endoxifen is a biomarker for therapeutic effectiveness and are consistent with two previous studies [28, 29]. Analyses of the CYP2D6-tamoxifen hypothesis with the highest strength of evidence, conducted in large prospective clinical trials, have yielded similarly conflicting results [12,13,14]. The potential biases and limitations of all studies to date have been discussed [30,31,32,33], but the overall equivocal results suggest that a true association, if one exists, is likely marginal and only detectable in the most highly selected cohorts. This conclusion is supported by the results of the meta-analysis from the International Tamoxifen Pharmacogenetics Consortium, which only detected an association with recurrence free survival in a carefully selected subcohort of the overall analysis population [16], a filtering process that itself was debated by the research community [26].

This study, unexpectedly, detected an improvement in RFS in patients with higher CYP2D6 activity in the cohort who did not receive adjuvant tamoxifen treatment. Inclusion of an untreated cohort in pharmacogenetic studies is necessary to differentiate between true pharmacogenetic effects that are predictive of treatment outcome and prognostic genetic effects [34,35,36]. If the tamoxifen/CYP2D6 hypothesis were true, one would expect to see patients with higher CYP2D6 activity have superior outcomes in the tamoxifen treated cohort and similar outcomes in the tamoxifen untreated cohort. In contrast, our results indicate that patients with higher CYP2D6 activity have superior outcomes in the tamoxifen untreated cohort and similar, or perhaps inferior, outcomes in the tamoxifen treated cohort. Contrary to the hypothesis, these results suggest that increased CYP2D6 activity may be a prognostic factor associated with superior treatment outcomes in patients not receiving systemic treatment. If true, this adds an additional layer of complexity to previous studies of the CYP2D6/tamoxifen hypothesis, which did not include a tamoxifen untreated control group. However, skepticism is warranted, as a plausible biological rationale for a prognostic effect of systemic CYP2D6 activity on ER+ breast cancer prognosis is not readily available. The physiological role of CYP2D6 is not well defined, as few high-affinity endogenous substrates have been identified. CYP2D6 is responsible for O-demethylation of pinoline [37] and of 5-methoxytryptamine to serotonin [38], which may account for the well-established association between CYP2D6 activity and personality [39, 40]. It is unlikely, though possible, that these physiological differences are related to prognosis of ER+ breast cancer. CYP2D6 has very weak affinity for testosterone [41], suggesting a possible relationship with ER+ breast cancer occurrence or prognosis; however, associations of CYP2D6 polymorphisms with occurrence of ER+ breast cancer have not been detected in very large genome-wide screens [42].

Genotyping for this analysis was performed using DNA isolated from whole-tumor specimens, and not from peripheral blood. Several studies have confirmed a near perfect concordance between CYP2D6 genotypes obtained from tumor and matched germline DNA [11, 43,44,45,46], in contrast with a single study reporting some discordance between CYP2D6*4 genotypes, potentially due to somatic loss of heterozygosity (LOH) [47]. Tumor LOH has been hypothesized to explain large deviations from HWE in the BIG 1-98 analysis [12, 30]. In the present study, CYP2D6*4 was well within expected Hardy–Weinberg proportions, further refuting the hypothesis that tumor genotyping causes meaningful misclassification. Deviations from HWE seen in BIG 1-98 are more likely the result of population admixture, a well-known cause of such deviations [48], similar to the deviations from HWE detected in the multi-center studies included in the ITPC, regardless of whether the genotyping was performed in DNA derived from blood or tumor [49]. Deviation from Hardy–Weinberg proportions for the *2 allele, and potential misclassification of *1 and *2, would have no effect on this analysis as both are fully functional alleles with assigned AS = 1, based on CPIC recommendations [5, 24].

Other limitations of this analysis are also worth mentioning. The use of patients from non-trial-based breast cancer biobanks is subject to biases inherent in retrospective analyses [50], including under-ascertainment of recurrence, and several important data elements were not available for some or all patients, including menopausal status (available for most), tamoxifen treatment duration and/or adherence, and concomitant administration of CYP2D6 inhibitors. Each of these variables has been hypothesized to be an important consideration in analyses of this pharmacogenetic association [16, 17, 51]. Given these limitations, it is critical that our current findings are interpreted in the context of the dozens of previously published studies. The inconsistency of these findings, which span the full range of effect from protective to null to enhanced risk, is consistent with random sampling from a distribution with a modest effect, at most. The marginal association, detectable only in carefully selected patient populations, and the relative infrequency of the PM phenotype (frequency ≈ 6% in Caucasian cohorts) further support recommendations of ASCO [52] and the NCCN [53] against genotyping CYP2D6 to guide tamoxifen treatment, despite confirmation that doing so is feasible and safe [15, 54,55,56,57,58].

In conclusion, in this large, retrospective analysis, tamoxifen treated patients with low-activity CYP2D6 genotype had similar, or perhaps slightly better, treatment outcomes compared with patients with normal or slightly diminished CYP2D6 activity. In a parallel analysis, patients with low CYP2D6 activity genotype who did not receive tamoxifen treatment had inferior treatment outcomes. These findings contradict the underlying hypothesis that low-activity CYP2D6 genotype is associated with inferior tamoxifen benefit. These findings further suggest that the true association between CYP2D6 activity and tamoxifen effectiveness, if one exists, is unlikely to be clinically meaningful.