Introduction

Depleting systemic estrogen concentrations by inhibiting the CYP19A1 (aromatase)-mediated conversion of androgens to estrogens is an effective treatment strategy for hormone receptor positive (HR +) breast cancer. Third-generation aromatase inhibitors (AIs) (anastrozole, letrozole, or exemestane) are first-line treatment in postmenopausal patients with HR + breast cancer [1, 2]. However, AI treatment is associated with characteristic toxicities that cause treatment non-persistence, which increases risk of breast cancer recurrence and death [3]. One of the toxicities that most often causes treatment discontinuation is AI-induced musculoskeletal syndrome (AIMSS)[4,5,6,7]. AIMSS is characterized by joint pain and stiffness, myalgias, carpal tunnel syndrome, tenosynovitis, and/or reduced grip strength and is experienced by approximately half of AI-treated patients [8,9,10].

AIMSS risk may be affected by clinical factors such as body mass index (BMI), prior chemotherapy (especially taxanes), and prior tamoxifen. However, these clinical factors alone do not accurately predict which patients will necessitate treatment discontinuation due to AIMSS [10,11,12,13]. Inherited germline variants in candidate genes including CYP19A1 [14,15,16] and ESR1 [17, 18] have been reported to affect AIMSS risk [19]. Additionally, a genome-wide association study (GWAS) identified variants in TCL1A that may predict AIMSS; however, this has not been successfully validated in other cohorts [20, 21]. Further pharmacogenetic discovery and replication are necessary to identify variants that may predict which patients are at increased risk of AIMSS and AI treatment discontinuation.

A substudy of the Exemestane and Letrozole Pharmacogenetics (ELPh) study was conducted to document the clinical course of AIMSS and identify clinical and genetic risk factors, primarily using candidate gene approaches [22, 23]. We recently performed genome-wide genotyping within ELPh to validate that CYP2A6 genetics was a major determinant of letrozole pharmacokinetics [24]. The objective of the current study was to identify new genetic variants associated with discontinuation of AI therapy due to AIMSS in ELPh participants using a genome-wide approach and, secondarily, replicate previously reported pharmacogenetic associations.

Materials and methods

ELPh patients and treatment

ELPh was a prospective, open-label study that enrolled postmenopausal women with stage 0–III HR + breast cancer considering AI therapy. AI could be given as front-line treatment or following completion of local (i.e., surgery and/or radiation) and systemic (tamoxifen and/or chemotherapy) treatment [25]. Patients were randomized 1:1 to receive oral exemestane (25 mg/day) or letrozole (2.5 mg/day) for 2 years, and the arms were stratified by prior treatment with bisphosphonate, tamoxifen, and chemotherapy. Patients were enrolled from August 2005 to July 2009 from  Indiana University Melvin and Bren Simon Comprehensive Cancer Center, Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins University, and the University of Michigan Rogel Cancer Center. Due to the small number of non-white patients in the ELPh study, all analyses and results were restricted to self-reported white patients. All patients provided written informed consent prior to enrollment and the study was approved by the Institutional Review Boards of each site.

Collection of treatment discontinuation due to AIMSS

Reasons for discontinuation of AI treatment, including the specific side effect, were prospectively recorded by study coordinators at each site. The primary end point for this pharmacogenetic analysis was the time to discontinuation of the initial AI medication due to musculoskeletal toxicity (i.e., AIMSS), defined as arthralgias, myalgias, joint pain or stiffness, tendonitis, numbness or tingling, and/or carpal tunnel syndrome. We have previously reported analyses of clinical and candidate genetic predictors of this endpoint [4, 23, 26].

Genome-wide genotyping and imputation

Methods for germline DNA isolation, genome-wide genotyping, imputation, and quality control have been previously reported [27]. Briefly, pre-treatment germline DNA [28] underwent genome-wide genotyping on the Infinium Global Screening Array V2.0 (> 650,000 variants) at the University of Michigan Advanced Genomics Biomedical Research Core. All sample call rates were > 98%. Variants with call rates (< 95%) or large departure from Hardy Weinberg Equilibrium (p < 10–6) were removed. Imputation was performed to generate allelic doses for > 16 million variants using the Michigan Imputation Server [29], SHAPEIT, and Eagle (v2.4), with removal of variants with imputation r2 < 0.20.

Statistical methods

Cox proportional hazards models were used to test the association between each genotyped or imputed variant and time to discontinuation of AI therapy due to AIMSS. Patients who discontinued for reasons other than AIMSS were censored at the time of discontinuation. Associations were analyzed under an additive genetic model using the “gwasurvivr” package in R [Rizvi et al. 2019] and adjusted for clinical variables that were previously reported to be associated with discontinuation due to AIMSS, including age (≤ 55 years), prior taxane chemotherapy (Yes vs. No), pre-treatment pain score on the visual analog scale (continuous), and treatment arm (exemestane vs. letrozole) [4]. As a secondary analysis, we repeated all association analyses by AI treatment arm (exemestane versus letrozole) to identify AI-specific associations. For variants that were significantly associated in one AI treatment group but not the other, we also tested the significance of an interaction term in the combined cohort.

Variants with minor allele frequencies (MAF) less than 0.025 in the combined cohort or 0.05 in the treatment-specific arms were excluded. Unless noted otherwise, results are reported for imputed allelic dosages. P values less than 5 × 10–8 were considered genome-wide significant. For the twenty-five variants in 11 genes that were previously reported to be associated with AIMSS risk [16, 18,19,20,21, 30,31,32,33], p values less than 0.05 were considered statistically significantly. Where noted below, associations for a subset of these variants were previously reported in the ELPh cohort [22, 23].

Similar to our previous GWAS, dbSNP and LDlink [34] were used to annotate variants of interest and assess patterns of linkage disequilibrium. Genetic variants that were significantly associated with discontinuation of treatment were also functionally interrogated using publicly available databases, including Genotype-Tissue Expression (GTEx) [35], RegulomeDB [36, 37], and the mQTL database [38]. Unless specified otherwise, all analyses were carried out using a combination of R and Python programs.

Results

Of the 503 patients enrolled on the ELPh study, 400 self-reported white patients who initiated AI treatment and had genome-wide genotype data were included in the analysis (Fig. 1). Demographic and clinical data for these patients are presented in Table 1 [28]. One hundred patients discontinued their treatment due to AIMSS, with a median time of 6.3 months (interquartile range of 8.3 months). Clinical variables previously reported to be associated with AIMSS risk in ELPh [4] had consistent findings in this analysis (i.e., increased AIMSS risk with younger age, prior taxane chemotherapy, higher baseline pain score, and exemestane treatment) (data not shown).

Fig. 1
figure 1

CONSORT diagram of patient flow from ELPh Trial to this AIMSS GWAS

Table 1 Characteristics of breast cancer patients by AI treatment

In the primary analysis of the combined cohort of 400 AI-treated patients, two imputed variants (rs79048288 and rs912571) were associated with AIMSS-related treatment discontinuation risk after adjustment for clinical covariates (Table 2, Fig. 2). Patients carrying the “T” allele at rs79048288 had increased risk of AIMSS (HR = 4.42, 95% CI: 2.67–7.33, p = 7.69 × 10–9, Supplementary Fig. 1). Rs79048288 is an intronic variant within CCDC148, the gene encoding the coiled-coil domain containing 148 protein. According to GTEx or RegulomeDB, this variant is not known to affect expression of CCDC148 or any other genes or protein. The RegulomeDB score for this variant is 0.35 (scale 0.0–1.0), with higher scores indicating increased likelihood of being a regulatory variant. There was no interaction between rs79048288 and treatment arm (p = 0.30), indicating that the association between rs79048288 and AIMSS risk is not AI-specific.

Table 2 Variants associated with time to discontinuation of AI treatment due to AI-induced musculoskeletal symptoms
Fig. 2
figure 2

Associations between all imputed variants and discontinuation of AI therapy due to AIMSS after adjustment for age (under 55 years), baseline pain score on visual analog scale, prior taxane chemotherapy treatment, and drug (exemestane). Two variants (rs79048288 and rs912571) were significantly associated with AIMSS risk in the combined cohort (p value < 5 × 10–8, indicated by red horizontal line)

Carriers of the “G” allele at rs912571 were protected from AIMSS in the combined AI therapy GWAS (HR = 0.30, 95% CI: 0.20–0.47, p = 4.74 × 10–8, Supplementary Fig. 2). Rs912571 is in an intergenic region upstream of PPP1R14C, the gene encoding protein phosphatase 1 regulatory inhibitor subunit 14C. The RegulomeDB score for rs912571 is 0.43 (0.0–1.0). According to GTEx, rs912571 is also an expression quantitative trait locus (eQTL) for PPP1R14C in sun exposed skin, with higher expression for the variant G allele (p = 4.8 × 10–5). Again, there was no interaction between rs912571 and the AI treatment arm (p = 0.87).

Secondary GWAS analyses were conducted within each drug separately. In the letrozole analysis, two variants in (rs1324052 and rs74418677) were associated with increased AIMSS risk (5.91, 95% CI: 3.16–11.06, p = 2.80 × 10–8, Table 2, Supplementary Figs. 3 and 4). These variants, which are in perfect linkage disequilibrium (R2 = 1), are in an intergenic region downstream of a long noncoding RNA known as LOC105377814. Each variant is associated with expression of SUPT20H in testis tissue in GTEx, with lower gene expression in carriers of the risk variant (i.e., rs74418677 C allele) (both p = 2.3 × 10–5), possibly via a cis-acting methylation effect on the nearby methylation site cg19272349 (mQTL in adults β =  − 0.504, p = 1.99 × 10–9). Based on RegulomeDB, there is stronger evidence that rs74418677 is a regulatory variant than rs1324052 (score of 0.59 versus 0.35). No variants were significantly associated with AIMSS risk in the exemestane analysis (Supplementary Fig. 3). A list of all imputed variants with suggestive associations (p < 1 × 10–5) in the combined, letrozole, or exemestane analyses can be found in Supplementary Table 1. Six candidate variants, rs912571, rs9322336, and rs2347868 in ESR1, rs7984870 in RANKL, and rs2369049 and rs11849538 in TCL1A, were significantly associated with AIMSS risk in either the combined cohort or the exemestane-treated group (p value < 0.05) (Table 3 and Supplementary Table 2). However, these associations have been previously reported in similar analyses of the ELPh cohort [22, 23]. None of the variants that were previously reported to be associated with AIMSS that had not been previously tested in ELPh was associated with AIMSS in this analysis (Supplementary Table 2).

Table 3 Candidate variants associated with time to discontinuation of AI treatment due to AI-induced musculoskeletal symptoms

Discussion

Musculoskeletal symptoms are a common AI treatment-emergent toxicity that often leads to treatment discontinuation [4,5,6,7]. Prior studies, primarily using candidate gene approaches, have reported several genetic variants that affect AIMSS risk [14,15,16,17,18,19]. Additionally, a genome-wide association study (GWAS) identified variants in TCL1A that may predict AIMSS [20]; however, this has not been successfully validated in other cohorts [20, 21]. Our primary GWAS identified two new variants (rs79048288 and rs912571) that were associated with AIMSS risk in the ELPh cohort.

In the primary GWAS analysis, the T allele of the intronic rs79048288 variant within CCDC148 was associated with a 4.4-fold higher AIMSS risk and the G allele of the intergenic rs912571 upstream of PPP1R14C was associated with a 3.3-fold lower AIMSS risk. In silico analyses did not reveal any obvious functional impact of the intronic variant within CCDC148 (rs79048288), and little is known about the physiological function of this protein, except that dysregulated expression has been reported in several cancer types [39, 40]. In silico analyses revealed that the G allele of rs912571 was associated with higher expression of PPP1R14C in sun-exposed skin, indicating that this variant could be functionally consequential. This gene encodes a signal-transducing protein phosphatase, also referred to as KEPI, that is an inhibitor of myosin phosphatase and regulates smooth muscle contraction, providing further suggestive evidence that this variant could be functionally associated with AIMSS [41,42,43].

Two additional variants of interest, rs1324052 and rs74418677, were found to be associated with AIMSS in the letrozole-only analysis. In silico analyses suggest that rs74418677 is a regulatory variant that affects methylation of cg19272349 and expression of SUPT20H, also referred to as P38IP, a protein that is known to be involved in cell cycle regulation and cellular autophagy [44, 45]. Intriguingly, a nonsense variant (p.Lys25X) in this gene was identified as the likely causal variant in hereditary rheumatoid arthritis [46]. We speculate that patients with subclinical arthritis-like conditions are at increased risk of clinically overt musculoskeletal pain when administered letrozole, similar to the identification of hereditary neuropathy genes as predictors of taxane-induced neuropathy [47]. However, we are not aware of any prior studies that have investigated or reported an association for these or other polymorphisms in these genes (i.e., CCDC148, PPP1R14C, or SUPT20H) with AIMSS or any other AI treatment outcome.

Our attempted replication of variants previously reported to be associated with AIMSS found only two significant associations in our combined cohort, both of which have been previously reported in ELPh. The association between rs9322336 in ESR1 and AIMSS risk in exemestane-treated patients was previously reported by our group in 2013 [23]. Interestingly, another group recently reported that this variant was also associated with lower AIMSS risk in an independent cohort of 196 patients treated with letrozole or anastrozole [18]. While these two studies provide consistent and suggestive evidence of association, the association between rs9322336 and AIMSS risk awaits validation in additional studies. In fact, such a validation was recently attempted in the racially diverse ECOG E1Z11 cohort of anastrozole-treated patients using rs2347868, which is modestly correlated with rs9322336 (linkage disequilibrium R2 of 0.31), and no association was detected for this variant or any of the other nine variants tested [48]. The other association for RANKL (rs7984870) was previously reported by our group in 2019 [22] and was itself an attempt to replicate a previously reported association by Wang et al. [49]. This variant was not included in the E1Z11 analysis and was not successfully replicated in our recent analysis of an independent cohort of 143 AI-treated women [50]. Taken together, there is weak evidence that any of these candidate variants in CYP17A1 [14], CYP19A1 [14,15,16], ESR1 [17, 18], or TCL1A [20, 21] are associated with AIMSS risk.

A genetic biomarker of AIMSS could be useful in clinical decision-making. Patients carrying a variant that increases risk of AIMSS for all AI may be candidates for enhanced toxicity monitoring [51] or evidence-based interventions such as exercise, yoga, duloxetine, and acupuncture [52,53,54]. The identification of a genetic variant that increases risk for only one or two of the AI’s would be even more clinically useful. These patients could be switched within the AI class, since all third-generation AI’s are similarly effective [55] and switching within the drug class can improve treatment tolerability and persistence [4]. This study indicates that none of the previously reported genetic biomarkers is sufficiently robust for clinical use. The variants identified in our GWAS, particularly the rs74418677 variant that may increase risk of letrozole-induced musculoskeletal toxicity through expression of SUPT20H, should be prioritized for future validation studies, e.g., within the E1Z11 cohort [48]. Convincing evidence of clinical validity is warranted prior to further investigation of the causal mechanism for these candidate variants and is necessary before these variants can be used for clinical decision-making.

This study had several strengths, including the use of hypothesis-agnostic genome-wide association approach in a large prospectively accrued cohort of patients with a well-documented clinical outcome. However, there were also some limitations that should be considered. Though the ELPh cohort is similar in size to a prior AIMSS GWAS [20], pharmacogenetic sample sizes like these are still orders of magnitude smaller than those used in disease genetics GWAS [56,57,58], which limits power to detect associations with smaller effect sizes or for uncommon variants (i.e., MAF < 0.025), or pathways that are enriched in the detected associations [59]. Finally, we were not able to attempt validation of an AIMSS polygenic risk score reported by another group due to a lack of information in their publication with which to recapitulate their 70-variant signature [60]. While we were not in a position to functionally characterize these variants in preclinical models, whether rs74418677 plays a role in regulating SUPT20H expression should investigated.

In conclusion, we identified several new variants that were associated with AIMSS risk in our cohort of AI-treated patients, including rs912571 (PPP1R14C) and rs74418677 (SUPT20H), that should be prioritized for attempted replication in independent cohorts of AI-treated patients. Successful validation of these associations is necessary prior to prospective studies that use genetic biomarkers to inform clinical decision making to reduce AIMSS and enhance AI treatment persistence to improve clinical outcomes in patients with HR + breast cancer.