Introduction

Breast cancer is the most common cancer among women, yet most cancer registries do not capture breast cancer recurrence [1]. This information gap hinders health system monitoring and quality improvement efforts for several reasons. First, given the large number of new breast cancer cases diagnosed each year, the burden of diagnosing and treating recurrences is expected to be significant [2, 3]. Second, survival from primary breast cancer has improved with early detection and advances in treatment, but this has also increased the number of women living with metastatic disease [4]. Third, understanding the risk factors for recurrence could help identify high-risk populations and guide patient-informed treatment strategies for the primary carcinoma [5,6,7].

Most population-level statistics on incidence, risk factors, and mortality are inferred from hospital-based cohort studies or randomized controlled trials, which may not generalize to the population because of differences in surveillance and referral practices [8,9,10,11]. Conversely, an algorithmic approach applied to population-level data has the benefits of automation and more comprehensive data capture with less loss to follow-up. To this end, researchers have sought to use administrative data to capture potential recurrences using diagnostic codes, evidence of subsequent treatment, survival estimates, and other indicators of healthcare utilization [12,13,14,15,16]. Validated against a gold standard, typically medical chart review, these methods have demonstrated utility, despite study-to-study variation in the definition of recurrence and the methodology used to identify it.

Leveraging this groundwork, we sought to estimate the population-level incidence of breast cancer recurrences in Ontario, Canada. We further estimated the proportion of distant metastases to the brain, bone, lung, pleura, liver, and other sites.

Methods

Cohort ascertainment

Patients with a breast cancer diagnosis between January 1, 2013 and December 31, 2017 were identified from the Ontario Cancer Registry (OCR) [17]. Patients were excluded if they were < 18 years of age at diagnosis, were diagnosed after death or by death certificate, had an invalid Ontario health card number, had a non-Ontario postal code, or did not access the healthcare system through the Ontario Health Insurance Plan (OHIP) within 1 year of diagnosis. We further excluded patients with: 1) lymphomas, sarcomas, or other non-breast histologies (n = 298); 2) stage 0, IV, unknown, or missing stage at presentation (n = 5517); and 3) unknown laterality for the primary cancer diagnosis (n = 242).
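The exclusion criteria above amount to a sequence of record-level filters. The following is a minimal sketch in Python, assuming a hypothetical record layout; every field name is illustrative and is not an actual OCR variable name, and the order of the checks is an assumption rather than the authors' documented procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    # All field names are hypothetical; they are not OCR variable names.
    age_at_dx: int
    dx_after_death_or_by_certificate: bool
    valid_health_card: bool
    ontario_postal_code: bool
    ohip_contact_within_1y: bool
    breast_histology: bool        # False for lymphoma, sarcoma, other non-breast
    stage: Optional[str]          # "0", "I", "II", "III", "IV", or None if unknown/missing
    laterality: Optional[str]     # "left", "right", "bilateral", or None if unknown


def is_eligible(p: Patient) -> bool:
    """Return True only if the patient passes every exclusion criterion."""
    if p.age_at_dx < 18:
        return False
    if p.dx_after_death_or_by_certificate or not p.valid_health_card:
        return False
    if not p.ontario_postal_code or not p.ohip_contact_within_1y:
        return False
    if not p.breast_histology:
        return False
    if p.stage not in {"I", "II", "III"}:   # drops stage 0, IV, unknown, missing
        return False
    return p.laterality is not None
```

Applying `is_eligible` to each registry record and keeping the records that return `True` would give the analytic cohort under these assumptions.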

Data sources used to identify evidence of breast cancer recurrence

The Discharge Abstract Database (DAD) and the National Ambulatory Care Reporting System (NACRS) collect information for all inpatient and outpatient hospital encounters, respectively. The Activity Level Reporting (ALR) database includes data on radiation as reported by the regional cancer centers in the province. The ALR database also includes information, by drug name, on anti-neoplastic systemic therapy administered in regional cancer centers and community hospitals. The provincial New Drug Funding Program (NDFP) collects information on newer, more costly intravenous systemic therapies, including whether the indication is specific to recurrent or metastatic breast cancer.

We looked for evidence of recurrence from the date of the primary breast cancer diagnosis until the censoring date, defined as the earliest of (1) the next cancer diagnosis (any new cancer in the OCR, excluding non-melanoma skin cancer); (2) the last contact date with the healthcare system (DAD, NACRS, OHIP); (3) death (OCR or Registered Persons Database [RPDB]); or (4) December 31, 2021. This interval was referred to as the recurrence follow-up period (the time the patient is “at-risk” of developing a recurrence).
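The censoring rule is a minimum over up to four dates. A minimal sketch, assuming each date has already been extracted per patient (the function and argument names are hypothetical) and that follow-up in months is approximated with an average month length:

```python
from datetime import date
from typing import Optional

STUDY_END = date(2021, 12, 31)  # administrative end of follow-up stated in the text


def censor_date(next_cancer: Optional[date],
                last_contact: Optional[date],
                death: Optional[date]) -> date:
    """Earliest of next cancer, last health-system contact, death, or study end."""
    candidates = [d for d in (next_cancer, last_contact, death) if d is not None]
    return min(candidates + [STUDY_END])


def follow_up_months(diagnosis: date, censor: date) -> float:
    """Follow-up length in months; 30.4375 days per month is an assumption."""
    return (censor - diagnosis).days / 30.4375
```

For example, a patient with no subsequent cancer and no recorded death whose last contact was on May 1, 2020 would be censored on that date rather than at the end of 2021.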

Primary laterality

The OCR collects information on laterality (left, right, or bilateral) for the primary diagnosis. When assigning a new breast cancer to an existing case, the case resolution process considers histology, laterality, and temporality of the subsequent breast cancer. To allow for some misclassification, we considered patients who had a new contralateral breast cancer diagnosed within 6 months of the primary diagnosis to have had bilateral primary breast cancer (using the primary cancer diagnosis date as the diagnosis date) [18].
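The six-month rule can be sketched as follows. This is an illustration only: the function name is hypothetical, and treating "6 months" as 183 days is an assumption, since the text does not state how the window was computed.

```python
from datetime import date
from typing import Optional


def resolve_primary_laterality(primary_side: str,
                               primary_dx: date,
                               contralateral_dx: Optional[date],
                               window_days: int = 183) -> str:
    """Reclassify as bilateral when a contralateral cancer follows closely."""
    if primary_side == "bilateral":
        return "bilateral"
    if contralateral_dx is not None:
        gap = (contralateral_dx - primary_dx).days
        if 0 <= gap <= window_days:
            # The primary diagnosis date is retained as the diagnosis date.
            return "bilateral"
    return primary_side
```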

Covariates

Patient sociodemographic characteristics included age at primary diagnosis, rurality, and neighborhood-level marginalization derived from the 2016 Canada Census using the Postal Code Conversion File, PCCF+ v7B. Clinical characteristics included Charlson comorbidity score (DAD/NACRS), symptomatic presentation, screening through the Ontario Breast Screening Program, and overall stage, hormone receptor (HR) status (positive for either estrogen or progesterone receptors), and HER2 status from the Collaborative Staging database [17]. HER2- patients who received trastuzumab within 1 year of diagnosis were reclassified as HER2+.

Indicators of recurrence

Flags for local, regional, and distant recurrence/metastasis were created (Fig. 1). Patients were classified into 4 mutually exclusive categories based on the presence of: (1) distant metastasis (DM) flag; (2) regional recurrence (RR) flag; (3) local recurrence (LR) flag; and (4) no recurrence/metastasis.
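Mutually exclusive categories can be obtained from overlapping flags by applying them in order of severity. The sketch below assumes that the order in which the categories are listed is also the order of precedence (distant over regional over local); the function name and labels are hypothetical.

```python
def classify_recurrence(dm_flag: bool, rr_flag: bool, lr_flag: bool) -> str:
    """Assign one category per patient, most severe flag first."""
    if dm_flag:
        return "DM"      # distant metastasis overrides any other flag
    if rr_flag:
        return "RR"      # regional recurrence without distant metastasis
    if lr_flag:
        return "LR"      # local recurrence only
    return "none"        # no recurrence/metastasis flag
```

Under this rule, a patient with both a local recurrence flag and a distant metastasis flag is counted once, as DM.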

Fig. 1

Indicators for local, regional, and distant metastases from primary breast cancer. Indicators were searched starting from the date of primary surgical resection until end of follow-up. Indicators marked with an asterisk (*) were searched starting 1 year after the primary resection. Indicators marked with a double asterisk (**) were not considered in this study but have been reported by others.

Ipsilateral ductal carcinoma in situ (DCIS)

We captured the first ipsilateral DCIS from the OCR (ICD-O-3 behavior code 2) after the date of primary breast cancer diagnosis.

Second breast surgery

We used DAD/NACRS to identify a second breast cancer surgery occurring > 12 months after diagnosis to reduce the likelihood of misclassifying revisions or reconstructions after the primary resection (eTable S1). To determine the laterality of the second surgery, we used the intervention location attribute. The accuracy of this attribute code was verified (overall agreement 96.4%) using the laterality of the primary cancer diagnosis from the OCR and the laterality of the primary surgical resection (eTable S2).

Radiation

Radiation (ALR) delivered to the ipsilateral or contralateral breast, ipsilateral or contralateral axilla, supraclavicular region, or chest/chest wall was only considered if administered > 12 months after the primary diagnosis, to avoid erroneously capturing primary treatment (eTable S3). No time restriction was placed on radiation administered to other sites. The accuracy of the laterality for primary radiation treatment was 98.9% (eTable S2).

Systemic therapy

We used the ALR and NDFP databases to capture specific chemotherapeutic agents suggestive of or indicated for recurrent/metastatic breast cancer (eTable S4). For drugs or policies specific to breast cancer recurrence/metastasis (e.g., pertuzumab), no time restriction was applied (i.e., the earliest treatment was captured any time after diagnosis). For systemic agents that could be used to treat either primary breast cancer or recurrence/metastasis (e.g., doxorubicin), evidence of recurrence was considered only if the agent was administered > 12 months after the primary cancer diagnosis.
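The two-tier timing rule for systemic therapy can be sketched as below. The drug sets contain only the two examples named in the text and stand in for the full list in eTable S4; the function name, the set names, and the use of 365 days for "12 months" are assumptions.

```python
from datetime import date

# Illustrative placeholders only; the study's actual drug list is in eTable S4.
RECURRENCE_SPECIFIC = {"pertuzumab"}   # counted at any time after diagnosis
DUAL_USE = {"doxorubicin"}             # counted only > 12 months after diagnosis


def suggests_recurrence(drug: str, given: date, diagnosis: date,
                        lag_days: int = 365) -> bool:
    """Return True if this administration counts as evidence of recurrence."""
    name = drug.lower()
    if given < diagnosis:
        return False                   # administrations before diagnosis are ignored
    if name in RECURRENCE_SPECIFIC:
        return True
    if name in DUAL_USE:
        return (given - diagnosis).days > lag_days
    return False
```

The same lag-based pattern applies to the other time-restricted indicators, such as second breast surgery and radiation to the breast or axilla.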

Intervention codes for possible metastatic disease

We captured interventions (i.e., resection, fixation, destruction, repair) conducted on the four most common sites of metastatic breast cancer (brain, bone, liver, lung) (see eTable S1 for DAD/NACRS procedure codes).

Diagnostic codes for secondary malignancy

We used ICD-10 diagnostic codes to flag secondary malignancies (eTable S5).

Statistical methods

To assess the internal validity of these codes for capturing recurrence/metastasis, we used overall survival (OS) as an outcome indicator, since OS was derived independently of the recurrence/metastasis flags, on the assumption that patients with a recurrence would have worse OS than those without [15]. OS was presented using Kaplan–Meier plots. Factors associated with recurrence/metastasis were assessed using multivariable logistic regression, reporting odds ratios (OR) with 95% confidence intervals (CI).
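For readers unfamiliar with these two methods, the sketch below shows the Kaplan–Meier (product-limit) estimator and the conversion of a logistic regression coefficient into an odds ratio with a Wald-type 95% CI. It is a plain-Python illustration, not the SAS code used in the study; the function names are hypothetical, and the Wald interval (z = 1.96) is an assumption about how the CIs were formed.

```python
import math


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death.

    times: follow-up durations; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald CI from a logistic coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

Comparing the curves returned by `kaplan_meier` for flagged and unflagged patients mirrors the internal-validity check described above, with a log-rank test supplying the p-value.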

Software and privacy

All analyses were conducted at Ontario Health using Statistical Analysis Software (SAS Institute Inc., Cary, NC). Research ethics approval was not required as per the Ontario Health privacy office.

Results

After exclusions, 45,857 patients were included (Fig. 2). The median recurrence follow-up period was 67.1 (IQR 51.8–84.8) months [mean 65.8 (SD 24.6) months]. The mean age at diagnosis was 63.1 (SD 13.6) years; 21,187 (46%) patients were diagnosed with stage I breast cancer, 35,773 (78%) had ductal carcinoma, 31,590 (69%) had HR+/HER2- disease, 39,726 (87%) had no comorbidity, and 23,948 (52%) were symptomatic at the time of diagnosis (Table 1).

Fig. 2

Kaplan–Meier plots for overall survival among patients who were classified as having a local recurrence (A), regional recurrence (B), or distant metastasis using the strict definition of only secondary diagnostic codes or radiation intent (C). The overall summary of the recurrence/metastatic event is presented in (D)

Table 1 Patient Characteristics

Local recurrence

A total of 2159 (4.7%) patients had an LR flag a median of 23 (IQR 14–39) months after the primary diagnosis (Table 2). This was driven predominantly by ipsilateral breast surgery (1,448 patients) or radiation to the ipsilateral breast, chest, or chest wall (1,008 patients). Only 27 patients had an ipsilateral DCIS diagnosis during the follow-up period. Using OS as an outcome indicator, patients who had an LR flag had worse OS than patients who had no LR flag (log-rank p < 0.0001) (Fig. 3A).

Table 2 Indicators of breast cancer recurrence
Fig. 3

Product-limit survival estimates

Regional recurrence

A total of 1125 (2.5%) patients had an RR flag a median of 22 (IQR 13–40) months after diagnosis, driven by radiation to the ipsilateral axilla, supraclavicular node, or internal mammary chain (n = 605) or a diagnosis of a secondary malignancy in the axilla as the most responsible diagnosis (n = 395). Using OS as an outcome indicator, patients who had an RR flag had worse OS than patients who did not (log-rank p < 0.0001) (Fig. 3B).

Distant metastasis

A total of 5431 (11.8%) patients had a DM a median of 23 (IQR 9–42) months after diagnosis. Patients most frequently had an indicator for metastasis to the bone (n = 2614; 5.7%), liver (n = 1809; 3.9%), lung (n = 1698; 3.7%), brain (n = 1129; 2.5%), lymph nodes outside the ipsilateral axillary, internal mammary, or supraclavicular regions (n = 822; 1.8%), and pleura (n = 726; 1.6%). A total of 2128 (4.6%) patients had a DM to some other site. Using OS as an outcome indicator, patients who developed a DM had worse OS than patients who did not (log-rank p < 0.0001) (Fig. 3C; eFigure S1 by site of metastasis).

Systemic therapy

A total of 2653 (5.8%) patients received chemotherapy not indicated for primary breast cancer, and 1626 (3.5%) received chemotherapy for a recurrent/metastatic indication, at any time during the recurrence follow-up period. A total of 1473 (3.2%) patients received chemotherapy indicated for breast cancer at least 1 year after diagnosis. Using any of these classifications, a total of 3253 (7.1%) patients received chemotherapy for recurrence/metastasis a median of 26 (IQR 12–45) months after diagnosis.

Summary of recurrences

Because systemic therapy could be provided for either RR or DM, patients receiving chemotherapy without evidence of DM were considered to have had an RR. This decision was made a priori, but was also supported when OS was used as an outcome indicator (eFigure S2). A total of 5431 (11.8%) patients developed a DM a median of 23 (IQR 9–42) months after diagnosis; 1086 (2.4%) patients had an RR a median of 13 (IQR 10–34) months after diagnosis; and 1069 (2.3%) patients had an isolated LR a median of 26 (IQR 16–42) months after diagnosis (eFigure S3 for time-until-event). The remaining 38,271 (83.5%) patients had no evidence of progressive disease during the recurrence follow-up period. Patients with DM had a median survival of 53.3 (95% CI 51.9–54.7) months from the time of primary diagnosis and a median OS of 15.4 (95% CI 14.4–16.4) months from the time the recurrence/metastasis was captured (Fig. 3D–E). In contrast, the median survival for all other patients was not reached.
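Because the four categories are mutually exclusive, their counts should add up to the analytic cohort. A short check using only the counts reported above (the variable names are illustrative):

```python
# Counts per mutually exclusive category, as reported in the text.
counts = {"DM": 5431, "RR": 1086, "isolated LR": 1069, "none": 38271}

total = sum(counts.values())  # should equal the analytic cohort of 45,857

# Percentage of the cohort in each category, rounded to one decimal place.
percent = {name: round(100 * n / total, 1) for name, n in counts.items()}
```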

In the subset of patients diagnosed with primary breast cancer in 2013 (allowing up to nine years of follow-up), 1184/8713 (13.6%) patients developed a DM during the recurrence follow-up period. The occurrence of events followed a non-linear trend, with the risk of capturing a DM decreasing as the time since diagnosis increased: 29.6% of DMs were identified within the first year after diagnosis, 18.0% within the second year, and 14.4% (n = 171) within the third year (Fig. 4).

Fig. 4

Distant metastases identified, by number of years since the diagnosis date

Factors associated with recurrence

We explored factors associated with disease progression (LR, RR, or DM) (eTable S6 for descriptive statistics). After adjustment, patients were more likely to develop a DM (versus regional, local, or no recurrence) if they were older [OR 1.32 (1.06–1.64) for age 75+ versus < 50 years], male [OR 1.59 (1.19–2.12)], had stage II [OR 2.85 (2.62–3.11)] or stage III [OR 8.96 (8.16–9.84)] versus stage I disease, were symptomatic at primary breast cancer presentation [OR 1.87 (1.70–2.06) versus screen-detected], or had 3+ comorbidities [OR 1.56 (1.29–1.89) versus none] (Table 1). Patients were least likely to develop DM if their primary tumor was of mucinous histology [OR 0.41 (0.29–0.58)]. There were no obvious trends related to neighborhood-level socioeconomic characteristics. In the statistical model that did not adjust for receipt of trastuzumab, HER2+ patients were more likely to develop a recurrence among HR+ patients [OR 1.12 (1.01–1.23)], but less likely among HR- patients [OR 0.70 (0.61–0.81)]. After additionally adjusting for receipt of trastuzumab for the primary breast cancer (a mediator for the effect of HER2 status on the risk of recurrence), the estimated effect of HER2+ status increased such that HER2+ patients were more likely to develop a recurrence if they were HR+ [OR 1.79 (1.53–2.10), p < 0.0001], but not if they were HR- [OR 1.16 (0.95–1.42), p = 0.13]. Receipt of primary trastuzumab approximately halved the odds of recurrence [OR 0.53 (0.45–0.63), p < 0.0001].

Discussion

In the present study, we estimated that 2.3%, 2.4%, and 11.8% of stage I-III breast cancer patients had an isolated LR, a (loco)-regional recurrence only, and DM over a median follow-up of 67 months, respectively. OS after isolated LR was similar to the recurrence-free population and moderately worse for patients with (loco)-regional recurrence. However, OS was substantively reduced after a diagnosis of metastatic disease, with a median survival of 15.4 months. Patients presenting with symptomatic breast cancer and more advanced stage were more likely to develop a DM.

Recurrence rates reported in the literature vary due to differences in data availability, follow-up time, the definition of recurrence, and whether DM or RR were included [14]. However, our estimates are aligned with several international investigations. One European study demonstrated a 5-year cumulative incidence of 1.1%, 1.2%, and 7.6% for local, regional, and DM, respectively, after a median follow-up of 72 months [19]. Similar results were reported by one study from China (3.9% for locoregional and 8.8% for DM after a median follow-up of 75 months) [20]. One study in Brazil demonstrated a LR rate of 6.8% a median 28 months after primary breast-conserving surgery and was consistent with estimates from a recent systematic review [21, 22]. One registry-based study from the Netherlands reported a probability of recurrence of 20% after a median follow-up of 10 years, demonstrating a peak risk during the second and eighth years after diagnosis [23]. Other studies using more statistical approaches yielded estimates ranging from 7.6% over a period of 52 months to 19% over 20 years [14, 15].

Several studies have demonstrated that the use of administrative data to identify a recurrence is valid and appropriate. Compared with manual chart review, indicators of a recurrence limited only to subsequent treatment (surgery, chemotherapy, or radiation) yielded sensitivity and specificity ranging from 78 to 81% [24, 25]. However, this approach will not capture untreated DM [26]. One Danish group supplemented treatment with ICD-10 diagnostic codes for secondary malignancies, demonstrating high agreement with medical chart review in a small sample of patients (n = 471, sensitivity = 97%, specificity = 97%) [27]. One Ontario study achieved a sensitivity of 85.3% and specificity of 93.8% using treatment indicators and cause of death from patients treated at three academic tertiary cancer centers [28]. Investigators in Alberta, Canada used several classification and regression tree (CART) models to optimize different measures of discrimination. Their overall predictive accuracy was good using a model that incorporated subsequent treatment, referral to oncologists, visits to cancer centers, visits to oncologists or general surgeons, primary cancer diagnostic codes, and cause of death due to cancer [13]. A similar study from the United States yielded lower accuracy using CART, but employed fewer indicators of recurrence, limited to registry data, treatment, secondary non-breast malignancy diagnostic codes, and a diagnosis of breast carcinoma in situ [12, 29]. To define DM, we opted to exclude any codes for treatments and instead relied on diagnostic codes for secondary malignancies. This was done because the reason for treatments (e.g., surgery) could be multifactorial and not specific to breast cancer metastasis. Prior work to identify metastatic brain cancers indeed found that surgical resections of brain tissue were frequently for new primary brain cancers instead of metastases [30].

Each year in Ontario, we anticipate approximately 12,000 new breast cancer diagnoses [31]. We therefore anticipate that 1632 (13.6% of 12,000) patients each year will present with a DM from a breast cancer that was diagnosed within the previous nine years. This is in addition to the approximately 500 patients per year diagnosed with stage IV breast cancer at initial presentation. Together, this yields at least 2,132 patients per year (17.8% of 12,000) diagnosed with metastatic breast cancer (de novo or after a previous diagnosis). This annual incidence surpasses that of many other primary cancer types, and with clinical trials geared toward this population demonstrating survival benefits of treatment, more accurate prospective data collection on these cases is needed [32]. Because population-level data on these patients are lacking, there is little evidence on the effectiveness of treatments or on geographic variability in the treatment options provided. Our findings indicate that this is not an inconsequential number of people, and efforts should be made to obtain more detailed information on the type of recurrence/metastasis (e.g., location of lesions, number of lesions), treatment options, and the date of diagnosis.
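The projection above is simple arithmetic and can be reproduced directly from the figures in the text. In the sketch below, the function and parameter names are illustrative, and the inputs are the rounded values quoted above (12,000 diagnoses, 13.6%, and 500 de novo cases).

```python
def projected_metastatic_cases(new_dx_per_year: int,
                               dm_proportion: float,
                               de_novo_per_year: int):
    """Return (recurrent DM cases, total metastatic cases, total as % of new diagnoses)."""
    recurrent = round(new_dx_per_year * dm_proportion)
    total = recurrent + de_novo_per_year
    return recurrent, total, round(100 * total / new_dx_per_year, 1)


recurrent, total, pct = projected_metastatic_cases(12_000, 0.136, 500)
```

Changing `dm_proportion` or `de_novo_per_year` shows how sensitive the annual estimate is to each input.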

Limitations

Although this is the first population-based study estimating breast cancer recurrence in Ontario, there are some limitations. First, although recurrences can occur at any time, for select indicators of loco-regional recurrence, we searched starting 12 months after the diagnosis date. This was important to reduce the number of false positives (e.g., misclassifying ongoing treatment or re-treatment for residual disease for the primary cancer as a recurrent event), but came at the expense of false negatives (e.g., missing an early recurrent event) [12, 13]. We believe this trade-off is justified because LRs within the first year of diagnosis are less likely and discriminating recurrences from residual disease is not always clear. Second, although our results have face validity and reasonable convergent validity (e.g., using OS as an outcome indicator; reporting similar incidence estimates as the literature; and reporting similar factors associated with recurrence as the literature and in the expected directions), we were unable to systematically validate our findings using a gold standard (e.g., through an integrated electronic medical record system). Despite this, our definition uses indicators similar to those used by previous studies that demonstrated good agreement with chart review [24, 25, 27]. We believe those studies are generalizable to our patients since many of them were conducted in jurisdictions with similar healthcare systems. Third, we may have overestimated the effect of recurrence on mortality since patients treated with oral hormonal therapy alone (e.g., those with more indolent cancers who respond well to hormonal therapy) may have been excluded. Unlike intravenous therapies, oral hormonal therapies like tamoxifen and aromatase inhibitors are incompletely captured in our databases. Patients can remain on hormonal therapy and switch between agents for many years after their primary cancer diagnosis. However, our databases do not give us the ability to distinguish prolonged therapy for the primary cancer from new therapy for a recurrence. Thus, we opted to omit oral treatments like hormones from the study.

Strengths

Despite the above limitations, one strength of this study is that it was population-based and therefore less likely to miss recurrences because of patient movement within the province. This is one caveat of medical chart review since access to medical records may be restricted to specific institutions, yet patients may be followed elsewhere. At a population level, we provide estimates that are important for system level monitoring and planning for several reasons. First, survival for patients with breast cancer is high, with 5-year OS estimated at 85% for patients with stage I-III disease [33]. Thus, using OS as an outcome in comparative studies often demands large sample sizes with lengthy follow-up for statistical inferences. Clinical trialists therefore often use surrogate endpoints such as recurrence-free survival [34]. This may circumvent some of the problems of using OS as an outcome when mortality is uncommon, but recurrence is not always a surrogate for OS and must be interpreted cautiously [35, 36]. Isolated LR may be a poor predictor of OS since it can be clinically managed, but may instead be considered an indicator affecting patient quality-of-life. DM may instead be considered a better outcome indicator because it was highly prognostic and 82% of all DM were captured within 5 years of diagnosis. Second, since breast cancer is the most common cancer among women, uncommon or even rare events can have a large burden on population-level health system resources. Third, the incidence of a recurrence/metastasis (15.5% of breast cancer survivors) may warrant breast cancer survivors’ consideration for organized screening programs [5, 37, 38].

In future work, comprehensive electronic medical record systems would enable machine learning techniques to use information from imaging reports, pathology reports, and clinic progress notes to classify LR/RR/DM. One recent study using natural language processing of text from progress notes and pathology reports yielded AUCs of 0.93 and 0.87 in the training and validation datasets, respectively [39]. Pathology reports alone would only be available for a non-representative sample of patients: those who had a biopsy or surgical resection to diagnose recurrence. Tissue specimens may be more common for some metastatic sites (breast, liver) but uncommon for others (brain, bone). Thus, imaging reports and clinic notes would provide a more representative corpus of documents for text-mining.

Conclusion

In conclusion, in our study using administrative data, 15.5% of breast cancer patients developed a recurrence/metastasis within 9 years after a diagnosis of breast cancer. Symptomatic patients with stage III primary breast cancer are at elevated risk of developing a DM.