Introduction

Worldwide, breast cancer is the leading cause of cancer death in women; approximately 2.1 million new female breast cancer cases were estimated for 2018, accounting for 25% of female cancers [1]. In China, breast cancer ranks first in incidence and fifth in mortality among female tumors, and its burden has grown rapidly over the past several decades [2]. Breast cancer is a heterogeneous tumor with four major molecular subtypes that differ in prognosis. At present, treatment selection and prognostic evaluation in the clinic are mainly based on these molecular subtypes.

The prognosis of breast cancer patients is mainly determined by pathological characteristics of the tumor, such as stage, grade, human epidermal growth factor receptor 2 (HER2) status, and hormone-receptor status [3, 4]. The search for new prognostic markers to improve survival prediction is ongoing, and the influence of tumor markers on prognosis has recently attracted increasing attention. Cancer antigen 15-3 (CA15-3), a member of the mucin-1 (MUC-1) family of glycoproteins, is over-expressed in cancers and has been identified as a useful tumor marker driven by its altered glycosylation [5]. Carcinoembryonic antigen (CEA) is a cell adhesion molecule, and elevated CEA levels in the blood are associated with tumor metastasis [6, 7]. Cancer antigen 125 (CA125), the product of the MUC16 gene, is a key regulator of multiple cell survival pathways in ovarian cancer and breast cancer cells [8]. The use of CA15-3, CA125, and CEA for predicting breast cancer risk has been widely incorporated into clinical routine, whereas the association of these markers with breast cancer prognosis has not been well demonstrated.

Considerable effort has gone into studying the correlation between tumor markers and breast cancer prognosis. The European Group on Tumor Markers has recommended using CA15-3 and CEA in breast cancer to evaluate the likely future course of a patient’s disease, to detect disease progression early, and to select therapy [9]. In contrast, the American Society of Clinical Oncology (ASCO) has not recommended the use of CEA and CA15-3 for screening, detecting the presence of disease, staging, or choosing the treatment regimen in breast cancer [10]. Elevated CA125 levels have been reported in 84% of patients with metastatic breast cancer [11]. CA125 was reported to be effective in the management of patients with breast cancer and appeared to provide predictive information over the course of the disease [12]. Maric et al. [13] recently reviewed the role of serum tumor markers in breast cancer and indicated that broader investigations of their prognostic value are needed because existing studies are inconsistent. Additionally, most studies have considered breast cancer overall, and the association between these markers and breast cancer outcome by molecular subtype remains to be elucidated.

Therefore, we established a cohort of breast cancer patients in China and assessed the role of tumor markers (CA15-3, CA125, and CEA) in breast cancer outcome, both overall and by molecular subtype.

Materials and methods

Study population and data sources

The Tianjin Breast Cancer Cases Cohort (TBCCC) began in 2007; all participants were patients with pathologically confirmed breast cancer who were diagnosed and treated at the Tianjin Medical University Cancer Institute and Hospital. Demographic and epidemiologic data were collected by one full-time professional staff member using a structured questionnaire shortly after breast cancer was confirmed. Clinico-pathological and treatment data were abstracted from medical records. Follow-up information and vital status were obtained once a year by telephone. The loss to follow-up rate in the study population was 6.04%. In addition, the database was updated annually by checking medical records and by linkage to the Tianjin Cancer Death Registry System to further ensure accuracy and completeness. The last available update of vital status was completed on December 31, 2017. Written informed consent was obtained from each breast cancer patient or the patient’s guardian in the TBCCC, and the current study was approved by the research ethics board of the Tianjin Medical University Cancer Institute and Hospital. The study included 11,851 women identified from the TBCCC who were diagnosed with invasive breast cancer between 2005 and 2014. We excluded patients with stage IV disease at diagnosis (n = 62), patients with preoperative metastasis (n = 16), and patients who received neoadjuvant chemotherapy (n = 937). A total of 10,836 female cases were included in the descriptive and survival analyses (Fig. 1).
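The exclusion steps can be checked with simple arithmetic. The following is a minimal sketch, assuming the three exclusion groups do not overlap; the dictionary keys and the function name are illustrative labels, not identifiers from the cohort database.

```python
# Counts reported in the text; the keys are illustrative labels only.
identified = 11_851
exclusions = {
    "stage IV at diagnosis": 62,
    "preoperative metastasis": 16,
    "neoadjuvant chemotherapy": 937,
}


def remaining_after_exclusions(total, excluded):
    """Subtract each exclusion count in turn and return the final cohort size."""
    for count in excluded.values():
        total -= count
    return total


analysed = remaining_after_exclusions(identified, exclusions)
print(analysed)
```

Under the non-overlap assumption, the result agrees with the number of cases reported for the descriptive and survival analyses.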

Fig. 1 Diagram of the study population

Tumor markers and covariates

Peripheral blood samples (5 mL) were collected from all patients before surgery. Serum was then separated by centrifugation (3000 rpm for 5 min) and stored at −80 °C for later analysis. Serum CA15-3, CA125, and CEA levels were determined using an automatic electrochemiluminescence immunoassay system (ROCHE E170; Roche, Germany). The cut-off points separating normal from elevated levels were 25 U/mL (CA15-3), 35 U/mL (CA125), and 5 μg/L (CEA); values above these cut-offs (CA15-3 > 25 U/mL, CA125 > 35 U/mL, and CEA > 5 μg/L) were considered elevated. We included demographic, epidemiologic, and clinico-pathological variables in the analyses as adjustment variables. Demographic and epidemiologic factors included age at diagnosis, body mass index (BMI), marital status, education, average monthly income, occupation, age at menarche, parity, menopausal status, duration of breastfeeding, oral contraceptive use, smoking status, physical activity, and family history of breast cancer. Clinico-pathological factors included pT, pN, histological grade, stage, ER status, PR status, molecular subtype, chemotherapy, endocrine treatment, and radiation treatment. BMI was calculated as weight in kilograms divided by the square of height in meters. Patients were defined as underweight if BMI ≤ 18.4 kg/m², overweight if BMI ≥ 24 kg/m², and obese if BMI ≥ 28 kg/m². Monthly income was categorized as low, middle, or high. Family history of breast cancer was defined as at least one relative (father/mother, brother/sister, son/daughter, or grandparent) with breast cancer. Patients who smoked regularly (at least one cigarette per month for more than three months) were defined as smokers.
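The cut-off rules and the BMI definition above can be expressed compactly. This is a minimal sketch rather than the study's own code: the function names are hypothetical, and it assumes that "overweight" means a BMI from 24 up to but not including 28 kg/m² and that values between the underweight and overweight thresholds are labeled "normal" (the text does not name that category).

```python
# Cut-off points from the text: CA15-3 and CA125 in U/mL, CEA in ug/L.
CUTOFFS = {"CA15-3": 25.0, "CA125": 35.0, "CEA": 5.0}


def is_elevated(marker, value):
    """A level is elevated only when it is strictly above the cut-off."""
    return value > CUTOFFS[marker]


def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def bmi_category(value):
    """Assumed reading of the thresholds: overweight is 24 to <28, obese is >=28."""
    if value <= 18.4:
        return "underweight"
    if value >= 28:
        return "obese"
    if value >= 24:
        return "overweight"
    return "normal"  # assumed label for the unnamed middle category


print(is_elevated("CA15-3", 25.0), is_elevated("CEA", 5.1))
print(bmi_category(bmi(70.0, 1.75)))
```

A value exactly at the cut-off (for example CA15-3 = 25 U/mL) is treated as not elevated, following the strict inequality in the text.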

Breast cancer molecular subtypes

In our study, the molecular subtypes of some patients were taken from pathology reports, and the rest were obtained by imputation. ER, PR, HER2, and Ki-67 expression was detected by immunohistochemistry (IHC). ER positivity and PR positivity were defined as the presence of ≥ 1% nuclear-stained malignant cells. HER2 status was assessed with a semi-quantitative method based on the stained tumor cell nuclei and the nuclear staining intensity, and was graded from 0 to 3+. Results of “0”, “1+”, or “2+” with negative HER2-FISH were considered HER2 negative, and results of “2+” with positive HER2-FISH or “3+” were considered HER2 positive. We therefore classified the intrinsic subtypes of breast cancer as follows: luminal A (ER+ and PR+, HER2−, Ki-67 < 14%), luminal B (ER+ and PR+, HER2−, Ki-67 ≥ 14%; or ER+ and PR+, HER2+ [luminal HER2]), HER2-enriched (ER−, PR−, HER2+), and basal-like (ER−, PR−, HER2−, CK5/6+ and EGFR+) [14]. It is worth noting that the basal-like subtype is not identical to the triple-negative subtype; the two overlap by approximately 70–80%.
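The HER2 and subtype rules above can be written as a small decision function. This is a sketch of the rules exactly as stated in the text, not the pathology workflow itself; the function and argument names are hypothetical, and combinations that the stated rules do not cover (for example ER+ with PR−) are returned as `None` rather than guessed.

```python
def her2_status(ihc_score, fish_positive=None):
    """HER2 status from the IHC grade (0-3) and, for 2+, the FISH result.

    Returns True (positive), False (negative), or None when a 2+ result
    has no FISH result available.
    """
    if ihc_score in (0, 1):
        return False
    if ihc_score == 2:
        return fish_positive  # None stays None: undetermined without FISH
    if ihc_score == 3:
        return True
    raise ValueError("IHC score must be 0, 1, 2, or 3")


def molecular_subtype(er, pr, her2, ki67=None, ck56=False, egfr=False):
    """Apply the subtype definitions as written; ki67 is a percentage."""
    if er and pr:
        if her2:
            return "luminal B"  # luminal HER2 group
        if ki67 is None:
            return None  # Ki-67 is required to separate luminal A from B
        return "luminal A" if ki67 < 14 else "luminal B"
    if not er and not pr:
        if her2:
            return "HER2-enriched"
        if ck56 and egfr:
            return "basal-like"
    return None  # combination not covered by the stated rules


print(molecular_subtype(True, True, her2_status(2, False), ki67=10))
print(molecular_subtype(False, False, her2_status(3)))
```

Returning `None` for uncovered combinations keeps the sketch faithful to the published definitions instead of silently assigning a subtype.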

We previously reported a subtype classifier for the TBCCC, built with the random forest algorithm and the caret R package [15], to predict unknown molecular subtypes [16]. The data set with known molecular subtypes (n = 1046) was randomly divided in an 80%:20% ratio into a primary cohort (n = 837) and a validation cohort (n = 209). The model was constructed from binary ER, PR, and HER2 status, continuous Ki-67, and age at diagnosis, and was run with five repeats of tenfold cross-validation to avoid overfitting. The imputation achieved an accuracy of more than 99%. Finally, we used the established model to impute the subtypes of patients in the study population whose molecular subtypes were unknown.
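The resampling scheme described here (an 80%:20% split followed by five repeats of tenfold cross-validation) can be illustrated without the classifier itself. The sketch below is written in Python with the standard library only and is not the authors' R/caret code: it uses a plain random split rather than any stratified partitioning, the function names and seeds are arbitrary, and the random forest fitting step is deliberately omitted.

```python
import random


def train_validation_split(n, train_fraction=0.8, seed=0):
    """Shuffle indices 0..n-1 and split them into primary and validation sets."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_train = round(n * train_fraction)
    return indices[:n_train], indices[n_train:]


def repeated_kfold(indices, k=10, repeats=5, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]  # k nearly equal folds
        for held_out in range(k):
            test_idx = folds[held_out]
            train_idx = [i for f, fold in enumerate(folds)
                         if f != held_out for i in fold]
            yield repeat, held_out, train_idx, test_idx


primary, validation = train_validation_split(1046)
splits = list(repeated_kfold(primary))
print(len(primary), len(validation), len(splits))
```

With n = 1046, the split sizes match the primary and validation cohort sizes given in the text, and the generator produces 50 train/test partitions of the primary cohort (5 repeats × 10 folds), on each of which a model would be fitted and scored.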

Statistical analysis

Descriptive analysis was used to summarize tumor and patient characteristics, and results are shown as frequencies and proportions. The Chi-square test was used to compare proportions. The end points were disease progression (defined as local recurrence, or regional or distant metastasis) and death attributed to breast cancer. Disease-free survival (DFS) was defined as the time from the date of surgery to the date of first recorded progression, the date of last follow-up, or the date of death, whichever came first. Similarly, breast cancer-specific survival (BCSS) was defined as the time from the date of surgery to the date of death attributed to breast cancer or the date of last follow-up, whichever came first. Median follow-up time for BCSS and DFS was calculated by the reverse Kaplan–Meier method. The proportional hazards assumption was met on the basis of an analysis of Schoenfeld residuals. No multicollinearity among the independent variables was found by variance inflation factor analysis. DFS and BCSS were estimated with Kaplan–Meier curves, and the log-rank test was used to compare survival among groups. Analyses were performed for breast cancer overall and stratified by molecular subtype. Prognostic factors identified in univariate Cox analysis were entered into backward stepwise Cox proportional hazards regression to select statistically significant variables (P < 0.05, Wald test) for inclusion in the multivariable analysis. Risk estimates are shown as hazard ratios (HRs) with 95% confidence intervals (CIs) for each factor. Interaction between tumor marker levels and molecular subtypes was assessed by adding an interaction term to the Cox model. Data analysis was performed with SAS (version 9.4; SAS Institute, Cary, NC, USA) and R software packages. All P values were two-sided, and P < 0.05 was considered statistically significant.
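To make the Kaplan–Meier and reverse Kaplan–Meier steps concrete, the following minimal sketch estimates a survival curve and a median follow-up time. It is written in Python with the standard library only and is not the SAS or R code used in the study; the function names and the six-patient example are invented for illustration.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    events[i] is True when the end point was observed at times[i] and
    False when the observation was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored cases leave the risk set
    return curve


def median_time(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached


def reverse_km_median_followup(times, events):
    """Reverse Kaplan-Meier: censoring is treated as the event of interest."""
    return median_time(kaplan_meier(times, [not e for e in events]))


# Hypothetical follow-up in months; True = event observed, False = censored.
months = [5, 8, 12, 12, 20, 30]
observed = [True, False, True, False, False, True]
curve = kaplan_meier(months, observed)
print(curve)
print(reverse_km_median_followup(months, observed))
```

In the reverse method the roles of event and censoring are swapped, so the median describes how long patients were under observation rather than how long they survived.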

Results

Patient characteristics

In total, 10,836 female breast cancer cases were included in this study, and the median age at diagnosis was 51.0 (range 13.0–89.0) years. The median follow-up time was 55 (range 0–156) months. The median preoperative CA15-3, CA125, and CEA levels were 9.6 U/mL (range 1.0–395.3), 10.7 U/mL (range 0.5–2988.0), and 1.7 μg/L (range 0.2–1000.0), respectively. Elevated CA15-3, CA125, and CEA levels were identified in 367 (8.3%), 151 (7.6%), and 268 (9.5%) of the participants, respectively. After imputation of unknown molecular subtypes, the analysis included 4513 luminal A (61.3%), 1174 luminal B (16.0%), 1172 HER2-enriched (15.9%), and 498 basal-like (6.8%) breast cancer cases. Adjuvant chemotherapy was the most common treatment modality (63.5%), compared with endocrine treatment (21.4%) and radiation treatment (15.1%). In total, 709 patients (6.5%) had disease progression and 571 (5.3%) died of breast cancer (Table 1).

Table 1 General characteristics of the study population

Association between levels of preoperative serum markers and demographic and epidemiologic factors

The distributions of demographic and epidemiologic factors among the participants are shown by tumor marker in Table 2. CA15-3 was significantly related to education (P < 0.01), average monthly income (P < 0.01), occupation (P = 0.02), and parity (P = 0.02). CA125 was associated with average monthly income (P < 0.01). Moreover, CEA was significantly associated with age at diagnosis (P < 0.01), education (P = 0.01), average monthly income (P < 0.01), occupation (P = 0.03), menopausal status (P = 0.02), and smoking status (P < 0.01).

Table 2 Association between serum markers levels and epidemiological factors

Association between levels of preoperative serum markers and clinico-pathological factors

The distributions of clinico-pathological factors among the participants are shown by tumor marker in Table 3. Elevated levels of all three markers were more common in the larger tumor size groups (pT2 or pT3-4) than in the pT0-1 group (P < 0.01). Elevated CA125 was more common in patients with more extensive lymph-node metastases (pN1 or pN2-3, 17.6%) than in those with pN0 disease (16.9%). Women with more extensive lymph-node metastases (pN1 or pN2-3, 25.0%) and more advanced stage (AJCC stage II or III, 26.8%) were more likely to have elevated CA15-3 than those with pN0 (9.5%) and stage I disease (6.9%). The basal-like subtype (17.9%) was most likely to have elevated CA125 levels, followed by the luminal B (10.1%), HER2-enriched (9.1%), and luminal A (9.1%) subtypes. The proportion of elevated CEA was similar across the luminal B (8.9%), HER2-enriched (7.0%), and basal-like (7.0%) subtypes, but much higher in the luminal A subtype (17.7%).

Table 3 Association between serum markers levels and clinico-pathological factors

Association between levels of preoperative serum markers and breast cancer prognosis

For breast cancer overall, univariate analysis showed that female patients with elevated CA15-3 and CEA levels had significantly worse BCSS and DFS (Supplementary Tables 1, 2; Supplementary Figs. 1–5). As shown in Figs. 2 and 3, the multivariable analysis further revealed that elevated CA15-3 and CEA values were associated with an increased hazard of death from breast cancer (CA15-3: HR 1.54, 95% CI 1.01–2.34; CEA: HR 2.45, 95% CI 1.40–4.30). Similar associations were observed for DFS (CA15-3: HR 2.09, 95% CI 1.44–3.02; CEA: HR 2.71, 95% CI 1.71–4.27). However, no association was observed between CA125 and BCSS or DFS (BCSS: P = 0.89; DFS: P = 0.08).
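For readers who wish to relate a reported hazard ratio and its confidence interval to a Wald statistic, the following sketch back-calculates the standard error of the log hazard ratio. It assumes a symmetric Wald interval on the log scale, which the text does not state explicitly, and because the published figures are rounded the results are approximate. The function name is hypothetical.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate SE of log(HR), Wald z, and two-sided P from a 95% CI.

    Assumes the interval is exp(log(HR) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# CA15-3 and breast cancer-specific survival, values as reported in the text.
se, z, p = wald_from_ci(1.54, 1.01, 2.34)
print(round(se, 3), round(z, 2), round(p, 3))
```

An interval whose lower bound lies just above 1, as here, corresponds to a two-sided P value just below 0.05 under this assumption.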

Fig. 2 Multivariate Cox regression analysis for tumor markers and breast cancer-specific survival. *Adjusted for age at diagnosis, BMI, education, average monthly income, occupation, age at menarche, parity, menopausal status, smoking status, pT, pN, histological grade, stage, ER, PR, molecular subtype, chemotherapy, endocrine, and radiation treatment. Adjusted for age at diagnosis, education, occupation, parity, menopausal status, smoking status, pT, pN, histological grade, stage, PR, chemotherapy, endocrine, and radiation treatment. Adjusted for age at diagnosis, marital status, education, occupation, age at menarche, parity, smoking status, pT, pN, stage, and PR. Adjusted for BMI, marital status, pT, pN, stage, and radiation treatment. §Adjusted for BMI, education, parity, pT, pN, stage, and radiation treatment

Fig. 3 Multivariate Cox regression analysis for tumor markers and disease-free survival. *Adjusted for BMI, education, parity, breastfeeding duration, menopausal status, pT, pN, histological grade, stage, ER, PR, molecular subtype, chemotherapy, endocrine, and radiation treatment. Adjusted for education, parity, menopausal status, pT, pN, histological grade, stage, PR, chemotherapy, endocrine, and radiation treatment. Adjusted for education, parity, pT, pN, histological grade, stage, PR, and radiation treatment. Adjusted for pT, pN, stage, and radiation treatment. §Adjusted for BMI, smoking status, pN, stage, and radiation treatment

When stratified by molecular subtype, univariate analysis showed associations between elevated CA15-3, CA125, and CEA and outcome in certain subtypes (Supplementary Tables 1, 2; Supplementary Figs. 1–5). As shown in Figs. 2 and 3, the multivariable analysis further revealed that, in the luminal A subtype, elevated CA15-3 and CEA were consistently and significantly associated with reduced BCSS (CA15-3: HR 4.47, 95% CI 2.04–9.81; CEA: HR 3.79, 95% CI 1.68–8.55) and DFS (CA15-3: HR 4.06, 95% CI 2.29–7.18; CEA: HR 3.41, 95% CI 1.75–6.64). Elevated CEA was associated with significantly reduced BCSS in the basal-like group (HR 5.13, 95% CI 1.65–15.9). Nevertheless, elevated CA125 was not a significant indicator of either BCSS or DFS in any molecular subtype (P > 0.05). The interactions between tumor markers and subtypes were not statistically significant (P > 0.05).

Discussion

In line with previous findings [17,18,19,20], we found an association between high CA15-3 and CEA levels and adverse outcome. A likely explanation is that elevated marker levels are directly tied to tumor burden, and that the presence of tumor antigens at the time of breast cancer diagnosis reflects vascularization of the tumor, with the possibility of micrometastases [21]. However, Ebeling et al., in a study of 1046 breast cancer cases, did not find CA15-3 to be an independent prognostic marker [22], and Clinton’s study supported this view [23]. A prior study reported no association between breast cancer outcome and CA15-3 among patients younger than 40 years [24], and the report by Maric et al. [13] did not provide prognostic information for tumor markers. Possible reasons for the inconsistent results include differences in treatment modalities at the time of measurement, the timing of tumor marker measurement (preoperative, intraoperative, or postoperative), the prevalence of elevated serum tumor markers, sample size, and follow-up time.

To our knowledge, little is known about the relationship between tumor markers and breast cancer prognosis within specific molecular subtypes in Chinese patients. Previous studies mainly reported the association between tumor markers and molecular subtypes [17, 25,26,27,28,29], and studies of the association between markers and breast cancer outcome within each subtype were rare. Clarkes et al. [30] reported in a review that MUC1 is a luminal cell-specific antigen that can be involved in cell proliferation and differentiation. Our study further showed that patients with elevated CA15-3 and CEA had unfavorable BCSS and DFS in the luminal A subtype. This may reflect the tendency of elevated CA15-3 and CEA levels to occur more often in hormone-receptor-positive than in hormone-receptor-negative breast cancer. Furthermore, CEA appeared to be an independent prognostic marker for BCSS in the basal-like subtype. This may be explained partly by the distinct biological behaviors of the different molecular subtypes [31].

Preoperative tumor markers are easy to obtain and simple to measure. Moreover, identifying credible and comprehensive tumor markers is essential for clinicians, because such markers can assist decision-making for female patients, the improvement of therapeutic regimens, and the prediction of patients’ outcomes by subtype. Compared with other studies, ours further validated the role of the tumor markers with a longer follow-up of patients. Although the patient selection criteria were rigorous, our study retained a large sample size. Furthermore, our study evaluated the association between tumor markers and prognosis not only in breast cancer overall but also in specific molecular subtypes. CA15-3 and CEA may serve as effective prognostic indicators for luminal A breast cancer patients. Basal-like breast cancer is an important molecular subtype characterized by aggressive behavior and limited therapeutic response [32]. Our study further demonstrated that CEA offers added prognostic value for patients with the basal-like subtype, whereas it was not an independent prognostic factor in the HER2-enriched and luminal B subtypes.

Nevertheless, because the prognosis of breast cancer is generally good, the number of patients with progression or death was small even after a long follow-up. For newly diagnosed patients, the shorter follow-up can lead to bias. Given these limitations, longer-term follow-up of this cohort will be needed to update the results, and multi-center prospective studies should be conducted to confirm the validity of this study. In addition, experimental studies are needed to investigate the mechanisms underlying the role of tumor markers.

In conclusion, CA15-3, CA125, and CEA are directly associated with aggressive clinico-pathological characteristics, and elevated preoperative CA15-3 and CEA negatively affect breast cancer survival and progression in the luminal A and basal-like subtypes. Therefore, CA15-3 and CEA may be combined with known prognostic variables in clinical practice to assess patients’ outcomes, to guide the choice of treatment modality in pursuit of better prognoses, and to help personalize treatment for patients with different molecular subtypes.