Introduction

Breast cancer (BC) is the most commonly diagnosed cancer among women, and its high morbidity and mortality rates are mostly attributed to metastasis [1, 2]. Adopting and improving measures such as early diagnosis, proper staging, risk assessment, prognosis estimation, and monitoring of therapy response could result in a more favourable outcome and improve the quality of life for patients [3]. In clinical practice, this can be achieved by assessing biomarkers either from the tumour itself or from blood. Blood-based or circulating biomarkers have advantages over tissue biopsies, since they can be accessed by minimally invasive procedures and multiple samples can be obtained over time. Thus, a “liquid biopsy”, which divulges information about the tumour status, is an excellent alternative. However, currently used biomarkers, for example carbohydrate antigen (CA) 15-3 or carcinoembryonic antigen (CEA), lack high sensitivity and specificity, and their use is limited to more advanced stages of BC [4, 5]. Currently, biomarkers for early detection of BC or metastasis are not in routine clinical use. Circulating tumour cell (CTC) status, which is the only commercially available independent prognostic marker for metastatic breast cancer (MBC) approved by the U.S. Food and Drug Administration (FDA) [6], has shortcomings due to the techniques used for its detection, which are limited to specific sub-populations of CTCs [7, 8]. Thus, the development of new biomarkers for early detection of BC and of prognostic markers with higher diagnostic accuracy is of considerable clinical importance.

Recently, circulating nucleic acids, namely circulating microRNAs and circulating DNA, have been proposed as promising markers for both primary (PBC) and metastatic breast cancer (MBC) [9–12]. Circulating DNA is described as cell-free DNA (cfDNA) or circulating tumour DNA present in serum or plasma [13]. Since the bulk of cfDNA is proposed to be released from tumour sites, analysis of cfDNA could be used as a surrogate for tissue biopsies [14]. Mutation status, genomic alterations or instability, and methylation status are a few of the properties of cfDNA which could be ascertained to obtain a snapshot view of the solid tumour [15–17]. Apart from these, the DNA integrity (DI) of cfDNA (cfDI), which is a measure of the extent of cfDNA fragmentation, has also been exploited as a biomarker for diagnosis and prognostication in cancer. Analysis of cfDI has practical advantages and is suitable for routine diagnostics. The analysis material, i.e., plasma or serum, can be obtained easily, while the quick and well-established quantitative PCR (qPCR) methods are inexpensive and very sensitive, thus requiring low amounts of starting material.

Increased DNA integrity was first observed in plasma samples of malignant cancer patients in comparison to normal controls [18]. These results were later confirmed in head and neck [19], colorectal [20], breast [12], renal [21], and many other epithelial carcinomas. In BC, increased cfDI has been correlated with a worse disease outcome and poor response to adjuvant chemotherapy [12, 22]. This increase in cfDI has been explained by the hypothesis that, while in cancer patients cfDNA is derived from both apoptotic and necrotic cells, in healthy controls it originates predominantly from apoptotic cells [23]. Since DNA fragments resulting from apoptosis are 160–180 bp in length, and those from necrosis can reach up to several kbp, the hypothesized different origins of cfDNA were proposed as the cause of the observed differences in cfDI. While most of these studies used a PCR-based indirect inference of cfDI, those which employed direct visualization of DNA fragments showed, in contrast, cfDNA in cancer patients to be more fragmented than in controls [24, 25]. A recent PCR-based study has also reported more fragmented cfDNA, and hence lower cfDI, in prostate cancer [26].

In the study presented here, we measured the cfDI and cfDNA concentration of 383 probands by fragment analysis of ALU and LINE1 repetitive DNA elements. We found that healthy controls possessed the highest cfDI, and that cfDI decreased and cfDNA concentration increased with increasing cancer severity. Additionally, in the MBC group, cfDI correlated with progression-free survival (PFS) and overall survival (OS), and showed better prognostic capabilities than CTC status.

Materials and methods

Study subjects and plasma sample preparation

The study was approved by the Ethical Committee of the University of Heidelberg (Heidelberg, Germany). Study subjects (n = 383) included patients with primary tumour, PBC (n = 82, Table 1 in ESM), or with radiologically confirmed presence of one or more metastatic sites, MBC (n = 201, Table 2 in ESM), and healthy controls (n = 100). All subjects were female and Caucasian. The PBC cohort consisted of patients with an initial diagnosis of sporadic BC and no clinically discernible metastasis. Blood was collected from them at the time of diagnosis, before they underwent any therapy or surgery, while MBC patients had received one or more rounds of therapy for their metastatic tumour before recruitment and thus before blood collection. Healthy controls comprised individuals with no clinically diagnosed malignancies, autoimmune diseases or inflammation at the time of blood collection.

EDTA blood was collected from all study participants, and plasma was separated by centrifuging blood at 1,300×g for 20 min at 10 °C within 2 h of blood collection. The supernatant, i.e. plasma, was centrifuged again at 15,500×g for 10 min at 10 °C to minimize any blood cell or cell debris contamination. The resulting plasma was snap frozen in liquid nitrogen and stored at –80 °C until further use. For MBC patients, CTC counts were additionally evaluated by the CellSearch® system (Veridex, LLC, Raritan, NJ). Depending on the number of CTCs, patients were divided into two sub-groups: CTC-positive MBC (CTCpos-MBC; ≥5 CTCs/7.5 ml blood) or CTC-negative MBC (CTCneg-MBC; no detectable CTCs).

Extraction of cfDNA from plasma

cfDNA was extracted from 800 μl of plasma using the QIAamp DNA Blood Mini Kit (Qiagen, Germany) according to the kit protocol but with the following modifications: (1) addition of linear acrylamide (Ambion, Life Technologies, USA) to samples prior to extraction at a final concentration of 20 mg/ml, and (2) increasing the volume of AL buffer and ethanol to 800 μl. Subsequently, cfDNA was eluted in 40 μl of elution buffer, the eluate was re-applied onto the column, and the final eluate was collected and stored at –20 °C. Samples from different groups were always extracted together to avoid batch effects.

cfDNA from in vitro cells and apoptosis induction

Three cell lines were analysed: MCF-10a, non-tumourigenic breast epithelial cells; BT-474, tumourigenic epithelial cells derived from a solid, invasive ductal carcinoma of the breast; and MCF-7, tumourigenic epithelial cells derived from the pleural effusion of a patient with metastatic adenocarcinoma of the breast. The identity of the cell lines was confirmed by SNP profiling. All cell lines were grown as adherent monolayers, and media was aspirated 72 h post seeding for cfDNA extraction. For apoptosis induction, MCF-10a and MCF-7 cells were grown to 80 % confluence, and then incubated for 24 h with either 1 μM staurosporine (test) or DMSO (control). More than 90 % of cells were confirmed to have undergone apoptosis by trypan blue staining, and the corresponding media was collected. In vitro experiments were carried out in biological triplicates. In both of the experiments described above, media was centrifuged at 2,000×g for 10 min immediately following aspiration to eliminate cellular contamination. cfDNA was extracted from 400 μl of media as detailed above after proportionally adjusting the volume of AL buffer and ethanol.

Estimation of DI and concentration of cfDNA with ALU and LINE1 repetitive elements

Integrity and concentration of both plasma and media cfDNA were derived by analysing two repetitive elements, ALU and LINE1. For each of these targets, a short (ALU = 111 bp, LINE1 = 97 bp) and a long (ALU = 260 bp, LINE1 = 266 bp) fragment were measured in triplicate by qPCR using the Absolute SYBR green assay (Thermo Scientific) with the Roche LightCycler® 480 system (Roche Applied Sciences, Germany). Primers were designed in such a way that the short amplicons were nested within their long counterparts. ALU and LINE1 sequences were extracted from Repbase Update consensus sequences for human ALU Sx and LINE1, respectively, and primers were designed for the ALU fragments and the LINE1 long fragment using Primer3Plus [27] by targeting the retrieved sequences [28, 29]. Primers for the LINE1 short fragment were taken from the literature [30]. To achieve high PCR efficiencies, the primer sequences were optimized for minimal self-binding, dimerization and cross-hybridization [31]. All primer sequences and amplicon lengths are given in Table 3 in ESM, and the standard curve along with the corresponding PCR efficiency of each primer pair is shown in Fig. 1 in ESM. Concentrations of the long and short fragments were calculated by the absolute quantification method using the LightCycler® 480 software. cfDI was subsequently calculated as the ratio of the concentration of the long fragment to the concentration of the short fragment. The cfDNA concentration of a sample was deduced from the concentration of the short fragments of ALU or LINE1. To avoid inter-run variations, samples from different proband groups were randomized prior to qPCR setup, and experiments were conducted in a blinded manner.
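The integrity calculation described above can be illustrated with a minimal Python sketch. The function name `cfdi`, the averaging of the triplicate measurements, and all numerical values are assumptions made for illustration; they are not taken from the study data or from the LightCycler software.

```python
from statistics import mean


def cfdi(long_conc, short_conc):
    """Return the integrity index: long-fragment over short-fragment concentration.

    Both arguments are replicate concentrations (e.g. qPCR triplicates) of the
    same sample. Averaging the replicates first is an assumption of this sketch.
    """
    short_mean = mean(short_conc)
    if short_mean <= 0:
        raise ValueError("short-fragment concentration must be positive")
    return mean(long_conc) / short_mean


# Hypothetical triplicate concentrations (ng/ul) for one sample and one target.
alu_short = [0.040, 0.042, 0.038]  # short amplicon
alu_long = [0.026, 0.027, 0.025]   # long amplicon, which contains the short one

integrity = cfdi(alu_long, alu_short)
# The cfDNA concentration is taken from the short fragment alone.
concentration = mean(alu_short)

print(f"cfDI = {integrity:.2f}, cfDNA concentration = {concentration:.3f} ng/ul")
```

Because the short amplicon is nested within the long one, every template that yields the long product also yields the short one, so the ratio is expected to lie between 0 and 1 when both assays amplify with comparable efficiency.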

Data analysis

Statistical analysis was performed in R 2.15 [32]. cfDNA concentrations were log2-transformed for data analysis. Differences in cfDI and cfDNA concentration between groups were evaluated by the Mann-Whitney U test for patient plasma samples, while for cell line comparisons Student's t test was applied. The influence of apoptosis induction on cfDI was evaluated by a two-way ANOVA model with interaction. The data were log2-transformed for this analysis so that the ANOVA model tests log2-fold changes, i.e., relative decreases rather than absolute differences. Correlations between ALU and LINE1 results were analysed by Pearson correlation. Multiparametric receiver operating characteristic (ROC) analysis was carried out to assess the discriminatory power of cfDI and cfDNA concentration between two groups, and the corresponding area under the curve (AUC) was calculated. Mann-Whitney U tests (for categorical and binary data), Spearman correlation permutation tests (for quantitative and continuous data), and Jonckheere-Terpstra tests (for ordinal data) were used to assess the association between cfDI or cfDNA concentration and different clinical characteristics (R package “coin”, version 1.0–12). To assess correlation with PFS or OS, Cox proportional hazard models were generated for cfDI, cfDNA concentration, and CTC status, and the corresponding hazard ratios (HR) were calculated. Non-nested Cox models were compared with respect to their predictive accuracy, as assessed by .632+ bootstrapped Brier score-based prediction error curves and integrated Brier scores (IBS) computed after 5, 10 and 15 months. This analysis was done using the R package “pec”, version 2.2.9 [33]. Kaplan-Meier curves were constructed after stratifying the data as below and above median cfDI or cfDNA concentration.
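The median-split Kaplan-Meier step can be sketched as follows. This is a plain Python illustration of the product-limit estimator, not the R code used in the study; the function name `kaplan_meier` and all data values are hypothetical, and assigning values equal to the median to the upper group is an assumption of this sketch.

```python
from statistics import median


def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times  : follow-up time per patient
    events : 1 if the event was observed, 0 if the patient was censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical data: cfDI, follow-up in months, and event indicator.
cfdi_values = [0.30, 0.35, 0.40, 0.45, 0.55, 0.60, 0.65, 0.70]
months = [3, 5, 6, 9, 8, 12, 14, 15]
event = [1, 1, 1, 0, 1, 0, 1, 0]

cut = median(cfdi_values)
low = [(t, e) for c, t, e in zip(cfdi_values, months, event) if c < cut]
high = [(t, e) for c, t, e in zip(cfdi_values, months, event) if c >= cut]

low_curve = kaplan_meier(*zip(*low))
high_curve = kaplan_meier(*zip(*high))
print("below median:", low_curve)
print("above median:", high_curve)
```

Censored patients contribute to the number at risk up to their last follow-up but do not lower the survival estimate, which is why the curve only steps down at observed event times.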

Results

Primer specificity and PCR efficiency

PCR efficiencies of all primer pairs were found to be >75 % (Fig. 1 in ESM). The specificity of the primers was further confirmed by running the PCR products on a 2 % agarose gel (Fig. 2 in ESM). Since multiple bands were seen in the ALU PCR products, they were digested with an ALU-specific restriction enzyme (ALU I, Fermentas, Thermo Scientific), and the absence of non-ALU-specific amplification was confirmed (data not shown).

Decrease in cfDI and increase in cfDNA concentration in PBC and MBC cases in comparison to controls

The measurements made with ALU and LINE1 correlated well with each other for cfDI (\(\rho =0.77\), 95 % CI 0.72–0.80) and log2cfDNA concentration (\(\rho =0.95\), 95 % CI 0.94–0.96) (Fig. 3 in ESM), and both repetitive elements always gave results in the same direction. cfDI decreased with increasing severity of the disease. Healthy controls had the highest cfDI (median ALU cfDI = 0.65, median LINE1 cfDI = 0.50), followed by PBC (median ALU cfDI = 0.62, median LINE1 cfDI = 0.48), CTCneg-MBC (median ALU cfDI = 0.58, median LINE1 cfDI = 0.46), and finally CTCpos-MBC cases (median ALU cfDI = 0.39, median LINE1 cfDI = 0.32) (Table 1). The reverse direction was observed with respect to cfDNA concentrations; controls had the lowest and CTCpos-MBC cases the highest concentration (Table 1). The differences in cfDI and cfDNA concentration between each pair of groups were statistically significant for both ALU and LINE1, with the exception of CTCneg-MBC versus PBC (Table 1; Fig. 1). Comparison of the cfDI of PBC cases with that of controls showed borderline significance for both targets (ALU: P = 0.046; LINE1: P = 0.041) (Table 1; Fig. 1).

Table 1 Median cfDI and cfDNA concentration of the different study groups calculated from ALU and LINE1 targets, and the P-values of the Mann-Whitney U tests conducted to compare cfDI and log2cfDNA concentration of the four categories of study subjects: controls, PBC, CTCneg-MBC, and CTCpos-MBC

Fig. 1 Box and whisker plots of cfDI estimated by a ALU, b LINE1, and log2cfDNA concentration from c ALU and d LINE1 targets in controls, PBC, CTCneg-MBC, and CTCpos-MBC

Combination of cfDI and cfDNA concentration can distinguish between plasma samples from PBC, MBC and control individuals

To test whether the differences in cfDI and cfDNA concentration could be useful to discriminate controls from patients, PBC from MBC cases, and CTCneg-MBC from CTCpos-MBC patient groups, multivariable logistic regression with ALU and LINE1 cfDI and ALU and LINE1 log2cfDNA concentration as covariates was performed. The model with all four variables performed best for each of the six comparisons (Fig. 2; Table 4 in ESM). Although differences between PBC samples and controls or CTCneg-MBC samples were not large, the combination of the four variables could discriminate them with appreciable precision (AUC of 0.75 and 0.71, respectively). CTCpos-MBC cases, having drastically lower cfDI and higher cfDNA concentration, could be significantly differentiated (P \(< 10^{-10}\) for each) from controls (AUC = 0.93), PBC (AUC = 0.86), and also CTCneg-MBC cases (AUC = 0.83; Table 4 in ESM; Fig. 2).
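For orientation, the AUC of any single score, such as the predicted probability from a fitted logistic model, equals the probability that a randomly chosen case scores higher than a randomly chosen control. The Python sketch below computes this directly by pairwise comparison; the function name `auc` and the scores are hypothetical and unrelated to the study data.

```python
def auc(case_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, control) pairs in which the case has the
    higher score; ties count as half. This is equivalent to the normalised
    Mann-Whitney U statistic.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical model outputs (predicted probability of being a case).
cases = [0.9, 0.8, 0.6, 0.4]
controls = [0.7, 0.3, 0.2, 0.1]

print(auc(cases, controls))
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups.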

Fig. 2 Multiparametric ROC analysis using four variables, log2cfDNA concentration and cfDI calculated from ALU and LINE1 targets, to estimate the strength of the model to discriminate two groups, along with the area under the curve (AUC) and 95 % confidence interval (CI) for each comparison

cfDI and cfDNA concentration as markers of survival in MBC patients

Cox proportional hazard analysis revealed a decrease in ALU cfDI to be strongly associated with shortened PFS (HR = 0.46, P = 0.002) and OS (HR = 0.15, \(P<0.000001\)). Similar results were observed with LINE1 cfDI. In contrast, increased cfDNA concentration corresponded to shortened survival (Table 2; Fig. 3). To verify whether the prognostic capabilities of cfDNA were independent of CTC status, the IBS of different models were compared. The Cox model with ALU and LINE1 cfDI and log2cfDNA concentration had the lowest IBS, 0.181 (PFS) and 0.091 (OS) at 10 months, and thus outperformed the model with CTC status alone (PFS = 0.190, OS = 0.103) (Table 5 in ESM). Combining the cfDNA properties of DI and concentration with the CTC status of patients did not further decrease the prediction error.
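The hazard ratios in Table 2 are expressed per chosen change in each covariate: an interquartile increase for cfDI and a doubling for cfDNA concentration. In a Cox model, the hazard ratio for a change of size Δ in a covariate with coefficient β is exp(β·Δ). The following Python sketch illustrates this rescaling; the coefficients and quartiles are hypothetical and are not the fitted values from this study.

```python
import math


def hazard_ratio(beta, delta):
    """Hazard ratio for a change of `delta` in a Cox covariate with coefficient `beta`."""
    return math.exp(beta * delta)


# Hypothetical coefficient per unit of cfDI and hypothetical quartiles.
beta_cfdi = -2.0
q25, q75 = 0.40, 0.60
hr_iqr = hazard_ratio(beta_cfdi, q75 - q25)

# With log2-transformed concentration as the covariate, a one-unit change
# is a doubling, so the hazard ratio per doubling is simply exp(beta).
beta_log2_conc = 0.5  # hypothetical
hr_doubling = hazard_ratio(beta_log2_conc, 1.0)

print(f"HR for an interquartile increase in cfDI: {hr_iqr:.2f}")
print(f"HR per doubling of cfDNA concentration: {hr_doubling:.2f}")
```

A hazard ratio below 1 for an increase in cfDI means that higher integrity goes with a lower hazard, which is the same statement as lower integrity going with shorter survival.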

Table 2 Results of Cox proportional hazard analysis for progression-free (PFS) and overall survival (OS) in the MBC group. The hazard ratio (HR) was calculated for an increase in cfDI from the lower quartile (25th percentile) to the upper quartile (75th percentile), for every twofold increase in cfDNA concentration, and for CTC status by stratifying patients as CTCneg-MBC or CTCpos-MBC

Fig. 3 Kaplan–Meier curves for a progression-free (PFS) and b overall survival (OS) in MBC patients using cfDI or log2cfDNA concentration or CTC as the predictor variable

Correlation of cfDI and cfDNA concentration to clinical characteristics of PBC and MBC

The association between cfDNA and clinical characteristics was assessed individually for PBC and MBC cases. Here, a true association was inferred only if statistical significance was reached with both ALU and LINE1, thus reducing any false positive interpretations. Although no correlation of cfDNA properties to age was seen in the MBC groups, age was found to have an influence on cfDI and cfDNA concentration in the PBC group.

In the PBC group, cfDI and cfDNA concentration were observed to be influenced by menopause status, which in turn is highly dependent on age; hence collinearity might explain our observations. cfDI also demonstrated a significant correlation with tumour size (\(\rho\) of –0.33 for ALU, –0.41 for LINE1), a clinical feature indicative of cancer aggressiveness (Table 6 in ESM). In CTCpos-MBC patients, cfDI was also negatively correlated with CTC numbers (\(\rho\) of –0.39 for ALU, –0.33 for LINE1), meaning cfDI decreased in these patients in a CTC-dependent manner. In contrast, cfDNA concentration displayed a positive correlation with CTC numbers (\(\rho\) of 0.50 for ALU and LINE1). cfDI was significantly decreased, and cfDNA concentration elevated, in MBC patients with liver metastasis or disseminated metastasis (Table 6 in ESM).

In vitro analysis reflects the same trend observed in cases and controls

The integrity of cfDNA extracted from media collected from monolayer cultures of MCF-10a, BT-474 and MCF-7 cells was analysed for studying the direction of cfDI changes in vitro. This pointed to MCF-10a having the highest cfDI (ALU = 0.97, LINE1 = 0.95), significantly higher than that of BT-474 (ALU = 0.71, LINE1 = 0.69) and MCF-7 (ALU = 0.67, LINE1 = 0.67) (Fig. 4a, b). This difference in the integrity of cfDNA from breast non-tumourigenic (MCF-10a) and tumourigenic (BT-474 and MCF-7) cell lines was in concordance to our observations in patient samples.

Fig. 4 Box and whisker plots of cfDI measured in MCF-10a, BT-474 and MCF-7 cells by a ALU and b LINE1. c Bar plots of log2cfDI measured in MCF-10a and MCF-7 cells after apoptosis induction by staurosporine or control treatment (P = 0.058)

Difference in apoptotic fragment length between MCF-10a and MCF-7 cells

We hypothesized the decrease in cfDI could be due to differences in the efficiency of DNA fragmentation during apoptosis in controls and cancer patients. To support this, integrity of DNA extracted from media of MCF-10a and MCF-7 cells with and without apoptosis induction was analysed. cfDI of staurosporine treated cells was lower than cfDI of the corresponding untreated cells, indicating apoptosis was indeed induced. In line with our hypothesis, following apoptosis induction a significantly lower cfDI, i.e., a higher degree of DNA fragmentation in the tumourigenic MCF-7 cells (cfDI=0.64) than the non-tumourigenic MCF-10a (cfDI = 0.95) was observed (Fig. 4c). Since even in untreated cells, MCF-7 had a lower cfDI than MCF-10a cells, the data was log2-transformed and the relative decrease was calculated. In comparison to the untreated cells, MCF-7 cells had an 18 % relative decrease in cfDI which was more than twice as much as MCF-10a, which only had a 7.8 % decrease (P = 0.058 for interaction effects).
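The link between a log2 difference and a relative decrease can be made explicit: if d = log2(treated) − log2(untreated), the relative decrease is 1 − 2^d. The Python sketch below shows the conversion and the interaction contrast on the log2 scale. The function names and all cfDI values are hypothetical and chosen only for round numbers.

```python
import math


def log2_change(untreated, treated):
    """Difference on the log2 scale, i.e. the log2-fold change."""
    return math.log2(treated) - math.log2(untreated)


def relative_decrease(untreated, treated):
    """Relative decrease implied by the log2-fold change: 1 - 2**d."""
    return 1 - 2 ** log2_change(untreated, treated)


# Hypothetical cfDI values for two cell lines, before and after treatment.
line_a = {"untreated": 0.50, "treated": 0.40}
line_b = {"untreated": 0.90, "treated": 0.81}

dec_a = relative_decrease(**line_a)
dec_b = relative_decrease(**line_b)

# The interaction term of a two-way ANOVA on log2 data compares the two
# log2-fold changes; a negative value means line A decreased relatively more.
interaction = log2_change(**line_a) - log2_change(**line_b)

print(f"relative decrease A: {dec_a:.1%}, B: {dec_b:.1%}")
print(f"interaction contrast (log2 scale): {interaction:.3f}")
```

Working on the log2 scale makes the comparison independent of the different baseline integrity of the two cell lines, which is the reason given in the text for transforming the data.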

Discussion

Here, we have demonstrated the potency of cfDI as a diagnostic biomarker by a comprehensive analysis of healthy controls and BC patients with different disease stages. To our knowledge, this is the largest study in the field of cfDI with 383 study subjects, and the first to determine cfDI by simultaneously targeting two repetitive regions. Other strengths of this study include the standardised sample processing within 2 h of blood collection, a two-step centrifugation protocol to reduce contamination with DNA from cell debris, and conducting experiments in a randomized and blinded manner. Plasma samples were preferred over serum, despite studies showing cfDNA concentration to be higher in the latter [34], since there have been reports of coagulation process affecting the spectrum of circulating nucleic acids in serum and thus contributing to higher variability [35]. The use of repetitive DNA elements with a high copy number that are distributed throughout the genomic DNA ensured generation of accurate cfDI estimates inclusive of samples with very low cfDNA concentrations, even less than 0.05 ng/μl, which is the predominant concentration range in healthy individuals and PBC patients. The efficient amplification of our targets was confirmed by the standard curves generated with the primers used. Since we independently analysed two different repetitive elements, ALU and LINE1, in parallel for each sample, and obtained concordant results with both approaches, chances of false positives were minimized.

We observed a more fragmented nature of cfDNA, or decreased cfDI, in BC cases compared to controls. Our results further demonstrated a hierarchical decrease of cfDI and increase of cfDNA concentration from cases with locally confined disease (PBC) to those with metastasis (MBC), especially in patients with poorer prognosis. The PBC group, which included predominantly early stage (I or II) cases, had the highest cfDI and lowest cfDNA concentration among cases, while still being modestly yet distinctly different from healthy individuals. This was followed by the CTCneg-MBC group, which has a comparatively better prognosis among MBC patients. The CTCpos-MBC group possessed grossly diminished cfDI and elevated cfDNA concentration in comparison to the other three groups. Multiparametric ROC analysis revealed that combining cfDI and cfDNA concentrations calculated from ALU and LINE1 elements could clearly differentiate both PBC and MBC cases from controls, PBC from MBC cases, and also CTCpos-MBC from CTCneg-MBC cases. Despite its correlation with CTC status and counts, cfDI was able to perform as a prognostic marker independent of CTCs. The model with cfDI and cfDNA concentration fitted the data significantly better than that with CTC status only, as it had a lower prediction error, as evident from the IBS. Our data demonstrate that patients with lowered cfDI and more fragmented DNA would have a worse outcome. Higher fragmentation of DNA is associated with high caspase 3 activity, which in turn has been shown to be correlated with a significantly increased rate of recurrence and death among cancer patients [36, 37]. In addition, caspase 3, an effector of apoptosis, has been shown to directly increase the rate of tumour cell re-population by secreting growth-stimulating factors into the tumour microenvironment [37]. This would indirectly imply that patients with higher rates of apoptosis have paradoxically higher tumour proliferation, and therefore poorer prognosis. A link between increased apoptosis and poor outcome has already been shown in BC, and the same is reflected in our results [38, 39].

While a few reports have emerged with findings similar to ours [24–26], the current literature predominantly reports an increased cfDI among cancer patients [12, 19–21]. Different sources of cfDNA in cancer patients (both apoptotic and necrotic cells) and healthy individuals (apoptotic cells) have been proposed as the cause of the increased cfDI observed in cancer patients [12, 23]. However, this syllogistic reasoning has not been backed by experimental proof showing that the majority of cfDNA in cancer and healthy subjects is indeed derived from necrotic and apoptotic cells, respectively. A study on renal cancer, which analysed cfDNA from cancer patients by gel electrophoresis, postulated the presence of high-molecular-weight DNA associated with necrosis in only a small proportion of patients. Based on these results, necrotic DNA accounts for increased cfDI in only a fraction of cancer patients [40]. Our in vitro experiments showed a decline in cfDI in two BC cell lines, BT-474 and MCF-7, with respect to a non-cancerous breast epithelial cell line, MCF-10a, thus corroborating our observations in patients and control individuals. We also demonstrated that, subsequent to induction of apoptosis, MCF-7 breast cancer cells had a decreased cfDI compared to MCF-10a cells. This increased DNA fragmentation in apoptotic cancer cells compared to apoptotic non-cancerous cells would provide an explanation for the decreased cfDI we observed in cancer patients. This result would imply that the extent of DNA fragmentation during apoptosis might differ between cancer and normal cells. This theory is supported by the work of Giacona et al. [24], who observed that cfDNA from healthy individuals had three- to fivefold multiples of nucleosome-associated DNA length, and considerably longer fragments than cfDNA from pancreatic cancer patients.

The conflict in the directionality of our results with some previous studies reporting an increased cfDI in cancer patients could be partly due to the properties of the primer pairs and amplicon lengths. Since qPCR-based methods produce indirect estimates of true biological values, inconsistencies in the real-time amplification or differences in the PCR efficiencies of the long and short amplicons can influence cfDI estimation. Sub-optimal primers or PCR efficiencies can lead to biologically impossible cfDI estimates >1 [21, 41, 42]. All samples analysed in the present study had cfDI values within the biologically plausible range of 0–1, thus reinforcing our methodology and our results. Additionally, sample processing and preparation could also affect the results. Pre-analytical parameters such as the time from blood collection to processing have already been shown to affect the results [43]. In our experiments, we noted that the second, high-speed centrifugation step of plasma to remove cell debris significantly reduces cfDNA concentration and alters the cfDI compared to plasma samples that were obtained by centrifuging blood at 4,000×g without additional centrifugation (data not shown).

cfDI is reported to be altered in different cancers; hence it is not specific to BC and cannot be developed as a stand-alone test. Nevertheless, it can be used in combination with other biomarkers to enhance sensitivity and specificity. Our results point to the potential of cfDI as part of a molecular multi-marker assay for diagnosis of PBC and MBC, and as a prognostic marker for MBC. Its strong correlation with PFS and OS, while outperforming CTCs, the currently established prognostic marker in MBC, indicates the versatile role it could play in diagnostics. To verify the emerging potential of cfDI as an early detection marker in BC, further specifically designed prospective studies need to be conducted.