Key Points

Non-invasive breast cancer detection method with 98.6% sensitivity and 100% specificity.

Urinary exosomal microRNA as diagnostic biomarker.

Urinary microRNA as part of multiomic breast cancer detection.

Urine-based breast cancer detection offering diagnostic, prognostic, and theranostic features.

1 Introduction

The distinct impact of breast cancer (BC) on women’s health and associated life expectancy is reflected by continuously increasing global incidence rates seen over past decades. The assessment of cancer registry data in 102 countries for the period 1990–2016 as part of the Global Burden of Disease (GBD) study [1] identified BC as the most frequent malignant disease in women, with 1.7 million cases in 2016 worldwide [2]. Malignancies originating from the breast tissue account for 25% of all cancer cases in women [3]. Since 1990, BC incidence rates more than doubled in 60 of the 102 countries, affecting both developed and developing regions [2]. Early detection of BC is the major decisive factor for the cure rate [4,5,6]. Although BC-associated mortality rates were reduced as a result of improved screening and treatment options, the heterogeneous quality patterns in different countries sustain a crucial weak point in BC management [2, 7]. Routine BC detection methods such as palpation, mammography, and ultrasound bear a number of limitations, e.g., moderate sensitivity and specificity rates, especially in denser mammary tissue, and lower patient compliance [8, 9]. The need for novel approaches with respect to more diversified, personalized, predictive, and preventive BC diagnostics applies particularly to younger, premenopausal, and pregnant women, who generally can be characterized as having greater difficulties with respect to BC detection, potentially combined with fewer choices with regard to effective treatment options (fertility preservation, protection of fetus) and more serious consequences of established diagnostic interventions (irradiation, tissue biopsies) in earlier phases of life [10,11,12]. Thus, there is a significant demand for innovation in the methodology of improved early BC diagnosis.

Liquid biopsy-based disease biomarkers offer a range of promising prospects as important prognostic, diagnostic, and theranostic tools [13]. Various biomarker types are available in body liquids through minimally or non-invasive sampling procedures [13]. Among others, circulating micro ribonucleic acid (microRNA/miRNA/miR) molecules qualify as robust and reliable matrices with regard to detectability and specificity in disease diagnosis and monitoring [13, 14]. A recent study demonstrated the feasibility of urinary miRs (uri-miRs) in BC detection [15]. Similarly, the applicability of uri-miRs with diagnostic features has been proven in other tumor types as well [13, 16,17,18]. The method leading to this comprises the isolation of vesicle-encapsulated (exosomal) miRs from urine specimen and subsequent quantitative expression analysis. Highly specific uri-miR expression patterns are then able to distinguish cancer patients from healthy controls [15, 17, 19].

The current study on urinary BC miR biomarkers advances previous endeavors in this field [15, 20], showing methodical improvement as well as the extension of the biomarker panel and the cohort size. This investigation is based on prerequisite conditions for the chosen set of the 13 analyzed miR types (miR-17, miR-107, miR-125b, miR-194, miR-222, miR-423, miR-424, miR-660, let7-a, let7-d, let7-e, let7-f, and let7-i). Hence, an miR specimen qualified only if it complied with the following two requirements: a known BC association and a previously proven solid detectability in urine samples.

Dysregulated expression of let-7 family members of miRNAs could be linked to breast carcinogenesis [21,22,23,24,25,26]. Distinct let-7 miRNAs bear the potential to be employed as prospective molecular markers in BC diagnostics and as therapeutic targets [26,27,28,29].

The functional involvement and putative diagnostic implications of miR-17 in BC have been demonstrated in several studies [30,31,32,33], including one emphasizing its predictive potential in BC recurrence management [34]. Aberrant expression levels of miR-125 relate to BC tumorigenesis, within which it displays a functional correlation with the HER2-overexpressing BC subtype, influencing its metastatic risk and its prognosis [35,36,37,38]. Malignant breast tissue transformation was found to be accompanied by miR-222 activities regulating decisive signaling pathways in tumor progression [39,40,41]. Most interestingly, miR-222 potentially offers predictive power in hormone receptor-positive BC [42]. Exosome-mediated intercellular transfer of miR-222 appears to be associated with BC metastasis [43].

With a focus on BC, the biomarker potential of miR-107 was demonstrated in various functional studies, which also revealed its impact on BRCA1 activity and BC recurrence prediction in triple-negative BC (TNBC) [44,45,46,47,48,49]. Among others, the levels of circulating miR-107 are correlated with lymph node metastasis and receptor status in BC patients [50]. The Wnt/β-catenin signaling pathway in BC is regulated by miR-194, thereby affecting cancer cell proliferation, migration, and invasion [51]. Furthermore, circulating miR-194 molecules are linked to the BC recurrence risk [52] and exhibit a functional interrelation with the anti-HER2 agent trastuzumab [53].

The functional background of miR-423 and the crucial impact of altered single-nucleotide polymorphisms (SNPs) in the miR-423 gene on BC biology became apparent in recent investigations [54, 55]. Another functional aspect is provided by the combined expression levels of miR-423 and miR-4417, which have differentiated 70.1% of hereditary and non-hereditary BCs [56].

miR-424 exhibited a cancer relevant regulatory impact in basal-like BC [57] and tumor suppressor activities influencing chemotherapy resistance in BC [58]. A serum-based BC biomarker study identified a three-miR signature comprising the expression levels of miR-424, miR-29c, and miR-199a as having the highest diagnostic accuracy for distinguishing BC patients from healthy controls [59]. Oncogenic effects are linked to miR-660 expression, with triggering functions in BC proliferation, migration, and anti-apoptotic activities [60]. A BC tissue-derived next-generation sequencing analysis extracted miR-660 as one promising biomarker candidate with significant prognostic features in overall survival (OS) and recurrence-free survival (RFS) in BC patients [61].

In this study, expression levels of a set of 13 distinct BC-associated miRNA types were analyzed in the urine of untreated BC patients in comparison to healthy controls in order to extract a non-invasive biomarker tool applicable for BC diagnosis.

2 Materials and Methods

2.1 Cohort

The case–control cohort comprised 69 untreated patients, newly diagnosed with primary BC in the adjuvant setting, and 40 healthy female controls treated at the Department of Obstetrics and Gynecology, University Medical Center Freiburg between March 2016 and October 2017. Control patients included in this study presented themselves either for routine BC screening or for clarification of specific breast-related symptoms (e.g., subjective tactile findings, mastodynia) in order to exclude the presence of BC and its precursors. The controls’ health status was confirmed by medical examination (inspection and palpation) performed by an experienced clinical physician, including breast and regional lymph nodes, to exclude potential BC patients and patients with any history of other malignant disease or current inflammation. Furthermore, healthy controls underwent breast and axillary ultrasound and mammography. In order to identify and exclude patients with advanced stages of BC, staging procedures following the current national guidelines were performed in each case.

The investigation protocols (36/12 and 386/16) were approved by the institutional ethical review board of the University of Freiburg. All patients and healthy controls involved provided written informed consent. Clinical cohort characteristics are summarized in Table 1.

Table 1 Cohort characteristics of breast cancer (BC) patients and healthy controls

2.2 miRNA Specimen

This biomarker identification study with diagnostic applicability with respect to BC detection is based on the expression analysis of urinary circulating miRNA types. The chosen set of 13 miRNA specimens is characterized by the combined features of a proven functional relation to BC and a robust and reliable detectability in human urine samples. For normalization purposes in the relative quantification of miRNA expression levels, two miRNA specimens with housekeeping characteristics (miR-16 and miR-26b) were used, as identified in a previous study [15]. Detailed information on the miRNA types, including the target sequences, is listed in Table 2.

Table 2 Information on miRNA names, sequences and primers utilized in expression analysis

2.3 Sampling and Storage

Native spontaneous urine samples were collected in 100-mL sterile urine sampling cups (Sarstedt, Germany) and cryopreserved in 10-mL aliquots (Urin Monovette, Sarstedt, Germany) at − 80 °C until further processing.

2.4 Sample Preparation, RNA Isolation, and Reverse Transcription

Cryopreserved native urine specimens were thawed at room temperature and centrifuged (4000 rpm; 15 min; 4 °C) to remove potential cell debris residues or other solid precipitates from pure urine supernatant.

For the isolation of miRNA-loaded exosomes/microvesicles from urine samples, 10 mL of centrifuged urine was transferred to a 10-mL syringe (#4617207V; BRAUN, Melsungen, Germany) with a nylon filter with a 0.22-µm pore size (#02542904; Perkin Elmer, Waltham, USA) screwed onto the syringe. With the piston inserted, the sample was filtered dropwise through the screwed-on filter. Following filtration, the filters were washed with 5 mL 1× Dulbecco’s phosphate buffered saline (DPBS) buffer (#14190185; ThermoFisher Scientific, Karlsruhe, Germany) using a fresh 5-mL syringe (#4617053V; BRAUN, Melsungen, Germany) to remove sample residues that could have possibly interfered with the downstream applications. After washing, total RNA isolation was performed using the Norgen Total RNA Purification Kit (#17200, NORGEN Biotek Corp., Thorold, Canada). According to the manufacturer’s protocol [62], 600 µL of lysis buffer was extruded through a filter membrane. Purified RNA was eluted in 35 µL of elution buffer. Further processing of RNA specimen was defined to a standard volume of 2.5 µL.

Reverse transcription (RT) of miRNAs was realized using 2.5 µL of purified RNA sample, 5 μL of 5× RT-Buffer, 0.1 mM of ATP, 0.5 μM RT-primer, 0.1 mM of each deoxynucleotide (dATP, dCTP, dGTP, and dTTP), 25 units of Maxima Reverse Transcriptase (Thermo), and 1 unit of poly(A) polymerase (New England Biolabs GmbH, Frankfurt, Germany) in a final volume of 25 µL. Reaction incubation was set to 30 min at 42 °C, followed by enzyme inactivation at 85 °C for 10 min. The primer design for the RT reaction was performed using the miRprimer software tool by Busk [63]. All RT primer information is listed in Table 2. Complementary DNA (cDNA) samples were diluted in H2O to a final volume of 200 µL and stored at − 20 °C until further processing.

2.5 qPCR-based miRNA Quantification

Quantitative determination of miRNA expression levels was realized by real-time quantitative polymerase chain reaction (qPCR) on a LightCycler® 480 (Roche, Mannheim, Germany). The PCR reaction set up included 1 µL cDNA, in-house qPCR buffer [containing TRIS pH8.1, dATP, dCTP, dGTP, dTTP, magnesium, potassium ammonium, SYBRGreen (Jena Bioscience, Jena, Germany), and enhancers], and 0.25 units HotStart Taq Polymerase (Jena Bioscience) in a total volume of 10 µL.

The used qPCR primers were designed via miRprimer software [63] and are listed in Table 2. The applied qPCR program comprised initial denaturation (95 °C; 2 min), 40 cycles of denaturation (95 °C; 5 s)/annealing/extension (60 °C; 30 s), and one final melting curve analysis step.

Relative quantification of miRNA expression levels was performed in a duplicate analysis based on ΔCt method normalized on corresponding mean expression values of the two housekeeping miRNAs miR-16 and miR-26b. The validation of the housekeeping miRNAs applicable for the setting of urinary miRNA analysis was proven in a previous study [15] and is methodically described by Marabita et al. [64].

2.6 Technical Processing Normalization via Spike-In Control RNAs

To monitor processing efficiency with regard to RNA extraction, cDNA synthesis (RT), and PCR amplification, two ‘spike-in’ control RNA specimens of a defined concentration were added to the RNA lysis buffer (Norgen), as recommended by Marabita et al. [64]. For this purpose, 5 fM of two exogenous synthetic RNA types (Caenorhabditis elegans cel-miR-39 and Arabidopsis thaliana ath-miR-159; Biomers.net GmbH, Ulm, Germany; sequences in Table 2) were used.

Technical normalization of the expression data obtained by qPCR was based on threshold cycle (Ct) values. Relative quantities (RQs) were defined as the Ct values scaled to the geometric mean of the external reference spike-in RNAs and converted to a linear scale by the formula:

$${\text{RQ}} = 2^{{ - \Delta C_{\text{t}} }}$$

with

$$\Delta C_{\text{t}} = C_{{{\text{t}}_{\text{miRNA}} }} - C_{{{\text{t}}_{\text{spike geometric mean}} }}$$

The normalization factor (NF) was calculated as the geometric mean of the selected normalizers (housekeeper miRNA specimens: miR-16 and miR-26b) for each sample j. The NF is then applied to obtain the normalized relative quantities (NRQs) for each miRNA i and sample j [64]:

$${\text{NRQ}}_{ij} = \frac{{{\text{RQ}}_{ij} }}{{{\text{NF}}_{j} }}.$$
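The normalization chain above (RQ, NF, NRQ) can be illustrated with a minimal Python sketch. It is not the authors' analysis code (the study used R); the function names and all Ct values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math

def geometric_mean(values):
    """Geometric mean of positive numbers."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def relative_quantity(ct_mirna, ct_spikes):
    """RQ = 2^(-dCt), with dCt = Ct_miRNA - geometric mean of the spike-in Ct values."""
    delta_ct = ct_mirna - geometric_mean(ct_spikes)
    return 2.0 ** (-delta_ct)

def normalized_relative_quantities(ct_targets, ct_housekeepers, ct_spikes):
    """Return NRQ_ij = RQ_ij / NF_j for one sample j.

    NF_j is the geometric mean of the housekeeper RQs in that sample.
    """
    nf = geometric_mean(
        [relative_quantity(ct, ct_spikes) for ct in ct_housekeepers.values()]
    )
    return {name: relative_quantity(ct, ct_spikes) / nf
            for name, ct in ct_targets.items()}

# Hypothetical Ct values for a single urine sample (illustrative only).
sample_spikes = [20.0, 20.0]                          # cel-miR-39, ath-miR-159
sample_housekeepers = {"miR-16": 24.0, "miR-26b": 26.0}
sample_targets = {"miR-424": 27.0, "let7-i": 25.0}

nrq = normalized_relative_quantities(sample_targets, sample_housekeepers, sample_spikes)
```

With these made-up numbers, the housekeeper RQs are 2^-4 and 2^-6, so NF = 2^-5; miR-424 (RQ = 2^-7) then has an NRQ of 0.25 and let7-i (RQ = 2^-5) an NRQ of 1.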

2.7 Statistical Analyses

Biomarker assessment focused on the extraction of the diagnostic value of all 13 analyzed miRNA types and combinations of them with regard to urine-based BC detection. With this aim, we fitted a (univariable) logistic regression model for each type separately and then used several exploratory statistical approaches, described as follows.

  1. (1)

    Variable selection frequencies with cross-validation. In each of 10,000 repetitions, we randomly selected a subsample of size 2/3 of the data (the training sample), fitted a logistic regression model using forward selection, and determined the inclusion frequency of each variable. We then chose the seven variables with the highest inclusion frequency, fitted a model to the full data set, and reduced this model further by forward selection, finally choosing within all convergent models the one that minimized the Akaike information criterion (AIC).

  2. (2)

    Variable selection frequencies with bootstrap. We drew 10,000 bootstrap samples (with repetition). In each bootstrap sample, we performed modeling and variable selection as in the first approach.

  3. (3)

    Receiver operating characteristic (ROC) analysis/cross-validation. Similar to approach 1, but now we applied the model that was fitted in the training sample to the (remaining) test sample, with fixed coefficients (100 repetitions). We constructed the ROC curves and used the area under the curve (AUC) as a measure of goodness of fit.

  4. (4)

    Variable selection frequencies with boosting. Boosting is a stepwise and regularized procedure for fitting generalized linear models. Starting with setting all regression coefficients to zero, in each step, one coefficient is updated, such that the model fit improves the most. The number of steps is determined by cross-validation. As many coefficients will never be updated, boosting provides a sparse model. The result was visualized by showing the coefficient paths. Boosting was performed using the R package GAMBoost [65].

  5. (5)

    All-subsets regression. All possible $2^{12} = 4096$ models were fitted and compared based on the AIC. We then restricted the list to convergent models with at most four variables.
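The model count in the all-subsets approach can be checked with a short Python sketch. It only enumerates variable subsets and states the standard AIC formula; it does not fit any regression model. The twelve placeholder variable names are an assumption made to match the 2^12 count given in the text, and the helper names are hypothetical.

```python
from itertools import combinations
from math import comb

def candidate_subsets(variables, max_size):
    """Yield every subset of `variables` with at most `max_size` members."""
    for k in range(max_size + 1):
        yield from combinations(variables, k)

def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Twelve placeholder variables, chosen only to reproduce the 2^12 count.
variables = [f"v{i}" for i in range(1, 13)]

n_all = 2 ** len(variables)                                  # all subsets
n_small = sum(1 for _ in candidate_subsets(variables, 4))    # at most four variables
```

Under this assumption, the restriction to at most four variables leaves 1 + 12 + 66 + 220 + 495 = 794 of the 4096 candidate subsets before any convergence filter is applied.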

All analyses were performed using the open statistical software environment R [66].
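To make the boosting idea in approach (4) concrete, the following Python sketch implements generic componentwise gradient boosting for logistic regression. This is a related textbook technique and not the likelihood-based algorithm of the GAMBoost package; the step size `nu`, the function names, and the toy data are assumptions for illustration, and the cross-validated choice of the number of steps is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def componentwise_boost(X, y, steps=50, nu=0.1):
    """Componentwise gradient boosting for logistic regression.

    X: list of rows (features should be centred), y: list of 0/1 labels.
    Returns (intercept, coefficients, coefficient path).
    """
    n, p = len(X), len(X[0])
    intercept, beta, path = 0.0, [0.0] * p, []
    for _ in range(steps):
        probs = [sigmoid(intercept + sum(b * x for b, x in zip(beta, row)))
                 for row in X]
        resid = [yi - pi for yi, pi in zip(y, probs)]  # negative gradient of log-loss
        intercept += nu * sum(resid) / n
        best_j, best_gain, best_coef = None, 0.0, 0.0
        for j in range(p):
            sxx = sum(row[j] ** 2 for row in X)
            if sxx == 0.0:
                continue
            sxr = sum(row[j] * r for row, r in zip(X, resid))
            gain = sxr * sxr / sxx  # drop in residual sum of squares for feature j
            if gain > best_gain:
                best_j, best_gain, best_coef = j, gain, sxr / sxx
        if best_j is None:
            break
        beta[best_j] += nu * best_coef  # update only the single best coefficient
        path.append(list(beta))
    return intercept, beta, path

# Toy data: feature 0 separates the classes, feature 1 is uninformative noise.
X_toy = [[-2.0, 0.1], [-1.0, -0.1], [1.0, 0.1], [2.0, -0.1]]
y_toy = [0, 0, 1, 1]
b0, coefs, coef_path = componentwise_boost(X_toy, y_toy, steps=50, nu=0.1)
```

In this toy run only the informative feature is ever updated, so the coefficient of the noise feature stays at zero, which is the sparsity property described above. Plotting `coef_path` against the step index would give a coefficient-path display of the kind the text refers to.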

3 Results

The complete panel of all 13 (miR-17, miR-107, miR-125b, miR-194, miR-222, miR-423, miR-424, miR-660, let7-a, let7-d, let7-e, let7-f, and let7-i) circulating miRNA types could be reliably detected in all urine specimens of the test cohort. Sample-dependent variations in expression levels of miRNAs due to variable urine parameters (e.g., dilution, concentration of nucleic acids) could be equilibrated by normalization against the set of the two stably expressed housekeeping miRNAs, miR-16 and miR-26b.

Based on the miRNA expression levels converted to NRQs [see the electronic supplementary material (ESM), Supplement pages 2–5], the assessment of miRNA-type–specific diagnostic power for discrimination purposes for BC cases versus healthy controls was evaluated by employing several statistical methods (see Sect. 2). Thereby the applicability and validity of the methods were compared. These methods were considered in both a singular fashion and in a combined manner.

Univariable logistic regression models were fitted to predict cancer status for each miR. Using the resulting score for prediction, the AUC was calculated separately for each miR (see the ESM, Supplement page 14). The best predictors in the sense of the largest AUC were miR-424 (up, AUC = 0.882), let7-i (down, 0.865), miR-423 (down, 0.857), let7-f (down, 0.854), and miR-660 (down, 0.850). It must be emphasized that univariable analyses (1) cannot account for interaction between variables and (2) were not trained/tested on independent samples. Note also that regression coefficients are expected to differ between statistical models and may in some cases even differ in sign. The summarized information output of the multivariable statistical analyses is described in the following approaches (1–5).

  1. (1)

    Variable selection frequencies with cross-validation. This approach resulted in a ranking scheme in miRNA type frequencies (Table 3).

    Table 3 Variable selection frequencies with cross-validation (inclusion frequency, relating to all selected models)

    The forward selection based on these ten variables (miRNA types) led to a model including the first five variables (miR-424, miR-423, miR-660, let7-i, and miR-17). Since models with more than four variables (miRNA types) failed to converge, these were excluded from further statistical assessment. Applying these restrictions, this statistical approach provided the model miR-424 + miR-423 + miR-660 + let7-i. This variable combination was selected in 15.56% of all repetitions.

  2. (2)

    Variable selection frequencies with bootstrap. Table 4 lists the ranking of the most frequently chosen miRNA types.

    Table 4 Variable selection frequencies with bootstrap (inclusion frequency, relating to all selected models)

    Forward selection led to a model with the five miRNA types let7-i, let7-f, miR-423, miR-424, and miR-660. Again, the lack of convergence of models with more than four variables led to the reduction to the following miRNA type combination, which is identical to the result of the first statistical approach: miR-424 + miR-423 + miR-660 + let7-i. This variable combination was selected in 9.07% of all bootstrap samples.

  3. (3)

    ROC analysis/cross-validation. The previously preferred model with four variables [miR-424, miR-423, miR-660, and let7-i (1 and 2)] was also the most frequently selected model here (18 of 100 repetitions). In all these repetitions, the AUC was 1 in the training sample and in the test sample, if the same model was newly fitted. When we applied the fitted model with fixed coefficients to the test sample, AUC values ranged between 0.905 and 1, with a median value of 0.995.

    There were a number of alternative models with at most four miRNA types with promising AUC values (see the ESM, Supplement pages 76–211). The most frequently selected combination of five miRNA types was again miR-424, miR-423, miR-660, let7-i, and miR-17 (5 of 100 repetitions).

  4. (4)

    Variable selection frequencies with boosting. After 1500 boosting steps, seven miRNA types had been selected (including those obtained by the earlier approaches). These were miR-424 (up), miR-423 (down), miR-660 (down), let7-i (down), miR-194 (up), let7-a (down), and miR-125b (up); see Fig. 1 and the ESM (Supplement pages 212–215).

    Fig. 1 Coefficient paths for the stepwise boosting procedure. The x axis shows the boosting steps (here 1–1500); the y axis shows the coefficient estimates for each selected variable. After 1500 steps, seven variables had been selected

  5. (5)

    All-subsets regression. All 4096 possible candidate models were fitted. The restriction to convergent models with at most four variables provided the ranking for superior models (Table 5).

    Table 5 Best models with at most four variables for ‘all-subsets regression’

The first model characterized by the lowest AIC (AIC = 15.0) emerged only here, not in the previous approaches. The second model contains the four miRNA types (miR-424 + miR-423 + miR-660 + let7-i) with the highest inclusion frequency in approaches (1) and (2). The second model also emerged in approach (3). ROC plots for some of these models are provided in the ESM (Supplement page 216).

Note that in the statistical approach (5), all-subsets regression works without cross-validation, as all models are fitted to the full data set. Thus, this regression method lacks information in terms of the predictive performance of the models.

Comprehension of the statistical evaluation processes performed in this study is facilitated by the provision of all underlying data assessment files, including the following: NRQ values and original data in R-program readable format (see the ESM).

In summary, a panel of four miRNA types, comprising miR-424, miR-423, miR-660, and let7-i, was selected as a highly specific combinatory biomarker tool for discriminating BC patients from healthy controls based on urine specimens, with 98.6% sensitivity and 100% specificity (Fig. 2). Figure 3 shows box plots for miRNA types miR-424, miR-423, miR-660, and let7-i for BC patients and healthy women. Figure 4 depicts the corresponding box plot for the model combining these four miRNA types.

Fig. 2 ROC curve for the best model with four variables (miR-424, miR-423, miR-660, and let7-i) applied to the full data set. The optimal cut-off value is indicated (defined as the cut-off where the sum of sensitivity and specificity is maximized). The corresponding score value is 0.681; thus, individuals with a score higher than 0.681 are classified as BC-positive. BC breast cancer, ROC receiver operating characteristic
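The cut-off rule in the Fig. 2 caption (maximizing the sum of sensitivity and specificity) and the AUC can be sketched in a few lines of Python. The scores and labels below are invented toy values, not study data, and the helper names are hypothetical. The sketch also assumes that a case is called positive when its score is at or above the threshold, whereas the caption uses a strict "higher than" rule.

```python
def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for each distinct score.

    A case is called positive when its score is >= threshold (assumption).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for thr in sorted(set(scores)):
        tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= thr)
        tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < thr)
        points.append((thr, tp / n_pos, tn / n_neg))
    return points

def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def best_cutoff(scores, labels):
    """Cut-off that maximizes sensitivity + specificity."""
    return max(roc_points(scores, labels), key=lambda t: t[1] + t[2])

# Toy model scores: three controls (label 0) and four cases (label 1).
toy_scores = [0.05, 0.20, 0.40, 0.30, 0.70, 0.90, 0.95]
toy_labels = [0, 0, 0, 1, 1, 1, 1]

best_thr, sens, spec = best_cutoff(toy_scores, toy_labels)
area = auc(toy_scores, toy_labels)
```

For the toy values, the selected cut-off is 0.70 with sensitivity 0.75 and specificity 1.0, and the AUC is 11/12, because one case (score 0.30) scores below one control (score 0.40).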

Fig. 3 Box plots for miRNA types miR-424, miR-423, miR-660, and let7-i. miRNA microRNA

Fig. 4 Box plot for model combined from miRNA types miR-424, miR-423, miR-660, and let7-i. miRNA microRNA

4 Discussion

4.1 Breast Cancer Detection via Liquid Biopsies and Advantages of Urine-Based Biomarkers

The overarching objective of this study was to advance the methodical improvement of BC diagnostics using biomarker-based non-invasive screening methods. To this end, the readily available sample matrix urine was considered a promising source for the identification of a BC-specific biomarker panel. Urine is highly convenient for sampling procedures, which improves patient compliance in clinical practice. Beyond that, urine sampling surpasses blood-based techniques by offering risk-free, non-invasive, more practitioner-friendly, cost-efficient, and easy hands-on sampling that is feasible even for non-medically trained personnel. Given that most liquid biopsy biomarker classes are challenging with respect to stability during sampling, biobanking, and biopreservation, the choice of sample matrix is one important factor in the identification process of disease-specific circulating biomarker molecules [67]. Biomarkers in body liquids may occur at low concentration levels only and are most commonly prone to degradation and unspecific adherence to other materials, accompanied by subsequent quality and quantity loss [67]. Blood and its derivatives generally offer one of the greatest diversities and high quantity levels of circulating biomarkers. However, concerning distinct biomolecules (e.g., miRNAs), disadvantageous aspects need to be considered, e.g., hemolysis [68, 69], anti-coagulant-dependent analytical challenges [70], and an unfavorable ‘signal-to-noise’ ratio due to BC diagnosis-unspecific molecules [67, 71,72,73]. The advantages of urine as a source for biomarkers contrast with the technical and analytical challenges in the identification of robust and reliable subcellular markers for BC detection using this matrix. For years, the on-site operating experience with liquid biopsy-based BC diagnostics evolved from the methodical refinement and data evaluation that started from an initial feasibility study on urinary BC detectability [15].

Recently, the medical research landscape has been characterized by increasing efforts to investigate the diagnostic potential of biomarkers originating from body liquids in order to meet the increasing demand for non- or minimally invasive screening methods and for personalized treatment options in clinical disease management [74,75,76,77]. To date, liquid biopsy-based BC detection or monitoring studies predominantly concentrate on blood components [12, 78, 79]. So far, only a few approaches have helped to elucidate the potential association between urine-derived biomolecules and BC [15, 80,81,82,83,84]. Cancer-relevant urinary miRNA profiling mostly focuses on malignancies of the urogenital tract and adjacent organs [85,86,87]. To date, there is little data on cancer studies of other distant organs like gynecological cancer [17] and gastric cancer [88] that identified cancer-specific urinary miRNA expression levels.

The current study enhances previous analytical approaches, providing a refined standardized biomarker isolation method, a robust and reproducible quantification protocol, an expansion of cohort size and biomarker candidates, and a comprehensive statistical data assessment [15]. Based on continuous refinement and corresponding technical validation, the isolation of miRNA molecules from urine was standardized to a simple, yet effective, filter-based technique for exclusive extraction of vesicle-encapsulated miRNA specimens free of other potential cellular collaterals.

The preselection of miRNA biomarker candidates was guided by the two combinatory characteristics of robust and reliable detectability in human urine and a known functional relevance in (breast) cancer biology. Prior to that, the specific detectability of circulating miRNA specimens was analyzed in a larger pool of miRNA candidates in a set of BC versus control samples. For this purpose, potential miRNA types underwent a multi-step selection process. First, miRNA types were preselected on the basis of recent urinary miRNA expression studies [89,90,91,92,93], with regard to their proven detectability in human urine. Second, the analytical feasibility of specific detection of miRNA sequences was ensured by excluding a variety of miRNA candidates whose qPCR primer design proved intractable owing to adverse guanine and cytosine (GC) content and/or distribution in nucleotide sequences. Finally, the potential miRNA types had to meet the strict requirement of robust detectability, defined by a qPCR threshold of Ct < 30. It is worth mentioning that the urine samples used in the preselection process as well as in the presented study were native, spontaneous patient urine specimens. These were not linked to general restrictive parameters like dietary issues or distinct sampling time points (e.g., fasting or morning urine) to offer an analytical approach that originates from the most practicable and easily applicable test medium.

Due to the enlarged cohort size of 69 BC patients versus 40 healthy controls, the informative value of the current study could be significantly increased compared to that of previous analyses [15, 17].

The methodical restriction on vesicle-encapsulated (exosomal) miRNA is reasonable given that the vast majority of non-exosomal (‘free’) RNA molecules are subjected to enzymatic degradation by ubiquitous ribonucleases, even though circulating miRNAs display a certain persistence compared to other RNA types [94, 95]. The great impact of the renal passage on biomolecules has to be also taken into account in urine-based analytics [17].

4.2 Methodical Background and Rationale

To our knowledge, there are no existing standardized protocols for isolation, quality assessment, and quantification of circulating miRNAs. Consequently, a general guideline for miRNA analysis in body fluids is still lacking [17, 96]. Blood derivatives are frequent sample matrices of recent miRNA-directed investigations, and a range of commercial suppliers offer blood- and even urine-based miRNA isolation/detection kits. However, the few confirmed recommendations on miRNA detection derived from blood can be transferred to the urine setting only by extrapolation [96]. With the implementation of qPCR in miRNA profiling in the current study, a technique offering the highest levels of specificity, sensitivity, throughput, quantification accuracy, and flexibility compared to most other techniques [96] was utilized to obtain expression data. Control and normalization in biomarker analysis were realized by spike-in controls to enable complete monitoring of technical procedures (see Sect. 2). In addition, the relative quantification of miRNA specimen expression was also based on previously validated normalizers (miR-16, miR-26b) with stable housekeeping characteristics in urinary miRNA analysis [15]. Finally, NRQs could be extracted as a basis for computation.

Expression data evaluation employed a variety of statistical approaches to test and compare the validity of BC-specific miRNA profiles in an explorative manner. Statistical testing using variable selection frequencies combined with cross-validation offered an miRNA model output clearly highlighting the diagnostic value of the four urinary miRNA types miR-424, miR-423, miR-660, and let7-i. Of these types, miR-424 displays upregulated expression levels, while miR-423, miR-660, and let7-i are downregulated in BC patients compared to healthy controls. Varying the variable selection frequencies test by combining it with the bootstrap approach, a model with the previously extracted four miRNA types could be confirmed again. In both tests, the lack of model convergence when exceeding more than four variables led to the restricted miRNA panel of miR-424 + miR-423 + miR-660 + let7-i.
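The difference between the two resampling schemes (2/3 subsamples drawn without replacement versus bootstrap samples drawn with replacement) and the resulting inclusion frequencies can be sketched in Python as follows. The selection rule `toy_select` is a deliberately simple, hypothetical stand-in for the forward-selection logistic regression used in the study, and the data are invented; only the resampling and counting logic is meant to mirror the description.

```python
import random
from collections import Counter

def subsample(data, rng, fraction=2 / 3):
    """Training sample drawn without replacement (cross-validation style)."""
    return rng.sample(data, round(len(data) * fraction))

def bootstrap(data, rng):
    """Bootstrap sample of the same size, drawn with replacement."""
    return [rng.choice(data) for _ in data]

def inclusion_frequencies(data, select, resample, repetitions, seed=0):
    """Fraction of repetitions in which each variable is chosen by `select`."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(repetitions):
        counts.update(select(resample(data, rng)))
    return {var: c / repetitions for var, c in counts.items()}

def toy_select(sample):
    """Hypothetical stand-in for forward selection: keep variables whose
    mean differs between cases and controls by more than 1."""
    cases = [row for row, label in sample if label == 1]
    controls = [row for row, label in sample if label == 0]
    if not cases or not controls:
        return []
    chosen = []
    for var in cases[0]:
        diff = (sum(r[var] for r in cases) / len(cases)
                - sum(r[var] for r in controls) / len(controls))
        if abs(diff) > 1:
            chosen.append(var)
    return chosen

# Invented data: variable "a" differs between groups, variable "b" does not.
toy_data = [({"a": 5.0, "b": 1.0}, 1)] * 6 + [({"a": 1.0, "b": 1.0}, 0)] * 6

freq_cv = inclusion_frequencies(toy_data, toy_select, subsample, repetitions=200)
freq_boot = inclusion_frequencies(toy_data, toy_select, bootstrap, repetitions=200)
```

Passing the selection rule as a parameter keeps the resampling loop independent of the model; a real analysis would substitute a forward-selection routine that returns the names of the selected miRNA types.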

Applying ROC analysis combined with cross-validation to the previously preferred set of four miRNAs (miR-424, miR-423, miR-660, and let7-i), selection of this model occurred in 18% of repetitions, with AUC = 1 in training as well as in the test sample. The application of this fitted model with fixed coefficients to the test sample resulted in promising AUC values (median value 0.995). Alternative models with a maximum of four miRNAs also presented promising AUC values (see the ESM, Supplement pages 76–211). Interestingly, the approach with five variables highlights the miRNA combination miR-424 + miR-423 + miR-660 + let7-i + miR-17 via the most frequent selection, thus again featuring the four previously emphasized miRNA specimens.

The output of variable selection frequencies testing after 1500 boosting steps offered a selection of seven miRNA types including the recurring quartet miR-424 + miR-423 + miR-660 + let7-i. These were complemented by the miRNA specimens miR-194 and miR-125b, found upregulated in BC patients, and by the downregulated let7-a (Fig. 1).

Ultimately, via the most comprehensive approach of all-subsets regression, all 4096 possible candidate models were fitted. Restricting the models to a maximum of four variables and requiring model convergence extracted four superior models that share the concordant triplet of the miRNA types miR-424 + miR-660 + let7-i. Without an additional miRNA type, this triplet ranks behind the other three superior models (AIC = 19.2). The model with the addition of miR-125b to the triplet scores first rank (AIC = 15.0), though it occurs in this statistical approach only. The recurring combination with miR-423, which ranked highest in the previous statistical tests, scores the second rank (AIC = 16.3). Replacement of miR-423 with miRNA type let7-d provides a model listed at the third rank (AIC = 17.2).
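The size of the model space and the AIC-based ranking can be retraced with a short sketch. It assumes 12 candidate variables, which is what the reported count of 4096 = 2^12 models implies, and it counts the empty (intercept-only) model among the subsets. The AIC values are those reported above; the dictionary labels are shorthand introduced here.

```python
from math import comb

n_candidates = 12  # assumption: inferred from 4096 = 2**12

# Every variable is either in or out of a model
total_models = 2 ** n_candidates

# Models with zero to four variables (zero = intercept-only model)
models_up_to_four = sum(comb(n_candidates, k) for k in range(5))

# AIC values as reported for the four superior models
reported_aic = {
    "triplet + miR-125b": 15.0,
    "triplet + miR-423": 16.3,
    "triplet + let7-d": 17.2,
    "triplet alone": 19.2,
}

# Lower AIC is better; differences to the best model aid comparison
best_aic = min(reported_aic.values())
ranking = sorted(reported_aic, key=reported_aic.get)
delta_aic = {name: round(aic - best_aic, 1) for name, aic in reported_aic.items()}

print(total_models, models_up_to_four)
for name in ranking:
    print(name, reported_aic[name], delta_aic[name])
```

Under this assumption, the restriction to at most four variables reduces the model space considerably before the convergence criterion is applied.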

Based on the different statistical methods performed, the following miRNA types could be specified with the highest diagnostic value with regard to BC detection via urine sample: miR-424, miR-660, let7-i, and miR-423. Alternative models account for partial replacement of the ‘top four’ combination with miR-125b, miR-194, let7-a, let7-d, or miR-17. Patient stratification by specific molecular BC subtypes cannot be performed because of the restricted sample size of distinct BC characteristics [e.g., N (HER2neu-positive) = 4; N (estrogen receptor (ER)-negative) = 13; see Table 1], and unplanned post hoc testing should be avoided in general [97].

4.3 The Role of miR-424, miR-660, miR-423, miR-125b, miR-194, miR-17, let7-a, let7-d, let7-f, and let7-i in Breast Cancer

Dysregulated miR-424 expression levels were identified in association with a range of malignant diseases [57, 98,99,100,101], but with a contrary functional impact on tumor progression. In BC biology, miR-424 showed tumor-suppressive characteristics as part of the miR-424/503 complex in tissue-based investigations [58]. Recently, the tumor-suppressive effects of miR-424 were also observed in basal-like BC in vivo and in vitro [57, 102]. The diagnostic potential of miR-424 in the detection of non-invasive early BC in a triplet miRNA signature with miR-199a and miR-29c was demonstrated in a serum-based circulating biomarker study [59]. In addition, plasma levels of miR-424 expression, as part of a six-miRNA-type diagnostic panel, offer discriminating value in the separation of responders from non-responders in ER-positive, metastatic BC patients treated with dovitinib [103]. Due to its top ranking and repeated confirmation via different statistical approaches in the current study, miR-424 could be identified as the miRNA type with the most promising diagnostic value in urinary BC detection.

The majority of investigations into miR-660 tumor-biological functions confirm its tumor-suppressive regulatory activity [104,105,106,107]. This contrasts with the oncogenic potential of miR-660 observed in osteosarcoma [108]. Most interestingly, in vitro studies identified an oncogenic regulatory impact of miR-660 in BC [60]. NGS profiling of BC tissue specimens identified miR-660, together with miR-574, as an auspicious prognostic biomarker with respect to OS. So far, potential diagnostic features of circulating miR-660 have been demonstrated for non-small cell lung cancer detection in serum only [109]. The persistent presence of miR-660 in the top three of all miRNA specimen models with a superior diagnostic value qualifies this miRNA type as a strong marker in urine-based BC detection.

let7-i could be identified as a constant ‘top four’ member in superior diagnostic miRNA models in the urinary BC-detection setting. The current study observed downregulated expression levels of let7-i in urinary BC samples. This is consistent with a variety of previous studies in different tumor entities, either accounting for its tumor-suppressive characteristics or cancer-related downregulation [110,111,112,113,114,115]. Serum-based investigations revealed the diagnostic features of circulating let7-i in prostate cancer and melanoma [116, 117].

According to nearly all statistical approaches, miR-423 also ranks as one important miRNA type in the four-variable superior diagnostic miRNA panel. The diagnostic value of circulating miR-423 in blood-based investigations has been demonstrated in various pathological settings, including cancer detection or prognosis [118], e.g., in lung cancer [119]. Functional studies identified exosomal miR-423 as an oncogenic trigger in gastric, prostate, and ovarian cancer [120,121,122]. While McDermott et al. [123] excluded miR-423 from further analysis because its blood expression levels lacked significance in luminal A-like BC, the current study identified significant downregulation in urinary BC specimens. The functional interrelation underlying the observed urinary expression levels, which contrasts with the direct oncogenic effects of miR-423 demonstrated in other malignant settings, remains a subject for future analyses. The known association of a distinct genetic miR-423 polymorphism (SNP: rs6505162; NR_029945.1: n.87A>C, T) and functional impact on BC risk [54, 55], as well as prognostic features in several cancers [124], was not a specific subject of the current study and is not detectable by the analytical method applied.

Urinary miR-125b presented with upregulated expression in BC specimens and could be identified for the first time as a promising urinary BC biomarker using boosting and an all-subsets regression statistical method. BC-related regulatory functions investigated by various studies account for a dual-faced impact on malignant progression as well as on BC chemotherapy response and prognosis [37, 125,126,127,128]. Serum levels of circulating miR-125b might serve as a biomarker for BC detection [129] and chemotherapy response [126].

Another promising miRNA type in urinary BC detection identified by boosting in statistical evaluation is miR-194. With regard to cancer, ambivalent functions have been described for miR-194, as reports have provided evidence for both oncogenic and tumor-suppressive effects in a variety of cancers [130,131,132,133]. So far, BC-specific investigations indicate a rather oncogenic regulatory impact of miR-194 [51, 134]. The current study detected upregulated urinary miR-194 expression levels in BC compared to healthy controls. This is in line with serum-based investigations focusing on BC recurrence markers, which identified miR-194 upregulation in the serum of BC patients [52].

Three more members of the let7 family exhibit a certain degree of diagnostic potential with regard to urinary BC detection. Depending on the statistical approach applied, let7-a, let7-d, and let7-f rank closely behind the top five superior diagnostic miRNA specimens in this study. Their functional interrelation with cancers, including BC, could be demonstrated for several let7 miRNA types [28, 29, 113, 135]. Recently, circulating let7 levels in serum were described as potential BC biomarkers [23].

Prospective BC-subtype–specific diagnostic features might be offered by miR-17. Tissue-based analyses demonstrated a discriminating value of miR-17 with respect to BC subtyping [136] and a potential application as a recurrence marker [137]. Most interestingly, circulating serum miR-17 expression exhibited a significant relation with hormone receptor status in BC and could be identified as a promising BC biomarker [33]. Upregulated urinary miR-17 ranks in the top five superior BC diagnostic miRNA specimens in the ROC analysis performed in the current study.

4.4 Study Limitations

The critical consideration of the results and the conclusions drawn from the current study comprises analytical, methodical, and statistical aspects. The restricted number of 13 miRNA types included in this analysis may be perceived as an important objection. Standardized operating procedures in circulating miRNA isolation, quality assessment, and quantification remain undefined, particularly with respect to the urinary specimen [17, 96]. To date, there is only a small number of biomarker studies on urinary miRNAs in general and an even smaller number of BC-detection approaches via urine specimen in particular. These studies offer hardly any experience or basis for comparison in this still nascent field of research. With significant advances, minimally invasive blood-based analyses utilizing patient serum or plasma may identify a still growing number of promising miRNA candidates for BC detection [12, 78, 79]. However, the blood-derived body of knowledge on circulating miRNAs cannot be transferred directly to urine. The decisive prerequisite for the miRNA types included in the panel analyzed is robust detectability in native urine samples. Isolation of vesicle-encapsulated miRNAs was performed via filtration-based size restriction using a biomarker subcellular compartment in combination with a commercial lysis-based kit for further processing. In this process, only exosomal miRNAs are captured and measured, while ‘free’ miRNA molecules are omitted. The clinical relevance of ‘free’ versus exosomal miRNAs is still debatable [138]. General guidelines for circulating miRNA quantification do not exist and have to be defined and validated individually depending on the specific setting. The current study attempted to select the most appropriate guidelines and to follow the most accurate analytical approach based on on-site and external methodical experiences.

The presented analyses are based on a cohort of 109 individuals (69 BC patients, 40 healthy controls). One limitation of this study is that patients with pre-existing or concomitant malignant disease were excluded. In this explorative biomarker study, the criterion was chosen by analogy with therapy studies to ensure that BC patients and malignancy-unaffected controls formed two contrasting groups that were as homogeneous as possible. However, the subsequent validation phase will not continue to adopt this criterion, in order to obtain a robust BC-specific miRNA panel with advanced diagnostic precision in a less restricted variety of patients.

To simply identify the best statistical model for this sample by forward regression or all-subsets regression would not be sufficient to obtain a good prediction for new subjects. We used statistical techniques such as cross-validation, bootstrap, and boosting to overcome the small sample limitation and to find a model with promising predictive ability. In this study, these techniques were used for exploratory analyses. It is planned to validate the resulting model in a future prospective diagnostic study.

Compared with the proof-of-principle study performed in 2015 [15], the current investigation addressed a variety of novel aspects. The panel of analyzed miRNA types varied substantially, with only miR-125b present in both studies. As outlined above, the multi-step selection criteria for candidate miRNAs resulting from the refined methodical procedure enforced the exclusion of miRNA types that exhibited promising potential in the previous approach [15]. Compared to the former protocol, the sampling restriction to midstream urine specimens was relaxed to native urine to implement an even more robust protocol with respect to the most practicable future clinical application. The RNA isolation yield could be further optimized with the introduction of vesicle harvest via filtration, which made a pre-amplification step before qPCR unnecessary. The cohort size more than doubled in the current study compared with that of Erbes et al. [15] (24 BC patients and 24 healthy controls). Also, compared to our present study, the approach in [12] was more heuristic and less systematic.

Despite a continuously growing number of studies reporting the potential of circulating miRNAs as clinical biomarkers, to date, no bench-to-bedside implementation has been realized [139]. The advantages of exosomal miRNAs, e.g., their ability to act as conserved and stable information carriers [139], are accompanied by additional features complicating their informative value. The origin of exosomes from malignant cells with a distinct cancer-specific miRNA inventory is an assumption in the vast majority of studies [140, 141]. A definite and global backtracking of isolated exosomes to the site of component production is impossible via currently applied techniques. Thus, the origin of exosomes analyzed in the current study remains unclear. There is a possibility that a varying number of exosomes isolated from urine actually originate from malignant BC tissue. However, all other living cells in the patient’s body need to be considered as an alternative origin, e.g., the immune system and the urogenital tract. Trafficking and sorting of extracellular vesicles/exosomes in body fluids remains experimental [142], although novel methods are being tested in developmental stages [143].

5 Conclusion

The final assessment of the data obtained supports the definition of a most promising four-variable urinary miRNA signature with significant diagnostic features in BC detection: miR-424 + miR-423 + miR-660 + let7-i. This panel is characterized by diagnostic features of 98.6% sensitivity and 100% specificity. The results also support the potential for expansion and prospective BC-subtype–specific adaptation of the diagnostic miRNA panel based on future validation and verification studies. Clinical BC management remains challenging in various regards, including early detection, pathological categorization, subsequent tailored and individualized treatment, treatment monitoring, and recurrence control. Increasing the versatility of diagnostic, prognostic, and theranostic tools offers new perspectives to improve BC treatment efficacy.
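The reported diagnostic features can be related to the cohort size with a brief sketch. It assumes that sensitivity and specificity refer to the full cohort of 69 BC patients and 40 healthy controls; under that assumption, 68 of 69 correctly classified patients is the count consistent with 98.6%. The Wilson score interval is added here only to illustrate the uncertainty attached to a cohort of this size and is not taken from the study; the function name is illustrative.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Assumed counts, inferred from the reported percentages and cohort size
true_positives, n_patients = 68, 69
true_negatives, n_controls = 40, 40

sensitivity = true_positives / n_patients
specificity = true_negatives / n_controls

sens_low, sens_high = wilson_interval(true_positives, n_patients)
spec_low, spec_high = wilson_interval(true_negatives, n_controls)

print(f"sensitivity {sensitivity:.1%} ({sens_low:.1%} to {sens_high:.1%})")
print(f"specificity {specificity:.1%} ({spec_low:.1%} to {spec_high:.1%})")
```

Even with perfect classification of all controls, the lower interval bound for specificity stays below 100%, which underlines the need for the larger validation cohort.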

The output of this study facilitates the prospective implementation of a non-invasive BC-detection method based on miRNA biomarker quantification in urine specimens. The invention was submitted to the European Patent Office by the University of Freiburg in August 2019, and the patent application is currently pending (application No. EP19190989.4).

5.1 Future Directions and Expert Recommendations

There is a strong and growing demand for new strategies aside from established techniques to compensate for the remaining deficits in BC management [12]. Non-invasive, urine-based BC detection via circulating vesicle-encapsulated miRNA specimens could be one promising integral part of future healthcare. Another accompanying advantage of the discovery process of circulating miRNA biomarkers for BC detection is the gain in knowledge regarding BC pathogenesis [144].

The next critical step is the comprehensive validation of the identified urinary miRNA BC biomarker panel in a clinical prospective cohort study. In parallel, continued biomarker screening will focus on the identification of miRNA signatures with discriminating characteristics with respect to molecular BC-subtype categorization. Subsequent to successful validation, laboratory analytical settings will need to be adapted to allow automation of technical processes and to meet the requirements for reproducibility, operational capacity, and cost-efficiency. The prospective intention is to establish a marketable BC diagnostic test implementable in customary medical laboratories to complement and enhance conventional BC screening methods. Besides the diagnostic features of urinary miRNA specimens in the detection of solid breast tumors, these specimens also bear the potential to serve as BC biomarkers for early diagnosis (BC precursor lesions) or as indicators for BC staging, therapy efficacy, prognosis, or recurrence.

Current expert opinion agrees that circulating miRNAs hold great promise as diagnostic, prognostic, or predictive biomarkers in the clinical management of BC patients based on liquid biopsy matrices in general [144,145,146,147,148] and in the urinary setting in particular [15, 17]. Investigators also agree on the methodical and analytical difficulties in setting definite standards for the isolation, reliable detection, and quantification of circulating miRNAs and in choosing appropriate and comparable data assessment options [71, 89, 96, 147,148,149]. As a consequence, existing studies share little consistency with respect to the circulating miRNA signatures identified by different research groups; thus, clinical implementation of definite miRNA biomarker panels in oncological practice is still lacking [147]. Novel and established methods for miRNA biomarker discovery need to be explored and refined individually and in combination, e.g., via qPCR, NGS, and microarray, and also with regard to high-throughput applications [150]. Besides technical factors, variations in patient selection, potential therapy effects, concurrent diseases, inadequate sample sizes and statistical analysis, and insufficient validation studies testing clinical utilization need to be considered in future approaches [147, 148]. There is a strong demand for larger prospective clinical trials to validate pilot study data and for comprehensively designed studies to characterize panels of miRNAs specific to tumor types, early or advanced cancer stage, response to treatment, patient outcome, and recurrence [145].

The application of hybrid approaches including urinary miRNA profiling and breast imaging [mammography, breast magnetic resonance imaging (MRI)] represents one promising approach to pioneer multiomic BC detection for future preventive, predictive, and personalized medicine [12, 67, 151, 152].