Introduction

Fecal occult blood tests (FOBTs) have been shown to reduce colorectal cancer (CRC) mortality when applied to annual or biennial screening [1, 2] and have long been used as a noninvasive approach to CRC screening. Historically, FOBT approaches have relied on detecting components of hemoglobin, each with distinct advantages and limitations [3]. The guaiac-based method measures peroxidase-like activity of intact heme and has existed for over a century [4]. The fecal immunochemical test (FIT), which detects the globin protein, was first described four decades ago [5]. While technical refinements with conventional FOBTs continue, there has been no fundamental innovation around the targeted biomarkers themselves in recent decades.

FIT has increasingly become the FOBT of choice for CRC screening, as it is human-specific, unaffected by dietary artifact, and primarily detects lower intestinal bleeding [3, 6, 7]. Combining FIT with an assay of exfoliated molecular markers in stool has proven to be a rational and effective approach to further expand the sensitivity of stool testing for detection of colorectal neoplasia [8,9,10,11]. Such a combination compensates for the intermittency or absence of bleeding from colorectal neoplasms [12, 13], while exploiting the complementary value of FIT.

A multi-target stool test, comprising a panel of DNA markers and FIT [9,10,11], has been approved by the Food and Drug Administration (FDA) for CRC screening, been incorporated into CRC screening guidelines by the American Cancer Society and United States Preventive Services Task Force [14, 15], and become available to patients. The FIT component of this multi-target test currently requires separate stool sampling by the patient, as the EDTA stabilizing buffer for DNA interferes with the protein-based FIT assay, and an assay platform different from that used for the DNA markers. A molecular-based FOBT has potential to simplify the patient experience by avoiding a separate stool sampling and also reduce material costs of the multi-target stool test.

In a recent unpublished RNA sequencing study to identify colorectal epithelial markers, we noted that a subset of microRNAs (miRNAs) was not expressed in colorectal epithelia but was abundantly present in whole blood. Prior reports have found miRNAs to be relatively stable in various biological media, including blood and stool [16, 17]. Conceptually, miRNAs that indicate the presence of blood could be assayed from stool on a PCR-based platform, without modification of the collection device or the FDA-approved nucleic acid buffer. Therefore, we sought to evaluate the feasibility of using miRNAs as novel markers for fecal occult blood.

Materials and Methods

This prospective study was divided into sequential phases: marker discovery, technical validation, assessment of cellular origin of candidate markers in blood, appraisal of marker specificity across species, determination of dynamic assay range in spike-in experiments, description of marker stability in stool, and blinded exploratory stool studies of CRC detection. The study was approved by the Mayo Clinic Institutional Review Board.

Marker Discovery by Small RNA Sequencing

We performed small RNA sequencing on human whole blood (n = 25) and colorectal epithelia, both normal (n = 20) and neoplastic (n = 75). Detailed clinical and demographic data are provided in Supplemental Table 1. Age and gender were balanced between groups. Paraffin-embedded colorectal epithelial tissues were selected from cancer registries at Mayo Clinic, Rochester, Minnesota, and were reviewed by an expert gastrointestinal pathologist to confirm correct histologic classification and to mark for directed sampling using 1.5-mm tissue block punches. Total RNA was extracted using RecoverAll Total Nucleic Acid Isolation Kit (Thermo Fisher Scientific, Massachusetts, USA). Five hundred nanograms of RNA per sample was used to prepare small RNA libraries using NEBNext Small RNA Library Prep Set following manufacturer’s guide (New England Biolabs, Massachusetts, USA). Up to 24 randomized libraries were indexed and pooled in one lane of a flow cell. Sequencing was performed at 50 bp single-read mode on an Illumina HiSeq 2000 sequencing system (Illumina, California, USA).

Annotation and quantification of the miRNA were performed by miRDeep2 [18]. Reads of each miRNA were scaled to global miRNA reads and normalized across samples using a localized smoothing algorithm. miRNAs with average normalized read counts >5 in either blood or epithelia group were further analyzed. The area under the ROC curve (AUC) was estimated using normalized reads with tests of significance based on the methods of DeLong, DeLong, and Clarke-Pearson in R [19]. Criteria to identify blood-specific miRNA marker candidates relative to colorectal epithelial tissue included average read counts in blood >50, fold changes of blood over epithelia >20, and AUCs > 0.95.
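The three selection criteria above can be illustrated with a minimal sketch. It assumes normalized read counts are already available per miRNA; the function names and the example counts are hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) estimator rather than with the R routines used in the study.

```python
from statistics import mean

def auc(pos, neg):
    """Rank-based AUC: probability that a blood sample outranks an
    epithelial sample, with ties counted as one half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def is_blood_specific(blood, epithelia, min_reads=50, min_fold=20, min_auc=0.95):
    """Apply the stated criteria to the normalized reads of one miRNA."""
    blood_mean = mean(blood)
    epi_mean = mean(epithelia)
    # Guard against division by zero when a miRNA is undetected in epithelia.
    fold = blood_mean / epi_mean if epi_mean > 0 else float("inf")
    return blood_mean > min_reads and fold > min_fold and auc(blood, epithelia) > min_auc

# Hypothetical normalized read counts, for illustration only.
candidate = is_blood_specific(blood=[900, 1200, 1500], epithelia=[10, 25, 40])
shared = is_blood_specific(blood=[300, 310, 290], epithelia=[280, 320, 305])
print(candidate, shared)  # True False
```

A miRNA must pass all three thresholds, so an abundant miRNA that is also expressed in epithelia (the second example) is rejected on fold change alone.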

Technical Validation of Sequencing Data by RT-qPCR

We adopted different RT-qPCR strategies for miRNA quantification based on sample type. Complete miRNA assay information is provided in Supplemental Table 2. RT-qPCR on whole blood and tissue samples was performed using the SYBR Green-based locked nucleic acid primer miRNA assay (Exiqon, Denmark) following the manufacturer’s guidelines. To validate small RNA sequencing results, all human blood samples (n = 25) and a random subset of the colorectal epithelial samples (n = 20, Supplemental Table 1) were included. PCR reactions were run on a LightCycler 480 System (Roche, Switzerland). Quantification cycle (Cq) values were analyzed using automatic baseline adjustment. miRNA expression was calculated by the comparative Cq method normalized to hsa-miR-30e-3p, which, in small RNA sequencing, was identified as tracking most closely with the global mean of all miRNA reads. Comparative Cq calculation was performed in Microsoft Excel. The Mann–Whitney test, performed in GraphPad Prism 6.0, was used to evaluate significant differences in miRNA expression between blood and tissues. A P value below .05 was regarded as statistically significant.
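The comparative Cq calculation can be sketched as follows. This is a minimal illustration that assumes roughly 100% amplification efficiency (one doubling per cycle); the function names and Cq values are hypothetical, and the study itself performed this step in a spreadsheet.

```python
def relative_expression(cq_target, cq_reference):
    """Expression of a target miRNA relative to the reference miRNA
    (hsa-miR-30e-3p in the text), assuming one doubling per cycle."""
    delta_cq = cq_target - cq_reference
    return 2.0 ** (-delta_cq)

def fold_change(cq_target_a, cq_ref_a, cq_target_b, cq_ref_b):
    """Fold difference of sample A over sample B after normalization."""
    return relative_expression(cq_target_a, cq_ref_a) / relative_expression(cq_target_b, cq_ref_b)

# Hypothetical Cq values: target detected 10 cycles earlier in blood than in
# tissue, with identical reference Cq in both samples.
blood_over_tissue = fold_change(18.0, 24.0, 28.0, 24.0)
print(blood_over_tissue)  # 1024.0
```

A 10-cycle difference at equal reference Cq corresponds to a 2^10 = 1024-fold difference in relative expression.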

Origin of Blood Markers

To determine the blood cell subtype from which candidate markers arose, we investigated their levels in erythrocytes and white blood cells (mononuclear cells and polymorphonuclear cells). Mononuclear cells were isolated using Ficoll reagent (GE Healthcare, New Jersey, USA). Polymorphonuclear cells and erythrocyte lysate were isolated from the bottom fraction of Ficoll-spun whole blood. Erythrocyte lysate was prepared using Red Blood Cell Lysis Buffer (Roche, Switzerland). Protocols were carried out according to the manufacturers’ guidelines.

Assay Specificity

Dietary animal blood is known to interfere with guaiac-based FOBTs and, to a lesser extent, FIT [3]. Many miRNA sequences are likely conserved across species, but their abundance may vary in species-specific and organ-specific manners. Thus, we examined the abundance of candidate miRNAs in whole blood obtained from dietary animals including pig, cow, sheep, rabbit, chicken, and turkey (purchased from HemoStat Laboratories, California, USA). Levels of miRNA markers were normalized to input RNA quantity. We also investigated baseline levels of erythrocyte miRNA markers in stool samples from five healthy individuals on unrestricted diets.

Extraction and Detection of miRNAs in Stool Samples

One hundred microliters of stool homogenate was mixed with 700 µL QIAzol containing an external control (cel-miR-39-3p). Total RNA was extracted from stool using the miRNeasy kit following the manufacturer’s guidelines (Qiagen, Germany). RNA was quantified by RiboGreen assay (Thermo Fisher Scientific). Samples yielding less than 1 µg of extracted total RNA could have insufficient analyte and were not analyzed further. Total RNA was eluted in 50 µL nuclease-free water and subsequently diluted 100-fold. An extraction control (nuclease-free water) was included in each batch of RNA extraction to monitor RNA contamination during extraction.

Since stool contains abundant PCR inhibitors and confounding nucleic acid content from bacteria, we adopted the more sensitive and specific hydrolysis probe-based miRNA assay (Thermo Fisher Scientific) to detect miRNAs in stool. Assay efficiency and linear range for each miRNA target are described in Supplemental Figure 1. Each RT reaction (total volume 6 µL) contained 3 µL of the diluted RNA, 0.6 µL TaqMan miRNA RT primer, 6 nmol dNTP (with dTTP), 20 units reverse transcriptase, 1.5 units RNase inhibitor, and 0.6 µL RT buffer. The RT thermal cycling conditions were 16 °C for 30 min, 42 °C for 30 min, 85 °C for 5 min, and hold at 4 °C. The PCR reaction mix contained 10 µL TaqMan Universal PCR Master Mix with no AmpErase UNG, 0.5 µL miRNA TaqMan assay, 2 µL RT product, and 7.5 µL nuclease-free water. PCR reactions were run on a LightCycler 480 System using automatic baseline adjustment for Cq analysis. The PCR profile was 95 °C for 10 min, followed by 50 cycles of 95 °C for 15 s and 60 °C for 1 min. Data collection was carried out at the 60 °C step.
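The volumes stated above imply a minimum RNA input per RT reaction, which can be worked through in a short sketch. It assumes that the 1 µg acceptance threshold refers to the total RNA recovered in the 50 µL eluate; the constant and function names are illustrative only.

```python
# Values taken from the text.
MIN_TOTAL_RNA_NG = 1000.0   # 1 µg acceptance threshold
ELUTION_UL = 50.0           # elution volume
DILUTION_FACTOR = 100.0     # 100-fold dilution before RT
RT_INPUT_UL = 3.0           # diluted RNA per RT reaction
RT_TOTAL_UL = 6.0           # total RT reaction volume
PCR_COMPONENTS_UL = {
    "TaqMan Universal PCR Master Mix (no UNG)": 10.0,
    "miRNA TaqMan assay": 0.5,
    "RT product": 2.0,
    "nuclease-free water": 7.5,
}

def rna_per_rt_ng(total_rna_ng):
    """Nanograms of total RNA entering one RT reaction.
    Assumes total_rna_ng is the amount present in the eluate."""
    eluate_ng_per_ul = total_rna_ng / ELUTION_UL
    diluted_ng_per_ul = eluate_ng_per_ul / DILUTION_FACTOR
    return diluted_ng_per_ul * RT_INPUT_UL

min_input_ng = rna_per_rt_ng(MIN_TOTAL_RNA_NG)
pcr_total_ul = sum(PCR_COMPONENTS_UL.values())
rt_fraction_used = PCR_COMPONENTS_UL["RT product"] / RT_TOTAL_UL
print(round(min_input_ng, 3), pcr_total_ul, round(rt_fraction_used, 3))
```

Under that assumption, a sample at the 1 µg threshold contributes about 0.6 ng of total RNA to each RT reaction, one third of each RT product is carried into a 20 µL PCR reaction.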

A calibration curve and a no-template control were included in every RT-qPCR setup in 96-well plate format. Copy number was quantified by referencing Cq to the calibration curves. Reactions with no amplification were assigned a Cq value of 50. The Cq of the external control cel-miR-39-3p was used as quality control for RNA extraction and RT-qPCR but not for normalization [20]. Samples with cel-miR-39-3p recovery >20% were considered valid for analysis. We included an epithelial marker, hsa-miR-200b-3p, which is independent of fecal blood concentration, as a normalizer for erythrocyte markers in a generalized additive model in R, to correct for variations in endogenous RNA content and RT-qPCR efficiency (also refer to “Results”).
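Quantification against a calibration curve can be sketched as below. This is a minimal illustration, assuming the curve is a straight line of Cq against log10 copy number fitted by least squares; the calibration points, function names, and the way recovery is computed are hypothetical, while the Cq of 50 for no amplification and the 20% recovery threshold come from the text.

```python
NO_AMPLIFICATION_CQ = 50.0  # Cq assigned when no amplification is observed
MIN_RECOVERY = 0.20         # external control recovery threshold

def fit_calibration(log10_copies, cq_values):
    """Least-squares line: Cq = slope * log10(copies) + intercept."""
    n = len(cq_values)
    mx = sum(log10_copies) / n
    my = sum(cq_values) / n
    sxx = sum((x - mx) ** 2 for x in log10_copies)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_copies, cq_values))
    slope = sxy / sxx
    return slope, my - slope * mx

def copies_from_cq(cq, slope, intercept):
    """Invert the calibration line; None means no amplification."""
    if cq is None:
        cq = NO_AMPLIFICATION_CQ
    return 10.0 ** ((cq - intercept) / slope)

def sample_is_valid(control_recovered, control_added):
    """Accept a sample only if external control recovery exceeds 20%."""
    return control_recovered / control_added > MIN_RECOVERY

# Hypothetical 10-fold calibration series (log10 copies vs. Cq).
slope, intercept = fit_calibration([2, 3, 4, 5, 6], [35.0, 31.5, 28.0, 24.5, 21.0])
print(slope, intercept)                       # -3.5 42.0
print(copies_from_cq(28.0, slope, intercept))  # close to 10000 copies
```

Assigning Cq 50 to reactions without amplification maps them to a very small, nonzero copy number on this curve, which keeps such samples usable on a logarithmic scale.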

Assay Correlations with Spiked Blood

We investigated assay linearity of miRNA markers and of FIT (Eiken Chemical Co., Japan) over two ranges of blood concentrations: one across a broad concentration range spanning 6 orders of magnitude in saline, and the other across a clinical range in stool (a pooled stool sample from five healthy individuals on unrestricted diets, spiked with whole blood at concentrations of 2.5, 5, 10, 25, and 50 μL/g of stool to represent a clinically relevant fecal occult blood range). A parametric Pearson correlation coefficient estimated in GraphPad Prism 6.0 was used to evaluate linearity on the logarithmic scale.
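Linearity on the logarithmic scale can be illustrated with a short sketch that log-transforms both axes before computing the Pearson coefficient. The spike-in concentrations are those listed above; the signal values and function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def log_log_r_squared(concentrations, signals):
    """R^2 of log10(signal) against log10(concentration)."""
    lx = [math.log10(c) for c in concentrations]
    ly = [math.log10(s) for s in signals]
    return pearson_r(lx, ly) ** 2

spike_ul_per_g = [2.5, 5, 10, 25, 50]          # from the text
ideal = [400 * c for c in spike_ul_per_g]      # hypothetical proportional signal
saturating = [900, 1700, 3000, 4500, 5000]     # hypothetical plateau at high input
print(log_log_r_squared(spike_ul_per_g, ideal))
print(log_log_r_squared(spike_ul_per_g, saturating))
```

A signal proportional to concentration gives R² of essentially 1, whereas a signal that plateaus at high input, as hypothesized in the second series, gives a lower value.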

Analyte Stability in Stool

To assess fecal stability of blood miRNA markers, stools from five healthy human volunteers on unrestricted diets were spiked with blood at a concentration of 25 μL/g to simulate typical fecal blood levels in cancer. Storage in PBS and in three different nucleic acid preservative buffers, namely EDTA buffer (Exact Sciences, Madison, USA), RNAlater (Thermo Fisher Scientific), and DNA/RNA Shield (Zymo Research, California, USA), was evaluated over a 14-day incubation period at room temperature.

Clinical Studies

We investigated fecal levels of candidate miRNA markers in two cohorts from the referral setting. The exploratory set (Supplemental Table 3) comprised stool samples obtained from a well-characterized archive maintained at Mayo Clinic, Rochester; these samples had been reported in previous studies by our group [8, 13]. The feasibility set (Supplemental Table 4) comprised independent archival stool samples collected by Exact Sciences. All pathologies were reviewed by expert gastrointestinal pathologists. An effort was made to balance age and gender within each set. We excluded patients with a history of familial adenomatous polyposis or hereditary nonpolyposis CRC.

The exploratory set was studied to filter individual candidate markers in stool for subsequent evaluation. It included archival stools from cases with CRC (n = 20), disease controls with pancreatic cancer (n = 20), and healthy controls (n = 20). Patients in both control groups had a normal colonoscopy. Pancreatic cancers were included to examine cross-reactivity of erythrocyte miRNA markers with an upper GI cancer that also involves neoplastic marker exfoliation into stool. The nonparametric Mann–Whitney test was used to evaluate significant differences in stool miRNA levels between cases and controls. A P value below .05 was regarded as statistically significant. Within this set, FIT data were available on 35 samples, including all colorectal cancer cases (n = 20) and a subset of normal controls (n = 15).

The best-performing miRNA markers were then explored in an independent feasibility set that included 29 CRC cases, 31 advanced adenomas, and 115 controls with normal colonoscopy. Advanced adenoma was defined as an adenoma ≥1 cm, with high-grade dysplasia, or with more than 25% villous features. The endpoint of CRC (yes/no) was fit using a generalized additive model in R with smoothing splines of the predictor variables. ROC curves were used to illustrate panel discrimination. For any two-group comparison, the minimum AUC detectable against the null value of 0.5 is 0.69, assuming group sizes of 30 or more and 80% power with a one-sided test at a significance level of .05.
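Reading sensitivity off a ROC curve at a fixed specificity can be sketched as follows. This illustration takes an arbitrary per-sample score as given and does not reproduce the generalized additive model fitted in R; the function names and scores are hypothetical.

```python
import math

def threshold_at_specificity(control_scores, specificity=0.95):
    """Smallest cutoff such that at least the requested fraction of
    controls score at or below it (and are therefore called negative)."""
    ordered = sorted(control_scores)
    # Small epsilon protects against floating-point overshoot in the product.
    k = math.ceil(specificity * len(ordered) - 1e-9)
    return ordered[k - 1]

def sensitivity_at_specificity(case_scores, control_scores, specificity=0.95):
    """Fraction of cases scoring strictly above the control-derived cutoff."""
    cutoff = threshold_at_specificity(control_scores, specificity)
    positives = sum(1 for s in case_scores if s > cutoff)
    return positives / len(case_scores)

# Hypothetical model scores: 20 controls and 10 cases.
controls = list(range(1, 21))
cases = [10, 15, 19, 22, 25, 30, 35, 40, 45, 50]
print(threshold_at_specificity(controls))           # 19
print(sensitivity_at_specificity(cases, controls))  # 0.7
```

The cutoff is set from the controls alone, so only one of the 20 hypothetical controls tests positive (95% specificity) and 7 of the 10 hypothetical cases exceed it.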

In both sets, stool collection, processing, and storage were carried out as previously described by our team [8,9,10,11]. Briefly, stools were collected before bowel purgation and colonoscopy, or 1 week after colonoscopy but before neoplasm resection or adjuvant therapy. Whole stools were collected in buckets, and EDTA buffer for nucleic acid preservation was added by patients immediately after defecation. Buffered stools were shipped to the laboratory, then homogenized and archived at −80 °C within 1–4 days after defecation for long-term storage. To minimize variation in sample quality due to potential freeze–thaw artifact, samples had undergone no more than one previous freeze–thaw cycle. RT-qPCR on each clinical stool sample was performed once, by an investigator blinded to the samples’ clinical information.

Results

Marker Discovery and Characterization

Based on all 120 small RNA libraries, 502 miRNAs had average normalized reads above 5 in either blood or colorectal epithelia, and 145 (29%) had higher levels in blood than in colorectal epithelia (Supplemental Data 1). Nineteen miRNAs had average read counts above 50 in blood and demonstrated perfect discrimination between blood and epithelia with AUCs of 1.0 at P values <.0001 (Table 1). Among these 19 marker candidates, six exhibited a >20-fold change in blood compared to epithelia: hsa-miR-144-3p, 451a, 144-5p, 20b-5p, 486-5p, and 363-3p. All six markers demonstrated no significant difference in expression across colorectal epithelial subgroups at the tissue level. These top six markers each retained high discrimination in validation by RT-qPCR, and their expression in blood exceeded that in tissue by 119- to 3988-fold. All six markers were found to have arisen from erythrocytes with no cross-expression in white blood cell types or in epithelia (Fig. 1a).

Table 1 Most differentially expressed miRNAs: blood versus colorectal epithelia
Fig. 1 Association of blood-enriched miRNAs with erythrocytes. a Expression of miRNAs in blood and epithelial cells. Correlations of erythrocyte-specific miRNA and FIT evaluated in b dilution series of blood in saline across 6 orders of magnitude, and c fecal blood range seen in practice

Several miRNA markers were identified as highly discriminant for colorectal epithelium and absent in whole blood. Top epithelial markers achieved AUC of 1.0 with >1000-fold differential expression over blood. We selected hsa-miR-200b-3p as a candidate epithelial marker to normalize erythrocyte marker levels in stool experiments (Fig. 1a).

Assay Correlation with Spiked Blood

Dilutions in saline. Based on complete blood count, serial dilutions yielded calculated concentrations ranging from 4 to 3.6 × 10⁵ erythrocytes per µL solution. Overall, all six miRNAs were linearly associated with erythrocyte concentration (R² = 0.984–1.000, P < .0001 for each; Fig. 1b). In contrast, FIT was less linear than the miRNA markers (R² = 0.899, P = .0039), primarily due to loss of linearity at the extreme upper and lower erythrocyte concentrations.

Dilutions in stool. Calculated levels of the spike-in series approximating the clinical fecal occult blood range were from 1800 to 36,000 erythrocytes per µL stool homogenate (Fig. 1c). FIT and hsa-miR-363-3p, 451a, and 144-5p correlated highly with erythrocyte concentration (R² = 0.99–0.999, P < .0001 for each). hsa-miR-144-3p, 486-5p, and 20b-5p correlated with erythrocyte concentration at R² = 0.757 (P = .0242), 0.952 (P = .0009), and 0.973 (P = .0003), respectively.

Assay Specificity

Erythrocyte-specific hsa-miR-144-3p, 144-5p, 20b-5p, and 363-3p showed very low expression in cow, pig, sheep, rabbit, turkey, and chicken whole blood relative to human blood (Supplemental Table 5). Cross-reactivity was observed with hsa-miR-451a in blood from pig and with hsa-miR-486-5p in blood from sheep. With the exception of hsa-miR-144-3p, the erythrocyte miRNA markers showed negligible baseline levels in stools from healthy volunteers on unrestricted diets (Fig. 2).

Fig. 2 Marker stability in stool. Recovery of a hsa-miR-144-3p, b hsa-miR-144-5p, c hsa-miR-451a, d hsa-miR-486-5p, e hsa-miR-363-3p, and f hsa-miR-20b-5p was evaluated across 14-day incubation at room temperature in stools homogenized in saline (PBS) or three types of preservative buffer. miRNA level was expressed as ratio relative to day 0. Dotted line indicates baseline miRNA levels of a pooled sample without spiking

Analyte Stability in Stool With and Without Preservative Buffers

At a spike-in concentration of 25 μL blood per gram of stool, erythrocyte miRNA markers were least stable in the absence of preservative and most stable in DNA/RNA Shield. Degradation was reduced to varying degrees by the different preservative buffers (Fig. 2). None of the markers returned to baseline, even in stools homogenized in non-preservative saline solution and incubated for 14 days at room temperature.

Clinical Studies

In the exploratory set, fecal levels of hsa-miR-144-5p and 451a were significantly higher in CRC cases (n = 20) than normal controls (n = 40, Fig. 3b, c), and these two markers were carried forward for further exploration. Marker levels in stools from pancreatic cancer patients, included to assess potential confounding by upper GI pathology, were similar to those in normal controls. In a subset of cases and controls for which both FIT and miRNA results were available, CRC detection rates appeared similar (Supplemental Table 6). Top markers were then evaluated in an independent feasibility set, using a generalized additive model based on hsa-miR-451a and 144-5p. The epithelial marker hsa-miR-200b-3p, which was also included in the model, had a negative coefficient, indicating that it functioned as a normalizer (Fig. 4). All three markers were independently significant contributors to the model. For CRC (n = 29) detection, the panel yielded an AUC of 0.89 (95% CI 0.82–0.95, P < .0001, Fig. 4), and at 95% specificity, the panel demonstrated a sensitivity of 66% for all cancers. Sensitivity was higher for distal cancers (92%) than for proximal cancers (48%), P = .0084 (Table 2). No significant difference in sensitivity was found between early T stage (T1 and T2) and late T stage (T3 and T4) cancers. Detection of advanced adenoma (n = 31) was poor (AUC = 0.58, 95% CI 0.45–0.69, P = .244), consistent with the low sensitivities seen with conventional FOBTs. Despite this, the miRNA panel was more sensitive for advanced adenomas in the distal colon than in the proximal colon (P = .0432, Table 2). It should be noted that this modeling was not applied to the exploratory set, to avoid over-fitting in a small sample. In both sample sets, all stool samples met the criteria of more than 1 µg of extracted total RNA and more than 20% external control recovery.

Fig. 3 Stool levels of individual erythrocyte-specific miRNAs in exploratory set. Stool levels of a hsa-miR-144-3p, b hsa-miR-144-5p, c hsa-miR-451a, d hsa-miR-486-5p, e hsa-miR-363-3p, and f hsa-miR-20b-5p. This exploratory set includes healthy individuals (normal, n = 20), patients with pancreatic cancer (PanC, n = 20), and patients with colorectal cancer (CRC, n = 20)

Fig. 4 Colorectal cancer detection by stool assay of an erythrocyte miRNA panel in feasibility set. a ROC data plotted for an erythrocyte miRNA panel in this feasibility study (29 CRC, 115 controls), with and without normalization with the epithelial marker. b Cancer detection rates by site and stage

Table 2 Erythrocyte miRNA marker panel sensitivities and specificities

Discussion

We have demonstrated a novel approach to fecal occult blood testing based on erythrocyte-specific miRNAs. Considering its features, this new method may yield certain performance advantages over conventional tests (Table 3).

Table 3 Characteristics of guaiac, immunochemical, and miRNA-based FOBTs

Although erythrocytes are depleted of DNA and large RNAs, their high abundance of miRNAs and the ability to quantify these by RT-qPCR make miRNAs a rational and appealing candidate class of markers for occult blood. Others have reported erythrocyte-enriched miRNAs and their potential use as a hemolysis indicator in plasma samples [21]. However, many such miRNAs also appear to be abundant in epithelia (Supplemental Data 1) [22], and exfoliated epithelial cells likely contribute the majority of miRNA content found in stool. Conversely, because of this overlapping expression, previously reported elevations of epithelial miRNA markers in stool, such as hsa-miR-92a-3p and 21-5p, could also have stemmed from occult tumor bleeding [23]. To avoid markers of confounding origin, we specifically selected markers that were abundant in blood and had absent or low expression in colorectal epithelia. Because of their low background in normal stool, these markers could reflect the presence of trace amounts of blood.

As with both the heme and globin moieties of hemoglobin, which are, respectively, targeted by guaiac and FIT tests [3], we found that miRNA is progressively degraded in stool over time in our incubation study. The degree of such degradation varied by individual miRNA and could be reduced using preservative buffers. Unlike hemoglobin which is measured as a concentration in the buffer by FIT and not adjusted for variable degrees of stool hydration, miRNA marker levels in stool can be directly quantified when normalized against an internal control to correct for variations in input content and assay efficiency. Inclusion of the epithelial marker identified in this study (hsa-miR-200b-3p) as a normalizer in the miRNA panel significantly improved the AUC in discriminating CRC cases.

The functions of erythrocyte-enriched miRNAs are not clearly understood. However, it is known that both hsa-miR-144-3p/5p and hsa-miR-451a are members of the hsa-miR-144/451 cluster located on chromosome 17. Transcription of this locus is regulated by the hematopoietic transcription factor GATA-1 [24, 25]. Mature hsa-miR-144-3p (the major product of precursor hsa-miR-144) and hsa-miR-451a regulate erythropoiesis. hsa-miR-144-5p, being the minor product, is not known to play important roles in erythrocytes.

In our small exploratory and feasibility studies using archival stool samples stored in EDTA buffer, erythrocyte miRNA markers yielded moderately high sensitivity and specificity for CRC detection, at levels comparable to FIT. The panel was more sensitive to distal lesions than proximal lesions, potentially due to the lesser extent of bleeding with proximal lesions and degradation of analyte during luminal transit, a phenomenon that is also seen with stool assays for hemoglobin [9]. Sensitivity for advanced adenomas (conventional adenomas and SSAs combined) was much lower than for CRC in this study. While this new miRNA approach remains to be optimized and formal comparisons with established FOBTs are premature, observed detection rates for these advanced precancers were lower than those reported for FIT with advanced conventional adenomas [9, 11, 26] and similar to those reported with SSAs [9, 11, 13]. The predominantly proximal location of advanced adenomas in our study might also have contributed to the low marker sensitivity. While these marker levels were negligible in patients with pancreatic cancer, definitive studies need to be performed to address detection of upper gastrointestinal bleeding. It is important to note that CRC cases in both cohorts had been recruited from symptomatic patients in the referral setting, and this early assessment of miRNA marker discrimination in stool may not be representative of performance in a screening population.

The FDA-approved multi-target stool DNA test combines FIT and a DNA marker panel [9, 11]. Unlike the FIT component of this test which requires a separate stool sampling by the patient because the protein immunoassay is incompatible with the EDTA buffer required to preserve DNA, we have shown in the present study that erythrocyte miRNA markers can be assayed directly from the EDTA-buffered sample. Accordingly, use of miRNA markers in place of FIT may allow testing of all analytes from one buffered stool specimen. This would obviate the need for separate stool sampling, simplify the patient experience, and reduce material costs of the stool collection kit and assay process. Further research and development efforts are clearly needed to prove that miRNA markers are non-inferior to FIT for this application.

In conclusion, we have shown that fecal occult blood levels can be accurately quantified across a broad concentration range using a novel approach based on assay of erythrocyte-specific miRNA markers. Candidate markers are generally human-specific and are stable in stool with use of preservation buffers. Pilot data suggest that a miRNA marker panel can detect CRC at a rate similar to that by FIT. Our discovery data and observations from this early exploration in clinical stool samples justify further technical refinement and, thereafter, robust clinical studies to train and test this approach under optimized sample collection and assay methods.