Background

Colorectal cancer (CRC) is currently the third most common malignancy and the second leading cause of cancer-related death in the world (Siegel et al. 2016). Most patients are diagnosed at an advanced stage, when tumor invasion and early systemic dissemination have already occurred and the optimum period for curative resection has been missed (Figueredo et al. 2008). The dismal outcome of this disease has drawn attention to the critical importance of early detection. The wide use of fecal occult blood testing (FOBT) and colonoscopy in the past decades has moderately increased the detection rate of early-stage tumors. FOBT is a cost-effective and safe test, but it has relatively poor sensitivity and is prone to false results because colorectal tumors bleed intermittently (Health Quality Ontario 2009; Duffy et al. 2011). In contrast, colonoscopy offers high diagnostic accuracy, but its invasive nature and high cost render it impractical for large-scale screening (Kim et al. 2007; Mandel 2008). Stool DNA tests targeting molecular biomarkers have been commercially available for twelve years, but their high cost and poor sensitivity have limited their clinical application (Bailey et al. 2016; Itzkowitz et al. 2007). Carcinoembryonic antigen (CEA) and other traditional CRC biomarkers, such as CA242, are commonly used in clinical practice. However, their sensitivity is also far from satisfactory (Levin et al. 2008; Zhong et al. 2015). Thus, there is an urgent need to develop a new noninvasive, sensitive and cost-effective method to complement and improve the current CRC screening strategies.

Long noncoding RNAs (lncRNAs) are a class of noncoding RNAs that are longer than 200 nucleotides and are not translated into proteins (van Werven et al. 2012; Ponting et al. 2009). These lncRNAs have recently attracted increasing research interest because of their important role in the regulation of multiple biological processes, including tumor initiation and progression. Aberrant lncRNA expression has been detected in tissue sections from breast cancer, gastric cancer, esophageal squamous cell cancer (ESCC), non-small cell lung cancer (NSCLC) and colorectal cancer (Yang et al. 2016; Nie et al. 2016; Xu et al. 2016; Feng et al. 2015; Sun et al. 2016a). Recently, lncRNAs were found to be present in the bloodstream in a stable state, where they may reflect the physiological and pathological alterations of patients with cancer. This finding has stimulated great interest in the possibility of using circulating lncRNAs as surrogate, minimally invasive biomarkers. Despite the growing body of literature characterizing circulating lncRNAs as potential biomarkers, there are still no reports regarding the potential role of circulating lncRNAs in the diagnosis of CRC (Wang et al. 2015; Tong et al. 2015; Zhou et al. 2015).

Here, we systematically investigated the expression of specific lncRNAs using a three-phase study. In the initial screening phase, microarray analysis was combined with previous studies to mine potential candidate lncRNAs (Ge et al. 2013; Li et al. 2015; Ni et al. 2015; Qi et al. 2013; Sun et al. 2016b; Tong et al. 2014; Wang et al. 2016b; Yao et al. 2016; Yin et al. 2014, 2015). In the training phase, we performed clinical validation of lncRNA expression status using both tissue and serum samples to identify lncRNAs dysregulated with a consistent pattern in these clinical materials, and constructed a diagnostic panel based on the result. In the validation phase, the identified lncRNAs were further verified and the diagnostic performance of the panel was validated. In addition, we assessed the correlation between the expression level of the identified lncRNAs and the disease-specific survival rate of CRC patients, to explore their potential for prognostic prediction.

Methods

Ethics statement

This study was conducted under the supervision of the Clinical Research Ethics Committee of Qilu Hospital, Shandong University. Written informed consent was obtained from each participant prior to tissue and blood sample collection, and all experiments were performed in accordance with the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments.

Study design

To identify potential serum lncRNA biomarkers for colorectal cancer, a step-by-step discovery procedure was designed with three phases: initial screening phase, training phase and validation phase. The lncRNA candidates in each phase were determined based on the profiling results of the prior phase. In the initial screening phase, primary CRC tissues along with corresponding adjacent non-tumor (NM) tissues from six different CRC patients were subjected to microarray analysis to identify significantly differentially expressed lncRNAs. Candidate lncRNAs were then selected according to the result of the analysis and previous studies (Ge et al. 2013; Li et al. 2015; Ni et al. 2015; Qi et al. 2013; Sun et al. 2016b; Tong et al. 2014; Wang et al. 2016b; Yao et al. 2016; Yin et al. 2014, 2015). In the training phase, lncRNA candidates were first verified by RT-qPCR in 80 pairs of tissue specimens, comprising primary CRC tissues and matched NM tissues. Subsequently, the verified lncRNAs were further examined in an independent cohort of serum samples obtained from 120 CRC patients and 120 controls. These serum data were used to construct the diagnostic panel, based on a logistic regression model for differential diagnosis between the CRC and the control group. In the validation phase, the diagnostic lncRNA panel constructed in the training phase was applied to another independent cohort of serum samples from 240 participants (120 CRC patients and 120 controls) to validate its diagnostic performance. In addition, the correlation between the expression of the identified lncRNAs and the disease-specific survival rate of CRC patients was assessed to explore their potential as predictors of CRC prognosis.

Patients and control subjects

All participants were recruited from Qilu Hospital of Shandong University between 2007 and 2015. Demographic and clinical features of CRC patients and controls are described in Table S1. Tissue specimens were obtained from patients who underwent primary tumor resection at the Department of General Surgery. Serum samples were collected before any anticancer treatment, such as surgery, chemotherapy or radiotherapy, was given. All CRC diagnoses were confirmed by histopathology or biopsy analysis. Tumor stage was defined according to the tumor-node-metastasis (TNM) staging system of the Union for International Cancer Control (UICC).

CRC patients in the validation phase were followed up from the time the diagnosis was confirmed, at 3-month intervals in the first 2 years and at 6-month intervals thereafter, up to the fifth year. Disease-specific survival was defined as the interval from the date of diagnosis to CRC-related death. The date of the latest record retrieved was June 20, 2015. Owing to incomplete follow-up, 13 cases were excluded from the cohort, and the median follow-up time was 62 months (range, 11–74 months).

Sample collection and preparation

Fresh tumor tissue and paired adjacent normal tissue sections were cut from the resected colorectal tissue immediately after resection and kept at −80 °C until RNA extraction. Venous blood was collected and, within 2 h, centrifuged at 4000 rpm for 10 min. The supernatant was then collected and further centrifuged at 12,000 rpm for 15 min to completely remove cell debris. The whole process was strictly controlled to avoid hemolysis, and the supernatant serum was stored at −80 °C until analysis. The CEA level of each sample was determined using a Roche cobas e601 analyzer (Roche, Switzerland).

LncRNA microarray analysis

Total RNA extracted from six pairs of fresh primary colorectal cancer tissues and their adjacent normal tissues was subjected to human genome-wide lncRNA microarray analysis (Arraystar Human LncRNA Microarray V2.0; Agilent Technology, Santa Clara, CA). The whole process and subsequent data analysis were performed by Kangchen Bio-tech, Shanghai, P.R. China.

RNA isolation and RT-qPCR analysis

Total RNA was extracted from frozen tissues and serum samples using TRIzol and TRIzol LS reagents (Invitrogen, Carlsbad, CA), respectively, according to the manufacturer’s protocol. The quantity and quality of RNA were measured with a NanoDrop spectrophotometer. The reverse transcription (RT) reactions were performed using a Prime Script™ RT Reagent Kit (Takara, Dalian, Liaoning). After mixing 1 μg of template RNA with 4 μL of 5× Prime Script Buffer Mix, 1 μL of Prime Script RT Enzyme Mix I, 1 μL of Oligo dT Primer and RNase-free dH2O in a final volume of 20 μL, the reactions were incubated at 37 °C for 30 min, followed by 85 °C for 5 s and 4 °C for 60 min. For real-time PCR, 2 μL of diluted cDNA was mixed with 12.5 μL of SYBR Premix Ex Taq™, 0.5 μL of Dye II, 1 μL of forward and reverse primers (10 µM) and 9 μL of nuclease-free water in a final volume of 25 μL, according to the manufacturer’s instructions (Takara, Dalian, Liaoning). The reactions were incubated at 95 °C for 30 s, followed by 45 cycles of 95 °C for 5 s and 60 °C for 34 s. Melting curve analysis was performed to evaluate the specificity of the RT-qPCR products. All reactions were run on a CFX96™ real-time system (Bio-Rad, CA, USA). Each RT-qPCR experiment was repeated three times. Relative expression of genes was calculated using the comparative cycle threshold (Ct) method (2^−ΔΔCt) with glyceraldehyde-3-phosphate dehydrogenase (GAPDH) as the endogenous control. The average of the Ct values of all normal samples was used as the control (calibrator) sample in this method.
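The comparative Ct calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes ΔCt = Ct(target) − Ct(GAPDH) and takes the calibrator to be the mean ΔCt of the normal samples, which is one reading of "the average of Ct values of all normal samples". The function names and the Ct values in the usage note are hypothetical.

```python
def delta_ct(ct_target, ct_gapdh):
    """ΔCt: target Ct normalized to the endogenous control (GAPDH)."""
    return ct_target - ct_gapdh


def mean_control_delta_ct(control_pairs):
    """Mean ΔCt over normal samples, used here as the calibrator (assumption).

    control_pairs: list of (ct_target, ct_gapdh) tuples for normal samples.
    """
    values = [delta_ct(t, g) for t, g in control_pairs]
    return sum(values) / len(values)


def relative_expression(ct_target, ct_gapdh, calibrator_delta_ct):
    """Relative expression by the 2^-ΔΔCt method for a single sample."""
    ddct = delta_ct(ct_target, ct_gapdh) - calibrator_delta_ct
    return 2.0 ** (-ddct)
```

For example, with hypothetical normal samples `[(25.0, 18.0), (26.0, 19.0)]` the calibrator ΔCt is 7.0, and a patient sample with Ct values 26.0 (target) and 18.0 (GAPDH) has ΔΔCt = 1, that is, a relative expression of 0.5.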

Statistical analysis

LncRNA levels in the different groups of subjects were normalized to the endogenous control, GAPDH. The distribution of the lncRNA relative expression (based on the 2^−ΔΔCt method) in each group was assessed by the Kolmogorov–Smirnov test. Data are presented as median (interquartile range). The nonparametric Mann–Whitney U test was used to compare lncRNA expression between the CRC and the control group. Receiver-operating characteristic (ROC) curves and the area under the ROC curve (AUC) were employed to evaluate the performance of the selected lncRNA panel in discriminating CRC patients from controls. The Kaplan–Meier method was used to estimate survival curves, and the log-rank test was used to compare them. The Cox proportional hazards regression model was used to identify independent prognostic factors. ROC analysis was performed with MedCalc 15.2.2 (MedCalc, Mariakerke, Belgium). MATLAB (R2013a) was used for the logistic regression analysis that established the lncRNA panel. Other analyses were performed using SPSS version 19.0 (SPSS, Chicago, IL). A p value of <0.05 was considered statistically significant.
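The Mann–Whitney U statistic and the AUC used above are closely related: the AUC equals the proportion of (patient, control) pairs in which the patient has the higher score, with ties counted as one half. The sketch below illustrates that relationship in plain Python. It is not the MedCalc or SPSS procedure used in the study, and the function name and example scores are hypothetical.

```python
def auc_from_scores(case_scores, control_scores):
    """AUC as the Mann-Whitney pair-counting statistic, U / (n_cases * n_controls).

    Counts the pairs in which the case score exceeds the control score;
    tied pairs contribute 0.5.
    """
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

For a marker that is lower in patients than in controls, the same function returns a value below 0.5; swapping the two arguments (or taking 1 − AUC when there are no asymmetries in how ties are handled, which holds here) gives the AUC for the reversed direction.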

Results

Identification of candidate lncRNAs by microarray analysis

The microarray analysis of six pairs of primary colorectal cancer and adjacent normal tissues was conducted with a microarray targeting 33,000 lncRNAs. In total, 5873 lncRNAs showed significant differential expression (fold change ≥2.0 and p < 0.05). To identify potential biomarkers, we concentrated on the lncRNAs with fold change >10, comprising 301 lncRNAs upregulated in the tumor tissues and 122 downregulated. Starting from the lncRNAs with the greatest fold changes, we filtered candidate lncRNAs in descending order. Candidates had to be amenable to primer design, and only those with stable expression in tissue and serum samples were selected. Finally, we chose five candidate lncRNAs from the upregulated group and five from the downregulated group (Table 1). Another eight lncRNAs were also tested by RT-qPCR because they had been shown to be dysregulated in malignant cancer tissues (Ge et al. 2013; Li et al. 2015; Ni et al. 2015; Qi et al. 2013; Sun et al. 2016b; Tong et al. 2014; Wang et al. 2016b; Yao et al. 2016; Yin et al. 2014, 2015). Eighteen candidate lncRNAs were selected in total.

Table 1 Candidate lncRNAs selected on the basis of the microarray analysis

Analysis of selected lncRNA expression by RT-qPCR

The expression of all 18 candidate lncRNAs was evaluated by RT-qPCR using 80 pairs of CRC and control tissue samples. Among these, eight lncRNAs (MALAT-1, NR_029373, NR_046321, NR_034119, GAS5, BANCR, NR_026817 and PCAT-1) were found to be significantly dysregulated in CRC tumor tissues compared with controls (Table 2). Subsequently, these eight lncRNAs were further analyzed in an independent cohort of 240 serum samples from 120 CRC patients and 120 controls. One lncRNA that was upregulated (BANCR) and three lncRNAs that were downregulated (NR_026817, NR_029373 and NR_034119) in CRC patients compared with controls were finally identified (Fig. 1a–d; Table S2). The corresponding AUCs of the four lncRNAs (BANCR, NR_026817, NR_029373 and NR_034119) were 0.638, 0.708, 0.812 and 0.724, respectively (Fig. 2a–d).

Table 2 Expression of 18 candidate lncRNAs in CRC and corresponding adjacent non-tumor tissues [median (interquartile range)]

Fig. 1 Concentrations of four identified serum lncRNAs in patients with CRC (n = 120) and control individuals (n = 120) using RT-qPCR assay in training set (a–d), *p < 0.001

Fig. 2 ROC curve analysis for the detection of CRC using BANCR (a), NR_026817 (b), NR_029373 (c) and NR_034119 (d) in patients with CRC (n = 120) and control individuals (n = 120) in training set

Identification of predictive lncRNA panel

Retrospectively, a stepwise logistic model was constructed for CRC diagnosis using the 240 serum samples enrolled in the training phase. The predicted probability of being diagnosed with CRC from the model based on the 4-lncRNA panel was calculated using the following equation: Logit(P) = −1.0468 − 0.5716 × BANCR + 0.6212 × NR_026817 + 0.8296 × NR_029373 + 0.3967 × NR_034119. ROC analysis was used to evaluate the diagnostic performance of the established lncRNA panel. The AUC for the 4-lncRNA panel was 0.891 (95 % confidence interval [CI] = 0.844–0.927, sensitivity = 81.67 % and specificity = 80.00 %, Fig. 3a).
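As an illustration of how the panel equation can be evaluated for a single serum sample, the sketch below computes Logit(P) from the reported intercept and coefficients and converts it to a probability with the logistic function. The coefficients are copied from the equation above; everything else is an assumption. In particular, the text does not state the scale of the four inputs (for example relative expression versus ΔCt) or the probability cutoff used for classification, so the function names and input values here are hypothetical.

```python
import math

# Intercept and coefficients as reported in the panel equation.
INTERCEPT = -1.0468
COEFFICIENTS = {
    "BANCR": -0.5716,
    "NR_026817": 0.6212,
    "NR_029373": 0.8296,
    "NR_034119": 0.3967,
}


def panel_logit(levels):
    """Logit(P) for one sample; `levels` maps lncRNA name -> measured value."""
    return INTERCEPT + sum(COEFFICIENTS[name] * levels[name] for name in COEFFICIENTS)


def panel_probability(levels):
    """Predicted probability P obtained by inverting the logit."""
    return 1.0 / (1.0 + math.exp(-panel_logit(levels)))
```

With all four inputs set to 1.0, for instance, the logit is −1.0468 − 0.5716 + 0.6212 + 0.8296 + 0.3967 = 0.2291, which corresponds to a predicted probability slightly above 0.5.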

Fig. 3 (a, b) ROC curves for the detection of CRC using 4-lncRNA panel in training set (a) and validation set (b); (c) ROC curve analysis using CEA for the detection of CRC in validation set; (d–f) ROC curves using the 4-lncRNA panel for the detection of patients with TNM stage I (d), II (e) and III (f) in validation set; (g–i) ROC curve analysis using CEA for the detection of CRC stage I (g), II (h) and III (i) in validation set

Validation of the lncRNA panel

The four lncRNAs identified in the training phase were further measured in a validation data set (another independent cohort of 120 CRC patients and 120 controls). No significant differences were observed in the distribution of age, gender and tumor characteristics of the CRC and control groups between the training and the validation sets (Table S1). The alteration in the lncRNA expression pattern in the validation cohort was consistent with that in the training phase serum samples (Table S2; Fig. S1). The corresponding AUCs of the four lncRNAs (BANCR, NR_026817, NR_029373 and NR_034119) were 0.659, 0.752, 0.806 and 0.706, respectively (Fig. S2).

The diagnostic performance of the identified 4-lncRNA panel was further assessed. The predicted probability was calculated based on the parameters obtained from the training set and used to construct the ROC curve. The AUC of the 4-lncRNA panel was 0.881 (95 % confidence interval [CI] = 0.833–0.919, sensitivity = 89.17 % and specificity = 75.83 %, Fig. 3b), which was significantly better than that of CEA (AUC: 0.749, 95 % confidence interval [CI] = 0.689–0.803, sensitivity = 56.67 % and specificity = 86.67 %, p < 0.001, Fig. 3c).
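The sensitivity and specificity figures above follow from simple ratios of classification counts. The sketch below shows the calculation; the counts in the usage note are not reported in the text and are back-calculated here from the stated percentages under the assumption of 120 CRC patients and 120 controls in the validation set. The function names are hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of CRC patients correctly classified as CRC."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of controls correctly classified as controls."""
    return true_negatives / (true_negatives + false_positives)
```

Under that assumption, 107 of 120 patients detected gives `sensitivity(107, 13)` of about 89.17 %, and 91 of 120 controls correctly excluded gives `specificity(91, 29)` of about 75.83 %, matching the values reported for the 4-lncRNA panel.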

We then compared the diagnostic performance of the 4-lncRNA panel with that of CEA in discriminating CRC patients at different TNM stages from control individuals. The AUCs of the 4-lncRNA panel for patients with TNM stage I, II and III were 0.774, 0.844 and 0.949, respectively (Fig. 3d–f), and were all higher than those of CEA, which were 0.588, 0.695 and 0.861, respectively (Fig. 3g–i).

Correlation between the four lncRNAs and clinicopathological characteristics

The data summarized in Table S3 show the relationship between the four identified lncRNAs and the clinicopathological characteristics of the patients with CRC in the validation set. Higher levels of serum BANCR and lower levels of NR_026817, NR_029373 and NR_034119 correlated significantly with advanced TNM stage (p < 0.05) and lymph node metastasis (p < 0.05). Higher levels of BANCR also correlated with local tumor invasion (p < 0.05). However, no significant associations were found between the four lncRNAs and age, gender, tumor location, size or differentiation (all p ≥ 0.05).

Correlation of lncRNAs expression between tissues and serum

We analyzed the correlation between the expression levels of BANCR, NR_026817, NR_029373 and NR_034119 in CRC tissues and in serum. As shown in Fig. S3, a significant correlation was observed for all four lncRNAs (BANCR: r = 0.504, p = 0.005; NR_026817: r = 0.611, p < 0.001; NR_029373: r = 0.573, p < 0.001; NR_034119: r = 0.589, p < 0.001). This suggests that the expression of these four lncRNAs is consistent between tissue and serum.

Analysis of the identified lncRNAs stability in serum

Serum from five CRC patients was used to assess the stability of the four identified lncRNAs (BANCR, NR_026817, NR_029373 and NR_034119). The serum was exposed to harsh conditions: storage at room temperature for 4, 8 and 24 h, or storage at −80 °C with 0, 2, 4 and 8 repeated freeze–thaw cycles. No obvious alterations in the expression levels of these four lncRNAs were observed (Fig. S4), suggesting that they are stable in serum.

Correlation between the four lncRNAs expression and disease-specific survival rate

Survival analysis was carried out on 107 patients, because 13 patients were lost to follow-up. Kaplan–Meier survival analysis showed that patients with low expression of NR_029373 and NR_034119 had a significantly lower disease-specific survival rate than those with high expression (p = 0.013 and 0.044, respectively, Fig. 4). In univariate Cox proportional hazards regression analysis, disease-specific survival was significantly associated with NR_029373 expression (p = 0.016), NR_034119 expression (p = 0.049), tumor size (p = 0.007) and TNM stage (p < 0.001). Parameters significantly related to disease-specific survival in the univariate analysis were further analyzed by multivariate analysis to identify independent prognostic factors. NR_029373 expression (p = 0.013), NR_034119 expression (p = 0.038), tumor size (p = 0.006) and TNM stage (p = 0.022) were finally identified as statistically significant prognostic factors (Table 3).
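For readers unfamiliar with the Kaplan–Meier method, the product-limit estimate behind such curves can be sketched in a few lines. This is a generic textbook illustration, not the SPSS analysis used in the study: at each time with at least one CRC-related death, the running survival probability is multiplied by (1 − deaths / number at risk), and censored patients simply leave the risk set. The function name and the follow-up data in the usage note are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up time for each patient (e.g. months since diagnosis)
    events: 1 if the patient died of CRC at that time, 0 if censored
    Returns a list of (time, survival_probability) at each death time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

With hypothetical follow-up times `[5, 8, 8, 12, 20]` and events `[1, 1, 0, 1, 0]`, the estimate steps down to 0.8 at month 5, 0.6 at month 8 and 0.3 at month 12. Computing one such curve for the low-expression group and one for the high-expression group gives the two curves that a log-rank test then compares.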

Fig. 4 Kaplan–Meier curves for disease-specific survival rate according to the serum levels of NR_029373 (a) and NR_034119 (b) in patients with CRC in validation set

Table 3 Univariate and multivariate analysis for predictors of disease-specific survival in the clinical validation cohort

Discussion

In our study, microarray analysis was first employed to provide basic information on lncRNAs significantly dysregulated in CRC tissues. Candidate lncRNAs were selected by combining the microarray results with previous studies, and were then evaluated by RT-qPCR in tissue and serum samples to validate their consistent pattern of dysregulation in these clinical materials (Ge et al. 2013; Li et al. 2015; Ni et al. 2015; Qi et al. 2013; Sun et al. 2016b; Tong et al. 2014; Wang et al. 2016b; Yao et al. 2016; Yin et al. 2014, 2015). Four lncRNAs (BANCR, NR_026817, NR_029373 and NR_034119), which showed considerable potential to discriminate CRC patients from controls with high AUC values, were finally identified. Using a multivariate logistic regression model, we established a panel of four lncRNAs that can diagnose CRC with higher accuracy than a traditional diagnostic biomarker such as CEA. In addition, we identified NR_029373 and NR_034119 as independent prognostic factors for CRC patients.

The expression profile of 18 candidate lncRNAs was first explored in CRC tissues and adjacent NM tissues in the training phase of our study (Liu et al. 2015a; Ren et al. 2015). Among these, eight lncRNAs (MALAT-1, NR_029373, NR_046321, NR_034119, GAS5, BANCR, NR_026817 and PCAT-1) were identified as significantly differentially expressed between CRC tissues and the adjacent NM tissues. The expression of GAS5 was significantly reduced in CRC tissues, while the expression of MALAT-1 and PCAT-1 was increased, and this trend was consistent with previous studies (Yin et al. 2014; Ma et al. 2016; Ji et al. 2014; Ge et al. 2013). Guo et al. reported that BANCR was overexpressed in CRC, while Shi et al. reported reduced expression (Guo et al. 2014; Shi et al. 2015). In the present study, BANCR was found to be upregulated in CRC tissues. In addition, exploring the role of lncRNAs in carcinogenesis may help to decipher their oncogenic or suppressor function in CRC. Yin et al. showed that GAS5 overexpression could inhibit CRC cell proliferation in vivo, which indicated that it may function as a tumor suppressor in CRC progression (Yin et al. 2014). MALAT-1 has been reported to be involved in CRC cell proliferation, migration and invasion (Ji et al. 2014). In a study by Yang et al., MALAT-1 was shown to promote the growth of tumor cells via a target protein called AKAP-9 (Yang et al. 2015). There have been no functional studies of the role of PCAT-1 in the development of CRC, but PCAT-1 has been associated with tumor cell proliferation in several other cancers, including NSCLC and bladder cancer (Zhao et al. 2015; Liu et al. 2015b). The dysregulation of the other four lncRNAs (NR_029373, NR_034119, NR_026817 and NR_046321) is reported for the first time in our study.

The reliance on surgical resection and invasive procedures limits the application of tissue lncRNAs in cancer diagnosis, which highlights the merit of lncRNAs in circulation (El-Tawdi et al. 2016; Gu et al. 2016; Tong et al. 2015; Wang et al. 2015, 2016a; Wu et al. 2016). We therefore measured the expression of the identified tissue lncRNAs in serum samples. LncRNAs dysregulated in both tissue and serum with a consistent pattern could effectively represent the altered lncRNA expression of tumor tissues and simultaneously satisfy the demand for noninvasive biomarkers. RT-qPCR was performed in two independent cohorts of serum samples, from the training and validation phases of our study, to validate the lncRNAs identified in the tissue specimens. This stringent analysis led to the identification of four significantly altered lncRNAs (BANCR, NR_026817, NR_029373 and NR_034119). In addition, ROC analysis confirmed that the expression of these lncRNAs distinguished CRC patients from the controls, with AUC values between 0.638 and 0.812 in the training set. Given the heterogeneous nature of CRC, a 4-lncRNA panel was established to further improve the diagnosis of this disease. The high diagnostic accuracy in the training and validation phases, with AUC values of 0.891 and 0.881, respectively, indicated that the combination of these lncRNAs may constitute a promising method for CRC detection. Furthermore, we directly compared the diagnostic efficiency of the 4-lncRNA panel with that of the previously established marker CEA in the same cohort. The result clearly demonstrated the superiority of the 4-lncRNA panel over CEA for CRC diagnosis, with better sensitivity, especially in early-stage tumors. Based on these findings, the serum lncRNA panel may prove to be a much more sensitive method for CRC detection.

The four lncRNAs in the panel were significantly dysregulated in both tissue and serum samples, in line with our selection procedure. In addition, we examined the correlation between lncRNA expression in colorectal cancer tissue and in serum. A significant correlation was observed, supporting the ability of the 4-lncRNA panel to reflect the condition of the solid tumor. Despite the reduced expression of NR_026817, NR_029373 and NR_034119 in CRC patients, all three lncRNAs were steadily expressed in both tissue and serum samples. Moreover, we tested the stability of all four identified lncRNAs under harsh conditions. Their levels remained stable after storage at room temperature or at −80 °C with repeated freeze–thaw cycles, suggesting that this 4-lncRNA panel is suitable for reliable detection even under harsh conditions.

One of the most urgent needs of clinicians is an adequate predictive biomarker that can identify CRC patients with high risk and poor prognosis. We therefore investigated the role of the four identified lncRNAs as prognostic biomarkers. The median was used as the cutoff, because this study only explores their potential for prognosis prediction. Reduced expression of NR_029373 and NR_034119 correlated with a lower disease-specific survival rate. Cox proportional hazards regression analysis showed that the serum expression levels of NR_029373 and NR_034119 were independent factors for the disease-specific survival rate of CRC patients, suggesting that they may be employed as biomarkers for CRC prognosis.

Although we have constructed a promising 4-lncRNA panel for CRC detection in serum, it is uncertain whether this panel is specific to CRC. Additional studies will therefore be required to examine the expression changes of these four lncRNAs in other tumors. Although the 4-lncRNA panel showed improved sensitivity and a markedly higher AUC value than CEA in distinguishing early-stage CRC patients from healthy people, the expression of these four lncRNAs has not yet been investigated in patients with adenoma. Colorectal adenoma is accepted as the usual benign precursor lesion in the transformation to CRC. Thus, we need to explore lncRNA dysregulation in patients with adenoma of various degrees of dysplasia, and to determine whether it offers any prognostic benefit to patients who are at very early risk of developing CRC.

In conclusion, we have established a distinctive serum 4-lncRNA panel for CRC detection through stringent step-by-step selection procedures and identified NR_029373 and NR_034119 as independent predictors of CRC-specific survival. Further studies, including larger clinical cohorts and diverse ethnic populations, are required to confirm the usefulness of these lncRNAs as noninvasive markers in CRC patients.