Introduction

In 2020, more than 1.9 million new colorectal cancer (CRC) cases were estimated to have developed worldwide [1]. Despite advances in diagnosis and treatment, CRC ranked as the second leading cause of cancer-related mortality, with an estimated 935,000 deaths [1]. Early detection of CRC is crucial to providing effective treatment and improving patient prognosis, but remains challenging due to the asymptomatic nature of early-stage disease. The 5-year survival rate for CRC ranges from 90% for patients with localized disease to 14% for those with distant metastasis [2].

Colonoscopy with biopsy is considered the gold standard for CRC diagnosis, but is unsuitable for mass screening because the procedure is highly invasive and costly. The fecal immunochemical test (FIT) has been widely used for CRC screening due to its relative simplicity and cost-effectiveness and has helped to reduce CRC-related mortality [3]. However, because the FIT detects occult blood in the feces, false-positive results can be caused by a multitude of non-cancerous lesions. Furthermore, false-negative results from the FIT due to low sensitivity, especially with early-stage CRC [4,5,6], may lead to delayed diagnosis and missed opportunities for timely intervention. Currently utilized serum tumor markers for CRC, such as carcinoembryonic antigen (CEA) and carbohydrate antigen (CA)19–9, lack sufficient sensitivity and specificity for early detection of CRC [7]. Therefore, there is a crucial need for more sensitive, specific, and non-invasive biomarkers for CRC.

Urine-based tests are non-invasive, convenient, and highly suitable for mass screening, and thus have the potential to increase screening participation rates. Urinary proteins mainly comprise filtered plasma proteins and proteins secreted from the urinary tract. While approximately 40% of urinary proteins are filtered plasma proteins, the colon represents one of the organs with the highest presence of tissue-derived proteins in urine [8]. Urine may thus offer a promising sample for biomarkers of CRC. However, no urinary biomarkers are currently used for the diagnosis of CRC.

We have previously reported on the usefulness of urinary biomarkers for diagnosing esophageal cancer [9], gastric cancer [10,11,12,13], and CRC [14, 15], and have established methods for handling urine samples. In this study, we conducted a stepwise analysis from comprehensive identification of urinary protein biomarkers by mass spectrometry (MS) to the establishment of novel biomarkers for the early diagnosis of CRC by enzyme-linked immunosorbent assay (ELISA).

Methods

Patients and study design

We collected urine and serum samples from September 2012 to June 2020 at two Japanese institutions (Nagoya City University Hospital and Okazaki Public Health Center). Men and women between 20 and 90 years old were enrolled in the study. The CRC group included patients who had been histologically confirmed as having CRC using endoscopic biopsy and who had not received treatment prior to sample collection. Patients with recurrent CRC, a history of neoplasms of any type within the preceding 5 years, and/or multiple neoplasms were excluded. The healthy control (HC) group consisted of individuals who were asymptomatic and showed no evidence of neoplasms during their medical check-ups.

This study complied with both the REMARK guidelines [16] and STROBE statement [17] to ensure that this case–control biomarker study was accurately and comprehensively reported. Our study protocol (no. 45-12-0013) was approved by the ethics committees at each institution (Nagoya City University Hospital Institutional Review Board and Okazaki Public Health Center Ethical Committee), and conformed to the ethical guidelines of the 1975 Declaration of Helsinki (6th revision, 2008). Each patient provided written, informed consent before study entry.

Samples and definition

Urine and serum samples were collected before any CRC treatment. These samples were immediately frozen and stored at −80 °C until analysis, as previously reported [9,10,11,12,13,14,15]. The disease stage of CRC was determined based on the final pathologic diagnosis after resection according to the 7th edition of the Union for International Cancer Control TNM classification [18].

Preparation of urine samples for mass spectrometry analysis

Independent pools of urine samples from patients with CRC and from HCs were created. Each pool of urine samples was filtered through 0.45-μm filters (#SLHAR33SB; Merck, Burlington, MA, USA) and immediately mixed with a protease inhibitor cocktail (#5056489001; Roche, Basel, Switzerland). Reduction with 25 mM dithiothreitol (#209–19861; FUJIFILM Wako, Osaka, Japan) followed by alkylation with 50 mM iodoacetamide (#19302–54; Nacalai Tesque, Kyoto, Japan) was then performed. Samples were mixed with Sera-Mag SpeedBead carboxylate-modified magnetic particles (#45152105050250, #65152105050250; GE Healthcare, Chicago, IL, USA), and the urinary proteins were precipitated in 50% ethanol. Protein aggregates were washed three times with 80% ethanol and eluted into 50 mM ammonium bicarbonate buffer (#017–02875; FUJIFILM Wako) supplemented with trypsin (protein weight, 1/50, #3708969001; Roche). After overnight incubation at 37 °C, digestion was quenched by adding 1% trifluoroacetic acid (#204–02743; FUJIFILM Wako). Tryptic peptides were desalted using a C18 stage-tip [19] and labeled with tandem mass tags (TMT) 10-plex reagent (#90110; ThermoFisher Scientific, Waltham, MA, USA) according to the protocol from the manufacturer. The mixture of labeled peptides was separated into 32 fractions with an off-line reversed-phase liquid chromatography (RPLC) system (LC-20AB system; Shimadzu, Kyoto, Japan), on an L-column3 C18 analytical column (5 μm particles, #862020; Nikkyo Technos Co., Tokyo, Japan) using a 22-min gradient from 19 to 37% B solution (90% acetonitrile, pH 10) at a flow rate of 0.2 mL/min with the mobile phase A solution (2% acetonitrile, pH 10) and B solution. These fractions were subsequently combined into 15 non-contiguous fractions for liquid chromatography-tandem mass spectrometry (LC–MS/MS) analysis.

LC–MS/MS analysis

Fractions were individually analyzed by LC–MS/MS in a Q Exactive mass spectrometer (ThermoFisher Scientific) coupled to an UltiMate3000 nano LC system (ThermoFisher Scientific) with an electrospray ionization capillary column, 150 mm × 75 μm i.d. (Nikkyo Technos Co.) via a nano-electrospray ion source. Low-pH RPLC was performed at a flow rate of 300 nl/min using a linear gradient [0 min, 5% solvent B (95% acetonitrile with 0.1% formic acid); 100 min, 40% solvent B] by mixing with solvent A (2% acetonitrile with 0.1% formic acid) and solvent B. Each precursor ion was measured with a mass-to-charge ratio (m/z) range of 400–1600. MS2 scans were obtained for the 20 most intense peaks from each MS1 scan.

Protein identification and quantification were performed using MaxQuant version 1.5.1.2 software (http://maxquant.org), supported by Andromeda. MS spectra were searched against the UniProt human database (release 2019_04) (https://www.uniprot.org/) and 262 common contaminants using the following parameters. Enzyme specificity was set to trypsin/P. Up to two missed cleavage sites were tolerated. Carbamidomethylation of cysteine residues was set as a fixed modification. Methionine oxidation and deamidation of asparagine and glutamine residues were set as variable modifications. The false discovery rate for protein groups, peptide groups, and peptide spectrum matches was less than 0.01. Quantitative data for each protein group were median-normalized and log2-transformed.
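The normalization step can be illustrated with a minimal sketch. The text does not state how the median normalization was applied, so this example assumes that each TMT channel is divided by its own median intensity before the log2 transform. The function name, channel labels, and intensities are hypothetical, and zero or missing intensities are not handled.

```python
import math
from statistics import median


def median_normalize_log2(intensities):
    """Median-normalize each channel, then log2-transform.

    intensities: dict mapping a channel label to a list of positive
    reporter intensities (same protein order in every channel).
    Assumption: normalization is done per channel, by its own median.
    """
    normalized = {}
    for channel, values in intensities.items():
        channel_median = median(values)
        normalized[channel] = [math.log2(v / channel_median) for v in values]
    return normalized


# Hypothetical intensities for three protein groups in two pooled samples.
example = {
    "HC_pool_1": [100.0, 200.0, 400.0],
    "CRC_pool_1": [50.0, 100.0, 800.0],
}
result = median_normalize_log2(example)
```

After this step the median of every channel is 0 on the log2 scale, so differences between channels can be read directly as log2 fold changes.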

ELISA

We measured urinary and serum protein concentrations using mono-specific ELISA kits according to the instructions from the manufacturers (Supplementary Table S1).

Database analysis

The GSE223119 dataset was downloaded from the Gene Expression Omnibus (GEO) (https://www.ncbi.nlm.nih.gov/geo/) database to compare expressions of DPEP1 and TFF1 between CRC tissues (n = 20) and normal tissues (n = 20) using RNA sequencing data. The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) data were analyzed by GEPIA (http://gepia.cancer-pku.cn/) to investigate gene expressions of DPEP1 and TFF1 between colon cancer (CC) tissues (n = 275) and normal tissues (n = 349) or between rectal cancer (RC) tissues (n = 92) and normal tissues (n = 318). Normal tissues were matched with TCGA adjacent tissues and GTEx normal tissues. Kaplan–Meier estimates of overall survival for CRC were explored via the cBioPortal database (https://www.cbioportal.org/) including 588 patients (Colorectal Adenocarcinoma, TCGA, PanCancer Atlas).

Statistical analysis

The propensity score (PS), based on age and sex, was estimated using a logistic regression model. We randomly matched participants with a case-to-control ratio of 1:1, using the nearest-neighbor method within a caliper of width 0.2 of the standard deviation of the logit of the PS. Quantitative variables are described using median and interquartile range (IQR) and were analyzed using the Mann–Whitney U test. Categorical variables were analyzed using the chi-square test or Fisher’s exact probability test, as appropriate. In the multivariate analysis, logistic regression modeling was used to estimate the odds ratio (OR) with 95% confidence interval (CI). Correlations were evaluated using the Spearman rank method with a coefficient (r). The area under the curve (AUC) for each biomarker was calculated by receiver operating characteristic (ROC) curve analysis, and the representative value was shown as the AUC with a 95% CI. The sensitivity and specificity of each biomarker were determined from the optimal cutoff value. All statistics were calculated using SPSS Statistics version 25 (IBM, Tokyo, Japan) and R software (https://www.r-project.org/). Two-sided values of P < 0.05 were considered statistically significant.
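The matching rule described above can be sketched as follows. This is an illustration only, not the SPSS or R procedure used in the study. It assumes that propensity scores have already been estimated, that the caliper is computed from the pooled standard deviation of the logit across cases and controls, and that cases are processed in random order with greedy matching. All identifiers and scores are hypothetical.

```python
import math
import random
from statistics import stdev


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def match_nearest_neighbor(case_ps, control_ps, caliper_sd=0.2, seed=0):
    """Greedy 1:1 nearest-neighbor matching on the logit of the PS.

    case_ps, control_ps: dicts mapping participant id -> propensity score.
    A pair is accepted only if the logit difference is within
    caliper_sd * SD(logit PS), with the SD pooled over both groups.
    """
    case_logit = {k: logit(v) for k, v in case_ps.items()}
    control_logit = {k: logit(v) for k, v in control_ps.items()}
    pooled = list(case_logit.values()) + list(control_logit.values())
    caliper = caliper_sd * stdev(pooled)

    case_ids = list(case_logit)
    random.Random(seed).shuffle(case_ids)  # random processing order
    available = dict(control_logit)        # controls not yet matched
    pairs = []
    for case_id in case_ids:
        if not available:
            break
        target = case_logit[case_id]
        best = min(available, key=lambda k: abs(available[k] - target))
        if abs(available[best] - target) <= caliper:
            pairs.append((case_id, best))
            del available[best]            # matching without replacement
    return pairs


# Hypothetical propensity scores.
cases = {"c1": 0.30, "c2": 0.60, "c3": 0.95}
controls = {"h1": 0.31, "h2": 0.58, "h3": 0.10, "h4": 0.45}
pairs = match_nearest_neighbor(cases, controls)
```

In this toy example the case with a score of 0.95 has no control within the caliper and is therefore left unmatched, which is how caliper matching can reduce the size of the analyzed cohort.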

Results

Patients

A total of 474 participants (299 HCs, 175 CRCs) were enrolled in this study of urinary biomarkers for CRC. After PS matching, 363 individuals (188 HCs, 175 CRCs) were finally selected as the study cohort. This entire cohort was randomly divided into three sub-cohorts: 32 individuals (16 HCs, 16 CRCs) as the discovery cohort, 220 individuals (110 HCs, 110 CRCs) as the training cohort, and 111 individuals (62 HCs, 49 CRCs) as the validation cohort (Fig. 1a). In the discovery cohort, LC–MS/MS analysis was performed to identify candidate urinary biomarkers for CRC diagnosis. ELISA analysis was then used to establish unique urinary protein biomarkers in the training cohort. Finally, these established biomarkers were confirmed in the independent validation cohort. We also examined the detectability of patients with colorectal adenoma (CRA) (n = 31) using the established urinary biomarkers. In addition, age- and sex-matched serum samples (68 HCs, 68 CRCs) were used to further validate the diagnostic potential of serum expressions of the established urinary proteins for CRC diagnosis in the serum cohort. Database analysis was performed to validate the expressions of established biomarkers in CRC and normal tissues.

Fig. 1

Study flowchart and quantitative mass spectrometry. a Consort diagram. b Individual donor information and sample pooling. c High-throughput quantitative proteomics platform for cancer biomarker discovery. d Heat map of the differentially expressed urinary proteins for quantitative mass spectrometry between HC and CRC groups. HC, healthy control; CRC, colorectal cancer; MS, mass spectrometry; ELISA, enzyme-linked immunosorbent assay; CRA, colorectal adenoma; TMT, tandem mass tag

Comprehensive analysis with quantitative proteomics analysis

In the discovery cohort, we conducted a comprehensive proteomics analysis using LC–MS/MS to compare urinary protein expression levels between CRCs and matched HCs. We prepared eight pooled urine samples, each comprising urine from four individuals (Fig. 1b). We then labeled these eight pools with the TMT 10-plex reagent and combined them (Fig. 1c). Among the 765 proteins quantified in the LC–MS/MS analysis, 78 urinary proteins (46 up-regulated and 32 down-regulated proteins) were differentially expressed between groups (absolute fold change > 2.0) and stably expressed in each pool (Fig. 1d). From these, we selected 23 urinary protein candidates for further analysis, using selection criteria informed by previous reports and The Human Protein Atlas (https://www.proteinatlas.org). Specifically, we focused on proteins with small molecular weight, those identified as cancer-specific proteins, or those preferentially expressed in the gastrointestinal system, including the colon and rectum.
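On log2-transformed data, an absolute fold change above 2.0 corresponds to a difference of more than 1 between group means. The sketch below applies that rule, assuming that each group is summarized by the mean of its pooled log2 values. The criterion for "stably expressed in each pool" is not defined in the text and is not implemented here. Protein names and values are hypothetical.

```python
import math
from statistics import mean


def classify_by_fold_change(log2_hc, log2_crc, threshold=2.0):
    """Label a protein as 'up', 'down', or 'unchanged' in CRC versus HC.

    log2_hc, log2_crc: log2-transformed, normalized values per pool.
    A fold change above `threshold` equals a log2 difference
    above log2(threshold).
    """
    delta = mean(log2_crc) - mean(log2_hc)
    limit = math.log2(threshold)
    if delta > limit:
        return "up"
    if delta < -limit:
        return "down"
    return "unchanged"


# Hypothetical proteins, four HC pools and four CRC pools each.
pools = {
    "protein_A": ([0.0, 0.0, 0.0, 0.0], [1.5, 1.5, 1.5, 1.5]),
    "protein_B": ([1.0, 1.0, 1.0, 1.0], [-0.5, -0.5, -0.5, -0.5]),
    "protein_C": ([0.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]),
}
labels = {name: classify_by_fold_change(hc, crc) for name, (hc, crc) in pools.items()}
```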

Establishment and validation of protein biomarkers using ELISA

Table 1 shows overall patient characteristics. Baseline characteristics were well balanced between the HC and CRC groups in both the training and validation cohorts. Stage 0/I was found in 40.0% and 44.9% of CRCs in the respective cohorts.

Table 1 Characteristics of the study cohort

Next, to establish urinary protein biomarkers, urinary levels of 23 proteins identified in the discovery cohort were analyzed with ELISA in the training cohort. Among these, 15 proteins with levels normalized to urinary creatinine levels were significantly higher in the CRC group than in the HC group (Table 2). In the multivariate analysis of 8 urinary proteins normalized to urinary creatinine levels, uDPEP1/uCr (OR 1.010, 95%CI 1.003–1.016; P = 0.004) and uTFF1/uCr (OR 1.027, 95%CI 1.014–1.041; P < 0.001) were independent biomarkers for CRC diagnosis in the training cohort. The post hoc achieved powers for significant biomarkers in the training cohort were larger than 85%. Based on these results, we established a diagnostic biomarker panel for CRC consisting of uDPEP1/uCr and uTFF1/uCr using logistic regression modeling. ROC analyses of the top four urinary protein biomarkers (uDPEP1/uCr, uTFF1/uCr, uANPEP/uCr, and uLGALS3/uCr) allowed HC and CRC groups to be distinguished in the training cohort (uDPEP1/uCr, AUC = 0.802; uTFF1/uCr, AUC = 0.751; uANPEP/uCr, AUC = 0.714; uLGALS3/uCr, AUC = 0.712). The urinary protein biomarker panel comprising uDPEP1/uCr and uTFF1/uCr showed good power to distinguish patients with CRCs from HCs, with an AUC of 0.840 (95%CI 0.787–0.893; Fig. 2a). We also analyzed absolute values of each urinary protein level because biomarker tests using absolute values are more readily applicable to practical use. Absolute values of 11 urinary proteins were significantly higher in the CRC group than in the HC group (Supplementary Table S2). In multivariate analysis with 8 absolute urinary protein values, uDPEP1 (OR 1.015, 95%CI 1.007–1.023; P < 0.001), uTFF1 (OR 1.000, 95%CI 1.000–1.000; P = 0.011), uPRG4 (OR 1.005, 95%CI 1.000–1.009; P = 0.030), and uTNFRSF10C (OR 0.904, 95%CI 0.848–0.963; P = 0.002) were independent biomarkers for CRC diagnosis in the training cohort. ROC analyses of the top four urinary protein biomarkers (uDPEP1, uTFF1, uANPEP, and uLGALS3) showed significant differentiation between HCs and CRCs in the training cohort (uDPEP1, AUC = 0.799; uTFF1, AUC = 0.697; uANPEP, AUC = 0.698; uLGALS3, AUC = 0.623). The biomarker panel comprising uDPEP1 and uTFF1 could distinguish between CRCs and HCs with a good AUC of 0.825 (95%CI 0.770–0.879; Fig. 2b).
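Because the ORs from logistic regression are expressed per one-unit increase in the marker, values close to 1 can still correspond to large effects over a wider range. The following sketch uses the standard relation OR = exp(β) to rescale a per-unit OR to a larger increment. The 100-unit and 1,000-unit increments and the per-unit OR of 1.0002 are hypothetical choices for illustration, not values from the study.

```python
import math


def rescale_odds_ratio(or_per_unit, increment):
    """Return the OR for an `increment`-unit rise, given the per-unit OR."""
    beta = math.log(or_per_unit)  # logistic regression coefficient per unit
    return math.exp(beta * increment)


# Reported per-unit OR for uDPEP1/uCr in the training cohort,
# rescaled to a hypothetical 100-unit increase.
or_dpep1_per_100 = rescale_odds_ratio(1.010, 100)

# Hypothetical per-unit OR that would be printed as 1.000 at three decimals,
# rescaled to a hypothetical 1,000-unit increase.
or_small_per_1000 = rescale_odds_ratio(1.0002, 1000)
```

Under these assumptions, a per-unit OR of 1.010 corresponds to an OR of roughly 2.7 per 100 units. The second case shows one possible reading of an OR reported as 1.000 (95%CI 1.000–1.000) that is nevertheless statistically significant: the per-unit effect may simply be smaller than the rounding precision.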

Table 2 Urinary protein levels normalized to urinary creatinine levels
Fig. 2

Receiver operating characteristic curves. a, b Training cohort. c, d Validation cohort. Receiver operating characteristic curves were obtained from urinary protein levels normalized to urinary creatinine levels (a, c) and absolute urinary protein values (b, d) to distinguish CRC patients from healthy controls. AUC, area under the curve; 95%CI, 95% confidence interval

Next, we confirmed the diagnostic performance of 8 established urinary protein biomarkers in the independent validation cohort. For urinary protein biomarkers normalized to urinary creatinine levels, uDPEP1/uCr (P < 0.001), uTFF1/uCr (P < 0.001), uANPEP/uCr (P < 0.001), uLGALS3/uCr (P < 0.001), and uTNFRSF10C/uCr (P = 0.020) were significantly higher in the CRC group than in the HC group (Table 2). ROC analysis of the top four urinary protein biomarkers (uDPEP1/uCr, uTFF1/uCr, uANPEP/uCr, and uLGALS3/uCr) in the validation cohort also showed significant differentiation between HCs and CRCs (uDPEP1/uCr, AUC = 0.809; uTFF1/uCr, AUC = 0.731; uANPEP/uCr, AUC = 0.741; uLGALS3/uCr, AUC = 0.724). The urinary protein biomarker panel comprising uDPEP1/uCr and uTFF1/uCr showed good performance for CRC diagnosis with an AUC of 0.855 (95%CI 0.786–0.925; Fig. 2c). Similarly, absolute values of uDPEP1 (P < 0.001), uTFF1 (P < 0.001), uANPEP (P < 0.001), uLGALS3 (P < 0.001), uPRG4 (P = 0.004), uPIGR (P = 0.029), and uTNFRSF10C (P = 0.003) were significantly higher in the CRC group than in the HC group (Supplementary Table S2). ROC analysis of the top four urinary protein biomarkers (uDPEP1, uTFF1, uANPEP, and uLGALS3) in the validation cohort also allowed significant differentiation between HCs and CRCs (uDPEP1, AUC = 0.930; uTFF1, AUC = 0.747; uANPEP, AUC = 0.846; uLGALS3, AUC = 0.716). Notably, the biomarker panel comprising uDPEP1 and uTFF1 showed extremely high diagnostic power for CRC with an AUC of 0.956 (95%CI 0.923–0.988; Fig. 2d).
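The AUC values reported here can be read as the probability that a randomly chosen CRC patient has a higher marker value than a randomly chosen HC. A minimal sketch of that calculation is shown below, using the Mann–Whitney pairwise formulation with ties counted as one half. It assumes that higher values indicate CRC. The marker values are hypothetical, and this is not the SPSS or R routine used in the study.

```python
def auc_mann_whitney(control_values, case_values):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    A pair counts 1 if the case value is higher than the control value,
    0.5 if the two are tied, and 0 otherwise.
    """
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))


# Hypothetical marker values.
hc_values = [1.0, 2.0, 3.0, 4.0]
crc_values = [2.5, 3.0, 5.0, 6.0]
auc = auc_mann_whitney(hc_values, crc_values)
```

For these toy values, 12.5 of the 16 case-control pairs are ranked correctly, giving an AUC of 0.78125.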

Table 3 summarizes the sensitivities and specificities of our established urinary protein biomarkers and serum levels of CEA and CA19-9 in the whole cohort. The urinary protein biomarker panel comprising uDPEP1 and uTFF1 showed 91.1% and 94.3% sensitivity and 62.8% and 63.4% specificity for stage 0–III CRC diagnosis using urinary creatinine-normalized values and absolute values, respectively. The sensitivities of these urinary protein biomarkers were clearly superior to the currently utilized serum tumor markers for CRC diagnosis (CEA: 33.1%; CA19-9: 16.9%).
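Sensitivity and specificity depend on the cutoff applied to a marker. The Methods refer to an "optimal cutoff value" without naming the criterion, so the sketch below assumes the Youden index (sensitivity + specificity − 1), which is one common choice. It also assumes that values at or above the cutoff are called positive. The marker values are hypothetical.

```python
def sensitivity_specificity(control_values, case_values, cutoff):
    """Sensitivity and specificity when values >= cutoff are called positive."""
    true_positives = sum(v >= cutoff for v in case_values)
    true_negatives = sum(v < cutoff for v in control_values)
    return (true_positives / len(case_values),
            true_negatives / len(control_values))


def youden_cutoff(control_values, case_values):
    """Cutoff among observed values that maximizes the Youden index.

    Assumption: the 'optimal' cutoff is the Youden-optimal one.
    If several cutoffs tie, the lowest is returned.
    """
    candidates = sorted(set(control_values) | set(case_values))

    def youden(cutoff):
        sens, spec = sensitivity_specificity(control_values, case_values, cutoff)
        return sens + spec - 1.0

    return max(candidates, key=youden)


# Hypothetical marker values.
hc_values = [1.0, 2.0, 3.0, 4.0]
crc_values = [3.5, 3.8, 5.0, 6.0]
cutoff = youden_cutoff(hc_values, crc_values)
sens, spec = sensitivity_specificity(hc_values, crc_values, cutoff)
```

A cutoff chosen this way on the same data used to report performance tends to be optimistic, which is one reason an independent validation cohort is informative.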

Table 3 Diagnostic power of biomarkers

Urinary protein biomarkers for early-stage and precancerous lesions

Importantly, the urinary protein biomarkers established here showed significantly higher levels even in the stage 0/I CRC group, compared to the HC group, for both urinary creatinine-normalized levels (Supplementary Fig. S1) and urinary absolute values (Supplementary Fig. S2). In contrast, urinary levels of these protein biomarkers were not associated with disease progression to stage II or III.

These urinary biomarkers also significantly distinguished stage 0/I CRCs from HCs, using both urinary creatinine-normalized levels (Fig. 3a) and absolute urinary values (Fig. 3b). The urinary protein biomarker panel comprising uDPEP1/uCr and uTFF1/uCr, using values normalized to urinary creatinine, showed good diagnostic performance for stage 0/I CRC with an AUC of 0.792, 83.1% sensitivity, and 62.8% specificity. The urinary protein biomarker panel comprising absolute values of uDPEP1 and uTFF1 also showed good diagnostic performance for stage 0/I CRC with an AUC of 0.852, 93.8% sensitivity, and 63.4% specificity (Fig. 3a, 3b; Table 3). In comparison, the sensitivities of currently used serum tumor markers for stage 0/I CRC were very low, at 8.2% for CEA and 11.5% for CA19-9. These findings indicate that the urinary protein biomarker panel established here holds promise as a non-invasive screening tool for the early detection of CRC.

Fig. 3

Receiver operating characteristic curves for early detection of stage 0/I CRC. Receiver operating characteristic curves were obtained from urinary protein levels normalized to urinary creatinine levels (a) and absolute urinary protein values (b) to distinguish stage 0/I CRC patients from healthy controls in the whole cohort. AUC, area under the curve; 95%CI, 95% confidence interval

We also investigated the diagnostic performance of our established urinary protein biomarkers in CRA patients. The characteristics of the CRA cohort are shown in Supplementary Table S3. Surprisingly, urinary normalized levels of DPEP1 and TFF1, as the top two urinary protein biomarkers, were significantly higher in the CRA group than in the HC group, and were significantly higher in the CRC group than in the CRA group (Fig. 4a, b). Accordingly, uDPEP1/uCr and uTFF1/uCr significantly distinguished CRA patients from HCs (Supplementary Fig. S3a). Similarly, absolute urinary values of DPEP1 and TFF1 were significantly increased in the order of HCs, CRA patients, and CRC patients, reflecting disease burden (Fig. 4c, d). Urinary absolute values also distinguished between CRA patients and HCs (Supplementary Fig. S3b).

Fig. 4

Urinary DPEP1 and TFF1 levels according to colorectal adenoma and cancer. Healthy controls (HCs) and patients with colorectal adenoma (CRA) and colorectal cancer (CRC) (HC, n = 172; CRA, n = 31; CRC, n = 159) were used for analyses related to urinary levels of DPEP1 (a, c) and TFF1 (b, d). Urinary protein levels normalized to urinary creatinine levels (a, b) and absolute urinary protein values (c, d). Each comparison between groups was analyzed using the Mann–Whitney U test. Correlations were analyzed using Spearman rank correlations. HC, healthy control; CRA, colorectal adenoma; CRC, colorectal cancer

Additional validation analysis

We also analyzed serum samples from 68 CRC patients and 68 age- and sex-matched HCs. The frequency of Stage 0/I was 44.1% among CRCs in the serum cohort (Supplementary Table S4). Serum levels of DPEP1 (sDPEP1) and TFF1 (sTFF1) were significantly higher in the CRC group than in the HC group (serum DPEP1, P = 0.023; serum TFF1, P < 0.001) (Supplementary Fig. S4a), and could significantly distinguish between HCs and CRCs with AUCs of 0.613 and 0.709, respectively (Supplementary Fig. S4b). Serum levels of DPEP1 and TFF1 also distinguished between HCs and stage 0/I CRCs with AUCs of 0.617 and 0.661, respectively (Supplementary Fig. S4c). However, the diagnostic performance of uDPEP1 and uTFF1 was superior to that of sDPEP1 and sTFF1 in terms of CRC diagnosis.

We further validated the expression status of DPEP1 and TFF1 in CRC tissues using the GEO and TCGA databases. RNA sequencing data for 20 pairs of CRC and adjacent normal tissues were identified from the GEO database using the GSE223119 dataset. Expression levels of these genes were significantly higher in CRC tissues than in normal tissues (DPEP1, P < 0.001; TFF1, P = 0.042) (Supplementary Fig. S5a, b). We also confirmed expressions by the TCGA and GTEx databases via the GEPIA platform and found that DPEP1 (CC, P < 0.05; RC, P < 0.05) and TFF1 (CC, P < 0.05; RC, P < 0.05) were significantly higher in CRC tissue than in normal tissues (Supplementary Fig. S5c, d). Using the database, we also compared the expression levels of the four established biomarkers in gastrointestinal cancer tissues with those in the corresponding normal tissues, including CC, gastric cancer, and esophageal squamous carcinoma. DPEP1 showed significantly higher expression in only CC among gastrointestinal cancers (Supplementary Fig. S6). Finally, we evaluated the association between DPEP1 and TFF1 expressions in CRC tissues and prognosis using the cBioPortal database and found no association between their expressions and prognosis (Supplementary Fig. S7).

Discussion

The present study was conducted using comprehensive stepwise analyses to identify possible urinary biomarkers for the diagnosis of CRC in a large cohort. We identified candidate urinary biomarkers using LC–MS/MS and finally established a novel urinary biomarker panel for CRC diagnosis comprising DPEP1 and TFF1 using ELISA. This panel was validated in an independent cohort, showing outstanding diagnostic power with AUCs > 0.85. Notably, this panel enabled early detection of CRC with high power. Moreover, urinary DPEP1 and TFF1 could also significantly distinguish CRA patients from HCs. These urinary biomarkers showed prominently superior potential for the early detection of CRC compared to currently used tumor markers, suggesting a potential contribution to improving the curability of CRC.

In recent decades, biomarker studies have employed various omics techniques to identify specific biomolecules in body fluids. Among these biomolecules, proteins are clinically attractive targets due to the relatively simple detection using antigen–antibody reactions. This simplified procedure not only facilitates testing, but also reduces the variability that may arise during each step from extraction to quantitation, enabling greater accuracy of measurement. Moreover, the simplicity of the current biomarker panel comprising only two proteins might contribute to easy application in clinical settings at low cost.

Based on DPEP1 and TFF1 expressions in CRC tissues and serum, these protein biomarkers likely originate from CRC cells and enter the urine through the bloodstream. While serum DPEP1 and TFF1 also serve as diagnostic biomarkers for CRC, urinary DPEP1 and TFF1 exhibited even better diagnostic potential than serum levels. The exact reason for the advantage of urine remains unclear, but the lower background protein content of urine after glomerular filtration might facilitate biomarker detection by reducing background noise.

Urine samples exhibit variability due to several factors, including hydration, diet, and health conditions. In proteomic studies using spot urine samples, protein-to-creatinine ratio has been considered a potential normalization method [20]. This study established urinary biomarkers primarily using urinary creatinine normalization and confirmed their ability to distinguish CRCs from HCs. Notably, these biomarkers also exhibited excellent diagnostic performance for CRC even with absolute urinary values, which offers huge advantages for the future development of simple screening kits.
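The rationale for creatinine normalization can be shown with a short worked example: dividing the protein concentration by the creatinine concentration of the same spot sample cancels out dilution. The units (ng/mL for protein, mg/mL for creatinine) and the values below are hypothetical, since the assay units are not given in this section.

```python
def creatinine_normalized(protein_ng_per_ml, creatinine_mg_per_ml):
    """Protein-to-creatinine ratio in ng of protein per mg of creatinine."""
    return protein_ng_per_ml / creatinine_mg_per_ml


# Hypothetical concentrated spot urine sample.
concentrated = creatinine_normalized(120.0, 1.5)

# The same sample diluted two-fold: both concentrations are halved.
diluted = creatinine_normalized(60.0, 0.75)
```

Both samples give the same ratio of 80 ng/mg creatinine, whereas their absolute protein concentrations differ two-fold. This is why absolute values are more sensitive to hydration status.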

Three reports have examined urinary protein biomarkers for CRC diagnosis. One study using MS analysis identified urinary fragments of fibrinogen, hepcidin-20, and β2-microglobulin as discriminative between CRCs and HCs [21]. The second study was our previous investigation revealing that the combination of urinary cysteine-rich protein 61 and trefoil factor 3 served as a diagnostic biomarker for CRC with an AUC of 0.75 using ELISA [15]. The third study established a diagnostic biomarker panel for CRC comprising urinary CORO1C, ARPC5, and RAD23B using MS analysis, and immunoassay verification showed AUCs of 0.787–0.846 among 154 HCs and 103 CRCs [22]. The present study also utilized LC–MS/MS to identify candidate biomarkers. However, the proteins identified by LC–MS/MS in the present study were different from those previously identified. Comprehensive MS analyses often yield inconsistent results across biomarker studies, likely due to factors including MS instrument models, handling of urine samples, and CRC stages. Therefore, we validated the identified protein biomarkers by ELISA in training and validation cohorts following LC–MS/MS analysis. Furthermore, we focused primarily on proteins expressed in the gastrointestinal tract, which may have contributed to the establishment of biomarkers different from previous reports. The urinary biomarker panel established in this study showed diagnostic performance comparable to previous studies for all-stage CRC. Notably, approximately 40% of tumors in the present study were early stage (stage 0 and I), and a urinary biomarker panel comprising uDPEP1 and uTFF1 achieved an excellent AUC of 0.852 even for stage 0/I CRC, representing unprecedented utility as a urinary biomarker for CRC.

DPEP1 is a zinc-dependent metalloproteinase involved in glutathione and leukotriene metabolism [23]. Despite some reports that this protein acts as a tumor suppressor for certain cancers [24], DPEP1 is highly expressed in CRC tissues [23, 25, 26] and cell lines [25, 27], promoting CRC cell proliferation and invasion [23, 26, 28]. DPEP1 suppresses E-cadherin expression and mediates transforming growth factor-β-induced epithelial to mesenchymal transition [26], and also boosts CRC cell proliferation/stemness via MYC and Achaete scute-like 2 [23, 28]. DPEP1 has been identified as a biomarker for disseminated tumor cells in intra-peritoneal lavage and blood samples [25] and also as one of the candidate genes for fecal RNA-based CRC screening [27]. This study is the first to establish urinary DPEP1 as a diagnostic biomarker for CRC. Urinary DPEP1 levels were already elevated in early-stage CRC, making this a suitable early diagnostic biomarker. The reason why urinary expression of DPEP1 does not correlate with stage progression remains unclear. However, the fact that DPEP1 expression in CRC tissues is not associated with prognosis is consistent with the urinary expression pattern of DPEP1. These results suggest that DPEP1 is involved in the early stages of carcinogenesis.

TFF1, as another independent biomarker for CRC diagnosis in this study, is a small peptide expressed and secreted by mucus-secreting cells. TFF1 is predominantly expressed in the stomach and colon, acting to protect against and repair mucosal damage [29]. While TFF1 acts as a tumor suppressor in gastric cancer [30], TFF1 showed higher expression in CRC tissues [29] and promoted invasion and adenoma-carcinoma transition for CRC [31]. Serum TFF1 levels were significantly higher in metastatic CRCs than in HCs with an AUC of 0.72 and decreased after primary tumor resection [32]. Meanwhile, TFF1 expression did not correlate with clinicopathological parameters or prognosis for CRC [29]. Similar to DPEP1, TFF1 seems to be involved in CRC development rather than progression. We have previously identified urinary TFF1 as an early diagnostic biomarker for gastric cancer [13], but the present study demonstrates higher diagnostic value for TFF1 in CRC than in gastric cancer.

ANPEP, which showed the third highest diagnostic performance in this study, is a membrane dimeric metallopeptidase involved in cancer development. Elevated ANPEP expression has been correlated with adverse clinical outcomes in several cancers [33]. ANPEP gene expressions were lower in CRC tissues and cell lines [34, 35], but were associated with drug resistance [36]. Interestingly, high ANPEP activity in CRC tissues correlated with better overall survival, while high ANPEP activity in plasma showed worse overall survival [34]. Further validation of ANPEP expression in each sample for CRC patients is therefore required.

LGALS3, the fourth top CRC diagnostic biomarker in this study, belongs to the galectin family, playing a role in cellular adhesion, cell growth, and apoptosis [37]. LGALS3 expressions were higher in CRC tissues and correlated with clinicopathological parameters [37] and shorter disease-free survival [38]. Plasma LGALS3 levels were also significantly higher in CRCs than in HCs, with an AUC of 0.59 for CRC diagnosis [39]. These findings are consistent with our result that urinary LGALS3 serves as a promising biomarker for CRC diagnosis.

This study had the following limitations. Since this study did not include data from the FIT, we could not directly compare diagnostic performance between FIT and the current urinary protein biomarkers. However, since the sensitivity of FIT for early-stage CRC has been reported as 33–68% for stage I CRC and 39–52% for T1 CRC [4,5,6], our urinary biomarker panel clearly showed superior sensitivity for early-stage CRC. To address this limitation, we plan to conduct a prospective cohort study with industry collaboration for future clinical applications. Furthermore, although uDPEP1 and uTFF1 are independent CRC diagnostic biomarkers with an excellent AUC, their combination effect is not as pronounced. The future prospective cohort study will also address whether the combination biomarker of uDPEP1 and uTFF1 is better than the single biomarker of uDPEP1 alone. In addition, since the mechanisms underlying the established biomarkers are not fully understood, further investigation will be necessary in the future.

In conclusion, this study established a novel, reliable, non-invasive urinary protein biomarker panel that enables early detection of CRC. With its promising nature for future clinical utilization, we plan to undertake a prospective study to further validate its potential.