Introduction

N-glycosylation is the attachment of N-glycans to N-X-S/T/C sequons; each N-glycan, with a GlcNAc2Man3 core and a unique branch structure, functions in a site- and structure-specific manner. More than 50% of mammalian proteins are glycosylated, and glycoproteins play key roles in the folding of immune protein molecules, recognition and communication between immune proteins, activation of immune cells, and antigen presentation, as well as in the occurrence and development of disease (especially the occurrence, development, metastasis and invasion of tumors) [1,2,3,4,5,6,7,8]. High-throughput site- and structure-specific characterization of differential N-glycosylation in pathological conditions using MS-based N-glycoproteomics has become one of the state-of-the-art instrumental platforms for the discovery of putative disease biomarkers.

Pancreatic cancer (PC) is one of the deadliest malignant tumors in the world. Its five-year survival rate is less than 5%, the worst prognosis of all cancers [9]. The lack of significant symptoms in the early phase makes it difficult to diagnose [10]. Serum carbohydrate antigen 19–9 (CA 19–9), with a sensitivity of 79–81% and a specificity of 82–90%, is currently used as a first-line auxiliary screening marker for early PC [11, 12]. There is an urgent need to find more efficient biomarkers for better diagnosis and for the development of new therapeutic strategies.

In recent years, researchers have carried out a variety of studies on glycoprotein markers of PC. To develop a strategy for identifying sialylated glycoprotein markers in serum, Jia Zhao et al. applied lectin affinity selection to the analysis of cancer serum compared with normal samples. In total, 130 sialylated glycoproteins were identified, and down-regulation of α1-antitrypsin and sialylated plasma protease C1 inhibitor was observed [13]. Using a specific anti-sialyl Lewis X antibody and N-glycan sequencing, Ariadna Sarrats et al. found glycosylation changes on acute-phase proteins (APPs), which are partially regulated by cytokines and are useful tumor markers [14]. In 2014, Sheng Pan et al. reported a comparative study of pancreatic tumor, chronic pancreatitis and normal pancreas tissues [15]. Abnormal levels of PC-associated MUC5AC, CEACAM5, IGFBP3 and LGALS3BP were observed. For metastasis tracking of PC, Hae-Min Park et al. found that the level of high-mannose N-glycans is higher in Capan-1 cells (pancreatic cancer cells that have metastasized to the liver) than in Panc-1 and MIA PaCa-2 cells, which were derived from the pancreas duct head and tail regions. Meanwhile, up-regulation of highly branched sialylated N-glycans was also found in pancreatic cancer cells [16]. In a TMT-labeling quantitative study of pancreas sera, Shibu Krishnan et al. quantified 703 proteins [17]. Altered proteins, predominantly involved in inflammatory response, coagulation and immune-related events, were recognized as having biomarker potential.

Previously, we established a site- and structure-specific quantitative N-glycoproteomics pipeline with the intact N-glycopeptide search engine GPSeeker [18] and stable isotopic diethyl labelling (SIDE) [19, 20]. The pipeline has been successfully applied to the quantitative characterization of N-glycoprotein markers of MCF-7 and MCF-7/ADR cancer stem cells [21,22,23].

Here, we report our site- and structure-specific quantitative tissue N-glycoproteomics study of pancreatic cancer. A series of glycoproteins including TF, FADS3, SOX3, MFAP4, CX058, KCC2G, TSP1, MYOME, NAV3, UBQLN, SERPH, KCNQ5, IGHA1, PKHL1, CATD, ZN546, NAA80, PGS1, TPP1, IF140, PDIA2, DVL3 and HYOU1 were quantified with differential expression.

Experimental section

The experimental procedure is composed of four main steps: 1) sample preparation of the 1:1 mixture of isotopically diethylated intact N-glycopeptides from adjacent normal vs. cancer pancreatic tissues, 2) RPLC-MS/MS (HCD) analysis of the 1:1 mixture, 3) site- and structure-specific database search and identification of intact N-glycopeptides using GPSeeker, and 4) relative quantitation of differentially expressed intact N-glycopeptide IDs using GPSeekerQuan (Fig. 1).

Fig. 1

(A) Site- and structure-specific quantitative N-glycoproteomics pipeline; (B) Overall identification and quantitative results from four biological replicates each with three technical replicates

Chemicals and reagents

Trifluoroacetic acid (TFA), iodoacetamide (IAA), dithiothreitol (DTT), trypsin, protease inhibitor cocktail, ammonium hydroxide solution (ACS reagent, 28.0%–30.0%) and all solvents (HPLC-grade acetonitrile, isopropyl alcohol and methanol) were purchased from Sigma-Aldrich (St. Louis, MO). BCA assay kit, ammonium bicarbonate (ABC, Reagent Grade) and 20× phosphate-buffered saline (PBS, DEPC-treated) were purchased from Sangon Biotech (Shanghai, China). Sodium dodecyl sulfate (SDS) and Tris (tris-hydroxymethyl-aminomethane) were purchased from BBI Co., Ltd. (Shanghai, China). 13CH313CHO (99% 13C atom, 1632-98-0) was purchased from Cambridge Isotope Laboratories, Inc. Ultrapure water was produced on site with a Millipore Simplicity System (Billerica, MA).

Sample preparation of the 1:1 mixture of isotopically diethylated intact N-glycopeptides from adjacent normal vs. cancer pancreatic tissues

Four pairs of pancreatic cancer vs. adjacent normal tissues in this study were obtained from four patients in the Second Military Medical University Affiliated Changhai Hospital (Shanghai, China). All samples were collected according to the protocols approved by the internal review board. After surgery, the tissue samples were shredded, washed with 1× PBS (pre-cooled at 4 °C) until no blood was visible, flash frozen in liquid nitrogen, and stored at −80 °C within 30 min. With the addition of lysis buffer (tissue:buffer, 1 g:10 mL) and protease inhibitor cocktail (1% v/v), each tissue was homogenized at 80 thousand rpm for 2 min, incubated on ice for 30 min and centrifuged at 14,000 g for 30 min at 4 °C; the supernatant was then mixed with cold acetone (1:6, v/v) and incubated at −20 °C for 3 h. After precipitation, the solution was centrifuged at 10,000 g for 10 min. The pellets were collected, fully re-suspended in 8 M urea, and diluted with 50 mM NH4HCO3 to bring the urea concentration below 1 M. The protein concentration was measured by BCA assay.

The diluted proteins were reduced with dithiothreitol (DTT) at 55 °C for 30 min, alkylated with iodoacetamide (IAA) at RT for 30 min in the dark and digested with trypsin (100:1, w/w) at 37 °C overnight. The digests were acidified with TFA to a final concentration of 0.5% (v/v). Tryptic peptides were desalted with C18 SPE columns and resuspended in 80% ACN/5% TFA. The intact N-glycopeptides were enriched using a homemade ZIC-HILIC pipette tip containing 30 mg of ZIC-HILIC particles (Merck Millipore, 5 μm, 200 Å). Briefly, the ZIC-HILIC packing was pre-equilibrated with the washing buffer (0.1% TFA, 80% ACN/5% TFA); the desalted peptide solution was loaded; after washing with 80% ACN/1% TFA, the intact N-glycopeptides were eluted with 300 μL of 0.1% TFA and 100 μL of 50 mM NH4HCO3; the eluates were combined, dried in a SpeedVac, and redissolved in 100 μL of 0.1% TFA.

CH3CHO (20%, w/w) and 13CH313CHO (20%, w/w) were added to equal amounts of intact N-glycopeptides extracted from adjacent normal and cancer pancreatic tissues, respectively, at a ratio of 0.5 μL/1 μg. After shaking for 1 min, an equal volume of 0.6 M NaBH3CN solution was added. The solutions were then incubated for 1 h in a shaker at 37 °C. The reaction was terminated with an equal volume of 4% NH4OH solution. The light- and heavy-labeled samples were mixed at a 1:1 ratio, desalted with C18 SPE columns, dried in the SpeedVac and redissolved in 50 μL of 0.1% TFA [24].

RPLC-MS/MS (HCD) analysis of the 1:1 mixture of isotopic diethyl-labeled intact N-glycopeptides

The intact N-glycopeptides were trapped on a 5 cm long homemade trap column (360 μm o.d. × 200 μm i.d.) and separated on a 70 cm long homemade analytical column (360 μm o.d. × 75 μm i.d.) on a Dionex Ultimate 3000 RSLC nano-HPLC system (Thermo Fisher Scientific). Both columns were packed with C18 particles (Phenomenex, 5 μm, 300 Å). Buffer A was composed of 99.8% H2O and 0.2% FA, and buffer B was composed of 95.0% ACN, 4.8% H2O and 0.2% FA. The flow rate of the mobile phase was set at 3 μL/min for sample loading and 300 nL/min for separation. A multistep gradient was adopted: 2% B for 10 min, 2–40% B in 190 min, 40–95% B in 10 min, 95% B for 5 min, 95–2% B in 5 min, and 2% B for the last 20 min. MS spectra were acquired on a Q Exactive Orbitrap MS (Thermo Scientific) as follows: m/z range 700–2000, mass resolution 70 k (m/z 200), automatic gain control (AGC) target 2 × 10^5 with max ion injection time 50 ms. MS/MS spectra were acquired in the Top20 data-dependent mode with the following settings: mass resolution 17.5 k, AGC target 5 × 10^5 with max ion injection time 250 ms, dynamic exclusion 20.0 s, HCD normalized collision energies 20.0%, 30.0%, and 30.0%, isolation window 3.0 m/z. The ESI conditions were as follows: spray voltage 2.8 kV, capillary temperature 250 °C and S-lens RF level 75. With the aforementioned RPLC-MS/MS settings, four biological replicates (BR1, BR2, BR3 and BR4) were acquired, each with three technical replicates (TR1, TR2, and TR3).

GPSeeker database search and identification of intact N-glycopeptides

Site- and structure-specific database search and identification of intact N-glycopeptides using GPSeeker have been reported in detail elsewhere, and only a brief description is provided here [18,19,20]. First, four customized theoretical intact N-glycopeptide DBs (LF, LR, HF and HR) were created from the combination of two directions (forward, reverse) and two diethyl labels (light, (CH2CH3)2, and heavy, (13CH213CH3)2). Each LC-MS/MS raw dataset was searched individually against the four DBs. The search parameters for precursor and fragment ions are the isotopic peak abundance cutoff (IPACO), isotopic peak m/z deviation (IPMD), and isotopic peak abundance deviation (IPAD), and the adopted values are 40%, 20 ppm and 50%, respectively. GPSMs were screened and output with the following criteria: Y1 ions, Top4; minimum percentage of matched fragment ions for every peptide backbone, ≥10%; minimum matched product ions for every N-glycan moiety, ≥1; TopN hits, N = 2 (the Top1 hit has the lowest P-score). For each dataset, the target and decoy GPSMs from either LF/LR or HF/HR were combined and sorted by increasing P-score, and a cut-off P-score was then selected to obtain intact N-glycopeptide IDs with spectrum-level FDR ≤ 1%.
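The cut-off selection described above can be illustrated with a minimal sketch. It assumes that the FDR is estimated as the number of decoy GPSMs divided by the number of target GPSMs at or below a given P-score; this estimator, the function name `p_score_cutoff` and the input layout are illustrative assumptions and are not taken from GPSeeker.

```python
from typing import List, Optional, Tuple


def p_score_cutoff(gpsms: List[Tuple[float, bool]],
                   max_fdr: float = 0.01) -> Optional[float]:
    """Return the largest P-score cut-off with estimated FDR <= max_fdr.

    gpsms: (p_score, is_decoy) pairs; a lower P-score is a better match.
    FDR is estimated here as decoys / targets (an assumption of this sketch).
    Returns None if no cut-off satisfies the FDR limit.
    """
    ranked = sorted(gpsms, key=lambda g: g[0])  # increasing P-score
    targets = 0
    decoys = 0
    cutoff = None
    for p_score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep the most permissive cut-off that still meets the FDR limit.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = p_score
    return cutoff
```

All target GPSMs with a P-score at or below the returned value would then be retained as intact N-glycopeptide IDs; tie handling among identical P-scores is not addressed in this sketch.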

GPSeekerQuan database search and relative quantification of differentially expressed intact N-glycopeptides (DEGPs)

For every intact N-glycopeptide ID, GPSeekerQuan searches for its paired precursor ion with a mass difference of 4.01344 Da and an isotopic peak m/z tolerance of 20 ppm. For each precursor ion, the summed abundance of the Top3 isotopic peaks was used for relative quantitation. For each intact N-glycopeptide ID, all six isotopic peaks in the pair are required to be observed to obtain the relative ratio (tumor/control). In each BR, at least two ratios are required to be observed among the three TRs; p values were calculated using the t-test [25]. Intact N-glycopeptide IDs with a cancer/normal fold change of no less than 1.5 and a p value of no more than 0.05 were classified as DEGPs.
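The pairing and ratio rules above can be sketched as follows. This is an illustration under stated assumptions, not the GPSeekerQuan implementation: all function names are hypothetical, the 4.01344 Da shift is applied once per precursor as stated in the text, the heavy channel is taken as cancer and the light channel as normal (following the labeling scheme in the sample preparation section), and down-regulation is assumed to be the symmetric case of a ratio no greater than 1/1.5. The p value is taken as an input rather than computed.

```python
from typing import Optional, Sequence

LABEL_MASS_SHIFT = 4.01344  # Da, heavy minus light, as stated in the text


def heavy_mz(light_mz: float, charge: int,
             shift: float = LABEL_MASS_SHIFT) -> float:
    """Expected m/z of the heavy-labeled partner of a light precursor."""
    return light_mz + shift / charge


def within_ppm(observed: float, expected: float,
               tol_ppm: float = 20.0) -> bool:
    """True if the observed m/z lies within tol_ppm of the expected m/z."""
    return abs(observed - expected) / expected * 1e6 <= tol_ppm


def top3_ratio(light_peaks: Sequence[float],
               heavy_peaks: Sequence[float]) -> Optional[float]:
    """Heavy/light (cancer/normal) ratio from summed Top3 isotopic peaks.

    Returns None unless three non-zero isotopic peaks are available in
    both channels, i.e. all six peaks of the pair are observed.
    """
    if len(light_peaks) < 3 or len(heavy_peaks) < 3:
        return None
    top_light = sorted(light_peaks, reverse=True)[:3]
    top_heavy = sorted(heavy_peaks, reverse=True)[:3]
    if min(top_light) <= 0 or min(top_heavy) <= 0:
        return None
    return sum(top_heavy) / sum(top_light)


def is_degp(ratio: float, p_value: float,
            fold: float = 1.5, alpha: float = 0.05) -> bool:
    """Classify as DEGP; the symmetric 1/fold rule is an assumption."""
    return p_value <= alpha and (ratio >= fold or ratio <= 1.0 / fold)
```

For a doubly charged light precursor at m/z 1000.0, for example, the heavy partner would be expected 4.01344/2 m/z units higher.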

Results

After trypsin digestion, ZIC-HILIC enrichment and stable isotopic diethyl labeling, intact N-glycopeptides from pancreatic cancer and adjacent normal tissues were mixed in a 1:1 ratio and analyzed using C18-nanoRPLC-ESI-MS/MS (HCD with stepped NCEs). Three technical replicates (TR1, TR2 and TR3) were acquired for each of the four biological replicates (BRs). The MS-only base-peak chromatograms from BR1 are shown in supplementary Fig. S1. In terms of intact N-glycopeptide IDs, good reproducibility among the three technical replicates of each biological replicate was observed (supplementary Fig. S2).

For the 12 TRs of the four BRs, 20,038 intact N-glycopeptides corresponding to 4518 peptide backbones, 228 N-glycan monosaccharide compositions, 1026 putative N-glycan structures, 4460 N-glycosites and 3437 intact N-glycoproteins were identified (Fig. 1b). In total, 720 N-glycoproteins were identified with more than one N-glycosite, and 3403 N-glycosites are newly discovered in our results and had not been annotated in UniProt as of July 25, 2019 (see supplementary Table S2). For each intact N-glycopeptide ID, detailed tabular information including dataset number, spectral index, retention time, precursor ion (experimental and theoretical m/z, z, IPMD), accession number, peptide sequence, N-glycosite, monosaccharide composition and structure in one-line text format, −log (P score), G-brackets and glycoform (GF) score is provided in supplementary Table S1.

Among the identified glycosylated peptides, 4356 contained one putative N-glycosite on the peptide backbone, each with a single N-X-S/T (X ≠ P) sequon, of which 2587 were confirmed with a G-bracket score of no less than one; in addition, 12 peptides containing two putative N-glycosites were confirmed with a G-bracket score greater than zero. For example, N342 and N347 of ETS-related transcription factor Elf-2 are N-glycosites newly annotated in this study, where intact N-glycopeptides SGKNSSPINCSR_N4H5F0S1 and SGKNSSPINCSR_N3H5F0S1 were identified with G-brackets of singly charged b1, b6* (* = GlcNAc) and b1, b2, respectively.
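As an illustration of how putative N-glycosites can be located on a peptide backbone, the following minimal sketch scans a sequence for the N-X-S/T (X ≠ P) sequon with a regular expression. The function name `find_sequons` is hypothetical, and the positions returned are 1-based within the peptide rather than within the parent protein.

```python
import re
from typing import List

# Lookahead so that overlapping sequons (e.g. in "NNSS") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(peptide: str) -> List[int]:
    """Return 1-based positions of N in every N-X-S/T (X != P) sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(peptide)]
```

Applied to the Elf-2 peptide SGKNSSPINCSR discussed above, the scan reports the asparagines at peptide positions 4 and 9, consistent with two putative N-glycosites on one backbone.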

Among the 20,038 intact N-glycopeptide IDs, 1152 showed more than one glycosylated modification at the same site with GF scores ≥1 and G-bracket score no less than one. For N-glycosite N925 of transcription initiation factor TFIID subunit 4 (O00268, TAF4_HUMAN), intact N-glycopeptide ISETAQQKNFSYK_N3H5F1S0 was identified with fucose sequence/position isomers of 01Y(61F)41Y41M(31M41Y41L)61M61M (core) and 01Y41Y41M(31M41Y41L21F)61M61M (branch) with 3 (BII3,YII3,YII3) and 3 (15AII2,34AII3,CII2) structure-diagnostic fragment ions, respectively.

For N-glycosite N144 of immunoglobulin heavy constant alpha 1 (IGHA1), 28 distinct intact N-glycopeptides were identified with both GF scores >0 and G-bracket scores ≥ 1. IgA is the main immunoglobulin in human secretions, and membrane-bound immunoglobulin acts as a receptor that binds specific antigens in the recognition stage of humoral immunity. Furthermore, we found that most of these glycosylation modifications were identified in the heavy-labeled cancer tissues, indicating a relationship with glycosylation changes of the N-glycoprotein in cancer tissues. Acute phase proteins are associated with various types of cancer and other clinical conditions and may be the result of inflammation. IGHA1 was selected as a blood biomarker in previous reports [26, 27]. In 2018, Tetsuya Terasaki et al. quantitatively compared the plasma proteomes of glioblastoma patients and healthy controls by SWATH mass spectrometry analysis. After validation by quantitative targeted absolute proteomics analysis, eight proteins including IGHA1 were identified as biomarker candidates, with the area under the receiver operating characteristic curve of IGHA1 greater than 0.80.

For N-glycosite N262 on intact N-glycopeptide LGLSFNSISAVDNGSLANTPHLR of decorin, various glycosylation modifications were also identified. Decorin is a protein located in the extracellular region that is thought to be involved in activities affecting fiber formation. Related studies have found that targeted up-regulation of decorin can be used as a feasible anti-cancer treatment. Because decorin exerts a synergistic effect on tumor suppression and chemotherapy, it represents an attractive prognostic marker and is also a potential target for therapeutic anticancer applications. For example, adenovirus-mediated transfer and expression of human decorin cDNA induced apoptosis of xenograft tumor cells in nude mice, and the absence of inhibitory effects on other cells further demonstrated its specificity to tumor cells [28]. In addition, in vitro studies have found that carboplatin and decorin have a synergistic effect on ovarian cancer cells [29]. To further explore the role of decorin in pancreatic cancer, Helmut Friess et al. examined the expression of decorin in normal pancreas and resected tumors by real-time quantitative PCR, Western blot analysis and immunohistochemistry. The study found that the expression of decorin in pancreatic cancer patients was highly up-regulated and that decorin was located in the ECM. However, it also revealed that decorin attenuated the cytostatic effect of chemotherapy drugs on pancreatic cancer cells, so its anti-tumor effect may be offset during chemotherapy. Overall, the expression level of decorin may serve as a useful molecular marker reference [30].

Among the 20,038 intact N-glycopeptide IDs, the N-glycan structures of 10,071 were confirmed with at least one structure-diagnostic fragment ion, i.e., sequence isomers were unambiguously differentiated. For example, from N-glycosite N88 of translocon-associated protein subunit beta, intact N-glycopeptides IAPASNVSHTVVLRPLK with linkages of -01Y41Y41M(31M41Y41L32S)61M61M and -01Y41Y41M(31M41Y41L32S)61M(31M)61M were identified with 37 and 25 structure-diagnostic fragment ions, respectively (Fig. 2).

Fig. 2

Legend of N-glycan and peptide backbone on N88 of intact N-glycopeptides IAPASNVSHTVVLRPLK with linkages of -01Y41Y41M(31M41Y41L32S)61M61M (A) and -01Y41Y41M(31M41Y41L32S)61M(31M)61M (B) annotated with the matched fragment ions (C), (D) and the corresponding MS/MS spectrum of the matched fragment ions (E), (F)

For the N-glycan types among the 20,038 intact N-glycopeptide IDs, the percentages of high-mannose, hybrid and complex N-glycosylation are 39.3%, 18.0% and 42.7%, respectively. Compared with the control group, the proportion of complex N-glycosylation in pancreatic cancer tissues increased while the high-mannose proportion decreased, and no significant change was observed in the proportion of hybrid N-glycosylation (Fig. 3).

Fig. 3

N-glycan type distribution of cancer vs. adjacent pancreatic tissues

With the criterion that all six isotopic peaks of the paired precursor ions be observed in the MS spectra, 4072 intact N-glycopeptide IDs were quantified, and 60 were observed in at least three of the four biological replicates. A quantitative volcano plot of the four biological replicates is shown in Fig. S3. With the additional criteria of a fold change of no less than 1.5 and a p value of no more than 0.05, 52 DEGPs were found, comprising 38 up-regulated and 14 down-regulated intact N-glycopeptides (Fig. 1b). The 38 up-regulated intact N-glycopeptides come from 19 N-glycoproteins and the 14 down-regulated intact N-glycopeptides come from 5 N-glycoproteins (Tables 1 and 2). Tissue factor (TF_HUMAN) displayed the highest up-regulation, followed by FADS3, SOX3, MFAP4, CX058, KCC2G, TSP1, MYOME, NAV3, UBQLN, SERPH, KCNQ5, IGHA1, PKHL1, CATD, ZN546, NAA80, PGS1 and TPP1, whereas translocon-associated protein subunit beta (SSRB_HUMAN) displayed the strongest down-regulation, followed by IF140, PDIA2, DVL3 and HYOU1. For instance, intact N-glycopeptide TMANVSLAFR_N2H5F0S0 from N-glycosite N57 of N-glycoprotein Zinc finger protein 546 (ZN546_HUMAN, Q86UE3) was found to be up-regulated (3.30 ± 0.89) in PC relative to control pancreas tissues (Fig. 5); intact N-glycopeptide ILNHLLLFVNQTLAAHR_N2H6F0S0 from N-glycosite N284 of N-glycoprotein Protein disulfide-isomerase A2 (PDIA2_HUMAN, Q13087) was found to be down-regulated (0.33 ± 0.07) in PC relative to control pancreas tissues (Fig. 4). Figure 6 shows selected unique intact N-glycopeptides (with an RSD cut-off of 40% for the ratios obtained from the four biological replicates) together with their glycosylation modifications found in our quantitative MS/MS N-glycoproteomics analysis.
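The RSD filter mentioned above can be sketched in a few lines. This is an illustrative assumption about how such a filter might be computed: the sample standard deviation is used, the function names are hypothetical, and the text does not state which SD estimator was applied.

```python
from statistics import mean, stdev
from typing import Sequence


def relative_sd(ratios: Sequence[float]) -> float:
    """Relative standard deviation in percent (sample SD / mean * 100)."""
    return stdev(ratios) / mean(ratios) * 100.0


def passes_rsd(ratios: Sequence[float], max_rsd: float = 40.0) -> bool:
    """True if at least two ratios exist and their RSD is within max_rsd."""
    return len(ratios) >= 2 and relative_sd(ratios) <= max_rsd
```

A reported ratio such as 3.30 ± 0.89 corresponds to an RSD of roughly 27% and would therefore pass a 40% cut-off.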

Table 1 Up-regulated N-glycoproteins in pancreatic cancer vs. adjacent tissues (ranked with increasing p values)
Table 2 Down-regulated N-glycoproteins in pancreatic cancer vs. adjacent tissues (ranked with increasing p values)
Fig. 4

Down-regulation (0.33 ± 0.07) of intact N-glycopeptide ILNHLLLFVNQTLAAHR_N2H6F0S0 in pancreatic cancer relative to control tissue; the N-glycosite is N284 on N-glycoprotein Protein disulfide-isomerase A2 (PDIA2_HUMAN, Q13087). (A, B, C), paired precursor ions in the three technical replicates; (D, E) N-glycan and peptide backbone graphical fragmentation maps annotated with the matched fragment ions; (F) The MS/MS spectrum with the matched fragment ions

Fig. 5

Up-regulation (3.30 ± 0.90) of intact N-glycopeptide TMANVSLAFR_N2H5F0S0 in pancreatic cancer relative to control tissue; the N-glycosite is N57 on N-glycoprotein Zinc finger protein 546 (ZN546_HUMAN, Q86UE3). (A, B, C), paired precursor ions in the three technical replicates; (D, E) N-glycan and peptide backbone graphical fragmentation maps annotated with the matched fragment ions; (F) The MS/MS spectrum with the matched fragment ions

Fig. 6

Differentially expressed intact N-glycopeptides in pancreatic cancer vs. adjacent normal tissues

Gene ontology (GO) analysis was carried out using the PANTHER (protein annotation through evolutionary relationship) classification system (http://pantherdb.org/) for the selected differentially expressed intact N-glycopeptides in pancreas tissues (Fig. 7). In terms of molecular function, most of them participate in catalytic activity and binding, while some up-regulated ones also play a role in transporter activity, molecular function regulation, molecular transducer activity, and transcription regulator activity. In terms of cellular component, they are mainly localized to cell or cell part, protein-containing complex, organelle and membrane region. In terms of biological process, the up-regulated ones are mostly involved in biological regulation, metabolic process and cellular process, while the down-regulated ones mostly take part in cellular process, response to stimulus, biological regulation and signaling.

Fig. 7

Gene ontology analysis of the intact N-glycoproteins corresponding to the differentially expressed intact N-glycopeptides in pancreatic cancer vs. adjacent normal tissues. (A) Molecular Function; (B) Cellular Component; (C) Biological Process

Discussion

Cathepsin D

Different differential regulation was observed on the same N-glycosite. For intact N-glycopeptides with the peptide backbone of GSLSYLNVTR on N263 of cathepsin D (CATD_P07339), a series of high-mannose N-glycans with compositions of N2HxF0S0 (x = 4, 5, 6, 7) were up-regulated in the range of 1.74- to 4.37-fold, whereas the N-glycan with the composition of N2H5F0S0 was down-regulated (0.37) in PC relative to control pancreas tissues. Cathepsins are members of a large protease family including cathepsins A, B, C, D, E, F, G, H, K, L, O, S, V, W, and X. They are generally present in acidic cellular organelles, lysosomes and endosomes. Cathepsin D (CatD) is an acid protease active in intracellular protein breakdown and involved in the pathogenesis of several diseases such as cardiovascular disease, osteoporosis, rheumatoid arthritis, atherosclerosis, cancer and possibly Alzheimer disease. Clarifying the mechanism by which cathepsins are involved in the pathogenesis of these diseases, and how to regulate them to develop new treatment strategies, has become an active research topic. CatD has therefore attracted increasing attention in recent years owing to its importance in mediating the lysosomal cell death pathway and in cancer. Cathepsin D secreted by tumor cells was found to help degrade the basement membrane, thereby promoting invasion and metastasis [31]. Several proteomics reports have shown that CatD is overexpressed in some cancer types and is related to poor prognosis [32,33,34]. In 2003, N. Bossard et al. analyzed the prognostic impact of this tumor marker and concluded that CatD is an independent prognostic marker for breast cancer associated with metastatic risk [35]. Similarly, CatD was also identified as a prognostic factor in colorectal cancer [36, 37].
In addition, serum proteomic analyses of lung cancer in recent years found that high levels of anti-CatD autoantibodies in different forms were already present in the serum of patients in the early stages of lung cancer; a new glycosylated CatD isoform was also discovered. The expression of this isoform is closely related to the tumor type, clinical stage, lymph node metastasis and smoking status of patients with lung cancer and affects their survival time, suggesting that it can be used as one of the indicators to evaluate the biological behavior and prognosis of squamous cell carcinoma, adenocarcinoma and small cell lung cancer [38]. In 2020, Junho Kang et al. performed a meta-analysis to assess the relationship between cathepsin D and breast cancer, which showed that a high expression level of cathepsin D is related to poor prognosis of breast cancer. According to the subgroup analysis, CatD could be used as a marker of poor prognosis and serve as a therapeutic target for breast cancer [39]. The expression of cathepsin D was shown to be up-regulated in studies of pancreatic ductal adenocarcinoma (PDAC) [40]. Tatjana Crnogorac-Jurcevic et al. reported that anterior gradient 2 (AGR2) is universally expressed in tumor cells of pancreatic cancer patients in sporadic and familial settings. Furthermore, AGR2 promotes the proliferation of cancer cells in vitro and in vivo through two post-transcriptionally induced proteases, cathepsin B and cathepsin D, which reveals the involvement of cathepsin D in the pathogenesis of pancreatic cancer [41]. Soo-Youn Lee et al. evaluated the clinical usefulness in patients with PDAC of three groups of proteins, CatD, matrix metalloproteinases (MMPs) and tissue inhibitors of MMPs (TIMPs), which had previously been observed to be differentially transcribed and expressed in pancreatic tumors.
The serum levels of CatD and MMP-7 in the PDAC group were significantly higher than those in the control group. The sensitivity of a biomarker panel composed of CA 19–9, CatD and MMP-7 at the cut-off value was significantly improved, which may provide a more effective screening test for PDAC than those currently available.

TPP1

Tripeptidyl-peptidase 1 (TPP1_O14773) was quantified with intact N-glycopeptides FLSSSPHLPPSSYFNASGR_N2H3F1S0/N2H3F1S0 with up-regulation of 2.57 and 2.22 in PC relative to control pancreas tissues. TPP1 is a protease that functions in lysosomes and is highly expressed in bone marrow, placenta, lung, pineal gland and lymphocytes. It cleaves N-terminal tripeptides from its substrates and has weak endopeptidase activity. Mucinous cysts, including intraductal papillary mucinous neoplasms (IPMN) and mucinous cystic neoplasms (MCN), are precursor lesions of pancreatic cancer; if they show high-grade dysplasia or invasive carcinoma (HGD/IC), they should be removed, whereas lesions diagnosed as low-grade dysplasia (LGD) are generally considered benign and are currently recommended to be monitored for malignant progression. Charles S. Craik et al. identified increased aminopeptidase activity in fluid from mucinous cysts and determined that TPP1 plays a major role in this activity [42]. In a subsequent sensitive, targeted proteomics analysis, TPP1 levels were shown to be significantly increased in mucinous cysts compared with non-mucinous cysts. Moreover, TPP1 activity is mainly related to HGD/IC, which is a key factor in determining whether a cyst should be surgically removed.

BGN

Biglycan (BGN_P21810) was identified with a range of up-regulated (1.55–6.46) intact N-glycopeptides with two peptide backbones (LLQVVYLHSNNITK and MIENGSLSFLPTLR) and six monosaccharide compositions (N2H12F0S0, N4H5F1S0, N4H5F2S0, N4H5F1S1, N4H5F1S2 and N4H6F2S0). Biglycan is a class I small leucine-rich proteoglycan (SLRP), belonging to a diverse sub-group of proteoglycans that are involved in matrix organization and regulation of cell growth and signaling. It is expressed in the ECM and serves as a key matrix component and an important signaling molecule [43]. Biglycan influences cancer cell migration by interacting with TLR2 and TLR4 to initiate inflammation in innate immune cells via the NF-κB pathway [44, 45]. Xiaojing Xing et al. showed that biglycan up-regulates the expression of VEGF in colon cancer cells and promotes tumor angiogenesis [46]. In addition, various studies have reported that the up-regulation of biglycan is associated with cell proliferation, cell migration, metastasis and angiogenesis [47,48,49,50]. A recently published paper found that the mRNA expression levels of biglycan were increased compared with normal tissues in bladder, brain and central nervous system, breast, colorectal, esophageal, gastric, head and neck, lung, ovarian and 28 subtypes of cancer [51]. The discovery of increased biglycan expression and its positive correlation with poor patient survival indicates that it can be used as a prognostic marker and as a target for a variety of new cancer therapies [51, 52]. However, biglycan overexpression may reduce tumorigenic potential, thereby modifying tumor proliferation by regulating receptors and cell expression molecules in the tumor microenvironment. In 2001, Christoph K. Weber et al. detected the expression of biglycan in cell lines and tissue samples by Northern blot and immunofluorescence and measured its effect on proliferation.
They concluded that biglycan induces G1-arrest in pancreatic cancer cell lines and may be part of a host defense mechanism that slows the progression of pancreatic tumors [53]. Giuseppe Aprile et al. further explored the prognostic role of biglycan expression in pancreatic cancer and confirmed biglycan expression as a negative prognostic factor [54]. A recent study revealed that negative control of cell migration in highly metastatic pancreatic cancer cells is mediated by induction of biglycan, suggesting that targeting the biglycan-related pathway is a potential therapeutic strategy to interfere with invasion and metastasis in highly metastatic PDAC [55].

NAV3

Neuron navigator 3 (NAV3_Q8IVL0) was quantified with up-regulation (6.26 to 9.71) of intact N-glycopeptides SLGNMTGR_N2HxF0S0 (x = 6, 8). NAV3 is a protein located in the nuclear outer membrane and may be involved in neuron regeneration. NAV3 was previously suggested to be a novel cancer-associated gene that contributes to the pathogenesis of a skin cancer subgroup of basal cell carcinoma and squamous cell carcinoma [56]. In addition, NAV3 undergoes transcriptional induction by epidermal growth factor and localizes at the tips of microtubules, enhancing their growth. By regulating the dynamics of microtubules, NAV3 inhibits the random pattern of cell migration and inhibits the spread of breast cancer, which is consistent with the ability of relatively low NAV3 abundance to predict poor prognosis in cancer patients [57].

Serpin H1

For heat shock protein 47 (HSP47, also called serpin H1), intact N-glycopeptides with compositions of N4H3F1S0 and N2HxF0S0 (x = 5, 6, 7, 8) were found to be up-regulated in the range of 2.61–7.49 fold. HSP47 is an important chaperone for the proper folding and secretion of collagen. A number of studies have shown that HSP47 plays a role in collagen synthesis, preventing collagen aggregation and inducing the hydroxylation of proline and lysine residues [58]. HSP47 is encoded by the SERPINH1 gene, which is located in one of the regions most frequently amplified in human cancers [59]. HSP47 can promote tumor growth and invasion by regulating the ECM network, and may serve as a potential biomarker and therapeutic target [60]. Changes in the expression level of HSP47 are associated with several types of cancer, such as cervical cancer, breast cancer, lung cancer, colorectal cancer and gastric cancer [61,62,63,64]. In a study using immunohistochemistry to verify differential protein expression in a series of surgically removed invasive ductal pancreatic adenocarcinomas, HSP47 was universally expressed in stromal hyperplasia associated with ductal adenocarcinoma, and in 65% of tumor epithelium [65]. Furthermore, HSP47 was found to be positively expressed in pancreatic nonductal neoplasms [66].

PDE4DIP

For myomegalin (phosphodiesterase 4D interacting protein, PDE4DIP), intact N-glycopeptides with the compositions of N2H5F0S0 and N2H6F0S0 were found to be 5.46- and 10.80-fold up-regulated. PDE4DIP functions as an anchor that sequesters components of the cAMP-dependent pathway to the Golgi and/or centrosomes. In 2007, Hideaki Shimada et al. performed serological identification of antigens by recombinant cDNA expression cloning (SEREX) and determined myomegalin to be a new SEREX antigen for esophageal squamous cell carcinoma [67]. Western blot analysis revealed that serum anti-myomegalin antibodies (s-MMGL-Abs) were present in 47% of the investigated patients, and multivariate analysis indicated that the presence of s-MMGL-Abs was significantly associated with a favorable prognosis, suggesting that s-MMGL-Abs may be a useful tumor marker for diagnosis and prognosis. In a recent study, mutant genes including PDE4DIP were observed to be related to the cause of lung cancer through statistical analysis of cancerous protein sequences [68]. Li Gang et al. used whole exome sequencing to reveal the intrinsic gene mutations associated with pancreatic adenosquamous carcinoma (PASC). Susceptibility genes including PDE4DIP were frequently found to be mutated in the germlines of PASC patients, suggesting the biomarker potential of the corresponding proteins [69].

TF

For tissue factor (TF_P13726), intact N-glycopeptides with the compositions of N2H6F0S0 and N2H5F0S0 at N-glycosite N156 were found to be 4.26- and 20.02-fold up-regulated. TF is a transmembrane glycoprotein whose main function is to activate the coagulation cascade. TF is present at very low levels in the blood of healthy individuals; however, overexpression of TF is related to tumor growth, tumor angiogenesis and metastatic potential in many malignant tumors, including breast cancer, lung cancer, gastrointestinal cancers, urogenital cancers, melanomas, gliomas and colorectal cancer [70, 71]. Patients with various types of cancer (including pancreatic cancer, colorectal cancer, and stomach cancer) often experience thrombosis, and components of the coagulation cascade also affect cancer progression [72]. Therefore, TF deserves considerable attention as a determinant of tumor progression. In studies of pancreatic cancer, expression of tissue factor in pancreatic adenocarcinoma was found to be associated with activation of coagulation and reduced survival time [73]. The use of circulating TF levels as a biomarker of cancer thrombosis risk requires further evaluation, and current strategies have applied TF as a target for anti-tumor therapy [74, 75].

TSP1

For thrombospondin-1 (TSP1_P07996), intact N-glycopeptides with compositions of N4H5F0S0, N4H5F0S1 and N5H4F0S0 were found to be up-regulated by 2.58–16.28 fold, which may function as an early negative feedback to restrain pancreatic carcinogenesis. TSP1 is a 450 kDa platelet and matrix glycoprotein located in the extracellular matrix. It may be involved in the endoplasmic reticulum stress response and binds to the cell surface in the presence of extracellular Ca2+ [76,77,78]. Thrombospondin-1 is composed of six domains, which bind to different molecules and participate in various cancer pathways [79]. Thrombospondin-1 acts as a cancer promoter in some pathways [80,81,82] but plays an inhibitory role in others [83, 84], which makes it difficult to separate and distinguish its roles in the mechanism of pancreatic cancer. Studies based on quantitative glycoproteomic analysis have also revealed the important role of TSP1. David M. Lubman et al. applied lectin arrays, TMT isobaric labeling quantification and, in parallel, a label-free method to analyze serum samples in search of reliable biomarker candidates for distinguishing pancreatic cancer from several pancreatic cancer-related disease states and healthy controls. α-1-antichymotrypsin (AACT), thrombospondin-1 (TSP1), and haptoglobin (HPT) were identified as potential markers that are highly complementary to CA 19–9, and the marker panel of AACT, TSP1, HPT, and CA 19–9 showed high diagnostic potential in distinguishing pancreatic cancer from other conditions [85]. Additionally, TSP1 active peptides such as 3TSR showed a significant ability to inhibit glioblastoma and pancreatic cancer and are worth further research [86].

PDIA2

For protein disulfide isomerase A2 (PDIA2_Q13087, also called PDIp), intact N-glycopeptides at N-glycosite N284 with compositions of N4H3F0S0 and N2HxF0S0 (x = 3, 4, 5, 6) were found to be down-regulated by 1.63–3.43 fold. PDIA2 acts as an intracellular estrogen-binding protein and may be involved in regulating the cellular level and biological functions of estrogen in the pancreas. In addition, it may act as a chaperone protein to inhibit aggregation of misfolded proteins [87, 88]. In 2009, Bao Ting Zhu et al. pointed out that the absence of PDIp expression in pancreatic adenocarcinoma may serve as an additional biomarker for pancreatic cancer [89]. Later, in a research paper published by Daniel Ansari et al., PDIA2 was also verified as a down-regulated protein in pancreatic cancer compared with healthy controls [27].

HYOU1

For hypoxia up-regulated protein 1 (HYOU1_Q9Y4L1), alternatively known as ORP-150, intact N-glycopeptides at N-glycosites N830, N869 and N931 carrying high-mannose N-glycans N2HxF0S0 (x = 5, 6, 7, 8) were found to be down-regulated (1.62–7.98 fold) in pancreatic cancer tissues. HYOU1 plays an important role in hypoxia/ischemia and angiogenesis [90]. HYOU1 was reported to be overexpressed in some early research on bladder cancer and on pancreatic and thyroid carcinomas [91, 92]. In 2019, an MS-based study aimed at discovering protein biomarkers in pancreatic cancer also verified HYOU1 as one of the differentially expressed extracellular proteins [93].
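The composition shorthand used throughout this discussion (for example SLGNMTGR_N2H8F0S0) can be handled programmatically. The following is a minimal sketch, not the authors' software. It assumes that N, H, F and S stand for HexNAc, hexose, fucose and sialic acid, and that "high-mannose" means N2HxF0S0 with x from 5 to 9, as the examples in the text suggest. The names `Composition`, `parse_composition`, `is_high_mannose` and `split_glycopeptide` are hypothetical.

```python
import re
from typing import NamedTuple, Tuple


class Composition(NamedTuple):
    """Monosaccharide counts; the letter-to-sugar mapping is an assumption."""
    hexnac: int   # N
    hexose: int   # H
    fucose: int   # F
    sialic: int   # S


# Matches codes such as "N4H3F1S0"; a literal "x" placeholder is rejected.
_PATTERN = re.compile(r"^N(\d+)H(\d+)F(\d+)S(\d+)$")


def parse_composition(code: str) -> Composition:
    """Turn a composition code into integer monosaccharide counts."""
    match = _PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a valid composition code: {code!r}")
    return Composition(*(int(group) for group in match.groups()))


def is_high_mannose(comp: Composition) -> bool:
    """Assumed rule: two HexNAc, five to nine hexoses, no fucose, no sialic acid."""
    return (comp.hexnac == 2 and 5 <= comp.hexose <= 9
            and comp.fucose == 0 and comp.sialic == 0)


def split_glycopeptide(identifier: str) -> Tuple[str, Composition]:
    """Split 'PEPTIDE_NaHbFcSd' into the peptide backbone and its composition."""
    peptide, _, code = identifier.partition("_")
    return peptide, parse_composition(code)


if __name__ == "__main__":
    peptide, comp = split_glycopeptide("SLGNMTGR_N2H8F0S0")
    print(peptide, comp, is_high_mannose(comp))
```

Under these assumptions, the HYOU1 glycoforms N2HxF0S0 (x = 5, 6, 7, 8) are all classified as high-mannose, whereas the sialylated TSP1 composition N4H5F0S1 is not.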

DVL3

For segment polarity protein dishevelled homolog DVL-3 (DVL3_Q92997, also called Dishevelled-3), intact N-glycopeptides LNGTAKGER at N-glycosite N159 with compositions of N2H6F0S0 and N2H8F0S0 were found to be down-regulated (2.50–3.41 fold) in pancreatic cancer. DVL-3 belongs to the dishevelled (DVL) protein family, whose members serve as cytoplasmic scaffold proteins bridging receptors and downstream targets [94]. DVL3 has been found to be abnormally expressed in various tumors, including esophageal squamous cell carcinoma, breast cancer, lung cancer and glioblastoma [95,96,97]. In addition, DVL3 has been shown to participate in both the Wnt/β-catenin pathway and the Notch signaling pathway, influencing cancer progression and chemoresistance and even maintaining stem cell-like characteristics [98, 99]. A recent study showed that DVL3 is a key regulator of colorectal cancer stem cells and chemoresistance, suggesting the potential of targeting DVL3 as a strategy for colorectal cancer treatment. At present, there are still few reports on the mechanism of DVL3 in pancreatic cancer.

In addition to the above-mentioned differentially expressed proteins, others listed in our quantitative results in Tables 1 and 2 also play vital roles in the progression of cancer. Some of the identified proteins are members of protein families involved in inflammation, invasion and metastasis, for which detailed targeted reports may still be lacking. For example, calcium/calmodulin-dependent protein kinase type II subunit gamma (KCC2G_Q13555–6) is a member of the calcium/calmodulin-dependent kinase (CaM-kinase) protein family, which participates in the activation of antiapoptotic signaling pathways, and Franklin RA et al. proposed that inhibition of the CaM-kinases has the potential to sensitize cancer cells [100].

Proteins and their functional interactions form the backbone of cellular mechanisms. To fully understand biological phenomena, their connectivity networks need to be considered. The differentially expressed proteins listed above were therefore further analyzed using the online STRING resource (https://string-db.org/) [101]. A protein-protein association network of the DEGPs was obtained (Fig. 8).

Fig. 8. Association networks of the differentially expressed proteins in pancreatic cancer vs. adjacent normal tissues

Some of the differential N-glycoproteins of pancreatic cancer quantified in this study have also been found in our previous research on other cancer systems. Ubiquilin-like protein (UBQLN_Q8IYU4) and serpin H1 (SERPH_P50454) were also identified as DEGPs in MCF-7 cancer stem cells [22, 23]. All five glycoforms on N25 of ubiquilin-like protein were identified in this study and showed up-regulation trends similar to those in MCF-7 CSCs relative to MCF-7 cells. Intact N-glycopeptide SLSNSTAR_N2H8F0S0 from N-glycosite N120 of serpin H1 was found to be up-regulated in MCF-7 CSCs relative to MCF-7 cells, and a series of five high-mannose intact N-glycopeptides SLSNSTAR_N2HxF0S0 (x = 5, 6, 7, 8, 9) was identified on N-glycosite N120 with a continuous transition from up-regulation at x = 5 to down-regulation at x = 9 in MCF-7/ADR CSCs vs. MCF-7/ADR cells. The differential expression of different glycoforms at the same protein site reflects the complex mechanism of serpin H1 in cancer pathology and may be related to cell drug resistance. In addition, several proteins discovered in this quantitative analysis of PC vs. control, such as cathepsin D, tripeptidyl-peptidase 1 and zinc finger protein 546, were also identified in an earlier report by our group on quantitative structural N-glycoproteomics of differentially expressed N-glycosylation in hepatocellular carcinoma [18]. In that report, the different monosaccharide compositions of the cathepsin D intact N-glycopeptide GSLSYLNVTR changed in different directions, similar to the situation in this article. Moreover, the differential expression of sialic acid linkage isomers of intact N-glycopeptide FLSSSPHLPPSSYFNASGR from tripeptidyl-peptidase 1 reminds us that further work is required on sialic acid linkage isomers of glycoproteins related to pancreatic cancer.

Conclusion

Using cell models, we previously developed site- and structure-specific quantitative N-glycoproteomics for the discovery of putative N-glycoprotein cancer markers at the intact N-glycopeptide level. The site and structure of each N-glycosylation are confirmed with site-determining and structure-diagnostic fragment ions, respectively; most interestingly, we observed site- and structure-specific differential expression of both site isomers (the same N-glycan at different sites of the same protein) and structure isomers (such as sialic acid linkage isomers and fucose position isomers). Here we report a benchmark of this method on clinical tissue samples, with pancreatic cancer as a case study.

With spectrum-level FDR ≤1%, 20,038 intact N-glycopeptides corresponding to 4518 peptide backbones, 228 N-glycan monosaccharide compositions, 1026 putative N-glycan structures, 4460 N-glycosites and 3437 intact N-glycoproteins were identified. With the criteria of ≥1.5-fold change and p value < 0.05, 52 differentially expressed intact N-glycopeptides (DEGPs) were found in pancreatic cancer tissues relative to control, of which 38 were up-regulated and 14 were down-regulated. The aberrant N-glycosylation found in this study is in good agreement with literature results from both proteomics and other orthogonal methods (such as transcriptomics).
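The selection criteria stated above (≥1.5-fold change and p value < 0.05) can be illustrated with a short sketch. It is not the authors' pipeline. It assumes that each intact N-glycopeptide carries a cancer/control abundance ratio and a p value, and that down-regulation is the reciprocal case (ratio ≤ 1/1.5), which would match the way down-regulated folds are reported as values above 1 in the text. The names `Quantified`, `classify` and `reported_fold`, as well as the example values, are hypothetical.

```python
from typing import NamedTuple


class Quantified(NamedTuple):
    glycopeptide: str
    ratio: float     # assumed: abundance in cancer / abundance in control
    p_value: float


def classify(q: Quantified, min_fold: float = 1.5, max_p: float = 0.05) -> str:
    """Label one intact N-glycopeptide as 'up', 'down' or 'unchanged'."""
    if q.p_value >= max_p:
        return "unchanged"          # not statistically significant
    if q.ratio >= min_fold:
        return "up"
    if q.ratio <= 1.0 / min_fold:   # assumed symmetric threshold
        return "down"
    return "unchanged"


def reported_fold(q: Quantified) -> float:
    """Fold change expressed as a value >= 1, whatever the direction."""
    return max(q.ratio, 1.0 / q.ratio)


if __name__ == "__main__":
    # Purely illustrative values, not data from this study.
    examples = [
        Quantified("PEPTIDEA_N2H6F0S0", 4.0, 0.01),
        Quantified("PEPTIDEB_N2H5F0S0", 0.4, 0.02),
        Quantified("PEPTIDEC_N4H5F0S1", 3.0, 0.20),
    ]
    for q in examples:
        print(q.glycopeptide, classify(q), reported_fold(q))
```

Only glycopeptides labeled "up" or "down" would count as DEGPs under these assumptions; the third example fails the p value criterion despite its large ratio.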

By and large, site- and structure-specific quantitative tissue N-glycoproteomics is a capable method for the discovery of putative N-glycoprotein markers in clinical cancer vs. adjacent tissues, although substantial heterogeneity was observed among patients.