Introduction

Glycosylation is widely involved in regulating protein folding, signal transduction, immune recognition, and intracellular transport of nascent proteins [1, 2]. Fucose modifications are catalyzed by thirteen fucosyltransferases (FUTs), among which α-1,6-fucosyltransferase (FUT8) transfers fucose to the innermost N-acetylglucosamine (GlcNAc) of the N-glycan pentasaccharide core [3] to form the core fucosylation (CF) structure. CF modification is closely related to physiological processes [4] and to the regulation of cancer-related processes, including the proliferation of human gastric cancer cells [5], progression of prostate cancer [6], and metastasis of melanoma [7]. A comprehensive understanding of the CF glycoproteome and its regulation is expected to reveal novel biological mechanisms or disease targets.

Although methods for glycopeptide enrichment and mass spectrometry (MS) sequencing [8,9,10,11,12,13,14] have been developed to improve the efficiency of glycopeptide identification, the microheterogeneity of glycoforms still results in low-abundance intact glycopeptides that either fail to trigger fragment spectrum acquisition during tandem mass spectrometry (MS/MS) sequencing or yield only low-quality spectra [15]. The structural complexity of N-glycans can be reduced by glycosidase treatment. PNGase F treatment of intact glycopeptides has been used; however, it removes the difference between CF and non-CF glycoforms [16, 17]. Treatment with a single exo- or endoglycosidase, or a combination of them, which leaves simplified glycopeptides carrying a disaccharide (fucosylated N-acetylglucosamine, FucHexNAc), has been used to study the CF glycoproteome [16, 18,19,20]. In addition, different MS parameters have been evaluated to improve the identification of CF glycoproteins. Examples include the combination of higher-energy collisional dissociation (HCD) and electron transfer dissociation (ETD) [21], the use of stepped normalized collision energy (stepped NCE) [22, 23], and reference to Y1F/Y1 ratios at HCD energies of 20% and lower [24].

In a series of studies, we have focused on the enrichment and identification of CF glycoproteins with increased performance. In 2009, Jia et al. [25] established a neutral loss-dependent MS3 scanning method following lectin enrichment and molecular weight cutoff application with Endo F3 treatment to simplify the glycans. Cao et al. [22] implemented, for the first time, the “stepped NCE” scanning approach to obtain high-quality spectra for glycopeptides and identified a total of 477 CF proteins from human plasma. Zhao et al. [26] used hydrophilic interaction chromatography (HILIC) enrichment followed by serial treatment with Endo H and Endo F3 to obtain the site-specific CF occupation ratios of glycopeptides. However, current approaches remain difficult to perform because of the long experimental duration and the potential quantitative variation introduced by multiple steps of pH adjustment and enzymatic treatment. Both issues are detrimental to the analysis of large-scale clinical samples. We reasoned that the experimental procedure could be further optimized and simplified for quantitative analysis of site-specific CF glycopeptides in large-scale applications by a single-step truncation (SST) strategy, which uses only Endo F3 to eliminate the heterogeneity of N-glycans. In the present work, we first chose the HeLa cell line as a model, evaluated the SST strategy for CF glycopeptide identification and quantification, and then compared it with previously reported methods.

Pancreatic adenocarcinoma, commonly referred to as pancreatic ductal adenocarcinoma (PDAC), accounts for more than 85% of all pancreatic malignancy cases. Only 15-20% of pancreatic cancer patients can undergo radical surgical resection, with an overall five-year survival rate of less than 10% [27, 28]. Studies have shown that dysregulated CF glycoproteins can be used to distinguish healthy individuals and patients with chronic pancreatitis from patients with pancreatic cancer [29] and are associated with metastasis of PDAC [30, 31]. Therefore, we applied our SST strategy to two PDAC cell lines with different gemcitabine sensitivity and to 9 paired pre- and postoperative serum samples from PDAC patients. We found that CF modifications of BCHE, CDH5 and SERPIND1 were associated with poor prognosis in PDAC and may be used as potential prognostic biomarkers.

Results and discussion

Characterization of CF glycopeptides based on the single-step truncation (SST) strategy

We aimed to develop a straightforward and facile strategy for the quantitative analysis of the CF glycoproteome. Immunoglobulins (IgGs), which contain high-mannose glycoforms as well as complex and hybrid N-glycan chains, with biantennary glycans on the EEQYNSTYR and EEQFNSTFR motifs as the dominant glycoforms, were first used to verify the efficiency and feasibility of the SST strategy by matrix-assisted laser desorption/ionization–time of flight (MALDI-TOF) MS (Fig. S1). HILIC enrichment markedly enhanced the glycopeptide signals, indicating that non-glycosylated peptides were effectively removed. Upon Endo F3 treatment, the signals from intact glycopeptides disappeared and peaks from the simplified CF glycopeptides appeared, demonstrating successful cleavage of the peripheral glycans (Fig. 1A).

Furthermore, we compared two approaches to cleaving the glycan chains, namely Endo F3 alone (SST) and sequential cleavage by Endo F3 followed by Endo H (HEE) [22, 26], using total lysate from the HeLa cell line followed by MS-based glycoproteomic analysis. With the SST strategy, a total of 661 CF glycopeptides were identified in three replicates. We also detected 90 glycopeptides bearing only one N-acetylglucosamine (named HN glycopeptides), which may have been generated by the cleavage of non-CF glycans by Endo F3. Additionally, breakage of the O-glycosidic bond between fucose and GlcNAc due to in-source fragmentation may also generate HN glycopeptides. With the HEE approach, a total of 740 CF glycopeptides were identified, similar to the SST strategy, while 1305 HN glycopeptides were detected (Fig. 1B, Fig. S2). Since Endo H cleaves high-mannose and hybrid glycoforms, it added only a limited number of CF glycopeptides (79 here). We therefore reasoned that most peptides with CF modifications carried complex glycoforms, and the 1305 HN glycopeptides indicated that high-mannose and hybrid glycoforms were highly expressed in HeLa cells, which is consistent with a previous report [32].

In summary, the results demonstrated that treatment of glycopeptides with Endo F3 covers the majority of the CF glycopeptides and that it is feasible to obtain the CF glycoproteome using only single-step enzymatic cleavage of the glycoforms.

Fig. 1

Validation of the single-step truncation strategy to profile the CF glycoproteome. A, MALDI spectra of tryptic peptides of IgGs (top), glycopeptides enriched by HILIC (middle), and CF glycopeptides detected after Endo F3 cleavage of the enriched glycopeptides (bottom). The blue squares indicate N-acetylglucosamine, the green circles indicate mannose, the yellow circles indicate galactose, and the red triangles indicate fucose. B, Numbers of CF glycopeptides identified by single-step Endo F3 or sequential Endo F3-Endo H treatment of three biological replicates.

Comparison of three strategies for site-specific CF glycoprotein profiling

Several different schemes have been established and tested by our laboratory or other laboratories [22, 25, 26, 33,34,35] (Table S1). In addition to the HEE approach, another multi-step treatment approach, in which the second enzyme is PNGase F (named the HEP approach), was included in a systematic comparison of qualitative and quantitative analysis of CF glycopeptides in HeLa cells [34, 36].

Both HEE and HEP require two steps of pH adjustment and glycosidase treatment, which are laborious and may introduce quantitation bias. This could be a major challenge when a large cohort of samples is analyzed. Here we performed a thorough comparison of the SST, HEE and HEP approaches with respect to their efficiency and quantitative performance for the CF glycoproteome (Fig. 2A). First, sample treatment takes only about 36 h with the SST strategy, compared with 53 h for HEE and 52 h for HEP. The numbers of identified CF glycosites were close for the SST (772 glycosites) and HEE (715 glycosites) methods, while the HEP method provided 605 identifications (Fig. 2B-C).

We further evaluated the stability of the three strategies for the quantification of CF glycopeptides. The average intra-strategy Spearman correlation coefficients were 0.940, 0.916, and 0.916 in the triplicate analyses for the SST, HEE, and HEP strategies, respectively (Fig. 2D). The SST strategy thus showed the best quantitative stability. The violin plot (Fig. 2E) shows that the SST method had the best reproducibility, with a median CV of 23.02%, and that the HEP method had the lowest reproducibility, with a median CV of 28.54%. For the inter-strategy correlation coefficients, the HEP method exhibited the lowest values, all below 0.8. This was probably caused by the large change in pH (from 4.5 to 8.0) between the Endo F3 and the PNGase F cleavage steps. In addition, we compared the two sequential cleavage strategies (HEE and HEP) in terms of the site-specific CF occupation ratio (CFratio) [26]. The ratios were quite stable among replicates of each strategy but agreed less well between the HEE and HEP strategies (Fig. S3), indicating potential quantification bias caused by treatment with different glycosidases.
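The median CV used above as a reproducibility measure can be illustrated with a minimal sketch. It assumes that the CV of each glycopeptide is the sample standard deviation of its replicate intensities divided by their mean, expressed in percent, and that only glycopeptides quantified in at least two replicates are included; the function names and the intensity values are hypothetical and are not taken from the study.

```python
from statistics import mean, median, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m * 100.0


def median_cv(intensity_table):
    """Median CV over glycopeptides quantified in at least two replicates."""
    cvs = [
        coefficient_of_variation(replicates)
        for replicates in intensity_table.values()
        if len(replicates) >= 2
    ]
    return median(cvs)


# Hypothetical intensities of three glycopeptides in three replicates.
example = {
    "PEPTIDE_A": [100.0, 120.0, 110.0],
    "PEPTIDE_B": [50.0, 50.0, 50.0],
    "PEPTIDE_C": [10.0, 30.0, 20.0],
}

if __name__ == "__main__":
    print(f"median CV = {median_cv(example):.2f}%")
```

Applying `median_cv` to the intensity table of each strategy would give one value per strategy, comparable to the medians reported for Fig. 2E, provided the same filtering of missing values is used.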

Since Endo F3 is reported to act only on complex N-glycoforms, with low activity toward tri- and tetra-antennary glycans [37], we checked for intact glycopeptides with complex glycoforms remaining after the SST method (Fig. S4). Only eight intact glycopeptides with a potential CF group were identified in the three replicates, of which four were tri-antennary and four were tetra-antennary, indicating that the majority of CF-containing glycans were effectively released from the glycopeptides.

Taken together, the SST method was the most time-efficient, with the highest number of identifications and the best quantitative reproducibility. The SST strategy can therefore be used as a powerful tool for the quantification of site-specific CF modifications in large-scale applications.

Fig. 2

Comparison of the three strategies for the simplification and identification of the CF glycoproteome. A, The Endo F3-only cleavage is named SST, the sequential Endo F3-18O-assisted PNGase F cleavage is named HEP and the sequential Endo H-Endo F3 cleavage is named HEE. CF glycopeptide identification results of the three strategies (B) and the overlaps among them (C). D, Correlation analysis of the quantification of CF glycopeptides from three biological replicates by SST, HEE and HEP. E, Coefficients of variation of CF glycopeptide quantitation. (** denotes P < 0.01, ns denotes P > 0.05)

CF glycoproteome differences between gemcitabine-sensitive and gemcitabine-resistant pancreatic cancer cells

A recent study reported that FUT8 may promote drug resistance and cell migration in pancreatic cancer [31]. We therefore first applied the SST method to explore the differences in CF modification between gemcitabine-sensitive BxPC-3 cells and gemcitabine-resistant PANC-1 cells. Our recent proteomics analysis showed that PANC-1 cells expressed higher levels of mesenchymal markers, while BxPC-3 cells showed a more epithelial phenotype [38]. The MS spectra suggested that PANC-1 bears mostly high-mannose and hybrid glycans, while BxPC-3 showed more complex-type N-glycans (Fig. S5A-B). In BxPC-3 cells, 900 CF glycopeptides were identified, far more than the 666 identified in PANC-1 cells (Fig. S5C-D), in good concordance with the glycan patterns. The high correlation among replicate experiments from each cell line indicated good reproducibility (Fig. S5E-F). Principal component analysis (PCA) of the CF glycopeptide intensities clearly separated the two cell lines (Fig. S6A), indicating that the CF glycoproteome discriminates well between them.

Differential analysis identified 287 dysregulated CF glycopeptides, of which 131 were up-regulated and 156 were down-regulated in PANC-1 cells (Fig. S6B-C). Gene Ontology (GO) analysis showed that the differential CF glycoproteins are localized mainly on the cell membrane and have functions such as cell adhesion and signaling (Fig. S6D), and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis indicated that these proteins are closely related to the cell adhesion, ECM-receptor interaction, axon guidance and phosphatidylinositol 3-kinase (PI3K)/AKT signaling pathways (Fig. S6E). The PI3K/AKT pathway may promote metastasis, invasiveness and drug resistance in pancreatic cancer [39, 40], and the axon guidance pathway has been shown in many studies to be closely associated with infiltration, metastasis and poor prognosis in pancreatic cancer [41,42,43]. In summary, CF glycosylation differed widely between the gemcitabine-sensitive and gemcitabine-resistant pancreatic cancer cells.

CF glycoproteome of pre- and postoperative serum from patients with PDAC

To investigate the CF glycoproteins related to the development of PDAC, paired pre- and postoperative serum samples from nine PDAC patients were analyzed at both the CF glycoproteome and the total proteome levels. A problem occurred during MS data acquisition for sample 63 H; thus, we excluded the related results from the subsequent data analysis. We identified 869 CF glycopeptides derived from 456 proteins (Fig. S7A). Spearman correlation analysis of the CF glycopeptides was performed to provide an overview of the relationships among the patients in the cohort and between the pre- and postoperative samples from individual patients (Fig. 3A). For six of the patients, surgery distinctly changed the CF patterns, as reflected by the differing correlation values within the pre- and postoperative groups (Fig. 3). However, for patients 27 and 66, surgery led to only minor alterations in the CF patterns, as reflected by the close correlation of both the pre- and postoperative sera from these two patients with the entire cohort of preoperative samples. Further analysis of the survival status showed that patients 27 and 66 were among the patients with the worst prognosis (Fig. 3B-C, Table S2). These results indicated that the homeostasis of serum CF modification may serve as a prognostic biomarker.
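The per-sample measure plotted in Fig. 3B-C, the average Spearman correlation of a sample with the preoperative sera, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes complete intensity profiles without missing values, assigns average ranks to ties, and leaves a preoperative sample out of its own average. The sample names and intensities are hypothetical; only the Q (preoperative) and H (postoperative) suffixes follow the naming given in the legend of Fig. 3.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


def mean_correlation_with_preop(sample, profiles, preop_samples):
    """Average rho of one sample against all preoperative samples except itself."""
    others = [name for name in preop_samples if name != sample]
    return sum(spearman(profiles[sample], profiles[o]) for o in others) / len(others)


# Hypothetical CF glycopeptide intensities (same glycopeptide order in each list).
profiles = {
    "P1_Q": [1.0, 2.0, 3.0, 4.0],
    "P2_Q": [1.5, 2.5, 3.5, 4.5],
    "P3_Q": [2.0, 1.0, 3.0, 4.0],
    "P1_H": [4.0, 3.0, 2.0, 1.0],
}
preop = ["P1_Q", "P2_Q", "P3_Q"]

if __name__ == "__main__":
    for name in profiles:
        print(name, round(mean_correlation_with_preop(name, profiles, preop), 3))
```

In this toy input the postoperative sample P1_H has a strongly negative average correlation with the preoperative group. A postoperative sample that stays close to the preoperative cohort, as described for patients 27 and 66, would instead give a high value.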

Further exploration of the differentially regulated CF glycopeptides revealed 44 upregulated and 62 downregulated CF glycopeptides in the preoperative group. Clustering analysis clearly discriminated the pre- and postoperative groups (Fig. 3D, Fig. S7D). Again, the postoperative samples from patients 27 and 66 retained most of the preoperative features while gaining part of the features induced by surgery.

The site-specific CF glycopeptide features BCHE_N369_CF (DJNSIITR_2), CDH5_N112_CF (LDREJISEYHLTAVIVDK_5) and SERPIND1_N49_CF (JLSMPLLPADFHK_1), which were elevated in all the preoperative samples and in the postoperative samples from patients 27 and 66, could be used as potential prognostic biomarkers (Fig. 4).
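The feature labels above pack the gene name, the protein glycosite, and the peptide into one string. The short sketch below decodes them. It relies on the convention stated in the legend of Fig. 3 that the letter J replaces N at the glycosite, and it assumes, based only on the three labels shown here, that the numeric suffix is the 1-based position of the glycosite within the peptide. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class CFFeature(NamedTuple):
    gene: str
    protein_site: int   # glycosite position in the protein, e.g. 369
    peptide: str        # peptide sequence with J restored to N
    peptide_site: int   # assumed 1-based glycosite position in the peptide


# Matches labels such as "BCHE_N369_CF (DJNSIITR_2)".
LABEL = re.compile(r"^(?P<gene>\w+?)_N(?P<site>\d+)_CF \((?P<pep>[A-Z]+)_(?P<pos>\d+)\)$")


def parse_cf_feature(label):
    """Split a CF feature label into its parts and restore N at the glycosite."""
    match = LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognized feature label: {label!r}")
    pep, pos = match["pep"], int(match["pos"])
    if not 1 <= pos <= len(pep) or pep[pos - 1] != "J":
        raise ValueError(f"no J at position {pos} of {pep!r}")
    restored = pep[: pos - 1] + "N" + pep[pos:]
    return CFFeature(match["gene"], int(match["site"]), restored, pos)


if __name__ == "__main__":
    for label in (
        "BCHE_N369_CF (DJNSIITR_2)",
        "CDH5_N112_CF (LDREJISEYHLTAVIVDK_5)",
        "SERPIND1_N49_CF (JLSMPLLPADFHK_1)",
    ):
        print(parse_cf_feature(label))
```

After J is restored, each of the three peptides carries an N-X-S/T sequon at the stated position (NNS, NIS and NLS), which supports the assumed reading of the suffix.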

High serum BCHE levels were recently shown to be significantly associated with worsened overall survival (OS) in ovarian cancer [44, 45]. High serum CDH5 levels were found to be significantly associated with poor progression-free survival and OS in metastatic breast [46] and gastric cancers [47]. SERPIND1 was found to be significantly upregulated in preoperative serum exosomes from patients with hepatocellular carcinoma (HCC), and its decrease upon surgery was considered a good prognostic marker for HCC patients [48]. In addition, the serum SERPIND1 level was used to assess the therapeutic benefits in patients with acute lymphoblastic leukemia [49].

We also performed differential analysis of the total proteome. Of the 587 proteins identified (Fig. S7G), 24 were upregulated and 26 were downregulated in the preoperative group. GO analysis indicated that the proteins were localized mostly in the secretory granule lumen and associated with biological processes such as the immune response and coagulation (Fig. S7I-L). In contrast to the fluctuation in their CF levels, the protein levels of BCHE, CDH5 and SERPIND1 did not differ significantly between the pre- and postoperative groups (Fig. 4).

Fig. 3

CF glycoproteome landscape of pre- and postoperative serum from patients with PDAC. (A) Correlation among the CF glycoproteomes (* marks patients with poor prognosis). (B/C) The average Spearman correlation of the CF glycoproteome of each sample with all the preoperative sera (vertical axis) plotted against increasing recurrence-free survival (RFS) (B) or overall survival (OS) (C) (horizontal axis). (D) Differentially expressed CF glycopeptides between the pre- and postoperative groups; Q indicates preoperative and H indicates postoperative. (The heat map row titles are the CF glycopeptide sequences, in which the letter J replaces N at the glycosites.)

Fig. 4

Paired plots of pre- and postoperative serum from PDAC patients at the CF glycopeptide and protein levels. (** denotes P < 0.01, ns denotes P > 0.05)

Conclusion

By comparing different strategies for CF glycoproteome analysis, we presented the facile SST strategy, which provided the best reproducibility with the fewest operational steps and at lower cost. The strategy was evaluated with both cell lines and serum samples from patients with PDAC. The CF glycoproteome revealed distinct differences in the CF modification patterns between the pre- and postoperative serum samples. Further exploration found that consistently high levels of BCHE_N369_CF, CDH5_N112_CF and SERPIND1_N49_CF before and after surgery may serve as potential biomarkers of worse prognosis. We hope to validate and extend our findings in a larger patient cohort as the next step. Nevertheless, our study provides a promising strategy for large-scale CF glycoproteome analysis and supports its clinical practicability.