Introduction

Glycosylation is one of the most important post-translational modifications (PTMs) of proteins, and in fact the vast majority of human proteins are glycosylated. The glycan structures are highly variable and differ not only between different proteins; the same protein usually carries many different glycans on a single site, a phenomenon called microheterogeneity, which gives rise to different protein species [1, 2]. Carbohydrate chains can affect a protein’s solubility and stability, but most importantly the complex structures of oligosaccharides are used for highly specific interactions between cells and protein receptors that are essential for life in multicellular organisms [2,3,4,5,6]. When cancer cells start to grow abnormally, cell-cell communication is severely changed and glycan structures are altered [6,7,8,9,10,11]. The role of protein glycosylation has therefore become a major field of interest in the diagnosis and therapy of cancer. Analysis of patient-specific glycans of glycoproteins is a promising approach for cancer diagnosis [10,11,12,13,14,15].

Early diagnosis of cancer is crucial for success of curative treatment. It was for example shown that biannual screening for hepatocellular carcinoma (HCC), which is the most common type of liver cancer, reduces mortality by 37% [16]. Liver cancer is the second most frequent cause of death from cancer worldwide and its 5-year survival rate is the second lowest among all cancers [17, 18]. In the Western world, the main risk factor for liver cancer is liver cirrhosis (LC), primarily caused by viral hepatitis B or C (HBV or HCV), alcohol abuse, or non-alcoholic steatohepatitis (NASH) [19, 20].

Several biomarkers are currently used to diagnose HCC, but all of them suffer from low specificity [21, 22]. The most important biomarker is alpha-fetoprotein (AFP), a human glycoprotein that is expressed during embryogenesis, giving 65% sensitivity and 89% specificity in a highly genetically homogeneous population [21]. Values in other populations differ considerably. Measurement of the fucosylated AFP glycoform L3 by immunoaffinity electrophoresis improves sensitivity and specificity [21, 23]. Still, a simple and fast new diagnostic method for surveillance of patients at risk for HCC is needed [24]. Several studies report changes of glycosylation in patients with hepatocellular carcinoma [25,26,27,28,29,30,31,32,33,34,35,36,37,38]. For example, Tanabe et al. analyzed tryptic glycopeptides from human serum of HCC patients by lectin affinity chromatography and time-of-flight mass spectrometry. They found 12 HCC marker candidates, including a multifucosylated alpha-1-acid glycoprotein glycopeptide that was significantly increased in HCC patients [27]. Zhu et al. analyzed PNGase F-released and permethylated glycans from immunoprecipitated haptoglobin from the serum of HCC and LC patients by MALDI-QIT-TOF MS. They found bifucosylated haptoglobin increased in HCC patients [30]. Yin et al. analyzed ceruloplasmin from HCC and LC patients after digestion with trypsin and Endo F3 and reported that core-fucosylation of ceruloplasmin is significantly increased in alcohol-related HCC patients compared to alcohol-related LC patients [32, 33].

Glycan structures from glycoproteins can be analyzed by detection of specific glycan structures with antibodies, by chromatography techniques, by mass spectrometry (MS), and by nuclear magnetic resonance (NMR) spectroscopy. Most often, primary structures of glycans are obtained by mass spectrometry. In a recent study, we developed a new technique for analysis of glycoproteins from human plasma by using liquid chromatography combined with mass spectrometry to obtain reliable results for quantification of glycan compositions in a very short time [39]. Ease of use and a short turnaround time are essential for an analytical process intended for clinical use, in terms of both cost and the patient’s comfort.

Fibrinogen is an abundant glycoprotein of human plasma found in concentrations of about 2.5 mg/mL [40]. The three subunits of fibrinogen (Aα, Bβ, and γ) are glycosylated and/or phosphorylated. Both N-glycans and O-glycans are observed.

N-Glycosylation analysis of human plasma fibrinogen has been described by Adamczyk et al., who found 53.0% of the monosialylated biantennary complex-type N-glycan and 32.6% of the disialylated one [41]. A monofucosylated, monosialylated biantennary glycan was reported at 4.4%, and 4.3% was assigned to the non-sialylated, non-fucosylated biantennary glycan [41]. N-Glycan sialylation affects the clotting behavior of fibrinogen and is, thus, important for formation of fibrin structures [42].

Zauner et al. analyzed tryptic and proteinase K digested peptides of fibrinogen by nanoLC-ESI-ion trap-MS and found O-glycosylated peptides of the Aα-subunit carrying the sialyl-T-antigen (GalNAc-Gal-NeuAc). These O-glycopeptides were proposed as markers for urinary tract infection [43].

Phosphorylation of the Aα-subunit has been described at several sites, of which Ser-3 and Ser-345 are the most dominant [44,45,46,47,48,49]. In ovarian cancer patients, elevated levels of the phosphorylated Aα-subunit were observed and quantified by several analytical techniques [50].

Two frequent genetic variants (single nucleotide polymorphisms, SNPs) of fibrinogen are known: rs6050 leads to the amino acid exchange Thr312Ala on the Aα-subunit and rs4220 leads to the amino acid exchange Arg448Lys on the Bβ-subunit [40]. Both SNPs were associated with increased risk for vascular diseases [51,52,53,54,55,56].

The method based on mass spectrometry published by us provides quantification of glycosylation, phosphorylation, and of SNPs of fibrinogen within a few hours and involves a simple procedure and is thus well suited for clinical diagnostics [39]. To determine the glycosylation and phosphorylation patterns of fibrinogen and to relate those to pathological states, we analyzed samples from HCC patients, LC patients, and healthy donors.

Experimental section

Plasma samples

Plasma samples from healthy donors were provided by the Zentralinstitut für Transfusionsmedizin GmbH, Hamburg. Blood from HCC and LC patients was drawn and immediately centrifuged at 3500 rpm for 10 min. Supernatant plasma was frozen and stored at −80 °C. Prior to the blood draw, informed consent was obtained and analysis of blood samples to detect new biomarkers for HCC was approved by the local Ethics Committee of the Medical Chamber Hamburg (PV3578).

Purification of fibrinogen

Fibrinogen was precipitated from 400 μL plasma by addition of cold ethanol to yield a final concentration of 8% (v/v). The sample was centrifuged at 5000 rpm, −5 °C for 10 min. The pellet was washed with 8% ethanol (v/v) and dissolved in 100 μL water containing urea (6 M) and dithiothreitol (100 mM). The sample was incubated for 20–30 min at 37 °C for denaturation and reduction of disulfides and subsequently diluted 10-fold with water containing 0.1% (v/v) formic acid.
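The volumes and concentrations implied by this protocol can be checked with a short calculation. The sketch below assumes that absolute ethanol is added and that volumes are additive, neither of which is stated in the text; the function names are illustrative only.

```python
def ethanol_volume_ul(sample_ul: float, final_fraction: float) -> float:
    """Volume of pure ethanol giving `final_fraction` (v/v), assuming additive volumes."""
    return sample_ul * final_fraction / (1.0 - final_fraction)


def diluted(conc: float, fold: float) -> float:
    """Concentration after a `fold`-fold dilution."""
    return conc / fold


ethanol_ul = ethanol_volume_ul(400.0, 0.08)  # ethanol for 400 uL plasma at 8% (v/v)
urea_final_m = diluted(6.0, 10.0)            # urea after the 10-fold dilution
dtt_final_mm = diluted(100.0, 10.0)          # dithiothreitol after the 10-fold dilution

print(f"{ethanol_ul:.1f} uL ethanol, {urea_final_m:.1f} M urea, {dtt_final_mm:.0f} mM DTT")
```

Under these assumptions, roughly 35 μL of ethanol is needed per 400 μL of plasma, and the injected solution contains 0.6 M urea and 10 mM dithiothreitol.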

Liquid chromatography

Ten to 100 μL of the samples (depending on the amount of fibrinogen) was subjected to the LC-MS analysis (we usually started with 20 μL sample and used more in rare cases where the S/N of the mass spectrum was not sufficient; in case of overloaded detectors, we reduced the amount to 10 μL). Liquid chromatography was performed on a Dionex Ultimate 3000 HPLC with an Aeris Widepore XB-C8 (250 × 2.1 mm, particle size 3.6 μm) column. Samples were run at 60 °C and a flow rate of 250 μL/min. Solvent A was water containing 0.1% formic acid and solvent B was acetonitrile containing 0.1% formic acid. The gradient was as follows: 0 min, 5% B; 5 min, 5% B; 82 min, 60% B; 84 min, 90% B; 87 min, 90% B; 89 min, 5% B; 90 min, 5% B. Fibrinogen subunits Aα, Bβ, and γ elute at 39.3–41.4, 42.0–43.8, and 46.7–49.3 min, respectively.
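To relate the elution windows to the solvent composition, the gradient table can be interpolated linearly. The following is a minimal sketch that returns the programmed composition only; it ignores the dwell volume of the system, which is not given in the text, so the composition actually reaching the column at a given time will lag behind these values.

```python
# Gradient breakpoints (time in min, % solvent B) as listed in the text.
GRADIENT = [(0, 5), (5, 5), (82, 60), (84, 90), (87, 90), (89, 5), (90, 5)]


def percent_b(t_min: float) -> float:
    """Linearly interpolate the programmed %B at time t_min."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-90 min program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable")


# Elution windows of the three subunits as reported in the text.
WINDOWS = {"A-alpha": (39.3, 41.4), "B-beta": (42.0, 43.8), "gamma": (46.7, 49.3)}
for name, (start, end) in WINDOWS.items():
    print(f"{name}: {percent_b(start):.1f}-{percent_b(end):.1f} % B (programmed)")
```

The main ramp rises by 55% B over 77 min, so all three subunits elute on the shallow part of the gradient.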

Mass spectrometry

On-line detection was performed on an ESI-qTOF mass spectrometer (maXis, Bruker Daltonics) in positive ion mode. Calibration was performed with a tuning mixture (Tuning Mix ES-TOF, Agilent Technologies) in enhanced quadratic mode. The following acquisition parameters were applied: capillary voltage, 4500 V; end plate offset, 500 V; nebulizer pressure, 3.0 bar; dry gas flow, 9.0 L/min; dry gas temperature, 190 °C; quadrupole low mass, 1200 m/z; in source CID, 40 eV; mass range, 500–4000 m/z; spectra rate, 0.5 Hz.

Data processing

Chromatographic runs were analyzed in the area where the subunits (Aα, Bβ, and γ) eluted using the software Compass DataAnalysis (Bruker Daltonics). Spectra were averaged in each respective area and deconvoluted using the maximum entropy algorithm implemented in Compass DataAnalysis.

Quantification

Maximum entropy deconvoluted spectra were exported in “.xy” format and imported to MATLAB 2012a (The MathWorks, Inc.). The different protein isoforms were quantified by fitting the theoretical isotopic patterns to the experimental spectrum using a MATLAB script. For a detailed explanation of the quantification algorithm, see the Electronic Supplementary Material (ESM).
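The general idea of such a fit can be sketched as a linear least-squares problem in which the measured spectrum is modeled as a weighted sum of one profile per isoform. The sketch below is not the MATLAB script used in this work (that is described in the ESM): it uses Gaussian envelopes as stand-ins for the theoretical isotopic patterns, and the mass axis, peak positions, and peak width are hypothetical.

```python
import math


def profile(center, grid, width=3.0):
    """Gaussian stand-in for a theoretical isotopic envelope centred at `center`."""
    return [math.exp(-0.5 * ((m - center) / width) ** 2) for m in grid]


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_fractions(grid, spectrum, centers):
    """Least-squares amplitude of each isoform profile, returned as percentages."""
    basis = [profile(c, grid) for c in centers]
    ata = [[sum(p * q for p, q in zip(bi, bj)) for bj in basis] for bi in basis]
    atb = [sum(p * s for p, s in zip(bi, spectrum)) for bi in basis]
    amps = solve(ata, atb)
    total = sum(amps)
    return [100.0 * a / total for a in amps]


# Hypothetical example: three species spaced 80 Da apart on an arbitrary mass axis.
grid = [1000.0 + 0.5 * i for i in range(500)]
centers = [1050.0, 1130.0, 1210.0]
true_amps = [5.0, 3.0, 2.0]
spectrum = [sum(a * p for a, p in zip(true_amps, col))
            for col in zip(*(profile(c, grid) for c in centers))]

fractions = fit_fractions(grid, spectrum, centers)
print([round(f, 1) for f in fractions])
```

With noisy data, an unconstrained fit of this kind can return small negative amplitudes, so a non-negativity constraint may be preferable in practice.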

Statistical analysis

Statistical analyses were performed with MATLAB 2012a and SPSS Statistics 22 (IBM Corp.).

Results

HPLC analysis and mass spectrometry

Fibrinogen is predominantly synthesized in the liver. Therefore, we assumed that liver diseases should affect the PTMs of fibrinogen. The quantities and identity of the PTMs of fibrinogen were analyzed by mass spectrometry. To achieve this, we isolated fibrinogen from human blood plasma of healthy donors, HCC, and LC patients by precipitation with alcohol. Fibrinogen consists of three subunits, Aα, Bβ, and γ, that are glycosylated and/or phosphorylated. The protein subunits were separated by chromatography and analyzed as intact species by mass spectrometry to identify and quantify their PTMs.

The different isoforms were quantified by fitting the theoretical isotopic patterns to the experimental spectrum (Fig. 1). For a detailed explanation of the quantification procedure, see the ESM. The resulting percentages of each isoform of every subunit were statistically analyzed.

Fig. 1

Comparison of the experimental, the calculated, and the difference curve with the associated glycan structures is shown; black, experimental deconvoluted mass spectrum; blue, calculated mass spectrum; gray, difference. A Aα-subunit from donor HE13; B Bβ-subunit from donor HE13; and C γ-subunit from donor HCC19. Mass spectrometry on the level of top down analysis can only determine composition of glycans. However, the body of literature available indicates the presence of the structures listed above

For each individual subunit, examples of spectra are shown in Fig. 2 for two healthy donors (HE13 and HE17), for two HCC patients (HCC11 and HCC19), and for two LC patients (LC09 and LC11) (for sample details, cf. ESM Table S1). Direct comparison of these spectra of each subunit already shows differences between the three patho-physiological states.

Fig. 2

Mass spectra of the Aα- (left), Bβ- (middle), and γ-subunit (right) from human fibrinogen of six selected individuals are shown: two from healthy donors (a), (b); two from HCC patients (c), (d); and two from LC patients (e), (f). The Aα-subunits show three major signals from non-, mono-, and diphosphorylated species and three minor signals from the corresponding O-glycosylated species. Duplication of all signals in the spectra (b), (d), and (f) is caused by the single nucleotide polymorphism (SNP) rs6050, which results in the amino acid exchange T312A and lowers the mass by 30 Da. Comparing healthy, LC, and HCC patients, differences in the amount of phosphorylation are obvious from peak intensities. The Bβ- and γ-subunit each show signals from six glycoforms (structures in right hand panel). Duplicated signals in Bβ-spectra (a), (d), (e), and (f) are due to the SNP rs4220 resulting in amino acid exchange R448K. The abbreviations for these structures, e.g., G2, G1S1, and so on, can be found at the top of the dotted lines indicating each glycoform in the spectra of the Bβ- and γ-subunit. Additional signals arise from partial oxidation of methionines and loss of the C-terminal glutamine. High amounts of fucosylation are observed in spectra (d) and (e). Loss of the C-terminal valine from the Aα-subunit is observed in spectrum (e)

Both the Bβ- and the γ-subunits from human fibrinogen are N-glycosylated with biantennary complex-type glycans [40, 41, 57, 58]. Each of the subunits shows six different glycan compositions whose tentative structures are shown in detail in the right hand panel of Fig. 2. Glycan structures cannot be determined by mass spectrometry alone; the structures shown are the most likely assumptions based on biological and previous analytical work. In spectra of the Bβ-subunit from HE13, HCC19, LC09, and LC11, all signals are duplicated due to the heterozygous SNP rs4220 on the FGB gene that causes the amino acid exchange R448K.

The two major glycoforms of the Bβ- and γ-subunit are S1 and S2. In samples of healthy individuals, like HE12 and HE17, the signal of the monosialylated glycoform S1 is larger than that of S2, but in many spectra from HCC and LC patients the signal of the disialylated glycoform S2 becomes dominant, as in HCC19 and LC09. This indicates that sialylation is increased in HCC and LC patients.

Also, the degree of fucosylation varies between individuals. The Bβ- and the γ-subunit of HCC19 and LC09 both show large amounts of the fucosylated glycoform S1F, while the other spectra only show a minor signal of this glycoform (cf. below for a statistical analysis). However, glycoform S2F is only found in LC09 in considerable amounts.

Furthermore, we observed changes in the Aα-subunit. Three intense signals on the left hand side of the spectra originate from non-, mono-, and diphosphorylated species. On the right hand side, three smaller signals are observed, arising from the corresponding O-glycosylated glycoprotein species [59]. The amount of monophosphorylation and diphosphorylation differs drastically between the spectra of HCC11 and LC11.

It has to be mentioned, though, that phosphorylation is constantly added and removed in a living organism; changes in phosphorylation may thus alternatively reflect a physiological condition unrelated to liver disease.

Furthermore, sulfation produces nearly the same mass difference as phosphorylation. However, since phosphorylation is extensively described for the Aα-subunit of human fibrinogen, we assume phosphorylation to be responsible for the signals observed.
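How close the two modifications are in mass can be illustrated with a short calculation from standard atomic masses. The intact-protein mass used to express the difference in ppm is a round, assumed figure for illustration and not a value from this work.

```python
# Standard atomic masses in Da: (monoisotopic, average).
ATOMS = {
    "H": (1.007825, 1.00794),
    "O": (15.994915, 15.9994),
    "P": (30.973762, 30.973762),
    "S": (31.972071, 32.065),
}


def shift(formula, kind):
    """Mass added by a modification; kind 0 = monoisotopic, kind 1 = average."""
    return sum(ATOMS[element][kind] * count for element, count in formula.items())


PHOSPHO = {"H": 1, "P": 1, "O": 3}  # net addition of HPO3
SULFO = {"S": 1, "O": 3}            # net addition of SO3

mono_diff = shift(PHOSPHO, 0) - shift(SULFO, 0)
avg_diff = shift(PHOSPHO, 1) - shift(SULFO, 1)

# Assumed round figure for an intact subunit mass, for illustration only.
ASSUMED_PROTEIN_MASS = 66_000.0

print(f"monoisotopic: HPO3 {shift(PHOSPHO, 0):.4f} Da, SO3 {shift(SULFO, 0):.4f} Da")
print(f"difference: {mono_diff:+.4f} Da (monoisotopic), {avg_diff:+.4f} Da (average)")
print(f"average-mass difference: {abs(avg_diff) / ASSUMED_PROTEIN_MASS * 1e6:.1f} ppm")
```

Both differences are well below 0.1 Da, which is why the two modifications are not distinguished by the mass of the intact subunit alone.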

In spectra of the Aα-subunit from HE17, HCC19, and LC11, all signals are duplicated due to the amino acid exchange T312A that is caused by the heterozygous SNP rs6050 on the FGA gene.
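The mass shifts that cause the duplicated signals follow directly from the residue masses of the exchanged amino acids. As a small worked example using standard average residue masses (the helper name is illustrative):

```python
# Standard average residue masses in Da (amino acid minus water).
RESIDUE_MASS = {"T": 101.1051, "A": 71.0788, "R": 156.1875, "K": 128.1741}


def exchange_shift(old: str, new: str) -> float:
    """Change in protein mass when residue `old` is replaced by residue `new`."""
    return RESIDUE_MASS[new] - RESIDUE_MASS[old]


t312a = exchange_shift("T", "A")  # rs6050 on the A-alpha subunit
r448k = exchange_shift("R", "K")  # rs4220 on the B-beta subunit

print(f"T312A: {t312a:+.2f} Da, R448K: {r448k:+.2f} Da")
```

In a heterozygous individual, each signal of the affected subunit is therefore accompanied by a second signal about 30 Da (T312A) or 28 Da (R448K) lower in mass.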

To verify the identity of these post-translational modifications, we also performed other experiments. We digested the isolated fibrinogen with sialidase or phosphatase in order to observe the expected mass shift in the mass spectra (see the supplementary material of [39]).

We also performed various LC-MS/MS experiments on the glycopeptides of fibrinogen to analyze the N-glycan structure [39]. The mono- and disialylated glycopeptides (36VDKDLQSLEDILHQVEN(Glycan)K53) from the γ-subunit of human fibrinogen were identified by their fragmentation pattern in the MS/MS spectra. In healthy samples, we found 34.5% of the disialylated glycopeptide, which is in good agreement with the data from Adamczyk et al. [41].

We also found the phosphopeptides 1ADS(PO4)GEGDFLAEGGGVR16 and 335PGSTGTWNPGS(PO4)SER348 with high intensity. This is in agreement with literature data that phosphorylation sites Ser-3 and/or Ser-345 are occupied in the mono- and diphosphorylated species of the Aα-subunit. We also found minor signals of the phosphorylated peptides at Ser-272, Thr-393, and Thr-486. Phosphorylation at these sites could give rise to the minor amount of the triphosphorylated Aα-subunit that is observed in the mass spectra of the intact protein. Signals of O-glycosylated peptides (GalNAc-Gal-NeuAc) of the Aα-subunit were observed as well, but not further analyzed due to low intensity.

Quantitative analysis of post-translational modifications

In total, we analyzed 111 samples, of which 30 were from healthy donors, 42 were from HCC patients, 33 were from LC patients, and 6 were from patients with other unrelated diseases. Single factor analyses of variance (ANOVA) and Tukey’s post hoc tests were performed for all 32 quantified protein species to determine differences between healthy donors, HCC, and LC patients. The quantity of the corresponding isoform was used for the multivariate analyses if p < 0.05. Boxplots of the 12 dominant proteoforms of the Aα-, Bβ-, and γ-subunit are shown in Fig. 3. The corresponding p values are given in Table 1.
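For readers unfamiliar with single factor ANOVA, the test statistic compares the variance between group means with the variance within groups. The sketch below computes only the F statistic and its degrees of freedom; the input values are made up for illustration and are not data from this study. The analyses reported here were run in MATLAB and SPSS.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of values."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up percentages of one proteoform in three groups (not data from the study).
healthy = [10.0, 11.0, 12.0]
hcc = [14.0, 15.0, 16.0]
lc = [18.0, 19.0, 20.0]

f_stat, df_b, df_w = one_way_anova_f([healthy, hcc, lc])
print(f"F({df_b}, {df_w}) = {f_stat:.1f}")
```

The p value follows from the F distribution with these degrees of freedom; a statistics package (for example `scipy.stats.f_oneway` in SciPy) would report it directly, and a post hoc test such as Tukey's is then needed to identify which pairs of groups differ.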

Fig. 3

Boxplots for the percentages of the 12 dominant phospho- and glycoforms (Aα-no PO4, Aα-1 × PO4, Aα-2 × PO4, Aα-O-glycosylated, Bβ-S1, Bβ-S1F, Bβ-S2, Bβ-S2F, γ-S1, γ-S1F, γ-S2, and γ-S2F) for healthy donors, HCC, and LC patients. The red line indicates the median; the blue box shows the interquartile range. Corresponding p values of Tukey’s post hoc tests are given in Table 1. The fucosylated glycoforms γ-S1F and Bβ-S1F are significantly increased in HCC and LC patients compared to healthy donors. The monosialylated glycoform γ-S1 is significantly decreased in HCC patients compared to healthy donors; LC patients show a wide scatter. The complete set of boxplots can be found in the ESM

Table 1 p values of Tukey’s post hoc tests for the 12 dominant phospho- and glycoforms of human fibrinogen. Significant p values (< 0.05) are shown in italics

The results revealed that glycosylation and phosphorylation change in a complex pattern due to the physiological state of each individual. Four general patterns of changed post-translational modifications were observed:

  • First: Phosphorylation

    (a) The non-phosphorylated state of the Aα-subunit is found more often in HCC patients compared to healthy donors (p = 0.007) while diphosphorylation is decreased (p = 0.026) indicating decreased phosphorylation in liver cancer.

    (b) In LC patients, however, the monophosphorylated state is significantly increased compared to healthy donors and HCC patients (p < 0.001).

  • Second: O-glycosylation

O-Glycosylation is decreased in LC patients compared to healthy donors and HCC patients (p < 0.001).

  • Third: Sialylation of N-glycans

For both the Bβ- and the γ-subunit, the amount of the monosialylated glycoforms S1 is decreased in HCC samples compared to healthy samples (p = 0.019 and 0.001, respectively). In agreement with this observation, glycoform S2 is increased. LC patients show an extremely broad distribution of the quantities of the S1 form in both subunits.

  • Fourth: Fucosylation of N-glycans

The fucosylated glycoforms S1F of the Bβ-subunit and of the γ-subunit are significantly increased in HCC and LC samples compared to healthy samples, with LC samples showing the highest increase for both subunits (p < 0.007). A similar increase is observed for the glycoform S2F of the Bβ- and γ-subunits.

Interestingly, elevated levels of the phosphorylated Aα-subunit were also observed in ovarian cancer patients [50].

In agreement with data on other glycoproteins, we find enhanced fucosylation in the form of the S1F glycan on the Bβ- and γ-subunits. Similarly, S2F is also found at elevated levels in HCC on the two subunits [27, 30, 33].

We performed a principal component analysis (PCA) using the quantities of 12 protein species differing in phosphorylation and glycans (cf. Fig. 3) as variables. In terms of separating the three groups, we found that the first and the third dimensions of the PCA are much more important than the second (and others). Therefore, we show only a two-dimensional plot of the first versus the third principal component in Fig. 4. The separation of the healthy samples from the rest is only achievable if the third dimension is considered. Importantly, data from healthy individuals have high values in principal component 1 (> −0.03) and high values in principal component 3 (> −0.02). Data points from HCC and LC patients are scattered over a much larger area than those from healthy individuals. In the PCA plot, we can thus describe a region in which all healthy individuals are closely clustered. Of the 105 samples considered here, 30 originated from healthy individuals and the remaining 75 from either HCC or LC patients. We find seven HCC patients and two LC patients as false negatives in the region of healthy individuals. Thus, 66 of the 75 sick individuals (88%) are correctly classified, corresponding to a sensitivity of 88% and, since no healthy individual falls outside this region, a specificity of 100%.
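The sensitivity and specificity quoted for the PCA follow from the counts given in the text, as this short calculation shows (the function name is illustrative):

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


n_healthy, n_diseased = 30, 75  # group sizes from the text
false_negatives = 7 + 2         # HCC and LC samples inside the healthy region
false_positives = 0             # no healthy sample outside the healthy region

sens, spec = sensitivity_specificity(
    tp=n_diseased - false_negatives,
    fn=false_negatives,
    tn=n_healthy - false_positives,
    fp=false_positives,
)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```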

Fig. 4

Plot of the first and third principal component of the principal component analysis using the quantities of 12 phospho- and glycoforms as variables. Green diamonds, healthy donors; blue circles, HCC patients; black crosses, LC patients. Data points of healthy donors are clustered in the upper right part of the plot. Data points of HCC and LC patients are each scattered over a larger area. Seven data points of HCC patients and two data points of cirrhosis patients are located in the area of healthy individuals. No healthy person is classified as having liver disease

PCA does not use information about the groups that are to be separated. In contrast, multivariate analysis of variance (MANOVA) calculates a set of canonical variables to give the best separation between the target groups, here HCC, LC, and healthy, as each sample is initially assigned to a target group. We performed a MANOVA using the same 12 variables as before; the first and second canonical variables are plotted in Fig. 5. Data points of healthy donors are located in the lower left area of the plot, data points of HCC patients are located in the central upper part of the plot, and data points of LC patients are located in the lower right part of the plot. Though the respective areas do overlap, there is a clear separation between the three groups.

Fig. 5

Alternative analysis of the data by MANOVA. Shown is a plot of the first and second canonical variables from the MANOVA using the 12 phospho- and glycoforms as variables. Green diamonds, healthy donors; blue circles, HCC patients; black crosses, cirrhosis (LC) patients. Data points of healthy donors are clustered in the lower left part of the plot. Six data points of HCC patients are also located in the area of healthy donors. Two additional data points of HCC patients and one of an LC patient lie on the boundary. A trend toward separation of data points from HCC and LC patients is visible, with five LC patients lying in the area of HCC patients and three HCC patients located in the area of LC patients

If we discriminate only between healthy individuals and LC or HCC patients, the MANOVA misclassified no healthy individual as sick, but seven HCC patients and one LC patient were classified as healthy. This leads to a specificity of 100% and a sensitivity of 92%.

Splitting data into training and test data sets

We also tested if a statistically valid model could be obtained by splitting the data into two independent groups. For that, we used 56 samples obtained chronologically first as a training data set, locked the MANOVA parameters obtained, and subsequently used the next 55 samples as validation data set. Coordinates of the second set of 55 samples were calculated using the coefficients from the MANOVA analysis of the first 56 samples. The results are shown in Fig. 7. Seven out of ten samples from healthy donors are correctly classified. Two more are found close to the border between healthy and HCC patients. The remaining sample is clearly misassigned in the area of HCC patients. In the validation dataset, separation between samples of HCC and LC patients is not convincing (cf. Fig. 7 bottom panel).
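The validation scheme can be sketched in two steps: split the samples in acquisition order, then score the later samples with coefficients that were fixed on the earlier ones. The sketch below illustrates only this bookkeeping; it does not compute MANOVA coefficients, and the centring on training means, the weights, and the sample values are assumptions made for illustration.

```python
def chronological_split(samples, n_train):
    """First n_train samples (in acquisition order) train the model; the rest validate it."""
    return samples[:n_train], samples[n_train:]


def project(sample, coefficients, training_means):
    """Canonical scores of one sample using weights and means fixed on the training set."""
    centred = [x - m for x, m in zip(sample, training_means)]
    return [sum(w * x for w, x in zip(weights, centred)) for weights in coefficients]


# 111 placeholder sample identifiers in acquisition order.
sample_ids = list(range(1, 112))
training_ids, validation_ids = chronological_split(sample_ids, 56)
print(len(training_ids), len(validation_ids))

# Hypothetical two-variable example: two canonical variables, one weight vector each.
locked_coefficients = [[1.0, 0.0], [0.5, -0.5]]
locked_means = [1.0, 1.0]
scores = project([3.0, 1.0], locked_coefficients, locked_means)
print(scores)
```

Keeping the coefficients and means fixed ensures that no information from the validation samples enters the model.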

One patient diagnosed with HCC (V24) is located in the area of healthy individuals and is thus a false negative call. We obtained two further samples from this patient at later stages of disease progression (V32 and V29, cf. ESM Table S5). The samples V32 and V29 classify in the LC and HCC region, respectively.

In summary, this analysis shows that we are not yet able to derive a statistically robust model. We strongly believe that a much larger sample size for training will overcome this problem. One might also include other parameters and/or glycoproteins in this analysis to stabilize the results.

Discussion

We analyzed the quantities of the various phosphorylated and O-glycosylated species of the Aα-subunit and the quantities of N-type glycoforms of the Bβ- and of the γ-subunit of human fibrinogen. The intact subunits of fibrinogen were isolated from human plasma by ethanol precipitation followed by reversed-phase liquid chromatography coupled with mass spectrometry, which was used for quantification. We showed recently that phospho- and glycoforms can be very well quantified from mass spectra of intact protein subunits as the ionization of proteins is essentially dependent on the number of lysine and arginine residues [39, 60]. We developed the method utilized here to quantify phosphorylation and glycosylation species of Aα-, Bβ-, and γ-subunits of fibrinogen in healthy individuals [39]. Now we have extended this study to patients with HCC and with LC. Glycosylation of glycoproteins has been known for decades to be altered if a tumor is present in the body [6,7,8,9,10,11]. To achieve a high degree of specificity, we analyzed a glycoprotein that is predominantly synthesized in the liver, i.e., fibrinogen, to determine its PTMs as a function of the patho-physiological state. We hypothesized that glycoproteins whose biosynthesis occurs predominantly in a given organ should show the largest effect of altered glycosylation and/or phosphorylation due to a tumor present in that organ. It has to be remembered, though, that in initial cancer stages only a small number of cells are transformed and thus only a small portion of the glycoproteins is altered. Additionally, almost all hepatocellular carcinoma cases follow a period of liver cirrhosis, and in 81% of HCC patients LC is also present. Thus, only very few cases with only HCC and no LC were present in the data set.

We analyzed 111 samples, of which 30 originate from healthy donors, 42 from HCC patients, 33 from LC patients, and 6 samples from patients with other diseases. Statistical analyses for each individual PTM by ANOVA revealed significant differences between healthy individuals on one hand and HCC or LC patients on the other for most PTMs of all subunits. The data was analyzed by (i) an unbiased protocol, i.e., PCA, where the algorithm does not know about the classification of the patient, and (ii) by MANOVA where the information about the disease status and thus membership of a target group is included in the algorithm.

We observed an increase of fucosylated and sialylated glycoforms of the fibrinogen Bβ- and γ-subunits in samples from HCC or LC patients compared to those from healthy individuals. This is in agreement with many publications reporting that, in cancer, glycoproteins from plasma often show increased fucosylation and sialylation [9,10,11, 25,26,27, 29, 30, 33]. In patients with HCC, we find an increase of non-phosphorylated species and correspondingly a decrease of diphosphorylation compared to healthy individuals and to LC patients (cf. Fig. 3). In LC patients, we find a significant enhancement of monophosphorylation. We did not perform site-specific analysis of phosphorylation, which might change from healthy to diseased status. It is important to note that single protein species cannot be used as a specific biomarker because the distributions of the quantities in healthy individuals, HCC patients, or LC patients are too broad. However, a combined analysis of multiple changes can give an improved performance for differentiation of samples from healthy and sick individuals. Here, we selected 12 protein species differing in phosphorylation and glycosylation to analyze correlations between PTMs and patho-physiological state.

We used two different statistical analyses to group the data, i.e., PCA and MANOVA. The PCA as an unbiased method already separates the healthy individuals from patients with relevant chronic liver disease (LC and/or HCC) when using PCA dimensions 1 and 3. As a result, the group of healthy individuals includes seven HCC cases and two LC cases (cf. Fig. 4). No healthy individual is found in the area of the LC or HCC cases.

An even better separation was found when using the MANOVA algorithm to separate the three patho-physiological states, i.e., healthy, HCC, and LC. MANOVA recasts the coordinates of the quantifications of the individual protein species into a new coordinate system such that the best separation of the target groups is obtained. MANOVA uses information about the disease state of the individuals.

We used all dominant PTMs that are present in a significant amount in at least one group of plasma samples to allow a precise quantification. We did not consider PTMs that occur only in very small quantities because of the inherent systematic error associated with quantification of components present in very small amounts (cf. ESM).

Plotting of the first and second canonical variables revealed a clear separation between healthy and HCC/LC samples (cf. Fig. 5). However, the separation of LC from HCC samples was not robust enough. Compared with many other statistical techniques, including PCA and various clustering protocols, which also separate the three groups but less convincingly, MANOVA is better suited to assigning a patho-physiological state from the quantification of PTMs. It should be noted that the lower quality of the HCC/LC separation is probably due to the fact that almost all HCC patients are also diagnosed with LC.

Calculation of receiver operating characteristic requires a discrimination parameter whose threshold is varied; therefore, discriminant analysis was performed on the first and second canonical variables, resulting in a posteriori probabilities for each sample. Receiver operating characteristics were calculated for the separation of healthy individuals versus patients with relevant chronic liver disease (HCC and LC) based on the MANOVA with all 105 samples and 12 variables (cf. Fig. 6). The area-under-the-curve value is 0.976.
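The construction of a receiver operating characteristic from per-sample probabilities can be sketched as follows: the threshold is lowered step by step, and the area under the curve equals the probability that a randomly chosen diseased sample scores higher than a randomly chosen healthy one. The probabilities below are made up for illustration and are not data from this study.

```python
def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of (diseased, healthy) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


def roc_points(scores_pos, scores_neg):
    """(false positive rate, true positive rate) pairs as the threshold is lowered."""
    thresholds = sorted(set(scores_pos) | set(scores_neg), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in scores_pos) / len(scores_pos)
        fpr = sum(s >= t for s in scores_neg) / len(scores_neg)
        points.append((fpr, tpr))
    return points


# Made-up posterior probabilities of "liver disease" (not data from the study).
diseased = [0.95, 0.90, 0.80, 0.40]
healthy = [0.10, 0.20, 0.50]

auc = roc_auc(diseased, healthy)
curve = roc_points(diseased, healthy)
print(f"AUC = {auc:.3f}")
```

In this toy example, 11 of the 12 diseased/healthy pairs are ranked correctly, which gives the area under the curve.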

Fig. 6

Receiver operating characteristics of the dataset of 105 individuals with 30 classified as healthy and 75 as having liver disease, i.e., HCC or LC

If the portion of the liver with the tumor tissue is producing a specific glycosylation pattern, the remaining healthy part of the liver may or may not produce the correct glycosylation. So far it is not clear if the amount of PTMs changes only in the tumor cells or if the global glycosylation is altered by signals from the tumor cells. A diagnostic test must be able to detect small changes to indicate a tumor. Proliferation and biosynthesis are higher in tumor cells than in healthy tissue, and thus in any case one expects proportionally more proteins in plasma that are synthesized by tumor cells.

To test the statistical validity of the analysis, the data set was separated into two sets, the samples obtained chronologically first and the subsequent samples. Here, it turned out that the statistical model is not yet robust enough and in its current version cannot be used to discriminate between healthy individuals and patients with liver disease (Fig. 7).

Fig. 7

Plot of the first and second canonical variables from the MANOVA for all 56 samples of the training data set (12 variables) and the validation data set with 55 samples. The quantities of 12 glycosylation species of fibrinogen of 56 patients lead to the definition of the 2D-MANOVA plot (training data set; top panel). The locations of healthy individuals, HCC patients, and LC patients are shown in green, blue, and black, respectively. Samples of healthy donors are clearly separated from samples of HCC and LC patients as indicated by the black line. Separation between HCC and LC samples is not as robust (dotted line). It becomes clear from the bottom panel that the validation data set is not adequately classified by this MANOVA

To improve the robustness of the diagnosis of HCC or LC by mass spectrometric analysis of post-translational modifications of fibrinogen, one would need a much larger sample set with accompanying information about tumor size, a sufficiently large set of samples from patients with liver cirrhosis only, and a sufficiently large set of samples from HCC patients without liver cirrhosis, so that these cases can be clearly distinguished. Further metadata for a large sample set, such as age, sex, and ethnic origin of the individuals, would also help to stabilize the analysis.

Several studies that aim to identify HCC from changed fucosylation of glycoproteins have been published. Yin et al. analyzed the core-fucosylation of serum proteins in alcohol-related HCC and LC samples by quantifying the fucosylation of isolated glycopeptides [32, 33]. Core-fucosylated glycopeptides from several proteins, including fibronectin, cadherin-5, and ceruloplasmin, are increased in alcohol-related HCC. Fibronectin at site 1007 showed the best statistical values, with an AUC of 0.89, 93% sensitivity, and 86% specificity. However, they could not distinguish other cases of HCC.

Tanabe et al. analyzed Aleuria aurantia lectin-enriched glycopeptides from human serum by LC-ESI-Q-TOF [27]. Among 30,000 detected glycopeptides, they identified 12 HCC marker candidates with AUC values of 0.88–0.91. One of the best candidates was a multifucosylated glycopeptide from alpha-1-acid glycoprotein with 83% sensitivity and 79% specificity. Therefore, 19 glycopeptides identified from alpha-1-acid glycoprotein were used for O-PLS discriminant analysis, resulting in partial separation of samples from HCC, HCV, and HBV patients and healthy individuals.
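Sensitivity and specificity values such as those quoted for these studies follow from counting correct and incorrect calls at one fixed decision threshold. A minimal Python sketch, with a hypothetical function name and invented labels, makes the definitions explicit; it does not reproduce the calculation of any cited study.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for binary labels, 1 = disease, 0 = control."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == 1 and a == 1 for p, a in pairs)  # diseased, called diseased
    tn = sum(p == 0 and a == 0 for p, a in pairs)  # control, called control
    fn = sum(p == 0 and a == 1 for p, a in pairs)  # diseased, missed
    fp = sum(p == 1 and a == 0 for p, a in pairs)  # control, false alarm
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical calls for ten samples, NOT data from any cited study.
actual = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(predicted, actual)
```

Here four of five diseased samples and four of five controls are called correctly, so both values are 80%. Each point on a ROC curve corresponds to one such pair, plotted as sensitivity against one minus specificity.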

We present here the first use of mass spectrometry of intact proteins to quantify protein species in order to detect cancer in a facile and speedy analysis. In contrast to other studies that tried to develop HCC biomarkers, we use a multivariate approach that combines the quantities of 12 protein species to differentiate the pathological cases. We achieve an AUC of 0.976 for the separation of sick patients, i.e., those with HCC or LC, from healthy donors.

The method presented here is very fast: the time from taking the blood plasma to the quantification of the PTMs and their interpretation is less than 3 h, of which about 90 min is instrument time on the mass spectrometer, resulting in marginal instrument cost per sample.

Classification of patients with other diseases

We also analyzed six samples from patients with other diseases (cf. ESM Fig. S8). Three patients (V45, V25, and V54) are correctly classified as healthy with respect to HCC/LC. One of these patients has coronary heart disease (V45), another has a very small liver adenoma (V25) with very low risk of transformation, and the third has Hashimoto thyroiditis (V54). Another patient with angiomyolipoma of the liver (V20) is clinically diagnosed as an HCC/LC patient and is correctly classified in the HCC/LC region. A patient with adenoma of the liver (V26) is located at the boundary between healthy and HCC/LC. Only one sample has to be considered clearly false positive: that patient has pancreatitis (V41) and is classified in the HCC/LC area. It is possible that inflammation of organs, such as pancreatitis, changes the glycosylation pattern such that the classification is the same as for HCC/LC patients. In this regard, further research with more samples is required.

Analysis of fibrinogen from patients at different time points

We analyzed samples from five patients during progression and therapy of the disease to check whether a consistent allocation to a group could be obtained. For most patients, only small variation in the MANOVA plot was observed (Fig. 8 and Table 2), and only one patient moved from the border between healthy and HCC to a clear HCC assignment. All others stayed in the HCC area even though anti-tumor therapy had been started.

Fig. 8 Change of location in the MANOVA plot for samples from patients who donated blood multiple times at different stages of disease (cf. Table 2)

Table 2 Information on the samples from patients who donated blood multiple times at different stages of disease

Conclusion

Our method provides quantitative information on the protein species of human fibrinogen and is remarkably fast and facile. Fibrinogen is precipitated from raw plasma with cold ethanol, reduced and denatured, and analyzed automatically within 3 h. This technology may prove to be an easy way to determine pathological states by analysis of PTMs of glycoproteins from plasma samples in a few hours. We believe that a significantly larger set of samples, including many samples from patients with other diseases, could stabilize the discrimination between healthy individuals and patients with chronic organ damage, e.g., HCC and/or LC.