Introduction

Laryngeal squamous cell carcinoma (LSCC) is one of the commonest primary head and neck malignancies, representing approximately 1% of all malignancies. LSCC includes three subtypes: supraglottic, glottic and subglottic carcinoma. Gradually developing hoarseness, sore throat, dysphagia and odynophagia are common presenting symptoms. Hoarseness appears early in glottic cancer but is a late finding in supraglottic and subglottic cancer. Sore throat, dysphagia and odynophagia are also symptoms of chronic pharyngitis and are therefore often overlooked. In addition, the supraglottic and subglottic regions are often hidden from physical examination. Tumors at these sites are therefore often discovered at advanced stages. Multiple studies have shown that increasing T and N stage correlates with lower survival. LSCC patients diagnosed at stage T1N0M0 have a 5-year survival rate of 90–95% and those at T2N0M0 a rate of 80–90%; in advanced-stage disease (T3–T4N0M0) the 5-year survival rate decreases to 70%, and in the presence of nodal metastasis (T1–T4N1–3M0) it is lower than 50% (Shah et al. 1997). The management of patients with early-stage carcinoma of the larynx with larynx-sparing surgery is highly successful in terms of survival, organ preservation and treatment-related morbidity. However, survival rates and quality of life, as determined by the quality of laryngeal function preservation, are invariably poor in patients with stage III or IV disease, who are usually treated with near-total, total or extended laryngectomy and synchronous neck dissection. In addition, the majority of T3 and T4 tumors are treated with postoperative radiotherapy. Detection of LSCC at an early stage is therefore paramount to successful clinical therapy.

In patients with LSCC, particularly of the supraglottis, management of the regional lymphatics is a crucial component of the overall treatment plan. If metastases to the cervical lymph nodes are clinically evident at diagnosis, treatment of the neck is mandatory. The situation is more controversial when the patient shows no clinical signs of neck disease (N0 patient). The incidence of occult lymph node metastases varies with the site and stage of the primary laryngeal tumor: it is higher in supraglottic and advanced (T3–T4) cancers, and is 20–30% when all sites and stages are combined. A policy of elective neck treatment in these N0 patients will therefore result in overtreatment of 70–80% of cases, with unnecessary morbidity. The physician must decide whether to treat the cervical lymph nodes electively or to wait and treat metastases only if and when they occur. This issue has yet to be resolved (Sarno et al. 2004). Metastasis-associated serum biomarkers for LSCC are therefore also urgently needed.

A major obstacle in screening for diagnostic biomarkers is the tremendous molecular heterogeneity that exists in nearly all human cancers, which suggests that simultaneous screening of multiple biomarkers will be required to improve the early detection and diagnosis of cancer. Since proteins are, for the most part, the mediators of a cell’s function, the study of the changes in proteins that result from a pathological lesion, such as a complex multistep malignant process, would appear to be a rich source of potential cancer biomarkers. Traditionally, proteomic research has relied on two-dimensional gel electrophoresis to detect differences in protein expression profiles between healthy and diseased groups (Srinivas et al. 2001; Adam et al. 2001). Although two-dimensional gel electrophoresis has been the “gold standard” proteomic method, it has difficulty resolving proteins with extremes of molecular weight, hydrophobicity and isoelectric point, is labor intensive and is not easily applied in the clinical setting. A newer proteomic approach for the detection of cancer, surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) with ProteinChip technology, has been developed. The ProteinChip Biology System uses SELDI to retain proteins on a solid-phase surface; the retained proteins are subsequently ionized and detected by TOF MS (Von Eggeling et al. 2001). This instrumentation is presently being used for projects ranging from the identification of disease biomarkers to the study of biomolecular interactions. One of the key features of SELDI-TOF MS is its ability to provide a rapid protein expression profile from a variety of biological and clinical samples. SELDI-TOF MS coupled with bioinformatic approaches has successfully found new biomarkers and achieved high sensitivity and specificity for the diagnosis of cancers of the head and neck (Wadsworth et al. 2004; Soltys et al. 2004), bladder (Vlahou et al. 2001; Zhang et al. 2004), prostate (Adam et al. 2002; Cazares et al. 2002), ovary (Petricoin et al. 2002; Vlahou et al. 2003a), breast (Li et al. 2002; Vlahou et al. 2003b), liver (Poon et al. 2003), lung (Xiao et al. 2003; Zhukov et al. 2003) and pancreas (Yu et al. 2005). Proteomic studies have also addressed the progression and metastasis of many cancers (Schwegler et al. 2005; Gretzer et al. 2004). However, few data have been published on the use of this technique coupled with a decision tree algorithm in studies of LSCC and lymph node metastasis. This work aimed to discover novel biomarkers based on their contribution to the optimal separation of stage I–II LSCC patients from healthy controls. The effectiveness of the selected biomarkers was then tested with Biomarker Pattern Software using independent data from stage III–IV LSCC patients. The study was also designed to identify protein profiles specific to lymph node metastasis.

Materials and methods

Serum samples

A total of 142 serum samples from LSCC patients were obtained with informed consent at the Fudan University affiliated Eye, Ear, Nose and Throat Hospital. The LSCC patients were at different clinical stages (according to UICC 1997). Diagnoses were pathologically confirmed, and none of the patients had received chemotherapy or radiation therapy. One hundred and ten serum samples from healthy controls were collected with informed consent at the Fudan University affiliated Zhongshan Hospital. Demographics of the LSCC patients and controls are provided in Table 1. Specimens were obtained before treatment. All samples were aliquoted and stored at −80°C without freeze-thaw cycles until use.

Table 1 Demographics of the LSCC patients, premalignant lesions and control groups

SELDI processing of serum samples

Serum samples were applied to weak cationic exchange (WCX2) chip surfaces (equilibrated twice with 50 mM NaAc, pH 4.0). In brief, 3 μL of serum was mixed with 6 μL of 8 M urea in 1% 3-[(3-cholamidopropyl)-dimethylammonio]-1-propanesulfonic acid (CHAPS) and vortexed on ice for 30 min, followed by the addition of 108 μL of 50 mM NaAc, pH 4.0. A volume of 100 μL of the diluted sample was then applied to the chips using a bioprocessor. Following a 60-min incubation, nonspecifically bound molecules were removed by two brief washes in binding buffer (50 mM NaAc, pH 4.0) followed by one wash with HPLC-gradient H2O. A saturated solution of sinapinic acid (SPA) in 50% acetonitrile and 0.5% (v/v) trifluoroacetic acid was applied to the chip array surface (1 μL, once), and mass spectrometry was performed using a SELDI mass spectrometer (PBSII-C, Ciphergen Biosystems Inc). Protein data were collected by averaging a total of 192 laser shots at laser intensity 185 and sensitivity 8 in positive mode. The protein masses were calibrated externally using purified peptide standards; mass calibration was performed using the all-in-one peptide standard (Ciphergen Biosystems Inc). Intra-ProteinChip Array reproducibility was checked by spotting eight different aliquots of one sample on the same array. Inter-ProteinChip Array reproducibility was checked by spotting one given sample on every array. The intra- and inter-ProteinChip Array coefficients of variation were assessed for all protein peaks above background under the detection settings. The mean intra- and inter-ProteinChip Array coefficients of variation were 10 and 25%, respectively.
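The reproducibility check described above can be illustrated with a short sketch. It assumes that the coefficient of variation of each peak is the sample standard deviation of its replicate intensities divided by their mean, and that the per-peak values are then averaged; the text does not state the exact formula, and the function names and replicate intensities below are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(intensities):
    """Return the CV (in percent) of the replicate intensities of one peak."""
    m = mean(intensities)
    if m == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return 100.0 * stdev(intensities) / m


def mean_cv(peaks):
    """Average the per-peak CVs; `peaks` maps a peak mass to its replicate intensities."""
    return mean(coefficient_of_variation(values) for values in peaks.values())


# Hypothetical intensities for two peaks spotted eight times on one array
# (illustrative numbers only, not data from the study).
replicates = {
    4176: [0.40, 0.44, 0.42, 0.39, 0.45, 0.41, 0.43, 0.42],
    3665: [1.10, 1.25, 1.05, 1.20, 1.15, 1.00, 1.30, 1.15],
}
print(f"mean intra-array CV: {mean_cv(replicates):.1f}%")
```

The same function applies to the inter-array check if the replicate list holds the intensities of one sample measured on different arrays.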

Bioinformatics and biostatistics

The protein profiling spectra obtained from the serum samples were first normalized by total ion current, a feature of Ciphergen’s ProteinChip Software 3.2.0. Peak clustering was performed using the Biomarker Wizard software (Ciphergen Biosystems) with the following settings: signal/noise (first pass): 5, minimum peak threshold: 20%, mass error: 0.3%, and signal/noise (second pass): 2. The samples were analyzed for peaks only within the range of 2–30 kDa. To characterize protein peaks of potential interest, the serum profiles of patients with LSCC and normal controls were compared.
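As an illustration of total ion current normalization, the sketch below rescales each spectrum so that its summed intensity within the analysed window equals the mean of that sum across all spectra. This is one common scheme and only an assumption here: the internal algorithm of the ProteinChip Software is not described in the text. The function names are hypothetical, and the 2–30 kDa window is expressed as m/z 2,000–30,000 on the assumption of singly charged ions.

```python
def total_ion_current(spectrum, mz_min=2000.0, mz_max=30000.0):
    """Sum the intensities of all points inside the analysed mass window."""
    return sum(intensity for mz, intensity in spectrum if mz_min <= mz <= mz_max)


def normalize_tic(spectra, mz_min=2000.0, mz_max=30000.0):
    """Rescale every spectrum so its in-window TIC equals the mean TIC of the set."""
    tics = [total_ion_current(s, mz_min, mz_max) for s in spectra]
    if not tics or any(t <= 0 for t in tics):
        raise ValueError("every spectrum needs a positive total ion current")
    target = sum(tics) / len(tics)
    # Points outside the window are rescaled too, but do not count toward the TIC.
    return [
        [(mz, intensity * target / tic) for mz, intensity in spectrum]
        for spectrum, tic in zip(spectra, tics)
    ]


# Hypothetical two-point spectra given as (m/z, intensity) pairs.
spectra = [
    [(3160.0, 1.0), (4176.0, 3.0)],
    [(3160.0, 2.0), (4176.0, 6.0)],
]
normalized = normalize_tic(spectra)
```

After normalization the two example spectra become identical, because they differed only by an overall scale factor; this is the kind of sample-loading difference the normalization is meant to remove.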

Peak masses and intensities were exported to an Excel file and then transferred to Biomarker Pattern Software (BPS), with which the classification model was built. A classification tree was set up to divide the training dataset into either the cancer group or the control group through multiple rounds of decision-making. When the dataset was first transferred to BPS, it formed a “root node”. The software then searched for the peak that best separated this dataset into two “child nodes” based on peak intensity, and set a peak intensity threshold for it. If the intensity of that peak in a sample was lower than or equal to the threshold, the sample went to the left-side child node; otherwise, the sample went to the right-side child node. The process was repeated for each child node until every sample entered a terminal node, labeled either cancer or control. The peaks selected to form the model were those that, used in combination, yielded the lowest classification error. The double-blind sample dataset was then used to challenge the model. Peaks from the blind dataset were selected with the Biomarker Wizard feature of the software under exactly the same conditions as for the training dataset. The peak intensities were then transferred to BPS, and each sample was classified as either control or cancer by the model. The results were compared with the clinical data to evaluate the model. To characterize the protein peaks of potential interest, the serum profiles of patients with LSCC and normal controls were compared: the mean peak intensity of each protein was calculated and compared between the groups of serum samples with a nonparametric test.
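The node-splitting step described above can be sketched as an exhaustive search over peaks and candidate thresholds. The splitting criterion used by BPS is not given in the text; the sketch assumes a weighted Gini impurity, as in CART-style trees, and all names and intensities are hypothetical. Labels follow the class coding used later in the paper (0 = cancer, 1 = normal).

```python
def gini(labels):
    """Gini impurity of a list of 0/1 class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_split(samples, labels):
    """Find the (peak, threshold, impurity) that best splits one node.

    `samples` is a list of dicts mapping peak mass -> intensity.
    Samples with intensity <= threshold go left, the others go right.
    Returns None if no peak offers two distinct intensity values.
    """
    best = None
    n = len(samples)
    for peak in samples[0]:
        values = sorted({s[peak] for s in samples})
        # Candidate thresholds lie midway between neighbouring intensities.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2.0
            left = [y for s, y in zip(samples, labels) if s[peak] <= threshold]
            right = [y for s, y in zip(samples, labels) if s[peak] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (peak, threshold, score)
    return best


# Hypothetical training samples: two cancer sera (0) and two control sera (1).
samples = [
    {4176: 0.2, 3665: 1.0},
    {4176: 0.3, 3665: 0.5},
    {4176: 0.6, 3665: 0.9},
    {4176: 0.7, 3665: 0.4},
]
labels = [0, 0, 1, 1]
peak, threshold, impurity = best_split(samples, labels)
```

Growing a full tree would repeat `best_split` on each child node until a stopping rule is met; the sketch covers a single node only.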

Statistical analysis

Sensitivity was calculated as the ratio of the number of correctly classified diseased samples to the total number of diseased samples. Specificity was calculated as the ratio of the number of negative samples correctly classified to the total number of true negative samples.
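The two definitions above translate directly into code. The following is a minimal sketch in which the function name, the label strings and the example data are hypothetical; cancer is treated as the positive (diseased) class.

```python
def sensitivity_specificity(true_labels, predicted_labels, positive="cancer"):
    """Return (sensitivity, specificity) for paired true and predicted labels."""
    tp = fn = tn = fp = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive:
            if predicted == positive:
                tp += 1  # diseased sample correctly classified
            else:
                fn += 1  # diseased sample missed
        else:
            if predicted == positive:
                fp += 1  # negative sample wrongly called diseased
            else:
                tn += 1  # negative sample correctly classified
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical example: four cancer sera and four control sera.
truth = ["cancer"] * 4 + ["control"] * 4
predicted = ["cancer", "cancer", "cancer", "control",
             "control", "control", "control", "cancer"]
sens, spec = sensitivity_specificity(truth, predicted)
```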

Results

Serum protein biomarkers of LSCC

Two hundred and fifty-two serum samples were assayed by SELDI mass spectrometry. One hundred and fifty-four samples (65 controls and 89 LSCC at stages I–II) were randomly selected to form the training set, and 98 samples (45 controls and 53 cancers at stages III–IV) formed the blinded test set for the algorithm. The WCX2 ProteinChip could effectively resolve low-mass (<100 kDa) protein peaks. Peak labeling was performed with Biomarker Wizard software. Because the majority of the peaks detected were in the range from 2 to 30 kDa, we used the peaks in this mass range as predictors; a total of 98 protein clusters were generated. When the peak intensities in the serum of patients with LSCC were compared with those of normal controls, the mean intensity differed significantly for 18 peaks: 5 were significantly higher in the group of patients with LSCC (group C), whereas the other 13 were higher in the group of normal controls (group N). The mean amplitudes of these peaks in the two groups are given in Table 2. Their mass spectra and gray-scale/gel views are shown in Figs. 1, 2, and 3. The variations that consistently differentiate these two groups could therefore be considered potential disease biomarkers (Fig. 4).

Table 2 Protein peaks differentially expressed in LSCC, premalignant disease and normal control serum detected on WCX2 chip
Fig. 1

Spectra (top) and gel views (bottom) of the 4,176 Da peak (arrows) detected on the WCX2 chip. The peak appears to be downregulated in the cancer (C1–C3) compared with the normal (N1–N3) groups

Fig. 2

Spectra (top) and gel views (bottom) of the 3,160 Da peak (arrows) detected on the WCX2 chip. The peak appears to be downregulated in the cancer (C1–C3) compared with the normal (N1–N3) groups

Fig. 3

Spectra (top) and gel views (bottom) of the 3,665 Da peak (arrows) detected on the WCX2 chip. The peak appears to be upregulated in the cancer (C1–C3) compared with the normal (N1–N3) groups

Fig. 4

Spectra (top) and gel views (bottom) of the 13,794 Da peak (arrows) detected on the WCX2 chip. The peak appears to be downregulated in the cancer with lymph node metastasis (CN1–CN3) compared with the cancer without lymph node metastasis (CN01–CN03) groups

Decision tree model of LSCC

The intensities of these peaks were transferred to BPS to build a classification model. Notably, only one peak (4,176 Da) detected on the WCX2 chip was selected by the BPS algorithm to discriminate the cancer group from the normal group. Figure 5 shows the decision tree that was generated from the learning set to classify the two groups. The intensity of this peak differed significantly between the cancer patients and the normal controls, and the model ranked peak 4,176 as the most important peak. If the intensity of peak 4,176 was lower than or equal to 0.42, samples were allocated to terminal node 1, which was a class 0 (cancer) node. If the intensity of peak 4,176 was higher than 0.42, samples went to terminal node 2, which was a class 1 (normal) node. When all the samples had reached the terminal nodes through this decision-making process, the model yielded a sensitivity of 86.52% (77/89) and a specificity of 84.62% (55/65) (Fig. 5). The test set was then used to calculate the sensitivity of the SELDI protein biomarker in detecting LSCC patients at stages III–IV. Ninety-eight serum samples (53 from cancer patients at stages III–IV and 45 from controls) were used for the double-blind test. The same set of peaks was selected, and their peak intensities were transferred to BPS. When the double-blind sample dataset was used to challenge the model, it yielded a sensitivity of 84.91% (45/53) and a specificity of 82.22% (37/45). The classification tree analysis of the LSCC training and test sets is shown in Table 3.
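The single-peak rule and the reported percentages can be reproduced with a few lines of code. This is only an illustrative sketch: the threshold of 0.42 and the sample counts are taken from the text above, while the function names are hypothetical and the intensity is assumed to be on the same normalized scale as in the model.

```python
THRESHOLD = 0.42  # intensity cut-off for the 4,176 Da peak, as reported in the text


def classify_by_peak_4176(intensity, threshold=THRESHOLD):
    """Apply the one-node decision rule: low intensity -> cancer, else normal."""
    return "cancer" if intensity <= threshold else "normal"


def percent(correct, total):
    """Express correct/total as a percentage rounded to two decimals."""
    return round(100.0 * correct / total, 2)


# (correctly classified, total) counts reported for the two data sets.
reported = {
    "training sensitivity": (77, 89),
    "training specificity": (55, 65),
    "test sensitivity": (45, 53),
    "test specificity": (37, 45),
}
for name, (correct, total) in reported.items():
    print(f"{name}: {percent(correct, total)}%")
```

Recomputing the four ratios gives the same values as quoted in the text, which confirms that the percentages and the sample counts are mutually consistent.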

Fig. 5

BPS decision tree model. The decision tree details the decision-making procedure and the sample distribution of the BPS model. Each ellipse is a node, labeled with a node number. Each node is named according to whether the majority of its samples belong to the control or the patient group

Table 3 Classification tree analysis of the LSCC training and test sets

Lymph node metastasis-associated protein biomarkers

To characterize protein peaks associated with metastatic potential, the serum profiles of LSCC patients with lymph node metastasis (pathologically confirmed, n = 30) and of patients without metastasis (stage I, n = 24, and stage II of the glottic subtype, n = 10) were compared. Fourteen potential biomarkers could differentiate LSCC patients with lymph node metastasis from patients without metastasis. The mean amplitudes of these peaks in the two groups of patients are given in Table 4. The mass spectra and gray-scale/gel views are shown in Fig. 4.

Table 4 Protein peaks differentially expressed in LSCC with metastasis versus LSCC without metastasis detected on WCX2 chip

Discussion

Currently, there are no satisfactory screening and early diagnostic strategies for LSCC. Outpatient examination techniques, including fibreoptic laryngoscopy, can identify laryngeal lesions but cannot provide reliable information on the severity of the underlying dysplasia or the existence of an underlying malignancy. The diagnosis of primary laryngeal malignancy is therefore conventionally based on a biopsy, which is invasive, requires a general anesthetic, and may have a detrimental effect on the patient’s voice. Obtaining a representative biopsy from a suspicious lesion is often difficult, and malignant lesions may be missed because of sampling error. Repeated or large biopsies may therefore be needed to establish the correct diagnosis, which increases the suffering of patients. The diagnosis of recurrent laryngeal lesions is equally difficult, because the appearance of the vocal fold has already been altered by treatment (radiotherapy or surgery), which makes the identification of early recurrence a challenging task.

SELDI is a high-throughput technique used to generate protein expression profiles which, in combination with bioinformatics tools that extract information for biomarker discovery, has been essential in identifying novel protein biomarkers. Using this approach, and comparing samples from stage I–II patients with normal controls, we found 18 protein peaks that were differentially expressed in LSCC compared with control sera with high statistical significance. These results are promising for the identification of new biomarkers of LSCC by proteomic profiling. Proteomic studies of LSCC are still scarce. Recently, Xiao et al. (2004) reported that a proteomic panel consisting of 16 protein peaks yielded a sensitivity of 93.3% and a specificity of 96.7% in distinguishing LSCC from healthy controls. That study was, however, based on only 33 tumor samples and had no blind test. In the present study, we examined 142 serum samples from LSCC patients and 110 from healthy controls using the SELDI technique with the WCX2 ProteinChip. The classification tree was constructed to distinguish LSCC patients from healthy individuals using only one protein peak, at 4,176 Da, as a marker. When the model was challenged with the blinded test set, it yielded a sensitivity of 84.91% and a specificity of 82.22%. The peak therefore has potential for use in clinical diagnosis and detection. In comparison with that study, it has to be emphasized that only five of our peaks (m/z 3,884, 5,243, 5,335, 5,905 and 6,114 Da) were also profiled by Xiao et al., although the same ProteinChip (WCX2) was used. It is possible that differences in experimental procedures and in sample handling and preparation between the two laboratories influenced the results.

The protein profiles of 30 LSCC patients with lymph node metastasis were compared with those of 34 LSCC patients without lymph node metastasis (stage I patients and a small number of stage II patients with the glottic subtype). The results showed 14 differentially expressed metastasis-associated biomarkers of LSCC. These biomarkers may provide a diagnostic platform for the identification of lymph node metastasis in LSCC patients. Because the number of samples from LSCC patients with lymph node metastasis was relatively small, more samples are required to confirm and improve the diagnostic value of our results. We hope that, once they are confirmed or improved with a much larger dataset, these biomarkers will provide reliable parameters for identifying high-risk patients, for deciding on the necessity of elective treatment, and for avoiding under- or over-treatment of the N0 neck. To date, few studies of serum proteomic changes associated with lymph node metastasis have been reported. Wu et al. (2002) used two-dimensional electrophoresis (2-DE) and SELDI ProteinChip technology to identify proteins differentially expressed in two head and neck squamous cell lines, UMSCC10A (from the primary tumor) and UMSCC10B (from a metastatic lymph node). Their results showed that enolase-alpha, annexin-I and annexin-II might be important molecules in head and neck cancer invasion and metastasis. The results also suggest an important complementary role for proteomics in the identification of molecular abnormalities that are important in cancer development and progression.

Efforts are under way to purify, identify, and characterize these protein/peptide biomarkers. Knowing their identities is not required for the purpose of differential diagnosis. However, knowing their exact identities will be essential for understanding what biological role these peptides/proteins may have in the oncogenesis and metastasis of LSCC, potentially leading to novel therapeutic targets. Furthermore, once these molecules are identified, they can be used to develop antibody-based diagnostic assays. Paradis et al. (2005) used this approach to identify new biomarkers of hepatocellular carcinoma in the sera of patients, and identified an 8,900 Da peak corresponding to part of the carboxyterminal fragment of vitronectin that may be related to the invasive process. Ultimately, it is hoped that continued identification and characterization of these differentially expressed proteins will enable investigators to develop novel diagnostic and therapeutic modalities in cancer research. However, this is only a preliminary study, and the technology must be tested further to ensure that high specificity persists as the test sets increase in size. If validated with more samples in a planned multi-institutional investigation, this approach may provide an innovative test of significant benefit for clinicians treating LSCC.

There are also several potential technical limitations of the SELDI-TOF MS technique. Because of the differences that exist between laboratories and between investigators, clinical use is not yet warranted, and improvements to this approach are urgently needed. Experimental procedures and quality control for the SELDI test should be well standardized, optimal sample handling and preparation should be harmonized among investigators, and serum biomarkers should be validated in larger studies of samples worldwide (Diamandis 2003). It is equally important that there be good cooperation and data exchange between clinical centers. Recently, SELDI-TOF MS has been successfully used at six separate institutions to reproducibly generate identical protein profiling spectra for quality control sera, and to correctly distinguish healthy subjects from prostate cancer subjects based on serum protein profiles (Semmes et al. 2005). It is therefore promising that this approach could be used in clinical analysis in the future.

In conclusion, SELDI-TOF MS, a new and highly effective proteomic discovery technology, has already demonstrated its potential to enhance our understanding of many cancers and should lead to the successful clinical exploitation of new biomarkers. Through the application of this technology, we identified protein expression changes in the serum of LSCC patients. The differential expression of proteins between normal controls and LSCC patients, and between LSCC patients with and without lymph node metastasis, reveals the utility of SELDI in providing a protein profile of the changes that occur from normal to malignant and to lymph node metastasis.