Introduction

Lung cancer is one of the most malignant tumors and has the highest incidence and mortality rate in the world [1]. Its pathogenesis remains unclear, and the current five-year overall survival rate is only 10–15%, mainly because methods for early diagnosis are lacking [2]. In recent years, proteomics has become one of the most effective approaches for studying tumor pathogenesis and for screening early diagnostic markers and therapeutic molecular targets. However, proteomic studies of lung cancer tissue, especially comparative studies between lung cancers and benign lung diseases (BLD), are rare because of the marked heterogeneity of lung cancer tissue [3, 4]. Based on clinical and histopathological features, lung cancer is divided into small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC), the latter accounting for about 80% of cases. To find evidence useful for the early diagnosis of NSCLC, we compared the differentially expressed protein profiles between NSCLCs and normal lungs and between NSCLCs and BLDs by surface-enhanced laser desorption ionization time-of-flight mass spectrometry (SELDI-TOF-MS). Parenchymal cells were first separated and purified from NSCLC, normal lung, and BLD tissues by laser capture microdissection (LCM), and the biomarker patterns were then established with a support vector machine (SVM).

Materials and methods

Materials

All patients with lung cancer or BLDs included in this study attended the Second Affiliated Hospital of Medical School of Xi’an Jiaotong University from May 2004 to October 2006. Informed consent was obtained from every subject before the study, and the privacy of each subject was guaranteed according to ethical and legal practice. The 13 NSCLC patients [7 squamous cell cancers (SCC) and 6 adenocarcinomas (AD)] had an average age of 59.4 years (45–69 years) and included 9 males and 4 females. None of the patients underwent chemotherapy or radiotherapy before operation. Complete histopathological information was available for each case. By P-TNM stage (UICC, 2007), 2 cases were at stage Ia, 5 at stage Ib, 3 at stage IIa, and 3 at stage IIb. By tumor differentiation, 4 cases were well differentiated, 7 moderately, and 2 poorly. The 6 BLD controls had an average age of 56.3 years (45–65 years) and included 4 males and 2 females. The diagnoses of these BLDs were pathologically confirmed as follows: 1 inflammatory pseudotumor, 1 mediastinal cyst, 1 bronchiogenic cyst, 1 tuberculoma, 1 hamartoma, and 1 benign pleural endothelioma. Normal lung tissue from eleven cases was cut at least 5 cm away from the edge of the tumors. These patients had an average age of 59.9 years (45–69 years) and included 8 males and 3 females. Seven of them were matched to SCC and four to AD.

Preparation of tissue specimens

Tissues from lung tumors, BLDs, and normal lungs were cut, washed, and stored at −80°C immediately after operation. One section was stained with routine hematoxylin and eosin (H&E) to confirm the diagnosis pathologically and to identify cells of interest. A series of frozen sections at 8 μm was then stained with slightly improved H&E for LCM. Briefly, the slightly improved H&E procedure was: staining with hematoxylin for 1 s; washing with distilled water for 10 s; staining with eosin for 10 s; dehydrating with 85%, 90%, and 100% ethanol for 10 s each; and clearing with xylene I and xylene II for 2 min each.

LCM

Cells of interest from normal bronchial and alveolar epithelia and from tumoral and BLD tissues were microdissected from the frozen tissue sections using the PixCell Laser Capture Microdissection Microscope (Arcturus Inc., USA), essentially as described by Bonner et al. [5]. The LCM conditions were: laser beam diameter, 7.5 μm; duration time, 15.5 ms; and energy, 80 mW.

Cell sample preparation and protein chip array analysis

All LCM samples were thawed on ice for 30–60 min. Ten microliters of U9 buffer (9 M urea, 2% CHAPS, 1% DTT) was added to 5 μl of each LCM sample in a 96-well cell culture plate, which was then agitated on a platform shaker at 4°C for 30 min. Next, 185 μl of sodium acetate (100 mM, pH 4) was added to the U9 mixture, which was agitated on a platform shaker at 4°C for a further 2 min. IMAC3 chips were activated by adding 200 μl of sodium acetate and agitating for 5 min, twice. Diluted samples (100 μl) were applied to each spot of the bioprocessor (Ciphergen Biosystems Inc., USA), which contained the Protein Chip arrays. The bioprocessor was then sealed and agitated on a platform shaker at 4°C for 60 min. The chips were washed three times with 200 μl of sodium acetate and then twice with deionized water. Finally, the chips were removed from the bioprocessor and air-dried at room temperature. Before SELDI-TOF-MS analysis, 1 μl of a saturated solution of SPA (Sigma Inc., USA) in 0.5 l/l ACN and 5 ml/l TFA was applied onto each chip twice, and the chips were again air-dried.
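The dilution implied by the volumes above can be checked with a few lines of arithmetic. This is only a worked calculation from the stated volumes; the variable names are illustrative and not part of the protocol.

```python
# Volumes in microliters, taken from the sample preparation steps above.
sample_ul = 5.0            # LCM sample
u9_buffer_ul = 10.0        # U9 buffer added to the sample
sodium_acetate_ul = 185.0  # sodium acetate added afterwards
applied_per_spot_ul = 100.0

total_ul = sample_ul + u9_buffer_ul + sodium_acetate_ul
dilution_factor = total_ul / sample_ul
# Volume of undiluted sample that effectively reaches each spot.
sample_equivalent_per_spot_ul = applied_per_spot_ul / dilution_factor

print(total_ul, dilution_factor, sample_equivalent_per_spot_ul)
# prints: 200.0 40.0 2.5
```

Under these volumes each sample is diluted 1:40, and each spot receives the equivalent of 2.5 μl of the original LCM sample.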

The chips were read on the PBS-II plus mass spectrometer reader (PBS II+ SELDI-TOF-MS, Ciphergen Biosystems Inc., USA). Data were obtained by averaging 144 laser shots with an intensity of 185, a detector sensitivity of 7, a high mass of 100000 Da, and an optimized range of 2000–20000 Da. Mass accuracy was calibrated to <0.1% with the All-in-one peptide molecular mass standard (Ciphergen Biosystems).

Bioinformatics and statistics analysis

All data analysis, including data preprocessing and model construction, was performed with the Zhejiang University Cancer Institute’s Protein Chip Data Analysis System (ZUCIPDAS, www.zlzx.net) designed by Yu Jiekai. Firstly, the original data were denoised using the undecimated discrete wavelet transform (UDWT) method. Secondly, the spectra were subjected to baseline correction by aligning with a monotone local minimum curve and to mass calibration (adjusting the intensity scale according to three labeled peaks that appeared in all the selected spectra). The proteomic peaks were detected and quantified by an algorithm that takes into account the maximal height of every denoised, baseline-corrected, and calibrated mass spectrum. Thirdly, the peaks were filtered to keep only those with a signal-to-noise ratio (S/N) of more than three. The S/N of a peak was the ratio of the height of the peak above the baseline to the wavelet-defined noise. Finally, to match peaks across spectra, we pooled the detected peaks whose masses differed by no more than 0.3% in relative terms. A peak had to appear in at least 10% of all the spectra to be retained. The matched peaks across the spectra were defined as a peak cluster. If a spectrum had no peak within a given cluster, the maximal height within the cluster was assigned as its peak value. Normalization was performed only with the identified peak clusters.
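The peak-matching step can be illustrated with a minimal sketch. The internal algorithm of ZUCIPDAS is not described here, so the greedy single-pass grouping below, the function name `cluster_peaks`, and the toy spectra are all assumptions; only the 0.3% mass tolerance and the 10% minimum occurrence come from the text.

```python
from collections import defaultdict


def cluster_peaks(spectra, rel_tol=0.003, min_fraction=0.10):
    """Group peaks from several spectra into clusters by relative mass difference.

    spectra: list of spectra, each a list of (mz, intensity) tuples.
    Returns a list of dicts with the mean m/z of the cluster and a
    mapping {spectrum_index: highest intensity in that spectrum}.
    """
    # Flatten to (mz, intensity, spectrum_index) and sort by mass.
    peaks = sorted(
        (mz, intensity, index)
        for index, spectrum in enumerate(spectra)
        for mz, intensity in spectrum
    )
    groups, current = [], []
    for peak in peaks:
        # Start a new cluster once a peak lies more than rel_tol away
        # from the lowest-mass peak of the current cluster.
        if current and (peak[0] - current[0][0]) / current[0][0] > rel_tol:
            groups.append(current)
            current = []
        current.append(peak)
    if current:
        groups.append(current)

    clusters = []
    for members in groups:
        by_spectrum = defaultdict(float)
        for mz, intensity, index in members:
            by_spectrum[index] = max(by_spectrum[index], intensity)
        # Keep only clusters present in at least min_fraction of the spectra.
        if len(by_spectrum) / len(spectra) >= min_fraction:
            mean_mz = sum(m[0] for m in members) / len(members)
            clusters.append({"mz": mean_mz, "intensity": dict(by_spectrum)})
    return clusters


# Toy spectra with invented values, for illustration only.
spectra = [
    [(3201.0, 12.0), (4282.0, 8.0)],
    [(3204.0, 10.0), (4290.0, 7.5)],
    [(3203.0, 11.0)],
]
clusters = cluster_peaks(spectra)
print(len(clusters))
```

In this toy input the three peaks near 3200 Da fall into one cluster and the two peaks near 4285 Da into another. Spectra without a peak in a cluster (the third spectrum for the second cluster) would still need a value assigned from the raw trace, which this sketch does not do.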

The preprocessed data were used to establish models. In this experiment, we used a non-linear SVM classifier with a radial basis function (RBF) kernel, a Gamma parameter of 0.6, and a cost of constraint violation of 19 to discriminate between the groups. The diagnostic model was evaluated and validated by leave-one-out cross-validation (LOOCV). In this approach, one sample is taken out as the test set while the remaining samples are kept as the training set, and the test is repeated until each sample has been used once as the test sample.
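The LOOCV loop itself is simple and can be sketched independently of the classifier. The example below is a hypothetical illustration: it uses a nearest-class-mean classifier as a stand-in, not the RBF-kernel SVM used in this study, and the peak intensities are invented.

```python
def leave_one_out_predictions(samples, labels, fit, predict):
    """Return one held-out prediction per sample (LOOCV)."""
    predictions = []
    for i in range(len(samples)):
        # Train on every sample except the i-th, then test on the i-th.
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        predictions.append(predict(model, samples[i]))
    return predictions


# Stand-in classifier (NOT the SVM of the study): nearest class mean.
def fit_nearest_mean(train_x, train_y):
    means = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        means[label] = [sum(column) / len(rows) for column in zip(*rows)]
    return means


def predict_nearest_mean(means, x):
    def squared_distance(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(means, key=lambda label: squared_distance(means[label]))


# Invented intensities of two peaks for six samples.
samples = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [3.0, 3.2], [2.8, 3.1], [3.3, 2.9]]
labels = ["NSCLC", "NSCLC", "NSCLC", "normal", "normal", "normal"]

held_out = leave_one_out_predictions(
    samples, labels, fit_nearest_mean, predict_nearest_mean
)
accuracy = sum(p == y for p, y in zip(held_out, labels)) / len(labels)
print(held_out, accuracy)
```

To reproduce the setting described above, `fit` and `predict` would wrap an RBF-kernel SVM; in scikit-learn this would be roughly `SVC(kernel="rbf", gamma=0.6, C=19)`, assuming the Gamma and cost parameters of ZUCIPDAS map directly onto those arguments.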

Each peak in the experimental data was assessed by the P value of the Wilcoxon test. The ten peaks with the smallest P values were selected for further analysis. Combinations of these peaks with the highest accuracy in distinguishing the groups were selected as potential biomarkers. The SVM model with the highest Youden index (YI) was selected as the model for diagnosing NSCLC.
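Ranking peaks by Wilcoxon P value can be sketched as follows. The exact test variant used by ZUCIPDAS is not stated, so this sketch assumes the two-sided rank-sum test with a normal approximation (midranks for ties, no continuity or tie correction); the function names and intensities are hypothetical.

```python
import math


def rank_sum_p_value(group_a, group_b):
    """Two-sided Wilcoxon rank-sum P value using the normal approximation."""
    pooled = sorted(
        (value, group)
        for group, values in enumerate((group_a, group_b))
        for value in values
    )
    # Assign midranks so that tied values share the same rank.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


def top_peaks(intensities_a, intensities_b, k=10):
    """Return the k peaks (keyed by m/z) with the smallest P values."""
    p_values = {
        mz: rank_sum_p_value(intensities_a[mz], intensities_b[mz])
        for mz in intensities_a
    }
    return sorted(p_values, key=p_values.get)[:k]


# Invented intensities for two peaks in two groups.
intensities_nsclc = {4282: [1, 2, 3, 4, 5], 3500: [1, 3, 5, 7, 9]}
intensities_normal = {4282: [6, 7, 8, 9, 10], 3500: [2, 4, 6, 8, 10]}
ranked = top_peaks(intensities_nsclc, intensities_normal, k=2)
print(ranked)
```

With groups as small as those in this study, an exact test would be preferable to the normal approximation; the approximation is used here only to keep the sketch short.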

Results

LCM

Fresh tissue specimens cut from lung tumors, BLDs, and normal lungs were prepared (Fig. 1a). A section from an NSCLC was stained with routine H&E (Fig. 1b). A cryostat section of 8 μm thickness was stained with slightly improved H&E for LCM (Fig. 1c, d). Cells of interest were collected on an LCM cap (Fig. 1f), while the interstitial tissues remained on the section after LCM (Fig. 1e). Approximately 4000 laser hits per LCM cap captured about 12000 cells in NSCLCs, 3500 laser hits captured about 7000 cells in BLDs, and 3000 laser hits captured about 6000 cells in normal lung tissues. Each cell population on an LCM cap was estimated to be over 95% homogeneous as determined by microscopic visualization.

Fig. 1 LCM process of No. 2 NSCLC tissue specimen: (a) Fresh sample after operation; (b) 10 μm thick section fixed in formalin, paraffin embedded, and stained with routine H&E (20 × 10 folds); (c) 8 μm thick frozen section stained with improved H&E (20 × 10 folds); (d) LCM after improved H&E (20 × 10 folds); (e) Interstitial tissue remaining on the section after LCM (20 × 10 folds); (f) Cells of interest collected on the LCM cap (20 × 10 folds)

Analysis of the differential pattern

A total of 123 qualified M/Z peaks in an optimized range of 2000–20000 Da were profiled by an IMAC3 protein chip in NSCLCs and normal lungs, and the relative intensity of the peaks was compared between the two groups by the Wilcoxon test. The relative intensity of 98 peaks differed significantly (P < 0.05). Among these 98 peaks, 43 were higher in NSCLCs than in normal lungs, and the remaining 55 were higher in normal lungs than in NSCLCs. The top ten peaks (Table 1) with the smallest P values were randomly combined and then fed into the SVM. The accuracy of each combination in distinguishing NSCLCs from normal lungs was analyzed, and the combination with the highest YI, containing three M/Z peaks of 4282, 3201, and 4252 Da, was chosen as the diagnostic pattern. All three peaks were higher in normal lungs than in NSCLCs. When the diagnostic pattern was evaluated by LOOCV, the specificity, sensitivity, and positive predictive value (PPV) were 90.9%, 100.0%, and 92.9%, respectively (Fig. 2 and Table 2).
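The reported percentages can be related to a confusion matrix with a short worked calculation. The counts below are inferred from the reported values and the group sizes (13 NSCLCs, 11 normal lungs), not quoted from the tables: they are the only integer counts consistent with 100.0% sensitivity, 90.9% specificity, and 92.9% PPV. The YI is computed with its standard definition, sensitivity + specificity − 1.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, PPV, and Youden index from confusion counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "youden_index": sensitivity + specificity - 1,
    }


# Inferred counts (an assumption): all 13 NSCLCs classified correctly,
# 10 of 11 normal lungs classified correctly, 1 normal lung misclassified.
metrics = diagnostic_metrics(tp=13, fn=0, tn=10, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these inferred counts, a single misclassified normal lung accounts for both the 90.9% specificity (10/11) and the 92.9% PPV (13/14).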

Table 1 The top ten peaks with the smallest P value in NSCLCs versus normal lungs
Fig. 2 Representative spectra (a) and gel views (b) of the 4253 Da M/Z peaks in NSCLCs and normal lungs

Table 2 The validated result of the diagnostic pattern in test set evaluated by LOOCV

A total of 188 qualified M/Z peaks were found in NSCLCs and BLDs, and the relative intensity of 54 of them differed significantly (P < 0.05): 16 were higher in NSCLCs than in BLDs, and the remaining 38 were higher in BLDs than in NSCLCs. The diagnostic pattern with the highest YI comprised two M/Z peaks of 3204 and 3701 Da (Fig. 3 and Table 3). When the diagnostic pattern was validated by LOOCV, the specificity, sensitivity, and PPV were all 100%.

Fig. 3 Representative spectra (a) and gel views (b) of the 3204 Da M/Z peaks in NSCLCs and benign lung diseases

Table 3 Diagnostic pattern with the highest YI consisting of 3204 and 3701 Da in NSCLCs versus BLDs

Discussion

At present, an urgent need in lung cancer research is to find early diagnostic biomarkers. Traditional examinations, such as chest X-ray and sputum cytology, cannot detect lung cancer promptly because there are no obvious symptoms or signs at its early stage [6]. Clinical studies show that existing lung cancer markers suffer from low sensitivity and specificity [7]. SELDI-TOF-MS, a new and powerful proteomic technology, offers high sensitivity and high throughput and can detect many biomarkers quickly and simultaneously. Indeed, the application of this technology has brought great progress in tumor research [8–10]. However, the complex pathology and marked heterogeneity of lung cancer tissue have severely hampered proteomic study [11].

LCM, a revolutionary technology in tumor research that emerged in the late 1990s, can precisely separate single cells or homogeneous cell populations from complex tissues and effectively solves the problem of tissue heterogeneity in experiments [12]. To our knowledge, only one published paper so far, by Zhukov et al. [13], has studied lung cancer tissue proteomics by applying LCM and SELDI technology. Regrettably, only 4 patients with lung cancer (1 SCC, 1 AD, 1 bronchioloalveolar carcinoma, and 1 mixed) were included in that work.

The SVM classification technique, first described by Vapnik, is a machine learning method based on statistical learning theory. The SVM can address problems in pattern recognition such as generalization from small and medium-sized samples, pattern selection, and over-fitting [14, 15]. The SVM is designed for finite samples: it seeks the optimal solution given the available information rather than the solution that is optimal only as the sample number tends to infinity. It also reaches a global optimum, which avoids the local extremum problem that cannot be solved in the artificial neural network method [16]. LOOCV is utilized to determine the accuracy of the classifier. Independent test sets are used to further evaluate the accuracy of our proposed models. All these steps ensure that the selection of biomarkers is not influenced by systematic biases.

In this study, we slightly improved the H&E staining method for LCM. The stained cells were clearly visible with intact structure, and parenchymal and interstitial cells were easily distinguished (Fig. 1c). The cells of interest were successfully separated by LCM, and each cell population in the three groups was estimated to be over 95% homogeneous, which is consistent with the literature [13, 17].

In this study, a total of 123 qualified peaks were detected by SELDI-TOF-MS, and the relative intensity of 98 of them differed significantly (P < 0.05) between NSCLCs and normal lung tissues. The combination with the highest YI, comprising three peaks of 4282, 3201, and 4252 Da, was chosen as the diagnostic model. When the diagnostic model was examined by LOOCV, the specificity, sensitivity, and PPV were 90.9%, 100.0%, and 92.9%, respectively. Similarly, among 188 qualified peaks detected in NSCLCs and BLDs, the relative intensity of 54 peaks differed significantly (P < 0.05). The diagnostic pattern consisting of two peaks, with M/Z 3204 and 3701 Da, yielded 100% specificity, sensitivity, and PPV. These results indicate that the SELDI patterns can effectively discriminate NSCLCs from both normal controls and BLDs, and that they have superior diagnostic power compared with carcinoembryonic antigen (CEA), cytokeratin-19 fragments (Cyfra21-1), and so on [11].

In conclusion, LCM coupled with SELDI-TOF-MS applied in lung cancer research can effectively solve the problem of tissue heterogeneity and screen tumor biomarkers more accurately. It is possible that these candidate biomarkers will be applied in clinical practice in the future. However, the sample size in this study is still limited and more samples are being collected for further study.