Introduction

Since every cell and biological fluid is rich in proteins, an efficient method for separating and subsequently determining them is necessary. This problem was partially solved by the development of two-dimensional electrophoretic separation, which is certainly the most widely used analytical method in proteomics. This technique allows the efficient separation of the protein content of a particular cell or biological fluid, producing a two-dimensional image of the proteins present in the sample under investigation. Two-dimensional polyacrylamide gel electrophoresis [1, 2] has a unique capacity to resolve complex mixtures, permitting the simultaneous analysis of hundreds or even thousands of proteins. The separation is achieved by two successive electrophoretic runs: the first (through a pH gradient) separates the proteins according to their isoelectric points, while the second (through a porosity gradient) separates them according to their molecular masses. The result is a two-dimensional map with spots (proteins) spread across the gel surface.

2D-PAGE (two-dimensional polyacrylamide gel electrophoresis) maps can therefore be used for both diagnostic and prognostic purposes: by investigating the differences between the 2D-PAGE gels of control and pathological individuals, it is possible to classify the patients accordingly or even to follow the evolution of the disease [3–7].

The problem of how best to compare maps belonging to different individuals thus becomes the fundamental issue in the application of this technique for diagnostic/prognostic purposes.

The 2D-PAGE technique is also widely applied in the field of drug development, especially for cancer [8, 9]. The study of two-dimensional maps can provide useful information about the effectiveness of a drug treatment; that is, it can be used to investigate whether the drug has had the expected effect on the protein content of the pathological cells.

Unfortunately, comparing different 2D-PAGE maps is not a trivial process [10, 11]. The main difficulty is the high complexity of the specimen, which can result in maps with thousands of spots; this complexity is further increased by the complicated sample pretreatment, which often involves many purification/extraction steps. These experimental steps may cause spurious spots, due to impurities, to appear in the final 2-D maps. Moreover, the differences between the treated and reference samples can be very small, which complicates their identification in a real, complex map.

In the classical approach, the comparison is performed by specific software, such as Melanie III or PD-Quest [12, 13]. Each 2-D slab gel is analyzed by a densitometer that provides the optical density at each point of the map. The analysis performed by this software consists of the following steps:

  • Spot detection: protein spots are identified in each gel image.

  • Spot revelation: the software quantifies the detected spots independently for each map.

  • Map matching: the 2-D maps are matched to reveal common features (spots present in all maps) and those that differ between maps.

This procedure is usually time-consuming, and the final quality of the results depends heavily on the ability of the particular operator.

Here, the classification of 2D-PAGE maps was performed in a completely different way: the map images were decomposed in terms of Zernike moments, and multivariate tools usually adopted in image analysis problems were applied. The calculated moments are then coupled to multivariate classification techniques; here we use partial least squares discriminant analysis (PLS–DA). The proposed procedure allows us to bypass all of the steps listed above, in particular the critical step of aligning the maps, which is unnecessary here since Zernike moments are invariant with respect to map translations.

The Zernike moments of the map images were calculated using in-house software written in Visual Basic, and then PLS–DA [14–18] together with variable selection procedures [14–18] was applied to classify the samples. This procedure was applied to four different datasets (six case studies overall) to check its general validity.

Theory

Moment functions have a broad spectrum of applications in image analysis, such as invariant pattern recognition, object classification, pose estimation, and image coding and reconstruction. A set of moments computed from a digital image generally represents global characteristics of the image shape and provides substantial information about its geometrical features. The ability of image moments to represent features has been widely exploited in object identification techniques in several areas of computer vision and robotics [19–25]. Geometric moments were the first to be applied to images, as they are computationally very simple. As research into image processing has progressed, many new types of moment functions have been introduced, each with its own advantages for specific applications.

In this paper, complex Zernike moments are used as feature descriptors for 2D-PAGE map classification. Zernike moments were first introduced by Teague [26], based on the orthogonal functions called Zernike polynomials [27]. Though computationally more demanding than geometric and Legendre moments [28–30], Zernike moments have been shown to provide superior feature representation and low sensitivity to noise [31, 32]. Moreover, because Zernike moments are built on an orthogonal basis, the redundancy measure of a set of such moments is zero, and the moments therefore correspond to independent characteristics of the image. In other words, moments with orthogonal basis functions represent the image with a set of mutually independent descriptors, yielding a minimum amount of information redundancy. Orthogonal moments are consequently more robust than nonorthogonal moments in the presence of image noise.

For the specific application considered here, two features of Zernike moments are particularly relevant: their invariance to rotation and to translation of the image.

Zernike moments

Zernike moments are based on orthogonal Zernike polynomials defined in polar coordinates inside the unit circle. The two-dimensional Zernike moment of order p with repetition q for an image intensity function \(f\left(r,\vartheta\right)\) is defined as:

$$Z_{pq} = \frac{p+1}{\pi}\int_{\vartheta=0}^{2\pi}\int_{r=0}^{1} V^{*}_{pq}\left(r,\vartheta\right) f\left(r,\vartheta\right)\, r\,\mathrm{d}r\,\mathrm{d}\vartheta, \qquad \left|r\right| \leqslant 1,$$

where the Zernike polynomials of order p with repetition q, \(V_{{pq}} {\left( {r,\vartheta } \right)}\), are defined as:

$$V_{pq}\left(r,\vartheta\right) = R_{pq}\left(r\right)\,\mathrm{e}^{iq\vartheta},$$

and the real-valued radial polynomial \(R_{pq}\left(r\right)\) is given by:

$$R_{pq}\left(r\right) = \sum_{k=0}^{\left(p-\left|q\right|\right)/2} \left(-1\right)^{k} \frac{\left(p-k\right)!}{k!\left(\frac{p+\left|q\right|}{2}-k\right)!\left(\frac{p-\left|q\right|}{2}-k\right)!}\, r^{p-2k}, \qquad 0 \leqslant \left|q\right| \leqslant p \text{ and } p-\left|q\right| \text{ even}$$

Since Zernike moments are defined in terms of polar coordinates \(\left(r,\vartheta\right)\) with \(\left|r\right| \leqslant 1\), their computation requires a linear transformation of the image coordinates (i, j) (with i, j = 0, 1, 2, ..., N−1) to a suitable domain \(\left(x,y\right) \in R^{2}\) inside the unit circle. The discrete approximation of the continuous moment integral can then be expressed as:

$$Z_{pq} = \frac{2\left(p+1\right)}{\pi\left(N-1\right)^{2}} \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} R_{pq}\left(r_{ij}\right)\,\mathrm{e}^{-iq\vartheta_{ij}}\, f\left(i,j\right)$$

where the general image coordinate transformation to the interior of the unit circle is given by:

$$r_{ij} = \sqrt{x_{i}^{2} + y_{j}^{2}}, \qquad \vartheta_{ij} = \tan^{-1}\left(\frac{y_{j}}{x_{i}}\right), \qquad \text{where } x_{i} = \frac{\sqrt{2}}{N-1}\,i - \frac{1}{\sqrt{2}}, \quad y_{j} = \frac{\sqrt{2}}{N-1}\,j - \frac{1}{\sqrt{2}}$$
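As an illustration, the discrete formula above translates directly into a few lines of numpy. The sketch below is not the authors' in-house Visual Basic implementation, just a minimal rendering of the equations under the stated coordinate mapping; the function names are our own:

```python
import numpy as np
from math import factorial

def radial_poly(p, q, r):
    """Radial polynomial R_pq(r) from the explicit factorial formula
    (assumes |q| <= p and p - |q| even)."""
    q = abs(q)
    R = np.zeros_like(r, dtype=float)
    for k in range((p - q) // 2 + 1):
        c = (-1) ** k * factorial(p - k) / (
            factorial(k) * factorial((p + q) // 2 - k) * factorial((p - q) // 2 - k)
        )
        R += c * r ** (p - 2 * k)
    return R

def zernike_moment(img, p, q):
    """Discrete Zernike moment Z_pq of an N x N grayscale image f(i, j).
    The linear mapping places the whole image square inside the unit disk
    (the corners touch the circle), so no pixel is discarded."""
    N = img.shape[0]
    i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    x = np.sqrt(2) / (N - 1) * i - 1 / np.sqrt(2)
    y = np.sqrt(2) / (N - 1) * j - 1 / np.sqrt(2)
    r = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)  # quadrant-aware form of tan^-1(y/x)
    kernel = radial_poly(p, q, r) * np.exp(-1j * q * theta)
    return 2 * (p + 1) / (np.pi * (N - 1) ** 2) * np.sum(kernel * img)
```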

The image intensity function f(i, j) can be approximately reconstructed from a finite set of Zernike moments, up to a maximum order n, using the following equation:

$$f\left(r,\vartheta\right) \approx \sum_{p=0}^{n}\sum_{q} Z_{pq}\, V_{pq}\left(r,\vartheta\right), \qquad \left|q\right| \leqslant p \text{ and } p-\left|q\right| \text{ even}$$
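Because the image is real-valued, \(Z_{p,-q} = Z^{*}_{pq}\), so in practice only moments with q ≥ 0 need to be stored, and each q > 0 term contributes twice its real part to the sum. A minimal reconstruction sketch along these lines (reusing radial_poly from the sketch above; the dict-based storage is our own convention) is:

```python
import numpy as np

def reconstruct(Z, N=100):
    """Approximate f(i, j) from a dict {(p, q): Z_pq} with q >= 0.
    Each q > 0 moment and its conjugate partner together contribute
    2 * Re(Z_pq * V_pq); the q = 0 term is counted once."""
    i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    x = np.sqrt(2) / (N - 1) * i - 1 / np.sqrt(2)
    y = np.sqrt(2) / (N - 1) * j - 1 / np.sqrt(2)
    r, theta = np.sqrt(x ** 2 + y ** 2), np.arctan2(y, x)
    f = np.zeros((N, N))
    for (p, q), z in Z.items():
        V = radial_poly(p, q, r) * np.exp(1j * q * theta)
        f += (1 if q == 0 else 2) * np.real(z * V)
    return f
```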

Zernike moments were calculated here by exploiting the so-called q-recursive method of Mukundan, Raveendran and Chong [33], which considerably reduces the computational time.

In this method, the radial polynomial \(R_{p\left(q-4\right)}\left(r\right)\) is obtained recursively from \(R_{pq}\left(r\right)\) and \(R_{p\left(q-2\right)}\left(r\right)\):

$$R_{p\left(q-4\right)}\left(r\right) = H_{1} R_{pq}\left(r\right) + \left(H_{2} + \frac{H_{3}}{r^{2}}\right) R_{p\left(q-2\right)}\left(r\right)$$

where:

$$R_{pp}\left(r\right) = r^{p}$$
$$R_{p\left(p-2\right)}\left(r\right) = p R_{pp}\left(r\right) - \left(p-1\right) R_{\left(p-2\right)\left(p-2\right)}\left(r\right)$$
$$H_{1} = \frac{q\left(q-1\right)}{2} - qH_{2} + \frac{H_{3}\left(p+q+2\right)\left(p-q\right)}{8}$$
$$H_{2} = \frac{H_{3}\left(p+q\right)\left(p-q+2\right)}{4\left(q-1\right)} + \left(q-2\right)$$
$$H_{3} = \frac{-4\left(q-2\right)\left(q-3\right)}{\left(p+q-2\right)\left(p-q+4\right)}$$
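A minimal numpy sketch of this recursion (our own helper name; it returns every radial polynomial of a fixed order p, stepping the repetition q down by two from the starting values above) is given below. It assumes r > 0, which holds for the 100 × 100 grid mapping used here, since no pixel falls exactly at the origin:

```python
import numpy as np

def radial_polys_q_recursive(p, r):
    """All radial polynomials R_pq(r) of order p (q = p, p-2, ..., 1 or 0),
    computed with the q-recursive scheme; assumes r > 0."""
    r = np.asarray(r, dtype=float)
    R = {p: r ** p}                                   # R_pp(r) = r^p
    if p >= 2:
        R[p - 2] = p * R[p] - (p - 1) * r ** (p - 2)  # R_p(p-2)(r)
    q = p
    while q - 4 >= 0:
        H3 = -4.0 * (q - 2) * (q - 3) / ((p + q - 2) * (p - q + 4))
        H2 = H3 * (p + q) * (p - q + 2) / (4.0 * (q - 1)) + (q - 2)
        H1 = q * (q - 1) / 2.0 - q * H2 + H3 * (p + q + 2) * (p - q) / 8.0
        R[q - 4] = H1 * R[q] + (H2 + H3 / r ** 2) * R[q - 2]
        q -= 2
    return R
```

The gain over the factorial formula is that all repetitions of a given order are obtained from two starting polynomials, with no factorial evaluations in the loop.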

Partial least squares discriminant analysis (PLS–DA)

Partial least squares (PLS) [14–18] is a multivariate regression method that establishes the relationship between one or more dependent variables (Y) and a group of descriptors (X). The X and Y variables are modeled simultaneously to find the latent variables (LVs) in X that best predict the latent variables in Y. These latent variables (also called PLS components) are similar to the principal components obtained from principal component analysis [14–18]: they are extracted so that each successive latent variable accounts for the largest possible amount of the variation not accounted for by the previous ones (they are mutually orthogonal), in both the descriptor space (X) and the response space (Y). The LVs are computed hierarchically, so the last LVs mainly describe random variation and experimental error.

The optimal number of LVs (i.e., a model that uses the information in X to predict the response Y while avoiding overfitting) is determined from the residual variance in prediction. Here, leave-one-out cross-validation was applied to evaluate the predictive ability and to select the optimal number of latent variables for the final model.

When a large number of descriptors (X variables) is present, or a large experimental error is expected, it can be quite difficult to obtain a final model with suitable predictive ability. In such cases, variable selection techniques can be exploited. Here, two successive strategies were applied: an initial simplification of the model, in which groups of nonsignificant X variables were eliminated until at most 200 variables remained, based on the minimum error obtained in cross-validation; and a second phase, in which variables were eliminated one at a time to give a final model with the overall minimum error in cross-validation.

PLS was created to model continuous responses, but it can also be applied for classification purposes by establishing an appropriate Y that encodes the class membership of each sample. When only two classes are present (for example, control and treated samples), a binary Y variable is added to the dataset, coded so that −1 is assigned to one class (control samples) and +1 to the other (treated samples). When more than two classes are present, the Y matrix contains one column for each class, and each sample is coded as +1 in the column corresponding to its own class and −1 in the other columns.

The regression is then carried out between the X-block variables (the Zernike moments) and the Y variables just established. This classification process is called PLS discriminant analysis (PLS–DA).
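As an illustration of this scheme, the sketch below implements PLS–DA with the −1/+1 coding and leave-one-out cross-validation described above, using scikit-learn's PLSRegression rather than the PARVUS package actually employed in this work; the helper name is our own:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut

def plsda_loo_rmsecv(X, y, max_lv=5):
    """RMSECV for PLS-DA models with 1..max_lv latent variables;
    y holds the class codes (-1 for controls, +1 for the other class)."""
    errors = []
    for n_lv in range(1, max_lv + 1):
        residuals = []
        for train, test in LeaveOneOut().split(X):
            model = PLSRegression(n_components=n_lv, scale=True)
            model.fit(X[train], y[train])
            residuals.append(model.predict(X[test]).ravel()[0] - y[test][0])
        errors.append(np.sqrt(np.mean(np.square(residuals))))
    return errors  # pick the number of LVs giving the smallest value
```

A new sample is then assigned to the class whose code (−1 or +1) is nearer to its predicted y value.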

Model evaluation

The coefficient of multiple determination, \(R^{2}\), for PLS was calculated as:

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(\hat{y}_{i} - y_{i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i} - \bar{y}\right)^{2}}$$

where the two sums run over the samples used for calibration (\(R^{2}\)) or validation (\(R^{2}_{cv}\)); \(\hat{y}_{i}\) is the predicted value of the response for the i-th sample; \(y_{i}\) is the experimental value of the response for the i-th sample; and \(\bar{y}\) is the average response of the samples used for calibration (\(R^{2}\)) or validation (\(R^{2}_{cv}\)).

The root mean square error (RMSE) is calculated as:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n}\left(\hat{y}_{i} - y_{i}\right)^{2}}{n}}$$

where the sum runs over the samples used for calibration (RMSEC) or validation (RMSECV).
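Both figures of merit are straightforward to compute; a minimal numpy helper (our own, for illustration) is:

```python
import numpy as np

def r2_and_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for experimental and predicted responses;
    for the cross-validated variants, pass the predictions obtained
    in leave-one-out cross-validation."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, np.sqrt(ss_res / y_true.size)
```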

Here, the best model complexity, in terms of both the number of X variables retained and the number of latent variables in the model, was selected as the one giving the minimum RMSECV (leave-one-out procedure).

Experimental

Datasets

The procedure was applied to six groups of 2D-PAGE maps belonging to four different pathologies: human lymphoma, neuroblastoma, human colon cancer and human pancreatic cancer. For lymphoma and neuroblastoma, the comparisons involved:

  • Lymphoma: Four samples from the GRANTA519 cell line of human lymphoma (control) and four samples from the MAVER-1 cell line

  • Neuroblastoma: Four samples from control adrenal mouse glands (control) and four samples from adrenal mouse glands affected by neuroblastoma.

The other two cases under investigation were more complex:

  • Colon cancer exposed to a histone deacetylase (HDAC) inhibitor. Nuclei and total cell lysates from the colon cancer cell line HCT116 were investigated. The nuclei dataset comprised six control (untreated) samples and five samples treated with an HDAC inhibitor, while the lysates dataset comprised five control and five HDAC inhibitor-treated samples.

  • Pancreatic cancer. Two human pancreatic cancer cell lines, PACA44 and T3M4, were investigated, each either treated or untreated with trichostatin A. For the PACA44 cell line, four control and four drug-treated samples were investigated, while for the T3M4 cell line the dataset consisted of five control and five drug-treated samples.

The experimental protocols followed to obtain the 2D-PAGE maps used in this study are not reported here, since they are described elsewhere [10, 34, 35] and represent standard practice in proteomics.

Software

PLS–DA with variable selection was performed by PARVUS (M. Forina, University of Genova, Italy, http://www.parvus.unige.it/). Zernike moments were computed by software developed in-house in Visual Basic (Microsoft Visual Studio 6.0). Data pretreatment and graphical representations were performed by Visual Basic, Parvus and Microsoft Excel 2003.

Results and discussion

Image pretreatment

Each map was automatically digitized to provide a grid of 100 × 100 pixels, where each pixel contains the grayscale intensity at the corresponding position in the image. These values range from 0 (black) to 255 (white).
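The gels were digitized with a densitometer and in-house software; purely as an assumed stand-in for that step, a scanned map can be reduced to such a grid with a few lines of Python (PIL-based; the function name is ours):

```python
import numpy as np
from PIL import Image

def load_map(path, size=100):
    """Load a scanned 2-D gel image as a size x size grayscale array,
    with values from 0 (black) to 255 (white)."""
    img = Image.open(path).convert("L").resize((size, size))
    return np.asarray(img, dtype=np.uint8)
```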

Each map image was then pretreated to remove the background contribution from the signal. This correction relies on two threshold values that must be fixed: a threshold on the first derivative of the image (the slope), indicating the presence of an actual spot rather than noise, and a threshold on the pixel value (the cut), corresponding to the background level.

For each image, the first derivative is calculated as the difference between the values of two adjacent pixels (calculations are performed row-wise). Each image is then corrected for the background by considering the first derivative and the pixel value simultaneously: if the first derivative is less than the slope and the pixel value is greater than the cut, the pixel is set to 255 (white). Good results were obtained with cut values of 100–150 and slope values of 10–20. Figure 1 shows an example of a sample corrected using cut = 100 and slope = 15; all of the maps were treated using these two values.
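A minimal numpy sketch of this correction follows; taking the absolute value of the row-wise difference is our own reading of the rule, not a detail stated in the text:

```python
import numpy as np

def background_correct(img, cut=100, slope=15):
    """Set a pixel to white (255) when its row-wise first derivative is
    below `slope` (flat region, no spot edge) and its value is above
    `cut` (light enough to be background)."""
    img = img.astype(int).copy()
    deriv = np.zeros_like(img)
    deriv[:, 1:] = np.abs(np.diff(img, axis=1))  # difference of adjacent pixels
    img[(deriv < slope) & (img > cut)] = 255
    return img.astype(np.uint8)
```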

Fig. 1 Example of map pretreatment: sample CTR1 from the Lymphoma dataset before (a) and after (b) background correction with cut = 100 and slope = 15

Zernike moment calculation and dataset preparation

For each image, Zernike moments were calculated up to a maximum order p of 100, giving a total of 2601 moments. The algorithm computes the real and imaginary parts of the moments separately, providing a total of 5202 descriptors for each image: 2601 real parts and 2601 imaginary parts. Since the X matrix can only contain real numbers, each imaginary part was represented by its real coefficient (i.e., the number multiplying i). For example, the complex number (−5.34 − 0.0478i) is split into a real part, −5.34, and an imaginary-part coefficient, −0.0478.
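To make this bookkeeping concrete, the sketch below assembles the 5202-element descriptor vector for one map, reusing the zernike_moment helper sketched in the "Theory" section; the loop bounds encode 0 ≤ q ≤ p ≤ 100 with p − q even (2601 moments), and the real-then-imaginary layout is our own illustrative choice:

```python
import numpy as np

def moments_to_descriptors(img, p_max=100):
    """Descriptor vector for one map: the real parts of all 2601 moments
    followed by their imaginary-part coefficients (5202 values in total)."""
    real_parts, imag_parts = [], []
    for p in range(p_max + 1):
        for q in range(p % 2, p + 1, 2):       # q >= 0 and p - q even
            z = zernike_moment(img, p, q)      # sketch from the Theory section
            real_parts.append(z.real)
            imag_parts.append(z.imag)
    return np.array(real_parts + imag_parts)
```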

The samples in the four different datasets were coded as follows:

  • Lymphoma dataset: GRANTA519 cell line (CTR1-4) and MAVER-1 cell line (MAV1-4)

  • Neuroblastoma dataset: adrenal mouse glands (CTR1-4) and adrenal mouse glands affected by neuroblastoma (ILL1-4);

  • Colon cancer dataset: nuclei from colon cancer cells (CTR1-6) and nuclei from colon cancer cells treated with an HDAC inhibitor (NHD1-5); lysates from colon cancer cells (CTR1-5) and lysates from colon cancer cells treated with the inhibitor (LHD1-5)

  • Pancreatic cancer dataset: control (PACA1-4) and drug-treated (PTSA1-4) PACA44 cell line; control (T3M41-5) and drug-treated (TTSA1-5) T3M4 cell line.

PLS–DA

Each dataset was autoscaled before PLS–DA was performed. PLS–DA was applied, as specified in the “Theory” section, with variable selection via a backward elimination algorithm. This procedure identifies only the most relevant moments, i.e., those that allow the correct classification of the samples (minimum RMSECV).

Due to the large number of variables present (5202), the backward elimination procedure was applied in two consecutive steps:

  • A first selection, in which groups of nonsignificant variables were eliminated until the dataset contained at most 200 moments, based on the smallest error in cross-validation

  • A second refinement, in which variables were eliminated one at a time in order to select the subset of moments providing the smallest error in cross-validation (a minimal sketch of this step is given after this list).
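The sketch below illustrates the second, one-at-a-time step as a greedy scheme under our own assumptions (accepting the first removal that lowers the cross-validation error); the exact algorithm implemented in PARVUS may differ:

```python
import numpy as np

def backward_eliminate(X, y, rmsecv, min_vars=1):
    """Greedy one-at-a-time backward elimination: repeatedly drop a
    variable whose removal lowers the RMSECV (e.g., computed with a
    PLS-DA leave-one-out helper), stopping when no single removal
    improves the current model."""
    selected = list(range(X.shape[1]))
    best = rmsecv(X[:, selected], y)
    improved = True
    while improved and len(selected) > min_vars:
        improved = False
        for idx in selected:
            trial = [v for v in selected if v != idx]
            err = rmsecv(X[:, trial], y)
            if err < best:
                best, selected, improved = err, trial, True
                break
    return selected, best
```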

Table 1 reports the variance explained by the first LV for each case study, for both the X and Y variables. The first LV was the only significant one for all of the cases under investigation (leave-one-out cross-validation). Indeed, the first LV explains more than 99% of the total information contained in the Y variable; the only exception is the Nuclei dataset, for which it explains about 96%. Using one LV in each classification model allows the correct classification of all of the samples in each dataset, with a final non-error rate (NER%) of 100%.

Table 1 Variance (%) explained by the first LV for the X and Y variables for each case study

Figures 2 and 3 show the score and loading plots, respectively, for all of the investigated datasets. In the score plots, the control class is plotted as circles and the other class as squares. In all cases, the control samples lie at large negative values along the first LV, while the other class lies at large positive scores. This behavior confirms the ability of the first LV to separate the samples into the two classes present in each dataset.

Fig. 2 Score plots of the first two LVs for the four datasets investigated: Lymphoma (a); Neuroblastoma (b); Pancreas (PACA cell line: c; T3M4 cell line: d); Colon cancer (Lysates: e; Nuclei: f). Control samples are represented as circles; samples belonging to the second class are represented as squares

Fig. 3 Loading plots of the first two LVs for the four datasets investigated: Lymphoma (a); Neuroblastoma (b); Pancreas (PACA cell line: c; T3M4 cell line: d); Colon cancer (Lysates: e; Nuclei: f). Zernike moments are represented by numbers

Analysis of the corresponding loading plots identifies the Zernike moments responsible for the differences between the classes of samples. In each loading plot, each moment is represented by a number from 1 to 5202. Moments located at large negative values along LV1 take large positive values for control samples and large negative values for the other class; moments located at large positive values along LV1 show the opposite behavior: large negative values for control samples and large positive values for the other class in each dataset. The loading plots report only the moments found to be significant by the backward elimination procedure applied to each dataset.

Table 2 reports the \(R^{2}\) and RMSE values obtained in fitting and cross-validation for all of the datasets investigated. The \(R^{2}\) values show that the models perform very well in terms of both fitting and validation. The worst results were obtained for the Nuclei dataset, which nevertheless presents \(R^{2}\) and \(R^{2}_{cv}\) values above 0.94. The good ability of the derived classification models to describe the information provided (fitting) and to predict new values (validation) is also demonstrated by the calculated RMSE values: the fitting errors (RMSEC) are almost all below 0.1 (the only exception being the Nuclei dataset), while the validation errors (RMSECV) are almost all below 0.11 (again, the only exception being the Nuclei dataset).

Table 2 \(R^{2}\) and RMSE values calculated for fitting (\(R^{2}\), RMSEC) and cross-validation (\(R^{2}_{cv}\), RMSECV) for all of the datasets investigated

These conclusions are also confirmed by Fig. 4, which reports, for each case study, the calculated and predicted Y values plotted against the actual Y values. In all cases there is good agreement between the actual values and the calculated or predicted ones. Since PLS is used here as a classification tool, the most important information provided by these diagrams is the variation of the calculated and predicted values along the Y axis: both the fitted and the cross-validated values are negative for the control samples and positive for the other class in each dataset, proving that the models derived here achieve a NER% of 100%. This also holds for the Nuclei dataset, even though the variations along the Y axis are largest in this case.

Fig. 4 Calculated and predicted Y values vs. reference Y values for the four datasets investigated: Lymphoma (a); Neuroblastoma (b); Pancreas (PACA cell line: c; T3M4 cell line: d); Colon cancer (Lysates: e; Nuclei: f). Calculated values are represented as circles, while predicted values are shown as squares. Solid regression lines correspond to calculated values, while dotted regression lines correspond to predicted values

Table 3 reports the number of significant moments selected for each dataset; for each case studied, the real and imaginary parts of the moments are reported separately. Each moment is represented by an alphanumeric string giving the orders p and q, followed by R if it is the real part or I if it is the imaginary part. The number of significant moments (ranging from five for the Lysates dataset to 32 for the Nuclei dataset) shows the importance of the selection procedure, which eliminates information present in the maps that is not directly related to the classification of the samples (i.e., redundant information). Analysis of the significant p, q orders shows that they do not recur across cases; in other words, different moments are significant for the different cases studied. This is reasonable, since different classes of maps differ in different regions of the maps themselves. Unfortunately, directly identifying the features of each group of images that determine the differences between the two classes investigated is not a trivial task: this is due to the particular nature of Zernike moments (and other image moment functions), which capture global, independent aspects of each image.

Table 3 Zernike moments considered to be significant for the six cases under investigation; real and imaginary parts are reported separately

Conclusions

A new method for the fast comparison of proteomic 2-D maps is presented here. The method exploits Zernike moment functions coupled to classification tools. Zernike moments were calculated for four different datasets (six case studies in total) of varying complexity, all characterized by the presence of two classes: control samples and diseased or treated samples.

The procedure developed proved to be a successful tool for extracting the global information present in maps obtained from 2-D gel electrophoresis: PLS–DA provided the correct classification of all of the samples in all of the cases investigated. The application of backward elimination enabled the identification of the most parsimonious set of moments providing the best cross-validation results. For the cases investigated, between 5 and 32 moments were found to be significant for classification.

The method proposed could be applied in principle to perform rapid comparisons of 2-D proteomic maps; increasing the number of samples in each class could also lead to its use in diagnostic applications.

It is, however, important to point out that Zernike moments extract general, independent aspects of an image, so they do not easily or directly provide information about the differences that exist between the classes of maps investigated. At present, reconstructing the images from the selected significant moments makes it possible to identify large areas of the image containing the most relevant differences. For all of the cases studied, these areas contain relevant information (i.e., actual spots). This initial result is important, since it shows that Zernike moments classify the images on the basis of the spots rather than differences in the background or image artefacts. However, this information is not sufficient when the purpose goes beyond diagnosis (a role ably fulfilled by the proposed procedure) to the identification of the differences themselves (functional proteomics). Work is in progress in our laboratory to solve this problem and thus make Zernike moments useful from this point of view as well.