1. Diagnosis and Prognosis

Diagnosis and prognosis are core disciplines of modern medicine. Cancer diagnosis is usually associated with dismal prognosis, so that, in oncology, both disciplines are closely associated. Modern developments of clinical genomics tend to confirm this association, since molecular markers of disease can often be used for both purposes. However, associating diagnosis with prognosis is an oxymoron: diagnosis is a generalization, the result of a classification that is independent of the individual case, while the prognosis describes the probable course of the illness in a particular patient.[1] Another difference has to be made in terms of time. Diagnosis detemporalizes the disease process, while prognosis unrolls a momentary state into a significant time sequence.[2] In molecular medicine, diagnosis can be made by identifying common traits between people with illness. For example, common proteomics patterns can be determined in patients with cancer: this approach is usually named expression proteomics. In contrast, to forecast prognosis, only those qualitative and quantitative differences in protein expression that ultimately result in dysfunctions in cellular behavior and thus in clinical phenotype over time will be selected (so-called functional proteomics).

2. The Dimensions of Prognosis

Forecasting plays a central role for decision-making in oncology, and prognosis concerns not only the end of disease, but also its progression and duration.[3] Unlike diagnosis, prognosis depends on a collection of variables, the so-called prognostic factors. Prognosis must be evaluated in the context of the endpoints of interest, endpoints that can be multiple (metastasization, response, local tumor control, organ preservation, survival, etc.). Finally, prognosis also changes with the therapeutic choices made. Thus, prognosis is multidimensional.

Conventional prognostic factors include, for instance, macroscopic and microscopic tumor characteristics. The most important disease-related prognostic factor is the anatomic extent of disease at the time of diagnosis. This extent is defined according to the International Union against Cancer (UICC) Tumor Nodes Metastasis (TNM) classification:[4] for example, 90% of patients with stage I colorectal cancer will survive disease-free for 5 years, whereas fewer than 5% of patients with stage IV cancer survive that long (table I).[5] Other tumor-related factors include microscopic characteristics such as tumor grade and vessel infiltration, both of which correlate with recurrence and survival. In rectal cancer, for example, poorly differentiated tumors and tumors showing vascular invasion are associated with poorer survival.[6]

Table I. Example of a tumor-related prognostic factor. Survival in colorectal cancer by tumor nodes metastasis (TNM) stage (without adjuvant therapy). Probability of survival varies from 90% to <5% according to the anatomic extent of disease at the timepoint of diagnosis. The most powerful predictor of outcome in patients with newly diagnosed colorectal cancer continues to be the pathologic stage.

However, prognostic factors are not limited to the tumor itself: they are defined as variables accounting for the heterogeneity associated with the expected course and outcome of a disease,[7] including factors concerning the patient and the patient's environment. Thus, analyzing only the tumor, for example with proteomics tools, will provide only part of the prognostic information. Treatment-related prognostic factors include, for instance, the clearance of the surgical resection margin. This clearance is defined by the R-classification of the UICC,[4] and the presence of a positive pathologic surgical margin is associated with a poor prognosis. This is not trivial information: in colorectal cancer, a positive margin increases the local failure rate in all TNM stages from 3% to 85% (table II). In rectal cancer, the local failure rate has been reported to range between 5% and 50%, according to the quality of surgery.[8] Other factors such as complications following surgery also play a role; for instance, anastomotic leakage after rectal cancer surgery significantly decreases cancer-related survival.[9] Although usually not related to the presence of tumor, host-related prognostic factors may also have a profound impact on the outcome. In colorectal cancer, for instance, co-morbidities such as obesity, weight loss, anemia, diabetes, cirrhosis, and renal failure have been shown to represent preoperative risk factors.[10] Several prognostic factors, each individually giving predictions with relatively low accuracy, can be combined to provide a single variable of better accuracy. For example, all possible combinations of the TNM categories are usually combined into four stages of disease (UICC stages I to IV)[4] that serve in turn as a basis for therapeutic decisions. However, the number of clinical and pathologic variables usually surpasses the number of TNM categories. In clinical practice, the number of clinical and pathologic variables considered for indicating adjuvant therapy is between 10 and 100. Of course, this conventional prognostic information provides the clinical framework for proteomics studies, and not the converse (table III).
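The way several TNM categories collapse into a single stage variable can be sketched in a few lines. The grouping below is a deliberately simplified version of the colorectal stage grouping (no substages, no in situ tumors) and is an assumption made for illustration; it is not taken from the text above, and the function name `uicc_stage` is hypothetical.

```python
def uicc_stage(t, n, m):
    """Map simplified colorectal T (1-4), N (0-2), M (0-1) categories to a stage.

    Simplified grouping assumed for illustration only:
    any M1 -> IV; otherwise any N1-2 -> III; otherwise T1-2 -> I, T3-4 -> II.
    """
    if t not in (1, 2, 3, 4) or n not in (0, 1, 2) or m not in (0, 1):
        raise ValueError("unsupported TNM category")
    if m == 1:
        return "IV"
    if n > 0:
        return "III"
    return "I" if t <= 2 else "II"


# Enumerate every combination to show how 24 TNM triplets collapse into 4 stages.
combinations = [(t, n, m) for t in (1, 2, 3, 4) for n in (0, 1, 2) for m in (0, 1)]
stages = {uicc_stage(t, n, m) for t, n, m in combinations}
print(len(combinations), "TNM combinations ->", sorted(stages))
```

The point of the sketch is only the many-to-one mapping: the staging variable is easier to use for therapeutic decisions than the individual categories, at the price of discarding detail.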

Table II. Example of a treatment-related prognostic factor. Local failure rate depends on the clearance of the surgical margin (R0). The presence of a positive pathologic surgical margin (R1) is a poor prognostic factor.

Table III. Framework for prognostic studies using proteomics tools.

3. Clinical Framework for Proteomics Studies

How accurate is forecasting based solely on clinical and pathologic variables?

Recently, the Commission on Cancer data from the National Cancer Data Base (NCDB) for patients with colon carcinoma were used to develop several artificial neural network models and regression-based models.[11] Twenty-one variables were entered (3 patient-related, 6 treatment-related, and 12 tumor-related) and the survival prediction was examined. The neural network found a strong pattern in the database predictive of 5-year survival status, yielding a receiver-operating characteristic (ROC) area of 87.6%. In other words, the specificity was 41% at a sensitivity to mortality of 95%. Using conventional statistics, logistic regression yielded a ROC area of 82% and, at a sensitivity to mortality of 95%, a specificity of 27%. This is more accurate than the TNM classification alone. Since molecular factors determine only part of the prognosis, the significant protein patterns obtained by proteomic studies still need to be linked with relevant clinical data to provide accurate forecasting (figure 1).
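To make the two reported measures concrete, the following minimal sketch computes a ROC area and the specificity reachable at a required sensitivity from a list of predicted risks. The scores and outcomes are invented for illustration and are unrelated to the NCDB data; the function names are hypothetical, and the ROC area is computed with the rank (Mann-Whitney) formulation, which is one of several equivalent ways to obtain it.

```python
def roc_auc(scores, labels):
    """ROC area: probability that a random positive scores above a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(scores, labels, target_sensitivity):
    """Best specificity among thresholds whose sensitivity meets the target."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for threshold in sorted(set(scores)):
        # A case is called positive when its score is at or above the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        if tp / n_pos >= target_sensitivity:
            best = max(best, tn / n_neg)
    return best


# Hypothetical predicted 5-year mortality risks; label 1 = died within 5 years.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.35, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

print("ROC area:", roc_auc(scores, labels))
print("specificity at 100% sensitivity:",
      specificity_at_sensitivity(scores, labels, 1.0))
```

Fixing the sensitivity to mortality first and then reading off the specificity, as in the study cited above, reflects a clinical preference for not missing patients who will die of their disease.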

Fig. 1. Prognosis is multidimensional and is determined by multiple prognostic factors concerning not only the tumor, but also the patient and the patient's environment. Prognosis changes over time and with therapeutic choices. Better than genomic analyses such as sequencing or quantitative amplification, proteomics tools allow tumor phenotype to be studied and therapy to be directed against specific functional tumor characteristics (pharmacoproteomics). Proteomics analysis can complement the high informational content of clinical and pathologic prognostic factors. Shaded areas: privileged domains of application for proteomic studies.

4. Protein Technologies and Prognosis

Most current prognostic tests in cancer are based on the detection and quantification of single proteins in body fluids. The direct availability of these proteins in body fluids, in particular in the serum, is an important feature in clinical practice, because repeated sampling is possible with minimal invasiveness. The following tests are examples of those currently used in clinical practice:

  • carcinoembryonic antigen (CEA) in colorectal cancer
  • prostate-specific antigen (PSA) in prostate cancer
  • α-fetoprotein (AFP) in hepatocellular carcinoma
  • CA 19-9 in pancreatic cancer
  • CA 125 in ovarian cancer
  • calcitonin in medullary thyroid cancer, etc.

Historically, these tests, all based on enzyme-linked immunosorbent assay (ELISA) technology, were developed only on empirical grounds, based on observation and correlation of measured levels of single proteins with the diagnosis, recurrence, or prognosis of disease. However, a common characteristic of these diagnostic tests is their relatively low predictive value, so that they have to be combined with other diagnostic procedures such as biopsy and/or endoscopy. In the meantime, improved analytical tools have drawn attention to possible refinements of these classical tests in order to improve their specificity and, thus, their potential for population screening. In any case, the relative clinical success of these tests has provided researchers with the proof of principle that protein-based tests can be successfully applied in cancer diagnosis, in detecting tumor recurrence, and/or for outcome prediction. Patterns of protein expression have been shown to yield more biologically relevant and clinically useful information than assays of single proteins.[12] However, currently available technologies either limit the number of proteins that can be analyzed simultaneously, or are expensive, difficult, and time-consuming. Moreover, the tools adapted for expression proteomics may not be the same as those required for the investigation of protein function over time.

4.1 Two-Dimensional Electrophoresis

Over the last decade, improved 2-dimensional polyacrylamide gel electrophoresis (2-D PAGE), combined with advanced mass spectrometers such as matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF), tandem mass spectrometry (MS-MS), and ion-trap instruments, has allowed a systematic search of the cellular proteome, or a fraction of this proteome. These combined tools have been increasingly applied in the cancer setting, in particular to lung,[13] renal,[14] colorectal,[15–17] pancreatic,[1] breast,[18] and bladder[19] cancers. Post-translational modifications in multiple oncogene and tumor suppressor gene products can also be highlighted simultaneously (figure 2).[20] Numerous proteins of interest have been identified in these studies, the clinical significance of which is currently under investigation. Of course, this is a cumbersome process requiring close links between bench and bedside (figure 3).

Fig. 2. Mini 2-dimensional electrophoresis (2-D PAGE) immunoblot of selected gene products in human colorectal cancer versus normal control.[21] The proteomic approach shows not only quantitative, but also qualitative differences in protein expression. Cyclin E is a cell cycle control protein the expression of which correlates with metastasization in colorectal cancer.[22] Kip-1 (p27) is another cell-cycle protein correlating with disease-free survival in rectal cancer.[23] Crk is a signaling adaptor protein activating the AKT pathway in transformation.[24]

Fig. 3. Clinical proteomics necessitates close collaboration between bench and bedside. After informed consent has been obtained, tissue, body fluid, and associated data are collected. Standardization is achieved by specific sample preparation. Proteins are usually denatured and separated using two-dimensional electrophoresis (2-D PAGE) or other techniques. After scanning, the protein pattern is matched between different conditions, e.g. tumors with favorable and dismal prognosis. Spots of interest, i.e. those showing differential behavior, are localized and excised. A protein fingerprint is obtained using mass spectrometry, and identification is possible by matching this fingerprint against databases. Finally, results are validated using clinical samples, e.g. by immunoblot, ELISA, and/or immunohistochemistry.

A first example of a successful prognostic application of 2-D PAGE patterns has been delivered recently in ovarian tumors. These tumors range from benign to aggressively malignant, including an intermediate class referred to as borderline carcinoma, so that their proper classification represents a challenge for pathologists. In turn, the prognosis of the disease is strongly dependent on tumor classification, patients with borderline tumors having a much better prognosis than patients with carcinomas. In clinical practice, the extent of surgery and the indication for chemotherapy depend on correct classification. Hierarchical clustering was applied to the analysis of protein profiles obtained by 2-D PAGE and allowed differential diagnosis of ovarian carcinomas and borderline tumors.[25]
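The principle of hierarchical clustering on spot-intensity profiles can be illustrated with a small, self-contained sketch. The cited study's distance measure and linkage rule are not given here, so Euclidean distance and average linkage are assumptions, and the six four-spot profiles are invented purely for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two intensity profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles, c1, c2):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clusters(profiles, n_clusters):
    """Agglomerative clustering: merge the two closest clusters until n_clusters remain."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(profiles, clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical normalized intensities of four spots in six tumor samples.
samples = ["borderline_1", "borderline_2", "borderline_3",
           "carcinoma_1", "carcinoma_2", "carcinoma_3"]
profiles = [
    [1.0, 0.9, 0.2, 0.1],
    [1.1, 1.0, 0.3, 0.2],
    [0.9, 0.8, 0.1, 0.2],
    [0.2, 0.3, 1.2, 1.0],
    [0.1, 0.2, 1.0, 1.1],
    [0.3, 0.2, 1.1, 0.9],
]

groups = hierarchical_clusters(profiles, 2)
for group in groups:
    print([samples[i] for i in group])
```

The clustering is unsupervised: the sample labels are used only to print the result, so agreement between the clusters and the histological classes is what would support the diagnostic value of the protein pattern.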

4.2 Peptidomics

Peptides, such as hormones, cytokines, and growth factors, play a major role in carcinogenesis and tumor progression, but cannot be detected by 2-D PAGE, since this technique is methodologically restricted to the analysis of proteins with higher molecular masses (>10 kDa). ‘Peptidomics’ is a technology which covers peptides with low molecular weight and small proteins (0.5–15 kDa) where necessary, and combines liquid chromatography or affinity purification with mass spectrometric identification. Peptidomics has also been applied successfully to the analysis of human serum in the cancer framework.[26]

5. Limitations of Proteomics in Cancer

Protein studies also have several drawbacks. The identification of proteins of interest remains cumbersome, or even impossible in the absence of related information in repository databases. Current protein tools only allow narrow-range analyses: about 10³ spots can be reproduced in the clinical setting.[27] Silver-stained 2-D PAGE is a semi-quantitative technique that is still hampered by considerable interassay variability, so that the biological significance of small differences in protein expression is questionable: we found that 90% of proteins with an expression ratio >2 might reveal biological significance, when using purified normal and pathological epithelium in colorectal cancer.[27]
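A ratio threshold of this kind amounts to a simple filter on paired spot intensities. The sketch below is illustrative only: the spot names and intensities are invented, the function name is hypothetical, and treating down-regulation symmetrically (ratio below 1/2) is an assumption that the text above does not state.

```python
def differential_spots(normal, tumor, min_ratio=2.0):
    """Return spots whose tumor/normal intensity ratio is > min_ratio or < 1/min_ratio."""
    selected = {}
    for spot, n in normal.items():
        t = tumor.get(spot)
        if t is None or n <= 0 or t <= 0:
            # Ratio undefined: a spot absent in one condition is a qualitative
            # difference and has to be handled separately.
            continue
        ratio = t / n
        if ratio > min_ratio or ratio < 1.0 / min_ratio:
            selected[spot] = ratio
    return selected


# Hypothetical spot volumes (arbitrary units) from matched gels.
normal = {"spot_01": 100.0, "spot_02": 80.0, "spot_03": 50.0, "spot_04": 120.0}
tumor = {"spot_01": 250.0, "spot_02": 90.0, "spot_03": 20.0, "spot_04": 0.0}

print(differential_spots(normal, tumor))
```

With a semi-quantitative stain, spots that fall just above or below the threshold remain uncertain, which is why a conservative ratio is preferable to testing small differences.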

Protein expression and function are dynamic features, linked not only to synthesis, but also to processing, cleavage, etc. Thus, protein studies involve unravelling complex biochemical pathways. To analyze such complex patterns, relatively large amounts of protein are necessary, since no amplification technique is available, and access to large tissue samples in the clinical setting is a challenging task. In serum, detection of small amounts of proteins is difficult. Protein studies in the clinical setting require research tools, for example, polyclonal or even monoclonal antibodies, which first have to be developed. Only when such tools are available will clinical validation become relatively straightforward, in particular via immunohistochemistry, a technique widely available in pathology laboratories, and ELISA testing of body fluids.[28]

6. The Isotope-Coded Affinity Tag Method

A novel method, together with new software tools to support it, has been proposed to allow the large-scale systematic identification and quantification of membrane proteins and other classes of proteins that have been refractory to standard proteomics technology. The method, called isotope-coded affinity tag (ICAT), consists of three steps: (i) preparation of microsomal fractions from clinical material; (ii) covalent tagging of the proteins with ICAT reagents followed by proteolysis of the combined labeled protein samples; and (iii) isolation, identification, and quantification of the tagged peptides by multidimensional chromatography, automated tandem mass spectrometry, and computational analysis. In cancer, this novel method has already been used to identify and determine the ratios of abundance of each of 491 proteins contained in the microsomal fractions of naive and in vitro-differentiated human myeloid leukemia (HL-60) cells.[29]
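The final computational step, turning paired peptide signals into protein abundance ratios, can be sketched as follows. This is a minimal illustration and not the published software: it assumes that each tagged peptide yields one intensity for the light-labeled sample and one for the heavy-labeled sample, and that a protein's ratio is summarized as the median of its peptide ratios. Protein names and intensities are invented.

```python
from collections import defaultdict
from statistics import median


def protein_ratios(peptide_pairs):
    """Summarize heavy/light peptide intensity ratios per protein (median, by assumption)."""
    per_protein = defaultdict(list)
    for protein, light, heavy in peptide_pairs:
        if light > 0 and heavy > 0:  # skip pairs with a missing partner peak
            per_protein[protein].append(heavy / light)
    return {protein: median(ratios) for protein, ratios in per_protein.items()}


# Hypothetical (protein, light intensity, heavy intensity) triples.
peptide_pairs = [
    ("protein_A", 1000.0, 2000.0),
    ("protein_A", 500.0, 1100.0),
    ("protein_A", 800.0, 1440.0),
    ("protein_B", 900.0, 450.0),
    ("protein_B", 600.0, 330.0),
]

ratios = protein_ratios(peptide_pairs)
for protein, ratio in ratios.items():
    print(protein, round(ratio, 3))
```

Because both samples are combined before proteolysis and separation, losses during processing affect the light and heavy forms of a peptide equally, so the ratio is more robust than either absolute intensity.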

7. Protein Arrays: the Solution?

A major industrial effort in the development of various kinds of protein expression arrays is underway to overcome the present limitations of proteomics tools. The novel tools range from synthetic peptide arrays to whole cellular protein arrays.[30,31] These different protein arrays can be applied to examine in parallel the expression of thousands of proteins previously known only by their DNA sequence. This is achieved by immobilizing different probes (e.g. antibodies, lectins, DNA, receptors, etc.), followed by binding the complete proteome of a living cell, or, alternatively, by selecting a fraction of this proteome according to predetermined chemical or physical (cationic, hydrophobic, etc.) properties. This is exemplified by the surface-enhanced laser desorption ionization time-of-flight (SELDI-TOF) technology for discovering a small set of key proteins discriminating between normal and cancerous tissue: in a first step, a given population of proteins is selected on a protein chip according to specific physical or electrical properties. Then, differences in the expression of these proteins between pathological and control samples are compared using sophisticated algorithms, a spectrum of peaks being used as a read-out.

This approach has been applied in ovarian[32] and prostate cancer. In the latter study,[19] serum profiles from 167 patients with prostate cancer were compared with those from 77 patients with benign prostate hyperplasia and 82 age-matched unaffected healthy men. A sensitivity of 83%, a specificity of 97%, and a positive predictive value of 91% for the general population were obtained when comparing the prostate cancer sera with the non-cancer sera.[19] The ability to bind comprehensive sets of specific peptides and proteins on a chip also permits high-throughput screening for discrete biochemical properties, and thus an insight into functional aspects. These functional aspects will be critical to progress in prognostic oncology using proteomics tools. Some technical solutions have already been developed that allow the detection of interactions between one protein and another, between a small molecule and a protein, and between an enzyme and its substrate.[33] The first applications of this method have been seen in cancer. Reverse-phase protein microarrays combined with laser-capture microdissection (LCM) have been applied to the study of the invasion front in prostate cancer: the degree of phosphorylation of pro-survival checkpoint proteins could be determined, highlighting a decreased phosphorylation of the extracellular-signal regulated kinase (ERK), a member of the Raf/MEK/ERK signal transduction cascade, and a surge in phosphorylated Akt, a component of the phosphatidylinositol 3-kinase/Akt pathway, at the tumoral invasion front.[34]
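Unlike sensitivity and specificity, a positive predictive value depends on how common the disease is in the population tested. The sketch below applies Bayes' rule to the sensitivity and specificity quoted above for a few prevalences; the prevalences are hypothetical, and the sketch does not attempt to re-derive the reported 91%, since the counts underlying that figure are not given here.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability of disease given a positive test (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Sensitivity and specificity as quoted in the text; prevalences are hypothetical.
for prevalence in (0.5, 0.1, 0.01):
    ppv = positive_predictive_value(0.83, 0.97, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}")
```

The steep fall of the predictive value at low prevalence is the reason why even a highly specific test may perform poorly when used for population screening.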

However, all protein chip technologies suffer from the problem of miniaturization: the identification of low levels of proteins is limited by the small volume of protein solution that is applied to the surface of the chip, so that downstream studies will remain difficult or even impossible in the near future.

8. The Role of Transcriptomics Studies

In contrast to the current narrow-range protein profiling, mRNA profiling enables us to monitor expression levels of thousands of transcripts in a cell simultaneously. For example, gene expression patterning was recently linked to outcome in a relatively large series of patients (n=117), and signatures identifying patients with a high risk of metastasization could be recognized in breast cancer.[35] Nevertheless, analyses in yeast and mammalian cells have demonstrated that mRNA levels alone are unreliable indicators of the corresponding protein abundance. This discrepancy between mRNA and protein levels argues for the relevance of additional control mechanisms besides transcription in the management of protein abundance.[36] As translational control is a major mechanism regulating gene expression, there is a strong rationale for a combination of the mature technical potential of transcriptomics with the physiological relevance of proteomics.

9. Toward the Development of a Clinical Molecular Scanner?

Although automation is often possible, a number of limitations still adversely affect the rate of protein identification and annotation in 2-D PAGE databases. These include:

  • the sequential excision process of pieces of gel containing protein
  • the enzymatic digestion step
  • the interpretation of mass spectra (reliability of identifications)
  • the manual updating of 2-D PAGE databases.

To overcome these limitations, a highly automated method that generates a fully annotated 2-D PAGE map has recently been proposed.[37] Using a parallel process, all proteins of a 2-D PAGE gel are first simultaneously digested proteolytically and electro-transferred onto a polyvinylidene difluoride (PVDF) membrane. The membrane is then directly scanned by MALDI-TOF mass spectrometry. After automated protein identification from the obtained peptide mass fingerprints,[38] a fully annotated 2-D PAGE map was created on-line (see reference for web address), containing interpreted peptide mass fingerprint data in addition to protein identification results. These methods were applied to a human plasma scan, where very abundant proteins, such as albumin and immunoglobulins, pose an additional difficulty because they obscure the detection of less abundant proteins. After discarding chemical noise, many proteins annotated on the SWISS-2DPAGE human plasma master gel could be identified and interesting properties were observed.[39] No application of the molecular scanner in human cancer has been published so far.
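Identification from a peptide mass fingerprint rests on comparing the measured peptide masses with the masses predicted for each database protein. The sketch below shows only this matching principle with a simple match count; it is not the scoring scheme of the cited identification software. The candidate names, masses, and the 0.2 Da tolerance are hypothetical.

```python
def count_matches(observed, theoretical, tolerance_da=0.2):
    """Number of observed masses lying within the tolerance of any theoretical mass."""
    return sum(
        1 for m in observed
        if any(abs(m - t) <= tolerance_da for t in theoretical)
    )


def rank_candidates(observed, database, tolerance_da=0.2):
    """Rank database proteins by the number of matched peptide masses."""
    scores = {
        name: count_matches(observed, masses, tolerance_da)
        for name, masses in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical theoretical peptide masses (Da) for three database entries.
database = {
    "candidate_A": [842.51, 1045.56, 1479.80, 1638.93, 2045.03],
    "candidate_B": [927.49, 1249.62, 1567.74, 1880.92],
    "candidate_C": [1045.60, 1305.71, 1750.85],
}
# Hypothetical masses measured at one position of the scanned membrane.
observed = [842.45, 1045.58, 1479.75, 1880.99, 2211.10]

ranking = rank_candidates(observed, database)
print(ranking)
```

A raw match count favors large proteins with many predicted peptides and is sensitive to noise peaks, which is one reason why the reliability of identifications is listed above among the limiting steps.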

10. Conclusion

Proteome technology has already been widely used in cancer research and found to be a useful tool for the identification of new molecular markers and treatment-related changes. The proof of feasibility has been delivered by the detection of known tumor markers by proteomics discovery techniques.

So far there is little literature on proteomic analysis in the field of cancer prognosis: this is probably because of the difficulty of bringing proteomic technologies to the bedside. We have discussed in detail how proteomics data would fit into the existing, complex prognostic scenario, and emphasized the need to combine new information about tumors from proteomics with other non-genomic/proteomics information about patients, their treatment history, and their environment. In our opinion, these are preconditions for applying proteomics to direct patient benefit. Such bridging of clinical information with molecular data will necessitate the use of advanced machine-learning tools such as neural networks or rule-based systems.

From the technical point of view, current proteomics technologies fall short of the goal of providing a complete proteome expression profile, and they suffer from a lack of standardization and reproducibility. Nevertheless, over the last few years, advances in mass spectrometry and other proteomics technologies have been substantial, leading in particular to improved sensitivity and dynamic range of detection.

Classical protein tests in oncology, such as the quantitation of CEA, CA 19-9, α-fetoprotein, and other assays, cannot be used for screening purposes because of their lack of specificity. Of course, proteomics discovery tools could allow the identification of a novel, single biological marker with a specificity high enough to allow screening or forecasting in a particular type of cancer. However, in our opinion, and on the basis of experience, the probability that future prognostic tests will derive from a single-marker approach is low: how could a single protein reflect the complex biological behavior of a malignant tumor in a particular patient?

Thus, an approach combining several markers and clinical information is more likely to be successful in forecasting cancer behavior. Typically, such a pattern approach would model several facets of tumor behavior, such as growth, invasiveness, angiogenesis, etc. Indeed, post-translational modifications (in particular phosphorylation and glycosylation) have been shown to play a major role in cancer behavior. Even in light of the most recent developments in off-gel approaches, 2-D PAGE remains extremely powerful for highlighting such post-translational modifications. Thus, we could well imagine that 2-D PAGE and the derived approaches, such as the molecular scanner described above, will play an important role in the marker discovery process. Of course, it is likely that gel-based approaches will be complemented by other discovery tools, in particular for the detection of signaling-relevant membrane proteins, low-abundance proteins, proteins with extreme isoelectric points, etc. However, because of the low reproducibility of gel-based approaches, and the cost and time needed for analysis, we believe that, for routine clinical applications, these approaches will be replaced by more robust techniques, such as small antibody-based protein arrays.

In summary, we strongly believe that clinical proteomics will have a positive impact on oncological forecasting, because these studies provide functional information on the tumor in addition to gene expression data. It can be expected that this functional information will, in turn, be used for indicating and determining adjuvant chemo- and/or immunotherapy. Our preliminary results in forecasting metastasization on the basis of protein patterns, combined with clinical information, are most encouraging. Using similar approaches, it will be possible in the future to better select the kind of chemotherapy on the basis of protein chemoresistance patterns (pharmacoproteomics). In the course of disease, proteomics tools will allow real-time monitoring of efficacy, and possibly of drug toxicity (toxicoproteomics). However, a number of changes still need to take place for proteomics to become useful to the oncologist.