Introduction

Oxaliplatin is a member of the family of Pt-containing chemotherapeutic agents that also includes cisplatin and carboplatin. It is distinguished from these two older drugs by its different spectrum of activity both in preclinical models [18–20] and in clinical trials (reviewed in reference 17). Oxaliplatin is the only Pt-containing drug to have activity in colon cancer, a disease for which this drug has now become a mainstay of therapy [17]. Nevertheless, the majority of patients with colon cancer, as well as the other types of tumor against which oxaliplatin is effective, are either intrinsically resistant to this drug, or become resistant during therapy.

Oxaliplatin is believed to kill cells by forming adducts in DNA, the most prevalent of which is the intrastrand linkage of two adjacent guanines by the nitrogen atoms at position 7, that impair its structure and function (reviewed in reference 4). Oxaliplatin-resistant cells are characterized by reduced cellular oxaliplatin accumulation and decreased DNA adduct formation [16]. Increased levels of intracellular glutathione and abnormalities in the apoptotic pathway may also play a role [8]. Oxaliplatin can enter the cell via the major copper influx transporter CTR1 [14], but whether this transporter is disabled in oxaliplatin-resistant cells has not yet been investigated. It remains a major goal of research in the field to identify the mechanisms that mediate oxaliplatin resistance with the aim of either preventing or overcoming this problem that so often limits therapeutic effectiveness.

Recently, microarray technology has made it possible to examine the mRNA levels for very large numbers of genes simultaneously. In the present study, we used cDNA microarrays to compare gene expression in pairs of related cell lines sensitive and resistant to oxaliplatin. The stably oxaliplatin-resistant sublines were selected from the parental cell lines by repeated cycles of exposure to oxaliplatin in vitro. The statistical technique of “significance analysis of microarrays” (SAM) was used to identify genes whose mRNA levels differed to a statistically significant degree, and these were mapped to the Kyoto Encyclopedia of Genes and Genomes (KEGG) biochemical pathway and chromosomal location databases. We report here the identification of pathways and chromosomally juxtaposed genes not previously known to be associated with resistance to this important chemotherapeutic agent.

Materials and methods

Cells and culture

Four human ovarian carcinoma cell lines (2008, A2780, 1A-9 and IGROV-1) and one human squamous cell carcinoma of the head and neck cell line (UMSCC10b) were used in this study. Sublines with stable resistance to oxaliplatin (2008-R7, A2780-R4, 1A9-OX15, IGROV-1-R6, and UMSCC10b-R5) had been prepared from each parental line by repeated in vitro exposure to oxaliplatin as previously described [16]. All cell lines were maintained in drug-free RPMI-1640 medium (GIBCO) with 5% heat-inactivated (2008 and 2008-R7) or 10% (all other cell lines) fetal calf serum (GIBCO) at 37°C in a humidified atmosphere containing 5% CO2. The degree of resistance of each subline was determined again at the time RNA was harvested using a clonogenic assay with continuous drug exposure as previously described [16].

cDNA microarrays

cDNA microarrays were purchased from the Stanford Functional Genomics Facility (http://www.microarray.org) and contained 43,200 elements representing approximately 29,593 genes as estimated by UniGene clusters.

RNA isolation and cDNA synthesis

When the cell cultures reached about 80% confluence they were lysed with a guanidine isothiocyanate buffer (4 M guanidine isothiocyanate (Gibco), 25 mM sodium acetate, pH 5.5 (Ambion), 0.5% Sarkosyl (Fisher Scientific) and 0.1 M 2-mercaptoethanol (Gibco)). Total RNA was pelleted through a CsCl (Gibco) step gradient and reverse-transcribed into cDNA with a 2:1 ratio of aminoallyl-dUTP to dTTP (Sigma). cDNA from oxaliplatin-resistant cell lines was then labeled in a separate reaction with Cy3 (Amersham Biosciences) and cDNA from oxaliplatin-sensitive cell lines was labeled with Cy5 (Amersham Biosciences).

Microarray hybridization and washing

Cy3- and Cy5-labeled cDNA (500 ng) were hybridized to the cDNA microarrays for 18 h at 42°C. The arrays were washed four times with 1× SSC containing 0.1% SDS, twice with 1× SSC, once with 0.1× SSC and finally spun dry.

Microarray scanning and quality assurance

Features on the microarrays were located and Cy3 and Cy5 fluorescence intensities were analyzed with GenePix Pro 3.0 software using a GenePix 4000A scanner (Axon Instruments). The data sets were imported into Microsoft Excel spreadsheets for analysis of the quality of each feature. Four parameters were used to assess the quality of each feature, and features were excluded for any of the following conditions: diameter <50 μm; ≥50% saturated pixels in both channels; <54% of the pixels with an intensity greater than the median background intensity plus one standard deviation in either channel; flagged by GenePix as “not found” or “absent” or manually flagged as “bad” due to high background, misshapen features, scratches or debris on the slide undetected by GenePix. The log2(Cy3/Cy5) was calculated for each feature and these values were normalized using the within-print tip group normalization method based on locally weighted regression (lowess) as proposed by Yang et al. [23].

Identification of genes of interest using SAM

The SAM software is a statistical tool that was developed for finding differentially expressed genes in microarray experiments. It works as a Microsoft Excel add-in and is available via http://stat.stanford.edu/~tibs/SAM/index.html. SAM was used to compute a d score for the normalized log2(Cy3/Cy5) of each gene. The d score is a modified t-statistic that in our experiments was calculated as the mean log2(Cy3/Cy5) divided by the sum of the standard error and a constant value. The addition of a constant value gives the tests more power on average and diminishes large d scores that arise from genes whose expression level is near zero [21]. The cut-off for significance is determined by a tuning parameter, delta, which is chosen by the user based on the estimated false discovery rate (FDR). In these studies the value of delta was always chosen so that the estimated number of false discoveries was about one gene. After filtering the data sets to include only those features for which a log2(Cy3/Cy5) value was available in at least four of the six replicates, each cell line pair was subjected to SAM [22] with permutations honoring the pairing of resistant/sensitive members of each pair within each replication. SAM analyses were first carried out on the data sets from each cell line pair separately. For the analysis across all five pairs, the average normalized log2(Cy3/Cy5) was determined for each feature from the six replicates for a given cell line pair. Those features for which an average log2(Cy3/Cy5) was not available for all five pairs were discarded. Finally, these per-pair averages were subjected to SAM across all five pairs.
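
As an illustration of the d score described above, the following Python sketch computes the statistic for a single feature: the mean log2(Cy3/Cy5) across replicates divided by the sum of its standard error and a constant. The function name, the `s0` and `min_valid` arguments, and the example values are hypothetical. In SAM itself the constant is estimated from the data and significance is assessed by permutation; neither step is reproduced here.

```python
import math
from statistics import mean, stdev

def d_score(log_ratios, s0, min_valid=4):
    """SAM-style d score for one feature (illustrative sketch only).

    log_ratios: normalized log2(Cy3/Cy5) values, one per replicate;
                None marks a replicate that failed quality assurance.
    s0:         constant added to the standard error (assumed given here).
    min_valid:  minimum number of usable replicates (four of six in the text).
    Returns None when too few replicates are available.
    """
    values = [v for v in log_ratios if v is not None]
    if len(values) < min_valid:
        return None
    standard_error = stdev(values) / math.sqrt(len(values))
    return mean(values) / (standard_error + s0)

# Hypothetical feature measured in six replicates.
replicates = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0]
plain_t = d_score(replicates, s0=0.0)   # ordinary one-sample t-statistic
damped = d_score(replicates, s0=0.1)    # constant shrinks the score
```

With `s0=0.0` the function reduces to an ordinary one-sample t-statistic, which makes the damping effect of the added constant easy to see.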

Identification of genes of interest by hierarchical clustering

Complete linkage hierarchical clustering was carried out using the Cluster software version 2.11 that was written by Michael Eisen and was downloaded from http://rana.lbl.gov/EisenSoftware.htm [6]. The Pearson r (uncentered) was used as a measure of similarity. The results were analyzed and visualized with the TreeView program version 1.50 that was also written by Michael Eisen and downloaded from http://rana.lbl.gov/EisenSoftware.htm. This analysis was based on the average log2(Cy3/Cy5) of all features with significantly higher or lower expression levels in at least one cell line pair as determined by SAM. Clusters of interest were identified by visual inspection.
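
For readers unfamiliar with the uncentered Pearson r, the short Python sketch below shows how it differs from the usual correlation: the means are not subtracted, so it equals the cosine of the angle between the two expression profiles. The function name and the pairwise handling of missing values are assumptions made for illustration and are not taken from the Cluster software.

```python
import math

def uncentered_pearson(x, y):
    """Uncentered Pearson correlation between two expression profiles.

    Positions where either profile is None (missing data) are skipped.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    sum_xy = sum(a * b for a, b in pairs)
    sum_xx = sum(a * a for a, _ in pairs)
    sum_yy = sum(b * b for _, b in pairs)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("profile has no non-zero overlapping values")
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Proportional profiles give r = 1; a constant offset lowers r,
# whereas the centered Pearson r would still be 1.
r_proportional = uncentered_pearson([1, 2, 3], [2, 4, 6])
r_offset = uncentered_pearson([1, 2, 3], [2, 3, 4])
```

Because the measure is uncentered, two features count as similar only when they change in the same direction relative to zero, that is, relative to no difference between the resistant and sensitive cells.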

Results

Identification of genes of interest by SAM

The SAM technique [22] was used to identify genes that were statistically significantly upregulated or downregulated in the oxaliplatin-resistant member of each of the five cell line pairs. Only those features that passed quality assurance criteria in at least four of the six replicates were included in the analysis. The number of features whose cognate mRNA was significantly increased ranged from 246 to 2790 (1.7–16.6% of the features included in the analysis). The number whose mRNA was significantly decreased ranged from 30 to 3487 (0.2–20.7%). Across all cell line pairs an average of 15,472 features were included in the SAM analysis; an average of 1594 were found to have significantly higher expression and 1520 features were found to have lower expression levels. The number of features that were significantly differentially expressed in three or more cell line pairs was quite limited (235 up, 190 down). Only 28 features had increased expression and 15 decreased expression in any four of the five cell line pairs, and only three features had increased expression and none had decreased expression in all five cell line pairs.

To determine whether the genes identified by SAM as being differentially expressed exhibited any coordinate patterns of expression, complete linkage hierarchical clustering of features and cell lines was performed. This analysis was based on the average log2(Cy3/Cy5) of all features with significantly higher or lower expression levels in at least one cell line pair as determined by SAM. By visual inspection, a total of eight clusters of coordinately upregulated features (Fig. 1) and ten clusters of coordinately downregulated features were identified (Fig. 2). These clusters contained an average of 17 genes each. Attempts were made to identify genes within these clusters known to operate in the same biochemical pathway or to have a similar function based on gene ontology databases; however, no functional associations have thus far been discerned.

Fig. 1

Clusters of features that had higher expression levels in the oxaliplatin-resistant cell lines than in their parental oxaliplatin-sensitive cell lines, represented by yellow intensities. Each heat map depicts a separate cluster. Gray represents missing or excluded data, blue means a lower expression level and black means no difference in expression level.

Fig. 2

Clusters of features that had lower expression levels in the oxaliplatin-resistant cell lines than in their parental oxaliplatin-sensitive cell lines, represented by blue intensities. Each heat map depicts a separate cluster. Gray represents missing or excluded data, yellow means a higher expression level and black means no difference in expression level.

Identification of pathways in KEGG

The ultimate goal of identifying genes that are differentially expressed in resistant cells is to determine what biochemical pathways have become altered during the development of oxaliplatin resistance. Having identified genes based on their detection by SAM, it was of interest to determine whether any of these genes appeared to be part of the same pathway. The approach taken was to search the KEGG biochemical pathway database using the genes that were found by SAM to be differentially expressed in any of the five cell line pairs. Prior to this analysis, the data were filtered to remove all duplicate features as well as any gene that did not have an appropriate identifier as used by KEGG. In order to evaluate the association between genes and biochemical pathways, an estimate was needed of the number of genes represented on the microarray expected to be associated with any of the 127 pathways in the KEGG database by chance. The approach taken involved determining the number of SAM-identified genes that are expected to be associated with each of the pathways given the number of genes represented on the microarray (N), the number of these that are in a given pathway (M), and the number of genes found by SAM analysis for a given cell line pair (K). The number of genes expected to target any particular pathway is given by the function (K×M)/N. The number of hits actually observed in a pathway was then compared to the expected number of hits by Chi-squared analysis. Chi-squared values of >6.0 are considered of interest in this type of analysis.
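
The calculation described above can be sketched in a few lines of Python. The expected number of hits follows the formula (K×M)/N given in the text. The exact form of the Chi-squared statistic is not stated, so the single-term form (observed − expected)²/expected is assumed here; the function names and the example counts are hypothetical.

```python
def expected_hits(k_sam_genes, m_pathway_genes, n_array_genes):
    """Expected number of SAM-identified genes in a pathway by chance: (K*M)/N."""
    return k_sam_genes * m_pathway_genes / n_array_genes

def chi_squared(observed, expected):
    """Single-term Chi-squared statistic (assumed form, see lead-in)."""
    return (observed - expected) ** 2 / expected

# Hypothetical example: 10,000 genes on the array (N), 50 of them in the
# pathway (M), and 400 genes identified by SAM in one cell line pair (K).
expected = expected_hits(k_sam_genes=400, m_pathway_genes=50, n_array_genes=10_000)
statistic = chi_squared(observed=7, expected=expected)
of_interest = statistic > 6.0  # threshold used in the text
```

In this made-up example two hits are expected by chance, so seven observed hits give a statistic of 12.5 and the pathway would be flagged as of interest.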

A relatively large number of pathways were targeted by the SAM-identified genes upregulated in the resistant cells in only one of the cell line pairs (21 or 17% of the 127 pathways available for analysis). However, there were only four pathways that were significantly targeted in two cell line pairs, only two that were targeted in three cell line pairs and none that was targeted in four or all five cell line pairs. Likewise, there were a relatively large number of pathways that were targeted by the SAM-identified genes downregulated in the resistant member of at least one cell line pair (32 or 25% of the pathways). However, only 12 were significantly targeted in two cell line pairs, only one in three cell line pairs, and none in four or all five cell line pairs. Table 1 lists the pathways significantly targeted in two or more cell line pairs. There was no association with biochemical pathways previously identified as being important to sensitivity to the Pt-containing drugs including glutathione synthesis or DNA repair pathways. Figures 3, 4 and 5 present the maps of the three most commonly targeted KEGG biochemical pathways color-coded to identify the cell line pair for which SAM-identified genes were found. It is apparent from visual inspection that when a pathway is targeted by SAM-identified genes, it is often the same genes in the pathway that are hit in the different cell line pairs.

Table 1 Biochemical pathways targeted by SAM-identified genes (numbers in parentheses are the number of cell line pairs in which the pathway was targeted)

Fig. 3

Genes of the Huntington’s disease pathway identified by SAM analysis. Each color represents a different cell line pair. Genes with multiple colors were found by SAM analysis to be significantly differentially expressed in multiple cell line pairs.

Fig. 4

Genes of the ribosome pathway identified by SAM analysis. Each color represents a different cell line pair. Genes with multiple colors were found by SAM analysis to be significantly differentially expressed in multiple cell line pairs.

Fig. 5

Genes of the ATP synthesis pathway identified by SAM analysis. Each color represents a different cell line pair. Genes with multiple colors were found by SAM analysis to be significantly differentially expressed in multiple cell line pairs.

Identification of genes that reside close together on the chromosome

Another approach to assessing the significance of genes identified by SAM is to determine whether any of these differentially expressed genes lie close to each other on their cognate chromosome. The underlying hypothesis of this approach is that such genes may be part of an amplicon or a deletion. Although not every SAM-identified gene, or genes located near a SAM-identified gene, is necessarily involved in the development of oxaliplatin resistance, the groups containing the largest number of SAM-identified genes residing close together are candidates for participation in a resistance-specific amplicon or deletion.

Initially, the fraction of the genes on each chromosome represented on the microarray that were SAM-positive was determined for each chromosome based on the gene location provided by the Stanford Functional Genomics Facility. This did not disclose any clear association of SAM-identified genes with a particular chromosome in more than one cell line pair.

In order to refine this analysis, the genomic location of the start site for each SAM-identified gene was determined from the build 30 freeze of the human genome using the Ensembl database (www.Ensembl.org). At the time of this analysis, genomic start sites were only available for those genes with an Ensembl identification. The frequencies of all cases where two, three, four or more upregulated or downregulated SAM-identified genes were found to lie within 10 kb of each other are presented in Tables 2 and 3.

Table 2 Number of SAM-identified upregulated genes with start sites within 10 kb of each other
Table 3 Number of SAM-identified downregulated genes with start sites within 10 kb of each other

The most interesting cases were those where multiple genes that were identified by SAM to be differentially expressed in multiple cell lines were found to reside close together. Among the upregulated genes, there was one group of genes (APACD, IF-2, and REV1L) whose start sites lie immediately adjacent to each other on chromosome 2 (within 1.1 kb) and for which there are no intervening genes in build 30 of the human genome. This group was identified by SAM as differentially expressed in three of the five pairs of cell lines (2008, UMSCC10b and IGROV-1). Additional such groups containing three SAM-identified upregulated genes with no other intervening genes were found on chromosome 17 (PSME3, BECN1, and Q9BTE6 in the 2008 cell line pair only), chromosome 12 (DDIT3, DCTN2, and MARS in the 1A9 cell line pair only) and chromosome 20 (C20orf30, PCNA and CDS2 in the 1A9 cell line pair only). The most interesting group of downregulated SAM-identified genes was a set of three genes on chromosome 9 that lie in very close proximity with no intervening genes (RPL7A, SURF1 and SURF2). This group of genes was identified by SAM as differentially expressed in the 2008 and UMSCC10b cell line pairs. Table 4 identifies and provides a brief description of these genes.

Table 4 Genes identified by SAM analysis that reside close to each other on the chromosome

Assessing the significance of finding genes that reside in close proximity on the chromosome

The finding of SAM-identified genes that lie close to each other raises the question as to whether this could reasonably be attributed to chance alone. The probability of finding such genes in close proximity by chance alone depends on several factors including: (1) a judgment as to how close the genes must be to be considered in proximity; (2) how many SAM-identified genes are included; and (3) in how many cell lines each gene is found to be differentially expressed. Given a set of choices for each of these criteria, the distribution of the number of cases where SAM-identified genes would be found in proximity by chance alone, under the hypothesis of random sampling, can be tabulated by randomly drawing samples of genes from the pool of all genes on a given chromosome that are represented on the microarray, where the sample size is equal to the number of genes identified by SAM. Setting the criteria that the genes must lie within 100 kb, that the group contain at least three genes, and that they were SAM-identified in at least three pairs of cells, 100 such samplings were performed and the number of genes satisfying these criteria was tabulated. The average number over the 100 samplings provides an estimate of the number of cases where SAM-identified genes would be expected to lie next to each other by chance alone.

This analysis indicated that there was an excess of genes identified by SAM as differentially expressed showing up in short windows. Interestingly, one window was identified in which three genes lying within a 100 kb region were found to be upregulated in three cell line pairs; these are the same genes lying on chromosome 2 that were found to have no intervening genes in build 30 of the human genome. This analysis permitted the conclusion that this was an unexpectedly strong grouping since only 0.05 such cases were expected by chance alone.
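
The random-sampling procedure described in this section can be sketched as follows in Python. This is a simplified illustration under stated assumptions, not the analysis code used in the study: it handles one chromosome, treats each gene as a single start-site coordinate, counts every run of three consecutive sampled start sites spanning at most 100 kb, and leaves out the requirement that genes be SAM-identified in at least three cell line pairs. All function and parameter names are hypothetical.

```python
import random

def count_close_groups(start_sites, window=100_000, min_genes=3):
    """Count runs of min_genes consecutive start sites spanning <= window bp."""
    sites = sorted(start_sites)
    groups = 0
    for i in range(len(sites) - min_genes + 1):
        if sites[i + min_genes - 1] - sites[i] <= window:
            groups += 1
    return groups

def expected_groups_by_chance(array_gene_sites, n_sam_genes,
                              n_samplings=100, window=100_000,
                              min_genes=3, seed=0):
    """Average number of close groups over random draws of n_sam_genes genes.

    array_gene_sites: start sites of all genes on one chromosome that are
                      represented on the microarray (the sampling pool).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0
    for _ in range(n_samplings):
        sample = rng.sample(array_gene_sites, n_sam_genes)
        total += count_close_groups(sample, window, min_genes)
    return total / n_samplings
```

Comparing the observed count of close groups with the average returned by `expected_groups_by_chance` gives the kind of observed-versus-expected contrast reported above.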

Associations between genes of interest and gene ontology classifications

In the interest of determining whether there existed any functional similarities between SAM-identified differentially expressed genes, the ontology classification was obtained for each gene from the Source Database [5]. The proportion of differentially expressed genes was calculated in each available gene ontology category and compared to the proportion of differentially expressed genes not in that category. This approach results in a 2×2 table of the number of genes in (or not in) a gene ontology category and the number of genes identified by SAM as being upregulated or downregulated or not. A P-value was calculated to determine whether the proportion of differentially expressed genes in each gene ontology category was significantly different from the proportion of genes not contained in the category. Permutation tests were performed to determine the reference distribution of the calculated P-values for each gene ontology category. Finally, the permutation distribution was used to calculate the expected number of false discoveries for each P-value. Based on this analysis, no gene ontology categories contained a significant number of SAM-identified genes.

Discussion

In the current study, we sought to analyze the functional relevance of genes identified as significantly differentially expressed in oxaliplatin-resistant cells compared with their sensitive parental lines. The first approach involved searching the biochemical pathways of the KEGG database to locate those in which multiple genes involved in the pathway were identified by SAM as being statistically significantly upregulated or downregulated in oxaliplatin-resistant cell lines. A formal statistical approach was applied to the analysis to identify those pathways in which the number of SAM-identified genes exceeded the number expected based on the total number of genes in the pathway that were also represented on the Stanford cDNA array. Although many pathways were found to be hit more often than expected by chance alone in one cell line pair, only four pathways were hit by upregulated genes in two cell line pairs and only two pathways in three cell line pairs (Table 1). Likewise, only 12 pathways were hit by downregulated genes in two cell line pairs and only one in three cell line pairs (Table 1). The pathways identified in three cell line pairs are of particular interest. First, they are not pathways previously identified as having anything to do with resistance to any of the Pt-containing drugs. Thus, they offer new insight into a possible mechanism of Pt-drug resistance. Second, the individual steps in the pathway hit in the SAM-identified genes from one pair tended to be the same steps hit in the other pairs. Third, there is substantial confidence that those pathways identified in multiple cell line pairs really are of interest because of the statistically formal approaches used to identify these genes and pathways. Oxaliplatin is thought to produce a variety of mutations, including deletions, point mutations and base substitutions similar to those produced by cisplatin [3, 15]; however, it remains to be determined whether the same genes are hit in multiple pairs because they contain a sequence that is particularly susceptible to oxaliplatin-induced mutagenesis.

The observation that several members of the ATP synthesis pathway were identified by SAM as downregulated in the A2780, 1A9 and UMSCC10b cell line pairs is of particular interest. Although there do not appear to be many studies of mitochondria in oxaliplatin-resistant cell lines, cisplatin-resistant cells have been shown to have abnormalities of mitochondrial function and structure in several different laboratories [2, 8, 9] and some elements of the electron transport chain have previously been identified as being abnormally expressed in such cells [11]. How reduced expression of genes in the ATP synthesis pathway is etiologically linked to oxaliplatin resistance is not currently apparent.

Huntington’s disease is a neurodegenerative disorder characterized by the loss of striatal and cortical neurons [1] and several genes in this pathway were found to be differentially expressed in the 2008, 1A9 and UMSCC10b cell line pairs. Of interest are caspase 8, an apoptosis-related cysteine protease; NCOR1, which promotes chromatin condensation and prevents access to the transcription machinery; and calmodulin, which plays a role in growth and cell cycle control. The ribosome pathway also contains several genes found to be differentially expressed in the 1A9, IGROV-1 and A2780 cell line pairs. Of interest are RPLP1, RPLP2, and RPS6 as these genes are differentially expressed in three of five cell line pairs. RPLP1 and RPLP2 are components of the large ribosomal subunit and play an important role in elongation during protein synthesis. RPS6 is a component of the small ribosomal subunit and is a major substrate for protein kinases in the ribosome. Phosphorylation of RPS6 is regulated by growth factors and tumor-promoting agents and, in turn, regulates growth progression and arrest. It seems feasible that expression of these genes may contribute to the control of cell growth and proliferation such that resistant cells are able to avoid cell death upon exposure to oxaliplatin.

The finding that some SAM-identified genes lie immediately adjacent to each other on the chromosome provides strong validation of the ability of the expression profiling approach utilized in this study to identify genes of interest. A formal statistical analysis indicated that the probability of finding three such genes together by chance alone was low enough that this result is highly significant. That truly differentially expressed genes might be located close to each other makes structural sense since they may be included in amplicons or deletions unique to the oxaliplatin-resistant cells.

Among all the genes thus far identified as being associated with the oxaliplatin-resistant phenotype, the subsets of SAM-identified genes that also lie immediately adjacent to each other on the chromosome are of the highest interest. In particular, the three genes found on chromosome 2 (APACD, IF-2, and REV1L) appear to be important because these genes were significantly upregulated in the resistant member of three of the five pairs of cell lines examined (2008, IGROV-1 and UMSCC10b). Another line of inquiry has provided independent evidence of the likely importance of at least one of these genes. The REV1L gene codes for one of the three proteins (REV1, REV3, and REV7) that together form DNA polymerase ζ. DNA polymerase ζ is one of the recently discovered specialized polymerases that can replicate across various kinds of adducts in DNA, sometimes producing mutations in the process (reviewed in references 7, 10 and 12). Recent work in this laboratory has shown that polymerase ζ is a major determinant of the mutagenicity of cisplatin. Loss of polymerase ζ function results in hypersensitivity to cisplatin and a marked reduction in its ability to generate drug-resistant clones in the surviving population [13]. This provides a mechanistic basis that supports the suggestion from the current study that enhanced expression of polymerase ζ contributes to oxaliplatin resistance.

The goal of this project was to identify genes whose expression differed in oxaliplatin-sensitive and oxaliplatin-resistant cells with the eventual aim of determining the mechanisms of resistance. However, selection of carcinoma cells for acquired oxaliplatin resistance itself generates mutations, many of which may lead to altered gene expression. Therefore, although many genes are expected to be differentially expressed in any given cell line pair, only a fraction of these are expected to be consistently differentially expressed in multiple pairs of cell lines and it is these genes that are most likely to mediate oxaliplatin resistance. In an attempt to identify such genes, it is pertinent to search among a large number of genes, and to use approaches that are capable of detecting a real signal against the typically noisy background associated with cDNA microarrays.

We conclude that the approach of using pairs of cell lines, each consisting of a drug-sensitive parent and a resistant subline of the same cells, in combination with a large number of independently isolated RNA samples and hybridizations and a rigorous statistical approach, is efficient for the discovery of genes whose expression may be associated with oxaliplatin resistance. Because of the rigor of this approach there is a high degree of confidence that the genes identified are in fact differentially expressed in oxaliplatin-resistant cells. The further finding that genes discovered to be associated with oxaliplatin resistance in this way are also statistically significantly associated with particular biochemical pathways and chromosomal locations provides further evidence of the utility of this strategy.